Designation: G16 − 13

Standard Guide for
Applying Statistics to Analysis of Corrosion Data¹

This standard is issued under the fixed designation G16; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.

1. Scope

1.1 This guide covers and presents briefly some generally accepted methods of statistical analyses which are useful in the interpretation of corrosion test results.

1.2 This guide does not cover detailed calculations and methods, but rather covers a range of approaches which have found application in corrosion testing.

1.3 Only those statistical methods that have found wide acceptance in corrosion testing have been considered in this guide.

1.4 The values stated in SI units are to be regarded as standard. No other units of measurement are included in this standard.

2. Referenced Documents

2.1 ASTM Standards:²
E178 Practice for Dealing With Outlying Observations
E691 Practice for Conducting an Interlaboratory Study to Determine the Precision of a Test Method
G46 Guide for Examination and Evaluation of Pitting Corrosion
IEEE/ASTM SI 10 American National Standard for Use of the International System of Units (SI): The Modern Metric System

3. Significance and Use

3.1 Corrosion test results often show more scatter than many other types of tests because of a variety of factors, including the fact that minor impurities often play a decisive role in controlling corrosion rates. Statistical analysis can be very helpful in allowing investigators to interpret such results, especially in determining when test results differ from one another significantly.
This can be a difficult task when a variety of materials are under test, but statistical methods provide a rational approach to this problem.

3.2 Modern data reduction programs in combination with computers have made it possible to perform sophisticated statistical analyses on data sets with relative ease. This capability permits investigators to determine if associations exist between many variables and, if so, to develop quantitative expressions relating the variables.

3.3 Statistical evaluation is a necessary step in the analysis of results from any procedure which provides quantitative information. This analysis allows confidence intervals to be estimated from the measured results.

4. Errors

4.1 Distributions—In the measurement of values associated with the corrosion of metals, a variety of factors act to produce measured values that deviate from expected values for the conditions that are present. Usually the factors which contribute to the error of measured values act in a more or less random way so that the average of several values approximates the expected value better than a single measurement. The pattern in which data are scattered is called its distribution, and a variety of distributions are seen in corrosion work.

4.2 Histograms—A bar graph called a histogram may be used to display the scatter of the data. A histogram is constructed by dividing the range of data values into equal intervals on the abscissa axis and then placing a bar over each interval of a height equal to the number of data points within that interval.
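The construction just described can be sketched in code. This is a minimal illustration and not part of the guide; the function name `histogram_counts` and the sample corrosion rates are hypothetical.

```python
def histogram_counts(values, n_intervals):
    """Count data points in equal-width intervals spanning the data range.

    Returns (edges, counts): n_intervals + 1 interval edges and the bar
    height (number of points) for each interval.
    """
    if n_intervals < 1:
        raise ValueError("n_intervals must be at least 1")
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("all values are equal, so the range has zero width")
    width = (high - low) / n_intervals
    counts = [0] * n_intervals
    for value in values:
        # The maximum lies on the last edge; place it in the last interval.
        index = min(int((value - low) / width), n_intervals - 1)
        counts[index] += 1
    edges = [low + k * width for k in range(n_intervals + 1)]
    return edges, counts


# Hypothetical corrosion rates (mm/y), invented for illustration only.
rates = [1.0, 1.2, 1.9, 2.0, 2.5, 3.0, 3.1, 4.0, 5.0]
edges, counts = histogram_counts(rates, n_intervals=4)
for left, right, count in zip(edges, edges[1:], counts):
    # One '#' per data point gives a text version of the bar heights.
    print(f"{left:4.1f} to {right:4.1f} | {'#' * count}")
```

The printed bars give a rough picture of shape and symmetry; with only nine points, most of the bars hold fewer than three points.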
The number of intervals should be few enough so that almost all intervals contain at least three points; however, there should be a sufficient number of intervals to facilitate visualization of the shape and symmetry of the bar heights. Twenty intervals are usually recommended for a histogram. Because so many points are required to construct a histogram, it is unusual to find data sets in corrosion work that lend themselves to this type of analysis.

4.3 Normal Distribution—Many statistical techniques are based on the normal distribution. This distribution is bell-shaped and symmetrical. Use of analysis techniques developed for the normal distribution on data distributed in another manner can lead to grossly erroneous conclusions. Thus, before attempting data analysis, the data should either be verified as being scattered like a normal distribution, or a transformation should be used to obtain a data set which is approximately normally distributed. Transformed data may be analyzed statistically and the results transformed back to give the desired results, although transforming the data back can create problems because the resulting confidence intervals are no longer symmetrical.

¹ This guide is under the jurisdiction of ASTM Committee G01 on Corrosion of Metals and is the direct responsibility of Subcommittee G01.05 on Laboratory Corrosion Tests. Current edition approved Dec. 1, 2013. Published December 2013. Originally approved in 1971. Last previous edition approved in 2010 as G16–95 (2010). DOI: 10.1520/G0016-13.

² For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service. For Annual Book of ASTM Standards volume information, refer to the standard’s Document Summary page on the ASTM website.

Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States

4.4 Normal Probability Paper—If the histogram is not confirmatory in terms of the shape of the distribution, the data may be examined further to see if they are normally distributed by constructing a normal probability plot as follows (1).³

4.4.1 It is easiest to construct a normal probability plot if normal probability paper is available. This paper has one linear axis, and one axis which is arranged to reflect the shape of the cumulative area under the normal distribution. In practice, the “probability” axis has 0.5 or 50 % at the center, a number approaching 0 or 0 % at one end, and a number approaching 1.0 or 100 % at the other end. The marks are spaced far apart in the center and close together at the ends. A normal probability plot may be constructed as follows with normal probability paper.

NOTE 1—Data that plot approximately on a straight line on the probability plot may be considered to be normally distributed. Deviations from a normal distribution may be recognized by the presence of deviations from a straight line, usually most noticeable at the extreme ends of the data.

4.4.1.1 Number the data points starting at the largest negative value and proceeding to the largest positive value.
The numbers of the data points thus obtained are called the ranks of the points.

4.4.1.2 Arrange the data in order: y(1), y(2), y(3), ...; these values are called the order statistics. Plot each point on the normal probability paper so that the linear axis reflects the value of the datum, while the probability axis location is calculated by subtracting 0.5 from the number (rank) of that point and dividing by the total number of points in the data set.

NOTE 2—Occasionally two or more identical values are obtained in a set of results. In this case, each point may be plotted, or a composite point may be located at the average of the plotting positions for all the identical values.

4.4.2 If normal probability paper is not available, the location of each point on the probability plot may be determined as follows:

4.4.2.1 Mark the probability axis using linear graduations from 0.0 to 1.0.

4.4.2.2 For each point, subtract 0.5 from the rank and divide the result by the total number of points in the data set. This is the area to the left of that value under the standardized normal distribution. The cumulative distribution function is the number, always between 0 and 1, that is plotted on the probability axis.

4.4.2.3 The value of the data point defines its location on the other axis of the graph.

4.5 Other Probability Paper—If the histogram is not symmetrical and bell-shaped, or if the probability plot shows nonlinearity, a transformation may be used to obtain a new, transformed data set that may be normally distributed. Although it is sometimes possible to guess at the type of distribution by looking at the histogram, and thus determine the exact transformation to be used, it is usually just as easy to use a computer to calculate a number of different transformations and to check each for the normality of the transformed data. Some transformations based on known non-normal distributions, or that have been found to work in some situations, are listed as follows:

$y = \log x$
$y = \exp x$
$y = \sqrt{x}$
$y = x^{2}$
$y = 1/x$
$y = \sin^{-1}\sqrt{x/n}$

where:
y = transformed datum,
x = original datum, and
n = number of data points.

Time to failure in stress corrosion cracking usually is best fitted with a log x transformation (2, 3).

Once a set of transformed data is found that yields an approximately straight line on a probability plot, the statistical procedures of interest can be carried out on the transformed data. Results, such as predicted data values or confidence intervals, must be transformed back using the reverse transformation.

4.6 Unknown Distribution—If there are insufficient data points, or if for any other reason the distribution type of the data cannot be determined, then two possibilities exist for analysis:

4.6.1 A distribution type may be hypothesized based on the behavior of similar types of data. If this distribution is not normal, a transformation may be sought which will normalize that particular distribution. See 4.5 above for suggestions. Analysis may then be conducted on the transformed data.

4.6.2 Statistical analysis procedures that do not require any specific data distribution type, known as non-parametric methods, may be used to analyze the data. Non-parametric tests do not use the data as efficiently as methods based on a known distribution.

4.7 Extreme Value Analysis—In the case of determining the probability of perforation by a pitting or cracking mechanism, the usual descriptive statistics for the normal distribution are not the most useful.
In this case, Guide G46 should be consulted for the procedure (4).

4.8 Significant Digits—IEEE/ASTM SI 10 should be followed to determine the proper number of significant digits when reporting numerical results.

4.9 Propagation of Variance—If a calculated value is a function of several independent variables and those variables have errors associated with them, the error of the calculated value can be estimated by a propagation of variance technique. See Refs (5) and (6) for details.

4.10 Mistakes—Mistakes either in carrying out an experiment or in calculations are not a characteristic of the population and can preclude statistical treatment of data or lead to erroneous conclusions if included in the analysis. Sometimes mistakes can be identified by statistical methods by recognizing that the probability of obtaining a particular result is very low.

³ The boldface numbers in parentheses refer to a list of references at the end of this standard.

4.11 Outlying Observations—See Practice E178 for procedures for dealing with outlying observations.

5. Central Measures

5.1 It is accepted practice to employ several independent (replicate) measurements of any experimental quantity to improve the estimate of precision and to reduce the variance of the average value. If it is assumed that the processes operating to create error in the measurement are random in nature and are as likely to overestimate the true unknown value as to underestimate it, then the average value is the best estimate of the unknown value in question.
The average value is usually indicated by placing a bar over the symbol representing the measured variable.

NOTE 3—In this standard, the term “mean” is reserved to describe a central measure of a population, while “average” refers to a sample.

5.2 If processes operate to exaggerate the magnitude of the error either in overestimating or underestimating the correct measurement, then the median value is usually a better estimate.

5.3 If the processes operating to create error affect both the probability and magnitude of the error, then other approaches must be employed to find the best estimation procedure. A qualified statistician should be consulted in this case.

5.4 In corrosion testing, it is generally observed that average values are useful in characterizing corrosion rates. In cases of penetration from pitting and cracking, failure is often defined as the first through penetration, and in these cases average penetration rates or times are of little value. Extreme value analysis has been used in these cases; see Guide G46.

5.5 When the average value is calculated and reported as the only result in experiments in which several replicate runs were made, information on the scatter of data is lost.

6. Variability Measures

6.1 Several measures of distribution variability are available which can be useful in estimating confidence intervals and making predictions from the observed data. In the case of normal distribution, a number of procedures are available and can be handled with computer programs. These measures include the following: variance, standard deviation, and coefficient of variation.
The range is a useful non-parametric estimate of variability and can be used with both normal and other distributions.

6.2 Variance—Variance, σ², may be estimated for an experimental data set of n observations by computing the sample estimated variance, S², assuming all observations are subject to the same errors:

$$S^{2} = \frac{\sum d^{2}}{n - 1} \tag{1}$$

where:
d = the difference between the average and the measured value,
n − 1 = the degrees of freedom available.

Variance is a useful measure because it is additive in systems that can be described by a normal distribution; however, the dimensions of variance are the square of the units of the measurements. A procedure known as analysis of variance (ANOVA) has been developed for data sets involving several factors at different levels in order to estimate the effects of these factors. (See Section 9.)

6.3 Standard Deviation—Standard deviation, σ, is defined as the square root of the variance. It has the property of having the same dimensions as the average value and the original measurements from which it was calculated and is generally used to describe the scatter of the observations.

6.3.1 Standard Deviation of the Average—The standard deviation of an average, $S_{\bar{x}}$, is different from the standard deviation of a single measured value, but the two standard deviations are related as in (Eq 2):

$$S_{\bar{x}} = \frac{S}{\sqrt{n}} \tag{2}$$

where:
n = the total number of measurements which were used to calculate the average value.

When reporting standard deviation calculations, it is important to note clearly whether the value reported is the standard deviation of the average or of a single value. In either case, the number of measurements should also be reported. The sample estimate of the standard deviation is S.

6.4 Coefficient of Variation—The population coefficient of variation is defined as the standard deviation divided by the mean. The sample coefficient of variation may be calculated as $S/\bar{x}$ and is usually reported in percent.
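The sample variance (Eq 1), the standard deviation, the standard deviation of the average (Eq 2), and the sample coefficient of variation can be computed directly from replicate results. The following is a minimal sketch using only the Python standard library; the function name `variability_summary` and the mass-loss values are hypothetical and not part of the guide.

```python
import math


def variability_summary(values):
    """Return sample variability measures for a list of replicate results."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    average = sum(values) / n
    if average == 0:
        raise ValueError("coefficient of variation is undefined for a zero average")
    # Eq 1: sum of squared differences from the average, divided by the
    # n - 1 degrees of freedom.
    variance = sum((x - average) ** 2 for x in values) / (n - 1)
    std_dev = math.sqrt(variance)
    # Eq 2: standard deviation of the average of n measurements.
    std_dev_of_average = std_dev / math.sqrt(n)
    # Sample coefficient of variation, expressed in percent.
    cv_percent = 100.0 * std_dev / average
    return {
        "n": n,
        "average": average,
        "variance": variance,
        "std_dev": std_dev,
        "std_dev_of_average": std_dev_of_average,
        "cv_percent": cv_percent,
    }


# Hypothetical replicate mass losses (mg), invented for illustration only.
mass_loss = [1.0, 2.0, 3.0, 4.0, 5.0]
summary = variability_summary(mass_loss)
for name, value in summary.items():
    print(f"{name}: {value:.4g}")
```

The summary keeps n next to both standard deviations so that the two cannot be confused when reported. The last entry is the sample coefficient of variation, expressed in percent.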
This measure of variability is particularly useful in cases where the size of the errors is proportional to the magnitude of the measured value so that the coefficient of variation is approximately constant over a wide range of