Designation: E2139 − 05 (Reapproved 2011)

Standard Test Method for
Same-Different Test¹

This standard is issued under the fixed designation E2139; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.

1. Scope
1.1 This test method describes a procedure for comparing two products.
1.2 This test method does not describe the Thurstonian modeling approach to this test.
1.3 This test method is sometimes referred to as the simple-difference test.
1.4 A same-different test determines whether two products are perceived to be the same or different overall.
1.5 The procedure of the test described in this test method consists of presenting a single pair of samples to each assessor. The presentation of multiple pairs would require different statistical treatment and it is outside of the scope of this test method.
1.6 This test method is not attribute-specific, unlike the directional difference test.
1.7 This test method is not intended to determine the magnitude of the difference; however, statistical methods may be used to estimate the size of the difference.
1.8 This test method may be chosen over the triangle or duo-trio tests where sensory fatigue or carry-over are a concern, or where a simpler task is needed.
1.9 This standard may involve hazardous materials, operations, and equipment. This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety and health practices and determine the applicability of regulatory limitations prior to use.

2. Referenced Documents
2.1 ASTM Standards:²
E253 Terminology Relating to Sensory Evaluation of Materials and Products
E456 Terminology Relating to Quality and Statistics
E1871 Guide for Serving Protocol for Sensory Evaluation of Foods and Beverages
2.2 ASTM Publications:²
Manual 26 Sensory Testing Methods, 2nd Edition
STP 758 Guidelines for the Selection and Training of Sensory Panel Members
STP 913 Guidelines for Physical Requirements for Sensory Evaluation Laboratories
2.3 ISO Standard:³
ISO 5495 Sensory Analysis—Methodology—Paired Comparison

¹ This test method is under the jurisdiction of ASTM Committee E18 on Sensory Evaluation and is the direct responsibility of Subcommittee E18.04 on Fundamentals of Sensory.
Current edition approved Aug. 1, 2011. Published August 2011. Originally approved in 2005. Last previous edition approved in 2005 as E2139–05. DOI: 10.1520/E2139-05R11.
² For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at [email protected]. For Annual Book of ASTM Standards volume information, refer to the standard’s Document Summary page on the ASTM website.
³ Available from American National Standards Institute (ANSI), 25 W. 43rd St., 4th Floor, New York, NY 10036, http://www.ansi.org.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States

3. Terminology
3.1 For definition of terms relating to sensory analysis, see Terminology E253, and for terms relating to statistics, see Terminology E456.
3.2 Definitions of Terms Specific to This Standard:
3.2.1 α (alpha) risk—probability of concluding that a perceptible difference exists when, in reality, one does not (also known as Type I Error or significance level).
3.2.2 β (beta) risk—probability of concluding that no perceptible difference exists when, in reality, one does (also known as Type II Error).
3.2.3 chi-square test—statistical test used to test hypotheses on frequency counts and proportions.
3.2.4 ∆ (delta)—test sensitivity parameter established prior to testing and used along with the selected values of α, β, and an estimated value of p1 to determine the number of assessors needed in a study. Delta (∆) is the minimum difference in proportions that the researcher wants to detect, where the difference is ∆ = p2 − p1. ∆ is not a standard measure of sensory difference. The same value of ∆ may correspond to different sensory differences for different values of p1 (see 9.5 for an example).
3.2.5 Fisher’s Exact Test (FET)—statistical test of the equality of two independent binomial proportions.
3.2.6 p1—proportion of assessors in the population who would respond different to the matched sample pair. Based on experience with using the same-different test and possibly with the same type of products, the user may have a priori knowledge about the value of p1.
3.2.7 p2—proportion of assessors in the population who would respond different to the unmatched sample pair.
3.2.8 power 1-β (beta) risk—probability of concluding that a perceptible difference exists when, in reality, one of size ∆ does.
3.2.9 product—material to be evaluated.
3.2.10 sample—unit of product prepared, presented, and evaluated in the test.
3.2.11 sensitivity—term used to summarize the performance characteristics of this test. The sensitivity of the test is defined by the four values selected for α, β, p1, and ∆.

4. Summary of Test Method
4.1 Clearly define the test objective in writing.
4.2 Choose the number of assessors based on the sensitivity desired for the test. The sensitivity of the test is in part related to two competing risks: the risk of declaring a difference when there is none (that is, α-risk), and the risk of not declaring a difference when there is one (that is, β-risk). Acceptable values of α and β vary depending on the test objective. The values should be agreed upon by all parties affected by the results of the test.
4.3 The two products of interest (A and B) are selected. Assessors are presented with one of four possible pairs of samples: A/A, B/B, A/B, and B/A. The total number of same pairs (A/A and B/B) usually equals the total number of different pairs (A/B and B/A).
The assessor’s task is to categorize the given pair of samples as same or different.
4.4 The data are summarized in a two-by-two table where the columns show the type of pair received (same or different) and the rows show the assessor’s response (same or different). A Fisher’s Exact Test (FET) is used to determine whether the samples are perceptibly different. Other statistical methods that approximate the FET can sometimes be used.

5. Significance and Use
5.1 This overall difference test method is used when the test objective is to determine whether a sensory difference exists or does not exist between two samples. It is also known as the simple difference test.
5.2 The test is appropriate in situations where samples have extreme intensities, give rapid sensory fatigue, have long lingering flavors, or cannot be consumed in large quantities, or a combination thereof.
5.3 The test is also appropriate for situations where the stimulus sites are limited to two (for example, two hands, each side of the face, two ears).
5.4 The test provides a measure of the bias whereby judges perceive two identical products to be different.
5.5 The test has the advantage of being a simple and intuitive task.

6. Apparatus
6.1 Carry out the test under conditions that prevent contact between assessors until the evaluations have been completed, for example, booths that comply with STP 913.
6.2 For food and beverage tests, sample preparation and serving sizes should comply with Practice E1871, or see Refs (1) or (2).⁴

⁴ The boldface numbers in parentheses refer to the list of references at the end of this standard.

7. Definition of Hypotheses
7.1 This test can be characterized by a two-by-two table of probabilities according to the sample pair that the assessors in the population would receive and their responses, as follows:

                         Assessor Would Receive
Assessor’s Response      Matched Pair (AA or BB)     Unmatched Pair (AB or BA)
Same                     1 − p1                      1 − p2
Different                p1                          p2 (= p1 + ∆)
Total                    1                           1

where p1 and p2 are the probabilities of responding different for those who would receive the matched pairs and the unmatched pairs, respectively.
7.2 To determine whether the samples are perceptibly different with a given sensitivity, the following one-sided statistical hypothesis is tested:

Ho: p1 = p2
Ha: p1 < p2 (∆ > 0)

Delta (∆) will equal 0 and p1 will equal p2 if there is no detectable difference between the samples. This test addresses whether or not ∆ is greater than 0. Thus, the hypothesis is one-sided because it is not of interest in this test to consider that responding different to the matched pair could be more likely than responding different to the unmatched pair.

8. Assessors
8.1 All assessors must be familiar with the mechanics of the same-different test (the format, the task, and the procedure of evaluation). Greater test sensitivity, if needed, may be achieved through selection of assessors who demonstrate above average individual sensitivity (see STP 758).
8.2 In order to perform this test, assessors do not require special sensory training on the samples in question. For example, they do not need to be able to recognize any specific attribute.
8.3 The assessors must be sampled from a homogeneous population that is well-defined. The population must be chosen on the basis of the test objective. Defining characteristics of the population can be, for example, training level, gender, experience with the product, and so forth.

9. Number of Assessors
9.1 Choose all the sensitivity parameters that are needed to determine the number of assessors for the test. Choose the α-risk and the β-risk. Based on experience, choose the expected value for p1. Choose ∆ = p2 − p1, the minimum difference in proportions that the researcher wants to detect. The most commonly used values for α-risk, β-risk, p1, and ∆ are α = 0.05, β = 0.20, p1 = 0.3, and ∆ = 0.3. These values can be adjusted on a case-by-case basis to reflect the sensitivity desired versus the number of assessors.
9.2 Having defined the required sensitivity (α-risk, β-risk, p1, and ∆), determine the corresponding sample size from Table A1.1 (see Ref (9)). This is done by first finding the section of the table with a p1 value corresponding to the proportion of assessors in the population who would respond different to the matched sample pair. Second, locate the total sample size from the intersection of the desired α, p2 (or ∆), and β values. In the case of the most commonly used values listed in 9.1, Table A1.1 indicates that 84 assessors are needed. The sample size n is based on the number of same and different samples being equal. The sample sizes listed are the total sample size rounded up to the nearest number evenly divisible by 4, since there are four possible combinations of the samples. To determine the number of same and different pairs to prepare, divide n by two.
9.3 If the user has no prior experience with the same-different test and has no specific expectation for the value of p1, then two options are available. Either use p1 = 0.3 and proceed as indicated in 9.2, or use the last section of Table A1.1.
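As a rough cross-check on the table lookup described in 9.2, the number of assessors can be approximated with a textbook normal-approximation formula for comparing two independent proportions, with a continuity correction. This formula is an assumption of the sketch below and is not stated to be the calculation behind Table A1.1 (see Ref (9)), so its results can differ from the table: for the commonly used values in 9.1 it gives 80 rather than the tabulated 84. The function name and its arguments are hypothetical.

```python
import math
from statistics import NormalDist


def approx_total_assessors(p1, delta, alpha=0.05, beta=0.20):
    """Approximate total number of assessors for the one-sided test of 7.2.

    Assumption: normal approximation for two independent proportions with a
    continuity correction. This is NOT the calculation behind Table A1.1.
    """
    p2 = p1 + delta
    if not (0.0 < p1 < p2 < 1.0):
        raise ValueError("need 0 < p1 < p1 + delta < 1")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # one-sided alpha-risk
    z_beta = NormalDist().inv_cdf(1.0 - beta)    # power = 1 - beta
    p_bar = (p1 + p2) / 2.0
    # Uncorrected number of assessors per pair type (matched or unmatched).
    n = (z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
         + z_beta * math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))) ** 2 / delta ** 2
    # Continuity correction, intended to bring the result closer to an exact test.
    n_cc = n / 4.0 * (1.0 + math.sqrt(1.0 + 4.0 / (n * delta))) ** 2
    # Two pair types; round the total up to a multiple of 4, as in 9.2.
    return 4 * math.ceil(2.0 * n_cc / 4.0)


if __name__ == "__main__":
    # Commonly used values from 9.1: alpha = 0.05, beta = 0.20, p1 = 0.3, delta = 0.3.
    print(approx_total_assessors(0.3, 0.3, alpha=0.05, beta=0.20))
```

Where the approximation and Table A1.1 disagree, the tabulated value governs; the sketch is only a sanity check on a lookup.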
The last section of Table A1.1 gives sample sizes that are the largest required, given α, β, and ∆, regardless of p1.
9.4 Often in practice, the number of assessors is determined by practical conditions (for example, duration of the experiment, number of available assessors, quantity of product, and so forth). However, increasing the number of assessors increases the likelihood of detecting small differences. Thus, one should expect to use larger numbers of assessors when trying to demonstrate that products are similar compared to when one is trying to demonstrate that they are different.
9.4.1 When the number of assessors is fixed, the power of the test (1 − β) may be calculated by establishing a value for p1, defining the required sensitivity for α-risk and the ∆, locating the number of assessors nearest the fixed amount, and then following up the column to the listed β-risk.
9.5 If a researcher wants to be 90 % certain of detecting response proportions of p2 = 60 % versus the expected p1 = 40 % with an α-risk of 5 %, then ∆ = 0.60 − 0.40 = 0.20 and β = 0.10 or 90 % power. The number of assessors needed in this case is 232 (Table A1.1). If a researcher wants to be 90 % certain of detecting response proportions of p2 = 70 % versus the expected p1 = 50 % with an α-risk of 5 %, then ∆ = 0.70 − 0.50 = 0.20 and β = 0.10 or 90 % power. The number of assessors needed in this case is 224 (Table A1.1). The same ∆ therefore requires a different number of assessors for different values of p1.

10. Procedure
10.1 Determine the number of assessors needed for the test as well as the population that they should represent (for example, assessors selected for a specific sensory sensitivity).
10.2 It is critical to the validity of the test that assessors cannot identify the samples from the way in which they are presented. One should avoid any subtle differences in temperature or appearance, especially color, caused by factors such as the time sequence of preparation. It may be possible to mask color differences using light filters, subdued illumination, or colored vessels.
Prepare samples out of sight and in an identical manner: same apparatus, same vessels, same quantities of product (see Practice E1871). The samples may be prepared in advance; however, this may not be possible for all types of products. It is essential that the samples cannot be recognized from the way they are presented.
10.3 Prepare the serving order worksheet and ballot in advance of the test to ensure a balanced order of sample presentation of the two products, A and B. One of four possible pairs (A/A, B/B, A/B, and B/A) is assigned to each assessor. Make sure this assignment is done randomly. Design the test so that the number of same pairs equals the number of different pairs. The presentation order of the different pairs should be balanced as much as possible. Serving order worksheets should also include the identification of the samples for each set.
10.4 Prepare the response ballots in a way consistent with the product you are evaluating. For example, in a taste test, give the following instructions: (1) you will receive two samples. They may be the same or different; (2) evaluate the samples from left to right; and (3) determine whether they are the same or different.
10.4.1 The researcher can choose to add an instruction to the ballot indicating whether the assessor may re-evaluate the samples or not.
10.4.2 The ballot should also identify the assessor and date of test, as well as a ballot number that must be related to the sample set identification on the worksheet.
10.4.3 A section soliciting comments may be included following the initial forced-choice question.
10.4.4 An example of a ballot is provided in Fig. X2.2.
10.5 When possible, present both samples at the same time, along with the response ballot.
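The random, balanced assignment of pairs described in 10.3 can be sketched in a few lines of code. This is an illustration only: the function name, the seed argument, and the choice to use each of the four pairs equally often (which requires a number of assessors divisible by 4, consistent with the rounding in 9.2) are assumptions of the sketch rather than requirements of this test method.

```python
import random
from collections import Counter

PAIRS = ("A/A", "B/B", "A/B", "B/A")


def serving_order_worksheet(n_assessors, seed=None):
    """Return a list of (ballot_number, pair) with each pair used equally often.

    Assumption: equal use of all four pairs, which balances same vs. different
    pairs and also balances the A/B and B/A presentation orders.
    """
    if n_assessors <= 0 or n_assessors % 4 != 0:
        raise ValueError("n_assessors must be a positive multiple of 4")
    pairs = list(PAIRS) * (n_assessors // 4)
    random.Random(seed).shuffle(pairs)  # random assignment of pairs to assessors
    return [(ballot, pair) for ballot, pair in enumerate(pairs, start=1)]


if __name__ == "__main__":
    worksheet = serving_order_worksheet(84, seed=2011)
    print(Counter(pair for _, pair in worksheet))  # 21 of each pair
    for ballot, pair in worksheet[:4]:
        print(ballot, pair)
```

The ballot number in each row is what ties a response ballot back to the sample set on the worksheet, as required in 10.4.2.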
In some instances, the samples may be presented sequentially if required by the type of product or the way they need to be presented, or both. This may be the case, for example, for the evaluation of a fragrance in a room where the assessor must change rooms to evaluate the second sample.
10.6 Collect all ballots and tabulate results for analysis.

11. Analysis and Interpretation of Results
11.1 The data