The formal use of sensory panels can be said to have begun in the 1930s, although a large amount of research on sensory perception came before this: from the ancient Greeks (Aristotle described five of the senses in 350 BC), through the Middle Ages and the sensory studies conducted with animals, on to Descartes' work on vision in the 1600s, psychophysical research in the 18th and 19th centuries, and the work in the 19th century when touch, pain, hot and cold sensations were documented (Jung, 1984). In the early days of sensory evaluation, many assessments of products were made by company sensory experts using grading methods. Grading generally uses one expert, or a small number of experts, to assess the quality of products such as wines, perfumes and dairy products, and some of these grading methods are still in use today. An early publication about sensory grading (Crocker and Platt, 1937) stated that expert graders were trained, but there was no mention of screening for sensory acuity; however, the experts did check their own assessments against those of colleagues or standard samples.
In 1936, Cover published the first sensory method, which she called the "paired-eating method" and which is similar to the paired comparison discrimination test we use today. Cover recruited a group of people to conduct the paired tests on meat quality. In 1940 she made improvements to her method, which included the number, selection and training of judges. As no standards or sensory textbooks were yet available to give her guidance, Cover (1940) states that "No method has yet been devised for detecting persons who will make superior judges for using the paired-eating method" (p. 391). Bengtsson and Helm (1946) discuss the choice of people to take part in "industrial taste testing" (what we might today call analytical sensory tests), stating that there are three groups with differing sensory abilities: a small group of people who have higher sensory acuity, a larger group with average abilities and another small group with lower abilities. They also mention that people who regularly take part in product assessments can develop and improve their abilities to detect and communicate product differences. Bengtsson and Helm suggest the use of the "triangular test" to select the most sensitive tasters and also mention that the tasters' abilities should be checked on a regular basis by monitoring test results. The authors also discuss the number of tasters needed to "minimise the effect of chance", stating that 50 to 100 judges assessing unidentified (coded) samples, with forms completed independently of each other, gives excellent results. The Bengtsson and Helm (1946) paper makes an interesting read as it also describes the reasons why staff should not be used for "mass tests", the terminology used at the time for consumer testing.
Helm went on to write a more detailed paper about the "Selection of a Taste Panel" in the same year with Trolle (Helm and Trolle, 1946). As one of the first published papers in sensory science, and on the selection of a taste panel, it is well worth reading. The authors' reason for wanting to develop better "taste tests" can be summarised in a couple of sentences taken directly from the paper:
"The traditional manner in which taste tests were conducted was not satisfactory. In most cases we were able to establish only the fact that it was not possible to discern the difference between samples with any certainty" (p. 181).
The traditional manner they refer to is grading, as well as physical and chemical measurements, for the various beer experiments they conducted. They set up a committee to develop taste testing so that more reliable results could be gathered, and one of the first investigations they carried out was a set of experiments to determine the tasting abilities of the people on the existing taste panel as well as of all the staff at the brewery. There is a nice description of how the authors introduced the selection procedure to the staff: "… failure to qualify as an expert taster…" would not be detrimental to their career at the brewery as "… a keen palate is a gift of nature possessed by relatively few persons". As mentioned earlier, the authors used the triangular test to determine acuity because it matched the type of test they wanted to use to determine the differences between beers. They devised a series of tests to find the correct level of "difference" between the two beers so that the test would be neither too easy nor too hard. The test involved asking which two of the three samples were similar and also which was preferred. In the later experiments the test set-up was more akin to a 3-alternative forced choice (3-AFC) than a triangle, as the question asked, for example, which samples were strong in bitterness and which were weak in bitterness. The authors recognised that keeping the tasters interested in the experiments was key, so they gave them direct feedback about whether they were "wrong" or "right" in their sample choice and also explained what the differences were. They conducted 6878 tests altogether and used the chi-square test to determine which test results indicated statistical significance for the four pairs of products. They used the replicate data for each taster to determine whether that taster could be classified as an "expert": where the p-value for a pair of samples across all replicates was equal to 0.001.
Of the 51 people who completed all the tests, only six were classified as expert for all four pairs of products. Twenty people were selected for the taste panel based on the highest percentages of correct tests. However, as each of the four pairs of products was designed to differ in an important aspect, those tasters who were often correct for a particular pair, say the pair designed to differ in bitterness, were listed for selection for tests where bitterness was the attribute of interest. The authors also investigated the effect of age, occupation, experience in tasting and whether or not the taster was a smoker, as well as improvement in the results over the course of the testing and the effects of fatigue and memory.
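To make the statistical idea behind this kind of screening concrete, the following is a minimal sketch in Python. It is not a reconstruction of Helm and Trolle's own calculations. It assumes that a taster who cannot perceive a difference picks the correct sample in a three-sample test with probability 1/3, and it compares a taster's replicate results against that chance level in two ways: a chi-square goodness-of-fit test with one degree of freedom, and an exact binomial tail probability. The function names, the example counts and the use of 0.001 as a threshold are illustrative assumptions.

```python
from math import comb, erfc, sqrt


def binomial_tail(correct, trials, chance=1 / 3):
    """Exact probability of getting at least `correct` right answers
    out of `trials` by guessing alone (one-sided binomial tail)."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


def chi_square_1df(correct, trials, chance=1 / 3):
    """Chi-square goodness-of-fit of observed correct/incorrect counts
    against the counts expected from guessing.

    Returns (statistic, p_value). The p-value is two-sided: it flags
    results far from chance in either direction.
    """
    wrong = trials - correct
    expected_correct = trials * chance
    expected_wrong = trials - expected_correct
    statistic = (
        (correct - expected_correct) ** 2 / expected_correct
        + (wrong - expected_wrong) ** 2 / expected_wrong
    )
    # For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical taster: 20 correct answers in 30 replicate tests.
statistic, p_chi = chi_square_1df(correct=20, trials=30)
p_exact = binomial_tail(correct=20, trials=30)
threshold = 0.001  # assumed cut-off for illustration only

print(f"chi-square statistic: {statistic:.2f}, p-value: {p_chi:.6f}")
print(f"exact binomial tail:  {p_exact:.6f}")
print("meets threshold:", p_chi <= threshold and p_exact <= threshold)
```

With these hypothetical counts both p-values fall below 0.001, so this taster would pass the assumed threshold for that pair of samples. The exact binomial tail is usually preferred for small numbers of replicates, where the chi-square approximation is less reliable.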
Interestingly, time-intensity methods began to be developed in the 1930s (Holway and Hurvich, 1937), before descriptive profiling methods, and helped researchers realise that taste intensity was not a static measurement. But measuring the intensity of an attribute over time was fraught with issues before computers arrived in the laboratory. Constructing the curves, comparing the panellists' outputs and reducing the biasing effects of the clock or timekeeper were all difficulties faced by the sensory scientists at the time. This resulted in some differences in panel recruitment, as panellists needed to be able to use different equipment to aid the collection of the data. Dijksterhuis and Piggott (2001) and Lawless and Heymann (2010) both give good reviews of dynamic flavour profile methods.
Developments in the late 1940s of further discrimination tests led researchers to consider how best to recruit and train people for their tests, with consideration given to the potential fatigue and health of the "taster", as well as their memory and sensory acuity. A group from the Carlsberg Brewery refer to their development of the triangle test method in the early 1940s, and it seems that these authors were also concerned about the difference between quality analysis and discrimination tests (Peryam and Swartz, 1950). The authors state that human behaviour can be dealt with scientifically, which was often disputed or simply not understood at the time. The authors created three tests for measuring sensory differences (the triangle, duo-trio and dual-standard) because they wanted more objective methods that were discriminative rather than judgemental, and that used statistical analysis to give a simpler, more direct and more actionable answer.
Dove (1947) was also interested in discrimination tests and the choice of the correct panellist for the task as part of the "Subjective–Objective Approach" suggested by the author. The author uses this terminology to elevate the importance of the "subjective" assessments, which at the time were being "discredited" and overlooked in favour of instrumental or "objective" measures. Dove developed the difference-preference test, which is essentially the paired comparison with an added preference question using a 10-point scale: "five equal degrees of acceptability and five equal degrees of non-acceptability are allowed". The author also lists requirements for the laboratory where the tests are to be conducted (e.g., air conditioned, segregated booths, prescribed lighting), requirements for sample preparation (e.g., controlled quantity and temperature, hidden codes) and requirements for the judges (selection based on vocabulary, experience and ability in detecting small differences, as opposed to screening with basic tastes). Some other authors had begun this task, but this is one of the most complete lists of the time. An interesting aspect of this paper is the description of conducting taste tests with animals instead of humans on products such as lettuce and cabbage, where humans are "confused" by the taste! It is interesting to consider what might have happened to sensory science had these ideas been extended.
Much of the interest in sensory methods around this time came with economic growth and the huge changes that resulted from World War II. The 1940s and 1950s saw a vast amount of work on sensory testing, partly due to the focus on nutrition during the war years and also due to industry's general interest in developing new food products. In 1950, in an attempt to collect together all the information and make some recommendations for future food testing, the US Bureau of Human Nutrition and Home Economics held a conference (Dawson and Harris, 1951), which was attended mostly by academics and research associations (Howgate, 2015). The conference proceedings are available to download and give an insight into the difficulties and dilemmas the attendees faced in testing food using sensory methods.
Around this time there was also much discussion about the type of person who was best recruited to be a sensory panellist (Helm and Trolle, 1946; Dawson and Harris, 1951; Ferris, 1956; Platt, 1937; Morse, 1942; Bliss et al., 1943; Dove, 1947). Many groups advocated the use of trained staff, such as flavourists, brewers and product developers, because of their knowledge and experience, while others suggested that these people were too close to the product, and to the reasons for the testing, to be free from bias.
Due to the rapid development of new food products, it became more and more difficult for experts, who at the time were the main source of sensory data, to contribute across all quality and product development projects. The work of authors such as Dove, as mentioned earlier, also helped the industry realise that the expert's view was not necessarily related to the consumers' views. One particularly interesting paper discusses the need for more rigorous consumer testing, stating that previous research appeared inconclusive or transitory and that results were not repeatable from study to study (Kiehl and Rhodes, 1956). Kiehl and Rhodes describe the two main research areas working on consumer preference measurements as the "household" panel and the "laboratory" panel, and make an important comment on the use of small numbers of people in "difference-preference" methods, which was pretty much standard at the time, to determine consumer preferences:
"The inference of expert preferences to the great mass of consumers required a heroic assumption about the representativeness of experts" (p. 1337).
These changes helped create the need for more detailed, applicable and valid data about the sensory aspects of food, and the Flavor Profile Method was created to help meet these needs (Cairncross and Sjostrom, 1950). This was the first descriptive profile method and is therefore a major landmark in the history of sensory science. The method used a small group of highly trained panellists to create a flavour profile using a consensus scoring method. The panellists do not use a scale as such to mark their judgement of intensity, but rather a number choice: for example, "I think this is a 2". Many publications discussed the uses, advantages and disadvantages of the method (Amerine et al., 1965), with the main concerns related to leader bias, the sensitivity of a 0–3 scale and the consensus scoring method.
At this point, we have the real beginnings of the differentiation in sensory science between the use of naïve consumers, trained panellists and experts, ready for the many discussions and heated debates about who actually is the right person to take part in these "analytical" assessments. The debate is still ongoing and will probably continue, but consideration and discussion about the objectives and action standards for the data gathering and the subsequent use of the information can effectively guide the choice of panel type. Other aspects such as when the data are needed, resources, product type and test schedule will also help cement the decision. For more information on this aspect please see Chapter 13.
The "contour" method was developed by Hall et al. (1959) for the production of profiles of paired samples, one of which was designated as the control. A small number of panellists then rated the deviation from the control for odour and flavour on a 0–5 scale for a number of samples compared back to the control. However, this method was not taken up by the industry, possibly because of its complexity (Amerine et al., 1965). In 1953, Dove developed a scale to try to standardise intensity measurements for taste. The scale was based on known concentrations of pure chemicals such as sucrose, to allow uniform comparisons across foodstuffs. The scale itself has not been used extensively in sensory profiling but might perhaps be regarded as a precursor of the absolute scales used by some later profiling methods.
Other groups developed profiling methods based on the Flavor Profile Method for use on their product category. The first to be published was the Texture Profile Method (Brandt et al., 1963) which was very similar to the Flavor Profile Method but related to the mechanical (i.e., the response of the product to stress, e.g., hardness, chewiness), geometric (i.e., the size, shape and particle composition, e.g., crumbliness, grittiness, flakiness) and mouthfeel (i.e., surface...