by Elliott Morss, Morss Global Finance
The Judgment of Paris in 1976, chronicled in George Taber’s book, was the first in a series of widely reported blind tastings. Below, I summarize the findings from those tastings (and my own – less widely reported). I also report on what research is telling us. I conclude with some thoughts on whether there is anything to be learned from future wine tastings.
Judgments of Paris et al
The 1976 tastings in Paris made headlines worldwide. For the first time, US wines (more specifically, Californian wines) did as well as or better than French wines. And all the judges were French.
Richard Quandt summarized the findings:
“Four French wines were matched against six California reds in one tasting and four French Chardonnays were matched against six American ones in the second tasting. …on the whole the French reds beat the American wines, even though the single best wine was American.” There was no significant difference between the French and US Chardonnays because “while four of the five best wines were American, the two worst wines were also American, one of those by an overwhelming margin.”
Taber drew a broader conclusion from the tasting:
“The Paris Tasting shattered two foundations of conventional wisdom in the world of wine. First, it demonstrated that outstanding wine can be made in many places beyond the hallowed terroir of France. Second, the Paris Tasting showed that winemakers did not need a long heritage of passing the wisdom of the ages down from one generation to the next to master the techniques for producing great wine.”
Several other widely publicized tastings comparing French and Californian wines were carried out over the next decade:
- 1978 (San Francisco),
- 1986 (French Culinary Wine Institute), and
- 1986 (Wine Spectator).
The results in each of them were the same as in 1976: American wines “held their own” against the French.
And while these tastings were being held, as Taber suggested above, good wines from all over the world started to appear in Western markets. I argue below that this development is key to why tastings have lost their significance.
Goldstein et al analyzed data from 6,000 blind tastings – a lot of blind tastings! I quote from their findings:
“Individuals who are unaware of the price do not derive more enjoyment from more expensive wine. …we find that the correlation between price and overall rating is small and negative, suggesting that individuals on average enjoy more expensive wines slightly less….”
Lecocq and Visser analyzed three data sets totaling 1,387 observations on French Bordeaux and Burgundies. They report similar findings:
“When non-experts blind-taste cheap and expensive wines they typically tend to prefer the cheaper ones.”
The Judgment of Princeton
What can we conclude from the above? No definitive conclusions on French versus Californian wines, and no evidence that more expensive wines taste any better than less expensive ones. So let’s fast forward to the June 2012 wine tasting in Princeton sponsored by The American Association of Wine Economists. I quote from my article on what happened at the “Judgment of Princeton”:
“As was done in Paris 36 years ago, the tasting included French and American wines. As previously, the American wines did quite well. But in Princeton, it was a bit different. Instead of wines from California, wines from New Jersey were pitted against some of France’s finest. The New Jersey wines’ performance? For whites, the average New Jersey ranking was better than the average French ranking. And for reds, New Jersey wines ranked 3rd and 5th.”
The judges’ Chardonnay rankings are presented below. Most striking to me is how often one judge ranked a wine best (or tied for best) while another ranked it worst or tied for worst. It happened for 5 of the 10 wines tasted! The Clos des Mouches was ranked 1st or tied for 1st by 4 judges. But one judge ranked it worst. Two tasters gave the Ventimiglia a tie for worst while one judge gave it a tie for the best ranking.
Richard Quandt again analyzed the Princeton results. He concluded:
“…the rank order of the wines was mostly insignificant. That is, if the wine judges repeated the tasting, the results would most likely be different. From a statistical viewpoint, most wines were indistinguishable. Only the best white and the lowest-ranked red were significantly different from the other wines.”
Quandt is saying that, given the tremendous differences in the judges’ rankings, the only thing you could be reasonably sure of was that the Clos des Mouches Drouhin was better than the other whites. For the rest, the judges’ scores were so different that you could not conclude with any degree of certainty that one wine was better than another.
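Quandt’s point – that a repeat tasting would likely reorder most of the wines – can be made concrete with a small simulation. The sketch below is not Quandt’s actual methodology; it is a simple permutation test on made-up ranks (the numbers are illustrative, not the Princeton data): if judges ranked at random, how often would we see a gap between the best and worst total rank at least as large as the one observed?

```python
import random

# Hypothetical ranks: rows = judges, columns = wines (1 = best).
# These numbers are made up for illustration, not the Princeton data.
ranks = [
    [1, 4, 2, 3, 5],
    [5, 2, 1, 4, 3],
    [2, 1, 5, 3, 4],
    [1, 3, 4, 2, 5],
    [4, 1, 2, 5, 3],
]

def rank_sums(matrix):
    """Total rank each wine received across all judges (lower = better)."""
    return [sum(col) for col in zip(*matrix)]

def spread(matrix):
    """Gap between worst and best total rank; a big gap suggests real differences."""
    sums = rank_sums(matrix)
    return max(sums) - min(sums)

observed = spread(ranks)

# Null hypothesis: judges rank at random. Shuffle each judge's ranks
# independently many times and count how often a spread at least as
# large as the observed one appears by chance.
random.seed(0)
trials = 10_000
at_least_as_big = sum(
    spread([random.sample(row, len(row)) for row in ranks]) >= observed
    for _ in range(trials)
)
p_value = at_least_as_big / trials
print(f"observed spread: {observed}, permutation p-value: {p_value:.3f}")
```

A large p-value here would mean the observed disagreement looks just like random ranking – which is essentially Quandt’s conclusion for most of the wines.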
The Judgment of Lenox
Last week, The Lenox Wine Club had its first tasting. The wines were selected to be similar enough to make comparisons meaningful. We also wanted enough price difference to see whether price matters. The focus was on heavy whites: four Chardonnays and one Aligoté from Washington were tasted.
Four of the whites were inexpensive, but we also included the winner at Princeton. The wines and our costs are presented below in Table 2. The Box Set is a 3 liter box that cost $17.87. That means its equivalent price for a normal 750 ml bottle would be only $4.47 – very inexpensive.
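The per-bottle equivalence is simple arithmetic – a 3 liter box holds four standard 750 ml bottles – which a couple of lines can verify:

```python
# Per-bottle price of a 3 L box at $17.87, expressed in 750 mL bottles.
box_liters = 3.0
box_price = 17.87
bottle_liters = 0.75

bottles_per_box = box_liters / bottle_liters   # 4 standard bottles
price_per_bottle = box_price / bottles_per_box
print(f"${price_per_bottle:.2f}")  # → $4.47
```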
The tasting results are given in Table 3. It turns out that the overall winner was the boxed wine! Well okay, the statisticians in our group might say that its final score is not significantly different from either the Yellowtail or the Shooting Star. However, it is notable that the Box got more top votes (5) than any other wine. The Yellowtail came in 2nd despite receiving no top-rated votes, while the Aligoté, like the Box, drew several top-rated votes (3). The Drouhin wine, the winner at Princeton, got only 2 top votes, and three tasters gave it the worst rating. Most tasters did not like the Raymond wine: five rated it worst.
Table 4 gives the best and worst rankings/scores for the wines tasted at Princeton and Lenox. Ties for best and worst have been included in the count. These extremes do a pretty good job at predicting best and worst wines. It is notable that while the Clos des Mouches got 5 top ranks at Princeton, its scores at Lenox were mediocre.
Another interesting statistic measures how closely the ranks/scores of individual tasters track the average of all tasters. A high positive number indicates a taster is close to the overall average (1.000 would indicate a perfect correlation); low or negative numbers indicate the opposite. Data on this question are presented in Table 5. The patterns at Princeton and Lenox are quite similar: both tastings had “rogues”. In the Lenox case, taster 11 got a -.727 correlation because he scored the Box Set worst and because his favorites were the Clos des Mouches and the Raymond.
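For readers who want to compute this statistic for their own tasting, here is a minimal sketch. The scores are invented for illustration (not the Lenox data), and I assume the article’s approach of correlating each taster’s scores with the average of all tasters; one could equally correlate against the average of the *other* tasters to avoid a taster influencing his own benchmark.

```python
import math

# Hypothetical scores: rows = tasters, columns = wines (higher = better).
# Made-up numbers for illustration, not the actual Lenox data.
scores = [
    [9, 7, 8, 5, 6],
    [8, 6, 9, 5, 7],
    [7, 8, 6, 5, 9],
    [4, 6, 5, 9, 8],   # a potential "rogue": roughly inverted preferences
]

def mean(xs):
    return sum(xs) / len(xs)

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Average score per wine across all tasters.
avg = [mean(col) for col in zip(*scores)]

for i, row in enumerate(scores, start=1):
    print(f"taster {i}: r = {pearson(row, avg):+.3f}")
```

A taster whose preferences run against the group – like the fourth row above – comes out with a negative correlation, which is exactly the “rogue” pattern described for taster 11 at Lenox.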
At Paris and Princeton, tasters were chosen for their wine expertise – people who wrote about wine or owned liquor stores or restaurants. In the Lenox case, the judges were also experts – people who had been drinking wine for 30+ years. And that raises an interesting question: why should “experts” (wine writers, wine shop owners, restaurant owners, sommeliers, and raters) be the arbiters? Most certainly, they have well-developed preferences/biases. Is there any reason to think their preferences/biases should be favored over those of other wine drinkers?
Studies on the consistency in wine judging do not instill much confidence. Neal Hulkower is a mathematician, a wine lover and an expert on how to award medals at wine tastings. At Princeton, he told me a good approach to selecting judges was to have candidates taste six glasses of wine, with 3 of the 6 glasses holding the same wine. A candidate should be rejected if s/he found differences between the 3 glasses with the same wine.
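Hulkower’s screening idea is easy to operationalize. The sketch below is my own illustration of it, not his procedure: glass labels, scores, and the tolerance for “finding differences” are all assumptions.

```python
# Hypothetical screening check based on Hulkower's suggestion: pour six
# glasses, three of which hold the same wine, and reject any candidate
# who scores those three identical glasses noticeably differently.
# Glass labels, scores, and the tolerance are all assumptions.

def passes_screening(scores, identical_glasses, tolerance=1):
    """scores: dict mapping glass label -> candidate's score.
    identical_glasses: the three glasses that held the same wine.
    Accept only if those three scores fall within `tolerance` points."""
    same = [scores[g] for g in identical_glasses]
    return max(same) - min(same) <= tolerance

candidate = {"A": 14, "B": 17, "C": 14, "D": 12, "E": 15, "F": 14}
print(passes_screening(candidate, ["A", "C", "F"]))  # → True (14, 14, 14)
```

A candidate who rates the three identical glasses 10, 16, and 13 would fail this check – s/he has “found differences” where none exist.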
The results of wine tastings now follow a very similar pattern – maybe one or two wines better or worse than the others, but the rest are indistinguishable as reflected in the large differences in tasters’ scores. There are several possible reasons for these patterns.
First, when I was growing up, we really did have good and bad wines. In the latter category were Mateus Rosé, Lancers Sparkling Rosé, and Chianti in the straw bottle. For special occasions, we had the Bordeaux and Burgundies of France. Today, there are good wines from all over. And maybe all we are left with is individual preferences, as measured by the individual correlations presented in Table 5.
Second, it might be that the tasters are simply overwhelmed. At Paris and Princeton, the tasters were asked to judge 10 white and 10 red wines; at Princeton, it was done at a single sitting. At Lenox, there were only 5 wines, but I sensed the tasters would have been more comfortable with their judgments if there had been only 4.
Does this mean we will never again learn anything definitive from wine tastings? No. And that is what makes them interesting. I offer an example. For a number of years, the Boston Globe has asked 4 judges to taste 50 sub-$12 wines. In 2009, all 4 tasters chose the 2009 Saint-Peyre Coteaux du Languedoc Picpoul de Pinet. All 4! You just never know!