As I have written, it was striking in 1976 when Californian wines “sort of” beat French wines in the Paris tastings. I say “sort of” because Orley Ashenfelter and Richard E. Quandt found that while Californian wines got the highest rankings in both the red and white categories, the judges found very little difference between the French and US reds and whites overall. The American Association of Wine Economists arranged a re-enactment of the Paris tasting in Princeton last summer, but there, French wines were compared to New Jersey wines. The New Jersey wines did quite well, but again, the judges’ rankings differed significantly.
The Lenox Wine Club
In November 2012, the Lenox Wine Club (LWC) was created. Consisting of 14 “veteran” wine drinkers, the club decided to start with four tastings: “heavy whites”, “heavy reds”, “light whites”, and “light reds”. Later, it decided to complement these with a fifth, for “heavy red blends”. All tastings address the following questions:
- Among comparably-priced wines, are the judgments of the tasters similar enough to identify a significant preference among the wines, and
- Does price matter?
LWC does its blind tastings at a restaurant, with hors d’oeuvres followed by dinner. Tasters are asked to score 5 wines, with 5 best and 1 worst. As reported earlier, 3-liter boxes have done well: Box wines got the best scores in the “Heavy Reds”, “Light Reds” (tied for best) and “Heavy Whites” tastings. In all three cases, the boxes (priced at $4/750 ml) beat out wines costing as much as $80+. In the “Heavy Red Blends” tasting, a Box also did well (tied for second place). But just as was the case at Paris and Princeton, the results were hardly definitive because of the scoring differences among judges.
The “Light White” Tasting
On April 18th, The Lenox Wine Club tasted “Light Whites”. They included:
- two Chenin Blancs: a Vouvray and a South African;
- a Pinot Grigio;
- a New Zealand Sauvignon Blanc, and
- a Picpoul.
The Chenin Blancs
We drank a 2011 De Morgenzon Chenin Blanc from Stellenbosch, South Africa. Efforts were made to get the 2010 vintage because it earned a 92 rating from Wine Spectator (WS). No luck. Why? Because as I have explained, restaurants in Massachusetts must buy all their wines from distributors, and the distributors used by our restaurant did not have the 2010 vintage. So we settled for the 2011 (WS rating 91).
Our second Chenin Blanc was a Guy Saget Marie de Beauregard Vouvray produced in the Touraine district of the Loire Valley. Again, we tried for the 2010 vintage (WS 92 rating) but had to settle for the 2011 vintage (WS 91 rating).
What would a Light White tasting be without a Sauvignon Blanc? New Zealand is known for its Sauvignon Blancs, and we tasted a Greywacke Wild Sauvignon Blanc. Greywacke was started by Kevin Judd, who was instrumental in the startup and success of Cloudy Bay, one of the best-known New Zealand wineries. On the vintage, exactly the same story: we tried to get the 2010 because of its WS 92 rating, could not, and had to settle for the 2011 vintage (no rating as yet).
The Pinot Gris grape is dominant in Pinot Grigio wines, the most popular of all Italian whites. We drank a 2011 Bota Box Pinot Grigio. LWC has already had a Bota Box winner: the Bota Box Cabernet Sauvignon won our Heavy Red tasting.
The Picpoul Blanc grape is grown in the Languedoc region of France. The Domaine St. Peyre Picpoul de Pinet won a large following in the Boston area after it “won” the 2010 tasting done by the Boston Globe. That tasting involved 4 judges tasting 50 wines: 25 white and 25 red. The judges then chose their top 5 reds and top 5 whites. The St. Peyre was in the top 5 for all four judges – a result very unlikely to occur by chance: the probability is less than 1%!
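The arithmetic behind that figure is easy to check: if each judge had simply picked 5 favorites at random from the 25 whites, the chance that one particular wine lands in all four top-5 lists is tiny. A quick back-of-the-envelope calculation (assuming the judges choose independently, which is of course a simplification):

```python
# Chance that one particular wine makes a single judge's top 5,
# if that judge picked 5 of the 25 whites at random:
p_one_judge = 5 / 25

# Chance of making all four judges' top-5 lists, assuming the
# judges choose independently (an illustrative assumption):
p_all_four = p_one_judge ** 4

print(f"{p_all_four:.4f}")  # prints 0.0016, i.e. 0.16% -- well under 1%
```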
Unfortunately, none of our restaurant’s distributors carried the St. Peyre, so we again drank a substitute: the 2011 Guillermarine Picpoul de Pinet.
A Test of the Tasters
Neal Hulkower, a friend and expert on wine scoring/rating/ranking methods, recently introduced me to the writings of Robert Hodgson. Hodgson has his own winery and has been troubled by what appeared to be erratic ratings his wines were receiving from judges at tastings. As a consequence, he has been analyzing judge performance at the California State Fair for over a decade. The key result: only about 10% of the judges are consistent in their ratings, and that 10% are not the same judges from year to year. He concludes that competition awards have a major random component. To correct this problem, Hodgson has come up with a method for screening judge candidates. The key to his method? Have the candidates do blind tastings that include more than one glass of the same wine in each tasting. If the candidates do not score glasses of the same wine nearly the same, they do not become judges. Hodgson’s suggested overall scheme is quite rigorous: candidates must do four blind tastings of ten glasses each. At each tasting, there are three glasses poured from the same bottle. And for a candidate to qualify as a judge, the scores given to the glasses of the same wine must be “close”.
If you think about it, Hodgson’s rationale is quite compelling: judging the same wine quite differently means you do not have the ability to distinguish between wines. And as a consequence, you should not be a judge.
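To make the idea concrete, here is a minimal sketch in Python of a Hodgson-style consistency check. The function name, the example flight layout, and the 1-point pass threshold are my own illustrative assumptions, not Hodgson’s exact protocol:

```python
# Hodgson-style screening sketch: in each blind flight, some glasses
# secretly hold the same wine. A candidate passes only if the spread
# of their scores on those duplicate glasses stays "close".
# The threshold of 1 point (on our 1-5 scale) is an assumption.

def passes_screening(duplicate_scores, max_spread=1):
    """duplicate_scores: one list per flight, holding the candidate's
    scores for the glasses that secretly contained the same wine."""
    return all(max(scores) - min(scores) <= max_spread
               for scores in duplicate_scores)

# Four flights with three duplicate glasses each, as in Hodgson's scheme:
consistent = [[4, 4, 3], [5, 5, 5], [2, 3, 2], [4, 3, 4]]
erratic    = [[5, 1, 3], [4, 4, 4], [2, 5, 2], [3, 3, 3]]

print(passes_screening(consistent))  # True  -- spreads of 0 or 1 throughout
print(passes_screening(erratic))     # False -- a spread of 4 in flight one
```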
But beyond being a method for screening judges for tasting contests, members of LWC thought it would be interesting to see whether their own ratings should be trusted. So for this latest tasting, they employed a radically “stripped-down” version of the Hodgson method: just one flight of five wines, plus an additional glass containing one of the five.
The results for the tasting are presented in Table 1.
The Picpoul and one of the glasses of the Bota Box got the highest scores. That means that in our five tastings this year, a Box has either won or tied for the best.
From a statistical standpoint, there was very little difference between the scores of the Picpoul, the Pinot Grigio, the Vouvray and the South African Chenin Blanc. Quite surprisingly, given the high Wine Spectator rating of its previous vintage, the Greywacke Sauvignon Blanc from New Zealand got the lowest score, and by a significant amount. And as has been usual in our tastings, there was a negative correlation between price and score.
Testing the Tasters
We tasted two glasses of the Bota Box Pinot Grigio, so I have listed separately the scores for each glass along with their average. The “Spread” is the absolute difference between each taster’s scores for the two glasses. Three tasters gave the two Bota Box glasses the same rating, and I (EM) scored them almost the same. In fact, all our “spreads” were low.
Table 2 shows how well our tasters’ scores correlated with one another as well as with the overall total. PP’s scores were closest to the average scores, while JR was the “rogue” for this tasting.
In Table 3, the correlation between each taster’s choices and the average scores is given for all the LWC tastings. A high positive number indicates a taster is close to the overall average. For example, KM’s correlation of 1.00 in the “Heavy Reds” tasting means KM’s scores were exactly in line with the overall average. Low or negative numbers indicate the opposite. If you look at the averages for individuals, it appears that 6 or 7 members had scores that corresponded at least to some extent to the overall average. The others? Not so much.
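For readers curious how a Table 3 entry is computed: each number is a Pearson correlation between one taster’s scores and the group averages for the same wines. A small sketch, using made-up scores rather than actual LWC data:

```python
# Pearson correlation between one taster's scores and the group
# averages. A value of 1.00 (like KM's in the Heavy Reds tasting)
# means the taster's scores line up exactly with the averages.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

taster  = [5, 4, 3, 2, 1]            # one taster's scores for 5 wines (invented)
average = [4.2, 3.8, 3.1, 2.5, 1.9]  # group averages for the same wines (invented)

print(round(pearson(taster, average), 2))  # close to 1.00: this taster tracks the group
```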
One other measure is worth mentioning. The Kendall W statistic indicates how much congruence there was among the ratings of the tasters. The Kendall W for the “Light Whites” was 0.159, up from only 0.022 for the “Heavy Red Blends”, but not nearly high enough to instill much confidence in the tasters’ overall ratings.
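For those curious about the mechanics, Kendall’s W is computed from the tasters’ rankings: W = 1 means complete agreement, while W near 0 means the rankings are essentially random. A small sketch with invented rankings (not LWC data), assuming each taster’s scores are converted to ranks with no ties:

```python
# Kendall's W (coefficient of concordance) for m tasters ranking n wines:
# W = 12*S / (m^2 * (n^3 - n)), where S is the sum of squared deviations
# of each wine's rank total from the mean rank total.

def kendalls_w(rankings):
    """rankings: one list of ranks (1..n, no ties) per taster."""
    m, n = len(rankings), len(rankings[0])
    totals = [sum(r[i] for r in rankings) for i in range(n)]  # rank sum per wine
    mean = sum(totals) / n
    s = sum((t - mean) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Three tasters ranking five wines:
agree    = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [2, 1, 3, 4, 5]]
disagree = [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1], [3, 1, 5, 2, 4]]

print(round(kendalls_w(agree), 3))     # 0.956 -- strong agreement
print(round(kendalls_w(disagree), 3))  # 0.111 -- little agreement
```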
Wine with Food
Since most wine is consumed with meals, it is odd that most wine tastings are done without food. In our Light Reds tasting, we asked tasters to indicate whether their scores changed after consuming the main course. We abandoned this approach for the Heavy Red Blends tasting because “it got in the way”. In the Light Whites tasting, tasters were not asked to hand in their scores until we were well into the main course.
Aside from the robust scores of the Box wines, our results reflect the common pattern of most tastings: the ratings/scores of the tasters are all over the map. As noted in my earlier reports, this could be either because the tasters could not pick up the taste differences among the wines or because the ratings/scores are dominated by the differing taste preferences of the judges. Hodgson’s work suggests it could also be because LWC members cannot really distinguish between wines. But by our “stripped-down” version of the Hodgson test of judge candidates (two of the six glasses tasted contained the same wine), our tasters did OK.