The Lenox Wine Club just had its 9th wine tasting dinner. It was special – a challenge dinner! As I detailed in an earlier piece, a leading restaurateur/caterer in my home town was quite skeptical of how well box wines have been doing in our tastings (5 first places, 2 seconds, and 1 third). So I threw down the gauntlet! I said: you choose the grape/varietal and a favorite wine of yours in that category. He chose Syrah and a Californian favorite of his. I chose the other 4 wines, one of which was a box – a Bota Box Shiraz. The Bota won. His selection, while coming in 2nd, effectively tied with 2 other wines. Score and price were not correlated: the most expensive wine came in last. And one other interesting note: the rankings and the scores differed significantly.
There are three basic goals for our tasting dinners:
- Taste wines similar enough to make comparisons meaningful;
- See if price matters, and
- Have a good time.
Scores and Rankings
We taste 5 wines. In past tastings, we ranked wines 1 to 5, with ties allowed. But rankings do not allow one to register differences in the intensities of likes and dislikes. Which should be used, and does it matter? As Neal Hulkower and Dom Cicchetti explained to me at the American Association of Wine Economists’ annual meeting in Stellenbosch last June, it can matter. So for our recent tastings, we used scores, with tasters given the following scoring instructions: 60-70 = Poor/Unacceptable, 71-79 = Fair/Mediocre, 80-89 = Good/Above Average, 90-100 = Excellent. From scores, we can derive rankings.
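Deriving ranks from scores is mechanical. Here is a minimal sketch (the scores are hypothetical, and it uses the "dense" ranking convention in which tied scores share a rank, as our rules allow):

```python
def scores_to_ranks(scores):
    """Map each wine's score to its rank (1 = highest score), ties sharing a rank."""
    ordered = sorted(set(scores.values()), reverse=True)
    rank_of = {s: i + 1 for i, s in enumerate(ordered)}
    return {wine: rank_of[s] for wine, s in scores.items()}

# Hypothetical scores for five wines; A and C tie.
scores = {"A": 88, "B": 91, "C": 88, "D": 75, "E": 83}
print(scores_to_ranks(scores))  # {'A': 2, 'B': 1, 'C': 2, 'D': 4, 'E': 3}
```

Note that the reverse is not possible: ranks alone cannot recover how much more a taster liked one wine than another, which is exactly the information scores add.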
How Good Are The Tasters?
While there were 5 wines, we actually tasted one wine in two separate glasses. What is this all about? As I have described in an earlier posting, Robert Hodgson has his own winery and has been troubled by the erratic ratings his wines have received from judges at tastings. So he came up with a way to rate potential judges. The key to his method? Have the candidates do blind tastings that include more than one glass of the same wine. If a candidate does not score glasses of the same wine nearly the same, he or she is not competent to judge wines. Hodgson’s suggested overall scheme is quite rigorous: candidates must do four blind tastings of ten glasses each. We used his methodology in a less rigorous way: just two glasses poured from the same container. We can’t be as sure this will single out incompetent tasters, but the results are “indicative”.
1. Heartland Shiraz Langhorne Creek Directors’ Cut, 2010
With a 93 Wine Spectator (WS) rating, it should be a great Australian challenger to the Ojai.
2. Lapostolle Syrah Colchagua Valley Cuvée Alexandre Apalta Vineyard, 2009
This Chilean selection got 91 from WS.
3. Lindemans Shiraz/Cabernet, 2013
I included a Shiraz-Cabernet blend because I have found this blend normally does quite well against wines where Shiraz is the dominant grape. This Australian selection was not rated by WS.
4. Ojai Shiraz, 2010
This Californian wine was not rated by WS for this year. However, it received ratings of 88 and 94 in 2009 and 2008, respectively.
5. Bota Box Shiraz, 2011
Like the Ojai, this is an American Shiraz.
The results of our blind tasting are presented in Table 1 with the wine with the highest score on the left. One of the two Botas won, followed at some distance by the Ojai and Lindemans. The most expensive wine got the lowest score.
Scores Versus Ratings
As mentioned above, there is a real question about whether you should score or rank wines. So we have started to do both to see if it matters. To date, it had not. But in this tasting, it really did. The Heartland, which came in last when the wines were scored, came in 3rd when the wines were ranked. And the Bota (2) that tied for last when scored was a distant last when the wines were ranked (Table 2).
Table 2. – Total Scores and Rankings (Reversed)**

** To make scores and ranks easy to compare, the ranks are reversed, with the highest-ranked glass getting a 6 and the lowest a 1.
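The reversal itself is simple arithmetic: with six glasses (five wines plus the duplicate), a reversed rank is just 7 minus the rank, so that bigger numbers mean better, just as with scores. A minimal sketch:

```python
N_GLASSES = 6  # five wines plus the duplicate Bota glass

def reverse_rank(rank, n=N_GLASSES):
    """Flip a rank so the best glass (rank 1) gets the most points (n)."""
    return n + 1 - rank

print([reverse_rank(r) for r in range(1, 7)])  # [6, 5, 4, 3, 2, 1]
```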
What explains these differences? Two primary possibilities:
- Differing intensities of likes and dislikes: the Heartland was really disliked; or
- Tasters used random “starting points”.

Table 3 gives the average scores (starting points) of our tasters.
Table 3. – Average Scores of Tasters
Note AV’s high average score. Is this because AV really enjoys the wines more than the other tasters do? Or is it somewhat random – might he just as well have scored around a mean of 80? And consider JR: an average of 72.5 implies that, on average, he found the wines only “fair/mediocre”. AV gave the Heartland his lowest score, so it is not surprising that the Heartland’s position improves when we switch to ranks.
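One common way to neutralize differing starting points (not something we did at the dinner, just an illustration) is to center each taster's scores on that taster's own average before comparing. With hypothetical scores, two tasters with identical preferences but very different baselines become directly comparable:

```python
def center(scores):
    """Subtract the taster's own average so different baselines cancel out."""
    mean = sum(scores) / len(scores)
    return [round(s - mean, 1) for s in scores]

av = [95, 92, 90, 97, 81]  # hypothetical high-baseline taster (mean 91)
jr = [75, 72, 70, 77, 61]  # same preferences, baseline 20 points lower (mean 71)
print(center(av))  # [4.0, 1.0, -1.0, 6.0, -10.0]
print(center(jr))  # [4.0, 1.0, -1.0, 6.0, -10.0] -- identical after centering
```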
Testing Tasters’ Competency
The Bota Box “spreads” are given in the right-hand column of Table 1. They are not impressive: 7 of the 11 tasters gave the two Bota glasses scores differing by 5 or more points. TB actually scored one glass 85 and the other, poured from the same box, 65.
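The spread check is easy to express in code. A sketch with hypothetical scores (TB's pair matches the text; the other tasters and the 5-point cutoff are illustrative assumptions, not Hodgson's exact criterion):

```python
THRESHOLD = 5  # points; our informal cutoff for "inconsistent"

def spread(glass1, glass2):
    """Absolute difference between the two scores given to the duplicate wine."""
    return abs(glass1 - glass2)

# TB's scores are from the text; the other pairs are hypothetical.
pairs = {"TB": (85, 65), "Taster2": (90, 88), "Taster3": (72, 78)}

for taster, (g1, g2) in pairs.items():
    verdict = "inconsistent" if spread(g1, g2) >= THRESHOLD else "consistent"
    print(f"{taster}: spread {spread(g1, g2)} -> {verdict}")
```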
What do we make of these findings, and how definitive are they? Findings are more definitive when the tasters agree. Perhaps the best statistic for measuring tasters’ agreement is Kendall’s Tau: a higher number indicates greater uniformity among tasters. The Tau for our tasting was only 0.064, suggesting virtually no agreement among the tasters.
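For the statistically curious: when there are many rankers rather than two, the concordance statistic usually computed is Kendall's W (the coefficient of concordance, closely related to the pairwise Tau cited above). W runs from 0 (no agreement) to 1 (perfect agreement). A minimal sketch with hypothetical rankings, assuming no ties:

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance for a list of rank lists
    (one list per taster, no ties): W = 12*S / (m^2 * (n^3 - n))."""
    m = len(rankings)      # number of tasters
    n = len(rankings[0])   # number of glasses
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    mean = sum(totals) / n
    s = sum((t - mean) ** 2 for t in totals)  # spread of the rank totals
    return 12 * s / (m ** 2 * (n ** 3 - n))

perfect = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]   # everyone agrees
scattered = [[1, 2, 3, 4], [4, 3, 2, 1], [2, 1, 4, 3]]  # everyone disagrees
print(round(kendalls_w(perfect), 3))    # 1.0
print(round(kendalls_w(scattered), 3))  # 0.111
```

A value near zero, as in the scattered example, means the rank totals are almost flat across the glasses, i.e., the tasters effectively cancel each other out.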
So are we to conclude that our scores and ratings are just hogwash? A quote from Gilda Radner is in order: “Never Mind”.