by Elliott Morss, Morss Global Finance
The Lenox Wine Club just had its 8th wine tasting dinner. The focus was on German whites – from the Mosel, Pfalz, and Rheingau regions. Two Mosels took the top spots. There were two other interesting results:
- In the seven previous tastings, a box wine has come in either first or second. Not this time.
- In recent tastings, the Club has both “scored” and “ranked” wines. This was the first tasting where scores and rankings differed.
There are three basic goals for our tasting dinners:
1. Taste wines similar enough to make comparisons meaningful;
2. See if price matters, and
3. Have a good time.
We taste 5 wines. In past tastings, we have “ranked” wines 1 to 5 with ties allowed. But rankings do not allow one to register differences in the intensity of likes and dislikes. Which should be used, and does it matter? As Neal Hulkower and Dom Cicchetti explained to me at the American Association of Wine Economists’ annual meeting in Stellenbosch last June, it can matter. So for our recent tastings, we have used scores, with tasters given the following scoring instructions: 60-70 = Poor/Unacceptable, 71-79 = Fair/Mediocre, 80-89 = Good/Above Average, 90-100 = Excellent. From scores, we can always derive rankings.
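The information lost in moving from scores to ranks can be seen in a short sketch. The wines and scores below are hypothetical, for illustration only; the point is that a 6-point gap and a 1-point gap both collapse into “one rank apart”:

```python
# Hypothetical scores for illustration only -- not the Club's actual data.
scores = {"A": 91, "B": 85, "C": 85, "D": 78, "E": 72}

# Order wines by score, highest first; tied scores share a rank
# (standard competition ranking: 1, 2, 2, 4, ...).
ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
ranks = {}
for wine, score in ordered:
    # A tied wine takes the rank of the first wine with that score.
    ranks[wine] = next(p for p, (_, s) in enumerate(ordered, start=1) if s == score)

print(ranks)  # {'A': 1, 'B': 2, 'C': 2, 'D': 4, 'E': 5}
```

Notice that A beats B by 6 points while D beats E by 6 as well, yet B and C, a tie, sit only one rank below A: the ranks record order but none of the intensity.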
The German Whites Tasted
1. J.J. Prum 2011 Estate Kabinett (Mosel)
J.J. Prum is a highly respected German vintner of long standing. This wine earned a 90 rating from Wine Spectator (WS).
2. Maximin Grunhaus 2011 Abtsberg Spätlese (Mosel)
Carl von Schubert, the vintner of this wine, is highly respected in Germany. The wine received a 92 rating from Wine Spectator. I tried to get his 2011 Auslese because it received an even higher 94 from WS. But sadly, I live in Massachusetts where I must buy from wholesalers and they did not have it.
3. Burklin Wolf 2009 Kabinett Trocken (Pfalz)
Burklin Wolf is also a well-known German vintner. Unlike the two wines above, this one comes from the Pfalz region of Germany. The wine did not get as high a WS rating as the first two: it got an 88, a rating WS describes as “very good”.
4. Dr. Nagler 2011 Rudesheimer Berg Kabinett (Rheingau)
Dr. Nagler’s wines are grown on the Rheingau slopes above the river. This is an outstanding Rheingau wine receiving a WS rating of 91.
5. Bota Box Riesling 2011
We had hoped to get a box wine from Germany. In fact, we had found a Wurtz 3L box from Rheinhessen in one of the wholesalers’ catalogues. However, the distributor did not have it in stock. So we selected a favorite box brand from our past wine tastings instead: the Bota Box. Unlike the Bota Box Malbec, which comes from Argentina, the Bota Box Riesling is from California.
Table 1. – Wine Summary
The results of our blind tasting are presented in Table 2. The Spätlese was the winner, followed by the Kabinett. For the first time ever in our tastings, the box wine did not score in the top two. This might reflect the fact that we had to substitute a California Riesling box lacking the body and sweetness of the big German wines. It is also notable that some of our drinkers do not like the full-bodied sweetness of the winners. You can see this very clearly in TB’s scores: he gave all the German wines low scores but scored both glasses of the Bota at 90.
Table 2. – German Whites – Blind Tasting Scores
Testing Tasters’ Competency
While there were 5 wines, we actually tasted one wine in two separate glasses. What is this all about? As I described in an earlier posting, Robert Hodgson has his own winery and has been troubled by the erratic ratings his wines received from judges at tastings. So he came up with a way to rate potential judges. The key to his method? Have the candidates do blind tastings that include more than one glass of the same wine. If a candidate does not score glasses of the same wine nearly the same, he is not competent to judge wines. Hodgson’s suggested overall scheme is quite rigorous: candidates must do four blind tastings of ten glasses each. We used his methodology in a less rigorous way, with just two glasses poured from the same bottle. We cannot be as sure this will single out incompetent tasters, but the results are “indicative”.
The Bota Box “spreads” are given in the right-hand column of Table 2. They are not all that bad, with 4 tasters scoring the two glasses of Bota Box the same.
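A minimal sketch of the duplicate-glass check is below. The tasters, scores, and the 4-point “inconsistency” threshold are all illustrative assumptions, not the Club’s actual Table 2 data or Hodgson’s published cutoff:

```python
# Hypothetical per-taster scores for the two glasses of the same wine
# (illustrative numbers, not the Club's actual Table 2).
glass_1 = {"EM": 92, "TB": 90, "JS": 85, "KL": 88}
glass_2 = {"EM": 92, "TB": 90, "JS": 78, "KL": 86}

# Hodgson-style check: a large spread on the identical wine flags inconsistency.
spreads = {taster: abs(glass_1[taster] - glass_2[taster]) for taster in glass_1}
for taster, spread in spreads.items():
    flag = "inconsistent" if spread > 4 else "ok"  # threshold is an assumption
    print(f"{taster}: spread {spread} ({flag})")
```

With two glasses only, a small spread is weak evidence of competence, which is why the results are merely “indicative”; Hodgson’s full scheme repeats the check across many glasses and sessions.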
Scores Versus Ratings
As mentioned above, there is a real question over whether you should score or rank wines. So we have started to do both to see if it matters. Until now, it had not. But in this tasting, there was a discrepancy, as shown in Table 3.
Table 3. – Total Scores and Rankings (Reversed)*
* To make scores and ranks easy to compare, the ranks are reversed
with the highest ranked glass getting a 6 and the lowest getting a 1.
Note that the rankings and scores for the highlighted Bota and Dr. Nagler differ. Why might this happen? There are two likely reasons. The first is differing intensities of likes and dislikes: the Bota was liked a lot more than the Dr. Nagler. If that is what is really happening here, good. But the differing results might also reflect tasters using different “starting points” on the scoring scale. Table 4 gives the average scores (starting points) of our tasters.
Table 4. – Average Scores of Tasters
I am “EM” and thought the heavy German whites were excellent. So I make no apology for my high average score. TV’s average was 72, just above the “Poor/Unacceptable” category. Were they really that bad?
What do we make of these findings, and how definitive are they? Findings are more definitive when the tasters agree. Perhaps the best statistic for measuring tasters’ agreement is Kendall’s Tau: a higher number indicates greater uniformity among tasters. The Tau for our tasting was only 0.122, suggesting very little agreement among tasters.
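For readers who want to reproduce an agreement statistic of this kind, here is a sketch of Kendall’s W, the coefficient of concordance for multiple rankers (the tau mentioned above is its two-ranker cousin; both move toward 1 as agreement improves). The rank matrix is hypothetical, not our actual Table 2 ranks, and the formula below ignores the correction for tied ranks:

```python
import numpy as np

# Rank matrix: one row per taster, one column per glass (1 = best).
# Hypothetical ranks for illustration, not the Club's actual data.
ranks = np.array([
    [1, 2, 3, 4, 5, 6],
    [3, 1, 2, 6, 4, 5],
    [6, 5, 1, 2, 3, 4],
    [2, 3, 4, 1, 6, 5],
])

m, n = ranks.shape                       # m tasters, n glasses
rank_sums = ranks.sum(axis=0)            # column totals across tasters
S = ((rank_sums - rank_sums.mean()) ** 2).sum()
W = 12 * S / (m ** 2 * (n ** 3 - n))     # Kendall's W: 0 = no agreement, 1 = perfect
print(round(W, 3))  # 0.293
```

A W (or tau) near 0.1, as in our tasting, means the column totals are barely more spread out than random shuffles would produce, which is exactly the “very little agreement” reading above.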
So are we to conclude that our scores and ratings are just hogwash? A quote from Gilda Radner is in order: “Never Mind”.