Comparing Coffee using Pattern Recognition | by Robert McKeon Aloe | Nov, 2020
Exploring coffee similarities using cupping grades and flavors
After trying a variety of coffees from around the world, I have often wondered whether coffee grades or flavors could tell me which coffees are similar to each other and which are different. I am particularly interested in whether such comparisons could help determine the best blends by improving my understanding of how coffees relate to each other. Luckily, I found some data to investigate just that question.
I extended one of the databases I had built from Sweet Maria’s. I had previously pulled their Q-scores, but they also list flavor ratings for each coffee, so I went back and pulled the flavor ratings for all the beans. The result is a slightly larger database than before, with 407 coffees.
This is the second of three articles using this dataset. The first focused on how cupping grades and flavors relate to one another, both within each group and across the two. The third will compare Sweet Maria’s cupping protocol to the SCA protocol using the CQI database.
Sweet Maria’s uses slightly different cupping criteria from the SCA criteria summarized below. It is interesting to see how sweetness, uniformity, and clean cup compare to the other data. On the SCA scale, these three metrics start out perfect and points are only deducted. Sweet Maria’s versions of these metrics give a bit more insight into the coffee.
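To make the SCA side concrete, here is a minimal sketch. It assumes the standard five-cup SCA form, where uniformity, clean cup, and sweetness each award 2 points per cup that passes, for a maximum of 10. The constant and function names are hypothetical and for illustration only.

```python
# Assumed SCA form layout: 5 cups per sample, 2 points per passing cup.
CUPS_PER_SAMPLE = 5
POINTS_PER_CUP = 2

def sca_cup_attribute_score(failed_cups: int) -> int:
    """SCA-style score for uniformity, clean cup, or sweetness.

    Starts from a perfect 10 (5 cups x 2 points) and deducts 2 points
    for each cup that fails the attribute.
    """
    if not 0 <= failed_cups <= CUPS_PER_SAMPLE:
        raise ValueError("failed_cups must be between 0 and 5")
    return POINTS_PER_CUP * (CUPS_PER_SAMPLE - failed_cups)

for failed in range(3):
    print(failed, sca_cup_attribute_score(failed))
```

Under this scheme, a coffee with no defective cups always scores 10 on all three metrics. That is why these three metrics can say little about a clean specialty lot on the SCA scale.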
In pattern recognition, a feature vector is used to compare two items, usually two signals or images. A score is computed between the vectors to determine how similar or dissimilar they are to one another.
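As a minimal sketch of the idea, each coffee's scores can be laid out in a fixed order. That way the same sub-metric always occupies the same position in every vector, which is what makes an element-by-element comparison meaningful. The sub-metric names and scores below are hypothetical toy values, not Sweet Maria's data or column names.

```python
import numpy as np

# Hypothetical sub-metric names, for illustration only.
SUB_METRICS = ["aroma", "flavor", "body", "acidity", "sweetness"]

def to_feature_vector(scores: dict[str, float]) -> np.ndarray:
    """Order a coffee's sub-metric scores into a fixed-length feature vector."""
    return np.array([scores[name] for name in SUB_METRICS], dtype=float)

# Toy scores for two made-up coffees; coffee_b's keys are deliberately
# listed in a different order to show that the vector order stays fixed.
coffee_a = {"aroma": 8.5, "flavor": 8.8, "body": 8.0, "acidity": 9.0, "sweetness": 8.2}
coffee_b = {"acidity": 8.0, "sweetness": 8.4, "aroma": 8.6, "flavor": 8.1, "body": 8.9}

vec_a = to_feature_vector(coffee_a)
vec_b = to_feature_vector(coffee_b)
print(vec_a)
print(vec_b)
```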
To compute a similarity score for two coffees, each coffee’s vector of sub-metrics was compared to every other coffee’s vector using root-mean-square (RMS):
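This assumes the comparison is the RMS of the element-wise differences between two coffees' sub-metric vectors a and b, each with n entries. Under that assumption, the score can be written as:

```latex
\mathrm{RMS}(\mathbf{a}, \mathbf{b}) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(a_i - b_i\right)^2}
```

Identical vectors then score 0, and larger values mean the two coffees are less similar.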
These scores were computed for every pair of coffees, as shown below, colorized by score and sorted by country:
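Here is a minimal NumPy sketch of the all-versus-all computation. It assumes the score is the RMS of element-wise differences and uses four made-up coffees with three sub-metrics each. The function name `rms_matrix` and all values are hypothetical. Broadcasting builds every pairwise difference at once, and sorting by country places coffees from the same origin in adjacent rows and columns.

```python
import numpy as np

def rms_matrix(features: np.ndarray) -> np.ndarray:
    """Pairwise RMS difference between every row (coffee) of `features`."""
    # Shape (n_coffees, n_coffees, n_metrics): every row minus every other row.
    diffs = features[:, None, :] - features[None, :, :]
    # Average the squared differences over the sub-metrics, then take the root.
    return np.sqrt(np.mean(diffs ** 2, axis=-1))

# Toy data: 4 made-up coffees x 3 sub-metrics, each with a country label.
countries = np.array(["Kenya", "Brazil", "Ethiopia", "Colombia"])
features = np.array([
    [9.0, 8.5, 8.0],
    [8.0, 8.0, 8.5],
    [9.0, 8.7, 8.0],
    [8.2, 8.1, 8.4],
])

# Sort coffees by country so same-country coffees sit next to each other.
order = np.argsort(countries, kind="stable")
sorted_countries = countries[order]
scores = rms_matrix(features[order])
print(sorted_countries)
print(np.round(scores, 3))
```

The resulting matrix is symmetric, with zeros on the diagonal because every coffee matches itself exactly. Passing it to a heat-map function such as matplotlib's `imshow` would produce a chart like the one described above.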
In the breakdowns below, however, I rescaled each graph to run from 0% to 100%. 100% doesn’t mean a perfect match, and 0% doesn’t mean no match. The scale is relative to the data in each chart: 100% is the maximum similarity (most similar) and 0% is the minimum similarity (least similar).
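The rescaling can be sketched as a per-chart min-max normalization. This assumes lower RMS means more similar, so the smallest value in a chart maps to 100% and the largest maps to 0%. The article does not spell out the exact transform, and `to_relative_similarity` is a hypothetical name.

```python
import numpy as np

def to_relative_similarity(rms_scores: np.ndarray) -> np.ndarray:
    """Rescale one chart's RMS scores to 0-100% relative similarity.

    Assumes lower RMS means more similar: the chart's smallest RMS maps
    to 100% and its largest maps to 0%.
    """
    lo, hi = rms_scores.min(), rms_scores.max()
    if hi == lo:
        # No spread to rescale: every entry is equally similar.
        return np.full(rms_scores.shape, 100.0)
    return 100.0 * (hi - rms_scores) / (hi - lo)

chart = np.array([[0.2, 0.5],
                  [0.5, 0.8]])
similarity = to_relative_similarity(chart)
print(similarity)
```

Because the scale is recomputed for every chart, the same percentage in two different charts need not correspond to the same RMS value.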
Let’s start by comparing regions to each other. Using flavor, African beans are not similar to any region, not even to other African beans. Using cupping grades, they are very similar to each other and to Central America but not to much else. There is also an interesting block in the cupping-grade chart spanning South America, Central America, and Indonesia.
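The article does not say how the coffee-level scores were rolled up into region-level charts. One plausible approach, sketched here as an assumption, is to average the RMS scores over every pair of coffees drawn from the two regions, skipping each coffee's comparison with itself. The function name and the toy matrix are hypothetical.

```python
import numpy as np

def region_similarity(scores: np.ndarray, regions: list[str]) -> tuple[list[str], np.ndarray]:
    """Average a coffee-by-coffee score matrix into a region-by-region matrix.

    Self-comparisons (the zero diagonal) are excluded, so a region does not
    look similar to itself just because each coffee matches itself.
    """
    labels = sorted(set(regions))
    region_arr = np.array(regions)
    off_diag = ~np.eye(len(regions), dtype=bool)
    averages = np.full((len(labels), len(labels)), np.nan)
    for i, r1 in enumerate(labels):
        for j, r2 in enumerate(labels):
            # Select pairs with one coffee from r1 and one from r2, minus self-pairs.
            mask = (region_arr == r1)[:, None] & (region_arr == r2)[None, :] & off_diag
            if mask.any():
                averages[i, j] = scores[mask].mean()
    return labels, averages

# Toy symmetric RMS matrix for four made-up coffees (not real data).
scores = np.array([
    [0.0, 0.7, 0.3, 0.5],
    [0.7, 0.0, 0.6, 0.2],
    [0.3, 0.6, 0.0, 0.1],
    [0.5, 0.2, 0.1, 0.0],
])
regions = ["Africa", "Africa", "Central America", "Central America"]
labels, averages = region_similarity(scores, regions)
print(labels)
print(np.round(averages, 3))
```

The resulting region-by-region matrix could then be rescaled to 0% to 100% per chart, as the article describes, and computed separately for cupping grades and for flavors. A region with only one coffee has no same-region pairs, so its diagonal entry stays NaN.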