Coffee Data Science
Coffee Roast Variables Compared
Using a larger set of data from the past year to learn
I have been roasting coffee for 10 years, and this past year I focused more and more on collecting data on my roasts. From that data, I curated a dataset to understand how different roast metrics relate to one another.
Even though I had 400+ roasts this year, I reduced my dataset based on these criteria:
- Weight-loss > 83%
- End Bean Temperature < 225 °C
Outside of those parameters, I didn’t have enough roasts for reliable data. I also excluded roasts with missing data. The result is 158 roasts.
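As a rough sketch of this kind of filtering, the snippet below applies the two criteria and drops records with missing values. The records, field names (`weight_loss_pct`, `end_temp_c`, `density`), and values are hypothetical placeholders, not my actual data, and the thresholds are copied as written in the list above.

```python
# Hypothetical roast records; field names and values are illustrative only.
roasts = [
    {"id": 1, "weight_loss_pct": 86.5, "end_temp_c": 212.0, "density": 0.41},
    {"id": 2, "weight_loss_pct": 82.0, "end_temp_c": 218.0, "density": 0.39},  # fails the weight criterion
    {"id": 3, "weight_loss_pct": 87.1, "end_temp_c": 228.0, "density": 0.40},  # fails the temperature criterion
    {"id": 4, "weight_loss_pct": 88.0, "end_temp_c": 209.0, "density": None},  # missing a measurement
]

# Fields that must be present for a roast to be usable (an assumed subset).
REQUIRED = ("weight_loss_pct", "end_temp_c", "density")


def keep(roast):
    """Return True if the roast has every required field and meets both criteria."""
    if any(roast.get(field) is None for field in REQUIRED):
        return False
    return roast["weight_loss_pct"] > 83 and roast["end_temp_c"] < 225


filtered = [r for r in roasts if keep(r)]
kept_ids = [r["id"] for r in filtered]
print(kept_ids)
```

Only the first hypothetical record passes all three checks here.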
Data Types
Of the data collected for each roast, I reduced the analysis to a few key variables, listed in order of importance:
- Roast Level or roast color
- End Temperature of the Bean
- Weight Loss between the green beans and the roasted beans
- Density of the beans
- Moisture of the beans
- Gap Rate (as defined by the Omix)
- Water Activity
I measured the bean metrics (Color, Density, Moisture, Gap Rate, and Water Activity) using the DiFluid Omix. End temperature was from the bean probe in the Roest. Weight loss was measured using a scientific scale (+/- 0.001g).
Metrics
Correlation is a metric between -1 and 1. I plot it as a percent because that is easier for me to read:
- -1 (-100%) means the two variables are inversely correlated
- 1 (100%) means the two variables are correlated
- 0 means the two variables are not correlated
Correlation does not necessarily imply causation.
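For readers who want to see the metric computed, here is a minimal sketch. It assumes the Pearson coefficient, which is the most common choice but is an assumption here; the sample numbers are made up for illustration, and `pearson` and `as_percent` are hypothetical helper names.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


def as_percent(r):
    """Express a correlation in [-1, 1] as a whole-number percent."""
    return round(100 * r)


# Made-up values: a perfectly linear relationship and its reverse.
end_temps = [205.0, 210.0, 215.0, 220.0]
weights = [10.0, 20.0, 30.0, 40.0]

print(as_percent(pearson(end_temps, weights)))        # perfectly correlated
print(as_percent(pearson(end_temps, weights[::-1])))  # perfectly inversely correlated
```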
Analysis
I started by correlating the metrics with each other to understand how well they are connected. I highlighted anything that showed more than a near-zero correlation:
- Orange is some correlation (>40%).
- Yellow is correlation (>50%).
- Green is good correlation (>70%).
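The color coding above can be sketched as a small lookup. This assumes the thresholds apply to the absolute value of the correlation, so that strong inverse correlations are highlighted too; that is an assumption, and `highlight` is a hypothetical helper name.

```python
def highlight(corr_pct):
    """Map a correlation (in percent) to a highlight color, or None if unhighlighted."""
    strength = abs(corr_pct)  # assumption: inverse correlations are highlighted as well
    if strength > 70:
        return "green"
    if strength > 50:
        return "yellow"
    if strength > 40:
        return "orange"
    return None


for value in (85, -55, 45, 10):
    print(value, highlight(value))
```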
For taste, roast level or color seems to be the closest proxy, but I haven’t studied that as much.
Roast Level, End Temp, and Weight Loss are all strongly correlated, which is very interesting. Water Activity and Gap Rate don’t seem to be strongly correlated with anything.
When looking specifically at End Temperature, I split the beans by region of origin and found more nuance in how things were correlated. For Oceania, temperature didn’t correlate with weight loss but had a much higher correlation with color. Central America also bucked the trend for bean density.
I graphed some of this data to get a better idea of how it looks. I added a few data points with robusta roasts just to see where they would fall.
The color trend with End Bean Temperature was interesting as well. Even though it was a little noisy, the trend was clearer for some bean origins than others.
Weight loss also has a fun trend, but it doesn’t look very linear, and I see a lot of outliers.
Bean density seemed to have two trends going on at the same time.
These variables have been fun to track, and my aim is to find interesting data. By analyzing my data, I can see where to cut back on the data I am collecting.
This study has led me to stop measuring Water Activity and Gap Rate because neither variable seemed useful for roasted coffee.