January 23, 2017
The Pitfalls of Averages of Averages
- There was a region named “Total (all TLAs)” that represented the total but was mixed in with all of the other regions.
- The data was an index, which is calculated as a weighted number for each region, month, and visitor type.
- When using indexes in the dataset, an average aggregation is appropriate only at the individual region, month, and visitor type level. You can’t use an average of those averages to represent the total, because it treats every index as if it carried the same weight.
I saw a few people make the mistake of using an average of averages very early on (I won’t shame them publicly), so I thought it would be appropriate to explain averages of averages, why they don’t calculate “accurately”, and how people should be using them.
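To make the problem concrete, here is a minimal sketch in Python. The region names and numbers are invented purely for illustration (they are not from the dataset), and the example assumes you have the underlying observations, which you typically don’t with a pre-calculated index. It compares the average of the group averages with the true overall average, then shows that weighting each group average by its number of observations recovers the true figure.

```python
# Hypothetical data, invented for illustration only (not from the dataset).
groups = {
    "Region A": [10, 20, 30, 40],  # 4 observations
    "Region B": [100],             # 1 observation
}


def mean(values):
    """Plain arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Average within each group.
group_means = {name: mean(values) for name, values in groups.items()}

# Naive approach: average the group averages, giving each group equal weight.
average_of_averages = mean(list(group_means.values()))

# True overall average: pool every observation, then average once.
all_values = [x for values in groups.values() for x in values]
overall_average = mean(all_values)

# Weighting each group average by its observation count gives the true average.
total_count = sum(len(values) for values in groups.values())
weighted_average = (
    sum(group_means[name] * len(groups[name]) for name in groups) / total_count
)

print(group_means)           # {'Region A': 25.0, 'Region B': 100.0}
print(average_of_averages)   # 62.5
print(overall_average)       # 40.0
print(weighted_average)      # 40.0
```

The naive figure (62.5) overstates the total because the single observation in Region B counts as much as the four in Region A. When the weights behind each average aren’t available, as with an index, the safer choice is to use a total that was calculated at the source rather than averaging the averages.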