Launch, grow, and unlock your career in data

July 20, 2010

Zero-Based Scale Violated

Today I was catching up on a webinar that Tableau hosted called "Bullet graphs for monitoring and analysis" by Stephen Few. The webinar was quite good, as you would expect from Stephen, but as he walked through a demo of how Tableau had to implement reference bands in order to offer bullet graphs, something struck me as odd: Stephen had violated the principle of the zero-based scale.

I just so happened to be re-reading the zero-based scale section in Stephen's book "Show Me the Numbers" yesterday and in the book Stephen says (p. 169):
    "You should generally avoid starting your graph with a value greater than zero, but when you need to provide a close look at small differences between large values, it is appropriate to do so. Make sure you alert your readers that the graph does not give an accurate visual representation of the values so that your readers can adjust their interpretation of the data accordingly."
There are several examples in the book that demonstrate both the effective use and abuse of this principle, but I was quite surprised that Stephen did it in his presentation. Now, I'm sure it was done simply for illustrative purposes, but should he have added a footnote to point this out? Tableau has a nice caption feature that would have worked well for this.

I was able to reproduce Stephen's graph using the Coffee Chain data source provided by Tableau, including the exact range for the banding that Stephen used. This time, however, I started with a zero-based scale (which is the default in Tableau).

The profits look much more constant in this example. In Stephen's example, the changes month to month are much more exaggerated.
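
To put a number on the exaggeration, here is a minimal sketch. The profit figures are made up (they are not the Coffee Chain data) and `drawn_ratio` is a hypothetical helper; it compares how tall two marks are drawn relative to each other when the axis starts at zero versus at a higher value.

```python
def drawn_ratio(larger, smaller, axis_start=0.0):
    """Return how many times taller the larger mark is drawn than the smaller one.

    Height on screen is proportional to (value - axis_start), so a
    non-zero axis_start changes the apparent ratio between two values.
    """
    if smaller <= axis_start:
        raise ValueError("axis_start must be below both values")
    return (larger - axis_start) / (smaller - axis_start)


# Hypothetical monthly profits, for illustration only.
june, july = 11_000, 10_000

zero_based = drawn_ratio(june, july)                   # axis starts at 0
truncated = drawn_ratio(june, july, axis_start=9_000)  # axis starts at 9,000

print(f"zero-based axis: June is drawn {zero_based:.2f}x as tall as July")
print(f"truncated axis:  June is drawn {truncated:.2f}x as tall as July")
```

With these assumed numbers, a 10% difference is drawn as a doubling once the axis starts at 9,000.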

In the end, I'm sure it was done to merely illustrate the feature, but beware of how you use this feature yourself.

July 16, 2010

Growth Rate vs. Cumulative Growth Rate

I was watching a presentation/webinar today and the author was reviewing growth rates across time periods. The data was very straightforward and quite easy to understand, yet the message was misleading.

My mind immediately went back to Stephen Few's critique of the way BP was praising its efforts collecting oil from the disaster they themselves caused. In this case, BP was using a cumulative bar chart, which intentionally gave viewers the false impression that containment efforts were improving. Stephen quickly pointed out this fault and presented the data as individual measurements so that you could see the real story.

Back to the session I was attending. A chart was displayed that showed growth rates across time, but as individual points. The growth rate was measured from the previous point, not from the beginning of time. This leads you to believe that a negative growth that is less negative than the last point is actually improving results, yet in fact, the situation is just getting worse.

Think of this as the inverse of the problem Stephen addressed with BP.

I took some data for automotive sales in the United States to demonstrate what I mean. The blue line represents the growth rate from one point to the next.

A good example to consider is October to December 2008. November experienced negative growth compared to October, but it's less negative than October, so the line goes up, giving you the impression the situation is improving. December is negative compared to November, but the line continues to go up because December is less negative than November.

I feel that a more proper way to tell the story is to use a cumulative line chart. The cumulative view is represented by the orange line. Consider the same time period. From a cumulative perspective, since November is negative compared to October, the line continues to decline. November's negative value has been added to October's negative value. The same situation continues in December.

Now, look at the different stories these lines tell. The blue line indicates you are only experiencing a 1% decline, yet the orange line says you've declined close to 50%.

Believe me, I know both lines are "correct." The point I'm trying to make is that you need to be sure to indicate the point you are measuring against. Very often sales figures are stated as "versus last year," but versus last year could mean many things.
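
The difference between the two reference points can be sketched in a few lines. The sales figures below are hypothetical (not the automotive data charted above), and the function names are made up for illustration. One assumption to flag: the cumulative view here measures each point against the first value in the series, which is close to, but not identical with, adding up the successive period-over-period rates.

```python
def growth_vs_previous(values):
    """Growth rate of each point relative to the point just before it."""
    return [(cur - prev) / prev for prev, cur in zip(values, values[1:])]


def growth_vs_first(values):
    """Growth rate of each point relative to the first point in the series."""
    base = values[0]
    return [(cur - base) / base for cur in values[1:]]


# Hypothetical monthly unit sales, for illustration only.
sales = [100, 80, 70, 65]

vs_previous = growth_vs_previous(sales)
vs_first = growth_vs_first(sales)

for month, (p, c) in enumerate(zip(vs_previous, vs_first), start=2):
    print(f"month {month}: vs previous {p:+.1%}, vs first {c:+.1%}")
```

In this made-up series, the "vs previous" rate gets less negative every month, so its line rises, while the "vs first" rate keeps falling. Both are correct; they just answer different questions.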

In any event, September 2009 was a terrible month for the industry.

July 12, 2010

World Cup of Doughnuts

Since I am an avid follower of the Guardian Datablog, I have also become a follower of the Guardian Datastore on flickr so that I can see how people interpret the data provided.

George Primentas of The Missing Graph blog is a consistent contributor to the Datastore, but recently I think he must have started going to Dunkin' Donuts. I've read four blog posts of his about the World Cup and all four of them have doughnut charts. I've been wracking my brain trying to understand why he keeps using them, but I guess it must be because they're cute...or he has a sweet tooth.

In a way I like this particular infographic. It's visually appealing and the level of detail on the background photo interests me as a photographer, but it's a poor visualization of the data. The distinction I'm making is between an infographic and a visualization.

Much like pie charts, doughnut charts are almost always better represented as bar or column charts. Here is his latest example from the post World Cup 2010: Representation of the Continents.

World Cup 2010: Representation of the Continents

In the blog post, there's a set of "interesting facts" presented, but really, George is just stating the facts; there isn't anything to glean from them. That's not George's fault; the data simply isn't interesting.

I've come up with two ways to more effectively present the data for quicker interpretation of the performance of the different continents/confederations (though they don't make the data any more interesting).

In this line chart, you can quickly see the trends. The chart is clean and it tells the story the data wants to tell. I can get away with a line chart because this is a time-series across sequential points in time.

I prefer this bar chart over the line chart. For me, I can more clearly differentiate the continents and it's easy to see how far each continent went in the tournament; the more bars, the farther along they went in the tournament.

In the end, keep in mind whether you want to tell a story with the data or present an aesthetically pleasing graphic. They each have their place, but care must be taken when combining the two.

July 9, 2010

What's wrong with "visual" spend analytics?

I've been spending time lately watching lots of videos and attending lots of webinars, with the idea of continuous improvement. Today I watched a recorded session on Tableau Software's website titled "Visual Spend Analytics."

While the content and topic were quite interesting, given I work in the consumer packaged goods industry, I was disappointed with several of the visualizations that the author presented. John used Tableau for his entire demonstration, primarily showing views that were built prior to the webinar. I could tell, given my experience with Tableau, that John had changed the reports/visualizations from what Tableau would have created automatically. In other words, some very basic best practices were broken.

Let's look at a few examples.

The first slide that caught me off guard, again given that the session was hosted by Tableau, was this pie chart.

While four slices isn't too awful, some of the "standard" design of the pie chart is.

  1. The largest slice should be first and it should start at the zero position.
  2. The legend is in an order that makes sense, but the slices don't match that order.
  3. The colors are way too strong.
  4. The third bullet on the left tells a good message, but if you want to see the true impact of that 97%, you need to start at the zero position.
If you insist on a pie chart, here's a better way to do it that addresses all of the issues I've outlined above. Still, encoding quantitative values as 2-D areas is not the most effective way to display them.

99 times out of 100, a bar chart is a better alternative to a pie chart. I would display this same set of data like this. I think this clearly demonstrates that there is high value placed on spend analysis.

A bit farther into the presentation, John was showing a table he built for a client. Let's examine it:

A few issues that I see include:

  1. The title of the graph is black with white font, making it difficult to read and it garners too much attention. A light gray background with a black regular font works better.
  2. The text in the table is blue. Why? What value does it add?
  3. A darker or double line to separate the rows from the total would make the total more distinct.
  4. Numbers (as well as their headers) that represent quantitative values should ALWAYS be aligned to the right. Aligning the data to the right allows for quicker comparisons of the values; it's much easier to find the bigger values.
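
The right-alignment rule is easy to see in plain text. The following is a minimal sketch with hypothetical spend figures and a made-up `format_table` helper; it left-aligns the labels and right-aligns the numbers and their header so that the digits line up by magnitude.

```python
def format_table(rows, label_header="Category", value_header="Spend"):
    """Render (label, number) rows: labels left-aligned, numbers right-aligned."""
    values = [f"{value:,.0f}" for _, value in rows]
    label_width = max([len(label_header)] + [len(label) for label, _ in rows])
    value_width = max([len(value_header)] + [len(v) for v in values])

    # The header follows the same alignment as the column it describes.
    lines = [f"{label_header:<{label_width}}  {value_header:>{value_width}}"]
    for (label, _), value in zip(rows, values):
        lines.append(f"{label:<{label_width}}  {value:>{value_width}}")
    return "\n".join(lines)


# Hypothetical spend figures, for illustration only.
rows = [("Packaging", 1250000), ("Freight", 98400), ("Marketing", 7315)]
table = format_table(rows)
print(table)
```

Because every number ends in the same column, the longest number is also the largest, which is what makes the bigger values easy to spot.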

Towards the end of the session, John showed how you can wrap all of the visualizations together in a dashboard. While this makes perfect sense, he wasn't careful enough about the design. Let's look at two examples.
First, let's look at this one titled "Dashboard - Sub-Category":

Again, it looks like John overrode the best practices that Tableau has built in, and doing so has detracted from the presentation of the data.

  1. The dashboard title is meaningless. Every dashboard title should be a statement/phrase that captures the attention of the reader and signals what they are looking at. An example might be: "Expected savings were not achieved in the most recent quarter"
  2. The background of the entire dashboard is gray, making it difficult to read the black font against it. A white background is nearly always preferred.
  3. As before, the chart titles are black with white font. Choose a light gray background and a regular black font.
  4. Six pie charts. Really? Six? Why not bar charts? The purpose appears to be to show rank, in which case a bar chart is preferred.
  5. Each set of two pie charts can be combined into one bar chart. The results would be three bar charts.
  6. The color shelves and quick filters just seem off for some reason. The placement strikes me as needing cleaning up. I'd have to work on this.
The last dashboard John demonstrated was a map of their office sites.

Neat, I guess, but I don't see any value in making it interactive. If this is a banner for a home page, then I guess the layout is ok, but if it's meant to give some useful information, then it needs some work:

  1. When someone views a dashboard, their eyes automatically go to the upper left corner of the chart. The first thing they see here is the company logo. This would be best placed on the upper right.
  2. The contact information should then be shown below the company logo, like he has it, but on the right.
  3. The quick filters should be on the right, whether this is a banner for a home page or not.
Every time I attend a webinar I learn something. While most of them are very well done, there are occasions where the presenter should have asked someone with a significant knowledge of visual data display to review the materials.

As part of the wrap up for the session, John said that the company was delivering "innovative analytics" that position them to be a "value added" vendor. Can they add value? Certainly, but they could add a lot more by following more best practices. Are the analytics "innovative?" I don't think so, but then again, I already work with this same type of information.

Atlanta Tableau User Group Meeting - July 29 @ 1PM ET

The next ATUG meeting will be July 29 @ 1PM ET

Who - All ATUG members and guests

What – The July in person hands on meeting

Where - Norfolk Southern building, 1200 Peachtree St NE, Atlanta, GA 30309 - Peachtree room


  1. Networking time
  2. Follow up on open questions from June meeting - Andy Kriebel - Coca Cola
  3. Hands on training with Bullet Graphs – John Hoover – Norfolk Southern
  4. Q and A session – have your questions handy or email them in advance
  5. Stump the chump (Baffle the BI Jedi) – all hands on deck
  6. Team project – tick tock, tick tock - show us what you’ve got

-- This will be a hands on session - Bring your laptop and Tableau with you --

July 6, 2010

Easy to see fireworks

ChartPorn has once again provided a visualization that is worth improving. The mess created this time is caused by the website Bad Firecracker.

Here are the charts to be critiqued:

I have a couple of big issues with these bar charts:
  1. The grid lines are dark, run both horizontally and vertically, and are incredibly distracting. I think the purpose of the charts is to show a trend, but with all of these grid lines, how are you supposed to see the trend?
  2. The data labels are unnecessary. Again, if you're trying to show an overall trend, the focus needs to be on the trend, not each individual point.
  3. Each chart has a color legend out to the right and it refers to the vertical axis. Why not properly label the axis in the first place? The color isn't needed unless you're showing all of the charts together in one dashboard.
Here's how I would do it:

  1. Show all of the data together so that comparisons can be made.
  2. Since all of the data is on one chart, use colors to differentiate the measures.
  3. For the purpose of comparing their charts to mine, I created bar charts. What you will notice though is that there are no distracting grid lines and no data labels.
  4. The axis titles clearly indicate what data is displayed
  5. Only show the year once, at the bottom of the chart.
  6. A data table is at the bottom so that if someone is interested in knowing the exact number, they can quickly look it up. This allows the data labels to be removed from the chart.
You'll also notice the line charts to the right of the bar charts. I prefer the line charts to the bar charts because they're easier to read and they use less ink to show the same data, which gives a higher data-ink ratio.

Follow these simple principles as you create your charts. Your readers will thank you. Keep it simple.

July 2, 2010

Implicit Comparison

The July 1st ChartPorn daily blog post linked to an interesting interactive graphic for economic indicators from the Wall Street Journal, which the author calls "Danger Signs."

The first graph that appears is called Summer Chills. There are two graphs, one for consumer confidence and one for yield on the 10-yr treasury. When I first looked at this, my impression was that the two charts were related. If you carefully read the caption, you can see that they aren't.

I see some issues between the graphs:
  1. The time frames are different. The CCI graph goes from 2007 through June 2010 and it's by month. However, the 10-yr treasury yield is for the current year and it's by day.
  2. The scale on the 10-yr treasury yield graph is not zero-based, which makes the differences between points look larger than they really are.
  3. The reference bands are a bit distracting.
  4. The level of precision on the mark on the 10-yr treasury yield graph is not necessary. Go with two decimals since that's how it's typically reported.
Some things I like:
  1. Highlighting the current period and adding a data label
  2. Chart headers are clear
  3. Fonts used
  4. Line colors stand out, grabbing your attention
Here's how I would represent the data:

The improvements I have made (for each problem noted above):
  1. The time frames are now consistent; they start at January 2007 and go through June 2010. They also are by month only.
  2. The scale on the 10-yr treasury yield graph is now zero-based.
  3. The reference bands have been muted and I've started them at the 2nd range, not at the bottom.
  4. The line for each measure is a different color, triggering you to notice they are distinct.
  5. Finally, I added a dual-axis line chart that shows the relationship between the measures.
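
Putting a daily series on the same monthly grain as a monthly one can be sketched as follows. The daily yields are hypothetical and `monthly_average` is a made-up helper. Averaging within each month is an assumption made for illustration; the post does not say which aggregation was used, and a month-end value would be an equally plausible choice.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_average(daily):
    """Collapse (date, value) pairs into one mean value per (year, month)."""
    buckets = defaultdict(list)
    for day, value in daily:
        buckets[(day.year, day.month)].append(value)
    # Sort the keys so the months come back in chronological order.
    return {month: mean(values) for month, values in sorted(buckets.items())}


# Hypothetical daily yields, for illustration only.
daily_yields = [
    (date(2010, 5, 3), 3.70),
    (date(2010, 5, 4), 3.60),
    (date(2010, 6, 1), 3.30),
    (date(2010, 6, 2), 3.10),
]

monthly = monthly_average(daily_yields)
for (year, month), value in monthly.items():
    print(f"{year}-{month:02d}: {value:.2f}")
```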
While the point the author is trying to represent is valid, properly created graphs are essential to tell the true story.