Launch, grow, and unlock your career in data

November 25, 2010

STDs in the USA: Who should you avoid and where?


The CDC publishes an annual report on Health in the United States and included in the report is a “Chartbook”.  It’s 574 pages long, but you can skip to page 32 for the start of the charts.  There are some quite horrendous charts, especially the pie charts, that you will get a kick out of. 

You can download the data on CDC Wonder.  Once you create your query, you get a spreadsheet of the results, a map, and a bar chart.  The bar chart is particularly poor and only allows you to pick two dimensions.

I have downloaded the data and produced an interactive dashboard via Tableau Public.  Within this dashboard you can filter by Gender, Age, State and Disease.  In the end, I have included all of the views from CDC Wonder, plus much more. 

Some observations:

  • The infection rate for the total US has continued to climb for all diseases combined.  This is largely due to Chlamydia.
  • Syphilis infection rates declined from 1996-2001, but have climbed steadily since.  Particularly concerning is the rate in Washington, DC.
  • In fact, Washington, DC has the highest infection rate for all three diseases.
  • Alaska’s overall infection rate is twice the national average, with the Chlamydia rate 86% higher than the national average.  This is definitely worth looking into.
  • The overall infection rate for females is more than double that for males.
  • Females between the ages of 15-24 are most likely to become infected, while males are most likely between the ages of 20-24.

There are many more observations and insights to be gleaned from this dashboard.  It is considerably quicker to identify outliers and trends with a simple dashboard like this than with CDC Wonder.  Imagine how much more useful the “Chartbook” would be if the CDC used Tableau.

What other observations can you make?

November 24, 2010

Failed Banks in the US: Popping the Bubbles


Simon Rogers from the Guardian created a visualization of failed banks in the US using Many Eyes.  The article can be found here.  Before I critique the visualization, take a minute to interact with the bubbles.

A quick word about data integrity.  The article on the Guardian references the data going back to 1935, and in fact, the data compiled does go back to 1935.  I made the assumption that the viz created by Simon went back that far as well, but then I couldn’t find a year filter.  If you look at the data that built the bubble chart, it only covers 2008-2010 and all three years are combined.  But 2010 isn’t even a complete year.  Come on already!  This can be extremely misleading and should be clearly noted, but it’s not! 

The initial view above is assets in failed banks as dollars per person. 

  1. The huge bubble for Nevada surely stands out, but why not a simple bar chart? 
  2. Notice that Total is listed as a State; that doesn’t make any sense.
  3. Which state is #2?  How about #5?  It takes some work.
  4. Do we really care about all 50 States or maybe just the top 10? 
  5. How much bigger is Nevada than the #2 state?

It’s so much easier to compare the size of the bars than the size of the bubbles.  From the bar chart, you can easily see the rank and the relative size of each bar.  It turns out that Nevada is 20x larger than Alabama.  There’s absolutely no way you can identify that in the bubble chart.
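To put a number on why the bubbles hide that 20x gap, here is a minimal sketch. It assumes the bubbles are sized by area, which is the common convention but is not confirmed for this particular Many Eyes chart; the function names are made up for illustration.

```python
import math


def bubble_diameter_ratio(value_ratio: float) -> float:
    """How many times wider the bigger bubble is, ASSUMING bubble area
    is proportional to the value (an assumption, not confirmed for the chart)."""
    return math.sqrt(value_ratio)


def bar_length_ratio(value_ratio: float) -> float:
    """How many times longer the bigger bar is; bar length maps directly to the value."""
    return value_ratio


nevada_vs_alabama = 20.0  # the 20x figure quoted above

print(f"bar is {bar_length_ratio(nevada_vs_alabama):.2f}x longer")
print(f"bubble is only {bubble_diameter_ratio(nevada_vs_alabama):.2f}x wider")
```

Under that assumption, a value 20 times larger produces a bubble only about 4.5 times wider, so the eye has to judge area rather than length to recover the real difference.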

Change the Bubbles Size option to Number of failed banks.  Holy smokes!  What state is collapsing?  Oh, it’s not a state; it’s that pesky Total again.  The Total completely distorts the view and makes all other comparisons impossible.  Again, a simple bar chart will suffice.
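If you are preparing the data yourself, dropping the Total row before charting is a one-liner. The sketch below is only an illustration: the state names and counts are made-up placeholders, not the FDIC figures, and `top_states` is a hypothetical helper.

```python
def top_states(rows, n=10, exclude=("Total",)):
    """Drop aggregate rows such as 'Total', then return the n largest
    (name, value) pairs, largest first."""
    states = [(name, value) for name, value in rows if name not in exclude]
    return sorted(states, key=lambda row: row[1], reverse=True)[:n]


# Placeholder data for illustration only (not real bank-failure counts).
rows = [("Total", 300), ("State A", 40), ("State B", 55), ("State C", 12)]

for name, value in top_states(rows, n=2):
    print(name, value)
```

The same filter also answers the "top 10 versus all 50 states" question: set `n` to however many bars you actually want to compare.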

Finally, since there are two data points that are highlighted by the article (assets per person and number of failed banks), a scatter plot provides one of the best means of seeing the relationship between the two.  In this view, you immediately see the five outliers I have labeled below.

Scroll through the other filters and you continue to see that including Total as a State completely wrecks any insight that could be gleaned from the viz.

The viz below was built with Tableau Public and it includes the data all the way back to 1935.  However, I decided to only focus on the last 20 years; this time period represents the most volatility since the Great Depression. 

NOTE: 2005 and 2006 are not included since there were not any bank failures listed on the FDIC website for those years.  I also excluded 2010 since the year is not complete.

There are three visualizations included.  The line chart (and size) represents the number of bank failures.  The color indicates the estimated loss (adjusted to the value of the dollar as of 31 Dec 2009).  When you choose a Year, you will get the corresponding map and bar charts.  The map and bar chart are sized and colored in the same manner as the line chart.

Naturally, I went straight to 1989.  Texas had 224 bank failures!  Then I went to the surrounding years and Texas was at the top of the list again.  It turns out that there was a banking collapse in Texas in the middle 1980s to early 1990s. 

According to the Dallas Morning News: “In the state's 1980s collapse, an energy bust and a subsequent real-estate wreck leveled hundreds of Texas banks, including longtime pillars of the economy.”

Sound familiar?

November 19, 2010

The Sally Field problem

via Seth's Blog by Seth Godin on 11/17/10
It doesn't really matter if we like you.

It matters if we like your work.

[Surprisingly, the converse of this rule also works].

Sometimes it seems as though people who are really concerned about one would be better off focusing on the other.


Great advice for a consultant, don't you think?

November 16, 2010

A more effective display of’s hourly forecast


My daughter had a soccer game last weekend across town early in the morning and the weather was predicted to be quite cold.  Naturally I went to to check the hourly forecast, but this time something struck me.


Notice the vertical scale.  It’s not zero-based.  Sure, it’s simply showing the changes in temperature, but as I scrolled through the pages, the axis values changed; that is, the range did not stay consistent.  I also noticed that 12am is repeated, which is kind of odd.  Fusion Charts is their tool of choice.

I would have used Tableau to create a simpler chart.  Unfortunately I lose the nice pictures across the top of each hour, which I really like, and the gentle shading (though why use gold for night hours…doesn’t gold mean sunny?), but I gain a zero-based scale and a line that I can color based on temperature, with the mid-point at 32 degrees.  Below 32 = red, above 32 = green.
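The two rules described above (a fixed, zero-based axis and a red/green split at 32 degrees) are simple enough to sketch outside Tableau. This is not how the Tableau view was built; it is a rough illustration that assumes the temperatures are in Fahrenheit and that exactly 32 counts as green, and the hourly readings are made up.

```python
FREEZING_F = 32  # midpoint from the text; Fahrenheit is assumed


def line_color(temp_f: float) -> str:
    """Red below the midpoint, green otherwise.
    Treating exactly 32 as green is an assumption."""
    return "red" if temp_f < FREEZING_F else "green"


def zero_based_axis(temps):
    """Axis range that always includes zero, so every page of the
    forecast is drawn on a comparable scale."""
    return (min(0, min(temps)), max(0, max(temps)))


hourly = [28, 30, 36, 45, 39]  # made-up readings for illustration

print(zero_based_axis(hourly))
print([line_color(t) for t in hourly])
```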


In this view the variances in the temperatures are even easier to see.  You can see the huge change from 6am to 3pm and then the dramatic drop as sunset approaches.  Which view works best for you?

Coming to Atlanta - Tableau 6.0 Tour

The Tableau 6.0 tour stops in Atlanta this Thursday, November 18th. Be sure to let me know if you'll be there; it'd be nice to meet some of you and chat data viz.

+ 2:00 – Registration
+ 2:30 – The End of BI as You Know it
+ 3:00 – Tableau 6.0: Speed, Power and Style
+ 4:00 – Wrap-up
+ 4:30 – Reception: Cocktails, Networking and Hands-on Demos
Registration is FREE!! Click on the image below to learn more.

November 13, 2010

If the glove doesn’t fit, you must acquit!


The November Atlanta Tableau User Group (ATUG) meeting included over 30 people from industries including transportation, social media, consumer packaged goods, and data visualization consulting, just to name a few.  Over half the group had downloaded Tableau 6.0 the same day as the meeting. 

Our last three user group meetings have all included hands-on exercises and this time we challenged the groups to come up with a dashboard within 30 minutes based on 50 years of crime data.  That might seem like a short amount of time, but that’s the point.  We want the membership to realize the power of Tableau to let you gain rapid-fire insights.  Almost half the group was new users, so having them work with Tableau and putting the power in their hands is the best way to sell the product.

We formed three teams of five (yes, I know that doesn’t add up to 30; there were people that left and others that hovered) and told the teams that the best viz would win a prize…t-shirts donated by Tableau.

First place went to Team 2 (as voted by their peers).

Team 1 came in a close second place with their dashboard that contains action filters on each sheet.


Well done to each team.  I’m looking forward to our next meeting on January 20th.  Remember to bring a friend.

November 12, 2010

Is it a cherry pie?

If you've been a follower of this blog for a while, you are well aware of my dislike of pie charts. I criticized a pie chart from a poll conducted by Digital Photography School in a blog post back in April. Today they published the results from another poll, but this time, I'm not too unhappy with their use of a pie chart.

In this case, their use of the pie chart is somewhat acceptable because:
  1. There are only three data points.
  2. The chart starts at the zero position.
  3. The largest slice is first (though it would be better if they were in descending order all the way around).
  4. The results can be easily discerned.
It truly hurts me to say a pie chart doesn't irk me, but I'll let this one slide, mainly because it's my favorite photography website.

November 6, 2010

What is a reverse time-series line chart with a non-zero axis?

I never knew such a chart existed, but alas I found one, and I hope it becomes extinct! Occasionally I scan through Many Eyes visualizations for ideas and/or blog inspiration. Let's review this simple line chart of Estimated Median Age at First Marriage. Click on the image below to get started.

When I first saw this I thought "Wow! What a huge variance over the years!" But then I looked a bit closer and saw that:
  1. The years are backwards. A time-series line chart should nearly always start with the oldest time period on the left. I can't even think of a way to interpret time backwards. Maybe the DeLorean from Back to the Future could help.

  2. The Y axis does not start at zero. This creates a misleading variance. It appears there has been a 700% variance from highest to lowest, but really it's only 35%.

  3. The Y axis should be rounded to a whole number; this is unnecessary precision.

  4. I find myself having to refer back to the legend to remind myself which sex is represented by which color. They are way too close in hue. Why not use blue for men and pink for women?

  5. The Years on the X axis are at an angle and squished together. If you must show all of the years, then turn them a full 90 degrees. In the end though, I believe the purpose of the chart is to show a trend, so I don't need to see all of the years, just enough so that I know it's a regular interval.

  6. One more thing. It's very subtle. This is NOT a regular interval after all. Between 1890 and 1940, there is only one measure per decade. Only beginning in 1947 is there data for every year. I would only display 1947-2003.
To address all of these problems, the chart could have been created like this.
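As an aside on point 2, the distortion from a truncated axis is easy to quantify. The numbers below are illustrative assumptions chosen to reproduce the 700% and 35% figures quoted above (an axis starting at 19, a low of 20 and a high of 27); they are not read from the chart's underlying data, and `pct_difference` is a hypothetical helper.

```python
def pct_difference(high: float, low: float, baseline: float = 0.0) -> float:
    """Percent by which `high` appears to exceed `low` when both are
    drawn as heights measured from `baseline`."""
    return 100 * (high - low) / (low - baseline)


# Illustrative values only (assumptions, not the actual series).
low, high, axis_start = 20.0, 27.0, 19.0

print(pct_difference(high, low))                       # measured from zero
print(pct_difference(high, low, baseline=axis_start))  # measured from the truncated axis
```

With these assumed values the true gap is 35%, but measured from an axis that starts at 19 the taller point looks 700% higher, which is the kind of exaggeration a zero-based axis avoids.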

November 5, 2010

Voter Motivation Made Simple

Possibly the best, simplest summary of voter intentions ever made.

Via the Indexed blog.

November 3, 2010

REMINDER: Register for the November Atlanta Tableau User Group Meeting


The next ATUG meeting will be November 30 from 1-4PM ET

Who - All ATUG members and guests

What – The November in-person, hands-on meeting

Where - Norfolk Southern building located at 1200 Peachtree St NE, Atlanta, GA 30309 - Peachtree room


  1. The Greatest Show on Earth - Tableau 6.0
  2. Team project – If the glove doesn't fit, you must acquit!
  3. Open discussion – 2011 plans

-- This will be a hands-on session - Bring your laptop and Tableau with you --


November 2, 2010

Transparency International: Corruption Perceptions Index

On October 26, 2010, the Guardian published the latest Corruption Perceptions Index from Transparency International, which is the world's most credible source for measuring corruption.

According to Transparency International:
    The 2010 Corruption Perceptions Index shows that nearly three quarters of the 178 countries in the index score below five, on a scale from 10 (highly clean) to 0 (highly corrupt). These results indicate a serious corruption problem.
Download the data here.

To summarize the 2010 results:
  • Denmark, New Zealand and Singapore are tied at the top of the list with a score of 9.3, followed closely by Finland and Sweden at 9.2.
  • The most corrupt country is Somalia with a score of 1.1. Only slightly less corrupt are Myanmar and Afghanistan, with a score of 1.4, and Iraq at 1.5.
View the map produced by Transparency International here. While their version only contains data for 2010, my version of the map allows you to filter by continent, country or year.

Immediately obvious to me are that:
  • The rankings haven't changed much over the past three years.
  • You should avoid nearly all of Africa and Asia.
  • Western Europe, particularly the Scandinavian countries, is relatively devoid of corruption.
I suspect that the level of corruption could be related to poverty levels, but would need to prove it with the data.

If you want to see a horribly created bubble chart from which you cannot infer anything, go to Many Eyes.