Caution: Using Tableau live in a meeting may lead to lots of work!

I’ve been working on a project for the last couple of weeks, the centerpiece being analysis done in Tableau.  While it took 90% of the time to cobble together the data, it only took about 30 minutes to an hour to build the Tableau workbook.  I used Tableau live in the meeting, exposing the incredible functionality that parameters and data blending provide.  Here are some of the comments I heard:

“How can we get this tool?  It would have saved us weeks of work that we had to spend building bubble charts in Excel.”

“Unbelievable!  It’s so easy to see our opportunities.”

“Do that again!”

“I like what you’ve shown us, but I’d love to see ‘X’ vs. ‘Y’ (can’t tell you what).  Can you add that in the next version?” (I proceeded to create a new worksheet and show her exactly what she wanted in under 30 seconds.  To say she was blown away would be a huge understatement.)

The comments and feedback went on and on.  Every single person in the meeting wanted to get their hands on it.  I’ve distributed the workbook via Tableau Reader.  What about that guy that spent all of that time in Excel?  He’s going to buy a license any day now (I better start getting some commissions from Tableau!).

Unfortunately, one of the outcomes of showing these capabilities is that it leads to even more projects.  I guess that’s a good problem to have.

Creating vs. Finishing – A Follow-up to OnFooty for the Premier League

On Football has quickly become one of my favorite blogs to follow (Twitter) and one of the most interesting posts of late was an analysis of “finishing” in Major League Soccer.  OnFooty analyzed conversion rate vs. the number of shots for each team over the total season.  Typically, conversion rate is calculated as goals/shots on target, but OnFooty has a terrific case for using shots in lieu of shots on target.

In conclusion, Sarah said “The upper right quadrant (teams that are above average at both creating and finishing) contains the MLS Cup Finalists and the number one seeds for each conference.”

This quickly led me to the Premier League and interestingly enough, the same analysis holds true through 30 rounds (29 for some teams) this season.  The top four teams are all in the upper-right quadrant.
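For anyone who wants to try the quadrant idea outside of Tableau, here is a minimal Python sketch. It assumes "creating" means total shots and "finishing" means goals/shots, and it compares each team against the plain average across teams. The team names and numbers are made up purely for illustration; they are not the Premier League or MLS figures.

```python
from statistics import mean

# Hypothetical season totals: team -> (shots, goals). NOT real data.
teams = {
    "Team A": (520, 62),
    "Team B": (480, 40),
    "Team C": (390, 47),
    "Team D": (360, 30),
}


def conversion_rate(shots, goals):
    """Goals per shot (all shots, not only shots on target)."""
    return goals / shots


def classify(teams):
    """Return team -> (above avg at creating, above avg at finishing)."""
    # Assumption: the averages are simple means across teams.
    avg_shots = mean(shots for shots, _ in teams.values())
    avg_conv = mean(conversion_rate(s, g) for s, g in teams.values())
    result = {}
    for name, (shots, goals) in teams.items():
        creates = shots > avg_shots
        finishes = conversion_rate(shots, goals) > avg_conv
        result[name] = (creates, finishes)
    return result


quadrants = classify(teams)
for name, (creates, finishes) in quadrants.items():
    print(name, "creating:", creates, "finishing:", finishes)
```

A team in the upper-right quadrant is one where both flags come back True.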

If you follow English football, you might note these things as well:

1. Tottenham needs to improve their finishing if they want to secure a return spot in the Champions League qualification round (for the 4th place finisher)
2. Blackpool (one of the most exciting teams to watch) score at a terrific rate, but since they’re in 15th place in the table, they obviously need to work on their defense (their philosophy is to outscore their opponents since they know they can’t stop them)
3. Wigan has the worst conversion rate and sits last in the table…enough said

Play with the view below.  Choose your vertical and horizontal axis, check out the different tabs.  What do you see?

I’m compelled to look historically to see if this theory holds true season after season…maybe one day when I have more time on my hands (I’m busy tallying stats for my daughter’s soccer team).

I HATE the Cowgirls!

I grew up in Philadelphia and if there’s one thing you learn, it’s to HATE the Cowgirls!  Jerry Jones is so egotistical, but then again, he sure knows how to run a profitable business.

View the original blog post here.

The magic washing machine: Hans Rosling

What was the greatest invention of the industrial revolution? Hans Rosling makes the case for the washing machine. With newly designed graphics from Gapminder, Rosling shows us the magic that pops up when economic growth and electricity turn a boring wash day into an intellectual day of reading.

Climbing the mountain of the NBA's all-time leading scorers

I absolutely love this chart.  Why do I like it so much?

1. The active players are clearly marked
2. It’s uncluttered
3. But my most favorite is the ascending ranking; it gives me the impression that Kobe is climbing a mountain, which he really is given how far he needs to go to get anywhere near Kareem

The only thing I would have changed is the background.  The dark gray background makes the blue/gray bars a bit tough to see.

A 17-slice pie chart is a bit excessive

The commentary included with the pie chart below is fantastic, but the pie chart is absolutely horrible.

A simple bar chart is much more effective.  With the bar chart, it’s so much easier to see just how favored Ohio State is to win the tournament.  Also, there’s no color legend, therefore you’re not tempted to assume the colors mean something.

With the bar chart, I can clearly see that Florida and UNC are way overrated and Texas is way underrated (the seeds are in parentheses).

Cobb County schools with a high reduction in student absences outnumber those with high increases by more than 3 to 1

There was a biased comment made by Mr. Sweeney (email him here to call him out on it) during the discussion phase last week of the Cobb County School Board meeting.  He started spouting off schools that have had significant INCREASES in student absence.  While the schools he mentioned did indeed have a high increase in absences, he failed to mention some key facts.

Only 15 of 119 (12.6%) schools had a > 10% increase in absences, meanwhile 48 of 119 (40.3%) schools had a > 10% DECREASE in absences.  It's disappointing that none of the other board members called him on this (though I know three of them never would).
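The arithmetic is easy to check. The short Python sketch below uses only the counts quoted above and assumes the total is 119 schools, which is the denominator that both quoted percentages are consistent with.

```python
# Counts quoted in the post; 119 total schools is the assumed denominator.
total_schools = 119
big_increase = 15   # schools with a > 10% increase in absences
big_decrease = 48   # schools with a > 10% decrease in absences

pct_increase = round(100 * big_increase / total_schools, 1)
pct_decrease = round(100 * big_decrease / total_schools, 1)
ratio = big_decrease / big_increase

print(f"Increase: {pct_increase}%  Decrease: {pct_decrease}%")
print(f"Decreases outnumber increases {ratio:.1f} to 1")
```

That works out to 12.6% and 40.3%, or a little more than 3 to 1.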

Why The Newspaper Industry Collapsed

From the Chart of the Day, an excellent depiction of the fall of newspapers.  I like everything about this chart: it’s clear, concise, to the point and very easy to read.

Facts are friendly. Why the Cobb County School Board should reinstate the balanced calendar.

Today I presented the information below at the monthly Cobb County School Board meeting.  As some background, the previous board approved a three-year “balanced” calendar that provides more frequent breaks throughout the year.  However, a new board was “elected” (some ran unopposed) and they decided that, despite the overwhelming evidence in support of the balanced calendar, they wanted to go back to the “traditional” calendar.

Naturally, I used Tableau to analyze the data that was provided by the county itself (some board members were using their own data, which they would not share and which could not be validated).  The key points:

1. The Balanced Calendar has resulted in a 26% reduction in teacher absences for a total savings of \$987K. Previously when this data has been presented, it has not factored in the additional savings that result from a \$10 decrease in daily substitute pay.
2. The Balanced Calendar has resulted in a 4.2% reduction of student absences, with over 75% of schools reporting a reduction in absences.  In the past, analysis has not considered that there were 88 school days in 2010 compared to 85 in 2011.  Another interesting fact is that a school board member reported incorrect information.  I corrected him.
3. Utility cost trends are in alignment with my costs at home.  My assumption, though likely too broad, is that my utility costs are in alignment with other residents of Cobb county.
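The point about 88 vs. 85 school days matters because raw totals from periods of different lengths are not comparable. Here is a rough Python sketch of the per-day adjustment. The day counts come from the point above, but the absence totals are hypothetical placeholders, not the county's numbers.

```python
def absences_per_day(total_absences, school_days):
    """Average absences per school day in the period."""
    return total_absences / school_days


def pct_change(before, after):
    """Percent change from before to after (negative means a reduction)."""
    return 100 * (after - before) / before


# Day counts from the post; absence totals are HYPOTHETICAL.
days_2010, days_2011 = 88, 85
absences_2010, absences_2011 = 8800, 8160

raw_change = pct_change(absences_2010, absences_2011)
per_day_change = pct_change(
    absences_per_day(absences_2010, days_2010),
    absences_per_day(absences_2011, days_2011),
)

print(f"Raw change:     {raw_change:.1f}%")
print(f"Per-day change: {per_day_change:.1f}%")
```

With these made-up totals the raw comparison overstates the reduction, simply because the second period has fewer days in it.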

The board is supposed to vote on the calendar again today or tomorrow, but I’m not holding out hope that the board members that voted against the balanced calendar will be swayed by facts.

I closed my comments with the following from Seth Godin:

Before we invest a lot of time in evidence-based discussions, please tell us what evidence you would need to see in order to change your mind. If the honest answer is, "well, actually, there's nothing you could show me that would change my mind," you've just saved everyone a lot of time. Please don't bother having fact-less discussions.

HR output – 1920-2008: Which teams should be feared?

As a follow up to the comments from my recent blog post on ranking a team’s total HR output, I have put together this simple interactive viz using Tableau Public.

The data covers 1920 through 2008 (1920 was the start of the live ball era) and measures three stats:

1. Total home runs over the history of each franchise (the oldest teams will nearly always be ranked at the top)
2. Total home runs per year for each franchise
3. Total home runs per game for each franchise
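If you want to compute the same three stats yourself, here is a small Python sketch. It assumes one row per franchise per season with the games played and home runs hit; the franchise names and numbers are invented for illustration and are not the data behind the viz.

```python
from collections import defaultdict

# Hypothetical rows: (franchise, year, games, home_runs). NOT real data.
seasons = [
    ("Old Club", 1920, 154, 50),
    ("Old Club", 1921, 154, 60),
    ("Old Club", 1922, 154, 70),
    ("Old Club", 1923, 154, 80),
    ("Old Club", 1924, 154, 90),
    ("New Club", 2007, 162, 170),
    ("New Club", 2008, 162, 160),
]


def summarize(seasons):
    """Return franchise -> total HR, HR per year, and HR per game."""
    totals = defaultdict(lambda: {"hr": 0, "games": 0, "years": 0})
    for franchise, _year, games, home_runs in seasons:
        entry = totals[franchise]
        entry["hr"] += home_runs
        entry["games"] += games
        entry["years"] += 1  # assumes one row per franchise per season
    return {
        name: {
            "total_hr": t["hr"],
            "hr_per_year": t["hr"] / t["years"],
            "hr_per_game": t["hr"] / t["games"],
        }
        for name, t in totals.items()
    }


stats = summarize(seasons)
for name, s in stats.items():
    print(name, s["total_hr"], round(s["hr_per_year"], 1), round(s["hr_per_game"], 3))
```

In this toy data the older club leads in total home runs while the newer club leads per year and per game, which is exactly why the first measure favors the oldest teams.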

Click on a team name and the viz will:

• Highlight the bar charts to allow for comparison within the three categories
• Update trend charts to reflect only the team selected

What do you see?

What’s the best measure of a team’s HR ranking? Three alternative measures.

Joe Mako posted the following comment about my post on all-time HR rankings: “What about consideration for the year? Seems to me that comparing the total number of home runs of teams that have been around for over 100 years to teams that have been around for just a few dozen years does not seem useful. Is there data that tracks the number of home runs by team per year? If so maybe the data could be normalized, or viewed differently.”

Three measures that are likely a better indicator include HR/Game, HR/Year, and HR/Win.  Colorado now comes in at #1 in all three categories, with Arizona in the top 3 in all three of these categories.  The Yankees, Giants, Cubs and Braves are all much, much farther down the list now.  Remember, normalizing the data can very often be your friend, unless of course you want to intentionally skew the data.

All-Time Home Runs – Charting it so you can read it

Chart of the Day has recently become one of my favorite blogs to follow.  I like the content/writing, typically the visual displays are good and there are lots of posts about sports.  There have been more posts about baseball recently as spring training has already begun.

Today they posted this chart:

Quick, which team is #5 all-time?  Can’t find it very easily can you?  This chart has a few basic design flaws:

1. A bar chart would be more effective as it would rotate the labels so that they are easier to read.
2. The sorting could be done better.  It’s sorted by when they joined the major leagues (see the numbers inside the bars), but that’s distracting to me.  My brain is trying to order these in a more logical way
3. The gridlines are too strong.  Yes, they are gray, but they could be more muted; they grab too much attention

Here’s how I would have presented the data (note that my data only goes through 2008, but you get the idea):

Quick, which team is #5 all-time?  I bet that only took 1-2 seconds.  Much easier, wouldn’t you say?  Notice that I’ve addressed all of the problems above:

1. The team names and stats are much, much easier to read
2. The teams are sorted by the number of home runs in the franchise’s history, which makes more sense for this chart.  You could still have the year in the bar, but it doesn’t add any value to the chart, therefore it’s chart junk
3. The gridlines are much less noticeable, but can still be used for reference if needed
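The sorting idea is not specific to Tableau. As a rough illustration, here is a tiny Python sketch that sorts values from largest to smallest and prints them as horizontal text bars with the labels on the left. The club names and totals are hypothetical, not the home run data from the chart.

```python
def text_bar_chart(values, width=40):
    """Return lines of a horizontal bar chart, sorted largest first."""
    if not values:
        return []
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    top = ranked[0][1]
    label_width = max(len(name) for name in values)
    lines = []
    for name, value in ranked:
        # Scale each bar relative to the largest value.
        bar = "#" * (round(width * value / top) if top else 0)
        lines.append(f"{name:<{label_width}} {bar} {value}")
    return lines


# Hypothetical totals. NOT real data.
sample = {"Club A": 100, "Club B": 200, "Club C": 50}
chart = text_bar_chart(sample, width=20)
for line in chart:
    print(line)
```

Finding the team at any rank is then just a matter of counting rows from the top.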