
September 26, 2009

The Big 3

My friend Dan Murray left a comment on my previous post about Bobby Cox's retirement. Dan brought up an interesting point that Maddux, Smoltz and Glavine might be tougher for the Braves to replace than Cox. I thought I'd look into it.

Some notes on the data:
  1. This data covers 1991-2008 (the entire period that they pitched for the Braves). During this time Smoltz was a Brave for all 18 years, Maddux was with the Braves from 1993-2003 and Glavine from 1991-2002.
  2. Smoltz was the Braves' closer from the second half of 2001 through 2004.

I find all of this quite impressive. Sure, they made a lot more money, but they backed it up. Their ERA was 21% better than the rest of the team, they won 9% more games, clearly had a better ERA year after year, and more than pitched their fair share of innings.

In fact, the most impressive stat to me is that from 1993 to 1998 the Big 3 threw 50.3% of the total innings for the Braves. Three pitchers throwing that many innings is simply incredible. I can't imagine there are many teams, if any, in the last 50+ years that could even come close to that.

September 24, 2009

The Bobby Cox Era

Long-time Braves manager Bobby Cox announced his retirement today, effective at the end of the 2010 season. Bobby took over the managerial duties for the Braves in the middle of the 1990 season. The next season started a string of 14 consecutive division titles, a record unlikely ever to be approached again.

The Braves were dominant in all facets of the game compared to the rest of Major League Baseball. He's going to be tough to replace.

September 19, 2009

Katrina Contracts

I read a story recently about Halliburton and the incredible number of contracts and money they scored from Hurricane Katrina. Oh by the way, George Bush was President, Dick Cheney was VP and Cheney was CEO of Halliburton from 1995-2000.

That led me to look at how Katrina contracts were awarded by government department. It's been tough to identify just which contracts were awarded to Halliburton, since most of them went to subsidiaries. I'm working to gather all of those. The data was gathered from the Federal Procurement Data System (FPDS).

This is a very simple analysis, well not really much analysis at all, but in this instance, given the amount of information I want to display, I actually think the pie chart works better. Thoughts?

September 16, 2009

Why was Dave Stewart picked on?

I was building out the pitching stats of the baseball database I downloaded from Baseball Databank and happened to look at the trend in balks. I noticed a HUGE spike in the number of balks in 1988. I figured there had to be something wrong with the data, so I turned to Google. I found a terrific blog post that explains exactly what happened in 1988.


So what happened? In 1988, the Oakland Athletics led the majors with 76 balks or just over 8% of the total. 1988 accounted for 37% of the A's balks from 1985-2008...that's ridiculous!

Dave Stewart set a record with 16 balks. Order was restored in 1989 and Dave Stewart had zero balks. I do remember Dave Stewart going to the plate quickly, so maybe it was the change in the rule that says "the pitcher must come to a single complete and discernible stop" that got him.

September 11, 2009

Two ways to look at job losses

Time Magazine blogger Justin Fox took a wonderful jab at a chart that Nancy Pelosi's office posted about the current recession. The Pelosi chart below makes it look like the sky is falling, but it looks at the data in terms of total job losses.

Justin did a great job of creating a different, perhaps more realistic, comparison of the current recession to past recessions by looking at job losses as a percent decline.

What an incredible difference if you just look at the data slightly differently.

It's kind of scary that there isn't someone in Pelosi's office who would know to at least consider looking at the data as a percent decline. Doing so spins the data in a more positive light, or maybe I should say, a less negative light.
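To make the difference between the two views concrete, here is a minimal Python sketch. The recession names and job figures are hypothetical placeholders chosen for illustration; they are not the data behind either chart. It simply shows how a recession with larger total job losses can still be the smaller one as a percent decline.

```python
# Hypothetical figures for illustration only; not the data behind either chart.
recessions = {
    "Recession A (smaller workforce)": {"peak_jobs": 50_000_000, "jobs_lost": 2_500_000},
    "Recession B (larger workforce)": {"peak_jobs": 135_000_000, "jobs_lost": 5_000_000},
}


def percent_decline(peak_jobs, jobs_lost):
    """Job losses as a percentage of employment at the pre-recession peak."""
    return 100.0 * jobs_lost / peak_jobs


for name, r in recessions.items():
    pct = percent_decline(r["peak_jobs"], r["jobs_lost"])
    print(f"{name}: {r['jobs_lost']:,} jobs lost, {pct:.1f}% decline")
```

In this made-up example, Recession B loses twice as many jobs as Recession A, yet its percent decline is smaller because the workforce it started from is much larger.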

Read Justin's full post here.

September 10, 2009

Is pitching still more important than hitting?

I'm a big fan of the Atlanta Braves. Last night their TV broadcasters were discussing which league is stronger, the American League or the National League. The age-old adage is that pitching wins, but Joe Simpson gave a pretty strong argument in favor of hitting and, thus, that the American League is the "stronger" league.

Interleague play, which started in 1997, is a good place to determine how the leagues match up.

In 2004, they were separated by only one game, but since then, the American League has been winning by wide margins. This fact, in and of itself, does not support Joe Simpson's argument.

I extended the research a bit further by comparing ERA and batting average, using Tableau to analyze the data.

On the top, it's clear that the American League has had a higher ERA and a higher batting average every year since 1985. I don't think it's a stretch, then, to say that these charts, in conjunction with the interleague comparison, indicate that hitting is more important to winning than pitching, and Joe's argument is now supported.

The scatter plot on the bottom left again shows that the American League typically has a higher ERA and batting average. This is really just the two line graphs compared to each other.

Finally, the scatter plot on the bottom right extends the first scatter plot to include the league that won the World Series.

I find it interesting that when the American League wins the World Series, they win it because of hitting, whereas when the National League wins the World Series it's due to pitching.

Maybe this is all related to the DH. Hmmm....

September 8, 2009

Inflated Salaries

Stats, stats, stats. That's what baseball is all about. I found a great site, Baseball Almanac, that provides just about every stat imaginable. I have a baseball database as well, so I like to find sites like this and compare them to visualizations I build on my own. The chart below references salary data over five-year periods.

Source: Baseball Almanac

There are some fundamental flaws with this chart:

1) The 3D bars are unnecessary and make it very difficult to determine an approximate value for the bar.

2) The color choice makes it virtually unreadable by those who are color blind.

3) The data is misleading. What good is a chart if the data is misrepresented? The stacked bar chart artificially inflates the average salary. It is increased by the amount of the minimum salary, whereas they are separate measures. I moused over the average salary bar for 2005; the value is $2.6M, but without mousing over the bar, I would have assumed the average salary was nearer $3M.

4) The header indicates "Salary Data Appears in Five-Year Increments." What does that mean? Is it a five-year average, is it every fifth year?

5) The chart leads you to believe that the average salary has increased every year since 1970, but it hasn't. Below is the average salary by year from 1980-2008. Visualizing this as a line chart reveals that salaries indeed did NOT increase every year.
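To put a number on point 3, here is a small Python sketch of how stacking two separate measures overstates the one on top. The $2.6M average is the value quoted above; the minimum salary is an assumed placeholder, not a figure read from the chart.

```python
# The $2.6M average comes from the post; the minimum salary is a
# hypothetical placeholder, not a figure read from the chart.
average_salary = 2_600_000
minimum_salary = 300_000  # assumed for illustration

# In a stacked bar, the average segment sits on top of the minimum segment,
# so the top of the bar reads as the sum of the two measures.
stacked_bar_top = minimum_salary + average_salary
overstatement = stacked_bar_top - average_salary
overstatement_pct = 100.0 * overstatement / average_salary

print(f"Top of stacked bar: ${stacked_bar_top:,}")
print(f"Actual average:     ${average_salary:,}")
print(f"Overstated by:      ${overstatement:,} ({overstatement_pct:.1f}%)")
```

Under that assumption the bar tops out near $2.9M, which is why the average looks closer to $3M than to $2.6M. Plotting the two measures side by side, or as separate lines, avoids the problem.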

Oh, by the way, salaries sure have escalated dramatically since the 1994 Players Association Strike. So much for revenue-sharing controlling spiraling salaries.

September 7, 2009

Bubbles Bubbles Everywhere

The New York Times ran an article written by A.O. Scott back in November. My purpose here is not to critique the article, but rather the, gasp, bubble chart used to rank media consumption hours.

I'm a big fan of Stephen Few and have learned a lot from Stephen and his books about effective visual design. Stephen points out that "Visual perception in humans has not evolved to support the comparison of 2-D areas, except as rough approximations that are far from accurate."

As soon as I saw this ranked bubble chart, I immediately began exploring other, more effective display mediums. Here are some examples.

I wanted to start by trying to find a way to use the bubble charts. The only method I could employ was to add color to the bubble charts, but I don't gain much at all.

Of course, the simplest way to rank data is through a simple bar chart. The first example is as intuitive as it gets; it's very easy to compare the relative size of the bars. The only purpose of this graph is to emphasize the rank.

I took this a step further. When reviewing Scott's bubble chart, I had the impression that he was emphasizing the percentage of time that we spend in each of the different media. That led me to a bar chart that shows the contribution to the total. It's the same graphic as the ranking chart above, but this time I intentionally labelled the bars to emphasize the contribution of each activity.

I'll conclude with one of the least effective displays, the dreaded pie chart, though I think one of the pie charts is actually somewhat effective. The first pie chart displays every category, which makes it impossible to compare the sizes and packs in way too much information.

I decided to group all but the top two categories into an "other" category to simplify the pie chart and I also ensured that they were ranked by contribution as you made your way around the pie.
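The grouping step can be sketched in a few lines of Python. The function name `group_top_n` and the category hours are hypothetical, made up for illustration; they are not the figures from the Times chart, and this is not how the charts here were actually built.

```python
def group_top_n(values, n=2, other_label="Other"):
    """Keep the n largest categories and fold the rest into one slice.

    Returns (label, value, share_of_total_percent) tuples, largest first,
    with the combined slice last.
    """
    total = sum(values.values())
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    top, rest = ranked[:n], ranked[n:]
    slices = list(top)
    if rest:
        slices.append((other_label, sum(v for _, v in rest)))
    return [(label, value, 100.0 * value / total) for label, value in slices]


# Hypothetical hours per category; not the figures from the Times chart.
hours = {"Television": 40, "Internet": 25, "Radio": 15, "Games": 10, "Print": 10}
for label, value, share in group_top_n(hours):
    print(f"{label:<12}{value:>4} hours{share:>7.1f}%")
```

The combined slice is placed last here even when it is larger than one of the named categories, so that the named categories stay in rank order; sorting it in with the rest is an equally reasonable choice.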

Which display do you like best? Which display is most effective? My vote is for the bar chart displaying the contribution to the total.

September 3, 2009

A correlation justified

I read an article by Chris Pereira the other day that made an attempt to link the recession to video game usage. In fact, the subtitle is: "With economy troubles, gamers are buying a larger percentage of used games." While there is a link between video game usage and the current recession, this is a pretty bold statement given the analysis presented (which Chris built on from a Time magazine article). I'll show you my evidence that backs up Chris in a bit.

The first chart presented by Chris (via Nielsen) analyzes video game usage trends over the last four years. I don't think this chart proves anything more than the fact that video game usage has increased year over year for the last four years...that's it. You simply can't say that the upswing in time spent playing video games in 2009 is due to the recession.

The second chart tries to make the same correlation, but again, I see the same type of trend from 2007-2009 that you see in the first chart. People are just playing a greater variety of games...that's all. There is absolutely no way from this chart to draw a conclusion that used video game sales are in any way related to the recession.

Ok, so how can I say that it's just a matter-of-fact observation? Look at this chart (created with Tableau). If you look at the four-year trend of hours played, there is a continuous increase in hours played. You might say "I don't see this in 2009." But in each year in May, you see a big decrease in hours played. As Chris says in his evaluation: "May has traditionally seen a drop-off each year (blame the improving weather)." I can see the reasoning there.

You can also see that from 2007-2009, a similar trend can be derived from used game sales.

I tried to find a correlation between hours played and used game sales (since the article says both are related to the recession), but the facts don't support this.
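One common way to test for a relationship like this is the Pearson correlation coefficient. The post doesn't say which method was used, so treat the Python sketch below as an assumption rather than the actual analysis; the two series are hypothetical placeholders, not the Nielsen or sales data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly values; not the data used in the charts.
hours_played = [10, 11, 12, 13, 14]
used_game_sales = [3, 1, 4, 1, 5]
print(f"r = {pearson_r(hours_played, used_game_sales):.2f}")
```

A value near +1 or -1 suggests a strong linear relationship, while a value near 0 suggests little or none. Either way, with only a handful of data points the result should be read cautiously.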

I decided to look at other factors that could correlate the recession to video game hours played to help support Chris' argument. We have been hearing quite a bit regarding the housing crisis and its impact on the economy. I gathered economic data from FRED and was able to demonstrate that a prolonged decrease in housing starts does indeed indicate that a recession is coming.

My next step was to identify correlations between housing starts and the increase in video game hours played. Based on this analysis, I believe that a much stronger argument can be made that an increase in hours played is a possible indication that a recession is taking place. I set the "hours played" scale to match the Nielsen data.

I find the color-coding of the years very useful and, to me, the relationship can be clearly seen. As housing starts decrease, video game hours played per week increase. Since the years are color-coded, you can tie them back to the housing starts/recession trend chart above.

What's the bottom line? I agree with Chris that there is a relationship between video game hours played and the recession, but the way to get there is much more conclusive if you use economic indicators to support the theory.