
January 16, 2012

Arsenal continues to stumble. Have they become too dependent on Van Persie? The stats don’t lie!


Another weekend has passed by and another totally lackluster effort from my beloved Gunners.  I knew they were in deep, deep trouble when they scored so early in the game and of course it was Robin Van Persie with another classy finish.  This makes three games I can think of where Arsenal scored really early and couldn’t hold on.  Surely there were other occasions as well.

Miraculously someone other than RVP scored later in the game (Theo Walcott), but when the final whistle blew and there were few chances created, I began to ponder what could happen if RVP gets hurt (some might say when instead of if).  Arsenal’s dependency on RVP has been staggering this season.

I created the viz below based on club and squad stats from ESPN Soccernet for EPL games. 

Here’s how I designed it and what I’ve observed:

  1. Anything in red = Robin Van Persie
  2. The dotted lines represent the averages over the 11 seasons
  3. There’s no stat that measures effort, desire or heart.  Thankfully, I guess, or the Gunners would be at the bottom of that table.  They play so gutlessly at times!
  4. The primary team stats are shown on the upper left for the last 11 EPL campaigns.  This provides a nice little recipe for what you need to do to ensure you tumble right down the table and out of the Champions League qualification spots.

    Fewer Goals For + More Goals Against = Lower Goal Differential = Fewer Points per Game = More Losses = Tumble Down the Table = No Champions League!

    In other words, if you can’t score and you can’t defend, you won’t win.  Seems pretty simple to me.
  5. Arsenal is 21 games into the 2011-12 season and only one player has more than five goals.  All other years had at least four, with the exception of 2003-04, but Thierry Henry had an exceptional season that year, scoring 30 goals or 41.1% of the team’s goals.

    Click on a year and you’ll be taken to another page that lists the stats for the players that scored five or more goals that season.
  6. I charted scoring rate (goals per game) vs. % of total team goals.  This gives us a way to measure a player’s total scoring importance to the team.  Guess who the outlier is at the way upper-right.  Hover over any point for more details.
  7. The bottom three charts show the number of goals, % of team goals and scoring rate for the top scorer for each season. Again, hover over any bar for more details.
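
For anyone who wants to reproduce the two measures from item 6 outside of Tableau, here is a minimal Python sketch.  The function names are my own, and the team total of 73 goals is only inferred from the 30 goals and 41.1% quoted above; the 38 games simply assumes a full EPL season for illustration.

```python
def share_of_team_goals(player_goals, team_goals):
    """Return the player's goals as a percentage of the team's goals."""
    if team_goals <= 0:
        raise ValueError("team_goals must be positive")
    return 100.0 * player_goals / team_goals


def scoring_rate(player_goals, games_played):
    """Return goals per game for the player."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return player_goals / games_played


if __name__ == "__main__":
    # 30 goals comes from the post; 73 team goals is inferred from the 41.1% figure.
    print(round(share_of_team_goals(30, 73), 1))
    # 38 games is an illustrative assumption (a full league season), not a stat from the post.
    print(round(scoring_rate(30, 38), 2))
```

Plotting one measure against the other for every player with five or more goals gives the same kind of scatter as the chart in item 6.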

As I had thought, RVP is scoring a never-before-seen percentage of the team’s goals (45%). What more proof could Wenger need that he must bring in some reinforcements?

You might say that Arsenal depended heavily on TH14 during their glory years as well, but he had a much better supporting cast, as evidenced by the number of players with five or more goals.  Plus Henry was nearly always fit, while RVP has proven to be very brittle over his tenure.

You can also look at GAA during those same years.  Arsenal was truly a “team” back then, now…not so much!

What Arsenal needs to do to turn things around is pretty simple: score more goals and defend better.  Personally, I don’t think the backline is as bad as it appears.  I blame the outside midfielders for lack of effort and not tracking back to support the defense.  If they’re going to be that lazy, they’d better at least score some goals.

What Arsenal needs is another consistent goal scoring threat.  While I love Henry, I don’t think he’s the answer, especially since he’s only on loan.  A long-term solution is needed.  I fear without another formidable striker, the Gunners could easily finish outside the, gulp, top 8!

January 6, 2012

Information is Beautiful? Only if you like a totally useless mess of nothingness


I have a problem with David McCandless of Information is Beautiful.  From his own website he says:

A passion of mine is visualizing information – facts, data, ideas, subjects, issues, statistics, questions – all with the minimum of words.

I’m interested in how designed information can help us understand the world, cut through BS and reveal the hidden connections, patterns and stories underneath. Or, failing that, it can just look cool!

So let’s look at a couple of these statements in the context of his latest infographic


Has David met any of his own criteria?  Let’s check.

  1. Minimum of words – No! All this infographic contains is words…horrible!
  2. Facts – Maybe. If you consider a bunch of words facts, then I guess he meets this criterion.
  3. Data – No, not even close
  4. Ideas – No, nothing that I can see
  5. Can help us understand – No, not for me
  6. Reveal the hidden connections, patterns and stories – No, absolutely not!
  7. It can just look cool – Seriously?

To summarize, this is one of the worst infographics I’ve EVER seen.  Shouldn’t we expect better from an “expert”?  It’s impossible to glean even the slightest bit of insight or data beyond two things: nationality and that the album falls somewhere in the top 21.

How am I supposed to deduce the rank?  The size?  The width? The font? The location?  I have no idea and no one else will likely know either.  Throw me a bone and at least give me some type of instructions for making sense of this mess.

And another thing.  Why the top 21?  At first, I thought the list only went to 21, but it lists the top 30.  And four albums are tied for 20th.  The top 21 makes no sense whatsoever!  Maybe David likes blackjack?

Let’s consider the definition of an infographic from Wikipedia (not the defining source, I know).  I’ve bolded what I consider the key points.

Information graphics or infographics are graphic visual representations of information, data or knowledge. These graphics present complex information quickly and clearly.

Did David meet any of these criteria?

  1. Graphic visual representation – No, he merely created a wordle.
  2. Present complex information quickly and clearly – No. I can’t garner any insights quickly and clearly.  Can you?

Ok, you can tell that I think his work is a totally useless mess.  But how would I present the data to allow for quick and clear insights?  I’d use Tableau.

My viz may not be an “infographic” in its purest sense since it doesn’t have all of the cute pictures, figurines, and unnecessary clutter, but I have met David’s criteria, the criteria for an infographic, plus much more.

  1. Graphic visual representation – Check
  2. Present complex information quickly and clearly – Check
  3. Minimum of words – Check
  4. Facts – Check
  5. Data – Check
  6. Ideas – Check 
  7. Can help us understand – Check
  8. Reveal the hidden connections, patterns and stories – Check
  9. It can just look cool – Check

But Tableau lets you go above and beyond.

  • Interact with the filters and highlighting
  • Change the Points Type
  • Filter and/or highlight nationalities
  • The data is ALWAYS ranked based on the selections you make

I’m sick and tired and tired and sick of seeing useless infographics like this one.  I won’t keep my fingers crossed that they’re going away anytime soon though.

January 5, 2012

Tableau Tip: Calculating the distance between two points

I’m working on a project that requires me to calculate the distance between stores in order to plan resource allocation.  Pretty cool stuff that could have a huge impact if it pans out.
Naturally I want to do this in Tableau, but since I hadn’t done this before, I turned to the Tableau Forum and found this great step-by-step tutorial.

TIP: For those of you that may be new to Tableau, I would highly recommend that you use the forum if you’re approaching something you’ve never done before.  You’ll often find that someone has already done something similar and it’ll save you a lot of time versus re-inventing the work yourself.

Tableau has outlined this as a 22-step process, but they go through it in extreme detail.  Note that your data source MUST have latitude and longitude available in this example.  Here’s a slimmed-down version for you (some of this is taken directly from the article):
  1. Connect to your data source, select Single Table, then select Custom SQL
  2. Create an inner join on a second instance of the table where the locations from the two instances are not equal (refer to the SQL script in the detailed instructions)
  3. Click OK, then Extract the data.  For me, I’m looking at 7271 stores, so the self join will result in about 50M records.  Leverage the power of Tableau’s data extracts!
  4. Double-click your latitude and longitude fields to start building the map.  You may need to set the geographic role of the fields if you don’t have them named Latitude and Longitude.
  5. Use the Great Circle Distance formula by creating a calculated field named Distance (or the name of your choice)

    The formula is:
    3959 * ACOS(
        SIN(RADIANS([Lat])) * SIN(RADIANS([Lat2])) +
        COS(RADIANS([Lat])) * COS(RADIANS([Lat2])) * COS(RADIANS([Long2]) - RADIANS([Long]))
    )

    NOTE: To calculate miles, use 3959 as the first number.  For kilometers, use 6371 as the first number (thanks to Shawn Wallwork for the comment)

  6. On the Marks card, in the list, select Line. This will create lines between all locations on the map.  Start with only a couple locations if you have a huge dataset, otherwise it could take some time to draw all of the lines.
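
If you want to sanity-check the numbers Tableau gives you, the Distance calculation from step 5 and the self join from step 2 can be sketched in Python roughly as follows.  This is only an illustration under my own assumptions: the function names and the sample store coordinates are made up, and the clamp before `acos` is a guard I added against floating-point rounding, not part of the Tableau formula.

```python
import math
from itertools import permutations

EARTH_RADIUS_MILES = 3959  # use this for miles
EARTH_RADIUS_KM = 6371     # use this for kilometers


def great_circle_distance(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_MILES):
    """Great-circle distance via the spherical law of cosines (inputs in degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2) - math.radians(lon1)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(dlon))
    # Rounding can push the value a hair outside [-1, 1], which would make acos fail.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return radius * math.acos(cos_angle)


def pairwise_distances(stores, radius=EARTH_RADIUS_MILES):
    """Mimic the self join: one row for every ordered pair of different stores.

    `stores` maps a store name to a (latitude, longitude) tuple.
    """
    return {
        (a, b): great_circle_distance(*stores[a], *stores[b], radius=radius)
        for a, b in permutations(stores, 2)
    }


if __name__ == "__main__":
    # Hypothetical stores with made-up coordinates, purely for illustration.
    stores = {
        "Store A": (33.75, -84.39),
        "Store B": (34.05, -84.10),
        "Store C": (32.90, -83.60),
    }
    for pair, miles in pairwise_distances(stores).items():
        print(pair, round(miles, 1))

    # A self join of n stores on "not equal" yields n * (n - 1) rows.
    print(7271 * 7270)  # 52860170
```

The row count in the last line is why the self join on 7271 stores lands at roughly 50M records, and why starting with a couple of locations is a good idea.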

From this point, you can perform tons of different analyses.  One example would be to drag Distance onto the Color shelf and the Label shelf on the Marks card to color code and label the distances between each point on the map.

Think about how you could blend other data sources.  For me, I might have home zip codes for employees in another data source and I want to see all stores within a certain radius of each employee.  The possibilities are almost endless!

You can find a sample workbook for how all of this is done here.  I know I’ll be using this technique over and over again.