Data Viz Done Right

March 1, 2017

Workout Wednesday: World Series Game 7 - Pitch-By-Pitch

When I set out to do this week's workout, I really wanted to recreate this amazing World Cup infographic:


I love this graphic! So much information packed in a compact space. But I couldn't find the data anywhere. What I decided to do instead was look at Game 7 of the 2016 World Series. It's talked about as one of the greatest games of all time, so I thought I'd create something similar, but on a pitch-by-pitch basis.

I was able to find the data on Brooks Baseball. I imported each pitcher's data into Google Sheets and then unioned the sheets in Tableau. I'd recommend you just use the TDE I've created this week, since I've removed all of the extra columns you won't need. You can download it here.

Here are the requirements:

  1. Each inning should be an individual row
  2. Within each inning, show every pitch from left to right
  3. The home team (Cleveland Indians) pitched first, so their bars should point up. The visiting team (Chicago Cubs) pitched second, so their bars should point down.
  4. Each pitch is color-coded based on the outcome: Ball, Strike, or In Play
  5. The final outcome of each batter should be displayed as a shape and color coded. See the subtitle in my viz. Note that the open circle is filled in the middle with white so that the bar can't be seen through it.
  6. Match my tooltips
  7. Include the data source at the bottom
  8. Match my title and subtitle
  9. Viz must be a single worksheet
  10. Viz should be 450x800
  11. Optional: Match my font, Rubik in this case.

That's it! This shouldn't be as challenging as some of the past challenges I've posted. For me, this was more about trying to replicate a viz I liked. If you don't understand the fields or get stuck, ask for help. Good luck!

16 comments:

  1. This is a nice, compact summary graphic. I have one suggestion that would aid interpretability: Add 'out' marks for strikeouts. It's hard to pick out where strikeouts occur, and they can be very significant! Another idea, that may be too much information in one graphic: show which bases are occupied for each pitch.

    1. The data provided doesn't include which bases are occupied per pitch. As for the strikeouts, yes, I could have included that, but it would make the visual too cluttered in my opinion. Also, there's no empirical evidence that a strikeout has any more significance on the outcome of a game than any other out. In fact, I would argue a 1-pitch out is way more important because it saves the pitcher's pitch count.

  2. Sir, I'm using the Public version. Is there any way to get the data as an Excel file? Thanks in advance.

    Mahfooj

    1. You can get it here (https://1drv.ms/x/s!AhZVJtXF2-tD1it_AjTT8cHy9ZBQ) but you will need to union all of the files and remove a bunch of fields. I'd recommend you download my workbook, delete the sheet I created, delete any calc fields I created, then start. It'd be like a blank workbook then.

    2. Thank you for your prompt reply. Sir, I can download the workbook, but as you know, we cannot open a packaged workbook in the Public version. So I'll download the Excel file and union all the sheets in Tableau. Thanks again.

  3. Well that was a fun ten minutes. I thought I was going crazy. Damn aliases!
    Right, now I can move forward...

  4. I'm stuck on how to get them to start from "zero" on the vertical axis in each inning and how to "hide" Balls and Strikes in the shapes field.

    1. Finio, you're kind of answering your own question. To make them go opposite directions, the values have to go in opposite directions. There are a few different ways to do that.

      As for the shapes, try to right click on the ones you don't want on the shapes shelf. There's an option there you might not know about.
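One of the several ways to send the bars in opposite directions can be sketched outside Tableau as follows. This is only an illustration of the sign-flip idea, not the calculation used in the workbook; the team labels and the `signed_height` helper are hypothetical names, and the real data source may label the pitching team differently.

```python
# Hypothetical sketch: give each pitch a signed bar height so the two
# teams' bars grow in opposite directions from a shared zero line.
# "Indians" / "Cubs" are assumed labels, not confirmed field values.

def signed_height(pitching_team: str, height: float = 1.0) -> float:
    """Positive height for the home pitchers, negative for the visitors."""
    return height if pitching_team == "Indians" else -height


# Three pitches: two thrown by the home team, one by the visiting team.
pitching_teams = ["Indians", "Indians", "Cubs"]
bars = [signed_height(team) for team in pitching_teams]
```

Because every bar starts at zero, the positive values rise above the axis and the negative values hang below it within the same row.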

    2. Thanks Andy. I was referring to how to start my lollipops from the beginning of each inning (row), which I've already figured out. The shapes are driving me crazy; I can't find how to hide them. It's 2:35am in this part of the world, so I'm calling it off. Tomorrow I'll give it another try. See ya!

  5. It looks like you have a shape set called 'Icons' that I don't see by default. Is there somewhere we can download that from?

  6. One slight variation I would suggest is on the 'Count' tooltip. As the data reads, it is technically the count before the pitch occurred. Most people would typically want to see the count after the pitch. The formula below would take care of that. Thanks for all your work on these. I am using them as a training tool for my team.

    IF [Outcome Type] = 'Ball' THEN
        STR([Balls] + 1) + '-' + STR([Strikes])
    ELSEIF [Outcome Type] = 'Strike' THEN
        STR([Balls]) + '-' + STR([Strikes] + 1)
    ELSE
        STR([Balls]) + '-' + STR([Strikes])
    END
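
For a quick sanity check outside Tableau, the same logic can be sketched in Python. This is an assumed port of the formula above as written, not something taken from the workbook; `count_after_pitch` is a hypothetical helper name, and its arguments stand in for the [Outcome Type], [Balls], and [Strikes] fields.

```python
# Hypothetical Python port of the Tableau calculation above.
# balls / strikes are the count BEFORE the pitch, as stored in the data.

def count_after_pitch(outcome_type: str, balls: int, strikes: int) -> str:
    """Return the 'balls-strikes' count after the pitch is thrown."""
    if outcome_type == "Ball":
        balls += 1
    elif outcome_type == "Strike":
        strikes += 1
    # Any other outcome (e.g. 'In Play') leaves the count unchanged.
    return f"{balls}-{strikes}"


example = count_after_pitch("Ball", 0, 1)
```

For example, a ball thrown on a 0-1 count gives 1-1, while a pitch put in play on a 2-2 count stays 2-2.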

  7. I'm having a hard time getting the bars to be evenly spaced. I'm trying to use the "Id" column, but the values in that column are not consistently spaced by 1. Is there another column that actually tracks the number of pitches accurately that I've missed, or how can we go about making such a column?

    1. Try putting AVG([Number of Records]) on the Size shelf. I also created a calculation to give me the unique pitch per inning. Think about how you can rank them to get them in order.

    2. Thanks, I'm just not getting it. What I've got thus far is a calculation that essentially resets the "Id" column to 0 for the start of each inning, ultimately providing unique values for each pitch in each inning. However, I'm having a tough time when I try to rank anything. I've also tried just ranking on the raw "Id" column by inning, as those are all unique values, but to no avail. Any other teasers you can throw my way? :)

    3. Correction. I did just get it to work, though I'm not entirely sure why it worked. Here's what I did:
      1) Put the "Id" variable as a discrete attribute for the columns
      2) Add a rank table calculation for "Id"
      3) Edit the table calculation to:
      - rank ascending
      - use competition method
      - compute using specific dimensions (selected all dimensions except "Inning," which is the variable for my rows)

      I'm just not sure why I had to select to rank using all the dimensions that are currently used in my plot. My intuition tells me that I should have had to select the inverse (i.e., just compute the rank using "Inning").

      Thanks.
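
On why the checked dimensions are "everything except Inning": in a Tableau table calculation, the dimensions selected under Compute Using are the ones the calculation runs along (addressing), and the unselected ones define where it restarts (partitioning). Leaving "Inning" unchecked therefore restarts the rank in each inning. A minimal Python sketch of that behavior, using made-up Id values and a hypothetical `rank_within_inning` helper:

```python
from collections import defaultdict

# Hypothetical sketch of a competition rank that restarts per inning.
# The (inning, pitch_id) pairs below are invented for illustration.

def rank_within_inning(rows):
    """rows: iterable of (inning, pitch_id). Returns {(inning, pitch_id): rank}."""
    ids_by_inning = defaultdict(list)
    for inning, pitch_id in rows:
        ids_by_inning[inning].append(pitch_id)

    ranks = {}
    for inning, ids in ids_by_inning.items():
        for pitch_id in ids:
            # Ascending competition rank: 1 + count of strictly smaller ids
            # in the same inning (the partition).
            ranks[(inning, pitch_id)] = 1 + sum(1 for other in ids if other < pitch_id)
    return ranks


rows = [(1, 10), (1, 13), (1, 17), (2, 40), (2, 44)]
ranks = rank_within_inning(rows)
```

Even though the raw Id values are unevenly spaced, the ranks come out as 1, 2, 3 in the first inning and 1, 2 in the second, which is what makes the bars evenly spaced.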

    4. Robert, I think the root of your problem was that you had to create this complex table calc just to make your ID field unique. If you download my workbook, you'll see that I created a calculation that makes it unique through a bit of simple multiplication. I then made this field continuous on the columns.
