June 30, 2024
How to Rank & Filter the Top 5 in Tableau in Under 60 Seconds!
October 24, 2022
#MakeoverMonday Week 43 - How have Major League Baseball Pitch Types Changed?
This week's data set was pretty simple. We had 21 baseball seasons and a column for each pitch type. Pivoting the metrics made it much easier to work with for me as I could then split the view by pitch type.
During #WatchMeViz (below), I created a trellis view and showed how to create groups, sets, set actions, sparklines, LODs, custom number formatting, a mobile view, and more.
Thank you for tuning in. Here's the video and below is my visualization.
September 29, 2021
#MakeoverMonday 2021 Week 39 - MLB All-Time Offensive Wins Above Replacement
May 7, 2019
#MakeoverMonday: Top 10 Major League Baseball Home Run Hitters
31 years of MLB Home Runs!— Will Sutton (@WJSutton12) May 6, 2019
I've seen plenty of these charts lately, so for #MakeoverMonday I wanted to learn how to make my own. Feedback welcome. Thanks, @TriMyData & @VizWizBI R code available here: https://t.co/8ipKQOzmrO #Rstats pic.twitter.com/GreSt6t1b0
As I posted last week, Sophie Sparkes introduced The Data School and me to Flourish. Flourish makes it super simple to create animated visualizations with tons of customization options. Given that Tableau doesn't support animations in the browser, this is a great alternative. Flourish provides an example, you import your data, do a bit of customization and voila! You have an animated viz.
The data needed to be structured with a column for each season, so I prepped the data in Alteryx and I included all seasons from 1912-2018. I then filtered down to players with 250+ career home runs (to make the list manageable).
And here's my animated viz of the top 10 home run hitters of all-time.
May 6, 2019
#MakeoverMonday: Major League Baseball's Most Cost Effective Players
What works well?
- The title and subtitle explain what the viz is about.
- Dividing the viz into two sections by using different background colors on the scatter plots
- Consistent scales for the salaries across the charts for each section
- Using gridlines to help the audience understand the approximate values of each point
- Only labeling the type of stat once by putting the label between the players and teams charts
What could be improved?
- There's no data source listed.
- I have no idea why these players or teams are highlighted; an explanation is needed. At first, I thought it was highlighting the most effective player/team, but it's not (at least that's what I see).
- The logos are meaningless for people that aren't familiar with the teams.
- What does the big logo on the upper right represent? Is that the author?
- The data should be filtered to players that meet certain criteria, like at bats in a season. This would then filter out many players near zero.
Based on Ryan's explanation, I decided to use OBP as my proxy for batter effectiveness (y-axis). For the x-axis, I wanted to use salary for comparison. However, the data does not adjust salaries for inflation, so a salary in 1985 is not expressed in 2016 dollars. Instead, I came up with a way to normalize the data across all of the seasons.
I created a calculation that compares a player's salary to the average salary of the entire league for each season. I made this a percent difference so that the data would be normalized. Therefore, a player who was 10% above the league average salary in 1985 would be comparable to a player who was 10% above the league average salary in 2016.
Here are my calculations:
- Season avg Salary: { FIXED [Season] : AVG([Salary]) }
- △ to Season Average Salary: (AVG([Salary]) - SUM([Season avg Salary])) / SUM([Season avg Salary])
- First, I applied some filters to only include what I deemed "eligible" players. These are noted at the bottom of the viz.
- Now that I have the x-axis (salary variance from season average) and the y-axis (OBP), I created a scatter plot and added a point for each player for each season.
- I added reference lines for the average of each axis.
- The players on the upper left are the most cost-efficient players. That led me to a quadrant chart, but I only wanted to highlight the most cost effective. I created a calculation to determine the points in that quadrant and placed it on the color shelf.
- The problem now was that it was basically impossible to find a player in the viz. I thought about using a set action to drill in to a player, but that loses all of the context of the other players. Therefore, I created a parameter to allow the user to highlight a player, and I show that player as a connected scatter plot.
- Players tend to be more cost effective earlier in their careers. That makes sense since they are on rookie contracts for the first few years of the career.
- Once players sign their first big contract, they tend to either move to the upper right (high OBP, high relative salary) or the bottom right (low OBP, high relative salary).
- Some players can sustain that for the rest of their careers, but that's rare. Typically it's the superstars that follow this pattern (like Barry Bonds or Chipper Jones).
- For many of the other players, as they approach the end of their career, they tend to move either to the lower right (high relative salary, low OBP) or the lower left (low relative salary, low OBP). Neither of these is particularly good for the team.
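As a rough illustration of the salary normalization and the upper-left quadrant flag described above, here is a minimal Python sketch. It is not the Tableau workbook itself: the sample rows, the function names, and the choice of the overall averages as the quadrant boundaries are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: (player, season, salary, obp). Values are illustrative only.
rows = [
    ("Player A", 1985, 200_000, 0.380),
    ("Player B", 1985, 600_000, 0.310),
    ("Player C", 2016, 1_000_000, 0.390),
    ("Player D", 2016, 7_000_000, 0.300),
]

def season_avg_salary(rows):
    """Average salary per season, like { FIXED [Season] : AVG([Salary]) }."""
    by_season = defaultdict(list)
    for _, season, salary, _ in rows:
        by_season[season].append(salary)
    return {season: mean(salaries) for season, salaries in by_season.items()}

def salary_delta(rows):
    """Percent difference between each salary and that season's average."""
    avg = season_avg_salary(rows)
    return [
        (player, season, (salary - avg[season]) / avg[season], obp)
        for player, season, salary, obp in rows
    ]

def cost_effective(points):
    """Points in the upper-left quadrant: below-average relative salary, above-average OBP."""
    x_ref = mean(delta for _, _, delta, _ in points)
    y_ref = mean(obp for _, _, _, obp in points)
    return [(p, s) for p, s, delta, obp in points if delta < x_ref and obp > y_ref]

points = salary_delta(rows)
print(cost_effective(points))
```

Because every salary is expressed relative to its own season, the 1985 and 2016 rows land on the same x-axis scale without any inflation adjustment.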
And here's my final product. I had never thought of combining a scatter plot and a connected scatter plot before. I'm quite pleased with how this turned out.
October 31, 2018
Analyzing Pitcher Performance With Density Heatmaps
To give it a test, I downloaded every pitch for Clayton Kershaw and Justin Verlander (two of the best pitchers in Major League Baseball) from 2008-2018 from the great stats website Baseball Savant. Every time I look at baseball data, I'm amazed at the detail of the stats covered; the data far exceeds anything that is covered in other sports.
After downloading the data, I built the small multiples view below for each pitcher so that I could see their progression through the years. Click on the images for the interactive versions. I love how the data shows me how each pitcher has gotten better with their "misses" through their careers. For example, when they throw sliders for balls, they now tend to miss below the strike zone. This is a great sign that they have command of their pitches and are less likely to miss in an area where the batter can take advantage.
The density heatmap feature will most likely be used by most people on maps, which makes sense, but consider looking at it as an alternative whenever you need to plot x/y coordinates and have lots of points to display.
October 22, 2018
Makeover Monday: Historical Major League Baseball Beer Prices
What works well?
- The title is clear and tells the reader what the data is about.
- The user can sort the data based on their preference.
- The placement of the sort options encourages interaction.
- The rank helps show where a team falls amongst the league.
- The color of the bars goes with the beer theme.
What could be improved?
- The data source is not listed.
- Having labels on the end of every bar makes the viz too busy.
- The beer mug icons are completely unnecessary.
- The font looks very small.
What did I do?
- The new data set has data for 2013-2018 (except 2017), so I wanted to make sure I looked at the data over time.
- Made the title more descriptive so that the user (hopefully) understands what the line represents.
- I borrowed several techniques I learned from Workout Wednesday week 41:
  - Shading those that have increased prices vs. 2013 with a red background
  - Labeling the top middle with the team and the latest price
  - Labeling the end of each line; in WW the labels were all placed on the lower-right of each pane, but I didn't like how it looked in this case
- Ordered the teams from highest to lowest based on the latest price
- Organized the teams in a trellis format so they fit nicely into a 6x5 grid
- Included the data source, my name, and the inspiration for the design
And here's my Makeover Monday week 41. Click on the image for the interactive version. I can't wait to see what everyone creates at MM Live!
February 28, 2018
Workout Wednesday: The MLB Diversity Gap
Suddenly a possible solution popped into my head (I figured it out by hovering again and again over his viz). I'm not going to give away any spoilers. Here's my solution if you get stuck...but give it a solid effort before you look at someone else's solution.
Good luck!
February 5, 2018
Makeover Monday: Did the rise of Latino players signal the decline of African American players?
What works well?
- The x-axis is labeled every 10 years starting with the first year in the data set. This works well since there are 70 years in the data set.
- Labeling the y-axis for every 20% keeps that axis from getting too cluttered.
- The title is straight to the point.
- Placing the legend in the middle of the graph allows the chart to use the entire space.
- Stacking "White" on the bottom is a good choice since it's always the largest segment.
What could be improved?
- As it's stacked bars, it's harder than necessary to determine the percentage that Black and Latino comprise since their position is influenced by the colors below them.
- The bars appear to be of differing widths and that makes it look a bit blurry to me.
- An area chart would be much easier to understand.
- Consider more distinct color choices, particularly for White and Black.
- The visualization doesn't flow well with the accompanying story, which was about the increase in Black players and their more recent decrease. There's no indicator to the audience that this is what the chart is about.
What did I do?
March 1, 2017
Workout Wednesday: World Series Game 7 - Pitch-By-Pitch
I love this graphic! So much information packed in a compact space. But I couldn't find the data anywhere. What I decided to do instead was look at game 7 of the 2016 World Series. It's talked about as one of the greatest games of all time, so I thought I'd create something similar, but on a pitch-by-pitch basis.
I was able to find the data on Brooks Baseball. I then imported it into Google Sheets for each pitcher and then unioned them all in Tableau. I'd recommend you just use the TDE I've created this week as I've removed all of the extra columns you won't need. You can download it here.
Here are the requirements:
- Each inning should be an individual row
- Within each inning, show every pitch from left to right
- The home team (Cleveland Indians) pitched first, so their bars should point up. The visiting team (Chicago Cubs) follows, and their bars should point down.
- Each pitch is color coded based on the outcome - Ball, Strike, or In Play
- The final outcome of each batter should be displayed as a shape and color coded. See the subtitle in my viz. Note that the open circle is filled in the middle with white so that the bar can't be seen through it.
- Match my tooltips
- Include the data source at the bottom
- Match my title and subtitle
- Viz must be a single worksheet
- Viz should be 450x800
- Optional: Match my font, Rubik in this case.
November 30, 2016
How Many Times Have Teams Been to the World Series?
Yesterday I wrote about how much I liked a World Series viz created by Business Insider. One of my favourite ways to learn Tableau, and one I highly recommend to everyone, is to reproduce work that inspires me.
What was most fun about creating this viz is that it’s built completely with ASCII squares. Yes, I use a measure for the axis, but the measure is merely a placeholder. I learned a lot creating this viz this way; basically you can easily create a unit chart without having to densify the data by using a simple calculation that trims the ASCII squares instead. I also included bar charts in tooltips.
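To make the trimming idea concrete, here is a minimal Python sketch of a unit chart built from a string of squares. The names `STRIP`, `MAX_UNITS`, and `unit_bar` are hypothetical, and the cap of 50 is an assumption. In Tableau the same idea would presumably use a string function such as LEFT on a long string of squares, but the workbook's actual calculation is not shown here.

```python
SQUARE = "\u25a0"   # the square character used as one unit
MAX_UNITS = 50      # assumed upper bound on the count being drawn

# Build the longest possible strip once, then trim it per row.
STRIP = SQUARE * MAX_UNITS

def unit_bar(count: int) -> str:
    """Return one square per unit by trimming the pre-built strip."""
    clamped = max(0, min(count, MAX_UNITS))
    return STRIP[:clamped]

# Hypothetical counts, for illustration only.
for team, appearances in [("Team A", 12), ("Team B", 3)]:
    print(f"{team:<8}{unit_bar(appearances)}")
```

The data never needs one row per square; each row carries a count, and the count only decides how much of the strip is kept.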
Download the workbook to see how I did it. In the meantime, here’s my take on the frequency of teams appearing in the World Series.
January 4, 2016
Makeover Monday: Bryce Harper Had the “Most Valuable” Season of Any MLB Player Since 2002

This week we looked at this table from FiveThirtyEight. The main data point in this table and the article is the Surplus Value column. Essentially, FiveThirtyEight uses WAR as a way to calculate a player's value and then compares that to what they were actually paid.
There's nothing particularly terrible about this table. It serves its main purpose: looking up facts. But what it does lack is a simple way to make comparisons between the players and more quickly show the differences between them. Ideally, I want to answer the question: How great was the 2015 season from Bryce Harper?
With this in mind, I created this visualisation. Click on the image to interact, as I have included some highlight actions. However, the view itself can stand alone without the interactivity as well.
November 19, 2015
Philadelphia Has the Worst Sports Teams in North America
How bad are Philly sports teams?
- The Eagles are more or less unwatchable. They’re inventing new ways to lose.
- The 76ers have lost 20+ games in a row. That’s really, really hard to do in the NBA.
- The Flyers couldn’t score if there was no goalie in the opposing net.
- The Phillies…well, they did their best to be one of the worst baseball teams of all-time.
I took the ugly table of numbers from the article and built the interactive dashboard you see below, confirming my worst fears.
This merely confirms the misery that is being a Philadelphia sports fan.
September 28, 2015
Makeover Monday: How Much More Valuable are NFL Franchises than Other Leagues?
Business Insider’s chart is lacking context, so in today’s makeover, I walk you through a few simple methods for adding context to a simple bar chart.
April 28, 2014
Makeover Monday: What a beer will cost you at every Major League Baseball stadium
Anyone that goes to a professional sporting event in the US knows how ridiculously expensive it is to enjoy some frosty goodness at the game. Cork Gaines of Chart of the Day created this bar chart to show the most expensive beers in Major League Baseball stadiums.

The basic problems:
- As always, Cork has sorted the chart in the wrong order. Sorting should be based on what you want to emphasize. In this case, the story is about the most expensive beers, so the bars should be sorted in descending order.
- A horizontal bar chart would be much easier to read.
- Since the beers are not all the same size, it might make more sense to show an alternative view of cost per ounce.
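The sorting and cost-per-ounce points above can be sketched in a few lines of Python. The stadium names, prices, and sizes are made up for illustration and are not taken from the data set.

```python
# Hypothetical rows: (stadium, price in USD, size in ounces). Not real data.
beers = [
    ("Stadium A", 8.00, 16),
    ("Stadium B", 7.50, 12),
    ("Stadium C", 5.00, 20),
]

def price_per_ounce(price: float, size_oz: float) -> float:
    """Normalize the price so different cup sizes can be compared."""
    return price / size_oz

# Sort descending so the most expensive beer comes first, matching the story.
by_price = sorted(beers, key=lambda b: b[1], reverse=True)
by_ounce = sorted(beers, key=lambda b: price_per_ounce(b[1], b[2]), reverse=True)

for stadium, price, size in by_ounce:
    print(f"{stadium}: ${price_per_ounce(price, size):.2f} per oz")
```

In this made-up sample the ranking changes once size is taken into account, which is the reason to offer both views.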
Here's my alternative, created with Tableau for Mac.
I've not only addressed the issues I outlined, but I've also made it interactive. You can now answer more questions. Perhaps you're more interested in where you can find the cheapest beer or the best deal (per ounce). This is a much more informative version than Cork's.
Have a better way to display this data? Download the data here and the workbook here and leave a link in the comments.
May 16, 2012
Is drug testing working in baseball? An interactive analysis.
Cork Gaines wrote about the HR trend in baseball since testing started for performance enhancing drugs. He presented a chart of the trend (surprisingly effective given his past charts), but he never answered his own question: is testing working?
One way to determine the answer is through comparisons to other statistics.
I downloaded the season averages across both leagues and MLB in total from baseball-reference and built this interactive analysis. The stats are ordered by batting stats, then pitching stats.
This viz allows you to compare home runs to many other statistics through the selectors at the top right. In addition you can:
- View any two statistics to look for trends by choosing a primary measure and a comparison
- Filter the time frame to all years, the pre-testing era, and the testing era (1993+)
- Filter the leagues to focus your analysis
- Click on a league at the bottom to highlight that league
In this initial view of HR vs. ERA, I see a couple of things:
- HR are on a slow descent in the testing era, especially since 2000
- ERA is in a similar decline, possibly indicating that improved pitching has had as much of an impact as testing
- Batting Average has remained flat. This means that the reduction in HR has not impacted BA.
- Teams are simply scoring fewer runs, likely due to the reduction in long balls
- The trend in complete games is despicable
What do you see? Play around with the different stats and see if you can draw any conclusions.
April 4, 2012
Let your Umpire Ejection Fantasy League preparations begin!
If you are interested in fantasy baseball, but want a slightly different take on things, I have just the game for you. Join the Umpire Ejection Fantasy League. I stumbled across this league as I searched for managerial ejection data, but found this umpire-specific data much more interesting. You can download the data here.
Living here in Atlanta for the past 15 years, I’m more than aware of the historical significance of Bobby Cox’s ejections (he’s the all-time leader…or is it last place?), but I wanted to know which umpires draw the most ire from managers today (Bobby retired after the 2010 season).
To help you with your UEFL draft, take a tour of this interactive viz. There are filters on the right side of each sheet to assist you with your own analysis.
The Ejections Summary gives a quick overview of:
- The spread of ejections across innings – not surprisingly most ejections occur towards the end of the game
- How umpires perform as a whole at the different segments of the game – they’re correct more often at the end of the game as well, leading directly to more ejections
- The top 5 reasons for ejections – arguing balls and strikes is an automatic ejection, so there’s no surprise it’s #1
The 2nd sheet, Who to Argue With & When, helps you isolate the specific time when you are most likely to benefit (or not) from an argument. In particular, I like the bar chart on the bottom right. This chart tells you the best umpire and time to get ejected if you want to turn the game from a losing position into a winning outcome.
The last sheet, Which Umpires Eject the Most, is a simple list of the umpires most likely to eject someone and the managers most likely to get ejected. Click on any manager or umpire to see who they get in the most arguments with.
Good luck in your draft!
March 4, 2012
Are teams benefiting from relievers pitching less? A visual analysis.
If you love baseball and particularly if you love baseball stats, you need to follow FanGraphs. The depth of the analysis is simply incredible, but one of the things I find lacking is visual analysis. There are often tables and some rudimentary charts, but I think the writing could be enhanced by adding some viz to the terrific explanations of the numbers.
Recently they wrote about the use of relief pitchers in Major League Baseball and whether adding depth to the bullpen resulted in a strong ROI. In this post, I’m going to quote directly from the article, but all of the charts and graphs that supplement the words were created by me.
All of the data that I used can be found here and the Tableau workbook I used to create the charts can be found here.
Batters Faced per Game
“The change in bullpen usage is the biggest difference in the sport now compared to 30 years ago.”
“Despite the fact that modern bullpen roles have been well established for quite a while, the dwindling rate of batters faced per appearance shows no signs of slowing down. While the drop from 1982-1991 was the most extreme, the last two decades have each seen the league shed an additional half a batter per reliever appearance, and given that we’ve seen teams now expand to carrying 13 pitchers at times, there seems to be no end in sight to this trend.”
The article only provides a table, and if the writer had not included the analysis in words, there’s no way anyone would have been able to identify this trend by scanning a list of numbers.
The chart above is broken down by decade and year and includes three methods for analyzing batters faced per game (BF/G).
- BF/G (top lines) – This is simply a trend of batters faced per game over the last 30 years. As the writer points out, the drop in the first decade is the most extreme (1.6 BF/G decline), but the last two decades have each declined more than a half batter (0.6 and 0.5 BF/G respectively).
- BF/G vs. 1982 (middle lines) – I wanted to understand how drastically the number of batters faced per relief appearance has really changed from 30 yrs ago. The numbers and trend are truly staggering: a 19.8% decline by 1991, an additional 12.3% decline by 2001, and another 7.5% decline through 2011. That all adds up to an almost 40% decline.
- BF/G vs. Start of Decade (bottom lines) – This is similar to #2 except the calculation “resets” each decade. The idea here is to measure how much the BF/G rate has changed within the decade. If the –11% trend from the last two decades continues, you can expect relief pitchers to be facing fewer than four batters per appearance by 2021. So basically, every pitcher would be treated like a closer.
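The two comparison methods above (change versus 1982 and change versus the start of each decade) can be sketched in Python as follows. The BF/G values are placeholders rather than the real figures, and the function names are hypothetical. The decade boundaries (1982, 1992, 2002) follow the ranges used in this post.

```python
# Hypothetical BF/G values keyed by season, for illustration only.
bfg = {1982: 8.0, 1991: 6.4, 1992: 6.0, 2001: 5.4, 2002: 5.0, 2011: 4.5}

FIRST_SEASON = 1982

def decade_start(year: int) -> int:
    """First season of the 10-year block containing `year` (1982, 1992, 2002, ...)."""
    return FIRST_SEASON + (year - FIRST_SEASON) // 10 * 10

def pct_change(value: float, baseline: float) -> float:
    """Relative change of `value` against `baseline`."""
    return (value - baseline) / baseline

def vs_first_season(series: dict) -> dict:
    """Every season compared with 1982."""
    base = series[FIRST_SEASON]
    return {year: pct_change(v, base) for year, v in series.items()}

def vs_decade_start(series: dict) -> dict:
    """Every season compared with the first season of its own decade."""
    return {year: pct_change(v, series[decade_start(year)]) for year, v in series.items()}

for year, change in vs_decade_start(bfg).items():
    print(year, f"{change:+.1%}")
```

The only difference between the two methods is the baseline: a fixed season in the first case, and a baseline that resets every ten seasons in the second.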
Wow! Bullpen strategy sure has changed!
Walk and Home Run Rates
“With pitchers facing fewer batters, you’d expect them to be able to throw harder and exploit platoon advantages for better results overall. The trade-off should be more quality for less quantity.”
“Looking at the numbers, we don’t really see much evidence that the modern bullpen has helped relievers perform better at all.”
- “Over the last thirty years, walk rates by relievers are essentially unchanged. They went up a bit when the home run barrage took over the late-1990s, but have gone back down as home runs have become less common.” (top two lines)
- “The ratio of walks to home runs is pretty steady and consistent over the last thirty years, and there’s certainly no evidence that the modern day bullpen has helped pitchers avoid the base on balls.” (bottom lines)
Strikeout Rates
“On the other hand, strikeout rate has skyrocketed, increasing by 40% since 1982. This would seem to support the idea that relievers can be more effective in shorter stints, and that playing the match-ups can help prevent run scoring.”
I have broken down the strikeout rates similarly to the BF/G rates.
- K% (top lines) – This is simply a trend of strikeout percentage over the last 30 years. K% has been on a steady increase of about 2-3% over each of the last three decades.
- K% vs. 1982 (middle lines) – As the writer noted, the strikeout rate has increased 40% since 1982, with the biggest increase from 1992-2001 of 18.4%. But he also goes on to explain this:
“While (starting pitchers’) strikeout rate has been raising at the same time that the modern bullpen has been evolving, this seems to be a case where correlation is not causation. If starters are seeing the same rise in strikeout rate, that points to a more fundamental shift among hitters – more sluggers swinging for the fences, the rise in acceptance of the strikeout as just another out among organizations – rather than a specific benefit being given to relievers from their new roles.”
- K% vs. Start of Decade (bottom lines) – Again, this is similar to #2 except the calculation “resets” each decade. This provides stronger evidence of the “swinging for the fences” effect of the late-1990s; strikeout rates increased 19% from 1992-2001.
BF/G vs. K%
We’ve seen the writer discuss BF/G and K% rates, but do these two have a relationship? When I look at relationships between two stats, I like to look at them two ways: (1) a dual axis line chart and (2) a scatter plot.
The strikeout rate for relievers is clearly correlated to batters faced per appearance. As BF/G goes down, K% goes up. This is clear and easy to understand in both of these charts. This would have been a nice nugget for the writer to include.
BABIP and HR Rates
“Likewise, it doesn’t appear that relievers are really generating much of a benefit when hitters do put the bat on the ball.”
The writer makes a few notes about the stats, but I don’t agree completely.
- “Home run rates have risen at a similar rate as what starting pitchers have experienced.” Ok, nothing to argue with here. I have to take his word for it since I don’t have the data for starting pitchers.
- “Batting average on balls in play has increased significantly over the years.” I subtly disagree here. BABIP has only increased 11 points or 4%. Is that significant? I don’t think so.
Let’s extend the analysis a bit further. Let’s look again at the relationship between the two stats to find correlations.
When looking at the two lines together, there isn’t a clear correlation like the obvious inverse relationship between BF/G and K%. What is interesting, though, is what happens when you plot them on a scatter plot. I added the averages across all 30 years to each axis to make a nice quadrant chart. The R-Squared is only 0.688, but what sticks out to me is how, for the most part, the years within each decade cluster together nicely.
- Nine of ten years from 1982-1991 were below average for both BABIP and HR/9, with the tenth year also below the average BABIP.
- Eight of ten years from 1992-2001 were above the average BABIP with seven of those years also above the average HR/9 (remember, there was a significant increase in HRs in the late-1990s). Note how much higher the BABIP and HR/9 rates are for the years above average.
- From 2002-2011, nine of ten years were above average for both BABIP and HR/9, but not nearly as far above average as 1992-2001. Notably, the HR/9 rate fell from 1.03 in 2006 to 0.85 in 2011, a 17.5% decrease (this can be seen in the line chart).
ERA- and FIP-
“If you look at (ERA- and FIP-), there’s just no evidence that bullpens are preventing runs at a better rate now than they were before the current roster construction norms came along. Any improvements in quality of performance by the elite relievers have been offset by the fact that more innings are now being given to inferior arms, so the trade-off has essentially resulted in a change of no real benefit.”
If you truly trust the writer, then your only choice is to take him at face value here. Me though, I like to “see” the data. I’ve done a couple of things here to quantify the data, but first, two notes.
- For ERA- and FIP-, values below 100% are better than the league average. The lower the number, the better. Think of them like an index. If the ERA- is 95%, then that means it’s 5% better than the league average.
- Notice that the axes range from 80-120%. This was done to emphasize the lack of significant year to year variances.
For this particular chart I have:
- Added a reference line at 100% to remind us that this is the average
- Synchronized the axes so that you can see how ERA- and FIP- compare to each other
- Added color bands below and above the average to indicate levels of goodness and badness. That is, the darker the green, the better, and the darker the tan, the worse.
Now, after having “seen” the data, I agree with the writer that “there’s just no evidence that bullpens are preventing runs at a better rate now than they were before the current roster construction norms came along.”
So what do you think? Do these charts and graphs make it easier to interpret the stats? Do they help tell the story more effectively?