Baseball Analytics Proves a Theory


My best friend and I took in a game at Wrigley Field in August. As the Cubs pounded a weak opponent that day, we both hypothesized that at the end of the season the better teams pick up the pace, while the others sink even lower. The Cubs, for example, struggled early against division foe – and doormat – the Milwaukee Brewers: at the end of July, their record against Milwaukee stood at 4-4. From August on, though, they were 10-1.

There were several conjectures fueling the hypothesis, not the least of which were economics and having something to play for. And don’t forget the end-of-July trading deadline, when bad teams become sellers and good teams become buyers.

Not content with the anecdotal evidence alone, I decided to take a quick look at some data to see whether analytics supported our hypotheses. Fortunately, there’s no shortage of baseball data available for sabermetrics wannabes like me.

Retrosheet (retrosheet.org) shares a wealth of baseball data, including game logs for every regular-season game. With the game files in hand, I decided to test the good-team/bad-team hypotheses as follows. I’d first load data from the 1960 through 2014 seasons, a full 55 years. Then, for each season, I’d determine the top four and bottom four teams in the American and National Leagues. Next, I’d aggregate the winning percentages of the top and bottom groups for each season, by league, before and after July 31. The thinking is that the top-team group would have a higher winning percentage after July 31 than before, while the laggard group would reverse that pattern. Fifty-five seasons and two leagues would afford me 110 measurements to test each hypothesis.
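The steps above can be sketched in a few lines of data.table. This is a minimal illustration, not my actual analysis code: it assumes the Retrosheet game logs have already been reshaped into one row per team per game, and the column names (`season`, `league`, `team`, `date` as a yyyymmdd integer, `win` as 0/1) and the function name are assumptions for the sketch.

```r
library(data.table)

# Sketch of the season-split methodology. Input `games` is assumed to be
# one row per team per game with columns: season, league, team,
# date (yyyymmdd integer), win (1 or 0).
season_splits <- function(games) {
  # Rank teams within each season and league by full-season winning pct.
  season_pct <- games[, .(pct = mean(win)), by = .(season, league, team)]
  season_pct[, rank := frank(-pct, ties.method = "first"),
             by = .(season, league)]
  season_pct[, group := fifelse(rank <= 4, "top",
                        fifelse(rank > .N - 4, "bottom", NA_character_)),
             by = .(season, league)]

  # Keep only games belonging to the top-four and bottom-four groups.
  tagged <- merge(games,
                  season_pct[!is.na(group), .(season, league, team, group)],
                  by = c("season", "league", "team"))

  # Split each game at the July 31 trading deadline.
  tagged[, period := fifelse(date %% 10000L <= 731L, "before", "after")]

  # Aggregate group winning percentage on each side of the deadline.
  agg <- tagged[, .(pct = mean(win)), by = .(season, league, group, period)]
  dcast(agg, season + league + group ~ period, value.var = "pct")
}
```

The result is one row per season, league, and group, with `before` and `after` winning-percentage columns ready for differencing.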

I used R running under Jupyter Notebook to perform the computations, leaning heavily on R’s data.table and ggplot2 packages. Ultimately, as with most R analyses, the results are summarized in statistical graphics.

With the simple methodology I deployed, it turns out there is evidence that “top” teams perform better after July 31 while “bottom” teams do worse. Consider graph 1, a “stripplot” that depicts the differences between pre- and post-July 31 performance for both the top and bottom team groups. Over the 55 seasons, the top-team group had a higher winning percentage after July 31 than before 70% of the time. Similarly, 66% of the bottom-feeder groups had lower winning percentages late in the season. The hypotheses – good teams playing better late, bad teams playing worse – held in every decade. And do the stronger differences in the American League versus the National League have anything to do with the designated hitter?
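A stripplot like graph 1 is easy to sketch with ggplot2. The data here are simulated stand-ins, not the real measurements: one made-up row per season/league/group with the after-minus-before difference in winning percentage.

```r
library(ggplot2)

# Illustrative, simulated stand-in for the 110 measurements per group.
# (Values are made up for the sketch; the real data come from the
# season-split computation.)
set.seed(1960)
diffs <- data.frame(
  group = rep(c("top", "bottom"), each = 110),
  diff  = c(rnorm(110, 0.02, 0.05), rnorm(110, -0.02, 0.05))
)

# geom_jitter gives the stripplot look: one point per measurement,
# with a dashed reference line at zero.
p <- ggplot(diffs, aes(x = group, y = diff, colour = group)) +
  geom_jitter(width = 0.15, alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = NULL, y = "winning pct, after minus before July 31")
```

Points above the zero line are season-league measurements where a group improved late in the season; below the line, where it faded.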

Now consider graph 2, which details the “difference in differences” so beloved by social scientists. D-in-D subtracts the bottom-group differences computed in the previous step from the top-group differences. Fully 75% of the time, the top group’s late-season change in winning percentage exceeded the bottom group’s. If this were a quasi-experiment, we’d have pretty strong evidence of a treatment effect.
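The D-in-D step itself is just one subtraction per season-league measurement. A minimal sketch, with illustrative made-up values standing in for the per-measurement differences from the previous step:

```r
# Hypothetical after-minus-before differences in winning percentage,
# one entry per season/league measurement, for each group.
# (Values are invented for illustration.)
top_diff    <- c(0.031, -0.012, 0.054, 0.008)
bottom_diff <- c(-0.020, 0.015, -0.041, -0.006)

# D-in-D: how much more the top group's record changed than the
# bottom group's over the same split.
d_in_d <- top_diff - bottom_diff

# Fraction of measurements favoring the hypothesis (d_in_d > 0).
share_positive <- mean(d_in_d > 0)  # 3 of these 4 values -> 0.75
```

On the real 110 measurements, this `share_positive` figure is the 75% cited above.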

I’d classify this fun exercise as little more than a preliminary exploration into whether “top” teams demonstrate enhanced performance after July 31 while “bottom” teams regress. There are certainly more rigorous approaches to consider, but this quick effort sated my curiosity to bring data to bear on the conjectures. Plus, it was a lot of fun.

For long-suffering Chicago Cubs fans, the morning of October 14 is no doubt a giddy respite from a bleak 100-plus-year history. For at this moment we stand as victors in the playoff series against the top-winning, and despised, St. Louis Cardinals. The Cubs are now in the National League Championship Series, one step away from the World Series.