DIGGING DATA: Scouting and Statlines

A recent Twitter discussion we were in revolved around just how much weight an owner can put on Rookie or Short-Season statistical performances. Any scout will tell you that you cannot rely on stat lines to evaluate a player’s performance. There are many pitfalls in the stats. Small sample size mirages. Wide gaps in run-scoring environments, official scoring decisions, and even playing surfaces within leagues. And even bigger gaps in talent, especially in the Rookie leagues where teams can struggle to find competent pitchers and catchers to put on the field.

John Eshleman, senior evaluator for 2080 Baseball, coined a new phrase to keep in mind when looking at prospects at lower levels: “Stat the Scoutline.”

But why can’t we just rely on the stats? Well, our data guru Chip Bourne ran the numbers yesterday and found some interesting trends.

First, Chip analyzed a five-year sample (2008-2012) of MiLB seasons. Using wRC+ (more info on wRC+ is available on our Farming 101 page), he counted the individual player seasons in which a hitter posted a wRC+ of 150 or better. Rookie ball produced 230 such seasons, and those 230 seasons produced only 16 players who went on to post at least one MLB season above 2.0 WAR. The numbers also show that as a player progresses to A+ and AA ball, 150+ wRC+ seasons become rarer, and a big wRC+ season becomes more indicative of future success.
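The Rookie ball hit rate works out to roughly 7%, or about 1 in 14. A minimal Python sketch of that arithmetic follows. It treats the 16 players as successes out of the 230 seasons, which is only an approximation if any player had more than one 150+ season; `hit_rate` and the constant names are hypothetical, not part of Chip's analysis.

```python
# Rookie-ball figures quoted in the text (2008-2012 sample).
ROOKIE_150_PLUS_SEASONS = 230  # seasons with a wRC+ of 150 or better
LATER_2_WAR_PLAYERS = 16       # players from those seasons who later had a 2.0+ WAR MLB season


def hit_rate(successes: int, total: int) -> float:
    """Share of standout minor-league seasons that led to a 2.0+ WAR MLB season."""
    if total <= 0:
        raise ValueError("total must be positive")
    return successes / total


rate = hit_rate(LATER_2_WAR_PLAYERS, ROOKIE_150_PLUS_SEASONS)
print(f"Rookie ball hit rate: {rate:.1%}")  # prints "Rookie ball hit rate: 7.0%"
```

The same function could take the A+ and AA counts, but those figures aren't quoted in this post, so they are left out.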

Digging deeper, Chip ran a regression on MiLB seasons from 2008-2018, including every season with at least 50 ABs in that time frame. He compared a hitter’s wRC+ at a particular level to his later wRC+ at higher MiLB levels. In general, the higher the r-squared, the stronger the linear relationship between the two results and the more predictive the earlier one is of the later one. For our models (analog systems with human behavior as the input), anything under 10% is noise. Once we get a result over 10%, we start taking notice.
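To show what the r-squared in this kind of regression measures, here is a minimal Python sketch, assuming a simple one-predictor linear regression of later wRC+ on earlier wRC+. The `PairedSeason` record, the helper names, and the toy numbers in the note below are hypothetical; Chip’s actual dataset and tooling aren’t described here. With a single predictor, r-squared equals the square of the Pearson correlation, which is what `r_squared` computes.

```python
from dataclasses import dataclass
from typing import Sequence

MIN_AB = 50  # mirrors the 50-AB minimum described above


@dataclass
class PairedSeason:
    """Hypothetical record: one lower-level season plus the hitter's later higher-level wRC+."""
    ab_lower: int       # at-bats in the lower-level season
    wrc_lower: float    # wRC+ at the lower level
    wrc_higher: float   # later wRC+ at the higher level


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """r-squared of a least-squares line of ys on xs (equals Pearson r squared)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the series has no variance")
    return (sxy * sxy) / (sxx * syy)


def level_to_level_r2(seasons: Sequence[PairedSeason], min_ab: int = MIN_AB) -> float:
    """Drop seasons below the AB cutoff, then regress higher-level wRC+ on lower-level wRC+."""
    kept = [s for s in seasons if s.ab_lower >= min_ab]
    return r_squared([s.wrc_lower for s in kept], [s.wrc_higher for s in kept])
```

With toy inputs such as `PairedSeason(200, 100, 100)`, `PairedSeason(150, 120, 110)`, `PairedSeason(80, 140, 120)` and `PairedSeason(30, 200, 40)`, the 30-AB season is dropped by the cutoff and the remaining three points fall on a straight line, so the result is 1.0. Real level-to-level data sits far from a perfect line, which is why values just above 10% are worth noticing.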

The results show that a player’s wRC+ in Rookie ball doesn’t correlate well with his wRC+ at any later stop in his MiLB career. In other words, according to this research, Rookie ball stats are just noise. Also notice that as a player advances levels, the correlations tend to improve. For example, a player’s wRC+ at AA has an r-squared of 11% with his wRC+ at AAA, just over our 10% threshold.

Another thing to note: when we get a forward-looking r-squared greater than 10%, it only extends one level ahead. In other words, a batter’s performance in A-Advanced ball can tell you something about how he will perform at AA (15.0%) but is just noise when looking ahead to how he will perform at AAA (7.4%).
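The 10% rule of thumb can be written as a small Python check against the r-squared values quoted in the text. This is a sketch, assuming a strict “greater than 10%” cutoff; `NOISE_THRESHOLD`, `QUOTED_R2`, and `is_signal` are hypothetical names, and only the three level pairs whose figures appear in this post are included.

```python
# Threshold from the article: an r-squared under 10% is treated as noise.
NOISE_THRESHOLD = 0.10

# r-squared values quoted in the text: (lower level, higher level) -> r-squared.
QUOTED_R2 = {
    ("AA", "AAA"): 0.11,
    ("A-Advanced", "AA"): 0.150,
    ("A-Advanced", "AAA"): 0.074,
}


def is_signal(r2: float, threshold: float = NOISE_THRESHOLD) -> bool:
    """Return True when an r-squared strictly exceeds the noise threshold."""
    return r2 > threshold


if __name__ == "__main__":
    for (lower, higher), r2 in QUOTED_R2.items():
        label = "signal" if is_signal(r2) else "noise"
        print(f"{lower} -> {higher}: {r2:.1%} ({label})")
```

Running it labels A-Advanced to AA and AA to AAA as signal and A-Advanced to AAA as noise, which is the one-level-ahead pattern described in the text.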

Basically, the data says don’t buy into stats too much, especially at the lower levels. While the stat lines are fun to follow, most of what you’re watching is not predictive of future success. Rely on trusted scouting reports for the lower levels. Start paying more attention to stats at A-Advanced ball. And reset your expectations on a player each year. A single good or bad statistical season doesn’t necessarily show you the future trajectory of his career.
