|
The Diamond Mind Projection System

Written by: Tom Tippett
Last updated: February 24, 2005
In the 1985 Baseball Abstract, Bill James introduced two very important
tools for predicting the future performance of baseball players. The first
was the Major League Equivalency, or MLE, which demonstrated that it was
possible to use AAA stats to predict how a player is most likely to perform
at the major-league level. The second was the detailed description of
his Brock2 system for projecting future performance based on past performance
and the aging process (improving in the early years, peaking in the middle,
and declining thereafter).
Since then, many others have taken these ideas and implemented projection
systems of their own, most often for the purpose of helping fantasy baseball
players prepare for their fantasy drafts. You can now buy projections
from fantasy advisor services and you can find them online and in many
books and periodicals.
When we began our work on our Projection Disks in 1998, we had every
intention of licensing projected stats from a company that was already
in the business of preparing them. There didn't seem to be much point
in reinventing the wheel when we could instead focus on the game software,
the team rosters, and manager profiles. But we quickly learned that these
projections didn't meet the special needs of a full-season simulation,
for a variety of reasons:
- we needed projections for over 1500 players, including many players
who had yet to make their major-league debuts, and most of the other
sources don't cover that many
- we wanted to use stats from the majors, Japan, AAA, AA, and A ball to make
sure we had as much playing time as possible on which to base our projections.
Some projection systems ignore minor league stats or go down only as
far as AAA.
- to support a full-blown simulation, we needed to project many more
statistical categories than the others provided
- we needed to project left/right splits as well as overall totals for
all batters and pitchers
So we built our own projection system. Actually, we expanded on a system
that we originally developed in 1994. When that season came to a premature
end, the TOPPS baseball card company hired us to simulate the missing
games so they could produce CyberCards with full-season stats (real-life
stats through August 11 and simulated stats for the rest of the season).
To their credit, they wanted to include prominent minor-leaguers (such as Derek
Jeter) who would have been called up in September had the season
continued. So we developed a method for projecting major-league performance
from minor-league statistics.
The Diamond Mind projection system is based on Bill James' MLE and aging
ideas, though it uses different and more advanced formulas than those
in the 1985 Baseball Abstract. We can do better because of the
explosion in available data at both the major- and minor-league levels.
Here are the key elements in our system:
- we use both minor-league and major-league statistics from the past
three seasons, ensuring that virtually all players have a large amount
of playing time on which to base the projections
- we use statistics from AAA, AA and A ball, plus the Japanese leagues, adjusting them all to their major-league
equivalents. Bill James' published formulas cover AAA adjustments only,
so we created our own adjustments for lower levels and for Japan.
- all stat lines are evaluated with respect to league averages. This
does two important things. First, it makes sure that stats from hitter-friendly
leagues are suitably deflated. Second, it ensures that pitchers who
faced the DH and those who didn't are evaluated properly. The DH adds
roughly a third of a run per game, and if one doesn't take this into
account, NL pitchers would be rated better than their AL counterparts
of equal ability.
- all stat lines are adjusted for ballpark effects, including the minor-league
parks. The published Bill James methods do not take minor-league park
factors into account because those factors simply weren't available at the time.
- recent performances are weighted more heavily. What a player did last
year is more important than what he did three years ago.
- performances at higher levels are weighted more heavily. What a player
did in the majors is much more important than what he did in AA.
- stat lines with more playing time are weighted more heavily. If someone
batted .375 in 24 at-bats, that doesn't matter nearly as much as what
he did in 400 at-bats at some other stop along the way.
- the individual league- and park-adjusted stat lines are averaged (using
the weights just discussed), then age-adjusted to produce a set of projected
stats that are league- and park-neutral
- adjustments are made to account for players whose stats are misleading
because they were playing with an injury
- additional adjustments are made for pitchers with unusually good or
bad rates of hits allowed on batted balls that stay in the park, because
recent research has shown that extreme in-play batting averages tend
not to be repeated in the future
- these neutral projections are then applied to the league and park
in which the player will compete in the coming season
- projected left/right splits are based on each
player's composite splits for the past three seasons
- the distribution of fly balls and ground balls is based on actual
ratios compiled by each player in the past three seasons
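To make the weighting steps in the list above concrete, here is a minimal
sketch for a single statistic (batting average). It is an illustration only,
not the Diamond Mind formulas, which are not given in this article. The
weight tables, the StatLine fields, and the simple divide-by-park-factor
adjustment are all assumptions chosen for readability, and the
major-league-equivalent translation and the age adjustment are left out.

```python
from dataclasses import dataclass

# Hypothetical weights for illustration; the real system's values are not published here.
RECENCY_WEIGHT = {0: 3.0, 1: 2.0, 2: 1.0}          # seasons ago -> weight
LEVEL_WEIGHT = {"MLB": 1.0, "AAA": 0.7, "AA": 0.5, "A": 0.3}


@dataclass
class StatLine:
    seasons_ago: int     # 0 = most recent season
    level: str           # key into LEVEL_WEIGHT
    at_bats: int
    avg: float           # raw batting average for this stop
    league_avg: float    # league batting average at that level and season
    park_factor: float   # 1.0 = neutral, above 1.0 favors hitters (assumed convention)


def neutral_index(line: StatLine) -> float:
    """Express a stat line relative to its league with the park effect removed."""
    return (line.avg / line.park_factor) / line.league_avg


def line_weight(line: StatLine) -> float:
    """Combine recency, level, and playing time into one weight."""
    return RECENCY_WEIGHT[line.seasons_ago] * LEVEL_WEIGHT[line.level] * line.at_bats


def project_neutral_index(lines: list[StatLine]) -> float:
    """Weighted average of the league- and park-adjusted stat lines."""
    total_weight = sum(line_weight(line) for line in lines)
    return sum(neutral_index(line) * line_weight(line) for line in lines) / total_weight


def apply_to_context(index: float, target_league_avg: float, target_park_factor: float) -> float:
    """Re-express a neutral index in the league and park for the coming season."""
    return index * target_league_avg * target_park_factor


if __name__ == "__main__":
    history = [
        StatLine(0, "MLB", 400, 0.280, 0.265, 1.00),
        StatLine(1, "AAA", 24, 0.375, 0.270, 1.05),
    ]
    index = project_neutral_index(history)
    print(round(apply_to_context(index, 0.262, 0.98), 3))
```

In this sketch the 24 at-bat line barely moves the result, which is the
behavior the playing-time weighting is meant to produce. A multiplicative
weight is only one possible way to combine the three factors.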
That's the essence of the system. The other projection systems we looked
at make some of these adjustments, but we're not aware of any that make
them all. And we think it's necessary to make them all in order to evaluate
past performance correctly and to support a realistic simulation of a
season.
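As a rough illustration of the in-play adjustment for pitchers mentioned in
the list of key elements, one common way to temper an extreme rate is to
blend it with the league rate. The article does not say how Diamond Mind
makes this adjustment, so the shrinkage formula and the stabilizer value
below are assumptions, not the actual method.

```python
def regress_in_play_average(hits_in_play: int, balls_in_play: int,
                            league_rate: float, stabilizer: float = 2000.0) -> float:
    """Shrink an observed in-play batting average toward the league rate.

    `stabilizer` is a hypothetical number of league-average balls in play
    blended into the pitcher's record; larger values pull harder toward the league.
    """
    return (hits_in_play + stabilizer * league_rate) / (balls_in_play + stabilizer)


if __name__ == "__main__":
    # A pitcher who allowed a .250 average on 600 balls in play in a .290 league.
    print(round(regress_in_play_average(150, 600, 0.290), 3))
```

The smaller the pitcher's sample of balls in play, the closer the adjusted
rate sits to the league figure.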
|