Evaluating Defense 
By Tom Tippett
December 5, 2002
Some people argue that it's impossible to measure the defensive performance
of baseball players because the statistics available for that purpose
are woefully inadequate. If you're talking about traditional fielding
stats -- games, putouts, assists, errors, double plays -- I wouldn't go
so far as to say that it's impossible, but I would agree that it's not
easy.
In this article, we'll look at those traditional fielding stats and talk
about what you can and cannot learn from them. We'll look at more modern
fielding statistics such as Pete Palmer's Fielding Runs, the zone ratings
from STATS Inc., and Bill James' Win Shares. As the providers of a computer
baseball game, one of our ongoing tasks is rating players in all phases
of the game, including defense, and we'll talk about how we use detailed
play-by-play data from STATS to improve our understanding.
Even with these advances, evaluating defense is not an exact science.
If you're a the-glass-is-half-empty sort of person, you could take that
to mean it's not worth the effort. But I believe the availability of play-by-play
data has raised the level of the water so the glass is now about 90% full,
and if you're interested in joining me for a little stroll through the
evolution of fielding analysis, I think you'll end up with a better idea
of what we can and cannot learn about defense.
Baseball analysis in general
The idea of using statistical measures to assess the ability to succeed
in a certain phase of the game is not a radical one. Baseball people have
been doing this for over a century to measure batting and pitching performances.
They don't, after all, give the batting title to the guy with the prettiest
swing, they give it to the player who hit for the highest average. They
don't give the Cy Young to the pitcher with the best mechanics or the
guy who throws the hardest, they give it to the one who was deemed to
be most effective. They look at results, not form or effort or attitude
or any of the other things that a player brings to the game.
But for the most part this tradition has extended only to hitting and
pitching. Today's announcers and analysts make increasing use of modern
measures like on-base percentage and inherited runners to shed more light
on those areas of the game, but you never hear a television or radio analyst
talk about meaningful measures of baserunning, throwing or defense. Instead,
they talk about their impressions of the player -- how fast he looks,
his quickness, strength and athleticism -- and say simplistic things like
"they're the best fielding team in the league because they lead in
fielding percentage."
Because we do our own analysis, we sometimes find players whose performance
is better or worse than you would guess by watching them a few times a
year. And while most of our ratings are consistent with the opinions expressed
by baseball's leading writers and TV personalities, sometimes we conclude
that a player is actually performing at a higher or lower level than his
reputation would suggest.
Because we try very hard to provide the most accurate and realistic baseball
simulation available, we can't afford to give in to public opinion and
rate someone higher than his performance justifies. If we did that for
defensive ratings, we'd have these options:
- reduce the rating of one of his teammates so the team's defense isn't
overrated
- reduce the effectiveness of the team's pitchers to compensate for
the extra plays this player will now make in the simulated season
- disregard these side effects and allow the player, the team, and its
pitchers to produce better results than they should
We don't think it's fair to downgrade teammates so we can give a popular
player a better rating than he deserves. And we don't think our customers
would want us to disregard the side effects and publish a season disk
with players and teams who will overperform. So we do our best to rate
players based on their actual performance.
Judging by Watching
For a few years now, I've wanted to write a little piece about how difficult
it is to judge defensive ability, or any baseball skill for that matter,
just by watching a lot of games. Then I found an essay by Bill James in
his 1977 Baseball Abstract (a self-published book that predated his debut
in bookstores by about five years) that says it far, far better than I
ever could.
Here are a few excerpts from this wonderful essay, starting with a comment
on how differently most people tend to approach the assessment of hitters
and fielders:
"While we might not all be able to agree who the
greatest-hitting first baseman ever was, the record books will provide
us with a reasonably brief list to choose from: Gehrig, Anson, Foxx,
Sisler. That's about it. Nobody's going to argue that it was Joe Judge
or Moose Skowron, because the record books simply will not permit it
. . .
Fielding statistics provide no such limited clarity. Talk about the
greatest fielding shortstops ever . . . and the basic argument for everybody
is 'One time he made a play where...'
Suppose we turn that same argument back to hitting. Now Moose Skowron
hit some baseballs a long way, but nobody is going to say that he was
the greatest hitting first baseman ever because 'One time I saw him
hit a baseball so far that...' It is understood, about hitters,
that the important question is not how spectacularly but how often.
Brooks Robinson is known as a great fielding third baseman not because
of the number of plays that he makes, but because he looks so good making
them. Nobody talks anymore about what a great hitter Jim Northrup was,
although to tell you the truth I never saw anybody who looked
better at the plate. It is understood that, notwithstanding appearances,
he wasn't an especially good hitter. Hitters are judged on results;
fielders, on form."
And he talks about the difficulty of trying to judge effectiveness simply
by watching:
"One absolutely cannot tell, by watching, the
difference between a .300 hitter and a .275 hitter. The difference is
one hit every two weeks. It might be that a reporter, seeing every game
the team plays, could sense the difference over the course of the year
if no records were kept, but I doubt it . . . the difference between
a good hitter and an average hitter is simply not visible."
"a fielder's visible fielding range, which
is his ability to move to the ball after it is hit, is vastly less important
than his invisible fielding range, which is a matter of adjusting
his position a step or two before the ball is hit."
In that essay, Bill went on to propose a scoring system that accomplishes
essentially what STATS Inc. is doing now -- recording the location
of every batted ball so that we could build a record of fielding performances
similar to the statistical records that we use to judge batting and pitching
performances.
I'm not saying that it doesn't matter whether you watch games
or not. I'm just saying that I agree with Bill that it's very difficult
to rate players solely by watching games. We also need useful measures
of what they accomplished.
Measuring Defensive Range
Defensive range is the ability to cover ground and get to more balls
than the average fielder, and it's one of the hardest elements of fielding
performance to measure.
Official fielding stats provide information such as games played, putouts,
assists, errors, double plays, and fielding percentage. But using these
numbers to assess player skills is extremely difficult, if not impossible.
The list of reasons is very long, but they all boil down to the fact that
they don't tell you how many chances to make plays were presented to each
fielder.
In 2002, for example, Jose Vidro led the majors in assists by a second
baseman. Does this mean he was the best second baseman in baseball, or
was this just because:
- he played more innings than everyone else?
- he played behind a pitching staff that didn't strike out a lot of
batters, so more balls were put in play?
- his pitching staff induced a high percentage of ground balls?
- his pitching staff was heavily right-handed, so they faced more than
the normal number of left-handed batters (who hit more ground balls
to the right side)?
- his park somehow makes it easier for him to make plays?
- it just happened that more balls were hit to second when he was playing?
Baseball analysts, ourselves included, have made many attempts to devise
methods that deal with some of these other factors so that we can isolate
the contribution the player is making. Let's review them, and then talk
about some newer methods that we've been using.
Range Factors and Defensive Innings
In the 1970s, Bill James introduced the idea of range factors to compensate
for playing time. A player's range factor is generally computed as successful
chances (putouts plus assists) per game. This was a good first step, even
though Bill acknowledged at the time that it wasn't meaningful for pitchers,
catchers and first basemen.
One thing that frustrated Bill was the fact that not all games played
are equal. Some players play almost every inning of their games. Others
split the playing time with a platoon partner. Late-inning defensive specialists
often pick up a lot of games played without actually playing a lot. For
a while, Bill devised methods to estimate how many innings each fielder
was actually in the game at his position, but this is very hard to do.
Fortunately, companies like STATS have been publishing accurate counts
of defensive innings for the last ten years. So we can now compute range
factors on a per-nine-innings basis, just like we do for earned run averages.
Using a range factor based on defensive innings, Pokey Reese moves to
the top of the list of 2002 second basemen with 5.86 successful chances
per nine innings. Vidro drops to seventh.
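To make the two calculations concrete, here is a minimal Python sketch of a per-game range factor and a per-nine-innings range factor. The function names and the sample totals are hypothetical and chosen only for illustration; they are not real player data.

```python
def range_factor_per_game(putouts: int, assists: int, games: int) -> float:
    """Successful chances (putouts plus assists) per game played."""
    if games <= 0:
        raise ValueError("games must be positive")
    return (putouts + assists) / games


def range_factor_per_nine(putouts: int, assists: int,
                          defensive_innings: float) -> float:
    """Successful chances per nine defensive innings, scaled like an ERA."""
    if defensive_innings <= 0:
        raise ValueError("defensive_innings must be positive")
    return 9.0 * (putouts + assists) / defensive_innings


# Hypothetical fielders with identical raw totals and games played,
# but very different amounts of time actually spent in the field.
everyday_player = range_factor_per_nine(280, 420, 1260.0)
part_timer = range_factor_per_nine(280, 420, 1050.0)
per_game = range_factor_per_game(280, 420, 140)
print(per_game, everyday_player, part_timer)
```

The per-game figure is the same for both fielders, while the per-nine-innings figure separates them, which is the reason defensive innings are the better denominator.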
The fixed sum problem
Whether you use games or innings as the basis of a range factor calculation,
there's another critical problem with range factors. By measuring plays
made per game or per nine innings, the method takes no account of the
length of those innings. Consider the following two innings that start
out the same way and feature the same mix of batted balls, only with different
results:
- strikeout ... ground ball double down the third base line ... line
drive single to shallow center ... popup to third ... triple into the
right field corner ... ground ball single between first and second ...
groundout to third
- strikeout ... great diving stop by third baseman on hard-hit grounder
down the line, with the batter out at first on a strong throw by the
third baseman ... line drive single to shallow center ... popup to third
In the first version of this inning, the official fielding stats record
a putout for the catcher (on the strikeout), one assist (on the inning-ending
ground out) and one putout (on the popup) for the third baseman, and one
putout (on the grounder) for the first baseman. In the second version
of this inning, the official fielding stats are exactly the same. The
fact that the defense allowed three more hits in the first one is completely
lost.
In this example, there's no way to tell which team defense and which
individual fielders were more effective just by looking at the official
fielding stats. More broadly, the best fielders will usually
end up making more plays than the poorest defenders. But the number of
putouts in a nine-inning game adds up to 27 no matter how many hits are
allowed, and the number of assists is mostly a product of the number of
ground balls, not the skill of the infielders. So we can't use range factors
to evaluate team defense at all, and they don't tell us nearly enough
about individual fielders either.
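The two innings above can be tallied in a short Python sketch. The event encoding (labels such as "3B_A" for a third baseman's assist) is an assumption made for this illustration and is not an official scoring notation.

```python
from collections import Counter

# Each event: (description, fielding credits awarded, was it a hit?)
INNING_ONE = [
    ("strikeout", ["C_PO"], False),
    ("ground ball double down the third base line", [], True),
    ("line drive single to shallow center", [], True),
    ("popup to third", ["3B_PO"], False),
    ("triple into the right field corner", [], True),
    ("ground ball single between first and second", [], True),
    ("groundout to third", ["3B_A", "1B_PO"], False),
]

INNING_TWO = [
    ("strikeout", ["C_PO"], False),
    ("diving stop by third baseman, out at first", ["3B_A", "1B_PO"], False),
    ("line drive single to shallow center", [], True),
    ("popup to third", ["3B_PO"], False),
]


def fielding_line(inning):
    """Count the putouts and assists the official stats would record."""
    credits = Counter()
    for _description, awarded, _is_hit in inning:
        credits.update(awarded)
    return credits


def hits_allowed(inning):
    """Count the hits, which never appear in the fielding line."""
    return sum(1 for _description, _awarded, is_hit in inning if is_hit)


same_line = fielding_line(INNING_ONE) == fielding_line(INNING_TWO)
extra_hits = hits_allowed(INNING_ONE) - hits_allowed(INNING_TWO)
print(same_line, extra_hits)
```

Both innings yield the same putout and assist counts even though the first allowed three more hits, which is the fixed sum problem in miniature.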
Adjusted Range Factors
Even if we use defensive innings to measure playing time, we still haven't
taken into account (a) the number of opportunities presented to each fielder
and (b) the fact that some putouts and assists are harder to come by than
others. Back in the 1980s, I developed a new type of range factor that
adjusts for many of these variables in the following ways:
- it counts the number of balls put in play (excluding home runs) while
each fielder was at his position, removing the strikeout rate of the
pitching staff as a potential source of bias
- it counts only those putouts and assists that required the fielder
to do some important work (e.g. taking a groundball and getting an out
by making a throw or stepping on the bag for a force, spearing a line
drive, or tracking down a fly ball) and ignores the ones that don't
say much of anything about defensive range (e.g. taking a throw at first
base, making the pivot on a double play, or tagging a runner on a steal
attempt)
- it tracks balls put in play by left- and right-handed batters separately,
since players pull the ball on the ground much more often than they
go the other way
- it adjusts for the ground ball percentage of each team's pitching
staff
Traditional range factors compute plays made per game or per nine innings.
This method computes plays made per 100 batted balls, meaning that we
can use it to get a better handle on both team and individual defense.
If one team gives up a lot more hits than another, it will need more balls
in play to get through a game, and the adjusted range factors for the
poor fielding team will be lower.
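A simplified Python sketch shows why a per-batted-ball rate separates good and poor defenses where a per-inning rate cannot. It covers only the plays-per-100-batted-balls step; the handedness and ground ball adjustments listed above are left out, and the function name and sample counts are hypothetical.

```python
def plays_per_100_balls_in_play(range_plays: int, balls_in_play: int) -> float:
    """Plays made per 100 balls put in play (home runs excluded)."""
    if balls_in_play <= 0:
        raise ValueError("balls_in_play must be positive")
    return 100.0 * range_plays / balls_in_play


# Hypothetical game: both defenses record 20 outs on balls in play,
# but the weaker defense allows twice as many hits along the way.
outs_on_balls_in_play = 20
strong_defense = plays_per_100_balls_in_play(outs_on_balls_in_play, 20 + 6)
weak_defense = plays_per_100_balls_in_play(outs_on_balls_in_play, 20 + 12)
print(round(strong_defense, 1), round(weak_defense, 1))
```

Per nine innings the two defenses look identical, because both made the same number of outs. Per 100 batted balls the weaker one falls behind, because it needed more balls in play to get those outs.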
Here's how these factors affected Vidro:
- his pitching staff was a little above average in strikeouts
- only 12% of Montreal's innings were thrown by lefties. That's a low
figure, but the percentage of balls put in play by lefty hitters was
about average despite the right-handed nature of his pitching staff.
(By the way, if we made an assumption based on the left/right mix of
the staff instead of actually counting balls put in play, we would have
assumed Vidro got more chances to make plays than he really did.)
- Montreal's pitchers were second in the majors in ground ball percentage,
a strong indication that Vidro's numbers were boosted significantly
simply because he had more balls hit his way
Based on adjusted range factors, Vidro was a little below average among
all major-league 2Bs this year, and while we can't finish our assessment
of his play without using more advanced methods, we've already seen enough
to conclude that his MLB-leading assist total is highly misleading.
This approach produces much better information than does an ordinary
range factor, but we're still left with the fact that we're using these
adjustments to make an educated guess at how many opportunities each fielder
had to make plays. It goes without saying that it's possible to do better
when we have access to play-by-play accounts that note the location of
every batted ball.
Total Baseball's Fielding Runs
Before moving on, let me take a moment to say that the Fielding Runs
numbers in the Total Baseball encyclopedia can be extremely misleading.
I don't enjoy saying this, because they were developed by Pete Palmer,
and Pete's a friend and one of the nicest guys I've ever met.
The first problem I have with fielding runs is that they're just a glorified
range factor, with different weights for different events. Like range
factors, you cannot interpret them accurately unless you know the strikeout
rate and groundball/flyball ratio of the pitching staff and what percentage
of left-handed batters the fielder faced. For a good example of the distortions
that often creep into the fielding runs numbers, see the comments
on Frank White and Ryne Sandberg in an article I wrote for ESPN.com
in September, 1998.
In addition, I don't agree with some of the formulas, mainly because
they put too much weight on certain events. For example, the formula for
outfielders is .20(PO + 4A - E + 2DP), meaning that catching a fly ball
with the bases empty earns you .20 fielding runs, while catching the same
fly ball and throwing out a runner for a double play earns you 1.4 fielding
runs. In both cases, the fielder made the best play available, but one
counts seven times as much as the other. And suppose one center fielder
reaches a ball but muffs it for a one-base error, while another lets it
go up the gap for a double -- the guy who reached the ball has .20 fielding
runs deducted and the second guy isn't penalized at all.
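The arithmetic in that example can be checked with a few lines of Python. The formula is the outfielder formula quoted above; the function name is a label chosen for this sketch.

```python
def outfielder_fielding_runs(putouts: int, assists: int,
                             errors: int, double_plays: int) -> float:
    """Outfielder formula as quoted: .20(PO + 4A - E + 2DP)."""
    return 0.20 * (putouts + 4 * assists - errors + 2 * double_plays)


# Catching a fly ball with the bases empty: one putout.
plain_catch = outfielder_fielding_runs(1, 0, 0, 0)

# Same catch plus a throw that doubles off a runner:
# one putout, one assist, one double play.
catch_and_throw = outfielder_fielding_runs(1, 1, 0, 1)

# Reaching a ball but muffing it for a one-base error.
muffed_ball = outfielder_fielding_runs(0, 0, 1, 0)

# Letting the same ball go up the gap for a double: nothing is recorded.
ball_not_reached = outfielder_fielding_runs(0, 0, 0, 0)

print(plain_catch, catch_and_throw, muffed_ball, ball_not_reached)
```

The second play is worth seven times the first, and the fielder who reached the ball and erred scores lower than the one who never got there.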
Finally, the fielding runs formula mixes range, errors and throwing into
one number, which is appropriate for what Total Baseball is trying
to accomplish (an overall player rating), but useless for what we do,
which is to assign separate ratings for these skills.
STATS Zone Ratings
The next logical step beyond range factors is a system that counts actual
opportunities to make plays. We weren't able to do that until 1989, because
nobody tracked the location of every batted ball until then. The folks
at STATS were the first to do it, and they developed the zone rating to
take advantage of this new information.
STATS says the "zone rating measures all the balls hit in the area
where a fielder can reasonably be expected to record an out, then counts
the percentage of outs actually made." Instead of having to estimate
the number of opportunities to make plays from defensive innings, percentages
of balls in play, the left-right composition of the pitching staff, and
the staff groundball/flyball ratio, we can actually count the balls hit
to each fielder while they are in the game.
The zone rating could have been a tremendous breakthrough, but we disagree
with some of the details of their implementation.
First, they don't count all the balls. For example, no infielder is charged
with an opportunity when a grounder is hit down the lines, in the holes,
or up the middle. The only plays that go into the zone ratings are the
ones where the ball is hit more or less at a fielder. The net result is
a system that places more emphasis on good hands than range.
Even if you didn't know this, you could infer it from their numbers. The
league average zone ratings range from .763 to .885 depending on the position,
suggesting that fielders are turning well over 80% of all batted balls
into outs. But the truth is that only about 70% of all batted balls become
outs. It's clear that the most challenging opportunities, the ones that
separate the best fielders from the ordinary ones, are left out of their
system.
The second issue is that errors are mixed in with the ability to get
to the ball in the first place. Let's suppose a player is credited with
500 opportunities in a season, and let's suppose he was very reliable,
making 8 fewer errors than the average player with that many plays to
make. Those 8 errors become 8 outs and produce a zone rating that is .016
above the league average. Without taking the errors into account, you
might conclude that he has above-average range, when in fact he has average
range and very good hands.
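Here is that example worked through in Python. The league average of .800 (400 outs in 500 opportunities) is an assumed figure used only to make the arithmetic concrete; the 500 opportunities and the 8 errors come from the example above.

```python
def zone_rating(outs: int, opportunities: int) -> float:
    """Share of balls in a fielder's zone that were turned into outs."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return outs / opportunities


OPPORTUNITIES = 500
LEAGUE_AVERAGE_OUTS = 400   # assumed: a .800 league-average zone rating
ERRORS_AVOIDED = 8          # 8 fewer errors than the average fielder

league_rating = zone_rating(LEAGUE_AVERAGE_OUTS, OPPORTUNITIES)
sure_handed_rating = zone_rating(LEAGUE_AVERAGE_OUTS + ERRORS_AVOIDED,
                                 OPPORTUNITIES)
boost = sure_handed_rating - league_rating
print(round(league_rating, 3), round(sure_handed_rating, 3), round(boost, 3))
```

The fielder reaches exactly as many balls as the average player, yet his zone rating comes out 16 points higher because of his hands alone.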
The third issue no longer applies but needs to be mentioned. Through
the 1999 season, when an infielder started a ground ball double play,
STATS credited him with two outs and one opportunity. Starting double
plays is an important skill for an infielder, but this approach gives
a significant boost to infielders who play behind pitchers who put lots
of runners on base and/or with a pivot partner who turns the DP well,
and it clouds the effort to measure defensive range. STATS doesn't do
this any more, but if you have copies of the STATS Player Profiles books
from the 1990s, you'll be looking at zone ratings that double-count these
DPs.
Once again, let me say that the idea behind the STATS zone rating is
sound and has value even with these issues. If you're looking for an overall
measure of fielding performance that includes both range and errors, it
won't matter to you that they're lumped together. And folks like us who
are interested in separating these skills can make an adjustment for error
rates to isolate the range portion.
The zones are smaller than we'd like, but my guess is that STATS did
this on purpose to avoid running into two other issues that we'll talk
about in a bit. First, some batted balls are playable by more than one
fielder, and keeping the zones on the small side reduces the number of
opportunities for one fielder to affect his neighbors. Second, outfield
zones that cover the entire field make the system more vulnerable to distortions
arising from different ballpark dimensions and characteristics. Our zone-oriented
analysis does cover the whole field, so we've developed some methods for
handling the interaction among fielders and accounting for park effects.
Defensive Average
For a few years in the early 1990s, we used a type of zone rating called
Defensive Average (DA). It was developed by Pete DeCoursey and Sherri
Nichols and used play-by-play data from The Baseball Workshop.
Like the STATS zone rating, defensive average used the same principle
of counting batted balls hit into each fielder's zone and counting the
number of plays he made. But it covered the whole field and didn't mix
apples and oranges by double-counting GDPs. As a result, we felt we got
better results from defensive average than from the STATS zone ratings.
When assigning responsibility for balls hit between fielders, the STATS
and DA systems are similar if an out is made. Both systems credit the
fielder with one opportunity and one play. But things get tricky when
the ball falls in for a hit.
If the ball falls into one of the STATS zones, the fielder responsible
for that zone is charged with an opportunity. If it falls outside the
STATS zones, the play is ignored, and no fielder bears responsibility
for the hit.
In the DA system, each player gets charged with half an opportunity when
there's a hit that lands between two fielders. That means that someone
playing next to a weak fielder tends to look worse than he is, because
if the other guy makes the play, there is no opportunity charged, but
if the ball falls in, he's charged with half an opportunity even if it's
the sort of play the other fielder would be expected to make at least
some of the time.
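A small Python sketch illustrates the effect. It assumes a simplified form of the calculation in which a fielder's opportunities are his plays made, plus hits in his own zone, plus half of the hits that land between him and a neighbor. The function name and the counts are hypothetical, and the real DA computation may differ in its details.

```python
def defensive_average(plays_made: int, hits_in_own_zone: int,
                      hits_between_fielders: int) -> float:
    """Plays made divided by opportunities, with shared hits split in half."""
    opportunities = (plays_made
                     + hits_in_own_zone
                     + 0.5 * hits_between_fielders)
    if opportunities <= 0:
        raise ValueError("fielder has no opportunities")
    return plays_made / opportunities


# The same hypothetical shortstop season, played next to two different
# third basemen. His own plays and own-zone hits do not change.
beside_strong_neighbor = defensive_average(400, 80, 20)
beside_weak_neighbor = defensive_average(400, 80, 40)
print(round(beside_strong_neighbor, 3), round(beside_weak_neighbor, 3))
```

Nothing about the shortstop's own performance changed between the two cases. His rating drops only because more balls fell in between him and the weaker neighbor.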
During the years in which we used the Defensive Average system, we were
aware of this limitation and did our best to make intelligent adjustments
to compensate for it when assigning player ratings. But we always wanted
to see if we could do better.
The Diamond Mind Approach
In 1996, we began using a collection of old methods and new tools to
expand our look at defensive performance, and we have been refining and
improving these methods ever since. We believe that by using these tools
to look at player performance from several angles, we can learn a lot
more about who accomplished what in a given season.
To one degree or another, our best tools take advantage of the fact that
STATS has been recording the type (grounder, fly ball, line drive, popup,
bunt) and location (direction and distance) of every batted ball since
the late 1980s. Using this information, our analysis programs aren't vulnerable
to the potential biases in traditional fielding stats. We know exactly
how often each player was in the field, how often the ball was hit near
him, and how many plays he made on those balls.
The field is divided into approximately 80 zones. We count the number
of balls hit into that zone, the number of times each fielder made an
out, and the number of singles, doubles, triples, and errors that resulted.
When we're done, we look at the zone data for all of the major leagues
and see how often the players at each position were able to make plays
on those balls.
For example, on the 6939 grounders up the middle to the shortstop side
of the bag during the 2002 season, MLB shortstops turned 64.4% of those
balls into outs and made errors 1.9% of the time. Second basemen ranged
to the other side of the bag to make the play 0.8% of the time. Almost
all of the remaining grounders in this zone resulted in singles, with a handful
of doubles and fielder's choice plays to round things out.
This gives us a baseline that we can use to evaluate performance on balls
hit into this zone. Repeating this process for all batted ball types and
every zone gives us an overall measure of the playmaking ability of a
team and its players.
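The bookkeeping behind such a baseline can be sketched in Python as follows. The ZoneTally class and its fields are hypothetical names for this illustration, not the structure of our actual analysis programs. The sample tally uses 1,000 made-up batted balls in proportions that echo the percentages quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class ZoneTally:
    """Running totals for one batted-ball type in one zone."""
    balls: int = 0
    errors: int = 0
    outs_by_position: dict = field(default_factory=dict)

    def record(self, out_by=None, error=False):
        """Log one batted ball; out_by names the fielder who made the out."""
        self.balls += 1
        if out_by is not None:
            self.outs_by_position[out_by] = (
                self.outs_by_position.get(out_by, 0) + 1)
        if error:
            self.errors += 1

    def out_rate(self, position):
        """Fraction of balls in this zone turned into outs by a position."""
        if self.balls == 0:
            return 0.0
        return self.outs_by_position.get(position, 0) / self.balls

    def error_rate(self):
        """Fraction of balls in this zone that ended in an error."""
        return self.errors / self.balls if self.balls else 0.0


# Hypothetical sample of 1,000 grounders up the middle, shortstop side.
tally = ZoneTally()
for _ in range(644):
    tally.record(out_by="SS")
for _ in range(8):
    tally.record(out_by="2B")
for _ in range(19):
    tally.record(error=True)
for _ in range(1000 - 644 - 8 - 19):
    tally.record()  # base hits and other non-outs

print(tally.out_rate("SS"), tally.out_rate("2B"), tally.error_rate())
```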
With one exception, our zone-oriented approach includes the entire field
and all types of batted balls. Early on, it became clear that we needed
to screen out infield popups because they don't tell us anything. Over
99% of these plays result in an out, so they don't distinguish the good
fielders from the not-so-good. And because these plays are easy to make,
most popups can be handled by any of several players, making the successful
completion of this play as much (or more) a matter of preference than
one of skill.
As I mentioned previously, we need to use measures of team defense to
help us deal with the interactions among fielders. If one player doesn't
get credit for making a play, it may be because another fielder beat him
to it, and the first guy shouldn't be punished for playing next to a superior
defender. It's only by looking at measures of team defense that we can
distinguish the cases where another guy made the play from those when
the ball fell for a hit. So let's take a moment to discuss team defense
metrics.
Defense efficiency record (DER)
We usually start by computing the percentage of batted balls, excluding
homers, that were turned into outs by the team. This percentage was labelled
the Defense Efficiency Record (DER) by Bill James when he wrote about
it in the 1980s, and you can find DER information on the Baseball Prospectus
web site during the season.
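As a rough sketch, DER can be computed from a list of plate appearance outcomes like this. The outcome labels are hypothetical, and counting a ball that ends in an error as a ball in play that was not turned into an out is an assumption of this sketch; published versions of DER handle such details in different ways.

```python
# Outcomes that never involve the fielders, or that leave the park.
NOT_IN_PLAY = {"strikeout", "walk", "hit_by_pitch", "home_run"}


def defense_efficiency_record(outcomes):
    """Share of balls in play (home runs excluded) turned into outs."""
    in_play = [o for o in outcomes if o not in NOT_IN_PLAY]
    if not in_play:
        raise ValueError("no balls in play")
    outs = sum(1 for o in in_play if o == "out")
    return outs / len(in_play)


# Hypothetical stretch of 15 plate appearances.
sample = (["out"] * 7 + ["single"] * 2 + ["double"]
          + ["home_run"] + ["strikeout"] * 3 + ["walk"])
der = defense_efficiency_record(sample)
print(round(der, 3))
```

Only 10 of the 15 plate appearances put the ball in play, and 7 of those became outs, so the strikeouts, the walk and the home run have no effect on the result.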
I'm not completely sold on DER as the ultimate measure of team defense,
however. For one thing, I've always been troubled by the fact that it's
just a variation on batting average, with strikeouts and home runs removed,
and with the focus on the out percentage instead of the hit percentage.
But league batting averages have ranged from a low in the .230s to a high
in the .300s in the past 80 years, so they don't just measure batting
skill. They also embody the impact of the rules of the game (strike zone,
mound height), the equipment (dead ball, lively ball, juiced ball), and
the changing nature of ballparks. Similarly, the league DER figures have
risen and fallen by large amounts, indicating that factors other than
fielding skill are built into these numbers, too.
A second question about DER is the extent to which it measures pitching
versus fielding. I've always believed that DER measures some of both.
There is a strong (but not perfect) correlation between a team's rankings
in ERA and DER, suggesting that (a) good pitchers make their fielders
look better and/or (b) the team's rank in ERA is in large part due to
the quality of its defense. It's hard to know which way to look at it,
but I believe it works in both directions.
Recent work by Voros McCracken and Dick Cramer suggests that pitchers
have little or nothing to do with the percentage of balls in play that
are turned into outs. To put it another way, the defense is entirely responsible
for a team's DER ranking. I'm not ready to accept that pitchers have nothing
to do with these outcomes. While I haven't had time to do any detailed studies
in this area, some very preliminary work suggests that good pitchers do
improve a team's DER, though only by a few points. But because pitchers
allow a very large number of batted balls over the course of a season,
these small improvements can have a large effect on the pitcher's ERA.
Another issue with DER is that park effects can play a large role. It's
clear that the enormous impact that Coors Field has on scoring isn't entirely
due to home runs. A much higher percentage of balls that stay in the field
of play are falling in for hits, too, and that makes Colorado's team defense
look much worse than it really is. This is the most extreme example, of
course, but there are other parks that make a difference.
In other words, we start our process by computing the DER for each team,
but we don't take that figure as a precise measure of the team's ability
to make plays in the field. We keep the potential distortions in mind
as we go through our rating process.
Other measures of team defense
Our zone-oriented analysis provides us with another way of rating team
defenses. We can go zone by zone and compute how many more (or fewer)
plays were made by this team than the average team, then do a weighted
average of all of the zones to get an overall score for the team. That
overall score is expressed as the number of plays made above or below
the average. In 2002, for example, Anaheim's defense led the majors by
making 120 more plays than the average team (in 4228 opportunities). These
figures are not park adjusted, so they're not definitive, but they definitely
add value in the process.
To isolate portions of a team's defense, we rate the infields by computing
the percentage of ground balls turned into outs and the outfields based
on the percentage of fly balls and line drives that were caught.
Because we use a collection of overall measures (like DER), mid-level
measures (such as out rates on grounders), and detailed zone-based analysis,
we can examine team defense at several levels of detail. That helps us
determine which fielders are getting the job done and which are letting
the team down.
Park effects
We can't leave the subject of team defense without looking more closely
at the parks.
We mentioned Coors Field a moment ago, but Dodger Stadium is another
good example. From 2000 to 2002, that park depressed batting averages
by 21 points, making it one of the best pitchers' parks in the game. And
it wasn't just because of strikeouts and homers, either. Focusing only
on balls hit into the field of play, Dodger Stadium took away 97 hits
a year in that period. If half of them came with the Dodgers on defense,
measures that ignore park effects (like DER) make LA's team defense appear
to be 48 plays better than it really is.
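The arithmetic behind that estimate, and one simple way to apply it, can be sketched in Python. The even split between the home team's defense and its opponents' comes from the example above. The function names and the raw team figure of 60 plays above average are hypothetical, and subtracting the park effect directly is an assumed simplification.

```python
def park_credit_to_home_defense(hits_removed_per_year: float,
                                share_with_home_team_fielding: float = 0.5) -> float:
    """Plays per year that the park, not the fielders, should get credit for."""
    return hits_removed_per_year * share_with_home_team_fielding


def park_adjusted_plays(raw_plays_above_average: float,
                        park_credit: float) -> float:
    """Remove the park's contribution from a raw plays-above-average figure."""
    return raw_plays_above_average - park_credit


park_credit = park_credit_to_home_defense(97)
adjusted = park_adjusted_plays(60, park_credit)  # 60 is a made-up raw total
print(park_credit, adjusted)
```

Half of 97 is 48.5, which the text rounds down to 48. In this made-up case a defense that looks 60 plays better than average turns out to be closer to a dozen plays better once the park is taken into account.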
Using play-by-play data, we can also compare the hit rate on different
types of batted balls. Dodger Stadium dramatically reduces the percentage
of ground balls that go for hits. It also cuts the hit rate on fly balls,
but not by a whole lot. Because virtually all of the park's effect is
concentrated in the infield, it would be especially easy to overrate the
LA infield if we ignored this information.
Evaluating individual players
Most of our work at the player level uses zone-based data. We compare
the rate at which each fielder turned batted balls into outs in each zone
with the overall averages. If a player made more than the normal number
of plays, he gets a plus score for that zone. If he fell short of the
overall average, he gets a minus score. By computing a weighted average
of all of his zones, we get a figure that tells us how many more (or fewer)
plays he made than the average defender. We call this figure "net
plays".
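One way to express this calculation is sketched below in Python. It assumes the weights are each fielder's opportunities in a zone, so that the weighted average reduces to summing actual plays minus expected plays over all zones. The zone labels, league rates and player counts are hypothetical.

```python
def net_plays(player_zones, league_out_rates):
    """Plays made above (or below) what an average fielder would make.

    player_zones maps a zone label to (opportunities, plays_made).
    league_out_rates maps a zone label to the league-wide out rate.
    """
    total = 0.0
    for zone, (opportunities, plays_made) in player_zones.items():
        expected = opportunities * league_out_rates[zone]
        total += plays_made - expected
    return total


# Hypothetical league baselines for two shortstop zones.
league = {"up_the_middle": 0.644, "in_the_hole": 0.550}

# Hypothetical shortstop: strong up the middle, weak in the hole.
shortstop = {
    "up_the_middle": (100, 70),   # 70 plays made, 64.4 expected
    "in_the_hole": (80, 40),      # 40 plays made, 44.0 expected
}

score = net_plays(shortstop, league)
print(round(score, 1))
```

The plus score in one zone and the minus score in the other combine into a single figure, here a little under two plays better than average.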
In a typical season, the top fielders at each position make 25-30 more
plays than the average. Exceptional fielders have posted marks as high
as 40-60 net plays, but those are fairly uncommon. Recent examples include
Darin Erstad in 2002, Scott Rolen just about every year, and Andruw Jones
in his better seasons. The worst fielders tend to be in the minus 25-40
range.
As a reality check, we look at other measures like range factors, adjusted
range factors, STATS zone ratings, and our own version of the STATS zone
ratings (with larger zones). More often than not, these measures tell
similar stories. When they disagree, we look for external factors that
might be skewing those other measures. In the end, we put the most weight
on our net plays analysis.
But the net plays figures are starting points, not the final answer,
because we have several other things to consider before we assign a rating.
We've already talked about park effects, so I won't dwell on that any
more.
As with the STATS zone ratings, our net plays analysis can be influenced
by error rates. So we always look to see whether a fielder is making more
plays mainly because he has better hands. Mike Bordick and Alex Rodriguez
are two good examples from the 2002 season. In some cases, a player will
have a mediocre net plays figure because he made a lot of errors, and
we may bump up his range rating to account for the fact that he's getting
to more balls in the first place.
For infielders, we have another analysis program that measures their
ability to start double plays and get force outs when such opportunities
exist. Especially for corner infielders, the ability to make the tough
plays can separate the men from the boys. If a first baseman always takes
the ball to the bag and doesn't start his share of double plays and force
plays, he's not helping the team, even if he does record a normal number
of outs.
For middle infielders, we also look at how often they are able to make
the pivot on the double play. This is an important part of the second
baseman's job, and he can make up for ordinary range by turning two more
often. It isn't talked about very often, but we also see differences in
the ability of shortstops to complete these plays.
For shortstops, we look at the zone data to see if their net plays score
has been artificially depressed by sharing the left side of the infield
with an especially talented third baseman. For example, Scott Rolen is
way above average on balls to his left, and that cuts down on the number
of plays his shortstops can make. If the overall team defense in that
zone is still very good, there's no reason to penalize the shortstop.
Similarly, we look for first basemen who are taking plays away from the
man at second. By looking at the zone data for individual fielders and
for the team as a whole, we can tell whether plays not made by one fielder
are getting made by someone else.
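A rough way to express that check in code is sketched below. The function
name, the out rates and the idea of a single yes/no answer are all assumptions
made for illustration; in practice this is a judgment call informed by the
zone data rather than a fixed rule.

```python
def shortfall_covered(player_rate, league_player_rate, team_rate, league_team_rate):
    """Return True when a fielder is below average in a zone but the team,
    counting every fielder who can reach that zone, is at or above average.

    All four arguments are out rates (plays made / balls hit into the zone).
    """
    player_is_short = player_rate < league_player_rate
    team_is_fine = team_rate >= league_team_rate
    return player_is_short and team_is_fine


# Invented rates for the zone between short and third
covered = shortfall_covered(
    player_rate=0.25,         # this shortstop's out rate in the zone
    league_player_rate=0.30,  # average shortstop's out rate in the zone
    team_rate=0.62,           # this team's out rate in the zone, all fielders
    league_team_rate=0.58,    # average team's out rate in the zone
)
print(covered)
```

When the result is True, the missing plays are probably being made by the
neighboring fielder, so the low individual score should not be held against
the player.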
The same is true in the outfield. For balls hit in the gaps, we look
at the zone data to see if an exceptional fielder might be taking plays
away from his neighbors.
Another of our analysis programs counts the number of times a player
is used as a defensive sub or is removed for a defensive sub. This information
doesn't tell us anything about performance, of course, but it is very
helpful to know that one fielder was regarded by his manager as being
superior to another.
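Counting of this kind is simple to sketch. The example below assumes a
hypothetical list of substitution records with "type", "player_in" and
"player_out" fields; real play-by-play data will be laid out differently, and
deciding which substitutions are truly defensive takes more care than shown
here.

```python
from collections import Counter


def defensive_sub_counts(events):
    """Count how often each player enters as, or is removed for, a defensive sub."""
    entered = Counter()
    removed = Counter()
    for event in events:
        if event["type"] == "defensive_sub":
            entered[event["player_in"]] += 1
            removed[event["player_out"]] += 1
    return entered, removed


# Invented substitution records
events = [
    {"type": "defensive_sub", "player_in": "Fielder A", "player_out": "Fielder B"},
    {"type": "pinch_hitter", "player_in": "Hitter C", "player_out": "Fielder A"},
    {"type": "defensive_sub", "player_in": "Fielder A", "player_out": "Fielder D"},
]

entered, removed = defensive_sub_counts(events)
print(entered["Fielder A"], removed["Fielder B"])
```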
Like many of you, we read a lot, and we watch games on local TV and
satellite as well as the highlight shows on ESPN and Fox, because it helps
to have an image of a player when we evaluate the performance data. We also
compile an extensive database of player notes, so we know who's coming off
a knee injury or a shoulder problem that might have affected his ability to
make plays.
And when the evidence doesn't match the player's reputation, we double-check
our work and look very, very hard for the reasons why. Whenever possible,
we talk to people -- local writers, broadcasters and sophisticated fans
-- who have seen the player quite a bit to see if we can gain some additional
insight into each player's performance.
After rating all of the players, we go back and double-check these individual
ratings to see if they add up to something resembling the team's park-adjusted
defensive performance. If not, we go back over everything we know about
those players and keep at it until it makes sense.
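In code, that final check amounts to comparing a sum with a team total. The
sketch below is only an illustration: it assumes the individual figures are
expressed as net plays, that a park-adjusted team figure has been computed
separately, and that a gap of 15 plays is a reasonable threshold, which is an
arbitrary choice made for this example.

```python
def reconcile(player_net_plays, team_net_plays, tolerance=15):
    """Compare the sum of individual net plays with the team's park-adjusted total.

    Returns (sum of player figures, unexplained gap, whether the gap is acceptable).
    """
    total = sum(player_net_plays.values())
    gap = team_net_plays - total
    return total, gap, abs(gap) <= tolerance


# Invented figures for one infield
infield = {"1B": -5, "2B": 10, "3B": 20, "SS": -8}
total, gap, consistent = reconcile(infield, team_net_plays=25)
print(total, gap, consistent)
```

A large gap does not say which rating is wrong, only that the individual
figures and the team figure cannot both be right.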
Bill James' Win Shares
In his recent book called Win Shares (published by STATS in 2002), Bill
James developed a method for apportioning each team's wins to the players
who were most responsible for creating them. A big part of that method
involves evaluating defense at both the team and individual level. We're
still in the process of evaluating this new approach, but we can point
out a few things that you might want to keep in mind as you ponder the
role that system should have in evaluating players:
- Bill begins by evaluating overall team defense and then tries to break
that down and assign credit/blame to positions and then players. We've
been doing that for many years.
- Bill's method is intended to work with players from all eras, including
that vast portion of baseball history for which play-by-play data is
not available. So he chose to develop new techniques for coping with
the biases inherent in traditional fielding stats. We've been aware
of those biases for a long time and have always kept them in mind while
evaluating traditional fielding stats.
- Bill's system is an attempt to make better estimates of the number
of opportunities to make plays and the number of plays made, and it
appears that he has come up with at least a few useful ways to do that.
On the other hand, using play-by-play data from the 1990s, we can now
count those things directly, and we want to spend some time seeing whether
Bill's estimates match up with the actual data for that period. If they
do, he's made a giant contribution to the field, because we can confidently
apply his techniques to seasons for which we don't have first-rate play-by-play
data. If they don't, we'll have to figure out why and proceed from there.
- Bill's method is intended to aggregate all aspects of fielding performance
into one number, while our goal is to isolate specific skills. We have
separate ratings for range, errors and throwing, and we cannot assume
that a high number of defensive win shares necessarily indicates a fielder
who should get a top range rating. It's possible that his range is average
and his value lies in a strong arm and good hands.
- We're not yet sure about the weights Bill put on different fielding
skills when coming up with his fielding win shares. To some extent,
that doesn't matter to us because we're more interested in rating the
individual components of defense anyway. But as fans of baseball analysis,
we're curious to see whether Win Shares really works, so we hope to
find time to look at this part of his system, too.
The bottom line is that we will continue to rate fielders for modern
seasons based on our analysis of play-by-play data. But we're always on
the lookout for new and better ways to evaluate fielders, and if our review
suggests that the fielding portion of the Win Shares model provides us
with some new tools, we'll use them.
Other approaches to rating players
We know that a lot of our customers like our products precisely because
we do our own analysis instead of rating everyone based on prevailing
opinions. At the same time, we know that there are other people who don't
buy our products because our ratings sometimes disagree with what they
hear elsewhere. If Tim McCarver says that someone is a brilliant fielder,
the reasoning goes, then because McCarver is a well-known TV analyst and
ex-player, he must know a lot more about this stuff than we do.
Let's suppose, for the sake of argument, that we wanted to ditch all
of our analysis and rate players based upon what we read and hear from
the media. That's a lot harder to do than you might think, for a whole
host of reasons.
When someone in the media says "he's the best second baseman in
baseball," it's not always clear what it means. It could mean he's
the best overall player at his position (including hitting, running, etc.).
It could mean he has great hands. It could mean he turns the double play
well or that he has great range. Even if it means all of these things
to some degree, an overall evaluation doesn't help us. We have separate
ratings for separate skills, and we need objective evaluations of each
skill.
The media doesn't talk about all the players. We have 1200+ players to
rate each year, and only a fraction of them are regularly discussed. Some
players may be overrated because they play for teams in media-intensive
cities or teams that got a lot of exposure in the playoffs, while good
players on small-market teams may be overlooked.
It often seems as if it takes a year or two for someone's reputation
to catch up with a change in his performance, for better or worse. In
the 15+ years we've been rating players, we've often identified someone
who has been making a lot of plays without getting noticed. It's not unusual
to see that player start to win Gold Gloves two years later, and then
keep winning them for a few years after his performance no longer
merits them.
Managers and general managers make public comments about players all
the time, but their remarks can be influenced by the needs of the team.
Sometimes it's to their advantage to talk about players in certain ways,
whether it's to hype someone for marketing purposes or to talk them down
in a salary squabble. It's hard to tell when we can take a comment at
face value and when we need to discount it because of a hidden agenda.
I'd love to incorporate the opinions of professional baseball scouts
because they are trained to see things that other people don't see. But
it's difficult to find a collection of scouts who have seen every player
and can make their evaluations available to people outside the organizations
they work for.
We could base our judgments on how often someone shows up on SportsCenter.
But the photogenic play isn't always the best play. The exact same fly
ball might produce a routine play for a great fielder, a diving catch
for the average fielder, or a single for the poor fielder. The diving
catch is the only one that makes the highlight films. The majority of
highlight-film plays are made at the edge of the fielder's effective range,
whatever that range happens to be.
(A few years ago, I saw a game in Baltimore in which the right fielder
broke back on a line drive, realized it wasn't hit that hard, reversed
course and recovered in time to make a nice shoestring catch. What should
have been a very easy play wound up being shown dozens of times as CNN's
Play of the Day.)
We could place a lot of weight on the Gold Glove voting. Putting aside
the question of how well the voters do that job, there are still several
obstacles. The vote totals aren't announced, so we have no idea who came
in second or how close the vote may have been. And even if we were to accept
all Gold Glovers as top fielders, we can't award them all our top range
rating because Gold Gloves are given for overall fielding performance,
and we have to rate players separately for range, throwing, and avoiding
errors. For some Gold Glovers, the most accurate way to rate them would
be to assign an excellent throwing rating, a very low error rate, and
an average range rating.
Summing up
We do our very best to rate players based on performance, not reputation.
To that end, we license play-by-play data and spend a lot of time developing
new ways to analyze that information and interpreting it in light of
everything we know about each player's performance. The phrase
"everything we know" includes our own analysis of team and player
fielding skill, other measures like range factors and STATS zone ratings,
injury reports, park effects, plus what we see and hear and read as we
follow baseball on a daily basis. We hope you like the results as much
as we enjoy doing this work.