Diamond Mind Email Newsletter
April 30, 2003
Written by Tom Tippett
Welcome to the second edition of the Diamond Mind email newsletter for
the year 2003. Through these newsletters, we will try to keep you up to
date on the latest product and technical information about the Diamond
Mind Baseball game, related player disks, and our ongoing baseball research
efforts. Back issues are available on our
web site.
Topics for this issue:
New Diamond Mind Weblog
2003 Projections Update
All-time Greatest Players Update
Version 9 Update
Diamond Legends and ESPN Classic Fantasy Baseball
Easy as 1-2-3
Distributing League Databases
New Diamond Mind Weblog
For a while, we've been looking for the best way to publish small pieces
of baseball commentary and research, items that may not warrant a full
article (such as the ones we've been writing for ESPN.com since 1998)
or items that would be outdated by the time our next email newsletter
is due to be issued. A weblog
is well suited to this purpose.
For those of you who aren't yet familiar with weblogs -- and I was one
of those people until a few months ago -- a weblog is nothing more than
a web page with a series of writings on a particular topic. Our topic,
obviously, is baseball, and we'll post things to our weblog whenever we
feel we have something to say.
It might be a comment about a managerial decision or a defensive play
we've just seen in a game. It might be a more general comment about the
progress of the season. Or a small bit of research that doesn't warrant
a full article.
2003 Projections Update
Since we last published a newsletter, we have released the 2003 Projection
Disk, published our 2003 Projected
Team Standings article, collected a bunch of predictions by others
so we can rank them after the season, and watched about four weeks of
baseball.
There's a good deal of consensus among the predictors this year:
- in the AL East, everybody thinks the Yankees and Red Sox, in that order,
will top the division. Athlon and Lindy's have Baltimore third; everyone
else has the Blue Jays in that spot.
- in the AL Central, there's a fairly even split between the Twins
and the White Sox to win the division, but everyone agrees that they'll
wind up one and two. Cleveland was a unanimous choice to finish third.
In our preseason simulations, Cleveland was three games better than
Kansas City, but I'd have to give the nod to the Royals now. As of April
28th, they were already 11-1/2 games up on the Indians, and I'll be surprised
if Cleveland can make that up. Of course, a better question is whether
the Royals can hang with Minnesota and Chicago all year. If they can,
it'll be fun to watch.
- in the AL West, everyone picked Oakland or Anaheim to win the division,
with the vast majority placing Seattle third and Texas fourth.
Our simulation results put Anaheim and Seattle in a virtual dead heat
for second place. Seattle averaged 0.3 more wins per season, but Anaheim
had a slightly better run margin. Because wins are more important than
runs, we're going with Seattle as our pick for second for the purposes
of our postseason prediction rankings.
- in the NL East, a small majority favored Atlanta over Philadelphia,
with one lone voice predicting that the Mets will take the division. In
our simulations, the Mets were last, a view nobody else shared. Time will
tell whether we alone were able to see the truth or whether there's safety
in numbers.
- the NL Central is a lot like its AL counterpart in that there's complete
agreement about the two teams that are expected to finish one-two and
a sizeable split as to which will be on top. We had St. Louis first with
Houston second. It's worth noting that the Cubs took first place in 14%
of our simulations, so the Cards and Astros can't just focus on each other.
- the NL West is another division where our results differed from the
consensus. Several others predicted that the Dodgers would win, but we
had them fourth because they didn't score enough runs.
In our fifty seasons, the Giants averaged about one more win than the
Diamondbacks, while the majority of other forecasters had Arizona finishing
ahead of San Francisco. We have Colorado finishing third, and only one
other publication projected them to finish in that spot; everyone else
had them fourth or fifth.
With the season a month old, I'm quite excited about how things are developing.
It's awfully early, of course, but it's nice to see potential surprises
brewing in a number of the races.
All-time Greatest Players Update
In recent weeks, we've made great progress in the development of our
All-time Greatest Players Disk, and we hope to be able to begin shipping
it in the next couple of months.
When we began this project, we weren't sure whether to focus on franchises
or players. A franchise focus would have meant going through each franchise
and finding the 32-or-so best players in its history. A player focus would
have meant finding the best players in history, regardless of who they
played for.
The advantage of the franchise approach is that we are pretty much guaranteed
to wind up with rosters that cover all of the positions and pitching roles.
The disadvantage is that we could end up including less-than-great players
just to fill out a roster or excluding other great players who had the
misfortune of playing for a team with a long history and a glut of other
stars.
So we decided to go with the best players. We've completed most of the
ratings and statistical work, and are beginning to think about how to
organize that pool of 1100+ players into meaningful rosters.
We could, I suppose, just ship a database of players and let you assemble
your own rosters, but we believe it will be a better product if we create
teams that you can use right away.
Our tentative plan is to take this pool of players and allocate them
to teams that match real-life franchises as much as possible. Certain
franchises have too many stars for one team, so we'll split them into
two teams. Other franchises are too young to have accumulated a full and
balanced roster, so we'll combine them.
It remains to be seen whether this approach will give us a collection
of teams that we're happy with. If not, we'll try something else. Of course,
no matter what we end up with, you'll be able to use various DMB tools
to assemble your own rosters and/or transfer these players to other DMB
databases.
Another fundamental question was how to rate the players: Single-season?
Peak years? Full career? If peak years, how many years, and which ones?
And so on.
We decided to go with a series of consecutive peak years. For each batter,
we've chosen his best group of consecutive seasons totaling at least 6000
plate appearances. That means eight years for some players and twelve
for others, but it's a meaningful amount of playing time in any case.
Why peak years? Some all-time greats had mediocre-to-poor seasons at
the beginning and/or end of their careers, perhaps because they were called
up at a very young age or because their names allowed them to hang onto
a job after they had lost much of their ability. If we used entire careers,
these stars would not stand out from the crowd as much as they should.
Why 6000 plate appearances? We're not wedded to this number, and we may
decide to bump it up or down before we're done. We do feel it's important
to include a lot of playing time so lesser players with one or two really
good seasons aren't rated as highly as others who sustained their success
over a much longer period.
If a player had a career of fewer than 4000 plate appearances, he's not
eligible for this disk, at least for now. (We may lower that threshold
in the future.) If he reached that mark but fell short of 6000 plate appearances,
we're rating him based on his entire career.
Why consecutive seasons? Because we want to end up with complete players,
not hypothetical combinations of skills that may not have existed together
at any one point in the player's career.
Players tend to change with age. An outfielder might start out as a superior
center fielder with great speed and enough hitting skills to be an asset
at the top of the order. As he ages and fills out his body, he might move
to left or right field, run less frequently, take more walks, add power,
and move to the middle of the batting order.
If we take the approach of rating the player based on his best N seasons,
regardless of where they fell in his career, we run the risk of winding
up with a few of his early years and a few of his later years. That might
be fine in terms of his batting ratings, but what about everything else?
We run the risk of creating a power hitter who could also play a great
center field and steal a bunch of bases. Each of his ratings (hitting,
defense, running) might reflect his best level in that one area, but as
a group, those ratings suggest a player who could do all of those things
at once, something that probably didn't happen in real life.
By choosing a series of consecutive seasons, we reduce the risk of combining
hitting and non-hitting ratings that don't really belong together. The
risk doesn't go away completely, but at least we're putting some boundaries
on it. We're no longer faced with the problem of how to combine an age-23
season with an age-38 season.
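
For readers who like to see a rule spelled out, the selection described above can be sketched in a few lines of Python. This is an illustration only, not the code we actually use. The Season record, the "value" field (a stand-in for whatever era- and park-adjusted measure is used to judge a season), and the choice to compare windows by their PA-weighted average value are all assumptions made for the example. The 4000 and 6000 plate-appearance thresholds are the ones mentioned in the text.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_CAREER_PA = 4000  # eligibility threshold from the text
MIN_WINDOW_PA = 6000  # minimum size of the peak window from the text


@dataclass
class Season:
    year: int
    pa: int        # plate appearances
    value: float   # hypothetical adjusted value per plate appearance


def best_peak(seasons: List[Season]) -> Optional[List[Season]]:
    """Pick the seasons to rate a batter on.

    Assumes `seasons` is in chronological order with no gaps.
    Returns None if the player is not eligible.
    """
    career_pa = sum(s.pa for s in seasons)
    if career_pa < MIN_CAREER_PA:
        return None            # too short a career to qualify
    if career_pa < MIN_WINDOW_PA:
        return list(seasons)   # rate on the entire career

    best: Optional[List[Season]] = None
    best_rate = float("-inf")
    for start in range(len(seasons)):
        pa = 0
        total = 0.0
        for end in range(start, len(seasons)):
            pa += seasons[end].pa
            total += seasons[end].value * seasons[end].pa
            # Only windows with enough playing time are candidates.
            if pa >= MIN_WINDOW_PA and total / pa > best_rate:
                best_rate = total / pa
                best = seasons[start:end + 1]
    return best
```

Because every candidate is a run of adjacent seasons, the result can never pair an age-23 season with an age-38 season unless everything in between is included as well.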
The question of how to choose each player's best years wasn't the last
one we needed to answer. We also needed to choose the period of baseball
history from which we would draw the players, and how to create a level
playing field in the face of changes to the rules, equipment, ballparks,
and other aspects of the game.
How far back in history should we go? In the early years of professional
baseball, leagues came and went, teams often folded or moved during the
season, and rules were changed on a regular basis.
We've chosen 1894 as our starting point because the rules had settled
down to something very much like what we use today. One could argue that
1903 would be an even better choice because it was the first time foul
balls counted as strikes in both leagues, but we didn't want to exclude
the top players from the 1890s.
How close to the present should we go? We decided to go all the way.
Our greatest players disk will include active players who have met our
minimum playing time limits and have performed at a high enough level
to qualify.
An obvious implication of this decision is that a year from now, more
current players will have met that minimum, and we'll have one more year
of performance upon which to rate the active players. As a result, we
expect to release updates to this disk as time goes on. (Schedule and
pricing to be determined.)
How do we level the playing field? We take each player season, adjust
the stats for the effect of the player's home park, and evaluate those
park-adjusted stats relative to the appropriate set of league averages.
This is no different from how we rate players on a single-season basis.
In other words, we're adjusting every player's stats for the time and
park in which he played. We don't want to overrate pitchers from a low-offense
era like the 1960s or overrate hitters from a high-offense era like the
1930s or 1990s.
After we've decided which group of consecutive seasons is the player's
best, we combine them, giving more weight to seasons in which the player
had more plate appearances, and express the resulting set of park- and
era-adjusted statistics in terms of a neutral era.
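
Here is a minimal sketch of that combining step, written in Python. It is not our production method. It assumes each season has already been park-adjusted, that "relative to the league" means a simple ratio of the player's rate to the league rate, and that the neutral era is represented by a single league-average rate. The function name and the tuple layout are hypothetical.

```python
from typing import List, Tuple

# One season: (plate appearances, park-adjusted rate, league-average rate)
SeasonLine = Tuple[int, float, float]


def neutral_era_rate(seasons: List[SeasonLine],
                     neutral_league_rate: float) -> float:
    """Combine seasons, weighting by plate appearances, and express
    the result against a neutral league average (assumed ratio method)."""
    total_pa = sum(pa for pa, _, _ in seasons)
    if total_pa == 0:
        raise ValueError("no plate appearances to combine")
    # PA-weighted average of each season's rate relative to its league.
    relative = sum(pa * (rate / league)
                   for pa, rate, league in seasons) / total_pa
    # Re-express that relative performance in the neutral era.
    return relative * neutral_league_rate
```

For example, a batter who was 20% better than his league over 600 plate appearances and exactly average over 400 more comes out 12% better than average overall, and that 12% is then applied to the neutral-era rate.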
All of these examples were expressed in terms of batters, but the concepts
apply equally to pitchers. They're also being rated based on era- and
park-adjusted stats. And we're making similar historical adjustments when
assigning error rates to fielders.
Our belief is that we're in the process of creating something unique.
We're not aware of any other systematic approach to rating historical
and current players based on peak years with adjustments for era and park
effects.
We hope to have our All-time Greatest Players disk available for shipment
in June. When we've set the price and have a more precise fix on the projected
ship date, we'll add this product to our price list and web store and
begin taking orders. In the meantime, we'll keep you posted via our web
site as more information becomes available.
Version 9 Update
Although we've chosen to write about our All-time Greatest Players project
in this newsletter, it doesn't mean that we've forgotten about version
9. We continue to make good progress on version 9 in parallel with our
greatest players project.
In February, we created a version
9 page on our web site and described some of the features we're
working on. As time goes on, we may add to the features list, and we'll
get a better handle on the projected ship date. As we know more, we'll
let you know through the web site and this newsletter.
Diamond Legends and ESPN Classic Fantasy Baseball
Many of you already know this, but in case you don't, here's a reminder
about two great games that use baseball simulation technology from Diamond
Mind:
Diamond Legends (http://legends.stats.com/intro.asp)
ESPN Classic Fantasy Baseball (http://games.espn.go.com/legends/frontpage)
Both games enable you to join a league, select from a large pool of
historical players to build the best team you can within your budget,
set up a manager profile, and play a 154- or 162-game season against the
other teams in your league.
A custom version of the Diamond Mind game engine is used to play the
games. As the season progresses, you can log in to the web site any time
to make roster moves, view boxscores, check the standings, browse the
leader boards, and pore over a wide range of statistical reports.
Easy as 1-2-3
Closers always get a lot of attention, but it seems as if they've been
in the news even more than usual this year:
- the Red Sox took the money they would have spent on Ugueth Urbina,
signed several veteran relievers (Embree, Timlin, Mendoza, Fox), and announced
that it doesn't make sense to save your best pitcher for the 9th inning
when games are often decided in the 7th or 8th
- several of the game's leading closers (Nen, Hoffman, Isringhausen,
Rivera) have yet to pitch because of injury
- some established closers (Benitez, Sasaki) have blown more than their
share of saves already this year, while others (Jimenez, Williams, Koch)
have sky-high ERAs that haven't yet translated into blown saves
- on one recent day (Sunday, April 27), Kansas City couldn't hold a 9-4
lead in the ninth, St. Louis coughed up a 5-run lead in the 9th and were
forced to go 20 innings before taking home the win, and Boston's pen failed
to hold a 4-2 lead handed over by Pedro Martinez.
I'm not going to try to sort all this out at once, but there's one aspect
of closer performance that has been tugging at my mind for a while. It's
the messy save.
The closer comes in to start the 9th with a two-run lead. He walks the
leadoff hitter, gets a strikeout, gives up a single, gets the second out
on a long fly ball, then blows away the last hitter with high heat.
Mission accomplished. The team wins. The closer notches another save.
But the fans are ticked off because it wasn't clean enough:
"I almost had a heart attack! The tying run was on base. The potential
winning run was at the plate. For the money we pay these guys, they ought
to be able to retire the side in order, wouldn't you think?"
So I decided to look at one-two-three innings. How often do they happen?
Are they more common for starters or relievers? How much different is
the 9th inning? Does the score matter?
I've been playing around with this analysis, but there's still some work
to be done. Time permitting, I'll turn this into a full article sometime.
For now, here are some of the early findings based on the period from
1998 to 2002:
- approximately 29% of all innings were 1-2-3 innings
- there wasn't much of a difference between starting pitchers (29.1%)
and relievers (28.4%)
- when a team was leading by 1 to 3 runs in the 7th inning or later,
1-2-3 innings were registered 29.4% of the time
- the highest rates of 1-2-3 innings were recorded in the 9th. Starting
pitchers polished off their foes in order 31.3% of the time while relievers
posted a 30.4% mark.
There is a large amount of selection bias here -- the guys who get to
start the 9th inning in a close game are not average pitchers; they're
much more likely to be established closers or starters with names like
Johnson, Schilling, or Martinez.
- when a team entered the 9th inning with a lead of one to three runs,
the rate of 1-2-3 innings was 32.5%, or just less than one third. Again,
the top starters and closers get most of these opportunities, so there
is some bias here.
In other words, if you can get through an inning without allowing a baserunner
about 1/3 of the time, you're doing pretty well. Of course, that means
that if you've been asked to protect a 1-run or 2-run lead, you're going
to bring the tying or winning run to the plate more often than not. And
some fans might think you've come up a little short even if you get out
of the jam with the lead intact.
Here are the top five 1-2-3 performances in 2002 by pitchers when starting
the 9th inning with a lead of 1 to 3 runs (min 20 innings):
              Inn  123   Pct
Isringhausen   31   16  .516
Smoltz         51   25  .490
Guardado       48   23  .479
Percival       34   16  .471
Wagner         35   16  .457
The bottom five:
              Inn  123   Pct
Irabu          20    5  .250
Alfonseca      25    6  .240
Kim            32    6  .188
Acevedo        29    5  .172
Yan            25    4  .160
And some middle-of-the-pack guys of note:
              Inn  123   Pct
Koch           47   17  .362
Urbina         41   14  .341
Nen            45   15  .333
Julio          26    7  .269
Benitez        34    9  .265
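
If you'd like to run this kind of count on your own play-by-play data, here is a minimal Python sketch. It is not the program used for this study. It assumes each inning is recorded as a list with one True/False entry per batter (True if he reached base), and it treats a 1-2-3 inning as exactly three batters with none reaching. Both function names are made up for the example.

```python
from typing import List


def is_one_two_three(reached_base: List[bool]) -> bool:
    """True if exactly three batters came up and none reached base."""
    return len(reached_base) == 3 and not any(reached_base)


def one_two_three_pct(clean_innings: int, innings: int) -> str:
    """Format clean innings / total innings the way the Pct column does."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return f"{clean_innings / innings:.3f}".lstrip("0")


# The messy save described earlier: walk, strikeout, single, fly out, strikeout.
messy_save = [True, False, True, False, False]
print(is_one_two_three(messy_save))   # False
print(one_two_three_pct(16, 31))      # .516
```

Note that under this definition an inning with a walk erased by a double play does not count, since a runner reached base.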
Staying with these 9th inning, small-lead situations, but looking at
the non-closers:
- Steve Karsay (6 for 9), Damaso Marte (5 for 9), and Scott Williamson
(4 for 8) were impressive in limited opportunities
- TJ Tucker (0 for 6); Steve Kline and Dan Smith (0 for 4); Danys Baez,
Mike Stanton and Mike Myers (1 for 6); Matt Herges (1 for 8); Bob Wickman
(4 for 19); and Braden Looper (3 for 14) didn't inspire as much confidence,
at least in terms of 1-2-3 innings
The bottom line is that fans shouldn't be surprised when the other team
threatens in a save situation. It's not easy to get a 1-2-3 inning, and,
more often than not, even the best closers are going to give the other
guys a chance to tie or win the game on one swing.
Distributing League Databases
We got a call this morning from someone who had joined a DMB league and
wanted to buy the DMB game. Because the league is based on the 2002 Season
Disk, we said he'd need to buy the game and the season disk in order to
play in the league. He said the league commissioner had told him that
all he needed was the game because the stats and ratings would be distributed
by the league.
Whether he knew it or not, the commissioner was wrong. Our license agreement
clearly states that it is a violation of copyright law to distribute a
league database to members who do not already own the related season disk(s).
We spend a lot of time and money to acquire and evaluate the play-by-play
data needed to put our annual season disks together. Drafting new rosters
doesn't change the fact that a league database is full of stats and ratings
that are copyrighted by Diamond Mind and STATS, Inc.
Most league commissioners understand this and work with us to make sure
they're not inadvertently sending copies of our products to league members
who aren't entitled to receive them. Obviously, as is evident from that
phone call, there are some people who either aren't aware or don't care.
We have always tried to deal honestly with our customers. For example,
we offer a money-back guarantee so you can return products that don't
meet your needs. We provide free technical support because we feel that
anyone who spends their hard-earned money with us deserves our help when
they need it. And we have resisted using copy-protection schemes because
they inconvenience the many honest customers who simply want to make backups
or do reasonable things like install the game (for their personal use)
on a home computer and a laptop at the same time.
To the majority of you who have always respected our hard work and our
legal rights, we will always be grateful for your business and your support.
To others who might be violating our rights by distributing league databases
to non-owners of the relevant season disk, either knowingly or unknowingly,
please do the right thing.