Carpenter, Rivera to Take 2005 Cy Young Awards, Mathematicians Predict
Husband-and-wife team combine love of baseball, math to predict
sportswriters’ voting results
[Update added November 10: The mathematicians' model turned out
to be more accurate than even they thought.]
November 3, 2005—Pitchers Chris Carpenter of the St. Louis Cardinals
and Mariano Rivera of the New York Yankees will win the 2005 Major League Baseball
Cy Young awards, predicts a pair of mathematicians from Rhode Island College.
The actual winners, intended to represent the most outstanding American League
and National League pitchers during the regular season, will be announced November
8 (AL) and 10 (NL) by the Baseball Writers’ Association of America, whose
members vote on the award.
Mathematicians Rebecca Sparks and David Abrahamson, a husband-and-wife team
who teach at Rhode Island College, have developed a formula that predicts which
pitchers will place first through third in Cy Young voting. The researchers structured
their formula to predict the voting results for starting pitchers, who almost
always win the award, rather than relief pitchers, who are rarely the recipients.
However, their formula reveals a lack of standout American League starting pitchers
this year, suggesting that the AL award will go to relief pitcher Mariano Rivera
for his extraordinary 2005 season.
Sparks and Abrahamson presented their model in the April 2005 issue of Math
Horizons, a magazine published by the Mathematical Association of America (MAA).
Abrahamson will discuss the model in a talk about math and sports at a regional
MAA meeting to take place at the University of New Hampshire on November 18 and
19, 2005.
Every season, the baseball writers’ association selects two sportswriters
from every city in the major leagues to vote for a first, second and third place
choice. The ballots are due right after the regular season ends. “The identities
of the voters change frequently,” Sparks and Abrahamson write in their
Math Horizons article, “but we will see that their voting results follow
a predictable course.”
The pair took an extremely pragmatic approach in developing a method to forecast
Cy Young winners. They did not consider which pitchers should win the award,
or which qualities were most important in a pitcher. They simply aimed to develop
a mathematical formula that would best match the voting results.
Their formula computes a score for each pitcher on a scale from roughly 0
to 10. For their formula to be successful, it must assign the top score in a particular
season to the pitcher who places first in Cy Young voting, the next-highest score
to the player who places second, and the third-highest score to the player who
places third.
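The success criterion described above can be sketched as a simple ordering check. This is an illustration only, not the researchers' code: the function names `top_three` and `matches_vote` are hypothetical, the scores are the 2005 National League values listed later in this release, and the vote order passed to `matches_vote` is a placeholder rather than an actual ballot result.

```python
def top_three(scores):
    """Return the three highest-scoring pitchers, best first."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:3]


def matches_vote(scores, vote_order):
    """True if the model's top three equal the voters' top three, in order."""
    return top_three(scores) == list(vote_order)


# Model scores for 2005 NL starters, as reported in this release.
nl_2005 = {
    "Chris Carpenter": 6.4257,
    "Dontrelle Willis": 6.3420,
    "Roy Oswalt": 5.9064,
}

# Placeholder vote order (hypothetical, for demonstration only).
hypothetical_vote = ["Chris Carpenter", "Dontrelle Willis", "Roy Oswalt"]

print(top_three(nl_2005))
print(matches_vote(nl_2005, hypothetical_vote))
```

A season counts as a success for the formula only when the two orderings agree in all three places, which is why the check compares whole lists and not just the winner.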
To calculate the scores, they first chose four key pitching statistics: wins,
losses, strikeouts, and ERA (earned run average, which is the average number
of runs that the pitcher is responsible for giving up per 9 innings of play).
They also included a fifth statistic, the winning percentage of the pitcher’s
team, as they thought that it influences the voting results.
But the main question, according to the two researchers, is how much importance
the voters placed on each of those five categories. Do voters, consciously or
unconsciously, generally value a pitcher’s number of wins more than his
number of strikeouts? Does a pitcher on a first-place team really have a better
chance of winning the award than a pitcher with slightly better stats on a last-place
team?
The tools of mathematics can answer this seemingly subjective question. First,
the researchers looked up the statistics in those five categories for starting
pitchers between 1993 and 2002 and compared them to the Cy Young voting results
for those years.
Then, to determine the relative importance of each of the five categories
in the voting results, they turned to a mathematical method, dating to the 1940s,
called linear programming. First developed by economists (who won the Nobel Prize
for work that employed it) and mathematician George Dantzig, the method finds
the missing numbers (in this case, the relative importance or “weight” of
each pitching category in the voting) that satisfy certain constraints
(here, that the formula correctly yield the first- through third-place results
for Cy Young balloting).
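One plausible way to write such a problem down is sketched here. It is an assumption for illustration, not the formulation published in Math Horizons. Let x_p be the vector of five statistics for pitcher p, scaled so that larger is better, and let w be the unknown weight vector. For each league-season s, let a_s, b_s and c_s be the pitchers who placed first, second and third. The margin m and the normalization of w are assumptions introduced here:

```latex
\begin{aligned}
\max_{w,\,m}\quad & m \\
\text{subject to}\quad
& w \cdot x_{a_s} - w \cdot x_{b_s} \ge m, \\
& w \cdot x_{b_s} - w \cdot x_{c_s} \ge m, \\
& w \cdot x_{c_s} - w \cdot x_{q} \ge m
  \quad \text{for every other starter } q \text{ in season } s, \\
& \sum_{i=1}^{5} w_i = 1, \qquad w_i \ge 0 .
\end{aligned}
```

Every constraint is linear in w and m, which is what makes this a linear program. A positive optimal m would mean that a single set of weights reproduces every ordering. Because the release reports one miss over 1993 to 2002, a real fit would presumably have to relax at least one of these constraints.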
Analyzing the 1993 to 2002 data, they concluded that a pitcher’s number
of wins carried almost three times as much weight in the voting as his earned
run average. ERA, in turn, was about one-and-a-half times as important as
strikeouts, and about twice as important as the winning percentage of the pitcher’s
team. Almost completely insignificant, according to the model, is a pitcher’s
number of losses; they seemed to have very little bearing on the voting results.
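The ratios quoted above can be turned into a set of relative weights. The sketch below is a rough illustration under stated assumptions, not the researchers' actual formula: ERA is taken as the reference weight, the approximate ratios are treated as exact, the weight for losses is set to zero as a placeholder for "almost completely insignificant," and each statistic is assumed to be pre-scaled to the range 0 to 1 with 1 as the best value. The names `RELATIVE`, `normalized_weights` and `score` are hypothetical.

```python
# Relative importance implied by the ratios quoted in the release.
# ERA is the reference (1.0); all values are approximations.
RELATIVE = {
    "wins": 3.0,              # "almost three times" the weight of ERA
    "era": 1.0,               # reference category
    "strikeouts": 1.0 / 1.5,  # ERA is about 1.5 times strikeouts
    "team_win_pct": 0.5,      # ERA is about twice team winning percentage
    "losses": 0.0,            # placeholder for "almost completely insignificant"
}


def normalized_weights(relative):
    """Rescale relative weights so that they sum to 1."""
    total = sum(relative.values())
    return {name: value / total for name, value in relative.items()}


def score(stats, weights, scale=10.0):
    """Weighted sum of pre-scaled statistics (0 = worst, 1 = best).

    The 0-to-1 scaling of each statistic is an assumption of this sketch;
    the release does not say how the raw statistics were scaled.
    """
    return scale * sum(weights[name] * stats[name] for name in weights)


weights = normalized_weights(RELATIVE)
for name, value in weights.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions wins account for well over half of the total weight, which matches the release's point that voters lean most heavily on a pitcher's win total.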
By taking each pitcher’s statistics in these five categories and adjusting
their values according to these relative weights, the researchers’ formula
correctly yielded all but one of the first-, second- and third-place vote-getters
in each league from 1993 to 2002. Recently, they incorporated the data for the
2003 and 2004 seasons into the model, and predicted three out of four Cy Young
winners (the fourth was a reliever). By looking at the 2003 and 2004 statistics,
they again found that the relative weights of the five categories were almost
exactly the same as in the earlier data.
Using their formula, the researchers come up with the following predictions
for the first three places in the 2005 National League voting:
- Chris Carpenter, St. Louis (6.4257 points)
- Dontrelle Willis, Florida (6.3420)
- Roy Oswalt, Houston (5.9064)
According to Abrahamson, it is possible that voters may drift away from their
past behavior by voting for Roger Clemens or Andy Pettitte ahead of Roy Oswalt
this year.
Clemens and Pettitte are generally better-known veterans who may have a somewhat
higher profile in the news media than Oswalt.
In the American League, the top starters in their model are, in order:
- Bartolo Colon, LA/Anaheim (5.8074)
- Johan Santana, Minnesota (5.3671)
- Jon Garland, Chicago (5.0730)
The model shows that there is no standout starter in the American League this
year. Bartolo Colon, the top starter according to their model, has a total score
of less than 6, a far cry from many AL Cy Young award winners in years past,
such as Barry Zito (6.75, 2002) and Pedro Martinez (7.54, 1999).
“Our model quantifies the fact that there is no AL pitcher who will
knock the voters’ socks off,” says Abrahamson. Therefore, Sparks
says the two are “very confident” that the AL Cy Young Award will
go to Mariano Rivera, a relief pitcher who had a particularly outstanding year.
A Cy Young for Rivera, they say, would also serve as a kind of “lifetime
achievement award” as Rivera, who has never earned the award, is likely
nearing the end of a very distinguished career.
The researchers think that their mathematical approach, known generally as “constrained
optimization,” might work for other sports awards, such as the most valuable
player in various leagues. It also might help provide insights into how magazines
rank corporations or top colleges. But the point of their approach, they say,
is to show how the methods of mathematics can apply in many unexpected everyday
situations.
“The moral is always the same for the mathematical modeler,” they
write in their Math Horizons article. “More often than we may know, there
is a pattern out there. We just have to keep thinking creatively, and we have
got a good chance of finding it.”
Reference:
Rebecca L. Sparks and David L. Abrahamson, “A Mathematical Model to
Predict Award Winners,” in Math Horizons, April 2005.
For More Information:
Rebecca Sparks, 401-456-9881, rsparks@ric.edu
David Abrahamson, 401-456-9862, dabrahamson@ric.edu
Department of Mathematics and Computer Science
Rhode Island College
Contact:
Ben Stein, 301-209-3091, bstein@aip.org
Martha Heil, 301-209-3088, mheil@aip.org
American Institute of Physics