D B M

Posts

Building Retrosheet Database on a Mac

Okay, this post is going to be a long one, but by the end of it you will have created a Retrosheet database for MySQL or MariaDB. There is a similar post on Beyond the Boxscore called Saberizing a Mac #9: Retrosheet (part1); however, it is severely lacking in details on how to actually get the database set up on your Mac.

Installing Homebrew

In order to parse the .EVN and .EVA event files from Retrosheet, we need to install the Chadwick tools. Before we can do that, we need to install Homebrew, which is a package manager for OS X.

Download XCode for Mac

You can install XCode directly from the Mac App Store. We need this to install the command-line tools.

Get the Command-Line Tools

Open up your terminal which is located in /Applications/Utilities/, and in the terminal run the following command.

$ xcode-select --install

Install Homebrew

Now that we have the command-line tools, we can finally download and install Homebrew. Keep your terminal app open and run:

$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Once it is installed, run brew doctor. If you get the message Your system is ready to brew, you are golden. If you do not, this post is an excellent troubleshooting resource.

Install MariaDB

If you already have MySQL or MariaDB installed on your computer, you can skip this section and move on to installing Chadwick. Once we have Homebrew installed, we can install MariaDB, which is a fork of MySQL that has better performance. Reasons for choosing MariaDB over MySQL can be found here. To install it, run:

$ brew install mariadb

Once that has completed, run:

$ unset TMPDIR
$ mysql_install_db

You can set the MariaDB/MySQL server to start every time you log in to your computer, but I prefer to start it manually whenever I need it. To start it, run the following in your terminal:

$ mysql.server start

Once the server is running, logging in is very easy:

$ mysql -u root
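
With the server running, a natural next step is to create an empty database to hold the Retrosheet tables. The following is a minimal sketch rather than a step from the original walkthrough: the database name retrosheet and the helper name create_retrosheet_db are assumptions chosen for illustration, and it presumes the passwordless root login shown above.

```shell
# Sketch: create an empty database for the Retrosheet tables.
# The default name "retrosheet" is an assumption; pass another name to override it.
create_retrosheet_db() {
  db="${1:-retrosheet}"
  # Bail out early if the mysql client is not on the PATH.
  if ! command -v mysql >/dev/null 2>&1; then
    echo "mysql client not found; install MariaDB or MySQL first" >&2
    return 1
  fi
  # IF NOT EXISTS makes the command safe to run more than once.
  mysql -u root -e "CREATE DATABASE IF NOT EXISTS ${db};"
}
```

Calling create_retrosheet_db with no argument creates a database named retrosheet, or leaves an existing one in place. Running SHOW DATABASES; at the mysql prompt should then list it.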

Install Chadwick

A couple of weeks ago I was finally able to get Chadwick added to Homebrew. This makes installing it so much easier on a Mac.

$ brew install chadwick

Chadwick is an open-source program that allows one to parse the event files from Retrosheet. Suppose I have the event files from the 2014 season, which can be downloaded directly from Retrosheet. I then unzip the files and navigate into the directory in which they were unzipped.
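
From inside that directory, parsing the season could look something like the sketch below. It assumes the unzipped files follow Retrosheet's usual naming (for example 2014ATL.EVN and 2014BAL.EVA, plus a TEAM2014 file) and that Chadwick's cwevent tool is on the PATH. The helper name parse_events and the output file name are hypothetical.

```shell
# Sketch: turn one season of Retrosheet event files into a single CSV.
# Run it from the directory that holds the unzipped .EVN/.EVA files.
parse_events() {
  year="$1"
  out="events_${year}.csv"   # hypothetical output file name
  # Bail out early if Chadwick is not installed.
  if ! command -v cwevent >/dev/null 2>&1; then
    echo "cwevent not found; try 'brew install chadwick'" >&2
    return 1
  fi
  # -y sets the season year; -f 0-96 requests all of the standard fields.
  cwevent -y "$year" -f 0-96 "${year}"*.EV* > "$out"
}
```

Running parse_events 2014 would then write one row per play to events_2014.csv, which can later be loaded into a database table.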

Read more

Atlanta Braves WAR Comparison

The Atlanta Braves have had an interesting season, to say the least. Coming into this year, they traded away many of their best performers from last season. This led a lot of pundits (including myself) to conclude that the Braves were going to struggle mightily this season. So far we have been wrong, as the Braves sit at 29-30 and just 2.0 games back as of June 10. Their record would undoubtedly be better if their relievers had not blown a few games this year. This is evident from the fact that their relief pitching ranks 29th in RA/G (4.41), and the table of WAR values below shows that most of the Braves’ players who have been performing below replacement level are relievers.

The Comparison

The table below compares how the players for the Atlanta Braves have done in terms of Wins Above Replacement. Baseball-Reference.com WAR is denoted by bWAR, FanGraphs.com WAR by fWAR, and the open-source version of WAR (openWAR), which uses R and the MLB Gameday API, is denoted by oWAR. The repository for openWAR can be found on GitHub. The academic paper written by the creators, Ben Baumer and Gregory Matthews, which explains openWAR better than I am able to, can be found here. bWAR and oWAR include games up to June 9th, and fWAR includes games up to June 10th. Also, the table is sortable.

Player Position bWAR fWAR oWAR
Freddie Freeman 1B 1.65 1.7 2.95
Cameron Maybin CF 0.93 1.3 2.1
Jace Peterson 2B 1.13 1.1 1.74
Shelby Miller SP 2.95 1.2 1.73
Alex Wood SP 1.32 1.6 1.39
Nick Markakis RF 0.59 1.1 1.25
A.J. Pierzynski C -0.44 0.5 1.03
Kelly Johnson UT 0.54 0.7 1.02
Andrelton Simmons SS 1.52 1.4 0.85
Todd Cunningham OF 0.44 0.4 0.61
Jonny Gomes LF -0.31 -0.5 0.59
Luis Avilan RP 0.39 0.3 0.53
Chris Johnson 3B -0.24 -0.1 0.35
Brandon Cunniff RP 0 0.1 0.34
Williams Perez SP 0.56 0.5 0.28
Michael Kohn RP 0.18 0 0.23
Juan Uribe 3B 0.29 0.2 0.23
Mike Foltynewicz SP -0.06 0.5 0.16
Phil Gosselin UT 0.27 0.3 0.13
Jim Johnson RP 0.07 0.1 0.12
Jason Grilli RP 0.01 0.4 0.11
Eric Young CF -0.59 -0.6 0.04
Andrew McKirahan RP 0 0.1 0.04
Pedro Ciriaco UT 0.12 -0.1 0
Juan Jaime RP -0.04 -0.1 -0.07
Nick Masset RP -0.26 -0.3 -0.08
Cody Martin RP -0.31 -0.2 -0.09
Sugar Marimon RP -0.18 -0.1 -0.12
Joey Terdoslavich OF -0.1 0.1 -0.17
John Cornely RP -0.13 -0.1 -0.3
Christian Bethancourt C 0.35 -0.2 -0.3
Donnie Veal RP -0.41 -0.4 -0.39
Julio Teheran SP -0.6 -0.3 -0.73
Trevor Cahill RP -0.77 -0.1 -1.17

The Code

I gathered all this information using the R programming language and RStudio. I will walk you through how I performed the operations.

First, we must install the openWAR and openWARData R packages and then load them, and we will need the dplyr package as well.

devtools::install_github("beanumber/openWAR")
devtools::install_github("beanumber/openWARData")
library(openWAR)
library(openWARData)
library(dplyr)

After we have all the packages we need installed, we can download our play-by-play data using openWAR, and then compute the open-source version of WAR. Note that this will take a little while, depending on your internet connection.

MLBAM2015 <- getData(start="2015-04-05", end="2015-06-09")
# make oWAR
ds <- makeWAR(MLBAM2015)
# tabulate oWAR
oWAR <- getWAR(ds$openWAR)

Next, we can download the data from Baseball-Reference using the getrWAR() function from the openWARData package. To get the FanGraphs WAR, we can use the getfWAR() function I wrote. Also, let’s go ahead and download the player ID map from Crunch Time Baseball.

rWAR <- getrWAR()
fWAR <- getfWAR(2015)
id <- read.csv("http://crunchtimebaseball.com/master.csv")
# We only need the mlb_id, bref_id, and fg_id columns
id <- select(id, mlb_id, bref_id, fg_id)

Now that we have all the data, it is just a matter of putting it all together.

# filter rWAR so that only this year's Atlanta Braves WAR is included
rWAR <- filter(rWAR, yearId == 2015, teamId == "ATL")
# merge rWAR and playerIDs
braves <- merge(rWAR, id, by.x="playerId", by.y = "bref_id")
# merge braves and fWAR
braves <- merge(braves, fWAR, by.x="fg_id", by.y="playerId")
# merge braves and oWAR
braves <- merge(braves, oWAR, by.x="mlb_id", by.y="playerId")
# finally select only the columns that are interesting 
# you do not have to do this, it just makes the data cleaner
braves.WAR <- select(braves, Name, rRAA_bat, rRAA_field, rRAA_pitch, rWAR, 
                            fRAA_bat, fRAA_br, fRAA_field, fWAR_pitch, fWAR, 
                            RAA.br, RAA.off, RAA.field, RAA.pitch, WAR )

Read more

Comparing the Jason Heyward/Shelby Miller Trade using openWAR, bWAR, and fWAR

For those who are not familiar with openWAR, it is a package for the data and statistics programming language R, created by Ben Baumer and Gregory Matthews. It allows us to download game data from the MLBAM GameDay web application and use it to compute an open-source implementation of wins above replacement. More information on how to use openWAR can be found at the Exploring Baseball Data With R blog.

The Trade

One of the biggest moves this offseason was when Jason Heyward and Jordan Walden were traded from the Atlanta Braves to the St. Louis Cardinals for Shelby Miller and minor league prospect Tyrell Jenkins. Obviously, there were many differing opinions on the quality of the trade from the Atlanta Braves’ point of view. I understood the trade, but 1) did not completely believe the Braves front office when it said Heyward had no interest in extension talks and 2) really had this kid-like love for Jason Heyward. However, these biased views of mine aside, I thought the trade would turn out well for Atlanta.

So far, my gut feeling has been right, given Jason Heyward’s struggles this season. Walden has been good for the Cardinals, but is currently expected to miss 6-10 weeks with a muscle strain in his shoulder. As much as Heyward has struggled thus far, Shelby Miller has thrived. In his last start, Miller was one out away from a masterful no-hitter against the Miami Marlins. Now that we have that out of the way, let’s look at what the three different implementations of WAR say about the trade.

Player openWAR bWAR fWAR
Shelby Miller 1.52 2.27 1
Jason Heyward 0 0.18 0.2
Jordan Walden 0.56 0.56 0.4

Read more

An Example of Iteration of Strictly Dominated Strategies

Suppose we have a game that has two players, Player 1 and Player 2. Player 1 has two choices, Up \((U)\) or Down \((D)\), and Player 2 has three choices: Left \((L)\), Center \((C)\) and Right \((R)\). This game is represented by the normal form provided below. We are going to attempt to find the set of rationalizable strategies using an iterated-dominance procedure.

normal form of game

Iteration 1

In the normal form above, notice that if Player 2 chooses strategy \(C\), he or she will always receive a lower payoff than if he or she had chosen strategy \(L\). We can say that strategy \(C\) is strictly dominated by \(L\) because,

Therefore, if Player 2 is rational, he or she will never choose strategy \(C\). We can show the elimination of this strategy by updating the normal form of this game.
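
Written out in general terms, the dominance condition used in this iteration has the following form. The notation is an assumption, since the payoff table is not reproduced here: \(u_2(s_1, s_2)\) denotes Player 2's payoff when Player 1 plays \(s_1\) and Player 2 plays \(s_2\).

\[ u_2(s_1, L) > u_2(s_1, C) \quad \text{for every } s_1 \in \{U, D\}. \]

In words, \(L\) pays Player 2 strictly more than \(C\) no matter what Player 1 does.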

normal form after first iteration

Iteration 2

After we have eliminated strategy \(C\) for Player 2, strategy \(U\) is strictly dominated by \(D\) for Player 1 since,

So if Player 1 knows that Player 2 is rational, then Player 1 knows that Player 2 will never choose strategy \(C\). Therefore, if Player 1 is rational, he or she will never choose strategy \(U\). The normal form for this game now shows that strategies \(U\) and \(C\) have been eliminated.
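
The second iteration applies the same test to Player 1, but only against the strategies of Player 2 that survived the first round. Assuming \(u_1(s_1, s_2)\) denotes Player 1's payoff (notation not used in the original), the condition reads:

\[ u_1(D, s_2) > u_1(U, s_2) \quad \text{for every } s_2 \in \{L, R\}. \]

Note that \(C\) is excluded from the set. The comparison only has to hold in the reduced game, not in the original one.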

normal form after second iteration

Read more