2017 CFB Bowl Predictions

CFB Bowl Predictions
college football
R
python
Author

Meyappan Subbaiah

Published

December 20, 2017

Last year at work, I was part of the college bowl pick’em. Nothing new; I’d done CFB bowl pick’ems with my college roommates and others before. But last year I decided to scrape the data and use different machine learning (ML) algorithms to predict the winners.

Note: I treated this as a binary classification problem (1 = win, 0 = loss). For those curious about the ML algorithms and the parameters selected, there is a follow-up post.
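For concreteness, here is a minimal sketch of that framing, assuming a hypothetical `games` data frame with one row per matchup and the final scores already scraped:

```r
# Sketch only: `games` is a hypothetical data frame with one row per game
# and scraped home/away final scores.
games$home_win <- as.integer(games$home_score > games$away_score)  # 1 = win, 0 = loss
table(games$home_win)  # quick sanity check of the class balance
```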

Bowl Predictions

Prediction accuracies will be updated as bowl season progresses. I’m going to include four different sets of predictions: (1) personal picks, (2) Algorithm 1 - Random Forest, (3) Algorithm 2 - XGBoost, and (4) Algorithm 3 - Support Vector Machines (SVM).

Let’s look at what each algorithm thinks is important. These were all the stats available for each of the 128 teams in NCAA-FBS (Division I):

##  [1] "3rd Down Conversion Pct"         "3rd Down Conversion Pct Defense"
##  [3] "4th Down Conversion Pct"         "4th Down Conversion Pct Defense"
##  [5] "Fewest Penalties Per Game"       "Fewest Penalty Yards Per Game"  
##  [7] "First Downs Defense"             "First Downs Offense"            
##  [9] "Kickoff Returns"                 "Net Punting"                    
## [11] "Passing Offense"                 "Passing Yards Allowed"          
## [13] "Punt Returns"                    "Red Zone Defense"               
## [15] "Red Zone Offense"                "Rushing Defense"                
## [17] "Rushing Offense"                 "Scoring Defense"                
## [19] "Scoring Offense"                 "Team Passing Efficiency"        
## [21] "Team Passing Efficiency Defense" "T.O.P"                          
## [23] "Total Defense"                   "Total Offense"                  
## [25] "Turnover Margin"
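Each of these team-level stats gets attached to a game twice, once for the home team and once for the away team (which is where feature names like home_turnover_margin come from). A rough sketch of that join, with hypothetical `team_stats` and `games` data frames and column names:

```r
# Sketch only: `team_stats` has one row per team with the stats listed above;
# `games` has one row per matchup. All column names here are assumptions.
home <- team_stats
names(home)[-1] <- paste0("home_", names(home)[-1])  # first column is the team name
away <- team_stats
names(away)[-1] <- paste0("away_", names(away)[-1])

model_data <- merge(games, home, by.x = "home_team", by.y = "team")
model_data <- merge(model_data, away, by.x = "away_team", by.y = "team")
```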

Algorithm 1 - Random Forest vs. Algorithm 2 - XGBoost

Random Forests (RF) are an ensemble learning method built on decision trees, and the fitted model can rank which variables matter most. XGBoost is an extreme gradient boosting method, also applied to decision trees, and like RF it can identify the important variables.
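As a rough sketch of how those importances can be pulled out in R, assuming the randomForest and xgboost packages and the hypothetical `model_data` from above (this is not the exact setup; the tuning details are in the follow-up post):

```r
library(randomForest)
library(xgboost)

# Hypothetical setup: keep only the home_/away_ stat columns as features,
# and use the 1/0 home_win column as the label.
drop_cols <- c("home_team", "away_team", "home_score", "away_score", "home_win")
stat_cols <- setdiff(grep("^(home|away)_", names(model_data), value = TRUE), drop_cols)
features  <- as.matrix(model_data[, stat_cols])
home_win  <- model_data$home_win

# Random forest: importance = TRUE stores mean decrease in accuracy / Gini
rf_fit <- randomForest(x = features, y = factor(home_win), importance = TRUE)
varImpPlot(rf_fit, n.var = 10)  # top 10 variables

# XGBoost: xgb.importance() reports gain/cover/frequency per feature
xgb_fit <- xgboost(data = features, label = home_win,
                   objective = "binary:logistic", nrounds = 100, verbose = 0)
xgb.importance(model = xgb_fit)[1:10, ]  # top 10 by gain
```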

Let’s see what we’ve got here.

Before we dive into this: the models were trained to classify/predict whether the home team wins. Above are the top 10 stats each model considers most important for predicting a home-team win. They both tend to cover the same spectrum; interestingly enough, XGB doesn’t include home_turnover_margin in its top 10 variables.

I’d say that overall it does a good job, since it captures (1) turnovers, (2) offensive capability, (3) special teams (field position), and (4) clutch conversions (third-down %). You could also argue that it captures defense, since turnovers and the total yards the other team gains both show up.

Support Vector Machines (Algorithm 3 - SVM) are a bit more complex, and hence don’t provide straightforward variable importances. If you want more detail on this, look for the follow-up post.
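That said, an SVM can still attach a probability to each pick, which is how a confidence can be reported for it in the table below. A minimal sketch using the e1071 package (my assumption for the package; `features` and `home_win` are the hypothetical objects from above, and `bowl_features` is a hypothetical matrix for the upcoming bowl matchups built the same way):

```r
library(e1071)

# Sketch only: probability = TRUE makes the SVM return class probabilities,
# which can serve as confidence values for each pick.
svm_fit <- svm(x = features, y = factor(home_win),
               kernel = "radial", probability = TRUE)

svm_pred <- predict(svm_fit, newdata = bowl_features, probability = TRUE)
attr(svm_pred, "probabilities")  # P(home win) / P(away win) for each game
```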

Anyway, let’s take a look at the predictions.

Predictions

| Away Team | Home Team | Algorithm 1 | A1 - Confidence | Algorithm 2 | A2 - Confidence | Algorithm 3 | A3 - Confidence | Actual |
|---|---|---|---|---|---|---|---|---|
| Troy | North Texas | North Texas | 0.74 | North Texas | 0.99 | North Texas | 0.77 | Troy |
| Georgia St. | Western Ky. | Western Ky. | 0.02 | Georgia St. | 0.11 | Western Ky. | 0.44 | Georgia St. |
| Boise St. | Oregon | Oregon | 0.81 | Oregon | 0.99 | Oregon | 0.66 | Boise St. |
| Marshall | Colorado St. | Colorado St. | 0.83 | Marshall | 0.88 | Colorado St. | 0.79 | Marshall |
| Fla. Atlantic | Akron | Fla. Atlantic | 0.77 | Fla. Atlantic | 0.97 | Fla. Atlantic | 0.71 | Fla. Atlantic |
| SMU | Louisiana Tech | SMU | 0.19 | SMU | 0.79 | Louisiana Tech | 0.05 | Louisiana Tech |
| Temple | FIU | FIU | 0.68 | Temple | 0.50 | FIU | 0.37 | Temple |
| UAB | Ohio | Ohio | 0.60 | Ohio | 0.99 | Ohio | 0.64 | Ohio |
| Wyoming | Central Mich. | Central Mich. | 0.36 | Central Mich. | 1.00 | Central Mich. | 0.75 | Wyoming |
| South Fla. | Texas Tech | South Fla. | 0.30 | South Fla. | 0.68 | South Fla. | 0.34 | South Fla. |
| Army West Point | San Diego St. | Army West Point | 0.19 | Army West Point | 0.37 | Army West Point | 0.38 | Army West Point |
| Appalachian St. | Toledo | Toledo | 0.54 | Appalachian St. | 0.84 | Toledo | 0.43 | Appalachian St. |
| Fresno St. | Houston | Houston | 0.80 | Houston | 1.00 | Houston | 0.73 | Fresno St. |
| West Virginia | Utah | Utah | 0.39 | West Virginia | 0.98 | West Virginia | 0.07 | Utah |
| Duke | Northern Ill. | Northern Ill. | 0.65 | Northern Ill. | 0.96 | Northern Ill. | 0.45 | Duke |
| UCLA | Kansas St. | UCLA | 0.80 | UCLA | 0.99 | UCLA | 0.35 | Kansas St. |
| Florida St. | Southern Miss. | Southern Miss. | 0.50 | Southern Miss. | 0.96 | Southern Miss. | 0.49 | Florida St. |
| Boston College | Iowa | Boston College | 0.37 | Boston College | 0.41 | Boston College | 0.09 | Iowa |
| Arizona | Purdue | Arizona | 0.27 | Arizona | 0.44 | Arizona | 0.79 | Purdue |
| Texas | Missouri | Missouri | 0.75 | Missouri | 1.00 | Missouri | 0.84 | Texas |
| Virginia | Navy | Navy | 0.90 | Navy | 1.00 | Navy | 0.92 | Navy |
| Oklahoma St. | Virginia Tech | Oklahoma St. | 0.33 | Oklahoma St. | 0.82 | Oklahoma St. | 0.43 | Oklahoma St. |
| Stanford | TCU | TCU | 0.50 | TCU | 0.99 | TCU | 0.54 | TCU |
| Michigan St. | Washington St. | Washington St. | 0.48 | Washington St. | 1.00 | Washington St. | 0.51 | Michigan St. |
| Wake Forest | Texas A&M | Texas A&M | 0.16 | Wake Forest | 0.55 | Texas A&M | 0.19 | Wake Forest |
| Kentucky | Northwestern | Northwestern | 0.69 | Northwestern | 0.99 | Northwestern | 0.34 | Northwestern |
| New Mexico St. | Utah St. | Utah St. | 0.52 | New Mexico St. | 0.94 | Utah St. | 0.23 | New Mexico St. |
| Ohio St. | Southern California | Southern California | 0.29 | Ohio St. | 0.50 | Ohio St. | 0.17 | Ohio St. |
| Louisville | Mississippi St. | Mississippi St. | 0.26 | Mississippi St. | 0.97 | Louisville | 0.19 | Mississippi St. |
| Iowa St. | Memphis | Memphis | 0.39 | Memphis | 1.00 | Memphis | 0.79 | Iowa St. |
| Washington | Penn St. | Penn St. | 0.01 | Penn St. | 1.00 | Penn St. | 0.37 | Penn St. |
| Miami (FL) | Wisconsin | Wisconsin | 0.82 | Wisconsin | 1.00 | Wisconsin | 0.78 | Wisconsin |
| Michigan | South Carolina | Michigan | 0.42 | Michigan | 0.64 | Michigan | 0.04 | South Carolina |
| Auburn | UCF | UCF | 0.01 | UCF | 0.98 | UCF | 0.14 | UCF |
| Notre Dame | LSU | Notre Dame | 0.16 | Notre Dame | 0.68 | Notre Dame | 0.34 | Notre Dame |
| Oklahoma | Georgia | Georgia | 0.30 | Oklahoma | 0.89 | Oklahoma | 0.13 | Georgia |
| Clemson | Alabama | Alabama | 0.00 | Alabama | 0.99 | Alabama | 0.18 | Alabama |

Note: the confidence of each prediction is the probability the model produced when predicting a home or away win. I’ll update the Actual column and the accuracies as the bowls are played.

I’m quite unhappy with a few picks: (1) UCLA over Kansas St. (I’m a huge Bill Snyder fan), (2) TCU over Stanford (picking against Stanford’s ground game?), and (3) the confidence in Bama beating Clemson scares me. 99% for XGB? JEEZ

I wish I had some conference- and schedule-based statistics as well. I’d love to incorporate strength of schedule (SOS), Power 5 opponents, etc. There are probably a million different things that could be useful. As I continue this yearly, I’ll look for more statistics to include. Let me know if you think there are any I should consider!

Final Results

Algorithm 1 - 16/37 = 43.2%
Algorithm 2 - 21/37 = 56.8%
Algorithm 3 - 15/37 = 40.5%
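
For reference, each accuracy is just the fraction of games where the pick matched the winner. A quick sketch, assuming the table above is loaded as a hypothetical `preds` data frame:

```r
# Hypothetical `preds` data frame mirroring the predictions table above
mean(preds$algorithm_1 == preds$actual)  # 16/37 = 0.432
mean(preds$algorithm_2 == preds$actual)  # 21/37 = 0.568
mean(preds$algorithm_3 == preds$actual)  # 15/37 = 0.405
```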

None of these are truly fantastic; I would like to break the 60% threshold. Just like last year, there’s a lot left to learn going forward. I’ll take a stab at this again for the 2018 bowl season. So long, folks!

Congrats to Alabama! TUAAAAAAAA!