
College Football Model

Motivation

There are already strong public college football models, but I wanted to understand how they are actually built. The best way to learn that was to build one from the ground up, control the data pipeline myself, and iterate on the modeling choices directly.

Project Overview

  • Engine: A college football analytics and simulation engine that ingests play-by-play and game data, stores it locally in SQLite, computes opponent-adjusted team ratings, and produces predictive game distributions. It supports real scheduled games, hypothetical matchups, weekly snapshots throughout the season, and renderer-facing exports into Postgres for a web app.
  • Frontend / Web App: A React + TypeScript web app backed by a renderer-oriented Postgres database. It is designed to surface rankings, game predictions, matchup previews, team pages, and hypothetical spread boards without exposing the modeling internals directly to the user.

What the Model Does

  • Rates teams using play-level offensive, defensive, and special teams data.
  • Predicts real game outcomes.
  • Predicts hypothetical matchups between any two teams.
  • Produces matchup-specific outputs, not just one-dimensional power rankings.
  • Can therefore rate A above B and B above C while still treating C as a difficult matchup for A in some cases.

Rating Architecture

The ratings backbone starts by turning plays into a small set of core metrics: pass success rate, rush success rate, pass explosive rate, rush explosive rate, and special teams / field goal ratings.
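The source does not spell out its exact thresholds, but the standard public definitions of these metrics can be sketched as follows (the down fractions and explosive yardages here are common conventions, assumed rather than confirmed):

```python
def is_success(down: int, distance: int, yards_gained: int) -> bool:
    # Common success-rate convention (an assumption, not the model's documented
    # thresholds): gain 50% of the distance on 1st down, 70% on 2nd,
    # and 100% on 3rd and 4th.
    fraction = {1: 0.5, 2: 0.7}.get(down, 1.0)
    return yards_gained >= fraction * distance

def is_explosive(play_type: str, yards_gained: int) -> bool:
    # Common explosive-play thresholds (also an assumption):
    # 15+ yard passes, 10+ yard rushes.
    threshold = 15 if play_type == "pass" else 10
    return yards_gained >= threshold
```

Aggregating these booleans per team and phase yields the pass/rush success and explosive rates that feed the rest of the pipeline.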

From there, the system builds team ratings through iterative opponent adjustment. L0 is raw. L1 and beyond are opponent-adjusted using the prior level, and after each level the rates are shrunk back toward priors instead of being allowed to move freely.

Those priors are currently driven by recruiting and talent-based fitted priors, optional previous-season blending, and configurable prior weights. There are also play-level context adjustments for home field, indoor/outdoor conditions, and optional game-state weighting and garbage-time filtering.
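A minimal sketch of the levelled adjustment loop, assuming team-level average rates and a simple additive opponent correction (the real play-level machinery, context adjustments, and prior weighting are richer than this):

```python
import numpy as np

def opponent_adjust(raw, schedule, prior, levels=3, shrink=0.25):
    """raw: team -> raw (L0) rate; schedule: team -> list of opponents faced;
    prior: team -> recruiting/talent-based prior rate.
    `levels` and `shrink` are illustrative values, not the model's."""
    rating = dict(raw)  # L0: raw rates
    for _ in range(levels):  # L1, L2, ...
        league_avg = np.mean(list(rating.values()))
        nxt = {}
        for team, opps in schedule.items():
            # credit or debit the raw rate by opponent strength
            # taken from the prior level's ratings
            opp_avg = np.mean([rating[o] for o in opps])
            adjusted = raw[team] + (opp_avg - league_avg)
            # shrink back toward the prior after each level
            # instead of letting rates move freely
            nxt[team] = (1 - shrink) * adjusted + shrink * prior[team]
        rating = nxt
    return rating
```

The shrink step is what keeps early-season ratings anchored to priors while the observed sample is still small.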

  • Weekly snapshots are leakage-safe, so ratings used for a game come from the snapshot before that game.
  • Only FBS-vs-FBS data is used in the core grading path.
  • The system is intentionally built to stay interpretable and avoid hardcoded team-specific logic.
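The leakage-safety rule above can be illustrated with a simple snapshot lookup (names and storage shape are hypothetical; the real pipeline persists snapshots in SQLite):

```python
def ratings_for_game(snapshots: dict, game_week: int):
    """snapshots: week -> ratings, each taken after that week's games.
    A week-N game may only see a snapshot from strictly before week N."""
    prior_weeks = [w for w in snapshots if w < game_week]
    if not prior_weeks:
        return None  # e.g. week-1 games fall back to preseason priors elsewhere
    return snapshots[max(prior_weeks)]
```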

Predictive Model Architecture

The current predictive engine is a custom in-house model implemented directly in Python rather than a framework-heavy neural net. The final game predictor does not wrap a PyTorch or scikit-learn model object; the core implementation lives in the codebase as a pure-Python, NumPy-based score distribution model.

Architecturally, it has two pieces. The first is a mean model that predicts expected points for each team using engineered pregame features and a log-link style setup so scoring stays positive. The second is an uncertainty model that predicts score variance from the same feature set, which lets the engine control how wide or narrow a given game’s scoring distribution should be.
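The two heads can be sketched as below, assuming linear features; the weight vectors, the softplus parameterization, and the sigma floor are illustrative choices, not the author's documented ones:

```python
import numpy as np

def expected_points(w: np.ndarray, x: np.ndarray) -> float:
    # mean head with a log-link style setup:
    # exp(w . x) keeps expected scoring positive
    return np.exp(w @ x)

def score_sigma(v: np.ndarray, x: np.ndarray, floor: float = 5.0) -> float:
    # uncertainty head: a feature-driven standard deviation,
    # with a floor (assumed value) so the distribution never collapses
    return floor + np.log1p(np.exp(v @ x))  # softplus keeps the add-on positive
```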

Those two outputs are then combined into an independent discrete normal score PMF for home and away teams. From that PMF, the model derives win probability, ATS probability, over/under probability, median scores, and expected scores. It is not predicting the winner directly. It predicts the scoring environment first and then derives the rest from that distribution.
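The derivation step can be sketched like this, with the tie mass split evenly as a simplifying assumption (real games resolve in overtime), and the spread/total conventions chosen here for illustration:

```python
import math
import numpy as np

def discrete_normal_pmf(mu: float, sigma: float, max_score: int = 100) -> np.ndarray:
    # mass at each integer score 0..max_score via normal CDF differences
    cdf = lambda x: 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))
    pmf = np.array([cdf(s + 0.5) - cdf(s - 0.5) for s in range(max_score + 1)])
    return pmf / pmf.sum()

def derive_probs(mu_h, sig_h, mu_a, sig_a, spread=0.0, total=55.5):
    ph = discrete_normal_pmf(mu_h, sig_h)
    pa = discrete_normal_pmf(mu_a, sig_a)
    joint = np.outer(ph, pa)  # independence assumption between the two scores
    h = np.arange(joint.shape[0])[:, None]
    a = np.arange(joint.shape[1])[None, :]
    margin = h - a
    # ties split 50/50 (simplification; CFB games go to overtime)
    win = joint[margin > 0].sum() + 0.5 * joint[margin == 0].sum()
    cover = joint[margin + spread > 0].sum()  # home -3.5 passed as spread=-3.5
    over = joint[(h + a) > total].sum()
    return {"home_win": win, "home_cover": cover, "over": over}
```

Everything downstream, from win probability to median scores, is a summary statistic of this PMF rather than a separately trained output.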

  • Active pregame features include offense pass and rush success, offense pass and rush explosive rate, opponent defensive success and explosive rates allowed, offense and defense overall ratings, special teams and field goal ratings, rush rate, pace, and home-field index.
  • All features are leakage-safe and come from the weekly snapshot prior to the game.
  • Training rows are symmetric, so each game is represented from both team perspectives.
  • The model is trained on historical seasons and evaluated holdout-style by leaving a season out.
  • No betting-line inputs are used in training.
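The symmetric-row construction and holdout split can be sketched as follows (field names are hypothetical):

```python
def training_rows(games):
    # each game contributes two rows, one from each team's perspective,
    # so the model sees every offense-vs-defense pairing symmetrically
    rows = []
    for g in games:
        rows.append((g["home_feats"] + g["away_def_feats"], g["home_points"]))
        rows.append((g["away_feats"] + g["home_def_feats"], g["away_points"]))
    return rows

def season_holdout(games, holdout_season):
    # leave-one-season-out split: train on every other season,
    # evaluate on the held-out one
    train = [g for g in games if g["season"] != holdout_season]
    test = [g for g in games if g["season"] == holdout_season]
    return train, test
```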

Technical Challenges

The hardest part is still schedule adjustment and finding a stable way to rate teams without hardcoding around edge cases. College football makes that unusually awkward because schedules are uneven, conferences cluster heavily, early-season samples are noisy, priors matter much more in September than in November, and hypothetical matchup quality depends on the rating system being directionally right rather than just numerically sharp.

The biggest engineering challenge was finding a structure that stayed minimally hard-coded, leakage-safe, weekly usable, interpretable, and stable enough early in the season while still being flexible enough to drive both rankings and game predictions.

The second major challenge was calibration: getting the score model to produce coherent probabilities, reducing overconfident winner, ATS, and O/U outputs, and preserving score quality without cheating by using market lines as inputs.

Model Performance

The model is strongest as a team rating and winner prediction system. Across 2,398 FBS matchups from the 2023 through 2025 seasons, it picked the outright winner 71.7% of the time.

Against the spread, the model was 51.8% over that same sample. That is respectable, but I would treat it as a secondary check rather than the main purpose of the system.

A useful benchmark is the sportsbook favorite. From 2023 through 2025, the market favorite won 72.6% of the time, while the model picked the winner 71.7% of the time. That gap is not surprising because sportsbooks are absorbing more information, including things like injuries and weather.

  Metric                                         | Result
  -----------------------------------------------|------------
  Winner accuracy                                | 71.7%
  ATS accuracy                                   | 51.8%
  Vegas favorite benchmark (straight-up winners) | 72.6%
  Sample size                                    | 2,398 games

One encouraging sign is that the model’s confidence behaves the way you would want it to. As confidence rises, winner accuracy rises with it.

  Win Probability Bucket | Accuracy
  -----------------------|---------
  50%-55%                | 53.1%
  55%-60%                | 63.2%
  60%-65%                | 65.4%
  65%-70%                | 70.4%
  70%-75%                | 76.92%
  75%-80%                | 87.44%
  80%-90%                | 89.43%
  90%+                   | 96.91%
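A bucketed calibration check like this can be computed directly from predictions and outcomes; the bucket edges below mirror the table, and the input shape (probability of the model's preferred side, plus whether that side won) is an assumption:

```python
import numpy as np

def calibration_table(fav_probs, fav_won,
                      edges=(0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.90, 1.001)):
    """fav_probs: model win probability for its preferred side (>= 0.5);
    fav_won: 1 if that side actually won, else 0."""
    p = np.asarray(fav_probs)
    y = np.asarray(fav_won)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (p >= lo) & (p < hi)
        if mask.any():
            # (bucket low, bucket high, games in bucket, realized accuracy)
            rows.append((lo, hi, int(mask.sum()), float(y[mask].mean())))
    return rows
```

For a well-calibrated model, realized accuracy in each row should land near the midpoint of its bucket, which is the monotone pattern the table above shows.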

Tech Stack & Deployment

  • Core stack: Python with NumPy for modeling, SQLite, Postgres, React, TypeScript, Next.js, Tailwind, Railway, and Vercel.
  • Modeling stack: Opponent-adjusted iterative ratings, recruiting-based priors, a custom feature-based score distribution model, a linear mean head with a log-link style setup, a feature-driven uncertainty head, and an independent discrete-normal score PMF for win, ATS, and total probabilities.
  • Live site: cfbprs.com