A Review of my March Madness Models

As many of you know, March Madness is my favorite time of the year. I pour countless hours into inputting data, creating models, and filling out brackets. This year was no different. I set a personal high of 24 models created, and I am happy to report that this was a pretty successful year. I am proud of the results. This article reviews my models and shares my general takeaways from following the tournament. Before beginning, below are the rankings of my models. There are two main ways to track a model’s success: correctly predicting games (regardless of the bracket), and accurately predicting the bracket. Each model’s final ranking is the average of its ranking in Total Wins and its ranking in Bracket Points.
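As a rough illustration of how that final ranking is put together, here is a minimal Python sketch. The model names and numbers are placeholders, not my actual results, and the tie handling (tied models share the better rank) is an assumption made for the example.

```python
def rank_descending(scores):
    """Rank each name by score, highest first. Tied scores share the better rank."""
    ordered = sorted(scores.values(), reverse=True)
    return {name: ordered.index(value) + 1 for name, value in scores.items()}


def final_ranking(total_wins, bracket_points):
    """Average each model's rank in Total Wins and in Bracket Points."""
    wins_rank = rank_descending(total_wins)
    points_rank = rank_descending(bracket_points)
    averages = {m: (wins_rank[m] + points_rank[m]) / 2 for m in total_wins}
    # A lower average rank is better, so sort ascending.
    return sorted(averages.items(), key=lambda item: item[1])


# Placeholder data for illustration only.
total_wins = {"Model A": 50, "Model B": 47, "Model C": 47}
bracket_points = {"Model A": 1200, "Model B": 1500, "Model C": 900}

for model, avg_rank in final_ranking(total_wins, bracket_points):
    print(model, avg_rank)
```

In this toy example, Model A and Model B end up with the same average rank, which shows why a model can lead one category and still not finish alone at the top.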

The #1 Model for the 2026 March Madness tournament was AFT. This model led the total wins category by correctly predicting 52 out of 63 tournament games (as well as all 4 First Four games, which are not included in this tally). In addition to its success correctly predicting matchups, it finished third in the bracket points category. It is both encouraging and slightly disappointing that AFT was the best this year, because it is my oldest model and I had high hopes for many of my newer models. Still, it is nice to know that this model has stood the test of time and continues to be successful. It has been above the 70th percentile in three of the last four years, and above the 95th percentile in two of the last four years. 52 tournament wins is also a record for any model I have ever created.

Perhaps unsurprisingly, all of the top three finishers correctly predicted Michigan to win the championship. What is surprising is that the top three models come from three of the four different categories of models:

AFT is what I like to call a Power Ranking. It uses a variety of team statistics and gives a single number as its result. When predicting matchups, the team with the higher number is projected to win the game. Power Rankings do not factor in the opponent. Other examples of Power Ranking models are TOA Margin, Solidity Score (FF), and ZSCORE Sum.
4 Way Game Score is a Resumé Ranking. This model uses a team’s results in the regular season and gives a single number as its result. This number represents how well a team performed in the regular season. All of the kWins models are also Resumé Rankings.
AFT Matchup is a Matchup Model. This type of model uses a variety of team statistics to predict the result in a given game. This type of model does factor in the opponent, and teams’ scores will change based on the opponent. Other examples of Matchup Models are XPoint, FourPoint, and FACTOR.
The models COMBO 1.0 and 3.0 are what I call Aggregate Models. These models use the results of other models to predict games. In this case, COMBO 1.0 uses AFT, ZSCORE Sum, and XPoint Standard SOS to predict which team will win a game. For COMBO 1.0, whichever team is predicted to win by at least 2 of the 3 models is selected to advance. These models are successful because they tend to smooth out an oddly predicted result from any one model. Other examples of Aggregate Models are BC-Esque and Karn Composite.
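To make the difference between a Power Ranking and an Aggregate Model concrete, here is a small Python sketch. The function names, team names, and ratings are hypothetical, and breaking a ratings tie in favor of the first team is an assumption. The majority vote mirrors the 2-of-3 rule described for COMBO 1.0, not the actual spreadsheet behind it.

```python
from collections import Counter


def power_ranking_pick(ratings, team_a, team_b):
    """Power Ranking: the higher single number wins, whoever the opponent is."""
    # Assumption: a tie goes to team_a.
    return team_a if ratings[team_a] >= ratings[team_b] else team_b


def aggregate_pick(component_picks):
    """Aggregate Model: the team chosen by most component models advances."""
    return Counter(component_picks).most_common(1)[0][0]


# Hypothetical ratings for illustration only.
ratings = {"Team X": 91.2, "Team Y": 88.7}
print(power_ranking_pick(ratings, "Team X", "Team Y"))

# Three component models vote on one game; one of them disagrees.
print(aggregate_pick(["Team X", "Team Y", "Team X"]))
```

With three component models and two teams, one team always gets at least two votes, so the aggregate never needs a tiebreaker. A single model's odd pick is simply outvoted.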

Defense Wins Championships… and Brings Success at Predicting Games

This is perhaps the biggest takeaway I had from looking at this year’s results. Each of the models that factored in defense outperformed its solely offensive-minded counterparts. For example, XPoint ptSOS Def was the highest-ranked of the three XPoint models, and the only one that used defense to predict games. The other example of this was FourPoint Def outperforming FourPoint. It is clear that using defense to help predict games gives a more complete picture of how the game is going to turn out. I will definitely use this moving forward and ensure that all matchup models have at least some form of defense factored into their predictions.

Farewell!

It is time to say goodbye to these models. Whether they underperformed this year or other models simply do essentially the same thing, these models will not return next year.

ZSCORE Sum. This model will be retired after its fourth season, due to its lack of good results over its career. After performing well in an upset-riddled tournament in 2023, it has put up three straight years of brackets under the 50th percentile. In its four-year run, it never reached 45 wins. The z-score framework may be back, but with different statistics and a new name.
kWins 50/50 and Gradient. While both of these are solid models, I feel that kWins AL10N is a more encompassing version of them. It factors in the quality of a team’s last 10 games and adjusts its wins accordingly. Additionally, it uses the same 25/50/75/100 framework for Quadrant 1-4 games as Gradient, and thus makes Gradient unnecessary.
Solidity Score (Fraudulent Formula). Unfortunately, this model simply underperformed. After using the last five years’ worth of first round upsets to try to predict this year’s, it just fell flat. It is possible that this was due to the relative lack of first round upsets compared to years past. However, it just feels like this was a swing and a miss.
XPoint Standard SOS. After years of using this Strength of Schedule metric in my XPoint calculations, I feel that it is time to retire the “Standard SOS” method. For back-to-back years now, the “ptSOS” method has dominated the Standard method. Thus it is time for me to move on from it.
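For readers unfamiliar with the z-score framework mentioned under ZSCORE Sum, here is a generic Python sketch of the idea: convert each statistic to a z-score across all teams, then add up each team's z-scores to get a single power-ranking number. The statistic names and values are placeholders, and the sketch assumes that higher is better for every statistic. It is not the exact formula behind ZSCORE Sum.

```python
from statistics import mean, pstdev


def zscore_sum(team_stats):
    """Sum each team's z-scores across all statistics (higher is assumed better)."""
    stat_names = list(next(iter(team_stats.values())).keys())
    totals = {team: 0.0 for team in team_stats}
    for stat in stat_names:
        values = [stats[stat] for stats in team_stats.values()]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # every team is identical on this stat, so it adds nothing
        for team, stats in team_stats.items():
            totals[team] += (stats[stat] - mu) / sigma
    return totals


# Placeholder statistics for illustration only.
teams = {
    "Team X": {"stat_1": 10, "stat_2": 3},
    "Team Y": {"stat_1": 20, "stat_2": 5},
    "Team Z": {"stat_1": 30, "stat_2": 7},
}
scores = zscore_sum(teams)
for team in sorted(scores, key=scores.get, reverse=True):
    print(team, round(scores[team], 3))
```

Because each statistic is rescaled to the same units before summing, swapping in different statistics only means changing the input data, not the calculation.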

Requires Further Research

One big takeaway I had from looking at the Final Rankings Table was the massive discrepancy between TOA Margin and TOA Matchup. They use the same statistics, but TOA Margin is a Power Ranking model and TOA Matchup is a Matchup Model. This is the first time that two connected models have had such a wide gap between their results.
Another thing that always requires more research is how to properly account for differing strengths of schedule. This is likely a never-ending issue: as long as teams like Siena, High Point, and Miami (OH) make the tournament, there will be a need to adjust season statistics for strength of schedule as accurately as possible.
- I may try to attack this at the game-by-game level next year, rather than adjust season statistics. This would require far more work and effort, but it might be worth it.

Conclusion

This year’s March Madness was spectacular, as it always is. And, as was the case in every prior year, the attempt at a perfect bracket came up as empty as Tyler Tanner’s game-winning half-court heave against Nebraska in the Round of 32. While I may never get a perfect bracket, this year was definitely more encouraging than some of the previous years. 19 out of 24 models finished above the 50th percentile on ESPN. 21 out of 24 models correctly predicted the winner in at least 45 out of 63 games. Both of these are tremendous improvements. What remains to be seen is whether this year was simply a fluke or the beginning of an upward trend in the overall success of my models.
