Model Rework Part I (The c-value)

Introduction

The model I used for the 2021 AFL season did not perform satisfactorily at the beginning of this season, so I made a minor adjustment which significantly improved performance. However, I am still not achieving the results I want. Therefore, before the 2023 AFL season starts in March, I will be documenting my model rework in a long multi-part series of blog posts. I will follow a similar process before the 2023 AFL Women's competition as well.

Philosophy

As the name of the blog suggests, I will continue to use the Glicko rating system (an Elo-based system) as the backbone of my model. For the first section of this series, I will be focusing on the team-based model; fully optimising this will allow for a far better player-based model. There are a variety of parameters which need to be accounted for in the team-based model, and each post in this series will deal with one of them. However, it is important to note that the optimised values for these parameters may change when new parameters are introduced, so at the end of each post, the findings will include updated values for parameters discussed in previous parts. The tests are performed to optimise the log probability score, a measure of the quality of the probabilities estimated by the model. A post on optimal margin prediction will come later.

Methodology

I ran two tests for the c-value (which controls how much the ratings change after each game): one which used a constant c-value, and one which used a different c-value for each part of the season. For the latter, I split the home-and-away season into four parts covering 30%, 30%, 30% and 10% of the season, and treated the finals series as a separate fifth part.
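The five-part split can be sketched roughly as below. This is only an illustration, not the code behind the model: the function names `season_part` and `inflate_rd`, the choice to split by round number, and the 23-round season in the usage note are all my assumptions. The `inflate_rd` step is how the constant c is used in Glickman's standard Glicko system (it widens the rating deviation before a game, which lets the rating move further); whether this model applies its c-value in exactly that way is also an assumption.

```python
import math


def season_part(round_number, total_rounds, is_final=False):
    """Return 1..5, the index of the c-value to use for a game.

    Hypothetical helper: parts 1-4 cover 30%/30%/30%/10% of the
    home-and-away season (split by round number, an assumption),
    and part 5 is the finals series.
    """
    if is_final:
        return 5
    # Rounds completed before this one, compared in integer
    # arithmetic (tenths of the season) to avoid float edge cases.
    done_tenths = 10 * (round_number - 1)
    if done_tenths < 3 * total_rounds:
        return 1
    if done_tenths < 6 * total_rounds:
        return 2
    if done_tenths < 9 * total_rounds:
        return 3
    return 4


def inflate_rd(rd, c, max_rd=350.0):
    """Standard Glicko step for one rating period:
    RD' = min(sqrt(RD^2 + c^2), max_rd).
    Assumed here to be how the c-value enters the model.
    """
    return min(math.sqrt(rd ** 2 + c ** 2), max_rd)
```

With a 23-round home-and-away season, `season_part(1, 23)` gives part 1, `season_part(8, 23)` gives part 2, and `season_part(22, 23)` gives part 4. A larger c for a given part means a wider rating deviation going into each game, and therefore bigger rating changes afterwards.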

Findings

Splitting the season up into several parts made the model perform better than using a constant c-value. The parameters are named, unsurprisingly, cval1, cval2, cval3, cval4, and cval5: cval1 to cval4 cover the four parts of the home-and-away season in order, and cval5 covers the finals series. The optimal values for these parameters are:

cval1 = 31.58822
cval2 = 14.39661
cval3 = 12.79934
cval4 = 10.49027
cval5 = 17.84442

log probability score per game = +0.0997.
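For readers unfamiliar with the metric, here is a minimal sketch of a log probability score. It assumes the "bits" formulation commonly used in footy tipping competitions, where a 50% tip scores 0 and a confident correct tip scores close to 1; this post does not spell out the exact formula used, so treat both the formula and the function names as assumptions.

```python
import math


def log_prob_score(p_home, result):
    """Score one game, given the model's home-win probability.

    result: 1 for a home win, 0 for a home loss, 0.5 for a draw.
    p_home must be strictly between 0 and 1.
    """
    if result == 1:
        return 1 + math.log2(p_home)
    if result == 0:
        return 1 + math.log2(1 - p_home)
    # Draw: average of the scores for the two possible outcomes.
    return 1 + 0.5 * math.log2(p_home * (1 - p_home))


def mean_log_prob_score(games):
    """Average score per game over (p_home, result) pairs."""
    return sum(log_prob_score(p, r) for p, r in games) / len(games)
```

Under this formulation a model that tips 50% on every game scores exactly 0 per game, so a positive per-game score means the model is doing better than a coin flip.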

The higher value for the finals series (cval5) compared with cval2 to cval4 may be due to the lack of regression before the start of each new season (we will explore this later). The results show that a large rating change early in a season allows for better predictions later in the season, which follows the conventional wisdom. The cval4 parameter is surprisingly low, perhaps due to the high number of dead rubbers in the final 10% of the home-and-away season.

Final Remarks

The next blog post will explore the best way to convert a match result from the ternary values 0, 0.5, and 1 to a continuous value on the interval [0,1]. Margin, scoring shots, and a combination of the two will each be tested to find which performs best; I suspect it will be the combination of margin and scoring shots. I do sometimes stream on Twitch while coding; I am a horrendously slow coder but a fast thinker, so I get frustrated very easily. Thanks for reading, and stay tuned.
