Statistical association football predictions - Printable Version +- Forums (http://haxcore.net/forum) +-- Forum: My Category (http://haxcore.net/forum/forumdisplay.php?fid=1) +--- Forum: My Forum (http://haxcore.net/forum/forumdisplay.php?fid=2) +--- Thread: Statistical association football predictions (/showthread.php?tid=406) |
Statistical association football predictions - FabioIneks - 02-06-2021 п»їPredicting Football Results With Statistical Modelling. Combining the world’s most popular sport with everyone’s favourite discrete probability distribution, this post predicts football matches using the Poisson distribution. David Sheehan. Data scientist interested in sports, politics and Simpsons references. Football (or soccer to my American readers) is full of clichГ©s: “It’s a game of two halves”, “taking it one game at a time” and “Liverpool have failed to win the Premier League”. You’re less likely to hear “Treating the number of goals scored by each team as independent Poisson processes, statistical modelling suggests that the home team have a 60% chance of winning today”. But this is actually a bit of clichГ© too (it has been discussed here, here, here, here and particularly well here). As we’ll discover, a simple Poisson model is, well, overly simplistic. But it’s a good starting point and a nice intuitive way to learn about statistical modelling. So, if you came here looking to make money, I hear this guy makes ВЈ5000 per month without leaving the house. Poisson Distribution. HomeTeam AwayTeam HomeGoals AwayGoals 0 Burnley Swansea 0 1 1 Crystal Palace West Brom 0 1 2 Everton Tottenham 1 1 3 Hull Leicester 2 1 4 Man City Sunderland 2 1. We imported a csv as a pandas dataframe, which contains various information for each of the 380 EPL games in the 2016-17 English Premier League season. We restricted the dataframe to the columns in which we’re interested (specifically, team names and numer of goals scored by each team). I’ll omit most of the code that produces the graphs in this post. But don’t worry, you can find that code on my github page. Our task is to model the final round of fixtures in the season, so we must remove the last 10 rows (each gameweek consists of 10 matches). You’ll notice that, on average, the home team scores more goals than the away team. This is the so called вЂhome (field) advantage’ (discussed here) and isn’t specific to soccer. This is a convenient time to introduce the Poisson distribution. It’s a discrete probability distribution that describes the probability of the number of events within a specific time period (e.g 90 mins) with a known average rate of occurrence. A key assumption is that the number of events is independent of time. In our context, this means that goals don’t become more/less likely by the number of goals already scored in the match. Instead, the number of goals is expressed purely as function an average rate of goals. If that was unclear, maybe this mathematical formulation will make clearer: represents the average rate (e.g. average number of goals, average number of letters you receive, etc.). So, we can treat the number of goals scored by the home and away team as two independent Poisson distributions. The plot below shows the proportion of goals scored compared to the number of goals estimated by the corresponding Poisson distributions. We can use this statistical model to estimate the probability of specfic events. The probability of a draw is simply the sum of the events where the two teams score the same amount of goals. Note that we consider the number of goals scored by each team to be independent events (i.e. P(A n B) = P(A) P(B)). The difference of two Poisson distribution is actually called a Skellam distribution. So we can calculate the probability of a draw by inputting the mean goal values into this distribution. So, hopefully you can see how we can adapt this approach to model specific matches. We just need to know the average number of goals scored by each team and feed this data into a Poisson model. Let’s have a look at the distribution of goals scored by Chelsea and Sunderland (teams who finished 1st and last, respectively). Building A Model. You should now be convinced that the number of goals scored by each team can be approximated by a Poisson distribution. Due to a relatively sample size (each team plays at most 19 home/away games), the accuracy of this approximation can vary significantly (especially earlier in the season when teams have played fewer games). Similar to before, we could now calculate the probability of various events in this Chelsea Sunderland match. But rather than treat each match separately, we’ll build a more general Poisson regression model (what is that?). Generalized Linear Model Regression Results Dep. Variable: goals No. Observations: 740 Model: GLM Df Residuals: 700 Model Family: Poisson Df Model: 39 Link Function: log Scale: 1.0 Method: IRLS Log-Likelihood: -1042.4 Date: Sat, 10 Jun 2017 Deviance: 776.11 Time: 11:17:38 Pearson chi2: 659. No. Iterations: 8 coef std err z P>|z| [95.0% Conf. Int.] Intercept 0.3725 0.198 1.880 0.060 -0.016 0.761 team[T.Bournemouth] -0.2891 0.179 -1.612 0.107 -0.641 0.062 team[T.Burnley] -0.6458 0.200 -3.230 0.001 -1.038 -0.254 team[T.Chelsea] 0.0789 0.162 0.488 0.626 -0.238 0.396 team[T.Crystal Palace] -0.3865 0.183 -2.107 0.035 -0.746 -0.027 team[T.Everton] -0.2008 0.173 -1.161 0.246 -0.540 0.138 team[T.Hull] -0.7006 0.204 -3.441 0.001 -1.100 -0.302 team[T.Leicester] -0.4204 0.187 -2.249 0.025 -0.787 -0.054 team[T.Liverpool] 0.0162 0.164 0.099 0.921 -0.306 0.338 team[T.Man City] 0.0117 0.164 0.072 0.943 -0.310 0.334 team[T.Man United] -0.3572 0.181 -1.971 0.049 -0.713 -0.002 team[T.Middlesbrough] -1.0087 0.225 -4.481 0.000 -1.450 -0.568 team[T.Southampton] -0.5804 0.195 -2.976 0.003 -0.963 -0.198 team[T.Stoke] -0.6082 0.197 -3.094 0.002 -0.994 -0.223 team[T.Sunderland] -0.9619 0.222 -4.329 0.000 -1.397 -0.526 team[T.Swansea] -0.5136 0.192 -2.673 0.008 -0.890 -0.137 team[T.Tottenham] 0.0532 0.162 0.328 0.743 -0.265 0.371 team[T.Watford] -0.5969 0.197 -3.035 0.002 -0.982 -0.211 team[T.West Brom] -0.5567 0.194 -2.876 0.004 -0.936 -0.177 team[T.West Ham] -0.4802 0.189 -2.535 0.011 -0.851 -0.109 opponent[T.Bournemouth] 0.4109 0.196 2.092 0.036 0.026 0.796 opponent[T.Burnley] 0.1657 0.206 0.806 0.420 -0.237 0.569 opponent[T.Chelsea] -0.3036 0.234 -1.298 0.194 -0.762 0.155 opponent[T.Crystal Palace] 0.3287 0.200 1.647 0.100 -0.062 0.720 opponent[T.Everton] -0.0442 0.218 -0.202 0.840 -0.472 0.384 opponent[T.Hull] 0.4979 0.193 2.585 0.010 0.120 0.875 opponent[T.Leicester] 0.3369 0.199 1.694 0.090 -0.053 0.727 opponent[T.Liverpool] -0.0374 0.217 -0.172 0.863 -0.463 0.389 opponent[T.Man City] -0.0993 0.222 -0.448 0.654 -0.534 0.335 opponent[T.Man United] -0.4220 0.241 -1.754 0.079 -0.894 0.050 opponent[T.Middlesbrough] 0.1196 0.208 0.574 0.566 -0.289 0.528 opponent[T.Southampton] 0.0458 0.211 0.217 0.828 -0.369 0.460 opponent[T.Stoke] 0.2266 0.203 1.115 0.265 -0.172 0.625 opponent[T.Sunderland] 0.3707 0.198 1.876 0.061 -0.017 0.758 opponent[T.Swansea] 0.4336 0.195 2.227 0.026 0.052 0.815 opponent[T.Tottenham] -0.5431 0.252 -2.156 0.031 -1.037 -0.049 opponent[T.Watford] 0.3533 0.198 1.782 0.075 -0.035 0.742 opponent[T.West Brom] 0.0970 0.209 0.463 0.643 -0.313 0.507 opponent[T.West Ham] 0.3485 0.198 1.758 0.079 -0.040 0.737 home 0.2969 0.063 4.702 0.000 0.173 0.421. If you’re curious about the smf.glm(. ) part, you can find more information here (edit: earlier versions of this post had erroneously employed a Generalised Estimating Equation (GEE)- what’s the difference?). I’m more interested in the values presented in the coef column in the model summary table, which are analogous to the slopes in linear regression. Similar to logistic regression, we take the exponent of the parameter values. A positive value implies more goals (), while values closer to zero represent more neutral effects (). Towards the bottom of the table you might notice that home has a coef of 0.2969. This captures the fact that home teams generally score more goals than the away team (specifically, =1.35 times more likely). But not all teams are created equal. Chelsea has a coef of 0.0789, while the corresponding value for Sunderland is -0.9619 (sort of saying Chelsea (Sunderland) are better (much worse!) scorers than average). Finally, the opponent* values penalize/reward teams based on the quality of the opposition. This relfects the defensive strength of each team (Chelsea: -0.3036; Sunderland: 0.3707). In other words, you’re less likely to score against Chelsea. Hopefully, that all makes both statistical and intuitive sense. Let’s start making some predictions for the upcoming matches. We simply pass our teams into poisson_model and it’ll return the expected average number of goals for that team (we need to run it twice- we calculate the expected average number of goals for each team separately). So let’s see how many goals we expect Chelsea and Sunderland to score. Predicting Football Results With Statistical Modelling. Combining the world’s most popular sport with everyone’s favourite discrete probability distribution, this post predicts football matches using the Poisson distribution. David Sheehan. Data scientist interested in sports, politics and Simpsons references. Football (or soccer to my American readers) is full of clichГ©s: “It’s a game of two halves”, “taking it one game at a time” and “Liverpool have failed to win the Premier League”. You’re less likely to hear “Treating the number of goals scored by each team as independent Poisson processes, statistical modelling suggests that the home team have a 60% chance of winning today”. But this is actually a bit of clichГ© too (it has been discussed here, here, here, here and particularly well here). As we’ll discover, a simple Poisson model is, well, overly simplistic. But it’s a good starting point and a nice intuitive way to learn about statistical modelling. So, if you came here looking to make money, I hear this guy makes ВЈ5000 per month without leaving the house. Poisson Distribution. HomeTeam AwayTeam HomeGoals AwayGoals 0 Burnley Swansea 0 1 1 Crystal Palace West Brom 0 1 2 Everton Tottenham 1 1 3 Hull Leicester 2 1 4 Man City Sunderland 2 1. We imported a csv as a pandas dataframe, which contains various information for each of the 380 EPL games in the 2016-17 English Premier League season. We restricted the dataframe to the columns in which we’re interested (specifically, team names and numer of goals scored by each team). I’ll omit most of the code that produces the graphs in this post. But don’t worry, you can find that code on my github page. Our task is to model the final round of fixtures in the season, so we must remove the last 10 rows (each gameweek consists of 10 matches). You’ll notice that, on average, the home team scores more goals than the away team. This is the so called вЂhome (field) advantage’ (discussed here) and isn’t specific to soccer. This is a convenient time to introduce the Poisson distribution. It’s a discrete probability distribution that describes the probability of the number of events within a specific time period (e.g 90 mins) with a known average rate of occurrence. A key assumption is that the number of events is independent of time. In our context, this means that goals don’t become more/less likely by the number of goals already scored in the match. Instead, the number of goals is expressed purely as function an average rate of goals. If that was unclear, maybe this mathematical formulation will make clearer: represents the average rate (e.g. average number of goals, average number of letters you receive, etc.). So, we can treat the number of goals scored by the home and away team as two independent Poisson distributions. The plot below shows the proportion of goals scored compared to the number of goals estimated by the corresponding Poisson distributions. We can use this statistical model to estimate the probability of specfic events. The probability of a draw is simply the sum of the events where the two teams score the same amount of goals. Note that we consider the number of goals scored by each team to be independent events (i.e. P(A n B) = P(A) P(B)). The difference of two Poisson distribution is actually called a Skellam distribution. So we can calculate the probability of a draw by inputting the mean goal values into this distribution. So, hopefully you can see how we can adapt this approach to model specific matches. We just need to know the average number of goals scored by each team and feed this data into a Poisson model. Let’s have a look at the distribution of goals scored by Chelsea and Sunderland (teams who finished 1st and last, respectively). Building A Model. You should now be convinced that the number of goals scored by each team can be approximated by a Poisson distribution. Due to a relatively sample size (each team plays at most 19 home/away games), the accuracy of this approximation can vary significantly (especially earlier in the season when teams have played fewer games). Similar to before, we could now calculate the probability of various events in this Chelsea Sunderland match. But rather than treat each match separately, we’ll build a more general Poisson regression model (what is that?). Generalized Linear Model Regression Results Dep. Variable: goals No. Observations: 740 Model: GLM Df Residuals: 700 Model Family: Poisson Df Model: 39 Link Function: log Scale: 1.0 Method: IRLS Log-Likelihood: -1042.4 Date: Sat, 10 Jun 2017 Deviance: 776.11 Time: 11:17:38 Pearson chi2: 659. No. Iterations: 8 coef std err z P>|z| [95.0% Conf. Int.] Intercept 0.3725 0.198 1.880 0.060 -0.016 0.761 team[T.Bournemouth] -0.2891 0.179 -1.612 0.107 -0.641 0.062 team[T.Burnley] -0.6458 0.200 -3.230 0.001 -1.038 -0.254 team[T.Chelsea] 0.0789 0.162 0.488 0.626 -0.238 0.396 team[T.Crystal Palace] -0.3865 0.183 -2.107 0.035 -0.746 -0.027 team[T.Everton] -0.2008 0.173 -1.161 0.246 -0.540 0.138 team[T.Hull] -0.7006 0.204 -3.441 0.001 -1.100 -0.302 team[T.Leicester] -0.4204 0.187 -2.249 0.025 -0.787 -0.054 team[T.Liverpool] 0.0162 0.164 0.099 0.921 -0.306 0.338 team[T.Man City] 0.0117 0.164 0.072 0.943 -0.310 0.334 team[T.Man United] -0.3572 0.181 -1.971 0.049 -0.713 -0.002 team[T.Middlesbrough] -1.0087 0.225 -4.481 0.000 -1.450 -0.568 team[T.Southampton] -0.5804 0.195 -2.976 0.003 -0.963 -0.198 team[T.Stoke] -0.6082 0.197 -3.094 0.002 -0.994 -0.223 team[T.Sunderland] -0.9619 0.222 -4.329 0.000 -1.397 -0.526 team[T.Swansea] -0.5136 0.192 -2.673 0.008 -0.890 -0.137 team[T.Tottenham] 0.0532 0.162 0.328 0.743 -0.265 0.371 team[T.Watford] -0.5969 0.197 -3.035 0.002 -0.982 -0.211 team[T.West Brom] -0.5567 0.194 -2.876 0.004 -0.936 -0.177 team[T.West Ham] -0.4802 0.189 -2.535 0.011 -0.851 -0.109 opponent[T.Bournemouth] 0.4109 0.196 2.092 0.036 0.026 0.796 opponent[T.Burnley] 0.1657 0.206 0.806 0.420 -0.237 0.569 opponent[T.Chelsea] -0.3036 0.234 -1.298 0.194 -0.762 0.155 opponent[T.Crystal Palace] 0.3287 0.200 1.647 0.100 -0.062 0.720 opponent[T.Everton] -0.0442 0.218 -0.202 0.840 -0.472 0.384 opponent[T.Hull] 0.4979 0.193 2.585 0.010 0.120 0.875 opponent[T.Leicester] 0.3369 0.199 1.694 0.090 -0.053 0.727 opponent[T.Liverpool] -0.0374 0.217 -0.172 0.863 -0.463 0.389 opponent[T.Man City] -0.0993 0.222 -0.448 0.654 -0.534 0.335 opponent[T.Man United] -0.4220 0.241 -1.754 0.079 -0.894 0.050 opponent[T.Middlesbrough] 0.1196 0.208 0.574 0.566 -0.289 0.528 opponent[T.Southampton] 0.0458 0.211 0.217 0.828 -0.369 0.460 opponent[T.Stoke] 0.2266 0.203 1.115 0.265 -0.172 0.625 opponent[T.Sunderland] 0.3707 0.198 1.876 0.061 -0.017 0.758 opponent[T.Swansea] 0.4336 0.195 2.227 0.026 0.052 0.815 opponent[T.Tottenham] -0.5431 0.252 -2.156 0.031 -1.037 -0.049 opponent[T.Watford] 0.3533 0.198 1.782 0.075 -0.035 0.742 opponent[T.West Brom] 0.0970 0.209 0.463 0.643 -0.313 0.507 opponent[T.West Ham] 0.3485 0.198 1.758 0.079 -0.040 0.737 home 0.2969 0.063 4.702 0.000 0.173 0.421. If you’re curious about the smf.glm(. ) part, you can find more information here (edit: earlier versions of this post had erroneously employed a Generalised Estimating Equation (GEE)- what’s the difference?). I’m more interested in the values presented in the coef column in the model summary table, which are analogous to the slopes in linear regression. Similar to logistic regression, we take the exponent of the parameter values. A positive value implies more goals (), while values closer to zero represent more neutral effects (). Towards the bottom of the table you might notice that home has a coef of 0.2969. This captures the fact that home teams generally score more goals than the away team (specifically, =1.35 times more likely). But not all teams are created equal. Chelsea has a coef of 0.0789, while the corresponding value for Sunderland is -0.9619 (sort of saying Chelsea (Sunderland) are better (much worse!) scorers than average). Finally, the opponent* values penalize/reward teams based on the quality of the opposition. This relfects the defensive strength of each team (Chelsea: -0.3036; Sunderland: 0.3707). In other words, you’re less likely to score against Chelsea. Hopefully, that all makes both statistical and intuitive sense. Let’s start making some predictions for the upcoming matches. We simply pass our teams into poisson_model and it’ll return the expected average number of goals for that team (we need to run it twice- we calculate the expected average number of goals for each team separately). So let’s see how many goals we expect Chelsea and Sunderland to score. Predicting Football Results With Statistical Modelling. Combining the world’s most popular sport with everyone’s favourite discrete probability distribution, this post predicts football matches using the Poisson distribution. David Sheehan. Data scientist interested in sports, politics and Simpsons references. Football (or soccer to my American readers) is full of clichГ©s: “It’s a game of two halves”, “taking it one game at a time” and “Liverpool have failed to win the Premier League”. You’re less likely to hear “Treating the number of goals scored by each team as independent Poisson processes, statistical modelling suggests that the home team have a 60% chance of winning today”. But this is actually a bit of clichГ© too (it has been discussed here, here, here, here and particularly well here). As we’ll discover, a simple Poisson model is, well, overly simplistic. But it’s a good starting point and a nice intuitive way to learn about statistical modelling. So, if you came here looking to make money, I hear this guy makes ВЈ5000 per month without leaving the house. Poisson Distribution. HomeTeam AwayTeam HomeGoals AwayGoals 0 Burnley Swansea 0 1 1 Crystal Palace West Brom 0 1 2 Everton Tottenham 1 1 3 Hull Leicester 2 1 4 Man City Sunderland 2 1. We imported a csv as a pandas dataframe, which contains various information for each of the 380 EPL games in the 2016-17 English Premier League season. We restricted the dataframe to the columns in which we’re interested (specifically, team names and numer of goals scored by each team). I’ll omit most of the code that produces the graphs in this post. But don’t worry, you can find that code on my github page. Our task is to model the final round of fixtures in the season, so we must remove the last 10 rows (each gameweek consists of 10 matches). You’ll notice that, on average, the home team scores more goals than the away team. This is the so called вЂhome (field) advantage’ (discussed here) and isn’t specific to soccer. This is a convenient time to introduce the Poisson distribution. It’s a discrete probability distribution that describes the probability of the number of events within a specific time period (e.g 90 mins) with a known average rate of occurrence. A key assumption is that the number of events is independent of time. In our context, this means that goals don’t become more/less likely by the number of goals already scored in the match. Instead, the number of goals is expressed purely as function an average rate of goals. If that was unclear, maybe this mathematical formulation will make clearer: represents the average rate (e.g. average number of goals, average number of letters you receive, etc.). So, we can treat the number of goals scored by the home and away team as two independent Poisson distributions. The plot below shows the proportion of goals scored compared to the number of goals estimated by the corresponding Poisson distributions. We can use this statistical model to estimate the probability of specfic events. The probability of a draw is simply the sum of the events where the two teams score the same amount of goals. Note that we consider the number of goals scored by each team to be independent events (i.e. P(A n B) = P(A) P(B)). The difference of two Poisson distribution is actually called a Skellam distribution. So we can calculate the probability of a draw by inputting the mean goal values into this distribution. So, hopefully you can see how we can adapt this approach to model specific matches. We just need to know the average number of goals scored by each team and feed this data into a Poisson model. Let’s have a look at the distribution of goals scored by Chelsea and Sunderland (teams who finished 1st and last, respectively). Building A Model. You should now be convinced that the number of goals scored by each team can be approximated by a Poisson distribution. Due to a relatively sample size (each team plays at most 19 home/away games), the accuracy of this approximation can vary significantly (especially earlier in the season when teams have played fewer games). Similar to before, we could now calculate the probability of various events in this Chelsea Sunderland match. But rather than treat each match separately, we’ll build a more general Poisson regression model (what is that?). Generalized Linear Model Regression Results Dep. Variable: goals No. Observations: 740 Model: GLM Df Residuals: 700 Model Family: Poisson Df Model: 39 Link Function: log Scale: 1.0 Method: IRLS Log-Likelihood: -1042.4 Date: Sat, 10 Jun 2017 Deviance: 776.11 Time: 11:17:38 Pearson chi2: 659. No. Iterations: 8 coef std err z P>|z| [95.0% Conf. Int.] Intercept 0.3725 0.198 1.880 0.060 -0.016 0.761 team[T.Bournemouth] -0.2891 0.179 -1.612 0.107 -0.641 0.062 team[T.Burnley] -0.6458 0.200 -3.230 0.001 -1.038 -0.254 team[T.Chelsea] 0.0789 0.162 0.488 0.626 -0.238 0.396 team[T.Crystal Palace] -0.3865 0.183 -2.107 0.035 -0.746 -0.027 team[T.Everton] -0.2008 0.173 -1.161 0.246 -0.540 0.138 team[T.Hull] -0.7006 0.204 -3.441 0.001 -1.100 -0.302 team[T.Leicester] -0.4204 0.187 -2.249 0.025 -0.787 -0.054 team[T.Liverpool] 0.0162 0.164 0.099 0.921 -0.306 0.338 team[T.Man City] 0.0117 0.164 0.072 0.943 -0.310 0.334 team[T.Man United] -0.3572 0.181 -1.971 0.049 -0.713 -0.002 team[T.Middlesbrough] -1.0087 0.225 -4.481 0.000 -1.450 -0.568 team[T.Southampton] -0.5804 0.195 -2.976 0.003 -0.963 -0.198 team[T.Stoke] -0.6082 0.197 -3.094 0.002 -0.994 -0.223 team[T.Sunderland] -0.9619 0.222 -4.329 0.000 -1.397 -0.526 team[T.Swansea] -0.5136 0.192 -2.673 0.008 -0.890 -0.137 team[T.Tottenham] 0.0532 0.162 0.328 0.743 -0.265 0.371 team[T.Watford] -0.5969 0.197 -3.035 0.002 -0.982 -0.211 team[T.West Brom] -0.5567 0.194 -2.876 0.004 -0.936 -0.177 team[T.West Ham] -0.4802 0.189 -2.535 0.011 -0.851 -0.109 opponent[T.Bournemouth] 0.4109 0.196 2.092 0.036 0.026 0.796 opponent[T.Burnley] 0.1657 0.206 0.806 0.420 -0.237 0.569 opponent[T.Chelsea] -0.3036 0.234 -1.298 0.194 -0.762 0.155 opponent[T.Crystal Palace] 0.3287 0.200 1.647 0.100 -0.062 0.720 opponent[T.Everton] -0.0442 0.218 -0.202 0.840 -0.472 0.384 opponent[T.Hull] 0.4979 0.193 2.585 0.010 0.120 0.875 opponent[T.Leicester] 0.3369 0.199 1.694 0.090 -0.053 0.727 opponent[T.Liverpool] -0.0374 0.217 -0.172 0.863 -0.463 0.389 opponent[T.Man City] -0.0993 0.222 -0.448 0.654 -0.534 0.335 opponent[T.Man United] -0.4220 0.241 -1.754 0.079 -0.894 0.050 opponent[T.Middlesbrough] 0.1196 0.208 0.574 0.566 -0.289 0.528 opponent[T.Southampton] 0.0458 0.211 0.217 0.828 -0.369 0.460 opponent[T.Stoke] 0.2266 0.203 1.115 0.265 -0.172 0.625 opponent[T.Sunderland] 0.3707 0.198 1.876 0.061 -0.017 0.758 opponent[T.Swansea] 0.4336 0.195 2.227 0.026 0.052 0.815 opponent[T.Tottenham] -0.5431 0.252 -2.156 0.031 -1.037 -0.049 opponent[T.Watford] 0.3533 0.198 1.782 0.075 -0.035 0.742 opponent[T.West Brom] 0.0970 0.209 0.463 0.643 -0.313 0.507 opponent[T.West Ham] 0.3485 0.198 1.758 0.079 -0.040 0.737 home 0.2969 0.063 4.702 0.000 0.173 0.421. If you’re curious about the smf.glm(. ) part, you can find more information here (edit: earlier versions of this post had erroneously employed a Generalised Estimating Equation (GEE)- what’s the difference?). I’m more interested in the values presented in the coef column in the model summary table, which are analogous to the slopes in linear regression. Similar to logistic regression, we take the exponent of the parameter values. A positive value implies more goals (), while values closer to zero represent more neutral effects (). Towards the bottom of the table you might notice that home has a coef of 0.2969. This captures the fact that home teams generally score more goals than the away team (specifically, =1.35 times more likely). But not all teams are created equal. Chelsea has a coef of 0.0789, while the corresponding value for Sunderland is -0.9619 (sort of saying Chelsea (Sunderland) are better (much worse!) scorers than average). Finally, the opponent* values penalize/reward teams based on the quality of the opposition. This relfects the defensive strength of each team (Chelsea: -0.3036; Sunderland: 0.3707). In other words, you’re less likely to score against Chelsea. Hopefully, that all makes both statistical and intuitive sense. Let’s start making some predictions for the upcoming matches. We simply pass our teams into poisson_model and it’ll return the expected average number of goals for that team (we need to run it twice- we calculate the expected average number of goals for each team separately). So let’s see how many goals we expect Chelsea and Sunderland to score. Predicting Football Results With Statistical Modelling. Combining the world’s most popular sport with everyone’s favourite discrete probability distribution, this post predicts football matches using the Poisson distribution. David Sheehan. Data scientist interested in sports, politics and Simpsons references. Football (or soccer to my American readers) is full of clichГ©s: “It’s a game of two halves”, “taking it one game at a time” and “Liverpool have failed to win the Premier League”. You’re less likely to hear “Treating the number of goals scored by each team as independent Poisson processes, statistical modelling suggests that the home team have a 60% chance of winning today”. But this is actually a bit of clichГ© too (it has been discussed here, here, here, here and particularly well here). As we’ll discover, a simple Poisson model is, well, overly simplistic. But it’s a good starting point and a nice intuitive way to learn about statistical modelling. So, if you came here looking to make money, I hear this guy makes ВЈ5000 per month without leaving the house. Poisson Distribution. HomeTeam AwayTeam HomeGoals AwayGoals 0 Burnley Swansea 0 1 1 Crystal Palace West Brom 0 1 2 Everton Tottenham 1 1 3 Hull Leicester 2 1 4 Man City Sunderland 2 1. We imported a csv as a pandas dataframe, which contains various information for each of the 380 EPL games in the 2016-17 English Premier League season. We restricted the dataframe to the columns in which we’re interested (specifically, team names and numer of goals scored by each team). I’ll omit most of the code that produces the graphs in this post. But don’t worry, you can find that code on my github page. Our task is to model the final round of fixtures in the season, so we must remove the last 10 rows (each gameweek consists of 10 matches). You’ll notice that, on average, the home team scores more goals than the away team. This is the so called вЂhome (field) advantage’ (discussed here) and isn’t specific to soccer. This is a convenient time to introduce the Poisson distribution. It’s a discrete probability distribution that describes the probability of the number of events within a specific time period (e.g 90 mins) with a known average rate of occurrence. A key assumption is that the number of events is independent of time. In our context, this means that goals don’t become more/less likely by the number of goals already scored in the match. Instead, the number of goals is expressed purely as function an average rate of goals. If that was unclear, maybe this mathematical formulation will make clearer: represents the average rate (e.g. average number of goals, average number of letters you receive, etc.). So, we can treat the number of goals scored by the home and away team as two independent Poisson distributions. The plot below shows the proportion of goals scored compared to the number of goals estimated by the corresponding Poisson distributions. We can use this statistical model to estimate the probability of specfic events. The probability of a draw is simply the sum of the events where the two teams score the same amount of goals. Note that we consider the number of goals scored by each team to be independent events (i.e. P(A n B) = P(A) P(B)). The difference of two Poisson distribution is actually called a Skellam distribution. So we can calculate the probability of a draw by inputting the mean goal values into this distribution. So, hopefully you can see how we can adapt this approach to model specific matches. We just need to know the average number of goals scored by each team and feed this data into a Poisson model. Let’s have a look at the distribution of goals scored by Chelsea and Sunderland (teams who finished 1st and last, respectively). Building A Model. You should now be convinced that the number of goals scored by each team can be approximated by a Poisson distribution. Due to a relatively sample size (each team plays at most 19 home/away games), the accuracy of this approximation can vary significantly (especially earlier in the season when teams have played fewer games). Similar to before, we could now calculate the probability of various events in this Chelsea Sunderland match. But rather than treat each match separately, we’ll build a more general Poisson regression model (what is that?). Generalized Linear Model Regression Results Dep. Variable: goals No. Observations: 740 Model: GLM Df Residuals: 700 Model Family: Poisson Df Model: 39 Link Function: log Scale: 1.0 Method: IRLS Log-Likelihood: -1042.4 Date: Sat, 10 Jun 2017 Deviance: 776.11 Time: 11:17:38 Pearson chi2: 659. No. Iterations: 8 coef std err z P>|z| [95.0% Conf. Int.] Intercept 0.3725 0.198 1.880 0.060 -0.016 0.761 team[T.Bournemouth] -0.2891 0.179 -1.612 0.107 -0.641 0.062 team[T.Burnley] -0.6458 0.200 -3.230 0.001 -1.038 -0.254 team[T.Chelsea] 0.0789 0.162 0.488 0.626 -0.238 0.396 team[T.Crystal Palace] -0.3865 0.183 -2.107 0.035 -0.746 -0.027 team[T.Everton] -0.2008 0.173 -1.161 0.246 -0.540 0.138 team[T.Hull] -0.7006 0.204 -3.441 0.001 -1.100 -0.302 team[T.Leicester] -0.4204 0.187 -2.249 0.025 -0.787 -0.054 team[T.Liverpool] 0.0162 0.164 0.099 0.921 -0.306 0.338 team[T.Man City] 0.0117 0.164 0.072 0.943 -0.310 0.334 team[T.Man United] -0.3572 0.181 -1.971 0.049 -0.713 -0.002 team[T.Middlesbrough] -1.0087 0.225 -4.481 0.000 -1.450 -0.568 team[T.Southampton] -0.5804 0.195 -2.976 0.003 -0.963 -0.198 team[T.Stoke] -0.6082 0.197 -3.094 0.002 -0.994 -0.223 team[T.Sunderland] -0.9619 0.222 -4.329 0.000 -1.397 -0.526 team[T.Swansea] -0.5136 0.192 -2.673 0.008 -0.890 -0.137 team[T.Tottenham] 0.0532 0.162 0.328 0.743 -0.265 0.371 team[T.Watford] -0.5969 0.197 -3.035 0.002 -0.982 -0.211 team[T.West Brom] -0.5567 0.194 -2.876 0.004 -0.936 -0.177 team[T.West Ham] -0.4802 0.189 -2.535 0.011 -0.851 -0.109 opponent[T.Bournemouth] 0.4109 0.196 2.092 0.036 0.026 0.796 opponent[T.Burnley] 0.1657 0.206 0.806 0.420 -0.237 0.569 opponent[T.Chelsea] -0.3036 0.234 -1.298 0.194 -0.762 0.155 opponent[T.Crystal Palace] 0.3287 0.200 1.647 0.100 -0.062 0.720 opponent[T.Everton] -0.0442 0.218 -0.202 0.840 -0.472 0.384 opponent[T.Hull] 0.4979 0.193 2.585 0.010 0.120 0.875 opponent[T.Leicester] 0.3369 0.199 1.694 0.090 -0.053 0.727 opponent[T.Liverpool] -0.0374 0.217 -0.172 0.863 -0.463 0.389 opponent[T.Man City] -0.0993 0.222 -0.448 0.654 -0.534 0.335 opponent[T.Man United] -0.4220 0.241 -1.754 0.079 -0.894 0.050 opponent[T.Middlesbrough] 0.1196 0.208 0.574 0.566 -0.289 0.528 opponent[T.Southampton] 0.0458 0.211 0.217 0.828 -0.369 0.460 opponent[T.Stoke] 0.2266 0.203 1.115 0.265 -0.172 0.625 opponent[T.Sunderland] 0.3707 0.198 1.876 0.061 -0.017 0.758 opponent[T.Swansea] 0.4336 0.195 2.227 0.026 0.052 0.815 opponent[T.Tottenham] -0.5431 0.252 -2.156 0.031 -1.037 -0.049 opponent[T.Watford] 0.3533 0.198 1.782 0.075 -0.035 0.742 opponent[T.West Brom] 0.0970 0.209 0.463 0.643 -0.313 0.507 opponent[T.West Ham] 0.3485 0.198 1.758 0.079 -0.040 0.737 home 0.2969 0.063 4.702 0.000 0.173 0.421. If you’re curious about the smf.glm(. ) part, you can find more information here (edit: earlier versions of this post had erroneously employed a Generalised Estimating Equation (GEE)- what’s the difference?). I’m more interested in the values presented in the coef column in the model summary table, which are analogous to the slopes in linear regression. Similar to logistic regression, we take the exponent of the parameter values. A positive value implies more goals (), while values closer to zero represent more neutral effects (). Towards the bottom of the table you might notice that home has a coef of 0.2969. This captures the fact that home teams generally score more goals than the away team (specifically, =1.35 times more likely). But not all teams are created equal. Chelsea has a coef of 0.0789, while the corresponding value for Sunderland is -0.9619 (sort of saying Chelsea (Sunderland) are better (much worse!) scorers than average). Finally, the opponent* values penalize/reward teams based on the quality of the opposition. This relfects the defensive strength of each team (Chelsea: -0.3036; Sunderland: 0.3707). In other words, you’re less likely to score against Chelsea. Hopefully, that all makes both statistical and intuitive sense. Let’s start making some predictions for the upcoming matches. We simply pass our teams into poisson_model and it’ll return the expected average number of goals for that team (we need to run it twice- we calculate the expected average number of goals for each team separately). So let’s see how many goals we expect Chelsea and Sunderland to score. Predicting Football Results With Statistical Modelling. Combining the world’s most popular sport with everyone’s favourite discrete probability distribution, this post predicts football matches using the Poisson distribution. David Sheehan. Data scientist interested in sports, politics and Simpsons references. Football (or soccer to my American readers) is full of clichГ©s: “It’s a game of two halves”, “taking it one game at a time” and “Liverpool have failed to win the Premier League”. You’re less likely to hear “Treating the number of goals scored by each team as independent Poisson processes, statistical modelling suggests that the home team have a 60% chance of winning today”. But this is actually a bit of clichГ© too (it has been discussed here, here, here, here and particularly well here). As we’ll discover, a simple Poisson model is, well, overly simplistic. But it’s a good starting point and a nice intuitive way to learn about statistical modelling. So, if you came here looking to make money, I hear this guy makes ВЈ5000 per month without leaving the house. Poisson Distribution. HomeTeam AwayTeam HomeGoals AwayGoals 0 Burnley Swansea 0 1 1 Crystal Palace West Brom 0 1 2 Everton Tottenham 1 1 3 Hull Leicester 2 1 4 Man City Sunderland 2 1. We imported a csv as a pandas dataframe, which contains various information for each of the 380 EPL games in the 2016-17 English Premier League season. We restricted the dataframe to the columns in which we’re interested (specifically, team names and numer of goals scored by each team). I’ll omit most of the code that produces the graphs in this post. But don’t worry, you can find that code on my github page. Our task is to model the final round of fixtures in the season, so we must remove the last 10 rows (each gameweek consists of 10 matches). You’ll notice that, on average, the home team scores more goals than the away team. This is the so called вЂhome (field) advantage’ (discussed here) and isn’t specific to soccer. This is a convenient time to introduce the Poisson distribution. It’s a discrete probability distribution that describes the probability of the number of events within a specific time period (e.g 90 mins) with a known average rate of occurrence. A key assumption is that the number of events is independent of time. In our context, this means that goals don’t become more/less likely by the number of goals already scored in the match. Instead, the number of goals is expressed purely as function an average rate of goals. If that was unclear, maybe this mathematical formulation will make clearer: represents the average rate (e.g. average number of goals, average number of letters you receive, etc.). So, we can treat the number of goals scored by the home and away team as two independent Poisson distributions. The plot below shows the proportion of goals scored compared to the number of goals estimated by the corresponding Poisson distributions. We can use this statistical model to estimate the probability of specfic events. The probability of a draw is simply the sum of the events where the two teams score the same amount of goals. Note that we consider the number of goals scored by each team to be independent events (i.e. P(A n B) = P(A) P(B)). The difference of two Poisson distribution is actually called a Skellam distribution. So we can calculate the probability of a draw by inputting the mean goal values into this distribution. So, hopefully you can see how we can adapt this approach to model specific matches. We just need to know the average number of goals scored by each team and feed this data into a Poisson model. Let’s have a look at the distribution of goals scored by Chelsea and Sunderland (teams who finished 1st and last, respectively). Building A Model. You should now be convinced that the number of goals scored by each team can be approximated by a Poisson distribution. Due to a relatively sample size (each team plays at most 19 home/away games), the accuracy of this approximation can vary significantly (especially earlier in the season when teams have played fewer games). Similar to before, we could now calculate the probability of various events in this Chelsea Sunderland match. But rather than treat each match separately, we’ll build a more general Poisson regression model (what is that?). Generalized Linear Model Regression Results Dep. Variable: goals No. Observations: 740 Model: GLM Df Residuals: 700 Model Family: Poisson Df Model: 39 Link Function: log Scale: 1.0 Method: IRLS Log-Likelihood: -1042.4 Date: Sat, 10 Jun 2017 Deviance: 776.11 Time: 11:17:38 Pearson chi2: 659. No. Iterations: 8 coef std err z P>|z| [95.0% Conf. Int.] Intercept 0.3725 0.198 1.880 0.060 -0.016 0.761 team[T.Bournemouth] -0.2891 0.179 -1.612 0.107 -0.641 0.062 team[T.Burnley] -0.6458 0.200 -3.230 0.001 -1.038 -0.254 team[T.Chelsea] 0.0789 0.162 0.488 0.626 -0.238 0.396 team[T.Crystal Palace] -0.3865 0.183 -2.107 0.035 -0.746 -0.027 team[T.Everton] -0.2008 0.173 -1.161 0.246 -0.540 0.138 team[T.Hull] -0.7006 0.204 -3.441 0.001 -1.100 -0.302 team[T.Leicester] -0.4204 0.187 -2.249 0.025 -0.787 -0.054 team[T.Liverpool] 0.0162 0.164 0.099 0.921 -0.306 0.338 team[T.Man City] 0.0117 0.164 0.072 0.943 -0.310 0.334 team[T.Man United] -0.3572 0.181 -1.971 0.049 -0.713 -0.002 team[T.Middlesbrough] -1.0087 0.225 -4.481 0.000 -1.450 -0.568 team[T.Southampton] -0.5804 0.195 -2.976 0.003 -0.963 -0.198 team[T.Stoke] -0.6082 0.197 -3.094 0.002 -0.994 -0.223 team[T.Sunderland] -0.9619 0.222 -4.329 0.000 -1.397 -0.526 team[T.Swansea] -0.5136 0.192 -2.673 0.008 -0.890 -0.137 team[T.Tottenham] 0.0532 0.162 0.328 0.743 -0.265 0.371 team[T.Watford] -0.5969 0.197 -3.035 0.002 -0.982 -0.211 team[T.West Brom] -0.5567 0.194 -2.876 0.004 -0.936 -0.177 team[T.West Ham] -0.4802 0.189 -2.535 0.011 -0.851 -0.109 opponent[T.Bournemouth] 0.4109 0.196 2.092 0.036 0.026 0.796 opponent[T.Burnley] 0.1657 0.206 0.806 0.420 -0.237 0.569 opponent[T.Chelsea] -0.3036 0.234 -1.298 0.194 -0.762 0.155 opponent[T.Crystal Palace] 0.3287 0.200 1.647 0.100 -0.062 0.720 opponent[T.Everton] -0.0442 0.218 -0.202 0.840 -0.472 0.384 opponent[T.Hull] 0.4979 0.193 2.585 0.010 0.120 0.875 opponent[T.Leicester] 0.3369 0.199 1.694 0.090 -0.053 0.727 opponent[T.Liverpool] -0.0374 0.217 -0.172 0.863 -0.463 0.389 opponent[T.Man City] -0.0993 0.222 -0.448 0.654 -0.534 0.335 opponent[T.Man United] -0.4220 0.241 -1.754 0.079 -0.894 0.050 opponent[T.Middlesbrough] 0.1196 0.208 0.574 0.566 -0.289 0.528 opponent[T.Southampton] 0.0458 0.211 0.217 0.828 -0.369 0.460 opponent[T.Stoke] 0.2266 0.203 1.115 0.265 -0.172 0.625 opponent[T.Sunderland] 0.3707 0.198 1.876 0.061 -0.017 0.758 opponent[T.Swansea] 0.4336 0.195 2.227 0.026 0.052 0.815 opponent[T.Tottenham] -0.5431 0.252 -2.156 0.031 -1.037 -0.049 opponent[T.Watford] 0.3533 0.198 1.782 0.075 -0.035 0.742 opponent[T.West Brom] 0.0970 0.209 0.463 0.643 -0.313 0.507 opponent[T.West Ham] 0.3485 0.198 1.758 0.079 -0.040 0.737 home 0.2969 0.063 4.702 0.000 0.173 0.421. If you’re curious about the smf.glm(. ) part, you can find more information here (edit: earlier versions of this post had erroneously employed a Generalised Estimating Equation (GEE)- what’s the difference?). I’m more interested in the values presented in the coef column in the model summary table, which are analogous to the slopes in linear regression. Similar to logistic regression, we take the exponent of the parameter values. A positive value implies more goals (), while values closer to zero represent more neutral effects (). Towards the bottom of the table you might notice that home has a coef of 0.2969. This captures the fact that home teams generally score more goals than the away team (specifically, =1.35 times more likely). But not all teams are created equal. Chelsea has a coef of 0.0789, while the corresponding value for Sunderland is -0.9619 (sort of saying Chelsea (Sunderland) are better (much worse!) scorers than average). Finally, the opponent* values penalize/reward teams based on the quality of the opposition. This relfects the defensive strength of each team (Chelsea: -0.3036; Sunderland: 0.3707). In other words, you’re less likely to score against Chelsea. Hopefully, that all makes both statistical and intuitive sense. Let’s start making some predictions for the upcoming matches. We simply pass our teams into poisson_model and it’ll return the expected average number of goals for that team (we need to run it twice- we calculate the expected average number of goals for each team separately). So let’s see how many goals we expect Chelsea and Sunderland to score. http://z-dom4.ru/forum/predlozheniya-i-pozhelaniya/liverpool-fixed-game-2/ https://worldsim.club/showthread.php?tid=4&pid=14525#pid14525 https://www.ex-ttcommunity.com/forum/viewtopic.php?f=3&p=2914983 https://haxcore.net/forum/member.php?action=profile&uid=659 https://foro.minecraftdescargas.com/showthread.php?tid=94817 http://z-dom4.ru/forum/predlozheniya-i-pozhelaniya/correct-score-double-2/ https://support-247.com/mybb/showthread.php?tid=308116 https://politicsuk.net/Bosworth/thread-46099.html http://leshangcoo.com/forum.php?mod=viewthread&tid=4679882&extra= http://www.kolaservers.net/forums/showthread.php?tid=400655 http://forum.ornisoft.com/viewtopic.php?f=4&t=861815 http://www.tdedchangair.com/webboard/viewtopic.php?f=2&t=401686 https://pinballspares.com.au/showthread.php?tid=47249 https://forum.devnagri.com/posting.php?mode=reply&f=102&t=56&sid=37eb6766fec161776b3cc6afbdea0c2a%20 https://www.ex-ttcommunity.com/forum/viewtopic.php?f=3&p=2915054 http://aena.at/phpbb3/viewtopic.php?f=5&t=325060&p=494967#p494967 http://funquest.com.ua/forum/posting.php?mode=quote&f=9&p=145702 https://www.realmanageracket.com/board/viewtopic.php?f=1&t=780008&p=1578628#p1578628 Football prediction dropping odds Best weekend soccer predictions Best 2020 fantasy football draft strategy Best football betting books Nhl best bets for today 2015 nba finals odds Over under basketball strategy Zulubet combo Oddsshark nfl computer picks K8 betting review Ladbrokes fixed odds football coupon Nfl fantasy draft tips 2020 Bet sure soccer prediction How do sports betting odds work College football analysis and predictions Mlb tips and predictions Zulu correct score prediction Ladbrokes afl blog Just horse racing big bets hot fixed matches world cup 2019 fixed matches Federal sports betting Statarea fixed matches correct score table chart Mbet sure wins Best of football prediction Best fanduel lineup week 9 Nfl games betting odds Nfl week 17 computer picks Public betting lines Nfl week 8 score predictions 2020 Best betting sites in canada lion fixed matches Sure daily 5 odds Tennis over under Super rugby head to head odds Best snooker cue tips How to bet on volleyball Golf betting uk Trending betting tips Sure wins for today smartbetting 1x2 best predictions 3rd pick nba draft Nba mock draft 1st and 2nd round Handicappers nfl picks Canadian open golf betting Odds of raptors winning finals New orleans casino sports book Week 4 nfl straight up picks Best hockey draft picks 2020 Nba east odds Nfl game picks week 14 Dark web fixed football matches Fantasy do not draft list 888 sportsbook nj 2020 nfl draft dolphins Betking sure prediction Betsports soccer prediction Wnba picks and parlay 7 best bets sign up offer Oddsshark nfl Nigerian football prediction for today Ncaaf betting predictions Holly sportsbook Dream11 prediction for football Confidence pool nfl week 1 |