Statistical methods for football

cbrad · Oct 17, 2020

The Guy said:

This article is interesting because it proposes that the Dolphins had about 2.5 more wins than expected in 2019:

https://www.opensourcefootball.com/posts/2020-08-23-exploring-wins-with-nflfastr/

Yeah.. if they're using EPA, then the first two games last season where we lost 10-59 and 0-43 are going to skew the expected wins quite a bit because EPA doesn't care that all the drives in those games (both on offense and defense) came in just 2 games lol. So EPA would predict a lower win% than we actually had because of that alone.

Now whether that explains the full 2.5 wins above expected I don't know, but I wouldn't be surprised if it explained a good portion of it. Something to keep in mind when merging data.

The Guy · Oct 17, 2020

Would that be controlled for if they used EPA per play instead of just EPA?

cbrad · Oct 17, 2020

No, in fact it looks like they used EPA per dropback and per rush. If you have one or two absolutely horrid games, with tons of plays and bad average EPA per play, that obviously skews the average across a season.

The way to "control" for that is to use a sampling distribution — a distribution of means — where in this case the mean is per game. In other words, use average EPA per game as a single statistic and look at the distribution of that. Problem there is you have so few data points with 16 games so there are drawbacks to that approach too, but it solves this one issue.
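For anyone who wants to see the difference, here's a rough Python sketch. The EPA values are made up for illustration (they're not from nflfastR) and the variable names are mine. It compares the pooled per-play average with the average of per-game means when two blowouts carry a lot of plays:

```python
from statistics import mean

# Hypothetical per-play EPA values for four games (invented for illustration).
# The first two games are blowouts with many plays and bad EPA on each play.
games = {
    "wk1": [-0.6] * 80,
    "wk2": [-0.5] * 80,
    "wk3": [0.1] * 50,
    "wk4": [0.2] * 50,
}

# Pooled: every play counts equally, so the long, bad games dominate.
all_plays = [epa for plays in games.values() for epa in plays]
pooled_epa_per_play = mean(all_plays)

# Per-game: reduce each game to one number first, then average across games.
per_game_means = [mean(plays) for plays in games.values()]
mean_of_game_means = mean(per_game_means)

print(round(pooled_epa_per_play, 3), round(mean_of_game_means, 3))
```

In this toy example the pooled number comes out lower than the per-game number because the two blowouts contribute 160 of the 260 plays, while in the per-game version each game counts once.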

The Guy · Oct 18, 2020

What do you think about the fact that in that article, when they use a random forest as opposed to regression, just about every variable pales in comparison in importance to offensive and defensive EPA per dropback?

cbrad · Oct 18, 2020

The first figure uses simple linear regression, so they ran a separate regression on each variable (how well you can predict wins from that variable alone). For the second graph they used a random forest model, which takes all the possible predictors into account at once, so those two graphs aren't comparable.

The proper comparison would be multiple linear regression vs. random forest. In general, you don't want to put too much trust in machine learning techniques like random forests on a dataset like this because they can "overfit" the data, i.e., they work well on that one dataset but not on a slightly different one. Multiple linear regression is the safer choice here. That said, I suspect EPA per dropback would also win out in multiple linear regression because we know passing efficiency matters more than rushing efficiency.
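Here's a small sketch of why single-variable fits and a joint fit aren't comparable. The data are invented: wins depend only on a hypothetical pass-efficiency predictor, while the rush-efficiency predictor is merely correlated with it. It only covers the regression side and doesn't include a random forest.

```python
def cross_dev(xs, ys):
    """Sum of (x - mean_x) * (y - mean_y) over two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))

# Invented data: wins are an exact linear function of pass efficiency only.
pass_eff = [-0.2, -0.1, 0.0, 0.1, 0.2, 0.3]
rush_eff = [-0.1, -0.1, 0.0, 0.0, 0.1, 0.1]  # correlated with pass_eff
wins = [8 + 20 * p for p in pass_eff]

s11 = cross_dev(pass_eff, pass_eff)
s22 = cross_dev(rush_eff, rush_eff)
s12 = cross_dev(pass_eff, rush_eff)
s1y = cross_dev(pass_eff, wins)
s2y = cross_dev(rush_eff, wins)

# Simple regression: each predictor fitted on its own.
simple_pass_slope = s1y / s11
simple_rush_slope = s2y / s22

# Multiple regression with both predictors (two-variable normal equations).
det = s11 * s22 - s12 ** 2
multi_pass_coef = (s22 * s1y - s12 * s2y) / det
multi_rush_coef = (s11 * s2y - s12 * s1y) / det

print(simple_pass_slope, simple_rush_slope)
print(multi_pass_coef, multi_rush_coef)
```

On its own, the rush predictor gets a large slope purely because it travels with the pass predictor. Once both are in the same model, its coefficient drops to essentially zero.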

The Guy · Oct 18, 2020

cbrad · Oct 18, 2020

Yeah, whatever effect fans have on home field advantage is being swamped by the effect of no fans on offensive production for both teams. Points per game, yards per game, passer rating, plays per drive, scoring percentage per drive.. all of these are at record levels this year with QBs having to whisper in the huddle lol.

Two links worth checking every week:
https://www.pro-football-reference.com/years/NFL/index.htm
https://www.pro-football-reference.com/years/NFL/passing.htm

We can also compare those stats at the end of the year to what they were a week ago (i.e., after 1 month of play), especially ppg at 25.6 and passer rating at 96.5, to see how much of the observed effect was due to lack of preseason. Right now, it's looking like most of the effect is stable and due to lack of crowd noise, but passer rating did decrease last week, taking the overall average to 95.6, possibly showing some effect of lack of preseason.

The Guy · Oct 18, 2020

I have to wonder also if there is an effect of the home field crowd on inspiring defensive play. So you have perhaps this synergistic effect where offenses can function better and defenses are less inspired and invigorated.

The Guy · Nov 14, 2020

@cbrad have a look at this:

https://www.robbygreer.com/blog/using-market-regression-to-improve-prediction-accuracy-in-the-nfl

cbrad · Nov 14, 2020

Yeah nothing special there, basically fitting a curve to data on how well spreads predicted actual score differentials, and using that to improve on a model he has. The "weighting" is just adding this new information to the old model. Basically, he's adding more parameters to increase predictive power.

Like I said, nothing special from a modeling perspective, but it's exactly what you'd do if you really wanted to bet on games.

The Guy · Nov 15, 2020

Right but am I reading that correctly that he's getting 65% accuracy against the spread with that model?

cbrad · Nov 15, 2020

No that's just the market regression weight lol (look at the x-axis on that graph).

danmarino · Nov 19, 2020

Do we know if there may be some bad teams vs good teams causing this? Maybe it just so happens that bad home teams are playing good away teams more often? Just a thought...

cbrad · Nov 19, 2020

No such effect. All teams have played either 4 or 5 home games, and if bad home teams were playing good away teams more often the red line in the graph below would have a higher slope and lower intercept. The red line shows the best-fitting line to home win% as a function of overall win%. Its slope is 0.9461 and its intercept is 0.0459, so it's nearly identical to the dotted line (slope of 1, intercept of 0), which represents no effect.
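A minimal sketch of that kind of check in Python. The win percentages in the lists are invented, not the actual 2020 team data, and `fit_line` is just an ordinary least-squares helper I'm assuming for illustration. The last lines plug the slope and intercept reported above into the line for a .500 team:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Invented team data: overall win% and home win% for seven hypothetical teams.
overall_win_pct = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
home_win_pct = [0.25, 0.25, 0.45, 0.5, 0.6, 0.75, 0.8]

slope, intercept = fit_line(overall_win_pct, home_win_pct)
print(slope, intercept)

# Slope and intercept as reported in the post, evaluated for a .500 team.
reported_slope, reported_intercept = 0.9461, 0.0459
home_pct_for_500_team = reported_slope * 0.5 + reported_intercept
print(home_pct_for_500_team)
```

A slope near 1 and an intercept near 0 means home win% simply tracks overall win%. With the reported numbers, a .500 team would be expected to win about 51.9% of its home games.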

It's the lack of crowd noise. Offensive stats are at record levels. League average PPG is at 25.1 (highest ever previously was 23.3) and average passer rating is at 94.5 (highest previously was 92.9). Commentators have pointed out that QBs have to whisper in the huddle so that the defense can't hear them, etc.

The Guy · Nov 26, 2020

cbrad · Nov 26, 2020

This Moo guy is "not a fan of passer rating" yet he doesn't understand that passer rating inflation is so huge over time that you have to equate distributions to compare passer rating across eras, i.e., look at z-scores.

In any case, Mahomes has 1473 attempts right now, and I've generally put the threshold for comparing careers at 4000, so he's a long ways off from being included on any career list. But as of now his career z-score is 1.541, and that would put him 3rd all time behind Steve Young (1.8627) and Joe Montana (1.5602). Just goes to show how impressive Young was.

Remember though, Young and Montana had 4000+ attempts. Let's see if Mahomes can keep up this torrid pace. For reference, Wilson (just passed 4k attempts) is 1.1953, Rodgers (6k+ attempts) is 1.3165, and Brees (10k+ attempts) is 1.1613.
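For reference, a z-score here just means how many standard deviations a rating sits above the league average of its own season. A rough Python sketch with invented league ratings (not real data), using the population standard deviation as an assumption on my part:

```python
from statistics import mean, pstdev

def z_score(value, league_values):
    """Standard score of `value` relative to one season's league distribution."""
    return (value - mean(league_values)) / pstdev(league_values)

# Invented passer ratings for two hypothetical seasons in different eras.
era_a_ratings = [60, 70, 75, 80, 90]    # older, low-rating era
era_b_ratings = [82, 87, 92, 97, 102]   # modern, inflated era

z_a = z_score(90, era_a_ratings)    # best passer of the older season
z_b = z_score(102, era_b_ratings)   # best passer of the modern season

print(round(z_a, 3), round(z_b, 3))
```

In this toy example the 90 rating from the older season has the higher z-score even though the raw number is lower than 102, which is why raw ratings across eras aren't comparable.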

The Guy · Jan 14, 2021

@cbrad

cbrad · Jan 14, 2021

That was Dec 8th. It's now 127-128-1 for home teams so there was no home field advantage in 2020.

Still don't see any reason to use Bayesian regression. For those who don't know, "Bayesian" means you start off with some prior probability (in this case you assume home field advantage is 2 points) and then use each new data point to update those probabilities. There's no reason to use Bayesian. Just look at the actual statistics.
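To make the updating idea concrete, here's a sketch of the simplest version (a normal prior with a known game-to-game variance). The prior mean of 2 points comes from the description above. The prior variance, the game variance, and the home margins are all values I'm assuming for illustration, and this isn't necessarily the model used in the linked analysis.

```python
def update_normal(prior_mean, prior_var, observation, obs_var):
    """One conjugate normal update of the mean, with known observation variance."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + observation / obs_var)
    return post_mean, post_var

prior_mean, prior_var = 2.0, 1.0   # prior: HFA of 2 points; variance is assumed
game_var = 13.5 ** 2               # assumed spread of single-game margins

# Invented home-team margins (home points minus away points).
home_margins = [3, -7, 10, -14, 0, -3, 7, -10]

post_mean, post_var = prior_mean, prior_var
for margin in home_margins:
    post_mean, post_var = update_normal(post_mean, post_var, margin, game_var)

sample_mean = sum(home_margins) / len(home_margins)
print(round(post_mean, 3), round(sample_mean, 3))
```

With only a handful of noisy games the estimate stays close to the prior. As more games come in it moves toward the plain sample average, so with a full season of data the two approaches end up much closer together.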

Anyway, 2020 was truly an anomalous year. Highest ppg ever = 24.8, and highest average passer rating ever = 93.6. If you look at the decrements in both from early in the season, it looks like maybe 1/3 of the increase in offensive output from last year could be attributed to lack of preseason and about 2/3 to lack of crowd noise. Just a rough estimate, but interesting.

The Guy · Jan 14, 2021

Something more on the scoring issue:

https://fivethirtyeight.com/features/why-did-nfl-teams-score-so-much-this-season/?cid=_inlinerelated

cbrad · Jan 14, 2021

The average correlation between points per game and offensive penalties committed across NFL history is -0.04 so I think we can discount that as an important factor in the observed increase in ppg.

They list lack of home field advantage per se as another reason. I don't follow the logic there. For that to raise scoring, HFA would have to hurt the away offense more than it helps the home offense. That's theoretically possible, but it requires evidence, and it can't be tested directly since it's confounded with the increase in ppg. It's certainly not intuitive to me.

The play action I can see, but in terms of yards gained the percent is virtually identical to 2019 so it wouldn't explain this.

I think the main reason is lack of crowd noise. The mechanism there is you can communicate better on offense. THAT helps both offenses (irrespective of whether there is some other source of home field advantage). And of course lack of preseason just added to that.

btw.. so far in the playoffs the away team has won 4 out of 6.

The Guy · Jan 19, 2021

Something interesting here regarding EPA and the pick-six Buffalo had against Baltimore last weekend:

Johnson’s pick-six interception off Jackson late in the third quarter was so big, it produced the sixth-largest EPA swing on a single postseason play since at least 2000, according to TruMedia. Here’s how that play produced a 10.7-point swing, tied for the Bills’ second-largest positive swing on any play since 2000:

The Ravens faced third-and-goal from the Buffalo 9-yard line with 58 seconds left in the third quarter. That situation is worth about 4 EPA to the offense, which represents the likelihood of all the potential outcomes. In that situation historically, there’s roughly a 20 percent chance the offense will throw a touchdown pass, a 10 percent chance the offense will incur a sack, a 5 percent chance the offense will suffer a turnover, etc. Barring a turnover or especially bad sack, the offense will be in prime position to attempt a high-percentage field goal, at least.

Bottom line: The Ravens were expecting to get about four points from that situation on average. A field goal would have cut a 10-3 deficit to 10-6 while a touchdown likely would have tied the score. Those are not disastrous outcomes from the Ravens’ standpoint. What the Ravens got, instead, was the worst possible outcome, a pick-six and a Bills PAT, converting that 4 EPA into a touchdown and PAT for the Bills. That’s how 10.7 EPA change hands in a single play.

https://theathletic.com/2327328/202...ios-proposals/?source=dailyemail&redirected=1
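The arithmetic in that excerpt can be checked in a couple of lines. This is a simplified reading that treats the post-play value as exactly 7 points for the Bills (touchdown plus PAT), which is my assumption about how the figure was built:

```python
reported_swing = 10.7       # EPA swing quoted in the article
points_for_defense = 7.0    # touchdown plus PAT scored by the Bills

# Swing = Ravens' expected points before the play minus their value after it.
# After the pick-six the Ravens' value is -7, so the pre-play value must be:
implied_ep_before = reported_swing - points_for_defense

print(round(implied_ep_before, 1))
```

That works out to roughly 3.7 expected points before the snap, which matches the "about 4 EPA" description.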

The Guy · Jan 28, 2021

I suspect this will be heavy on statistics:

https://theathletic.com/2350097/?source=email&campaign=betmgmcrm

The Guy · Feb 5, 2021

@cbrad

I'm curious about your prediction for this year's Super Bowl if you don't mind chiming in on that. I know last year you damn near nailed that thing score and all.

cbrad · Feb 5, 2021

lol yeah, got a bit lucky though. There's still such a small sample size with SBs that the variance is pretty high, and last year just happened to be very close to the expected (i.e., very close to the average).

I forget exactly what I did last year, but the average point differential between the SB winner and loser since 2000 is about 10 points (same for both decades since 2000), while it was closer to 18 points in the 20 years before. In other words, point differential should not take all of NFL history into account and should be expected to be about 10 in this era.

The equation for the best-fitting line between points scored by the SB winner (SBW) and league average points (LAP) from 2000-2019 is SBW = 0.91*LAP + 9.21. Given that league average points was a record high 24.8 this year, that suggests the SB winner should score 32 points with the SB loser scoring 22 points. So a preliminary prediction would be 32-22.
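Spelled out in Python, using the slope, intercept, league average, and 10-point differential given above (the function name is just for illustration):

```python
def predicted_winner_points(league_avg_points, slope=0.91, intercept=9.21):
    """Best-fit line from the post: SBW = 0.91 * LAP + 9.21."""
    return slope * league_avg_points + intercept

league_avg_points = 24.8       # 2020 league average points per game
expected_differential = 10     # average SB margin since 2000

winner_points = predicted_winner_points(league_avg_points)
loser_points = winner_points - expected_differential

print(round(winner_points), round(loser_points))
```

The unrounded winner estimate is about 31.8, which rounds to the 32-22 prediction.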

I think last year I looked at z-scores for offense and defense for both teams, but they were so similar on both sides of the ball for both teams that it didn't budge the prediction from that preliminary one. Same this year. KC and TB on offense have z-scores of 1.1158 and 1.3642, while on defense they're 0.5901 and 0.7094. That is so close that there's no reason to adjust based on z-score differences, although note that once again offense is more important than defense for winning the SB no matter which team wins it this year.

So I'll go with 32-22 as the prediction (keeping in mind the variances are pretty big).

The Guy · Feb 6, 2021

So does that specify which team will have 32, or is it independent of that?

cbrad · Feb 6, 2021

Yeah that's a more interesting question than you might think because both KC and TB have statistical properties more similar to SB losers than to winners lol.

The average z-score difference for points scored and points allowed for a SB winner is 0.0357. So for SB winners both offense and defense are similarly important, with offense a tad more. That totally changes for the SB loser. SB losers tend to be far more imbalanced, with that difference going up to 0.4713. SB losers tend to have far better offenses than defenses.

Guess what? Both KC and TB fit that bill lol. Both look statistically like SB losers, though TB looks a bit MORE like a SB loser with a 0.6548 difference compared to 0.5257 for KC.

The same balance vs. imbalance principle holds for z-score passer ratings, and once again KC is a tad better than TB. KC's z-score passer rating (for the team) is 1.169 on offense and 0.5121 on defense, while for TB it's 0.7833 on offense and -0.0916 on defense. The difference is again bigger for TB: 0.8749 vs. 0.6569.

So in summary, both KC and TB look statistically like SB losers, but TB looks MORE like a SB loser than KC, so the prediction is KC over TB: 32-22. Keep in mind once again that variance is huge. Still interesting to try and predict.
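The comparison boils down to a few subtractions. A quick Python sketch using the z-scores quoted in this thread. The rule "smaller offense-minus-defense gap looks more like a SB winner" is my shorthand for the argument above, not a tested model:

```python
# Team z-scores quoted in the thread: points scored/allowed and passer rating.
points_z = {
    "KC": {"off": 1.1158, "def": 0.5901},
    "TB": {"off": 1.3642, "def": 0.7094},
}
rating_z = {
    "KC": {"off": 1.169, "def": 0.5121},
    "TB": {"off": 0.7833, "def": -0.0916},
}

def imbalance(z_by_team):
    """Offense z-score minus defense z-score for each team."""
    return {team: z["off"] - z["def"] for team, z in z_by_team.items()}

points_gap = imbalance(points_z)
rating_gap = imbalance(rating_z)

# Historical averages quoted above: 0.0357 for SB winners, 0.4713 for losers.
# Shorthand rule (an assumption): the more balanced team is the pick.
pick = min(points_gap, key=points_gap.get)

print({t: round(g, 4) for t, g in points_gap.items()})
print({t: round(g, 4) for t, g in rating_gap.items()})
print(pick)
```

Both gaps are smaller for KC than for TB, which is the basis for picking KC.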
