
Statistical methods for football

Discussion in 'Miami Dolphins Forum' started by cbrad, Feb 4, 2020.

  1. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    Yeah.. if they're using EPA, then the first two games last season where we lost 10-59 and 0-43 are going to skew the expected wins quite a bit because EPA doesn't care that all the drives in those games (both on offense and defense) came in just 2 games lol. So EPA would predict a lower win% than we actually had because of that alone.

    Now whether that explains the full 2.5 wins above expected I don't know, but I wouldn't be surprised if it explained a good portion of it. Something to keep in mind when merging data.
     
    Pauly, Irishman and The Guy like this.
  2. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
    Would that be controlled for if they used EPA per play instead of just EPA?
     
  3. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    No, in fact it looks like they used EPA per dropback and per rush. If you have one or two absolutely horrid games, with tons of plays and bad average EPA per play, that obviously skews the average across a season.

    The way to "control" for that is to use a sampling distribution (a distribution of means), where in this case each mean is taken per game. In other words, treat average EPA per game as a single data point and look at the distribution of those. The problem is that with only 16 games you have very few data points, so that approach has drawbacks too, but it solves this one issue.
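
    To make the per-game idea concrete, here's a rough Python sketch. The EPA values are made up for illustration (not from the article or any real game), and the function names are placeholders. It just shows that pooling every play lets a long, ugly game count more than averaging game by game does.

```python
from statistics import mean, stdev

# Hypothetical per-play EPA values grouped by game (invented numbers).
games = [
    [-0.9, -1.2, -0.8, -1.1, -1.0, -0.7, -1.3, -0.9],  # blowout with many bad plays
    [0.2, 0.1, 0.3, 0.0],
    [0.1, 0.2, -0.1, 0.2],
    [0.3, 0.0, 0.1, 0.2],
]

def pooled_epa_per_play(games):
    """Average over every play in the season; long games get more weight."""
    return mean([epa for game in games for epa in game])

def per_game_means(games):
    """One average per game, so each game counts exactly once."""
    return [mean(game) for game in games]

pooled = pooled_epa_per_play(games)
game_means = per_game_means(games)
season_mean = mean(game_means)
# Standard error of the mean of game means; large when there are few games.
standard_error = stdev(game_means) / len(game_means) ** 0.5

print(f"pooled per play: {pooled:.3f}")
print(f"mean of game means: {season_mean:.3f} +/- {standard_error:.3f}")
```

    With only a handful of games the standard error stays large, which is the drawback mentioned above.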
     
    Irishman and The Guy like this.
  4. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
    What do you think about the fact that in that article, when they use a random forest as opposed to regression, just about every variable pales in comparison in importance to offensive and defensive EPA per dropback?
     
  5. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    That first figure uses simple linear regression, so they ran a separate regression on each variable (how well you can predict wins from that variable alone). For the second graph they used a random forest model, which takes all the possible predictors into account at once, so those two graphs aren't comparable.

    The proper comparison would be multiple linear regression vs. random forest. In general, you want to be careful trusting flexible machine learning techniques like random forests because they can "overfit" the data, i.e., they work well on that one dataset but not on a slightly different one. Multiple linear regression is the safer choice here, but I suspect EPA per dropback would also win out in multiple linear regression because we know passing efficiency matters more than rushing efficiency.
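
    For anyone unsure what "overfit" looks like in practice, here's a toy Python sketch. It is not a random forest: the flexible model is a simple nearest-neighbor "memorizer" swapped in as a stand-in, and the data is made up. The point is only that a model can be perfect on the data it was fit to and still do worse than a straight line on held-out data.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return mean([(predict(x) - y) ** 2 for x, y in zip(xs, ys)])

# Invented data: y is roughly 2*x plus some fixed noise.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0.5, 1.5, 3.5, 6.5, 8.5, 9.5, 11.5, 14.5]
train_x, train_y = xs[0::2], ys[0::2]   # fit on every other point
test_x, test_y = xs[1::2], ys[1::2]     # hold out the rest

slope, intercept = fit_line(train_x, train_y)

def linear(x):
    return slope * x + intercept

def memorizer(x):
    """Stand-in for an overly flexible model: return the y of the nearest training x."""
    nearest = min(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))
    return nearest[1]

for name, model in [("linear", linear), ("memorizer", memorizer)]:
    print(name, mse(model, train_x, train_y), mse(model, test_x, test_y))
```

    The same held-out check is what you'd want to see before trusting the importance rankings from any flexible model.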
     
    The Guy likes this.
  6. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
     
    cbrad likes this.
  7. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    Yeah, whatever effect fans have on home field advantage is being swamped by the effect of no fans on offensive production for both teams. Points per game, yards per game, passer rating, plays per drive, scoring percentage per drive.. all of these are at record levels this year with QBs having to whisper in the huddle lol.

    Two links worth checking every week:
    https://www.pro-football-reference.com/years/NFL/index.htm
    https://www.pro-football-reference.com/years/NFL/passing.htm

    We can also compare those stats at the end of the year to what they were a week ago (i.e., after 1 month of play), especially ppg at 25.6 and passer rating at 96.5, to see how much of the observed effect was due to lack of preseason. Right now, it's looking like most of the effect is stable and due to lack of crowd noise, but passer rating did decrease last week, taking the overall average to 95.6, possibly showing some effect of lack of preseason.
     
    The Guy likes this.
  8. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
    I have to wonder also if there is an effect of the home field crowd on inspiring defensive play. So you have perhaps this synergistic effect where offenses can function better and defenses are less inspired and invigorated.
     
  9. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
     
  16. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    Yeah nothing special there, basically fitting a curve to data on how well spreads predicted actual score differentials, and using that to improve on a model he has. The "weighting" is just adding this new information to the old model. Basically, he's adding more parameters to increase predictive power.

    Like I said, nothing special from a modeling perspective, but it's exactly what you'd do if you really wanted to bet on games.
     
    The Guy likes this.
  17. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
    Right but am I reading that correctly that he's getting 65% accuracy against the spread with that model?
     
  18. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    No that's just the market regression weight lol (look at the x-axis on that graph).
     
    The Guy likes this.
  19. danmarino

    danmarino Tua is H1M! Club Member

    15,357
    20,976
    113
    Sep 4, 2014
    Do we know if there may be some bad teams vs good teams causing this? Maybe it just so happens that bad home teams are playing good away teams more often? Just a thought...
     
    Irishman likes this.
  20. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    No such effect. First, all teams have played either 4 or 5 home games. Second, if bad home teams were playing good away teams more often, the red line in the graph below would have a higher slope and lower intercept. The red line is the best-fitting line for home win% as a function of overall win%. Its slope is 0.9461 and its intercept is 0.0459, so it's nearly identical to the dotted line (slope of 1, intercept of 0), which represents no effect.

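    [Image: home win% vs. overall win%, with the best-fitting line in red and the slope-1, zero-intercept line dotted]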
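
    A minimal Python sketch of the fit described above, for anyone who wants to redo it. The win percentages below are invented placeholders, not the actual 2020 team numbers, and `fit_line` is just an illustrative helper name.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical teams: overall win% and home win% (made-up values).
overall_win_pct = [0.2, 0.4, 0.5, 0.6, 0.8]
home_win_pct = [0.25, 0.35, 0.5, 0.65, 0.75]

slope, intercept = fit_line(overall_win_pct, home_win_pct)

# "No effect" is the identity line: slope 1, intercept 0.
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"distance from identity line: slope {abs(slope - 1):.4f}, intercept {abs(intercept):.4f}")
```

    With real data you'd plug in each team's actual overall and home win% and compare the fitted slope and intercept to 1 and 0.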

    It's the lack of crowd noise. Offensive stats are at record levels. League average PPG is at 25.1 (the previous high was 23.3) and average passer rating is at 94.5 (the previous high was 92.9). Commentators have pointed out that QBs have to whisper in the huddle so that the defense can't hear them, etc.
     
    The Guy and danmarino like this.
  21. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
     
  22. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    This Moo guy is "not a fan of passer rating" yet he doesn't understand that passer rating inflation is so huge over time that you have to equate distributions to compare passer rating across eras, i.e., look at z-scores.

    In any case, Mahomes has 1473 attempts right now, and I've generally put the threshold for comparing careers at 4000, so he's a long ways off from being included on any career list. But as of now his career z-score is 1.541, and that would put him 3rd all time behind Steve Young (1.8627) and Joe Montana (1.5602). Just goes to show how impressive Young was.

    Remember though, Young and Montana had 4000+ attempts. Let's see if Mahomes can keep up this torrid pace. For reference, Wilson (just passed 4k attempts) is 1.1953, Rodgers (6k+ attempts) is 1.3165, and Brees (10k+ attempts) is 1.1613.
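
    Here's a rough Python sketch of how a z-score puts ratings from different eras on the same scale. The league means and standard deviations are invented, and weighting seasons by attempts to get a career number is an assumption made for this example, not necessarily how the figures above were computed.

```python
def z_score(rating, league_mean, league_sd):
    """How many league standard deviations a rating sits above the league mean."""
    return (rating - league_mean) / league_sd

def career_z(seasons):
    """Attempt-weighted average of season z-scores (the weighting is an assumption).

    seasons: list of (rating, league_mean, league_sd, attempts) tuples.
    """
    total_attempts = sum(attempts for _, _, _, attempts in seasons)
    weighted = sum(
        z_score(rating, league_mean, league_sd) * attempts
        for rating, league_mean, league_sd, attempts in seasons
    )
    return weighted / total_attempts

# The same 95.0 rating means very different things in two hypothetical eras.
old_era = z_score(95.0, league_mean=75.0, league_sd=10.0)
new_era = z_score(95.0, league_mean=90.0, league_sd=10.0)
print(old_era, new_era)

# Hypothetical two-season career.
print(career_z([(95.0, 75.0, 10.0, 100), (95.0, 90.0, 10.0, 300)]))
```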
     
    Last edited: Nov 26, 2020
    The Guy likes this.
  23. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
  24. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    That was Dec 8th. It's now 127-128-1 for home teams so there was no home field advantage in 2020.

    Still don't see any reason to use Bayesian regression. For those who don't know, "Bayesian" means you start off with a prior belief (in this case you assume home field advantage is 2 points) and then use each new data point to update that belief. There's no reason to go Bayesian here. Just look at the actual statistics.
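
    For those curious what that updating looks like mechanically, here's a toy Python sketch using the textbook normal-normal update. The prior mean of 2 points comes from the description above; the variances and the observed margins are made up, and this is not the model from the article.

```python
def update(prior_mean, prior_var, observation, obs_var):
    """Combine a normal prior with one normally distributed observation."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + observation / obs_var)
    return post_mean, post_var

# Prior belief: home field advantage is 2 points (the variance is a made-up choice).
mean_hfa, var_hfa = 2.0, 1.0

# Hypothetical observed home margins, each showing no advantage at all.
for margin in [0.0, 0.0, 0.0]:
    mean_hfa, var_hfa = update(mean_hfa, var_hfa, margin, obs_var=1.0)
    print(f"updated estimate: {mean_hfa:.3f} (variance {var_hfa:.3f})")
```

    Each observation pulls the estimate away from the prior and toward the data, which is why with enough games you end up near the plain average anyway.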

    Anyway, 2020 was truly an anomalous year. Highest ppg ever = 24.8, and highest average passer rating ever = 93.6. If you look at the decrements in both from early in the season, it looks like maybe 1/3 of the increase in offensive output from last year could be attributed to lack of preseason and about 2/3 to lack of crowd noise. Just a rough estimate, but interesting.
     
    Irishman and The Guy like this.
  25. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
    Something more on the scoring issue:

    https://fivethirtyeight.com/features/why-did-nfl-teams-score-so-much-this-season/?cid=_inlinerelated
     
  26. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    The average correlation between points per game and offensive penalties committed across NFL history is -0.04 so I think we can discount that as an important factor in the observed increase in ppg.
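
    A quick Python sketch of that kind of calculation: compute the correlation across teams within each season, then average over seasons. The two "seasons" below are invented toy data, and the helper names are placeholders.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def average_season_correlation(seasons):
    """seasons: list of (ppg_by_team, penalties_by_team) pairs, one pair per season."""
    return mean([pearson(ppg, penalties) for ppg, penalties in seasons])

# Two hypothetical seasons with opposite relationships that cancel out.
seasons = [
    ([20.0, 23.0, 26.0], [90, 100, 110]),   # more penalties, more points
    ([20.0, 23.0, 26.0], [110, 100, 90]),   # more penalties, fewer points
]
avg_r = average_season_correlation(seasons)
print(f"average correlation: {avg_r:.2f}")
```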

    They list lack of home field advantage per se as another reason. I don't follow the logic there. For that to raise scoring, HFA would have to hurt the away offense more than it helps the home offense. That's theoretically possible, but we can't directly test it since it's confounded with the increase in ppg, so that one requires evidence. It's certainly not intuitive to me.

    The play action I can see, but in terms of yards gained the percent is virtually identical to 2019 so it wouldn't explain this.

    I think the main reason is lack of crowd noise. The mechanism there is you can communicate better on offense. THAT helps both offenses (irrespective of whether there is some other source of home field advantage). And of course lack of preseason just added to that.

    btw.. so far in the playoffs the away team has won 4 out of 6.
     
    Irishman and The Guy like this.
  27. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
    Something interesting here regarding EPA and the pick-six Buffalo had against Baltimore last weekend:
    https://theathletic.com/2327328/202...ios-proposals/?source=dailyemail&redirected=1
     
  28. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
     
  32. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
    @cbrad

    I'm curious about your prediction for this year's Super Bowl if you don't mind chiming in on that. I know last year you damn near nailed that thing score and all.
     
  33. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    lol yeah, got a bit lucky though. There's still such a small sample size with SBs that the variance is pretty high, and last year just happened to be very close to the expected (i.e., very close to the average).

    I forget exactly what I did last year, but the average point differential between the SB winner and loser since 2000 is about 10 points (same for both decades since 2000), while it was closer to 18 points in the 20 years before. In other words, point differential should not take all of NFL history into account and should be expected to be about 10 in this era.

    The equation for the best-fitting line between points scored by the SB winner (SBW) and league average points (LAP) from 2000-2019 is SBW = 0.91*LAP + 9.21. Given that league average points was a record high 24.8 this year, that suggests the SB winner should score 32 points with the SB loser scoring 22 points. So a preliminary prediction would be 32-22.
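
    The arithmetic above, spelled out as a short Python sketch. The slope, intercept, league average and the roughly 10-point margin all come from this post; the function name and the rounding to whole points are just choices made for the example.

```python
def predict_super_bowl(league_avg_points, slope=0.91, intercept=9.21, margin=10):
    """Predicted (winner, loser) points from SBW = slope * LAP + intercept."""
    winner = slope * league_avg_points + intercept
    loser = winner - margin  # average winner-loser differential since 2000
    return round(winner), round(loser)

winner_pts, loser_pts = predict_super_bowl(24.8)
print(f"{winner_pts}-{loser_pts}")
```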

    I think last year I looked at z-scores for offense and defense for both teams, but they were so similar on both sides of the ball for both teams that it didn't budge the prediction from that preliminary one. Same this year. KC and TB on offense have z-scores of 1.1158 and 1.3642, while on defense they're 0.5901 and 0.7094. That is so close that there's no reason to adjust based on z-score differences, although note that once again offense is more important than defense for winning the SB no matter which team wins it this year.

    So I'll go with 32-22 as the prediction (keeping in mind the variances are pretty big).
     
    The Guy likes this.
  34. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
    So does that specify which team will have 32, or is it independent of that?
     
  35. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    Yeah that's a more interesting question than you might think because both KC and TB have statistical properties more similar to SB losers than to winners lol.

    The average z-score difference for points scored and points allowed for a SB winner is 0.0357. So for SB winners both offense and defense are similarly important, with offense a tad more. That totally changes for the SB loser. SB losers tend to be far more imbalanced, with that difference going up to 0.4713. SB losers tend to have far better offenses than defenses.

    Guess what? Both KC and TB fit that bill lol. Both look statistically like SB losers, though TB looks a bit MORE like a SB loser with a 0.6548 difference compared to 0.5257 for KC.

    The same balance vs. imbalance principle holds for z-score passer ratings, and once again KC is a tad better than TB. KC's z-score passer rating (for the team) is 1.169 on offense and 0.5121 on defense, while for TB it's 0.7833 on offense and -0.0916 on defense. The difference is again bigger for TB: 0.8749 vs. 0.6569.

    So in summary, both KC and TB look statistically like SB losers, but TB looks MORE like a SB loser than KC, so the prediction is KC over TB: 32-22. Keep in mind once again that variance is huge. Still interesting to try and predict.
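
    Here's the balance comparison as a small Python sketch. The z-scores and the average winner/loser gaps are the numbers quoted in these posts; picking whichever team's gap is closer to the winner average is a simplified reading of the reasoning, not a formal model.

```python
# Average (offense z-score minus defense z-score) for past SB winners and losers.
WINNER_GAP = 0.0357
LOSER_GAP = 0.4713

# (offense z-score, defense z-score) for points scored and points allowed.
teams = {"KC": (1.1158, 0.5901), "TB": (1.3642, 0.7094)}

def gap(offense_z, defense_z):
    """Imbalance between offense and defense; larger means more offense-heavy."""
    return offense_z - defense_z

def closer_to_winner_profile(teams):
    """Return the team whose imbalance is nearest the historical winner average."""
    return min(teams, key=lambda name: abs(gap(*teams[name]) - WINNER_GAP))

for name, (off_z, def_z) in teams.items():
    g = gap(off_z, def_z)
    profile = "loser" if abs(g - LOSER_GAP) < abs(g - WINNER_GAP) else "winner"
    print(f"{name}: gap {g:.4f}, looks more like a SB {profile}")

print("pick:", closer_to_winner_profile(teams))
```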
     
    The Guy likes this.
