
Statistical methods for football

Discussion in 'Miami Dolphins Forum' started by cbrad, Feb 4, 2020.

  1. Pauly

    Pauly Season Ticket Holder

    3,696
    3,743
    113
    Nov 29, 2007
    There is a famous quote in mathematics from John von Neumann: "With 4 parameters (i.e. assumptions) I can fit an elephant. With 5 I can make him wiggle his trunk." Demonstrated here:
     
    resnor and Irishman like this.
  2. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
    Another interesting finding here:

    The correlation between clutch-weighted EPA per play and QBR: 0.98 (95% CI 0.968 to 0.986).

    QBR is clutch-weighted EPA per play.

    The correlation between clutch-weighted EPA per play and quarterbacks' win percentage from 2017 to 2019 is 0.654 [0.520 to 0.757].

    That's a bit stronger than the correlation between adjusted passer rating and win percentage (0.648; 0.512 to 0.752) and the correlation between DVOA and win percentage (0.626; 0.484 to 0.736).

    So if you're comfortable with the clutch weighting of EPA, whose exact calculation isn't public but is described this way:
    ...then you may find clutch-weighted EPA per play an appealing measure of quarterback play.

    Here is a description of EPA: https://www.advancedfootballanalyti...s-explained/expected-points-and-epa-explained

    https://www.espn.com/blog/statsinfo...-calculated-we-explain-our-quarterback-rating
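    For anyone who wants to check correlations like these (and the confidence intervals) themselves, here's a minimal sketch in R. The numbers in the vectors are placeholders, not real data; you'd substitute each QB's season-level values.

        # Minimal sketch: Pearson correlation with a 95% CI in R.
        # The vectors below are placeholders -- substitute the actual season-level
        # values per QB (e.g., clutch-weighted EPA per play and win percentage).
        cwepa_per_play <- c(0.21, 0.15, 0.08, 0.12, 0.05, 0.18)  # hypothetical values
        win_pct        <- c(0.75, 0.63, 0.44, 0.56, 0.31, 0.69)  # hypothetical values

        # cor.test() returns the Pearson r and a 95% confidence interval.
        result <- cor.test(cwepa_per_play, win_pct)
        result$estimate   # correlation coefficient
        result$conf.int   # 95% confidence interval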
     
  3. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    What???

    The first quote is fine (though I haven't double-checked it), the second I don't for a moment believe. First of all, the correlation on a per play basis cannot be higher than the correlation after you average across plays. How is that mathematically possible? When you average things you reduce variance! So the correlation could be higher with the average but I see no way it could be lower.

    It also makes no sense conceptually. If what you're saying is true (which I'm sure it's not), then those 10k lines of code are basically doing nothing. That is, apportioning credit among players (this is ESPN's main selling point) is essentially not happening, nor is adjustment for opponent strength, nor is adjustment for difficulty. You're saying all that has no influence on QBR. It's just "clutch-weighted EPA".

    And btw.. 0.98 correlation is NEVER seen with football stats. That's so high it's almost like there's no random variation.

    No I don't believe you. You need to show us the data. How did you get clutch-weighted EPA per play and QBR per play? ESPN gives you both for a season (https://www.espn.co.uk/nfl/qbr/_/seasontype/2) but where did you get that per play? Showing people data per play is dangerous from ESPN's point of view because it makes it easier for people to reconstruct the formula.
     
    resnor, Pauly, Irishman and 1 other person like this.
  4. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
    I can understand the concern, but I don't think they averaged across plays. I think when they post EPA it's simply additive. Notice that Tannehill for example, who played in only 11 games, has an EPA of only 45.8 in 2019, whereas Patrick Mahomes, who played in 14 games, has an EPA of 97.3.

    Anyway the data are from this page:

    https://www.espn.com/nfl/qbr

    You can see at the top there where you can go back to past seasons. The data are from 2017 to 2019, regular season only, and EPA per play is simply the "EPA" column divided by the "PLAYS" column.

    Also, their lines of code and the adjustments you're talking about are even more meaningless, because the correlation between EPA per play and QBR "RAW" (the right-most column on the above page) is 0.99 (again from 2017 to 2019), and the fact that it isn't 1.0 may have more to do with different numbers of decimal places than anything else.
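    If anyone wants to reproduce this, here's roughly what the calculation looks like in R, assuming you've saved the table from that page as a CSV. The file name and exact column names below are my assumption; "EPA", "PLAYS", and the raw-QBR column are the ones described above.

        # Sketch: derive EPA per play from the season table and correlate with QBR.
        # Assumes the table was saved as "espn_qbr_2017_2019.csv" with columns
        # named EPA, PLAYS, QBR, and RAW (names assumed from the page).
        qbr <- read.csv("espn_qbr_2017_2019.csv")

        qbr$epa_per_play <- qbr$EPA / qbr$PLAYS   # "EPA" column divided by "PLAYS" column

        cor(qbr$epa_per_play, qbr$QBR)   # reported above as ~0.98
        cor(qbr$epa_per_play, qbr$RAW)   # reported above as ~0.99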
     
    cbrad likes this.
  5. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
    Another interesting finding from those data:

    The correlation between quarterbacks' win percentage and "PAA" on the above webpages (which is defined by ESPN as "number of points contributed by the quarterback, accounting for QBR and how much he plays, above the level of an average quarterback") is 0.66, which is stronger than the correlation between win percentage and 1) adjusted (to 2019) passer rating, 2) DVOA, 3) QBR, and 4) EPA per play.

    The correlation between PAA and EPA per play? 0.97.

    I think we've cracked the code.
     
    cbrad likes this.
  6. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    Well something had to be wrong. Your calculations are correct. That means that ESPN's 10k lines of code actually are doing NOTHING!!

    In other words, as you say, ESPN's QBR is nothing more than EPA per play. All the code they have for apportioning credit among players is being swamped by the extra noise from the parameters – exactly that "bad modeling" that I was talking about.

    This also means that they don't have any better way of teasing apart individual QB ability than any other "QB" stat. Actually, EPA is MORE of a team stat than passer rating because passer rating is a team offensive passing stat while EPA includes everything on offense.

    You cracked the code, not me. Good job!
     
    resnor, Irishman, Pauly and 1 other person like this.
  7. Disgustipate

    Disgustipate Season Ticket Holder Club Member

    31,608
    55,628
    113
    Nov 25, 2007
    This is a really interesting thread. I took an education-based research/statistics class last year and was able to play around a bit with SPSS, and the entire time I was doing the different labs I kind of wished I could have spent the time trying it out for football stat stuff.
     
    Irishman, Pauly, cbrad and 1 other person like this.
  8. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
    Download the Jamovi program online and you can do the same thing on your own time.
     
    Disgustipate likes this.
  9. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
    Yeah it’s essentially Brian Burke’s influence at ESPN. On his Advanced Football Analytics site, his two main variables were EPA and WPA, and now QBR is simply EPA down-weighted by WPA in garbage time.
     
  10. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    Wiki says QBR was last modified in 2013. Maybe there have been minor modifications since then, but Burke joined ESPN in 2015 so I doubt he was the one who made the formula so reliant on EPA. This almost certainly predates him.
     
    The Guy likes this.
  11. Pauly

    Pauly Season Ticket Holder

    3,696
    3,743
    113
    Nov 29, 2007
    As someone who has built and used computer models it is comforting that 99% of ESPN’s model is essentially fluff and bubbles that are there purely as window dressing.

    The more complex a model is, the more it tells you about the assumptions of its builder, and the greater the chance that a small initial error in those assumptions magnifies or minimizes the effect of one component.

    The good news is we can ignore ESPN’s QBR and just use EPA and WPA.
     
    Irishman and cbrad like this.
  12. Pauly

    Pauly Season Ticket Holder

    3,696
    3,743
    113
    Nov 29, 2007
    A more modern approach should be able to get you a better outcome than the NFL's passer rating. The NFL's passer rating was developed in the late 60s/early 70s and had to be worked out with slide rules and pen-and-paper calculations. It also had a much more basic set of stats to draw on (for example, sacks weren't tracked separately and were treated as negative rushes on the stat sheet).

    What is surprising is that passer rating has proven to be so robust and so predictive given the changes in how the game is played over the last 50 years.

    The last time I calculated the correlation between passer rating (adjusted to a common year) and win% was two years ago, and I had only coded the data for 10 years. I got a correlation of 0.67 between adjusted passer rating and win% for 2006 to 2016 (here: https://www.thephins.com/threads/building-a-winning-team.91138/).
     
    Irishman and The Guy like this.
  13. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
    Again in terms of the variables on this page:

    https://www.espn.com/nfl/qbr

    ...the correlation between clutch-weighted EPA per play and QBR RAW is 0.99, which I suspect is actually 1.0 but is less because of differences in numbers of decimal places.

    The correlation between clutch-weighted EPA per play and QBR is 0.979 (95% CI = 0.968 to 0.986).

    So that means the transformation from "QBR RAW" to QBR -- which is where I suspect the adjustments beyond the clutch down-weighting for WPA are made -- changes the variance in QBR accounted for by clutch-weighted EPA by only about 2.2% (0.99^2 = 98.0% versus 0.979^2 = 95.8%).

    So whatever they're doing between "QBR RAW" and QBR is hardly changing things at all. You might as well just stick with clutch-weighted EPA per play.

    Now, the clutch-weighting itself (using WPA) may make a significant difference over and above EPA per play, and that's part of their model as well, but unfortunately we can't know that because of reasons we've gone over here.
     
    Last edited: Feb 9, 2020
  14. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
    Well keep in mind here that these statistics (ESPN's) consist of "plays on which the QB has a non-zero expected points contribution; includes most plays that are not handoffs." So that includes quarterback runs, sacks, and -- very importantly -- penalties, like pass interference for example.

    So the fact that we're getting a slightly stronger correlation with win percentage by including far more of the activity of quarterbacks on the field, while again down-weighting (using WPA) for garbage time statistics, I think makes QBR (or clutch-weighted EPA per play...) a more attractive statistic than traditional passer rating.

    I'm with you on being surprised that we can't do any better at predicting win percentage on the basis of QBR, but we may in fact be capturing more of what the quarterback is doing in relation to win percentage. And therefore it may nonetheless be a better measure than traditional passer rating of quarterbacks' individual performance/ability.

    EDIT: To support the above, consider that the correlations among clutch-weighted EPA per play, QBR RAW, QBR, and PAA are all 0.97 or more, whereas the correlations between those same variables and traditional (adjusted) passer rating are in the high 0.70s.

    So if what ESPN is doing is truly capturing more of what the quarterback is doing on the field, we're talking about somewhere in the neighborhood of 33% of variance accounted for by those methods. Again if that's validly measuring quarterback play, that's an awful lot of variance.
     
    Last edited: Feb 9, 2020
    Pauly likes this.
  15. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    For reference these are the correlations to win% from 1966-2019:

    Correlation between passer rating and win%: 0.634
    Correlation between passer rating allowed and win%: -0.595
    Correlation between passer rating differential and win%: 0.7972

    And while the passing game changed dramatically since 1978, what's amazing is that these correlations are almost exactly the same if you only look at 1978-2019. So passer rating is definitely robust which is good.

    Something else those correlations show, via the square of the correlation (r^2, which tells you how much of the variance in one variable is explained by the other; see the first post in this thread for an explanation): 0.634^2 = 40.2% and (-0.595)^2 = 35.4%, so offensive passer rating explains about 5% more of the variance in win% than defensive passer rating does. It's just one of many stats showing that, statistically speaking, offense is slightly more important than defense in the NFL (not in every game, just on average).

    Regarding how passer rating could be improved, there are obviously all kinds of parameters you could add, from sacks to air yards (instead of passing yards) to QB rushing (if you're interested in a measure of QB ability rather than strictly passing efficiency), etc... And any improvement should also automatically adjust for era so that (let's say) 100 is defined to be league average.

    But there is one "mathematical" flaw in passer rating that needs to be corrected before worrying about adding other parameters: the artificial ceiling passer rating has on how much any of its components (COMP%, Y/A, TD%, INT%) could influence it.

    Let me give an example:

    If you have 15 completions in 20 passing attempts with 200 yards, 3 TD's and 3 INT's, your passer rating is 106.25. Maybe to the surprise of some people, if you had those same stats but had 4 TD's instead of 3 TD's, your passer rating is STILL 106.25. You could keep increasing the number of TD's to 5, 6, 7.. etc.. and no matter how much you increase it (as long as TD + INT is less than total completions) your passer rating is still 106.25 lol.

    That occurs because the formula puts in an artificial ceiling so that you can't go higher than a 158.3 rating (a "perfect" rating). Same thing occurs with INT's. You could have 15 completions in 20 passing attempts with 200 yards, 3 TD's and X INT's where X > 3 (X could be 10 INT's) and you STILL have a 106.25 rating, which is totally absurd.

    The problem is "mathematical" because a linear relationship is assumed for the entire scale of possible passer ratings, meaning that each extra TD counts exactly the same as the previous one. Generally it's much easier to increase TD's from 0 to 1 than from 4 to 5 in a game (if for no other reason than the fixed amount of time in a game), so a linear relationship shouldn't be assumed. The relationship should be sigmoidal. In other words, a simple improvement on passer rating would be to remove the ceiling restrictions, take the un-capped result, and pass it through a sigmoidal function to arrive at an improved formula. That's almost guaranteed to very slightly improve correlations to win%.
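    To make the ceiling concrete, here's a small R sketch of the standard passer rating formula; the clamping of each component to [0, 2.375] is what produces the 158.3 ceiling and the behavior in the example above.

        # Standard NFL passer rating, with each component clamped to [0, 2.375].
        passer_rating <- function(comp, att, yds, td, int) {
          clamp <- function(x) pmin(pmax(x, 0), 2.375)
          a_comp <- clamp((comp / att - 0.3) * 5)    # completion percentage component
          b_yds  <- clamp((yds / att - 3) * 0.25)    # yards per attempt component
          c_td   <- clamp((td / att) * 20)           # touchdown percentage component
          d_int  <- clamp(2.375 - (int / att) * 25)  # interception percentage component
          (a_comp + b_yds + c_td + d_int) / 6 * 100
        }

        passer_rating(15, 20, 200, 3, 3)   # 106.25
        passer_rating(15, 20, 200, 5, 3)   # still 106.25 -- TD component already at the ceiling
        passer_rating(15, 20, 200, 3, 10)  # still 106.25 -- INT component already at the floor

    A version along the lines described above would drop the clamp() calls and instead map the un-capped sum through a sigmoid, with the exact shape fit to data.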
     
    Irishman, Pauly and The Guy like this.
  16. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    I think it's time we stopped talking about QBR and instead talk about EPA or clutch-weighted EPA. QBR can be dismissed if the entire purpose of it was to apportion credit among players and the code that does that (which is a black box) actually does nothing lol.

    So the question should be how does clutch-weighted EPA compare to passer rating. EPA weights the importance of a play based on how it affects the probability of scoring points (most direct relation to win% for the offense) while passer rating doesn't. And as you point out EPA incorporates plays like running plays, sacks and penalties, which passer rating doesn't.

    The question however is: which is capturing "QB ability" more. Let me pose this question: if your goal is to measure the ability of a QB to complete passes in tight coverage, would you want to weight completion percentage by the probability of scoring points?

    I'd say no. Game condition, field position, and how many expected points you add shouldn't matter if you're interested in "ability". You should just look at completion percentage as a function of different levels of "coverage" (a measure of difficulty of the task).

    So I don't think weighting plays by the probability of scoring points is the best foundation for a better measure of QB ability. EPA is probably a good foundation for measuring how much the team relies on a QB to win, but that's a different question than QB ability.
     
    Irishman, Pauly and The Guy like this.
  17. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
    I see what you mean, and it's tricky, because what EPA does as well is attribute more "success" to a seven-yard pass for a first down on 3rd and 6 than to a nine-yard pass that doesn't result in a first down on 3rd and 10, even though the latter QB passed for two more yards than the former.

    So there is some degree of "difficulty level of the task" built into that kind of measurement, in that opposing defenses are naturally going to play to stop those sorts of conversions, and others like them (e.g., red zone offense and stopping touchdowns).
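    As a rough illustration of how that scoring works (the expected-point values below are invented purely for illustration; a real EP model estimates them from historical play-by-play data):

        # Illustrative sketch only -- the EP values are made up, not from a real model.
        # EPA for a play is expected points after the play minus expected points before.
        epa <- function(ep_before, ep_after) ep_after - ep_before

        # Hypothetical: 3rd-and-6, 7-yard completion converts to a new 1st down.
        epa(ep_before = 1.8, ep_after = 2.6)   # +0.8 -- drive stays alive

        # Hypothetical: 3rd-and-10, 9-yard completion falls short; offense likely punts.
        epa(ep_before = 1.5, ep_after = 0.4)   # -1.1 -- despite gaining more yards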
     
  18. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
  19. Pauly

    Pauly Season Ticket Holder

    3,696
    3,743
    113
    Nov 29, 2007
    One point to note about the fivethirtyeight article is that they say that early round picks are overvalued according to Jimmy Johnson’s draft chart and later round picks undervalued.

    The counterpoint to that is that you can only have 11 players on the field at one time (and at many positions you can only field 1 or 2 players at any one time). Having good depth, with 3 late-round picks who can perform at NFL-average level at a position, sounds nice in theory, but to create favorable matchups a team would be better off with 1 Pro Bowler, 1 average-level backup and one scrub. Because of the limited resources you can put on the field, a player who produces 10% more than average is worth more than a 10% premium in resources (salary or draft position) over an average player.
     
    Last edited: Feb 10, 2020
    The Guy and Irishman like this.
  20. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
    Now that we've discovered QBR is nothing more than clutch-weighted EPA (CWEPA), I thought it would be interesting to revisit the correlations between CWEPA and traditional passer rating (PR) on an individual game basis in 2019. The correlations recently discussed above involving CWEPA were on a season-long basis, with data from 2017 to 2019.

    So again these are for 100 individual NFL games from 2019, selected at random:

    CWEPA and PR (offense): 0.66
    CWEPA and PR (defense): 0.73
    CWEPA differential and PR differential: 0.60

    CWEPA (offense) and points scored: 0.56
    PR (offense) and points scored: 0.59

    CWEPA (defense) and points allowed: 0.63
    PR (defense) and points allowed: 0.71

    CWEPA differential and points differential: 0.58
    PR differential and points differential: 0.67

    So the first thing of note in my opinion is that CWEPA and passer rating are obviously measuring something different. CWEPA accounts for only 44% of the variance in passer rating offensively, and only 53% of the variance in passer rating defensively.

    I'd be interested to hear others' observations as well.
     
  21. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
  22. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
  23. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    Yeah so first of all that's a good research paper in the sense that they're using a reproducible approach for trying to define Wins Above Replacement in the NFL. And providing the R package nflscrapR for others to download play-by-play data for themselves is good.

    That research group by Horowitz is one of the few that actually takes statistical analysis in the NFL seriously.

    Having said that, the nflscrapR package was developed a long time ago and to this day they haven't uploaded it to CRAN (the Comprehensive R Archive Network), which is where you'd upload R packages if you want to do it professionally. CRAN doesn't just let you upload anything: it goes through a review process (just like publishing a paper), and you have to make sure everything is backwards compatible, works on all platforms (e.g., Windows, Mac, Linux), has no identifiable errors, etc.

    I published an R package on CRAN so I've been through that. They haven't yet with nflscrapR, and it shows, because when I tried installing it, it got hung up on another R package nflscrapR depends on, called rlang. In other words there's some dependency in there that prevents me from installing the whole thing. Regardless, the 2009-2018 play-by-play database I have comes from that group (that is, they compiled the database and put it on Kaggle for others to download).

    So that's a bit disappointing.

    As far as the methodology, it's transparent and uses traditional approaches such as hierarchical linear models (multilevel models) so that's good, but in no way do I think it solves the problem of division of credit. For example, there is no way for them to estimate interaction effects and there are obviously interaction effects among players.
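    For anyone curious what that kind of multilevel model looks like in practice, here's a bare-bones sketch in R with lme4. The column names and the tiny fabricated data frame are placeholders so the call runs; the actual paper's models have more terms and structure than this.

        library(lme4)

        # pbp stands in for a play-by-play data frame; in practice it would come
        # from nflscrapR. Here we fabricate a tiny example so the model fits.
        set.seed(1)
        pbp <- data.frame(
          epa       = rnorm(400),
          down      = sample(1:4, 400, replace = TRUE),
          ydstogo   = sample(1:10, 400, replace = TRUE),
          passer_id = sample(paste0("QB", 1:8), 400, replace = TRUE),
          defteam   = sample(paste0("DEF", 1:8), 400, replace = TRUE)
        )

        # Random intercepts for the passer and the opposing defense, fixed effects
        # for the situation (a much richer specification is used in the paper).
        fit <- lmer(epa ~ down + ydstogo + (1 | passer_id) + (1 | defteam), data = pbp)

        # The passer random effects are the model's estimate of each QB's
        # contribution to EPA after accounting for situation and defense.
        ranef(fit)$passer_id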

    But yes for a research paper it's good because it's the antithesis of proprietary approaches like ESPN's QBR (or clutch-weighted EPA) or FO's DVOA. So it's a good starting point, but not the solution.
     
    The Guy and Irishman like this.
  24. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
    OK thanks. So what do you think about the validity of the WAR construct as outlined and calculated in that paper, if we were to use it to compare players? Not so great, I take it.
     
  25. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    I think it's fine to quote it as an example of where statistics can currently take you (and also its limitations), but I wouldn't assign it any greater credibility than a stat like passer rating or so. It would be interesting for discussion purposes but it's not gospel.
     
    The Guy likes this.
  26. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
    Check this out:

     
  27. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    That was bound to happen: someone using machine learning on real-time data from football. The issue of course is that the same machine learning algorithm will likely give you very different changes in expected points or win probability for the exact same play after it is trained on more data. So those graphs don't remain the same (for the exact same play) over time.

    So on the face of it these things aren't reliable. However, if these guys can take the next step and show that machine learning makes more accurate predictions than any other method, then they have something.
     
    Surfs Up 99 and The Guy like this.
  28. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
    Here's the article for it:

    https://arxiv.org/pdf/1906.01760.pdf
     
  29. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
    Interesting from this weekend:

     
  30. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
  31. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    That's not a good article. The interpretations of some of those stats are too loose and unjustified.

    For example, in the "Why luck plays an increasing role for NFL quarterbacks" section, he's saying that because the variance in some stat (e.g., completion percentage) has decreased over time, luck therefore plays a more important role than before. That's bull****. If variance decreases, all that means is that a given difference in skill leads to a smaller measurable difference in the stat. It says NOTHING about random variation if you keep skill constant. Random variation with skill held constant could also have decreased, meaning skill plays an equally important role as before.
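    To see why that inference doesn't follow, here's a toy simulation (all numbers invented) where the spread of the stat shrinks over time while skill and luck keep exactly the same relative importance:

        # Toy simulation (numbers invented): total variance in a stat can shrink
        # without luck becoming any more important relative to skill.
        set.seed(1)
        n <- 32  # hypothetical number of starting QBs

        # "Old era": skill spread and random noise both larger.
        skill_old <- rnorm(n, mean = 60, sd = 4)
        stat_old  <- skill_old + rnorm(n, sd = 2)

        # "New era": skill spread and noise both cut in half -- same skill-to-luck ratio.
        skill_new <- rnorm(n, mean = 65, sd = 2)
        stat_new  <- skill_new + rnorm(n, sd = 1)

        var(stat_old); var(stat_new)   # variance clearly decreases...
        # ...yet luck's share of the variance is the same in both eras, so the
        # smaller variance by itself says nothing about luck's importance.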

    And with some of the play calling suggestions, like whether to go for it on 4th down more often, one should at least put in the qualifier that the stats are being gathered under conditions where coaches do NOT go for it that often. One can't just assume that the stats will remain the same if coaches change their play calling tendencies. Such stats are useful suggestions for changing play calling tendencies but no one should interpret them as predicting the outcome if play calling tendencies change.

    This is more akin to pop science than something to take seriously. In fact, taking it too seriously could lead to loss of credibility for statistical analysis because the interpretations aren't properly qualified.
     
    Irishman and The Guy like this.
  32. Pauly

    Pauly Season Ticket Holder

    3,696
    3,743
    113
    Nov 29, 2007
    I have a different criticism about the article than cbrad. A couple of times the writer intones “regression to the mean” as if it is a magic spell that will force things to happen.

    Regression to the mean simply means that after a series of lucky outcomes from random events, future outcomes are more likely to continue at the average rate than at the "lucky" rate. A simple analogy: if you flip a penny 10 times and it comes up heads 10 times, regression to the mean says it is more likely that the next 10 throws will be split 5 heads and 5 tails than that they will be another 10 heads.
    However, if you are in that penny-tossing situation and heads just came up the last 10 times, simply saying "regression to the mean" and predicting that the next 10 throws are most likely to be 5 heads and 5 tails is bad analysis. The proper way to do the analysis is:
    1) Check that you don't have a 2-headed penny (or that it is in any other way biased toward heads); a minimal version of this check is sketched after this list.
    2) Check that the person throwing the penny doesn't have a technique that influences the chance of getting heads.
    3) Only after eliminating possible biases that alter the baseline probability do you say "regression to the mean".
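    Here's roughly what that first check could look like in R, using a simple exact binomial test on the observed flips:

        # Step 1 sketch: before invoking "regression to the mean", test whether
        # 10 heads in 10 flips are even consistent with a fair coin.
        binom.test(x = 10, n = 10, p = 0.5)
        # Two-sided p-value is about 0.002, so a fair coin is already doubtful --
        # predicting a 50/50 split for the next 10 flips would be premature
        # without first ruling out bias.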

    A football-related example was just posted by cbrad in the Ryan Tannehill thread (2nd graph in post 10,139 if anyone needs the specific reference). I have read many times over the last few years at analytics sites that a team's record in 0-7 point games should be 50%, that long-term success in close games cannot be maintained, and that if a team has a good record in one year, regression to the mean says the success is unlikely to continue the next year. However, cbrad's research shows the opposite: good teams are more likely to win 0-7 point games and bad teams are more likely to lose them. What does change is a team's relative success from year to year, because of personnel/coaching changes. There are a few franchises with outstandingly good or bad records where you can see this effect over longer timeframes (an example of a good team with a sustained positive record in 0-7 point games is New England; an example of a poor team with a sustained negative record is Cleveland).
     
    The Guy likes this.
  33. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
    cbrad likes this.
  34. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    Yeah I saw that. You have to use a web scraper. There are enough tools people have developed for that (better than the one I wrote so I switched lol) and some get it directly from NFL.com:
    https://github.com/maksimhorowitz/nflscrapR

    Thing is, you have to learn how to program in R to use that (there's one for python too). It's worth it though, and I don't mean for gathering football stats! (worth it in general to learn programming)
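    For reference, a minimal install-and-scrape sketch (since it's not on CRAN you install straight from GitHub, which is the install path that hit the rlang dependency issue for me). The scraping call below is my reading of the package's interface, so treat it as an assumption and check the repo's README:

        # Install nflscrapR from GitHub (it isn't on CRAN), then pull play-by-play data.
        # NOTE: scrape_season_play_by_play() is my understanding of the package's
        # interface -- verify against the repo's README before relying on it.
        install.packages("devtools")
        devtools::install_github("maksimhorowitz/nflscrapR")

        library(nflscrapR)
        pbp_2019 <- scrape_season_play_by_play(season = 2019, type = "reg")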
     
    The Guy and Irishman like this.
  35. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
     
  36. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    Irishman and The Guy like this.
  37. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018

     
  38. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
     
  39. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
  40. The Guy

    The Guy Well-Known Member

    6,598
    3,323
    113
    Oct 1, 2018
