
Understanding the limitations of stats in the NFL.

Discussion in 'Miami Dolphins Forum' started by Pauly, Feb 18, 2016.

  1. Pauly

    Pauly Season Ticket Holder

    3,697
    3,745
    113
    Nov 29, 2007
    Warning: Longish post.

    There have been a lot of posts recently which have involved a lot of nerd fights about statistical analysis of the NFL. I fully endorse the use of stats and really appreciate how much more statistical analysis is available now compared to the past. But the NFL has major limitations on how useful stats can be, especially compared with sports like MLB.

    Limitation 1. In football, time and field position are assets. So a 1 yard rush on 3rd and 5 is considered a failure in advanced stats, yet for a team that really wants to burn another 40 seconds off the clock, the primary aim of the play might be to finish it with no fumble and no clock stoppage, so the coach would call it a successful play. If the team is in field goal range, another goal might be to end the play in the middle of the field to allow an easier FG attempt. Since stats measure yards and downs, not field position or time, what is successful in-game might be measured as a failure by statisticians.

    Limitation 2. Almost nothing in football is what a statistician would call an independent variable. An independent variable is the X in “if I change X, how does Y change?” A simple example in baseball is batting average against right handed and left handed pitchers. The pitcher’s handedness is the independent variable and you can measure the difference. In football everything is dependent on other factors, so you end up with chicken and egg arguments that are unprovable. For example people try to prove a WR’s performance was helped/hindered by the QB. The QB’s stats are affected by the WR’s abilities, the WR’s stats are affected by the QB’s abilities, and both of their stats are affected by the O-line, the quality of the opposing defense, ad nauseam.

    Limitation 3. Game situation is king. Teams playing behind are much more likely to try high risk/high reward plays. Teams playing ahead become more risk averse. This means that teams that are behind often end up asking their QB to throw more high risk passes, and ask their RBs to extend plays, increasing the risk of fumbles. So teams that are behind are more likely to commit turnovers than teams that are ahead on the scoreboard. Teams that are behind also often blitz more to try to force a mistake by the offence, yet if the blitz doesn’t get home fast enough it gives the opposing QB opportunities for a big play.

    Yet it’s an ingrained belief that the turnovers cause a team to lose, when it can be that teams commit turnovers because they are losing. People might say 20 plus carries for a RB caused a team to win, but it can be the RB got his 20 plus carries because the team was winning.

    Game situation dictates what type of passes are thrown, what type of defenses are used, run/pass ratios. So stats are never equal because teams are never in equal positions.

    Limitation 4. Stats are compiled without knowing what play was called or what the player’s actual assignment was. So players might get a neutral grade for doing nothing much yet their coach might give them a critical failure for missing their assignment on the play. Other times a player might have a critical success that allows a play to succeed, for example the selling of a route by a WR that drags a DB out of position, yet it gets recorded as a nothing by the stat keepers.

    Limitation 5. Playcalling. For example, take a relatively slow WR with great hands and good agility. If he is asked to run lots of curls and comebacks he’ll have a crappy YAC, yet if the same WR is asked to run a lot of bubble screens he’ll have a good YAC.

    Limitation 6. Advanced stats are based on someone else’s perception of reality. The people who compile them do so based on what they think is important. The simple fact that DVOA, PFF and other so-called advanced sites sometimes disagree strongly is proof that they are imperfect.

    So while stats are valuable, and they have improved by several orders of magnitude in the last 10 years, it is very dangerous to rely solely on stats to evaluate a team or player. In the NFL everything is dependent on everything else.
     
  2. Fin-O

    Fin-O Initiated Club Member

    11,377
    11,394
    113
    Sep 28, 2015
    Nice


     
  3. CaribPhin

    CaribPhin Guest

    I agree. A lot of uncontrollable variables involved in football. The good thing is that since that's the case for everyone, comparative statistics don't really suffer much.

    That's why you have statistics like Success Rate:

    Using a statistic that isn't applicable really isn't the fault of statistics. It's the fault of the person checking the stats.

    I think you're a bit confused here. You can have more than one explanatory (independent) variable. A simple demand equation can be written as:

    Here 'Q' (quantity demanded) is the dependent variable, and 'P', 'I', 'A', and 'Z' are all independent variables. Once you have sufficient data and have generated such an equation, you can easily isolate the individual variables and determine their relative effects on the dependent variable. There are extra variables that can affect pitcher performance other than handedness, including avg. pitch speed that day, wind speed, temperature, stadium size, and team defense, but you can isolate those things out. It's much easier to measure and isolate in baseball, especially given the sample sizes available, but to say there's only a simple handedness relationship between pitcher and batter performance is an oversimplification.
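    A rough sketch of what "isolating the individual variables" can look like in practice: ordinary least squares fitted on several independent variables at once. The linear form, the meanings given to P, I, A and Z, and every number below are assumptions made up for illustration; none of it is real demand data.

```python
# Multiple regression sketch: recover each variable's effect on Q while
# the other variables are accounted for. All data is synthetic (assumption).

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical "true" effects: intercept, P (price), I (income), A (ads), Z (other).
TRUE = [50.0, -2.0, 0.5, 1.5, 3.0]

# Synthetic observations of (P, I, A, Z), chosen so the columns are not collinear.
rows = [
    (10, 40, 5, 1), (12, 42, 7, 0), (8, 45, 6, 2), (15, 38, 9, 1),
    (11, 50, 4, 3), (9, 47, 8, 0), (14, 41, 3, 2), (13, 44, 6, 1),
]
q = [TRUE[0] + sum(b * v for b, v in zip(TRUE[1:], r)) for r in rows]

coefs = ols(rows, q)
for name, c in zip(["intercept", "P", "I", "A", "Z"], coefs):
    print(f"{name}: {c:.3f}")
```

    Because the synthetic Q is generated without noise, the fit recovers the assumed effects almost exactly. With real, noisy data the coefficients come with uncertainty, and strongly correlated inputs make them harder to separate.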

    Here is the 'correlation vs. causation' argument that is well known by now. This stuff is well understood and implicitly included when comparing across players and teams. Statistics like Total QBR (from ESPN) attempt to adjust for situation, and PFF's grades do as well. TQBR is difficult to trust because the methodologies behind ESPN's proprietary statistics are always kept secret, but there are other stats out there. DVOA and DYAR are defense adjusted measures of performance and therefore take into account the likelihood of success against the opponent a player is facing. If you were likely to throw those INTs or only gain 4 yards against Denver, then the statistics don't hurt as much as laying an egg against the Giants.

    Stats like win probability help here as they track the likelihood of winning at a given point of the game. A first quarter INT up 5 is not nearly as harmful as a fourth quarter INT down 7. Win probability factors in play results relative to historical probability of success after that given result at that given point in the game. Add in stats and grades adjusted for defense and situation, then you have a better picture of what's going on.


    PFF grades, while subjective, do attempt to understand the role a player has in a given play and adjust the grade based on his success at performing that role. So do TQBR, success rate, and others. This is an issue of stat applicability. If you're judging a blocking TE based on yards receiving, that's your issue, not an issue of statistics. Things like blocking and decoy effectiveness are hard to quantify but rarely will you ever see someone bashed for lacking yards when it's well known that they're not supposed to get any. Statistics won't show how well Jarvis Landry blocked on a screen pass but I don't see what the issue is. Decoys don't get thrown to so no one really bashes a WR on a play he wasn't thrown to unless he clearly ran a poor route or gave little effort.

    Consistency in play-calling helps here. Some receivers are built from certain molds. Mike Wallace is a burner who just catches a pass. Jarvis Landry is a short route YAC guy. Based on the consistency of play-calling, you know which stats to apply to which players. Wallace's measure of success is going to be YPC or yards per route run, the latter of which is similar to YPA for QBs. If a guy has a low number of catches but those numbers are high, you know he's doing his job as a deep threat. Landry on the other hand is going to be more about receptions and YAC, as you know consistency in play-calling is going to ensure that those are high for him. If they aren't, then either the QB isn't getting the ball to him enough/in the right spots, or he isn't doing his job. Knowing who the QB is (intervening variable) helps with understanding whose fault it is.

    Not even; advanced stats are empirically based and tend to use historical data to justify their usefulness. DVOA, for example, takes into account down and distance, opposing defense, compares based on similar situations, historical data, and is normalized. As I explained above, PFF grades are subjective play by play grades and are not comparable to DVOA unless you're trying to make a sketchy point. Predictive ability is very important in statistics and DVOA and other stats have great predictive ability. Football Outsiders does an AMAZING job of explaining their numbers and how they work.

    The reality is that comparative statistics with the same faults for every player are still very useful. While, for example, Matt Ryan may be held back by offensive play-calling, that does not change the fact that his play is less than elite. There is no correcting factor so it's up to us to be judicious with our criticism. Advanced stats do have a great benefit in that they sometimes do adjust for offensive play-calling and surrounding talent. For example, Lamar Miller was an advanced stat beast in 2014 even though his usage downplayed his talent.
     
  4. tirty8

    tirty8 Well-Known Member

    1,333
    1,389
    113
    Jan 2, 2016
    Well done! Loved the post.
     
  5. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    Some of this is valid, some not. In principle, stats work (to different degrees) anytime you can measure something. So when you say "there is a major limitation on how useful stats can be", you are really talking not just about stats that are currently used in the NFL but about the applicability of statistical analysis in general.

    The only valid arguments are limitations 4 and 5, precisely because we aren't "measuring" play call or play design.

    The others are in principle wrong. You can measure time and field position, as well as whether a team is ahead or behind, so in principle you can condition on those, eliminating limitation 1 and limitation 3. Limitation 2 isn't true if you use statistical techniques that look for structure in data, like factor analysis or principal components analysis. The statistical tools won't tell you how to interpret the data, but they can tell you which variables are most important and can help create models that can be tested so you don't have "chicken and egg" issues.
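    As a sketch of what "conditioning on" game state means: split the plays by situation first, then compute the stat within each split, so leading and trailing teams are never pooled together. The play log below is invented purely for illustration; the state labels and all the counts are assumptions, not real NFL data.

```python
from collections import defaultdict

# Hypothetical play log: (score_state, turnover_flag). Made-up numbers.
plays = (
    [("trailing", 1)] * 6 + [("trailing", 0)] * 94
    + [("tied", 1)] * 3 + [("tied", 0)] * 97
    + [("leading", 1)] * 2 + [("leading", 0)] * 98
)


def rate_by_state(plays):
    """Turnover rate computed separately within each score state."""
    totals = defaultdict(lambda: [0, 0])  # state -> [turnovers, plays]
    for state, turnover in plays:
        totals[state][0] += turnover
        totals[state][1] += 1
    return {state: t / n for state, (t, n) in totals.items()}


# Pooled rate ignores game state; the conditional rates do not.
pooled = sum(t for _, t in plays) / len(plays)
by_state = rate_by_state(plays)

print(f"pooled turnover rate: {pooled:.3f}")
for state, rate in by_state.items():
    print(f"{state:>8}: {rate:.3f}")
```

    With real play-by-play data the same grouping could be done on score margin, time remaining and field position together, at the cost of fewer plays in each bucket.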

    Finally, limitation 6 isn't true if it's just simple descriptive or inferential stats, which are just logical implications of the data. Those don't depend on different people's perceptions of what's important. DVOA is NOT a stat despite what some people here like to claim. It's a model based on stats with subjective weights implicitly assigned to things through that model. Descriptive or inferential stats are logical implications of the data, not something that is based on data but includes assumptions in addition to it.
     
  6. Stringer Bell

    Stringer Bell Post Hard, Post Often Club Member

    44,356
    22,480
    113
    Mar 22, 2008
    The NFL does not inherently have any limitations. Everything can be measured and quantified. While people have yet to measure and quantify many things associated with the game of football, it certainly is feasible. Simply requires enough intelligence and initiative.
     
  7. Stringer Bell

    Stringer Bell Post Hard, Post Often Club Member

    44,356
    22,480
    113
    Mar 22, 2008
    Agreed on all of this, but I think you can even take it a step further and quantify #4 and #5. Quantifying coaching is difficult, but there is enough data available to do it.
     
    cbrad likes this.
  8. djphinfan

    djphinfan Season Ticket Holder Club Member

    111,966
    67,940
    113
    Dec 20, 2007
    Lotta smart folks gonna be postin in here...lol...jmo..I think PFF has the most accurate data..I would trust them over naked data
     
  9. Fin-O

    Fin-O Initiated Club Member

    11,377
    11,394
    113
    Sep 28, 2015
    http://espn.go.com/blog/oakland-rai...reading-in-dan-marino-territory?ex_cid=espnfb

    I like Carr, he is a very promising young QB. But this is a perfect example of "stats" without facts.

    Despite some of our more often wrong contributors, these modern day "milestones" aren't all that spectacular. 4k yards is the new 3k yards, TD passes are as common as rushing TD's.

    Times have changed folks, you can't judge a QB from today with a QB from the early 90's without considering WHY.

    A good example? John Elway had 1 4k yard passing season in 16 seasons, Ryan Tannehill has 4 in 4 years. So all the talk of records falling is due to rule changes and volume...certainly not talent.
     
  10. Pauly

    Pauly Season Ticket Holder

    3,697
    3,745
    113
    Nov 29, 2007
    My experience has been that the more you measure and the more you quantify, the less reliable your projections become.

    For example a generic projection of 'home team wins by 3' over the course of a season beats every sophisticated model I've seen. Another example is that passive index funds outperform over 90% of actively managed funds. The climate models by the IPCC, probably the most sophisticated and detailed models in human history, have all performed worse than the null hypothesis.

    I know that stat heads, and I know I've been one, believe that the answer to the problem is more data.

    The most reliable and most accurate models in many of the fields I've looked at are the ones with the fewest moving parts. Finding out what the critical moving parts are is much more important than more detailed data.
     
  11. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    Actually "stats without facts" is probably a more useful approach to making your argument. Even without knowing why things occur, knowing how to adjust stats from one year so that you can compare to other years is arguably more valuable than just knowing when rule changes occurred without knowing how to compare across years or eras.

    For example, you can do some simple fitting to show that you get a decent trend-line from around 1980-2005 where there's a "normal" (in the statistical sense) distribution around a line that goes from 3200 passing yards per year in 1980 to 3300 in 2005, giving you an average of 4 yards increase in passing per year over that time. Then from 2006-2015 you go from 3300 to 3900 with a very stable slope of 60 yards passing increase per year.

    So you want to compare stats from 2015 to say Marino's time in 1983? Using those crude trends, you are looking at 3900 vs. 3312, which is a 17.75% increase from 1983, or calculated from above, it's a 15.08% decrease. So take Marino's total passing stats and multiply them by 1.1775, or take 2015 stats and multiply them by 0.8492 and you can compare.
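    The arithmetic above, spelled out as a small sketch. The 3312 and 3900 league baselines are the crude trend values quoted in this post, taken as given; the function name is made up for illustration, and Marino's 1984 line (5,084 yards, 48 TDs) is used only as an example input.

```python
def era_factor(baseline_from, baseline_to):
    """Multiplier that rescales a stat from one era's baseline to another's."""
    return baseline_to / baseline_from


# League passing-yard baselines quoted in the post (crude trend values).
BASE_1983 = 3312
BASE_2015 = 3900

up = era_factor(BASE_1983, BASE_2015)    # 1983-era stats -> 2015 terms
down = era_factor(BASE_2015, BASE_1983)  # 2015 stats -> 1983-era terms

print(f"1983 -> 2015 multiplier: {up:.4f}")    # 1.1775
print(f"2015 -> 1983 multiplier: {down:.4f}")  # 0.8492

# Example input (assumption): Marino's 1984 season, 5,084 yards and 48 TDs.
print(f"adjusted yards: {5084 * up:.0f}")
print(f"adjusted TDs:   {48 * up:.1f}")
```

    The two multipliers are reciprocals of each other, so it doesn't matter which era you convert to as long as you convert everything to the same one.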

    Crude yeah, but stats give you the ability to make those types of comparisons while eyeball arguments do not.

    Oh, and the stats also show which rule changes actually had an effect while without stats you'd have no way of quantifying the effect of any rule change. Clearly the one in 1978 that allowed the OL to grab DL with extended arms, as well as the 10-yard chuck rule reduced to 5, had a tremendous effect, helping Marino shatter records by far more than he otherwise would. What explains the very steady rise in total passing yards from 2006 or so is harder to pin down because it's so steady year after year.
     
  12. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    You bring up a good point, but you have to qualify it properly.

    It has nothing to do with total number of measurements (or parameters in the model). It has to do with total amount of variance introduced by each extra parameter relative to the increase in predictive power. There are many cases where the extra variance introduced is small enough that you get an overall increase in predictive power with each extra parameter, up to some point after which the opposite starts to occur (what you're talking about).

    Where that tipping point occurs is different for each type of model. I've worked on models of how humans process visual input that include the physical optics of the eye, the different spatial resolution of the output neurons of the retina (retinal ganglion cells), parameters that estimate the shape of their receptive field (how each neuron responds to different stimuli at different points in the visual field), etc.. and all that is based on estimates from known human or monkey (macaques) anatomy and physiology as well as psychometric functions that describe human detection performance of certain localized spatial patterns at different points in the visual field. Those models get more and more accurate even up to 15-20 parameters, and most of those parameters have a physical or biological basis. Then, you start adding extra stuff to account for contextual effects due to stuff that happens in the next stage of processing in visual processing areas of the brain (so not the retina) and then you start to slowly see what you're talking about (usually predictive power just doesn't change much.. it levels off instead of getting much worse).

    All depends on type of model and how much it's anchored to phenomena you can directly test. Those climate models have too many things the computer has to estimate that can't be tested directly so the variance does at some point increase a lot with each added parameter.
     
  13. Finster

    Finster Finsterious Finologist

    3,087
    2,038
    113
    Jul 27, 2013
    Well that gives Dan 6000yds and 56 TDs, which I would say is about accurate.

    The key year for the passing rules was 2004: that's the year the 5 yd chuck rule actually started being enforced, and that's why numbers have been going up since 2004. It's true, as you said, that the rule was put in in 78, but it was never enforced, so Marino didn't actually benefit from it.

    "Illegal contact" penalties were the highest ever in 04, and Peyton broke Dan's TD record that year with 49, and then in 07 Brady broke it again, 50 this time. Now we see the dramatic effects: 15 of the top 20 TD seasons are from 2004 or beyond, as are 16 of the top 20 in passing yds, and it's only been 16 years.
     
  14. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    Well, statistically speaking something happened for real from '79. In the 1960's total passing yards varied around an average of 2900. Then something happened that I don't know about which pulled it down in the 1970's to 2500. From 1979 onwards, it's been an average of 3200 or so. There's no question that rule changed a ton, so while there may be some difference in enforcement level more recently, I'd maintain there's very good statistical evidence the 5 yard rule was having a huge effect on passing yards from the 1980's. Seen from this point of view, Marino's stats should be about 20% less than they were if he played in the 1970's, just statistically speaking of course.

    Anyway, what makes the 2005-present increase problematic to understand is that it's a very predictable and steady increase per year of 60 yards passing over a season, so it's not a single effect after a single rule change. It's happening each and every year. Not sure I can explain that, even accounting for the arguable increase in talent at the QB position.
     
  15. Finster

    Finster Finsterious Finologist

    3,087
    2,038
    113
    Jul 27, 2013
    The way they enforced it prior to 2004 was "1st contact ability": 5 yds from when the DB and WR had "1st CA". E.g., if the CB lines up with a 7 yd cushion and backpedals 3 yards before "1st CA", he has until 15 yds to make his chuck. You can cut the field in half with a 10 yd cushion under those rules, and if you look at old game tape you can see instances of DBs delaying "1st CA" until 20-25 yds downfield. Getting chucked 20 yds into a route pretty much kills the route, lol.

    So, I think this is what helped the passing era to start, but I also think it coincided with some QB/WR talent that was coming into the league, like Tarkenton, Staubach, Griese, Bradshaw, and Fouts, but from then until 04 the passing game grew slowly, since 04 it's changed drastically.

    I think it takes a couple years for everyone to get adjusted, and also they've been trickling in different QB/WR protection rules since just prior to 04, and more since, which is why the stats really start going up post 06.

    1983 = 2 QBs over 4000 yds, 10 over 3000, 25 over 2000 __ 1 QB over 30 TDs, 10 over 20, 18 over 15.

    2003 = 2 QBs over 4000 yds, 15 over 3000, 27 over 2000 __ 1 QB over 30 TDs, 11 over 20, 18 over 15.

    ΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩ

    2004 = 5 QBs over 4000 yds, 16 over 3000, 28 over 2000 __ 1 QB over 40 TDs, 4 over 30, 15 over 20, 23 over 15.

    2015 = 12 QBs over 4000 yds, 23 over 3000, 30 over 2000 __ 11 QBs over 30 TDs, 21 over 20, 25 over 15.
     
    cbrad likes this.
  16. Stringer Bell

    Stringer Bell Post Hard, Post Often Club Member

    44,356
    22,480
    113
    Mar 22, 2008
    Vegas closing lines beat it. The power of free markets ultimately trumps everything, because it encompasses all available data.

    This is exactly my point - the more data you have available, the better you can find predictive data.

    One applicable example - for a number of years, the number of 'corner' 3-pt shots attempted was a very high predictor of a team's performance in the NBA. A model using just that measure was able to beat Vegas closing lines. The challenge was actually obtaining that data - many people were manually charting every game by hand to obtain this data.

    The more people measure, the more opportunity people have to identify quality predictive data.
     
    Pauly likes this.
  17. Fin D

    Fin D Sigh

    72,252
    43,684
    113
    Nov 27, 2007
    I feel like people are saying the OP is wrong because in a perfect world with unlimited info stats can tell us anything, while the OP is basically saying we don't live in a perfect world and the stats we have available aren't telling us what we need to know.
     
  18. cbrad

    cbrad .

    10,659
    12,657
    113
    Dec 21, 2014
    That's why I started off my post #5 the way I did, pointing out the language Pauly used doesn't make it clear he's only restricting his arguments to stats used today. You read his post and it can be taken to be a limitation of statistical analysis in general.

    But yes, he's more on target when you only restrict it to stats currently used in the NFL.
     
    Pauly and Finster like this.
  19. Pauly

    Pauly Season Ticket Holder

    3,697
    3,745
    113
    Nov 29, 2007
    Whilst I do think that stats will continue to improve, I am skeptical that NFL stats will ever become as value adding as they are in say MLB.

    Primarily because the interactions are more interdependent in nature than independent.

    If we take a stat like yards/attempt, which is a very solid indicator, we really can't pull it apart into separate components like:
    QB ability
    receiver ability
    play design
    play calling
    game situation
    defensive pressure
    defensive cover
    defensive play design
    defensive play calling

    and assign a discrete value to each one. There's always going to be a degree of subjective weighting.
    While I believe stats will continue to improve, I don't think the nature of the NFL will allow pure statistical analysis to replace eyeball evaluations. BTW I think anyone who relies on the eyeball test over stats will be wrong more often than not.

    The future of NFL analysis is a better mix of eyeball and stats.
     
    djphinfan likes this.
  20. Pauly

    Pauly Season Ticket Holder

    3,697
    3,745
    113
    Nov 29, 2007
    As I understand it Vegas closing lines aren't built on mathematical modeling.

    With the 3 point data, that's based on an observable and testable phenomenon. Whilst that sort of data wasn't being measured in the NFL 5 or 10 years ago, all the major data points are being measured now, which has led to much better statistical analysis, but no silver bullets as far as I know.
     
  21. Stringer Bell

    Stringer Bell Post Hard, Post Often Club Member

    44,356
    22,480
    113
    Mar 22, 2008
    Sure they are. Closing lines are built upon all available data that is utilized by those participating in the market.

    Great example of new data being made available and resulting in an increased ability to predict results:

    NFL teams prepping for RFID data dump
     
