(NOTE: This post was written in early December 2012, but only recently edited to be publishable. Most of the current-season NFL stats herein are therefore pretty nonsensical now. Apologies in particular to Alex Smith, who I feel kind of bad for these days.)
I don't really like watching most sports. Baseball, especially in a post-Moneyball world, holds all the excitement of watching a weighted random number generator update every few seconds for me. Basketball at least moves quickly, but it's hard to get excited about a game where only the last few minutes seem to matter very much. I don't have a snarky reason for not liking hockey (in principle it's fast, violent, and scoring is infrequent enough to matter) but I never quite got into it, despite basically learning to curse from watching my parents watch Flyers-Islanders games as a kid.
Football (note to non-American readers, if any: I'm not talking about that thing you play with the round ball and awesomely faked injuries. Never got into that either, sorry) is a different beast. It's violent as hell, deceptively strategic, and to a large extent totally unpredictable. Anyone attempting to apply Moneyball-style "Sabermetrics" to the NFL runs into basically the same problem I did when I was trying to optimize my fantasy football team using the powers of math: there are so many variables, most of them coupled together, as well as totally random factors like injuries (which are much, much more frequent than in other sports), that we haven't yet built a supercomputer that can do anything useful with NFL stats analysis. And quite honestly that's kind of awesome.
One downside of the fact that the NFL is completely impervious to rigorous statistical analysis is that it still uses a lot of the weird, probably anachronistic metrics that baseball has largely phased out. For example, a key stat for receivers is number of catches, which is fairly meaningless without knowing how many times they were thrown catchable passes, which is going to depend in large part on the quarterback, who is relying on the offensive line to give him time to accurately throw the ball, etc etc. Like I said, it's really hard to pull useful individual numbers out of a sport where everything is so interconnected, so we mostly just don't bother and count up what we can, whether it makes much sense or not, and hope averaging over a long enough period (usually the whole season) will get rid of most of the noise. Which brings us to the weirdest artificial stat of all: the QB passer rating.
The QB passer rating was apparently designed in 1973 by Don Smith of the Pro Football Hall of Fame, who was trying to find a useful way to quantify QB performance holistically, since looking at a single statistic (completion ratio or points scored, for example) is a highly misleading way to evaluate all the things that make a good quarterback. It's supposed to be an aggregate measure of quarterback performance, distilling completions, passing yards, touchdowns, and interceptions into one convenient number that even ESPN reporters can understand. If the number is higher, the quarterback is better. It usually falls between about 50 and 150 and seems to correlate reasonably well with observed quarterback performance (Peyton Manning, for example, has a season rating of 108 as of this writing, while perennial object of ridicule in my household Jay Cutler is rocking about an 81), but sometimes seems to be way off (or maybe Alex Smith really is the third best quarterback in the NFL right now? Who knows). What you never see or hear about, probably because most sportscasters have no idea, is how this magic number is actually calculated.
I crammed the entire NFL passer rating calculation into one equation to give you an idea of just how complicated it is:
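Written out as I understand the standard published version (comp, att, yards, TD, and INT are a quarterback's completions, pass attempts, passing yards, touchdown passes, and interceptions), it looks something like this:

```latex
\text{rating} =
\frac{
  5\left(\dfrac{\text{comp}}{\text{att}} - 0.3\right)
  + 0.25\left(\dfrac{\text{yards}}{\text{att}} - 3\right)
  + 20\,\dfrac{\text{TD}}{\text{att}}
  + \left(2.375 - 25\,\dfrac{\text{INT}}{\text{att}}\right)
}{6} \times 100
```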
All four of those terms in the numerator are bounded, meaning they get replaced with zero if they go negative and aren't allowed to be higher than 2.375, just to add confusion.
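If you'd rather see the bounding spelled out than take my word for it, here's a minimal Python sketch of the calculation. The function and argument names are my own, and the weights are the standard published ones as I understand them, so treat it as an illustration and not an official implementation.

```python
def clamp(value, low=0.0, high=2.375):
    """Bound one term of the rating to the range [low, high]."""
    return max(low, min(high, value))


def passer_rating(comp, att, yards, td, ints):
    """Passer rating from raw passing totals (att must be > 0)."""
    a = clamp((comp / att - 0.3) * 5)      # completion percentage term
    b = clamp((yards / att - 3) * 0.25)    # yards per attempt term
    c = clamp((td / att) * 20)             # touchdown frequency term
    d = clamp(2.375 - (ints / att) * 25)   # interception frequency term
    return (a + b + c + d) / 6 * 100


# A made-up stat line: 20 of 30 for 250 yards, 2 TDs, 1 interception.
print(round(passer_rating(20, 30, 250, 2, 1), 1))  # → 100.7
```

The clamp is what produces the hard floor of zero and the hard ceiling: a term that would go negative becomes 0, and a term that would exceed 2.375 stays at 2.375.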
Ignoring all the arbitrary weighting, the four relevant metrics here are completion percentage (comp/att), average yardage (yards/att), touchdown frequency (TD/att), and interception frequency (INT/att). The first two metrics (completion percentage and average yardage) are pretty hard to argue with. You obviously want a quarterback to complete most of his passes, and ideally you'd like those passes to be for as much yardage as possible. Similarly, interception percentage is pretty important, since throwing lots of interceptions (hi Michael Vick!) is somewhat less than ideal. TD frequency is the most problematic of the bunch. Offenses with good running options are going to drive down the field a lot, get close to the goal line, and then run the ball into the end zone, meaning the QB doesn't get a touchdown credit no matter what he did on the drive (even if the QB is the one who runs the ball in, oddly enough). So TD frequency is going to be biased pretty heavily towards QBs leading pass-centric offenses, although good quarterbacks are still going to generally throw more touchdown passes than bad ones.
Sum it all up and you get a number between zero and 158.3, which is a perfect passer rating. Interestingly, while you'd think that would mean perfect passing (all completions, no interceptions, lots of yards and touchdowns) it's actually almost real-world attainable, thanks to the way each stat is weighted; a perfect rating corresponds to a lower bound of 77.5% completion percentage, 12.5 yards per attempt, and touchdowns on 11.875% of passes (and zero interceptions, obviously). That's by design, since "perfect" is completely unachievable in some categories (100% of your passes are for TDs?) and just nonsensical in others (an average passing yardage of...INFINITY?); it was set up so the numbers would be low enough to be theoretically achievable, but still high enough that it's unlikely that a quarterback could hit the ceiling on any of them, let alone all four at once. Unsurprisingly, that's been proven wrong; there are lots of QBs who have exceeded the "perfect" threshold on one or more of these stats over one game, and perfect single-game ratings have been achieved 41 times at last count, most recently by Robert Griffin III against my own perennially useless Philadelphia Eagles in November of 2012.
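For the curious, those thresholds fall straight out of the 2.375 cap. Assuming the standard published weights (subtract 0.3 from completion percentage and multiply by 5, subtract 3 from yards per attempt and multiply by 0.25, multiply touchdown frequency by 20), setting each term equal to its ceiling gives:

```latex
\begin{aligned}
5\left(\tfrac{\text{comp}}{\text{att}} - 0.3\right) = 2.375
  &\;\Rightarrow\; \tfrac{\text{comp}}{\text{att}} = 0.775 \\
0.25\left(\tfrac{\text{yards}}{\text{att}} - 3\right) = 2.375
  &\;\Rightarrow\; \tfrac{\text{yards}}{\text{att}} = 12.5 \\
20\,\tfrac{\text{TD}}{\text{att}} = 2.375
  &\;\Rightarrow\; \tfrac{\text{TD}}{\text{att}} = 0.11875 \\
\text{rating}_{\max} = \frac{4 \times 2.375}{6} \times 100
  &= 158.\overline{3}
\end{aligned}
```

The fourth 2.375 in that last line is the interception term, which sits at its ceiling only when there are no interceptions at all.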
There are some pretty obvious problems with the passer rating formulation, as you'd probably guess. Conspicuously left out of the numbers are sacks, fumbles, and any kind of rushing contribution, meaning "running" QBs like Cam Newton and RGIII are always going to get undervalued in the passer rating (it should be noted at this point, to be fair, that it is called a "passer rating" and not a "quarterback rating," so only taking passing-related stuff into account is technically kosher). Similarly, getting sacked a lot or fumbling the ball won't hurt your rating (since there was no pass attempt), even though a QB that's constantly getting sacked and/or fumbling the ball (hi Michael Vick!) is not particularly useful. From a more numerical standpoint, making it possible to "max out" the passer-rating contribution of any of the four statistics it measures is problematic. Take, for example, two QBs with identical stats, except that QB #1 averages 15 yards per attempt and QB #2 averages 12.5. Since the yardage term maxes out at 12.5 yards per attempt, they'd both have the same passer rating; the extra 2.5 yards per attempt that QB #1 has effectively don't count. Luckily it's next to impossible to string out numbers like that over a whole season; while single-game perfect passer ratings have been achieved fairly often, the highest season-length passer rating ever was 122.5 (Aaron Rodgers 2011, if you were wondering). The last (and presumably only) time anyone has maxed out even a single term in the passer rating equation over a full season was in 1943, when Sid Luckman threw touchdowns on 13.9% of his pass attempts. So taken over a whole season, the "ceiling" values for all four terms are apparently high enough to not matter, even though they can underrate really good single-game performances. Why the "perfect" number was left at 158.3, instead of just being normalized to 100, is completely beyond me; it doesn't really matter, it's just weird is all.
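To make the two-QB comparison concrete, here's a small sketch of just the yardage term, using the standard weights as I understand them. The function name and the stat lines are made up for illustration.

```python
def yards_term(yards, att, cap=2.375):
    """Yards-per-attempt term of the passer rating, bounded to [0, cap]."""
    return max(0.0, min(cap, (yards / att - 3) * 0.25))


# Two hypothetical QBs, each with 30 attempts.
qb1 = yards_term(450, 30)   # 15.0 yards per attempt
qb2 = yards_term(375, 30)   # 12.5 yards per attempt
print(qb1, qb2)             # both terms hit the 2.375 cap
print(qb1 == qb2)           # → True
```

Uncapped, QB #1's term would be 3.0 instead of 2.375, which works out to roughly ten extra rating points that the formula throws away.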
So the passer rating is kind of an imperfect statistic. In the only way it matters at the end of the day (correlating to whether or not QBs can win football games) though, it's pretty good; in 2010, 80% of the NFL games played were won by the team with the higher-rated quarterback. There have been some attempts to tweak the formula, most famously with ESPN's largely-ignored Total Quarterback Rating, an attempt to take all the situational factors that could possibly affect a QB's stats into account. It was insanely complex, drew on lots of weird hard-to-access stats like pass travel distance and something called a "clutch factor", was proprietary (ESPN never revealed exactly how it was calculated), and was not appreciably better than passer rating at predicting wins. In principle it wouldn't be too hard to just tweak the passer rating formula to, say, count rushes as pass attempts, rushing yards as pass yards and rushing touchdowns as touchdown passes (that's not perfect but it's a starting point), but to my knowledge nobody's done it, or if they have it hasn't caught on.
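Here's a rough sketch of what that tweak might look like. Everything about it is hypothetical: the function names are mine, the stat line is invented, and I'm assuming each rush also counts as a completion, since otherwise running the ball would drag the completion percentage down.

```python
def clamp(value, low=0.0, high=2.375):
    """Bound one term of the rating to the range [low, high]."""
    return max(low, min(high, value))


def rating(comp, att, yards, td, ints):
    """Standard passer rating from raw totals (att must be > 0)."""
    terms = (
        (comp / att - 0.3) * 5,       # completion percentage
        (yards / att - 3) * 0.25,     # yards per attempt
        (td / att) * 20,              # touchdown frequency
        2.375 - (ints / att) * 25,    # interception frequency
    )
    return sum(clamp(t) for t in terms) / 6 * 100


def rush_adjusted_rating(comp, att, yards, td, ints,
                         rushes, rush_yards, rush_td):
    """Hypothetical tweak: fold rushing totals into the passing totals.

    Assumption: every rush is treated as a completed pass attempt.
    """
    return rating(comp + rushes, att + rushes,
                  yards + rush_yards, td + rush_td, ints)


# Invented line: 18 of 28 for 210 yards, 1 TD, no interceptions,
# plus 8 rushes for 60 yards and 1 rushing TD.
plain = rating(18, 28, 210, 1, 0)
adjusted = rush_adjusted_rating(18, 28, 210, 1, 0, 8, 60, 1)
print(round(plain, 1), round(adjusted, 1))
```

Counting rushes as completions is generous to running QBs. Counting them as attempts only would punish them instead, so the choice matters a lot and would need tuning against actual results.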
The entire clusterfuck that is the passer rating calculation is an almost perfect illustration of the mind-bending difficulty of applying statistical analysis to football. It's an oddly-calculated, arbitrarily-aggregated statistic that has some pretty major omissions and issues, particularly when used to calculate single-game performance. It also predicts quarterback performance fairly well when integrated over a good number of games (a season, say) and it's simple enough to calculate that anyone with a pen and calculator (and access to Wikipedia for the formula, probably) can do it. That makes it a pretty good metric, especially compared to previous ways quarterback performance was measured, and the fact that ESPN's attempt to take every single possible contributing factor of quarterback performance into account didn't produce appreciably better results speaks volumes for its efficacy. That's the beauty of statistics; you don't need to know everything about the system you're analyzing if you can average over enough data and come up with a way to measure the results that helps you predict future behavior. You don't need to know the thumb position, angular velocity, and air turbulence during 100 coin flips (basically the equivalent of what ESPN's system was trying to do) to know that about 50 of them will end up heads, to use a more concrete example.