
Lies, Damned Lies, and Statistics ... and DSA

Discussion in 'Buckeye Football' started by DaddyBigBucks, Nov 3, 2022.

By DaddyBigBucks on Nov 3, 2022 at 10:39 PM

    While most people save money so that they can retire some day, there are some professions that people seem to want to do for as long as they can get people to pay them to do it. Joe Biden and a disturbingly high percentage of Congress say hello. Football is another such profession. Those who can get people to pay them to play, tend to play as long as people keep paying them. Tom Brady says hello.

    Coaching football is also a profession that people tend to do for as long as they can get people to pay them to do it. Nick Saban tips his cap while preparing to head to Louisiana State; Joe Paterno tips his halo from the heights of heaven (which is much warmer than he expected). You might say that coaching takes this phenomenon to a higher level than any other, as people do it until they can get someone to pay them not to (Ed Orgeron gives a nod from the deck of his yacht), or better yet until several people have paid them not to (Charlie Weis mumbles a greeting with a mouthful of cheeseburger).

    This has had a direct impact on the way many fans consume the sport, because sports media draws much of its "talent" from the ranks of former coaches. Because of the nature of the sport, most good coaches are still coaching. This means that sports media gets the leavings of the coaching industry.

    Even so, some of these former coaches are by far the best analysts in the business (Urban Meyer says hello). For one reason or another, most of the rest of the analysts are former jocks, who played as long as they could and are now doing the talking-head thing because they can't get jobs as rocket scientists (Desmond Howard says derp).

    The one thing that the leavings of the coaching industry and the former jocks have in common is that comparing teams the way that fans do is not really something that was part of their former job. Most of them don't really care about numerical comparisons because it had little impact on the way that they did their jobs in football, and very few (if any) have any idea what numbers can tell you and what they cannot. Most of them will use numbers from time to time, but only to support a point they're trying to make. Whether the numbers they use have any relevant context can be very hit and miss because few of them really understand numbers.

    The practical upshot of this is that conversations at sports bars sound nothing like what you hear on the radio and on television, and not because of the presence of alcohol (Don Meredith and Howard Cosell nod and mumble unintelligibly). Many fans understand what numbers can tell us. More importantly, and counter to what you'll hear from the media, we understand what numbers cannot say far better than the overwhelming majority of washed up coaches and players and better than ANY former journalism major.

    While we freely admit that numbers have their limitations under the best circumstances, the numbers that we are fed by the media aren't limited because of some inherent inability of mathematics to analyze a game played by 20-year-olds, but rather by the complete absence of context. None of the numbers provided by the media have any context built into them. Worse, many of the numbers they provide have had the context stripped from them, usually by ignorance but all too frequently, with specific intent. If context is inconvenient to the financial partners of the network that you've tuned to, you can bet you won't be hearing any context.

    If you're wondering what we mean by "context", the best examples are the questions you yell at your television when former jocks tell you that J.J. McCarthy is the "most accurate quarterback in America" and that James Madison has the "best rushing defense in America". We know they're basing those claims on numbers, but they are numbers that have been stripped of context because the person spewing them found the context to be inconvenient. Does completion percentage automatically mean "most accurate"? Of course not. Nor does fewest rushing yards allowed per game equal "best" defense. There are numbers that can give us a more complete picture of such things, but we will freely admit that they don't fully answer questions like "who is most accurate?" or "who is best?". They are numbers that give us more information with more context based on a larger data set. These things matter whether rocket surgeons like Desmond Howard are capable of understanding them or not.

    Our favorite way of adding context and data points is to take your raw stats (like points scored), which is all the media will usually tell you, and compare that to what everyone else does against the teams that you've played. We call this Differential Statistical Analysis (DSA), and it is actually quite simple as we will illustrate below. You don't see it very often (though some YouTube channels, run by fans, not former jocks, use forms of differential analysis) because, in spite of its simplicity, this level of analysis requires a large number of tedious calculations. The 21st Century says: "Hello. We have spreadsheets for that now."

    You might want to skip the following 3 paragraphs if you're familiar with DSA. Unless you're an Ohio State fan. Buckeye fans might like this example.


    Differential Scoring - Offense


    As Ohio State fans, our favorite example of DSA is Differential Scoring Offense (DSO). Take the Iowa game for example: Ohio State scored 54 points. That's one datum. We can add context by comparing it to what Iowa's other FBS opponents did against them. (Yes, ALL games against FCS competition have been eliminated from this analysis. They just muddy the water, not that prospective Mensa member, Desmond Howard, would know that.) Iowa has given up 9.857 points per game against FBS opponents not named Ohio State. Ohio State's 54 points is 5.478 times that number. So Ohio State's DSO for that game is 5.478.
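
    The arithmetic in that example can be sketched in a few lines of Python. This is only an illustration of the ratio described above; the function name and its argument names are made up for this sketch and are not taken from the author's spreadsheets.

```python
def single_game_dso(points_scored, opp_avg_allowed_vs_others):
    """Single-game DSO: points scored divided by what the opponent
    allows, per game, to its other FBS opponents (hypothetical helper)."""
    if opp_avg_allowed_vs_others <= 0:
        raise ValueError("opponent average must be positive")
    return points_scored / opp_avg_allowed_vs_others


# Figures quoted in the post: Ohio State scored 54 on Iowa, and Iowa
# allowed 9.857 points per game to its other FBS opponents.
iowa_game_dso = single_game_dso(54, 9.857)
print(round(iowa_game_dso, 3))  # 5.478
```

    A result of exactly 1.0 would mean the offense scored what that defense usually allows.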

    The problem with that ratio and that number (DSO: 5.478) is that most fans are not familiar with them. Is 5.478 good? Well... first let's just use simple logic. If you score exactly the same as what your opponent gives up to everyone else, on average, then your DSO is equal to one, right? Therefore any DSO greater than one is, by definition, better than average and DSO less than one is worse than average. So how good is 5.478?

    To answer that question, let's add some more context. Our spreadsheets have calculated the DSO numbers for each team of every FBS game played this season. Ohio State's 5.478 is the highest DSO achieved by any one team against any other team this season. And this despite the fact that the Buckeyes were said (rightly so) to have had an off day against Iowa (if only for a half). Seriously... The highest DSO by ANY FBS team all season long when the Buckeyes mostly finger-painted their way through the first half on offense. On a day when Iowa played cover-0 more than they have in years against a team with the best receivers in the game and Stroud threw one post pattern all day (which of course went for a touchdown). A game that could easily have been much more lopsided was still the best Differential performance by any offense all year.

    If that's not enough context for you then we could always compare the best DSO performance by every FBS team. Doing that produces the following top-five DSO performances:


    Team              Top DSO   Opponent
    Ohio State        5.478     Iowa
    Penn State        5.250     Minnesota
    Tennessee         5.136     Alabama
    Central Florida   3.952     Temple
    Alabama           3.611     Tennessee

    Just looking at the top five there are some interesting takeaways. The top three are the only games of the year where a team achieved DSO greater than 4. All of them were also above 5. Odder still is that Penn State is one of those three teams. Also of note is that the Crimson Tide's trip to Knoxville produced two of the top five offensive performances of the year by this metric. (That prepositional phrase should be used a LOT more often when discussing stats. It's called context.)

    But did that Tennessee-Alabama game really produce two of the top five performances? Upon further review, it appears that Ohio State's 77-point nuking of the Toledo Rockets produced a DSO of 3.733. This means that Ohio State is the only team in FBS with two of the top five DSO performances of the year.

    So far we have been looking at DSO for single games. It works the same way at the season level, because you simply take what you've scored against your opponents and divide it by what all other FBS teams have scored on those teams on a per-game basis. Additionally you can calculate a number that indicates how well a team maintains their DSO when playing better competition. We call the number rigidity. Positive rigidity means you do better against better competition where DSO is concerned; negative rigidity indicates a team that pads its stats against weaker opponents.
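
    For readers who think in code, here is a rough sketch of both ideas. It rests on assumptions: the season-level DSO is read as a ratio of per-game averages, and rigidity follows the author's description in the comments below (a correlation between a team's differential stats and its opponents' differential stats, "with a slight modification"). The exact modification is not given, so the scaling by 100, the `opponent_strength` input (higher meaning a better opponent), and all function names are guesses for illustration only. The sample numbers are invented, not real game data.

```python
from math import sqrt


def season_dso(points_scored, opp_avg_allowed_to_others):
    """Team's points per game divided by the per-game average its
    opponents allow to everyone else (one reading of the post)."""
    team_ppg = sum(points_scored) / len(points_scored)
    baseline = sum(opp_avg_allowed_to_others) / len(opp_avg_allowed_to_others)
    return team_ppg / baseline


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def rigidity(game_dso, opponent_strength, scale=100.0):
    """ASSUMED form: correlation of per-game DSO with opponent quality,
    scaled so values fall between -100 and 100."""
    return scale * pearson(game_dso, opponent_strength)


# Invented three-game season, for illustration only.
print(season_dso([40, 30, 50], [20.0, 15.0, 25.0]))   # 2.0
print(rigidity([1.5, 2.0, 2.5], [1.0, 2.0, 3.0]) > 0)  # True: better vs better teams
```

    Under this reading, a team whose per-game DSO rises with opponent quality gets a positive number, and a stat-padder gets a negative one, which matches how the post interprets the sign.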

    The following shows DSO and Offensive Rigidity (Off. Rgd.), ordered by DSO ranking. We also include Scoring Offense (SO), but it's different from what you'll see elsewhere as FCS games have been eliminated.


    Team           DSO     Off. Rgd.   SO
    Ohio State     2.546     22.444    48.875
    Tennessee      2.249     61.685    47.143
    Alabama        1.889    -12.616    43.125
    UNC            1.870    -44.788    39.714
    Michigan       1.861    -12.367    41.000
    Wake Forest    1.797     26.723    38.143
    Oregon         1.767    -23.964    38.429
    Southern Cal   1.761    -14.161    41.000
    Georgia        1.741      2.331    43.000
    Penn State     1.658     51.769    33.125
    Clemson        1.643     51.044    37.429
    Wisconsin      1.642    -84.015    30.857
    UCLA           1.610     58.110    38.857

    Things to note:
    • There is a big gap between the top 2 and everyone else.
    • While the gap between 2 and 3 is the first thing that jumps out at you, the gap between Ohio State and Tennessee is bigger than the gap between Alabama and UCLA (bottom of the table).
    • Tennessee's rigidity is notable, but because it was built primarily on the game against Alabama it is too early to draw conclusions about it. Rigidity is a fairly new part of these calculations and we are still learning about it. We will learn a great deal more about it from Tennessee's upcoming game as will be discussed later.
    • Penn State's presence in the top-10 is surprising to me, as is their rigidity.
    • Wisconsin's ranking is a little surprising, but their very poor rigidity gives it context. They have built that ranking by scoring 66 on New Mexico State and 42 on Northwestern and not much besides, hence the fourth-worst rigidity in all of FBS (only Akron, FIU, and Vanderbilt are worse).
    • Yes, that's Alabama at number 3. Did you forget about them? Seems like a lot of people are dismissing them. They're still a really good team.
    Differential Scoring - Defense

    We can do the same for defense that we've done for offense. The only difference is that lower numbers are better.

    Team                 DSD     Def. Rgd.   SD
    Illinois             0.466    -30.944    10.143
    Georgia              0.493     59.740    12.000
    Michigan             0.545    -29.461    11.500
    Alabama              0.608    -63.719    16.625
    Iowa State           0.610    -23.436    17.571
    Troy                 0.698    -73.008    17.714
    Ohio State           0.710    -20.473    16.875
    Kansas State         0.712    -15.282    19.714
    Iowa                 0.723    -56.590    17.571
    Minnesota            0.728    -83.810    15.000
    Texas                0.758     25.056    21.125
    NC State             0.794     34.175    19.429
    Washington State     0.798    -48.179    21.286
    Alabama-Birmingham   0.799     73.006    21.143
    Central Florida      0.807     26.226    18.714
    Penn State           0.809    -66.358    22.000
    Kentucky             0.821    -32.476    22.714
    Tulane               0.824     49.798    19.857
    Louisiana State      0.825    -48.729    21.714
    Tennessee            0.827    -68.504    20.571
    Notre Dame           0.829     56.117    22.125

    Notable:
    • Did you forget about Alabama? A lot of people have. They're the only team that is in the top-four in both DSO and DSD.
    • More teams were included for defense because some of the top-21 are significant/surprising.
    • Tennessee in the top-20? Yeah... that's one of the surprises.
    • So is Notre Dame at 21. Raw stats would have you believe they're 40th in Scoring Defense.
    • Ohio State's appearance at #7 is impressive, but...
    Any time you add context to numbers you have to think critically about the limitations of your numbers. One thing not captured here is how much time each team's defensive back-ups spend on the field in blowouts. And because Ohio State's DSD and its defensive rigidity were both negatively affected by garbage time, the question becomes: "How much was Ohio State affected compared to everyone else?"

    The problem with that question is that there isn't a convenient source of information that we could neatly import into a spreadsheet to supply an answer for us. There might be a way of reducing it to numbers, but ain't nobody got time for that.

    We can however compare how much Ohio State has blown their opponents out to how much other teams have done so. Rather than quibble about what defines a blow-out (most discussions about numbers devolve into arguments about how different terms are defined), here is a table that compares the top teams based on how much they've outscored their opponents (total points scored (per game) minus total points given up (per game)).


    Team              Pt Diff per gm
    Ohio State        32.00
    Georgia           31.00
    Michigan          29.50
    Tennessee         26.57
    Alabama           26.50
    Southern Cal      17.00
    Clemson           16.43
    Texas             15.25
    Illinois          15.14
    Central Florida   14.14
    Texas Christian   13.43
    Minnesota         12.57

    Notable:
    • To the point we were making, the Buckeyes are winning by a bigger margin than anyone, so it seems likely that they are spending as much time playing backups on defense as anyone, or as near as makes no matter.
    • This is especially true when you consider the absolute cliff of a drop off after the top-five above.
    • Outscoring people by 32 points a game is ridiculous. That's not likely to go down this week. More on that later.
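
    The point-difference column can be cross-checked against the Scoring Offense (SO) and Scoring Defense (SD) columns from the earlier tables, since it is just one minus the other. A small sketch, using only figures that appear in this post (the dictionary and function names are made up for illustration):

```python
def point_diff_per_game(scoring_offense, scoring_defense):
    """Points scored per game minus points allowed per game."""
    return scoring_offense - scoring_defense


# (SO, SD) pairs copied from the offense and defense tables above,
# both of which exclude games against FCS opponents.
top_five = {
    "Ohio State": (48.875, 16.875),
    "Georgia": (43.000, 12.000),
    "Michigan": (41.000, 11.500),
    "Tennessee": (47.143, 20.571),
    "Alabama": (43.125, 16.625),
}

for team, (so, sd) in top_five.items():
    print(f"{team:<12} {point_diff_per_game(so, sd):.2f}")
```

    The five results line up with the top five rows of the table (32.00, 31.00, 29.50, 26.57, 26.50).
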
    Differential Scoring Composite

    As simple as it is, point difference per game was an interesting comparison. A similar way to rank teams by a single number is to compare the differential numbers instead of the raw scoring numbers. If we just divide DSO by DSD, we get a sort of Differential Scoring Composite (DSC - why not?) that we can use to rank teams. This one number takes all of the data and context of DSO and DSD and combines it into one concise means of comparing teams that is far more meaningful than the vast majority that is thrown at you by people who can't find anyone to pay them to coach or play football (or do rocket science or brain surgery or some combination of the two). Here are the Top-25 Plus One in DSC.


    Team                DS Comp
    Ohio State          3.584
    Georgia             3.534
    Michigan            3.417
    Alabama             3.107
    Tennessee           2.718
    Illinois            2.608
    Penn State          2.050
    Kansas State        1.951
    Texas               1.939
    Central Florida     1.840
    Clemson             1.734
    Louisiana State     1.712
    Southern Cal        1.710
    Minnesota           1.702
    Utah                1.676
    Mississippi State   1.647
    Syracuse            1.592
    Wisconsin           1.525
    Maryland            1.511
    Notre Dame          1.509
    Oregon              1.501
    Tulane              1.489
    Louisville          1.488
    UCLA                1.474
    Texas Christian     1.465
    Iowa                1.455

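
    Since DSC is just DSO divided by DSD, the top of this ranking can be roughly reproduced from the two earlier tables. The sketch below uses the published three-decimal DSO and DSD figures, so its results differ from the DS Comp column by a few thousandths; presumably the spreadsheet divides unrounded values, though that is an assumption. The names in the code are illustrative only.

```python
def dsc(dso, dsd):
    """Differential Scoring Composite: DSO divided by DSD."""
    return dso / dsd


# (DSO, DSD) pairs copied from the offense and defense tables above.
published = {
    "Ohio State": (2.546, 0.710),
    "Georgia": (1.741, 0.493),
    "Michigan": (1.861, 0.545),
    "Alabama": (1.889, 0.608),
    "Tennessee": (2.249, 0.827),
}

# Sort from highest composite to lowest and print the ranking.
ranked = sorted(published, key=lambda team: dsc(*published[team]), reverse=True)
for team in ranked:
    print(f"{team:<12} {dsc(*published[team]):.2f}")
```

    The ordering that comes out (Ohio State, Georgia, Michigan, Alabama, Tennessee) matches the first five rows of the table.
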
    Some questions remain:
    • Did you forget about Alabama?
    • How is Alabama ahead of Tennessee? Simple. If you know this sport, you know the better team doesn't always win in any particular game. I am NOT saying that Alabama is better than Tennessee; the numbers are not saying that either (they can't). What the numbers are giving us is a point of comparison and a reason to think that it's possible that the better team (overall) did not win that game. People have been far too quick to count the Tide out.
    • Like it or not, That Team Up North is playing very good football. Somehow they've rallied around the guy that tried to leave them for the NFL (and failed). They've even, as a group, taken on some of the more unique aspects of his personality, but I digress.
    • DSC does not like Clemson nearly as much as a certain committee does.
    • Many people, in their desire to heap derision on said committee, have taken up the cause of Texas Christian. I wasn't that convinced before I ran the numbers; I am less convinced now. DSC has enough context built into it that I can state flatly that anyone ranked 25th in DSC should not be in a 4-team playoff. (Neither should anyone who is ranked 11th.)
    • I may not have agreed with some about Texas Christian, but I did agree with them about Louisiana State. I thought #10 was way too high for the Tigers, but my own DSC says they're #12, even though their offensive and defensive rigidity averages out to -47.22, which is lower than anyone else in the top-12. Still, that ranking is a surprise to me.
    • Penn State may not be elite, but their having the 7th-best DSC in FBS makes them an unequivocal quality win, especially at their house.
    • Texas in the top-10 is astonishing.
    • Ohio State has played four of the top 26 in DSC. They have played one of the top 25 in the polls including the committee's ranking.
    Game Predictions

    All that is left to do is to use our differential numbers and combine them with raw stats and the rigidity numbers to come up with predictions for some of this Saturday's games. As rigidity is a fairly new component to this analysis, its use in the algorithm is still in flux, but the gist is that a positive offensive rigidity will make your score go up, as will a negative defensive rigidity by your opponent. Having said that, and skipping over a great deal of tedious detail (but with a reminder that I never bet real money myself and neither should you), here are some DSA predictions for some of Saturday's games.

    Tennessee at Georgia

    Team        SO     SD     DSO     DSD     O Rig     D Rig     DSC
    Georgia     43.0   12.0   1.741   0.493    2.331    59.74     3.534
    Tennessee   47.1   20.6   2.249   0.827   61.685   -68.504    2.718

    I mentioned before that this game might teach us a lot about how to use rigidity in these calculations. Before I continue: Please note that rigidity does not mean toughness; it has nothing to do with that. It just means how well differential numbers are maintained against better competition.

    Georgia's defense holds up very well against better competition with a rigidity over 59. And Tennessee's offense is certainly better competition, with the second-ranked DSO in FBS and a spectacular rigidity of 61.685. Something has got to give there.

    The other side of the equation might hold the key to what will happen in this game. Georgia's offense has average rigidity while Tennessee's defense does not hold up against better competition at all with rigidity below -68. This suggests we should modify Georgia's point total upward.

    Doing so gives us the following DSA prediction:

    • Georgia 42
    • Tennessee 27
    Alabama at Louisiana State

    Team              SO     SD     DSO     DSD     O Rig     D Rig     DSC
    Alabama           43.1   16.6   1.889   0.608   -12.616   -63.719   3.107
    Louisiana State   30.9   21.7   1.413   0.825   -45.712   -48.729   1.712

    Both defenses are less rigid than the opposing offense, but numbers this low all around won't affect the outcome much, at least not the way the algorithm works right now. As stated previously, the use of rigidity is being tracked and will be optimized over time.

    Prediction:

    • Alabama 42
    • Louisiana State 24
    Texas at Kansas State

    Team           SO     SD       DSO     DSD     O Rig     D Rig     DSC
    Kansas State   30.6   19.714   1.390   0.712   -45.702   -15.282   1.951
    Texas          36.4   21.1     1.469   0.758    18.982    25.056   1.939

    Some of you may be wondering why we're including this game. Others of you may have looked at the rankings and the Vegas odds and been wondering which of them is right, because they can't both be. Kansas State is #13 in the playoff rankings and Texas is #24. Yet Vegas has Texas as the favorite (by just 1.5 points). DSC likes both teams better than the committee does, ranking KSU 8th and Texas 9th. That answers the question of why we're including this game – it should be a good one. Texas's tendency to play better in big games (see the rigidity above) might be the deciding factor in this one.

    Prediction:

    • Texas 31
    • Kansas State 24
    Wake Forest at NC State

    Just kidding. I don't care.

    Ohio State at Northwestern

    Team           SO     SD     DSO     DSD     O Rig     D Rig     DSC
    Ohio State     48.9   16.9   2.546   0.710    22.444   -20.473   3.584
    Northwestern   17.0   28.4   0.795   1.204   -40.676    25.012   0.660

    The raw DSA for this one predicts the highest offensive output that we've ever seen for a DSA prediction. There are some exigencies that might bias the prediction toward an even higher number under normal circumstances. Ohio State just played two good defenses and are now facing a sub-par defense. The DSA spreadsheets allowed us to examine all of the times that that's happened in FBS this year. It has happened eight times, with five of those eight having occurred in the Big Ten, and with two of those having had Northwestern as the sub-par opponent after the two good defenses (Ohio State and Illinois were the two good defenses in both cases, oddly enough). In all of those cases, the offense had a better than average day against the sub-par defense; not by raw stats, by differential stats. But recall that we said "under normal circumstances". These are not normal circumstances. The most important number in Evanston on Saturday (when it comes to the spread and especially the over/under) is 30: That's the forecasted wind speed in MPH.

    Prediction:

    • Ohio State 56
    • Northwestern 9
     
    Last edited: Nov 4, 2022

Comments


    1. zincfinger
      zincfinger
      Please correct the spelling of Sadistics in the title.
    2. DaddyBigBucks
      DaddyBigBucks
      Paging Dr Freud
    3. zincfinger
      zincfinger
      We all slip sometimes. But sincerely, excellent article.
    4. dragurd
      dragurd
      Something something about wanting to do your mother...

      That's Freud right?
    5. shetuck
      shetuck
      twain...
      LordJeffBuck likes this.
    6. shetuck
      shetuck
      [image]
    7. RuGettinIt
      RuGettinIt
      Amazing write-up! The numbers in themselves are good at painting a picture, but your analysis and description really bring it all together. Thanks @DaddyBigBucks!
    8. DaddyBigBucks
      DaddyBigBucks
      The B1G has 8 of the top 26 in DSC

      The SEC has 5

      If you include upcoming members for each conference the count becomes SEC 6, B1G 10
      brodybuck21 likes this.
    9. DaddyBigBucks
      DaddyBigBucks
      The Texas - Kansas State prediction was a bit nostalgic for me. That’s the game where the pollsters and the committee have KSU ranked much higher, but Vegas has Texas as the favorite.

      I had forgotten how much I enjoyed setting up the prediction page each week and the thrill of dragging the calculations down to see the results. I always liked seeing that my numbers were similar to what Vegas was “expecting”, but my favorite part was when the polls and Vegas disagreed and finding that DSA almost always agreed with Vegas (I say “almost” as a hedge; it might have been always).

      Spending the time to recreate my lost spreadsheets after a long hiatus… coming up with something even better than what I had before… then unwrapping the predictions and finding my old friend “Vegas knows something the pollsters don’t” was like Christmas morning for this stat nerd.
    10. muffler dragon
      muffler dragon
      @DaddyBigBucks Great work and title. I've been thinking about that title since you started crafting these amazing posts again. :biggrin:
      You may have noted it elsewhere, but I wanted to ask: what is the formula for rigidity that you're using? Just curious. Thanks
    11. DaddyBigBucks
      DaddyBigBucks
      It’s simply a statistical correlation of a team’s differential stats to its opponents’ differential stats with a slight modification to make the numbers more digestible and intuitive
      muffler dragon likes this.
    12. MD Buckeye
      MD Buckeye
      Great stuff, DBB!
    13. Bestbuck36
      Bestbuck36
      Brother I freaking love these posts and always have! I know we've chatted about it from a wagering perspective but I too love to see how close we can get to the actual scores just because. No need to say anymore about it because everyone else seems to feel the same way!
    14. DaddyBigBucks
      DaddyBigBucks
      Obligatory shout-out to @LordJeffBuck for the editing and formatting. He makes the posts twice as good so he deserves half the credit.
