
Official Statistical Analysis Thread

Great work as always.

Regarding TO's, their correlation to win %, and our regression in that area, I would offer that they are misleading because so much random luck is involved in the finished product of an actual TO. In 2003 the team caused an ass load of fumbles but just couldn't ever seem to get the bounces to go their way and land on them.

I wish they would track TO opportunities and actual TO's. I would then venture the team that consistently creates opportunities will have, over the long term, a greater chance to convert them into actual TO's (duh) and you could then maybe get truer correlation to win% from opportunities vs TO's. Dunno for sure, just a thought.

They probably already do this, but your post seemed like a good time to discuss it.


edit

I would also say the correlation between TO's and win % is off because TO's are such a flawed stat, much like BA in baseball. For example, last-minute chuck-ups to the end zone before the end of halves get "intercepted" all the time. If you want to show the TO for what people assume it to be (a critical change of possession), then you have to find a way to weed out the ones that don't really "matter".

Also it does a defense no good to give them credit for an unforced TO (from the stats POV). It doesn't do you any good to draw conclusions from flawed data and taking credit for forcing a TO when some putz just drops the ball is a flawed assumption.

The can of worms that opens is how you count opportunities, forced vs. unforced, critical vs. unimportant, etc. It would become very similar to the official scoring of errors in baseball: too subjective and too varied from venue to venue and scorer to scorer.

A coaching staff would have to keep track themselves from the game film and have a uniform system in place as to what was what. We as fans would not be able to track this.
 
palmbuck;900785; said:
Very interesting!
But I have to ask a question: Is it possible to have a correlation of 1.0 with NO causality whatsoever?
I was under the impression that it was and you now have me unsure.


It is absolutely possible to have a correlation of 1.0 with NO causality. Think about what happens when the power goes out. If two lights share a circuit breaker, you could track every incidence of the lights going out spontaneously and you might very well end up with a 1.0 correlation as long as neither of the bulbs burned out during your study (light bulbs can last a very long time these days). Every time one light goes out spontaneously, so does the other, so the correlation is 1.0. But one light going out does not cause the other to go out.

So when two things share a cause, they can have a high level of correlation without there being a causal relationship BETWEEN them. Some might suggest that this is the case between scoring defense and winning percentage: that they are both caused by preventing the other team from scoring. My response to that would be that scoring defense IS preventing the other team from scoring.

I guess it all depends on the definition of the word IS.
 
OK, very good example.
But what if the occurrences don't share a cause? Isn't it possible that the correlation between them could be 1?
For example, with the trillions of occurrences in the universe, isn't it probable that some of them will perfectly correlate, even though there is absolutely no connection between them? Wouldn't a few of the trillions of random occurrences perfectly correlate, even though they may exist in different galaxies?
It seems to me there would be some, but I would be interested in what you think. There is probably a mathematical proof of the question, but I wouldn't know where to find it.
 
palmbuck;900962; said:
OK, very good example.
But what if the occurrences don't share a cause? Isn't it possible that the correlation between them could be 1?
For example, with the trillions of occurrences in the universe, isn't it probable that some of them will perfectly correlate, even though there is absolutely no connection between them? Wouldn't a few of the trillions of random occurrences perfectly correlate, even though they may exist in different galaxies?
It seems to me there would be some, but I would be interested in what you think. There is probably a mathematical proof of the question, but I wouldn't know where to find it.

Definitely possible, but the probability becomes vanishingly small as the amount of data grows.

For the amount of data shown here, there is just no way that a correlation of -0.93 is a matter of random chance. I realize that you're not suggesting that; but I think it's a worthwhile example.

I think I better come up with some more stats to post before we start getting into Kant, Hegel, Nietzsche, et al.
 
DaddyBigBucks;900969; said:
Definitely possible, but the probability becomes vanishingly small as the amount of data grows.

For the amount of data shown here, there is just no way that a correlation of -0.93 is a matter of random chance. I realize that you're not suggesting that; but I think it's a worthwhile example.

I think I better come up with some more stats to post before we start getting into Kant, Hegel, Nietzsche, et al.


Kant wasn't a team player, Hegel couldn't get his timing down, and Nietzsche was a cancer in the clubhouse.
 
BB73;900391; said:
More great stuff, DBB.

I decided to create a thread for just these posts of yours. That will make it easier to find these posts in the future.

Thanks for creating this for me. I often agonize over which thread to place something in, probably more than I should. This will be awfully convenient.

I have now moved a post I made in Mili's "Winning %" thread over here because I think some of the data in it illuminates points made by me and others in this thread. I have also moved the posts related to mine as they would have seemed awfully out-of-place in their former thread without my post there.

palmbuck;901101; said:
Yeah, I think I remember that Nietzche guy; IIRC, he was a linebacker for Illinois and Green Bay. I bet he would agree that defense wins more often than offense.
Those other two guys must have been before my time.:wink2:
Good Post!

While that's not the guy I meant (I know you knew that), there are a ton of youngsters here who don't know that the man pictured below is the man to whom you refer.

[photo: Ray Nitschke]


Every football fan should know the name Ray Nitschke.
 
In a post above I showed how different statistics relate to winning percentage. The most obvious point illuminated in that post is that, of those 18 statistics, the ones most closely associated with winning are Scoring Defense and Scoring Offense, in that order.

But this made me wonder: How do the 16 other statistics relate to Scoring Offense and Defense respectively? After all, I now have all the data for every I-A team for the last 6 years; that's one hell of a sample size, so the data should be reliable. And the new calculations would only take a few minutes.

But what a gold mine. I am very interested in hearing the opinions of some of the football gurus on this board about what some of this data means. I'll offer some of my own interpretations; but with any luck, this will be far from the last word on the subject.

The Data

The table below shows:

Column 1: The 18 stats in descending order of their correlation to winning percentage
Column 2: Their correlation to Scoring Offense
Column 3: Their correlation to Scoring Defense


Category____________________corr. SO_______corr. SD
Scoring Defense_____________0.5781__________1.0000
Scoring Offense_____________1.0000__________0.5781
Pass Efficiency Defense_____0.5336__________0.8849
Rushing Defense_____________0.5871__________0.8650
Total Defense_______________0.4504__________0.9088
Pass Efficiency_____________0.8338__________0.4989
Turnover Margin_____________0.5832__________0.6639
Total Offense_______________0.8819__________0.3639
Net Punting_________________0.2323__________0.5330
Punt Return Defense_________0.2694__________0.5423
Rushing Offense_____________0.4969__________0.4782
Punt Returns________________0.3954__________0.3566
Kickoff Return Defense______0.3413__________0.3280
Pass Offense________________0.4914__________0.0550
Pass Defense________________0.0547__________0.5243
Kickoff Returns_____________0.2547__________0.1746
Penalties__________________-0.1297__________0.0089
Penalty Yards______________-0.1608__________0.0324


Observations

First off, one of the many checks to make sure the math was right was verifying that yes, Scoring Defense has a 1.0 correlation to itself, and so does Scoring Offense. Likewise, it is noteworthy, but just barely, that there is a 0.5781 correlation between the two. Very good teams tend to be good in both areas. Very bad teams tend to suck at both. The teams in the middle are mixed. Whatever.

Further, it is not terribly surprising that Total Defense has the highest correlation to Scoring Defense and Total Offense has the highest correlation of any of the stats to Scoring Offense. If anything, this only serves to give me confidence in the data.


Rushing

Now this starts to get fun. Let's take another look at the rushing rows of the table by themselves.

Category____________________corr. SO_______corr. SD
Scoring Defense_____________0.5781__________1.0000
Scoring Offense_____________1.0000__________0.5781
Rushing Defense_____________0.5871__________0.8650
Rushing Offense_____________0.4969__________0.4782

First off, look at Rushing Defense's correlation to Scoring Offense. Is that cool or what? There are only two stats more closely correlated to scoring offense than is Rushing DEFENSE, but Rushing OFFENSE is NOT one of them!!

To me, the reason for this seems simple, and I'll call it "The Blow-Out Effect". When you have an extremely high-powered offense, the other team typically ends up throwing the ball all over the joint trying to catch up. Not only does this drag down the number of carries per game, it even drags down the yards per carry as the number of sacks goes up (due to the NCAA's perverse habit of subtracting sacks from rushing yards). The Rushing Defense ends up with better numbers, but may not necessarily BE better.

But this phenomenon only deepens the puzzle of the already unexpectedly low correlation between Rushing Offense and Scoring Offense. When you're getting killed, you tend to not rush as much; when you're blowing the other team out, you rush more so as to kill the clock. Shouldn't that INFLATE rushing (per game) numbers? Why on earth is Rushing Offense only SIXTH among the 16 other stats in its correlation to Scoring Offense, and only barely above Passing Offense at that? Better football minds than mine will have to sort that out. All I can tell you is that I have checked and rechecked the data and the calculations.
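For the record, here is the ordering check using the Scoring Offense column from the first table (Scoring Offense and Scoring Defense themselves left out, so these are the 16 other stats). The values are copied from the table; the Python code just sorts them.

```python
# Correlation with Scoring Offense for the 16 other stats, copied from the table above.
corr_with_so = {
    "Pass Efficiency Defense": 0.5336,
    "Rushing Defense": 0.5871,
    "Total Defense": 0.4504,
    "Pass Efficiency": 0.8338,
    "Turnover Margin": 0.5832,
    "Total Offense": 0.8819,
    "Net Punting": 0.2323,
    "Punt Return Defense": 0.2694,
    "Rushing Offense": 0.4969,
    "Punt Returns": 0.3954,
    "Kickoff Return Defense": 0.3413,
    "Pass Offense": 0.4914,
    "Pass Defense": 0.0547,
    "Kickoff Returns": 0.2547,
    "Penalties": -0.1297,
    "Penalty Yards": -0.1608,
}

# Sort from most to least correlated with Scoring Offense.
ordered = sorted(corr_with_so, key=corr_with_so.get, reverse=True)
for place, stat in enumerate(ordered, start=1):
    print(f"{place:>2}. {stat:<24} {corr_with_so[stat]:+.4f}")

print("Rushing Defense place:", ordered.index("Rushing Defense") + 1)
print("Rushing Offense place:", ordered.index("Rushing Offense") + 1)
```

Rushing Defense lands third (behind only Total Offense and Pass Efficiency), and Rushing Offense lands sixth, one spot ahead of Pass Offense.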

What is less mysterious but still up for conjecture is why Rushing Offense has only a slightly better correlation to scoring offense than to scoring defense. Some might suggest that the offense that is good at rushing is keeping the ball away from the opposition. Personally, I have always been suspicious of this reasoning. When you count the number of possessions per game, each team almost always gets between 9 (extreme Tressel-ball) and 14 (WAC ball), with the number of possessions seldom differing by more than 1. That comes out to about an equal number of chances to score.

But maybe there is something to the idea of letting the other team's QB get "cold". I have always admitted that this is a factor, but have wondered just how much.

On the other hand, isn't it true that old-school coaches who focus on Rushing Offense also tend to appreciate the importance of a stout D? And isn't it also true that WAC-a-doodle, pass-happy coaches often focus on the offense to the detriment of the defense? It seems to me that this might explain the data more than anything. Let's call this "The Coaching Effect" for future reference.

Passing

The questions don't stop there. Here are the data for passing, taken by themselves:

Category____________________corr. SO_______corr. SD
Scoring Defense_____________0.5781__________1.0000
Scoring Offense_____________1.0000__________0.5781
Pass Efficiency Defense_____0.5336__________0.8849
Pass Efficiency_____________0.8338__________0.4989
Pass Offense________________0.4914__________0.0550
Pass Defense________________0.0547__________0.5243


What makes these numbers most interesting is when you compare them to the numbers for rushing statistics. While Rushing Defense has a much higher correlation with Scoring Defense than does Pass Defense (yardage), Pass EFFICIENCY Defense BEATS Rushing Defense in its correlation to Scoring Defense. This, I would expect, is also due to "The Blow-Out Effect": You know the pass is coming, so the other team's QB is throwing the ball while running for his life into a defensive backfield that is less concerned with run support than it otherwise might be. This also explains Pass Efficiency Defense's correlation to Scoring Offense, though it is notable that Rushing Defense's correlation to Scoring Offense is higher still.

Similarly explainable, though with less certainty, is the looseness of the connection between passing yardage and scoring. Pass Offense's relatively low correlation to Scoring Offense, and likewise Pass Defense's loose association with Scoring Defense, both seem attributable to "The Blow-Out Effect". You might throw less when you've scored a lot; and the other team will throw more when they're having a hard time scoring, because they're probably playing from behind for much of the game.

But look at the other side of that coin; and better yet, compare it to the situation with rushing offense and defense. Passing offense has seemingly NO CONNECTION AT ALL with scoring defense, and Passing defense likewise has NO CONNECTION AT ALL with scoring offense. Recall that this was clearly not the case with the rushing numbers, as each rushing category has a moderate positive correlation with the opposite scoring category.

It seems to me that both "The Blow-Out Effect" and "The Coaching Effect" would apply negative pressure to both of these correlations, and therefore the number I would expect would be moderately negative. I guess it is possible that very good teams being adept in all areas and very bad teams being uniformly awful might cancel that out though. It is also possible that Passing simply has no "Cross-Correlation" effect the way that rushing does.

For those of you who are engineers and are therefore familiar with a very different definition for "Cross-Correlation", spare me. No one who isn't already familiar with it wants to know the "classical" definition of that term.
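Here are the "cross-correlation" numbers (in this thread's sense, not the engineering one) pulled out side by side in a short Python snippet, with values copied from the first table. The 0.1 cut-off for "practically zero" is an arbitrary line chosen for illustration.

```python
# Offensive stats vs Scoring DEFENSE, defensive stats vs Scoring OFFENSE.
# Values copied from the table above.
cross = {
    ("Rushing Offense", "Scoring Defense"): 0.4782,
    ("Rushing Defense", "Scoring Offense"): 0.5871,
    ("Pass Offense", "Scoring Defense"): 0.0550,
    ("Pass Defense", "Scoring Offense"): 0.0547,
    ("Pass Efficiency", "Scoring Defense"): 0.4989,
    ("Pass Efficiency Defense", "Scoring Offense"): 0.5336,
}

PRACTICALLY_ZERO = 0.1  # arbitrary illustrative cut-off

labels = {}
for (stat, target), r in cross.items():
    labels[stat] = "practically zero" if abs(r) < PRACTICALLY_ZERO else "clearly non-zero"
    print(f"{stat:<24} vs {target:<16} {r:.4f}  ({labels[stat]})")
```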

Turnovers

Turnover Margin, it turns out, has a moderate correlation (0.5832) with Scoring Offense, and a stronger one (0.6639) with Scoring Defense. I wonder if the greater effect on defense is caused by the shift in momentum attendant to turnovers ("sudden change" is the generalized term used by the tOSU coaching staff).

Special Teams

Net Punting and Punt Return Defense both have a moderate correlation with Scoring Defense, as you would expect. Their effect on the field position battle would seem to explain their small but clear correlation with Scoring Offense. What is interesting about these two is that Net Punting has a higher correlation to winning percentage, but Punt Return Defense has a higher correlation to both Scoring Offense and Scoring Defense. As the difference is not great, it isn't clear if there is anything to be gained by employing inductive reasoning to fathom the cause.

It is notable that the effect Kickoff Return Defense has on the field position battle shows up slightly more in Scoring Offense than in Scoring Defense, while the reverse is true (and by a wide margin) for Punt Return Defense. It seems to me that this is because a good Kickoff Return Defense leaves the opponent pinned deep in its own territory every time (a good punt defense only sometimes), which ultimately gives your offense good field position. Bad Kickoff Return Defense results in merely average field position much of the time, but bad Punt Return Defense can leave your defense with disastrous field position much of the time.


Penalties

Finally, there is the matter of the zebras. It is fascinating to me that penalties and penalty yards seem to have no effect whatsoever on defense. But look at the effect on offense. Bear in mind that this is the ranking for scoring offense correlated to the ranking for fewest-penalties and fewest-penalty-yards. So the lower number is better for both, thus resulting in an expectation of a positive correlation. But there is a small but measurable negative correlation. With this large a set of data, this correlation can be taken to be quite real, even if it is perplexing.

My first thought was that this small correlation is a result of differences in the way games are officiated from conference to conference, in conjunction with the vast differences in the average offensive output of each conference. But wouldn't that then show an effect on the defense as well? Better football minds than mine will have to hash that one out too.

What seems very clear from this though is that the officials affect the game far more on calls regarding possession than on called (or not) penalties.
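How "real" a correlation of -0.1297 is depends on how many data points sit behind it, and the post doesn't pin that down: 119 teams over 6 seasons could mean about 714 team-seasons, or 119 six-year averages. The Python sketch below (the helper name is hypothetical; standard library only) uses Fisher's z-transform to get an approximate two-sided p-value for a few assumed values of n. It assumes independent data points, which team-seasons from the same program are not quite, and for rankings it is only an approximation.

```python
import math

def correlation_p_value(r, n):
    """Approximate two-sided p-value for H0: true correlation = 0.

    Fisher's z-transform: z = atanh(r) * sqrt(n - 3), compared against a
    standard normal. Assumes n independent pairs.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

r = -0.1297  # Penalties vs Scoring Offense, from the table above

# n is an ASSUMPTION: the post does not say exactly how many data points
# went into each correlation, so try a few plausible sizes.
for n in (119, 357, 714):
    print(f"n = {n:>3}: p ~ {correlation_p_value(r, n):.4f}")
```

Worked out, this gives p of roughly 0.16 at n = 119 but well under 0.01 at n = 714, so the "quite real" reading holds comfortably if the correlation was computed over team-seasons, and is much shakier if it was computed over six-year averages.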
 
palmbuck;900785; said:
Very interesting!
But I have to ask a question: Is it possible to have a correlation of 1.0 with NO causality whatsoever?
I was under the impression that it was and you now have me unsure.

DaddyBigBucks;900810; said:
It is absolutely possible to have a correlation of 1.0 with NO causality. Think about what happens when the power goes out. If two lights share a circuit breaker, you could track every incidence of the lights going out spontaneously and you might very well end up with a 1.0 correlation as long as neither of the bulbs burned out during your study (light bulbs can last a very long time these days). Every time one light goes out spontaneously, so does the other, so the correlation is 1.0. But one light going out does not cause the other to go out.

So when two things share a cause, they can have a high level of correlation without there being a causal relationship BETWEEN them. Some might suggest that this is the case between scoring defense and winning percentage: that they are both caused by preventing the other team from scoring. My response to that would be that scoring defense IS preventing the other team from scoring.

I guess it all depends on the definition of the word IS.

palmbuck;900962; said:
OK, very good example.
But what if the occurrences don't share a cause? Isn't it possible that the correlation between them could be 1?
For example, with the trillions of occurrences in the universe, isn't it probable that some of them will perfectly correlate, even though there is absolutely no connection between them? Wouldn't a few of the trillions of random occurrences perfectly correlate, even though they may exist in different galaxies?
It seems to me there would be some, but I would be interested in what you think. There is probably a mathematical proof of the question, but I wouldn't know where to find it.

DaddyBigBucks;900969; said:
Definitely possible, but the probability becomes vanishingly small as the amount of data grows.

For the amount of data shown here, there is just no way that a correlation of -0.93 is a matter of random chance. I realize that you're not suggesting that; but I think it's a worthwhile example.

I think I better come up with some more stats to post before we start getting into Kant, Hegel, Nietzsche, et al.






:paranoid:
 
This is great stuff, DBB!

It seems logical that total offense is more important than either rushing or passing offense, as balance is needed for real success.

It's reassuring to see that pass efficiency, on both offense and defense, is more important than passing yards gained or allowed. Too often analysts look only at passing yardage, rather than efficiency.

I am somewhat surprised that rushing defense doesn't have a higher correlation to winning.
 
BB73;901456; said:
This is great stuff, DBB!

It seems logical that total offense is more important than either rushing or passing offense, as balance is needed for real success.

It's reassuring to see that pass efficiency, on both offense and defense, is more important than passing yards gained or allowed. Too often analysts look only at passing yardage, rather than efficiency.

I am somewhat surprised that rushing defense doesn't have a higher correlation to winning.

Agreed on all counts. I was astonished that rushing defense was not more closely associated with winning. Your point about efficiency is well taken too. Efficiency turns out to be one of the most important stats, whereas passing yards, both offensively and defensively, are near the bottom of the list.

EDIT:

The thing that fascinates me the most is the moderate "cross-correlation" of rushing and the complete absence of it for passing (yardage):
  • While Rushing Offense is moderately correlated to Scoring Defense...
  • and Rushing Defense is moderately correlated to Scoring Offense...
  • Passing Offense has practically zero correlation to Scoring Defense...
  • and Passing Defense has practically zero correlation to Scoring Offense
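Nevertheless, Pass Efficiency and its defensive counterpart both display moderate "cross-correlation": Pass Efficiency's is slightly higher than Rushing Offense's, while Pass Efficiency Defense's is a bit lower than Rushing Defense's.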


FWIW: I had one of the admins change the thread title for me. There are quite a few people here who have posted great stat analysis in the past (BB73, DiHard, Jax', and many others). With input from those I've mentioned, and from others whom I will be embarrassed to have forgotten for the moment, this thread has a chance to be one of the best on the 'Planet.
 
Top to Bottom:

First, let me repeat something that I said off-hand in another thread. If anyone wants to get the raw Excel files for these stats, let me know which post has the data you're interested in. If your only purpose is to ensure that I'm not making this stuff up, that's fine. If on the other hand you want to see how the calculations work, make a point of that in your request. Some of the spreadsheets were "dumbed down" in order to make sorting easier and will require some modifications to make them useful for you.

...

While I was thinking about "The 18 stats" and their correlation to winning percentage, I started to wonder if some of them don't matter a lot more at the top of the standings than they do at the bottom. In other words: Which stats make the cream rise to the top? Conversely, which really separate the teams at the bottom but matter less at the top?

It turns out that the answers to these questions may be the most interesting that I've stumbled across.

To get these answers, I did the same calculations as before, but on subsets of Div. IA instead of on every team. I calculated how "The 18 stats" (plus the composite of them, to make 19) correlated to winning % for:
  • The top 30 teams (Top Quartile)
  • The top 59 teams (Top Half)
  • The bottom 60 teams (Bottom Half)
  • The bottom 30 teams (Bottom Quartile)
Here are the highlights:

Defense

The table below shows:

Column 1: Defensive Statistical Categories
Column 2: Correlation of each stat to winning % for the Top 30 teams
Column 3: Correlation of each stat to winning % for the Bottom 30 teams

Defensive Stat__________Top 30__________Bottom 30
Scoring D________________0.650______________0.486
Total D__________________0.675______________0.004
Pass Eff. D______________0.624______________0.352
Rushing D________________0.647______________0.249
Passing D________________0.354_____________-0.151


Observations
  • Defense makes MUCH more difference at the top than at the bottom.
  • Total Defense does not differentiate teams at the bottom, but is about as important as Scoring Defense at the top.
  • While Pass Efficiency D. has a higher correlation to winning than does Rushing D. among all Div. IA teams (and among the bottom 30), Rushing D. is (slightly) more important at the top.
  • The negative correlation for Passing D. (YPG) at the bottom indicates that the teams with better passing defense (of those in the bottom 30) tend to be ranked lower than those with worse passing defenses. This is probably because the worst of the worst get blown out so often that more of their opponents spend more of the games just running the clock.
Offense

The table below shows:

Column 1: Offensive Statistical Categories
Column 2: Correlation of each stat to winning % for the Top 30 teams
Column 3: Correlation of each stat to winning % for the Bottom 30 teams

Offensive Stat__________Top 30__________Bottom 30
Scoring O________________0.305______________0.706
Total O__________________0.087______________0.467
Pass Eff. O______________0.483______________0.581
Rushing O________________0.204______________0.282
Passing O________________0.076______________0.396


Observations
  • This is a mirror image of the results for defense. Offense makes MUCH more difference at the bottom than at the top; probably because bad defense at the bottom is a given.
  • Total Offense does not differentiate teams at the top!
  • Unlike the situation with defense, pass efficiency maintains a higher correlation to winning than rushing offense does, both at the top and at the bottom.
  • PASSING OFFENSE DOES NOT DIFFERENTIATE TEAMS AT THE TOP!!
Separating Wheat from Chaff

The following stats were more important in the top 1/4 than in the top 1/2, showing that the higher the level of competition, the more important they are:
  • Scoring Defense
  • Rushing Defense
  • Pass Defense
  • Total Defense
  • Punt Return Defense
  • Net Punting
  • Rushing Offense
  • Kickoff Returns
One final observation that didn't fit anywhere else: Net Punting was the 3rd most important stat to the bottom 30. It was not nearly that important to any other quartile, and its overall importance is 9th.

More to follow, I have to run....
 