Waller said:
Every model contains bias based on the criteria selected. Maybe the bias is more accurate for what you are measuring, maybe it isn't. The fact is that the goal of all the models is to create an algorithm that solves a very difficult problem: how to rank 117 teams across a long time span with no direct comparisons between each of the teams (i.e., they can't play each other on a neutral field every week). Do the BCS models include a variable for the number of points allowed by the defense? If not, I think they are missing what I consider an important measurement for a good team, and thus they are biased.
No wisdom here. Had enough stats classes to know their application to a phenomenon as complex as college football is fraught with misapplications and abuses. Question for the esteemed professors is how many of them have had any philosophy of science classes? Way too many people out there using stats without the requisite understanding of the foundations and assumptions that underlie them.
But if the computer polls don't have biases, why is there variation between them? Look at Virginia. Some have them 6th, one has them 17th! The fluctuation is because each chooses a different model, with different assumptions, and thus different biases. Cooley for example assumes that point spreads shouldn't be taken into account (good for Oklahoma after last week, huh). But wouldn't some consideration of point spreads give a more robust finding? He assumes it away (not without good reason btw). But his assumption has introduced bias.
Finally, I believe SOS is taken into account in some of the polls, though through indirect means.
Contrary to what is said in an earlier post, I have no desire to best anyone on BP. After all, Clarity's vision was about a place where Buckeye fans could come freely and feel at home, speak their minds, and relate to one another.
You've probably noticed a number of posts by the BP staff about the constant whinging that has virtually taken over some conversations on BP. Although I wasn't the first to speak about it, I have come to agree with this point of view. It seems to me that the best way to stop the practice is for us to interrogate our own points of view when a thread starts down that road.
If there is objective evidence of bias against us, then by all means we should talk about it (e.g., the ESPiN attack on Ohio State a few years ago). But otherwise, it just becomes an irritation and keeps conversations here from moving forward. So, in that vein, I will respond to the points you raise.
1. Your point in the earlier post was not that the computer rankings are biased in terms of their weighting of different criteria, but that they are weighted against Ohio State and the Big Ten. If I understand you correctly, I think we can agree to set that argument aside now?
2. The statistical models used in the BCS computer rankings these days appear typically to be Bayesian models fit by Markov chain Monte Carlo and, to a lesser extent, logistic regression.
The models are indeed complex, but the advent of faster chips and higher-level languages, such as implementations of Bell Labs' S language (e.g., the R statistical package), makes it much easier computationally to implement even complex models. Things have sped up dramatically, and what took more than a day as recently as 2000 will now run in less than 15 minutes.
The BCS actually would be a rather simple model to implement and the dataset would be rather modest, compared to what marketing modelers often do in database marketing or what financial modelers do in technical analysis of securities. Consider, for example, a bank that is modeling the transactions of 100,000 randomly chosen credit card users over the span of a year. One credit bureau that I managed in South Africa frequently produced credit scores using Bayesian statistics for a data set including eight years of credit data for 42 million consumers. So, BCS really is a very small and simple model to implement in comparison.
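To give a feel for how small the logistic-regression flavor of such a model can be, here is a minimal sketch of a Bradley-Terry fit (a paired-comparison model closely related to logistic regression). To be clear, this is not any actual BCS computer poll, whose internals aren't public; the teams, results, and function names are all made up for illustration, and it uses wins and losses only.

```python
from collections import defaultdict

# Hypothetical results as (winner, loser); not real teams or games.
games = [
    ("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
    ("D", "A"), ("B", "D"), ("A", "D"),
]

def bradley_terry(games, iterations=200):
    """Estimate team strengths with the classic iterative (MM) update.

    Assumes every team has at least one win and one loss, otherwise
    the estimates drift toward zero or infinity.
    Returns a dict team -> strength, normalised to sum to 1.
    """
    teams = sorted({t for g in games for t in g})
    wins = defaultdict(int)
    meetings = defaultdict(int)  # unordered pair -> games played
    for winner, loser in games:
        wins[winner] += 1
        meetings[frozenset((winner, loser))] += 1

    strength = {t: 1.0 / len(teams) for t in teams}
    for _ in range(iterations):
        updated = {}
        for t in teams:
            # Sum over every opponent o that team t has played.
            denom = sum(
                n / (strength[t] + strength[o])
                for pair, n in meetings.items() if t in pair
                for o in pair if o != t
            )
            updated[t] = wins[t] / denom
        total = sum(updated.values())
        strength = {t: s / total for t, s in updated.items()}
    return strength

def win_probability(strength, a, b):
    """Modelled chance that team a beats team b."""
    return strength[a] / (strength[a] + strength[b])

strength = bradley_terry(games)
ranking = sorted(strength, key=strength.get, reverse=True)
```

Even with no head-to-head game between every pair, the fit ranks all four teams from the results that do exist. Scaling this from four teams to 117 is still a tiny problem by database-marketing standards.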
3. I don't think anyone knows exactly what is in the models, as the statistical model comprises the family jewels and the modelers don't disclose much. As I think you know, people build models by including lots of data and then begin to drop variables that don't contribute to predicting a game outcome. So, it is not surprising that models would include different variables, because as you note, they start with different variables and drop different variables, depending on their analysis.
Not sure about defensive scoring and models. If a model includes the scores of a game, then points scored against a team would be captured in the model. I am not sure how many models would be able to differentiate between turnover points scored against the offense, though, and I am not sure that it is really relevant. Would it really matter how points are scored when the scores are added to determine a winner, in terms of the objectives of someone trying to model which teams will win in future games?
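As a rough illustration of the point that game scores already carry the defensive information, here is a small sketch that derives points scored and points allowed per team from nothing but final scores. The teams, scores, and the `points_summary` name are hypothetical, and this says nothing about what any particular poll actually does with the numbers.

```python
from collections import defaultdict

# Hypothetical final scores: (team_a, points_a, team_b, points_b)
results = [
    ("A", 31, "B", 24),
    ("A", 17, "C", 10),
    ("B", 28, "C", 27),
]

def points_summary(results):
    """Return team -> (points scored, points allowed)."""
    scored = defaultdict(int)
    allowed = defaultdict(int)
    for team_a, pts_a, team_b, pts_b in results:
        # Each team's points allowed are simply the opponent's points.
        scored[team_a] += pts_a
        allowed[team_a] += pts_b
        scored[team_b] += pts_b
        allowed[team_b] += pts_a
    return {team: (scored[team], allowed[team]) for team in scored}

summary = points_summary(results)
```

What the scores alone cannot tell you is how the points were given up (e.g., off turnovers versus against the defense), which is the distinction I am not sure matters for predicting winners.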
Thanks for the discussion, guys.