
Using Statistics to Predict a Baby’s Due Date Beats Traditional Method

“Most doctors don’t even give the most accurate prediction of the due date. They still often calculate the due date based on the quasi-mystical formula of Franz Naegele, who believed in 1812 that ‘pregnancy lasted ten lunar months from the last menstrual period.’ It wasn’t until the 1980s that Robert Mittendorf and his coauthors crunched numbers on thousands of births to let the numbers produce a formula for the twentieth century. Turns out that pregnancy for the average woman is eight days longer than the Naegele rule, but it’s possible to make even more refined predictions. First-time mothers deliver about five days later than mothers who have already given birth. Whites tend to deliver later than nonwhites. The age of the mother, her weight, and her nutrition all help predict her due date. Physicians using the crude Naegele rule cruelly set up first-time mothers for disappointment.” (p. 209)
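To make the arithmetic in the quote concrete, here is a minimal sketch of Naegele's rule next to a simple average-shift correction. It assumes the conventional reading of "ten lunar months" as ten 28-day months, i.e. 280 days after the first day of the last menstrual period (LMP). The constant `AVERAGE_SHIFT_DAYS = 8` is the "eight days longer" figure from the quote. The function names are hypothetical. The refinements the passage mentions (first-time mothers, race, age, weight, nutrition) are deliberately left out, because the quote gives no coefficients for them. This is purely illustrative and not a clinical tool.

```python
from datetime import date, timedelta

# Naegele's rule as conventionally applied: ten "lunar months" of 28 days,
# i.e. 280 days (40 weeks) after the first day of the last menstrual period (LMP).
NAEGELE_DAYS = 280

# The quote reports that the average pregnancy runs about eight days longer
# than Naegele's rule; this constant encodes only that single average shift.
AVERAGE_SHIFT_DAYS = 8


def naegele_due_date(lmp: date) -> date:
    """Due date by Naegele's rule: LMP + 280 days."""
    return lmp + timedelta(days=NAEGELE_DAYS)


def shifted_due_date(lmp: date, shift_days: int = AVERAGE_SHIFT_DAYS) -> date:
    """Naegele's estimate moved later by an average empirical shift (default 8 days)."""
    return naegele_due_date(lmp) + timedelta(days=shift_days)


if __name__ == "__main__":
    lmp = date(2024, 1, 1)
    print("Naegele:", naegele_due_date(lmp))  # 2024-10-07
    print("Shifted:", shifted_due_date(lmp))  # 2024-10-15
```

A data-driven version in the spirit of the passage would replace the single constant shift with a regression fitted on recorded births, adding a term for each predictor (for example, whether this is a first birth).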


Statistical Thinking Versus Intuition

“The rise of statistical thinking does not mean the end of intuition or expertise…Increasingly, decision makers will switch back and forth between their intuitions and data-based decision making. Their intuitions will guide them to ask new questions of the data that non-intuitive number crunchers would miss. And databases will increasingly allow decision makers to test their intuitions–not just once, but on an ongoing basis.” (pp. 195-196)

Data Mining Techniques May Threaten Traditional Jobs

“The rise of Super Crunching threatens the status and respectability of many traditional jobs…Following some other guy’s script or algorithm may not make for the most interesting job, but time and time again it leads to a more effective business model. We are living in an age where dispersed discretion is on the wane. This is not the end of discretion; it’s the shift of discretion from line employees to the much more centralized staff of Super Crunching higher-ups…Marx was wrong about a lot of things, but through a Super Crunching lens, he looks downright prescient when he said that the development of capitalism would increasingly alienate workers from their work-product.” (pp. 166-167)

Getting People to Accept Statistical / Data Mining Approaches

“There’s almost an iron-clad law that it’s easier for people to warm up to applications of Super Crunching outside of their own area of expertise. It’s devilishly hard for traditional, non-empirical evaluators to even consider the possibility that quantified predictions might do a better job than they can on their own home turf. I don’t think this is primarily because of blatant self-interest in trying to keep our jobs. We humans just overestimate our ability to make good decisions and we’re skeptical that a formula that necessarily ignores innumerable pieces of information could do a better job than we could.” (p. 150)

Research Says Physical Exams Are Unnecessary, Yet Physicians Persist in Doing Them

“Even when statistical studies exist, doctors are often blissfully unaware of–or, worse yet, deliberately ignore–statistically prescribed treatments just because that’s not the way they were taught to treat. Dozens of studies dating back to 1989 found little support for many of the tests commonly included in a typical annual physical for symptom-less people. Routine pelvic, rectal, and testicular exams for those with no symptoms of illness haven’t made any difference in overall survival rates. The annual physical exam is largely obsolete. Yet physicians insist on doing them, and in very large numbers.”

Doctors Don’t Wash Hands Enough Because They Don’t Trust Statistics

“Doctors today of course know the importance of cleanliness. Medical dramas show them meticulously scrubbing in for operations. But the Semmelweis story remains relevant. Doctors still don’t wash their hands enough. Even today, physicians’ resistance to hand-washing is a deadly problem. But most importantly, it’s still a conflict that is centrally about whether doctors are willing to change their modus operandi because a statistical study says so.” (p. 83)

Epicurus, Multiple Explanations, Data Mining Implications

“The Greek philosopher Epicurus…expressed almost the opposite sentiment [to Occam’s razor]. His principle of multiple explanations advises ‘if more than one theory is consistent with the data, keep them all’ on the basis that if several explanations are equally in agreement, it may be possible to achieve a higher degree of precision by using them together–and anyway, it would be unscientific to discard some arbitrarily. This brings to mind instance-based learning, in which all the evidence is retained to provide robust predictions, and resonates strongly with decision combination methods such as bagging and boosting that actually do gain predictive power using multiple explanations together.” (p. 183)
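As an illustration of "keeping several explanations and using them together," here is a minimal sketch of bagging (bootstrap aggregating). Each model is trained on a resample of the data drawn with replacement, and the final prediction is a majority vote across all models. The choices here are assumptions made for illustration: one-feature decision stumps as the base learner, 25 models, a fixed random seed, and a toy dataset. The helper names `fit_stump` and `bag` are hypothetical. Boosting, which instead trains models one after another and reweights the examples earlier models got wrong, is not shown.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple

Example = Tuple[float, int]          # (feature value, class label 0 or 1)
Classifier = Callable[[float], int]


def fit_stump(data: List[Example]) -> Classifier:
    """Fit a one-feature decision stump by trying every observed threshold
    and both label orientations, keeping the one with the fewest training errors."""
    best = None
    for t in sorted({x for x, _ in data}):
        for left in (0, 1):
            errors = sum((left if x <= t else 1 - left) != y for x, y in data)
            if best is None or errors < best[0]:
                best = (errors, t, left)
    _, t, left = best
    return lambda x: left if x <= t else 1 - left


def bag(data: List[Example], n_models: int = 25, seed: int = 0) -> Classifier:
    """Bagging: fit one stump per bootstrap resample, then predict by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap resample: draw len(data) examples with replacement.
        sample = [rng.choice(data) for _ in data]
        models.append(fit_stump(sample))

    def predict(x: float) -> int:
        # Every retained "explanation" gets a vote; an odd n_models avoids ties.
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]

    return predict


if __name__ == "__main__":
    train = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
    model = bag(train)
    print([model(x) for x in (0.5, 9.5)])
```

Each stump is a different, equally plausible "explanation" of its own resample. Averaging their votes tends to smooth out the quirks of any single fit. This variance reduction is the sense in which combining explanations gains predictive power.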