Satoshi Kanazawa, a reader in management and research methodology at the London School of Economics, published a series of papers on predicting the sex of one’s baby, the last of which is "Beautiful Parents Have More Daughters."1 Dr. Kanazawa used a sample of almost 3,000 individuals who were asked how many children of each sex they had and who were rated on a five-point scale for attractiveness. His results are shown as the points in the following graph.
Two researchers re-examined his method and found that the “statistical significance” reported in the original paper did not exist.2
Note that the least attractive people (rated “1”) had about a 50-50 chance of having a girl, while the most attractive people (rated “5”) had about a 56% chance of having a girl. The author compared the aggregate of groups 1-4 with group 5 and found that the difference between them was significant. But a correct statistical test would have considered not only that comparison but also the other ways of splitting the groups, such as group 1 against the aggregate of groups 2-5, or the aggregate of groups 1 and 2 against the aggregate of groups 3-5, etc. Furthermore, once those additional comparisons are made, they must all be accounted for in the test of significance for the study. In other words, statistical validity does not rest on the one comparison alone, but on all of the comparisons taken together. As Gelman and Weakliem point out, the curved lines in the diagram above are the result of a better test, and this test does not show statistical significance. This is one example of the statistical problems associated with data mining.
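The effect of trying several splits can be illustrated with a small simulation. The sketch below is not the test used in either paper. It assumes that every attractiveness group has the same 50% chance of a girl, uses made-up group sizes that sum to 3,000, and applies an ordinary pooled two-proportion z-test to each of the four possible splits. All function names and parameters are hypothetical. It then compares how often the single "groups 1-4 versus group 5" split looks significant with how often at least one of the four splits does.

```python
import math
import random


def two_proportion_p_value(girls_a, n_a, girls_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (girls_a + girls_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (girls_a / n_a - girls_b / n_b) / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def split_p_values(girls, sizes):
    """p-values for every split: groups 1..k versus groups k+1..5."""
    return [
        two_proportion_p_value(
            sum(girls[:k]), sum(sizes[:k]),
            sum(girls[k:]), sum(sizes[k:]),
        )
        for k in range(1, len(sizes))
    ]


def simulate(trials=2000, sizes=(300, 600, 1200, 600, 300),
             p_girl=0.5, alpha=0.05, seed=0):
    """Return (rate for the single 1-4 vs 5 split, rate for any split).

    The group sizes are illustrative assumptions, not Kanazawa's data.
    Every group shares the same p_girl, so each "significant" result
    counted here is a false positive.
    """
    rng = random.Random(seed)
    single_hits = 0
    any_hits = 0
    for _ in range(trials):
        girls = [sum(rng.random() < p_girl for _ in range(n)) for n in sizes]
        p_values = split_p_values(girls, sizes)
        single_hits += p_values[-1] < alpha   # only groups 1-4 vs group 5
        any_hits += min(p_values) < alpha     # best-looking of the 4 splits
    return single_hits / trials, any_hits / trials


if __name__ == "__main__":
    single_rate, any_rate = simulate()
    print(f"false-positive rate, one pre-chosen split: {single_rate:.3f}")
    print(f"false-positive rate, best of four splits:  {any_rate:.3f}")
```

Under these assumptions, the pre-chosen split should come out significant in roughly 5% of the simulated studies, while the "best of four" rate is higher, because the researcher gets four chances to find a difference. One common, conservative adjustment is the Bonferroni correction, which here would require a p-value below 0.05/4 for any single split.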
1 Kanazawa, S. 2007. Beautiful parents have more daughters: A further implication of the generalized Trivers-Willard hypothesis. Journal of Theoretical Biology 244:133–140.
2 Gelman, A. and Weakliem, D. 2009. Of beauty, sex and power. American Scientist 97(4):310–314. Available online at http://www.stat.columbia.edu/~gelman/research/published/power4r.pdf