Jesus M. Castagnetto

2015-07-22

- I used the `GaltonFamilies` dataset, from the HistData R package, to create a predictive model for a child's height.
- The heights in the dataset were converted to centimeters using the equivalence 1 inch = 2.54 cm.
- The model predicts a child's height (in cm), given the father's and mother's heights (in cm), as well as the child's gender.
- The dataset originates from an 1886 study by Francis Galton (*vide infra*), in which he concludes that the average height of the parents (the “mid-parent” height) is a sufficient predictor of the child's stature.
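The unit conversion described above can be sketched as follows. This is a minimal illustration, not the app's actual code: the helper names `inches_to_cm` and `prepare_heights` are made up for this example, and it assumes the `GaltonFamilies` columns are named `father`, `mother`, `midparentHeight`, `childHeight` and `gender`.

```r
# Convert a height from inches to centimeters (1 inch = 2.54 cm).
inches_to_cm <- function(x) x * 2.54

# Build a data frame with the short variable names used in these slides.
# Assumes `df` has the column names of HistData::GaltonFamilies.
prepare_heights <- function(df) {
  data.frame(
    ch     = inches_to_cm(df$childHeight),      # child's height
    fh     = inches_to_cm(df$father),           # father's height
    mh     = inches_to_cm(df$mother),           # mother's height
    mph    = inches_to_cm(df$midparentHeight),  # mid-parent height, as stored in the dataset
    gender = df$gender                          # child's gender (factor)
  )
}

# Usage, only if the HistData package is installed:
if (requireNamespace("HistData", quietly = TRUE)) {
  heights <- prepare_heights(HistData::GaltonFamilies)
  str(heights)
}
```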

Galton, F. “Regression Towards Mediocrity in Hereditary Stature”, The Journal of the Anthropological Institute of Great Britain and Ireland Vol. 15 (1886), pp. 246-263, DOI: 10.2307/2841583

Height variables (in cm.): **ch** (child),
**fh** (father), **mh** (mother), and **mph** (mid-parent).
The **gender** factor: female/male.

The child's height is moderately correlated with the father's, mother's and midparent's heights.

Considering the sample distributions of heights by gender, we observe a distinct difference, so the child's gender is an important factor in any predictive model.

First I used Galton's assumption, considering only the mid-parent's height, resulting in a model with a low \( R^2 \). That is why I tried a couple more models that included the child's gender, as summarized below:

Model | Formula | Adj. \( R^2 \) |
---|---|---|
1 | ch ~ mph | 0.1030 |
2 | ch ~ mph + gender | 0.6332 |
3 | ch ~ fh + mh + gender | 0.6354 |

*Where*: `fh`: father's height, `mh`: mother's height, `mph`: mid-parent's height, `ch`: child's height, `gender`: child's gender

The last model gives a slightly better fit, with a reasonable QQ-plot, and is the one I used for the Shiny App.
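For reference, the three models compared above could be fitted roughly as shown here. This is a sketch under assumptions rather than the code behind the app: `fit_height_models` and `adjusted_r2` are illustrative names, and the data preparation assumes the `GaltonFamilies` column names `father`, `mother`, `midparentHeight`, `childHeight` and `gender`.

```r
# Fit the three candidate linear models on a data frame that has the
# columns ch, fh, mh, mph (heights in cm) and gender (factor).
fit_height_models <- function(heights) {
  list(
    m1 = lm(ch ~ mph, data = heights),
    m2 = lm(ch ~ mph + gender, data = heights),
    m3 = lm(ch ~ fh + mh + gender, data = heights)
  )
}

# Extract the adjusted R^2 of each fitted model as a named vector.
adjusted_r2 <- function(models) {
  sapply(models, function(m) summary(m)$adj.r.squared)
}

# Usage, only if the HistData package is installed:
if (requireNamespace("HistData", quietly = TRUE)) {
  gf <- HistData::GaltonFamilies
  heights <- data.frame(
    ch     = gf$childHeight * 2.54,
    fh     = gf$father * 2.54,
    mh     = gf$mother * 2.54,
    mph    = gf$midparentHeight * 2.54,
    gender = gf$gender
  )
  models <- fit_height_models(heights)
  print(round(adjusted_r2(models), 4))

  # Predict a daughter's height from example parental heights (in cm).
  newdata <- data.frame(
    fh = 175, mh = 162,
    gender = factor("female", levels = levels(heights$gender))
  )
  print(predict(models$m3, newdata = newdata))
}
```

Calling `plot(models$m3, which = 2)` on the fitted object draws the normal QQ-plot of the residuals for the third model.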

- The Shiny App brings to life, in a simple and interactive way, research done in Victorian times `:-)`
- I did not implement range validation in the `server.R` code, so you could enter nonsensical values (negative ones, for example) and still get a prediction… Perhaps that could be added in the next version of the app.
- You would think that with genetic data we can now do better predictions of a phenotypical trait such as height, but that is not the case (*vide infra*): the old Victorian method is not only cost-effective, but also more robust.
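The missing range validation could look something like the sketch below. It is not part of the current app: the function name `is_valid_height` and the bounds of 100 and 250 cm are assumptions chosen only for illustration.

```r
# Return TRUE only for a single, non-missing numeric height inside a
# plausible range. The default bounds are illustrative assumptions.
is_valid_height <- function(x, min_cm = 100, max_cm = 250) {
  is.numeric(x) && length(x) == 1 && !is.na(x) &&
    x >= min_cm && x <= max_cm
}

is_valid_height(170)   # inside the range
is_valid_height(-5)    # negative, rejected
is_valid_height(NA)    # missing, rejected
```

Inside the server code, a check like this could be wrapped in Shiny's `validate(need(...))` so that the user sees a message instead of a prediction for out-of-range input.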

Go and play with my Shiny App – Read the code @github

Aulchenko, Y.S.; et al. “Predicting human height by Victorian and genomic methods”, European Journal of Human Genetics (2009) 17, 1070–1075, DOI: 10.1038/ejhg.2009.5