Jesus M. Castagnetto
2015-07-22
This work uses the GaltonFamilies dataset, in the HistData R package, to create a predictive model for a child's height.
Galton, F. “Regression Towards Mediocrity in Hereditary Stature”, The Journal of the Anthropological Institute of Great Britain and Ireland, Vol. 15 (1886), pp. 246-263, DOI: 10.2307/2841583
Height variables (in cm.): ch (child), fh (father), mh (mother), and mph (mid-parent). The gender factor: female/male.
The child's height is moderately correlated with the father's, mother's and midparent's heights.
Considering the sample distributions of heights by gender, we observe a distinct difference, so the child's gender is an important factor in any predictive model.
First I used Galton's assumption, considering only the mid-parent's height, resulting in a model with a low \( R^2 \). That is why I tried a couple more models that included the child's gender, as summarized below:
| Model | Formula | Adj. \( R^2 \) |
|---|---|---|
| 1 | ch ~ mph | 0.1030 |
| 2 | ch ~ mph + gender | 0.6332 |
| 3 | ch ~ fh + mh + gender | 0.6354 |
Where fh is the father's height, mh the mother's height, mph the midparent's height, ch the child's height, and gender the child's gender.
The last model gives a slightly better fit, with a reasonable QQ-plot, and is the one I used for the Shiny App.
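The comparison in the table can be reproduced along these lines. This is a minimal sketch, not the code actually used for the app: the helper name fit_height_models is hypothetical, and the mapping from the GaltonFamilies columns (childHeight, father, mother, midparentHeight, recorded in inches) to ch, fh, mh and mph in cm is an assumption about how the data were prepared.

```r
# Hypothetical helper: fit the three candidate models and collect the
# adjusted R^2 of each. Assumes `df` has numeric columns ch, fh, mh, mph
# and a factor column gender.
fit_height_models <- function(df) {
  formulas <- list(
    "1" = ch ~ mph,
    "2" = ch ~ mph + gender,
    "3" = ch ~ fh + mh + gender
  )
  fits <- lapply(formulas, function(f) lm(f, data = df))
  adj_r2 <- vapply(fits, function(fit) summary(fit)$adj.r.squared, numeric(1))
  list(fits = fits, adj_r2 = adj_r2)
}

# Usage, only if the HistData package is installed.
if (requireNamespace("HistData", quietly = TRUE)) {
  gf <- HistData::GaltonFamilies
  in_to_cm <- 2.54  # assumed conversion; the dataset stores inches
  heights <- data.frame(
    ch     = gf$childHeight * in_to_cm,
    fh     = gf$father * in_to_cm,
    mh     = gf$mother * in_to_cm,
    mph    = gf$midparentHeight * in_to_cm,
    gender = gf$gender
  )
  print(round(fit_height_models(heights)$adj_r2, 4))
}
```

Because adjusted \( R^2 \) does not change when every height is multiplied by the same constant, the unit conversion affects only the coefficients and predictions, not the comparison between models.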
There is no input validation in the server.R code, so you could enter nonsensical values (negative, for example) and you would still get a prediction… Perhaps that could be added in the next version of the app.
Go and play with my Shiny App – Read the code @github
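The kind of input check that the app currently lacks might look like the sketch below. Both functions are hypothetical and not part of the published app, and the 100 to 230 cm bounds are an arbitrary assumption chosen only for illustration. `fit` is assumed to be an lm object for ch ~ fh + mh + gender.

```r
# Hypothetical helper: TRUE only for a single finite number within
# [lower, upper]. The default bounds (in cm) are an assumption.
is_plausible_height <- function(x, lower = 100, upper = 230) {
  is.numeric(x) && length(x) == 1 && is.finite(x) && x >= lower && x <= upper
}

# Hypothetical wrapper: return NA instead of a prediction when either
# parental height fails the check.
predict_child_height <- function(fit, fh, mh, gender) {
  if (!is_plausible_height(fh) || !is_plausible_height(mh)) {
    return(NA_real_)
  }
  newdata <- data.frame(
    fh = fh,
    mh = mh,
    gender = factor(gender, levels = c("female", "male"))
  )
  unname(predict(fit, newdata = newdata))
}
```

Inside a Shiny server function, the same condition could instead be passed to shiny's validate() and need(), so that the user sees a message rather than a missing value.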
Aulchenko, Y.S.; et al. “Predicting human height by Victorian and genomic methods”, European Journal of Human Genetics (2009) 17, 1070–1075, DOI: 10.1038/ejhg.2009.5