Jesus M. Castagnetto
GaltonFamilies, in the HistData R package, to create a predictive model for child's height.
Galton, F. “Regression Towards Mediocrity in Hereditary Stature”, The Journal of the Anthropological Institute of Great Britain and Ireland Vol. 15 (1886), pp. 246-263, DOI: 10.2307/2841583
Height variables (in cm): ch (child), fh (father), mh (mother), and mph (mid-parent). Gender factor: female/male.
The child's height is moderately correlated with the father's, mother's and midparent's heights.
The sample distributions of height differ clearly between genders, so the child's gender is an important factor in any predictive model.
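The two observations above can be checked with a few lines of R. This is a minimal sketch, not the code behind the original plots. It assumes the column names documented for `HistData::GaltonFamilies` (`childHeight`, `father`, `mother`, `midparentHeight`, `gender`) and works in the package's native units, which does not affect the correlations.

```r
library(HistData)  # provides the GaltonFamilies data set

# Assumed column names, as documented for HistData::GaltonFamilies
heights <- GaltonFamilies[, c("childHeight", "father", "mother", "midparentHeight")]

# Pearson correlation of the child's height with each parental measure
height_cor <- cor(heights)
height_cor["childHeight", c("father", "mother", "midparentHeight")]

# Mean and standard deviation of the child's height, split by gender
by_gender <- aggregate(childHeight ~ gender, data = GaltonFamilies,
                       FUN = function(x) c(mean = mean(x), sd = sd(x)))
by_gender
```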
I first followed Galton's assumption and used only the mid-parent's height as a predictor, which gave a model with a low \( R^2 \). I then tried two more models that include the child's gender. All three are summarized below:
|Model|Formula|Adj. \( R^2 \)|
|---|---|---|
|1|ch ~ mph|0.1030|
|2|ch ~ mph + gender|0.6332|
|3|ch ~ fh + mh + gender|0.6354|
fh: father's height,
mh: mother's height,
mph: midparent's height,
ch: child's height,
gender: child's gender
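The three models can be fitted with `lm()` roughly as follows. This is a sketch under stated assumptions rather than the app's actual code: the data frame `gf` and the list `fits` are names introduced here for illustration, the column names are those documented for `HistData::GaltonFamilies`, and the factor 2.54 assumes the package stores heights in inches. The unit conversion rescales the coefficients but leaves \( R^2 \) unchanged.

```r
library(HistData)  # provides the GaltonFamilies data set

# Rename columns to the abbreviations used in the table.
# Assumption: heights are stored in inches, so multiply by 2.54 to get cm.
gf <- with(GaltonFamilies, data.frame(
  ch     = childHeight * 2.54,
  fh     = father * 2.54,
  mh     = mother * 2.54,
  mph    = midparentHeight * 2.54,
  gender = gender
))

# The three candidate models, in the same order as the table
fits <- list(
  m1 = lm(ch ~ mph, data = gf),
  m2 = lm(ch ~ mph + gender, data = gf),
  m3 = lm(ch ~ fh + mh + gender, data = gf)
)

# Adjusted R^2 for each model
adj_r2 <- sapply(fits, function(fit) summary(fit)$adj.r.squared)
round(adj_r2, 4)

# Example prediction from model 3 (the input values are arbitrary)
predict(fits$m3, newdata = data.frame(fh = 180, mh = 165, gender = "male"))
```

Calling `qqnorm(resid(fits$m3))` followed by `qqline(resid(fits$m3))` draws a QQ-plot of the residuals for the last model.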
The last model gives a slightly better fit, with a reasonable QQ-plot, and is the one I used for the Shiny App.
serve.R code, so you could enter nonsensical values (negative ones, for example), and you would still get a prediction… Perhaps that could be done in the next version of the app.
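One possible shape for such a check is sketched below. Both function names and the 100 to 250 cm bounds are hypothetical, chosen only for illustration. Nothing here is taken from the app itself.

```r
# Hypothetical helper: TRUE when x is a single, non-missing number
# within a plausible range for an adult height in cm.
# The default bounds are illustrative assumptions.
is_valid_height <- function(x, lower = 100, upper = 250) {
  is.numeric(x) && length(x) == 1 && !is.na(x) && x >= lower && x <= upper
}

# Hypothetical wrapper: collect a message for each invalid input
# instead of predicting from nonsensical values.
check_inputs <- function(fh, mh) {
  problems <- character(0)
  if (!is_valid_height(fh)) problems <- c(problems, "father's height is out of range")
  if (!is_valid_height(mh)) problems <- c(problems, "mother's height is out of range")
  problems
}

check_inputs(fh = 180, mh = 165)  # no problems: character(0)
check_inputs(fh = -10, mh = 165)  # flags the father's height
```

In a Shiny server function, the same test could be passed to `validate(need(...))` so that the app shows a message instead of a prediction.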
Aulchenko, Y.S.; et al. “Predicting human height by Victorian and genomic methods”, European Journal of Human Genetics Vol. 17 (2009), pp. 1070–1075, DOI: 10.1038/ejhg.2009.5