Vine Regression

Vine regression is illustrated with the National Longitudinal Study of Youth to estimate the effect of the duration of breastfeeding on IQ. The effect varies according to an individual’s IQ and the amount of breastfeeding received.

Date

Nov. 24, 2015

Authors

Roger Cooke, Harry Joe, and Bo Chang

Publication

Working Paper

Regular vines, or vine copulas, provide a rich class of multivariate densities with arbitrary one-dimensional margins and Gaussian or non-Gaussian dependence structures. The density enables calculation of all conditional distributions; in particular, regression functions for any subset of variables conditional on any disjoint set of variables can be computed, either analytically or by simulation. Regular vines can be used to fit or smooth non-discrete multivariate data. The epicycles of regression - including/excluding covariates, interactions, higher-order terms, multicollinearity, model fit, transformations, heteroscedasticity, bias, convergence, efficiency - are dispelled, and only the question of finding an adequate vine copula remains. This article illustrates vine regression with a data set from the National Longitudinal Study of Youth (NLSY) relating breastfeeding to IQ. Based on the Gaussian C-Vine, the expected effect of breastfeeding on IQ depends on IQ, on the baseline level of breastfeeding, on the duration of additional breastfeeding, and on the values of other covariates. A child given 2 weeks of breastfeeding can expect to gain 1.5 to 2 IQ points from 10 additional weeks of breastfeeding, depending on the values of other covariates. Averaged over the NLSY data, 10 weeks of additional breastfeeding yields an expected gain of 0.726 IQ points. Such differentiated predictions cannot be obtained by regression models which are linear in the covariates.
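
As a notational sketch of the quantities described above (the symbols Y, x_i, F, f, c, and \Delta_j are introduced here for illustration and are not taken from the paper), a joint density with a copula, the regression function it implies, and the effect of augmenting one covariate can be written as follows. In a vine copula, the copula density c is itself built as a product of bivariate pair-copula densities.

```latex
% Joint density: copula density c times the one-dimensional marginal densities
f(y, x_1, \dots, x_k) = c\bigl(F_Y(y), F_1(x_1), \dots, F_k(x_k)\bigr)\, f_Y(y) \prod_{i=1}^{k} f_i(x_i)

% Regression function: expected response conditional on the covariates
E[Y \mid x_1, \dots, x_k] = \frac{\int y \, f(y, x_1, \dots, x_k)\, dy}{\int f(y, x_1, \dots, x_k)\, dy}

% Effect of adding \delta to covariate j with the other covariates held fixed
\Delta_j(\delta; x) = E[Y \mid x_1, \dots, x_j + \delta, \dots, x_k] - E[Y \mid x_1, \dots, x_k]
```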

Key findings

  • Vine regression uses new techniques to estimate high-dimensional densities. With such a density, we can compute the expected response conditional on the covariate values for each individual in a population. This conditional expectation is called the regression function for the response.
  • Once a suitable density is found, we needn’t agonize over which covariates to include, whether to include higher order terms, whether to include interactions, whether to transform the variables, and so on.
  • We can compute the effect of increasing the value of one covariate by evaluating the regression function for the response twice: once at the observed covariate values, and once with one covariate augmented and the others kept fixed. The difference between the two values is the expected effect of the augmentation.
  • Averaged over the data, 10 weeks of additional breastfeeding yields an expected gain in IQ of 0.726 points, but the expected gain differs from one individual to another. Such differentiated predictions cannot be obtained by regression models which are linear in the covariates.
  • Vine regression works for (roughly) continuous data, for which the dependences are monotonic (though not necessarily linear).
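
The procedure in the findings above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not the paper's model: it uses a single Gaussian pair copula between one covariate and the response rather than a fitted C-Vine, and the margins (`MEAN_WEEKS`, `IQ_MARGIN`) and the correlation `RHO` are hypothetical values chosen for the example, not estimates from the NLSY data. The function names `expected_iq` and `gain` are likewise made up here.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal, the scale on which the Gaussian copula lives

# Hypothetical inputs for illustration only (not fitted to the NLSY data).
MEAN_WEEKS = 12.0                             # exponential margin for breastfeeding weeks
IQ_MARGIN = NormalDist(mu=100.0, sigma=15.0)  # normal margin for IQ
RHO = 0.1                                     # assumed copula correlation


def clip(p, eps=1e-12):
    """Keep a probability strictly inside (0, 1) so inv_cdf is defined."""
    return min(max(p, eps), 1.0 - eps)


def expected_iq(weeks, n_draws=2000, seed=0):
    """Estimate the regression function E[IQ | weeks] by simulation."""
    rng = random.Random(seed)  # fixed seed: every call reuses the same draws
    # Map the covariate to a normal score through its marginal CDF.
    z_x = STD.inv_cdf(clip(1.0 - math.exp(-weeks / MEAN_WEEKS)))
    scale = math.sqrt(1.0 - RHO ** 2)
    total = 0.0
    for _ in range(n_draws):
        z_y = RHO * z_x + scale * rng.gauss(0.0, 1.0)   # conditional normal score
        total += IQ_MARGIN.inv_cdf(clip(STD.cdf(z_y)))  # back to the IQ scale
    return total / n_draws


def gain(weeks, extra=10.0):
    """Expected IQ change from `extra` additional weeks at a given baseline."""
    return expected_iq(weeks + extra) - expected_iq(weeks)


if __name__ == "__main__":
    for baseline in (2.0, 12.0, 30.0):
        print(f"baseline {baseline:4.0f} weeks: gain {gain(baseline):.2f} IQ points")
    # Average gain over a small synthetic population of baselines.
    rng = random.Random(1)
    sample = [rng.expovariate(1.0 / MEAN_WEEKS) for _ in range(50)]
    print(f"average gain: {sum(gain(w) for w in sample) / len(sample):.2f}")
```

Because the covariate enters through its marginal distribution, the same 10 additional weeks move a low baseline further on the normal-score scale than a high one, so the computed gain depends on the baseline even though the copula is Gaussian. Reusing the same random draws for both evaluations keeps simulation noise from masking that difference.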
