Research interests
Mark van der Laan's main research interests are:

He believes that these three research areas overlap extensively, and that in the future, statisticians will encounter typical data sets that involve longitudinal data, where gene-expression profiles, SNP-profiles, DNA-profiles (Comparative Genomic Hybridization) and biomarker data are measured at various points in time, in addition to the usual covariates and survival-type outcomes.

Mark and James Robins have written a book on a "Unified Approach to Censored Data and Causality" (Springer, 2002), which describes locally optimal estimating function methods to deal with such high-dimensional data sets. These methods model the parameter of interest and aim to minimize both the effect of modeling assumptions on the nuisance parameters and the need to model nuisance parameters at all. They study double robust estimation procedures, which are guaranteed to always be more nonparametric than a maximum likelihood procedure. Under appropriate model assumptions, these estimators are asymptotically normally distributed, and efficient at a user-supplied submodel.

In causal inference, Mark and collaborators work on the estimation of direct and indirect causal effects in longitudinal studies, the estimation of a causal effect of treatment in a randomized trial with non-compliance, and data adaptive estimation of causal effects. They also introduced a new class of history adjusted marginal structural models (generalizing Robins's Marginal Structural Models), which allow adjustment by time-dependent covariates and estimation of statically optimal dynamic treatment regimes.

Parameters of interest (such as regressions, densities, hazards) used to answer Public Health questions of interest in Genomics and Epidemiology are typically estimated using an estimator relying on model assumptions (e.g., linear model, covariates used in the model, nuisance parameter model). This remains true for the estimating function methodology.
The estimator used, and with it the implicit model assumptions, is invariably subject to relatively arbitrary choices. Therefore, estimator selection procedures need to be developed to assist the statistician in choosing an appropriate estimator and to reduce the subjective component of estimator selection: estimator selection needs to become more data driven. Estimator selection is thus a critical component of statistical inference in Genomics and Epidemiology. It encompasses in particular a number of selection problems that have traditionally been treated separately in the statistical literature or have not been treated at all: predictor selection based on censored outcomes, predictor selection based on multivariate outcomes, density estimator selection, survival function estimator selection, and counterfactual predictor selection in causal inference.

Recent work by Mark and collaborators showed that this common issue of estimator selection in Biostatistics can be successfully addressed using a unified cross-validation loss based estimation methodology. Asymptotic and finite sample results have shown that the proposed cross-validation estimator selection procedure should be conducted more aggressively than was believed in the past. These new theoretical results establish that data, even finite sample data, contain enough information to engage in an intensive data driven search among candidate estimators, using cross-validation to select the estimator used to answer the question of interest in practice. An important component of Mark's research focuses on statistical methods based on this unified cross-validation loss based estimation methodology, in order to provide end-users with data adaptive statistical routines for parameter estimation in different applications in Genomics and Epidemiology.
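As a rough illustration of loss-based cross-validated estimator selection, the following Python sketch picks among a handful of candidate regression estimators by comparing their cross-validated risk under a squared error loss. It is a minimal sketch under stated assumptions, not the methodology as implemented by Mark and collaborators: the candidate family (Gaussian kernel regression at several bandwidths), the simulated data, and all function names are hypothetical choices made only for this example.

```python
import math
import random


def squared_error_loss(y_true, y_pred):
    """Loss function; swapping this out adapts the selector to other problems."""
    return (y_true - y_pred) ** 2


def make_kernel_candidate(bandwidth):
    """Hypothetical candidate estimator: Gaussian kernel (Nadaraya-Watson) regression."""
    def fit(xs, ys):
        def predict(x_new):
            weights = [math.exp(-0.5 * ((x_new - x) / bandwidth) ** 2) for x in xs]
            total = sum(weights)
            if total == 0.0:
                # All weights underflowed: fall back to the training mean.
                return sum(ys) / len(ys)
            return sum(w * y for w, y in zip(weights, ys)) / total
        return predict
    return fit


def cross_validated_risk(fit, xs, ys, loss, n_folds=5, seed=0):
    """Average validation-sample loss over V folds for one candidate."""
    indices = list(range(len(ys)))
    random.Random(seed).shuffle(indices)
    fold_risks = []
    for v in range(n_folds):
        validation = indices[v::n_folds]
        held_out = set(validation)
        training = [i for i in indices if i not in held_out]
        # Fit on the training sample only, then evaluate on the validation sample.
        predictor = fit([xs[i] for i in training], [ys[i] for i in training])
        losses = [loss(ys[i], predictor(xs[i])) for i in validation]
        fold_risks.append(sum(losses) / len(losses))
    return sum(fold_risks) / n_folds


def select_estimator(candidates, xs, ys, loss, n_folds=5, seed=0):
    """Return the name of the candidate with the smallest cross-validated risk."""
    risks = {
        name: cross_validated_risk(fit, xs, ys, loss, n_folds, seed)
        for name, fit in candidates.items()
    }
    return min(risks, key=risks.get), risks


# Simulated data (an assumption for illustration only).
rng = random.Random(1)
xs = [rng.uniform(-2.0, 2.0) for _ in range(200)]
ys = [math.sin(2.0 * x) + rng.gauss(0.0, 0.3) for x in xs]

candidates = {
    f"bandwidth_{h}": make_kernel_candidate(h) for h in (0.05, 0.2, 0.5, 1.0, 5.0)
}
best_name, cv_risks = select_estimator(candidates, xs, ys, squared_error_loss)
print(best_name, cv_risks)
```

Every candidate is scored on the same folds, so the risks are directly comparable. Enlarging the candidate pool only adds entries to `candidates`; the selection step itself does not change.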
The dominating feature of all applications of such methods in Genomics and Epidemiology is the large number of candidate estimators to consider, and thus the need for computationally intensive algorithms to generate these candidate estimators and select the best one. The proposed estimation methodology combines two components sequentially. The first component is a method to generate candidate estimators, so as to perform an intensive and thorough search of the space of possible estimators of the parameter of interest. The second component is the unified cross-validation methodology, which selects the best estimator from the pool of candidates generated by the first component. This general methodology is extremely flexible and can be adapted to all learning/estimation problems by modifying the definition of the so-called loss function.

In order to address the fact that one is typically interested in simultaneously estimating and testing many parameters, Mark and his collaborators have developed new multiple testing methodology which avoids the need to specify (e.g., artificial) null distributions for the data generating distributions, and controls (under general distributions) user-supplied Type-I error rates such as the Family-Wise Error, Generalized Family-Wise Error, Tail Probability of the Proportion of False Positives, and False Discovery Rate.
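For reference, the four Type-I error rates named above are commonly defined as follows. The notation is an assumption introduced here, not taken from the text: $V$ denotes the number of false positives (rejected true null hypotheses), $R$ the total number of rejected hypotheses, $k \ge 0$ an allowed number of false positives, and $q \in (0,1)$ an allowed proportion of false positives, with $V/R$ taken to be $0$ when $R = 0$.

```latex
\begin{align*}
\mathrm{FWER}     &= \Pr(V \ge 1), \\
\mathrm{gFWER}(k) &= \Pr(V \ge k + 1), \\
\mathrm{TPPFP}(q) &= \Pr\!\left(\frac{V}{R} > q\right), \\
\mathrm{FDR}      &= \mathrm{E}\!\left[\frac{V}{R}\right].
\end{align*}
```

Setting $k = 0$ in the generalized family-wise error rate recovers the family-wise error rate.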