Regression is a tool used by statisticians and clinicians to establish or identify a relationship between one or more explanatory covariates and a response variable (outcome). Many textbooks are devoted to specific regression models and strategies. In particular, Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis by Harrell (1) is an excellent reference. In this chapter, we briefly summarize and exemplify the techniques that are fundamental to outcomes research.
First we address common, or “macro,” issues in regression modeling, regardless of the scale of outcome. We discuss the purpose of an analysis and the relationship to covariate selection, model complexity, and criteria for model performance. Focus is placed on interpretation of models on a “macro” scale. We then focus on features of specific models (linear, logistic, and hazards regression), paying particular attention to the hypotheses and data behind an analysis. Parameters and output are interpreted on a “micro” scale for each of the three regression methods.
Before building a regression model, the statistician and clinician should discuss the modeling purpose, key hypotheses, and available data. Each of these will guide an appropriate regression analysis.
The purpose of an analysis may not be straightforward or singular. Some common purposes seen in practice are to establish a predictive model for an outcome (prediction), identify covariate associations that can inform understanding of disease etiology (association), or examine a specific variable of interest while adjusting for the effects of others (adjustment). If we knew the “true model,” including all important variables and their functional relationships with outcome, then that model would be useful for all purposes. In practice, we do not know the truth, and our decisions have trade-offs. Understanding the purpose of the model helps us create a model that is useful, albeit imperfect.
As its name suggests, a predictive model is used when clinicians are interested in predicting the outcome for future patients based on currently available data. Having a predicted risk of outcome can guide treatment and therapies as well as inform the patient. The goal is to achieve precise predictions of future outcomes rather than to interpret specific parameter estimates. There is no benefit from including covariates whose effects cannot be well estimated. Researchers might be willing to accept some bias in individual covariate estimates in exchange for greater precision. Hence, statisticians and clinicians must work to identify the covariates that are most important for predicting outcome. Automated variable-selection techniques are commonly employed; such techniques are discussed below. However, any variable selection should still be guided, in part, by clinical practice. In a predictive study, interpretation of the model can be largely focused on performance statistics, such as the C-index, to evaluate the model prediction as a whole. Some of these predictive assessment tools are described below. Researchers often publish an entire predictive model or design a nomogram to aid practitioners in establishing patient risk ...
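To make the C-index mentioned above concrete, the following is a minimal sketch for a binary outcome, where the statistic is the proportion of event/non-event patient pairs in which the patient with the event received the higher predicted risk (ties count as one half). The function name `c_index` and the example values are hypothetical and chosen only for illustration; time-to-event outcomes additionally require handling of censored observations, which this sketch does not attempt.

```python
from itertools import combinations


def c_index(predicted_risk, outcome):
    """Concordance index for a binary outcome (1 = event, 0 = no event).

    Illustrative helper, not taken from any particular library.
    """
    concordant = 0.0
    usable_pairs = 0
    for (p_i, y_i), (p_j, y_j) in combinations(zip(predicted_risk, outcome), 2):
        if y_i == y_j:
            # Pairs with the same outcome carry no ranking information.
            continue
        usable_pairs += 1
        # Order the pair so the patient with the event comes first.
        p_event, p_nonevent = (p_i, p_j) if y_i == 1 else (p_j, p_i)
        if p_event > p_nonevent:
            concordant += 1.0
        elif p_event == p_nonevent:
            concordant += 0.5  # tied predictions count as half-concordant
    if usable_pairs == 0:
        raise ValueError("need at least one event and one non-event")
    return concordant / usable_pairs


# Made-up predicted risks and observed outcomes for five patients.
risks = [0.10, 0.35, 0.40, 0.80, 0.65]
events = [0, 0, 1, 1, 0]
print(round(c_index(risks, events), 3))
```

A value of 0.5 corresponds to predictions that rank patients no better than chance, and a value of 1.0 to perfect ranking. Because the statistic depends only on the ordering of the predicted risks, it summarizes discrimination of the model as a whole rather than the accuracy of any individual coefficient.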