Skip to Main Content


If we could see the future, could we improve our patients’ care? For example, what if we could predict which patients may be at risk of a clinical decline in the next 6 hours?1–3 What if we knew which patients in a primary care practice were going to skip an essential outpatient appointment?4–6 And if we implemented an intervention—say, a reminder by phone—could we measure whether our intervention meaningfully improved adherence (or some other outcome)?

Predictive modeling in healthcare delivery science uses the models of biostatistics in healthcare delivery. A key aspect of these more advanced models—compared to those we’ve looked at earlier in the book—is that we can now begin to look at the role of multiple exposures in achieving the outcome of interest.


One of the principles of predictive modeling—like all statistical analyses—is that you must understand the assumptions of your models. In this chapter, we will start with the basic building block of regression analysis. Three types of regression (linear, logistic, and proportional hazards) are workhorses for a tremendous amount of healthcare delivery science, and in fact for all the biological sciences. These regression models make two key assumptions: (1) each observation (or patient) is independent of each other observation; and (2) we understand the mathematical relationship of all the variables well enough that we can describe them accurately with specific functions that have a few parameters (i.e., these regression models are parametric methods). Sometimes this second assumption doesn’t hold. In those cases, we can turn to nonparametric methods, and we’ll do that in this chapter—exploring Classification and Regression Tree (CART) analysis. We will also have a section to help you avoid problems, in which we will discuss many of the common pitfalls and land mines inherent with each of these models. We call all of this Predictive Modeling 1.0.

Next, we will explore what to do when that first assumption—that each observation is independent of each other observation—doesn’t hold. In those cases, one value predicts another value in the dataset. Ordinary linear and logistic regression simply can’t deal with these problems. This is tricky to handle, and we’ll call it Predictive Modeling 2.0 to reflect that complexity. We will conclude the chapter with exploring additional methods that handle these issues, but that trade simplicity for complexity. These models better reflect the actual environment in which we deliver healthcare, but they may be harder to describe to other people. For example, we should have the methodologic flexibility to model the hierarchical nature of data (e.g., multiple patients cared for by a single attending physician, or multiple patients housed on a single medical floor in a hospital). The trade-off, though, is that complex models and their results are harder to explain.


We have already ...

Pop-up div Successfully Displayed

This div only appears when the trigger link is hovered over. Otherwise it is hidden from view.