Yeah, we don't like numbers either. But they are at the end of the chapter for those of you who want to learn 2 × 2 tables, etc. We do like evidence-based medicine (EBM) though, and it will be on the examination. So here goes….

Table 28-1 is here for reference. You may want to refer to it as you work your way through the chapter.

**Sensitivity:** true positives/(true positives + false negatives)

**Specificity:** true negatives/(true negatives + false positives)

**False positive rate:** 1 – specificity

**False negative rate:** 1 – sensitivity

**Positive predictive value:** true positives/(true positives + false positives)

**Negative predictive value:** true negatives/(true negatives + false negatives)

**Research published in World's Best Medical Journal studied screening for lung cancer using a new method. The researchers reported that patients who had lung cancer detected via screening lived longer after diagnosis than people who were diagnosed with lung cancer but not screened.**

**Question 28.1.1** **Which is true?**

**A)** This shows that screening is effective at prolonging survival.

**B)** This may be an example of lead-time bias.

**C)** This may be an example of verification bias.

**D)** Well-respected medical journals (and board review books) are always right.

**Answer 28.1.1 The correct answer is "B."** This may be an example of lead-time bias. Screening is intended to diagnose disease earlier, hopefully allowing for interventions that prevent or slow the progression of the disease. Without screening, the disease may be discovered only after symptoms develop, when it may be too late to intervene. Screening, however, can also give the appearance of longer survival even though no additional life has been gained. This is called lead-time bias. Here's an example: Mr. X has the test, is diagnosed with disease, receives treatment, and dies 5 years later. Mr. Y is in the control group, develops symptoms at year 4 of the study, and dies one year after that. They have both lived for 5 years after being randomized to screening or no screening, and both die at age 65 of the same disease. Did Mr. X have more survival time or just more "disease time"? This lead-time bias may be avoided by using age-specific mortality rates rather than survival time from diagnosis. "C," verification bias, occurs when you are evaluating a new diagnostic test and patients with a negative result on the new test are not evaluated with the gold standard test. For example, verification bias could occur in a study where people with a negative cardiac stress test do not proceed to cardiac catheterization. This underestimates the prevalence of disease in the population studied (we don't really know about those who didn't have a catheterization) and overestimates the value of the stress test (seemingly, all patients with cardiac disease were picked up by the stress test… but only because we didn't look far enough). See Table 28-2 for more types of bias found in studies.

Type of Bias | Effect
---|---
Confirmation bias | This occurs when you only look for data that support your contention and ignore any information to the contrary (e.g., if I believe that the internet is full of sociopaths, I could confirm that by reading comments under videos posted on related websites and ignoring everything else that would suggest otherwise).
Confounders (confounding variables) | This occurs when two or more factors are associated with the outcome but only the one being studied is accounted for (e.g., research shows that 9 out of 10 Cubs fans have annoying behavior, but 9 out of 10 Cubs fans are also drunk when the annoying behavior occurs; if we do not account for the confounding variable (drunkenness), we may incorrectly assume that being a Cubs fan causes annoying behavior … which can't possibly be true … can it?).
Length-time bias (not lead-time) | This occurs because screening tests are more likely to find slow-growing tumors than rapidly growing ones. This can bias results in favor of screening because more slow-growing cancers with a good prognosis will be found with a screening test.
Selection bias | This occurs when subjects selected for the study do not represent the entire population you might see clinically. They may be sicker or less sick than the patients in your practice, or they may be excluded from the study for another reason. Say we are doing a study on renal failure: if all patients with diabetes are excluded, we will not be able to apply the results to our patients, many of whom have renal failure and coexisting diabetes. This is avoided by having large, representative samples with few exclusion criteria.
Spectrum bias | This occurs if a study is skewed toward a particular group of patients, for example, those with an NSTEMI. If you study only NSTEMI patients, you cannot apply your results to patients with other related conditions, such as STEMI or unstable angina. Spectrum bias is common in studies; patients studied at a tertiary care institution often differ in severity of illness from those in primary care practices (who in general tend to have less severe disease).
Performance bias | Something other than the factor you are studying is actually producing the positive (or negative) results. An example of this is estrogen for heart disease: case–control studies suggested estrogen was cardioprotective, but it didn't pan out in randomized, controlled trials. Thus, the positive effect was due to something else; maybe the women on estrogen exercised more, had better diets, smoked less, and were more interested in their health overall. Randomized, blinded trials protect against this bias.

**As part of a quality control study, the hemoglobin A_{1c} values of patients with diabetes at two clinics are compared. In a study of 4,000 patients, it is found that the mean hemoglobin A_{1c} value in group 1 is 7.4% and the mean hemoglobin A_{1c} value in group 2 is 7.6%. The authors did the correct statistical test and found a p-value of 0.04 for this comparison.**

**Question 28.1.2** **Based on this information, you conclude:**

**A)** Group 1 is significantly different from group 2. Reject the null hypothesis.

**B)** Group 1 is not significantly different from group 2. Don't reject the null hypothesis.

**C)** Group 1 is not significantly different from group 2. Reject the null hypothesis.

**D)** Group 1 is significantly different from group 2. Don't reject the null hypothesis.

**Answer 28.1.2 The correct answer is "A."** To answer this question, you have to know the usual cutoff for significance for a *p*-value, and you also have to know what a null hypothesis is. The null hypothesis is the hypothesis that there is no significant difference between the two groups being compared. By setting up null hypotheses in this way, we can then search for evidence that the null hypothesis is incorrect; tests of significance are a method of looking for such evidence. The *p*-value is the probability of seeing results at least as extreme as those observed by chance alone if the null hypothesis were true. A *p*-value of 0.04 means that if there were truly no difference between the groups, we would expect to see a difference this large only 4% of the time by chance alone, unrelated to the study intervention. By convention, a *p*-value of 0.05 or smaller is considered statistically significant. Thus, when you have a *p*-value of less than 0.05, you have evidence against the null hypothesis, and it can therefore be rejected.

**A type I** error occurs when a difference is found when none is present. The *p*-value provides you with the probability of a type I error. For example, a *p*-value of 0.05 is considered statistically significant. What this means, however, is that 5% of the time the same conclusion would be produced by chance alone. By contrast, a *p*-value of 0.005 means that there is only a 0.5% chance that the conclusion is mistaken and occurred by chance. The lower the *p*-value, the lower the chance of a type I error.
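To make the "by chance alone" idea concrete, here is a self-contained coin-flip sketch (not from the chapter): the exact two-sided p-value for observing k heads in n flips of a fair coin, i.e., the total probability of all outcomes at least as unlikely as the one observed.

```python
from math import comb

def two_sided_p(k, n):
    """Exact two-sided p-value for k heads in n fair coin flips:
    sum the probabilities of all outcomes at least as unlikely as the observed one."""
    pmf = [comb(n, i) / 2**n for i in range(n + 1)]
    return sum(p for p in pmf if p <= pmf[k])

p8 = two_sided_p(8, 10)  # 8 heads in 10 flips: p ~ 0.11, not significant
p9 = two_sided_p(9, 10)  # 9 heads in 10 flips: p ~ 0.02, significant at the 0.05 cutoff
```

Eight heads in ten flips would happen more than 10% of the time with a perfectly fair coin, so we cannot reject the null hypothesis ("the coin is fair"); nine heads would happen only about 2% of the time, so we can.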

**A type II** error occurs when a study fails to show a difference where one exists. This may occur because there are not enough subjects in a study or when there is measurement error. For example, in a (real) study of lorazepam versus diazepam for seizures, twice as many patients had their seizures stop with lorazepam. However, the conclusion of the study was that there was no difference between the two drugs. This is only because there were not enough subjects for this to reach statistical significance. Including another 100 subjects may have made this reach statistical significance. Remember this by "Type **II** error is **too** few patients."

Recognize forms of bias in research studies?

Define *p*-value and null hypothesis?

Describe the significance of *p*-value and type I error?

Recognize a type II error?

**Being the compulsive physician that you are, you are spending your Saturday morning relaxing by reading your journals (we know who you are; you can't fool us … or else why would you be working through the EBM chapter?). You notice a study of particular interest on type 2 diabetes, using a novel drug called Shugabegone (not a real drug), and note that the data were analyzed in two ways. The first method used was an "intention to treat" analysis. The second method was by a "per-protocol" analysis. Hmm, you think. The per-protocol data sure makes you want to start prescribing Shugabegone for your patients with diabetes.**

**Question 28.2.1** **Which of the following applies to a per-protocol analysis?**

**A)** It provides an objective description of how a new therapy will work in our patient population.

**B)** It is more statistically stringent when compared to an intention-to-treat analysis.

**C)** It allows the study authors to manipulate the outcome to make it look better than it is.

**D)** You are wrong, Dr. Graber. I am not reading this chapter.

**Answer 28.2.1 The correct answer is "C."** Per-protocol analysis allows the authors to make the outcome look better than it is. To understand why, we must know the difference between an "intention to treat" analysis and a "per-protocol analysis." A per-protocol analysis allows the authors to manipulate the results, usually in favor of the study drug. Here is an example: "Of all of the patients who were enrolled in this study, those who took the drug at least 75% of the time were included in the final analysis." There is a problem here. Many of **our** patients will not take all of their medicine. Maybe they forgot to take it. Maybe it had side effects that were intolerable. If we only analyze the patients who took all of their medication, the drug will look better in the per-protocol analysis than it will in our clinical practice where adherence to a regimen is less than complete. You will see this in many papers. In some papers, there is a "wash-in period" where everyone is given the medication and only those who tolerate it for 2 weeks (for example) continue to the main trial. The results here will **not** be representative of our patients, many of whom may not tolerate a medication. In summary, per-protocol analysis = bad.

An intention to treat analysis analyzes *all* study patients in the group to which they were originally assigned. In this case (to continue the hypothetical study above), the authors do not care if the patients took the drug 75% of the time. Even if they *never* took the drug but were assigned to the treatment group, they are analyzed in the treatment group. Many of these patients likely would be treatment failures making the overall results look worse. However, the results will be more applicable to our patient population, many of whom will not take their medications properly. In summary, intention to treat analysis = good.
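A toy numerical sketch (all numbers invented) shows why the per-protocol estimate flatters the drug when adherent patients do better than non-adherent ones:

```python
# Hypothetical treatment arm of 100 patients, as (adherent, cured) pairs.
# Invented numbers: adherent patients do well; non-adherent patients mostly do not.
arm = (
    [(True, True)] * 42 + [(True, False)] * 28
    + [(False, True)] * 3 + [(False, False)] * 27
)

# Intention to treat: analyze everyone assigned to the drug, adherent or not
itt_rate = sum(cured for _, cured in arm) / len(arm)

# Per protocol: analyze only the patients who actually took the drug
adherent_outcomes = [cured for adherent, cured in arm if adherent]
pp_rate = sum(adherent_outcomes) / len(adherent_outcomes)
```

With these made-up numbers, the per-protocol cure rate (42/70 = 60%) looks much better than the intention-to-treat rate (45/100 = 45%), yet the intention-to-treat figure is closer to what a real clinic population, with its imperfect adherence, will experience.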

**"OK," you think. "I'll pay attention to the intention to treat analysis." So, you next look at the inclusion and exclusion criteria. In this diabetes drug study, the exclusion criteria include renal disease, a history of heart failure, coronary artery disease, and peripheral vascular disease.**

**Question 28.2.2** **From this we can conclude:**

**A)** The patients in this study are so finely selected that the results cannot be applied to our general clinic population.

**B)** The results should be generalizable to our general clinic population given the fact that the medication worked so well in this study.

**C)** This is an example of "selection bias."

**D)** A and C.

**E)** B and C.

**Answer 28.2.2 The correct answer is "D."** We cannot apply these results to our general clinic patients. Think about it: how many of your diabetic patients have no renal disease, no history of CHF and no history of CAD/PVD? Not too many. Many of our patients with diabetes have at least some renal disease (proteinuria). You always need to look at the inclusion and exclusion criteria of a trial before you can determine if the trial is applicable to your patients. This is called "selection bias"; only select patients are entered into the study.

There is also a phenomenon called "spectrum bias." In the case of spectrum bias, the patients in the study are different from our patients; they may be sicker or not as sick. Case in point: glycoprotein IIb/IIIa inhibitors for acute coronary syndrome (ACS). The initial studies looked at patients going for cardiac catheterization; it appeared there was a small benefit here. But the drug companies then generalized from these sick patients to say that all patients with ACS should have glycoprotein IIb/IIIa inhibitors. You can't do this. Most of our "ACS" admissions turn out not to have ACS, and they will get better regardless of what we do. They certainly don't need glycoprotein IIb/IIIa inhibitors. And, thankfully, these drugs have fallen out of favor.

Subgroup analyses (you know, the "our drug worked in women over 60" pitch) can **only** be used to generate a hypothesis. This is called the **"derivation set."** Before accepting it into practice, a second study of that subgroup, called the **"validation set"** must be done. This is always true. Don't let them tell you otherwise.

Describe an intention to treat analysis and its value for applying study results?

Recognize the effect of inclusion and exclusion criteria on the application of study results?

**Which of the following statements is true?**

**A)** Specificity is the most important test characteristic when trying to find a very dangerous disease.

**B)** As sensitivity increases, specificity decreases.

**C)** Specificity need not be considered as long as a test is sensitive enough.

**D)** As sensitivity increases, specificity increases.

**The correct answer is "B."** As sensitivity increases, specificity decreases. This makes intuitive sense: the more cases you detect, the more false positives you will have. We can have a sensitivity of 100% if, for example, we say that everyone with WBCs in their blood and a cough has pneumonia. We will pick up everyone with the disease (very sensitive) but also a lot of patients without the disease (poor specificity… everyone has at least one lonely white cell running around). Ideally, we would like a diagnostic screening test with both high sensitivity and high specificity. In reality, there is an inherent trade-off between sensitivity and specificity—as sensitivity increases, specificity decreases and vice versa. "A" is incorrect. Generally, *when it is very dangerous not to detect a disease, it is important to have a highly sensitive test (one that will find "all" cases)* with an acceptable specificity. "C" is incorrect. This is why both an ELISA and a Western blot may be done when trying to detect HIV. The ELISA is very sensitive (will pick up the great majority of HIV cases) but is not very specific (will categorize a lot of patients who **do not have the disease** as positive). The Western blot is more specific and will filter the true positives from the false positives found on the screening test (the ELISA).
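The trade-off can be demonstrated with a made-up score-and-cutoff sketch: lowering the cutoff for calling a test "positive" raises sensitivity at the cost of specificity, and vice versa. All scores below are invented for illustration.

```python
# Hypothetical diagnostic scores (invented): higher = more likely diseased.
# Note the overlap between the two groups -- that overlap is what forces the trade-off.
diseased_scores = [3, 5, 6, 7, 8, 9]
healthy_scores = [1, 2, 3, 4, 5, 6]

def sens_spec(cutoff):
    """Call any score >= cutoff a positive test; return (sensitivity, specificity)."""
    sens = sum(s >= cutoff for s in diseased_scores) / len(diseased_scores)
    spec = sum(s < cutoff for s in healthy_scores) / len(healthy_scores)
    return sens, spec

low_cutoff = sens_spec(3)    # very sensitive, poor specificity
high_cutoff = sens_spec(7)   # very specific, poor sensitivity
```

With a cutoff of 3 we catch every diseased patient (sensitivity 100%) but mislabel most healthy ones (specificity 33%); raising the cutoff to 7 flips the picture (sensitivity 50%, specificity 100%).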

You are having a meaningful discussion with an industry representative (yeah, right). OK, let's recalibrate. You are being sold a package of goods by an industry representative. She says that if their test for Dreaded Disease is positive, the likelihood ratio of the disease being present is 3.

Your response to this is:

**A)** "Great! The disease is three times more likely to be present if the test is positive."

**B)** "Not so great! A likelihood ratio of 3 is pretty much worthless in differentiating between those who are ill and those who are not."

**C)** "What is this likelihood ratio stuff anyway?"

**D)** "What happened to my free lunch?"

**The correct answer is "B."** A positive likelihood ratio is the probability of a positive test result in patients with the disease divided by the probability of a positive result in patients without the disease. In a situation in which the pretest probability of a disease is between 30% and 70%, a likelihood ratio can meaningfully **reduce** the possibility of disease presence **only** if it is less than 0.1, and can meaningfully **increase** the possibility of disease presence **only** if it is over 10. So, a likelihood ratio of 3 is more or less useless. Draw some lines on this and you will see what we mean (Fig. 28-1).

**FIGURE 28-1.** Reproduced from the Centre for Evidence-Based Medicine. Available at http://www.cebm.net.
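You can check this for yourself by converting pretest probability to post-test probability through odds, the standard Bayes calculation behind the nomogram. The pretest probability of 50% below is chosen only for illustration:

```python
def post_test_probability(pretest_prob, likelihood_ratio):
    """Bayes via odds: post-test odds = pretest odds x likelihood ratio."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Starting from a 50% pretest probability:
p_lr3 = post_test_probability(0.5, 3)     # 0.75 -- still a coin flip and a half
p_lr10 = post_test_probability(0.5, 10)   # ~0.91 -- now clinically useful
p_lr01 = post_test_probability(0.5, 0.1)  # ~0.09 -- usefully rules disease out
```

An LR of 3 moves you from 50% to only 75%, which rarely changes management; an LR of 10 (or 0.1) moves you to about 91% (or 9%), which usually does.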

**One common and debilitating complication of diabetes is neuropathy. In a (fictional) study by Tooth E. Fairie et al., one group had routine therapy and an experimental group had intensive therapy for their diabetic neuropathy. In the first group, those assigned to routine therapy, 10% of patients developed neuropathy. In the second group, those assigned to intensive therapy, 2% of patients developed neuropathy.**

**Question 28.3.1** **Using the data above, how many patients with diabetes need to be treated with intensive therapy to prevent the development of one case of neuropathy?**

**A)** 10.

**B)** 11.

**C)** 8.

**D)** 12.5.

**E)** 25.5.

**Answer 28.3.1 The correct answer is "D."** The question is really asking, "What is the number needed to treat (NNT)?" In this question, the absolute risk reduction is 8% (10% in control group vs. 2% in the treated group). The NNT is the number of patients who need to be treated to prevent one adverse outcome. To calculate this, we need to know a few other terms:

ARR = **A**bsolute **R**isk **R**eduction = control group event rate (CER) – experimental group event rate (EER).

NNT = 1/ARR (with ARR expressed as a decimal, so 8% = 0.08 and 20% = 0.20)

Using the values given above, ARR = 10% – 2% = 8% and NNT = 1/0.08 = 12.5.
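The arithmetic above, as a short sketch (remember to convert the percentages to decimals before dividing):

```python
def number_needed_to_treat(control_rate, experimental_rate):
    """NNT = 1 / ARR, with event rates expressed as decimals (8% -> 0.08)."""
    arr = control_rate - experimental_rate   # absolute risk reduction
    return 1 / arr

# Rates from the neuropathy vignette: 10% with routine therapy, 2% with intensive
nnt = number_needed_to_treat(control_rate=0.10, experimental_rate=0.02)  # 12.5
```

So about 13 patients must receive intensive therapy to prevent one case of neuropathy; the most common NNT mistake is forgetting the percent-to-decimal conversion and getting an answer off by a factor of 100.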

**The anti-clotting properties of aspirin are well described. In a (fictional) trial studying the long-term outcome of stroke patients by Don Sox et al., 1% of patients on long-term aspirin therapy developed new onset of strokes and 50% of patients without aspirin therapy developed new strokes.**

**Question 28.3.2** **Using the data above, how many stroke patients need to be treated with aspirin therapy to prevent one new stroke (what is the NNT)?**

**A)** 2.

**B)** 8.

**C)** 10.

**D)** 12.

**E)** 25.

**Answer 28.3.2 The correct answer is "A."** Again, the NNT is the number of patients who need to be treated to prevent one adverse outcome. NNT = 1/ARR, where ARR = CER – EER. Using the values given above, ARR = 50% – 1% = 49% and NNT = 1/0.49 ≈ 2 (well, really close to 2).

**In a (fictional) pharmaceutical study by Amanda Hugginkes et al., Group A is the placebo group and Group B is the group that received the actual new drug. Data were gathered on Groups A and B and confidence intervals (CI) were calculated. Side effect rates were calculated as a percentage of each group.**

**Question 28.3.3** **Using the 95% CI, which of the following group comparisons are statistically significantly different?**

**A)** Group A CI 30% to 46% and Group B CI 44% to 88%.

**B)** Group A CI 10% to 30% and Group B CI 44% to 88%.

**C)** Group A CI 0.1% to 0.3% and Group B CI 0.2% to 0.4%.

**D)** Group A CI 88% to 90% and Group B CI 88% to 90%.

**E)** None of the above is statistically significant.

**Answer 28.3.3 The correct answer is "B."** The CI is the range of values within which the true mean is likely to lie. In general, the larger the study group, the narrower the CI: with a large study, you are more likely to get close to the true value.

"B" is correct because when comparing the CI between two groups, **there is no overlap**. When there is an overlap of CI, as in the other options, the groups are not statistically significantly different. For example, in answer "A" the true mean value of Group A could lie anywhere between 30% and 46% (it could be 45%), and the true mean value of Group B could lie anywhere between 44% and 88% (it could also be 45%); therefore, the groups have no statistically significant difference.

Confidence intervals are usually given as "CI 95%," meaning that there is a 95% probability that the true mean value will be within the CI. When looking at the CI for relative risk (RR), relative benefit, odds ratio, etc., remember that if the CI 95% crosses "1," there is no difference between the groups. Thus, an RR of 4.2, CI 95% of 0.8 to 10, is consistent with anything from a 0.8 times risk to a 10 times risk (or benefit). Since the CI 95% crosses "1," there is no real difference between the groups.

Confidence intervals are useful when determining the magnitude of a treatment effect. For example, if a RR has CI 95% of 1.2 to 1.4, this means there is a small difference (0.2–0.4 times) between the two groups, even though it is statistically significant. **Remember, something that is statistically significant may not be clinically significant.** On the other hand, if the RR has a CI 95% of 10 to 20, this is a major difference between the groups. This means that one group has a 10 to 20 times greater risk (or benefit depending on what is being studied) than does the other group.
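The two checks discussed above can be sketched as two tiny functions, with intervals written as (low, high) pairs (values taken from the question for illustration):

```python
def intervals_overlap(a, b):
    """True if two confidence intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

def ci_crosses_one(ci):
    """For relative risks / odds ratios: True if the 95% CI includes 1,
    meaning no statistically significant difference between groups."""
    return ci[0] <= 1 <= ci[1]

# From the question: only option "B" has non-overlapping intervals
b_significant = not intervals_overlap((10, 30), (44, 88))   # True
a_significant = not intervals_overlap((30, 46), (44, 88))   # False (44-46 overlap)
no_real_difference = ci_crosses_one((0.8, 10))              # True -- RR CI crosses 1
```

Note the asymmetry of the two rules: overlap is about comparing two groups' intervals, while "crosses 1" is about a single ratio's interval.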

**In a (fictional) clinical trial testing a new provider order entry (POE) technology at Big Important University Hospital, relative and absolute risk reduction is discussed. A group of family medicine residents at the hospital was allowed to give verbal orders during their intern year and averaged seven medication errors per year. In the residents' second year, POE was instituted (in which physicians were required to enter orders and alerts to medication errors were given before finalization of orders), and the group's medication errors dropped to an average of four per year.**

**Question 28.3.4** **Which of the following is true?**

**A)** The RR reduction is 43% and the absolute risk reduction is three in medication errors.

**B)** The RR reduction is 57% and the absolute risk reduction is three in medication errors.

**C)** The RR reduction is 43% and the absolute risk reduction is four in medication errors.

**D)** The RR reduction is 57% and the absolute risk reduction is four in medication errors.

**E)** POE has ruined medicine. I quit! boo-hoo!

**Answer 28.3.4 The correct answer is "A."** POE compared with no POE (the control group) resulted in a 43% relative decrease in the risk of a medication error—from seven to four errors per year (3/7 = 43%). The difference in the number of medication errors before and after POE is three errors (7 – 4 = 3), which is the absolute reduction in the risk of a medication error. Now, think about this from a pharmaceutical or medical device representative's point of view. Would you say (a) "We reduced errors by 3 per year," or would you say (b) "We reduced errors by **43% per year**!"? That's how you sell a study.
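The two framings of the same POE result, side by side (a sketch using the error counts from the vignette):

```python
def risk_reductions(control_events, experimental_events):
    """Absolute reduction (a difference) vs. relative reduction (a ratio)."""
    absolute = control_events - experimental_events
    relative = absolute / control_events
    return absolute, relative

# Error counts from the vignette: 7 per year before POE, 4 per year after
abs_red, rel_red = risk_reductions(control_events=7, experimental_events=4)
# abs_red = 3 errors per year; rel_red = 3/7, about 43%
```

Same data, two numbers: "3 fewer errors per year" and "43% fewer errors" are both true, but one sells much better than the other.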

Calculate number needed to treat?

Differentiate between absolute and relative risk reduction?

Use confidence intervals to determine statistical significance?

**Mr. Handsome Q. Drugrep has come to tell you all about Happy Lucky Golden Drug (HLGD) that is newly indicated for the treatment of the Dreadful Yucks. As a primary care doctor, you are concerned about better treatment of this disease. Current standard treatment involves ChemoRAdical Pharmacotherapy (CRAP). Cure rates with CRAP are only about 10%. Mr. Drugrep has a study that shows HLGD has a 12% cure rate versus placebo. He's very excited and expects HLGD to be the new standard of care.**

**Question 28.4.1** **To his argument, you appropriately respond:**

**A)** "Wow. HLGD is clearly superior to CRAP."

**B)** "Hmm. HLGD is statistically no different from CRAP."

**C)** "Wow. HLGD is clearly superior to placebo."

**D)** "Do you have free samples of HLGD? Where's lunch?"

**E)** "I need more information before I can make an informed decision."

**Answer 28.4.1 The correct answer is "E."** You need more information. Before coming to market, a drug manufacturer must demonstrate safety and efficacy of a drug. The new drug may or may not be compared with another currently available treatment. Without a study comparing HLGD to CRAP, you cannot say anything about how these drugs compare, even if HLGD looks better versus placebo. In addition, "C" is incorrect because the placebo results have not been given.

**You ask Mr. Drugrep for more information. He proudly tells you the drug study involved 10,000 subjects with the Dreadful Yucks, randomly assigned to placebo (5,000) or HLGD (5,000). All of the subjects completed the trial. At the end of 1 year, 400 subjects on placebo (8%) were cured and 600 subjects on HLGD (12%) were cured.**

**Question 28.4.2** **He correctly tells you that:**

**A)** The NNT is 10,000.

**B)** The number needed to harm (NNH) is 10,000.

**C)** The relative benefit of HLGD versus placebo is 50% greater cure rate.

**D)** The absolute benefit of HLGD versus placebo is 50% greater cure rate.

**Answer 28.4.2 The correct answer is "C."** When looking at drug studies, benefit is often stated as "relative benefit" or relative risk reduction. In this question, 600/5,000 patients benefit from HLGD and 400/5,000 benefit from placebo; thus, 200 more patients are cured with HLGD, and 200/400 = 0.5 = 50% relative benefit of the drug versus placebo. The absolute benefit is only 4% (12% cure with HLGD vs. 8% cure with placebo). For the NNT in this example, think about the previously given equation: NNT = 1/ARR, where ARR = CER – EER. Here the "event" is cure: the placebo group had an 8% cure rate (92% still had disease) and the HLGD group had a 12% cure rate (88% still had disease). So, the absolute difference = 12% – 8% = 4%, and NNT = 1/0.04 = 25. NNH cannot be calculated with the information available since the adverse event rate is not known.

In real life, there are often more dramatic examples of how relative and absolute risks differ. It may be stated that there is a 50% reduction in complications of diabetes using Drug A versus placebo. However, when translated into patients, this could be 1/1,000 complications of diabetes in the drug group versus 2/1,000 in the placebo group. This is a 50% relative decrease in adverse outcomes but may in fact be clinically meaningless: the **absolute** risk reduction is 1/1,000 or 0.1%! This ploy is often used to make drug studies look good. Thus, anytime you are looking at a new drug, ask for the **absolute** risk reduction **and** the NNT and the NNH. Forget the relative risk reduction and the p-values.

**Mr. Drugrep tells you that the adverse event rate for HLGD is only 1%. Aren't you impressed? But he frowns a little when you want to know the NNH.**

**Question 28.4.3** **To calculate NNH, you ask him for:**

**A)** The types of adverse events that occurred in the treatment group.

**B)** The percentage of adverse events that occurred in the control group.

**C)** The percentage of adverse events with standard treatment.

**D)** The cure rate in the treatment group.

**Answer 28.4.3 The correct answer is "B."** **Adverse effects** of a drug will often be reported as an absolute number, and here it is 1%. So, the conclusion you are given by the pharmaceutical industry may be **50% reduction in disease and only a 1% risk of side effects of the drug.** Both of these statements are true, but it's an "apples and oranges" comparison. We prefer comparing apples-to-apples (or corn-to-corn in Iowa). In order to directly compare benefits and harms, we need to know the NNT and the NNH.

Let's say that when you ask Mr. Drugrep, he tells you that the adverse event rate in the placebo group was 0.5%. Here's the calculation: NNH = 1/ARI, where ARI (absolute risk **increase**) = risk in the experimental group – risk in the control group.

Using the numbers in this question: ARI = 1% – 0.5% = 0.5% (0.005 as a decimal), and NNH = 1/0.005 = 200.

So, for HLGD, the NNT is 25 and the NNH is 200. By the way, the adverse event in question is disfiguring, painful ear hair growth. You will have to treat 25 patients with HLGD to cure one case of the Dreadful Yucks, and for every 200 patients you treat, one will have an adverse event attributable to the drug. Demand NNT and NNH: how many patients who take the drug will benefit and how many will be harmed?
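The NNH arithmetic, with the easy-to-miss percent-to-decimal conversion made explicit (a sketch using the adverse event rates from the vignette):

```python
def number_needed_to_harm(experimental_adverse_rate, control_adverse_rate):
    """NNH = 1 / ARI, with adverse event rates expressed as decimals (1% -> 0.01)."""
    ari = experimental_adverse_rate - control_adverse_rate  # absolute risk increase
    return 1 / ari

# 1% adverse events on HLGD vs. 0.5% on placebo: ARI = 0.005, so NNH = 200
nnh = number_needed_to_harm(0.01, 0.005)
```

Dividing by 0.5 instead of 0.005 would give an NNH of 2 instead of 200, off by a factor of 100; it is exactly the same conversion trap as with the NNT.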

Employ CI in the analysis of data?

Analyze data using risk reduction and relative benefit?

Understand the importance of absolute risk reduction, NNH, and NNT when clinically applying data from a study?

**Mounting paperwork and electronic medical record hassles have played a role in your decision to make a career change. You have found a nice academic job with a research focus—minimal patient care, 10 weeks of vacation, no paperwork. Your work centers on reducing the risk of stroke in patients who have survived one stroke.**

**Question 28.5.1** **This is an example of which category of prevention?**

**A)** Primary prevention.

**B)** Secondary prevention.

**C)** Tertiary prevention.

**D)** Quaternary prevention.

**Answer 28.5.1 The correct answer is "C."** The idea behind **primary prevention**, a big interest in primary care, is to prevent a disease from occurring at all by removing its cause (e.g., influenza vaccine to prevent illness from influenza). Primary prevention may occur in the healthcare setting but is often in the domain of public health. **Secondary prevention** detects disease at an early stage so that intervention can prevent progression (e.g., Pap smears detecting dysplasia prior to cancer declaring itself or treating elevated cholesterol prior to vascular events). Your new job will be to study **tertiary prevention**: the reduction in complications and mortality due to disease after it is recognized. The line between secondary and tertiary prevention can be blurry: some would consider preventing another stroke "secondary" prevention and preventing stroke complications (e.g., muscle atrophy and pressure ulcers) "tertiary" prevention. "Quaternary prevention" is not what is described here; the term has been proposed for protecting patients from overdiagnosis and overtreatment, but it is not a standard category for the examination.

**In between day-trading and coffee breaks, you plan to study two groups of patients (A and B) to see if variable XYZ makes any difference in death or recurrent stroke. There is no randomization and there are no interventions. You are just reviewing records to see how each group did. Subjects in Group A had a stroke and then had another stroke or died a year later. Subjects in Group B had a stroke but were alive with no recurrent stroke at the time of the study. You assess the presence of XYZ in each group.**

**Question 28.5.2** **This type of study is called a:**

**A)** Prospective study.

**B)** Case–control study.

**C)** Cohort study.

**D)** Randomized, controlled study.

**Answer 28.5.2 The correct answer is "B."** A **case–control** study, like this one, looks at subjects who are categorized based on outcome and tries to find associations with certain variables. Case–control studies do not follow subjects forward over time and therefore are not prospective. **Cohort studies** identify groups at time zero and follow them for a specified amount of time to find an association between a variable and an outcome. The variable in question is not under the researcher's control. An example of a cohort study might be one looking at the association between two different diets (e.g., high-protein vs. high-carbohydrate) and the development of type 2 diabetes. **The highest quality evidence is produced by a randomized, double-blind, controlled trial**, in which the researcher has control over exposure to a variable and studies its effect on an outcome. In general, the strength of trial design goes: experimental study > cohort study > case–control study > cross-sectional study. Unfortunately, it is not possible to design randomized controlled studies for all conditions, so a well-done cohort study may be the best we can do.

**You are concerned about numerous confounding variables in your study population. Never fear!**

**Question 28.5.3** **Your trusty statistician recommends the following in order to minimize confounding:**

**A)** Multivariate analysis.

**B)** Careful calculation of *p*-values.

**C)** Matched controls and cases.

**D)** A and B.

**E)** A and C.

**Answer 28.5.3 The correct answer is "E."** Confounders can be a serious threat to any study. Confounders result from extrinsic factors—things that may affect the outcome and are also associated with the variable but are not accounted for in the study. As an example, a study may find an association between long-haul truck driving and lung cancer. If tobacco use was not accounted for in this study, the results of the study would be meaningless. Tobacco is a confounder. It is always advisable to look at a study with an eye for what confounder might be missing. Confounding can be limited by a study design that anticipates confounders and matches controls and cases ("C"). It is important to note that if you match your control and cases on a variable, you can no longer study that variable as a potential cause of the outcome. For example, if you match cases and controls on county of residence among the truck drivers, you can no longer explore county of residence as a risk factor for lung cancer. Also, multivariate analysis ("A") is a statistical method that allows for adjustment of known confounders. "B" is incorrect because *p*-value has nothing to do with confounding but will tell you whether the results should be considered significant or not.
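To see how a confounder distorts a crude comparison, consider a toy numeric sketch in Python. The counts are entirely made up to echo the truck-driver/tobacco scenario above; within each smoking stratum, drivers and non-drivers have identical cancer risk, yet the unstratified comparison falsely implicates driving:

```python
# Made-up counts (illustrative only): within each smoking stratum, drivers
# and non-drivers have IDENTICAL cancer risk, but smoking is far more
# common among the drivers, so the crude comparison blames driving.
cases = {
    ("smoker", "driver"): (30, 100),        # (cancer cases, group size)
    ("smoker", "non_driver"): (15, 50),
    ("non_smoker", "driver"): (1, 50),
    ("non_smoker", "non_driver"): (4, 200),
}

def risk(group):
    n_cases, n_total = cases[group]
    return n_cases / n_total

# Stratified risks are equal: 0.30 vs 0.30 in smokers, 0.02 vs 0.02 in non-smokers.
# Crude (unstratified) risks differ, falsely implicating driving:
crude_driver = (30 + 1) / (100 + 50)        # ~0.21
crude_non_driver = (15 + 4) / (50 + 200)    # ~0.08
```

Matching cases and controls on smoking status, or adjusting for it in a multivariate model, removes this spurious association.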

**When you review the literature, you find that there are a number of small studies looking at the effect of variable XYZ on stroke victims. You even find a meta-analysis.**

**Question 28.5.4** **If this is a well-done meta-analysis, you should find all of the following EXCEPT:**

**A)** Statistically confirmed heterogeneity between the included studies.

**B)** A thorough search for all valid studies.

**C)** An evaluation of whether estimates change with varying assumptions.

**D)** The exclusion of poor-quality studies.

**E)** The studies included measure the same underlying effect.

**Answer 28.5.4 The correct answer is "A."** Hopefully, a meta-analysis would confirm *homogeneity* between studies. For example, if one study measured NIH stroke scale and another measured patient quality of life (the outcomes are heterogeneous), it would be impossible to combine the studies. Although there is controversy regarding the appropriate use of meta-analyses, they are often used to study various outcomes by combining smaller studies. A meta-analysis is a systematic review that combines the results of previous studies to evaluate the magnitude or direction of an effect or to evaluate the effect on a subgroup. All valid studies looking at similar outcomes should be included; poor-quality studies should be excluded. The Jadad score is one common way that studies are judged as to their appropriateness for a meta-analysis (Table 28-3). Meta-analyses should also include a "sensitivity analysis." This may consist of excluding large studies and only analyzing smaller studies. It may consist of changing the economic assumptions in a cost-benefit meta-analysis. If the outcome is the same either way, the result is said to be "robust." If excluding some studies or changing the underlying economic assumption changes the result, one has to wonder about the quality of the studies etc.

Define different types of prevention?

Define different study types?

Identify and account for confounding variables?

Describe some characteristics of a meta-analysis?

You will see an increasing number of "non-inferiority" studies being done. What does "non-inferior" mean? Basically, it means that two drugs have *some* overlap in their efficacy. See Figure 28-2. Drug Y is non-inferior to Drug X even though it may not cure as many patients. "Non-inferior" does **not** mean "as good as." "Biocreep" (which is one of our favorite terms and reminds us of zombies) occurs when drug Z is then compared to drug Y. Drug Z is non-inferior to Drug Y although it clearly cures fewer people than Drug X. Thus, the standard against which new drugs are compared becomes progressively less efficacious (Fig. 28-2). Another concern: researchers can define their own "margin" of non-inferiority. For example, if the margin is "2" (as in some of the new anticoagulant studies), the new drug could allow up to twice as many events (e.g., pulmonary emboli) as the old drug and still be considered "non-inferior."

**FIGURE 28-2.**
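The margin idea can be sketched numerically. This is a simplified illustration with made-up event counts; a rigorous non-inferiority analysis compares the confidence interval of the effect against the margin, not just the point estimate:

```python
# Hypothetical trial of new Drug Y vs. old Drug X (made-up numbers).
# The investigators pre-specify a non-inferiority margin of 2 on the
# risk ratio, as in some of the anticoagulant studies mentioned above.
events_old, n_old = 20, 1000   # 2% event rate on the old drug
events_new, n_new = 30, 1000   # 3% event rate on the new drug
margin = 2.0                   # risk-ratio ceiling chosen by the investigators

# Integer cross-multiplication avoids floating-point noise: (30/1000)/(20/1000)
risk_ratio = (events_new * n_old) / (events_old * n_new)   # 1.5

# With a margin of 2, up to twice as many events still "passes" as
# non-inferior, even though the new drug has 50% more events here.
non_inferior = risk_ratio < margin
```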

**Uh, oh … Here comes the math. This section is important, especially the concepts of positive and negative predictive values (PPV and NPV) and the concepts of sensitivity and specificity. You need not do the math if you do not want to (although it is simple). Here is a summary:**

**Sensitivity: How often the test will pick up the disease if it is there.** Sensitivity = true positives/(true positives + false negatives). Note that the sum of true positives + false negatives represents all of the people with disease.

**Specificity: Specificity is defined as the proportion of patients who do not have the disease and who will test negative for it.** Specificity = true negatives/(true negatives + false positives). Note that the sum of true negatives + false positives represents all of the people who do not have disease.

**Positive predictive value:** The probability that someone with a positive test actually has the disease. This takes the prevalence of a disease into account. For example, an individual with a positive HIV test who is an IV drug user is more likely to really have the disease than a clean-living nun with a positive HIV test. In the nun, the test is more likely to be a false positive.

**Negative predictive value:** The probability that someone with a negative test actually does not have the disease. Again, this takes the prevalence of the disease into account. So, for example, a negative HIV test in an IV drug user from Sub-Saharan Africa with a CD4 count of 150/mm^{3} and PCP is likely to be a false negative. Conversely, a negative HIV test in a nun, for example, is likely to be a true negative.

**A new test, the "reception-o-meter," has been developed that can tell whether a cell phone will have reception in a given area (allegedly better than a guy walking around asking, "Can you hear me now?"). When compared with the gold standard of turning on your cell phone and checking whether you have reception or not, the new test has a sensitivity of 90% (will pick up a signal 90% of the time when there is one) and a specificity of 95% (there are only 5% false positives; thus, 95% of the time when the reception-o-meter says there is a signal, there will actually be one).**

**So how can you tell if the phone company is pulling a fast one or if this is a good test? You need to know the PPV of the test. In order to calculate the PPV, you need three pieces of data: the sensitivity of the test (how often the test will pick up the "disease" if it is there), the specificity of the test (how often you will get a false positive), and the prevalence of the condition, which in this case is the prevalence of having cell phone reception (in other words, the true amount of cell phone reception in a given area).**

**You are currently in Los Angeles, attending a CME course where the reception for carrier X is 99%. You check your "reception-o-meter" and it says you have coverage. But does this mean you have coverage?**

In order to answer this question, you can use Bayes theorem or set up 2 × 2 tables. Here's the 2 × 2 table method. Begin by drawing a 2 × 2 table and filling in what you know. See Table 28-4 and Table 28-5.

| Test | Disease + | Disease – |
|---|---|---|
| + | a (true positive) | b (false positive) |
| – | c (false negative) | d (true negative) |
| Total | a + c | b + d |

| Test | Actual Reception + | Actual Reception – |
|---|---|---|
| + | 90 (true positive) | 5 (false positive) |
| – | 10 (false negative) | 95 (true negative) |
| Total | 100 | 100 |

So this makes it easy: if we sample 100 phones with reception and 100 without, the data will look like the table above.

So let's add actual numbers to the table (above). Let's use a population of 10,000. We multiply by the prevalence of reception to get the subpopulation totals. Ninety-nine percent of the population has reception (99% prevalence). So, 99% prevalence × 10,000 = 9,900 with reception; 1% × 10,000 = 100 without reception. Once we have these numbers, we simply multiply by the sensitivity and the specificity to get the exact cell numbers to plug into the table above: 9,900 × 90% sensitivity = 8,910 for cell A ("true positives"); 9,900 – 8,910 = 990 for cell C ("false negatives"); 100 × 95% specificity = 95 for cell D ("true negatives"); 100 – 95 = 5 for cell B ("false positives"). See Table 28-6.

After adding actual numbers (99% prevalence):

| Test | Actual Reception + | Actual Reception – |
|---|---|---|
| + | 8,910 | 5 |
| – | 990 | 95 |
| Total | 9,900 (99% prevalence) | 100 |

Once the table is filled in, these numbers can then be used to calculate the PPV, using the equation above. In this case, a/(a + b) = 8,910/8,915 = 99.9%.

For those who prefer the Bayes theorem method, here's how this approach is done. Bayes theorem shows the relationships between sensitivity, specificity, prevalence, PPV, and NPV. The equation for PPV, derived from Bayes theorem, is: PPV = (sensitivity × prevalence)/[(sensitivity × prevalence) + (1 − specificity) × (1 − prevalence)]. Plugging in the numbers from the question: PPV = (0.90 × 0.99)/[(0.90 × 0.99) + (0.05 × 0.01)] = 0.891/0.8915 = 99.9%.
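The Bayes-theorem form of the PPV can also be checked numerically. A short sketch assuming the stated sensitivity (90%), specificity (95%), and prevalence (99%):

```python
sens, spec, prev = 0.90, 0.95, 0.99

# Bayes theorem: P(reception | positive reading)
#   = sens*prev / (sens*prev + (1 - spec)*(1 - prev))
ppv = (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
# 0.891/0.8915 ~ 0.999, matching 8,910/8,915 from the 2x2 table
```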

**Question 28.6.1** **What would the likelihood of not having coverage be if the "reception-o-meter" had said you did not have coverage (what is the NPV)?**

**A)** 95%.

**B)** 90%.

**C)** 50%.

**D)** 9%.

**E)** None of the above.

**Answer 28.6.1 The correct answer is "D."** The question asks for the NPV—the likelihood of not having coverage if the reception-o-meter is negative. This also can be derived from Bayes theorem or calculated using a 2 × 2 table. For those of you who prefer the Bayes theorem method, the equation for NPV is: NPV = (specificity × (1 − prevalence))/[(specificity × (1 − prevalence)) + ((1 − sensitivity) × prevalence)]. Plugging in the numbers from the question: NPV = (0.95 × 0.01)/[(0.95 × 0.01) + (0.10 × 0.99)] = 0.0095/0.1085 ≈ 8.8%, which rounds to 9%.
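Numerically, with the same assumed sensitivity (90%), specificity (95%), and prevalence (99%):

```python
sens, spec, prev = 0.90, 0.95, 0.99

# Bayes theorem: P(no reception | negative reading)
#   = spec*(1-prev) / (spec*(1-prev) + (1 - sens)*prev)
npv = (spec * (1 - prev)) / (spec * (1 - prev) + (1 - sens) * prev)
# 0.0095/0.1085 ~ 0.088, i.e., about 9% -- answer "D"
```

At 99% prevalence a negative reading is usually wrong, which is why the NPV is so poor despite excellent sensitivity and specificity.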

**You are now in rural Russia where you were invited to help with community efforts to fight multidrug resistant tuberculosis. Here cell phone reception is 10% for Carrier Y. You check your "reception-o-meter" and it says you have reception.**

**Question 28.6.2** **What is the likelihood that your cell phone actually will have reception if you try to make a call?**

**A)** 91%.

**B)** 83%.

**C)** 67%.

**D)** 16%.

**E)** None of the above.

**Answer 28.6.2 The correct answer is "C."** You can use the 2 × 2 method or the Bayes theorem method.

Here's what our 2 × 2 table looks like. See Table 28-7.

Before adding actual prevalence (50% prevalence):

| Test | Actual Reception + | Actual Reception – |
|---|---|---|
| + | 90 | 5 |
| – | 10 | 95 |
| Total | 100 | 100 |

After adding actual prevalence (10% prevalence):

| Test | Actual Reception + | Actual Reception – |
|---|---|---|
| + | 900 | 450 |
| – | 100 | 8,550 |
| Total | 1,000 | 9,000 |

To convert to 10% prevalence, we start with a large baseline population and multiply by the prevalence to get the subpopulation totals (10% prevalence × 10,000 = 1,000 with reception; 90% × 10,000 = 9,000 without reception). Once we have the subpopulation totals, we multiply by the sensitivity and the specificity to get the exact cell numbers: 1,000 × 90% sensitivity = 900 for cell "a"; 1,000 – 900 = 100 for cell "c" (alternatively, 1,000 × 10% gives the same result for cell "c"); 9,000 × 95% specificity = 8,550 for cell "d"; 9,000 – 8,550 = 450 for cell "b".

These numbers can then be used to calculate the PPV, using the equation above. In this case, a/(a + b) = 900/(900 + 450) = 66.7% (rounds to 67%). Using Bayes theorem: PPV = (0.90 × 0.10)/[(0.90 × 0.10) + (0.05 × 0.90)] = 0.09/0.135 = 66.7%.
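The same 2 × 2 bookkeeping, done in Python for the 10% prevalence scenario (sensitivity and specificity as stated in the vignette):

```python
sens, spec, prev, pop = 0.90, 0.95, 0.10, 10_000

with_reception = pop * prev                 # 1,000 phones with reception
without_reception = pop - with_reception    # 9,000 without
tp = with_reception * sens                  # 900 true positives (cell a)
fp = without_reception * (1 - spec)         # 450 false positives (cell b)
ppv = tp / (tp + fp)                        # 900/1,350 ~ 0.667
```

Note how the same test's PPV drops from 99.9% to 67% purely because the prevalence fell.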

A test that has a negative predictive value of 99% may sound good. But if only 1% of the population has the disease, doing **no** test will have a 99% negative predictive value.

**Question 28.6.3** **Still in that remote tuberculosis-infested region of Russia, we ask: What would the likelihood of having coverage be if the "reception-o-meter" said you did not have coverage?**

**A)** 50%.

**B)** 40%.

**C)** 30%.

**D)** 1%.

**E)** None of the above.

**Answer 28.6.3 The correct answer is "D."** Again, you can use the 2 × 2 method or Bayes theorem. The 2 × 2 table for this question is the same as it was for the previous question (see Table 28-7). However, unlike previously, you are asked for the likelihood of reception if the "reception-o-meter" said there was no reception. In other words, you have been asked for the proportion of negative readings that are wrong, which is 1 − NPV: FN/(FN + TN) = 100/(100 + 8,550) = 100/8,650 ≈ 1.2%, which rounds to 1%. Do not confuse this with the false negative rate (FNR), FN/(TP + FN) = 100/1,000 = 10%, which is calculated among the phones that truly have reception rather than among negative test results.

You were not asked to calculate it, but the analogous quantity for a positive reading is 1 − PPV: FP/(TP + FP) = 450/(900 + 450) ≈ 33%.
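As a numeric check of the same scenario (sensitivity 90%, specificity 95%, prevalence 10%), the chance of reception despite a negative reading works out to about 1%:

```python
sens, spec, prev, pop = 0.90, 0.95, 0.10, 10_000

fn = pop * prev * (1 - sens)               # 100 phones with reception missed (cell c)
tn = pop * (1 - prev) * spec               # 8,550 correctly negative (cell d)
p_reception_if_negative = fn / (fn + tn)   # 100/8,650 ~ 0.012
```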

**Cervical cancer is a disease in which early detection can make a great difference in halting disease progression. One screening procedure for this disease is the Papanicolaou ("Pap") smear. In a (fictional) study to assess the competency of technicians who read the Pap smear slides, a local lab checked their technician's work against patient records.**

**A total of 1,000 Pap smears were read. Of these, 100 patients had cervical abnormalities based on biopsy (gold standard). Of this group, 75 had abnormal (positive) Pap smears and 25 had negative Pap smears. There were 900 women without disease. Of these 900 women, 200 had positive Pap smears and 700 had negative Pap smears. Note that these are example numbers only, have no basis in reality, and do not reflect the actual sensitivities and specificities of Pap smears.**

**Question 28.6.4** **Using the data above, which of the following is true about this survey of Pap smear technicians?**

**A)** FNR is 20%.

**B)** FPR is 15%.

**C)** The sensitivity of the Pap test is 75%.

**D)** The specificity of the Pap test is 98%.

**E)** The prevalence of cervical cancer in this sample is 7.5%.

**Answer 28.6.4 The correct answer is "C."** The sensitivity of the test is 75%. Setting up the data in a 2 × 2 table, we are able to answer the question. See Table 28-8.

| Pap Test Result | Cervical Disease | No Cervical Disease |
|---|---|---|
| Positive | True positive (TP) = 75 | False positive (FP) = 200 |
| Negative | False negative (FN) = 25 | True negative (TN) = 700 |
| Total | TP + FN = 100 | FP + TN = 900 |

**Sensitivity: Probability that a patient with the disease will have a positive result.**

Sensitivity = TP/(TP + FN) = 75/100 = 0.75, or 75% sensitive.

**Specificity: Probability that a patient without the disease will have a negative test.**

Specificity = TN/(FP + TN) = 700/900 = 0.778, or about 78% specific.

**FNR: Patient has the disease but the test is negative.**

FNR = FN/(TP + FN) = 25/100 = 0.25, or a 25% FNR. **Also calculated as 1 – sensitivity.**

**FPR: The patient has a positive test but does not have the disease.**

FPR = FP/(FP + TN) = 200/900 = 0.22, or a 22% FPR. **Also calculated as 1 – specificity.**

**We are going to make another assumption here about the prevalence of disease. The prevalence is the proportion of individuals who have the disease at any point in time. One way to describe it is as follows: prevalence = (TP + FN)/(total population) = 100/1,000 = 10%, or a prevalence of 100 per 1,000 people.**
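Using the Table 28-8 counts, the calculations above can be checked in a few lines of Python:

```python
tp, fp, fn, tn = 75, 200, 25, 700   # counts from Table 28-8

sensitivity = tp / (tp + fn)                   # 75/100 = 0.75
specificity = tn / (tn + fp)                   # 700/900 ~ 0.78
prevalence = (tp + fn) / (tp + fp + fn + tn)   # 100/1,000 = 0.10
```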

**Question 28.6.5** **Given the above results of the Pap smear screening tests and if the prevalence of cervical abnormalities among women is 10%, then applying Bayes theorem, we find:**

**A)** The PPV is 27%.

**B)** The NPV is 96%.

**C)** The PPV is 0.999.

**D)** Unable to solve the problem with data provided.

**E)** A and B.

**Answer 28.6.5 The correct answer is "E."** The prevalence of a disease is the proportion of individuals who have the disease at a given point in time: (TP + FN)/(total population) = 100/1,000 = 0.1, or 10%.

The **PPV** of a test is **the probability that a disease exists given a positive test result** = TP/(TP + FP) = 75/275 = 27%. So, a patient with a positive test result only has a 27% chance of actually having the disease because there are so many false positives.

The **NPV** of a test is the probability of no disease given a negative test result: TN/(FN + TN) = 700/725 = 96%. So, a patient with a negative test has a 96% chance of **not** having the disease. This is because there are few false negatives compared with the size of the overall population. If, for example, there were 200 false negatives in the same population, the negative predictive value would be only 700/900 = 78%.

**Recall that 100 out of 1,000 women had positive biopsies and thus had the disease regardless of what the Pap test said.**

**Question 28.6.6** **How does the pretest probability of cervical abnormalities among women compare with the posttest probability?**

**A)** Posttest probability is about three times greater than the pretest probability.

**B)** Pretest probability is about three times greater than the posttest probability.

**C)** Posttest probability is 10 times greater.

**D)** Pretest probability is 10 times greater.

**E)** The pretest and posttest probabilities are equal.

**Answer 28.6.6 The correct answer is "A."** The pretest probability is given above as 100/1,000 or 10%. We know that 10% of the population has the disease. **The posttest probability is defined as the PPV.** Remember from above, the **PPV** of a test is the probability that a disease exists given a positive test result = TP/(TP + FP) = 75/275 = 27%. Comparing the two results, pretest probability of 10% and posttest probability of 27%, we find that the posttest probability is about three times greater than the pretest probability. If answer "E" were correct and the pretest and posttest probabilities were equal, there would be no point in doing the test.
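The "about three times" claim can be verified directly from the Table 28-8 counts:

```python
tp, fp, fn, tn = 75, 200, 25, 700   # counts from Table 28-8

pretest = (tp + fn) / (tp + fp + fn + tn)   # 100/1,000 = 0.10
posttest = tp / (tp + fp)                   # PPV = 75/275 ~ 0.27
ratio = posttest / pretest                  # ~2.7, "about three times"
```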

Define and calculate sensitivity and then apply it to data interpretation?

Define and calculate specificity and then apply it to data interpretation?

Calculate positive and negative predictive values?

Apply Bayes theorem to determine the utility of a test?

A highly sensitive test helps to rule OUT disease; a highly specific test helps to rule IN disease.

A treatment that is **statistically significantly** superior to placebo may not offer a **clinically significant** benefit. Use clinical judgment when interpreting study results.

Compare number needed to treat (NNT) with number needed to harm (NNH) when considering therapies, rather than relying on relative risk reduction. The same calculation can be done for screening tests (e.g., number of women needed to screen to avoid one breast cancer death).
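The NNT arithmetic behind that tip looks like this with made-up event rates (4% on placebo vs. 2% on treatment; not from any real trial):

```python
control_rate, treated_rate = 0.04, 0.02   # hypothetical event rates

arr = control_rate - treated_rate   # absolute risk reduction = 0.02
rrr = arr / control_rate            # relative risk reduction = 50% (sounds big)
nnt = 1 / arr                       # treat 50 patients to prevent one event
# NNH is computed the same way, from the absolute rate of harms.
```

A "50% relative risk reduction" sounds impressive, but the NNT of 50 conveys the clinical effort involved far more honestly.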

Do not draw conclusions from subgroup analyses. The only conclusion that can be drawn is, "This must be studied."

Recognize that the utility of a test is contingent upon the sensitivity and specificity of the test and the prevalence of disease in the population being tested. Therefore, a sensitive and specific test may have a low predictive value in a population with very low disease prevalence.

When evaluating a non-inferiority study, look for the "margin" used by the investigators. This is the maximum extent of clinical difference that will be considered non-inferior (e.g., a margin of 2 means twice as many events can occur in the experiment group and still be considered non-inferior).
