++
Relatively few physicians design clinical trials, and a limited
number enter patients into clinical trials—but all clinicians
read published accounts of clinical trials and use the results to
guide the treatment of patients. A checklist of questions to help
the physician interpret and evaluate clinical trials appears in
Table 7–6.
++
The null hypothesis—and what would constitute a meaningful
difference in outcome—should be stated in the methods section
of a reported clinical trial. In the diabetes therapy trial, the
primary outcome and the difference that was considered meaningful
were clearly stated prior to beginning the trial. Trials should
be designed to test one specific hypothesis or only a few hypotheses,
and these should be evident to the reader.
++
The characteristics of the study population are particularly
important when assessing the relevance of a specific trial to an
individual practitioner’s patients. In the diabetes therapy
trial, the eligibility and exclusion criteria are clearly stated,
as summarized in Table 7–7. Summary data about sex, age,
and medical histories of the participants appear in Table 7–4.
The combination of the entry criteria and the demographic characteristics
of the study’s entrants provides the reader with an adequate
description of the study group. With this information, the reader
is better able to judge whether the results of this study are applicable
to a particular patient. Patients are often excluded from a trial
for pragmatic reasons, such as inability to comply with treatment
or lack of fluency in the language of the investigators. These exclusions,
however, may limit the ability to generalize study findings to other
patient groups.
++
++
Once randomized to a particular treatment regimen, a patient
may adhere (comply) or elect not to follow the prescribed regimen.
Possible reasons for noncompliance are listed in Table 7–8.
The investigator cannot force participants to comply, since such
coercion would violate the rights of subjects to participate of
their own free choice.
++
++
There are several possible ways, however, to increase compliance
in a clinical trial (Table 7–9). The investigator may be
able to select subjects who can be expected to be compliant. Motivation to
participate is likely to be enhanced if the patients perceive themselves
to be at high risk of an adverse health consequence. In the diabetes
therapy trial, all enrolled subjects had an illness that put them
at risk for the long-term vascular complications of diabetes. Also,
participants are apt to be motivated to comply if the treatments
offered may reduce the need for painful or debilitating therapy.
In the diabetes therapy trial, for instance, patients may have hoped
that more intensive therapy, or participation in a clinical trial
with stringent guidelines for treatment, would help reduce the long-term
complications of disease.
++
++
In nonurgent situations, the investigator may be able to assess
probable compliance before randomization is performed. To monitor
compliance, the investigator may ask eligible participants to take
either an active or an inert medication. This pretest interval is
referred to as a “run-in” test period. Individuals
who show a likelihood of compliance are randomized to the treatments
of interest, and those likely to be noncompliant are excluded from
the trial. Other strategies to increase compliance include providing
incentives for participation or maintaining frequent contact with
subjects. Compliance also is likely to be enhanced by keeping the
duration of the intervention as brief as possible.
++
Regardless of how carefully a clinical trial is designed, it
is likely that some subjects will not adhere to the treatment regimen.
The extent to which participants actually comply can be assessed by
various approaches. Personal reports by subjects and family members
provide a simple but possibly unreliable basis on which to determine
compliance. In drug studies, a traditional approach to assessing
compliance is to count the number of unused pills at regular intervals.
However, because pills can disappear for reasons other than ingestion
by the subject, pill counts provide suggestive but not definitive
evidence of compliance. The most conclusive evidence of compliance
with a drug regimen is likely to be obtained by measurement of the
drug (or a metabolite) in the subject’s blood or urine.
However, even this type of biological assay has limited utility,
with the most obvious constraints being cost, inconvenience to subjects,
and the difficulty of collecting specimens from some individuals.
Moreover, long-term compliance typically cannot be assessed by measurements
of such specimens, as the presence of most drugs is detectable in
blood or urine for no more than a few days. The diabetes therapy
trial was fortunate to be able to measure glycosylated hemoglobin,
a long-term measure of blood glucose control, as a secondary end point.
Although measurement of glycosylated hemoglobin levels is not a
direct measure of compliance, it provided an independent measure of
treatment effect that would be expected to differ between the two
treatment groups if the treatments truly had different effects on
subjects’ plasma glucose levels.
++
Despite the difficulties inherent in assessing compliance, it
is important to estimate the extent to which subjects adhere to
the assigned regimens. The ability of a study to identify a true
effect of treatment (statistical power) may be diminished if a substantial
proportion of the participants do not comply with the assigned treatment.
That is, the observed difference in outcomes between the study groups
may be reduced because of noncompliance. Accordingly, it may be
necessary to include a larger initial sample size to compensate
for the loss of discriminatory power. As will be emphasized later
in this chapter, it is important to include all randomized patients
in the main analysis of a clinical trial. Therefore, every effort
should be made to determine the outcomes of both compliant and noncompliant
subjects.
++
Loss of some patients to follow-up is likely to occur in any
clinical trial. The greater the number of patients who are lost
and the less information available about them, the less confidence
can be placed in the results of the trial. In the analysis of results
from a clinical trial, patients should be
left in the treatment group originally assigned by the study (intention
to treat) even if they received one of the other treatments after
the original treatment regimen failed. In the diabetes therapy
study, 95 women assigned to the standard therapy group received more intensive
therapy during a pregnancy. In the analysis, these women remained
in the standard therapy group. In another smaller subgroup of patients,
researchers discontinued intensive therapy and resumed standard
therapy. However, these patients remained in the intensive therapy
group during analyses. This may seem counterintuitive, as these
patients actually received some of both treatment regimens. The
purpose of this trial, however, was to help clinicians determine
the best treatment at the time of initial presentation for entry
into the trial. Subsequent treatment decisions may include the alternative
treatment, but those subsequent decisions have no bearing on the
original clinical question posed by the trial and thus are not pertinent
to group assignment. Design of the diabetes therapy trial with consideration
of intention to treat is illustrated schematically in Figure 7–4.
++
++
All participants who are randomized should be included in the
analysis of a clinical trial. Selective removal of subjects from
the comparison of outcomes, even if it seems justified for pragmatic
reasons, may lead to erroneous conclusions. Consider, for example,
the question of whether to include noncompliers in the analysis.
Because these individuals did not receive the assigned treatments
as intended, it may seem illogical to leave them in the analysis.
It has been shown, however, that noncompliers tend to have worse
outcomes than compliers, regardless of their treatment assignment.
If the treatment assignment affects the level of compliance, then
excluding noncompliers from the analysis can produce a misleading
result.
++
Removal of noncompliers from the analysis may also limit the
ability to generalize study findings to clinical practice. In recommending
treatment to a particular patient, the physician must consider the
possibility that the treatment will not be completed as intended.
The essential question of a clinical trial is whether a treatment
should be offered at a particular point in time. Therefore the relevant
information on treatment benefit is the outcome among all patients
who were offered the treatment, rather than just those who completed
it.
++
A well-reported clinical trial should contain enough of the primary
data to enable the reader (1) to compare the main outcome measure
between the treatment groups and (2) to perform basic statistical
tests to determine whether it is reasonable to exclude chance variation
as a cause of observed differences between the compared groups.
For the clinical trial, it is useful to review three very common
types of comparisons: the comparison of two risks, the comparison
of time to an event (survival analysis), and the comparison of two
means.
++
Many outcomes from a clinical trial are yes/no outcomes
(eg, death or no death, cure or no cure, recurrence or no recurrence)
and therefore can be displayed in simple tabular format. Whether
a patient developed retinopathy in the primary prevention group
in the diabetes therapy trial is presented in Table 7–10.
++
++
The incidence rate of developing retinopathy in each arm of the
primary treatment group was determined by dividing the number of
subjects who developed retinopathy by the number of person-years
of follow-up. Person-years are calculated
by summing the years that each subject is in the study prior to
the development of retinopathy. This allows all subjects to be included
in the results of a study, regardless of how long each subject was
enrolled. This issue is common in clinical trials, as recruitment
of subjects into a trial often occurs over several years. The incidence
rate (IR) of retinopathy for the standard
therapy group was calculated to be 4.7 per 100 patient-years of follow-up,
and 1.2 per 100 patient-years of follow-up for the intensive therapy
group. There are several ways to compare these two rates.
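These incidence-rate calculations can be sketched in a few lines of Python. The event counts and person-year totals below are hypothetical values chosen only so that the computed rates match the reported 4.7 and 1.2 per 100 patient-years; they are not the trial's actual data.

```python
def incidence_rate(events, person_years, per=100):
    """Incidence rate expressed as events per `per` person-years of follow-up."""
    return events / person_years * per

# Hypothetical counts chosen to reproduce the reported rates.
ir_standard = incidence_rate(91, 1936)   # standard therapy arm
ir_intensive = incidence_rate(23, 1917)  # intensive therapy arm
print(round(ir_standard, 1), round(ir_intensive, 1))  # 4.7 1.2
```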
++
One way to compare the rates is to calculate the percentage of
incidence of retinopathy that would be avoided if intensive therapy
were used instead of standard therapy. This percentage is known as
the percentage rate reduction and
is calculated as follows:
++
Percentage rate reduction = [(IRstandard − IRintensive) / IRstandard] × 100%
++
If the percentage rate reduction = 0, there is no reduction
in incidence rate attributable to the new therapy, and the treatments
are judged to be equivalent. The further the percentage rate reduction is
from zero, the greater the difference between the two groups. For
the diabetes therapy trial, the percentage of retinopathy that could
have been prevented by patients using intensive therapy rather than
standard therapy is calculated as follows:
++
Percentage rate reduction = [(4.7 − 1.2) / 4.7] × 100% ≈ 74%
++
That is, almost three fourths of the retinopathy that occurred
in the standard therapy group could have been avoided if those patients
had been treated with intensive therapy. The value of 74% is known
as a point estimate because it is
the single value along the scale from 0 to 100% that is
most consistent with the results of the trial.
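As a minimal sketch, the percentage rate reduction can be computed directly from the two reported incidence rates:

```python
def percentage_rate_reduction(ir_control, ir_treatment):
    """Percentage of the control-group incidence rate avoided by treatment."""
    return (ir_control - ir_treatment) / ir_control * 100

# Reported rates per 100 patient-years: standard 4.7, intensive 1.2.
prr = percentage_rate_reduction(4.7, 1.2)
print(round(prr))  # 74
```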
++
A useful method to gauge the precision of any point estimate
is to calculate the 95% confidence intervals for the estimate.
If the clinical trial were repeated many times and a 95% confidence
interval were calculated each time, those intervals would include
the true value 95% of the time. If the 95% confidence interval of
the percentage rate reduction includes 0, the data are consistent
with the null hypothesis, and the difference between the groups is
not statistically significant at an alpha level of 0.05. If the
95% confidence interval does not include 0, the difference is
statistically significant at an alpha level of 0.05.
++
The approximate 95% confidence interval (CI) for the percentage rate reduction
in the diabetes therapy trial calculated above is 60–83%.
Since the interval does not include 0, this decreased rate is considered
statistically significant at an alpha level of 0.05. This means
that, given the observed data, the interval from 60% to 83% is
the range of percentage rate reductions most consistent with the
results; if the trial were repeated many times, intervals constructed
in this way would contain the true percentage rate reduction
95% of the time. The 95% confidence
interval for the percentage rate reduction for retinopathy is illustrated
schematically in Figure 7–5.
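The source does not state how this interval was computed, but one standard approach builds the confidence interval on the log rate ratio, whose standard error for Poisson event counts is approximately sqrt(1/a + 1/b). The event counts below are hypothetical, chosen only to be consistent with the reported rates; with them, this sketch reproduces bounds close to the reported 60–83%.

```python
import math

def rate_reduction_ci(events_trt, py_trt, events_ctl, py_ctl, z=1.96):
    """Approximate 95% CI for the percentage rate reduction, built on
    the log rate ratio (SE = sqrt(1/a + 1/b) for Poisson counts)."""
    rr = (events_trt / py_trt) / (events_ctl / py_ctl)
    se = math.sqrt(1 / events_trt + 1 / events_ctl)
    rr_lo = math.exp(math.log(rr) - z * se)
    rr_hi = math.exp(math.log(rr) + z * se)
    # Converting rate-ratio bounds to percentage-rate-reduction bounds
    # reverses their order: a smaller rate ratio is a larger reduction.
    return (1 - rr_hi) * 100, (1 - rr_lo) * 100

lo, hi = rate_reduction_ci(23, 1917, 91, 1936)  # hypothetical counts
print(round(lo), round(hi))  # roughly 60 and 84
```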
++
++

Another method of comparing two rates is by forming a ratio of
the rates, the so-called
rate ratio (
RR). For other situations, in which
risks rather than rates of events are estimated, the risks of end points
can be measured for the experimental and control groups, then contrasted
by dividing the risk of adverse events for the experimental group
by the risk of adverse events for the control group; this is the
risk ratio or
relative risk. If the
RR = 1.0, the rate (or risk)
of the outcome of interest in the two treatment groups is exactly
equal. The farther the ratio is from 1.0, the greater the difference
in rate (or risk) between the two groups. For this trial, the rate
of retinopathy in the intensive therapy group compared with the
standard therapy group would be calculated as follows:
++
RR = IRintensive / IRstandard = 1.2 / 4.7 ≈ 0.26
++
That is, the rate of developing retinopathy in the intensive
treatment group is about one quarter that of the standard therapy
group. The value of 0.26 is another example of a point estimate, because
it is the single value along the rate ratio scale most consistent
with the observed data. Similar to the percentage rate reduction,
95% confidence limits for the RR point
estimate can also be calculated. If the 95% confidence
interval includes the null value of 1.0, the difference between rates
in the two groups is not statistically significant, and the null hypothesis
would not be rejected. If the 95% confidence interval does not include
1.0, the difference in rates between the two groups would be considered
statistically significant at the 5% level. The point estimate
and 95% confidence interval for the rate ratio of retinopathy
in the diabetes therapy trial is illustrated schematically in Figure
7–6. Note that the point estimate of the RR does not lie at the midpoint of
the 95% confidence interval. The asymmetry of the confidence interval
occurs because the distribution of possible values of the RR is skewed toward the right (ie,
all the values corresponding to a benefit for intensive therapy
are compressed into the range of zero to 1, whereas the values corresponding
to a benefit for standard therapy are spread from 1 to positive
infinity).
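This asymmetry can be checked numerically. The bounds below are illustrative values similar to those implied by the reported interval, not exact trial values; the point estimate sits near the geometric (log-scale) midpoint of the interval rather than the arithmetic one.

```python
import math

# Illustrative rate ratio and confidence bounds (not exact trial values).
rr, rr_lo, rr_hi = 0.26, 0.16, 0.40

arithmetic_mid = (rr_lo + rr_hi) / 2                      # 0.28, not 0.26
geometric_mid = math.exp((math.log(rr_lo) + math.log(rr_hi)) / 2)
print(round(arithmetic_mid, 2), round(geometric_mid, 2))  # 0.28 0.25
```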
++
++
The RR is a simple, easily understood
method to evaluate results of clinical trials. For time-to-event
data, however, survival analysis has several advantages over the RR. The survival curve, as described
in Chapter 2: Epidemiologic Measures, is a graphic presentation of time-to-event data. Since
it graphically depicts events as they occur over time, the survival
curve provides information on the rapidity with which events occur.
Furthermore, the survival curve can make use of data from patients
who are followed for varying lengths of time. For the diabetes therapy
trial, the cumulative risk of retinopathy was plotted over time
(Figure 7–7). Although this figure is not a display of
survival (life versus death), it does represent “survival” without
the occurrence of the event of interest, in this case retinopathy.
A patient who is followed for only 3 years provides useful information
on risk of developing retinopathy for that period of time, but would
provide no information pertinent to a comparison beyond 3 years.
Survival analysis also allows the median survival duration (eg,
time to retinopathy development) as well as the percentage of survivors
(eg, persons without retinopathy) to be estimated at any time along
the curve.
++
++
Time since first treatment is depicted along the horizontal axis,
and the percentage of patients with retinopathy is displayed on
the vertical axis. At the time of initial treatment (years = 0),
0% of the patients in each group have developed retinopathy
(100% have survived without retinopathy). As time from
treatment progresses, the percentage of patients who are diagnosed
with retinopathy increases, although more rapidly in the control
group. At the end of 9 years of follow-up, 14% of the patients
treated with intensive therapy have been diagnosed with retinopathy
compared with 55% of the standard therapy subjects.
++
The survival curves could be used to estimate the relative risk
of being diagnosed with retinopathy at any point in time. For example,
the intensive to conventional group relative risk of retinopathy
at 5 years is as follows:
++
++
This relative risk indicates that the intensive therapy subjects
have about 60% less risk than the conventional therapy
subjects of developing retinopathy within 5 years. Alternatively,
the median time to development of retinopathy for the two groups
can be estimated and contrasted. The median time is the point at
which half of an initial study group remains free of the occurrence
of interest. In this example, the estimated median time to development
of retinopathy for the standard therapy group is 8.5 years. The
median time to development of retinopathy for the intensive therapy
group has not been reached after 9 years of follow-up. That is,
patients treated with intensive therapy are developing retinopathy
at a slower rate than patients treated with standard therapy.
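A minimal product-limit (Kaplan-Meier) estimator shows how patients followed for different lengths of time all contribute to the curve, and how a median time to event is read off it. The six-subject cohort below is made up purely for illustration, and the sketch ignores refinements such as tie-handling conventions.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the proportion remaining event-free
    over time. `events[i]` is 1 if the outcome (eg, retinopathy)
    occurred at `times[i]`, 0 if the subject was censored then."""
    at_risk = len(times)
    surv, curve = 1.0, []
    for t, d in sorted(zip(times, events)):
        if d:                # an event reduces the survival estimate
            surv *= (at_risk - 1) / at_risk
        curve.append((t, surv))
        at_risk -= 1         # events and censorings both leave the risk set
    return curve

# Tiny illustrative cohort: follow-up years, with 0 marking censoring.
curve = kaplan_meier([1, 2, 3, 4, 5, 6], [1, 0, 1, 1, 0, 1])
median_time = next(t for t, s in curve if s <= 0.5)
print(median_time)  # first time the event-free proportion falls to 50% or below
```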
++
Several tests of significance can be used to compare survival
curves (see Dawson and Trapp, 2004). As is demonstrated by the diabetes
trial data, although this type of analysis is called survival analysis,
it is not limited to the analysis of deaths. Any event that occurs
over time (time to disease recurrence, time to return to work, etc)
can be compared in this fashion.
++
Another comparison frequently used in clinical trials is the
comparison of means. In the diabetes therapy trial, blood glucose
levels were measured at the onset of the study and at regular intervals thereafter.
In Table 7–4, one of the baseline characteristics compared
between the standard therapy and intensive therapy groups is mean
blood glucose level. It is possible to distribute patients into
several discrete categories based on blood glucose level (eg, <
140, 140–179, 180–239, 240 or greater), but that
would involve a loss of useful information about the actual observed
glucose measurements. Instead, the mean or average blood glucose
level can be compared. The null and alternative hypotheses for this
comparison would be stated as follows:
++
H0: mean blood glucose (standard therapy) = mean blood glucose (intensive therapy)
HA: mean blood glucose (standard therapy) ≠ mean blood glucose (intensive therapy)
++
A test of the equality of two means can be accomplished by performing
a t test as follows:
++
t = (x̄1 − x̄2) / [sp √(1/n1 + 1/n2)]
++
where x̄1 and x̄2 are the observed mean blood
glucose levels of the standard and intensive treatment groups,
respectively; sp is the pooled estimate of the standard deviation
of the two groups; and n1 and n2 are the sample sizes for
each group. To illustrate the use of a t test,
the observed mean blood glucose levels in patients treated with
intensive and conventional therapy can be compared as follows:
++
++
A t statistic of 22.8 for this sample
size corresponds to a p value of <
0.0005. In other words, there is less than a 0.05% chance
that a difference in means as large as that observed with these
sample sizes could have occurred by sampling variability alone. Accordingly,
the null hypothesis of no difference between the means is rejected.
If a study concludes that no difference exists between treatment
regimens, the amount of difference the authors thought was important
should be specified, as well as the likelihood that the study did
not find a difference due to chance alone. The likelihood that a
negative result is due to chance (ie, that a true difference is
missed) is the beta error; 1 minus the
beta error (1 – β) is the statistical power (see Sample
Size Determination).
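The pooled t statistic can be computed from summary statistics alone. The means, standard deviations, and sample sizes below are hypothetical stand-ins (the chapter does not reproduce the trial's actual values here), so the resulting statistic will not match the reported 22.8.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic using the pooled standard deviation,
    computed from summary statistics only."""
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Hypothetical summary statistics in mg/dL; not the trial's actual values.
t = pooled_t(mean1=231, sd1=55, n1=730, mean2=155, sd2=30, n2=711)
print(round(t, 1))
```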
++
In the preceding section, the analysis of a single clinical trial
was discussed. Rarely is a single study capable of providing the
definitive answer to a clinical question. The diabetes therapy trial discussed
in this chapter is one of the most important contributions to the
evidence on the value of intensive treatment in reducing the complications
of diabetes mellitus. The strengths of the study included its comparatively
large sample size and long duration of follow-up. The substantial reduction
in the risk of complications with intensive therapy suggests that
a clinically meaningful benefit can be obtained, and this benefit
was shown to be unlikely to have occurred by chance alone.
++
However, the tight eligibility requirements for this study produced
a study population that was relatively young and healthy and that
included few minority participants. Whether these results apply to patients who
are older, sicker, or members of minority groups cannot be determined
from this study. To assess whether the results of this clinical
trial are applicable to patients who did not meet the eligibility
requirements of this study, it is useful to consider whether other
clinical trials have been conducted on this topic. If other clinical
trials exist, they may have included patients with characteristics
different from those in this investigation. Treatment effects then
can be compared across these studies. If the results of the various
studies show a consistent pattern of results, then it is possible
to be more confident that the benefits of intensive therapy are
not confined to a narrow subset of patients.
++

One approach to considering the results of multiple studies is
referred to as
meta-analysis. The
term meta-analysis refers to
a statistical
analysis that combines or integrates the results of several independent
clinical trials. A meta-analysis may be thought of as a special
type of
systematic review. A systematic
review is
any type of synthesis of evidence
on a topic that has been prepared using strategies to minimize errors. When
the results of individual studies in a systematic review are not
combined statistically, the product may be described as a qualitative
systematic review. A meta-analysis, on the other hand, is a systematic
review in which the results of two or more studies are combined
statistically.
++
A well-conducted meta-analysis offers several advantages over
other types of reviews. First, a meta-analysis allows the direct
presentation of the data on which the summary conclusions are made.
Accordingly, the results of a meta-analysis are likely to be more
data driven and objective than in other types of reviews. Second,
the statistical summation in a meta-analysis produces a quantitative
outcome measure and leads to precise estimates of effect. Third,
if outcomes vary across studies, it may be possible to develop explanations
for the pattern of variation in results, thereby gaining greater
insight into the clinical question.
++
In essence, a meta-analysis is a study of other studies. As with
any other type of investigation, a meta-analysis should be planned
in advance and should follow a research protocol. The research plan
should specify the hypotheses, the sampling strategy, the inclusion
criteria, the data to be collected, and the approach to analysis
of the information. At each of these steps, strategic decisions must
be made that will impact the final product.
++
To illustrate this point, consider the seemingly straightforward
issue of selecting the studies for inclusion. If interest is centered
on clinical trials concerning intensive therapy for diabetes mellitus,
a search of the published literature on this topic seems to be the
logical sampling strategy. Because not all studies are published,
the published literature may not yield a fair representation of
all available information on the topic. It has been shown that studies
with statistically significant findings are more likely to be published
than studies with negative results. Also, studies sponsored by governments
or nonprofit organizations are more likely to be published than
those sponsored by the private sector (presumably because of the
desire to protect proprietary information). Additional sampling
distortion can result from the fact that papers published in certain influential
journals are more likely to be cited, and therefore are easier to
identify than works published elsewhere. Also, some studies result
in multiple publications, which makes them easier to detect. In
some instances, it may not even be possible for the meta-analyst
to determine that two separate published papers arise from the same
source clinical trial population. For pragmatic reasons, a meta-analysis
may be confined to English language publications, which selects
for more widely read journals and may result in an overrepresentation
of studies with positive results.
++
Thus, the published literature on a topic may selectively exclude
information that would affect the ultimate conclusion. This potential
source of error is sometimes referred to as publication
bias. Attempts can be made to identify and include unpublished
studies in a meta-analysis, but there are obvious difficulties in
locating such information and securing access to it. One approach
to finding unpublished studies is to search for relevant investigations
in a registry of clinical trials. Since studies are registered before
they are completed, their inclusion in a registry is less likely
to be influenced by whether the results were positive. If both published
and unpublished studies are included, it is possible to analyze
the results separately. An apparent difference in results between published
and unpublished studies suggests the possibility of a publication
bias.
++
Once the universe of clinical trials on a topic is identified,
the meta-analyst must decide which specific studies to include in
the analysis. This process is analogous to the need to establish eligibility
criteria for enrolling patients in a clinical trial. For a meta-analysis,
the decision about whether to include a particular study typically
is based on (1) an assessment of its quality and (2) the ability
to combine it with other studies based on respective patient populations,
treatment regimens, and outcomes. Judging quality can be a subjective
process, so it is advisable to limit inclusion criteria to basic
features of a well-designed clinical trial. Examples of such inclusion
criteria might include
++
- 1. Proper randomization of treatment assignments
- 2. Blinded assessment of outcome
- 3. Analysis based on the intention-to-treat principle.
++
Once the eligible source studies are selected, a standard abstract
form should be used to extract key information. Independent abstraction
of information by two reviewers is a useful practice to help reduce
errors in data collection. Blinding of the abstractors to the identity
of the original investigators, their affiliations, sources of support,
and publication may help to reduce the potential of observation
bias.
++
The outcome must be measured in a similar way across studies.
For dichotomous outcomes (eg, the development of retinopathy or
not), a rate ratio or relative risk may be the appropriate measure.
For continuous outcomes (such as blood glucose level), a difference
in means for the experimental and control groups may be employed.
It should be noted, however, that the magnitude of a mean difference
is affected by the distribution of the underlying population. For
example, the mean difference in blood glucose levels between experimental
and control subjects would be expected to be greater in a study
population of diabetics with higher initial blood glucose levels than
in a study population with lower initial levels. Accordingly, differences
are often presented in units of standard deviation, thereby adjusting
for the underlying distribution of values.
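The point about standard-deviation units can be illustrated with a standardized mean difference (Cohen's d). The two hypothetical studies below have very different raw mean differences in blood glucose because their underlying distributions differ, yet nearly identical standardized differences.

```python
import math

def standardized_mean_difference(mean1, mean2, sd1, sd2, n1, n2):
    """Difference in means expressed in pooled standard-deviation units."""
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (mean1 - mean2) / sp

# Two hypothetical studies with different baseline glucose distributions.
d_high = standardized_mean_difference(155, 231, 30, 55, 100, 100)  # raw diff 76 mg/dL
d_low = standardized_mean_difference(120, 158, 15, 28, 100, 100)   # raw diff 38 mg/dL
print(round(d_high, 2), round(d_low, 2))  # both close to -1.7
```

Despite raw differences of 76 versus 38 mg/dL, the two studies show comparable effects on the standardized scale.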
++
At the heart of a meta-analysis is the statistical combination
of results from the individual clinical trials. The simplest approach
to this combination would be to calculate the arithmetic average (mean)
of the individual results. The simple mean of individual results
would give equal weight to each of the studies. Because studies
with smaller sample sizes are more prone to the effects of chance
variation than studies with larger sample sizes, it is desirable
to assign less influence to the smaller studies. By calculating
a weighted average (in contrast to
a simple or unweighted average) of results, more emphasis can be
placed on the most statistically precise individual trial findings.
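Inverse-variance weighting is the usual way to implement this. A sketch with hypothetical log rate ratios and standard errors from three trials:

```python
def inverse_variance_pool(estimates, ses):
    """Fixed-effect pooled estimate: each study is weighted by the inverse
    of its variance, so more precise (usually larger) studies count more."""
    weights = [1 / se**2 for se in ses]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)

# Hypothetical log rate ratios and standard errors from three trials.
log_rrs = [-1.35, -0.90, -1.10]
ses = [0.20, 0.45, 0.30]
print(round(inverse_variance_pool(log_rrs, ses), 2))  # -1.23
```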
++
The statistical models used to combine results fall into two
broad categories. The so-called fixed-effects
model assumes that any differences in results across individual
studies are attributable entirely to random variation. In contrast,
the random-effects model assumes that
the underlying relationship in question varies across studies in
addition to the influences of random variation. When the results
of the individual clinical trials being summarized are similar,
minimal differences between the fixed- and random-effects models
will be obtained. In general, however, the random-effects model
will lead to somewhat less precise summary estimates, as reflected
in wider confidence intervals. When the results of the underlying
clinical trials vary widely, they are said to be heterogeneous. Under conditions of
heterogeneity, the summary estimates obtained through the fixed-
and random-effects models will differ, possibly to a considerable
extent. It is important, therefore, to be able to determine whether
the results of individual clinical trials are heterogeneous.
++
One approach to addressing the question of whether the variation
across individual study results can be attributed to random variation
alone is to perform a statistical test for heterogeneity. A statistically
significant result for a test of heterogeneity would suggest that
the variation across studies is greater than can be attributed to
random variation alone. In such instances, therefore, a random-effects
model would be the preferred approach to summarizing the data. On
the other hand, if the test of heterogeneity is not statistically
significant, the level of variation observed across individual study
results can be explained by random variation. In such circumstances,
a fixed-effects model would be an acceptable approach to summarizing
the data.
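One common way to operationalize this choice is Cochran's Q test for heterogeneity together with the DerSimonian-Laird estimate of the between-study variance (tau squared), which, when greater than zero, is added to each study's variance to form random-effects weights. The study results below are hypothetical and deliberately heterogeneous.

```python
def cochran_q(estimates, ses):
    """Cochran's Q: weighted squared deviations of each study's result
    from the fixed-effect pooled estimate."""
    w = [1 / se**2 for se in ses]
    pooled = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)
    return sum(wi * (e - pooled)**2 for wi, e in zip(w, estimates))

def dersimonian_laird_tau2(estimates, ses):
    """DerSimonian-Laird between-study variance; 0 when Q shows no more
    variation across studies than chance alone would produce."""
    w = [1 / se**2 for se in ses]
    q = cochran_q(estimates, ses)
    df = len(estimates) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    return max(0.0, (q - df) / c)

# Hypothetical, deliberately heterogeneous study results (log rate ratios).
log_rrs = [-1.8, -0.3, -1.1]
ses = [0.20, 0.45, 0.30]
q = cochran_q(log_rrs, ses)
tau2 = dersimonian_laird_tau2(log_rrs, ses)
# Random-effects weights add tau2 to each study's variance, so small
# studies are down-weighted less severely than under the fixed-effect model.
re_weights = [1 / (se**2 + tau2) for se in ses]
print(round(q, 2), round(tau2, 2))
```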
++
Whether the question of heterogeneity of results should be addressed
only, or even primarily, by a test of statistical significance is
a matter of some debate. Because the tests of heterogeneity have limited
statistical power, they are prone to type II errors, that is, they
may lead to false-negative conclusions about the presence of heterogeneity
in the underlying data. A statistically nonsignificant result may
falsely reassure the meta-analyst (and the reader) that there is
no evidence of heterogeneity in the underlying data. It is recommended,
therefore, that the question of heterogeneity of results across
studies be explored further during the course of the meta-analysis.
++
Heterogeneity of findings across studies can arise for a number
of reasons in addition to random variation. The characteristics
of the individual patient populations may be an important contributor
to differences in observed effects. As previously noted, eligibility
and exclusion criteria for a particular clinical trial may impose
limitations on the selection of subjects based on age, sex, race, general
health status, and other characteristics. To the extent that eligibility
requirements vary across studies, the source populations may differ
in fundamental ways that affect their likelihood of response to
the treatment under investigation.
++
Aspects of the design of a clinical trial, for example, whether
blinding was performed, the rate of compliance with treatment, the
duration of follow-up, and the completeness of follow-up, may influence
the results obtained. Approaches to analysis of individual study
results, such as adhering to the intention-to-treat principle, can
also impact the results.
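As a toy illustration of how the analytic approach can shift results, the sketch below (all counts invented) contrasts an intention-to-treat estimate, which analyzes patients in the groups to which they were randomized, with a per-protocol estimate restricted to compliers:

```python
# Invented counts for a hypothetical trial of 100 treated vs. 100
# control patients. In the treatment arm, 80 patients complied
# (12 events) and 20 did not (6 events); the control arm had 30 events.
itt_risk_tx = (12 + 6) / 100   # intention-to-treat: count all as randomized
pp_risk_tx  = 12 / 80          # per-protocol: compliers only
risk_ctrl   = 30 / 100

itt_rr = itt_risk_tx / risk_ctrl
pp_rr  = pp_risk_tx / risk_ctrl

print(f"ITT risk ratio:          {itt_rr:.2f}")
print(f"Per-protocol risk ratio: {pp_rr:.2f}")
```

In this invented example the per-protocol analysis suggests a stronger benefit than the intention-to-treat analysis, illustrating why two trials of the same treatment can report different effects if they analyze compliance differently.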
++
Whether a clinical trial is stopped early is likely to be related
to the findings. In general, clinical trials are not terminated
prematurely unless an unexpectedly large difference in outcomes
is observed between the experimental and control group. Such a difference
could arise because the experimental treatment is much more effective
than the control treatment (placebo or standard therapy). Alternatively,
the complications or side effects of treatment may be greater in
one treatment group than the other, causing the investigators to
terminate the clinical trial early. In either case, the results
of prematurely concluded trials are likely to differ from those
trials brought to the originally intended conclusion.
++
Given all of the potential sources of heterogeneity, it is not
surprising that variation in results across the selected studies
is a common occurrence in meta-analyses. At one level, this variability is
disconcerting because consistency of results across studies is a
useful approach to establishing that an observed treatment effect
is real. The ability to reproduce findings across independent investigators,
study populations, and settings increases confidence in the inferences
drawn concerning the effectiveness of the experimental treatment.
Heterogeneity in results, particularly if it cannot be explained,
raises uncertainty in the inferences that can be drawn from a meta-analysis.
++
An important approach to exploring heterogeneity is referred
to as sensitivity analysis. In this
context, a sensitivity analysis is an exploration of the summary
results within subsets of the studies under review. If subgroups
of studies can be identified that yield consistent results, which
in turn differ from the results of other subgroups, some insight
may be gained concerning the overall sources of heterogeneity. For
example, if a meta-analysis includes both published and unpublished
clinical trials, it might be useful to summarize the results of
the published studies separately from the results of the unpublished
studies. A larger treatment effect in the published studies would
support the suspicion of a publication bias. Analysis of results
according to the sample size of clinical trials also might provide
evidence of a publication bias. Large studies have greater statistical
power than small studies, and therefore are better able to detect
small treatment effects. Because studies with statistically significant
results presumably are more likely to be published than other studies,
a weaker average treatment effect in large trials than in small trials
is a potential indication of publication bias. That is to say,
small studies with weak effects were less likely to be published
and therefore were less available for meta-analysis. Similarly,
subgroup analysis may reveal a stronger treatment effect among clinical
trials that were stopped early. Through these types of subgroup
analyses, nonrandom patterns of variation in study results may be
detected and explained. In this sense, a sensitivity analysis can
reveal subgroups of studies with consistent results, and thereby
provide greater insight into the true effects of the treatment under
study. However, a sensitivity analysis must be interpreted with
caution, as it is possible that the patterns observed might have
arisen by chance. Overinterpretation of subgroup analyses can lead
to erroneous conclusions.
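A subgroup comparison of the kind described above can be sketched as follows. The study data are invented, and the inverse-variance (fixed-effect) pooling shown is only one way to summarize each subgroup:

```python
# Hypothetical sensitivity analysis: pool results separately within
# subgroups (here, published vs. unpublished trials). Effects are
# invented log relative risks with their standard errors.
studies = [
    ("published",   -0.50, 0.20),
    ("published",   -0.45, 0.15),
    ("published",   -0.55, 0.25),
    ("unpublished", -0.15, 0.30),
    ("unpublished", -0.05, 0.28),
]

def pooled(subset):
    """Inverse-variance weighted average of the effect estimates."""
    weights = [1 / se**2 for _, _, se in subset]
    num = sum(w * e for w, (_, e, _) in zip(weights, subset))
    return num / sum(weights)

for group in ("published", "unpublished"):
    subset = [s for s in studies if s[0] == group]
    print(f"{group}: pooled effect = {pooled(subset):.3f} "
          f"({len(subset)} trials)")
```

A markedly stronger pooled effect in the published subgroup, as in these invented data, would be consistent with a publication bias, though chance patterns must also be kept in mind.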
++
It is also important to recognize circumstances in which a meta-analysis
may not be advisable. For some questions of clinical interest, there
may be an insufficient number of trials available to yield a conclusive
summary. Even when a large number of trials on a topic are available,
fundamental differences in their study populations, design, and
outcome measures may make it unwise to attempt to combine results.
Also, when there is considerable heterogeneity of treatment effect across
studies that cannot be explained by a sensitivity analysis, it may
be imprudent to calculate a weighted average of the individual study
results.
++
In summary, meta-analysis is a powerful tool to help assess the
cumulative evidence on the effectiveness of a particular treatment.
By combining information across studies, it can lead to a statistically
precise and objective estimate of the effect of interest, and it
allows the consistency of findings across studies to be considered.
On the other hand, the quality of the design and data of the original
clinical trials, which is beyond the control of the meta-analyst,
will affect the quality of the meta-analysis. Decisions made by
the meta-analyst, such as which clinical trials to include and how
the summary analyses are performed, will also affect the quality
of the meta-analysis. Ultimately, the value of a meta-analysis will
rest on its ability to help make informed treatment decisions. We
will return to the topic of meta-analyses and systematic reviews
in Chapter 13: Interpretation of Epidemiologic Literature.
+++
The Cochrane Collaboration
++
The Cochrane Collaboration is an international organization whose
goal is to help people make well-informed decisions about health
care. It addresses this goal by preparing, maintaining, and ensuring
accessibility to current, rigorous systematic reviews of the benefits
and risks of health care interventions. This organization was founded
at a meeting in Oxford, England in late 1993. It is named in memory
of Archie Cochrane (1909–1988), a physician epidemiologist
who was a pioneer in the field of health services research. Cochrane
was noted for his strong belief that health care decisions should
be based on a critical synthesis of well-designed clinical trials
of treatment effectiveness.
++
The Cochrane Collaboration involves six organizational units:
Collaborative Review Groups, Fields, Centers, Methods Working Groups,
Consumer Networks, and a Steering Committee. The core work of the
organization is the preparation of systematic reviews, which are
conducted by the Collaborative Review Groups. Each Review Group
is comprised of individuals who share interests in particular health
problems. The results of the systematic reviews are incorporated
in the Cochrane Library, an electronic repository of evidence needed
to make informed health care decisions. The Library was established
in 1995 and is updated quarterly. It contains the following resources:
++
- 1. The Cochrane Database
of Systematic Reviews—a regularly updated collection of
completed systematic reviews and protocols of systematic reviews
that are in progress.
- 2. The Database of Abstracts
of Reviews of Effectiveness—an inventory of more than 2000
reviews that were completed outside of the Cochrane Collaboration.
- 3. The Cochrane Controlled
Trials Register—a database that by 2003 contained citations
for nearly 400,000 controlled trials identified by contributors
to the Collaboration.
- 4. The Cochrane Review Methodology
Database—a reference list of articles on the science of
synthesizing evidence and the practical aspects of preparing systematic
reviews.
++
Over its relatively short history, the Cochrane Collaboration
has already produced a remarkable body of information. By 2003,
1837 systematic reviews were published. In spite of this impressive
progress, the Collaboration faces many challenges. First, although
the goal of the Collaboration is to produce reviews across the full
spectrum of health care, it is dependent on the interests of persons
who volunteer to prepare systematic reviews. Second, the use of
such a wide range of reviewers of various backgrounds and skill
levels makes it difficult to ensure a uniform high standard of work
quality. Third, availability of the information in the Cochrane
Library is not yet universal. Barriers to its availability include
lack of knowledge about its existence and the subscription cost.
However, with the progress already achieved by the Cochrane Collaboration, there
is reason for optimism that this organization will play an increasingly
important role in promoting well-informed health care decisions.