++
The concept of validity concerns the
degree to which a measurement or study reaches a correct conclusion. A
measurement or study may lead to an incorrect (invalid) conclusion
because of the effects of
bias. The
variability seen with bias is systematic or nonrandom and distorts
the estimated effect. In
Figure 10–1, the amount of bias
can be determined by the degree to which the shots are off target in
D. Unfortunately, in medical research
the truth (bull’s-eye) may not be known, or there may be
no “gold standard” for comparison. Consequently,
the degree of bias often is difficult to determine. Two different
types of validity, internal validity and external validity, are
described in this chapter.
++
Internal validity is the extent to which
the results of an investigation accurately reflect the true situation
of the study population. If the results are not valid in the
study population, there is little reason to suspect that those results
will apply to other populations. Internal validity is defined by
the boundaries of the study itself. Therefore, a study is internally
valid if it provides a true estimate of effect, given the limits of
the population studied. Measures that can be used to improve internal
validity often involve restricting the type of subjects and the
environment in which the study is performed. These measures decrease
the impact of factors extraneous to the question of interest.
++
A result obtained in a tightly controlled environment, however,
may not be applicable to more general situations. External validity is the extent to which
the results of a study are applicable to other populations. External
validity addresses the following question: Do these results apply
to other patients, such as patients who are older, sicker, or less
economically advantaged than subjects in the study?
++
External validity often is of particular interest to clinicians
who must decide if a research finding is applicable to their clinical
practice. Determining whether the results of a study can be generalized
involves a judgment regarding the following:
++
- 1. The type of subjects included in the investigation
- 2. The type of patients seen by the clinician
- 3. Whether there are clinically meaningful differences between the study population and other populations.
++
An example of the kind of difficulty that can occur when study
results are generalized is the criticism that too many clinical
studies focus on white males. One such study is the Lipid Research Clinics-Primary
Prevention Trial, which demonstrated a significant reduction in
cardiovascular mortality for white men aged 35–59 years
with hypercholesterolemia who were placed on a cholesterol-lowering
diet and medication. Do the results also apply to women? Do they
apply to men of different ages, races, or with different, but still
abnormal, serum cholesterol levels? These questions led to the suggestion
that federally funded research should include women, minorities,
and children in the study populations.
++
Bias is a systematic error in a study
that leads to a distortion of the results. Bias, a threat to
validity, can occur in any research, but is of particular concern
in observational studies because the lack of randomization increases
the chance that study groups will differ with respect to important
characteristics. Bias often is subdivided into different categories,
based on how bias enters the study. The most common classification
divides bias into three categories:
++
- 1. Selection bias
- 2. Information bias
- 3. Confounding.
++
Although these categories overlap, this classification is useful
because it provides the reader with a systematic approach to evaluate
bias. With the exception of confounding, which can be quantified
in the analysis, the evaluation of bias is subjective and involves
a judgment regarding (1) the likelihood that bias is present and
(2) its direction and potential magnitude of effect on the results.
Even when the magnitude of bias cannot be measured, its likely
influence on the results of a study often can be inferred. It
is important to discern whether the suspected bias is likely to
make an association appear stronger or weaker than it really is.
Overestimation of a risk ratio for a protective exposure and a separate
hazardous exposure is demonstrated schematically in Figure 10–4.
Underestimation of a risk ratio for a protective exposure and a
hazardous exposure is shown in Figure 10–5.
++
A variety of procedures can be used to select subjects for a
study. Usually, it is not possible to include all individuals with
a particular disease or exposure in a study, so a sample of subjects must
be chosen. The procedures used for the selection of subjects depend
on a number of factors, including
++
- 1. The design of the investigation
- 2. The setting of the study
- 3. The disease and exposure of interest.
++
Often subjects are selected in a manner that is convenient for
the investigator. Under optimal circumstances, the method for inclusion
of subjects leads to a valid comparison that, in turn, yields correct
information regarding a disease process or treatment.
++
The selection process itself, however, may increase or decrease
the chance that a relationship between the exposure and disease
of interest will be detected, creating a selection bias. A schematic
diagram of the steps involved in recruiting and maintaining a study
population is shown in Figure 10–6.
From this diagram, it is easy to see that selection factors could
lead to biased results at several different steps in the process.
++
Some aspects of the selection of subjects lead primarily to problems
with the generalization (extrapolation) of the results (ie, external
validity). Subjects must agree to participate in a study, and this
causes one of the most common problems. Volunteers for a study may
differ from individuals who do not volunteer in various characteristics
such as age, race, economic status, education level, and sex. Moreover,
volunteers may be healthier than those who decline to participate. A
study of a population limited to individuals who are employed may
also make it difficult to generalize the results, because people
who work are generally healthier than those who do not. A comparison
of health outcomes between workers and the general population may
show that the workers have a more favorable outcome simply because
they are healthy enough to be employed (the “healthy worker” effect).
++
Referral of patients to clinical facilities can also lead to
distorted study conclusions. Selective referral patterns can be
seen in the study of children with febrile seizures. Febrile seizures
are brief, generalized seizures that occur in conjunction with elevation
in temperature in children aged 6 months to 6 years. There is some
disagreement about whether these febrile convulsions are predictive
of future seizures and other unfavorable neurologic sequelae. Ellenberg
and Nelson (1980) compared the results of a number of studies on
the long-term outcome of patients with febrile seizures. Studies
of geographically defined populations in which affected children
were followed, regardless of whether medical care was sought, consistently
revealed a relatively low rate of unfavorable sequelae. Clinic-based
studies tended to report a high frequency of adverse outcomes. Accordingly,
it was concluded that clinic-based studies selectively included
children at the more severe end of the clinical spectrum. The inferences
that might be drawn regarding the prognosis of a child with febrile
seizures might be very different based on whether a clinic-based or
a population-based sample was studied.
++
Other aspects of the selection process can diminish internal
validity. In a clinical trial or cohort
study, the major potential selection bias is loss to follow-up. Once
subjects are enrolled in the study, they may decide to discontinue
participation. Certain types of subjects are more likely than others
to drop out of a study. Furthermore, during the course of the study
some subjects may die from causes other than the outcome of interest.
At first glance, these losses may not appear to be related to selection
because the subject already was enrolled in the study. If the lost
subjects differ, however, in their risk of the outcome of interest,
biased estimates of risk may be obtained.
++
If the unrecognized early manifestations of the disease of interest
cause exposed persons to leave the study more or less frequently
than unexposed persons, a distorted conclusion might be reached.
For example, in a randomized controlled trial of the effects of
using a cholesterol-lowering drug versus diet therapy on prevention
of myocardial infarctions, bias might be introduced if drug-treated
patients with coronary insufficiency were more likely to develop
side effects from treatment and withdrew from participation, whereas
patients with coronary insufficiency receiving dietary therapy remained
in the study.
++
Selection bias is of particular importance in case–control
studies (see Chapter 9: Case–Control Studies) in which the investigator must select two
study groups, cases and controls, in a setting in which the exposure has
already occurred. For example, it must be decided whether to use
existing (prevalent) cases who are available at the time of study,
regardless of the duration of their disease, or to limit eligibility
to newly diagnosed (incident) cases. If the risk factor of interest
also is a prognostic factor, the use of prevalent cases can lead
to a biased conclusion. Consider, for example, a case–control study
of total serum cholesterol level as a risk factor for developing
myocardial infarction. Suppose that of patients who have a myocardial
infarction those with very high total serum cholesterol levels are
more likely to die suddenly than those with lower serum cholesterol
levels. Under these circumstances, a comparison of patients surviving
myocardial infarction with controls will underestimate the true
association between elevation in total serum cholesterol level and
risk of developing myocardial infarction.
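++
The effect of restricting a study to surviving (prevalent) cases can be sketched numerically. The counts and sudden-death proportions below are hypothetical values chosen only for illustration; they are not taken from any study described in this chapter.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio (cross-product ratio) for a 2x2 case-control table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

# Hypothetical counts (assumptions, not data from the chapter).
incident_high, incident_normal = 60, 40   # all newly diagnosed MI cases
control_high, control_normal = 40, 60     # controls without MI

# Assumed proportion of cases who die suddenly, by cholesterol level.
death_high, death_normal = 0.50, 0.20

# Prevalent cases: those who survive long enough to be enrolled later.
survivor_high = incident_high * (1 - death_high)
survivor_normal = incident_normal * (1 - death_normal)

or_incident = odds_ratio(incident_high, incident_normal, control_high, control_normal)
or_prevalent = odds_ratio(survivor_high, survivor_normal, control_high, control_normal)

print(f"OR using incident cases:  {or_incident:.2f}")
print(f"OR using prevalent cases: {or_prevalent:.2f}")
```

Under these assumptions the odds ratio based on survivors is smaller than the one based on incident cases, because sudden death selectively removes cases with very high cholesterol levels.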
++
Another potential type of selection bias can occur when a case–control
study involves subjects who are hospitalized. Patients with two
medical conditions are more likely to be hospitalized than those
with a single disease. Thus, a hospital-based case–control
study might find a link between two diseases or between an exposure
and a disease when there is no association between them in the general
population. This type of bias, often called Berkson’s bias,
was demonstrated in a study that showed that respiratory and bone
diseases were associated in a sample of hospitalized patients but
not in the general population. Thus, in a hospital-based study,
an exposure such as cigarette smoking, which is correlated with
respiratory disease, may also appear to occur together with bone
disease because those diseases are related in hospitalized patients.
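++
A small calculation can show how selective hospitalization might produce an association that does not exist in the general population. The population size, disease prevalences, and hospitalization probabilities below are all hypothetical assumptions; the key assumption is that persons with both conditions are hospitalized far more often than persons with only one.

```python
def odds_ratio(both, first_only, second_only, neither):
    """Odds ratio relating two conditions in a 2x2 table."""
    return (both * neither) / (first_only * second_only)

# Hypothetical population of 100,000 in which respiratory disease and
# bone disease each affect 10% of people and occur independently.
population = {
    "both": 1_000,
    "respiratory_only": 9_000,
    "bone_only": 9_000,
    "neither": 81_000,
}

# Assumed probability of hospitalization for each group.
p_hospitalized = {
    "both": 0.50,
    "respiratory_only": 0.05,
    "bone_only": 0.05,
    "neither": 0.01,
}

# Expected number of hospitalized persons in each group.
hospital = {group: population[group] * p_hospitalized[group] for group in population}

or_population = odds_ratio(*(population[g] for g in population))
or_hospital = odds_ratio(*(hospital[g] for g in population))

print(f"OR in general population:    {or_population:.2f}")
print(f"OR in hospitalized patients: {or_hospital:.2f}")
```

With other hospitalization probabilities the distortion could be smaller, larger, or in the opposite direction, so the sketch illustrates the mechanism rather than a typical magnitude.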
++
Information (or misclassification) bias
can occur when there is random or systematic inaccuracy in measurement. This
can be visualized best in epidemiological studies that involve dichotomous
exposure and disease variables, such as elevated total serum cholesterol
and myocardial infarction. Subjects are classified according to
whether they have had high total serum cholesterol levels and whether they
have had a myocardial infarction. Each classification can be either
correct or incorrect, resulting in true-positive and true-negative
findings, as well as false-positive and false-negative classifications
of subjects with respect to either exposure or disease.
++
If the errors in classification of exposure or disease status
are independent of the level of the other variable, then the misclassification
is termed nondifferential. Nondifferential
misclassification may occur in a case–control study if
the subject’s memory of exposure status is unrelated to
whether the subject has the disease of interest. An example of nondifferential
misclassification is sometimes referred to as unacceptability bias.
Subjects may answer a question about the exposure with a socially
acceptable but sometimes inaccurate response, regardless of whether
they have the disease of interest. Consider a case–control
study of myocardial infarction in which the exposure of interest
is prior intake of foods high in saturated fats. Regardless of disease
status, respondents may underreport intake of foods with high fat
content because they think low-fat diets are more acceptable to
the investigator. In most instances, when nondifferential misclassification
occurs, it blurs differences between the study groups, making it
more difficult for the investigator to detect a real association
between the exposure and the disease. This is often referred to
as a bias toward the null hypothesis or toward no association.
++
Differential misclassification occurs
when the misclassification of one variable depends on the status
of the other. In a case–control study, this type of misclassification
could occur if the information on exposure status depends on whether
the subject has the disease. If a case with a myocardial infarction
is more likely to overestimate the level of dietary fat intake than
a control subject, a biased result may occur. In this instance,
the bias would lead to an overestimate of the relationship between
dietary fat intake and risk of developing myocardial infarction.
++
The difference between nondifferential and differential misclassification
can be demonstrated by examining the data in Figure 10–7.
Consider a case–control study of the relationship between high-fat
diets and risk of developing myocardial infarction in which the
true odds ratio (OR) is 2.3. With nondifferential
misclassification, the subjects did not recall the amount of fatty foods
eaten, but the errors in recall did not depend on whether they had
a myocardial infarction. In this situation, 20% of both
cases and controls who ate high-fat diets underreported fat intake.
The resulting OR of 2.0 was an underestimate
of the true OR. On the other hand,
if all the patients who had a myocardial infarction correctly recalled
their dietary fat exposure status, but only 80% of the
exposed controls correctly reported their exposure, then differential
misclassification would occur. This type of misclassification can
result in either an underestimate or overestimate of the true OR. In this example, the investigator
overestimated the OR.
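++
The two kinds of misclassification can be compared with a short calculation. The counts below are hypothetical values chosen so that the true OR is close to 2.3; they are an assumption for illustration and are not copied from Figure 10–7. The helper name `underreport` is likewise illustrative.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio (cross-product ratio) for a 2x2 case-control table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

def underreport(exposed, unexposed, fraction):
    """Move a fraction of truly exposed subjects into the unexposed cell."""
    moved = exposed * fraction
    return exposed - moved, unexposed + moved

# Hypothetical true exposure status (high-fat diet yes, no).
true_cases = (60, 40)
true_controls = (40, 60)

or_true = odds_ratio(*true_cases, *true_controls)

# Nondifferential: 20% of exposed subjects underreport in BOTH groups.
or_nondifferential = odds_ratio(
    *underreport(*true_cases, 0.20), *underreport(*true_controls, 0.20)
)

# Differential: cases report correctly; 20% of exposed controls underreport.
or_differential = odds_ratio(*true_cases, *underreport(*true_controls, 0.20))

print(f"True OR:                    {or_true:.2f}")
print(f"Nondifferential reporting:  {or_nondifferential:.2f}")
print(f"Differential reporting:     {or_differential:.2f}")
```

Under these assumptions the nondifferential errors pull the OR toward 1, whereas underreporting by controls alone inflates it.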
++
Two common types of differential information bias are often referred
to as recall bias and interviewer bias. Recall bias results
from differential ability of subjects to remember previous activities
and exposures. Patients who have a serious disease may search their
memory for an exposure in an attempt to explain or to understand
why they acquired the illness. Control subjects, who do not have
the disease, may be less likely to remember an exposure because
it has less meaning and is less important for them.
++
When interviewers are employed to determine exposures in case–control
studies, results may be influenced by how the interviewers collect
information. If they are aware of the research hypothesis, the interviewers
intentionally or unintentionally may influence the responses of
the subjects. They may probe more deeply for responses from cases
than from controls. If a dietary exposure is examined, the interviewers
may ask certain subjects specific questions about particular food items.
Interviewers may also give the subjects subtle clues by tone of
voice or body language that suggest a preference for certain responses.
Generally, it is desirable to blind the interviewers to the research
hypothesis under investigation. In a case–control study,
however, it may be difficult to blind the interviewers to the disease
status of cases and controls. Nevertheless, if the interviewers
are not aware of the exposure of primary interest, biased data collection
still can be minimized.
++
As a way to reduce misclassification and to improve accuracy
of study measurements, investigators increasingly are using biological markers. As shown in Table
10–3, these markers can measure many facets of disease
and exposure—or the relationship between the two. For example,
biological markers can measure
++
- 1. Susceptibility (biological markers can be used to identify subjects with particularly high risk due to a particular biological predisposition)
- 2. Internal dose (biological markers can be used to measure the amount of a chemical or other exposure in the body)
- 3. Biologically effective dose (biological markers can be used to measure the amount of a substance that reaches the target sites)
- 4. Biological effect (biological markers can be used to quantify a deleterious effect of a particular exposure).
++
Biological markers are used in most substantive areas of investigation,
including nutritional, cardiovascular, reproductive, cancer, and
infectious disease epidemiology. Use of biological markers is important
in observational studies for several reasons. These markers are
important methodologically because they can serve to reduce misclassification
by allowing more accurate assessment of exposure or disease status.
Furthermore, they may allow the investigator to define more homogeneous
disease categories or to identify susceptible subjects, so that
the study can focus on specific subgroups. Finally, biological markers
can help provide insight into the underlying disease process and
pathogenesis.
++
The use of levels of serum dioxin to measure exposure of men
who worked with the herbicide Agent Orange during the Vietnam War
illustrates the use of a biological marker to measure internal dose.
After the Vietnam War, concern arose about wartime exposures of
servicemen to Agent Orange, in part because of its contamination
with the highly toxic trace contaminant known as 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD). Because of this concern,
Air Force researchers began epidemiologic studies to assess the
health effects among Air Force veterans associated with exposure
to Agent Orange and TCDD. Researchers initially used job descriptions
to classify exposure to TCDD. Later, after laboratory techniques
became available to measure minute concentrations of TCDD within
the blood, the researchers discovered that classification of exposure
based on job descriptions was associated with substantial misclassification.
In subsequent studies, the more accurate serum TCDD measurements
were used to assess exposures.
++
Despite the importance of biological markers and the possibility
that their use may reduce information bias, they do not eliminate
the possibility of systematic errors. Although it may be diminished
by employing a biological marker, misclassification remains a possibility.
For example, marker instability and inter- or intraindividual variability
can contribute to measurement errors. Moreover, if required biological
specimens are collected after disease occurrence, as often happens
in case–control studies, the presence of disease in cases
may affect the biological marker. This possibility can make the
biological marker particularly susceptible to differential misclassification
and measurement error. Bias can even be created if the investigator
adjusts inappropriately for a factor that is caused by the exposure
of interest and is associated with the outcome.
++
Case–control studies of the relationship of β-carotene
and cancer illustrate the potential for residual information bias.
β-Carotene is a fat-soluble antioxidant found in many fruits and
vegetables. It is a precursor of vitamin A (a provitamin), protects against
development of cancer in animals, and may reduce the risk of developing
cancer in humans. In a case–control study of serum levels
of this antioxidant, differential misclassification could create
or accentuate a protective effect, if cases with advanced cancer
had altered nutritional status and a resulting lowering of β-carotene
levels. Although these biases are somewhat speculative, the potential
for bias in case–control studies is evident.
++
Thus, use of biological markers offers many advantages, particularly
an improved assessment of exposure and a more homogeneous definition
of disease. Nevertheless, because use of these markers does not
eliminate the possibility of information bias, caution in interpretation
is still warranted.
++
Confounding refers to the mixing
of the effect of an extraneous variable with the effects of the
exposure and disease of interest. Confounding can be demonstrated
by the following hypothetical example. Suppose investigators undertake
a case–control study of the association between high total
serum cholesterol level and risk of developing myocardial infarction.
From the results of other studies, the researchers know that the
risk of myocardial infarction is associated with obesity, and that total
cholesterol levels also correlate with obesity (see
Figure 10–8).
Suppose that in our hypothetical case–control study, 36
of 60 patients with myocardial infarction (60%) are found
to have high total serum cholesterol levels, and only 24 of 60 controls
(40%) are discovered to have elevated serum cholesterol
levels. This would suggest that elevated total serum cholesterol
levels are associated with an increased risk of developing myocardial
infarction.
++
When the observed association is examined separately in obese
and nonobese persons, however, a different conclusion is reached.
Among obese persons, 34 of 40 patients with myocardial infarction
(85%) and 18 of 20 controls (90%) are found to
have elevated total serum cholesterol levels. Among nonobese persons,
2 of 20 patients with myocardial infarction (10%) and 6
of 40 controls (15%) have high total serum cholesterol
levels. Thus in the case of both obese and nonobese individuals,
elevated total serum cholesterol levels are more common in controls
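than in patients with myocardial infarction. Keep in mind that in
the hypothetical study, obesity was associated with elevated total
serum cholesterol, since 52 of 60 obese subjects (87%) had elevated total
serum cholesterol levels, and only 8 of 60 nonobese persons (13%)
had high total serum cholesterol levels. Obesity was also associated
with myocardial infarction: 40 of the 60 obese subjects (67%) were
patients with myocardial infarction, compared with 20 of the 60
nonobese subjects (33%). Clearly, in this hypothetical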
example, the results are confounded by the extraneous variable,
obesity. The results are illustrated in Figure 10–9.
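++
The reversal described above can be checked by computing odds ratios from the counts given in the hypothetical study. The sketch below uses only those counts; the function and variable names are illustrative.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio (cross-product ratio) for a 2x2 case-control table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

# Cell order: (high-cholesterol cases, normal-cholesterol cases,
#              high-cholesterol controls, normal-cholesterol controls)
strata = {
    "obese": (34, 6, 18, 2),
    "nonobese": (2, 18, 6, 34),
}

# Collapsing over obesity gives the crude table: 36/24 cases, 24/36 controls.
crude = tuple(sum(cells) for cells in zip(*strata.values()))

crude_or = odds_ratio(*crude)
stratum_ors = {name: odds_ratio(*cells) for name, cells in strata.items()}

print(f"Crude OR: {crude_or:.2f}")
for name, value in stratum_ors.items():
    print(f"OR among {name} subjects: {value:.2f}")
```

The crude OR is above 1, whereas the OR within each obesity stratum is below 1, which is the pattern expected when obesity confounds the association.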
++
For a variable—in this case, obesity—to be
considered a potential confounder, it must satisfy two conditions:
++
- 1. Association with the
disease of interest in the absence of exposure
- 2. Association with the exposure
but not as a result of being exposed.
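++
Both conditions can be examined with the counts from the hypothetical study. As an assumption of this sketch, the first condition is checked among subjects with normal cholesterol levels (the unexposed), and the second is checked among controls, a common convention in case–control analysis. The second condition also requires that obesity not be a consequence of elevated cholesterol, which is a matter of subject knowledge and cannot be verified from the counts.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio (a*d)/(b*c) of a 2x2 table."""
    return (a * d) / (b * c)

# Counts from the hypothetical study, keyed by (obesity, cholesterol, status).
n = {
    ("obese", "high", "case"): 34, ("obese", "normal", "case"): 6,
    ("obese", "high", "control"): 18, ("obese", "normal", "control"): 2,
    ("nonobese", "high", "case"): 2, ("nonobese", "normal", "case"): 18,
    ("nonobese", "high", "control"): 6, ("nonobese", "normal", "control"): 34,
}

# Condition 1: obesity and MI among subjects with normal cholesterol.
or_obesity_disease = odds_ratio(
    n["obese", "normal", "case"], n["nonobese", "normal", "case"],
    n["obese", "normal", "control"], n["nonobese", "normal", "control"],
)

# Condition 2: obesity and high cholesterol among controls.
or_obesity_exposure = odds_ratio(
    n["obese", "high", "control"], n["nonobese", "high", "control"],
    n["obese", "normal", "control"], n["nonobese", "normal", "control"],
)

print(f"Obesity vs. MI (normal cholesterol only): OR = {or_obesity_disease:.2f}")
print(f"Obesity vs. high cholesterol (controls):  OR = {or_obesity_exposure:.2f}")
```

Both odds ratios are well above 1, so in these hypothetical data obesity meets the numerical part of each condition.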
++
Because it can be evaluated in the analysis of results, confounding
differs from selection bias and information bias. The presence of
confounding is demonstrated by a change in the apparent strength
of association between the exposure and the disease of interest
when the effects of extraneous variables are taken into account.
Confounding, which is not an all-or-none property of an extraneous
variable, may occur to different degrees in different studies.
++
Generally, the list of potential confounders in a study is limited
to established risk factors for the disease of interest. There are
two accepted methods for dealing with potential confounders. The first
is to consider them in the design of the study by matching on the
potential confounder or by restricting the sample to limited levels
of the potential confounder. The other method is to evaluate confounding
in the analysis by stratification, as demonstrated schematically
in Figure 10–9, or by using multivariate analysis techniques
such as multiple logistic regression.
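++
One widely used way to summarize a stratified analysis is the Mantel-Haenszel pooled odds ratio, which is named here as an illustration; the chapter itself does not specify a pooling method. The sketch applies it to the obesity strata of the hypothetical cholesterol study, and the function name is illustrative.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio.

    Each stratum is (exposed_cases, unexposed_cases,
                     exposed_controls, unexposed_controls).
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        total = a + b + c + d
        numerator += a * d / total
        denominator += b * c / total
    return numerator / denominator

# Obese and nonobese strata from the hypothetical study.
strata = [(34, 6, 18, 2), (2, 18, 6, 34)]

mh_or = mantel_haenszel_or(strata)
print(f"Obesity-adjusted (Mantel-Haenszel) OR: {mh_or:.2f}")
```

Comparing this adjusted value with the crude OR of 2.25 shows the change in apparent strength of association that signals confounding.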
++
The goal of any epidemiologic study is to provide a valid conclusion.
To accomplish this objective, complete attention must be given to
all aspects of the study, from inception to design and data collection,
and finally to analysis and reporting of results. It is important
to remember that bias can be introduced at any of these stages,
leading to erroneous results. Thus it is useful to look carefully
for potential sources of bias and to consider their possible impact.
Clinicians must judge whether results can be generalized to their
particular practice. Understanding the potential problems with measurement
and bias in medical research improves the ability of physicians
to decide on appropriate preventive and therapeutic strategies.