This section discusses several aspects of case–control design,
including sources of cases, sources of controls,
and collection of information.
One of the first steps in a case–control study is to
identify and select cases—a step that also determines the
source population. Case identification should be complete, and the
source population—the population from which cases arise—should
be well defined. For example, cases might be sampled at random from
all patients who are diagnosed with EMS during the study period
and who reside within a certain geographic region, such as a state
of the United States, or from all cases that occur among subscribers
to a health maintenance organization. The source population consists
of state residents in the first instance, and subscribers to the
health maintenance organization in the second instance. In the previously
cited study of EMS, the source population consisted of residents
of the metropolitan area of Minneapolis–St. Paul, Minnesota.
These cases may be identified by a surveillance system or by reviewing
hospital records, other medical records, or death certificates available
through institutional or population-based disease registries.
In some situations, complete identification of cases in a well-defined
source population may be too time consuming or otherwise infeasible.
If so, a common alternative involves use of a “convenience
sample.” Cases might be sampled from patients admitted
to particular hospitals or from those seen in certain clinics. Although
such cases can often be identified easily, the underlying source
population may not be well defined, thus making it difficult to
generalize results confidently.
The investigator typically studies newly diagnosed or incident cases,
although it is sometimes
necessary to include previously existing or prevalent cases.
Prevalent cases should be excluded primarily because
the exposure may affect the prognosis or the duration of the illness.
When this effect occurs, the exposure status of existing prevalent
cases tends to differ from that of all cases. For example, suppose
that prior use of l-tryptophan
prevents death or prolongs the duration of EMS. Prevalent cases
of EMS might then have a higher reported use of l-tryptophan
than would all cases with this disease. Consequently, a case–control
comparison of use of l-tryptophan would
tend to be distorted by an inflated estimate of use for cases. The
general principle involved is that the likelihood of a case being
included in the study must not depend on whether that case was exposed
to the risk factor of interest.
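The distortion introduced by prevalent cases can be sketched numerically. The following is a minimal illustration, assuming a steady state in which the number of prevalent cases is proportional to the number of incident cases multiplied by the mean duration of disease. The function name and all numbers are hypothetical and are not taken from the EMS study.

```python
def exposed_fraction_prevalent(p_exposed_incident,
                               mean_duration_exposed,
                               mean_duration_unexposed):
    """Fraction exposed among prevalent cases.

    Assumes (hypothetically) a steady state in which prevalent cases are
    proportional to incident cases times mean disease duration.
    """
    exposed = p_exposed_incident * mean_duration_exposed
    unexposed = (1.0 - p_exposed_incident) * mean_duration_unexposed
    return exposed / (exposed + unexposed)


# Hypothetical inputs: half of all incident cases are exposed, and the
# exposure doubles the duration of illness (4 years versus 2 years).
fraction = exposed_fraction_prevalent(0.5, 4.0, 2.0)
print(round(fraction, 3))
```

Under these made-up inputs, about two thirds of prevalent cases are exposed even though only half of all incident cases are, which is the inflation described above. When the two durations are equal, the function returns the incident-case fraction unchanged.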
Another important step in designing a case–control study
is to specify the definition of a case. The criteria should minimize
the likelihood that an affected person (true case) is missed (ie,
the criteria must be sensitive) or
that a nonaffected person is falsely classified as a case (ie, the
criteria must be specific). In general,
there is a trade-off between the desire to include all cases (particularly
when the disease is extremely rare, as is EMS) and the desire to
prevent dilution of the case group with nonaffected persons. Moreover,
restrictive criteria may require information that is unavailable
for some subjects, making it impossible for such subjects to be
classified fully. In practice, inclusion criteria are chosen to
minimize misclassification yet promote feasibility. For example,
in the previously cited study of EMS in Minneapolis–St.
Paul, cases met specific criteria including the following: elevated
eosinophil counts, myalgia or muscle weakness, and residence in
the study area.
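The trade-off between sensitive and specific case criteria can be made concrete with a small calculation. This is only a sketch: the two case definitions ("broad" and "strict") and every count below are hypothetical, chosen to illustrate the arithmetic rather than to describe the EMS criteria.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of truly affected persons who meet the case definition."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Proportion of nonaffected persons who do not meet the case definition."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical counts for two candidate case definitions applied to the
# same 100 affected and 1000 nonaffected persons (illustrative only).
broad = {"tp": 95, "fn": 5, "tn": 900, "fp": 100}
strict = {"tp": 80, "fn": 20, "tn": 990, "fp": 10}

for name, d in (("broad", broad), ("strict", strict)):
    sens = sensitivity(d["tp"], d["fn"])
    spec = specificity(d["tn"], d["fp"])
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this example the broad definition misses few true cases (sensitivity 0.95) but admits more nonaffected persons (specificity 0.90), whereas the strict definition does the reverse (0.80 and 0.99).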
The next key step in a case–control study is to identify
and select controls. Ideally, controls are chosen at random from
the source population. If the source population is a state, city,
or other well-defined area, controls in that area might be contacted
by dialing telephone numbers at random (random-digit dialing), by
visiting residences, by mailing letters soliciting participation,
or by other means. An important goal is to select controls so that
participation does not depend on exposure.
That is to say, the sample of controls should have the same prevalence
of exposure as the source population of unaffected persons. If participation
does depend on exposure, the case–control comparison may
be distorted. In the previously cited study of EMS, the investigators
selected controls by random-digit dialing in the Minneapolis–St.
Paul area (the source population). Because the population of Minneapolis–St.
Paul has fairly complete telephone coverage, this approach to selecting
controls is unlikely to be influenced by use of l-tryptophan
(the exposure) or, among users, by the manufacturer of
l-tryptophan. Accordingly, within the
control group selected by random-digit dialing, the distribution of manufacturers
of l-tryptophan among users should be comparable
to that of the source population.
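The consequence of control participation depending on exposure can be illustrated with expected counts. The sketch below summarizes the case–control comparison with the odds ratio, a standard measure that this section does not itself define. The function names, participation rates, and counts are all hypothetical.

```python
def odds_ratio(cases_exposed, cases_unexposed,
               controls_exposed, controls_unexposed):
    """Exposure odds in cases divided by exposure odds in controls."""
    return (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)


def participating_controls(n_exposed, n_unexposed, rate_exposed, rate_unexposed):
    """Expected numbers of exposed and unexposed controls who take part."""
    return n_exposed * rate_exposed, n_unexposed * rate_unexposed


# Hypothetical inputs: 60 exposed and 40 unexposed cases; 1000 unaffected
# persons approached as controls, 20% of whom are exposed.
cases = (60, 40)
approached = (200, 800)

# Participation independent of exposure (80% in both groups).
unbiased = participating_controls(*approached, 0.8, 0.8)
# Participation dependent on exposure (40% of exposed, 80% of unexposed).
biased = participating_controls(*approached, 0.4, 0.8)

print(round(odds_ratio(*cases, *unbiased), 2))
print(round(odds_ratio(*cases, *biased), 2))
```

With these made-up numbers, the odds ratio is about 6 when participation is independent of exposure and about 12 when exposed persons participate at half the rate of unexposed persons, even though the cases and the source population are identical in both scenarios.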
Once cases and controls are selected, the next step is to obtain
information that is as accurate as possible about each individual’s
prior exposure to the risk factor of interest, as well as to other
exposures. The information concerning other exposures is used to
determine whether association of disease with a risk factor is due
to the exposure of interest or to other characteristics of exposed
persons. Because factors cannot affect risk after the disease occurs,
the timing of exposures is critical. With slowly developing diseases
that lack early evidence of involvement, establishing the temporal sequence
of exposure and onset of disease can be difficult or impossible.
Interviews and questionnaires are the most common means of determining
a subject’s exposure history. Interviews can be conducted
in person or by telephone. To ensure that information from cases
and controls is obtained in the same manner, interviews should be
standardized, monitored, and conducted by trained interviewers.
Interviews are useful for collecting data because (1) questions
may cover a wide range of potential risk factors, (2) costs are
relatively low, and (3) information can be obtained on exposures
that occurred years prior to the onset of illness. Occasionally,
there is concern that cases and controls may recall exposures differently,
perhaps distorting case–control comparisons. For example,
cases—perhaps in an attempt to explain their illnesses—may
overreport exposures. This is of particular concern when there has
been a great deal of publicity about the association between the
exposure and the disease of interest. For instance, after the association
of l-tryptophan with EMS was first identified
and publicized, knowledge of this association could have affected
the reported exposures of cases in subsequent investigations.
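The effect of overreporting by cases can be sketched in the same way. In the hypothetical example below, cases and controls have identical true exposure, so the true odds ratio (a standard summary measure not defined in this section) is 1. The overreporting rate and all counts are assumptions made purely for illustration.

```python
def odds_ratio(cases_exposed, cases_unexposed,
               controls_exposed, controls_unexposed):
    """Exposure odds in cases divided by exposure odds in controls."""
    return (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)


def reported_counts(true_exposed, true_unexposed, overreport_rate):
    """Counts as reported when a fraction of unexposed persons report exposure."""
    falsely_reported = true_unexposed * overreport_rate
    return true_exposed + falsely_reported, true_unexposed - falsely_reported


# Hypothetical inputs: 30 exposed and 70 unexposed persons in each group.
true_cases = (30, 70)
controls = (30, 70)

# Assume 10% of unexposed cases mistakenly report exposure, while
# controls report accurately.
reported_cases = reported_counts(*true_cases, overreport_rate=0.10)

print(round(odds_ratio(*true_cases, *controls), 2))
print(round(odds_ratio(*reported_cases, *controls), 2))
```

Here a modest amount of overreporting confined to cases moves the odds ratio from 1 to roughly 1.37, producing an apparent association where none exists.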
To minimize problems associated with subject recall, attempts
can be made to verify exposures through other methods. In the context
of the association between use of l-tryptophan
and development of EMS, for example, the interviewer might request
that the subject produce the l-tryptophan
package. By inspecting the package, the interviewer can confirm
that it was opened (and therefore the product presumably was used);
the manufacturer and the lot number can also be identified.
Information concerning risk factors may also be obtained from
medical, occupational, or other records. These methods of obtaining
information are not based on self-reporting and consequently should
avoid the reporting bias that may occur when information is obtained
through interviews. However, the amount of information found in
records is often limited, so some of the data of interest may
not be available. Furthermore, this information may not be recorded
in a standardized manner, leading to variability in subject classification.
An objective means of characterizing exposure is through the
use of a biological marker, such as measurement of an agent, or
an indicator of an agent, in blood or other specimens. However,
there are several difficulties inherent in the use of biological
markers. First, obtaining the specimens can involve an invasive
procedure that discourages subject participation. Second, many exposures do
not have known biological markers. Third, even if a marker exists,
it may be transient and thus not present when the measurement is
taken. For example, levels of l-tryptophan
in blood would reflect only relatively recent exposure and would
decline rapidly after exposure is stopped. Finally, the disease
state may alter metabolism, thereby distorting case–control comparisons.
The type of case–control study described in the Minneapolis–St.
Paul investigation of use of l-tryptophan
and development of EMS, in which newly diagnosed cases and controls
are sampled from a source population, is used quite commonly. It
is often called a population-based case–control study
because cases and controls are sampled from a defined population,
in this instance, by virtue of place of residence.