It is apparent to anyone who reads the medical literature today that some knowledge of biostatistics and epidemiology is a necessity. This is particularly true in occupational and environmental health in which many of the findings are based on epidemiologic studies of subjects exposed to low levels of an agent. Research has become more rigorous in the area of study design and analysis, and reports of clinical and epidemiologic research contain increasing amounts of statistical methodology. This Appendix provides a brief introduction to some of the basic principles of biostatistics and epidemiology.
Data collected in medical research can be divided into three types: nominal (categorical), ordinal, and continuous.
Nominal (categorical) data are those that can be divided into two or more unordered categories, such as gender, race, or religion. In occupational medicine, for example, many outcome measures, such as cancer rates, are considered separately for different gender and race categories.
Ordinal data differ from nominal data in that there is a predetermined order underlying the categories. Examples of ordinal data include clinical severity, socioeconomic status (SES), and the ILO (International Labor Office) profusion category for pneumoconiosis on chest radiographs.
Both nominal and ordinal data are examples of discrete data. They take on only a limited number of distinct values, which are often coded as integers.
Continuous data are data measured on an arithmetic scale. Examples include height, weight, blood lead levels, and forced expiratory volume. The precision of the number recorded depends on the measuring instrument, and the variable can take on an infinite number of values within a defined range. For example, a person's height might be recorded as 72 in, 72.001 in, or 72.00098 in, depending on the precision of the measuring instrument.
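The distinction among the three data types can be illustrated with a minimal Python sketch. The variable names, the three-level severity scale, and all values are hypothetical and chosen only for illustration; they are not taken from any study.

```python
from collections import Counter
from enum import IntEnum

# Nominal: unordered labels. Counting categories is meaningful; ranking them is not.
sex = ["F", "M", "M", "F", "F"]  # hypothetical subjects
sex_counts = Counter(sex)

# Ordinal: categories with a predetermined order (hypothetical severity scale).
class Severity(IntEnum):
    MILD = 1
    MODERATE = 2
    SEVERE = 3

severity = [Severity.MILD, Severity.SEVERE, Severity.MODERATE]
worst = max(severity)  # ordering is meaningful, so max() makes sense

# Continuous: measured on an arithmetic scale. The number of digits
# recorded depends on the measuring instrument.
height_in = 72.00098
recorded = [round(height_in, digits) for digits in (0, 3, 5)]

print(sex_counts["F"], worst.name, recorded)
```

Note that the integer codes attached to the ordinal categories convey order only; the difference between MILD and MODERATE is not assumed to equal the difference between MODERATE and SEVERE.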
Once research data are collected, the first step is to summarize them. The two most common ways of summarizing data are measures of location, or central tendency, and measures of spread, or variation.
A. Measures of Central Tendency
The mean is the average value of a set of interval data observations. It is computed using the following equation:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
where n is the sample size and x_i is the ith observed value of a variable, such as height, with i = 1, …, n.
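As a worked illustration, the mean can be computed by summing the observations and dividing by the sample size. The following is a minimal sketch; the function name and the height values are hypothetical.

```python
def mean(values):
    """Arithmetic mean: the sum of the observations divided by the sample size n."""
    n = len(values)
    if n == 0:
        raise ValueError("mean requires at least one observation")
    return sum(values) / n

# Hypothetical heights, in inches, for n = 5 subjects.
heights_in = [66.0, 68.0, 70.0, 72.0, 74.0]
print(mean(heights_in))  # 350.0 / 5 = 70.0
```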
The mean can be strongly affected by extreme values in the data. If a variable has a fairly symmetric, or bell-shaped, distribution, the mean is the appropriate measure of central tendency.
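The sensitivity of the mean to extreme values can be seen in a short sketch. The blood lead values below are hypothetical and serve only to show the arithmetic; `statistics.mean` is part of the Python standard library.

```python
from statistics import mean

# Hypothetical blood lead levels (µg/dL) for five subjects.
blood_lead = [4.0, 5.0, 5.0, 6.0, 5.0]

# The same sample with one extreme observation added.
with_extreme = blood_lead + [65.0]

mean_without = mean(blood_lead)    # 25.0 / 5 = 5.0
mean_with = mean(with_extreme)     # 90.0 / 6 = 15.0

print(mean_without, mean_with)
```

A single extreme observation triples the mean in this example, even though five of the six values lie between 4 and 6.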
The median is the "middle" observation, or 50th percentile; that is, half the observations lie above the median and half below it. It can be applied to interval or ordinal ...