Clinicians should use measures of test performance such as sensitivity and specificity to judge the quality of a diagnostic test for a particular disease.
Test sensitivity is the ability of a test to detect disease and is expressed as the percentage of patients with disease in whom the test is positive. Thus, a test that is 90% sensitive gives positive results in 90% of patients with disease and negative results in 10% of patients with disease (false negatives). Generally, a test with high sensitivity is useful to exclude a diagnosis because a highly sensitive test yields few falsely negative results. To exclude infection with HIV, the virus that causes AIDS, for instance, a clinician might choose a highly sensitive test, such as the fourth-generation rapid test that detects both HIV p24 antigen and HIV-1/2 antibodies (HIV-1/2 Ag/Ab Combo rapid test).
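The arithmetic behind the 90% example can be sketched in a few lines of Python. The function name and the counts are hypothetical: 100 patients with disease, of whom 90 test positive and 10 test negative. Sensitivity is TP / (TP + FN). The complementary false-negative rate is the share of diseased patients the test misses.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity = TP / (TP + FN): the share of patients WITH disease who test positive."""
    diseased = tp + fn
    if diseased == 0:
        raise ValueError("no patients with disease, so sensitivity is undefined")
    return tp / diseased


# Hypothetical counts: 100 patients with disease; 90 test positive, 10 test negative.
tp, fn = 90, 10
sens = sensitivity(tp, fn)
false_negative_rate = fn / (tp + fn)  # share of diseased patients the test misses

print(f"sensitivity {sens:.0%}, false-negative rate {false_negative_rate:.0%}")
```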
A test's specificity is the ability to detect absence of disease and is expressed as the percentage of patients without disease in whom the test is negative. Thus, a test that is 90% specific gives negative results in 90% of patients without disease and positive results in 10% of patients without disease (false positives). A test with high specificity is useful to confirm a diagnosis, because a highly specific test has few results that are falsely positive. For instance, to make the diagnosis of gouty arthritis, a clinician might choose a highly specific test, such as the presence of negatively birefringent needle-shaped urate crystals on microscopic evaluation of joint fluid.
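The specificity example mirrors the sensitivity calculation but uses only patients without disease. In this sketch the function name and the counts are hypothetical: 100 patients without disease, 90 negative and 10 positive. Specificity is TN / (TN + FP), and the false-positive rate is its complement.

```python
def specificity(tn: int, fp: int) -> float:
    """Specificity = TN / (TN + FP): the share of patients WITHOUT disease who test negative."""
    non_diseased = tn + fp
    if non_diseased == 0:
        raise ValueError("no patients without disease, so specificity is undefined")
    return tn / non_diseased


# Hypothetical counts: 100 patients without disease; 90 test negative, 10 test positive.
tn, fp = 90, 10
spec = specificity(tn, fp)
false_positive_rate = fp / (tn + fp)  # share of disease-free patients wrongly called positive

print(f"specificity {spec:.0%}, false-positive rate {false_positive_rate:.0%}")
```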
Table e2–4 illustrates the generic structure of a 2 × 2 table (disease versus test) and the calculation formulas for test performance metrics including sensitivity, specificity, positive predictive value, and negative predictive value. To determine test sensitivity and specificity for a particular disease, the test must be compared against an independent "gold standard" test or established standard diagnostic criteria that define the true disease state of the patient. For instance, the sensitivity and specificity of rapid antigen detection testing in diagnosing group A beta-hemolytic streptococcal pharyngitis are obtained by comparing the results of rapid antigen testing with the gold standard test, throat swab culture. Application of the gold standard test (ie, swab culture) to patients with positive and negative rapid antigen testing establishes sensitivity and specificity of the rapid test. However, for many disease states (eg, pancreatitis), an independent gold standard test either does not exist or is very difficult or expensive to apply—and in such cases, reliable estimates of test sensitivity and specificity are sometimes difficult to obtain.
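Table e2–4. The 2 × 2 table (disease versus test) and test performance metric calculation formulas.

| | Disease present | Disease absent | Total |
|---|---|---|---|
| Test positive | TP (true positive) | FP (false positive) | (TP + FP) |
| Test negative | FN (false negative) | TN (true negative) | (FN + TN) |
| Total | (TP + FN) | (FP + TN) | N = (TP + FP + FN + TN) |

Sensitivity = TP/(TP + FN); specificity = TN/(FP + TN); positive predictive value (PPV) = TP/(TP + FP); negative predictive value (NPV) = TN/(FN + TN).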
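A small Python class can hold the four cells of Table e2–4 and apply each formula directly. The class name, method names, and cell counts are hypothetical and chosen only for illustration. Sensitivity and specificity are read down the disease columns. The predictive values are read across the test-result rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cell counts of a 2 x 2 table (test result versus true disease state)."""
    tp: int  # test positive, disease present
    fp: int  # test positive, disease absent
    fn: int  # test negative, disease present
    tn: int  # test negative, disease absent

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)  # "disease present" column

    def specificity(self) -> float:
        return self.tn / (self.fp + self.tn)  # "disease absent" column

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)  # "test positive" row

    def npv(self) -> float:
        return self.tn / (self.fn + self.tn)  # "test negative" row


# Hypothetical counts for illustration only.
table = TwoByTwo(tp=80, fp=30, fn=20, tn=870)
print("N =", table.n)
for name, value in (("sensitivity", table.sensitivity()), ("specificity", table.specificity()),
                    ("PPV", table.ppv()), ("NPV", table.npv())):
    print(f"{name}: {value:.3f}")
```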
Sensitivity and specificity can also be affected by the population from which these values are derived. For instance, many diagnostic tests are evaluated first using patients who have severe disease and control groups who are young and well. Compared with the general population, this study group will have more results that are truly positive (because patients have more advanced disease) and more results that are truly negative (because the control group is healthy). Thus, test sensitivity and specificity will be higher than would be expected in the general population, where more of a spectrum of health and disease is found. Clinicians should be aware of this spectrum bias when generalizing published test results to their own practice. To minimize spectrum bias, the control group should include individuals who have diseases related to the disease in question but who lack this principal disease. For example, to establish the sensitivity and specificity of the anti-cyclic citrullinated peptide test for rheumatoid arthritis, the control group should include patients with rheumatic diseases other than rheumatoid arthritis. When critically appraising published studies, clinicians should also consider other sources of bias, including spectrum composition, population recruitment, an absent or inappropriate reference standard, and verification bias.
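Spectrum bias can be illustrated by applying one fixed cutoff to two hypothetical study designs. The first pairs severe cases with young, healthy controls. The second pairs cases across the full spectrum of disease with controls who have related diseases. Every analyte value, the cutoff of 10, and the function names are invented for illustration and do not come from any real test. The sketch assumes that higher values indicate disease.

```python
def sensitivity_at(cutoff: float, diseased: list[float]) -> float:
    """Share of diseased patients whose value is at or above the cutoff (called positive)."""
    return sum(v >= cutoff for v in diseased) / len(diseased)


def specificity_at(cutoff: float, non_diseased: list[float]) -> float:
    """Share of patients without the disease whose value is below the cutoff (called negative)."""
    return sum(v < cutoff for v in non_diseased) / len(non_diseased)


CUTOFF = 10.0  # hypothetical decision threshold, arbitrary units

# Hypothetical analyte values (arbitrary units), invented for illustration.
severe_cases = [18, 22, 15, 30, 12, 25, 19, 16, 21, 28]       # advanced disease only
spectrum_cases = [18, 22, 8, 30, 9, 11, 19, 7, 14, 28]        # mild to advanced disease
healthy_controls = [3, 4, 2, 5, 3, 4, 6, 2, 5, 3]             # young, well volunteers
related_disease_controls = [4, 12, 6, 3, 11, 5, 9, 4, 13, 6]  # related diseases, not this one

cohorts = {
    "severe cases vs healthy controls": (severe_cases, healthy_controls),
    "full spectrum vs related-disease controls": (spectrum_cases, related_disease_controls),
}
for label, (cases, controls) in cohorts.items():
    print(f"{label}: sensitivity {sensitivity_at(CUTOFF, cases):.0%}, "
          f"specificity {specificity_at(CUTOFF, controls):.0%}")
```

In these made-up data, the same test and cutoff appear perfect (100%/100%) in the first design but reach only 70%/70% in the second.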
It is important to remember that the reported sensitivity and specificity of a test depend on the analyte level (threshold) used to distinguish a normal from an abnormal test result. For a test in which higher values indicate disease, lowering the threshold increases sensitivity at the expense of specificity, and raising the threshold decreases sensitivity while increasing specificity (Figure e2–3).
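The threshold trade-off can be seen by scoring one set of hypothetical analyte values at a low, a middle, and a high cutoff. The values, cutoffs, and function name are invented for illustration. A result is treated as positive when it is at or above the cutoff, which assumes that higher values indicate disease.

```python
# Hypothetical analyte values (arbitrary units); higher values suggest disease.
diseased = [6, 8, 9, 11, 12, 14, 15, 17, 20, 24]
healthy = [2, 3, 4, 5, 5, 6, 7, 8, 10, 13]


def sens_spec(cutoff: float) -> tuple[float, float]:
    """Call a result positive when value >= cutoff; return (sensitivity, specificity)."""
    sens = sum(v >= cutoff for v in diseased) / len(diseased)
    spec = sum(v < cutoff for v in healthy) / len(healthy)
    return sens, spec


for cutoff in (5, 9, 14):  # low, middle, and high thresholds
    sens, spec = sens_spec(cutoff)
    print(f"cutoff {cutoff:>2}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

With these made-up values, lowering the cutoff from 9 to 5 raises sensitivity from 80% to 100%, and specificity falls from 80% to 30%. Raising the cutoff to 14 does the reverse, giving 50% sensitivity and 100% specificity.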
Figure e2–3. Hypothetical distribution of test results for healthy and diseased individuals. The position of the "cutoff point" between "normal" and "abnormal" (or "negative" and "positive") test results determines the test's sensitivity and specificity. If point A is the cutoff point, the test would have 100% sensitivity but low specificity. If point C is the cutoff point, the test would have 100% specificity but low sensitivity. For many tests, the cutoff point is determined by the reference interval, ie, the range of test results that is within 2 SD of the mean of test results for healthy individuals (point B). In some situations, the cutoff is altered to enhance either sensitivity or specificity. (Reproduced, with permission, from Nicoll D et al. Guide to Diagnostic Tests, 7th ed. McGraw-Hill, 2017.)
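The reference-interval rule in the caption (mean ± 2 SD of results in healthy individuals) can be sketched with Python's standard `statistics` module. The healthy results and the function name are hypothetical. The rule assumes the healthy results are roughly normally distributed. For a test in which only high values indicate disease, only the upper limit would act as the cutoff (point B in the figure).

```python
import statistics

# Hypothetical results from healthy reference individuals (arbitrary units).
healthy_results = [4.1, 4.8, 5.0, 5.2, 5.5, 5.6, 5.9, 6.1, 6.4, 7.0]

mean = statistics.mean(healthy_results)
sd = statistics.stdev(healthy_results)  # sample standard deviation
lower_limit = mean - 2 * sd
upper_limit = mean + 2 * sd


def is_abnormal(value: float) -> bool:
    """Flag a result that falls outside mean +/- 2 SD of the healthy reference group."""
    return value < lower_limit or value > upper_limit


print(f"reference interval: {lower_limit:.2f} to {upper_limit:.2f}")
print(is_abnormal(5.3), is_abnormal(8.0))
```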
Figure e2–4 and Table e2–4 show how test sensitivity and specificity can be calculated using test results from patients previously classified by the gold standard test as having disease or not having disease.
Figure e2–4. Calculation of sensitivity, specificity, and probability of disease after a positive test (posttest probability). TP, true positive; FP, false positive; FN, false negative; TN, true negative. (Reproduced, with permission, from Nicoll D et al. Guide to Diagnostic Tests, 7th ed. McGraw-Hill, 2017.)
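The steps in Figure e2–4 can be sketched in Python. The sketch assigns each patient already classified by the gold standard to one cell of the 2 × 2 table, counts the cells, and then applies the formulas. The patient list and the helper name `classify` are hypothetical. The posttest probability is computed within this study sample, so it reflects that sample's proportion of diseased patients.

```python
from collections import Counter

# Hypothetical paired results: (disease present per gold standard, test positive).
patients = [
    (True, True), (True, True), (True, True), (True, True), (True, False),
    (False, True), (False, True), (False, False), (False, False), (False, False),
]


def classify(has_disease: bool, test_positive: bool) -> str:
    """Assign one patient to a cell of the 2 x 2 table."""
    if has_disease:
        return "TP" if test_positive else "FN"
    return "FP" if test_positive else "TN"


counts = Counter(classify(d, t) for d, t in patients)
tp, fp, fn, tn = (counts[cell] for cell in ("TP", "FP", "FN", "TN"))

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
posttest_probability = tp / (tp + fp)  # probability of disease given a positive test

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"sensitivity {sensitivity:.2f}, specificity {specificity:.2f}, "
      f"posttest probability {posttest_probability:.2f}")
```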
Receiver Operator Characteristic
The performance of two different tests can be compared by plotting their receiver operator characteristic (ROC) curves, which graphically represent the pairs of sensitivity and specificity obtained at various cutoff values. Each point on an ROC curve represents the sensitivity/(1–specificity) pair corresponding to a particular decision threshold. Plotting sensitivity against (1–specificity) for each test often shows which test is more accurate: a clearly superior test has an ROC curve that lies above and to the left of the inferior test's curve at every point, and, in general, the better test has the larger area under the ROC curve. For instance, Figure e2–5 shows the ROC curves for PSA and prostatic acid phosphatase in the diagnosis of prostate cancer. PSA is the superior test because it has a significantly larger area under the curve and higher sensitivity and specificity at all cutoff values.
Figure e2–5. Receiver operator characteristic (ROC) curves for PSA and prostatic acid phosphatase (PAP) in the diagnosis of prostate cancer. For all cutoff values, PSA has higher sensitivity and specificity; therefore, it is a better test based on these performance characteristics. (Reproduced, with permission, from Nicoll D et al. Routine acid phosphatase testing for screening and monitoring prostate cancer no longer justified. Clin Chem. 1993;39:2540.)
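An ROC curve can be built by sweeping the cutoff across every observed value and recording one (1 – specificity, sensitivity) point per cutoff. The area under the curve can then be estimated with the trapezoidal rule. This is a minimal sketch and does not reproduce the PSA or PAP data. "Test A" and "test B" and their values are hypothetical, and the sketch assumes that higher values indicate disease.

```python
def roc_points(diseased: list[float], healthy: list[float]) -> list[tuple[float, float]]:
    """Return ROC points (1 - specificity, sensitivity), one per candidate cutoff.

    A result is called positive when value >= cutoff. Cutoffs run from the highest
    observed value down to the lowest, so neither coordinate ever decreases.
    """
    points = [(0.0, 0.0)]  # a cutoff above every value calls no one positive
    for cutoff in sorted(set(diseased) | set(healthy), reverse=True):
        sens = sum(v >= cutoff for v in diseased) / len(diseased)
        fpr = sum(v >= cutoff for v in healthy) / len(healthy)  # 1 - specificity
        points.append((fpr, sens))
    return points  # the lowest cutoff calls everyone positive, ending at (1, 1)


def auc(points: list[tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical (diseased, healthy) analyte values for two tests.
tests = {
    "test A": ([5, 7, 8, 9], [1, 2, 3, 6]),
    "test B": ([3, 5, 6, 8], [2, 4, 6, 7]),
}
for name, (diseased, healthy) in tests.items():
    print(f"{name}: AUC {auc(roc_points(diseased, healthy)):.3f}")
```

Because the cutoffs are swept from high to low, the points come out already ordered along the false-positive axis, so the curve needs no sorting before the trapezoids are summed.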
Note that, for a given test, the ROC curve also allows one to identify the cutoff value that best balances false-positive and false-negative results: the point on the curve closest to the upper-left corner of the plot, where sensitivity and specificity are both 100%. The optimal clinical cutoff value, however, depends on the condition being detected and the relative importance of false-positive versus false-negative results.
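The geometric rule in the paragraph above can be sketched as a search for the point whose Euclidean distance to the corner (1 – specificity = 0, sensitivity = 1) is smallest. The analyte values and function names are hypothetical, and a result at or above the cutoff is called positive. This criterion is purely geometric. As the paragraph notes, a clinically preferred cutoff may deliberately move away from it.

```python
import math

# Hypothetical analyte values (arbitrary units); a result >= cutoff is called positive.
diseased = [6, 8, 9, 11, 12, 14, 15, 17, 20, 24]
healthy = [2, 3, 4, 5, 5, 6, 7, 8, 10, 13]


def sens_spec(cutoff: float) -> tuple[float, float]:
    """Return (sensitivity, specificity) at the given cutoff."""
    sens = sum(v >= cutoff for v in diseased) / len(diseased)
    spec = sum(v < cutoff for v in healthy) / len(healthy)
    return sens, spec


def distance_to_corner(sens: float, spec: float) -> float:
    """Distance from the ROC point (1 - spec, sens) to the upper-left corner (0, 1)."""
    return math.hypot(1 - spec, 1 - sens)


candidates = sorted(set(diseased) | set(healthy))
best_cutoff = min(candidates, key=lambda c: distance_to_corner(*sens_spec(c)))
best_sens, best_spec = sens_spec(best_cutoff)
print(f"cutoff {best_cutoff}: sensitivity {best_sens:.0%}, specificity {best_spec:.0%}")
```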