The Journal of Bone and Joint Surgery 80:111-20 (1998)
© 1998 The Journal of Bone and Joint Surgery, Inc.


Current Concepts Review

Current Concepts Review - Principles of Epidemiology for the Orthopaedic Surgeon*

ROBERT M. SZABO, M.D., M.P.H.†, SACRAMENTO, CALIFORNIA


    Introduction
 Top
 Introduction
 Epidemiological Observations
 Diagnosis of Disease
 Epidemiological Measures of...
 Causal Inference
 Statistical Inference
 Bias
 Study-Design Strategies for...
 References
 
It has been stated that "'the object of any science is the accumulation of systematized verifiable knowledge,' and that this is to be achieved through 'observation, experiment and thought.'"12 Orthopaedists are concerned primarily with individual patients; epidemiologists study the occurrence of disease or other health-related conditions or events in defined populations26. Epidemiological research is based on the systematic collection of observations related to the phenomenon of interest in a defined population. These data then are subjected to quantification, which includes the measurement of random variables, the estimation of population parameters, and the statistical testing of hypotheses22.

The changing profile of health-care-delivery systems requires orthopaedists to go beyond the individual and to consider their practices in terms of their effects on the lives entrusted to them. Epidemiology is the biomedical discipline focused on the distribution and determinants of disease in groups of individuals who happen to have some characteristics, exposures, or diseases in common. Viewed as the study of the distribution and societal determinants of the health status of populations, epidemiology is the basic-science foundation of public health38. The American Academy of Orthopaedic Surgeons has launched a national public-education effort, the Play It Safe program, for the prevention of injuries in children. This program calls attention to the problem and offers safety guidelines for the maintenance and layout of playgrounds. It is one of several new campaigns sponsored by The American Academy of Orthopaedic Surgeons in which orthopaedists clearly assume an expanded role compared with their role in the past. Other programs of The American Academy of Orthopaedic Surgeons address areas such as safety in sports and the prevention of domestic violence. These programs demonstrate that The American Academy of Orthopaedic Surgeons recognizes the increasing responsibility of its members in the field of population medicine. Issues once deemed relevant only in the public-health domain are now concerns of orthopaedic surgeons. In order to be active and effective, orthopaedists must acquaint themselves with epidemiological principles.

Orthopaedic surgeons, as clinical scientists, are concerned with the causes of disease and with distinguishing treatments that are beneficial to their patients from those that may be detrimental. They read The Journal of Bone and Joint Surgery, as well as many other journals, in search of information that will allow them to be better health-care providers. In so doing, they rely on the authors, their reputations, and the editorial boards to guide them because they lack the knowledge to evaluate critically the biases in the study designs and the inferences made with statistics. Epidemiological principles form the foundation of causation theory, statistical inference, and research-study designs.

The purpose of the current review is to consider epidemiological concepts that are important to the study of musculoskeletal disease so that the reader may better understand the different study-design strategies for the investigation of cause-and-effect relationships and may appreciate the limitations of each.


    Epidemiological Observations
It has been stated that "any work which stands to elucidate the cause of disease, the mechanism of disease, the cure of disease, or prevention of disease, must begin and end with observations on man, whatever the intermediate steps may be."31 Observations and measurements form the fundamental units of data. The quality of the data is commonly described with use of four terms: accuracy, precision, reliability, and validity. Accuracy is the degree to which a measurement represents the true value of the attribute being measured26. A measurement or observation can represent a true value without detail. To say that a man is obese may be an accurate observation; to say that he weighs 310 pounds and eleven ounces (140.9 kilograms) is precise. Precision is the quality of being sharply defined through exact detail26. Reliability is a measure of how dependably an observation is exactly the same when repeated; it refers to the measuring procedure rather than to the attribute being measured26. Reliability is not synonymous with repeatability or reproducibility; rather, it is a broader term that includes the concept of consistency, which refers to how closely the findings in different samples or populations conform to one another under different conditions or at different times.

In epidemiological terms, a test is valid if it measures what it purports to measure. When the results obtained from a study are distorted because of bias in the study design or the analysis of the data, the study lacks validity. An important distinction must be made between the terms internal validity and external validity. Internal validity concerns inferences about the population of individuals of restricted interest from which a study sample has been drawn; external validity concerns inferences about an external population beyond the study's restricted interest. For example, many methods for plotting the factors pertaining to limb-length discrepancy have been described. Anderson et al. devised charts for the prediction of the amount of growth remaining in a limb on the basis of data collected from 100 children (fifty girls and fifty boys) at the Children's Hospital in Boston1-3. Fifty-one of these children were normal, and forty-nine had paralytic poliomyelitis affecting one lower extremity, which was not included in the study. Moseley pointed out that the assumption that the lengths of the lower extremities of all children of a certain skeletal age are the same proportion of the limb lengths of the same individuals when they reach adulthood, regardless of their growth percentile or chronological age, is unlikely to be true for children of different races or for those with a markedly different familial habitus28. Therefore, an orthopaedist residing in a continent other than North America might have reason to question the external validity of the data reported by Anderson et al. and might want to repeat those studies in a more relevant population before making inferences that will be used to manage patients locally.


    Diagnosis of Disease
Before a disease can be studied, it is necessary to define it. Often, this results in a case definition, which may differ from a clinical definition. For example, carpal tunnel syndrome can be defined by its historical, physical, or electrophysiological parameters, or by some combination of these parameters. Studies in which different definitions are used may lead to different conclusions. For example, the case definition of a disease may be different for screening purposes than for diagnostic purposes or for determining its etiology, and this may lead to confusion in interpreting the results of the study. Screening is the application of a test to people who are asymptomatic for the purpose of classifying them with respect to their likelihood of having a particular disease. Individuals who have a positive test result subsequently are tested further in order to establish a diagnosis. For example, a positive result on skin-testing for tuberculosis indicates that a chest radiograph should be made. Similarly, surveillance is used to detect changes in the trends or distribution of disease so that investigative or control measures can be initiated if needed.

The surveillance case definition for work-related carpal tunnel syndrome proposed by the National Institute for Occupational Safety and Health includes symptoms related to the median nerve; at least one occupational risk factor; and objective evidence of carpal tunnel syndrome on physical examination, including the Tinel sign or Phalen sign, decreased sensation to pinprick, and positive findings on nerve-conduction studies39. These types of definitions that rely on some identifying tests should be examined in terms of four parameters: sensitivity, specificity, positive predictive value, and negative predictive value (Table I). Sensitivity refers to the proportion of individuals in a tested population who actually have a given disease and are identified as having it. Sensitivity is defined as the number of true-positive results divided by the sum of the true-positive and false-negative results. Specificity refers to the proportion of individuals in a tested population who do not have a given disease and are identified as not having it. Specificity is defined as the number of true-negative results divided by the sum of the true-negative and false-positive results. Generally, sensitivity is increased at the expense of specificity. For diagnostic and screening tests, the probability that a person who has a positive result truly has a given disease is known as the positive predictive value, while the probability that a person who has a negative result truly does not have the disease is the negative predictive value. Sensitivity, specificity, and predictive values are conditional probabilities. Katz et al. found that applying the case definition of carpal tunnel syndrome proposed by the National Institute for Occupational Safety and Health, without using electrodiagnostic studies, to a sample of seventy-eight symptomatic workers resulted in the misclassification of 38 per cent; 50 per cent of those who satisfied the case definition did not have carpal tunnel syndrome according to electrodiagnostic criteria, whereas 25 per cent who did not satisfy the definition did have it19. Therefore, this case definition would not be useful as a basis for making decisions about treatment or for determining causation of symptoms unless electrodiagnostic studies were included.
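The four parameters defined above can be computed directly from the counts of true-positive, false-positive, false-negative, and true-negative results. The following is a minimal sketch in Python; the class and function names are illustrative, and the counts are hypothetical values chosen for the example, not data from Katz et al.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from comparing a test result with a reference standard."""
    true_pos: int
    false_pos: int
    false_neg: int
    true_neg: int


def diagnostic_measures(t: TwoByTwo) -> dict:
    """Return sensitivity, specificity, and predictive values as proportions."""
    return {
        # Proportion of diseased individuals identified as diseased.
        "sensitivity": t.true_pos / (t.true_pos + t.false_neg),
        # Proportion of disease-free individuals identified as disease-free.
        "specificity": t.true_neg / (t.true_neg + t.false_pos),
        # Probability that a positive result reflects true disease.
        "ppv": t.true_pos / (t.true_pos + t.false_pos),
        # Probability that a negative result reflects true absence of disease.
        "npv": t.true_neg / (t.true_neg + t.false_neg),
    }


# Hypothetical counts for illustration only (not data from any cited study).
example = TwoByTwo(true_pos=40, false_pos=10, false_neg=20, true_neg=130)
measures = diagnostic_measures(example)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

Note that the predictive values, unlike sensitivity and specificity, depend on how common the disease is in the tested population, so the same test can yield different predictive values in different settings.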



TABLE I

 


    Epidemiological Measures of Disease
In epidemiological studies, groups of people are compared quantitatively with respect to some framework of time. The measurement of disease at one particular time provides a prevalence rate. Prevalence is the total number of individuals who have a characteristic or disease at a particular point in time divided by the number who are at risk of having that characteristic or disease at that designated point in time26. Prevalence depends on both the number of people who have had the disease or characteristic in the past and the duration of the disease or characteristic. The measurement of disease over a period of time provides an incidence rate. Incidence is the number of new cases of a disease in a defined population within a specified time-period divided by the number who are at risk of having that disease or characteristic at that designated time-period26. In trying to determine whether a given exposure (such as occupational repetitive trauma) influences the development of a given disease (such as carpal tunnel syndrome), a comparison of the incidences with and without the exposure provides more information about whether the disease is due to that exposure, whereas a determination of prevalence simply reveals the rate of disease among individuals who have and have not been exposed.
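The distinction between the two measures can be illustrated with a short calculation. This is a sketch under stated assumptions: the function names are illustrative, the workforce numbers are hypothetical, and incidence is computed as the proportion of initially disease-free individuals who become new cases during the period, following the definition given above.

```python
def prevalence(existing_cases: int, population_at_risk: int) -> float:
    """Proportion of the population with the disease at one point in time."""
    return existing_cases / population_at_risk


def incidence(new_cases: int, at_risk_during_period: int) -> float:
    """Proportion of at-risk individuals who develop the disease in a period."""
    return new_cases / at_risk_during_period


# Hypothetical workforce of 1000 people surveyed on a single day:
# 50 already have the disease, so 950 remain at risk of developing it.
point_prevalence = prevalence(existing_cases=50, population_at_risk=1000)

# Hypothetical one-year follow-up of the 950 disease-free workers: 19 new cases.
one_year_incidence = incidence(new_cases=19, at_risk_during_period=950)

print(f"prevalence: {point_prevalence:.3f}")
print(f"one-year incidence: {one_year_incidence:.3f}")
```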

The concept of an association, or statistical dependence, between a factor and a disease is fundamental to ascription of the factor as possibly causal. An exposure or attribute that increases the probability of a disease is a risk factor; for example, negative ulnar variance is a risk factor for Kienböck disease7,16. Epidemiologists calculate a measure of association as a single summary parameter that estimates the association between a disease and a so-called exposure. A single summary parameter is a representative measurement of a population under study just as a mathematical mean describes a group by representing the average. These measures of association draw on two concepts: the probability of an event or disease and the odds of an event or disease. The probability of an event, or any outcome of interest, is the relative frequency of this event over an infinite number of random trials. The probability (Pr) of an event (E) is always greater than or equal to 0 and less than or equal to 1, expressed as 0 ≤ Pr(E) ≤ 1. A conditional probability is the probability of an event (E), given that another event (D) has occurred, expressed as Pr(E|D) and equal to Pr(E and D)/Pr(D). Odds is the ratio of the probability of an event occurring to that of it not occurring26.
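The relationships among probability, conditional probability, and odds can be written as three small functions. This is an illustrative sketch; the function names and the numerical inputs are assumptions made for the example.

```python
def conditional_probability(pr_e_and_d: float, pr_d: float) -> float:
    """Pr(E|D) = Pr(E and D) / Pr(D); Pr(D) must be greater than 0."""
    if pr_d <= 0:
        raise ValueError("Pr(D) must be greater than 0")
    return pr_e_and_d / pr_d


def odds_from_probability(p: float) -> float:
    """Odds = probability of the event divided by probability of no event."""
    if not 0 <= p < 1:
        raise ValueError("probability must satisfy 0 <= p < 1")
    return p / (1 - p)


def probability_from_odds(odds: float) -> float:
    """Inverse of odds_from_probability."""
    return odds / (1 + odds)


# Hypothetical values: an event with probability 0.2 has odds of 1 to 4.
print(odds_from_probability(0.2))
print(probability_from_odds(0.25))
print(conditional_probability(pr_e_and_d=0.1, pr_d=0.4))
```

For small probabilities the odds and the probability are nearly equal, which is why the odds ratio approximates the relative risk when a disease is rare.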

The two most frequently used measures of association are the relative risk and the odds ratio. Relative risk provides an estimate of the magnitude of an association between an exposure and a disease; it is the ratio of the risk of the disease among exposed individuals to that among unexposed individuals. The relative risk of an event (E) given another event (D) is a ratio of two conditional probabilities, expressed as: Pr(E|D)/Pr(E|not D). If E and D are independent, then the relative risk will be 1. The greater the dependence between the events, the further the relative risk is from 1. The relative risk can be calculated on the basis of historical or prospective cohort studies as well as cross-sectional studies. The odds ratio is the ratio of the odds of a disease occurring among exposed individuals to that of it occurring among unexposed individuals. The odds ratio is used as a measure of effect in studies (such as case-control studies) in which incidence rates cannot be derived directly. In case-control studies, participants are selected on the basis of disease status; therefore, it is not possible to calculate the rate of disease on the basis of the presence or absence of exposure. The odds ratio and risk ratio are very similar in instances of rare disease but are quite dissimilar when the prevalence begins to exceed 5 or 10 per cent.

To facilitate the calculation of these measures, epidemiological data are presented in a two-by-two table (Table II). Such a table can be used, for example, to describe a radiographic study of the association between ulnar variance and Kienböck disease (Table III)7. The odds ratio is the appropriate measure of association for this case-control study, but in this example it is not very different from the calculated relative risk. Although the authors of that study found a significant association (reported as p = 0.0000; that is, the probability of an association this strong occurring by chance alone is less than one in 10,000) between negative ulnar variance and Kienböck disease, they were careful to point out that "the association must not be considered a primary etiological relationship."7
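Both measures can be computed from the four cells of a two-by-two table. The sketch below assumes a conventional layout (a = exposed with disease, b = exposed without disease, c = unexposed with disease, d = unexposed without disease), which may differ from the layout of Table II; the counts are hypothetical and are not the data of Table III.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Risk of disease among the exposed divided by risk among the unexposed."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds of disease among the exposed divided by odds among the unexposed."""
    return (a * d) / (b * c)


# Hypothetical rare disease: risks of 0.2 and 0.1 per cent.
rare = dict(a=2, b=998, c=1, d=999)
# Hypothetical common disease: risks of 40 and 20 per cent.
common = dict(a=40, b=60, c=20, d=80)

for label, table in (("rare", rare), ("common", common)):
    rr = relative_risk(**table)
    oratio = odds_ratio(**table)
    print(f"{label}: relative risk = {rr:.3f}, odds ratio = {oratio:.3f}")
```

With these hypothetical counts the relative risk is 2 in both tables, but the odds ratio stays close to 2 only for the rare disease, which illustrates why the two measures diverge as the disease becomes more common.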



TABLE II

 


TABLE III

 


    Causal Inference
Contemporary epidemiologists are concerned with determining the etiology of disease. A cause is an event that, either alone or in conjunction with other elements, produces a sequence of other events that result in an effect34. Orthopaedists frequently are asked by patients, referring physicians, employers, and others: "What is the cause of this disease?" when, for example, a patient has Kienböck disease. It is hopeless to attempt to answer that question; however, we may be able to answer other questions, such as "What are some of the causal contingencies in Kienböck disease?" and "Is this factor (for example, repetitive microtrauma) a risk factor for Kienböck disease?" Epidemiologists prefer to use the term risk factor instead of cause to indicate an attribute or exposure that is related to an increased probability of a disease. Additionally, for a factor to be considered a risk factor, it must precede the occurrence of the disease and the observed association must not be due to problems with the study design or the analysis of the data.

The important, or primary, causal contingency may depend on one's viewpoint. For example, when an intoxicated, speeding motorcycle driver is struck by an automobile, sustains multiple fractures, and subsequently dies from adult respiratory distress syndrome in the intensive-care unit, a pathologist may attribute the cause of death to pulmonary interstitial edema; an internist, to cardiac asystole; an orthopaedic surgeon, to a delay in stabilizing the femoral and pelvic fractures; Mothers Against Drunk Driving (MADD), to alcohol intoxication; and a law-enforcement official, to excessive speed. Each observer selects the causal contingency on the basis of his or her perspective. It is compelling to search for a primary, or the most important, causal factor among many. From a practical standpoint, if the goal is to prevent a disease it may be more beneficial to focus on a causal factor somewhat remote from the disease. For instance, general improvement in living conditions and economic development in underdeveloped countries can do more to reduce the incidence of tuberculosis and its orthopaedic manifestations than can any vaccination, chemotherapy, or operative procedure.

Epidemiologists, more than other medical scientists, are concerned with philosophical theories rather than the purely technical aspects of the experimental method because experiments play a minor role in the analysis of naturally occurring phenomena44. Causal inference is "the logical development of a theory based on observations and a series of arguments, that attributes the development of a disease to one or more risk factors."22 Different philosophical constructs underlie the process for making an inference. Reasoned arguments can proceed either from the general to the particular (deductive reasoning) or from the particular to the general (inductive reasoning). The traditional view of science is that induction—the formation of a hypothesis based on observation—is cardinal to the scientific method. Hume, without using the term induction, asserted that the generalization of a necessary connection between cause and effect cannot be derived from experience but depends on contiguity and succession, which rely on repeated observations11. The flaw of a hypothesis that is derived by induction is that it can be refuted by the first observation that proves an exception. For example, if, by induction, we believe that all hearts are on the left side of the chest, our belief can be destroyed by one patient who has dextrocardia.

Popper, who was one of the most influential philosophers of science of the twentieth century41, rejected induction, claiming that it is always possible to produce a theory to fit any set of observations4. He popularized a so-called hypothetico-deductive system, advocating deductive reasoning for making predictions from a hypothesis and for stating what it prohibits. Thus, knowledge is advanced by testing hypotheses and discarding those that fail. According to this refutation, or falsification, perspective, the epidemiologist starts with a hypothesis, collecting data about a disease that then are used either to refute or to accept the hypothesis. The alternation between the generation of hypotheses and the collection of data allows many hypotheses to be discarded without any experiment being performed. Thus, experiments are reserved for the testing of hypotheses that have survived deductive efforts to falsify them in observational studies4. Others have argued that no scientific theory can be falsified with certainty because a theory always includes auxiliary hypotheses, and if some of the consequences of the theory turn out to be false one of the auxiliary hypotheses rather than the theory may be false30. For example, every study involves the auxiliary hypothesis that no unknown bias is occurring, and it may be this hypothesis that is falsified rather than the main theory. Any study can falsify a theory only to a degree. Judgments about causality in epidemiology depend on new knowledge. Kuhn, in his thesis of scientific revolutions, argued that the evidence that scientists draw on is determined by an overriding contemporary paradigm, which dictates the way in which a causal sequence is construed23. For example, tuberculosis was thought to be a social disease in the nineteenth century, and malaria was thought to be caused by bad air. 
After Koch discovered the tubercle bacillus and developed the germ theory, a new paradigm was created that attributed specific diseases to specific agents43.

The probability that an association exists is the first criterion used in causal inference in epidemiology. For an association to be considered causal, the cause must precede the effect (the property of time-order) and there must be an asymmetrical direction such that the cause leads to the effect43. Hill established classic operational causal criteria: strength of association, consistency, specificity, temporality, biological plausibility, dose-response effect, coherence, experimental evidence, and analogy13,15. Strength of association refers to the extent to which a supposed cause and effect are related and should not be confused with statistical significance. The most common measure of strength of association is relative risk42. Another measure of association is the rate ratio, which is the ratio of the incidence rate among exposed individuals to that among unexposed individuals. The larger the relative risk or rate ratio for any suspected cause, the more likely the association is to be causal. An association may be weak yet still be highly significant in a study with large numbers. For example, in a study of 34,243 hip fractures, Medicare data demonstrated a weak but significant association between geographic area of residence and the risk of hip fracture in women18. The rate ratio for women living in the Middle Atlantic states was 1.07 (95 per cent confidence interval, 1.01 to 1.14), whereas for those in the East South Central states it was 0.86 (95 per cent confidence interval, 0.79 to 0.94). Therefore, the risk of hip fracture for women in the Middle Atlantic region of the United States is approximately 24 per cent higher (1.07/0.86 ≈ 1.24) than that for women in the East South Central region, demonstrating a weak but significant association perhaps worthy of additional study.

Consistency means that a result is found in many studies despite different circumstances, research designs, or time-periods. Specificity describes the precision with which a factor will predict the occurrence of a specific disease; it adds plausibility to the causal claim but, if absent, does not detract from it43. Temporality is the property of time-order. Biological plausibility means that the hypothesized effect makes sense in the context of current biological knowledge. A dose-response effect is present when the frequency of a disease increases with the dose or level of exposure. Coherence refers to an association that is compatible with a preexisting theory about the outcome and the suspected causal factor. Experimental evidence may demonstrate an alteration in the frequency of the associated events and thus may support a causation hypothesis. Analogy is the process of reasoning by comparing similar cause-and-effect relationships, such as the effect of rubella during pregnancy with that of a similar virus during pregnancy.

None of Hill's criteria are either necessary or sufficient for making a causal judgment, but they are helpful in answering the question: "Is there any other way of explaining the set of facts before us, is there any other answer equally, or more likely than cause and effect?"13 In summary, epidemiologists employ both deductive and inductive methods. The variation among the viewpoints of epidemiologists with regard to causality is rooted in the variation among philosophical viewpoints. Hill's criteria provide interpretive guidelines for evaluating epidemiological evidence. The establishment of causation is a continuing and evolving process.


    Statistical Inference
Statistical inference is part of the basis of epidemiology because the observations studied by epidemiologists are subject to random fluctuations17. The testing of hypotheses is a statistical procedure to determine the probability that the data that are collected are consistent with the specific hypothesis under investigation. The hypothesis in the study of the association between ulnar variance and Kienböck disease, for example, may be denoted as HA (alternative hypothesis) and stated as: there is a difference in the prevalence of Kienböck disease among people who have negative ulnar variance compared with those who have neutral or positive ulnar variance. The opposite of HA is a hypothesis, HO, that states: there is no difference in the prevalence of Kienböck disease among people who have negative ulnar variance compared with those who have neutral or positive ulnar variance. Conventionally, the investigator seeks to deny or nullify HO; thus, it is known as the null hypothesis. HA and HO are mutually exclusive and exhaustive: that is, one or the other must be true, but they cannot both be true. Thus, if HO is denied, then HA is affirmed. HA and HO refer to the entire population, even though data are available only from one sample of patients.

The significance test is based on the calculation of a test statistic (for example, the t value, z value, or chi-square value) and on some theoretical assumptions. One such assumption is that the null hypothesis is true, and another is that sampling uncertainty is random. On the basis of these assumptions, it is possible to calculate how likely or unlikely the outcome observed in the sample would be. The test statistic, therefore, is a number that compares the observed and expected values of the parameter being measured under the null hypothesis. The significance test encompasses the rationale that there is some range of test-statistic values such that either the assumptions of the test, including the null hypothesis, are true and a rare event has occurred or one of the assumptions is untrue and, specifically, the null hypothesis is false29. The point at which a test-statistic value is rare enough to warrant rejection of the null hypothesis is determined by convention but typically is set at the value that would occur no more than 5 or 1 per cent of the time in repeated tests if the null hypothesis were true. This absolute value, which must be exceeded in order for the null hypothesis to be rejected, is called the critical value. The probability threshold below which a result is judged too unlikely to be consistent with HO being true is known as the significance level, or α; it is conventionally set at either 0.05 or 0.01. The p value calculated from the data is compared with this level. In summary, if a test statistic as extreme as the one observed would occur less than 5 (or 1) per cent of the time under random sampling were the null hypothesis true, then the result is considered significant and the null hypothesis is rejected in favor of HA.
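As one concrete instance of this procedure, the chi-square statistic for a two-by-two table can be compared with the critical value for one degree of freedom. The sketch below uses the uncorrected Pearson chi-square formula (an assumption, since the text does not specify a particular test), the same hypothetical cell layout as a standard exposure-by-disease table, and hypothetical counts.

```python
import math

# Critical value of the chi-square distribution, 1 degree of freedom, alpha = 0.05.
CRITICAL_VALUE_05 = 3.841


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Uncorrected Pearson chi-square statistic for a two-by-two table."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_df1(chi2: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts: 40 of 100 exposed and 20 of 100 unexposed have the disease.
chi2 = chi_square_2x2(a=40, b=60, c=20, d=80)
p = p_value_df1(chi2)
reject_null = chi2 > CRITICAL_VALUE_05

print(f"chi-square = {chi2:.3f}, p = {p:.4f}, reject null hypothesis: {reject_null}")
```

Comparing the statistic with the critical value and comparing the p value with α are two views of the same decision; both lead to the same conclusion for a given α.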

Many tests of significance have been used, depending on the nature and distribution pattern (discrete, continuous, categorical, and so on) of the data being analyzed. Often, an investigator selects the appropriate test, calculates the test statistic (for example, the t value, z value, or chi-square value) and the significance level (the p value) with use of the collected data, and either rejects or fails to reject the null hypothesis. One caveat is to beware of what Sackett called "data dredging bias."36 When data are analyzed for all possible statistical associations without a previous specific hypothesis, the likelihood that an investigator will find a significant association by chance alone increases as the number of statistical tests that are performed increases. Unplanned data analyses should be identified as such; as Lang and Rhodes stated, "if the fishing expedition catches a boot, the fisherman should not claim that they were fishing for boots."25
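The growth of this risk can be quantified under a simplifying assumption that the tests are independent and that every null hypothesis is true; real analyses of a single data set are usually correlated, so the figures below are only indicative. The function name is illustrative.

```python
def prob_at_least_one_false_positive(alpha: float, n_tests: int) -> float:
    """Chance of one or more significant results when all null hypotheses are true.

    Assumes the tests are statistically independent of one another.
    """
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 20, 100):
    risk = prob_at_least_one_false_positive(alpha=0.05, n_tests=n_tests)
    print(f"{n_tests:>3} tests: {risk:.3f}")
```

Under these assumptions, twenty unplanned tests at the 0.05 level carry a better-than-even chance of producing at least one spurious significant association.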

There are two types of error associated with the significance test. It is possible to reject the null hypothesis when a rare event has occurred even though the null hypothesis is actually true. This is called a type-I error. In all cases in which a null hypothesis is true, a type-I error will occur (100 x α) per cent of the time, where α equals the significance level (usually 0.05 or 0.01). In a type-II error, the null hypothesis is false but the calculated test statistic is not significant and therefore is determined to be consistent with the null hypothesis being true. The null hypothesis is thus accepted in error. The relative frequency with which a type-II error occurs is symbolized by β. An experiment usually is designed so that β is no greater than 0.20. Power is the complement of the probability of a type-II error, that is, the probability that the null hypothesis will be rejected when it is indeed false, and is equal to (1 - β). Power is a function of the level of significance, the reliability of the sample data (the degree of spread in the data or the standard deviation), the size of the sample, and the size of the experimental effect. A power analysis should be considered if no significant difference can be found between the two groups being compared in a study. The groups may not be large enough to allow detection of a significant difference (the null hypothesis being that there is no difference), and the power analysis will demonstrate the probability that the null hypothesis would have been rejected if a difference of a given size truly existed. The probabilities of type-I and type-II errors are inversely related for any determined experimental design and fixed sample size. Power can be increased by improving the experimental design and increasing the sample size.
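The dependence of power on sample size and significance level can be sketched for one simple case: a two-sided comparison of two group means with a known, common standard deviation (a z-test). This choice of test, the function name, and the numerical inputs are assumptions made for illustration; a real power analysis should match the test actually used in the study.

```python
from statistics import NormalDist


def power_two_sample_z(effect: float, sd: float, n_per_group: int,
                       alpha: float = 0.05) -> float:
    """Approximate power (1 - beta) of a two-sided, two-sample z-test.

    effect: true difference between the two group means
    sd: common standard deviation of the observations in each group
    """
    z = NormalDist()
    standard_error = sd * (2 / n_per_group) ** 0.5
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(effect) / standard_error
    # Probability that the test statistic falls beyond either critical value.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Hypothetical study: true difference of 5 units, standard deviation of 10 units.
for n in (20, 64, 100):
    print(f"n per group = {n}: power = {power_two_sample_z(5, 10, n):.3f}")
```

In this sketch, raising the sample size raises power, whereas tightening α from 0.05 to 0.01 at a fixed sample size lowers it, which is the inverse relationship between the two types of error described above.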

Estimation is another tool used in statistical inference concerned with calculation of the values of specific population parameters. A point estimate (the sample mean) can be used to estimate the population mean. A weakness in the point estimate is its failure to make a probability statement regarding how close the estimate is to the population parameter24. A confidence interval estimate remedies this problem by providing an interval of plausible estimates of the population mean as well as a best estimate of its precise value. The confidence interval that is conventionally chosen is 95 or 99 per cent, similar to the conventional choice of 0.05 or 0.01 for the level of significance. The 95 per cent confidence interval means that, assuming the sample mean will follow an approximately normal distribution, 95 per cent of all sample means based on a given sample size will fall within ±1.96 standard errors of the population mean. This number is derived from the mathematics of the normal distribution. Similarly, 99 per cent of all sample means based on a given sample size will fall within ±2.576 standard errors of the population mean (the 99 per cent confidence interval)33. A confidence interval, therefore, is a range of values for a study variable specifying the probability (usually 95 per cent) that the true value of the variable is included within the range. The size of a confidence interval gives some idea of the precision of the point estimate in a way that is not offered by a p value; the narrower the confidence interval, the more precise the estimate. As the sample size increases, the size of the confidence interval decreases. As the standard deviation increases, reflecting the increased variability between individual observations, the size of the confidence interval increases. As the desired level of confidence increases (for example, from 95 to 99 per cent), the size of the confidence interval increases.
For example, as determined from the sample represented in Table III, the odds of Kienböck disease occurring in association with negative ulnar variance are about sixteen times greater than the odds of it occurring in association with positive or neutral ulnar variance. The 95 per cent confidence interval for this odds ratio is 5.2 to 50.6; that is, if the study were repeated many times, about 95 per cent of the intervals calculated in this way would be expected to include the true odds ratio. To base confidence on the confidence interval, however, and to accept this estimate as a statement of truth about the general population requires the leap of faith characteristic of induction. Instead, it is necessary to use deductive reasoning and to interpret the estimate as a tentative, as yet unrefuted hypothesis27.
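The multipliers of 1.96 and 2.576 standard errors, and the way the width of the interval responds to the level of confidence, can be reproduced with a few lines of Python. This is a sketch under the normal approximation described above (for a small sample, a t distribution would ordinarily be preferred); the function names and the measurements are invented for illustration and do not come from Table III.

```python
from statistics import NormalDist, mean, stdev


def z_multiplier(confidence: float) -> float:
    """Number of standard errors spanning the central `confidence` area."""
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)


def mean_confidence_interval(values, confidence: float = 0.95):
    """Normal-approximation confidence interval for a population mean."""
    standard_error = stdev(values) / len(values) ** 0.5
    half_width = z_multiplier(confidence) * standard_error
    centre = mean(values)
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical measurements, invented for illustration only.
    sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.4, 5.6, 4.9, 5.2, 4.7]
    print(round(z_multiplier(0.95), 3), round(z_multiplier(0.99), 3))
    print(mean_confidence_interval(sample, 0.95))
    print(mean_confidence_interval(sample, 0.99))
```

For the same data, the 99 per cent interval is always wider than the 95 per cent interval because the multiplier is larger while the standard error is unchanged.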

Statistical reasoning is based on the precepts that natural processes can be described by stochastic models and that the study of random collections of individuals will allow identification of "systematic patterns of scientific import."45 The truth or falsehood of a hypothesis cannot be inferred from a significance test. The type-I and type-II error rates define a critical region for the summary statistic, which represents a decision rule as to whether the null hypothesis is to be rejected or not. Goodman warned that a decision rule tells nothing about whether a particular hypothesis is true; it says only that, if investigators behave according to such a rule, in the long run they will reject a true hypothesis no more than, for example, once in 100 times and they may have evidence that they will reject the hypothesis sufficiently often when it is false8. Sophisticated statistical techniques are often applied to poor data, which they cannot redeem. Susser pointed out that significance testing alone is insufficient for the causal analysis of data40. A significance test makes no assumption about the plausibility of the null hypothesis. Investigators must focus on the substance of the issues being studied and must not become sidetracked by the mechanics of data analysis. Judgment must take precedence over statistical inference when large sources of error are not quantified in the statistical analysis. These sources of error are collectively known as bias and deserve careful consideration in the design, execution, and analysis of all studies.


    Bias
 
Bias is defined as "any trend in the collection, analysis, interpretation, publication, or review of data that can lead to conclusions that are systematically different from the truth."26 Bias can lead to an incorrect estimation of the association between an exposure and the risk of a disease. Such an association is considered real if all attempts to explain it away as due to bias have failed. There are three broad categories of bias: confounding bias, selection bias, and information bias22.

Confounding is a distortion in an effect measure (such as relative risk) that results from an effect of another variable (the confounder) associated with the disease and exposure being studied. Confounding can lead to an overestimate or an underestimate of the true association between a disease and an exposure and even can change the direction of the observed effect. For a factor to be a confounder, it must, in and of itself, be a risk factor for the disease in the unexposed population and it must be associated with the exposure variable in the population from which the cases were derived. In addition, it must not be an intermediate step in the causal pathway between the exposure and the disease. For example, if a study were to be designed to explore whether smoking is a risk factor for motor-vehicle accidents, consumption of alcohol would be considered a confounder because it both is a risk factor for motor-vehicle accidents and is associated with smoking. During the introduction of a new operative procedure, the good-risk patients may be selectively managed with the procedure while the poor-risk patients may receive the standard treatment. This is called a confounding-by-indication bias10. Age and gender are well-recognized confounders.
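The smoking, alcohol, and motor-vehicle-accident example can be illustrated with invented counts in which smoking has no effect within either stratum of alcohol consumption, yet the crude (unstratified) odds ratio suggests a strong association. The sketch below uses the Mantel-Haenszel summary odds ratio, a standard stratified estimator that is not described in this article; all counts and names are hypothetical.

```python
def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """Odds ratio for a 2x2 table.

    a: exposed, diseased      b: exposed, not diseased
    c: unexposed, diseased    d: unexposed, not diseased
    """
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(strata) -> float:
    """Mantel-Haenszel summary odds ratio across (a, b, c, d) strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts: smoking (exposure) and accidents (disease),
# stratified by the confounder (alcohol consumption).
DRINKERS = (40, 60, 10, 15)
NON_DRINKERS = (5, 45, 20, 180)

if __name__ == "__main__":
    crude = tuple(x + y for x, y in zip(DRINKERS, NON_DRINKERS))
    print("crude OR:", round(odds_ratio(*crude), 2))
    print("drinkers OR:", odds_ratio(*DRINKERS))
    print("non-drinkers OR:", odds_ratio(*NON_DRINKERS))
    print("adjusted OR:", mantel_haenszel_odds_ratio([DRINKERS, NON_DRINKERS]))
```

In these invented data, alcohol consumption is more common among smokers and raises the accident risk among non-smokers, so it satisfies the conditions for a confounder; the crude odds ratio of about 2.8 falls to 1.0 once the analysis is stratified.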

An effect modifier is a factor that changes the magnitude of an effect measure (for example, relative risk or odds ratio). Effect modification differs from confounding: the latter is a bias that the investigator tries to prevent or remove from the data, whereas the former is a constant of nature. For example, immunization status against serum hepatitis is an effect modifier for the consequences of being stuck by a needle that was used on a person infected with hepatitis. Immunization status is an effect modifier because people who are immunized are less likely to contract hepatitis than those who are not.

Selection bias (also known as detection bias and unmasking bias) refers to a distortion in the estimation of an effect due to systematic differences in characteristics between subjects who are selected for a study and those who are not. Selection bias can result when a procedure used to identify a disease varies with the exposure status. For example, this bias can be introduced by an examiner who performs a clinical evaluation without being blinded to the disease or exposure status of the subject. The results of the evaluation may differ if the examiner expects the disease to be present in the patients and absent in the controls. A lack of a response to a questionnaire or loss of patients to follow-up would not be serious problems if they merely resulted in a reduction in the number of subjects available for study; however, they may result in selection bias if the respondents and the non-respondents or the patients being followed and those who have been lost differ with respect to some characteristic being studied.

Information bias (also known as observational bias and misclassification bias) refers to a distortion in the estimation of an effect that results from error in the measurement of either an exposure or a disease or from the misclassification of subjects with regard to at least one variable. In describing inaccuracy of measurement, two types of misclassification bias can occur: non-differential misclassification, when the inaccuracy is the same for the two study groups (for example, the patients and controls in a case-control study), and differential (non-random) misclassification, when the inaccuracy differs between groups (for example, when an exposure measure, such as repetitive work tasks, is determined more accurately among the patients than among the controls). Non-differential misclassification increases the similarity between the exposed and unexposed groups. Any association between exposure and disease will be underestimated, so the observed estimate of effect is said to be biased toward the null value of 1.0, meaning that if exposure and disease had no association the expected relative risk would be equal to 1.0. Differential misclassification leads to a biased risk estimate that may be either away from (an overestimate) or toward (an underestimate) the null value. Whereas confounding bias is generally correctable in the analysis stage of a study, selection and information biases may not be correctable.
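The direction of these biases can be checked with a small calculation of the counts expected when exposure is measured imperfectly. The sketch below is an illustration only: the true counts, the sensitivity and specificity values, and the function names are all hypothetical.

```python
def observed_counts(exposed, unexposed, sensitivity, specificity):
    """Expected (exposed, unexposed) counts after imperfect measurement."""
    seen_exposed = exposed * sensitivity + unexposed * (1.0 - specificity)
    seen_unexposed = exposed * (1.0 - sensitivity) + unexposed * specificity
    return seen_exposed, seen_unexposed


def observed_odds_ratio(cases, controls, accuracy_cases, accuracy_controls):
    """Odds ratio computed from misclassified exposure data.

    cases, controls: true (exposed, unexposed) counts
    accuracy_*: (sensitivity, specificity) of the exposure measure
    """
    a, c = observed_counts(*cases, *accuracy_cases)
    b, d = observed_counts(*controls, *accuracy_controls)
    return (a * d) / (b * c)


# Hypothetical true counts; the true odds ratio is (60 * 70) / (30 * 40) = 3.5.
CASES = (60, 40)
CONTROLS = (30, 70)

if __name__ == "__main__":
    perfect = (1.0, 1.0)
    print("true:", observed_odds_ratio(CASES, CONTROLS, perfect, perfect))
    # Same inaccuracy in both groups: non-differential misclassification.
    print("non-differential:",
          round(observed_odds_ratio(CASES, CONTROLS, (0.8, 0.9), (0.8, 0.9)), 2))
    # Exposure measured more accurately among the cases: differential.
    print("differential:",
          round(observed_odds_ratio(CASES, CONTROLS, (1.0, 0.9), (0.8, 0.9)), 2))
```

With these inputs, the non-differential scenario moves the odds ratio from 3.5 toward 1.0, whereas the differential scenario shown here moves it away from 1.0; other choices of differential accuracy could equally move it toward the null value.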


    Study-Design Strategies for Determining the Relationships between Exposure and Disease
 
In ecological (correlation) studies, data from entire populations are used to compare disease characteristics between different groups during the same period of time or in the same population at different points in time. In an ecological study, the population rather than the individual is designated as the unit of study, and the exposure status of each person is determined on the basis of a summary value for the group to which that person belongs. For example, an investigator might examine the relationship between per capita consumption of cigarettes and operative procedures for the treatment of non-unions in five counties in California in 1996. Even if this study showed that counties with a higher rate of consumption of cigarettes had a higher rate of operations for non-unions, the investigator still could not be sure that the individuals who smoked in these counties truly had a higher rate of non-union. This conclusion, if indeed it is erroneous, is known as an ecological fallacy because the correlation between two ecological variables is often different from the corresponding individual correlation within the same populations32. The ecological study design is not often used in investigations of musculoskeletal disease.

Fifteen types of study designs have been described for investigating the etiology of disease, but more commonly they are grouped into three categories of observational studies: cross-sectional, case-control, and cohort22.

Cross-sectional studies, also known as prevalence studies or disease-frequency surveys, are used to assess the status of an individual with respect to the presence or absence of both exposure and disease at the same point in time. It is not possible to determine whether the exposure preceded or resulted from the disease. Silverstein et al., in a cross-sectional study, determined the prevalence of carpal tunnel syndrome among 652 workers in thirty-nine jobs at seven different industrial sites39. Specific hand-force and repetitiveness characteristics then were estimated for the different jobs. The prevalence of carpal tunnel syndrome ranged from 0.6 per cent among workers in low-force, low-repetition jobs to 5.6 per cent among those in high-force, high-repetition jobs. In order to infer that a statistical association between job-related exposure factors and carpal tunnel syndrome is evidence of etiology, it should be demonstrated that the job exposure occurred before the carpal tunnel syndrome. The temporal relationship between physical load factors and the onset of carpal tunnel syndrome cannot be demonstrated in a cross-sectional study examining the prevalence of carpal tunnel syndrome. Prevalence studies are performed on survivor populations and thus may be affected by selection bias; for example, individuals who have more severe carpal tunnel syndrome may have left the workforce and thus may not be accounted for in a cross-sectional survey. Ecological and cross-sectional designs are descriptive epidemiological strategies that have as their objective the formulation of etiological hypotheses; however, because of their inherent limitations they can rarely be used to test hypotheses.

Case-control and cohort studies are the two basic types of observational analytical study designs that have as their objective the testing of hypotheses. The goal of an observational study is to arrive at the same conclusions that would have been derived in an experimental trial9. In a case-control study, subjects are identified on the basis of whether they have the disease of interest (for example, carpal tunnel syndrome) or not (controls), and past exposure to factors of interest (for example, repetitive trauma) is determined. This study design is good for the investigation of rare diseases but is highly susceptible to selection and recall bias. Recall bias occurs when a study relies on the patient's memory to determine exposure status because a patient who has a disease is more likely to remember possible exposures than a healthy person. Case-control studies are retrospective because they start after the onset of disease and postulated causal factors are evaluated retrospectively. In a case-control study of the epidemiology of acute herniation of lumbar intervertebral discs, Kelsey and Hardy compared the characteristics of patients who had such herniation with those of two control groups who were known not to have it and found that driving a motor vehicle was associated with an increased risk of herniated discs21. That study provides an excellent demonstration of how the many concerns in case-control studies, including selection of controls, analysis of data with various confounders, and consideration of other biases, should be addressed.
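Because subjects in a case-control study are sampled on the basis of disease status, the association with exposure is usually summarized as an odds ratio. The following sketch computes an odds ratio with a Woolf (log-based) confidence interval, one common approximate method; the counts are invented and do not come from the study by Kelsey and Hardy, and the function name is hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_with_ci(a, b, c, d, confidence=0.95):
    """Odds ratio and Woolf (log-based) confidence interval.

    a: cases exposed       b: controls exposed
    c: cases unexposed     d: controls unexposed
    All four counts must be greater than zero.
    """
    estimate = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    lower = exp(log(estimate) - z * se_log)
    upper = exp(log(estimate) + z * se_log)
    return estimate, lower, upper


if __name__ == "__main__":
    # Hypothetical study: 100 cases and 100 controls.
    estimate, lower, upper = odds_ratio_with_ci(a=30, b=20, c=70, d=80)
    print(f"OR = {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

In this invented example the interval includes 1.0, so the data are compatible with no association even though the point estimate is about 1.7.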

A cohort is a group of people who are followed over a period of time. In a cohort study, the study group is defined on the basis of exposure status and is followed forward in time in order to assess the occurrence of disease. All potential subjects must be free of the disease being studied at the time that exposure status is defined. A cohort study can be either prospective or retrospective. In a prospective (concurrent) cohort study, the disease has not yet occurred at the beginning of the study. Probably the most impressive prospective cohort study is the Framingham Study5, which began around 1950. Coronary heart disease and its consequences were thought by the medical profession to be inevitable changes of aging; however, clinical observations and descriptive epidemiological studies suggested that preventable environmental factors played a role. More than 5000 people who did not have coronary artery disease were enrolled in the long-term study, originally planned to last for twenty years, and each participant was given a comprehensive medical examination every two years5. The findings of this study have continued to emerge, delineating the risk factors for heart disease and associated atherosclerotic and non-atherosclerotic disorders, and they have led to preventive and treatment paradigms that have resulted in better health care.

In a retrospective (historical) cohort study, both the exposure and the disease have already occurred. It would be unethical to test the hypothesis that low levels of exposure to radiation shorten human life expectancy with use of a prospective study; however, if a group of people (a cohort) that already has been exposed can be identified, then even if the exposure was in the past a retrospective cohort study would be feasible. Seltser and Sartwell undertook such a study, comparing members of the Radiological Society of North America with members of other medical specialty societies37. These authors demonstrated that the death rate was highest among radiologists, intermediate among internists, and lowest among ophthalmologists and otolaryngologists; with improvements in equipment and safety techniques, this difference showed a so-called disappearing effect in the latter part of the study37. The advantages of a cohort study include a temporal sequence of exposure and disease that is usually clear, a minimization of observational bias in determining exposure, the ability to examine multiple effects of one exposure, and usefulness when exposure is rare. The disadvantages are that cohort studies are time-consuming, expensive, not suitable for the investigation of rare diseases, and potentially biased with regard to loss of subjects to follow-up when subjects must be followed for many years. A case-control study can be nested within a cohort study. In this study design, patients and controls are drawn from the population in a cohort study, as has been done frequently in the Framingham Study5. Nested case-control studies obviate the issue of recall bias because the data are collected prospectively.
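Because a cohort is defined by exposure status and followed for the occurrence of disease, the incidence in each group can be calculated directly and compared as a relative risk. A minimal sketch with invented counts follows; the function names are hypothetical.

```python
def cumulative_incidence(new_cases: int, people_at_risk: int) -> float:
    """Proportion of initially disease-free people who develop the disease."""
    return new_cases / people_at_risk


def relative_risk(cases_exposed: int, n_exposed: int,
                  cases_unexposed: int, n_unexposed: int) -> float:
    """Incidence in the exposed group divided by incidence in the unexposed."""
    return (cumulative_incidence(cases_exposed, n_exposed)
            / cumulative_incidence(cases_unexposed, n_unexposed))


if __name__ == "__main__":
    # Hypothetical cohort: 1000 exposed and 1000 unexposed subjects.
    print(round(relative_risk(30, 1000, 10, 1000), 2))
```

A relative risk of 1.0 would indicate no association between exposure and disease; in this invented example the incidence among the exposed is three times that among the unexposed.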

Intervention studies (clinical trials) are a type of cohort study in which the participants are assigned by the investigator to receive one of the exposures or treatments under study. The advantage of this design is that known and unknown confounders are distributed, on the average, equally among the study groups. Susceptibility bias may occur in a clinical trial if the two groups are dissimilar in terms of their initial state. A randomized clinical trial precludes the possibility of susceptibility bias20. A double-blind randomized trial is the best way to test the hypothesis that Coumadin (warfarin) prevents deep-vein thrombosis after total hip arthroplasty because randomization minimizes confounding and blinding eliminates selection bias. More than thirty years ago, Hill reflected on the newly popularized controlled trial: "The history of science, however, shows that frequently with a new discovery, a new technique, or a new theory of disease, the pendulum at first swings too far ... there is a `blind acceptance of double-blind trials without a critical evaluation of their short-comings and their ability to mislead as well as to lead.'"14 The problems with randomized operative trials in orthopaedics, including ethical, performance, outcome, and philosophical issues, were discussed in depth by Keller et al.20. The randomized-surgeon design has been suggested as a solution to some of these problems35.
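The idea that random assignment distributes prognostic factors equally on the average can be illustrated with a simple simulation. The sketch below uses simple (unrestricted) randomization into two groups of equal size; the participants, their ages, and the function name `randomize` are hypothetical, and an actual trial would follow a prespecified allocation procedure.

```python
import random


def randomize(participant_ids, seed=None):
    """Randomly split participants into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean_of(values_by_id, group):
    """Mean of a covariate within one allocation group."""
    return sum(values_by_id[pid] for pid in group) / len(group)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical ages stand in for a prognostic factor (a potential confounder).
    ages = {pid: rng.randint(50, 85) for pid in range(200)}
    treatment, control = randomize(ages, seed=2)
    print(round(mean_of(ages, treatment), 1), round(mean_of(ages, control), 1))
```

Any single allocation can leave some imbalance by chance; the balance holds on the average over repeated allocations and applies to unmeasured factors as well as to measured ones.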

A common practice in the epidemiological literature is the documentation of unavoidable departures from the ideal study design as well as a discussion of estimates of the magnitude and direction of biases and deduction of the extent to which these errors threaten or do not threaten validity. However, this approach has not been adopted routinely or rigorously in the orthopaedic literature. Gartland reviewed ten articles on the long-term follow-up of patients after primary total hip arthroplasty to determine the strategy that had been used in the design of each study6. He concluded that all of the studies were deficient in design, were flawed by confusing data, and contained results of doubtful validity. The quality of orthopaedic studies and literature will improve if sufficient attention is paid to alternative explanations and there is recognition of the central role that epidemiological principles play in advancing knowledge and understanding.


    Footnotes
 
*No benefits in any form have been received or will be received from a commercial party related directly or indirectly to the subject of this article. No funds were received in support of this study.

†Department of Orthopaedic Surgery, University of California, Davis, School of Medicine, 2230 Stockton Boulevard, Sacramento, California 95817. E-mail address: rmszabo@ucdavis.edu.


    References
 

  1. Anderson, M., and Green, W. T.: Lengths of the femur and the tibia. Norms derived from orthoroentgenograms of children from five years of age until epiphyseal closure. Am. J. Dis. Child., 75: 279-290, 1948.
  2. Anderson, M.; Green, W. T.; and Messner, M. B.: Growth and predictions of growth in the lower extremities. J. Bone and Joint Surg., 45-A: 1-14, Jan. 1963.
  3. Anderson, M.; Messner, M. B.; and Green, W. T.: Distribution of lengths of the normal femur and tibia in children from one to eighteen years of age. J. Bone and Joint Surg., 46-A: 1197-1202, Sept. 1964.
  4. Buck, C.: Popper's philosophy for epidemiologists. Internat. J. Epidemiol., 4: 159-168, 1975.
  5. Dawber, T. R.; Kannel, W. B.; and Lyell, L. P.: An approach to longitudinal studies in a community: the Framingham Study. Ann. New York Acad. Sci., 107: 539-556, 1963.
  6. Gartland, J. J.: Orthopaedic clinical research. Deficiencies in experimental design and determinations of outcome. J. Bone and Joint Surg., 70-A: 1357-1364, Oct. 1988.
  7. Gelberman, R. H.; Salamon, P. B.; Jurist, J. M.; and Posch, J. L.: Ulnar variance in Kienböck's disease. J. Bone and Joint Surg., 57-A: 674-676, July 1975.
  8. Goodman, S. N.: P values, hypothesis tests, and likelihood: implications for epidemiology of a neglected historical debate. Am. J. Epidemiol., 137: 485-496, 1993.
  9. Gray-Donald, K., and Kramer, M. S.: Causality inference in observational vs. experimental studies. An empirical comparison. Am. J. Epidemiol., 127: 885-892, 1988.
  10. Greenland, S., and Neutra, R.: Control of confounding in the assessment of medical technology. Internat. J. Epidemiol., 9: 361-367, 1980.
  11. Hendel, C. W. J. [editor]: Hume Selections. Vol. 1, pp. 22-28. New York, Charles Scribners, 1955.
  12. Hill, A. B.: Observation and experiment. New England J. Med., 248: 995-1001, 1953.
  13. Hill, A. B.: The environment and disease: association or causation? Proc. Roy. Soc. Med., 58: 295-300, 1965.
  14. Hill, A. B.: Reflections on controlled trial. Ann. Rheumat. Dis., 25: 107-113, 1966.
  15. Hill, A. B.: Principles of Medical Statistics. Ed. 9, pp. 288-296. New York, Oxford University Press, 1971.
  16. Hulten, O.: Über anatomische Variationen der Handgelenkknochen. Acta Orthop. Scandinavica, 9: 155-196, 1928.
  17. Jacobsen, M.: Against Popperized epidemiology. Internat. J. Epidemiol., 5: 9-11, 1976.
  18. Karagas, M. R.; Lu-Yao, G. L.; Barrett, J. A.; Beach, M. L.; and Baron, J. A.: Heterogeneity of hip fracture: age, race, sex, and geographic patterns of femoral neck and trochanteric fractures among the US elderly. Am. J. Epidemiol., 143: 677-682, 1996.
  19. Katz, J. N.; Larson, M. G.; Fossel, A. H.; and Liang, M. H.: Validation of a surveillance case definition of carpal tunnel syndrome. Am. J. Pub. Health, 81: 189-193, 1991.
  20. Keller, R. B.; Rudicel, S. A.; and Liang, M. H.: Outcomes research in orthopaedics. J. Bone and Joint Surg., 75-A: 1562-1574, Oct. 1993.
  21. Kelsey, J. L., and Hardy, R. J.: Driving of motor vehicles as a risk factor for acute herniated lumbar intervertebral disc. Am. J. Epidemiol., 102: 63-73, 1975.
  22. Kleinbaum, D. G.; Kupper, L. L.; and Morgenstern, H.: Epidemiologic Research: Principles and Quantitative Methods. Belmont, California, Lifetime Learning Publications, 1982.
  23. Kuhn, T. S.: The Structure of Scientific Revolutions. Ed. 2. Chicago, University of Chicago Press, 1970.
  24. Kuzma, J. W.: Basic Statistics for the Health Sciences. Ed. 2. Mountain View, California, Mayfield Publishing, 1992.
  25. Lang, T. A., and Rhodes, R.: Ten common statistical reporting errors in biomedical literature. CBE Views, 19: 82-83, 1996.
  26. Last, J. M. [editor]: A Dictionary of Epidemiology. Ed. 2. New York, Oxford University Press, 1988.
  27. Maclure, M.: Popperian refutation in epidemiology. Am. J. Epidemiol., 121: 343-350, 1985.
  28. Moseley, C. F.: A straight-line graph for leg-length discrepancies. J. Bone and Joint Surg., 59-A: 174-179, March 1977.
  29. Oakes, M.: Statistical Inference. Chestnut Hill, Massachusetts, Epidemiology Resources, 1990.
  30. Pearce, N., and Crawford-Brown, D.: Critical discussion in epidemiology: problems with the Popperian approach. J. Clin. Epidemiol., 42: 177-184, 1989.
  31. Pickering, G. W.: Opportunity and the universities. Lancet, 2: 895-898, 1952.
  32. Robinson, W. S.: Ecological correlations and the behavior of individuals. Am. Sociol. Rev., 15: 351-357, 1950.
  33. Rosner, B.: Fundamentals of Biostatistics. Ed. 4. Belmont, California, Duxbury Press, 1995.
  34. Rothman, K. J.: Causes. Am. J. Epidemiol., 104: 587-592, 1976.
  35. Rudicel, S., and Esdaile, J.: The randomized clinical trial in orthopaedics: obligation or option? J. Bone and Joint Surg., 67-A: 1284-1293, Oct. 1985.
  36. Sackett, D. L.: Bias in analytic research. J. Chronic Dis., 32: 51-63, 1979.
  37. Seltser, R., and Sartwell, P. E.: The influence of occupational exposure to radiation on the mortality of American radiologists and other medical specialists. Am. J. Epidemiol., 81: 2-22, 1965.
  38. Shy, C. M.: The failure of academic epidemiology: witness for the prosecution. Am. J. Epidemiol., 145: 479-487, 1997.
  39. Silverstein, B. A.; Fine, L. J.; and Armstrong, T. J.: Occupational factors and carpal tunnel syndrome. Am. J. Indust. Med., 11: 343-358, 1987.
  40. Susser, M.: Judgment and causal inference: criteria in epidemiologic studies. Am. J. Epidemiol., 105: 1-15, 1977.
  41. Susser, M.: The logic of Sir Karl Popper and the practice of epidemiology. Am. J. Epidemiol., 124: 711-718, 1986.
  42. Susser, M.: Rules of inference in epidemiology. Reg. Toxicol. and Pharmacol., 6: 116-128, 1986.
  43. Susser, M.: Falsification, verification and causal inference in epidemiology: reconsiderations in the light of Sir Karl Popper's philosophy. In Causal Inference, pp. 33-57. Edited by K. J. Rothman. Chestnut Hill, Massachusetts, Epidemiology Resources, 1988.
  44. Susser, M.: What is a cause and how do we know one? A grammar for pragmatic epidemiology. Am. J. Epidemiol., 133: 635-648, 1991.
  45. Zeger, S. L.: Statistical reasoning in epidemiology. Am. J. Epidemiol., 134: 1062-1066, 1991.
