Copyright © 2007 by The Journal of Bone and Joint Surgery, Inc.
Commentary & Perspective
Jeffrey N. Katz, MD, MSc, and Elena Losina, PhD*,
Brigham and Women's Hospital, Harvard Medical School, and the Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts
Posted August 2007
Cowan and colleagues report in this issue of The Journal that the great majority of
randomized controlled trials involving lateral epicondylitis received poor
scores when subjected to standard methods for rating the quality of randomized
controlled trials. The authors suggest that we rethink the sacred stature currently
accorded to the randomized controlled trial in the medical literature. This is
a provocative suggestion. The randomized controlled trial indeed sits atop the
hierarchy of study designs, and for good reason. Recall that high-quality
observational studies led a generation of women to take hormone replacement
therapy to prevent cardiac disease1,2. Then a randomized controlled
trial showed that, in fact, hormone replacement therapy raised—not lowered—the
risk of cardiac disease3. The observational studies were well
performed but incorrect.
Randomized controlled trials were originally introduced
after World War II and focused primarily on pharmacologic interventions4.
Decades later, the design is applied increasingly to surgery in general and to orthopaedic
surgery in particular. Why are randomized controlled trials so useful? They have several
critical methodological features, the most crucial of which is randomization.
In randomized controlled trials, patients are randomly assigned to treatment
groups. Randomization is intended to balance all patient characteristics across
the treatment groups. This includes recognized, quantifiable patient features,
such as age, sex, comorbidity, and baseline functional status, as well as
unmeasured or qualitative predictors of outcome, such as a positive outlook on life
and, perhaps, yet unknown genetic factors that are related to treatment
outcomes.
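The balancing property described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not an analysis of any real trial: the cohort is synthetic, and the names `make_cohort`, `randomize`, `mean_difference`, and the traits `age` and `outlook` are hypothetical, with `outlook` standing in for a predictor that investigators never measure.

```python
import random
import statistics

def make_cohort(n, seed=1):
    """Generate hypothetical patients with one measured and one unmeasured trait."""
    rng = random.Random(seed)
    return [
        {
            "age": rng.gauss(55, 10),     # measured characteristic (assumed distribution)
            "outlook": rng.gauss(0, 1),   # stand-in for an unmeasured predictor
        }
        for _ in range(n)
    ]

def randomize(patients, seed=0):
    """Assign each patient to an arm by a simulated fair coin flip."""
    rng = random.Random(seed)
    arms = {"surgery": [], "nonoperative": []}
    for patient in patients:
        arm = "surgery" if rng.random() < 0.5 else "nonoperative"
        arms[arm].append(patient)
    return arms

def mean_difference(arms, trait):
    """Difference in the mean of a trait between the two arms."""
    surgery = statistics.mean(p[trait] for p in arms["surgery"])
    nonoperative = statistics.mean(p[trait] for p in arms["nonoperative"])
    return surgery - nonoperative

arms = randomize(make_cohort(2000))
for trait in ("age", "outlook"):
    # Both differences should be small relative to the trait's spread,
    # even though the assignment never looked at either trait.
    print(trait, round(mean_difference(arms, trait), 2))
```

The assignment rule never inspects `outlook`, yet with a reasonably large cohort the arms end up similar on it. Balance is expected on average and improves with sample size; it is not guaranteed in any single small trial.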
To optimize the benefit of randomization and the attendant
balanced allocation of prognostic factors, an intent-to-treat approach should
be used during the data-analysis stage of randomized controlled trials. This
strategy analyzes all patients in the group to which they were randomly
assigned, irrespective of whether they actually received the intended
treatment. Thus, in a trial that compares conservative therapy with surgery for
patients with meniscal disorders, the intent-to-treat analysis would include
in the nonoperative arm all patients
randomized to nonoperative therapy, even if a large number of them crossed over
and had surgery over the course of the trial. The extent of crossover may
depend on the complexity and risks associated with surgery. The Spine Patient
Outcomes Research Trial (SPORT) provides an important example of the critical
impact of crossover in surgical trials5,6. The SPORT trials for
disc disease and for degenerative spondylolisthesis were launched because of
the virtual absence of trial data to aid patients in making informed decisions
with regard to treatment. Almost half of the patients randomly assigned to
surgery in both of these trials never underwent the procedure. Further, half of
the patients randomly assigned to nonoperative therapy crossed over to surgery.
These crossover subjects dramatically degrade the effective sample size,
rendering the randomized portions of the trials uninterpretable. It appears now
that the only interpretable information we can glean on comparative treatment
effects in SPORT is from the observational portions of the study5.
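The way crossover blunts an intent-to-treat comparison can be sketched numerically. The simulation below is illustrative only: the true effect of 10 points, the outcome distribution, and the crossover proportions (60% of the surgery arm and 30% of the nonoperative arm actually receiving surgery) are assumptions chosen for the example, not SPORT data, and the function names are hypothetical.

```python
import random
import statistics

TRUE_EFFECT = 10.0  # assumed benefit of surgery on a hypothetical outcome score

def simulate_trial(n_per_arm, p_surgery_in_surgery_arm, p_surgery_in_nonop_arm, seed=0):
    """Return (assigned_arm, received_surgery, outcome) for each simulated patient."""
    rng = random.Random(seed)
    records = []
    plan = (("surgery", p_surgery_in_surgery_arm),
            ("nonoperative", p_surgery_in_nonop_arm))
    for assigned, p_surgery in plan:
        for _ in range(n_per_arm):
            received_surgery = rng.random() < p_surgery
            outcome = rng.gauss(50, 10) + (TRUE_EFFECT if received_surgery else 0.0)
            records.append((assigned, received_surgery, outcome))
    return records

def intent_to_treat_estimate(records):
    """Compare mean outcomes by assigned arm, ignoring treatment actually received."""
    by_arm = {"surgery": [], "nonoperative": []}
    for assigned, _received, outcome in records:
        by_arm[assigned].append(outcome)
    return statistics.mean(by_arm["surgery"]) - statistics.mean(by_arm["nonoperative"])

no_crossover = simulate_trial(5000, 1.0, 0.0)    # every patient receives the assigned treatment
with_crossover = simulate_trial(5000, 0.6, 0.3)  # assumed crossover in both directions

print("ITT estimate, full adherence:", round(intent_to_treat_estimate(no_crossover), 1))
print("ITT estimate, with crossover:", round(intent_to_treat_estimate(with_crossover), 1))
```

In this simplified setting the expected intent-to-treat contrast is the true effect multiplied by the difference in the proportions receiving surgery (here 0.6 minus 0.3), so the estimate shrinks toward zero as crossover grows. The sketch assumes that crossover is unrelated to prognosis; when sicker or healthier patients are the ones who cross over, an as-treated comparison is additionally exposed to the selection problems of an observational study.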
In observational studies, imbalances in measured
characteristics can be addressed with sophisticated statistical adjustment. But
we cannot adjust for features that we cannot measure or do not even recognize.
Whereas treatments are assigned, simplistically speaking, by the flip of a coin
in the randomized controlled trial, patients and their physicians choose the
treatments in observational studies. Factors that lead patients and physicians
to choose one treatment over another may also affect the outcome and bias the
study. For example, it may be that the subjects who took hormone replacement
therapy in the observational studies were more health conscious and avoided
cardiac risks more effectively than nonusers, resulting in an apparent
protective effect of hormone replacement therapy.
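The hormone replacement example can be mimicked with a toy simulation in which an unmeasured trait drives both treatment choice and outcome. Everything here is an assumption made for illustration: the trait `health_conscious`, the probabilities, and the function names are hypothetical, and the therapy is given no true effect at all so that any apparent benefit is pure bias.

```python
import random

def simulate(n, randomized, seed=0):
    """Return the cardiac event rate among users and nonusers of a therapy
    that, by construction, has no effect on cardiac risk."""
    rng = random.Random(seed)
    events = {True: 0, False: 0}
    counts = {True: 0, False: 0}
    for _ in range(n):
        health_conscious = rng.random() < 0.5  # unmeasured trait
        if randomized:
            takes_therapy = rng.random() < 0.5  # coin flip, unrelated to the trait
        else:
            # Assumed: health-conscious subjects choose the therapy more often.
            takes_therapy = rng.random() < (0.7 if health_conscious else 0.3)
        # Assumed: risk depends only on the trait, never on the therapy.
        risk = 0.05 if health_conscious else 0.15
        counts[takes_therapy] += 1
        events[takes_therapy] += rng.random() < risk
    return events[True] / counts[True], events[False] / counts[False]

def risk_difference(n, randomized, seed=0):
    """Event rate in users minus event rate in nonusers."""
    users, nonusers = simulate(n, randomized, seed)
    return users - nonusers

print("observational:", round(risk_difference(40000, randomized=False), 3))
print("randomized:   ", round(risk_difference(40000, randomized=True), 3))
```

Under these assumptions the observational comparison shows fewer events among users even though the therapy does nothing, while the randomized comparison hovers near zero. Statistical adjustment could remove the bias only if `health_conscious` had been measured.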
Another critical methodological aspect related to randomized
controlled trials is blinding. This is the procedure of masking treatment
assignments. In a single-blind study, the subject does not know what treatment
he or she received. In a double-blind study, neither the subject nor the
evaluators on the study team know what treatment any particular subject
received. The advantage of blinding is clear: it can reduce observer bias. The
disadvantages include the logistical difficulty of achieving blinding in many
settings and the ethical problems that may ensue. Surgical trials pose
particular challenges in this regard. In drug trials, placebo tablets that look
identical to the active agent facilitate blinding. In surgery, the same level
of concealment may require a sham procedure. Barriers to implementation of sham
surgery include ethical, operational, and financial considerations. For
example, Moseley et al. randomized patients with osteoarthritis of the knee to either
arthroscopic débridement or a sham procedure7. Lively discussion ensued
with regard to whether it is ethical to expose a patient to the small but real
risks of sham surgery.
Returning to the studies reviewed by Cowan and colleagues,
are the poorly rated trials so flawed as to be uninterpretable? The scoring
systems presented in Tables 1 and 2 of the paper by Cowan et al. penalize
trials for errors of both commission and omission and for errors in design as
well as in reporting. Points are subtracted if the trials did not blind
assessors or if there was poor subject retention over the follow-up period. These
are serious problems that could threaten validity. On the other hand, many of
the quality criteria penalize investigators for failing to report particular features rather than failing to implement them. This is a less serious
problem. In fact, many of these trials were published before the development of
a standardized reporting format for randomized controlled trials, the use of
which is now required by many journals8. Thus, failure to report
certain design features may have no bearing on the validity of the trial
findings.
Can a well-performed observational study provide the sort of
definitive statement that emerges from an excellent trial? Should we focus
resources on improving trials or on developing alternative designs? The two goals
are compatible with one another; we can and should nurture both trial and
cohort methodology. Indeed, some scientific objectives, such as long-term
treatment outcomes and risk factors for complications, can best be addressed
with longitudinal designs. But if the question is whether Treatment A works
better than Treatment B, there is no good substitute for the randomized
controlled trial.
Given the unique advantages of randomization and the many
unresolved treatment questions in orthopaedics, continued work on improving the
surgical randomized controlled trial seems an especially important priority for
orthopaedic investigation. Several problems are most pressing, including the
difficulty of blinding patients and providers in the surgical setting. Crossover
degrades the effective sample size and makes the intent-to-treat analysis
difficult to interpret. If the nonoperative group receives a highly
heterogeneous set of treatments, it is difficult to characterize the contrast
(surgery compared with what?). Negative trials provide further challenges in
that they are less likely to be published than positive trials, potentially
leading to publication bias. Similarly, trials with modest effect sizes may be
regarded as negative if they are underpowered and do not show statistically significant
differences. Yet meta-analyses of such trials may clarify true treatment
effects. Thus, further work on blinding, limiting crossovers, standardizing
nonoperative regimens, encouraging the publication of negative trials, and synthesizing
quantitative data across the literature through meta-analysis will be
especially fruitful starting points for strengthening the development and
impact of orthopaedic randomized controlled trials.
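The point that a meta-analysis may clarify an effect that individual underpowered trials miss can be shown with a fixed-effect, inverse-variance pooled estimate. This is a sketch of one common pooling method, not the only one, and the three trials below are invented numbers used purely for illustration.

```python
import math

# Hypothetical trials: (estimated treatment effect, standard error).
# These values are made up for the example.
trials = [(4.0, 2.5), (3.0, 2.0), (5.0, 3.0)]

def z_score(effect, standard_error):
    """Effect divided by its standard error; |z| > 1.96 is the usual
    two-sided threshold for significance at the 0.05 level."""
    return effect / standard_error

def pool_fixed_effect(trials):
    """Inverse-variance weighted average of the effects and its standard error."""
    weights = [1.0 / se ** 2 for _effect, se in trials]
    pooled = sum(w * effect for w, (effect, _se) in zip(weights, trials)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

for effect, se in trials:
    print("single trial z =", round(z_score(effect, se), 2))

pooled, pooled_se = pool_fixed_effect(trials)
print("pooled effect =", round(pooled, 2), "z =", round(z_score(pooled, pooled_se), 2))
```

None of the three hypothetical trials reaches the conventional threshold alone, but the pooled estimate does, because pooling shrinks the standard error. A fixed-effect model assumes that all trials estimate the same underlying effect; when trials differ in populations or in nonoperative regimens, a random-effects model may be more appropriate, and no pooling method corrects for unpublished negative trials.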
The work to be done cannot be accomplished by single
investigators or disciplines. Teams of clinical investigators and
methodologists, including biostatisticians, epidemiologists, and surgeons with
a sensitivity for both clinical and methodological concepts, are needed to
successfully apply randomized controlled trial design to surgical problems. We
have the personnel and tools to continue to move this research agenda forward.
The many unanswered questions in orthopaedic and musculoskeletal care impel us
to do so.
*The authors did not receive any outside funding or grants in
support of their research for or preparation of this work. Neither they nor a
member of their immediate families received payments or other benefits or a
commitment or agreement to provide such benefits from a commercial entity. No
commercial entity paid or directed, or agreed to pay or direct, any benefits to
any research fund, foundation, division, center, clinical practice, or other
charitable or nonprofit organization with which the authors, or a member of their
immediate families, are affiliated or associated.
References
1. Grodstein F, Manson JE, Stampfer MJ. Postmenopausal hormone use and secondary prevention of coronary events in the nurses' health study. A prospective, observational study. Ann Intern Med. 2001;135:1-8.
2. Stampfer MJ, Colditz GA, Willett WC, Manson JE, Rosner B, Speizer FE, Hennekens CH. Postmenopausal estrogen therapy and cardiovascular disease. Ten-year follow-up from the nurses' health study. N Engl J Med. 1991;325:756-62.
3. Manson JE, Hsia J, Johnson KC, Rossouw JE, Assaf AR, Lasser NL, Trevisan M, Black HR, Heckbert SR, Detrano R, Strickland OL, Wong ND, Crouse JR, Stein E, Cushman M; Women's Health Initiative Investigators. Estrogen plus progestin and the risk of coronary heart disease. N Engl J Med. 2003;349:523-34.
4. Medical Research Council, Streptomycin in Tuberculosis Trials Committee. Streptomycin treatment of pulmonary tuberculosis. Br Med J. 1948;2:769-83.
5. Weinstein JN, Lurie JD, Tosteson TD, Hanscom B, Tosteson AN, Blood EA, Birkmeyer NJ, Hilibrand AS, Herkowitz H, Cammisa FP, Albert TJ, Emery SE, Lenke LG, Abdu WA, Longley M, Errico TJ, Hu SS. Surgical versus nonsurgical treatment for lumbar degenerative spondylolisthesis. N Engl J Med. 2007;356:2257-70.
6. Weinstein JN, Tosteson TD, Lurie JD, Tosteson AN, Hanscom B, Skinner JS, Abdu WA, Hilibrand AS, Boden SD, Deyo RA. Surgical vs nonoperative treatment for lumbar disk herniation: the Spine Patient Outcomes Research Trial (SPORT): a randomized trial. JAMA. 2006;296:2441-50.
7. Moseley JB, O'Malley K, Petersen NJ, Menke TJ, Brody BA, Kuykendall DH, Hollingsworth JC, Ashton CM, Wray NP. A controlled trial of arthroscopic surgery for osteoarthritis of the knee. N Engl J Med. 2002;347:81-8.
8. Moher D, Schulz KF, Altman D; CONSORT Group (Consolidated Standards of Reporting Trials). The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. JAMA. 2001;285:1987-91.