May 31, 2010

What the Health Media Missed: JAMA Study Quantifies Reporting Spin

If you have been reading this blog for a while, you know that from time to time I pick out some company-funded study that has turned into a media sound bite promoting the company's product and read it to see what it really found. Almost always, a close reading of these studies finds statistical abuse so blatant that one concludes the peer reviewers who approved them for publication flunked Statistics 101.

Now a study in JAMA quantifies just how bad this statistical abuse really is. The study is,

Reporting and Interpretation of Randomized Controlled Trials With Statistically Nonsignificant Results for Primary Outcomes. Isabelle Boutron et al. JAMA 2010;303(20):2058-2064.

A "statistically nonsignificant result" is one that could be due entirely to chance. If I test the proposition that a coin toss is more likely to come up heads by tossing a coin five times, no combination of heads and tails that emerges from this "experiment" would be statistically meaningful. You have to toss that coin a lot more times to get a meaningful result, because the same coin that would come up heads roughly 50% of the time over 1,000 coin tosses may easily come up heads three or even four times when tossed only five times.
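In fact, you can compute exactly how easily. Here is a short sketch using only Python's standard library; it works out the binomial tail for a fair coin (the function name and the numbers are my own illustration, not anything from the JAMA paper):

```python
from math import comb

def p_at_least(k, n, p=0.5):
    """Probability of k or more heads in n tosses of a coin with heads-probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(p_at_least(3, 5))  # 3 or more heads in 5 fair tosses: 0.5
print(p_at_least(4, 5))  # 4 or more heads in 5 fair tosses: 0.1875
```

A perfectly fair coin shows "mostly heads" in fully half of all five-toss experiments, and four-or-more heads nearly one time in five, which is why such a small sample cannot distinguish a biased coin from pure chance.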

So what this JAMA study was examining was how researchers reported studies in which the hypothesis being tested came up with a statistically meaningless result: for example, a study asking whether Drug X decreases the number of heart attacks in some population, which finds that the change in the number of heart attacks in the study group could be entirely due to chance.

What the JAMA study found was that, across 72 studies whose primary outcome was statistically nonsignificant, "spin" was rampant.

Spin was defined thus:
...specific reporting strategies, whatever their motive, to highlight that the experimental treatment is beneficial, despite a statistically nonsignificant difference for the primary outcome, or to distract the reader from statistically nonsignificant results

In plain English, "spin" means claiming some treatment works when the statistics show it does not.

How frequent was spin? The JAMA Study finds:
The title was reported with spin in 13 articles (18.0%)

Spin was identified in the Results and Conclusions sections of the abstracts of 27 (37.5%) and 42 (58.3%) reports, respectively, with the conclusions of 17 (23.6%) focusing only on treatment effectiveness.

Spin was identified in the main-text Results, Discussion, and Conclusions sections of 21 (29.2%), 31 (43.1%), and 36 (50.0%) reports, respectively.

More than 40% of the reports had spin in at least 2 of these sections in the main text.

So no, I am not paranoid when I assert that peer reviewers approve the publication of studies that claim results where none occurred, based on ignorance of how statistics work.

One of the ways I've seen this kind of spin deployed is to report a study as showing that results "trended" towards the desired outcome. In the case of our coin toss, the coin tossed five times that comes up heads three times "trends" towards heads. This is meaningless, since if you toss it 1,000 times it comes up roughly 50% heads and 50% tails.

When a drug "trends" towards reducing heart attacks because 103 people who took the drug had heart attacks while 105 people who didn't take the drug had heart attacks, you have the identical meaningless finding. You would need to see a much bigger difference in outcomes to determine that the drug was effective when testing it in such a small group.
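The post doesn't give the group sizes behind the 103-vs-105 split, so assume 10,000 people per arm purely for illustration. A pooled two-proportion z-test (a standard textbook approximation, not the method of any particular trial) shows just how far from significance such a "trend" is:

```python
from math import sqrt, erfc

def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p-value for a difference in proportions (pooled z-test)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(p1 - p2) / se
    return erfc(z / sqrt(2))  # two-sided tail of the standard normal

# 103 heart attacks on the drug vs. 105 on placebo, assuming 10,000 per group
p = two_proportion_p(103, 10_000, 105, 10_000)
print(round(p, 2))  # about 0.89: nowhere near the usual 0.05 threshold
```

A p-value near 0.9 says a difference this small would arise by chance almost nine times out of ten even if the drug did nothing at all.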

But using the "trend" argument is a very common way that drug companies and supplement companies suggest to the statistically ignorant (including the peer reviewers) that their product works when it doesn't.

This is why so many claims made for years about vitamins, supplements and new drugs turn out to be wrong when someone attempts to confirm the result with a large, statistically meaningful study.

Beyond misstating the study result, a common strategy for a company trying to spin gold out of dross is to misapply a statistical technique to the meaningless result that amplifies it into a number that appears, to those who don't understand the GIGO principle, to be meaningful.

When there is a statistically meaningless finding for incidence of heart attack, use the inflationary measure, "risk of heart attack," and if that still doesn't give you a meaningful-sounding statistic, try measuring "change in risk." If risk (already a statistical amplification of incidence) drops a meaningless 2% in the drug group and 1% in the placebo group, we have a 50% "change in risk." That apparently sounds very impressive to the math-flunkees who decide which studies get published, because I keep seeing just that kind of statistical sleight of hand being used. But it is meaningless: since the difference in incidence could be entirely due to chance, and could come out the other way if the study were repeated, any statistic derived from that incidence is statistically meaningless too.
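The arithmetic behind that inflation is trivial. Using the hypothetical numbers above (a 2-point drop in one group, a 1-point drop in the other), the same tiny difference can be quoted two ways:

```python
# Hypothetical figures echoing the text: risk falls 2% in the drug group, 1% on placebo
drop_drug, drop_placebo = 0.02, 0.01

absolute_difference = drop_drug - drop_placebo         # a single percentage point
relative_difference = absolute_difference / drop_drug  # the headline "change in risk"

print(f"absolute difference: {absolute_difference:.1%}")  # unimpressive
print(f"relative difference: {relative_difference:.0%}")  # sounds dramatic: 50%
```

The relative figure is fifty times larger-sounding than the underlying one-point difference, yet if that difference is itself statistical noise, dividing it by a small denominator only amplifies the noise.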

This doesn't even get into the related issue that drug companies falsify results or run "controlled trials" where the two groups under study are not well matched. If you compare a sicker, older placebo group to a younger, healthier group who take a drug, your drug may look effective when it isn't.

And then there is the issue of "controlling" for various cofactors. The notorious ACCORD study, long interpreted as meaning that tight control killed people with diabetes, ignored the fact that the people in the tight control group who experienced bad outcomes did not, in fact, have tight control. There were more people with bad control in the "tight control" group than in the comparison group, so the slight increase in bad outcomes in that "tight control" group had nothing to do with tight control but was instead due to LACK of tight control.

Sadly, hundreds of thousands of people with diabetes have by now been told that "tight control" is dangerous and that they should maintain A1cs of 7%, not 6%, to be "safe." It will take years to undo the damage, because busy health professionals don't have time to read all the health news, and the debunking of ACCORD did not get the press that the original study received.

BOTTOM LINE: Your health and safety are being severely compromised by the peer reviewers who rubber-stamp publication of company-sponsored research that pretends statistically meaningless results have meaning. The people who pay for this deceit and stupidity are the patients who end up taking expensive, dangerous, and worthless drugs.

Sadly, in a society that remains mostly math-illiterate, this isn't going to change any time soon.


Jack said...

Good stuff! Thanks.


jeffry said...

""spin" means claiming some treatment works when the statistics show it does not."

this is a mis-statement. absence of evidence is NOT evidence of absence. the lack of significant outcome does NOT mean the statistics show that the treatment doesn't work, it means that no effect was demonstrated. whether the treatment might work has been neither proven, nor disproven. i know you know this; but you need to be careful in your wording.

thanks for the great blog.

Jenny said...


Technically that might be true, but so many of these studies are almost guaranteed to be insignificant based on study design. So my guess is that companies design cheap studies that can be easily spun and only submit them for publication when the insignificant finding "trends" toward the result they want.

This is much cheaper and profit-enhancing than designing a study with a large enough sample size that the result is likely to be significant.

There have been too many studies recently establishing the very high percentage of researchers who out-and-out falsify their results for me to give those who publish these deceptive studies the benefit of the doubt. My guess is that they prefer an insignificant result (carefully designed to be insignificant) to a better study that might destroy the profits of their product.

michael plunkett said...

great post. Everyone wants to knock newspaper journalists for inaccurate reporting and making much of headlines and sound bites, but... reporters are not experts in these fields. Newsrooms can no longer carry expert reporters. Reporters rely on the experts reporting the findings, the same way a jurist relies on their staff to read cases and present a report. If the findings reported are, as we know is the case, bogus, the news article will be inaccurate too. ALL studies should be registered with their purpose before the study is undertaken, not spun into a report afterwards.

Unknown said...

I am interested in the "effect size". Statistical significance is not a stand-in for real-life significance. A good read is "The Cult of Statistical Significance" by Ziliak and McCloskey. You want your best estimate of the effect size, and from there use your judgment about the study design and conflicts of interest. If the effect size is small, who cares? If it is large, you have my interest.
Take care!

Scott S said...

I wish science and regulatory agencies were not for sale, but tragically, they are!! This is why we as patients need to be on top of things. Thanks for calling this to our attention.