The economics commentariat and no small part of the political debate in recent weeks have been consumed with the controversy surrounding the work of my Harvard colleagues (and friends) Carmen Reinhart and Ken Rogoff (RR). Their work had been widely interpreted as establishing that economic growth was likely to stagnate in a country once its government debt-to-GDP ratio exceeded 90 percent. Scholars at the University of Massachusetts have demonstrated, and RR have acknowledged, that they made a coding error that caused them to omit some relevant data in forming their results, and that using updated data for several countries substantially weakens some of the statistical patterns they asserted. Issues have also arisen with respect to how RR weighted observations in forming the averages on which they base their conclusions.
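To see why weighting matters, consider a minimal sketch (with invented numbers and a hypothetical two-country sample, not RR's data) of how equal-country and equal-observation averages can diverge when countries spend very different numbers of years above a debt threshold:

```python
# Toy illustration of the weighting question (invented numbers, not RR's data).
# Country A spends 19 years above the debt threshold, country B just 1 year.
growth = {
    "A": [2.0] * 19,   # 19 high-debt years at 2.0% growth each
    "B": [-8.0],       # a single high-debt year at -8.0% growth
}

# Equal weight per country: average each country's mean, then average those.
country_means = [sum(v) / len(v) for v in growth.values()]
equal_country = sum(country_means) / len(country_means)

# Equal weight per country-year: pool all observations.
pooled = [g for v in growth.values() for g in v]
equal_year = sum(pooled) / len(pooled)

print(f"equal-country weighting: {equal_country:+.2f}%")  # -3.00%
print(f"equal-year weighting:    {equal_year:+.2f}%")     # +1.50%
```

Neither weighting is self-evidently wrong; the point is only that the choice can flip the sign of the headline average, so it deserves explicit justification.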

Many have said that the questions raised undermine the claims of austerity advocates around the world that deficits should be quickly reduced. Some have gone so far as to blame RR for the unemployment of millions, asserting that they provided crucial intellectual ammunition for austerity policies. Others believe that, even after reanalysis, the data support the view that reducing deficits and debt burdens is important in most of the industrialized world. Still others regard the controversy as calling into question the usefulness of statistical research on economic policy questions.

Where should these debates settle? From the perspective of someone who has done a fair amount of econometric research, consumed such research as a policymaker, and participated as an advocate in debates about fiscal stimulus and austerity, these would be my takeaways.

First, the RR experience should accelerate the evolution of mores with respect to economic research. Rogoff and Reinhart are rightly regarded as careful, honest scholars. Anyone close to the process of economic research will recognize that data errors like the ones they made are distressingly common. Indeed, the JP Morgan risk models in use when the London Whale trade was placed had errors not unlike those made by RR. In the future, authors, journals and commentators need to devote more effort to replicating significant results before broadcasting them widely. More generally, no important policy conclusion should ever be based solely on a single statistical result. Policy judgments should be based on the accumulation of evidence from multiple studies done with differing methodological approaches. Even then, there should be a reluctance to accept conclusions from “models” without an intuitive understanding of what is driving them. It is right and understandable that scholars want their findings to inform the policy debate. But they have an obligation to discourage, and on occasion contradict, those who would oversimplify and exaggerate their conclusions.
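To make the point about inadvertent errors and replication concrete, here is a hypothetical sketch (invented numbers, not the actual RR spreadsheet) of the kind of slip at issue: an average computed over a range that silently drops the last few rows, which recomputing from the raw data immediately exposes:

```python
# Hypothetical sketch of a range-selection slip (invented numbers).
# Averaging rows 0..14 when the data has 20 rows silently omits five countries.
growth_by_country = [3.1, 2.4, 1.8, 2.9, 0.7, 2.2, 1.5, 3.4, 2.6, 1.1,
                     2.8, 1.9, 2.3, 0.9, 2.7, -0.3, 1.2, 2.1, 0.4, 1.6]

intended = sum(growth_by_country) / len(growth_by_country)
buggy = sum(growth_by_country[:15]) / 15  # range stops five rows short

print(f"full sample:     {intended:.2f}%")
print(f"truncated range: {buggy:.2f}%")

# A replication check is simply recomputing from the raw data:
if abs(buggy - intended) > 0.01:
    print("replication fails: published figure does not reproduce")
```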

Second, all participants in policy debates should retain a healthy skepticism about retrospective statistical analysis. Trillions of dollars have been lost and millions have been left unemployed because the lesson drawn from 60 years of experience between 1945 and 2005 was that “American house prices in aggregate always go up.” This was no data problem or misanalysis. It was a regularity in the data, until it wasn’t. Extrapolating from past experience to future outlook is always deeply problematic and needs to be done with great care. In retrospect, it was folly to believe that with data on about 30 countries it was possible to estimate a threshold beyond which debt became dangerous. Even if such a threshold existed, why should it be the same in countries with and without their own currencies, with very different financial systems, cultures, degrees of openness and growth experiences? And there is the old chestnut that correlation does not establish causation: any tendency for high debt and low growth to go together may simply reflect the debt accumulation that follows from slow growth.
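The reverse-causation point lends itself to a simple simulation (a sketch with invented parameters, not a calibrated model of any economy): if slow growth mechanically raises debt and debt has no effect on growth whatsoever, the high-debt bucket will still show lower average growth:

```python
import random

random.seed(0)

# Simulate country-years in which debt has NO effect on growth; instead,
# persistently slow growth raises the debt-to-GDP ratio (reverse causation).
rows = []
for _ in range(30):                    # roughly 30 countries, as in the debate
    trend = random.gauss(2.5, 1.5)     # the country's underlying growth trend
    debt = 60.0                        # starting debt-to-GDP ratio, in percent
    for _ in range(40):                # 40 annual observations per country
        g = trend + random.gauss(0.0, 2.0)
        debt = max(debt + (2.5 - g) * 2.0, 0.0)  # slow growth accumulates debt
        rows.append((debt, g))

high = [g for d, g in rows if d > 90]
low = [g for d, g in rows if d <= 90]
print(f"mean growth, debt > 90%:  {sum(high) / len(high):.2f}%")
print(f"mean growth, debt <= 90%: {sum(low) / len(low):.2f}%")
# The high-debt bucket shows lower growth even though, by construction,
# debt causes nothing here.
```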