*by Unlearning Economics*

Appeared originally at *Inside the Reform of Economics*, 07 October 2015

Econometrics is the statistical wing of economics, and is typically thought to represent a more data-driven, empirical approach than economic theory.

As such, the question in the title might strike many as absurd. Surely any approach which relies first and foremost on data cannot be deductive by definition? However, although econometrics uses data, it is not necessarily characterised by a ‘data first’ approach, but by a ‘model first’ approach, both in education and often in applied research. It is therefore not unreasonable to label it deductive in nature, at least as it is currently practised.

The definition of deductive is *“characterized by or based on the inference of particular instances from a general law”*, and in this sense econometrics fits the bill. Econometrics classes begin by introducing the canonical linear regression model:

*y* = β*x* + *u*

Where *y* is an outcome we are interested in, *x* is a variable (or set of variables) we think affects this outcome, β is some measure of the effect of *x* on *y*, and *u* is an ‘error term’ which allows for individual random variation around this average effect. The basic modelling process consists of finding a value for β in a modelling equation like this. A standard example is where *x* is education and *y* is wages: if someone’s education increases, by how much do we expect their wages to increase, if at all?
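To make the wage example concrete, here is a minimal simulated version (every number below is invented for illustration; a ‘true’ β of 0.08 is baked into the fake data, so the estimate should come out close to it):

```python
import numpy as np

# Simulated data: no real survey here; we bake in a "true" effect
# of 0.08 log-points of wage per year of education.
rng = np.random.default_rng(0)
n = 500
education = rng.uniform(8, 20, size=n)        # years of schooling (x)
u = rng.normal(0, 0.1, size=n)                # the error term
log_wage = 1.5 + 0.08 * education + u         # y = a + beta*x + u

# Estimate beta by ordinary least squares (degree-1 polynomial fit).
beta, alpha = np.polyfit(education, log_wage, 1)
print(beta)  # close to the true value of 0.08
```

The exercise is circular by construction — we recover a β we put in ourselves — but that is exactly the logic a textbook example follows.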

There are a few different ways to justify/derive the linear regression model, but the basic ‘classical’ method – which rests on a set of assumptions sometimes known as the ‘Gauss-Markov’ (GM) assumptions – is the simplest way to do so. These assumptions are abstract and technical, but they essentially add up to the proposition that the linear regression model can be used and is well-behaved enough for students to perform run-of-the-mill statistical inference (e.g. is there actually an effect, or is β=0?).

The optimisation problem implied by this model yields a formula for β into which we can input our real-world data for *y* and *x*. This gives us our value for β and therefore pinpoints the effect of *x* on *y*, assuming the GM assumptions are satisfied. In other words, econometrics asks: *if* our model of the world is true, then what does the data tell us about a specific part of the model? It assumes that the world behaves a certain way (i.e. that the GM assumptions hold) and then uses that general assumption to derive particular conclusions about the data in the form of β. Based on the above definition, this sounds pretty deductive to me.
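The formula in question is the standard least-squares solution to the normal equations, β̂ = (X′X)⁻¹X′y. A minimal sketch with invented numbers (chosen to lie exactly on a line, so the answer is exact):

```python
import numpy as np

# Hypothetical data: x = years of education, y = log hourly wage.
x = np.array([8., 10., 12., 14., 16.])
y = np.array([2.0, 2.2, 2.4, 2.6, 2.8])   # exactly linear, for illustration

# Design matrix with an intercept column, then solve the normal
# equations (X'X) beta = X'y — the closed-form OLS estimator.
X = np.column_stack([np.ones_like(x), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_hat)  # → [1.2, 0.1]: intercept 1.2, slope 0.1
```

Note what the code does *not* do: it never asks whether a line is the right shape for the data. The data are simply poured into a formula derived from the linear model.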

But wait – aren’t these initial assumptions just pedagogical tools which are dropped, sometimes later on in the same class, in order to make the model more realistic?

Well, yes and no. Yes, some of the assumptions are dropped; no, none of the assumptions which are dropped change the fundamental methods dictated by the model. In fact, virtually every method that tries to address one of the Gauss-Markov assumptions does so by ‘rescuing’ the linear regression model. Have unobservable variables biasing your results? Well, if you’ve got panel data, then just *assume the unobservables are also linear*, and subtract them out! If you haven’t got panel data, then try to find a variable *z* which you *assume has a linear relationship with your x*, and run *two* linear regressions instead of one! Having problems with heteroskedasticity, serial correlation and clustering of error terms? Just ‘correct’ your variance-covariance matrix and proceed with statistical inference as usual. No matter how many assumptions are dropped, the linear model, interpretation of it and general statistical practice always remain in some form.
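The panel-data trick above — the ‘within’ transformation — makes the point vividly: the fix works precisely *because* we assume the unobservables enter the linear model additively. A toy sketch (all numbers invented; the true slope baked in is 2):

```python
import numpy as np

# Hypothetical panel: two individuals observed three times each.
# Each has an unobserved, time-invariant effect a_i; true slope is 2.
entity = np.array([0, 0, 0, 1, 1, 1])
x = np.array([1., 2., 3., 4., 5., 6.])
a = np.array([5., 5., 5., -3., -3., -3.])    # unobservable fixed effects
y = a + 2.0 * x

# The 'within' transformation subtracts each entity's own mean,
# so an *additive* unobservable a_i drops out entirely.
def demean(v, groups):
    means = np.array([v[groups == g].mean() for g in groups])
    return v - means

x_w, y_w = demean(x, entity), demean(y, entity)
beta = (x_w @ y_w) / (x_w @ x_w)
print(beta)  # → 2.0: the fixed effects have been subtracted out
```

(For contrast, a naive pooled regression on these same numbers gives a slope of about −0.06, because the unobservables are correlated with *x*.) The rescue succeeds — but only by assuming the unobservables take exactly the form the linear model needs them to take.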

Yet as with any model, there is no reason to believe the linear model is true. The burden of proof lies on those who claim it explains the world rather than those who do not. And it is fairly easy to construct scenarios where linear regression will give a misleading view of the world: Christopher Achen has demonstrated that even mild non-linearities in simple models can wildly disrupt econometric estimates, yielding a β that is not only the wrong size, but has the wrong sign.
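A toy version of this point (my own invented numbers, not Achen’s actual example): suppose the true relationship is the mild quadratic *y* = 4*x* − *x*². The true marginal effect is +4 at *x* = 0 and positive for all *x* < 2, yet the fitted linear slope comes out negative:

```python
import numpy as np

# True relationship: a mild quadratic, y = 4x - x^2, with no noise at all.
x = np.arange(7.0)          # x = 0, 1, ..., 6
y = 4 * x - x ** 2

# Fit the canonical linear model y = a + b*x by least squares.
b, a = np.polyfit(x, y, 1)

print(b)  # → -2.0, even though dy/dx = 4 - 2x is +4 at x = 0
```

A single β is forced to summarise an effect that changes sign across the range of the data, and the summary can point the wrong way for much of that range.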

This may not be true in every case, but it becomes a problem when virtually every question is thought to be answerable using *some* form of linear regression, often with little investigation of whether or not the assumptions for regression are fulfilled. It is even worse when ‘empirical evidence’ is equated with ‘linear regression’, seemingly without any realisation that this supposedly empirical approach is as model-driven as it is data-driven. The fact that non-parametric models (economists’ fancy term for models that do not assume incredibly rigid properties of the world beforehand) exist does not change the fact that linear regression is dominant, nor does it change the pedagogy produced by the fixation on this particular model.

In many ways I am not attacking linear regression itself; I am attacking its status as the default setting for applied empirical investigation, instilled in the minds of students (and therefore future economists). For example, the popular textbook *Introductory Econometrics* by Jeffrey Wooldridge starts with the chapter ‘The Simple Regression Model’, and the next 17 chapters explore ‘modifications’ to the model in the various contexts outlined above. It is not until the final chapter, ‘Carrying Out an Empirical Project’, that we get some hint of the student learning to work with data, and even this is heavily focused on linear regression. Similarly, in the most recent issue of the *Journal of Applied Economics*, four of the six empirical papers use linear regression techniques that could be found in Wooldridge, while the remaining two still contain elements of linear modelling in places.

Whether or not these approaches are appropriate in some circumstances (and they surely are) is not the point. The problem is that econometrics as currently practised starts by assuming the world takes a certain form; furthermore, whenever it tries to drop this assumption it just ends up assuming further, similar properties of the world. This is not unlike the approach in economic theory, which starts with a perfectly competitive market and then explores ‘frictions’ and ‘deviations’ from this ideal. The fundamental point that the world may not share any characteristics with a linear (or perfectly competitive) model as a baseline is inconceivable within this framework. Now, this doesn’t mean that practitioners actually *believe* the world is linear, but it means they are not habitually, rigorously thinking through the implications if it is not. It means that students have had the model instilled in their heads first, and they will be more inclined to try and tweak the core model than to abandon it completely when faced with unusual data.

The predictable cry from the castle will be *“but we have to start with linear regression, because it’s the simplest case”*. But this is just not true. Sure: if you insist on using regression methods as your primary tools, linear regression is probably the ‘simplest’. But much like with economic theory, the idea that this model is ‘simple’ is only true relative to the complexity of more advanced models – it doesn’t mean regression in general is easy to understand. Regression is actually quite an advanced method which can only be used if the data possess certain properties, and carries with it a lot of theoretical baggage that is tough to grasp in an introductory class. The result is that undergraduate and graduate students sit through unnecessary extended matrix manipulations to derive test statistics, a purely theoretical exercise that rests on the deductive assumptions of the linear model. This is complicated and pushes out time that could be spent working with actual data.

A more inductive approach to statistics would emphasise things like the collection and processing of data; understanding how data can exhibit certain patterns; and learning to test whether or not data come from a particular distribution (the linear model is a special case where the *x* and *y* above come from a joint normal distribution). The latter approach is common in physics: for example, this blog post on econophysics asks whether income is generated by distinct processes at the top versus the bottom of the income distribution, an interesting question on many counts, but not the type of approach typically considered in econometrics.
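As a sketch of what ‘testing which distribution the data come from’ might look like, here is a hand-rolled Kolmogorov–Smirnov distance between a sample and a normal distribution fitted to it (the ‘income’ samples are simulated stand-ins, not real data):

```python
import numpy as np
from math import erf, sqrt

def ks_statistic_normal(data):
    """KS distance between the empirical CDF of `data` and a
    normal distribution fitted to it (a rough plug-in check)."""
    x = np.sort(data)
    n = len(x)
    mu, sigma = x.mean(), x.std()
    cdf = np.array([0.5 * (1 + erf((v - mu) / (sigma * sqrt(2)))) for v in x])
    ecdf_hi = np.arange(1, n + 1) / n   # empirical CDF just after each point
    ecdf_lo = np.arange(0, n) / n       # empirical CDF just before each point
    return max(np.abs(cdf - ecdf_hi).max(), np.abs(cdf - ecdf_lo).max())

rng = np.random.default_rng(42)
normal_sample = rng.normal(size=1000)
skewed_sample = rng.exponential(size=1000)  # heavy right tail, like top incomes

ks_normal = ks_statistic_normal(normal_sample)
ks_skewed = ks_statistic_normal(skewed_sample)
print(ks_normal, ks_skewed)  # small for the first sample, large for the second
```

The question being asked here — *what process could have generated this data?* — starts from the data rather than from a model, which is exactly the inversion being argued for.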

**Conclusion**

Econometrics analyses data, but it does so primarily from the perspective of one type of model. Where the shortcomings of this model are dealt with, they are often done in a way that does not alter the main fundamental assumptions (linearity) or practice (significance tests, interpretation of β) of the econometrician. Thus, as with economic theory, econometrics proceeds from the general to the particular, putting certain presuppositions front and centre of students’ (and therefore future economists’) minds when they turn to the empirical evidence. Econometrics can therefore be fairly characterised as deductive as it is currently practised.
