Social construction is not fact—and it is not fiction

July 30, JDN 2457965

With the possible exception of politically-charged issues (especially lately in the US), most people are fairly good at distinguishing between true and false, fact and fiction. But there are certain types of ideas that can’t be neatly categorized into fact versus fiction.

First, there are subjective feelings. You can feel angry, or afraid, or sad—and really, truly feel that way—despite having no objective basis for the emotion coming from the external world. Such emotions are usually irrational, but even knowing that doesn’t make them automatically disappear. Distinguishing subjective feelings from objective facts is simple in principle, but often difficult in practice: A great many things simply “feel true” despite being utterly false. (Ask an average American which is more likely to kill them, a terrorist or the car in their garage; I bet quite a few will get the wrong answer. Indeed, if you ask them whether they’re more likely to be shot by someone else or to shoot themselves, almost literally every gun owner is going to get that answer wrong—or they wouldn’t be gun owners.)

The one I really want to focus on today is social constructions. This is a term that has been so thoroughly overused and abused by postmodernist academics (“science is a social construction”, “love is a social construction”, “math is a social construction”, “sex is a social construction”, etc.) that it has almost lost its meaning. Indeed, many people now react with automatic aversion to the term; upon hearing it, they immediately assume—understandably—that whatever is about to follow is nonsense.

But there is actually a very important core meaning to the term “social construction” that we stand to lose if we throw it away entirely. A social construction is something that exists only because we all believe in it.

Every part of that definition is important:

First, a social construction is something that exists: It’s really there, objectively. If you think it doesn’t exist, you’re wrong. It even has objective properties; you can be right or wrong in your beliefs about it, even once you agree that it exists.

Second, a social construction only exists because we all believe in it: If everyone in the world suddenly stopped believing in it, like Tinker Bell it would wink out of existence. The “we all” is important as well; a social construction doesn’t exist simply because one person, or a few people, believe in it—it requires a certain critical mass of society to believe in it. Of course, almost nothing is literally believed by everyone, so it’s more that a social construction exists insofar as people believe in it—and thus can attain a weaker or stronger kind of existence as beliefs change.

The combination of these two features makes social constructions a very weird sort of entity. They aren’t merely subjective beliefs; you can’t be wrong about what you are feeling right now (though you can certainly lie about it), but you can definitely be wrong about the social constructions of your society. But we can’t all be wrong about the social constructions of our society; once enough of our society stops believing in them, they will no longer exist. And when we have conflict over a social construction, its existence can become weaker or stronger—indeed, it can exist to some of us but not to others.

If all this sounds very bizarre and reminds you of postmodernist nonsense that might come from the Wisdom of Chopra randomizer, allow me to provide a concrete and indisputable example of a social construction that is vitally important to economics: Money.

The US dollar is a social construction. It has all sorts of well-defined objective properties, from its purchasing power in the market to its exchange rate with other currencies (also all social constructions). The markets in which it is spent are social constructions. The laws which regulate those markets are social constructions. The government which makes those laws is a social construction.

But it is not social constructions all the way down. The paper upon which the dollar was printed is a physical object with objective factual existence. It is an artifact—it was made by humans, and wouldn’t exist if we didn’t—but now that we’ve made it, it exists and would continue to exist regardless of whether we believe in it or even whether we continue to exist. The cotton from which it was made is also partly artificial, bred over centuries from a lifeform that evolved over millions of years. But the carbon atoms inside that cotton were made in a star, and that star existed and fused its carbon billions of years before any life on Earth existed, much less humans in particular. This is why the statements “math is a social construction” and “science is a social construction” are so ridiculous. Okay, sure, the institutions of science and mathematics are social constructions, but that’s trivial; nobody would dispute that, and it’s not terribly interesting. (What, you mean if everyone stopped going to MIT, there would be no MIT!?) The truths of science and mathematics were true long before we were even here—indeed, the fundamental truths of mathematics could not have failed to be true in any possible universe.

But the US dollar did not exist before human beings created it, and unlike the physical paper, the purchasing power of that dollar (which is, after all, mainly what we care about) is entirely socially constructed. If everyone in the world suddenly stopped accepting US dollars as money, the US dollar would cease to be money. If even a few million people in the US suddenly stopped accepting dollars, its value would become much more precarious, and inflation would be sure to follow.

Nor is this simply because the US dollar is a fiat currency. That makes it more obvious, to be sure; a fiat currency attains its value solely through social construction, as the physical object itself has negligible value. But even when we were on the gold standard, our currency was representative; the paper itself was still equally worthless. If you wanted gold, you’d have to exchange for it; and that process of exchange is entirely a social construction.

And what about gold coins, one of the oldest forms of money? Here the physical object might actually be useful for something, but not all that much. It’s shiny, you can make jewelry out of it, it doesn’t corrode, it can be used to replace lost teeth, it has anti-inflammatory properties—and millennia later we found out that its dense nucleus is useful for particle accelerator experiments and it is a very reliable electrical conductor useful for making microchips. But all in all, gold is really not that useful. If gold were priced based on its true usefulness, it would be extraordinarily cheap; cheaper than water, for sure, as it’s much less useful than water. Yet very few cultures have ever used water as currency (though some have used salt). Thus, most of the value of gold is itself socially constructed; you value gold not to use it, but to impress other people with the fact that you own it (or indeed to sell it to them). Stranded alone on a desert island, you’d do anything for fresh water, but gold means nothing to you. And a gold coin actually takes on additional socially-constructed value; gold coins almost always had seignorage, additional value the government received from minting them over and above the market price of the gold itself.

Economics, in fact, is largely about social constructions; or rather I should say it’s about the process of producing and distributing artifacts by means of social constructions. Artifacts like houses, cars, computers, and toasters; social constructions like money, bonds, deeds, policies, rights, corporations, and governments. Of course, there are also services, which are not quite artifacts since they stop existing when we stop doing them—though, crucially, not when we stop believing in them; your waiter still delivered your lunch even if you persist in the delusion that the lunch is not there. And there are natural resources, which existed before us (and may or may not exist after us). But these are corner cases; mostly economics is about using laws and money to distribute goods, which means using social constructions to distribute artifacts.

Other very important social constructions include race and gender. Not melanin and sex, mind you; human beings have real, biological variation in skin tone and body shape. But the concept of a race—especially the race categories we ordinarily use—is socially constructed. Nothing biological forced us to regard Kenyan and Burkinabe as the same “race” while Ainu and Navajo are different “races”; indeed, the genetic data is screaming at us in the opposite direction. Humans are sexually dimorphic, with some rare exceptions (only about 0.02% of people are intersex; about 0.3% are transgender; and no more than 5% have sex chromosome abnormalities). But the much thicker concept of gender that comes with a whole system of norms and attitudes is all socially constructed.

It’s one thing to say that perhaps males are, on average, more genetically predisposed to be systematizers than females, and thus men are more attracted to engineering and women to nursing. That could, in fact, be true, though the evidence remains quite weak. It’s quite another to say that women must not be engineers, even if they want to be, and men must not be nurses—yet the latter was, until very recently, the quite explicit and enforced norm. Standards of clothing are even more obviously socially-constructed; in Western cultures (except the Celts, for some reason), flared garments are “dresses” and hence “feminine”; in East Asian cultures, flared garments such as kimono are gender-neutral, and gender is instead expressed through clothing by subtler aspects such as being fastened on the left instead of the right. In a thousand different ways, we mark our gender by what we wear, how we speak, even how we walk—and what’s more, we enforce those gender markings. It’s not simply that males typically speak in lower pitches (which does actually have a biological basis); it’s that males who speak in higher pitches are seen as less of a man, and that is a bad thing. We have a very strict hierarchy, which is imposed in almost every culture: It is best to be a man, worse to be a woman who acts like a woman, worse still to be a woman who acts like a man, and worst of all to be a man who acts like a woman. What it means to “act like a man” or “act like a woman” varies substantially; but the core hierarchy persists.

Social constructions like these ones are in fact some of the most important things in our lives. Human beings are uniquely social animals, and we define our meaning and purpose in life largely through social constructions.

It can be tempting, therefore, to be cynical about this, and say that our lives are built around what is not real—that is, fiction. But while this may be true for religious fanatics who honestly believe that some supernatural being will reward them for their acts of devotion, it is not a fair or accurate description of someone who makes comparable sacrifices for “the United States” or “free speech” or “liberty”. These are social constructions, not fictions. They really do exist. Indeed, it is only because we are willing to make sacrifices to maintain them that they continue to exist. Free speech isn’t maintained by us saying things we want to say; it is maintained by us allowing other people to say things we don’t want to hear. Liberty is not protected by us doing whatever we feel like, but by not doing things we would be tempted to do that impose upon other people’s freedom. If in our cynicism we act as though these things are fictions, they may soon become so.

But it would be a lot easier to get this across to people, I think, if folks would stop saying idiotic things like “science is a social construction”.

Argumentum ab scientia is not argumentum baculo: The difference between authority and expertise

May 7, JDN 2457881

Americans are, on the whole, suspicious of authority. This is a very good thing; it shields us against authoritarianism. But it comes with a major downside, which is a tendency to forget the distinction between authority and expertise.

Argument from authority is an informal fallacy, argumentum baculo. The fact that something was said by the Pope, or the President, or the General Secretary of the UN, doesn’t make it true. (Aside: You’re probably more familiar with the phrase argumentum ad baculum, which is terrible Latin. That would mean “argument toward a stick”, when clearly the intended meaning was “argument by means of a stick”, which is argumentum baculo.)

But argument from expertise, argumentum ab scientia, is something quite different. The world is much too complicated for any one person to know everything about everything, so we have no choice but to specialize our knowledge, each of us becoming an expert in only a few things. So if you are not an expert in a subject, when someone who is an expert in that subject tells you something about that subject, you should probably believe them.

You should especially be prepared to believe them when the entire community of experts is in consensus or near-consensus on a topic. The scientific consensus on climate change is absolutely overwhelming. Is this a reason to believe in climate change? You’re damn right it is. Unless you have years of education and experience in understanding climate models and atmospheric data, you have no basis for challenging the expert consensus on this issue.

This confusion has created a deep current of anti-intellectualism in our culture, as Isaac Asimov famously recognized:

There is a cult of ignorance in the United States, and there always has been. The strain of anti-intellectualism has been a constant thread winding its way through our political and cultural life, nurtured by the false notion that democracy means that “my ignorance is just as good as your knowledge.”

This is also important to understand if you have heterodox views on any scientific topic. The fact that the whole field disagrees with you does not prove that you are wrong—but it does make it quite likely that you are wrong. Cranks often want to compare themselves to Galileo or Einstein, but here’s the thing: Galileo and Einstein didn’t act like cranks. They didn’t expect the scientific community to respect their ideas before they had gathered compelling evidence in their favor.

When behavioral economists found that neoclassical models of human behavior didn’t stand up to scrutiny, did they shout from the rooftops that economics is all a lie? No, they published their research in peer-reviewed journals, and talked with economists about the implications of their results. There may have been times when they felt ignored or disrespected by the mainstream, but they pressed on, because the data was on their side. And ultimately, the mainstream gave in: Daniel Kahneman won the Nobel Prize in Economics.

Experts are not always right, that is true. But they are usually right, and if you think they are wrong you’d better have a good reason to think so. The best reasons are the sort that come about when you yourself have spent the time and effort to become an expert, able to challenge the consensus on its own terms.

Admittedly, that is a very difficult thing to do—and more difficult than it should be. I have seen firsthand how difficult and painful the slow grind toward a PhD can be, and how many obstacles will get thrown in your way, ranging from nepotism and interdepartmental politics, to discrimination against women and minorities, to mismatches of interest between students and faculty, all the way to illness, mental health problems, and the slings and arrows of outrageous fortune in general. If you have particularly heterodox ideas, you may face particularly harsh barriers, and sometimes it behooves you to hold your tongue and toe the line awhile.

But this is no excuse not to gain expertise. Even if academia itself is not available to you, we live in an age of unprecedented availability of information—it’s not called the Information Age for nothing. A sufficiently talented and dedicated autodidact can challenge the mainstream, if their ideas are truly good enough. (Perhaps the best example of this is the mathematician savant Srinivasa Ramanujan. But he’s… something else. I think he is about as far from the average genius as the average genius is from the average person.) No, that won’t be easy either. But if you are really serious about advancing human understanding rather than just rooting for your political team (read: tribe), you should be prepared to either take up the academic route or attack it as an autodidact from the outside.

In fact, most scientific fields are actually quite good about admitting what they don’t know. A total consensus that turns out to be wrong is actually a very rare phenomenon; much more common is a clash of multiple competing paradigms where one ultimately wins out, or they end up replaced by a totally new paradigm or some sort of synthesis. In almost all cases, the new paradigm wins not because it becomes fashionable or the ancien regime dies out (as Planck cynically claimed) but because overwhelming evidence is observed in its favor, often in the form of explaining some phenomenon that was previously impossible to understand. If your heterodox theory doesn’t do that, then it probably won’t win, because it doesn’t deserve to.

(Right now you might think of challenging me: Does my heterodox theory do that? Does the tribal paradigm explain things that either total selfishness or total altruism cannot? I think it’s pretty obvious that it does. I mean, you are familiar with a little thing called “racism”, aren’t you? There is no explanation for racism in neoclassical economics; to understand it at all you have to just impose it as an arbitrary term on the utility function. But at that point, why not throw in whatever you please? Maybe some people enjoy bashing their heads against walls, and other people take great pleasure in the taste of arsenic. Why would this particular self- (not to mention other-) destroying behavior be universal to all human societies?)

In practice, I think most people who challenge the mainstream consensus aren’t genuinely interested in finding out the truth—certainly not enough to actually go through the work of doing it. It’s a pattern you can see in a wide range of fringe views: Anti-vaxxers, 9/11 truthers, climate denialists, they all think the same way. The mainstream disagrees with my preconceived ideology, therefore the mainstream is some kind of global conspiracy to deceive us. The overwhelming evidence that vaccination is safe and (wildly) cost-effective, 9/11 was indeed perpetrated by Al Qaeda and neither planned nor anticipated by anyone in the US government, and the global climate is being changed by human greenhouse gas emissions—these things simply don’t matter to them, because it was never really about the truth. They knew the answer before they asked the question. Because their identity is wrapped up in that political ideology, they know it couldn’t possibly be otherwise, and no amount of evidence will change their mind.

How do we reach such people? That, I don’t know. I wish I did. But I can say this much: We can stop taking them seriously when they say that the overwhelming scientific consensus against them is just another “appeal to authority”. It’s not. It never was. It’s an argument from expertise—there are people who know this a lot better than you, and they think you’re wrong, so you’re probably wrong.

What good are macroeconomic models? How could they be better?

Dec 11, JDN 2457734

One thing that I don’t think most people know, but which is immediately obvious to any student of economics at the college level or above, is that there is a veritable cornucopia of different macroeconomic models. There are growth models (the Solow model, the Harrod-Domar model, the Ramsey model), monetary policy models (IS-LM, aggregate demand-aggregate supply), trade models (the Mundell-Fleming model, the Heckscher-Ohlin model), large-scale computational models (dynamic stochastic general equilibrium, agent-based computational economics), and I could go on.
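To make this concrete, here is what the simplest of these looks like in practice: a minimal sketch of the Solow growth model in Python, with no population or technology growth and Cobb-Douglas production. The parameter values are purely illustrative, not calibrated to any real economy.

```python
# Minimal Solow model sketch: capital per worker evolves as
#   k_{t+1} = s * k_t**alpha + (1 - delta) * k_t
# with savings rate s, depreciation delta, and Cobb-Douglas
# production f(k) = k**alpha.  Parameters here are illustrative only.

def solow_steady_state(s, delta, alpha):
    """Closed-form steady state, where s * k**alpha = delta * k."""
    return (s / delta) ** (1 / (1 - alpha))

def simulate_solow(k0, s, delta, alpha, periods):
    """Iterate the law of motion and return the path of capital."""
    k = k0
    path = [k]
    for _ in range(periods):
        k = s * k ** alpha + (1 - delta) * k
        path.append(k)
    return path

path = simulate_solow(k0=1.0, s=0.3, delta=0.1, alpha=0.33, periods=200)
print(path[-1], solow_steady_state(0.3, 0.1, 0.33))
```

Starting below the steady state, capital per worker rises and converges to the closed-form value (s/delta)^(1/(1-alpha)); that convergence, not the particular numbers, is the model’s substantive prediction.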

This immediately raises the question: What are all these models for? What good are they?

A cynical view might be that they aren’t useful at all, that this is all false mathematical precision which makes economics persuasive without making it accurate or useful. And with such a proliferation of models and contradictory conclusions, I can see why such a view would be tempting.

But many of these models are useful, at least in certain circumstances. They aren’t completely arbitrary. Indeed, one of the litmus tests of the last decade has been how well the models held up against the events of the Great Recession and following Second Depression. The Keynesian and cognitive/behavioral models did rather well, albeit with significant gaps and flaws. The Monetarist, Real Business Cycle, and most other neoclassical models failed miserably, as did Austrian and Marxist notions so fluid and ill-defined that I’m not sure they deserve to even be called “models”. So there is at least some empirical basis for deciding what assumptions we should be willing to use in our models. Yet even if we restrict ourselves to Keynesian and cognitive/behavioral models, there are still a great many to choose from, which often yield inconsistent results.

So let’s compare with a science that is uncontroversially successful: Physics. How do mathematical models in physics compare with mathematical models in economics?

Well, there are still a lot of models, first of all. There’s the Bohr model, the Schrödinger equation, the Dirac equation, Newtonian mechanics, Lagrangian mechanics, Bohmian mechanics, Maxwell’s equations, Faraday’s law, Coulomb’s law, the Einstein field equations, the Minkowski metric, the Schwarzschild metric, the Rindler metric, Feynman-Wheeler theory, the Navier-Stokes equations, and so on. So a cornucopia of models is not inherently a bad thing.

Yet, there is something about physics models that makes them more reliable than economics models.

Partly it is that the systems physicists study are literally two dozen orders of magnitude or more smaller and simpler than the systems economists study. Their task is inherently easier than ours.

But it’s not just that; nor is it that their models are simpler—often they aren’t simpler at all. The Navier-Stokes equations are a lot more complicated than the Solow model. They’re also clearly a lot more accurate.

The feature that models in physics seem to have that models in economics do not is something we might call nesting, or maybe consistency. Models in physics don’t come out of nowhere; you can’t just make up your own new model based on whatever assumptions you like and then start using it—which you very much can do in economics. Models in physics are required to fit consistently with one another, and usually inside one another, in the following sense:

The Dirac equation strictly generalizes the Schrödinger equation, which strictly generalizes the Bohr model. Bohmian mechanics is consistent with quantum mechanics, which strictly generalizes Lagrangian mechanics, which generalizes Newtonian mechanics. The Einstein field equations are consistent with Maxwell’s equations and strictly generalize the Minkowski, Schwarzschild, and Rindler metrics. Maxwell’s equations strictly generalize Faraday’s law and Coulomb’s law.

In other words, there are a small number of canonical models—the Dirac equation, Maxwell’s equations, and the Einstein field equations, essentially—inside which all other models are nested. The simpler models like Coulomb’s law and Newtonian mechanics are not contradictory with these canonical models; they are contained within them, subject to certain constraints (such as macroscopic systems far below the speed of light).

This is something I wish more people understood (I blame Kuhn for confusing everyone about what paradigm shifts really entail); Einstein did not overturn Newton’s laws, he extended them to domains where they previously had failed to apply.

This is why it is sensible to say that certain theories in physics are true; they are the canonical models that underlie all known phenomena. Other models can be useful, but not because we are relativists about truth or anything like that; Newtonian physics is a very good approximation of the Einstein field equations at the scale of many phenomena we care about, and is also much more mathematically tractable. If we ever find ourselves in situations where Newton’s equations no longer apply—near a black hole, traveling near the speed of light—then we know we can fall back on the more complex canonical model; but when the simpler model works, there’s no reason not to use it.
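The sense in which Newton “still works” can be made quantitative. The sketch below (my own illustration) computes the Lorentz factor gamma = 1/sqrt(1 - v^2/c^2), the relativistic correction that Newtonian mechanics implicitly approximates as 1:

```python
import math

C = 299_792_458.0  # speed of light in m/s (exact, by definition)

def lorentz_factor(v):
    """gamma = 1 / sqrt(1 - v**2 / c**2): the factor by which
    relativistic time dilation and mass-energy corrections depart
    from the Newtonian approximation of exactly 1."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)

# A car on the highway (30 m/s): the correction is on the order of
# one part in 10**14, far below anything we could ever notice.
print(lorentz_factor(30.0))
# Half the speed of light: Newton is now off by roughly 15%.
print(lorentz_factor(0.5 * C))
```

At highway speeds the correction is smaller than any everyday measurement could detect; at half the speed of light it is about 15%. That is the precise sense in which the simpler model is contained within the canonical one.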

There are still very serious gaps in the knowledge of physics; in particular, there is a fundamental gulf between quantum mechanics and the Einstein field equations that has been unresolved for decades. A solution to this “quantum gravity problem” would be essentially a guaranteed Nobel Prize. So even a canonical model can be flawed, and can be extended or improved upon; the result is then a new canonical model which we now regard as our best approximation to truth.

Yet the contrast with economics is still quite clear. We don’t have one or two or even ten canonical models to refer back to. We can’t say that the Solow model is an approximation of some greater canonical model that works for these purposes—because we don’t have that greater canonical model. We can’t say that agent-based computational economics is approximately right, because we have nothing to approximate it to.

I went into economics thinking that neoclassical economics needed a new paradigm. I have now realized something much more alarming: Neoclassical economics doesn’t really have a paradigm. Or if it does, it’s a very informal paradigm, one that is expressed by the arbitrary judgments of journal editors, not one that can be written down as a series of equations. We assume perfect rationality, except when we don’t. We assume constant returns to scale, except when that doesn’t work. We assume perfect competition, except when that doesn’t get the results we wanted. The agents in our models are infinite identical psychopaths, and they are exactly as rational as needed for the conclusion I want.

This is quite likely why there is so much disagreement within economics. When you can permute the parameters however you like with no regard to a canonical model, you can more or less draw whatever conclusion you want, especially if you aren’t tightly bound to empirical evidence. I know a great many economists who are sure that raising the minimum wage results in large disemployment effects, because the models they believe in say that it must, even though the empirical evidence has been quite clear that these effects are small if they are present at all. If we had a canonical model of employment that we could calibrate to the empirical evidence, that couldn’t happen anymore; there would be a coefficient I could point to that would refute their argument. But when every new paper comes with a new model, there’s no way to do that; one set of assumptions is as good as another.

Indeed, as I mentioned in an earlier post, a remarkable number of economists seem to embrace this relativism. “There is no true model,” they say; “we do what is useful.” Recently I encountered a book by the eminent economist Deirdre McCloskey which, though I confess I haven’t read it in its entirety, appears to be trying to argue that economics is just a meaningless language game that doesn’t have or need to have any connection with actual reality. (If any of you have read it and think I’m misunderstanding it, please explain. As it is I haven’t bought it for a reason any economist should respect: I am disinclined to incentivize such writing.)

Creating such a canonical model would no doubt be extremely difficult. Indeed, it is a task that would require the combined efforts of hundreds of researchers and could take generations to achieve. The true equations that underlie the economy could be totally intractable even for our best computers. But quantum mechanics wasn’t built in a day, either. The key challenge here lies in convincing economists that this is something worth doing—that if we really want to be taken seriously as scientists we need to start acting like them. Scientists believe in truth, and they are trying to find it out. While not immune to tribalism or ideology or other human limitations, they resist them as fiercely as possible, always turning back to the evidence above all else. And in their combined strivings, they attempt to build a grand edifice, a universal theory to stand the test of time—a canonical model.

The replication crisis, and the future of science

Aug 27, JDN 2457628 [Sat]

After settling in a little bit in Irvine, I’m now ready to resume blogging, but for now it will be on a reduced schedule. I’ll release a new post every Saturday, at least for the time being.

Today’s post was chosen by Patreon vote, though only one person voted (this whole Patreon voting thing has not been as successful as I’d hoped). It’s about something we scientists really don’t like to talk about, but definitely need to: We are in the middle of a major crisis of scientific replication.

Whenever large studies are conducted attempting to replicate published scientific results, the success rate is almost always dismal.

Psychology is the one everyone likes to pick on, because their record is particularly bad. Only 39% of studies were really replicated with the published effect size, though a further 36% were at least qualitatively but not quantitatively similar. Yet economics has its own replication problem, and even medical research is not immune to replication failure.

It’s important not to overstate the crisis; the majority of scientific studies do at least qualitatively replicate. We are doing better than flipping a coin, which is better than one can say of financial forecasters.

There are three kinds of replication, and only one of them should be expected to give near-100% results. That kind is reanalysis—when you take the same data and use the same methods, you absolutely should get the exact same results. I favor making reanalysis a routine requirement of publication; if we can’t get your results by applying your statistical methods to your data, then your paper needs revision before we can entrust it to publication. A number of papers have failed on reanalysis, which is absurd and embarrassing; the worst offender was probably Reinhart-Rogoff, which was used in public policy decisions around the world despite having spreadsheet errors.

The second kind is direct replication—when you do the exact same experiment again and see if you get the same result within error bounds. This kind of replication should work something like 90% of the time, but in fact works more like 60% of the time.

The third kind is conceptual replication—when you do a similar experiment designed to test the same phenomenon from a different perspective. This kind of replication should work something like 60% of the time, but actually only works about 20% of the time.
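Where does an expectation like “about 90%” come from? Here is one back-of-the-envelope way to derive an expected direct-replication rate. The prior, significance threshold, and power below are illustrative assumptions of mine, not estimates from the replication literature:

```python
def expected_replication_rate(prior_true, alpha, power):
    """Expected fraction of *published* positive findings that a
    direct replication (run with the same alpha and power) confirms.

    prior_true: fraction of tested hypotheses that are actually true
    alpha:      false-positive rate (significance threshold)
    power:      probability of detecting an effect that is real
    """
    true_pos = prior_true * power          # real effects, correctly detected
    false_pos = (1 - prior_true) * alpha   # null effects, "detected" by chance
    p_true = true_pos / (true_pos + false_pos)  # P(real | published)
    return p_true * power + (1 - p_true) * alpha

# Illustrative assumptions: half of tested hypotheses are true,
# alpha = 0.05, power = 0.9.
print(expected_replication_rate(0.5, 0.05, 0.9))  # ~0.86
```

Even under these fairly generous assumptions, perfect replication is not the benchmark: some published findings are false positives, and even true effects sometimes fail to reach significance on a second try.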

Economists are well equipped to understand and solve this crisis, because it’s not actually about science. It’s about incentives. I facepalm every time I see another article by an aggrieved statistician about the “misunderstanding” of p-values; no, scientists aren’t misunderstanding anything. They know damn well how p-values are supposed to work. So why do they keep using them wrong? Because their jobs depend on doing so.

The first key point to understand here is “publish or perish”; academics in an increasingly competitive system are required to publish their research in order to get tenure, and frequently required to get tenure in order to keep their jobs at all. (Or they could become adjuncts, who are paid one-fifth as much.)

The second is the fundamentally defective way our research journals are run (as I have discussed in a previous post). As private for-profit corporations whose primary interest is in raising more revenue, our research journals aren’t trying to publish what will genuinely advance scientific knowledge. They are trying to publish what will draw attention to themselves. It’s a similar flaw to what has arisen in our news media; they aren’t trying to convey the truth, they are trying to get ratings to draw advertisers. This is how you get hours of meaningless fluff about a missing airliner and then a single chyron scroll about a war in Congo or a flood in Indonesia. Research journals haven’t fallen quite so far because they have reputations to uphold in order to attract scientists to read them and publish in them; but still, their fundamental goal is and has always been to raise attention in order to raise revenue.

The best way to do that is to publish things that are interesting. But if a scientific finding is interesting, that means it is surprising. It has to be unexpected or unusual in some way. And above all, it has to be positive; you have to have actually found an effect. Except in very rare circumstances, the null result is never considered interesting. This adds up to making journals publish what is improbable.

In particular, it creates a perfect storm for the abuse of p-values. A p-value, roughly speaking, is the probability you would get the observed result if there were no effect at all—for instance, the probability that you’d observe this wage gap between men and women in your sample if in the real world men and women were paid the exact same wages. The standard heuristic is a p-value of 0.05; indeed, it has become so enshrined that it is almost an explicit condition of publication now. Your result must be less than 5% likely to happen if there is no real difference. But if you will only publish results that show a p-value of 0.05, then the papers that get published and read will only be the ones that found such p-values—which renders the p-values meaningless.
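
To make that concrete, here is a minimal sketch (with made-up wage numbers) of what a p-value measures. A permutation test asks: if the group labels were actually meaningless, how often would a gap at least as large as the observed one appear by chance?

```python
import random
import statistics

random.seed(0)

def p_value_by_simulation(group_a, group_b, n_sims=10_000):
    """Approximate the p-value of the observed mean difference:
    the chance of a gap at least this large if the group labels
    carried no information at all (a permutation test)."""
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = group_a + group_b
    n = len(group_a)
    extreme = 0
    for _ in range(n_sims):
        random.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n]) - statistics.mean(pooled[n:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_sims

# Hypothetical wage samples (thousands of dollars; entirely made up).
men = [52, 61, 58, 49, 66, 57, 63, 55]
women = [48, 50, 53, 44, 59, 47, 52, 51]

p = p_value_by_simulation(men, women)
print(f"p = {p:.3f}")  # a small p means the gap is unlikely under "no real difference"
```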

It was never particularly meaningful anyway; as we Bayesians have been trying to explain since time immemorial, it matters how likely your hypothesis was in the first place. For something like wage gaps where we’re reasonably sure, but maybe could be wrong, the p-value is not too unreasonable. But if the theory is almost certainly true (“does gravity fall off as the inverse square of distance?”), even a high p-value like 0.35 is still supportive, while if the theory is almost certainly false (“are human beings capable of precognition?”—actual study), even a tiny p-value like 0.001 is still basically irrelevant. We really should be using much more sophisticated inference techniques, but those are harder to do, and don’t provide the nice simple threshold of “Is it below 0.05?”
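
The Bayesian point can be sketched in a few lines of Python. Treating the p-value as a stand-in for P(data | null) is itself a rough simplification, and all the numbers below are made up for illustration:

```python
def posterior(prior, p_data_given_h, p_data_given_null):
    """Bayes' rule: P(H | data) from a prior and the two likelihoods."""
    num = prior * p_data_given_h
    return num / (num + (1 - prior) * p_data_given_null)

# Precognition: the prior is microscopic, so even p = 0.001 barely moves it.
print(posterior(prior=1e-12, p_data_given_h=0.5, p_data_given_null=0.001))

# Inverse-square gravity: the prior is near 1, so even p = 0.35 leaves it near 1.
print(posterior(prior=0.999999, p_data_given_h=0.5, p_data_given_null=0.35))
```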

But okay, p-values can be useful in many cases—if they are used correctly and you see all the results. If you have effect X with p-values 0.03, 0.07, 0.01, 0.06, and 0.09, effect X is probably a real thing. If you have effect Y with p-values 0.04, 0.02, 0.29, 0.35, and 0.74, effect Y is probably not a real thing. But I’ve just set it up so that these would be published exactly the same. They each have two published papers with “statistically significant” results. The other papers never get published and therefore never get seen, so we throw away vital information. This is called the file drawer problem.
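
The file drawer problem is easy to simulate. In this sketch (all parameters made up), the true effect is exactly zero, yet the studies that clear the p < 0.05 bar report effect sizes far larger than the average study actually found:

```python
import math
import random
import statistics

random.seed(1)

def run_study(true_effect, n=30):
    """One study: the sample mean effect and a crude two-sided p-value
    from the normal approximation (known standard deviation of 1)."""
    data = [random.gauss(true_effect, 1.0) for _ in range(n)]
    mean = statistics.mean(data)
    z = abs(mean) / (1.0 / n ** 0.5)
    p = math.erfc(z / math.sqrt(2))  # P(|Z| >= z) for a standard normal
    return mean, p

# The true effect is exactly zero, but only "significant" studies get seen.
all_means, published_means = [], []
for _ in range(5_000):
    mean, p = run_study(true_effect=0.0)
    all_means.append(abs(mean))
    if p < 0.05:
        published_means.append(abs(mean))

print(f"mean |effect| across all studies: {statistics.mean(all_means):.3f}")
print(f"mean |effect| among published:    {statistics.mean(published_means):.3f}")
```

The published studies are a biased sample by construction: filtering on significance means filtering on large effects, so the literature overstates the effect even though every individual study was done honestly.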

Researchers often have a lot of flexibility in designing their experiments. If their only goal were to find truth, they would use this flexibility to test a variety of scenarios and publish all the results, so they can be compared holistically. But that isn’t their only goal; they also care about keeping their jobs so they can pay rent and feed their families. And under our current system, the only way to ensure that you can do that is by publishing things, which basically means only including the parts that showed up as statistically significant—otherwise, journals aren’t interested. And so we get huge numbers of papers published that tell us basically nothing, because we set up such strong incentives for researchers to give misleading results.

The saddest part is that this could be easily fixed.

First, reduce the incentives to publish by finding other ways to evaluate the skill of academics—like teaching for goodness’ sake. Working papers are another good approach. Journals already get far more submissions than they know what to do with, and most of these papers will never be read by more than a handful of people. We don’t need more published findings, we need better published findings—so stop incentivizing mere publication and start finding ways to incentivize research quality.

Second, eliminate private for-profit research journals. Science should be done by government agencies and nonprofits, not for-profit corporations. (And yes, I would apply this to pharmaceutical companies as well, which should really be pharmaceutical manufacturers who make cheap drugs based off of academic research and carry small profit margins.) Why? Again, it’s all about incentives. Corporations have no reason to want to find truth and every reason to want to tilt it in their favor.

Third, increase the number of tenured faculty positions. Instead of building so many new grand edifices to please your plutocratic donors, use your (skyrocketing) tuition money to hire more professors so that you can teach more students better. You can find even more funds if you cut the salaries of your administrators and football coaches. Come on, universities; you are the one industry in the world where labor demand and labor supply are the same people a few years later. You have no excuse for not having the smoothest market clearing in the world. You should never have gluts or shortages.

Fourth, require pre-registration of research studies (as some branches of medicine already do). If the study is sound, an optimal rational agent shouldn’t care in the slightest whether it had a positive or negative result, and if our ape brains won’t let us think that way, we need to establish institutions to force it to happen. Journal editors shouldn’t even see the effect size or the p-value before they make the decision to publish; all they should care about is whether the experiment makes sense and the proper procedure was followed.

If we did all that, the replication crisis could be almost completely resolved, as the incentives would be realigned to more closely match the genuine search for truth.

Alas, I don’t see universities or governments or research journals having the political will to actually make such changes, which is very sad indeed.

The Expanse gets the science right—including the economics

JDN 2457502

Despite constantly working on half a dozen projects at once (literally—preparing to start my PhD, writing this blog, working at my day job, editing a novel, preparing to submit a nonfiction book, writing another nonfiction book with three of my friends as co-authors, and creating a card game—that’s seven actually), I do occasionally find time to do things for fun. One I’ve been doing lately is catching up on The Expanse on DVR (I’m about halfway through the first season so far).

If you’re not familiar with The Expanse, it has been fairly aptly described as Battlestar Galactica meets Game of Thrones, though I think that particular comparison misrepresents the tone and attitudes of the series, because both BG and GoT are so dark and cynical (“It’s a nice day… for a… red wedding!”). I think “Star Trek meets Game of Thrones” might be better actually—the extreme idealism of Star Trek would cancel out the extreme cynicism of Game of Thrones, with the result being a complex mix of idealism and cynicism that more accurately reflects the real world (a world where Mahatma Gandhi and Adolf Hitler lived at the same time). That complex, nuanced world (or should I say worlds?) is where The Expanse takes place. ST is also more geopolitical than BG and The Expanse is nothing if not geopolitical.

But The Expanse is not just psychologically realistic—it is also scientifically and economically realistic. It may in fact be the hardest science fiction I have ever encountered, and is definitely the hardest science fiction I’ve seen in a television show. (There are a few books that might be slightly harder, as well as some movies based on them.)

The only major scientific inaccuracy I’ve been able to find so far is the use of sound effects in space, and actually even these can be interpreted as reflecting an omniscient narrator perspective that would hear any sounds that anyone would hear, regardless of what planet or ship they might be on. The sounds the audience hears all seem to be sounds that someone would hear—there’s simply no particular person who would hear all of them. When people are actually thrown into hard vacuum, we don’t hear them make any noise.

Like Firefly (and for once I think The Expanse might actually be good enough to deserve that comparison), there is no FTL, no aliens, no superhuman AI. Human beings are bound within our own solar system, and travel between planets takes weeks or months depending on your energy budget. They actually show holograms projecting the trajectory of various spacecraft and the trajectories actually make good sense in terms of orbital mechanics. Finally screenwriters had the courage to give us the terrifying suspense and inevitability of an incoming nuclear missile rounding a nearby asteroid and intercepting your trajectory, where you have minutes to think about it but not nearly enough delta-v to get out of its blast radius. That is what space combat will be like, if we ever have space combat (as awesome as it is to watch, I strongly hope that we will not ever actually do it). Unlike what Star Trek would have you believe, space is not a 19th century ocean.

They do have stealth in space—but it requires technology that even to them is highly advanced. Moreover it appears to only work for relatively short periods and seems most effective against civilian vessels that would likely lack state-of-the-art sensors, both of which make it a lot more plausible.

Computers are more advanced in the 2200s than they were in the 2000s, but not radically so, at most a million times faster, about what we gained since the 1980s. I’m guessing a smartphone in The Expanse runs at a few petaflops. Essentially they’re banking on Moore’s Law finally dying sometime in the mid 21st century, but then, so am I. Perhaps a bit harder to swallow is that no one has figured out good enough heuristics to match human cognition; but then, human cognition is very tightly optimized.

Spacecraft don’t have artificial gravity except for the thrust of their engines, and people float around as they should when ships are freefalling. They actually deal with the fact that Mars and Ceres have lower gravity than Earth, and the kinds of health problems that result from this. (One thing I do wish they’d done is had the Martian cruiser set a cruising acceleration of Mars-g—about 38% Earth-g—that would feel awkward and dizzying to their Earther captives. Instead they basically seem to assume that Martians still like to use Earth-g for space transit, but that does make some sense in terms of both human health and simply transit time.) It doesn’t seem like people move around quite awkwardly enough in the very low gravity of Ceres—which should be only about 3% Earth-g—but they do establish that electromagnetic boots are ubiquitous and that could account for most of this.

They fight primarily with nuclear missiles and kinetic weapons, and the damage done by nuclear missiles is appropriately reduced by the fact that vacuum doesn’t transmit shockwaves. (Nuclear missiles would still be quite damaging in space by releasing large amounts of wide-spectrum radiation; but they wouldn’t cause the total devastation they do within atmosphere.) Oddly they decided not to go with laser weapons as far as I can tell, which actually seems to me like they’ve underestimated advancement; laser weapons have a number of advantages that would be particularly useful in space, once we can actually make them affordable and reliable enough for widespread deployment. There could also be a three-tier system, where missiles are used at long range, railguns at medium range, and lasers at short range. (Yes, short range—the increased speed of lasers would be only slight compared to a good railgun, and would be more than offset by the effect of diffraction. At orbital distances, a laser is a shotgun.) Then again, it could well work out that railguns are just better—depending on how vessels are structured, puncturing their hulls with kinetic rounds could well be more useful than burning them up with infrared lasers.
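
The “laser is a shotgun” claim follows from diffraction. Here is a back-of-envelope sketch, assuming a 1-micron laser and a 1-meter mirror (both numbers invented for illustration):

```python
import math

def beam_radius(wavelength_m, aperture_m, range_m):
    """Diffraction-limited half-angle is about 1.22 * lambda / D for a
    circular aperture; the spot radius grows linearly with range."""
    theta = 1.22 * wavelength_m / aperture_m
    return theta * range_m

# Assumed: 1-micron (infrared) laser, 1-meter mirror.
for r_km in (100, 10_000, 1_000_000):
    r = beam_radius(1e-6, 1.0, r_km * 1000)
    print(f"{r_km:>9,} km -> spot radius ~ {r:,.1f} m")
```

At 100 km the beam is still a tight spot, but by a million kilometers (well under typical interplanetary distances) it has spread to over a kilometer across, diluting its energy accordingly.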

But I think what really struck me about the realism of The Expanse is how it even makes the society realistic (in a way that, say, Firefly really doesn’t—we wanted a Western and we got a Western!).

The only major offworld colonies are Mars and Ceres, both of which seem to be fairly well-established, probably originally colonized as much as a century ago. Different societies have formed on each world; Earth has largely united under the United Nations (one of the lead characters is an undersecretary for the UN), but meanwhile Mars has split off into its own independent nation (“Martian” is now an ethnicity like “German” rather than meaning “extraterrestrial”), and the asteroid belt colonists, while formally still under Earth’s government, think of themselves as a different culture (“Belters”) and are seeking independence. There are some fairly obvious—but deftly managed rather than heavy-handed—parallels between the Belter independence movement and real-world independence movements, particularly Palestine (it’s hard not to think of the PLO when they talk about the OPA). Both Mars and the Belt have their own languages, while Earth’s languages have largely coalesced around English as the language of politics and commerce. (If the latter seems implausible, I remind you that the majority of the Internet and all international air traffic control are in English.) English is the world’s lingua franca (a rather bizarre turn of phrase, given that “lingua franca” is actually Italian for “Frankish language”).

There is some of the conniving and murdering of Game of Thrones, but it is at a much more subdued level, and all of the major factions display both merits and flaws. There is no clear hero and no clear villain, just conflict and misunderstanding between a variety of human beings each with their own good and bad qualities. There does seem to be a sense that the most idealistic characters suffer for their idealism much as the Starks often do, but unlike the Starks they usually survive and learn from the experience. Indeed, some of the most cynical also seem to suffer for their cynicism—in the episode I just finished, the grizzled UN Colonel assumed the worst of his adversary and ended up branded “the butcher of Anderson Station”.

Cost of living on Ceres is extraordinarily high because of the limited living space (the apartments look a lot like the tiny studios of New York or San Francisco), and above all the need to constantly import air and water from Earth. A central plot point in the first episode is that a ship carrying comet ice—i.e., water—to Ceres is lost in a surprise attack by unknown adversaries with advanced technology, and the result is a deepening of an already dire water shortage, exacerbating the Belter’s craving for rebellion.

Air and water are recyclable, so it isn’t as though literally every drink and every breath must be supplied from outside—indeed that would clearly be cost-prohibitive. But recycling is never perfect, and Ceres also appears to have a growing population, both of which would require a constant input of new resources to sustain. It makes perfect sense that the most powerful people on Ceres are billionaire tycoons who own water and air transport corporations.

The police on Ceres (of which another lead character is a detective) are well-intentioned but understaffed, underfunded and moderately corrupt, similar to what we seem to find in large inner-city police departments like the NYPD and LAPD. It felt completely right when they responded to an attempt to kill a police officer with absolutely overwhelming force and little regard for due process and procedure—for this is what real-world police departments almost always do.

But why colonize the asteroid belt at all? Mars is a whole planet, there is plenty there—and in The Expanse it is undergoing terraforming at a very plausible rate (there’s a moving scene where a Martian says to an Earther, “We’re trying to finish building our garden before you finish paving over yours.”). Mars has as much land as Earth, and it has water, abundant metals, and CO2 you could use to make air. Even just the frontier ambition could be enough to bring us to Mars.

But why go to Ceres? The explanation The Expanse offers is a very sensible one: Mining, particularly so-called “rare earth metals”. Gold and platinum might have been profitable to mine at first, but once they became plentiful the market would probably collapse or at least drop off to a level where they aren’t particularly expensive or interesting—because they aren’t useful for very much. But neodymium, scandium, and promethium are all going to be in extremely high demand in a high-tech future based on nuclear-powered spacecraft, and given that we’re already running out of easily accessible deposits on Earth, by the 2200s there will probably be basically none left. The asteroid belt, however, will have plenty for centuries to come.

As a result Ceres is organized like a mining town, or perhaps an extractive petrostate (metallostate?); but due to lightspeed interplanetary communication—very important in the series—and some modicum of free speech it doesn’t appear to have attained more than a moderate level of corruption. This also seems realistic; the “end-of-history” thesis is often overstated, but the basic idea that some form of democracy and welfare-state capitalism is fast becoming the only viable model of governance does seem to be true, and that is almost certainly the model of governance we would export to other planets. In such a system corruption can only get so bad before it is shown on the mass media and people won’t take it anymore.

The show doesn’t deal much with absolute dollar (or whatever currency) numbers, which is probably wise; but nominal incomes on Ceres are likely extremely high even though the standard of living is quite poor, because the tiny living space and need to import air and water would make prices (literally?) astronomical. Most people on Ceres seem to have grown up there, but the initial attraction could have been something like the California Gold Rush, where rumors of spectacularly high incomes clashed with similarly spectacular expenses incurred upon arrival. “Become a millionaire!” “Oh, by the way, your utility bill this month is $112,000.”

Indeed, even the poor on Ceres don’t seem that poor, which is a very nice turn toward realism that a lot of other science fiction shows seem unprepared to make. In Firefly, the poor are poor—they can barely afford food and clothing, and have no modern conveniences whatsoever. (“Jaynestown”, perhaps my favorite episode, depicts this vividly.) But even the poor in the US today are rarely that poor; our minimalistic and half-hearted welfare state has a number of cracks one can fall through, but as long as you get the benefits you’re supposed to get you should be able to avoid starvation and homelessness. Similarly I find it hard to believe that any society with high enough productivity to routinely build interplanetary spacecraft the way we build container ships would not have at least the kind of welfare state that provides for the most basic needs. Chronic dehydration is probably still a problem for Belters, because water would be too expensive to subsidize in this way; but they all seem to have fairly nice clothes, home appliances, and smartphones, and that seems right to me. At one point a character loses his arm, and the “cheap” solution is a cybernetic prosthetic—the “expensive” one would be to grow him a new arm. As today but perhaps even more so, poverty in The Expanse is really about inequality—the enormous power granted to those who have millions of times as much as others. (Another show that does this quite well, though it is considerably softer on the physics, is Continuum. If I recall correctly, Alec Sadler in 2079 is literally a trillionaire.)

Mars also appears to be a democracy, and actually quite a thriving one. In many ways Mars appears to be surpassing Earth economically and technologically. This suggests that Mars was colonized with our best and brightest, but not necessarily; Australians have done quite well for themselves despite being founded as a penal colony. Mars colonization would also have a way of justifying their frontier idealism that no previous frontiers have granted: No indigenous people to displace, no local ecology to despoil, and no gifts from the surrounding environment. You really are working entirely from your own hard work and know-how (and technology and funding from Earth, of course) to establish a truly new world on the open and unspoiled frontier. You’re not naive or a hypocrite; it’s the real truth. That kind of realistic idealism could make the Martian Dream a success in ways even the American Dream never quite was.

In all it is a very compelling series, and should appeal to people like me who crave geopolitical nuance in fiction. But it also has its moments of huge space battles with exploding star cruisers, so there’s that.

What does correlation have to do with causation?

JDN 2457345

I’ve been thinking of expanding the topics of this blog into some basic statistics and econometrics. It has been said that there are “Lies, damn lies, and statistics”; but in fact it’s almost the opposite—there are truths, whole truths, and statistics. Almost everything in the world that we know—not merely guess, or suppose, or intuit, or believe, but actually know, with a quantifiable level of certainty—is done by means of statistics. All sciences are based on them, from physics (when they say the Higgs discovery is a “5-sigma event”, that’s a statistic) to psychology, ecology to economics. Far from being something we cannot trust, they are in a sense the only thing we can trust.

The reason it sometimes feels like we cannot trust statistics is that most people do not understand statistics very well; this creates opportunities for both accidental confusion and willful distortion. My hope is therefore to provide you with some of the basic statistical knowledge you need to combat the worst distortions and correct the worst confusions.

I wasn’t quite sure where to start on this quest, but I suppose I have to start somewhere. I figured I may as well start with an adage about statistics that I hear commonly abused: “Correlation does not imply causation.”

Taken at its original meaning, this is definitely true. Unfortunately, it can be easily abused or misunderstood.

In its original meaning—the formal sense of the word “imply”, namely logical implication—to “imply” something is an extremely strong statement. It means the antecedent logically entails the consequent: if the antecedent is true, the consequent must be true, on pain of logical contradiction. Logical implication is for most practical purposes synonymous with mathematical proof. (Unfortunately, it’s not quite synonymous, because of things like Gödel’s incompleteness theorems and Löb’s theorem.)

And indeed, correlation does not logically entail causation; it’s quite possible to have correlations without any causal connection whatsoever, simply by chance. One of my former professors liked to brag that from 1990 to 2010 whether or not she ate breakfast had a statistically significant positive correlation with that day’s closing price for the Dow Jones Industrial Average.

How is this possible? Did my professor actually somehow influence the stock market by eating breakfast? Of course not; if she could do that, she’d be a billionaire by now. And obviously the Dow’s price at 17:00 couldn’t influence whether she ate breakfast at 09:00. Could there be some common cause driving both of them, like the weather? I guess it’s possible; maybe in good weather she gets up earlier and people are in better moods so they buy more stocks. But the most likely reason for this correlation is much simpler than that: She tried a whole bunch of different combinations until she found two things that correlated. At the usual significance level of 0.05, on average you need to try about 20 combinations of totally unrelated things before two of them will show up as correlated. (My guess is she used a number of different stock indexes and varied the starting and ending year. That’s a way to generate a surprisingly large number of degrees of freedom without it seeming like you’re doing anything particularly nefarious.)
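
The “about 20 combinations” figure is easy to check by simulation. Under the null hypothesis a p-value is just a Uniform(0, 1) draw, so the chance that at least one of 20 independent tests comes up “significant” is 1 − 0.95^20, about 64%:

```python
import random

random.seed(2)

# Under the null hypothesis, a p-value is just a Uniform(0, 1) draw.
def at_least_one_hit(n_tests=20, alpha=0.05):
    """True if any of n_tests independent null tests crosses the alpha bar."""
    return any(random.random() < alpha for _ in range(n_tests))

trials = 100_000
hits = sum(at_least_one_hit() for _ in range(trials))
print(f"simulated: {hits / trials:.3f}")
print(f"analytic:  {1 - 0.95 ** 20:.3f}")
```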

But how do we know they aren’t actually causally related? Well, I suppose we don’t. Especially if the universe is ultimately deterministic and nonlocal (as I’ve become increasingly convinced by the results of recent quantum experiments), any two data sets could be causally related somehow. But the point is they don’t have to be; you can pick any randomly-generated datasets, pair them up in 20 different ways, and odds are, one of those ways will show a statistically significant correlation.

All of that is true, and important to understand. Finding a correlation between eating grapefruit and getting breast cancer, or between liking bitter foods and being a psychopath, does not necessarily mean that there is any real causal link between the two. If we can replicate these results in a bunch of other studies, that would suggest that the link is real; but typically, such findings cannot be replicated. There is something deeply wrong with the way science journalists operate; they like to publish the new and exciting findings, which 9 times out of 10 turn out to be completely wrong. They never want to talk about the really important and fascinating things that we know are true because we’ve been confirming them over hundreds of different experiments, because that’s “old news”. The journalistic desire to be new and first fundamentally contradicts the scientific requirement of being replicated and confirmed.

So, yes, it’s quite possible to have a correlation that tells you absolutely nothing about causation.

But this is exceptional. In most cases, correlation actually tells you quite a bit about causation.

And this is why I don’t like the adage; “imply” has a very different meaning in common speech, meaning merely to suggest or evoke. Almost everything you say implies all sorts of things in this broader sense, some more strongly than others, even though it may logically entail none of them.

Correlation does in fact suggest causation. Like any suggestion, it can be overridden. If we know that 20 different combinations were tried until one finally yielded a correlation, we have reason to distrust that correlation. If we find a correlation between A and B but there is no logical way they can be connected, we infer that it is simply an odd coincidence.

But when we encounter any given correlation, there are three other scenarios which are far more likely than mere coincidence: A causes B, B causes A, or some other factor C causes A and B. These are also not mutually exclusive; they can all be true to some extent, and in many cases are.

A great deal of work in science, and particularly in economics, is based upon using correlation to infer causation, and has to be—because there is simply no alternative means of approaching the problem.

Yes, sometimes you can do randomized controlled experiments, and some really important new findings in behavioral economics and development economics have been made this way. Indeed, much of the work that I hope to do over the course of my career is based on randomized controlled experiments, because they truly are the foundation of scientific knowledge. But sometimes, that’s just not an option.

Let’s consider an example: In my master’s thesis I found a strong correlation between the level of corruption in a country (as estimated by the World Bank) and the proportion of that country’s income which goes to the top 0.01% of the population. Countries that have higher levels of corruption also tend to have a larger proportion of income that accrues to the top 0.01%. That correlation is a fact; it’s there. There’s no denying it. But where does it come from? That’s the real question.

Could it be pure coincidence? Well, maybe; but when it keeps showing up in several different models with different variables included, that becomes unlikely. A single p < 0.05 will happen about 1 in 20 times by chance; but five in a row should happen less than 1 in 1 million times (assuming they’re independent, which, to be fair, they usually aren’t).
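
The arithmetic behind that claim, again assuming the five tests are independent:

```python
# Under the null, each independent test crosses p < 0.05 about 5% of the time,
# so all five doing so at once:
p_all_five = 0.05 ** 5
print(p_all_five)      # 3.125e-07
print(1 / p_all_five)  # i.e., about 1 in 3.2 million
```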

Could it be some artifact of the measurement methods? It’s possible. In particular, I was concerned about the possibility of Halo Effect, in which people tend to assume that something which is better (or worse) in one way is automatically better (or worse) in other ways as well. People might think of their country as more corrupt simply because it has higher inequality, even if there is no real connection. But it would have taken a very large halo bias to explain this effect.

So, does corruption cause income inequality? It’s not hard to see how that might happen: More corrupt individuals could bribe leaders or exploit loopholes to make themselves extremely rich, and thereby increase inequality.

Does inequality cause corruption? This also makes some sense, since it’s a lot easier to bribe leaders and manipulate regulations when you have a lot of money to work with in the first place.

Does something else cause both corruption and inequality? Also quite plausible. Maybe some general cultural factors are involved, or certain economic policies lead to both corruption and inequality. I did try to control for such things, but I obviously couldn’t include all possible variables.

So, which way does the causation run? Unfortunately, I don’t know. I tried some clever statistical techniques to try to figure this out; in particular, I looked at which tends to come first—the corruption or the inequality—and whether they could be used to predict each other, a method called Granger causality. Those results were inconclusive, however. I could neither verify nor exclude a causal connection in either direction. But is there a causal connection? I think so. It’s too robust to just be coincidence. I simply don’t know whether A causes B, B causes A, or C causes A and B.
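
The idea behind a Granger causality test can be sketched in a few lines (using numpy; this is a bare-bones illustration of the concept, not the specification from my thesis): compare how well y is predicted by its own lags alone versus its own lags plus lags of x.

```python
import numpy as np

rng = np.random.default_rng(0)

def granger_f(x, y, lags=2):
    """F-statistic for whether lags of x help predict y beyond y's own lags.
    A minimal sketch of the logic behind a Granger causality test."""
    n = len(y)
    # Restricted model: y_t on a constant and its own lags.
    X_r = np.column_stack([np.ones(n - lags)] +
                          [y[lags - k:n - k] for k in range(1, lags + 1)])
    # Unrestricted model: also include lags of x.
    X_u = np.column_stack([X_r] +
                          [x[lags - k:n - k] for k in range(1, lags + 1)])
    y_t = y[lags:]
    rss = lambda X: np.sum((y_t - X @ np.linalg.lstsq(X, y_t, rcond=None)[0]) ** 2)
    rss_r, rss_u = rss(X_r), rss(X_u)
    df_num, df_den = lags, len(y_t) - X_u.shape[1]
    return ((rss_r - rss_u) / df_num) / (rss_u / df_den)

# Toy data in which x genuinely leads y by one period.
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

print(f"F(x -> y) = {granger_f(x, y):.1f}")  # should be large
print(f"F(y -> x) = {granger_f(y, x):.1f}")  # should be small
```

A large F in one direction and a small one in the other is the asymmetry the test looks for; in my thesis data, neither direction produced a clear answer.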

Imagine trying to do this same study as a randomized controlled experiment. Are we supposed to create two societies and flip a coin to decide which one we make more corrupt? Or which one we give more income inequality? Perhaps you could do some sort of experiment with a proxy for corruption (cheating on a test or something like that), and then have unequal payoffs in the experiment—but that is very far removed from how corruption actually works in the real world, and worse, it’s prohibitively expensive to make really life-altering income inequality within an experimental context. Sure, we can give one participant $1 and the other $1,000; but we can’t give one participant $10,000 and the other $10 million, and it’s the latter that we’re really talking about when we deal with real-world income inequality. I’m not opposed to doing such an experiment, but it can only tell us so much. At some point you need to actually test the validity of your theory in the real world, and for that we need to use statistical correlations.

Or think about macroeconomics; how exactly are you supposed to test a theory of the business cycle experimentally? I guess theoretically you could subject an entire country to a new monetary policy selected at random, but the consequences of being put into the wrong experimental group would be disastrous. Moreover, nobody is going to accept a random monetary policy democratically, so you’d have to introduce it against the will of the population, by some sort of tyranny or at least technocracy. Even if this is theoretically possible, it’s mind-bogglingly unethical.

Now, you might be thinking: But we do change real-world policies, right? Couldn’t we use those changes as a sort of “experiment”? Yes, absolutely; that’s called a quasi-experiment or a natural experiment. They are tremendously useful. But since they are not truly randomized, they aren’t quite experiments. Ultimately, everything you get out of a quasi-experiment is based on statistical correlations.

Thus, abuse of the adage “Correlation does not imply causation” can lead to ignoring whole subfields of science, because there is no realistic way of running experiments in those subfields. Sometimes, statistics are all we have to work with.

This is why I like to say it a little differently:

Correlation does not prove causation. But correlation definitely can suggest causation.

Why being a scientist means confronting your own ignorance

I read an essay today arguing that scientists should be stupid. Or more precisely, ignorant. Or even more precisely, they should recognize their ignorance when all others ignore it and turn away.

What does it feel like to be wrong?

It doesn’t feel like anything. Most people are wrong most of the time without realizing it. (Explained brilliantly in this TED talk.)

What does it feel like to be proven wrong, to find out you were confused or ignorant?

It hurts, a great deal. And most people flinch away from this. They would rather continue being wrong than experience the feeling of being proven wrong.

But being proven wrong is the only way to become less wrong. Being proven ignorant is the only way to truly attain knowledge.

I once heard someone characterize the scientific temperament as “being comfortable not knowing”. No, no, no! Just the opposite, in fact. The unscientific temperament is being comfortable not knowing, being fine with your infinite ignorance as long as you can go about your day. The scientific temperament is being so deeply uncomfortable not knowing that it overrides the discomfort everyone feels when their beliefs are proven wrong. It is to have a drive to actually know—not to think you know, not to feel as if you know, not to assume you know and never think about it, but to actually know—that is so strong it pushes you through all the pain and doubt and confusion of actually trying to find out.

An analogy I like to use is The Armor of Truth. Suppose you were presented with a piece of armor, The Armor of Truth, which is claimed to be indestructible. You will have the chance to wear this armor into battle; if it is indeed indestructible, you will be invincible and will surely prevail. But what if it isn’t? What if it has some weakness you aren’t aware of? Then it could fail and you could die.

How would you go about determining whether The Armor of Truth is really what it is claimed to be? Would you test it with things you expect it to survive? Would you brush it with feathers, pour glasses of water on it, poke it with your finger? Would you seek to confirm your belief in its indestructibility? (As confirmation bias would have you do?) No, you would test it with things you expect to destroy it; you’d hit it with everything you have. You’d fire machine guns at it, drop bombs on it, pour acid on it, place it in a nuclear testing site. You’d do everything you possibly could to falsify your belief in the armor’s indestructibility. And only when you failed, only after you had tried everything you could think of to destroy the armor and it remained undented and unscratched, would you begin to believe that it is truly indestructible. (Popper was exaggerating when he said all science is based on falsification; but he was not exaggerating very much.)

Science is The Armor of Truth, and we wear it into battle—but now the analogy begins to break down, for our beliefs are within us, they are part of us. We’d like to be able to point the machine guns at armor far away from us, but instead it is as if we are forced to wear the armor as the guns are fired. When a break in the armor is found and a bullet passes through—a belief we dearly held is proven false—it hurts us, and we wish we could find another way to test it. But we can’t; and if we fail to test it now, it will only endanger us later—confront a false belief with reality enough and it will eventually fail. A scientist is someone who accepts this and wears the armor bravely as the test guns blaze.

Being a scientist means confronting your own ignorance: Not accepting it, but also not ignoring it; confronting it. Facing it down. Conquering it. Destroying it.