Why Compilers Don’t Autocorrect “Obvious” Parse Errors


Last month, someone on Twitter relayed a conversation with their 8-year-old daughter, who is learning Python. The kid wanted to know: “If the computer knows I’m missing a semicolon here, why won’t it add it itself?”*

The face I imagine the 8 year old making at her code while asking this question.

*It turns out that the Twitter person meant “colon,” not semicolon, but for most programming languages, “semicolon” still exemplifies the point: if the compiler “knows” what the problem is, why doesn’t it just fix it?

I thought I’d try to answer this question in a way that an 8-year-old might appreciate. It turns out a lot of people outside that age range also appreciate it, so I decided to make it better and blog about it.

Lemme start here: the computer runs a program to understand YOUR program.

That program is called a compiler. Python also has an interpreter: a program that runs your program. But this error is a compilation error: the compiler caught it while trying to understand your program.

Compilers can be very, very complicated. There are lots of opinions about how to write them. You can even find different compilers for the same language! For example, the Python compiler on your computer is probably the ‘standard’ one. We call it CPython because it is written in C, another programming language.

There is also Jython, written in Java (another programming language), and PyPy, a Python compiler written in Python! For now, we won’t get into how PyPy works. That’s a thing an 8-year-old might enjoy looking up on her own.

Anyway, handling compilation errors.

Have you ever heard that phrase about how “Every happy family is the same, but every unhappy family is unhappy in their own way”? What it means is that there are a lot more ways to get something wrong than there are to get it right.

And the number of people who write Python, globally, is about 8.2 million by one estimate!

Imagine that. Writing a compiler that has to catch the ways 8.2 million people mess up. A colon seems obvious, when it’s just you coding, and when the compiler is right about what’s wrong with your program.

But imagine it’s even 99% right when it catches a colon error. (Remember, it is hard to predict how things will go wrong. 99% is pretty good.)

Let’s pretend 8.2 million people each make one colon error a day.

8,200,000 × 0.01 = 82,000.

82,000 times the compiler is wrong, daily.
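If you'd like to play with those numbers yourself, here's a minimal sketch of the arithmetic. The variable names are mine, and the inputs are the same pretend values used above, not measurements.

```python
# Back-of-the-envelope estimate using the pretend numbers from the text.
python_programmers = 8_200_000        # figure quoted above
colon_errors_per_person_per_day = 1   # made-up assumption
accuracy_percent = 99                 # how often the compiler's guess is right

total_colon_errors = python_programmers * colon_errors_per_person_per_day

# Integer math keeps the result exact: 1% of the errors get a wrong diagnosis.
wrong_diagnoses_per_day = total_colon_errors * (100 - accuracy_percent) // 100

print(f"{wrong_diagnoses_per_day:,} wrong diagnoses per day")
```

Try changing accuracy_percent to 98 and watch how fast the daily number grows.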

How bad is that?

Well, there are three ‘risk amplifiers’ to consider when you are deciding how to deal with things that could go wrong. Each one makes the risk ‘worse.’

1. It’s catastrophic (breaks very important things)
2. It’s likely (it happens a lot)
3. It’s insidious (it could go uncaught)

So in the case of the Python colon error, we’re talking about this number of 82,000 a day. That’s a made up number, but it illustrates the point that Python is used by enough people that even RARE mistakes in compilation error messages happen pretty often.

They’re likely, in other words. Most of them are not too catastrophic, right? They’re easy to fix, most of the time, and even in the 82,000 cases where the compiler is mistaken about what, exactly, the programmer has done wrong, drawing attention to that area will help the programmer figure it out.

Now let’s talk about insidiousness. This is the most under-appreciated and, probably for that reason, often the most dangerous of the risk amplifiers. Example: NASA lost the Mars Climate Orbiter in 1999 because one piece of software reported its numbers in English units, another piece of software assumed they were metric units, and no one caught it until the thing was already lost.
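Here's a tiny sketch of what that kind of bug looks like in code. To be clear, this is not NASA's software: the function names and the number 10.0 are hypothetical, and the only real fact in it is the conversion factor (one pound-force second is about 4.448222 newton-seconds). Notice that nothing crashes and nothing complains.

```python
LBF_S_TO_N_S = 4.448222  # 1 pound-force second expressed in newton-seconds


def thruster_impulse_lbf_s():
    """Hypothetical producer: reports impulse in pound-force seconds."""
    return 10.0


def update_trajectory(impulse_n_s):
    """Hypothetical consumer: expects newton-seconds."""
    return impulse_n_s  # stand-in for the real trajectory math


reported = thruster_impulse_lbf_s()

# The insidious version: a bare number carries no units, so no error is raised.
used = update_trajectory(reported)

# What should have happened: convert before handing the value over.
correct = update_trajectory(reported * LBF_S_TO_N_S)

print(f"value used: {used}, value that should have been used: {correct:.2f}")
```

The program runs happily either way. It's just wrong by a factor of more than four, and nobody is told.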

Even when errors are rare, we want mechanisms to catch them. And the compiler being wrong about the colon sometimes, even if it’s usually right, is already likely because of the sheer number of people writing colons in Python.

What if the compiler adds a colon when that’s the wrong thing…in a space launch program? There are a couple of programming languages that are notorious for this kind of thing: most famously JavaScript, and to some extent Ruby.

These languages will try with all their might to divine something runnable from what you wrote. How kind of them, right? But the thing is, that can make it really, really hard to figure out why your program is not working properly, because it’s still doing something. Just, it’s the wrong thing.
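To make that concrete, here's a minimal sketch in Python. The lenient_add helper is hypothetical: I wrote it to imitate the "guess what you meant" style (in JavaScript, 1 + "2" really does give you "12"). The strict_add function is just ordinary Python addition, which refuses to guess.

```python
def lenient_add(a, b):
    """Hypothetical helper: guesses what the caller 'meant' instead of failing."""
    if isinstance(a, str) or isinstance(b, str):
        return str(a) + str(b)  # silently switches to gluing text together
    return a + b


def strict_add(a, b):
    """Ordinary Python addition: mismatched types raise TypeError."""
    return a + b


quantity = "2"  # e.g. read from a file and never converted to a number

# The lenient version keeps running, with an answer nobody asked for.
print("lenient:", repr(lenient_add(1, quantity)))

# The strict version stops and tells the programmer where to look.
try:
    strict_add(1, quantity)
except TypeError as err:
    print("strict: TypeError:", err)
```

The lenient version is the one that lets a wrong answer travel deep into your program before anyone notices.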

Gary Bernhardt’s “Wat” video shows some funny, salient examples of this that have made it one of the most popular and long-lived “joke” conference talks I’ve ever seen.

How do we manage this risk?

So a common, risk-averse approach in compiler design is to surface compilation errors to the programmer, and let the programmer—you—figure out exactly what’s wrong.

Because, as smart as we compiler designers think we are, you, dear programmer, know your program better than we do. We think we know what’s wrong. We’re even pretty sure. But we don’t know, and we don’t assume.

We say:

“We think something is wrong here. We think you want a colon. But we want you to look at it, too. Because you might know something we don’t, and we don’t want to make your program wrong by accident.”

Here’s an example of Python getting the colon compilation error wrong:

The error says it wants a colon. The actual problem here is that I typo’d “and” to “nd.” And there actually IS a colon in the right spot.
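If you want to try this at home, here's a minimal sketch. The snippet and its variable names are made up for illustration; it hands Python's built-in compile() a line with the same kind of typo. Which message you get, and where it points, depends on your Python version and your editor, so the sketch just prints whatever comes back rather than promising a particular wording.

```python
# A made-up snippet with the same kind of typo: "nd" where "and" was meant.
# Note that the colon at the end of the first line is exactly where it belongs.
source = "if x > 0 nd y > 0:\n    print('both positive')\n"

caught = None
try:
    # compile() only checks the syntax here; it never runs the snippet.
    compile(source, "<example>", "exec")
except SyntaxError as err:
    caught = err
    print("message:", err.msg)
    print("line:   ", err.lineno)
    print("column: ", err.offset)
```

Whatever the message says, the compiler can only tell you where it got confused. It has no way of knowing that you meant to type “and.”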

This’ll happen 81,999 more times today.

If the compiler tried to automatically add a colon, I’d have two colons and the code would be even wronger.

Or it might do so over and over, resulting in a never-ending string of colons and a hung compiler! (“Hung” means “never finishes running and we have to shut it down manually.”)

The compiler avoids causing those kinds of things by leaving it up to the programmer to decide what the problem is, and accepting the ‘cost’ that in most cases, the programmer will be like ‘Ahp, yep, missing a colon. Lemme add that.’
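Here's a sketch of that runaway behavior. The naive_autofix function is hypothetical (no real Python compiler works this way, as far as I know): every time compilation fails, it sticks a colon wherever the error points and tries again. I gave it a max_attempts cap so the example finishes; take the cap away and it would keep going forever.

```python
def naive_autofix(source, max_attempts=5):
    """Hypothetical 'helpful' compiler: on every SyntaxError, insert a colon
    at the reported position and retry. Returns (source, attempts, compiled)."""
    for attempt in range(max_attempts):
        try:
            compile(source, "<example>", "exec")
            return source, attempt, True
        except SyntaxError as err:
            lines = source.split("\n")
            # lineno and offset are 1-based and can be missing, so clamp them.
            row = min(max((err.lineno or 1) - 1, 0), len(lines) - 1)
            col = max((err.offset or 1) - 1, 0)
            lines[row] = lines[row][:col] + ":" + lines[row][col:]
            source = "\n".join(lines)
    return source, max_attempts, False  # gave up


# Made-up snippet: "nd" should be "and"; the colon is already in the right spot.
broken = "if x > 0 nd y > 0:\n    print('both positive')\n"

patched, attempts, compiled = naive_autofix(broken)
print("compiled:", compiled, "after", attempts, "attempts")
print("colons before:", broken.count(":"), "colons after:", patched.count(":"))
print(patched.split("\n")[0])
```

Every attempt adds one more colon, and the typo that actually caused the problem is still sitting there untouched.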

If you liked this piece, you might also like:

This piece I wrote on what causes insidious bugs

This transcript of a talk I gave on analyzing application risk

This introductory series on compiler design
