
What it Takes to Make a Language “Object-Oriented” (or at least object-literate)

Reading Time: 11 minutes

I’m working my way through Crafting Interpreters. The project guides programmers through building their own interpreters for the Lox programming language. I started writing a blog series about my progress. You can see all the posts so far right here.

In a previous post covering Chapter 11, we added a resolver to the interpreter to statically resolve variable scope. We have now made it to Chapter 12: the one that implements classes.

I confess I felt intimidated walking into this chapter. Classes have to be a huge undertaking, right? They’re the fundamental concept that separates a “scripting language” from a fully-fledged object-capable programming language.

They are, in fact, a huge undertaking, but not for the reasons I predicted.


Classes in Context

It turns out that classes are just one of multiple ways to add object literacy to a language. You’ve got:

1. Classes. One glob of code (a class) contains the blueprint for a specific combination of state and behavior (usually, with the behavior acting on the state). Other snippets of code create individual versions of that blueprint (instances), each with their own state settings. Inheritance allows subclasses to add or change state and behavior on a superclass. This is the oldest and most popular approach and you see it in languages like Java, Python, and Ruby.

2. Prototypes. One glob of code contains a bunch of state (keys pointing to values), and those values can point to behaviors, too. Classes allow this, but prototypes remove the separation between behavior and state. Objects can delegate to other objects’ state, including the behaviors on there, in an inheritance-like fashion. JavaScript and Lua do this; it’s perhaps most recognizable in the way that React components accept all kinds of named behaviors as constructor arguments (often used as callbacks later).

3. Multimethods. Functions work like variables: they live in specific scopes, and when they are called the compiler looks for them in concentric scopes until it finds one. You can overload a function name with different signatures, and it selects the most appropriate signature for the call occasion at runtime (as opposed to, say, Java, which does it at compile time, and as opposed to duck-typed languages that don’t include argument types in signatures to begin with). Bob wrote a language called Magpie that does it.
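The contrast with Java in item 3 can be seen in a small sketch. The class and method names below are hypothetical; the only thing being shown is Java's own rule that an overload is chosen from the static (declared) type of the argument, not from its runtime type.

```java
// Hypothetical demo class; it only exercises Java's built-in overload resolution.
class OverloadDemo {
    static String describe(Object value) { return "Object"; }
    static String describe(String value) { return "String"; }

    public static void main(String[] args) {
        Object boxed = "hello";                  // static type Object, runtime type String
        System.out.println(describe(boxed));     // chooses describe(Object)
        System.out.println(describe("hello"));   // chooses describe(String)
    }
}
```

A language with multimethods would look at the runtime type of boxed and choose the String version in both calls.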

A Class-Ready Parser

As it turned out (at least in this book), parsing and resolving the syntax for class definitions worked more or less like parsing and resolution for the other constructs we’ve put into Lox. The first implementation choice I did not anticipate happened at the distinction between a LoxClass, which binds a class’s behavior:

  final String name;
  private final Map<String, LoxFunction> methods;

  LoxClass(String name, Map<String, LoxFunction> methods) {
    this.name = name;
    this.methods = methods;
  }

…and a LoxInstance, which accepts its blueprint LoxClass in its constructor and explicitly handles state separately from the behavior in the blueprint:

class LoxInstance {
  private LoxClass klass;
  // State lives here, per instance, separate from the class's methods.
  private final Map<String, Object> fields = new HashMap<>();

  LoxInstance(LoxClass klass) {
    this.klass = klass;
  }

  Object get(Token name) {
    if (fields.containsKey(name.lexeme)) {
      return fields.get(name.lexeme);
    }

    throw new RuntimeError(name,
        "Undefined property '" + name.lexeme + "'.");
  }
}

The two get bound together when a method is bound to an instance. The bound LoxFunction gets its own environment, in which this refers to that instance:

  LoxFunction bind(LoxInstance instance) {
    Environment environment = new Environment(closure);
    environment.define("this", instance);
    return new LoxFunction(declaration, environment);
  }

The distinction between behavior and state, not just as two things happening in the same blob of code, but as two things that live in two completely different blobs of code, helped me understand how classes differ from prototypes.
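To make that split concrete, here is a minimal sketch. It is not the book's code: SketchClass, SketchInstance, and the demo class are hypothetical names, and methods are represented as plain strings to keep it short. Behavior lives in one map on the class and is shared by every instance, while each instance keeps its own map of fields.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, stripped-down stand-ins for LoxClass and LoxInstance.
class SketchClass {
    final Map<String, String> methods = new HashMap<>();   // behavior, shared by all instances
}

class SketchInstance {
    final SketchClass klass;
    final Map<String, Object> fields = new HashMap<>();    // state, one map per instance

    SketchInstance(SketchClass klass) { this.klass = klass; }
}

class StateVsBehaviorDemo {
    static String describe() {
        SketchClass person = new SketchClass();
        person.methods.put("sayName", "<fn sayName>");

        SketchInstance jane = new SketchInstance(person);
        SketchInstance bill = new SketchInstance(person);
        jane.fields.put("name", "Jane");
        bill.fields.put("name", "Bill");

        boolean sharedBehavior = jane.klass.methods == bill.klass.methods;
        boolean separateState = jane.fields != bill.fields;
        return sharedBehavior + " " + separateState + " "
            + jane.fields.get("name") + " " + bill.fields.get("name");
    }

    public static void main(String[] args) {
        System.out.println(describe());   // prints "true true Jane Bill"
    }
}
```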

That said, later in the chapter, Lox ends up permitting programmers to blur this distinction between state and behavior. From the text:

Assuming that method in this example is a method on the class of object and not a field on the instance, what should the following piece of code do?

var m = object.method;
m(argument);

This program “looks up” the method and stores the result—whatever that is—in a variable and then calls that object later. Is this allowed? Can you treat a method like it’s a function on the instance?

class Box {}

fun notMethod(argument) {
  print "called function with " + argument;
}

var box = Box();
box.function = notMethod;
box.function("argument");

This program creates an instance and then stores a function in a field on it. Then it calls that function using the same syntax as a method call. Does that work?

Different languages have different answers to these questions. One could write a treatise on it. For Lox, we’ll say the answer to both of these is yes, it does work. We have a couple of reasons to justify that. For the second example—calling a function stored in a field—we want to support that because first-class functions are useful and storing them in fields is a perfectly normal thing to do.

Does that seem edge casey? It gets edge casier!

class Person {
  sayName() {
    print this.name;
  }
}

var jane = Person();
jane.name = "Jane";

var bill = Person();
bill.name = "Bill";

bill.sayName = jane.sayName;
bill.sayName(); // ?

Does that last line print “Bill” because that’s the instance that we called the method through, or “Jane” because it’s the instance where we first grabbed the method?

Equivalent code in Lua and JavaScript would print “Bill”. Those languages don’t really have a notion of “methods”. Everything is sort of functions-in-fields, so it’s not clear that jane “owns” sayName any more than bill does.

In other words, JavaScript doesn’t bind a unit of behavior to the object it lives in when it is created. Instead, any reference to this gets evaluated at runtime. The rules about how this evaluates in JavaScript at runtime smack a little of funhouse mirror:

[Image: “JavaScript: What is ‘this’?” Documentation from: https://www.w3schools.com/js/js_this.asp]

But Lox does have the notion of methods, and instances own those methods. So the Lox interpreter binds each method to the specific instance it was accessed through. In Lox, this gets bound at runtime, at the moment the method is looked up on an instance rather than when it is called. It’s a three-step process:

  1. Class definition evaluation: make methods into LoxFunctions whose closure is the environment around the class.
  2. Method expression evaluation: create an environment that binds a name (“this”) to the object the method is accessed on, and then make a new LoxFunction with the same implementation code as step 1, but with the this environment as its closure. The parent of the this environment is the closure from step 1.
  3. Call to the method: create a new environment for the method body, and set its parent to the this environment.
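The three steps can be sketched in a few dozen lines of Java. This is a simplified illustration under stated assumptions, not the interpreter's actual code: Env, Instance, Method, and BindingDemo are hypothetical stand-ins, and the method body is hard-coded to behave like sayName() { print this.name; }.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the interpreter's Environment.
class Env {
    final Env parent;
    final Map<String, Object> values = new HashMap<>();

    Env(Env parent) { this.parent = parent; }

    void define(String name, Object value) { values.put(name, value); }

    // Walk outward through enclosing environments until the name is found.
    Object get(String name) {
        for (Env env = this; env != null; env = env.parent) {
            if (env.values.containsKey(name)) return env.values.get(name);
        }
        throw new RuntimeException("Undefined variable '" + name + "'.");
    }
}

class Instance {
    final Map<String, Object> fields = new HashMap<>();
}

// Stands in for a method whose body is: print this.name;
class Method {
    final Env closure;

    Method(Env closure) { this.closure = closure; }

    // Step 2: wrap the closure in an environment that defines "this".
    Method bind(Instance instance) {
        Env thisEnv = new Env(closure);
        thisEnv.define("this", instance);
        return new Method(thisEnv);
    }

    // Step 3: the body runs in a fresh environment whose parent is the "this" environment.
    Object call() {
        Env body = new Env(closure);
        Instance self = (Instance) body.get("this");
        return self.fields.get("name");
    }
}

class BindingDemo {
    static Object run() {
        Env globals = new Env(null);
        Method sayName = new Method(globals);   // step 1: closure is the surrounding environment

        Instance jane = new Instance();
        jane.fields.put("name", "Jane");
        Instance bill = new Instance();
        bill.fields.put("name", "Bill");

        // bill.sayName = jane.sayName; the method is bound to jane when it is looked up.
        bill.fields.put("sayName", sayName.bind(jane));
        return ((Method) bill.fields.get("sayName")).call();
    }

    public static void main(String[] args) {
        System.out.println(run());   // prints "Jane"
    }
}
```

Because the binding happens at lookup, storing the bound method in one of bill's fields does not change which instance this refers to.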

[Three slides (screenshots from the chapter) illustrate the three steps.]

“This?”

Languages that bind state to behavior via classes need a way for the behavior to reference the state, and different languages choose different ways to do that. Ruby prefixes instance variables with @ (and class variables with @@). Python passes the instance as the first argument to every instance method and lets the programmer name it, though by convention people name it self. Java uses this, and Lox also uses this.

So what is this? Well, first we make it a keyword in the parser so it cannot be used as a variable name or something. We even report an error if it’s used outside a class, just as we do for return used outside a function.

Then, when we bind a method to an instance, we literally put a key-value pair in a new environment that maps the string “this” to that instance. I ran into this two weeks ago while implementing the Chapter 11 challenge to make the Lox interpreter report an error for unused variables. The solution expects folks to make the change on the Chapter 11 version of the interpreter, which doesn’t have classes yet. I made the change on the finished interpreter, so I ended up needing to contend with this in my solution. I set the status of this, and also super, to DECLARED rather than DEFINED because they’re both automatically in the environment, so the interpreter should not report an error if one of those is not used.

Creating Classes

We need a way to create these instances we’ve wired up in the interpreter. Bob, on constructors:

I find them one of the trickiest parts of a language to design, and if you peer closely at most other languages, you’ll see cracks around object construction where the seams of the design don’t quite fit together perfectly. Maybe there’s something intrinsically messy about the moment of birth.

A few examples: In Java, even though final fields must be initialized, it is still possible to read one before it has been. Exceptions—a huge, complex feature—were added to C++ mainly as a way to emit errors from constructors.

“Constructing” an object is actually a pair of operations:

  1. The runtime allocates the memory required for a fresh instance. In most languages, this operation is at a fundamental level beneath what user code is able to access. C++’s “placement new” is a rare example where the bowels of allocation are laid bare for the programmer to prod.
  2. Then, a user-provided chunk of code is called which initializes the unformed object.

The latter is what we tend to think of when we hear “constructor”, but the language itself has usually done some groundwork for us before we get to that point. In fact, our Lox interpreter already has that covered when it creates a new LoxInstance object.

We’ll do the remaining part—user-defined initialization—now. Languages have a variety of notations for the chunk of code that sets up a new object for a class. C++, Java, and C# use a method whose name matches the class name. Ruby and Python call it init(). The latter is nice and short, so we’ll do that.

As with methods and with this, naturally the first question Bob answers while implementing class initializers is “how might initializer syntax be misused?” and of course, the answer is “someone could declare a regular old method called init.” So we get a piece of code that explicitly stores whether a function is an initializer:

 
  private final boolean isInitializer;

  LoxFunction(Stmt.Function declaration, Environment closure,
              boolean isInitializer) {
    this.isInitializer = isInitializer;
    this.closure = closure;
    this.declaration = declaration;
  }

I was curious about why we did this when I implemented the aforementioned Ch 11 challenge and had to go through and add the isInitializer parameter wherever a LoxFunction gets created to get the “variable not used” errors working on the finished interpreter. I love how explicit this is: no inference, no reserved words, none of that. Just a cold boolean in the LoxFunction constructor that stores the initializerness as a piece of state.
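The excerpt above only stores the flag. One thing a flag like this makes possible is having an initializer always hand back the instance, whatever its body returns. The sketch below assumes that behavior and uses hypothetical names throughout (SketchFunction, boundThis, and a Supplier standing in for the method body); it is not the interpreter's code.

```java
import java.util.function.Supplier;

// Hypothetical, simplified stand-in for LoxFunction.
class SketchFunction {
    private final Supplier<Object> body;    // stands in for executing the declaration's body
    private final Object boundThis;         // stands in for "this" stored in the closure
    private final boolean isInitializer;

    SketchFunction(Supplier<Object> body, Object boundThis, boolean isInitializer) {
        this.body = body;
        this.boundThis = boundThis;
        this.isInitializer = isInitializer;
    }

    Object call() {
        Object result = body.get();
        // Assumed rule: an initializer yields the instance, ignoring the body's result.
        if (isInitializer) return boundThis;
        return result;
    }
}

class InitializerDemo {
    public static void main(String[] args) {
        Object instance = new Object();
        SketchFunction init = new SketchFunction(() -> "ignored", instance, true);
        SketchFunction method = new SketchFunction(() -> "value", instance, false);

        System.out.println(init.call() == instance);   // prints "true"
        System.out.println(method.call());             // prints "value"
    }
}
```

Keeping the decision in a boolean set at construction time means call() never has to inspect the function's name.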

Bob on Complexity

There’s a design note at the end of this chapter about the concept of simplicity.

Bob points out, astutely I think, that although technologists treat simplicity as the holy grail of technical attainment, the reason more things aren’t simple is that simplicity involves a tradeoff. A simple implementation often does fewer things. If it doesn’t do the things you need done, the simple implementation has little use.

He instead presents “power” as the metric to optimize (as in “this is a powerful tool”) and describes it like this (the mathematical description illustrates the relationships between the components and isn’t meant to be an accurate formula):

power = breadth × ease ÷ complexity

Where breadth (the number of things you can do) and ease (how little effort it takes to do them) both have a direct relationship to power, and complexity has an inverse relationship to power.

Stated differently, simplicity (the reciprocal of complexity) also has a direct relationship to power. The thing is, these variables aren’t all independent: breadth might require complexity, for example.

I’m hoping to explore this concept more thoroughly in future posts, but I wanted to give it a mention here in case the design note in this chapter might be of interest to others.

If you liked this post, you might also like…

The rest of the Crafting Interpreters series

This post about structural verification (I’m just figuring you’re into objectcraft, so)

This post about why use, or not use, an interface (specifically in Python because of the way Python does, or rather doesn’t exactly do, interfaces)

