Embrace abstractions!

Abstraction is probably the most powerful concept in programming. Nevertheless, from time to time, I hear (or read) people questioning the value of abstractions. Here is my answer.

We are surrounded by abstractions

An abstraction is a simplified model (a mental model, if you will) that is used in place of a more complex thing.

We, as human beings, use abstractions all the time. They are everywhere around us, and this is particularly true in computer science.

Think about a file. What is it but an abstraction? The operating system, the file system, the disk driver, and the disk controller, all of them contribute to providing a simplified model of what is actually going on.

Even the things that we, as software people, call hardware, are full of abstractions.

Think of a transistor. What is it but an abstraction? Its essential nature is a stack of three layers of doped silicon, but we managed to find a way to describe its function in a simple and meaningful manner.

It is an undeniable fact that abstractions are everywhere around us and there is a good reason behind this: our brain needs them.

A remedy against complexity

Abstractions are an essential part of our life because there is a limit on the number of things we are able to keep in mind at a given time. This capacity is often called short-term memory. It's a well-documented fact that a human brain has only 3 to 5 of these slots, depending on the individual.

By abstracting away the complexity of the things we use, we reduce the quantity of information we need to compose things. And that allows us to build projects that are bigger than what our brain could handle.

Leveraging abstractions is what will make your code stand the test of time because, in the long run, what really matters is your ability to manage the complexity of large programs.

"Managing complexity is the most important technical topic in software development"

Steve McConnell

But it's not an excuse

Don't get me wrong. When I say "ignore how", I don't mean you don't have to understand what's behind the abstraction.

On the contrary, it means that even if you understand that complex thing, you deliberately decide to forget its complexity for a moment, so you can integrate it into a bigger whole.

Think about the bipolar transistor again. You could have a mental model of a linear gain between the base and the collector currents. That abstraction will work for a while. But if it's all you know about a transistor, you'll sooner or later be disappointed: the gain is linear only in a certain range. That doesn't mean you have to throw the model away. It's still very useful, but you need to understand what it means and in which context it applies.

Meaningful abstractions

An abstraction must be meaningful to the application. Ideally, it should model an actual piece of the problem space and it should represent it in the code space.

Finding the right abstraction can take some effort. In particular, when you group things together or hide a big mess behind a SomethingManager or WhateverHelper, you're not creating abstractions; these are packages or modules at most.

Importance of names

An abstraction describes what is done, allowing us to ignore how it's done. And that's the rule to find a good name for an abstraction. Whether it's a class or a function, its name should say what it does (or models), and not how it's done.

By hiding how things are implemented, you reduce the coupling, because a caller doesn't have to depend on implementation details. You're now free to change the details without touching the calling sites.

When a good naming discipline is applied, the code is easier to debug: from its name, we know exactly what a function should do, so it's clear that we have a bug if the actual behavior doesn't match.
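To make the what/how distinction concrete, here is a minimal sketch. The `gummy_bear` type and the function name are hypothetical, chosen only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical domain type, for illustration only.
struct gummy_bear {
  int weight;
  std::string color;
};

// The name states what is computed. A "how" name for the same body would be
// something like count_if_color_string_equals_red, which leaks the details.
std::ptrdiff_t count_red_gummy_bears(const std::vector<gummy_bear>& bears) {
  return std::count_if(
      bears.begin(), bears.end(),
      [](const gummy_bear& bear) { return bear.color == "red"; });
}
```

If the color later becomes an enum, or the loop is written by hand, the callers of `count_red_gummy_bears` don't change. And if it ever returns something other than the number of red bears, the name alone tells you it's a bug.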

"There are only two hard things in Computer Science: cache invalidation and naming things."

Phil Karlton

Don't cross the streams

It's vital to be consistent in the level of abstraction in every location of the code. For example, a particular function should only deal with details of one level of abstraction; lower-level details have to be handled by lower-level functions.

If you don't follow this rule, you lose the purpose of abstractions, because the reader has to know the details of all layers at once. That's exactly what you want to avoid.
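Here is a small sketch of the difference. The order-validation rule and all the names are made up for the example; only the shape of the code matters.

```cpp
#include <string>

// Mixed levels: the business rule and the character-level check share one body.
bool is_valid_order_mixed(const std::string& customer, int quantity) {
  return customer.find_first_not_of(" \t") != std::string::npos && quantity > 0;
}

// Lower level: knows about characters, nothing about orders.
bool is_blank(const std::string& text) {
  return text.find_first_not_of(" \t") == std::string::npos;
}

// Higher level: reads like the rule itself.
bool is_valid_order(const std::string& customer, int quantity) {
  return !is_blank(customer) && quantity > 0;
}
```

Both versions compute the same result, but in the second one the reader of `is_valid_order` never has to think about whitespace characters.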

"Mixing levels of abstraction within a function is always confusing. Readers may not be able to tell whether a particular expression is an essential concept or a detail."

Robert C. Martin

Emergent Design

Finally, good abstractions can make a design or an architecture emerge:

  • things, which initially looked unrelated, progressively find their commonality;
  • large code files, which look impossible to split, collapse one line after the other;
  • modules and libraries are extracted, one function after the other;
  • highly coupled modules agree on a common dependency;
  • the code becomes more pleasant to work with.

When working on legacy code, that's what I'm looking for. I want these "aha!" moments. You'll never have that breakthrough if you don't diligently extract abstractions and higher-level concepts.

Sometimes, you just can't

Software engineering is the art of compromise. Sometimes, you can't write clean code. Sometimes, you need to sacrifice the clarity of the code to enable an optimization. Sometimes, you fail to find an expressive way of writing things.

For these situations (and only for these situations), you have to write a comment. This comment must tell why the code is written like that.

A comment is an apology; it's your last resort when you fail to find a proper way to write the thing, or when an explanation is needed. There is no shame in that, as long as you make sure the comment is useful.

Let's summarize:

  • the name says what
  • the code says how
  • the comment (if any) says why

A comment that says what or how is a smell. Erase it or refactor. Period.
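As an illustration of the three roles, consider this minimal sketch. The scenario (a function called in a hot loop) is hypothetical, and so is the function itself.

```cpp
#include <cstdint>

// The name says what: no comment is needed for that.
bool is_power_of_two(std::uint32_t value) {
  // Why this form: in this hypothetical scenario the function runs in a hot
  // loop, and the more readable bit-counting loop was too slow there.
  return value != 0 && (value & (value - 1)) == 0;
}
```

The comment doesn't repeat the name and doesn't paraphrase the expression; it only records the reason for choosing a less obvious implementation.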

An illustration

Let's look at a code sample to make the point clear. Here is a caricatured, yet typical, C++14 function. The fact that this code is inefficient is totally irrelevant; what matters is the way it reads.

#include <algorithm>
#include <fstream>
#include <string>
#include <tuple>
#include <vector>

std::tuple<int, std::string> parse_filter_sort_return_last() {
  std::vector<std::tuple<int, std::string>> buffer;

  // Open file and push_back in vector
  int weight;
  std::string color;
  std::ifstream infile("gb.txt");
  while (infile >> weight >> color) {
    buffer.emplace_back(weight, color);
  }

  // Remove non matching
  buffer.erase(std::remove_if(
                   buffer.begin(), buffer.end(),
                   [](auto &gb) { return std::get<std::string>(gb) != "red"; }),
               buffer.end());

  // Sort
  std::sort(buffer.begin(), buffer.end());

  return buffer.back();
}

Now, here is how I think it should be written:

gummy_bear get_biggest_red_gummy_bear() {
  auto bears = read_gummy_bears();
  keep_only_red_gummy_bears(bears);
  return get_biggest_gummy_bear(bears);
}

Here are the improvements compared to the first version:

  • values are encapsulated in named structures,
  • function names focus on the what,
  • function bodies focus on the how,
  • the abstraction level is consistent,
  • lower-level details of the algorithm are delegated to lower-level functions,
  • of course, no comment.

Note: I intentionally avoided an object-oriented programming style, to show that abstraction is not a concept tied to OOP.

Analysis

I think it is safe to assume that the second version is easier to read for everyone, including those who are not familiar with C++.

It conveys the intent of the programmer, so it's easy to see if something is not behaving as expected.

Functions are short (less than ten lines) and are focused on one task. Subtasks are delegated to lower-level functions. This greatly reduces the intellectual burden when reading the code.

It's easier to test. After extracting the independent functions, it's now possible to test them in isolation, i.e. unit-testing.
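As a sketch, a test for `keep_only_red_gummy_bears` could look like this. It uses plain `assert` rather than a test framework, and both the `gummy_bear` fields and the function body are assumed, since they are not shown in this article.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct gummy_bear {
  int weight;
  std::string color;
};

// Function under test (assumed implementation).
void keep_only_red_gummy_bears(std::vector<gummy_bear>& bears) {
  bears.erase(std::remove_if(bears.begin(), bears.end(),
                             [](const gummy_bear& bear) {
                               return bear.color != "red";
                             }),
              bears.end());
}

// No file and no sorting involved: only the behavior named by the function.
void test_keep_only_red_gummy_bears() {
  std::vector<gummy_bear> bears{{5, "red"}, {9, "green"}, {7, "red"}};

  keep_only_red_gummy_bears(bears);

  assert(bears.size() == 2);
  assert(bears[0].weight == 5 && bears[0].color == "red");
  assert(bears[1].weight == 7 && bears[1].color == "red");
}
```

Testing the same filter inside the first version would have required a real "gb.txt" file on disk.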

Those functions can now be reused. But let's be very clear on this point: that was not your initial goal. Don't try to reuse too much, or you may lose the meaning of your abstraction and introduce coupling.

"Though you should seldom design for reusability, you must be strict about keeping within the generic concept."

Eric Evans

Objections

To close this article, let's answer common objections made to clean code in general and abstractions in particular.

Objection 1: It hides the complexity of the code

Yes indeed, and that's on purpose. We want to implement complex programs with simple code, not the opposite.

"Any fool can write code that a computer can understand. Good programmers write code that humans can understand."

Martin Fowler

Objection 2: It takes more time to write

Just a little longer. But the long-term benefits outweigh the small losses. Sometimes, you need to spend a few minutes now to save several hours in the future. See this as an investment.

Also remember, the code is written once, but it's read several times; so it's better to optimize the reading time than the writing time.

"It’s harder to read code than to write it."

Joel Spolsky

Objection 3: Comprehension is made difficult by too many indirections

This symptom is more typical of a program that overuses inheritance or callbacks (including lambdas, of course); those can be very difficult to understand.

However, I don't believe that adding a new type or function to model the domain leads to that kind of mess. In my experience, code that mirrors the domain is very clear.

Objection 4: Clean code is incompatible with performance

This is, unfortunately, true; but it's very rare.

In a few situations, optimizations and clean code are not compatible. However, optimizations are often easier to see when the code is well factored.

It's easy to optimize clean code, but it's hard to clean optimized code. That's why you should always start clean and then degrade the clarity if absolutely needed.
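Here is a sketch of what "start clean, then optimize" can look like. The two namespaces are hypothetical and exist only so that both versions fit in one listing; in real code, the second body would simply replace the first. Both versions assume a non-empty collection.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct gummy_bear {
  int weight;
  std::string color;
};

bool is_lighter(const gummy_bear& a, const gummy_bear& b) {
  return a.weight < b.weight;
}

namespace first_version {
// Straightforward: sort a copy and take the last element, O(n log n).
// Precondition (assumed): bears is not empty.
gummy_bear get_biggest_gummy_bear(std::vector<gummy_bear> bears) {
  std::sort(bears.begin(), bears.end(), is_lighter);
  return bears.back();
}
}  // namespace first_version

namespace optimized {
// Same name, same meaning: a single pass without copying the vector, O(n).
// Precondition (assumed): bears is not empty.
gummy_bear get_biggest_gummy_bear(const std::vector<gummy_bear>& bears) {
  return *std::max_element(bears.begin(), bears.end(), is_lighter);
}
}  // namespace optimized
```

Because the name says what and not how, the optimization stays local: no calling site has to change.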

"Make it work. Make it right. Make it fast."

Kent Beck