Tech by VICE

Know Your Language: C Rules Everything Around Me (Part One)

A brief history of the programming language to end all programming languages.

by Michael Byrne
Sep 6 2015, 10:30am

C.

C is everywhere and in everything. C powers the Mars Curiosity rover, every computer operating system, every mobile OS, the Java Virtual Machine, Google Chrome, ATM machines, the computers in your car, the computers in your robot surgeon, the computers that designed the robot surgeon, the computers that designed those computers, and, eventually, C powers itself as its own implementation language.

When techno-human civilization has finally collapsed, perhaps the result of a nuclear war programmed in C or the result of a bacterial superstrain isolated by software implemented in C, and we have been returned to our caves to gnaw bones and fight over rotten meat, there will still be a program written in C executing somewhere.

All of this isn't just because a lot of people really like coding in C, though it's been estimated that almost 20 percent of all coders use the language (see below). C is far deeper than what we normally think of when we think of "programming language." There are languages that we consider to be more or less foundational (Java, Python, Ruby, Lisp, etc.), which are the very general-purpose languages. C is also a general-purpose programming language, but the difference is that C has become the de facto language of machines themselves, whether it's a five-dollar microcontroller or a deep-space probe.

A single Know Your Language treatment of C would be dangerously incomplete. A 10-minute read on Processing should equate to about a six-hour read on C, at least. Even across two or maybe three installments, I still won't be providing quite that much, but I'll at least be getting closer to a proper primer on computing's unassuming alpha language.

So, we should start at the beginning. Where did C come from? How did it become the force of nature that it is and almost certainly will continue to be?

Dumb luck? Timing? Prescience?

Need, mostly.

Ken Thompson (sitting) and Dennis Ritchie at PDP-11. Image: Peter Hamer

Moving Unix

Through the 1960s and early 1970s, the concept of software portability was still pretty foreign, as C author Dennis Ritchie writes in a 1993 paper recounting the language's development. If you were to write a program, it would be for one specific variety of computer. If someone came up with a new computer design, then running the Unix operating system on it, for example, would basically mean rewriting Unix in its entirety.

Portability nowadays is so extreme that we'll be able to run Windows 10 (in particular) on anything from my Lenovo IdeaPad to a tablet/phone to a Raspberry Pi and potentially an entire universe of very small, very cheap embedded devices. Windows 10 may prove to be the de facto operating system of the Internet of Things, which, even if the IoT were remotely comprehensible to computer scientists in 1972, would be miraculous in its ability to fit whatever possible computing machine we might dream up well into the future.

The first version of Unix was written in 1969 by Ken Thompson, who now works for Google, itself another example of extreme portability via Chrome and ChromeOS. This OG Unix was written for the DEC PDP-7, a $72,000 "minicomputer" roughly the size of a dorm room wall. It was written in the PDP-7's specific assembly instructions, which means that it was written for what was basically the literal representation of that specific machine.

The result offered a simple command-line interface along with a compiler and interpreter. The coup of this Unix version was that it was self-supporting: software could be written and debugged within the target computer. Previously, the Unix-less PDP-7 required engineers to program on a different variety of computer, the GE-635, print it all out on paper tape, and haul it to the target computer for testing and verification.

Now, the PDP-7 could be used to develop its own programs, and, what's more, it could be used to continue development on its own Unix operating system.

"Thompson's PDP-7 assembler outdid even DEC's in simplicity," Ritchie recalls, "It evaluated expressions and emitted the corresponding bits. There were no libraries, no loader or link editor: the entire source of a program was presented to the assembler, and the output file with a fixed name that emerged was directly executable."

Almost immediately, Malcolm Douglas McIlroy, a Bell Labs engineer who would go on to become one of the crucial figures in Unix development, wrote a high-level language for Unix, called TMG. It was a short-lived tool intended to be used as a "compiler-compiler," i.e., a tool meant to write new compilers, the crucial meta-programs that convert higher-level languages (like Fortran, at the time) into assembly language.

TMG inspired Thompson to create a system programming language (SPL) for the then still-unnamed Unix. An SPL is meant to program the software that exists to interface with the machine itself: operating systems, utilities, drivers, compilers. It's an intermediary between any other programming language and the actual guts.

A system programming language can be imagined as existing one level of abstraction away from assembly and then machine instructions. But, unlike an assembly language, it gets to be machine-agnostic. As an abstraction, it operates on and within an idealized computer system, a rough schematic or sketch of what a computer is and does. This isn't the same thing as a virtual machine, a la Java; it's an actual machine-language correspondence, but one that doesn't have to specify the details of the machine. Those details get filled in when a compiler translates the system language into assembly language for a particular machine.

You could say that C is a language of computation itself—just the right amount of abstraction to maintain universality, but with the ability to communicate with hardware.

It is the why of C—its elegance, persistence, and ubiquity—which looks a lot like the why of computers themselves, and we'll be looking at it in much more detail later on.

PDP-7. Image: Wiki

A, B, C

So, Thompson wrote B. B was/is a descendant of another language called BCPL, which was/is a language designed for writing compilers and which can also be viewed as a sort of system programming language. It lives between the compiler, the layer that converts more abstract code into assembly code, and the great wild rest of the computer.

B wasn't around for very long before Ritchie finished C. The new language was basically just B with the addition of "types." In B, all data was the same—big integers, small integers, characters, floating point (non-whole) numbers—and referencing a variable in it just meant referencing a location in memory, whatever that location actually contained. In practice, this is not unlike how contemporary interpreted languages like JavaScript handle data, but it was limiting in that every variable declaration required a single set amount of memory, no matter how much was actually required.

In C, however, a programmer can specify whether this or that unit of data needs just a single byte of storage or up to eight or more bytes. This was a new need, as newer computers were able to manipulate data at the level of individual bytes rather than the two-byte packages known as words. C was there at just the right time for the transition.
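As a rough sketch of what that choice looks like in C today: the fixed-width type int64_t below comes from the modern stdint.h header rather than anything in 1972-era C, and the two function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Report how many bytes of storage each kind of variable asks for.
   Both function names are hypothetical, chosen for this sketch. */

/* A char is one byte by definition in C. */
size_t bytes_for_char(void)
{
    return sizeof(char);
}

/* int64_t is exactly 64 bits wide, which is eight bytes on the
   (nearly universal) machines where a byte is 8 bits. */
size_t bytes_for_int64(void)
{
    return sizeof(int64_t);
}
```

The point is only that the programmer, not the language, decides how much memory a value occupies.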

Thrilling, right? Data types.

It's a bit dry, but also completely crucial to what C has become. B viewed memory as a collection of uniformly sized "cells," which equated to a hardware machine "word," or two bytes of data. Newer and more capable machines emerging circa 1970, particularly the PDP-11, made this idea increasingly silly, requiring elaborate procedures for editing and manipulating strings of related data. The first step forward came in 1971 via NB or "new B," a fleeting language crafted by Ritchie that added a small handful of data types.

These were "int" and "char," used to store whole numbers and whole numbers corresponding to characters (like letters), respectively. NB allowed programmers to specify not just individual letters and numbers, but lists of them. So, if I wanted to store this blog post into memory, I could specify a single variable with a single name, but make that single variable correspond to a list of many thousands of consecutive characters. These single variable names corresponded to memory addresses within the machine itself, with the result allowing a flexible yet machine-literal layer of data organization.
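In present-day C (NB's own syntax differed, so treat this as an illustration rather than period code), that idea looks roughly like the following. The variable post and the helper count_chars are names invented for this sketch.

```c
#include <stddef.h>

/* One name, many consecutive characters. The compiler sizes the array
   to hold the text plus a terminating zero byte. */
char post[] = "Know Your Language";

/* Walk the characters one element at a time until the zero byte that
   marks the end of the string, and report how many were seen. */
size_t count_chars(const char *text)
{
    size_t n = 0;
    while (text[n] != '\0')
        n++;
    return n;
}
```

Calling count_chars(post) steps through the 18 characters of the text; the array itself occupies 19 bytes, the extra one being the terminator.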

A few other things happened in the transition from B to C—more generalizable data types, direct translatability to assembly language—and it became clear that Ritchie had gone beyond B and "NB" into something very different. C.

Once the language had been named, things happened very fast and C started to look much more like the C we use today, including the addition of boolean operators like && and ||. These are basically just tests, where I can specify two different expressions and put them together with a boolean operator and, in the case of &&, ask the computer to return "true" if both expressions are true and "false" if not. With ||, I can ask the computer to return true if one or both are true. These are extremely crucial programming building blocks for not just C but most any language since.
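A minimal sketch of both operators at work, with two helper functions whose names are invented here. In C, "true" and "false" come back as the plain integers 1 and 0.

```c
/* && yields 1 only when both comparisons hold.
   Assumes a character set such as ASCII in which 'a' through 'z'
   are consecutive. */
int is_lowercase_letter(char c)
{
    return c >= 'a' && c <= 'z';
}

/* || yields 1 when either comparison (or both) holds. */
int is_space_or_tab(char c)
{
    return c == ' ' || c == '\t';
}
```

So is_lowercase_letter('q') gives 1 and is_lowercase_letter('Q') gives 0, while is_space_or_tab('\t') gives 1.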

Also around this time, the concept of a "preprocessor" arose. Now, a programmer could include entire other files and, thus, libraries full of code with a simple "#include" statement. They could also now define macros, which in the beginning were just ways of writing short dictionaries of sorts, where one could #define some name to correspond to a small fragment of code. The preprocessor is conceptually just a way of telling the compiler to do some extra junk before actually compiling the program.
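Here is a small sketch of both preprocessor features as they look in modern C. The macro and function names (WORD_BYTES, SQUARE, bytes_for_words, area_of_square) are made up for illustration.

```c
#include <stddef.h>   /* pull in a standard header; it defines size_t */

/* A name that stands for a constant. */
#define WORD_BYTES 2

/* A name that stands for a small fragment of code. The parentheses keep
   the pasted-in text from mixing with whatever surrounds it. */
#define SQUARE(x) ((x) * (x))

/* Before compiling, the preprocessor rewrites this body to
   "return words * 2;". */
size_t bytes_for_words(size_t words)
{
    return words * WORD_BYTES;
}

/* Likewise rewritten to "return ((side) * (side));". */
int area_of_square(int side)
{
    return SQUARE(side);
}
```

The substitution is purely textual, which is why SQUARE(1 + 2) works out to 9 only because of those extra parentheses.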

Again: a crucial and very general programming tool. It was a level of organization ready for the eye-crossing sprawls of code that software would become.

Image: Paul Downey/Flickr

The White Book

Circa 1972 and 1973, the foundation of C had been set. Ritchie rewrote the Unix kernel for the PDP-11 in C, while others managed to rewrite the C compiler for different machines.

"Although successful compilers for the IBM System/370 and other machines were based on it, much of the modification effort in each case, particularly in the early stages, was concerned with ridding it of assumptions about the PDP-11," Ritchie and Thompson wrote in a 1978 Bell Labs paper, "Portability of C Programs and the UNIX System." "Even before the idea of moving UNIX occurred to us, it was clear that C was successful enough to warrant production of compilers for an increasing variety of machines."

As new compilers were released for new machines, and the usage of C exploded accordingly, the language's first libraries were released, including its first input/output library, whose routines remain C's standard I/O routines. Arguably, what pushed C into its ubiquitous future was a simple matter of documentation, which came in the form of Brian Kernighan and Ritchie's textbook, The C Programming Language, dubbed the "white book," which was and remains the language's authoritative guide and entry point. It's also the origin of "hello, world," the book's first example program, whose only job is to print those two words.

In the coming decade, C compilers were written for mainframe computers, minicomputers, microcomputers, and, eventually, the IBM PC. In 1983, the American National Standards Institute (ANSI) started working on a standard specification for C, with the result ratified finally in 1989 as ANSI C. This basically enshrined a One True C, where programmers could be assured that the language would behave the same no matter how or where it was implemented. ANSI C was the first definitive C.

A definitive C looks a lot like the abstract definition of a computer. I could write a computer program right now that manipulates the individual bits within a memory location within an actual machine, whatever the machine, using standard C operators. I can say with some reasonable certainty how my C code will be translated into machine instructions and I can use that knowledge to write faster and leaner code, or I can use that knowledge to write entirely new programming languages. An alphabet that writes new alphabets: A,B,C.
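To make the bit-twiddling claim concrete, here is a minimal sketch using only standard C operators on a single byte. The three function names are invented for this example, uint8_t comes from the modern stdint.h header, and n is assumed to be between 0 and 7.

```c
#include <stdint.h>

/* Turn bit n on: OR the byte with a mask that has only bit n set. */
uint8_t set_bit(uint8_t byte, unsigned n)
{
    return (uint8_t)(byte | (1u << n));
}

/* Turn bit n off: AND the byte with a mask that has every bit
   set except bit n. */
uint8_t clear_bit(uint8_t byte, unsigned n)
{
    return (uint8_t)(byte & ~(1u << n));
}

/* Read bit n: shift it down to the lowest position and mask off
   everything else. The result is 1 or 0. */
int test_bit(uint8_t byte, unsigned n)
{
    return (byte >> n) & 1u;
}
```

Starting from zero, set_bit(0x00, 3) gives 0x08, and clear_bit(0xFF, 0) gives 0xFE. Each of these functions is a single shift and a single mask, which is why it's possible to guess fairly well at the machine instructions a compiler will produce for them.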
