32-Bit Integers and Why Old Computers Matter
Image: Pauli Rautakorpi/Wiki


Mutant Pac-Man, system architecture, and the integer overflow of time itself.

The number 256 is what broke the original arcade version of Pac-Man. As a game with no proper exit condition, Pac-Man relied on faith that players would eventually get tired of it before the 256th level. This was reasonable given that every single level after the 20th was just a repeat of level 20. But video games lend themselves to obsession like few things, even in 1980, so of course some players took it as a challenge—a test of endurance and concentration. Those who made it to 256 were in for a strange sight, what computer scientists would call "undefined behavior." This was the result:


Image: YouTube grab

They "found a surprise waiting for them on level 256," writes Jamey Pittman ​at the ​Pac-Man Dossier. "It was a surprise no one knew about—not even the developers at Namco."

The 256th level is a psychedelic Pac-Man wasteland. One half of the screen shows the normal game, but the other is a deluge of symbols and chaos. "Although both the player and the ghosts can navigate through the right half of the screen," Pittman writes, "the original maze walls no longer apply. Instead, Pac-Man must be guided through a confusing series of open areas, tunnels, one-way intersections, lone walls, and pass-throughs—all invisible to the player—while four ghosts are in hot pursuit."

The reason for this is integer overflow. Pittman explains further:

Why does this broken level happen in the first place? The culprit is the routine responsible for drawing the bonus symbols along the bottom edge of the screen. Here's what happens: when level 256 is reached, the internal level counter is incremented to 255 (the level counter starts at zero – not one) and the routine for drawing the bonus symbols is called. The routine loads the current level counter value (255) into a CPU register and increments that register by one. Unfortunately, 255 is the largest number that can fit in a single byte which is the size of the Z-80 CPU registers, so when the value is incremented the overflow is discarded leaving a zero in the register instead of the expected value of 256. This zero value leads the routine to believe this is an early level since its value is less than seven. The routine starts drawing bonus symbols using the confused register as a counter. At the end of every drawing loop, the register is decreased by one and then checked to see if it is zero (the signal for the routine to stop drawing symbols). Since the register already has a zero in it to start, the first decrement will roll the value back to 255. It will keep decrementing the register and drawing symbols until the register is reduced to zero again, causing the loop to run a total of 256 times. This means that memory locations outside the bounds of the bonus symbol table are drawn to the screen at increasing locations in video memory. This half-broken level was named the "split screen" by players; developers refer to it as a "kill screen".


For the Z-80 processor, 256 is where hardware meets software: the ceiling of a single-byte data type. (255 is the largest value that can be stored in 8 bits; one more and the count wraps back to zero.) Or, more specifically, it's where the limits of a data type become the limits of what the hardware is physically capable of representing.
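To see the wrap Pittman describes in action, here's a minimal C sketch of the same arithmetic using an 8-bit unsigned counter. The variable names and structure are mine, not Namco's actual Z-80 code:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* Level 256: the internal level counter holds 255 (it starts at zero). */
    uint8_t level = 255;

    /* The drawing routine increments the 8-bit register: 255 + 1 wraps to 0. */
    uint8_t counter = level + 1;
    printf("counter after increment: %u\n", (unsigned)counter);  /* prints 0 */

    /* The loop decrements and stops at zero. Starting from zero, the first
       decrement rolls back to 255, so it runs a full 256 times. */
    int iterations = 0;
    do {
        counter--;
        iterations++;
    } while (counter != 0);
    printf("drawing loop ran %d times\n", iterations);  /* prints 256 */
    return 0;
}
```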

At the very base level of data is the integer, and its width sets the maximum numerical value that can be stored and manipulated (directly) within a given system architecture. The Z-80's single byte came at a time of rapid computer evolution, and, almost simultaneously, the first 32-bit (four-byte) processors were entering the market. This size remains standard today. Even new 64-bit processors are designed with 32-bit backward compatibility in mind.

2,147,483,647

The number 2,147,483,647 is among the most important numbers in computer science (perhaps even its most important number). It corresponds to a data type more completely known as a 32-bit signed integer. Usually referenced in programming languages simply as an "int," such a value is constrained to a 32-bit allotment of binary representation, which is just what it sounds like: 32 ones and/or zeroes all packaged together. If we were to fill 31 of those slots with 1s—the 32nd being reserved as a "sign bit," telling the system whether the represented number is positive or negative (as there are no minus signs in binary)—the decimal translation would be 2,147,483,647, or 2^31 - 1.
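If you want to check that math, here's a quick C sketch (the names are mine) that builds the number both from its bit pattern and from the standard library's constant:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* 31 ones: shift 1 left by 31 places, then subtract 1.
       Computed in unsigned arithmetic to avoid signed overflow. */
    uint32_t max_from_bits = (1u << 31) - 1;
    printf("%u\n", (unsigned)max_from_bits);   /* 2147483647 */

    /* The same ceiling, as <stdint.h> names it. */
    printf("%ld\n", (long)INT32_MAX);          /* 2147483647 */
    return 0;
}
```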

Computers have limits, even now. It doesn't seem like it very often, as even smartphones circa 2015 are supercomputers compared to the machines of 15 years or so ago. Some very high-level programming languages now don't even ask coders to name data types at all, instead inferring them from context. It's easy to let the actual machines fade away too. But 32 bits isn't arbitrary, even as those machines switch over to newjack 64-bit integers. 32 still rules.


32 started ruling circa 1979 with the release of the Motorola 68000 microprocessor. Most of the race during the 1970s had been from 8 bits to 16 bits, but the 16-bit era proved to be short-lived as manufacturers built on the momentum of the proto-PC years. The 68000 went on to power Sun's UNIX-based powerhouses and early consumer machines like the Apple Lisa, the Macintosh, and Commodore's Amiga computers. Supposedly, they can still be found in use.

But what do those bits actually mean? Hardware. Guts.

Image: W Nowicki/Wiki

A CPU isn't quite the mystical device it sometimes seems. It's pretty much just wires—by now those wires are impossibly tiny, but they're still in there shuffling electrons around in the name of information. Within a microprocessor you'll find a tiny geography of addressable memory spaces, which are just arrangements of transistors; circuits that do stuff like add and shift sequences of bits (wires and switches); registers (memory spaces storing data as it's in the process of being computed); and, finally, buses, which exist to move memory addresses, instructions, and actual data around the CPU. Like a bus.

The physical entities that are buses and registers, in particular, are finite, which means that a bus can transport bit strings in units of 32 bits and a register can store only up to 32 ones and zeroes. Anything larger needs to be chopped up and stored as separate ints, quite literally, with software left to stitch the pieces back together (as in the sketch below).
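Here's one way that chopping-up might look in C, assuming a 64-bit value that has to travel through 32-bit registers. The masks and names are illustrative, not any particular compiler's output:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t big = 5000000000ULL;                 /* needs 33 bits */

    /* Split into two 32-bit halves, as a 32-bit machine must. */
    uint32_t low  = (uint32_t)(big & 0xFFFFFFFFu);
    uint32_t high = (uint32_t)(big >> 32);

    /* Reassemble to show no information was lost. */
    uint64_t rebuilt = ((uint64_t)high << 32) | low;
    printf("high=%u low=%u rebuilt=%llu\n",
           (unsigned)high, (unsigned)low, (unsigned long long)rebuilt);
    return 0;
}
```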

You're welcome to try sending 33 bits with a 32-bit bus, but expect one of two things to happen. In some cases, the value might "wrap," which is just the same thing that happens to a car's odometer when it overflows. The most significant bits are trimmed off and the new value is wildly inaccurate. Here is the Wikipedia example of a situation where this might occur, in which two values individually within the range of 32-bit representation are added to produce a number larger than 32 bits can hold: 4000000000u + 1000000000u = 705032704u. That's pretty wrong.
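The wrap is easy to reproduce. In C, unsigned 32-bit addition is defined to wrap modulo 2^32, so this short sketch prints exactly the Wikipedia result:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t a = 4000000000u;
    uint32_t b = 1000000000u;

    /* The true sum, 5000000000, needs 33 bits. Modulo 2^32:
       5000000000 - 4294967296 = 705032704. */
    uint32_t sum = a + b;
    printf("%u\n", (unsigned)sum);  /* 705032704 */
    return 0;
}
```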

The alternative to this sort of value truncation is pretty interesting too. Instead of wrapping around, the overflow "saturates." This is a feature of a special case of arithmetic known as saturation arithmetic, where numbers are constrained to a specific range; if a result would exceed that range, it's pinned to the largest (or smallest) number the range allows. The process is called "clamping." So, the above equation would instead be: 4000000000u + 1000000000u = 2147483647u. That's not right either, but it's less wrong by quite a lot.
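Saturation is usually done in hardware (DSPs in particular), but the clamp itself is simple enough to sketch in C. This hypothetical sat_add is mine, and it uses the article's ceiling of 2147483647:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical saturating add: compute in a wider type, then clamp
   to the ceiling instead of letting the value wrap. */
static uint32_t sat_add(uint32_t a, uint32_t b) {
    uint64_t wide = (uint64_t)a + (uint64_t)b;
    return wide > INT32_MAX ? (uint32_t)INT32_MAX : (uint32_t)wide;
}

int main(void) {
    printf("%u\n", (unsigned)sat_add(4000000000u, 1000000000u));  /* 2147483647 */
    return 0;
}
```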

As I write this and as you read it, there is a very important clock ticking. This clock is itself a computer data type, known as time_t. Its value changes every second and it's been counting upward since January 1, 1970. It allows UNIX-based systems to easily compute current times and dates, which is pretty handy, just based on a built-in counter. time_t is implemented all over computing, but it has a problem. Yes, time_t is traditionally a 32-bit signed integer. What that means is eventually (03:14:07 UTC on Tuesday, January 19, 2038, to be exact), more than 2,147,483,647 seconds will have elapsed.
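You can preview the rollover yourself by simulating the 32-bit counter, even on a machine whose time_t is 64 bits. In this sketch the wrapped value is written out by hand (two's-complement wrap of 2147483647 + 1), and it assumes your system's ctime can render pre-1970 dates:

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

int main(void) {
    /* One second before the limit: 2^31 - 1 seconds after the epoch. */
    time_t before = (time_t)2147483647;
    printf("before: %s", ctime(&before));  /* Jan 19, 2038 */

    /* One more tick wraps a 32-bit counter to -2147483648, which the
       clock reads as a moment in December 1901. */
    time_t after = (time_t)INT32_MIN;
    printf("after:  %s", ctime(&after));   /* Dec 13, 1901 */
    return 0;
}
```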

Time itself will overflow, and there's no great solution to the problem. Changing time_t, resetting it, will mean a whole lot of code will no longer be compatible with new code, and this violates one of the most basic tenets of computing (and what's gotten us to where we are): backwards compatibility. Nuclear missiles probably won't start launching themselves, but it will be a mess, and one quite a bit deeper than anything to do with Y2K. Tick tock.