Meet MUMPS, the Archaic Health-Care Programming Language That Predicted Big Data

The human body is an endless stream of numbers, from brain waves to blood cells to whatever else you can imagine or secrete, and most of those numbers are changing all of the time. You might even say that its potential for quantification, for producing raw data, is limitless—making that data useful, however, is a different matter entirely.

An ICU patient is monitored and assessed according to 12 different variables. These include such measurements as body temperature, heart rate, blood oxygenation, blood pH, and others. Together, they’re used to formulate a quantitative answer to the question, “How bad is it, doc?” Many of these physiological signs are measured in real time via electrodes and like a billion different varieties of catheter. Add to that the barrages of lab tests done multiple times per day per patient and the need for 20 or so clinicians (per patient) to have access to all of this data, and the result is a very deep data problem.

Multiply that data problem by hundreds of thousands of patients.

This is the fundamental problem that the programming language MUMPS (sometimes called just “M”), or the Massachusetts General Hospital Utility Multi-Programming System, aims to solve. To its proponents, MUMPS allows for a one-of-a-kind synthesis of programming and database management, while to its detractors, it’s a bizarre anachronism with little connection to the evolution and innovation taking place elsewhere in programming. To most people who do things with computers, MUMPS/M is poorly understood at best, and more likely completely unknown.

This is unfortunate, if only because MUMPS predated the NoSQL database movement by many decades. NoSQL is a relatively recent push away from relational databases of the kind queried with SQL (read: lots and lots of tables) and toward structures more amenable to Big Data. NoSQL has become a Web 2.0 backbone, supporting the databases of Facebook, Google, and Amazon. MUMPS concepts (directly) underlie two of the largest contemporary NoSQL tools: GT.M and InterSystems Caché.

The alternative structures offered by NoSQL might be document-oriented databases (where different types of data are stored in one unified document instead of a bunch of tabular cells) or graph databases (modeled after structures in graph theory) or columnar databases (where data is laid out in key-valued arrays) or some combination. (Confusingly, the NoSQL name corresponds to “not only SQL” rather than “no SQL.”)

“MUMPS is unusual in that it is two things at once: a language and a database,” explains Rob Tweed, a longtime MUMPS developer (and, yes, booster), at the EWD Files. “More accurately it’s a database with an integrated language that is optimized for accessing and manipulating that database. It was originally designed for use in a medical/clinical setting, but in meeting the non-trivial needs of that setting, it happened to pre-empt the now hyper-trendy NoSQL database design goals by several decades. It’s an exceptionally high-performance database with an equally high-performance language, capable of massive scalability.”

When a user is doing something with an SQL database (or other relational database), whether they’re adding to it or retrieving from it, they’re performing a “query” operation, which is built into the database software. This is an abstraction intended to hide the guts of the database from the user, who only needs to interface with it via this one query command. But the thing about abstraction in computer science, whether it’s a high-level virtual machine-based programming language like Java or a bare-bones Unix shell, is that it always has a cost. In a database, this might be the speed at which some piece of data can be stored or accessed. The query operation is simpler for the end user (the database programmer, that is), but slower and clumsier.

MUMPS is based on storing data in simple arrays that are accessed using a key. Imagine just a long list of things with each item having its own unique designation referred to by a variable. Variables (or keys, in this case) are just addresses of different memory locations within those arrays, which are called globals in MUMPS-speak. A MUMPS system, which might be made up of many computers, has its own collection of global arrays stored in non-volatile memory. So, unlike an array created in a language like C++, which exists only for the duration of the program or the program’s existence within a computer’s RAM address space, a MUMPS global sticks around on a server, accessible at any given time to a computer within the system by the addition of the “^” character. We say that it’s persistent.
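To make the idea of persistence concrete, here is a minimal sketch in Python rather than MUMPS. It uses the standard-library shelve module as a stand-in for the on-disk store; the names STORE_PATH, set_global, and get_global are made up for illustration, and a real MUMPS system handles all of this natively.

```python
import os
import shelve
import tempfile

# Hypothetical location for the on-disk store (an assumption for this sketch);
# a real MUMPS system manages its own storage.
STORE_PATH = os.path.join(tempfile.mkdtemp(), "globals_demo")


def set_global(name, key, value):
    """Store value under key in the persistent array called name."""
    with shelve.open(STORE_PATH) as store:
        array = store.get(name, {})
        array[key] = value
        store[name] = array  # reassign so shelve writes the updated dict to disk


def get_global(name, key, default=""):
    """Read a value back from disk; a missing entry yields the default."""
    with shelve.open(STORE_PATH) as store:
        return store.get(name, {}).get(key, default)


# An ordinary dict lives only as long as the program that created it.
local_patient = {"heart_rate": 72}

# The shelf is opened and closed on every call, yet the value is still there.
set_global("Patient", "heart_rate", 72)
print(get_global("Patient", "heart_rate"))  # prints 72
```

The point of the sketch is only the contrast: the dict disappears when the program exits, while the shelf-backed array can be read again by any later run that opens the same file.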

The result of this is that a MUMPS programmer can tap a database directly rather than using a query. This is faster on its face, eliminating the query abstraction, but direct access also allows a bunch of alternative programming ideas. For one thing, as a programmer, I can take an item stored in one of those globals and give it “children,” which might be some additional properties of that item. So, we wind up with lists of different things that can be described and added to in different ways on the fly. The relationships are hierarchical.

Here’s the Wikipedia example, in which I take an item in a global database, a car, and give it a door and then give that door a color.
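In MUMPS itself this is a single assignment, roughly SET ^Car("Door","Color")="BLUE". The same hierarchy can be sketched in Python, with the caveat that the GlobalArray class below is a toy invented for illustration and leaves out persistence entirely. The "Wheels" entry is also made up, added only to show a second property being attached on the fly.

```python
class GlobalArray:
    """Toy stand-in for one MUMPS global: values keyed by a path of subscripts."""

    def __init__(self):
        self._nodes = {}  # maps a tuple of subscripts to a stored value

    def set(self, value, *subscripts):
        """Store value at the node named by the subscript path."""
        self._nodes[subscripts] = value

    def get(self, *subscripts, default=""):
        """Return the value at the subscript path, or default if none is set."""
        return self._nodes.get(subscripts, default)

    def children(self, *subscripts):
        """Return the sorted next-level subscripts below the given node."""
        depth = len(subscripts)
        found = {
            path[depth]
            for path in self._nodes
            if len(path) > depth and path[:depth] == subscripts
        }
        return sorted(found)


car = GlobalArray()
car.set("BLUE", "Door", "Color")  # the car has a door, and the door has a color
car.set(4, "Wheels")              # another property, attached later (made up)

print(car.get("Door", "Color"))   # BLUE
print(car.children())             # ['Door', 'Wheels']
print(car.children("Door"))       # ['Color']
```

No schema was declared anywhere: the door and its color exist simply because something was stored under them, which is the sense in which the structure grows on the fly.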

This gets a bit difficult to picture, but we might say that a typical relational database is a separate entity somewhere remote, and it might be used by a bunch of different computers running a bunch of different programs. MUMPS bundles the programming language and the database itself into one package, which is potentially very fast.

The language is still developed and exists in a number of different implementations, including the open-source Mumps/II, which is maintained by students at the University of Northern Iowa, as well as the aforementioned GT.M. You can also download ANSI Standard MUMPS here, which will work on a Raspberry Pi in addition to OS X and Windows under Cygwin.

The MUMPS claim to fame is the Veterans Health Information Systems and Technology Architecture (VistA), which is a vast suite of some 80 different software modules supporting the largest medical system in the United States. It maintains the electronic health records for 8 million veterans, which are used by some 180,000 medical personnel across 163 hospitals, over 800 clinics, and 135 nursing homes. VistA is used for nearly 200 distinct functions, ranging from MRSA tracking and reporting to vital signs monitoring and recording to accounts receivable. It’s considered a model for current efforts to create a nationwide health records network, and the technology’s reach extends well beyond medicine. A low-key corporation called InterSystems, mentioned above, has been building on a MUMPS-based (but not MUMPS-limited) system for decades. The technology, known as Caché, is now used by the European Space Agency to map the Milky Way.

In 1966, when MUMPS was first developed by a pair of researchers working in an animal laboratory, the data that would eventually come pouring in from high-resolution sky mapping projects and social media platforms with hundreds of millions of users would have been unimaginable. But our bodies and their limitless data points were already there.

Modern Medicine is a series on Motherboard about how health care and medical technology can move forward so rapidly while still being stuck in the past. Follow along here.