Know Your Language: Java Still Matters (Part Two)

Big Data offers Java a rebirth in functional programming and distributed computing.

This is the second part of a two-part Know Your Language entry on the Java programming language.

When we left off, Java was a deeply uncool programming language, well past its peak as a tool for early-days webpage interactivity and enterprise software development. We'd also left off with a promise to look closer at the language's rather utopian origins, which still matter a great deal to Java's present and future.

The Java past

Before 1995, software was very often built with C and C++. These are not idiot-proof programming languages. It's possible—easy even—to really muck things up at very low levels within a computer, such as corrupting physical memory.

In my first computer science classes, I spent who knows how many hours trying to debug memory access violations, or trying to figure out "memory leaks," where some sliver of physical memory is claimed by a program but never returned to the operating system. If that program or piece of code is run iteratively (again and again), the effect is that an increasing amount of physical memory (RAM, generally) is never released back to the OS. This has potentially serious consequences when it comes to performance.

Memory leaks are sort of the canonical error associated with low-level programming. Java's answer was to add a feature called garbage collection. No longer would memory allocation and deallocation be the job of programmers. Instead, there would be another shadow program to keep tabs on things and properly dispose of no-longer-needed memory.
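To make the contrast concrete, here is a minimal Java sketch (the class and method names are hypothetical, chosen only for illustration). It allocates many short-lived buffers in a loop and never frees any of them explicitly. Once a buffer is no longer reachable, the garbage collector is free to reclaim it; exactly when that happens is up to the JVM.

```java
// Hypothetical illustration: these names are not from the article.
class GarbageDemo {
    // Allocates `rounds` one-mebibyte buffers, one after another, and
    // returns the total number of bytes requested along the way.
    static long churn(int rounds) {
        long totalBytes = 0;
        for (int i = 0; i < rounds; i++) {
            byte[] buffer = new byte[1 << 20]; // 1 MiB, with no matching "free"
            buffer[0] = 1;                     // write to it so the buffer is used
            totalBytes += buffer.length;
            // After this iteration nothing refers to `buffer` anymore, so the
            // garbage collector may reclaim it whenever it chooses.
        }
        return totalBytes;
    }

    public static void main(String[] args) {
        // Requests roughly 2 GB over time without any manual deallocation.
        System.out.println(churn(2000) + " bytes requested in total");
    }
}
```

In C, the equivalent loop would need a matching free() for every malloc(), and forgetting one is exactly the kind of leak described above.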

Where memory really matters is in systems that don't have a lot of it to spare: embedded systems. An embedded system is what we'd now more likely call a "thing" in the Internet of Things. It's a small computer that controls some physical machine or interacts with the physical world in some specific way. The computer in a thermostat is an embedded system; an ATM is an embedded system; an Arduino board is an embedded system.

This was the original purpose of Java—controlling things like TVs and smart appliances. It began within Sun Microsystems (now Oracle) in 1991 as the Oak programming language. James Gosling, one of the language's original developers, wrote at the time: "the goal was … to build a system that would let us do a large, distributed, heterogeneous network of consumer electronic devices all talking to each other."

The smart appliance dream didn't quite work out (for a while, anyhow), so the language was retargeted at the web, where features like platform independence were still selling points. In 1995, a version of Netscape Navigator was released capable of running Java programs. For a good while, Java applets, self-contained boxes running Java programs inside a webpage, were the beginning and end of the interactive, dynamic web. Eventually, applets would be mostly killed off by JavaScript and CSS.

The Java future

The first part of this post was written in 2015 (sorry!). There have been some Java developments since. To understand them, we need to understand something peculiar about Java that I haven't quite addressed. Recall that the central idea of Java is that its code actually runs on an intermediate entity called the Java Virtual Machine, or JVM. This is like a ghost computer that lives between the language and the actual computer hardware below. This is where garbage collection is implemented, among other things.

The neat thing about the JVM is that it doesn't require Java. All it requires is Java bytecode. So, if you can make another programming language that can be converted into Java bytecode, then that language can run on the JVM and take advantage of its features.

Another quote from Gosling: "Most people talk about Java the language, and this may sound odd coming from me, but I could hardly care less. At the core of the Java ecosystem is the JVM." A short list of JVM languages that are not Java includes Groovy, Clojure, and Kotlin. Nearly every popular programming language has at least one implementation that runs on the JVM. So, there are versions of Python, JavaScript, PHP, and Ruby that all run on the JVM.

But the JVM language I want to talk about is Scala.

Scala is weird and also about to be incredibly important. Its weirdness is in its functional programming features. I'll write a separate post soon about functional programming, but for most programmers used to languages like Python and C++ and even normal Java, it's a mindfuck. It's really a rewrite of some of the fundamental mechanics of programming, where every computation reduces to the evaluation of a mathematical function: a rule for mapping a data point x to some new data point y, without modifying x itself.

In practice, functional programming is a way of operating on very large amounts of data using very small, very simple code. A classic operation would be a map function, which takes in a list of data and a formula that specifies how to modify a single data point. The map takes that formula and applies it to every data point and then returns a list of new data points.

Something like this:

function addOne(number) = number + 1

newData = map(addOne, [1,2,3,4])

And then newData will equal [2,3,4,5].
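For comparison, the same idea can be written as runnable code in plain Java using the streams API. This is a minimal sketch assuming Java 9 or later (for List.of); the class name MapExample and the helper addOneToAll are hypothetical, and it is Java rather than Scala, so treat it as an illustration of map rather than of Scala syntax.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration: these names are not from the article.
class MapExample {
    // The "formula": how to turn one data point into a new one.
    static int addOne(int number) {
        return number + 1;
    }

    // Applies addOne to every element and collects the results into a new
    // list; the input list itself is left unchanged.
    static List<Integer> addOneToAll(List<Integer> data) {
        return data.stream()
                   .map(MapExample::addOne)
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> newData = addOneToAll(List.of(1, 2, 3, 4));
        System.out.println(newData); // prints [2, 3, 4, 5]
    }
}
```

Note that nothing in addOne knows about lists or loops. The iteration lives entirely inside map, which is what lets the same small function be reused on any amount of data.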

Scala adds some features to Java that make it really good at this sort of thing. Scala works very well when it comes to doing operations on large amounts of data in relatively simple ways, which makes it a natural candidate for distributed computing. This is where computations are done on very large datasets across many different CPUs located in potentially many different locations. Because of the memory constraints of most computers, this is necessary for a lot of big datasets. And as data grows and grows, it's becoming more necessary.
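A rough, single-machine analogue of this can be sketched with Java's parallel streams. Because the per-element function depends on nothing but its input, the data can be split into chunks, processed independently, and the partial results combined. The class and method names below are hypothetical, and this only spreads work across the cores of one computer; a cluster framework applies the same idea across many machines.

```java
import java.util.stream.LongStream;

// Hypothetical illustration: these names are not from the article.
class ParallelSketch {
    // Adds one to each number in 1..n and sums the results, processing the
    // numbers one after another on a single thread.
    static long sumSequential(long n) {
        return LongStream.rangeClosed(1, n).map(x -> x + 1).sum();
    }

    // Same computation, but the stream may be split into chunks that are
    // processed on several CPU cores and then combined.
    static long sumParallel(long n) {
        return LongStream.rangeClosed(1, n).parallel().map(x -> x + 1).sum();
    }

    public static void main(String[] args) {
        long n = 10_000_000L;
        // Both strategies compute the same value.
        System.out.println(sumSequential(n) == sumParallel(n)); // prints true
    }
}
```

The only change between the two methods is the call to parallel(); the function being mapped is untouched, which is the property that makes this style of code easy to distribute.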

That's the problem solved by Apache Spark, a cluster-computing framework written in Scala and typically used within Scala applications. Spark is highly functional and a natural mate for Scala. Unlike fusty old Java, Spark and Scala are things that tend to get engineers excited in 2017.

There's more to the Java future than Scala and distributed computing, of course. There's the Android operating system, whose apps are largely written in Java and run on a JVM-like runtime (originally Dalvik, later ART) rather than a standard JVM. That probably speaks for itself.
