Know Your Language: Java Still Matters (Part Two)
Big Data offers Java a rebirth in functional programming and distributed computing.
This is the second part of a two-part Know Your Language entry on the Java programming language.
When we left off, Java was a deeply uncool programming language, past its peak as the tool behind early-days webpage interactivity and a great deal of enterprise software. We'd also left off with a promise to look closer at the language's rather utopian origins, which still matter a great deal to Java's present and future.
The Java past
Before 1995, software was very often built with C and C++. These are not idiot-proof programming languages. It's possible—easy even—to really muck things up at very low levels within a computer, such as corrupting physical memory.
In my first computer science classes, I spent who knows how many hours debugging memory access violations and "memory leaks," where some sliver of physical memory is claimed by a program but never returned to the operating system. If that code runs iteratively (again and again), the leaked slivers pile up and an increasing amount of RAM is never released back to the OS, with potentially serious consequences for performance.
Memory leaks are sort of the canonical error of low-level programming. Java's answer was a feature called garbage collection. No longer would memory allocation and deallocation be the job of programmers. Instead, a shadow program would keep tabs on things and reclaim memory that was no longer needed.
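To make that concrete, here's a minimal Java sketch (the class name is mine) of code that would be a textbook leak in C: a loop that allocates constantly and never frees anything. On the JVM, the garbage collector reclaims each buffer once nothing references it.

```java
public class GcDemo {
    public static void main(String[] args) {
        // In C, each allocation below would need a matching free();
        // forgetting it inside a loop is the classic memory leak.
        for (int i = 0; i < 1_000_000; i++) {
            byte[] buffer = new byte[1024]; // unreachable after each iteration
            buffer[0] = 1;
        }
        // No explicit deallocation anywhere: the garbage collector notices
        // the unreachable buffers and returns their memory to the heap.
        System.out.println("allocated roughly a gigabyte, leaked nothing");
    }
}
```

Run this with a small heap (say, `java -Xmx16m GcDemo`) and it still finishes happily, which is the whole point: the collector is recycling memory behind the scenes.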
Where memory really matters is in systems that don't have a lot of it to spare: embedded systems. An embedded system is what we'd now more likely call a thing in the Internet of Things. It's a small computer that controls some physical machine or interacts with the physical world in some specific way. The computer in a thermostat is an embedded system; an ATM is an embedded system; an Arduino board is an embedded system.
This was the original purpose of Java—controlling things like TVs and smart appliances. It began within Sun Microsystems (now Oracle) in 1991 as the Oak programming language. James Gosling, one of the language's original developers, wrote at the time: "the goal was ... to build a system that would let us do a large, distributed, heterogeneous network of consumer electronic devices all talking to each other."
The Java future
The first part of this post was written in 2015 (sorry!). There have been some Java developments since. To understand them, we need to look at something peculiar about Java that I haven't quite addressed. Recall that Java code actually runs on an intermediate entity called the Java Virtual Machine, or JVM. The JVM is like a ghost computer that lives between the language and the actual computer hardware below; it's where garbage collection is implemented, among other things.
The neat thing about the JVM is that it doesn't require Java. All it requires is Java bytecode. So, if you can make another programming language that compiles to Java bytecode, that language can run on the JVM and take advantage of its features.
Several languages do exactly that (Clojure, Kotlin, and Groovy among them), but the JVM language I want to talk about is Scala.
Scala is weird and also about to be incredibly important. Its weirdness is in its functional programming features. I'll write a separate post soon about functional programming, but for most programmers used to languages like Python and C++, and even normal Java, it's a mindfuck. It's really a rewrite of some of the fundamental mechanics of programming, where every computation reduces to the evaluation of a mathematical function: a rule for transforming a data point x into some new data point y.
In practice, functional programming is a way of operating on very large amounts of data using very small, very simple code. A classic operation would be a map function, which takes in a list of data and a formula that specifies how to modify a single data point. The map takes that formula and applies it to every data point and then returns a list of new data points.
Something like this, in Scala:

val addOne = (number: Int) => number + 1
val newData = List(1, 2, 3, 4).map(addOne)

And then newData will equal List(2, 3, 4, 5).
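For comparison, even "normal Java" has picked up this style: since Java 8, the streams API offers a map that works the same way (the class name here is mine).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MapDemo {
    public static void main(String[] args) {
        // map applies the one-argument function to every element
        // and produces a new list; the original list is untouched.
        List<Integer> newData = Arrays.asList(1, 2, 3, 4).stream()
                .map(number -> number + 1)
                .collect(Collectors.toList());
        System.out.println(newData); // prints [2, 3, 4, 5]
    }
}
```

Notice that nothing here says *how* to loop over the list; you supply the formula and the machinery decides how to apply it, which is exactly the property that matters in what follows.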
Scala layers features onto the Java platform that make it really good at this sort of thing. It works very well for doing operations on large amounts of data in relatively simple ways, which makes it a natural candidate for distributed computing. This is where computations are done on very large datasets across many different CPUs located in potentially many different locations. Because of the memory constraints of any single computer, this is necessary for a lot of big datasets. And as data grows and grows, it's becoming more necessary.
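To get a miniature feel for that, here's a sketch using a Java parallel stream, which splits one computation across the cores of a single machine; distributed frameworks extend the same split-apply-combine idea across whole clusters of machines.

```java
import java.util.stream.IntStream;

public class ParallelSum {
    public static void main(String[] args) {
        // Sum the squares of the first million integers. The runtime
        // partitions the range across available CPU cores, applies the
        // mapping to each chunk, and combines the partial sums.
        long total = IntStream.rangeClosed(1, 1_000_000)
                .parallel()
                .mapToLong(n -> (long) n * n)
                .sum();
        System.out.println(total); // 333333833333500000
    }
}
```

Because the per-element function is simple and self-contained, the runtime is free to decide how to carve up the work, and that's precisely why the functional style scales out so naturally.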
That's the problem solved by Apache Spark, a cluster-computing framework written in Scala and typically driven from Scala applications (though it has Java, Python, and R interfaces too). Spark's API is heavily functional in style, a natural mate for Scala. Unlike fusty old Java, Spark and Scala are things that tend to get engineers excited in 2017.
There's more to the Java future than Scala and distributed computing, of course. There's the Android operating system, whose apps have historically been written in Java and run on a JVM-like runtime. That probably speaks for itself.