Google Offers Up Its Entire Machine Learning Library as Open-Source Software

November 10, 2015, 11:16pm

Via its research blog, Google announced on Monday that it was releasing the second generation of its machine learning framework as an open-source library called TensorFlow.

“The most important thing about TensorFlow is that it’s yours,” write Google Technical Lead Rajat Monga and Google Senior Fellow Jeff Dean. “We’ve open-sourced TensorFlow as a standalone library and associated tools, tutorials, and examples with the Apache 2.0 license so you’re free to use TensorFlow at your institution (no matter where you work).”

The whole thing is ready, waiting, and, based on some skimming around, quite accessible; the project comes well equipped with documentation and tutorials. As of now, it features APIs for both Python (more support) and C/C++ (better performance, less support), but as the TensorFlow project page notes, it’s hoped that eager open-sourcers will get to work on building front-ends for Java, JavaScript, Go, and the rest of the programming language hoard. Via the C++ API, it’s possible to integrate the framework into Android projects (Ctrl + F “iOS” not found).

Here’s some background, courtesy of Monga and Dean:

Deep Learning has had a huge impact on computer science, making it possible to explore new frontiers of research and to develop amazingly useful products that millions of people use every day. Our internal deep learning infrastructure DistBelief, developed in 2011, has allowed Googlers to build ever larger neural networks and scale training to thousands of cores in our datacenters. We’ve used it to demonstrate that concepts like “cat” can be learned from unlabeled YouTube images, to improve speech recognition in the Google app by 25%, and to build image search in Google Photos. DistBelief also trained the Inception model that won Imagenet’s Large Scale Visual Recognition Challenge in 2014, and drove our experiments in automated image captioning as well as DeepDream.

While DistBelief was very successful, it had some limitations. It was narrowly targeted to neural networks, it was difficult to configure, and it was tightly coupled to Google’s internal infrastructure — making it nearly impossible to share research code externally.

This isn’t the first open-source machine learning library by any stretch. Even just for Python there’s a pretty long list.

As Cade Metz points out at Wired, a large part of the significance of the TensorFlow release is that the system lets programmers use hardware the way Google uses hardware. In particular, this means running software on graphics-processing units (GPUs), or at least the possibility of it.

GPUs are conventionally intended for playing games and handling, well, graphics. This is because graphics tasks fit a certain computing paradigm that most other software tasks don’t, at least historically. This is parallel programming, in which many computations can be done at once on different units of data.

In graphics applications, software is constantly updating and manipulating potentially millions of pixels, and those calculations are unique in that they don’t usually need to wait around for other computations to finish. They’re independent. The software telling this page to do what it’s doing is, by contrast, sequential. It spends a lot of time hanging around waiting for different things to happen. This is not so good for parallel programming and GPUs.

That’s all a GPU is: hardware built for parallel work. It happens that machine learning applications can exploit this because they likewise need to calculate many, many small pieces in parallel. That’s how machine learning learns: examining vast numbers of raw data points.
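
To make the independent-versus-sequential distinction concrete, here is a rough Python sketch. It is not TensorFlow code and it does not touch a GPU. The function names (`brighten`, `brighten_all_parallel`, `running_total`) are made up for illustration, and a thread pool stands in for the many small processing units a GPU would offer. Python threads won't actually speed up arithmetic like this; the point is only the shape of the work.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten(pixel, amount=10):
    """Per-pixel work: the result depends only on this one pixel."""
    return min(pixel + amount, 255)


def brighten_all_parallel(pixels, workers=4):
    # Every pixel is independent, so the work can be handed out to a
    # pool of workers in any order. map() returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(brighten, pixels))


def running_total(values):
    # Sequential by contrast: each step needs the previous step's
    # result before it can start, so it cannot be split up this way.
    totals = []
    current = 0
    for value in values:
        current += value
        totals.append(current)
    return totals


pixels = [0, 100, 250, 30]
print(brighten_all_parallel(pixels))  # [10, 110, 255, 40]
print(running_total(pixels))          # [0, 100, 350, 380]
```

The first function is the kind of job that splits cleanly across many workers; the second is the kind that spends its time waiting on itself.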

So, TensorFlow could be made to exploit the GPU in your gaming laptop naturally, which could prove to be a big cool thing for making machine learning even more general than it currently is.

That’s part of what makes it interesting to me. Machine learning and its AI parent are still widely interpreted as being esoteric and science-fiction-y fields, when they really are just natural ways of handling a lot of data. Machine learning is used in applications (usually pretty banal stuff) ranging from market analysis to image/speech recognition to spam filters. Will TensorFlow open up machine learning even more? That would be cool, at least.