# Why Machine Learning Needs GPUs

Matrix multiplication and AI: A short primer

Last summer, I spent a week at a conference dedicated to graphics processing units (GPUs). It was hosted by Nvidia, a big name in GPUs and a brand largely associated with gaming hardware. At the conference, however, gaming was a sideshow. For that matter, graphics themselves (excluding VR) were a sideshow, despite being in the actual name. In general, this was a machine learning conference, and, to most of the attendees, of course it was.

With chipmaker AMD’s announcement this week at CES that the bleeding edge of its GPU product line will be targeted at machine learning, at least initially, I thought it would be a good opportunity to take a step back and offer a bit of background on why GPUs and machine learning are so intimately connected in the first place. It has to do with matrices.

First, understand that there’s no magic to machine learning. It’s just math. And in the grand scheme of math, the basic ideas behind machine learning are even kind of simple, at least conceptually. Machine learning is optimization. Given some very long equation with a lot of variables in it, can we come up with a reliable way of tweaking those variables such that our very long equation spits out accurate predictions? While this may be a conceptually simple question to ask, actually computing the specific tweaks needed is labor-intensive.

To get some intuition about this sort of optimization, start just by thinking of cause and effect. The air outside is cold. Why? We might look at things like where the jet stream is; what the air pressure is; whether it’s cloudy or sunny out; how much moisture is in the air; and what season it is. I’m no meteorologist, but those seem like things that might reasonably predict the air temperature outside, so if, say, we didn’t know the air temperature ahead of time, but we knew all of this other stuff, we might be able to predict the temperature reliably.

Of course, not all of those things are of equal importance when it comes to predicting the air temperature. For example, what season it is might be 10 times as important as anything else, while air moisture might matter only a third as much as elevation. The point is that we can take a bunch of observations and then assign each of them a weight (or emphasis) indicating how important that observation is compared to the others. Then we can take some new observations, plug them into the optimized, weighted equation, and make a solid prediction of how cold it is.
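As a rough sketch of what "plugging observations into a weighted equation" means, here is a toy version in Python. Every name and number below is made up for illustration: the observations are assumed to be pre-scaled to comparable ranges, and the weights are simply chosen by hand so that the season counts 10 times as much as anything else.

```python
# Hypothetical observations, each assumed to be scaled to roughly 0..1.
observations = {
    "season": 0.2,        # assumed scale: 0 = midwinter, 1 = midsummer
    "air_pressure": 0.6,
    "cloud_cover": 0.9,
    "moisture": 0.4,
}

# Hypothetical hand-picked weights: season matters 10 times as much
# as any other observation.
weights = {
    "season": 10.0,
    "air_pressure": 1.0,
    "cloud_cover": 1.0,
    "moisture": 1.0,
}


def predict(observations, weights):
    """Return the weighted sum of the observations."""
    return sum(weights[name] * value for name, value in observations.items())


# The result is a toy score, not a calibrated temperature.
prediction = predict(observations, weights)
print(prediction)
```

With real data the weights would not be picked by hand; they would come out of the optimization described next.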

The weighted equation is what’s normally called a model. It models relationships that exist in the world and so it has predictive utility. The hard math is in how we come up with the model, or how we figure out how important each of those different observations is relative to the other ones.

We do this by taking a lot of observations and doing a lot of optimizations one after another. Each one would then look something like the following:
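Written with illustrative symbols (an assumption, not notation taken from elsewhere in this piece), where $x_1, \dots, x_n$ are the observations such as season or air pressure, $w_1, \dots, w_n$ are their weights, and $T$ is the measured temperature, one such equation might read:

$$
w_1 x_1 + w_2 x_2 + \cdots + w_n x_n = T
$$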

Plug actual observations into the above equation and we can come up with values for the weights that come closest to the actual temperature on the right-hand side.

For the resulting weights to be meaningful, we have to do this a lot, with a lot of observations. Training a real-life machine learning model might involve doing this same thing millions of times, with each iteration tweaking those weights just a little bit to better optimize the resulting model.
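To make "tweaking those weights just a little bit" concrete, here is a minimal sketch of one common way to do it, gradient descent on the squared prediction error. This is an assumed technique chosen for illustration, not necessarily what any particular system uses, and the data, step count, and learning rate are all invented.

```python
# Made-up training data: each row is (observations, measured temperature).
# The temperatures were generated from the weights [3.0, -2.0], so a good
# fit should end up close to those values.
data = [
    ([1.0, 0.0], 3.0),
    ([0.0, 1.0], -2.0),
    ([1.0, 1.0], 1.0),
    ([2.0, 1.0], 4.0),
]


def predict(weights, observations):
    """Weighted sum of one row of observations."""
    return sum(w * x for w, x in zip(weights, observations))


def train(data, steps=2000, learning_rate=0.05):
    """Nudge the weights a little on each step to shrink the squared error."""
    weights = [0.0] * len(data[0][0])
    for _ in range(steps):
        # Gradient of the mean squared error with respect to each weight.
        gradient = [0.0] * len(weights)
        for observations, actual in data:
            error = predict(weights, observations) - actual
            for i, x in enumerate(observations):
                gradient[i] += 2 * error * x / len(data)
        # Move each weight a small step against its gradient.
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return weights


weights = train(data)
print(weights)
```

Even in this toy, every step multiplies each row of observations by the weights and sums the results. Stack the rows into a grid and that work becomes matrix arithmetic, repeated over and over.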