This Algorithm Knows What You're Doing

A step toward software that can tell your next move before you've made it.

Some things computers aren't so great at yet, like recognizing human faces (don't believe the hype) or finding the prime factors of very large numbers (the basis of most current cryptography). There are just too many possibilities, a deluge of small differences: possibilities that must be exhausted before arriving at a correct answer. Whether noses or numbers, it's a matter of processing and work. The actions of humans—the changes observed in a real-world figure over a period of time—fall into this category of processing resistance. Human brains still do it better. For now.

Faces and behavior are subjects of much intensive research on behalf of the twin states of security and social media. The closed-circuit networks laid out over many cities are (so far) limited by the need for human eyes. It's not hard to imagine just how fervent the desire is on the part of all the various security agencies across the globe to have every single camera manned at all times. A watcher (or two or three) at every corner in London.

This June at the Conference on Computer Vision and Pattern Recognition, researchers based at MIT and the University of California, Irvine, will present a new activity-recognition algorithm to destroy them all. The algorithm borrows from natural language processing and bests the existing technology in several key areas. For one, it uses computer memory intelligently (recognition takes a lot of memory): its memory usage stays fixed no matter how long the footage runs, which allows it to process very large files and, critically, streams of video.

The ability to handle video streams is in itself a major advance. It's made possible by the new algorithm's ability to process partially completed actions. If, say, someone on the street suddenly spins around and reaches into their coat pocket, the algorithm is designed to produce probabilities of how that action will be completed. So, yes, in a way it also attempts to predict the future. That's a bit spooky, naturally, but less a matter of pre-crime than it is of basic action identification. Single actions take place over time intervals, after all.

The algorithm is premised on the strange notion of "behavioral grammar." We make meaning out of words using rules that establish the relationships between those words. The team argues that actions are likewise made up of sub-actions, similarly related by rules. Some sub-actions only sometimes come before others; some always do.

"One of the challenging problems they try to solve is, if you have a sentence, you want to basically parse the sentence, saying what is the subject, what is the verb, what is the adverb," Hamed Pirsiavash, an MIT postdoc, explained in a statement. "We see an analogy here, which is, if you have a complex action—like making tea or making coffee—that has some subactions, we can basically stitch together these subactions and look at each one as something like verb, adjective, and adverb." The result is one coherent action.
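The grammar analogy can be made concrete with a toy example. The sketch below is not the researchers' method. It hard-codes a hypothetical grammar for making tea (every sub-action name and rule is invented for illustration) and checks whether an observed sequence of sub-actions parses as one complete action.

```python
# Toy "behavioral grammar": an action is built from sub-actions that must
# follow ordering rules. All names and rules here are hypothetical.

# Each rule maps a symbol to the alternative sequences it can expand to.
GRAMMAR = {
    "MAKE_TEA": [["BOIL", "STEEP", "EXTRAS"]],
    "BOIL": [["fill_kettle", "heat_water"]],
    # The teabag and the water can go in either order.
    "STEEP": [["add_teabag", "pour_water"], ["pour_water", "add_teabag"]],
    # Extras are optional, so the empty sequence is allowed.
    "EXTRAS": [[], ["add_milk"], ["add_sugar"], ["add_milk", "add_sugar"]],
}


def expand(symbol):
    """Yield every sequence of observable sub-actions the symbol can produce."""
    if symbol not in GRAMMAR:  # terminal: an observable sub-action
        yield [symbol]
        return
    for alternative in GRAMMAR[symbol]:
        # Combine the expansions of each part, left to right.
        partials = [[]]
        for part in alternative:
            partials = [p + e for p in partials for e in expand(part)]
        yield from partials


def parses_as(action, observed):
    """Return True if the observed sub-actions form a complete `action`."""
    return any(seq == list(observed) for seq in expand(action))
```

Here `parses_as("MAKE_TEA", ["fill_kettle", "heat_water", "pour_water", "add_teabag"])` is true, while a sequence that adds the teabag before filling the kettle is rejected. Enumerating every expansion only works for a tiny grammar like this one; it is meant to show the idea, not how a real parser operates.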

The algorithm is based on machine learning, which is very simply a process by which a computer program learns from input data sets and over a period of "training" becomes a uniquely adapted, "smarter" program. In this case, the program implementing the algorithm is tasked with observing videos depicting different actions while being given a set number of subactions to scan for. The program is not informed of what the subactions actually are or how they're related. That's up to the algorithm to determine as it finds the hidden grammar.

So, the program starts at the beginning of a particular video and the beginning of a particular action. With just a snippet of the whole action (one word of the sentence, in our analogy's terms), the program reports a vast number of probabilities of what the final action could be, ranking them from highest to lowest. As the video continues, every piece of new information (time passing is itself new information, even if there's no visual change) eliminates some of those possibilities; the ones that remain are reconsidered based on the new information and re-ranked. Eventually a critical point is reached in the action's progression such that the program can make a very good estimation of what's really going down.
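That eliminate-and-re-rank loop might look something like the sketch below. It is a deliberately crude stand-in, not the published algorithm: the action templates, the `OnlineRanker` class, and the uniform starting scores are all assumptions made for illustration, and a candidate is simply ruled out the moment it disagrees with what has been seen.

```python
# Hypothetical action "templates": each action is an ordered list of sub-actions.
ACTIONS = {
    "make_tea": ["fill_kettle", "heat_water", "add_teabag", "pour_water"],
    "make_coffee": ["fill_kettle", "heat_water", "add_grounds", "pour_water"],
    "wash_up": ["fill_sink", "scrub", "rinse"],
}


class OnlineRanker:
    """Tracks which actions remain consistent with the sub-actions seen so far."""

    def __init__(self, actions, priors=None):
        self.actions = actions
        # Assumed uniform starting scores unless the caller supplies priors.
        self.scores = dict(priors or {name: 1.0 for name in actions})
        self.step = 0  # number of sub-actions observed so far

    def observe(self, sub_action):
        """Consume one sub-action and return the surviving actions, ranked."""
        for name, template in self.actions.items():
            fits = self.step < len(template) and template[self.step] == sub_action
            if not fits:
                self.scores[name] = 0.0  # eliminated by the new evidence
        self.step += 1
        return self.ranking()

    def ranking(self):
        """Return (action, probability) pairs, most likely first."""
        total = sum(self.scores.values())
        if total == 0:
            return []  # nothing in the template set fits what was seen
        ranked = sorted(self.scores.items(), key=lambda kv: kv[1], reverse=True)
        return [(name, score / total) for name, score in ranked if score > 0]
```

After `fill_kettle` and `heat_water`, tea and coffee are still tied; `add_teabag` settles it. The ranker keeps only one score per candidate and a step counter, so its memory use does not grow with the length of the stream, which is the property that makes streaming video tractable.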

This is a new skill for computers and would seem fundamental to the whole action-determination enterprise. The research team tested it out using videos of different athletic activities, like weightlifting or bowling, posted to YouTube. The new algorithm beat out every one of its predecessors. And it's worth noting that that happens to be a fairly large and powerful field of predecessors, some utilizing different probabilistic approaches (hidden Markov models, Kalman filtering) and some using analyses of motion and what physics tells us about the probable future results of some observed current position and velocity (summing changes in position over time in a three-dimensional space). They're all pretty interesting, particularly for math-heads, but this is so far the one to rule.
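The physics-based idea mentioned in passing, predicting where something will be from its current position and velocity, reduces in its simplest form to a few lines. This sketch assumes constant velocity and a fixed time step; `predict_position` is a made-up helper, and it is plain extrapolation, not a Kalman filter or any specific predecessor system.

```python
def predict_position(position, velocity, dt, steps):
    """Extrapolate a 3-D position by repeatedly adding velocity * dt.

    Assumes the velocity stays constant over the whole prediction window.
    """
    x, y, z = position
    vx, vy, vz = velocity
    for _ in range(steps):
        # Sum one small change in position per time step.
        x, y, z = x + vx * dt, y + vy * dt, z + vz * dt
    return (x, y, z)
```

For example, `predict_position((0, 0, 0), (1, 2, 0), 0.5, 4)` gives `(2.0, 4.0, 0.0)`: two seconds of travel at one unit per second along x and two along y.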

Following the usual pattern for advances in would-be spooky surveillance tech, the MIT team is also interested in medical applications. A system employing this algorithm might warn elderly individuals that they have or haven't taken their medication, or it might offer automated corrections to patients undergoing physical therapy. It might also offer a way to monitor the return of motor function to patients with neurological damage, by identifying a distinct set of action/motion subunits that can be "manually" (re)constructed into a full motion.

Many of us will of course be thinking about those London cameras, a machine eye behind each one ready to sound the alarm if some action-probability threshold is reached for some suspicious action. Human machines would perhaps be wise to learn how to implement algorithms like this themselves, as much as that's possible—just so they can avoid tipping off the machine watchers.