New Machine Learning Program Recognizes Handguns in Even Low-Quality Video

The handgun is a ballistic extension of the hand itself. It’s remarkable for confining vast destruction to a small, subtle instrument. It can be wielded casually, and, until the trigger is pulled, nigh invisibly. A semi-automatic pistol, once awoken, will fire as fast as the shooter can twitch their finger. The relation between package and power offered by a handgun feels almost nuclear.

A team of computer scientists based at the University of Granada in Spain thinks that we can help neutralize the threat of handguns through early detection. If we can register the gun before it’s actually fired, we can regain some control. To this end, they’ve developed a machine learning program that can reliably detect handguns based on visual recognition and classification. It’s capable of catching guns from even low-quality YouTube footage in just under a quarter second.

“The crime rates caused by guns are very concerning in many places in the world, especially in countries where the possession of guns is legal or was legal for a period of time,” Siham Tabik and colleagues write in a paper posted recently to the arXiv preprint server. “One way to reducing this kind of violence is prevention via early detection so that the security agents or policemen can act. In particular, one innovative solution to this problem is to equip surveillance or control cameras with an accurate automatic handgun detection alert system.”

Given the success of machine learning facial recognition systems, we might assume this is a simple problem. The algorithm just needs to look at a whole bunch of images of handguns in a variety of settings and it will eventually learn the visual features that make a handgun a handgun, or at least enough of those features to identify a handgun.

It would be this simple if only we had as many pictures of handguns as we do of faces, which we don’t. Facial recognition using convolutional neural networks (CNNs) is possible only because we have many millions of images of faces to learn from. Tabik’s team had only about 3,000 images of handguns, which is next to nothing for a visual recognition task in machine learning.

Over the past few years, researchers have been developing a fudge of sorts that allows, in some cases, for the creation of visual recognition models even when data is scarce. Generally, this is known as transfer learning. The basic idea is that we can use knowledge about one category of thing and apply that to another, related category. We may have very few images of trucks, for example, but if we have enough images of cars then we can get somewhere.

Say we already have a good model trained for recognizing cars. It’s possible to take that car recognition model and “fine tune” it using new images of trucks. The model for cars already contains a lot of semantic information about trucks because, well, trucks are pretty similar to cars. So, through the fine tuning process, we can retrain the existing car model to be a truck model by teaching it the difference between a car and a truck. Its abstract representation—the generic features it uses to recognize something—expands and adapts.
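
As a rough illustration of the idea, here is a minimal Python sketch of the simplest form of transfer learning: the "pretrained" layers are frozen and only a new final layer is trained on a handful of examples. Everything in it is an assumption made for illustration: `frozen_features` is a made-up stand-in for a pretrained network, and the six data points are toy values, not anything from the paper. Full fine tuning, as described above, would also adjust the earlier layers.

```python
import math

def frozen_features(x):
    """Hypothetical stand-in for a pretrained network's feature layers.

    The weights here are fixed and never updated during training.
    """
    return [math.tanh(x[0] + x[1]), math.tanh(x[0] - x[1])]

def train_head(samples, labels, epochs=200, lr=0.5):
    """Train a new logistic-regression 'head' on top of the frozen features."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            f = frozen_features(x)
            z = w[0] * f[0] + w[1] * f[1] + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of class 1
            g = p - y                        # gradient of the log loss w.r.t. z
            w[0] -= lr * g * f[0]
            w[1] -= lr * g * f[1]
            b -= lr * g
    return w, b

def predict(w, b, x):
    """Return 1 if the head scores the frozen features as class 1, else 0."""
    f = frozen_features(x)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0

# Toy "small dataset" for the new class (invented values).
samples = [(1.0, 1.0), (2.0, 0.5), (0.5, 1.5),
           (-1.0, -1.0), (-2.0, -0.5), (-0.5, -1.5)]
labels = [1, 1, 1, 0, 0, 0]

w, b = train_head(samples, labels)
predictions = [predict(w, b, x) for x in samples]
```

Only the three numbers in `w` and `b` are learned here, which is why a few examples are enough. The heavy lifting is done by features that were learned elsewhere.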

The model Tabik and co. fine tuned is called VGG-16, and it was trained on ImageNet, a dataset of 1.28 million images. Its utility is in object recognition across many unrelated categories. Given an input image, it can classify it with respect to 1,000 different object classes: can opener, vulture, coral fungus, toilet paper, etc. By fine tuning the VGG-16 model with their 3,000 handgun images, they were able to create what are essentially new object classes.

“The best performing model shows a high potential even in low quality YouTube videos and provides satisfactory results as automatic alarm system,” Tabik writes. “Among 30 scenes, it successfully activates the alarm, after five successive true positives, within an interval of time smaller than 0.2 seconds, in 27 scenes.”
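
The alarm rule in that quote can be sketched in a few lines of Python. This is a guess at the logic, not the authors' code: the function name `first_alarm_frame` and the `required` parameter are hypothetical, and since a live system cannot know which detections are true, the sketch simply counts consecutive frames flagged as containing a handgun.

```python
from typing import Iterable, Optional

def first_alarm_frame(detections: Iterable[bool], required: int = 5) -> Optional[int]:
    """Return the index of the frame that completes `required` consecutive
    positive detections, or None if the alarm never fires."""
    streak = 0
    for index, hit in enumerate(detections):
        # A miss resets the streak; a hit extends it.
        streak = streak + 1 if hit else 0
        if streak >= required:
            return index
    return None

# Invented per-frame detector output: four hits, one miss, then five hits.
frames = [True] * 4 + [False] + [True] * 5
alarm_at = first_alarm_frame(frames)
```

Requiring a run of hits rather than a single one trades a little latency for fewer alarms triggered by a one-frame misfire.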

When it comes to gun detection technology there isn’t a whole lot occupying the gulf between ultra high-tech and crudely obvious. On the one hand, we have the NYPD testing a system that tracks guns based on the radiative signatures of human bodies, while, on the other, a firm called Shooter Detection Systems is pushing a system that automatically detects and reports gunfire: “a smoke alarm for gun fire detection.” As of 2014, the latter had been installed in about a half-dozen schools. Tabik’s system is somewhere in between those two poles.

Training a machine learning model is computationally taxing. It takes a long time—days, weeks. But once you have the model—which is really just a big grid of numbers saved in a file—actually doing object recognition is pretty quick. Across several different combinations of detection classes, the researchers were able to all but eliminate false positives, but still wound up with a significant set of false negatives (as above), resulting in overall accuracies of between 90 and 95 percent, excluding a first attempt that mostly missed the mark.
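
To see how near-zero false positives and a noticeable number of false negatives can still add up to accuracy in that range, here is a small Python sketch. The counts are invented for illustration and are not taken from the paper; the function name `detection_metrics` is hypothetical too.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        # Share of all images classified correctly.
        "accuracy": (tp + tn) / total,
        # Of the images flagged as handguns, how many really were.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the real handgun images, how many were caught.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Invented counts: no false alarms, but 30 of 300 handguns missed.
metrics = detection_metrics(tp=270, fp=0, tn=300, fn=30)
```

With these made-up numbers, every alarm is genuine, yet one handgun in ten goes unnoticed. For a security system, that second figure matters at least as much as the headline accuracy.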

This relatively new ability to build object recognition classes from small datasets could make systems like this one practical in real-world settings in the near future.