
The Sentient Surveillance Camera

What the world’s first talking artificially intelligent camera says about the surveillance age.

Surveillance cameras of one sort or another get pointed at us every day. You probably pass through their gaze on your way to work, on public streets and transit, and inside stores and businesses. In New York City, there are thousands (no one is sure exactly how many, but the ACLU counted 4,400 in Harlem and two downtown areas alone). In London, there are over half a million; the average Londoner is famously captured on CCTV hundreds of times a day. There are estimated to be 20 million to 30 million surveillance cameras in China.


So we know our images are being captured, over and over (when the cameras aren't broken, anyway), and we know that they're being recorded and stored somewhere. Beyond that, what do we know about what the cameras are actually seeing? What is the quality, not just the quantity, of the information they're recording?

In an era when artificial intelligence is beginning to converge with surveillance (in the wake of the Boston bombings, for instance, the Boston Police Department is reportedly experimenting with artificially intelligent mass surveillance, aided by the Houston company AISight), how can we begin to understand the data these networks might soon be processing about us?

They're not easy questions to grasp, let alone answer, but they're Ross Goodwin and Gene Han's forte. Both are master's students at NYU, where they research artificial intelligence and machine learning. On the side, they're inventors, DIY makers who build experimental tech to test out their ideas. Their latest project is a surveillance camera that "reads" a person's face and "speaks" aloud what it "sees."

Goodwin in particular is interested in how machines "perceive" humans. I've written about his work before: his last project featured a camera that sent the images it took to a crude artificial intelligence, which then attempted to describe them in English. It was funny, and, for a second at least, it opened a channel between the human and mechanical mind. This new project, built with Han's software, takes the experiment a step further, forcing an audience to interact directly with a roving lens that gathers and recites information about them in real time.


They call it their "sentient surveillance camera."

"Our idea was to raise awareness regarding the omnipresence of surveillance equipment, and the current state of technological advancement with artificial intelligence," Goodwin told me. "We wanted to create an entity with its own sense of social awareness, its own eyes, and an ability to communicate with humans, albeit with some glitchiness that underscores the limitations of the current technology."

At its heart, it's an interactive art project, designed to get us thinking about the increasingly complicated relationship between humans and the robots that surround and surveil us.

"The pan-tilt-zoom surveillance camera is constantly moving and scanning its view area for human faces," Goodwin said. To find them, it uses Haar cascade detection, a machine learning approach in which a cascade function is trained on positive and negative images (pictures that do and don't contain faces). When it detects a face, the camera "zooms in on the face and sends an image of the face and its surroundings to our server, which utilizes convolutional neural networks to extract concept words from the scene."
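Goodwin doesn't say which detector implementation the camera runs (OpenCV, for instance, ships pretrained Haar cascades, but that is only a guess). The toy Python sketch here is a rough illustration of the Viola-Jones idea, not the project's code. It shows why a cascade is fast: Haar-like features are sums and differences of pixel rectangles, computed in constant time from an integral image, and each window is discarded at the first stage it fails. The two features (`eyes_vs_cheeks`, `nose_bridge`), their thresholds, and the stage weights are hypothetical and hand-picked; a real cascade learns them from its training images and scans windows at many scales, not the single scale used here.

```python
# Toy Viola-Jones-style cascade. Features, thresholds and weights are
# hypothetical and hand-picked; a real detector learns them from
# positive (face) and negative (non-face) training images.

def integral_image(img):
    """Summed-area table padded with a leading row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y), in O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def eyes_vs_cheeks(ii, x, y, size):
    """Two-rectangle feature: bright lower half minus dark upper (eye) half."""
    half = size // 2
    return rect_sum(ii, x, y + half, size, half) - rect_sum(ii, x, y, size, half)


def nose_bridge(ii, x, y, size):
    """Three-rectangle feature over the upper half: bright center vs. dark sides."""
    third, half = size // 3, size // 2
    left = rect_sum(ii, x, y, third, half)
    center = rect_sum(ii, x + third, y, third, half)
    right = rect_sum(ii, x + 2 * third, y, third, half)
    return 2 * center - (left + right)


# Each stage: (weak classifiers as (feature, threshold, weight), stage threshold).
STAGES = [
    ([(eyes_vs_cheeks, 0, 1.0)], 1.0),  # cheap first stage filters easy negatives
    ([(eyes_vs_cheeks, 0, 0.5), (nose_bridge, 0, 0.5)], 1.0),
]


def passes_cascade(ii, x, y, size, stages=STAGES):
    """Return True only if the window clears every stage in order."""
    for weak_classifiers, stage_threshold in stages:
        score = sum(weight for feature, threshold, weight in weak_classifiers
                    if feature(ii, x, y, size) > threshold)
        if score < stage_threshold:
            return False  # early rejection: later stages never run
    return True


def detect_faces(img, size=6, step=6):
    """Slide a size-by-size window over img; return top-left corners of hits."""
    ii = integral_image(img)
    h, w = len(img), len(img[0])
    return [(x, y)
            for y in range(0, h - size + 1, step)
            for x in range(0, w - size + 1, step)
            if passes_cascade(ii, x, y, size)]


# Made-up 6x12 grayscale frame: a flat wall on the left, a crude "face" on the right.
wall = [[100] * 6 for _ in range(6)]
face = [[20, 20, 200, 200, 20, 20]] * 3 + [[200] * 6] * 3
frame = [w_row + f_row for w_row, f_row in zip(wall, face)]
print(detect_faces(frame))  # [(6, 0)]
```

The flat wall fails the very first stage, which is the point of the cascade design: most of a scene is rejected cheaply, and only face-like windows pay for the later, more detailed checks.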

"Those words are expanded into sentences and paragraphs using related words from a lexical relations database," he said, a technique carried over from his previous project. "The paragraphs are then automatically read aloud using Apple's text-to-speech utility."
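The article doesn't name the lexical relations database or explain how sentences are assembled, so what follows is only a guess at the general shape of that step. In this Python sketch, `LEXICON` is a hypothetical, hand-written stand-in for a real database (WordNet is one well-known example), `TEMPLATES` and `expand` are illustrative names, and the macOS `say` command is assumed to be the Apple text-to-speech utility in question; the sketch simply skips speech where `say` isn't installed.

```python
import shutil
import subprocess

# Hypothetical lexical-relations table: a tiny hand-written stand-in
# for a real database. Insertion order determines sentence order.
LEXICON = {
    "person": {"synonym": ["human"], "part_of": ["crowd"], "has_part": ["face"]},
    "glasses": {"synonym": ["spectacles"], "used_for": ["seeing"]},
}

# Illustrative sentence templates, one per relation type.
TEMPLATES = {
    "synonym": "I see {word}. Some would call it {related}.",
    "part_of": "The {word} is part of a {related}.",
    "has_part": "Every {word} has a {related}.",
    "used_for": "The {word} is used for {related}.",
}


def expand(concepts, lexicon=LEXICON, max_sentences_per_concept=2):
    """Turn concept words into a paragraph via related words and templates."""
    paragraph = []
    for word in concepts:
        relations = lexicon.get(word, {})  # unknown concepts yield no sentences
        sentences = [
            TEMPLATES[rel].format(word=word, related=related)
            for rel, related_words in relations.items()
            if rel in TEMPLATES
            for related in related_words
        ]
        paragraph.extend(sentences[:max_sentences_per_concept])
    return " ".join(paragraph)


def speak(text):
    """Read text aloud with the macOS `say` command, if it is available."""
    if shutil.which("say"):
        subprocess.run(["say", text], check=False)


# Concept words as an image tagger might return them (made-up example input).
paragraph = expand(["person", "glasses", "cat"])
print(paragraph)
speak(paragraph)
```

The clumsy "The glasses is used for seeing" that this produces is the kind of glitchiness Goodwin describes: templates filled from word relations know nothing about plurals, and a concept missing from the table ("cat") simply goes unmentioned.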

As you can see in the short documentary we made about the camera, when you take it out into the real world, the results are almost always surprising and occasionally genuinely unnerving. Some of its victims delight in the "insights" the sentient bot relays, while others recoil. It freaks people out when the talking surveillance machine "knows" something about them and is able to tell them so. The bot makes a few accurate observations (alongside plenty of gibberish, you'll note), but the fact that it gets anything right about us at all mostly makes us squirm.

In part, this is because we already know that the surveillance apparatus is everywhere, though we may have pushed that knowledge to the back of our minds. The thought that it's doing more than just watching, that it's actively rendering "thoughts" or "judgments" about us (say, whether we look like we're about to commit a crime), or compiling data in a language we could understand and confronting us with it directly, is, well, creepy.

"Once I started to care about the surveillance camera, I realized there are so many eyes watching our every move," Han told me. "I started tracking all the surveillance cameras around me, and it was literally watching all my moves in the city."

Surveillance, in the form of augmented reality, is already being marshalled to sell us beauty products based on an algorithm's judgment of our facial composition. Governments can "tag" faces their CCTV networks pick up. But artificial intelligence can do both better and faster. The infrastructure is mostly already there—it's the software that's catching up. Soon we really will be living in a world watched over by artificially intelligent camera eyes. It might be nice to start thinking about what we want them to see.

Regular old humans like Ross and Gene can help.

"In our daily lives, we will soon be confronting AI as a self-contained entity, rather than merely a tool we use," Goodwin said. "The sentient surveillance camera presents one possible implementation of such an entity, albeit a bizarre one, but it's designed to raise questions about the places these technologies could take us, and the possibilities for technology that actively judges us and forms its own conclusions about its environment."

Han concurs.

"Now we have technologies that can read and comprehend the image. Algorithms that can be trained from images to make decisions that mock humans," he said. "With all the cloud computing systems and so much processing power, all surveillance footage can be analyzed, looking for something and acting upon it."

"That's the future I thought of and it is possible right now."