Scanning Code for Viruses Is No Longer a Job for Humans

If the information security company Kaspersky Lab had to rely on humans to analyze all the malware it sees, it would need to hire 350,000 people: about half the population of Washington, DC, or almost the entire population of Florence, Italy. Yet it does the job with just 3,300 employees, and those analysts inspect only a tiny fraction of the total volume of code.

That is possible thanks to algorithms and machine learning.

At Kaspersky and other antivirus and infosec companies, computers carry out many mundane tasks. They also make sensitive decisions, such as figuring out whether a chunk of ones and zeros is malicious or not in order to protect you from hackers.

Every major infosec company relies on automation. They simply could not afford to do otherwise.

***

Alexey Malanov, malware expert at Kaspersky Lab, said 99 percent of the code his firm analyzes is seen only by machines—and it’s been that way for five years. The process keeps improving in terms of speed and efficacy, he said.

Automation works because most malware is a variation on code that is already known. “Even if a cybercriminal creates something from scratch, in most cases he’ll integrate previously known malicious functionality,” said Malanov. “Automation will process all this.”

Machines don’t get tired, aren’t bothered by repetitive tasks, work around the clock, and don’t make mistakes if instructed properly. But most importantly, they’re faster, analyzing millions of samples in 24 hours.

When a company receives a malware sample, the file first goes through an automated pipeline. Machines try to determine where and when it was downloaded, how many other downloads have occurred, and who wrote and signed the software. These signals are known as reputation features.
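
As a rough illustration of what such signals could look like once a machine has collected them, here is a minimal Python sketch. The `Reputation` record, its field names, and the `reputation_features` helper are all hypothetical; none of the companies quoted here has described its actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Reputation:
    """Hypothetical record of what is known about a file's origin."""
    source_url: str           # where the file was downloaded from
    first_seen: datetime      # when it was first downloaded
    download_count: int       # how many downloads have been observed
    signer: Optional[str]     # who signed the software, if anyone


def reputation_features(rep: Reputation, now: datetime, trusted_signers: set) -> dict:
    """Turn raw origin data into simple features a classifier could consume."""
    return {
        "age_days": (now - rep.first_seen).days,
        "download_count": rep.download_count,
        "is_signed": rep.signer is not None,
        "signer_trusted": rep.signer in trusted_signers,
    }


# Example: an unsigned file first seen ten days ago with very few downloads.
sample = Reputation(
    source_url="http://example.com/setup.exe",
    first_seen=datetime(2016, 1, 1, tzinfo=timezone.utc),
    download_count=3,
    signer=None,
)
features = reputation_features(
    sample,
    now=datetime(2016, 1, 11, tzinfo=timezone.utc),
    trusted_signers={"Example Corp"},
)
```

A new, rarely downloaded, unsigned file is not necessarily malicious, but features like these give a downstream model something to weigh.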

In addition, they look for similarities between the sample’s code and the rest of the virus collection. Machines also analyze the sample’s behavior in a real system to see whether it tries, for instance, to delete or encrypt files.
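
One simple way to measure code similarity, offered here only as an assumption about how such a comparison might work, is to break each file into overlapping byte sequences (n-grams) and compute the Jaccard overlap between the resulting sets. The function names and the tiny "collection" below are invented for illustration; production systems use far more sophisticated techniques.

```python
def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of all n-byte substrings found in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def most_similar(sample: bytes, collection: dict, n: int = 4) -> tuple:
    """Return (name, score) for the known sample closest to `sample`."""
    grams = byte_ngrams(sample, n)
    scored = (
        (name, jaccard(grams, byte_ngrams(known, n)))
        for name, known in collection.items()
    )
    return max(scored, key=lambda pair: pair[1])


# Toy collection: real samples would be binaries, not short strings.
collection = {
    "known_family_a": b"hello world malicious payload",
    "known_family_b": b"completely different bytes here",
}
name, score = most_similar(b"hello world malicious payload!", collection)
```

A slightly altered copy of a known sample scores close to 1.0 against its original, which is why small modifications to existing malware are easy for machines to catch.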

Machines have been filling in for cybersecurity researchers since the dawn of the industry. “Automation has been used from day one at Symantec,” said Andrew Gardner, senior technical director for security technology and response, referring to the early 1990s, when the company introduced its first antivirus product.

Advanced algorithms have discovered new threats weeks before a separate team of human analysts independently did, Gardner said.

“Machines are better at grinding out statistical deviations, [and] abstracting patterns from large data sets,” Gardner said. “In general, once a human has an intuition it is almost always beneficial to automate that if possible.”

Machine learning works alongside a wide range of clustering and classification algorithms to determine whether a scanned file is malicious, said Liviu Arsene, senior e-threat analyst at Bitdefender, another antivirus company that uses machines to process over 99 percent of the malware it receives.

“The purpose of these systems is to break down large amounts of files into smaller clusters or groups that share similarities with each other. After that, a security analyst will step in, analyzing one or more files from each cluster and applying his findings to all files in the cluster,” Arsene said.
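
The workflow Arsene describes, grouping similar files and then applying one analyst verdict to the whole group, can be sketched in a few lines of Python. This is a deliberately naive greedy clustering under assumed inputs: the behavior labels, the 0.5 threshold, and the function names are all hypothetical and not Bitdefender’s method.

```python
def jaccard(a: set, b: set) -> float:
    """Share of features two samples have in common."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(samples: dict, threshold: float = 0.5) -> list:
    """Greedy clustering: join the first cluster whose representative
    (its first member) is similar enough, otherwise start a new cluster."""
    clusters = []
    for name, feats in samples.items():
        for group in clusters:
            if jaccard(feats, samples[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def propagate(clusters: list, analyst_verdicts: dict) -> dict:
    """Apply an analyst's verdict on one file to every file in its cluster."""
    labels = {}
    for group in clusters:
        verdict = next(
            (analyst_verdicts[n] for n in group if n in analyst_verdicts), None
        )
        if verdict is not None:
            for n in group:
                labels[n] = verdict
    return labels


# Hypothetical behavior features observed for four samples.
samples = {
    "s1": {"encrypts_files", "deletes_backups", "contacts_c2"},
    "s2": {"encrypts_files", "deletes_backups"},
    "s3": {"shows_ads"},
    "s4": {"shows_ads", "opens_browser"},
}
groups = cluster(samples)
labels = propagate(groups, {"s1": "ransomware", "s3": "adware"})
```

Here the analyst looks at only two of the four files, yet all four end up labeled. Scaled up, that is how a few thousand people can cover millions of samples.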

***

So what’s left for humans to do?

Humans are better at discovering new features hidden within malware. They have better intuition, make non-obvious connections, and can tackle a problem from creative angles.

Cybersecurity researchers study the concepts behind new malicious programs and sophisticated infection schemes. They try to write procedures to decrypt user files that have been encrypted by malware of the Trojan-Ransom class. They can also draw conclusions from mistakes, something machines are not good enough at.

Arsene praises automation and believes that manual malware detection is not only obsolete but also infeasible. Still, he emphasized that machines are only as good as humans make them. One challenge for cybersecurity experts, he said, is keeping false positives to a minimum.

“Humans need to constantly track how these algorithms are performing and fine-tune them. Consequently, the human factor is still mandatory in deciding when these systems need to be retrained as to offer maximum performance,” he said.
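
To make the idea of tracking performance concrete, here is a minimal sketch of one metric a team might watch: the false positive rate, meaning the share of clean files wrongly flagged as malicious. The 1 percent threshold and the function names are assumptions chosen for illustration, not figures from any vendor.

```python
def false_positive_rate(verdicts: list, truths: list) -> float:
    """Fraction of clean files that were wrongly flagged.

    verdicts: what the classifier said (True means flagged as malicious)
    truths:   what analysts confirmed (True means actually malicious)
    """
    false_pos = sum(1 for v, t in zip(verdicts, truths) if v and not t)
    clean = sum(1 for t in truths if not t)
    return false_pos / clean if clean else 0.0


def needs_retraining(verdicts: list, truths: list, max_fpr: float = 0.01) -> bool:
    """Signal that humans should review the model (assumed 1% tolerance)."""
    return false_positive_rate(verdicts, truths) > max_fpr


# Four files reviewed by analysts; one clean file was wrongly flagged.
verdicts = [True, False, True, False]
truths = [True, False, False, False]
fpr = false_positive_rate(verdicts, truths)
retrain = needs_retraining(verdicts, truths)
```

The code only raises the flag. Deciding what to change and when to retrain remains the human job Arsene describes.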

While machine learning and automated systems are definitely shaping cybersecurity, they’ll probably never completely replace humans, researchers say.

In the early days of malware research, it was common for hackers to hide messages in the malware they wrote. Malanov remembered one example from a hacker hoping to convince human analysts not to flag his malware: “This malware is not detected by any antivirus. Please, virus analyst, let it remain so in the future. For this to happen, you do NOT have to do ANYTHING.”

Of course, that’s no longer the case. Now, a virus analyst could do nothing and the malware would still be caught, thanks to machines that carry out the hard work.

The Hacks We Can’t See is Motherboard’s theme week dedicated to the future of security and the hacks no one’s talking about.