Researchers Figure Out How to Detect Malware in Encrypted Traffic

July 7, 2016, 8:35am

When a company or organisation wants to keep tabs on what’s flowing through its network, it often has to decrypt that traffic to see what’s actually happening and deal with any threats. The downside of this approach is that it may infringe on the privacy of people using the network, who would prefer not to have everything they do open to inspection.

Now, researchers from Cisco claim that encrypted traffic can give away enough tell-tale signs for them to figure out if it’s related to certain pieces of known malware, without the need to decrypt it.


Transport Layer Security, or TLS, is a cryptographic protocol commonly used for encrypting traffic in transit, and according to the researchers, it’s increasingly used by malware. “The use of TLS by malware poses new challenges to network threat detection because traditional pattern-matching techniques can no longer be applied to its messages,” Blake Anderson, Subharthi Paul, and David McGrew write in their paper.

When traffic isn’t encrypted, it’s relatively easy to match its contents against patterns that have previously been linked to malware. With encrypted traffic, that’s not really possible, because the actual contents are obfuscated.

To get around this problem, the researchers analysed how 18 different malware families used encryption, based on thousands of malware samples and tens of thousands of malicious traffic flows.

It turns out encrypted malware traffic is often noticeably different from encrypted enterprise traffic.

“While TLS obscures the plaintext, it also introduces a complex set of observable parameters that allow many inferences to be made about both the client and the server,” the paper reads. In other words, certain characteristics can give away whether a flow of encrypted traffic likely contains something related to known malware.

Those indicators include particular encryption algorithms, differing encryption key sizes, and use of the Tor client.
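To give a sense of what those observable parameters look like, here is a minimal Python sketch that reads the ClientHello message a TLS client sends unencrypted at the start of a connection and pulls out the protocol version, cipher suites, and extension types it offers. It assumes one complete, unfragmented record and does no error recovery. The function names, the test-record builder, and the `SUSPICIOUS_SUITES` list are hypothetical illustrations, not the indicators used in the Cisco paper; key sizes appear elsewhere in the handshake and are not parsed here.

```python
import struct

# Hypothetical indicator list for illustration only: two legacy suites,
# TLS_RSA_WITH_RC4_128_SHA (0x0005) and TLS_RSA_WITH_3DES_EDE_CBC_SHA (0x000A).
SUSPICIOUS_SUITES = {0x0005, 0x000A}


def parse_client_hello(record):
    """Extract plaintext parameters from a single, complete TLS ClientHello record."""
    if len(record) < 5 or record[0] != 0x16:
        raise ValueError("not a TLS handshake record")
    pos = 5  # skip record header: content type (1), version (2), length (2)
    if record[pos] != 0x01:
        raise ValueError("not a ClientHello")
    pos += 4  # handshake type (1) + handshake length (3)
    (client_version,) = struct.unpack_from("!H", record, pos)
    pos += 2 + 32  # client_version (2) + random (32)
    pos += 1 + record[pos]  # session ID length byte + session ID
    (suites_len,) = struct.unpack_from("!H", record, pos)
    pos += 2
    cipher_suites = [struct.unpack_from("!H", record, pos + i)[0]
                     for i in range(0, suites_len, 2)]
    pos += suites_len
    pos += 1 + record[pos]  # compression methods length byte + methods
    extensions = []
    if pos + 2 <= len(record):  # the extensions block is optional
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            extensions.append(ext_type)
            pos += 4 + ext_len  # skip the extension's own data
    return {"version": client_version, "cipher_suites": cipher_suites,
            "extensions": extensions}


def looks_suspicious(hello):
    """Flag a ClientHello that offers any suite on the hypothetical indicator list."""
    return bool(SUSPICIOUS_SUITES & set(hello["cipher_suites"]))


def build_client_hello(suites, extensions=()):
    """Hypothetical helper that builds a minimal ClientHello record for testing."""
    body = struct.pack("!H", 0x0303) + bytes(32) + b"\x00"  # TLS 1.2, zero random, no session ID
    body += struct.pack("!H", 2 * len(suites)) + b"".join(struct.pack("!H", s) for s in suites)
    body += b"\x01\x00"  # one compression method: null
    ext = b"".join(struct.pack("!HH", e, 0) for e in extensions)  # empty extension bodies
    body += struct.pack("!H", len(ext)) + ext
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake
```

For example, a ClientHello offering `0xC02F` and `0x0005` parses to `cipher_suites == [0xC02F, 0x0005]` and is flagged, because `0x0005` is on the illustrative list. A real detector would weigh many such parameters together rather than rely on a single hard-coded list.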

The researchers also trained four machine learning classifiers to sort through the encrypted traffic and label the flows belonging to a particular malware family. To do this, they used information such as each flow’s metadata, packet lengths, and byte distribution. Certain pieces of malware also connected to common sets of servers. For example, the “Yakes” and “Razy” malware families connected to servers belonging to the Chinese company Baidu (baidu.com).
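The kinds of flow features mentioned above can be sketched in Python as follows. This is a minimal illustration, assuming each flow is summarised by its packet lengths, a sample of payload bytes, byte counts in each direction, and a duration. The function names, the 150-byte bins, and the 1,500-byte cap are hypothetical choices, not the researchers’ parameters, and no classifier is trained here; vectors like these would be the input to one.

```python
import math
from collections import Counter


def byte_distribution(payload):
    """Fraction of payload bytes taking each of the 256 possible byte values."""
    counts = Counter(payload)
    n = len(payload) or 1  # avoid dividing by zero on an empty payload
    return [counts.get(b, 0) / n for b in range(256)]


def byte_entropy(dist):
    """Shannon entropy in bits per byte; 8.0 means a perfectly uniform distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def length_histogram(lengths, bin_size=150, max_len=1500):
    """Normalised histogram of packet lengths; oversized packets go in the last bin."""
    n_bins = max_len // bin_size
    hist = [0.0] * n_bins
    for length in lengths:
        hist[min(length // bin_size, n_bins - 1)] += 1
    total = len(lengths) or 1
    return [h / total for h in hist]


def flow_features(lengths, payload, bytes_out, bytes_in, duration_s):
    """Concatenate metadata, packet-length histogram, and byte entropy into one vector."""
    meta = [bytes_out, bytes_in, len(lengths), duration_s]
    return meta + length_histogram(lengths) + [byte_entropy(byte_distribution(payload))]
```

Encrypted data tends to look close to uniformly random, so its byte entropy sits near 8.0. That is one reason the packet-length and metadata features matter: they can differ between flows even when the payload bytes all look alike.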

One way a piece of malware might be trickier to identify is if it varies its approach. “Malware families that actively evolve their use of cryptography are more difficult to classify,” the researchers write. Malware authors could also try to mimic ordinary enterprise traffic. But, as the researchers point out, this would require an ongoing, non-trivial effort to keep the malware up to date.

Being able to link malware samples to a particular family is of course valuable to organisations and businesses. If they can do that without the traditionally expensive process of decrypting traffic, even better.