As a journalist, I spend an inordinate amount of time every day pressing Control + C, Control + V (or Command + C, Command + V on Mac) on my keyboard. Depending on your job, you’re probably very familiar with those key combinations to copy and paste text.
I rarely stop to consider that the text I’m copying and pasting might have something hidden within it that could reveal my sources. I probably should, because it’s incredibly easy to hide invisible characters, so-called “zero-width” characters, within text and use them as a watermark or fingerprint of sorts. In some situations, that could reveal to the original author who you are, or the fact that you copy-pasted the original text and put it somewhere else.
Tom Ross, a developer at Sky, created a simple script to demonstrate just how easy it is to do that. In simple terms, this technique slips a character or characters that have no width—and are thus invisible to the eye—within other regular text.
You can try it for yourself on this site made by Ross.
Ross explained in a blog post that he invented this copy-paste detection technique to fingerprint announcements that he and his competitive video game team shared on their private site, in case they leaked elsewhere. (He declined to say what video games they played because he didn’t want to identify his team.)
In essence, this is a pretty good leak-detection and leaker-identification technique.
This technique doesn’t work everywhere. If you paste text containing zero-width characters into an app or site that renders them, you will be able to spot them, for instance as red dots.
For example, here’s how Diff Checker, a website designed to compare text files to spot differences, renders a sentence with zero-width characters in it.
But if I paste the same text into TextEdit on a Mac computer, it looks fine.
Also, depending on where you paste the text, you might see that some words (the ones with hidden characters inside them) are flagged by the spellchecker.
Even our own Content Management System doesn’t spot anything wrong unless I look at the text in our HTML editor. In fact, I’ve slipped a few hidden characters—and an easter egg—within this blog post as a test.
If you’re worried about this, there isn’t really a silver bullet. If you are a developer, though, you can create your own script or tool that detects zero-width characters (and even other deceptive techniques, such as swapping the letter “a” for its Cyrillic counterpart “а”), according to both Ross and Zach Aysan, a data scientist and cybersecurity consultant who warned in December of last year that this technique could be used to identify leaks and leakers.
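A detector of the kind Ross and Aysan describe could start as small as the Python sketch below. It is an illustration, not either person’s tool. It relies on Python’s standard `unicodedata` module: invisible formatting characters, including the zero-width ones, fall in the Unicode category “Cf”. The homoglyph check is deliberately crude (an assumption on our part): it flags every non-ASCII letter, so legitimate accented letters like “é” will show up too.

```python
import unicodedata


def find_suspicious(text: str) -> list:
    """Return (position, code point, Unicode name) for characters worth a second look."""
    hits = []
    for i, ch in enumerate(text):
        is_invisible = unicodedata.category(ch) == "Cf"  # zero-width and other format chars
        is_lookalike = ch.isalpha() and not ch.isascii()  # e.g. Cyrillic "а" posing as "a"
        if is_invisible or is_lookalike:
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    return hits


# "p" + Cyrillic "а" + "ss" + a zero-width space + "word"
for hit in find_suspicious("p\u0430ss\u200bword"):
    print(hit)
# (1, 'U+0430', 'CYRILLIC SMALL LETTER A')
# (4, 'U+200B', 'ZERO WIDTH SPACE')
```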
“If you had a script that only allowed ‘whitelisted’ characters (e.g. A-Z, 0-9 ,!?() and so on) then you could theoretically run all text through it before pasting,” Ross told me in an online chat.
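Here is a minimal Python sketch of the whitelist approach Ross describes. The exact allowed set (letters, digits, space, newline, tab and a handful of punctuation marks) and the function name are our assumptions; a real tool would need to be tuned to the language and punctuation of the text.

```python
import re

# Hypothetical whitelist: ASCII letters, digits, space, newline, tab and common
# punctuation. Anything outside this set, including zero-width characters and
# look-alike letters from other alphabets, is dropped.
NOT_ALLOWED = re.compile(r"[^A-Za-z0-9 \n\t.,!?()'\":;-]")


def whitelist(text: str) -> str:
    """Strip every character that is not on the whitelist."""
    return NOT_ALLOWED.sub("", text)


print(whitelist("Meet at 5pm\u200b, bring snacks"))  # Meet at 5pm, bring snacks
print(whitelist("p\u0430ssword"))                    # pssword
```

The trade-off shows in the second example: a Cyrillic look-alike is removed rather than replaced, which leaves a visibly broken word. That is arguably the point, since it alerts you that something was there, but it also means curly quotes and accented letters are lost unless you add them to the list.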
Aysan came up with a helpful list of countermeasures for journalists who work with leaked documents.
- Avoid releasing excerpts and raw documents.
- Get the same documents from multiple leakers to ensure they have the exact same content on a byte-by-byte level.
- Manually retype excerpts to avoid invisible characters and homoglyphs.
- Keep excerpts short to limit the amount of information shared.
- Use a tool that strips non-whitelisted characters from text before sharing it with others.
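Aysan’s second suggestion, checking that copies from different leakers match byte for byte, is easy to automate. The Python sketch below is our illustration, not Aysan’s tool, and the function names are hypothetical. It reports the offset of the first differing byte, and it records a SHA-256 digest using the standard `hashlib` module so that copies can also be compared later from notes.

```python
import hashlib
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first byte where two copies differ, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))  # one copy is a prefix of the other
    return None


def sha256_hex(data: bytes) -> str:
    """Fingerprint of a whole document; identical copies share the same digest."""
    return hashlib.sha256(data).hexdigest()


copy_a = "Meeting notes".encode("utf-8")
copy_b = "Meeting\u200b notes".encode("utf-8")  # same text plus a zero-width space
print(first_difference(copy_a, copy_b))         # 7
print(sha256_hex(copy_a) == sha256_hex(copy_b))  # False
```

Two copies that look identical on screen but differ at some offset are a strong hint that at least one of them has been fingerprinted.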
Get six of our favorite Motherboard stories every day by signing up for our newsletter.