Pastebin Made It Harder To Scrape Its Site And Researchers Are Pissed Off

Pastebin quietly changed the terms of service that had allowed researchers to study leaked data, malware, and stolen passwords.
April 16, 2020, 3:25pm

The most famous paste site, used by hackers of all stripes to host lists of stolen passwords, announcements of data breaches, and malware, has made it harder for security researchers to scrape it for that kind of information.

And security researchers are pissed off.

Pastebin is one of the most famous websites that allows anyone, even without being registered, to “paste” any kind of text and make it public. Over the years, it became a repository for all kinds of unsavory data, such as the personal details of people who got doxed by hackers, leaked passwords, hacker manifestos, and even malware payloads. Naturally, this meant it was a treasure trove for security researchers investigating data breaches or hunting hackers.

On Tuesday, several security researchers complained on Twitter that they were unable to search Pastebin or scrape it using a special API, which they paid to get access to. (The lifetime subscription, which was required to scrape the site, cost $50.)

“Many individuals and companies monitored Pastebin for various reasons as it is a hot-bed of intelligence from malware payloads to leaked data. For Pastebin to suddenly cut everyone off without notice or even a short sunset period was a really bad decision,” said Oliver Hough, a security researcher who used Pastebin in his day job. “Considering they don't do a great job of moderating their own content and they have now lost what was to them free crowd sourced content moderation, I can only predict that bad actors will make more use of Pastebin in their campaigns as the ability to proactively catch some malicious payloads or scripts before they are used has now gone.”

Do you work at Pastebin, or did you in the past? We’d love to hear from you. You can contact Lorenzo Franceschi-Bicchierai securely on Signal at +1 917 257 1382, OTR chat at lorenzofb@jabber.ccc.de, or email lorenzofb@vice.com.

When researchers asked the company on Twitter, Pastebin said that the Scraping API “has been discontinued due to active abuse by third parties for commercial purposes, such activity is prohibited by our current [Terms & Conditions].”

Pastebin changed the T&C on April 11, according to an archived version of the page. Before that change, the T&C allowed scraping for a variety of purposes.

"Researchers may scrape public, non-personal information from Pastebin for research purposes, only if any publications resulting from that research are open access. Archivists may scrape Pastebin for public data for archival purposes. You may not scrape Pastebin for spamming purposes, including for the purposes of selling Pastebin users' personal information, such as to recruiters, headhunters, and job boards," the terms and conditions read. The new T&C page has removed any reference to scraping.

We sent Pastebin a series of detailed questions, but the company decided to respond with a blanket statement.

“Great questions. Most of your questions are actually addressed in our T&C's,” a spokesperson said via email. “We continue to provide updates to our community and improve the coding platform. As you know, we were founded over 19 years ago, for developers by developers and with this growth, we are modernizing like other platforms. And also learning from them.”

“Security researchers are always welcome and report information that pertains to any violations of our T&C's. Our key audience has always been developers, engineers and authors,” the statement continued. “We will continue to update all the new features and changes through our platform as well as our social media.”
