Most people who’ve spent time on the internet have some understanding that many websites log their visits and keep a record of what pages they’ve looked at. When you search for a pair of shoes on a retailer’s site, for example, the site records that you were interested in them. The next day, you see an advertisement for the same pair on Instagram or another social media site.
The idea of websites tracking users isn’t new, but research from Princeton University released last week indicates that online tracking is far more invasive than most users understand. In the first installment of a series titled “No Boundaries,” three researchers from Princeton’s Center for Information Technology Policy (CITP) explain how third-party scripts that run on many of the world’s most popular websites track your every keystroke and then send that information to a third-party server.
Some highly trafficked sites run software that records every click and every word you type. If you go to a website, begin to fill out a form, and then abandon it, every letter you entered is still recorded, according to the researchers’ findings. If you accidentally paste something from your clipboard into a form, it’s also recorded. Facebook users were outraged in 2013 when it was discovered that the social network was doing something similar with status updates: it recorded what users typed, even if they never ended up posting it.
These scripts, or bits of code that websites run, are called “session replay” scripts. Companies use them to gain insight into how customers use their sites and to identify confusing webpages. But the scripts don’t just aggregate general statistics; they record individual browsing sessions and can play them back. The scripts don’t run on every page, but they are often placed on pages where users input sensitive information, like passwords and medical conditions.
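To make the mechanism concrete, here is a minimal sketch of why an abandoned form still ends up recorded. It is a toy simulation, not any vendor's actual code: the `SessionRecorder` class, its method names, and the `condition` field are all invented for illustration. The key assumption is that each change is sent as it happens rather than when the form is submitted.

```python
from dataclasses import dataclass, field


@dataclass
class SessionRecorder:
    """Toy stand-in for a session replay script (hypothetical, not a real vendor API)."""

    sent_events: list = field(default_factory=list)

    def on_input(self, field_name: str, current_value: str) -> None:
        # Assumption: every change is shipped off immediately,
        # so nothing depends on the user pressing "submit".
        self.sent_events.append((field_name, current_value))

    def replay(self) -> dict:
        # Replaying the events in order rebuilds the last known value of each field.
        state = {}
        for field_name, value in self.sent_events:
            state[field_name] = value
        return state


recorder = SessionRecorder()
typed = ""
for ch in "diabetes":
    typed += ch
    recorder.on_input("condition", typed)

# The user closes the tab here; no submit ever happens.
print(len(recorder.sent_events))  # 8
print(recorder.replay())          # {'condition': 'diabetes'}
```

Because the recording is a stream of events rather than a single form submission, whoever holds the events can reconstruct what was typed at any point in the session.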
In the video below, you can see what a session replay script from the company FullStory can record:
Most troubling is that the information session replay scripts collect can’t “reasonably be expected to be kept anonymous,” according to the researchers. Some of the companies that provide this software, like FullStory, design tracking scripts that even allow website owners to link the recordings they gather to a user’s real identity. On the backend, companies can see that a user is connected to a specific email or name. FullStory did not return a request for comment.
To conduct their study, Steven Englehardt, Gunes Acar, and Arvind Narayanan looked at seven of the most popular session replay companies: FullStory, SessionCam, Clicktale, Smartlook, UserReplay, Hotjar, and Yandex, Russia’s most popular search engine. They set up test pages and installed session replay scripts on them from six of the seven companies. Their findings indicated that scripts from at least one of these companies are being used by 482 of the world’s top 50,000 sites, as ranked by Alexa.
Prominent companies that use the scripts include men’s retailer Bonobos.com, Walgreens.com, and the financial investment firm Fidelity.com. It’s also worth noting that 482 might be a low estimate. The scripts likely don’t record every user who visits a site, the researchers told me, so during testing they probably failed to detect some scripts that were not activated. You can see all the popular websites that utilize session replay scripts documented by the researchers here.
Since the Princeton researchers released their research, both Bonobos and Walgreens said they would stop using session replay scripts. “We take the protection of our customers’ data very seriously and are investigating the claims made in the study that was published yesterday. As we look into the concerns that were raised, and out of an abundance of caution, we have stopped sharing data with FullStory,” a spokesperson from Walgreens told me in an email last Thursday.
Bonobos did not return a request for comment, but the company told Wired that it “eliminated data sharing with FullStory in order to evaluate our protocols and operations with respect to their service. We are continually assessing and strengthening systems and processes in order to protect our customers’ data."
Fidelity did not say it would stop using session replay scripts. “We don’t comment on relationship (sic) we have with vendors or companies but one of our highest priorities is the protection of customer information,” a spokesperson said in a statement.
Companies that sell replay scripts do offer a number of redaction tools that allow websites to exclude sensitive content from recordings, and some even explicitly forbid the collection of user data. Still, the use of session replay scripts by so many of the world’s most popular websites has serious privacy implications.
“Collection of page content by third-party replay scripts may cause sensitive information such as medical conditions, credit card details, and other personal information displayed on a page to leak to the third-party as part of the recording,” the researchers wrote in their post.
Passwords are often accidentally included in recordings, even though the scripts are designed to exclude them. The researchers found that other personal information was also often not redacted, or only partially redacted, at least with some of the scripts. Two of the companies, UserReplay and SessionCam, block all user inputs by default (they just track where users are clicking), which is a far safer approach.
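The difference between the two approaches can be sketched in a few lines. This is an illustration under stated assumptions, not a description of how any of the named companies implement redaction: the rule "mask only fields declared as passwords" is a simplified stand-in for selective redaction, and the field names and values are made up.

```python
MASK = "*"


def redact_by_type(events):
    """Selective redaction (assumed rule): mask only fields declared as type 'password'."""
    return [
        (name, ftype, MASK * len(value) if ftype == "password" else value)
        for name, ftype, value in events
    ]


def block_all_inputs(events):
    """Default-deny: mask every typed value, keeping only the fact that a field was used."""
    return [(name, ftype, MASK * len(value)) for name, ftype, value in events]


def leaked_fields(redacted):
    # A field counts as leaked if any character other than the mask survives.
    return [name for name, _, value in redacted if set(value) != {MASK}]


# Hypothetical input events: (field name, declared field type, typed value).
events = [
    ("password", "password", "hunter2"),
    ("password_visible", "text", "hunter2"),  # e.g. a "show password" text box
    ("card_number", "text", "4111111111111111"),
]

print(leaked_fields(redact_by_type(events)))    # ['password_visible', 'card_number']
print(leaked_fields(block_all_inputs(events)))  # []
```

A selective rule only catches what it was written to expect, so anything that doesn't match slips through. Blocking every input by default fails in the safe direction.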
It’s not just what users input that matters, however. When you log into a website, what’s displayed on the screen can also be sensitive. The researchers found that “none of the companies appear to provide automated redaction of displayed content by default; all displayed content ends up leaking.”
For example, the researchers tested Walgreens.com, which used to run a script from the company FullStory. Although Walgreens did use a number of redaction features offered by FullStory, the researchers found that information like medical conditions and prescriptions was still being collected by the session replay script, along with users’ real names.
Finally, the study’s authors are worried that session replay companies could be vulnerable to targeted hacks, especially because they’re likely high-value targets. For example, many of these companies have dashboards where clients can play back the recordings they collect. But the dashboards from Yandex, Hotjar, and Smartlook serve playback over unencrypted HTTP pages, rather than much more secure, encrypted HTTPS pages.
“This allows an active man-in-the-middle to inject a script into the playback page and extract all of the recording data,” the study authors wrote.
In an emailed statement, a spokesperson for Yandex told me the company tries to use HTTPS wherever it can, and said it is going to update its product soon to no longer use HTTP. "HTTP is used intentionally, as session recordings load websites using iframe. Unfortunately, loading http content from https websites is prohibited on the browser level so http player is required to support http websites for this feature," the statement read.
A spokesperson for Smartlook said something similar in an emailed statement: "Our product team is already aware of this and they are already working on fixing the issue."
Hotjar and UserReplay did not issue a statement in time for publication. SessionCam CEO Kevin Goodings wrote in a blog post that “Everyone at SessionCam can get behind the CITP’s conclusion: ‘Improving user experience is a critical task for publishers. However, it shouldn’t come at the expense of user privacy.’ The whole team at SessionCam lives these values every day. The privacy of your website visitors and the security of your data is of paramount importance to us.” A spokesperson from Clicktale said in an email that the company "takes both customer and end-user privacy extremely seriously, using multiple layers of security and technologies to ensure that data is kept private and secure."
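The browser restriction Yandex refers to is commonly known as mixed-content blocking. A rough sketch of the rule, heavily simplified (real browsers apply more nuanced checks), shows why a playback page served over HTTPS could not embed a client site that is only available over HTTP. The function name and the URLs are hypothetical.

```python
from urllib.parse import urlsplit


def iframe_is_blocked(parent_url: str, frame_url: str) -> bool:
    """Simplified mixed-content rule: an HTTPS page may not embed an HTTP frame."""
    parent_scheme = urlsplit(parent_url).scheme
    frame_scheme = urlsplit(frame_url).scheme
    return parent_scheme == "https" and frame_scheme == "http"


# Hypothetical dashboard and client-site URLs.
print(iframe_is_blocked("https://dashboard.example/replay", "http://shop.example/checkout"))   # True
print(iframe_is_blocked("http://dashboard.example/replay", "http://shop.example/checkout"))    # False
print(iframe_is_blocked("https://dashboard.example/replay", "https://shop.example/checkout"))  # False
```

Serving the player over HTTP sidesteps the block, but it is also what exposes the playback page to the man-in-the-middle attack the researchers describe.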
It’s not just session replay scripts that are following you around the internet. A study published earlier this year found that nearly half of the world’s 1,000 most popular websites use the same tracking software to monitor your behavior in various ways.
If you want to block session replay scripts, the popular ad-blocking tool Adblock Plus will now protect you against all of the ones documented in the Princeton study. Adblock Plus previously protected against only some of them, but it has been updated to block all of them as a result of the researchers’ work.
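In broad strokes, this kind of blocking works by refusing to load scripts from listed domains. The sketch below illustrates the general idea only. It does not use Adblock Plus's real filter syntax or its real list, and the domains in it are placeholders.

```python
from urllib.parse import urlsplit

# Placeholder domains for illustration; not taken from any real filter list.
BLOCKED_DOMAINS = {"replay-vendor.example", "tracker.example"}


def is_blocked(script_url: str) -> bool:
    """Return True if the script is hosted on a blocked domain or one of its subdomains."""
    host = (urlsplit(script_url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


print(is_blocked("https://cdn.replay-vendor.example/rec.js"))  # True
print(is_blocked("https://notreplay-vendor.example/rec.js"))   # False
print(is_blocked("https://shop.example/app.js"))               # False
```

Matching on the full domain or a dot-separated subdomain, rather than on any substring, keeps unrelated sites with similar names from being blocked by accident.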
Update 11/20/17 10:30 AM: This story has been updated with comment from Yandex.
Update 11/21/17 9:33 AM: This story has been updated with comment from SmartLook.
Update 11/21/17 3:45 PM: This story has been updated with comment from Clicktale.