Thousands of websites routinely leak identifier data to third-party data brokers, advertisers, and marketing companies, new research has revealed. These entities track users through data that hasn’t even been submitted to the websites. Here’s the scale at which they do it.
Most Internet users know that the data they fill out in web forms are collected by data brokers, advertisers, and marketing companies to track them. But what about the data entered in forms, sometimes auto-filled by websites but not submitted? Here’s news for you: these are captured and carted off to buyers too!
A study conducted by researchers from Radboud University, KU Leuven, and the University of Lausanne revealed that thousands of websites leak information to third-party trackers before site visitors even hit the ‘Enter’ or the signup button.
The findings suggest that thousands of popular websites, notably e-commerce sites, leaked user data to third parties even as users were still typing in their information. This is alarmingly similar to how a keylogger operates: it logs everything the target user types on their device. “If there’s a Submit button on a form, the reasonable expectation is that it does something – that it will submit your data when you click it,” Güneş Acar, professor and researcher in the digital security group at Radboud University, told Wired.
Advertisers, data brokers, and marketing and analytics companies use email addresses or derived identifiers for cross-site, cross-platform, and persistent identification of individuals and, by extension, to monetize content for their clients. The use of scripts that monitor and capture keystrokes while the user is filling out a form has previously been reported by Gizmodo.
Kunal Modasiya, the senior director of product management at PerimeterX, told Toolbox, “Third-party web trackers that run on websites have the same level of resource access as first-party scripts, i.e., they can interact with any sensitive fields and exfiltrate the data even before the user submits.”
“Client-side supply chain attacks can cause tremendous damage to a brand’s reputation and its ability to comply with growing data privacy regulations, including GDPR and CCPA,” he added.
Findings of the Leaky Forms Study
Number of sites leaking data or allowing exfiltration
The study sampled the 100,000 highest-ranking websites. Of the 2.8 million web pages analyzed using a crawler based on DuckDuckGo’s Tracker Radar Collector tool, 1,844 websites allowed trackers to exfiltrate email addresses irrespective of the ‘submit status’ of the entered data when visited from Europe.
When the researchers visited the same websites from the U.S., they found 2,950 websites that allowed data exfiltration, 60% more than in Europe. The number of sites allowing exfiltration was less than 5% of the sampled sites but higher than the researchers expected.
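Both comparisons follow directly from the reported counts. A quick check, using only the figures quoted in this article (the function and variable names are illustrative):

```python
# Counts reported in the study, as quoted in this article.
SAMPLED_SITES = 100_000
LEAKY_EU = 1_844
LEAKY_US = 2_950


def percent_more(larger: int, smaller: int) -> float:
    """Relative increase of `larger` over `smaller`, in percent."""
    return (larger - smaller) / smaller * 100


def share_of_sample(count: int, total: int = SAMPLED_SITES) -> float:
    """Share of the sampled sites, in percent."""
    return count / total * 100


print(f"US vs. EU: {percent_more(LEAKY_US, LEAKY_EU):.0f}% more leaky sites")
print(f"US share of sample: {share_of_sample(LEAKY_US):.2f}%")
print(f"EU share of sample: {share_of_sample(LEAKY_EU):.2f}%")
```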
“We were super surprised by these results. We thought maybe we were going to find a few hundred websites where your email is collected before you submit, but this exceeded our expectations by far,” Acar added.
The difference in the number of leaky sites is likely due to the enforcement of GDPR in Europe. An email address, an IP address, a tracking cookie, an identification number, and an online identifier are “almost always” considered personal data under GDPR. So are hashed or encrypted email addresses, so long as they contain a unique identifier that can be linked to a person. The U.S. has no comparable federal privacy law yet.
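To see why a hashed email address can still act as an identifier, consider the minimal sketch below. It assumes a tracker normalizes the address (trimming whitespace and lowercasing) before hashing it with SHA-256. The normalization step and the helper name `hashed_identifier` are illustrative assumptions, not taken from the study or from any vendor’s code, and the address is a placeholder.

```python
import hashlib


def hashed_identifier(email: str) -> str:
    """Hypothetical helper: normalize an email address, then SHA-256 it."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# The same person typing the same address on two unrelated sites,
# with different casing and whitespace, yields the same digest.
on_site_a = hashed_identifier("Jane.Doe@example.com")
on_site_b = hashed_identifier("  jane.doe@example.com ")

print(on_site_a == on_site_b)  # True: the digest is a stable cross-site key
print(len(on_site_a))          # 64 hexadecimal characters
```

Because the digest is deterministic, it can be matched across sites without the plain address ever being stored, which is why hashing alone does not make the value anonymous.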
Disturbingly, the study revealed that third-party session replay scripts collected even passwords in the same fashion as emails across 52 websites. All of these sites have since fixed the issue following disclosure by the four researchers.
Top website categories where data exfiltration is taking place
The fashion/beauty sector accounted for the highest share of sites from which user data was exfiltrated to third-party tracker domains, both in the U.S. and the EU. Online shopping followed closely at #2. For the remaining categories, the ranking varies by geography. The following table lists the top 10 site categories leaking user data to tracker domains in the U.S.:
Categories | EU/US Sites | US Filled sites | US Leaky sites | US (Leaky / Filled) |
Fashion/Beauty | 1669 | 1179 | 224 | 19% |
Online Shopping | 5395 | 3744 | 567 | 15% |
Recreation/Hobbies | 1098 | 760 | 95 | 13% |
General News | 7390 | 2855 | 162 | 6% |
Blogs/Wiki | 5415 | 3848 | 392 | 10% |
Business | 13462 | 3055 | 237 | 8% |
Marketing/Merchandising | 4964 | 7924 | 484 | 6% |
Travel | 2519 | 3218 | 192 | 6% |
Software/Hardware | 4933 | 1379 | 82 | 6% |
Sports | 1910 | 2855 | 162 | 6% |
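The last column is the share of filled sites that leaked. As a sanity check, the sketch below recomputes it for the first three rows, using the figures exactly as printed in the table (the data structure and function name are illustrative):

```python
# (category, US filled sites, US leaky sites) as printed in the table.
rows = [
    ("Fashion/Beauty", 1179, 224),
    ("Online Shopping", 3744, 567),
    ("Recreation/Hobbies", 760, 95),
]


def leaky_share(filled: int, leaky: int) -> int:
    """Leaky sites as a percentage of filled sites, rounded half up."""
    return int(leaky / filled * 100 + 0.5)


for category, filled, leaky in rows:
    print(f"{category}: {leaky_share(filled, leaky)}%")
```

Explicit half-up rounding is used because Python’s built-in `round()` rounds halves to the nearest even number and would turn the 12.5% for Recreation/Hobbies into 12 rather than the 13 shown in the table.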
Tracker domains collecting email addresses
In the U.S., the tracker domains that collect emails from the most websites belong to LiveRamp, Taboola, Bounce Exchange, Adobe, and Awin. A breakdown is given in the table below.
Entity Name | Tracker Domain | Num. of sites | Prom. | Min. Rank |
LiveRamp | rlcdn.com | 524 | 553.8 | 217 |
Taboola | taboola.com | 383 | 499 | 95 |
Bounce Exchange | bouncex.net | 189 | 224.7 | 191 |
Adobe | bizible.com | 191 | 212 | 242 |
Awin | zenaps.com | 119 | 212 | 196 |
Awin | awin1.com | 118 | 111.2 | 196 |
FullStory | fullstory.com | 230 | 105.6 | 1311 |
Listrak | listrakbi.com | 226 | 66 | 1403 |
LiveRamp | pippio.com | 138 | 65.1 | 567 |
SmarterHQ | smarterhq. | 32 | 63.8 | 556 |
In the EU, the top five tracker domains collecting emails belonged to Taboola, Adobe, FullStory, Awin, and Yandex.
Meta and TikTok were the two biggest companies among the tracker domains that collected user data, mainly emails, in this manner. Both companies implement Automatic Advanced Matching, a feature documented to collect hashed personal identifiers from web forms only when the form is submitted.
However, the researchers discovered that both Meta and TikTok collected hashed personal data when the user clicks links or buttons that in no way resemble a submit button. “In fact, Meta and TikTok scripts don’t even try to recognize submit buttons, or listen to (form) submit events,†they noted.
Tracker domains collecting passwords
The top tracker domain collecting passwords, both in the EU and the U.S., belonged to Russia’s tech giant Yandex.
Tracker domains collecting passwords in the U.S.:
Entity Name | Tracker Domain | Num. of sites | Prom. | Min. Rank |
Yandex | yandex.ru | 45 | 17.23 | 1688 |
Mixpanel | mixpanel.com | 1 | 0.12 | 84547 |
LogRocket | lr-ingest.io | 1 | 0.12 | 82766 |
Tracker domains collecting passwords in the EU were yandex.com, yandex.ru, mixpanel.com, and lr-ingest.io.
Top ten U.S. websites leaking email addresses to tracker domains
Rank | Website | Third-party | Hash/encoding/compression |
95 | issuu.com | taboola.com | Hash (SHA-256) |
128 | businessinsider.com | taboola.com | Hash (SHA-256) |
154 | usatoday.com | taboola.com | Hash (SHA-256) |
191 | time.com | bouncex.net | Compression (LZW) |
196 | udemy.com | awin1.com, zenaps.com | Hash (SHA-256 with salt) |
217 | healthline.com | rlcdn.com | Hash (MD5, SHA-1, SHA-256) |
234 | foxnews.com | rlcdn.com | Hash (MD5, SHA-1, SHA-256) |
242 | trello.com | bizible.com | Encoded (URL) |
278 | theverge.com | rlcdn.com | Hash (MD5, SHA-1, SHA-256) |
288 | webmd.com | rlcdn.com | Hash (MD5, SHA-1, SHA-256) |
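The last column distinguishes encoding from hashing, and the distinction matters: an encoded value can be decoded by anyone who receives it, while a salted hash is one-way yet still reproducible by whoever knows the salt. The sketch below uses Python’s standard library with a placeholder address. The salt value and the way it is combined with the address are assumptions made for illustration, not a reproduction of any tracker’s code.

```python
import hashlib
from urllib.parse import quote, unquote

email = "jane.doe@example.com"  # placeholder address

# URL encoding is a reversible transformation, not protection.
encoded = quote(email, safe="")
decoded = unquote(encoded)

# Salted SHA-256: one-way, but deterministic for a given salt.
salt = "illustrative-salt"  # hypothetical value


def salted_digest(value: str) -> str:
    """Assumed scheme: prepend the salt, then hash with SHA-256."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


print(encoded)                                      # jane.doe%40example.com
print(decoded == email)                             # True
print(salted_digest(email) == salted_digest(email)) # True
```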
In the EU, the top five websites leaking data to third-party tracker domains were usatoday.com, trello.com, independent.co.uk, shopify.com, and marriott.com.
For additional details, see the Leaky Forms study.
“Measuring the effect of consent choices on the exfiltration, we found their effect to be minimal. Based on our findings, users should assume that the personal information they enter into web forms may be collected by trackers, even if the form is never submitted,” the researchers concluded.
“Considering its scale, intrusiveness and unintended side-effects, the privacy problem we investigate deserves more attention from browser vendors, privacy tool developers, and data protection agencies.”
Modasiya further advised organizations to implement “comprehensive real-time visibility and control into their site’s client-side supply chain attack surface, to identify vulnerabilities and anomalous behavior.” This way, they can guard against the client-side supply chain attacks that third-party trackers make possible.
“Additionally, they need to employ a comprehensive mitigation strategy that helps proactively mitigate compliance risk. This includes blocking the specific action of the third-party tracker without removing it from their website so they can access approved fields for legitimate purposes.”