Mass Surveillance by Eavesdropping on Web Cookies
Interesting research:
Abstract: We investigate the ability of a passive network observer to leverage third-party HTTP tracking cookies for mass surveillance. If two web pages embed the same tracker which emits a unique pseudonymous identifier, then the adversary can link visits to those pages from the same user (browser instance) even if the user’s IP address varies. Using simulated browsing profiles, we cluster network traffic by transitively linking shared unique cookies and estimate that for typical users over 90% of web sites with embedded trackers are located in a single connected component. Furthermore, almost half of the most popular web pages will leak a logged-in user’s real-world identity to an eavesdropper in unencrypted traffic. Together, these provide a novel method to link an identified individual to a large fraction of her entire web history. We discuss the privacy consequences of this attack and suggest mitigation strategies.
Cookies that give you away: The surveillance implications of web tracking
Over the past three months we’ve learnt that the NSA uses third-party tracking cookies for surveillance (1, 2). These cookies, provided by a third-party advertising or analytics network (e.g. doubleclick.com, scorecardresearch.com), are ubiquitous on the web, and tag users’ browsers with unique pseudonymous IDs. In a new paper, we study just how big a privacy problem this is. We quantify what an observer can learn about a user’s web traffic by purely passively eavesdropping on the network, and arrive at surprising answers.
At first sight it doesn’t seem possible that eavesdropping alone can reveal much. First, the eavesdropper on the Internet backbone sees millions of HTTP requests and responses. How can he associate the third-party HTTP request containing a user’s cookie with the request to the first-party web page that the browser visited, which doesn’t contain the cookie? Second, how can visits to different first parties be linked to each other? And finally, even if all the web traffic for a single user can be linked together, how can the adversary go from a set of pseudonymous cookies to the user’s real-world identity?
The diagram illustrates how the eavesdropper can use multiple third-party cookies to link traffic. When a user visits ‘www.exampleA.com,’ the response contains the embedded tracker X, with an ID cookie ‘xxx’. The visits to exampleA and to X are tied together by IP address, which typically doesn’t change within a single page visit [1]. Another page visited by the same user might embed tracker Y bearing the pseudonymous cookie ‘yyy’. If the two page visits were made from different IP addresses, an eavesdropper seeing these cookies can’t tell that the same browser made both visits. If a third page, however, embeds both trackers X and Y, then the eavesdropper will know that IDs ‘xxx’ and ‘yyy’ belong to the same user. This method applied iteratively has the potential of tying together a lot of the traffic of a single user.
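The linking step described above amounts to finding connected components in a graph whose nodes are page visits and cookie IDs. The following is a minimal Python sketch of that idea, not the code used in the paper. It assumes the eavesdropper has already paired each first-party page load with the tracker cookies seen alongside it (for example by IP address within a single page visit). The function name cluster_visits, the (tracker, id) tuple shape, and the fourth example site are illustrative assumptions.

```python
from collections import defaultdict


class DisjointSet:
    """Union-find over hashable items, with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        self.parent.setdefault(item, item)
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        # Point every node on the walked path directly at the root.
        while self.parent[item] != root:
            self.parent[item], item = root, self.parent[item]
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_visits(visits):
    """Group page visits that share a cookie, directly or transitively.

    `visits` is a list of (page, cookies) pairs; `cookies` is a set of
    hypothetical (tracker, id) tuples observed with that page load.
    Returns a sorted list of sorted page lists, one list per cluster.
    """
    links = DisjointSet()
    for index, (_page, cookies) in enumerate(visits):
        visit_node = ("visit", index)
        links.find(visit_node)  # register visits that carry no cookies too
        for cookie in cookies:
            links.union(visit_node, ("cookie", cookie))

    clusters = defaultdict(list)
    for index, (page, _cookies) in enumerate(visits):
        clusters[links.find(("visit", index))].append(page)
    return sorted(sorted(pages) for pages in clusters.values())


# Toy data mirroring the diagram: exampleC embeds both trackers X and Y.
example_visits = [
    ("www.exampleA.com", {("X", "xxx")}),
    ("www.exampleB.com", {("Y", "yyy")}),
    ("www.exampleC.com", {("X", "xxx"), ("Y", "yyy")}),
    ("www.exampleD.com", {("X", "zzz")}),  # a different browser's cookie
]
print(cluster_visits(example_visits))
```

In this toy input, exampleA, exampleB and exampleC end up in one cluster because exampleC carries both ‘xxx’ and ‘yyy’, while exampleD stays separate. Dropping the exampleC visit leaves exampleA and exampleB unlinked, which is the situation the text describes for visits made from different IP addresses.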
Once we had this idea, we wanted to test if it would
actually work in practice. Everything depends on just how densely
third-party trackers are actually embedded on sites. We conducted
automated web crawls of 65 simulated users’ web browsing over three
months, and found that unique cookies are so prevalent that the
eavesdropper can reliably link 90% of a user’s web page visits to the
same pseudonymous ID. (We omitted pages that embed no ID cookies at all,
but those are a minority.)
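To make the measured quantity concrete: the figure is the share of cookie-bearing page visits that fall into the largest linked cluster. The sketch below is one way such a fraction could be computed on toy data, assuming each visit is represented as a set of cookie ID strings. It is an illustration only; the function name largest_linked_fraction and the input format are assumptions, and it does not reproduce the crawl or the results.

```python
from collections import defaultdict, deque


def largest_linked_fraction(visits):
    """Return the share of cookie-bearing visits in the largest cluster.

    `visits` is a list of sets of cookie ID strings, one set per page
    visit. Visits with no ID cookies are left out of the denominator.
    """
    tracked = [cookies for cookies in visits if cookies]
    if not tracked:
        return 0.0

    # Index each cookie to the visits that carried it.
    by_cookie = defaultdict(list)
    for index, cookies in enumerate(tracked):
        for cookie in cookies:
            by_cookie[cookie].append(index)

    seen = set()
    largest = 0
    for start in range(len(tracked)):
        if start in seen:
            continue
        # Breadth-first search over visits reachable via shared cookies.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            current = queue.popleft()
            size += 1
            for cookie in tracked[current]:
                for neighbour in by_cookie[cookie]:
                    if neighbour not in seen:
                        seen.add(neighbour)
                        queue.append(neighbour)
        largest = max(largest, size)
    return largest / len(tracked)


# Five toy visits: three are linked, one is isolated, one has no cookies.
toy_visits = [{"x1"}, {"y1"}, {"x1", "y1"}, {"z9"}, set()]
print(largest_linked_fraction(toy_visits))
```

Here three of the four cookie-bearing visits are linked through the visit that carries both "x1" and "y1", so the function returns 0.75.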
We also found that the cookie linking method is extremely
robust and succeeds under a variety of conditions (Section 4.1). We
considered how variations in cookie expiration dates, the size of the
user’s history (i.e., the number of pages visited), and the types of
pages visited affect the eavesdropper’s chances, and found the impact to
be minimal. Perhaps most significantly, however, we found that this
surveillance method can still link about 50% of a user’s history to the
same pseudonymous ID even with just 25% of the current density of
trackers on the web. This means that even if 75% of sites or trackers
adopt mitigation strategies (such as deploying HTTPS), the eavesdropper
still learns a lot.
Finally, we studied how an eavesdropper might learn the
real-world identity behind a cluster of web pages associated with a
pseudonymous ID. It turns out that this is surprisingly easy: many
sites display real-world attributes such as real name, username, or
email on unencrypted pages to logged-in users, which means that the
eavesdropper gets to see these identifiers. We conducted a survey of
such leakage on popular sites, and found that over half of popular sites
with account creation leak some form of real-world identity (Section
4.2).
While it’s no surprise that web traffic contains sensitive
information about individuals, what we’ve shown is just how complete a
profile can be extracted even if the user’s traffic is mixed with
that of millions of other users. Further, an eavesdropper can connect these
profiles to real-world identities without needing the co-operation of
any websites. While HTTPS deployment by trackers can help, the only
practical solution at the current time seems to be for users to install
anti-tracking and anonymity tools.