CAT | Anonymity
Tor just announced that they have detected and blocked an attack that may have allowed hidden services and possibly users to be de-anonymized.
It looks like this may be connected to the recently canceled BlackHat talk on Tor vulnerabilities. One hopes so, otherwise the attack may have been more hostile than simple research.
Tor is releasing updated server and client code to patch the vulnerability used in this attack. This shows once again one of the key architectural weaknesses in Tor, the distributed volunteer infrastructure. On the one hand, it means that you are not putting all of your trust in one entity. On the other hand, you really don’t know who you are trusting, and anyone could be running the nodes you are using. Many groups hostile to your interests would have good reason to run Tor nodes and to try to break your anonymity.
The announcement from Tor is linked below.
Thanks to WhoIsHostingThis for providing this informative infographic (click to enlarge). They provide a cool service that allows you to look up the hosting service behind any website.
- A decision giving Canadians more rights to Anonymity
- Iraq’s recent blocking of social media and more
- Iran’s outright criminalization of social media
- A court decision requiring warrants to access cell tower location data
- Another court stating that irrelevant seized data needs to be deleted after searches
- A massive failure of data anonymization in New York City
- A court requiring a defendant to decrypt his files so they can be searched
- The Supreme Court ruling protecting cellphones from warrantless search.
- Phone tracking streetlights in Chicago
- And a small change for iPhones bringing big privacy benefits
The Importance of Privacy & The Power of Anonymizers: A Talk With Lance Cottrell From Ntrepid — The Social Network Station A recent interview I did, talking about data anonymization and mobile device privacy. Lance Cottrell is the Founder and Chief Scientist of Anonymizer. Follow me on Facebook, Twitter, and Google+.
One often hears that some massive collection of data will not have privacy implications because it has been “anonymized”. Any time you hear that, treat the statement with great skepticism. It turns out that effectively anonymizing data, making it impossible to identify the individuals in the data set, is much harder than you might think. The reason comes down to combinatorics and structured information.
This article on Medium by Vijay Pandurangan discusses a massive data set of NYC taxies, complete with medallion number, license number, time and location of every pick up and drop off, and more. The key to unraveling it is that there are just not that many taxi medallions, and the numbering structure only allows for a manageable possible number of combinations (under 24 million). While that would be a lot to work through by hand, Vijay was able to hash and identify every single one in the database in under 2 minutes.
Another approach would have been to make a set of known trips, note the location, time, etc., then use that to map the hash to the true identity. More work but very straight forward.
Even harder is the problem of combinatorics when applied to “non-identifying” data. One will often see birth date (or partial birth date) zip code, gender, age, and the like treated as non-identifying. Just five digit Zip-code, date of birth, and gender will uniquely identify people 63% of the time.
A study of cell phone location data showed that just 4 location references was enough to uniquely identify individuals.
This is a great resource on all kinds of de-anonymization.
The reality is that, once enough is collected is is almost certainly identifiable. Aggregation provides the best anonymization, where individual records represent large groups of people rather than individuals.
Update: small edit for clarification of my statement about aggregation.