TAG | anonymity
Tor just announced that they have detected and blocked an attack that may have allowed hidden services and possibly users to be de-anonymized.
It looks like this may be connected to the recently canceled Black Hat talk on Tor vulnerabilities. One hopes so; otherwise the attack may have been more hostile than simple research.
Tor is releasing updated server and client code to patch the vulnerability used in this attack. This shows once again one of the key architectural weaknesses in Tor: its distributed volunteer infrastructure. On the one hand, it means that you are not putting all of your trust in one entity. On the other hand, you really don’t know who you are trusting, and anyone could be running the nodes you are using. Many groups hostile to your interests would have good reason to run Tor nodes and to try to break your anonymity.
The announcement from Tor is linked below.
The Russian Ministry of Internal Affairs recently announced a contest to create a method to identify Tor users, with a prize of about $114,000.
Clearly the government is worried about the ability of Tor to allow people to bypass the increasingly draconian Internet laws that have been put in place. This puts a big target on Tor, but people have been working on breaking Tor for years. This year a talk at Black Hat on cracking Tor anonymity was pulled without explanation after it was announced and scheduled.
Being free and well established, Tor has the largest user base of any privacy service, so it is the obvious first target. Its distributed design also introduces paths for attack not available in other designs like Anonymizer Universal.
It will be interesting to see if this move drives Tor users to other services, and whether that in turn leads to expanded efforts to crack those tools.
Thanks to WhoIsHostingThis for providing this informative infographic. They provide a cool service that allows you to look up the hosting service behind any website.
The Importance of Privacy & The Power of Anonymizers: A Talk With Lance Cottrell From Ntrepid - The Social Network Station
A recent interview I did, talking about data anonymization and mobile device privacy.
Lance Cottrell is the Founder and Chief Scientist of Anonymizer. Follow me on Facebook, Twitter, and Google+.
One often hears that some massive collection of data will not have privacy implications because it has been “anonymized”. Any time you hear that, treat the statement with great skepticism. It turns out that effectively anonymizing data, making it impossible to identify the individuals in the data set, is much harder than you might think. The reason comes down to combinatorics and structured information.
This article on Medium by Vijay Pandurangan discusses a massive data set of NYC taxi trips, complete with medallion number, license number, time and location of every pickup and drop-off, and more. The medallion and license numbers had been obscured by hashing, but there are just not that many taxi medallions, and the numbering structure only allows for a manageable number of possible combinations (under 24 million). While that would be a lot to work through by hand, Vijay was able to hash every candidate and identify every single one in the database in under 2 minutes.
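To make the scale of the problem concrete, here is a minimal sketch of that kind of exhaustive lookup. It assumes an unsalted MD5 hash and a made-up ID format of digit, letter, digit, digit (such as 7B42); both are assumptions for illustration, not details taken from the data set itself.

```python
import hashlib
from itertools import product
from string import ascii_uppercase, digits


def candidate_ids():
    """Yield every ID in the hypothetical digit-letter-digit-digit format."""
    for d1, letter, d2, d3 in product(digits, ascii_uppercase, digits, digits):
        yield f"{d1}{letter}{d2}{d3}"


def build_reverse_table():
    """Map the hash of each candidate ID back to the ID itself."""
    return {hashlib.md5(cid.encode()).hexdigest(): cid for cid in candidate_ids()}


table = build_reverse_table()

# What a "hashed" record would show for the made-up ID 7B42.
published_value = hashlib.md5(b"7B42").hexdigest()

# The lookup recovers the original ID from the published hash.
print(len(table), table[published_value])
```

This toy format has only 26,000 possible values, so the whole table is built almost instantly. A space of a few tens of millions is larger, but still well within reach of an ordinary laptop.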
Another approach would have been to make a set of known trips, note the location, time, etc., then use that to map the hash to the true identity. That takes more work but is very straightforward.
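That known-trips approach can be sketched as a simple join on time and place. Everything below is hypothetical: the row layout, the short stand-in hash strings, and the observed trip are invented for illustration.

```python
# Hypothetical released rows: (hashed_id, pickup_time, pickup_place).
released = [
    ("a3f1", "2013-05-01 08:15", "Penn Station"),
    ("9c2e", "2013-05-01 08:15", "JFK"),
    ("a3f1", "2013-05-01 09:40", "Times Square"),
]

# Trips the observer took or watched, with the real ID read off the cab.
observed = [("7B42", "2013-05-01 08:15", "Penn Station")]


def link(released_rows, observed_trips):
    """Return {hashed_id: real_id} for observed trips that match exactly one hash."""
    index = {}
    for hashed, when, where in released_rows:
        index.setdefault((when, where), set()).add(hashed)

    mapping = {}
    for real_id, when, where in observed_trips:
        matches = index.get((when, where), set())
        if len(matches) == 1:  # only accept unambiguous matches
            mapping[next(iter(matches))] = real_id
    return mapping


mapping = link(released, observed)
print(mapping)
```

Once a single trip ties a hash to a real ID, every other row carrying that hash is exposed as well.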
Even harder is the problem of combinatorics when applied to “non-identifying” data. One will often see birth date (or partial birth date), ZIP code, gender, age, and the like treated as non-identifying. Yet five-digit ZIP code, date of birth, and gender alone will uniquely identify people 63% of the time.
A study of cell phone location data showed that just 4 location references were enough to uniquely identify individuals.
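A small sketch shows how the combination does the damage even when each field alone looks harmless. The records below are made up, and `unique_fraction` is a hypothetical helper, not part of any library.

```python
from collections import Counter

# Made-up records: (zip_code, date_of_birth, gender). No names anywhere.
records = [
    ("92101", "1975-03-14", "F"),
    ("92101", "1975-03-14", "F"),
    ("92101", "1982-11-02", "M"),
    ("92103", "1975-03-14", "F"),
    ("92103", "1990-07-21", "M"),
]


def unique_fraction(rows, columns):
    """Fraction of rows whose values in the given columns appear exactly once."""
    keys = [tuple(row[i] for i in columns) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)


print("gender only:  ", unique_fraction(records, [2]))
print("zip only:     ", unique_fraction(records, [0]))
print("all three:    ", unique_fraction(records, [0, 1, 2]))
```

In this toy set, neither gender nor ZIP code alone singles anyone out, yet the three fields together make most of the rows unique.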
This is a great resource on all kinds of de-anonymization.
The reality is that once enough data is collected, it is almost certainly identifiable. Aggregation provides the best anonymization, where individual records represent large groups of people rather than individuals.
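As a rough illustration of aggregation, the sketch below coarsens each record into a broad bucket and publishes only the buckets that contain at least `k` people. The bucket choices (three-digit ZIP prefix, birth decade), the threshold, and the sample rows are all assumptions made for the example.

```python
from collections import Counter

# Made-up individual rows: (zip_code, birth_year).
rows = [
    ("92101", 1975),
    ("92103", 1978),
    ("92109", 1971),
    ("92101", 1982),
    ("10001", 1990),
]


def aggregate(individual_rows, k=3):
    """Count rows per (ZIP prefix, birth decade) and drop groups smaller than k."""
    groups = Counter(
        (zip_code[:3], birth_year // 10 * 10)
        for zip_code, birth_year in individual_rows
    )
    return {group: count for group, count in groups.items() if count >= k}


published = aggregate(rows)
print(published)
```

Each published record now stands for a group rather than a person, and the small groups that would still point at one or two individuals are suppressed.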
Update: small edit for clarification of my statement about aggregation.