The Importance of Privacy & The Power of Anonymizers: A Talk With Lance Cottrell From Ntrepid (The Social Network Station). A recent interview I did, talking about data anonymization and mobile device privacy. Lance Cottrell is the Founder and Chief Scientist of Anonymizer.
One often hears that some massive collection of data will not have privacy implications because it has been “anonymized”. Any time you hear that, treat the statement with great skepticism. It turns out that effectively anonymizing data, making it impossible to identify the individuals in the data set, is much harder than you might think. The reason comes down to combinatorics and structured information.
This article on Medium by Vijay Pandurangan discusses a massive data set of NYC taxi trips, complete with medallion number, license number, time and location of every pick up and drop off, and more. The medallion and license numbers had been replaced with hashes rather than removed. The key to unraveling them is that there are just not that many taxi medallions, and the numbering structure allows only a manageable number of possible combinations (under 24 million). While that would be a lot to work through by hand, Vijay was able to hash every possible value and identify every single one in the database in under 2 minutes.
Another approach would have been to make a set of known trips, note the location, time, etc., then use that to map the hash to the true identity. More work, but very straightforward.
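The exhaustive-hashing attack on the medallion numbers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ID format (one digit, one letter, two digits) is a toy format chosen to keep the example small, MD5 is assumed as the hash, and the function names are made up for this sketch. The point is that when the space of possible identifiers is small, hashing all of them and building a reverse lookup table is trivial.

```python
import hashlib
import itertools
import string


def candidate_ids():
    """Yield every ID in the assumed toy format: digit, letter, digit, digit."""
    for d1, letter, d2, d3 in itertools.product(
        string.digits, string.ascii_uppercase, string.digits, string.digits
    ):
        yield f"{d1}{letter}{d2}{d3}"


def md5_hex(value):
    """Return the hex MD5 digest of an ASCII string (MD5 is an assumption here)."""
    return hashlib.md5(value.encode("ascii")).hexdigest()


def build_lookup():
    """Hash every possible ID once and map each digest back to its ID."""
    return {md5_hex(candidate): candidate for candidate in candidate_ids()}


lookup = build_lookup()

# A "anonymized" value as it might appear in a released data set.
hashed_value = md5_hex("7B42")

# Reversing it is a single dictionary lookup.
recovered = lookup[hashed_value]
print(len(lookup), recovered)
```

The toy format has only 10 x 26 x 10 x 10 = 26,000 possible values, so the table builds almost instantly. A space of a few tens of millions is larger but still well within reach of a single ordinary computer, which is why hashing a small, structured identifier does not anonymize it.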
Even harder is the problem of combinatorics when applied to “non-identifying” data. One will often see birth date (or partial birth date), zip code, gender, age, and the like treated as non-identifying. Just five-digit zip code, date of birth, and gender will uniquely identify people 63% of the time.
A study of cell phone location data showed that just 4 location references were enough to uniquely identify individuals.
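To see how combining "non-identifying" fields singles people out, here is a minimal sketch that measures what fraction of records are unique for a given set of fields. The records are invented toy data, and the field names and the `unique_fraction` helper are assumptions made for illustration; nothing here reproduces the studies mentioned above.

```python
from collections import Counter


def unique_fraction(records, keys):
    """Fraction of records whose combination of `keys` appears exactly once."""
    combos = [tuple(record[k] for k in keys) for record in records]
    if not combos:
        return 0.0
    counts = Counter(combos)
    unique = sum(1 for combo in combos if counts[combo] == 1)
    return unique / len(combos)


# Made-up toy records, for illustration only.
records = [
    {"zip": "92101", "dob": "1970-01-01", "gender": "F"},
    {"zip": "92101", "dob": "1970-01-01", "gender": "M"},
    {"zip": "92101", "dob": "1985-06-15", "gender": "M"},
    {"zip": "92102", "dob": "1985-06-15", "gender": "M"},
    {"zip": "92102", "dob": "1985-06-15", "gender": "M"},
]

print(unique_fraction(records, ["zip"]))                   # 0.0
print(unique_fraction(records, ["gender"]))                # 0.2
print(unique_fraction(records, ["zip", "dob", "gender"]))  # 0.6
```

Each field on its own identifies almost no one in this toy set, yet the three together make most of the records unique. Running the same kind of check on a real data set before release is one rough way to gauge how identifying the "harmless" columns are in combination.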
This is a great resource on all kinds of de-anonymization.
The reality is that, once enough data is collected, it is almost certainly identifiable. Aggregation provides the best anonymization, where individual records represent large groups of people rather than individuals.
Update: small edit for clarification of my statement about aggregation.
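As a rough sketch of what aggregation can look like in practice, the snippet below publishes only per-group counts and drops any group too small to hide an individual. The `aggregate_counts` helper, the `zone` field, and the threshold of 10 are all assumptions chosen for illustration; a suitable threshold depends on the data and the risk involved.

```python
from collections import Counter


def aggregate_counts(records, key, min_group_size=10):
    """Count records per group and suppress groups below `min_group_size`."""
    counts = Counter(record[key] for record in records)
    return {group: n for group, n in counts.items() if n >= min_group_size}


# Made-up toy data: 12 trips in zone A, 3 trips in zone B.
trips = [{"zone": "A"} for _ in range(12)] + [{"zone": "B"} for _ in range(3)]

print(aggregate_counts(trips, "zone"))  # {'A': 12}
```

Zone B is left out of the published result because a count of 3 describes so few people that it could point back to individuals.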
Canada’s Supreme Court just released a ruling providing some protection for on-line anonymity. Specifically, the ruling requires law enforcement to obtain a warrant before going to an Internet provider to obtain the identity of a user. Previously they were free to simply approach the provider and ask for (but not compel) the information.
The judges found that there is a significant expectation of privacy with respect to the identifying information, and that anonymity is a foundation of that right.
Unfortunately the case in question revolves around child pornography, which creates a great deal of passion. Much of the reaction against the decision has come from those working to protect abused children. Because the ruling's implications reach far beyond child pornography cases, I applaud the court for taking the larger and longer view of the principle at work.
It is important to remember that the court is not saying that the information cannot be obtained. This is not an absolute protection of anonymity. This decision simply requires a warrant for the information, ensuring that there is at least probable cause before penetrating the veil of anonymity.
Paying for anonymity is a tricky thing, mostly because on-line payments are strikingly non-anonymous. The default payment mechanism on the Internet is the credit card, which generally requires hard identification. There are anonymous pre-paid cards, but they are getting harder to find, and most pre-paid cards now require registration with real name and (in the US) social security number.
We are working on supporting Bitcoin, which provides some anonymity, but not as much as you might think. New tools for Bitcoin anonymity are being developed, so this situation may improve, and other cryptocurrencies are gaining traction as well.
When it comes to anonymity, cash is still king. Random small US bills are truly anonymous and widely available (a 1996 study showed that over half of all physical US currency circulates outside the country). While non-anonymous payments only allow Anonymizer to know who its customers are, not what they are doing, that information might be sensitive and important to protect for some people.
That is why Anonymizer accepts cash payments for its services. Obviously it is slower and more cumbersome, but for those who need it, we feel it is important to provide the ultimate anonymous payment option. If you are looking at a privacy provider, even if you don’t plan to pay with cash, take a look at whether it is an option. It could tell you something about how seriously they take protecting your privacy overall.
Here is more evidence that, if a service has access to your information, it can get out. In this case the privacy services Whisper and Secret have privacy policies that say they will release messages tied to your identity not only if presented with a court order, but also to enforce their terms of service and even in response to a simple claim of “wrongdoing” (whatever that might mean).
Anonymizer has no logs connecting user activity to user identity, thus we don’t have these problems.