On March 12th, 2013, Mark Graham (my former boss at the Oxford Internet Institute and permanent friend for life), Monica Stephens, Catherine D’Ignazio, and I spoke on a panel at South by Southwest. There, we discussed the importance of geolocation data, its potential future, and the ramifications of an Internet ecosystem rife with metadata about the places we live and work. The whole bit was titled Location! The Importance of Geo-data. The wonderful Erhardt Graeff has already written a blog entry covering the material, which is reproduced below:
Devin Gaffney’s Presentation: What Your Friends Told Me
Devin’s talk presents a study that uses Twitter data to locate a user from their friends’ data. He cites two concepts as background to his project:
Homophily: Similarity breeds connection (McPherson et al. 2001)
Privacy Leakage: Network data shifts the locus of information control away from individuals (Jernigan and Mistree 2009)
Previous studies have shown that it’s possible to predict the sexuality of someone on the MIT campus simply by looking at the percentage of their LGB friends. “You are not actually anonymous by virtue of being on the internet.”
Devin pulled data from Twitter’s geodata stream, in which tweets contain GPS information. He then collected data about the users’ friends and compared the friends’ location data against the users’ own.
Combining a user’s self-reported location (from Google or Yahoo profiles matching their screenname) with where they tweeted from yields an estimate 29 to 185 miles from their actual location. Using friends’ location data, he was able to reduce that error to 3.1 to 3.8 miles. 95% of people could be located using his algorithm, even though they had no location in their bio; they were putting out geographic data in their tweets, probably without knowing it.
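To make the idea concrete, here is a minimal sketch of guessing a user’s location from their friends’ coordinates and measuring the error in miles. It is not Devin’s actual algorithm, which the summary does not describe. The coordinate-wise median, the function names guess_location and haversine_miles, and the sample coordinates are all assumptions chosen for illustration.

```python
import math
from statistics import median

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def guess_location(friend_points):
    """Guess a user's (lat, lon) as the coordinate-wise median of friends' points.

    Assumption: the median stands in for whatever estimator the talk used.
    It ignores outlying friends, but it breaks down for friend sets that
    straddle the 180th meridian.
    """
    if not friend_points:
        raise ValueError("need at least one friend location")
    lats = [p[0] for p in friend_points]
    lons = [p[1] for p in friend_points]
    return (median(lats), median(lons))


# Hypothetical data: three friends around Boston, one far away in London.
friends = [(42.36, -71.06), (42.37, -71.11), (42.34, -71.09), (51.51, -0.13)]
actual = (42.35, -71.07)

guess = guess_location(friends)
error_miles = haversine_miles(guess, actual)
print(f"guess={guess}, error={error_miles:.1f} miles")
```

With this made-up data the single distant friend barely moves the guess, which stays within a few miles of the user. That mirrors the homophily intuition in the talk: most of a person’s friends cluster near where that person actually is.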
Plotting the users’ actual locations against the user tweet data created a very noisy map, whereas the friends’ data map dramatically reduced the noise, leaving only a few bad guesses. It also showed distinct patterns of information flow across geographic paths. Most notable was the dense interconnection of Western Europe and the Eastern Seaboard. Devin suggests that this might reveal social and economic ties that exist in the real world, which is where he hopes to go next with it.
It’s surprising how relatively trivial it seems to infer everything about someone simply because they are a reflection of their friends and others around them. Devin suggests the implication is that privacy only really works when enough people are private. A scarier implication he imagines is the case of someone with a restraining order placed on them who could track down their victim through friends’ data, even if the victim has changed their identity.