What can you learn online? How about where someone is? Or where they live, where they work, where they hang out… One of the more interesting ways to segment social media data is by the contributor's location, but it's a rare feature in social media analysis platforms because it's hard to do well. People are publishing their locations more than they realize, though.
I've heard of two main methods for assigning locations to social media sources. The easier method, which initially sounds more accurate, tracks down the IP address of the associated computer. Every computer on the Internet has one, and in principle the address roughly corresponds to a location. But the goal isn't to find a server; it's to segment online contributors by geography.
Even if addresses matched locations in some mythical past, they're useless for locating people now. Facebook is Facebook, wherever an individual user is. Blogs are hosted by a few big players; even with private domains, there's no guarantee that the web host is anywhere near the user. This blog, for example, is hosted on a machine in Pennsylvania, a long way from where I'm sitting. I have accounts on lots of social media sites, and none of them are hosted here.
So IP addresses might help you locate a computer server, but they're not a reliable indicator of where an individual user of that system may be.
Revealed demographics
The more interesting process, which I've heard about from a handful of social media analysis (SMA) companies, is to extract information revealed by users themselves, linking profiles across services to build a picture of the person. If someone links a blog to accounts on services like Twitter, Facebook, and LinkedIn, the combined profiles say more about that person than any one of them does alone. Location is one of the major components of that picture.
People have lots of opportunities to announce their location in social media, especially in all those member profiles we fill out. The location field in Twitter might be misleading (remember all the people who changed their location to Tehran in a show of support last year?), but if it agrees with Facebook, LinkedIn, or the About Me page on the blog, you have a location.
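The cross-checking idea is simple enough to sketch in a few lines of Python. This is a hypothetical illustration, not any SMA company's actual method: the function names, the crude text normalization, and the rule that at least two profiles must agree are all my assumptions. Real location fields are messy ("NYC" versus "New York, NY"), so a serious system would need geocoding, which this skips entirely.

```python
from collections import Counter
from typing import Optional


def normalize(location: str) -> str:
    """Crude normalization (an assumption): lowercase, trim, collapse whitespace."""
    return " ".join(location.lower().split())


def consensus_location(profiles: dict[str, str], min_agree: int = 2) -> Optional[str]:
    """Return a location only if at least `min_agree` profiles report it.

    `profiles` maps a service name (e.g. "twitter") to the free-text
    location field the user filled in. Empty fields are ignored.
    Returns None when there is no clear winner.
    """
    counts = Counter(
        normalize(loc) for loc in profiles.values() if loc and loc.strip()
    )
    if not counts:
        return None
    ranked = counts.most_common(2)
    location, votes = ranked[0]
    if votes < min_agree:
        return None  # nothing is corroborated by enough sources
    if len(ranked) > 1 and ranked[1][1] == votes:
        return None  # two locations tied: no clear answer
    return location
```

With a Twitter field set to "Tehran" but Facebook and LinkedIn both saying "Boston, MA", the function returns "boston, ma": the lone outlier is outvoted rather than trusted.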
That's without getting into location-based services like Foursquare. Everyone using those is building a personal tracking database on purpose.
Are you uncomfortable yet? At least this is all based on information that people shared intentionally—so far.
Oops, too much information
The New York Times has an article today, "Web Photos That Reveal Secrets, Like Where You Live," which discusses the location metadata now attached to digital photographs. Sarah Perez wrote on the same topic a few weeks ago ("Researchers Warn of Geotagging Dangers - Are You Concerned?"). Cyberstalking, meet cybercasing: how to reveal your home address on Craigslist.
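Part of what makes geotagged photos so revealing is how little work it takes to read them. Cameras and phones that geotag write GPS coordinates into the photo's EXIF metadata as degrees, minutes, and seconds, plus a hemisphere letter. Here is a minimal Python sketch of turning those values into the decimal coordinates a map uses; reading the tags out of an actual image file is left to an EXIF library and not shown, and the function name and sample values are made up for illustration.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref: str) -> float:
    """Convert EXIF-style (degrees, minutes, seconds) to signed decimal degrees.

    `dms` is a 3-sequence of numbers or Fractions; EXIF stores each GPS
    coordinate as three rational values (convert raw (numerator, denominator)
    pairs with Fraction(num, den) first). `ref` is the hemisphere letter from
    GPSLatitudeRef ("N"/"S") or GPSLongitudeRef ("E"/"W").
    """
    degrees, minutes, seconds = (Fraction(x) for x in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # Southern latitudes and western longitudes are negative.
    if ref.upper() in ("S", "W"):
        value = -value
    return float(value)
```

Fed a made-up latitude of 40° 26' 46" N, it returns about 40.4461; a longitude of 79° 58' 56" W comes out negative, about -79.9822. Two numbers like that are enough to drop a pin on a map.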
Both articles emphasize the privacy concerns, as they should. In aggregate, the data creates what Marshall Kirkpatrick calls "your new superpower"; applied to individuals, it's just creepy.
So, how much location information do you want? Where's the line between useful location and demographic data and the creepy or dangerous kind? Finally, whose ethics apply to analyzing this data?
Photo by Silver Smith.
Thanks for pointing out the problems associated with IP-based location determination (we don't use this approach). With more and more users on mobile phones, and with the ability to geotag each tweet, there are better options available now. Even so, confidence in location accuracy remains low in many cases.
Regardless of the source, I believe that the location info should be used in aggregated analysis and should never be personally identifiable.
Drawing the line at personally identifiable information is one good option, which has the advantage of being relatively clear and non-creepy. However, one of the tactics in social media is to identify "influencers," and it's hard to avoid creating profiles of individuals in that process.
Figuring out which standards apply is the subject of one of the posts in my drafts folder. At this point, I don't think there's one right answer.