June 2013 Archives

Writing at Wired UK, Paul Wright has some concerns about the use of social media monitoring in law enforcement: Meet Prism's little brother: Socmint. I'll quote a couple of sections, but you need to read the whole piece; its tone is at least as important as its content.

For the past two years a secretive unit in the Metropolitan Police has been developing the tools for blanket surveillance of the public's social media conversations.

Operating 24 hours a day, seven days a week, a staff of 17 officers in the National Domestic Extremism Unit (NDEU) has been scanning the public's tweets, YouTube videos, Facebook profiles, and anything else UK citizens post in the public online sphere.

The intelligence gathering technique—sometimes known as Social Media Intelligence or Socmint—has been used in conjunction with an alarming array of sophisticated analytical tools. [emphasis added]

Wright has a fairly alarmist—but accurate—take on something that's obvious to anyone who thinks about it: outside of a few protected spaces, what we do in social media is public, and government security and law enforcement agencies are using that data. It's the details of what they do with it that will make some people uncomfortable.

The problem is that public is gaining new depth of meaning as information moves online, and we haven't sorted the implications.

Nothing changes, but everything's changed
The new public information is persistent, searchable, and rich with analytic potential. I wrote about this last year (Why Government Monitoring Is Creepy), and it's still where I think we need to start. People seem to be expecting a sort of semi-privacy online, but the technology doesn't make that distinction. Data is either public or private, and the private space is shrinking.

The "alarming array" of tools refers to all the interesting stuff we've been talking about doing with social media data for years: text analytics, social network analysis, geospatial analysis… For business applications, we've mostly talked about analysis on aggregate data, but if you turn that lens toward profiling individuals and don't care about being intrusive, you can start to justify the concerns.

But several privacy groups and think tanks—including Big Brother Watch, Demos and Privacy International—have voiced concerns that the Met's use of Socmint lacks the proper legislative oversight to prevent abuses occurring.

It's worth noting that Wright's piece is specifically about law enforcement use of social media data, and he points to others who are concerned about overreach by law enforcement agencies.

This is the social data industry's PRISM problem: the risk that the revelations of intelligence agency practices will raise broader privacy concerns that include the business use of public social media data. They're different issues, but the interest sparked by the NSA disclosures has people thinking about privacy.

In this case, Wired makes the connection explicit with their headline, calling social media intelligence "Prism's little brother." As Wright demonstrates in his article, open-source social media monitoring raises issues, too.

Legitimate questions, too
There's more going on here than a question of perception. If invasion of online privacy gains traction as an issue, the important distinction between public and private data is only part of the story. If we limit the topic to public data, the question becomes, what are the limits to the use of public data?

An important part of answering that question will depend on understanding why there should be limits, which goes to what is being done with the data. It's going to be worth separating the concepts of accessing the data and using it. What you do in your analysis may be even more sensitive than the data you base it on.

People are sharing more than they realize, and analysts can do more with that data than people think. As monitoring becomes pattern detection becomes predictive modeling, it becomes more likely to make people uncomfortable. Last year's pregnant daughter is this year's precrime is next year's thoughtcrime, or so the thinking goes.

Will concerns like this lead to new restrictions by governments or the companies who control the data? Will people cut back on their public sharing? Or will these concerns fade when the next topic takes the stage (squirrel!)?

What are the constraints?
The existing limits on social media monitoring and analysis boil down to this: If it is technically possible, not illegal, and potentially useful, do it (depending on your affiliations, professional ethical standards may also apply). What we're seeing is that the unrestricted use of social data has the potential to make people uncomfortable, which could have consequences for those who would use the data.

It's worth thinking about the constraints on using social data, which involves more than the ethics question. I have some thoughts, which I'll share later.

Trust is an issue for an industry based on extracting meaning from what people share in social media. People don't have to use these services, and if they decide that their information might be used against them, they can stop. This week's revelations about US intelligence agencies monitoring social networks (among other sources) create a massive trust issue for everyone who works with social media data. What now?

(This won't be an analysis of the NSA and Prism. I'm working from the same sources you are, and we'll probably have new information by the time I finish writing, anyway.)

The world reacts to US actions
David Meyer points out a threat to cloud computing vendors as customers and governments react to the news. US-based vendors can expect special challenges selling in Europe, where privacy is more protected and signs of a blowback from Prism are already appearing. In cloud computing, the trust issue relates to custody of the data—do you trust your vendor to keep your data safe and secure?—and the government version translates as a question of US-based vendors' ability to keep commitments to foreign governments.

But cloud computing is essentially just data center outsourcing. What does it mean for an industry that exists because of people's willingness to share publicly?

Access to the data is everything
The challenge to the social data industry is different. It's indirect, but potentially existential. What happens to your business as a result of the reaction to Prism? Will social networks tighten their terms of use to block data mining? Will EU safe harbor agreements create new requirements to protect user data (possibly by keeping it outside the US)? Will new legislation designed to limit government abuses include new limits on private-sector users?

Secret collection of private data by government agencies is fundamentally different from social media monitoring outside government. In business, we're working with publicly available data, which anyone can access without breaking the law or hacking a system. It's not espionage, but the facts aren't the problem.

The problem, as ever, is perception. The NSA is all over the news, and in the heated environment of a breaking story, subtle distinctions can get lost. The risk to the social data industry is that a reaction to government surveillance could become a problem for anyone doing the less intrusive type of monitoring.

How will you respond? What's your plan for minimizing the overreaction if it starts to get out of hand?

Responding as an industry
At its Big Boulder conference this week, Gnip announced the Big Boulder Initiative, which is an effort to start an industrywide discussion of the issues it faces. Trust is one of five issues they highlighted as starting points for discussion. The other news this week highlights the wisdom of the choice.

I'd go further and ask a question I've asked before: should the companies who work with social data form an association to coordinate these discussions, codify standards, and speak for the group?

The ethics of social data
Let's go back to trust and consider the ethics of working with social data. Bob Gourley at CTOVision recently gave me a copy of Ethics of Big Data, a short e-book that lays out a process for establishing ethical limits to the use of big data. It's a worthy challenge, but I think the first step in the process—exploring an organization's values—will lose everyone. The Friedmanesque view that a business exists only to make a profit is common, which leaves only the law as a restraint on what can be done. "Be profitable" isn't the sort of value that will drive a hearty discussion of ethics.

I do think it's possible to have ethics of listening, but I don't see an existing standard that really applies. I don't see, for example, how ethical standards for social scientists, with their strict limits on personally identifiable information (PII), apply to social media monitoring in customer service. The standard for competitive intelligence boils down to "don't break the law," which appears to be the relevant limit on secret government programs, too.

Here's a starting point for discussion
I suggested a set of ethical standards for listening vendors in 2010 as a starting point, but the discussion went nowhere. Maybe it's time to try again. Comments are closed on the old post, but I'd welcome any discussion of the draft here.

The usual defense of social media monitoring in the private sector is that we're working with publicly available data, but monitoring public data can still be creepy. What's the plan for protecting public-source data mining from an overreaction to something far more invasive?

Photo by Marsmettnn Tallahassee.

Asking a computer to make sense of everyone's written opinions is a big challenge, but it's not the last one that social media will impose on anyone who wants to analyze it. We're sharing a lot of pictures in our virtual hangouts lately, which means it's time to update the old question. Instead of "what are people saying about us," the new question is something like, "what do people's pictures tell us about what they think of us and how they use our products?"

Just as the shared images give us access to new types of information about people, their tastes, and more, emerging technologies offer the promise of helping us understand the images at scale. To the vocabulary of text analytics or natural language processing, add computer vision. As with its text-processing cousin, it's not as evolved as your eyes, but it doesn't blink, and it doesn't sleep.

Looking at the photo directly
Let's say you want to track publicly shared photos that contain your company's logo. Without image analysis, monitoring depends on keywords in posts and photo descriptions, filenames, tags, and other metadata. It's better than nothing, but it has limitations. You're going to pick up images that don't actually include your logo, and you'll miss photos that include your logo but aren't about your logo.
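To make the limitation concrete, here's a minimal sketch of metadata-only monitoring. All post data and the brand terms are invented for illustration; real monitoring tools work from platform APIs, but the filtering logic looks roughly like this:

```python
# Metadata-only "logo monitoring": flag posts whose text, tags, or
# filename mention the brand. Note what it gets wrong in both directions.

BRAND_TERMS = {"acme", "#acme"}  # hypothetical brand vocabulary

def metadata_match(post):
    """Flag a post if any brand term appears in its metadata."""
    haystack = " ".join([
        post.get("text", ""),
        " ".join(post.get("tags", [])),
        post.get("filename", ""),
    ]).lower()
    return any(term in haystack for term in BRAND_TERMS)

posts = [
    # True hit: mentions the brand and the photo shows the logo.
    {"text": "Lunch with an Acme cola", "tags": [], "filename": "img1.jpg"},
    # False positive: mentions the brand, but no logo in the photo.
    {"text": "Remember Acme?", "tags": [], "filename": "img2.jpg"},
    # False negative: logo is in the photo, invisible to metadata search.
    {"text": "Road trip!", "tags": ["travel"], "filename": "img3.jpg"},
]

hits = [p for p in posts if metadata_match(p)]
print(len(hits))  # → 2 (one of them wrong, and the real sighting missed)
```

The third post is exactly the case the article describes: the logo is in the wild, but keyword search never sees it.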

If your tool can "see" product logos in photographs, you get access to a different type of information. You start to catch products and logos in the wild, where people really use them. The brand protection guys will like enhanced abilities to track counterfeits and parodies, but maybe this opens the door to a new kind of online ethnography, too.
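For a sense of what "seeing" a logo means computationally, here's a toy template-matching sketch: slide a small logo patch across a grayscale image and score each position by sum of squared differences. This is a crude exact-match detector on synthetic data, not a production recognizer; real systems use feature-based or learned detectors that tolerate scale, rotation, and lighting changes.

```python
# Template matching by brute-force sum-of-squared-differences (SSD).
# The position with the lowest score is the best fit for the template.

def best_match(image, template):
    """Return the (row, col) where the template fits the image best."""
    H, W = len(image), len(image[0])
    h, w = len(template), len(template[0])
    best_score, best_loc = None, None
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            score = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(h)
                for j in range(w)
            )
            if best_score is None or score < best_score:
                best_score, best_loc = score, (r, c)
    return best_loc

# An 8x8 "photo" of mid-gray pixels with a 2x2 "logo" planted at (3, 5).
image = [[0.5] * 8 for _ in range(8)]
logo = [[1.0, 0.0], [0.0, 1.0]]
for i in range(2):
    for j in range(2):
        image[3 + i][5 + j] = logo[i][j]

print(best_match(image, logo))  # → (3, 5)
```

The point isn't the algorithm; it's that the detector answers from the pixels themselves, with no dependence on what the poster wrote about the photo.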

Finding the technology
As demand picks up, you can expect the serious competitors in social media analysis to add image search capabilities. Already, Ninestars has added image recognition from a partner, and Meltwater's OculusAI acquisition suggests future capabilities with images. They won't be the last.

Several companies are going at the image recognition challenge directly.

What's next?
Computer vision has lots of potential beyond spotting logos in photos. I imagine that this sort of product/logo identification will extend to video, though I'll need to talk to an expert to understand when to expect that.

And then there are people. We already have identity tagging in Facebook, and big money is going toward advancing facial recognition. I also found Real Eyes, a company that analyzes emotional responses from video, so visual analysis of faces isn't limited to identifying their owners.

The computers aren't just reading. They're starting to watch, too. Can you do something good with that?

This is one of those list posts that will grow as people point out more companies. Who'd I miss?

About Nathan Gilliatt

  • Voracious learner and explorer. Analyst tracking technologies and markets in intelligence, analytics and social media. Advisor to buyers, sellers and investors. Writing my next book.
  • Principal, Social Target
