February 2012 Archives

Does your social media program include foreign language requirements? Even if your company does business in only one country, you might need multiple languages. The question is, how much capability do you need to check off the language box?

An email from a vendor contact in Tokyo reminded me of a conversation at the Tech@State conference a few weeks ago. We were talking about monitoring Arabic-language social media, and someone pointed out that their analysts know Arabic. They don't need translation; they just need to collect the content for their analysts to read.

It's an important distinction, and it led us into a conversation about what a monitoring platform needs to do to support different types of users. The short answer is, language capability is more than a one-box checklist item. You have to know your needs in order to evaluate tools.

International support in the software
Let's start with the scenario from that initial conversation: an organization is looking for a software platform for monitoring or analyzing social media content in a specific set of languages. Here are some tool capabilities that might back up a claim about language support:

  1. Find content written in a language.
    Theoretically, all you need is support for the required character set, search terms in the desired language, and a broad range of sources. In practice, it's harder to collect content in some countries and languages than others. Ask about source coverage in the countries you need to include.

  2. Translate foreign-language content.
    In the age of Google Translate and other machine-translation programs, it's easy to add a translate button to a tool. If your needs are simple, machine translation could be good enough.

  3. Filter content by language.
    The most basic level of language support involves identifying the language used in a text. Based on some of my testing, that's harder than it looks. Tools that can identify source languages usually offer filters based on the language, which is useful for directing items to analysts who can read them, as well as for analysis of content by language.

  4. Apply text analytics to content in the language.
    Adding more languages to the analytics engine of a social media platform is hard work. I've heard from several sources that adding sentiment analysis in another language, for example, is equivalent to starting over. If you want your tool to do text analytics in a specific language—sentiment, topics, entity extraction, and the rest—ask specifically if those features are supported in the languages you need to analyze.

  5. Provide a user interface in a language.
    So far, this has all been about the content. If you're working with native-speaker analysts, though, you may also want to support them with a user interface in their language. I've talked with people at companies that monitor social media in multiple languages using teams in multiple countries. Giving them a UI in their own language(s) is a nice touch, and one that probably pays off in increased productivity.
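
To make the language-filter item above concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular monitoring platform works: the function names, the script-to-language table, and the sample posts are all hypothetical. It guesses a language from the Unicode script of each letter and sorts posts into per-language queues for analysts.

```python
import unicodedata
from collections import Counter, defaultdict

# Hypothetical, deliberately coarse mapping from the first word of a
# character's Unicode name (its script) to a single language code.
SCRIPT_TO_LANG = {
    "ARABIC": "ar",
    "HIRAGANA": "ja",
    "KATAKANA": "ja",
    "CYRILLIC": "ru",
    "LATIN": "en",
}


def guess_language(text):
    """Guess a language code from the scripts of the letters in text."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, and whitespace
        script = unicodedata.name(ch, "").split(" ")[0]
        lang = SCRIPT_TO_LANG.get(script)
        if lang:
            counts[lang] += 1
    if not counts:
        return "unknown"
    return counts.most_common(1)[0][0]


def route_by_language(posts):
    """Group posts into per-language queues, e.g. one queue per analyst team."""
    queues = defaultdict(list)
    for post in posts:
        queues[guess_language(post)].append(post)
    return dict(queues)


if __name__ == "__main__":
    sample = ["Hello world", "مرحبا بالعالم", "こんにちは", "Bonjour le monde"]
    for lang, items in route_by_language(sample).items():
        print(lang, items)
```

Note that the French sample lands in the English queue, because the sketch sees only the Latin script. Telling apart languages that share a script takes statistical models or dictionaries, which is one reason language identification is harder than it looks.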

International support in services
Now, let's look at the other side of the business: the services market. One easy way to add coverage of additional markets is to send the work to an agency that has those capabilities. The first question is, can they support the languages you need? The follow-up question is, how do they do it?

  1. Multilingual analysts
    Is it adequate to have an analyst who knows the language? Depending on your circumstances, it could be.

  2. Native language analysts
    Anyone who's studied a foreign language knows that it's easier to learn as a small child. Native fluency makes the analyst more likely to catch subtleties that a non-native speaker might miss.

  3. Native analysts located in foreign market
    If your native-fluent analysts are current residents of the foreign country of interest, they may be better attuned to current events and cultural trends than their peers working in another country.

  4. Vendor based in foreign market
    Social media analysis firms are virtually everywhere (try searching a country name in the directory). You can find native analysts who work abroad for international firms, and you can find them working for smaller firms based in their country. Working with foreign vendors adds complexity, but it could be the right answer in some circumstances.

It's easy to make up a list of languages and mark them yes or no. When I did my first report on companies in social media analysis in 2007, I didn't go much further than that (though I did ask about native fluency among analysts). If you're building a capability with international scope, be clear about the level of language support you need, and you'll be a big step closer to finding the right partners for your program.

I've been thinking lately about nuances in product requirements. More to follow.

Photo by David Weekly.

Why Government Monitoring Is Creepy

Quiz: A government agency wants to monitor social media in the course of performing its function. Is that an obvious use of public information, or further evidence of a dark conspiracy? Oh, good, I see lots of hands for both answers. Let's look at what's really going on here.

You have zero privacy anyway. Get over it.
—Scott McNealy (1999)
When people hear about social media monitoring by a government agency—such as the recent news of FBI, DHS, and CIA programs—the usual response is outrage about the perceived violation of privacy. People are living their lives online, and they don't want the government listening in.

Superficially, that's completely understandable. Most of us don't want people eavesdropping on us, even if we aren't hiding anything and don't harbor conspiracy theories. We just like our conversations to be kept within the group we think we're talking to. The usual response makes intuitive sense, even if we realize that these online conversations are, technically, public.

(By the way, I'm assuming that we're talking about governments in free, democratic countries here. Events over the last few years have clearly demonstrated the danger to people sharing information and opinions in countries with repressive regimes during times of instability. Sometimes, it's easy to decide whether the government is using or abusing people's information.)

Expectations of privacy
Where do we get this expectation of privacy in public places? Everybody knows that Twitter is public (unless you make your updates private), Facebook has public updates, YouTube is for the world, many forums are public, and blogs are a form of publishing, right?

How can we expect privacy in a public place?

Read that last sentence again, and I think we'll start to see what happened. We're not really talking about a public place—it's not a place at all. All of this Internet-based communication happens in a virtual space, which is shared by everyone. Virtual means almost, which also means not. A virtual space is not a real space; it's an artificial environment that is different from the real world in important ways.

The nature of public is one of those ways.

Public doesn't mean what it used to mean
Imagine having a conversation with a friend in a public place—a city street, maybe, next to a bus stop, or a sports stadium during a game. These are public places. We may have norms against eavesdropping, but someone standing close to you might hear your conversation. So your expectation of privacy is reduced, compared to when you have a conversation in a home or office.

The physical world imposes limits on the potential audience for conversations. Sound drops off over distance, and quickly. Other sounds in the environment block out the conversation, too. If you're talking while a bus leaves the stop or a big play happens on the field, even the person you're talking to might have trouble hearing you. A few feet away, you're inaudible. Across the street or stadium, you may as well not exist.

The Internet is different. A whisper on the other side of the world is as clear as a shout in a quiet room. A million people can talk at the same time, and we can pick out individual conversations—all of them. Say something today, and it's still there tomorrow. Time, distance and the crowd—none of them recreate the semi-privacy we experience in physical settings.

The conversation at the bus stop and the isolated tweet are both public, and yet they're entirely different. The differences come back to the difference between the Internet and the physical world. People react to the perceived violations of privacy because they learned their ideas of public and private in the physical world, and the different physics of information in the virtual world break their mental models.

A clear dichotomy
The virtual world also breaks the in-between states of semi-private and semi-public. There's no semi online. Private is uncertain, too.

Three can keep a secret, if two of them are dead.
—Benjamin Franklin
Some online venues attempt to be private, but that privacy is enforced with terms of service and technical measures that can be defeated. Any notion of privacy in online communications has an element of trust, which may be backed up by contracts or law. But it's not private in the same way as a conversation in a closed room.

Public discussions, on the other hand, are really public, in a globally ubiquitous way that the physical world can't match. Those open Twitter accounts and blog posts, the groups and forums that anyone can read. Comments on newspaper sites and book reviews. Videos and pictures uploaded all over the place. Anyone can see them—milliseconds or months later.

This isn't the first time
We've run into this qualitative change in the nature of public information before. Think about public records that the government keeps, such as on property transactions. These records have always been public, but pre-Internet, realities of the physical world created barriers to access.

If you wanted to look at property records, you had to go to the clerk in the appropriate local government office. You'd probably wait in line, and when it was your turn, you made your request. If you asked for something the clerk could find, you could look at the file, and you might pay ten cents a page to get a copy.

Where's the record today? It's on the web, with a database query engine that lets you look up properties by owner or address, with wild cards in your queries. If you don't find what you want, you look again—as many times as you like. When you find something interesting, you have all the information, which you can save or print as much as you like.

On other web sites, that same public record is aggregated with many others, mashed up in a map that shows house prices everywhere. Zoom out, get the big picture. Zoom in, find out what your neighbor paid for that house. It's the same public record, but putting it on a computer and making it available on the web completely changes what it means to be public.

The world changes faster than we adapt
We're so used to the constant rush of innovations and what we can do with them. We're not so good at thinking about the implications and adjusting our mental models. People start sharing their lives in these public channels, without thinking about what happens to the information. Remember the first stories of job applicants who shared the wrong pictures on Facebook?

Now, government agencies are opening up about their interest in what people have to say online, and we have this wounded sense of privacy based on expectations from the physical world. All that data is public, in the expanded sense of online public information. Did people think that officials wouldn't find it useful?

The value to government is obvious, but we need a reasoned discussion on the appropriate tradeoffs between government use and individual protection. All of which is far too much for an already long-winded blog post.

Photo by Jeff Schuler.

About Nathan Gilliatt

  • Voracious learner and explorer. Analyst tracking technologies and markets in intelligence, analytics and social media. Studying complexity and futures.
  • Principal, Social Target