Recently in Intelligence Category

It's bad enough when people are wrong as they express facts and opinions on the Internet. Mistakes happen. But there's more going on. Some people are intentionally adding noise to the online world, in an attempt to mislead users and analysts. Garbage in, garbage out, so how do we catch the garbage before it becomes part of the analysis?

This post is the second in a series. The first is Can You Trust Social Media Sources? Most of my posts aren't this long; the next will be nice and short.

Catching and deleting spam and other garbage in social media data is one side of an arms race, just like email spam and computer viruses. Developers of social media analysis platforms work to eliminate spam from their results, and spammers develop new tactics to dodge the filters. As long as the incentives remain, people will find ways to game the system.

For most analysts, the main response is to pick a platform that does a decent job of catching the undesirable content. Most do some sort of machine learning to identify and filter spam, and while the results are imperfect, they're useful as a first step. The second step is to allow users to flag content as spam, and it's good if the system learns from that action. A third step is to allow users to blacklist a site altogether; once you know it's not what you're looking for, there's no need to rely on the spam-scoring engine.
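
Those three steps can be sketched in a few lines. This is only an illustration of how the pieces might fit together, not how any particular platform works: the `SpamTriage` class, its method names, and the 0.8 threshold are all made up for the example, and the spam score is assumed to come from whatever scoring engine the platform provides. The sketch also stops short of the "learns from that action" part; it simply remembers the flags.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class SpamTriage:
    """Combine a spam score, user flags, and a site blacklist (illustrative only)."""
    threshold: float = 0.8                        # hypothetical cutoff for the engine's score
    blacklist: set = field(default_factory=set)   # domains the analyst has ruled out
    flagged: set = field(default_factory=set)     # post IDs that users marked as spam

    def block_site(self, url: str) -> None:
        """Step 3: blacklist the whole site that a URL belongs to."""
        self.blacklist.add(urlparse(url).netloc.lower())

    def flag(self, post_id: str) -> None:
        """Step 2: record a user's spam flag for one post."""
        self.flagged.add(post_id)

    def is_spam(self, post_id: str, url: str, score: float) -> bool:
        # A blacklisted site is rejected outright, without consulting the score.
        if urlparse(url).netloc.lower() in self.blacklist:
            return True
        # A user flag overrides whatever the scoring engine said.
        if post_id in self.flagged:
            return True
        # Step 1: otherwise fall back on the (imperfect) machine-learned score.
        return score >= self.threshold
```

The order of the checks is the design choice here: the analyst's explicit decisions win over the engine's guess.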

Evaluating questionable data
This is where I'd love to give you the magic button that reveals deceptive content. I'd like to have the Liar Liar power, too, but that's not going to happen. Instead, I have some ideas of how to think about questionable results. Most of them are in the form of questions. Some are more probabilistic than definitive, but I think they could be helpful.

  • Consider your purpose
    Your sensitivity to garbage in your data depends on what you're doing with it. If you're monitoring for customer service purposes, flag the spam and move on. If you're reporting on broad trends, you might get better results through sampling, or by focusing on high-quality sources. If you're looking for weak signals, you may not have the luxury of ignoring the low signal-to-noise ratio of a wide search. As always, match the effort to the objective.

    Some people actually need to look at spam—consider the legal department. If a link leads to a site selling counterfeit merchandise and you're in a trademark protection role, the spam is what you're looking for.

  • Consider the source (person)
    Who posted the item in question, and what do you know about them? Is the poster a known person? What do you know from the individual profile? Who does the person work for? What groups is the person connected to? Does the person typically discuss the current topic? Is the person's location consistent with the information shared?

    If you're not sure whether the poster is a person or a persona, develop a profile. A persona is like a cover identity; it can be strong or weak. Does the persona have a presence on multiple networks? Since when? Is it consistent across networks? Does it have depth, or is every post on the same topic? Who does the persona associate with online, and what do you know about them? Do the persona's connections reveal the complexity of relationship types that real people develop (school, work, family, etc.)? Do the profiles and connections give information about background that can be checked?

    For questionable sources, think about the different types of data that might reveal something through social network analysis.

    Back at the Social Media Analytics Summit, Tom Reamy described work by researchers to identify the political leanings of writers, based on their language choices (writing about non-political topics). Can we use text analytics to add information about native language, regional differences, and subject-matter expertise to individual profiles?

  • Consider the source (site)
    Where was the data posted? What do you know about the site? Is it a known or probable pay-to-play or disinformation site? Is it a content-scraping site? Does it have information from a single contributor (such as a blog) or from many (such as a crowdsourcing site)? What else is posted to the site? Where is it hosted? Who owns it? Where are they based? What can you learn from the domain registration?

    What's the online footprint of the site? Is it linked to real people in social networks? Is it used as a source by other people? Credibility flows through networks; do known, credible (not necessarily influential) people link to it and share its content in their networks? Does it appear to have bought its followers, or are they real people?

  • Consider other sources
    If you're going to do something serious—and I'll leave the definition of serious as an exercise for the reader—don't trap yourself in a new silo for social media data. What else do you know? What do other online sources say? Does the questionable data fit with what you're getting from sources outside of social media? Are you getting similar information from credible sources, or are all of the sources for the questionable data unknown?

    A few months ago, I heard Craig Fugate, the Administrator of the (US) Federal Emergency Management Agency (FEMA), tell a story about government agencies and unofficial sources of information. The story involved a suspected tornado and unconfirmed damage reports in social media. Government agencies prefer official reports from first responders and other trained observers, so the question was how to evaluate reports in social media.

    In the case of severe weather, one answer is to compare the reports with official sources of weather data. If radar indicated a likely tornado passing over a location a few minutes before the damage reports, then you'd know something important that should help evaluate those reports. What's the analogy for your task? Is there a hard-data source that can add relevant information? Does a geospatial view add a useful dimension (as it would in the example, if radar, post location, and photo metadata all pointed to the same place)?

  • Consider the incentives
    What does a potential adversary stand to gain by fooling you—or someone else looking at the same data—with false information? Who gains by leading you to an incorrect action? Who makes money on your decision? Who benefits from misleading other people with false information (think product reviews and propaganda)? Is questionable information in your system consistent with the aims of an interested party?

    Part of the challenge here is that false information could be intended to mislead anyone. The target could be an individual, a small group, or entire populations. Who gains? Is there a link from the source to an interested party?

  • Consider the costs
    Part of what makes spam so frustrating is the volume level—there's a lot of the stuff around. At some point, the signal-to-noise ratio gets so low that the source becomes useless, unless you can identify and eliminate the junk. In a way, all that junk adds up to a sort of denial-of-service attack at the content layer. Is there a way to deal with that?

    A denial-of-service (DoS) attack and its scaled-up variant, the distributed denial-of-service (DDoS) attack, overload the targeted web site with simultaneous requests, causing it to become unavailable to real visitors. In 2010, Amazon weathered a DDoS attack without losing service. The explanation was that their normal operation looks a lot like a DDoS attack: lots of people visiting the site simultaneously. Their system was built to handle that kind of load, so the attack failed. One answer to a DDoS attack, then, is to have the capacity to handle the load.

    The social media analysis equivalent is to process it all, so what would that look like? Would a deeper analysis of known junk and its sources help improve the identification of junk? Would it tell you something useful about the parties that post the junk?

  • Consider the consequences
    The final point is to revisit the first point. What are you trying to accomplish? What decision will you make based on the data, and what happens if the information was false? What if it was placed there to manipulate your response (even if the information itself is true)? Does the rest of the decision-making process have the safeguards to prevent costly errors?
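
The severe-weather example under "Consider other sources" lends itself to a small sketch: check whether a social media report falls close to a radar-indicated track, shortly after the radar observation. Everything here is an assumption made for illustration, including the `Observation` type, the 5 km and 30 minute tolerances, and the idea that a radar track arrives as a simple list of points. Real radar products are more complicated than this.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


@dataclass
class Observation:
    lat: float
    lon: float
    time: datetime


def distance_km(a, b):
    """Great-circle distance between two observations (haversine formula)."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))  # 6371 km: mean Earth radius


def corroborated(report, radar_track, max_km=5.0, max_lag=timedelta(minutes=30)):
    """True if some radar point is near the report and precedes it by at most max_lag."""
    return any(
        distance_km(report, point) <= max_km
        and timedelta(0) <= report.time - point.time <= max_lag
        for point in radar_track
    )
```

A match doesn't prove the report is true, and a miss doesn't prove it false. It just adds a piece of hard data to the evaluation.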

The hard problem
One way to look at this is to go through the whole process while thinking "spam." Junk results are an annoyance if you're doing day-to-day monitoring for business, and they're a problem if you're doing quantitative analysis. The technology is improving, and you have options for dealing with spam in these settings.

Some junk isn't that hard to catch, especially once a person looks at it. Gibberish blog comments are easy to identify. Names and email addresses that don't match are sort of obvious, too. Content scrapers and other low-quality sites tend to have a certain look. If you have time to look at the spam that evades your filters, you can catch a lot of it.

The real challenge comes in looking for intelligence—whether in business, finance, politics, or government—in the presence of a motivated and well-funded adversary. If someone wants to fool you—or at least keep you from using an online source—they can improve their chances by better imitating the good data surrounding their junk. The quick glance to identify spam becomes a bigger effort, with more uncertainty.

Pay-to-play blogs may have original content from professional writers, so you can't just look for poor quality. False personas may be developed over time, with extensive material to create a convincing backstory. Networks of such personas could post disinformation, along with more normal-looking content, across multiple sites. With time and resources, personas can appear solid, which is why governments are investing in them.

I think some of the techniques above could help, but it's really a new arms race. The problem for everyone else is that this arms race will tend to poison the social media well for everyone who wants to discuss the contested topics.

If your organization is interested in these topics, don't just read the blog. Call me. As long as this post is, it's the short version. Clients get the full story.

XKCD cartoon by Randall Munroe.

Before you can pull insights from your data, you need data, but I'm hearing more concerns about data quality in social media analysis lately. Before, people asked about the traditional tradeoff in text queries: finding relevant content while excluding off-topic content. Lately, I'm hearing more about social data that's intentionally tainted. If you're looking for meaning in social media data, you may have to deal with adversaries.

Yes, and you've been playing without an opponent, which is, as you may have guessed, against the rules.
— "Anton Ego," Ratatouille

Ask a company with three initials as a name how many three-letter abbreviations are in use, and you get a sense of the challenge in finding relevant content. Common words as brand names pose a similar challenge (I always like the examples of Apple and Orange, because it's the one time you really can compare them). If people are honest and expressing their real opinions, it's hard enough to find what you're looking for.

The problem is, people aren't always honest. You also need to get rid of intentional noise in the data.

The analyst's adversaries

  • Spam
    We've all seen online spam (sorry, Hormel, you must hate that term). Junk mail for hormones and drugs in email, junk comments on blogs, junk blogs, trashy web sites—the costs are so low that even microscopic conversion rates are profitable, so it persists. Some of that shows up in social media, which is the problem here.

    At the recent Social Media Analytics Summit, Dana Jacob gave a talk on the spam that finds its way into the search results of social media analysis platforms, skewing the numbers. One tidbit that Dana shared to illustrate the challenge: If you consider all of the creative misspellings, there are 600 quintillion (6 x 10^20) ways to spell Viagra. So removing all of the spam from your data is a challenge.

    Spam seems to come in two flavors, neither of which will help you understand public opinion or online coverage. One is designed to fool people, to get them to click a link. It may lead to malware or fraud, or to some sort of product for sale. The other is designed to fool search engines with keywords and links embedded in usually irrelevant text. It's usually obvious to a human reader, but the hope seems to be that some search engines will count the links in their ranking of the target site.

  • Gaming analytics platforms
    Another presenter outlined a more direct challenge to the social media analyst when he described his system to game analytics systems with content farms and SEO tactics. He talked about using weaknesses in analytics systems to plant information in them. One slide described his methods as "weaponizing information in a predictive system," which doesn't leave a lot of room for exaggeration.

    He even used a real client as an example. The question is, how many others do the same thing, but discreetly? If you're looking for market intelligence in social media, do you trust your sources?

  • Deception in crowdsourced data
    Another conversation went into the potential poisoning of the crowdsourcing well, in this case one of the crowdmapping efforts in a political conflict. If one party to the conflict entered false reports—perhaps to discredit the project or misdirect a potential response—could it be detected?

  • Sockpuppets
    Beyond the crowdmapping context, can you detect opposition personas that post false reports in social media? It's a standard tactic in the government/political arena, but it could hit you in business, too. All you need is a motivated opponent.
It's a little farther afield, but read Will Critchlow's post on online dirty tricks for more ideas on how our tools can (will) be used against us. If you work with political clients, you'll want to understand how these tactics work. For everyone else, it's another lesson toward being an informed voter.
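
To get a feel for why the misspelling count in the spam item above gets so large, here's a back-of-the-envelope sketch. The substitution table and separator list are invented for the example and are far smaller than what real spam uses, so this does not reproduce the figure from the talk. It only shows how quickly the choices multiply.

```python
from math import prod

# Hypothetical look-alike substitutions; real spam uses many more.
SUBSTITUTES = {
    "v": ["v", "\\/"],
    "i": ["i", "1", "!", "l", "|"],
    "a": ["a", "@", "4"],
    "g": ["g", "9"],
    "r": ["r"],
}
SEPARATORS = ["", " ", ".", "-", "_"]  # optional filler between adjacent letters


def count_variants(word):
    """Count spellings built from per-letter substitutions plus separators."""
    # Each letter independently picks one of its substitutes.
    letter_choices = prod(len(SUBSTITUTES.get(ch, [ch])) for ch in word)
    # Each gap between letters independently picks one separator.
    separator_choices = len(SEPARATORS) ** max(len(word) - 1, 0)
    return letter_choices * separator_choices


print(count_variants("viagra"))  # 562500
```

Even this toy table yields more than half a million variants of a six-letter word, which is why matching against a fixed list of spellings doesn't get you very far.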

Next: ideas for detecting deception
I don't mean to be all problem and no solution, but this post is already a long one. I'll share some ideas on how we might detect deception in social media in my next post. For now, I'll end with a happier observation: Sometimes, people lie in real life and get caught when they reveal the truth in social media.

Update: Part 2 is now up: Detecting Deception in Social Media

Photo by John Cooper.

Do you put social media data on a map? Location is a handy dimension for slicing, dicing, and visualizing your data. The question is, which location are you visualizing? Even a single tweet, in under 140 characters, can have four different locations.

I've taken a real interest in applying geospatial analysis to social media over the past year. It's been especially appropriate in emergency management and some other discussions with government types. Mostly, though, it's just another lens to apply to social media data, another way to find some value in the data we have now.

So, you want to put social media activity on a map. It's worth thinking about what that location really represents. One little statement can have four distinct locations, depending on how you look at it:

  1. Location of the service/server
    Internet-based communications happen in this virtual space where physical location is largely irrelevant, but everything runs on a computer somewhere—even in the cloud.

    You could even separate this one into two (or more) locations—the locations of the server and of the company that owns it—but for most of us, these are the least relevant locations. A few specialists need to know the physical or logical location of a server, but for the rest of us, there's nothing to see here.

  2. Location of the account
    Look at an account on Twitter, Facebook, or another social network. Most of them have a place for users to provide their location. Its accuracy depends on the account owner, which is why you see so many Twitter accounts located in "Earth" or something similarly uninformative. During the pro-democracy protests in Iran, a lot of people set their Twitter locations to Tehran in sympathy with the protesters.

    At its most useful, the location associated with an account tells you a default location for a user—home base.

  3. Location of the post
    Social and mobile are increasingly two aspects of the same technology-adoption trend, as more people take their social media through mobile devices. With geolocation tagging and location-based services, they're sharing their immediate location: "I am here, now." This is the location you're most likely to see represented on a map.

  4. Location of the described event
    This last location won't be encoded in an API, because it's found in the content people share. When they talk about events in the real world, they mention places, possibly indirectly. You'll need a text analytics tool that recognizes locations to extract those. When they post pictures, the photos may include location metadata from the camera.
Let's put them all together with a couple of hypothetical examples. We'll ignore the location of the server, because it's not relevant for most uses.

  • Let's say that I tweet about an event in Egypt (4) during a break at a conference in Washington (3). My account location (2) is in North Carolina. How does that compare with a geotagged photo (4) of the same event sent from Cairo (3) by an account that says it's located in Cairo (2)?

  • It's another stormy day in the middle of America, and someone posts a picture of a damaged building (4) on Facebook. The account location (2) and post location (3) are nearly the same, and they're in the projected path of a tornado, based on National Weather Service radar data. Do you believe that a tornado hit the building?

Despite all of that muddying of the water, you're probably OK if you use the per-post geolocation data for most purposes. When in doubt, always remember to state your question clearly, and then you can pick the right data to answer it.
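
One way to keep the four locations straight is to store them separately and make the question explicit when you pick one. The sketch below is only an illustration. The `PostLocations` type, its field names, and the question strings are all hypothetical, and none of them correspond to fields in any real API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PostLocations:
    """The four locations of a single post (field names are illustrative)."""
    server: Optional[str] = None     # 1. where the service runs
    account: Optional[str] = None    # 2. self-reported profile location
    post: Optional[str] = None       # 3. geotag at the time of posting
    described: List[str] = field(default_factory=list)  # 4. places named in the content


def location_for(question: str, loc: PostLocations) -> Optional[str]:
    """Pick the location that answers the question being asked."""
    if question == "where did it happen":
        return loc.described[0] if loc.described else None
    if question == "where was the author":
        return loc.post
    if question == "where is the author based":
        return loc.account
    raise ValueError(f"no location answers the question: {question!r}")


# The first hypothetical above: a tweet about Egypt, sent from Washington,
# by an account based in North Carolina.
mine = PostLocations(account="North Carolina", post="Washington", described=["Egypt"])
print(location_for("where did it happen", mine))   # Egypt
print(location_for("where was the author", mine))  # Washington
```

Forcing the caller to name the question is the point: the same post lands in three different places on the map depending on what you're asking.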

Illustration: Map of a Twitter status object by Raffi Krikorian.

Why Government Monitoring Is Creepy

Quiz: A government agency wants to monitor social media in the course of performing its function. Is that an obvious use of public information, or further evidence of a dark conspiracy? Oh, good, I see lots of hands for both answers. Let's look at what's really going on here.

You have zero privacy anyway. Get over it.
—Scott McNealy (1999)
When people hear about social media monitoring by a government agency—such as the recent news of FBI, DHS, and CIA programs—the usual response is outrage about the perceived violation of privacy. People are living their lives online, and they don't want the government listening in.

Superficially, that's completely understandable. Most of us don't want people eavesdropping on us, even if we aren't hiding anything and don't harbor conspiracy theories. We just like our conversations to be kept within the group we think we're talking to. The usual response makes intuitive sense, even if we realize that these online conversations are, technically, public.

(By the way, I'm assuming that we're talking about governments in free, democratic countries here. Events over the last few years have clearly demonstrated the danger to people sharing information and opinions in countries with repressive regimes during times of instability. Sometimes, it's easy to decide whether the government is using or abusing people's information.)

Expectations of privacy
Where do we get this expectation of privacy in public places? Everybody knows that Twitter is public (unless you make your updates private), Facebook has public updates, YouTube is for the world, many forums are public, and blogs are a form of publishing, right?

How can we expect privacy in a public place?

Read that last sentence again, and I think we'll start to see what happened. We're not really talking about a public place—it's not a place at all. All of this Internet-based communication happens in a virtual space, which is shared by everyone. Virtual means almost, which also means not. A virtual space is not a real space; it's an artificial environment that is different from the real world in important ways.

The nature of public is one of those ways.

Public doesn't mean what it used to mean
Imagine having a conversation with a friend in a public place—a city street, maybe, next to a bus stop, or a sports stadium during a game. These are public places. We may have norms against eavesdropping, but someone standing close to you might hear your conversation. So your expectation of privacy is reduced, compared to when you have a conversation in a home or office.

The physical world imposes limits on the potential audience for conversations. Sound drops off over distance, and quickly. Other sounds in the environment block out the conversation, too. If you're talking while a bus leaves the stop or a big play happens on the field, even the person you're talking to might have trouble hearing you. A few feet away, you're inaudible. Across the street or stadium, you may as well not exist.

The Internet is different. A whisper on the other side of the world is as clear as a shout in a quiet room. A million people can talk at the same time, and we can pick out individual conversations—all of them. Say something today, and it's still there tomorrow. Time, distance and the crowd—none of them recreate the semi-privacy we experience in physical settings.

The conversation at the bus stop and the isolated tweet are both public, and yet they're entirely different. The differences come back to the difference between the Internet and the physical world. People react to the perceived violations of privacy because they learned their ideas of public and private in the physical world, and the different physics of information in the virtual world break their mental models.

A clear dichotomy
The virtual world also breaks the in-between states of semi-private and semi-public. There's no semi online. Private is uncertain, too.

Three can keep a secret, if two of them are dead.
—Benjamin Franklin
Some online venues make the attempt to be private, but it's enforced with terms of service and technical measures that can be defeated. Any notion of privacy in online communications has an element of trust, which may be backed up by contracts or law. But it's not private in the same way as a conversation in a closed room.

Public discussions, on the other hand, are really public, in a globally ubiquitous way that the physical world can't match. Those open Twitter accounts and blog posts, the groups and forums that anyone can read. Comments on newspaper sites and book reviews. Videos and pictures uploaded all over the place. Anyone can see them—milliseconds or months later.

This isn't the first time
We've run into this qualitative change in the nature of public information before. Think about public records that the government keeps, such as on property transactions. These records have always been public, but pre-Internet, realities of the physical world created barriers to access.

If you wanted to look at property records, you had to go to the clerk in the appropriate local government office. You'd probably wait in line, and when it was your turn, you made your request. If you asked for something the clerk could find, you could look at the file, and you might pay ten cents a page to get a copy.

Where's the record today? It's on the web, with a database query engine that lets you look up properties by owner or address, with wild cards in your queries. If you don't find what you want, you look again—as many times as you like. When you find something interesting, you have all the information, which you can save or print as much as you like.

On other web sites, that same public record is aggregated with many others, mashed up in a map that shows house prices everywhere. Zoom out, get the big picture. Zoom in, find out what your neighbor paid for that house. It's the same public record, but putting it on a computer and making it available on the web completely changes what it means to be public.

The world changes faster than we adapt
We're so used to the constant rush of innovations and what we can do with them. We're not so good at thinking about the implications and adjusting our mental models. People start sharing their lives in these public channels, without thinking about what happens to the information. Remember the first stories of job applicants who shared the wrong pictures on Facebook?

Now, government agencies are opening up about their interest in what people have to say online, and we have this wounded sense of privacy based on expectations from the physical world. All that data is public, in the expanded sense of online public information. Did people think that officials wouldn't find it useful?

The value to government is obvious, but we need a reasoned discussion on the appropriate tradeoffs between government use and individual protection. All of which is far too much for an already long-winded blog post.

Photo by Jeff Schuler.

Defining a Silo Buster

I recently saw a job description that tells me I'm not the only one looking for the value that's lost when analytical methodologies keep to themselves. Change a few key words, and it becomes something that a lot more organizations could use. Maybe yours?

Cross-pollinating analytics
I really like the idea of learning from other fields, such as the physicians who used lessons from Formula One pit stops to improve patient transfers. Most of us aren't working on anything that is truly different; you just have to find the relevant lessons from unrelated fields. It sounds hard, but I think that opening your mind to the possibility is the step most people miss.

I use the metaphor of cross-pollination a lot when I talk with people about intelligence and analytics (cue a silo rant if you missed it). The short version is, I think the various analytics specialties are missing value when they reinvent each other's solutions and fail to learn from each other.

You can get a broader application of the concept from Matt Ridley: When ideas have sex. We work better when we don't try to do everything ourselves.

Hiring a silo-busting analyst
Breaking down some of those barriers is the idea behind AnalyticsCamp, so I was really pleased when I found this great job description at the CIA a few months ago (emphasis added):

As an Analytic Methodologist, you will have the opportunity to develop and apply analytic methods to add rigor and precision to intelligence analysis and collection. You will provide statistical, operations research, econometric, mathematical, or geospatial modeling support to Agency analysis, and you will incorporate your findings into a broad range of intelligence products. Agency analysts are encouraged to maintain and broaden their professional ties through academic study, contacts and attendance at professional meetings. They may also choose to pursue additional studies in fields relevant to their areas of responsibility.

Maybe I'm seeing what I want to see, but that looks like And not Or thinking to me (though I would like to see a longer list of methods). Notice the continuing development aspects, too. What would you think if we adapted it to business, changing the specific types of analysis to the specialties at work in business and adding a few that could be at work?

Your company might not offer some of the specific perks of government work, but what are you doing to encourage your analysts to develop beyond the confines of their current specialties? Are you taking the opportunities to learn from other fields, both near and far?

Photo by curimedia.

It started with a simple challenge: if I were to draw a big circle around the things I find interesting enough to follow and declare them to be one thing, how would I label it? To avoid flying completely off into pointless musing, assume that it's relevant professionally. Considering that the circle included social media, analytics, intelligence, geopolitics, and natural disasters—to pick a few—the label wasn't obvious. By declaring them to be one thing, though, it soon became clear that the theme was the importance—the value—of knowledge.

The label was Omniscience.

"That's pretty ambitious."
Yes, I'm aware of the definition of omniscience, and no, I'm not suggesting that I know everything or ever will. But among the unattainable goals, it's a good one. I mean, what could you do if you knew everything? You can't, but what if you knew a lot more about things that matter to your business?

What if you knew something that was there to be discovered, and your competitor didn't? Is it starting to sound reasonable yet? Maybe even something you'd want to do?

The framework
I've talked through the Omniscience framework with several folks for early reactions, mostly in person. It involved some handwaving, so I knew it wasn't ready to post. Some people suggested related books, but nobody really shot it down. Now, it's your turn (click for a larger view). I'm not sure I need a lot more assigned reading at the moment, but I'm definitely interested in your reaction.

[Figure: Omniscience overview]

A framework, not a recipe
This is the top-level view, and each section has a story, a purpose, and examples. But this is the gist of it: starting with a few simple observations on the nature of things, Omniscience is a challenge to expect more of your intelligence and analytics, drawing on a broader range of techniques to track and anticipate a wider range of things that matter.

Omniscience provides a thread. It links things you know with things you do—and with things you don’t do. It links the very large and the very small, the short-term and the long-term. The way you think and plan and the way you measure and evaluate. It provides a structure to identify missed opportunities and to evaluate new ideas. And although it looks highly theoretical, it's already suggested a practical application that I haven't seen on the market.

Naturally, I think it's a big deal. Does it make sense to you, so far?

In my last post, I suggested that intelligence and analytics are two angles on the same challenge: developing the information value in available data. You're probably already looking—sorry, listening—for useful information online. Rather than thinking of intelligence and analytics as separate specialties, let's approach them as two lenses that might help us find information in data.

I'm going to risk a small definition here; if I'm going to write about intelligence and analytics, it would help if I assert that these aren't two words for the same thing. Proposing a formal definition isn't my point, so let's think about it this way: We do a lot of quantitative analysis these days. We care about the results because they present trends or aggregate data points in some way. For the purposes of this discussion, that's analytics. Other times we care about individual facts, regardless of the quantitative view. That's intelligence (cue James Bond theme).

For example, you might be interested in the most popular adjectives used to describe your product or brand. You care about the results because they represent mass opinion. That's analytics. Conversely, if you discover a death caused by your product, that fact is important regardless of how many people are talking about it. That's intelligence.

Yes, it's a little messy. The point is to notice what we've been missing, not to perfect the language.
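
Here's a rough sketch of the two lenses applied to the same set of posts. It's a toy: the word lists are hypothetical, the tokenizer is naive, and a real tool would use proper text analytics. What it shows is that the analytics lens counts, while the intelligence lens surfaces a single item no matter how rare it is.

```python
from collections import Counter

ADJECTIVES = {"great", "slow", "cheap", "broken"}        # hypothetical word list
CRITICAL_TERMS = {"died", "death", "injured", "recall"}  # hypothetical watch list


def tokens(post):
    """Naive tokenizer: split on whitespace, strip punctuation, lowercase."""
    return [w.strip(".,!?").lower() for w in post.split()]


def analytics_view(posts, top_n=3):
    """Aggregate lens: the most common adjectives across all posts."""
    counts = Counter(w for p in posts for w in tokens(p) if w in ADJECTIVES)
    return counts.most_common(top_n)


def intelligence_view(posts):
    """Single-fact lens: any post mentioning a critical term, however rare."""
    return [p for p in posts if CRITICAL_TERMS & set(tokens(p))]


posts = [
    "Great phone, great price.",
    "So slow!",
    "Great camera but slow app",
    "My neighbor was injured when the battery caught fire.",
]
print(analytics_view(posts))     # [('great', 3), ('slow', 2)]
print(intelligence_view(posts))  # the one post about the injury
```

The injury report would never make a Top 10 chart, but it's the post someone needs to see first.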

What do people say?
Let's apply this to the familiar topic of listening in social media. People say all sorts of things online, but when we start analyzing their meaningful statements, they fall into two categories: statements of fact (which may be false) and statements of opinion.

We spend a lot of time on the notion of analyzing opinions. Most of the usual metrics help us understand trends in the opinions expressed in a large collection of comments. But what about facts? What do we do about them? They don't really fit into a market research paradigm, but some of them may be important to the business. We need to use a different lens.

It must be serious; he has a matrix
In proper consultant fashion, I decided to see what happens when we put these two ideas in a matrix. We use our intelligence and analytics lenses to look at statements of fact and statements of opinion online. Remember, analytics (in this discussion, at least) is about aggregate data, while the intelligence lens can pick up isolated signals. The examples in the boxes are illustrative; I'm sure you can think of more.

[Image: Intel analytics grid]

Think about the usual discussion of listening in social media. How much of it focuses on measuring customer opinion and brand image (including every discussion of the accuracy of sentiment analysis)? How much more value could we uncover if we asked more questions of the same data? Are you looking for the important signals that don't show up in a Top 10 chart?

This is another piece of the Omniscience framework I'm working on. It starts with four simple thoughts, and it all comes together eventually—I hope.

[Image: House on silos]

In a finite world, individuals specialize, but organizations don't have the same limitations. Given enough specialists, you can do it all. The challenge is in managing them. Somebody has to get on top of all these silos.

In my ten-minute pretend-keynote at last year's Defrag conference, I asked people to look beyond the existing silos of data and analytics to consider what more we could do. I challenged them with this simple idea:

Analytics + Intelligence –> Strategic Value of Information

What I'm doing is applying "and," not "or," to analytics and intelligence. Applying math when that works and finding facts when that works. Around here, the starting point for data is social media, but that's another boundary that turns out to be arbitrary. The same reasoning applies to other data sources.

We use labels like intelligence and analytics to divide the analysis of social media data into closely related specialties. In the process, we risk losing sight of the bigger goal, which all of these specialties support:

Uncover the information in the available data in order to develop insights that support the business.

We're all looking for useful information in data. In the social media realm, some of the data is unstructured content, and some of it is structured data generated by our activities. That distinction is driving some segmentation among the vendors, but it's worth remembering that intelligence vs. analytics isn't an "or" question; it's an "and" question. You need to consider both.

In the next post, I'll show you the model that applies intelligence and analytics to expand what we might find in what people say online. There's more to it than the usual summary of opinions.

Photo by Pablo David Flores.

Keeping an Eye on Everything


Have you noticed a lot going on lately? Several Arab countries are renegotiating their governance; storms, floods and earthquakes are making life hard in the Pacific; and pirates are expanding their reach in the Indian Ocean. There may be other things going on, too. How do you keep up? Where do you find meaningful analysis? You're not still waiting for the evening news, I hope.

[Image: Mideast map riots protests]

Business Insider shared this map by Citi's Tina Fordham this morning. It's similar to something I started drawing to explain the context to my son, except Tina kept going and finished the map. I like the idea of summarizing the protests and political developments on a map, because it invites the viewer to think about cross-border effects. The Arab Spring uprisings have spread throughout the region, so looking at the entire region is useful.

What would make it more useful would be to expand its scope, make the map interactive, and update it in near-real time. In short, make it a dashboard for political unrest. So, I started looking for one. What I learned is that real-time incident maps and intelligent summaries may be mutually exclusive.

Update: The Economist made an interactive map of the region that presents political and economic indicators, but no current awareness.

Trying out global situation maps
[Image: RSOE EDIS]

The RSOE Emergency and Disaster Information Service is a dashboard for the world that comes close to what I'm looking for. It pulls information on natural disasters and a few other categories into an impressive application that combines maps, a table of incidents, and incident details. What it doesn't do is cover political unrest or offer broader summaries, but it's free, and it does cover events that don't make the news.

Global Incident Map is another Google Map mashup of incident reports. Incident details and current updates are limited to subscribers, but there's a free trial. The developer also offers other maps of specific topics of interest. The design—especially the flashing icons—has kept me from the trial so far, but it might be interesting to compare to the RSOE map.

[Image: Maplecroft world risks 2011]

Maplecroft reports on risks and risk indicators globally. You'll have to pay for their maps and analysis, but it might be a good investment if your interest is more than personal. The map at right is a top-level summary from their Global Risks Atlas 2011.

ReliefWeb generates maps of countries and regions experiencing emergencies of all types. I'd like to see them create the global situation map, but the maps they do provide can be quite informative. Today, for example, they have a map that reports on humanitarian agencies in, and refugees leaving, Libya.

The map equivalent of a Twitter search is Ushahidi, a crowdsourced crisis monitoring platform that maps reports sent in by email, Twitter, SMS, and probably semaphore in the next version. This example is tracking recovery efforts after the Christchurch earthquake. I haven't found a directory of Ushahidi deployments, but it's easy enough to Google Ushahidi Egypt or look through the Twitter account (@ushahidi) to find the maps. Update: The new Ushahidi Community site has a map of current deployments. The field reports are about as far from high-level analysis as it gets, but if you want details…

My new secret weapon
STRATFOR is an online publisher of political, economic, and military intelligence that has provided excellent coverage of the Arab Spring events. In theory, traditional media do much of the same work, but I've found that STRATFOR regularly picks up angles that aren't mentioned in the media, and they don't lose track of the rest of the world when the media focus on the topic of the week. It's a paid service, but they offer a free version to test the waters.

As we've seen in other domains, software doesn't replace analysts; it gives them new tools and data to work with. So I'm not surprised that the best sources I've found so far require subscriptions. It beats trying to process the firehose, and I do like being informed.

Today's Wall Street Journal had Twitter abuzz about social media monitoring and privacy in closed communities ('Scrapers' Dig Deep for Data on Web). Specifically, the story involved a health discussion board and a social media analysis vendor that used individual accounts to access personally identifiable health information. It's obviously an ethical question, but whose ethics apply? As far as I can tell? Nobody's (yet).

People are sharing personal stuff online, sometimes sharing more than they realize. We need to be careful about how we handle this information, but from what I can see, the ethical standards are just as siloed as the measurement standards. People brought along whatever ethics they subscribed to before they started dealing with social media, but the existing standards don't really cover the new activities.

Think about the different functional roles where you might find companies using social media data:

  • Market research
    Market researchers have strong ethical standards that come from social science research. They get into things like informed consent, but does that really apply to data mining of publicly available data? Do they apply if the data is aggregated, and no personally identifiable information is preserved? What ethical standards apply to desk research?

    Jeffrey Henning wrote about the etiquette of eavesdropping and presented a webinar on consumer attitudes towards social media market research. The short version is that people persist in expecting privacy in their online conversations, despite the public nature of the forums they use. But does their expectation of privacy online translate into an ethical obligation for researchers?

    Update: IMRO and CASRO guidelines may apply to social media research.

  • Public relations
    PR ethics say a lot about being honest and transparent in public statements, representing the client and the profession well… but what about the ethics of monitoring and measurement? A recent discussion of ethics in PR measurement suggests that the conversation has only just begun.

  • Marketing
    WOMMA takes strong positions on its members' marketing activities, but the closest it comes to mentioning monitoring or research is when it commits to "promote an environment of trust between the consumer and marketer." Other marketing codes I found had a similar emphasis on outbound marketing over inbound information collection.

    Update: WOMMA also calls for members to "respect the rights of any online or offline communications venue (such as a web site, blog, discussion forum, traditional media, and live setting) to create and enforce its own rules as it sees fit."

  • Customer service
    Is customer service sufficiently organized as a discipline to have its own code of ethics, or does it simply inherit the company's overall standards? I'll bet you that any existing ethics deal with one-on-one interactions with customers.

  • Human resources
    HR ethics related to personal information are based on information that companies aren't supposed to use in hiring decisions. danah boyd shared some thoughts on regulating the use of social media data in hiring.

  • Strategy/intelligence
    SCIP's code of ethics doesn't commit to much more than obeying the law. Other types of intelligence organizations get some leeway even on that. If you don't want competitors spying on you, your only real defense is to learn about INFOSEC.

Bottom line? I haven't seen an existing code of ethics that applies to monitoring, measuring, or mining social media sources. If you wanted to apply an existing standard, you'd have to decide which one. So, how do you pick? Are the rules determined by:

  • The source of the data?
  • What you do with it?
  • The job title/professional affiliation of the user? What if the labels themselves lack agreed definitions?
  • No ethics, just laws?
  • Nothing: there are no rules?

I have some ideas, which I'll share tomorrow. But first, what do you think? Is there an existing standard that you apply? How did you pick it?

Update: Is it time for Ethical Standards for Listening Vendors?

Photo by Thomas Hawk.

Bruce Schneier's taxonomy of social networking data (via Tim Finin) provides a helpful starting point for thinking about the various ways that personal information finds its way online.

About Nathan Gilliatt
