
It's bad enough when people are wrong as they express facts and opinions on the Internet. Mistakes happen. But there's more going on. Some people are intentionally adding noise to the online world, in an attempt to mislead users and analysts. Garbage in, garbage out, so how do we catch the garbage before it becomes part of the analysis?

This post is the second in a series. The first is Can You Trust Social Media Sources? Most of my posts aren't this long; the next will be nice and short.

Catching and deleting spam and other garbage in social media data is one side of an arms race, just like email spam and computer viruses. Developers of social media analysis platforms work to eliminate spam from their results, and spammers develop new tactics to dodge the filters. As long as the incentives remain, people will find ways to game the system.

For most analysts, the main response is to pick a platform that does a decent job of catching the undesirable content. Most do some sort of machine learning to identify and filter spam, and while the results are imperfect, they're useful as a first step. The second step is to allow users to flag content as spam, and it's good if the system learns from that action. A third step is to allow users to blacklist a site altogether; once you know it's not what you're looking for, there's no need to rely on the spam-scoring engine.
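
To make the three steps concrete, here is a minimal sketch of how they might fit together. The class, the field names, and the 0.8 threshold are all hypothetical, and the spam score is assumed to come from whatever engine your platform uses. The sketch only records user flags; it does not show the platform learning from them.

```python
from dataclasses import dataclass, field


@dataclass
class SpamFilter:
    """Three-step triage: site blacklist, user flags, then a spam score."""

    threshold: float = 0.8                       # hypothetical cutoff
    blacklist: set = field(default_factory=set)  # sites the user has blocked
    flagged: set = field(default_factory=set)    # post IDs users marked as spam

    def is_spam(self, post: dict, score: float) -> bool:
        # Third step: a blacklisted site skips the scoring engine entirely.
        if post["site"] in self.blacklist:
            return True
        # Second step: a user flag overrides whatever the engine thinks.
        if post["id"] in self.flagged:
            return True
        # First step: otherwise fall back on the platform's spam score.
        return score >= self.threshold


f = SpamFilter()
f.blacklist.add("scraper.example")
f.flagged.add(17)

# (post, spam score) pairs; the scores are made up for illustration.
posts = [
    ({"id": 1, "site": "scraper.example"}, 0.10),
    ({"id": 17, "site": "blog.example"}, 0.20),
    ({"id": 42, "site": "blog.example"}, 0.95),
    ({"id": 43, "site": "blog.example"}, 0.30),
]
kept = [p["id"] for p, s in posts if not f.is_spam(p, s)]
print(kept)
```

The blacklist and flag checks run before the score is consulted, which mirrors the point above: once you know a site isn't what you're looking for, the scoring engine no longer matters.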

Evaluating questionable data
This is where I'd love to give you the magic button that reveals deceptive content. I'd like to have the Liar Liar power, too, but that's not going to happen. Instead, I have some ideas of how to think about questionable results. Most of them are in the form of questions. Some are more probabilistic than definitive, but I think they could be helpful.

  • Consider your purpose
    Your sensitivity to garbage in your data depends on what you're doing with it. If you're monitoring for customer service purposes, flag the spam and move on. If you're reporting on broad trends, you might get better results through sampling, or by focusing on high-quality sources. If you're looking for weak signals, you may not have the luxury of ignoring the low signal-to-noise ratio of a wide search. As always, match the effort to the objective.

    Some people actually need to look at spam—consider the legal department. If a link leads to a site selling counterfeit merchandise and you're in a trademark protection role, the spam is what you're looking for.

  • Consider the source (person)
    Who posted the item in question, and what do you know about them? Is the poster a known person? What do you know from the individual profile? Who does the person work for? What groups is the person connected to? Does the person typically discuss the current topic? Is the person's location consistent with the information shared?

    If you're not sure whether the poster is a person or a persona, develop a profile. A persona is like a cover identity; it can be strong or weak. Does the persona have a presence on multiple networks? Since when? Is it consistent across networks? Does it have depth, or is every post on the same topic? Who does the persona associate with online, and what do you know about them? Do the persona's connections reveal the complexity of relationship types that real people develop (school, work, family, etc.)? Do the profiles and connections give information about background that can be checked?

    For questionable sources, think about the different types of data that might reveal something through social network analysis.

    Back at the Social Media Analytics Summit, Tom Reamy described work by researchers to identify the political leanings of writers, based on their language choices (writing about non-political topics). Can we use text analytics to add information about native language, regional differences, and subject-matter expertise to individual profiles?

  • Consider the source (site)
    Where was the data posted? What do you know about the site? Is it a known or probable pay-to-play or disinformation site? Is it a content-scraping site? Does it have information from a single contributor (such as a blog) or from many (such as a crowdsourcing site)? What else is posted to the site? Where is it hosted? Who owns it? Where are they based? What can you learn from the domain registration?

    What's the online footprint of the site? Is it linked to real people in social networks? Is it used as a source by other people? Credibility flows through networks; do known, credible (not necessarily influential) people link to it and share its content in their networks? Does it appear to have bought its followers, or are they real people?

  • Consider other sources
    If you're going to do something serious—and I'll leave the definition of serious as an exercise for the reader—don't trap yourself in a new silo for social media data. What else do you know? What do other online sources say? Does the questionable data fit with what you're getting from sources outside of social media? Are you getting similar information from credible sources, or are all of the sources for the questionable data unknown?

    A few months ago, I heard Craig Fugate, the Administrator of the (US) Federal Emergency Management Agency (FEMA), tell a story about government agencies and unofficial sources of information. The story involved a suspected tornado and unconfirmed damage reports in social media. Government agencies prefer official reports from first responders and other trained observers, so the question was how to evaluate reports in social media.

    In the case of severe weather, one answer is to compare the reports with official sources of weather data. If radar indicated a likely tornado passing over a location a few minutes before the damage reports, then you'd know something important that should help evaluate those reports. What's the analogy for your task? Is there a hard-data source that can add relevant information? Does a geospatial view add a useful dimension (as radar, post location, and photo metadata all pointing to the same location would in the example)?

  • Consider the incentives
    What does a potential adversary stand to gain by fooling you—or someone else looking at the same data—with false information? Who gains by leading you to an incorrect action? Who makes money on your decision? Who benefits from misleading other people with false information (think product reviews and propaganda)? Is questionable information in your system consistent with the aims of an interested party?

    Part of the challenge here is that false information could be intended to mislead anyone. The target could be an individual, a small group, or entire populations. Who gains? Is there a link from the source to an interested party?

  • Consider the costs
    Part of what makes spam so frustrating is the volume level—there's a lot of the stuff around. At some point, the signal-to-noise ratio gets so low that the source becomes useless, unless you can identify and eliminate the junk. In a way, all that junk adds up to a sort of denial-of-service attack at the content layer. Is there a way to deal with that?

    A denial-of-service (DoS) attack and its scaled-up variant, the distributed denial-of-service (DDoS) attack, overload the targeted web site with simultaneous requests, causing it to become unavailable to real visitors. In 2010, Amazon weathered a DDoS attack without losing service. The explanation was that their normal operation looks a lot like a DDoS attack: lots of people visiting the site simultaneously. Their system was built to handle that kind of load, so the attack failed. One answer to a DDoS attack, then, is to have the capacity to handle the load.

    The social media analysis equivalent is to process it all, so what would that look like? Would a deeper analysis of known junk and its sources help improve the identification of junk? Would it tell you something useful about the parties that post the junk?

  • Consider the consequences
    The final point is to revisit the first point. What are you trying to accomplish? What decision will you make based on the data, and what happens if the information was false? What if it was placed there to manipulate your response (even if the information itself is true)? Does the rest of the decision-making process have the safeguards to prevent costly errors?
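
The radar comparison described under "Consider other sources" can be sketched in a few lines. This is only an illustration under stated assumptions: the record fields (`lat`, `lon`, `minute`), the 10 km radius, and the 30-minute window are all hypothetical, and a real system would work from actual radar products rather than a short list of points.

```python
from math import asin, cos, radians, sin, sqrt


def km_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def corroborated(report, radar_track, max_km=10.0, max_minutes=30.0):
    """True if any radar point is near the report and shortly before it."""
    for point in radar_track:
        minutes_before = report["minute"] - point["minute"]
        near = km_between(report["lat"], report["lon"],
                          point["lat"], point["lon"]) <= max_km
        if near and 0 <= minutes_before <= max_minutes:
            return True
    return False


# Hypothetical radar track (minutes since the first observation).
radar_track = [
    {"lat": 35.00, "lon": -97.50, "minute": 0},
    {"lat": 35.05, "lon": -97.40, "minute": 10},
]
report_a = {"lat": 35.04, "lon": -97.41, "minute": 18}  # close to the track
report_b = {"lat": 36.50, "lon": -95.00, "minute": 18}  # far from the track

print(corroborated(report_a, radar_track), corroborated(report_b, radar_track))
```

A match doesn't prove the report is true, and a miss doesn't prove it's false. It's one more piece of evidence to weigh alongside the source and the incentives.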

The hard problem
One way to look at this is to go through the whole process while thinking "spam." Junk results are an annoyance if you're doing day-to-day monitoring for business, and they're a problem if you're doing quantitative analysis. The technology is improving, and you have options for dealing with spam in these settings.

Some junk isn't that hard to catch, especially once a person looks at it. Gibberish blog comments are easy to identify. Names and email addresses that don't match are sort of obvious, too. Content scrapers and other low-quality sites tend to have a certain look. If you have time to look at the spam that evades your filters, you can catch a lot of it.

The real challenge comes in looking for intelligence—whether in business, finance, politics, or government—in the presence of a motivated and well-funded adversary. If someone wants to fool you—or at least keep you from using an online source—they can improve their chances by better imitating the good data surrounding their junk. The quick glance to identify spam becomes a bigger effort, with more uncertainty.

Pay-to-play blogs may have original content from professional writers, so you can't just look for poor quality. False personas may be developed over time, with extensive material to create a convincing backstory. Networks of such personas could post disinformation, along with more normal-looking content, across multiple sites. With time and resources, personas can appear solid, which is why governments are investing in them.

I think some of the techniques above could help, but it's really a new arms race. The problem for everyone else is that this arms race will tend to poison the social media well for everyone who wants to discuss the contested topics.

If your organization is interested in these topics, don't just read the blog. Call me. As long as this post is, it's the short version. Clients get the full story.

XKCD cartoon by Randall Munroe.

Before you can pull insights from your data, you need data, but I'm hearing more concerns about data quality in social media analysis lately. Before, people asked about the traditional tradeoff in text queries: finding relevant content while excluding off-topic content. Lately, I'm hearing more about social data that's intentionally tainted. If you're looking for meaning in social media data, you may have to deal with adversaries.

Yes, and you've been playing without an opponent, which is, as you may have guessed, against the rules.
— "Anton Ego," Ratatouille

Ask a company with three initials as a name how many three-letter abbreviations are in use, and you get a sense of the challenge in finding relevant content. Common words as brand names pose a similar challenge (I always like the examples of Apple and Orange, because it's the one time you really can compare them). If people are honest and expressing their real opinions, it's hard enough to find what you're looking for.

The problem is, people aren't always honest. You also need to get rid of intentional noise in the data.

The analyst's adversaries

  • Spam
    We've all seen online spam (sorry, Hormel, you must hate that term). Junk mail for hormones and drugs in email, junk comments on blogs, junk blogs, trashy web sites—the costs are so low that even microscopic conversion rates are profitable, so it persists. Some of that shows up in social media, which is the problem here.

    At the recent Social Media Analytics Summit, Dana Jacob gave a talk on the spam that finds its way into the search results of social media analysis platforms, skewing the numbers. One tidbit that Dana shared to illustrate the challenge: If you consider all of the creative misspellings, there are 600 quintillion (6 x 10^20) ways to spell Viagra. So removing all of the spam from your data is a challenge.

    Spam seems to come in two flavors, neither of which will help you understand public opinion or online coverage. One is designed to fool people, to get them to click a link. It may lead to malware or fraud, or to some sort of product for sale. The other is designed to fool search engines with keywords and links embedded in usually irrelevant text. It's usually obvious to a human reader, but the hope seems to be that some search engines will count the links in their ranking of the target site.

  • Gaming analytics platforms
    Another presenter outlined a more direct challenge to the social media analyst when he described his system to game analytics systems with content farms and SEO tactics. He talked about using weaknesses in analytics systems to plant information in them. One slide described his methods as "weaponizing information in a predictive system," which doesn't leave a lot of room for exaggeration.

    He even used a real client as an example. The question is, how many others do the same thing, but discreetly? If you're looking for market intelligence in social media, do you trust your sources?

  • Deception in crowdsourced data
    Another conversation went into the potential poisoning of the crowdsourcing well, in this case one of the crowdmapping efforts in a political conflict. If one party to the conflict entered false reports—perhaps to discredit the project or misdirect a potential response—could it be detected?

  • Sockpuppets
    Beyond the crowdmapping context, can you detect opposition personas that post false reports in social media? It's a standard tactic in the government/political arena, but it could hit you in business, too. All you need is a motivated opponent.
It's a little farther afield, but read Will Critchlow's post on online dirty tricks for more ideas on how our tools can (will) be used against us. If you work with political clients, you'll want to understand how they work. For everyone else, it's another lesson toward being an informed voter.

Next: ideas for detecting deception
I don't mean to be all problem and no solution, but this post is already a long one. I'll share some ideas on how we might detect deception in social media in my next post. For now, I'll end with a happier observation: Sometimes, people lie in real life and get caught when they reveal the truth in social media.

Update: Part 2 is now up: Detecting Deception in Social Media

Photo by John Cooper.

When I visited Gnip a few months ago, I saw something I really liked: books, on shelves and ledges scattered around the office. There's a lot to know in this business, and I liked seeing a company make that small investment in developing their team. Are you making the same investment in yourself?

This is why I usually talk strategy and expanding horizons, rather than specific tactics and metrics: it's just too much to cover. When you start going into the details, you find that there are a lot of details. Really, a lot. If you want to read a detailed view of analytics in social media, you don't need a book, you need a shelf.

Loading the shelf
Social media analysis—monitoring, measuring and analyzing social media data—isn't a specialty, it's several, and that means it takes more than one book to cover the details. If you need to understand the whole landscape, you might consider some of these:

Naturally, I'm working on a book, too. We'll see how long that takes. :-)

Beyond the specialist books, I recommend an expanded reading list that gets into the broader view of analytics and big data, web analytics (which overlaps with social media), and visualization. If you want a challenge, you might go deeper into the science and technology that make the analysis possible. Plus, of course, you'll want to be informed about the thinking on marketing and management roles that you're probably supporting.

By the time you catch up, more good books will be out there, but that's ok, because a good analyst is always learning. The question is, what books would you add to the list?

From the comments:
Keith Paul recommends Social Media Metrics Secrets, by John Lovett.

Sentiment is the stoplight chart of social media analysis. It offers red and green candy for the boss, and a useful filter for the analyst who's moved beyond the mood ring. Still, sentiment analysis is the surest source of disagreement in social media analysis. Why is that?

The human vs. machine debate has been going on for years, because the software's always been close to the frontier of the science. I started writing about it in 2007; five years later, you can still find companies working closely with university researchers to find better technologies for scoring text. The lag between the lab and the commercial product is virtually zero.

The tradeoff for bringing new technology to market as soon as possible is that it won't be good enough at first. You could read The Innovator's Dilemma to see how that tends to play out. As long as text analytics remains an active area of research, today's products won't be as good as next year's, either.

Look more closely at the tools
The obvious question is, "What's good enough?" But you can run tests and evaluate your options, and I'm not usually a fan of the obvious questions. Instead, let's look at some questions that help you get under the hood of tools as you consider them. There's more to it than the usual discussion points suggest.

  • Who or what scored the data?
    Start with the obvious question: is the content scored by a person or a computer program? If it's human, is it by your user, by a vendor analyst, or crowdsourced? If it's automated, is it the vendor's own system or another company's?

  • How does the automated scoring work?
    If the system provides automated sentiment scoring, how does it work? The engine that does the scoring could be as simple as a word match or as advanced as one of those research projects that just left the lab. Listen for descriptions of machine-learning approaches or systems that parse the structure of individual sentences. For machine-learning-based systems, can users correct scores, and if so, does the system learn from the changes?

  • At what level is sentiment scored and reported?
    Does a sentiment score reflect a document, a sentence, or a statement within a sentence? How are document-level scores determined? How does the system handle, for example, a positive statement about Brand Y in the midst of many negative statements about Brand X? How does it score documents with mixed sentiment (multiple statements with opposing sentiment)?

  • What's the scale?
    How many points are used on the sentiment scale—three, five, 100? If there's a number associated with sentiment, is that an intensity scale or a confidence score?

  • Does the system go beyond sentiment?
    Does the system analyze statements of opinion beyond sentiment? Can it identify emotions, preferences, or intent?
We could probably get well-informed people to debate each of those topics (sounds kind of fun, actually). Remember that this is an area of continuing research and development, and what's not possible today may be common next year. There's a reason I didn't take a position on the best approach in this post.

It's not sufficient simply to check the right boxes, especially with sentiment analysis. You need to stick with the topic for the long explanation from each vendor, if you want to understand what you're looking at. Let them make the case for their preferred approaches, and then you can make an informed choice.
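
To see why the scoring level matters, here is a minimal sketch of the simplest engine mentioned above, a word match, scored per sentence and then summed for the document. The lexicon and function names are hypothetical, and no vendor's engine is implied; real systems use far larger lexicons or trained models.

```python
import re

# Hypothetical toy lexicon for illustration only.
POSITIVE = {"love", "great", "fast"}
NEGATIVE = {"hate", "slow", "broken"}


def score_sentence(sentence):
    """Word-match score: +1 per positive word, -1 per negative word."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


def score_document(text):
    """Return per-sentence scores and their sum as a document-level score."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    per_sentence = [score_sentence(s) for s in sentences]
    return per_sentence, sum(per_sentence)


review = "Brand X is slow. Brand X support is broken. I love Brand Y."
per_sentence, document = score_document(review)
print(per_sentence, document)
```

The document-level score comes out negative even though the only statement about Brand Y is positive. If you're tracking Brand Y, a document-level score would mislead you here, which is exactly why it's worth asking at what level a tool scores and reports sentiment.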

Photo by Blue Funnies.

Do you put social media data on a map? Location is a handy dimension for slicing, dicing, and visualizing your data. The question is, which location are you visualizing? Even a single tweet, in under 140 characters, can have four different locations.

I've taken a real interest in applying geospatial analysis to social media over the past year. It's been especially appropriate in emergency management and some other discussions with government types. Mostly, though, it's just another lens to apply to social media data, another way to find some value in the data we have now.

So, you want to put social media activity on a map. It's worth thinking about what that location really represents. One little statement can have four distinct locations, depending on how you look at it:

  1. Location of the service/server
    Internet-based communications happen in this virtual space where physical location is largely irrelevant, but everything runs on a computer somewhere—even in the cloud.

    You could even separate this one into two (or more) locations—the locations of the server and of the company that owns it—but for most of us, these are the least relevant locations. A few specialists need to know the physical or logical location of a server, but for the rest of us, there's nothing to see here.

  2. Location of the account
    Look at an account on Twitter, Facebook, or other social network. Most of them have a place for users to provide their location. Its accuracy depends on the account owner, which is why you see so many Twitter accounts located in "Earth" or something similarly uninformative. During the pro-democracy protests in Iran, a lot of people set their Twitter locations to Tehran in sympathy with the protesters.

    At its most useful, the location associated with an account tells you a default location for a user—home base.

  3. Location of the post
    Social and mobile are increasingly two aspects of the same technology-adoption trend, as more people take their social media through mobile devices. With geolocation tagging and location-based services, they're sharing their immediate location: "I am here, now." This is the location you're most likely to see represented on a map.

  4. Location of the described event
    This last location won't be encoded in an API, because it's found in the content people share. When they talk about events in the real world, they mention places, possibly indirectly. You'll need a text analytics tool that recognizes locations to extract those. When they post pictures, the photos may include location metadata from the camera.
Let's put them all together with a couple of hypothetical examples. We'll ignore the location of the server, because it's not relevant for most uses.

  • Let's say that I tweet about an event in Egypt (4) during a break at a conference in Washington (3). My account location (2) is in North Carolina. How does that compare with a geotagged photo (4) of the same event sent from Cairo (3) by an account that says it's located in Cairo (2)?

  • It's another stormy day in the middle of America, and someone posts a picture of a damaged building (4) on Facebook. The account location (2) and post location (3) are nearly the same, and they're in the projected path of a tornado, based on National Weather Service radar data. Do you believe that a tornado hit the building?
Despite all of that muddying of the water, you're probably ok if you use the per-post geolocation data for most purposes. When in doubt, always remember to state your question clearly, and then you can pick the right data to answer it.
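
Here is a minimal sketch of "state your question, then pick the data," using the first hypothetical example above. The field names are made up for illustration and are not any platform's actual API schema, and the place lookup is a tiny stand-in for a real text analytics tool. The coordinates are approximate.

```python
# Hypothetical post record; field names are illustrative only.
post = {
    "account_location": "North Carolina",   # (2) free text from the profile
    "post_geotag": (38.9072, -77.0369),     # (3) where the post was sent from
    "text": "Watching the protests in Cairo from a conference break.",
}

# Stand-in for a text analytics tool that recognizes place names.
KNOWN_PLACES = {"Cairo": (30.0444, 31.2357), "Tehran": (35.6892, 51.3890)}


def event_locations(text):
    """(4) Places mentioned in the content itself."""
    return [name for name in KNOWN_PLACES if name in text]


def location_for(question, post):
    """Pick the location that matches the question being asked."""
    if question == "where is the author based?":
        return post.get("account_location")
    if question == "where was this sent from?":
        return post.get("post_geotag")
    if question == "where did this happen?":
        return event_locations(post["text"])
    raise ValueError(f"unknown question: {question}")


for q in ("where is the author based?",
          "where was this sent from?",
          "where did this happen?"):
    print(q, "->", location_for(q, post))
```

Three questions, three different answers from the same post. Plotting the geotag would put this one in Washington, which is right only if the question is where the post was sent from.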

Illustration: Map of a Twitter status object by Raffi Krikorian.

Does your social media program include foreign language requirements? Even if your company does business in only one country, you might need multiple languages. The question is, how much capability do you need to check off the language box?

An email from a vendor contact in Tokyo reminded me of a conversation at the Tech@State conference a few weeks ago. We were talking about monitoring Arabic-language social media, and someone pointed out that their analysts know Arabic. They don't need translation; they just need to collect the content for their analysts to read.

It's an important distinction, and it led us into a conversation about what a monitoring platform needs to do to support different types of users. The short answer is, language capability is more than a one-box checklist item. You have to know your needs in order to evaluate tools.

International support in the software
Let's start with the scenario from that initial conversation: an organization is looking for a software platform for monitoring or analyzing social media content in a specific set of languages. Here are some tool capabilities that might back up a claim about language support:

  1. Find content written in a language.
    Theoretically, all you need is support for the required character set, search terms in the desired language, and a broad range of sources. In practice, it's harder to collect content in some countries and languages than others. Ask about source coverage in the countries you need to include.

  2. Translate foreign-language content.
    In the age of Google Translate and other machine-translation programs, it's easy to add a translate button to a tool. If your needs are simple, machine translation could be good enough.

  3. Filter content by language.
    The most basic level of language support involves identifying the language used in a text. Based on some of my testing, that's harder than it looks. Tools that can identify source languages usually offer filters based on the language, which is useful for directing items to analysts who can read them, as well as for analysis of content by language.

  4. Apply text analytics to content in the language.
    Adding more languages to the analytics engine of a social media platform is hard work. I've heard from several sources that adding sentiment analysis in another language, for example, is equivalent to starting over. If you want your tool to do text analytics in a specific language—sentiment, topics, entity extraction, and the rest—ask specifically if those features are supported in the languages you need to analyze.

  5. Provide a user interface in a language.
    So far, this has all been about the content. If you're working with native-speaker analysts, though, you may also want to support them with a user interface in their language. I've talked with people at companies that monitor social media in multiple languages using teams in multiple countries. Giving them a UI in their own language(s) is a nice touch, and one that probably pays off in increased productivity.

International support in services
Now, let's look at the other side of the business: the services market. One easy way to add coverage of additional markets is to send the work to an agency that has those capabilities. The first question is, can they support the languages you need? The follow-up question is, how do they do it?

  1. Multilingual analysts
    Is it adequate to have an analyst who knows the language? Depending on your circumstances, it could be.

  2. Native language analysts
    Anyone who's studied a foreign language knows that it's easier to learn as a small child. Native fluency makes the analyst more likely to catch subtleties that a non-native speaker might miss.

  3. Native analysts located in foreign market
    If your native-fluent analysts are current residents of the foreign country of interest, they may be better attuned to current events and cultural trends than their peers working in another country.

  4. Vendor based in foreign market
    Social media analysis firms are virtually everywhere (try searching a country name in the directory). You can find native analysts who work abroad for international firms, and you can find them working for smaller firms based in their country. Working with foreign vendors adds complexity, but it could be the right answer in some circumstances.
It's easy to make up a list of languages and mark them yes or no. When I did my first report on companies in social media analysis in 2007, I didn't go much farther than that (I did ask about native fluency among analysts). If you're building a capability with international scope, be clear about the level of language support you need, and you'll be a big step closer to finding the right partners for your program.

I've been thinking lately about nuances in product requirements. More to follow.

Photo by David Weekly.

In late 2006, I decided it would be interesting to find every company in the world that offered social media listening tools or services. I thought I'd find a few dozen companies. Five years on, I've found hundreds of companies, and I'm still finding more. Today, I'm sharing the database that I've been using to track them: Companies in Social Media Analysis.

The directory is a bit more than a list. Each company gets its own page, which includes:

  • A link to the company's main website
  • A link to their Twitter account
  • The location of the company's main office
  • A description of the company and its social media analysis products and services, provided by the company itself
  • Links to recent news items on Social Media Analysis that mention the company
  • A Twitter widget showing the company's most recent tweets
Not every company has responded to my request for a description, but I expect more to figure it out soon. :-)

You could browse the list of companies, but it would take you a while—the directory is launching with 292 entries. Instead, I recommend the search function, which has a few tricks up its sleeve. In addition to searching the company descriptions, it can find keywords that don't show up on the company pages, such as states and provinces (spelled out), names that have changed, and companies that have been acquired.

But wait, there's more
Building the directory gave me the push I needed to redesign SMA. It still has the industry news it's always had, and I've made it easier to find the acquisitions scorecard, where I keep track of M&A activity, and the roundup of third-party product reviews. There's a new roundup of investments in social media analysis, and I've added a job board (yes, it accepts international listings). Finally, I added the social network icons that are a required part of social media web sites in 2011.

I always intended for SMA to be a sort of online trade journal, the best source of information about what's going on in the market. The new sections are a step in that direction, and I hope you like them.

The full database just passed 400 companies, which includes many that are no longer active in the market. Listening companies, are you on the wrong list?

Every few seconds this morning, TweetDeck brings another comment on today's announcement that Salesforce.com is buying Radian6. The announcement's not exactly a surprise—Radian6 was the obvious acquisition target in social media analysis (SMA), and their platform had supported Salesforce integration since mid-2009. The price ($340 million in cash and stock, including retention bonuses for the founders) is larger than expected, but overall, it's a logical deal that surprises no one.

Blogging about the day's big news is sort of obvious, so I'll focus on the question I haven't seen raised yet: what does the Radian6 acquisition mean to the other companies in social media analysis? A few thoughts for discussion:

  1. Radian6 just took a big step toward solidifying its position as the standard.
    Radian6 was already the company most likely to be mentioned in any discussion of social media monitoring (note the careful use of the term). The Salesforce endorsement makes them the default choice for 92,000 Salesforce customers. Competitors need more than a me-too monitoring platform to win.

  2. Acquisitions say something about the segmentation of social media analysis.
    The Radian6 deal says a lot about interest in social CRM, or the integration of social media monitoring and customer relationship management. Other acquisitions have tied SMA firms to PR/media (Sysomos/Marketwire, Brandtology/Media Monitors), market research (Cymfony/TNS, Umbria/JD Power, Evolve24/Maritz), and marketing management (Techrigy/Alterian). SMA is a feature set that can work into multiple categories; look for SMA companies to focus their feature sets on specific use cases, and expect acquisitions that work into acquirers' existing businesses.

  3. Enterprise software has noticed social media analysis.
    Salesforce joins SAS in making a serious move to tie social media analysis to nuts-and-bolts business operations. Social business software companies Jive and Lithium have picked up their own listening platforms. Any acquisitions or product announcements by IBM, Oracle, and SAP should be completely expected, but pay attention to the emerging distinctions between social media analytics and social media monitoring (see #2).

  4. Obvious acquisition candidates are getting harder to find.
    Despite the presence of 300+ companies in the space, only a handful of the product leaders are still independent companies. Radian6 is easily the most recognized name in SMA, so most of the remaining independents are not widely known. Looking back to my report on social media analysis platforms for workgroups (March 2010), six of the 21 companies have been bought in the last year.

I don't think there's one obvious candidate for the next acquisition, and in any case, any deal has to start with specific goals (just like any purchase). If it's not already obvious, the many companies in SMA are not the same. The differences in what they do and why are what make the space so interesting. I know you want the list, though, so here's a quick reaction on who I might look at:

  • Attentio, Brandwatch, Sentiment Metrics and Synthesio don't usually come up in the social media conversation in the US, but they all have solid SMA platforms.

  • Converseon blurs the lines that divide research, creative marketing, and management consulting services. They drive me crazy because they're always working out the same things I am. Nobody else does the one-stop social media shop like they do.

  • Visible Technologies has unique capabilities in managing the response component in social media monitoring, as well as a nicely designed interface for working with solid analytics capabilities.

If your needs are more specific (you want an analyst team or a virtual focus group capability, for example), the list gets longer quickly. And, of course, we have regional specialists around the world who can help fill gaps in your coverage. If you want a real recommendation based on your company's goals and gaps, call me.

Disclosure: Radian6 is paying my way to their user conference next week. I consult with companies (usually buyers) on partnerships and acquisitions in social media analysis, but I do not represent the companies listed here.

If you start to lose track of all the combinations, remember that I'm keeping a score card on acquisitions in social media analysis.

Judging from the way people are talking about it, social media analysis is segmenting into at least three subspecialties. As usual, we're using multiple labels that occasionally overlap, so the potential for miscommunication is great. Whatever the utility of any one approach, companies need a complete set of tools, so let's keep these emerging specializations in context.

In 2007, I asked for opinions on a generic term for social media monitoring, analysis, research, etc. I settled on social media analysis as an existing term that could stretch to fit the tools and services then on the market. Since then, I've also argued for an expansive interpretation of the listening metaphor. Lately, though, I'm seeing a lot more of these labels:

  • Social media monitoring
    In 2005, companies started to learn that people were talking about them online and they needed to pay attention. Today, we have tools and case studies, and more companies are prepared to notice and respond when someone mentions them. The response might come from a customer service or PR function, but the basic idea is what Radian6 calls "the social phone": social media represent a new customer-service touchpoint, and companies need to respond to every mention that merits or requires a response.

  • Social media analytics
    Every 15 minutes, someone announces a new tool for measuring social media. Most of these focus on the structured data of social media: seemingly hard numbers, such as friend/follower counts, mentions, shares, likes, and Facebook pageviews. This approach blends social media and web analytics, and it's good for questions such as, "is my Facebook campaign working?" If your ROI comes from online sales, this approach is an especially powerful tool for managing social media marketing efforts.

  • Social media intelligence
    Analyzing the content of what people say online (topics, sentiment, emotions, and the trends and underlying causes) is starting to be called social media intelligence (I refuse to use the unfortunately abbreviated buzzword, social intelligence, in this context). This is perhaps the least consistently applied label, but whatever you call it, measuring and analyzing online content looks increasingly distinct from measuring online activity (the analytics view).

But wait, there's more!
We're inventing new terms faster than old terms fade away, and the boundaries are anything but clear. I haven't quite figured out whether Social CRM is the intersection of social media monitoring and CRM or a superset of CRM and all three of the above. Social media measurement combines aspects of the analytics and intelligence views. Here and elsewhere, the definition of the term seems to depend on who's talking about it.

This doesn't begin to cover all of the variations in terminology we're using, and these categories aren't even mutually exclusive. But they do represent a division I'm seeing in both the thinking about, and the capabilities of the tools for, listening in social media. We're getting better (?) at talking past each other, which is not making it easy for beginners.

Update: All that and I forgot to mention social media research—thanks to Annie Pettit for the reminder in the comments. Also, here are a few of the many posts that inspired the topic:

Photo by Dan Thompson.

Ethics and social media monitoring: so much at stake, but the existing standards are linked to specific business functions. Can we fix that? Converseon suggested some questions for clients to use in avoiding service providers with problematic practices. Let's go a step farther and think about appropriate ethical standards for companies that do the actual monitoring and analysis work, regardless of which functional silo they support.

I have a few suggestions:

  1. Obey applicable laws.
    Stay legal—always nice to include that in the code. This will be trickier than it sounds, because (a) the law that applies to online monitoring is "complicated, multi-faceted and unclear," and (b) the Internet is global. Whose laws apply in which situations should be good for generating legal fees somewhere.

  2. Match clients' regulatory obligations.
    In addition to government regulations that apply to them directly, service providers should comply with requirements that apply to their clients. Service providers shouldn't be in the business of doing work that clients are prevented from doing themselves. Yes, this requires learning about clients' regulatory environments.

    Clients should extend their own compliance standards to service providers working for them—if you can't do it, don't hire an outside company to do it for you.

  3. Honor sites' terms of service.
    Whether terms of service are enforceable is a legal question that will eventually be settled, but the strong ethical position is to monitor sites on their terms. If you need to hide your identity or play cat-and-mouse games with site admins, you're in the wrong.

  4. Be transparent in your monitoring.
    Don't conceal your identity, through either technical or non-technical means. Your IP address should map to your company. When using an individual profile to monitor or interact on a site, disclose the individual's affiliation with either the service provider or client.

  5. Respect privacy norms in closed settings.
    Blog monitoring was OK because blogs are publicly available. If a site requires an individual login and community norms hold that information stays within the community, don't use it. These sites create an expectation of personal privacy that should be respected.

  6. Don't overburden servers with automated requests.
    Sites exist to serve their users, or to reach an audience, or to conduct business. Manage your data collection activities to minimize negative impacts on servers.

  7. Where multiple codes of ethics may apply, observe the more restrictive code.
    Existing codes from other fields may impose extra requirements that still apply. For example, entering a community to observe it is ethnography, which has its own ethical standards.

  8. Be honest with clients.
    Don't make promises that your technology can't keep or present insights that aren't supported by the data. If the client wants something you can't do, admit it. If they want something you won't do (or shouldn't), educate them. As Converseon's list suggests, your ethics protect them, too.

  9. Don't freak out the natives.
    It's not good for your business, anyway. The more people think of what you do as creepy, the more likely you are to face regulatory pressure or other challenges. Besides, it's not nice.
I've already heard from an industry insider who's concerned about the potential impact of others' privacy violations on his business. He's right to be concerned. Credit card companies and credit bureaus have assembled vast databases from information that consumers can't control. We can be freaked out about it, but we can't do anything about it. Scare enough people about what happens with their information in social media, though, and they could stop using social media altogether (unlike consumer credit).
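
For the developers in the audience, suggestion 6 (don't overburden servers) is the one item on the list that translates directly into code. Here's a minimal sketch in Python of a per-host throttle that enforces a pause between automated requests to the same site. The class name HostThrottle and the two-second default are my own assumptions for illustration, not an industry standard; pick a delay that suits the sites you monitor and whatever their terms of service ask for.

```python
import time
from urllib.parse import urlparse


class HostThrottle:
    """Hypothetical helper: keep a minimum gap between requests to one host."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between hits on the same host
        self._clock = clock               # injectable so the logic can be tested
        self._sleep = sleep
        self._last_request = {}           # host -> time of the previous request

    def wait(self, url):
        """Pause if the host was contacted too recently; return seconds waited."""
        host = urlparse(url).netloc
        waited = 0.0
        last = self._last_request.get(host)
        if last is not None:
            remaining = self.min_interval - (self._clock() - last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        # Record the moment this request is allowed to proceed.
        self._last_request[host] = self._clock()
        return waited
```

In use, you would call throttle.wait(url) immediately before each fetch, so that a crawl of many pages on one site spreads out over time while requests to different sites are not held up by each other.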

Do we need an industry standard?
Incidents like the one in yesterday's WSJ, and the attitudes exhibited in some of the quotes in the article, increase the likelihood of government intervention and externally imposed rules. Who'd rather create a clear and relevant ethical standard for the listening business before that happens?

I've already heard that this topic is too sensitive for an open discussion online. If you want to pursue this, let me know, and we can decide on the right venue.

About Nathan Gilliatt
