March 2012 Archives

Sentiment is the stoplight chart of social media analysis. It offers red and green candy for the boss, and a useful filter for the analyst who's moved beyond the mood ring. Still, sentiment analysis is the surest source of disagreement in social media analysis. Why is that?

The human vs. machine debate has been going on for years, because the software's always been close to the frontier of the science. I started writing about it in 2007; five years later, you can still find companies working closely with university researchers to find better technologies for scoring text. The lag between the lab and the commercial product is virtually zero.

The tradeoff for bringing new technology to market as soon as possible is that it won't be good enough at first. You could read The Innovator's Dilemma to see how that tends to play out. As long as text analytics remains an active area of research, today's products won't be as good as next year's, either.

Look more closely at the tools
The obvious question is, "What's good enough?" But you can run tests and evaluate your options, and I'm not usually a fan of the obvious questions. Instead, let's look at some questions that help you get under the hood of tools as you consider them. There's more to it than the usual discussion points suggest.

  • Who or what scored the data?
    Start with the obvious question: is the content scored by a person or a computer program? If it's human, is it by your user, by a vendor analyst, or crowdsourced? If it's automated, is it the vendor's own system or another company's?

  • How does the automated scoring work?
    If the system provides automated sentiment scoring, how does it work? The engine that does the scoring could be as simple as a word match or as advanced as one of those research projects that just left the lab. Listen for descriptions of machine-learning approaches or systems that parse the structure of individual sentences. For machine-learning-based systems, can users correct scores, and if so, does the system learn from the changes?

  • At what level is sentiment scored and reported?
    Does a sentiment score reflect a document, a sentence, or a statement within a sentence? How are document-level scores determined? How does the system handle, for example, a positive statement about Brand Y in the midst of many negative statements about Brand X? How does it score documents with mixed sentiment (multiple statements with opposing sentiment)?

  • What's the scale?
    How many points are used on the sentiment scale—three, five, 100? If there's a number associated with sentiment, is that an intensity scale or a confidence score?

  • Does the system go beyond sentiment?
    Does the system analyze statements of opinion beyond sentiment? Can it identify emotions, preferences, or intent?

We could probably get well-informed people to debate each of those topics (sounds kind of fun, actually). Remember that this is an area of continuing research and development, and what's not possible today may be common next year. There's a reason I didn't take a position on the best approach in this post.
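
To make a couple of those questions concrete, here is a minimal sketch of the simplest engine mentioned above, a word match, scored per sentence and then rolled up to a document. The word lists, function names, and the sum-based rollup are all assumptions made for illustration, not any vendor's method, and not a recommendation.

```python
import re

# Hypothetical word lists, far smaller than any real lexicon.
POSITIVE = {"love", "great", "reliable"}
NEGATIVE = {"hate", "slow", "broken"}


def score_sentence(sentence: str) -> int:
    """Word-match score: +1 per positive word, -1 per negative word."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


def label(score: int) -> str:
    """Map a numeric score onto a three-point scale."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def score_document(text: str) -> dict:
    """Score each sentence, then roll up to one document-level label."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    scores = [score_sentence(s) for s in sentences]
    has_pos = any(n > 0 for n in scores)
    has_neg = any(n < 0 for n in scores)
    return {
        "sentences": [(s, label(n)) for s, n in zip(sentences, scores)],
        # Assumed rollup rule: the sign of the summed sentence scores.
        "document": label(sum(scores)),
        "mixed": has_pos and has_neg,
    }


result = score_document(
    "Brand X is slow. Brand X support is broken. I love Brand Y."
)
print(result["document"], result["mixed"])
```

With this rollup, the document comes out negative and flagged as mixed, and the positive statement about Brand Y disappears into the document-level score. That is exactly the kind of behavior worth asking a vendor about.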

It's not sufficient simply to check the right boxes, especially with sentiment analysis. You need to stick with the topic for the long explanation from each vendor, if you want to understand what you're looking at. Let them make the case for their preferred approaches, and then you can make an informed choice.

Photo by Blue Funnies.

What do you get when you look at social media as a source of information about people? This topic usually goes off into a discussion of influence, a result of thinking of social media as media. What if, instead of influencers, you think of the people who participate in social media as individuals?

Obviously, you can go creepy if you do this wrong. Easily. But people are sharing a lot of information about themselves, and there's value in it if you're careful about how you use it.

All you need to start is an identifier (a name, email address, or Twitter handle) and you start connecting dots. When people include, for example, a Twitter handle in a LinkedIn profile, you can have a real name, location, employment, schools… Maybe links to more networks to continue the process, too.

It's not a question of probabilities. When people create links between their various network profiles, it's a clear statement that the accounts belong to the same person.
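
As a rough sketch of what connecting those dots could look like, the snippet below groups identifiers that are tied together by explicit links. The identifiers, the `links` list, and the `group_profiles` function are all hypothetical, and treating every link as a two-way statement is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical explicit links: each pair means one profile points at another.
links = [
    ("linkedin:jane-doe", "twitter:@janedoe"),
    ("twitter:@janedoe", "blog:janedoe.example.com"),
    ("linkedin:john-roe", "twitter:@jroe"),
]


def group_profiles(links):
    """Group identifiers connected by explicit links (union-find)."""
    parent = {}

    def find(x):
        # Follow parent pointers to the representative of x's group.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for identifier in list(parent):
        groups[find(identifier)].add(identifier)
    return list(groups.values())


groups = group_profiles(links)
for group in groups:
    print(sorted(group))
```

Each resulting group is one person's set of accounts, built only from links the account owners created themselves. No probabilistic matching on names or photos is involved.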

Why build when you can buy?
This is another niche for startups, of course. Several are working to reconcile public profiles across multiple social networks and using that information to create information-rich individual profiles. To make it even more useful, most of these companies offer APIs for integrating their profile data into other systems.

Your data plus detailed, individual profiles. What will you build?

Do you put social media data on a map? Location is a handy dimension for slicing, dicing, and visualizing your data. The question is, which location are you visualizing? Even a single tweet, in under 140 characters, can have four different locations.

I've taken a real interest in applying geospatial analysis to social media over the past year. It's been especially appropriate in emergency management and some other discussions with government types. Mostly, though, it's just another lens to apply to social media data, another way to find some value in the data we have now.

So, you want to put social media activity on a map. It's worth thinking about what that location really represents. One little statement can have four distinct locations, depending on how you look at it:

  1. Location of the service/server
    Internet-based communications happen in this virtual space where physical location is largely irrelevant, but everything runs on a computer somewhere—even in the cloud.

    You could even separate this one into two (or more) locations—the locations of the server and of the company that owns it—but for most of us, these are the least relevant locations. A few specialists need to know the physical or logical location of a server, but for the rest of us, there's nothing to see here.

  2. Location of the account
    Look at an account on Twitter, Facebook, or other social network. Most of them have a place for users to provide their location. Its accuracy depends on the account owner, which is why you see so many Twitter accounts located in "Earth" or something similarly uninformative. During the pro-democracy protests in Iran, a lot of people set their Twitter locations to Tehran in sympathy with the protesters.

    At its most useful, the location associated with an account tells you a default location for a user—home base.

  3. Location of the post
    Social and mobile are increasingly two aspects of the same technology-adoption trend, as more people take their social media through mobile devices. With geolocation tagging and location-based services, they're sharing their immediate location: "I am here, now." This is the location you're most likely to see represented on a map.

  4. Location of the described event
    This last location won't be encoded in an API, because it's found in the content people share. When they talk about events in the real world, they mention places, possibly indirectly. You'll need a text analytics tool that recognizes locations to extract those. When they post pictures, the photos may include location metadata from the camera.

Let's put them all together with a couple of hypothetical examples. We'll ignore the location of the server, because it's not relevant for most uses.

  • Let's say that I tweet about an event in Egypt (4) during a break at a conference in Washington (3). My account location (2) is in North Carolina. How does that compare with a geotagged photo (4) of the same event sent from Cairo (3) by an account that says it's located in Cairo (2)?

  • It's another stormy day in the middle of America, and someone posts a picture of a damaged building (4) on Facebook. The account location (2) and post location (3) are nearly the same, and they're in the projected path of a tornado, based on National Weather Service radar data. Do you believe that a tornado hit the building?

Despite all of that muddying of the water, you're probably OK if you use the per-post geolocation data for most purposes. When in doubt, always remember to state your question clearly, and then you can pick the right data to answer it.
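
One way to keep the locations straight is to store them separately and choose by question. The sketch below is only an illustration: the `PostLocations` class, the question names, and the field layout are assumptions, not any platform's API. The sample values come from the first hypothetical example, with the photo's event location assumed to match the tweet's.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PostLocations:
    """The three locations of a post worth mapping (server omitted)."""
    account: Optional[str] = None  # (2) self-reported home base
    post: Optional[str] = None     # (3) geotag: "I am here, now"
    event: Optional[str] = None    # (4) place found in the content


# Hypothetical mapping from the question asked to the field that answers it.
FIELD_FOR_QUESTION = {
    "where_is_the_author_based": "account",
    "where_was_this_sent_from": "post",
    "where_did_this_happen": "event",
}


def location_for(question: str, loc: PostLocations) -> Optional[str]:
    """Return the location that answers the question, or None if missing."""
    return getattr(loc, FIELD_FOR_QUESTION[question])


conference_tweet = PostLocations(
    account="North Carolina", post="Washington", event="Egypt"
)
cairo_photo = PostLocations(account="Cairo", post="Cairo", event="Egypt")

for name, loc in [("tweet", conference_tweet), ("photo", cairo_photo)]:
    print(name, location_for("where_was_this_sent_from", loc))
```

Both posts describe the same event, but only the photo was sent from anywhere near it. Mapping the post location would put the conference tweet in Washington.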

Illustration: Map of a Twitter status object by Raffi Krikorian.

About Nathan Gilliatt

  • Voracious learner and explorer. Analyst tracking technologies and markets in intelligence, analytics and social media. Advisor to buyers, sellers and investors. Writing my next book.
  • Principal, Social Target