Before you can analyze, you need data. When I think about what you can do with social media data, I find it helpful to sort it into three buckets: content, activity, and people data. Let's talk about content. If you look at social media from one angle, that's what it is: lots of content. What do you do with that?
What is Content Data?
When we talk about listening and how people express their opinions, we're talking about working with content data. From the text of tweets, blog posts, and product reviews to pictures, videos, and audio recordings, content is everything that people are posting and sharing online. When people ask about sentiment, opinion, and complaints, they're asking about content.
Analyzing Content Data
Remember consumer-generated media? That was the mindset in 2006 when I started looking for companies that worked with social media data. People were empowered by these new, "Web 2.0" technologies to share their thoughts and opinions with a global audience. The companies they talked about suddenly needed to pay attention, and the existing paradigm with the closest fit was media analysis. So, much was borrowed.
The media analysis world was about understanding media coverage, when media meant professional writers and paid publications. You could count things: how many articles mentioned you, how many times you were mentioned within articles, and how that compared with the competition. You could rate mentions as favorable or not, and you could see if your messages were picked up by journalists. There's more to it, but you get the idea.
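To make the counting idea concrete, here is a minimal sketch in Python. The brand names, the sample articles, and the `mention_counts` helper are all made up for illustration, and simple whole-word matching is only one assumption about what counts as a mention.

```python
import re
from collections import Counter

def mention_counts(articles, brands):
    """Count, per brand, how many articles mention it and how many mentions occur overall."""
    articles_mentioning = Counter()
    total_mentions = Counter()
    for text in articles:
        for brand in brands:
            # Whole-word, case-insensitive match; real systems need smarter matching.
            pattern = r"\b" + re.escape(brand) + r"\b"
            hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
            if hits:
                articles_mentioning[brand] += 1
                total_mentions[brand] += hits
    return articles_mentioning, total_mentions

# Hypothetical articles and brands, purely for illustration.
articles = [
    "Acme launched a new phone. Reviewers say Acme beat Globex on price.",
    "Globex reported earnings today.",
    "Nothing to see here.",
]
by_article, by_mention = mention_counts(articles, ["Acme", "Globex"])
print(dict(by_article))
print(dict(by_mention))
```

Comparing the two counters for your brand and a competitor gives a rough share-of-coverage number, which is the kind of comparison described above.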
It turns out that a lot of established media analysis techniques work for consumer-generated media, too. The challenge is that the new media sources generate a lot more content, so you need to sample the data or automate the process to keep up.
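Sampling is the simpler of those two options. As a rough sketch, assuming the posts are already collected in a list, a simple random sample could look like this. The `sample_posts` helper and the fixed seed are illustrative choices, not a recommendation for any particular sampling design.

```python
import random

def sample_posts(posts, k, seed=0):
    """Return a simple random sample of k posts, without replacement."""
    rng = random.Random(seed)  # fixed seed so the same sample can be drawn again
    if k >= len(posts):
        return list(posts)
    return rng.sample(posts, k)

# Stand-in data: 10,000 placeholder posts.
posts = [f"post {i}" for i in range(10_000)]
subset = sample_posts(posts, 200)
print(len(subset))
```

A human analyst can then read and score the 200 sampled posts instead of all 10,000.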
The other paradigms that usually enter discussions of content data are opinion research and the customer service queue. You can hardly turn around without running into them: social media as "the world's largest focus group" and as the new channel where customers expect a response.
Turning Content Into Usable Data
The promise of all this content is that people are sharing their thoughts with anyone who pays attention. The challenge is in turning the data into something that can be analyzed. That's where we get into coding the data: scoring it for sentiment, identifying the topics and entities (such as people or companies) discussed, and rating the opinions and emotions expressed. It's hard work, especially when the content comes in many languages.
In the case of text—posts, tweets, and the like—turning raw text into usable data is the job of text analytics. Whether they use statistical approaches that compare new texts to previously scored texts, or they parse the grammar to "read" the content, text analytics systems take text in and give coded, structured data out. From there, the processing gets easier.
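The "text in, coded data out" shape is easier to see with a toy example. The sketch below is not how a real text analytics system works. It swaps in a tiny hand-made word list for the statistical or grammatical approaches mentioned above, and the word lists, entity names, and `code_text` function are all assumptions for illustration.

```python
import re

# Toy lexicons and entity list, invented for this example.
POSITIVE = {"love", "great", "excellent"}
NEGATIVE = {"hate", "broken", "terrible"}
KNOWN_ENTITIES = {"acme", "globex"}

def code_text(text):
    """Turn raw text into a small structured record: sentiment, score, entities."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Score is positive-word count minus negative-word count.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        sentiment = "positive"
    elif score < 0:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    entities = sorted({t for t in tokens if t in KNOWN_ENTITIES})
    return {"text": text, "sentiment": sentiment, "score": score, "entities": entities}

mixed = code_text("I love my Acme phone, but the Globex charger is broken.")
complaint = code_text("Acme support was terrible.")
print(mixed)
print(complaint)
```

Note that the first post comes out neutral because one positive word cancels one negative word, even though a human reader would see praise for one company and a complaint about another. That gap is a small taste of why coding content well is hard.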
Not all content is text, but more of it could be. Back in the professional media world, you might be able to get transcripts or closed-caption data to augment video content. Beyond that (and even deeper into the research lab than text analytics), you can find systems that extract speech from audio and video, converting it to text for further analysis. Finally, most content sources include hidden metadata, such as topic tags and author information, that adds context and clues for analysis.
There's a lot to content analysis, which is why it's a growing specialty. I've spent a lot of time blogging about it here over the years, too. But if we step back and look at the big picture, it's only one of three types of social media data.
Photo by Michael Sauers.