April 2007 Archives

Human vs. machine analysis


How do you like your social media analysis? Do you want the speed and scalability of an automated process, or do you prefer the subtlety and insight of a human analyst? Companies offering these services disagree on whether software or people are better at the task, and they're taking different approaches to answer the question.

First, let's define analysis. For this discussion, I'm talking about the process of rating individual items—posts, comments, messages, articles—on things like topic, sentiment and influence. Summarizing the data from many items and creating charts and reports come later.

The extremes
One end of the spectrum is fully automated analysis. Some companies have invested significant time and capital in systems that automate text analysis. They use terms like patent-pending and natural-language processing to describe software that "reads" and scores social media. Automated processes are usually—but not always—behind client dashboards.

The other end of the spectrum is human analysis, for those not convinced that computers can accurately rate written materials. These companies talk about human insight, subtlety and the ability to identify sarcasm. Some make a big deal of the quality of their employees, which makes sense, since their services are the product of their analysts' thought processes.

That's a fairly clear contrast, but it's not that simple. Human analysts benefit from the speed of computers, and automated processes benefit from occasional oversight. Which brings us to two hybrid forms that a number of companies have adopted.

Software-assisted human analysis
The essence of human analysis is the decision making. It's not necessary to make the analyst do all the work when insight is the critical component. So some companies use software that organizes items and provides a user interface for the analyst. The system may even suggest preliminary scores for the analyst to confirm. Software-assisted human analysis uses the computer's speed to increase the efficiency of the human analyst.

Human-assisted software analysis
Software analysis is about speed, scale and predictability. The question is whether the resulting analysis is accurate enough to be useful. So some companies have human analysts audit the results. The process provides confirmation and feeds into machine-learning processes. Human-assisted software analysis uses human insight to check and improve the accuracy of the software.
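To make the audit step concrete, here's a minimal sketch in Python. It isn't any vendor's actual process: the function name, the item IDs and the sentiment labels are all hypothetical, and it assumes the human analyst reviews only a sample of the items the software has scored.

```python
def audit(software_scores, human_scores):
    """Compare software ratings with a human audit of the same items.

    Both arguments map an item ID to a label (hypothetical labels here).
    Returns the agreement rate on the audited items and the human
    corrections, which could be fed back as training data.
    """
    # Only items that both the software and the analyst rated can be compared.
    audited = [item for item in human_scores if item in software_scores]
    if not audited:
        return 0.0, {}
    # Where the two disagree, keep the human label as the correction.
    corrections = {
        item: human_scores[item]
        for item in audited
        if software_scores[item] != human_scores[item]
    }
    agreement = 1 - len(corrections) / len(audited)
    return agreement, corrections


# Made-up example: the software scored four posts, the analyst audited two.
software = {"post-1": "positive", "post-2": "negative",
            "post-3": "positive", "post-4": "neutral"}
human = {"post-1": "positive", "post-3": "negative"}

agreement, corrections = audit(software, human)
print(agreement)
print(corrections)
```

The agreement rate answers the "accurate enough to be useful" question for the audited sample, and the corrections are the part that would feed a machine-learning process.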

In practice, most companies hedge their bets. Those with major software investments sell the benefit of automated analysis in their dashboards while offering human analysis and interpretation as separate services. In essence, they offer computer speed and scale and human subtlety and insight in separate packages. The human-analysis companies tend toward the software-assisted model for its efficiency benefits. Almost everyone offers custom research based on the combination of human insight and analytical software. When it's time to crunch the data, everyone seems to agree on that particular combination.


Can't say that this was much of a surprise. BuzzMetrics and The Nielsen Company announced today that Nielsen will acquire the rest of BuzzMetrics. Given that Nielsen already owned most of BuzzMetrics and had put its name on the service, this should reduce some confusion. I guess it explains an offhand comment about the challenges of integration, too.

Upon completion of the BuzzMetrics and NetRatings transactions, Nielsen's premiere Internet information services—which are marketed as Nielsen//NetRatings and Nielsen BuzzMetrics—will be consolidated into a single service unit.

See the insider comments from Jonathan Carson, Pete Blackshaw, Max Kalehoff and (insider emeritus) Matt Hurst.

You can find a complete roundup of mergers and acquisitions in the industry in SMA's acquisitions scorecard.

You wouldn't think we would need ideas from social media to apply to real-world interactions. "Conversation," after all, is a metaphor online, but in a physical space, it's literal. That is, unless the room is large, one person is up front and, most likely, that presenter is talking in front of page after page of bullet points. Enter the user-generated presentation, which brings the conversation back into the room and keeps everyone awake.

There's just something about PowerPoint—or at least the way most people misuse it. Garr Reynolds put the topic in my head again with Is it finally time to ditch PowerPoint? His answer: no, but presenters should use slides to support the presentation, not to be the presentation:

If your presentation visuals taken in the aggregate (e.g., your “PowerPoint deck”) can be perfectly and completely understood without your narration, then it begs the question: why are you there?

I subscribe to a few presentation-related blogs, so it's not a surprise to read a critique of PowerPoint and its misuse. But Virginia Miracle, B.L. Ochman, John Windsor and Cord Silverstein have posted on the topic in the last 24 hours. You think a few people are fed up with presenters reading their slides to disengaged audiences?

So it's timely to read about the user-generated presentation given this morning by Maggie Fox at the CapCHI workshop in Ottawa. Rather than a typical stand-up at the lectern with the usual slides, Maggie channeled her inner TV talk show host and let the audience direct the conversation in an open Q&A format.

Rather than standing at the front of the room, I wandered with a wireless mic and passed the hand-held around. There were many times (as I had hoped) that participants answered questions themselves; I liked it best when I passed the mic directly from one person to another—I felt more like a facilitator than a talking head.

When the initial technology platform didn't work as planned, they switched to viewing sites on the screen when they came up in discussion, which worked out well. With the active engagement of the audience, the session covered more information than usual and followed the interests of the people in the room. It's a safe bet that the audience (or is that "the people formerly known as the audience"?) stayed awake, too.


Languages under the radar


As Dave says, it's that time again, and in the wake of the new Technorati report (now known as The State of the Live Web), I'm once again wondering about international social media analysis. Specifically, I'm curious about support for Asian languages, which are better represented among bloggers than among the companies that monitor and analyze blogs.

One of the more interesting bits in the Guide to Social Media Analysis is the language matrix, which is now 28 languages wide. It's no surprise to see English on everyone's list, but the overall range is eye-opening, from Arabic to Ukrainian. The most widely used languages in the blogosphere are well represented, with two exceptions: Farsi (#10) and Japanese (#1).

Eight companies support Chinese, including one company in Shanghai (some companies use translation services, which I'm not including here). With the third most popular language in the blogosphere and the most feet on the planet, China's interest is obvious. Richard Edelman observed last November:

In general, there appears to be quite an active anti-corporate, anti-multinational voice on the blogosphere in China. The average blogger is a 30 year old male, of modest means, venting resentments. Japanese companies are the #1 target, with US companies just behind.

So the need is there, the interest is there, and companies are offering the services to clients who want to know what's being said in China.

Japanese, on the other hand, seems to be under the radar, despite its position as the leading language in the blogosphere. Only two companies I know of support Japanese, including one company in Tokyo. Japanese Internet users clearly know how to use blogs to express themselves—although they may have a preference for flames over discussion:

While anonymous writers have long used third-party online bulletin boards such as "2 channel" to criticize individuals and corporations in a phenomenon known as matsuri (meaning "festival"), the difference between matsuri and enjo [flames] is that with enjo, there is "no escape route" for those under attack, as it is their own blogs that are being targeted, Ohya noted.

Are companies not paying attention to Japanese blogs because of the flame wars? Is the language too hard? Are clients not asking for Japanese coverage? Why is Japanese coverage so hard to find?

Korean blogs remain off virtually everyone's radar. Technorati and Edelman gave up, and only one company I've heard from supports Korean (remember, not counting translations). I've heard that Korean culture treats social media as more personal, and there's that issue of missing ping support in Korean blog platforms. Korean representation at the WOMMA Summit suggests that we're missing something, though the attendees I talked to knew of only manual blog monitoring in Korea.

The newest addition to the Technorati Top 10 Languages, Farsi, is flying below the radar, too. At 1% of blogs, it's tied with German and half the level of French, Portuguese and Russian—all languages supported by multiple social media analysis companies. Nobody I've heard from supports Farsi... yet. Anyone wonder what 60 million native Persian speakers might be talking about in those blogs?

Maybe I need to make my own chart comparing language usage in blogs with language support in social media analysis. Hmm...


Visual text analysis


How many ways can you visually summarize text? I mean, stats are nice, but visuals soak into the brain so much faster. An ICWSM paper on visual analysis of weblog content reminded me of other examples of ways to depict text.

The ICWSM paper was presented by a team from Pacific Northwest National Laboratory, who used PNNL's IN-SPIRE software for their visualization. Jeff Clark shows a different text visualization tool on his blog with a visualization of the 2007 State of the Union address. The SOTU is a popular source among text visualizers, because of the interest in identifying themes in the address and the availability of years of data for comparison.

Jeff's visualization illustrates connections between selected terms in the 2007 address and where they appear within the text. An interactive version allows users to explore their own selected terms within the text.

Jason Griffey used TagCrowd to create a tag cloud (text cloud?) for the same SOTU address. This simple visualization illustrates the frequency of the most-used words in the text (with a filter to omit the uninteresting words, such as articles and conjunctions). The cloud doesn't offer as much insight as other visualizations, but it can offer themes at a glance when it works, and the cloud text lends itself to a drill-down application.
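The counting behind a cloud like that is simple enough to sketch in a few lines of Python. This is an illustration of the general idea, not how TagCrowd works: the stop-word list is a tiny hypothetical one, the function name is invented, and the sample sentence is made up rather than taken from any address.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list; a real tool would use
# a much longer one covering articles, conjunctions, pronouns and so on.
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "of", "to", "in", "is"}


def top_terms(text, limit=10):
    """Return the most frequent non-stop-words as (word, count) pairs."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return counts.most_common(limit)


# Made-up sample text, used only to show the output format.
sample = "The state of the union is strong, and the union is growing."
for word, count in top_terms(sample, limit=3):
    print(word, count)
```

A cloud renderer would then scale each word's font size by its count, which is where the at-a-glance themes come from.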

Brad Borevitz created an interactive site that compares every SOTU address since 1790. Click on the timeline, and the cloud adjusts to that year's address. Full text of each address (with the selected term highlighted) and links to historical context are all part of the fun, but the easiest chuckles come from the relative length of the addresses. Can you identify the presidents with the longest addresses without looking?

Style.org has a SOTU parsing tool that highlights search terms in a greeked-text view of multiple addresses. Pick a keyword and see where it appeared in multiple years.

Text visualizations are more than just eye candy. Ed Schipul created keyword density charts to identify themes in recent addresses. He suggests a similar test for writers to see if they're staying on message. His keyword density analyzer creates the data from any web page.

Robert Kosara has links to more examples in visualization sets information free.


Motivated by fear

Is fear the leading motivator for clients who want to monitor social media? A new BusinessWeek article focuses on companies' fears of online attack, but most of the people I talk to are more focused on using social media for market research. Even defensive monitoring activities aren't necessarily motivated by fear.

Most companies are wholly unprepared to deal with the new nastiness that's erupting online. That's worrisome as the Web moves closer to being the prime advertising medium—and reputational conduit—of our time. "The CEOs of the largest 50 companies in the world are practically hiding under their desks in terror about Internet rumors," says top crisis manager Eric Dezenhall, author of the upcoming book Damage Control. "Millions of dollars in labor are being spent discussing whether or not you should respond on the Web."

Web Attack leads with the scare factor, but it doesn't get to whether the examples actually affected the businesses. A barrage of angry emails isn't damaging; failing to correct the issues that inspired the emails is. Online forums create a place for critics to gather, and while they can attract the wrong kind of attention to your brand, they also make it easier for you to know how your stakeholders want you to change. Painful, maybe, but the desire to improve your business is a better motivator than fear.

I'm not sure grouping social media analysis companies with ReputationDefender under a "counter-vigilantes" label is much of a compliment, either. I can read it as "opposed to vigilantes," but I think most readers in a hurry will interpret it as "the vigilantes on the other side."

B2B presents a more representative view of social media analysis as a tool for reputation management:

While public opinion surveys and media analysis have been traditional tools for measuring reputation, monitoring and analyzing online blogs and forums increasingly is important.
Blogosphere sentiment also can constitute leading-edge intelligence, since bloggers tend to pick up industry trends almost immediately and, unlike mainstream media, flog them heavily.

Fear of online attack is one motivator, sure. But there's more to learn than whether employees resent the CEO, and companies should get beyond fear to understand what they can learn from online sources. I haven't yet seen an example in which a company's performance was hurt by the online conversation in the absence of an underlying problem in the real world.


About Nathan Gilliatt

  • Voracious learner and explorer. Analyst tracking technologies and markets in intelligence, analytics and social media. Advisor to buyers, sellers and investors. Writing my next book.
  • Principal, Social Target