One of the nicest compliments I've received over the years came from a company founder who read one of my reports and said I'd summarized his company's work better than they did. It's just one of the things I do—take a pile of information and figure out what it's about. I summarize. So if you need to tease out the short version of something complicated, call me. But I've also been accumulating data on an industry for years, which gives me the material for a different view—the annual recap. Roll tape…

The Year in M&A, Social Media Analysis 2014
I've been tracking companies that extract meaning from social media data since 2006 (it stays interesting if you let the definitions evolve with the market). One way to tell how things are changing is to watch where the money goes, and in 2014, more money flowed to consolidation. VC and PE money funded multiple acquisitions by companies staking out hoped-for prominent positions. Big companies tucked SMA into their products and portfolios, and smaller companies chose "buy" over "build" for key capabilities.

Add some actual mergers and a few acquihires, and we get more transactions than in 2013. In other news, it takes longer to write a recap of 38 deals than one with 18 deals, which is how a year-end post shows up in early January. :-)

More Than $420 Million Invested in Social Media Analysis Companies in 2014
New investments in SMA companies were slightly below 2013 levels in dollar terms, although once you allow for deals of unannounced size, the difference is probably within the margin of uncertainty. Some of that money has gone to fund acquisitions, and anybody who took a round of more than $20 million bears watching, but we're also still seeing funding for interesting and innovative companies in the space where social media and data analysis intersect.

Based on the last year's investment activity, look for continued product innovation and market evolution, in addition to ongoing consolidation.

So here's a summary: The opportunities in social media analysis are evolving, and heavy bags of money are being directed toward exploiting them. For the long version and its application to your situation, contact me about becoming a client.

Where do you find books to read? Do you ask your friends, follow reviews or seller recommendations, or just go for the bestsellers? Whether you like your books on paper or downloaded, you have to know a book exists to read it, and because we're in the twenty teens, there's a social way to find out online.

Start where you are?
An obvious way to learn about books online is to ask your social networks—wherever you're connected to people online, just ask 'em. If you use different networks for different purposes, that should inform where you ask, but you have the connections. Sometimes it's just as easy as asking.

But asking doesn't always work. A discussion on Facebook about paper and ebooks this week included just such a request, but it drew no responses. So what else can we do?

Networking for readers
How about a social network specifically for readers of books? Goodreads is exactly that, a social network built entirely upon books and the people who read them. You can look through reviews and recommendations organized by books and authors, or approach it socially, with its friends, followers and groups.

I'm getting great ideas from some very smart people I follow on Goodreads. Because of its tight focus on books, I find it easier to maintain a careful approach to connecting in Goodreads than in other networks. In addition, Goodreads' updates are tied to specific books, so it doesn't have the noise problem of other networks.

On another level, Goodreads creates yet another opportunity for public image tailoring, because its entries aren't automatic. Some of us might be a bit selective in what we choose to share—more professionally relevant titles than pop fiction, for example—but that actually improves Goodreads as a socially powered recommendation engine. If people I follow choose to share only the good stuff, they're effectively curating the recommendation lists.

Gems from Twitter
Goodreads runs on effort from people in its network; what about suggestions from people who haven't joined? BookVibe takes a different approach, pulling book mentions from a user's Twitter stream to generate its lists. It's not as far along as Goodreads, and there's some overlap, but it does have the advantages of pulling its recommendations from a network you've already assembled and using existing behavior as its raw material.

BookVibe strikes me as a worthy experiment, another startup finding useful information by applying a novel analytical lens to the flood of Twitter data. In this case, the startup is Parakweet, a natural-language processing specialist that set up BookVibe as a technology demonstration.

Remember blogs?
I've seen a few blog posts with suggested reading lists, such as these from the Oxford Martin School and Mention. If you don't have a source on a topic, try searching for "reading list" and a relevant keyword or two. It's not an unusual topic for a blog post or web page.

What about the big dog?
You can't talk about books without mentioning Amazon (I checked—it's a law). I remember an analysis years ago about the many social components of an Amazon product page, although I can't find it now. Product reviews, lists and wish lists are fairly obvious features, and it's possible to find more suggestions by following the creators of reviews and lists. Just find someone you'd like to hear more from and click through to their profile for more of their reviews, lists and tags. It's sort of social, if a bit too much effort.

Amazon has the makings of a really good social network for readers, except that it's missing the social network to run it. That may change, since it bought Goodreads last year. Until then, you can do a bit of social exploration with Amazon's existing features and some manual effort.

Old skool
If all those networks can't suggest good books faster than you read them, then you read too fast. :-) Oh, and the book I'm reading now? I found it on the New Nonfiction shelf at my local library. Curator was a word long before online sharing tools borrowed it.

It's never too late to add something to the summer reading pile. What are you reading that people should know about?

Surveillance whiteboard

As ubiquitous surveillance is increasingly the norm in our society, what are the options for limiting its scope? What are the levers that we might pull? We have more choices than you might think, but their effectiveness depends on which surveillance we might hope to limit.

One night last summer, I woke up with an idea that wouldn't leave me alone. I tried the old trick of writing it down so I could forget it, but more details kept coming, and after a couple of hours I had a whiteboard covered in notes for a book on surveillance in the private sector (this was pre-Snowden, and I wasn't interested in trying to research government intelligence activities). Maybe I'll even write it eventually.

The release of No Place to Hide, Glenn Greenwald's book on the Snowden story, provides the latest occasion to think about the challenges and complexity of privacy and freedom in a data-saturated world. I think the ongoing revelations have made clear that surveillance is about much more than closed-circuit cameras, stakeouts and hidden bugs. Data mining is a form of passive surveillance, working with data that has been created for other purposes.

Going wide to frame the question
As I was thinking about the many ways that we are watched, I wondered what mechanisms might be available to limit them. I wanted to be thorough, so I started with a framework to capture all of the possibilities. Here's what I came up with:

Constraints on personal data

The framework is meant to mimic a protocol stack, although the metaphor breaks down a bit in the higher layers. The lowest layers provide more robust protection, while the upper layers add nuance and acknowledge subtleties of different situations. Let's take a quick tour of the layers, starting at the bottom.

Hard constraints
The lowest layers represent hard constraints, which operate independently of judgment and decisions by surveillance operators:

  • Data existence
    If the data don't exist, they can't be used or abused. Cameras that are not installed, microphones that are not activated do not collect data. Unposted travel plans do not advertise absence; non-geotagged photos and posts are not used to track individual movements. At the individual level, countermeasures that prevent the generation of data exhaust will tend to defeat surveillance, as will the avoidance of known cameras and other active surveillance mechanisms.

  • Technical
    Data, once generated, can be protected, which is where much of the current discussion focuses. Operational security measures—strong passwords, access controls, malware prevention, and the like—provide the basics of protection. Encryption of stored data and communication links increases the difficulty—and cost—of surveillance, but this is an arms race. The effectiveness of technical barriers to surveillance depends substantially on who you're trying to keep out and the resources available to them.

Soft constraints
The upper layers represent soft constraints—those which depend on human judgment, decisionmaking and enforcement for their power. Each of these will tend to vary in its effectiveness by the people and organizations conducting surveillance activities.

  • Legal
    This is the second of two layers that contain most of the ongoing discussion and debate, and the default layer for those who can't join the technical discussion. The threat of enforcement may be a deterrent to some abuse. Different laws cover different actors and uses, as illustrated in the current indictment of Chinese agents for economic espionage.

  • Market
    In the private sector, there's no enforcement mechanism like market pressure—in this case, a negative reaction from disapproving customers. Companies have a strong motive to avoid activities that hurt sales and profits, and so they may be deterred from risking a perception of surveillance and data abuse. This is the layer least likely to be codified, but it has the most robust enforcement mechanism for business. In government, the equivalent constraint is political, as citizens/voters/donors/pressure groups respond to laws, policies and programs.

  • Policy
    At the organization level, policy can add limits beyond what is required by law and other obligations. Organization policy may in many cases be created in reaction to market pressure and prior hard lessons, extending the effectiveness of market pressure to limit abusive practices. In the public sector, the policy layer tends to handle the specifics of legal requirements and political pressures.

  • Ethical
    Professional and institutional ethics promise to constrain bad behavior, but the specific rules vary by industry and role, and enforcement is frequently uncertain. Still, efforts such as the Council for Big Data, Ethics, and Society are productive.

  • Personal
    Probably the weakest and certainly the least enforceable layer of all, personal values may prevent some abuse of surveillance techniques. Education and communication programs could reinforce people's sensitivity to personal privacy, but I include this layer primarily for completeness. Where surveillance operators are sensitive to personal privacy, abuses will tend not to be an issue.

Clearly, the upper layers of this framework lack some of the definitive protections of the lower layers, and they're unlikely to provide any protection from well-resourced government intelligence agencies (from multiple countries) and criminal enterprises. But surveillance (broadly construed) is also common in the private sector, where soft constraints are better than no constraints. As we consider the usefulness and desirability of the growing role of surveillance in society, we should consider all of the levers available.

One step at a time
This framework isn't meant to answer the big questions; it's about structuring an exploration of the tradeoffs we make between the utility and the costs of surveillance. Even there, this is only one of several dimensions worth considering. Surveillance happens in the private sector and government, both domestically and internationally. There's a meaningful distinction between data access and usage, and different value in different objectives. Take these dimensions and project them across the whole spectrum of active and passive techniques that we might call surveillance, and you see the scope of the topic.

Easy answers don't exist, or they're wrong. It's a complex and important topic. Maybe I should write that book.

If I write both the surveillance book and the Omniscience book (on the value that can be developed from available data), should I call them yin and yang?

Today's announcement that Twitter is buying Gnip raises big questions about the market for social media data. While it's too early to know how things will fall out, the deal changes the shape of the playing field for everyone involved—publishers, data resellers, software developers, and corporate customers.

Twitter has bought other companies in the social media analysis space—BackType (2011), Bluefin Labs (2013), Trendrr (2013)—but Gnip is a bigger deal. Gnip competes with other Twitter partners, and Twitter competes with other Gnip partners. If you weren't sure, things just got interesting.

As a reminder, here's my view of the social data ecosystem:

Social data ecosystem

Anyone who works with data from social media sources has an interest in how the rest of the ecosystem reacts to the Gnip acquisition. Here's my initial take on what to watch for:

  • Twitter competitors
    Twitter isn't the only data source for Gnip. Gnip's sources include full feeds from Tumblr, Foursquare, WordPress, and more. It also manages API access for Facebook, Google, and others that probably see Twitter as a competitor. How will these companies ("publishers" in the data market) react to the deal? Will access to data from Twitter competitors remain available through Gnip?

  • Gnip competitors
    Twitter has offered its data through multiple data partners; how will DataSift, Dataminr, and NTT Data fit into the revised model? What impact will that have on their customers? (In a post, DataSift says its "relationship, contract and data resyndication partnership" are unchanged.)

  • Other data providers
    There are other companies in the social data business, mainly those specializing in collecting data from blogs and forums. Will they add (or drop) services in response to the changing market?

I won't speculate on the answers to these questions today, but they're the questions I'm pondering in the wake of the announcement. Change reverberates, so these are things to watch.

I've asked Twitter for a comment, but I suspect we just have to wait for the answers.

Get the latest industry news at Social Media Analysis.

Poisoning the Online Well

Garbage in, garbage out. The latest from the ongoing Snowden/Greenwald revelation is a reminder that interested parties know how to plant false information on the Internet, and that some of them are probably doing it. It has implications for anyone looking for good information online, anyone with a reputation to protect, and—potentially—for everyone invested in the online world.

The piece itself is worth a look (How Covert Agents Infiltrate the Internet to Manipulate, Deceive, and Destroy Reputations). The details are more disturbing than surprising, but as you read it, ignore the focus on the British intelligence agency GCHQ. It doesn't matter whether you trust your own government's actions, and the common distinction between a country's own citizens and everyone else is also irrelevant. The same tactics are available to every government—and any other motivated group. If they don't do this already, the newly released document provides the suggestion.

For the government intelligence guys, this is just a continuation of the second oldest profession: Get your enemy's secrets; protect your own. Deceive your enemy; avoid deception. It's a challenge when multiple entities are simultaneously trying to (a) get useful information from open sources online and (b) plant deceptive information in the same sources. I wonder how much blue-on-blue deception happens between information operations and open-source intelligence gathering, anyway.

For everyone else, this latest report should serve as a reminder of some of the risks in social media:

  1. Data quality risk
    People tell lies online—I know, but it's true. Some of the false information out there may have been placed by a motivated adversary who wants to mislead you (maybe even you, specifically). The target may be your organization, a related organization or someone who wants to work with you.

    The information you find online can be a useful source, but it's not the only source. If you're informing significant decisions, use all of your available resources, and be alert to the possibility of intentional deception.

  2. Reputation risk
    We're familiar with the concept of online reputation risk; corporate risk managers seem to think it's almost synonymous with "social media." If your business has potential exposure to government opposition (from whatever country), your risk may come from a better organized and funded source than the usual unhappy former customer.

  3. Target risk
    As people conduct their personal and political lives online, they expose themselves to snooping and more. The threats to personal privacy and freedom by government agencies have made the ongoing revelations newsworthy, but these public and semi-public channels are equally exposed to anyone who disagrees.

  4. Collateral damage risk
    Some of these information operations happen in the same online venues as normal personal use. As competing governments start viewing the online world through the cyber battlespace lens, normal users and the platforms themselves could take some damage. Off the top of my head, I'm thinking of legal, market, and technical risks, but that's probably just a start.

    It's too much to go into in a post, but companies with significant exposure to covert online tactics would be well served to chase down the implications of those tactics, and they shouldn't limit the discussion to legal exposure. Beyond the specifics of any one program, the revelations of the last year indicate the willingness of government entities in multiple countries to use environments operated by private-sector companies in ways they weren't intended. The safe assumptions are that governments are doing more than we know, and so are other types of organizations.

Politically, it matters very much who is doing what to whom and why. As a practical matter, who and why don't much matter. It's enough to know that someone, somewhere is developing and using methods to use popular online tools against people and organizations they don't like. If you depend on online tools and don't have a basic literacy in the concept of cyberwar, it's time to learn, so you can recognize it if it comes to your neighborhood.

One of the great strengths of the Internet is the way it overcomes the limitations of distance. A side effect is that it also does away with the concept of a safe distance from danger.


Updating the Highlights Reel

In 2007—has it really been so long?—I posted a list of older posts that I thought were worth remembering. The relentless updating of the reverse-chronological blog format was hiding some good stuff, and I wanted people to find it. Over time, some of those old posts became truly outdated, and I've gotten into some new themes. It was time for an update, and in the process, I was reminded of where we've been—and where we're going.

The complete list: Highlights from the Archive

History of social media
The updated list goes all the way back to 2006, when I first sketched out the role of the social media manager. It's not quite what I would write today, but I think it holds up reasonably well, especially given that the perceived need at the time was "blogger relations." Somewhat more recently, the posts on influence and the meaning of "Like" aren't exactly what everyone else had to say on those topics.

Social media analysis
From "listening" to the latest emerging tech for analytics, I've been watching and writing about SMA for years. A 2008 post on the building blocks of social media analysis set the stage for later lists of companies offering the various pieces. I still like the three buckets of social media data framework as a way of sorting out the many tools in the market, too.

I particularly enjoyed rereading Language Support in Social Media Analysis, a detailed look at all the different ways that a vendor might check the language box. In my public speaking, I tend to go high-level and generalize a lot, and this example shows why. When you get into the specifics, they get very specific, and heavily dependent on a client's situation.

Expanding horizons
For several years, there's been some tension between the blog that started with a strong emphasis on social media and the topics I find interesting more recently. I've hinted at some of the topics with the summer reading posts and some others, and now it's time to put more emphasis on the new stuff.

The whiteboard series of posts was a step toward sharing some of the speculation that develops on the literal whiteboards in my office. The Omniscience, computer attention, and learning ecosystem ideas from that series are themes that I need to revisit, and there are others in the drafts folder.

Expect more connecting of dots from diverse sources, such as last year's Simulations, Customer Journeys, and the Link Between What Could Happen and What Did Happen. I'm not sure why I'm still surprised to find connections between the seemingly unrelated topics I dig into. The latest example crosses long-term policy analysis, simulations, wargames, the mechanics of human insight, network science, and associative memory—my sources keep citing each other. There's no social media angle, just fascinating stuff.

I've been involved in working through the meaning and implications of new technologies for a long time, and there's less for me to do once a technology reaches mass adoption and people understand it. With the social media market maturing into something that holds fewer mysteries, I plan to write more about those new topics.


Social Media Analysis is my attempt at a sort of online industry trade journal covering the companies that work with social media data. Last year, I started a recap of the financial transactions in the business, so let's catch up with 2013.

2013 Saw More, Bigger Investments in Social Media Analysis
First, where the investment money went. And boy, did it go, more than $465 million. The champion fundraiser this year—by far—was HootSuite, with $165 million added to its runway.

The Year in M&A (and an IPO), Social Media Analysis 2013
Once all those companies are funded, some of them get acquired. One even went public. The big theme seems to be consolidation, as buyers picked up companies with complementary technology, products and people. At this rate, we should finish concentrating the industry by about 2080.

SMA would be better with more content, but I need help if it's going to get it. I have ideas for new sections, including opinion columns, product reviews, how-to articles and more. Anyone interested in becoming a contributor?

I'm going to do something old-school and blog about a couple of blog posts today. Consider it a break from the latest outragefest on the 'book. Instead, let's share bright ideas about large-impact innovation and how we've been looking for it in the wrong places. It's what happens when two posts, posted months apart, cross my desktop in the same morning.

First up: Jerzy Gangi's post from August, Why Silicon Valley Funds Instagrams, not Hyperloops, runs down the reasons that venture-funded startups keep launching relatively easy web-based software applications. It's worth a read. The short version is, that's what the investment system is looking for, and [insert Willie Sutton quote here].

Next is "Killer Apps" Evolve, Vinnie Mirchandani previewing Chunka Mui and Paul Carroll’s new book, The New Killer Apps: How Large Companies Can Out-Innovate Start-Ups. Google's self-driving cars are one example (built with investment from both corporate and government sources).

We shouldn't be surprised that startups and investors play by the rules of the game. Innovation and addressing the big issues of our time, however, are not the game they're playing.

The M&A market can be characterized as a giant distributed R&D department for major corporations.
— Jerzy Gangi

Remember corporate R&D? Bell Labs, PARC, Lockheed's Skunk Works? Big companies exist to take on projects and markets that are too big for small companies, and part of what they do is large-scale innovation. Whether they invent in their own labs or build from acquired startups, big changes that take place in the physical world will happen only when somebody puts serious capital behind them.

It's interesting that the old-school sources of innovation—university, government and corporate labs—are still out there, and despite long-term reductions, they're still at work. If we're looking for the world-changing innovations, maybe we just need to put more effort into learning about them and their projects.

Spike

Everyone loves a chart that answers a key question, but I particularly like the ones that make you think: Why did that happen? What changed? What are we missing? What happens next?

A spike on a chart is a big ol' why, waiting to be asked.
me, 2010

It's an old point, but a few examples came to me last week. Beyond the immediate interpretation of the numbers (e.g., big number good, small number bad), I think these patterns imply follow-up questions along the lines of "what happened here" and "why did it happen?"

  • Spike in a trend
    A sudden change means something happened. What? Why? Did the value then return to the usual range? Is the new value temporary or a new normal? Do you need to take some action as a result? The spike is the chart telling you where to look, which I suspect most people do instinctively.

  • Smooth line on a historically bumpy trend
    A bumpy trend line that grows more stable is telling you something else, but the follow-up questions are similar. Did the data source stop updating, or is the change real? Remember to watch the derivatives of your metrics, too. If the metric keeps changing but the rate becomes constant, is that real or an artifact of the data collection? What happened, why, what action in response…

  • Crossing lines
    A is now bigger than B; does it matter? Obviously, it depends on what A and B represent, but it's a good place to understand: what happened, why, what it means, how much it matters, and whether to expect it to continue. If it's a metric that people care about, expect to discuss it.
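For anyone who wants a machine to point at these patterns before a person asks the questions, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the five-point window, the three-standard-deviation threshold, and the idea that the metric arrives as a plain list of numbers. It is one simple way to flag the three patterns above, not a standard method, and real metrics would need tuning.

```python
from statistics import mean, stdev


def find_spikes(values, window=5, threshold=3.0):
    """Spike in a trend: indexes whose value sits more than `threshold`
    standard deviations from the mean of the preceding `window` points."""
    spikes = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        spread = stdev(history)
        if spread == 0:
            # Perfectly flat history: treat any change at all as a spike.
            if values[i] != history[-1]:
                spikes.append(i)
        elif abs(values[i] - mean(history)) > threshold * spread:
            spikes.append(i)
    return spikes


def constant_rate_runs(values, min_steps=4, tolerance=1e-9):
    """Smooth line on a bumpy trend: stretches where the first difference
    (the rate of change) stays constant for at least `min_steps` steps.
    Returns (first_index, last_index, rate); a rate of 0 suggests the
    source may have stopped updating."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    runs = []
    start = 0
    for i in range(1, len(diffs) + 1):
        run_ended = i == len(diffs) or abs(diffs[i] - diffs[start]) > tolerance
        if run_ended:
            if i - start >= min_steps:
                runs.append((start, i, diffs[start]))
            start = i
    return runs


def find_crossings(a, b):
    """Crossing lines: indexes where the leader between series a and b
    changes. Ties are skipped until one series pulls ahead again."""
    crossings = []
    previous_sign = 0
    for i, (x, y) in enumerate(zip(a, b)):
        sign = (x > y) - (x < y)
        if sign != 0:
            if previous_sign != 0 and sign != previous_sign:
                crossings.append(i)
            previous_sign = sign
    return crossings


# Hypothetical daily metric, invented for the example.
daily_mentions = [10, 11, 10, 12, 11, 10, 40, 11, 10, 12]
print(find_spikes(daily_mentions))
```

Each function only says where to look. What happened, why, and what to do about it are still questions for a person.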

Beyond the numbers
Thinking beyond the graphs, I remembered two things from conceptual diagrams that always make me curious:

  • Empty boxes in a matrix
    If the framework makes sense, its boxes should be filled in, whether it's the consultant's standard two-by-two matrix or something much larger. An empty box may represent an impossible combination—but it could be a missed challenge or opportunity. I once found $12 million in sales in an empty box, and so empty boxes always get my attention.

  • Solid lines around a space
    A clear definition says as much about what something isn't as what it is. When the definition takes the form of a diagram—an org chart, a Venn diagram, a network graph—I wonder about what's just outside the diagram. The adjacent markets and competitors from the future; the people who are near—but not in—an organization. What does the white space represent, and what does that mean to you?
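The empty-box check is easy to automate once a framework's two dimensions are written down. A minimal sketch, assuming the matrix is described by two lists of labels and a set of the combinations that already have something in them; the function name and the sample segments and offerings are hypothetical, invented for the example.

```python
from itertools import product


def empty_boxes(rows, columns, filled):
    """Return every (row, column) combination with no entry in `filled`."""
    return [cell for cell in product(rows, columns) if cell not in filled]


# Hypothetical two-dimensional framework: customer segment by offering.
segments = ["enterprise", "mid-market", "small business"]
offerings = ["software", "services"]

# Combinations where something already exists (invented for illustration).
covered = {
    ("enterprise", "software"),
    ("enterprise", "services"),
    ("mid-market", "software"),
    ("small business", "services"),
}

for segment, offering in empty_boxes(segments, offerings, covered):
    print(f"Empty box: {segment} / {offering}")
```

The code can't tell an impossible combination from a missed opportunity; it only produces the list of boxes worth a second look.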

These came to me as I was getting ready to attend a lecture by Kaiser Fung (which was excellent—ask him about the properties of big data). I'm sure there are many more. Without wading into technical analysis waters, what other patterns make you stop and think?

Mapping the Social Data Ecosystem

If you want to work with social media data, you first need some data. But "social media data" isn't a single thing, and sourcing it involves decisions about what you need and where you get it. Those decisions have technical, business, and even legal implications, which is why I've been working on a new research theme for Social Target: the social data ecosystem.

The project grew out of a whiteboard session with a client last year. I showed them how social media data—Twitter content in their example—is available from multiple sources, but your choice affects what you get and creates requirements for your systems to handle it.

The first draft
I've turned that original sketch into this map of the industry, which I'm treating as a hypothesis in the research phase. As I talk with companies in the various categories, I expect to validate the model and get a better understanding of how the interfaces work.

Social data ecosystem

This is what I mean by the social data ecosystem. It starts with the companies who collect data directly from their users, and it ends with the analyst or manager who is looking for information in social media. In the middle is where all the data changes hands and software turns it into something useful.

Exactly what happens in between is interesting and a bit complicated—but perhaps a bit less complicated once this project is complete.

What's your experience?
I'm interviewing companies throughout the ecosystem now. In addition to understanding the different business models in play, I'm also asking about current issues in the market. I'd like to know what's working, and what's not.

I'd like to hear from you, too. What's been your experience in working with social media data? Comment here or contact me privately, and let's find out together what's going on in this fast-evolving market.

About Nathan Gilliatt

  • Voracious learner and explorer. Analyst tracking technologies and markets in intelligence, analytics and social media. Advisor to buyers, sellers and investors. Writing my next book.
  • Principal, Social Target