Today's announcement that Twitter is buying Gnip raises big questions about the market for social media data. While it's too early to know how things will fall out, the deal changes the shape of the playing field for everyone involved—publishers, data resellers, software developers, and corporate customers.

Twitter has bought other companies in the social media analysis space—BackType (2011), Bluefin Labs (2013), Trendrr (2013)—but Gnip is a bigger deal. Gnip competes with other Twitter partners, and Twitter competes with other Gnip partners. If you weren't sure, things just got interesting.

As a reminder, here's my view of the social data ecosystem:

Social data ecosystem

Anyone who works with data from social media sources has an interest in how the rest of the ecosystem reacts to the Gnip acquisition. Here's my initial take on what to watch for:

  • Twitter competitors
    Twitter isn't the only data source for Gnip. Gnip's sources include full feeds from Tumblr, Foursquare, WordPress, and more. It also manages API access for Facebook, Google, and others that probably see Twitter as a competitor. How will these companies ("publishers" in the data market) react to the deal? Will access to data from Twitter competitors remain available through Gnip?

  • Gnip competitors
    Twitter has offered its data through multiple data partners; how will DataSift, Dataminr, and NTT Data fit into the revised model? What impact will that have on their customers? (In a post, DataSift says its "relationship, contract and data resyndication partnership" are unchanged.)

  • Other data providers
    There are other companies in the social data business, mainly those specializing in collecting data from blogs and forums. Will they add (or drop) services in response to the changing market?

I won't speculate on the answers to these questions today, but they're the questions I'm pondering in the wake of the announcement. Change reverberates, so these are things to watch.

I've asked Twitter for a comment, but I suspect we just have to wait for the answers.

Get the latest industry news at Social Media Analysis.

Poisoning the Online Well

Garbage in, garbage out. The latest from the ongoing Snowden/Greenwald revelation is a reminder that interested parties know how to plant false information on the Internet, and that some of them are probably doing it. It has implications for anyone looking for good information online, anyone with a reputation to protect, and—potentially—for everyone invested in the online world.

The piece itself is worth a look (How Covert Agents Infiltrate the Internet to Manipulate, Deceive, and Destroy Reputations). The details are more disturbing than surprising, but as you read it, ignore the focus on the British intelligence agency GCHQ. It doesn't matter whether you trust your own government's actions, and the common distinction between a country's own citizens and everyone else is also irrelevant. The same tactics are available to every government—and any other motivated group. If they don't do this already, the newly released document provides the suggestion.

For the government intelligence guys, this is just a continuation of the second oldest profession: Get your enemy's secrets; protect your own. Deceive your enemy; avoid deception. It's a challenge when multiple entities are simultaneously trying to (a) get useful information from open sources online and (b) plant deceptive information in the same sources. I wonder how much blue-on-blue deception happens between information operations and open-source intelligence gathering, anyway.

For everyone else, this latest report should serve as a reminder of some of the risks in social media:

  1. Data quality risk
    People tell lies online—I know, but it's true. Some of the false information out there may have been placed by a motivated adversary who wants to mislead you (maybe even you, specifically). The target may be your organization, a related organization or someone who wants to work with you.

    The information you find online can be a useful source, but it's not the only source. If you're informing significant decisions, use all of your available resources, and be alert to the possibility of intentional deception.

  2. Reputation risk
    We're familiar with the concept of online reputation risk; corporate risk managers seem to think it's almost synonymous with "social media." If your business has potential exposure to government opposition (from whatever country), your risk may come from a better organized and funded source than the usual unhappy former customer.

  3. Target risk
    As people conduct their personal and political lives online, they expose themselves to snooping and more. The threats to personal privacy and freedom by government agencies have made the ongoing revelations newsworthy, but these public and semi-public channels are equally exposed to anyone who disagrees.

  4. Collateral damage risk
    Some of these information operations happen in the same online venues as normal personal use. As competing governments start viewing the online world through the cyber battlespace lens, normal users and the platforms themselves could take some damage. Off the top of my head, I'm thinking of legal, market, and technical risks, but that's probably just a start.

    It's too much to go into in a post, but companies with significant exposure to covert online tactics would be well served to chase down the implications of those tactics, and they shouldn't limit the discussion to legal exposure. Beyond the specifics of any one program, the revelations of the last year indicate the willingness of government entities in multiple countries to use environments operated by private-sector companies in ways they weren't intended. The safe assumptions are that governments are doing more than we know, and so are other types of organizations.

Politically, it matters very much who is doing what to whom and why. As a practical matter, who and why don't much matter. It's enough to know that someone, somewhere is developing and using methods to turn popular online tools against people and organizations they don't like. If you depend on online tools and don't have a basic literacy in the concept of cyberwar, it's time to learn, so you can recognize it if it comes to your neighborhood.

One of the great strengths of the Internet is the way it overcomes the limitations of distance. A side effect is that it also does away with the concept of a safe distance from danger.

Updating the Highlights Reel

In 2007—has it really been so long?—I posted a list of older posts that I thought were worth remembering. The relentless updating of the reverse-chronological blog format was hiding some good stuff, and I wanted people to find it. Over time, some of those old posts became truly outdated, and I've gotten into some new themes. It was time for an update, and in the process, I was reminded of where we've been—and where we're going.

The complete list: Highlights from the Archive

History of social media
The updated list goes all the way back to 2006, when I first sketched out the role of the social media manager. It's not quite what I would write today, but I think it holds up reasonably well, especially given that the perceived need at the time was "blogger relations." Somewhat more recently, the posts on influence and the meaning of "Like" aren't exactly what everyone else had to say on those topics.

Social media analysis
From "listening" to the latest emerging tech for analytics, I've been watching and writing about SMA for years. A 2008 post on the building blocks of social media analysis set the stage for later lists of companies offering the various pieces. I still like the three buckets of social media data framework as a way of sorting out the many tools in the market, too.

I particularly enjoyed rereading Language Support in Social Media Analysis, a detailed look at all the different ways that a vendor might check the language box. In my public speaking, I tend to go high-level and generalize a lot, and this example shows why. When you get into the specifics, they get very specific, and heavily dependent on a client's situation.

Expanding horizons
For several years, there's been some tension between the blog that started with a strong emphasis on social media and the topics I find interesting more recently. I've hinted at some of the topics with the summer reading posts and some others, and now it's time to put more emphasis on the new stuff.

The whiteboard series of posts was a step toward sharing some of the speculation that develops on the literal whiteboards in my office. The Omniscience, computer attention, and learning ecosystem ideas from that series are themes that I need to revisit, and there are others in the drafts folder.

Expect more connecting of dots from diverse sources, such as last year's Simulations, Customer Journeys, and the Link Between What Could Happen and What Did Happen. I'm not sure why I'm still surprised to find connections between the seemingly unrelated topics I dig into. The latest example crosses long-term policy analysis, simulations, wargames, the mechanics of human insight, network science, and associative memory—my sources keep citing each other. There's no social media angle, just fascinating stuff.

I've been involved in working through the meaning and implications of new technologies for a long time, and there's less for me to do once a technology reaches mass adoption and people understand it. With the social media market maturing into something that holds fewer mysteries, I plan to write more about those new topics.

Onward.

Social Media Analysis is my attempt at a sort of online industry trade journal covering the companies that work with social media data. Last year, I started a recap of the financial transactions in the business, so let's catch up with 2013.

2013 Saw More, Bigger Investments in Social Media Analysis
First, where the investment money went. And boy, did it go, more than $465 million. The champion fundraiser this year—by far—was HootSuite, with $165 million added to its runway.

The Year in M&A (and an IPO), Social Media Analysis 2013
Once all those companies are funded, some of them get acquired. One even went public. The big theme seems to be consolidation, as buyers picked up companies with complementary technology, products and people. At this rate, we should finish concentrating the industry by about 2080.

SMA would be better with more content, but I need help if it's going to get it. I have ideas for new sections, including opinion columns, product reviews, how-to articles and more. Anyone interested in becoming a contributor?

I'm going to do something old-school and blog about a couple of blog posts today. Consider it a break from the latest outragefest on the 'book. Instead, let's share bright ideas about large-impact innovation and how we've been looking for it in the wrong places. It's what happens when two posts, posted months apart, cross my desktop in the same morning.

First up: Jerzy Gangi's post from August, Why Silicon Valley Funds Instagrams, not Hyperloops, runs down the reasons that venture-funded startups keep launching relatively easy web-based software applications. It's worth a read. The short version is, that's what the investment system is looking for, and [insert Willie Sutton quote here].

Next is "Killer Apps" Evolve, Vinnie Mirchandani previewing Chunka Mui and Paul Carroll’s new book, The New Killer Apps: How Large Companies Can Out-Innovate Start-Ups. Google's self-driving cars are one example (built with investment from both corporate and government sources).

We shouldn't be surprised that startups and investors play by the rules of the game. Innovation and addressing the big issues of our time, however, are not the game they're playing.

The M&A market can be characterized as a giant distributed R&D department for major corporations.
— Jerzy Gangi

Remember corporate R&D? Bell Labs, PARC, Lockheed's Skunk Works? Big companies exist to take on projects and markets that are too big for small companies, and part of what they do is large-scale innovation. Whether they invent in their own labs or build from acquired startups, big changes that take place in the physical world will happen only when somebody puts serious capital behind them.

It's interesting that the old-school sources of innovation—university, government and corporate labs—are still out there, and despite long-term reductions, they're still at work. If we're looking for the world-changing innovations, maybe we just need to put more effort into learning about them and their projects.

Everyone loves a chart that answers a key question, but I particularly like the ones that make you think: Why did that happen? What changed? What are we missing? What happens next?

A spike on a chart is a big ol' why, waiting to be asked.
me, 2010

It's an old point, but a few examples came to me last week. Beyond the immediate interpretation of the numbers (e.g., big number good, small number bad), I think these patterns imply follow-up questions along the lines of "what happened here" and "why did it happen?"

  • Spike in a trend
    A sudden change means something happened. What? Why? Did the value then return to the usual range? Is the new value temporary or a new normal? Do you need to take some action as a result? The spike is the chart telling you where to look, which I suspect most people do instinctively.

  • Smooth line on a historically bumpy trend
    A bumpy trend line that grows more stable is telling you something else, but the follow-up questions are similar. Did the data source stop updating, or is the change real? Remember to watch the derivatives of your metrics, too. If the metric keeps changing but the rate becomes constant, is that real or an artifact of the data collection? What happened, why, what action in response…

  • Crossing lines
    A is now bigger than B; does it matter? Obviously, it depends on what A and B represent, but it's a good place to understand: what happened, why, what it means, how much it matters, and whether to expect it to continue. If it's a metric that people care about, expect to discuss it.
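
For readers who keep their metrics in code, the three patterns above can be flagged automatically so that a person knows where to look and what to ask. This is only a minimal Python sketch under stated assumptions: the function names, the window size, and the thresholds are all hypothetical choices for illustration, not an established method, and the right values depend entirely on the metric.

```python
from statistics import mean, stdev


def find_spikes(values, window=5, threshold=3.0):
    """Return indexes whose value is more than `threshold` standard
    deviations away from the mean of the preceding `window` points.
    Both parameters are illustrative assumptions."""
    spikes = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        spread = stdev(history)
        if spread == 0:
            # Perfectly flat history: any change at all is a departure.
            if values[i] != history[-1]:
                spikes.append(i)
            continue
        if abs(values[i] - mean(history)) > threshold * spread:
            spikes.append(i)
    return spikes


def find_constant_runs(values, min_steps=4, tolerance=0.0):
    """Return (start, end) index pairs where the point-to-point change
    (the first difference) stays the same, within `tolerance`, for at
    least `min_steps` consecutive steps. A flat line and a suspiciously
    steady climb both qualify."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    runs = []
    start = 0
    for i in range(1, len(diffs) + 1):
        if i == len(diffs) or abs(diffs[i] - diffs[start]) > tolerance:
            if i - start >= min_steps:
                runs.append((start, i))  # covers values[start] through values[i]
            start = i
    return runs


def find_crossings(a, b):
    """Return indexes where series `a` switches from below `b` to above
    it, or the reverse. Exact ties are ignored in this sketch."""
    crossings = []
    for i in range(1, min(len(a), len(b))):
        before = a[i - 1] - b[i - 1]
        after = a[i] - b[i]
        if before * after < 0:
            crossings.append(i)
    return crossings


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    print(find_spikes([10, 11, 10, 12, 11, 40, 11, 10]))
    print(find_constant_runs([3, 7, 5, 9, 9, 9, 9, 9]))
    print(find_crossings([1, 2, 3, 5, 6], [4, 4, 4, 4, 4]))
```

None of these functions answers the follow-up questions. They only point at the places where "what happened here, and why?" is worth asking.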

Beyond the numbers
Thinking beyond the graphs, I remembered two things from conceptual diagrams that always make me curious:

  • Empty boxes in a matrix
    If the framework makes sense, its boxes should be filled in, whether it's the consultant's standard two-by-two matrix or something much larger. An empty box may represent an impossible combination—but it could be a missed challenge or opportunity. I once found $12 million in sales in an empty box, and so empty boxes always get my attention.

  • Solid lines around a space
    A clear definition says as much about what something isn't as what it is. When the definition takes the form of a diagram—an org chart, a Venn diagram, a network graph—I wonder about what's just outside the diagram. The adjacent markets and competitors from the future; the people who are near—but not in—an organization. What does the white space represent, and what does that mean to you?
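
The empty-box check in particular is easy to run mechanically once a framework is written down as two lists of labels. Here is a minimal Python sketch, assuming a hypothetical two-by-two of customers and products; the labels and the `empty_boxes` helper are invented for illustration only.

```python
from itertools import product


def empty_boxes(rows, columns, observations):
    """Return the (row, column) combinations of a framework that have
    no observations filed under them."""
    filled = set(observations)
    return [cell for cell in product(rows, columns) if cell not in filled]


if __name__ == "__main__":
    # Hypothetical framework and entries, purely for illustration.
    customers = ["existing customers", "new customers"]
    products = ["existing products", "new products"]
    deals = [
        ("existing customers", "existing products"),
        ("new customers", "existing products"),
        ("existing customers", "new products"),
    ]
    print(empty_boxes(customers, products, deals))
```

Whether an empty box is an impossible combination or a missed opportunity is still a judgment call; the code only makes sure the question gets asked.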

These came to me as I was getting ready to attend a lecture by Kaiser Fung (which was excellent; ask him about the properties of big data). I'm sure there are many more. Without wading into technical analysis waters, what other patterns make you stop and think?

Mapping the Social Data Ecosystem

If you want to work with social media data, you first need some data. But "social media data" isn't a single thing, and sourcing it involves decisions about what you need and where you get it. Those decisions have technical, business, and even legal implications, which is why I've been working on a new research theme for Social Target: the social data ecosystem.

The project grew out of a whiteboard session with a client last year. I showed them how social media data—Twitter content in their example—is available from multiple sources, but your choice affects what you get and creates requirements for your systems to handle it.

The first draft
I've turned that original sketch into this map of the industry, which I'm treating as a hypothesis in the research phase. As I talk with companies in the various categories, I expect to validate the model and get a better understanding of how the interfaces work.

Social data ecosystem

This is what I mean by the social data ecosystem. It starts with the companies who collect data directly from their users, and it ends with the analyst or manager who is looking for information in social media. In the middle is where all the data changes hands and software turns it into something useful.

Exactly what happens in between is interesting and a bit complicated—but perhaps a bit less complicated once this project is complete.

What's your experience?
I'm interviewing companies throughout the ecosystem now. In addition to understanding the different business models in play, I'm also asking about current issues in the market. I'd like to know what's working, and what's not.

I'd like to hear from you, too. What's been your experience in working with social media data? Comment here or contact me privately, and let's find out together what's going on in this fast-evolving market.

Summer Reading (Summer Not)

As the kids go back to school and we ease back into more normal schedules, it's time to take a look back at some of what came off the reading pile in recent months. No novels this year, but if you're interested in learning something, I have a few suggestions.

Big data, bigger questions
Viktor Mayer-Schönberger and Kenneth Cukier's Big Data (2013) is an approachable introduction to the trendy topic for readers who need the introduction, but it also gets into important topics for people already in the space. After their very readable sections on the what, why, and how, the professor (Mayer-Schönberger) and the journalist (Cukier) move into the implications of following the big data path, including the risks to privacy and individual freedom. Even if the beginning of the book is a review for you, stick with it until the end. The last third of the book covers issues you—we—need to be thinking about.

I thought the next book on the pile would be a change of subject, taking a deeper look into the freaky world of cyberwar and cyber criminals, but the beginning of Black Code (2013) was a smooth transition into even more implications of what we're doing with data and the online world. Ronald Deibert directs the Citizen Lab at the University of Toronto's Munk School of Global Affairs, and he's found more than a few things to be concerned about online.

Writing before secrets started flowing from the NSA and elsewhere, Deibert links data mining, pervasive surveillance, and cyber crime/war (those last two, it turns out, are indistinguishable at the tactical level). If you use electronic communications for anything at all sensitive, you need to read this one. Even if you've read every bit of news out of the Snowden leak, you'll learn more from Deibert's global take on the same themes.

Analyzing the working of wetware
I might have to mark up my copy of Thinking, Fast and Slow (2011) to change the name to Reading, Fast and Slow. Daniel Kahneman's book on how we have two competing systems for processing information (one reflexive and the other thoughtful) was too good to read in the short time allowed by the public library. Now that I have my own copy, I'm taking my time with this one. It turns out that psychology didn't stop learning after my college psych class, and some of the observations have practical applications.

Not a summer book, but worth mentioning is Nassim Taleb's Antifragile (2012), which hides some thought-provoking nuggets in its pounds (kilos) of pages. Antifragile is the follow-up to The Black Swan (2007), and its point is either to illustrate how to deal with the dark birds or to send the reader running to a philosophy refresher course. Taleb never entirely escapes his roots as a trader, but he will make you think about your relationship with uncertainty and how to benefit from outcomes that most would consider negative.

On a lighter note
I'm winding down with Mitch Joel's Ctrl Alt Delete (2013), an update on the intersection of business and trends in social media. For someone who reads a lot of blogs and other online discussions, it has a lot of review, but he puts pieces together in ways that should inspire new ideas for your business. Especially for those of us who have been working around social media for a long time, some of the observations are helpful for undoing our comfort level with what we already know. As it turns out, it's not 2007 any more, and how people interact with media and a company's marketing efforts is still changing.

This post has become a bit of a tradition. If you like this, you might enjoy these posts from previous years, too: 2012, 2011, 2010

Writing at Wired UK, Paul Wright has some concerns about the use of social media monitoring in law enforcement: Meet Prism's little brother: Socmint. I'll quote a couple of sections, but you need to read the whole piece; its tone is at least as important as its content.

For the past two years a secretive unit in the Metropolitan Police has been developing the tools for blanket surveillance of the public's social media conversations,

Operating 24 hours a day, seven days a week, a staff of 17 officers in the National Domestic Extremism Unit (NDEU) has been scanning the public's tweets, YouTube videos, Facebook profiles, and anything else UK citizens post in the public online sphere.

The intelligence gathering technique—sometimes known as Social Media Intelligence or Socmint—has been used in conjunction with an alarming array of sophisticated analytical tools. [emphasis added]

Wright has a fairly alarmist—but accurate—take on something that's obvious to anyone who thinks about it: outside of a few protected spaces, what we do in social media is public, and government security and law enforcement agencies are using that data. It's the details of what they do with it that will make some people uncomfortable.

The problem is that "public" is gaining new depth of meaning as information moves online, and we haven't sorted out the implications.

Nothing changes, but everything's changed
The new public information is persistent, searchable, and rich with analytic potential. I wrote about this last year (Why Government Monitoring Is Creepy), and it's still where I think we need to start. People seem to be expecting a sort of semi-privacy online, but the technology doesn't have that distinction. Data is either public or private, and the private space is shrinking.

The "alarming array" of tools refers to all the interesting stuff we've been talking about doing with social media data for years: text analytics, social network analysis, geospatial analysis… For business applications, we've mostly talked about analysis on aggregate data, but if you apply the lens toward profiling individuals and don't care about being intrusive, you can start to justify the concerns.

But several privacy groups and think tanks—including Big Brother Watch, Demos and Privacy International—have voiced concerns that the Met's use of Socmint lacks the proper legislative oversight to prevent abuses occurring.

It's worth noting that Wright's piece is specifically about law enforcement use of social media data, and he points to others who are concerned about overreach by law enforcement agencies. Here are the organizations mentioned, along with links to some of their relevant work:

This is the social data industry's PRISM problem: the risk that the revelations of intelligence agency practices will raise broader privacy concerns that include the business use of public social media data. They're different issues, but the interest sparked by the NSA disclosures has people thinking about privacy.

In this case, Wired makes the connection explicit with their headline, calling social media intelligence "Prism's little brother." As Wright demonstrates in his article, open-source social media monitoring raises issues, too.

Legitimate questions, too
There's more going on here than a question of perception. If invasion of online privacy gains traction as an issue, the important distinction between public and private data is only part of the issue. If we limit the topic to public data, the question becomes, what are the limits to the use of public data?

An important part of answering that question will depend on understanding why there should be limits, which goes to what is being done with the data. It's going to be worth separating the concepts of accessing the data and using it. What you do in your analysis may be even more sensitive than the data you base it on.

People are sharing more than they realize, and analysts can do more with that data than people think. As monitoring becomes pattern detection becomes predictive modeling, it becomes more likely to make people uncomfortable. Last year's pregnant daughter is this year's precrime is next year's thoughtcrime, or so the thinking goes.

Will concerns like this lead to new restrictions by governments or the companies who control the data? Will people cut back on their public sharing? Or will these concerns fade when the next topic takes the stage (squirrel!)?

What are the constraints?
The existing limits on social media monitoring and analysis boil down to this: If it is technically possible, not illegal, and potentially useful, do it (depending on your affiliations, professional ethical standards may also apply). What we're seeing is that the unrestricted use of social data has the potential to make people uncomfortable, which could have consequences for those who would use the data.

It's worth thinking about the constraints on using social data, which involves more than the ethics question. I have some thoughts, which I'll share later.

Trust is an issue for an industry based on extracting meaning from what people share in social media. People don't have to use these services, and if they decide that their information might be used against them, they can stop. This week's revelations about the US intelligence agency monitoring social networks (among other sources) create a massive trust issue for everyone who works with social media data. What now?

(This won't be an analysis of the NSA and Prism. I'm working from the same sources you are, and we'll probably have new information by the time I finish writing, anyway.)

The world reacts to US actions
David Meyer points out a threat to cloud computing vendors as customers and governments react to the news. US-based vendors can expect special challenges selling in Europe, where privacy is more protected and signs of a blowback from Prism are already appearing. In cloud computing, the trust issue relates to custody of the data—do you trust your vendor to keep your data safe and secure?—and the government version translates as a question of US-based vendors' ability to keep commitments to foreign governments.

But cloud computing is essentially just data center outsourcing. What does it mean for an industry that exists because of people's willingness to share publicly?

Access to the data is everything
The challenge to the social data industry is different. It's indirect, but potentially existential. What happens to your business as a result of the reaction to Prism? Will social networks tighten their terms of use to block data mining? Will EU safe harbor agreements create new requirements to protect user data (possibly by keeping it outside the US)? Will new legislation designed to limit government abuses include new limits on private-sector users?

Secret collection of private data by government agencies is fundamentally different from social media monitoring outside government. In business, we're working with publicly available data, which anyone can access without breaking the law or hacking a system. It's not espionage, but the facts aren't the problem.

The problem, as ever, is perception. The NSA is all over the news, and in the heated environment of a breaking story, subtle distinctions can get lost. The risk to the social data industry is that a reaction to government surveillance could become a problem for anyone doing the less intrusive type of monitoring.

How will you respond? What's your plan for minimizing the overreaction if it starts to get out of hand?

Responding as an industry
At its Big Boulder conference this week, Gnip announced the Big Boulder Initiative, which is an effort to start an industrywide discussion of the issues it faces. Trust is one of five issues they highlighted as starting points for discussion. The other news this week highlights the wisdom of the choice.

I'd go farther and ask a question I've asked before: should the companies who work with social data form an association to coordinate these discussions, codify standards, and speak for the group?

The ethics of social data
Let's go back to trust and consider the ethics of working with social data. Bob Gourley at CTOVision recently gave me a copy of Ethics of Big Data, a short e-book that lays out a process for establishing ethical limits to the use of big data. It's a worthy challenge, but I think the first step in the process—exploring an organization's values—will lose everyone. The Friedmanesque view that a business exists only to make a profit is common, which leaves only the law as a restraint on what can be done. "Be profitable" isn't the sort of value that will drive a hearty discussion of ethics.

I do think it's possible to have ethics of listening, but I don't see an existing standard that really applies. I don't see, for example, how ethical standards for social scientists, with their strict limits on personally identifiable information (PII), apply to social media monitoring in customer service. The standard for competitive intelligence boils down to "don't break the law," which appears to be the relevant limit on secret government programs, too.

Here's a starting point for discussion
I suggested a set of ethical standards for listening vendors in 2010 as a starting point, but the discussion went nowhere. Maybe it's time to try again. Comments are closed on the old post, but I'd welcome any discussion of the draft here.

The usual defense of social media monitoring in the private sector is that we're working with publicly available data, but monitoring public data can still be creepy. What's the plan for protecting public-source data mining from an overreaction to something far more invasive?

Photo by Marsmettnn Tallahassee.

About Nathan Gilliatt

  • Voracious learner and explorer. Analyst tracking technologies and markets in intelligence, analytics and social media. Studying complexity and futures.
  • Principal, Social Target
