Summer Challenge Reading

Have you set a reading challenge for the summer? This is something different: challenge reading, not a reading challenge. Here are four books I've read recently that challenge our assumptions and normal ways of working in today's data-centric world. Have fun.

Programmed and dangerous
First up is Cathy O'Neil's warning of the unintended consequences of giving decision-making authority to algorithms, Weapons of Math Destruction. Enlisting computers to take over the tedium of large-scale decision-making is great for efficiency, but the cost is the increase in systems that (1) harm the subjects of inquiry (2) at scale (3) without accountability. Absorbing her definition of a WMD, alone, is worth the price of admission.

Black-box scores are automating decisions about education, employment, credit, and even prison terms, using criteria that can be arbitrary, unfair, and unaccountable. Even the seemingly harmless work of ad targeting sometimes embodies the dark preferences of predatory businesses. This one's important for anyone working in analytics.

Christian Madsbjerg wants us to toss the algos altogether in favor of older methods, arguing for the humanities in Sensemaking. More thought-provoking than how-to, he makes the case that we've inherited insights into human decision-making that have been developed over centuries (millennia, really) of effort in fields such as philosophy, psychology, and anthropology. As it turns out, work on understanding and anticipating human decisions didn't start with customer databases (who knew?).

Sensemaking is a bit heavy on promotion and light on description, but it's a quick and worthwhile read as a reminder that we have ways of knowing things that aren't packaged in software. His five principles are obvious but constructive, serving as a bit of an antidote to big data's streetlight effect. His view of thick data would make a great starting point for a deeper dive.

Moving beyond the data-to-decision world, the futurist Gerd Leonhard wonders how we preserve our humanity (and what that even means) as the future invents itself from forces already unleashed. Technology vs. Humanity is one of those books that sets out to map the consequences of multiple sources of change, starting with readily observable technological changes that we already live with.

Leonhard makes much of the insight that changes are accelerating exponentially and affect us combinatorially. It's that combination of trends that happen "gradually, then suddenly" that threatens to change the world before we realize what's happening. Does it really add up to a future of us versus the machines? The point is that we should think about the possibilities before emergent characteristics of market-oriented developments make the big decisions for us.

Log out of Facebook
Finally, here's something completely different and relevant whether or not you work in the data mines. Cal Newport suggests that knowledge workers set aside distractions and learn to focus on Deep Work. The catch is that "distractions" are most of what we do now, from email and meetings to hallway conversations and, yes, social media. Even collaboration tools, which are meant to foster a certain kind of productivity in work environments, create the conditions in which the highest value work can't happen.

Newport starts with a definition and defense of deep work, which includes some of the highest value work people do: inventing, coding, designing, writing, discovering… As we're changing the typical work environment to make deep work more difficult to do, its value is increasing. Computers aren't good at it, and we've distracted most of the people, so the reward for those who can do it may be growing.

The rest of the book is how-to, and the good news is that the method isn't complicated. The bad news is that you'll have to change your habits. Deep work requires that we create the mental space for it, which means cutting out some of the distractions that we like. The reward is in becoming better at the parts of what we do that are most likely to make the highlights reel.

As a visible project, The Analyst's Canvas is new, but it's been cooking for years. Now comes the fun part: working through the ways to use it. Today, let's talk about raw material: the data that go into the analytic process that leads eventually to information, insight and action.

Look at the row of three boxes across the middle of the canvas: data / sources, analysis and presentation / delivery. In 2008, I used that basic outline to describe the building blocks of social media analysis. With the boxes empty, we have a framework to summarize many different tools and approaches. In the next release of the canvas, this row will be labeled Description.

The analysts canvas captioned

The big idea of the canvas is to keep analytical work grounded with two of my favorite questions: what are you trying to accomplish, and why? Collectively, the boxes on the Description row characterize an intelligence or analytics process from source data to delivery, whether the finished product is a report, a software tool or something else. The row is below the Objective box as a reminder that the work has to support meaningful objectives.

I suspect that many discussions will take the components of an analytical play as a unit, especially since so many capabilities come packaged as turnkey tools with data, analytics and presentation built in. But whether building a new capability or evaluating an existing one, the three component boxes must be the right choices to support the Objective. It's not enough that the pieces work together, because we run the risk of developing elegant solutions to the wrong problems.

Using the canvas as a prompt
Every box in the canvas includes a set of basic prompts to initiate the exploration. The Data exploration begins with these:

  • What data/information is required?
  • What sources will provide it?
  • What are the limitations and drawbacks of the chosen sources?
  • Is this the right source, or is it just familiar or available?

Beyond the prompts, we can use the canvas to ask important questions about our preferred data sources:

  • Does this source contain the information needed to support the Objective?
  • Does the information from this source answer the questions we need to address?
  • What other/additional sources might better answer the questions?

Analyzing the canvas
If you look back at the canvas, you'll see that those questions address the relationship between one box (in this case, Data) and its neighbors (Objective, Questions and Alternatives). Its other neighbor, Analysis, is a special case. Depending on which you consider first, you might ask if the source contains the information needed for your analysis or the analysis is appropriate to the properties of the source.

Source to mission

In the first draft of the Explorer Guide, I included some suggested orders for working through the sections of the canvas for a few scenarios. In this exercise, I'm seeing something different: insights we can gain from the relationships across boundaries within the model. More to come.

Testing the Analyst's Canvas

I've just posted an Explorer Guide to the Analyst's Canvas. It's a longer explanation than last week's first look for anyone who's up for taking it out for a test drive. The canvas, you'll remember, is my new framework for thinking and communicating about analytical work in a way that doesn't imply a preferred set of tools and techniques.

It's all version one and subject to learning as we try it out. What I’m looking for first is reactions from people who aren’t me, because of course my own model works for me. The question is, does it make sense to anyone else?

If you're interested in joining the test phase, I suggest starting by completing a canvas for an existing service or use case. If you do a kind of analytical work that's not what you think I'm looking for, it's more likely exactly what I'm after. Establishing a common vocabulary to communicate across specialties is a major goal of the project, and I'm asking people from very different backgrounds to take a look.

The guide has some questions at the end, but any reaction is helpful. The process starts with downloads of the Analyst's Canvas and Explorer Guide. I can't wait to hear what you think.

Next: Mapping Data Sources to Objectives

First Look: The Analyst's Canvas

I was recently asked how I define media intelligence, which is a bit tricky because my definition starts with objectives rather than a list of capabilities. It's another industry term with the flexibility to reflect the priorities of the person saying it, so the answer is something like "what do you need it to be?" I also like to consider the client's job and next steps in the process: "what do you need to do with it?" This is "it depends" in a more actionable form. I don't lack a definition; it just operates at a higher level of abstraction than most.

In more than a decade of exploring methods of extracting meaningful information from diverse data sources to support diverse objectives, I've developed a framework for thinking about specific applications. I'm sharing it for the first time here (updated to latest version), and I'd like you to try it and tell me what you think.

The analysts canvas captioned

The Analyst's Canvas represents an idealized view of an activity, which you might label as research, analytics, or intelligence. My theory is that this abstracted perspective will support any kind of analytical work. The underlying philosophy is to focus on objectives and mission, prompting a consideration of alternatives and reducing the tendency to limit specialist methods to their organizational silos.

The three boxes in a row represent inputs, the analytical process, and outputs. In a software product, they represent data, algorithms and features. In a research project, they are data, analysis and deliverables. In an intelligence environment, they're sources, methods and deliverables. The upper sections ground the activity in the context of the client's or user's work and the mission that work supports. The lower sections clarify what the activity will produce and prompt a consideration of alternative methods.

What's the point?
I think that the Analyst's Canvas can be used to create value in a few ways:

  1. Organize and communicate requirements for new capabilities, using the mission and objectives sections to maintain focus.

  2. Communicate the value of a proposed or existing product or service, creating different canvases for each use case.

  3. Explore potential new markets for existing assets, such as data sources, analytical methods and presentation vehicles (it doesn't take much looking to find examples of each).

There may be other uses; this is what I have so far. For each application, expect a roadmap of how to apply the canvas, but that's not written yet.

My request: Try it.
I've run thought experiments on how the canvas might work in different contexts, and now I'm looking for the outside view. As I'm writing a fuller explanation of how it works and why, would you be willing to plug in your own scenarios as a test? I'd like to know what example you tried, what problems you ran into, and your opinion on its usefulness as a tool for both decision-making and communication.

Start with a blank version of the canvas (you'll find the Explorer Guide and other supporting material there, too). It's licensed under a Creative Commons Attribution / ShareAlike license, so if you like it, you're free to continue using it.

Eventually, I want to document some wildly different applications of the canvas to demonstrate its flexibility. Step one is trying it out in different environments.

Next: Testing the Analyst's Canvas

My Language-Learning Toolbox

In the last year or two, I've rekindled an interest in learning languages. I follow a lot of industry blogs from all over the world, and it's nice to give Google Translate a break sometimes. In the process, I've discovered an amazing variety of resources that weren't around when I was taking classes in school. It's worth a try if you're interested, and you should be: the science says that learning another language is good for your health.

Discovering the Language-Learning Community
One of my early discoveries was Kris Broholm's Actual Fluency podcast, where he interviews people in the language-learning community. Lesson one: there is such a thing. Kris's early interviews were a big help in getting up to speed on the tools and techniques people use to learn a language (unrelated to taking a class in school in any country). His resources page is the first of several worth exploring.

Books for the Shelves
Learning something new is always a good excuse to add to the library, and as I get farther into this project, the language section is growing rapidly. The used- and donated-book sale at the public library was a huge opportunity to find imports that would have been much more expensive otherwise. That said, Amazon's good for finding foreign-language books in used book stores abroad (salut, Asterix!).

I have dictionaries, grammar books, readers, and random books in the languages we're learning (it's become a family thing), but I've also found some good how-to-learn titles, such as Benny Lewis's Fluent in 3 Months and Gabriel Wyner's Fluent Forever. Fi3M is light, but quick and encouraging. FF is more practical; the meat of it is a spaced-repetition based strategy with detailed instructions and available templates for using Anki. Be sure to check out the web sites (FF, Fi3M) for more resources, as everyone playing in this space tends to share their own lists and have a blog. Find someone with a language book or course, and you'll probably find a helpful newsletter, too.

Helpful Software Tools
The usual suspects for software are Anki, MemRise and Duolingo. Try ’em all. Duolingo is fantastic for getting a quick (and free) taste of a new language. Anki is the least pretty, but it’s the one that lets you make your own flashcards for spaced repetition, which is what Gabriel recommends in Fluent Forever (not every flashcard app does spaced repetition). The thing that convinced me is that what Gabriel says is in complete agreement with what I’ve read about learning in Make It Stick, a great book on the neuroscience of learning.
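If spaced repetition is new to you, the idea fits in a few lines. What follows is a simplified Leitner-box sketch, not Anki's actual scheduler; the interval values and the `review` function are assumptions made up for illustration. A card you get right moves up a box and comes back later; a card you miss drops to the first box and comes back tomorrow.

```python
from datetime import date, timedelta

# Hypothetical review intervals (in days) for each box; real tools tune these.
INTERVALS = [1, 2, 4, 8, 16]


def review(box, correct, today):
    """Return (new_box, next_due_date) after one review of a card."""
    if correct:
        # Promote the card, but never past the last box.
        new_box = min(box + 1, len(INTERVALS) - 1)
    else:
        # A miss sends the card back to the first box.
        new_box = 0
    return new_box, today + timedelta(days=INTERVALS[new_box])


# A brand-new card answered correctly on January 1 moves to the second box.
new_box, due = review(0, True, date(2024, 1, 1))
```

The point of the growing gaps is that easy cards stop eating your time while the hard ones keep coming back.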

Media, Educational and Native
I like the computer-based tools, but I've found value in the old-school listen-and-repeat audio courses, probably best exemplified by Pimsleur. Our public library has many of their courses, so I've been able to go through them as I exercise. I like their conversational approach and the help with pronunciation, but it should be just one component of the plan.

The big step is finding native media in your target language. I’m using podcasts from foreign news media, live streams of TV networks, and (of course) foreign movies. Find something you like; I once had a Thai roommate who was addicted to an American sitcom that his English class had used. Programming designed for non-native speakers is a nice intermediate step. iTunes, YouTube, Netflix and Roku really change the world when it comes to international and foreign-language programming. All it takes is a little searching.

Opportunities with Real People
Everyone recommends talking with real people as soon as possible. In the physical world, you can find language and cultural exchange programs in major cities, including some non-profits that exist specifically to share a language and culture. Alliance Française and Goethe Institut are two that I've found. In national capitals you might find programs sponsored by foreign embassies. Elsewhere, look for meetups. In our area, the most popular languages have multiple groups with events at every level.

You can interact with real people online, too. italki seems to be the marketplace of choice for finding tutors, and it also lets you set up a language exchange (partners coach each other on their native languages). The writers I mentioned at the top have more recommendations on language exchanges and everything else, so if that sounds appealing, see what they have to say first.

[Expanding on something I wrote to a friend who was planning a trip to Italy.]

Je parle français, et je travaille à améliorer ma facilité. Ich spreche ein bisschen Deutsch, och jag lär mig svenska. Hey, this is fun!

I'm working on a new research project, and I need your help. If monitoring or analyzing social media is part of your job—and you don't work for one of the software providers—please take part in a survey on the state of social media analysis at

I'm looking into current trends and issues in the practice of social media analysis, as well as the technologies. Topics include adoption by industry and business function, successes and failures, and opinions about the state of available software tools. Results of the survey will be available in a free report, which will also include observations from a series of in-depth interviews and trends from Social Media Analysis, later this year.

The survey has 21 questions and should take around 5 minutes to complete. Your response is anonymous. I'm looking for a cross-section of brands, corporate social media listeners, agencies and consultants, in all regions of the world. If you have practical experience with listening to social media, please take a few minutes to share your observations.

You can find more on the project here and here.

Thanks for your help.

Shortly after last month's announcement of the new Facebook topic data service from DataSift, another kind of change showed up in my inbox: the impending disappearance of Facebook post data from social media monitoring tools. The search functions of the API that developers use to monitor public posts in Facebook are going away at the end of April, and the notices and workarounds are going out to customers now. I'm also hearing from software companies looking for alternative sources, though I have not heard of any such alternatives.

The vendor announcements say to expect fewer results with the switch to version 2.x of Facebook's Graph API (optional until April 30). The new version restricts access to information about users' friends, and it eliminates the Public Post search and News Feed search options from the Graph API. Monitoring of posts and comments on specified Facebook pages (including competitors' pages) is still supported, which creates a partial workaround.

As for the broader set of Facebook post data, DataSift's new PYLON API is the so-far exclusive source for most developers in the social media analysis business. The data includes private posts, but everything is anonymized and aggregated, and it doesn't include verbatim text. It's meant for broad analysis, not monitoring or engagement. Access is limited to the US and UK for now, so the answer for the European software developer who emailed me appears to be “no.”

Finding information about what a company doesn't do is tricky, especially with a segment of the industry that likes its trade secrets, but Facebook's announcement makes it pretty clear how they're thinking about user privacy in the data market:

We are not disclosing personally identifying information to anyone, including our partners and marketers. And, the results delivered to marketers are analyses and interpretations of the information, not actual topic data.

Facebook does offer more data through the Public Feed API and Keyword Insights API, but access is limited to high-profile mass media and a short list of developers who support them. For everyone else, it looks like Facebook doesn't want anyone monitoring their users' public posts but Facebook.

One of the nicest compliments I've received over the years came from a company founder who read one of my reports and said I'd summarized his company's work better than they did. It's just one of the things I do—take a pile of information and figure out what it's about. I summarize. So if you need to tease out the short version of something complicated, call me. But I've also been accumulating data on an industry for years, which gives me the material for a different view—the annual recap. Roll tape…

The Year in M&A, Social Media Analysis 2014
I've been tracking companies that extract meaning from social media data since 2006 (it stays interesting if you let the definitions evolve with the market). One way to tell how things are changing is to watch where the money goes, and in 2014, more money flowed to consolidation. VC and PE money funded multiple acquisitions by companies staking out hoped-for prominent positions. Big companies tucked SMA into their products and portfolios, and smaller companies chose "buy" over "build" for key capabilities.

Add some actual mergers and a few acquihires, and we get more transactions than in 2013. In other news, it takes longer to write a recap of 38 deals than one with 18 deals, which is how a year-end post shows up in early January. :-)

More Than $420 Million Invested in Social Media Analysis Companies in 2014
New investments in SMA companies were slightly below 2013 levels in dollar terms, although when you consider deals of unannounced size, we're probably close to the window of uncertainty on that. Some of that money has gone to fund acquisitions, and anybody who took a round of more than $20 million bears watching, but we're also still seeing funding for interesting and innovative companies in the space where social media and data analysis intersect.

Based on the last year's investment activity, look for continued product innovation and market evolution, in addition to ongoing consolidation.

So here's a summary: The opportunities in social media analysis are evolving, and heavy bags of money are being directed toward exploiting them. For the long version and its application to your situation, contact me about becoming a client.

Where do you find books to read? Do you ask your friends, follow reviews or seller recommendations, or just go for the bestsellers? Whether you like your books on paper or downloaded, you have to know a book exists to read it, and because we're in the twenty teens, there's a social way to do it online.

Start where you are?
An obvious way to learn about books online is to ask your social networks—wherever you're connected to people online, just ask 'em. If you use different networks for different purposes, that should inform where you ask, but you have the connections. Sometimes it's just as easy as asking.

But asking doesn't always work. A discussion on Facebook about paper and ebooks this week included just such a request, but no responses. So what else can we do?

Networking for readers
How about a social network specifically for readers of books? Goodreads is exactly that, a social network built entirely upon books and the people who read them. You can look through reviews and recommendations organized by books and authors, or approach it socially, with its friends, followers and groups.

I'm getting great ideas from some very smart people I follow on Goodreads. Because of its tight focus on books, I find it easier to maintain a careful approach to connecting in Goodreads than in other networks. In addition, Goodreads' updates are tied to specific books, so it doesn't have the noise problem of other networks.

On another level, Goodreads creates yet another opportunity for public image tailoring, because its entries aren't automatic. Some of us might be a bit selective in what we choose to share—more professionally relevant titles than pop fiction, for example—but that actually improves Goodreads as a socially powered recommendation engine. If people I follow choose to share only the good stuff, they're effectively curating the recommendation lists.

Gems from Twitter
Goodreads runs on effort from people in its network; what about suggestions from people who haven't joined? BookVibe takes a different approach, pulling book mentions from a user's Twitter stream to generate its lists. It's not as far along as Goodreads, and there's some overlap, but it does have the advantages of pulling its recommendations from a network you've already assembled and using existing behavior as its raw material.

BookVibe strikes me as a worthy experiment, another startup finding useful information by applying a novel analytical lens to the flood of Twitter data. In this case, the startup is Parakweet, a natural-language processing specialist that set up BookVibe as a technology demonstration.

Remember blogs?
I've seen a few blog posts with suggested reading lists, such as these from the Oxford Martin School and Mention. If you don't have a source on a topic, try searching for "reading list" and a relevant keyword or two. It's not an unusual topic for a blog post or web page.

What about the big dog?
You can't talk about books without mentioning Amazon (I checked—it's a law). I remember an analysis years ago about the many social components of an Amazon product page, although I can't find it now. Product reviews, lists and wish lists are fairly obvious features, and it's possible to find more suggestions by following the creators of reviews and lists. Just find someone you'd like to hear more from and click through to their profile for more of their reviews, lists and tags. It's sort of social, if a bit too much effort.

Amazon has the makings of a really good social network for readers, except that it's missing the social network to run it. That may change, since it bought Goodreads last year. Until then, you can do a bit of social exploration with Amazon's existing features and some manual effort.

Old skool
If all those networks can't suggest good books faster than you read them, then you read too fast. :-) Oh, and the book I'm reading now? I found it on the New Nonfiction shelf at my local library. Curator was a word long before online sharing tools borrowed it.

It's never too late to add something to the summer reading pile. What are you reading that people should know about?

As ubiquitous surveillance is increasingly the norm in our society, what are the options for limiting its scope? What are the levers that we might pull? We have more choices than you might think, but their effectiveness depends on which surveillance we might hope to limit.

One night last summer, I woke up with an idea that wouldn't leave me alone. I tried the old trick of writing it down so I could forget it, but more details kept coming, and after a couple of hours I had a whiteboard covered in notes for a book on surveillance in the private sector (this was pre-Snowden, and I wasn't interested in trying to research government intelligence activities). Maybe I'll even write it eventually.

The release of No Place to Hide, Glenn Greenwald's book on the Snowden story, provides the latest occasion to think about the challenges and complexity of privacy and freedom in a data-saturated world. I think the ongoing revelations have made clear that surveillance is about much more than closed-circuit cameras, stakeouts and hidden bugs. Data mining is a form of passive surveillance, working with data that has been created for other purposes.

Going wide to frame the question
As I was thinking about the many ways that we are watched, I wondered what mechanisms might be available to limit them. I wanted to be thorough, so I started with a framework to capture all of the possibilities. Here's what I came up with:

Constraints on personal data

The framework is meant to mimic a protocol stack, although the metaphor breaks down a bit in the higher layers. The lowest layers provide more robust protection, while the upper layers add nuance and acknowledge subtleties of different situations. Let's take a quick tour of the layers, starting at the bottom.

Hard constraints
The lowest layers represent hard constraints, which operate independently of judgment and decisions by surveillance operators:

  • Data existence
    If the data don't exist, they can't be used or abused. Cameras that are not installed and microphones that are not activated do not collect data. Unposted travel plans do not advertise absence; non-geotagged photos and posts are not used to track individual movements. At the individual level, countermeasures that prevent the generation of data exhaust will tend to defeat surveillance, as will the avoidance of known cameras and other active surveillance mechanisms.

  • Technical
    Data, once generated, can be protected, which is where much of the current discussion focuses. Operational security measures (strong passwords, access controls, malware prevention, and the like) provide the basics of protection. Encryption of stored data and communication links increases the difficulty, and cost, of surveillance, but this is an arms race. The effectiveness of technical barriers to surveillance depends substantially on who you're trying to keep out and the resources available to them.

Soft constraints
The upper layers represent soft constraints: those which depend on human judgment, decision-making and enforcement for their power. Each of these will tend to vary in its effectiveness by the people and organizations conducting surveillance activities.

  • Legal
    This is the second of two layers that contain most of the ongoing discussion and debate, and the default layer for those who can't join the technical discussion. The threat of enforcement may be a deterrent to some abuse. Different laws cover different actors and uses, as illustrated in the current indictment of Chinese agents for economic espionage.

  • Market
    In the private sector, there's no enforcement mechanism like market pressure—in this case, a negative reaction from disapproving customers. Companies have a strong motive to avoid activities that hurt sales and profits, and so they may be deterred from risking a perception of surveillance and data abuse. This is the layer least likely to be codified, but it has the most robust enforcement mechanism for business. In government, the equivalent constraint is political, as citizens/voters/donors/pressure groups respond to laws, policies and programs.

  • Policy
    At the organization level, policy can add limits beyond what is required by law and other obligations. Organization policy may in many cases be created in reaction to market pressure and prior hard lessons, extending the effectiveness of market pressure to limit abusive practices. In the public sector, the policy layer tends to handle the specifics of legal requirements and political pressures.

  • Ethical
    Professional and institutional ethics promise to constrain bad behavior, but the specific rules vary by industry and role, and enforcement is frequently uncertain. Still, efforts such as the Council for Big Data, Ethics, and Society are productive.

  • Personal
    Probably the weakest and certainly the least enforceable layer of all, personal values may prevent some abuse of surveillance techniques. Education and communication programs could reinforce people's sensitivity to personal privacy, but I include this layer primarily for completeness. Where surveillance operators are sensitive to personal privacy, abuses will tend not to be an issue.

Clearly, the upper layers of this framework lack some of the definitive protections of the lower layers, and they're unlikely to provide any protection from well-resourced government intelligence agencies (from multiple countries) and criminal enterprises. But surveillance (broadly construed) is also common in the private sector, where soft constraints are better than no constraints. As we consider the usefulness and desirability of the growing role of surveillance in society, we should consider all of the levers available.
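
For anyone who thinks better in code, the stack can be written down as an ordered list, bottom layer first. This is only a sketch of the layers described above; the `Layer` class, the `STACK` list and the `layers_of` helper are hypothetical names chosen for illustration, not part of the framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    kind: str  # "hard" or "soft", following the two groups in the post


# Bottom of the stack first, in the order the layers are described.
STACK = [
    Layer("Data existence", "hard"),
    Layer("Technical", "hard"),
    Layer("Legal", "soft"),
    Layer("Market", "soft"),
    Layer("Policy", "soft"),
    Layer("Ethical", "soft"),
    Layer("Personal", "soft"),
]


def layers_of(kind):
    """Return the names of all layers of one kind, bottom to top."""
    return [layer.name for layer in STACK if layer.kind == kind]
```

Calling `layers_of("hard")` returns the two layers that work without anyone's judgment; everything else in the list depends on people choosing to behave.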

One step at a time
This framework isn't meant to answer the big questions; it's about structuring an exploration of the tradeoffs we make between the utility and the costs of surveillance. Even there, this is only one of several dimensions worth considering. Surveillance happens in the private sector and government, both domestically and internationally. There's a meaningful distinction between data access and usage, and different value in different objectives. Take these dimensions and project them across the whole spectrum of active and passive techniques that we might call surveillance, and you see the scope of the topic.

Easy answers don't exist, or they're wrong. It's a complex and important topic. Maybe I should write that book.

If I write both the surveillance book and the Omniscience book (on the value that can be developed from available data), should I call them yin and yang?

About Nathan Gilliatt

  • Voracious learner and explorer. Analyst tracking technologies and markets in intelligence, analytics and social media. Advisor to buyers, sellers and investors. Writing my next book.
  • Principal, Social Target