(Self) Selection Bias and Big Data

City Java magazine rack by Ken Jenkins on Flickr (Creative Commons)

11.11.11…tomorrow is Veterans Day, Corduroy Appreciation Day, and edcampPDX.

I pitched a talk for edcampPDX: Selection Bias v. Self-Selection Bias. I want to explore the similarities (or differences) of association and selection in our online world with those in our physical world. My thoughts around this topic have been accumulating for a few years. They intensified last spring at the ACPE conference. I really enjoyed debating with Mike Cullum there. We were sitting together with a few others one evening, talking about the amount of data being collected on each of us by companies like Google, Twitter, Facebook, and the like. We meandered. We talked about:

  • targeted ads — good? bad? neutral?
  • Dunbar’s Number — what? you have how many friends??
  • filter bubbles — how did they know I like Nutella??
  • broadcasting v. narrowcasting — wide, birdshot audience? narrower, engaged audience?

This narrowcasting idea is what led us to a discussion of echo chambers. Mike shared a fear: if we’re able to dial our reading and following preferences too precisely, we’ll miss a broader world view and contrasting opinions on the events of our time. He’s right, of course. The idea of the echo chamber is real. Anyone who grew up watching Sunday morning news shows knows this was true even before CNN, 24-hour news, and certainly the Internet.

Real? Yes. A danger? I’m not sure. I think we, as human beings, self-select our bias more effectively than any algorithmic selection. Mike worries that if he clicks on too many Fox News articles or befriends too many Tea Party members (sarcasm), his exposure to differing views will lessen. I think he’s right, again. But I’m not worried. Here’s why….

Walmart versus Whole Foods

When I choose to go to Walmart for my groceries, I quickly track myself into certain selections and options. My brand choices and produce selections differ from those at other retailers. I’m presented with products people at Walmart are likely to buy.

Walmart Store Exterior by Walmart Stores on Flickr (Creative Commons)

The same is true when I choose to go to Whole Foods. I’m again presented with certain options. Brands and produce are different, of course, but I notice the biases most when I’m in the checkout line. At Whole Foods, there is not a tabloid or People magazine in sight. There are, however, Mother Jones, Yoga Journal, and other delights.

But…wait! I can’t find The Economist at either location. Walmart and Whole Foods have both pigeonholed me.

They have. They know what I like to buy by the simple fact that I’ve walked through their door.

But…wait! Who made the decision to walk through the door?

Oh. Me. Right….

Who has the free will to go to a different store? Me.

This analogy can be spun out in several varieties: think urban versus rural versus suburban preferences; think NASCAR versus IndyCar; think same-city newspaper battles à la Chicago Sun-Times versus Chicago Tribune; think Fox News versus MSNBC. We self-select early and often. By choosing where to live, where to shop, what to read, and what to watch, we select ourselves right into those same lanes that are created virtually by Google and peers. We narrow our exposure to different experiences and ideas.

Collectively, we have been doing this for centuries, and that is exactly why I’m not too worried. For now, I can still back out to a broader or different set of options in both my physical and online worlds. If there comes a time online when narrowing our exposure is replaced with eliminating our exposure (to products, thoughts, ideas), I’ll write a follow-up post.

Whole Foods by Joe Shlabotnik on Flickr (Creative Commons)

N.B. This post only scratches the surface of an issue that is miles deep.