For Better Big Data, Bring in the Experts


I’m willing to bet that most of Google’s employees are smarter than I am, and the vast majority of them will probably accomplish more in their lives than I ever will in mine. I’d put money on it.

Which is why I pay extra close attention when Google makes mistakes: if my own average-dude goofs are good learning opportunities, imagine how fertile Google’s must be.

Enter Google Flu Trends (GFT), the tech giant’s attempt to harness big data to identify flu outbreaks. It’s a noble project, but it has also been a bit of a flop. As David Lazer et al. point out in Science, GFT’s flu predictions haven’t been terribly accurate. In Feb. 2013, for instance, GFT’s estimate of flu-related doctor visits was more than double the estimate from the Centers for Disease Control and Prevention (CDC). GFT was aiming to match the CDC, but it vastly overshot its target.

Lazer et al. think GFT’s failures “offer lessons for the use of big data,” and we in HR and recruiting should take note of these lessons. Our industry has embraced big data with great enthusiasm, but if even Google runs into problems using big data effectively, then it stands to reason that we, too, need to be vigilant.

Size Isn’t the Only Thing That Matters

Lazer et al. do a great job of expounding upon the lessons we can learn about big data from GFT, so I won’t waste your time by rehashing the article. I’ve linked it above, and you should read it yourself to get a nuanced and comprehensive view of the situation. But I do want to draw attention to one lesson in particular.

As more and more of us turn to big data to help with sourcing, hiring, and company culture, the HR and recruiting industries have developed a hunger for sheer volume of data. The thinking seems to be, “the bigger the data pool, the better.” But the reality of effectively harnessing big data is far more complex than that. As Lazer et al. put it, “it’s not just about the size of the data.”

The “big data revolution” has eclipsed small data – those slighter, more traditional data pools and their attendant analytical practices. But big data can’t replace small data, because small data offers information that big data cannot. Writing for Simply Statistics, Jeff Leek points out that traditional applied statistics leads to smaller claims precisely because it takes into account a variety of factors that big data often overlooks: “sampling populations, confounders, multiple testing, bias, and overfitting.”

We can’t overlook these things and expect to reach sound conclusions. GFT used big data to make big claims but ignored these important issues, and that led to its failure.

If we’re going to get any useful information out of big data, we need to use it as a complement to small data and applied statistics – not as a replacement. “Instead of focusing on a ‘big data revolution,’” Lazer et al. suggest, “perhaps it is time we focus on an ‘all data revolution.’”

Remember: Most of Us Aren’t Data Experts

HR and recruiting professionals need to be especially careful when using big data because, to put it quite simply, most of us have no idea what we’re really doing with the numbers.

Sure, we’ve read about the subject. We’ve learned about and used some of the big data tools available to us. But that doesn’t necessarily mean we’re qualified to draw conclusions from the data.

I’ve read a lot about medicine. I find it quite fascinating, and I like to take medical science into account in the way I live my life. I realize it’s important to being healthy and happy. Still, I wouldn’t dare to practice medicine or give you medical advice. For that, I turn to the real experts: doctors.

The same goes for big data – and, really, data usage of all kinds. We can read about it. We can try to use it to guide our practices. But we shouldn’t try to use big data without the help of experts.

In the same Simply Statistics article I linked above, Leek notes that the big data revolution seems to be not-so-conveniently ignoring statisticians, the real experts in the field. This isn’t unique to HR and recruiting. Leek writes that “statistical thinking has also been conspicuously absent from major public big data efforts so far,” listing a number of big-data-centric events that failed to seek input from statisticians. The 2013 White House Big Data Partners Workshop, for example, included absolutely no statisticians.

Without guidance from experts, big data becomes nothing more than a hot trend. The conclusions we reach will be inaccurate at best and actively harmful at worst.

What we need to do, then, is to solicit the help of statisticians in our big data efforts. We need people who know how to crunch the numbers properly. Otherwise, we have a great tool on our hands and no way to use it.

You can have a beautiful car sitting in your driveway, but if no one ever teaches you how to drive, it’s not going to take you anywhere.

By Matthew Kosinski