Face-to-Face vs. Facebook Personality Ratings
A Northern Illinois University College of Business study has found that independent ratings of prospective employees’ Facebook profiles predict subsequent job performance more accurately than HR-administered employee self-ratings do. In a second study, the researchers also found that students’ Facebook scores are a better predictor of future academic success than personality and IQ scores combined.
Very interesting—but this invites another question and a third study.
What remains unclear, and tempting to ask, is whether such Facebook-based personality assessments are also more accurate than ratings based on face-to-face interviews. Too curious to resist, I contacted the study’s lead researcher, Don Kluemper, assistant professor of management at the N.I.U. College of Business, and asked him precisely that question (which, it turns out, can be asked even more precisely). He obligingly replied in great detail; his answers are presented and interpreted below.
Independent Facebook Profile Ratings vs. HR Self-Evaluation Tests
According to an N.I.U. report about the research, “impressions gleaned from a five- to 10-minute perusal of Facebook pages were actually a stronger predictor of a candidate’s likelihood to excel in a job than the personality surveys that many companies require job candidates to complete.” The researchers of the study—co-authored by Peter A. Rosen and Kevin W. Mossholder, of the University of Evansville and Auburn University, respectively—had Facebook subscriber volunteers complete a personality questionnaire widely used by companies to gauge five key traits: conscientiousness, agreeableness, extraversion, emotional stability and openness. (These are the so-called “Big Five” factors, except for the insertion of “emotional stability” in place of the standard Big Five category of “neuroticism”—a switch that Dr. Kluemper told me (in an email) was made for the purpose of making all five of the traits connotatively positive for ease and consistency of scaling, interpretation and response.)
With the permission of the volunteers, the researchers gave a squad of three raters access to the subjects’ Facebook profiles. The N.I.U. report described the raters’ protocol: “Each rater perused the profiles and answered questions about the subject that were similar to those on the self-report personality questionnaire. For instance, the students were asked to rate their agreement with the statement, ‘I am the life of the party.’ For raters, the question was phrased as, ‘Is this person the life of the party?’”
Summarizing the upshot of the study, Professor Kluemper said, “Based upon other studies, we were able to conclude that, after a five-minute perusal of a Facebook page, raters were able to answer questions regarding the subject about as reliably as would be expected of a significant other or close friend.” Exactly how well did the study’s raters do as compared with personality ratings of friends and significant others? In his comments, Dr. Kluemper said, “Although not assessed directly, Facebook ratings demonstrated comparable reliability (interrater and internal consistency reliability) and convergent validity (correlations) between Facebook ratings and self-ratings that are typically found for personality ratings conducted by friends and significant others.”
Not only were the ratings reliable in the sense of correlating strongly with the subjects’ own, more intimate self-evaluations, they were also reliable as a basis for predicting the quality of future job performance. The N.I.U. report noted, “Researchers followed a subset of students who were employed six months later, asking their supervisors to complete a performance evaluation. Comparing those scores to the personality scores they found that the Facebook-derived scores provided a more accurate predictor of future job performance than the score derived from the self-evaluation.”
(Technical note: For those who can’t recall their statistics courses or, comparably, never had one, “reliability” and “validity” are two key measures of the statistical merits of a concept.)
Witches, Water and Conceptual Wackiness
“Reliability”, in testing concepts, means consistency of measurement: Does the concept, e.g., Facebook-based Big Five scores or I.Q. score, yield statistical outcomes that are consistent on retest of the same person, consistent within test items, and consistent across raters? “Validity” (in this case, “construct validity”), means that the concept as defined actually measures, operationalizes or otherwise encapsulates what it is claimed to represent.
Dr. Kluemper was careful to point out that the N.I.U. study did not examine test/re-test reliability. In assessing the Facebook rating method and its questions, however, the N.I.U. researchers and their colleagues did test the consistency of ratings from rater to rater, suggesting that the ratings are both reliable and valid. In particular, the results demonstrated what is called “convergent validity”, since the “scores” given by self-ratings and those given by Facebook raters converge, or correlate with one another, in their descriptions and predictions regarding those being rated.
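For readers who like to see the arithmetic, here is a minimal sketch of how rater-to-rater consistency and convergent validity might be computed for a single trait. It is not the study's actual procedure: the scores are invented for illustration, the function names are hypothetical, and the plain Pearson correlation stands in for whatever statistics the researchers really used.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Invented scores for six hypothetical subjects on one trait (NOT study data).
self_ratings = [4.0, 2.5, 3.5, 5.0, 1.5, 3.0]
rater_scores = {
    "rater_a": [4, 2, 3, 5, 2, 3],
    "rater_b": [4, 3, 4, 5, 1, 3],
    "rater_c": [5, 2, 3, 4, 2, 4],
}


def interrater_consistency(scores_by_rater):
    """Average correlation over every pair of raters."""
    pairs = combinations(scores_by_rater.values(), 2)
    return mean(pearson(a, b) for a, b in pairs)


def convergent_validity(scores_by_rater, self_scores):
    """Correlation between the raters' averaged score and the self-rating."""
    averaged = [mean(column) for column in zip(*scores_by_rater.values())]
    return pearson(averaged, self_scores)


print("interrater consistency:", round(interrater_consistency(rater_scores), 2))
print("convergent validity:", round(convergent_validity(rater_scores, self_ratings), 2))
```

Values near 1 on the first figure would mean the raters largely agree with one another; values near 1 on the second would mean their pooled impression tracks what the subjects say about themselves.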
To illustrate the difference between reliability and validity, one example will suffice. The Inquisition witch-hunter’s concept of “ordeal by water”, i.e., throwing alleged witches into moats or ponds to test for collusion with Satan, was statistically very reliable in the uniformity of its results. Again and again (and presumably during the occasional “re-test”), the luckless women either drowned, if tethering was absent or failed (which, under the logic of trial by water, belatedly proved their innocence), or, if they didn’t sink, were dragged from the water and burned at the stake. Damned if they did, doubly damned if they didn’t (drown).
Unfortunately for the women, the concept of ordeal by water did not validly test for or “measure” Satanic collusion, since there really was none, apart from whatever demonic pacts were detected only in the deluded imaginations of the accusers or the accused.
In this Facebook study, being able to “reliably answer” the questionnaire’s questions seems, as I’ve suggested, to be tantamount to demonstrating the reliability and the validity of both the rating method and the question constructs and, by implication, of the predictions made on the basis of these.
Face-to-Face and Facebook Ratings—a Precise Comparison
Dr. Kluemper also demonstrated the rigor of his researcher’s mind in addressing the question of whether the Facebook rating method is as good as face-to-face interview-based ratings. After playfully engaging the question by remarking, “Interesting question”, he got down to the serious and complex specifics of framing as well as answering it.
Before attempting a direct comparison of the two approaches, Facebook rating vs. face-to-face, Professor Kluemper advises greater precision in defining what will be compared: “Will it be one psychologist (assessing personality in the interview) or, like the Facebook method, three evaluators? Multiple raters should reduce random error and improve validity. Assuming both approaches are using the same number of raters, it would depend on the interview itself. Are we talking about a ten-minute interview (the time it takes to evaluate social networking profiles)? Are the interview questions designed to assess personality? What is the level of structure in the interview? The validity of interviews varies widely based on level of structure, from near zero to about .40 (correlation coefficient, which is larger than what we found).” (This last point suggests that, at their best, the interviews would be better than the Facebook profile ratings.)
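The point about multiple raters reducing random error can be made concrete with the Spearman-Brown formula from classical test theory, which estimates the reliability of an averaged score from the reliability of a single rater. The sketch below is only an illustration: the function name is hypothetical, the single-rater reliability of 0.50 is an assumed figure and not one reported by the study, and the formula itself assumes the raters are interchangeable.

```python
def spearman_brown(single_rater_reliability, n_raters):
    """Estimated reliability of the average of n interchangeable raters."""
    if n_raters < 1:
        raise ValueError("n_raters must be at least 1")
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)


# Assumed single-rater reliability of 0.50 (illustrative, not from the study).
for k in (1, 3, 5):
    print(k, "rater(s):", round(spearman_brown(0.50, k), 2))
```

Under that assumption, pooling three raters lifts the estimated reliability from 0.50 to 0.75, which is why a fair comparison between one interviewer and a three-person Facebook panel needs the number of raters held equal.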
Weighing the usefulness and insight of a face-to-face job or psychological interview against that of a Facebook profile rating, Dr. Kluemper said, “The primary difference would be the strengths of the job-related context for the interview versus the weaknesses of the use of impression management by interviewee (they will work very hard to “look” conscientious, agreeable, extraverted, emotionally stable, and open).” This suggests that, despite crafted and crafty micromanagement of one’s Facebook persona, it probably is still more “spontaneous” and “natural” than an interview show-and-tell, dog-and-pony puppet show, in which applicants have a high-stakes agenda with specifically tailored strategies and tactics that reflect the career focus of their self-presentation.
It may be fair to speculate that to whatever extent it is easier to make Big Five inferences from real personality to real job performance (including initial interviews as a job performance) than the reverse, and given that an interview is more of a staged performance than a display of unguarded or less-guarded personality, one’s online Big Five-defined persona is more likely to be usefully revealing than one’s interview performance regarding future job performance (even when interpreted along the same Big Five lines).
Noting the likely impact of time limitations on interviewing, Dr. Kluemper cautioned, “With the Big Five framework, though, all of the traits may be difficult to assess (particularly in 10 minutes). If the interview were longer, well designed, with well trained evaluators, I would argue that the interview ratings could be as strong as Facebook personality ratings, but also must stress that replication is needed before drawing firm conclusions.”
The professor’s bottom line? Summing up the merits of the face-to-face and Facebook profile ratings, he said, “I suspect that the two approaches would both predict some aspect of personality, but the benefits of each approach could tap somewhat unique aspects of personality, yielding a stronger combined effect of both approaches.”
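A rough way to picture such a “combined effect” is the standard formula for the multiple correlation of an outcome with two predictors. The sketch below is a hedged illustration only: the function name is hypothetical and every correlation value is assumed for the sake of the example, since the study reports no such comparison.

```python
from math import sqrt


def combined_validity(r_a, r_b, r_ab):
    """Multiple correlation of an outcome with two predictors.

    r_a, r_b: each predictor's correlation with the outcome.
    r_ab: correlation between the two predictors.
    Inputs must be mutually consistent correlations for the result to be meaningful.
    """
    if abs(r_ab) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r_a ** 2 + r_b ** 2 - 2 * r_a * r_b * r_ab) / (1 - r_ab ** 2)
    return sqrt(r_squared)


# Assumed values: each method alone correlates 0.30 with job performance.
print("little overlap:", round(combined_validity(0.30, 0.30, 0.20), 2))
print("heavy overlap:", round(combined_validity(0.30, 0.30, 0.90), 2))
```

With these assumed numbers, two methods that overlap only slightly combine to roughly 0.39, while two that largely duplicate each other combine to roughly 0.31, barely better than either alone. The gain Dr. Kluemper anticipates therefore depends on each method tapping “somewhat unique” aspects of personality.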
More generally, on the question of supplanting rather than merely supplementing conventional candidate assessment methods, Dr. Kluemper allows that the Facebook profile rating method is strong and can be a useful adjunct, but he does not recommend that HR professionals replace their current methodology (presumably including face-to-face interviews) with it. At this early stage, he cautions, it might be premature to do so. In a press release, he said, “Before it can be used as a legally defensible screening tool, it has to be proven valid (Editor: and presumably also more broadly reliable),” adding, “This research is just a first step in that direction.”
Strongly accenting the crucial issue of legal implications, Dr. Kluemper offered this caveat: “Given the protected class information available on social networking profiles (race, gender, age, disability, sexual orientation, etc.), there is an unexplored risk of adverse impact. Further validation evidence and an investigation of the potential for adverse impact are sorely needed before such a technique should be employed by practitioners, or at least by those who want to avoid litigation.”
Convergence of Methods and Minds
Whatever the results of a statistical and methodological comparison of the two approaches, one germane conjecture has already been confirmed: Given that Dr. Kluemper’s final comment was “I’d love to see this study done”, I would like to imagine it corroborates my initial hunch that, indeed, it would be a very interesting study…
…and that it warrants testing the additional hypothesis that great minds really do think alike—even if modesty (or reality) forces me to cite a sample of only one to start with.