The Dangers of ‘From 1-to-10’-Scale Rankings

Let’s say you love your spouse (partner), your job and your home, not necessarily in that order. Asked, “In what order?”, you may instinctively rank them on a 1-to-10 scale to determine a 1-to-3 order.

But their ranking says nothing about how much more you love one than another or how much greater your preference for the first over the second is than your preference for the second over the third. [That’s something to be calculated using what is called an “indifference lottery”.]

The same and other limitations are inherent in ranking candidates or jobs on a 1-to-10 or any similar scale.

The usual, simple, even simplistic way to rank these is, indeed, the popular “on a scale from 1 to 10” approach. That makes things look more precise, but not necessarily correspondingly clearer, more useful, more accurate, more valid for comparisons or more “objective”, despite the comforting appearance of precision that “8, 6, 2” and the like suggest.

So, can we do better than “from 1 to 10”? Or, more to the immediate point, how badly are we doing when we use a 1-to-10 scale?

1-to-10 Problems

In undertaking to answer that, we’d better figure out just how reliable and useful 1-to-10 rankings of applicants, jobs, homes and partners (or as I call them, “hearteners”) are.

One reason that such a 1-to-10 scale is not perfectly illuminating is that the endpoints and intervals are not clearly defined, other than numerically and inconsistently from one scorer or ranking session to another—much as your high school English essay’s “94” was less than scientific and objective, given differences between one school or one teacher and another, or between one [time of] day and another for the same teacher.

For example, one man’s [or teacher’s] 10 may be another’s 6, as can easily be the case when comparing a score of 10 offered by someone whose worst pain has been that of a tennis blister and the 10 of another gruesomely tortured by some tyrant’s secret police. Ditto for ranking holiday destinations and prospects in a nightclub.

To make matters worse, the 1-to-10 ranking never ensures that the underlying score interpretations or criteria are the same in any respect for any two evaluations—even by the same person, despite the deceptive impression that they are, which is subliminally suggested by their quantitative format.

Not only can the magnitude of some parameter required for a given score vary from judge to judge, but also the parameters deemed relevant can differ from one evaluator or occasion to another.

How High Hawaii?

For example, for Hawaii to get a 10 from Joe, the beaches have to have five bikini babes per square blanket; for Eddie, one per will suffice. But for their grandfather, two paramedics within his earshot is an absolute prerequisite for a 10.

Moreover, perceptual scales are often nonlinear—at least for some people, with the “difference” between, say, 1 and 2 being smaller than the difference between 4 and 5, itself smaller than the difference between 9 and 10 (despite being numerically 1 in each instance).

Logarithmic scales, such as the Richter earthquake magnitude scale, the decibel sound intensity scale and “Fechner’s Law” (of “just noticeable differences”) for stimulus sensitivity, are examples of scales that may wrongly be taken to be linear.

In fact, a jump from a 1 to a 5 earthquake is not an increase to a quake that is 5 times as intense. Because each unit increase, e.g., from a 1 to a 2, represents a 10-fold increase in measured wave amplitude, a “7” is an earthquake with, in these terms, 1 million times the amplitude of a 1, not 7 times. (The 2011 quake off Fukushima was larger still, at about magnitude 9.) This is what “an order of magnitude larger” means: “10 times greater”.
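The arithmetic can be sketched in a few lines of Python. This is only an illustration, assuming a pure base-10 logarithmic scale in which each whole step multiplies amplitude by 10; the function name amplitude_ratio is made up for the example.

```python
def amplitude_ratio(magnitude_from: float, magnitude_to: float) -> float:
    """Amplitude ratio between two readings on a base-10 logarithmic scale.

    Assumption for illustration: each whole-number step on the scale
    multiplies the underlying amplitude by exactly 10.
    """
    return 10.0 ** (magnitude_to - magnitude_from)


# A linear reading would say a 5 is "5 times" a 1 and a 7 is "7 times" a 1.
# On the logarithmic reading, the ratios are far larger:
print(amplitude_ratio(1, 2))  # one step: 10-fold
print(amplitude_ratio(1, 5))  # four steps: 10,000-fold
print(amplitude_ratio(1, 7))  # six steps: 1,000,000-fold
```

The point is that the ratio depends on the difference between the two numbers, used as an exponent, and not on their quotient.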

So, for me, the jump from 6 to 7 in ranking someone or something may require a change no bigger than the jump from 5 to 6, whereas for you, it may, whether you realize it or not, represent a twofold or tenfold increase of some external variable, e.g., loudness of a sound, tedium of a job.

To grasp this, compare the hearing sensitivity of the very young with that of grandpa on a Waikiki beach [who perhaps experiences hearing impairment and diminished sensitivity to sound increases as naturally evolved protection from raucous young’ns].

Precision: No Guarantee of Interpretation, Accuracy, Relevance or Consistency

The key implication of these shortcomings of a 1-to-10 scale is that its precision assures very little about its interpretation, accuracy, relevance or consistency from one person or occasion to another.

So, if candidate A gets a 4, B gets a 5, C gets a 10 and D gets a 9, the conclusions that C is somehow “twice as good” as B, or that the difference between B and A (and therefore the “marginal effort”, e.g., training, required to raise A to B’s level) is the same as the difference between C and D, namely, 1, are highly suspect and unreliable.

Likewise, in the absence of proven linearity of the ranking scale, the conclusion that C is to be preferred to D only one-fifth as much as C is to be preferred to B is equally unwarranted, as is the conclusion that it would be smart to hire B (only) at no more than one-half of what C would be paid.
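A small Python sketch may make this concrete. It takes the four scores above and re-reads them on a purely hypothetical non-linear scale in which each extra point doubles the underlying quality; that doubling rule, and the names rank_order and stretched, are assumptions invented for the illustration, not a claim about how any real rater scores.

```python
# The four candidate scores used in the example above.
scores = {"A": 4, "B": 5, "C": 10, "D": 9}


def rank_order(values: dict) -> list:
    """Return candidate names from best to worst by their value."""
    return sorted(values, key=values.get, reverse=True)


# Hypothetical non-linear reading: each extra point doubles "true" quality.
stretched = {name: 2 ** score for name, score in scores.items()}

# The order survives the re-reading...
print(rank_order(scores))     # ['C', 'D', 'B', 'A']
print(rank_order(stretched))  # ['C', 'D', 'B', 'A']

# ...but the ratios and differences do not.
print(scores["C"] / scores["B"])        # 2.0  ("twice as good")
print(stretched["C"] / stretched["B"])  # 32.0
print(scores["C"] - scores["D"], scores["B"] - scores["A"])              # 1 1
print(stretched["C"] - stretched["D"], stretched["B"] - stretched["A"])  # 512 16
```

Both readings agree on who ranks first, second, third and fourth. They disagree wildly about “twice as good” and about whether the gap between C and D matches the gap between B and A, which is exactly why those conclusions cannot be trusted without knowing the shape of the scale.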

Moreover, the 1-to-10 rankings are unreliable, because unless the criteria for the rankings are rigorously and scientifically, or at least objectively, specified, the yardsticks employed by whoever is doing the rating, e.g., an interviewer, may bear limited resemblance (in terms of linearity/non-linearity or constituent factors and criteria) to the criteria in the mind of whoever has to make the final call about how to apply those scores, e.g., the hiring manager.

A Cardinal Error

The message, “Candidate A is a 7”, generated using a non-linear scale or a purely “ordinal” scale, viz., a scale that merely ranks things in numerical order as 1st, 2nd, etc., may erroneously be interpreted by the hiring manager as a score on a “cardinal” scale: a scale of numbers that, unlike ordinal “1st” and “3rd”, can be subject to mathematical operations, such as multiplication. Three times 1st does not equal 3rd.

That can result in disastrous misunderstanding, e.g., if the scores of candidates A, B and C are interpreted as not merely the order in which they would rank among 10 candidates [allowing for ties], but also as quantitatively and linearly reflective of the differences among them.

Imagine that preferences for a holiday in Bali, a can of beer and needles in our eyes, which rank in the order 1, 2, 3, are taken to suggest that needles in the eyes are not so bad. That illustrates this kind of mistake and its magnitude.

Any confidence that “10, 7, 3” on the 1-to-10 scale is more quantitatively precise, objective or useful than this superficial 1, 2, 3 ordering is only as warranted as the scale is consistently and well defined for and between individuals and occasions. In other words, it works only when the scale has what are called “construct validity” and “construct reliability”.

There’s more: A 1-to-10 scale carries an additional risk by subliminally suggesting that everything and everyone rated does better than zero. Is it not more sensible to make the scale run from -10 to +10? This, despite shared flaws with the 1-to-10 scale, at least allows for much clearer expression of emphatic dissatisfaction.

So, with these limitations and warnings in mind, if asked to rate the 1-to-10 rating system on a scale from 1 to 10, I think that I’d have to give it…

…a pass.

By Michael Moffa