April 7, 2012

G is for Gaussian Curves

Suppose a researcher selects a random sample of 100 men, measures their height, and constructs a histogram for the data.

Now if the sample size increases to 1,000 men, the histogram changes slightly: it becomes more balanced.

If the sample size increases to 100,000, it levels out even more.

If it were possible to measure the heights of all adult males in the world, the histogram would create something called the bell curve, or the Gaussian distribution.
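Here is a rough sketch of that thought experiment, if you want to try it at home. It assumes heights really are drawn from a bell curve, and the mean and spread below are hypothetical numbers picked for illustration, not measurements. For a true bell curve, roughly two thirds of the values land close to the average, and the sketch checks how near each sample gets to that.

```python
import random

# Hypothetical values for illustration only, not real measurements.
MEAN_CM = 175.0
SD_CM = 7.0

def share_near_mean(sample_size, rng):
    """Fraction of simulated heights within one SD_CM of MEAN_CM."""
    heights = [rng.gauss(MEAN_CM, SD_CM) for _ in range(sample_size)]
    near = sum(1 for h in heights if abs(h - MEAN_CM) <= SD_CM)
    return near / sample_size

rng = random.Random(2012)  # fixed seed so the run is repeatable
for n in (100, 1_000, 100_000):
    print(n, round(share_near_mean(n, rng), 3))
```

The small sample tends to wobble the most; the big one should settle close to the bell-curve share.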

This distribution is helpful for describing heights and weights, intelligence scores, SAT scores, prices paid for new cars, the life span of light bulbs, the number of heads in repeated flips of a fair coin. Yet somehow it does not apply to book reviews. I think it should.

For a Gaussian curve, there is a rule of thumb based on the standard deviation:

  1. Approximately 68% of all the data items fall within one standard deviation of the mean (average) in both directions.
  2. Approximately 95% of the data items fall within 2 standard deviations of the mean.
  3. Approximately 99.7% of the data items fall within 3 standard deviations of the mean.
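For anyone who wants to check those three percentages, here is a minimal sketch. It uses the standard result that the share of a normal distribution within k standard deviations of the mean is erf(k / sqrt(2)); the function name `share_within` is just my own label.

```python
import math

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```

So the 68 and 95 in the rule are rounded, which is close enough for rating books.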

Applying the GoodReads rating system, this is fair:

1-     didn’t like it
2-     it was okay
3-     liked it
4-     really liked it
5-     it was amazing

If book reviews followed the Gaussian curve, 68% of the books people read should be “liked it.” They should be the average, the norm. Between the 95% and 68% marks, there’s 27%, so roughly one out of four books should be either “really liked it” or “it was okay.” What’s left after 95%? Five percent, split between “didn’t like it” and “it was amazing”: 2.5% of books for 1 star, and 2.5% for 5 stars.
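The arithmetic can be laid out as a quick sketch. It uses the rounded 68 and 95 figures from the rule, and it assumes my own simplification that each star rating lines up with one band of the curve; the variable names are made up for the example.

```python
# Rounded shares from the rule of thumb (percent of all books).
within_1_sd = 68.0
within_2_sd = 95.0

two_or_four = within_2_sd - within_1_sd  # between the 68 and 95 marks
one_or_five = 100.0 - within_2_sd        # whatever is left past 95

# Assumed mapping: each band is split evenly between its low and high side.
expected_share = {
    1: one_or_five / 2,
    2: two_or_four / 2,
    3: within_1_sd,
    4: two_or_four / 2,
    5: one_or_five / 2,
}

for stars, pct in expected_share.items():
    print(f"{stars} stars: {pct}%")
```

That works out to 2.5, 13.5, 68, 13.5, and 2.5 percent, which adds up to 100.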

Five star books-These are the books you would loan to a friend just to make sure they read them. These books are so amazing you’d give them away as gifts. A book this amazing is one you went out and bought at B&N for full price after checking it out at the library.

One star books-These are books where, when you get to the ending, you expect to see a printed page with a website leading you to an apology from the author. These are the books you pick up when you want to die a long, slow, horrible death.

“But all the books I read are amazing,” you say. And for some reason, everyone seems to either really love a book or hate it. We end up with a skewed curve that looks something like this:

Sorry, but all books can't be 5 or 4 stars. This is the problem Harvard has with its applicants. They’re all really smart and talented, but only X number of students can attend. What do they have to do then? Raise the bar, scrutinize even more. Think back and ask yourself, was that book really *that* awesome, or was it just really good? Maybe you really just liked it, but you didn’t want to hurt someone’s feelings and so you gave it a higher rating than you really meant.

Luckily, though, the whole Gaussian curve system ends up working out anyway, even if no one corrects their habit of only leaving 1, 4, and 5 star reviews. Once all of the "Amazing" and "Didn't Like It" ratings are averaged, the rating drops to the standard 3-point-something star review--and the Gaussian Curve triumphs again.
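To see how that averaging plays out, here is a small sketch. The counts are made up for illustration (they are not real Goodreads data), and where the average lands depends entirely on the actual mix of ratings a book gets.

```python
def average_stars(counts):
    """Mean star rating from a {stars: number_of_ratings} mapping."""
    total = sum(counts.values())
    return sum(stars * n for stars, n in counts.items()) / total

# Made-up counts for one book that only gets 1, 4, and 5 star reviews.
love_it_or_hate_it = {1: 30, 4: 30, 5: 40}

print(average_stars(love_it_or_hate_it))  # 3.5
```

With a different mix (say, almost all 5s) the average would sit higher, so treat this as one example rather than a rule.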


  1. ...by far the most interesting "G" word of the day.

    Well done!


  2. Just a thought...there might be selection bias at work here.

    Of all the people who read a book, only a small percentage of them are going to actually leave a rating. And those who do are usually motivated to do so because they want to make a statement. Either love it, or hate it. People generally aren't motivated to put effort into saying something bland.

    To be a true sample, you would have to get a rating from everyone who reads a book. As it is, I reckon most of the middle-ground "like it" ratings are being self-selected out of the sample. Add them back in, and the Gaussian Curve would probably emerge triumphant.

    N.B. This is why medical research is so fraught with misinformation, with new drugs appearing to show better results than they actually deliver. Many researchers don't bother to publish results that show no effect one way or another.

  3. ah stats! numbers can explain anything!

  4. Pretty awesome! I'm glad I took stats so I actually understand what you're talking about. But I do have to agree with Botanist. I used to think this about most reviews: "Wow, people either really love or hate stuff!" And then I took stats. Kind of ruins the mystery of life.

  5. I enjoyed your post and have to also agree with the points made by the Botanist.

  6. Really good points. But I don't always rate the books I read and like or don't like enough to finish. I imagine that impacts things, too. Some people feel that a 3 is an insult even though it means you liked the book. Ugh

  7. yeah...i'm bad about rating.
    first of all, i would feel more comfortable in a scale of one to ten...
    but even then...
    the thing is, i have like post-book-high. where when i just finish a book, i'm on cloud nine. i sign-in and i'm like 4 star that SUCKA!
    and if i really hate a book, i'm too scared of being mean to rate it low...

    but i love how you have the gaussian curve win out in the end. it's only fair that way, i suppose.

    opinions seem to be soooo subjective, trying to get any kind of reliable data on quality is killer- unless something OBVIOUSLY blows (i'm looking at you THE FRIGHTENING on netflix instant)

  8. Great breakdown. I knew what a gaussian curve was, but it's fun to learn all these details. :D

  9. Guardian curves as they relate to books? Sexiest curve ever.

  10. Also love that autocorrect changed Gaussian to Guardian.

  11. The chart is correct because you forget that those books that would fall at the bottom of the curve never get published. I've read plenty of one star books from aspiring authors. As authors, we work and work and work on our manuscripts until they are good enough to get a high rating from readers--aka good enough to even enter the market.

    Catch My Words

  12. Agree with Donna and Vic - if it's a one or two star book, I usually don't finish it and thus never rate it. So most of my reviews fall around four stars.

  13. I think you'd have to rate all writing to get a real Gaussian distribution. Most people who write really poorly know better than to publish a book, even counting self-publishing. Most self-publishers submit to some beta-reading and editing, and traditionally published books have severe editing. Most stuff at the 1-2 level is weeded out before it ever sees print. Also, people don't read random selections. They read things that interest them, which probably skews ratings toward the high end as well. Good points though.

    I'm trying to visit all the A-Z Challenge blogs this month.