
No, You Can’t Predict a Best-Seller by Analyzing a Book’s Contents

The Independent has a report on the latest group of outsiders who think they are going to disrupt the book publishing industry.

The last one was a startup which wanted to eliminate the middleman by becoming a middleman, and now someone thinks they can predict the next best-seller simply by running the text of a book through an algorithm.

Good news for budding authors needing a little hand in writing that best-selling book: there’s now a formula for writing successful novels. Well, it won’t help you write the whole book, but it may help you get on the right track.

Former acquisitions editor for Penguin UK, Jodie Archer, and associate professor of English at the University of Nebraska-Lincoln, Matthew Jockers, have been compiling data for the last five years, trying to find out what makes a bestseller.

After analysing 20,000 randomly selected novels from the past three decades, the pair worked out what makes a book ‘successful’ (i.e. one that has appeared in The New York Times bestseller list).

The result is an algorithm – titled the ‘bestseller-ometer’ by its discoverers – which measures certain aspects of books such as theme, plot, style, character and vocabulary, and tells you whether it will be a bestseller; they claim it can pick out a future bestseller to an 80% degree of accuracy.

Yes, they say they can predict the next best-seller based on the text of the books which made the best-seller lists twenty and thirty years ago.

Never mind that the market and audience were different 30 years ago; never mind that publishers are gaming the best-seller lists, and have been for decades; never mind that people buy books based on marketing, word-of-mouth, and other factors which can’t be found in the text; never mind that the best-seller lists don’t accurately reflect a book’s true sales; never mind that the method for generating best-seller lists has changed over the decades.

No, Archer and Jockers think that they can predict a best-seller, and they’ve published a book which makes that claim.

Pull the other one; it has bells on.

I find I agree with Mike Shatzkin’s take on this idea:

My team’s view is unanimous. The idea that the odds a book will make the bestseller list can be calculated from the content of the book alone, without regard to consumer analysis, branding, or the marketing effort to promote the book, is ridiculous.

As Pete has explained to us, repeatedly, the customers you’re looking for have not read the book. You capture them by appealing to their interests and their searches in ways that they find appealing and in language they understand. He reminds us from time to time that the words “civil rights” don’t appear in To Kill a Mockingbird.

According to its Amazon page, “the Bestseller Code boldly claims that the New York Times bestsellers in fiction are predictable and that it’s possible to know with 97% certainty if a manuscript is likely to hit number one on the list as opposed to numbers two through fifteen.”

Our verdict on this: absolutely impossible.

And our hunch is that their publisher feels the same way. After all, if you had access to a capability like this, and you believed it, wouldn’t you do a few bestsellers on your own before you revealed any of it to the world?

That last point may or may not be compelling, depending on whether you like or dislike major publishers, but TBH I had reached many of the same conclusions.

And frankly, if this is such a great idea then why aren’t we reading about Archer and Jockers launching a startup to capitalize on it, and publishing a book to promote the startup?

image by vonguard

William Ockham July 9, 2016 at 11:14 am

You, Shatzkin, and Archer and Jockers are all wrong. Archer and Jockers are wrong because they have fallen prey to survivorship bias. They didn’t factor in all of the texts that meet their criteria, but didn’t hit the bestseller lists. On the other hand, to dismiss this work without understanding it is foolish. I haven’t looked at the research behind this book, but I am familiar with other things Jockers has done. He isn’t an idiot. And he has clearly learned the first rule of popularizing academic research: Get free publicity by making bold controversial claims, even if you can’t back them up. I believe it is very possible that Jockers and Archer have identified something real that would help identify potential bestsellers. I can’t wait to dig into the details.

Nate Hoffelder July 9, 2016 at 12:39 pm

They didn’t factor in all of the texts that meet their criteria, but didn’t hit the bestseller lists.

I had the same thought about survivorship bias, but from what I can tell they studied more than just titles on the best-seller list.

William Ockham July 10, 2016 at 1:07 pm

They couldn’t study all the unpublished stories that met their criteria for making the bestsellers list. They can’t know how many published books originally had all the qualities of a bestseller, but some were edited out in the publication process.

The study necessarily included only published books. I think that is a much bigger flaw than the ones you mention.

Sergegobli July 10, 2016 at 9:40 am

I would have liked to have learned a bit more about this magical algorithm in your article.

Nate Hoffelder July 10, 2016 at 9:49 am

The book isn’t out yet, and I am still waiting for the authors to reply to my email.

Michael Dalvean July 24, 2016 at 5:58 am

The Bestseller Index referred to here was created using data from novels published since 2001. However, it is able to correctly identify Catcher in the Rye (1951) as a bestseller: .

Guest Post: On that 80% Accuracy in Predicting the Next NYTimes Best-Seller | The Digital Reader September 5, 2016 at 5:45 pm

[…] upcoming book The Bestseller Code – Anatomy of the Blockbuster Novel is getting a great deal of buzz. Can one genuinely predict what kind of book will become a NYTimes […]
