Amazon Comments on TOC Crackdown, Inadvertently Confirms Kindle Unlimited Page Count Scam

14.03.2016 Nate Hoffelder 31 Comments

News broke on Friday that Amazon was cracking down on authors and publishers who had put their TOCs at the end of the Kindle ebook, rather than at the beginning.

On Monday Amazon confirmed the report in a statement on the KDP support forums, while at the same time also confirming that the crackdown was indeed a response to a much larger problem.

But first, Amazon’s statement:

We have recently received a number of questions on topics such as TOC formatting and our policing of abuse and fraud among KDP publishers.

In many cases, putting a book’s Table of Contents (TOC) at the end of a book can create a poor experience for readers, and in general we suggest authors locate TOCs to the beginning of a book. If the formatting of a book results in a poor experience or genuine reader confusion, or is designed to unnaturally inflate sales or pages read, we will take action to remove titles and protect readers. That said, absent any other issues of quality, locating the TOC at the end of a book is not in itself outside of our guidelines.

Relatedly, some in the community have contacted us about the activities of a small minority of publishers who may attempt to inflate sales or pages read through the use of various techniques, such as adding unnecessary or confusing hyperlinks, misplacing the TOC or adding distracting content. We both actively police for this type of activity on our own as well as investigate when the community points out such abuse (thank you to those of you who have helped us in this regard). Any abuse we find results in the immediate suspension of a title. Some circumstances, including repeat offenses, will result in KDP account suspension. In any abuse cases, we will also remove related pages read from the allocation of the monthly KDP Select Global Fund.

When I covered the story on Friday, I reported on the crackdown but not the speculation that authors were getting caught up in a response to scammers who were using various means to artificially inflate their page counts in Kindle Unlimited.

As David Gaughran explained, and as was laid out in detail over on KBoards, scammers were using tricks "such as adding unnecessary or confusing hyperlinks, misplacing the TOC, or adding distracting content" to artificially inflate the number of pages read by Kindle Unlimited subscribers.

This details matters because in July of last year Amazon started paying authors and publishers with ebooks in Kindle Unlimited by the number of pages read, rather than the number of times an ebook is borrowed. This was generally viewed as a response to authors who were cheating the system by uploading really short works and getting paid each time one was borrowed, and it was supposed to level the playing field by making sure that longer works are valued the same as a short story.

That’s the way things were supposed to work, but alas, the scammers are smarter than that.

As the theory goes, scammers had figured out that Amazon wasn’t counting the pages a KU subscriber had read in an ebook; instead Amazon was only measuring how far into a book the reader gets. So if the scammer can trick a reader to jump to a point 90% of the way into a ebook, that scammer gets paid for that 90% of the ebook no matter whether it was actually read or not.

I have yet to confirm the accuracy of that theory, but it is the consensus opinion over at KBoards. David Gaughran also thinks that’s what’s going on:

The latest wheeze from this shady crew was to place a message at the start of their KU titles encouraging readers to click through to the end – because this fools Amazon’s system into thinking the entire book has been read, the author of that title then receives an inflated payout from the KU pot, and then honest, hard-working writers who aren’t pulling these cheap tricks on readers have less money to share. It’s a mess. These guys are peeing in the KU pool and Amazon is paying them by the gallon.

And it seems this is what triggered the TOC crackdown.

You can find one such book over here. The scammer’s notice is visible when you read the sample on the Amazon website, and it looks like this:

I have yet to confirm first-hand that this theory is true, but I do believe that Gaughran was right all along.

Amazon’s announcement this morning, as well as the spammy/scammy screensnap above, would only make sense if scammers had found a way to trick Amazon into paying for pages that weren’t read.

And that is frankly shocking. Who would believe that Amazon would decide to charge by the pages read and then not bother to count the pages correctly?

That just doesn’t make any sense, and yet it is what is happening here.

Clearly Amazon is going to have to come up with a more permanent solution. They could start by defining an algorithmic filter to catch the more obvious scams, but in the long run Amazon is going to have to change how they track what is read by Kindle Unlimited subscribers.

image by Matt From London

Comments

M. Louisa Locke March 14, 2016 um 6:36 pm

From my own experience as a reader, what I am wondering is if the problem is not that when someone skips to the end that all the pages in between are counted, but that if the TOC, the reader who doesn’t know how to use the GO TO option to get to the end will end up flipping through the pages to get to the end (which would then cause Amazon to count those pages). If that is the problem, then getting authors to put the TOC at the beginning, which would make it more likely the reader would click on the end–bypassing the interior pages–would actually solve the problem (since none of the pages in between would count.) Otherwise, shifting the TOC to the beginning doesn’t really solve Amazon’s problem with the scammers (since whether they clicked on special go to end to get prize link or a TOC link the effect would be the same).

So I am wondering if people like David (much as I respect his analysis of all things Amazon) aren’t jumping to the wrong conclusion about how Amazon tells how many pages are read (he and others seem to be assuming they simply count the pages between first page read and last page read.) I would love to hear someone from Amazon comment on this.

William Ockham March 14, 2016 um 7:09 pm

Who would believe that Amazon would decide to charge by the pages read and then not bother to count the pages correctly?

I would. Well, I believe they decided to pay by the pages read without having a fool-proof way to count the pages. Look carefully at the difference between what I wrote and the way you framed the issue. Amazon is, first and foremost, a customer-focused company. If they were charging by the page, the system would have to be fool-proof. Paying by the page is a different story. Why is it easy for me to believe this happened? Because they don’t have eye-tracking capability in the Kindle ecosystem. Without that, it is impossible, despite what Andrew Rhomberg may believe. The 'pages read' system has to work on every Kindle device and every Kindle app. That means it is built into the file format. Which means it is based on the way the .mobi format counts segments and handles internal links. And that means a maliciously crafted link can move through file to the end. What is the mystery?

Carolyn Jewel March 14, 2016 um 8:25 pm

"That means it is built into the file format."

No. It does not mean that. If it were in the file format it would be easy to reverse engineer whatever is supposed to be "in the file format," a statement I find too vague to parse out, to be honest. So mostly, I guess, it’s only possible to say, there’s zero data to support that hypothesis.

"Which means it is based on the way the .mobi format counts segments and handles internal links."

No. It doesn’t mean that either.
First off, Amazon converts mobis to the AWZ format, so it wouldn’t be related to the mobi format. If Amazon wanted to count something it calls segments — which I assume here means whatever internal file divisions are present (something that is actually done by the person/program that creates the file to be uploaded) — in order to count segments instead of words, all you’d have to do is count characters and decide how many characters = 1 segment so you can use smaller integers in the arithmetic. But they’re not doing that — that’s just something that doesn’t assist in this issue.

"How Amazon handles internal links" If by that is meant a link from one part of the book to another part of the book, Amazon isn’t doing anything more or less complicated that following the html standard. There’s nothing magical or nefarious going on there. It’s how eBooks are supposed to work. It’s how anchors work.

The fact that someone can put a link in the front that takes the user to the back is exactly what links are designed to do. With or without a TOC. With or without the possibility of a gift card at the end of that link.

Back to KENP etc. For a given device at a given point in time, Amazon knows the font and font size being used, the margins set and, basically, all the css that would affect the number of words that would be displayed on a given device page. That’s how they can tell you about how far you are in the book and how they can give you an integer that represents your relative location in the book.

It’s not hard to count characters and spaces for a given block of text and perform all kinds of calculations and string manipulations. Give a programmer a block of text and she can perform all kinds of arithmetic and string functions on that text. It’s programming 101.

Amazon does not need eye-tracking when they have page swipes. And don’t forget that Amazon is able to link an audio book location to the ebook location and for that to happen they MUST know (for nearly all values of "know") where you were in the eBook or Audio book in either of the two sources. Every device/app out there is able to bookmark a current location and return the reader to it. Most can place a user at an updated location. My point being is there is irrefutable evidence that location is knowable and obtainable.

For those reasons I think it’s more likely that Amazon took an exploitable shortcut because they (may have) wanted to save some processing cycles by assuming that if a user reached some endpoint in the book they could assume the content in between had been "read." Stupider programming shortcuts have been made.

There a lot here to suggest we either don’t have all the information or there are two issues being conflated or both. The first is that the apparent assumption by Amazon that the physical location of a TOC is evidence of nefarious intent. That’s just dumb, but we’ve seen Amazon make dumb assumptions (see, Authors, fans, and review removal criteria). The image that keeps being used to show what spammers are doing DOES NOT involve a TOC. That image is a front matter link that promises a reader something to click that link and end up … somewhere where the reader is supposed to get something of value not related to the book, the story, or the author.

So, Amazon can and does derive location in a Kindle book.

Without knowing more, and Amazon has been cagey, I think the better conclusion (at this point) is that Amazon took some shortcuts that were easily exploited.

Nate Hoffelder March 14, 2016 um 8:48 pm

"For those reasons I think it’s more likely that Amazon took an exploitable shortcut because they (may have) wanted to save some processing cycles by assuming that if a user reached some endpoint in the book they could assume the content in between had been “read.” Stupider programming shortcuts have been made."

I wonder if the required tracking is even technically possible for a platform on the scale of Kindle Unlimited. I mean, it’s one thing for Jellybooks to crunch a few hundred or a few thousand readers' data, but KU operates on the scale of millions.

John March 15, 2016 um 3:43 am

‘If it were in the file format it would be easy to reverse engineer whatever is supposed to be “in the file format.”’

You mean, easy like reverse engineering their new file format? Ask us guys at http://www.mobileread.com/forums/showthread.php?t=263902 how easy this is, I’ll be glad you help us prove it’s so easy.

‘For a given device at a given point in time, Amazon knows the font and font size being used, the margins set and, basically, all the css that would affect the number of words that would be displayed on a given device page.’

And yet, there are long-lasting bugs that have been never fixed, like the minimum line-height of 1.2em, a notice which is lost in the middle of their documentation. Guess why? A smaller line-height would absolutely broke what they call locations. Coincidently, that’s the default for a lot of DTP software. At some time, they even had to manage that in KindleGen, once and for all.

They may know all those settings, it doesn’t guarantee they are using them properly.

And don’t forget Amazon’s got like a million different formats to manage (Topaz, Print-Replica, etc.), which don’t work the same.

While I agree with your conclusion (shortcuts), I urge you to retro-engineer some of their apps, you’ll probably discover that 1. they are managing a lot of stuff on their side (a lot of processing is occurring during the file’s conversion) and 2. more and more of this stuff is actually managed at the file level so that the renderer keeps performance.

Seriously, Kindle is a really complex system which has to manage a lot of complex formats, if you don’t know how complex it is overall, don’t believe you can teach anybody. Some have been retro-engineering this system for years, they still don’t know all the nuts and bolts. And it is not, by any means, as simple as you seem to believe it is. Thank you.

tarwin March 20, 2016 um 2:19 pm

I think the comment about eye-tracking is that page swipes are still not a hundred percent accurate as sometimes you swipe several pages to look for something or skip over a part because you’re "lazy" and don’t feel like using the table of contents or whatever (or maybe it’s not precise enough to take you where you want to go) and the only way to be truly sure that something is read, or at least skimmed, would be to track eye movement to "make sure" that the reader has looked at that part of the text. Basically saying that any likely method is going to be an approximation, though some would be better than others.

Mark Williams – The International Indie Author March 15, 2016 um 3:27 am

Surely a simple way to test this is to load up a new title to KDP, immediately put in KU, download the title in KU, go from TOC to last page without turning any pages, and then immediately unpublish the book so there is no chance another reader will have found it and also downloaded the title.

The number of pages read for that title, according to the KDP dashboard, will then show clearly whether a single leap to the end of a book will count as all pages read or not.

Nate Hoffelder March 15, 2016 um 7:26 am

That’s what I was going to do, yes.

I wanted to see it for myself, but if you want to try it and report back, that would work for me.

Mark Williams – The International Indie Author March 15, 2016 um 11:57 am

I’m not a KU subscriber so cannot download the title in KU, otherwise I would have done so.

Nate Hoffelder March 15, 2016 um 12:06 pm

We can get the assistance of subscribers easily enough. What we need first are books to test.

Olivier March 15, 2016 um 7:07 am

@Nate You write: "And that is frankly shocking. Who would believe that Amazon would decide to charge by the pages read and then not bother to count the pages correctly?"

Counting pages "correctly" would likely involve, e.g., monitoring how long a reader stays on a page, whether he scrolls and so on, and this for every page as long as the book is open. More privacy invasion. Do we really want that?

Nate Hoffelder March 15, 2016 um 8:38 am

How else would Amazon know to pay by the page?

tarwin March 20, 2016 um 2:24 pm

Don’t forget that it would have to measure how long on average a specific reader spends on X amount of text (based on performance in several books) to be able to take into consideration the different reading speeds (especially speed readers).

Olivier March 15, 2016 um 8:43 am

My point is that this pay-by-the-page system is wildly intrusive.

Nate Hoffelder March 15, 2016 um 8:46 am

Agreed.

Brenda G. March 15, 2016 um 8:47 am

Take a look at this guy. He is paying freelancers on upwork to click through his books. someone needs to report him. He says he’s making $5k+ per day. He’s taking money out of all out pockets. https://www.youtube.com/watch?v=Do5XypMzeGQ

Nate Hoffelder March 15, 2016 um 10:35 am

Amazing, isn’t it?

William Ockham March 15, 2016 um 10:47 am

Carolyn Jewel,

Let me clarify a few things. Let’s start with what we know. The system in place to count pages read is hackable. That system is the same system that is used to manage a user’s location in an ebook when the user moves from reading on one device to another (or from reading to listening to the audiobook).

Kindle ebooks don’t contain in any executable code in the ebook files themselves. (This is what is different from JellyBooks). The Kindle app (and I’m including the software running on Kindle devices) tracks the user’s location in the ebook based on the user’s navigation within the file and periodically communicates that back to Amazon’s servers when the device is connected to the internet. That communication is quite constrained. The last time I checked (by using a proxy server and a network sniffer), it was nothing more than a single integer value that corresponded to the current file position pointer and an id that identified the device/app instance. I haven’t checked in a couple of years, but given that there have been very few updates to the oldest supported hardware, I doubt anything changed just to support KU.

I never did any testing with the PDF-based formats, but I can’t imagine that the situation is materially different there.

Now, a little bit about me to establish some context for my statements. I’ve been a professional software developer for over 20 years. I spent a couple of years hacking at the Kindle format more or less full-time (and MobileReads is an excellent resource), but I haven’t spent much time on it in the last three years.

Let’s talk about how the Kindle apps handle links. Go find a book that has endnotes. Click on an endnote link. That will send you near the end of the physical file, in most cases. Click the back link and you are back in the middle of the file. Pick that book up on another device and the reported "furthest" page is not the endnote. Repeat that experiment, but instead of clicking an endnote, go to the ToC and jump ahead several chapters. Now, when you open that book on a different device, the "furthest" page read is that chapter you jumped to. Navigating within a Kindle file (and I’m talking specifically about the .mobi file format which includes files with the .AZW extension) is interesting because, unlike epub files, the html is stored in what is essentially a database format with fixed length records that include weird metadata. The app reconstructs the file in the memory of the device. The long and short of this is that, sure, Amazon can know with some precision the user’s current location within a Kindle file. What they don’t necessarily track is the path the user took to get to that position. And that is exactly what is at issue here. The other interesting thing about Kindle files is that Amazon inserts a "start reading location" into the file after it’s uploaded. That’s important because I assume that the KENP count starts from there. Before you assume that Amazon is taking stupid shortcuts, think carefully about building large-scale distributed systems. The fact that the Kindle app knows locations doesn’t mean that it is counting page swipes. In fact, a little bit of experimentation will demonstrate that Kindle apps do not count page swipes.

Having designed a few large scale distributed apps myself, I am fairly confident I know why the KU system is exploitable and what Amazon can do about it. The system is exploitable because all large scale systems make trade-offs. The switch from a single payment amount to a per page payment was an enormous upgrade for subscribers because it more closely aligned the value of the subscription to the relative costs for producers. Long-term, the system attracts more content that has higher value to subscribers.

One of the trade-offs in that decision was that flaws in the Kindle location tracking system that had always existed were exploited by unscrupulous producers. Without public pressure and reader complaints, this isn’t a huge deal to Amazon unless the bad content drives out the good.

Fixing the technical issue is a lot harder than people think. Effectively eliminating the cheating will almost inevitably result in a few honest authors losing revenue. One solution is to track all the locations that the Kindle app reports and pay only based on some formula that penalizes skipping ahead. That’s problematic on several fronts. Because devices can go off-line, Amazon would have to change their apps (on many different platforms, btw) to store the location frequently and upload the entire dataset. Then the server side code would have to parse and analyse that data to estimate the appropriate payment amount.

Another route would be to automate the identification of the known scams and prevent them from being published. This is more achievable but could easily become a game of "whack-a-mole". It also risks creating a large number of issues that require human intervention and that is something that Amazon must avoid in its business model.

Amazon wants to eliminate this issue (they’ve said so), but they are unlikely to move as quickly or effectively as KU authors would like. As a KU subscriber myself, I would like to see this addressed sooner rather than later. The situation lowers the value of my subscription. Because I’m better situated than almost any other subscriber to do something about it, I’m trying to come up with effective solutions rather than complaining about it. I’m open to suggestions.

I’m convinced that the only real solution is to figure how to raise the costs/risks of pulling off the scams so that these people go after something else.

author/reader March 15, 2016 um 11:47 am

So the upshot of it all is that scammers have likely inflated the total number of pages read–which reduces the payout per page read to all other authors. Not only has Amazon been scammed but legitimate authors have also been scammed–partly as a result of Amazon’s defective KU 2.0 system.

Carolyn Jewel March 16, 2016 um 10:59 am

Allow me to point out that reverse engineering a file format is significantly different from reverse engineering a rendering engine. My response was to the op’s suggestion that such functionality is in the file format when, if it’s present on the device, it’s in the rendering engine. But I think it likely that the calculation takes place on amazon’s servers. These are calculations that take cpu cycles so I would expect Amazon would not burden devices with that. It makes more sense to do that on their side.

Given my 20 plus years in IT and in dev ops I actually don’t need anyone to explain these concepts to me. My job for example includes explaining to software devs why they need to re write their code so we use server resources efficiently.

Clif Davis March 15, 2016 um 5:05 pm

Putting the TOC first means that it is included in the part that can be freely read inside the book. For a non-fiction book this will generally be a good thing, though not always. For a fiction book putting the TOC at the end gives you more text with which to hook them.

Point is, there are legitimate reasons for TOC near the end.

Brent Tharp November 10, 2016 um 8:50 am

Sorry to jump this a little late, but I’ll believe a legitimate reason exists for putting a TOC at the back of a book when I see one printed that way. I’m a professional editor and I have yet to see that. There is no reason for putting a TOC at the end of a book except to try to game the system. And it actually makes the reader experience worse. But the expected ordering of the parts of a book has only been around for a few hundred years, so I imagine there could be some reason a person would want all the parts in reverse order. Wait. Nope. Still none.

Start up: evaluating ebooks, EU’s tax quiz, no more Here on Windows, two cameras on iPhone 7?, and more | The Overspill: when there's more that I want to say March 16, 2016 um 3:01 am

[…] Amazon comments on “table of contents” crackdown, inadvertently confirms Kindle Unlimite… […]

Top 5 Publishing News Stories 3/14-3/18 – Publishing Trendsetter March 18, 2016 um 12:31 pm

[…] Amazon began cracking down on a scam that puts the Table of Contents at the end of an ebook that would make the book appear to have been read to the end. […]

Why Are Amazon and KDP So Weird? | Digital Book World March 23, 2016 um 7:32 am

[…] of a user’s account because he returned too many items, or the debate surrounding whether Kindle Direct Publishing (KDP) authors must put their table of contents pages in the front matter as opposed to the […]

Amazon Reportedly Actively Recruiting Publishers into Kindle Unlimited | The Digital Reader April 22, 2016 um 1:08 pm

[…] wish Amazon luck with that. Given the current problems with scammers cheating the funding pool, publishers would be advised to stay away until after Amazon has fixed the […]

Are Authors Really Losing Out in the Kindle Unlimited Pay-Per-Page Scam – Or Is It Amazon? | The Digital Reader April 26, 2016 um 10:10 am

[…] The Guardian and other parts of the mainstream media begin to notice the month and a half old story on the Kindle Unlimited page count scam, it's time that we consider just who is being […]

Innocent Authors Are Getting Burned in Amazon's Fight Against KU Bot Farms | The Digital Reader July 11, 2016 um 12:35 pm

[…] David Gaughran inferred and Amazon confirmed, those authors were getting caught up in one of Amazon's fights against Kindle Unlimited scammers […]

Kindle Unlimited Funding Dips in February 2016 as the Per-Page Payment Increases | The Digital Reader June 12, 2017 um 8:34 pm

[…] indie publishing community may be in turmoil with the news that scammers are outsmarting Amazon in Kindle Unlimited, but business continues […]

Amazon Seeks Court Confirmation of Arbitration Ruling, Inadvertently Shows Just How Badly-Run the Kindle Store Is | The Digital Reader April 5, 2018 um 2:19 pm

[…] created at least six Amazon publisher accounts, at least one of which engaged linkbaiting, […]

Content Moderation Case Study: Amazon Alters Publishing Rules To Deter Kindle Unlimited Scammers (April 2016) – Literary Reviews September 2, 2020 um 8:10 pm

[…] book were read, the author still got credit for the 100 pages read by an Unlimited user. Scammers inflated “pages read” counts by moving the table of contents to the end of the book or offering dozens of different languages in the same ebook, relying on readers skipping hundreds […]

Write a Comment
Cancel reply

You must be logged in to post a comment.

Amazon Comments on TOC Crackdown, Inadvertently Confirms Kindle Unlimited Page Count Scam

iFixit Rips Open the Fire TV, Finds Few Surprises

Is it Piracy? How Students Access Academic Resources

Got a Minute? Why Not Read Moby Dick

Write a Comment Cancel reply

Write a Comment
Cancel reply