Scribd, Piracy, and Why You Can’t Always Believe What You Read Online

10.03.2014 Nate Hoffelder 15 Comments

When Scribd launched their ebook subscription service last fall, many in the book community faulted Scribd for its past history of piracy issues, and it seems that Scribd still hasn’t managed to escape that specter.

Rich Meyer, writing over at Indies Unlimited, is advising authors to pull their Smashwords titles from Scribd because users might pirate them. (The fact that Smashwords sells the same titles as easily piratable DRM-free ebooks seems to have escaped him.)

In a post that mixes equal parts factual errors, fear, and a misunderstanding of the tech involved, Rich writes:

I really had no problem with the service, as long as it was limited to the iPhones and Google Android stuff. I don’t have a smartphone myself, and don’t care to get one. People aren’t going to go out of their way to monkey with that sort of stuff, so they’d be using it and poof, book gone. But Scribd now has the ability to be used on a Kindle Fire, and that’s a whole different kettle of fish when it comes to illegal access and piracy.

…

The problem with Scribd’s view on piracy is that they are working on the WRONG bloody end of the stick. Pirates aren’t going to be UPLOADING books – they’re going to be DOWNLOADING CONTENT. I just can’t believe they’ve not figured this out: Any subscription service for e-books is basically a smorgasbord for piracy. There is absolutely nothing I can see that stops a person from just downloading books wantonly and copying them to another source (laptop, memory card, thumbdrive) and then cracking the pointless DRM on them and having a field day with them. Believe me, it’s very, very easy, and once you know how, you can set up your system to do it automatically – all you have to do is drop your book into a folder and *POOF* it’s as free as a bird. And if they didn’t bother to even open the books before they return them, the author gets ZILCH.

As Juli Monroe pointed out to me this morning, that bolded section is simply not true. I had linked to Rich’s post in the morning coffee post, and Juli called me on it because she didn’t think the section quoted above was accurate.

She had already looked for the ebooks she had downloaded from Scribd, and she couldn’t find them (not as ebooks, per se). She’s not interested in pirating the ebooks, obviously, but she was curious about the technical details and went looking. If she can’t find a recognizable ebook file, do you really think it will be easy for the average user to strip the DRM?

I don’t think so, and so far as I know there’s no easy DRM stripping tool for Scribd. But just to sate my curiosity I looked into the matter myself. I didn’t have a Scribd account at the time, so I took advantage of the free trial. After I downloaded a few ebooks, I went looking and eventually found what I think is the correct folder.

Juli and I both think that Scribd stores their ebooks in a folder called documents_cache. We came to this conclusion independently, and if that is where Scribd puts the ebooks then I seriously doubt the average user will be able to strip the DRM.

The ebooks aren’t stored as ebooks; instead they are stored as collections of JSON, CSS, and image files. And while I can’t speak for the JSON files, the image files have DRM of some kind.
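A quick way to confirm what kind of files a folder like documents_cache holds is to count them by extension. Below is a minimal Python sketch of that idea; the function name tally_file_types is hypothetical, and the folder path is whatever you point it at. It only inventories files and does nothing to their contents.

```python
from collections import Counter
from pathlib import Path


def tally_file_types(cache_dir: str) -> Counter:
    """Count files under cache_dir (recursively) by lowercase extension."""
    counts: Counter = Counter()
    for path in Path(cache_dir).rglob("*"):
        if path.is_file():
            # Extensionless files are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return counts
```

For example, `tally_file_types("path/to/documents_cache").most_common()` (the path is a placeholder) lists extensions such as `.json`, `.css`, and image types from most to least common, which is enough to see that there is no single recognizable ebook file in the folder.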

I wouldn’t know where to start to convert this format, but I can guess. Scribd’s ebook format may or may not have something to do with the HPub standard. That is one of the lesser-known ebook file formats, and it can be used for sending rich-format ebooks over the web. The spec mentions JSON files, but it also mentions certain requirements not met by the Scribd pseudo-ebook format.
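The HPub comparison can be made a little more concrete with a folder check. This is a minimal Python sketch, and the requirements it tests are assumptions based on a loose reading of the HPub spec: a top-level book.json manifest with metadata keys such as hpub, title, author, and url, plus a contents list naming HTML pages that exist in the folder. Check the spec itself before relying on any of these rules; the function name looks_like_hpub is hypothetical.

```python
import json
from pathlib import Path


def looks_like_hpub(book_dir: str) -> list[str]:
    """Return a list of problems; an empty list means the folder passes these assumed checks."""
    root = Path(book_dir)
    manifest = root / "book.json"
    if not manifest.is_file():
        return ["no book.json manifest at the top level"]
    try:
        meta = json.loads(manifest.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"book.json is not valid JSON: {exc}"]
    if not isinstance(meta, dict):
        return ["book.json is not a JSON object"]
    problems = []
    # Assumed required metadata keys; adjust to the spec version you are checking against.
    for key in ("hpub", "title", "author", "url"):
        if key not in meta:
            problems.append(f"book.json lacks '{key}'")
    # Assumption: this sketch insists on an explicit list of HTML pages.
    contents = meta.get("contents")
    if not isinstance(contents, list) or not contents:
        problems.append("book.json has no 'contents' list of pages")
    else:
        for page in contents:
            if not str(page).lower().endswith((".html", ".htm")):
                problems.append(f"page {page!r} is not an HTML file")
            elif not (root / str(page)).is_file():
                problems.append(f"page {page!r} is listed but missing")
    return problems
```

A folder made only of JSON, CSS, and image files would be flagged at the first check unless it happens to carry a book.json, and even then its pages would fail the HTML test.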

I won’t go into the full details here; if you’re interested you can check for yourself. But the short version is that I wouldn’t worry too much about someone stripping the DRM from a Scribd ebook; it’s going to take a real hacker to pull it off, not your average user.

Comments

Valentine March 10, 2014 at 2:39 pm

Or you could just have a simple script that scrolls through the book, taking screenshots along the way, and OCR’ing them into a simple text file.
Then convert it to epub or something else using calibre or other conversion tools, which will detect chapters by looking for "Chapter One", or "Chapter 2", or similar, and you can just clean it up.
The automated part should take no more than a few minutes for a full sized book, and probably just as much for the human intervention. (pulling numbers out of thin air, no idea if it takes that little or more, but I do have enough knowledge to create said script, and it’s embarrassingly simple)

If the user can see it, then it can be copied.

Anyway, if they’re desperate enough to keep on using DRM, potentially ruining legitimate user experience with little impact on piracy, maybe they should give book curses a chance.
http://en.wikipedia.org/wiki/Book_curse

DeeDee March 11, 2014 at 5:59 am

Awesome! If I ever decide to write a book, I’d use a book curse instead of the standard copyright text 😀

Juli Monroe March 10, 2014 at 3:12 pm

@Valentine, true. Again, that’s beyond the ability of the average user. And the point of the article I objected to was that there was something wrong with Scribd that they supposedly didn’t realize how easy it was to pirate their content by just downloading it and stripping DRM. Your method goes way beyond a simple download and strip.

While I’m anti-DRM in general, I don’t have a problem with it in a subscription service where I never expected to "own" the book. I know I’m paying to borrow it, and I’m okay with that and with DRM under those circumstances.

Oh, and thanks Nate, for adding some technical details about file formats. I’m editing my article to link back to this one, for those who are curious about those.

Nate Hoffelder March 10, 2014 at 8:08 pm

Valentine does raise a good point about the uselessness of DRM, though. This trick is basically a modernization of the old analog loophole.

William Ockham March 10, 2014 at 3:33 pm

I stopped reading the original article when I got to the part about how the Kindle Fire is "a whole different kettle of fish when it comes to illegal access and piracy" compared with Android. That’s just a gulf of ignorance greater than I can communicate across. If he’s really worried about piracy he should be worried about the web reader. I just tested it out and it dumps the whole book via http in a barely obfuscated format that is trivial to reverse engineer. We’re talking about the level of cryptography where all the letters from p to z are encoded as themselves: p = p, etc. But hey, c is coded as e, e is coded as i, and i is coded as c, so that’s totally secure.
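The point of this comment is that a fixed letter-for-letter substitution is obfuscation, not encryption: there is no secret key, so once the mapping is spotted it can be undone mechanically. The minimal Python sketch below uses only the illustrative mapping from the comment (c becomes e, e becomes i, i becomes c, every other letter unchanged); it is not Scribd’s actual scheme, which the comment does not spell out, and make_codec is a hypothetical name.

```python
def make_codec(mapping: dict[str, str]):
    """Build encode/decode functions for a fixed one-to-one letter substitution.

    Letters not in the mapping pass through unchanged.
    """
    # The mapping must permute its own letters, or pass-through letters could collide.
    if set(mapping) != set(mapping.values()):
        raise ValueError("mapping must permute a fixed set of letters")
    forward = str.maketrans(mapping)
    # Inverting the substitution is just swapping keys and values.
    inverse = str.maketrans({v: k for k, v in mapping.items()})
    return (lambda text: text.translate(forward)), (lambda text: text.translate(inverse))


encode, decode = make_codec({"c": "e", "e": "i", "i": "c"})
# encode("piece") == "pciei", and decode("pciei") == "piece"
```

Nothing in the decoder depends on a secret, which is why the commenter calls it trivial to reverse.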

Nate Hoffelder March 10, 2014 at 6:33 pm

That was painful, yes. I would not even have linked to it if not for the instructions.

Nate Hoffelder March 10, 2014 at 8:06 pm

" it dumps the whole book via http in a barely obfuscated format that is trivial to reverse engineer"

In that case someone could probably slap together a browser plugin which would do the job without too much effort.

Felipe dAdan Lerma March 11, 2014 at 11:33 am

Glad I came across this post. I subscribe to Digital Reader and guess I’d missed this.

I’d commented on the Indies post when it first came out, with what info I had to counter some of the points in the post, but am glad for this more detailed look.

There are legitimate concerns, on many levels, but I’ve placed all my books (over 50) with Scribd, and like how it’s working so far. A few glitches, to be expected, but not bad.

I think the greater issue on many people’s minds is how this will affect their book sales period, including where they sell now. I don’t know. But I did as much due diligence as I could, and, after initially opting out, agreed to put in all my work.

Best wishes then, for all of us, wherever we share and sell our work.

Ros March 13, 2014 at 5:23 am

I find it utterly bizarre that authors and publishers are willingly jumping into business with a site that has such a history of hosting pirated content. Personally, I’ve opted out of Scribd on all my books. I don’t care if it costs me sales. I just don’t want the owners of Scribd to get one single penny from my books.

If the only contact I’ve had with a website is sending them DMCA takedown notices, there’s no way in hell I want to do 'legitimate' business with them. Ever.

Versus July 27, 2014 at 7:20 pm

Scribd and similar sites are not doing enough about piracy. Uploaders of pirated material should be banned and fined, and remuneration given to the correct intellectual property holders in proportion to the degree of pirate downloads and distribution that result. Until Scribd and such sites feel the pain, as in truly carrying the burden of the cost of their enablement of piracy, there will not be proper enforcement.

Pedro August 2, 2014 at 2:37 pm

While it is true that piracy is a problem, it is hard to place the blame on the store that sells the eBook. If you buy a music CD at the store, rip the music to mp3 and upload it online, the store is not responsible for losses to the artist. Same goes for eBooks. The online store comes up with a way to sell the books (DRM, non-DRM, or any other strange format), and the author or publishing company agrees to sell their books in that store. If the author or company does not feel it’s safe, then don’t sell your books there. If enough people feel the same way, then the stores will realize that something needs to change and will adjust accordingly.

Felipe Adan Lerma August 2, 2014 at 4:09 pm

Pedro, those are good points.

And this link ( http://blog.smashwords.com/2014/05/update-on-scribds-efforts-to-protect.html ) isn’t meant to sway folk, but just provide info that Scribd has taken measures to curb abuses.

Some people have said they still have problems, though I’m not sure if they’re part of the books from the various distributors to Scribd (Smashwords, D2D, and BookBaby).

For myself and a few authors I’ve spoken with who go through one of the 3 above, they (and I) seem to be doing ok.

Best wishes.

Bill H July 4, 2017 at 5:55 pm

The author is wrong. There are a large number of pirated works uploaded to Scribd that have no DRM protection at all. I found pirated material there last time I looked. I did not download it. The site has a serious problem.

Nate Hoffelder July 4, 2017 at 6:53 pm

All file-sharing services have a piracy problem, but this post was not focused on that part of Scribd’s operation.

Susan Davis July 15, 2022 at 3:22 pm

Trust me, there is no ebook store that is uncrackable. Kindle, Scribd, Google, Apple, there is a hack easily available online to grab these books from any of these popular sources, remove the DRM and boom, you have the book.

The "truth" is that those who are going to pirate are going to pirate and you cannot stop them. Those who wish to pay for books or use apps will use them.

You have to equate it to the paper book. There is nothing to stop someone from removing the spine on a paper book and then scanning it and reprinting and selling it. Or just making photocopies of it and reselling it.

You can only do what you can do to prevent piracy but just as in the paper book world, you cannot stop those who wish to pirate books.
