Scribd, Piracy, and Why You Can’t Always Believe What You Read Online

When Scribd launched their ebook subscription service last fall, many in the book community faulted Scribd for their history of piracy issues, and it seems that Scribd still hasn’t managed to escape that specter.

Rich Meyer, writing over at Indies Unlimited, is advising authors to pull their Smashwords titles from Scribd because users might pirate them. (The fact that Smashwords sells the same titles as easily piratable DRM-free ebooks seems to have escaped him.)

In a post that mixes equal parts factual errors, fear, and a misunderstanding of the tech involved, Rich writes:

I really had no problem with the service, as long as it was limited to the iPhones and Google Android stuff. I don’t have a smartphone myself, and don’t care to get one. People aren’t going to go out of their way to monkey with that sort of stuff, so they’d be using it and poof, book gone. But Scribd now has the ability to be used on a Kindle Fire, and that’s a whole different kettle of fish when it comes to illegal access and piracy.

The problem with Scribd’s view on piracy is that they are working on the WRONG bloody end of the stick. Pirates aren’t going to be UPLOADING books – they’re going to be DOWNLOADING CONTENT. I just can’t believe they’ve not figured this out: Any subscription service for e-books is basically a smorgasbord for piracy. There is absolutely nothing I can see that stops a person from just downloading books wantonly and copying them to another source (laptop, memory card, thumbdrive) and then cracking the pointless DRM on them and having a field day with them. Believe me, it’s very, very easy, and once you know how, you can set up your system to do it automatically – all you have to do is drop your book into a folder and *POOF* it’s as free as a bird. And if they didn’t bother to even open the books before they return them, the author gets ZILCH.

As Juli Monroe pointed out to me this morning, that bolded section is simply not true. I had linked to Rich’s post in the morning coffee post, and Juli called me on it because she didn’t think the section quoted above was accurate.

She had already looked for the ebooks she had downloaded from Scribd, and she couldn’t find them (not as ebooks, per se). She’s not interested in pirating the ebooks, obviously, but she was curious about the technical details and went looking. If she can’t find a recognizable ebook file, do you really think it will be easy for the average user to strip the DRM?

I don’t think so, and so far as I know there’s no easy DRM stripping tool for Scribd. But just to sate my curiosity I looked into the matter myself. I didn’t currently have a Scribd account, so I took advantage of the free trial. After I downloaded a few ebooks, I went looking and eventually found what I think is the correct folder.

Juli and I both think that Scribd stores their ebooks in a folder called documents_cache. We came to this conclusion independently, and if that is where Scribd puts the ebooks then I seriously doubt the average user will be able to strip the DRM.

The ebooks aren’t stored as ebooks; instead they are stored as collections of JSON, CSS, and image files. And while I can’t speak for the JSON files, the image files have DRM of some kind.
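For anyone who wants to check what a folder like this contains, here is a minimal Python sketch that counts the files in a directory by extension. The function name `tally_file_types` is made up for illustration, and the folder's location is an assumption you would have to fill in yourself, since it varies by device. The sketch only lists file types; it does not read or convert anything.

```python
from collections import Counter
from pathlib import Path


def tally_file_types(folder):
    """Count the files under `folder` (including subfolders) by extension."""
    counts = Counter()
    for path in Path(folder).rglob("*"):
        if path.is_file():
            # Extensions are lowercased so ".JSON" and ".json" count together;
            # files with no extension are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return counts
```

Calling `tally_file_types` on the cache folder and printing the result of `.most_common()` should show at a glance whether it holds mostly JSON, CSS, and image files rather than a single ebook file.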

I wouldn’t know where to start to convert this format, but I can guess. Scribd’s ebook format may or may not have something to do with the HPub standard. That is one of the lesser-known ebook file formats, and it can be used for sending rich-format ebooks over the web. The spec mentions JSON files, but it also mentions certain requirements not met by the Scribd pseudo-ebook format.

I won’t go into the full details here; if you’re interested you can check for yourself. But the short version is that I wouldn’t worry too much about someone stripping the DRM from a Scribd ebook; it’s going to take a real hacker to pull it off, not your average user.

9 thoughts on “Scribd, Piracy, and Why You Can’t Always Believe What You Read Online”

  1. Or you could just have a simple script that scrolls through the book, taking screenshots along the way, and OCR’ing them into a simple text file.
    Then convert it to epub or something else using calibre or other conversion tools, which will detect chapters by looking for “Chapter One”, or “Chapter 2”, or similar, and you can just clean it up.
    The automated part should take no more than a few minutes for a full sized book, and probably just as much for the human intervention. (pulling numbers out of thin air, no idea if it takes that little or more, but I do have enough knowledge to create said script, and it’s embarrassingly simple)

    If the user can see it, then it can be copied.

    Anyway, if they’re desperate enough to keep on using DRM, potentially ruining legitimate user experience with little impact on piracy, maybe they should give book curses a chance.
    http://en.wikipedia.org/wiki/Book_curse

  2. @Valentine, true. Again, that’s beyond the ability of the average user. And the point of the article I objected to was that there was something wrong with Scribd, that they supposedly didn’t realize how easy it was to pirate their content by just downloading it and stripping the DRM. Your method goes way beyond a simple download and strip.

    While I’m anti-DRM in general, I don’t have a problem with it in a subscription service where I never expected to “own” the book. I know I’m paying to borrow it, and I’m okay with that and with DRM under those circumstances.

    Oh, and thanks Nate, for adding some technical details about file formats. I’m editing my article to link back to this one, for those who are curious about those.

  3. I stopped reading the original article when I got to the part about the Kindle Fire being a “whole different kettle of fish when it comes to illegal access and piracy” compared to Android. That’s just a gulf of ignorance greater than I can communicate across. If he’s really worried about piracy he should be worried about the web reader. I just tested it out and it dumps the whole book via http in a barely obfuscated format that is trivial to reverse engineer. We’re talking about the level of cryptography where all the letters from p to z are encoded as the same letter: p = p, etc. But hey, c is coded as e, e is coded as i, and i is coded as c, so that’s totally secure.

    1. “it dumps the whole book via http in a barely obfuscated format that is trivial to reverse engineer”

      In that case someone could probably slap together a browser plugin which would do the job without too much effort.

  4. Glad I came across this post. I subscribe to Digital Reader and guess I’d missed this.

    I’d commented on the Indies post when it first came out, with what info I had to counter some of the points in the post, but am glad for this more detailed look.

    There are legitimate concerns, on many levels, but I’ve placed all my books (over 50) with Scribd, and like how it’s working so far. A few glitches, to be expected, but not bad.

    I think the greater issue on many people’s minds is how this will affect their book sales period, including where they sell now. I don’t know. But I did as much due diligence as I could, and, after initially opting out, agreed to put in all my work.

    Best wishes then, for all of us, wherever we share and sell our work.

  5. I find it utterly bizarre that authors and publishers are willingly jumping into business with a site that has such a history of hosting pirated content. Personally, I’ve opted out of Scribd on all my books. I don’t care if it costs me sales. I just don’t want the owners of Scribd to get one single penny from my books.

    If the only contact I’ve had with a website is sending them DMCA takedown notices, there’s no way in hell I want to do ‘legitimate’ business with them. Ever.
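To show why the letter swapping described in comment 3 offers so little protection, here is a toy Python sketch of a fixed substitution and its inverse. The mapping is built only from the three letters that commenter mentions (c, e, and i), with every other character left unchanged. It is an assumption-laden illustration of the general idea, not a description of Scribd's actual format, and the names `ENCODE`, `DECODE`, and `substitute` are made up for the example.

```python
# Toy mapping taken from the comment: c -> e, e -> i, i -> c.
# Every other character, including uppercase letters, is left as-is.
ENCODE = {"c": "e", "e": "i", "i": "c"}

# A fixed substitution is undone by simply swapping keys and values.
DECODE = {coded: plain for plain, coded in ENCODE.items()}


def substitute(text, table):
    """Replace each character found in `table`; leave the rest untouched."""
    return "".join(table.get(ch, ch) for ch in text)


original = "science fiction"
scrambled = substitute(original, ENCODE)
restored = substitute(scrambled, DECODE)
```

Because each letter always maps to the same replacement, anyone who works out the table once can reverse every page with it, which is the commenter's point about this not being real cryptography.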
