Scribd has been working long and hard to overcome its infamous reputation as a haven for ebook pirates. It has developed an automated filter system called BookID, and last night Smashwords detailed how BookID works for indie authors.
In a nutshell, here's how it works: BookID automatically scans all Smashwords-delivered books and analyzes the text for semantic data such as word count, letter frequency, phrases, and other elements. BookID then creates a digital fingerprint of the authorized Smashwords book. Scribd uses that fingerprint in two ways: to remove any existing files on the site that match it, and to block future uploads of unauthorized versions.
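To make the fingerprinting idea concrete, here is a minimal sketch in Python. It is not Scribd's actual algorithm, which has not been published in detail; the function names (`fingerprint`, `similarity`, `is_probable_copy`), the five-word phrase length, and the 0.8 threshold are all assumptions chosen for illustration. It collects the three signals mentioned above (word count, letter frequency, and phrases) and compares two texts by how many phrases they share.

```python
import re
from collections import Counter


def fingerprint(text, phrase_len=5):
    """Build a toy fingerprint from word count, letter frequency, and phrases.

    phrase_len is a hypothetical parameter: the number of consecutive
    words that make up one "phrase".
    """
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)

    # Relative frequency of each ASCII letter in the text.
    letters = Counter(ch for ch in lowered if "a" <= ch <= "z")
    total = sum(letters.values()) or 1

    # Every run of phrase_len consecutive words, stored as a set.
    phrases = {
        " ".join(words[i:i + phrase_len])
        for i in range(len(words) - phrase_len + 1)
    }

    return {
        "word_count": len(words),
        "letter_freq": {ch: n / total for ch, n in letters.items()},
        "phrases": phrases,
    }


def similarity(a, b):
    """Jaccard overlap of the two phrase sets, from 0.0 to 1.0."""
    union = a["phrases"] | b["phrases"]
    if not union:
        return 0.0
    return len(a["phrases"] & b["phrases"]) / len(union)


def is_probable_copy(authorized, upload, threshold=0.8):
    """Flag an upload whose phrases mostly match the authorized book.

    The 0.8 threshold is an arbitrary value picked for this sketch.
    """
    return similarity(authorized, upload) >= threshold


if __name__ == "__main__":
    original = ("It was the best of times, it was the worst of times, "
                "it was the age of wisdom")
    reupload = "IT WAS THE BEST OF TIMES,  it was the worst of times, it was the age of wisdom"
    unrelated = ("Call me Ishmael. Some years ago, never mind how long "
                 "precisely, having little money")

    book = fingerprint(original)
    print(is_probable_copy(book, fingerprint(reupload)))
    print(is_probable_copy(book, fingerprint(unrelated)))
```

A matcher like this only knows that two texts are alike, not who has the right to upload them, so an author's own upload of the same book would match as well.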
According to Smashwords, Scribd recently rolled out a major update to BookID. The service has removed nearly 48,000 copies of ebooks by Smashwords authors since Scribd first added Smashwords titles to its ebook subscription service last year. Those copies covered around 14,000 different titles, some of them uploaded by multiple users.
While that sounds like good work in preventing piracy, it's not clear just how many of those copies were actually pirated. There have been numerous reports that Scribd has taken down public domain works and works uploaded by their own creators.
But on the plus side, there is at least one report that Scribd is also prompt in correcting false positives, which is more than you can say for the similar ContentID system at YouTube. There's something to be said for staying small and niche.