The Biggest Plagiarism Scandal in the History of eBooks Slipped by Amazon Unnoticed
When news broke a little over two months ago that the Brazilian author Cristiane Serruya had copied parts of Courtney Milan’s book, The Duchess War, little did we know that this one example of plagiarism and copyright infringement would quickly snowball into what is now the biggest ebook news story of the year.
CopyPasteCris, as the scandal has been dubbed, now includes no fewer than 95 books by 43 authors as well as articles and other content from six websites (and two recipes). Numerous passages have been copied from those books and websites into one or more of Serruya’s published works.
Yes, ninety-five books. You can find a running tally of the affected works on Caffeinated Fae, and you can find commentary on Pajiba and SBTB. And if you can stomach it, go read the interview where Serruya pins the blame on her untrustworthy ghostwriters. I wouldn’t put much stock in it, however, for two reasons: first, the sheer number of examples of plagiarism, and second, a person who identified as one of Serruya’s ghostwriters left a comment on Courtney Milan’s blog disputing Serruya’s claims and adding one of her own, that she hadn’t been paid for her work:
I am a ghostwriter who worked with the person in question in 2017 and early 2018 on 2 books. I do not work on Fiverr but was contacted by her personally. I think I can provide insight to whether the ghostwriter was to blame, as she contends. Her work, when given to me, was a number of mishmashed scenes that needed “expanding”, as she said. I took for granted that these were her own words, and embellished as she requested, as this is how I work–I often help authors who are “too close” to their own book to get it in shape for publication. Now I can see that it’s very possible those were plagiarized scenes that she was hoping a ghostwriter would change enough to make unrecognizable. I did cut off ties with her after she gave me a sob story about her daughter being sick and told me she couldn’t pay me for work already done. I did not work on the above book, but just knowing the way she works, it seems much more possible to me that she cobbled scenes together via other people’s published works and gave them to a ghostwriter to smooth over…. than for a ghostwriter to be entirely responsible for this.
This story is bound to grow over time as more authors discover they too were plagiarized. A lawsuit has even been filed over CopyPasteCris, which comes as a surprise. Few authors have the money or the energy to go after Serruya, but Nora Roberts has both. She filed suit last week over six of her works that were copied by Serruya.
It is curious that Roberts is only suing over six books, because according to Claire Ryan, Serruya copied from nine of Roberts’s works.
Ryan is a fantasy author, but in her day job she is a senior web programmer. She started asking herself how she would build a system to detect the plagiarism.
The answer became the core of what eventually turned into the algorithm – a program that could find similar text between two ebooks, even if the text had been paraphrased or the names changed.
There were limitations. Too much paraphrasing meant it wouldn’t recognize similarity, and it would probably come back with complete nonsense sometimes. But it just might work.
So on that Wednesday night, I started to write some code. It was just a PHP script, nothing special, but I had a feeling that it would work pretty well. Then I found a copy of The Duchess War on Smashwords, and after a few tweets, one of my followers sent me a link to a copy of Royal Love.
I did the first run on those two books, and the results looked pretty good.
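To give a sense of what a program like this might look like, here is a minimal sketch in Python. To be clear, this is not Ryan's PHP script, whose internals aren't described here; it is just one simple way to approach the same problem, and every function name, the stopword list, and the thresholds are assumptions made up for illustration. It splits each book into sentences, throws away likely character names (capitalized words that don't start a sentence) and common filler words, and then scores each pair of sentences by how many of the remaining words they share (Jaccard similarity).

```python
import re

# Hypothetical, deliberately tiny stopword list; a real tool would need more.
STOPWORDS = {
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "at",
    "was", "were", "is", "had", "his", "her", "he", "she", "it", "with",
    "as", "that",
}


def split_sentences(text):
    """Split text after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Return the set of lowercase content words in a sentence.

    Capitalized words that are not sentence-initial are assumed to be
    names and dropped, so renaming a character does not hide a match.
    (A name that opens a sentence is still kept; this is only a sketch.)
    """
    tokens = re.findall(r"[A-Za-z']+", sentence)
    words = set()
    for i, tok in enumerate(tokens):
        if i > 0 and tok[0].isupper():
            continue  # likely a proper noun
        word = tok.lower()
        if word not in STOPWORDS:
            words.add(word)
    return words


def jaccard(a, b):
    """Shared words divided by total distinct words; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_similar(original, suspect, threshold=0.6, min_words=4):
    """Return (score, original_sentence, suspect_sentence) tuples,
    highest score first, for sentence pairs at or above the threshold.

    Sentences with fewer than min_words content words are skipped,
    because very short sentences match by accident too easily.
    """
    suspect_sents = [(s, content_words(s)) for s in split_sentences(suspect)]
    matches = []
    for o in split_sentences(original):
        o_words = content_words(o)
        if len(o_words) < min_words:
            continue
        for s, s_words in suspect_sents:
            score = jaccard(o_words, s_words)
            if score >= threshold:
                matches.append((score, o, s))
    return sorted(matches, reverse=True)
```

Calling find_similar(original_text, suspect_text) returns the suspicious sentence pairs, best match first. The limitations Ryan mentions show up here too: reword a sentence heavily enough and its score falls below the threshold, while a low threshold will flag pairs that merely share common words. That is why a human still has to double-check the results.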
Ryan went on to compare Serruya’s books against as many of the original books as she could get her hands on. Some were provided by the affected authors, while others were crowd-sourced from readers, and all were fed into Ryan’s algorithm.
While some of the plagiarism was spotted by readers and authors, much of the work to document the plagiarism was done by Ryan. She wrote the algorithm, she supplied the computer time to run it, and she double-checked the results.
Isn’t it funny how one programmer could find all this and Amazon did not?
I mean, Amazon sold all the ebooks in question, and it employs how many tens of thousands of software engineers? And yet it couldn’t find this massive example of plagiarism.
Yes, I know, Amazon sells millions of ebooks, but this kind of project doesn’t require infinite resources. It’s the kind of thing that startups like the late BookLamp can do on a shoestring, so there’s no reason that Amazon, with its $11 billion in profit last fiscal year, couldn’t have found this issue if it were so inclined.