Turnitin has long used Wikipedia as a data source when scouring students’ papers for plagiarism, and now it is returning the favor. eSchoolNews reports that Turnitin and Wikipedia have collaborated on a new bot designed to identify text in Wikipedia articles that has been copied from elsewhere.
Launched in April 2015, EranBot draws on Turnitin’s archive of millions of papers, including academic publications and journals, and scans each new edit or addition to Wikipedia for hints of copied text. Should it find a questionable edit, the bot flags the edit so it can be double-checked by a human editor.
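Neither Turnitin nor Wikipedia has published the matching method in the reports cited here, so the following is only a rough sketch of how a scan-and-flag check of this kind could work, not EranBot's actual code. It assumes a simple word-shingle overlap test; the function names, the five-word window, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
# Illustrative only: a naive overlap check, NOT Turnitin's or EranBot's algorithm.
# All names, the shingle size, and the threshold are assumptions.

def shingles(text, n=5):
    """Return the set of n-word sequences (shingles) in text, lowercased."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(edit_text, source_text, n=5):
    """Fraction of the edit's shingles that also appear in the source."""
    edit_shingles = shingles(edit_text, n)
    if not edit_shingles:
        # Edits shorter than n words produce no shingles to compare.
        return 0.0
    shared = edit_shingles & shingles(source_text, n)
    return len(shared) / len(edit_shingles)


def flag_for_review(edit_text, sources, threshold=0.5, n=5):
    """Return names of sources whose overlap with the edit meets the threshold."""
    return [
        name
        for name, text in sources.items()
        if overlap_ratio(edit_text, text, n) >= threshold
    ]
```

In this sketch, an edit is never rejected automatically. The function only returns the sources that look suspiciously similar, which mirrors the hand-off to a human editor described above.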
“As an openly licensed free encyclopedia, Wikipedia respects copyright the same way traditional publishers do,” said Jake Orlowitz, head of the Wikipedia Library, the program dedicated to helping editors access reliable sources to improve Wikipedia. “Turnitin now gives us access to a more sophisticated system for flagging potential copyright violations.”
And they mean it. EranBot is only the latest tool Wikipedia editors use to check for plagiarism and copyright infringement. Wikipedia also has its own duplication detector, as well as more focused detection tools that check a single article against the web, or give the gimlet eye to the contributions of a single editor who is suspected of infringement.
Wikipedia already had an autonomous bot like EranBot; it was called CorenSearchBot (and later, MadmanBot). According to the Wikipedia entry, it is less sophisticated than EranBot.
Turnitin boasts that its bot is more capable, and that it has the unique ability to learn (so it should become more accurate over time). EranBot reportedly checks thousands of new edits every day, and around 100 of them are flagged for Wikipedia editors to review.
In addition to scanning Wikipedia, Turnitin is also working with the Wiki Education Foundation to check edits made by students participating in Wiki Ed’s Classroom Program. Unlike EranBot, which was built to detect infringement, this project tries to teach students the difference between citation and quotation, and how to appropriately paraphrase and use source material.
image via Wikimedia Commons