Automatic Aggregators Can Regurgitate Old News

News aggregators like Zite are very useful. I use Zite from time to time myself when I’m in need of stories to blog. But there’s a danger in this app that’s not always obvious.

Today I ran across a story entitled “Amazon Pulls Thousands of E-Books in Dispute” from a blog called “ebook-reader-vergleich.org”. (“Vergleich” is German for “comparison”.) The story described Amazon pulling thousands of independently published books distributed by the Independent Publishers Group over a contract dispute. I followed that dispute when it happened, so I know it actually took place months ago and was resolved a couple of months later. But here is Zite, presenting it as new news.

The reason, of course, is that this ebook-reader-vergleich blog is simply a plagiarist content farm, scooping up articles and using them as search-engine optimization link fodder. (Which is why I’m not linking directly to the site here.) In this case, it snagged a six-month-old Associated Press article but is presenting it as new: the dateline at the bottom specifically says August 21st, 2012. And whatever algorithms Zite uses for assigning relevance and importance to articles latched onto this one.

If I hadn’t known this story was old, I might have been tempted to believe it was new and blog it accordingly. In fact, I have blogged old stories from Zite as new in the past, not knowing they were old. That wasn’t so much because they came from content farms as because their original sources had posted them a year, two years, or more earlier with no year in the dateline (for example, a post dated “August 21” instead of “August 21, 2011”), so Zite assumed they were recent.
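To make the missing-year problem concrete, here is a minimal sketch in Python. It is only an illustration of how such a mistake could happen, not Zite’s actual logic, which I have no way of seeing. The function name `parse_dateline` and the “assume the current year” fallback are my own assumptions.

```python
from datetime import date, datetime


def parse_dateline(dateline: str, today: date) -> tuple[date, bool]:
    """Parse a dateline such as "August 21, 2011" or "August 21".

    Returns (parsed_date, year_was_assumed). This is a hypothetical
    helper for illustration; month names are matched in the current
    locale, assumed here to be English.
    """
    try:
        # A full dateline with an explicit year parses unambiguously.
        return datetime.strptime(dateline, "%B %d, %Y").date(), False
    except ValueError:
        pass
    # No year given: a naive parser fills in the current year,
    # which is exactly how an old post can end up looking new.
    assumed = datetime.strptime(f"{dateline}, {today.year}", "%B %d, %Y").date()
    return assumed, True


today = date(2012, 8, 22)
for text in ("August 21, 2011", "August 21"):
    parsed, assumed = parse_dateline(text, today)
    age_days = (today - parsed).days
    note = " (year assumed, treat with suspicion)" if assumed else ""
    print(f"{text!r}: {age_days} days old{note}")
```

With the explicit year, the post is correctly seen as about a year old. Without it, the same post looks like it went up yesterday. Returning the `year_was_assumed` flag at least gives an aggregator the chance to rank such a story lower or check another signal before calling it fresh.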

A human editor might have caught these glitches, or noticed that the “vergleich” article’s source had very little credibility. But Zite’s algorithm doesn’t have that kind of judgment. So it’s worth remembering that when we place our trust in completely automatic news aggregators, we open the door to false positives: stale or low-quality stories surfaced as if they were fresh, relevant news. We need to bear this in mind all the more as more of these aggregators pop up and become the go-to sources for online news reading.

About Chris Meadows (90 Articles)
Chris Meadows, Editor of TeleRead, has been writing about e-books and mobile devices since 1999: first for ThemeStream, later for Jeff Kirvin's Writing on Your Palm, and then for TeleRead starting in 2006. He has also contributed a few articles to The Digital Reader along the way. Chris has bought e-books from Peanut Press/eReader, Fictionwise, Baen, Barnes & Noble, Amazon, the Humble Bundle, and others. He is a strong believer in using Calibre to keep his library organized.

3 Comments on Automatic Aggregators Can Regurgitate Old News

  1. that is terrible!! i will remember that from now on. google reader does that to me too.

  2. Ah. Algorithms and scraping. Old news is one thing. Finding your Flipboard filled with chatbox ramblings instead of actual articles is another.

    https://twitter.com/caesuras/status/237583464600309760
