Machine-Learning Algorithm Can Rank the World’s Most Notable Authors, But Can it Identify the Most Worthwhile?

If it's possible to judge an author's notability based on their Wikipedia entry, then Dr Allen Riddell of Dartmouth College has you covered. Earlier this month Riddell published a paper which laid out his algorithm for generating an independent ranking of notable authors for a given year. He developed it with the goal of helping Project Gutenberg and other digitization projects focus on digitizing the public domain works of the most notable authors. According to MIT Technology Review:

Riddell’s approach is to look at what kind of public domain content the world has focused on in the past and then use this as a guide to find content that people are likely to focus on in the future. For this he uses a machine-learning algorithm to mine two databases. The first is a list of over a million online books in the public domain maintained by the University of Pennsylvania. The second is Wikipedia.

Riddell begins with the Wikipedia entries of all authors in the English language edition—more than a million of them. His algorithm extracts information such as the article length, article age, estimated views per day, time elapsed since last revision, and so on.

The algorithm then takes the list of all authors on the online book database and looks for a correlation between the biographical details on Wikipedia and the existence of a digital edition in the public domain.
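For the curious, here is a minimal sketch in Python of the idea described above: learn which Wikipedia page statistics go along with an author already having a digital edition, then score authors who don't have one yet. This is not Riddell's code. The quoted description doesn't name the model, so plain logistic regression is assumed here, and the feature values, candidate names, and function names are all invented for illustration.

```python
import math

# Hypothetical training data, invented for illustration.
# Features per author: article length (chars), article age (days),
# estimated views per day, days since last revision.
# Label: 1 if a public-domain digital edition exists, else 0.
TRAINING = [
    ([60000, 5500, 900, 2], 1),
    ([45000, 5000, 400, 5], 1),
    ([38000, 4800, 350, 9], 1),
    ([52000, 5200, 600, 4], 1),
    ([3000, 900, 8, 400], 0),
    ([5000, 1500, 15, 250], 0),
    ([2500, 700, 5, 600], 0),
    ([7000, 2000, 30, 150], 0),
]

# Hypothetical authors with no digital edition yet.
CANDIDATES = {
    "candidate_low": [2000, 600, 4, 700],
    "candidate_high": [70000, 5600, 1000, 1],
    "candidate_mid": [20000, 3500, 100, 40],
}


def log_scale(raw):
    """Compress heavy-tailed counts (lengths, views) with log1p."""
    return [math.log1p(v) for v in raw]


def column_stats(rows):
    """Return per-column means and standard deviations."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def standardize(row, means, stds):
    """Shift and scale one row using the training-set statistics."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Estimated probability that an author like x has a digital edition."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def fit_logistic(X, y, lr=0.1, epochs=1000, l2=0.1):
    """L2-regularized logistic regression fitted by batch gradient descent."""
    weights, bias, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for xi, yi in zip(X, y):
            err = predict(weights, bias, xi) - yi
            grad_b += err
            for j, v in enumerate(xi):
                grad_w[j] += err * v
        weights = [w - lr * (g / n + l2 * w) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def rank_candidates(training, candidates):
    """Fit on the training authors, then sort candidates by predicted score."""
    rows = [log_scale(features) for features, _ in training]
    labels = [label for _, label in training]
    means, stds = column_stats(rows)
    X = [standardize(r, means, stds) for r in rows]
    weights, bias = fit_logistic(X, labels)
    scored = [(name, predict(weights, bias,
                             standardize(log_scale(f), means, stds)))
              for name, f in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_candidates(TRAINING, CANDIDATES)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

The candidates at the top of the printed list are the ones whose Wikipedia pages most resemble those of authors who already have digital editions. A real system would need far more data and proper validation before anyone trusted the scores.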

The article goes on to say that the algorithm can also rank authors by specific categories of interest, and not just a broad ranking across the calendar year in which an author died. For example, the top-ranked female American writer is Terri Windling, the top-ranked Dutch poet, Harry Mulisch, and the top-ranked President of France is Charles de Gaulle.

You can find Riddell's website here, and his paper here (PDF).


This is a good idea, but even though Riddell says his ranking system compares well with existing rankings compiled by human experts, I still want to see a human hand in this decision.

Sometimes notability isn't the best way to judge an author's value. I was reminded of that point by one of the stories in this morning's link post. The Boston Globe profiled a small publisher who had, over the course of his career, published two Nobel Prize winners:

Boston publisher David Godine likes to say he specializes in books nobody buys, and that includes the works of French writer Patrick Modiano, whose novels about memory and war earned him the 2014 Nobel Prize for Literature.

Godine found Modiano by "asking European publishers to recommend their best writers — not their best-selling writers". Modiano was relatively unknown in English before he won the Nobel Prize, and even though he has a sizable Wikipedia entry he still stands as a reminder that the obscure can be worth more than the notable.

An author who died in obscurity 50 years ago might be known only to scholars and lack a lengthy Wikipedia entry, yet still have written Nobel-worthy work. You might not know that without asking an expert, which is why I think the human touch is still required.

What do you think?

images by eurleif, johntrainor

About Nate Hoffelder (11477 Articles)
Nate Hoffelder is the founder and editor of The Digital Reader: "I've been into reading ebooks since forever, but I only got my first ereader in July 2007. Everything quickly spiraled out of control from there. Before I started this blog in January 2010 I covered ebooks, ebook readers, and digital publishing for about 2 years as a part of MobileRead Forums. It's a great community, and being a member is a joy. But I thought I could make something out of how I covered the news for MobileRead, so I started this blog."

6 Comments on Machine-Learning Algorithm Can Rank the World’s Most Notable Authors, But Can it Identify the Most Worthwhile?

  1. Hmm…noble idea, doomed to failure. Sorry, but I just can’t help but think all the bugs haven’t been worked out yet, or perhaps bookworms.

  2. I think you are confused by a very common misconception. The point of a project like this isn’t to replace human experts. The point is to augment them. At their best, machine learning efforts add to our knowledge and free up humans to do what humans enjoy.

    I haven’t looked at this paper, but I would use it as a springboard to understanding how the collective judgment of the crowd differs from the experts. That is interesting to me.

    • I think you are confused by the assumption that I thought this would only be used one way. I didn’t write that.

      • I am confused. What “human hand” do you want to see in what “decision”? Nothing in your article suggests that Riddell wants to remove “human hands” from any decision. Nor is there any reason to believe that Riddell thinks that notability is the only way to judge an author’s value. You introduced those ideas into the discussion. I think it is completely reasonable to interpret your statement as suggesting that you think “human hands” are being removed from some decision by the existence of this algorithm.

        I think I have pretty good reading comprehension. Please explain to me what I am missing. Your article looks like this to me:

        Description of algorithm to identify notable work in the public domain to help guide digitization decisions
        Claim that human judgment still needed
        Example of work not in the public domain which conflates “best selling” with “notable” and publishers with experts.
        Assertion that Wikipedia could overlook important author, therefore there is the need to keep experts involved in [digitization decisions].

        I admit that the bracketed part is an assumption on my part, but what else could you possibly be talking about? Look at the title of your article. If you weren’t writing about this algorithm replacing human judgment, what were you writing about?

        I didn’t assume you thought this algorithm would only be used one way. I assumed that you thought that one of the ways it would be used would be to remove human judgment. I commented that the purpose of the experiment wasn’t to remove human judgment, but to augment it. As is true for all tools, it is always possible for people to misuse machine learning algorithms. Why jump to the conclusion that this tool will be misused?

        • I just wanted to write about the notability algorithm and contribute something new to the conversation: that it can be defeated by obscurity. And since I had that other post which profiled a publisher who specialized in the obscure, I thought there was a connection. Okay, it was tenuous, but I thought it was there.

    • It’s alright, the computer looked at it for you. You were freed up for more important things, remember?
