Machine-Learning Algorithm Can Rank the World’s Most Notable Authors, But Can It Identify the Most Worthwhile?
Earlier this month Riddell published a paper that laid out his algorithm for generating an independent ranking of notable authors for a given year. He developed it with the goal of helping Project Gutenberg and other digitization projects focus on the public domain works of the most notable authors.
According to MIT Technology Review:
Riddell’s approach is to look at what kind of public domain content the world has focused on in the past and then use this as a guide to find content that people are likely to focus on in the future. For this he uses a machine-learning algorithm to mine two databases. The first is a list of over a million online books in the public domain maintained by the University of Pennsylvania. The second is Wikipedia.
Riddell begins with the Wikipedia entries of all authors in the English language edition—more than a million of them. His algorithm extracts information such as the article length, article age, estimated views per day, time elapsed since last revision, and so on.
The algorithm then takes the list of all authors on the online book database and looks for a correlation between the biographical details on Wikipedia and the existence of a digital edition in the public domain.
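To make that description a little more concrete, here is a rough sketch of the general idea in Python. It is not Riddell's actual method: it assumes a plain logistic regression as a stand-in for whatever model the paper uses, the feature set (article length, article age, views per day) is only a subset of what the excerpt mentions, and the author names and numbers in AUTHORS are invented purely for illustration.

```python
import math

# Hypothetical toy data, invented for illustration only:
# (name, article_length_kb, article_age_years, views_per_day, has_digital_edition)
AUTHORS = [
    ("Author A", 120.0, 12.0, 900.0, 1),
    ("Author B", 95.0, 11.0, 400.0, 1),
    ("Author C", 80.0, 10.0, 650.0, 1),
    ("Author D", 15.0, 3.0, 20.0, 0),
    ("Author E", 10.0, 4.0, 35.0, 0),
    ("Author F", 22.0, 2.0, 12.0, 0),
]


def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression weights with batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank_authors(authors):
    """Return (name, score) pairs sorted from highest to lowest score.

    The score is the model's estimated probability that an author with
    these Wikipedia features has a digital edition.
    """
    names = [a[0] for a in authors]
    X = standardize([list(a[1:4]) for a in authors])
    y = [a[4] for a in authors]
    w, b = fit_logistic(X, y)
    scores = [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_authors(AUTHORS):
        print(f"{name}: {score:.3f}")
```

In a real setting the interesting scores would be the ones for authors who do not yet have a digital edition: a high score there suggests an author whose Wikipedia footprint resembles that of authors people have already bothered to digitize.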
The article goes on to say that the algorithm can also rank authors within specific categories of interest, rather than only producing a broad ranking of everyone who died in a given calendar year. For example, the top-ranked female American writer is Terri Windling, the top-ranked Dutch poet is Harry Mulisch, and the top-ranked President of France is Charles de Gaulle.
This is a good idea, but even though Riddell says his ranking system compares well with existing rankings compiled by human experts, I still want to see a human hand in this decision.
Sometimes notability isn’t the best way to judge an author’s value. I was reminded of that point by one of the stories in this morning’s link post. The Boston Globe profiled a small publisher who had, over the course of his career, published two Nobel prize winners:
Boston publisher David Godine likes to say he specializes in books nobody buys, and that includes the works of French writer Patrick Modiano, whose novels about memory and war earned him the 2014 Nobel Prize for Literature.
Godine found Modiano by "asking European publishers to recommend their best writers — not their best-selling writers." Modiano was relatively unknown in English before he won the Nobel Prize, and even though he has a sizable Wikipedia entry, he still stands as a reminder that the obscure can be worth more than the notable.
An author who died in obscurity 50 years ago might be known only to scholars and lack a lengthy Wikipedia entry, yet still have written Nobel-worthy work. You might not know that without asking an expert, which is why I think the human touch is still required.
What do you think?