Towards a Google Scholar API

A while back I begged Google to come up with an API (Application Programming Interface) for Google Scholar. With the many possibilities of the Google Maps API, they practically set the standard for APIs. For the library world they lived up to their promise when they launched the Google Books API back in 2008. But for Google Scholar, Google has never delivered an API.

Nearly a Google Scholar API

A less well documented feature of Google Scholar is that you can look up a specific article using the DOI of that article.

[Screenshot: Google Scholar DOI lookup]
This search query in Google Scholar with the full DOI returns exactly one result. If you had carried out a title search for this article, Google Scholar would have returned 23 results, admittedly with the correct article at the top. But the title search took Google Scholar about 0.04 seconds, whereas the DOI lookup took only 0.01 seconds. For many other title queries the difference is larger still: the title search can be on the order of a hundred times slower than the DOI lookup.
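Since Google Scholar offers no official API, the closest thing to a DOI lookup "call" is simply a search URL whose query is the full DOI. The sketch below only builds that URL; it does not fetch or parse any results. It assumes the public search page lives at `https://scholar.google.com/scholar` and takes the query in the `q` parameter; the DOI pattern is a simplified check, and the helper names and the DOI `10.1234/example.5678` are made up for illustration.

```python
import re
from urllib.parse import urlencode

# Assumption: Google Scholar's search page reads the query from the "q" parameter.
SCHOLAR_SEARCH = "https://scholar.google.com/scholar"

# Simplified DOI check: "10." + registrant code + "/" + non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Resolver prefixes people often paste along with the DOI itself.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Strip whitespace and a common resolver prefix, then validate the DOI."""
    doi = raw.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def scholar_doi_lookup_url(raw_doi: str) -> str:
    """Hypothetical helper: a Google Scholar search URL whose query is the full DOI."""
    return SCHOLAR_SEARCH + "?" + urlencode({"q": normalize_doi(raw_doi)})
```

For example, `scholar_doi_lookup_url("https://doi.org/10.1234/example.5678")` yields `https://scholar.google.com/scholar?q=10.1234%2Fexample.5678`; `urlencode` percent-encodes the slash in the DOI.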

By playing around with the DOI in Google Scholar you can retrieve some more interesting results. The citing articles can be retrieved on the basis of the DOI (this is implemented on the PLoS article metrics page). The versions, or document cluster, of an article (useful to identify OA versions of an article) are also a direct query on the basis of the DOI. Unfortunately, I don't see how you can get to the related articles using the DOI (any suggestions are welcome in the comments). To get the related articles you need an internal Google Scholar article number.

My conclusion from these examples is that Google Scholar already has a mechanism in place that can form the backbone of an API. In most cases, looking up an article in Google Scholar by its DOI resolves very quickly and precisely. Only in a few instances have I come across cases where Google Scholar was in error: most often for editorial material or corrections, and occasionally when a version in an Open Access repository interfered with this mechanism.
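Because the lookup is precise in most cases but not all, a client built on it would still want to check each answer before trusting it. A minimal sketch, assuming the results are already available as simple records (the `ScholarHit` type is hypothetical; Google Scholar provides no such structure): accept a lookup only when it returns exactly one hit carrying the queried DOI, compared case-insensitively because DOIs are case-insensitive.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScholarHit:
    """Hypothetical record for one search result."""
    title: str
    doi: Optional[str]


def is_precise_doi_match(doi: str, hits: List[ScholarHit]) -> bool:
    """True only if the lookup returned exactly one hit with the queried DOI.

    DOIs are case-insensitive, so both sides are casefolded before comparing.
    """
    if len(hits) != 1:
        return False
    hit_doi = hits[0].doi
    return hit_doi is not None and hit_doi.casefold() == doi.casefold()
```

A lookup that returns, say, a correction notice with a different DOI, or several competing versions, is then flagged for a manual check instead of being silently accepted.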

It works partially with ISBN

As long as books and reports have an ISBN assigned to them, it is also possible to retrieve exactly one result based on the ISBN, e.g. ISBN 9022010007. However, exploring the citations or the document cluster is not directly possible on the basis of this ISBN. Judging from the ISBN query results, Google seems to be close to some useful functionality in this area as well.
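Unlike a DOI, an ISBN comes in two forms: the 10-digit ISBN-10 (like 9022010007 above) and the 13-digit ISBN-13. Whether Google Scholar matches a book on both forms is not something I have established, so a client might want to validate the number and try both. The sketch below checks the standard check digits and derives the other form; the function names are hypothetical.

```python
import re
from typing import Optional, Tuple


def isbn10_check_digit(first9: str) -> str:
    """ISBN-10 check digit: weights 10..2, modulus 11, with 10 written as 'X'."""
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbn13_check_digit(first12: str) -> str:
    """ISBN-13 check digit: alternating weights 1 and 3, modulus 10."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def normalize_isbn(raw: str) -> Tuple[Optional[str], str]:
    """Validate an ISBN and return (ISBN-10 or None, ISBN-13).

    Only 978-prefixed ISBN-13s have an ISBN-10 equivalent; for 979 the
    first element is None.
    """
    s = raw.replace("-", "").replace(" ", "").upper()
    if re.fullmatch(r"[0-9]{9}[0-9X]", s):
        if isbn10_check_digit(s[:9]) != s[9]:
            raise ValueError(f"bad ISBN-10 check digit: {raw!r}")
        stem = "978" + s[:9]
        return s, stem + isbn13_check_digit(stem)
    if re.fullmatch(r"97[89][0-9]{10}", s):
        if isbn13_check_digit(s[:12]) != s[12]:
            raise ValueError(f"bad ISBN-13 check digit: {raw!r}")
        if s.startswith("978"):
            core = s[3:12]
            return core + isbn10_check_digit(core), s
        return None, s
    raise ValueError(f"not an ISBN: {raw!r}")
```

For the ISBN quoted above, `normalize_isbn("9022010007")` returns `("9022010007", "9789022010006")`, and the hyphenated `"978-90-220-1000-6"` normalizes to the same pair.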

A monetizing model for Google Scholar

I can imagine that publishers or repositories would really like to make use of functionalities like these from Google Scholar. For publishers and repositories it would be valuable to show the citation data, to link to related items, or to look up Open Access versions in the document clusters around an article. The announced integration of Google Scholar and Web of Science data for Web of Science customers is a sign that Google is willing to share data. There is likely to be some money involved in this deal as well. I wonder whether Google is willing to strike up similar deals with other publishers. PLoS journals are a good example: they are already very close to using this information from Google Scholar. They just don't dare to screen-scrape the information they really want, and need, so currently they only link out. Altmetrics data providers such as Plum Analytics and Altmetric are other potential partners that might be interested in integrating this kind of information from Google Scholar into their metrics dashboards; in the end, their customers pay for this data integration.

Why am I suggesting a monetizing model for Google Scholar? Currently Google Scholar still seems to be the (important) pet project of Anurag Acharya. It is not included among the major services offered in the Google spine, and it looks like Google is not earning any money with it. So Google Scholar is at risk of being taken down as soon as next week, following the examples of Google Reader and iGoogle, for which no monetizing models were available either. If there is a fair earning model for Google Scholar, I do hope it will increase the sustainability of Google Scholar, and that we will get exciting new data sources to do research with.