Towards a Google Scholar API

A while back I begged Google to come up with an API (Application Programming Interface) for Google Scholar. With the many possibilities of the Google Maps API, Google practically set the standard for APIs. For the library world they lived up to that promise when they launched the Google Books API back in 2008. For Google Scholar, however, Google has never delivered an API.

Nearly a Google Scholar API

A less well documented feature of Google Scholar is that you can look up information for a specific article using the DOI of that article.

(Screenshot: Google Scholar DOI lookup)

This search query in Google Scholar with the full DOI returns exactly one result. A title search for the same article would have returned 23 results, with the correct article at the top. That’s true. But the title search took Google Scholar about 0.04 seconds, whereas the DOI lookup took only 0.01 seconds. For many other title queries the difference is even larger, with title searches on the order of a hundred times slower.
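In practice a DOI lookup is just an ordinary Scholar search with the bare DOI as the query. A minimal sketch that only builds such a link, assuming Scholar’s public search page at `https://scholar.google.com/scholar` and its `q` parameter. The prefix stripping is my own addition, and the code deliberately sends no requests: there is no Google Scholar API, and Scholar does not welcome automated querying.

```python
from urllib.parse import urlencode

# Assumption: Scholar's public search page; this is not an official API.
SCHOLAR_SEARCH = "https://scholar.google.com/scholar"

# Resolver prefixes people often paste along with a DOI (lowercase for matching).
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def doi_lookup_url(doi: str) -> str:
    """Build a Scholar search link that queries on the bare DOI only."""
    doi = doi.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return f"{SCHOLAR_SEARCH}?{urlencode({'q': doi})}"


# 10.1000/182 is the DOI of the DOI Handbook, used here only as an example.
print(doi_lookup_url("https://doi.org/10.1000/182"))
```

Stripping resolver prefixes keeps the query down to the DOI string itself, which is what produces the single precise hit.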

By playing around with the DOI in Google Scholar you can retrieve some more interesting results. The citing articles can be retrieved on the basis of the DOI (PLoS implements this on its article metrics pages). The versions, or document cluster, of an article (useful for identifying OA versions) can also be retrieved with a direct query on the DOI. Unfortunately I don’t see how to get to the related articles using the DOI (any suggestions are welcome in the comments); for those you need an internal Google Scholar article number. My conclusion from these examples is that Google Scholar already has a mechanism in place that could form the backbone of an API. In most cases, looking up an article in Google Scholar by its DOI is very fast and precise. I have come across only a few instances where Google Scholar got it wrong: mostly editorial material or corrections, and in some cases a version in an Open Access repository interfered with this mechanism.

It works partially with ISBNs

As long as books and reports have an ISBN assigned to them, it is also possible to retrieve exactly one result based on the ISBN, e.g. ISBN 9022010007. Retrieving the citations or the document cluster directly on the basis of the ISBN is not possible, however. Judging from the ISBN query results, Google seems to be close to some useful functionality in this area as well.
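Because an ISBN lookup only works if the number is typed correctly, it can help to validate it first. A small sketch, assuming the same Scholar search URL and `q` parameter as for DOIs; it handles ISBN-10 only and checks the standard check digit (weights 10 down to 1, sum divisible by 11). The example ISBN above passes that check.

```python
from urllib.parse import urlencode

# Assumption: Scholar's public search page; not an official API.
SCHOLAR_SEARCH = "https://scholar.google.com/scholar"
DIGITS = "0123456789"


def normalize_isbn10(isbn: str) -> str:
    """Remove hyphens/spaces and verify the ISBN-10 check digit."""
    chars = isbn.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10 or any(c not in DIGITS for c in chars[:9]):
        raise ValueError(f"not an ISBN-10: {isbn!r}")
    if chars[9] not in DIGITS + "X":
        raise ValueError(f"invalid check character: {isbn!r}")
    values = [int(c) for c in chars[:9]]
    values.append(10 if chars[9] == "X" else int(chars[9]))
    # Weighted sum with weights 10, 9, ..., 1 must be divisible by 11.
    if sum(v * w for v, w in zip(values, range(10, 0, -1))) % 11 != 0:
        raise ValueError(f"check digit does not match: {isbn!r}")
    return chars


def isbn_lookup_url(isbn: str) -> str:
    """Build a Scholar search link for a validated ISBN-10."""
    return f"{SCHOLAR_SEARCH}?{urlencode({'q': normalize_isbn10(isbn)})}"


print(isbn_lookup_url("90-220-1000-7"))
```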

A monetizing model for Google Scholar

I can imagine that publishers or repositories would really like to make use of functionality like this from Google Scholar. For publishers and repositories it would be valuable to show the citation data, to link to related items, or to look up Open Access versions in the document cluster around an article. The announced integration of Google Scholar and Web of Science data for Web of Science customers is a sign that Google is willing to share data. There is likely some money involved in this deal as well. I wonder if Google is willing to strike similar deals with other publishers. The PLoS journals are a good example: they are already very close to using this information from Google Scholar. They just don’t dare to screen-scrape the information they really want, and need, so currently they only link out. Altmetrics providers such as Plum Analytics and Altmetric are other potential partners that might want to integrate this kind of information from Google Scholar into their metrics dashboards; in the end, their customers would pay for this data integration.

Why am I suggesting a monetizing model for Google Scholar? Currently Google Scholar still seems to be the (important) pet project of Anurag Acharya. It is not included among the major services in the Google spine, and it looks like Google is not earning any money from it. So Google Scholar runs the risk of being taken down any week now, following the examples of Google Reader and iGoogle, for which no monetizing models were available either. If there were a fair earning model for Google Scholar, I hope it would increase its sustainability and give us exciting new data sources to do research with.

How Google could help the Open Access world a little

It was back in 2008 when Google Scholar launched the feature that identified freely available versions of articles on the Web. In the early days these were indicated by green triangles in front of the reference. Nowadays freely available copies are listed in the right-hand column. Many of these are Open Access versions of articles properly deposited in preprint servers and subject or institutional repositories. Other free versions identified by Google Scholar are publisher versions of articles posted to personal websites, Dropbox folders, you name it. Whatever the rights are, if you need a copy of one of these papers and don’t have access through your university library’s subscriptions, this Google Scholar feature is a very useful tool. In scholarly search classes I always stress this feature of Google Scholar to my students.

In our institution’s bibliography I would love to include, for each article, a link to its so-called document cluster in Google Scholar. Consider the following publication: the full-text link included in the record leads you to ScienceDirect. Whether you can access the paper there depends on your subscriptions, and sometimes you can’t. It would therefore be nice if we could include a link to the document cluster in Google Scholar. For this paper Google Scholar lists some 29 versions, and, more importantly, 6 of these are free versions posted on various websites. That’s really helpful.

Yesterday I learned from Johannes Keizer that AgrisWeb links to Google through a search on the title words. This works quite well, but it could be done better.
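How AgrisWeb actually builds its link is not described here, so the following is only a guess at the general approach, assuming the target is Scholar’s search page: put the title words in the `q` parameter, optionally as a quoted phrase. Unlike a DOI lookup, such a query can return many hits, which is exactly where it falls short.

```python
from urllib.parse import urlencode

# Assumption: Scholar's public search page; not an official API.
SCHOLAR_SEARCH = "https://scholar.google.com/scholar"


def title_search_url(title: str, phrase: bool = False) -> str:
    """Build a Scholar search link from an article title's words."""
    # Drop embedded double quotes and collapse runs of whitespace.
    words = " ".join(title.replace('"', " ").split())
    query = f'"{words}"' if phrase else words
    return f"{SCHOLAR_SEARCH}?{urlencode({'q': query})}"


print(title_search_url("Open  access and Google"))
print(title_search_url("Open access and Google", phrase=True))
```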

Suppose Google Scholar had an API that we could query with a DOI, a PMID, an ISSN in combination with volume, issue and pages, or any other combination of standard bibliographic metadata (yes, something like OpenURL), and that returned only the correct Google Scholar ID for that article (the number 12564475196117890153 in the link). With that ID we could construct various links. Linking to the Google Scholar document cluster is one; retrieving the Google Scholar citations is another.
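To make the wish concrete, here is a sketch of both halves. The lookup endpoint is entirely hypothetical (`scholar.example.org` is a placeholder, since no such API exists); only the OpenURL 1.0 (Z39.88-2004) key/value names are real. The second function turns a Scholar ID into links, assuming Scholar’s own result pages use the same numeric ID in their `cluster=` (all versions) and `cites=` (cited by) parameters, as they appear to.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

SCHOLAR_BASE = "https://scholar.google.com/scholar"
# HYPOTHETICAL endpoint: Google Scholar offers no such lookup API.
HYPOTHETICAL_LOOKUP = "https://scholar.example.org/lookup"


def openurl_lookup_url(doi: Optional[str] = None, issn: Optional[str] = None,
                       volume: Optional[str] = None, issue: Optional[str] = None,
                       spage: Optional[str] = None) -> str:
    """Encode citation metadata as OpenURL 1.0 key/value (KEV) pairs."""
    params = {"url_ver": "Z39.88-2004",
              "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal"}
    if doi:
        params["rft_id"] = f"info:doi/{doi}"
    for key, value in (("rft.issn", issn), ("rft.volume", volume),
                       ("rft.issue", issue), ("rft.spage", spage)):
        if value:
            params[key] = value
    return f"{HYPOTHETICAL_LOOKUP}?{urlencode(params)}"


def scholar_links(scholar_id: str) -> Dict[str, str]:
    """Turn a numeric Scholar ID into 'all versions' and 'cited by' links."""
    if not scholar_id.isdecimal():
        raise ValueError(f"expected a numeric Scholar ID, got {scholar_id!r}")
    return {
        "versions": f"{SCHOLAR_BASE}?{urlencode({'cluster': scholar_id})}",
        "cited_by": f"{SCHOLAR_BASE}?{urlencode({'cites': scholar_id})}",
    }


print(openurl_lookup_url(issn="1234-5678", volume="12", issue="3", spage="45"))
print(scholar_links("12564475196117890153"))
```

Keeping the request in OpenURL form means a library link resolver already holds everything needed to ask the question; only the answer, the Scholar ID, would have to come from Google.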

An often-heard argument is that Google doesn’t like metadata much. But the Google Books API works just fine with ISBNs, OCLC numbers and Library of Congress numbers. That API is talking metadata, and libraries are massive stores of metadata. So, Anurag Acharya, please. Pleas for a Google Scholar API abound, mostly for retrieving citations, but for the OA movement those document clusters are far more important! Perhaps you could launch a Google Scholar API as a present for Open Access Week, coming up in October?
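For comparison, this is what that metadata-friendly interface looks like: the Google Books API accepts `isbn:`, `oclc:` and `lccn:` keywords in the `q` parameter of its volumes endpoint. The sketch only builds the request URLs and sends nothing; the ISBN is the one mentioned earlier.

```python
from urllib.parse import urlencode

BOOKS_VOLUMES = "https://www.googleapis.com/books/v1/volumes"
# Identifier keywords accepted in Google Books API search queries.
IDENTIFIER_KEYWORDS = {"isbn", "oclc", "lccn"}


def books_lookup_url(scheme: str, identifier: str) -> str:
    """Build a Google Books API volumes query for a standard identifier."""
    scheme = scheme.lower()
    if scheme not in IDENTIFIER_KEYWORDS:
        raise ValueError(f"unsupported identifier type: {scheme!r}")
    query = f"{scheme}:{identifier.strip()}"
    return f"{BOOKS_VOLUMES}?{urlencode({'q': query})}"


print(books_lookup_url("isbn", "9022010007"))
```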

The changing face of Elsevier Science

Over the last couple of days I had the pleasure of attending the Elsevier Development Partners meeting. The exact products they are working on might be of interest to some people, but that’s up to Elsevier to announce. What really surprised me at this three-day meeting was Elsevier’s tone. It was all about open science. They clearly wanted to open up. There was a lot of talk about sharing information, making mash-ups possible, and application programming interfaces (APIs). Elsevier Science wants to move away from the double-barred information silo and become an open solution provider in the scholarly world. If Elsevier is thinking and acting in this direction, then change will become a major issue for the entire scientific publishing industry, and that is good news for libraries that want to remain a vital service in the future as well.

This change will take time; it won’t happen overnight. But just the other day Raphael Sidi announced on his blog the Elsevier Article API at ProgrammableWeb. So Elsevier is not only talking, they are acting on it as well.

Let other publishers follow this example!