Towards a Google Scholar API

A while back I begged Google to come up with an API (Application Programming Interface) for Google Scholar. With the many possibilities of the Google Maps API, they practically set the standard for APIs. For the library world they lived up to their promise when they launched the Google Books API back in 2008. But for Google Scholar, Google has never delivered an API.

Nearly a Google Scholar API

A less well documented feature of Google Scholar is that you can look up a specific article using its DOI.
(Screenshot: Google Scholar DOI lookup)
A Google Scholar search on the full DOI returns exactly one result. A title search for the same article would have returned 23 results, admittedly with the correct article at the top. But the title search took Google Scholar about 0.04 seconds, whereas the DOI lookup took only 0.01 seconds. For many other titles, the title search is in the order of a hundred times slower than the DOI lookup.
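Because there is no official API, a DOI lookup simply means sending the DOI as an ordinary Google Scholar query. A minimal sketch, assuming Scholar’s public search address https://scholar.google.com/scholar with the query in the q parameter; the helper name and the DOI in the tests are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Assumption: Scholar's ordinary search page, with the query in "q".
# There is no official API; this only builds the URL you would get by
# pasting the DOI into the search box.
SCHOLAR_SEARCH = "https://scholar.google.com/scholar"

# Resolver prefixes that are often pasted along with a DOI
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_lookup_url(doi: str) -> str:
    """Return a Google Scholar search URL for a bare DOI."""
    doi = doi.strip()
    # Strip a resolver prefix so that only the DOI itself is searched
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return SCHOLAR_SEARCH + "?" + urlencode({"q": doi})
```

Stripping resolver prefixes matters because a pasted https://doi.org/ link is no longer the exact DOI string the lookup relies on.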

Playing around with the DOI in Google Scholar, you can retrieve some more interesting results. The citing articles can be retrieved on the basis of the DOI (this is implemented on the PLoS article metrics page). The versions, or document cluster, of an article (useful for identifying OA versions) are also a direct query on the basis of the DOI. Unfortunately I don’t see how you can get to the related articles using the DOI (any suggestions are welcome in the comments); for those you need an internal Google Scholar article number. My conclusion from these examples is that Google Scholar already has a mechanism in place that could form the backbone of an API. Looking up an article in Google Scholar by its DOI is, in most cases, very quick and precise. Only in a few instances have I come across errors: most often for editorial material or corrections, and occasionally when a version in an Open Access repository interfered with the mechanism.
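Once a result has been found, Scholar’s own links carry its internal identifiers. A sketch of how those links could be rebuilt, assuming the patterns cites= and cluster= (taking the numeric cluster ID) and q=related:ID:scholar.google.com/ (taking a separate per-result ID) that appear in Scholar’s result pages. These are observations, not a documented or stable interface, and the IDs in the tests are hypothetical.

```python
from urllib.parse import urlencode

SCHOLAR_SEARCH = "https://scholar.google.com/scholar"


def cited_by_url(cluster_id: str) -> str:
    # Assumption: the "Cited by" link pattern, scholar?cites=<cluster id>
    return SCHOLAR_SEARCH + "?" + urlencode({"cites": cluster_id})


def versions_url(cluster_id: str) -> str:
    # Assumption: the "All versions" link pattern, scholar?cluster=<cluster id>
    return SCHOLAR_SEARCH + "?" + urlencode({"cluster": cluster_id})


def related_url(result_id: str) -> str:
    # Assumption: the "Related articles" link pattern, which needs Scholar's
    # internal per-result ID rather than a DOI
    query = f"related:{result_id}:scholar.google.com/"
    return SCHOLAR_SEARCH + "?" + urlencode({"q": query})
```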

It works partially with ISBNs

As long as books and reports have an ISBN assigned, it is also possible to retrieve exactly one result based on the ISBN, e.g. ISBN 9022010007, but exploring the citations or the document cluster is not directly possible on the basis of the ISBN. Judging by the ISBN query results, Google seems close to some useful functionality in this area as well.
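An ISBN lookup only makes sense with a well-formed ISBN. The sketch below checks both forms with the standard check-digit rules (ISBN-10: weights 10 down to 1, sum divisible by 11; ISBN-13: alternating weights 1 and 3, sum divisible by 10) and converts ISBN-10 to ISBN-13. The conversion is included in case a book is indexed under the other form, which is an untested assumption about Google Scholar.

```python
def clean_isbn(raw: str) -> str:
    """Remove hyphens and spaces; uppercase a trailing 'x' check digit."""
    return raw.replace("-", "").replace(" ", "").upper()


def is_valid_isbn10(isbn: str) -> bool:
    isbn = clean_isbn(isbn)
    if len(isbn) != 10 or not isbn[:9].isdigit():
        return False
    if not (isbn[9].isdigit() or isbn[9] == "X"):
        return False
    digits = [int(c) for c in isbn[:9]] + [10 if isbn[9] == "X" else int(isbn[9])]
    # Weights 10, 9, ..., 1; the weighted sum must be divisible by 11
    return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0


def _isbn13_sum(digits: str) -> int:
    # Alternating weights 1, 3, 1, 3, ...
    return sum((1 if i % 2 == 0 else 3) * int(c) for i, c in enumerate(digits))


def is_valid_isbn13(isbn: str) -> bool:
    isbn = clean_isbn(isbn)
    return len(isbn) == 13 and isbn.isdigit() and _isbn13_sum(isbn) % 10 == 0


def isbn10_to_isbn13(isbn10: str) -> str:
    isbn10 = clean_isbn(isbn10)
    if not is_valid_isbn10(isbn10):
        raise ValueError(f"not a valid ISBN-10: {isbn10}")
    # Prefix 978, drop the old check digit, compute the new one
    core = "978" + isbn10[:9]
    return core + str((10 - _isbn13_sum(core) % 10) % 10)
```

For the example above, isbn10_to_isbn13("9022010007") gives 9789022010006.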

A monetizing model for Google Scholar

I can imagine that publishers and repositories would really like to make use of functionality like this from Google Scholar. For them it would be valuable to show citation data, to link to related items, or to look up Open Access versions in the document cluster around an article. The announced integration of Google Scholar and Web of Science data for Web of Science customers is a sign that Google is willing to share data, and there is likely some money involved in this deal as well. I wonder whether Google is willing to strike similar deals with other publishers. PLoS journals are a good example: they are already very close to using this information from Google Scholar, but they don’t dare to screen-scrape the information they really want, and need. Currently they only link out. Altmetrics data providers such as Plum Analytics and Altmetric are other potential partners interested in integrating this kind of information from Google Scholar into their metrics dashboards; after all, their customers pay for this data integration.

Why am I suggesting a monetizing model for Google Scholar? Currently Google Scholar still seems to be the (important) pet project of Anurag Acharya. It is not included among the major services in the Google spine, and it looks like Google is not earning any money with it. So Google Scholar is at risk of being taken down any week now, following the examples of Google Reader and iGoogle, for which no monetizing models were available either. If there is a fair revenue model for Google Scholar, I hope it will increase its sustainability, and that we get exciting new data sources to do research with.

The week in review – Week 7, 2014

Some intermittent blogging activity interfered with the intended schedule of weekly reviews. My apologies for the delay, and for the extra burden on your feed.

On the downloaded articles front, not so much was added to my library:
Fenner, M., & J. Lin. 2014. Novel Research Impact Indicators. LIBER Quarterly, 22. OA version:
Strauss, N. S. 2011. Anything but Academic: How Copyright’s Work-for-Hire Doctrine Affects Professors, Graduate Students, and K-12 Teachers in the Information Age. Richmond Journal of Law & Technology 18(1): 1-47. OA version:

My saves on Twitter were many. Here is my selection:


The numrange operator in Google and Google Scholar

Google allows you to search for numbers within a specific range, e.g. [stonewashed jeans $20..$30]. As the example indicates, this searches for a price range, which is also the origin of this operator: it was probably first developed for Google Catalogs (now a retired service). In ordinary Google it is still available, though well hidden in the advanced search form.


The numrange operator works fine for many purposes.
[“mountain bike” $500..$800]
[“Russian revolution” 1900..1920]
[“Theobroma cacao” 2010..2014]
The last example aims at retrieving documents on cocoa from the (publication) years 2010 to 2014, whereas the Russian revolution example targets the years in which the event took place.
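To make the idea concrete, here is a simplified local model of what a numrange query asks for: a document matches if it contains a number between the two bounds, optionally directly after a currency sign. This is an illustrative assumption only. Google’s own handling of numbers and currency symbols is not documented, and the function name is hypothetical.

```python
import re

# Integers or decimals; thousands separators such as "1,200" are not handled.
NUMBER = r"\d+(?:\.\d+)?"


def matches_numrange(text: str, lo: float, hi: float, prefix: str = "") -> bool:
    """Return True if text contains a number in [lo, hi].

    If prefix is given (e.g. "$"), the number must directly follow it.
    This is a simplified local model, not how Google evaluates the operator.
    """
    pattern = re.escape(prefix) + NUMBER
    for match in re.finditer(pattern, text):
        # Drop the prefix before converting the matched number
        value = float(match.group()[len(prefix):])
        if lo <= value <= hi:
            return True
    return False
```

In this model [“mountain bike” $500..$800] corresponds to matches_numrange(text, 500, 800, "$"), and the year examples simply use no prefix.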

It doesn’t work in Euros

So far it seems fine. But it doesn’t work for Euros.
[“mountain bike” €500..€800]
That probably has something to do with the character set. Nor does it work for pounds sterling [“mountain bike” £500..£800]. Although it doesn’t search for pounds sterling or euros, it still returns results that match the bare number range.

Use three dots

Another problem with the numrange operator is that it doesn’t work for large numbers. The search [water 988650..988700] fails. However, if you use three dots instead of two, it works fine [water 988650...988700].
The other examples work with three dots as well as with two dots.
[“mountain bike” $500...$800]
[“Russian revolution” 1900...1920]
[“Theobroma cacao” 2010...2014]

So the quick conclusion is to use three dots rather than two. Hat tip for the three dots goes to @Henkvaness and his book De Google code.
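When you generate such queries from a script, make sure the range uses three plain ASCII dots: a blog or word-processor editor may silently turn "..." into a single ellipsis character, which is a different string. A minimal sketch with hypothetical helper names, assuming the ordinary https://www.google.com/search URL with the query in the q parameter.

```python
from urllib.parse import urlencode


def numrange_query(terms: str, lo: int, hi: int, unit: str = "", dots: str = "...") -> str:
    """Build a query such as '"mountain bike" $500...$800'.

    Uses three plain ASCII dots by default; pass dots=".." for the two-dot form.
    """
    if lo > hi:
        raise ValueError("lower bound must not exceed upper bound")
    return f"{terms} {unit}{lo}{dots}{unit}{hi}"


def google_search_url(query: str) -> str:
    # Assumption: the ordinary Google web search URL, query in the "q" parameter
    return "https://www.google.com/search?" + urlencode({"q": query})
```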

Numrange operator in Google Scholar

In Google Scholar the numrange operator doesn’t work. At least, that was my experience, which I blogged about yesterday in my Google Scholar blog post. For researchers searching for publications, the numrange operator would in the first place be a quick way to limit the results to a range of publication years. Google Scholar facilitates this through the advanced search form or, after a search, through the facets on the search engine results page. But in the default Google Scholar search box the numrange operator doesn’t work for publication year ranges, neither with two dots [“Theobroma cacao” 2010..2014] nor with three dots [“Theobroma cacao” 2010...2014].

But Henk van Ess commented yesterday on my SlideShare presentation “Google Scholar : Google for research” that the numrange operator does work in Google Scholar. A little toying around confirms that it works fine for ranges that are unlikely to be publication years: a search with three dots [“Theobroma cacao” 10...14] or two dots [“Theobroma cacao” 10..14] works. But as soon as you come near a year range, it doesn’t [“Theobroma cacao” 1800...1850].

If you want to search for year ranges in Google Scholar, you have to do it through the advanced search form, or use the more complicated URL parameters as_ylo and as_yhi.
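A sketch of the URL-parameter route: as_ylo and as_yhi are the parameters named above, while the base address https://scholar.google.com/scholar and the q parameter for the query are assumptions based on Scholar’s ordinary search URL. The function name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

SCHOLAR_SEARCH = "https://scholar.google.com/scholar"


def scholar_year_range_url(query: str,
                           year_low: Optional[int] = None,
                           year_high: Optional[int] = None) -> str:
    """Google Scholar search URL restricted to a range of publication years."""
    params = {"q": query}
    # as_ylo: first publication year; as_yhi: last publication year
    if year_low is not None:
        params["as_ylo"] = year_low
    if year_high is not None:
        params["as_yhi"] = year_high
    return SCHOLAR_SEARCH + "?" + urlencode(params)
```

Either bound can be left out, which mirrors leaving one of the year boxes empty in the advanced search form.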

van Ess, H. 2009. De Google code. Amsterdam: Pearson Education. 136 pp. ISBN 9789043019088.

Google Scholar : Google for research

Or: super search tips for researchers and students on how to use Google Scholar more efficiently. The embedded SlideShare presentation and this blog post will be kept up to date and in sync. More interesting still, all links and examples in this SlideShare presentation are clickable, so you can see what I mean.

The following scholarly super search tips explain the embedded SlideShare presentation.

You can use, and should use, the usual Google shortcuts. The ones listed in this slide are the most important. Search for [“phrase searching”] to keep words together. Search for specific file types with the ext: (or filetype:) operator. Limit searches to specific parts of the web with the site: operator. Search for specific words in the title with the allintitle: (or intitle:) operator. Use the OR operator to include synonyms of certain search terms. Exclude specific terms with the minus (-) sign. And last but not least, combine all these operators. A few more tips like these can be found in the post “Google better with Google”.
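Combining these operators into one query string can be sketched as follows. The function and parameter names are hypothetical, the synonyms are joined with OR without grouping parentheses, and example.org in the test is a placeholder domain.

```python
from typing import Iterable, Optional


def build_query(
    phrase: Optional[str] = None,
    synonyms: Iterable[str] = (),
    exclude: Iterable[str] = (),
    intitle: Optional[str] = None,
    site: Optional[str] = None,
    filetype: Optional[str] = None,
) -> str:
    """Combine the usual Google operators into a single query string."""
    parts = []
    if phrase:
        # Quotes keep the words together
        parts.append(f'"{phrase}"')
    synonyms = list(synonyms)
    if synonyms:
        # Alternatives joined with OR, deliberately without parentheses
        parts.append(" OR ".join(synonyms))
    # A minus sign directly before a term excludes it
    parts.extend(f"-{term}" for term in exclude)
    if intitle:
        parts.append(f"intitle:{intitle}")
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)
```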

An important Google operator that you can’t use in Google Scholar is the numerical range operator (numrange): three dots (...) connecting two numbers. In Google Scholar you even get a warning that the numrange operator isn’t working when you use it. Instead, the publication year facet is extremely important in Google Scholar.

But before you start using Google Scholar on a regular basis, turn to the settings. There are three tabs that need a little tuning to optimize Google Scholar for your purposes. In the first tab, select twenty search results per page and have them open in a new tab or window. Also select your preferred bibliography (reference) manager here; if you use Mendeley, you get the best results by selecting Reference Manager. In the second tab you can select the language of the interface as well as of the search results. Selecting search results in a single language only is not recommended. In the last tab you can select the library links that should be shown. On campus this is normally selected automatically, but especially off campus it is recommended to select the libraries you have access to, so that you can link directly to more content.

The Google Scholar advanced search options are currently hidden behind the small triangle in the search box. You only need them for a few types of searches.

At first you might like to use the advanced search form to search for authors. But you soon learn that an author search actually translates into the author: operator, e.g. [author:“KE Giller”] in the Google Scholar search box. If you want to search for the oeuvre of two authors, the advanced search form already fails; you have to do that through the normal search box [author:“R Leemans” OR author:“KE Giller”]. The second useful option in the advanced search form is searching for articles in a specific journal. This option doesn’t translate back into a neat operator in the standard search box, but in the URL you can see that it translates into the as_publication= parameter. The year range option in the advanced search form can be used as well, but it is also available after an initial search through the facets, which is what I normally prefer.
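Both tricks can be scripted: joining author: operators with OR, and adding the as_publication= parameter mentioned above for a journal. A sketch with hypothetical helper names, assuming the ordinary Scholar search URL with the query in q; the journal name in the test is only an illustration.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode

SCHOLAR_SEARCH = "https://scholar.google.com/scholar"


def author_query(authors: Iterable[str]) -> str:
    """Combine author: operators with OR, e.g. author:"R Leemans" OR author:"KE Giller"."""
    return " OR ".join(f'author:"{name}"' for name in authors)


def scholar_url(query: str, publication: Optional[str] = None) -> str:
    params = {"q": query}
    if publication:
        # as_publication= is the parameter the advanced search form fills in
        params["as_publication"] = publication
    return SCHOLAR_SEARCH + "?" + urlencode(params)
```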

The ranking of the search results is heavily influenced by the number of citations to the articles found. As a consequence, older material most often ends up at the top of the results page. When you are looking for recent results, the standard ranking in Google Scholar is counterproductive, so it is of utmost importance to use the year range option, either in the advanced search form or in the facets, to select recent results rather than the heavily cited older material at the top of the results page.

Google Scholar searches for fewer word variants than the big Google does. There is no need for a verbatim search as in the big Google, but “phrase” quotes around a single word still work to search for exactly that word. Another interesting gem is that the tilde (~) operator still works in Google Scholar to search for a keyword and its synonyms (hat tip @wichor). Something I come across quite a lot among experienced searchers is the use of parentheses, but unfortunately these don’t work in Google Scholar (or in the big Google).
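If you reuse search strings from other databases, a quick check for the constructs that trip up Google Scholar can save some frustration. A heuristic sketch (hypothetical names) that flags parentheses and numrange-style ranges, the two constructs reported in this post as not working; it does not reproduce Scholar’s own parsing.

```python
import re
from typing import List

# Two or three dots between numbers, optionally with a currency sign,
# e.g. 2010..2014, 2010...2014, $500..$800 (also the single ellipsis character)
NUMRANGE = re.compile(r"[$€£]?\d+(?:\.\.\.?|…)[$€£]?\d+")


def scholar_query_warnings(query: str) -> List[str]:
    """Heuristic warnings for constructs reported not to work in Google Scholar."""
    warnings = []
    if "(" in query or ")" in query:
        warnings.append("parentheses are not supported; drop the grouping")
    if NUMRANGE.search(query):
        warnings.append("numrange may not work; use the year facet or as_ylo/as_yhi")
    return warnings
```

Tilde and single-word quotes pass without warnings, since those do work in Google Scholar.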

Looking at the search results in more detail, each snippet is surrounded by many options. First of all, Open Access versions are clearly indicated in the last column of the search engine results page. With the Save option you can add the result to your Google Scholar library (not connected to the Google Books library). Under the Cite option you find the reference formatted in three styles: APA, MLA and Chicago. In combination with the versions option, you can arrive at a complete reference to use in your reference list. The import option lets you export the reference to bibliography management software such as EndNote or RefWorks, but only one reference at a time. The versions option is also useful for locating other full-text versions (e.g. with better scan quality). The last two options, Related articles and Cited by, let you search further starting from a useful result. The exact algorithm behind the related articles option has not been published, nor widely studied and reported in the literature.

In Google Scholar it is really easy to set up search alerts. Just be aware that a standard Google Scholar search allows a query of 256 characters, whereas an alert is limited to 100 characters (barely sufficient for a proper search query). On top of search alerts, you can receive updates based on the articles in your My Citations profile.
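Before turning a search into an alert, it helps to check the query length against these limits. A minimal sketch using the 256 and 100 character limits stated above; that Scholar counts plain characters of the query as typed (rather than, say, its URL-encoded form) is an assumption, and the limits may change over time.

```python
SEARCH_LIMIT = 256  # characters for a standard Scholar search, as stated above
ALERT_LIMIT = 100   # characters for a Scholar alert, as stated above


def remaining_characters(query: str, for_alert: bool = False) -> int:
    """Characters left before the limit; a negative value means too long."""
    limit = ALERT_LIMIT if for_alert else SEARCH_LIMIT
    # Assumption: the limit applies to the query as typed, counted in characters
    return limit - len(query)
```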

On the quality of Google Scholar as a comprehensive search engine for researchers, the last word has not yet been spoken. In terms of coverage it is probably larger than any other academic database or search engine. However, not all scholarly sources, such as OA repositories, are fully indexed yet: the big Google index still finds OA resources that are not indexed in Google Scholar. For systematic reviews, Google Scholar is a good addition to the range of databases to search. Metadata quality still needs improvement, as does the disambiguation of articles and authors. The versions function sometimes helps to find the proper metadata for a reference. The announced coupling with Web of Science should be a real plus in this area.

The week in review – Week 6, 2014

Just a few bits and pieces to read, which I added to my library last week.

Frosio, G. 2014. Open Access Publishing: A Literature Review. CREATe Working Paper Vol. 2014/1. 219 pp.

And an article I had missed last year:
Piwowar, H. A. & T. J. Vision. 2013. Data reuse and the open data citation advantage. PeerJ, 1: e175.

An interesting read showing that peer review of grant applications is not always capable of selecting the best proposals:
Danthi, N., C. O. Wu, P. Shi & M. S. Lauer. 2014. Percentile Ranking and Citation Impact of a Large Cohort of NHLBI-Funded Cardiovascular R01 Grants. Circulation Research.