How Google Scholar Citations passes the competition left and right

Last Thursday Google Scholar Citations went public. It was to be expected. Since August the product had been tested by a few (blogging) scientists. We only had to wait patiently for it to be released to all scientists. Last Thursday the moment was there.

Was it worth the wait? Yes, it certainly was. Google Scholar Citations really excels at finding publications you had completely forgotten about. Even so, there are still obscure publications that even Google Scholar doesn’t know about. You simply log in and deselect those few publications that don’t belong to you. You can run searches to find publications that Google has overlooked. You get a comprehensive publication list quite quickly. Well, when your name is not too common, that is. How it works for very common names (Korean scientists come to mind, as well as John Smith), I don’t know yet. But so far nothing new: Anne-Wil Harzing’s excellent Publish or Perish software already did this. What is new is that Google Scholar Citations keeps the citations and publications automatically up to date and allows you to publish your own publication list on the Web with the citations and some crude citation metrics.

The two major competitors in this arena are Thomson Reuters with their ResearcherID and Elsevier’s Scopus with its Scopus ID. With both services you can identify your own publications and assign them to a unique number. In this way you can create your unique publication list with citation metrics as well. Their main disadvantage compared to Google Scholar is their rather limited resource set. Thomson Reuters WoS “only” covers some 10,000 scholarly journals, a set of selected proceedings and, as of recently, only 30,000 books. Scopus has nearly double the number of journals but lags behind in proceedings and covers hardly any books. Google Scholar certainly covers more, but we still don’t understand what is included and what is not, and we sometimes have our doubts about how current Google Scholar is. The larger resource base of Google Scholar, including books and book chapters, will make this service more attractive for social scientists and scholars in the arts and humanities.

On top of the smaller publication base on which these services are built, these two competitors each have their own particular disadvantage as well. You have to maintain your publication list in Thomson Reuters ResearcherID manually. Each time you publish a new article, you have to add it to your profile yourself. Looking around, I see that most researchers are a bit sloppy in this respect. You can, however, make your publication list and the citation impact publicly available. See for example my meagre list. Scopus, on the other hand, maintains your publication list automatically (although it made some serious mistakes in this area in the past, they seem to have improved this service). But, and this is a big but, you can’t publish your properly curated publication list with citations publicly on the Web. They used to have 2Collab for this, but since they stopped 2Collab they haven’t come up with an alternative mechanism to publish your publication list with citation impact on a public website. A real pity.

So Google Scholar easily beats ResearcherID, since it updates automatically, and Scopus ID, because you can make your list with citations publicly available. Making your publication list openly available is really recommended to all scientists: it helps your personal branding.

Certainly there are disadvantages to Google Scholar as well. The most serious at this moment are all kinds of ghost citations. If you look at the citations to our paper on bibliometric analysis on top of repositories, Google counts three citations. But checking the Leydesdorff citation, a reference to our article is not to be found (of course it should have been there, but it isn’t). 0xDE reported a spam account in the name of Peter Taylor, in which various Taylors were collected in a single profile boasting an h-index of 94. That Google Scholar can be fooled has been reported by Beel & Gipp (2010).
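To see why lumping several namesakes into one profile inflates the h-index, here is a minimal sketch in Python. The citation counts are made up for illustration and have nothing to do with the actual Taylor profile; the function simply follows the usual definition of the h-index (the largest h such that h papers have at least h citations each).

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank      # the paper at this rank still has enough citations
        else:
            break         # every later paper has even fewer citations
    return h


# Made-up citation counts for two different authors sharing a surname.
author_a = [25, 12, 8, 5, 3, 1]
author_b = [40, 9, 7, 6, 2]

print(h_index(author_a))             # 4
print(h_index(author_b))             # 4
print(h_index(author_a + author_b))  # 6: the merged "profile" scores higher
```

Neither author gets above 4 alone, but the merged list does, which is exactly what happens when a profile collects everybody with the same name.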

When I was interviewed for our university paper on Google Scholar Citations (in Dutch), I told them: Google Scholar is only about five years old. Give them another five years and they will have changed the market for abstracting and indexing databases totally. If only 20 percent of all scientists correct their publication lists (including editing the references to fix the mistakes Google has made), even without making them publicly available, Google sits on a treasure trove of high-quality metadata. It will be really interesting to see how this story develops.

Joeran Beel and Bela Gipp. Academic search engine spam and Google Scholar’s resilience against it. Journal of Electronic Publishing, 13(3), December 2010.

How Google could help the Open Access world a little

It was back in 2008 that Google Scholar launched the feature that identifies freely available versions of articles on the Web. In the early days these were indicated by green triangles in front of the reference. Nowadays freely available copies are listed in the right-hand column. Many of these versions are Open Access versions of articles properly submitted to preprint servers and subject or institutional repositories. Other free versions of the papers identified by Google Scholar are publishers’ versions of articles posted to personal websites, dropboxes, you name it. Whatever the rights are, if you need a copy of these papers and don’t have access through your university library’s subscriptions, this Google Scholar feature is a very useful tool. In scholarly search classes I always stress this feature to my students.

In our institution’s bibliography I would love to include functionality that refers, for each article, to the so-called document cluster in Google Scholar. Consider the following publication: the link to the full text included in the record leads you to ScienceDirect. Whether you can access the paper on ScienceDirect depends on your subscriptions. Sometimes you can’t. Therefore it would be nice if we could include a link to the document cluster in Google Scholar. For this paper you get some 29 versions, and above all, 6 of these are free versions posted on various websites. That’s really helpful.

I learned from Johannes Keizer yesterday that AgrisWeb links to Google through a search for the title words. This works quite well, but it could be done better.
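A title-word link of this kind can be sketched in a few lines of Python. This is not AgrisWeb’s actual implementation. The `scholar?q=` URL pattern is my assumption, based on how Google Scholar’s own search pages look, and the title is a made-up example.

```python
from urllib.parse import urlencode

# Assumed base URL of the Google Scholar web search (not an official API).
SCHOLAR_SEARCH = "https://scholar.google.com/scholar"


def title_search_link(title, exact_phrase=False):
    """Build a Google Scholar search link from the title words of a record."""
    # Wrapping the title in quotation marks asks for the exact phrase.
    query = f'"{title}"' if exact_phrase else title
    return SCHOLAR_SEARCH + "?" + urlencode({"q": query})


# Hypothetical title, for illustration only.
print(title_search_link("open access citation"))
print(title_search_link("open access citation", exact_phrase=True))
```

The weakness is plain to see: the link depends on the title being transcribed the same way on both sides, and short or generic titles will match other papers too.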

Consider the idea that Google Scholar had an API. Suppose we could query that API on the basis of the DOI or PMID, or the ISSN in combination with volume, issue and pages, or any other combination of standard bibliographic metadata. Yes, something like an OpenURL. If Google Scholar would then return only the correct Google Scholar ID for that article (that number 12564475196117890153 in the link), we could construct various links. Linking to the Google Scholar document cluster is one. Retrieving the Google Scholar citations is another.
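What a library system could do with such an ID is easy to sketch in Python. Mind the assumptions: the resolver below is a plain dictionary standing in for the API that does not exist, the DOI is a placeholder, and the `cluster=` and `cites=` parameters are simply what I see in Google Scholar’s own links, not a documented interface.

```python
from urllib.parse import urlencode

# Assumed base URL, taken from the links Google Scholar itself produces.
SCHOLAR = "https://scholar.google.com/scholar"


def resolve_cluster_id(doi, index):
    """Hypothetical stand-in for the wished-for API: DOI in, Scholar ID out."""
    return index.get(doi.lower())  # None when the article is unknown


def cluster_link(cluster_id):
    """Link to all versions of an article (the document cluster)."""
    return SCHOLAR + "?" + urlencode({"cluster": cluster_id})


def cited_by_link(cluster_id):
    """Link to the list of publications citing the article."""
    return SCHOLAR + "?" + urlencode({"cites": cluster_id})


# Placeholder DOI mapped to the ID mentioned in the text; illustration only.
FAKE_INDEX = {"10.1000/example.doi": "12564475196117890153"}

scholar_id = resolve_cluster_id("10.1000/EXAMPLE.doi", FAKE_INDEX)
if scholar_id is not None:
    print(cluster_link(scholar_id))
    print(cited_by_link(scholar_id))
```

The only piece missing in real life is the first function. Everything after it is string concatenation.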

“Google doesn’t like metadata too much” is an often-heard argument. But the Google Books API works swell with ISBN numbers, OCLC numbers or LOC numbers. That API is talking metadata. Libraries are massive stores of metadata. So Anurag Acharya, please. Pleas for a Google Scholar API abound, mostly for retrieval of citations, but for the OA movement those document clusters are really more important! Perhaps you could launch this Google Scholar API as a present for the Open Access Week coming up in October?
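For comparison, this is roughly what talking metadata to Google Books looks like. It is a sketch in Python that only builds the request URL, assuming the present-day `volumes` endpoint and its `isbn:`, `oclc:` and `lccn:` search keywords. The helper name is mine, and the ISBN is just a commonly used example number.

```python
from urllib.parse import urlencode

# Volumes endpoint of the Google Books API (v1), as I understand it.
BOOKS_API = "https://www.googleapis.com/books/v1/volumes"

# Identifier keywords assumed to be accepted in the q parameter.
ID_KEYWORDS = {"isbn", "oclc", "lccn"}


def books_query_url(kind, number):
    """Build a Google Books lookup URL for a standard library identifier."""
    if kind not in ID_KEYWORDS:
        raise ValueError(f"unsupported identifier type: {kind}")
    # safe=":" keeps the colon readable instead of encoding it as %3A.
    return BOOKS_API + "?" + urlencode({"q": f"{kind}:{number}"}, safe=":")


# Example ISBN, for illustration only.
print(books_query_url("isbn", "9780306406157"))
```

One identifier in, one record out. That is all a Scholar equivalent would have to do.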

Students’ expectations of databases

A Swedish research project comparing students’ information search behaviour with Google Scholar and Metalib concluded: “The study concludes that overall, students were not very satisfied with either tool”. I could leave it at that, but there was this really important paragraph towards the end:

Our study showed that almost half of searches launched in Metalib by users without training resulted in 0 results and a large part of the reason was the expectation that a search was Google-like in nature, in other words keyword searching with quotation marks used to indicate a phrase. Instead Metalib often uses a default phrase search. The result is a disaster. Libraries need to work with Libris and Fujitsu to do whatever possible to change this discrepancy between student expectations and search rules in Metalib otherwise the product will remain seriously flawed. [Emphasis added]

We, at our library, have to give this conclusion really serious attention. Databases like Scopus, and nowadays WoK as well, have adapted themselves to become Google-like. Our own catalog, however, is more Metalib-like.
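The mismatch is easy to reproduce with a toy example in Python. This is not how Metalib or Google actually match queries. It is a deliberately simplified model with made-up titles, contrasting “all words must occur somewhere” with “the words must occur as one contiguous phrase”.

```python
def keyword_match(query, text):
    """Google-like: every query word must occur somewhere in the text."""
    words = text.lower().split()
    return all(term in words for term in query.lower().split())


def phrase_match(query, text):
    """Default phrase search: the query words must occur next to each other."""
    terms = query.lower().split()
    words = text.lower().split()
    n = len(terms)
    return any(words[i:i + n] == terms for i in range(len(words) - n + 1))


# Made-up titles, for illustration only.
TITLES = [
    "Climate change and food security in Africa",
    "Food prices and climate policy",
    "Security of food supply under a changing climate",
]

query = "climate food security"  # what a Google-trained student would type
keyword_hits = [t for t in TITLES if keyword_match(query, t)]
phrase_hits = [t for t in TITLES if phrase_match(query, t)]

print(len(keyword_hits))  # 2
print(len(phrase_hits))   # 0
```

The same three words give two hits under one rule and none under the other, which is the zero-result experience the report describes.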

Reading tip: more than half of the report consists of appendices.

Hat tip: Nicole C. Engard

Nygren, E., G. Haya & W. Widmark (2007). Students experience of Metalib and Google Scholar. Stockholm, Sweden, Stockholm university library. 158 pp.