Google Scholar profiles of Dutch Universities

In early July I checked the Google Scholar uptake at Dutch universities. While collecting the data I also noted the total number of citations for every tenth scientist with a Google Scholar profile at each university. To make sense of the results I plotted the citation totals on a logarithmic scale against the rank number of the researcher, ordered by descending total number of citations.
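A graph like this is straightforward to reproduce. Below is a minimal Python sketch of the plotting step; the university names and citation numbers in it are invented placeholders, standing in for the values I collected by hand from the Google Scholar profiles.

```python
# Minimal sketch of the plot: total citations (log scale) against researcher rank.
# The universities and numbers below are invented placeholders, not the collected data.
import matplotlib.pyplot as plt

# rank of researcher -> total citations, sampled at every tenth profile (hypothetical values)
profiles = {
    "University A": [(10, 52000), (20, 31000), (30, 18000), (40, 9000), (50, 3500)],
    "University B": [(10, 41000), (20, 25000), (30, 14000), (40, 7000), (50, 2800)],
}

for university, points in profiles.items():
    ranks = [rank for rank, _ in points]
    citations = [cites for _, cites in points]
    plt.plot(ranks, citations, marker="o", label=university)

plt.yscale("log")  # citations plotted as logarithmic values
plt.xlabel("Rank of researcher (descending total citations)")
plt.ylabel("Total citations")
plt.legend()
plt.show()
```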

Total number of citations per researcher according to Google Scholar

Due to the uneven uptake of Google Scholar profiles per university it is too early to draw any firm conclusions from these profiles. But at face value three groups can be distinguished: Delft, Utrecht, Wageningen, UvA, RUG and VU show very similar profiles; a middle group consists of Twente, Radboud Nijmegen, Eindhoven, Erasmus Rotterdam and Leiden University; and Maastricht, Tilburg and the Open University close the ranks.

Looking carefully at the leading group, Utrecht has the most researchers with a high number of total citations. Wageningen takes over from around 500 total citations per researcher, and Delft from around 150 total citations per researcher.

I can imagine that when Google Scholar profiles become more established, these kinds of graphs could be used in yet another university ranking. The total number of citations under each graph gives some indication of the publication success of the staff at those universities.

Mapping the influence of humanities

David Budtz Pedersen presented a new research project undertaken in Denmark, Mapping the Public Influence of the Humanities, which aims to map the influence, meaning and value of the humanities in Denmark. His blogpost about this project on the Impact Blog has already generated a lot of attention, even in the holiday season.

What struck me, however, is that the project starts with three data collection activities:

  1. collecting names and affiliations of active scientific experts in Denmark;
  2. documenting the educational background and research profile of this population;
  3. downloading and collecting the full corpus of reports, whitepapers, communications, policy briefs, press releases and other written results of Danish advisory groups from 2005 to 2015.

It was the third objective of Budtz Pedersen’s project that grabbed my attention: collecting the full corpus of reports, whitepapers, communications, policy briefs, press releases and other written results of Danish advisory groups from 2005 to 2015. That in the country where Atira, the makers of Pure, reside (nowadays, I know, wholly owned by Elsevier). It struck a chord with me, since this is exactly what should have been done already. Assuming the influence of the humanities is a scholarly debate, all universities contributing to this debate should have amply filled current research information systems (CRIS), containing exactly those reports, whitepapers, communications, policy briefs, press releases et cetera.

In this post I want to concentrate on the collection side, assuming that all material collected in the CRIS is available online and free for the public at large to inspect, query and preferably, but not necessarily, free to download. Most CRIS have all kinds of coupling possibilities with the major (scholarly) bibliographic databases: Web of Science, Scopus, PubMed, WorldCat, CrossRef et cetera. However, those reports, whitepapers, communications, policy briefs, press releases and other written results are not normally contained in these bibliographic databases. This is the so-called grey literature: not formally published, not formally indexed, not easily discovered, not easily found, not easily collected. To collect these materials we have to ask, and beg, researchers to dutifully add them manually to the university CRIS.

That is exactly why universities have bought into CRIS systems, and why libraries are the ideal candidates to maintain them. The CRIS takes away the burden of keeping track of the formal publications through its couplings with the formal bibliographic databases. Librarians have knowledge of all these couplings and the search profiles required to make life easy for the researchers. That should leave researchers some time to devote a little of their valuable attention to those other, more esoteric materials, especially in the humanities, where we apparently have more of this grey literature.

A well maintained CRIS should have plenty of these materials registered. So I was slightly taken aback that this project in Denmark, the cradle of a major CRIS supplier, needs to collect these materials from scratch. They should have been registered a long time ago. That is where the value kicks in of a comprehensive, all-output-inclusive CRIS, resulting in a website with a comprehensive institutional bibliography.
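To make the contrast concrete: this is, very roughly, what the automated side of such a coupling looks like. A minimal sketch against the public CrossRef REST API (the author query is purely illustrative); note that reports, policy briefs and press releases will simply never appear in the response, which is exactly why they need manual registration.

```python
# Minimal sketch of a CRIS-style coupling: fetch a researcher's formally
# published works from the public CrossRef REST API. Grey literature
# (reports, policy briefs, press releases) never shows up here.
import requests

def formal_publications(author_query, rows=20):
    """Query CrossRef for works matching an author name (illustrative only)."""
    response = requests.get(
        "https://api.crossref.org/works",
        params={"query.author": author_query, "rows": rows},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["message"]["items"]

# Hypothetical usage: list document types and DOIs for one author query.
for item in formal_publications("Budtz Pedersen"):
    print(item.get("type"), item.get("DOI"))
```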

Just a second thought. It is odd to see that two of the major providers of CRIS systems, Thomson Reuters with Converis and Elsevier with Pure, are both providers of major news information sources. Yet neither of these CRIS products has a coupling with the proprietary news databases of Reuters or LexisNexis for press clippings and mentions in the media. From a CRIS manager’s point of view this is a strange observation, since we are dealing with the same companies. But the internal company structures seem to hinder this kind of seemingly logical coupling of services.


Research data information literacy and digital literacy

The blogpost of Yasmeen Shorish, "Data, data everywhere…but do we want to drink? The role of data, digital curation, and scholarly communication in academic libraries", got me thinking about the curriculum of information literacy in academic libraries. Shorish:

This means that academic libraries must incorporate the work of data information literacy into their existing information literacy and scholarly communication missions, else risk excluding these data librarian positions from the natural cohort of colleagues doing that work, or risk overextending the work of the library.

Information literacy is one of the core activities of information specialists, but it is usually aimed only at students, ideally graduate students and perhaps postgraduates as well, but certainly not at the researchers or teaching faculty of the institution. Including research data management under the umbrella of information literacy reinforces the position of library information specialists and brings their complete information literacy offerings to the attention of faculty as well. The data literacy skills help to “sell” information literacy to faculty.

Some information specialists might be caught off guard by the newly required skill set mentioned by Shorish: “experience with SPSS, R, Python, statistics and statistical literacy, and/or data visualization software find their way into librarian position descriptions“. This brings me to the third aspect of information literacy: I would broaden it to the digital skill set, or digital literacy as mentioned in the NMC Horizon Report 2015. But in exactly that part of the report research libraries are not mentioned. Undeservedly so, in my opinion.

So here we have a task at hand. Quite a large one, if you ask me, but doable: break out of the shackles of the classical forms of information literacy, include research data management in these courses or curricula, and work towards digital literacy courses.

Document types in databases

Currently I am reading with a great deal of interest the arXiv preprint of the review by Ludo Waltman (2015) from CWTS on bibliometric indicators. In this post I want to provide a brief comment on his section 5.1, where he discusses the role of document types in bibliometric analyses. Ludo mainly reviews and comments on the inclusion or exclusion of certain document types in bibliometric analyses; he does not touch upon the subject of discrepancies between databases. I want to argue that he could take his review a step further in this area.

Web of Science and Scopus differ quite a bit from each other in how they assign document types. If you don’t realize that this discrepancy exists, you can draw wrong conclusions when comparing bibliometric analyses between these databases.

This blogpost is a quick illustration that this issue should be addressed in a review like this. To illustrate my argument I looked at the document types assigned to Nature publications from 2014 in Web of Science and Scopus. The following table gives an overview of the results:

Document Types Assigned to 2014 Nature Publications in Web of Science and Scopus

WoS document type   # Publications  Scopus document type  # Publications
Editorial Material  833             Editorial             244
Article             828             Article               1064
                                    Article in Press      56
News Item           371
Letter              272             Letter                253
Correction          109             Erratum               97
Book Review         102
Review              34              Review                51
Biographical Item   13
Reprint             3
                                    Note                  600
                                    Short Survey          257
Total               2565            Total                 2622
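The table above was compiled by hand, but the comparison lends itself to automation. A minimal sketch, assuming both databases have been exported to CSV files containing at least a DOI and a document type column; the file and column names below are hypothetical and would need adjusting to the actual export formats.

```python
# Sketch: match WoS and Scopus exports on DOI and cross-tabulate document types.
# File and column names are hypothetical; adjust to the actual export format.
import pandas as pd

wos = pd.read_csv("wos_nature_2014.csv")        # columns: DOI, DocumentType
scopus = pd.read_csv("scopus_nature_2014.csv")  # columns: DOI, DocumentType

# Normalise the DOIs so records from the two exports can be matched.
for frame in (wos, scopus):
    frame["DOI"] = frame["DOI"].str.lower().str.strip()

merged = wos.merge(scopus, on="DOI", suffixes=("_wos", "_scopus"))

# How does each WoS document type map onto the Scopus document types?
print(pd.crosstab(merged["DocumentType_wos"], merged["DocumentType_scopus"]))
```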

In the first place, Scopus yields a few more publications for Nature in 2014 than Web of Science does. The difference can be explained by the articles in press that are still present in the Scopus search results; this probably still requires maintenance from Scopus and should be corrected.

More importantly, WoS assigns 833 publications as “editorial material” whereas Scopus assigns only 244 publications as “editorial”. It is a well-known tactic of journals such as Nature to have articles assigned as editorial material, since this practice artificially boosts their impact factor. I have had many arguments with professors whose invited News and Views items (most often very well cited!) were assigned to the “editorial material” category and therefore not included in bibliometric analyses.
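To spell out why this boosts the impact factor: citations to editorial material still count towards the numerator, but the items themselves are excluded from the “citable items” in the denominator. A back-of-the-envelope calculation with invented numbers:

```python
# Toy illustration of the impact factor effect; all numbers are invented.
citations_to_articles   = 30000  # citations to research articles
citations_to_editorials = 5000   # citations to news & views / editorial items
articles   = 1700                # citable items counted in the denominator
editorials = 800                 # editorial material, excluded from the denominator

# Editorial citations count in the numerator while the items escape the denominator:
if_with_editorial_label = (citations_to_articles + citations_to_editorials) / articles
# If the same items were classed as articles, the denominator would grow:
if_as_articles = (citations_to_articles + citations_to_editorials) / (articles + editorials)

print(round(if_with_editorial_label, 1))  # 20.6
print(round(if_as_articles, 1))           # 14.0
```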

“Letters”, “corrections” or “errata” are of the same order of magnitude in Scopus and Web of Science. “News items” are a category of publications in Web of Science, but not in Scopus; they are probably listed as “note” in Scopus. Some of the “short surveys” in Scopus turn up in Web of Science as “news item”. But all these categories probably don’t affect bibliometric analyses too much.

The discrepancy in “reviews” between Web of Science and Scopus, however, is important, and large as well. Web of Science assigns 34 articles as a “review”, whereas Scopus counts 51 “reviews” in the same journal over the same period. Reviews are included in bibliometric analyses, and since they attract relatively more citations than ordinary articles, special baselines are constructed for this document type. Comparisons between these databases are therefore directly affected by the differences in document type assignment.

The differences in editorial material, articles and reviews between Web of Science and Scopus are most likely to affect the outcomes of comparative bibliometric analyses between these two databases. But I am not sure about the size of this effect. I would love to see some more quantitative studies in the bibliometrics arena investigating this issue.


References

Waltman, L. (2015). A review of the literature on citation impact indicators. arXiv preprint. http://arxiv.org/abs/1507.02099