Mapping the influence of humanities

David Budtz Pedersen  presented a new research proposal undertaken in Denmark, Mapping the Public Influence of the Humanities, with the aim to map the influence, meaning and value of the humanties in Denmark. His blogpost on the Impact Blog about this project generated a lot of attetion already. Even in the holiday season.

What struck me however, is that the project starts with collecting data from three different sources:

  1. names and affiliations of active scientific experts in Denmark;
  2. by documenting the educational background and research profile of the population;
  3. by downloading and collecting the full corpus of reports, whitepapers, communications, policy briefs, press releases and other written results of Danish advisory groups from 2005 to 2015.

It was the third objective of Buts Pedersen’s project that grabbed my attention: collecting the full corpus of reports, whitepapers, communications, policy briefs, press releases and other written results of Danish advisory groups from 2005 to 2015. That in the country where Atira, the makers of Pure, reside (I know currently wholly owned by Elsevier). It struck a chord with me since this is exactly what should have been done already. Assuming the influence of the humanities is a scholarly debate, all universities contributing to this debate should have an ample filled current research information systems (CRIS) filled with exactly those reports, whitepapers, communications, policy briefs, press releases et cetra.

In this post I want to concentrate on the collection side, assuming that all material collected in the CRIS is available online and free for the public at large to inspect, query and preferably -but not necesarrily- free to download. Let’s look at the collection side for a moment. Most CRIS have all kind of database coupling possibilities with major (scholarly) bibliographic databases: Web of Science, Scopus, Pubmed, Worldcat, CrossRef etc. However, those reports, whitepapers, communications, policy briefs, press releases and other written results are not normally contained in these bibliographic databases. These are the so called grey literature. Not formally published. Not formally indexed. Not easily discovered. Not easily found. Not easily collected. To collect these materials we have to ask and beg researchers to dutifully add these manually in the university CRIS.  That is exactly why universities have bought into CRIS systems. Why libraries are the ideal candidate to maintain CRIS systems. The CRIS system takes away the burden of keeping track of the formal publications through coupling with the formal bibliographic databases. Librarians have knoweldge about all these couplings and search profiles required to make life easy for the researchers. That should leave some time for researchers to devote a little of their valuable time on those other more esoteric materials. Especially in the humanities, where we apparently have more of those grey literature.  A well maintained CRIS should have plentiful of these materials registered. So I was taken aback slightly that this project in Denmark, the cradle of a major CRIS supplier, needs to collect these materials from the start. They should have been registered long time ago already. That is where the value kicks in of a comprehensive, all output inclusive CRIS, resulting in a website with a comprehensive institutional bibliography.

Just a second thought. It is odd to see that two of the major providers of CRIS systems, Thomson Reuters with Converis and Elsevier with Pure are both providers of major news information sources. It is odd that neither of these CRIS products have coupling with the proprietary news databases either Reuters or LexisNexis for press clipping and mentios in the media. From a CRIS managers’ point of view strange to make this observation since we are dealing with the same companies. But the internal company structures seem to hinder these kind seemingly logical coupling of services.


Document types in databases

Currently I am reading with a great deal of interest the Arxiv preprint of the review by Ludo Waltman (2015) from CWTS on bibliometric indicators. In this post I want to provide a brief comment on his section 5.1 where he discusses the role document types in bibliometric analyses. Ludo mainly reviews and comments on the inclusion or exclusion of certain document types in bibliometric analyses, he does not touch upon the subject of discrepancies between databases. I want to argue that he could take his review a step further in this area.

Web of Science and Scopus, differ quite a bit from each other on how they assign document types. If you don’t realize that this discrepancy exists, you can draw wrong conclusions when bibliometric analyses between these databases are studied.

This blogpost is a quick illustration that this is an issue that should be adressed in a review like this. To illustrate my argument I looked to the document types assigned to Nature publications from 2014 in Web of Science and Scopus. The following tables gives an overview of the results:

Document Types Assigned to 2014 Nature Publications in Web of Science and Scopus
WoS Document type # Publications Scopus Document type # Publications
Editorial Material 833 Editorial 244
Article 828 Article 1064
Article in Press 56
News item 371
Letter 272 Letter 253
Correction 109 Erratum 97
Book Review 102
Review 34 Review 51
Biographical Item 13
Reprint 3
Note 600
Short Survey 257
Total 2565 2622

In the first place Scopus yields for the year 2014 a few more publications for Nature than Web of Science does. The difference can be explained by the articles in press that are still present in the Scopus search results. This probably still requires maintenance from Scopus and should be corrected.

More importantly WoS assigns 833 publications as “editorial material” whereas Scopus assigns only 244 publications as “editorial”. It is a well known tactic from journals such as Nature to assign articles as editorial material, since this practice artificially boosts their impact factor. I have had many arguments with professors whose invited news and views items (most often very well cited!) were not included in bibliometric analyses since they were assigned to “editorial material” category and therefore not included in the analysis.

“Letters”, “corrections” or “errata” are in the same order of size between Scopus and Web of Science. “News Items” are a category of publications in Web of Science, but not in Scopus. They are probably listed as “note” in Scopus. Some of the “short surveys” in Scopus turn up in Web of Science as “news item”. But all these categories probably don’t affect bibliometric analyses too much.

The discrepancy in “reviews” between Web of Science and Scopus however is important. And large as well. Web of Science assigns 34 articles as a “review”, whereas Scopus counts 51 “reviews” in the same journal over the same period. Reviews are included in bibliometric analyses, and since the attract relatively more citations than oridinary articles, special baselines are construed for this document type. But comparisons between these databases are foremost affected by differences in document assignation between these databases.

The differences in editorial material, articles and reviews between Web of Science and Scopus are most likely to the affect outcomes of comparisons in bibliometric analyses between these two databases. But I am not sure about the size of this effect. I would love to see some more quatitative studies in the bibliometrics arena to investigate this issue.



Waltman, Ludo (2015). A review of the literature on citation impact indicators.

The costs for going Gold in the Netherlands

For a meeting of the Open Access work group of Dutch university libraries and the licenses work group of those same universities I was asked to make an estimate of the Costs for a 100% Gold OA model for the Netherlands. In this blog post I want to explain the methodology how I arrived at the outcome of the current calculation and contribute to this subject.

In the first slide I compare the Dutch output registered in the two most suitable databases for this research question. Scopus and Web of Science. To my own amazement Scopus only covered more Dutch publications after 2004. For the calculation of the Article Processing Charges (APC) paid by the Dutch research community it is fair to concentrate on the articles and reviews only. Editorials, letters and conference proceedings were therefore left out the equation. Scopus had the lead in articles and reviews already in 2003. Also striking in this graph is that WoS is slower in updating the database than Scopus, since year 2013 is clear trailing behind. Based on the presented graph, it is likely that we will see some 40,000 articles and reviews published by Dutch (co-)authors.

Since the Web of Science interface was renewed, in the search results an Open Access facet was added. The open access facet identifies the journals covered by Web of Science and registered in the DOAJ. The list of Open Access journals covered by Web of Science, i.e. the Open Access journals with an impact factor, or those that will soon receive an impact factor is freely available from Thomson Reuters. Because of the improved OA identification -but not perfect- Web of Science was the database of choice for this exercise. In the second slide I show the increase share of Open Access articles in journal articles covered by Web of Science in the Netherlands. In 2013 3,776 of 35,267 articles and reviews were published in Gold Open Access journals. That is 10.7%

Looking into more detail at the share of open access articles from the Netherlands in graph 3. I distighuish two points of inflection. After 2004 the share of Open Access articles really took off. I guess that this has to do with the expanded coverage of Open Access journals by Web of Science. Since 2007 Web of Science really started to expand its journal coverage. The second point of inflexion seems to be 2010, when PLoS ONE really started to become popular after it had received its first Impact Factor listing.

So far I talked about the Dutch publication as if they were all produced by the universities. In actual fact 13% of the output in 2013 was not produced by universities and 87% by universities and their academic hospitals.

Comparing the number of Open Access articles found in Web of Science and the refereed articles registered in Narcis, we see a big gap in the older years that closes in the current years. The gap is largely caused by green Open Access articles, Hybrid Open Access article, and Open Access articles published in journals not covered by Web of Science. The relative importance of these three factors need to be established. The lines touching in 2014 is an indication that Gold Open Access is important in filling the repositories immediately and that registering the Green articles in repositories actually take some time. Also because of publisher’s embargoes.

Price information for Article Processing Charges (APC) can be found on the eigenfactor website. Looking into detail to the articles published in 2013. 3314 articles were published in journals APCs , and only 404 in journals without APCs. The average APC for the paid OA journal was on average € 1220,- Taking the free journal articles into account as well, the PAC dropped to € 1087,- on average. All these prices are VAT exclusive.

The total costs for gold Open Access publishing for the Netherlands as covered by journals indexed in Web of Science increased nearly linearly from € 1.5 million in 2009 to just over € 4 million in 2013.

Over this five year period we paid quite substantial APC to the following publishers. As to be expected most to Springer/BMC and PLoS. Followed by Oxford University Press. The mentioned European Geosciences Union is in fact published by Copernicus publishers in Germany. Frontiers was recently acquired by the Nature Publishing Group. The license work group really has a list to consider next to the ‘traditional’ big deals with the standard publishers. It is wisely to see if deals can be struck on APC with Open Access publishers as well. Heather Morrison showed just the other day that we have had some steep price increases by BMC/Springer.

There are some points to consider. Not all research published by Dutch researchers is produced by Dutch Researchers only. In the Science, Technology and Innovation indicators it is indicated that some 50% of publications involve international collaboration. So for 50% of the articles Dutch authors don’t always have to pay the full APC. It is paid by the corresponding author from another country. The bill is shared. Or any other variation. Some research in this area is badly needed.
The APC are another issue. The eigenfactor collection was a good starting point, but are perhaps a bit behind reality for some journals already. Some publishers provide lists of all their journals, but often the lack sufficient metadata -e.g. issn- to do actually something useful with the lists. But in most cases APC are well hidden away, somewhere deep down in the instructions to authors for a single journal only. Publishers should be more transparent in this area.
Where the number of ‘Dutch’ articles might be an over estimation, 21% VAT is not.
In WoS currently only 718 Open Access journals are indexed, out of 9744 listed in DOAJ. Those 718 journals are an increase of 99 OA journals from the 619 I found in december 2010. But it is still a long way from the nearly 10,000 Open Access journals we know of. Of course WoS wants, and should, only cover the top tier journals, but there is more values in those 10,000 DOAJ journals than the current WoS selection. In addition to that, WoS should find a way to indicated OA articles in Toll Access journals as well.

Having made these considerations. My estimate is that in 2014 some 40,000 articles and reviews will be published by Dutch researchers. Applying the average APC of € 1087,- I arrive at an estimated € 43,500,000,- for the Netherlands if all Dutch research would be published in Gold Open Access journals. That figure should be compared to the current spending on journal subscriptions in the Netherlands by Dutch Universities, which is about € 34 million per year Euro at the moment. Going for gold will cost therefore € 10.5 million. That is a lot of money.

The week in review – week 4

The week in review, a new attempt to get some life back into this weblog. It is inspired of course (for the Dutch readers) on TWIT The Week In Tweets by colleague @UBABert and the older monthly overviews which Deet’jes used to do on

The new Web of Science interface
Whilst I was in Kenya the previous week to give training for PhD students and staff at Kenyatta University and the University of Nairobi, Thomson Reuters released their new version of the Web of Science. So only this week I had a first go at it. We haven’t been connected to Google Scholar yet, still waiting to see that come through, but in general the new interface is an improvement over the old one. Albeit, searching for authors is still broken for those who haven’t claimed their ResearcherID. But apart from that, what I hadn’t noticed in the demo versions of the new interface is the new Open Access facet in Web of Science. I like it. But immediately the question arises how do they do it jumps to my mind. The is no information in the help files on this new possibility. So my first guess would be the DOAJ list of journals. Through a message on the Sigmetrics list a little more confusion was added, since various PLoS journals are included in their ‘Open Access Journal Title List’, but for PLoS ONE. Actual searches in Web of Science quickly illustrate that for almost any topic in the past view years PLoS ONE is the largest OA journal responsible for content within this Open Access facet. I guess this new facet in Web of Science will spark some more research in the near future. I see the practical approach of Web of Science as a first step in the right direction. The next challenge is of course to indicate the individual Open Access articles in hybrid journals. Followed by -and this will be a real challenge- green archived copies of Toll Access articles. The latter is badly needed since we can’t rely only on Google Scholar to do this for us.

Two interesting articles in the unfolding field of Altmetrics deserve mention. The groups of Judit Barr-Ilan and Mike Thelwall cooperated in “Do blog citations correlate with a higher number of future citations? Research blogs as a potential source for alternative metrics” . They show that Research Blogging is a good post peer review blogging platform able to pick the better cited articles. However, the number of articles covered by the platform is really too small to be meaningful to become a widely used altmetric indicator.
The other article, at the moment still a working paper, was from CWTS (Costas et al. 2014). They combined Web of Science covered articles with the indicators and investigated many different Altmetric indicators such as as mentions on Facebook walls, Blogs, Twitter, Google+ and News outlets but not Mendeley. Twitter is by far the most abundant Altmetric source in this study, but blogs are in a better position to identify top publications. However the main problem remains the limited coverage by the various altmetrics tools. For 2012 24% of the publications had an altmetric mention, but already 26% of the publications had scored already a citations. Thus confirming the other study that coverage of the peer reviewed scholarly output is only covered on a limited scale by social media tools.

Scholarly Communication
As a follow up on my previous post on the five stars of transparent pre-publication peer review, a few articles on peer review came to my attention. The first was, yet another, excellent bibliography by Charles W. Bailey Jr. on transforming peer review. He did not cover blogposts, only peer reviewed journals. The contributions to this field are published in many different journals, so an overview like this still has its merits.
Through a tweet from @Mfenner

I was notified on a really interesting book ‘Opening Science‘. It is still lacking a chapter on changes in the peer review system, but it is really strong at indicating new trends in Scholarly Communication and Publishing. Worth further perusing. Rankings Although the ranking season has not started yet. The rankers are always keen of putting old wine in new bags. The Times Higher Education presented this week the 25 most international universities in the world. It is based the THE WUR, released last year, this time only focusing on the ‘international outlook indicator’only which accounts for 7.5% of their standard ranking. Of the Dutch universities Maastricht does well. Despite the fact that Wageningen university host students from more than 150 countries, we only ranked 45th on this indicator. More interesting was an article of Alter and Reback (2014) where they show that rankings actually influence the number of freshman applying for a college in the United States as well as the fact that quality of college life plays an important factor as well. So it makes sense for universities to invest in campus facilities and recreation possibilities such as sports grounds etc. Random notes A study on copy rights, database rights and IPR in Europe for Europeana by Guibault. Too much to read at once, and far too difficult to comprehend at once. But essential reading for repository managers.


Alter, M., and R. Reback. 2014. True for Your School? How Changing Reputations Alter Demand for Selective U.S. Colleges. Educational Evaluation and Policy Analysis. (Free access)
Bailey Jr., C. W. 2014. Transforming Peer Review Bibliography. Available from
Binfield, P. 2014. Novel Scholarly Journal Concepts. In: Opening Science, edited by Sönke Bartling and Sascha Friesike, 155-163. Springer International Publishing. OA version:
Costas, R., Z. Zahedi, and P. Wouters. 2014. Do ‘altmetrics’ correlate with citations? Extensive comparison of altmetric indicators with citations from a multidisciplinary perspective. CWTS Working Paper Series Vol. CWTS-WP-2014-001. Leiden: CWTS. 30 pp.
Guibault, L., and A. Wiebe. 2013. Safe to be open : Study on the protection of research data and recommendation for access and usage. Göttingen: Universitätsverlag Göttingen 167 pp.
Shema, H., J. Bar-Ilan, and M. Thelwall. 2014. Do blog citations correlate with a higher number of future citations? Research blogs as a potential source for alternative metrics. Journal of the Association for Information Science and Technology: n/a-n/a. OA version:

The mysterious ways of Web of Science

A while back, one of our researchers asked me how Steven Salzberg arrived at the number of citations for the paper on the Arabidopsis genome in Nature. When he checked Web of Science, it only delivered zero citations and that couldn’t be true for such a breakthrough paper. Peter found 2689 citations! How did he do that?

I checked out the paper in Web of Science myself first as well, and found also zero citations.

Zero citations from Web of Science for the Arabidopsis papers

I was not entirely surprised since I realized it was one of those consortium papers. I knew Thomson had some problems with a consortium paper in the past. But annoying it was.

I first checked about the issue around the human genome project and found it being mentioned even in Science Watch from Thomson. But from the article it appeared that Thomson only improved the tracking for citations from that Human Genome project paper, and not the raised issue per se. Even though the Arabidopdsis paper was even older the citations to this paper had not been corrected. It appeared that something in the searching or tracking of citations by WoS went wrong but where was the error being made?

I made a few futile attempts in the cited ref search with Arabidopsis as author, or Arabidopsis*. Searched in the cited ref search for Kaul as author (which is listed in the end of the original article as first author) but that only resulted in some 130 citations. Not sufficient to justify Steven Salzberg number of citations. I did not like to use the cited ref search to look for the cited articles from Nature in 2000 this is a very large result set that you have to wade through innumerable pages of results since you can’t refine these type of searches by volume or page numbers. (Wouldn’t that be nice?)

To reassure my inquisitive researcher I pointed him to Scopus (Sorry Thomson) where the he could see a reassuring 3000+ citations himself. Meanwhile I did not have a quick fix for this problem.

It was only later when I looked into the problem again, and somehow I was forwarded to the all databases search rather than the Web of Science search tab, which I normally use. To my utter amazement the title search delivered this time two records. Both with zero citations, but more importantly it showed next to [Anon] Ar Gen In, as the author.

Zero citations from Web of Science for the Arabidopsis papers

Now the problem was simple. I had found the author. A cited ref search yielded indeed nearly the 2689 citations from Steven Salzberg.

Zero citations from Web of Science for the Arabidopsis papers

But these figures are not entirely correct either since there are some additional 131 citations to be found with Kaul as a first author reference to Nature with the correct volume and page number.

Of course I requested at Web of Science a correction of the citation data, but forgot to include Kaul’s citations. Hopefully this will be repaired at a later date.

But what makes me really wondering is the slight -but very important- difference in record presentation between the All Databases search and the Web of Science search  on Web of Knowledge. For me personally the standard entry in Web of Knowledge is the Web of Science tab. Not in my normal working routine would I ever go to the all databases tab to look up a number of citations. Just by luck I found the right author name on this occasion. But it shouldn’t have to become the standard way to perform searches shouldn’t it?