Mapping the influence of the humanities

David Budtz Pedersen presented a new research project undertaken in Denmark, Mapping the Public Influence of the Humanities, which aims to map the influence, meaning and value of the humanities in Denmark. His blogpost about this project on the Impact Blog has already generated a lot of attention, even in the holiday season.

What struck me, however, is that the project starts with collecting data from three different sources:

  1. the names and affiliations of active scientific experts in Denmark;
  2. the educational background and research profile of this population;
  3. the full corpus of reports, whitepapers, communications, policy briefs, press releases and other written results of Danish advisory groups from 2005 to 2015.

It was the third objective of Budtz Pedersen’s project that grabbed my attention: collecting the full corpus of reports, whitepapers, communications, policy briefs, press releases and other written results of Danish advisory groups from 2005 to 2015. And this in the country where Atira, the makers of Pure, resides (nowadays, I know, wholly owned by Elsevier). It struck a chord with me, since this is exactly what should have been done already. Assuming the influence of the humanities is a scholarly debate, all universities contributing to this debate should have amply filled current research information systems (CRIS) containing exactly those reports, whitepapers, communications, policy briefs, press releases et cetera.

In this post I want to concentrate on the collection side, assuming that all material collected in the CRIS is available online and free for the public at large to inspect, query and preferably -but not necessarily- free to download. Most CRIS have all kinds of coupling possibilities with major (scholarly) bibliographic databases: Web of Science, Scopus, PubMed, WorldCat, CrossRef et cetera. However, those reports, whitepapers, communications, policy briefs, press releases and other written results are normally not contained in these bibliographic databases. This is the so-called grey literature: not formally published, not formally indexed, not easily discovered, not easily found, not easily collected. To collect these materials we have to ask and beg researchers to dutifully add them manually to the university CRIS.

That is exactly why universities have bought into CRIS systems, and why libraries are the ideal candidates to maintain them. The CRIS takes away the burden of keeping track of the formal publications through coupling with the formal bibliographic databases. Librarians have knowledge of all the couplings and search profiles required to make life easy for the researchers. That should leave researchers some time to devote a little of their valuable effort to those other, more esoteric materials, especially in the humanities, where we apparently have more of this grey literature. A well maintained CRIS should have plenty of these materials registered. So I was slightly taken aback that this project in Denmark, the cradle of a major CRIS supplier, needs to collect these materials from scratch. They should have been registered a long time ago. That is where the value of a comprehensive, all-output-inclusive CRIS kicks in, resulting in a website with a comprehensive institutional bibliography.

Just a second thought. It is odd to see that two of the major providers of CRIS systems, Thomson Reuters with Converis and Elsevier with Pure, are both providers of major news information sources, yet neither of these CRIS products has a coupling with the proprietary news databases of Reuters or LexisNexis for press clippings and mentions in the media. From a CRIS manager’s point of view this is a strange observation, since we are dealing with the same companies. But the internal company structures seem to hinder this kind of seemingly logical coupling of services.


Research data information literacy and digital literacy

The blogpost by Yasmeen Shorish, “Data, data everywhere…but do we want to drink? The role of data, digital curation, and scholarly communication in academic libraries”, got me thinking about the curriculum of information literacy in academic libraries. Shorish:

This means that academic libraries must incorporate the work of data information literacy into their existing information literacy and scholarly communication missions, else risk excluding these data librarian positions from the natural cohort of colleagues doing that work, or risk overextending the work of the library.

Information literacy is one of the core activities of information specialists, but it is usually aimed only at students (ideally graduate students and perhaps postgrads as well), and certainly not at the researchers or teaching faculty of the institution. Including research data management under the umbrella of information literacy reinforces the position of library information specialists and brings their complete information literacy offerings to the attention of faculty as well. The data literacy skills help to “sell” information literacy to faculty.

Some information specialists might be caught off guard by the new skill set mentioned by Shorish: “experience with SPSS, R, Python, statistics and statistical literacy, and/or data visualization software find their way into librarian position descriptions“. This brings me to the third aspect of information literacy: I would broaden it to the digital skill set, or digital literacy, as mentioned in the NMC Horizon Report 2015. Yet in exactly that part of the report research libraries are not mentioned, undeservedly so in my opinion.

So here we have a task at hand. Quite a large one, if you ask me, but doable: break out of the shackles of the classical forms of information literacy, include research data management in these courses or curricula as well, and work towards digital literacy courses.

Document types in databases

Currently I am reading with a great deal of interest the arXiv preprint of the review by Ludo Waltman (2015) from CWTS on bibliometric indicators. In this post I want to provide a brief comment on his section 5.1, where he discusses the role of document types in bibliometric analyses. Waltman mainly reviews and comments on the inclusion or exclusion of certain document types in bibliometric analyses; he does not touch upon the subject of discrepancies between databases. I want to argue that he could take his review a step further in this area.

Web of Science and Scopus differ quite a bit from each other in how they assign document types. If you don’t realize that this discrepancy exists, you can draw wrong conclusions when comparing bibliometric analyses between these databases.

This blogpost is a quick illustration that this is an issue that should be addressed in a review like this. To illustrate my argument I looked at the document types assigned to Nature publications from 2014 in Web of Science and Scopus. The following table gives an overview of the results:

Document Types Assigned to 2014 Nature Publications in Web of Science and Scopus

WoS document type     # Publications    Scopus document type    # Publications
Editorial Material               833    Editorial                          244
Article                          828    Article                           1064
                                        Article in Press                    56
News Item                        371
Letter                           272    Letter                             253
Correction                       109    Erratum                             97
Book Review                      102
Review                            34    Review                              51
Biographical Item                 13
Reprint                            3
                                        Note                               600
                                        Short Survey                       257
Total                           2565    Total                             2622

In the first place, Scopus yields a few more publications for Nature in 2014 than Web of Science does. The difference can be explained by the articles in press that are still present in the Scopus search results. These probably still require maintenance from Scopus and should be corrected.

More importantly, WoS assigns 833 publications as “editorial material”, whereas Scopus assigns only 244 publications as “editorial”. It is a well-known tactic of journals such as Nature to have articles assigned as editorial material, since this practice artificially boosts their impact factor. I have had many arguments with professors whose invited News and Views items (most often very well cited!) were excluded from bibliometric analyses because they fell into the “editorial material” category.

“Letters” and “corrections” or “errata” are of the same order of magnitude in Web of Science and Scopus. “News items” are a category of publications in Web of Science but not in Scopus; they are probably listed as “notes” in Scopus, and some of the “short surveys” in Scopus turn up in Web of Science as “news items”. But all these categories probably don’t affect bibliometric analyses too much.

The discrepancy in “reviews” between Web of Science and Scopus, however, is important, and large as well. Web of Science assigns 34 articles as a “review”, whereas Scopus counts 51 “reviews” in the same journal over the same period. Reviews are included in bibliometric analyses, and since they attract relatively more citations than ordinary articles, special baselines are constructed for this document type. Comparisons between these databases are therefore foremost affected by differences in document type assignment.

The differences in editorial material, articles and reviews between Web of Science and Scopus are most likely to affect the outcomes of comparisons in bibliometric analyses between these two databases. But I am not sure about the size of this effect. I would love to see some more quantitative studies in the bibliometrics arena investigating this issue.
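Such a comparison could also be made reproducible by generating the tallies directly from database exports. Here is a minimal sketch in Python, assuming hypothetical CSV exports with a "Document Type" column; the file names, the column label and the WoS-to-Scopus label mapping below are my own assumptions, not official crosswalks between the databases:

```python
# Sketch: tally and compare document types in two database exports.
# "wos_nature_2014.csv" and "scopus_nature_2014.csv" are hypothetical
# file names; real exports use different column labels per database.
import csv
from collections import Counter

def count_doc_types(path, column="Document Type"):
    """Count how often each document type occurs in a CSV export."""
    with open(path, newline="", encoding="utf-8") as f:
        return Counter(row[column] for row in csv.DictReader(f))

# Rough mapping of roughly equivalent labels (an assumption,
# not an official crosswalk between the two databases).
LABEL_MAP = {
    "Editorial Material": "Editorial",
    "Article": "Article",
    "Letter": "Letter",
    "Correction": "Erratum",
    "Review": "Review",
}

def compare(wos_counts, scopus_counts, mapping=LABEL_MAP):
    """Return {WoS label: (WoS count, Scopus count, difference)}."""
    return {
        wos_label: (
            wos_counts.get(wos_label, 0),
            scopus_counts.get(scopus_label, 0),
            wos_counts.get(wos_label, 0) - scopus_counts.get(scopus_label, 0),
        )
        for wos_label, scopus_label in mapping.items()
    }
```

Feeding `compare()` the counts from two such exports would immediately flag the editorial material and review discrepancies discussed above, and the mapping makes explicit which labels one chooses to treat as equivalent.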



Waltman, Ludo (2015). A review of the literature on citation impact indicators. arXiv preprint.

Springer and Macmillan merger : some observations

The proposed merger between Springer and Macmillan came as a surprise to me. Two big brands are coming together. However, if you look purely at the number of journals, Macmillan is a midget compared to Springer, and combined they are probably slightly bigger than Elsevier. It is the brand value of Nature and the Nature Publishing Group (NPG) that might shine on Springer and its journals if this merger is managed well. Imagine a cascading peer review system for articles turned down by Nature that covers the complete Springer portfolio rather than the NPG journals only. That would give the Springer journals an enormous boost. Given the number of journals involved, this planned merger will probably not be stopped by the anti-cartel watchdogs.

What has not been mentioned in most press releases is the fact that this deal will certainly create the most profitable Open Access publisher in the world. Springer acquired BioMed Central some years ago and is ferociously expanding its own Springer Open brand and platform. Macmillan’s Nature Publishing Group acquired the Swiss Frontiers in early 2013. Frontiers showed a healthy growth from 2,500 articles in 2006 to 11,000 in 2014. The combined number of Open Access articles published by Springer Open, BioMed Central, Frontiers and the Nature Open Access journals (Nature Communications, Nature Reports) still does not top that of the Public Library of Science (PLoS). However, the revenue in article processing charges for this portfolio easily surpasses that of PLoS. For the Netherlands I estimated the national APC paid to the largest publishers in early 2013. This new merger is the largest in turnover, simply because they charge the highest Gold APCs.

It is interesting to look at books as well. I have no figures at hand, but Springer publishes around 6,000 scholarly books per year. The number published by Macmillan is likely to be a lot smaller, but complementary, since Macmillan has a much better penetration in the textbook market. If Springer learns from Macmillan how to produce textbooks, rather than purely scholarly books, its earnings will increase considerably.

What amazes me, however, is the fact that Digital Science is not part of the deal. Springer is still a bit of a traditional publisher, and so is Macmillan: books and journals abound, and they are the mainstay of their business model. Admittedly, Springer has acquired Papers, a competitor to EndNote and Mendeley. Digital Science, however, is the collection of start-ups from Nature and Macmillan, with a whole portfolio of new and exciting things: ReadCube, Figshare, Altmetric, Symplectic and many more. Those are really the jewels in the crown, but they are not part of the merger, and Springer is going to miss them badly.

Open Access journal article processing charges

Article processing charges (APC) of Gold Open Access journals are very often deeply hidden in journal websites. Sometimes they aren’t even stated on the journal website, e.g. “For inquiries relating to the publication fee of articles, please contact the editorial office“. The lack of good overviews hinders research into APCs across different publishers and journals. To my knowledge there is only the Eigenfactor APC overview that provides a reasonable amount of information, but it is already getting outdated. The DOAJ used to have at least a list of free journals, but that is currently no longer available due to the restructuring of DOAJ. For this reason I have made a small start at collecting the article processing charges of some major Open Access publishers. I invite anybody to add more journals from any Open Access publisher. Most interesting, of course, is the price information for journals listed in Web of Science or Scopus. Please inform others and help to complete this list. Anybody with the link can edit the file.
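Once such a list exists, summarising it is straightforward. A small sketch, assuming a hypothetical CSV export of the shared sheet with "Publisher", "Journal" and "APC (USD)" columns; these column names and the file name are my assumptions, and the actual sheet layout may differ:

```python
# Sketch: summarise a crowd-sourced APC list per publisher.
# "apc_list.csv" is a hypothetical export of the shared spreadsheet.
import csv
from collections import defaultdict
from statistics import median

def apc_per_publisher(rows):
    """Group APC amounts by publisher and return the median charge each."""
    by_publisher = defaultdict(list)
    for row in rows:
        fee = row.get("APC (USD)", "").strip()
        if fee:  # skip journals whose fee is not (yet) known
            by_publisher[row["Publisher"]].append(float(fee))
    return {pub: median(fees) for pub, fees in by_publisher.items()}

def load_apcs(path="apc_list.csv"):
    """Read the exported sheet and compute per-publisher medians."""
    with open(path, newline="", encoding="utf-8") as f:
        return apc_per_publisher(csv.DictReader(f))
```

Medians rather than means keep a few very expensive flagship journals from skewing the per-publisher picture, which matters when comparing portfolios as different as PLoS and the merged Springer stable.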

2014-11-30: Ross Mounce collected information on journal APCs in 2012 as well, in his blogpost “A visualization of Gold Open Access options”.
2014-11-30: Added all the “free” OA journals based on the information provided by DOAJ in February 2014, and corrected information where necessary.
2014-11-30: Changed the settings of the file with all the information so anybody can edit it.