Document types in databases

I am currently reading, with a great deal of interest, the arXiv preprint of the review by Ludo Waltman (2015) from CWTS on bibliometric indicators. In this post I want to comment briefly on his section 5.1, where he discusses the role of document types in bibliometric analyses. Ludo mainly reviews and comments on the inclusion or exclusion of certain document types in bibliometric analyses; he does not touch upon the subject of discrepancies between databases. I want to argue that he could take his review a step further in this area.

Web of Science and Scopus differ quite a bit from each other in how they assign document types. If you don’t realize that this discrepancy exists, you can draw wrong conclusions when you compare bibliometric analyses based on these two databases.

This blog post is a quick illustration that this is an issue that should be addressed in a review like this. To illustrate my argument I looked at the document types assigned to Nature publications from 2014 in Web of Science and Scopus. The following table gives an overview of the results:

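Document Types Assigned to 2014 Nature Publications in Web of Science and Scopus

| WoS document type  | # Publications | Scopus document type | # Publications |
|--------------------|----------------|----------------------|----------------|
| Editorial Material | 833            | Editorial            | 244            |
| Article            | 828            | Article              | 1064           |
|                    |                | Article in Press     | 56             |
| News item          | 371            |                      |                |
| Letter             | 272            | Letter               | 253            |
| Correction         | 109            | Erratum              | 97             |
| Book Review        | 102            |                      |                |
| Review             | 34             | Review               | 51             |
| Biographical Item  | 13             |                      |                |
| Reprint            | 3              |                      |                |
|                    |                | Note                 | 600            |
|                    |                | Short Survey         | 257            |
| Total              | 2565           |                      | 2622           |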

First of all, Scopus yields 57 more Nature publications for 2014 than Web of Science does (2622 versus 2565). The 56 articles in press that are still present in the Scopus search results account for nearly all of this difference. These records probably still await maintenance by Scopus and should be corrected.
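The totals and the gap between the two databases can be checked with a few lines of Python. This is only a sketch: the counts are copied from the table above, and the variable names are my own.

```python
# Document type counts for Nature (2014), copied from the table above.
wos = {
    "Editorial Material": 833,
    "Article": 828,
    "News item": 371,
    "Letter": 272,
    "Correction": 109,
    "Book Review": 102,
    "Review": 34,
    "Biographical Item": 13,
    "Reprint": 3,
}
scopus = {
    "Editorial": 244,
    "Article": 1064,
    "Article in Press": 56,
    "Letter": 253,
    "Erratum": 97,
    "Review": 51,
    "Note": 600,
    "Short Survey": 257,
}

wos_total = sum(wos.values())
scopus_total = sum(scopus.values())

# Gap between the databases, with and without the articles in press.
gap = scopus_total - wos_total
gap_without_aip = gap - scopus["Article in Press"]

print(f"WoS total: {wos_total}, Scopus total: {scopus_total}")
print(f"Gap: {gap}, gap without articles in press: {gap_without_aip}")
```

Leaving out the articles in press brings the two totals to within one record of each other.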

More importantly, WoS classifies 833 publications as “editorial material”, whereas Scopus classifies only 244 publications as “editorial”. It is a well-known tactic of journals such as Nature to label articles as editorial material, since this practice artificially boosts their impact factor: citations to these items are counted, while the items themselves are not counted as citable items. I have had many arguments with professors whose invited News and Views items (most often very well cited!) were assigned to the “editorial material” category and therefore left out of bibliometric analyses.
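To see why the classification matters for the impact factor, here is a small sketch. It assumes the usual form of the indicator, in which citations to all items go into the numerator while only "citable items" go into the denominator. All numbers are hypothetical and chosen for round arithmetic; they are not Nature's figures.

```python
def impact_factor(citations: int, citable_items: int) -> float:
    """Citations received divided by the number of citable items."""
    return citations / citable_items


# Hypothetical journal (illustrative numbers only).
research_papers = 100          # items classified as article or review
front_matter_items = 20        # e.g. news-and-views style pieces
citations_to_papers = 400
citations_to_front_matter = 100

total_citations = citations_to_papers + citations_to_front_matter

# Case 1: front matter is labelled "editorial material" and is therefore
# not a citable item, yet its citations still count in the numerator.
if_front_matter_excluded = impact_factor(total_citations, research_papers)

# Case 2: the same items are labelled "article" and do count as citable.
if_front_matter_included = impact_factor(
    total_citations, research_papers + front_matter_items
)

print(f"Front matter as editorial material: {if_front_matter_excluded:.2f}")
print(f"Front matter as articles:           {if_front_matter_included:.2f}")
```

With these made-up numbers the indicator drops from 5.00 to 4.17 once the front matter counts as citable, which is the mechanism behind the boost described above.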

“Letters” and “corrections” or “errata” are of the same order of magnitude in Scopus and Web of Science. “News item” is a document type in Web of Science but not in Scopus; these publications are probably listed as “note” in Scopus. Some of the “short surveys” in Scopus turn up in Web of Science as “news item”. But these categories probably don’t affect bibliometric analyses too much.

The discrepancy in “reviews” between Web of Science and Scopus, however, is important, and large as well. Web of Science classifies 34 publications as “review”, whereas Scopus counts 51 “reviews” in the same journal over the same period. Reviews are included in bibliometric analyses, and since they attract relatively more citations than ordinary articles, special baselines are constructed for this document type. Comparisons between the two databases are therefore affected first and foremost by these differences in document type assignment.

The differences in editorial material, articles and reviews between Web of Science and Scopus are the ones most likely to affect the outcomes of comparisons in bibliometric analyses between these two databases. But I am not sure about the size of this effect. I would love to see some more quantitative studies in the bibliometrics arena to investigate this issue.
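As a first rough look at the size of these differences, the three document types can be put side by side. The sketch below uses the counts from the table; pairing WoS “Editorial Material” with Scopus “Editorial” is my own assumption about which labels correspond, and the function name is made up for this post.

```python
# (WoS count, Scopus count) per document type, taken from the table.
# Pairing "Editorial Material" with "Editorial" is an assumed crosswalk.
pairs = {
    "editorial": (833, 244),
    "article": (828, 1064),
    "review": (34, 51),
}


def compare(counts):
    """Return the absolute difference and the Scopus/WoS ratio per type."""
    result = {}
    for label, (wos_n, scopus_n) in counts.items():
        result[label] = {
            "diff": scopus_n - wos_n,
            "ratio": scopus_n / wos_n,
        }
    return result


comparison = compare(pairs)
for label, stats in comparison.items():
    print(f"{label:10s} diff={stats['diff']:+5d} ratio={stats['ratio']:.2f}")
```

Scopus has one and a half times as many reviews as Web of Science for the same journal and year, and well under a third of the editorial items. This says nothing yet about citation impact; that would need the citation counts per record.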



Waltman, Ludo (2015). A review of the literature on citation impact indicators.

Springer and Macmillan merger: some observations

The proposed merger between Springer and Macmillan came as a surprise to me. Two big brands are coming together. However, if you look purely at the number of journals, Macmillan is a small player compared to Springer, and combined they are probably slightly bigger than Elsevier. It is the brand value of Nature and the Nature Publishing Group (NPG) that might shine on Springer and its journals if this merger is managed well. Imagine a cascading peer review system that passes articles turned down by Nature on to the complete Springer portfolio rather than to the NPG journals only. That would give the Springer journals an enormous boost. Judging by the number of journals involved, this planned merger will probably not be stopped by the anti-cartel watchdogs.

What has not been mentioned in most press releases is the fact that this deal will surely create the most profitable Open Access publisher in the world. Springer already acquired BioMed Central some years ago, and is ferociously expanding its own Springer Open brand and platform. Macmillan’s Nature Publishing Group acquired the Swiss publisher Frontiers in early 2013. Frontiers showed healthy growth, from 2,500 articles in 2006 to 11,000 in 2014. The combined number of Open Access articles published by Springer Open, BioMed Central, Frontiers and the Nature Open Access journals (Nature Communications, Scientific Reports) still does not top that of the Public Library of Science (PLoS). However, the revenue in Article Processing Charges (APCs) for this portfolio easily surpasses that of PLoS. For the Netherlands I made an estimate of the national APCs paid to the largest publishers in early 2013. The new combination is the largest in turnover simply because it charges the highest Gold APCs.

It is interesting to look at books as well. I have no exact figures at hand, but Springer publishes around 6000 scholarly books per year. The number published by Macmillan is likely to be a lot smaller, but complementary, since Macmillan has a much better penetration in the textbook market. If Springer learns from Macmillan how to produce textbooks, rather than purely scholarly books, its earnings will increase considerably.

What amazes me, however, is the fact that Digital Science is not part of the deal. Springer is still a bit of a traditional publisher, and so is Macmillan. Books and journals abound; they are the mainstay of their business model. Okay, Springer has acquired Papers, a competitor to EndNote and Mendeley. Digital Science, however, is the collection of start-ups from Nature and Macmillan, with a whole portfolio of new and exciting things: ReadCube, Figshare, Altmetric, Symplectic and many more. Those are really the jewels in the crown, but they are not part of the merger, and Springer is going to miss them badly.