Research data information literacy and digital literacy

Following the blogpost of Yasmeen ShorishData, data everywhere…but do we want to drink? The role of data, digital curation, and scholarly communication in academic libraries.” Got me thinking on the curriculum of information literacy in academic libraries. Shorish:

This means that academic libraries must incorporate the work of data information literacy into their existing information literacy and scholarly communication missions, else risk excluding these data librarian positions from the natural cohort of colleagues doing that work, or risk overextending the work of the library.

Information literacy is one of the core activities of information specialists, but usually only aimed at students, ideally graduate students as well and perhaps post grads, but certainly not the researchers or teaching faculty of the institution. Including  research data management under the umbrella of information literacy reinforces the position of library information specialists and bring their complete information literacy offereings under the attention of faculty as well. The data literacy skills help to “sell” information literacy to faculty as well.

Some information specialists might be caught off guard by the new required skills set mentioned by Shorish “experience with SPSS, R, Python, statistics and statistical literacy, and/or data visualization software find their way into librarian position descriptions“. This brings me to the third aspect of information literacy, I would broaden this to the digital skills set, or digital literacy as mentioned in NMC Horizon report 2015. But exactly in this part of the report  research libraries are not mentioned. Undeservedly so in my opinion.

So here we have a task at hand. Quite a large one, if you ask me, but doable. Break out of the shackles of the classical forms of information literacy, include research data management in these courses, or curricula as well and work towards digital literacy courses.

Document types in databases

Currently I am reading with a great deal of interest the Arxiv preprint of the review by Ludo Waltman (2015) from CWTS on bibliometric indicators. In this post I want to provide a brief comment on his section 5.1 where he discusses the role document types in bibliometric analyses. Ludo mainly reviews and comments on the inclusion or exclusion of certain document types in bibliometric analyses, he does not touch upon the subject of discrepancies between databases. I want to argue that he could take his review a step further in this area.

Web of Science and Scopus, differ quite a bit from each other on how they assign document types. If you don’t realize that this discrepancy exists, you can draw wrong conclusions when bibliometric analyses between these databases are studied.

This blogpost is a quick illustration that this is an issue that should be adressed in a review like this. To illustrate my argument I looked to the document types assigned to Nature publications from 2014 in Web of Science and Scopus. The following tables gives an overview of the results:

Document Types Assigned to 2014 Nature Publications in Web of Science and Scopus
WoS Document type # Publications Scopus Document type # Publications
Editorial Material 833 Editorial 244
Article 828 Article 1064
Article in Press 56
News item 371
Letter 272 Letter 253
Correction 109 Erratum 97
Book Review 102
Review 34 Review 51
Biographical Item 13
Reprint 3
Note 600
Short Survey 257
Total 2565 2622

In the first place Scopus yields for the year 2014 a few more publications for Nature than Web of Science does. The difference can be explained by the articles in press that are still present in the Scopus search results. This probably still requires maintenance from Scopus and should be corrected.

More importantly WoS assigns 833 publications as “editorial material” whereas Scopus assigns only 244 publications as “editorial”. It is a well known tactic from journals such as Nature to assign articles as editorial material, since this practice artificially boosts their impact factor. I have had many arguments with professors whose invited news and views items (most often very well cited!) were not included in bibliometric analyses since they were assigned to “editorial material” category and therefore not included in the analysis.

“Letters”, “corrections” or “errata” are in the same order of size between Scopus and Web of Science. “News Items” are a category of publications in Web of Science, but not in Scopus. They are probably listed as “note” in Scopus. Some of the “short surveys” in Scopus turn up in Web of Science as “news item”. But all these categories probably don’t affect bibliometric analyses too much.

The discrepancy in “reviews” between Web of Science and Scopus however is important. And large as well. Web of Science assigns 34 articles as a “review”, whereas Scopus counts 51 “reviews” in the same journal over the same period. Reviews are included in bibliometric analyses, and since the attract relatively more citations than oridinary articles, special baselines are construed for this document type. But comparisons between these databases are foremost affected by differences in document assignation between these databases.

The differences in editorial material, articles and reviews between Web of Science and Scopus are most likely to the affect outcomes of comparisons in bibliometric analyses between these two databases. But I am not sure about the size of this effect. I would love to see some more quatitative studies in the bibliometrics arena to investigate this issue.



Waltman, Ludo (2015). A review of the literature on citation impact indicators.

Springer and Macmillan merger : some observations

The proposed merger between Springer and Macmillan came as a surprise to me. They are two big brands that come together. However if you look purely at figures in number of journals Macmillan is a midget compared to Springer and combined they are probably slightly bigger than Elsevier. It is the brand value of Nature and the Nature Publishing Group (NPG) that might shine on Springer and its journals if this merger is managed well. Imagine a cascading peer review system for turned down articles from Nature to the complete Springer portfolio rather than the NPG journals only. That would give the Springer journals an enormous boost. In number of journals involved this planned merger will probably not be stopped by the anti-cartel watchdogs.

What has not been mentioned in most press releases is the fact that this deal will for sure create the most profitable Open Access publisher in the world. Springer already acquired BioMed Central some years ago, and is expanding ferociously its own Springer Open brand and platform. Macmillan’s Nature Publishing Group acquired the Swiss Frontiers early 2013. Frontiers showed a healthy growth from 2,500 article in 2006 to 11,000 in 2014. The combined numbers of Open Access articles published by Springer Open, BioMed Central, Frontiers and the Nature Open Access journals (Nature Communications, Nature Reports) is still not topping that of Public Library of Science (PLoS). However the revenue in Article Processing Charges for this portfolio easily surpasses that of PLoS. For the Netherlands I made an estimate for the national APC paid to the largest publishers in early 2013. This new merger is the largest in turnover simply because they charge the highest Gold APC.

Interesting as well is to look at books, I have no figures at hand, but Springer publishes around 6000 scholarly books per year. The number by Macmillan likely to be a lot smaller, but complementary since Macmillan has a much better penetration in the textbook market. If Springer will learn from Macmillan to produce text books, rather than purely scholarly books, their earnings will increase considerably.

What amazes me however, is the fact that Digital Science is not part of the deal. Springer is still a bit of a traditional publisher and so is Mamillan. Books and journals abound it is the mainstay of their businessmodel. Okay Springer have acquired Papers, as competitor to EndNote and Mendeley. Digital Science however, is the collection of start ups from Nature and Macmillan, they have a whole portfolio of new and exciting things, Readcube, Figshare, Altmetric, Symplectic and many more. Those are really the jewels in the crown, but they are not part of the merger and Springer will badly gonna miss them.

Open Access journal article processing charges

OA logoArticle Processing Charges (APC) of Gold Open Access journals are very often deeply hidden in journal websites. Sometimes they aren’t even stated on the journal website, eg. “For inquiries relating to the publication fee of articles, please contact the editorial office“. The lack of good overviews hinders research into APCs between different publishers and journals. To my knowledge there is only the Eigenfactor APC overview that provides a reasonable amount of information, but is already getting outdated. The DOAJ used to have at least a lost of free journals, but that is currently no longer available, due to the restructuring of DOAJ. For this reason I have made a small start to collect the article processing charges of some major Open Access publishers. I do invite anybody to add more journals from any Open Access publishers. However most interesting are of course the price information of journals listed in Web of Science or Scopus. Please inform others and help to complete this list. Anybody with the link can edit the file.

2014-11-30: Ross Mounce did collect information on journal APC as well in 2012 in his blogpost A visualization of Gold Open Access options
2014-11-30: Added all the “free” OA journals based on the information provided by DOAJ in February 2014, and corrected information where necessary.
2014:11-30: Changed the settings of the file with all the information so anybody can edit.

The invisible web is still there, and it is probably larger than ever

Book review: Devine, J., & Egger-Sider, F. (2014). Going beyond Google again : strategies for using and teaching the Invisible Web. Chicago: Neal-Schuman, an imprint of the American Library Association. ISBN 9781555708986, 180p.

Going Beyond Google Again: Strategies for Using and Teaching the Invisible Web

The invisible web, as we know it, dates back to at least 2001. In that year both Sherman & Price (2001) as well as Bergman (2001) came out with two studies describing the whole issue surrounding the deep, or invisible web, for the first time. These two seminal studies each used a different term to indicate the same concept, invisible and deep, but both described independently from each other convincingly that there was more information available that ordinary search engines can see.

Later on Lewandowski & Mayr (2006) showed that Bergmann perhaps overstated the size of the actual problem, but it certainly remained a problem for those unaware of the whole issue. Whilst Ford & Mansourian (2006) added the concept of the “cognitive inivisbility”, i.e. everything beyond page 1 in the Google results page. Since then very little has happened in the research on this problem in the search or information retrieval community. The notion of “deep web” has continued to receive some interest in the computer sciences, where they look into query expansion and data mining to alleviate the problems. But ground breaking scientific studies on this subject in the area of information retrieval or LIS have been scanty.

The authors of the current book Devine and Egger-Sider have been involved with the invisible web already since 2004 (Devine & Egger-Sider, 2004; Devine & Egger-Sider, 2009). Their main concern is to get the concept of the invisible web in the curriculum for information literacy. The current book documents a major survey in this area. For the purpose of getting the invisible web in the information literacy curriculum they maintain a useful website with invisible web discovery tools.

The current book is largely a repetition of their previous book (Devine & Egger-Sider, 2009). However two major additions to the notion of the invisible web have been added. Web 2.0 or the social web, and the mobile or the apps web. The first concept I was aware of and used it in classes for information professionals in the Netherlands for quite a long time already. The second concept was an eye opener for me. I did realize that search on mobile devices was different, more personalized than anything else, but I had not categorized it as a part of the invisible web.

Where Devine and Egger-Sider (2014) disappoint is that the proposed solutions, curricula etc, only address the invisible as a database problem. Identify the right databases and perform your searches. Make students and scholars aware of the problem, guide them to the additional resources and the problem is solved. However, no solution whatsoever, is provided to solve the information gap due to the social web or the mobile web. On this part the book does not add anything to the version from 2009.

Another notion of the ever increasing invisible web as we know it, concerns grey literature. Scholarly output in the form of peer reviewed articles or books are reasonably well covered by (web) search engines and library subscribed A&I databases, but to retrieve the grey literature still remains a major problem. The whole notion of grey literature is mentioned in this book. Despite the concern about the invisible or deep web, they also fail to stress the advantages that full scale web search engines have brought. Previously we only had the indexed bibliographic information to search whereas web search engines brought us full text search. Full text search, while not being superior, has brought us new opportunities and sometimes improved retrieval as well.

The book is not entirely up to date. The majority of the reference are up to date to 2011, only a few 2012 let alone 2013 references are included. Apparently the book took a long time to write and produce. But what is really lacking is a suitable accompanying website. The many URLs provided in the book on a short list would have been helpful to probably many readers. For the time being we have to do it with their older webpage which is less comprehensive than the complete collection of sources mentioned in this edition.

Where the book completely fails is the inclusion of the darknet. Since Wikileaks and Snowden we should be aware that even more is going on in the invisible web than ever before. Devine & Egger Sider, only mention the darknet or dark web as an area not to treat. This is slightly disappointing.

If you have already the 2009 edition of this book, there is no need to upgrade to the current version.

Bergman, M.K. (2001). White Paper: The Deep Web: Surfacing Hidden Value. The Journal of Electronic Publishing, 7(1).
Devine, J., & Egger-Sider, F. (2004). Beyond Google : The invisible Web in the academic library. The Journal of Academic Librairianship, 30(4), 265-269.
Devine, J., & Egger-Sider, F. (2009). Going beyond Google : the invisible web in learning and teaching. London: Facet Publishing. 156p.
Devine, J., & Egger-Sider, F. (2014). Going beyond Google again : strategies for using and teaching the Invisible Web. Chicago: Neal-Schuman, an imprint of the American Library Association. 180p.
Lewandowski, D., & Mayr, P. (2006). Exploring the academic invisible web. Library Hi Tech, 24(4), 529-539. OA version:
Sherman, C., & Price, G. (2001). The invisible web: Discovering information sources search engines can’t see. Medford NJ, USA: Information today. 439p.
Ford, N., & Mansourian, Y. (2006). The invisible web: An empirical study of “cognitive invisibility”. Journal of Documentation, 62(5), 584-596.

Other reviews for this book
Malone, A. (2014). Going Beyond Google Again: Strategies for Using and Teaching the Invisible Web, Jane Devine, Francine Egger-Sider. Neal-Schuman, Chicago (2014), ISBN: 978-1-55570-898-6. The Journal of Academic Librarianship, 40(3–4), 421.
Mason, D. (2014). Going Beyond Google Again: Strategies for Using and Teaching the Invisible Web. Online Information Review, 38(7), 992-993.
Stenis, P. (2014). Going Beyond Google Again: Strategies for Using and Teaching the Invisible Web. Reference & User Services Quarterly, 53(4), 367-367.
Sweeper, D. (2014). A Review of “Going Beyond Google Again: Strategies for Using and Teaching the Invisible Web”. Journal of Electronic Resources Librarianship, 26(2), 154-155.