Springer and Macmillan merger: some observations

The proposed merger between Springer and Macmillan came as a surprise to me. Two big brands are coming together. However, if you look purely at the number of journals, Macmillan is a midget compared to Springer, and combined they are probably slightly bigger than Elsevier. It is the brand of Nature and Nature Publishing Group that might shine on Springer and its journals, if this merger is managed well. Imagine a cascading peer review system that passes articles turned down by Nature on to the complete Springer portfolio rather than to the NPG journals only. That would give those Springer journals an enormous boost. Measured in number of journals, this merger will probably not be stopped by the anti-cartel watchdogs.

What has not been mentioned in most press releases is the fact that this deal will certainly create the most profitable Open Access publisher in the world. Springer already acquired BioMed Central some years ago, and is ferociously expanding its own SpringerOpen brand and platform. Macmillan's Nature Publishing Group acquired the Swiss Frontiers in early 2013 (http://www.nature.com/press_releases/npgfrontiers.html). Frontiers showed healthy growth, from 2,500 articles in 2006 to 11,000 in 2014. The combined number of Open Access articles published by SpringerOpen, BioMed Central, Frontiers and the Nature Open Access journals (Nature Communications, Nature Reports) still does not top that of the Public Library of Science (PLoS). However, the revenue in Article Processing Charges for this portfolio easily surpasses that of PLoS. For the Netherlands I made an estimate, in early 2013, of the national APCs paid to the largest publishers. This new combination is the largest in turnover simply because they charge the highest Gold APCs.

It is interesting to look at books as well. I have no exact figures at hand, but Springer publishes around 6,000 scholarly books per year. The number published by Macmillan is likely to be a lot smaller, but complementary, since Macmillan has a much better penetration of the textbook market. If Springer learns from Macmillan how to produce textbooks, rather than purely scholarly books, their earnings will increase considerably.

What amazes me, however, is that Digital Science is not part of the deal. Springer is still a bit of a traditional publisher, and so is Macmillan: books and journals are the mainstay of their business model. Admittedly, Springer has acquired Papers, a competitor to EndNote and Mendeley. Digital Science, however, is the collection of start-ups from Nature and Macmillan; they have a whole portfolio of new and exciting products: ReadCube, Figshare, Altmetric, Symplectic and many more. Those are really the jewels in the crown, but they are not part of the merger, and Springer is going to miss them badly.

Open Access journal article processing charges

Article Processing Charges (APCs) of Gold Open Access journals are very often deeply hidden in journal websites. Sometimes they aren't even stated on the journal website, e.g. "For inquiries relating to the publication fee of articles, please contact the editorial office". The lack of good overviews hinders comparative research into APCs across publishers and journals. To my knowledge there is only the Eigenfactor APC overview that provides a reasonable amount of information, but it is already getting outdated. The DOAJ used to have at least a list of free journals, but that is currently no longer available, due to the restructuring of DOAJ. For this reason I have made a small start at collecting the article processing charges of some major Open Access publishers. I invite anybody to add more journals from any Open Access publisher. Most interesting, of course, is the price information for journals listed in Web of Science or Scopus. Please inform others and help to complete this list. Anybody with the link can edit the file.
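Once the shared file fills up, even simple scripts can produce the comparisons that are currently so hard to make. A minimal sketch in Python, with made-up sample rows standing in for the real spreadsheet data (the figures and column names here are illustrative only, not actual list prices):

```python
import csv
from io import StringIO

# Hypothetical sample of the kind of rows the shared APC file contains;
# real figures should come from the collaboratively edited spreadsheet.
SAMPLE = """publisher,journal,apc_usd
PLoS,PLOS ONE,1350
BioMed Central,BMC Biology,2425
Frontiers,Frontiers in Psychology,1600
"""

def mean_apc_per_publisher(csv_text):
    """Average the listed APCs for each publisher."""
    charges = {}
    for row in csv.DictReader(StringIO(csv_text)):
        charges.setdefault(row["publisher"], []).append(float(row["apc_usd"]))
    return {pub: sum(v) / len(v) for pub, v in charges.items()}

print(mean_apc_per_publisher(SAMPLE))
```

With a column for Web of Science or Scopus coverage added, the same few lines would let anyone compare charges for indexed journals only.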

Updates:
2014-11-30: Ross Mounce collected information on journal APCs as well, in 2012, in his blog post A visualization of Gold Open Access options.
2014-11-30: Added all the “free” OA journals based on the information provided by DOAJ in February 2014, and corrected information where necessary.
2014-11-30: Changed the settings of the file with all the information so anybody can edit.

The invisible web is still there, and it is probably larger than ever

Book review: Devine, J., & Egger-Sider, F. (2014). Going beyond Google again : strategies for using and teaching the Invisible Web. Chicago: Neal-Schuman, an imprint of the American Library Association. ISBN 9781555708986, 180p.

Going Beyond Google Again: Strategies for Using and Teaching the Invisible Web

The invisible web, as we know it, dates back to at least 2001. In that year both Sherman & Price (2001) and Bergman (2001) came out with studies describing the whole issue surrounding the deep, or invisible, web for the first time. These two seminal studies each used a different term, invisible and deep, for the same concept, but both described, independently of each other and convincingly, that there was more information available than ordinary search engines can see.

Later on, Lewandowski & Mayr (2006) showed that Bergman perhaps overstated the size of the actual problem, but it certainly remained a problem for those unaware of the whole issue, while Ford & Mansourian (2006) added the concept of “cognitive invisibility”, i.e. everything beyond page 1 of the Google results. Since then very little has happened in research on this problem in the search or information retrieval community. The notion of the “deep web” has continued to receive some interest in computer science, where researchers look into query expansion and data mining to alleviate the problems. But groundbreaking scientific studies on this subject in the area of information retrieval or LIS have been scarce.

The authors of the current book, Devine and Egger-Sider, have been involved with the invisible web since 2004 (Devine & Egger-Sider, 2004; Devine & Egger-Sider, 2009). Their main concern is to get the concept of the invisible web into the curriculum for information literacy, and the current book documents a major survey in this area. For that purpose they maintain a useful website with invisible web discovery tools.

The current book is largely a repetition of their previous book (Devine & Egger-Sider, 2009). However, two major additions to the notion of the invisible web have been made: Web 2.0, or the social web, and the mobile, or apps, web. Of the first concept I was aware, and I had used it in classes for information professionals in the Netherlands for quite a long time already. The second concept was an eye-opener for me. I did realize that search on mobile devices was different, more personalized than anything else, but I had not categorized it as part of the invisible web.

Where Devine and Egger-Sider (2014) disappoint is that the proposed solutions, curricula and so on only address the invisible web as a database problem: identify the right databases and perform your searches there. Make students and scholars aware of the problem, guide them to the additional resources, and the problem is solved. However, no solution whatsoever is provided for the information gap caused by the social web or the mobile web. On this point the book adds nothing to the 2009 edition.

Another part of the ever-increasing invisible web concerns grey literature. Scholarly output in the form of peer-reviewed articles or books is reasonably well covered by (web) search engines and library-subscribed A&I databases, but retrieving grey literature remains a major problem. The notion of grey literature is mentioned in this book. Despite their concern about the invisible or deep web, the authors also fail to stress the advantages that full-scale web search engines have brought. Previously we could search only the indexed bibliographic information, whereas web search engines brought us full-text search. Full-text search, while not superior, has brought us new opportunities and sometimes improved retrieval as well.

The book is not entirely up to date. The majority of the references date from 2011 or earlier; only a few 2012, let alone 2013, references are included. Apparently the book took a long time to write and produce. What is really lacking, though, is a suitable accompanying website. A short list of the many URLs provided in the book would have been helpful to many readers. For the time being we have to make do with their older webpage, which is less comprehensive than the complete collection of sources mentioned in this edition.

Where the book completely fails is in its treatment of the darknet. Since WikiLeaks and Snowden we should be aware that even more is going on in the invisible web than ever before. Devine & Egger-Sider only mention the darknet, or dark web, as an area they will not treat. This is slightly disappointing.

If you already have the 2009 edition of this book, there is no need to upgrade to the current version.

References
Bergman, M.K. (2001). White Paper: The Deep Web: Surfacing Hidden Value. The Journal of Electronic Publishing, 7(1). http://dx.doi.org/10.3998/3336451.0007.104
Devine, J., & Egger-Sider, F. (2004). Beyond Google : The invisible Web in the academic library. The Journal of Academic Librarianship, 30(4), 265-269. http://dx.doi.org/10.1016/j.acalib.2004.04.010
Devine, J., & Egger-Sider, F. (2009). Going beyond Google : the invisible web in learning and teaching. London: Facet Publishing. 156p.
Devine, J., & Egger-Sider, F. (2014). Going beyond Google again : strategies for using and teaching the Invisible Web. Chicago: Neal-Schuman, an imprint of the American Library Association. 180p.
Ford, N., & Mansourian, Y. (2006). The invisible web: An empirical study of “cognitive invisibility”. Journal of Documentation, 62(5), 584-596. http://dx.doi.org/10.1108/00220410610688732
Lewandowski, D., & Mayr, P. (2006). Exploring the academic invisible web. Library Hi Tech, 24(4), 529-539. http://dx.doi.org/10.1108/07378830610715392 OA version: http://eprints.rclis.org/9203/
Sherman, C., & Price, G. (2001). The invisible web: Discovering information sources search engines can’t see. Medford, NJ, USA: Information Today. 439p.

Other reviews for this book
Malone, A. (2014). Going Beyond Google Again: Strategies for Using and Teaching the Invisible Web, Jane Devine, Francine Egger-Sider. Neal-Schuman, Chicago (2014), ISBN: 978-1-55570-898-6. The Journal of Academic Librarianship, 40(3–4), 421. http://dx.doi.org/10.1016/j.acalib.2014.03.006
Mason, D. (2014). Going Beyond Google Again: Strategies for Using and Teaching the Invisible Web. Online Information Review, 38(7), 992-993. http://dx.doi.org/10.1108/OIR-10-2014-0228
Stenis, P. (2014). Going Beyond Google Again: Strategies for Using and Teaching the Invisible Web. Reference & User Services Quarterly, 53(4), 367-367. http://dx.doi.org/10.5860/rusq.53n4.367a
Sweeper, D. (2014). A Review of “Going Beyond Google Again: Strategies for Using and Teaching the Invisible Web”. Journal of Electronic Resources Librarianship, 26(2), 154-155. http://dx.doi.org/10.1080/1941126x.2014.910415

Grey Literature at Wageningen UR, the Library, the Cloud(s) and Reporting

A while back I gave a presentation at the offices of SURF during a small-scale seminar on Grey Literature in the Netherlands. The occasion was the visit of Amanda Lawrence to SURF to discuss Grey Literature in the Netherlands.

I was invited to give a presentation of Grey Literature at Wageningen UR. The slides I used are shared in this Slideshare.

While I usually assume that slides tell their own story, it is perhaps a good idea to provide some narrative in this blog post to explain parts that are less obvious. In the first slides I present the university and the research institutes at Wageningen. The student-staff ratios are so favourable because Wageningen UR comprises a university and a number of substantial research institutes that concentrate on research in the life sciences only and have no teaching obligations.

CRIS and repository

At the library we manage two closely intertwined systems for the whole organization. The first is the current research information system (CRIS), called Metis. In Metis we register all output of Wageningen UR faculty and staff. Data entry normally takes place at the chair group level. Most often the secretary of the chair group or business unit is responsible, and the library checks the quality of the data entry and maintains the various lists that facilitate data entry and quality control. Output registration in Metis is truly comprehensive, since evaluations, awards of bonuses, prizes, promotions, research assessments and external peer reviews are all based on the metadata registered in Metis.

All information registered in the CRIS, Metis, is published in our institutional bibliography, called Staff Publications. I prefer the term institutional bibliography, since the term repository is often associated only with Open Access repositories or (open access) institutional repositories, whereas in my view the institutional bibliography is the comprehensive metadata collection for all output of the institution, including, but not limited to, Open Access publications. It goes without saying that datasets are an integral part of research output, and we are starting to register datasets in our systems as well.

The coupling of the CRIS and the institutional bibliography has existed only since 2003. Our bibliography also contains a collection of 90,000 legacy metadata records of lesser quality. Of the 200,000+ items in our repository, 25% are open access items. Looking at the peer-reviewed journal articles registered in our Staff Publications (indicated as WaY in the graph), you can see that the count closely follows the number of articles that can be retrieved from either Scopus or Web of Science. There are differences in coverage between Web of Science and Scopus, but both databases seem to cover Wageningen UR output quite closely. Or not?

Comprehensive registration

In slide 6 I show all metadata registrations of publication output: more than 12,500 items described for the publication year 2010. In that year the number of peer-reviewed articles registered was only around 2,700; we registered nearly 10,000 other items of research output. In slide 7 I present an overview of the various document types registered on top of the peer-reviewed publications. Most important are the “other” articles: articles published in trade or vocational journals. These often relate to the societal role the university and research institutes play. They are aimed at the general public and are therefore very often Open Access as well. Book chapters and reports also amount to very substantial numbers of publications. The reports are most often aimed at the various ministries for which the research institutes work and are most often published as OA reports as well; with book chapters this is often not the case. On a yearly basis they are not so conspicuous, but the PhD theses are nearly all available in Open Access, or in a few cases as delayed Open Access. The other items include presentations, brochures, lectures, patents, and interviews for newspapers, radio or TV. It is all registered, and it all makes a very substantial addition to the peer-reviewed publications.

Dissemination to the Cloud(s)

The institutional bibliography plays a crucial role in the dissemination of information to other parties. All metadata records are indexed in Google and, slightly less well, in Google Scholar; we experience problems with Google Scholar indexing the full text of our Open Access publications, since the full-text files are located on a separate filing system. All Dutch-language publications are disseminated through Groenkennisnet.nl, a portal for education and practitioners in the green sector. Wageningen UR Staff Publications is fully OAI-PMH compatible, and data is disseminated to Narcis, the overarching repository of repositories in the Netherlands. Other repository aggregators include OAIster and BASE. The information is also harvested by the FAO, which plays a pivotal role in the dissemination of agricultural information in the world. All our PhD theses are disseminated to DART-Europe, the Electronic Theses and Dissertations (ETD) portal for Europe. With our retrospectively digitized collection of theses we are the 12th-largest collection of PhD theses in Europe.
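Aggregators such as Narcis and BASE retrieve these records with simple HTTP requests defined by the OAI-PMH protocol. A minimal sketch of how a harvester builds those request URLs; the base URL and set name below are placeholders, not the repository's actual endpoint:

```python
from urllib.parse import urlencode

# Placeholder endpoint; a real harvester would use the repository's
# published OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai"

def list_records_url(metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # Continuation of a paged harvest: only the token may be sent.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:  # optional selective harvesting, e.g. only theses
            params["set"] = set_spec
    return BASE_URL + "?" + urlencode(params)

print(list_records_url(set_spec="phd-theses"))
```

The harvester fetches each URL, parses the XML response, and follows resumption tokens until the full set has been transferred.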

Open Access

The growth of Open Access publications is steady, although we occasionally face setbacks. Last year, for instance, we received claims from photographers whose images had been used in trade journals for illustration purposes, where the IP rights for electronic dissemination had not been properly cleared. We have just passed the mark of 50,000 OA publications. When you look at all deposits of OA material in Dutch repositories, Wageningen UR stands out in depositing current material (slide 10), outperforming any of the other universities (slide 11). Looking at the document types of the recently deposited material, it is immediately apparent that Wageningen deposits relatively large numbers of reports and contributions to periodicals (the trade and vocational journals), and also deposits more conference papers as Open Access publications.

The deposition of green OA peer-reviewed journal articles is not very successful. We don't have an intuitive system in place for researchers to deposit their publications. The library systematically checks the publications to see what we are allowed to do with the publisher's version of each article. First, we look at the DOAJ journal list and actively load those articles into the repository. Second, we look at the Sherpa/Romeo list of publishers that allow delayed archiving of the publisher's PDF. The third list, not truly OA, is of publishers that allow free-to-read access after an embargo period; to these versions we link. A last resort could be to link to deposited material in PMC, but we haven't done that yet. The first two steps lead to 23% of our peer-reviewed journal articles being available in Open Access; steps 3 and 4 still need to be executed.
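The cascade of checks described above can be sketched as a simple decision function. The journal and publisher lists here are tiny hypothetical stand-ins for the real DOAJ and Sherpa/Romeo data, and the returned action labels are made up for illustration:

```python
# Hypothetical stand-ins for the real DOAJ and Sherpa/Romeo lookups.
DOAJ_JOURNALS = {"PLOS ONE", "BMC Biology"}
ROMEO_PDF_ALLOWED = {"Hypothetical Press"}          # publisher PDF may be archived (delayed)
FREE_AFTER_EMBARGO = {"Example Society Publisher"}  # free to read after embargo; link only

def green_oa_action(journal, publisher, in_pmc=False):
    """Return the deposit action for one peer-reviewed article."""
    if journal in DOAJ_JOURNALS:                 # step 1: DOAJ journal
        return "deposit publisher PDF (DOAJ journal)"
    if publisher in ROMEO_PDF_ALLOWED:           # step 2: Sherpa/Romeo
        return "deposit publisher PDF after embargo"
    if publisher in FREE_AFTER_EMBARGO:          # step 3: link to free version
        return "link to free publisher version"
    if in_pmc:                                   # step 4: last resort, PMC
        return "link to PMC deposit"
    return "metadata only"

print(green_oa_action("PLOS ONE", "PLoS"))
```

Encoding the policy this way would also make it easy to report, per publication year, how many articles fall into each of the four categories.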

Grey Literature

Why are we so successful in collecting the grey literature output? At the university, registration of output is ingrained in the organization. We started at the university in 1975 already, and it took years before everybody complied, but faculty and staff are now quite used to it. Registration also leads to comprehensive reports on the publication activities of researchers and research groups. For the relatively recently introduced tenure track, the system calculates the research credits for the candidates. For staff we provide an attractive graphical overview of their publications, with various bar charts and pie charts and their co-author network; most important, however, is a bibliometric report based on articles published in journals covered by the Web of Science, benchmarked against the baselines from the Essential Science Indicators.

If all universities registered their publication output more comprehensively in their current research information systems, these outputs could then be made available through their repositories. In the example of the Dutch-language publication on Culicoides, we see that it concerns a report by researchers from Utrecht University, but this report is not to be found in their OA repository (the publication is not scientific!?) nor in the catalogue of the university. If Narcis were made the official tool for reporting publication output in the Netherlands to the ministry of education, in a transparent and verifiable way, publications like these would stand a chance of being collected, described and curated.

If the OA repository infrastructure in the Netherlands improves, Narcis can be turned into a link-resolver service. Using the DOI, we could resolve not only to the publisher's site, but also to Narcis, which points to an OA version of the same paper in a repository of one of the universities. For the public libraries in the Netherlands, we could configure a national link resolver that exposes this OA material in addition to the Open Access material Google Scholar surfaces. This is important, since not all repository content is discovered by Google Scholar.
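The resolver idea boils down to: prefer a known Open Access copy, otherwise fall back to the publisher via doi.org. A minimal sketch, where the lookup table is a hypothetical stand-in for a Narcis query and the DOI and repository URL are made up:

```python
# Hypothetical index of DOIs for which an OA copy is known; in reality
# this lookup would be a query against Narcis.
OA_COPIES = {
    "10.1000/example.123": "https://repository.example.nl/fulltext/123",
}

def resolve(doi, prefer_oa=True):
    """Resolve a DOI to an OA copy if known, else to the publisher."""
    if prefer_oa and doi in OA_COPIES:
        return OA_COPIES[doi]
    return "https://doi.org/" + doi  # standard DOI resolution

print(resolve("10.1000/example.123"))   # known OA copy
print(resolve("10.1000/unknown.456"))   # falls back to the publisher
```

A national resolver for public libraries would simply be this function with the full aggregated repository index behind it.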

Knowledge Economy

With regard to a new knowledge economy, an important report was published quite recently (WRR, 2013). However, the report mentions neither libraries, nor repositories, nor grey literature. So there is still a world to win for comprehensive institutional repositories that collect and disseminate all the grey literature that is openly available.

References
WRR. 2013. Naar een lerende economie : Investeren in het verdienvermogen van Nederland. WRR report Vol. 90. Amsterdam: Amsterdam University Press. 440 pp. http://www.wrr.nl/publicaties/publicatie/article/naar-een-lerende-economie-1/

Karen Calhoun on digital libraries

Review of : Calhoun, K. 2014. Exploring digital libraries : Foundations, practices, prospects. Chicago: Neal-Schuman. 322p.

As a library practitioner I am always a bit wary about the term digital libraries. I have had sincere doubts about the role of library practitioners in digital libraries:

“some would argue that digital libraries have very little to do with libraries as institutions or the practice of librarianship”

(Lynch, 2005). But this new book by Karen Calhoun has removed all my reservations about the term digital libraries, and built the bridge from digital library research to practical librarianship.

First of all, Calhoun has written an excellent book. Go, buy, read and learn from it. For anybody working in today's academic library setting it is a must-read. Calhoun elegantly connects the work of the digital library scientists, which started in the previous century and continued through the last decade, to the current questions we are dealing with in the academic library setting.

Calhoun describes the context around the usual pathways: from link resolvers, through MetaLib-style federated search solutions, to the current options of discovery tools. But those off-the-shelf solutions are not too exciting.

Where I liked the book most, and learned a lot, was in the chapters on repositories. Those are insightful chapters, albeit I didn't always agree with Calhoun's views. Calhoun and I probably agree that repositories are the most challenging area for academic libraries to be active in. What Calhoun did not address is the fact that this has resulted in an enormous change in workflow. In the classical library catalogue we dealt only with monographs and journals. In repositories we are dealing with more granular items such as book chapters, contributions to proceedings, articles and posters. That is not only a change from paper to digital, but also a completely different level of metadata description. Those are changes we are still struggling to come to grips with, as I see in everyday practice.

A shortcoming of the book is that Calhoun equates repositories with open access repositories. That is a misconception to my mind. It is perhaps the more European setting where most academic libraries get involved in current research information systems (CRIS). These CRISes form an essential part of the university's digital infrastructure and feed a comprehensive institutional repository. The repository thus becomes far more than a collection of OA items only. Dear Karen, have a look at our repository: more than 200,000 items collected, of which 50,000 are available in Open Access. More importantly, next to the 55,000 peer-reviewed articles we have nearly 35,000 articles in professional or trade journals that boost our societal impact. We also have 27,000+ reports, and nearly 18,000 abstracts and conference contributions as well. Institutional repositories, to my mind, should be more than Open Access repositories of peer-reviewed journal articles alone. The institutional repository plays an important role in disseminating all kinds of “grey” literature output. Calhoun could probably learn more from the changing European landscape, where CRIS and repositories are growing towards each other; as a result, a completely new library role arises when libraries take on the management of the CRIS. That is a natural match. Or should be.

What Calhoun made me realize is that we have a unique proposition in Wageningen. Our catalogue is comprehensively indexed in Google and nearly as well in Google Scholar. The indexing of our repository goes well in Google, but we are still struggling to get its contents into Google Scholar. We have a project under way to correct this, but success is not guaranteed, since Google Scholar is completely different from Google and no ordinary SEO expert has experience with these matters. That we are indexed both in Google and in Google Scholar is a valuable asset. With our change to WorldCat Local we have something to lose. We should tread carefully in this area.
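Part of what makes Google Scholar different is that its inclusion guidelines ask repositories to expose bibliographic data as Highwire-style citation_* meta tags on each record page. A sketch of generating such tags for one repository item; the item data and URL here are made up for illustration:

```python
from html import escape

def scholar_meta_tags(title, authors, year, pdf_url):
    """Render the citation_* meta tags Google Scholar looks for."""
    tags = [("citation_title", title)]
    tags += [("citation_author", a) for a in authors]  # one tag per author
    tags += [("citation_publication_date", str(year)),
             ("citation_pdf_url", pdf_url)]
    return "\n".join(
        '<meta name="%s" content="%s">' % (name, escape(value, quote=True))
        for name, value in tags
    )

print(scholar_meta_tags("An example thesis", ["A. Author"], 2014,
                        "https://repository.example.nl/123.pdf"))
```

Emitting these tags on every record page, with citation_pdf_url pointing at the full text, is the kind of fix such a project would aim for; the separate filing system for full texts makes exactly that link the hard part.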

Where I learned a lot from Calhoun is in the chapters I normally don't care too much about: the social roles of digital libraries and digital library communities. These are areas, and literature, I tend to neglect, but the overview presented by Calhoun really convinced me to solicit more buy-in for our new developments. We are preparing for our first centennial (in 2018) and running a project to collect and digitize all our official academic output. Where will we present the results? In our comprehensive institutional bibliography, of course. Not an easy task, but we are building our own, unique, digital library.

Disclaimer: I don't have an MLIS, but I have worked for nearly 15 years, with a lot of pleasure, at Wageningen UR library, where I am active in the area of research support.

References
Calhoun, K. 2014. Exploring digital libraries : Foundations, practices, prospects. Chicago: Neal-Schuman. 322p.
Lynch, C. 2005. Where do we go from here? The next decade for digital libraries. D-Lib Magazine, 11(7/8) http://www.dlib.org/dlib/july05/lynch/07lynch.html