Karen Calhoun on digital libraries

Review of: Calhoun, K. 2014. Exploring digital libraries : Foundations, practices, prospects. Chicago: Neal-Schuman. 322p.

As a library practitioner I am always a bit wary of the term digital libraries. I have had sincere doubts about the role of library practitioners in digital libraries:

“some would argue that digital libraries have very little to do with libraries as institutions or the practice of librarianship”

(Lynch, 2005). But this new book by Karen Calhoun has removed all my reservations about the term digital libraries, and built the bridge from digital library research to practical librarianship.

First of all, Calhoun has written an excellent book. Go, buy, read and learn from it. For anybody working in today’s academic library settings it is a must-read. Calhoun elegantly connects the digital library research that started in the previous century and continued through the last decade to the questions we are currently dealing with in academic libraries.

Calhoun describes the context around the usual pathways, from link resolvers, via the MetaLib-style solutions, to the current options of discovery tools. But those off-the-shelf solutions are not too exciting.

Where I liked the book the most, and learned a lot, was in the chapters on the repository. Those are insightful chapters, although I didn’t always agree with Calhoun’s views. Calhoun and I probably agree that repositories are the most challenging area for academic libraries to be active in. Calhoun did not address the fact that this has resulted in an enormous change in workflow. In the classical library catalogue we only dealt with monographs and journals. In repositories we are dealing with more granular items such as book chapters, contributions to proceedings, articles and posters. That is not only a change from paper to digital, but also a completely different level of metadata description. These are changes that we are still struggling to come to grips with, as I see in everyday practice.

A shortcoming of the book is that Calhoun equates repositories with open access repositories. That is a misnomer to my mind. It is perhaps more of a European setting, where most academic libraries get involved in current research information systems (CRIS). These CRISs form an essential part of the university's digital infrastructure and feed a comprehensive institutional repository. The repository thus becomes far more than only a collection of OA items. Dear Karen, have a look at our repository. More than 200,000 items collected, of which 50,000 are available in Open Access. But more importantly, next to the 55,000 peer-reviewed articles we have nearly 35,000 articles in professional or trade journals that boost our societal impact. We also have 27,000+ reports and nearly 18,000 abstracts and conference contributions. Institutional repositories to my mind should be more than Open Access repositories of peer-reviewed journal articles alone. The institutional repository plays an important role in disseminating all kinds of “grey” literature output. Calhoun could probably learn more from the changing European landscape, where CRIS and repositories are growing towards each other and, as a result, a completely new library role arises when libraries get a role in the management of the CRIS. But that is a natural match. Or should be.

What Calhoun made me realize is that we have a unique proposition in Wageningen. Our catalogue is comprehensively indexed in Google and nearly as well in Google Scholar. The indexing of our repository goes well in Google, but we are still struggling to get its contents into Google Scholar. We have a project under way to correct this. But success is not guaranteed, since Google Scholar is completely different from Google. No ordinary SEO expert has experience with these matters. Still, being indexed in both Google and Google Scholar is a valuable asset. With our change to WorldCat Local we have something to lose. We should tread carefully in this area.

Where I learned a lot from Calhoun is in those chapters I normally don’t care too much about: the social roles of digital libraries and digital library communities. Normally these are areas, and a literature, I tend to neglect, but the overview presented by Calhoun really convinced me to solicit more buy-in for our new developments. We are preparing for our first centennial (in 2018) and running a project to collect and digitize all our official academic output. Where do we present the results? Our comprehensive institutional bibliography! Of course. Not an easy task, but we are building our own, unique, digital library.

Disclaimer: I don’t have an MLIS, but I have worked with a lot of pleasure for nearly 15 years at Wageningen UR library, where I work in the area of research support.

Calhoun, K. 2014. Exploring digital libraries : Foundations, practices, prospects. Chicago: Neal-Schuman. 322p.
Lynch, C. 2005. Where do we go from here? The next decade for digital libraries. D-Lib Magazine, 11(7/8) http://www.dlib.org/dlib/july05/lynch/07lynch.html

National Library of the Netherlands discloses its Google Books Contract

After the successful disclosure of the agreement between the British Library and Google Books on the basis of the Freedom of Information Act, the National Library of the Netherlands (KB) also disclosed its agreement with Google Ireland today. Although the director of the KB tweeted a day ago that not all public information needed to be available on the Web, it was decided to publish the agreement on the Web since there were two WOB (a Dutch version of FOIA) procedures underway to get insight into the agreement.

Although I am not a lawyer, a few things caught my eye. The agreement is very similar to the agreement between Google and the British Library. Bert Zeeman pondered the idea of standard Google contracts in this respect. That seems to hold, with the exception of the number of volumes in the public domain that will be digitized: 250,000 in the UK and 160,000 in the Netherlands (clause 2.1).

What struck me as interesting was the use of the library’s digital copies, clause 4.8: “the library may provide all or any portion of the library digital copy… to (a) academic institutions or research or public libraries, ….” But we would not be allowed to go about “providing search or hosting services substantially similar to those provided by Google, including but not limited to those services substantially similar to Google book search”. I guess that prevents the other academic libraries in the Netherlands from including these digital copies in their discovery tools. It is tempting, but I see problems on the horizon. We seem to be left with separate information silos, whereas integration with the rest of the collection would be really interesting. It becomes more explicit in clause 4.9, where it is stated that “nothing in this agreement restricts the library from allowing Europeana to crawl the standard metadata of the digital copies provided to library by Google.” We would be more interested in the data than in the metadata.

But then again, it is up to the lawyers to see what’s allowed and what’s not. But then again, again, after fifteen years all restrictions on use or distribution terminate (clause 4.7), which is a bit long according to the Open Rights Group. However, we have experience with building academic library collections, and that takes ages. Those fifteen years are over in the wink of a young girl’s eye.

How Wiley made a mess of the Synergy and InterScience integration

Two weeks ago we were forewarned that Wiley would integrate all the content of Blackwell Synergy onto the Wiley InterScience platform. It would only disrupt the service of the systems over the weekend of June 28-29. When I received this notification I immediately thought of Péter’s picks&pans (2007), where he investigated the capabilities of both platforms.

Just a few quotes from his review:

A merger of the Blackwell Synergy and the Wiley Interscience collections using the software of the latter would certainly not produce Synergy. On the contrary, the serious software deficiencies of Interscience would weaken performance and functionality of Blackwell Synergy, which uses the excellent Atypon software.

[Synergy] This is a very well-designed system enhanced by complementary information – as you should expect these days.

Wiley made no efforts to improve its software. The software keeps fooling itself and the searchers by offering dysfunctional and nonsense options.

It is a severe sign of dementia when people do not recognize their own name. So is the syndrome that Wiley keeps listing some of its very own journal some of the time under the label “Cited Articles available from other publishers” and/or keeps ignoring them in the citation tracking.

In a subsequent chat with our serials librarian, he indicated that, behind the scenes, he much preferred the Blackwell Synergy platform to the Wiley InterScience platform. From my own viewpoint, I regretted this move as well, since Blackwell had already been COUNTER compliant for quite some time and its COUNTER reports had been audited as well, whereas Wiley InterScience was and still is not COUNTER compliant. That is a very serious shortcoming for one of the largest scientific publishing houses.

So after the announcement that the Synergy platform would be abandoned, users had something to lose in ease of use, and so did librarians.

What was intended to take a mere weekend has continued for a whole week. All Dutch university libraries faced problems with access to both Wiley and Blackwell journals. We have to sit and wait and see whether the problems will be resolved during this weekend. Meanwhile I find it disappointing that Wiley makes no mention of these problems on their transition page.

Facing these problems, I can only compliment Péter, who already foresaw in March 2007 what was coming our way. “A merger of the Blackwell Synergy and the Wiley Interscience collections using the software of the latter would certainly not produce Synergy”.

Jacsó, P. (2007). SpringerLink, Blackwell Synergy, Wiley InterScience. Online (Jul/Aug 2007): 49-51. http://www.jacso.info/PDFs/jacso-springerlink-blackwell-wiley.pdf

Library website design, search engine crawlers and SEO

Digital libraries have tons of data, and when they don’t have the data in digital format, they have really nice, structured, electronically available metadata about those data. Library catalogs, we call them. They are plain ordinary databases and come in all kinds of flavours.

When I joined the library behind the scenes some eight years ago, indexing of the library catalog was off limits for the search engines. That would cripple the system, at the expense of the users! Actually we were talking about Altavista and AllTheWeb in those days, although Google was around already. Times have changed though. We have taken away all kinds of no-index and no-follow signs on our system, and the first catalog cards are being indexed by search engines. We are just starting to use RSS and OAI as sitemaps for Google. But this is not the only approach that should be taken. The site should be optimized for the Google bots and the crawlers of all kinds of search engines, although Google is by far the most important search engine at this moment.
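To make the sitemap idea a bit more concrete, here is a minimal sketch of building a sitemap file from a list of catalog record URLs. It is not how our own RSS or OAI feeds are set up; the function name and the example.org URLs are made up for illustration. Only the sitemaps.org namespace and the urlset/url/loc elements come from the sitemap protocol itself.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(record_urls):
    """Return a sitemap XML string with one <url> entry per catalog record."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in record_urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical record URLs, for illustration only.
example_urls = [
    "https://library.example.org/catalog/record/1",
    "https://library.example.org/catalog/record/2",
]
sitemap_xml = build_sitemap(example_urls)
print(sitemap_xml)
```

In practice the list of URLs would come from the catalog database or from an OAI harvest, and a large catalog would have to be split over several sitemap files.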

Looking back in my archives, I found an interesting study carried out two years ago by Drunk men work here. It does not seem to be a peer-reviewed study, but it is interesting nevertheless. In their research they compared the crawling behaviour of the Google, Yahoo! and MSN bots on a really large site that was set up as a binary search tree. Quite amazingly, the Yahoo! bot proved to be the most proficient bot, having indexed the most underlying pages, down to the deepest level. The Google bot followed at quite some distance and MSN came in last.

How matters have changed over the last two years. Smith and Nelson (2008) built two large digital library websites to study crawler behaviour. They compared wide and deep linking designs of the websites. It appeared that conventional wisdom held true, in that the wide design sites were indexed more than twice as fast: in the case of Google, 18 days compared to 44 days. The Yahoo! crawler failed to index the complete site. The MSN bot took more than 200 days for the wide design and failed to completely index the website when using the deep design.
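Why the wide design is friendlier to crawlers can be shown with a little arithmetic. The sketch below counts how many link levels a crawler has to descend before it has seen every page, assuming each page links to a fixed number of new pages. The page count of 10,000 and the fan-outs of 2 and 100 are my own illustrative assumptions, not figures from Smith and Nelson.

```python
def levels_needed(total_pages, links_per_page):
    """Count the link levels below the home page needed to reach
    total_pages pages when every page links to links_per_page new pages."""
    pages_reached = 1  # the home page itself
    level_size = 1
    depth = 0
    while pages_reached < total_pages:
        level_size *= links_per_page  # pages on the next level down
        pages_reached += level_size
        depth += 1
    return depth


# Illustrative numbers only: a deep (binary tree) and a wide site design.
deep_depth = levels_needed(10_000, 2)
wide_depth = levels_needed(10_000, 100)
print(deep_depth, wide_depth)  # prints: 13 2
```

With these assumptions a crawler has to follow links thirteen levels down in the binary tree design, against only two levels in the wide design.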

The latest article I read that touched on this subject was by Jody L. DeRidder (2008), who explored the use of Google sitemaps and static browse pages (for which I have been pleading for so long already, not so much for robots as for our human users). She concluded, albeit on a relatively small sample, that static browse pages enhanced the crawling and indexing by search engines.

Having digested all this, I think we are back again at our thesaurus and classification system, and should use those logical trees as the entry for the crawlers into our catalog. Isn’t it nice that we have already indexed all our records manually for years, and that we can make use of that system as efficient highways into our catalog for the crawlers? Old systems used for new purposes.
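As a rough sketch of what such highways could look like, the snippet below walks a thesaurus tree and produces one static browse page per term, each linking to the pages of its narrower terms. The tiny thesaurus, the /browse/ paths and the function name are all hypothetical; a real thesaurus would come out of the catalog system.

```python
def browse_pages(thesaurus, term, path=()):
    """Yield (page_path, links) for the term and all its narrower terms,
    depth first. Each page links to the pages of its narrower terms."""
    current = path + (term,)
    narrower = thesaurus.get(term, [])
    page_path = "/browse/" + "/".join(current)
    links = [page_path + "/" + child for child in narrower]
    yield page_path, links
    for child in narrower:
        yield from browse_pages(thesaurus, child, current)


# Hypothetical thesaurus fragment: broader term -> narrower terms.
thesaurus = {
    "agriculture": ["crops", "livestock"],
    "crops": ["cereals"],
}
pages = dict(browse_pages(thesaurus, "agriculture"))
for page_path, links in pages.items():
    print(page_path, links)
```

The pages for the narrowest terms have no further browse links here; those are the places where the links to the catalog records themselves would go.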

Release the spiders!

The second part of the exercise is, of course, to design efficient catalog records that rank well in the search engine result pages. I wonder who has formally studied those matters? Any suggestions?

DeRidder, J. L. (2008). Googlizing a digital library. Code4Lib journal, 2. http://journal.code4lib.org/articles/43
Smith, J. A. and M. L. Nelson (2008). Site design impact on robots: An examination of search engine crawler behaviour at deep and wide websites. D-Lib Magazine, 14(3/4). http://www.dlib.org/dlib/march08/smith/03smith.html