Defrosting the digital library

Duncan Hull, Steve R. Pettifer and Douglas B. Kell (2008) wrote an interesting review of the current state of personal digital libraries. It is worth stressing that the review focuses on personal digital libraries; a lot could also be written about digital libraries at higher aggregation levels, but that would take another review. Many of their observations on building personal digital libraries are right and come straight from the workbench of the practising systems biologist. Still, some additional observations could have been addressed in this review as well.

Today most publications are born digital and distributed digitally, but consumed on paper. I read Hull et al.’s paper mostly on the train and later on the plane. On such occasions you still scribble on paper: you make some notes in the margin and highlight some references to check out at a later date.

I could have downloaded it to my laptop or an e-book reader instead. But the majority of users of digital libraries prefer to download and print a PDF document and peruse the publication at their favourite spot at their own leisure. This digital-paper divide still affects the quality of personal digital libraries. While drafting the first version of this blog post I found out that I had not yet downloaded a copy of the paper to my laptop, or stored the metadata in EndNote. I have to remember to do that at a later date.

I don’t think the current generation of scientists is capable of overcoming this digital-paper divide in their daily workflow at the moment. They haven’t grown up doing so, and the tools at hand still hamper a fluent digital workflow. Screen resolutions are too poor. Laptops are too bulky. Wi-Fi is not always available, or only at prohibitive cost. Interaction with a PC through a keyboard is clumsy, or the keypads are too small. All these little nuisances make a truly digital workflow a utopian vision.

The most popular format for electronic articles in the personal sphere is the PDF. A PDF is fine in print, but a nuisance to read on computer screens or e-book readers. Most journal articles have a two-column layout, which makes reading the PDF version of an article on an electronic device an arduous task.

Having said all that, my conclusion agrees with Hull et al.: the current state of personal digital libraries leaves something to be desired. Solving these problems involves a number of stakeholders: the primary publishers of scholarly publications (Elsevier, Springer, Wiley etc.), the secondary publication databases (Scopus, WoS, PubMed etc.), local libraries in their role as gatekeepers to the licensed content, and the scientists themselves as producers and consumers of scholarly publications, with their willingness to leave the beaten track and adopt new ways of performing science. Last but not least there are the science managers, who rank and rate the performance of their scientists based on the paper trail in the most prestigious scholarly journals.

To date the paper trail is still very visible in all digitally born publications. Have a look at the reference list of a publication: it is still infested with publication years, volumes, issues and page numbers. The publication year is a very amusing example indeed. Many publications appear online in advance of print, and receive an official (paper) publication year only months later. Many journal platforms resolve a reference to a digital copy through Crossref or other linking services. But with a printed article in hand this link is literally broken. A brief URI is really how publications should be cited, allowing quick lookup when a computer is at hand.

The paper trail becomes obvious when preparing a publication for submission as well. The instructions to authors of different journals outdo each other with the most exotic layout requirements for reference lists, such as small capitals, bold publication years, italics et cetera. These paper-based instructions take precious time from authors and editors alike, in preparing the manuscript and in editing and proofreading (Leslie & Hamilton, 2007). All these eye-pleasing variations in the layout of reference lists lead to missed impact, because citation data harvesters like WoS, Scopus or Google Scholar have difficulty interpreting them.

It is interesting to note that in Hull et al.’s description of the URI from Elsevier’s Scopus the paper trail rears its ugly head yet again. This URI is based on OpenURL, and the simplest designation of the metadata record for an article includes volume, issue and starting page. That is meaningful to a human reader, but in a digital workflow it becomes overly complicated. It is to be foreseen that in the near future volumes and issues of journals will cease to exist anyway.
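To make the contrast concrete, here is a minimal Python sketch of the two styles of link. It is only an illustration under assumptions: the resolver address and the ISSN are placeholders I made up, and the query keys follow the old OpenURL 0.1 convention (genre, issn, volume, issue, spage, date) rather than whatever Scopus actually emits. The volume, issue, page and DOI are those of the Leslie and Hamilton paper in the reference list.

```python
from urllib.parse import urlencode

# Hypothetical link resolver address; a real one is specific to each library.
RESOLVER_BASE = "https://resolver.example.org/openurl"


def openurl_for_article(issn, volume, issue, start_page, year):
    """Build an OpenURL 0.1-style link from print-era article metadata."""
    params = {
        "genre": "article",
        "issn": issn,
        "volume": volume,
        "issue": issue,
        "spage": start_page,
        "date": year,
    }
    return RESOLVER_BASE + "?" + urlencode(params)


def doi_uri(doi):
    """Build the brief, paper-free alternative: a DOI link."""
    return "https://doi.org/" + doi


# "0000-0000" is a placeholder ISSN, not the real one for Serials Review.
print(openurl_for_article("0000-0000", 33, 1, 1, 2007))
print(doi_uri("10.1016/j.serrev.2006.11.009"))
```

The first link needs five pieces of paper-era metadata to identify one article; the second needs a single identifier.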

But an OpenURL is better than the example from WoS, where Hull et al. had difficulty making a working URI on the basis of the ISI number included in all records from Web of Science (it is still called the ISI number, despite the company name having changed twice since ISI was bought by Thomson). When you download a metadata record from Web of Science to EndNote and hit Ctrl+G, EndNote creates a URL on the fly, based on the downloaded ISI number. It is a very long and tedious URI, but you can trim some parameters and end up with a functioning URL, with a valid session parameter.

As Hull et al. describe, it seems odd that, whether at a primary publisher or at a database from a secondary publisher, a scientist normally has to make two saves: first the metadata, then the actual article. Only thereafter can the metadata and the article be reunited in their favourite reference manager. That really is a few saves and clicks too many. Scopus has facilitated downloading primary articles with the help of the Quosa software, but downloading the articles and downloading the metadata are still two separate processes.

In the case of Google Scholar, Hull et al. make a mistake. Google Scholar can work with the link resolver of most institutions’ libraries, and with most link resolvers it is possible to download metadata to, for instance, EndNote. The snag is that it only works reference by reference, which makes it a tedious task to download the metadata for, say, twenty records from Google Scholar. It is probably the worst download limitation in the scholarly information landscape.

Download limitations are an important point that wasn’t raised in the article. They vary widely between database vendors. From Web of Science one can download the metadata for 500 references at once. Using the marked list you can repeat this for successive sets, so the workaround is to download records 1-500, 501-1000, 1001- etc… In Scopus the download limit is set at 2000 records. That is more generous, but still limiting if you move away from personal digital libraries to digital libraries for text mining or serious systems biology work. In our experience, this limit is most likely negotiable in the contracts between the library and the database vendor. But download limits vary widely per database and are in most cases annoyingly low.
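The workaround of downloading in successive sets is easy to plan ahead. The following Python sketch only computes the record ranges to request; the function name is my own invention, and the actual downloading still has to be done by hand or with whatever export facility the database offers.

```python
def batch_ranges(total, limit):
    """Return inclusive (first, last) record ranges, each at most `limit` long."""
    if total < 0 or limit < 1:
        raise ValueError("total must be >= 0 and limit must be >= 1")
    return [
        (start, min(start + limit - 1, total))
        for start in range(1, total + 1, limit)
    ]


# A result set of 1200 records under a limit of 500 per download.
for first, last in batch_ranges(1200, 500):
    print(f"download records {first}-{last}")
```

With a limit of 2000 the same 1200 records would fit in a single range, which shows how much the number of manual steps depends on the vendor's limit.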

Following on from downloading, it is sometimes desirable to enhance some of your metadata records with additional metadata, or to update some metadata. For such occasions APIs for bibliographic databases become desirable. Consider for instance that you have downloaded citation data for most of your records. It is logical to want to update this data after some period of time. At this moment that seems to be impossible. Documented APIs for bibliographic databases are rare, PubMed’s API being the best example of what is possible in this area. Elsevier seems to be moving in that direction too.
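As a rough idea of what such an update could look like with PubMed, here is a small Python sketch that builds a request for the NCBI E-utilities ESummary service. It only constructs the URL and does not contact the server. The PubMed IDs are placeholders, the function name is mine, and note that this would refresh the bibliographic record summaries, not citation counts.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESummary service.
ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def pubmed_summary_url(pmids):
    """Build one ESummary request for a list of PubMed IDs."""
    params = {
        "db": "pubmed",
        "id": ",".join(str(pmid) for pmid in pmids),
        "retmode": "xml",
    }
    # Keep the commas between IDs readable instead of percent-encoding them.
    return ESUMMARY_URL + "?" + urlencode(params, safe=",")


# Placeholder PubMed IDs, standing in for records already in a personal library.
print(pubmed_summary_url([12345678, 23456789]))
```

A reference manager could run a request like this periodically for every record that carries a PubMed ID and merge the response into the stored metadata.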

In this blog post I have indicated some additional points for the future agenda of the personal digital library. More takes on the construction of personal digital libraries are possible. The main challenge is to leave the paper trail behind and enable a purely digital workflow. That will take some time to achieve, and a lot of imagination from all players involved.

References
Hull, D., S. R. Pettifer and D. B. Kell (2008). Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web. PLoS Computational Biology 4(10): e1000204. http://dx.doi.org/10.1371/journal.pcbi.1000204
Leslie Jr., D. M. and M. J. Hamilton (2007). A Plea for a Common Citation Format in Scientific Serials. Serials Review 33(1): 1-3. http://dx.doi.org/10.1016/j.serrev.2006.11.009 (Subscription required)

Tagging, social bookmarking, folksonomies, reference management, LibraryThing and the Library

Tomorrow the 32nd ELAG symposium starts. The ELAG symposia are special in that workshops around pre-selected subjects form the mainstay of the conference. So they told me. These are not ordinary workshops, which you passively attend to learn something. The idea is that the participants brainstorm over a subject; the workshop leader perhaps knows some more of the background, but the workshop itself is a true group exercise.

I was asked to moderate the workshop on social tagging. So far the distribution of participants over the workshops is a bit uneven, so tomorrow I need to sell my workshop on “social tagging” in a sales pitch to the attendees of the symposium. So what will I tell them?

The title of the workshop, social tagging, is a combination of two terms: tagging and social bookmarking. At first sight they don’t seem the most spectacular subjects to ponder in the ELAG workshops. But when the constituency of your library is adding tags to all kinds of videos, photographs and websites, wouldn’t you at least give them the possibility to tag your library resources as well? Is it already possible in your library OPAC? And what about the bibliographic databases that your library licenses: why can’t users tag those items yet? If they are tagging ‘your’ resources already, the obvious questions to ask are which items they are tagging and what tags they are using. What can we learn from our users?

Can we use those tags to improve the recall and ranking of our library systems? How should these folksonomies be combined with, enhanced by or complemented with our formal taxonomies?

If your users can tag any item on your library system, where should the tags and tagged items be collected? Should it be a homegrown system like those developed at the University of Pennsylvania Library (PennTags), Harvard Law Library (H2O) or recently at Michigan (MTagger)? Should we advise using the tools developed by the big scientific publishers, such as 2Collab from Elsevier, Connotea from Nature or Scholar from Blackboard? Or should our academics share their precious tagging labour on common bookmarking sites such as del.icio.us, Furl and the like? Is CiteULike perhaps the best solution after all?

When it comes to saving library items, we already support reference management programmes such as EndNote and RefWorks. What is the relation between social bookmarking sites and these very popular reference management programmes? RefWorks is much better than EndNote at handling websites, but neither has been developed into a social bookmarking site yet. On the other hand, Connotea and 2Collab are social bookmarking sites with some reference management capacity, but in this respect they can’t compete with EndNote and RefWorks.

LibraryThing is perhaps an odd case in this workshop, but it has some very intriguing features. Some libraries are already using the tags and recommendations from LibraryThing in their catalog. Interestingly, I am not aware of an example where items are tagged in a library catalog and those tags are used to enrich LibraryThing. Perhaps it exists already; I don’t know yet. LibraryThing is to some extent a special case of reference management software. It is only used for books. An awful lot of books. It is therefore quite easy to add your own books to LibraryThing. At our university we are constantly confronted with organically grown collections of books that are not part of the library collection. Suppose those collections were entered in LibraryThing: we could then use the collected LibraryThings of our constituency to see whether a book we don’t have in our collection is somewhere on campus, rather than rushing to the order-book button. The LibraryThings of our trusted users as a natural extension of our catalog and library collection?

Those are the five lines along which I hope to ponder the theme of this workshop with a group of smart library people over the next three days. Lorcan Dempsey recently wrote on this subject, calling it a new bibliographic tissue.

This new bibliographic tissue is really hot, ladies and gentlemen. Please come and exchange your ideas with me in this workshop.