One year

I did not notice until today, but the first anniversary of this blog was on November 13th. There are currently 87 posts and 109 comments, which have attracted just over 10,000 unique visitors. Most visitors (2,700) came from the USA, followed by 2,500 from the Netherlands, which surprised me. I would have thought it would be the other way around. Quite a lot of visitors arrive through my Dutch blog.

The inclusion of this blog in Walt Crawford’s latest book surprised me, but it feels like an honour. I still have to read what he has to say, but ordering through libraries goes a bit slowly.

Maintaining two blogs in different languages took more effort than I thought at the beginning, but the experiment is a success in that both blogs have very different voices, which I really like.

Defrosting the digital library

Duncan Hull, Steve R. Pettifer and Douglas B. Kell (2008) wrote an interesting review of the current state of personal digital libraries. It is perhaps important to stress that in the end the review focused on personal digital libraries, while a lot can also be written about digital libraries at higher aggregation levels. Including those would take another review. Anyway, many of the observations they describe for building personal digital libraries are right and come straight from the workbench of the practicing systems biologist. Still, some additional observations could have been addressed in this review as well.

Today most publications are born digital and distributed digitally, but consumed on paper. I read Hull et al.’s paper mostly on the train and later on the plane. On such occasions you still scribble on paper: you make some notes in the margin and highlight some references to check out at a later date.

I could have downloaded it to my laptop or an e-book reader, but the majority of users of digital libraries prefer to download and print a PDF document and peruse the publication at their favourite spot at their own leisure. This digital-paper divide still affects the quality of personal digital libraries. While drafting the first version of this blogpost I found out that I had not yet downloaded a copy of the paper to my laptop, or stored the metadata in EndNote. I have to remember to do that at a later date.

I don’t think that the current generations of scientists are capable of overcoming this digital-paper divide in their daily workflow at the moment. They haven’t grown up doing so, and the tools at hand still hamper a fluent digital workflow. Screen resolutions are too poor. Laptops are too bulky. Wi-Fi is not always available, or only at prohibitive cost. Interaction with a PC through bulky keypads is clumsy, or the keypads are too small. All these little nuisances make a truly digital workflow a utopian vision.

Actually, the most popular format for electronic articles in the personal sphere is the PDF. A PDF is fine in print, but a nuisance to read on computer screens or e-book readers. Most journal articles have a two-column layout, which makes reading the PDF version of an article on an electronic device an arduous task.

Having said all that, my conclusion is in accordance with Hull et al.: the current state of personal digital libraries leaves something to be desired. Solving these problems involves a number of stakeholders: the primary publishers of scholarly publications (Elsevier, Springer, Wiley etc.), the secondary publication databases (Scopus, WoS, PubMed etc.), local libraries in their role as gatekeepers to the licensed content, and the scientists themselves as producers and consumers of scholarly publications, with their willingness to leave the beaten track and adopt new ways of performing science. Last but not least, there are the science managers who rank and rate the performance of their scientists based on the paper trail in the most prestigious scholarly journals.

To date the paper trail is still very visible in all digitally born publications. Have a look at the reference list of a publication, and it is still infested with publication years, volumes, issues and page numbers. The publication year is a very amusing example indeed. Many publications appear online in advance of print, and receive an official (paper) publication year only months later. Many journal platforms resolve a reference in the electronic environment to a digital copy through Crossref or other linking services. But with a printed article at hand this link is literally broken. A brief URI is really how publications should be cited, allowing quick lookup when a computer is at hand.
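
To illustrate what such a brief URI could look like, here is a minimal Python sketch that turns a DOI into a resolvable link. It assumes the public doi.org resolver; the function name and the DOI in the example are made up for illustration and do not point to a real article.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def doi_to_uri(doi: str) -> str:
    """Return a short, resolvable URI for a DOI (hypothetical helper)."""
    doi = doi.strip()
    # Accept the "doi:" prefix that reference managers often store.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Keep the slash between prefix and suffix; percent-encode unsafe characters.
    return DOI_RESOLVER + quote(doi, safe="/")


# Made-up DOI, for illustration only.
print(doi_to_uri("doi:10.1234/example.5678"))
```

A citation of this kind carries no year, volume, issue or page number, yet it can be typed over from a printed page and looked up as soon as a computer is at hand.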

In the phase of preparing a publication for submission this paper trail becomes obvious as well. The instructions to authors of the various journals outdo each other in exotic layout requirements for reference lists, such as small capitals, bold publication years, italics et cetera. These paper-based instructions take precious time from authors and editors alike, in the preparation of the manuscript and in editing and proofreading (Leslie & Hamilton, 2007). All these eye-pleasing variations in the layout of reference lists lead to missed impact, because citation data harvesters like WoS, Scopus or Google Scholar have difficulty interpreting them.

It is interesting to note that in Hull et al.’s article, with the description of the URI from Elsevier’s Scopus, the paper trail rears its ugly head yet again. This URI is based on OpenURL, and the simplest designation of the metadata record for an article includes volume, issue and starting page. It is meaningful to a human reader, but in a digital workflow it becomes overly complicated. It is to be foreseen that in the near future volumes and issues of journals will cease to exist anyway.
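
As a rough sketch of how much print-era metadata such a link carries, the Python below builds an OpenURL 0.1-style query from journal title, volume, issue and starting page. The resolver address and the function name are hypothetical, and this is not the actual Scopus URI format, only an illustration of the general idea.

```python
from urllib.parse import urlencode

# Hypothetical link resolver address; every library runs its own.
RESOLVER_BASE = "https://resolver.example.org/openurl"


def build_openurl(journal, volume, issue, start_page, year):
    """Build an OpenURL 0.1-style query from print-era citation fields."""
    params = {
        "genre": "article",
        "title": journal,      # journal title
        "volume": volume,
        "issue": issue,
        "spage": start_page,   # starting page (here an article number)
        "date": year,
    }
    return RESOLVER_BASE + "?" + urlencode(params)


print(build_openurl("PLoS Computational Biology", 4, 10, "e1000204", 2008))
```

Five fields are needed to point at one article, and each of them is inherited from the printed journal.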

But an OpenURL is better than the example from WoS, where Hull et al. had difficulty making a working URI on the basis of the ISI number included in all records from Web of Science (it is still called the ISI number, despite the company name having changed twice already since ISI was bought by Thomson). When you use EndNote and download a metadata record from Web of Science, EndNote will create a URL on the fly from the downloaded ISI number when you hit Ctrl+G. It is a very long and tedious URI, but you can trim some parameters from it and end up with a functioning URL with a valid session parameter.

As described by Hull et al., it seems odd that whether you are at a primary publisher or at a database from a secondary publisher, a scientist normally has to make two saves: first the metadata, then the actual article. Only thereafter can the metadata and the article be reunited in their favourite reference manager. That really is a few saves and clicks too many. Scopus has facilitated downloading primary articles with the help of the Quosa software, but downloading the articles and the metadata are still two separate processes.

In the case of Google Scholar, Hull et al. make a mistake. Google Scholar can work with the link resolver of most institutions’ libraries, and with most link resolvers it is possible to download metadata to, for instance, EndNote. The snag here is that it only works on a reference-by-reference basis, making it a tedious task to download the metadata for, say, twenty records from Google Scholar. It is probably the worst download limitation in the scholarly information landscape.

Download limitations are an important point that wasn’t raised in the article. These vary highly between database vendors. From Web of Science one can download the metadata for 500 references at once. Using the marked list you can repeat this for successive sets, so the workaround is to download records 1-500, 501-1000, 1001- etc. In Scopus the download limit is set at 2000 records. That is more generous already, but limiting if you move away from personal digital libraries to digital libraries for some text mining work, or serious systems biology work. In our experience, this limit is most likely negotiable in the contracts between the library and the database vendor. But the limitations on download are highly variable per database and in most cases annoyingly low.
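
The bookkeeping behind that workaround is simple enough to sketch in a few lines of Python. The function name is my own, and the batch size is just the limit of whichever database you happen to be using; the actual downloading still has to be done by hand in the database interface.

```python
def download_batches(total_records, batch_size=500):
    """Yield inclusive (first, last) record ranges, counting from 1."""
    if total_records < 0 or batch_size < 1:
        raise ValueError("total_records must be >= 0 and batch_size >= 1")
    for first in range(1, total_records + 1, batch_size):
        # The last batch may be shorter than batch_size.
        yield first, min(first + batch_size - 1, total_records)


# A result set of 1234 records with a limit of 500 per download.
for first, last in download_batches(1234):
    print(f"{first}-{last}")
```

With a limit of 500 a set of 1234 records takes three rounds; with a limit of 2000 it fits in one.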

Following on from downloading, it is sometimes desirable to enhance some of your metadata records with additional metadata, or to update some metadata. For such occasions the availability of APIs for bibliographic databases becomes desirable. Consider for instance that you have downloaded citation data for most of your records. It is logical to want to update this data after some period of time. At this moment this seems to be an impossibility. Documented APIs for bibliographic databases are rare, PubMed’s API being the best example of what could be possible in this area. Elsevier seems to be moving in that direction too.
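
To give an idea of what a documented API makes possible, here is a minimal Python sketch that builds a request to the ESummary service of PubMed’s E-utilities for a handful of records. The function name and the PubMed IDs are placeholders of my own, and the sketch only constructs the URL. It refreshes summary metadata, not citation counts, and anyone using it for real should check NCBI’s usage guidelines first.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esummary_url(pmids, retmode="json"):
    """Build an ESummary request URL for a list of PubMed IDs."""
    params = {
        "db": "pubmed",
        "id": ",".join(str(pmid) for pmid in pmids),  # comma-separated IDs
        "retmode": retmode,
    }
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return EUTILS_BASE + "esummary.fcgi?" + urlencode(params, safe=",")


# Placeholder IDs, for illustration only.
print(esummary_url([111, 222]))
```

Fetching that URL for the records already sitting in a reference manager is all it would take to bring their metadata up to date, which is exactly what most other bibliographic databases do not yet allow.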

In this blogpost I have indicated some additional points for the future agenda of the personal digital library. More takes on the construction of personal digital libraries are possible. The main challenge is to leave the paper trail behind and enable a purely digital workflow. That will take some time to achieve, and a lot of imagination from all players involved.

Hull, D., S. R. Pettifer and D. B. Kell (2008). Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web. PLoS Computational Biology 4(10): e1000204.
Leslie Jr., D. M. and M. J. Hamilton (2007). A Plea for a Common Citation Format in Scientific Serials. Serials Review 33(1): 1-3. (Subscription required)

The changing face of Elsevier Science

The last couple of days I had the pleasure of attending the Elsevier Development Partners meeting. The exact products they are working on might be of interest to some people, but that’s up to Elsevier to announce. What was really the big surprise at this meeting, which lasted three days, was the tone from Elsevier. It was all about open science. They clearly wanted to open up. There was a lot of talk about sharing information, making mash-ups possible, and application programming interfaces (APIs). Elsevier Science wanted to move away from the double-barred information silo to become an open solution provider in the scholarly world. If Elsevier is thinking and acting in this direction, then change will become a major issue for the entire scientific publishing industry, and that is good news for libraries that want to remain a vital service in the future as well.

This change will take time. It doesn’t happen overnight. But Raphael Sidi just announced the Elsevier Article API at ProgrammableWeb on his blog the other day. So Elsevier is not only talking, they are acting upon it as well.

Let other publishers follow this example!