World Bank 2.0

Wikipedia als medicijn voor kennismanagementpijn (Wikipedia as a medicine for knowledge management pain)


Thanks to ZBdigitaal I noted this presentation from World Bank staff on the use of Wikipedia as a knowledge management system. Apart from the knowledge management side of using Wikipedia for World Bank reports, it is a good initiative to expose the essence of kilometers of reports from the World Bank bureaucracy to a larger public. I wish I could encourage our researchers to make use of Wikipedia more often, with two major objectives: (1) to expose their ideas and findings to a larger public than academic peers only, and (2) to prepare our scientists to participate in public discussions on their research themes with public stakeholders. An interesting move by the World Bank.

Defrosting the digital library

Duncan Hull, Steve R. Pettifer and Douglas B. Kell (2008) wrote an interesting review on the current state of personal digital libraries. It is perhaps important to stress that in the end the review focuses on personal digital libraries; a lot could also be written on digital libraries at higher aggregation levels, but including those would take another review. Anyway, many of the observations they describe on building personal digital libraries are right and come straight from the workbench of the practicing systems biologist. Still, some additional observations could have been addressed in this review as well.

Today most publications are born digital and distributed digitally, but consumed on paper. I read Hull et al.'s paper mostly on the train and later on the plane. On such occasions you still scribble on paper: you make some notes in the margin and highlight some references to check out at a later date.

I could have downloaded it to my laptop or an e-book reader, but the majority of users of digital libraries prefer to download and print a PDF document and peruse the publication at their favourite spot at their own leisure. This digital-paper divide still affects the quality of personal digital libraries. At the moment of drafting the first version of this blogpost I found out that I hadn't yet downloaded a copy of the paper to my laptop, or saved the metadata to EndNote. I have to remember to do that at a later date.

I don't think that current generations of scientists are capable of overcoming this digital-paper divide in their daily workflow at the moment. They haven't grown up to do so yet, and the tools at hand still hamper a fluent digital workflow. Screen resolutions are too poor. Laptops are too bulky. Wi-Fi is not always available, or only at prohibitive costs. Interaction with a PC through bulky keypads is clumsy, or the keypads are too small. All these little nuisances make a truly digital workflow a utopian vision.

Actually the most popular format for electronic articles in the personal sphere is the PDF. A PDF is fine in print, but a nuisance for reading on computer screens or e-book readers. Most journal articles have a two-column layout, which makes reading a PDF version of an article on an electronic device an arduous task.

Having said all that, I conclude, in accordance with Hull et al., that the current state of personal digital libraries leaves something to be desired. Solving these problems involves a number of stakeholders: the primary publishers of scholarly publications (Elsevier, Springer, Wiley etc.), the secondary publication databases (Scopus, WoS, PubMed etc.), local libraries in their role as gatekeepers to licensed content, the scientists themselves as producers and consumers of scholarly publications, with their willingness to leave the beaten track and adopt new ways of performing science, and last but not least the science managers who rank and rate the performance of their scientists based on the paper trail in the most prestigious scholarly journals.

To date the paper trail is still very visible in all digitally born publications. Have a look at the reference list of a publication: it is still infested with publication years, volumes, issues and page numbers. The publication year is a very amusing example indeed. Many publications appear online in advance of print, and receive an official (paper) publication year only months later. Many journal platforms resolve a reference in the electronic environment to a digital copy through CrossRef or other linking services, but with a printed article at hand this link is literally broken. A brief URI is really how publications should be cited, allowing quick lookup when a computer is at hand.
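Such a brief URI can be built directly from a DOI, which resolves to the publisher's copy without any volume, issue or page numbers. A minimal sketch in Python; `doi_to_uri` is an illustrative helper, not part of any CrossRef toolkit, and the DOI shown is the one for the Hull et al. (2008) paper cited below:

```python
def doi_to_uri(doi: str) -> str:
    """Turn a bare DOI into a short, resolvable citation URI."""
    return f"https://doi.org/{doi}"

# The Hull et al. (2008) paper from the reference list below:
print(doi_to_uri("10.1371/journal.pcbi.1000204"))
# → https://doi.org/10.1371/journal.pcbi.1000204
```

One such line per reference would replace the whole print-era apparatus of volumes, issues and page numbers.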

In the phase of preparing a publication for submission this paper trail becomes obvious as well. The instructions to authors of each journal outshine each other in the most exotic layout requirements for reference lists: small capitals, bold publication years, italics, et cetera. These paper-based instructions to authors take precious time from authors and editors alike in the preparation of the manuscript and in editing and proofreading (Leslie & Hamilton, 2007). All these eye-pleasing variations in the layout of reference lists lead to missed impact, because of the difficulty citation data harvesters like WoS, Scopus or Google Scholar have in interpreting reference lists.

It is interesting to note that in Hull et al.'s description of the URI from Elsevier's Scopus, the paper trail pops its ugly face around the corner yet again. This URI is based on OpenURL, and the simplest designation of the metadata record for an article includes volume, issue and starting page. It is meaningful to a human reader, but in a digital workflow it becomes overly complicated. It is to be foreseen that volumes and issues of journals will cease to exist in the near future anyway.
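To make the print-era flavour of such links concrete, here is a sketch of how an OpenURL-style query is assembled from volume, issue and starting page. The resolver address is hypothetical, the parameter names follow the simple key-value style of OpenURL 0.1, and the metadata values are illustrative:

```python
from urllib.parse import urlencode

# Hypothetical institutional link resolver; real addresses differ per library.
RESOLVER = "https://resolver.example.org/openurl"

def openurl_for_article(issn: str, volume: str, issue: str, spage: str) -> str:
    """Build a simple OpenURL-style query from print-oriented metadata."""
    params = {"issn": issn, "volume": volume, "issue": issue, "spage": spage}
    return RESOLVER + "?" + urlencode(params)

print(openurl_for_article("1553-734X", "4", "10", "e1000204"))
```

Note how every parameter except the ISSN is an artefact of the printed journal; a DOI-based link needs none of them.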

But an OpenURL is better than the example from WoS, where Hull et al. had difficulty making a working URI on the basis of the ISI number included in all records from Web of Science (it is still called the ISI number, despite the company name having changed twice already since ISI was bought by Thomson). When you use EndNote and download a metadata record from Web of Science, EndNote will create a URL on the fly when you hit Ctrl+G, based on the downloaded ISI number. It is a very long and tedious URI, but you can trim some parameters from the URL and end up with a functioning URL with a valid session parameter.
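Trimming such a URL comes down to keeping only the query parameters that identify the record and dropping the rest. A sketch of that pruning step; the long link and its parameter names are made up for illustration, since the actual Web of Science parameters vary:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def keep_params(url: str, wanted: set) -> str:
    """Drop every query parameter that is not in `wanted`."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k in wanted]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(query), ""))

# A made-up long EndNote-generated link; parameter names are illustrative.
long_url = ("http://apps.isiknowledge.com/InboundService.do"
            "?product=WOS&UT=000123456700001&SrcApp=EndNote&Init=Yes"
            "&mode=FullRecord")
print(keep_params(long_url, {"product", "UT", "mode"}))
```

The shortened URL keeps only the record identifier and the parameters the platform needs to resolve it.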

As described by Hull et al., it seems odd that whether you are at a primary publisher or at a secondary publisher's database, a scientist normally has to make two saves: first the metadata, followed by the actual article. Only thereafter can the metadata and the article be reunited in their favourite reference manager. That really is a few saves and clicks too many. Scopus has facilitated downloading primary articles with the help of the Quosa software, but downloading the articles and the metadata are still two separate processes.

In the case of Google Scholar, Hull et al. make a mistake. Google Scholar can work with the link resolver of most institutions' libraries, and with most link resolvers it is possible to download metadata to, for instance, EndNote. The snag here is that it only works on a reference-per-reference basis, making it a tedious task to download the metadata for, say, twenty records from Google Scholar. It is probably the worst download limitation in the scholarly information landscape.

Download limitations are an important point that wasn't raised in the article. These vary highly between database vendors. From Web of Science one can download the metadata for 500 references at once; using the marked list you repeat this for successive sets, so the workaround is to download records 1-500, 501-1000, 1001-1500, and so on. In Scopus the download limit is set at 2000 records. More generous already, but limiting if you move away from personal digital libraries to digital libraries for some text mining work, or serious systems biology work. In our experience this limit is most likely negotiable in the contracts between the library and the database vendor. But the limitations on downloads are highly variable per database and in most cases annoyingly low.
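The workaround of slicing a result set into limit-sized chunks is mechanical enough to script. A small sketch, using the 500-record Web of Science limit mentioned above as the default:

```python
def batch_ranges(total: int, limit: int = 500):
    """Split `total` records into (start, end) ranges no larger than `limit`."""
    return [(start, min(start + limit - 1, total))
            for start in range(1, total + 1, limit)]

print(batch_ranges(1234))
# → [(1, 500), (501, 1000), (1001, 1234)]
```

Each tuple is one pass through the marked-list export; a proper API with batch access would make this bookkeeping unnecessary.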

Following on from downloading, it is sometimes desirable to enhance some of your metadata records with additional metadata, or to update some metadata. The availability of APIs for bibliographic databases becomes desirable on such occasions. Consider for instance that you have downloaded citation data for most of your records; it is logical to want to update this data after some period of time. At this moment this seems to be an impossibility. Documented APIs for bibliographic databases are rare, PubMed's API being the best example of what could be possible in this area. Elsevier seems to be moving in that direction too.
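PubMed's E-utilities illustrate what such a documented API looks like: a stable base URL plus a handful of query parameters, so a batch of records can be re-fetched whenever metadata needs refreshing. A sketch that only constructs the ESummary request URL rather than fetching it; the PMID shown is illustrative:

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esummary_url(pmids) -> str:
    """Build an E-utilities ESummary request for a batch of PubMed IDs."""
    query = urlencode({"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"})
    return f"{EUTILS}/esummary.fcgi?{query}"

print(esummary_url(["18974831"]))
```

Because the interface is documented and batch-oriented, updating the records in a personal library becomes a scripted job rather than a manual re-download.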

I have indicated some additional points for the future agenda of the personal digital library in this blogpost. More takes on the construction of personal digital libraries are possible. The main challenge is to leave the paper trail and enable a purely digital workflow. That will take some time to achieve, and a lot of imagination from all players involved.

Hull, D., S. R. Pettifer and D. B. Kell (2008). Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web. PLoS Computational Biology 4(10): e1000204.
Leslie Jr., D. M. and M. J. Hamilton (2007). A Plea for a Common Citation Format in Scientific Serials. Serials Review 33(1): 1-3. (Subscription required)

The changing face of Elsevier Science

The last couple of days I had the pleasure of attending the Elsevier Development Partners meeting. The exact products they are working on might be of interest to some people, but that's up to Elsevier to announce. What was really the big surprise at this meeting, which lasted three days, was the tone from Elsevier. It was all about open science. They clearly wanted to open up. There was a lot of talk about sharing information, making mash-ups possible, and application programming interfaces (APIs). Elsevier Science wanted to move away from the double-barred information silo to become an open solution provider in the scholarly world. If Elsevier is thinking and acting in this direction, then change will become a major issue for the entire scientific publishing industry, and that is good news for libraries that want to remain a vital service in the future as well.

This change will take time; it doesn't happen overnight. But Raphael Sidi just announced the other day on his blog the Elsevier Article API, listed at ProgrammableWeb. So Elsevier is not only talking, they are acting on it as well.

Let other publishers follow this example!

Transforming Knowledge Services for the Digital Age : Redefining the Research Library

Peter R. Young, Director of the National Agricultural Library in the USA, is the keynote speaker during our opening congress of the Wageningen UR Library.

He starts out by describing the current role of the National Agricultural Library, which serves 110,000 employees of the USDA and also plays an important role in serving the American public. Most interesting is the way he sketches the developments in agricultural research in the USA, and actually in research in general: research is becoming more interdisciplinary, more team-based, more data-intensive, and spread over multi-source channels. As a research library they need integrated services, cyberinfrastructure, and digital archival, preservation and curatorial services.

The challenges for agricultural research that need to be addressed are global climate change research, access to clean water and sanitation, animal and human infectious diseases and, of course, human nutrition. Subsequently he goes into detail on the challenge of food, feed, fiber and fuel, where he presents some scary statistics with respect to predicted meat production in 2050.

Via the modelling approaches of researchers and their data-intensive practices he arrives at the subject of resource discovery. It is interesting how he presents some of the differences between print and digital possibilities in search and discovery tools, content resources and knowledge services, and how he lists the transformational opportunities in a very long list of adjectives describing what a library should represent: visible, innovative, integrated, evolutionary, diverse, authoritative, cooperative, etc.

Towards the end he borrows heavily from Lorcan Dempsey's personal learning landscape, and goes into the developments of Web 2.0. He highlights LibraryThing and Twine, which ties it all together. The latter is still in closed testing.

He poses some challenges of Web 2.0 for libraries:

  • Why do libraries need to catalog and create metadata records?
  • Why not use social networking tools to provide tags?
  • Why worry about access and demand when Google Scholar and Books are so popular?
  • Why should we be concerned about preservation and stewardship of archival digital content?
  • Will research libraries be marginalized, or is a new paradigm emerging?

His main lesson from Web 2.0 is that we need to focus on the library in the user environment rather than the user in the library environment.