Archive

And so science grinds to a halt

Copyright and Creative Commons for an article in Studies in Mycology

This morning I had to look up the citations to an article. It did not show up in WoS immediately, so I had to look around a bit to trace its exact details. I found the article as an open access article on Highwire. No problem.

However, I was struck by the extensive and confusing copyright statements at the top of the abstract. The first line has the classic copyright sign ©, which to me indicates “all rights reserved”, in this case to the CBS Fungal Biodiversity Centre. But the all-rights-reserved sign was followed immediately by their own wording of a Creative Commons license, CC 3.0 in this case.

I was a little bemused by the third clause: “You may not alter, transform, or build upon this work”. Isn’t that what science is all about? Building on previous work?

Another annoying fact is that the DOI is not working. But this is the link to the abstract, and there are plenty of similar examples to be found in “Studies in Mycology”.

Journal quality, an unexpected improvement of the JCR

It is odd to say, but for researchers the journal as an entity is disappearing. Scientists search for information in online databases and decide from the title and abstract whether an article suits their needs. The days when scientists visited the library and browsed the tables of contents of the most important journals to keep up with their field are long gone.

Still, there is a lot of emotion around journal titles. Scientists want to publish their research in the best possible journal. Earlier this year the NOWT (2008) published a report on the performance of Dutch universities, which clearly showed that the field-normalized citation impact of each university correlated positively with its field-normalized journal quality.
Journal quality versus Citation impact

Looking at this graph, it is clear that there is considerable reason to select the best journals in your field to publish your results. However, until recently the only widely available journal quality indicator was the journal impact factor. There has been a lot of criticism of the uses and abuses of impact factors, but they have stood the test of time. All scientists are at least aware of impact factors. For years ISI, and later Thomson Reuters, was in fact the sole gatekeeper of journal quality rankings.

Over the last few years a number of products, free and fee-based, have tried to come up with new and competing journal ranking measures: SCImago JR (based on Scopus data), the journal analyzer from Scopus, Eigenfactor.org and, of course, the data from Thomson’s own Essential Science Indicators.

This week Thomson Reuters announced that they will update the Journal Citation Reports. From the 1st of February we get an entirely new Journal Citation Report. From the press release:

  • Five-Year Impact Factor - provides a broader range of citation activity for a more informative snapshot over time.
  • Journal “Self Citations” – An analysis of journal self citations and their contribution to the Journal Impact Factor calculation.
  • Graphic Displays of Impact Factor “Box Plots” - A graphic interpretation of how a journal ranks in different categories.
  • Rank-in-Category Tables for Journals Covering Multiple Disciplines - Allows a journal to be seen in the context of multiple categories at a glance rather than only a single one.

It is highly unusual to see two updates per year for JCR. But it is interesting to note how they are moving under the pressure of some competition.

Literature:
NOWT (2008). Wetenschaps- en Technologie- Indicatoren 2008. Maastricht, Nederlands Observatorium van Wetenschap en Technologie (NOWT). http://www.nowt.nl/docs/NOWT-WTI_2008.pdf (in Dutch)

Self citations do work

In a very extensive article van Raan has studied the effect of self-citations on the total citations to a group’s work. In the concluding paragraph van Raan writes:

[] external citations are enhanced by self-citations, so that we have the “chain reaction:” Larger size leads to more self-citations, which lead to more external citations. This mechanism is strongest for the lower impact journals—they “make size work”—as well as for higher performance groups. In other words, lower impact journals enable research groups more than do higher impact journals to “advertise” their other work by means of self-citations.

Most interesting to note about this article is that van Raan cited himself 11 times out of 28 references in total, or about 39%. It may seem a bit excessive, but it stresses his point excellently.

Another point that I always stress within the theme of publication strategy is to consider Open Access publishing. Over the last few years I have noted that van Raan is publishing his articles in OA on arXiv. His group has not yet demonstrated the advantage of OA publishing for citation impact scientifically, but the master of scientometrics is putting it into practice anyway. Something to be considered very seriously by every researcher.

Reference
van Raan, A. F. J. (2008). Self-citation as an impact-reinforcing mechanism in the science system. Journal of the American Society for Information Science and Technology, 59(10): 1631-1643. http://arxiv.org/ftp/arxiv/papers/0801/0801.0524.pdf

The mysterious ways of Web of Science

A while back, one of our researchers asked me how Steven Salzberg arrived at the number of citations for the paper on the Arabidopsis genome in Nature. When he checked Web of Science, it only delivered zero citations and that couldn’t be true for such a breakthrough paper. Peter found 2689 citations! How did he do that?

I first checked the paper in Web of Science myself as well, and also found zero citations.

Zero citations from Web of Science for the Arabidopsis papers

I was not entirely surprised since I realized it was one of those consortium papers. I knew Thomson had some problems with a consortium paper in the past. But annoying it was.

I first looked into the issue around the Human Genome Project paper and found it mentioned even in Science Watch from Thomson. But from the article it appeared that Thomson had only improved the tracking of citations to that Human Genome Project paper, and had not addressed the underlying issue. Even though the Arabidopsis paper was older still, the citations to this paper had not been corrected. It appeared that something in the searching or tracking of citations by WoS went wrong, but where was the error being made?

I made a few futile attempts in the cited ref search with Arabidopsis as author, or Arabidopsis*. I searched in the cited ref search for Kaul as author (who is listed at the end of the original article as first author), but that only resulted in some 130 citations. Not sufficient to justify Steven Salzberg’s number of citations. I did not want to use the cited ref search to look for cited articles from Nature in 2000: that is a very large result set, and you have to wade through innumerable pages of results since you can’t refine this type of search by volume or page number. (Wouldn’t that be nice?)

To reassure my inquisitive researcher I pointed him to Scopus (sorry, Thomson), where he could see a reassuring 3000+ citations for himself. Meanwhile I did not have a quick fix for this problem.

It was only later that I looked into the problem again, and somehow I was forwarded to the All Databases search rather than the Web of Science search tab, which I normally use. To my utter amazement the title search this time delivered two records. Both had zero citations, but more importantly, next to [Anon] it showed Ar Gen In as the author.

Now the problem was simple. I had found the author. A cited ref search did indeed yield nearly the 2689 citations reported by Steven Salzberg.

But these figures are not entirely correct either since there are some additional 131 citations to be found with Kaul as a first author reference to Nature with the correct volume and page number.

Of course I asked Web of Science to correct the citation data, but I forgot to include Kaul’s citations. Hopefully this will be repaired at a later date.

But what really makes me wonder is the slight but very important difference in record presentation between the All Databases search and the Web of Science search on Web of Knowledge. For me personally the standard entry into Web of Knowledge is the Web of Science tab. In my normal working routine I would never go to the All Databases tab to look up a number of citations. Just by luck I found the right author name on this occasion. But that shouldn’t have to become the standard way to perform searches, should it?

Research management and research quality

Research performed at our universities is nowadays a heavily directed practice. Top down in most cases. Research for the sake of research has become a rare phenomenon. Research evaluations, research management and research organization are weeding out little pet projects on the side. Grant providers and research funders request concrete results and determine in advance the objectives to be completed.

It is therefore rather odd that in such a strongly organized and managed environment the organization of research itself is so rarely a subject of academic discourse. I still remember my old professor, who once insisted that “we didn’t need knowledge management since we produced knowledge”. And that while after each completed PhD project another successful candidate left the organization with his knowledge written down in a number of articles, and very seldom made explicit within the organization. That did not matter too much to him.

The researchers, research groups and graduate schools at universities in the Netherlands are regularly evaluated by external peer reviews. Productivity, Quality, Relevance and Vitality of the research are the main criteria on which groups are judged. It is odd however that very little study has been made of the underlying explanatory factors of successful groups versus less successful groups. I was therefore pleasantly surprised by an article of van der Weijden et al. (2008) who looked into some aspects of managerial control of research groups on their research performance.

An important shortcoming of their study was that the only bibliometric parameter they looked at was the number of papers produced in the journals covered by Web of Science. It really would have been useful if they had looked at normalized citation impact as one of their variables as well. Apart from the simple bibliometric measure of published peer reviewed articles they also looked at the success of the groups at the attainment of research grants etc.

Their most important finding was that:

“One internal research management activity was found to have a positive relationship with (bio)medical research performance in general. Offering special commendations to (bio)medical (both preclinical and clinical) research staff members, including non-financial prizes, in order to motivate them is positively related to all performance measures used in this study.”

Or, in other words, positive attention from senior managers for what researchers were up to paid off really well.

From the more detailed conclusions another one struck me as very interesting as well:

“Different types of internally organized research evaluation practices have (linear) positive relationships with performance measures concerning external research funding. In preclinical groups pre-evaluations of research proposals have a positive relationship with these performance measures. Interestingly, in clinical groups, positive relationships are found with research output evaluations.”

In practice, external peer reviews are most often met with some degree of resistance. Well, criticism at least. Yet it seems to be worth the effort invested by all participants in these kinds of exercises.

Always good to realize this when our library is involved in the preparation of the peer review of six different graduate schools which involve about 1000 permanent staff and some 3000 researchers in total.

Reference:

van der Weijden, I., D. de Gilder, P. Groenewegen & E. Klasen (2008). Implications of managerial control on performance of Dutch academic (bio)medical and health research groups. Research Policy 37(9): 1616-1629. http://dx.doi.org/10.1016/j.respol.2008.06.007 (subscription required)

Trends in the science and information world

Tomorrow I have to teach a class on better searching for scientific information on the world wide web. In the introduction I try to highlight the major trends in research and the information landscape. I came up with the following two bullet lists.

Trends in science and research

  • Increased multidisciplinarity
  • Increasing cooperation between scientists
  • Internationalization of research
  • Need for primary data
  • More competition for same grant money

Trends in the information world

  • Increased importance of free web resources
  • From information scarcity to overload
  • After A&I databases and journals, currently digitization of books
  • From bibliographic control to fulltext search
  • Open Access & Open Source
  • Multiformity of resources
  • User in control

I wondered if anybody has some additional suggestions for either one of these lists.

One year WoWter.net

I did not notice until today, but the first anniversary of this blog was on November 13th. There are currently 87 posts and 109 comments, which have attracted just over 10,000 unique visitors. Most visitors (2,700) came from the USA, followed by 2,500 from the Netherlands, which surprised me. I would have thought it the other way around. Quite a lot of visitors arrive through my Dutch blog.

The inclusion of this blog in Walt Crawford’s latest book surprised me, but it feels like an honour. I still have to read what he has to say, but ordering through libraries goes a bit slowly.

Maintaining two blogs in different languages took more effort than I thought at the beginning, but the experiment is a success in that both blogs have very different voices. Which I really like.

Defrosting the digital library

Duncan Hull, Steve R. Pettifer and Douglas B. Kell (2008) wrote an interesting review of the current state of personal digital libraries. It is perhaps important to stress that in the end the review focused on personal digital libraries, whereas a lot can also be written about digital libraries at higher aggregation levels. But including those would take another review. Anyway, many of their observations on building personal digital libraries are right and come straight from the workbench of the practicing systems biologist. Still, some additional observations could have been addressed in this review as well.

Today most publications are born digital and distributed digitally, but consumed on paper. I read Hull et al.’s paper mostly on the train and later on the plane. On such occasions you still scribble on paper: make some notes in the margin and highlight some references to check out at a later date.

I could have downloaded it to my laptop or an e-book reader, but the majority of users of digital libraries prefer to download and print a PDF document and peruse the publication at their favourite spot at their own leisure. This digital-paper divide still affects the quality of personal digital libraries. While drafting the first version of this blog post I found out that I had not yet downloaded a copy of the paper to my laptop, or stored the metadata in EndNote. I have to remember to do that at a later date.

I don’t think that current generations of scientists are capable of overcoming this digital-paper divide in their daily workflow at the moment. They haven’t grown up to do so yet, and the tools at hand still hamper a fluent digital workflow. Screen resolutions are too poor. Laptops are too bulky. Wi-Fi is not always available, or only at prohibitive cost. Interaction with the PC through bulky keypads is clumsy, or the keypads are too small. All these little nuisances make a truly digital workflow a utopian vision.

Actually, the most popular format for electronic articles in the personal sphere is the PDF. A PDF is fine in print, but a nuisance for reading on computer screens or e-book readers. Most journal articles have a two-column layout, which makes reading a PDF version of an article on an electronic device an arduous task.

Having said all that, my conclusion is in accordance with Hull et al.: the current state of personal digital libraries leaves something to be desired. To solve these problems a number of stakeholders are involved: the primary publishers of scholarly publications (Elsevier, Springer, Wiley etc…), the secondary publication databases (Scopus, WoS, PubMed etc…), local libraries in their role as gatekeepers to the licensed content, and the scientists themselves as producers and consumers of scholarly publications, with their willingness to leave the beaten track and adopt new ways of performing science. Last but not least, there are the science managers who rank and rate the performance of their scientists based on the paper trail in the most prestigious scholarly journals.

To date the paper trail is still very visible in all digitally born publications. Have a look at the reference list of a publication, and it is still infested with publication years, volumes, issues and page numbers. The publication year is a very amusing example indeed. Many publications appear online in advance of print, and receive an official (paper) publication year only months later. Many journal platforms resolve a link in the electronic environment to a digital copy of the reference through Crossref or other linking services. But with a printed article at hand this link is literally broken. A brief URI is really how publications should be cited, allowing quick lookup when a computer is at hand.
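To make the idea of a brief URI concrete, here is a minimal Python sketch that turns a bare DOI into a resolvable link. The function name `doi_to_uri` is my own invention for illustration; the resolver prefix is simply the one used in the reference lists of this blog.

```python
from urllib.parse import quote

# Resolver prefix as it appears in the reference lists of this blog.
DOI_RESOLVER = "http://dx.doi.org/"


def doi_to_uri(doi: str) -> str:
    """Turn a bare DOI (hypothetical helper, for illustration) into a short URI."""
    doi = doi.strip()
    # Accept DOIs pasted with a leading "doi:" label.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Percent-encode characters that are unsafe in a URL path, but keep "/".
    return DOI_RESOLVER + quote(doi, safe="/")


print(doi_to_uri("doi:10.1371/journal.pcbi.1000204"))
```

No year, volume, issue or page number is needed to build such a link, which is exactly the point.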

In the phase of preparing a publication for submission this paper trail becomes obvious as well. The instructions to authors of each journal try to outdo each other with the most exotic layout requirements for reference lists, such as small capitals, bold publication years, italics et cetera. These paper-based instructions take precious time from authors and editors alike, in the preparation of the manuscript or in the editing and proofreading (Leslie & Hamilton, 2007). All these eye-pleasing variations in the layout of reference lists lead to missed impact, because citation data harvesters like WoS, Scopus or Google Scholar have difficulty interpreting the reference lists.

It is interesting to note that in Hull et al.’s description of the URI from Elsevier’s Scopus, the paper trail rears its ugly head yet again. This URI is based on OpenURL, and the simplest designation of the metadata record for an article includes volume, issue and starting page. It is meaningful to a human reader, but in a digital workflow it becomes overly complicated. It is to be foreseen that in the near future volumes and issues of journals will cease to exist anyway.
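For comparison, a rough Python sketch of what such an OpenURL-style link looks like. The resolver address and the ISSN below are placeholders, and the key names follow the older key/value OpenURL convention as I understand it. Whether Scopus uses exactly these keys is an assumption on my part.

```python
from urllib.parse import urlencode

# Hypothetical link resolver address, for illustration only.
RESOLVER_BASE = "https://resolver.example.org/openurl"


def openurl_for_article(issn, volume, issue, spage):
    """Build an OpenURL-style query from the classic paper-trail metadata."""
    params = {
        "genre": "article",
        "issn": issn,      # placeholder value in the example below
        "volume": volume,
        "issue": issue,
        "spage": spage,    # starting page (or article number)
    }
    return RESOLVER_BASE + "?" + urlencode(params)


# Volume, issue and article number taken from the Hull et al. reference.
print(openurl_for_article("0000-0000", 4, 10, "e1000204"))
```

Four bibliographic fields are needed where a DOI needs a single string.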

But an OpenURL is better than the example from WoS, where Hull et al. had difficulty making a working URI on the basis of the ISI number included in all records from Web of Science (it is still called the ISI number, despite the company name having changed twice since ISI was bought by Thomson). When you use EndNote and download a metadata record from Web of Science, EndNote creates a URL on the fly when you hit Ctrl+G, based on the downloaded ISI number. It is a very long and tedious URI, but you can trim some parameters from it and end up with a functioning URL with a valid session parameter.

As described by Hull et al., it seems odd that, whether you are at a primary publisher or at a database from a secondary publisher, a scientist normally has to make two saves: first the metadata, followed by the actual article. Only thereafter can the metadata and the article be reunited in their favourite reference manager. That really is a few saves and clicks too many. Scopus has facilitated downloading primary articles with the help of the Quosa software, but downloading the articles and the metadata are still two separate processes.

In the case of Google Scholar, Hull et al. make a mistake. Google Scholar can work with the link resolvers of most institutions’ libraries, and with most link resolvers it is possible to download metadata to, for instance, EndNote. The snag here is that it only works one reference at a time, making it a tedious task to download the metadata for, say, twenty records from Google Scholar. Probably the worst download limitation in the scholarly information landscape.

Download limitations are an important point that wasn’t raised in the article. These vary widely between database vendors. From Web of Science one can download the metadata for 500 references at once. But using the marked list you can repeat this for successive sets, so the workaround is to download records 1-500, 501-1000, 1001- etc… In Scopus the download limit is set at 2000 records. More generous already, but limiting if you move away from personal digital libraries to digital libraries for some text mining work, or serious systems biology work. In our experience, this limit is most likely negotiable in the contracts between the library and the database vendor. But the limitations on downloads are highly variable per database and in most cases annoyingly low.
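The workaround of downloading in successive sets is easy to plan in advance. A small Python sketch, in which the function name and the total of 1200 records are made up for the example and only the limit of 500 comes from the Web of Science case above:

```python
def batch_ranges(total_records, batch_size=500):
    """Yield 1-based, inclusive (first, last) record ranges covering the whole set."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for first in range(1, total_records + 1, batch_size):
        # The final batch may be smaller than batch_size.
        last = min(first + batch_size - 1, total_records)
        yield first, last


# A result set of 1200 records (made-up figure) needs three downloads.
for first, last in batch_ranges(1200):
    print(f"download records {first}-{last}")
```

With the Scopus limit you would call `batch_ranges(total, 2000)` instead.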

Following on from downloading, it is sometimes desirable to enhance some of your metadata records with additional metadata, or to update some metadata. The availability of APIs for bibliographic databases becomes desirable on such occasions. Consider for instance that you have downloaded citation data for most of your records. It is logical to want to update this data after some period of time. At this moment this seems to be an impossibility. Documented APIs of bibliographic databases are rare, PubMed’s API being the best example of what could be possible in this area. Elsevier seems to be moving in that direction too.
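As an impression of what a documented API makes possible, here is a Python sketch that builds a request to the ESummary service of NCBI’s E-utilities for a handful of PubMed records. The helper name and the PMIDs are placeholders of mine, and the sketch only constructs the URL without fetching anything. Note that ESummary returns document summaries, so it would serve for refreshing stored metadata, not for citation counts.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esummary_url(pmids, retmode="json"):
    """Build an ESummary request URL for a list of PubMed IDs (hypothetical helper)."""
    params = {
        "db": "pubmed",
        "id": ",".join(str(pmid) for pmid in pmids),  # comma-separated ID list
        "retmode": retmode,
    }
    return EUTILS_BASE + "esummary.fcgi?" + urlencode(params)


# Placeholder PMIDs, not real records from my library.
print(esummary_url([11111111, 22222222]))
```

Combined with a stored list of identifiers, a script like this could refresh a personal library in one run instead of record by record.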

In this blog post I have indicated some additional points for the future agenda of the personal digital library. More takes on the construction of personal digital libraries are possible. The main challenge is to leave the paper trail behind and enable a purely digital workflow. That will take some time to achieve, and a lot of imagination from all players involved.

References
Hull, D., S. R. Pettifer and D. B. Kell (2008). Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web. PLoS Computational Biology 4(10): e1000204. http://dx.doi.org/10.1371/journal.pcbi.1000204
Leslie Jr., D. M. and M. J. Hamilton (2007). A Plea for a Common Citation Format in Scientific Serials. Serials Review 33(1): 1-3. http://dx.doi.org/10.1016/j.serrev.2006.11.009 (Subscription required)

The changing face of Elsevier Science

The last couple of days I had the pleasure of attending the Elsevier Development Partners meeting. The exact products they are working on might be of interest to some people, but that’s up to Elsevier to announce. What was really the big surprise at this meeting, which lasted three days, was the tone from Elsevier. It was all about open science. They clearly wanted to open up. There was a lot of talk about sharing information, making mash-ups possible, and application programming interfaces (APIs). Elsevier Science wanted to move away from the double-barred information silo to become an open solution provider in the scholarly world. If Elsevier is thinking and acting in this direction, then change will become a major issue for the entire scientific publishing industry, and that is good news for libraries that want to remain a vital service in the future as well.

This change will take time. It doesn’t happen overnight. But Raphael Sidi announced on his blog just the other day that the Elsevier Article API is listed at ProgrammableWeb. So Elsevier is not only talking, they are acting upon it as well.

Let other publishers follow this example!

Allow me to introduce to you

A fellow Dutch library blogger, Jan Klerk, just started a new library blog in English called “Biebzone beta”. In his daily life Jan is a manager at the public library in Haarlem. He has built himself a nice reputation over the last couple of years as a thoughtful library blogger at his other blog, Jan Tweepuntnul (2.0, that is).

A quote from his current post illustrates this thoughtfulness perhaps a little:

It’s all about argument and counterargument. It’s about listening carefully and reading and writing carefully. 

I really appreciate his step to present some more of the wheelings and dealings of Dutch public libraries to a larger (international) audience. In this way the rest of the world can have a closer look at (public) library developments in the Netherlands.