The mysterious ways of Web of Science

A while back, one of our researchers asked me how Steven Salzberg arrived at the number of citations for the paper on the Arabidopsis genome in Nature. When he checked Web of Science, it returned zero citations, and that couldn’t be true for such a breakthrough paper. Salzberg found 2689 citations! How did he do that?

I first checked the paper in Web of Science myself, and also found zero citations.

Zero citations from Web of Science for the Arabidopsis papers

I was not entirely surprised since I realized it was one of those consortium papers. I knew Thomson had some problems with a consortium paper in the past. But annoying it was.

I first looked into the similar issue around the Human Genome Project paper and found it mentioned even in Thomson’s own Science Watch. But from that article it appeared that Thomson had only improved the citation tracking for that Human Genome Project paper, and had not addressed the underlying issue. Even though the Arabidopsis paper was older still, the citations to it had not been corrected. Something in the way WoS searched or tracked citations went wrong, but where was the error being made?

I made a few futile attempts in the cited reference search with Arabidopsis as author, or Arabidopsis*. I then searched the cited reference search for Kaul as author (listed at the end of the original article as first author), but that resulted in only some 130 citations, not enough to account for Salzberg’s number. I did not want to use the cited reference search to look for all cited articles from Nature in 2000: that is a very large result set, and you have to wade through innumerable pages of results since you can’t refine this type of search by volume or page number. (Wouldn’t that be nice?)

To reassure my inquisitive researcher I pointed him to Scopus (sorry, Thomson), where he could see a reassuring 3000+ citations for himself. Meanwhile, I did not have a quick fix for the problem.

It was only later, when I looked into the problem again, that I was somehow forwarded to the All Databases search rather than the Web of Science search tab, which I normally use. To my utter amazement the title search this time delivered two records. Both had zero citations, but more importantly the result showed Ar Gen In next to [Anon] as the author.

Now the problem was simple. I had found the author. A cited reference search on that name indeed yielded nearly the 2689 citations that Steven Salzberg reported.

But these figures are not entirely complete either, since some 131 additional citations can be found with Kaul as first author in references to Nature with the correct volume and page number.
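For anyone who wants to combine such result sets by hand, the bookkeeping is just a union of the citing records with double hits dropped. Below is a minimal Python sketch of that idea. The record identifiers and the `merge_citing_records` helper are made up for illustration and say nothing about how Web of Science works internally.

```python
def merge_citing_records(*result_sets):
    """Return the union of citing-record IDs from several searches.

    Each argument is an iterable of record IDs exported from one
    cited reference search (one per author variant, for example).
    """
    merged = set()
    for records in result_sets:
        merged.update(records)  # a set silently drops double hits
    return merged


# Hypothetical record IDs, standing in for two exported result lists.
anon_variant = ["rec-001", "rec-002", "rec-003"]
kaul_variant = ["rec-003", "rec-004"]

combined = merge_citing_records(anon_variant, kaul_variant)
print(len(combined))  # 4 distinct citing records, since rec-003 appears in both lists
```

Simply adding the two counts would overstate the total whenever a citing article turns up under both variants, which is why the sketch uses a set rather than a sum.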

Of course I asked Web of Science to correct the citation data, but I forgot to include Kaul’s citations. Hopefully this will be repaired at a later date.

But what really makes me wonder is the slight, but very important, difference in record presentation between the All Databases search and the Web of Science search on Web of Knowledge. For me personally the standard entry point in Web of Knowledge is the Web of Science tab. In my normal working routine I would never go to the All Databases tab to look up a number of citations. It was just by luck that I found the right author name on this occasion. But that shouldn’t have to become the standard way to perform searches, should it?

2 thoughts on “The mysterious ways of Web of Science”

  1. Hi Wouter! You ask a very good question – how did I find all those citations for the Arabidopsis genome paper at ISI’s website? Well, I have to admit that I first tried the same things you did – using wildcards around the consortium name (“Arabidopsis Genome Initiative”), and I also got the zero citations hit. But I knew this paper had 1000’s of citations – it had to!

    So I cheated.

    I wrote directly to the staff at ISI, with whom I’ve corresponded before, and they did a bunch of different searches for me, with different wildcards. They took a union of all the hits (ignoring double hits) and that’s how I got the total of 2,689. I still doubt that this is the true total, but it’s the best I could do, and it seems about right given the importance of this genome to the entire plant biology community.

    I also asked ISI to try to fix this problem, but I’m not optimistic that they will – it’s probably difficult for them and will require significant changes in their software.

  2. @Steven,

    That’s a revealing answer! Well, I sincerely hope our efforts show ISI that they have to act on this matter. I’ll keep it on my radar.