New webometrics ranking of world universities released

Of all the university rankings that are available, the Webometrics Ranking of World Universities occupies an odd place. It looks only at the web presence and performance of universities. The rankings were updated earlier this month.

I have mixed feelings about their approach, but it is a prelude to newer rankings beyond those based solely on scholarly output and impact. However, I think their approach needs more time and better tools than are available at the moment. The leading researchers in this field are in the group of Mike Thelwall. Their measurements are based on their own crawlers and tools to explore, measure and investigate the academic web, so they can understand and interpret their results completely. The Cybermetrics Lab (CINDOC), which produces the Webometrics rankings, uses publicly available tools such as Yahoo!, Google and Exalead, over which they have no control. More importantly, they don't know whatsoever how those results come about. Another problem with Google, for instance, is that the reported number of search results is notoriously unreliable; it depends, among other things, on the time of day, web traffic, server load at Google and the data centre that happens to answer the query.

So for the moment we have to take these results with a spoonful of salt rather than a pinch. It is also questionable what is actually being measured. Take for instance the size of university websites. In Utrecht all staff and students appear to have personal webpages on the university website. These are all included in the count, whether they actually contain any useful information or not. At our university the mainstay of the indexed webpages consists of catalogue records from the library. I really wonder whether you want to compare these apples and oranges.

As for the measure of rich files, I really wonder whether they have been able to harvest all the material deposited in our repository. Looking at the statistics provided by OAIster on OA harvestable documents, Wageningen University has one of the larger content-rich repositories in the Netherlands. In the Webometrics ranking we are the bottom fish for this measure in the Netherlands. That we make use of proprietary software while still adhering to the OAI-PMH protocol, or that the repository is hosted as a directory (http://library.wur.nl/way), should not affect the rankings the way it apparently does at the moment.
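For those who want to check for themselves what a harvester actually gets to see, something along these lines will do. This is a minimal sketch, assuming the repository exposes a standard OAI-PMH endpoint; the base URL below is a made-up placeholder, not our repository's real endpoint.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
BASE_URL = "https://repository.example.org/oai"  # placeholder endpoint

def count_records(base_url, metadata_prefix="oai_dc"):
    """Count record headers via ListIdentifiers, following resumption tokens."""
    url = f"{base_url}?verb=ListIdentifiers&metadataPrefix={metadata_prefix}"
    total = 0
    while url:
        with urllib.request.urlopen(url) as response:
            tree = ET.parse(response)
        total += len(tree.findall(f".//{OAI_NS}header"))
        token = tree.find(f".//{OAI_NS}resumptionToken")
        if token is not None and (token.text or "").strip():
            url = (f"{base_url}?verb=ListIdentifiers"
                   f"&resumptionToken={urllib.parse.quote(token.text.strip())}")
        else:
            url = None
    return total

if __name__ == "__main__":
    print("records exposed for harvesting:", count_records(BASE_URL))
```

The total this reports is what an aggregator such as OAIster can see; if a ranking counts far fewer rich files than that, the gap is on their side of the fence.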

For other indicators they are completely vague about the exact measure. Take for instance the Google Scholar indicator. They state: “Google Scholar provides the number of papers and citations for each academic domain. These results from the Scholar database represent papers, reports and other academic items.” How do they combine publications and citations in a single measure? It is not explained. Google never gives more than the first 1000 results, so how do they arrive at all citations for an institute? How did they search for the name of an institute? Did they include medical teaching hospitals with the university?

I do use these rankings for one purpose, though: to push for improvements to our university and library websites wherever possible. In some respects that is badly needed. But I would really like to take these rankings more seriously, and for the moment I can't. That the rankings have been updated again should be the message of this post, since their blog has been defunct for quite some time already. A pity.

Google and the academic Deep Web

Hagedorn and Santelli (2008) just published an interesting article on how comprehensively Google indexes academic repositories. It triggered me to write up some observations I had been intending to make for quite some time. It also addresses a question I got from a colleague of mine, who observed that the deep web apparently doesn't exist anymore.

Google has started to index Flash files. It has also started to retrieve information hidden behind search forms on the web, i.e. to index information contained in databases. Google and OCLC exchange information on scanned books and on those contained in WorldCat. Google, so it seems, has indexed the web comprehensively, with 1 trillion indexed webpages. Could there possibly be anything more to index?

The article by Hagedorn and Santelli shows convincingly that Google still has not indexed all the information contained in OAIster, the second largest archive of open access article metadata; only Scientific Commons is more comprehensive. They tested this with the Google Research API, part of the University Research Program for Google Search, and only checked whether each URL was present in the index. This approach reveals only part of the depth of the academic deep web, but the figures are staggering already. And reality bites even harder.
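Their presence check is simple enough to describe in a few lines. This is a sketch of the idea only; the url_is_indexed() helper is a placeholder, since their study used Google's University Research Program API, to which I have no access.

```python
from typing import Iterable

def url_is_indexed(url: str) -> bool:
    """Placeholder: wire this up to whatever search API you have access to."""
    raise NotImplementedError

def index_coverage(urls: Iterable[str]) -> float:
    """Fraction of sampled repository URLs that the search engine knows about."""
    urls = list(urls)
    if not urls:
        return 0.0
    found = sum(1 for url in urls if url_is_indexed(url))
    return found / len(urls)
```

Note that such a check says nothing about whether the full text behind each URL has been indexed, which is exactly where reality bites, as the example below shows.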

A short while ago I taught a web search class for colleagues at the university library in Leiden. To demonstrate what the deep or invisible web actually constitutes, I used an example from their own repository: a thesis on cannabis from last year, deposited as one huge PDF of 14 MB. Using Google you can find the metadata record, and with Google Scholar as well. However, if you search for a quite specific sentence from the opening pages of the actual PDF file, Google does not return the sought-after thesis. You find three other PhD dissertations, two of them defended at the same university on the same day, but not the one on cannabis.

Interestingly, you are able to find parts of the thesis in Google Scholar, e.g. chapter 2, chapter 3, etc. But those are the chapters that have been published elsewhere in scholarly journals. Unfortunately, none of these parts in Google Scholar refers back to the original thesis, which is in open access, nor have they been posted as OA journal article pre-prints in the Leiden repository. In Google Scholar most of the material is still behind toll gates at publishers' websites.

Is Google to blame for this incomplete indexing of repositories? Hagedorn and Santelli do point the finger at Google. However, John Wilkin, a colleague of theirs, doesn't agree. Neither did Lorcan Dempsey. And neither do I.

I have taken an interest in the new role of librarians. We are no longer solely responsible for bringing external (documentary) resources into the realm of our academic clientele. We also have the dear task of bringing the fruits of their labour as well as possible into the floodlights of the outside world, be the interest academic or plainly lay. We have to get the information out there. Open Access plays an important role in this new task, but the task doesn't stop at simply making material available on the web.

Making it available is only a first, essential step. Making it rank well is a second, perhaps even more important step. So as librarians we have to become SEO experts. I have mentioned this here before, as well as on my Dutch blog.

So what to do about this example from the Leiden repository? There is actually a slew of measures that could be taken. The first, of course, is to divide the complete thesis into parts, at chapter level. Admittedly, publishers give permission to include their published articles, of which most theses in the natural sciences in the Netherlands consist, only when the thesis is published as a whole. On the other hand, nearly 95% of publishers allow publication of pre-prints and peer-reviewed post-prints: the so-called RoMEO green road. So it is up to the repository managers, preferably with the consent of the PhD candidate, to split the thesis into its parts (the chapters, which are the pre-prints or post-prints of articles) and archive it at chapter level as well. The record for the thesis then contains links to the individual chapters deposited elsewhere in the repository, which makes it a set of far more digestible chunks of information, better palatable for search engine spiders and crawlers.
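To make concrete what such a chapter-aware record boils down to, here is a rough sketch. It assumes the whole/part links are expressed with Dublin Core's dcterms:hasPart; all identifiers below are made up for illustration.

```python
# A thesis record that points to its separately deposited chapters.
thesis_record = {
    "dc:title": "Example PhD thesis",
    "dc:type": "Doctoral thesis",
    "dc:identifier": "https://repository.example.org/thesis/12345",
    # chapters deposited separately as article pre-prints or post-prints
    "dcterms:hasPart": [
        "https://repository.example.org/article/12346",  # chapter 2
        "https://repository.example.org/article/12347",  # chapter 3
    ],
}

# Crawlers that reach the thesis record can follow these links to the
# smaller, chapter-sized deposits.
for part in thesis_record["dcterms:hasPart"]:
    print("links to chapter-level deposit:", part)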

An interesting side effect of this additional effort on the repository side is that deposit rates will increase considerably. This applies to most universities in the Netherlands, and to our collection of theses as well. Since PhD students are responsible for the lion's share of academic research at the university, depositing the individual chapters as article pre-prints in the repository will be of major benefit to the university's OA performance. It will require more labour on the side of repository management, but if we take this seriously it is well worth the effort.

We still have to work really hard at the visibility of our repositories, but making the information more palatable is a good start.

Reference:
Hagedorn, K. and J. Santelli (2008). Google still not indexing hidden web URLs. D-Lib Magazine 14(7/8). http://www.dlib.org/dlib/july08/hagedorn/07hagedorn.html

Thomson Reuters adds citation maps to Web of Science

New citation map feature from Web of Science

A while ago Thomson Reuters heralded their new database Thomson Innovation. One of the strong points of the new platform is its visualization tools, such as the citation maps. With these tools, users can quickly analyze the patents cited as references by the focal patent, as well as those that have since cited it. An article in R&D Magazine described the tool in more detail.

This evening I found out that these citation maps have been introduced in Web of Science as well. Still in beta, but it is a nice spill-over from the new Thomson Innovation platform. It allows you to browse from article to article, and it is indeed visually very attractive. I have to play around with it a little more before I fully comprehend the real advantages.

Another database that has had citation maps for a little longer is HighWire, but those I have never used seriously. We will see what we can learn from the comparison in the near future.

I just noticed that the feature was announced in the June 2008 update of the “What’s New?” items. What I noticed there as well is that you can finally use your browser's back button in Web of Science. Wow! That’s what is called innovation.

Innovative use of Twitter in libraries

I have been watching the Peace Palace Library using Twitter for quite some time already. They use it as one of various means to inform their users; apart from Twitter they use mail, chat and RSS to broadcast messages. Their use of Twitter is mainly for informing users about updates, system changes and that kind of thing. Short messages, of course.

I was therefore interested in the application at the library of the Technical University of Hamburg-Harburg, where they have implemented Twitter as a document stream for their electronic repository (which they prefer to call a document server). To me this makes a lot of sense. Too many libraries treat their repository as just one of their ordinary databases: it sits there and that's about it. Okay, they use OAI-PMH to make it possible to exchange information, and that is important indeed.

But it shouldn’t stop there. Libraries should try their utmost to broadcast or syndicate the content of their repositories as widely as possible. They have been entrusted with the task of making the rest of the world aware of the valuable publications the researchers of their alma mater have produced. Relying on OAI-PMH alone is not sufficient to reach that goal.

RSS is an absolute necessity, if only to trickle-feed the Googles of this world with fresh information. But RSS is also an excellent tool for getting your content to appear in other places on the web. So RSS on your repository is a prerequisite; let me be clear about that beforehand.

Today I was amused by this ingenious use of Twitter to syndicate updates of the repository. It is up to users to subscribe to the stream if they wish to. For my own blogs I observe some conversion from their Twitter streams; it is not much in comparison to RSS, but if you can please some of your clients with this form of syndication, and the implementation costs are next to nothing, then why not? Why not give it a try and see how it works out?
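For what it is worth, the glue needed for such an experiment is indeed minimal. Here is a sketch, assuming the repository offers an RSS feed; the feed URL and the post_status() helper are placeholders for whatever feed and micro-blogging API you actually use.

```python
import feedparser  # third-party library: pip install feedparser

FEED_URL = "https://repository.example.org/rss"  # placeholder feed

def post_status(message: str) -> None:
    """Placeholder for the actual micro-blogging API call."""
    print(message)

def syndicate(feed_url: str, seen: set) -> None:
    """Announce every feed entry that has not been announced before."""
    feed = feedparser.parse(feed_url)
    for entry in feed.entries:
        if entry.link not in seen:
            post_status(f"New in the repository: {entry.title} {entry.link}"[:140])
            seen.add(entry.link)

if __name__ == "__main__":
    syndicate(FEED_URL, seen=set())
```

Run it from a scheduled job and keep the set of seen links somewhere persistent, and the repository announces itself.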

I love these small experiments.

hattip: netbib

How Wiley made a mess of the Synergy and InterScience integration

Two weeks ago we were forewarned that Wiley would integrate all the content of Blackwell Synergy into the Wiley InterScience platform. It would only disrupt service over the weekend of June 28-29. When I received this notification I immediately thought of Péter’s Picks & Pans (Jacsó 2007), in which he investigated the capabilities of both platforms.

Just a few quotes from his review:

A merger of the Blackwell Synergy and the Wiley Interscience collections using the software of the latter would certainly not produce Synergy. On the contrary, the serious software deficiencies in Interscience would weaken performance and functionality of Blackwell Synergy, which uses the excellent Atypon software.

[Synergy] This is a very well-designed system enhanced by complementary information – as you should expect these days.

Wiley made no efforts to improve its software. The software keeps fooling itself and the searchers by offering dysfunctional and nonsense options.

It is a severe sign of dementia when people do not recognize their own name. So is the syndrome that Wiley keeps listing some of its very own journal some of the time under the label “Cited Articles available from other publishers” and/or keeps ignoring them in the citation tracking.

In a subsequent chat with our serials librarian, he indicated that behind the scenes he much preferred the Blackwell Synergy platform over the Wiley InterScience platform. From my own viewpoint I regretted this move as well, since Blackwell had been COUNTER compliant for quite some time, with audited COUNTER reports, whereas Wiley InterScience was and still is not COUNTER compliant. That is a very serious shortcoming for one of the largest scientific publishing houses.

So with this announcement of abandoning the Synergy platform, users had something to lose in terms of ease of use, and librarians as well.

What was intended to take a mere weekend has continued for a whole week. All Dutch university libraries have faced problems with access to both Wiley and Blackwell journals. We will have to wait and see whether the problems are resolved during this weekend. Meanwhile I find it disappointing that Wiley makes no mention of these problems on its transition page.

Facing these problems, I can only pay a compliment to Péter, who already foresaw in March 2007 what was coming our way: “A merger of the Blackwell Synergy and the Wiley Interscience collections using the software of the latter would certainly not produce Synergy”.

Reference
Jacsó, P. (2007). SpringerLink, Blackwell Synergy, Wiley InterScience. Online (Jul/Aug 2007): 49-51. http://www.jacso.info/PDFs/jacso-springerlink-blackwell-wiley.pdf