Searching for Science

For a while now (say a year and a half or so) I have been teaching, at regular intervals, a course on finding scholarly information with freely available resources on the Web. The course is titled “Searching for Science”. The course material is freely available in one of my wikis. The main reason for using a wiki to present a course like this is that linking to examples on the Web works so much more smoothly than from a PowerPoint.

As for today’s course, a small group attended: 4 researchers and 5 (mostly international) students. A nice mix. I really enjoyed it, and I think they did as well. At least they gave me a really positive evaluation.
During the course I spend about three quarters of the morning, say a little over two hours, on general search tactics: search engines and their commands, Web directories and the Deep Web. In the evaluations I always get the feedback that plain Google commands and search tips earn the most Brownie points. What’s always interesting is an exercise in which we compare how well various scholarly search engines, plus Live Academic, retrieve a known article from an OA repository in the Netherlands. I ask the students to search with the full title of an article and then repeat the exercise with a sentence from the discussion section of the article. The outcome is always interesting to see. As usual, Live Academic failed entirely. Google Scholar did reasonably well on both, but today Scirus and Scientific Commons only worked with the title words. These outcomes can be different again tomorrow, and they are always difficult to explain.

Meanwhile, I find real gratification in pointing my students to some of the OA discussions as well, while covering collections of OA journals, repositories, or Open CourseWare sources.

On most occasions the participants are entirely new to some of the Science 2.0 developments. RSS? Never heard of it. So I introduce them to Bloglines, Netvibes and Google Reader, and show them something about scholarly blogs, social bookmarking for scientists, and Digg.

We do actually have a course on Science 2.0 in the planning for somewhere in April. It still needs a lot of development, though. But it will be interesting.

Three studies on OA repositories

Too much to read and comprehend at once, but three reports on the status of the European repositories have been released. In the (Dutch) press release I noted that they talked about the findability of research reports, and did not mention their availability. That struck me, since I see many repositories in the Netherlands functioning as metadata repositories rather than full-text repositories. But I have to admit, I should read the reports first.

Weenink, Kasja, Leo Waaijers and Karen van Godtsenhoven (eds.), A DRIVER’s Guide to European Repositories: Five studies of important Digital Repository related issues and good Practices (AUP: Amsterdam, 2007) ISBN 9789053564110, 200p.

This DRIVER’s guide is a practical guide for repository managers and institutions setting up and developing a repository and extra services. The guide describes five essential aspects of realizing and amplifying repositories: the business plan, intellectual property rights, storing research data, curation of data, and the long-term preservation of data. The authors have opted for workable solutions that are applicable at local and national levels.

Maurits van der Graaf and Kwame van Eijndhoven, The European Repository Landscape: Inventory study into present type and level of OAI compliant Digital Repository activities in the EU (Amsterdam University Press, 2008) ISBN 9789053564103, 144p.

What is the current state of digital repositories for research output in the European Union? What should be the next steps to stimulate an infrastructure for digital repositories at a European level? To address these key questions, an inventory study into the current state of digital repositories for research output in the European Union was carried out as part of the DRIVER Project. The study produces a complete inventory of the state of digital repositories in the 27 countries of the European Union as of 2007 and provides a basis to contemplate the next steps in driving forward an interoperable infrastructure at a European level.

Muriel Foulonneau and Francis André, Investigative study of standards for Digital repositories and related services (Amsterdam University Press, 2008) ISBN 9789053564127. 112p.

This study is meant for institutional repository managers, service providers, repository software developers and, generally, all players taking an active part in creating the digital repository infrastructure for e-research and e-learning. It reviews the current standards, protocols and applications in the domain of digital repositories. Special attention is paid to the interoperability of repositories, to enhance the exchange of data between repositories. It aims to stimulate discussion about these topics and supports initiatives for the integration of existing standards and, where needed, the development of new ones. The authors also look at the near future: which steps have to be taken now in order to comply with future demands?

What amazes me most is that I can only find a press release in Dutch. The DRIVER website hasn’t got the news yet…
Well, Peter, it is up to you…

Impact factors and Scimago JR compared

In December I promised to take a more detailed look at the newly launched SCImago Journal & Country Rank database. SCImago has attracted some attention in the blogosphere outside Spain since December, and got some serious attention from Declan Butler in a news item in Nature (subscription required).

It is too early for thorough in-depth investigations of this new database, but the better blog reactions were at Information Research (with a second post as well) and the BioMed Central Blog. Both had an element of self-interest: checking where their own journals stand in this new database. We have to wait a bit longer for reviews in the scholarly literature, I’m afraid.

Meanwhile, I have looked into this database a bit more closely, and in this blog post I report some of my findings. My reason for looking into it is mainly that it allows us librarians to evaluate the rankings of a larger set of journals in a quantitative way. Impact factors have long played a role in decisions on journal subscriptions and cancellations (albeit not as the sole criterion). My main question: how does the SJR compare to the impact factor?

SJR is “an indicator that expresses the number of connections that a journal receives through the citation of its documents divided between the total of documents published in the year selected by the publication, weighted according to the amount of incoming and outgoing connections of the sources.” In essence, the SJR is a PageRank-type indicator, in which citations from highly ranked journals increase the ranking of the receiving journal.
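The exact SJR computation is internal to SCImago, but the PageRank idea behind it can be sketched in a few lines. Below is a minimal, hypothetical illustration (the function name, citation counts and document totals are mine, not real SJR data): prestige flows from citing journals to cited journals, weighted by each citing journal's own prestige, and the result is normalized by the number of documents published.

```python
def prestige(cites, docs, damping=0.85, iterations=50):
    """PageRank-style journal prestige, normalized per published document.

    cites[i][j] = citations from journal i to journal j (hypothetical counts).
    docs[j]     = number of documents published by journal j.
    """
    n = len(cites)
    score = [1.0 / n] * n  # start with uniform prestige
    for _ in range(iterations):
        new = []
        for j in range(n):
            incoming = 0.0
            for i in range(n):
                total_out = sum(cites[i])
                if total_out:
                    # journal i passes on its prestige in proportion
                    # to the share of its citations going to journal j
                    incoming += score[i] * cites[i][j] / total_out
            new.append((1 - damping) / n + damping * incoming)
        score = new
    # size-normalize: divide prestige by the number of documents published
    return [s / d for s, d in zip(score, docs)]

# Three hypothetical journals: journal 0 is cited by both others
cites = [[0, 2, 1], [1, 0, 1], [0, 1, 0]]
docs = [1, 1, 1]
scores = prestige(cites, docs)
```

With equal document counts, the raw prestige values sum to one; the point of the sketch is only that a citation from a high-prestige source counts for more than one from a low-prestige source, which is what distinguishes this family of indicators from a plain citation count such as the IF.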

To gain more understanding of the SJR, I have looked at the journals in the subject category ‘Library and Information Science’. This category includes some 98 journals. It is important to note that SCImago JR has a much more refined subject categorization than Scopus itself exposes, although I speculate that this categorization may be present under the hood in Scopus as well. The corresponding category in JCR is ‘Information Science & Library Science’, which contains 53 journals.

It is really easy to transfer the data from SCImago JR to Excel, whereas JCR always takes a few more clicks (making a marked list and using the print export) to get the data into Excel. Interestingly, in the web environment SCImago uses European number notation, with commas indicating the fraction and dots indicating thousands. On transfer to Excel this is corrected automatically. A minor drawback of SCImago is that ISSN numbers are lacking from the exported data; in JCR, the full journal titles are not exported.
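For anyone transferring SCImago figures into another tool by hand, the notation difference amounts to a simple transformation (the helper name below is mine, just a sketch):

```python
def eu_to_float(s):
    """Convert a European-notation number string, e.g. '1.234,56', to a float.

    Dots are thousands separators; the comma marks the decimal fraction.
    """
    return float(s.replace(".", "").replace(",", "."))

eu_to_float("1.234,56")  # 1234.56
eu_to_float("0,038")     # 0.038
```

Spreadsheet programs usually handle this via their locale settings, as Excel apparently did here, but a one-liner like this is handy when the data passes through a script first.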

The journals from JCR were matched manually against the journals from SCImago, since a shared field was missing. Only a few journals from JCR were not found directly in the downloaded SCImago list. The ‘Journal of the American Medical Informatics Association’, ‘Information and Management’ and ‘Journal of Scholarly Publishing’ were included in journal categories other than ‘Library and Information Science’. Furthermore, the ‘International Journal of Geographical Information Science’ was included twice in the list of Library and Information Science journals, at rank 5 and again at rank 33; in the processing, the entry at rank 33 was dropped. In the JCR, the Journal of Government Information is still included, although it was already merged into Government Information Quarterly in 2005 (the calculation of its IF in JCR 2006 is indeed based on only a single year of data). Two other journals included in JCR and in Scopus, Online and EContent, were not to be found in SCImago. This is not a great loss, since these are trade journals rather than peer-reviewed scholarly journals, but that applies to some other journals included in the table as well, e.g. The Scientist and Library Journal. In the end, 50 journals from SCImago and JCR in the LIS field could be matched. The full list of journals included in this little study is linked as a Google Document.

Looking at the table, it is apparent that the maximum value of the SJR is an order of magnitude smaller than that of the impact factors. At the lower end of the scale, impact factors become zero, whereas the lowest SJR value in this set of journals is 0.038.
In Figure 1, I have plotted the IF against the SJR. There seems to be a strong relationship between the SJR and the IF, albeit with some outliers from an apparently linear relationship. Interestingly, these three outliers are LIS journals on medical librarianship: the Journal of the American Medical Informatics Association (JAMIA), the Journal of Health Communication, and the Journal of the Medical Library Association. MIS Quarterly is not regarded as an outlier, since it clearly lies on the trendline underlying the other data points.

Figure 1

I think the three outliers really illustrate the point that the SJR is a PageRank-type indicator. The three medically oriented journals receive relatively many citations from highly ranked medical journals. Checking this for JAMIA in Scopus, we find citations from journals such as Pediatrics (SJR = 0.528), Annals of Internal Medicine (SJR = 1.127) and BMC Bioinformatics (SJR = 0.957). The journals adhering to the trendline for LIS journals receive far fewer of these kinds of “external” citations.

Excluding the three medical journals we get a very good regression between the two parameters with an R² of 0.86. In Figure 2 the regression line is added based on the remaining 47 journals.
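The regression itself is a plain ordinary least-squares fit of IF against SJR. As a sketch of the computation (the helper name and the example data below are mine; R² is defined in the usual way as the explained share of variance):

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = a*x + b, returning (a, b, r_squared)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # slope from centered cross- and self-products
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    a = sxy / sxx
    b = my - a * mx
    # R^2 = 1 - residual sum of squares / total sum of squares
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot

# Made-up example: a perfectly linear relation gives R^2 = 1
slope, intercept, r2 = linear_fit([1, 2, 3, 4], [2, 4, 6, 8])
```

Applied to the 47 matched journals (with the three medical outliers excluded), this kind of fit is what yields the R² of 0.86 reported above.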

Figure 2

I thought this was a really cool result, illustrating the difference between the SJR and the IF quite clearly. In a subsequent post I will look a bit more into the correlations between the various parameters.

32nd ELAG Library Systems Seminar

The 32nd European Library Automation Group (ELAG) Library Systems Seminar will be hosted by Library Wageningen UR from 14–16 April 2008. The website supporting the event went online today. As Paula Goossens remarks in her invitation, hotel accommodation is not really abundant, so please register and make your reservations early!

The theme of the conference is ‘rethinking the library’. The workshops give an impression of the topics covered during the conference, and there is a substantial amount of 2.0 (and beyond) in the program to make it really interesting. Of course I cordially invite you to participate in the workshop on social tagging, which will cover social bookmarking and tagging; I have been asked to moderate it. The other workshops sound interesting as well.

I really do hope to see you soon in Wageningen!

Eric Lease Morgan’s digital information landscape

During the Ticer ’07 summer school ‘Digital Libraries à la Carte’ I first met Eric Lease Morgan. He was an excellent instructor, making the techie stuff more palatable.

With much interest I noted one of his recent lectures cited in Current Cites. His lecture “Today’s digital information landscape” has some thoughtful points on future libraries, librarianship, and above all catalogs. Here are some interesting quotes selected from various parts of his lecture.

On MARC and XML “MARC is a Gordian Knot that needs to be cut, and XML put into it’s place.”

On databases and indexes “They are two sides of the same information retrieval coin.”

On exploiting the network “A rising tide floats all boats. The tide of network computing is certainly upon us. Let’s make sure our boats are in the water.”

On institutional repositories and open access “Acquisitions departments are not necessarily about buying content… An acquisitions department is responsible for bringing collections into the library.”

On the next generation catalogs “More importantly, a “next generation” library catalog will provide services against the things discovered. These services can be enumerated and described with action statements including but not limited to: get it, add it to my personal collection, tag & classify it, review it, buy it, delete it, edit it, share it, link it, compare & contrast it, search it, summarize it, extract all the images from it, cite it, trace it, delete it. Each of these tasks supplement the learning, teaching, and research process.” And “Collections without services are useless. Services without collections are empty. Library catalogs lie at the intersection of collections and services.”

Morgan concludes with “The principles of collection, organization, preservation, and dissemination are extraordinarily relevant in today’s digital landscape. The advent of the globally networked computers, Internet indexes, and mass digitization projects have not changed this fact.”

Worth reading as a whole.

Morgan, E. L. (2007). Today’s digital information landscape. Infomusings.