Some observations during the bibliometrics session at the Österreichische Bibliothekartag

Although the program consistently talks about the Österreichische Bibliothekartag (singular), the whole library day actually spans four days. One would have expected at least the Österreichische Bibliothekartage (plural), but they insist on mentioning only one day. Of those four days, I was only present during part of the morning of the third day, so this is a very limited report on the Österreichische Bibliothekartag. Judging from the program, it is a very comprehensive and interesting conference. I never thought that you could fill a complete session of five presentations talking about cooking books (no pun intended). It only shows that bibliometrics was a small part of the program amongst the many other subjects covered. I noticed a lot of presentations on e-book platforms, many digitization projects, plenty on mobile, less on Library 2.0 than you would expect (is the hype over?), and open access also had a very limited role. What struck me as interesting for conference organizers is that many commercial presentations were spread evenly throughout the sessions. Just a sign of taking the sponsors seriously.

So much for the conference as a whole, of which I actually experienced too little. On to the bibliometrics session. The session was chaired by Juan Gorraiz, a bubbly Spaniard who has been working in Austria for years. Give him the opportunity and he will take the floor and happily fill all the time set aside for the planned presentations.

The first presentation was on a piece of research that should result in a master’s thesis at some point; some preliminary results were presented in this session by Christian Gumpenberger. The focus of the research was on the acceptance of, and familiarity with, bibliometrics among Austrian researchers. The results were not really shocking. Most researchers stated that they were familiar with impact factors, but for the moment there was no clue as to whether they were aware of things like the two-year citation window, or the difference between citable and non-citable items that leads to the inflation of impact factors for journals like Nature and Science. Christian sketched some sunny skies for bibliometrics in Austria, but in the subsequent discussion this sunny view was criticized quite a bit. Nevertheless, I would like to have a look at this master’s thesis when it becomes available.
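The inflation effect of non-citable items can be illustrated with a small sketch. It assumes the usual two-year impact factor definition: citations received in a given year to everything the journal published in the two preceding years, divided by the number of citable items (articles and reviews) from those two years. The function name and all numbers are invented for illustration and do not describe any real journal.

```python
def impact_factor(citations_to_citable, citations_to_non_citable, citable_items):
    """Two-year impact factor under the assumed definition.

    The numerator counts citations to ALL items (including editorials,
    news and letters); the denominator counts only citable items.
    """
    if citable_items <= 0:
        raise ValueError("citable_items must be positive")
    return (citations_to_citable + citations_to_non_citable) / citable_items


# Hypothetical journal: 100 articles and reviews in the two-year window.
# 1000 citations go to those papers, 200 more go to news and editorials.
with_front_matter = impact_factor(1000, 200, 100)
articles_only = impact_factor(1000, 0, 100)

print(with_front_matter, articles_only)
```

In this made-up case the citations to non-citable items lift the impact factor from 10 to 12, without a single extra citation to the research papers themselves.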

The second presentation came from Italy, by Nicola De Bellis. Nicola has written an interesting book on citation analysis in which he stresses the sociological, philosophical and historical aspects of bibliometric analyses. It is always interesting to hear a presentation like this, away from the fact-finding, number-crunching approach which I normally have, and to dream away a bit on outlines of what in an ideal world should be done on a subject like this. Quite a lot, but some of it is beyond being practical. When you carry out bibliometric analyses in the library at some scale, like dealing with 18,000 papers that have collected 265,000 citations, as we do in our library, you can only be practical. So there was an interesting conflict between his presentation (which will be online soon, I hope) and mine, which followed Nicola’s.

I don’t want to cover all aspects of Nicola’s presentation. Go and read the book, which I am going to do as well. But at one point during his presentation I strongly disagreed with him: where he stated that only mediocre scientists have an interest in bibliometrics and that top scientists normally don’t. My experience is quite the contrary. In the first place, it was one of Wageningen’s top scientists who urged the library to take out a subscription to Web of Science back in 2001, and who made it possible with a special contribution from his top institute. He knew he was a highly cited scientist, but somehow he needed Web of Science to confirm his reputation. Later on as well, apart from the discussion with scholars in the social sciences department, it has always been the top-performing groups that invited me to give a presentation on this subject, rather than the groups that were lagging behind on the bibliometric performance indicators. To me it has always appeared that those who are leading the pack are also interested in staying ahead of the rest, and invite the library to explain the results obtained and to help enhance their performance in the future.

My second observation is that Nicola went far beyond the practical when he insisted that, for any publication, all citations should first be retrieved from the three general databases (Web of Science, Scopus and Google Scholar) and then supplemented with citations from at least one citation-enriched, subject-specific database. Well, that’s a lot of work for a single publication in the first place, and it leads to deduplication errors if you’re not very careful. Secondly, it should be well known that Google Scholar, albeit attractive because of tools like Harzing’s Publish or Perish, is not a reliable database for citation counts at this moment (Jacsó 2008). Google Scholar still has serious problems with ordinary counting and deduplication and should therefore not be used for serious citation analyses. The third argument against the use of multiple databases goes a bit further into the theory of bibliometrics and relies on approaches described by Waltman et al. (2011) and Leydesdorff et al. (2011). The key point is that a number of citations in itself has no meaning. It should be related to the citations of related documents in the same field of science. You can do that by normalizing on the mean citation rate in the field (Waltman et al. 2011) or by the perhaps more sophisticated approach sketched by Leydesdorff et al. (2011), based on the citation distributions in the field to which the paper belongs. The latter approach is very novel and has not really been widely tested yet. Both approaches rely on the availability of all the citations to all publications of a certain age and document type in a certain field of science. You can expect to have those field means or citation distributions available when you work with one specific database (for WoS there is plenty of experience, for Scopus it is coming with SciVal Strata, but for Google Scholar it doesn’t exist yet), but it is beyond reality when you derive citation data from three or four databases at the same time.
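To make the normalization argument concrete, here is a minimal Python sketch of a mean normalized citation score in the spirit of Waltman et al. (2011): each paper’s citation count is divided by the mean citation count of all papers in the same field, publication year and document type, and those ratios are then averaged. The function names, the tuple layout and the toy numbers are invented for illustration; a real analysis would take the reference values from a complete database such as WoS.

```python
from collections import defaultdict
from statistics import mean


def expected_citations(reference_set):
    """Mean citations per (field, year, document type) group.

    reference_set: iterable of (field, year, doc_type, citations) tuples
    covering ALL publications in the database, not only our own.
    """
    groups = defaultdict(list)
    for field, year, doc_type, citations in reference_set:
        groups[(field, year, doc_type)].append(citations)
    return {key: mean(values) for key, values in groups.items()}


def mean_normalized_citation_score(papers, expected):
    """Average of observed / expected citations over a set of papers."""
    ratios = []
    for field, year, doc_type, citations in papers:
        baseline = expected[(field, year, doc_type)]
        if baseline == 0:
            continue  # no meaningful baseline for this group; skip the paper
        ratios.append(citations / baseline)
    return mean(ratios)


# Toy reference set (hypothetical): ecology papers are cited more than maths.
reference = [
    ("ecology", 2008, "article", 2),
    ("ecology", 2008, "article", 4),
    ("ecology", 2008, "article", 6),
    ("mathematics", 2008, "article", 1),
    ("mathematics", 2008, "article", 1),
    ("mathematics", 2008, "article", 4),
]
expected = expected_citations(reference)

# Our own two papers: 8 citations in ecology, 1 citation in mathematics.
ours = [("ecology", 2008, "article", 8), ("mathematics", 2008, "article", 1)]
score = mean_normalized_citation_score(ours, expected)
print(score)
```

The sketch only works because the baseline is computed from one complete reference set. If the citation counts of your own papers come from three or four merged databases, there is no matching reference set to compute the expected values from, which is exactly the practical objection raised above.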

But apart from the critical points I just made, I liked the presentation by De Bellis very much. For those interested in similar views on citation practice, I really recommend reading MacRoberts & MacRoberts (1996) as well.

The session closed with my presentation, which is enclosed here

Bibliometric analysis tools on top of the university’s bibliographic database, new roles and opportunities for library outreach

View more presentations from Wouter Gerritsma

After that the session ended with some discussion, but soon all 30 or so participants hurried off to the coffee.

References

De Bellis, N. (2009). Bibliometrics and citation analysis: From the Science Citation Index to cybermetrics. The Scarecrow Press, 450 p. ISBN 9780810867130.
Jacsó, P. (2008). The pros and cons of computing the h-index using Google Scholar. Online Information Review, 32(3): 437-451. http://dx.doi.org/10.1108/14684520810889718 http://www.jacso.info/PDFs/jacso-pros-and-cons-of-computing-the-h-index.pdf
Leydesdorff, L., L. Bornmann, R. Mutz & T. Opthof (2011). Turning the tables on citation analysis one more time: Principles for comparing sets of documents. Journal of the American Society for Information Science and Technology, n/a-n/a. http://dx.doi.org/10.1002/asi.21534 http://arxiv.org/abs/1101.3863
MacRoberts, M. H. & B. R. MacRoberts (1996). Problems of citation analysis. Scientometrics, 36(3): 435-444. http://dx.doi.org/10.1007/BF02129604
Waltman, L., N. J. van Eck, T. N. van Leeuwen, M. S. Visser & A. F. J. van Raan (2011). Towards a new crown indicator: Some theoretical considerations. Journal of Informetrics, 5(1): 37-47. http://dx.doi.org/10.1016/j.joi.2010.08.001 http://arxiv.org/abs/1003.2167

Social tagging workshop at ELAG

In advance of the ELAG workshop on social tagging, I wrote a little on a wiki site as a kind of introduction to the subject for the workshop participants. Actually, my idea was to lure a few more participants to the workshop, but the low number of participants was resolved in another way. Since the points raised fit well in the context of this blog, I thought it might be worthwhile to repeat them here as well.

The title of the workshop, Social tagging, is a combination of two terms: tagging and social bookmarking. At first sight they don’t seem to be the most spectacular subjects to ponder over in the ELAG workshops. But when the constituency of your library is adding tags to all kinds of videos, photographs and websites, wouldn’t you at least give them the possibility to tag your library resources as well? Is it already possible in your library OPAC? And what about the bibliographic databases that your library licenses: why can’t users tag those items yet? If they are tagging ‘your’ resources already, the obvious questions to ask are which items they are tagging and what tags they are using. What can we learn from our users?

Can we use those tags to improve the recall and ranking of our library systems? How should these folksonomies be combined with, enhanced by or complemented with our formal taxonomies?

If your users can tag any item on your library system, where should the tags and tagged items be collected? Should it be a homegrown system like those developed at the University of Pennsylvania Library (PennTags), Harvard Law Library (H2O) or recently at Michigan (MTagger)? Should we advise the use of the tools developed by the big scientific publishers, such as 2Collab from Elsevier, Connotea from Nature or Scholar from Blackboard? Or should our academics and their precious tagging labour be shared on common bookmarking sites such as del.icio.us, Furl and the like? Is CiteULike or Zotero perhaps the best solution after all?

When it comes to saving library items, we already support reference management programmes such as EndNote and RefWorks. What is the relation between social bookmarking sites and these very popular reference management programmes? RefWorks is much better than EndNote at handling websites, but neither has been developed into a social bookmarking site yet. On the other hand, Connotea and 2Collab are social bookmarking sites that have some reference management capacity, but in this respect they can’t compete with EndNote and RefWorks.

LibraryThing is perhaps an odd case in this workshop, but it has some very intriguing features. Some libraries are already using the tags and recommendations from LibraryThing in their catalog. Interestingly, I am not aware of an example where items are tagged in a library catalog and those tags are used to enrich LibraryThing. Perhaps it exists already. I don’t know yet. LT is to some extent a special case of reference management software. It is only used for books. An awful lot of books. It is therefore quite easy to add your own books to LibraryThing. At our university we are constantly confronted with organically grown collections of books that are not part of the library collection. Consider the idea that those collections were entered in LibraryThing, so that we could use the collected LibraryThings of our constituency to see if a book we don’t have in our collection is somewhere on campus, rather than rushing to the order-book button. LibraryThing from our trusted users as a natural extension of our catalog and library collection?

Those are the five lines along which I hope to ponder the theme of this workshop with a group of smart library people over the next three days. Lorcan Dempsey recently wrote on this subject, describing it as a new bibliographic tissue.

ELAG2008: Rethinking subject access

Jeroen started with a really well-told parable about authoritative subjects.

He illustrated that the semantic Web we nearly have in place is all about the relations between subjects, but that authority files on the subjects themselves are still missing. To be honest, I think I grasped only a little of his well-told and well-presented story. I need time to digest it all. Or is it perhaps the end of the day calling?

Tagging, social bookmarking, folksonomies, reference management, LibraryThing and the Library

Tomorrow the 32nd ELAG symposium starts. The ELAG symposiums are special in that workshops around pre-selected subjects form the mainstay of the conference. So they told me. These are not ordinary workshops, where you passively attend to learn something. The idea behind these workshops is that the participants brainstorm over a subject. The workshop leader may know somewhat more of the background of the subject, but the workshop itself is a true group exercise.

I was asked to moderate the workshop on social tagging. Until now the distribution of participants over the workshops is a bit uneven, so tomorrow I need to sell my workshop on “social tagging” in a sales pitch to the attendees of the symposium. So what will I tell them?

The title of the workshop, Social tagging, is a combination of two terms: tagging and social bookmarking. At first sight they don’t seem to be the most spectacular subjects to ponder over in the ELAG workshops. But when the constituency of your library is adding tags to all kinds of videos, photographs and websites, wouldn’t you at least give them the possibility to tag your library resources as well? Is it already possible in your library OPAC? And what about the bibliographic databases that your library licenses: why can’t users tag those items yet? If they are tagging ‘your’ resources already, the obvious questions to ask are which items they are tagging and what tags they are using. What can we learn from our users?

Can we use those tags to improve the recall and ranking of our library systems? How should these folksonomies be combined with, enhanced by or complemented with our formal taxonomies?

If your users can tag any item on your library system, where should the tags and tagged items be collected? Should it be a homegrown system like those developed at the University of Pennsylvania Library (PennTags), Harvard Law Library (H2O) or recently at Michigan (MTagger)? Should we advise the use of the tools developed by the big scientific publishers, such as 2Collab from Elsevier, Connotea from Nature or Scholar from Blackboard? Or should our academics and their precious tagging labour be shared on common bookmarking sites such as del.icio.us, Furl and the like? Is CiteULike perhaps the best solution after all?

When it comes to saving library items, we already support reference management programmes such as EndNote and RefWorks. What is the relation between social bookmarking sites and these very popular reference management programmes? RefWorks is much better than EndNote at handling websites, but neither has been developed into a social bookmarking site yet. On the other hand, Connotea and 2Collab are social bookmarking sites that have some reference management capacity, but in this respect they can’t compete with EndNote and RefWorks.

LibraryThing is perhaps an odd case in this workshop, but it has some very intriguing features. Some libraries are already using the tags and recommendations from LibraryThing in their catalog. Interestingly, I am not aware of an example where items are tagged in a library catalog and those tags are used to enrich LibraryThing. Perhaps it exists already. I don’t know yet. LT is to some extent a special case of reference management software. It is only used for books. An awful lot of books. It is therefore quite easy to add your own books to LibraryThing. At our university we are constantly confronted with organically grown collections of books that are not part of the library collection. Consider the idea that those collections were entered in LibraryThing, so that we could use the collected LibraryThings of our constituency to see if a book we don’t have in our collection is somewhere on campus, rather than rushing to the order-book button. LibraryThing from our trusted users as a natural extension of our catalog and library collection?

Those are the five lines along which I hope to ponder the theme of this workshop with a group of smart library people over the next three days. Lorcan Dempsey recently wrote on this subject, describing it as a new bibliographic tissue.

This new bibliographic tissue is really hot, ladies and gentlemen. Please come and exchange your ideas with me in this workshop.