Archive for the 'Scientometrics' Category

How Google Scholar Citations passes the competition left and right

Last Thursday Google Scholar Citations went public. It was to be expected. Since August the product had been tested by a few (blogging) scientists, and we only had to wait patiently for it to be released to all scientists. Last Thursday the moment arrived.

Was it worth the wait? Yes, it certainly was. Google Scholar Citations really excels at finding publications you had completely forgotten about, although there are still obscure publications that even Google Scholar doesn’t know about. You simply log in, deselect the few publications that don’t belong to you, and run searches to find publications that Google has overlooked. You get a comprehensive publication list quite quickly. That is, when your name is not too common. How it works for very common names (Korean scientists come to mind, as does John Smith) I don’t know yet. So far nothing new: Anne-Wil Harzing’s excellent Publish or Perish software already did this. What is new is that Google Scholar Citations keeps your citations and publications up to date automatically and lets you publish your own publication list on the Web, with the citations and some crude citation metrics.

The two major competitors in this arena are Thomson Reuters with their ResearcherID and Elsevier’s Scopus with its Scopus ID. With both services you can identify your own publications and link them to a unique identifier. In this way you can create your own publication list with citation metrics as well. Their main disadvantage compared to Google Scholar is their rather limited resource base. Thomson Reuters’ WoS “only” covers some 10,000 scholarly journals, a selection of proceedings and, recently, only 30,000 books. Scopus has nearly double the number of journals but lags behind in proceedings and covers hardly any books. Google Scholar certainly covers more, but we still don’t understand what is included and what is not, and we sometimes have our doubts about how current Google Scholar is. The larger resource base of Google Scholar, including books and book chapters, will make this service more attractive for social scientists and scholars in the arts and humanities.

On top of the smaller publication base, each of these two competitors has its own particular disadvantage as well. In Thomson Reuters ResearcherID you have to maintain your publication list manually. Each time you publish a new article, you have to add it to your profile yourself. Looking around, I see that most researchers are a bit sloppy in this respect. You can, however, make your publication list and its citation impact publicly available; see for example my meagre list. Scopus, on the other hand, maintains your publication list automatically (it made some serious mistakes in this area in the past, but the service seems to have improved). But, and this is a big but, you can’t publish your properly curated publication list with citations publicly on the Web. They used to have 2Collab for this, but since they discontinued 2Collab they haven’t come up with an alternative way to publish your publication list with citation impact on a public website. A real pity.

So Google Scholar easily beats ResearcherID, since it updates automatically, and Scopus ID, because you can make your list with citations publicly available. Making your publication list openly available is strongly recommended to all scientists; it helps your personal branding.

Certainly there are disadvantages to Google Scholar as well. The most serious at the moment are all kinds of ghost citations. If you look at the citations to our paper on bibliometric analysis on top of repositories, Google counts three citations. But when checking the citing paper by Leydesdorff, a reference to our article is nowhere to be found (of course it should have been there, but it isn’t). 0xDE reported a spam account in the name of Peter Taylor, which collected various Taylors in a single profile boasting an h-index of 94. That Google Scholar can be fooled has been reported by Beel & Gipp (2010).
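The Peter Taylor case shows why pooling several people into one profile is so effective: the h-index (the largest h such that h papers each have at least h citations) can only stay the same or rise when papers are added. Below is a minimal Python sketch of that definition, using made-up citation counts for two hypothetical namesakes; it is not meant to reflect how Google Scholar computes its metrics internally.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most cited first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break  # counts only decrease from here, so h cannot grow further
    return h


# Hypothetical citation counts for two different researchers with the same name
taylor_a = [10, 8, 5, 4, 3]
taylor_b = [9, 7, 6, 2]

print(h_index(taylor_a))             # 4
print(h_index(taylor_b))             # 3
print(h_index(taylor_a + taylor_b))  # 5: the merged "ghost" profile beats both
```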

When I was interviewed for our university paper on Google Scholar Citations (in Dutch), I told them: Google Scholar is only about five years old. Give them another five years and they will have totally changed the market for abstracting and indexing databases. If only 20 percent of all scientists correct their publication lists (including editing the references to fix mistakes Google has made), even without making them publicly available, Google will be sitting on a treasure trove of high-quality metadata. It will be really interesting to see how this story develops.

Reference:
Joeran Beel and Bela Gipp. Academic search engine spam and Google Scholar’s resilience against it. Journal of Electronic Publishing, 13(3), December 2010.

Some observations during the bibliometrics session at the Österreichische Bibliothekartag

Although the program consistently talks about the Österreichische Bibliothekartag (singular), this library day actually spans four days. One would have expected at least Österreichische Bibliothekartage (plural), but they insist on mentioning only one day. Of those four days I was present only during part of the morning of the third day, so this is a very limited report on the Österreichische Bibliothekartag. Judging by the program, it was very comprehensive and interesting. I never thought you could fill a complete session, five presentations, with talks about cookbooks (no pun intended). It goes to show that bibliometrics was only a small part of a program that covered many other subjects. I noticed a lot of presentations on e-book platforms, many digitization projects, plenty on mobile, less on library 2.0 than you would expect (is the hype over?), and open access played only a very limited role. What struck me, and may interest conference organizers, is that the commercial presentations were spread evenly throughout the sessions. Just a sign of taking the sponsors seriously.

So much for the conference as a whole, of which I actually experienced too little. On to the bibliometrics session. The session was chaired by Juan Gorraiz, a bubbly Spaniard who has been working in Austria for years. Give him the opportunity and he will take the floor, happily using all the time available and filling the slots of all the planned presentations.

The first presentation, by Christian Gumpenberger, covered research that should eventually result in a master’s thesis; some preliminary results were presented in this session. The focus of the research was the acceptance of bibliometrics among Austrian researchers and their familiarity with it. The results were not really shocking: most researchers stated that they were familiar with impact factors, but for the moment there was no clue as to whether they were aware of things like the two-year citation window, or the difference between citable and non-citable items that leads to inflated impact factors for journals like Nature and Science. Christian sketched sunny skies for bibliometrics in Austria, but in the subsequent discussion this sunny view was criticized quite a bit. Nevertheless, I would like to have a look at this master’s thesis when it becomes available.

The second presentation, of Italian origin, was by Nicola De Bellis. Nicola has written an interesting book on citation analysis in which he stresses the sociological, philosophical and historical aspects of bibliometric analyses. It is always interesting to hear a presentation like this: away from the fact-finding, number-crunching approach I normally take, you can dream a bit about what should be done on a subject like this in an ideal world. Quite a lot, but some of it is beyond being practical. When you carry out bibliometric analyses in the library at some scale, dealing with 18,000 papers that have collected 265,000 citations as we do in our library, you can only be practical. So there was an interesting conflict between his presentation (which will be online soon, I hope) and mine, which followed Nicola’s.

I don’t want to cover all aspects of Nicola’s presentation. Go and read the book, which I am going to do as well. But at one point during his presentation I strongly disagreed with him. He stated that only mediocre scientists are interested in bibliometrics and that top scientists normally have no interest in the topic. My experience is quite the contrary. In the first place, it was one of Wageningen’s top scientists who urged the library to subscribe to Web of Science back in 2001, and made it possible with a special contribution from his top institute. He knew he was a highly cited scientist, but somehow he needed Web of Science to confirm his reputation. Later on as well, apart from the discussions with scholars in the social sciences department, it has always been the top performing groups that invited me to give a presentation on this subject, rather than the groups lagging behind on the bibliometric performance indicators. To me it has always appeared that those leading the pack are also interested in staying ahead of the rest, and invite the library to explain the results obtained and to enhance their performance in the future.

A second point where Nicola’s presentation went far beyond the practical was his insistence that all citations to a publication should be retrieved from the three general databases (Web of Science, Scopus and Google Scholar), supplemented with citations from at least one citation-enriched subject-specific database. First, that is a lot of work for a single publication, and it invites deduplication errors if you are not very careful. Second, it should be well known that Google Scholar, albeit attractive because of tools like Harzing’s Publish or Perish, is not a reliable database for citation counts at this moment (Jacsó 2008). Google Scholar still has serious problems with ordinary counting and deduplication and should therefore not be used for serious citation analyses. The third argument against the use of multiple databases goes a bit further into the theory of bibliometrics and relies on the approaches described by Waltman et al. (2011) and Leydesdorff et al. (2011). The key point is that a number of citations has no meaning in itself. It should be related to the citations of comparable documents in the same field of science. You can do that by normalizing on the mean citation rate in the field (Waltman et al. 2011), or by the perhaps more sophisticated approach sketched by Leydesdorff et al. (2011), based on the citation distribution in the field to which the paper belongs. The latter approach is very novel and has not really been widely tested yet. Both approaches rely on the availability of all the citations to the publications in a given field of science, of a given age and document type. You can expect those means or citation distributions to be available when you work with a single database (for WoS there is plenty of experience, for Scopus it is coming with SciVal Strata, but for Google Scholar it doesn’t exist yet), but that is beyond reality when you derive citation data from three or four databases at the same time.
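To make the normalization argument concrete, here is a minimal sketch of the mean-normalized approach along the lines of Waltman et al. (2011). Each paper’s citation count is divided by the mean citation count of all papers of the same field, publication year and document type, and the resulting ratios are averaged. The field labels and counts are hypothetical, and the percentile-based approach of Leydesdorff et al. (2011) is not shown. Note that the baselines have to come from the same database as the citation counts, which is exactly what breaks down when counts are pooled from several sources.

```python
from collections import defaultdict
from statistics import mean


def field_baselines(reference_set):
    """Mean citations per (field, year, doc_type) class, computed over one database.

    reference_set: iterable of (field, year, doc_type, citations) tuples.
    """
    groups = defaultdict(list)
    for field, year, doc_type, cites in reference_set:
        groups[(field, year, doc_type)].append(cites)
    return {key: mean(values) for key, values in groups.items()}


def mncs(unit_papers, baselines):
    """Mean normalized citation score: the average of citations / expected citations.

    Assumes every paper's class is present in `baselines` with a non-zero mean.
    """
    ratios = [cites / baselines[(field, year, doc_type)]
              for field, year, doc_type, cites in unit_papers]
    return mean(ratios)


# Hypothetical world reference set (normally: every paper in the database)
world = [
    ("ecology", 2008, "article", 10), ("ecology", 2008, "article", 2),
    ("ecology", 2008, "article", 6),
    ("plant science", 2008, "article", 20), ("plant science", 2008, "article", 4),
]
# Hypothetical papers of one research group
group = [("ecology", 2008, "article", 9), ("plant science", 2008, "article", 6)]

baselines = field_baselines(world)  # ecology: 6, plant science: 12
print(mncs(group, baselines))       # (9/6 + 6/12) / 2 = 1.0, i.e. world average
```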

But apart from these critical points, I liked the presentation by De Bellis very much. For those interested in similar views on citation practice, I highly recommend reading MacRoberts & MacRoberts (1996) as well.

The session closed with my presentation, which is included here:

Bibliometric analysis tools on top of the university’s bibliographic database, new roles and opportunities for library outreach


The session ended with some discussion, but soon all 30 or so participants hurried off to the coffee.

References

De Bellis, N. (2009). Bibliometrics and citation analysis: From the Science Citation Index to cybermetrics. The Scarecrow Press, 450 p. ISBN 9780810867130
Jacsó, P. (2008). The pros and cons of computing the h-index using Google Scholar. Online Information Review, 32(3): 437-451 http://dx.doi.org/10.1108/14684520810889718 http://www.jacso.info/PDFs/jacso-pros-and-cons-of-computing-the-h-index.pdf
Leydesdorff, L., L. Bornmann, R. Mutz & T. Opthof (2011). Turning the tables on citation analysis one more time: Principles for comparing sets of documents. Journal of the American Society for Information Science and Technology (in press) http://dx.doi.org/10.1002/asi.21534 http://arxiv.org/abs/1101.3863
MacRoberts, M. H. & B. R. MacRoberts (1996). Problems of citation analysis. Scientometrics, 36(3): 435-444 http://dx.doi.org/10.1007/BF02129604
Waltman, L., N. J. van Eck, T. N. van Leeuwen, M. S. Visser & A. F. J. van Raan (2011). Towards a new crown indicator: Some theoretical considerations. Journal of Informetrics, 5(1): 37-47. http://dx.doi.org/10.1016/j.joi.2010.08.001 http://arxiv.org/abs/1003.2167

Scimago rankings 2011 released

Today Félix de Moya Anegón announced on Twitter that the SCImago Institutions Rankings (SIR) for 2011 have been released. These rankings are not very well known or widely used. Yesterday, during a ranking masterclass of the Dutch Association for Institutional Research, the SIR was not even mentioned. Undeservedly so. Scimago lists just over 3,000 institutions worldwide, which makes it one of the most comprehensive institutional rankings, if not the most comprehensive. It is also a very clear ranking: it measures only publication output and impact. It thus ranks only the research performance of institutions, which makes it very similar to the Leiden Ranking.

What I like about Scimago are the innovative indicators they come up with each year. Last year they introduced the %Q1 indicator: the ratio of publications that an institution publishes in the most influential scholarly journals of the world. Journals considered for this indicator are those ranked in the first quartile (top 25%) of their categories as ordered by the SCImago Journal Rank (SJR) indicator. This year they introduced the Excellence Rate, which indicates what percentage of an institution’s scientific output belongs to the 10% most cited papers in their respective scientific fields. It is a measure of the high-quality output of research institutions. The two indicators are very similar: the Excellence Rate is just a tougher version of %Q1, applied to the citations of the papers themselves rather than to the rank of the journals they appear in.
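Both indicators share one mechanic: fix a threshold per subject area (the top 25% of journals by SJR for %Q1, the top 10% of papers by citations for the Excellence Rate) and report the share of an institution’s papers that clears it. Below is a minimal Python sketch with hypothetical data. Ties at the threshold are counted inside the top set, which is a simplification; Scimago’s actual handling of ties, multi-category journals and citation windows may well differ.

```python
def top_threshold(values, top_percent):
    """Smallest value still inside the top `top_percent` percent of `values`.

    Uses an integer ceiling so that, e.g., the top 10% of 20 values is exactly 2 values.
    """
    ranked = sorted(values, reverse=True)
    k = max(1, (len(ranked) * top_percent + 99) // 100)
    return ranked[k - 1]


def share_above(items, reference, top_percent):
    """Percentage of (group, value) items at or above their group's top threshold.

    Ties at the threshold count as inside the top set (a simplification).
    """
    thresholds = {g: top_threshold(vals, top_percent) for g, vals in reference.items()}
    hits = sum(1 for group, value in items if value >= thresholds[group])
    return 100 * hits / len(items)


# %Q1: value = SJR of the journal each paper appeared in, grouped by subject category
sjr_by_category = {"ecology": [4.0, 3.0, 2.0, 1.0, 0.5, 0.4, 0.3, 0.2]}
papers_by_journal = [("ecology", 4.0), ("ecology", 3.0), ("ecology", 0.5), ("ecology", 0.2)]
print(share_above(papers_by_journal, sjr_by_category, 25))  # 50.0

# Excellence Rate: value = citations of each paper, grouped by field
citations_by_field = {"ecology": list(range(20))}  # hypothetical field: 0..19 citations
papers_by_cites = [("ecology", 19), ("ecology", 18), ("ecology", 5), ("ecology", 0)]
print(share_above(papers_by_cites, citations_by_field, 10))  # 50.0
```

The same institution can score identically on both here only because the hypothetical data were chosen that way; with real data the Excellence Rate is the harder bar to clear.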

The other new indicator is the Specialization Index, which indicates the extent of thematic concentration or dispersion of an institution’s scientific output. Values range from 0 to 1, indicating generalist and specialized institutions respectively.
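The post does not give the formula behind this index. One common way to turn an output distribution over subject areas into a concentration score between 0 and 1 is a Gini coefficient; that Scimago computes its index exactly like this is an assumption here. A minimal sketch with hypothetical publication counts per subject area:

```python
def gini(counts):
    """Gini coefficient of non-negative counts.

    0 means output is spread evenly over all subject areas; the maximum,
    (n - 1) / n for n areas, means all output sits in a single area.
    """
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical publication counts over five subject areas
generalist = [120, 110, 100, 90, 80]
specialist = [400, 20, 10, 5, 0]

print(round(gini(generalist), 2))  # 0.08
print(round(gini(specialist), 2))  # 0.75
```

Because the maximum depends on the number of areas, an index that should reach exactly 1 for a single-area institution could rescale the result by n / (n - 1).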

Their most important indicator of research performance is the Normalized Impact (NI), which is similar to the MNCS of CWTS and to the RI we calculate in Wageningen. The values show the ratio between an institution’s average scientific impact and the world average, which is set to 1; i.e. a score of 0.8 means the institution is cited 20% below average and 1.3 means it is cited 30% above average.

Last year the Scimago team already showed that there is an exponential relationship between an institution’s ability to get its scientific papers into better journals (%Q1) and the average impact achieved by its output in terms of Normalized Impact. It is a relationship I always show in classes on publication strategy (slides 15 and 16). Looking at the Dutch universities, I noted that the correlation between the new Excellence Rate and Normalized Impact is even stronger than with %Q1. So the pressure to publish in the absolute top journals of each research field will increase even further if this becomes general knowledge.

What do we learn about the Dutch universities from the Scimago rankings? Rotterdam still maintains its top position for Normalized Impact, and it also scores best for %Q1 and the Excellence Rate. Directly after Rotterdam come Leiden, UvA, VU, Utrecht and Radboud, with equal impact. Utrecht published the most articles during the period 2005-2009. Wageningen excels at international cooperation. And Tilburg and Wageningen are the most specialized universities in the Netherlands.

Making these international rankings is quite a daunting task. For the Netherlands I noticed that the output of Nijmegen was split between Radboud University and Radboud University Nijmegen Medical Centre; this was not done for the other university hospitals. And the output of Wageningen was listed under both Wageningen University and Research Centre and Plant Research International (which is part of Wageningen UR). But for researchers based in Spain these are difficult nuances to resolve perfectly.

My only real complaint about the ranking is that they state it is not a league table, yet they rank the institutions by publication output. It would be much more obvious to present the list ranked by NI. Since they only publish the ranking as a PDF file, it took me a couple of hours to convert it into an Excel spreadsheet so I could rank the data any way I wished. With all the information at hand it is also possible to design your own indicators, such as a power rank analogous to the Leiden Ranking.
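Once the PDF has been turned into a table, re-ranking is a one-liner. The sketch below ranks hypothetical institutions by output, by NI, and by a "power" score taken here as output multiplied by NI, by analogy with the Leiden Ranking’s size-times-impact indicator. Both that definition and the figures are assumptions for illustration, not the actual SIR 2011 data.

```python
from dataclasses import dataclass


@dataclass
class Institution:
    name: str
    output: int  # publications in the ranking period
    ni: float    # Normalized Impact; world average = 1.0


def power(inst: Institution) -> float:
    """Hypothetical 'power' score: size times normalized impact."""
    return inst.output * inst.ni


# Hypothetical rows, not taken from the actual SIR 2011 data
rows = [
    Institution("Institution A", 12000, 1.2),
    Institution("Institution B", 5000, 1.8),
    Institution("Institution C", 20000, 0.6),
]

for label, key in [("output", lambda i: i.output), ("NI", lambda i: i.ni), ("power", power)]:
    ranking = sorted(rows, key=key, reverse=True)
    print(label, [inst.name for inst in ranking])
# output ['Institution C', 'Institution A', 'Institution B']
# NI ['Institution B', 'Institution A', 'Institution C']
# power ['Institution A', 'Institution C', 'Institution B']
```

The three orderings differ completely, which is the point: the choice of sort key is itself a statement about what "best" means.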

The message to my researchers: aim for the best journals in your field. We still have scope for improvement. We are nowhere near the 30 to 40% Excellence Rate we see for Rockefeller, Harvard and the like.

Publishing for impact

It has been a while. Yes.

But here is a link to a presentation on elements of a publication strategy that I gave a little while back to some 300 PhD students at our university. The presentation was titled “Publishing for impact”.

Publishing for impact


If you are interested in the actual presentation, have a look at the recording of the whole symposium on writing a world-class paper; I start somewhere around 2:51. The other presentations that afternoon were interesting as well; for those, see the news item in our newsletter.

Related: Towards a publication strategy

The role of university rankings in university marketing

So far I have not come across any proper research on the role of university rankings in university marketing. Of course, I am aware of many instances where the importance of university rankings has been mentioned in this respect, but evidence to substantiate these claims is rare.

I was therefore pleasantly surprised by the research of Liang-Hsuan Chen (2008), which only crossed my screen today. She found that for Asian graduate students attending Canadian universities the rankings played an important role in university selection. She found:

Graduate students enrolled in professional programs ranked factors such as the ranking of the program and affordability of tuition with high importance in choosing a Canadian graduate school. The fact that the ranking of program was ranked with the highest importance by this group of students was in part due to the availability of program ranking information and marketing efforts (e.g., the MBA Tour) undertaken by the programs.

My takeaway from this piece of research: whether you like it or not, rankings do play a role in how international students perceive and choose the university where they complete their graduate education. Rankings serve different purposes, Chen explains:

Reputational ranking became a proxy for the quality of education. Although much criticized by academics for its lack of both validity and reliability, reputational ranking serves three purposes: first, it is a promotional tool for higher education institutions to recruit students; second, it is an assessing tool for international students to screen out competitive choices; and third, it is a marketing and signaling tool for students themselves after they graduate.

So it is not only important to be present in the various university rankings. You had better make sure you rank well!

References
Chen, Liang-Hsuan (2008) Internationalization or International Marketing? Two Frameworks for Understanding International Students’ Choice of Canadian Universities, Journal of Marketing for Higher Education, 18(1): 1-33, http://dx.doi.org/10.1080/08841240802100113 (subscription required)

Journal quality, an unexpected improvement of the JCR

It may sound odd, but for researchers the journal as an entity is disappearing. Scientists search for information in online databases and decide from the title and abstract whether an article suits their needs. The days when scientists visited the library and browsed the tables of contents of the most important journals to keep up with their field are long gone.

Still, there is a lot of emotion around journal titles. Scientists want to publish their research in the best possible journal. Earlier this year the NOWT (2008) published a report on the performance of Dutch universities, which clearly showed that each university’s field-normalized citation impact correlated positively with its field-normalized journal quality.
Journal quality versus Citation impact

Looking at this graph, there is clearly good reason to select the best journals in your field to publish your results. However, until recently the only widely available journal quality indicator was the journal impact factor. There has been a lot of criticism of the uses and abuses of impact factors, but they have stood the test of time. All scientists are at least aware of impact factors. For years ISI, now Thomson Reuters, was in fact the sole gatekeeper of journal quality rankings.

Over the last few years a number of products, free and fee-based, have come up with new and competing journal ranking measures: the SCImago Journal Rank (based on Scopus data), the Journal Analyzer from Scopus, Eigenfactor.org and, of course, the data from Thomson’s own Essential Science Indicators.

This week Thomson Reuters announced an update of the Journal Citation Reports. From the 1st of February we get an entirely new Journal Citation Reports. From the press release:

  • Five-Year Impact Factor - provides a broader range of citation activity for a more informative snapshot over time.
  • Journal “Self Citations” – An analysis of journal self citations and their contribution to the Journal Impact Factor calculation.
  • Graphic Displays of Impact Factor “Box Plots” - A graphic interpretation of how a journal ranks in different categories.
  • Rank-in-Category Tables for Journals Covering Multiple Disciplines - Allows a journal to be seen in the context of multiple categories at a glance rather than only a single one.
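The first two additions are simple arithmetic once you have the counts. Below is a minimal sketch of the impact factor calculation with a configurable window (two years for the classic impact factor, five for the new variant) and an optional correction for journal self-citations. The counts are hypothetical, and which document types count as "citable items" is an editorial decision by Thomson Reuters that this sketch does not attempt to reproduce.

```python
def impact_factor(cites, citable, year, window=2, self_cites=None):
    """Journal impact factor for `year` with a citation window of `window` years.

    cites[y]:      citations received in `year` by items the journal published in y
    citable[y]:    number of citable items (articles, reviews) published in y
    self_cites[y]: optional part of cites[y] that comes from the journal itself
    """
    years = range(year - window, year)  # e.g. 2006 and 2007 for year=2008, window=2
    numerator = sum(cites.get(y, 0) for y in years)
    if self_cites:
        numerator -= sum(self_cites.get(y, 0) for y in years)
    denominator = sum(citable.get(y, 0) for y in years)
    return numerator / denominator


# Hypothetical journal, citations counted in 2008
cites = {2003: 150, 2004: 180, 2005: 220, 2006: 300, 2007: 200}
citable = {y: 100 for y in range(2003, 2008)}
journal_self = {2006: 50, 2007: 50}

print(impact_factor(cites, citable, 2008))                           # 2.5
print(impact_factor(cites, citable, 2008, window=5))                 # 2.1
print(impact_factor(cites, citable, 2008, self_cites=journal_self))  # 2.0
```

In this made-up example the five-year value is lower because older volumes attract fewer citations per year; for journals in slowly citing fields the opposite is often the case.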

It is highly unusual to see two updates per year for the JCR. But it is interesting to note how they are moving under the pressure of some competition.

Literature:
NOWT (2008). Wetenschaps- en Technologie- Indicatoren 2008. Maastricht, Nederlands Observatorium van Wetenschap en Technologie (NOWT). http://www.nowt.nl/docs/NOWT-WTI_2008.pdf (in Dutch)

Self citations do work

In a very extensive article van Raan has studied the effect of self-citations on the total citations to a group’s work. In the concluding paragraph van Raan writes:

[…] external citations are enhanced by self-citations, so that we have the “chain reaction:” Larger size leads to more self-citations, which lead to more external citations. This mechanism is strongest for the lower impact journals—they “make size work”—as well as for higher performance groups. In other words, lower impact journals enable research groups more than do higher impact journals to “advertise” their other work by means of self-citations.

The most interesting thing about this article is that van Raan cited himself 11 times out of 28 references in total. That may seem a bit excessive, but it makes his point excellently.
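How self-citations are identified matters for studies like this. A common operational definition, assumed here rather than taken from van Raan’s paper, is that a citation is a self-citation when the citing and the cited paper share at least one author. Below is a minimal sketch with hypothetical author names. Real data needs proper author disambiguation, since plain name matching both merges namesakes and misses spelling variants.

```python
def split_self_citations(cited_authors, citing_papers):
    """Split citing papers into self-citations and external citations.

    cited_authors: author names of the cited paper.
    citing_papers: list of (paper_id, [author names]) tuples.
    A citation counts as a self-citation if any author name matches
    (case-insensitive, whitespace-trimmed).
    """
    cited = {name.strip().lower() for name in cited_authors}
    self_cites, external = [], []
    for paper_id, authors in citing_papers:
        if cited & {name.strip().lower() for name in authors}:
            self_cites.append(paper_id)
        else:
            external.append(paper_id)
    return self_cites, external


# Hypothetical cited paper and citing papers
cited_paper_authors = ["Doe, J.", "Roe, R."]
citing = [
    ("p1", ["Doe, J.", "Smith, A."]),
    ("p2", ["Jones, K."]),
    ("p3", ["roe, r."]),
]
self_ids, external_ids = split_self_citations(cited_paper_authors, citing)
print(self_ids, external_ids)                # ['p1', 'p3'] ['p2']
print(f"{len(self_ids) / len(citing):.0%}")  # 67%
```

Applied from the citing side to van Raan’s own reference list, the same ratio is 11 / 28, or about 39%.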

Another point that I always stress within the theme of publication strategy is to consider Open Access publishing. Over the last few years I have noticed that van Raan publishes his articles open access on arXiv. His group has not (yet) demonstrated the advantage of OA publishing for citation impact scientifically, but the master of scientometrics is putting it into practice anyway. Something every researcher should consider very seriously.

Reference
van Raan, A. F. J. (2008). Self-citation as an impact-reinforcing mechanism in the science system. Journal of the American Society for Information Science and Technology, 59(10): 1631-1643. http://arxiv.org/ftp/arxiv/papers/0801/0801.0524.pdf

The mysterious ways of Web of Science

A while back, one of our researchers asked me how Steven Salzberg arrived at his number of citations for the paper on the Arabidopsis genome in Nature. When he checked Web of Science, it returned zero citations, and that couldn’t be true for such a breakthrough paper. Salzberg found 2689 citations! How did he do that?

I first checked the paper in Web of Science myself as well, and also found zero citations.

Zero citations from Web of Science for the Arabidopsis papers

I was not entirely surprised, since I realized it was one of those consortium papers. I knew Thomson had had problems with a consortium paper in the past. But it was annoying.

I first looked into the issue around the Human Genome Project paper and found it mentioned even in Thomson’s own Science Watch. But from that article it appeared that Thomson had only fixed the citation tracking for the Human Genome Project paper, not the underlying issue. Even though the Arabidopsis paper is older, the citations to it had not been corrected. Apparently something went wrong in the way WoS searched or tracked the citations, but where was the error being made?

I made a few futile attempts in the cited reference search with Arabidopsis, or Arabidopsis*, as author. I also searched the cited reference search for Kaul as author (listed at the end of the original article as first author), but that only resulted in some 130 citations. Not enough to account for Steven Salzberg’s number of citations. I did not want to use the cited reference search to look up all cited articles from Nature in 2000: that is a very large result set, and you have to wade through innumerable pages of results since you can’t refine these types of searches by volume or page number. (Wouldn’t that be nice?)

To reassure my inquisitive researcher I pointed him to Scopus (sorry, Thomson), where he could see a reassuring 3000+ citations for himself. But I still had no quick fix for this problem.

Only later, when I looked into the problem again, was I somehow forwarded to the All Databases search rather than the Web of Science search tab I normally use. To my utter amazement, the title search this time delivered two records. Both had zero citations, but more importantly, the author shown next to them was [Anon] Ar Gen In.


Now the problem was simple: I had found the author. A cited reference search on that author indeed yielded nearly the 2689 citations Steven Salzberg reported.


But even these figures are not entirely correct, since there are another 131 citations to be found that reference Kaul as first author, with the correct Nature volume and page number.
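The refinement the cited reference search lacks is grouping variants by source, volume and starting page, whatever the author string. Below is a minimal sketch of that merge over exported cited-reference rows. The row layout, the placeholder volume and page, and the counts are all hypothetical; they are not the actual Web of Science figures.

```python
from collections import Counter


def merge_variants(cited_refs, source, volume, page):
    """Total citations over cited-reference variants pointing at one paper.

    cited_refs: (cited_author, cited_source, volume, start_page, times_cited) rows.
    Variants are matched on source, volume and starting page only, so the
    author string ('[ANON]', a consortium abbreviation, a first author) is ignored.
    """
    per_variant = Counter()
    for author, src, vol, pg, times_cited in cited_refs:
        if src == source and vol == volume and pg == page:
            per_variant[author] += times_cited
    return sum(per_variant.values()), per_variant


# Hypothetical export; volume 100 / page 1 are placeholders, counts are made up
rows = [
    ("AR GEN IN", "NATURE", 100, 1, 2500),
    ("KAUL", "NATURE", 100, 1, 120),
    ("KAUL", "NATURE", 100, 2, 3),       # wrong page: not merged
    ("OTHER A", "NATURE", 100, 50, 40),  # a different paper in the same volume
]
total, by_variant = merge_variants(rows, "NATURE", 100, 1)
print(total)       # 2620
print(by_variant)  # Counter({'AR GEN IN': 2500, 'KAUL': 120})
```

Exact matching on volume and page is deliberately strict. Real cited-reference data also contains typos in those fields, so a careful count would still review near-misses such as the third row by hand.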

Of course I asked Web of Science to correct the citation data, but forgot to include Kaul’s citations. Hopefully this will be repaired at a later date.

What really makes me wonder, though, is the slight, but very important, difference in record presentation between the All Databases search and the Web of Science search on Web of Knowledge. For me, the standard entry point into Web of Knowledge is the Web of Science tab. In my normal working routine I would never go to the All Databases tab to look up a number of citations. I found the right author name on this occasion purely by luck. But that shouldn’t have to become the standard way to perform searches, should it?

Research management and research quality

Research performed at our universities is nowadays a heavily directed practice, top down in most cases. Research for the sake of research has become a rare phenomenon. Research evaluations, research management and research organization are weeding out little pet projects on the side. Grant providers and research funders request concrete results and set the objectives to be achieved in advance.

It is therefore rather odd that in such a strongly organized and managed environment the organization of research itself is so rarely a subject of academic discourse. I still remember my old professor, who once insisted that “we didn’t need knowledge management since we produced knowledge”. This while, after each completed PhD project, another successful candidate left the organization with his knowledge written down in a number of articles, but very seldom made explicit within the organization. That did not matter much to him.

The researchers, research groups and graduate schools at universities in the Netherlands are regularly evaluated by external peer reviews. Productivity, Quality, Relevance and Vitality of the research are the main criteria on which groups are judged. It is odd, however, that so little study has been made of the factors that explain why some groups are more successful than others. I was therefore pleasantly surprised by an article by van der Weijden et al. (2008), who looked into how some aspects of managerial control of research groups relate to their research performance.

An important shortcoming of their study was that the only bibliometric parameter they looked at was the number of papers published in journals covered by Web of Science. It would have been really useful if they had included normalized citation impact as one of their variables as well. Apart from this simple bibliometric measure of published peer-reviewed articles, they also looked at the groups’ success in obtaining research grants, etc.

Their most important finding was that:

“One internal research management activity was found to have a positive relationship with (bio)medical research performance in general. Offering special commendations to (bio)medical (both preclinical and clinical) research staff members, including non-financial prizes, in order to motivate them is positively related to all performance measures used in this study.”

In other words: positive attention from senior managers for what researchers were up to paid off really well.

From the more detailed conclusions another one struck me as very interesting as well:

“Different types of internally organized research evaluation practices have (linear) positive relationships with performance measures concerning external research funding. In preclinical groups pre-evaluations of research proposals have a positive relationship with these performance measures. Interestingly, in clinical groups, positive relationships are found with research output evaluations.”

In practice, external peer reviews are usually met with some degree of resistance, or at least criticism. Yet it seems to be worth the effort all participants invest in these kinds of exercises.

It is good to keep this in mind now that our library is involved in preparing the peer review of six different graduate schools, which involve about 1000 permanent staff and some 3000 researchers in total.

Reference:

van der Weijden, I., D. de Gilder, P. Groenewegen & E. Klasen (2008). Implications of managerial control on performance of Dutch academic (bio)medical and health research groups. Research Policy 37(9): 1616-1629 http://dx.doi.org/10.1016/j.respol.2008.06.007 (subscription required).

Herbert van de Sompel at Ticer: Scholarly communication in the digital age

Van de Sompel is an enthusiastic speaker and really does his best to take the audience into the world of scientometrics. I am a fan; have a look at the subjects of this blog. The MESUR project is about a totally new kind of data analysis of scholarly communication, moving partly away from citation data towards actual downloading and clicking behaviour, and perhaps reading habits. Their goal is to develop new metrics.

Really interesting stuff, but still a little beyond the reach of most libraries.

Bollen, J., H. van de Sompel, et al. (2008). Towards Usage-based Impact Metrics. Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries: 231-240. http://portal.acm.org/citation.cfm?id=1378889.1378928