Archive for the 'Scientometrics' Category

New webometrics ranking of world universities released

Of all the university rankings available, the Webometrics Ranking of World Universities occupies an odd place: it looks only at the web performance of universities. The rankings were updated earlier this month.

I have mixed feelings about their approach, but it is a prelude to newer rankings that are not based solely on scholarly output and impact. However, I think the approach needs more time and better tools than are available at the moment. The leading researchers in this field are in the group of Mike Thelwall. Their measurements are based on their own crawlers and tools to explore, measure and investigate the academic Web, so they can understand and interpret their results completely. The Cybermetrics Lab (CINDOC), which produces the Webometrics rankings, uses publicly available tools such as Yahoo!, Google and Exalead, over which it has no control. More importantly, they have no idea whatsoever how these results come about. Another problem with Google, for example, is that its counts of search results are notoriously unreliable. They depend, among other things, on the time of day, web traffic, server load at Google and the data center being used.

So for the moment we have to take these results with a spoonful of salt rather than a pinch. It is also questionable what is actually being measured. Take, for instance, the size of university websites. In Utrecht, all staff and students appear to have personal webpages on the university website. These are all included in the count, whether or not they contain any useful information. At our university, the bulk of the indexed webpages consists of catalog records from the library. I wonder whether you really want to compare these apples and pears.

As for the rich files measure, I really wonder whether they have been able to harvest all the material deposited in our repository. Judging by statistics such as those provided by OAIster on OA-harvestable documents, Wageningen University has one of the larger content-rich repositories in the Netherlands. In the Webometrics ranking, however, we are at the bottom of the Dutch list for this measure. The fact that we use proprietary software while still adhering to the OAI-PMH protocol, or that the repository is hosted as a directory (http://library.wur.nl/way), should not affect the rankings the way it does at the moment.
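Whether a harvester can see the whole repository is something one can check independently of the rankings. The sketch below is a minimal example under stated assumptions: it pages through an OAI-PMH `ListIdentifiers` response by following `resumptionToken`s and counts the records not flagged as deleted. The base URL is a placeholder (the repository’s actual OAI-PMH endpoint is not given here), `oai_dc` is assumed as the metadata prefix because every OAI-PMH repository must support it, and the function names are illustrative.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def parse_page(xml_data):
    """Return (live records on this page, resumption token or None)."""
    root = ET.fromstring(xml_data)
    headers = root.findall(".//oai:ListIdentifiers/oai:header", OAI_NS)
    # Deleted records still appear as headers, flagged with status="deleted".
    live = [h for h in headers if h.get("status") != "deleted"]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = (token_el.text or "").strip() if token_el is not None else ""
    # An empty resumptionToken element marks the last page.
    return len(live), token or None

def http_fetch(base_url, params):
    """Fetch one OAI-PMH response as bytes."""
    url = base_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        return response.read()

def count_records(base_url, fetch=http_fetch, metadata_prefix="oai_dc"):
    """Page through ListIdentifiers, following resumption tokens until none is left."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    total = 0
    while True:
        count, token = parse_page(fetch(base_url, params))
        total += count
        if token is None:
            return total
        # Follow-up requests carry only the verb and the resumption token.
        params = {"verb": "ListIdentifiers", "resumptionToken": token}
```

Called as `count_records("https://repository.example.org/oai")` (placeholder URL), it returns a number that can be compared with what a ranking claims to have found. The `fetch` parameter exists only so the paging logic can be exercised without network access.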

For other indicators they are completely vague about what exactly is measured. Take, for instance, the Google Scholar measure. They state: “Google Scholar provides the number of papers and citations for each academic domain. These results from the Scholar database represent papers, reports and other academic items.” How do they combine publications and citations into a single measure? It is not explained. Google never returns more than the first 1000 results, so how do they arrive at all citations for an institution? How did they search for the name of an institution? Did they include the academic teaching hospitals with the university?

I do use these rankings for one purpose, though: to push for improvements to our university and library websites wherever possible. In some respects that is badly needed. I would really like to take these rankings more seriously, but for the moment I can’t. The main message of this post is simply that they have been updated again, since their blog has been defunct for quite some time already. A pity.

On Impact Factors and article quality

I just found this quote:

Thus, while it is incorrect to say that the impact factor gives no information about individual papers in a journal, the information is surprisingly vague and can be dramatically misleading.

(Adler et al. 2008)

The report is a very critical discussion about the use and abuse of impact factors and the h-index.

Reference:
Adler, R., J. Ewing, et al. (2008). Citation Statistics: A report from the International Mathematical Union (IMU) in cooperation with the International Council of Industrial and Applied Mathematics (ICIAM) and the Institute of Mathematical Statistics (IMS), Joint Committee on Quantitative Assessment of Research. 26p. http://www.mathunion.org/fileadmin/IMU/Report/CitationStatistics.pdf

Hat tip: Sidi

Article impact and journal impact factors

In the scientometric literature we are often warned not to use journal impact factors to judge the performance of researchers or research groups. For this point I always refer to Seglen (1997). Seglen showed that in three chemistry journals the most cited 50% of the articles accounted for 90% of the citations; in other words, the other half of the articles contributed only 10% of the citation impact. It is one of those illustrations of the long tail in scientometrics.
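The kind of concentration Seglen describes is easy to compute for any set of articles: sort the citation counts, take the most cited half and divide its citations by the total. A minimal sketch; the counts below are invented for illustration and are not Seglen’s data.

```python
def top_share(citations, fraction=0.5):
    """Share of all citations received by the most cited `fraction` of articles."""
    ranked = sorted(citations, reverse=True)
    k = round(len(ranked) * fraction)  # number of articles in the top group
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0

# Invented citation counts for ten articles in one journal.
counts = [40, 25, 12, 8, 5, 3, 3, 2, 1, 1]
print(f"{top_share(counts):.0%}")  # 90%
```

A journal impact factor is essentially an average over such a list, which is why it says so little about any single article in it.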

In my courses on citation analysis I always point out this fact, and elaborate on the use of journal impact factors in journal selection as part of a publication strategy. “Submit your best work to the journals with the highest impact factors” is simple advice.

In the latest analysis of research performance in the Netherlands by the NOWT, my university is placed in the second division of Dutch universities ranked by citation impact. One of the points the report made quite clear was that the field-corrected journal impact of our articles was far below the national average. In fact, we were the second-worst university in this respect; only Tilburg University fared worse (NOWT 2008, table 4.5 on p. 40).

I think there is a need to pay more attention to this at our university. In an informal citation analysis for one of our chair groups, I am going to elaborate on this point a bit further.

Relative impact versus Journal impact factors

If you plot the relative impact of their articles published in the period 1998-2005 against the journal impact factors, you get a large scatter diagram. Drawing a regression line seems rather meaningless: the slope is barely positive, and the R² is only 0.0048. The problem, of course, is that the relative impacts of the articles are far from normally distributed. The average of the relative impacts per article is 1.35, whereas the median is 0.92, so most articles have a relative impact below the world average. If you calculate the average impact for this group as the sum of citations divided by the sum of the baseline citations, the relative impact is 1.28.
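The gap between these three summary figures comes from how a skewed distribution is averaged. A small sketch, assuming each article comes with a citation count and a baseline (the expected number of citations for a comparable article, simply supplied here as a number); the data are invented.

```python
from statistics import mean, median

def impact_summaries(citations, baselines):
    """citations[i]: citations received by article i.
    baselines[i]: expected citations for a comparable article (assumed given)."""
    ratios = [c / b for c, b in zip(citations, baselines)]
    return {
        "mean of ratios": mean(ratios),      # pulled up by a few highly cited papers
        "median of ratios": median(ratios),  # the 'typical' article
        "ratio of sums": sum(citations) / sum(baselines),  # group-level impact
    }

# Invented data: one highly cited article among four modest ones.
summary = impact_summaries([0, 2, 3, 4, 30], [5, 5, 4, 5, 6])
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
# mean of ratios: 1.39
# median of ratios: 0.75
# ratio of sums: 1.56
```

A single outlier lifts both averages well above 1 while the typical article stays below world average, the same pattern as in the figures above.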

For me, the picture became much clearer when I drew the lines for the median citation impact and the median journal impact factor. Of the articles below the median citation impact line, most are concentrated in the lower journal impact factor quadrant (36 versus 16). Of the higher-impact articles, most are concentrated in the higher journal impact factor quadrant (35 versus 16). In fact, those 35 articles were published in only 14 different journals.

Relative impact versus Journal impact factors with the median lines
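The median-quadrant split, including the count of distinct journals per quadrant, can be reproduced in a few lines. This is a sketch assuming each article is recorded as a (journal, journal impact factor, relative impact) tuple; articles lying exactly on a median line are assigned to the lower half, an arbitrary choice of this sketch. The data are invented.

```python
from collections import Counter, defaultdict
from statistics import median

def median_quadrants(articles):
    """articles: list of (journal, journal_impact_factor, relative_impact).
    Returns article counts and distinct-journal counts per quadrant, keyed by
    (impact_half, jif_half), each half being "high" or "low"."""
    jif_median = median(jif for _, jif, _ in articles)
    impact_median = median(ri for _, _, ri in articles)
    article_counts = Counter()
    journals = defaultdict(set)
    for journal, jif, ri in articles:
        # Values exactly on a median line go to the "low" half.
        quadrant = ("high" if ri > impact_median else "low",
                    "high" if jif > jif_median else "low")
        article_counts[quadrant] += 1
        journals[quadrant].add(journal)
    journal_counts = {q: len(names) for q, names in journals.items()}
    return article_counts, journal_counts

# Invented data: two articles in journal J3 land in the high/high quadrant.
articles = [
    ("J1", 1.0, 0.2), ("J2", 1.5, 0.5), ("J3", 3.0, 2.0),
    ("J3", 3.0, 1.8), ("J4", 4.0, 0.4), ("J5", 0.8, 1.5),
]
counts, journal_counts = median_quadrants(articles)
print(counts[("high", "high")], journal_counts[("high", "high")])  # 2 1
```

The distinct-journal count for the high/high quadrant is the kind of short list a group could use as a publication target.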

Perhaps this research group should focus its publication output on those 14 journals and stay away from the 21 journals associated with the lower-left quadrant. I found this approach quite revealing.

References
NOWT (2008). Wetenschaps- en Technologie- Indicatoren 2008. Maastricht, Nederlands Observatorium van Wetenschap en Technologie (NOWT). http://www.nowt.nl/docs/NOWT-WTI_2008.pdf
Seglen, P. O. (1997). Why the impact factor of journals should not be used for evaluating research. BMJ 314(7079): 497-502. http://bmj.bmjjournals.com/cgi/content/full/314/7079/497

Towards a publication strategy

This afternoon we had the opportunity to inform some of the participants in the VLAG Graduate School about the procedures for preparing the external peer review that will take place next year. My part, the first part of our presentation, was quite straightforward: explaining the details of the bibliometric analysis that is part of the self-assessment in preparation for the external peer review.

The second part of our presentation, given by Marianne, was much more speculative, and perhaps more interesting. It dealt with opportunities to enhance the impact of your publications. There are no hard guidelines on this subject whatsoever. We had to stretch our imagination to the limit, but I think we came up with quite a balanced set of rules for our audience.


Elsevier’s TopCited just launched

Whereas Thomson Scientific has offered the free website ISIHighlyCited for quite some years, Elsevier has today (?) launched a competing product called TopCited. Although the two are not the same, the competition is clearly inspiring both companies to come up with new products in each other’s niches. These databases are effectively a lure to get researchers interested in the products behind them. TopCited gives an overview of the top 20 most cited articles per subject, for publication windows of the past 3, 4 or 5 years. The underlying citation data come from Scopus, of course.
I have only just discovered it; here are some quick impressions:

  • A time frame of at most 5 years is a bit short. I would love to see a 10-year window as well.
  • I suspect they have some difficulty determining the research field of articles published in multidisciplinary journals such as Nature and Science. Such articles seem to be missing from the rankings, although I glimpsed a few; too few, in my impression.

Later on I will look at this new site more carefully and attempt a comparison with the competing Thomson databases.

What’s in a name

In courses on citation analysis for research evaluation I always give researchers a stern warning not to change their names. This is all the more important nowadays, since it has become fashionable to publish using one’s first name. First names occasionally differ from official given names and can therefore lead to confusion when evaluators perform a citation analysis for whatever purpose. The situation is always a trifle more complicated for female researchers. Young aspiring scientists start publishing under their own names; later in their careers, some of them opt to publish under their husband’s name. Not to mention what happens after a divorce.

Since citation analysis is seemingly easy to perform, with more and more databases offering simple citation lookup options, researchers should be aware of the consequences of their often sloppy, or at least inconsistent, habits in rendering their own names in scholarly articles.

In today’s newspaper (NRC, 5 March 2008) there was a very interesting article reporting on research carried out at Tilburg University. The researchers investigated how a woman’s change of name after marriage affects her career. Three different experiments were performed, and all three showed unequivocally that changing one’s name after marriage had a negative effect on the women’s careers.

So far so good. But what amazed me most was that 83% of the female Tilburg University students (working towards their MSc) who took part in these experiments planned to change their names after marriage. This is apparently about the national average. Of the male students, 81% expected their future wives to adopt their names.

I was under the impression that years and years of women’s lib would have solved this problem long ago. How wrong I was.

Hat tip: GS

New version of CiteSeer available

CiteSeer was the first citation-enhanced bibliographic database to provide freely available citation data for the scientific literature. It was therefore the first serious competitor to ISI/Thomson Scientific, the kings of citation data. CiteSeer covered the literature of computer and information science. Started in 1997 at the NEC Research Institute in Princeton, New Jersey, it has come a long way. Since its inception, the original CiteSeer grew to index over 750,000 documents and served over 1.5 million requests daily, pushing the limits of the system’s capabilities.

The next-generation CiteSeer, CiteSeerX, is now available for searching.

My first impression is of a really nice, intuitive layout and fast search performance. I will keep pointing students to this free resource during my classes on citation analysis.

Impact factors and Scimago JR compared

In December I promised to look in more detail at the newly launched SCImago Journal & Country Rank database. Since December, SCImago has attracted some attention in the blogosphere outside Spain, and it got serious attention from Declan Butler in a news item in Nature (subscription required).

It is too early for thorough, in-depth investigations of this new database, but the better blog reactions were at Information Research (twice) and on the BioMed Central blog. Both had a self-interest in seeing where their own journals stood in this new database. We will have to wait a bit longer for reviews in the scholarly literature, I’m afraid.

Meanwhile I have looked into this database more closely, and in this blog post I report some of my findings. My interest is mainly triggered by the fact that it allows us librarians to evaluate the rankings of a larger set of journals quantitatively. Impact factors have played a role in decisions on journal subscriptions and cancellations, albeit not as the sole criterion. My main question is: how does the SJR compare to the impact factor?

SJR is “an indicator that expresses the number of connections that a journal receives through the citation of its documents divided between the total of documents published in the year selected by the publication, weighted according to the amount of incoming and outgoing connections of the sources.” In essence, the SJR is a PageRank-type indicator in which citations from highly ranked journals increase the ranking of the cited journal.
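To make the PageRank idea concrete, here is a simplified power-iteration sketch on a journal-to-journal citation matrix. It is not the SJR algorithm: among other things, it omits the division by the number of documents published that the definition above mentions, and the damping factor of 0.85 is the customary PageRank default rather than a SCImago parameter. The journals and citations are invented.

```python
def pagerank_scores(cites, damping=0.85, iterations=200, tol=1e-12):
    """cites[i][j]: number of citations from journal i to journal j.
    Returns PageRank-style prestige scores that sum to 1."""
    n = len(cites)
    out_totals = [sum(row) for row in cites]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new = [(1.0 - damping) / n] * n  # uniform "random jump" share
        for i in range(n):
            if out_totals[i] == 0:
                # Journal cites nothing: spread its prestige evenly.
                for j in range(n):
                    new[j] += damping * scores[i] / n
            else:
                # Pass prestige on in proportion to the citations given.
                for j in range(n):
                    new[j] += damping * scores[i] * cites[i][j] / out_totals[i]
        converged = sum(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores

# Invented network: X is cited by A, B, D and F; D is cited once, by X;
# F is cited once, by the obscure journal A.
names = ["A", "B", "X", "D", "F"]
cites = [[0, 0, 1, 0, 1],   # A cites X and F
         [0, 0, 1, 0, 0],   # B cites X
         [0, 0, 0, 1, 0],   # X cites D
         [0, 0, 1, 0, 0],   # D cites X
         [0, 0, 1, 0, 0]]   # F cites X
for name, score in zip(names, pagerank_scores(cites)):
    print(name, round(score, 3))
```

D and F each receive a single citation, but D’s comes from the heavily cited X, so D ends up with a far higher score than F. A raw citation count would treat them as equal.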

To gain a better understanding of the SJR, I have looked at the journals in the subject category ‘Library and Information Science’. This category includes some 98 journals. It is important to note that SCImago JR has a much more refined subject categorization than Scopus itself, although I suspect that this categorization is also present somewhere under the hood in Scopus. The corresponding category in JCR is ‘Information Science & Library Science’, which contains 53 journals.

It is really easy to transfer the data from SCImago JR to Excel, whereas with JCR it always takes a few more clicks (making a marked list and using the print export) to get the data into Excel. Interestingly, on the web SCImago uses European number notation, with commas as decimal separators and dots as thousands separators; on transfer to Excel this is converted automatically. A minor drawback of SCImago is that ISSNs are missing from the exported data. In JCR, the full journal titles are not exported.
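Excel handled the notation automatically, but anyone processing the SCImago figures in a script would need to convert them. A hypothetical helper, assuming every value uses a dot for thousands and a comma for decimals (so a string such as "1.234" would be read as 1234):

```python
def parse_european_number(text):
    """Hypothetical helper: '1.234,56' (dot = thousands, comma = decimal) -> 1234.56."""
    # Drop thousands separators first, then turn the decimal comma into a point.
    cleaned = text.strip().replace(".", "").replace(",", ".")
    return float(cleaned)

print(parse_european_number("0,038"))     # 0.038
print(parse_european_number("1.234,56"))  # 1234.56
```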

Because the two exports share no common field, the journals from JCR were matched manually against the journals from SCImago. Only a few journals from JCR were not found directly among the downloaded SCImago journals. The ‘Journal of the American Medical Informatics Association’, ‘Information and Management’ and ‘Journal of Scholarly Publishing’ were included in journal categories other than ‘Library and Information Science’. Furthermore, the ‘International Journal of Geographical Information Science’ appeared twice in the list of Library and Information Science journals, at rank 5 and again at rank 33; in the processing, the entry at rank 33 was dropped. JCR still includes the Journal of Government Information, although it has been incorporated into Government Information Quarterly since 2005 (the calculation of its IF in JCR 2006 is indeed based on only a single year of data). Two other journals, Online and EContent, which are included in both JCR and Scopus, could not be found in SCImago. This is not a great loss, since these are trade journals rather than peer-reviewed scholarly journals, but the same applies to some other journals included in the table, e.g. The Scientist and Library Journal. In the end, 50 journals in the LIS field could be matched between SCImago and JCR. The full list of journals included in this little study is linked as a Google Document.
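Part of this manual matching could be automated. The sketch below is an illustration under assumptions, not the procedure used here: it normalizes titles (case, punctuation, “&” versus “and”, a leading “The”), keeps only the first, highest-ranked occurrence of a duplicated SCImago title (as was done for the entry at rank 33), and reports the JCR titles it could not match. Because JCR exports abbreviated rather than full titles, a real run would first need those abbreviations expanded. The function names and values are hypothetical.

```python
import re

def normalize_title(title):
    """Lower-case, map '&' to 'and', strip punctuation and a leading 'the'."""
    t = title.lower().replace("&", " and ")
    t = re.sub(r"[^a-z0-9 ]+", " ", t)
    t = re.sub(r"\s+", " ", t).strip()
    return t[4:] if t.startswith("the ") else t

def match_journals(jcr_rows, sjr_rows):
    """Both inputs: lists of (title, value), SCImago rows in rank order.
    Returns (matched, unmatched): matched holds (title, impact_factor, sjr)."""
    sjr_by_title = {}
    for title, sjr in sjr_rows:
        # setdefault keeps the first (highest-ranked) entry of a duplicate.
        sjr_by_title.setdefault(normalize_title(title), sjr)
    matched, unmatched = [], []
    for title, impact_factor in jcr_rows:
        key = normalize_title(title)
        if key in sjr_by_title:
            matched.append((title, impact_factor, sjr_by_title[key]))
        else:
            unmatched.append(title)
    return matched, unmatched

# Invented values; titles taken from the text above.
jcr = [("MIS Quarterly", 5.0),
       ("International Journal of Geographical Information Science", 1.5),
       ("Online", 0.3)]
sjr = [("International Journal of Geographical Information Science", 0.4),
       ("MIS QUARTERLY", 0.9),
       ("International Journal of Geographical Information Science", 0.1)]
matched, unmatched = match_journals(jcr, sjr)
print(unmatched)  # ['Online']
```

Anything left in `unmatched` still needs a human look, as the category and merger cases above show.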

Looking at the table, it is apparent that the maximum value of SJR is an order of magnitude smaller than that of the impact factor. At the lower end of the scale impact factors become zero, whereas the lowest SJR value in this set of journals is 0.038.

In Figure 1, I have plotted the IF against the SJR. There seems to be a strong relationship between SJR and IF, although three journals are outliers from an apparent linear relationship. Interestingly, these are medically oriented LIS journals: Journal of the American Medical Informatics Association (JAMIA), Journal of Health Communication and Journal of the Medical Library Association. MIS Quarterly is not regarded as an outlier, since it clearly lies on the trend line underlying the other data points.

Figure 1

I think the three outliers really illustrate the point that SJR is more of a PageRank-type indicator. The three medically oriented journals receive relatively many citations from highly ranked medical journals. Checking this for JAMIA in Scopus, we find citations from journals such as Pediatrics (SJR = 0.528), Annals of Internal Medicine (SJR = 1.127) and BMC Bioinformatics (SJR = 0.957). The journals that follow the trend line for LIS journals receive far fewer of these kinds of “external” citations.

Excluding the three medical journals, we get a very good regression between the two parameters, with an R² of 0.86. In Figure 2 the regression line based on the remaining 47 journals is added.

Figure 2
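The regression and its R² can be computed with ordinary least squares, fitting IF on SJR as plotted in the figures. A minimal sketch with invented data points; the function names are illustrative, and a spreadsheet trend line gives the same least-squares result.

```python
def linear_fit(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot

def fit_without(rows, excluded):
    """rows: (title, sjr, impact_factor). Fits IF on SJR after dropping `excluded` titles."""
    kept = [row for row in rows if row[0] not in excluded]
    return linear_fit([sjr for _, sjr, _ in kept], [jif for _, _, jif in kept])

# Invented points: three on a line, one far above it.
rows = [("A", 0.1, 0.5), ("B", 0.2, 1.0), ("C", 0.3, 1.5), ("Outlier", 0.2, 4.0)]
print(fit_without(rows, set())[2])        # R² with the outlier
print(fit_without(rows, {"Outlier"})[2])  # R² without it
```

A single point well off the line is enough to drag R² down sharply, which is why dropping three journals can change the picture so much.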

I think this is a really cool result, illustrating the difference between SJR and IF quite clearly. In a subsequent post I will look a bit further into the correlations between the various parameters.

Another bibliometrics presentation


Tomorrow I will give a brief presentation on the outcomes of a citation analysis exercise we did for a chair group at our university a while back. I am sharing this presentation since it contains some tips on publishing that some might find useful.

Citation analysis for research evaluation

Tomorrow I will be giving a course on citation analysis for research evaluation. This PowerPoint presentation is the mainstay of the morning, but the course is open to any suggestions. It differs only in small details from the course given at the start of this year. The most exciting change came from SCImago, which I only discovered yesterday but which has already been included in the exercises.