Testing Science Seeker

This post is only compiled to be included in the Science Seeker aggregator
sciseekclaimtoken-4edbd224a1812

How Google Scholar Citations passes the competition left and right

Last Thursday Google Scholar Citations went public. This was to be expected: since August the product had been tested by a few (blogging) scientists, and we only had to wait patiently for it to be released to all scientists. Last Thursday the moment was there.

Was it worth the wait? Yes, it certainly was. Google Scholar Citations really excels at finding publications you had completely forgotten about. Even so, there are still obscure publications that even Google Scholar doesn’t know about. You simply log in and deselect the few publications that don’t belong to you, and you can run searches to find publications that Google has overlooked. You get a comprehensive publication list quite quickly. Well, when your name is not too common, that is. How it works for very common names (Korean scientists come to mind, as well as John Smith) I don’t know yet. But so far nothing new: Anne-Wil Harzing’s excellent Publish or Perish software already did this. What is new is that Google Scholar Citations keeps the citations and publications up to date automatically and allows you to publish your own publication list on the Web, with the citations and some crude citation metrics.

The two major competitors in this arena are Thomson Reuters with their ResearcherID and Elsevier’s Scopus with its Scopus ID. With both services you can identify your own publications and assign them to a unique number. In this way you can create your unique publication list with citation metrics as well. Their main disadvantage compared to Google Scholar is their rather limited resource set. Thomson Reuters WoS covers “only” some 10,000 scholarly journals, a set of selected proceedings and, since recently, only 30,000 books. Scopus has nearly double the number of journals but lags behind in proceedings and covers hardly any books. Google Scholar certainly covers more, but we still don’t understand what is included and what is not, and we sometimes have our doubts about how current Google Scholar is. The larger resource base of Google Scholar, including books and book chapters, will make this service more attractive for social scientists and scholars in the arts and humanities.

On top of the smaller publication base on which these services are built, the two competitors each have their own particular disadvantage as well. You have to maintain your publication list in Thomson Reuters ResearcherID manually. Each time you publish a new article, you have to add it to your profile yourself. Looking around, I see that most researchers are a bit sloppy in this respect. You can, however, make your publication list and the citation impact publicly available. See for example my meagre list. Scopus, on the other hand, maintains your publication list automatically (although it made some serious mistakes in this area in the past, they seem to have improved this service). But, and this is a big but, you can’t publish your properly curated publication list with citations publicly on the Web. They used to have 2Collab for this, but since they stopped 2Collab they haven’t come up with an alternative mechanism to publish your publication list with citation impact on a public website. A real pity.

So Google Scholar easily beats ResearcherID, since it updates automatically, and Scopus ID, because you can make your list with citations publicly available. Making your publication list openly available is really recommended to all scientists; it helps your personal branding.

Certainly there are disadvantages to Google Scholar as well. The most serious at this moment are all kinds of ghost citations. If you look at the citations to our paper on bibliometric analysis on top of repositories, Google counts three citations. But checking the citation from Leydesdorff, a reference to our article is not to be found (of course it should have been there, but it isn’t). 0xDE reported a spam account in the name of Peter Taylor, in which various Taylors were collected in a single profile boasting an h-index of 94. That Google Scholar can be fooled has been reported by Beel & Gipp (2010).

When I was interviewed for our university paper on Google Scholar Citations (in Dutch) I told them: Google Scholar is only about five years old. Give them another five years and they will have totally changed the market for abstracting and indexing databases. If only 20 percent of all scientists correct their publication lists (including editing the references to repair the mistakes Google has made), even without making them publicly available, Google sits on a treasure trove of high-quality metadata. It will be really interesting to see how this story develops.

Reference:
Joeran Beel and Bela Gipp. Academic search engine spam and google scholar’s resilience against it. Journal of Electronic Publishing, 13(3), December 2010.

Journals changing publisher, but can the rights change as well?

From my perspective as a repository manager I like the Cambridge University Press journals a lot. Although there is no immediate OA, after a year the author is allowed to post the publisher’s version/PDF of the article in an institutional repository. We act on this policy on behalf of our authors, so we post all articles from Cambridge journals to our repository after this 12-month embargo period. Sounds simple, and it actually is that simple.

Recently I ran a check on this policy using the DOIs, rather than the ISSNs we normally use, in the metadata we collect for all our researchers’ publications. The DOI strings of Cambridge journal articles all start with the prefix 10.1017. I came across an article published in the journal Animal Conservation in 2004 which was not OA in our repository. Checking this article further, I found that its DOI resolved to the Wiley Online Library, where the article came online only in 2006, instead of to the Cambridge website. Rather odd. When I checked the copyright and archiving policy of this journal at the Sherpa Romeo site, it referred to the rather limited Wiley copyright and self-archiving possibilities for this journal. Sherpa Romeo implies that this applies to all content of this journal. I was rather disappointed.
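The DOI-based check described above could be sketched roughly as follows. This is only a minimal illustration, assuming the repository metadata is available as a list of dictionaries; the field names (doi, year, open_access) and the function names are hypothetical, not our actual schema.

```python
from datetime import date

CAMBRIDGE_PREFIX = "10.1017/"  # DOI prefix of Cambridge journal articles
EMBARGO_YEARS = 1              # the 12-month embargo, approximated by calendar year


def normalise_doi(doi):
    """Lower-case a DOI and strip common resolver prefixes."""
    doi = doi.strip().lower()
    for resolver in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(resolver):
            doi = doi[len(resolver):]
    return doi


def missing_from_repository(records, today=None):
    """Return Cambridge-prefixed records past the embargo that are not yet OA.

    The year comparison is deliberately conservative: a record only counts
    as past the embargo once a full calendar year lies in between.
    """
    today = today or date.today()
    hits = []
    for rec in records:
        doi = normalise_doi(rec.get("doi", ""))
        if not doi.startswith(CAMBRIDGE_PREFIX):
            continue
        past_embargo = rec["year"] + EMBARGO_YEARS < today.year
        if past_embargo and not rec["open_access"]:
            hits.append(rec)
    return hits
```

Note that a check like this only tells you which prefix the DOI carries, not where the DOI currently resolves to; as this case shows, the two can differ.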

However, that Cambridge DOI bothered me, so I checked the Cambridge site for the journal and could find the article there as well. The DOI, however, resolves to the Wiley online journals site. Clearly the journal changed publisher; that happens all the time. But with the change of publisher it appears that the authors’ rights changed as well, especially since the Wiley site also hosts the complete backfile going back to Volume 1, Issue 1. The authors’ self-archiving rights can’t have changed with the change of publisher, because they are based on the original publishing agreement, yet the Wiley site and Sherpa Romeo do imply that the Wiley copyright and self-archiving policies apply to all content of the journal. That can’t be true, can it? But here I have an article hosted on two publishers’ websites with two very different self-archiving policies.

Of course we adhere to the Cambridge self-archiving policy for this article. There is therefore now a third copy of this article available on the Web, proudly presented in Wageningen Yield.

These are strange ways of publishers and copyrights.

Some observations during the bibliometrics session at the Österreichische Bibliothekartag

Although the program consistently talks about the Österreichische Bibliothekartag (singular), the whole library day actually spans four days. One would have expected at least the Österreichische Bibliothekartage (plural), but they insist on mentioning only one day. Of those four days, I was only present during part of the morning of the third day, so this is a very limited report on the Österreichische Bibliothekartag. Judging by the program, it is a very comprehensive and interesting conference. I never thought that you could fill a complete session of five presentations talking about cooking books (no pun intended). It only shows that bibliometrics was just a small part of the program amongst the many other subjects covered. I noticed a lot of presentations on e-book platforms, many digitization projects, plenty on mobile, and less on library 2.0 than you would expect (is the hype over?); open access also had a very limited role. What struck me as interesting for conference organizers is that the many commercial presentations were spread evenly throughout the sessions. Just a sign of taking the sponsors seriously.

So much for the conference as a whole, of which I actually experienced too little. On to the bibliometrics session. The session was chaired by Juan Gorraiz, a bubbly Spaniard who has been working in Austria for years. Give him the opportunity and he will take the floor, and he would love to take all the time available and fill the slots of all the planned presentations.

The first presentation was on a piece of research that should result in a master’s thesis at some point; some preliminary results were presented in this session by Christian Gumpenberger. The focus of the research was on the acceptance of, and familiarity with, bibliometrics among Austrian researchers. The results were not really shocking: most researchers stated that they were familiar with impact factors, but for the moment there was no clue as to whether they were aware of such a thing as a two-year citation window, or of the difference between citable and non-citable items that leads to the inflation of impact factors for journals like Nature and Science. Christian sketched some sunny skies for bibliometrics in Austria, but in the subsequent discussion this sunny view was criticized quite a bit. Nevertheless, I would like to have a look at this thesis when it becomes available.

The second presentation was of Italian origin, by Nicola de Bellis. Nicola has written an interesting book on citation analysis in which he stresses the sociological, philosophical and historical aspects of bibliometric analyses. It is always interesting to hear a presentation like this, away from the fact-finding, number-crunching approach which I normally take, and to dream away a bit on outlines of what in an ideal world should be done on a subject like this. Quite a lot, but some of it is beyond being practical. When you carry out bibliometric analyses in the library at some scale, like dealing with 18,000 papers that have collected 265,000 citations as we do in our library, you can only be practical. So there is an interesting conflict between his presentation (which will be online soon, I hope) and mine, which followed Nicola’s presentation.

I don’t want to cover all aspects of Nicola’s presentation. Go and read the book, which I am going to do as well. But at one point during his presentation I strongly disagreed with him: where he stated that only mediocre scientists have an interest in bibliometrics and that top scientists normally don’t have an interest in this topic. My experience is quite the contrary. In the first place, it was one of Wageningen’s top scientists who urged the library to take out a subscription to Web of Science back in 2001, and made it possible with a special contribution from his top institute. He knew he was a highly cited scientist, but somehow he needed Web of Science to confirm his reputation. Later on as well, apart from the discussion with scholars in the social sciences department, it has always been the top-performing groups that invited me to give a presentation on this subject, rather than the groups that were lagging behind on the bibliometric performance indicators. To me it has always appeared that those who are leading the pack are also interested in staying ahead of the rest, and invite the library to explain the results obtained and to help enhance their performance in the future.

The second point in Nicola’s presentation where he went far beyond the practical was his insistence that, for any publication, all citations should be retrieved from the three general databases (Web of Science, Scopus and Google Scholar) in the first place, supplemented with citations from at least one citation-enriched subject-specific database. Well, that is a lot of work for a single publication to begin with, and it leads to deduplication errors if you’re not very careful. Secondly, it should be well known that Google Scholar, attractive as it is because of tools like Harzing’s Publish or Perish, is not a reliable database for citation counts at this moment (Jacsó 2008). Google Scholar still has serious problems with ordinary counting and deduplication and should therefore not be used for serious citation analyses. The third argument against the use of multiple databases goes a bit further into the theory of bibliometrics and relies on approaches described by Waltman et al. (2011) and Leydesdorff et al. (2011). The key point is that a number of citations in itself has no meaning. It should be related to the citations of related documents in the same field of science. You can do that by normalizing on the mean citation rate in the field (Waltman et al. 2011) or by the perhaps more sophisticated approach sketched by Leydesdorff et al. (2011), based on the citation distributions in the field to which the paper belongs. The latter approach is very novel and has not really been widely tested yet. Both approaches rely on the availability of all the citations to the publications of a certain age and document type in a certain field of science. You can expect to have those means or citation distributions available when you work with one specific database (for WoS there is plenty of experience, for Scopus it is coming with SciVal Strata, but for Google Scholar it doesn’t exist yet), but this is beyond reality when you derive citation data from three or four databases at the same time.
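To give an impression of why merging citations from several databases invites deduplication errors, here is a minimal sketch. It assumes each database export is a list of dictionaries with hypothetical fields doi, title and year; it is not how any of the databases mentioned here actually match records.

```python
import re


def record_key(rec):
    """Build a matching key: the DOI when present, else normalised title plus year."""
    doi = (rec.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    title = re.sub(r"[^a-z0-9]+", " ", (rec.get("title") or "").lower()).strip()
    return ("title", title, rec.get("year"))


def merge_citing_records(*sources):
    """Merge citing records from several sources, keeping the first copy of each."""
    merged = {}
    for source in sources:
        for rec in source:
            merged.setdefault(record_key(rec), rec)
    return list(merged.values())
```

Even this simple approach has an obvious weakness: a citing paper that carries a DOI in one database but not in another ends up with two different keys and is counted twice. That is exactly the kind of error that creeps in when you are not very careful.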

But apart from these critical points, I liked the presentation by De Bellis very much. To those interested in similar views on citation practice I really recommend reading MacRoberts & MacRoberts (1996) as well.

The session closed with my presentation, which is enclosed here

Bibliometric analysis tools on top of the university’s bibliographic database, new roles and opportunities for library outreach

View more presentations from Wouter Gerritsma

After that the session ended with some discussion, but soon all 30 or so participants hurried off for coffee.

References

De Bellis, N. (2009). Bibliometrics and citation analysis : From the Science Citation Index to cybermetrics. ISBN 9780810867130, The Scarecrow Press, 450p. (download here)
Jacsó, P. (2008). The pros and cons of computing the h-index using Google Scholar. Online Information Review, 32 (3): 437-451 http://dx.doi.org/10.1108/14684520810889718 http://www.jacso.info/PDFs/jacso-pros-and-cons-of-computing-the-h-index.pdf
Leydesdorff, L., L. Bornmann, R. Mutz & T. Opthof (2011). Turning the tables on citation analysis one more time: Principles for comparing sets of documents. Journal of the American Society for Information Science and Technology n/a-n/a http://dx.doi.org/10.1002/asi.21534 http://arxiv.org/abs/1101.3863
MacRoberts, M. H. & B. R. MacRoberts (1996). Problems of citation analysis. Scientometrics, 36(3): 435-444 http://dx.doi.org/10.1007/BF02129604
Waltman, L., N. J. van Eck, T. N. van Leeuwen, M. S. Visser & A. F. J. van Raan (2011). Towards a new crown indicator: Some theoretical considerations. Journal of Informetrics, 5(1): 37-47. http://dx.doi.org/10.1016/j.joi.2010.08.001 http://arxiv.org/abs/1003.2167

The unofficial guide for authors

Recently I co-authored a book on scientific publishing. It is available from Lulu for less than € 6,-. If that’s too much for you, you can download it for free. The book is published under a CC-BY-NC licence.

From the cover:

Most scientific journals provide guidelines for authors - how to format references and prepare artwork, how many copies of the paper to submit and to which address. However, most official guidelines say little about how you should design and produce your paper and the chances that it will be accepted. This book provides a comprehensive but focused guide to producing scientific information - from research design to publication. It provides practical tips and answers to some of the most frequently asked questions: Why do we publish in the first place? What is OA publishing and why bother about it? What is the h-index? What is a Journal Impact Factor and does it matter? How can I increase my research production efficiency? Why should I use OS software tools for academic work? How can I produce graphics that will impress? How can I brainstorm good titles? How can I select a suitable journal and where can I find out more about it? How can I get into the reviewers’ heads?

Scimago rankings 2011 released

Today Félix de Moya Anegón announced on Twitter that the Scimago Institutions Rankings (SIR) for 2011 have been released. These rankings are not very well known or widely used. Yesterday, during a ranking masterclass from the Dutch Association for Institutional Research, the SIR was not even mentioned. Undeservedly so. Scimago lists just over 3000 institutions worldwide. It is therefore one of the most comprehensive institutional rankings, if not the most comprehensive. It is also a very clear ranking: they only measure publication output and impact. It thus ranks only the research performance of the institutions and is therefore very similar to the Leiden ranking.

What I like about Scimago is the innovative indicators they come up with each year. Last year they introduced the %Q1 parameter, which is the share of its publications that an institution publishes in the most influential scholarly journals of the world. Journals considered for this indicator are those ranked in the first quartile (25%) of their categories as ordered by the SCImago Journal Rank (SJR) indicator. This year they introduced the Excellence Rate. The Excellence Rate indicates which percentage of an institution’s scientific output is included in the set formed by the 10% most cited papers in their respective scientific fields. It is a measure of the high-quality output of research institutions. These are very similar indicators; the Excellence Rate is just a tougher version of the %Q1.
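Both indicators boil down to a simple share of an institution’s output. The sketch below shows the arithmetic only; it assumes that somebody else has already decided, per paper, whether it appeared in a first-quartile journal and whether it belongs to the 10% most cited papers of its field. The field names in_q1_journal and in_top10_percent are hypothetical.

```python
def share(papers, flag):
    """Percentage of papers for which the given boolean flag is set."""
    if not papers:
        return 0.0
    return 100.0 * sum(1 for p in papers if p[flag]) / len(papers)


def q1_percentage(papers):
    """%Q1: share of output published in first-quartile journals."""
    return share(papers, "in_q1_journal")


def excellence_rate(papers):
    """Excellence Rate: share of output among the 10% most cited in its field."""
    return share(papers, "in_top10_percent")
```

The hard part is of course not this division, but establishing the journal quartiles and the field-specific top 10% thresholds in the first place.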

The other new indicator is the Specialization Index. The Specialization Index indicates the extent of thematic concentration or dispersion of an institution’s scientific output. Values range from 0 to 1, indicating generalist vs. specialized institutions respectively.

Their most important indicator to express research performance is the Normalized Impact (NI), which is similar to the MNCS of the CWTS and to the RI as we calculate it in Wageningen. The values show the relationship between an institution’s average scientific impact and the world average, which is set to 1; i.e. a score of 0.8 means the institution is cited 20% below average and 1.3 means the institution is cited 30% above average.
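How such a normalized score is read can be illustrated with a small sketch in the spirit of the mean-normalized approach of Waltman et al. (2011): each paper’s citations are divided by the expected citations for papers of the same field, year and document type, and the ratios are averaged. Whether Scimago computes its NI in exactly this way is not something this sketch claims; the field names and the baseline table are hypothetical.

```python
def normalized_impact(papers, field_baselines):
    """Average of citations divided by the expected citations for each paper.

    field_baselines maps (field, year, doctype) to the mean number of
    citations of comparable papers worldwide (assumed to be above zero).
    """
    ratios = []
    for p in papers:
        expected = field_baselines[(p["field"], p["year"], p["doctype"])]
        ratios.append(p["citations"] / expected)
    return sum(ratios) / len(ratios)


def describe(score):
    """Translate a normalized score into the usual 'x% above/below' wording."""
    pct = round((score - 1.0) * 100)
    if pct == 0:
        return "at world average"
    side = "above" if pct > 0 else "below"
    return "%d%% %s world average" % (abs(pct), side)
```

With this reading, describe(0.8) gives "20% below world average" and describe(1.3) gives "30% above world average", matching the interpretation above.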

Last year the Scimago team already showed that there is an exponential relationship between the ability of an institution to get its scientific papers into the better journals (%Q1) and the average impact achieved by its production in terms of Normalized Impact. It is a relationship I always show in classes on publication strategy (slides 15 and 16). When looking at the Dutch universities, I noted that the correlation between the new Excellence Rate and Normalized Impact is even better than with the %Q1. So the pressure to publish in the absolute top journals of each research field will increase even further if this becomes general knowledge.

What do we learn about the Dutch universities from the Scimago rankings? Rotterdam still maintains its top position for Normalized Impact; it also scores best on %Q1 and Excellence Rate. Directly after Rotterdam come Leiden, UvA, VU, Utrecht and Radboud with equal impact. Utrecht published the most articles during the period 2005-2009. Wageningen excels at international cooperation. And Tilburg and Wageningen are the most specialized universities in the Netherlands.

Making these international rankings is quite a daunting task. For the Netherlands I noticed that the output of Nijmegen was distributed over Radboud University and Radboud University Nijmegen Medical Centre; this was not done for the other university hospitals. And for Wageningen the output was recorded under Wageningen University and Research Centre and under Plant Research International (which is part of Wageningen UR). But for researchers from Spain these are difficult nuances to resolve 100% perfectly.

My only real complaint about the ranking is that they state it is not a league table, and yet they rank the institutions on publication output. It would be so much more obvious to present the list ranked on NI. Since they only produce the ranking as a PDF file, it took me a couple of hours to convert it into an Excel spreadsheet so that I can rank the data any way I wish. With all the information at hand it is also possible to design your own indicators, such as a power rank in analogy to the Leiden rankings.

The message to my researchers: aim for the best journals in your field. We still have scope for improvement. We are still not in the neighbourhood of the 30 to 40% Excellence Rate we see for Rockefeller, Harvard and the like.

How Google could help the Open Access world a little

It was back in 2008 that Google Scholar launched the feature that identifies freely available versions of articles on the Web. In the early days these were indicated by green triangles in front of the reference. Nowadays freely available copies are listed in the right-hand column. Many of these versions are Open Access versions of articles properly submitted to preprint servers and subject or institutional repositories. Other free versions of the papers identified by Google Scholar are publishers’ versions of articles posted to personal websites, dropboxes, you name it. Whatever the rights are, if you need a copy of these papers and don’t have access through your university library’s subscriptions, this Google Scholar feature is a very useful tool. In scholarly search classes I always stress this feature of Google Scholar to my students.

In our institution’s bibliography I would love to include functionality that refers, for each article, to the so-called document cluster in Google Scholar. Consider the following publication: the link to the full text included in the record leads you to ScienceDirect. Whether you can access the paper on ScienceDirect depends on the subscriptions. Sometimes you can’t. Therefore it would be nice if we could include a link to the document cluster in Google Scholar. For this paper you get some 29 versions, and above all 6 of these are free versions of the paper posted on various websites. That’s really helpful.

I learned from Johannes Keizer yesterday that AgrisWeb links to Google through a search for the title words. This works quite well, but it could be done better.

Consider the idea that Google Scholar had an API, and that we could query that API on the basis of the DOI or PMID, or the ISSN in combination with volume, issue and pages, or any other combination of standard bibliographic metadata. Yes, something like an OpenURL. If Google Scholar then only returned the correct Google Scholar ID for that article (that number 12564475196117890153 in the link), we could construct various links. Linking to the Google Scholar document cluster is one. Retrieving the Google Scholar citations is another.
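Once you have such an ID, constructing the links is trivial, as the following sketch shows. It assumes the cluster and cites URL parameters that, as far as I can tell, Google Scholar uses in its own result links; the lookup from DOI to Scholar ID is precisely the part that does not exist, so the ID simply has to be supplied. The function names are hypothetical.

```python
from urllib.parse import urlencode

SCHOLAR_BASE = "http://scholar.google.com/scholar"


def cluster_link(scholar_id):
    """Link to the document cluster (all versions) of one article."""
    return SCHOLAR_BASE + "?" + urlencode({"cluster": scholar_id})


def citations_link(scholar_id):
    """Link to the list of documents citing the article."""
    return SCHOLAR_BASE + "?" + urlencode({"cites": scholar_id})
```

For the example ID mentioned above, cluster_link("12564475196117890153") would produce the link to the document cluster that we would like to include in each record.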

That Google doesn’t like metadata too much is an often-heard argument. But the Google Books API works swell with ISBN numbers, OCLC numbers or LOC numbers. That API is talking metadata. Libraries are massive stores of metadata. So Anurag Acharya, please. Pleas for a Google Scholar API abound, mostly for the retrieval of citations, but for the OA movement those document clusters are really more important! Perhaps you could launch this Google Scholar API as a present for the Open Access Week coming up in October?

National Library of the Netherlands discloses its Google Books Contract

After the successful disclosure of the agreement between the British Library and Google Books on the basis of the Freedom of Information Act, the National Library of the Netherlands (KB) today also disclosed its agreement with Google Ireland. Although the director of the KB tweeted a day ago that not all public information needs to be available on the Web, it was decided to publish the agreement on the Web, since there were two WOB (the Dutch version of FOIA) procedures underway to gain insight into the agreement.

Although I am not a lawyer, a few things caught my eye. The agreement is very similar to the agreement between Google and the British Library. Bert Zeeman pondered the idea of standard Google contracts in this respect. That seems to hold, with the exception of the number of volumes in the public domain that will be digitized: 250,000 in the UK and 160,000 in the Netherlands (clause 2.1).

What struck me as interesting was the use of the library’s digital copies in clause 4.8: “the library may provide all or any portion of the library digital copy… to (a) academic institutions or research or public libraries, ….” But those recipients may not go on “providing search or hosting services substantially similar to those provided by Google, including but not limited to those services substantially similar to Google book search”. I guess that rules out the other academic libraries in the Netherlands including these digital copies in their discovery tools. It is tempting, but I see problems on the horizon. We seem to be left with separate information silos, whereas integration with the rest of the collection would be really interesting. It becomes more explicit in clause 4.9, where it is stated that “nothing in this agreement restricts the library from allowing Europeana to crawl the standard metadata of the digital copies provided to library by Google.” We would be more interested in the data than in the metadata.

But then again, it is up to the lawyers to see what’s allowed and what’s not. But then again, again, after fifteen years all restrictions on use or distribution terminate (clause 4.7); a bit long according to the Open Rights Group. However, we have experience with building academic library collections, and that takes ages. Those fifteen years are over in the wink of a young girl’s eye.

Google better with Google

Or 14 super search tips for scientists and students. The following scholarly super search tips are an explanation for the enclosed slideshare presentation.

Google better with google

This slideshare presentation was posted a while back on WoW!ter’s slideshare, but has been updated to stay in sync with this blogpost.

The tips
1. Which Google do you want to use? We have a large international audience of users at our University, who normally are redirected to http://www.google.nl. However if you use http://www.google.com/ncr then you get the international version. But if you prefer your Indian version http://www.google.co.in/ncr works as well. With the /ncr you can control the regional version you are using easily.

2. Personalize your search experience. The settings are nowadays found under the small cogwheel at the top right-hand corner of the page, or follow this link. The section I always pay attention to is the filter option. Why should Google judge whether something is fit for my eyes or not? I also advise setting the number of search results to 50 (but you can’t make use of Google Instant in that case). I used to use 100 results, but even I found that a wee bit too much. Lastly, I always check the box to open the results in a new window (it actually opens a new tab rather than a window); this keeps my search results window intact whilst I browse some of the results I retrieved.

Further personalisation could include installing the Google Toolbar in your browser or, going a step further in personalizing the search experience, making use of iGoogle.

3. There is more than 1 Google. Many people only use the standard Google web search engine. But for academics, Google Scholar, Google Book Search and Google Patents are specific interfaces that should certainly be part of the searcher’s tricks of the trade.

4. Google universal. Nowadays, Google has realized that the many different search interfaces cause a problem for the users as well and therefore they have introduced the universal search engine results page with a lot of specific options on the left hand side of the results. However a suggestion to use Google Scholar is not included.

5. Learn from the advanced search interface. All Google search interfaces have an advanced search option. Use these options to see what the possibilities of the specific search interface are, and learn how you can make use of these advanced search operators in the normal search interface. When you make use of the advanced search options in Google Scholar you see an option to search for a specific author which translates in the Scholar search box as [nitrogen fixation author:”K E Giller”]

6. Be specific or search with more than 1 term. In the Dutch language we can often get away with searching for a single word, because we are allowed to make incredibly long compound words such as “wapenstilstandsonderhandelingen”. When you’re searching for scientific information you had better stick to English. In English you can’t make such compound words, a small language difference which necessitates searching with more terms. But apart from the language difference, when you search with more terms, searches become more specific and the results more relevant. In the current example a search for [water] alone gives more than 700 million results, whereas [Water management technology assessment] gives nearly 8 million results.
Interestingly, when you look at the results in the slides, you’ll notice that the total result numbers in Google are unreliable, to say the least. In the step from 2 to 3 search terms the result set increases again.
The fifth example in the slide is an introduction to the next slide. You can be even more precise when searching.

7. Keep words together. Make use of “phrase searches”. A phrase search is a search which returns the words in exactly the specified order. Of course Google already ranks results containing your search terms as a phrase at the very top of the search engine results page. This technique also reduces the sheer number of possible results. Compare for instance [“water management”] with [water management]. You can combine as many phrases as you like (see the previous slide), or make them really long (the latter is also used in plagiarism checks).

8. Search for title words. When you feel overwhelmed by the number of results, a good solution is to limit your search to title words rather than to words anywhere on a page. You can search for single title words with the intitle: operator, or for all of your search words with the allintitle: operator. The results are the same when you compare [intitle:”water management”] with [allintitle:water management]

9. Search for information in PDF files. Most scientific information is published on the web as PDF files, be it as a scientific report or a scholarly article, e.g. [Agaricus bisporus ext:pdf]. A couple of years ago this was an extremely efficient way to look for scholarly information on the Web. Since it has become very easy to produce your own PDF files, this technique has lost some of its effectiveness, but it still works wonders, especially in combination with the other tips.

10. Search for results from a specific domain. In some cases it is useful to restrict your results to a certain website or domain. This is certainly true for sites that don’t have good site search options, e.g. [EndNote site:library.wur.nl]. You can also limit the results to the academic institutions of the USA, e.g. [“water management” site:.edu].

11. Search for number ranges. Apart from the fact that Google is a powerful calculator, you can also search for number ranges. This comes in handy when you want to limit your search to results from certain publication years, e.g. [“publication strategy” 2009...2011]. Note that three dots work differently from (and better than) the commonly used two dots.

12. Exclude specific terms with the - operator. You can narrow your searches using this operator. You can exclude as many words as you want by putting the - sign in front of each of them, for example [mercury -ford -freddy -outboards -planets].

13. Search with OR. On some occasions the intelligence of Google doesn’t include obvious synonyms. With the OR operator you can combine search terms, e.g. [“carbon dioxide” OR CO2]. Notice that OR should be typed in capitals.

14. Combine. Having seen some of the options of the Google search engine, you should realize that you can combine most of these operators. In this way you can make very precise searches, e.g. [“publication strategy” citations 2009...2011 ext:pdf].
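For readers who like to script their searches, the operators from the tips above can also be put together programmatically. What follows is only a minimal sketch in Python: the helper names phrase, build_query and search_url are hypothetical, and the https://www.google.com/search?q= address is assumed to be the ordinary public search page, not an official API. Number ranges can simply be passed in as an ordinary term.

```python
from urllib.parse import quote_plus


def phrase(words):
    """Wrap words in straight double quotes for a phrase search (tip 7)."""
    return f'"{words}"'


def build_query(*terms, any_of=(), title_words=(), exclude=(),
                site=None, filetype=None):
    """Assemble a query string from the operators described in tips 7-13.

    All parameter names are illustrative assumptions, not a Google API.
    """
    parts = list(terms)
    if any_of:
        # Tip 13: OR must be written in capitals.
        parts.append(" OR ".join(any_of))
    # Tip 8: restrict single words (or phrases) to the page title.
    parts += [f"intitle:{word}" for word in title_words]
    # Tip 12: a minus sign in front of every term to exclude.
    parts += [f"-{word}" for word in exclude]
    if site:
        # Tip 10: restrict to a website or domain.
        parts.append(f"site:{site}")
    if filetype:
        # Tip 9: restrict to a file type, using the ext: form from the tips.
        parts.append(f"ext:{filetype}")
    return " ".join(parts)


def search_url(query):
    """URL-encode the query for the (assumed) public search address."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    combined = build_query(phrase("publication strategy"), "citations",
                           filetype="pdf")
    print(combined)  # "publication strategy" citations ext:pdf
    print(build_query("mercury",
                      exclude=["ford", "freddy", "outboards", "planets"]))
    print(search_url(build_query(phrase("water management"), site=".edu")))
```

Keeping the query as a plain string has the advantage that you can paste it into the search box by hand to check what the script would have asked for.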

The Impact Factor of Open Access journals

In the world of Open Access publishing the golden road has received a great deal of attention. At least this is what our researchers seem to remember. Of course there are other roads to open access, but I want to present the impact factors of the journals facilitating the golden road to open access. This blogpost lists all open access journals included in DOAJ and assigned a Journal Impact Factor in the JCR 2009. The reason for this is that our researchers see publishing in open access journals as the simplest way of achieving open access to their work, but on the other hand, for the assessment of their citation impact, they are required to publish in journals covered by Web of Science and therefore the Journal Citation Reports (JCR).

In the past there have been studies on the citation impact of the open access journals that have actually received a journal impact factor from Thomson Reuters Scientific (formerly ISI). The first was by McVeigh (2004), followed by Vanouplines and Beullens (2008) (in Dutch, and not openly accessible) and recently by Giglia (2010). These consecutive studies showed an increasing number of open access journals that received a Journal Impact Factor from Thomson Reuters. McVeigh reported 239 OA journals for the JCR 2004, Vanouplines reported 295 OA journals for the JCR 2005 and Giglia reported 385 OA journals for the JCR 2008 (there are some methodological issues that make these figures not entirely comparable).

The shortcoming of these studies is that although they showed interesting figures and additional analyses, none of them actually published the list of open access journals that received an impact factor. The sole purpose of this blogpost is to publish this actual list. The probable reason the previous authors did not do so is that the impact factors are proprietary information from Thomson Reuters. You are not allowed to publish these figures. On the other hand, most publishers use them in all the marketing for their journals. So the journal impact factor is virtually information in the public domain.

To avoid any intellectual property problems with Thomson Reuters I have included the ScimagoJR and Scopus SNIP indicator for the journals rather than the Journal Impact Factor. The correlation for this set of journals between SNIP and IF was 0.94 and between SJR and IF was 0.96. In total 619 journals from DOAJ were present in the JCR 2009 report (Science and Social Science & Humanities version deduplicated). The growth in journal coverage is due to the growth in OA journals and the significant expansion of journal coverage in 2008. On the other hand, looking at the list of Scopus indexed journals, I note that it includes some 1365 open access journals which have a ScimagoJR or SNIP.
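The post does not say which correlation coefficient lies behind these figures. Assuming it is the ordinary Pearson coefficient, the calculation can be sketched in a few lines of Python. The function name pearson and the four value pairs are purely illustrative toy numbers, not the real journal data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long number lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy values only (assumption): one entry per journal.
    impact_factor = [0.8, 1.5, 2.3, 4.1]
    snip = [0.5, 0.9, 1.2, 2.0]
    print(round(pearson(impact_factor, snip), 2))
```

A rank-based coefficient such as Spearman's would be a reasonable alternative for skewed indicators like these; the sketch only covers the Pearson case.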

For the current table I matched the journal list from DOAJ, downloaded on December 13th 2010, with the deduplicated list of the JCR 2009 indexed journals. This set of 619 journals was matched against the journal list from journalmetrics.com to include the ScimagoJR 2009 and SNIP 2009 as well. For each journal the subject categories indicated by DOAJ were included. The journals were sorted alphabetically on subject and by descending IF within a subject. For the following table, journals with multiple subject assignments in DOAJ were included in each of their categories. This expanded the list to 782 lines. Finally the column with impact factors was removed, showing only the ScimagoJR and SNIP for the journals. A few journals were not assigned a ScimagoJR or SNIP, although they were assigned a Journal Impact Factor. In some cases this was due to differences in journal coverage between Scopus and Web of Science, but in a few cases it also appears to be a problem of different ISSN assignments by the respective databases.
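The steps in this paragraph can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the procedure actually used for the table: it assumes the lists are matched on ISSN, and the function names, field names, journal titles, ISSNs and indicator values are all made up for the example.

```python
def normalise_issn(issn):
    """Strip hyphens and whitespace so '0000-0001' and '00000001' match."""
    return issn.replace("-", "").strip().upper()


def build_table(doaj, jcr, metrics):
    """Match DOAJ journals to JCR and Scopus metrics, one row per subject.

    doaj:    list of dicts with 'title', 'issn' and 'subjects' (a list)
    jcr:     dict mapping ISSN -> impact factor
    metrics: dict mapping ISSN -> (sjr, snip)
    """
    jcr = {normalise_issn(k): v for k, v in jcr.items()}
    metrics = {normalise_issn(k): v for k, v in metrics.items()}
    rows = []
    for journal in doaj:
        key = normalise_issn(journal["issn"])
        if key not in jcr:
            continue  # open access journal without an impact factor
        sjr, snip = metrics.get(key, (None, None))
        # A journal with several subjects gets one line per subject.
        for subject in journal["subjects"]:
            rows.append({"subject": subject, "title": journal["title"],
                         "if": jcr[key], "sjr": sjr, "snip": snip})
    # Alphabetical on subject, descending impact factor within a subject.
    rows.sort(key=lambda row: (row["subject"], -row["if"]))
    for row in rows:
        del row["if"]  # the impact factor column is dropped at the end
    return rows


# Made-up example data (assumption): not real journals or real values.
doaj = [
    {"title": "Journal A", "issn": "0000-0001", "subjects": ["Biology", "Chemistry"]},
    {"title": "Journal B", "issn": "0000-0002", "subjects": ["Biology"]},
    {"title": "Journal C", "issn": "0000-0003", "subjects": ["Physics"]},
]
jcr = {"00000001": 1.5, "0000-0002": 3.0}
metrics = {"0000-0001": (0.4, 0.9)}

table = build_table(doaj, jcr, metrics)
for row in table:
    print(row["subject"], row["title"], row["sjr"], row["snip"])
```

Matching on a normalised ISSN only removes formatting differences. Where the databases assign genuinely different ISSNs to the same journal, as noted above, a journal will still end up without a ScimagoJR or SNIP and needs a manual check.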

Download: List of open access journals that are assigned an Impact Factor in the JCR 2009 showing their respective SNIP and ScimagoJR for 2009.

Have fun with this list.

References

Giglia, E. (2010). The Impact Factor of Open Access journals: data and trends. ELPUB 2010 International Conference on Electronic Publishing, Helsinki (Finland), 16-18 June 2010. http://dhanken.shh.fi/dspace/bitstream/10227/599/72/2giglia.pdf and http://hdl.handle.net/10760/14666.

McVeigh, M.E. (2004). Open Access Journals in the ISI Citation Databases: Analysis of Impact Factors and Citation Patterns. A citation study from Thomson Scientific. Thomson Scientific. http://science.thomsonreuters.com/m/pdfs/openaccesscitations2.pdf

Vanouplines, P. & R. Beullens (2008). De impact van open access tijdschriften. IK Intellectueel Kapitaal 7(5): 14-17. (In Dutch, not available in open access)
