A census of Open Access repositories in the Netherlands

Open Access receives a lot of attention in the Netherlands. All universities have explicitly formulated OA policies and signed the Berlin OA declaration. Erasmus University Rotterdam has stipulated a mandated OA policy for its researchers. All Dutch universities have repositories in place, and there is an overarching repository, narcis.nl, which harvests the repositories of all universities and major research institutions. The UNESCO Global Open Access Portal (GOAP) reported last year “Netherlands has a strong OA awareness and an active promotion of open access through institutional mandates, establishment of OA repositories, OA publishing agreements. SURFfoundation, a Dutch programme for information and communication technology innovation focuses on Open Access and it is the Dutch partner in Knowledge Exchange along with DFG (Germany), DEFF (Denmark) and JISC (UK)”. In 2011 some milestones were celebrated: the 250,000th publication was harvested by Narcis, and Wageningen UR deposited its 30,000th publication in Narcis, by which it became the largest depositing institution in Narcis.

Despite some early assessments (van Westrienen & Lynch, 2005), no recent analyses of the actual deposit rates by Dutch universities have been made, let alone a systematic analysis of trends in deposit rates. In this blogpost I want to give a status update of deposits in Open Access repositories in the Netherlands, concentrating on the regular Dutch universities. I hope to follow this up next year to give insight into actual deposit rates.

Data collection
Narcis was used as the overarching repository for all OA publications from the Netherlands. Narcis makes it possible to estimate deposits per institution, document type and publication year in a uniform and efficient way for 27 repositories in the Netherlands. Data were collected from Narcis in the period December 27th 2011 to January 2nd 2012; during that week no additional deposits to Narcis were made. The total number of deposits in Narcis was 270,519 items and did not change while the data were being retrieved.

Results
As mentioned under data collection an impressive number of 270,519 deposits have been harvested by Narcis from the 27 OA repositories in the Netherlands. In the following graph the distribution of total deposits over the 27 repositories in the Netherlands is shown.
Total deposits in Narcis 2011
The smallest repository is that of the Theological University of Kampen with only 4 deposits, and the largest that of Wageningen University with 30,704 deposits. The 13 regular universities in the Netherlands have the largest repositories as measured in Narcis. NWO, with 10,179 deposited items, is the largest repository in the group of non-universities (this group includes the Open University). The NWO repository is just a fraction smaller than the repository of Radboud University Nijmegen. The graph also indicates the recency of the deposits. The share of deposits from recent (since 2006) publication years is indicated in red, whereas the blue part of the bars represents the deposits from the older (pre-2006) publication years. Of the regular universities, Wageningen UR and the VU University have the largest share of recent deposits, whereas TU Eindhoven and Tilburg University have the largest share of older publications.

The next graph looks in more detail at the deposits for the most recent publication years of the 13 Dutch universities. The deposits per publication year for the period 2006-2011 are depicted. In all cases deposits from the publication year 2011 trailed behind, which doesn’t come as a surprise. In a few cases, however, I observe clear negative trends in the number of deposits made during the period 2006-2011. This is clearly the case for the universities of Groningen, Leiden, Maastricht and Utrecht.
OA deposits in narcis by publication year 2006-2011
The trend in deposits per publication year is more or less stable in Nijmegen and Twente. For the universities of Rotterdam, Delft, Eindhoven, University of Amsterdam, Tilburg, VU Amsterdam and Wageningen UR an increasing trend in deposits is observed. The VU Amsterdam shows a clear outlier in number of deposits for publication year 2009. About half of the universities have more than 1000 deposits per publication year. Rotterdam, Nijmegen, Eindhoven, Leiden, Maastricht and Tilburg are lagging behind in this respect. Wageningen UR has more than double the number of deposits per publication year compared to any other university.

Yearly trends SI
By far most of the smaller institutions have fewer than 100 open access deposits per publication year. NWO, NIVEL, KNAW and the Open University have on average between 100 and 300 open access deposits per publication year. It is interesting to note that the deposits for publication year 2011 are more in line with the preceding publication years than they are for the general universities, an indication that it is easier for smaller institutions to manage their publication output.

In the next graph I looked at the document type breakdown of deposits for the period 2006-2011 for the regular universities. In the first place it should be noted that there exists a large range of document types in Narcis. Some of these document types seem superfluous. The difference between Student thesis and Master thesis is entirely unclear, and technical documentation versus reports is another example. Narcis should look into this matter and some universities should clean up their document formats as well. Having said that, most universities have three major types of open access publications: articles, reports and PhD theses.
OA deposits Pub type
The VU University excels at OA article deposits over the last six years, followed by Groningen and Utrecht. Wageningen UR excels at depositing reports, followed at quite some distance by TU Eindhoven and the UvA. For the PhD theses, Utrecht has the lead, followed by the VU and Delft. OA PhD theses are an important source of material since in most cases they consist of chapters which are preprints of articles to be published at a later date. Erasmus University Rotterdam, Maastricht and Tilburg are the universities with the largest share of working papers. Wageningen UR has a very large share of contributions to periodicals. This is a group of publications that have hardly any deposits at other universities. Looking at the overall picture, Wageningen UR clearly stands out as a result of the large share of reports and contributions to periodicals. On top of that it has the largest share of conference papers as well. It can easily be argued that Wageningen UR, of all repositories in the Netherlands, excels at disseminating grey literature by means of its open access repository Wageningen Yield.

At this moment there aren’t comparative repository usage statistics in the Netherlands, but the early trial results indicate that repositories with more recent content also get more article downloads. It is a bit too early to draw firm conclusions from the trial implementation of SURE2.

The share of OA in NL
The absolute numbers of OA deposits themselves are not so meaningful as long as they are not related to the actual scientific output of the institutions. Although we have the current set of figures on OA deposits as measured through Narcis in the Netherlands, the share of OA in total institutional output is a difficult figure to establish. A few institutions deposit metadata records of all their publications in Narcis, other institutions limit themselves to OA deposits only, and a third group deposits the metadata of only a subset of their publications in Narcis. To arrive at figures for the full publication output we have to consult other sources. The VSNU would be an obvious source, but the disadvantage of these figures is that they are based on reporting years rather than publication years (a rather odd approach). A case in point is the PhD thesis output reported by the VSNU compared to the OA theses reported in Narcis over the period 2006-2010, shown in the following table.

University                      VSNU    OA (Narcis)    Coverage
Erasmus University Rotterdam    1524    993            65%
RU Nijmegen                     2266    1992           88%
RU Groningen                    1690    1082           64%
TU Delft                        1319    1079           82%
TU Eindhoven                    900     776            86%
University Leiden               1791    919            51%
University Maastricht           1367    1542           113%
University Twente               1321    1077           82%
University Utrecht              455     333            73%
University van Amsterdam        1276    1297           102%
University van Tilburg          896     790            88%
Vrije University Amsterdam      878     772            88%
Wageningen UR                   1075    1032           96%

At Maastricht University and the UvA there were actually more theses deposited in Narcis over the period 2006-2010 than reported to the VSNU. For individual years the fluctuations can be quite large, but over a period of consecutive years they become smaller. Apparently all theses defended at Maastricht and the UvA are available in OA. Wageningen follows closely with 96%, whereas Radboud University Nijmegen, TU Delft, TU Eindhoven, Twente University, Tilburg University and VU Amsterdam follow with percentages of OA PhD theses in the 80s. Erasmus University, RU Groningen, the University of Leiden and Utrecht University are lagging behind in depositing their PhD theses in OA.
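The coverage column is simply the number of OA theses in Narcis divided by the number of theses reported by the VSNU. A minimal Python sketch, using three rows copied from the table above (the dictionary layout and the function name `coverage_pct` are only illustrative, not part of any Narcis or VSNU tooling):

```python
# Thesis counts for 2006-2010, copied from the table above:
# (theses reported by the VSNU, OA theses found in Narcis)
theses = {
    "University Leiden": (1791, 919),
    "Wageningen UR": (1075, 1032),
    "University Maastricht": (1367, 1542),
}


def coverage_pct(vsnu_count, oa_count):
    """Return OA theses as a rounded percentage of VSNU-reported theses."""
    return round(100 * oa_count / vsnu_count)


for university, (vsnu_count, oa_count) in theses.items():
    print(f"{university}: {coverage_pct(vsnu_count, oa_count)}%")
```

A value above 100%, as for Maastricht, cannot mean more than complete coverage; it probably reflects the difference between the reporting years used by the VSNU and the publication years used in Narcis.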

Coverage of OA article output
For an actual estimate of articles produced per institution multiple sources exist. The VSNU figures based on reporting years are useless in this respect. The databases Scopus or Web of Science (WoS) could be used to estimate the actual article output per university, but disambiguating all the name variations of the universities (and their institutes or hospitals) is a cumbersome task. In this respect Scopus actually performs better than WoS. However, other sources based on either WoS or Scopus have already carried out this disambiguation. The reports by CWTS, for example, are useful in this matter. The most recent WTI2 report (Jager et al. 2011), the successor of the NOWT reports, gives figures for the publication output of Dutch universities for the period 2007-2010 (table 30, p. 48) that have been disambiguated by CWTS. These figures are derived from Web of Science and underestimate the actual peer reviewed article output. For a life sciences university such as Wageningen UR some 70% of the actual article output is published in journals covered by WoS and included in the WTI2 report. For broad, general universities with more social sciences and humanities this percentage is expected to be lower. For Tilburg this figure appears to be only 30%, whereas for Nijmegen it seems to be 51% and for TU Eindhoven 67%.

Table 2 presents the total number of articles for the period 2007-2010 reported in Narcis, the total number of articles according to CWTS (WTI2 report, Jager et al. (2011)) and the actual OA articles reported in Narcis. The percentage OA coverage is calculated in two ways. In the first place we look at %OA(CWTS), comparing the OA articles in Narcis to the articles reported by CWTS. In the second place we look at %OA(Narcis), comparing the OA articles reported in Narcis to the total number of articles reported in Narcis. The third percentage column takes the minimum of both methods. This last column is probably the best estimate of %OA coverage per institution.

Table 2, total articles per university for the period 2007-2010 reported in Narcis and WTI2 and %OA coverage based on comparison with CWTS figures and total articles registered in Narcis

University                      Articles     Articles    OA          %OA       %OA        Minimum
                                in Narcis    by CWTS     articles    (CWTS)    (Narcis)   %OA coverage
Erasmus University Rotterdam    1072         10663       1072        10%       100%       10%
Radboud University Nijmegen     19803        10126       1189        12%       6%         6%
RU Groningen                    4067         10461       4067        39%       100%       39%
TU Delft                        2150         6521        2145        33%       100%       33%
TU Eindhoven                    7041         4732        520         11%       7%         7%
University Leiden               730          10616       730         7%        100%       7%
University Maastricht           519          7086        482         7%        93%        7%
University Twente               3665         3740        880         24%       24%        24%
University Utrecht              4803         15243       3039        20%       63%        20%
University van Amsterdam        16191        13030       2727        21%       17%        17%
University van Tilburg          5791         1782        1285        72%       22%        22%
VU Amsterdam                    5354         10912       4410        40%       82%        40%
Wageningen UR                   10572        7419        2479        33%       23%        23%
Aggregate                       81758        112331      25025       22%       31%        22%
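The three percentage columns of Table 2 can be reproduced with a few lines of Python. This is only a sketch of the arithmetic described above, using rows copied from the table; the function name `oa_coverage` and the tuple layout are chosen here for illustration:

```python
def oa_coverage(narcis_total, cwts_total, oa_articles):
    """Return (%OA vs CWTS, %OA vs Narcis, minimum of both) as whole percents."""
    pct_cwts = round(100 * oa_articles / cwts_total)
    pct_narcis = round(100 * oa_articles / narcis_total)
    return pct_cwts, pct_narcis, min(pct_cwts, pct_narcis)


# (articles in Narcis, articles by CWTS, OA articles), copied from Table 2
rows = {
    "University van Tilburg": (5791, 1782, 1285),
    "VU Amsterdam": (5354, 10912, 4410),
    "Aggregate": (81758, 112331, 25025),
}

for university, counts in rows.items():
    pct_cwts, pct_narcis, minimum = oa_coverage(*counts)
    print(f"{university}: {pct_cwts}% (CWTS), {pct_narcis}% (Narcis), "
          f"minimum {minimum}%")
```

Taking the minimum is a conservative choice: whichever source reports the larger total article output ends up in the denominator, so an incomplete registration in either Narcis or WoS cannot inflate the OA share.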

Comparing the OA articles in Narcis for the period 2007-2010 with the figures from the CWTS report results in a very favourable figure of 72% of the articles available in OA at Tilburg University. This favourable figure is largely due to the underestimation of Tilburg University’s article output when it is based on articles in WoS-covered journals only. VU Amsterdam has the next highest share of OA articles based on the CWTS figures (40%), followed closely by Groningen (39%). The aggregate figure for all universities in the Netherlands is that 22% of the articles are OA, based on WoS estimates of article output. Since WoS underestimates the actual article output, it is useful to look at the total number of articles in Narcis as well.

Looking at the self-deposited articles in Narcis, Erasmus University Rotterdam, RU Groningen, TU Delft and Leiden University deposit only OA articles in Narcis, whereas the other universities also deposit metadata for non-OA articles. However, coverage of this share of publications varies among universities. Radboud University Nijmegen and TU Eindhoven, for instance, which already score low on %OA articles based on the CWTS figures, score even lower when their self-reported article output in Narcis is considered. In those instances where %OA(Narcis) is higher than %OA(CWTS), the metadata deposited in Narcis underestimate the actual article output.

The minimum %OA coverage reported in the third percentage column is the best estimate for OA coverage for universities in the Netherlands based on OA articles reported in Narcis. VU Amsterdam, RU Groningen and TU Delft are the most successful in making their article output available in OA. Their reported coverage lies clearly above the 20% of OA reported for most institutions without mandated OA policies (Harnad, 2009). Twente University, Utrecht University, Tilburg University, Wageningen UR and the UvA are performing around the average of 22%; this percentage is in line with the figure of %OA for universities without mandated OA policies. Erasmus Rotterdam, RU Nijmegen, TU Eindhoven, Leiden University and Maastricht University, on the other hand, are underperforming in this respect. It remains a question whether the OA article numbers reported by Narcis are actually correct, or whether, in the case of Radboud and TU Eindhoven, the total article output reported in Narcis is correct. It is possible that the document types actually include more than only peer reviewed scholarly articles.

Although all Dutch universities have signed the Berlin OA declaration, this has resulted in only a few universities with substantially higher shares of OA peer reviewed articles than is to be expected on the basis of a “normal” publication output, which results in about 20% of articles being published in OA. For the universities where I arrive at even lower %OA articles we have to wonder whether Narcis actually harvests and reports all of the universities’ output.

Another valuable approach is to concentrate on the grey literature, as Wageningen UR does. But for this type of document it is even more difficult to arrive at a share of OA coverage. This can only be established by the institutions themselves, and even then it can be doubted whether all institutions have their output registration complete.

Lessons to be learned

  • Narcis could and should improve the type of reporting performed in this report. They should produce overviews like this, preferably twice a year.
  • Narcis should look into some of the obsolete document types to reduce the wild array of documents (is technical documentation different from reports? Student theses and master theses are probably not the type of research output to be registered in Narcis)
  • Institutions should look at the document types deposited in Narcis as well.
  • The role of Narcis and the importance of OA could be improved if VSNU and Narcis (KNAW) make Narcis the standard reporting tool for research output registration in the Netherlands (The VSNU should abandon the ridiculous reporting years and use the publication years in their reports instead)
  • Universities should use Metis (or a comparable CRIS) to upload all the metadata of the institutional output to Narcis.
  • Having comprehensive output registration makes the minimum goal of at least 20% in OA more attainable, since you no longer depend on actual article submission by the authors; based on Sherpa/Romeo and DOAJ, OA versions can be chased down.
  • Mandates such as those in Rotterdam, announced at the beginning of 2011, have no effect whatsoever if there is no actual stick behind the policy

References
Harnad, S. (2009) Waking OA’s Slumbering Giant: Why Locus-of-Deposit Matters for Open Access and Open Access Mandates. http://openaccess.eprints.org/index.php?/archives/522-Waking-OAs-Slumbering-Giant-Why-Locus-of-Deposit-Matters-for-Open-Access-and-Open-Access-Mandates.html
Jager, C.-J., J. Veldkamp, D. Aksnes, R. te Velde & P. den Hertog (2011). Wetenschaps-, Technologie & Innovatie Indicatoren 2011. Utrecht, Dialogic innovatie ● interactie http://www.rijksoverheid.nl/documenten-en-publicaties/rapporten/2011/11/15/wetenschaps-technologie-innovatie-indicatoren-2011.html.
Westrienen, G. van & C. A. Lynch (2005). Academic Institutional Repositories: Deployment Status in 13 Nations as of Mid 2005 D-Lib magazine, 11(9) http://www.dlib.org/dlib/september05/westrienen/09westrienen.html

How Google Scholar Citations passes the competition left and right

Last Thursday Google Scholar Citations went public. It was to be expected. Since August the product has been tested by a few (blogging) scientists. We only had to wait patiently for it to be released to all scientists. Last Thursday the moment was there.

Was it worth the wait? Yes, it certainly was. Google Scholar Citations really excels at finding publications you completely forgot about. But even then, there are still obscure publications that even Google Scholar doesn’t know about. You simply log in and deselect those few publications that don’t belong to you. You can make searches to find publications that Google has overlooked. You get a comprehensive publication list quite quickly. Well, when your name is not too common, that is. How it works for very common names (Korean scientists jump to my mind, as well as John Smith) I don’t know yet. But so far nothing new: Anne-Wil Harzing’s excellent Publish or Perish software already did this. What is new is the fact that Google Scholar Citations keeps the citations and publications automatically up to date and allows you to publish your own publication list on the Web with the citations and some crude citation metrics.

The two major competitors in this arena are Thomson Reuters with their ResearcherID and Elsevier’s Scopus with its Scopus ID. With both services you can identify your own publications and assign them to a unique number. In this way you can create your unique publication list with citation metrics as well. The main disadvantage compared to Google Scholar is their rather limited resource set. Thomson Reuters WoS “only” covers some 10,000 scholarly journals, a set of selected proceedings and, as of recently, only 30,000 books. Scopus has nearly double the number of journals but lags behind in proceedings and covers hardly any books. Google Scholar certainly covers more, but we still don’t understand what is included and what is not, and we sometimes have our doubts about how current Google Scholar is. The larger resource base of Google Scholar, including books and book chapters, will make this service more attractive for social scientists and scholars in the arts and humanities.

On top of the smaller publication base on which these services are based, these two competitors each have their own particular disadvantage as well. You have to maintain your publication list in Thomson Reuters ResearcherID manually. Each time you publish a new article, you have to add it to your profile yourself. Looking around, I see that most researchers are a bit sloppy in this respect. You can, however, make your publication list and the citation impact publicly available; see for example my meagre list. Scopus, on the other hand, maintains your publication list automatically (albeit it made some serious mistakes in this area in the past, but they seem to have improved this service). But, and this is a big but, you can’t publish your properly curated publication list with citations publicly on the Web. They used to have 2Collab for this, but since they stopped 2Collab they haven’t come up with an alternative mechanism to publish your publication list with citation impact on a public website. A real pity.

So Google Scholar easily beats ResearcherID, since it updates automatically, and Scopus ID, because you can make your list with citations publicly available. Making your publication list openly available is really recommended to all scientists; it helps your personal branding.

Certainly there are disadvantages to Google Scholar as well. The most serious at this moment are all kinds of ghost citations. If you look at the citations to our bibliometrics analysis on top of repositories paper, Google counts three citations. But checking the Leydesdorff citations, a reference to our article is not to be found (of course it should have been there, but it isn’t). 0xDE reported a spam account in the name of Peter Taylor, where they collected various Taylors in a single profile boasting an h-index of 94. That Google Scholar can be fooled has been reported by Beel & Gipp (2010).

When I was interviewed for our university paper on Google Scholar Citations (in Dutch) I told them: Google Scholar is only about five years old. Give them another five years and they will have totally changed the market for abstracting and indexing databases. If only 20 percent of all scientists correct their publication lists (including editing the references to fix the mistakes Google has made), even without making them publicly available, Google sits on a treasure trove of high quality metadata. Really interesting to see how this story will develop.

Reference:
Joeran Beel and Bela Gipp. Academic search engine spam and google scholar’s resilience against it. Journal of Electronic Publishing, 13(3), December 2010.

Journals changing publisher, but can the rights change as well?

From my perspective as a repository manager I do like the Cambridge University Press journals a lot. Albeit no immediate OA, after a year the author is allowed to post the publisher’s version/PDF of his article in an institutional repository. We adhere to this policy on behalf of our authors. So we post all articles published in Cambridge journals to our repository after this 12 month embargo period. Sounds simple and it actually is that simple.

Recently I ran a check on this policy using the DOIs, rather than the ISSNs we normally use, on the metadata we collect of all our researchers’ publications. The DOI strings of Cambridge journal articles all start with the prefix 10.1017. I came across an article published in the journal Animal Conservation in 2004 which was not OA in our repository. Checking this article further, I found out that the DOI of the article resolved to the Wiley Online Library, where the article came online only in 2006, instead of the Cambridge website. Rather odd. When I checked the copyright and archiving policy of this journal at the Sherpa Romeo site, it referred to the rather limited Wiley copyrights and self archiving possibilities for this journal. Sherpa Romeo implies that this is applicable to all content of this journal. I was rather disappointed.
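A check like this boils down to filtering the publication metadata on the DOI prefix. The sketch below is only an illustration: the record layout, the `open_access` flag and the DOIs themselves are invented placeholders, not our actual repository metadata.

```python
CAMBRIDGE_PREFIX = "10.1017"  # DOI prefix of Cambridge journal articles


def has_prefix(doi, prefix=CAMBRIDGE_PREFIX):
    """True if the part of the DOI before the first '/' equals the prefix."""
    return doi.strip().lower().split("/", 1)[0] == prefix


# Hypothetical records: DOIs and fields are made up for this example.
records = [
    {"doi": "10.1017/S0000000000000001", "open_access": False},
    {"doi": "10.1111/j.0000-0000.2004.00000.x", "open_access": False},
    {"doi": "10.1017/S0000000000000002", "open_access": True},
]

# Cambridge articles that are not yet open access in the repository
to_check = [r["doi"] for r in records
            if has_prefix(r["doi"]) and not r["open_access"]]
print(to_check)
```

Comparing the whole prefix up to the slash, rather than testing whether the string merely starts with 10.1017, avoids matching a longer prefix such as 10.10170. A real check would also have to take the 12 month embargo into account.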

However, that Cambridge DOI bothered me, so I checked the Cambridge site for the journal and could find the article there as well. The DOI, however, resolves to the Wiley online journals site. Clearly the journal changed publisher; that happens all the time. But on the change of publisher it appears that the authors’ copyrights changed as well, especially since the Wiley site also hosts the complete backfile going back to Volume 1, issue 1. The authors’ self archiving rights can’t have changed with the change of publisher, because they’re based on the original publishing agreement, but the Wiley site and Sherpa Romeo do imply that the Wiley copyright and self archiving policies apply to all content of the journal. That can’t be true, can it? But here I have an article hosted at two publishers’ websites with two very different self archiving policies.

Of course we adhere to the Cambridge self archiving policy for this article. There is therefore now a third copy of this article available on the Web, proudly presented in Wageningen Yield.

These are strange ways of publishers and copyrights.

Some observations during the bibliometrics session at the Österreichische Bibliothekartag

Although the program consistently talks about the Österreichische Bibliothekartag (singular), the whole library day actually spans four days. One would have expected at least the Österreichische Bibliothekartage (plural), but they insist on mentioning only one day. Of those four days, I was only present during part of the morning of the third day, so this is a very limited report on the Österreichische Bibliothekartag. Looking at their program, it is very comprehensive and interesting. I never thought that you could fill a complete session, five presentations, talking about cooking books (no pun intended). It only reflects that bibliometrics was a small part of the program amongst many other subjects covered. I noticed a lot of presentations on e-book platforms, many digitization projects, plenty of mobile, less of library 2.0 than you would expect (is the hype over?), and open access also had a very limited role. What struck me as interesting for conference organizers is that many commercial presentations were programmed evenly throughout the sessions. Just a sign of taking the sponsors seriously.

So far on the conference as a whole, of which I actually experienced too little. On to the bibliometrics sessions. The session was chaired by Juan Gorraiz, a bubbly Spaniard working already for years in Austria. Give him the opportunity and he will take the floor and would love to take all the time available and fill the slots for all presentations planned.

The first presentation was on a piece of research that should result in a master’s thesis at some point, but some preliminary results were presented in this session by Christian Gumpenberger. The focus of the research was on the acceptance of and familiarity with bibliometrics among Austrian researchers. The results were not really shocking: most researchers stated that they were familiar with impact factors, but for the moment there was no clue as to whether they were aware of a thing like the two year citation window, or of the difference between citable and non-citable items that leads to the inflation of impact factors for journals like Nature and Science. Christian sketched some sunny skies for bibliometrics in Austria, but in the subsequent discussion this sunny view was criticized quite a bit. Nevertheless, I would like to have a look at this master’s thesis when it becomes available.

The second presentation was of Italian origin, by Nicola de Bellis. Nicola has written an interesting book on citation analysis in which he stresses the sociological, philosophical and historical aspects of bibliometric analyses. It is always interesting to hear a presentation like this, away from the fact finding, number crunching approach which I normally have, and to dream away a bit on outlines of what in an ideal world should be done on a subject like this. Quite a lot, but some of it is beyond being practical. When you carry out bibliometric analyses in the library at some scale, like dealing with 18,000 papers that have collected 265,000 citations as we do in our library, you can only be practical. So there is an interesting conflict between his presentation (which will be on-line soon, I hope) and mine, which followed Nicola’s presentation.

I don’t want to cover all aspects of Nicola’s presentation. Go and read the book, which I am going to do as well. But at one point during his presentation I strongly disagreed with him: where he stated that only the mediocre scientists have an interest in bibliometrics and the top scientists normally don’t have an interest in this topic. My experience is quite the contrary. In the first place it was one of Wageningen’s top scientists who urged the library to take a subscription to Web of Science back in 2001, and made it possible with a special contribution from his top institute. He knew he was a highly cited scientist, but somehow he needed Web of Science to confirm his reputation. Later on as well, apart from the discussion with scholars in the social sciences department, it has always been those top performing groups that invited me to give a presentation on this subject rather than the groups that were lagging behind in the bibliometric performance indicators. To me it has always appeared that those who are leading the pack are also interested in staying ahead of the rest and invite the library to explain the results obtained and enhance their performance in the future.

The second point in Nicola’s presentation where he went far beyond the practical was his insistence that, for a publication, all citations to this publication should be retrieved from the three general databases (Web of Science, Scopus and Google Scholar) in the first place, supplemented with citations from at least one citation enriched, subject specific database. Well, that’s a lot of work for a single publication in the first place, leading to deduplication errors if you’re not very careful. Secondly, it should be well known that Google Scholar, albeit attractive because of tools like Harzing’s Publish-or-Perish, is not a reliable database for citation counts at this moment (Jacso 2008). Google Scholar still has serious problems with ordinary counting and deduplication and should therefore not be used for serious citation analyses. The third argument against the use of multiple databases goes a bit further into the theory of bibliometrics and relies on approaches described by Waltman et al. (2011) and Leydesdorff et al. (2011). The key point is that a number of citations in itself has no meaning. It should be related to the citations of related documents in the same field of science. You can do that by normalizing on the mean citation rate in the field (Waltman et al. 2011) or by the perhaps more sophisticated approach sketched by Leydesdorff et al. (2011), based on the citation distributions in the field to which the paper belongs. The latter approach is very novel and has not really been widely tested yet. Both these approaches rely on the availability of all the citations to the publications in a certain field of science, of a certain age and document type. You can expect to have the means or citation distributions available when you work with a single database (for WoS there is plenty of experience, for Scopus it is coming with SciVal Strata, but for Google Scholar it doesn’t exist yet), but this is beyond reality when you derive citation data from three or four databases at the same time.
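To make the normalization argument concrete, here is a minimal Python sketch of the simplest variant: dividing the citations of one paper by the mean citation rate of its field. The citation counts are invented for illustration, and the function is a simplification of the general idea, not the exact indicator defined by Waltman et al. (2011).

```python
from statistics import mean

# Hypothetical reference set: citation counts of all publications of the same
# field, publication year and document type, taken from ONE database.
field_citations = [0, 1, 2, 3, 4, 8, 10, 12]  # mean citation rate = 5


def normalized_score(citations, reference_counts):
    """Citations of one paper divided by the mean citation rate of its field."""
    return citations / mean(reference_counts)


paper_citations = 10
print(normalized_score(paper_citations, field_citations))  # prints 2.0
```

A score above 1 means the paper is cited more often than the average paper in its field. The reference set has to come from the same database as the paper’s own citation count, which is exactly why merging citations from three or four databases breaks the comparison.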

But apart from the critical points I just made, I liked the presentation by De Bellis very much. For those interested in similar views on citation practice, I really recommend reading MacRoberts & MacRoberts (1996) as well.

The session closed with my presentation, which is enclosed here

Bibliometric analysis tools on top of the university’s bibliographic database, new roles and opportunities for library outreach


The session then ended with some discussion, but soon all 30 or so participants hurried off to the coffee.

References

De Bellis, N. (2009). Bibliometrics and citation analysis : From the Science Citation Index to cybermetrics. ISBN 9780810867130, The Scarecrow Press, 450p.
Jacsó, P. (2008). The pros and cons of computing the h-index using Google Scholar. Online Information Review, 32 (3): 437-451 http://dx.doi.org/10.1108/14684520810889718 http://www.jacso.info/PDFs/jacso-pros-and-cons-of-computing-the-h-index.pdf
Leydesdorff, L., L. Bornmann, R. Mutz & T. Opthof (2011). Turning the tables on citation analysis one more time: Principles for comparing sets of documents. Journal of the American Society for Information Science and Technology n/a-n/a http://dx.doi.org/10.1002/asi.21534 http://arxiv.org/abs/1101.3863
MacRoberts, M. H. & B. R. MacRoberts (1996). Problems of citation analysis. Scientometrics, 36(3): 435-444 http://dx.doi.org/10.1007/BF02129604
Waltman, L., N. J. van Eck, T. N. van Leeuwen, M. S. Visser & A. F. J. van Raan (2011). Towards a new crown indicator: Some theoretical considerations. Journal of Informetrics, 5(1): 37-47. http://dx.doi.org/10.1016/j.joi.2010.08.001 http://arxiv.org/abs/1003.2167

The unofficial guide for authors

Recently I co-authored a book on scientific publishing. It is available from Lulu for less than € 6,-. If that’s too much for you, you can download it for free. The book is published under a CC-BY-NC licence.

From the cover:

Most scientific journals provide guidelines for authors - how to format references and prepare artwork, how many copies of the paper to submit and to which address. However, most official guidelines say little about how you should design and produce your paper and the chances that it will be accepted. This book provides a comprehensive but focused guide to producing scientific information - from research design to publication. It provides practical tips and answers to some of the most frequently asked questions: Why do we publish in the first place? What is OA publishing and why bother about it? What is the h-index? What is a Journal Impact Factor and does it matter? How can I increase my research production efficiency? Why should I use OS software tools for academic work? How can I produce graphics that will impress? How can I brainstorm good titles? How can I select a suitable journal and where can I find out more about it? How can I get into the reviewers’ heads?

Scimago rankings 2011 released

Today Félix de Moya Anegón announced on Twitter that the Scimago Institutional Rankings (SIR) for 2011 were released. These rankings are not very well known or widely used. Yesterday, during a ranking masterclass from the Dutch Association for Institutional Research, the SIR was not even mentioned. Undeservedly so. Scimago lists just over 3000 institutions worldwide. It is therefore one of the most comprehensive institutional rankings, if not the most comprehensive. It is also a very clear ranking: it only measures publication output and impact. It thus ranks only the research performance of the institutions and is therefore very similar to the Leiden ranking.

What I like about Scimago is the innovative indicators they come up with each year. Last year they introduced the %Q1 parameter, which is the ratio of publications that an institution publishes in the most influential scholarly journals of the world. Journals considered for this indicator are those ranked in the first quartile (25%) in their categories as ordered by the SCImago Journal Rank (SJR) indicator. This year they introduced the Excellence Rate. The Excellence Rate indicates which percentage of an institution’s scientific output is included in the set formed by the 10% most cited papers in their respective scientific fields. It is a measure of the high-quality output of research institutions. The two indicators are very similar; the Excellence Rate is just a tougher version of the %Q1.

The other new indicator is the Specialization Index. The Specialization Index indicates the extent of thematic concentration or dispersion of an institution’s scientific output. Values range from 0 to 1, indicating generalist and specialized institutions respectively.

Their most important indicator of research performance is the Normalized Impact (NI), which is similar to the MNCS of the CWTS and the RI as we calculate it in Wageningen. The values show the relationship between an institution’s average scientific impact and the world average, which is set to 1, and can be read as a percentage difference: a score of 0.8 means the institution is cited 20% below average and 1.3 means the institution is cited 30% above average.

Last year the Scimago team already showed that there is an exponential relationship between the ability of an institution to get its scientific papers into better journals (%Q1) and the average impact achieved by its production in terms of Normalized Impact. It is a relationship I always show in classes on publication strategy (slides 15 and 16). When looking at the Dutch universities, I noted that the correlation between the new Excellence Rate and Normalized Impact is even better than with the %Q1. So the pressure to publish in the absolute top journals per research field will increase even further if this becomes general knowledge.

What do we learn about the Dutch universities from the Scimago rankings? Rotterdam still maintains its top position for Normalized Impact; it also scores best for the %Q1 and the Excellence Rate. Directly after Rotterdam you find Leiden, UvA, VU, Utrecht and Radboud with equal impact. Utrecht published the most articles during the period 2005-2009. Wageningen excels at international cooperation. And Tilburg and Wageningen are the most specialized universities in the Netherlands.

Making these international rankings is quite a daunting task. For the Netherlands I noticed that the output of Nijmegen was distributed over Radboud University and Radboud University Nijmegen Medical Centre, whereas this was not done for the other university hospitals. And for Wageningen the output was listed under both Wageningen University and Research Centre and Plant Research International (which is part of Wageningen UR). But for researchers from Spain these are difficult nuances to resolve 100% perfectly.

My only real complaint about the ranking is that they state it is not a league table, and yet they rank the institutions on publication output. It is so much more obvious to present the list ranked on NI. Since they only produce the ranking as a PDF file, it took me a couple of hours to translate it into an Excel spreadsheet so that I can rank the data any way I wish. With all the information at hand it is also possible to design your own indicators, such as a power rank in analogy with the Leiden rankings.
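
Once the data is out of the PDF, re-ranking takes only a few lines. The sketch below uses made-up institutions and numbers, not real SIR data. The "power rank" here is my own assumption of what such an indicator could look like, namely output multiplied by Normalized Impact; other definitions are possible.

```python
# Hypothetical rows: (institution, output in papers, Normalized Impact).
# These are illustrative numbers, not values from the SIR 2011 report.
rows = [
    ("University A", 20000, 1.4),
    ("University B", 12000, 1.8),
    ("University C", 30000, 0.9),
]


def rank_by(rows, key):
    """Return institution names sorted from highest to lowest on `key`."""
    return [name for name, *_ in sorted(rows, key=key, reverse=True)]


# The ordering used in the published list: publication output.
by_output = rank_by(rows, key=lambda row: row[1])

# The ordering I would find more obvious: Normalized Impact.
by_ni = rank_by(rows, key=lambda row: row[2])

# An assumed "power rank": size times impact (output * NI).
by_power = rank_by(rows, key=lambda row: row[1] * row[2])

print(by_output)
print(by_ni)
print(by_power)
```

With these made-up numbers the three orderings all differ, which is exactly why the choice of the ranking column matters.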

The message to my researchers: aim for the best journals in your field. We still have scope for improvement. We are still not in the neighbourhood of the 30 to 40% Excellence Rate we see for Rockefeller, Harvard and the like.

How Google could help the Open Access world a little

It was back in 2008 that Google Scholar launched the feature that identifies freely available versions of articles on the Web. In the early days these were indicated by green triangles in front of the reference. Nowadays freely available copies are listed in the right-hand column. Many of these versions are Open Access versions of articles properly submitted to preprint servers and subject or institutional repositories. Other free versions of the papers identified by Google Scholar are publishers’ versions of articles posted to personal websites, dropboxes and you name it. Whatever the rights are, if you need a copy of these papers and don’t have access through your university library’s subscriptions, this Google Scholar feature is a very useful tool. In scholarly search classes I always stress this feature of Google Scholar to my students.

In our institution’s bibliography I would love to include functionality that refers, for each article, to the so-called document clusters in Google Scholar. Consider the following publication: the link to the full text included in the record leads you to ScienceDirect (SD). Whether you can access the paper on SD depends on your subscriptions. Sometimes you can’t. Therefore it would be nice if we could include a link to the document cluster in Google Scholar. For this paper you get some 29 versions, and above all, 6 of these are free versions posted on various websites. That’s really helpful.

I learned from Johannes Keizer yesterday that AgrisWeb links to Google through a search for the title words. This works quite well, but it could be done better.

Imagine that Google Scholar had an API, and that we could query that API on the basis of the DOI, the PMID, or the ISSN in combination with volume, issue and pages, or any other combination of standard bibliographic metadata. Yes, something like an OpenURL. If Google Scholar then returned only the correct Google Scholar ID for that article (that number 12564475196117890153 in the link), we could construct various links. Linking to the Google Scholar document cluster is one. Retrieving the Google Scholar citations is another.
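
A small sketch of what the library side could look like. The lookup function is purely hypothetical: it stands in for the API that does not exist, and the DOI in it is a placeholder, not the DOI of the article discussed above. The two link patterns (cluster= and cites=) are the ones Google Scholar uses in its own result pages, as far as I can tell from the links it generates.

```python
from urllib.parse import urlencode

SCHOLAR_BASE = "https://scholar.google.com/scholar"


def lookup_scholar_id(doi):
    """HYPOTHETICAL stand-in for the wished-for Google Scholar API.

    No such API exists; this fakes a DOI-to-ID lookup with a tiny table.
    The DOI below is a placeholder used for illustration only.
    """
    fake_index = {"10.1000/example": "12564475196117890153"}
    return fake_index.get(doi)


def cluster_link(scholar_id):
    """Link to all versions (the document cluster) of one article."""
    return f"{SCHOLAR_BASE}?{urlencode({'cluster': scholar_id})}"


def citations_link(scholar_id):
    """Link to the documents citing this article."""
    return f"{SCHOLAR_BASE}?{urlencode({'cites': scholar_id})}"


scholar_id = lookup_scholar_id("10.1000/example")
if scholar_id is not None:
    print(cluster_link(scholar_id))
    print(citations_link(scholar_id))
```

The only thing missing from this picture is the first step: a reliable way to get from our metadata to that one number.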

“Google doesn’t like metadata too much” is an often-heard argument. But the Google Books API works swell with ISBN numbers, OCLC numbers or LOC numbers. That API is talking metadata. Libraries are massive stores of metadata. So Anurag Acharya, please. Pleas for a Google Scholar API abound. They are mostly for retrieval of citations, but for the OA movement those document clusters are really more important! Perhaps you could launch this Google Scholar API as a present for the Open Access Week coming up in October?

National Library of the Netherlands discloses its Google Books Contract

After the successful disclosure of the agreement between the British Library and Google Books on the basis of the Freedom of Information Act, the National Library of the Netherlands (KB) also disclosed its agreement with Google Ireland today. Although the director of the KB tweeted a day ago that not all public information needs to be available on the Web, it was decided to publish the agreement on the Web since there were two WOB (the Dutch version of FOIA) procedures underway to get insight into the agreement.

Although I am not a lawyer, a few things caught my eye. The agreement is very similar to the agreement between Google and the British Library. Bert Zeeman pondered the idea of standard Google contracts in this respect. That seems to hold, with the exception of the number of public domain volumes that will be digitized: 250,000 in the UK and 160,000 in the Netherlands (clause 2.1).

What struck me as interesting was the use of the library’s digital copies. Clause 4.8 states that “the library may provide all or any portion of the library digital copy… to (a) academic institutions or research or public libraries, ….” But that stops short of “providing search or hosting services substantially similar to those provided by Google, including but not limited to those services substantially similar to Google book search”. I guess that rules out the other academic libraries in the Netherlands including these digital copies in their discovery tools. It is tempting, but I see problems on the horizon. We seem to be left with separate information silos, whereas integration with the rest of the collection would be really interesting. It becomes more explicit in clause 4.9, where it is stated that “nothing in this agreement restricts the library from allowing Europeana to crawl the standard metadata of the digital copies provided to library by Google.” We would be more interested in the data than in the metadata.

But then again, it is up to the lawyers to see what’s allowed and what’s not. But then again, again, after fifteen years all restrictions on the use or distribution terminate (clause 4.7), which is a bit long according to the Open Rights Group. However, we have experience with building academic library collections: it takes ages. Those fifteen years are over in the wink of a young girl’s eye.

Google better with Google

Or 14 super search tips for scientists and students. The following scholarly super search tips are an explanation for the enclosed slideshare presentation.

Google better with google

This slideshare presentation was posted a while back on WoW!ter’s slideshare, but has been updated to stay in sync with this blogpost.

The tips
1. Which Google do you want to use? We have a large international audience of users at our University, who normally are redirected to http://www.google.nl. However if you use http://www.google.com/ncr then you get the international version. But if you prefer your Indian version http://www.google.co.in/ncr works as well. With the /ncr you can control the regional version you are using easily.

2. Personalize your search experience. The settings are nowadays found under the small cogwheel at the top right-hand side of the page, or follow this link. The section I always pay attention to is the filter option. Why should Google judge whether something is fit for my eyes or not? I also advise setting the number of search results to 50 (but you can’t make use of Google Instant search in that case). I used to use 100 results, but even I found that a wee bit too much. Lastly, I always check the box to open the results in a new window (it actually opens a new tab rather than a window); this keeps my search results window intact whilst I browse some of the results I retrieved.

Further personalization could include installing the Google toolbar in your browser or, one step further, making use of iGoogle.

3. There is more than 1 Google. Many people only use the standard Google web search engine. But for academics, Google Scholar, Google Book Search and Google Patents are specific interfaces that should certainly be part of the searcher’s tricks of the trade.

4. Google universal. Google has realized that the many different search interfaces cause a problem for users as well, and has therefore introduced the universal search engine results page with a lot of specific options on the left-hand side of the results. However, a suggestion to use Google Scholar is not included.

5. Learn from the advanced search interface. All Google search interfaces have an advanced search option. Use these options to see what the possibilities of the specific search interface are, and learn how you can make use of these advanced search operators in the normal search interface. When you make use of the advanced search options in Google Scholar you see an option to search for a specific author, which translates in the Scholar search box as [nitrogen fixation author:“K E Giller”].

6. Be specific or search with more than 1 term. In the Dutch language we can often get away with searching for a single word, because we are allowed to make incredibly long compound words such as “wapenstilstandsonderhandelingen”. When you’re searching for scientific information you had better stick to English as the language. In English you can’t make such compound words. This small language difference necessitates searching with more terms. But apart from the language difference, when you search with more terms, searches become more specific and the results more relevant. In the current example a search for water only yields more than 700 million results, whereas [Water management technology assessment] yields nearly 8 million results.
Interestingly, when you look at the results in the slides, you’ll notice that the total result numbers in Google are unreliable, to say the least. In the step from 2 to 3 search terms the result set increases again.
The fifth example in the slide is an introduction to the next slide. You can be even more precise when searching.

7. Keep words together. Make use of “phrase searches”. A phrase search is a search which returns the words in exactly the specified order. Of course Google already ranks the results that contain the search terms as a phrase at the very top of the search engine results page. This technique also reduces the sheer number of possible results. Compare for instance [“water management”] with [water management]. You can combine as many phrases as you like (see the previous slide), or make them really long (the latter is also used in plagiarism checks).

8. Search for title words. When you feel overwhelmed by the number of results, a good solution is to limit your search to title words rather than words anywhere on a page. You can search for single title words with the intitle: operator, or for all of your search words with the allintitle: operator. The two operators behave similarly, as you can see when you compare [intitle:“water management”] with [allintitle:water management].

9. Search for information in PDF files. Most scientific information is published on the web in the format of PDF files, be it as a scientific report or a scholarly article, e.g. [Agaricus bisporus ext:pdf]. A couple of years ago this was an extremely efficient way to look for scholarly information on the Web. However, since it has become very easy to produce your own PDF files, this technique has lost some of its effectiveness, but it still works wonders, especially in combination with the other tips.

10. Search for results from a specific domain. In some cases it is useful to restrict your results to a certain website or domain. This is certainly true for sites that don’t have good site search options, e.g. [EndNote site:library.wur.nl]. You can also limit the results to the academic institutions of the USA [“water management” site:.edu].

11. Search for number ranges. Apart from the fact that Google is a powerful calculator, you can also search for number ranges. This comes in handy when you want to limit your search to results from certain publication years, e.g. [“publication strategy” 2009…2011]. Note that three dots is different from (and better than) the commonly used two dots.

12. Exclude specific terms with the – operator. You can narrow your searches using this operator. You can exclude as many words as you want by using the - sign in front of all of them, for example [mercury -ford -freddy -outboards -planets].

13. Search with OR. On some occasions the intelligence of Google doesn’t include obvious synonyms. With the OR operator you can combine search terms, e.g. [“carbon dioxide” OR CO2]. Notice that OR should be typed in capitals.

14. Combine. Having seen some of the options of the Google search engine, you should realize that you can combine most of these operators. In this way you can make very precise searches, e.g. [“publication strategy” citations 2009…2011 ext:pdf].
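
If you build such searches often, for instance to put ready-made search links in a library web page, the combination of operators from the tips above can be sketched in a few lines of Python. This is only an illustration: the function and parameter names are my own, and the number range uses the two-dot form by default with the separator as a parameter, so you can try the three-dot form from tip 11 as well.

```python
from urllib.parse import quote_plus


def build_query(phrases=(), terms=(), any_of=(), exclude=(), title_words=(),
                year_range=None, site=None, filetype=None, range_sep=".."):
    """Combine the operators from the tips above into one query string."""
    parts = [f'"{phrase}"' for phrase in phrases]       # tip 7: phrases
    parts += list(terms)                                 # tip 6: more terms
    if any_of:                                           # tip 13: OR
        parts.append(" OR ".join(
            f'"{term}"' if " " in term else term for term in any_of))
    parts += [f"-{word}" for word in exclude]            # tip 12: exclude
    parts += [f"intitle:{word}" for word in title_words]  # tip 8: title words
    if year_range:                                       # tip 11: number range
        parts.append(f"{year_range[0]}{range_sep}{year_range[1]}")
    if site:                                             # tip 10: domain
        parts.append(f"site:{site}")
    if filetype:                                         # tip 9: file type
        parts.append(f"ext:{filetype}")
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """Turn a query string into a link that can be pasted in a browser."""
    return f"{base}?q={quote_plus(query)}"


combined = build_query(phrases=["publication strategy"], terms=["citations"],
                       year_range=(2009, 2011), filetype="pdf")
print(combined)
print(search_url(build_query(any_of=["carbon dioxide", "CO2"])))
```

The order of the operators in the query string does not matter much to Google; the sketch simply keeps the order used in the example of tip 14.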