For a Dutch Open Access advocate, one event stood out this week: the speech by @SanderDekker, our junior minister for science policy, at the Academic Publishing in Europe 2014 conference. His speech, ‘Going for Gold’, was a passionate plea for Open Access, to be achieved through the Golden Road.

Open access is a moral obligation, essential for society and inescapable.

He did not dismiss the Green Route entirely, but for Dekker the Green Road to Open Access was like coming fourth at a major championship. In the end “if you are going for gold, fourth place is the most frustrating place you can achieve”.
As product manager responsible for our repository Staff Publications, I see one clear and present danger in the view of our junior minister. If he accepts the Golden Route as the only route, it might lead to neglect of the Green Route and subsequently to the deterioration of the repository infrastructure in the Netherlands.

The repository infrastructure in The Netherlands and how it can be improved

The Netherlands has a unique repository infrastructure. All universities have their own Open Access repository, in most instances managed by the university library. In addition, many research institutes maintain Open Access repositories as well. All the contents of these repositories are harvested and presented in Narcis, the overarching repository of the Netherlands. In total 37 institutes participate in Narcis, but the 13 universities are the main contributors. The universities follow two different policies in disseminating their publications to Narcis. One group disseminates complete metadata on all their output to Narcis; another group only disseminates their Open Access publications through their repository to Narcis. Some universities sit somewhere between these extremes. Since all universities are in the process of acquiring new Current Research Information Systems, there is an opportunity to seize this moment and make arrangements for the exchange of comprehensive metadata on all official university publication output. Make the Academic Bibliography public, and aggregate that output in Narcis.
All universities have to report their publication output to the Association of Dutch Universities (VSNU) according to a strictly defined protocol. At this moment only the final figures are reported to the VSNU, by each university independently. With a small change in policy regulations Narcis could be made the overall repository used for reporting these figures. By making these reports publicly available the system becomes transparent, and the availability, traceability and verifiability mentioned in the VSNU protocol are all safeguarded, in a much better way than in the current situation. The new demands from the Junior Minister of Education to the universities to report on Open Access production should be implemented on Narcis as well. The advantage of a comprehensive publication output registration system is that Open Access achievement can be measured as a share of total publication output. If we use Narcis for this reporting as well, we are no longer dependent on third-party providers for bibliographic data and we don’t end up with incomparable numbers.
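To make the point about measuring Open Access as a share of total output concrete, here is a minimal sketch. It assumes that comprehensive records are available with a university name and an Open Access flag; the field names `university` and `open_access` and the sample data are my own illustration, not the actual Narcis or VSNU format.

```python
from collections import Counter


def open_access_share(records):
    """Return, per university, the fraction of registered publications that are Open Access."""
    totals = Counter()
    open_access = Counter()
    for record in records:
        totals[record["university"]] += 1
        if record["open_access"]:
            open_access[record["university"]] += 1
    return {uni: open_access[uni] / totals[uni] for uni in totals}


# Hypothetical sample data, for illustration only.
sample_records = [
    {"university": "University A", "open_access": True},
    {"university": "University A", "open_access": False},
    {"university": "University A", "open_access": False},
    {"university": "University A", "open_access": True},
    {"university": "University B", "open_access": True},
]

shares = open_access_share(sample_records)
for uni, share in sorted(shares.items()):
    print(f"{uni}: {share:.0%} Open Access")
```

The share is only meaningful when the denominator is complete, which is exactly why universities that send only their Open Access records to Narcis cannot be compared with those that send everything.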

Comprehensive registration will lead to more publications

If the universities manage to achieve a more comprehensive registration of publication output, it will become clear that universities publish a lot more than peer-reviewed publications alone. Many of those other publications contribute considerably to the Open Access production of the universities. They belong more to the realm of grey literature and play a substantial role in disseminating knowledge to parties other than fellow scholars and universities. These publications reach an audience in other parts of the public sector, in industry, etc., contributing to the all-important knowledge circulation within the Netherlands (WRR, 2014). These other publications have always been produced, but were simply not registered and, more importantly, not efficiently disseminated. Registration in a CRIS, dissemination through a repository and aggregation in Narcis will help to spread the word about this grey literature.

Narcis as a link resolver target

There is another way to reinforce the role of Narcis. If we could make Narcis a link resolver target for Open Access versions of Toll Access publications, Narcis would gain in importance. Some OA advocates rely on Google and Google Scholar to identify Open Access versions of articles, but it would fit the academic workflow better if an Open Access repository could double as a link resolver target. A researcher using Scopus to find relevant material could then locate OA versions of articles he does not otherwise have access to, whenever they are present in one of the 37 Dutch repositories. Sugita et al. 2007 already reported on a solution like this in Japan. There is some more information on their AIRway project and the existing targets, among which the Netherlands is completely absent. Ross Singer blogged a proposal on this subject as well, but I didn’t see it come to implementation.
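To illustrate what a link resolver would send to such a target, here is a minimal sketch that builds an OpenURL in the key/encoded-value (KEV) format of Z39.88-2004 for a journal article. The base URL `https://resolver.example.org/openurl` and the DOI `10.1000/example` are placeholders of my own; as far as I know Narcis offers no such endpoint today, and a real target would have to decide which keys it honours.

```python
from urllib.parse import urlencode


def build_openurl(base_url, doi=None, atitle=None, jtitle=None,
                  volume=None, issue=None, spage=None):
    """Build an OpenURL 1.0 (KEV) query for a journal article."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ]
    if doi:
        # DOIs travel as an identifier in the info:doi/ namespace.
        params.append(("rft_id", f"info:doi/{doi}"))
    optional = {
        "rft.atitle": atitle,
        "rft.jtitle": jtitle,
        "rft.volume": volume,
        "rft.issue": issue,
        "rft.spage": spage,
    }
    for key, value in optional.items():
        if value is not None:
            params.append((key, str(value)))
    return f"{base_url}?{urlencode(params)}"


# Hypothetical resolver address and placeholder metadata, for illustration only.
example_url = build_openurl(
    "https://resolver.example.org/openurl",
    doi="10.1000/example",
    jtitle="Example Journal",
    volume=12,
    spage=34,
)
print(example_url)
```

A target receiving this request would look up the DOI or the journal, volume and start page among the harvested records and, if an Open Access copy exists, answer with a link to it.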

Reinforcing the green road in the Netherlands

Sander Dekker happily proclaimed the Golden Route to Open Access as his major policy. I do hope that he, in cooperation with the VSNU, will implement a few minor policy changes that reinforce the importance of the Dutch repository infrastructure. If the developers of Narcis manage to make Narcis an Open Access target for link resolvers, we get a meaningful and sustainable repository infrastructure for relatively little money.

What else caught my eye this week?

How Google could help the Open Access world a little

It was back in 2008 that Google Scholar launched the feature that identifies freely available versions of articles on the Web. In the early days these were indicated by green triangles in front of the reference. Nowadays freely available copies are listed in the right-hand column. Many of these are Open Access versions of articles properly submitted to preprint servers and subject or institutional repositories. Other free versions identified by Google Scholar are publishers’ versions of articles posted to personal websites, dropboxes, you name it. Whatever the rights are, if you need a copy of these papers and don’t have access through your university library’s subscriptions, this Google Scholar feature is a very useful tool. In scholarly search classes I always stress it to my students.

In our institution’s bibliography I would love to include, for each article, a reference to the so-called document clusters in Google Scholar. Consider the following publication: the link to the full text included in the record leads you to Science Direct. Whether you can access the paper on SD depends on your subscriptions. Sometimes you can’t. It would therefore be nice if we could include a link to the document cluster in Google Scholar. For this paper you get some 29 versions, and above all 6 of these are free versions of the paper posted on various websites. That’s really helpful.

In AgrisWeb, I learned from Johannes Keizer yesterday, they link to Google through a search on the title words. This works quite well, but it could be done better.

Consider the idea that Google Scholar had an API. Suppose we could query that API on the basis of the DOI or PMID, or the ISSN in combination with volume, issue and pages, or any other combination of standard bibliographic metadata. Yes, something like an OpenURL. If Google Scholar would then return only the correct Google Scholar ID for that article (that number 12564475196117890153 in the link), we could construct various links. Linking to the Google Scholar document cluster is one. Retrieving the Google Scholar citations is another.
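Once you have the ID, constructing those links is trivial, as this minimal sketch shows. It assumes the `cluster` and `cites` URL parameters as they appear in Google Scholar result links today; these are observed URL patterns, not a documented API, and they may change. The missing piece, going from a DOI or PMID to the ID, is exactly the API that does not exist, so the functions below simply take the ID as input.

```python
from urllib.parse import urlencode

SCHOLAR_BASE = "https://scholar.google.com/scholar"


def cluster_link(scholar_id):
    """Link to all versions of an article (the document cluster)."""
    return f"{SCHOLAR_BASE}?{urlencode({'cluster': scholar_id})}"


def cited_by_link(scholar_id):
    """Link to the articles citing this one."""
    return f"{SCHOLAR_BASE}?{urlencode({'cites': scholar_id})}"


def title_search_link(title):
    """Fallback when no ID is known: a plain search on the title words."""
    return f"{SCHOLAR_BASE}?{urlencode({'q': title})}"


# The ID below is the one quoted in the text; the title is a made-up placeholder.
print(cluster_link("12564475196117890153"))
print(cited_by_link("12564475196117890153"))
print(title_search_link("an example article title"))
```

The title search is the fallback a bibliography has to use now, and it is the weak link: titles are ambiguous, whereas an ID resolves to exactly one cluster.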

“Google doesn’t like metadata too much” is an often-heard argument. But the Google Books API works swell with ISBN numbers, OCLC numbers or LOC numbers. That API is talking metadata. Libraries are massive stores of metadata. So, Anurag Acharya, please. Pleas for a Google Scholar API abound, mostly for the retrieval of citations, but for the OA movement those document clusters are really more important! Perhaps you could launch this Google Scholar API as a present for the Open Access week coming up in October?
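For comparison, this is roughly what talking metadata to the Google Books API looks like. The sketch only builds the request URL for the volumes search, using the `isbn:`, `oclc:` and `lccn:` query keywords; it makes no network call, and the ISBN is just an arbitrary example value.

```python
from urllib.parse import urlencode

BOOKS_VOLUMES = "https://www.googleapis.com/books/v1/volumes"

# Identifier keywords accepted in the q parameter of a volumes search.
ID_KEYWORDS = {"isbn", "oclc", "lccn"}


def books_query_url(id_type, value):
    """Build a Google Books volumes search URL for a standard identifier."""
    if id_type not in ID_KEYWORDS:
        raise ValueError(f"unsupported identifier type: {id_type}")
    return f"{BOOKS_VOLUMES}?{urlencode({'q': f'{id_type}:{value}'})}"


# Arbitrary example identifier, for illustration only.
print(books_query_url("isbn", "9780131103627"))
```

A Google Scholar counterpart could look much the same, with `doi:` or `pmid:` in place of `isbn:`, but that part is wishful thinking on my side.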

Open Access: Just Publish

I do sincerely apologize for this boring video; a few talking heads are not the right medium to get a message across. An important message, that is. But I couldn’t find any palatable alternatives on YouTube. Has nobody tried to make an attractive, short film on this subject? Anyway, here are a couple of big shots from the Dutch university world getting across the importance of Open Access. They talk in Dutch, but this version has English subtitles.

Google and the academic Deep Web

Hagendorn and Santelli (2008) just published an interesting article on the comprehensiveness of Google’s indexing of academic repositories. This article triggers me to write up some observations I had been intending to make for quite some time. It addresses a question I got from a colleague of mine, who observed that the deep web apparently doesn’t exist anymore.

Google has started to index Flash files. Google has started to retrieve information that is hidden behind search forms on the web, i.e. it has started to index information contained in databases. Google and OCLC exchange information on books scanned and those contained in Worldcat. Google, so it seems, has indexed the Web comprehensively, with 1 trillion indexed web pages. Could there possibly be anything more to be indexed?

The article by Hagendorn and Santelli shows convincingly that Google still has not indexed all information that is contained in OAISTER, the second largest archive of open access article information. Only Scientific Commons is more comprehensive. They tested this with the Google Research API using the University Research Program for Google Search. They only checked whether the URL was present. This approach reveals only part of the depth of the Academic Deep Web. Those are staggering figures already, but reality bites even more.

A short while ago I taught a Web Search class for colleagues at the University Library at Leiden. To demonstrate what the Deep or Invisible Web actually constitutes, I used an example from their own repository. It was a thesis on Cannabis from last year, deposited as one huge PDF of 14 MB. Using Google you can find the metadata record. With Google Scholar as well. However, if you search for a quite specific sentence from the opening pages of the actual PDF file, Google does not return the sought-after thesis. You find three other PhD dissertations, two of them defended at the same university that same day, but not the one on Cannabis.

Interestingly, you are able to find parts of the thesis in Google Scholar, e.g. chapter 2, chapter 3, etc. But those are the chapters that have been published elsewhere in scholarly journals. Unfortunately, none of these parts in Google Scholar refers back to the original thesis that is in Open Access, nor have they been posted as OA journal article pre-prints in the Leiden repository. In Google Scholar most of the material is still behind toll gates at publishers’ websites.

Is Google to blame for this incomplete indexing of repositories? Hagendorn and Santelli do indeed point the finger at Google. However, John Wilkin, a colleague of theirs, doesn’t agree. Neither did Lorcan Dempsey. And neither do I.

I have taken an interest in the new role of librarians. We are no longer solely responsible for bringing external documentary resources into the realm of our academic clientele. We also have the dear task of bringing the fruits of their labour into the floodlights of the external world as well as we can, be it for an academic or a plain lay audience. We have to bring the information out there. Open Access plays an important role in this new task. But that task doesn’t stop at simply making it available on the Web.

Making it available is only a first, essential step. Making it rank well is a second, perhaps even more important step. So as librarians we have to become SEO experts. I have mentioned this here before, as well as on my Dutch blog.

So what to do about this example from the Leiden repository? Well, there is actually a slew of measures that should be taken. The first, of course, is to divide the complete thesis into parts, at chapter level. Admittedly, some publishers only give permission to republish articles (of which most theses in the natural sciences in the Netherlands consist) when the thesis is published as a whole. On the other hand, nearly 95% of publishers allow publication of pre-prints and peer-reviewed post-prints: the so-called Romeo green road. So it is up to the repository managers, preferably with the consent of the PhD candidate, to split the thesis into its parts (the chapters, which are the pre-prints or post-prints of articles) and archive the thesis at chapter level as well. The record for the thesis then contains links to the individual chapters deposited elsewhere in the repository, and these far more digestible chunks of information are more palatable for search engine spiders and crawlers.

An interesting side effect of this additional effort on the repository side is that deposit rates will increase considerably. This applies to most universities in the Netherlands, and to our collection of theses as well. Since PhD students are responsible for the lion’s share of academic research at the university, depositing the individual chapters as article preprints in the repository will be of major benefit to the OA performance of the university. It will require more labour on the side of repository management, but if we take this seriously it is well worth the effort.

We still have to work really hard on the visibility of the repositories, but making the information more palatable is a good start.

NTvG not so open access

The Dutch medical journal Nederlands Tijdschrift voor Geneeskunde, indexed in PubMed, celebrates its 150th anniversary this year. Quite an old lady. All its archives have been scanned and have been available to subscribers for about two years now. The NTvG is the weekly professional journal for general practitioners in the Netherlands. About 80% of Dutch GPs have a personal subscription to this journal.

Yesterday it was announced in de Volkskrant that NTvG will open up its archives to the public. De Volkskrant went on to state that, in opening up its archives, NTvG is following the example of the leading international medical journals. Well, maybe. There is a catch though. The archive is only open for articles older than five years. With a moving wall of five years before its archives open up, one can’t really speak of an OA journal.

On the website of NTvG there is no mention of this news whatsoever. So far there is only this announcement in the well-respected newspaper de Volkskrant. This story will be continued, I believe.