Grey Literature at Wageningen UR, the Library, the Cloud(s) and Reporting

A while back I gave a presentation at the offices of SURF during a small scale seminar on Grey Literature in the Netherlands. The occasion was the visit of Amanda Lawrence to SURF to discuss Grey Literature in the Netherlands.

I was invited to give a presentation of Grey Literature at Wageningen UR. The slides I used are shared in this Slideshare.

Although I always assume that slides tell their own story, it is perhaps a good idea to provide some narrative in this blog post to explain the parts that are less obvious. In the first slides I present the university and the research institutes at Wageningen. The student-staff ratios are so favourable because Wageningen UR comprises a university and a number of substantial research institutes that concentrate on research in the life sciences only and have no teaching obligations.

CRIS and repository

At the library we manage two closely intertwined systems for the whole organization. The first is the current research information system (CRIS), called Metis. In Metis we register all output of Wageningen UR faculty and staff. Data entry normally takes place at the chair group level. Most often the secretary of the chair group or business unit is responsible; the library checks the quality of the data entry and maintains the various lists that facilitate data entry and quality control. Output registration in Metis is really comprehensive, since evaluations, the award of bonuses, prizes, promotions, research assessments and external peer reviews are all based on the metadata registered in Metis.

All information that is registered in the CRIS, Metis, is published in our institutional bibliography called Staff Publications. I prefer the term institutional bibliography, since the term repository is often associated with Open Access repositories or (open access) institutional repositories only. In my view the institutional bibliography is the comprehensive metadata collection for all output of the institution, including, but not limited to, Open Access publications. It goes without saying that datasets are an integral part of the research output, and we are starting to register datasets in our systems as well.

The coupling of the CRIS and the institutional bibliography has existed only since 2003. Our bibliography also holds a collection of 90,000 heritage metadata records of lesser quality. Of the 200,000+ items in our repository, 25% are Open Access items. Looking at the peer reviewed journal articles registered in Staff Publications (indicated as WaY in the graph), you can see that their number closely follows the number of articles that can be retrieved from either Scopus or Web of Science. There are differences in coverage between Web of Science and Scopus, but both databases seem to cover Wageningen UR output quite closely. Or not?

Comprehensive registration

In slide 6 I show all metadata registrations of publication output, reaching more than 12,500 items described for the publication year 2010. In that year only around 2,700 peer reviewed articles were registered, so we registered nearly 10,000 other items of research output. In slide 7 I present an overview of the various document types registered on top of the peer reviewed publications. Most important are the “other” articles, which are articles published in trade or vocational journals. These very often have to do with the societal role that the university and research institutes play. These articles are aimed at the larger public and are therefore very often available in Open Access as well. Book chapters and reports also account for very substantial numbers of publications. The reports are most often aimed at the various ministries for which the research institutes work, and are most often published as OA reports as well. With book chapters this is often not the case. On a yearly basis they are not so conspicuous, but the PhD theses are nearly all available in Open Access or, in a few cases, as delayed Open Access. The other items include presentations, brochures, lectures, patents, and interviews for newspapers, radio or TV. It is all registered, and together it makes a very substantial addition to the peer reviewed publications.

Dissemination to the Cloud(s)

The institutional bibliography plays a crucial role in the dissemination of information to other parties. All metadata records are indexed in Google and, to a slightly lesser extent, in Google Scholar, but we experience problems with Google Scholar indexing the full text of our Open Access publications, since the full text files are located on a separate file system. All Dutch language publications are disseminated through a portal for education and practitioners in the green sector. Wageningen UR Staff Publications is fully OAI-PMH compatible, and data is disseminated to Narcis, the overarching repository of repositories in the Netherlands. Other repository aggregators include OAIster and BASE. The information is also harvested by the FAO, which plays a pivotal role in the dissemination of agricultural information in the world. All our PhD theses are disseminated to DART-Europe, the Electronic Theses and Dissertations (ETD) portal for Europe. With our retrospectively digitized collection of theses we are the 12th largest collection of PhD theses in Europe.
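To make the harvesting step a little more concrete, here is a minimal sketch of how an aggregator could build an OAI-PMH `ListRecords` request. The verb and the `metadataPrefix`, `from` and `set` arguments are part of the OAI-PMH protocol; the base URL and the function name are purely hypothetical and do not describe the actual Staff Publications endpoint.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (illustrative helper, not a real client)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Selective harvesting: only records changed since this date.
        params["from"] = from_date
    if set_spec:
        # Optional set, e.g. a hypothetical set holding only OA items.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, used for illustration only.
print(list_records_url("https://repository.example.org/oai", from_date="2014-01-01"))
```

An aggregator would fetch this URL, parse the XML response and keep following the resumption token until the list is complete.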

Open Access

The growth of Open Access publications is a steady one, although we occasionally face setbacks. Last year, for instance, we received claims from photographers whose images were used in trade journals for illustration purposes, and for which the IP rights for electronic dissemination had not been properly addressed. We have just passed the mark of 50,000 OA publications. When you look at all deposits of OA material in Dutch repositories, Wageningen UR stands out in depositing current material (slide 10), outperforming any of the other universities (slide 11). Looking at the document types of the recent material deposited, it is immediately apparent that Wageningen deposits relatively large numbers of reports and contributions to periodicals (the trade and vocational journals) and also deposits more conference papers as Open Access publications.

The deposition of green OA peer reviewed journal articles is not very successful. We don’t have an intuitive system in place for researchers to deposit their publications. Instead, the library systematically checks the publications and sees what we are allowed to do with the publisher’s version of each article. In the first place we look at the DOAJ journal list, and actively load those articles into the repository. Secondly we look at the Sherpa/Romeo list of publishers that allow delayed archiving of the publisher’s PDF. The third list, not truly OA, is the list of publishers allowing free-to-read access after an embargo period, to which we link. A last resort could be to link to deposited material in PMC, but we haven’t done that yet. The first two steps lead to 23% of our peer reviewed journal articles being available in Open Access; steps 3 and 4 still need to be executed.
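The four checks described above can be sketched as a simple decision cascade. This is only an illustration under stated assumptions: the `Article` fields, the function name and the three lookup sets are hypothetical stand-ins for the DOAJ list, the Sherpa/Romeo list and the embargo list, not the library's actual tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    title: str
    issn: str
    publisher: str
    pmc_id: Optional[str] = None  # set when a copy is known to exist in PMC


def oa_route(article, doaj_issns, pdf_archiving_publishers, free_after_embargo_publishers):
    """Return the first applicable step, checked in the order described in the text."""
    if article.issn in doaj_issns:
        return "step 1: load article from DOAJ journal into the repository"
    if article.publisher in pdf_archiving_publishers:
        return "step 2: archive publisher PDF after the delay"
    if article.publisher in free_after_embargo_publishers:
        return "step 3: link to free-to-read version after embargo"
    if article.pmc_id:
        return "step 4: link to the PMC deposit"
    return "no OA route found"


# Made-up example data.
doaj = {"0000-0001"}
romeo = {"Publisher A"}
embargo = {"Publisher B"}
print(oa_route(Article("Example", "0000-0002", "Publisher A"), doaj, romeo, embargo))
```

Because the checks run in a fixed order, an article in a DOAJ journal is always handled in step 1, even if its publisher also appears on one of the later lists.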

Grey Literature

Why are we so successful in collecting the grey literature output? At the university, registration of output is ingrained in the system. We started at the university as early as 1975, and it took years before everybody complied, but faculty and staff are now quite used to doing this. Registration also leads to comprehensive reports on the publication activities of researchers and research groups. For the relatively recently introduced tenure track, the system calculates the research credits for the candidates. For staff we provide an attractive graphic overview of their publications with various bar charts and pie charts and their co-author network, but most important is a bibliometric report on the basis of articles published in journals covered in the Web of Science, benchmarked against the baselines from the Essential Science Indicators.

If all universities registered their publication output more comprehensively in their current research information systems, these outputs could then be made available through their repositories. In the example of the publication in Dutch on Culicoides, we see that it concerns a report by researchers from Utrecht University, but this report is not to be found in their OA repository (the publication is not scientific!?) nor in the catalogue of the university. If Narcis were made the official tool for reporting publication output in the Netherlands to the ministry of education in a transparent and verifiable way, publications like these would stand a chance of being collected, described and curated.

If the OA repository infrastructure in the Netherlands improves, Narcis can be turned into a link resolver service. Using the DOI, we could resolve not only to the publisher’s site, but also to Narcis, which points to an OA version of the same paper at a repository of one of the universities. In the case of public libraries in the Netherlands, we could configure a national link resolver that exposes OA material in addition to the Open Access material that Google Scholar already surfaces efficiently. This is important since not all repository content is discovered in Google Scholar.

Knowledge Economy

With regard to a new knowledge economy, an important report was published quite recently. However, the report did not mention libraries, did not mention repositories, and did not mention grey literature. So there is still a world to win for comprehensive institutional repositories that collect and disseminate all the grey literature that is openly available.

WRR. 2013. Naar een lerende economie : Investeren in het verdienvermogen van Nederland. WRR report Vol. 90. Amsterdam: Amsterdam University Press. 440 pp.

Narcis refreshed, but not improved

Narcis is the overarching repository of (Open Access) repositories in the Netherlands. The website was entirely refreshed last week. It got a fresh, modern look. This new look was badly needed.
What did not change was the underlying database and the quality of the data. That is a really missed opportunity. Changing the paint where repairing the woodwork is really needed is actually a waste of time and money.

Of course Narcis can’t repair its framework without the co-operation of the underlying repositories. With at least all universities buying into better Current Research Information Systems (CRIS), this is the moment to prepare Narcis for the future.

I have pleaded on this blog before to make Narcis the comprehensive metadata aggregator for all scholarly output in the Netherlands. Not only Open Access (OA) publications. But the comprehensive university output. The numbers for the official VSNU reports on scholarly productivity should be based on Narcis, and all metadata underlying those reports should become verifiable in Narcis. This improves the transparency of reporting and transparency of the generated reports. Then, it should go without saying that meaningful reports of the status of Open Access in the Netherlands, as requested by the minister of education, should be generated on the basis of Narcis.

Narcis should seriously work on the deduplication of all information. Currently co-publications are reported separately by each of the universities involved, leading to over-reporting of the actual figures. Based on the estimated share of national co-publications, an over-reporting of at least 20% is currently expected. Narcis should merge those records and offer link-outs to all repositories contributing the metadata. This deduplication can be greatly improved if better use is made of standard identifiers such as the Digital Object Identifier (DOI). Currently the DOI is not part of the metadata exchange protocol, and this is a serious omission of course.
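As a rough sketch of what DOI-based merging could look like: the record layout (`doi` and `repository` keys), the function names and the example DOIs below are assumptions made for illustration and say nothing about how Narcis actually stores its data. The sketch also assumes every record carries a DOI, which is exactly what is missing today.

```python
from collections import defaultdict


def normalise_doi(doi):
    """Lower-case a DOI and strip common resolver prefixes so variants compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/",
                   "https://dx.doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def merge_by_doi(records):
    """Group records by DOI; each merged entry keeps link-outs to all contributing repositories."""
    merged = defaultdict(list)
    for rec in records:
        merged[normalise_doi(rec["doi"])].append(rec["repository"])
    return dict(merged)


def overreporting(records):
    """Share by which the raw record count exceeds the number of unique publications."""
    unique = len(merge_by_doi(records))
    return (len(records) - unique) / unique


# Made-up records: one co-publication reported by two universities.
records = [
    {"doi": "10.1000/a", "repository": "Wageningen"},
    {"doi": "https://doi.org/10.1000/A", "repository": "Groningen"},
    {"doi": "10.1000/b", "repository": "Utrecht"},
]
print(merge_by_doi(records))
print(overreporting(records))
```

With three raw records and two unique publications the sketch reports an over-reporting of 50%; the same calculation on real data would show how far the summed university figures overstate the national total.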

Narcis should take up the role of metadata exchange platform. For example, if Groningen and Wageningen have a co-publication and there is an OA version available in Groningen, there should be a service that Wageningen can use to check for and harvest that OA version as well, and thus safeguard the item on the basis of the Lots of Copies Keeps Stuff Safe (LOCKSS) principle. The same goes for the exchange of Digital Author Identifiers (DAI). If Utrecht has recorded a DAI for a Utrecht author in a co-publication with Wageningen, we should be able to resolve the DAI of that author through Narcis, harvest the DAIs for the non-Wageningen authors from Narcis, and complete the metadata in our systems, starting with the CRIS of course.

Narcis as a link resolver. It shouldn’t be too difficult to change Narcis into a link resolver that finds OA versions of Toll Access articles. Exchange of the DOI would help of course, since you want to resolve at the article level and not at the journal level, as is done in the current link resolvers. The benefits to the Dutch public would be great, and the relevance of the individual repositories would increase.
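A minimal sketch of article-level resolution, assuming an index that maps DOIs to OA copies in Dutch repositories. The index contents, the repository URL and the function name are hypothetical; only the `https://doi.org/` fallback is the regular DOI resolver.

```python
def resolve(doi, oa_index):
    """Return an OA copy for the DOI when one is known, else fall back to the publisher via doi.org."""
    key = doi.strip().lower()
    oa_urls = oa_index.get(key, [])
    if oa_urls:
        # Prefer the first known OA copy; a real service could offer all of them.
        return oa_urls[0]
    return "https://doi.org/" + key


# Made-up index: DOI -> OA copies harvested from repositories.
oa_index = {
    "10.1000/a": ["https://repository.example.org/record/123"],
}
print(resolve("10.1000/a", oa_index))
print(resolve("10.1000/z", oa_index))
```

The point of the sketch is the lookup key: because it is the DOI of the article rather than the ISSN of the journal, a single OA copy in any one repository is enough to serve every reader in the country.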

Narcis got a new colour and letter type. It looks really nice now, but I look forward to bold steps in the direction of improving the database. Making the database an essential part in the Dutch repository infrastructure and boosting the importance and relevance of the institutional repositories.

The week in review – Week 5, 2014

For a Dutch Open Access advocate there was one event that stood out this week: the speech of @SanderDekker, our junior minister for science policy, at the Academic Publishing in Europe 2014 conference. His speech ‘Going for Gold’ was a passionate plea for Open Access, to be achieved through the Golden Road.

Open access is a moral obligation, essential for society and inescapable.

He did not dismiss the Green route entirely, but for Dekker the Green Road to Open Access was like coming fourth at a major championship. In the end “if you are going for gold, fourth place is the most frustrating place you can achieve”.
As a product manager, responsible for our repository Staff Publications, I see one clear and present danger in the view of our junior minister. If he accepts the Golden Route as the only route, it might lead to neglect of the Green Route and subsequently to the deterioration of the repository infrastructure in the Netherlands.

The repository infrastructure in The Netherlands and how it can be improved

The Netherlands has a unique repository infrastructure. All universities have their own Open Access repository, in most instances managed by the university library. Next to that, many research institutes maintain Open Access repositories as well. All the contents of these repositories are harvested and presented in Narcis, the overarching repository of the Netherlands. In total 37 institutes participate in Narcis, but, lo and behold, the 13 universities are the main contributors. Two different policies are practised at the universities in disseminating their publications to Narcis: one group of universities disseminates complete metadata on all their output to Narcis, and another group only disseminates its open access publications through the repository to Narcis. Some universities can be placed somewhere between these extremes. Since all universities are in the process of acquiring new Current Research Information Systems, there is an opportunity to seize this moment and make arrangements on the exchange of comprehensive metadata for all official university publication output. Make the Academic Bibliography public, and aggregate that output in Narcis.
All universities have to report their publication output to the Association of Dutch Universities (VSNU) according to a strictly defined protocol. At this moment only the final figures are reported to the VSNU, by each university independently. With a small change in policy regulations, Narcis could be made the overall repository used for the reporting of these figures. By making these reports publicly available the system becomes transparent, and the availability, traceability and verifiability mentioned in the VSNU protocol are all safeguarded, in a much better way than in the current situation. The new demands from the Junior Minister of Education that universities report on Open Access production should be implemented on Narcis as well. The advantage of a comprehensive publication output registration system is that the success of open access can be measured as a part of total publication output. If we use Narcis for this reporting as well, we are no longer dependent on third-party providers for bibliographic data and we don’t end up with incomparable numbers.

Comprehensive registration will lead to more publications

If the universities manage to achieve a more comprehensive registration of publication output, it will subsequently become clear that universities publish a lot more than only peer reviewed publications. Many of those other publications contribute considerably and importantly to the open access production of the universities. These publications are more in the realm of grey literature and play a substantial role in knowledge dissemination to parties other than fellow scholars and universities. These publications reach an audience in other parts of the public sector, in industry etc., contributing to the very important knowledge circulation within the Netherlands (WRR, 2014). These other publications have always been produced, but were simply not registered and, more importantly, not efficiently disseminated. Registration in a CRIS, dissemination through a repository and aggregation in Narcis will help to spread the word about this grey literature.

Narcis as a link resolver target

There is another way to reinforce the role of Narcis. If we could make Narcis a link resolver target for Open Access versions of Toll Access publications, Narcis would gain in importance. Some OA advocates rely on Google and Google Scholar to identify Open Access versions of articles, but it would fit the academic workflow better if an Open Access repository could double as a link resolver. A researcher who is using Scopus to find relevant material could then locate OA versions of articles that he might otherwise not have access to, provided they are present in one of the 37 Dutch repositories. Sugita et al. (2007) already reported on a solution like this in Japan. There is some more information on their AIRway project and the existing targets, among which the Netherlands is lacking completely. Ross Singer blogged a proposal on this subject as well, but I didn’t see it come to implementation.

Reinforcing the green road in the Netherlands

Sander Dekker happily proclaimed the Golden Route to Open Access as his major policy. I do hope that he, in cooperation with the VSNU, will implement a few minor policy changes that reinforce the importance of the Dutch repository infrastructure. If the developers of Narcis manage to make Narcis an Open Access target for link resolvers, we get a meaningful and sustainable repository infrastructure for relatively little money.

Sugita, S., K. Horikoshi, M. Suzuki, Shin Kataoka, E.S. Hellman & K. Suzuki 2007. Linking service to open access repositories. D-Lib Magazine, 13(3-4)

WRR. 2013. Naar een lerende economie : Investeren in het verdienvermogen van Nederland. WRR report Vol. 90. Amsterdam: Amsterdam University Press. 440 pp.

A census of Open Access repositories in the Netherlands

Open Access receives a lot of attention in the Netherlands. All universities have explicitly formulated OA policies and signed the Berlin OA declaration. Erasmus University Rotterdam has stipulated a mandated OA policy for its researchers. All Dutch universities have repositories in place and there is an overarching repository, Narcis, which harvests the repositories of all universities and major research institutions. The UNESCO Global Open Access Portal (GOAP) reported last year “Netherlands has a strong OA awareness and an active promotion of open access through institutional mandates, establishment of OA repositories, OA publishing agreements. SURFfoundation, a Dutch programme for information and communication technology innovation focuses on Open Access and it is the Dutch partner in Knowledge Exchange along with DFG (Germany), DEFF (Denmark) and JISC (UK)”. In 2011 some milestones were celebrated: the 250,000th Open Access publication was harvested by Narcis, and Wageningen UR deposited its 30,000th Open Access publication in Narcis, by which it became the largest depositing institution in Narcis.

Despite some early assessments (van Westrienen & Lynch, 2005) no recent analyses on the actual deposit rates by Dutch universities have been made. Let alone a systematic analysis of trends in depositing rates. In this blogpost I want to give a status update of deposits in Open Access repositories in the Netherlands, concentrating on the regular Dutch universities. I hope to follow this up next year to give insight into actual deposit rates.

Data collection
Narcis was used as the overarching repository for all OA publications from the Netherlands. Narcis makes it possible to estimate deposits per institution, document type and publication year in a uniform and efficient way for 27 repositories in the Netherlands. Data was collected from Narcis in the period from December 27th 2011 to January 2nd 2012. During that week no additional deposits to Narcis were made: the total number of deposits in Narcis was 270,519 Open Access items and did not change while the data was being retrieved.

As mentioned under data collection an impressive number of 270,519 Open Access deposits have been harvested by Narcis from the 27 OA repositories in the Netherlands. In the following graph the distribution of total deposits over the 27 repositories in the Netherlands is shown.
Total deposits in Narcis 2011
The smallest repository is that of the Theological University of Kampen, with only 4 deposits, and the largest that of Wageningen University, with 30,704 deposits. The 13 regular universities in the Netherlands have the largest repositories as measured in Narcis. NWO, with 10,179 deposited items, is the largest repository in the group of non-universities (this group includes the Open University). The NWO repository is just a fraction smaller than the repository of Radboud University Nijmegen. Also indicated in the graph is the recency of the deposits. The share of deposits from recent (since 2006) publication years is indicated in red, whereas the blue part of the bars represents the deposits from the older (pre-2006) publication years. Of the regular universities, Wageningen UR and the VU University have the largest share of recent deposits, whereas TU Eindhoven and Tilburg University have the largest share of older publications.

The next graph looks in more detail at the Open Access deposits for the most recent publication years of the 13 Dutch universities. The deposits per publication year for the period 2006-2011 are depicted. In all cases deposits from the publication year 2011 trailed behind, which doesn’t come as a surprise. In a few cases, however, I observe clear negative trends in the number of deposits made during the period 2006-2011. This is clearly the case for the universities of Groningen, Leiden, Maastricht and Utrecht.
OA deposits in narcis by publication year 2006-2011
The trend in deposits per publication year is more or less stable in Nijmegen and Twente. For the universities of Rotterdam, Delft, Eindhoven, University of Amsterdam, Tilburg, VU Amsterdam and Wageningen UR an increasing trend in deposits is observed. The VU Amsterdam shows a clear outlier in number of deposits for publication year 2009. About half of the universities have more than 1000 deposits per publication year. Rotterdam, Nijmegen, Eindhoven, Leiden, Maastricht and Tilburg are lagging behind in this respect. Wageningen UR has more than double the number of deposits per publication year compared to any other university.

Yearly trends SI
By far most of the smaller institutions have less than 100 open access deposits per publication year. NWO, NIVEL, KNAW and the Open University have on average between the 100 and 300 open access deposits per publication year. It is interesting to note that the deposits for publication year 2011 are more in line with the preceding publication years than for the general universities. An indication that it appears easier to manage the publication output for smaller institutions.

In the next graph I looked at the document type breakdown of deposits for the period 2006-2011 for the regular universities. In the first place it should be noted that there is a large range of document types in Narcis. Some of these document types seem superfluous. The difference between Student thesis and Master thesis is entirely unclear, and technical documentation versus reports is another example. Narcis should look into this matter, and some universities should clean up their document types as well. Having said that, most universities have three major types of open access publications: articles, reports and PhD theses.
OA deposits by publication type
The VU University excels at OA article deposits over the last six years, followed by Groningen and Utrecht. Wageningen UR excels at depositing reports, followed at quite some distance by TU Eindhoven and the UvA. For the PhD theses, Utrecht has the lead, followed by the VU and Delft. OA PhD theses are an important source of material, since in most cases they consist of chapters that are preprints of articles to be published at a later date. Erasmus University Rotterdam, Maastricht and Tilburg are the universities with the largest share of working papers. Wageningen UR has a very large share of contributions to periodicals, a group of publications with hardly any deposits at other universities. Looking at the overall picture, Wageningen UR clearly stands out as a result of the large share of reports and contributions to periodicals. On top of that it has the largest share of conference papers as well. It can easily be argued that Wageningen UR, of all repositories in the Netherlands, excels at disseminating grey literature by means of its open access repository Wageningen Yield.

At this moment there aren’t comparative repository usage statistics in the Netherlands, but the early trial results indicate that repositories with more recent content also get more article downloads. To draw firm conclusions on the trial implementation of SURE2 is a bit too early.

The share of OA in NL
The absolute numbers of OA deposits themselves are not so meaningful as long as they are not related to the actual scientific output of the institutions. Although we have the current set of figures on OA deposits as measured through Narcis in the Netherlands, the share of OA in total institutional output is a difficult figure to establish. A few institutions deposit metadata records of all their publications in Narcis, other institutions limit themselves to OA deposits only, and a third group deposits the metadata of only a subset of their publications. To arrive at figures for the full publication output we have to consult other sources. The VSNU would be an obvious source, but the disadvantage of its figures is that they are based on reporting years rather than publication years (a rather odd approach). A case in point is the PhD thesis output reported by the VSNU compared to the OA theses reported in Narcis over the period 2006-2010, shown in the following table.



[Table: PhD theses reported by the VSNU versus OA theses in Narcis, 2006-2010. Rows: Erasmus University Rotterdam, RU Nijmegen, RU Groningen, TU Delft, TU Eindhoven, University Leiden, University Maastricht, University Twente, University Utrecht, University van Amsterdam, University van Tilburg, Vrije University Amsterdam, Wageningen UR. The cell values are not available.]

At Maastricht University and the UvA there were actually more theses deposited in Narcis over the period 2006-2010 than were reported to the VSNU. For individual years the fluctuations can be quite large, but over a period of consecutive years they become smaller. Apparently all theses defended at Maastricht and the UvA are available in OA. Wageningen follows closely with 96%, whereas Radboud University Nijmegen, TU Delft, TU Eindhoven, Twente University, Tilburg University and VU Amsterdam follow with percentages of OA PhD theses in the 80s. Erasmus University, RU Groningen, the University of Leiden and Utrecht University are lagging behind in depositing their PhD theses in OA.

Coverage of OA article output
For an actual estimate of articles produced per institution multiple sources exist. The VSNU figures based on reporting years are useless in this respect. The databases Scopus or Web of Science (WoS) could be used to estimate the actual article output per university, but disambiguating all the name variations of the universities (and their institutes or hospitals) is a cumbersome task. In this respect Scopus actually performs better than WoS. However, other sources based on either WoS or Scopus have already carried out this disambiguation. The reports by CWTS, for example, are useful in this matter. The most recent WTI2 report (Jager et al. 2011), the successor of the NOWT reports, gives figures for the publication output of Dutch universities for the period 2007-2010 (table 30, p. 48) that have been disambiguated by CWTS. These figures are derived from Web of Science and underestimate the actual peer reviewed article output. For a life sciences university such as Wageningen UR, some 70% of the actual article output is published in journals covered by WoS and included in the WTI2 report. For broad, general universities with more social sciences and humanities this percentage is expected to be lower. For Tilburg this figure appears to be only 30%, whereas for Nijmegen it seems to be 51% and for TU Eindhoven 67%.

Table 2 presents the total number of articles for the period 2007-2010 reported in Narcis, the total number of articles according to CWTS (WTI2 report, Jager et al. (2011)) and the actual OA articles reported in Narcis. The percentage OA coverage is calculated in two ways. In the first place we look at the %OA(CWTS), by comparing the OA articles in Narcis to the articles reported by CWTS. In the second place we look at the %OA(Narcis), by comparing the OA articles reported in Narcis to the total number of articles reported in Narcis. In the third percentage column we take the minimum value of both methods. This last column is probably the best estimate of %OA coverage per institution.
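The arithmetic behind the three percentage columns can be sketched in a few lines. The function name is hypothetical and the numbers in the example are invented purely for illustration; they are not taken from the table.

```python
def oa_coverage(oa_articles, cwts_articles, narcis_articles):
    """Return (%OA based on CWTS, %OA based on Narcis, minimum of the two)."""
    pct_cwts = 100 * oa_articles / cwts_articles      # OA articles vs. WoS-based output
    pct_narcis = 100 * oa_articles / narcis_articles  # OA articles vs. self-reported output
    return pct_cwts, pct_narcis, min(pct_cwts, pct_narcis)


# Invented figures: 500 OA articles, 2000 articles per CWTS, 2500 articles in Narcis.
print(oa_coverage(500, 2000, 2500))  # (25.0, 20.0, 20.0)
```

Taking the minimum is the conservative choice: whichever source reports the larger total output is assumed to be closer to the real denominator, so an incomplete denominator cannot inflate the OA share.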

Table 2, total articles per university for the period 2007-2010 reported in NARCIS and WTI2 and %OA coverage based on comparison with CWTS figures and total articles registered in Narcis



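[Table 2 columns, as described in the text: total articles in Narcis; articles by CWTS (WTI2); OA articles in Narcis; %OA (CWTS); %OA (Narcis); minimum %OA coverage. Rows: Erasmus University Rotterdam, Radboud University Nijmegen, RU Groningen, TU Delft, TU Eindhoven, University Leiden, University Maastricht, University Twente, University Utrecht, University van Amsterdam, University van Tilburg, VU Amsterdam, Wageningen UR. The cell values are not available.]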

Comparing the OA articles in Narcis for the period 2007-2010 with the figures from the CWTS report results in a very favourable figure of 72% of the articles available in OA at Tilburg University. This favourable figure is largely due to the underestimation of Tilburg University’s article output when it is based on articles in journals covered by WoS only. VU Amsterdam has the next highest %OA articles based on the CWTS figures (40%), followed closely by Groningen (39%). The aggregate figure for all universities in the Netherlands is that 22% of the articles are OA, based on WoS estimates of article output. Since WoS underestimates the actual article output, it is useful to look at the total number of articles in Narcis as well.

Looking at the self-deposited articles in Narcis, Erasmus University Rotterdam, RU Groningen, TU Delft and Leiden University deposit only OA articles in Narcis, whereas the other universities also deposit metadata for non-OA articles. However, coverage of this share of publications varies among universities. Radboud University Nijmegen and TU Eindhoven, for instance, which already score low on %OA based on the CWTS figures, score even lower when their self-reported article output in Narcis is considered. Where %OA(Narcis) is higher than %OA(CWTS), the article output registered as metadata in Narcis underestimates the actual article output.

The minimum %OA coverage reported in the third percentage column is the best estimate of OA coverage for universities in the Netherlands based on OA articles reported in Narcis. VU Amsterdam, RU Groningen and TU Delft are the most successful in making their article output available in OA. Their reported coverage lies clearly above the 20% of OA reported for most institutions without mandated OA policies (Harnad, 2009). Twente University, Utrecht University, Tilburg University, Wageningen UR and UvA are performing around the average of 22%; this percentage is in line with the figure of %OA for universities without mandated OA policies. Erasmus Rotterdam, RU Nijmegen, TU Eindhoven, Leiden University and Maastricht University, on the other hand, are underperforming in this respect. It remains a question whether the OA article numbers reported by Narcis are actually correct, or whether, in the case of Radboud and TU Eindhoven, the total article output reported in Narcis is correct. It is possible that the document types actually include more than only peer-reviewed scholarly articles.

Although all Dutch universities have signed the Berlin OA declaration, only a few universities have substantially higher shares of OA peer-reviewed articles than is to be expected from “normal” publication output, of which about 20% is published in OA. For the universities where I arrive at even lower %OA figures, we have to wonder whether Narcis actually harvests and reports all of the universities' output.

Another valuable approach is to concentrate on the grey literature, as Wageningen UR does. But for this type of document it is even more difficult to arrive at a share of OA coverage. This can only be established by the institutions themselves, and even then it is doubtful whether all institutions have complete output registration.

Lessons to be learned

  • Narcis could and should improve the type reporting as performed in this report. They should produce overviews like this, preferably twice a year.
  • Narcis should look into some of the obsolete document types to reduce the wild array of documents (is technical documentation different from reports? Student theses and master theses are probably not the type of research output to be registered in Narcis).
  • Institutions should look at the document types deposited in Narcis as well.
  • The role of Narcis and the importance of OA could be improved if VSNU and Narcis (KNAW) make Narcis the standard reporting tool for research output registration in the Netherlands. (The VSNU should abandon the ridiculous reporting years and use the publication years in their reports instead.)
  • Universities should use Metis (or a comparable CRIS) to upload all the metadata of the institutional output to Narcis.
  • Having comprehensive output registration makes the minimum goal of at least 20% in OA more attainable, since you are not dependent on actual article submission by the authors; instead, OA versions can be chased down based on Sherpa/Romeo and DOAJ.
  • Mandates such as those in Rotterdam, announced at the beginning of 2011, have no effect whatsoever if there is no actual stick behind the policy.

Harnad, S. (2009) Waking OA’s Slumbering Giant: Why Locus-of-Deposit Matters for Open Access and Open Access Mandates.
Jager, C.-J., J. Veldkamp, D. Aksnes, R. te Velde & P. den Hertog (2011). Wetenschaps-, Technologie & Innovatie Indicatoren 2011. Utrecht, Dialogic innovatie ● interactie.
Westrienen, G. van & C. A. Lynch (2005). Academic Institutional Repositories: Deployment Status in 13 Nations as of Mid 2005. D-Lib Magazine, 11(9).

Google and the academic Deep Web

Hagedorn and Santelli (2008) just published an interesting article on the comprehensiveness of indexing of academic repositories by Google. This article triggers me to write up some observations I had been intending to make for quite some time. It addresses the question I got from a colleague of mine, who observed that the deep web apparently doesn’t exist anymore.

Google has made a start on indexing Flash files. Google has made a start on retrieving information that is hidden behind search forms on the web, i.e. it has started to index information contained in databases. Google and OCLC exchange information on books scanned and those contained in WorldCat. Google, so it seems, has indexed the Web comprehensively with 1 trillion indexed webpages. Could there possibly be anything more to be indexed?

The article by Hagedorn and Santelli shows convincingly that Google still has not indexed all information that is contained in OAIster, the second largest archive of open access article information. Only Scientific Commons is more comprehensive. They tested this with the Google Research API using the University Research Program for Google Search. They only checked whether the URL was present. This approach only partially reveals the depth of the Academic Deep Web. Those are staggering figures already, but reality bites even more.

A short while ago I taught a Web Search class for colleagues at the University Library at Leiden. To demonstrate what the Deep or Invisible Web actually constitutes, I used an example from their own repository. It was a thesis on Cannabis from last year, deposited as one huge PDF of 14 MB. Using Google you can find the metadata record. With Google Scholar as well. However, if you search for a quite specific sentence from the opening pages of the actual PDF file, Google does not return the sought-after thesis. You find three other PhD dissertations, two of them defended at the same university on the same day, but not the one on Cannabis.

Interestingly, you are able to find parts of the thesis in Google Scholar, e.g. chapter 2, chapter 3, etc. But those are the chapters of the thesis that have been published elsewhere in scholarly journals. Unfortunately, none of these parts in Google Scholar refers back to the original thesis that is in Open Access, and none has been posted as an OA journal article pre-print in the Leiden repository. In Google Scholar most of the material is still behind toll gates at publishers' websites.

Is Google to blame for this incomplete indexing of repositories? Hagedorn and Santelli do indeed point the finger at Google. However, John Wilkin, a colleague of theirs, doesn’t agree. Neither did Lorcan Dempsey. And neither do I.

I have taken an interest in the new role of librarians. We are no longer solely responsible for bringing external (documentary) resources into the realm of our academic clientele. We also have the dear task of bringing the fruits of their labour, as well as possible, into the floodlights of the external world, be it for academic or plain lay interest. We have to bring the information out there. Open Access plays an important role in this new task. But that task doesn’t stop at making it simply available on the Web.

Making it available is only a first, essential step. Making it rank well is a second, perhaps even more important step. So as librarians we have to become SEO experts. I have mentioned this here before, as well as at my Dutch blog.

So what to do about this chosen example from the Leiden repository? Well, there is actually a slew of measures that should be taken. The first, of course, is to divide the complete thesis into parts, at chapter level. Admittedly, publishers often give permission to republish articles, of which most theses in the beta sciences in the Netherlands consist, only when the thesis is published as a whole. On the other hand, nearly 95% of publishers allow publication of pre-prints and peer-reviewed post-prints: the so-called Romeo green road. So it is up to the repository managers, preferably with the consent of the PhD candidate, to split the thesis into its parts (the chapters, which are the pre-prints or post-prints of articles) and archive the thesis at chapter level as well. The record for the thesis then contains links to the individual chapters deposited elsewhere in the repository, and these far more digestible chunks of information are more palatable for the search engine spiders and crawlers.

An interesting side effect of this additional effort on the repository side is that deposit rates will increase considerably. This applies to most universities in the Netherlands, and to our collection of theses as well. Since PhD students are responsible for the lion’s share of academic research at the university, depositing the individual chapters as article preprints in the repository will be of major benefit to the OA performance of the university. It will require more labour on the part of repository management, but if we take this seriously it is well worth the effort.

We still have to work really hard at the visibility of the repositories, but making the information more palatable is a good start.

Hagedorn, K. and J. Santelli (2008). Google still not indexing hidden web URLs. D-Lib Magazine 14(7/8).