Narcis refreshed, but not improved

Narcis is the overarching repository of (Open Access) repositories in the Netherlands. The website was entirely refreshed last week. It got a fresh, modern look. This new look was badly needed.
What did not change was the underlying database and the quality of the data. That is a really missed opportunity. Changing the paint when it is the woodwork that needs repairing is a waste of time and money.

Of course Narcis can’t repair its framework without the co-operation of the underlying repositories. With at least all universities buying in to better Current Research Information Systems (CRIS), this is the moment to prepare Narcis for the future.

I have pleaded on this blog before to make Narcis the comprehensive metadata aggregator for all scholarly output in the Netherlands: not only Open Access (OA) publications, but the complete university output. The numbers for the official VSNU reports on scholarly productivity should be based on Narcis, and all metadata underlying those reports should be verifiable in Narcis. This would make both the reporting process and the resulting reports more transparent. It should then go without saying that meaningful reports on the status of Open Access in the Netherlands, as requested by the minister of education, should be generated on the basis of Narcis.

Narcis should seriously work on the deduplication of all information. Currently, metadata descriptions of the same publication reported by separate universities are stored separately, leading to overreporting of the actual figures. Based on the estimated share of national co-publications, an overreporting of at least 20% is currently expected. Narcis should merge those records and offer link-outs to all repositories contributing the metadata. This deduplication can be greatly improved by making better use of standard identifiers such as the Digital Object Identifier (DOI). Currently the DOI is not part of the metadata exchange protocol, which is a serious omission.
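
To make the idea concrete, here is a minimal sketch in Python of merging records on their DOI. The record layout (the `doi`, `title` and `repository` fields) and the sample records are my own assumptions for illustration; they do not describe the actual Narcis data model.

```python
def normalize_doi(doi):
    """Lowercase a DOI and strip common prefixes so that variants compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def merge_by_doi(records):
    """Merge records sharing a DOI and keep a link-out to every contributing repository.

    Records without a DOI cannot be matched this way and are passed through unmerged.
    """
    merged = {}
    unmatched = []
    for rec in records:
        if not rec.get("doi"):
            unmatched.append({"title": rec["title"], "doi": None,
                              "repositories": [rec["repository"]]})
            continue
        key = normalize_doi(rec["doi"])
        entry = merged.setdefault(
            key, {"title": rec["title"], "doi": key, "repositories": []})
        if rec["repository"] not in entry["repositories"]:
            entry["repositories"].append(rec["repository"])
    return list(merged.values()) + unmatched


# Hypothetical sample: one co-publication reported by two universities.
records = [
    {"doi": "10.1000/example.1", "title": "A co-publication",
     "repository": "Groningen"},
    {"doi": "https://doi.org/10.1000/EXAMPLE.1", "title": "A co-publication",
     "repository": "Wageningen"},
    {"doi": "10.1000/example.2", "title": "A single-institution paper",
     "repository": "Utrecht"},
]

deduplicated = merge_by_doi(records)
print(len(records), "reported records,", len(deduplicated), "unique publications")
```

The merged record keeps the list of contributing repositories, so the link-outs are preserved while the publication is counted only once.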

Narcis should take up the role of metadata exchange platform. For example, if Groningen and Wageningen share a co-publication and an OA version is available in Groningen, there should be a service that Wageningen can use to check for and harvest that OA version as well, and thus safeguard the item on the basis of the Lots of Copies Keeps Stuff Safe (LOCKSS) principle. The same goes for the exchange of Digital Author Identifiers (DAI). If Utrecht has registered a DAI for one of its authors on a co-publication with Wageningen, we should be able to resolve that author’s DAI through Narcis and complete the metadata in our own systems, starting with the CRIS of course, harvesting the DAIs of the non-Wageningen authors from Narcis.
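
A rough sketch of what such a DAI exchange could look like, in Python. Everything here is hypothetical: the field names, the author names and the placeholder DAI values are invented for illustration, and matching on the author name as printed is a simplification that a real service would have to improve on.

```python
def complete_dais(local_authors, aggregated_authors):
    """Fill in missing DAIs on local author entries from an aggregated record.

    Authors are matched on the name as it appears on the publication, which is
    a deliberate simplification for this sketch.
    """
    known = {a["name"]: a["dai"] for a in aggregated_authors if a.get("dai")}
    completed = []
    for author in local_authors:
        dai = author.get("dai") or known.get(author["name"])
        completed.append({**author, "dai": dai})
    return completed


# Hypothetical co-publication: the local (Wageningen) record lacks the DAI
# of the Utrecht co-author, while the aggregated record has it.
local_record = [
    {"name": "Jansen, A.", "affiliation": "Wageningen", "dai": "dai-placeholder-001"},
    {"name": "Vries, B. de", "affiliation": "Utrecht", "dai": None},
]
aggregated_record = [
    {"name": "Jansen, A.", "dai": "dai-placeholder-001"},
    {"name": "Vries, B. de", "dai": "dai-placeholder-002"},
]

completed = complete_dais(local_record, aggregated_record)
for author in completed:
    print(author["name"], author["dai"])
```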

Narcis as a link resolver. It shouldn’t be too difficult to turn Narcis into a link resolver that finds OA versions of Toll Access articles. Exchange of the DOI would help of course, since you want to resolve at the article level and not at the journal level, as the current link resolvers do. The benefits to the Dutch public would be great, and the relevance of the individual repositories would increase.
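
In its simplest form, article-level resolving is a lookup from a DOI to the OA copies held in the repositories. The sketch below assumes aggregated records that carry a `doi`, an `open_access` flag and a `url`; these field names and the placeholder DOIs and URLs are assumptions of mine, not an existing Narcis interface.

```python
def build_oa_index(records):
    """Map each DOI to the URLs of all OA copies known to the aggregator."""
    index = {}
    for rec in records:
        if rec["open_access"] and rec.get("doi"):
            index.setdefault(rec["doi"].lower(), []).append(rec["url"])
    return index


def resolve(index, doi):
    """Return the OA copies for one article, or an empty list if none are known."""
    return index.get(doi.lower(), [])


# Hypothetical aggregated records; the DOIs and URLs are placeholders.
records = [
    {"doi": "10.1000/example.1", "open_access": True,
     "url": "https://repository.example.org/groningen/123"},
    {"doi": "10.1000/example.1", "open_access": False,
     "url": "https://repository.example.org/wageningen/456"},
    {"doi": "10.1000/example.2", "open_access": False,
     "url": "https://repository.example.org/utrecht/789"},
]

index = build_oa_index(records)
print(resolve(index, "10.1000/EXAMPLE.1"))  # one OA copy, held in Groningen
print(resolve(index, "10.1000/example.2"))  # no OA copy known
```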

Narcis got a new colour and typeface. It looks really nice now, but I look forward to bold steps towards improving the database: making it an essential part of the Dutch repository infrastructure and boosting the importance and relevance of the institutional repositories.

The costs for going Gold in the Netherlands

For a meeting of the Open Access work group of Dutch university libraries and the licenses work group of those same universities, I was asked to estimate the costs of a 100% Gold OA model for the Netherlands. In this blog post I want to explain how I arrived at the outcome of the current calculation, and so contribute to the discussion on this subject.

In the first slide I compare the Dutch output registered in the two most suitable databases for this research question: Scopus and Web of Science (WoS). To my own amazement, Scopus only covered more Dutch publications than WoS after 2004. For the calculation of the Article Processing Charges (APC) paid by the Dutch research community it is fair to concentrate on articles and reviews only. Editorials, letters and conference proceedings were therefore left out of the equation. For articles and reviews, Scopus already had the lead in 2003. Also striking in this graph is that WoS is slower in updating its database than Scopus, since the year 2013 is clearly trailing behind. Based on the presented graph, it is likely that we will see some 40,000 articles and reviews published by Dutch (co-)authors in 2014.

Since the Web of Science interface was renewed, an Open Access facet has been added to the search results. The facet identifies the journals covered by Web of Science that are registered in the DOAJ. The list of Open Access journals covered by Web of Science, i.e. the Open Access journals with an impact factor or those that will soon receive one, is freely available from Thomson Reuters. Because of this improved, though not perfect, OA identification, Web of Science was the database of choice for this exercise. In the second slide I show the increasing share of Open Access articles among the journal articles from the Netherlands covered by Web of Science. In 2013, 3,776 of 35,267 articles and reviews were published in Gold Open Access journals. That is 10.7%.

Looking in more detail at the share of open access articles from the Netherlands in graph 3, I distinguish two points of inflection. After 2004 the share of Open Access articles really took off. I guess that this has to do with the expanded coverage of Open Access journals by Web of Science. Since 2007 Web of Science really started to expand its journal coverage. The second point of inflection seems to be 2010, when PLoS ONE really started to become popular after it had received its first Impact Factor listing.

So far I have talked about the Dutch publications as if they were all produced by the universities. In actual fact, 87% of the output in 2013 was produced by universities and their academic hospitals, and 13% by other institutions.

Comparing the number of Open Access articles found in Web of Science with the refereed articles registered in Narcis, we see a big gap in the older years that closes in the most recent years. The gap is largely caused by green Open Access articles, hybrid Open Access articles, and Open Access articles published in journals not covered by Web of Science. The relative importance of these three factors still needs to be established. That the lines touch in 2014 is an indication that Gold Open Access is important in filling the repositories immediately, and that registering the Green articles in repositories actually takes some time, partly because of publishers’ embargoes.

Price information for Article Processing Charges (APC) can be found on the Eigenfactor website. Looking in detail at the articles published in 2013: 3314 articles were published in journals with APCs, and only 404 in journals without APCs. The average APC for the paid OA journals was € 1220,-. Taking the free journal articles into account as well, the average APC dropped to € 1087,-. All these prices are exclusive of VAT.
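
The step from € 1220,- to € 1087,- is a weighted average over the paid and free articles, which can be checked in a few lines of Python. The variable names are mine; the inputs are the figures quoted above, and since € 1220,- is itself a rounded average the total is approximate.

```python
paid_articles = 3314      # articles in journals that charge an APC
free_articles = 404       # articles in journals without an APC
average_paid_apc = 1220   # euro per article, excluding VAT

# Total spent on APCs, spread over paid and free articles together.
total_apc = paid_articles * average_paid_apc
average_all = total_apc / (paid_articles + free_articles)

print(f"Total APC spend in 2013: EUR {total_apc:,}")
print(f"Average APC over all OA articles: EUR {average_all:.0f}")
```

The total of roughly € 4 million matches the 2013 figure mentioned further on.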

The total costs for gold Open Access publishing for the Netherlands as covered by journals indexed in Web of Science increased nearly linearly from € 1.5 million in 2009 to just over € 4 million in 2013.

Over this five year period we paid quite substantial APCs to the following publishers. As was to be expected, most went to Springer/BMC and PLoS, followed by Oxford University Press. The European Geosciences Union journals are in fact published by Copernicus publishers in Germany. Frontiers was recently acquired by the Nature Publishing Group. The license work group really has a list to consider next to the ‘traditional’ big deals with the standard publishers. It would be wise to see if deals on APCs can be struck with Open Access publishers as well. Heather Morrison showed just the other day that we have had some steep price increases by BMC/Springer.

There are some points to consider. Not all research published by Dutch researchers is produced by Dutch researchers only. The Science, Technology and Innovation indicators show that some 50% of publications involve international collaboration. So for 50% of the articles Dutch authors do not always have to pay the full APC: it may be paid by the corresponding author from another country, the bill may be shared, or some other variation applies. Some research in this area is badly needed.
The APCs are another issue. The Eigenfactor collection was a good starting point, but is perhaps already a bit behind reality for some journals. Some publishers provide lists of all their journals, but these often lack sufficient metadata, e.g. the ISSN, to actually do something useful with them. In most cases, however, APCs are well hidden away, somewhere deep down in the instructions to authors for a single journal only. Publishers should be more transparent in this area.
Where the number of ‘Dutch’ articles might be an overestimation, the 21% VAT is not.
In WoS currently only 718 Open Access journals are indexed, out of the 9744 listed in DOAJ. That is an increase of 99 from the 619 OA journals I found in December 2010, but it is still a long way from the nearly 10,000 Open Access journals we know of. Of course WoS wants to, and should, cover only the top-tier journals, but there is more value in those 10,000 DOAJ journals than the current WoS selection reflects. In addition, WoS should find a way to indicate OA articles in Toll Access journals as well.

Having made these considerations, my estimate is that in 2014 some 40,000 articles and reviews will be published by Dutch researchers. Applying the average APC of € 1087,- I arrive at an estimated € 43,500,000,- for the Netherlands if all Dutch research were published in Gold Open Access journals. That figure should be compared to the current spending on journal subscriptions by Dutch universities, which is about € 34 million per year at the moment. Going for gold would therefore cost roughly € 9.5 million more (€ 43.5 million minus € 34 million). That is a lot of money.
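
The estimate itself is simple arithmetic, sketched below in Python with the rounded inputs quoted in this post. With these inputs the difference with current subscription spending comes out at roughly € 9.5 million. The VAT-inclusive line is an assumption on my part about how the 21% VAT mentioned earlier would apply; whether the subscription figure includes VAT is not stated, so the two are not strictly comparable.

```python
articles = 40_000             # expected Dutch articles and reviews in 2014
average_apc = 1087            # euro per article, excluding VAT
subscriptions = 34_000_000    # euro per year, current subscription spending
vat_rate = 0.21               # Dutch VAT rate mentioned in the text

gold_cost = articles * average_apc          # full Gold OA, excluding VAT
extra_cost = gold_cost - subscriptions      # on top of current spending
gold_cost_incl_vat = gold_cost * (1 + vat_rate)  # assumption: VAT on all APCs

print(f"Gold OA, excl. VAT:  EUR {gold_cost:,}")
print(f"Extra over subscriptions: EUR {extra_cost:,}")
print(f"Gold OA, incl. VAT:  EUR {gold_cost_incl_vat:,.0f}")
```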

A census of Open Access repositories in the Netherlands

Open Access receives a lot of attention in the Netherlands. All universities have formulated explicit OA policies and signed the Berlin OA declaration. Erasmus University Rotterdam has stipulated a mandated OA policy for its researchers. All Dutch universities have repositories in place and there is an overarching repository, narcis.nl, which harvests the repositories of all universities and major research institutions. The UNESCO Global Open Access Portal (GOAP) reported last year “Netherlands has a strong OA awareness and an active promotion of open access through institutional mandates, establishment of OA repositories, OA publishing agreements. SURFfoundation, a Dutch programme for information and communication technology innovation focuses on Open Access and it is the Dutch partner in Knowledge Exchange along with DFG (Germany), DEFF (Denmark) and JISC (UK)”. In 2011 some milestones were celebrated: the 250,000th Open Access publication was harvested by Narcis, and Wageningen UR deposited its 30,000th Open Access publication in Narcis, by which it became the largest depositing institution in Narcis.

Despite some early assessments (van Westrienen & Lynch, 2005), no recent analyses of the actual deposit rates by Dutch universities have been made, let alone a systematic analysis of trends in deposit rates. In this blog post I want to give a status update of deposits in Open Access repositories in the Netherlands, concentrating on the regular Dutch universities. I hope to follow this up next year to give insight into actual deposit rates.

Data collection
Narcis was used as the overarching repository for all OA publications from the Netherlands. Narcis makes it possible to estimate deposits per institution, document type and publication year in a uniform and efficient way for 27 repositories in the Netherlands. Data was collected from Narcis in the period from December 27th 2011 to January 2nd 2012. During that week no additional deposits to Narcis were made: the total number of deposits in Narcis was 270,519 Open Access items and did not change while the data was being retrieved.

Results
As mentioned under data collection an impressive number of 270,519 Open Access deposits have been harvested by Narcis from the 27 OA repositories in the Netherlands. In the following graph the distribution of total deposits over the 27 repositories in the Netherlands is shown.
[Figure: Total deposits in Narcis 2011]
The smallest repository is that of the Theological University of Kampen with only 4 deposits, and the largest that of Wageningen University with 30,704 deposits. The 13 regular universities in the Netherlands have the largest repositories as measured in Narcis. NWO, with 10,179 deposited items, has the largest repository in the group of non-universities (this group includes the Open University). The NWO repository is just a fraction smaller than the repository of Radboud University Nijmegen. Also indicated in the graph is the recency of the deposits. The share of deposits from recent (since 2006) publication years is indicated in red, whereas the blue part of the bars represents the deposits from the older (pre-2006) publication years. Of the regular universities, Wageningen UR and the VU University have the largest share of recent deposits, whereas TU Eindhoven and Tilburg University have the largest share of older publications.

The next graph looks in more detail at the Open Access deposits of the most recent publication years of the 13 Dutch universities. The deposits per publication year for the period 2006-2011 are depicted. In all cases deposits from the publication year 2011 trailed behind, which doesn’t come as a surprise. In a few cases however I observe clear negative trends in the number of deposits made during the period 2006-2011. This is clearly the case for the universities of Groningen, Leiden, Maastricht and Utrecht.
[Figure: OA deposits in Narcis by publication year 2006-2011]
The trend in deposits per publication year is more or less stable in Nijmegen and Twente. For the universities of Rotterdam, Delft, Eindhoven, University of Amsterdam, Tilburg, VU Amsterdam and Wageningen UR an increasing trend in deposits is observed. The VU Amsterdam shows a clear outlier in number of deposits for publication year 2009. About half of the universities have more than 1000 deposits per publication year. Rotterdam, Nijmegen, Eindhoven, Leiden, Maastricht and Tilburg are lagging behind in this respect. Wageningen UR has more than double the number of deposits per publication year compared to any other university.

[Figure: Yearly trends for the smaller institutions]
By far most of the smaller institutions have fewer than 100 open access deposits per publication year. NWO, NIVEL, KNAW and the Open University have on average between 100 and 300 open access deposits per publication year. It is interesting to note that the deposits for publication year 2011 are more in line with the preceding publication years than is the case for the general universities, an indication that it is easier for smaller institutions to manage their publication output.

In the next graph I looked at the document type breakdown of deposits for the period 2006-2011 for the regular universities. First of all, it should be noted that there is a large range of document types in Narcis. Some of these document types seem superfluous. The difference between Student thesis and Master thesis is entirely unclear, and technical documentation versus reports is another example. Narcis should look into this matter, and some universities should clean up their document types as well. Having said that, most universities have three major types of open access publications: articles, reports and PhD theses.
[Figure: OA deposits by publication type]
The VU University excels at OA article deposits over the last six years, followed by Groningen and Utrecht. Wageningen UR excels at depositing reports, followed at quite some distance by TU Eindhoven and the UvA. For PhD theses, Utrecht has the lead, followed by the VU and Delft. OA PhD theses are an important source of material since in most cases they consist of chapters that are preprints of articles to be published at a later date. Erasmus University Rotterdam, Maastricht and Tilburg are the universities with the largest share of working papers. Wageningen UR has a very large share of contributions to periodicals, a group of publications that has hardly any deposits at other universities. Looking at the overall picture, Wageningen UR clearly stands out as a result of its large share of reports and contributions to periodicals. On top of that it has the largest share of conference papers as well. It can easily be argued that Wageningen UR, of all repositories in the Netherlands, excels at disseminating grey literature by means of its open access repository Wageningen Yield.

At this moment there are no comparative repository usage statistics in the Netherlands, but early trial results indicate that repositories with more recent content also get more article downloads. It is still a bit too early to draw firm conclusions from the trial implementation of SURE2.

The share of OA in NL
The absolute numbers of OA deposits are not very meaningful in themselves as long as they are not related to the actual scientific output of the institutions. Although we have the current set of figures on OA deposits as measured through Narcis, the share of OA in total institutional output is a difficult figure to establish. A few institutions deposit metadata records of all their publications in Narcis, other institutions limit themselves to OA deposits only, and a third group deposits the metadata of only a subset of their publications. To arrive at figures for the full publication output we have to consult other sources. The VSNU would be an obvious source, but the disadvantage of its figures is that they are based on reporting years rather than publication years (a rather odd approach). A case in point is the PhD thesis output reported by the VSNU compared to the OA theses reported in Narcis over the period 2006-2010, shown in the following table.

| University | PhD theses (VSNU) | OA theses (Narcis) | Coverage |
|---|---|---|---|
| Erasmus University Rotterdam | 1524 | 993 | 65% |
| RU Nijmegen | 2266 | 1992 | 88% |
| RU Groningen | 1690 | 1082 | 64% |
| TU Delft | 1319 | 1079 | 82% |
| TU Eindhoven | 900 | 776 | 86% |
| Leiden University | 1791 | 919 | 51% |
| Maastricht University | 1367 | 1542 | 113% |
| Twente University | 1321 | 1077 | 82% |
| Utrecht University | 455 | 333 | 73% |
| University of Amsterdam | 1276 | 1297 | 102% |
| Tilburg University | 896 | 790 | 88% |
| VU Amsterdam | 878 | 772 | 88% |
| Wageningen UR | 1075 | 1032 | 96% |

At Maastricht University and the UvA there were actually more theses deposited in Narcis over the period 2006-2010 than reported to the VSNU. For individual years the fluctuations can be quite large, but over a period of consecutive years they become smaller. Apparently all theses defended at Maastricht and the UvA are available in OA. Wageningen follows closely with 96%, whereas Radboud University Nijmegen, TU Delft, TU Eindhoven, Twente University, Tilburg University and VU Amsterdam follow with shares of OA PhD theses between 80% and 90%. Erasmus University, RU Groningen, Leiden University and Utrecht University are lagging behind in depositing their PhD theses in OA.
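
The coverage column in the table above is simply the number of OA theses in Narcis divided by the number of theses reported by the VSNU. A small Python sketch that recomputes it from the table (the short university keys are my own shorthand):

```python
# (PhD theses reported by VSNU, OA theses in Narcis), 2006-2010, from the table above
theses = {
    "Rotterdam": (1524, 993),
    "Nijmegen": (2266, 1992),
    "Groningen": (1690, 1082),
    "Delft": (1319, 1079),
    "Eindhoven": (900, 776),
    "Leiden": (1791, 919),
    "Maastricht": (1367, 1542),
    "Twente": (1321, 1077),
    "Utrecht": (455, 333),
    "UvA": (1276, 1297),
    "Tilburg": (896, 790),
    "VU": (878, 772),
    "Wageningen": (1075, 1032),
}

# Coverage as a whole percentage; values over 100 mean that more theses were
# deposited in Narcis than were reported to the VSNU in the same period.
coverage = {name: round(100 * oa / vsnu) for name, (vsnu, oa) in theses.items()}

for name, pct in sorted(coverage.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<12} {pct:>4}%")
```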

Coverage of OA article output
For an actual estimate of articles produced per institution multiple sources exist. The VSNU figures based on reporting years are useless in this respect. The databases Scopus or Web of Science (WoS) could be used to estimate the actual article output per university, but disambiguating all the name variations of the universities (and their institutes or hospitals) is a cumbersome task. In this respect Scopus actually performs better than WoS. However, other sources based on either WoS or Scopus have already carried out this disambiguation. The reports by CWTS for example are useful in this matter. The most recent WTI2 report (Jager et al. 2011), the successor of the NOWT reports, gives figures for the publication output of Dutch universities for the period 2007-2010 (table 30, p. 48) that have been disambiguated by CWTS. These figures are derived from Web of Science and underestimate the actual peer reviewed article output. For a life sciences university such as Wageningen UR some 70% of the actual article output is published in journals covered by WoS and included in the WTI2 report. For broad, general universities with more social sciences and humanities this percentage is expected to be lower. For Tilburg this figure appears to be only 30%, whereas for Nijmegen it seems to be 51% and for TU Eindhoven 67%.

In table 2 the total number of articles for the period 2007-2010 reported in Narcis, the total number of articles according to CWTS (WTI2 report, Jager et al. (2011)) and the actual OA articles reported in Narcis are presented. The percentage OA coverage is calculated in two ways. In the first place we look at the %OA(CWTS) by comparing the OA articles in Narcis to the articles reported by CWTS. In the second place we look at the %OA(Narcis) by comparing the OA articles reported in Narcis to the total number of articles reported in Narcis. In the third percentage column we take the minimum value of both methods. This last column is probably the best estimate of %OA coverage per institution.

Table 2. Total articles per university for the period 2007-2010 reported in Narcis and in WTI2, and %OA coverage based on comparison with the CWTS figures and with the total articles registered in Narcis

| University | Articles in Narcis | Articles by CWTS | OA articles | %OA (CWTS) | %OA (Narcis) | Minimum %OA coverage |
|---|---|---|---|---|---|---|
| Erasmus University Rotterdam | 1072 | 10663 | 1072 | 10% | 100% | 10% |
| Radboud University Nijmegen | 19803 | 10126 | 1189 | 12% | 6% | 6% |
| RU Groningen | 4067 | 10461 | 4067 | 39% | 100% | 39% |
| TU Delft | 2150 | 6521 | 2145 | 33% | 100% | 33% |
| TU Eindhoven | 7041 | 4732 | 520 | 11% | 7% | 7% |
| Leiden University | 730 | 10616 | 730 | 7% | 100% | 7% |
| Maastricht University | 519 | 7086 | 482 | 7% | 93% | 7% |
| Twente University | 3665 | 3740 | 880 | 24% | 24% | 24% |
| Utrecht University | 4803 | 15243 | 3039 | 20% | 63% | 20% |
| University of Amsterdam | 16191 | 13030 | 2727 | 21% | 17% | 17% |
| Tilburg University | 5791 | 1782 | 1285 | 72% | 22% | 22% |
| VU Amsterdam | 5354 | 10912 | 4410 | 40% | 82% | 40% |
| Wageningen UR | 10572 | 7419 | 2479 | 33% | 23% | 23% |
| Aggregate | 81758 | 112331 | 25025 | 22% | 31% | 22% |
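
For those who want to check or repeat the calculation, the three percentage columns of Table 2 can be recomputed as in the Python sketch below. The function and variable names are mine; the counts are those of Table 2.

```python
# (articles in Narcis, articles by CWTS, OA articles in Narcis), 2007-2010
table2 = {
    "Erasmus University Rotterdam": (1072, 10663, 1072),
    "Radboud University Nijmegen": (19803, 10126, 1189),
    "RU Groningen": (4067, 10461, 4067),
    "TU Delft": (2150, 6521, 2145),
    "TU Eindhoven": (7041, 4732, 520),
    "Leiden University": (730, 10616, 730),
    "Maastricht University": (519, 7086, 482),
    "Twente University": (3665, 3740, 880),
    "Utrecht University": (4803, 15243, 3039),
    "University of Amsterdam": (16191, 13030, 2727),
    "Tilburg University": (5791, 1782, 1285),
    "VU Amsterdam": (5354, 10912, 4410),
    "Wageningen UR": (10572, 7419, 2479),
}


def oa_coverage(narcis_total, cwts_total, oa_articles):
    """Return (%OA against CWTS, %OA against Narcis, minimum of the two)."""
    pct_cwts = round(100 * oa_articles / cwts_total)
    pct_narcis = round(100 * oa_articles / narcis_total)
    return pct_cwts, pct_narcis, min(pct_cwts, pct_narcis)


results = {name: oa_coverage(*row) for name, row in table2.items()}

# The aggregate row is computed from the column totals, not from the row percentages.
totals = tuple(sum(column) for column in zip(*table2.values()))
aggregate = oa_coverage(*totals)

for name, (pct_cwts, pct_narcis, minimum) in results.items():
    print(f"{name:<30} {pct_cwts:>4}% {pct_narcis:>4}% {minimum:>4}%")
print(f"{'Aggregate':<30} {aggregate[0]:>4}% {aggregate[1]:>4}% {aggregate[2]:>4}%")
```

Taking the minimum guards against both kinds of underestimated denominator: the WoS-based output for universities with much work outside WoS journals, and the Narcis totals for universities that deposit OA articles only.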

Comparing the OA articles in Narcis for the period 2007-2010 with the figures from the CWTS report results in a very favourable figure of 72% of articles available in OA at Tilburg University. This favourable figure is largely due to the underestimation of Tilburg University’s article output when it is based on articles in WoS journals only. VU Amsterdam has the next highest %OA based on the CWTS figures (40%), followed closely by Groningen (39%). The aggregate figure for all universities in the Netherlands is that 22% of the articles are OA, based on WoS estimates of article output. Since WoS underestimates the actual article output, it is useful to look at the total number of articles in Narcis as well.

Looking at the self-deposited articles in Narcis, Erasmus University Rotterdam, RU Groningen, TU Delft and Leiden University only deposit OA articles in Narcis, whereas the other universities also deposit metadata for non-OA articles. However, coverage of this share of publications varies among universities. Radboud University Nijmegen and TU Eindhoven for instance, which already score low on %OA based on the CWTS figures, score even lower when their self-reported article output in Narcis is considered. In those instances where %OA(Narcis) is higher than %OA(CWTS), the metadata deposited in Narcis underestimates the actual article output.

The minimum %OA coverage reported in the third percentage column is the best estimate of OA coverage for universities in the Netherlands based on OA articles reported in Narcis. VU Amsterdam, RU Groningen and TU Delft are the most successful in making their article output available in OA. Their reported coverage lies clearly above the 20% of OA reported for most institutions without mandated OA policies (Harnad, 2009). Twente University, Utrecht University, Tilburg University, Wageningen UR and UvA are performing around the average of 22%; this percentage is in line with the figure of %OA for universities without mandated OA policies. Erasmus Rotterdam, RU Nijmegen, TU Eindhoven, Leiden University and Maastricht University, on the other hand, are underperforming in this respect. It remains a question whether the OA article numbers reported by Narcis are actually correct, or whether, in the case of Radboud University Nijmegen and TU Eindhoven, the total article output reported in Narcis is correct. It is possible that the document types actually include more than only peer reviewed scholarly articles.

Although all Dutch universities have signed the Berlin OA declaration, only a few universities have a substantially higher share of OA peer reviewed articles than is to be expected on the basis of a “normal” publication output, which results in about 20% of articles being published in OA. For the universities where I arrive at an even lower %OA, we have to wonder whether Narcis actually harvests and reports all of the university’s output.

Another valuable approach is to concentrate on the grey literature, as Wageningen UR does. But for this type of document it is even more difficult to arrive at a share of OA coverage. This can only be established by the institutions themselves, and it can be doubted whether all institutions have their output registration complete.

Lessons to be learned

  • Narcis could and should provide the type of reporting performed in this report, and produce overviews like this preferably twice a year.
  • Narcis should look into some of the obsolete document types to reduce the wild array of document types (is technical documentation different from reports? Student theses and master theses are probably not the type of research output to be registered in Narcis).
  • Institutions should look at the document types they deposit in Narcis as well.
  • The role of Narcis and the importance of OA could be improved if VSNU and Narcis (KNAW) make Narcis the standard reporting tool for research output registration in the Netherlands. (The VSNU should abandon the ridiculous reporting years and use publication years in its reports instead.)
  • Universities should use Metis (or a comparable CRIS) to upload all the metadata of the institutional output to Narcis.
  • Having comprehensive output registration makes the minimum goal of at least 20% in OA more attainable, since you are no longer dependent on actual article submission by the authors: OA versions can be chased down on the basis of Sherpa/Romeo and DOAJ.
  • Mandates such as the one in Rotterdam, announced at the beginning of 2011, have no effect whatsoever if there is no actual stick behind the policy.

References
Harnad, S. (2009) Waking OA’s Slumbering Giant: Why Locus-of-Deposit Matters for Open Access and Open Access Mandates. http://openaccess.eprints.org/index.php?/archives/522-Waking-OAs-Slumbering-Giant-Why-Locus-of-Deposit-Matters-for-Open-Access-and-Open-Access-Mandates.html
Jager, C.-J., J. Veldkamp, D. Aksnes, R. te Velde & P. den Hertog (2011). Wetenschaps-, Technologie & Innovatie Indicatoren 2011. Utrecht, Dialogic innovatie ● interactie http://www.rijksoverheid.nl/documenten-en-publicaties/rapporten/2011/11/15/wetenschaps-technologie-innovatie-indicatoren-2011.html.
Westrienen, G. van & C. A. Lynch (2005). Academic Institutional Repositories: Deployment Status in 13 Nations as of Mid 2005. D-Lib Magazine, 11(9). http://www.dlib.org/dlib/september05/westrienen/09westrienen.html

Allow me to introduce to you

A fellow Dutch library blogger, Jan Klerk, has just started a new library blog in English, called “Biebzone beta”. In his daily life Jan is a manager at the public library in Haarlem. He has built himself a nice reputation over the last couple of years as a thoughtful library blogger at his other blog, Jan Tweepuntnul (2.0 that is).

A quote from his current post illustrates this thoughtfulness perhaps a little:

It’s all about argument and counterargument. It’s about listening carefully and reading and writing carefully. 

I really appreciate his step to present some more of the wheelings and dealings of Dutch public libraries to a larger (international) audience. In this way the rest of the world can have a closer look at (public) library developments in the Netherlands.