Tag Archive for 'Open Access'

A census of Open Access repositories in the Netherlands

Open Access receives a lot of attention in the Netherlands. All universities have explicitly formulated OA policies and signed the Berlin OA declaration. Erasmus University Rotterdam has stipulated a mandated OA policy for its researchers. All Dutch universities have repositories in place and there is an overarching repository, narcis.nl, which harvests the repositories of all universities and major research institutions. The UNESCO Global Open Access Portal (GOAP) reported last year: “Netherlands has a strong OA awareness and an active promotion of open access through institutional mandates, establishment of OA repositories, OA publishing agreements. SURFfoundation, a Dutch programme for information and communication technology innovation focuses on Open Access and it is the Dutch partner in Knowledge Exchange along with DFG (Germany), DEFF (Denmark) and JISC (UK)”. In 2011 some milestones were celebrated: the 250,000th publication was harvested by Narcis, and Wageningen UR deposited its 30,000th publication in Narcis, which made it the largest depositing institution in Narcis.

Despite some early assessments (van Westrienen & Lynch, 2005), no recent analyses of the actual deposit rates by Dutch universities have been made, let alone a systematic analysis of trends in deposit rates. In this blogpost I want to give a status update of deposits in Open Access repositories in the Netherlands, concentrating on the regular Dutch universities. I hope to follow this up next year to give insight into actual deposit rates.

Data collection
Narcis was used as the overarching repository for all OA publications from the Netherlands. Narcis makes it possible to estimate deposits per institution, document type and publication year in a uniform and efficient way for 27 repositories in the Netherlands. Data were collected from Narcis between December 27th 2011 and January 2nd 2012. During that week no additional deposits were made to Narcis: the total stood at 270,519 items and did not change while the data were being retrieved.

Results
As mentioned under data collection, an impressive 270,519 deposits have been harvested by Narcis from the 27 OA repositories in the Netherlands. The following graph shows how the total deposits are distributed over these 27 repositories.
Figure: Total deposits in Narcis 2011
The smallest repository is that of the Theological University of Kampen with only 4 deposits and the largest that of Wageningen University with 30,704 deposits. The 13 regular universities in the Netherlands have the largest repositories as measured in Narcis. NWO, with 10,179 deposited items, has the largest repository in the group of non-universities (this group includes the Open University). The NWO repository is just a fraction smaller than the repository of Radboud University Nijmegen. The graph also indicates the recency of the deposits. The share of deposits from recent publication years (since 2006) is shown in red, whereas the blue part of the bars represents the deposits from older publication years (pre-2006). Of the regular universities, Wageningen UR and the VU University have the largest share of recent deposits, whereas TU Eindhoven and Tilburg University have the largest share of older publications.

The next graph looks in more detail at the deposits for the most recent publication years of the 13 Dutch universities. The deposits per publication year for the period 2006-2011 are depicted. In all cases deposits from publication year 2011 trail behind, which doesn’t come as a surprise. In a few cases, however, I observe clear negative trends in the number of deposits over the period 2006-2011. This is clearly the case for the universities of Groningen, Leiden, Maastricht and Utrecht.
Figure: OA deposits in Narcis by publication year 2006-2011
The trend in deposits per publication year is more or less stable in Nijmegen and Twente. For Rotterdam, Delft, Eindhoven, the University of Amsterdam, Tilburg, VU Amsterdam and Wageningen UR an increasing trend in deposits is observed. VU Amsterdam shows a clear outlier in the number of deposits for publication year 2009. About half of the universities have more than 1000 deposits per publication year. Rotterdam, Nijmegen, Eindhoven, Leiden, Maastricht and Tilburg are lagging behind in this respect. Wageningen UR has more than double the number of deposits per publication year of any other university.

Figure: Yearly trends for the smaller institutions (SI)
By far most of the smaller institutions have fewer than 100 open access deposits per publication year. NWO, NIVEL, KNAW and the Open University have on average between 100 and 300 open access deposits per publication year. It is interesting to note that the deposits for publication year 2011 are more in line with the preceding publication years than they are for the general universities, an indication that smaller institutions find it easier to manage their publication output.

In the next graph I look at the document type breakdown of deposits for the period 2006-2011 for the regular universities. First of all it should be noted that there is a large range of document types in Narcis. Some of these document types seem superfluous. The difference between Student thesis and Master thesis is entirely unclear, and technical documentation versus reports is another example. Narcis should look into this matter and some universities should clean up their document types as well. Having said that, most universities have three major types of open access publications: articles, reports and PhD theses.
Figure: OA deposits by publication type
The VU University excels at OA article deposits over the last six years, followed by Groningen and Utrecht. Wageningen UR excels at depositing reports, followed at quite some distance by TU Eindhoven and the UvA. For PhD theses Utrecht has the lead, followed by the VU and Delft. OA PhD theses are an important source of material, since in most cases they consist of chapters which are preprints of articles to be published at a later date. Erasmus University Rotterdam, Maastricht and Tilburg are the universities with the largest share of working papers. Wageningen UR has a very large share of contributions to periodicals, a group of publications with hardly any deposits at other universities. Looking at the overall picture, Wageningen UR clearly stands out as a result of its large share of reports and contributions to periodicals. On top of that it has the largest share of conference papers as well. It can easily be argued that Wageningen UR, of all repositories in the Netherlands, excels at disseminating grey literature by means of its open access repository Wageningen Yield.

At this moment there are no comparative repository usage statistics in the Netherlands, but the early trial results indicate that repositories with more recent content also get more article downloads. It is still a bit too early to draw firm conclusions from the trial implementation of SURE2.

The share of OA in NL
The absolute numbers of OA deposits are not very meaningful as long as they are not related to the actual scientific output of the institutions. Although we now have a set of figures on OA deposits as measured through Narcis, the share of OA in total institutional output is a difficult figure to establish. A few institutions deposit metadata records of all their publications in Narcis, other institutions limit themselves to OA deposits only, and a third group deposits the metadata of only a subset of their publications. To arrive at figures for the full publication output we have to consult other sources. The VSNU would be an obvious source, but the disadvantage of its figures is that they are based on reporting years rather than publication years (a rather odd approach). A case in point is the PhD thesis output reported by the VSNU compared to the OA theses reported in Narcis over the period 2006-2010, shown in the following table.

University                        VSNU   OA (Narcis)   Coverage
Erasmus University Rotterdam      1524           993        65%
RU Nijmegen                       2266          1992        88%
RU Groningen                      1690          1082        64%
TU Delft                          1319          1079        82%
TU Eindhoven                       900           776        86%
University Leiden                 1791           919        51%
University Maastricht             1367          1542       113%
University Twente                 1321          1077        82%
University Utrecht                 455           333        73%
University van Amsterdam          1276          1297       102%
University van Tilburg             896           790        88%
Vrije University Amsterdam         878           772        88%
Wageningen UR                     1075          1032        96%

At Maastricht University and the UvA more theses were actually deposited in Narcis over the period 2006-2010 than were reported to the VSNU. For individual years the fluctuations can be quite large, but over a period of consecutive years they become smaller. Apparently all theses defended at Maastricht and the UvA are available in OA. Wageningen follows closely with 96%, whereas Radboud University Nijmegen, TU Delft, TU Eindhoven, Twente University, Tilburg University and VU Amsterdam follow with percentages of OA PhD theses in the 80s. Erasmus University, RU Groningen, Leiden University and Utrecht University are lagging behind in depositing their PhD theses in OA.
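The coverage column is simply the number of OA theses in Narcis divided by the number of theses reported to the VSNU. A minimal Python sketch of that arithmetic, using the figures from the table (the function and variable names are only illustrative):

```python
# PhD theses 2006-2010 per university: (reported to the VSNU, OA theses in Narcis).
theses = {
    "Erasmus University Rotterdam": (1524, 993),
    "RU Nijmegen": (2266, 1992),
    "RU Groningen": (1690, 1082),
    "TU Delft": (1319, 1079),
    "TU Eindhoven": (900, 776),
    "University Leiden": (1791, 919),
    "University Maastricht": (1367, 1542),
    "University Twente": (1321, 1077),
    "University Utrecht": (455, 333),
    "University van Amsterdam": (1276, 1297),
    "University van Tilburg": (896, 790),
    "Vrije University Amsterdam": (878, 772),
    "Wageningen UR": (1075, 1032),
}


def coverage(vsnu: int, oa: int) -> int:
    """OA theses as a rounded percentage of the theses reported to the VSNU."""
    return round(100 * oa / vsnu)


coverage_by_university = {name: coverage(v, o) for name, (v, o) in theses.items()}

# Universities with more OA theses in Narcis than theses reported to the VSNU.
over_100 = sorted(name for name, pct in coverage_by_university.items() if pct > 100)

# Print the universities from highest to lowest coverage.
for name, pct in sorted(coverage_by_university.items(), key=lambda item: -item[1]):
    print(f"{name:30s} {pct:4d}%")
```

Percentages above 100 are not an error in the calculation: they reflect the mismatch between VSNU reporting years and Narcis publication years.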

Coverage of OA article output
Multiple sources exist for estimating the number of articles produced per institution. The VSNU figures, based on reporting years, are useless in this respect. The databases Scopus or Web of Science (WoS) could be used to estimate the actual article output per university, but disambiguating all the name variations of the universities (and their institutes or hospitals) is a cumbersome task. In this respect Scopus actually performs better than WoS. However, other sources based on either WoS or Scopus have already carried out this disambiguation. The reports by CWTS, for example, are useful in this matter. The most recent WTI2 report (Jager et al. 2011), the successor of the NOWT reports, gives figures for the publication output of Dutch universities for the period 2007-2010 (table 30, p. 48) that have been disambiguated by CWTS. These figures are derived from Web of Science and underestimate the actual peer reviewed article output. For a life sciences university such as Wageningen UR some 70% of the actual article output is published in journals covered by WoS and included in the WTI2 report. For broad, general universities with more social sciences and humanities this percentage is expected to be lower. For Tilburg this figure appears to be only 30%, whereas for Nijmegen it seems to be 51% and for TU Eindhoven 67%.

Table 2 presents the total number of articles for the period 2007-2010 reported in Narcis, the total number of articles according to CWTS (WTI2 report, Jager et al. (2011)) and the actual OA articles reported in Narcis. The percentage OA coverage is calculated in two ways. First, %OA(CWTS) compares the OA articles in Narcis to the articles reported by CWTS. Second, %OA(Narcis) compares the OA articles in Narcis to the total number of articles reported in Narcis. The third percentage column gives the minimum of both values. This last column is probably the best estimate of %OA coverage per institution.
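As a rough illustration of these three percentages, here is a small Python sketch applied to three rows of Table 2. The function name `oa_coverage` and the data layout are assumptions made for this example only.

```python
# Articles 2007-2010 per university, taken from Table 2:
# (articles in Narcis, articles according to CWTS, OA articles in Narcis).
articles = {
    "Radboud University Nijmegen": (19803, 10126, 1189),
    "University van Tilburg": (5791, 1782, 1285),
    "VU Amsterdam": (5354, 10912, 4410),
}


def oa_coverage(narcis_total: int, cwts_total: int, oa: int) -> tuple[int, int, int]:
    """Return (%OA vs CWTS, %OA vs Narcis, minimum of the two), rounded."""
    pct_cwts = round(100 * oa / cwts_total)
    pct_narcis = round(100 * oa / narcis_total)
    return pct_cwts, pct_narcis, min(pct_cwts, pct_narcis)


results = {name: oa_coverage(*row) for name, row in articles.items()}

for name, (pct_cwts, pct_narcis, pct_min) in results.items():
    print(f"{name:30s} CWTS {pct_cwts:3d}%  Narcis {pct_narcis:3d}%  minimum {pct_min:3d}%")
```

Taking the minimum guards against an inflated percentage whenever one of the two denominators undercounts the real article output, as happens for Tilburg with the WoS-based CWTS figure.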

Table 2. Total articles per university for the period 2007-2010 reported in Narcis and in the WTI2 report (CWTS), and %OA coverage based on comparison with the CWTS figures and with the total articles registered in Narcis

University                      Articles  Articles        OA     %OA       %OA       Minimum
                               in Narcis   by CWTS  articles  (CWTS)  (Narcis)  %OA coverage
Erasmus University Rotterdam        1072     10663      1072     10%      100%           10%
Radboud University Nijmegen        19803     10126      1189     12%        6%            6%
RU Groningen                        4067     10461      4067     39%      100%           39%
TU Delft                            2150      6521      2145     33%      100%           33%
TU Eindhoven                        7041      4732       520     11%        7%            7%
University Leiden                    730     10616       730      7%      100%            7%
University Maastricht                519      7086       482      7%       93%            7%
University Twente                   3665      3740       880     24%       24%           24%
University Utrecht                  4803     15243      3039     20%       63%           20%
University van Amsterdam           16191     13030      2727     21%       17%           17%
University van Tilburg              5791      1782      1285     72%       22%           22%
VU Amsterdam                        5354     10912      4410     40%       82%           40%
Wageningen UR                      10572      7419      2479     33%       23%           23%
Aggregate                          81758    112331     25025     22%       31%           22%

Comparing the OA articles in Narcis for the period 2007-2010 with the figures from the CWTS report results in a very favourable figure of 72% of the articles available in OA at Tilburg University. This favourable figure is largely due to the underestimation of Tilburg University’s article output when only articles in WoS-covered journals are counted. VU Amsterdam has the next highest %OA based on the CWTS figures (40%), followed closely by Groningen (39%). The aggregate figure for all universities in the Netherlands is that 22% of the articles are OA, based on WoS estimates of article output. Since WoS underestimates the actual article output, it is useful to look at the total number of articles in Narcis as well.

Looking at the self-deposited articles in Narcis, Erasmus University Rotterdam, RU Groningen, TU Delft and Leiden University deposit only OA articles in Narcis, whereas the other universities also deposit metadata for non-OA articles. However, the coverage of this share of publications varies among universities. Radboud University Nijmegen and TU Eindhoven, for instance, which already score low on %OA based on the CWTS figures, score even lower when their self-reported article output in Narcis is considered. In those instances where %OA(Narcis) is higher than %OA(CWTS), the metadata deposited in Narcis underestimate the actual article output.

The minimum %OA coverage reported in the third percentage column is the best estimate of OA coverage for universities in the Netherlands based on OA articles reported in Narcis. VU Amsterdam, RU Groningen and TU Delft are the most successful in making their article output available in OA. Their reported coverage lies clearly above the 20% OA reported for most institutions without mandated OA policies (Harnad, 2009). Twente University, Utrecht University, Tilburg University, Wageningen UR and the UvA are performing around the average of 22%, a percentage in line with the %OA figure for universities without mandated OA policies. Erasmus Rotterdam, RU Nijmegen, TU Eindhoven, Leiden University and Maastricht University, on the other hand, are underperforming in this respect. It remains a question whether the OA article numbers reported by Narcis are actually correct or whether, in the case of Radboud and TU Eindhoven, the total article output reported in Narcis is correct. It is possible that the document types actually include more than only peer reviewed scholarly articles.

Although all Dutch universities have signed the Berlin OA declaration, only a few of them have substantially higher shares of OA peer reviewed articles than is to be expected on the basis of a “normal” publication output, which results in about 20% of articles being published in OA. For the universities where I arrive at even lower %OA figures we have to wonder whether Narcis actually harvests and reports all of the universities’ output.

Another valuable approach is to concentrate on grey literature, as Wageningen UR does. But for this type of document it is even more difficult to arrive at a share of OA coverage. This share can only be established by the institutions themselves, and even then it can be doubted whether all institutions have a complete registration of their output.

Lessons to be learned

  • Narcis could and should improve its reporting on the kind of figures compiled in this report. It should produce overviews like this, preferably twice a year.
  • Narcis should look into some of the obsolete document types to reduce the wild array of document types (is technical documentation different from reports? Student theses and master theses are probably not the type of research output to be registered in Narcis).
  • Institutions should look at the document types they deposit in Narcis as well.
  • The role of Narcis and the importance of OA could be improved if VSNU and Narcis (KNAW) make Narcis the standard reporting tool for research output registration in the Netherlands. (The VSNU should abandon the ridiculous reporting years and use publication years in its reports instead.)
  • Universities should use Metis (or a comparable CRIS) to upload all the metadata of the institutional output to Narcis.
  • Having comprehensive output registration makes the minimum goal of at least 20% in OA more attainable, since you no longer depend on authors actually submitting their articles: on the basis of Sherpa/Romeo and DOAJ, OA versions can be chased down.
  • Mandates such as those in Rotterdam, announced at the beginning of 2011, have no effect whatsoever if there is no actual stick behind the policy.

References
Harnad, S. (2009). Waking OA’s Slumbering Giant: Why Locus-of-Deposit Matters for Open Access and Open Access Mandates. http://openaccess.eprints.org/index.php?/archives/522-Waking-OAs-Slumbering-Giant-Why-Locus-of-Deposit-Matters-for-Open-Access-and-Open-Access-Mandates.html
Jager, C.-J., J. Veldkamp, D. Aksnes, R. te Velde & P. den Hertog (2011). Wetenschaps-, Technologie & Innovatie Indicatoren 2011. Utrecht, Dialogic innovatie ● interactie. http://www.rijksoverheid.nl/documenten-en-publicaties/rapporten/2011/11/15/wetenschaps-technologie-innovatie-indicatoren-2011.html
Westrienen, G. van & C. A. Lynch (2005). Academic Institutional Repositories: Deployment Status in 13 Nations as of Mid 2005. D-Lib Magazine, 11(9). http://www.dlib.org/dlib/september05/westrienen/09westrienen.html

Journals changing publisher, but can the rights change as well?

From my perspective as a repository manager I do like the Cambridge University Press journals a lot. Although there is no immediate OA, after a year the author is allowed to post the publisher’s version/PDF of his article in an institutional repository. We adhere to this policy on behalf of our authors, so we post all articles published in Cambridge journals to our repository after this 12 month embargo period. Sounds simple and it actually is that simple.

Recently I ran a check on this policy using the DOIs, rather than the ISSNs we normally use, in the metadata we collect of all our researchers’ publications. The DOI strings of Cambridge journal articles all start with the prefix 10.1017. I came across an article published in the journal Animal Conservation in 2004 which was not OA in our repository. Checking this article further, I found out that its DOI resolved not to the Cambridge website but to the Wiley Online Library, where the article came online only in 2006. Rather odd. When I checked the copyright and archiving policy of this journal at the Sherpa Romeo site, it referred to the rather limited Wiley copyright and self-archiving possibilities for this journal. Sherpa Romeo implies that these apply to all content of this journal. I was rather disappointed.
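A check like this boils down to filtering the publication metadata on the DOI prefix. The sketch below is not our actual repository workflow: the record layout, the field names `doi` and `open_access`, and the placeholder DOIs are all invented for illustration. Only the prefix 10.1017 comes from the text above.

```python
CAMBRIDGE_PREFIX = "10.1017"  # DOI prefix of Cambridge journal articles

# Common ways a DOI is written in metadata; stripped before comparing prefixes.
DOI_LEADS = ("https://doi.org/", "http://doi.org/",
             "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def doi_prefix(doi: str) -> str:
    """Return the part of a DOI before the first slash, e.g. '10.1017'."""
    doi = doi.strip()
    for lead in DOI_LEADS:
        if doi.lower().startswith(lead):
            doi = doi[len(lead):]
            break
    return doi.split("/", 1)[0].strip()


def missing_from_repository(records: list[dict]) -> list[dict]:
    """Records with a Cambridge DOI prefix that are not yet OA in the repository."""
    return [
        record for record in records
        if record.get("doi")
        and doi_prefix(record["doi"]) == CAMBRIDGE_PREFIX
        and not record["open_access"]
    ]


# Hypothetical records; the DOIs are placeholders, not real articles.
records = [
    {"title": "Placeholder article 1", "doi": "10.1017/PLACEHOLDER-0001", "open_access": True},
    {"title": "Placeholder article 2", "doi": "doi:10.1017/PLACEHOLDER-0002", "open_access": False},
    {"title": "Placeholder article 3", "doi": "10.9999/placeholder-0003", "open_access": False},
]

to_check = missing_from_repository(records)
print([record["title"] for record in to_check])
```

Filtering on the prefix rather than the ISSN is what surfaces cases like this one: the prefix is assigned when the DOI is registered and stays with the article, whatever site the DOI resolves to today.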

However, that Cambridge DOI bothered me, so I checked the Cambridge site for the journal and could find the article there as well. The DOI, however, resolves to the Wiley online journals site. Clearly the journal changed publisher; that happens all the time. But with the change of publisher it appears that the authors’ rights changed as well, especially since the Wiley site also hosts the complete backfile going back to Volume 1, issue 1. The authors’ self-archiving rights cannot have changed with the change of publisher, because they are based on the original publishing agreement, yet the Wiley site and Sherpa Romeo do imply that the Wiley copyright and self-archiving policies apply to all content of the journal. That can’t be true, can it? But here I have an article hosted at two publishers’ websites with two very different self-archiving policies.

Of course we adhere to the Cambridge self-archiving policy for this article. There is therefore now a third copy of this article available on the Web, proudly presented in Wageningen Yield.

These are strange ways of publishers and copyrights.

How Google could help the Open Access world a little

It was back in 2008 that Google Scholar launched the feature that identifies freely available versions of articles on the Web. In the early days these were indicated by green triangles in front of the reference. Nowadays freely available copies are listed in the right hand column. Many of these versions are Open Access versions of articles properly submitted to preprint servers and subject or institutional repositories. Other free versions of the papers identified by Google Scholar are publishers’ versions of articles posted to personal websites, dropboxes and you name it. Whatever the rights are, if you need a copy of these papers and don’t have access through your university library’s subscriptions, this Google Scholar feature is a very useful tool. In scholarly search classes I always stress this feature of Google Scholar to my students.

In our institution’s bibliography I would love to include functionality that refers, for each article, to the so-called document clusters in Google Scholar. Consider the following publication: the link to the full text included in the record leads you to Science Direct. Whether you can access the paper on SD depends on the subscriptions. Sometimes you can’t. Therefore it would be nice if we could include a link to the document cluster in Google Scholar. For this paper you get some 29 versions of the paper, and above all 6 of these are free versions posted on various websites. That’s really helpful.

In AgrisWeb, as I learned from Johannes Keizer yesterday, they link to Google through a search for the title words. This works quite well, but it could be done better.

Consider the idea that Google Scholar had an API, and that we could query that API on the basis of the DOI or PMID, or the ISSN in combination with volume, issue and pages, or any other combination of standard bibliographic metadata. Yes, something like an OpenURL. If Google Scholar then returned only the correct Google Scholar ID for that article (that number 12564475196117890153 in the link), we could construct various links. Linking to the Google Scholar document cluster is one. Retrieving the Google Scholar citations is another.
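Once such an ID is known, building the links is the easy part. The Python sketch below assumes the `cluster` and `cites` query parameters as they can be seen in Google Scholar’s own result links; this is an observation of the website, not a documented API, and the function names are made up for this example. The missing piece remains the lookup from DOI, PMID or other metadata to the ID.

```python
from urllib.parse import urlencode

SCHOLAR_BASE = "https://scholar.google.com/scholar"


def cluster_link(scholar_id: str) -> str:
    """Link to all versions of a paper (the document cluster) for a Scholar ID."""
    return f"{SCHOLAR_BASE}?{urlencode({'cluster': scholar_id})}"


def cited_by_link(scholar_id: str) -> str:
    """Link to the papers citing the paper with this Scholar ID."""
    return f"{SCHOLAR_BASE}?{urlencode({'cites': scholar_id})}"


# The ID mentioned in the text above.
scholar_id = "12564475196117890153"
print(cluster_link(scholar_id))
print(cited_by_link(scholar_id))
```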

That Google doesn’t like metadata too much is an often heard argument. But the Google Books API works swell with ISBN numbers, OCLC numbers or LOC numbers. That API is talking metadata. Libraries are massive stores of metadata. So Anurag Acharya, please. Pleas for a Google Scholar API abound, mostly for retrieval of citations, but for the OA movement those document clusters are really more important! Perhaps you could launch this Google Scholar API as a present for the Open Access week coming up in October?

The Impact Factor of Open Access journals

In the world of Open Access publishing the golden road has received a great deal of attention. At least this is what our researchers seem to remember. Of course there are other roads to open access, but here I want to present the impact factors of the journals facilitating the golden road. This blogpost lists all open access journals included in DOAJ that were assigned a Journal Impact Factor in the JCR 2009. The reason for this is that our researchers see publishing in open access journals as the simplest way of achieving open access to their work, while on the other hand the assessment of their citation impact requires them to publish in journals covered by Web of Science and therefore by the Journal Citation Reports (JCR).

In the past there have been studies on the citation impact of the open access journals that have actually received a journal impact factor from Thomson Reuters Scientific (formerly ISI). The first was by McVeigh (2004), followed by Vanouplines and Beullens (2008) (in Dutch, and not openly accessible) and recently by Giglia (2010). These consecutive studies showed an increasing number of open access journals that received a Journal Impact Factor from Thomson Reuters. McVeigh reported 239 OA journals for the JCR 2004, Vanouplines reported 295 OA journals for the JCR 2005 and Giglia reported 385 OA journals for the JCR 2008 (there are some methodological issues that make these figures not entirely comparable).

The pitfall of these studies is that, although they showed interesting figures and additional analyses, none of them actually published the list of open access journals that received an impact factor. The sole purpose of this blogpost is to publish this actual list. The probable reason the previous authors did not do so is that impact factors are proprietary information of Thomson Reuters: you are not allowed to publish these figures. On the other hand, most publishers use them in all the marketing for their journals, so the journal impact factor is virtually information in the public domain.

To avoid any intellectual property problems with Thomson Reuters I have included the ScimagoJR and Scopus SNIP indicators for the journals rather than the Journal Impact Factor. For this set of journals the correlation between SNIP and IF was 0.94 and between SJR and IF 0.96. In total 619 journals from DOAJ were present in the JCR 2009 report (Science and Social Science & Humanities versions, deduplicated). The growth in the number of journals is due to the growth in OA journals and the significant expansion of journal coverage in 2008. On the other hand, looking at the list of Scopus indexed journals, I note that it includes some 1365 open access journals which have a ScimagoJR or SNIP.

For the current table I matched the journal list from DOAJ, downloaded on December 13th 2010, with the deduplicated list of journals indexed in the JCR 2009. The resulting set of 619 journals was matched against the journal list from journalmetrics.com to include the ScimagoJR 2009 and SNIP 2009 as well. For each journal the subject categories indicated by DOAJ were included. The journals were sorted alphabetically on subject and by descending IF within a subject. Journals with multiple subject assignments in DOAJ were included in each of their categories, which expanded the list to 782 lines. Finally the column with impact factors was removed, showing only the ScimagoJR and SNIP for the journals. A few journals that were assigned a Journal Impact Factor were not assigned a ScimagoJR or SNIP. In some cases this was due to differences in journal coverage between Scopus and Web of Science, but in a few cases it appears to be a problem of different ISSN assignments by the respective databases.
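The matching steps above can be sketched in a few lines of Python. This is an illustration of the procedure under assumptions, not the script actually used: it assumes the lists are matched on ISSN, and the journal titles, ISSNs and metric values below are invented placeholders.

```python
def normalise_issn(issn: str) -> str:
    """Strip hyphens and spaces and uppercase, so '0000-000x' matches '0000000X'."""
    return issn.replace("-", "").replace(" ", "").upper()


def build_table(doaj: list[dict], jcr_if: dict[str, float],
                metrics: dict[str, tuple]) -> list[tuple]:
    """Match DOAJ journals to JCR, add SJR/SNIP, one row per subject, IF removed."""
    rows = []
    for journal in doaj:
        issn = normalise_issn(journal["issn"])
        if issn not in jcr_if:
            continue  # OA journal without an impact factor: not in the list
        sjr, snip = metrics.get(issn, (None, None))  # may be missing in Scopus
        for subject in journal["subjects"]:  # multi-subject journals repeat
            rows.append((subject, jcr_if[issn], journal["title"], sjr, snip))
    # Alphabetical on subject, descending impact factor within a subject.
    rows.sort(key=lambda row: (row[0], -row[1]))
    # Drop the impact factor column before publishing.
    return [(subject, title, sjr, snip) for subject, _if, title, sjr, snip in rows]


# Invented placeholder data; keys of jcr_if and metrics are normalised ISSNs.
doaj = [
    {"title": "Journal A", "issn": "0000-0001", "subjects": ["Agriculture", "Biology"]},
    {"title": "Journal B", "issn": "0000-0002", "subjects": ["Biology"]},
    {"title": "Journal C", "issn": "0000-0003", "subjects": ["Biology"]},
]
jcr_if = {"00000001": 1.5, "00000002": 3.0}
metrics = {"00000001": (0.10, 0.80)}

table = build_table(doaj, jcr_if, metrics)
for row in table:
    print(row)
```

Matching on a single normalised ISSN also shows where journals can drop out: if Scopus and Web of Science register a journal under different ISSNs (print versus electronic, for instance), the lookup in `metrics` fails even though the journal is covered.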

Download: List of open access journals that are assigned an Impact Factor in the JCR 2009 showing their respective SNIP and ScimagoJR for 2009.

Have fun with this list

References

Giglia, E. (2010). The Impact Factor of Open Access journals: data and trends. ELPUB 2010 International Conference on Electronic Publishing, Helsinki (Finland), 16-18 June 2010. http://dhanken.shh.fi/dspace/bitstream/10227/599/72/2giglia.pdf and http://hdl.handle.net/10760/14666.

McVeigh, M.E. (2004). Open Access Journals in the ISI Citation Databases: Analysis of Impact Factors and Citation Patterns A citation study from Thomson Scientific, Thomson Scientific. http://science.thomsonreuters.com/m/pdfs/openaccesscitations2.pdf

Vanouplines, P. & R. Beullens (2008). De impact van open access tijdschriften. IK Intelectueel Kapitaal 7(5): 14-17. (In Dutch, Not OA available)


Open Access: Just Publish

I do sincerely apologize for this boring video; a few talking heads are not the right medium to get a message across. An important message, that is. But I couldn’t find any palatable alternatives on YouTube. Has nobody tried to make an attractive, short film on this subject? Anyway, here are a couple of bigshots from the Dutch university world getting across the message on the importance of Open Access. They talk in Dutch, but this version has English subtitles.

Self citations do work

In a very extensive article van Raan has studied the effect of self-citations on the total citations to a group’s work. In the concluding paragraph van Raan writes:

[] external citations are enhanced by self-citations, so that we have the “chain reaction:” Larger size leads to more self-citations, which lead to more external citations. This mechanism is strongest for the lower impact journals—they “make size work”—as well as for higher performance groups. In other words, lower impact journals enable research groups more than do higher impact journals to “advertise” their other work by means of self-citations.

Most interesting to note about this article is that van Raan cited himself 11 times out of 28 references in total. It may seem a bit excessive, but it stresses his point excellently.

Another point that I always stress within the theme of publication strategy is to consider Open Access publishing. Over the last few years I have noted that van Raan is publishing his articles in OA on arXiv. His group has not (yet) scientifically demonstrated the advantage of OA publishing for citation impact, but the master of scientometrics is putting it into practice anyway. Something to be considered very seriously by every researcher.

Reference
van Raan, A. F. J. (2008). Self-citation as an impact-reinforcing mechanism in the science system. Journal of the American Society for Information Science and Technology, 59(10): 1631-1643. http://arxiv.org/ftp/arxiv/papers/0801/0801.0524.pdf

The changing face of Elsevier Science

The last couple of days I had the pleasure of attending the Elsevier Development Partners meeting. The exact products they are working on might be of interest to some people, but that’s up to Elsevier to announce. What was really the big surprise at this meeting, which lasted 3 days, was the tone from Elsevier. It was all about open science. They clearly wanted to open up. There was a lot of talk about sharing information, making mash-ups possible and application programming interfaces (APIs). Elsevier Science wanted to move away from the double barred information silo to become an open solution provider in the scholarly world. If Elsevier is thinking and acting in this direction, then change will become a major issue for the entire scientific publishing industry, and that is good news for libraries that want to remain a vital service in the future as well.

This change will take time. It doesn’t happen overnight. But Raphael Sidi just announced the other day on his blog the Elsevier Article API at the Programmable Web. So Elsevier is not only talking, it is acting on it as well.

Let other publishers follow this example!

Google and the academic Deep Web

Hagedorn and Santelli (2008) just published an interesting article on the comprehensiveness of Google’s indexing of academic repositories. This article triggers me to write up some observations I had been intending to make for quite some time already. It addresses the question I got from a colleague of mine, who observed that the deep web apparently doesn’t exist anymore.

Google has made a start on indexing Flash files. Google has made a start on retrieving information that is hidden behind search forms on the web, i.e. it has started to index information contained in databases. Google and OCLC exchange information on books scanned and those contained in WorldCat. Google, so it seems, has indexed the Web comprehensively with 1 trillion indexed webpages. Could there possibly be anything more to be indexed?

The article by Hagedorn and Santelli shows convincingly that Google still has not indexed all information that is contained in OAIster, the second largest archive of open access article information. Only Scientific Commons is more comprehensive. They tested this with the Google Research API using the University Research Program for Google Search. They only checked whether the URL was present. This approach only partially reveals the depth of the Academic Deep Web. Those figures are staggering already, but reality bites even more.
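A URL-presence check of this kind boils down to a simple coverage calculation. The sketch below is only an illustration in Python: the function name is hypothetical, and a plain set of URLs stands in for the search index, whereas the study itself queried Google.

```python
def index_coverage(repository_urls, indexed_urls):
    """Count how many repository URLs are present in an index.

    Both arguments are iterables of URL strings. `indexed_urls` is a
    stand-in for a real search index (an assumption for illustration).
    Returns a tuple (found, total, fraction).
    """
    urls = list(repository_urls)
    indexed = set(indexed_urls)
    found = sum(1 for url in urls if url in indexed)
    total = len(urls)
    # Avoid dividing by zero when the repository list is empty.
    fraction = found / total if total else 0.0
    return found, total, fraction
```

Note that such a check says nothing about whether the full text behind a URL can actually be found, which is the point of the example that follows.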

A short while ago I taught a Web Search class for colleagues at the University Library at Leiden. To demonstrate what the Deep or Invisible Web actually constitutes I used an example from their own repository. It was a thesis on Cannabis from last year, deposited as one huge PDF of 14 MB. Using Google you can find the metadata record. With Google Scholar as well. However, if you search for a quite specific sentence from the opening pages of the actual PDF file, Google does not return the sought-after thesis. You find three other PhD dissertations, two of them defended at the same university on that same day, but not the one on Cannabis.

Interestingly, you are able to find parts of the thesis in Google Scholar, e.g. chapter 2, chapter 3, etc. But those are the parts of the thesis that have been published elsewhere as articles in scholarly journals. Unfortunately, none of these parts in Google Scholar refers back to the original thesis that is in Open Access, nor have they been posted as OA journal article pre-prints in the Leiden repository. In Google Scholar most of the material is still behind toll gates at publishers’ websites.

Is Google to blame for this incomplete indexing of repositories? Hagedorn and Santelli do indeed point the finger at Google. However, John Wilkin, a colleague of theirs, doesn’t agree. Neither did Lorcan Dempsey. And neither do I.

I have taken an interest in the new role of librarians. We are no longer solely responsible for bringing external documentary resources into the realm of our academic clientele. We also have the dear task of bringing the fruits of their labour as well as possible into the floodlights of the external world, be it for academic or plain lay interest. We have to bring the information out there. Open Access plays an important role in this new task. But that task doesn’t stop at simply making it available on the Web.

Making it available is only a first, essential step. Making it rank well is a second, perhaps even more important step. So as librarians we have to become SEO experts. I have mentioned this here before, as well as on my Dutch blog.

So what to do about this chosen example from the Leiden repository? Well, there is actually a slew of measures that should be taken. First, of course, is to divide the complete thesis into parts, at chapter level. Admittedly, publishers give permission to republish the articles, of which most theses in the natural sciences in the Netherlands consist, only when the thesis is published as a whole. On the other hand, nearly 95% of the publishers allow publication of pre-prints and peer-reviewed post-prints: the so-called RoMEO green road. So it is up to the repository managers, preferably with the consent of the PhD candidate, to split the thesis into its parts (the chapters, which are the pre-prints or post-prints of articles) and archive the thesis at chapter level as well. The record for the thesis then contains links to the individual chapters deposited elsewhere in the repository, which offers far more digestible chunks of information that are more palatable to search engine spiders and crawlers.
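To make the idea of a thesis record that links to separately deposited chapters concrete, here is a minimal sketch in Python. The class and field names are hypothetical and do not reflect the schema of any actual repository software.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChapterRecord:
    """One chapter deposited as its own item (hypothetical structure)."""
    title: str
    url: str       # link to the chapter's own record in the repository
    version: str   # "pre-print" or "post-print"


@dataclass
class ThesisRecord:
    """The record for the thesis as a whole, linking to its chapters."""
    title: str
    url: str
    chapters: List[ChapterRecord] = field(default_factory=list)

    def chapter_links(self) -> List[str]:
        # The links a crawler can follow from the thesis record.
        return [chapter.url for chapter in self.chapters]

    def deposit_count(self) -> int:
        # The thesis itself plus every chapter archived separately.
        return 1 + len(self.chapters)
```

A thesis with four chapters archived this way counts as five deposits instead of one, and each chapter becomes a small document that a crawler can fetch on its own.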

An interesting side effect of this additional effort on the repository side is that deposit rates will increase considerably. This applies to most universities in the Netherlands, and to our collection of theses as well. Since PhD students are responsible for the lion’s share of academic research at the university, depositing the individual chapters as article pre-prints in the repository will be of major benefit to the OA performance of the university. It will require more labour on the side of repository management, but if we take this seriously it is well worth the effort.

We still have to work really hard on the visibility of the repositories, but making the information more palatable is a good start.

Reference:
Hagedorn, K. and J. Santelli (2008). Google still not indexing hidden web URLs. D-Lib Magazine 14(7/8). http://www.dlib.org/dlib/july08/hagedorn/07hagedorn.html

ELAG2008 : Can the library be a publisher ?

A presentation by Leo Waaijers on open access at the university. His presentation was used at the Dutch congress celebrating the opening of the library in February, and is therefore already available.

Leo takes the research article as an example and explains the publishing and peer review process, in which authors normally pay by handing over their copyrights. In a newer model authors pay in cash for the review process. In brief, these are the two publishing models.

The quality construct of academic journals is grounded in impact factors, and impact factors are debated, to say the least. On the latter he quotes Michael Mabe from Elsevier:

“Extending the use of the journal impact factor from the journal to the authors of papers in the journal is highly suspect; ……[impact factors] are not a direct measure of quality and must be used with considerable care.”

He shows us the Sherpa/Romeo categorization of copyright contracts. Researchers want their articles to be published in high-impact journals that have a high circulation, and to be easily reused and presented on websites and CVs. Preservation also matters to researchers.

According to Leo it is time to act. The publishers won’t act. Authors, research funders and policy makers are acting or have already acted. In his PowerPoint Leo mentions (and links to) many of these statements.

Leo then drafts a call for proposals for Wageningen University as follows.

“Annually, WUR produces N articles in (sub) discipline Y. A consortium comprising WUR, the Ministry of Agriculture, FAO, NWO wants to tender the reviewing process for these articles under the following conditions:

  1. The reviewing process must be independent, rigorous and swift.
  2. The reviewing may be anonymous, named or open (to be decided on).
  3. All N articles will pass the reviewing process.
  4. As a result of the reviewing the articles are marked 1 to 5.
  5. Articles with marks 3 to 5 are accepted for posting in the Wageningen institutional repository and for immediate open publishing in Wageningen Yield 2.0 (in WUR house style).
  6. Subsequently authors may publish their articles in any journal.
  7. In their appraisal procedures for staff and research projects members of the consortium will weigh articles with marks 3, 4 and 5 as if they were published in journals with impact factors 3, 8 and 15 respectively (figures are nominal and subject to disciplinary calibration).
  8. The national library of the Netherlands will take care of the long term curation of the accepted articles

Proposals for a three year contract should be sent to ……The allocation of the contract will be based on the best price-performance ratio.”
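Conditions 5 and 7 of the call above can be sketched as a small lookup. This is only an illustration in Python: the names are hypothetical, and returning None for marks 1 and 2 is an assumption, since the call does not say how those marks are weighed.

```python
# Nominal impact factors from condition 7 (subject to disciplinary calibration).
NOMINAL_IMPACT_FACTOR = {3: 3, 4: 8, 5: 15}


def is_accepted(mark: int) -> bool:
    """Condition 5: marks 3 to 5 are accepted for the repository."""
    if mark not in (1, 2, 3, 4, 5):
        raise ValueError("mark must be an integer from 1 to 5")
    return mark >= 3


def appraisal_weight(mark: int):
    """Condition 7: the impact factor an accepted article is weighed as.

    Returns None for marks 1 and 2 (an assumption, see above).
    """
    if not is_accepted(mark):
        return None
    return NOMINAL_IMPACT_FACTOR[mark]
```

Under this reading an article marked 4 would be appraised as if it had appeared in a journal with impact factor 8, wherever the author publishes it afterwards.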

Really interesting, but I wonder when the time will come that we can actually get this idea sold.

New 3TU data repository, but is it open?

The libraries of the three cooperating technical universities in the Netherlands have started a data repository for long term archiving of digital data sets. In their combined press release they state:

The world of technical science is to have its own data centre for digital data sets. The 3TU.Datacentre will ensure well-documented storage and long-term access to technical-science study data. This will guarantee the long-term availability of the Netherlands’ entire technical-science heritage.

The 3TU.Datacentre will provide storage of and continuing access to technical-science study data. After all, data sets often remain highly valuable even after a study has been completed. They may be reused in a new study or used to verify the original study. The long-term storage of test data also enables studies to be held over a long period.

A very good initiative, but I am missing one point. Is it open? One might expect so, but the press release does not mention it. In my opinion there is no use in having a repository when we don’t have open access to it. But perhaps it’s too obvious to mention.

Let’s hope so.