Document types in databases

I am currently reading, with a great deal of interest, the arXiv preprint of the review by Ludo Waltman (2015) from CWTS on bibliometric indicators. In this post I want to provide a brief comment on his section 5.1, where he discusses the role of document types in bibliometric analyses. Ludo mainly reviews and comments on the inclusion or exclusion of certain document types in bibliometric analyses; he does not touch upon the subject of discrepancies between databases. I want to argue that he could take his review a step further in this area.

Web of Science and Scopus differ quite a bit from each other in how they assign document types. If you don't realize that this discrepancy exists, you can draw the wrong conclusions when comparing bibliometric analyses between these databases.

This blogpost is a quick illustration that this issue should be addressed in a review like this. To illustrate my argument I looked at the document types assigned to Nature publications from 2014 in Web of Science and Scopus. The following table gives an overview of the results:

Document Types Assigned to 2014 Nature Publications in Web of Science and Scopus
WoS Document Type   # Publications  Scopus Document Type  # Publications
Editorial Material  833             Editorial             244
Article             828             Article               1064
-                   -               Article in Press      56
News Item           371             -                     -
Letter              272             Letter                253
Correction          109             Erratum               97
Book Review         102             -                     -
Review              34              Review                51
Biographical Item   13              -                     -
Reprint             3               -                     -
-                   -               Note                  600
-                   -               Short Survey          257
Total               2565            Total                 2622
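For those who want to reproduce such a tally, below is a minimal sketch of the counting step in Python. It assumes, hypothetically, that the search results were exported from both databases as CSV files with a "Document Type" column; the file names are made up.

```python
import csv
from collections import Counter

def count_document_types(csv_path: str, column: str = "Document Type") -> Counter:
    """Tally the document types in a database export (hypothetical CSV layout)."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return Counter(row[column] for row in csv.DictReader(f))

# Hypothetical exports of the 2014 Nature records from both databases
wos = count_document_types("wos_nature_2014.csv")
scopus = count_document_types("scopus_nature_2014.csv")
for doc_type, n in wos.most_common():
    print(f"WoS {doc_type}: {n}")
print("WoS total:", sum(wos.values()), "Scopus total:", sum(scopus.values()))
```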

First of all, Scopus yields a few more publications for Nature in 2014 than Web of Science does. The difference can largely be explained by the articles in press that are still present in the Scopus search results. These probably still require maintenance from Scopus and should be corrected.

More importantly, WoS assigns 833 publications to the category “editorial material”, whereas Scopus labels only 244 publications as “editorial”. It is a well-known tactic of journals such as Nature to classify articles as editorial material, since this practice artificially boosts their impact factor. I have had many arguments with professors whose invited news and views items (most often very well cited!) were not included in bibliometric analyses because they were assigned to the “editorial material” category.

“Letters” and “corrections” or “errata” are of the same order of magnitude in Scopus and Web of Science. “News items” are a publication category in Web of Science, but not in Scopus; they are probably listed as “note” in Scopus, and some of the “short surveys” in Scopus turn up in Web of Science as “news item”. But all these categories probably don't affect bibliometric analyses too much.

The discrepancy in “reviews” between Web of Science and Scopus, however, is important, and large as well. Web of Science classifies 34 articles as a “review”, whereas Scopus counts 51 “reviews” in the same journal over the same period. Reviews are included in bibliometric analyses, and since they attract relatively more citations than ordinary articles, special baselines are constructed for this document type. Comparisons between these databases are therefore directly affected by differences in document type assignment.

The differences in editorial material, articles and reviews between Web of Science and Scopus are most likely to affect the outcomes of comparative bibliometric analyses between these two databases. But I am not sure about the size of this effect. I would love to see some more quantitative studies in the bibliometrics arena investigating this issue.


References

Waltman, L. (2015). A review of the literature on citation impact indicators. arXiv preprint. http://arxiv.org/abs/1507.02099

Overview of Open Access journal resources

The ISSN Register recently launched a new resource: ROAD, the Directory of Open Access scholarly Resources. It is an attempt to describe various types of Open Access resources: journals, of course, but besides the journals they also describe serials, book series, conference proceedings and repositories. The latter was new to me; I had not realized that databases could get an ISSN as well. They have not come very far with their inventory of repositories: currently they have indexed only 172 Open Access repositories. As can be expected, the ROAD directory is far more comprehensive for Open Access journals, currently indexing 7194 Open Access journals, alongside a mere 68 conference proceedings. Book series are not yet included, but apparently they will follow in 2014.

The effort of the ISSN organisation to index Open Access repositories stands in stark contrast with OpenDOAR, which has registered 2582 Open Access repositories worldwide, and the Registry of Open Access Repositories, with 3585 repositories.

For a comparison of the various initiatives to build and maintain databases of Open Access journals, the following databases deserve special mention:

Directory of Open Access Journals (DOAJ)
Probably the best-known collection of Open Access journals. It currently describes 9804 free, full-text, peer-reviewed Open Access journals. More than 5636 of these journals are searchable at the article level on the standard bibliographic metadata of the articles.

Livre!
Livre! is a journal portal of Brazilian origin. It covers more than 5916 scientific journals, magazines, bulletins and newsletters, but you can easily limit the selection to peer-reviewed scientific journals.

Jan Szczepanski’s lists of OA-journals
Jan Szczepanski, a librarian at Göteborg University, has collected links and information on Open Access journals for years. His lists contain over 22,000 current OA journals (end of 2013). He estimates that about 10% of the links in this anthology are dead, but the metadata provided make it possible to find the journals with web search engines or in the Internet Archive.

The Elektronische Zeitschriftenbibliothek EZB (Electronic Journals Library)
Covers some 44,000 OA journals, which makes it one of the most comprehensive free journal collections. Just select only the “green” journals and you can browse or search through this impressive collection. The collection covers more than only peer-reviewed scholarly journals; unfortunately you can't filter on peer-reviewed journals only. You can filter journals by some 41 subject areas.

Walt Crawford’s overview of early E-zines
In Cites & Insights 6(12) Walt Crawford provides an overview of early OA journals: “They weren’t generally called Open Access journals in 1995: If that term existed before 2001 or 2002, it certainly wasn’t the standard name for free online scholarship. But there were examples of free online scholarship, some dating back to 1987.”

I had some doubts about whether to include HighWire Press as well. They do list journals from various publishers, but the majority are Toll Access journals, and most of those in Open Access are delayed Open Access, “free content” as they call it. So it doesn't fit this collection.

Not a list of journals, but of highly suspicious Open Access publishers, is Beall's list. Most of the resources listed in this post include journals uncritically; Beall's list is a useful resource to counter some of the Open Access positivism.

Academic search engine optimization: for publishers

A few weeks ago my eye caught a tweet on the subject of academic search engine optimization.

The nicely styled PDF referred to in the tweet was from Wiley. Wiley has been quite active in this area. Somewhere in my bookmark list I have the link to their webpage on optimizing your research articles for search engines (SEO), tucked away in their author services section, as well as a link to the article “Search engine optimization and your journal article: Do you want the bad news first?” on their Exchange blog. Wiley is not the only publisher dealing with this subject: here is an example on academic search engine optimization from Elsevier and another example from Sage. I bet there are more examples from publishers to be found.

The major advice is to use the right keywords: use these keywords in your title and repeat them throughout your abstract, “contextually repeated” as they say. Do mention some synonyms for those keywords as well, and please make use of the keywords field of the article too. They emphasize using Google Trends or Google AdWords to find the right keywords, but that is ill-advised for academic search engine optimization in my opinion. When selecting keywords for academic search engine optimization it is better to use keyword systems, ontologies or thesauri from your subject area, because experienced researchers will use this terminology to search for their information as well. So in the biomedical area it is obvious to consult the MeSH browser, but when you are in the agriculture or ecology field of research the CAB Thesaurus is the first choice for selecting the appropriate keywords. The Wiley SEO tips end with the advice to be consistent with your own name (and affiliation; your lab deserves to be named properly as well), and don't forget to cite your previous work.
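As a small aside, checking this “contextual repetition” can be automated. Below is a minimal sketch in Python that counts how often each candidate keyword, for instance taken from MeSH or the CAB Thesaurus, actually occurs in a title and abstract; the keywords and texts are made up for illustration:

```python
import re

def keyword_coverage(title: str, abstract: str, keywords: list[str]) -> dict:
    """Count case-insensitive, whole-word occurrences of each keyword
    in the title and in the abstract."""
    counts = {}
    for kw in keywords:
        pattern = re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        counts[kw] = {
            "title": len(pattern.findall(title)),
            "abstract": len(pattern.findall(abstract)),
        }
    return counts

# Hypothetical title, abstract and thesaurus-derived keywords
title = "Drought stress responses in winter wheat cultivars"
abstract = ("Drought stress reduces yield in winter wheat. We compared "
            "cultivars under drought stress and scored yield components.")
coverage = keyword_coverage(title, abstract, ["drought stress", "winter wheat", "yield"])
for kw, c in coverage.items():
    print(f"{kw}: title={c['title']}, abstract={c['abstract']}")
```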

The role of the editors in Academic Search Engine Optimization

In their short PDF the Wiley team also mentions using headings: “Headings for the various sections of your article tip off search engines to the structure and content of your article. Incorporate your keywords and phrases in these headings wherever it’s appropriate.” A nice suggestion, but in practice this is hardly ever in the hands of the individual author. Scholarly articles tend to have a rather fixed structure, the IMRAD structure (Introduction, Methods, Results and Discussion) being the most common. In such a case the author has no room to add headings in the right positions in their paper. But research by Hamrick et al. (2010) showed that papers with callouts tend to have a higher number of citations. A “callout” is a phrase or sentence from the paper, perhaps paraphrased, that is displayed prominently in a larger font. The journal they investigated had abandoned the practice of using callouts, but after their article this practice was reinstated. A decision like that is an editorial decision. It is recommended for all journals to help readers with pointers in the form of callouts, and to benefit from the effects these can have on academic search engine optimization as well. My favourite Wiley journal, JASIST, certainly doesn't make systematic use of callouts.

The other topic on which the editorial board has an important say is the layout of the reference list in their journal. I have pleaded many times before for a reduced number of reference list specifications. It looks like the first task the editorial board of a newly established journal embarks upon is to formulate yet another exotic variation on the many different styles specifying the layout of the reference list. The point, however, is that these specifications hardly make use of the possibilities of academic search engine optimization, or of search engine optimization at all; most often they forget to include linking options in the reference list altogether. Older instructions to authors have not caught up with the present time yet. In the HTML version of scholarly articles, links are included as part of the journal platform software, but in the PDF versions of the articles the URLs are often forgotten altogether. Where DOIs are linkable on the webpage, DOIs in the PDF version are most often presented in the form doi:10.1002/asi/etc. The APA style, and many others, even explicitly stipulate referencing a DOI as doi:, which goes against the advice of the DOI governing body. These bad practices result in DOIs in the PDF version of the reference list that don't link, which is a complete and utter waste of SEO opportunity. Academic search engine optimization is badly broken in this area.
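To make this concrete, here is a minimal sketch of the repair, assuming reference lists are available as plain strings and using the https://doi.org resolver form; the reference text is made up, and the regular expression is a simplification that will not catch every valid DOI:

```python
import re

# Matches the bare "doi:10.xxxx/suffix" notation criticized above
# (a simplification; real DOI suffixes can be more exotic).
DOI_PATTERN = re.compile(r"doi:\s*(10\.\d{4,9}/\S+)", re.IGNORECASE)

def linkify_dois(reference: str) -> str:
    """Rewrite doi: notation into the resolvable URL form, so the
    DOI also works as a link in the PDF version."""
    return DOI_PATTERN.sub(lambda m: "https://doi.org/" + m.group(1), reference)

# Hypothetical reference entry as it might appear in a PDF reference list
ref = "Shema, H. et al. (2014). JASIST. doi:10.1002/asi.23037"
print(linkify_dois(ref))
# -> Shema, H. et al. (2014). JASIST. https://doi.org/10.1002/asi.23037
```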

The role of publishers in Academic Search Engine Optimization

Publishers have a role in supporting the editorial boards in resolving the two previously mentioned items. But they should also have a careful look at the PDF files they currently produce. At this moment Google Webmaster Central offers only a few pointers on PDF optimization. To mention a few interesting ones: links should be included in the PDF (this means, again, DOIs as links rather than doi: statements), since they are treated as ordinary links. And the last point is important as well: “How can I influence the title shown in search results for my PDF document?” The title attribute in the PDF is used, and so is the anchor text, which on publishers' sites is most often just “PDF”. If they would only use the article title as anchor text on their websites, it would work to their advantage. And albeit not mentioned in the Google Webmaster blogpost, probably because it is too obvious: if the file name simply contained the title of the article, that would certainly help the SEO for the PDF, and it would help all those scientists who download PDF files for their research to sort out which file is about what. Was 123456.pdf about the genetics or the genomes, or was that 234567.pdf? Clear titles would help both researchers and search engines to work out what it is all about.

And whilst publishers are on the subject of PDF optimization, they might as well complete the other metadata attributes of their PDF files, such as author, keywords and summary. If a search engine does not make use of those attributes now, another one might do so some day; you might as well be prepared. Researchers using reference management tools can also benefit from those metadata attributes. Ross Mounce has some interesting blogposts about researchers' need for good metadata in PDFs. In theory this takes little effort, since all that metadata is in the publishers' databases already. So make use of it, and optimize your PDFs for academic search engines, or see it as a service to your most loyal users, who have so far put up with a load of bad PDFs.
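As an illustration of how little effort this could be, here is a minimal sketch using the open source pypdf library; the file names and metadata values are hypothetical and merely stand in for what a publisher would pull from its own database:

```python
from pypdf import PdfReader, PdfWriter

# Hypothetical input file, named after an internal production number
reader = PdfReader("123456.pdf")
writer = PdfWriter()
writer.append_pages_from_reader(reader)

# Fill the document information dictionary that search engines and
# reference managers can read; the values here are made up.
writer.add_metadata({
    "/Title": "Do blog citations correlate with a higher number of future citations?",
    "/Author": "Shema, H.; Bar-Ilan, J.; Thelwall, M.",
    "/Subject": "Research blogs as a potential source for alternative metrics",
    "/Keywords": "altmetrics; research blogs; citations",
})

# Write out under a descriptive name, which helps both readers and SEO
with open("shema-2014-blog-citations.pdf", "wb") as f:
    writer.write(f)
```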

References

Hamrick, T. A., R. D. Fricker, and G. G. Brown. 2010. Assessing what distinguishes highly cited from less-cited papers published in Interfaces. Interfaces 40(6): 454-464. http://dx.doi.org/10.1287/inte.1100.0527. OA version: http://faculty.nps.edu/tahamric/docs/citations%20paper.pdf

Related: Google and the academic Deep Web

The week in review – week 4

The week in review: a new attempt to get some life back into this weblog. It is of course inspired (for the Dutch readers) by TWIT, The Week In Tweets, by colleague @UBABert, and by the older monthly overviews Deet'jes used to do on Dymphie.com.

The new Web of Science interface
Whilst I was in Kenya the previous week to give training to PhD students and staff at Kenyatta University and the University of Nairobi, Thomson Reuters released the new version of Web of Science. So only this week did I have a first go at it. We haven't been connected to Google Scholar yet (still waiting to see that come through), but in general the new interface is an improvement over the old one. Albeit that searching for authors is still broken for those who haven't claimed their ResearcherID. But apart from that, what I hadn't noticed in the demo versions of the new interface is the new Open Access facet in Web of Science. I like it. But immediately the question of how they do it jumps to mind. There is no information in the help files on this new feature. My first guess would be the DOAJ list of journals. A message on the Sigmetrics list added a little more confusion, since various PLoS journals are included in their 'Open Access Journal Title List', except for PLoS ONE. Yet actual searches in Web of Science quickly illustrate that for almost any topic in the past few years PLoS ONE is the largest OA journal contributing content to this Open Access facet. I guess this new facet in Web of Science will spark some more research in the near future. I see the practical approach of Web of Science as a first step in the right direction. The next challenge is of course to indicate the individual Open Access articles in hybrid journals. Followed by, and this will be a real challenge, green archived copies of Toll Access articles. The latter is badly needed, since we can't rely on Google Scholar alone to do this for us.

Altmetrics
Two interesting articles in the unfolding field of altmetrics deserve mention. The groups of Judit Bar-Ilan and Mike Thelwall cooperated on “Do blog citations correlate with a higher number of future citations? Research blogs as a potential source for alternative metrics” (Shema et al. 2014). They show that Research Blogging is a good post peer review blogging platform, able to pick out the better cited articles. However, the number of articles covered by the platform is really too small for it to become a widely used altmetric indicator.
The other article, at the moment still a working paper, came from CWTS (Costas et al. 2014). They combined Web of Science covered articles with the Altmetric.com indicators and investigated many different altmetric sources, such as mentions on Facebook walls, blogs, Twitter, Google+ and news outlets, but not Mendeley. Twitter is by far the most abundant altmetric source in this study, but blogs are in a better position to identify top publications. The main problem, however, remains the limited coverage by the various altmetrics tools: for 2012, 24% of the publications had an altmetric mention, but already 26% of the publications had scored a citation. This confirms the other study's finding that the peer-reviewed scholarly output is covered by social media tools only on a limited scale.

Scholarly Communication
As a follow-up on my previous post on the five stars of transparent pre-publication peer review, a few articles on peer review came to my attention. The first was yet another excellent bibliography by Charles W. Bailey Jr., this one on transforming peer review. He did not cover blogposts, only peer-reviewed journals; but the contributions to this field are published in many different journals, so an overview like this still has its merits.
Through a tweet from @Mfenner

I was notified of a really interesting book, ‘Opening Science’. It still lacks a chapter on changes in the peer review system, but it is really strong at indicating new trends in Scholarly Communication and Publishing. Worth further perusing.

Rankings
Although the ranking season has not started yet, the rankers are always keen on putting old wine in new bottles. The Times Higher Education presented this week the 25 most international universities in the world. It is based on the THE WUR released last year, this time focusing only on the ‘international outlook’ indicator, which accounts for 7.5% of their standard ranking. Of the Dutch universities Maastricht does well. Despite the fact that Wageningen University hosts students from more than 150 countries, we ranked only 45th on this indicator. More interesting was an article by Alter and Reback (2014), in which they show that rankings actually influence the number of freshmen applying to a college in the United States, and that the quality of college life plays an important role as well. So it makes sense for universities to invest in campus facilities and recreation possibilities such as sports grounds.

Random notes
A study on copyright, database rights and IPR in Europe, written for Europeana by Guibault (Guibault and Wiebe 2013). Too much to read at once, and far too difficult to comprehend at once. But essential reading for repository managers.


Resources
Alter, M., and R. Reback. 2014. True for Your School? How Changing Reputations Alter Demand for Selective U.S. Colleges. Educational Evaluation and Policy Analysis. http://dx.doi.org/10.3102/0162373713517934 (Free access)
Bailey Jr., C. W. 2014. Transforming Peer Review Bibliography. Available from http://digital-scholarship.org/tpr/tpr.htm
Binfield, P. 2014. Novel Scholarly Journal Concepts. In: Opening Science, edited by Sönke Bartling and Sascha Friesike, 155-163. Springer International Publishing. http://dx.doi.org/10.1007/978-3-319-00026-8_10. OA version: http://book.openingscience.org/tools/novel_scholarly_journal_concepts.html
Costas, R., Z. Zahedi, and P. Wouters. 2014. Do ‘altmetrics’ correlate with citations? Extensive comparison of altmetric indicators with citations from a multidisciplinary perspective. CWTS Working Paper Series Vol. CWTS-WP-2014-001. Leiden: CWTS. 30 pp. http://www.cwts.nl/pdf/CWTS-WP-2014-001.pdf
Guibault, L., and A. Wiebe. 2013. Safe to be open : Study on the protection of research data and recommendation for access and usage. Göttingen: Universitätsverlag Göttingen 167 pp. http://webdoc.sub.gwdg.de/univerlag/2013/legalstudy.pdf
Shema, H., J. Bar-Ilan, and M. Thelwall. 2014. Do blog citations correlate with a higher number of future citations? Research blogs as a potential source for alternative metrics. Journal of the Association for Information Science and Technology (in press). http://dx.doi.org/10.1002/asi.23037. OA version: http://www.scit.wlv.ac.uk/~cm1993/papers/blogCitations.pdf

The unofficial guide for authors

Recently I co-authored a book on scientific publishing. It is available from Lulu for less than € 6. If that's too much for you, you can download it for free. The book is published under a CC BY-NC licence.

From the cover:

Most scientific journals provide guidelines for authors – how to format references and prepare artwork, how many copies of the paper to submit and to which address. However, most official guidelines say little about how you should design and produce your paper and the chances that it will be accepted. This book provides a comprehensive but focused guide to producing scientific information – from research design to publication. It provides practical tips and answers to some of the most frequently asked questions: Why do we publish in the first place? What is OA publishing and why bother about it? What is the h-index? What is a Journal Impact Factor and does it matter? How can I increase my research production efficiency? Why should I use OS software tools for academic work? How can I produce graphics that will impress? How can I brainstorm good titles? How can I select a suitable journal and where can I find out more about it? How can I get into the reviewers’ heads?