Academic search engine optimization: for publishers

A few weeks ago a tweet on the subject of academic search engine optimization caught my eye.

The nicely styled PDF referred to in the tweet was from Wiley, which has been quite active in this area. Somewhere in my bookmark list I have the link to their webpage on optimizing your research articles for search engines (SEO), tucked away in their author services section, and a link to the article “Search engine optimization and your journal article: Do you want the bad news first?” on their Exchange blog. Wiley is not the only publisher dealing with this subject: here is an example on academic search engine optimization from Elsevier and another example from Sage. I bet there are other examples from publishers to be found.

The main advice is to use the right keywords. Use these keywords in your title and repeat them throughout your abstract; contextually repeated, as they say. Mention some synonyms for those keywords as well, and do make use of the keyword fields in the article. They recommend Google Trends or Google AdWords for finding the right keywords, but in my opinion that is ill-advised for academic search engine optimization. When selecting keywords it is better to use keyword systems, ontologies or thesauri from your subject area, because experienced researchers use this terminology when they search for information. So in the biomedical area the obvious choice is to consult the MeSH browser, but if you work in agriculture or ecology the CAB Thesaurus is the first choice for selecting appropriate keywords. The Wiley SEO tips end with the advice to be consistent with your own name (and affiliation; your lab deserves to be named properly as well), and not to forget to cite your previous work.
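
To make the keyword advice a bit more tangible, here is a minimal sketch in Python of how an author could check a draft before submission. The function name, the sample title and the sample abstract are my own illustrations and come from no publisher tool; the check simply counts whole-phrase, case-insensitive matches.

```python
import re


def keyword_report(title, abstract, keywords):
    """Report, per keyword, whether it is in the title and how often it is in the abstract."""
    report = {}
    for keyword in keywords:
        # Whole-phrase, case-insensitive match; re.escape keeps punctuation literal.
        pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        report[keyword] = {
            "in_title": bool(pattern.search(title)),
            "abstract_count": len(pattern.findall(abstract)),
        }
    return report


# Hypothetical draft, for illustration only.
title = "Callouts and citation counts in journals"
abstract = (
    "We study citation counts for papers with callouts. "
    "Citation counts were higher when callouts were present."
)
for keyword, result in keyword_report(title, abstract, ["citation counts", "bibliometrics"]).items():
    print(keyword, result)
```

A keyword that is missing from the title, or that appears only once in the abstract, is a candidate for a second look. Whether a repetition is still contextual is something only the author can judge.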

The role of the editors in Academic Search Engine Optimization

In their short PDF the Wiley team also recommends using headings: “Headings for the various sections of your article tip off search engines to the structure and content of your article. Incorporate your keywords and phrases in these headings wherever it’s appropriate.” A nice suggestion, but in practice this is hardly ever in the hands of the individual author. Scholarly articles tend to have a rather fixed structure, the IMRAD structure (Introduction, Methods, Results and Discussion) being the most common. In such a case the author has no room to add headings in the right places in the paper. However, research by Hamrick et al. (2010) showed that papers with callouts tend to have a higher number of citations. A “callout” is a phrase or sentence from the paper, perhaps paraphrased, that is displayed prominently in a larger font. The journal they investigated had abandoned the use of callouts, but reinstated the practice after their article appeared. A decision like that is an editorial decision. All journals are well advised to help their readers with pointers in the form of callouts, and to benefit from the effect these can have on academic search engine optimization as well. My favourite Wiley journal, JASIST, certainly doesn’t make systematic use of callouts.

The other topic on which the editorial board has an important say is the layout of the reference lists in their journals. I have pleaded many times before for fewer reference list specifications. It looks as if the first task the editorial board of a newly established journal embarks upon is to formulate yet another exotic variation on the many styles that specify the layout of the reference list. The point, however, is that these definitions hardly make use of the possibilities of academic search engine optimization, or of search engine optimization at all: most often they forget to include linking options in the reference list altogether. Older instructions to authors have not caught up with the present yet. In the HTML version of scholarly articles, links are added by the journal platform software, but in the PDF versions the URLs are often forgotten. Where DOIs are linkable on the webpage, DOIs in the PDF version are most often presented in the form doi:10.1002/asi/etc. The APA style and many others even stipulate explicitly that a DOI should be referenced as doi:, which goes against the advice of the DOI governing body. The result of these bad practices is that DOIs included in the PDF versions of the reference list don’t link, which is a complete and utter waste of an SEO opportunity. So academic search engine optimization is badly broken in this area.
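
Turning doi: statements into links is not hard. The sketch below, in Python, rewrites them in a reference string as resolvable URLs. The function name and the regular expression are my own assumptions, and I use the https://doi.org/ resolver form here; a production workflow would want a stricter DOI pattern and would create a real link annotation in the PDF.

```python
import re

# Assumed, simplified pattern: "doi:" followed by a "10.<registrant>/<suffix>" string.
DOI_STATEMENT = re.compile(r"\bdoi:\s*(10\.\d{4,9}/\S+)", re.IGNORECASE)


def linkify_dois(reference):
    """Rewrite 'doi:10.xxxx/suffix' as a URL that resolves through doi.org.

    Punctuation that directly follows the DOI stays attached to it, so a
    typesetting step would still have to trim it when building a real link.
    """
    return DOI_STATEMENT.sub(r"https://doi.org/\1", reference)


print(linkify_dois("Interfaces, 40(6): 454-464. doi:10.1287/inte.1100.0527"))
```

References that contain no doi: statement pass through unchanged, so the function can be run over a whole reference list.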

The role of publishers in Academic Search Engine Optimization

Publishers have a role in supporting the editorial boards in resolving the two items mentioned above. But they should also take a careful look at the PDF files they currently produce. At the moment the Google Webmaster blog has only a few pointers on PDF optimization. To mention a few interesting ones: links should be included in the PDF (which again means DOIs as links rather than doi: statements), since they are treated as ordinary links. And the last point is important as well: “How can I influence the title shown in search results for my PDF document?” The title attribute in the PDF is used, and so is the anchor text. On publishers’ sites this anchor text is most often “PDF”. If only they used the title as anchor text on their websites, it would work to their advantage. Although it is not mentioned in the Google Webmaster blog post, probably because it is too obvious, a file name based on the title would certainly help the SEO for the PDF, and it would help all those scientists who download PDF files for their research to work out which file is about what. Was 123456.pdf about the genetics or genomes, or was that 234567.pdf? Clear titles would help both researchers and search engines to work out what a file is about.

And while publishers are on the subject of PDF optimization, they might as well complete the other attributes of PDF files, such as authors, keywords and summary. If today’s search engines don’t use those attributes, another search engine might one day; you might as well be prepared. Researchers using reference management tools can also benefit from those metadata attributes. Ross Mounce has some interesting blog posts about researchers’ need for good metadata in PDFs. In theory this takes little effort, since all that metadata is in the databases already. So make use of it to optimize your PDFs for academic search engine optimization, or as a service to your most loyal users, who have so far put up with a load of bad PDFs.
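
The attributes mentioned above correspond to the standard Title, Author, Keywords and Subject entries of a PDF’s document information dictionary. The Python sketch below maps a database record onto those entries. The record layout (title, authors, keywords, abstract) is an assumption of mine about what a publisher’s database might hold, and the actual writing of the dictionary into the file is left to whatever PDF library the production system uses.

```python
def pdf_info_dictionary(record):
    """Map a (hypothetical) article record onto standard PDF Info dictionary keys."""
    info = {"/Title": record["title"]}
    if record.get("authors"):
        info["/Author"] = "; ".join(record["authors"])
    if record.get("keywords"):
        info["/Keywords"] = ", ".join(record["keywords"])
    if record.get("abstract"):
        # The Subject entry is the usual home for a short summary.
        info["/Subject"] = record["abstract"]
    return info


# Record built from the Hamrick et al. reference; the keywords are illustrative.
record = {
    "title": "Assessing what distinguishes highly cited from less-cited papers published in Interfaces",
    "authors": ["Hamrick, T. A.", "Fricker, R. D.", "Brown, G. G."],
    "keywords": ["citations", "callouts"],
}
for key, value in pdf_info_dictionary(record).items():
    print(key, "=", value)
```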

References

Hamrick, T. A., R. D. Fricker, and G. G. Brown. 2010. Assessing what distinguishes highly cited from less-cited papers published in Interfaces. Interfaces, 40(6): 454-464. http://dx.doi.org/10.1287/inte.1100.0527. OA version: http://faculty.nps.edu/tahamric/docs/citations%20paper.pdf


How Wiley made a mess of the Synergy and InterScience integration

Two weeks ago we were forewarned that Wiley would integrate all the content of Blackwell Synergy into the Wiley InterScience platform. The move would only disrupt service over the weekend of June 28-29. When I received this notification I immediately thought of Péter’s picks&pans (2007), in which he investigated the capabilities of both platforms.

Just a few quotes from his review:

A merger of the Blackwell Synergy and the Wiley Interscience collections using the software of the latter would certainly not produce Synergy. On the contrary, the serious software deficiencies om Interscience would weaken performance and functionality of Blackwell Synergy, which uses the excellent Atypon software.

[Synergy] This is a very well-designed system enhanced by complementary information – as you should expect these days.

Wiley made no efforts to improve its software. The software keeps fooling itself and the searchers by offering dysfunctional and nonsense options.

It is a severe sign of dementia when people do not recognize their own name. So is the syndrome that Wiley keeps listing some of its very own journal some of the time under the label “Cited Articles available from other publishers” and/or keeps ignoring them in the citation tracking.

In a subsequent chat, our serials librarian indicated that behind the scenes he much preferred the Blackwell Synergy platform to the Wiley InterScience platform. From my own viewpoint I regretted this move as well, since Blackwell had already been COUNTER compliant for quite some time and its COUNTER reports had been audited, whereas Wiley InterScience was and still is not COUNTER compliant. That is a very serious shortcoming for one of the largest scientific publishing houses.

So when the abandonment of the Synergy platform was announced, users had something to lose in ease of use, and so did librarians.

What was intended to take a mere weekend has continued for a whole week. All Dutch university libraries faced problems with access to both Wiley and Blackwell journals. We have to wait and see whether the problems are resolved this weekend. Meanwhile I find it disappointing that Wiley makes no mention of these problems on its transition page.

Facing these problems, I can only compliment Péter, who already foresaw in March 2007 what was coming our way: “A merger of the Blackwell Synergy and the Wiley Interscience collections using the software of the latter would certainly not produce Synergy”.

Reference
Jacsó, P. (2007). SpringerLink, Blackwell Synergy, Wiley InterScience. Online (Jul/Aug 2007): 49-51. http://www.jacso.info/PDFs/jacso-springerlink-blackwell-wiley.pdf

RSS: what a mess publishers have made of it

Web 2.0 is in vogue. Library 2.0 seems even hipper.

One of the examples consistently given for a good 2.0 library is the implementation of RSS feeds. RSS-ify your news items, your latest acquisitions and more. A logical extension of an RSS-ified library is a feed for each and every journal in the catalogue. It is perhaps not a good idea to create them for each and every journal yourself, but as an aggregator of services the e-journals catalogue is a good place to offer them. So far so good. Where do you get them? At the publishers’ sites, of course. That is where the pain starts. I only wish there was some logic, some coherence, some consistency in the way publishers offer RSS feeds for new journal content.

Some examples?

American Chemical Society publishes the Journal of agricultural and food chemistry. The feed looks like http://pubs.acs.org/wls/alerts/rss/jafcau, where some illogical journal abbreviation identifies the journal. From ACS you would at least have expected an RSS feed based on the CODEN, let alone the ISSN.

BioMed Central publishes BMC Complementary and Alternative Medicine. The feeds of most BMC journals are based on the journal title, but in this particular instance the feed is http://www.biomedcentral.com/bmccomplementalternmed/rss/

Blackwell publishes Ecological Entomology. The feed looks like http://www.blackwell-synergy.com/action/showFeed?ui=0&mi=0&ai=wn&jc=een&type=etoc&feed=rss, where jc=een refers to the journal in question.

Cambridge Journals publishes Experimental Agriculture. The feed is the following … Oops. You can’t. You get the following message: To continue this action you will need to login to CJO with your username or password. If you are a new visitor please register here.

Elsevier has a problem similar to Cambridge’s. You need to be logged in to the ScienceDirect platform to subscribe to some feeds. Many feed options, yes, that’s true. But getting a simple RSS feed of new journal content is less straightforward than it should be.

Oxford has a great journal in Annals of Botany. Oxford offers a range of feeds for the journal, but the current issue feed looks as follows http://aob.oxfordjournals.org/rss/current.xml, i.e. based on some sort of journal abbreviation.

Sage publishes, amongst others, the Journal of information science. The feed is to be found at http://jis.sagepub.com/rss

Springer is the publisher of Scientometrics, whose RSS feed is to be found at http://www.springerlink.com/content/101080/?sortorder=asc&export=rss, where the number in the feed has no relation whatsoever to the ISSN.

Taylor & Francis has, amongst others, the journal Acta Agriculturae Scandinavica, Section A – Animal Sciences, whose feed is to be found at http://www.informaworld.com/ampp/rss~content=t713690045. Don’t be misled: the last number is not an ISSN. The ISSN of this journal is 0906-4702 (to be found in the XML page behind the feed).

Wiley InterScience publishes the Journal of the American Society for Information Science and Technology. Its RSS feed is to be found at http://www3.interscience.wiley.com/rss/journal/76501873. It looks deceptive, but the number at the end is of course not the ISSN. The ISSNs are 1532-2882 for the paper edition and 1532-2890 for the electronic edition.

So many publishers, so many different RSS feeds. Hello, wake up! We as libraries are their customers. We have to make clear that this is not an acceptable policy. Of course we can wait for yet another player in the information provision chain to sort it out for us. But what is needed is some simple, logical reasoning. We don’t need to invent yet another DOI system or an OpenURL system. A basic URL for a journal’s feed should look like this:

http://<base url>/<ISSN>/feed

Where the base url is the URL of the publisher’s or aggregator’s platform, something like www.springerlink.com or www.sciencedirect.com. The ISSN is preferably the paper ISSN, since that is available in most catalogues; failing that, the e-ISSN is required. And the URL should end in feed, whether the format is RSS 0.92, 2.0 or Atom. Deceptively simple, yet not a single publisher has thought of it.
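
To show how little a library system would need under the scheme above, here is a minimal Python sketch that validates an ISSN by its check digit and builds the feed URL. The function names are mine, and of course no publisher actually serves feeds at these addresses; that is the whole point of this post.

```python
import re


def issn_is_valid(issn):
    """Validate an ISSN (with or without hyphen) using its modulus-11 check digit."""
    match = re.fullmatch(r"(\d{4})-?(\d{3})([\dXx])", issn)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # The first seven digits are weighted 8, 7, ..., 2.
    total = sum(int(digit) * weight for digit, weight in zip(digits, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return match.group(3).upper() == expected


def journal_feed_url(base_url, issn):
    """Build http://<base url>/<ISSN>/feed for a journal, as proposed above."""
    if not issn_is_valid(issn):
        raise ValueError("not a valid ISSN: " + issn)
    normalized = (issn[:4] + "-" + issn[-4:]).upper()
    return "http://{}/{}/feed".format(base_url.strip("/"), normalized)


# ISSN of the paper edition of JASIST, as quoted earlier in this post.
print(journal_feed_url("www3.interscience.wiley.com", "1532-2882"))
```

With the ISSN already in the catalogue record, an e-journals catalogue could generate every feed link without storing a single publisher-specific identifier.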

Come on, publishers: agree with each other and settle on a standard for journal content notifications.