Google better with Google

Or 14 super search tips for scientists and students. The following scholarly super search tips are an explanation for the enclosed slideshare presentation.


This slideshare presentation was posted a while back on WoW!ter’s slideshare, but has been updated to stay in sync with this blog post.

The tips
1. Which Google do you want to use? We have a large international audience of users at our University, who are normally redirected to http://www.google.nl. However, if you use http://www.google.com/ncr you get the international version. And if you prefer your Indian version, http://www.google.co.in/ncr works as well. With /ncr you can easily control which regional version you are using.

2. Personalize your search experience. The search settings are nowadays found under the small cogwheel at the top right-hand corner of the page, or follow this link. The section I always pay attention to is the filter option. Why should Google judge whether something is fit for my eyes or not? I also advise setting the number of search results to 50 (but you can’t make use of Google Instant search in that case). I used to use 100 results, but even I found that a wee bit too much. Lastly, I always check the box to open the results in a new window (it actually opens a new tab rather than a window); this keeps my search results window intact whilst I browse some of the results I retrieved.

Some further personalisation would be to install the Google Toolbar in your browser or, a step further still, to make use of iGoogle.

3. There is more than 1 Google. Many people only use the standard Google web search engine. But for academics, Google Scholar, Google Book Search and Google Patents are specific interfaces that should certainly be part of the searcher’s tricks of the trade.

4. Google universal. Google has realized that the many different search interfaces cause a problem for users as well, and has therefore introduced the universal search engine results page, with a lot of specific options on the left-hand side of the results. However, a suggestion to use Google Scholar is not included.

5. Learn from the advanced search interface. All Google search interfaces have an advanced search option. Use these options to see what the possibilities of the specific search interface are, and learn how you can use the corresponding advanced search operators in the normal search interface. When you use the advanced search options in Google Scholar, for example, you see an option to search for a specific author, which translates in the Scholar search box to [nitrogen fixation author:“K E Giller”].

6. Be specific or search with more than 1 term. In the Dutch language we can often get away with searching for a single word, because we are allowed to make incredibly long compound words such as “wapenstilstandsonderhandelingen”. When you’re searching for scientific information you had better stick to English as your language. In English you can’t make such compound words. This small language difference necessitates searching with more terms. But apart from the language difference, when you search with more terms, searches become more specific and the results more relevant. In the current example a search for [water] alone gives more than 700 million results, whereas [Water management technology assessment] gives nearly 8 million results.
Interestingly, when you look at the results in the slides, you’ll notice that the total result numbers in Google are unreliable, to say the least. In the step from 2 to 3 search terms the result set increases again.
The fifth example in the slide is an introduction to the next slide. You can be even more precise when searching.

7. Keep words together. Make use of “phrase searches”. A phrase search is a search that returns the words in exactly the specified order. Of course, Google already ranks results containing the search terms as a phrase at the very top of the search engine results page. This technique also reduces the sheer number of possible results. Compare for instance [“water management”] with [water management]. You can combine as many phrases as you like (see the previous slide), or make them really long (the latter is also used in plagiarism checks).

8. Search for title words. When you feel overwhelmed by the number of results, a good solution is to limit your search to title words rather than words anywhere on a page. You can search for single title words with the intitle: operator, or for all of your search words with the allintitle: operator. The two operators do the same job when you compare [intitle:“water management”] with [allintitle:water management].

9. Search for information in PDF files. Most scientific information is published on the web in the form of PDF files, be it as a scientific report or a scholarly article, e.g. [Agaricus bisporus ext:pdf]. A couple of years ago this was an extremely efficient way to look for scholarly information on the Web. Since it has become very easy to produce your own PDF files, this technique has lost some of its effectiveness, but it still works wonders, especially in combination with the other tips.

10. Search for results from a specific domain. In some cases it is useful to restrict your results to a certain website or domain. This is certainly true for sites that don’t have good site search options, e.g. [EndNote site:library.wur.nl]. You can also limit the results to the academic institutions of the USA: [“water management” site:.edu].

11. Search for number ranges. Apart from the fact that Google is a powerful calculator, you can also search for number ranges. This comes in handy when you want to limit your search to results from certain publication years, e.g. [“publication strategy” 2009...2011]. Note that three dots is different from (and better than) the commonly used two dots.

12. Exclude specific terms with the - operator. You can narrow your searches using this operator. You can exclude as many words as you want by putting the - sign directly in front of each of them, for example [mercury -ford -freddy -outboards -planets].

13. Search with OR. On some occasions the intelligence of Google doesn’t include obvious synonyms. With the OR operator you can combine search terms, e.g. [“carbon dioxide” OR CO2]. Notice that OR should be typed in capitals.

14. Combine. Having seen some of the options of the Google search engine, you should realize that you can combine most of these operators. In this way you can make very precise searches, e.g. [“publication strategy” citations 2009...2011 ext:pdf].
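For those who like to script their searches, the combination of operators from tips 7 to 14 can be sketched in a few lines of Python. This is only an illustration: the function build_query and its parameter names are my own invention, not anything Google provides, and it uses plain straight quotes because that is what you type in the search box. The range separator defaults to the three dots used in tip 11; pass range_sep=".." if you prefer the commonly used two dots.

```python
def _quote(term):
    """Wrap multi-word terms in double quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term


def build_query(terms=(), any_of=(), exclude=(), title=None,
                site=None, ext=None, years=None, range_sep="..."):
    """Assemble a Google query string from the operators in the tips.

    Hypothetical helper: all names here are illustrative assumptions.
    """
    parts = [_quote(t) for t in terms]              # tips 6 and 7: terms and phrases
    if any_of:
        # tip 13: synonyms joined with a capitalised OR
        parts.append(" OR ".join(_quote(t) for t in any_of))
    if title:
        parts.append(f"intitle:{_quote(title)}")    # tip 8: title words
    if years:
        first, last = years
        parts.append(f"{first}{range_sep}{last}")   # tip 11: number range
    parts.extend(f"-{t}" for t in exclude)          # tip 12: excluded words
    if site:
        parts.append(f"site:{site}")                # tip 10: domain restriction
    if ext:
        parts.append(f"ext:{ext}")                  # tip 9: file type
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query(terms=["publication strategy", "citations"],
                      years=(2009, 2011), ext="pdf"))
    print(build_query(terms=["mercury"], exclude=["ford", "freddy"]))
    print(build_query(any_of=["carbon dioxide", "CO2"]))
```

The first call reproduces the combined search from tip 14; paste the printed string into the search box to try it.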

Google and the academic Deep Web

Hagedorn and Santelli (2008) just published an interesting article on the comprehensiveness of indexing of academic repositories by Google. This article triggers me to write up some observations I had been intending to make for quite some time already. It addresses the question I got from a colleague of mine, who observed that the deep web apparently doesn’t exist anymore.

Google has made a start with indexing Flash files. Google has made a start with retrieving information that is hidden behind search forms on the web, i.e. it has started to index information contained in databases. Google and OCLC exchange information on books scanned and those contained in WorldCat. Google, so it seems, has indexed the Web comprehensively, with 1 trillion indexed webpages. Could there possibly be anything more to be indexed?

The article by Hagedorn and Santelli shows convincingly that Google still has not indexed all the information that is contained in OAIster, the second largest archive of open access article information. Only Scientific Commons is more comprehensive. They tested this with the Google Research API, using the University Research Program for Google Search. They only checked whether the URL was present, so this approach reveals only part of the depth of the academic Deep Web. Those figures are staggering already, but reality bites even more.

A short while ago I taught a Web Search class for colleagues at the University Library at Leiden. To demonstrate what the Deep or Invisible Web actually constitutes, I used an example from their own repository. It was a thesis on Cannabis from last year, deposited as one huge PDF of 14 MB. Using Google you can find the metadata record, and with Google Scholar as well. However, if you search for a quite specific sentence from the opening pages of the actual PDF file, Google does not return the sought-after thesis. You find three other PhD dissertations, two of them defended at the same university on that same day, but not the one on Cannabis.
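If you want to repeat this kind of check for your own repository, the test boils down to an exact-phrase search for a sentence taken from the full text, optionally restricted to the repository's domain. Below is a minimal sketch in Python using only the standard library. The function name phrase_search_urls, the sample sentence and the domain repository.example.org are hypothetical placeholders of mine, and I am assuming the site: restriction is honoured by both search interfaces. The script only builds the search URLs; you still open them in a browser and judge the results yourself.

```python
from urllib.parse import urlencode


def phrase_search_urls(sentence, site=None):
    """Build exact-phrase search URLs for Google and Google Scholar.

    Hypothetical helper: the sentence is wrapped in double quotes so it is
    searched as a phrase; `site` optionally limits results to one domain.
    """
    # Remove embedded double quotes, which would break the phrase search.
    cleaned = sentence.replace('"', "").strip()
    query = f'"{cleaned}"'
    if site:
        query += f" site:{site}"
    params = urlencode({"q": query})  # percent-encodes quotes, spaces, colons
    return {
        "google": f"https://www.google.com/search?{params}",
        "scholar": f"https://scholar.google.com/scholar?{params}",
    }


if __name__ == "__main__":
    # Placeholder sentence and domain, for illustration only.
    urls = phrase_search_urls(
        "a quite specific sentence from the first pages of the thesis",
        site="repository.example.org",
    )
    for engine, url in urls.items():
        print(engine, url)
```

If the metadata record turns up but the full-text sentence does not, you are looking at the same gap as described above.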

Interestingly, you are able to find parts of the thesis in Google Scholar, e.g. chapter 2, chapter 3, etc. But those are the chapters of the thesis that have been published elsewhere in scholarly journals. Unfortunately, none of these parts in Google Scholar refers back to the original thesis that is in Open Access, nor have they been posted as OA journal article pre-prints in the Leiden repository. In Google Scholar most of the material is still behind toll gates at publishers’ websites.

Is Google to blame for this incomplete indexing of repositories? Hagedorn and Santelli do indeed point the finger at Google. However, John Wilkin, a colleague of theirs, doesn’t agree. Neither did Lorcan Dempsey. And neither do I.

I have taken an interest in the new role of librarians. We are no longer solely responsible for bringing external (documentary) resources into the realm of our academic clientele. We also have the dear task of bringing the fruits of their labour, as well as we can, into the floodlights of the external world, be it for academic or plain lay interest. We have to bring the information out there. Open Access plays an important role in this new task. But that task doesn’t stop at simply making it available on the Web.

Making it available is only a first, essential step. Making it rank well is a second, perhaps even more important step. So as librarians we have to become SEO experts. I have mentioned this here before, as well as on my Dutch blog.

So what to do about this chosen example from the Leiden repository? Well, there is actually a slew of measures that should be taken. First, of course, is to divide the complete thesis into parts, at chapter level. Admittedly, some publishers give permission to republish articles (of which most theses in the natural, or "beta", sciences in the Netherlands consist) only when the thesis is published as a whole. On the other hand, nearly 95% of the publishers allow publication of pre-prints and peer-reviewed post-prints, the so-called RoMEO green road. So it is up to the repository managers, preferably with the consent of the PhD candidate, to split the thesis into its parts (the chapters, which are the pre-prints or post-prints of articles) and archive the thesis at chapter level as well. The record for the thesis then contains links to the individual chapters deposited elsewhere in the repository. These far more digestible chunks of information make the record more palatable to search engine spiders and crawlers.

An interesting side effect of this additional effort on the repository side is that the deposit rates will increase considerably. This applies to most universities in the Netherlands, and to our own collection of theses as well. Since PhD students are responsible for the lion’s share of academic research at the University, depositing the individual chapters as article pre-prints in the repository will be of major benefit to the OA performance of the university. It will require more labour on the side of repository management, but if we take this seriously it is well worth the effort.

We still have to work really hard on the visibility of the repositories, but making the information more palatable is a good start.

Reference:
Hagedorn, K. and J. Santelli (2008). Google still not indexing hidden web URLs. D-Lib Magazine 14(7/8). http://www.dlib.org/dlib/july08/hagedorn/07hagedorn.html