Tuesday, August 14, 2012

Internet search: What makes it simple, difficult or impossible?

On the face of it you’d think that searching on a modern search engine such as Google is a pretty simple and straightforward skill.  And mostly, you’d be right.  It’s the exceptions that are interesting.

Every so often we’ve all had the experience of trying to find something that we just can’t quite seem to nail down.  You expect there’s a web page out there somewhere in the billions of possible pages, but you just can’t figure out what to do to make it pop to the top of the search results.  Even more strangely, you’ve probably also had a frustrating moment of being unable to find something, then told a friend about it, only to have the perfect result show up when they did exactly the right query. 

That's frustrating.  

Sometimes you (or your colleague) do just get lucky and manage to phrase the query just so. The good news is that search engines do a remarkably good job on the vast majority of searches. For the rest, you don’t have to rely on luck: that’s where the skill of search comes into play.

More than almost any other technology, search engines have transformed the way we do research:  papers and results that were previously undiscoverable (or only painfully and laboriously discoverable) now have become simple and quick to locate.  Where graduate students once slaved endlessly over a hot photocopier in library stacks, they can now run a search, locate the relevant papers, and all the papers those papers reference, and so on.  The scholarship of research really IS different these days.  It’s not just simpler, but also broader and deeper.  Given the same number of research hours, one can potentially reach much, much more.

What makes a search hard?  Back to that sticky search problem:  What causes difficult search tasks, and what can we do to work around the less obvious search problems?

Tough problems are often called “long tail” problems because they’re tasks that take more than the average number of searches to accomplish. As with most internet-related human behaviors, search tasks follow a power-law distribution in the number of searches needed to satisfy a search goal: most goals take only a search or two, while a small share take many.

If you’re just looking for the main web site of a university department, that’s pretty simple to do and takes only a search or two.  

But if you’re trying to understand the latest research findings on the early detection and treatment recommendations of autism, that’s a task that will take many searches over an extended period of time. Tasks like this sit well out in the long part of the tail.
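To make the long-tail shape concrete, here is a minimal sketch that models the number of searches per task with a discrete power law. The exponent (2.0) and the cutoff (100 searches) are assumptions picked purely for illustration; they are not measured values from any study.

```python
# Hypothetical illustration: model "searches needed per task" with a
# discrete power law, P(k) proportional to k ** -alpha.
# alpha and max_searches are assumed values, not measurements.

def power_law_pmf(alpha: float, max_searches: int) -> list:
    """Return normalized probabilities for k = 1..max_searches."""
    weights = [k ** -alpha for k in range(1, max_searches + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def tail_share(pmf: list, threshold: int) -> float:
    """Fraction of tasks that need more than `threshold` searches."""
    return sum(pmf[threshold:])

pmf = power_law_pmf(alpha=2.0, max_searches=100)
print(f"Tasks done in 1-2 searches: {sum(pmf[:2]):.1%}")
print(f"Tasks needing more than 10 searches: {tail_share(pmf, 10):.1%}")
```

Under these assumed numbers, most tasks finish in a search or two and only a small fraction land far out in the tail, which is the pattern described above.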

Difficult search tasks are difficult for a number of reasons.  First and foremost is an effect well known in cognitive psychology—the framing effect.  When a searcher initially conceives of a search task, the problem is sometimes framed in terms that are relevant to the searcher, but not necessarily in the language of the literature.  This most commonly happens when the searcher seeks out information in a domain in which they are not expert.  

A friend recently spent a great deal of time searching for the data set on which gestational diabetes blood sugar levels were established, but had no success until she determined that the appropriate word to use in searching was “pre-prandial” (as in “pre-prandial blood glucose levels”).  Once she discovered that word, the world of scholarly literature about pre-prandial glucose testing was easily found; without the key term, it was merely a long slog through general results.  In this case, the language she used to frame the question pre-disposed the search engine to the non-technical literature.  Once a key-term for a search task is found, an entirely new universe of results suddenly becomes open for inspection.

Luckily, this “key term” effect happens primarily in technical domains, far out in the long tail. In more common search tasks, many other people have followed search paths that have led to success. In these tasks, search engine automatic synonymization works very effectively to get the searcher to their results rapidly. While Google will synonymize “blood sugar” with “glucose,” too few people search for pre-prandial test norms for the term “pre-prandial glucose” to be suggested as a synonym for “blood sugar test.”

Another method for getting out of a framing mindset is to check out the “related searches” that other searchers have used. (Related searches are shown in the left-hand navigation panel or at the top of the organic results.) These searches, made by others working in the same domain, can often lead to useful re-framings of the search task. The query [ power law ] might have related searches such as [ pareto ], [ power distribution ], [ Zipf distribution ], or [ 80 20 rule ], all useful suggestions that might easily unstick a conceptual fixedness.

Practicing search works! Like many expert behaviors, search is one that rewards skill development, practice and attention. In our studies, we have shown that spending a modest amount of time learning the attendant skills of search pays off in much reduced search times and improved search accuracy.

For example, a crucial skill in reading a search-results page (indeed, any online document) is knowing how to Find a word on the page. In all browsers, this is the Control-F / Command-F / Edit>Find function, which locates a given word in the document. Surprisingly, our surveys show that roughly 90% of the US English-speaking population does NOT know this key skill. Once searchers learn it, many search tasks are significantly simplified, and on long documents (e.g., that 150-page PDF highly technical monograph you’re reading) the task of locating relevant information goes from extremely difficult to trivial.
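For readers who work with text in scripts, the same Find idea can be sketched in a few lines. This is only an illustration of what Control-F does conceptually, not how any browser implements it; the function name and the `context` parameter are made up for this example.

```python
def find_on_page(text: str, word: str, context: int = 30) -> list:
    """Return a short snippet around each case-insensitive match of `word`.

    Assumes lowercasing does not change the length of the text, which
    holds for ordinary English text.
    """
    if not word:
        return []  # an empty search term matches nothing useful
    snippets = []
    lowered, target = text.lower(), word.lower()
    start = lowered.find(target)
    while start != -1:
        begin = max(0, start - context)
        end = min(len(text), start + len(target) + context)
        snippets.append(text[begin:end])
        # Continue searching just past the current match.
        start = lowered.find(target, start + len(target))
    return snippets

page = "Crossword puzzles are fun. I solve a crossword daily."
for snippet in find_on_page(page, "crossword", context=10):
    print("...", snippet, "...")
```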

Similarly, the skill of scoping a search by limiting searches to a particular resource can be very useful. Some difficult searches suffer from having search terms that are too common, making it difficult for the searcher to separate the wheat from the chaff. For instance, suppose you have a vague recollection that there was an interesting article about crossword puzzles in some issue of the APS Observer. Doing a general web search for [ crossword ] is unlikely to bring any issue of the Observer into the top 10 results. Knowing how to limit your search to just the APS Observer website will bring all of the articles about crossword puzzles into immediate focus. The way to do this scoped search is by using the site: operator.

Example:  [site:psychologicalscience.org crossword ] 

This is a handy skill to have when you want to search different repositories.  (And of course, if your search comes up empty, be sure to try your search without the site: operator.)
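If you build searches in a script, the scoped query and its unscoped fallback can be sketched as below. The helper names are hypothetical, and the base search URL is an assumption about how the query would be submitted; the only part taken from the example above is the site: operator itself.

```python
from typing import Optional
from urllib.parse import urlencode

def scoped_query(terms: str, site: Optional[str] = None) -> str:
    """Build the query text, adding a site: restriction when one is given."""
    return f"site:{site} {terms}" if site else terms

def search_url(query: str, base: str = "https://www.google.com/search") -> str:
    """Encode the query as a URL. The base URL is an assumed default."""
    return f"{base}?{urlencode({'q': query})}"

# Scoped search first, limited to one site.
scoped = scoped_query("crossword", site="psychologicalscience.org")
print(search_url(scoped))

# Fallback: the same terms without the site: operator.
print(search_url(scoped_query("crossword")))
```

Keeping the site restriction as an optional argument makes the fallback a one-line change when the scoped search comes up empty.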

Something you might not have thought about: The surgical scalpel of keyword choice cuts both ways, sometimes pushing your search investigation into a particular context that you might not have considered. For example, when searching for a door lever to be put onto a child’s door handle to simplify door-opening, including the search term “child” automatically puts your searches into the realm of children. If you’re searching for a typical childhood disease or disorder, that’s usually good, but in the context of doorknobs and handle extensions, queries like [child door knob] are dominated by the much more typical problem of preventing children from opening particular doors. For this particular search, rethinking the problem in terms of other use cases leads to a better search experience. For instance, older people also have problems opening doors because of reduced grip strength and imprecise motor movements. The search [elder door knob] gives much better results for this particular search task of looking for aids for door opening. The important thing to remember is that web search operates over the entire web, which might well include topics and areas you might not be considering when you form an initial concept for the search.

Everything changes... constantly: Of course, a key thing to remember about web search is that both the contents of the web and what search engines can do to process that content are constantly under revision. What this means for you is that search is a skill like any other. It is useful for professionals to pay attention to new content resources as they come online (that is, become accessible through search engines), to watch for new search capabilities (such as the ability to search realtime streams for breaking news on current events), and to look for new ways of viewing the results of searches (such as timeline views of search results). There are classes, information streams and resources available for staying on top of what’s going on. To be the best possible searcher, you need to make time to track these new capabilities as well as understand the entailments of what’s possible.

Keep learning:  With the inexorable and rapid transformation of the world-wide web into a resource of incredible depth and breadth, you owe it to yourself to stay in touch as new materials and new tools transform research problems from very difficult or impossible to quick and simple tasks.  The web changes, so do the tools; keep learning about what’s possible.

Originally written for the APS “Observer” journal.   
Link to original post. Edited for publication in this blog.

Photo credit: Sybren A. Stüvel.  Thanks!


  1. From Henry IV -
    "As that ungentle hull, the cuckoo's bird,
    Useth the sparrow; did oppress our nest;
    Grew by our feeding to so great a bulk
    That even our love durst not come near your sight
    For fear of swallowing; but with nimble wing
    We were enforced, for safety sake, to fly
    Out of sight and raise this present head;
    Whereby we stand opposed by such means
    As you yourself have forged against yourself
    By unkind usage, dangerous countenance,
    And violation of all faith and troth
    Sworn to us in your younger enterprise."

    From "The earliest House Sparrow introductions to North America" .pdf

    "In 1850, Nicholas Pike was the director of the Brooklyn Institute (Barrows 1889), so we have every reason to believe that he had direct knowledge of this first introduction. Thus, Pike’s report to Barrows constitutes an actual historical record, of sorts."

    So, it was Nicholas Pike, Director of the Brooklyn Institute, who established the English sparrow in North America, in the 1850's, to eat insects. They grew to such great numbers that they almost wiped out the bluebird. Both birds use the same type of nesting habitats.

    I started by looking up bluebirds and found out they had been almost driven to extinction by English Sparrows. I then looked up sparrow in Henry IV. Then found the PDF of "The earliest House Sparrow introductions to North America" and found out the history.

  2. Locating and using a glossary is useful to help a searcher locate search terms that are the "language of the literature." Another useful tool is to build a Mind Map using WikiMindMap http://www.wikimindmap.org.

  3. Interesting analysis!

    One reason why it's hard to find comprehensive information on the web has to do with the way that information is scattered across pages. Our work in healthcare suggests that facts also follow a power-like distribution, with few pages containing many facts and many pages containing few facts. Furthermore, no single page contains all relevant facts, which means that users need to visit multiple pages to get comprehensive information.

    But even that doesn't completely capture the complexity, because the extent to which facts are covered on pages also varies. We found three types of pages: general pages, which contain many facts in low-to-medium detail, specific pages, which contain few facts in great detail, and sparse pages, which contain few facts in low detail. The existence of these "page types" suggests a particular order in which pages should be visited, usually from general to specific. This search procedure is well-known by librarians and search experts, but is not known by novices, nor is it captured in the search for "the right keywords". Our work suggests that one way to help users find comprehensive information is to make such procedures salient to novices.

    Here's a few papers, if you're interested...
    On making search procedures salient: http://www.skbhavnani.com/DIVA/papers/Bhavnani_et_al_JASIST_2006.pdf
    On information scatter: http://www.skbhavnani.com/DIVA/papers/Bhavnani-Peck-JASIST-2010.pdf