Lab 05 - Search me

More and more, searching is how we interact with our web browsers. As the world of information available to us through this interface expands, it can become more difficult to find what you're looking for. Search engines use programs such as web crawlers to gather information about webpages and websites, and they organize that information with a technique called indexing. But why should we, as end-users, care about the technical details of how search systems work? We talked about the "filter bubble" in Lab 01 and how it might affect which results we see when we search. Understanding the limits of our searches and their results gives us tools to use when the results aren't what we expected (or even what we really wanted).
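
If you are curious what indexing can look like, here is a minimal sketch in Python. It is only an illustration and not how any real search engine is built: the three page names and their text are made up, and the functions build_index and search are hypothetical names chosen for this example. The idea is that the index maps each word to the pages that contain it, so a search looks up words instead of rereading every page.

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def search(index, query):
    """Return the pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Made-up pages, for illustration only.
pages = {
    "coffee.example": "java is a nickname for coffee",
    "code.example": "java is a programming language",
    "island.example": "java is an island in indonesia",
}
index = build_index(pages)
print(sorted(search(index, "java coffee")))  # prints ['coffee.example']
```

Notice that a search for "java" alone matches all three pages. The index by itself cannot tell which "java" you meant, which is one reason ranking results is a separate and much harder problem.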

One thing you might notice about some of the information in the book is that it may have changed by the time you work through the book's examples. Providing search functionality is a valuable and profitable business, and updates that maintain that level of profitability happen more frequently than ever in the competitive search landscape. Part of being fluent in information technology is to understand the rapid pace of change and to use what you have learned before to take advantage of what is new.

For optimal search experiences, the goal is to deliver what librarians call precision (how many of the results returned are actually relevant) and recall (how many of the relevant items that exist are actually returned) while still having the most relevant results show up at the top. Search engines employ extraordinarily complex algorithms to deliver those optimized results, and as the scope of information changes, so must the algorithms. Similarly, look-and-feel changes can highlight or hide functionality as the search provider updates its tool.
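
Precision and recall can be worked out with simple arithmetic. The sketch below assumes you already know which pages are relevant, which in practice is a judgment call made by a person. The page names and the function precision_and_recall are made up for this illustration.

```python
def precision_and_recall(returned, relevant):
    """Compare what a search returned against what was actually relevant."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant  # results that were both returned and relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Made-up example: the tool returned four pages, and three pages were relevant.
returned = ["page1", "page2", "page3", "page4"]
relevant = ["page1", "page3", "page5"]

p, r = precision_and_recall(returned, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # prints precision=0.50 recall=0.67
```

Here two of the four returned pages were relevant (precision of one half), and two of the three relevant pages were found (recall of two thirds). A tool could raise its recall by returning more pages, but that usually lowers its precision.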

Why does this matter? Because you may develop a routine around searching and not realize your access to certain parts of the search tool has changed, or the tool works in a different way than you have been used to. Keeping a critical eye on any tool you use is a good idea in this environment of change.

To consider:

Putting search tools to work: compare and contrast

In this fifth lab, you'll enter the same search terms across a variety of web search tools, including bing, Yahoo!, Google, and Wolfram Alpha. We'll also take a closer look at other search tools, such as your library's catalog. As you explore, review your answers from the To consider: section above. Have you changed your mind about any of them?

Learning more

  1. Let's begin by comparing the web-based search tools to each other. Try the following searches in bing, Yahoo!, Google, Google France, and Wolfram Alpha. Enter the search terms exactly as they are shown below:
          chocolate rain
          apollo 13
          123852/238.6
          convert 100 euros to us dollars 
          how do I change the battery in my laptop?
          poisson
          translate poisson
          a modest proposal
          the prince
          java
          art
        

    • What did you notice about the "type ahead" or "suggested search" features (if present) that try to complete your search for you? Were they always helpful? Do you know where they come from? Could you turn that feature off if you wanted to?
    • How do the search engines differ in their results? Why do you think they do? If you were a programmer working for one of these search companies, why would you choose certain search results as more relevant than others? How could you get those results to come up earlier in the "hit list" or results set?
    • Many times the first page of results will look similar from search tool to search tool. Take a look at the second pages of the search results for the search "java". Do the search tools' results sets begin to diverge (get less alike), or converge (get more alike)? How many of the results on the first and second pages are related to commercial products? Is the ratio of commercial sites different on the second page compared to the first? Is the ratio different from one search tool to another?
    • Were there any results you were surprised by? Which ones, if any?
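
    If you want to put numbers on the "diverge or converge" and commercial-ratio questions above, one possible approach is sketched below in Python. It assumes you copy the result sites from each tool by hand and decide for yourself which ones are commercial. The site names, the lists, and the functions overlap and commercial_ratio are all made up for illustration. The overlap measure used here is the Jaccard index: shared sites divided by all distinct sites.

```python
def overlap(results_a, results_b):
    """Jaccard index: shared sites divided by all distinct sites (0.0 to 1.0)."""
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 1.0  # two empty lists are treated as identical
    return len(a & b) / len(a | b)

def commercial_ratio(results, commercial_sites):
    """Fraction of results that you have marked as commercial."""
    if not results:
        return 0.0
    return sum(site in commercial_sites for site in results) / len(results)

# Made-up first-page results from two hypothetical tools.
tool_a_page1 = ["site-a.example", "site-b.example", "site-c.example"]
tool_b_page1 = ["site-a.example", "site-c.example", "site-d.example"]

# Sites you judged to be commercial (your own call, not a fact from the tool).
commercial = {"site-a.example", "site-d.example"}

print(overlap(tool_a_page1, tool_b_page1))         # prints 0.5
print(commercial_ratio(tool_b_page1, commercial))  # two of the three sites
```

    Repeating the overlap calculation for the second page of results lets you compare the two numbers: a lower value on page two suggests the tools are diverging.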

  2. Now take the same search terms and enter them into a few library search tools. You can use our own institution's library, or you can search the New York Public Library, Arizona State University's library catalog, or ASU Library's discovery layer service, Library One Search.

    • What were the similarities and the differences between the web-based search tools and the library search tools?
    • Were the differences in the results what you expected? Why or why not?
    • Which tool was the best at each search? Were some tools just as good as the others, depending on the search?
    • Find the advanced search functions for each search tool. Could any of these options have helped you get better results? Which ones?
    • Which search tools had ads at the top of the results list? Was it easy to tell they were ads?
    • What did the search tools assume about your search for "java"? Which "java" did you think you were searching for? Was it one of the listings on Wikipedia's disambiguation page for "java"? Do you think any of the results were affected by your own "filter bubble"? Why or why not?
    • Were any commercial results in the results sets you got from the library? How do you know? Were there any resources you were not allowed to get to because you were not a library patron? Could you get to those resources legally and for free anywhere else? Where?

  3. We've taken a look at the differences in results we can get just by changing which search tool we use. Now we need to think about what exactly it is we're getting. How do you know when the site you click on has accurate information? How do you know when it's trying to sell you something, or when it's trying to convince you of a particular point of view?

    Try IT: critically assessing results
    One way to determine whether or not a website is giving you reliable and accurate information is to examine it closely. Who is responsible for the site? Is it easy to contact them? How recently was the site updated? Do other sites that you know are credible link to that site?

    • Imagine you are doing some research for a paper you are going to write on Dr. Martin Luther King, Jr. and his work. Go to a search tool like Google, and type in "Martin Luther King" (without the quotes). What are some of the "type ahead" suggestions?
    • Take a look at the results set. Which of these sites look legitimate? Do any of these sites look like they might not be legitimate? Why or why not?
    • Find the entry for the site martinlutherking.org. The link text reads "Martin Luther King Jr. - A True Historical Examination". Click on the link to the site.
    • Without clicking on any of the links, look at everything on the homepage of the site, even the small print at the bottom. What is your first impression?
    • Now click a few links. Has your impression changed? Why or why not?
    • If you haven't yet, click on the "Hosted by Stormfront" link at the bottom of the page. Is the page you get to what you expected?
    • Can you tell some things about the Stormfront organization from the logo on the page, and from the forum topics? Do you think this organization is able to give accurate information about Martin Luther King Jr.? Explain.
    • How can you discover whether or not the information on the martinlutherking.org site is accurate? Where could you corroborate or disprove the information you find there?
    • Go back to your search tool and type in "martinlutherkingorg" (without the quotes). What are some of the "type ahead" suggestions? Does that tell you anything about the site itself? How would you find out how those search suggestions are made?

    The barriers to publishing a website are not very high. As you may recall from Lab 04, you can do it yourself for free. What does that mean for the accuracy of the information you find on the web? What are some things to keep in mind as you search the web, not just for papers you need to write, but for other things you might need to do? How do you know the person tweeting as Lady Gaga is really her? If you don't know for sure, how would you find out?

Moving on

Web-based search tools are capable of doing more than just search, and are handling natural language search requests with more and more accuracy. Still, searching is a bit more complex behind the scenes than we may have realized, and we may also not realize what factors affect any given set of search results. We can remain hopeful that search tools will usually give us what we want. But we need to know what it looks like when they don't, which, as you may now know, can be more subtle than expected. What techniques will you use to evaluate your search results, and what will you do when they don't meet your expectations?