May 5, 2010 searchuserinterfaces.com
Feb 9, 2010 searchuserinterfaces.com
In user-centered design, decisions are made based on responses obtained from target users of the system. (This is in contrast with standard software practice in which the designers assume they know what users need, and so write the code first and assess it with users later.) In user-centered design, first a needs assessment is performed in which the designers investigate who the users are, what their goals are, and what tasks they have to complete in order to achieve those goals. The next stage is a task analysis in which the designers characterize which steps the users need to take to complete their tasks, decide which user goals they will attempt to support, and then create scenarios which exemplify these tasks being executed by the target user population (Kuniavsky, 2003, Mayhew, 1999).
Once the target user goals and tasks have been determined, design is done in a design-evaluate-redesign cycle consisting of creating prototypes, obtaining reactions from potential users, and revising the designs based on those reactions. This sequence of activities often needs to be repeated several times before a satisfactory design emerges. Evaluation at this phase can often achieve useful results by testing with only a few participants, so the evaluation method used at this point in the design space is often referred to as “discount” usability testing (Nielsen, 1989b). After a design is testing well in discount or informal studies, formal experiments comparing different designs and measuring for statistically significant differences can be conducted.
This iterative procedure is necessary because interface design is still more of a practice than a science. There are usually several good solutions within the interface design space, and the task of the designers is to navigate through the design space until reaching some “local optimum.” The iterative process allows study participants to help the designers make decisions about which paths to explore in that space. Experienced designers often can begin the design near a good part of the solution space; less experienced designers need to do more exploration. Designing for an entirely novel interaction paradigm often requires more iteration and experimentation. Evaluation is part of every cycle of the user-centered design process.
Shneiderman and Plaisant, 2004 identify five components of usability, restated by Nielsen, 2003b as:
Jan 21, 2010 searchuserinterfaces.com
The second important class of query reformulation aids is automatically suggested term refinements and expansions. Spelling correction suggestions are also query reformulation aids, but the phrase term expansion is usually applied to tools that suggest alternative words and phrases. In this usage, the suggested terms are used to either replace or augment the current query. Term suggestions that require no user input can be generated from characteristics of the collection itself (Schütze and Pedersen, 1994), from terms derived from the top-ranked results (Anick, 2003, Bruza and Dennis, 1997), from a combination of both (Xu and Croft, 1996), from a hand-built thesaurus (Voorhees, 1994, Sihvonen and Vakkari, 2004), from query logs (Cui et al., 2003, Cucerzan and Brill, 2005, Jones et al., 2006), or by combining query logs with navigation or other online behavior (Parikh and Sundaresan, 2008).
Usability studies are generally positive as to the efficacy of term suggestions when users are not required to make relevance judgements and do not have to choose among too many terms. Some studies have produced negative results, but these seem to stem from problems with how the suggestions were presented. Users generally seem not to want to reformulate their queries by selecting multiple terms, yet many researchers have presented study participants with multiple-term selection interfaces.
For example, in one study by Bruza et al., 2000, 54 participants were exposed to a standard Web search engine, a directory browser, and an experimental interface with query suggestions. This interface showed upwards of 40 suggested terms and hid the results listing until after the participant selected terms. (The selected terms were conjoined to those in the original query.) The study found that automatically generated term suggestions resulted in higher average precision than using the Web search engine, but with a slower response time and the penalty of a higher cognitive load (as measured by performance on a distractor task). No subjective responses were recorded. Another study using a similar interface and technology found that users preferred not to use the refinements, going straight to the search results instead (Dennis et al., 1998). This underscores the search interface design principle that search results should be shown immediately after the initial query, alongside additional search aids.
Interfaces that allow users to reformulate their query by selecting a single term (usually via a hyperlink) seem to fare better. Anick, 2003 describes the results of a large-scale investigation of the effects of incorporating related term suggestions into a major Web search engine. The term suggestion tool, called Prisma, was placed within the AltaVista search engine's results page (see Figure 6.1). The number of feedback terms was limited to 12 to conserve space in the display and minimize cognitive load. Clicking on a hyperlink for a feedback term conjoined the term to the current query and immediately ran a new query. (The chevron (>>) to the right of the term replaced the query with the term, but its graphic design did not make it clearly clickable, and few searchers used it.) Term suggestions were derived dynamically from an analysis of the top-ranked search results.
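The sketch below shows the general idea of deriving feedback terms from the top-ranked results and conjoining a clicked term to the query. It is not Prisma's actual extraction method. It simply ranks words by how many top-ranked snippets contain them. The stopword list, the function names (`suggest_terms`, `refine`, `replace_query`), and any sample snippets are hypothetical. Only the cap of 12 terms and the conjoin/replace behaviors come from the description above.

```python
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for illustration.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "is", "with"}
MAX_SUGGESTIONS = 12  # Prisma displayed at most 12 feedback terms


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def suggest_terms(query: str, top_snippets: list[str], k: int = MAX_SUGGESTIONS) -> list[str]:
    """Rank candidate terms by the number of top-ranked snippets that contain them."""
    query_terms = set(tokenize(query))
    doc_freq: Counter = Counter()
    for snippet in top_snippets:
        # Count each term once per snippet, skipping query words and stopwords.
        doc_freq.update(set(tokenize(snippet)) - query_terms - STOPWORDS)
    # Higher snippet frequency first; ties broken alphabetically for stable output.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def refine(query: str, term: str) -> str:
    """Clicking a feedback term conjoins it to the current query."""
    return f"{query} {term}"


def replace_query(query: str, term: str) -> str:
    """The chevron (>>) replaces the current query with the term."""
    return term
```

Take the query jaguar and the invented snippets "Jaguar cars and the jaguar cat", "Jaguar car dealers", and "Big cat species: jaguar, leopard". Here cat ranks first because it occurs in two snippets. A real system would also need phrase extraction and stronger filtering than a stopword list.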
The study created two test groups by serving different Web pages to different IP addresses (using bucket testing, see Chapter 2). One randomly selected set of users was shown the Prisma terms, and a second randomly selected set of users was shown the standard interface, to act as a control group. Analysis was performed on anonymized search logs, and user sessions were estimated to be bursts of activity separated by 60 minutes of no recorded activity. The Prisma group was shown query term refinements over a period of five days, yielding 15,133 sessions representing 8,006 users. The control group included 7,857 users and 14,595 sessions. Effectiveness of the query suggestions was measured in terms of whether or not a search result was clicked after the use of the mechanism, as well as whether or not the session ended with a result click.
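The session rule used in this log analysis (bursts of activity separated by 60 minutes without activity) can be made concrete in a few lines. This is a minimal sketch with three assumptions:

- one user's (or one IP address's) timestamps have already been grouped together;
- a gap of exactly 60 minutes starts a new session;
- `split_sessions` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=60)  # inactivity gap used to bound sessions


def split_sessions(timestamps: list[datetime], gap: timedelta = SESSION_GAP) -> list[list[datetime]]:
    """Group one user's event times into sessions; a gap >= `gap` starts a new session."""
    sessions: list[list[datetime]] = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] < gap:
            sessions[-1].append(t)  # still within the current burst of activity
        else:
            sessions.append([t])    # first event, or activity resumed after a long gap
    return sessions
```

Counting sessions per group in this way is what yields totals like the 15,133 and 14,595 sessions reported above. The choice of gap directly changes those counts.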
In the Prisma group, 56% of sessions involved some form of refinement (which includes manual changes to the query without using the Prisma suggestions), compared to 53% of the control group's sessions, which was a significant difference. In the Prisma condition, of those sessions containing refinements:
Despite the large degree of uptake, effectiveness when measured in the occurrence of search results clicks did not differ between the baseline group and the Prisma group. However, the percentage of clicks on Prisma suggestions that were followed immediately by results clicks was slightly higher than the percentage of manual query refinements followed immediately by results clicks.
This study also examined the frequency of different refinement types. The most common refinements were:
In a more recent study, White et al., 2007 compared a system that makes term suggestions against a standard search engine baseline and two other experimental systems (one of which is discussed in the subsection below on suggesting popular destinations). Query term suggestions were computed using a query log. For each query, queries from the log that contained the query terms were retrieved. These were divided into two sets: the 100 most frequent queries containing some of the original terms, and the 100 most frequent queries that followed the target query in query logs, that is, user-generated refinements. These candidates were weighted by their frequency in each of the two sets, and the six top-scoring candidates were shown to the user after they issued the target query. Suggestions were shown in a box on the top right-hand side of the search results page.
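The candidate-generation steps can be sketched as follows. This is a rough illustration, not White et al.'s code. It makes two interpretive assumptions:

- "containing some of the original terms" means sharing at least one word with the target query;
- the combined weight is the sum of a candidate's frequencies in the two sets, since the exact weighting is not given here.

The log structures (`query_counts`, `followers`) and the function name are hypothetical.

```python
from collections import Counter

POOL_SIZE = 100  # size of each of the two candidate sets
NUM_SHOWN = 6    # number of suggestions displayed to the user


def suggest_queries(target: str, query_counts: Counter, followers: dict,
                    pool_size: int = POOL_SIZE, k: int = NUM_SHOWN) -> list[str]:
    """Combine frequent related queries with frequent follow-up queries from a log.

    query_counts: how often each query string appears in the log (hypothetical format).
    followers: maps a query to a Counter of queries issued immediately after it.
    """
    target_terms = set(target.lower().split())
    # Set 1: most frequent logged queries sharing at least one term with the target.
    related = Counter({q: c for q, c in query_counts.items()
                       if q != target and target_terms & set(q.lower().split())})
    set1 = related.most_common(pool_size)
    # Set 2: most frequent queries that followed the target (user-generated refinements).
    set2 = followers.get(target, Counter()).most_common(pool_size)
    # Assumed weighting: sum each candidate's frequency across the two sets.
    scores: Counter = Counter()
    for query, count in set1 + set2:
        if query != target:
            scores[query] += count
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [q for q, _ in ranked[:k]]
```

A candidate that appears in both sets, such as a common refinement that also shares the target's terms, accumulates weight from each. This favors queries that are both popular and frequently used as reformulations.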
White et al., 2007 conducted a usability study with 36 participants, each doing two known-item tasks and two exploratory tasks, and each using the baseline system, the query suggestions, and two other experimental interfaces. For the known-item tasks, the query suggestions scored better than the baseline on all measures (easy, restful, interesting, etc.). Participants were also faster using the query suggestions than the baseline on known-item tasks (although tied with one experimental system), and made use of the query suggestions 35.7% of the time. Participants who preferred this query suggestion interface said it was useful for saving typing effort and for coming up with new suggestions. (The experimental system for suggesting destinations was more effective and preferred for exploratory tasks.)
In the BioText project, Divoli et al., 2008 experimented with alternative interfaces for term suggestions in the specialized technical domain of searching over genomics literature. They focused specifically on queries that include gene names, which are commonly used in bioscience searches, and which have many different synonyms and forms of expression. Divoli et al., 2008 first issued a questionnaire in which they asked 38 biologists what kind of information they would like to see in query term suggestions, finding strong support for gene synonyms and homologues. Participants were also interested in seeing information about genes associated with the target gene, and localization information for genes (where they occur in organisms). It should be noted that a minority of participants were strongly opposed to showing additional information, unless it was shown as an optional link, in order to retain an uncluttered look to the interface.
A follow-up survey was conducted in which 19 participants from biology professions were shown four different interface mock-ups (see Figure 6.2). The first had no term suggestions, while the other three showed term suggestions for gene names, organized into columns labeled by similarity type (synonyms, homologues, parents, and siblings of the gene). Because participants had expressed a desire for reduced clutter, at most three suggestions per column were shown, with a link to view all choices.
Design 2 required selection of the choices by individual hyperlink, with an option to add all terms. Design 3 allowed the user to select individual choices via checkboxes, and Design 4 allowed selection of all terms within a column with a single hyperlink. Design 3 was most preferred, with one participant suggesting that the checkbox design also include a select-all link within each column. Designs 4 and 2 received similar ratings, and all were strongly preferred over no synonym suggestions. These results suggest that for specialized and technical situations and users, term suggestions can be even more favored than in general Web search.
The results of the Anick, 2003 and the White et al., 2007 studies are generally positive, and currently many Web search engines offer term refinement. For example, the Dogpile.com metasearch engine shows suggested additional terms in a box on the right-hand side under the heading “Are you looking for?” (see Figure 6.3). A search on apple yields term suggestions of Apple the Fruit (to distinguish it from the computer company and the recording company), Banana, Facts about Apples, Apple Computers, Red Apple and others. Selecting Apple the Fruit retrieves Web pages that are about that topic, and the refinements change to Apple Varieties, Apple Nutrition, History Fruit Apple, Research on Fruit, Facts about the Fruit Apple, and others. Clicking on Facts about the Fruit Apple retrieves Web pages containing lists of facts.
The Microsoft search site also shows extensive term suggestions for some queries. For instance, a query on the ambiguous term jets yields related query suggestions including Jet Magazine, Jet Airways, JetBlue, Fighter Jets, Jet Li and Jet Stream (see Figure 5.8 in Chapter 5).
Jansen et al., 2007b studied 2.5M interactions (1.5M of which were queries) from a log taken in 2005 from the Dogpile.com search engine. Using their computed session boundaries (mean length of 2.31 queries per session), they found that more than 46% of users modified their queries, 37% of all queries were parts of reformulations, and 29.4% of sessions contained three or more queries. Within the sessions that contained reformulated queries, they found the following percentage of actions for query modifications (omitting statistics for starting a new topic):
(Here, collections refers to searching Web pages versus searching images, videos, or audio data.) Thus, they found that 8.4% of all queries were generated by the reformulation assistant provided by Dogpile (see Figure 6.3), although they do not report on what proportion of queries were offered refinements. This is additional evidence that query term refinement suggestions are a useful reformulation feature. A recent study on Yahoo's search assist feature (Anick and Kantamneni, 2008) found similar results; the feature was used about 6% of the time.
White et al., 2007 suggested another kind of reformulation information: showing popular destination Web sites. They recorded search activity logs for hundreds of thousands of users over a period of five months in 2005--2006. These logs allowed them to reconstruct the series of actions that users made from going to a search engine page, entering a query, seeing results, following links, and reading web pages. They determined when such a session trail ended by looking for a stoppage, such as staying on a page for more than 30 minutes, or a change in activity, such as switching to email, or going to a bookmarked page. They distinguished session trails from query trails; the latter had the same stopping conditions as the former, but could also be ended by a return to a search engine page. Thus they were able to “follow” users along as they performed their information seeking tasks.
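The two stopping rules can be expressed as a small segmentation routine. This is a minimal sketch, assuming a hypothetical log format in which each event has a time in seconds and one of three kinds:

- "search" for a search engine page;
- "page" for any other page view;
- "switch" for an activity change such as opening email or a bookmarked page.

It also assumes that a trail starts at a search engine page and that events outside any trail are dropped. The actual logging and segmentation used by White et al. may differ.

```python
from dataclasses import dataclass

DWELL_LIMIT = 30 * 60  # seconds; a stoppage longer than 30 minutes ends a trail


@dataclass
class Event:
    time: int      # seconds since the start of the log (hypothetical format)
    kind: str      # "search", "page", or "switch" (e.g., email, bookmarked page)
    url: str = ""


def split_trails(events: list[Event], query_trails: bool) -> list[list[Event]]:
    """Split one user's time-ordered events into session trails or query trails."""
    trails: list[list[Event]] = []
    current: list[Event] = []
    for event in events:
        stalled = bool(current) and event.time - current[-1].time > DWELL_LIMIT
        switched = event.kind == "switch"
        # Query trails (but not session trails) also end on a return to the search engine.
        back_to_search = query_trails and bool(current) and event.kind == "search"
        if stalled or switched or back_to_search:
            if current:
                trails.append(current)
            current = []
        if switched:
            continue  # the activity change closes a trail but belongs to none
        if current or event.kind == "search":
            current.append(event)  # a trail starts at a search engine page
    if current:
        trails.append(current)
    return trails
```

Run on the same events, the query-trail mode produces more, shorter trails than the session-trail mode. Each return to the search engine starts a new query trail but extends the current session trail.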
White et al., 2007 found that users generally browsed far from the search results page (around 5 steps), and that on average, users visited 2 unique domains during the course of a query trail, and just over 4 domains during a session trail. They decided to use the information about which page the users ended up at as a suggestion for a shortcut for a given query. Given a new query, its statistical similarity to previously seen query-destination pairs was computed, and popular final destinations for that query were then shown as a suggested choice (see Figure 6.4). They experimented with suggestions from both query trails and session trails.
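One way to realize this lookup is to weight each logged destination by the similarity between the new query and the query that led there. The sketch makes three assumptions:

- similarity is cosine similarity over word counts;
- the input is a hypothetical list of (past query, final destination, count) triples mined from trails;
- scores from all matching past queries are summed.

The similarity measure White et al. actually used may differ.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def suggest_destinations(query: str, trail_ends: list[tuple[str, str, int]],
                         k: int = 3) -> list[str]:
    """Score each final destination by similarity-weighted popularity.

    trail_ends: hypothetical (past query, final destination, count) triples mined from trails.
    """
    q_vec = Counter(query.lower().split())
    scores: Counter = Counter()
    for past_query, destination, count in trail_ends:
        sim = cosine(q_vec, Counter(past_query.lower().split()))
        if sim > 0:
            scores[destination] += sim * count  # popular and similar destinations score highest
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [dest for dest, _ in ranked[:k]]
```

Because the output is only a list of sites, it carries the vagueness participants complained about in the study. Pairing each destination with a query-biased summary would address that.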
In the same study of 36 participants, they compared these two experimental approaches against a standard search engine baseline and a query suggestions interface, testing on both known-item tasks and exploratory tasks. For exploratory tasks, the destination suggestions from the query trails scored better than the other three systems on perceptions of the search process (easy, restful, interesting, etc.) and on usefulness (perceived as producing more useful and relevant results). The task completion time on exploratory tasks was approximately the same for all four interfaces; in known-item tasks, the destination suggestions were tied in speed with the query term suggestions. In exploratory tasks, query trail destination suggestions were used more often (35.2% of the time) than query term suggestions or session trail destination suggestions.
Participants who preferred the destination suggestions commented that they provided potentially helpful new areas to look at, and allowed them to bypass the need to navigate to pages. They suggested that destinations were selected because they “grabbed their attention,” “represented new ideas,” or users “couldn't find what they were looking for.” Those who did not like the suggestions stated as a reason the vagueness of showing only a Web site; presumably augmenting the destination views with query-biased summaries would make them more useful. The destination suggestions produced from session trails were sometimes very good, but were inconsistent in their relevance, a characteristic which is usually perceived negatively by users. The participants did not find the graphical bars indicating site popularity to be useful, mirroring other results of this kind.
Chapter 6 discusses interfaces for suggesting terms to augment the user's query after they have received results. More recently, interfaces have appeared that suggest query terms dynamically, as the user enters them. In some cases, these dynamic term suggestions appear before the searcher has seen any retrieval results, and in others, the system dynamically shows documents that match the characters typed so far, adjusting the results list as more characters are typed. Dynamic query term suggestions (sometimes referred to as auto-suggest, autosuggest, or search-as-you-type) are a promising intermediate solution between requiring the user to think of terms of interest (and how to spell them) and navigating a long list of term suggestions.
Some dynamic term suggestion systems show only query suggestions whose prefix matches what has been typed so far. Figure 4.6 shows an example from Microsoft's dynamic query suggestions interface. It shows frequent queries whose first word begins with the prefix typed so far, canc, including cancer, cancun weather, and cancel. Dynamic query suggestions are not restricted to matching the prefix of the query alone. For instance, at eBay, typing the letter d in the query form shows suggestions such as doorbusters, digital cameras, and ds lite. Continuing to type do shows suggestions like doorbusters, dooney bourke, and doll. Web search engines today provide similar functionality in their toolbars.
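The prefix-matching case can be implemented with a sorted list of logged queries and binary search, because all strings that share a prefix sit next to each other in sorted order. This is a minimal sketch under three assumptions:

- query frequencies are hypothetical;
- matching is done only at the start of the whole query string;
- matches are ranked by frequency.

Production systems typically use tries or precomputed top-k lists per prefix so that each keystroke is answered quickly.

```python
import bisect


class PrefixSuggester:
    """Suggest logged queries that begin with the typed prefix, most frequent first."""

    def __init__(self, query_counts: dict[str, int]):
        self.counts: dict[str, int] = {}
        for query, count in query_counts.items():
            key = query.lower()
            self.counts[key] = self.counts.get(key, 0) + count  # merge case variants
        self.sorted_queries = sorted(self.counts)

    def suggest(self, prefix: str, k: int = 5) -> list[str]:
        prefix = prefix.lower()
        # Binary search to the first query >= prefix; all matches follow contiguously.
        i = bisect.bisect_left(self.sorted_queries, prefix)
        matches = []
        while i < len(self.sorted_queries) and self.sorted_queries[i].startswith(prefix):
            matches.append(self.sorted_queries[i])
            i += 1
        # Most frequent first; ties broken alphabetically.
        matches.sort(key=lambda q: (-self.counts[q], q))
        return matches[:k]
```

Take made-up counts of cancer 90, canada 60, cancun weather 40, cancel 25, and candy 10. With these, the prefix canc returns cancer, cancun weather, and cancel in that order, and canada is excluded.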
The dynamic query suggestion approach is consistent with the guidelines for dynamic queries proposed by Shneiderman, 1994. Although no usability studies have been done for this kind of interface, a large log study by Anick and Kantamneni, 2008 examined four distinct days over a 17-week period and 100,000 users. It found that users clicked on the dynamic suggestions in the Yahoo Search Assist tool in 30--37% of sessions (see Figure 1.4 in Chapter 1). The rapid spread of this facility suggests that dynamic real-time term suggestions are becoming the norm.
White and Marchionini, 2007 performed a study on a similar interaction method, which they call real time query expansion (see Figure 4.7). After the user types a word and presses the space bar, the system queries a Web search engine and extracts terms from the surrogates for the 10 top-ranked documents. The top 10 term suggestions are shown after the first term is typed. The user can select one or more of the suggested terms by double-clicking them, or ignore the suggestions. This process continues, with the system suggesting additional terms after each word is entered, until the query is completed (by pressing the Return key). Thus, the idea is similar to dynamic term suggestions, but less interactive: it responds only at the word level rather than at the character prefix level.
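The difference between word-level and character-level triggering is easiest to see in a loop over keystrokes. The sketch below requests new suggestions only when a space is typed. Two parts of it are assumptions: `search` is a hypothetical callable that stands in for the Web search engine and returns result surrogates, and the frequency-based `extract_terms` stands in for the study's actual term extraction.

```python
from collections import Counter
from typing import Callable

TOP_DOCS = 10   # surrogates of the 10 top-ranked documents
TOP_TERMS = 10  # number of term suggestions shown


def extract_terms(query: str, surrogates: list[str], k: int = TOP_TERMS) -> list[str]:
    """Assumed extractor: rank non-query words by their frequency in the surrogates."""
    query_words = set(query.lower().split())
    counts = Counter(w for s in surrogates for w in s.lower().split() if w not in query_words)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [w for w, _ in ranked[:k]]


def realtime_expansion(keystrokes: str,
                       search: Callable[[str], list[str]]) -> list[tuple[str, list[str]]]:
    """Recompute suggestions only at word boundaries (space), not on every character."""
    typed = ""
    rounds = []
    for ch in keystrokes:
        typed += ch
        if ch == " " and typed.strip():
            query = typed.strip()
            rounds.append((query, extract_terms(query, search(query)[:TOP_DOCS])))
    return rounds
```

Suppose invented results mention Sally Ride for the keystrokes "first female ". The term ride then appears among the suggestions for both rounds. This is the kind of misleading suggestion discussed at the end of this section. A character-level suggester would instead recompute after every keystroke.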
White and Marchionini, 2007 compared this approach to two other systems. The first was a baseline system with no feedback (using Google Web search with identifying information removed). The second was another version of their system in which term suggestions are shown alongside the search results, after the query is entered (standard term suggestions). The study involved 36 students who compared the interfaces in a within-participants design. Using pre-defined queries, the study distinguished between known-item searches and open-ended exploratory searches, hypothesizing that term expansion would be more effective for the latter. When comparing time taken and quality of results, there were no significant differences among the systems, although the numbers trended towards the real time query expansion being more effective. The quality of search results was assessed by two judges. Precision in the exploratory task was higher for the dynamic term suggestions than for the post-retrieval suggestions, and both were higher than for the baseline. No quality differences were found for the known-item tasks.
Satisfaction scores revealed that participants found the baseline to be more effective and more usable, but found the dynamic suggestions to be more engaging and more enjoyable. Post-study questionnaires suggested that if the response time for the query suggestions had been faster, participants would have found them more useful. Many commented negatively on the delay (1.8 seconds on average) between pressing the space bar and seeing the suggestions. Since modern term suggestion interfaces respond much faster, users would likely find them more useful. Participants also made positive comments about the post-query suggestions, indicating that they were often helpful when the first query was unsuccessful.
White and Marchionini, 2007 point out the potential danger in showing query term suggestions before retrieval results are seen, as the suggestions can lead the searcher down an erroneous path. They cite as an example the high prevalence of the suggested term ride for the query “Who was the first female astronaut in space?” The correct answer is Soviet cosmonaut Valentina Tereshkova, but mention of Sally Ride, the first American woman in space, is frequent in the retrieved document summaries. This, compounded with the fact that the verb ride is a meaningfully related term to space travel, caused some participants to erroneously augment their query with this term. White and Marchionini, 2007 note that if users see search results first, they are less likely to make this kind of mistake.