Rob Capra and Jaime Arguello awarded NSF grant to develop systems that utilize search trails

September 28, 2017

Have you ever struggled to find information on a particularly complex topic and thought, “I can’t possibly be the first person to look for this”? You probably were not, and the searchers who preceded you may have left valuable “search trails” – including queries issued, results clicked, pages viewed, pages bookmarked, and annotations entered – that could help you locate what you need.

Drs. Jaime Arguello and Rob Capra on the steps of Manning Hall

Rob Capra and Jaime Arguello, professors at the UNC School of Information and Library Science (SILS), recently received a National Science Foundation (NSF) grant worth nearly $500,000 to develop and evaluate systems that will automatically display relevant search trails as a form of search assistance to users. The project has the potential to improve a broad range of systems, including web search engines used by millions, digital libraries, and enterprise and website-specific search engines.

“Prior research has suggested the usefulness of search trails, but has not answered key research challenges required to design and implement them,” Capra said. “The system needs to predict when to display search trails to a user, which trails to display, and how to display them in a way that supports the user's goal.”

Many search engines already use activity traces to improve their search algorithms and results. This may produce indirect benefits for users, but does not utilize the full potential of search trails as a direct form of search assistance.

Capra and Arguello will execute their project in three phases. Phase 1 will determine which factors of the user, task, and system influence whether a searcher wants help, for what purpose, and whether they are able to gain useful information. Phase 2 will develop models for predicting when to show trails to a user based on user and task features, as well as behavioral measures that indicate whether a searcher is having difficulty. Finally, Phase 3 will develop models for predicting which trails to show for the current search session.

“We will use learning-to-rank algorithms to combine features that measure the similarity between the current search session and a candidate trail, as well as the information content in the trail,” Arguello said. “Being able to match search sessions based on the user's higher-level goal has direct implications to other information retrieval tasks such as document ranking, query suggestion, and aggregated search.”