David Gotz, Associate Professor at the UNC School of Information and Library Science (SILS) and Assistant Director of the Carolina Health Informatics Program (CHIP), has been awarded a National Science Foundation (NSF) grant worth over $1 million to develop a set of contextual visualization methods that will improve analysis of complex datasets. Gotz and his team will evaluate the new methods in a health outcomes setting, offering significant potential to improve health care through data analytics. Ultimate goals for the four-year project include the development of open-source software that can help advance data visualization accuracy and efficacy for enterprises around the world.
In almost every field, organizations are amassing large data repositories to support evidence-based decision-making. Online companies track users to learn about their buying habits, computer security logs capture detailed traces of network activity to help identify threats, and health care systems maintain extensive records for practitioners, patients, and administrators to consult. However, today’s visualization tools are often overwhelmed when applied to datasets with large numbers of variables.
“Real-world datasets can have many thousands of variables, a stark contrast to the relatively small number of dimensions supported by current visualization tools,” Gotz said. “The gap between what the data contains and what the visualization shows can put the validity of any analysis at great risk of bias, potentially leading to serious, hidden errors. This research project will develop a new approach to high-dimensional exploratory visualization that will help detect and reduce selection bias and other problems.”
Gotz and his team will build on the premise that the very summarization that makes many visual methods effective also inherently obscures important aspects of high-dimensional datasets. In other words, people cannot fully understand complex data, or make good decisions based on that data, if they are relying on a visualization that omits or misrepresents the context of the findings.
This project will develop methods to explicitly model and analyze the data context and to convey the relationship between the visualization’s focus and that context, so that users are better informed about hidden problems. The new tools will allow for inline replication for visual validation, baselined selection methods for high-dimensional visualization, and interactive rebalancing for representative visualization.
For more information on the project, visit http://vaclab.web.unc.edu/contextual-visualization/