National Consortium for Data Science Names 2015 Data Fellows

Release date: December 8, 2014

CHAPEL HILL, NC, December 8, 2014 - The National Consortium for Data Science (NCDS), a public-private partnership to advance data science and address the challenges and opportunities of big data, today named three faculty members at three different universities as NCDS Data Fellows for the 2015 calendar year.

Each Data Fellow will receive $50,000 to support work that addresses data science research issues in novel and innovative ways. Their work will be expected to advance the mission and vision of the NCDS, which formed in early 2013. Data Fellow positions are open to faculty members at NCDS member institutions, which include universities in the University of North Carolina system, Duke University, Texas A&M University, and Drexel University. A wide range of researchers from six different member universities applied for the Fellowships. Their research proposals addressed many of the hot topics in data science, from cybersecurity to applying the techniques used by online music databases to develop more precise search algorithms and to interest students in data science.

“This is the second year we’ve provided Data Fellows awards and we believe the program is a great way to bring together talented faculty researchers and our industry members who are interested in the practical applications of their work,” said Stan Ahalt, chair of the NCDS steering committee and director of UNC Chapel Hill’s Renaissance Computing Institute (RENCI), one of the founding members of the consortium. “We had applications from across our membership and the quality was outstanding. I know our members look forward to learning more about our new Fellows and to understanding how their research will advance data science and help organizations in business, government, and academia address their data challenges.”

The 2015 NCDS Data Fellows and their projects are:

  • David Gotz, PhD, associate professor, School of Information and Library Science, UNC Chapel Hill, and assistant director of the Carolina Health Informatics Program. Visual Analytics for Large-scale Temporal Event Data.

This project will produce novel visual analytics methods for large-scale temporal event data, which is time-stamped event data found in data sets as diverse as social networking logs and electronic health records (EHRs). Analysis of temporal event data such as medical diagnoses, procedures performed, lab tests, and medications prescribed can provide evidence to support more personalized medical decision making and better health outcomes for patients. It can also be used in comparative effectiveness studies, population analyses, and patient-centered outcomes research. However, current methods for exploring temporal event data and selecting subgroups for analysis are complicated and time consuming. Gotz plans to develop software for comprehensive visual analytics of these data that is simpler, more intuitive, and much less time consuming. A software prototype will be developed, tested, and evaluated using data from more than 6 million patients in the Integrated Cancer Information and Surveillance System maintained by UNC's Lineberger Comprehensive Cancer Center.

  • Erik Saule, PhD, assistant professor, Department of Computer Science, UNC Charlotte. Toward Machine Oblivious Graph Analysis.

Graphs are a popular tool used to model a wide range of phenomena and to show the relationships among various entities. For example, graphs can be used to model the physical path of city streets or aisles in a store in order to analyze traffic patterns and determine the best locations for businesses or for products within a retail store. In medicine, researchers use graphs to model regulatory pathways and gene expression, predict conditions, and identify the best drugs to use in treatments. Unfortunately, the explosion of digital data has led to a similar explosion in the computational costs of running graph analyses. New algorithms to deal with this challenge are usually inflexible, requiring the researcher to use a specific type of graph or a particular type of computer system for analysis. This project aims to develop a framework for performing efficient graph analysis independent of the type of analysis being performed or the computer system used.


  • Erjia Yan, PhD, assistant professor, College of Computing and Informatics, Drexel University. Assessing the Impact of Data and Software on Science Using Hybrid Metrics.

In the age of data, data and software are increasingly the critical components of scientific and industrial research. These products can have significant impacts on future scientific discoveries and business innovation. Yet they can be difficult to discover and assess because new knowledge is still catalogued in the form of published research papers. This project will address the problem of discovering and assessing the impact of data sets and software by identifying referencing patterns and designing hybrid metrics to assess the full impact of data and software. Unlike current data repository indexing, the project aims to provide context-driven, full-text data analytics for data and software in order to account for the unsystematic ways in which these products are cited in scientific literature, including hyperlinks to web pages, footnotes, endnotes, and digital object identifiers. Ultimately, the project seeks to develop a system that will comprehensively capture the impact of data and software on knowledge production and discovery.

This is the second year of the NCDS Data Fellows Program. NCDS membership dues and supplemental funding from UNC General Administration support the program.

About the NCDS
The NCDS launched in April 2013 as a way to address the challenges and opportunities posed by massive data sets being created by digital medicine, environmental sensors, scientific instruments, social networks, commerce, finance, and more. Its goals include identifying key data science challenges; encouraging data science research that spans academia, industry, and government; facilitating improved data science education; supporting technical, ethical, and policy standards for data; and applying data science expertise to societal problems and scientific disciplines. In addition to the university members, NCDS members are Cisco, Deloitte LLP, GE, IBM, MCNC, RTI International, and UNC General Administration.

For more information, see

Media Contact:
Karen Green, 919.619.8213,