Scientific data repository gets $2.18 million boost

Release date: May 8, 2009

A digital data repository that researchers agree has the potential to transform how scientific research is pursued will be expanded with a $2.18 million grant from the National Science Foundation.

The repository, called Dryad, is designed to archive data that underlie published findings in evolutionary biology, ecology and related fields and allow scientists to access and build on each other’s findings.

The grant recipients are:

The National Evolutionary Synthesis Center and the Metadata Research Center have been developing Dryad in coordination with a large group of journals and societies in evolutionary biology and ecology. With the new grant, the additional team members are contributing to the development of the repository.

The work on the repository coincides with naturalist Charles Darwin’s 200th birthday this year and the 150th anniversary of the publication of his “On the Origin of Species.” Some of the data on Darwin’s finches are included in Dryad. For example, a scientist working on characteristics of American goldfinch populations can search the repository to find the raw data on the beak measurements of Galapagos finches.

Currently, a tremendous amount of information underlying published research findings is lost, researchers say. The lack of data sharing and preservation makes it impossible for the data to be examined or re-used by future investigators.

Dryad addresses these shortcomings and allows scientists to validate published findings, explore new analysis methodologies, repurpose data for research questions unanticipated by the original authors, integrate data across studies and look for trends through statistical meta-analysis.

“The Dryad project seeks to enable scientists to generate new knowledge using existing data,” said Kathleen Smith, Ph.D., principal investigator for the grant, a biology professor at Duke and director of the National Evolutionary Synthesis Center. “The key to Dryad in our view is making data deposition a routine and easy part of the publication process.”

Dryad is being designed with a consortium of stakeholders who include representatives of more than a dozen journals in evolutionary biology and ecology. The consortium sets policy and is responsible for long-term financial sustainability. Dryad is intended to serve as a model for the many other scientific disciplines facing similar challenges in data preservation and sharing.

“The technical goals of Dryad include automatically generating metadata representing data sets, and the exchange of metadata with specialized archives such as GenBank and TreeBASE, and with metadata registries such as MetaCat,” said Jane Greenberg, Ph.D., co-principal investigator, director of the Metadata Research Center at UNC and a professor in the School of Information and Library Science. “We are also researching how to use information from the scientific papers to enhance retrieval of the associated datasets.”

Metadata, by one definition, “describes other data. It provides information about a certain item’s content. For example, an image may include metadata that describes how large the picture is, the color depth, the image resolution, when the image was created and other data.

“A text document’s metadata may contain information about how long the document is, who the author is, when the document was written and a short summary of the document. Metadata is frequently used on Web pages to help describe the content, allowing search engines to locate information when requested.”
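To make the definition concrete, the short Python sketch below builds the kind of record the quote describes for a text document: its length, author, date and a brief summary. It is only an illustration. The `DocumentMetadata` class, the `describe` and `matches` functions and the sample values are hypothetical and are not drawn from Dryad or any of the archives named above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DocumentMetadata:
    """Hypothetical record describing a text document (not Dryad's schema)."""
    author: str
    written_on: date
    word_count: int
    summary: str


def describe(text: str, author: str, written_on: date,
             summary_words: int = 12) -> DocumentMetadata:
    """Derive simple metadata from a document's text."""
    words = text.split()
    # Use the opening words as a crude stand-in for a written summary.
    summary = " ".join(words[:summary_words])
    if len(words) > summary_words:
        summary += " ..."
    return DocumentMetadata(
        author=author,
        written_on=written_on,
        word_count=len(words),
        summary=summary,
    )


def matches(meta: DocumentMetadata, query: str) -> bool:
    """Search the metadata rather than the full document text."""
    q = query.lower()
    return q in meta.author.lower() or q in meta.summary.lower()


# Sample values are invented for illustration.
record = describe(
    "Beak depth measurements for ground finches collected over three field seasons",
    author="A. Researcher",
    written_on=date(2009, 5, 8),
    summary_words=4,
)
print(record)
print(matches(record, "beak"))
```

The `matches` function mirrors the last point in the quote: a search tool can locate an item by reading its description without opening the item itself.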

Besides Smith and Greenberg, collaborators on the grant include co-principal investigator Todd Vision of UNC; Kristin Antelman of NCSU; William Piel of Yale University; and William Michener of the University of New Mexico.

Related links:
Dryad digital data repository
National Evolutionary Synthesis Center (NESCent)
UNC School of Information and Library Science (SILS)
Metadata Research Center
North Carolina State University Digital Library Initiatives
Long Term Ecological Research (LTER) Network Office at the University of New Mexico
Yale University, TreeBASE