January 18, 2017
by Benjamin Recchie
The enormous data sets that today’s researchers collect from instruments and observations are creating equally large challenges in data management and computation. To help meet these challenges, the National Science Foundation (NSF) has awarded a grant to the Research Computing Center (RCC) for the Data Lifecycle Instrument (DaLI), a data storage and software platform available to the research community at the University of Chicago and the Marine Biological Laboratory (MBL).
Many of today’s observations and experiments generate tremendous amounts of data, often more than a terabyte each day. These quantities of data enable scientific discovery and innovation and allow researchers to address complex new scientific questions. But the data often must be transferred from remote locations or field stations to the user’s system for storage and analysis, which can prove a technical challenge. DaLI will address this by simplifying data management for researchers, allowing them to acquire, transfer, process, and share data from their instruments and observations in a single workflow. It will also allow them to easily share their data with a larger community of users.
The DaLI platform will consist of a high-performance compute resource for pre- and post-processing of data; a high-performance storage pool; a low-cost storage pool; and a tape backup pool. In addition, DaLI will integrate easily with campus and national cyberinfrastructures. The DaLI platform will be tied closely to the RCC Midway high-performance computing cluster, making it easy for users to process their data efficiently.
The initial DaLI user community at UChicago and MBL will span 20 projects, 9 disciplines, 91 faculty, 59 postdocs, and 108 graduate students. The projects range from acquisition of data from telescopes and X-ray videos to the creation of public data repositories and analysis tools. Dr. H. Birali Runesha, assistant vice president for research computing and director of the Research Computing Center, is the principal investigator on the project; Gordon Kindlmann, assistant professor of computer science, and Callum Ross, professor of organismal biology and anatomy, are co-PIs.
“By providing a foundation for managing the increasing size and diversity of scientific datasets, DaLI will simplify how research projects turn into tools of lasting utility, within and beyond the University,” says Kindlmann, adding that it will address the data management problems faced by many research groups trying to preserve their data for future use.
DaLI’s benefits don’t end with MBL and UChicago. In partnership with those two institutions and other collaborating institutions, DaLI will be used as a training instrument to prepare students to meet the data challenges of the 21st century. It will also serve as a test bed for developing data management practices, and its capabilities and software will serve as a replicable model for other institutions.
“Managing large data sets is one of the greatest challenges researchers face today, regardless of their field of inquiry,” says Runesha. “With DaLI, we hope to enhance scholars’ ability to store, transfer, and share data at UChicago and MBL, as well as provide a model for the greater scientific community.”