May 27, 2016
by Rob Mitchum, Computation Institute
Cyberinfrastructure is the connective tissue for computational science, tying together the research projects, resources, software, data, networks, and people needed to make important discoveries. In an era when, to varying degrees, nearly all research is becoming computational, the importance of building strong cyberinfrastructure to support that research grows -- as do the challenges. But what will the cyberinfrastructures of the future look like?
That was the theme of CI Senior Fellow Rob Gardner’s May talk at the Enrico Fermi Institute at the University of Chicago. As part of the Manhattan Project, Fermi led the team that produced the world’s first controlled nuclear chain reaction at UChicago -- the kind of huge, important scientific collaboration that today relies upon cyberinfrastructure to operate. But instead of wartime nuclear research, today’s Big Science pursues a broader variety of goals, such as searching the universe for dark matter, understanding the genomics of life on Earth, and digitally preserving the world’s libraries and knowledge.
“Cyberinfrastructure is the substrate of all scientific computation,” Gardner said. “Whether you realize it or not, this crosses every domain on campus: science, the humanities, the libraries. It is enabling forefront discoveries and transformative practices.”
The University of Chicago is at the center of many of these research efforts, through projects such as the South Pole Telescope, XENON1T, and the Array of Things. Scientists at UChicago and Argonne National Laboratory have also been leaders in the cyberinfrastructure solutions that make these ambitious studies possible, developing new technologies to store, share, and analyze the large volumes of data these projects produce.
One of the first major developments in this area was grid computing, co-created in the late 1990s by CI Director Ian Foster and Carl Kesselman. With the grid, researchers could share scientific data and computing cycles around the globe, linking together the world’s computational resources and enabling international collaborations that accelerated analysis and discovery.
The concept was quickly adopted by CERN, which created a grid to distribute its high-energy physics data to more than 150 sites, including the ATLAS Midwest Tier-2 Center led by Gardner. When the Higgs boson was discovered in 2012 through experiments at the CERN Large Hadron Collider (LHC), the Worldwide LHC Computing Grid (WLCG) was credited with making that important physics breakthrough possible.
But the infrastructure built to analyze CERN data -- and to support other computation- and data-heavy projects -- has also been put to use for research in other fields, Gardner said. The ATLAS Midwest Tier-2 Center, with its 14,000 compute cores, was one of the founding partners of the Open Science Grid (OSG), a consortium that makes otherwise idle compute cycles available to the broader research community. The OSG currently provides roughly 1 billion CPU hours each year, and it moved 250 petabytes of data in 2015, Gardner said.
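In practice, tapping those opportunistic cycles means describing a batch of independent jobs and handing them to a scheduler such as HTCondor, which the OSG uses to match work to idle cores. The sketch below, written against recent HTCondor Python bindings, is only an illustration: the analysis script, resource requests, and job count are hypothetical placeholders rather than details from Gardner’s talk.

```python
# Minimal sketch: submit a batch of independent jobs to an HTCondor pool
# such as the one the OSG exposes. Assumes recent HTCondor Python bindings;
# the script name, resource requests, and job count are placeholders.
import htcondor

# Describe one job template; each queued job gets its own index ($(ProcId)).
job = htcondor.Submit({
    "executable": "analyze.sh",        # hypothetical analysis script
    "arguments": "$(ProcId)",          # pass the job index as an argument
    "output": "job_$(ProcId).out",
    "error": "job_$(ProcId).err",
    "log": "jobs.log",
    "request_cpus": "1",
    "request_memory": "2GB",           # modest requests suit opportunistic slots
    "request_disk": "1GB",
})

# Queue 100 instances of the template with the local scheduler
# (for example, an OSG Connect login node).
schedd = htcondor.Schedd()
result = schedd.submit(job, count=100)
print("Submitted cluster", result.cluster())
```

Small, loosely coupled jobs like these are what opportunistic cycles handle well, since any single job can be preempted and rescheduled elsewhere without losing much work.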
To broaden the use of OSG resources, Gardner has spearheaded OSG Connect, software that makes it as simple as possible for researchers to run their computations on the OSG. The idea is to make “OSG seem like a campus cluster,” he said. The service uses CI’s Globus platform for identity management and data transfer, creating a reliable “bridge” between campus and the OSG. The successes of the LHC Grid and the OSG have also informed the design of new cyberinfrastructures for projects such as the XENON1T dark matter search at the Gran Sasso Laboratory in Italy and the next generation of the South Pole Telescope. Both will draw on on-campus and off-campus resources, including the Research Computing Center's Midway computing cluster, to process the flood of data their instruments will produce.
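Much of that campus-to-OSG bridge -- identity and data movement -- runs through Globus, which can be driven programmatically as well as through its web interface. As a rough illustration of the data-transfer half, the sketch below uses the Globus Python SDK to copy a results directory between two endpoints; the endpoint IDs, paths, and pre-obtained access token are hypothetical, not details from the talk.

```python
# Rough sketch: a Globus-managed transfer between two endpoints, e.g. a
# campus cluster and a staging area. Endpoint UUIDs, paths, and the token
# are placeholders; in practice the token comes from a Globus Auth login flow.
import globus_sdk

TRANSFER_TOKEN = "..."                    # placeholder access token
SRC_ENDPOINT = "SOURCE-ENDPOINT-UUID"     # placeholder endpoint IDs
DST_ENDPOINT = "DEST-ENDPOINT-UUID"

# Authenticate the transfer client with the existing token.
authorizer = globus_sdk.AccessTokenAuthorizer(TRANSFER_TOKEN)
tc = globus_sdk.TransferClient(authorizer=authorizer)

# Describe the transfer: copy a directory recursively with checksum
# verification; the Globus service handles retries and integrity checks.
task = globus_sdk.TransferData(
    tc, SRC_ENDPOINT, DST_ENDPOINT,
    label="example results transfer",
    sync_level="checksum",
)
task.add_item("/scratch/results/", "/project/results/", recursive=True)

# Submit the task; the transfer runs asynchronously on the Globus service.
response = tc.submit_transfer(task)
print("Transfer task:", response["task_id"])
```

Because the service, rather than the researcher’s own machine, manages the transfer, it can retry failures and verify checksums unattended, which is part of what makes the “bridge” reliable.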
“We want to make it easy to use these systems, so we’re bringing the lessons from the LHC and OSG computing worlds to new disciplines and collaborations that compute and share data across institutions,” Gardner said.
Together, these efforts reflect the rapidly changing face of research infrastructure, Gardner summarized. Whereas 15 years ago the term meant campuses building data center pods in isolation for the exclusive use of local researchers, the vision today is to build platforms that enable researchers to collaborate across institutions, computing and accessing data transparently across local HPC resources, shared or allocated resources from the national cyberinfrastructure ecosystem, and resources procured from commercial cloud providers when needed.
To fulfill this vision, innovation is needed in the connective cyberinfrastructure fabric itself, such as expanded use of “Science DMZs,” network enclaves that today isolate large research data flows from commodity Internet traffic but in the future could support more ubiquitous cyberinfrastructure patterns. The idea is to create the new “substrates” on which science platforms can be built, lowering barriers to entry for researchers while leveraging open source automation tools to assemble new forms of distributed cyberinfrastructure.
“We need a competitive strategic vision for sustaining this activity going forward, one that is science-driven and built on trust relationships between faculty, cyberinfrastructure leaders, technical staff and other stakeholders across all units of the University,” Gardner said.
See the original version of this article at https://ci.uchicago.edu/blog/connected-future-cyberinfrastructure.