October 4, 2023
By Emily Ayshford
An image of a ubiquitin protein molecule surrounded by water, created with VMD. Humphrey, W., Dalke, A. and Schulten, K., "VMD - Visual Molecular Dynamics," J. Molecular Graphics, 1996, vol. 14, pp. 33-38.
Developing new drugs. Understanding the mechanisms of disease. Studying proteins critical to neuron signaling.
All of these crucial areas of study have increasingly relied on a powerful tool: molecular dynamics simulations of biological systems. Based on physics models, these computational simulations predict how the atoms in a protein, and more generally in any molecular system, will move over time.
But the most advanced methods of performing these simulations have only worked on computers powered by central processing units (CPUs). Much of today’s science relies on the computational power of supercomputers, which run on both CPUs and higher-powered graphics processing units (GPUs).
Working with collaborators at Sandia National Laboratories, University of Chicago Research Computing Center senior computational scientist Trung Nguyen helped integrate two high-powered molecular dynamics simulation models into the software package LAMMPS, adding the ability for the simulation calculations to run on GPUs.
The result is much faster simulations that can run on supercomputers — potentially leading to a substantial speed-up in computational areas such as drug development and biotechnology research.
Targeting opportunities to accelerate the code
Trung, an expert in molecular modeling, has contributed for years to the LAMMPS software package, which performs atom-scale simulations of materials. In 2021, he was contacted by Josh Rackers, then a postdoctoral fellow at Sandia. Rackers was looking for help taking the physics-based rules that he had developed for two molecular dynamics simulation models, called AMOEBA and HIPPO, and integrating them into LAMMPS.
AMOEBA and HIPPO both model large biomolecular systems, including proteins, DNA, RNA, and even sugars and lipids within cells. Rackers and Sandia colleague Steve Plimpton, a well-known computational scientist who wrote the original code for LAMMPS, wanted to run these models on hybrid CPU-GPU systems through LAMMPS instead of on the traditional CPU-only systems. They turned to Trung for his expertise.
“I love challenges, and it was a beautiful challenge,” Trung said.
AMOEBA and HIPPO are derived from quantum mechanical calculations, making them much more accurate than previous molecular dynamics simulation models. But that accuracy also means they are thousands of times more complex than previous classical models and require much more computational power.
Trung spent six months porting these calculations to GPUs, a painstaking process since getting the functions, or kernels, to run on the GPUs requires hundreds of arithmetic operations. He also evaluated strategies to make the models run even faster. The team exchanged more than 100 emails, discussing opportunities to accelerate the code. “It was a fruitful collaboration,” Trung said.
Ultimately, Trung targeted speed bottlenecks in AMOEBA and HIPPO, specifically porting those into the GPU package in LAMMPS. “I found that if I put the heaviest part of the code onto the GPUs, I could really speed up the calculations,” he said.
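The reasoning behind that strategy can be illustrated with Amdahl's law, a standard rule of thumb for partial acceleration. The sketch below is only an illustration: the function name, the offloaded fractions and the tenfold kernel speed-up are hypothetical values chosen for the example, not measurements from AMOEBA, HIPPO or LAMMPS.

```python
def amdahl_speedup(offloaded_fraction: float, kernel_speedup: float) -> float:
    """Overall speed-up when only part of the runtime is accelerated (Amdahl's law)."""
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # Time left on the CPU plus the accelerated time of the offloaded part,
    # both expressed as fractions of the original runtime.
    remaining = 1.0 - offloaded_fraction
    return 1.0 / (remaining + offloaded_fraction / kernel_speedup)


# Hypothetical inputs: the GPU runs the offloaded part 10 times faster.
for fraction in (0.5, 0.8, 0.95):
    overall = amdahl_speedup(fraction, 10.0)
    print(f"{fraction:.0%} of runtime offloaded: {overall:.2f}x overall")
```

Under these assumed numbers, the overall gain grows quickly as a larger share of the runtime moves to the GPU, which is why targeting the heaviest part of the code pays off most.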
His work allowed simulations of more than 600,000 atoms to run about 2.5 times faster than without the GPUs. That means, for example, that if simulating how a long list of drug candidates would bind to a protein within the body previously took two weeks, the same process would now take one week or less.
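The arithmetic behind that estimate can be checked in a few lines. This is a minimal sketch that assumes "2.5 times faster" means the runtime is divided by 2.5; the function name is hypothetical and the two-week baseline is the article's own example.

```python
def accelerated_days(baseline_days: float, speedup: float) -> float:
    """Wall-clock time after a speed-up, assuming runtime scales as 1 / speedup."""
    if speedup <= 0.0:
        raise ValueError("speedup must be positive")
    return baseline_days / speedup


two_weeks = 14.0  # baseline from the example above, in days
print(accelerated_days(two_weeks, 2.5))  # 5.6
```

At roughly 5.6 days, the result sits comfortably within the "one week or less" figure.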
“The speedup was better than what I had expected when we started, mainly because AMOEBA/HIPPO are very sophisticated and not very GPU friendly,” Trung said. “I am happy with how it turned out.”
“It makes a big difference in time for knowing which molecules to synthesize or to make another downstream decision,” Rackers said.
Allowing models to be run on supercomputers
Trung’s implementation also allows AMOEBA and HIPPO to run efficiently across many nodes — an important feature of supercomputers and one that has been difficult for software packages to achieve.
The team released the code as part of LAMMPS in November 2022. It includes more than 10,000 lines of code and is now one of the largest packages in LAMMPS (a feat in itself, since LAMMPS has nearly 100 packages within it).
Though Rackers has moved on to a position in industry, and Plimpton has retired, Trung still contributes to LAMMPS and hopes to make even more improvements to AMOEBA and HIPPO in the future. “We have discussed several ideas and promising approaches for further improving speed-up,” Trung said.