Seagate and the Los Alamos National Laboratory have signed a cooperative research and development agreement to jointly explore developing efficient computing near storage, especially in settings where erasure encoding is needed for data protection. The overarching goal is to improve the sustainability of compute and storage architectures.
Seagate and Los Alamos are collaborating on a proof-of-concept high-performance compute architecture that provides optimised computing capabilities across and within storage devices.
The architecture allows computing functions to be pushed down into an erasure-encoded hard drive tier, enabling faster, more efficient, less energy-intensive and less thermally demanding data retrieval.
“Near data computing has always relied on knowing enough about the data to act accordingly; however, this architecture is the first known example of per-device computing that does not require re-constituting data into instances of the entire data set prior to exercising computing functions,” said Ed Gage, vice president of Seagate Research Group. “This design allows for computing to occur on erasure-encoded data, which is often present in hard disk drive storage architectures.”
“The goal of this joint research is to optimise the efficiencies of a high-performance computing architecture by utilising compute and memory resources of supporting storage systems down to the hard drive level,” said Mike Moritzkat, CEO and managing director of Seagate Government Solutions. “This in turn lowers overall total cost of ownership with full resource optimisation.”
Hard drives are often deployed in resilient storage architectures that use erasure coding techniques to break logical data files or objects into multiple unrelated "chunks" distributed across many drives.
Because no single drive holds a complete, self-contained copy of the data, performing contextual computational operations at the drive level is extremely difficult. The collaboration between Seagate and Los Alamos aims to solve this challenge.
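To see why chunked data resists per-drive computation, consider a minimal single-parity sketch (a simplified, RAID-5-style scheme chosen for illustration, not the actual codes Seagate or Los Alamos use): an object is striped into k data chunks plus one XOR parity chunk, each destined for a different drive, so any one drive sees only a fragment of the original object.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int):
    """Split `data` into k equal chunks and append one XOR parity chunk.

    Each returned chunk would live on a different drive; records that
    straddle a chunk boundary end up split across drives.
    """
    chunk_len = -(-len(data) // k)               # ceiling division
    padded = data.ljust(k * chunk_len, b"\x00")  # pad so chunks align
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    parity = reduce(xor_bytes, chunks)
    return chunks + [parity]

def reconstruct(chunks, lost_index: int):
    """Rebuild a single lost chunk by XOR-ing all surviving chunks."""
    survivors = [c for i, c in enumerate(chunks) if i != lost_index]
    return reduce(xor_bytes, survivors)

# A single drive failure is survivable: lose chunk 2, rebuild it from the rest.
chunks = encode(b"simulation output", 4)
original = chunks[2]
rebuilt = reconstruct(chunks, 2)
assert rebuilt == original
```

The protection works, but note the cost this article is concerned with: answering even a simple query normally means gathering chunks from several drives and reassembling the object first, which is exactly the reconstitution step the joint architecture seeks to avoid.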
The scientific research at Los Alamos requires highly parallel disk drive technology that provides hundreds of petabytes of “warm” storage for large-scale simulations.
That technology needs to be protected by two tiers of erasure coding, securing sensitive data against random and correlated failures.
The collaboration will demonstrate that processing capability located very near the disk devices can vastly reduce the amount of data that must be retrieved for the analysis phase of a science campaign.
For example, researchers may need to track just the leading edge of a shock wave travelling through a material, or the state of only a few high-energy particles, across a long simulation.
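The retrieval saving can be sketched with a toy model (the `Drive` class, `scan` methods and particle records below are hypothetical illustrations, not an actual Seagate or Los Alamos API): each simulated drive filters its own records locally and ships back only the matches, instead of the host pulling every record and filtering centrally.

```python
import random

random.seed(0)  # deterministic toy data

class Drive:
    """Toy stand-in for a drive with a small amount of local compute."""

    def __init__(self, records):
        self.records = records          # e.g. per-timestep particle states

    def scan(self):
        """Conventional path: ship every record to the host."""
        return list(self.records)

    def scan_where(self, predicate):
        """Pushdown path: filter on-device, ship only matching records."""
        return [r for r in self.records if predicate(r)]

# Eight drives, each holding 10,000 (name, id, energy) records.
drives = [
    Drive([("particle", i, random.uniform(0.0, 100.0)) for i in range(10_000)])
    for _ in range(8)
]

def high_energy(rec):
    return rec[2] > 99.0   # "only a few high-energy particles"

# Conventional retrieval: every record crosses the wire to the host.
pulled = sum(len(d.scan()) for d in drives)

# Near-device retrieval: only matching records cross the wire.
matched = sum(len(d.scan_where(high_energy)) for d in drives)

print(pulled, matched)     # matched is roughly 1% of pulled
```

The analysis result is identical either way; what changes is how many records leave the drives, which is the source of the speed, energy and heat gains the researchers describe.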
“The promise of this work is to demonstrate measurably faster data query and retrieval using less energy and generating less heat,” said Gary Grider, High Performance Computing division leader at Los Alamos. “The other obvious goal is to provide this faster and lower power solution in a very economical way, making analytics of warm/cool disk-based data a more attractive solution for any large-scale erasure protected data, be it on premise or in the cloud.”