Optimising data architecture for mixed HPC and AI workloads

Scientist working with scanning electron microscope.

Researchers could access data through a single, high-performance file system without changing applications or retooling workflows.

Credit: Ladanifer/Shutterstock

A data modernisation partnership between Hammerspace and Vanderbilt University's HPC research facility has overhauled scientific discovery capabilities with a 10-petabyte environment.

The challenges of rising costs and siloed data remain prominent among research teams scaling traditional HPC alongside new AI workloads. Despite unprecedented investment in GPUs and compute infrastructure, researchers often find themselves encountering data bottlenecks that get in the way of valuable discoveries.

Vanderbilt University’s Advanced Computing Centre for Research and Education (ACCRE), which supports nine schools and colleges housing 350 campus departments, sought a more flexible data platform for next-generation scientific computing. As the central HPC hub supporting research computing and storage services across Vanderbilt, ACCRE needed an architecture that could scale traditional HPC alongside emerging AI workloads without increasing operational complexity.

Performance, capacity, and archive tiers had evolved into separate proprietary systems, increasing operational overhead and fragmenting access to research data. As the cost and complexity of maintaining legacy infrastructure continued to grow, ACCRE sought an open architecture, free from appliance lock-in, that could unify performance, capacity, and archive storage under a single global namespace.

After a comprehensive evaluation, ACCRE selected Hammerspace to unify its storage infrastructure, reduce storage costs, and gain flexibility to deliver compute and storage services to researchers in a more agile way.

By unifying node-local NVMe, tier 1 flash storage, and tier 2 HDD storage in a 10-petabyte global namespace, ACCRE was able to reduce storage costs by 48% and deliver high-performance data access to their 750-node compute cluster. This cluster includes 80 GPU nodes with 320 NVIDIA GPUs.

Instead of deploying another large all-flash appliance with proprietary client dependencies, ACCRE focused on improving how data could be accessed, orchestrated, and moved across existing infrastructure.

Project beginnings

Having spent two years looking for a hardware-agnostic solution, Hunter Hagewood, Executive Director of Research Computing Operations at ACCRE, said that two key elements were at play when choosing a partner.

“One was a very smooth spend curve in terms of adding capacity,” said Hagewood. “That was going to need to be either on a spinning disc or on NVMEs or solid state.

“The other was a need to prepare a landing spot for the more intense AI workloads that we saw getting lined up, going forward.”

According to Hammerspace Senior Systems Engineer Kyle Knack, the high level of expertise across ACCRE at the start of the project enabled the team to be self-sufficient in design and deployment.

He said: “It was very important to me, too, that the team understood how to run this on their own. We didn't want to hand them a science project, because that would be counterintuitive to what their goals are. The technical team at ACCRE were one of the best I've worked with in my entire career; they absolutely made my role as an advisor a lot easier.”

Eradicating storage silos

Universities face additional complexity beyond typical HPC environments because they commonly manage dozens or even hundreds of simultaneous research projects. These are often supported by grant funding, which can lead to external, unmanaged storage silos purchased by Principal Investigators (PIs).

Previously, ACCRE was running on an SSD-based array that achieved throughput of between 600 and 700 megabytes per second.

After deployment, analysis throughput increased from roughly 600–700 MB/s to approximately 2.5 GB/s, representing a roughly 4x improvement. This was achieved without researchers needing to retool their code or workloads.
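The quoted improvement can be checked with simple arithmetic. The short sketch below uses only the throughput figures given in the text and assumes decimal units (1 GB = 1,000 MB); the 1 TB dataset used for the transfer-time comparison is a hypothetical size chosen for illustration, not a figure from ACCRE.

```python
# Throughput figures quoted in the article; decimal units assumed (1 GB = 1000 MB).
BEFORE_MBPS = (600.0, 700.0)   # legacy SSD-based array, low and high end
AFTER_MBPS = 2.5 * 1000.0      # approximately 2.5 GB/s after deployment


def speedup_range(before, after):
    """Return (smallest, largest) speedup implied by a before-range and an after value."""
    low, high = before
    return after / high, after / low


def transfer_minutes(size_tb, mbps):
    """Minutes needed to stream size_tb terabytes at a sustained rate of mbps MB/s."""
    return size_tb * 1_000_000 / mbps / 60


lo, hi = speedup_range(BEFORE_MBPS, AFTER_MBPS)
print(f"speedup: {lo:.2f}x to {hi:.2f}x")

# Hypothetical 1 TB dataset, purely to make the difference tangible.
for label, rate in (("before (650 MB/s midpoint)", 650.0), ("after (2.5 GB/s)", AFTER_MBPS)):
    print(f"{label}: {transfer_minutes(1, rate):.1f} minutes per TB")
```

The range works out to between about 3.6x and 4.2x, which is consistent with the rounded 4x figure.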

Hammerspace unified Tier 0 node-local storage, shared Tier 1 storage, and ACCRE’s archive storage system into a single global namespace with automated data orchestration.
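One way to picture automated orchestration under a single namespace is a placement rule in which a file keeps one path while the tier holding its bytes changes. The sketch below is purely illustrative: the tier names follow the text, but the `FileRecord` type, the `choose_tier` function, the 30-day threshold, and the example paths are all assumptions made for this example, not Hammerspace's policy engine or API.

```python
from dataclasses import dataclass

# Tier labels follow the article; everything else here is hypothetical.
TIER0 = "tier0-node-local-nvme"
TIER1 = "tier1-shared"
ARCHIVE = "archive"


@dataclass
class FileRecord:
    path: str                 # path in the single namespace; it never changes
    days_since_access: int
    in_active_job: bool = False


def choose_tier(rec: FileRecord, warm_days: int = 30) -> str:
    """Pick a backing tier for a file; the file's path is unaffected."""
    if rec.in_active_job:
        return TIER0          # keep data next to the compute that is using it
    if rec.days_since_access <= warm_days:
        return TIER1          # recently used data stays on shared storage
    return ARCHIVE            # cold data moves to the archive tier


files = [
    FileRecord("/research/labA/run42/frames.dat", 0, in_active_job=True),
    FileRecord("/research/labA/run17/frames.dat", 12),
    FileRecord("/research/labA/run03/frames.dat", 400),
]
placement = {f.path: choose_tier(f) for f in files}
for path, tier in placement.items():
    print(f"{path} -> {tier}")
```

The point of the design is that researchers only ever see the path on the left; the placement decision on the right can change without any change to applications.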

The composable model offered by Hammerspace addressed a persistent problem in life sciences HPC: data spread across isolated performance tiers, departmental storage silos, and disconnected research workflows.

Researchers could access data through a single, high-performance file system without changing applications or retooling workflows.

“For structural biology projects, researchers are carrying out protein forming and protein folding, to try and predict the build-outs of new proteins, and how those are going to interact with organisms,” said Hagewood.

“There's a lot of simulation, and much trial and error in those simulations. But their files are very large, so they're looking at long hours of computation. A consumption of statically assigned resources can last up to two weeks.

“While we could have gone out and bought an appliance for this case, it would have been massively expensive and overkill for what the group actually needs.”

“This project serves as a clear example of moving away from traditional large purchases to provisioning resources on a per-project basis,” said Knack. “With this, we can do small, self-contained mini-clusters with dedicated storage, while also retaining shared resource pools and archive pools for flexible project cycles.”

Future steps

Going forward, Vanderbilt ACCRE, alongside Hammerspace, aims to expand its footprint and completely migrate away from legacy proprietary systems, while fully unifying its storage.

More particularly, Hammerspace’s solution could facilitate the orchestration of data across different storage systems and locations.

For example, data could be moved from an instrument like a cryo-microscope to a compute environment for analysis, before migrating assets to the researcher's lab, thereby automating data movement across their workflow.
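The instrument-to-compute-to-lab flow described here can be modelled as a small ordered pipeline. The sketch below is a minimal illustration only: the stage names mirror the example in the text, while the `Dataset` type, the `advance` function, and the dataset name are hypothetical and do not represent any Hammerspace interface.

```python
from dataclasses import dataclass, field
from typing import List

# Stage names mirror the example in the text; the order is the assumed workflow.
STAGES = ["instrument", "compute", "lab"]


@dataclass
class Dataset:
    name: str
    location: str = STAGES[0]
    history: List[str] = field(default_factory=list)


def advance(ds: Dataset) -> bool:
    """Move the dataset to the next stage; return False if it is already at the last one."""
    idx = STAGES.index(ds.location)
    if idx == len(STAGES) - 1:
        return False
    ds.history.append(ds.location)
    ds.location = STAGES[idx + 1]
    return True


ds = Dataset("cryo-session-001")  # hypothetical dataset name
while advance(ds):
    print(f"{ds.name}: {ds.history[-1]} -> {ds.location}")
```

In a real deployment each transition would be triggered by policy or by job completion rather than a loop; the sketch only shows the ordering of the stages.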

The next phase of HPC and AI research will depend not only on faster processors, but on architectures capable of coordinating data efficiently across GPU clusters, instruments, archives, and distributed storage tiers. Institutions that recognise and act on this shift will be better positioned to support the increasingly data-driven nature of modern research.

When it comes to ensuring success in an HPC environment, Knack advises: “Don’t be afraid to rethink how you deploy and allocate storage.”

To learn more about the ongoing data architecture modernisation partnership between Vanderbilt University and Hammerspace, you can read the case study.
