
What drives data storage changes in scientific research?


“The length of the refresh cycle is very much dependent on the market sector,” says Andy Palmer, Enterprise Data and Cloud Solutions Group - UK Lead, Seagate Technology, on changes he sees in data storage (Image: Shutterstock)

At our recent roundtable, those working in scientific research discussed what typically drives changes in data storage. With scientific research generating vast amounts of data, choosing the right storage infrastructure – whether on-premises, hybrid, or cloud-based – is becoming ever more complex. It’s no wonder changes can be challenging, and are often perceived as unwelcome.

 

Legal and regulatory requirements, AI-driven data processing, environmental sustainability, and resource scarcity are all key factors driving change. So once storage systems are up and running, what is it that governs the timing of upgrades and replacements?

 

Jonas Lindemann, HPC Director, Lund University, told us: “We usually replace on a cycle, so that when we buy a new system, we migrate over in one go. 

 

“We previously had a system where we had different network-attached storage systems for multiple research groups, but it was a nightmare to manage the different storage platforms and support lifecycles. So, we now have a single storage system to which we can add storage blocks.” 

 

Nigel Berryman, Head of IT and Scientific Computing, CRUK Cambridge Institute, says it’s about getting more for the same money. “We used to operate on the Princeton model,” he explains, “which says that if you spend the same amount of money every x number of years, you get more storage for that budget as the costs tend to come down. That held true for us and worked out at about a five-year refresh; during Covid, we moved to a seven-year cycle. We ended up having to increase the amount of kit we had, as we ran out of capacity, but generally the Princeton rule still holds true. 

 

“That might change with AI, but we’re waiting to see what happens there. We’re not yet generating a lot of new data, but there will come a time when we start looking at the data we’ve got and decide it needs a structure and a place to live.” 
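
As a rough illustration of the constant-budget refresh model Berryman describes, the sketch below (in Python, using purely hypothetical figures rather than CRUK numbers) shows how a fixed budget buys more capacity at each refresh if the cost per terabyte keeps falling, and how stretching the cycle from five to seven years changes the picture.

def capacity_at_refresh(budget, initial_cost_per_tb, annual_cost_decline,
                        refresh_interval_years, cycles):
    """Capacity (TB) purchasable at each refresh for a fixed budget,
    assuming the cost per TB falls by a constant fraction each year."""
    capacities = []
    for n in range(cycles):
        years_elapsed = n * refresh_interval_years
        cost_per_tb = initial_cost_per_tb * (1 - annual_cost_decline) ** years_elapsed
        capacities.append(budget / cost_per_tb)
    return capacities

# Illustrative assumptions only: a 500,000 budget, 100 per TB today,
# and costs falling by roughly 15 per cent a year.
print(capacity_at_refresh(500_000, 100, 0.15, 5, 3))  # five-year refresh cycle
print(capacity_at_refresh(500_000, 100, 0.15, 7, 3))  # seven-year refresh cycle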

 

Deepak Aggarwal, Principal HPC Systems Manager, University of Cambridge, says changes are demand-driven. “Project leaders can buy licences for a period of time – one, two, three or five years,” he says. “They can add as many users as they want to those projects during that time for storage use. As users pay for that licence at first use, we can pretty much predict our storage needs and plan for it accordingly. 

 

“We have looked at cloud burst solutions, such as AWS, but we found it was going to cost around four times that of on-premises solutions. Also, there is a reluctance from project leaders to move sensitive data into the cloud. For us, then, cloud is a complementary solution, but not a permanent alternative. It might suit our premium category of users.”

 

Mark Nossokoff, Research Director, Hyperion Research, sees replacement cycles extending. “There has been a wider industry trend of an extension to the typical life cycle of storage systems – extending from three or four years towards five or six years,” he says. 

 

“One complicating factor in general is the accelerated roadmap cadence from some of the technology vendors. We’ve seen the likes of Nvidia and AMD announcing a significant roadmap evolution every 12 months or so; users just aren’t able to absorb those upgrades due to budget realities or technology compatibility issues. 
 

Budgets can't always keep up with demand
 

“This is causing some angst on the frequency and size of those refresh cycles, as the technology is advancing at a much faster pace. This is why we are seeing cloud as an option to fill those technology gaps. Demand to keep up with this pace of technological change is there from a ‘desire’ perspective, but not always there from a budget perspective.”

 

Andy Palmer, Enterprise Data and Cloud Solutions Group - UK Lead, Seagate Technology, adds a vendor perspective. “The length of the refresh cycle is very much dependent on the market sector,” he says. 

 

“Five years is still about right, but I think it is lengthening. There’s a lot more caution around committing budget to this kind of investment. 

 

“We typically engage customers mid-cycle to help them plan their next refresh cycle, ensuring continuity and performance. We sell systems to cloud providers too, but that’s also been through an interesting time. We’re seeing a shift in cloud investment priorities, with increased focus on compute infrastructure.” 

 

Berryman adds that, while storage used to be a problem, it isn’t so much now. “In the past, when our bread and butter was doing gene sequence alignment, the bottleneck in our cluster was storage, rather than the CPUs or GPUs,” he explains.

 

“We seem to have solved that now with the use of 200 Gigabit InfiniBand and NVMe. Capacity is a bit of an issue, but it’s nowhere near as bad as it once was. We’re still playing catch-up on GPUs now, which everyone is moving towards.”

To understand more of the challenges about data storage – and the solutions that are out there to resolve them – read our free White Paper. You can access it here.
