
The data deluge: Creating a sustainable storage infrastructure for scientific research

Andy Palmer

Andy Palmer is the UK Channel Lead for Seagate’s Enterprise Data & Cloud Solutions Group

Credit: Seagate

In today’s life sciences landscape, data is not just a by-product of research—it’s the lifeblood of discovery. With the explosion of high-resolution imaging, genomic sequencing, and AI-driven analysis, research organisations are generating unprecedented volumes of data at accelerating speeds. This surge brings opportunity and urgency: the potential to unlock transformative insights, but only if that data can be stored, accessed, and analysed efficiently. 

As global storage capacity struggles to keep pace, innovative storage technologies are bridging the gap. In this interview, Seagate’s Andy Palmer explores how advanced storage solutions can help life sciences organisations boost performance, reduce waste, and build more sustainable, scalable infrastructures for the data-driven breakthroughs of tomorrow.

Data storage needs are growing fast. Can you give us a sense of the scale?

Palmer: What we’re trying to solve is the challenge of making research data accessible and safe. There’s a trade-off between those two things—between accessibility and safety.

Data storage is growing exponentially. In 2010, the world produced around two zettabytes of data. By 2028, research indicates we’ll hit 400 zettabytes a year. But how much is that? To help visualise it, take the British Library: it holds 170 million published books and artefacts, with over 400 terabytes stored digitally. Producing 400ZB in a year is like generating roughly 30 times the British Library’s entire digital holdings every second.
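The British Library comparison can be checked with a few lines of arithmetic. This minimal sketch assumes decimal (SI) units, a 365-day year, and that the 400ZB figure is an annual total; the helper names are illustrative. It also derives the compound annual growth rate implied by the 2ZB (2010) and 400ZB (2028) data points.

```python
# Decimal (SI) units, as storage capacity is usually quoted.
TB = 10**12
ZB = 10**21
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year


def libraries_per_second(annual_bytes: float, library_bytes: float) -> float:
    """How many copies of a reference collection are produced each second."""
    return annual_bytes / SECONDS_PER_YEAR / library_bytes


def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by two data points."""
    return (end / start) ** (1 / years) - 1


ratio = libraries_per_second(400 * ZB, 400 * TB)
growth = implied_annual_growth(2 * ZB, 400 * ZB, 2028 - 2010)
print(f"{ratio:.1f} British Library digital collections per second")
print(f"{growth:.0%} implied annual growth, 2010-2028")
```

Under these assumptions the ratio comes out at about 32 collections per second, consistent with the "roughly 30 times" figure, and the implied growth is about 34% a year.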

That’s a massive increase—and it will only keep growing.

The challenge is: how do I store more of the data I've produced? I've gone to all the expense of generating it—I don’t want to delete it, because there’s value in that data and the research I’ve conducted. So what’s the trade-off? You could just keep adding more and more storage into the existing infrastructure, or you could get smarter about it.

We look at this on a global scale. We’ve all seen the headlines—cloud and data centres are consuming more space and power. People often present this as a negative, but there’s real potential in the data that is being created. Cross-analysing varied datasets can lead scientists to unexpected findings and help them gain a deeper understanding that could lead to further research breakthroughs.

With the advent of AI, we have natural language models that work for us, presenting a logical, human-sounding summary of results. As we move into agentic AI, where AI has the agency to act on your behalf, we will see a massive increase in data production.

What is Seagate’s position in this data growth challenge?

Palmer: If we focus on the “why” rather than the “what” for a moment, we see a widening gap between the amount of data produced and the industry’s capacity to store it. Our global storage capacity simply isn’t keeping up with data growth.

So what do we do? At Seagate, we’re heavily investing in research to increase the storage capacity of a single hard drive. When I joined Seagate 10 years ago, we launched a 16TB drive. Everyone was very proud of it, but some customers said, “No one needs 16 terabytes.” Well, now we have a 32TB drive. We’ve announced a 36TB model, and with our recent technology innovation, the Mozaic platform, we’re already delivering 40TB drives using HAMR (Heat-Assisted Magnetic Recording). A 60TB model is on the way, and we have a roadmap to over 100TB in the same form factor. The key point is that we’re at the forefront of this research and are often first to market with many of these capacity milestones.

This innovation is not just about capacity. Mozaic 3+ drives are 40% more energy-efficient per terabyte, and they’re built to support AI and edge workloads, which are increasingly common in scientific research.
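To see what higher capacity per drive means in practice, the sketch below counts the drives needed to hold a hypothetical 10PB research archive at the 16TB, 32TB, and 40TB capacities mentioned above. It deliberately ignores RAID or erasure-coding overhead and spare drives, so real deployments would need more drives than shown.

```python
import math

PB_IN_TB = 1000  # decimal units


def drives_needed(dataset_tb: float, drive_tb: float) -> int:
    """Raw drive count, ignoring RAID/erasure-coding overhead and spares."""
    return math.ceil(dataset_tb / drive_tb)


dataset_tb = 10 * PB_IN_TB  # hypothetical 10PB archive
for capacity in (16, 32, 40):
    print(f"{capacity}TB drives: {drives_needed(dataset_tb, capacity)}")
```

Going from 16TB to 40TB drives cuts the raw count from 625 to 250, which is where the savings in rack space, power, and embodied material come from.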

Over many years, a vast amount of data has been generated in life sciences, and within that data lies potential. Researchers can discover new perspectives by merging multiple data streams. So first and foremost, it’s about using and reusing data to discover new things.

The second point is that the type of data being stored is changing. It’s no longer just rows of numbers or letters in files. For example, we now have high-definition images from body scans, CT or MRI, where thin slices are taken across the body. These require increasingly high-resolution scanners and sensors to produce, and analysing them involves comparing vast amounts of data to identify anomalies. That’s a very data-rich process.

Then you add AI to the equation. AI allows us to run those comparisons automatically and at scale. It dramatically speeds things up, but you have to store data snapshots along the way to do that effectively. That’s another major contributor to the exponential growth in storage needs.

Is this growth in data unique to AI, or is it also tied to broader trends in research?

Palmer: It’s both. In research, especially in scientific fields, there’s a strong emphasis on retaining data. Deleting data simply isn’t an option, whether for scientific or legal reasons. AI models also require access to the original data, which must be preserved as a source of truth. When AI generates outputs, each checkpoint is saved, and the process continues. That creates multiple layers of data retention.
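A toy model makes the "multiple layers of data retention" point concrete: the source dataset is kept as the source of truth and every checkpoint is retained, so the footprint only grows. The class name and all sizes below are hypothetical, not measurements from any real project.

```python
from dataclasses import dataclass, field


@dataclass
class RetainedProject:
    """Toy model: source data is immutable and every checkpoint is kept."""

    source_tb: float
    checkpoints_tb: list[float] = field(default_factory=list)

    def save_checkpoint(self, size_tb: float) -> None:
        # Nothing is ever deleted; each checkpoint adds a new retention layer.
        self.checkpoints_tb.append(size_tb)

    @property
    def total_tb(self) -> float:
        return self.source_tb + sum(self.checkpoints_tb)


project = RetainedProject(source_tb=50.0)  # hypothetical 50TB source dataset
for _ in range(10):
    project.save_checkpoint(2.5)  # hypothetical 2.5TB per checkpoint
print(project.total_tb)
```

Ten modest checkpoints add half again to the original dataset, and none of it can be deleted under a keep-everything policy.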

We’re also seeing more obligations from governments, particularly in Europe, mandating that source files be kept intact as part of compliance and transparency efforts around AI. So, AI is not just producing more data; it’s also making retention non-negotiable.

As drive capacity grows, are there challenges with data protection or reliability?

Palmer: The challenge with high-capacity drives is data protection. If something goes wrong, the rebuild times—the time it takes to restore a system—get longer and longer. On average, it can take around three hours per terabyte to rebuild. That’s a long time for a drive to be out of service.
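Applying the rough average of three hours per terabyte to the drive sizes discussed earlier shows why rebuild time becomes a problem as capacity grows. This is a linear sketch; real rebuild times depend on the protection scheme, system load, and hardware.

```python
HOURS_PER_TB = 3.0  # the rough average quoted in the interview


def rebuild_hours(drive_tb: float, hours_per_tb: float = HOURS_PER_TB) -> float:
    """Linear estimate of the time to rebuild one drive's worth of data."""
    return drive_tb * hours_per_tb


for capacity in (16, 32, 40):
    hours = rebuild_hours(capacity)
    print(f"{capacity}TB: {hours:.0f} h (~{hours / 24:.1f} days)")
```

At that rate a 32TB drive takes about four days to rebuild, and a 40TB drive about five.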

When a drive fails and its data has to be recovered, the failed drive is often thrown away, creating e-waste. Drives contain rare earth metals, strong magnets, and other valuable components, so discarding them is a real environmental issue.

We’ve developed ADR (Autonomous Drive Regeneration) technology that allows a drive to stay in service even if part of it fails. Inside a modern hard drive, there are ten platters. If one platter has a problem, we can continue operating with the remaining nine. The system doesn’t lose usable capacity, because spare capacity is always held in reserve across the drives.
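The capacity bookkeeping behind this idea can be sketched as a toy model. It is not Seagate’s ADR implementation: it simply assumes capacity is spread evenly over ten platters, retires a failed platter’s share while the drive stays in service, and checks whether the pool’s spare capacity still covers the data it has committed to store. The pool size and committed figure are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    capacity_tb: float
    platters: int = 10
    failed_platters: int = 0

    @property
    def usable_tb(self) -> float:
        # Assumes capacity is spread evenly across platters.
        healthy = self.platters - self.failed_platters
        return self.capacity_tb * healthy / self.platters

    def fail_platter(self) -> None:
        # The drive stays in service; only the failed platter's share is retired.
        if self.failed_platters < self.platters:
            self.failed_platters += 1


def pool_can_absorb(drives: list[Drive], committed_tb: float) -> bool:
    """True if the pool still has room for all committed data."""
    return sum(d.usable_tb for d in drives) >= committed_tb


pool = [Drive(capacity_tb=32) for _ in range(12)]  # hypothetical 12-drive pool
committed = 320.0  # hypothetical TB of data the pool must hold
pool[0].fail_platter()
print(pool[0].usable_tb)
print(pool_can_absorb(pool, committed))
```

The degraded drive keeps 28.8TB of its 32TB in service instead of being discarded, and the pool’s reserve absorbs the difference.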

That means our customers’ systems are much more robust and safer. We have 99.9999% availability—world-class performance. The entire rebuild process happens automatically inside the drive enclosure. The user never even knows it’s happening. Their systems are essentially self-healing.
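Availability percentages are easier to compare when converted into the maximum downtime they permit per year. This sketch assumes a 365-day year and compares the 99.9999% figure above with the 99.999% figure quoted for the Academia Sinica deployment.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year


def max_downtime_seconds(availability: float) -> float:
    """Maximum yearly downtime allowed by an availability fraction."""
    return (1 - availability) * SECONDS_PER_YEAR


for label, a in (("99.999%", 0.99999), ("99.9999%", 0.999999)):
    print(f"{label}: {max_downtime_seconds(a):.1f} s/year")
```

Each extra "nine" cuts permitted downtime by a factor of ten: roughly five minutes a year at 99.999%, and about half a minute at 99.9999%.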

Even better, these self-healing systems require half the RAM and half the CPU power of traditional systems. Customers use less hardware and semiconductor material, which ultimately saves money.

One recent collaboration addresses the growing demands of genomic data storage: Seagate partnered with leading industry players to create a modular, scalable, and cost-effective object storage solution. This system combines high-density hardware with software-defined storage and policy-driven data management, enabling seamless data distribution, high availability, and reduced total cost of ownership. The architecture supports self-healing, multi-site deployments under a single namespace, minimising downtime and environmental impact while maximising performance and efficiency.
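The interview does not say how that solution distributes objects across sites. One common technique for deterministic, multi-site placement under a single namespace is rendezvous (highest-random-weight) hashing, used here purely as an illustration: every client can compute an object’s sites from its key alone, with no central lookup table. The site names and object key are hypothetical.

```python
import hashlib


def _score(key: str, site: str) -> int:
    """Pseudo-random but deterministic weight for a (key, site) pair."""
    digest = hashlib.sha256(f"{site}/{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(key: str, sites: list[str], copies: int = 2) -> list[str]:
    """Rendezvous hashing: pick `copies` distinct sites for an object key."""
    if copies > len(sites):
        raise ValueError("not enough sites for the requested number of copies")
    return sorted(sites, key=lambda s: _score(key, s), reverse=True)[:copies]


SITES = ["site-a", "site-b", "site-c"]  # hypothetical site names
key = "genomes/sample-0001.cram"  # hypothetical object key
print(place(key, SITES))
```

A useful property of this scheme is that adding or removing a site only moves the objects whose top-ranked sites change, rather than reshuffling everything.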

That’s the thinking driving our approach: how can we get more value from the equipment we sell while reducing waste?

Another good example is Academia Sinica in Taiwan, whose climate research team faced a massive data surge, projecting over 10PB of growth in just four years as they refined their Earth system model. To meet these demands, they adopted Seagate’s storage system, which delivered high-density, high-performance storage with 99.999% availability. The solution reduced rack space by 75%, cut total cost of ownership by 80%, and slashed rebuild times by 93% using ADAPT, Seagate’s proprietary data protection technology designed to improve storage system resilience and efficiency. This enabled the team to focus on climate modelling while minimising downtime, hardware waste, and operational costs.
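To put the case-study percentages in concrete terms, this sketch applies them to illustrative baselines. The 10PB over four years is averaged linearly; the 96-hour rebuild baseline (a 32TB drive at roughly three hours per terabyte) and the eight-rack baseline are hypothetical, not figures from the Academia Sinica deployment.

```python
def apply_reduction(baseline: float, reduction: float) -> float:
    """Value remaining after a fractional reduction (0.93 means 93% less)."""
    return baseline * (1 - reduction)


growth_pb_per_year = 10 / 4  # "over 10PB of growth in four years", averaged
baseline_rebuild_h = 96.0  # hypothetical: 32TB drive at ~3 h/TB
baseline_racks = 8  # hypothetical rack count before consolidation

print(f"{growth_pb_per_year} PB/year on average")
print(f"rebuild: {apply_reduction(baseline_rebuild_h, 0.93):.2f} h")
print(f"racks: {apply_reduction(baseline_racks, 0.75):.0f}")
```

Under these assumptions, a four-day rebuild shrinks to under seven hours, and eight racks of storage fit in two.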

Beyond hardware, does Seagate offer any services that address these challenges?

Palmer: Yes, we also offer a cloud service in addition to hardware. One of the benefits of cloud services is that they centralise computing resources, making them far more efficient than highly distributed architectures.

We provide this cost-effectively, leveraging the knowledge and experience we’ve built through years of working with the world’s largest cloud providers. I’d say that 8 out of 10 times, when you search for something using a popular search engine, the data retrieved is stored on a Seagate hard drive.

We’ve applied that expertise to our S3-compatible cloud object storage. It is highly functional, highly available, and secure; it supports the most demanding AI/ML applications and cloud-native workloads; and it is offered at a very competitive price.

How can large-scale research using HPC be more sustainable?

Palmer: It would be wrong to point to a single factor and say, "That's the answer." Our offering is a layering of capabilities that contribute to helping our customers meet their environmental goals.

Data centres often get a bad rap because they draw a lot of power, and rising demand sets off alarm bells. But in reality, it’s like the difference between a population spread across an entire country and one concentrated in cities. A population concentrated in cities uses resources more efficiently than a dispersed one. It’s the same with computing.

When you concentrate computing, networking, and power resources in a single data centre, you get more out of those assets than if they were distributed worldwide. That’s what I meant by increased efficiency. But again, that’s just one layer of what we do.

We’ve already discussed areal density, which means increasing the storage capacity of each hard drive. We’ve also discussed how we deploy those drives in self-healing platforms that dramatically reduce e-waste and the need for excessive computing resources.

We also have a broad environmental management strategy. For example, we track and reduce the carbon footprint of our products, known as embodied carbon. As you scale up drive capacities, the embodied carbon per terabyte decreases, which is another efficiency gain.
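The per-terabyte effect is a matter of division: a drive’s embodied carbon is spread over more terabytes. This sketch uses no real lifecycle data. It expresses per-terabyte embodied carbon relative to a 16TB baseline drive and assumes, purely for illustration, that a larger drive’s embodied carbon differs from the baseline by a hypothetical fraction `per_drive_growth`.

```python
def relative_per_tb(capacity_tb: float, baseline_tb: float = 16.0,
                    per_drive_growth: float = 0.0) -> float:
    """Per-TB embodied carbon relative to a baseline drive.

    per_drive_growth is a hypothetical fractional change in a drive's
    embodied carbon versus the baseline drive (0.0 means unchanged).
    """
    return (1 + per_drive_growth) * baseline_tb / capacity_tb


for capacity in (16, 32, 40):
    print(f"{capacity}TB: {relative_per_tb(capacity):.2f}x baseline per TB")
```

Even if a 40TB drive carried 20% more embodied carbon than a 16TB one, its per-terabyte footprint would still be under half the baseline (0.48x).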

At Seagate, we’ve taken some very concrete steps. First, customers increasingly ask us to complete environmental and DEI (Diversity, Equity, and Inclusion) questionnaires as part of their procurement processes. Second, university graduates choose to work for companies that can credibly and realistically discuss sustainability, so having a robust environmental strategy helps us in the war for talent.

Third, investment bankers prefer to hold shares in companies with strong ESG (Environmental, Social, and Governance) credentials. This strategy also improves our access to capital.

But beyond these, we’re embedding circularity into our design and operations. Our self-healing drives extend product lifespans and reduce the need for replacements. Technologies like ADAPT minimise downtime and material waste by enabling faster, more efficient rebuilds. We’re also exploring drive regeneration techniques that allow partially failed drives to remain in service, further reducing e-waste.

Finally, we’re committed to transparency and continuous improvement. We publish lifecycle assessments, engage in closed-loop recycling initiatives, and collaborate with partners to ensure our solutions support a more circular, low-carbon digital infrastructure. Sustainability isn’t a feature—it’s a foundation.
