The Wellcome Trust Sanger Institute, a charitably funded genomic research centre based in the United Kingdom, has deployed DataDirect Networks (DDN) high-performance storage as part of a 22-petabyte genomic storage environment.
To manage the massive surge in the volume of data required to evaluate genetic sequences, the Sanger Institute chose DDN’s SFA high-performance storage engine and EXAScaler Lustre file system appliance to deliver the throughput and scalability necessary to support tens of thousands of data sequences, each requiring up to 10,000 CPU hours of computational analysis. DDN SFA storage will also help facilitate data access and sharing for more than 2,000 scientists around the world, including those who access data through the Sanger Institute’s website, which receives 20 million hits and 12 million impressions each week.
Phil Butcher, director of information communications technology, Wellcome Trust Sanger Institute, commented: ‘The sequencing machines that run today produce a million times more data than the machine used in the human genome project. We produce more sequences in one hour than we did in our first 10 years. For instance, a single cancer genome project sequences data that requires up to 10,000 CPU hours for analysis and we’re doing tens of thousands of these at once. The sheer scale is enormous and the computational effort required is huge.
‘Our storage strategy gives us incredible scaling. If we need to add a new sequencer, we can expand quickly and without disruption,’ he added.
To accommodate demands for increased bandwidth, the Sanger Institute is upgrading its 10GbE network to 40GbE and plans to scale its current DDN storage to support the expanded network capacity. In addition, the Institute is exploring DDN’s WOS distributed object storage platform, which could be well suited to increased collaboration and data sharing as part of a private cloud.