BGI speeds up analysis of DNA sequencing data

BGI, the world’s largest genomics institute, has cut the time taken to analyse batches of DNA sequencing data from nearly four days to just six hours using an Nvidia Tesla GPU-based server farm. This speed-up is an important step towards the $1,000 genome milestone, which would make genomics practical for use in clinical diagnostic tests as a component of patient care.

‘We are drowning in the genome data that our high-throughput sequencing machines create every day,’ said Dr Bingqiang Wang, head of high-performance computing at BGI. ‘GPU acceleration of our genome analysis applications enables our scientists to crunch through data and gain insights into bacteria, plants and humans faster than was ever possible. It offers the potential for researchers and healthcare professionals to identify highly effective and affordable individualised medicines and treatments.’

BGI researchers and collaborators have developed three genome data analysis applications that are accelerated by Nvidia Tesla GPUs. Firstly, the SOAP3 aligner aligns short reads from the sequencing machine against existing reference genome sequences. Through GPU acceleration, it can find all three-mismatch alignments in tens of seconds per one million reads, compared to tens of minutes without GPU acceleration. This means that the sequencing and assembling of individual genomes for comparison to those previously sequenced and studied can be performed quickly to understand potential future disease states and treatments.
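
To illustrate what finding all three-mismatch alignments means, here is a minimal Python sketch. It is a brute-force scan on the CPU, not SOAP3's method; the function names `hamming_within` and `align_read` and the toy sequences are hypothetical, chosen only for illustration.

```python
def hamming_within(a, b, limit):
    """Count mismatches between equal-length strings; return None once limit is exceeded."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > limit:
                return None
    return mismatches


def align_read(reference, read, max_mismatches=3):
    """Brute-force scan (illustrative only): try every placement of the read
    on the reference and keep those with at most max_mismatches mismatches."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        count = hamming_within(window, read, max_mismatches)
        if count is not None:
            hits.append((pos, count))
    return hits


# Toy data, invented for this example.
reference = "ACGTACGTTAGCACGTACGA"
read = "ACGTACGA"
hits = align_read(reference, read)
for pos, count in hits:
    print(f"position {pos}: {count} mismatch(es)")
```

Each placement of each read is checked independently of the others, which is one reason this kind of workload lends itself to parallel hardware. A scan like this would be far too slow for a full genome, where aligners rely on an index of the reference instead.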

GSNP (SNP detection) is a GPU-accelerated version of the SOAPsnp software, which detects single nucleotide polymorphisms (SNPs): single-base variations in the DNA of a genome. These SNP variations can be used to study how individuals develop diseases differently and how they respond to bacteria, viruses and medicines. Finally, GAMA (a high-resolution genotyping tool) finds the frequency distribution of particular gene variants, such as those linked to eye colour or the propensity for prostate cancer, in a set of genes.
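
The idea of SNP detection can be shown with a deliberately naive Python sketch: pile up aligned reads over the reference and report positions where a non-reference base dominates. This makes no claim about how SOAPsnp or GSNP work internally; the function `call_snps`, its thresholds `min_depth` and `min_fraction`, and the toy reads are all assumptions made for illustration.

```python
from collections import Counter


def call_snps(reference, aligned_reads, min_depth=3, min_fraction=0.8):
    """Naive consensus caller (illustrative only).

    aligned_reads is a list of (start_position, read_sequence) pairs.
    Returns tuples of (position, reference_base, observed_base, support, depth).
    """
    # Count the bases observed at every reference position.
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything
        base, support = counts.most_common(1)[0]
        if base != reference[pos] and support / depth >= min_fraction:
            snps.append((pos, reference[pos], base, support, depth))
    return snps


# Toy data, invented for this example: every read carries T at position 4.
reference = "ACGTACGTAC"
aligned_reads = [(0, "ACGTTCG"), (2, "GTTCGTA"), (3, "TTCGTAC"), (1, "CGTTCGT")]
snps = call_snps(reference, aligned_reads)
print(snps)
```

Production SNP callers also weigh evidence such as base quality and sequencing error rates, which this sketch ignores.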

Tesla GPUs are massively parallel accelerators based on the Nvidia Cuda parallel computing architecture. Developers can accelerate their applications using Cuda C, Cuda C++ or Cuda Fortran, or by using directive-based compilers.