Intel has released version 3.0 of its distribution of the open source Apache Hadoop software for working with big data. From the second quarter of this year, Hadoop users will be able to optimise performance through Intel's distribution.
The Apache Hadoop framework is an open source project for storing, organising and processing large amounts of data. Intel's distribution will offer up to 20 times faster decryption of data using AES-NI (Intel's Advanced Encryption Standard New Instructions), optimisation for SSDs and cache acceleration, faster querying, hardware-enhanced compression, and automated tuning with Intel's Active Tuner.
Users won't be locked into the Intel distribution, however, according to Boyd Davis, vice president of Intel's Datacenter Software group. 'The enhancements will go back into the open source community,' he said.
Davis stated that the time taken to sort 1TB of data could be cut from four hours to seven minutes using Intel hardware and its Hadoop distribution.
Partners supporting the launch include Cisco, the Texas Advanced Computing Center (TACC), McGraw-Hill, SAP, Savvis, Red Hat, and NextBio, the last of which provides big data technology for genomics and medicine. Speaking at the launch, Dr Satnam Alag, CTO of NextBio, said that the volume of genomic data that organisations researching cancer, for example, have to handle is phenomenal, and that investing in Hadoop would allow NextBio to scale its software offerings as that volume grows. Making sense of such data and extracting value from it, not just in medicine but in areas ranging from traffic to public safety, is what frameworks such as Hadoop are designed for.
Intel's Hadoop distribution will be sold with Intel support. The company is also running Project Rhino to improve Hadoop's data protection capabilities.