Tom Leyden discusses why object storage is one of the key facilitators of best-practice collaboration among researchers
Most research organisations would jump at the opportunity to improve the storage, sharing and accessibility of project-based research material, both to facilitate collaboration and to foster a better understanding of their work. While this sounds like a simple goal, actually achieving it presents several distinct technical and cultural challenges.
On the technical side, it is not unusual for researchers to own and manage a plethora of data types, growing in both volume and velocity. In many cases, however, culture, or rather human nature, is also a challenge. Some data is so valuable to a research team that multiple discrete copies may be retained on removable hard drives, which in turn are stored in various locations. And we have all heard the disaster stories about data going missing and the problems that lost data presents.
Additionally, research organisations often have to address long-term data retention needs that extend well beyond the span of individual research projects. Many UK research-intensive institutions face increasingly stringent requirements from funding bodies, such as the UK Research Councils, for the management of project data outputs. As grant funding in the UK supports best practice, it is critical to have a proven data management plan that documents how an organisation will preserve data for decades while ensuring maximum appropriate access and reuse by third parties.
To expedite, or increase the probability of, finding answers to research questions, organisations must first overcome these technical challenges and minimise the siloed approach to data storage. Then they must find a way to address an individual's natural instinct to protect what is perceived as their IP, and put measures in place to ensure that any regulatory compliance requirements are met. When these challenges are successfully overcome, organisations hold the key to unlocking the real value of their research assets.
Traditional storage approaches make it difficult to meet this combination of requirements, including scale, access, distribution, security, performance and cost, so more and more research facilities are moving to alternative methods of storage and looking to cloud computing as a solution. Hyperscale cloud companies have to cost-effectively process, protect and distribute massive amounts of data globally, and increasingly the most successful players, such as Amazon with S3 and Facebook with Haystack, are meeting these challenges using object storage. Object storage builds on a cloud storage foundation and addresses both the massive scalability requirements of big data and the need to provide secure collaboration, file sharing and content distribution to users.
For 2013, IDC anticipates that object storage will grow faster than any other segment of the file-and-object storage market. One driver of private cloud adoption is the control over data security and resiliency that users gain compared to public clouds, and as more companies look to object storage for collaborative file sharing, archive and backup, IDC is seeing cloud adoption accelerate.
Object storage is not new, however. The paradigm has been around for more than a decade, and several platforms have been launched, tested and abandoned. But the success of Amazon's S3 has demonstrated the benefits of architectures in which applications access data directly through a REST interface. S3 now stores well over a trillion objects, but its technology was largely developed in-house. So how can other organisations build object storage infrastructures as cost-efficient and durable as S3?
The big challenge, of course, is to keep the storage overhead as low as possible, from both a cost and a management point of view, without compromising durability. Most of the unstructured data that is stored online or in active archives is immutable. Much of the functionality built into traditional file systems, accessed via NAS or SAN, exists to let users access files and modify them in place. Object storage does not have this functionality: it separates the storage of data from the relationships that individual data items have to each other. If a user modifies a file, it is simply stored as a new object, a new version, which, in part, enables multiple individuals to work on the same data sets at the same time.
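The write-creates-a-new-version behaviour described above can be sketched in a few lines. This is a minimal illustration of the idea, not any vendor's implementation; the class and method names are invented for the example.

```python
# Minimal sketch (illustrative only): an immutable object store where every
# write appends a new version rather than modifying data in place.
from collections import defaultdict


class VersionedStore:
    def __init__(self):
        self._versions = defaultdict(list)  # object ID -> list of versions

    def put(self, object_id, data):
        """Writes never overwrite: each put appends a new immutable version."""
        self._versions[object_id].append(bytes(data))
        return len(self._versions[object_id]) - 1  # version number

    def get(self, object_id, version=None):
        """Fetch the latest version by default, or pin a specific one."""
        history = self._versions[object_id]
        return history[-1] if version is None else history[version]


store = VersionedStore()
v0 = store.put("results.csv", b"run 1")
store.put("results.csv", b"run 2")        # a 'modification' is a new object
assert store.get("results.csv") == b"run 2"
assert store.get("results.csv", v0) == b"run 1"  # old version still readable
```

Because earlier versions remain readable, two researchers can work against the same dataset concurrently without one overwriting the other's view.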
The backend architecture of an object storage platform, at least a true object storage platform (I say 'true' because some object storage systems still have a POSIX file system layer at the disk level), is designed so that all the storage nodes are presented as one single pool and there is no file system hierarchy. The architecture of the platform and newer data protection schemes (as opposed to RAID, the de facto data protection scheme for SAN and NAS) allow this pool to scale virtually without limit, while keeping the system simple to manage.
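One way to picture a flat pool with no hierarchy is that an object's location is computed from its identifier rather than looked up in a directory tree. The sketch below assumes a simple hash-based placement scheme; real platforms use more sophisticated algorithms, and the node names and replica count here are invented for illustration.

```python
# Illustrative sketch (assumed, not a specific product's algorithm): in a
# flat pool there is no directory tree; the nodes holding an object are
# derived directly from its identifier, so no central index is needed.
import hashlib

NODES = ["node-%d" % i for i in range(8)]  # one flat pool of storage nodes


def placement(object_id, copies=3):
    """Deterministically pick `copies` distinct nodes by hashing the ID."""
    digest = int(hashlib.sha256(object_id.encode()).hexdigest(), 16)
    start = digest % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(copies)]


nodes = placement("project-42/results.csv")
assert len(set(nodes)) == 3                           # three distinct nodes
assert nodes == placement("project-42/results.csv")   # lookup needs no index
```

Because placement is a pure function of the identifier, any client can locate an object without consulting a hierarchy, which is part of what lets the pool grow by simply adding nodes.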
Object storage systems are usually accessed through a Representational State Transfer (REST) interface; put simply, REST is a style of API that uses HTTP. Files, or objects, are 'dumped' into a large uniform storage pool, and an identifier is kept to locate each object when it is needed. Applications designed to run on top of object storage use these identifiers, via the REST interface, to locate an object when it needs to be retrieved. Objects are stored with metadata (information about the object), which enables very rich search capabilities and all sorts of analytics on unstructured data.
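The put-by-identifier, get-by-identifier pattern described above can be sketched with an in-memory stand-in for the storage pool. The function names (`put_object`, `get_object`, `search`) and the metadata fields are invented for this example and do not correspond to any real service's API.

```python
# Hedged sketch of the REST access pattern: objects live in one uniform
# pool keyed by an identifier, and metadata stored alongside each object
# supports search. A dict stands in for the remote pool reached over HTTP.
import uuid

POOL = {}  # identifier -> (data, metadata); one flat pool, no hierarchy


def put_object(data, metadata):
    """PUT: store the object and return the identifier the application keeps."""
    object_id = str(uuid.uuid4())
    POOL[object_id] = (bytes(data), dict(metadata))
    return object_id


def get_object(object_id):
    """GET: retrieve an object by its identifier."""
    return POOL[object_id][0]


def search(**criteria):
    """Metadata enables rich search without any file-system tree."""
    return [oid for oid, (_, md) in POOL.items()
            if all(md.get(k) == v for k, v in criteria.items())]


oid = put_object(b"gene sequence", {"project": "genome", "team": "oxford"})
assert get_object(oid) == b"gene sequence"
assert oid in search(project="genome")
```

The application never deals with paths or directories, only identifiers and metadata, which is what makes the pool uniform and easy to distribute.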
In my opinion, traditional NAS and SAN systems struggle to deal with the demands of research-intensive organisations in terms of management, cost and the ability to scale massively. A lot of effort has gone into optimising object storage platforms and reducing their cost to a few pennies per gigabyte. Most research organisations will grapple with data scale, access, distribution, security and performance, and will have to balance these against their storage costs.
Object storage is more than a smarter paradigm for storing large volumes of data. Globally distributed teams have become standard practice: consider researchers from different institutions working on the same project, or software being developed in California and then tested in India. Geographically distributed storage pools enable multiple teams to work on the same datasets in real time. Object storage is one of the key facilitators of best-practice collaboration among researchers, and it is only a matter of time before its use accelerates.
Tom Leyden is product marketing director, WOS Object Storage, DataDirect Networks