In the new world of cloud computing, storage is one of the most difficult problems to solve. Cloud storage needs to scale out easily while keeping the cost of scaling as low as possible, without sacrificing reliability or speed, and it must tolerate the hardware failures that become inevitable as storage scales up. Three of the most innovative storage platform technologies are entirely software based and run on commodity hardware as distributed file systems.
You might be asking yourself, what exactly is a cloud distributed file system? In cloud computing, a distributed file system is any file system that allows access to files from multiple hosts over a shared computer network. This makes it possible for multiple users on multiple machines to share files and storage resources. Distributed file systems differ in how they handle performance, concurrent writes, and the permanent or temporary loss of nodes or storage, and in their policies for storing content.
Below we have outlined three of the most innovative software distributed file systems and why you would choose one over the other for your public or private cloud environment.
Gluster is an open source distributed file system that can scale to massive size for both public and private clouds. The GlusterFS architecture aggregates compute, storage, and I/O resources into a global namespace. Each server and its attached commodity storage are considered a node. Capacity is scaled by adding nodes or by adding storage to each node. Performance is increased by spreading storage across more nodes, and high availability is achieved by replicating data between nodes.
File placement on each of the cluster's building blocks is managed by the GlusterFS hashing algorithm, which is distributed to all of the servers in the cluster. There is no single server that manages metadata or the cluster. This design is well suited to storing a massive number of files in a single global namespace. Gluster features a modular design, using what it calls translators, to give additional options beyond simple file distribution. Translators extend the base functionality by making it easy to change redundancy or stripe data across the cluster. With the recent addition of native support for Gluster’s libgfapi in KVM+QEMU, Gluster-backed block devices, and alpha-stage native integration with Apache CloudStack, Gluster is becoming a compelling offering for virtual machine storage.
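To make the idea of placement without a metadata server concrete, here is a minimal Python sketch. It is not GlusterFS's actual algorithm: the brick names, the use of MD5, and the even split of a 32-bit hash space are all assumptions made for illustration. The point is only that any host holding the same list of bricks can compute a file's location on its own, with no central lookup.

```python
import hashlib

HASH_SPACE = 2 ** 32  # illustrative 32-bit hash space (an assumption)


def assign_ranges(bricks):
    """Split the hash space into one contiguous range per brick."""
    step = HASH_SPACE // len(bricks)
    ranges = []
    for i, brick in enumerate(bricks):
        start = i * step
        # The last brick absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == len(bricks) - 1 else (i + 1) * step - 1
        ranges.append((start, end, brick))
    return ranges


def file_hash(name):
    """Map a file name to a 32-bit integer (MD5 is a stand-in hash)."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def locate(name, ranges):
    """Return the brick whose range contains the file's hash."""
    h = file_hash(name)
    for start, end, brick in ranges:
        if start <= h <= end:
            return brick
    raise ValueError("hash outside all ranges")


# Hypothetical brick names; every host computes the same answer independently.
bricks = ["server1:/brick", "server2:/brick", "server3:/brick"]
ranges = assign_ranges(bricks)
print(locate("reports/2014-q1.pdf", ranges))
```

Because the lookup is a pure function of the file name and the brick list, adding hosts does not add load to any central index. The trade-off is that changing the brick list changes the ranges, so some files have to move.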
Gluster is ideal for installations that require massive numbers of files to be distributed and available on hundreds of hosts. Recent additions make it useful as virtual machine backing for clusters that contain tens of storage hosts.
Like Gluster, Ceph is an open source storage platform designed to massively scale, but it has taken a fundamentally different approach to the problem. At its base, Ceph is a distributed object store, called RADOS, that can be accessed through an object store gateway, a block device, or a file system. Ceph has a very sophisticated approach to storage that allows it to be a single storage backend with lots of options built in, all managed through a single interface. Ceph also features native integration with KVM+QEMU. It also has tested support for Apache CloudStack cloud orchestration, both as primary storage for running virtual machines and as an image store using the S3 or Swift interface.
Aside from a variety of supported storage interfaces, Ceph offers compelling features that can be enabled depending on your workloads. Pools of storage can have a read-only or write-back caching tier. The physical location of data can be managed using CRUSH maps, and snapshots can be handled entirely by the storage backend.
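The idea of controlling where data physically lands can be illustrated with a small Python sketch. This is not the CRUSH algorithm. It swaps in highest-random-weight (rendezvous) hashing with a single rule, "at most one replica per rack", and the rack names, host names, and cluster layout are hypothetical. What it shares with the description above is that placement is computed from a map of the hardware, not looked up in a table.

```python
import hashlib

# Hypothetical cluster map: rack name -> storage hosts in that rack.
CLUSTER_MAP = {
    "rack-a": ["host-a1", "host-a2"],
    "rack-b": ["host-b1", "host-b2"],
    "rack-c": ["host-c1", "host-c2"],
}


def score(object_name, target):
    """Deterministic pseudo-random score for an (object, target) pair."""
    data = f"{object_name}:{target}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def place(object_name, cluster_map, replicas=2):
    """Pick one host in each of the `replicas` highest-scoring racks."""
    if replicas > len(cluster_map):
        raise ValueError("not enough racks for the requested replica count")
    racks = sorted(cluster_map, key=lambda r: score(object_name, r), reverse=True)
    chosen = []
    for rack in racks[:replicas]:
        # Within the rack, take the host with the highest score.
        host = max(cluster_map[rack], key=lambda h: score(object_name, h))
        chosen.append((rack, host))
    return chosen


print(place("vm-disk-0001", CLUSTER_MAP, replicas=2))
```

Since each replica is forced into a different rack, losing a whole rack never removes every copy of an object. Changing the map, for example by adding a rack, changes the placement without any per-object bookkeeping.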
Ceph is versatile and can be tuned to a wide range of environments and storage needs. It can also scale gracefully to thousands of hosts. Ceph is an excellent candidate for any task where a distributed file system would be used.
Like Gluster, Ceph is designed to run on commodity hardware and to deal effectively with the inevitable failure of that hardware. Recently, Red Hat acquired both Gluster and Inktank, the designers of Ceph. Red Hat intends to integrate both storage technologies into its current product line.
The Nutanix distributed file system converges storage and compute into a single appliance based on commodity hardware. Like Gluster and Ceph, Nutanix features a scale-out design that allows it to achieve redundancy and reliability while managing the inevitable hardware failures of scale.
One of the main features of Nutanix is that it uses solid-state drives in each appliance node to store hot data. This allows Nutanix to automatically move data between the faster and slower disks as it becomes hot or cold. The Nutanix storage architecture also features deduplication and compression.
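The hot and cold data movement described above can be sketched in a few lines of Python. This is not how Nutanix implements tiering: the `TieredStore` class, the access-count threshold, and the fixed rebalance interval are assumptions chosen to show the general technique of promoting frequently read blocks to fast storage and demoting the rest.

```python
from collections import Counter


class TieredStore:
    """Toy two-tier store that tracks which tier each block lives on."""

    def __init__(self, hot_threshold=3):
        self.hot_threshold = hot_threshold
        self.tier = {}         # block id -> "ssd" or "hdd"
        self.hits = Counter()  # reads seen since the last rebalance

    def write(self, block_id):
        # New blocks start on the slower tier in this sketch.
        self.tier.setdefault(block_id, "hdd")

    def read(self, block_id):
        if block_id not in self.tier:
            raise KeyError(block_id)
        self.hits[block_id] += 1
        return self.tier[block_id]

    def rebalance(self):
        """Promote blocks read often in this interval, demote the others."""
        for block_id in self.tier:
            hot = self.hits[block_id] >= self.hot_threshold
            self.tier[block_id] = "ssd" if hot else "hdd"
        self.hits.clear()


store = TieredStore(hot_threshold=2)
store.write("block-1")
store.write("block-2")
store.read("block-1")
store.read("block-1")
store.read("block-2")
store.rebalance()
print(store.tier)
```

A real system would weigh recency, tier capacity, and the cost of moving data, but the core loop is the same: measure access, then migrate.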
Nutanix isn’t currently supported by CloudStack; however, it supports NFS and iSCSI, which allows it to be used with most hypervisors found in the enterprise. Its self-managing storage makes Nutanix one of the most turnkey solutions on the market.
As you can see, there are many types of distributed file systems on the market today, and storage is typically one of the harder components to get right when architecting a cloud solution. It is important to understand the differences between the top distributed file systems so you can find the storage solution that is right for your business.
photo credit: Scott Beale