Continuing from my last post, let's move on to storage innovation in the open cloud (hold on, my Azure storage blog is coming soon!). I usually start with fundamentals: if you are not familiar with OpenStack, it is built from many different modules that cover compute/virtual machines (Nova), object storage (Swift), block storage (Cinder), networking (Neutron), the dashboard (Horizon), identity services (Keystone), image services (Glance), orchestration (Heat), and so on. A software-defined storage environment provides policy management for feature options such as deduplication, replication, thin provisioning, snapshots, and backup.
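To make one of those policy features concrete, here is a toy sketch of thin provisioning: a volume advertises a large logical size, but back-end capacity is consumed only for blocks that are actually written. This is an illustration of the concept, not OpenStack code, and the class and block size below are my own choices.

```python
BLOCK_SIZE = 4096  # bytes per allocation unit (a typical page-sized block)

class ThinVolume:
    """Toy thinly provisioned volume: logical size is just a promise;
    physical space is allocated lazily, one block at a time."""

    def __init__(self, logical_size):
        self.logical_size = logical_size
        self.blocks = {}  # block index -> bytearray; unwritten blocks use no space

    def write(self, offset, data):
        for i, byte in enumerate(data):
            idx, pos = divmod(offset + i, BLOCK_SIZE)
            block = self.blocks.setdefault(idx, bytearray(BLOCK_SIZE))
            block[pos] = byte

    def allocated_size(self):
        # Physical bytes actually consumed on the back end.
        return len(self.blocks) * BLOCK_SIZE

vol = ThinVolume(logical_size=10 * 2**30)  # advertise 10 GiB
vol.write(0, b"hello")                     # only one 4 KiB block is allocated
```

The point of the sketch: after writing five bytes, the "10 GiB" volume occupies a single 4 KiB block on the back end, which is exactly the over-subscription that thin provisioning policies manage.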
Cinder provides block storage interfaces that deliver more traditional storage resources, and it is most suitable for storing persistent copies of virtual machines in an OpenStack environment. In its simplest form, Cinder is the block-storage component, delivered using standard protocols such as iSCSI. To OpenStack hosts the storage appears as a block device, while the back-end connectivity can use iSCSI, Fibre Channel, NFS, or a range of proprietary protocols.
Ceph bills itself as the future of storage. Ceph (which you can configure behind Cinder — OpenStack requires a driver for Cinder to interact with Ceph block devices) is a unified, distributed storage system designed for performance, reliability, and scalability. Ceph also provides a traditional file system interface with POSIX semantics. From a basic architecture perspective, you can mount Ceph as a thinly provisioned block device. Ceph stores a client's data as objects within storage pools. When you write data to Ceph through a block device, Ceph automatically stripes and replicates the data across the cluster. Ceph's RADOS Block Device (RBD) also integrates with Kernel-based Virtual Machines (KVM), bringing Ceph's virtually unlimited storage to VMs running on your Ceph clients. Using the CRUSH algorithm, Ceph calculates which placement group should contain an object, and further calculates which Ceph OSD Daemon (the process that manages a physical or logical disk) should store that placement group; this is what enables the Ceph Storage Cluster to scale, rebalance, and recover dynamically.
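The write path above can be sketched in a few lines: the block device is striped into fixed-size objects, each object name hashes to a placement group (PG), and the PG maps to a set of OSDs. A caveat: real Ceph uses CRUSH, which is hierarchy-aware and far more sophisticated than the modular hashing below, and the object-naming scheme here is illustrative only.

```python
import hashlib

OBJECT_SIZE = 4 * 2**20   # RBD stripes images into 4 MiB objects by default
PG_COUNT = 128            # number of placement groups in this toy pool
OSDS = [f"osd.{i}" for i in range(6)]
REPLICAS = 3

def object_for_offset(image_id, offset):
    """Which object holds this byte of the block device?"""
    return f"rbd_data.{image_id}.{offset // OBJECT_SIZE:016x}"

def pg_for_object(name):
    """Hash the object name into a placement group."""
    digest = hashlib.md5(name.encode()).hexdigest()
    return int(digest, 16) % PG_COUNT

def osds_for_pg(pg):
    """Pick REPLICAS distinct OSDs for a PG (a stand-in for CRUSH)."""
    return [OSDS[(pg + r) % len(OSDS)] for r in range(REPLICAS)]

obj = object_for_offset("abc123", offset=9 * 2**20)  # byte 9 MiB -> 3rd object
pg = pg_for_object(obj)
replicas = osds_for_pg(pg)
```

The key property to notice is that placement is *computed*, not looked up: any client that knows the cluster map can find an object's OSDs without asking a central metadata server, which is why the cluster can scale and rebalance dynamically.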
The topology of a Ceph cluster is designed around replication and information distribution, which are intrinsic and provide data integrity. Think of RADOS as the building block of Ceph: applications can use it directly through the librados library, or interact with the REST gateway called RADOSGW. CephFS, in turn, is a distributed file system that gives clients a POSIX-compliant interface on top of the same cluster.
There are a few more core concepts and functions to understand when you talk or present about Ceph. A Ceph OSD Daemon (Ceph OSD) stores data; handles data replication, recovery, backfilling, and rebalancing; and provides some monitoring information to Ceph Monitors. A Ceph Monitor maintains maps of the cluster state, including the monitor map and the OSD map. So now you have taken care of data handling and monitoring. You still need metadata. The Ceph Metadata Server (MDS) stores metadata on behalf of the Ceph Filesystem. Ceph Metadata Servers make it feasible for POSIX file system users to execute basic commands like ls, find, etc. without placing an enormous burden on the Ceph Storage Cluster.
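To see what "rebalance and recover" means in practice, here is a toy model of the monitor/OSD interaction: a monitor-style map tracks which OSDs are up, and each PG lands on the first few live OSDs in its preference order, so marking an OSD down automatically remaps its PGs. Real Ceph does this with CRUSH and versioned OSD map epochs; everything below (the placement rule included) is my own simplification.

```python
REPLICAS = 2
osd_up = {f"osd.{i}": True for i in range(4)}  # the monitor's view of the cluster

def place(pg):
    """Preference order for a PG: rotate the OSD list by the PG id,
    then take the first REPLICAS OSDs that are currently up."""
    names = sorted(osd_up)
    order = names[pg % len(names):] + names[:pg % len(names)]
    live = [o for o in order if osd_up[o]]
    return live[:REPLICAS]

before = place(1)          # PG 1 lives on ['osd.1', 'osd.2']
osd_up["osd.1"] = False    # monitor marks osd.1 down
after = place(1)           # PG 1 remaps to ['osd.2', 'osd.3']
```

Once the map changes, the surviving OSDs backfill the remapped PGs among themselves; the monitor never touches the data path, it only publishes the map.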
This is an ongoing open source community project, and more advanced features arrive on a regular basis. Next, let's compare this with Swift and talk about Azure Blobs… stay tuned for my next post on storage, and share your feedback!