In my Storage series of blogs, I have described the challenges posed by the hyper-growth of unstructured data. Traditional storage systems don’t scale efficiently to support new workloads. Because they lack automation and software-defined storage capabilities, adjusting them to workload changes and migrations is time-consuming and resource-intensive. And of course, costs vary and climb once you take management, storage capacity, and other parameters into account.
I have written about software-defined storage and other advances in storage. In my previous post I covered OpenStack Cinder, so now it is Swift’s turn. OpenStack Swift is an open source object storage system. If you follow my blogs, you know that object storage keeps files in a flat organization of containers and uses unique IDs and metadata to retrieve them through an HTTP interface. Swift writes objects to multiple drives and ensures the data is replicated across a cluster of servers. The system is accessed through a RESTful HTTP API and scales horizontally to petabytes of data by adding nodes in a scale-out fashion.
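To make the REST access pattern concrete, here is a minimal sketch using only Python’s standard library. It builds, but does not send, a request against Swift’s `/v1/{account}/{container}/{object}` path. The endpoint, the account name `AUTH_test`, and the token are placeholders; a real deployment gets its storage URL and token from its auth service. User-defined metadata travels in `X-Object-Meta-*` headers.

```python
from urllib.parse import quote
from urllib.request import Request


def swift_object_request(endpoint, account, container, obj, token,
                         method="GET", data=None, metadata=None):
    """Build, but do not send, a request for /v1/{account}/{container}/{object}."""
    path = "/v1/{}/{}/{}".format(quote(account), quote(container), quote(obj))
    headers = {"X-Auth-Token": token}
    # User-defined metadata is carried as X-Object-Meta-* headers.
    for key, value in (metadata or {}).items():
        headers["X-Object-Meta-" + key] = value
    return Request(endpoint.rstrip("/") + path, data=data,
                   headers=headers, method=method)


# Placeholder endpoint, account, and token, for illustration only.
req = swift_object_request("http://127.0.0.1:8080", "AUTH_test", "photos",
                           "cat.jpg", token="PLACEHOLDER", method="PUT",
                           data=b"<image bytes>", metadata={"Color": "grey"})
print(req.get_method(), req.full_url)
# → PUT http://127.0.0.1:8080/v1/AUTH_test/photos/cat.jpg
# urllib.request.urlopen(req) would send it to a running proxy server.
```

The flat namespace shows up directly in the URL. There is no directory tree, only an account, a container, and an object name.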
Let’s start with Swift’s architecture at the highest level. A ring represents a mapping between the names of entities stored on disk and their physical locations. There are separate rings for accounts, containers, and objects, with one object ring per storage policy. When other Swift components need to perform an operation on an object, container, or account, they consult the appropriate ring to determine its location in the cluster. The ring maintains this mapping using zones, devices, partitions, and replicas. By default, each partition in the ring is replicated three times across the cluster, and the ring stores the locations of every partition. The ring also determines which devices are used for handoff in failure scenarios. Swift is designed to be self-healing: if a server or hard drive fails, content is replicated from active nodes to new locations in the cluster.
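The ring lookup can be sketched as follows. This is a toy, not Swift’s own ring module. It hashes the entity path with MD5, keeps the top `part_power` bits as the partition number (the same shifting idea Swift uses), and assigns each partition to three devices in distinct zones. The round-robin placement, the device list, and the `get_handoff` helper are simplified, hypothetical choices. The real ring builder also weighs devices and salts the hash with a cluster-specific prefix and suffix.

```python
import hashlib
import struct


class ToyRing:
    """Simplified Swift-style ring (illustrative only, not Swift's implementation)."""

    def __init__(self, devices, part_power=8, replicas=3):
        # devices: list of dicts such as {"id": 0, "zone": 1}
        if len({dev["zone"] for dev in devices}) < replicas:
            raise ValueError("need at least as many zones as replicas")
        self.devices = devices
        self.replicas = replicas
        self.part_shift = 32 - part_power
        # partition number -> list of device ids holding its replicas
        self.part2devs = [self._assign(part) for part in range(2 ** part_power)]

    def _assign(self, part):
        # Walk the device list from an offset, taking at most one device per zone.
        chosen, zones = [], set()
        for i in range(len(self.devices)):
            dev = self.devices[(part + i) % len(self.devices)]
            if dev["zone"] not in zones:
                chosen.append(dev["id"])
                zones.add(dev["zone"])
            if len(chosen) == self.replicas:
                break
        return chosen

    def get_part(self, account, container=None, obj=None):
        # Hash the entity path and keep its top part_power bits as the partition.
        path = "/" + "/".join(p for p in (account, container, obj) if p)
        digest = hashlib.md5(path.encode("utf-8")).digest()
        return struct.unpack_from(">I", digest)[0] >> self.part_shift

    def get_nodes(self, account, container=None, obj=None):
        part = self.get_part(account, container, obj)
        return part, self.part2devs[part]

    def get_handoff(self, part, failed):
        # Hypothetical helper: first healthy device that is not already a primary.
        primaries = set(self.part2devs[part])
        for dev in self.devices:
            if dev["id"] not in primaries and dev["id"] not in failed:
                return dev["id"]
        return None


# Eight devices spread over four zones; three replicas per partition.
devices = [{"id": i, "zone": i % 4 + 1} for i in range(8)]
ring = ToyRing(devices)
part, nodes = ring.get_nodes("AUTH_test", "photos", "cat.jpg")
spare = ring.get_handoff(part, failed={nodes[0]})
```

Because the partition depends only on the hashed path, any component holding the same ring computes the same locations without a central lookup service.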
Other core architectural concepts are the servers: the Proxy Server, Object Server, Container Server, and so on. The Container Server, as the name suggests, handles listings of the objects in each container. The listings are stored as SQLite database files and replicated across the cluster in the same way as objects. The Object Server is a very simple blob storage server that can store, retrieve, and delete objects on local devices. Objects are stored as binary files on the file system, with their metadata stored in the file’s extended attributes. The Proxy Server ties the rest of the Swift architecture together: for each request, it looks up the location of the account, container, or object in the ring and routes the request accordingly.
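The Container Server’s job of keeping listings in SQLite can be sketched with Python’s built-in `sqlite3` module. The single `object` table below is a simplified, hypothetical schema, not Swift’s actual container database layout. It illustrates two ideas. Deletions are recorded as flagged rows, so they can replicate like any other change. Listings come back in name order, filtered by `prefix`, `marker`, and `limit`, which are the query parameters Swift’s container listing API accepts. The default limit of 10000 is an assumption here.

```python
import sqlite3


def open_container_db():
    # In-memory stand-in for one container's SQLite file (hypothetical schema).
    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE object (
        name TEXT PRIMARY KEY,
        size INTEGER NOT NULL,
        etag TEXT NOT NULL,
        deleted INTEGER NOT NULL DEFAULT 0)""")
    return db


def put_object_row(db, name, size, etag):
    # Record (or overwrite) a listing entry after an object write.
    db.execute("INSERT OR REPLACE INTO object (name, size, etag, deleted) "
               "VALUES (?, ?, ?, 0)", (name, size, etag))


def delete_object_row(db, name):
    # Keep a flagged row instead of dropping it, so the deletion can replicate.
    db.execute("UPDATE object SET deleted = 1 WHERE name = ?", (name,))


def list_objects(db, prefix="", marker="", limit=10000):
    # Live names after `marker`, starting with `prefix`, in name order.
    rows = db.execute(
        "SELECT name, size FROM object "
        "WHERE deleted = 0 AND name > ? AND substr(name, 1, ?) = ? "
        "ORDER BY name LIMIT ?",
        (marker, len(prefix), prefix, limit))
    return rows.fetchall()


db = open_container_db()
put_object_row(db, "photos/cat.jpg", 1024, "etag-1")
put_object_row(db, "photos/dog.jpg", 2048, "etag-2")
put_object_row(db, "notes.txt", 10, "etag-3")
print(list_objects(db, prefix="photos/"))
```

Paging with `marker` lets a client walk a very large container in bounded chunks rather than pulling the whole listing at once.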
Cleversafe is one of the leading players in the Swift market and addresses the needs of modern storage. It turns web-scale storage challenges into business advantages. Instead of creating multiple copies of data and storing them in different systems, it disperses a single instance of the data (using algorithms that split data into multiple segments and distribute those segments across multiple nodes in the storage cluster), then stores and retrieves it as needed. Dispersed storage is one of my favorite approaches to high availability and reliability. The dispersal algorithm expands, transforms, and slices the data (rather than striping it) and disperses the slices across the nodes in the cluster or cloud. Each container has properties such as width (the number of slices) and threshold (the minimum number of slices needed to restore the data).
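Cleversafe’s exact algorithm isn’t described here, so the following is only a textbook stand-in: a k-of-n information dispersal sketch, in the style of Rabin’s IDA, over the prime field GF(257). `width` is the number of slices and `threshold` is the minimum number needed to rebuild the data. Each group of `threshold` bytes defines a polynomial, and every slice stores that polynomial’s value at a different point, so any `threshold` slices recover the data by Lagrange interpolation. The “transform” step from the text is not modelled. Because the first `threshold` slices hold the original bytes, the slices in this sketch are not confidential on their own.

```python
P = 257  # prime field just large enough to hold any byte value (0-255)


def _lagrange_eval(points, x):
    """Evaluate, at x, the polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # Division mod P via Fermat's little theorem: den^(P-2) is den's inverse.
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def disperse(data: bytes, width: int, threshold: int) -> dict[int, list[int]]:
    """Split data into `width` slices; any `threshold` of them can rebuild it."""
    if not 0 < threshold <= width < P:
        raise ValueError("need 0 < threshold <= width < 257")
    padded = data + b"\x00" * (-len(data) % threshold)
    slices = {x: [] for x in range(1, width + 1)}
    for start in range(0, len(padded), threshold):
        chunk = padded[start:start + threshold]
        # The chunk's bytes are the polynomial's values at x = 1..threshold.
        points = list(zip(range(1, threshold + 1), chunk))
        for x in slices:
            slices[x].append(_lagrange_eval(points, x))
    return slices


def restore(slices: dict[int, list[int]], threshold: int, length: int) -> bytes:
    """Rebuild the original bytes from any `threshold` surviving slices."""
    if len(slices) < threshold:
        raise ValueError("not enough slices to restore the data")
    chosen = list(slices.items())[:threshold]
    out = bytearray()
    for idx in range(len(chosen[0][1])):
        points = [(x, values[idx]) for x, values in chosen]
        out.extend(_lagrange_eval(points, x) for x in range(1, threshold + 1))
    return bytes(out[:length])


msg = b"dispersed storage"
slices = disperse(msg, width=6, threshold=4)
survivors = {x: slices[x] for x in (2, 4, 5, 6)}  # slices 1 and 3 are lost
assert restore(survivors, threshold=4, length=len(msg)) == msg
```

With width 6 and threshold 4, this sketch stores 6/4 = 1.5 times the original data and tolerates the loss of any two slices. Triple replication also tolerates the loss of two copies but stores 3 times the data.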
Do you think Cleversafe, and Swift in particular, will make a difference? Is dispersal a killer piece of mathematical magic?