How object storage works: theory and practice

This article covers the architecture of object storage and its advantages and disadvantages compared to other types of cloud storage. It also describes how distributed, fault-tolerant storage is built and what problems arise along the way.

Introduction

Object storage is everywhere right now. Before I came to Selectel, all I knew about it was that it lives in the cloud, comes with complicated pricing, and that Amazon was once again ahead of everyone. But if you think about it, you can say that about almost any cloud service. And that still doesn't tell you anything about its features.

Maybe the specifics of such storage lie in the tasks it solves? Today, object storage handles a wide variety of tasks, from serving static content to storing backups and backing analytical databases.

Trying to understand the reasons behind its unusual limitations only raises new questions: why can only an empty container be deleted? Why can't you quickly move a large amount of data from one container to another? And anyway, why is it called "object" storage, and what magic is hidden under the hood?

This is Roma from Selectel's object storage team. I have dug into our 10 years of experience developing and supporting such a product. Below is the first part of the story, where I share what I found on the theoretical side of the subject.

What are objects and why do they need storage? Theory

In essence, object storage is an HTTP API that lets you upload, retrieve, and delete data by name. It is effectively a key-value (KV) store for large chunks of data (BLOBs). This comparison is usually avoided to prevent confusion with databases, but we are professionals and can tell the two apart.
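To make the key-value analogy concrete, here is a minimal in-memory sketch. The `ObjectStore` class and its method names are hypothetical and exist only for illustration; a real object storage exposes the same three operations over HTTP and keeps the data on distributed servers rather than in a dictionary.

```python
class ObjectStore:
    """Toy model of an object store: a mapping from object name to bytes."""

    def __init__(self):
        self._blobs = {}  # object name -> payload

    def put(self, name: str, data: bytes) -> None:
        # Uploading under an existing name replaces the whole object.
        self._blobs[name] = bytes(data)

    def get(self, name: str) -> bytes:
        # A missing name raises KeyError, the analogue of an HTTP 404.
        return self._blobs[name]

    def delete(self, name: str) -> None:
        # In this sketch, deleting a missing object is a no-op.
        self._blobs.pop(name, None)

    def exists(self, name: str) -> bool:
        return name in self._blobs


store = ObjectStore()
store.put("backups/db.dump", b"v1")
store.put("backups/db.dump", b"v2")      # same name: the object is replaced
latest = store.get("backups/db.dump")    # b"v2"
store.delete("backups/db.dump")
still_there = store.exists("backups/db.dump")  # False
```

Note that there is no "update part of the object" operation in this model: the unit of work is always the whole BLOB.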

A few properties deserve special attention:

A flat namespace with no nesting. Objects can be grouped into containers, but only one level deep.
Guaranteed availability and data integrity.
Storage capacity is virtually unlimited. You can store a few gigabytes or several petabytes if you wish.
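The flat namespace can be sketched as follows. This is an illustrative model, not any provider's implementation: the `containers` dictionary and the `put` and `list_objects` helpers are hypothetical. It shows that a name such as `img/icons/a.svg` is just a string, and that "folders" are only a listing convention built from a prefix and a delimiter.

```python
# container name -> {object name -> payload}; exactly one level of grouping
containers = {}


def put(container: str, name: str, data: bytes) -> None:
    containers.setdefault(container, {})[name] = data


def list_objects(container: str, prefix: str = "", delimiter: str = "") -> list:
    """List names starting with prefix.

    With a delimiter, anything deeper than one level below the prefix is
    collapsed into a single pseudo-folder entry ending in the delimiter.
    """
    found = set()
    for name in containers.get(container, {}):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter and delimiter in rest:
            found.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            found.add(name)
    return sorted(found)


put("site", "index.html", b"<html></html>")
put("site", "img/logo.png", b"...")
put("site", "img/icons/a.svg", b"...")

everything = list_objects("site")
top_level = list_objects("site", delimiter="/")
inside_img = list_objects("site", prefix="img/", delimiter="/")
```

Here `everything` contains all three full names, `top_level` is `["img/", "index.html"]`, and `inside_img` is `["img/icons/", "img/logo.png"]`. No directory structure is stored anywhere; the hierarchy is computed at listing time.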

These properties shape the architecture and what can be built on top of it. For example, there is no need to store complex structures, which means that access algorithms scale well with the amount of data. A billion objects in a flat namespace and a billion files spread across the directories of a file system are very different things.

Data integrity and availability are guaranteed through scalability and the distributed nature of the system. This is the foundation of the architecture. The delivery protocol also deserves a mention here. Integrating with an HTTP API poses no particular difficulty. Besides the large number of existing libraries and tools, you can always implement it yourself by accessing the storage "manually" via curl or, for enthusiasts, via telnet.

HTTP is implemented practically everywhere, even on the simplest devices. Debugging it is simple and familiar, and there is almost no room for a complex error to occur on the client.
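To show how little is needed on the client side, the sketch below starts a toy HTTP server on localhost and talks to it with hand-written request text over a plain socket, roughly what one would type into telnet. Everything here is an assumption made for illustration: the `Handler` class, the `/demo/hello.txt` path, and the in-memory `BLOBS` dictionary stand in for a real storage endpoint, which would also require authentication.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BLOBS = {}  # request path -> payload; stands in for the real storage backend


class Handler(BaseHTTPRequestHandler):
    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        BLOBS[self.path] = self.rfile.read(length)
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        body = BLOBS.get(self.path)
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def raw_request(port: int, text: bytes) -> bytes:
    """Send hand-written HTTP text and return the raw reply."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(text)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the server closes the connection after replying
                break
            chunks.append(chunk)
    return b"".join(chunks)


server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick any free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

put_reply = raw_request(
    port, b"PUT /demo/hello.txt HTTP/1.0\r\nContent-Length: 5\r\n\r\nhello"
)
get_reply = raw_request(port, b"GET /demo/hello.txt HTTP/1.0\r\n\r\n")

server.shutdown()
server.server_close()
```

The whole exchange is readable text: a request line, a few headers, an empty line, and the body. That is what makes debugging with generic tools so straightforward.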

Besides the basic methods for working with objects, there is room for additional functionality, such as bulk deletion or access restrictions. But such features can rely only on the underlying architecture, which is the main difficulty in implementing them.

There are other types of storage in the cloud. Let's look at their main features:

Type	Block	File	Object
Advantages	Direct low-level data access. It's like having a 'bare' hard drive, but accessible over the network. This offers total operational freedom.	The typical file system structure is hierarchical, featuring granular access permissions. Depending on the technology used, it may support concurrent access.	It offers virtually limitless storage with a flat data structure and consistent response times. It's not bound to any specific machine and is accessible via API. It also has the capability to grant public access.
Disadvantages	It will be connected to a specific machine. It has a limited size and might need support and maintenance.	It has a limited size and might need maintenance and administration.	It doesn't ensure atomic operations, and race conditions can occur.
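The race condition mentioned in the table can be illustrated with a deterministic sketch. It assumes a store that offers only whole-object reads and writes with no locking or compare-and-swap; the `read` and `write` helpers and the `counter.txt` object are hypothetical. Two clients each try to increment a counter stored in an object, and one increment is lost.

```python
store = {"counter.txt": b"0"}  # stands in for a container with one object


def read(name: str) -> bytes:
    return store[name]


def write(name: str, data: bytes) -> None:
    store[name] = data  # whole-object overwrite, last writer wins


# Both clients read before either of them writes.
a_seen = int(read("counter.txt").decode())  # client A sees 0
b_seen = int(read("counter.txt").decode())  # client B also sees 0

write("counter.txt", str(a_seen + 1).encode())  # A writes 1
write("counter.txt", str(b_seen + 1).encode())  # B overwrites it with 1

final = int(read("counter.txt").decode())  # 1, although two increments ran
```

With block or file storage attached to a single machine, the application could serialize such updates with a local lock. With an object API, coordination of this kind has to live in the client or in a separate service.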

We have intentionally left out local storage and FUSE drivers, such as s3fs, that mount object storage as a file system. This topic is extensive enough to warrant its own separate article.

An object is, in effect, a payload stored together with the information that describes it. Objects cannot be directly compared to files, let alone to blocks, which is why they are organized into a fundamentally separate kind of system.
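The idea of a payload stored together with information that describes it can be sketched as a small data structure. The `StoredObject` class and its field names are hypothetical, chosen for illustration only; real storage APIs usually carry such information in HTTP headers, and the exact set of fields differs between providers.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StoredObject:
    name: str                                     # key within the container
    payload: bytes                                # the data itself, opaque to the storage
    content_type: str = "application/octet-stream"
    metadata: dict = field(default_factory=dict)  # user-defined key/value pairs

    @property
    def size(self) -> int:
        return len(self.payload)

    @property
    def checksum(self) -> str:
        # A digest of the payload lets a client verify what it downloaded.
        return hashlib.sha256(self.payload).hexdigest()


obj = StoredObject(
    name="reports/2024.csv",
    payload=b"id,total\n1,42\n",
    content_type="text/csv",
    metadata={"owner": "analytics"},
)
size = obj.size
digest = obj.checksum
```

Unlike a file, the object carries no position in a directory tree, and unlike a block, it is addressed by name rather than by offset on a device.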

A Brief History as Conclusion

Object storage gained widespread popularity thanks to Amazon's cloud. In 2006, the company introduced the Simple Storage Service (S3). Over time, the technology became so popular that its name almost became synonymous with object storage. Mention S3, and you're talking about object storage. Talk about object storage... well, you get the idea.

In 2010, OpenStack appeared, a joint development by NASA and the cloud provider Rackspace. The former contributed Nova, the component for virtual machines, and the latter contributed the object storage, Swift. Its open-source nature and relative simplicity (at that time) spurred a wave of new cloud providers.
