How Are Objects Stored in OpenStack Swift?

I recently received a question about how objects are stored in OpenStack Swift, and wanted to quote here from Joe Arnold, who has an excellent book on the subject:


Swift stores objects as binary files on a drive, using a path that contains the object’s partition and the timestamp of the operation.

Swift actually stores multiple copies of each object on drives across the cluster.  This is done by placing the object in a logical grouping called a partition.
A partition maps to multiple drives, each of which will have the object written to it.
How do processes on the node locate one piece of data in the cluster?
It goes to the local copy of the rings (a set of lookup tables distributed to every node in the cluster) and looks up all the locations for that data on the account ring, the container ring, or one of the object rings. Since multiple copies are stored, multiple locations are returned. The process then queries those locations for the data.
Swift uses hashing when calculating rings.
Hashing essentially takes a long string of data and generates a shorter, fixed-length reference for it. A given input should always produce the same hash.
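Swift’s ring is built on an MD5-based hash. A minimal sketch of the determinism property (the object path used here is hypothetical, and this omits the per-cluster hash prefix/suffix that a real Swift deployment mixes in):

```python
import hashlib

def ring_hash(path: str) -> int:
    # The same input always yields the same digest, so every node in the
    # cluster can compute identical placements independently, with no
    # central lookup service.
    digest = hashlib.md5(path.encode("utf-8")).hexdigest()
    return int(digest, 16)

h1 = ring_hash("/account/container/object")
h2 = ring_hash("/account/container/object")
assert h1 == h2  # deterministic: both lookups agree on placement
```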
Consistent hashing helps minimize the number of objects that move when capacity is added to or removed from a cluster. This is done by mapping all the possible hash values around a circle (ring). Each drive is then assigned to a point on the circle (ring) based on a hash value for the drive.
When an object needs to be stored, the object’s hash is calculated and located on the circle (ring). The system then searches clockwise around the circle for the nearest drive marker. That is the drive where the object is placed.
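The clockwise search described above can be sketched as a toy consistent-hash ring. This is an illustration of the generic technique, not Swift’s actual ring implementation (Swift uses the modified, fixed-partition scheme described below); the drive names are made up:

```python
import bisect
import hashlib

def h(key: str) -> int:
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class ConsistentRing:
    """Toy unmodified consistent-hash ring: one point per drive."""

    def __init__(self, drives):
        # Each drive is placed on the circle at its hash value.
        self.points = sorted((h(d), d) for d in drives)
        self.keys = [p for p, _ in self.points]

    def drive_for(self, obj_name: str) -> str:
        # Search clockwise: the first drive marker at or after the
        # object's hash owns the object.
        i = bisect.bisect_left(self.keys, h(obj_name))
        if i == len(self.keys):
            i = 0  # wrap around the circle
        return self.points[i][1]

ring = ConsistentRing(["drive-1", "drive-2", "drive-3", "drive-4"])
owner = ring.drive_for("photos/cat.jpg")
```

Because placement depends only on hash positions, adding or removing a drive shifts ownership of just the adjacent arc of the circle, which is the property the next paragraph discusses.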

With an unmodified consistent hashing ring, there are numerous small ranges that become smaller or larger when drives are added or removed. This churn can result in objects not being available as they are moved during capacity changes.

To prevent this, Swift approaches the hashing ring differently. Although the ring is still divided into numerous small hash ranges, these ranges are all the same size and will not change in number. These fixed-width partitions are then assigned to drives using a placement algorithm.

How are partitions calculated?
The total number of partitions that can exist in your cluster is determined when the Swift cluster is created, using the partition power: an integer chosen at cluster creation.

The formula to calculate the total number of partitions in a cluster is:

total partitions in cluster = 2^(partition power)

For example, if a partition power of 15 is chosen, the number of partitions your cluster will have is 2^15 = 32,768. Those 32,768 partitions are then mapped to the available drives. Although the number of drives in a cluster may change, the number of partitions will not.
To choose the partition power, estimate the maximum number of drives you expect your cluster to have at its largest size. In general, it is considered good to have about 100 partitions per drive in large clusters.
If you can make that prediction, you can calculate the partition power as follows:

   partition power = log2(100 × maximum number of drives)

Let’s look at an example for a large deployment of 1,800 drives that will grow to a total of 18,000 drives. If we set the maximum number of disks to 18,000, we can calculate a partition power that would allow for 100 partitions per disk:

partition power = log2(100 × 18,000) ≈ 20.78

So we judge that a good partition power for this example is 21.
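The arithmetic above can be checked in a couple of lines (rounding up to the next integer, since the partition count must be a power of two):

```python
import math

max_drives = 18_000
target_partitions_per_drive = 100

# log2(100 * 18,000) ≈ 20.78, so round up to the next whole power.
partition_power = math.ceil(math.log2(target_partitions_per_drive * max_drives))
total_partitions = 2 ** partition_power

print(partition_power)   # 21
print(total_partitions)  # 2,097,152 partitions across the cluster
```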

Even if you guess wrong, there’s still recourse – it’s always possible to stand up another object storage cluster and have the authentication system route new accounts to that new storage cluster.
