org.elasticsearch.repositories.blobstore (server 7.17.13 API)

package org.elasticsearch.repositories.blobstore

This package exposes the blobstore repository used by Elasticsearch Snapshots.

Preliminaries

The BlobStoreRepository forms the basis of implementations of Repository on top of a blob store. A blob store can serve as the basis for an implementation as long as it provides GET, PUT, DELETE, and LIST operations. For a read-only repository, GET operations alone suffice. These operations are formally defined by the BlobContainer interface, which every BlobStoreRepository implementation must provide via its implementation of BlobStoreRepository.getBlobContainer().
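To illustrate that contract only, the hypothetical SimpleBlobStore class below keeps blobs in memory and exposes the four operations. It is not the real BlobContainer API, whose methods and signatures differ; a read-only repository would only ever call get. Note that put here silently replaces an existing blob, so it offers no "write but not overwrite" guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.TreeMap;

/** Hypothetical in-memory blob store exposing only GET, PUT, DELETE and LIST. */
class SimpleBlobStore {
    // Sorted by name so that LIST returns blob names in a stable order.
    private final TreeMap<String, byte[]> blobs = new TreeMap<>();

    /** GET: returns a copy of the blob's contents, or fails if it does not exist. */
    byte[] get(String name) {
        byte[] data = blobs.get(name);
        if (data == null) {
            throw new NoSuchElementException("blob [" + name + "] not found");
        }
        return data.clone();
    }

    /** PUT: stores the blob, replacing any existing blob of the same name. */
    void put(String name, byte[] data) {
        blobs.put(name, data.clone());
    }

    /** DELETE: removes the blob if it exists. */
    void delete(String name) {
        blobs.remove(name);
    }

    /** LIST: returns the names of all blobs that start with the given prefix. */
    List<String> list(String prefix) {
        List<String> names = new ArrayList<>();
        for (String name : blobs.keySet()) {
            if (name.startsWith(prefix)) {
                names.add(name);
            }
        }
        return names;
    }
}
```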

The blob store is written to and read from by master-eligible nodes and data nodes. All metadata related to a snapshot's scope and health is written by the master node.

Data nodes, on the other hand, write the data for each individual shard, but they do not write any blobs outside of the directories of the shards whose primaries they hold. For each shard, the data node holding the shard's primary writes the actual data, in the form of the shard's segment files, to the repository, as well as metadata about all the segment files that the repository stores for the shard.

For the specifics of how the repository operations documented below are invoked during the snapshot process, please refer to the documentation of the org.elasticsearch.snapshots package.

BlobStoreRepository maintains the following structure of blobs containing data and metadata in the blob store. The exact operations executed on these blobs are explained below.

 
   STORE_ROOT
   |- index-N           - JSON serialized org.elasticsearch.repositories.RepositoryData containing a list of all snapshot ids
   |                      and the indices belonging to each snapshot, N is the generation of the file
   |- index.latest      - contains the numeric value of the latest generation of the index file (i.e. N from above)
   |- incompatible-snapshots - list of all snapshot ids that are no longer compatible with the current version of the cluster
   |- snap-20131010.dat - SMILE serialized org.elasticsearch.snapshots.SnapshotInfo for snapshot "20131010"
   |- meta-20131010.dat - SMILE serialized org.elasticsearch.cluster.metadata.Metadata for snapshot "20131010"
   |                      (includes only global metadata)
   |- snap-20131011.dat - SMILE serialized org.elasticsearch.snapshots.SnapshotInfo for snapshot "20131011"
   |- meta-20131011.dat - SMILE serialized org.elasticsearch.cluster.metadata.Metadata for snapshot "20131011"
   .....
   |- indices/ - data for all indices
      |- Ac1342-B_x/ - data for index "foo" which was assigned the unique id Ac1342-B_x in the repository
      |  |             (not to be confused with the actual index uuid)
      |  |- meta-20131010.dat - JSON serialized org.elasticsearch.cluster.metadata.IndexMetadata for index "foo"
      |  |- 0/ - data for shard "0" of index "foo"
      |  |  |- __1                      \  (files with numeric names were created by older ES versions)
      |  |  |- __2                      |
      |  |  |- __VPO5oDMVT5y4Akv8T_AO_A |- files from different segments; see snap-* for their mappings to real segment files
      |  |  |- __1gbJy18wS_2kv1qI7FgKuQ |
      |  |  |- __R8JvZAHlSMyMXyZc2SS8Zg /
      |  |  .....
      |  |  |- snap-20131010.dat - SMILE serialized org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardSnapshot for
      |  |  |                      snapshot "20131010"
      |  |  |- snap-20131011.dat - SMILE serialized org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardSnapshot for
      |  |  |                      snapshot "20131011"
      |  |  |- index-123         - SMILE serialized org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardSnapshots for
      |  |  |                      the shard (files with numeric suffixes were created by older versions, newer ES versions use a uuid
      |  |  |                      suffix instead)
      |  |
      |  |- 1/ - data for shard "1" of index "foo"
      |  |  |- __1
      |  |  |- index-Zc2SS8ZgR8JvZAHlSMyMXy - SMILE serialized BlobStoreIndexShardSnapshots for the shard
      |  |  .....
      |  |
      |  |- 2/
      |  ......
      |
      |- 1xB0D8_B3y/ - data for index "bar" which was assigned the unique id of 1xB0D8_B3y in the repository
      ......
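The layout above can be captured in a handful of path helpers. The following is a minimal sketch, assuming keys are relative to STORE_ROOT with no leading slash; the class RepoPaths and all of its method names are hypothetical and not part of BlobStoreRepository. The snapshot argument is treated as an opaque string: the tree uses names such as "20131010", while the sections below use ${snapshot-uuid}.

```java
/** Hypothetical helpers that build blob keys matching the layout above (relative to STORE_ROOT). */
final class RepoPaths {
    private RepoPaths() {}

    /** Prefix used for segment data blobs inside shard directories. */
    static final String DATA_BLOB_PREFIX = "__";

    /** Root RepositoryData blob: index-N. */
    static String rootIndexBlob(long generation) {
        return "index-" + generation;
    }

    /** Root SnapshotInfo blob: snap-<snapshot>.dat. */
    static String snapshotInfoBlob(String snapshot) {
        return "snap-" + snapshot + ".dat";
    }

    /** Root global Metadata blob: meta-<snapshot>.dat. */
    static String globalMetadataBlob(String snapshot) {
        return "meta-" + snapshot + ".dat";
    }

    /** Per-index IndexMetadata blob: indices/<indexId>/meta-<snapshot>.dat. */
    static String indexMetadataBlob(String indexId, String snapshot) {
        return "indices/" + indexId + "/meta-" + snapshot + ".dat";
    }

    /** Shard directory: indices/<indexId>/<shard>/. */
    static String shardPath(String indexId, int shard) {
        return "indices/" + indexId + "/" + shard + "/";
    }

    /** Per-shard BlobStoreIndexShardSnapshot blob. */
    static String shardSnapshotBlob(String indexId, int shard, String snapshot) {
        return shardPath(indexId, shard) + "snap-" + snapshot + ".dat";
    }

    /** Per-shard BlobStoreIndexShardSnapshots blob for a given shard generation. */
    static String shardGenerationBlob(String indexId, int shard, String generation) {
        return shardPath(indexId, shard) + "index-" + generation;
    }

    /** Segment data blobs are recognised purely by their name prefix. */
    static boolean isDataBlob(String blobName) {
        return blobName.startsWith(DATA_BLOB_PREFIX);
    }
}
```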

Getting the Repository's RepositoryData

Loading the RepositoryData, which holds the list of all snapshots as well as the mapping of index names to their repository IndexId, is done by invoking BlobStoreRepository.getRepositoryData(org.elasticsearch.action.ActionListener<org.elasticsearch.repositories.RepositoryData>) and is implemented as follows:

1. The blobstore repository stores the RepositoryData in blobs named with an incrementing suffix N at /index-N directly under the repository's root.
2. For each BlobStoreRepository, an entry of type RepositoryMetadata exists in the cluster state. It tracks the current valid generation N as well as the latest generation for which a write was attempted.
3. The blobstore also stores the most recent N as a 64-bit long in the blob /index.latest directly under the repository's root.
4. Determine the latest generation N:
  1. First, find the most recent RepositoryData by listing all blobs with the prefix "index-" under the repository root and selecting the one with the highest value for N.
  2. If this operation fails because the repository's BlobContainer does not support list operations (as can be the case for read-only repositories), read the highest value of N from the index.latest blob instead.
5. Load the RepositoryData:
  1. Use the value of N just determined to get the /index-N blob and deserialize the RepositoryData from it.
  2. If no value of N could be found because neither an index.latest blob nor any index-N blob exists in the repository, the repository is assumed to be empty and RepositoryData.EMPTY is returned.
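The generation lookup can be sketched as follows. This is a minimal, hypothetical sketch (the class LatestGeneration and its EMPTY sentinel are not Elasticsearch API): listing is modelled as a Supplier that throws UnsupportedOperationException when LIST is unavailable, a missing index.latest blob is modelled as null, and index.latest is assumed to hold N as an 8-byte big-endian long.

```java
import java.nio.ByteBuffer;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch of resolving the latest root index-N generation. */
class LatestGeneration {
    /** Hypothetical sentinel: no generation found, so the repository is treated as empty. */
    static final long EMPTY = -1L;

    static long resolve(Supplier<List<String>> listRootBlobs, Supplier<byte[]> readIndexLatest) {
        List<String> names;
        try {
            names = listRootBlobs.get();
        } catch (UnsupportedOperationException e) {
            // LIST is unsupported (e.g. some read-only repositories): fall back to index.latest.
            byte[] latest = readIndexLatest.get();
            if (latest == null) {
                return EMPTY; // no index.latest blob either
            }
            // Assumption: index.latest holds N as an 8-byte big-endian long.
            return ByteBuffer.wrap(latest).getLong();
        }
        long best = EMPTY;
        for (String name : names) {
            if (!name.startsWith("index-")) {
                continue; // skips index.latest, snap-*, meta-*, indices/ ...
            }
            try {
                best = Math.max(best, Long.parseLong(name.substring("index-".length())));
            } catch (NumberFormatException ignored) {
                // Not a numeric root generation blob; skip it.
            }
        }
        return best;
    }
}
```

A caller would treat EMPTY as "return RepositoryData.EMPTY" and otherwise GET index-N for the returned N and deserialize it.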

Writing Updated RepositoryData to the Repository

Writing an updated RepositoryData to a blob store repository is an operation that uses the cluster state to ensure that a specific index-N blob is never accidentally overwritten in a master failover scenario. The steps for writing a new index-N blob, and thus making changes from a snapshot create or delete operation visible to read operations on the repository, are as follows; all of them run on the master node:

1. Write an updated RepositoryMetadata entry for the repository that has the same RepositoryMetadata.generation() as the existing entry and a RepositoryMetadata.pendingGeneration() one greater than the pendingGeneration of the existing entry.
2. On the same master node, after the cluster state has been updated in the first step, write the new index-N blob and also update the contents of the index.latest blob. Note that updating the index.latest blob is done on a best-effort basis, and that a stuck master node may overwrite the contents of the index.latest blob after another master node has written a newer index-N. This is acceptable because the contents of index.latest are not used during normal operation of the repository and only need to be correct for the purpose of mounting the contents of a BlobStoreRepository as a read-only URL repository.
3. After the write has finished, set the repository's generation to the value used for its pendingGeneration, so that the new entry for the state of the repository has generation and pendingGeneration set to the same value. This signals a clean repository state with no potentially failed writes newer than the last valid index-N blob in the repository.

If either of the last two steps above fails, or the master fails over to a new node at any point, a subsequent operation trying to write a new index-N blob will never reuse the value of N from a previous attempt. It always starts over at the first of the three steps above and increments pendingGeneration before attempting a write, which ensures that no index-N blob is ever overwritten. Tracking the latest repository generation N in the cluster state and never overwriting index-N blobs allows the blob store repository to function correctly even on blob stores that offer neither a consistent list operation nor an atomic "write but not overwrite" operation.
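The effect of this handshake can be modelled in a few lines. In the minimal sketch below, the hypothetical class SafeGenerationWriter keeps two fields standing in for RepositoryMetadata.generation() and pendingGeneration() in the cluster state, and a set standing in for the index-N blobs already in the blob store; the index.latest update and real cluster-state publication are left out. A simulated failure after the blob write leaves pendingGeneration ahead of generation, and the next attempt skips that N instead of overwriting it.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the generation/pendingGeneration handshake. */
class SafeGenerationWriter {
    // Stand-ins for RepositoryMetadata.generation() and pendingGeneration() in the cluster state.
    long generation;
    long pendingGeneration;
    // Stand-in for the index-N blobs that exist in the blob store.
    final Set<String> writtenBlobs = new HashSet<>();

    SafeGenerationWriter(long generation) {
        this.generation = generation;
        this.pendingGeneration = generation;
    }

    /** Writes a new index-N blob; failAfterBlobWrite simulates a failure before step 3. */
    long write(boolean failAfterBlobWrite) {
        // Step 1: bump pendingGeneration in the cluster state before touching the blob store.
        long n = pendingGeneration + 1;
        pendingGeneration = n;
        // Step 2: write index-N; N has never been used by any earlier attempt.
        String blob = "index-" + n;
        if (!writtenBlobs.add(blob)) {
            throw new IllegalStateException("would overwrite " + blob);
        }
        if (failAfterBlobWrite) {
            throw new IllegalStateException("simulated failure or master failover");
        }
        // Step 3: mark the write as complete so generation == pendingGeneration again.
        generation = n;
        return n;
    }
}
```

For example, starting from generation 0, a successful write produces index-1, a failed attempt produces index-2 but leaves generation at 1, and the next successful write produces index-3.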

Creating a Snapshot

Creating a snapshot in the repository happens in the three steps described in detail below.

Initializing a Snapshot in the Repository (Mixed Version Clusters only)

In mixed-version clusters that contain a node older than SnapshotsService.NO_REPO_INITIALIZE_VERSION, creating a snapshot in the repository starts with a call to Repository.initializeSnapshot(org.elasticsearch.snapshots.SnapshotId, java.util.List<org.elasticsearch.repositories.IndexId>, org.elasticsearch.cluster.metadata.Metadata), which the blob store repository implements via the following actions:

1. Verify that no snapshot by the requested name exists.
2. Write a blob containing the cluster metadata to the root of the blob store repository at /meta-${snapshot-uuid}.dat
3. Write the metadata for each index to a blob in that index's directory at /indices/${index-snapshot-uuid}/meta-${snapshot-uuid}.dat

TODO: Remove this section once BwC logic it references is removed

Writing Shard Data (Segments)

Once all the metadata has been written by the snapshot initialization, the snapshot process moves on to writing the actual shard data to the repository by invoking Repository.snapshotShard(org.elasticsearch.repositories.SnapshotShardContext) on the data nodes that hold the primaries of the shards in the current snapshot. It is implemented as follows:

Note:

For each shard i in a given index, its path in the blob store is located at /indices/${index-snapshot-uuid}/${i}
All the following steps are executed exclusively on the data node that holds the shard's primary.

1. Create the IndexCommit for the shard to snapshot.
2. Read the BlobStoreIndexShardSnapshots blob named index-${uuid}, where uuid is the shard generation returned by ShardGenerations.getShardGen(org.elasticsearch.repositories.IndexId, int), to determine which segment files are already available in the blob store.
3. By comparing the files in the IndexCommit with the list of available files from the previous step, determine the segment files that need to be written to the blob store. For each segment file that needs to be added, generate a unique name by combining the segment data blob prefix __ with a UUID, and write the file to the blob store under that name.
4. After all segment writes have completed, a BlobStoreIndexShardSnapshot blob named snap-${snapshot-uuid}.dat is written to the shard's path. It contains a list of all the files referenced by the snapshot as well as some metadata about the snapshot. See the documentation of BlobStoreIndexShardSnapshot for details on its contents.
5. Once all the segments and the BlobStoreIndexShardSnapshot blob have been written, an updated BlobStoreIndexShardSnapshots blob is written to the shard's path with the name index-${newUUID}.
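The comparison in the third step can be sketched as a small planning function. This is a minimal sketch under simplifying assumptions: the class ShardSnapshotPlanner and its nested types are hypothetical, a commit file is considered already stored only when both its name and a checksum string match (the real comparison may consider more attributes), and new blob names use java.util.UUID for simplicity, whereas the names in the layout above look like URL-safe base64 strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch: decide which files of an IndexCommit must be uploaded. */
class ShardSnapshotPlanner {
    /** Prefix that marks segment data blobs in a shard directory. */
    static final String DATA_BLOB_PREFIX = "__";

    /** A file already listed in the shard's current BlobStoreIndexShardSnapshots. */
    static final class StoredFile {
        final String physicalName;
        final String checksum;
        final String blobName;

        StoredFile(String physicalName, String checksum, String blobName) {
            this.physicalName = physicalName;
            this.checksum = checksum;
            this.blobName = blobName;
        }
    }

    /** Result: blob name for every commit file, plus the files that need a PUT. */
    static final class Plan {
        final Map<String, String> blobNames = new LinkedHashMap<>();
        final List<String> filesToUpload = new ArrayList<>();
    }

    /** commitFiles maps each physical file name in the IndexCommit to its checksum. */
    static Plan plan(Map<String, String> commitFiles, List<StoredFile> existing) {
        Plan plan = new Plan();
        for (Map.Entry<String, String> file : commitFiles.entrySet()) {
            String reused = null;
            for (StoredFile stored : existing) {
                // Reuse a blob only if it holds the same file (same name and checksum).
                if (stored.physicalName.equals(file.getKey()) && stored.checksum.equals(file.getValue())) {
                    reused = stored.blobName;
                    break;
                }
            }
            if (reused != null) {
                plan.blobNames.put(file.getKey(), reused);
            } else {
                // New content: upload under a fresh, unique data blob name.
                plan.blobNames.put(file.getKey(), DATA_BLOB_PREFIX + UUID.randomUUID());
                plan.filesToUpload.add(file.getKey());
            }
        }
        return plan;
    }
}
```

The resulting name mapping is the kind of information the snap-${snapshot-uuid}.dat blob records in step 4, which is how later readers map __* blobs back to real segment files.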

Finalizing the Snapshot

After all primaries have finished writing the necessary segment files to the blob store in the previous step, the master node moves on to finalizing the snapshot by invoking Repository.finalizeSnapshot(org.elasticsearch.repositories.FinalizeSnapshotContext). This method executes the following actions in order:

1. Write a blob containing the cluster metadata to the root of the blob store repository at /meta-${snapshot-uuid}.dat
2. Write the metadata for each index to a blob in that index's directory at /indices/${index-snapshot-uuid}/meta-${snapshot-uuid}.dat
3. Write the SnapshotInfo blob for the given snapshot to the key /snap-${snapshot-uuid}.dat directly under the repository root.
4. Write an updated RepositoryData blob containing the new snapshot.
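Because read operations only discover snapshots through the latest index-N blob, writing the updated RepositoryData last means that a new snapshot becomes visible only after all of its metadata blobs exist. The minimal sketch below just lists the keys in that order; the class FinalizePlan is hypothetical, and it assumes the new N is one above the repository's current pendingGeneration, as in the write protocol described earlier.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the order in which snapshot finalization writes its blobs. */
class FinalizePlan {
    /**
     * @param snapshotUuid      the snapshot's uuid
     * @param indexIds          repository index ids of the snapshotted indices
     * @param pendingGeneration current pendingGeneration of the repository's cluster-state entry
     */
    static List<String> orderedWrites(String snapshotUuid, List<String> indexIds, long pendingGeneration) {
        List<String> keys = new ArrayList<>();
        // 1. Global cluster Metadata at the repository root.
        keys.add("meta-" + snapshotUuid + ".dat");
        // 2. One IndexMetadata blob per snapshotted index.
        for (String indexId : indexIds) {
            keys.add("indices/" + indexId + "/meta-" + snapshotUuid + ".dat");
        }
        // 3. SnapshotInfo at the repository root.
        keys.add("snap-" + snapshotUuid + ".dat");
        // 4. Updated RepositoryData last; N is one above the previous pendingGeneration.
        keys.add("index-" + (pendingGeneration + 1));
        return keys;
    }
}
```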

Deleting a Snapshot

Deleting a snapshot is an operation that is executed exclusively on the master node. When BlobStoreRepository.deleteSnapshots(java.util.Collection<org.elasticsearch.snapshots.SnapshotId>, long, org.elasticsearch.Version, org.elasticsearch.action.ActionListener<org.elasticsearch.repositories.RepositoryData>) is invoked, it runs through the following sequence of actions:

1. Get the current RepositoryData from the latest index-N blob at the repository root.
2. For each index referenced by the snapshot:
  1. Delete the snapshot's IndexMetadata at /indices/${index-snapshot-uuid}/meta-${snapshot-uuid}.dat.
  2. Go through all shard directories /indices/${index-snapshot-uuid}/${i} and:
    1. Remove the BlobStoreIndexShardSnapshot blob at /indices/${index-snapshot-uuid}/${i}/snap-${snapshot-uuid}.dat.
    2. List all blobs in the shard path /indices/${index-snapshot-uuid}/${i} and build a new BlobStoreIndexShardSnapshots from the remaining BlobStoreIndexShardSnapshot blobs in the shard. Afterwards, write it to the next shard generation blob at /indices/${index-snapshot-uuid}/${i}/index-${uuid}. (The shard's generation is determined from the map of shard generations in the RepositoryData in the root index-${N} blob of the repository.)
    3. Collect all segment blobs (identified by the data blob prefix __) in the shard directory that are not referenced by the new BlobStoreIndexShardSnapshots written in the previous step, as well as the previous index-${uuid} blob, so that they can be deleted at the end of the snapshot delete process.
3. Write an updated RepositoryData blob with the deleted snapshot removed and containing the updated shard generations for the shards affected by the delete.
4. Delete the snapshot's global Metadata blob meta-${snapshot-uuid}.dat stored directly under the repository root, as well as the SnapshotInfo blob at /snap-${snapshot-uuid}.dat.
5. Delete all unreferenced blobs previously collected when updating the shard directories. Also, remove any index folders or blobs under the repository root that are not referenced by the new RepositoryData written in the previous step.
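Step 2.2.3 amounts to a set difference over the shard directory listing. The minimal sketch below uses a hypothetical ShardCleanup class and plain strings for blob names; it only computes the candidates, which, per step 5, are deleted after the new RepositoryData has been written.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: collect shard-level blobs that a snapshot delete can remove. */
class ShardCleanup {
    /**
     * @param shardDirListing             blob names listed in the shard directory
     * @param referencedByNewShardIndex   data blobs referenced by the newly written BlobStoreIndexShardSnapshots
     * @param previousShardGenerationBlob the superseded index-${uuid} blob, or null if there is none
     */
    static List<String> unreferencedBlobs(Collection<String> shardDirListing,
                                          Collection<String> referencedByNewShardIndex,
                                          String previousShardGenerationBlob) {
        Set<String> referenced = new HashSet<>(referencedByNewShardIndex);
        List<String> toDelete = new ArrayList<>();
        for (String blob : shardDirListing) {
            // Segment data blobs carry the "__" prefix; keep only those still referenced.
            if (blob.startsWith("__") && !referenced.contains(blob)) {
                toDelete.add(blob);
            }
        }
        // The superseded shard generation blob is obsolete once the new one is written.
        if (previousShardGenerationBlob != null) {
            toDelete.add(previousShardGenerationBlob);
        }
        return toDelete;
    }
}
```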

Related Packages

Package                              Description
org.elasticsearch.repositories       Repositories of snapshot/restore information.
org.elasticsearch.repositories.fs

Classes

Class                                           Description
BlobStoreRepository                             BlobStore-based implementation of Snapshot Repository
ChecksumBlobStoreFormat<T extends ToXContent>   Snapshot metadata file format used in v2.0 and above
ChunkedBlobOutputStream<T>                      Base class for doing chunked writes to a blob store.
FileRestoreContext                              This context will execute a file restore of the lucene files.
MeteredBlobStoreRepository
