Package org.elasticsearch.snapshots

This package exposes the Elasticsearch Snapshot functionality.

Preliminaries

There are two communication channels between all nodes and master in the snapshot functionality:

  • The master updates the cluster state by adding, removing or altering the contents of its custom entry SnapshotsInProgress. All nodes consume the state of the SnapshotsInProgress and will start or abort relevant shard snapshot tasks accordingly.
  • Nodes that are executing shard snapshot tasks report either success or failure of their snapshot task by submitting a SnapshotShardsService.UpdateIndexShardSnapshotStatusRequest to the master node, which then updates the snapshot's entry in the cluster state accordingly.
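
The two channels can be illustrated with a minimal sketch. It is not the Elasticsearch implementation: the class SnapshotChannelsSketch, its ShardStatus type, the string shard and node identifiers, and both method names are hypothetical stand-ins, and the cluster state is reduced to a plain map.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical, simplified model of the two channels; not Elasticsearch code.
class SnapshotChannelsSketch {
    enum ShardState { INIT, SUCCESS, FAILED }

    // Simplified stand-in for a shard's status: the assigned node and the state.
    static final class ShardStatus {
        final String nodeId;
        final ShardState state;

        ShardStatus(String nodeId, ShardState state) {
            this.nodeId = nodeId;
            this.state = state;
        }
    }

    // Channel 1 (master to nodes): a node inspects the published state and
    // selects the shards assigned to it that are still in state INIT.
    static List<String> shardsToStart(Map<String, ShardStatus> published, String localNodeId) {
        return published.entrySet().stream()
            .filter(e -> e.getValue().nodeId.equals(localNodeId))
            .filter(e -> e.getValue().state == ShardState.INIT)
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
    }

    // Channel 2 (node to master): the master applies a reported result and
    // produces the next version of the state instead of mutating the old one.
    static Map<String, ShardStatus> applyStatusUpdate(
            Map<String, ShardStatus> current, String shardId, ShardState reported) {
        ShardStatus previous = current.get(shardId);
        if (previous == null) {
            throw new IllegalArgumentException("unknown shard: " + shardId);
        }
        Map<String, ShardStatus> updated = new HashMap<>(current);
        updated.put(shardId, new ShardStatus(previous.nodeId, reported));
        return updated;
    }

    public static void main(String[] args) {
        Map<String, ShardStatus> state = new HashMap<>();
        state.put("index-a[0]", new ShardStatus("node-1", ShardState.INIT));
        state.put("index-a[1]", new ShardStatus("node-2", ShardState.INIT));

        System.out.println(shardsToStart(state, "node-1")); // [index-a[0]]
        state = applyStatusUpdate(state, "index-a[0]", ShardState.SUCCESS);
        System.out.println(shardsToStart(state, "node-1")); // []
    }
}
```

Returning a new map from applyStatusUpdate mirrors the idea that the master publishes a new cluster state rather than editing the current one in place.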

Snapshot Creation

Snapshots are created by the following sequence of events:

  1. An invocation of SnapshotsService.createSnapshot(org.elasticsearch.action.admin.cluster.snapshots.create.CreateSnapshotRequest, org.elasticsearch.action.ActionListener<org.elasticsearch.snapshots.Snapshot>) enqueues a cluster state update to create a SnapshotsInProgress.Entry in the cluster state's SnapshotsInProgress. This initial snapshot entry has its state set to INIT and an empty map set for the state of the individual shards' snapshots.
  2. After the snapshot's entry with state INIT is in the cluster state, SnapshotsService determines the primary shards' assignments for all indices that are being snapshotted and updates the existing SnapshotsInProgress.Entry with state STARTED and adds the map of ShardId to SnapshotsInProgress.ShardSnapshotStatus that tracks the assignment of which node is to snapshot which shard. All shard snapshots are executed on the shard's primary node. Thus all shards for which the primary node was found to have a healthy copy of the shard are marked as being in state INIT in this map. If the primary for a shard is unassigned, it is marked as MISSING in this map. In case the primary is initializing at this point, it is marked as in state WAITING. In case a shard's primary is relocated at any point after its SnapshotsInProgress.Entry has moved to state STARTED and thus been assigned to a specific cluster node, that shard's snapshot will fail and move to state FAILED.
  3. The new SnapshotsInProgress.Entry is then observed by SnapshotShardsService.clusterChanged(org.elasticsearch.cluster.ClusterChangedEvent) on all nodes. Since the entry is in state STARTED, the SnapshotShardsService checks whether any local primary shards are to be snapshotted (signaled by the shard's snapshot state being INIT). For those local primary shards found in state INIT, the snapshot process of writing the shard's data files to the snapshot's Repository is executed. Once the snapshot execution finishes for a shard, an UpdateIndexShardSnapshotStatusRequest is sent to the master node signaling either status SUCCESS or FAILED. The master node then updates the shard's state in the snapshot's SnapshotsInProgress.Entry whenever it receives such an UpdateIndexShardSnapshotStatusRequest.
  4. If, as a result of the received status update requests, all shards in the cluster state are in a completed state, i.e. are marked as either SUCCESS, FAILED or MISSING, the SnapshotShardsService will update the state of the Entry itself and mark it as SUCCESS. At the same time SnapshotsService.endSnapshot(org.elasticsearch.cluster.SnapshotsInProgress.Entry) is executed, writing the metadata necessary to finalize the snapshot to the repository.
  5. After writing the final metadata to the repository, a cluster state update to remove the snapshot from the cluster state is submitted and the removal of the snapshot's SnapshotsInProgress.Entry from the cluster state completes the snapshot process.
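
Steps 2 and 4 above can be sketched as two small functions: one that derives a shard's initial snapshot state from the state of its primary, and one that checks whether every shard has reached a completed state. This is an illustrative sketch only. The class SnapshotCreationSketch, the PrimaryState enum (with STARTED standing in for a primary that holds a healthy copy) and the string shard identifiers are assumptions, not Elasticsearch types, and relocation handling is left out.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of steps 2 and 4; not Elasticsearch code.
class SnapshotCreationSketch {
    // Assumed, simplified routing state of a shard's primary.
    enum PrimaryState { STARTED, INITIALIZING, UNASSIGNED }

    // Shard snapshot states named in the description above.
    enum ShardState { INIT, WAITING, MISSING, SUCCESS, FAILED }

    // States that count as completed in step 4.
    private static final Set<ShardState> COMPLETED =
        EnumSet.of(ShardState.SUCCESS, ShardState.FAILED, ShardState.MISSING);

    // Step 2: healthy primary -> INIT, initializing -> WAITING, unassigned -> MISSING.
    static ShardState initialState(PrimaryState primary) {
        switch (primary) {
            case STARTED:
                return ShardState.INIT;
            case INITIALIZING:
                return ShardState.WAITING;
            default:
                return ShardState.MISSING;
        }
    }

    // Builds the shard-to-state map that is added when the entry moves to STARTED.
    static Map<String, ShardState> assign(Map<String, PrimaryState> primaries) {
        Map<String, ShardState> shards = new LinkedHashMap<>();
        primaries.forEach((shardId, primary) -> shards.put(shardId, initialState(primary)));
        return shards;
    }

    // Step 4: the snapshot can be finalized once no shard is INIT or WAITING.
    static boolean allCompleted(Map<String, ShardState> shards) {
        return shards.values().stream().allMatch(COMPLETED::contains);
    }

    public static void main(String[] args) {
        Map<String, PrimaryState> primaries = new LinkedHashMap<>();
        primaries.put("index-a[0]", PrimaryState.STARTED);
        primaries.put("index-a[1]", PrimaryState.UNASSIGNED);

        Map<String, ShardState> shards = assign(primaries);
        System.out.println(allCompleted(shards)); // false, index-a[0] is still INIT

        shards.put("index-a[0]", ShardState.SUCCESS);
        System.out.println(allCompleted(shards)); // true
    }
}
```

Note that MISSING is a completed state from the start, so a snapshot whose only unfinished shards are unassigned does not wait for them.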

Deleting a Snapshot

Deleting a snapshot can take the form of either simply deleting it from the repository or (if it has not completed yet) aborting it and subsequently deleting it from the repository.

Aborting a Snapshot

  1. Aborting a snapshot starts by updating the state of the snapshot's SnapshotsInProgress.Entry to ABORTED.
  2. The snapshot's state change to ABORTED in the cluster state is then picked up by the SnapshotShardsService on all nodes. Those nodes that have shard snapshot actions for the snapshot assigned to them will abort them and notify the master about the shard's snapshot status accordingly. If the shard snapshot action completed or was in state FINALIZE when the abort was registered by the SnapshotShardsService, then the shard's state will be reported to the master as SUCCESS. Otherwise, it will be reported as FAILED.
  3. Once all the shards are reported to the master as either SUCCESS or FAILED, the SnapshotsService on the master will finish the snapshot process: all shards' states are now completed, so the snapshot can be completed as explained in point 4 of the snapshot creation section above.
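
The reporting rule in step 2 can be written as a single decision. The sketch below is an illustration under assumptions: the class AbortReportSketch and the LocalStage enum are hypothetical, and only FINALIZE is a stage name taken from the description above (DONE stands in for "completed", while INIT and STARTED stand in for any earlier stage).

```java
// Hypothetical model of the abort reporting rule; not Elasticsearch code.
class AbortReportSketch {
    // Assumed local stages of a shard snapshot action on a data node.
    enum LocalStage { INIT, STARTED, FINALIZE, DONE }

    // Status a node reports to the master for a shard.
    enum Reported { SUCCESS, FAILED }

    // A shard that was already finalizing or finished when the abort was
    // registered is reported as SUCCESS; anything earlier is reported as FAILED.
    static Reported statusOnAbort(LocalStage stageWhenAbortRegistered) {
        boolean effectivelyDone = stageWhenAbortRegistered == LocalStage.FINALIZE
            || stageWhenAbortRegistered == LocalStage.DONE;
        return effectivelyDone ? Reported.SUCCESS : Reported.FAILED;
    }

    public static void main(String[] args) {
        System.out.println(statusOnAbort(LocalStage.STARTED));  // FAILED
        System.out.println(statusOnAbort(LocalStage.FINALIZE)); // SUCCESS
    }
}
```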

Deleting a Snapshot from a Repository

  1. Assuming there are no entries in the cluster state's SnapshotsInProgress, deletion starts with the SnapshotsService creating an entry for deleting the snapshot in the cluster state's SnapshotDeletionsInProgress.
  2. Once the cluster state contains the deletion entry in SnapshotDeletionsInProgress, the SnapshotsService will invoke Repository.deleteSnapshot(org.elasticsearch.snapshots.SnapshotId, long, org.elasticsearch.action.ActionListener<java.lang.Void>) for the given snapshot, which will remove files associated with the snapshot from the repository as well as update its metadata to reflect the deletion of the snapshot.
  3. After the deletion of the snapshot's data from the repository finishes, the SnapshotsService will submit a cluster state update to remove the deletion's entry in SnapshotDeletionsInProgress which concludes the process of deleting a snapshot.
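
The ordering of the three deletion steps can be sketched with in-memory sets standing in for the cluster state and the repository. Everything here is hypothetical: the class SnapshotDeletionSketch, its fields and the recorded event strings are illustrative names, the real process is asynchronous, and rejecting the call while snapshots are in progress is an assumption made to model the precondition in step 1.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, synchronous model of the deletion sequence; not Elasticsearch code.
class SnapshotDeletionSketch {
    final Set<String> snapshotsInProgress = new HashSet<>();
    final Set<String> deletionsInProgress = new HashSet<>();
    final Set<String> repository = new HashSet<>();
    final List<String> events = new ArrayList<>();

    void deleteSnapshot(String snapshotName) {
        // Precondition from step 1, modeled here as a hard failure (an assumption).
        if (!snapshotsInProgress.isEmpty()) {
            throw new IllegalStateException("snapshots are still in progress");
        }
        // Step 1: record the deletion in the cluster state.
        deletionsInProgress.add(snapshotName);
        events.add("deletion entry added");

        // Step 2: remove the snapshot's data from the repository.
        repository.remove(snapshotName);
        events.add("repository data removed");

        // Step 3: remove the deletion entry, which concludes the process.
        deletionsInProgress.remove(snapshotName);
        events.add("deletion entry removed");
    }

    public static void main(String[] args) {
        SnapshotDeletionSketch sketch = new SnapshotDeletionSketch();
        sketch.repository.add("snap-1");
        sketch.deleteSnapshot("snap-1");
        System.out.println(sketch.events);
        System.out.println(sketch.repository.isEmpty()); // true
    }
}
```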