Class ClusterState

java.lang.Object
org.elasticsearch.cluster.ClusterState
All Implemented Interfaces:
Diffable<ClusterState>, Writeable, org.elasticsearch.xcontent.ToXContent, org.elasticsearch.xcontent.ToXContentFragment

public class ClusterState extends Object implements org.elasticsearch.xcontent.ToXContentFragment, Diffable<ClusterState>
Represents the state of the cluster, held in memory on all nodes in the cluster with updates coordinated by the elected master.

A ClusterState is conceptually immutable. In practice it has a few components, such as RoutingNodes, which are pure functions of the immutable state but are expensive to compute, so they are built on demand when first needed.
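
The on-demand construction described above can be illustrated with a minimal sketch. It does not use the Elasticsearch classes: LazyDerivedState, its shard-to-node map, and the buildCount field are hypothetical names chosen for illustration, and the double-checked locking is one possible way to cache a derived view, not necessarily how ClusterState does it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for an immutable state with one lazily built derived view.
final class LazyDerivedState {
    // Immutable input: shard name -> node name.
    private final Map<String, String> shardToNode;
    // Derived view: node name -> sorted shard names. Null until first requested.
    private volatile Map<String, List<String>> nodeToShards;
    // Counts how often the derived view was computed (for illustration only).
    int buildCount = 0;

    LazyDerivedState(Map<String, String> shardToNode) {
        this.shardToNode = Map.copyOf(shardToNode);
    }

    // Builds the derived view on first use and returns the cached copy afterwards.
    Map<String, List<String>> nodeToShards() {
        Map<String, List<String>> result = nodeToShards;
        if (result == null) {
            synchronized (this) {
                result = nodeToShards;
                if (result == null) {
                    result = build();
                    nodeToShards = result;
                }
            }
        }
        return result;
    }

    // Pure function of shardToNode, so caching it does not change observable state.
    private Map<String, List<String>> build() {
        buildCount++;
        Map<String, List<String>> byNode = new TreeMap<>();
        for (Map.Entry<String, String> e : shardToNode.entrySet()) {
            byNode.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        for (List<String> shards : byNode.values()) {
            Collections.sort(shards);
        }
        return Collections.unmodifiableMap(byNode);
    }
}
```

Callers that never ask for the derived view never pay for it, and repeated calls reuse the first result.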

The Metadata portion is written to disk on each update so it persists across full-cluster restarts. The rest of the state is maintained only in memory and resets to its initial state on a full-cluster restart, but it is held on all nodes so it persists across master elections (and is therefore preserved in a rolling restart).

Updates are triggered by submitting tasks to the MasterService on the elected master. Typically a TransportMasterNodeAction routes a request to the master, where the task is submitted with ClusterService.submitStateUpdateTask(java.lang.String, T, org.elasticsearch.cluster.ClusterStateTaskExecutor<T>). Submitted tasks have an associated ClusterStateTaskConfig which defines a priority and a timeout.

Tasks are processed in priority order, so a flood of higher-priority tasks can starve lower-priority ones. Therefore, avoid priorities other than Priority.NORMAL where possible. Tasks associated with client actions should typically have a timeout, or otherwise be sensitive to client cancellations, to avoid surprises caused by stale tasks executing long after they were submitted (clients themselves tend to time out). In contrast, internal tasks can reasonably have an infinite timeout, especially if a timeout would simply trigger a retry.
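
The starvation effect of priority ordering can be seen in a small self-contained sketch. It models the queue with java.util.PriorityQueue rather than the real MasterService; PriorityOrderDemo, DemoPriority, and Task are hypothetical names, and the two-level priority is a simplification.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class PriorityOrderDemo {
    // Hypothetical two-level priority; declaration order defines the sort order.
    enum DemoPriority { HIGH, NORMAL }

    static final class Task {
        final String name;
        final DemoPriority priority;
        final long seq; // submission order, breaks ties within a priority

        Task(String name, DemoPriority priority, long seq) {
            this.name = name;
            this.priority = priority;
            this.seq = seq;
        }
    }

    // Drains the queue and returns task names in execution order.
    static List<String> executionOrder(List<Task> submitted) {
        PriorityQueue<Task> queue = new PriorityQueue<>(
            Comparator.comparing((Task t) -> t.priority).thenComparingLong(t -> t.seq));
        queue.addAll(submitted);
        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.poll().name);
        }
        return order;
    }

    // A NORMAL task submitted first still runs after later HIGH tasks.
    static List<String> demo() {
        List<Task> submitted = new ArrayList<>();
        submitted.add(new Task("normal-0", DemoPriority.NORMAL, 0));
        submitted.add(new Task("high-1", DemoPriority.HIGH, 1));
        submitted.add(new Task("high-2", DemoPriority.HIGH, 2));
        return executionOrder(submitted);
    }
}
```

In this model demo() returns the two HIGH tasks ahead of the earlier NORMAL task, which is why a steady stream of higher-priority work can delay everything else indefinitely.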

Tasks that share the same ClusterStateTaskExecutor instance are processed as a batch. Each batch of tasks yields a new ClusterState which is published to the cluster by ClusterStatePublisher.publish(org.elasticsearch.cluster.ClusterStatePublicationEvent, org.elasticsearch.action.ActionListener<java.lang.Void>, org.elasticsearch.cluster.coordination.ClusterStatePublisher.AckListener). Publication usually works by sending a diff, computed via the Diffable interface, rather than the full state, although it falls back to sending the full state if the receiving node is new or has missed an intermediate state. States and diffs are published using the transport protocol, i.e. the Writeable interface and friends.
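
The diff-with-fallback idea can be sketched without any Elasticsearch types. In the example below, DiffPublicationDemo, State, StateDiff, and receive are hypothetical names, the state is reduced to a versioned string map, and the fallback is collapsed into a single method call, whereas the real publication protocol involves a round trip between nodes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class DiffPublicationDemo {
    // Hypothetical versioned state: an immutable map plus a version number.
    static final class State {
        final long version;
        final Map<String, String> entries;

        State(long version, Map<String, String> entries) {
            this.version = version;
            this.entries = Map.copyOf(entries);
        }
    }

    // Records only what changed between two states.
    static final class StateDiff {
        final long fromVersion;
        final long toVersion;
        final Map<String, String> upserts = new HashMap<>();
        final Set<String> deletes = new HashSet<>();

        StateDiff(State before, State after) {
            fromVersion = before.version;
            toVersion = after.version;
            for (Map.Entry<String, String> e : after.entries.entrySet()) {
                if (!e.getValue().equals(before.entries.get(e.getKey()))) {
                    upserts.put(e.getKey(), e.getValue());
                }
            }
            for (String key : before.entries.keySet()) {
                if (!after.entries.containsKey(key)) {
                    deletes.add(key);
                }
            }
        }

        // A diff is only meaningful against the exact state it was computed from.
        State apply(State base) {
            if (base.version != fromVersion) {
                throw new IllegalStateException("diff does not match base version");
            }
            Map<String, String> merged = new HashMap<>(base.entries);
            merged.keySet().removeAll(deletes);
            merged.putAll(upserts);
            return new State(toVersion, merged);
        }
    }

    // Receiver side: use the diff if the base matches, otherwise take the full state.
    static State receive(State local, StateDiff diff, State full) {
        if (local != null && local.version == diff.fromVersion) {
            return diff.apply(local);
        }
        return full;
    }
}
```

A receiver holding the matching base version reconstructs the new state from the diff alone; a new receiver (null) or one on a different version ends up with the full state instead.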

When committed, the new state is applied, which exposes it to the node via ClusterStateApplier and ClusterStateListener callbacks registered with the ClusterApplierService. The new state is also made available via ClusterService.state(). The appliers are notified (in no particular order) before ClusterService.state() is updated, and the listeners are notified (in no particular order) afterwards. Cluster state updates run in sequence, one by one, so they can be a performance bottleneck. See the JavaDocs on the linked classes and methods for more details.
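
The ordering between appliers, the visible state, and listeners can be modelled in a few lines. This is a sketch under simplifying assumptions: ApplyOrderDemo and its methods are hypothetical, the state is a plain string, and callbacks are java.util.function.Consumer instances rather than the real ClusterStateApplier and ClusterStateListener interfaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class ApplyOrderDemo {
    private volatile String currentState;
    private final List<Consumer<String>> appliers = new ArrayList<>();
    private final List<Consumer<String>> listeners = new ArrayList<>();

    ApplyOrderDemo(String initialState) {
        currentState = initialState;
    }

    // The state that is currently visible to the rest of the node.
    String state() {
        return currentState;
    }

    void addApplier(Consumer<String> applier) {
        appliers.add(applier);
    }

    void addListener(Consumer<String> listener) {
        listeners.add(listener);
    }

    // Appliers run first, then the visible state is swapped, then listeners run.
    void applyCommitted(String newState) {
        for (Consumer<String> applier : appliers) {
            applier.accept(newState);
        }
        currentState = newState;
        for (Consumer<String> listener : listeners) {
            listener.accept(newState);
        }
    }

    // Records what each callback observes through state() while it runs.
    static List<String> demo() {
        ApplyOrderDemo service = new ApplyOrderDemo("v1");
        List<String> log = new ArrayList<>();
        service.addApplier(s -> log.add("applier got " + s + ", state() is " + service.state()));
        service.addListener(s -> log.add("listener got " + s + ", state() is " + service.state()));
        service.applyCommitted("v2");
        return log;
    }
}
```

In this model an applier that calls state() still sees the old value, while a listener sees the new one, so appliers should rely on the state passed to them rather than on state().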

Cluster state updates can be used to trigger various actions via a ClusterStateListener rather than using a timer.
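
As a rough illustration of reacting to state changes instead of polling on a timer, the sketch below triggers an action only when a node disappears between two consecutive states. StateListener, NodeLeftListener, and the "cleanup" action are hypothetical; the real listener interface receives a richer event object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class NodeLeftListenerDemo {
    // Hypothetical callback that is given the old and the new set of node names.
    interface StateListener {
        void stateChanged(Set<String> previousNodes, Set<String> currentNodes);
    }

    // Acts only when a node has left; nothing runs while the cluster is unchanged.
    static final class NodeLeftListener implements StateListener {
        final List<String> actions = new ArrayList<>();

        @Override
        public void stateChanged(Set<String> previousNodes, Set<String> currentNodes) {
            Set<String> removed = new TreeSet<>(previousNodes);
            removed.removeAll(currentNodes);
            for (String node : removed) {
                actions.add("cleanup:" + node);
            }
        }
    }

    static List<String> demo() {
        NodeLeftListener listener = new NodeLeftListener();
        // A node joins: no action is triggered.
        listener.stateChanged(Set.of("n1", "n2"), Set.of("n1", "n2", "n3"));
        // Node n2 leaves: one cleanup action is triggered.
        listener.stateChanged(Set.of("n1", "n2", "n3"), Set.of("n1", "n3"));
        return listener.actions;
    }
}
```

The work is driven by the change itself, so there is no polling interval to tune and no delay between the change and the reaction.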

Implements ToXContentFragment to be exposed in REST APIs (e.g. GET _cluster/state and POST _cluster/reroute) and to be indexed by monitoring, mostly for diagnostic purposes. The XContent representation does not need to be 100% faithful since we never reconstruct a cluster state from its XContent representation, but the more faithful it is the more useful it is for diagnostics. Note that the XContent representation of the Metadata portion does have to be faithful (in Metadata.XContentContext.GATEWAY context) since this is how it persists across full-cluster restarts.

Security-sensitive data such as passwords or private keys should not be stored in the cluster state, since the contents of the cluster state are exposed in various APIs.