Builds analytic information over all hits in a search request. Aggregations are essentially a tool for summarizing data, and that summary is often used to generate a visualization.
Types of aggregations
There are three main types of aggregations, each in its own sub-package:
- Bucket aggregations - which group documents (e.g. a histogram)
- Metric aggregations - which compute a summary value from several documents (e.g. a sum)
- Pipeline aggregations - which run as a separate step and compute values across buckets
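The three types above can be illustrated with a minimal plain-Java sketch. It does not use the Elasticsearch API; the class and method names (`AggregationTypesSketch`, `histogram`, `sum`, `maxBucket`) are hypothetical and exist only to show how the types relate to one another.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from Elasticsearch.
public class AggregationTypesSketch {

    /** Bucket aggregation: groups values into fixed-width buckets keyed by their lower bound. */
    public static TreeMap<Long, List<Long>> histogram(List<Long> values, long interval) {
        TreeMap<Long, List<Long>> buckets = new TreeMap<>();
        for (long v : values) {
            long key = Math.floorDiv(v, interval) * interval;
            buckets.computeIfAbsent(key, k -> new ArrayList<>()).add(v);
        }
        return buckets;
    }

    /** Metric aggregation: computes one summary value from several documents. */
    public static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    /** Pipeline aggregation: runs as a later step over the per-bucket results. */
    public static long maxBucket(Map<Long, Long> sumsPerBucket) {
        long best = Long.MIN_VALUE;
        for (long s : sumsPerBucket.values()) {
            best = Math.max(best, s);
        }
        return best;
    }

    public static void main(String[] args) {
        List<Long> prices = List.of(3L, 7L, 12L, 18L, 25L);
        TreeMap<Long, Long> sums = new TreeMap<>();
        // A metric (sum) nested inside a bucket aggregation (histogram).
        histogram(prices, 10).forEach((key, docs) -> sums.put(key, sum(docs)));
        System.out.println(sums);            // {0=10, 10=30, 20=25}
        System.out.println(maxBucket(sums)); // 30
    }
}
```

Note that the pipeline step never sees the individual values, only the output of the bucket and metric steps.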
How Aggregations Work
TODO: Info about search phases goes here
Aggregations operate in general as Map Reduce jobs. The coordinating node for
the query dispatches the aggregation to each data node. The data nodes each
instantiate a builder of the appropriate type, which in turn builds the
Aggregator for that node. This Aggregator collects the data from that shard,
more or less. These values are shipped back to the coordinating node, which
performs the reduction on them (partial reductions in place on the data nodes
are also possible).
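The map and reduce steps described above can be sketched in plain Java, under the assumption of an average aggregation. `MapReduceSketch`, `PartialAvg`, `collectShard` and `reduce` are hypothetical names for illustration, not Elasticsearch classes. The sketch shows why each shard ships a partial result (sum and count) rather than a finished value.

```java
import java.util.List;

// Hypothetical illustration only; not the Elasticsearch implementation.
public class MapReduceSketch {

    /** Partial result produced on one shard and shipped to the coordinating node. */
    public static final class PartialAvg {
        public final double sum;
        public final long count;

        public PartialAvg(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        public PartialAvg merge(PartialAvg other) {
            return new PartialAvg(sum + other.sum, count + other.count);
        }

        public double value() {
            return count == 0 ? Double.NaN : sum / count;
        }
    }

    /** "Map" step: runs on a data node and collects the values of one shard. */
    public static PartialAvg collectShard(List<Double> shardValues) {
        double sum = 0;
        for (double v : shardValues) {
            sum += v;
        }
        return new PartialAvg(sum, shardValues.size());
    }

    /** "Reduce" step: runs on the coordinating node and merges the shard results. */
    public static PartialAvg reduce(List<PartialAvg> partials) {
        PartialAvg result = new PartialAvg(0, 0);
        for (PartialAvg p : partials) {
            result = result.merge(p);
        }
        return result;
    }

    public static void main(String[] args) {
        PartialAvg shard1 = collectShard(List.of(1.0, 2.0, 3.0));
        PartialAvg shard2 = collectShard(List.of(10.0, 20.0));
        System.out.println(reduce(List.of(shard1, shard2)).value()); // 7.2
    }
}
```

Averaging the two per-shard averages (2.0 and 15.0) would give 8.5, which is wrong. Carrying sum and count separately keeps the reduction correct no matter how the documents are spread across shards.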
Three modes of operation
When it comes to actually collecting values, there are three ways aggregations operate, in general. Which one we choose depends on limitations in the query and how the data was ingested (e.g. if it is searchable).
The easiest to understand is the Compatible (i.e. usable in all situations) mode, which can be thought of as iterating over each query hit and collecting a value from it. This is the least performant way to evaluate aggregations, because it requires looking at every hit.
The fastest way to run an aggregation is by looking at the index structures
directly. For example, Lucene just stores the minimum and maximum values
of fields per segment, so a min aggregation matching all documents in a segment
can just look up its result. Generally speaking, this mode can be engaged when
there are no queries or sub-aggregations, and is gated by
Finally, we can rewrite an aggregation into faster aggregations,
or ideally into just a query. Generally, the goal here is to get to
filter by filters (which is an optimization on the filters aggregation
which runs it as a set of filter queries). Often this process will look like rewriting
a DateHistogram into a DateRange, and then rewriting the DateRange into Filters.
If you see
a good clue that the rewrite mode is being used. In general, when we rewrite aggregations,
we are able to detect if the rewritten agg can run in a "fast" mode, and decline the
rewrite if it can't.
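The rewrite chain described above (histogram, then ranges, then filters) can be sketched in plain Java over numeric values. All names here (`RewriteSketch`, `Range`, `histogramToRanges`, `rangesToFilters`, `countPerFilter`) are hypothetical, and the counting loop is only a stand-in: a real filter query would count matches from index structures instead of scanning values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongPredicate;

// Hypothetical illustration only; not the Elasticsearch rewrite logic.
public class RewriteSketch {

    /** Half-open range [from, to). */
    public static final class Range {
        public final long from;
        public final long to;

        public Range(long from, long to) {
            this.from = from;
            this.to = to;
        }
    }

    /** Step 1: rewrite a fixed-interval histogram covering [min, max] into explicit ranges. */
    public static List<Range> histogramToRanges(long min, long max, long interval) {
        List<Range> ranges = new ArrayList<>();
        for (long from = Math.floorDiv(min, interval) * interval; from <= max; from += interval) {
            ranges.add(new Range(from, from + interval));
        }
        return ranges;
    }

    /** Step 2: rewrite each range into a filter, keyed by the bucket's lower bound. */
    public static Map<Long, LongPredicate> rangesToFilters(List<Range> ranges) {
        Map<Long, LongPredicate> filters = new LinkedHashMap<>();
        for (Range r : ranges) {
            filters.put(r.from, v -> v >= r.from && v < r.to);
        }
        return filters;
    }

    /** Step 3: the bucket count is simply the number of matches of each filter. */
    public static Map<Long, Long> countPerFilter(Map<Long, LongPredicate> filters, long[] values) {
        Map<Long, Long> counts = new LinkedHashMap<>();
        filters.forEach((key, filter) -> {
            long n = 0;
            for (long v : values) {
                if (filter.test(v)) {
                    n++;
                }
            }
            counts.put(key, n);
        });
        return counts;
    }

    public static void main(String[] args) {
        long[] values = {1, 5, 12, 27};
        List<Range> ranges = histogramToRanges(1, 27, 10);
        System.out.println(countPerFilter(rangesToFilters(ranges), values)); // {0=2, 10=1, 20=1}
    }
}
```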
In general, aggs will try to use one of the fast modes, and if that's not possible, fall back to running in compatible mode.
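As a rough illustration of trying a fast mode first and falling back to compatible mode, the following plain-Java sketch computes a min aggregation. `FastPathSketch` and `Segment` are hypothetical; the precomputed `min` field stands in for the per-segment minimum that the index stores, and a `null` query is assumed to mean "match all documents".

```java
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical illustration only; not the Elasticsearch implementation.
public class FastPathSketch {

    /** A segment holding its values plus a minimum precomputed when the segment is built. */
    public static final class Segment {
        final long[] values;
        final long min;
        public int valuesRead = 0; // how many individual values a search had to inspect

        public Segment(long... values) {
            this.values = values;
            long m = Long.MAX_VALUE;
            for (long v : values) {
                m = Math.min(m, v);
            }
            this.min = m;
        }
    }

    /** Computes the minimum over all values matching the query (null means match all). */
    public static long min(List<Segment> segments, LongPredicate query) {
        long result = Long.MAX_VALUE;
        for (Segment s : segments) {
            if (query == null) {
                // Fast mode: every document matches, so the stored minimum is the answer.
                result = Math.min(result, s.min);
            } else {
                // Compatible mode: look at every value and keep the matching ones.
                for (long v : s.values) {
                    s.valuesRead++;
                    if (query.test(v)) {
                        result = Math.min(result, v);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Segment> segments = List.of(new Segment(5, 3, 9), new Segment(7, 2));
        System.out.println(min(segments, null));       // 2, without reading any values
        System.out.println(min(segments, v -> v > 4)); // 5, after reading all five values
    }
}
```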
Interfaces
- An aggregation.
- Compare two buckets by their ordinal.
- Parses the aggregation request and creates the appropriate aggregator factory for it.
- Defines behavior for comparing bucket keys to impose a total ordering of buckets of the same type.
Classes
- Base implementation of an AggregationBuilder.
- Common xcontent fields that are shared among addAggregation
- A factory that knows how to create an Aggregator of a specific type.
- Common xcontent fields shared among aggregator builders
- Utility class to create aggregations.
- Aggregation phase of a search request, used to collect aggregations
- Represents a set of Aggregations
- An Aggregator.
- Base implementation for concrete aggregators.
- An immutable collection of AggregatorFactories.
- A Collector that can collect data in separate buckets.
- MultiBucketsAggregation.Bucket ordering strategy.
- Upper bound of how many Aggregator will have to collect into.
- A wrapper around reducing buckets with the same key that can delay that reduction as long as possible.
- An internal implementation of Aggregation.
- An internal implementation of Aggregations.
- InternalMultiBucketAggregation<A extends InternalMultiBucketAggregation, B extends InternalMultiBucketAggregation.InternalBucket>
- Implementations for MultiBucketsAggregation.Bucket ordering strategy to sort by a sub-aggregation.
- MultiBucketsAggregation.Bucket ordering strategy to sort by multiple criteria.
- Contains logic for parsing a XContentParser.
- Contains logic for reading/writing BucketOrder from/to streams.
- Collects results for a particular segment.
- An aggregation service that creates instances of IntConsumer that throws a MultiBucketConsumerService.TooManyBucketsException when the sum of the provided values is above the limit (`search.max_buckets`).
- An aggregator that is not collected, this can typically be used when running an aggregation over a field that doesn't have a mapping.
- An implementation of Aggregation that is parsed from a REST response.
- A factory that knows how to create a PipelineAggregator of a specific type.
- The aggregation context that is part of the search context.
- Merges many buckets into the "top" buckets as sorted by
Exceptions
- Thrown when failing to execute an aggregation
- Thrown when failing to execute an aggregation