Class SamplerAggregator

All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable, org.apache.lucene.search.Collector, org.elasticsearch.common.lease.Releasable, SingleBucketAggregator
Direct Known Subclasses:
DiversifiedBytesHashSamplerAggregator, DiversifiedMapSamplerAggregator, DiversifiedNumericSamplerAggregator, DiversifiedOrdinalsSamplerAggregator

public class SamplerAggregator
extends DeferableBucketAggregator
implements SingleBucketAggregator
Aggregates only the top-scoring docs on a shard. TODO: currently the diversity feature of this agg offers only 'script' and 'field' as means of generating a de-dup value. In the future it would be nice if users could use the syntax of any of the "bucket" aggs (geo, date histogram...) as the basis for generating de-dup values. Reusing that syntax would be preferable to making users recreate the bucketing logic in a 'script', e.g. to turn a datetime in milliseconds into a month key value.
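The core idea of "aggregate only the top-scoring docs on a shard" can be illustrated with a bounded min-heap. The sketch below is a simplified, hypothetical model in plain Java, not the Elasticsearch code: `ScoredDoc` and `sample` are illustrative names, and `shardSize` is assumed to play the role of the per-shard sample size.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: keep only the shardSize highest-scoring docs seen on one shard. */
class TopScoringSample {

    /** A collected hit: a per-shard doc id plus its relevance score. */
    static final class ScoredDoc {
        final int doc;
        final float score;

        ScoredDoc(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    static List<ScoredDoc> sample(List<ScoredDoc> hits, int shardSize) {
        if (shardSize <= 0) {
            return new ArrayList<>();
        }
        // Min-heap ordered by score: the head is the weakest doc currently kept.
        PriorityQueue<ScoredDoc> heap =
                new PriorityQueue<>(Comparator.comparingDouble((ScoredDoc d) -> d.score));
        for (ScoredDoc hit : hits) {
            if (heap.size() < shardSize) {
                heap.add(hit);
            } else if (hit.score > heap.peek().score) {
                // A better doc arrived: evict the weakest one to make room.
                heap.poll();
                heap.add(hit);
            }
        }
        // Return the sample in doc-id order, the order a second pass would visit it in.
        List<ScoredDoc> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingInt((ScoredDoc d) -> d.doc));
        return result;
    }
}
```

With hits scored 0.5, 2.0, 1.0 and 3.0 for docs 0 to 3 and `shardSize = 2`, only docs 1 and 3 survive; child aggregations would then see just those two docs.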
  • Field Details

    • SHARD_SIZE_FIELD

      public static final org.elasticsearch.common.ParseField SHARD_SIZE_FIELD
    • MAX_DOCS_PER_VALUE_FIELD

      public static final org.elasticsearch.common.ParseField MAX_DOCS_PER_VALUE_FIELD
    • EXECUTION_HINT_FIELD

      public static final org.elasticsearch.common.ParseField EXECUTION_HINT_FIELD
    • shardSize

      protected final int shardSize
    • bdd
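SHARD_SIZE_FIELD and MAX_DOCS_PER_VALUE_FIELD suggest two limits on the sample: a total size and a cap on docs sharing one de-dup value (the "diversity" feature from the class description). The sketch below is a hypothetical greedy model of that combination, assuming the de-dup key has already been computed from a field or script; it is a simplification and not how the Diversified*SamplerAggregator subclasses are actually implemented. EXECUTION_HINT_FIELD is not modeled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of diversified sampling: take the highest-scoring docs, but
 * admit at most maxDocsPerValue docs per de-dup key, stopping at shardSize docs.
 */
class DiversifiedSample {

    static final class Hit {
        final int doc;
        final float score;
        final String dedupKey; // assumed precomputed from a field value or script

        Hit(int doc, float score, String dedupKey) {
            this.doc = doc;
            this.score = score;
            this.dedupKey = dedupKey;
        }
    }

    static List<Hit> sample(List<Hit> hits, int shardSize, int maxDocsPerValue) {
        List<Hit> byScore = new ArrayList<>(hits);
        // Highest score first; ties broken by lower doc id so the result is deterministic.
        byScore.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
                .thenComparingInt(h -> h.doc));
        Map<String, Integer> perKey = new HashMap<>();
        List<Hit> kept = new ArrayList<>();
        for (Hit h : byScore) {
            if (kept.size() >= shardSize) {
                break;
            }
            int used = perKey.getOrDefault(h.dedupKey, 0);
            if (used < maxDocsPerValue) {
                // Admit the doc and charge it against its key's quota.
                perKey.put(h.dedupKey, used + 1);
                kept.add(h);
            }
        }
        return kept;
    }
}
```

With `maxDocsPerValue = 1`, a second high-scoring doc with the same key as an earlier one is skipped, leaving room for lower-scoring docs with different keys.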

  • Method Details

    • scoreMode

      public org.apache.lucene.search.ScoreMode scoreMode()
      Description copied from class: AggregatorBase
      Most aggregators don't need scores; make sure to override this method if your aggregator needs them.
      Specified by:
      scoreMode in interface org.apache.lucene.search.Collector
      Overrides:
      scoreMode in class AggregatorBase
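The override pattern described for scoreMode can be sketched with local stand-ins. `Mode` below mirrors two constants of Lucene's ScoreMode by name only, and the classes are hypothetical; the assumption is that the base default reports "no scores needed" while an aggregator that ranks docs by relevance, as this one does, must ask for them.

```java
/** Hypothetical stand-ins for Lucene's ScoreMode and for AggregatorBase.scoreMode(). */
class ScoreModeSketch {

    enum Mode {
        COMPLETE, COMPLETE_NO_SCORES;

        boolean needsScores() {
            return this == COMPLETE;
        }
    }

    static class BaseAggregator {
        // Default: most aggregators never read scores, so let the searcher skip computing them.
        Mode scoreMode() {
            return Mode.COMPLETE_NO_SCORES;
        }
    }

    static class ScoreSamplingAggregator extends BaseAggregator {
        // Picking the top-scoring docs requires scores, so the default is overridden.
        @Override
        Mode scoreMode() {
            return Mode.COMPLETE;
        }
    }
}
```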
    • buildDeferringCollector

      public DeferringBucketCollector buildDeferringCollector()
      Description copied from class: DeferableBucketAggregator
      Build the DeferringBucketCollector. The default implementation replays all hits against the buckets selected by DeferringBucketCollector.prepareSelectedBuckets(long...).
      Overrides:
      buildDeferringCollector in class DeferableBucketAggregator
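The record-then-replay behaviour of a deferring collector can be sketched in plain Java. This hypothetical model records every (doc, bucket) pair on the first pass and, once a set of buckets has been selected, replays only the hits that fell into those buckets; the class and method names are illustrative and are not the DeferringBucketCollector API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a deferring collector: record on pass one, replay selected buckets. */
class DeferringSketch {

    static final class DeferringCollector {
        // Each entry is {doc, bucketOrdinal} as seen during the first pass.
        private final List<long[]> recorded = new ArrayList<>();

        void collect(int doc, long bucket) {
            recorded.add(new long[] {doc, bucket});
        }

        /** Second pass: return the docs a deferred child would be fed, in recorded order. */
        List<Integer> replay(Set<Long> selectedBuckets) {
            List<Integer> replayed = new ArrayList<>();
            for (long[] hit : recorded) {
                if (selectedBuckets.contains(hit[1])) {
                    replayed.add((int) hit[0]);
                }
            }
            return replayed;
        }
    }
}
```

Collecting docs 0 to 3 into buckets 0, 1, 0, 2 and then selecting only bucket 0 replays docs 0 and 2; the work for the pruned buckets is never done.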
    • shouldDefer

      protected boolean shouldDefer(Aggregator aggregator)
      Description copied from class: DeferableBucketAggregator
      This method should be overridden by subclasses that want to defer calculation of a child aggregation until a first pass is complete and a set of buckets has been pruned.
      Overrides:
      shouldDefer in class DeferableBucketAggregator
      Parameters:
      aggregator - the child aggregator
      Returns:
      true if the aggregator should be deferred until a first pass at collection has completed
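A minimal sketch of how a shouldDefer predicate splits child aggregators into deferred and immediate groups. Children are represented by name only, and both the "defer nothing" default and the "defer everything" override are hypothetical policies chosen for illustration; the sampler-style override reflects the idea that children should only see the sampled docs.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch of partitioning children with a shouldDefer predicate. */
class ShouldDeferSketch {

    static class Parent {
        // Default policy: no child is deferred, so every child sees every hit immediately.
        boolean shouldDefer(String child) {
            return false;
        }

        /** Splits children into deferred (key true) and immediate (key false) groups. */
        Map<Boolean, List<String>> partition(List<String> children) {
            return children.stream().collect(Collectors.partitioningBy(this::shouldDefer));
        }
    }

    static class SamplerLikeParent extends Parent {
        // Hypothetical policy: defer every child until the sample has been chosen.
        @Override
        boolean shouldDefer(String child) {
            return true;
        }
    }
}
```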
    • buildAggregations

      public InternalAggregation[] buildAggregations(long[] owningBucketOrds) throws java.io.IOException
      Description copied from class: Aggregator
      Build the results of this aggregation.
      Specified by:
      buildAggregations in class Aggregator
      Parameters:
      owningBucketOrds - the ordinals of the buckets that we want to collect from this aggregation
      Returns:
      the results for each ordinal, in the same order as the array of ordinals
      Throws:
      java.io.IOException
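The contract "the results for each ordinal, in the same order as the array of ordinals" can be shown with a small hypothetical model. `SampleResult` and the `docCounts` map are illustrative assumptions; the point is that an ordinal with no collected docs still gets an (empty) result, so `results[i]` always corresponds to `owningBucketOrds[i]`.

```java
import java.util.Map;

/** Hypothetical sketch of building one result per owning bucket ordinal, order preserved. */
class BuildAggregationsSketch {

    /** Illustrative per-bucket result: how many sampled docs landed in the bucket. */
    static final class SampleResult {
        final long ord;
        final long docCount;

        SampleResult(long ord, long docCount) {
            this.ord = ord;
            this.docCount = docCount;
        }
    }

    static SampleResult[] buildAggregations(long[] owningBucketOrds, Map<Long, Long> docCounts) {
        SampleResult[] results = new SampleResult[owningBucketOrds.length];
        for (int i = 0; i < owningBucketOrds.length; i++) {
            long ord = owningBucketOrds[i];
            // Ordinals without docs produce an empty result instead of being skipped.
            results[i] = new SampleResult(ord, docCounts.getOrDefault(ord, 0L));
        }
        return results;
    }
}
```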
    • buildEmptyAggregation

      public InternalAggregation buildEmptyAggregation()
      Description copied from class: Aggregator
      Build an empty aggregation.
      Specified by:
      buildEmptyAggregation in class Aggregator
    • getLeafCollector

      protected LeafBucketCollector getLeafCollector(org.apache.lucene.index.LeafReaderContext ctx, LeafBucketCollector sub) throws java.io.IOException
      Description copied from class: AggregatorBase
      Collect results for this leaf.

      Most Aggregators will return a custom LeafBucketCollector that collects document information for every hit. Callers of this method will make sure to call collect for every hit, so any Aggregator that returns a custom LeafBucketCollector from this method runs in at best O(hits) time. See the sum Aggregator for a fairly straightforward example of this.

      Some Aggregators are able to correctly collect results on their own, without being iterated by the top level query or the rest of the aggregations framework. These aggregations collect what they need by calling methods on LeafReaderContext and then they return LeafBucketCollector.NO_OP_COLLECTOR to signal that they've done their own collection. These aggregations can do better than O(hits). See the min Aggregator for an example of an aggregation that does this. It happens to run in constant time in some cases.

      In other cases MinAggregator can't get correct results by taking the constant-time path, so instead it returns a custom LeafBucketCollector. This is fairly common for aggregations that have these fast paths because most of these fast paths are only possible when the aggregation is at the root of the tree.

      It's also useful to look at the filters Aggregator, which chooses whether or not it can use the fast path before building the Aggregator rather than on each leaf. Either approach is fine.

      Specified by:
      getLeafCollector in class AggregatorBase
      Throws:
      java.io.IOException
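The two strategies described above (a per-hit collector versus a fast path that returns a no-op collector) can be contrasted in a hypothetical plain-Java model. `LeafCollector`, `NO_OP`, the `values` array and the precomputed `segmentMin` are stand-ins, not Lucene or Elasticsearch types; `canUseFastPath` is an assumed flag standing in for whatever conditions (such as being at the root of the tree) the real aggregator checks.

```java
import java.util.OptionalDouble;

/** Hypothetical sketch contrasting per-hit collection with a no-op fast path. */
class LeafCollectorSketch {

    /** Stand-in for LeafBucketCollector: called once per matching doc. */
    interface LeafCollector {
        void collect(int doc, long bucket);
    }

    static final LeafCollector NO_OP = (doc, bucket) -> { };

    /** Sum-style aggregator: must look at every hit, so O(hits). */
    static final class SumAgg {
        double sum;

        LeafCollector getLeafCollector(double[] values) {
            return (doc, bucket) -> sum += values[doc];
        }
    }

    /** Min-style aggregator: may read a precomputed per-segment minimum instead. */
    static final class MinAgg {
        double min = Double.POSITIVE_INFINITY;

        LeafCollector getLeafCollector(double[] values, OptionalDouble segmentMin, boolean canUseFastPath) {
            if (canUseFastPath && segmentMin.isPresent()) {
                min = Math.min(min, segmentMin.getAsDouble());
                return NO_OP; // result already known; nothing to do per hit
            }
            // Fallback: inspect each hit, just like the sum aggregator.
            return (doc, bucket) -> min = Math.min(min, values[doc]);
        }
    }
}
```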
    • doClose

      protected void doClose()
      Description copied from class: AggregatorBase
      Release instance-specific data.
      Overrides:
      doClose in class AggregatorBase
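Releasing instance-specific data through a doClose hook can be sketched as a template method on AutoCloseable. The classes below are hypothetical, and the idempotent-close guard is an assumption added for illustration, not documented AggregatorBase behaviour; `sampledDocs` stands in for whatever memory a sampling aggregator holds.

```java
/** Hypothetical sketch of a close() template that delegates to a doClose() hook. */
class CloseSketch {

    abstract static class BaseAggregator implements AutoCloseable {
        private boolean closed;

        @Override
        public final void close() {
            if (closed) {
                return; // assumed guard: releasing twice is a no-op
            }
            closed = true;
            doClose();
        }

        /** Subclasses release their own instance-specific data here. */
        protected void doClose() {
        }
    }

    static final class SamplingAggregator extends BaseAggregator {
        int[] sampledDocs = new int[1024]; // stand-in for memory held by the sampler
        int releaseCount;

        @Override
        protected void doClose() {
            sampledDocs = null;
            releaseCount++;
        }
    }
}
```

Used with try-with-resources, doClose runs exactly once when the block exits.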