Class BucketedSort

All Implemented Interfaces:
Closeable, AutoCloseable, Releasable
Direct Known Subclasses:
BucketedSort.ForDoubles, BucketedSort.ForFloats, BucketedSort.ForLongs

public abstract class BucketedSort extends Object implements Releasable
Type specialized sort implementations designed for use in aggregations. Aggregations have a couple of super interesting characteristics:
  • They can have many, many buckets so this implementation backs to BigArrays so it doesn't need to allocate any objects per bucket and the circuit breaker in BigArrays will automatically track memory usage and abort execution if it grows too large.
  • Its fairly common for a bucket to be collected but not returned so these implementations delay as much work as possible until collection

Every bucket is in one of two states: "gathering" or min/max "heap". While "gathering" the next empty slot is stored in the "root" offset of the bucket and collecting a value is just adding it in the next slot bumping the tracking value at the root. So collecting values is O(1). Extracting the results in sorted order is O(n * log n) because, well, sorting is O(n * log n). When a bucket has collected bucketSize entries it is converted into a min "heap" in O(n) time. Or into max heap, if order is ascending.

Once a "heap", collecting a document is the heap-standard O(log n) worst case. Critically, it is a very fast O(1) to check if a value is competitive at all which, so long as buckets aren't hit in reverse order, they mostly won't be. Extracting results in sorted order is still O(n * log n).

When we first collect a bucket we make sure that we've allocated enough slots to hold all sort values for the entire bucket. In other words: the storage is "dense" and we don't try to save space when storing partially filled buckets.

We actually *oversize* the allocations (like BigArrays.overSize(long)) to get amortized linear number of allocations and to play well with our paged arrays.

  • Field Details

  • Constructor Details

  • Method Details

    • getOrder

      public final SortOrder getOrder()
      The order of the sort.
    • getFormat

      public final DocValueFormat getFormat()
      The format to use when presenting the values.
    • getBucketSize

      public int getBucketSize()
      The number of values to store per bucket.
    • getValues

      public final <T extends Comparable<T>> List<T> getValues(long bucket, BucketedSort.ResultBuilder<T> builder) throws IOException
      Get the values for a bucket if it has been collected. If it hasn't then returns an empty list.
      builder - builds results. See BucketedSort.ExtraData for how to store data along side the sort for this to extract.
    • getValues

      public final List<SortValue> getValues(long bucket) throws IOException
      Get the values for a bucket if it has been collected. If it hasn't then returns an empty array.
    • inHeapMode

      public boolean inHeapMode(long bucket)
      Is this bucket a min heap true or in gathering mode false?
    • forLeaf

      public abstract BucketedSort.Leaf forLeaf(org.apache.lucene.index.LeafReaderContext ctx) throws IOException
      Get the BucketedSort.Leaf implementation that'll do that actual collecting.
      IOException - most implementations need to perform IO to prepare for each leaf
    • needsScores

      public abstract boolean needsScores()
      Does this sort need scores? Most don't, but sorting on _score does.
    • values

      protected abstract BigArray values()
      The BigArray backing this sort.
    • growValues

      protected abstract void growValues(long minSize)
      Grow the BigArray backing this sort to account for new buckets. This will only be called if the array is too small.
    • getNextGatherOffset

      protected abstract int getNextGatherOffset(long rootIndex)
      Get the next index that should be "gathered" for a bucket rooted at rootIndex.
    • setNextGatherOffset

      protected abstract void setNextGatherOffset(long rootIndex, int offset)
      Set the next index that should be "gathered" for a bucket rooted at rootIndex.
    • getValue

      protected abstract SortValue getValue(long index)
      Get the value at an index.
    • betterThan

      protected abstract boolean betterThan(long lhs, long rhs)
      true if the entry at index lhs is "better" than the entry at rhs. "Better" in this means "lower" for SortOrder.ASC and "higher" for SortOrder.DESC.
    • swap

      protected abstract void swap(long lhs, long rhs)
      Swap the data at two indices.
    • debugFormat

      protected final String debugFormat()
      Return a fairly human readable representation of the array backing the sort.

      This is intentionally not a Object.toString() implementation because it'll be quite slow.

    • initGatherOffsets

      protected final void initGatherOffsets()
      Initialize the gather offsets after setting up values. Subclasses should call this once, after setting up their values().
    • close

      public final void close()
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Specified by:
      close in interface Releasable