java.lang.Object

org.elasticsearch.search.sort.BucketedSort

All Implemented Interfaces: java.io.Closeable, java.lang.AutoCloseable, Releasable

Direct Known Subclasses: BucketedSort.ForDoubles, BucketedSort.ForFloats, BucketedSort.ForLongs

public abstract class BucketedSort
extends java.lang.Object
implements Releasable

Type specialized sort implementations designed for use in aggregations. Aggregations have a couple of super interesting characteristics:

- They can have many, many buckets, so this implementation is backed by BigArrays. That way it doesn't need to allocate any objects per bucket, and the circuit breaker in BigArrays will automatically track memory usage and abort execution if it grows too large.
- It's fairly common for a bucket to be collected but not returned, so these implementations delay as much work as possible until the results are actually extracted.

Every bucket is in one of two states: "gathering" or min/max "heap". While "gathering", the next empty slot is stored in the "root" offset of the bucket, and collecting a value just writes it into that slot and bumps the tracking value at the root. So collecting values is O(1). Extracting the results in sorted order is O(n * log n) because, well, sorting is O(n * log n). When a bucket has collected bucketSize entries, it is converted into a min "heap" in O(n) time, or into a max heap if the order is ascending.
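
The gathering phase and the O(n) conversion to a heap can be sketched for a single bucket as follows. This is a minimal illustration, not the Elasticsearch implementation: it assumes plain `long` values in a Java array, a descending sort (so the full bucket becomes a min heap), and, as one possible layout, fills slots from the end of the bucket toward the root so that the tracking offset kept in the root slot is the last thing overwritten. The class name `GatheringBucketSketch` is hypothetical.

```java
/**
 * Hypothetical sketch of one bucket in "gathering" mode that turns into a min heap.
 * Assumptions: long values, descending sort (keep the largest bucketSize values),
 * and the next gather offset is stored in the root slot while gathering.
 */
final class GatheringBucketSketch {
    private final long[] slots;
    private boolean heapMode = false;

    GatheringBucketSketch(int bucketSize) {
        slots = new long[bucketSize];
        // Gather from the last slot toward the root, so the root slot (which holds
        // the tracking offset) is the final slot to receive a real value.
        slots[0] = bucketSize - 1;
    }

    boolean inHeapMode() {
        return heapMode;
    }

    /** O(1) while gathering: write into the next slot and bump the offset at the root. */
    void gather(long value) {
        if (heapMode) {
            throw new IllegalStateException("bucket is already a heap");
        }
        int next = (int) slots[0];
        slots[next] = value;
        if (next == 0) {
            // Every slot is now a real value: convert the bucket to a min heap in O(n).
            heapify();
            heapMode = true;
        } else {
            slots[0] = next - 1;
        }
    }

    /** The smallest kept value; only meaningful once the bucket is in heap mode. */
    long root() {
        return slots[0];
    }

    /** Bottom-up heap construction: sift down each non-leaf node, O(n) in total. */
    private void heapify() {
        for (int i = slots.length / 2 - 1; i >= 0; i--) {
            siftDown(i);
        }
    }

    private void siftDown(int i) {
        while (true) {
            int smallest = i;
            int left = 2 * i + 1;
            int right = left + 1;
            if (left < slots.length && slots[left] < slots[smallest]) smallest = left;
            if (right < slots.length && slots[right] < slots[smallest]) smallest = right;
            if (smallest == i) return;
            long tmp = slots[i];
            slots[i] = slots[smallest];
            slots[smallest] = tmp;
            i = smallest;
        }
    }
}
```

For an ascending sort the same sketch would compare with `>` instead of `<`, producing the max heap described above.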

Once a bucket is a "heap", collecting a document is the heap-standard O(log n) worst case. Critically, checking whether a value is competitive at all is a very fast O(1) comparison against the root of the heap, and so long as buckets aren't hit in reverse order, most values won't be competitive. Extracting results in sorted order is still O(n * log n).
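
The O(1) competitiveness check and the O(log n) replacement can be sketched like this, under hypothetical assumptions: `long` values, a descending sort, and a min heap whose root is the least competitive value kept so far. `HeapBucketSketch` is an illustrative name, not part of the API.

```java
/**
 * Hypothetical sketch of collecting into a bucket that is already a heap.
 * Assumptions: long values, descending sort, bucketSize >= 1, so the heap is a
 * min heap whose root is the worst value currently kept.
 */
final class HeapBucketSketch {
    private final long[] heap;

    /** Builds a min heap from a full bucket (bottom-up, O(n)). */
    HeapBucketSketch(long[] fullBucket) {
        heap = fullBucket.clone();
        for (int i = heap.length / 2 - 1; i >= 0; i--) {
            siftDown(i);
        }
    }

    /** O(1): a value is competitive only if it beats the worst kept value, the root. */
    boolean isCompetitive(long value) {
        return value > heap[0];
    }

    /** Returns true if the value was kept; worst case O(log n). */
    boolean collect(long value) {
        if (!isCompetitive(value)) {
            return false; // the common, cheap path
        }
        heap[0] = value; // replace the worst entry...
        siftDown(0);     // ...and restore the min-heap property
        return true;
    }

    long worstKept() {
        return heap[0];
    }

    private void siftDown(int i) {
        while (true) {
            int smallest = i;
            int left = 2 * i + 1;
            int right = left + 1;
            if (left < heap.length && heap[left] < heap[smallest]) smallest = left;
            if (right < heap.length && heap[right] < heap[smallest]) smallest = right;
            if (smallest == i) return;
            long tmp = heap[i];
            heap[i] = heap[smallest];
            heap[smallest] = tmp;
            i = smallest;
        }
    }
}
```

Most calls to `collect` return after the single comparison in `isCompetitive`; only values that beat the root pay for `siftDown`.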

When we first collect a bucket, we make sure that we've allocated enough slots to hold all sort values for the entire bucket. In other words, the storage is "dense": we don't try to save space when storing partially filled buckets.

We actually *oversize* the allocations (like BigArrays.overSize(long)) to get an amortized linear number of allocations and to play well with our paged arrays.
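
Dense, oversized storage for many buckets can be sketched with one flat array in which bucket `b` owns slots `[b * bucketSize, (b + 1) * bucketSize)`. This is a hedged illustration: a plain `long[]` stands in for a BigArrays-backed array, and the growth rule (add 1/8 headroom, then round up to a hypothetical 1024-element page) is an assumption, not the formula BigArrays.overSize(long) actually uses. `DenseBucketStorageSketch` is a hypothetical name.

```java
/**
 * Hypothetical sketch of dense, oversized storage for many buckets in one array.
 * Assumptions: a long[] replaces the BigArrays-backed array, and overSize() is an
 * illustrative growth policy, not BigArrays.overSize(long).
 */
final class DenseBucketStorageSketch {
    static final int PAGE_SIZE = 1024; // hypothetical page size, in elements

    private final int bucketSize;
    private long[] values = new long[0];
    int allocations = 0; // counts reallocations, for illustration only

    DenseBucketStorageSketch(int bucketSize) {
        this.bucketSize = bucketSize;
    }

    /** Bucket b is rooted at b * bucketSize and owns the next bucketSize slots. */
    long rootIndex(long bucket) {
        return bucket * bucketSize;
    }

    /** Allocate every slot of the bucket up front, even if it is never filled ("dense"). */
    void ensureBucket(long bucket) {
        long required = rootIndex(bucket) + bucketSize;
        if (values.length >= required) {
            return;
        }
        values = java.util.Arrays.copyOf(values, Math.toIntExact(overSize(required)));
        allocations++;
    }

    /** Hypothetical policy: add 1/8 headroom, then round up to a whole page. */
    static long overSize(long minSize) {
        long withHeadroom = minSize + (minSize >>> 3);
        return ((withHeadroom + PAGE_SIZE - 1) / PAGE_SIZE) * PAGE_SIZE;
    }

    int capacity() {
        return values.length;
    }
}
```

With `bucketSize = 10`, touching buckets 0 through 999 in order triggers nine reallocations in this sketch rather than one per bucket, because each growth step adds headroom proportional to the requested size.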

Nested Class Summary

Nested Classes
Modifier and Type	Class	Description
`static interface`	`BucketedSort.ExtraData`	Callbacks for storing extra data along with competitive sorts.
`static class`	`BucketedSort.ForDoubles`	Superclass for implementations of BucketedSort for `double` keys.
`static class`	`BucketedSort.ForFloats`	Superclass for implementations of BucketedSort for `float` keys.
`static class`	`BucketedSort.ForLongs`	Superclass for implementations of BucketedSort for `long` keys.
`class`	`BucketedSort.Leaf`	Performs the actual collection against a LeafReaderContext.
`static interface`	`BucketedSort.ResultBuilder<T>`	Used with `getValues(long, ResultBuilder)` to build results from the sorting operation.

Field Summary

Fields
Modifier and Type	Field	Description
`protected BigArrays`	`bigArrays`
`protected BucketedSort.ExtraData`	`extra`
`static BucketedSort.ExtraData`	`NOOP_EXTRA_DATA`	An implementation of BucketedSort.ExtraData that does nothing.

Constructor Summary

Constructors
Modifier	Constructor	Description
`protected`	`BucketedSort(BigArrays bigArrays, SortOrder order, DocValueFormat format, int bucketSize, BucketedSort.ExtraData extra)`	

Method Summary

Modifier and Type	Method	Description
`protected abstract boolean`	`betterThan(long lhs, long rhs)`	`true` if the entry at index `lhs` is "better" than the entry at `rhs`.
`void`	`close()`
`protected java.lang.String`	`debugFormat()`	Return a fairly human readable representation of the array backing the sort.
`abstract BucketedSort.Leaf`	`forLeaf(org.apache.lucene.index.LeafReaderContext ctx)`	Get the BucketedSort.Leaf implementation that'll do the actual collecting.
`int`	`getBucketSize()`	The number of values to store per bucket.
`DocValueFormat`	`getFormat()`	The format to use when presenting the values.
`protected abstract int`	`getNextGatherOffset(long rootIndex)`	Get the next index that should be "gathered" for a bucket rooted at `rootIndex`.
`SortOrder`	`getOrder()`	The order of the sort.
`protected abstract SortValue`	`getValue(long index)`	Get the value at an index.
`java.util.List<SortValue>`	`getValues(long bucket)`	Get the values for a bucket if it has been collected.
`<T extends java.lang.Comparable<T>> java.util.List<T>`	`getValues(long bucket, BucketedSort.ResultBuilder<T> builder)`	Get the values for a bucket if it has been collected.
`protected abstract void`	`growValues(long minSize)`	Grow the BigArray backing this sort to account for new buckets.
`boolean`	`inHeapMode(long bucket)`	Is this bucket a heap (`true`) or in gathering mode (`false`)?
`protected void`	`initGatherOffsets()`	Initialize the gather offsets after setting up values.
`abstract boolean`	`needsScores()`	Does this sort need scores? Most don't, but sorting on `_score` does.
`protected abstract void`	`setNextGatherOffset(long rootIndex, int offset)`	Set the next index that should be "gathered" for a bucket rooted at `rootIndex`.
`protected abstract void`	`swap(long lhs, long rhs)`	Swap the data at two indices.
`protected abstract BigArray`	`values()`	The BigArray backing this sort.

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- NOOP_EXTRA_DATA
  
  public static final BucketedSort.ExtraData NOOP_EXTRA_DATA
  
  An implementation of BucketedSort.ExtraData that does nothing.
- bigArrays
  
  protected final BigArrays bigArrays
- extra
  
  protected final BucketedSort.ExtraData extra
Constructor Details
- BucketedSort
  
  protected BucketedSort(BigArrays bigArrays, SortOrder order, DocValueFormat format, int bucketSize, BucketedSort.ExtraData extra)
Method Details
- getOrder
  
  public final SortOrder getOrder()
  
  The order of the sort.
- getFormat
  
  public final DocValueFormat getFormat()
  
  The format to use when presenting the values.
- getBucketSize
  
  public int getBucketSize()
  
  The number of values to store per bucket.
- getValues
  
  public final <T extends java.lang.Comparable<T>> java.util.List<T> getValues(long bucket, BucketedSort.ResultBuilder<T> builder) throws java.io.IOException
  
  Get the values for a bucket if it has been collected. If it hasn't, returns an empty list.
  
  Parameters:
  
  builder - builds results. See BucketedSort.ExtraData for how to store data alongside the sort for this to extract.
  
  Throws:
  
  java.io.IOException
- getValues
  
  public final java.util.List<SortValue> getValues(long bucket) throws java.io.IOException
  
  Get the values for a bucket if it has been collected. If it hasn't, returns an empty list.
  
  Throws:
  
  java.io.IOException
- inHeapMode
  
  public boolean inHeapMode(long bucket)
  
  Is this bucket a heap (true) or in gathering mode (false)?
- forLeaf
  
  public abstract BucketedSort.Leaf forLeaf(org.apache.lucene.index.LeafReaderContext ctx) throws java.io.IOException
  
  Get the BucketedSort.Leaf implementation that'll do the actual collecting.
  
  Throws:
  
  java.io.IOException - most implementations need to perform IO to prepare for each leaf
- needsScores
  
  public abstract boolean needsScores()
  
  Does this sort need scores? Most don't, but sorting on _score does.
- values
  
  protected abstract BigArray values()
  
  The BigArray backing this sort.
- growValues
  
  protected abstract void growValues(long minSize)
  
  Grow the BigArray backing this sort to account for new buckets. This will only be called if the array is too small.
- getNextGatherOffset
  
  protected abstract int getNextGatherOffset(long rootIndex)
  
  Get the next index that should be "gathered" for a bucket rooted at rootIndex.
- setNextGatherOffset
  
  protected abstract void setNextGatherOffset(long rootIndex, int offset)
  
  Set the next index that should be "gathered" for a bucket rooted at rootIndex.
- getValue
  
  protected abstract SortValue getValue(long index)
  
  Get the value at an index.
- betterThan
  
  protected abstract boolean betterThan(long lhs, long rhs)
  
  true if the entry at index lhs is "better" than the entry at rhs. "Better" in this context means "lower" for SortOrder.ASC and "higher" for SortOrder.DESC.
- swap
  
  protected abstract void swap(long lhs, long rhs)
  
  Swap the data at two indices.
- debugFormat
  
  protected final java.lang.String debugFormat()
  
  Return a fairly human readable representation of the array backing the sort.
  This is intentionally not an Object.toString() implementation because it'll be quite slow.
- initGatherOffsets
  
  protected final void initGatherOffsets()
  
  Initialize the gather offsets after setting up values. Subclasses should call this once, after setting up their values().
- close
  
  public final void close()
  
  Specified by:
  
  close in interface java.lang.AutoCloseable
  
  Specified by:
  
  close in interface java.io.Closeable
  
  Specified by:
  
  close in interface Releasable
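
The extraction contract of getValues and the ordering contract of betterThan can be sketched outside the class hierarchy as follows. Everything here is hypothetical: `ExtractSketch` and its `Order` enum stand in for BucketedSort and SortOrder, a `null` array stands in for a bucket that was never collected, and the sketch simply sorts a copy of the bucket's values. That matches the documented O(n * log n) extraction cost without claiming to be how the real class extracts results.

```java
/**
 * Hypothetical sketch of getValues/betterThan semantics for long values.
 * Order stands in for SortOrder; a null array models an uncollected bucket.
 */
final class ExtractSketch {
    enum Order { ASC, DESC }

    /** Mirrors the documented contract: "lower" wins for ASC, "higher" wins for DESC. */
    static boolean betterThan(Order order, long lhs, long rhs) {
        return order == Order.ASC ? lhs < rhs : lhs > rhs;
    }

    /** Returns the bucket's values best-first; an uncollected bucket yields an empty list. */
    static java.util.List<Long> getValues(Order order, long[] filledSlots) {
        if (filledSlots == null) {
            return java.util.Collections.emptyList();
        }
        java.util.List<Long> result = new java.util.ArrayList<>();
        for (long v : filledSlots) {
            result.add(v); // copy, so the caller's array (e.g. a heap) is not reordered
        }
        // O(n log n): order entries so that "better" values come first.
        result.sort((a, b) -> betterThan(order, a, b) ? -1 : betterThan(order, b, a) ? 1 : 0);
        return result;
    }
}
```

Because the sort works on a copy, a bucket could in principle keep collecting after its values are read; whether the real implementation allows that is not stated here.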
