Class ValuesSource.Bytes.WithOrdinals

Direct Known Subclasses:
ValuesSource.Bytes.WithOrdinals.FieldData
Enclosing class:
ValuesSource.Bytes

public abstract static class ValuesSource.Bytes.WithOrdinals extends ValuesSource.Bytes
Specialization of ValuesSource.Bytes whose underlying storage de-duplicates its bytes by storing them in a per-leaf sorted lookup table. Aggregations that are aware of these lookup tables can operate directly on a value's position in the table, known as its "ordinal". They can later translate the ordinal back into the BytesRef value.
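A per-leaf sorted lookup table can be illustrated with a small, self-contained sketch that does not use Lucene. The names `LeafOrdinals`, `ordOf`, `lookup`, and `countsByValue` are hypothetical and chosen only for illustration. The sketch de-duplicates and sorts one leaf's values so that each value's index in the table is its ordinal. It then counts documents per value by working only on ordinals, translating them back to values at the end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of one leaf's de-duplicated, sorted lookup table.
// Not Lucene or Elasticsearch API; for illustration only.
class LeafOrdinals {
    private final String[] table;   // sorted, distinct values; index == ordinal
    private final int[][] docOrds;  // per-document ordinals into table

    LeafOrdinals(List<List<String>> docs) {
        TreeSet<String> distinct = new TreeSet<>();
        for (List<String> doc : docs) {
            distinct.addAll(doc);
        }
        table = distinct.toArray(new String[0]);
        docOrds = new int[docs.size()][];
        for (int d = 0; d < docs.size(); d++) {
            // De-duplicate and sort each document's ordinals, like a sorted set.
            docOrds[d] = docs.get(d).stream()
                    .mapToInt(this::ordOf)
                    .distinct()
                    .sorted()
                    .toArray();
        }
    }

    int ordOf(String value) {
        return Arrays.binarySearch(table, value); // works because table is sorted
    }

    String lookup(int ord) {
        return table[ord];
    }

    // Counts documents per value using only ordinals, then translates them.
    List<String> countsByValue() {
        long[] counts = new long[table.length];
        for (int[] ords : docOrds) {
            for (int ord : ords) {
                counts[ord]++;
            }
        }
        List<String> out = new ArrayList<>();
        for (int ord = 0; ord < counts.length; ord++) {
            out.add(lookup(ord) + "=" + counts[ord]);
        }
        return out;
    }

    public static void main(String[] args) {
        LeafOrdinals leaf = new LeafOrdinals(List.of(
                List.of("b", "a"), List.of("d"), List.of("b", "d")));
        System.out.println(leaf.countsByValue()); // [a=1, b=2, d=2]
    }
}
```

The counting loop touches only small integers; the string values are looked up once per distinct ordinal rather than once per document.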
  • Field Details

  • Constructor Details

    • WithOrdinals

      public WithOrdinals()
  • Method Details

    • docsWithValue

      public DocValueBits docsWithValue(org.apache.lucene.index.LeafReaderContext context) throws IOException
      Description copied from class: ValuesSource
      Get a "has any values" view into the values. It tries to pick the "most native" way to check whether there are any values, but it builds its own view into the values. If you need any of the actual values, it's best to use something like ValuesSource.bytesValues(org.apache.lucene.index.LeafReaderContext) or ValuesSource.Numeric.doubleValues(org.apache.lucene.index.LeafReaderContext). If you only need to know whether there are any values, use this.
      Overrides:
      docsWithValue in class ValuesSource.Bytes
      Throws:
      IOException
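The difference between a presence-only view and reading actual values can be sketched as follows. `HasValueView` is a hypothetical stand-in loosely shaped like a "has any values" view (one `advanceExact(int)` that reports whether a document has a value). It is not the real `DocValueBits` class and, unlike the real method, it does not throw `IOException`. The view is built from per-document ordinals and never consults the lookup table.

```java
import java.util.BitSet;

// Hypothetical stand-in for a "has any values" view; not the real DocValueBits class.
interface HasValueView {
    // Reports whether doc has at least one value.
    boolean advanceExact(int doc);
}

class HasValueDemo {
    // Builds a presence-only view from per-document ordinal arrays.
    // The lookup table is never touched, so no values are decoded.
    static HasValueView fromOrds(int[][] docOrds) {
        BitSet present = new BitSet(docOrds.length);
        for (int doc = 0; doc < docOrds.length; doc++) {
            if (docOrds[doc].length > 0) {
                present.set(doc);
            }
        }
        return present::get;
    }

    // Counts documents in [0, maxDoc) that have any value.
    static int countDocsWithValue(HasValueView view, int maxDoc) {
        int count = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            if (view.advanceExact(doc)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[][] docOrds = { {0, 1}, {}, {2}, {} };
        System.out.println(countDocsWithValue(fromOrds(docOrds), docOrds.length)); // 2
    }
}
```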
    • ordinalsValues

      public abstract org.apache.lucene.index.SortedSetDocValues ordinalsValues(org.apache.lucene.index.LeafReaderContext context) throws IOException
      Get a view into the leaf's ordinals and their BytesRef values.

      Use DocValuesIterator.advanceExact(int), SortedSetDocValues.getValueCount(), and SortedSetDocValues.nextOrd() to fetch the ordinals. Use SortedSetDocValues.lookupOrd(long) to convert from the ordinal number into the BytesRef value. Make sure to copy the result if you need to keep it.
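The read pattern above, including why the looked-up value must be copied, can be sketched with a hypothetical in-memory stand-in. `FakeSortedSet` is loosely shaped like `SortedSetDocValues` but is not the real class: it uses `String` and a reused `StringBuilder` in place of `BytesRef`, and it throws no `IOException`. The stand-in exposes a per-document `docValueCount()` so the reader knows when to stop calling `nextOrd()`. Depending on the Lucene version you compile against, the real class either offers a similar per-document count or signals the end with a sentinel, so check that version before copying the loop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in shaped loosely like Lucene's SortedSetDocValues.
class FakeSortedSet {
    private final String[] table;  // sorted distinct values; index == ordinal
    private final int[][] docOrds; // ascending ordinals per document
    private final StringBuilder scratch = new StringBuilder(); // reused, like a shared BytesRef
    private int doc = -1;
    private int next;

    FakeSortedSet(String[] table, int[][] docOrds) {
        this.table = table;
        this.docOrds = docOrds;
    }

    long getValueCount() { return table.length; }       // size of the lookup table

    boolean advanceExact(int target) {                  // true if target has values
        doc = target;
        next = 0;
        return docOrds[target].length > 0;
    }

    int docValueCount() { return docOrds[doc].length; } // ordinals on the current doc

    long nextOrd() { return docOrds[doc][next++]; }

    // Returns a shared buffer that the next call overwrites.
    CharSequence lookupOrd(long ord) {
        scratch.setLength(0);
        scratch.append(table[(int) ord]);
        return scratch;
    }
}

class OrdinalsIterationDemo {
    // Collects every value on doc, copying each one before the buffer is reused.
    static List<String> valuesOf(FakeSortedSet values, int doc) {
        List<String> out = new ArrayList<>();
        if (values.advanceExact(doc)) {
            for (int i = 0; i < values.docValueCount(); i++) {
                long ord = values.nextOrd();
                out.add(values.lookupOrd(ord).toString()); // copy, don't keep the view
            }
        }
        return out;
    }

    // The bug that copying prevents: every entry aliases the same buffer.
    static List<CharSequence> valuesOfWithoutCopy(FakeSortedSet values, int doc) {
        List<CharSequence> out = new ArrayList<>();
        if (values.advanceExact(doc)) {
            for (int i = 0; i < values.docValueCount(); i++) {
                out.add(values.lookupOrd(values.nextOrd()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        FakeSortedSet values = new FakeSortedSet(
                new String[] {"a", "b", "d"},
                new int[][] { {0, 2}, {}, {1} });
        System.out.println(values.getValueCount());         // 3
        System.out.println(valuesOf(values, 0));            // [a, d]
        System.out.println(valuesOf(values, 1));            // []
        System.out.println(valuesOfWithoutCopy(values, 0)); // [d, d]
    }
}
```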

      Each leaf may have a different ordinal for the same byte array. Imagine, for example, an index where one leaf has the values "a", "b", "d" and another leaf has the values "b", "c", "d". "a" has the ordinal 0 in the first leaf and doesn't exist in the second leaf. "b" has the ordinal 1 in the first leaf and 0 in the second leaf. "c" doesn't exist in the first leaf and has the ordinal 1 in the second leaf. And "d" gets the ordinal 2 in both leaves.
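The two-leaf example can be checked with a short sketch. `PerLeafOrdinals` and `segmentOrd` are hypothetical names, and using -1 for "value not present in this leaf" is a convention of the sketch only. Each leaf's ordinal is simply the value's index in that leaf's own sorted table.

```java
import java.util.Arrays;

// Reproduces the two-leaf example: each leaf's ordinals come from its own
// sorted table, so the same value can have different ordinals per leaf.
class PerLeafOrdinals {
    // Returns the ordinal of value in a sorted leaf table, or -1 if absent.
    static int segmentOrd(String[] sortedLeaf, String value) {
        int idx = Arrays.binarySearch(sortedLeaf, value);
        return idx >= 0 ? idx : -1;
    }

    public static void main(String[] args) {
        String[] leaf0 = {"a", "b", "d"};
        String[] leaf1 = {"b", "c", "d"};
        for (String v : new String[] {"a", "b", "c", "d"}) {
            System.out.println(v + ": leaf0=" + segmentOrd(leaf0, v)
                    + " leaf1=" + segmentOrd(leaf1, v));
        }
        // a: leaf0=0 leaf1=-1
        // b: leaf0=1 leaf1=0
        // c: leaf0=-1 leaf1=1
        // d: leaf0=2 leaf1=2
    }
}
```

Note that ordinal 1 means "b" in the first leaf but "c" in the second, which is why raw segment ordinals cannot be compared across leaves.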

      If you have to compare the ordinals of values from different segments, you need to merge them somehow. globalOrdinalsValues(org.apache.lucene.index.LeafReaderContext) provides such a merge at the cost of longer startup times after the index has been modified.

      Throws:
      IOException
    • globalOrdinalsValues

      public abstract org.apache.lucene.index.SortedSetDocValues globalOrdinalsValues(org.apache.lucene.index.LeafReaderContext context) throws IOException
      Get a "global" view into the leaf's ordinals. This can require constructing a fairly large set of lookups in memory, so prefer ordinalsValues(org.apache.lucene.index.LeafReaderContext) unless you need the global view.

      This functions just like ordinalsValues(org.apache.lucene.index.LeafReaderContext) except that the ordinals that SortedSetDocValues.nextOrd() and SortedSetDocValues.lookupOrd(long) operate on are "global" to all segments in the shard. They are ordinals into a lookup table containing all values on the shard.

      Compare this to the example in the docs for ordinalsValues(org.apache.lucene.index.LeafReaderContext). Imagine, again, an index where one leaf has the values "a", "b", "d" and another leaf has the values "b", "c", "d". The global ordinal for "a" is 0. The global ordinal for "b" is 1. The global ordinal for "c" is 2. And the global ordinal for "d" is, you guessed it, 3.
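The global numbering for the same two leaves can be reproduced with a minimal sketch. `GlobalOrdinalsSketch` is a hypothetical name, and the sketch materializes the full union of values with a `TreeSet` for clarity. A production ordinal map would more likely stream-merge the sorted per-leaf tables instead. Alongside the global table, the sketch records, per leaf, which global ordinal each segment ordinal maps to.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical sketch of building global ordinals for the two-leaf example.
class GlobalOrdinalsSketch {
    final String[] globalTable;     // sorted union; index == global ordinal
    final long[][] segmentToGlobal; // [leaf][segment ordinal] -> global ordinal

    GlobalOrdinalsSketch(String[][] sortedLeaves) {
        TreeSet<String> union = new TreeSet<>();
        for (String[] leaf : sortedLeaves) {
            union.addAll(Arrays.asList(leaf));
        }
        globalTable = union.toArray(new String[0]);
        segmentToGlobal = new long[sortedLeaves.length][];
        for (int l = 0; l < sortedLeaves.length; l++) {
            String[] leaf = sortedLeaves[l];
            segmentToGlobal[l] = new long[leaf.length];
            for (int ord = 0; ord < leaf.length; ord++) {
                // Every leaf value is in the union, so the search always succeeds.
                segmentToGlobal[l][ord] = Arrays.binarySearch(globalTable, leaf[ord]);
            }
        }
    }

    public static void main(String[] args) {
        GlobalOrdinalsSketch g = new GlobalOrdinalsSketch(new String[][] {
                {"a", "b", "d"}, {"b", "c", "d"}});
        System.out.println(Arrays.toString(g.globalTable));        // [a, b, c, d]
        System.out.println(Arrays.toString(g.segmentToGlobal[0])); // [0, 1, 3]
        System.out.println(Arrays.toString(g.segmentToGlobal[1])); // [1, 2, 3]
    }
}
```

Building the union touches every distinct value, which is consistent with the runtime note about global ordinals being roughly proportional to the number of distinct values.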

      This makes comparing values from different segments much simpler, but it comes with a fairly high memory cost and a substantial performance hit the first time this method is called after the index is modified. If the global ordinals lookup hasn't been built yet, this method's runtime is roughly proportional to the number of distinct values in the field. If there are very few distinct values, the runtime will be dominated by factors related to the number of segments, but in that case it will usually be fast enough that you won't care.

      Throws:
      IOException
    • supportsGlobalOrdinalsMapping

      public abstract boolean supportsGlobalOrdinalsMapping()
      Whether this values source is able to provide a mapping between global and segment ordinals, by returning the underlying OrdinalMap. If this method returns false, then calling globalOrdinalsMapping(org.apache.lucene.index.LeafReaderContext) will result in an UnsupportedOperationException.
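A caller can guard on this method before asking for the mapping. The sketch below assumes a hypothetical `OrdinalsSource` interface that mirrors only the two relevant methods. It takes a leaf index instead of a `LeafReaderContext` and is not the Elasticsearch class. The unsupported implementation throws `UnsupportedOperationException`, as described above, and the caller never reaches it.

```java
import java.util.function.LongUnaryOperator;

// Hypothetical stand-in for the two related methods; not the Elasticsearch class.
interface OrdinalsSource {
    boolean supportsGlobalOrdinalsMapping();

    // Throws UnsupportedOperationException when the method above returns false.
    LongUnaryOperator globalOrdinalsMapping(int leaf);
}

class MappingGuardDemo {
    // Uses the mapping when available; otherwise signals that the caller
    // must fall back to another strategy.
    static String describe(OrdinalsSource source) {
        if (!source.supportsGlobalOrdinalsMapping()) {
            return "fallback";
        }
        LongUnaryOperator toGlobal = source.globalOrdinalsMapping(0);
        return "mapped 0 -> " + toGlobal.applyAsLong(0);
    }

    public static void main(String[] args) {
        OrdinalsSource withMap = new OrdinalsSource() {
            public boolean supportsGlobalOrdinalsMapping() { return true; }
            public LongUnaryOperator globalOrdinalsMapping(int leaf) { return ord -> ord + 1; }
        };
        OrdinalsSource withoutMap = new OrdinalsSource() {
            public boolean supportsGlobalOrdinalsMapping() { return false; }
            public LongUnaryOperator globalOrdinalsMapping(int leaf) {
                throw new UnsupportedOperationException("no ordinal map");
            }
        };
        System.out.println(describe(withMap));    // mapped 0 -> 1
        System.out.println(describe(withoutMap)); // fallback
    }
}
```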
    • hasOrdinals

      public boolean hasOrdinals()
      Description copied from class: ValuesSource
      Check if this values source supports using global and segment ordinals.

      If this returns true then it is safe to cast it to ValuesSource.Bytes.WithOrdinals.

      Overrides:
      hasOrdinals in class ValuesSource
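The check-then-cast pattern can be sketched with a hypothetical miniature hierarchy. `MiniValuesSource`, `MiniBytes`, `MiniWithOrdinals`, and `valueCount` are illustrative names, not Elasticsearch classes. The base class answers false by default, and only the ordinal-backed subclass overrides `hasOrdinals()` to return true, which is what makes the cast safe.

```java
// Hypothetical miniature of the hierarchy; names are illustrative only.
abstract class MiniValuesSource {
    boolean hasOrdinals() { return false; } // default: no ordinals
}

class MiniBytes extends MiniValuesSource { }

class MiniWithOrdinals extends MiniValuesSource {
    private final String[] table;

    MiniWithOrdinals(String... sortedValues) { this.table = sortedValues; }

    @Override
    boolean hasOrdinals() { return true; } // promises the cast below is safe

    long valueCount() { return table.length; }
}

class HasOrdinalsDemo {
    // Picks the ordinal-based path only when hasOrdinals() says the cast is safe.
    static String plan(MiniValuesSource source) {
        if (source.hasOrdinals()) {
            MiniWithOrdinals withOrds = (MiniWithOrdinals) source;
            return "ordinals path, " + withOrds.valueCount() + " values";
        }
        return "generic bytes path";
    }

    public static void main(String[] args) {
        System.out.println(plan(new MiniWithOrdinals("a", "b", "d"))); // ordinals path, 3 values
        System.out.println(plan(new MiniBytes()));                     // generic bytes path
    }
}
```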
    • globalOrdinalsMapping

      public abstract LongUnaryOperator globalOrdinalsMapping(org.apache.lucene.index.LeafReaderContext context) throws IOException
      Returns a mapping from segment ordinals to global ordinals. This lets you post-process segment ordinals into global ordinals, which could save you a few lookups. Also, operating on segment ordinals is likely to produce a denser list of, say, counts.

      Anyone looking to use this strategy rather than looking up on the fly should benchmark well and update this documentation with what they learn.

      Throws:
      IOException
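The "count in segment ordinals, then map" strategy can be sketched with a `java.util.function.LongUnaryOperator`, the return type of this method. `SegmentThenGlobalCounts` and `collectLeaf` are hypothetical names, and the per-leaf mapping arrays are written by hand for the two-leaf example ("a", "b", "d" and "b", "c", "d"). In real use they would come from `globalOrdinalsMapping`. Counting happens in a dense array sized by the segment's value count, and only non-empty buckets are mapped to global ordinals afterwards.

```java
import java.util.Arrays;
import java.util.function.LongUnaryOperator;

// Sketch of counting in segment-ordinal space, then folding into global ordinals.
class SegmentThenGlobalCounts {
    static void collectLeaf(int[][] docOrds, int segmentValueCount,
                            LongUnaryOperator toGlobal, long[] globalCounts) {
        long[] segmentCounts = new long[segmentValueCount];
        for (int[] ords : docOrds) {
            for (int ord : ords) {
                segmentCounts[ord]++; // no global lookup per document
            }
        }
        // One mapping call per non-empty segment bucket, not per document.
        for (int ord = 0; ord < segmentCounts.length; ord++) {
            if (segmentCounts[ord] > 0) {
                globalCounts[(int) toGlobal.applyAsLong(ord)] += segmentCounts[ord];
            }
        }
    }

    public static void main(String[] args) {
        long[] leaf0Map = {0, 1, 3}; // a, b, d
        long[] leaf1Map = {1, 2, 3}; // b, c, d
        long[] globalCounts = new long[4];
        collectLeaf(new int[][] { {0, 1}, {2} }, 3, ord -> leaf0Map[(int) ord], globalCounts);
        collectLeaf(new int[][] { {0}, {0, 2}, {1} }, 3, ord -> leaf1Map[(int) ord], globalCounts);
        System.out.println(Arrays.toString(globalCounts)); // [1, 3, 1, 2]
    }
}
```

Whether this beats looking up global ordinals on the fly depends on the data; as the note above says, benchmark before relying on it.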
    • globalMaxOrd

      public long globalMaxOrd(org.apache.lucene.search.IndexSearcher indexSearcher) throws IOException
      Get the maximum global ordinal. This requires globalOrdinalsValues(org.apache.lucene.index.LeafReaderContext), so see the note there about its performance.
      Throws:
      IOException