Class TermVectorsFields

All Implemented Interfaces:
Iterable<String>

public final class TermVectorsFields extends org.apache.lucene.index.Fields
This class represents the result of a TermVectorsRequest. It works exactly like the Fields class except for one thing: it can return offsets and payloads even if positions are not present. You must still call nextPosition() to advance the counter, although this method only returns -1 if no positions were returned by the TermVectorsRequest.

The data is stored in two byte arrays (headerRef and termVectors, both BytesRef) that have the following format:

headerRef: Stores the offset of each field in the termVectors array, plus some header information, as a BytesRef. The format is:

  • String : "TV"
  • vint: version (=-1)
  • boolean: hasTermStatistics (are the term statistics stored?)
  • boolean: hasFieldStatistics (are the field statistics stored?)
  • vint: number of fields
    • String: field name 1
    • vint: offset in termVectors for field 1
    • ...
    • String: field name last field
    • vint: offset in termVectors for last field
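
The header layout above can be sketched as a small decoder. This is only an illustration, not the Elasticsearch implementation: the class HeaderSketch and its methods are hypothetical, a vint is assumed to use the usual 7-bits-per-byte encoding with the high bit as a continuation flag, and a String is assumed to be a vint length followed by that many single-byte characters (which only holds for ASCII field names).

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Elasticsearch: decodes a header laid out as described above.
final class HeaderSketch {
    private final byte[] buf;
    private int pos = 0;

    final int version;
    final boolean hasTermStatistics;
    final boolean hasFieldStatistics;
    // Field name -> offset of that field's block in termVectors, in header order.
    final Map<String, Integer> fieldOffsets = new LinkedHashMap<>();

    HeaderSketch(byte[] header) {
        this.buf = header;
        String magic = readString();
        if (!magic.equals("TV")) {
            throw new IllegalArgumentException("header does not start with \"TV\"");
        }
        version = readVInt();
        hasTermStatistics = readBoolean();
        hasFieldStatistics = readBoolean();
        int numFields = readVInt();
        for (int i = 0; i < numFields; i++) {
            String name = readString();
            int offset = readVInt();
            fieldOffsets.put(name, offset);
        }
    }

    // Assumed vint encoding: 7 payload bits per byte, low-order group first,
    // high bit set on every byte except the last.
    private int readVInt() {
        int value = 0;
        for (int shift = 0; ; shift += 7) {
            byte b = buf[pos++];
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
        }
    }

    private boolean readBoolean() {
        return buf[pos++] != 0;
    }

    // Assumed string encoding (ASCII only): vint length, then one byte per character.
    private String readString() {
        int length = readVInt();
        String s = new String(buf, pos, length, StandardCharsets.US_ASCII);
        pos += length;
        return s;
    }
}
```

Under these assumptions, the offset for a field is a lookup such as fieldOffsets.get("title"), which gives the position in termVectors at which that field's block starts.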

termVectors: Stores the actual term vectors as a BytesRef.

Term vectors are stored in blocks, one for each field. The offsets in headerRef are used to find where the block for a field starts. Each block begins with:

  • vint: number of terms
  • boolean: positions (are positions stored?)
  • boolean: offsets (are offsets stored?)
  • boolean: payloads (are payloads stored?)
If the field statistics were requested (hasFieldStatistics is true, see headerRef), the following numbers are stored:
  • vlong: sum of total term frequencies of the field (sumTotalTermFreq)
  • vlong: sum of document frequencies for each term (sumDocFreq)
  • vint: number of documents in the shard that have an entry for this field (docCount)

After that, for each term it stores

  • vint: term length
  • BytesRef: term name

If term statistics are requested (hasTermStatistics is true, see headerRef):

  • vint: document frequency (the number of documents that contain this term)
  • vlong: total term frequency (the total number of occurrences of this term across all documents)
After that:
  • vint: frequency (always returned)
    • vint: position_1 (if positions)
    • vint: startOffset_1 (if offset)
    • vint: endOffset_1 (if offset)
    • BytesRef: payload_1 (if payloads)
    • ...
    • vint: endOffset_frequency (if offset)
    • BytesRef: payload_frequency (if payloads)
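
As a rough illustration of the per-field block layout described above, the following sketch walks one block and collects its terms. It is not the Elasticsearch implementation: FieldBlockSketch and TermEntry are hypothetical names, payloads are assumed to be stored as a vint length followed by the payload bytes, and any extra encoding details the real writer may apply are not modeled. Positions and offsets that are not stored are reported as -1, mirroring the nextPosition() behavior described at the top of this page.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for one field block; not the Elasticsearch implementation.
final class FieldBlockSketch {
    // One decoded term: its text, frequency, and per-occurrence data (-1 where not stored).
    static final class TermEntry {
        String term;
        int freq;
        int[] positions;
        int[] startOffsets;
        int[] endOffsets;
    }

    private final byte[] buf;
    private int pos;

    private FieldBlockSketch(byte[] buf, int start) {
        this.buf = buf;
        this.pos = start;
    }

    // blockStart is the offset taken from headerRef for the field of interest.
    static List<TermEntry> parse(byte[] termVectors, int blockStart,
                                 boolean hasTermStatistics, boolean hasFieldStatistics) {
        FieldBlockSketch in = new FieldBlockSketch(termVectors, blockStart);
        int numTerms = (int) in.readVLong();
        boolean positions = in.readBoolean();
        boolean offsets = in.readBoolean();
        boolean payloads = in.readBoolean();
        if (hasFieldStatistics) {
            in.readVLong(); // sumTotalTermFreq (skipped in this sketch)
            in.readVLong(); // sumDocFreq (skipped)
            in.readVLong(); // docCount (skipped)
        }
        List<TermEntry> terms = new ArrayList<>();
        for (int t = 0; t < numTerms; t++) {
            TermEntry e = new TermEntry();
            int termLength = (int) in.readVLong();
            e.term = new String(in.buf, in.pos, termLength, StandardCharsets.UTF_8);
            in.pos += termLength;
            if (hasTermStatistics) {
                in.readVLong(); // document frequency (skipped)
                in.readVLong(); // total term frequency (skipped)
            }
            e.freq = (int) in.readVLong(); // always present
            e.positions = new int[e.freq];
            e.startOffsets = new int[e.freq];
            e.endOffsets = new int[e.freq];
            for (int i = 0; i < e.freq; i++) {
                e.positions[i] = positions ? (int) in.readVLong() : -1;
                e.startOffsets[i] = offsets ? (int) in.readVLong() : -1;
                e.endOffsets[i] = offsets ? (int) in.readVLong() : -1;
                if (payloads) {
                    int payloadLength = (int) in.readVLong(); // assumed length prefix
                    in.pos += payloadLength;                  // payload bytes are skipped
                }
            }
            terms.add(e);
        }
        return terms;
    }

    // Assumed variable-length encoding: 7 payload bits per byte, low-order group first.
    // Non-negative vint values are read with the same routine.
    private long readVLong() {
        long value = 0;
        for (int shift = 0; ; shift += 7) {
            byte b = buf[pos++];
            value |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
        }
    }

    private boolean readBoolean() {
        return buf[pos++] != 0;
    }
}
```

Note that the inner loop always runs freq times, even when positions are absent, which is why a consumer of this class still has to advance once per occurrence to reach the offsets and payloads.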
  • Field Details

    • hasScores

      public final boolean hasScores
  • Constructor Details

    • TermVectorsFields

      public TermVectorsFields(BytesReference headerRef, BytesReference termVectors) throws IOException
      headerRef - Stores offsets per field in the termVectors and some header information as BytesRef.
      termVectors - Stores the actual term vectors as a BytesRef.
  • Method Details

    • iterator

      public Iterator<String> iterator()
      Specified by:
      iterator in interface Iterable<String>
      Specified by:
      iterator in class org.apache.lucene.index.Fields
    • terms

      public org.apache.lucene.index.Terms terms(String field) throws IOException
      Specified by:
      terms in class org.apache.lucene.index.Fields
    • size

      public int size()
      Specified by:
      size in class org.apache.lucene.index.Fields