Interface TokenFilterFactory

All Known Subinterfaces:
NormalizingTokenFilterFactory
All Known Implementing Classes:
AbstractTokenFilterFactory, HunspellTokenFilterFactory, ShingleTokenFilterFactory, ShingleTokenFilterFactory.Factory, StopTokenFilterFactory

public interface TokenFilterFactory
  • Field Details

    • IDENTITY_FILTER

      static final TokenFilterFactory IDENTITY_FILTER
      A TokenFilterFactory that performs no filtering: its TokenStream passes through unchanged.
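      As an illustration only, the sketch below models the identity factory with a hypothetical, pared-down stand-in interface (MiniTokenFilterFactory). In it a token stream is just a List<String> rather than Lucene's attribute-based TokenStream, and the name "identity" is an assumption. The real constant's name() is not documented here.

```java
import java.util.List;

// Hypothetical stand-in for TokenFilterFactory: tokens are modelled as a
// List<String> instead of Lucene's attribute-based TokenStream.
interface MiniTokenFilterFactory {
    String name();

    List<String> create(List<String> tokens);

    // Analogue of IDENTITY_FILTER: create() hands back its input untouched.
    // Interface fields are implicitly public static final.
    MiniTokenFilterFactory IDENTITY_FILTER = new MiniTokenFilterFactory() {
        @Override
        public String name() {
            return "identity"; // assumed name, for illustration only
        }

        @Override
        public List<String> create(List<String> tokens) {
            return tokens; // no filtering: the same list object is returned
        }
    };
}
```

      In this sketch the identity factory returns the very same object rather than a copy, so it adds no per-token work when used as a placeholder in a chain.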
  • Method Details

    • name

      String name()
    • create

      org.apache.lucene.analysis.TokenStream create(org.apache.lucene.analysis.TokenStream tokenStream)
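      Lucene token filters wrap the TokenStream they receive, so the order of factories in a chain matters. The sketch below shows that effect with a hypothetical stand-in interface. LowercaseFactory, StopFactory and FilterChain are illustrative names, and eager list copying replaces the lazy, per-token wrapping that a real TokenStream performs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for TokenFilterFactory (tokens modelled as List<String>).
interface MiniTokenFilterFactory {
    String name();

    List<String> create(List<String> tokens);
}

// Illustrative filter: lowercases every token.
final class LowercaseFactory implements MiniTokenFilterFactory {
    @Override
    public String name() {
        return "lowercase";
    }

    @Override
    public List<String> create(List<String> tokens) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }
}

// Illustrative filter: drops tokens that exactly match a stop word.
final class StopFactory implements MiniTokenFilterFactory {
    private final Set<String> stopWords;

    StopFactory(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    @Override
    public String name() {
        return "stop";
    }

    @Override
    public List<String> create(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!stopWords.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }
}

final class FilterChain {
    // Each factory's create() receives the output of the previous one.
    static List<String> apply(List<String> tokens, List<MiniTokenFilterFactory> filters) {
        List<String> current = tokens;
        for (MiniTokenFilterFactory f : filters) {
            current = f.create(current);
        }
        return current;
    }
}
```

      With the lowercase step first, "The" becomes "the" and is then removed; with the stop step first, "The" does not match the stop word "the" and survives.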
    • normalize

      default org.apache.lucene.analysis.TokenStream normalize(org.apache.lucene.analysis.TokenStream tokenStream)
      Normalize a tokenStream for use in multi-term queries. The default implementation is a no-op.
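      A plausible reason for the no-op default is that a multi-term query, such as a prefix or wildcard query, works on a single, possibly partial term, and not every filter makes sense there. In the hypothetical sketch below, a lowercasing factory opts in by overriding normalize(), while a stop-word factory keeps the default so that a query term is not silently dropped. The class names are illustrative, and List<String> again stands in for TokenStream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for TokenFilterFactory (tokens modelled as List<String>).
interface MiniTokenFilterFactory {
    String name();

    List<String> create(List<String> tokens);

    // Mirrors the documented default: normalization is a no-op.
    default List<String> normalize(List<String> tokens) {
        return tokens;
    }
}

// Lowercasing is safe on a partial query term, so this factory overrides normalize().
final class LowercaseFactory implements MiniTokenFilterFactory {
    @Override
    public String name() {
        return "lowercase";
    }

    @Override
    public List<String> create(List<String> tokens) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    @Override
    public List<String> normalize(List<String> tokens) {
        return create(tokens);
    }
}

// Keeps the default no-op normalize(), so a query term is never removed.
final class StopFactory implements MiniTokenFilterFactory {
    private final Set<String> stopWords;

    StopFactory(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    @Override
    public String name() {
        return "stop";
    }

    @Override
    public List<String> create(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!stopWords.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }
}

final class MultiTermNormalizer {
    // Runs one query term (e.g. a wildcard pattern) through normalize() only.
    static String normalizeTerm(String term, List<MiniTokenFilterFactory> filters) {
        List<String> current = List.of(term);
        for (MiniTokenFilterFactory f : filters) {
            current = f.normalize(current);
        }
        return current.get(0);
    }
}
```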
    • breaksFastVectorHighlighter

      default boolean breaksFastVectorHighlighter()
      Does this filter disturb the OffsetAttributes in such a way as to break the FastVectorHighlighter? If this returns true, the FastVectorHighlighter will attempt to work around the broken offsets.
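      The sketch below shows how a caller might aggregate this flag over a whole chain. It assumes the default is false and treats one offending filter as enough to flag the chain. MiniTokenFilterFactory, OffsetRewritingFactory and HighlighterCheck are hypothetical names, and the interface is reduced to the members used here.

```java
import java.util.List;

// Hypothetical stand-in for TokenFilterFactory, reduced to the members used here.
interface MiniTokenFilterFactory {
    String name();

    // Assumed default: a filter leaves offsets intact.
    default boolean breaksFastVectorHighlighter() {
        return false;
    }
}

final class LowercaseFactory implements MiniTokenFilterFactory {
    @Override
    public String name() {
        return "lowercase";
    }
}

// Hypothetical filter that rewrites token offsets and therefore reports the problem.
final class OffsetRewritingFactory implements MiniTokenFilterFactory {
    @Override
    public String name() {
        return "offset_rewriting";
    }

    @Override
    public boolean breaksFastVectorHighlighter() {
        return true;
    }
}

final class HighlighterCheck {
    // One offending filter is enough to make the chain's offsets suspect.
    static boolean chainBreaksFvh(List<MiniTokenFilterFactory> chain) {
        return chain.stream().anyMatch(MiniTokenFilterFactory::breaksFastVectorHighlighter);
    }
}
```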
    • getChainAwareTokenFilterFactory

      default TokenFilterFactory getChainAwareTokenFilterFactory(IndexService.IndexCreationContext context, TokenizerFactory tokenizer, List<CharFilterFactory> charFilters, List<TokenFilterFactory> previousTokenFilters, Function<String,TokenFilterFactory> allFilters)
      Rewrite the TokenFilterFactory to take into account the preceding analysis chain, or to refer to other TokenFilterFactories. If the token filter is part of the definition of a ReloadableCustomAnalyzer, this function is called twice: once at index creation with IndexService.IndexCreationContext.CREATE_INDEX, and again with IndexService.IndexCreationContext.RELOAD_ANALYZERS on shard recovery. The IndexService.IndexCreationContext.RELOAD_ANALYZERS context should be used to load expensive resources on a generic thread pool. See SynonymGraphFilterFactory for an example of how this context is used.
      Parameters:
      context - the IndexCreationContext for the underlying index
      tokenizer - the TokenizerFactory for the preceding chain
      charFilters - any CharFilterFactories for the preceding chain
      previousTokenFilters - a list of TokenFilterFactories in the preceding chain
      allFilters - access to previously defined TokenFilterFactories
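      One use of allFilters is a filter that refers to another filter by name. The hypothetical sketch below resolves that name during the chain-aware rewrite and fails early if it is unknown. DelegatingFactory, ChainBuilder and the simplified signature are all assumptions, not Elasticsearch APIs; the context, tokenizer and charFilters parameters are dropped to keep it short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for TokenFilterFactory. The chain-aware hook keeps only
// previousTokenFilters and allFilters; context, tokenizer and charFilters are omitted.
interface MiniTokenFilterFactory {
    String name();

    List<String> create(List<String> tokens);

    // Default: no rewrite needed, the factory is used as-is.
    default MiniTokenFilterFactory getChainAwareTokenFilterFactory(
            List<MiniTokenFilterFactory> previousTokenFilters,
            Function<String, MiniTokenFilterFactory> allFilters) {
        return this;
    }
}

final class LowercaseFactory implements MiniTokenFilterFactory {
    @Override
    public String name() { return "lowercase"; }

    @Override
    public List<String> create(List<String> tokens) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }
}

// Hypothetical filter configured only with the name of another filter; it is
// unusable until the chain-aware rewrite resolves that name.
final class DelegatingFactory implements MiniTokenFilterFactory {
    private final String target;

    DelegatingFactory(String target) { this.target = target; }

    @Override
    public String name() { return "delegate"; }

    @Override
    public List<String> create(List<String> tokens) {
        throw new IllegalStateException("delegate [" + target + "] has not been resolved");
    }

    @Override
    public MiniTokenFilterFactory getChainAwareTokenFilterFactory(
            List<MiniTokenFilterFactory> previousTokenFilters,
            Function<String, MiniTokenFilterFactory> allFilters) {
        MiniTokenFilterFactory resolved = allFilters.apply(target);
        if (resolved == null) {
            throw new IllegalArgumentException("unknown token filter [" + target + "]");
        }
        // Return a new factory bound to the resolved filter.
        return new MiniTokenFilterFactory() {
            @Override
            public String name() { return "delegate"; }

            @Override
            public List<String> create(List<String> tokens) { return resolved.create(tokens); }
        };
    }
}

final class ChainBuilder {
    // Rewrites each factory in order, passing it the already-rewritten predecessors.
    static List<MiniTokenFilterFactory> build(
            List<MiniTokenFilterFactory> configured, Map<String, MiniTokenFilterFactory> registry) {
        List<MiniTokenFilterFactory> built = new ArrayList<>();
        for (MiniTokenFilterFactory f : configured) {
            built.add(f.getChainAwareTokenFilterFactory(List.copyOf(built), registry::get));
        }
        return built;
    }

    static List<String> apply(List<String> tokens, List<MiniTokenFilterFactory> chain) {
        List<String> current = tokens;
        for (MiniTokenFilterFactory f : chain) {
            current = f.create(current);
        }
        return current;
    }
}
```

      Resolving names at chain-build time, rather than inside create(), means a misconfigured reference surfaces once, when the analyzer is assembled, instead of on every document.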
    • getSynonymFilter

      default TokenFilterFactory getSynonymFilter()
      Return a version of this TokenFilterFactory appropriate for synonym parsing. Filters that should not be applied to synonyms (for example, those that produce multiple tokens) should throw an exception.
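      The sketch below illustrates the contract with a hypothetical stand-in interface. It assumes the default returns the factory itself, and it uses an illustrative PairingFactory that emits extra multi-word tokens and therefore refuses to take part in synonym parsing. SynonymRuleAnalyzer and the exception message are likewise made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for TokenFilterFactory (tokens modelled as List<String>).
interface MiniTokenFilterFactory {
    String name();

    List<String> create(List<String> tokens);

    // Assumption for this sketch: by default a filter is safe to apply to synonym rules as-is.
    default MiniTokenFilterFactory getSynonymFilter() {
        return this;
    }
}

final class LowercaseFactory implements MiniTokenFilterFactory {
    @Override
    public String name() { return "lowercase"; }

    @Override
    public List<String> create(List<String> tokens) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }
}

// Hypothetical filter emitting each token plus each adjacent pair, so it produces
// multiple tokens per input and rejects synonym parsing.
final class PairingFactory implements MiniTokenFilterFactory {
    @Override
    public String name() { return "pairing"; }

    @Override
    public List<String> create(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(tokens.get(i));
            if (i + 1 < tokens.size()) {
                out.add(tokens.get(i) + " " + tokens.get(i + 1));
            }
        }
        return out;
    }

    @Override
    public MiniTokenFilterFactory getSynonymFilter() {
        throw new IllegalArgumentException("Token filter [" + name() + "] cannot be used to parse synonyms");
    }
}

final class SynonymRuleAnalyzer {
    // Analyzes the words of one synonym rule with the synonym-safe version of each filter.
    static List<String> analyzeRule(List<String> ruleTokens, List<MiniTokenFilterFactory> chain) {
        List<String> current = ruleTokens;
        for (MiniTokenFilterFactory f : chain) {
            current = f.getSynonymFilter().create(current);
        }
        return current;
    }
}
```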
    • getAnalysisMode

      default AnalysisMode getAnalysisMode()
      Get the AnalysisMode this filter is allowed to be used in. The default is AnalysisMode.ALL. Implementations that need other restrictions must override this method.
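      The sketch below shows one way such restrictions could be combined across a chain. MiniAnalysisMode, its merge rule and the filter classes are hypothetical. The sketch assumes three modes, ALL, INDEX_TIME and SEARCH_TIME, of which only ALL is named in the text above.

```java
import java.util.List;

// Hypothetical enum; the INDEX_TIME and SEARCH_TIME values are assumptions.
enum MiniAnalysisMode {
    INDEX_TIME, SEARCH_TIME, ALL;

    // ALL is compatible with anything; INDEX_TIME and SEARCH_TIME cannot be combined.
    MiniAnalysisMode merge(MiniAnalysisMode other) {
        if (this == ALL) {
            return other;
        }
        if (other == ALL || other == this) {
            return this;
        }
        throw new IllegalStateException("cannot combine " + this + " with " + other);
    }
}

// Hypothetical stand-in for TokenFilterFactory, reduced to the members used here.
interface MiniTokenFilterFactory {
    String name();

    // Mirrors the documented default.
    default MiniAnalysisMode getAnalysisMode() {
        return MiniAnalysisMode.ALL;
    }
}

final class LowercaseFactory implements MiniTokenFilterFactory {
    @Override
    public String name() { return "lowercase"; }
}

// Hypothetical filter that is only allowed in search-time analyzers.
final class SearchOnlyFactory implements MiniTokenFilterFactory {
    @Override
    public String name() { return "search_only"; }

    @Override
    public MiniAnalysisMode getAnalysisMode() {
        return MiniAnalysisMode.SEARCH_TIME;
    }
}

final class ModeCheck {
    // The chain's mode is the merge of its filters' modes, starting from ALL.
    static MiniAnalysisMode chainMode(List<MiniTokenFilterFactory> chain) {
        MiniAnalysisMode mode = MiniAnalysisMode.ALL;
        for (MiniTokenFilterFactory f : chain) {
            mode = mode.merge(f.getAnalysisMode());
        }
        return mode;
    }
}
```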
    • getResourceName

      default String getResourceName()
      Get the name of the resource that this filter is based on. Used to reload analyzers when this resource changes. For an example, see SynonymGraphTokenFilterFactory#getResourceName().
      Returns:
      the name of the resource this filter was loaded from, if any
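      The sketch below shows how resource names could drive a reload decision. It assumes null means "not backed by any resource". FileBackedSynonymsFactory, ReloadCheck and the path "analysis/synonyms.txt" are hypothetical.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for TokenFilterFactory, reduced to the members used here.
interface MiniTokenFilterFactory {
    String name();

    // Assumption for this sketch: null means the filter is not backed by any resource.
    default String getResourceName() {
        return null;
    }
}

final class LowercaseFactory implements MiniTokenFilterFactory {
    @Override
    public String name() { return "lowercase"; }
}

// Hypothetical filter whose rules are read from a file, reported as its resource name.
final class FileBackedSynonymsFactory implements MiniTokenFilterFactory {
    private final String path;

    FileBackedSynonymsFactory(String path) { this.path = path; }

    @Override
    public String name() { return "file_synonyms"; }

    @Override
    public String getResourceName() { return path; }
}

final class ReloadCheck {
    // An analyzer needs reloading if any filter in its chain was loaded from the changed resource.
    static boolean dependsOn(List<MiniTokenFilterFactory> chain, String changedResource) {
        Objects.requireNonNull(changedResource, "changedResource");
        for (MiniTokenFilterFactory f : chain) {
            if (changedResource.equals(f.getResourceName())) {
                return true;
            }
        }
        return false;
    }
}
```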