Class XAnalyzingSuggester

  • All Implemented Interfaces:
    org.apache.lucene.util.Accountable
    Direct Known Subclasses:
    XFuzzySuggester

    public class XAnalyzingSuggester
    extends org.apache.lucene.search.suggest.Lookup
    Suggester that first analyzes the surface form, adds the analyzed form to a weighted FST, and then does the same thing at lookup time. This means lookup is based on the analyzed form while suggestions are still the surface form(s).

    This can result in powerful suggester functionality. For example, if you use an analyzer that removes stop words, then the partial text "ghost chr..." could return the suggestion "The Ghost of Christmas Past". Note that position increments MUST NOT be preserved for this example to work, so you should call the constructor with the preservePositionIncrements parameter set to false.
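
The stop-word example can be illustrated with a toy model. This is only a sketch: a sorted map stands in for the weighted FST, and the hypothetical analyze() helper stands in for a real Lucene Analyzer with a stop filter. None of these names belong to XAnalyzingSuggester.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model only: not the real suggester, no FST, no weights.
class ToyAnalyzingLookup {
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "of", "a"));

    // analyzed form -> surface form (a later surface form with the same
    // analyzed form overwrites an earlier one in this toy)
    private final TreeMap<String, String> index = new TreeMap<>();

    // Lowercases, splits on whitespace, and drops stop words without leaving
    // a gap, which models position increments that are not preserved.
    static String analyze(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return String.join(" ", kept);
    }

    void add(String surfaceForm) {
        index.put(analyze(surfaceForm), surfaceForm);
    }

    // Analyzes the query the same way as the indexed text, then returns the
    // surface forms whose analyzed form starts with the analyzed query.
    List<String> lookup(String query) {
        String prefix = analyze(query);
        List<String> results = new ArrayList<>();
        if (prefix.isEmpty()) {
            return results; // an empty lookup returns no results
        }
        for (Map.Entry<String, String> e : index.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // keys are sorted, so no later key can match
            }
            results.add(e.getValue());
        }
        return results;
    }

    public static void main(String[] args) {
        ToyAnalyzingLookup suggester = new ToyAnalyzingLookup();
        suggester.add("The Ghost of Christmas Past");
        suggester.add("Ghostbusters");
        System.out.println(suggester.lookup("ghost chr"));
    }
}
```

The lookup matches on the analyzed form "ghost christmas past" but returns the original surface form, which is the behaviour the class description refers to.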

    If SynonymFilter is used to map "wifi" and "wireless network" to "hotspot", then the partial text "wirele..." could suggest "wifi router". Token normalization such as stemming or accent removal would allow suggestions to ignore such variations.

    When two matching suggestions have the same weight, they are tie-broken by the analyzed form. If their analyzed form is the same then the order is undefined.
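
A minimal sketch of that ordering rule, using a hypothetical Candidate holder that is not part of this API. Sorting ties in ascending order of the analyzed form is an assumption of the sketch; the text above only says the analyzed form breaks the tie.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class TieBreakSketch {
    // Hypothetical holder for one matching suggestion.
    static final class Candidate {
        final String surfaceForm;
        final String analyzedForm;
        final long weight;

        Candidate(String surfaceForm, String analyzedForm, long weight) {
            this.surfaceForm = surfaceForm;
            this.analyzedForm = analyzedForm;
            this.weight = weight;
        }
    }

    // Higher weight first; equal weights fall back to the analyzed form.
    // Candidates with equal weight and equal analyzed form have no defined
    // relative order.
    static final Comparator<Candidate> RANKING =
            Comparator.comparingLong((Candidate c) -> c.weight)
                    .reversed()
                    .thenComparing((Candidate c) -> c.analyzedForm);

    static List<String> rank(List<Candidate> candidates) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(RANKING);
        List<String> surfaceForms = new ArrayList<>();
        for (Candidate c : sorted) {
            surfaceForms.add(c.surfaceForm);
        }
        return surfaceForms;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = Arrays.asList(
                new Candidate("Wifi Router", "wifi router", 5),
                new Candidate("Apple Pie", "apple pie", 5),
                new Candidate("Zebra", "zebra", 9));
        System.out.println(rank(candidates));
    }
}
```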

    There are some limitations:

    • A lookup for a query like "net" in English is no different from a lookup for "net " (i.e., the user added a trailing space), because analyzers don't reflect whether they have seen a token separator.
    • If you're using StopFilter and the user intends to type "fast apple" but has so far typed only "fast a", StopFilter will remove that "a", again because the analyzer doesn't convey whether it has seen a token separator after the "a". This causes far more matches than you'd expect.
    • Lookups with the empty string return no results instead of all results.
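
The first two limitations can be reproduced with a toy whitespace tokenizer plus stop filter. This is a sketch under stated assumptions: the analyze() helper and its stop-word list are hypothetical stand-ins for a real Analyzer chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class SeparatorBlindnessSketch {
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "the", "of"));

    // Returns only the surviving tokens. Nothing in the output records
    // whether the input ended with a separator.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "net" and "net " analyze to the same token list.
        System.out.println(analyze("net").equals(analyze("net ")));
        // The partial word "a" is treated as a stop word and dropped, so
        // the lookup degrades to everything that starts with "fast".
        System.out.println(analyze("fast a"));
        System.out.println(analyze("fast apple"));
    }
}
```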
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  XAnalyzingSuggester.XBuilder  
      • Nested classes/interfaces inherited from class org.apache.lucene.search.suggest.Lookup

        org.apache.lucene.search.suggest.Lookup.LookupPriorityQueue, org.apache.lucene.search.suggest.Lookup.LookupResult
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static int END_BYTE
      Marks end of the analyzed input and start of dedup byte.
      static int EXACT_FIRST
      Include this flag in the options parameter of the full XAnalyzingSuggester constructor to always return the exact match first, regardless of score.
      static int HOLE_CHARACTER  
      static int PAYLOAD_SEP  
      static int PRESERVE_SEP
      Include this flag in the options parameter of the full XAnalyzingSuggester constructor to preserve token separators when matching.
      static int SEP_LABEL
      Represents the separation between tokens, if PRESERVE_SEP was specified.
      • Fields inherited from class org.apache.lucene.search.suggest.Lookup

        CHARSEQUENCE_COMPARATOR
    • Constructor Summary

      Constructors 
      Constructor Description
      XAnalyzingSuggester​(org.apache.lucene.analysis.Analyzer analyzer)
      Calls the full constructor with indexAnalyzer = analyzer, queryAnalyzer = analyzer, options = EXACT_FIRST | PRESERVE_SEP, maxSurfaceFormsPerAnalyzedForm = 256, and maxGraphExpansions = -1.
      XAnalyzingSuggester​(org.apache.lucene.analysis.Analyzer indexAnalyzer, org.apache.lucene.analysis.Analyzer queryAnalyzer)
      Calls the full constructor with the given indexAnalyzer and queryAnalyzer, options = EXACT_FIRST | PRESERVE_SEP, maxSurfaceFormsPerAnalyzedForm = 256, and maxGraphExpansions = -1.
      XAnalyzingSuggester​(org.apache.lucene.analysis.Analyzer indexAnalyzer, org.apache.lucene.util.automaton.Automaton queryPrefix, org.apache.lucene.analysis.Analyzer queryAnalyzer, int options, int maxSurfaceFormsPerAnalyzedForm, int maxGraphExpansions, boolean preservePositionIncrements, org.apache.lucene.util.fst.FST<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,​org.apache.lucene.util.BytesRef>> fst, boolean hasPayloads, int maxAnalyzedPathsForOneInput, int sepLabel, int payloadSep, int endByte, int holeCharacter)
      Creates a new suggester.
    • Method Summary

      Modifier and Type Method Description
      void build​(org.apache.lucene.search.suggest.InputIterator iterator)  
      protected org.apache.lucene.util.automaton.Automaton convertAutomaton​(org.apache.lucene.util.automaton.Automaton a)  
      static int decodeWeight​(long encoded)
      Converts an encoded cost back into a weight (cost -> weight).
      static int encodeWeight​(long value)
      Converts a weight into an encoded cost (weight -> cost).
      java.lang.Object get​(java.lang.CharSequence key)
      Returns the weight associated with an input string, or null if it does not exist.
      long getCount()  
      protected java.util.List<org.apache.lucene.search.suggest.analyzing.FSTUtil.Path<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,​org.apache.lucene.util.BytesRef>>> getFullPrefixPaths​(java.util.List<org.apache.lucene.search.suggest.analyzing.FSTUtil.Path<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,​org.apache.lucene.util.BytesRef>>> prefixPaths, org.apache.lucene.util.automaton.Automaton lookupAutomaton, org.apache.lucene.util.fst.FST<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,​org.apache.lucene.util.BytesRef>> fst)
      Returns all completion paths to initialize the search.
      int getMaxAnalyzedPathsForOneInput()  
      protected static org.apache.lucene.store.FSDirectory getTempDir()  
      org.apache.lucene.analysis.TokenStreamToAutomaton getTokenStreamToAutomaton()  
      boolean load​(java.io.InputStream input)  
      boolean load​(org.apache.lucene.store.DataInput input)  
      java.util.List<org.apache.lucene.search.suggest.Lookup.LookupResult> lookup​(java.lang.CharSequence key, java.util.Set<org.apache.lucene.util.BytesRef> contexts, boolean onlyMorePopular, int num)  
      long ramBytesUsed()
      Returns byte size of the underlying FST.
      boolean store​(java.io.OutputStream output)  
      boolean store​(org.apache.lucene.store.DataOutput output)  
      java.util.Set<org.apache.lucene.util.IntsRef> toFiniteStrings​(org.apache.lucene.analysis.TokenStream stream)  
      • Methods inherited from class org.apache.lucene.search.suggest.Lookup

        build, lookup, lookup
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
      • Methods inherited from interface org.apache.lucene.util.Accountable

        getChildResources
    • Field Detail

      • EXACT_FIRST

        public static final int EXACT_FIRST
        Include this flag in the options parameter of the full XAnalyzingSuggester constructor to always return the exact match first, regardless of score. This has no performance impact but could result in low-quality suggestions.
        See Also:
        Constant Field Values
      • PRESERVE_SEP

        public static final int PRESERVE_SEP
        Include this flag in the options parameter of the full XAnalyzingSuggester constructor to preserve token separators when matching.
        See Also:
        Constant Field Values
      • SEP_LABEL

        public static final int SEP_LABEL
        Represents the separation between tokens, if PRESERVE_SEP was specified.
        See Also:
        Constant Field Values
      • END_BYTE

        public static final int END_BYTE
        Marks end of the analyzed input and start of dedup byte.
        See Also:
        Constant Field Values
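
EXACT_FIRST and PRESERVE_SEP are bit flags that are combined with bitwise OR in the options parameter. The sketch below shows how such flags are combined and tested. The numeric values 1 and 2 are assumptions taken from Lucene's AnalyzingSuggester; consult the Constant Field Values page for the values this class actually uses. The isSet() helper is hypothetical.

```java
class OptionFlagsSketch {
    // Assumed values for illustration only; each flag must occupy its own bit.
    static final int EXACT_FIRST = 1;
    static final int PRESERVE_SEP = 2;

    // Hypothetical helper: true when the given flag bit is present in options.
    static boolean isSet(int options, int flag) {
        return (options & flag) != 0;
    }

    public static void main(String[] args) {
        // The default used by the one- and two-analyzer constructors.
        int options = EXACT_FIRST | PRESERVE_SEP;
        System.out.println(isSet(options, EXACT_FIRST));
        System.out.println(isSet(options, PRESERVE_SEP));
        // Only PRESERVE_SEP: exact matches are not forced to the top.
        System.out.println(isSet(PRESERVE_SEP, EXACT_FIRST));
    }
}
```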
    • Constructor Detail

      • XAnalyzingSuggester

        public XAnalyzingSuggester​(org.apache.lucene.analysis.Analyzer analyzer)
        Calls the full constructor with indexAnalyzer = analyzer, queryAnalyzer = analyzer, options = EXACT_FIRST | PRESERVE_SEP, maxSurfaceFormsPerAnalyzedForm = 256, and maxGraphExpansions = -1.
        Parameters:
        analyzer - Analyzer that will be used both for analyzing suggestions while building the index and for analyzing query text during lookup.
      • XAnalyzingSuggester

        public XAnalyzingSuggester​(org.apache.lucene.analysis.Analyzer indexAnalyzer,
                                   org.apache.lucene.analysis.Analyzer queryAnalyzer)
        Calls the full constructor with the given indexAnalyzer and queryAnalyzer, options = EXACT_FIRST | PRESERVE_SEP, maxSurfaceFormsPerAnalyzedForm = 256, and maxGraphExpansions = -1.
        Parameters:
        indexAnalyzer - Analyzer that will be used for analyzing suggestions while building the index.
        queryAnalyzer - Analyzer that will be used for analyzing query text during lookup.
      • XAnalyzingSuggester

        public XAnalyzingSuggester​(org.apache.lucene.analysis.Analyzer indexAnalyzer,
                                   org.apache.lucene.util.automaton.Automaton queryPrefix,
                                   org.apache.lucene.analysis.Analyzer queryAnalyzer,
                                   int options,
                                   int maxSurfaceFormsPerAnalyzedForm,
                                   int maxGraphExpansions,
                                   boolean preservePositionIncrements,
                                   org.apache.lucene.util.fst.FST<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,​org.apache.lucene.util.BytesRef>> fst,
                                   boolean hasPayloads,
                                   int maxAnalyzedPathsForOneInput,
                                   int sepLabel,
                                   int payloadSep,
                                   int endByte,
                                   int holeCharacter)
        Creates a new suggester.
        Parameters:
        indexAnalyzer - Analyzer that will be used for analyzing suggestions while building the index.
        queryAnalyzer - Analyzer that will be used for analyzing query text during lookup.
        options - see EXACT_FIRST, PRESERVE_SEP
        maxSurfaceFormsPerAnalyzedForm - Maximum number of surface forms to keep for a single analyzed form. When there are too many surface forms we discard the lowest weighted ones.
        maxGraphExpansions - Maximum number of graph paths to expand from the analyzed form. Set this to -1 for no limit.
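
The effect of maxSurfaceFormsPerAnalyzedForm can be pictured as follows. This is a sketch of the pruning rule described above, not the suggester's internal code: keepHeaviest() and its map-based input are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SurfaceFormCapSketch {
    // Given all surface forms that share one analyzed form, keeps at most
    // maxSurfaceForms of them, discarding the lowest weighted ones.
    static List<String> keepHeaviest(Map<String, Long> weightBySurfaceForm,
                                     int maxSurfaceForms) {
        List<Map.Entry<String, Long>> entries =
                new ArrayList<>(weightBySurfaceForm.entrySet());
        // Sort by weight, heaviest first.
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed());

        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Long> e : entries) {
            if (kept.size() == maxSurfaceForms) {
                break;
            }
            kept.add(e.getKey());
        }
        return kept;
    }

    public static void main(String[] args) {
        // Three surface forms that all analyze to "wifi router".
        Map<String, Long> forms = new LinkedHashMap<>();
        forms.put("wifi router", 10L);
        forms.put("WiFi Router", 3L);
        forms.put("Wifi router", 7L);
        System.out.println(keepHeaviest(forms, 2));
    }
}
```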
    • Method Detail

      • ramBytesUsed

        public long ramBytesUsed()
        Returns byte size of the underlying FST.
      • getMaxAnalyzedPathsForOneInput

        public int getMaxAnalyzedPathsForOneInput()
      • convertAutomaton

        protected org.apache.lucene.util.automaton.Automaton convertAutomaton​(org.apache.lucene.util.automaton.Automaton a)
      • getTokenStreamToAutomaton

        public org.apache.lucene.analysis.TokenStreamToAutomaton getTokenStreamToAutomaton()
      • getTempDir

        protected static org.apache.lucene.store.FSDirectory getTempDir()
      • build

        public void build​(org.apache.lucene.search.suggest.InputIterator iterator)
                   throws java.io.IOException
        Specified by:
        build in class org.apache.lucene.search.suggest.Lookup
        Throws:
        java.io.IOException
      • store

        public boolean store​(java.io.OutputStream output)
                      throws java.io.IOException
        Overrides:
        store in class org.apache.lucene.search.suggest.Lookup
        Throws:
        java.io.IOException
      • getCount

        public long getCount()
        Specified by:
        getCount in class org.apache.lucene.search.suggest.Lookup
      • load

        public boolean load​(java.io.InputStream input)
                     throws java.io.IOException
        Overrides:
        load in class org.apache.lucene.search.suggest.Lookup
        Throws:
        java.io.IOException
      • lookup

        public java.util.List<org.apache.lucene.search.suggest.Lookup.LookupResult> lookup​(java.lang.CharSequence key,
                                                                                           java.util.Set<org.apache.lucene.util.BytesRef> contexts,
                                                                                           boolean onlyMorePopular,
                                                                                           int num)
        Specified by:
        lookup in class org.apache.lucene.search.suggest.Lookup
      • store

        public boolean store​(org.apache.lucene.store.DataOutput output)
                      throws java.io.IOException
        Specified by:
        store in class org.apache.lucene.search.suggest.Lookup
        Throws:
        java.io.IOException
      • load

        public boolean load​(org.apache.lucene.store.DataInput input)
                     throws java.io.IOException
        Specified by:
        load in class org.apache.lucene.search.suggest.Lookup
        Throws:
        java.io.IOException
      • getFullPrefixPaths

        protected java.util.List<org.apache.lucene.search.suggest.analyzing.FSTUtil.Path<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,​org.apache.lucene.util.BytesRef>>> getFullPrefixPaths​(java.util.List<org.apache.lucene.search.suggest.analyzing.FSTUtil.Path<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,​org.apache.lucene.util.BytesRef>>> prefixPaths,
                                                                                                                                                                                                                org.apache.lucene.util.automaton.Automaton lookupAutomaton,
                                                                                                                                                                                                                org.apache.lucene.util.fst.FST<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,​org.apache.lucene.util.BytesRef>> fst)
                                                                                                                                                                                                         throws java.io.IOException
        Returns all completion paths to initialize the search.
        Throws:
        java.io.IOException
      • toFiniteStrings

        public java.util.Set<org.apache.lucene.util.IntsRef> toFiniteStrings​(org.apache.lucene.analysis.TokenStream stream)
                                                                      throws java.io.IOException
        Throws:
        java.io.IOException
      • get

        public java.lang.Object get​(java.lang.CharSequence key)
        Nominally returns the weight associated with an input string, or null if it does not exist. This implementation does not support the operation and throws an UnsupportedOperationException.
        Parameters:
        key - input string
        Returns:
        the weight associated with the input string, or null if it does not exist.
      • decodeWeight

        public static int decodeWeight​(long encoded)
        Converts an encoded cost back into a weight (cost -> weight).
        Parameters:
        encoded - Cost
        Returns:
        Weight
      • encodeWeight

        public static int encodeWeight​(long value)
        Converts a weight into an encoded cost (weight -> cost).
        Parameters:
        value - Weight
        Returns:
        Cost
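
The page does not state the encoding itself. The sketch below assumes the scheme used by Lucene's AnalyzingSuggester, where the cost is Integer.MAX_VALUE minus the weight, so that a search for the lowest-cost paths returns the highest-weighted suggestions first. Treat the formula and the choice of exception as assumptions, not as this class's documented behaviour.

```java
class WeightCodecSketch {
    // weight -> cost: a larger weight yields a smaller cost.
    static int encodeWeight(long value) {
        if (value < 0 || value > Integer.MAX_VALUE) {
            // Exception type chosen for this sketch only.
            throw new IllegalArgumentException("cannot encode value: " + value);
        }
        return Integer.MAX_VALUE - (int) value;
    }

    // cost -> weight: the inverse of encodeWeight.
    static int decodeWeight(long encoded) {
        return (int) (Integer.MAX_VALUE - encoded);
    }

    public static void main(String[] args) {
        int cost = encodeWeight(42);
        System.out.println(decodeWeight(cost));
        // The heavier suggestion gets the lower cost.
        System.out.println(encodeWeight(20) < encodeWeight(10));
    }
}
```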