Class BoundedBreakIteratorScanner

  • All Implemented Interfaces:
    java.lang.Cloneable

    public class BoundedBreakIteratorScanner
    extends java.text.BreakIterator
    A custom break iterator used to find break-delimited passages bounded by a provided maximum length in the UnifiedHighlighter context.

    This class uses a BreakIterator to find the last break after the provided offset that would create a passage smaller than maxLen. If that BreakIterator cannot find a passage smaller than the maximum length, a secondary break iterator re-splits the passage at the first boundary after the maximum length. This is useful for splitting passages created by BreakIterators such as `sentence`, which can produce large outliers on semi-structured text.

    WARNING: This break iterator is designed to work with the UnifiedHighlighter.

    TODO: We should be able to create passages incrementally, starting from the offset of the first match and expanding (or not) depending on the offsets of subsequent matches. This is currently impossible because FieldHighlighter uses only the first matching offset to derive the start and end of each passage.
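The bounding rule in the class description can be illustrated with plain JDK iterators. The following is a minimal sketch under stated assumptions, not this class's implementation: the class name `BoundedPassageSketch` and the helper `boundedEnd` are hypothetical, and the helper computes only one passage end from a given start offset.

```java
import java.text.BreakIterator;
import java.util.Locale;

/** Illustrative sketch only; names here are hypothetical and not part of the documented API. */
final class BoundedPassageSketch {

    /**
     * Returns an end offset for a passage that starts at {@code start}.
     * Prefers the last main boundary that keeps the passage within {@code maxLen};
     * otherwise falls back to the first secondary boundary after {@code start + maxLen}.
     * Assumes 0 <= start < text.length() and maxLen > 0.
     */
    static int boundedEnd(String text, int start, int maxLen,
                          BreakIterator main, BreakIterator secondary) {
        main.setText(text);
        secondary.setText(text);
        // Clamp the limit to the text length (long arithmetic avoids int overflow).
        int limit = (int) Math.min((long) start + maxLen, text.length());

        // Last main (for example, sentence) boundary at or before the limit.
        int end = main.isBoundary(limit) ? limit : main.preceding(limit);
        if (end != BreakIterator.DONE && end > start) {
            return end;
        }
        // No main boundary fits: take the first secondary (for example, word)
        // boundary after the limit, so the passage may slightly exceed maxLen.
        int fallback = secondary.following(limit);
        return fallback == BreakIterator.DONE ? text.length() : fallback;
    }

    public static void main(String[] args) {
        String text = "Short one. This is a considerably longer second sentence without any early stop.";
        BreakIterator sentence = BreakIterator.getSentenceInstance(Locale.ROOT);
        BreakIterator word = BreakIterator.getWordInstance(Locale.ROOT);

        int first = boundedEnd(text, 0, 20, sentence, word);
        int second = boundedEnd(text, first, 20, sentence, word);
        System.out.println("[" + text.substring(0, first) + "]");
        System.out.println("[" + text.substring(first, second) + "]");
    }
}
```

In this sketch the first passage ends at a sentence boundary because one fits within the limit, while the second passage has no sentence boundary within 20 characters and is cut at the next word boundary instead.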
    • Field Summary

      • Fields inherited from class java.text.BreakIterator

        DONE
    • Method Summary

      Modifier and Type                Method and Description
      int                              current()
      int                              first()
      int                              following(int offset)
                                       Can be invoked only after a call to preceding(offset+1).
      static java.text.BreakIterator   getSentence(java.util.Locale locale, int maxLen)
                                       Returns a BreakIterator.getSentenceInstance(Locale) bounded to maxLen.
      java.text.CharacterIterator      getText()
      int                              last()
      int                              next()
      int                              next(int n)
      int                              preceding(int offset)
                                       Must be called with increasing offset.
      int                              previous()
      void                             setText(java.lang.String newText)
      void                             setText(java.text.CharacterIterator newText)
      • Methods inherited from class java.text.BreakIterator

        clone, getAvailableLocales, getCharacterInstance, getCharacterInstance, getLineInstance, getLineInstance, getSentenceInstance, getSentenceInstance, getWordInstance, getWordInstance, isBoundary
      • Methods inherited from class java.lang.Object

        equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • getText

        public java.text.CharacterIterator getText()
        Specified by:
        getText in class java.text.BreakIterator
      • setText

        public void setText(java.text.CharacterIterator newText)
        Specified by:
        setText in class java.text.BreakIterator
      • setText

        public void setText(java.lang.String newText)
        Overrides:
        setText in class java.text.BreakIterator
      • preceding

        public int preceding(int offset)
        Must be called with increasing offset. See FieldHighlighter for usage.
        Overrides:
        preceding in class java.text.BreakIterator
      • following

        public int following(int offset)
        Can be invoked only after a call to preceding(offset+1). See FieldHighlighter for usage.
        Specified by:
        following in class java.text.BreakIterator
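The call-order contract of preceding and following (increasing offsets, and following(offset) only after preceding(offset+1)) can be sketched as below. This is an assumption-labeled illustration of the calling pattern, not FieldHighlighter's code: the class `PassageCallOrderSketch` and the method `passages` are hypothetical, and a plain JDK sentence iterator stands in for the scanner so the example runs without any library.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative sketch only; names here are hypothetical and not part of the documented API. */
final class PassageCallOrderSketch {

    /**
     * Derives one [start, end) passage per match that is not already covered.
     * Assumes matchStarts is sorted ascending and each value is in [0, text.length()).
     */
    static List<int[]> passages(String text, int[] matchStarts, BreakIterator bi) {
        bi.setText(text);
        List<int[]> result = new ArrayList<>();
        int passageEnd = -1;
        for (int matchStart : matchStarts) {
            if (matchStart < passageEnd) {
                continue; // the match falls inside the passage already opened
            }
            // Step 1: preceding(offset + 1) yields the passage start.
            int start = Math.max(bi.preceding(matchStart + 1), 0);
            // Step 2: following(offset), issued right after step 1, yields the passage end.
            int end = bi.following(matchStart);
            passageEnd = (end == BreakIterator.DONE) ? text.length() : Math.min(end, text.length());
            result.add(new int[] {start, passageEnd});
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "First sentence here. Second sentence follows. Third one.";
        BreakIterator sentence = BreakIterator.getSentenceInstance(Locale.ROOT);
        for (int[] p : passages(text, new int[] {6, 15, 28, 46}, sentence)) {
            System.out.println("[" + text.substring(p[0], p[1]) + "]");
        }
    }
}
```

Because the offsets only ever increase and each following call is paired with the preceding call made just before it, an iterator with the restrictions documented here could be substituted for the JDK one in this loop.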
      • getSentence

        public static java.text.BreakIterator getSentence(java.util.Locale locale,
                                                          int maxLen)
        Returns a BreakIterator.getSentenceInstance(Locale) bounded to maxLen. Secondary boundaries are found using a BreakIterator.getWordInstance(Locale).
      • current

        public int current()
        Specified by:
        current in class java.text.BreakIterator
      • first

        public int first()
        Specified by:
        first in class java.text.BreakIterator
      • next

        public int next()
        Specified by:
        next in class java.text.BreakIterator
      • last

        public int last()
        Specified by:
        last in class java.text.BreakIterator
      • next

        public int next(int n)
        Specified by:
        next in class java.text.BreakIterator
      • previous

        public int previous()
        Specified by:
        previous in class java.text.BreakIterator