Class BoundedBreakIteratorScanner

All Implemented Interfaces:
java.lang.Cloneable

public class BoundedBreakIteratorScanner
extends java.text.BreakIterator
A custom break iterator that finds break-delimited passages bounded by a provided maximum length, for use in the UnifiedHighlighter context. This class uses a primary BreakIterator to find the last break after the provided offset that would create a passage smaller than maxLen. If the primary BreakIterator cannot find a passage smaller than the maximum length, a secondary break iterator re-splits the passage at the first boundary after the maximum length. This is useful for splitting passages created by BreakIterators such as `sentence`, which can produce very large outliers on semi-structured text.

WARNING: This break iterator is designed to work with the UnifiedHighlighter.

TODO: We should be able to create passages incrementally, starting from the offset of the first match and expanding (or not) depending on the offsets of subsequent matches. This is currently impossible because FieldHighlighter uses only the first matching offset to derive the start and end of each passage.
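The bounding strategy described above can be illustrated with the JDK's own break iterators. This is a minimal sketch, not the implementation of this class: the names BoundedSplitSketch, boundedEnd, and split are hypothetical, a sentence iterator is assumed as the primary iterator and a word iterator as the secondary one, and a break that falls exactly at maxLen is treated as acceptable.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only; this is NOT the BoundedBreakIteratorScanner source.
class BoundedSplitSketch {

    // Hypothetical helper: returns the end offset of the passage that starts at `start`.
    static int boundedEnd(String text, int start, int maxLen, Locale locale) {
        int limit = start + maxLen;
        if (limit >= text.length()) {
            return text.length(); // the remaining text already fits within maxLen
        }
        BreakIterator sentences = BreakIterator.getSentenceInstance(locale);
        sentences.setText(text);
        // Last sentence break at or before the limit.
        int end = sentences.isBoundary(limit) ? limit : sentences.preceding(limit);
        if (end > start) {
            return end;
        }
        // No sentence break within maxLen: fall back to the first word boundary
        // at or after the limit, without going past the end of the sentence.
        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(text);
        int wordBreak = words.isBoundary(limit) ? limit : words.following(limit);
        int sentenceEnd = sentences.following(start);
        return Math.min(wordBreak, sentenceEnd);
    }

    // Hypothetical helper: splits the whole text into consecutive bounded passages.
    static List<String> split(String text, int maxLen, Locale locale) {
        if (maxLen <= 0) {
            throw new IllegalArgumentException("maxLen must be positive");
        }
        List<String> passages = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = boundedEnd(text, start, maxLen, locale);
            passages.add(text.substring(start, end));
            start = end;
        }
        return passages;
    }

    public static void main(String[] args) {
        String text = "One. Two three four five six seven eight nine ten.";
        for (String passage : split(text, 20, Locale.ENGLISH)) {
            System.out.println("[" + passage + "]");
        }
    }
}
```

The sketch creates new iterators on every call for clarity; a real scanner would keep both iterators and reuse them as the offset advances.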
  • Field Summary

    Fields inherited from class java.text.BreakIterator

    DONE
  • Method Summary

    Modifier and Type    Method    Description
    int current()
    int first()
    int following(int offset)
    Can be invoked only after a call to preceding(offset+1).
    static java.text.BreakIterator getSentence(java.util.Locale locale, int maxLen)
    Returns a BreakIterator.getSentenceInstance(Locale) bounded to maxLen.
    java.text.CharacterIterator getText()
    int last()
    int next()
    int next(int n)
    int preceding(int offset)
    Must be called with increasing offset.
    int previous()
    void setText(java.lang.String newText)
    void setText(java.text.CharacterIterator newText)

    Methods inherited from class java.text.BreakIterator

    clone, getAvailableLocales, getCharacterInstance, getCharacterInstance, getLineInstance, getLineInstance, getSentenceInstance, getSentenceInstance, getWordInstance, getWordInstance, isBoundary

    Methods inherited from class java.lang.Object

    equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Method Details

    • getText

      public java.text.CharacterIterator getText()
      Specified by:
      getText in class java.text.BreakIterator
    • setText

      public void setText(java.text.CharacterIterator newText)
      Specified by:
      setText in class java.text.BreakIterator
    • setText

      public void setText(java.lang.String newText)
      Overrides:
      setText in class java.text.BreakIterator
    • preceding

      public int preceding(int offset)
      Must be called with increasing offsets across successive calls. See FieldHighlighter for usage.
      Overrides:
      preceding in class java.text.BreakIterator
    • following

      public int following(int offset)
      Can be invoked only after a call to preceding(offset+1). See FieldHighlighter for usage.
      Specified by:
      following in class java.text.BreakIterator
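      The calling order required by preceding and following can be sketched with a plain JDK sentence iterator. This is an assumed pattern, not code taken from FieldHighlighter: the names PassageBoundsSketch and passageAround are hypothetical, and a standard BreakIterator stands in for the scanner.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Illustrative sketch only; the exact FieldHighlighter logic is not shown in this page.
class PassageBoundsSketch {

    // Hypothetical helper: returns {start, end} of the passage containing matchOffset.
    static int[] passageAround(BreakIterator bi, int textLength, int matchOffset) {
        // preceding(offset + 1) yields the last boundary at or before the match,
        // or DONE (-1) if there is none, hence the clamp to 0.
        int start = Math.max(0, bi.preceding(matchOffset + 1));
        // following(offset) is called only after preceding(offset + 1).
        int end = bi.following(matchOffset);
        if (end == BreakIterator.DONE) {
            end = textLength;
        }
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        String text = "First sentence. Second sentence here.";
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        bi.setText(text);
        // Match offsets are processed in increasing order.
        for (int matchOffset : new int[] {6, 23}) {
            int[] bounds = passageAround(bi, text.length(), matchOffset);
            System.out.println(matchOffset + " -> [" + text.substring(bounds[0], bounds[1]) + "]");
        }
    }
}
```

      A standard BreakIterator tolerates any call order, so the sketch only demonstrates the sequence a caller is expected to follow with this class.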
    • getSentence

      public static java.text.BreakIterator getSentence(java.util.Locale locale, int maxLen)
      Returns a BreakIterator.getSentenceInstance(Locale) bounded to maxLen. When a sentence exceeds maxLen, secondary boundaries are found using a BreakIterator.getWordInstance(Locale).
    • current

      public int current()
      Specified by:
      current in class java.text.BreakIterator
    • first

      public int first()
      Specified by:
      first in class java.text.BreakIterator
    • next

      public int next()
      Specified by:
      next in class java.text.BreakIterator
    • last

      public int last()
      Specified by:
      last in class java.text.BreakIterator
    • next

      public int next(int n)
      Specified by:
      next in class java.text.BreakIterator
    • previous

      public int previous()
      Specified by:
      previous in class java.text.BreakIterator