Class BoundedBreakIteratorScanner

All Implemented Interfaces: Cloneable

public class BoundedBreakIteratorScanner extends BreakIterator
A custom break iterator that finds break-delimited passages bounded by a provided maximum length in the UnifiedHighlighter context. This class uses a BreakIterator to find the last break after the provided offset that would create a passage smaller than maxLen. If the BreakIterator cannot find a passage smaller than the maximum length, a secondary break iterator is used to re-split the passage at the first boundary after the maximum length.

This is useful for splitting passages created by BreakIterators like `sentence`, which can produce very long outlier passages on semi-structured text.

WARNING: This break iterator is designed to work with the UnifiedHighlighter.

TODO: We should be able to create passages incrementally, starting from the offset of the first match and expanding or not depending on the offsets of subsequent matches. This is currently impossible because FieldHighlighter uses only the first matching offset to derive the start and end of each passage.
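The re-splitting idea described above can be illustrated with a minimal, standalone sketch built only on `java.text.BreakIterator`. It is not this class's implementation and does not use the highlighter's offset-driven calls: the class name `BoundedSplitSketch` and the method `split` are hypothetical, and using a word iterator as the secondary iterator is an assumption made for the example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of Elasticsearch or Lucene.
class BoundedSplitSketch {

    // Splits text into sentence passages. Any sentence longer than maxLen is
    // re-split at the first word boundary after start + maxLen.
    static List<String> split(String text, int maxLen) {
        if (maxLen < 1) {
            throw new IllegalArgumentException("maxLen must be positive");
        }
        // Main iterator (sentences) and secondary iterator (words, an assumption).
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        sentences.setText(text);
        words.setText(text);

        List<String> passages = new ArrayList<>();
        int start = sentences.first();
        for (int end = sentences.next(); end != BreakIterator.DONE; end = sentences.next()) {
            // The sentence is too long: cut it with the secondary iterator.
            while (end - start > maxLen) {
                int cut = words.following(start + maxLen);
                if (cut == BreakIterator.DONE || cut >= end) {
                    break; // no usable inner boundary; keep the remainder whole
                }
                passages.add(text.substring(start, cut));
                start = cut;
            }
            passages.add(text.substring(start, end));
            start = end;
        }
        return passages;
    }
}
```

Calling `BoundedSplitSketch.split(text, 10)` on a long run of words with no sentence punctuation yields several short passages instead of one outlier. Because each cut lands on the first boundary after the limit, a passage may exceed `maxLen` by up to the length of the token that straddles the limit.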