public class DuplicateByteSequenceSpotter
extends java.lang.Object
(a)-(b)-(c)-(def as an int)
DuplicateByteSequenceSpotter.RootTreeNode
objects are used for the first two levels of the tree
(representing bytes a and b in the example sequence). The combinations of
objects at these 2 levels are few so internally these objects allocate an
array of 256 child node objects to quickly address children by indexing
directly into the densely packed array using a byte value. The third level in
the tree holds DuplicateByteSequenceSpotter.LightweightTreeNode
nodes that have few children
(typically much less than 256) and so use a dynamically-grown array to hold
child nodes as simple int primitives. These ints represent the final 3 bytes
of a sequence and also hold a count of the number of times the entire sequence
path has been visited (count is a single byte).
The Trie grows indefinitely as more content is added and while theoretically it could be massive (a 6-depth tree could produce 256^6 nodes) non-random content e.g English text contains fewer variations.
In future we may look at using one of these strategies when memory is tight:
Modifier and Type | Field | Description |
---|---|---|
static int |
MAX_HIT_COUNT |
|
static int |
TREE_DEPTH |
Constructor | Description |
---|---|
DuplicateByteSequenceSpotter() |
Modifier and Type | Method | Description |
---|---|---|
short |
addByte(byte b) |
Add a byte to the sequence.
|
long |
getEstimatedSizeInBytes() |
|
int[] |
getNodesAllocatedByDepth() |
|
int |
getNodesResizedByDepth() |
|
void |
startNewSequence() |
Reset the sequence detection logic to avoid any continuation of the
immediately previous bytes.
|
public static final int TREE_DEPTH
public static final int MAX_HIT_COUNT
public void startNewSequence()
public short addByte(byte b)
b
- the next byte in a sequencepublic final long getEstimatedSizeInBytes()
public int[] getNodesAllocatedByDepth()
public int getNodesResizedByDepth()