Class SetBackedScalingCuckooFilter

java.lang.Object
org.elasticsearch.common.util.SetBackedScalingCuckooFilter
All Implemented Interfaces:
Writeable

public class SetBackedScalingCuckooFilter extends Object implements Writeable
An approximate set membership datastructure that scales as more unique values are inserted. Can definitively say if a member does not exist (no false negatives), but may say an item exists when it does not (has false positives). Similar in usage to a Bloom Filter.

Internally, the datastructure maintains a Set of hashes up to a specified threshold. This provides 100% accurate membership queries.

When the threshold is breached, a list of CuckooFilters are created and used to track membership. These filters are approximate similar to Bloom Filters.

This datastructure scales as more values are inserted by growing the list of CuckooFilters. Final size is dependent on the cardinality of data inserted, and the precision specified.

  • Constructor Details

    • SetBackedScalingCuckooFilter

      public SetBackedScalingCuckooFilter(int threshold, Random rng, double fpp)
      Parameters:
      threshold - The number of distinct values that should be tracked before converting to an approximate representation
      rng - A random number generator needed for the cuckoo hashing process
      fpp - the false-positive rate that should be used for the cuckoo filters.
    • SetBackedScalingCuckooFilter

      public SetBackedScalingCuckooFilter(StreamInput in, Random rng) throws IOException
      Throws:
      IOException
  • Method Details

    • writeTo

      public void writeTo(StreamOutput out) throws IOException
      Description copied from interface: Writeable
      Write this into the StreamOutput.
      Specified by:
      writeTo in interface Writeable
      Throws:
      IOException
    • getThreshold

      public int getThreshold()
      Returns the number of distinct values that are tracked before converting to an approximate representation.
    • getRng

      public Random getRng()
      Returns the random number generator used for the cuckoo hashing process.
    • getFpp

      public double getFpp()
      Returns the false-positive rate used for the cuckoo filters.
    • registerBreaker

      public void registerBreaker(Consumer<Long> breaker)
      Registers a circuit breaker with the datastructure. CuckooFilter's can "saturate" and refuse to accept any new values. When this happens, the datastructure scales by adding a new filter. This new filter's bytes will be tracked in the registered breaker when configured.
    • mightContain

      public boolean mightContain(org.apache.lucene.util.BytesRef value)
      Returns true if the set might contain the provided value, false otherwise. False values are 100% accurate, while true values may be a false-positive.
    • mightContain

      public boolean mightContain(long value)
      Returns true if the set might contain the provided value, false otherwise. False values are 100% accurate, while true values may be a false-positive.
    • add

      public void add(org.apache.lucene.util.BytesRef value)
      Add's the provided value to the set for tracking
    • add

      public void add(long value)
      Add's the provided value to the set for tracking
    • getSizeInBytes

      public long getSizeInBytes()
      Get the approximate size of this datastructure. Approximate because only the Set occupants are tracked, not the overhead of the Set itself.
    • merge

      public void merge(SetBackedScalingCuckooFilter other)
      Merge `other` cuckoo filter into this cuckoo. After merging, this filter's state will be the union of the two. During the merging process, the internal Set may be upgraded to a cuckoo if it goes over threshold
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • equals

      public boolean equals(Object other)
      Overrides:
      equals in class Object