Class SetBackedScalingCuckooFilter

java.lang.Object
org.elasticsearch.common.util.SetBackedScalingCuckooFilter
All Implemented Interfaces:
Writeable

public class SetBackedScalingCuckooFilter
extends java.lang.Object
implements Writeable
An approximate set membership datastructure that scales as more unique values are inserted. Can definitively say if a member does not exist (no false negatives), but may say an item exists when it does not (has false positives). Similar in usage to a Bloom Filter.

Internally, the datastructure maintains a Set of hashes up to a specified threshold. This provides 100% accurate membership queries.

When the threshold is breached, a list of CuckooFilters are created and used to track membership. These filters are approximate similar to Bloom Filters.

This datastructure scales as more values are inserted by growing the list of CuckooFilters. Final size is dependent on the cardinality of data inserted, and the precision specified.

  • Nested Class Summary

    Nested classes/interfaces inherited from interface org.elasticsearch.common.io.stream.Writeable

    Writeable.Reader<V>, Writeable.Writer<V>
  • Constructor Summary

    Constructors
    Constructor Description
    SetBackedScalingCuckooFilter​(int threshold, java.util.Random rng, double fpp)  
    SetBackedScalingCuckooFilter​(StreamInput in, java.util.Random rng)  
    SetBackedScalingCuckooFilter​(SetBackedScalingCuckooFilter other)  
  • Method Summary

    Modifier and Type Method Description
    void add​(long value)
    Add's the provided value to the set for tracking
    void add​(org.apache.lucene.util.BytesRef value)
    Add's the provided value to the set for tracking
    boolean equals​(java.lang.Object other)  
    long getSizeInBytes()
    Get the approximate size of this datastructure.
    int hashCode()  
    void merge​(SetBackedScalingCuckooFilter other)
    Merge `other` cuckoo filter into this cuckoo.
    boolean mightContain​(long value)
    Returns true if the set might contain the provided value, false otherwise.
    boolean mightContain​(org.apache.lucene.util.BytesRef value)
    Returns true if the set might contain the provided value, false otherwise.
    void registerBreaker​(java.util.function.Consumer<java.lang.Long> breaker)
    Registers a circuit breaker with the datastructure.
    void writeTo​(StreamOutput out)
    Write this into the StreamOutput.

    Methods inherited from class java.lang.Object

    clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • SetBackedScalingCuckooFilter

      public SetBackedScalingCuckooFilter​(int threshold, java.util.Random rng, double fpp)
      Parameters:
      threshold - The number of distinct values that should be tracked before converting to an approximate representation
      rng - A random number generator needed for the cuckoo hashing process
      fpp - the false-positive rate that should be used for the cuckoo filters.
    • SetBackedScalingCuckooFilter

      public SetBackedScalingCuckooFilter​(SetBackedScalingCuckooFilter other)
    • SetBackedScalingCuckooFilter

      public SetBackedScalingCuckooFilter​(StreamInput in, java.util.Random rng) throws java.io.IOException
      Throws:
      java.io.IOException
  • Method Details

    • writeTo

      public void writeTo​(StreamOutput out) throws java.io.IOException
      Description copied from interface: Writeable
      Write this into the StreamOutput.
      Specified by:
      writeTo in interface Writeable
      Throws:
      java.io.IOException
    • registerBreaker

      public void registerBreaker​(java.util.function.Consumer<java.lang.Long> breaker)
      Registers a circuit breaker with the datastructure. CuckooFilter's can "saturate" and refuse to accept any new values. When this happens, the datastructure scales by adding a new filter. This new filter's bytes will be tracked in the registered breaker when configured.
    • mightContain

      public boolean mightContain​(org.apache.lucene.util.BytesRef value)
      Returns true if the set might contain the provided value, false otherwise. False values are 100% accurate, while true values may be a false-positive.
    • mightContain

      public boolean mightContain​(long value)
      Returns true if the set might contain the provided value, false otherwise. False values are 100% accurate, while true values may be a false-positive.
    • add

      public void add​(org.apache.lucene.util.BytesRef value)
      Add's the provided value to the set for tracking
    • add

      public void add​(long value)
      Add's the provided value to the set for tracking
    • getSizeInBytes

      public long getSizeInBytes()
      Get the approximate size of this datastructure. Approximate because only the Set occupants are tracked, not the overhead of the Set itself.
    • merge

      public void merge​(SetBackedScalingCuckooFilter other)
      Merge `other` cuckoo filter into this cuckoo. After merging, this filter's state will be the union of the two. During the merging process, the internal Set may be upgraded to a cuckoo if it goes over threshold
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class java.lang.Object
    • equals

      public boolean equals​(java.lang.Object other)
      Overrides:
      equals in class java.lang.Object