Class SetBackedScalingCuckooFilter

  • All Implemented Interfaces:
    Writeable

    public class SetBackedScalingCuckooFilter
    extends java.lang.Object
    implements Writeable
    An approximate set membership datastructure that scales as more unique values are inserted. Can definitively say if a member does not exist (no false negatives), but may say an item exists when it does not (has false positives). Similar in usage to a Bloom Filter. Internally, the datastructure maintains a Set of hashes up to a specified threshold. This provides 100% accurate membership queries. When the threshold is breached, a list of CuckooFilters are created and used to track membership. These filters are approximate similar to Bloom Filters. This datastructure scales as more values are inserted by growing the list of CuckooFilters. Final size is dependent on the cardinality of data inserted, and the precision specified.
    • Constructor Detail

      • SetBackedScalingCuckooFilter

        public SetBackedScalingCuckooFilter​(int threshold,
                                            java.util.Random rng,
                                            double fpp)
        Parameters:
        threshold - The number of distinct values that should be tracked before converting to an approximate representation
        rng - A random number generator needed for the cuckoo hashing process
        fpp - the false-positive rate that should be used for the cuckoo filters.
      • SetBackedScalingCuckooFilter

        public SetBackedScalingCuckooFilter​(StreamInput in,
                                            java.util.Random rng)
                                     throws java.io.IOException
        Throws:
        java.io.IOException
    • Method Detail

      • registerBreaker

        public void registerBreaker​(java.util.function.Consumer<java.lang.Long> breaker)
        Registers a circuit breaker with the datastructure. CuckooFilter's can "saturate" and refuse to accept any new values. When this happens, the datastructure scales by adding a new filter. This new filter's bytes will be tracked in the registered breaker when configured.
      • mightContain

        public boolean mightContain​(org.apache.lucene.util.BytesRef value)
        Returns true if the set might contain the provided value, false otherwise. False values are 100% accurate, while true values may be a false-positive.
      • mightContain

        public boolean mightContain​(long value)
        Returns true if the set might contain the provided value, false otherwise. False values are 100% accurate, while true values may be a false-positive.
      • add

        public void add​(org.apache.lucene.util.BytesRef value)
        Add's the provided value to the set for tracking
      • add

        public void add​(long value)
        Add's the provided value to the set for tracking
      • getSizeInBytes

        public long getSizeInBytes()
        Get the approximate size of this datastructure. Approximate because only the Set occupants are tracked, not the overhead of the Set itself.
      • merge

        public void merge​(SetBackedScalingCuckooFilter other)
        Merge `other` cuckoo filter into this cuckoo. After merging, this filter's state will be the union of the two. During the merging process, the internal Set may be upgraded to a cuckoo if it goes over threshold
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class java.lang.Object
      • equals

        public boolean equals​(java.lang.Object other)
        Overrides:
        equals in class java.lang.Object