Class AVLTreeDigest


public class AVLTreeDigest extends AbstractTDigest
  • Constructor Details

    • AVLTreeDigest

      public AVLTreeDigest(double compression)
      A histogram structure that will record a sketch of a distribution.
      Parameters:
      compression - How should accuracy be traded for size? A value of N here will give quantile errors almost always less than 3/N with considerably smaller errors expected for extreme quantiles. Conversely, you should expect to track about 5 N centroids for this accuracy.
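      The trade-off described for the compression parameter can be tabulated with a few lines of arithmetic. The sketch below only restates the two rules of thumb given above (error below about 3/N, roughly 5 N centroids); the class and method names are hypothetical and are not part of this library.

```java
public class CompressionTradeoff {
    // Rule of thumb from the parameter description:
    // quantile errors are almost always less than 3 / N.
    public static double quantileErrorBound(double compression) {
        return 3.0 / compression;
    }

    // Rule of thumb from the parameter description:
    // expect to track about 5 * N centroids.
    public static double expectedCentroids(double compression) {
        return 5.0 * compression;
    }

    public static void main(String[] args) {
        // Print the approximate error bound and centroid count for a few settings.
        for (double n : new double[] {50, 100, 200}) {
            System.out.printf("compression=%.0f  error<%.4f  centroids~%.0f%n",
                    n, quantileErrorBound(n), expectedCentroids(n));
        }
    }
}
```

      Doubling the compression halves the approximate error bound and doubles the approximate number of centroids.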
  • Method Details

    • setRandomSeed

      public void setRandomSeed(long seed)
      Sets the seed for the RNG. Where a predictable tree is required, this method can be used to make the randomness in this AVLTree reproducible.
      Parameters:
      seed - The random seed to use for RNG purposes
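      The effect of fixing a seed can be illustrated with the JDK's own generator. This is a sketch that assumes the digest's RNG behaves like java.util.Random, which this page does not state; SeededRandomDemo and draw are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.Random;

public class SeededRandomDemo {
    // Draws `count` pseudo-random integers from a generator created with `seed`.
    public static int[] draw(long seed, int count) {
        Random gen = new Random(seed);
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = gen.nextInt(1000);
        }
        return out;
    }

    public static void main(String[] args) {
        // Two generators with the same seed produce the same sequence,
        // so any structure built from their output is reproducible.
        int[] first = draw(42L, 5);
        int[] second = draw(42L, 5);
        System.out.println(Arrays.equals(first, second)); // prints "true"
    }
}
```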
    • centroidCount

      public int centroidCount()
      Specified by:
      centroidCount in class TDigest
    • add

      public void add(double x, long w)
      Description copied from class: TDigest
      Adds a sample to a histogram.
      Specified by:
      add in class TDigest
      Parameters:
      x - The value to add.
      w - The weight of this point.
    • compress

      public void compress()
      Description copied from class: TDigest
      Re-examines a t-digest to determine whether some centroids are redundant. If your data are perversely ordered, this may be a good idea. Even if not, this may save 20% or so in space. The cost is roughly the same as adding as many data points as there are centroids. This is typically < 10 * compression, but could be as high as 100 * compression. This is a destructive operation that is not thread-safe.
      Specified by:
      compress in class TDigest
    • size

      public long size()
      Returns the number of samples represented in this histogram. If you want to know how many centroids are being used, try centroids().size().
      Specified by:
      size in class TDigest
      Returns:
      the number of samples that have been added.
    • cdf

      public double cdf(double x)
      Description copied from class: TDigest
      Returns the fraction of all points added which are ≤ x. Points that are exactly equal to x get half credit (i.e., the mid-point rule is used).
      Specified by:
      cdf in class TDigest
      Parameters:
      x - the value at which the CDF should be evaluated
      Returns:
      the approximate fraction of all samples that were less than or equal to x.
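      The mid-point rule can be shown on raw, unweighted samples. The sketch below is an exact reference computation, not the digest's centroid-based estimate; MidpointCdf is a hypothetical name.

```java
public class MidpointCdf {
    // Exact CDF over raw samples using the mid-point rule:
    // samples strictly below x count fully, samples equal to x count half.
    public static double cdf(double[] samples, double x) {
        int below = 0;
        int equal = 0;
        for (double s : samples) {
            if (s < x) {
                below++;
            } else if (s == x) {
                equal++;
            }
        }
        return (below + 0.5 * equal) / samples.length;
    }

    public static void main(String[] args) {
        double[] samples = {1.0, 2.0, 2.0, 3.0};
        // One sample is below 2.0 and two are equal to it: (1 + 0.5 * 2) / 4.
        System.out.println(cdf(samples, 2.0)); // prints "0.5"
    }
}
```

      A digest would approximate this value from its centroids rather than from the raw samples, so its result may differ slightly.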
    • quantile

      public double quantile(double q)
      Description copied from class: TDigest
      Returns an estimate of a cutoff such that a specified fraction of the data added to this TDigest would be less than or equal to the cutoff.
      Specified by:
      quantile in class TDigest
      Parameters:
      q - The quantile desired. Can be in the range [0,1].
      Returns:
      The minimum value x such that the estimated proportion of samples that are ≤ x is q.
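      For comparison, the same definition can be applied exactly to a small set of raw samples. This is a sketch of a non-interpolated reference quantile, not the digest's estimator; ExactQuantile is a hypothetical name, and rejecting q outside [0,1] with an exception is a choice made for this sketch.

```java
import java.util.Arrays;

public class ExactQuantile {
    // Returns the smallest sample x such that at least a fraction q
    // of the samples are less than or equal to x.
    public static double quantile(double[] samples, double q) {
        if (q < 0.0 || q > 1.0) {
            throw new IllegalArgumentException("q must be in [0,1], got " + q);
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        // ceil(q * n) samples must be <= x; convert that count to a 0-based index.
        int index = (int) Math.ceil(q * sorted.length) - 1;
        return sorted[Math.max(index, 0)];
    }

    public static void main(String[] args) {
        double[] samples = {4.0, 1.0, 3.0, 2.0};
        System.out.println(quantile(samples, 0.5)); // prints "2.0"
    }
}
```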
    • centroids

      public Collection<Centroid> centroids()
      Description copied from class: TDigest
      A Collection that lets you go through the centroids in ascending order by mean. Centroids returned will not be re-used, but may or may not share storage with this TDigest.
      Specified by:
      centroids in class TDigest
      Returns:
      The centroids in the form of a Collection.
    • compression

      public double compression()
      Description copied from class: TDigest
      Returns the current compression factor.
      Specified by:
      compression in class TDigest
      Returns:
      The compression factor originally used to set up the TDigest.
    • byteSize

      public int byteSize()
      Returns an upper bound on the number of bytes that will be required to represent this histogram.
      Specified by:
      byteSize in class TDigest
      Returns:
      The number of bytes required.