Enum Class ScaleFunction

java.lang.Object
java.lang.Enum<ScaleFunction>
org.elasticsearch.tdigest.ScaleFunction
All Implemented Interfaces:
Serializable, Comparable<ScaleFunction>, Constable

public enum ScaleFunction extends Enum<ScaleFunction>
Encodes the various scale functions for t-digests. These functions trade accuracy near the tails against accuracy near the median in different ways. For instance, K_0 has uniform cluster sizes and results in constant accuracy (in terms of q), while K_3 has cluster sizes proportional to min(q, 1-q), which results in much smaller error near the tails and modestly increased error near the median.

The base forms (K_0, K_1, K_2 and K_3) all result in t-digests limited to a number of clusters equal to the compression factor. The K_2_NO_NORM and K_3_NO_NORM versions result in the cluster count increasing roughly with log(n).
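As a rough illustration of this trade-off, the following Java sketch compares the largest relative cluster size that K_0 and K_3 permit at a tail quantile and at the median. The formulas (including the normalizer used for K_3) are assumptions modeled on the reference t-digest implementation and may not match org.elasticsearch.tdigest exactly; the class and method names are hypothetical.

```java
// Hypothetical sketch, not the org.elasticsearch.tdigest source.
// Compares the maximum relative cluster size allowed by K_0 and K_3.
class TailVsMedian {
    // K_0: every cluster may hold up to 2/compression of the data; q is ignored.
    static double maxK0(double q, double compression) {
        return 2 / compression;
    }

    // K_3: the limit is proportional to min(q, 1-q). The normalizer form
    // compression / (4 ln(n/compression) + 21) is an assumption.
    static double maxK3(double q, double compression, double n) {
        double normalizer = compression / (4 * Math.log(n / compression) + 21);
        return Math.min(q, 1 - q) / normalizer;
    }

    public static void main(String[] args) {
        double compression = 100;
        double n = 1_000_000;
        for (double q : new double[] {0.001, 0.5}) {
            System.out.printf("q=%.3f  K_0 max=%.6f  K_3 max=%.6f%n",
                    q, maxK0(q, compression), maxK3(q, compression, n));
        }
    }
}
```

With these parameters K_3 allows much smaller clusters than K_0 at q = 0.001 and much larger ones at the median.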

  • Nested Class Summary

    Nested classes/interfaces inherited from class java.lang.Enum

    Enum.EnumDesc<E extends Enum<E>>
  • Enum Constant Summary

    Enum Constants
    Enum Constant
    Description
    K_0
    Generates uniform cluster sizes.
    K_1
    Generates cluster sizes proportional to sqrt(q*(1-q)).
    K_1_FAST
    Generates cluster sizes proportional to sqrt(q*(1-q)) but avoids computation of asin in the critical path by using an approximate version.
    K_2
    Generates cluster sizes proportional to q*(1-q).
    K_2_NO_NORM
    Generates cluster sizes proportional to q*(1-q).
    K_3
    Generates cluster sizes proportional to min(q, 1-q).
    K_3_NO_NORM
    Generates cluster sizes proportional to min(q, 1-q).
  • Method Summary

    Modifier and Type
    Method
    Description
    abstract double
    k(double q, double normalizer)
    Converts a quantile to the k-scale.
    abstract double
    k(double q, double compression, double n)
    Converts a quantile to the k-scale.
    abstract double
    max(double q, double normalizer)
    Computes the maximum relative size a cluster can have at quantile q.
    abstract double
    max(double q, double compression, double n)
    Computes the maximum relative size a cluster can have at quantile q.
    abstract double
    normalizer(double compression, double n)
    Computes the normalizer given compression and number of points.
    abstract double
    q(double k, double normalizer)
    Computes q as a function of k.
    abstract double
    q(double k, double compression, double n)
    Computes q as a function of k.
    static ScaleFunction
    valueOf(String name)
    Returns the enum constant of this class with the specified name.
    static ScaleFunction[]
    values()
    Returns an array containing the constants of this enum class, in the order they are declared.

    Methods inherited from class java.lang.Object

    getClass, notify, notifyAll, wait, wait, wait
  • Enum Constant Details

    • K_0

      public static final ScaleFunction K_0
      Generates uniform cluster sizes. Used for comparison only.
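      A minimal sketch of what K_0 might look like, assuming the linear k-scale of the reference t-digest (k = normalizer * q, with normalizer = compression / 2). The class below is a hypothetical stand-in, not the enum constant itself.

```java
// Hypothetical stand-in for K_0; formulas are assumed, not copied from Elasticsearch.
class K0Sketch {
    // The normalizer does not depend on n for K_0.
    static double normalizer(double compression, double n) {
        return compression / 2;
    }

    // k grows linearly with q.
    static double k(double q, double normalizer) {
        return normalizer * q;
    }

    // Exact inverse of k.
    static double q(double k, double normalizer) {
        return k / normalizer;
    }

    // One unit of k covers the same range of q everywhere, so the limit is constant.
    static double max(double q, double normalizer) {
        return 1 / normalizer;
    }

    public static void main(String[] args) {
        double norm = normalizer(100, 1_000);
        System.out.println(k(0.25, norm));                      // 12.5
        System.out.println(max(0.01, norm) == max(0.5, norm));  // true
    }
}
```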
    • K_1

      public static final ScaleFunction K_1
      Generates cluster sizes proportional to sqrt(q*(1-q)). This gives constant relative accuracy if accuracy is proportional to squared cluster size. It is expected that K_2 and K_3 will give better practical results.
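      The following sketch shows one way to write the arcsine scale behind K_1, assuming the reference t-digest formula k = compression * asin(2q - 1) / (2π) and its inverse. Under this assumption, away from the ends one unit of k centered at q covers exactly 2 sin(π/compression) sqrt(q(1-q)) of the quantile range, which is where the sqrt(q*(1-q)) cluster-size limit comes from. Names are hypothetical.

```java
// Hypothetical stand-in for K_1 (arcsine scale); formulas are assumptions.
class K1Sketch {
    // q -> k; q is clamped so that the asin argument stays inside [-1, 1].
    static double k(double q, double compression) {
        double clamped = Math.min(Math.max(q, 0.0), 1.0);
        return compression * Math.asin(2 * clamped - 1) / (2 * Math.PI);
    }

    // k -> q; k ranges over [-compression/4, compression/4] for q in [0, 1].
    static double q(double k, double compression) {
        double limit = compression / 4;
        double clamped = Math.min(Math.max(k, -limit), limit);
        return (Math.sin(clamped * 2 * Math.PI / compression) + 1) / 2;
    }

    // Width in q of one unit of k centered at q.
    static double max(double q, double compression) {
        if (q <= 0 || q >= 1) {
            return 0;
        }
        return 2 * Math.sin(Math.PI / compression) * Math.sqrt(q * (1 - q));
    }

    public static void main(String[] args) {
        double compression = 100;
        double kValue = k(0.2, compression);
        System.out.printf("k(0.2)=%.4f, q(k)=%.4f%n", kValue, q(kValue, compression));
        System.out.printf("max(0.01)=%.6f, max(0.5)=%.6f%n",
                max(0.01, compression), max(0.5, compression));
    }
}
```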
    • K_1_FAST

      public static final ScaleFunction K_1_FAST
      Generates cluster sizes proportional to sqrt(q*(1-q)) but avoids computation of asin in the critical path by using an approximate version.
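      The idea of trading a little accuracy for speed can be sketched with a cheap polynomial stand-in for asin. This is not the approximation K_1_FAST uses: it swaps in a classic cubic approximation from Abramowitz and Stegun's Handbook of Mathematical Functions, whose absolute error stays below about 1e-4, purely to illustrate the technique. Names are hypothetical.

```java
// Illustrative only: a polynomial asin approximation (Abramowitz and Stegun),
// not the approximation used by K_1_FAST.
class FastAsinSketch {
    static double fastAsin(double x) {
        boolean negative = x < 0;
        double a = Math.abs(x);
        // asin(a) ~= pi/2 - sqrt(1 - a) * (a0 + a1*a + a2*a^2 + a3*a^3) for 0 <= a <= 1
        double poly = ((-0.0187293 * a + 0.0742610) * a - 0.2121144) * a + 1.5707288;
        double result = Math.PI / 2 - Math.sqrt(1 - a) * poly;
        return negative ? -result : result;   // asin is an odd function
    }

    // K_1-style k-scale using the approximation instead of Math.asin.
    static double kFast(double q, double compression) {
        return compression * fastAsin(2 * q - 1) / (2 * Math.PI);
    }

    public static void main(String[] args) {
        double worst = 0;
        for (int i = 0; i <= 1000; i++) {
            double x = -1 + 2.0 * i / 1000;
            worst = Math.max(worst, Math.abs(fastAsin(x) - Math.asin(x)));
        }
        System.out.printf("max |fastAsin - asin| on [-1, 1]: %.2e%n", worst);
    }
}
```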
    • K_2

      public static final ScaleFunction K_2
      Generates cluster sizes proportional to q*(1-q). This makes tail error bounds tighter than for K_1. The use of a normalizing function results in a strictly bounded number of clusters, no matter how many samples are added.
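      A sketch of the logit-shaped scale behind K_2, assuming the reference t-digest form k = normalizer * log(q / (1 - q)) with normalizer = compression / (4 log(n / compression) + 24). The constants are assumptions and the class is a hypothetical stand-in. Because dq/dk = q(1-q)/normalizer for this scale, the limit max(q) = q(1-q)/normalizer follows directly.

```java
// Hypothetical stand-in for K_2; formulas and constants are assumptions.
class K2Sketch {
    // Shrinks slowly as n grows; this is what keeps the cluster count bounded.
    static double normalizer(double compression, double n) {
        return compression / (4 * Math.log(n / compression) + 24);
    }

    // Logit scale: diverges at q = 0 and q = 1, so q is kept strictly inside (0, 1).
    static double k(double q, double normalizer) {
        double clamped = Math.min(Math.max(q, 1e-15), 1 - 1e-15);
        return normalizer * Math.log(clamped / (1 - clamped));
    }

    // Inverse of k: the logistic function.
    static double q(double k, double normalizer) {
        return 1 / (1 + Math.exp(-k / normalizer));
    }

    // dq/dk = q(1-q)/normalizer, so one unit of k spans about this much of q.
    static double max(double q, double normalizer) {
        return q * (1 - q) / normalizer;
    }

    public static void main(String[] args) {
        double norm = normalizer(100, 1_000_000);
        System.out.printf("normalizer=%.4f%n", norm);
        System.out.printf("max(0.001)=%.6f, max(0.5)=%.6f%n", max(0.001, norm), max(0.5, norm));
    }
}
```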
    • K_3

      public static final ScaleFunction K_3
      Generates cluster sizes proportional to min(q, 1-q). This makes tail error bounds tighter than for K_1 or K_2. The use of a normalizing function results in a strictly bounded number of clusters, no matter how many samples are added.
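      The same pattern can be sketched for K_3, assuming the reference t-digest form k = normalizer * log(2q) below the median (mirrored above it) and normalizer = compression / (4 log(n / compression) + 21). For this scale dq/dk = min(q, 1-q)/normalizer, which gives the size limit. Constants and names are assumptions.

```java
// Hypothetical stand-in for K_3; formulas and constants are assumptions.
class K3Sketch {
    static double normalizer(double compression, double n) {
        return compression / (4 * Math.log(n / compression) + 21);
    }

    // log(2q) below the median, mirrored above it so that k(0.5) = 0.
    static double k(double q, double normalizer) {
        double clamped = Math.min(Math.max(q, 1e-15), 1 - 1e-15);
        if (clamped <= 0.5) {
            return normalizer * Math.log(2 * clamped);
        }
        return -normalizer * Math.log(2 * (1 - clamped));
    }

    // Inverse of k, split at k = 0 (the median).
    static double q(double k, double normalizer) {
        if (k <= 0) {
            return Math.exp(k / normalizer) / 2;
        }
        return 1 - Math.exp(-k / normalizer) / 2;
    }

    // dq/dk = min(q, 1-q)/normalizer.
    static double max(double q, double normalizer) {
        return Math.min(q, 1 - q) / normalizer;
    }

    public static void main(String[] args) {
        double norm = normalizer(100, 1_000_000);
        for (double p : new double[] {0.001, 0.25, 0.5, 0.75, 0.999}) {
            System.out.printf("q=%.3f  k=%8.4f  max=%.6f%n", p, k(p, norm), max(p, norm));
        }
    }
}
```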
    • K_2_NO_NORM

      public static final ScaleFunction K_2_NO_NORM
      Generates cluster sizes proportional to q*(1-q). This makes the tail error bounds tighter. This version does not use a normalizer function, so the number of clusters grows roughly in proportion to log(n). That is good for accuracy but bad for size, and it makes this scale a poor fit for the statically allocated MergingDigest; it can, however, be useful for tree-based implementations.
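      The log(n) growth can be illustrated by measuring how much of the k-scale a digest of n samples spans. The sketch assumes the un-normalized form k = compression * log(q / (1 - q)) and treats 1/n and 1 - 1/n as the most extreme quantiles worth resolving; if each cluster covers at most one unit of k, this span is a rough proxy for the number of clusters. Both choices, and the names, are illustrative assumptions.

```java
// Hypothetical sketch: k-span of an un-normalized K_2-style scale as n grows.
class K2NoNormSpan {
    // Assumed un-normalized form: k = compression * log(q / (1 - q)).
    static double k(double q, double compression) {
        return compression * Math.log(q / (1 - q));
    }

    // Span of k between the quantiles 1/n and 1 - 1/n (equals 2 * compression * log(n - 1)).
    static double span(double compression, double n) {
        return k(1 - 1 / n, compression) - k(1 / n, compression);
    }

    public static void main(String[] args) {
        for (double n : new double[] {1e3, 1e6, 1e9}) {
            System.out.printf("n=%.0e  k span=%.1f%n", n, span(100, n));
        }
    }
}
```

Each thousandfold increase in n adds roughly the same amount to the span, which is the logarithmic growth described above.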
    • K_3_NO_NORM

      public static final ScaleFunction K_3_NO_NORM
      Generates cluster sizes proportional to min(q, 1-q). This makes the tail error bounds tighter. This version does not use a normalizer function, so the number of clusters grows roughly in proportion to log(n). That is good for accuracy but bad for size, and it makes this scale a poor fit for the statically allocated MergingDigest; it can, however, be useful for tree-based implementations.
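      To contrast K_3_NO_NORM with K_3, the sketch below measures the span of k between the quantiles 1/n and 1 - 1/n (an illustrative choice) for both variants. It assumes k = scale * log(2q) below the median, mirrored above it, with scale = compression for K_3_NO_NORM and scale = compression / (4 log(n / compression) + 21) for K_3. These forms follow the reference t-digest and may differ from the Elasticsearch code; names are hypothetical.

```java
// Hypothetical comparison of K_3 with and without a normalizer; formulas are assumptions.
class K3SpanComparison {
    // log(2q) below the median, mirrored above it.
    static double k(double q, double scale) {
        return q <= 0.5 ? scale * Math.log(2 * q) : -scale * Math.log(2 * (1 - q));
    }

    // Span of k between the quantiles 1/n and 1 - 1/n.
    static double span(double scale, double n) {
        return k(1 - 1 / n, scale) - k(1 / n, scale);
    }

    // Assumed K_3 normalizer; the _NO_NORM variant uses compression directly.
    static double normalizedScale(double compression, double n) {
        return compression / (4 * Math.log(n / compression) + 21);
    }

    public static void main(String[] args) {
        double compression = 100;
        for (double n : new double[] {1e3, 1e6, 1e9}) {
            System.out.printf("n=%.0e  K_3 span=%.1f  K_3_NO_NORM span=%.1f%n",
                    n, span(normalizedScale(compression, n), n), span(compression, n));
        }
    }
}
```

With compression = 100, the normalized span levels off below 50 while the un-normalized span keeps growing with log(n).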
  • Method Details

    • values

      public static ScaleFunction[] values()
      Returns an array containing the constants of this enum class, in the order they are declared.
      Returns:
      an array containing the constants of this enum class, in the order they are declared
    • valueOf

      public static ScaleFunction valueOf(String name)
      Returns the enum constant of this class with the specified name. The string must match exactly an identifier used to declare an enum constant in this class. (Extraneous whitespace characters are not permitted.)
      Parameters:
      name - the name of the enum constant to be returned.
      Returns:
      the enum constant with the specified name
      Throws:
      IllegalArgumentException - if this enum class has no constant with the specified name
      NullPointerException - if the argument is null
    • k

      public abstract double k(double q, double compression, double n)
      Converts a quantile to the k-scale. The total number of points is also provided so that a normalizing function can be computed if necessary.
      Parameters:
      q - The quantile
      compression - Also known as delta in the literature on the t-digest
      n - The total number of samples
      Returns:
      The corresponding value of k
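      The two k overloads are related through normalizer(compression, n). The sketch below, which uses an assumed K_2-style formula and hypothetical names, treats the three-argument form as a convenience that computes the normalizer and delegates to the two-argument form; precomputing the normalizer once is cheaper when k is evaluated for many quantiles of the same digest.

```java
// Hypothetical sketch of how the two k overloads relate (K_2-style formula assumed).
class KOverloads {
    static double normalizer(double compression, double n) {
        return compression / (4 * Math.log(n / compression) + 24);
    }

    // Two-argument form: the caller supplies a precomputed normalizer.
    static double k(double q, double normalizer) {
        return normalizer * Math.log(q / (1 - q));
    }

    // Three-argument form: recomputes the normalizer on every call, then delegates.
    static double k(double q, double compression, double n) {
        return k(q, normalizer(compression, n));
    }

    public static void main(String[] args) {
        double compression = 200;
        double n = 50_000;
        double norm = normalizer(compression, n);   // computed once for the whole loop
        for (double q : new double[] {0.1, 0.5, 0.9}) {
            System.out.printf("q=%.1f  k=%.4f%n", q, k(q, norm));
        }
    }
}
```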
    • k

      public abstract double k(double q, double normalizer)
      Converts a quantile to the k-scale. The normalizer value depends on the compression and (possibly) the number of points in the digest; see normalizer(double, double).
      Parameters:
      q - The quantile
      normalizer - The normalizer value, which depends on the compression and (possibly) the number of points in the digest.
      Returns:
      The corresponding value of k
    • q

      public abstract double q(double k, double compression, double n)
      Computes q as a function of k. This is often faster than finding k as a function of q for some scales.
      Parameters:
      k - The index value to convert into the q-scale.
      compression - The compression factor (often written as δ)
      n - The number of samples already in the digest.
      Returns:
      The value of q that corresponds to k
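      One typical use of q is to find how far a cluster may extend: under the t-digest constraint that a cluster spans at most one unit of k, a cluster whose left edge sits at q0 can grow up to q(k(q0) + 1). The sketch below illustrates this with an assumed K_2-style logit scale and hypothetical names; it is not taken from the Elasticsearch merging code.

```java
// Hypothetical illustration: using q(k) to find the right edge a cluster may reach.
class NextClusterLimit {
    // Assumed K_2-style scale: k = normalizer * log(q / (1 - q)).
    static double k(double q, double normalizer) {
        return normalizer * Math.log(q / (1 - q));
    }

    // Inverse of k (logistic function).
    static double q(double k, double normalizer) {
        return 1 / (1 + Math.exp(-k / normalizer));
    }

    // A cluster starting at q0 may extend until it covers one unit of k.
    static double limit(double q0, double normalizer) {
        return q(k(q0, normalizer) + 1, normalizer);
    }

    public static void main(String[] args) {
        double norm = 100 / (4 * Math.log(1e6 / 100) + 24);   // assumed normalizer
        for (double q0 : new double[] {0.001, 0.5, 0.99}) {
            double right = limit(q0, norm);
            System.out.printf("q0=%.3f  limit=%.6f  width=%.6f%n", q0, right, right - q0);
        }
    }
}
```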
    • q

      public abstract double q(double k, double normalizer)
      Computes q as a function of k. This is often faster than finding k as a function of q for some scales.
      Parameters:
      k - The index value to convert into the q-scale.
      normalizer - The normalizer value, which depends on the compression and (possibly) the number of points in the digest.
      Returns:
      The value of q that corresponds to k
    • max

      public abstract double max(double q, double compression, double n)
      Computes the maximum relative size a cluster can have at quantile q. Note that it isn't clear exactly where within the range spanned by a cluster q should be taken, so this function usually has to be evaluated at multiple points and the smallest value used.

      Note that this is the relative size of a cluster. To get the maximum number of samples in the cluster, multiply this value by the total number of samples in the digest.

      Parameters:
      q - The quantile
      compression - The compression factor, typically delta in the literature
      n - The number of samples seen so far in the digest
      Returns:
      The maximum relative size of a cluster at quantile q, as a fraction of the total number of samples
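      Because it is unclear where in a cluster's range q should be evaluated, a caller can sample max at several points, keep the smallest value, and multiply by n to get a sample count. The sketch below does this at the two edges and the midpoint of a candidate range, using an assumed K_3-style limit min(q, 1-q) / normalizer; the three-point choice and the names are hypothetical.

```java
// Hypothetical sketch: conservative cluster-size limit over a range of quantiles.
class ClusterSizeLimit {
    // Assumed K_3-style relative limit at quantile q.
    static double max(double q, double normalizer) {
        return Math.min(q, 1 - q) / normalizer;
    }

    // Evaluate max at both edges and the midpoint of [qLeft, qRight], keep the smallest,
    // and convert the relative size into a number of samples.
    static double maxSamples(double qLeft, double qRight, double normalizer, double n) {
        double mid = (qLeft + qRight) / 2;
        double relative = Math.min(max(qLeft, normalizer),
                Math.min(max(mid, normalizer), max(qRight, normalizer)));
        return relative * n;
    }

    public static void main(String[] args) {
        double normalizer = 1.0;
        double n = 1_000;
        System.out.println(maxSamples(0.01, 0.02, normalizer, n));
        System.out.println(maxSamples(0.45, 0.55, normalizer, n));
    }
}
```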
    • max

      public abstract double max(double q, double normalizer)
      Computes the maximum relative size a cluster can have at quantile q. Note that it isn't clear exactly where within the range spanned by a cluster q should be taken, so this function usually has to be evaluated at multiple points and the smallest value used.

      Note that this is the relative size of a cluster. To get the maximum number of samples in the cluster, multiply this value by the total number of samples in the digest.

      Parameters:
      q - The quantile
      normalizer - The normalizer value, which depends on the compression and (possibly) the number of points in the digest.
      Returns:
      The maximum relative size of a cluster at quantile q, as a fraction of the total number of samples
    • normalizer

      public abstract double normalizer(double compression, double n)
      Computes the normalizer given compression and number of points.
      Parameters:
      compression - The compression parameter for the digest
      n - The number of samples seen so far
      Returns:
      The normalizing factor for the scale function
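      To see how the normalizer can depend on n, the sketch below evaluates three normalizers assumed from the reference t-digest: compression / 2 for K_0 (independent of n), compression / (4 log(n / compression) + 24) for K_2, and plain compression for the _NO_NORM variants. The formulas and names are assumptions, not the Elasticsearch source.

```java
// Hypothetical normalizers; the formulas are assumed from the reference t-digest.
class NormalizerSketch {
    static double k0(double compression, double n) {
        return compression / 2;          // does not depend on n
    }

    static double k2(double compression, double n) {
        return compression / (4 * Math.log(n / compression) + 24);   // shrinks slowly with n
    }

    static double noNorm(double compression, double n) {
        return compression;              // no normalization at all
    }

    public static void main(String[] args) {
        double compression = 100;
        for (double n : new double[] {1e4, 1e6, 1e8}) {
            System.out.printf("n=%.0e  K_0=%.3f  K_2=%.3f  NO_NORM=%.3f%n",
                    n, k0(compression, n), k2(compression, n), noNorm(compression, n));
        }
    }
}
```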