org.elasticsearch.tdigest.ScaleFunction

All Implemented Interfaces: Serializable, Comparable<ScaleFunction>, Constable

public enum ScaleFunction extends Enum<ScaleFunction>

Encodes the various scale functions for t-digests. Each scale function limits cluster sizes in a different way, trading accuracy near the tails against accuracy near the median. For instance, K_0 has uniform cluster sizes and gives constant accuracy (in terms of q), while K_3 has cluster sizes proportional to min(q, 1-q), which gives much smaller error near the tails and modestly increased error near the median.

The base forms (K_0, K_1, K_2 and K_3) all result in t-digests limited to a number of clusters equal to the compression factor. The K_2_NO_NORM and K_3_NO_NORM versions result in the cluster count increasing roughly with log(n), where n is the total number of samples.
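
To make the trade-off concrete, the sketch below evaluates the cluster-size shapes named above at a few quantiles. It is a standalone illustration, not the Elasticsearch implementation: the class `ClusterShapeSketch` and its methods are hypothetical, and the values are unnormalized shapes only (the real scale functions also depend on the compression factor).

```java
import java.util.Locale;

/** Hypothetical illustration of the cluster-size shapes; not part of org.elasticsearch.tdigest. */
final class ClusterShapeSketch {
    /** K_0-style shape: the same relative size at every quantile. */
    static double uniform(double q) {
        return 1.0;
    }

    /** K_1-style shape: proportional to sqrt(q*(1-q)). */
    static double sqrtShape(double q) {
        return Math.sqrt(q * (1 - q));
    }

    /** K_2-style shape: proportional to q*(1-q). */
    static double productShape(double q) {
        return q * (1 - q);
    }

    /** K_3-style shape: proportional to min(q, 1-q). */
    static double minShape(double q) {
        return Math.min(q, 1 - q);
    }

    public static void main(String[] args) {
        double[] quantiles = {0.001, 0.01, 0.1, 0.5};
        for (double q : quantiles) {
            System.out.printf(Locale.ROOT, "q=%.3f uniform=%.4f sqrt=%.4f product=%.4f min=%.4f%n",
                q, uniform(q), sqrtShape(q), productShape(q), minShape(q));
        }
    }
}
```

A smaller value near q = 0 or q = 1, relative to the value at q = 0.5, means that tail clusters are kept smaller relative to clusters at the median, which is where the tighter tail accuracy comes from.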

Nested Class Summary

Nested classes/interfaces inherited from class java.lang.Enum
Enum.EnumDesc<E extends Enum<E>>

Enum Constant Summary

Enum Constants

Enum Constant - Description

K_0 - Generates uniform cluster sizes.

K_1 - Generates cluster sizes proportional to sqrt(q*(1-q)).

K_1_FAST - Generates cluster sizes proportional to sqrt(q*(1-q)) but avoids computation of asin in the critical path by using an approximate version.

K_2 - Generates cluster sizes proportional to q*(1-q).

K_2_NO_NORM - Generates cluster sizes proportional to q*(1-q), without using a normalizer function.

K_3 - Generates cluster sizes proportional to min(q, 1-q).

K_3_NO_NORM - Generates cluster sizes proportional to min(q, 1-q), without using a normalizer function.

Method Summary

Modifier and Type, Method - Description

abstract double k(double q, double normalizer) - Converts a quantile to the k-scale.

abstract double k(double q, double compression, double n) - Converts a quantile to the k-scale.

abstract double max(double q, double normalizer) - Computes the maximum relative size a cluster can have at quantile q.

abstract double max(double q, double compression, double n) - Computes the maximum relative size a cluster can have at quantile q.

abstract double normalizer(double compression, double n) - Computes the normalizer given compression and number of points.

abstract double q(double k, double normalizer) - Computes q as a function of k.

abstract double q(double k, double compression, double n) - Computes q as a function of k.

static ScaleFunction valueOf(String name) - Returns the enum constant of this class with the specified name.

static ScaleFunction[] values() - Returns an array containing the constants of this enum class, in the order they are declared.

Methods inherited from class java.lang.Enum
clone, compareTo, describeConstable, equals, finalize, getDeclaringClass, hashCode, name, ordinal, toString, valueOf

Methods inherited from class java.lang.Object
getClass, notify, notifyAll, wait, wait, wait

Enum Constant Details
- K_0
  
  public static final ScaleFunction K_0
  
  Generates uniform cluster sizes. Used for comparison only.
- K_1
  
  public static final ScaleFunction K_1
  
  Generates cluster sizes proportional to sqrt(q*(1-q)). This gives constant relative accuracy if accuracy is proportional to squared cluster size. It is expected that K_2 and K_3 will give better practical results.
- K_1_FAST
  
  public static final ScaleFunction K_1_FAST
  
  Generates cluster sizes proportional to sqrt(q*(1-q)) but avoids computation of asin in the critical path by using an approximate version.
- K_2
  
  public static final ScaleFunction K_2
  
  Generates cluster sizes proportional to q*(1-q). This makes tail error bounds tighter than for K_1. The use of a normalizing function results in a strictly bounded number of clusters no matter how many samples are added.
- K_3
  
  public static final ScaleFunction K_3
  
  Generates cluster sizes proportional to min(q, 1-q). This makes tail error bounds tighter than for K_1 or K_2. The use of a normalizing function results in a strictly bounded number of clusters no matter how many samples are added.
- K_2_NO_NORM
  
  public static final ScaleFunction K_2_NO_NORM
  
  Generates cluster sizes proportional to q*(1-q). This makes the tail error bounds tighter. This version does not use a normalizer function, so the number of clusters increases roughly in proportion to log(n). That is good for accuracy but bad for size, and it is a poor fit for the statically allocated MergingDigest. It can still be useful for tree-based implementations.
- K_3_NO_NORM
  
  public static final ScaleFunction K_3_NO_NORM
  
  Generates cluster sizes proportional to min(q, 1-q). This makes the tail error bounds tighter. This version does not use a normalizer function, so the number of clusters increases roughly in proportion to log(n). That is good for accuracy but bad for size, and it is a poor fit for the statically allocated MergingDigest. It can still be useful for tree-based implementations.
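
The difference between the normalized and NO_NORM variants can be illustrated numerically. The sketch below is a standalone approximation, not the Elasticsearch source. It assumes the K_2 forms given in the t-digest literature: k(q) = normalizer * ln(q / (1 - q)), with normalizer = compression / (4 * ln(n / compression) + 24) for the normalized variant and normalizer = compression for the NO_NORM variant. It also assumes that each cluster spans at most one unit of k, so the width of the k-scale is a rough proxy for the cluster count. The class and method names are hypothetical.

```java
import java.util.Locale;

/** Hypothetical sketch of how the k-range grows with n; not the org.elasticsearch.tdigest source. */
final class ClusterGrowthSketch {
    /** Assumed K_2-style normalizer: compression / (4 * ln(n / compression) + 24). */
    static double normalizedK2(double compression, double n) {
        return compression / (4 * Math.log(n / compression) + 24);
    }

    /** Width of the k-scale between q = 1/n and q = 1 - 1/n for k(q) = normalizer * ln(q / (1 - q)). */
    static double kSpan(double normalizer, double n) {
        double qLow = 1 / n;
        double qHigh = 1 - qLow;
        return normalizer * (Math.log(qHigh / (1 - qHigh)) - Math.log(qLow / (1 - qLow)));
    }

    public static void main(String[] args) {
        double compression = 100;
        for (double n : new double[] {1e4, 1e6, 1e8}) {
            double withNorm = kSpan(normalizedK2(compression, n), n);
            double noNorm = kSpan(compression, n); // NO_NORM: normalizer assumed equal to compression
            System.out.printf(Locale.ROOT, "n=%.0f normalized=%.1f unnormalized=%.1f%n", n, withNorm, noNorm);
        }
    }
}
```

Under these assumptions, with compression = 100 the normalized span stays below compression / 2 for the sample counts tried here, while the unnormalized span roughly doubles when ln(n) doubles.
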
Method Details
- values
  
  public static ScaleFunction[] values()
  
  Returns an array containing the constants of this enum class, in the order they are declared.
  
  Returns:
  
  an array containing the constants of this enum class, in the order they are declared
- valueOf
  
  public static ScaleFunction valueOf(String name)
  
  Returns the enum constant of this class with the specified name. The string must match exactly an identifier used to declare an enum constant in this class. (Extraneous whitespace characters are not permitted.)
  
  Parameters:
  
  name - the name of the enum constant to be returned.
  
  Returns:
  
  the enum constant with the specified name
  
  Throws:
  
  IllegalArgumentException - if this enum class has no constant with the specified name
  
  NullPointerException - if the argument is null
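  
  The lookup rules above are those of every Java enum. The sketch below demonstrates them with a hypothetical stand-in enum, `ScaleNameSketch`, that only mirrors the constant names and declaration order listed on this page. The helper `parseOrNull` is also an assumption, shown as one way to handle untrusted names without letting the exceptions propagate.
  
  ```java
  import java.util.Arrays;
  
  /** Hypothetical stand-in that mirrors the constant names on this page; it defines no scale logic. */
  enum ScaleNameSketch {
      K_0, K_1, K_1_FAST, K_2, K_3, K_2_NO_NORM, K_3_NO_NORM;
  
      /** Looks up a constant by name, returning null instead of throwing when the name is unusable. */
      static ScaleNameSketch parseOrNull(String name) {
          if (name == null) {
              return null; // valueOf(null) would throw NullPointerException
          }
          try {
              return valueOf(name); // exact, case-sensitive match; whitespace is not trimmed
          } catch (IllegalArgumentException e) {
              return null;
          }
      }
  
      public static void main(String[] args) {
          System.out.println(Arrays.toString(values())); // constants in declaration order
          System.out.println(valueOf("K_2"));
          System.out.println(parseOrNull("k_2"));  // wrong case
          System.out.println(parseOrNull(" K_2")); // leading whitespace
      }
  }
  ```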
- k
  
  public abstract double k(double q, double compression, double n)
  
  Converts a quantile to the k-scale. The total number of points is also provided so that a normalizing function can be computed if necessary.
  
  Parameters:
  
  q - The quantile
  
  compression - The compression factor, also known as delta (δ) in the literature on the t-digest
  
  n - The total number of samples
  
  Returns:
  
  The corresponding value of k
- k
  
  public abstract double k(double q, double normalizer)
  
  Converts a quantile to the k-scale. The normalizer value depends on compression and (possibly) number of points in the digest. See normalizer(double, double).
  
  Parameters:
  
  q - The quantile
  
  normalizer - The normalizer value which depends on compression and (possibly) number of points in the digest.
  
  Returns:
  
  The corresponding value of k
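  
  The relationship between the two overloads can be sketched as follows. This is a standalone illustration, not the Elasticsearch source: it assumes the arcsine form of K_1 from the t-digest literature, k(q) = normalizer * asin(2q - 1) with normalizer = compression / (2π), and the class name `KScaleSketch` is hypothetical.
  
  ```java
  /** Hypothetical sketch of a K_1-style scale; not the org.elasticsearch.tdigest source. */
  final class KScaleSketch {
      /** Assumed K_1 normalizer: depends on compression only, so n is ignored here. */
      static double normalizer(double compression, double n) {
          return compression / (2 * Math.PI);
      }
  
      /** Two-argument form: maps q in [0, 1] onto the k-scale using a precomputed normalizer. */
      static double k(double q, double normalizer) {
          return normalizer * Math.asin(2 * q - 1);
      }
  
      /** Three-argument form: derives the normalizer, then delegates to the two-argument form. */
      static double k(double q, double compression, double n) {
          return k(q, normalizer(compression, n));
      }
  
      public static void main(String[] args) {
          double compression = 100;
          double n = 1_000_000;
          double norm = normalizer(compression, n); // compute once, reuse for many q values
          for (double q : new double[] {0.0, 0.01, 0.5, 0.99, 1.0}) {
              System.out.println(q + " -> " + k(q, norm));
          }
      }
  }
  ```
  
  Precomputing the normalizer is the reason to prefer the two-argument form inside a loop: the three-argument form would recompute it on every call.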
- q
  
  public abstract double q(double k, double compression, double n)
  
  Computes q as a function of k. For some scales this is faster than computing k as a function of q.
  
  Parameters:
  
  k - The index value to convert into q scale.
  
  compression - The compression factor (often written as δ)
  
  n - The number of samples already in the digest.
  
  Returns:
  
  The value of q that corresponds to k
- q
  
  public abstract double q(double k, double normalizer)
  
  Computes q as a function of k. For some scales this is faster than computing k as a function of q.
  
  Parameters:
  
  k - The index value to convert into q scale.
  
  normalizer - The normalizer value which depends on compression and (possibly) number of points in the digest.
  
  Returns:
  
  The value of q that corresponds to k
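  
  One use of the inverse mapping is to find how far a cluster may extend in q. The sketch below is a standalone illustration under stated assumptions, not the Elasticsearch source: it assumes the K_1 form k(q) = normalizer * asin(2q - 1) from the t-digest literature, and it assumes that a cluster may cover at most one unit of k. The class name `QScaleSketch` is hypothetical.
  
  ```java
  /** Hypothetical sketch of the inverse mapping for a K_1-style scale; not the library source. */
  final class QScaleSketch {
      /** Assumed forward mapping: k = normalizer * asin(2q - 1). */
      static double k(double q, double normalizer) {
          return normalizer * Math.asin(2 * q - 1);
      }
  
      /** Inverse mapping: q = (sin(k / normalizer) + 1) / 2. */
      static double q(double k, double normalizer) {
          return (Math.sin(k / normalizer) + 1) / 2;
      }
  
      public static void main(String[] args) {
          double normalizer = 100 / (2 * Math.PI); // assumed K_1 normalizer for compression = 100
          double kLeft = k(0.25, normalizer);
          // Assumption: a cluster starting at q = 0.25 may extend at most one unit along the k-scale.
          double qRight = q(kLeft + 1, normalizer);
          System.out.println("cluster may span q in [0.25, " + qRight + "]");
      }
  }
  ```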
- max
  
  public abstract double max(double q, double compression, double n)
  
  Computes the maximum relative size a cluster can have at quantile q. It is not clear which point within the range spanned by a cluster should be used as q, so this function usually has to be evaluated at multiple points and the smallest value used.
  Note that this is the relative size of a cluster. To get the maximum number of samples in the cluster, multiply this value by the total number of samples in the digest.
  
  Parameters:
  
  q - The quantile
  
  compression - The compression factor, typically delta in the literature
  
  n - The number of samples seen so far in the digest
  
  Returns:
  
  The maximum relative size of a cluster at q, as a fraction of the total number of samples in the digest
- max
  
  public abstract double max(double q, double normalizer)
  
  Computes the maximum relative size a cluster can have at quantile q. It is not clear which point within the range spanned by a cluster should be used as q, so this function usually has to be evaluated at multiple points and the smallest value used.
  Note that this is the relative size of a cluster. To get the maximum number of samples in the cluster, multiply this value by the total number of samples in the digest.
  
  Parameters:
  
  q - The quantile
  
  normalizer - The normalizer value which depends on compression and (possibly) number of points in the digest.
  
  Returns:
  
  The maximum relative size of a cluster at q, as a fraction of the total number of samples in the digest
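  
  The two notes above (evaluate at several points and keep the smallest value, then scale by the sample count) can be sketched as follows. This is a standalone illustration, not the Elasticsearch source: it assumes a K_2-style bound of q * (1 - q) / normalizer, and the class `MaxSizeSketch`, the helper `maxSamples` and the normalizer value are hypothetical.
  
  ```java
  /** Hypothetical sketch of using a relative size bound; not the library source. */
  final class MaxSizeSketch {
      /** Assumed K_2-style bound on the relative size of a cluster at quantile q. */
      static double max(double q, double normalizer) {
          return q * (1 - q) / normalizer;
      }
  
      /**
       * Largest sample count allowed for a cluster spanning [qLeft, qRight]:
       * evaluate the bound at both edges, keep the smaller value, and scale by n.
       */
      static double maxSamples(double qLeft, double qRight, double normalizer, double n) {
          double relative = Math.min(max(qLeft, normalizer), max(qRight, normalizer));
          return relative * n;
      }
  
      public static void main(String[] args) {
          double normalizer = 10; // illustrative value only
          double n = 100_000;
          System.out.println(maxSamples(0.010, 0.012, normalizer, n)); // cluster near the lower tail
          System.out.println(maxSamples(0.490, 0.510, normalizer, n)); // cluster around the median
      }
  }
  ```
  
  Because q * (1 - q) is concave, its minimum over an interval falls at one of the two edges, so checking both edges is enough for this particular shape. Other scale functions may need more sample points.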
- normalizer
  
  public abstract double normalizer(double compression, double n)
  
  Computes the normalizer given compression and number of points.
  
  Parameters:
  
  compression - The compression parameter for the digest
  
  n - The number of samples seen so far
  
  Returns:
  
  The normalizing factor for the scale function
