All Implemented Interfaces:
Serializable, Comparable<ScaleFunction>, Constable
Encodes the various scale functions for t-digests. These scale functions limit cluster sizes, and they trade accuracy near the tails against accuracy near the median in different ways. For instance, K_0 has uniform cluster sizes and results in constant accuracy (in terms of q), while K_3 has cluster sizes proportional to min(q, 1-q), which results in much smaller error near the tails and modestly increased error near the median.
The base forms (K_0, K_1, K_2 and K_3) all result in t-digests whose number of clusters is bounded by the compression factor. The K_2_NO_NORM and K_3_NO_NORM versions result in a cluster count that grows roughly with log(n).
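The contrast between a bounded cluster count and one that grows with log(n) can be illustrated with a small, self-contained Java sketch. It does not call the t-digest library: UNIFORM and LOG_TAILS are hypothetical stand-ins shaped like K_0 and K_3_NO_NORM, DELTA is an arbitrary compression value, and the greedy packing rule (a cluster may span at most one unit of k) is an assumption that mirrors the usual t-digest size criterion.

```java
import java.util.function.DoubleUnaryOperator;

// Illustrative only: both k-functions are hand-written stand-ins shaped like
// K_0 and K_3_NO_NORM; they are not taken from the t-digest source.
class ClusterCountSketch {
    static final double DELTA = 10; // hypothetical compression factor

    // Uniform scale (K_0-like): k grows linearly, so every cluster may cover the same q width.
    static final DoubleUnaryOperator UNIFORM = q -> DELTA * q / 2;

    // Unnormalized min(q, 1-q) scale (K_3_NO_NORM-like): k diverges logarithmically at both tails.
    static final DoubleUnaryOperator LOG_TAILS =
            q -> q <= 0.5 ? DELTA * Math.log(2 * q) : -DELTA * Math.log(2 * (1 - q));

    // Packs n equally weighted points, left to right, into clusters, growing each
    // cluster only while its quantile range spans at most one unit of k.
    static int clusterCount(DoubleUnaryOperator k, int n) {
        int clusters = 0;
        int start = 0;                        // first point of the current cluster
        while (start < n) {
            double kStart = k.applyAsDouble(start / (double) n);
            int end = start + 1;              // exclusive end; a single point always fits
            while (end < n && k.applyAsDouble((end + 1) / (double) n) - kStart <= 1) {
                end++;
            }
            clusters++;
            start = end;
        }
        return clusters;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 1_000_000}) {
            System.out.printf("n=%,d  uniform=%d  logTails=%d%n",
                    n, clusterCount(UNIFORM, n), clusterCount(LOG_TAILS, n));
        }
    }
}
```

Under these assumptions the uniform count stays at about DELTA / 2 for both values of n, while the log-tailed count grows with n, because the unnormalized k range widens as the smallest representable quantile (about 1/n) approaches 0.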
Nested Class Summary
Nested classes/interfaces inherited from class java.lang.Enum
Enum.EnumDesc<E extends Enum<E>>
Enum Constant Summary
K_0: Generates uniform cluster sizes.
K_1: Generates cluster sizes proportional to sqrt(q*(1-q)).
K_1_FAST: Generates cluster sizes proportional to sqrt(q*(1-q)) but avoids computation of asin in the critical path by using an approximate version.
K_2: Generates cluster sizes proportional to q*(1-q).
K_2_NO_NORM: Generates cluster sizes proportional to q*(1-q).
K_3: Generates cluster sizes proportional to min(q, 1-q).
K_3_NO_NORM: Generates cluster sizes proportional to min(q, 1-q).
Method Summary
abstract double k(double q, double normalizer): Converts a quantile to the k-scale.
abstract double k(double q, double compression, double n): Converts a quantile to the k-scale.
abstract double max(double q, double normalizer): Computes the maximum relative size a cluster can have at quantile q.
abstract double max(double q, double compression, double n): Computes the maximum relative size a cluster can have at quantile q.
abstract double normalizer(double compression, double n): Computes the normalizer given compression and number of points.
abstract double q(double k, double normalizer): Computes q as a function of k.
abstract double q(double k, double compression, double n): Computes q as a function of k.
static ScaleFunction valueOf(String name): Returns the enum constant of this class with the specified name.
static ScaleFunction[] values(): Returns an array containing the constants of this enum class, in the order they are declared.
Enum Constant Details
K_0
Generates uniform cluster sizes. Used for comparison only.
K_1
Generates cluster sizes proportional to sqrt(q*(1-q)). This gives constant relative accuracy if accuracy is proportional to squared cluster size. It is expected that K_2 and K_3 will give better practical results.
K_1_FAST
Generates cluster sizes proportional to sqrt(q*(1-q)) but avoids computation of asin in the critical path by using an approximate version.
K_2
Generates cluster sizes proportional to q*(1-q). This makes tail error bounds tighter than for K_1. The use of a normalizing function results in a strictly bounded number of clusters, no matter how many samples are added.
K_3
Generates cluster sizes proportional to min(q, 1-q). This makes tail error bounds tighter than for K_1 or K_2. The use of a normalizing function results in a strictly bounded number of clusters, no matter how many samples are added.
K_2_NO_NORM
Generates cluster sizes proportional to q*(1-q). This makes the tail error bounds tighter. This version does not use a normalizer function, so the number of clusters grows roughly in proportion to log(n). That is good for accuracy but bad for size, and it is a poor fit for the statically allocated MergingDigest; it can, however, be useful for tree-based implementations.
K_3_NO_NORM
Generates cluster sizes proportional to min(q, 1-q). This makes the tail error bounds tighter. This version does not use a normalizer function, so the number of clusters grows roughly in proportion to log(n). That is good for accuracy but bad for size, and it is a poor fit for the statically allocated MergingDigest; it can, however, be useful for tree-based implementations.
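One way to see where these proportionalities come from: assuming a cluster may span at most one unit of k, a cluster near quantile q can cover a width of roughly Δq ≈ 1/k'(q). The k-functions below are one choice consistent with the descriptions above; constant factors and normalization are omitted, and they are not claimed to be the library's exact formulas.

```latex
\begin{aligned}
k_0(q) &\propto q,
  & k_0'(q) &\propto 1,
  & \Delta q &\propto 1 \\
k_1(q) &\propto \arcsin(2q-1),
  & k_1'(q) &\propto \frac{1}{\sqrt{q(1-q)}},
  & \Delta q &\propto \sqrt{q(1-q)} \\
k_2(q) &\propto \ln\frac{q}{1-q},
  & k_2'(q) &\propto \frac{1}{q(1-q)},
  & \Delta q &\propto q(1-q) \\
k_3(q) &\propto \begin{cases} \ln 2q, & q \le \tfrac12 \\ -\ln 2(1-q), & q > \tfrac12 \end{cases}
  & k_3'(q) &\propto \frac{1}{\min(q,\,1-q)},
  & \Delta q &\propto \min(q,\,1-q)
\end{aligned}
```

Note that k_2 and k_3 diverge as q approaches 0 or 1, while k_0 and k_1 stay bounded. For the diverging forms, the usable k range widens with n (roughly like log n), which is why the NO_NORM variants accumulate clusters as samples arrive.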
Method Details
values
public static ScaleFunction[] values()
Returns an array containing the constants of this enum class, in the order they are declared.
Returns:
an array containing the constants of this enum class, in the order they are declared
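Because values() is generated by the compiler for every enum, its behavior can be shown without the t-digest library. The sketch below uses a hypothetical ScaleFunctionOrder enum that only mirrors the constant names, in the order listed under Enum Constant Details (assumed here to be the declaration order).

```java
// Hypothetical stand-in mirroring the constant names of ScaleFunction in the order the
// Enum Constant Details section lists them; the real constants also carry method bodies.
enum ScaleFunctionOrder {
    K_0, K_1, K_1_FAST, K_2, K_3, K_2_NO_NORM, K_3_NO_NORM
}

class ValuesDemo {
    public static void main(String[] args) {
        ScaleFunctionOrder[] all = ScaleFunctionOrder.values();
        for (ScaleFunctionOrder f : all) {
            System.out.println(f.ordinal() + " " + f.name());
        }
        // values() returns a fresh copy on every call, so changing it has no effect on the enum.
        all[0] = null;
        System.out.println(ScaleFunctionOrder.values()[0]);  // still K_0
    }
}
```

Since each call allocates a new array, code that iterates over the constants in a hot loop may prefer to call values() once and keep the result.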
valueOf
public static ScaleFunction valueOf(String name)
Returns the enum constant of this class with the specified name. The string must match exactly an identifier used to declare an enum constant in this class. (Extraneous whitespace characters are not permitted.)
Parameters:
name - the name of the enum constant to be returned.
Returns:
the enum constant with the specified name
Throws:
IllegalArgumentException - if this enum class has no constant with the specified name
NullPointerException - if the argument is null
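When a scale function is chosen from a configuration string, the exact-match rule and both exceptions matter. The sketch below uses a hypothetical ScaleFunctionName stand-in enum and a hypothetical parseOrDefault helper (neither is part of the library) to show valueOf rejecting case and whitespace variants.

```java
// Hypothetical stand-in with the same constant names as ScaleFunction.
enum ScaleFunctionName {
    K_0, K_1, K_1_FAST, K_2, K_3, K_2_NO_NORM, K_3_NO_NORM
}

class ValueOfDemo {
    // Hypothetical helper: resolves a configuration string to a constant, falling back
    // to a default when the string is missing or is not an exact constant name.
    static ScaleFunctionName parseOrDefault(String name, ScaleFunctionName fallback) {
        if (name == null) {
            return fallback;                           // valueOf(null) throws NullPointerException
        }
        try {
            return ScaleFunctionName.valueOf(name);    // exact, case-sensitive match
        } catch (IllegalArgumentException e) {
            return fallback;                           // e.g. "k_3" or " K_3"
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("K_3", ScaleFunctionName.K_2));   // K_3
        System.out.println(parseOrDefault(" K_3", ScaleFunctionName.K_2));  // K_2
        System.out.println(parseOrDefault("k_3", ScaleFunctionName.K_2));   // K_2
        System.out.println(parseOrDefault(null, ScaleFunctionName.K_2));    // K_2
    }
}
```

Whether to fall back silently or surface the exception is a design choice; failing fast is often safer when a misspelled name would otherwise go unnoticed.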
k
public abstract double k(double q, double compression, double n)
Converts a quantile to the k-scale. The total number of points is also provided so that a normalizing function can be computed if necessary.
Parameters:
q - The quantile
compression - Also known as delta in the literature on the t-digest
n - The total number of samples
Returns:
The corresponding value of k
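A common way to use k is to test whether a cluster covering quantiles [qLeft, qRight] spans at most one unit of k; that criterion is assumed here rather than taken from this page. The sketch uses a hypothetical unnormalized K_2-shaped k (not the library's code), and n is accepted only to match the documented signature.

```java
// Hypothetical unnormalized K_2-like scale, for illustration only:
//   k(q) = compression * ln(q / (1 - q))
class KScaleSketch {
    // n is accepted only to match the documented signature; this sketch ignores it.
    static double k(double q, double compression, double n) {
        return compression * Math.log(q / (1 - q));
    }

    // Assumed size criterion: a cluster covering [qLeft, qRight] is acceptable
    // if it spans at most one unit of k.
    static boolean fits(double qLeft, double qRight, double compression, double n) {
        return k(qRight, compression, n) - k(qLeft, compression, n) <= 1;
    }

    public static void main(String[] args) {
        double compression = 100, n = 1e6;
        // The same q width is acceptable near the median but far too wide near the tail.
        System.out.println(fits(0.500, 0.502, compression, n));  // true
        System.out.println(fits(0.001, 0.003, compression, n));  // false
    }
}
```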
k
public abstract double k(double q, double normalizer)
Converts a quantile to the k-scale. The normalizer value depends on compression and (possibly) the number of points in the digest; see normalizer(double, double).
Parameters:
q - The quantile
normalizer - The normalizer value, which depends on compression and (possibly) the number of points in the digest.
Returns:
The corresponding value of k
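When many quantiles are converted during one pass over a digest, the two-argument overloads let the caller compute the normalizer once. The sketch below is a hypothetical K_2-shaped scale; its z(...) term is an invented placeholder, not the library's normalizing function, and the point is only that k(q, compression, n) and k(q, normalizer(compression, n)) agree.

```java
// Hypothetical normalized scale shaped like K_2; z(...) is an illustrative
// placeholder, not taken from the t-digest source. Assumes n > compression.
class NormalizerSketch {
    static double z(double compression, double n) {
        return 4 * Math.log(n / compression) + 24;
    }

    static double normalizer(double compression, double n) {
        return compression / z(compression, n);
    }

    // Three-argument form: recomputes the normalizing term (a log) on every call.
    static double k(double q, double compression, double n) {
        return compression * Math.log(q / (1 - q)) / z(compression, n);
    }

    // Two-argument form: uses a normalizer computed once by the caller.
    static double k(double q, double normalizer) {
        return normalizer * Math.log(q / (1 - q));
    }

    public static void main(String[] args) {
        double compression = 100, n = 1e6;
        double norm = normalizer(compression, n);   // computed once, reused below
        for (double q : new double[] {0.01, 0.25, 0.5, 0.99}) {
            System.out.printf("q=%.2f  k(q,c,n)=%.6f  k(q,norm)=%.6f%n",
                    q, k(q, compression, n), k(q, norm));
        }
    }
}
```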
q
public abstract double q(double k, double compression, double n)
Computes q as a function of k. For some scales, this is faster than computing k as a function of q.
Parameters:
k - The index value to convert into the q scale.
compression - The compression factor (often written as δ)
n - The number of samples already in the digest.
Returns:
The value of q that corresponds to k
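For an unnormalized K_2-shaped scale (a hypothetical stand-in, not the library's code), k(q) = compression · ln(q / (1 - q)) inverts in closed form to the logistic function, so q(k) needs a single exp. The sketch checks that q undoes k.

```java
// Hypothetical unnormalized K_2-like scale, for illustration only:
//   k(q) = compression * ln(q / (1 - q)),  inverse  q(k) = 1 / (1 + exp(-k / compression))
class InverseSketch {
    static double k(double q, double compression, double n) {
        return compression * Math.log(q / (1 - q));
    }

    static double q(double k, double compression, double n) {
        return 1 / (1 + Math.exp(-k / compression));
    }

    public static void main(String[] args) {
        double compression = 100, n = 1e6;
        for (double q0 : new double[] {0.001, 0.3, 0.5, 0.999}) {
            double kValue = k(q0, compression, n);
            System.out.printf("q=%.3f -> k=%.4f -> q=%.6f%n",
                    q0, kValue, q(kValue, compression, n));
        }
    }
}
```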
q
public abstract double q(double k, double normalizer)
Computes q as a function of k. For some scales, this is faster than computing k as a function of q.
Parameters:
k - The index value to convert into the q scale.
normalizer - The normalizer value, which depends on compression and (possibly) the number of points in the digest.
Returns:
The value of q that corresponds to k
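Stepping through k in unit increments and calling q(k) yields the quantile boundaries of the widest clusters a scale allows. The sketch assumes a hypothetical K_1-shaped scale with k(q) = normalizer · asin(2q - 1) and normalizer = compression / (2π); these formulas are illustrative, not the library's. Walking k this way evaluates only sin, never asin.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical K_1-like scale, for illustration only:
//   k(q) = normalizer * asin(2q - 1)   and its inverse   q(k) = (sin(k / normalizer) + 1) / 2,
// with normalizer = compression / (2 * PI), so k runs from -compression/4 to +compression/4.
class BoundarySketch {
    static double normalizer(double compression) {
        return compression / (2 * Math.PI);
    }

    static double q(double k, double normalizer) {
        return (Math.sin(k / normalizer) + 1) / 2;
    }

    // Boundaries of the widest allowed clusters: one per unit step in k.
    static List<Double> boundaries(double compression) {
        double norm = normalizer(compression);   // computed once, reused for every q(k)
        double kMin = -compression / 4;          // k at q = 0
        double kMax = compression / 4;           // k at q = 1
        List<Double> b = new ArrayList<>();
        for (int i = 0; kMin + i < kMax; i++) {
            b.add(q(kMin + i, norm));
        }
        b.add(1.0);                              // close the last cluster at q = 1
        return b;
    }

    public static void main(String[] args) {
        List<Double> b = boundaries(100);
        System.out.printf("%d clusters, first width %.5f, middle width %.5f%n",
                b.size() - 1, b.get(1) - b.get(0), b.get(26) - b.get(25));
    }
}
```

The widths shrink toward both tails, matching the sqrt(q*(1-q)) cluster-size shape described for K_1.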
max
public abstract double max(double q, double compression, double n)
Computes the maximum relative size a cluster can have at quantile q. Note that it isn't clear exactly where within the range spanned by a cluster q should be taken, so this function usually has to be evaluated at multiple points and the smallest value used.
Note that this is the relative size of a cluster. To get the maximum number of samples in the cluster, multiply this value by the total number of samples in the digest.
Parameters:
q - The quantile
compression - The compression factor, typically delta in the literature
n - The number of samples seen so far in the digest
Returns:
The maximum relative size of a cluster at quantile q, as a fraction of the total number of samples
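The advice to evaluate max at several points and keep the smallest value can be sketched as follows. The limit q * (1 - q) / compression is a hypothetical K_2_NO_NORM-shaped stand-in, not the library's formula; the helper checks both ends of a cluster's quantile range and converts the relative size to a sample count.

```java
// Hypothetical K_2_NO_NORM-like limit, for illustration only:
//   max(q) = q * (1 - q) / compression   (a relative size: fraction of all samples)
class MaxSizeSketch {
    static double max(double q, double compression, double n) {
        return q * (1 - q) / compression;
    }

    // The position of q inside a cluster is ambiguous, so evaluate the limit at
    // both ends of the cluster's quantile range and keep the smaller value.
    static double maxSamples(double qLeft, double qRight, double compression, double n) {
        double relative = Math.min(max(qLeft, compression, n), max(qRight, compression, n));
        return relative * n;   // relative size times total samples = sample count
    }

    public static void main(String[] args) {
        System.out.println(maxSamples(0.01, 0.02, 100, 1e6));  // about 99
        System.out.println(maxSamples(0.49, 0.51, 100, 1e6));  // about 2499
    }
}
```

For concave shapes such as q*(1-q), sqrt(q*(1-q)) and min(q, 1-q), the minimum over an interval always lies at one of its endpoints, so checking both ends is enough in this sketch.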
max
public abstract double max(double q, double normalizer)
Computes the maximum relative size a cluster can have at quantile q. Note that it isn't clear exactly where within the range spanned by a cluster q should be taken, so this function usually has to be evaluated at multiple points and the smallest value used.
Note that this is the relative size of a cluster. To get the maximum number of samples in the cluster, multiply this value by the total number of samples in the digest.
Parameters:
q - The quantile
normalizer - The normalizer value, which depends on compression and (possibly) the number of points in the digest.
Returns:
The maximum relative size of a cluster at quantile q, as a fraction of the total number of samples
normalizer
public abstract double normalizer(double compression, double n)
Computes the normalizer given compression and number of points.
Parameters:
compression - The compression parameter for the digest
n - The number of samples seen so far
Returns:
The normalizing factor for the scale function