org.elasticsearch.tdigest.TDigest

Direct Known Subclasses:: AbstractTDigest

public abstract class TDigest extends Object

Adaptive histogram based on something like streaming k-means crossed with Q-digest. The special characteristics of this algorithm are: - smaller summaries than Q-digest - works on doubles as well as integers. - provides part per million accuracy for extreme quantiles and typically <1000 ppm accuracy for middle quantiles - fast - simple - test coverage roughly at 90% - easy to adapt for use with map-reduce

Field Summary

Fields

Modifier and Type

Field

Description

protected ScaleFunction

scale
Constructor Summary

Constructors

Constructor

Description

TDigest()
Method Summary

Modifier and Type

Method

Description

final void

add(double x)

Add a single sample to this TDigest.

abstract void

add(double x, long w)

Adds a sample to a histogram.

abstract void

add(TDigest other)

Add all of the centroids of another TDigest to this one.

abstract int

byteSize()

Returns the number of bytes required to encode this TDigest using #asBytes().

abstract double

cdf(double x)

Returns the fraction of all points added which are ≤ x.

abstract int

centroidCount()

abstract Collection<Centroid>

centroids()

A Collection that lets you go through the centroids in ascending order by mean.

abstract void

compress()

Re-examines a t-digest to determine whether some centroids are redundant.

abstract double

compression()

Returns the current compression factor.

static TDigest

createAvlTreeDigest(double compression)

Creates an AVLTreeDigest.

static TDigest

createHybridDigest(double compression)

Creates a HybridDigest.

static TDigest

createMergingDigest(double compression)

Creates an MergingDigest.

static TDigest

createSortingDigest()

Creates a SortingDigest.

double

getMax()

double

getMin()

abstract double

quantile(double q)

Returns an estimate of a cutoff such that a specified fraction of the data added to this TDigest would be less than or equal to the cutoff.

void

reserve(long size)

Prepare internal structure for loading the requested number of samples.

void

setScaleFunction(ScaleFunction scaleFunction)

abstract long

size()

Returns the number of points that have been added to this TDigest.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- scale
  
  protected ScaleFunction scale
Constructor Details
- TDigest
  
  public TDigest()
Method Details
- createMergingDigest
  
  public static TDigest createMergingDigest(double compression)
  
  Creates an MergingDigest. This is the fastest implementation for large sample populations, with constant memory allocation while delivering relating accuracy close to 1%.
  
  Parameters:
  
  compression - The compression parameter. 100 is a common value for normal uses. 1000 is extremely large. The number of centroids retained will be a smallish (usually less than 10) multiple of this number.
  
  Returns:
  
  the MergingDigest
- createAvlTreeDigest
  
  public static TDigest createAvlTreeDigest(double compression)
  
  Creates an AVLTreeDigest. This is the most accurate implementation, delivering relative accuracy close to 0.01% for large sample populations. Still, its construction takes 2x-10x longer than MergingDigest, while its memory footprint increases (slowly) with the sample population size.
  
  Parameters:
  
  compression - The compression parameter. 100 is a common value for normal uses. 1000 is extremely large. The number of centroids retained will be a smallish (usually less than 10) multiple of this number.
  
  Returns:
  
  the AvlTreeDigest
- createSortingDigest
  
  public static TDigest createSortingDigest()
  
  Creates a SortingDigest. SortingDigest is the most accurate and an extremely fast implementation but stores all samples internally so it uses much more memory than the rest, for sample populations of 1000 or higher.
  
  Returns:
  
  the SortingDigest
- createHybridDigest
  
  public static TDigest createHybridDigest(double compression)
  
  Creates a HybridDigest. HybridDigest uses a SortingDigest for small sample populations, then switches to a MergingDigest, thus combining the best of both implementations: fastest overall, small footprint and perfect accuracy for small populations, constant memory footprint and acceptable accuracy for larger ones.
  
  Parameters:
  
  compression - The compression parameter. 100 is a common value for normal uses. 1000 is extremely large. The number of centroids retained will be a smallish (usually less than 10) multiple of this number.
  
  Returns:
  
  the HybridDigest
- add
  
  public abstract void add(double x, long w)
  
  Adds a sample to a histogram.
  
  Parameters:
  
  x - The value to add.
  
  w - The weight of this point.
- add
  
  public final void add(double x)
  
  Add a single sample to this TDigest.
  
  Parameters:
  
  x - The data value to add
- compress
  
  public abstract void compress()
  
  Re-examines a t-digest to determine whether some centroids are redundant. If your data are perversely ordered, this may be a good idea. Even if not, this may save 20% or so in space. The cost is roughly the same as adding as many data points as there are centroids. This is typically < 10 * compression, but could be as high as 100 * compression. This is a destructive operation that is not thread-safe.
- size
  
  public abstract long size()
  
  Returns the number of points that have been added to this TDigest.
  
  Returns:
  
  The sum of the weights on all centroids.
- cdf
  
  public abstract double cdf(double x)
  
  Returns the fraction of all points added which are ≤ x. Points that are exactly equal get half credit (i.e. we use the mid-point rule)
  
  Parameters:
  
  x - The cutoff for the cdf.
  
  Returns:
  
  The fraction of all data which is less or equal to x.
- quantile
  
  public abstract double quantile(double q)
  
  Returns an estimate of a cutoff such that a specified fraction of the data added to this TDigest would be less than or equal to the cutoff.
  
  Parameters:
  
  q - The desired fraction
  
  Returns:
  
  The smallest value x such that cdf(x) ≥ q
- centroids
  
  public abstract Collection<Centroid> centroids()
  
  A Collection that lets you go through the centroids in ascending order by mean. Centroids returned will not be re-used, but may or may not share storage with this TDigest.
  
  Returns:
  
  The centroids in the form of a Collection.
- compression
  
  public abstract double compression()
  
  Returns the current compression factor.
  
  Returns:
  
  The compression factor originally used to set up the TDigest.
- byteSize
  
  public abstract int byteSize()
  
  Returns the number of bytes required to encode this TDigest using #asBytes().
  
  Returns:
  
  The number of bytes required.
- setScaleFunction
  
  public void setScaleFunction(ScaleFunction scaleFunction)
- add
  
  public abstract void add(TDigest other)
  
  Add all of the centroids of another TDigest to this one.
  
  Parameters:
  
  other - The other TDigest
- centroidCount
  
  public abstract int centroidCount()
- reserve
  
  public void reserve(long size)
  
  Prepare internal structure for loading the requested number of samples.
  
  Parameters:
  
  size - number of samples to be loaded
- getMin
  
  public double getMin()
- getMax
  
  public double getMax()

Class TDigest

Field Summary

Constructor Summary

Method Summary

Methods inherited from class java.lang.Object

Field Details

scale

Constructor Details

TDigest

Method Details

createMergingDigest

createAvlTreeDigest

createSortingDigest

createHybridDigest

add

add

compress

size

cdf

quantile

centroids

compression

byteSize

setScaleFunction

add

centroidCount

reserve

getMin

getMax