java.lang.Object

co.elastic.clients.elasticsearch._types.aggregations.AggregationBase

co.elastic.clients.elasticsearch._types.aggregations.CategorizeTextAggregation

All Implemented Interfaces: AggregationVariant, JsonpSerializable

@JsonpDeserializable
public class CategorizeTextAggregation
extends AggregationBase
implements AggregationVariant

A multi-bucket aggregation that groups semi-structured text into buckets. Each text field is re-analyzed using a custom analyzer. The resulting tokens are then categorized, creating buckets of similarly formatted text values. This aggregation works best with machine-generated text such as system logs. Only the first 100 analyzed tokens are used to categorize the text.

See Also: API specification

Nested Class Summary

Nested Classes

Modifier and Type	Class	Description
`static class`	`CategorizeTextAggregation.Builder`	Builder for CategorizeTextAggregation.

Nested classes/interfaces inherited from class co.elastic.clients.elasticsearch._types.aggregations.AggregationBase

AggregationBase.AbstractBuilder<BuilderT extends AggregationBase.AbstractBuilder<BuilderT>>

Field Summary

Fields

Modifier and Type	Field	Description
`static JsonpDeserializer<CategorizeTextAggregation>`	`_DESERIALIZER`	JSON deserializer for CategorizeTextAggregation.

Method Summary

Modifier and Type	Method	Description
`Aggregation.Kind`	`_aggregationKind()`	Aggregation variant kind.
`CategorizeTextAnalyzer`	`categorizationAnalyzer()`	The categorization analyzer specifies how the text is analyzed and tokenized before being categorized.
`java.util.List<java.lang.String>`	`categorizationFilters()`	This property expects an array of regular expressions.
`java.lang.String`	`field()`	Required - The semi-structured text field to categorize.
`java.lang.Integer`	`maxMatchedTokens()`	The maximum number of token positions to match on before attempting to merge categories.
`java.lang.Integer`	`maxUniqueTokens()`	The maximum number of unique tokens at any position up to max_matched_tokens.
`java.lang.Integer`	`minDocCount()`	The minimum number of documents a bucket must contain to be included in the results.
`static CategorizeTextAggregation`	`of(java.util.function.Function<CategorizeTextAggregation.Builder,ObjectBuilder<CategorizeTextAggregation>> fn)`
`protected void`	`serializeInternal(jakarta.json.stream.JsonGenerator generator, JsonpMapper mapper)`
`protected static void`	`setupCategorizeTextAggregationDeserializer(ObjectDeserializer<CategorizeTextAggregation.Builder> op)`
`java.lang.Integer`	`shardMinDocCount()`	The minimum number of documents for a bucket to be returned from the shard before merging.
`java.lang.Integer`	`shardSize()`	The number of categorization buckets to return from each shard before merging all the results.
`java.lang.Integer`	`similarityThreshold()`	The minimum percentage of tokens that must match for text to be added to the category bucket.
`java.lang.Integer`	`size()`	The number of buckets to return.

Methods inherited from class co.elastic.clients.elasticsearch._types.aggregations.AggregationBase

meta, name, serialize, setupAggregationBaseDeserializer, toString

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface co.elastic.clients.elasticsearch._types.aggregations.AggregationVariant

_toAggregation

Field Details
- _DESERIALIZER
  
  public static final JsonpDeserializer<CategorizeTextAggregation> _DESERIALIZER
  
  JSON deserializer for CategorizeTextAggregation.

Method Details
- of
  
  public static CategorizeTextAggregation of(java.util.function.Function<CategorizeTextAggregation.Builder,ObjectBuilder<CategorizeTextAggregation>> fn)
- _aggregationKind
  
  public Aggregation.Kind _aggregationKind()
  
  Aggregation variant kind.
  
  Specified by:
  
  _aggregationKind in interface AggregationVariant
- field
  
  public final java.lang.String field()
  
  Required - The semi-structured text field to categorize.
  API name: field
- maxUniqueTokens
  
  @Nullable public final java.lang.Integer maxUniqueTokens()
  
  The maximum number of unique tokens at any position up to max_matched_tokens. Must be larger than 1. Smaller values use less memory and create fewer categories. Larger values will use more memory and create narrower categories. Max allowed value is 100.
  API name: max_unique_tokens
- maxMatchedTokens
  
  @Nullable public final java.lang.Integer maxMatchedTokens()
  
  The maximum number of token positions to match on before attempting to merge categories. Larger values will use more memory and create narrower categories. Max allowed value is 100.
  API name: max_matched_tokens
- similarityThreshold
  
  @Nullable public final java.lang.Integer similarityThreshold()
  
  The minimum percentage of tokens that must match for text to be added to the category bucket. Must be between 1 and 100. Larger values increase memory usage and create narrower categories.
  API name: similarity_threshold
- categorizationFilters
  
  public final java.util.List<java.lang.String> categorizationFilters()
  
  This property expects an array of regular expressions. The expressions are used to filter out matching sequences from the categorization field values. You can use this functionality to fine-tune the categorization by excluding sequences from consideration when categories are defined. For example, you can exclude SQL statements that appear in your log files. This property cannot be used at the same time as categorization_analyzer. If you only want to define simple regular expression filters that are applied before tokenization, setting this property is the easiest method. If you also want to customize the tokenizer or post-tokenization filtering, use the categorization_analyzer property instead and include the filters as pattern_replace character filters.
  API name: categorization_filters
- categorizationAnalyzer
  
  @Nullable public final CategorizeTextAnalyzer categorizationAnalyzer()
  
  The categorization analyzer specifies how the text is analyzed and tokenized before being categorized. The syntax is very similar to that used to define the analyzer in the Analyze endpoint. This property cannot be used at the same time as categorization_filters.
  API name: categorization_analyzer
- shardSize
  
  @Nullable public final java.lang.Integer shardSize()
  
  The number of categorization buckets to return from each shard before merging all the results.
  API name: shard_size
- size
  
  @Nullable public final java.lang.Integer size()
  
  The number of buckets to return.
  API name: size
- minDocCount
  
  @Nullable public final java.lang.Integer minDocCount()
  
  The minimum number of documents a bucket must contain to be included in the results.
  API name: min_doc_count
- shardMinDocCount
  
  @Nullable public final java.lang.Integer shardMinDocCount()
  
  The minimum number of documents for a bucket to be returned from the shard before merging.
  API name: shard_min_doc_count
- serializeInternal
  
  protected void serializeInternal(jakarta.json.stream.JsonGenerator generator, JsonpMapper mapper)
  
  Overrides:
  
  serializeInternal in class AggregationBase
- setupCategorizeTextAggregationDeserializer
  
  protected static void setupCategorizeTextAggregationDeserializer(ObjectDeserializer<CategorizeTextAggregation.Builder> op)
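
The value ranges and the pre-tokenization filtering described in the method details above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not part of the client: the class `CategorizeTextSettingsSketch` and its methods `checkRanges` and `applyFilters` are hypothetical, reading "between 1 and 100" as inclusive is an assumption, and `applyFilters` only approximates locally what the server does with categorization_filters.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the Elasticsearch Java client. */
final class CategorizeTextSettingsSketch {

    private CategorizeTextSettingsSketch() {}

    /**
     * Checks the ranges stated in the method descriptions.
     * A null argument means "not set", mirroring the @Nullable getters.
     * Assumption: the bounds of similarity_threshold are inclusive.
     */
    static void checkRanges(Integer maxUniqueTokens,
                            Integer maxMatchedTokens,
                            Integer similarityThreshold) {
        if (maxUniqueTokens != null && (maxUniqueTokens <= 1 || maxUniqueTokens > 100)) {
            throw new IllegalArgumentException(
                "max_unique_tokens must be larger than 1 and at most 100");
        }
        if (maxMatchedTokens != null && maxMatchedTokens > 100) {
            throw new IllegalArgumentException(
                "max_matched_tokens must be at most 100");
        }
        if (similarityThreshold != null
                && (similarityThreshold < 1 || similarityThreshold > 100)) {
            throw new IllegalArgumentException(
                "similarity_threshold must be between 1 and 100");
        }
    }

    /**
     * Removes every sequence that matches one of the regular expressions,
     * in list order. Local approximation of filtering before tokenization.
     */
    static String applyFilters(String text, List<String> filters) {
        String result = text;
        for (String regex : filters) {
            result = Pattern.compile(regex).matcher(result).replaceAll("");
        }
        return result;
    }
}
```

Previewing a filter this way before sending it as categorization_filters can help confirm that the expression removes only the intended sequences. For instance, `applyFilters("GET /a took 12ms", List.of("\\d+ms"))` returns `"GET /a took "`.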
