Class DataDescription

  • All Implemented Interfaces:
    ToXContent, ToXContentObject

    public class DataDescription
    extends java.lang.Object
    implements ToXContentObject
    Describes the format of the data used in the ML job and how the job should interpret it.

    getTimeField() is the name of the field containing the timestamp, and getTimeFormat() is the format code for the date string, as described by DateTimeFormatter.
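The sketch below illustrates how a raw timestamp could be interpreted under the three kinds of time format documented for getTimeFormat(): "epoch", "epoch_ms", or a DateTimeFormatter pattern. It is not the library's implementation. The class TimeFormatSketch and the method toEpochMillis are hypothetical names introduced for illustration, and treating zone-less pattern timestamps as UTC is an assumption.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of DataDescription.
final class TimeFormatSketch {
    // Literal values as quoted in the getTimeFormat() description.
    static final String EPOCH = "epoch";
    static final String EPOCH_MS = "epoch_ms";

    // Converts a raw timestamp string to milliseconds since the epoch.
    static long toEpochMillis(String rawTimestamp, String timeFormat) {
        // Unset (null or empty) or "epoch_ms": the value is already in milliseconds.
        if (timeFormat == null || timeFormat.isEmpty() || EPOCH_MS.equals(timeFormat)) {
            return Long.parseLong(rawTimestamp);
        }
        // "epoch": the value is in seconds, so scale it to milliseconds.
        if (EPOCH.equals(timeFormat)) {
            return Math.multiplyExact(Long.parseLong(rawTimestamp), 1000L);
        }
        // Otherwise treat the format as a DateTimeFormatter pattern.
        // Assumption: the pattern carries no zone, so the value is read as UTC.
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(timeFormat);
        return LocalDateTime.parse(rawTimestamp, formatter)
                .toInstant(ZoneOffset.UTC)
                .toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("1700000000", EPOCH));
        System.out.println(toEpochMillis("1700000000000", null));
        System.out.println(toEpochMillis("2023-11-14 22:13:20", "yyyy-MM-dd HH:mm:ss"));
    }
}
```

All three calls in main describe the same instant, which shows that the format code only changes how the input string is read, not the resulting point in time.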

    • Field Detail

      • EPOCH

        public static final java.lang.String EPOCH
        Special time format string for epoch times (seconds)
        See Also:
        Constant Field Values
      • EPOCH_MS

        public static final java.lang.String EPOCH_MS
        Special time format string for epoch times (milliseconds)
        See Also:
        Constant Field Values
      • DEFAULT_TIME_FIELD

        public static final java.lang.String DEFAULT_TIME_FIELD
        By default, autodetect expects the timestamp in a field with this name.
        See Also:
        Constant Field Values
      • DEFAULT_DELIMITER

        public static final char DEFAULT_DELIMITER
        The default field delimiter expected by the native autodetect program.
        See Also:
        Constant Field Values
      • DEFAULT_QUOTE_CHAR

        public static final char DEFAULT_QUOTE_CHAR
        The default quote character used to escape text in delimited data formats
        See Also:
        Constant Field Values
    • Constructor Detail

      • DataDescription

        public DataDescription(DataDescription.DataFormat dataFormat,
                               java.lang.String timeFieldName,
                               java.lang.String timeFormat,
                               java.lang.Character fieldDelimiter,
                               java.lang.Character quoteCharacter)
    • Method Detail

      • getTimeField

        public java.lang.String getTimeField()
        The name of the field containing the timestamp
        Returns:
        A String if set or null
      • getTimeFormat

        public java.lang.String getTimeFormat()
        Either "epoch", "epoch_ms" or a DateTimeFormatter pattern string. If not set (null or an empty string) or set to "epoch_ms" (the default), the date is assumed to be in milliseconds since the epoch.
        Returns:
        A String if set or null
      • getFieldDelimiter

        public java.lang.Character getFieldDelimiter()
        If the data is in a delimited format with a header, e.g. CSV or TSV, this is the delimiter character used. This is only applicable if getFormat() is DataDescription.DataFormat.DELIMITED. The default value for delimited format is character code 9 (the tab character).
        Returns:
        A char
      • getQuoteCharacter

        public java.lang.Character getQuoteCharacter()
        The quote character used in delimited formats. The default value for delimited format is character code 34 (the double quote character).
        Returns:
        The delimited format quote character
      • equals

        public boolean equals(java.lang.Object other)
        Overridden equality test
        Overrides:
        equals in class java.lang.Object
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class java.lang.Object
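
To make the roles of getFieldDelimiter() and getQuoteCharacter() concrete, the following sketch splits one line of delimited data, falling back to character codes 9 and 34 when no value is supplied. It is an illustration only and says nothing about how the native autodetect program actually parses input. DelimitedLineSketch and splitLine are hypothetical names, and treating a doubled quote inside a quoted field as a literal quote is an assumption borrowed from common CSV conventions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of DataDescription.
final class DelimitedLineSketch {
    // Defaults as stated in the getter descriptions: codes 9 and 34.
    static final char DEFAULT_DELIMITER = (char) 9;   // tab
    static final char DEFAULT_QUOTE_CHAR = (char) 34; // double quote

    // Splits a single line into fields; a null argument selects the default.
    static List<String> splitLine(String line, Character fieldDelimiter, Character quoteCharacter) {
        char delimiter = fieldDelimiter != null ? fieldDelimiter : DEFAULT_DELIMITER;
        char quote = quoteCharacter != null ? quoteCharacter : DEFAULT_QUOTE_CHAR;

        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == quote) {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == quote) {
                    // Assumption: a doubled quote inside quotes is a literal quote.
                    current.append(quote);
                    i++;
                } else {
                    inQuotes = !inQuotes;
                }
            } else if (c == delimiter && !inQuotes) {
                // An unquoted delimiter ends the current field.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitLine("time\tvalue\thost", null, null));
        System.out.println(splitLine("a,\"b,c\",d", ',', null));
    }
}
```

The quote character matters only when a field contains the delimiter itself: in the second call, the comma inside the quoted field does not start a new field.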