elementary.column_value_anomalies

Monitors individual values of a column and detects anomalous rows by comparing each value against the historical distribution of that column. Unlike column_anomalies, which computes aggregate metrics (such as min, max, or average) per time bucket and detects anomalies in the resulting metric time series, column_value_anomalies operates directly on the raw column values, so no aggregation functions are needed.

How it works

  1. Training: The test collects all values of the column within the training period and computes a baseline distribution (mean and standard deviation).
  2. Detection: For each row in the detection period, the test computes a z-score for the column value against the historical baseline.
  3. Anomaly flagging: Rows where the z-score exceeds the configured anomaly_sensitivity threshold are flagged as anomalous. If any anomalous rows are found, the test fails.

This test is designed for numeric columns. It detects individual row-level outliers, making it ideal for catching unexpected spikes or drops in values such as transaction amounts, prices, scores, or measurements.
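
The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Elementary's implementation: the function name `flag_anomalous_rows`, the in-memory lists, the population standard deviation, the default sensitivity of 3, and the reading of `spike` as "above the baseline" and `drop` as "below the baseline" are all choices made for the example. The actual test runs inside the data warehouse.

```python
from statistics import mean, pstdev


def flag_anomalous_rows(training_values, detection_values,
                        anomaly_sensitivity=3, anomaly_direction="both"):
    """Return (value, z_score) pairs whose z-score breaches the threshold.

    Hypothetical helper for illustration only.
    """
    # Step 1, training: baseline mean and standard deviation.
    baseline_mean = mean(training_values)
    baseline_std = pstdev(training_values)
    if baseline_std == 0:
        # A constant baseline makes the z-score undefined; this sketch flags nothing.
        return []

    flagged = []
    for value in detection_values:
        # Step 2, detection: z-score of each row against the baseline.
        z = (value - baseline_mean) / baseline_std
        is_spike = z > anomaly_sensitivity
        is_drop = z < -anomaly_sensitivity
        # Step 3, flagging: keep rows that breach the threshold in the watched direction.
        if (anomaly_direction in ("both", "spike") and is_spike) or \
           (anomaly_direction in ("both", "drop") and is_drop):
            flagged.append((value, z))
    return flagged


# Made-up values: a stable history, then one extreme row in the detection period.
training = [100, 102, 98, 101, 99, 100, 103, 97]
detection = [101, 250, 99]

anomalies = flag_anomalous_rows(training, detection)
test_passed = not anomalies  # the test fails as soon as one row is flagged
```

With this made-up data only the value 250 is flagged, so `test_passed` is `False`. Passing `anomaly_direction="drop"` would ignore that row, because it lies above the baseline.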

When to use

| Use case | Recommended test |
| --- | --- |
| Detect anomalies in aggregate statistics (avg, min, max) of a column over time | column_anomalies |
| Detect individual rows with anomalous column values | column_value_anomalies |
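
To make the distinction concrete, the sketch below looks at the same made-up data from both angles: one aggregate number per time bucket versus each row on its own. The data, the variable names, and the threshold of 3 are assumptions for illustration; neither test is implemented this way.

```python
from statistics import mean, pstdev

# Hypothetical data: 1,000 historical order amounts, and today's bucket of
# 1,000 rows with the same shape except for one extreme value.
baseline = [95, 100, 105] * 333 + [100]
today = [95, 100, 105] * 333 + [400]

# Aggregate view: a single metric per time bucket (here, the average).
baseline_avg = mean(baseline)
today_avg = mean(today)
avg_shift = today_avg - baseline_avg  # small, because 999 normal rows dilute the outlier

# Row-level view: every value is compared to the baseline individually.
baseline_std = pstdev(baseline)
outlier_rows = [v for v in today if abs((v - baseline_avg) / baseline_std) > 3]
```

Here the bucket average moves by only 0.3, while the row-level comparison isolates the single value of 400. An aggregate such as max would also react to this row, but it reports a change in the bucket's metric rather than identifying the offending row.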

Test configuration

A timestamp_column is required for this test as it uses historical data to build the baseline distribution.
columns:
  - name: column name
    data_tests:
      - elementary.column_value_anomalies:
          arguments:
            timestamp_column: column name
            where_expression: sql expression
            anomaly_sensitivity: int
            anomaly_direction: [both | spike | drop]
            detection_period:
              period: [hour | day | week | month]
              count: int
            training_period:
              period: [hour | day | week | month]
              count: int
            seasonality: day_of_week
            detection_delay:
              period: [hour | day | week | month]
              count: int

The same test with timestamp_column set in the model config:
models:
  - name: < model name >
    config:
      elementary:
        timestamp_column: < timestamp column >
    columns:
      - name: < column name >
        data_tests:
          - elementary.column_value_anomalies:
              arguments:
                where_expression: < sql expression >
                training_period:
                  period: < time period >
                  count: < number of periods >
                detection_period:
                  period: < time period >
                  count: < number of periods >