Behavioral Analytics and Anomaly Detection Template Macros (Beta)

Detect outliers with Panther-managed macros for behavioral analytics and anomaly detection

Overview

These template macros are in open beta and are available to all customers. The macros became available via CI/CD in panther-analysis v3.75 and in the Panther Console in v3.80. Please share any bug reports and feature requests with your Panther support team.

Panther provides template macros for identifying anomalous and new values across your log data. Using the template macros, you can compare recent log events to historical data to identify activity deviating significantly from the established norm. These behavioral analytics and anomaly detection macros may be useful as part of your User and Entity Behavior Analytics (UEBA) strategy.

These macros work by comparing data in a recent time interval to data in a longer lookback window and determining the level of deviation between the two.

The macros Panther provides are:

  • statistical_anomaly: Identifies outlier values for a numerical field

    • Example: Find VPC hosts that have been sending an unusually high volume of traffic over the last hour, as compared to the last two weeks.

    • Learn more in statistical_anomaly.

  • statistical_anomaly_peer: Identifies outlier values for a numerical field within a peer group

    • Example: Identify attempts by a user to access a resource that is unusual for members of the same team.

  • new_unique_values: Identifies new values for a given entity

    • Example: Find API tokens that have accessed a resource in the last day they have not accessed in the previous 30 days.

    • Learn more in new_unique_values.

  • new_unique_values_peer: Identifies new values for a given entity within a peer group

    • Example: Detect if an EC2 instance has connected to an IP that is atypical for members of its VPC group.

Learn how to view the macros' source code below.

Enabling the macros

Before invoking the macros, you need to make them available in your Panther instance. The macros are provided in a Panther-managed Saved Search.

To enable the macros in the Panther Console, enable the PantherManaged.Anomalies Pack:

  1. In the left-hand navigation bar of your Panther Console, click Detections, then Packs.

  2. In the Filter Pack by text field, enter "Anomaly."

  3. On the right side of the Panther Anomaly Detection Pack tile, set the Enabled toggle to ON.

Screenshot showing how to enable a pack. "Detections" is selected in the left-side navbar, and the tab is set to "Packs". A user has searched for the string "anomaly" in the search bar, returning a single pack named "Panther Anomaly Detection Pack". The user has set the toggle on the right side of the pack's panel to "On".

How to use behavioral analytics and anomaly detection macros in Panther

You can invoke the Panther-managed behavioral analytics and anomaly detection macros in Data Explorer by following the instructions below. This process is similar to the one described in Calling template macros in other queries, but is specific to the behavioral analytics and anomaly detection template macros.

  1. In the left-hand navigation bar of your Panther Console, click Investigate > Data Explorer.

  2. At the top of the SQL editor, add a -- pragma: template statement.

    Optionally, you can also add -- pragma: show macro expanded. This expands the macro into its source code, which is useful for troubleshooting issues with your query. See Debugging template macros for more information.

    -- pragma: template
    -- pragma: show macro expanded # Optional

  3. Import one of the available macros:

    {% import 'anomalies' <statistical_anomaly, new_unique_values, 
    statistical_anomaly_peer, OR new_unique_values_peer> %}
    -- Specify only one macro, not all four

  4. Define a subquery using a Common Table Expression (CTE). The subquery must:

    • SELECT at least:

      • p_event_time

      • An entity column. The entity is often an ID of some kind (such as an email address, user ID, application ID, or hostname), but can be any data type (such as an IP address). The column must be a top-level field. If the field is nested within an object or array, create an alias for the column with the AS keyword.

      • An aggregation column. For statistical_anomaly queries, this column's contents will be aggregated and scanned for unusual values. For new_unique_values queries, this column's contents will be scanned for new values. As with the entity column, this column must be a top-level field.

      • (If you're using statistical_anomaly_peer or new_unique_values_peer) A peer group field.

    • Using a WHERE clause, define the lookback window (i.e., the longer period of time that denotes the baseline against which the shorter window is compared). The lookback window should end at the current time; for this reason it is recommended to use p_occurs_since(). Learn more about p_occurs_since() here.

    with subquery as (
        select 
            user:email as email, -- entity column
            event_type, -- aggregation column
            p_event_time
        from mytable where p_occurs_since('30 day')
    ),

  5. Below the subquery, invoke the macro:

    {{ <statistical_anomaly, new_unique_values, statistical_anomaly_peer, OR new_unique_values_peer>
    (<subquery>, '<entity_col>', ...) }}

  6. Click Run Search.

Going beyond ad-hoc searches

While Panther's behavioral analytics and anomaly detection queries are useful for threat hunting, they're more powerful when used as a monitoring system. The queries can be saved, set to run on a schedule, and attached to Scheduled Rules. In this way, you can get alerted whenever anomalous activity is observed.
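As a rough illustration of the Scheduled Rule side of this setup, the sketch below shows Python rule functions that could process each row returned by a saved statistical_anomaly query. It assumes the result columns are named as listed in the macro reference (for example p_zscore), that the entity column is called traffic, and that a z-score of 5 is a sensible escalation point. All three are assumptions for illustration, not Panther defaults.

```python
# Sketch of a Scheduled Rule for rows returned by a saved
# statistical_anomaly query. The "traffic" entity column and the
# HIGH_ZSCORE cutoff are hypothetical values chosen for this example.
HIGH_ZSCORE = 5


def rule(event):
    # Rows returned by the query already exceeded the zscore threshold
    # passed to the macro, so any row carrying a score is a match.
    return event.get("p_zscore") is not None


def title(event):
    entity = event.get("traffic", "<unknown entity>")
    return f"Anomalous activity for {entity} (z-score {event.get('p_zscore'):.1f})"


def severity(event):
    # Escalate only the most extreme outliers.
    return "HIGH" if event.get("p_zscore", 0) >= HIGH_ZSCORE else "MEDIUM"
```

Keeping the threshold logic in the query and only the presentation logic in the rule means the saved query can still be run ad hoc in Data Explorer with the same results.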

Full examples

See full examples invoking all macros below.

Full example using statistical_anomaly

This query compares VPC traffic observed in the past hour to a baseline set by the past 7 days, and alerts if any addresses have sent an unusual volume of outbound traffic, potentially indicating a data exfiltration action.

-- pragma: template

{% import 'anomalies' statistical_anomaly %}
WITH subquery AS (
  -- Look for outbound requests:
  SELECT
    p_event_time as p_timeline,
    concat(srcAddr,' -> ',dstAddr,':',dstPort) as traffic,
    *
  FROM
    panther_logs.public.aws_vpcflow
  WHERE
    p_occurs_since('7 day')
    AND dstAddr not like '10.%'
    AND dstPort < 1024
    AND flowDirection = 'egress'
    AND pktDstAwsService is null
),
{{statistical_anomaly('subquery', 'traffic', 'bytes', 'sum', '1', 'hour', 3)}}

Viewing the macro source code

After enabling the macros, you can view the behavioral analytics and anomaly detection macro source code either in your Panther Console or in the panther-analysis repository.

To view the macro source code in your Panther Console:

  1. In the left-hand navigation bar of your Panther Console, click Investigate > Saved Searches.

  2. Search for the Saved Search named anomalies, and click its name.

    • You will be directed to Data Explorer, where you can view the source code.

Peer group analysis

It's often useful to compare an entity's behavior specifically against the behavior of its peers. For example, has an engineer recently signed into an account that other engineers have not?

Panther provides peer versions of statistical_anomaly and new_unique_values to perform such analysis. In the peer versions, baseline statistics are calculated according to the peer group, then entity behavior is compared to the baseline.

These queries function similarly to the non-peer versions, with the addition of an extra parameter, group_field, to define the peer group. This should be a column whose value is used to group entities together. Some common examples of the group_field value are: user role, job department, VPC ID, and account ID.

Behavioral analytics and anomaly detection macro reference

Below, you can find reference information for how to use the macros provided by Panther. Unless otherwise specified, assume all input arguments must be provided in the order shown here.

statistical_anomaly

The statistical_anomaly macro looks over a dataset for unusual data points within a recent period. It takes a CTE as the base data set and compares the baseline activity for an entity over that period to the most recent activity by the same entity, and calculates how unusual this behavior is.

You must provide the base data set and specify which column contains the entity name and which column to use for data comparison. You must also define the size of the recent period in which to look for anomalies.

Input arguments

Each of the following arguments must be provided to the macro, in the order shown below.

| Name | Data type | Description |
| --- | --- | --- |
| subquery | String | Name of the CTE defined previously, which provides data for the macro to analyze. |
| entity_field | String | Name of the column to use for grouping; usually a name, IP address, or ID. |
| agg_field | String | Name of the column to search for outliers. |
| agg_func | String | SQL function used to aggregate the data in agg_field within a time period. Common values are count, sum, and max. |
| tmag | String | Number of time units in the recent period in which to look for anomalies, for example the 1 in "1 day". |
| tunit | String | Unit of time for the recent period in which to look for anomalies, for example the day in "1 day". Must be singular (no "s" at the end). |
| zscore | Number | Outlier threshold; results are returned only if their calculated zscore value is higher than this. |

Returns

The table returned after executing the macro will have the following columns:

| Name | Data type | Description |
| --- | --- | --- |
| N | Number | The value of the data in agg_field, as aggregated by agg_func, for the given entity over the recent period. |
| t1 | Timestamp | Start of the recent period. |
| t2 | Timestamp | End of the recent period. |
| <entity_field> | Any | Value of the chosen entity_field. |
| p_zscore | Number | Calculated zscore of the entity's activity over the recent period. A higher zscore value means more anomalous. |
| p_mean | Number | Average value of the agg_field column for this entity over the data in subquery, excluding the recent period. |
| p_stddev | Number | Standard deviation of the agg_field column for this entity over the data in subquery, excluding the recent period. A larger p_stddev means the entity's activity was less consistent overall. |
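To make the relationship between N, p_mean, p_stddev, and p_zscore concrete, here is a minimal Python sketch of a standard z-score calculation for a single entity. It is not the macro's source code: the function name is hypothetical, and the use of the sample standard deviation and the handling of a flat baseline are assumptions.

```python
from statistics import mean, stdev


def zscore_for_entity(baseline_values, recent_value):
    """Score one entity's recent aggregate against its own history.

    baseline_values: per-period aggregates (for example hourly byte sums)
        taken from the subquery, excluding the recent period.
    recent_value: the aggregate for the recent period (the N column).
    """
    p_mean = mean(baseline_values)
    p_stddev = stdev(baseline_values)  # assumption: sample standard deviation
    if p_stddev == 0:
        return None  # flat baseline, so the z-score is undefined
    return (recent_value - p_mean) / p_stddev


# Five baseline hours averaging 100 bytes, then a recent hour of 400 bytes.
hourly_bytes = [100, 120, 80, 110, 90]
score = zscore_for_entity(hourly_bytes, 400)
is_outlier = score is not None and score > 3  # 3 plays the role of zscore
```

In this example the recent value sits far above the baseline mean relative to its spread, so it would pass a threshold of 3.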

new_unique_values

The new_unique_values macro scans a data set and, for a given entity, returns the values in a chosen column that are new in a recent period.

Input arguments

The following arguments must be provided to the macro, in the order shown below.

| Name | Data type | Description |
| --- | --- | --- |
| subquery | String | Name of the CTE defined previously, which contains the base data to use for finding anomalies. |
| entity_field | String | Name of the column to use for grouping; usually a user name, IP address, or ID. |
| agg_field | String | Name of the column in which to search for new values. |
| interval | String | Size of the period in which to look for new values. Uses the same syntax as p_occurs_since. |

Returns

The table returned after executing the macro will have the following columns:

| Name | Data type | Description |
| --- | --- | --- |
| <entity_field> | Any | Value of the defined entity_field column. |
| <agg_field> | Any | Any new values discovered in the agg_field column during the recent period. |
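The idea behind new_unique_values can be sketched in Python as a per-entity set difference: values seen inside the recent interval minus values seen earlier in the data set. The function below is a hypothetical illustration of that idea, not the macro's implementation, and the tuple layout standing in for the subquery columns is an assumption.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def new_unique_values(rows, now, interval):
    """rows: (entity, value, event_time) tuples, standing in for the
    entity_field, agg_field, and p_event_time columns of the subquery."""
    cutoff = now - interval
    seen_before = defaultdict(set)
    seen_recently = defaultdict(set)
    for entity, value, event_time in rows:
        bucket = seen_recently if event_time >= cutoff else seen_before
        bucket[entity].add(value)
    # Keep only entities that have at least one value not seen earlier.
    return {
        entity: values - seen_before[entity]
        for entity, values in seen_recently.items()
        if values - seen_before[entity]
    }


now = datetime(2024, 1, 31, 12, 0)
rows = [
    ("token-a", "s3:GetObject", now - timedelta(days=10)),
    ("token-a", "s3:GetObject", now - timedelta(hours=2)),
    ("token-a", "iam:CreateUser", now - timedelta(hours=1)),
    ("token-b", "s3:ListBucket", now - timedelta(days=3)),
]
result = new_unique_values(rows, now, timedelta(days=1))
```

Here token-a is reported only for iam:CreateUser, because its recent s3:GetObject call also appears in the earlier data, and token-b is omitted because it has no recent activity.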

statistical_anomaly_peer

Use this macro to determine unusual numerical behavior from an entity compared to its peer group. For example, checking an EC2 instance's traffic volume compared to others with the same tag, or access requests to an S3 object compared to other objects in the same bucket.

Input arguments

The following arguments must be provided to the macro, in the order shown below.

| Name | Data type | Description |
| --- | --- | --- |
| subquery | String | Name of the CTE defined previously, which provides data for the macro to analyze. |
| entity_field | String | Name of the column to use for identifying an entity; usually a name, IP address, or ID. |
| group_field | String | Name of the column to use to group entities; for example, a role name or an Account ID. |
| agg_field | String | Name of the column to search for outliers. |
| agg_func | String | SQL function used to aggregate the data in agg_field within a time period. Common values are count, sum, and max. |
| tmag | String | Number of time units in the recent period in which to look for anomalies, for example the 1 in "1 day". |
| tunit | String | Unit of time for the recent period in which to look for anomalies, for example the day in "1 day". Must be singular (no "s" at the end). |
| zscore | Number | Outlier threshold; results are returned only if their calculated zscore value is higher than this. |

Returns

The table returned after executing the macro will have the following columns:

| Name | Data type | Description |
| --- | --- | --- |
| N | Number | The value of the data in agg_field, as aggregated by agg_func, for the given entity over the recent period. |
| t1 | Timestamp | Start of the recent period. |
| t2 | Timestamp | End of the recent period. |
| <entity_field> | Any | Value of the chosen entity_field. |
| <group_field> | Any | Value of the chosen group_field. |
| p_zscore | Number | Calculated zscore of the entity's activity over the recent period. A higher zscore value means more anomalous. |
| p_mean | Number | Average value of the agg_field column for this entity over the data in subquery, excluding the recent period. |
| p_stddev | Number | Standard deviation of the agg_field column for this entity over the data in subquery, excluding the recent period. A larger p_stddev means the entity's activity was less consistent overall. |
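The peer comparison can be pictured with the Python sketch below, in which the baseline mean and standard deviation are computed per peer group and each entity's recent value is scored against its group's baseline. This is an illustration under assumptions, not the macro's source: the function name, the input shapes, and the choice to pool all peers' historical values into one baseline are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def peer_outliers(baseline_rows, recent_rows, zscore):
    """baseline_rows: (group, value) per-period aggregates from peer history.
    recent_rows: (entity, group, value) aggregates for the recent period.
    Returns {entity: score} for entities above the zscore threshold."""
    by_group = defaultdict(list)
    for group, value in baseline_rows:
        by_group[group].append(value)

    outliers = {}
    for entity, group, value in recent_rows:
        history = by_group.get(group, [])
        if len(history) < 2:
            continue  # not enough peer data for a standard deviation
        spread = stdev(history)  # assumption: sample standard deviation
        if spread == 0:
            continue  # flat baseline, score is undefined
        score = (value - mean(history)) / spread
        if score > zscore:
            outliers[entity] = round(score, 2)
    return outliers


baseline_rows = [
    ("web", 100), ("web", 120), ("web", 80), ("web", 110), ("web", 90),
    ("db", 1000), ("db", 1200), ("db", 800),
]
recent_rows = [("i-web-1", "web", 400), ("i-db-1", "db", 1100)]
flagged = peer_outliers(baseline_rows, recent_rows, 3)
```

The web instance is flagged because 400 is far outside what its peers normally send, while the database instance's larger value of 1100 is unremarkable for its own group.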

new_unique_values_peer

Use this macro to identify when an entity has done something that hasn't previously been observed by a member of its peer group.

Input arguments

The following arguments must be provided to the macro, in the order shown below.

| Name | Data type | Description |
| --- | --- | --- |
| subquery | String | Name of the CTE defined previously, which contains the base data to use for finding anomalies. |
| entity_field | String | Name of the column to use for identifying an entity; usually a name, IP address, or ID. |
| group_field | String | Name of the column to use to group entities; for example, a role name or an Account ID. |
| agg_field | String | Name of the column in which to search for new values. |
| interval | String | Size of the recent period in which to look for new values. Uses the same syntax as p_occurs_since. |

Returns

The table returned after executing the macro will have the following columns:

| Name | Data type | Description |
| --- | --- | --- |
| <entity_field> | Any | Value of the defined entity_field column. |
| <group_field> | Any | Value of the defined group_field column. |
| <agg_field> | Any | Any new values discovered in the agg_field column during the recent period. |
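One way to picture the peer variant is the Python sketch below, which flags a recent value only if no member of the entity's peer group produced it before the recent interval. The function name and tuple layout are hypothetical, and whether the macro treats values first seen by a peer inside the recent interval in the same way is an assumption.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def new_values_for_peer_group(rows, now, interval):
    """rows: (entity, group, value, event_time) tuples, standing in for the
    entity_field, group_field, agg_field, and p_event_time columns."""
    cutoff = now - interval
    group_history = defaultdict(set)
    recent = []
    for entity, group, value, event_time in rows:
        if event_time >= cutoff:
            recent.append((entity, group, value))
        else:
            group_history[group].add(value)
    # A value is new only if nobody in the same group produced it earlier.
    return sorted({(e, g, v) for e, g, v in recent if v not in group_history[g]})


now = datetime(2024, 1, 31, 12, 0)
rows = [
    ("alice", "eng", "prod-db", now - timedelta(days=10)),
    ("bob", "eng", "prod-db", now - timedelta(hours=1)),
    ("bob", "eng", "payroll", now - timedelta(hours=2)),
    ("carol", "finance", "payroll", now - timedelta(days=5)),
]
result = new_values_for_peer_group(rows, now, timedelta(days=1))
```

Bob's access to prod-db is new for him but not for the eng group, so only his payroll access is returned, even though payroll is routine for the finance group.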
