Behavioral Analytics and Anomaly Detection Template Macros (Beta)
Detect outliers with Panther-managed macros for behavioral analytics and anomaly detection
Last updated
Was this helpful?
Detect outliers with Panther-managed macros for behavioral analytics and anomaly detection
Last updated
Was this helpful?
Panther provides for identifying anomalous and new values across your log data. Using the template macros, you can compare recent log events to historical data to identify activity deviating significantly from the established norm. These behavioral analytics and anomaly detection macros may be useful as part of your User and Entity Behavior Analytics (UEBA) strategy.
These macros work by comparing data in a recent time interval to data in a longer lookback window and determining the level of deviation between the two.
The macros Panther provides are:
statistical_anomaly
: Identifies outlier values for a numerical field
Example: Find VPC hosts that have been sending an unusually high volume of traffic over the last hour, as compared to the last two weeks.
Learn more in .
statistical_anomaly_peer
: Identifies outlier values for a numerical field within a
Example: Identify attempts by a user to access a resource that is unusual for members of the same team.
Learn more in .
new_unique_values
: Identifies new values for a given entity
Example: Find API tokens that have accessed a resource in the last day they have not accessed in the previous 30 days.
Learn more in .
new_unique_values_peer
: Identifies new values for a given entity within a
Example: Detect if an EC2 instance has connected to an IP that is atypical for members of its VPC group.
Learn more in
In the left-hand navigation bar of your Panther Console, click Investigate > Data Explorer.
At the top of the SQL editor, add a -- pragma: template
statement.
Import one of the available macros:
SELECT
at least:
p_event_time
An entity column. The entity is often an ID of some kind (such as an email address, user ID, application ID, or hostname), but can be any data type (such as an IP address). The column must be a top-level field. If the field is nested within an object or array, create an alias for the column with the AS
keyword.
An aggregation column. For statistical_anomaly
queries, this column's contents will be aggregated and scanned for unusual values. For new_unique_values
queries, this column's contents will be scanned for new values. As with the entity column, this column must be a top-level field.
Below the subquery, invoke the macro:
Click Run Search.
See full examples invoking all macros below.
You can view the behavioral analytics and anomaly detection macro source code in either your Panther Console or in the panther-analysis repository.
To view the macro source code in your Panther Console:
In the left-hand navigation bar of your Panther Console, click Investigate > Saved Searches.
It's often useful to compare an entity's behavior specifically against the behavior of its peers. For example, has an engineer recently signed into an account that other engineers have not?
Panther provides peer versions of statistical_anomaly
and new_unique_values
to perform such analysis. In the peer versions, baseline statistics are calculated according to the peer group, then entity behavior is compared to the baseline.
These queries function similarly to the non-peer versions, with the addition of an extra parameter, group_field
, to define the peer group. This should be a column whose value is used to group entities together. Some common examples of the group_field
value are: user role, job department, VPC ID, and account ID.
Below, you can find reference information for how to use the macros provided by Panther. Unless otherwise specified, assume all input arguments must be provided in the order shown here.
The statistical_anomaly
macro looks over a dataset for unusual data points within a recent period. It takes a CTE as the base data set and compares the baseline activity for an entity over that period to the most recent activity by the same entity, and calculates how unusual this behavior is.
You must provide the base data set, specify which column contains the entity name and which column to use for data comparison. You must also define the size of the recent period in which to look for anomalies.
Each of the following arguments must be provided to the macro, in the order shown below.
subquery
String
Name of the CTE defined previously, which provides data for the macro to analyze.
entity_field
String
Name of the column to use for grouping; usually a name, IP address, or ID.
agg_field
String
Name of the column to search for outliers.
agg_func
String
Which SQL function to use to aggregate the data in agg_field
within a time period. Common value are count
, sum
, and max
.
tmag
String
Number of units for the lookback window in which to look for anomalies. i.e.: the 1 in "1 day".
tunit
String
Unit of time for the lookback window in which to look for anomalies. i.e.: the day in "1 day". Must be singular (no "s" at the end).
zscore
Number
Outlier threshold; results will not be returned unless their calculated zscore
value is higher than this.
The table returned after executing the macro will have the following columns:
N
Number
The value of the data in agg_field
, as aggregated by agg_func
, for the given entity over the lookback period.
t1
Timestamp
Start of the lookback period.
t2
Timestamp
End of the lookback period.
<entity_field>
Any
Value of the chosen entity_field
.
p_zscore
Number
Calculated zscore
of the entity's activities over the lookback period. Higher zscore
value means more anomalous.
p_mean
Number
Average value of the agg_field
column for this entity over the data in subquery
, excluding during the lookback period.
p_stddev
Number
Standard deviation of the agg_field
column for this entity over the data in subquery
, excluding during the lookback period. Larger p_stddev
means the entity's activity was less consistent overall.
The new_unique_values
macro scans a data set and returns a set of new values from a chosen column present in a recent lookback period for a given entity.
The following arguments must be provided to the macro, in the order shown below.
subquery
String
Name of the CTE defined previously, which contains the base data to use for finding anomalies.
entity_field
String
Name of the column to use for grouping; usually a user name, IP address, or ID.
agg_field
String
Name of the column in which to search for new values.
interval
String
The table returned after executing the macro will have the following columns:
<entity_field>
Any
Value of the defined entity_field
column.
<agg_field>
Any
Any new values discovered in the agg_field
column during the lookback period
The following arguments must be provided to the macro, in the order shown below.
subquery
String
Name of the CTE defined previously, which provides data for the macro to analyze.
entity_field
String
Name of the column to use for identifying an entity; usually a name, IP address, or ID.
group_field
String
Name of the column to use to group entities; for example, a role name or an Account ID.
agg_field
String
Name of the column to search for outliers.
agg_func
String
Which SQL function to use to aggregate the data in agg_field
within a time period. Common value are count
, sum
, and max
.
tmag
String
Number of units for the lookback window in which to look for anomalies. i.e.: the 1 in "1 day".
tunit
String
Unit of time for the lookback window in which to look for anomalies. i.e.: the day in "1 day". Must be singular (no "s" at the end).
zscore
Number
Outlier threshold; results will not be returned unless their calculated zscore
value is higher than this.
The table returned after executing the macro will have the following columns:
N
Number
The value of the data in agg_field
, as aggregated by agg_func
, for the given entity over the lookback period.
t1
Timestamp
Start of the lookback period.
t2
Timestamp
End of the lookback period.
<entity_field>
Any
Value of the chosen entity_field
.
<group_field>
Any
Value of the chosen group_field
.
p_zscore
Number
Calculated zscore
of the entity's activities over the lookback period. Higher zscore
value means more anomalous.
p_mean
Number
Average value of the agg_field
column for this entity over the data in subquery
, excluding during the lookback period.
p_stddev
Number
Standard deviation of the agg_field
column for this entity over the data in subquery
, excluding during the lookback period. Larger p_stddev
means the entity's activity was less consistent overall.
The following arguments must be provided to the macro, in the order shown below.
subquery
String
Name of the CTE defined previously, which contains the base data to use for finding anomalies.
entity_field
String
Name of the column to use for identifying an entity; usually a name, IP address, or ID.
group_field
String
Name of the column to use to group entities; for example, a role name or an Account ID.
agg_field
String
Name of the column in which to search for new values.
interval
String
The table returned after executing the macro will have the following columns:
<entity_field>
Any
Value of the defined entity_field
column.
<group_field>
Any
Value of the defined group_field
column.
<agg_field>
Any
Any new values discovered in the agg_field
column during the lookback period
Learn how to view the macros' source code .
You can invoke the Panther-managed behavioral analytics and anomaly detection macros in by following the instructions below. This process is similar to the instructions, but is specific to using the behavioral analytics and anomaly detection template macros.
Optionally, you can also add --pragma: show macro expanded
. This will expand the macro into its source code, and is useful for troubleshooting issues with your query. See for more information.
Define a subquery using a . The subquery must:
(If you're using statistical_anomaly_peer
or new_unique_values_peer
) A field.
Using a WHERE
clause, define the lookback window (i.e., the longer period of time that denotes the baseline against which the shorter window is compared). The lookback window should end at the current time; for this reason it is recommended to use p_occurs_since()
. .
See the full list of input arguments for each macro in , below.
While Panther's behavioral analytics and anomaly detection queries are useful for threat hunting, they're more powerful when used as a monitoring system. The queries can be , set to run on a schedule, and attached to Scheduled Rules. In this way, you can get alerted whenever anomalous activity is observed.
The query uses where
filter clauses to only fetch pull request creation events, as well as to exclude events generated during weekends. It also pulls username/development team associations from a .
The query uses a where
filter clause to only fetch pull request creation events. The repo
column is extracted in order to analyze for new values. The query also pulls username/development team associations from a .
Search for the Saved Search named anomalies
, and click its name.
You will be directed to , where you can view the source code:
In your local Panther code repository or in , view the queries/macros/anomalies.yml
file.
Size of the period in which to look for new values. Uses the same syntax as .
Use this macro to determine unusual numerical behavior from an entity compared to its . For example, checking an EC2 instance's traffic volume compared to others with the same tag, or access requests to an S3 object compared to other objects in the same bucket.
Use this macro to identify when an entity has done something that hasn't previously been observed by a member of its .
Size of the lookback period in which to look for new values. Uses the same syntax as .