Data Explorer
Use Panther's Data Explorer to view normalized data and perform SQL queries
Overview
The Data Explorer in your Panther Console is where you can view your normalized Panther data and perform SQL queries (with autocompletion).
In Data Explorer, you can:
Browse collected log data, rule matches, and search standard fields across all data
Save, and optionally schedule, your searches
Scheduled Searches can run through the rule engine
Share results with your team through a shareable link, or download results in a CSV
Select entire rows of JSON to use in the rule engine as unit tests
Preview table data, filter results, and summarize columns without SQL
Limit access to the Data Explorer through Role-Based Access Control (RBAC)
Data Explorer results that include bigint data exceeding 32-bit precision will be shown rounded due to browser limitations in rendering JSON. If you'd like these values to be represented without precision loss, cast them to strings in the SQL command. The data stored in the data lake is not affected.
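As a minimal sketch of the cast described above, the following query converts a bigint column to a string so that the results view shows it exactly. The schema name public, the table my_custom_log, and the column big_counter are hypothetical placeholders used only for illustration.

```sql
-- Hypothetical table and column names, used only for illustration.
SELECT
  p_event_time,
  CAST(big_counter AS VARCHAR) AS big_counter_text  -- string form avoids rounding in the browser
FROM panther_logs.public.my_custom_log
WHERE p_occurs_since('1 hour')
LIMIT 100;
```

Only the displayed value changes; the stored bigint is untouched.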
Query syntax in Data Explorer
Queries executed in Data Explorer should use Snowflake SQL syntax described in Snowflake's SQL Command Reference documentation.
You can also learn about:
Best practices for searching in Data Explorer, in Searching effectively in Data Explorer
How to use Data Explorer macros
Referencing nested fields in Data Explorer
When traversing a JSON object, if a key name does not conform to Snowflake SQL identifier rules (for example, if it contains periods or spaces), enclose the key name in double quotes.
For example, to query the field context.ip_address from the IPInfo Privacy Enrichment Provider, write it as p_enrichment:ipinfo_privacy:"context.ip_address".
Learn more in Snowflake's Querying Semi-structured Data documentation.
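A short sketch of the quoting rule in a full query follows. The field path is the one described above; the schema name public and the table my_custom_log are hypothetical placeholders.

```sql
-- The quoted key keeps "context.ip_address" as a single key name
-- instead of being read as two nested path elements.
SELECT
  p_event_time,
  p_enrichment:ipinfo_privacy:"context.ip_address" AS enriched_ip
FROM panther_logs.public.my_custom_log  -- hypothetical table name
WHERE p_occurs_since('1 day')
LIMIT 10;
```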
Searching effectively in Data Explorer
To ensure your results return as quickly as possible, it's recommended to follow these best practices:
Use a LIMIT clause
Use the LIMIT clause to specify the number of records your query will return. Limiting queries can return results more quickly. Panther limits the size of results to 100MB by default.
Use a time range filter
Snowflake groups files in S3 in micropartitions. When you filter by a time range (such as p_event_time or p_occurs_since()) in your query, Snowflake will only need to access specific partitions, which returns results more quickly.
For more information on macros, see the section below: How to use Data Explorer macros.
Use p_any fields
During log ingestion, Panther extracts common security indicators into the p_any fields. The p_any fields are stored in optimized columns. These fields standardize names for attributes across all data sources, enabling fast data correlation.
Learn more on Standard Fields.
Query specific fields
Using SELECT * FROM ... pulls all columns, which can slow down queries. When possible, query only the fields you need to investigate. For example, SELECT user_name, event_name FROM ...
Summarize
Summaries are faster to run than querying full records. This is especially helpful when investigating logs over a large period of time, or in a situation where you are unsure how much data exists over the time you are investigating.
Instead of querying the full data set, you can use count(*) and group by a time range, which will run more quickly and help you determine a narrower timeframe to subsequently query.
For example, if you look back over a day and GROUP BY hour, you might determine which specific hour you need to investigate in your data. You can then run a query against that hour to narrow your results further.
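The summarize-then-narrow approach can be sketched as two queries. This is an illustration under assumptions: the schema name public, the selected columns, and the timestamps are chosen for the example and are not taken from this page.

```sql
-- Step 1: summarize the past day by hour to see where events concentrate.
SELECT
  date_trunc('hour', p_event_time) AS event_hour,
  count(*) AS row_count
FROM panther_logs.public.aws_cloudtrail
WHERE p_occurs_since('1 day')
GROUP BY event_hour
ORDER BY event_hour;

-- Step 2: query only the hour of interest, selecting specific fields
-- and capping the number of rows returned.
SELECT p_event_time, eventName, awsRegion
FROM panther_logs.public.aws_cloudtrail
WHERE p_occurs_between('2022-01-01 10:00:00.000', '2022-01-01 11:00:00.000')
LIMIT 100;
```

The second query combines three of the practices above: a time range filter, a specific field list, and a LIMIT clause.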
If your query is still running slowly after following the best practices above, we recommend the following steps:
Count the rows to see how much data you are querying.
This will help you determine whether the volume of data is large enough that a longer run time is expected.
Reduce the time range you are querying.
Reach out to your Panther Support team for additional help.
How to use Data Explorer
Access Data Explorer
In the left-hand navigation bar of your Panther Console, click Investigate > Data Explorer.
Preview table data
You can preview example table data without writing SQL. To generate a sample SQL query for that log source, click the eye icon next to the table type:
Filter Data Explorer results
You can filter columns from a Result set in Data Explorer without writing SQL.
In the upper right corner of the Results table, click Filter Columns to select the columns you would like to display in the Results:
Note: The filters applied through this mechanism do not apply to the SQL select statement in your query.
Summarize column data
You can generate a summary (frequency count) of a column from a results set in Data Explorer without writing SQL.
On the column that you want to generate a summary for, click the down arrow and then Summarize to display summary results in a separate tab.
You can also generate a summary for the first time after a query is executed by switching to the Summarize tab and selecting a column from the dropdown.
The summary results for a selected column are displayed in the Summary tab, with the option to sort results by highest count or lowest count first (default is the highest count first).
In addition to the row_count value, the summary also displays first_seen and last_seen values if the result data contains the p_event_time standard field.
How to use Data Explorer macros
All the tables in our supported backend databases (Athena and Snowflake) are partitioned by event time to allow for better performance when querying using a time filter.
For efficiency and ease of use, Panther offers macros that will be expanded into full expressions when sent to the database.
These macros are different from template macros. Learn more about template macros on Templated Searches.
Macro formatting
Time duration format
Some macros take a time duration as a parameter. The format for this duration is a positive integer number followed by an optional suffix to indicate the unit of time. If no suffix is provided the number is interpreted as seconds.
Supported suffixes are listed below:
s, sec, second, seconds: macro adds the specified number of seconds to the offset
m, min, minute, minutes: macro adds the specified number of minutes to the offset
h, hr, hrs, hour, hours: macro adds the specified number of hours to the offset
d, day, days: macro adds the specified number of days to the offset
w, wk, week, weeks: macro adds the specified number of weeks to the offset
If no suffix is detected, the default is seconds.
Examples:
'6 d' - 6 days
'2 weeks' - 2 weeks
900 - 900 seconds
'96 hrs' - 96 hours
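To show how these durations appear inside a query, here is a small sketch using the p_occurs_since macro described later on this page. The schema name public is an assumption for illustration.

```sql
-- A duration with a unit suffix: the past 96 hours.
SELECT count(*) AS row_count
FROM panther_logs.public.aws_cloudtrail
WHERE p_occurs_since('96 hrs');

-- A duration with no suffix is read as seconds: the past 900 seconds.
SELECT count(*) AS row_count
FROM panther_logs.public.aws_cloudtrail
WHERE p_occurs_since(900);
```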
Timestamp format
Ensure your time expressions can be parsed by the database backend your team is using. Some timestamps that work in Snowflake (for example, 2021-01-21T11:15:54.346Z) are not accepted as valid timestamps by Athena. A safe default time format looks like 2021-01-02 15:04:05.000 and is assumed to be in the UTC time zone.
Data Explorer macros
Current time: p_current_timestamp
p_current_timestamp()
This macro expands to current_timestamp in Data Explorer, but similar to p_occurs_since, when run in a scheduled query it expands to the scheduled time of the query (regardless of when the query is actually executed).
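A minimal sketch of the macro in use follows. It assumes the macro can appear wherever current_timestamp would be valid; the schema name public and the selected columns are also assumptions for illustration.

```sql
-- Compute how long before the reference time each event occurred.
-- In Data Explorer the reference is the current time; in a scheduled
-- query it is the scheduled run time.
SELECT
  p_event_time,
  datediff('minute', p_event_time, p_current_timestamp()) AS minutes_before_reference
FROM panther_logs.public.aws_cloudtrail
WHERE p_occurs_since('1 hour')
LIMIT 10;
```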
Time range filter: p_occurs_between
p_occurs_between(startTime, endTime [, tableAlias [, column]])
startTime - a time in timestamp format, indicating the start of the search window
endTime - a time in timestamp format, indicating the end of the search window
tableAlias - an optional identifier that will be used as the table alias if provided
column - an optional identifier that will be used as the column if provided
If not present, the default column is p_event_time.
Indicating a different column (such as p_parsed_time) with column can lead to significantly longer query times, as without a restriction on p_event_time, the entire table is searched.
Note: Please ensure that your time expression can be parsed by the database backend your instance is using. For more information see Timestamp format.
The macro p_occurs_between() takes a start time and end time in timestamp format and filters the result set to those events in the time range, using the correct partition (minimizing I/O and speeding up the query).
To be used properly, this macro should occur within a filter, such as a WHERE clause.
The following Snowflake command contains a macro:
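A minimal sketch of such a command is shown here. The schema name public, the aggregation, and the timestamps are assumptions chosen for illustration; the timestamps use the format recommended under Timestamp format.

```sql
-- Count CloudTrail events by name inside a fixed three-day window.
SELECT eventName, count(*) AS row_count
FROM panther_logs.public.aws_cloudtrail
WHERE p_occurs_between('2021-01-21 11:15:54.346', '2021-01-24 11:15:54.346')
GROUP BY eventName
ORDER BY row_count DESC;
```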
The macro is automatically expanded before the query is sent to the database. The form the expansion takes is database-specific. In Snowflake, the expansion is straightforward.
Keep in mind that different database backends allow different date formats and operations. Athena does not allow simple arithmetic operations on dates, so care must be taken to use an Athena-friendly time format.
Because of the structure of allowed indexes on partitions in Athena, the expansion looks different.
The macro also takes an optional table alias. This can be helpful when referring to multiple tables, such as with a JOIN:
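As a sketch of the table alias parameter, the query below applies the same time window to each side of a join. The schema name public, the second table aws_alb, and the join columns are assumptions chosen for illustration and may not match your schema.

```sql
-- Each macro call names the alias whose p_event_time it should filter.
SELECT
  ct.eventName,
  ct.sourceIPAddress,
  al.p_event_time AS alb_event_time
FROM panther_logs.public.aws_cloudtrail ct
JOIN panther_logs.public.aws_alb al          -- hypothetical second table
  ON ct.sourceIPAddress = al.clientIp        -- hypothetical join columns
WHERE p_occurs_between('2021-01-21 11:00:00.000', '2021-01-21 12:00:00.000', ct)
  AND p_occurs_between('2021-01-21 11:00:00.000', '2021-01-21 12:00:00.000', al)
LIMIT 100;
```

Filtering both aliases keeps the time restriction on every table that takes part in the join.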
Time offset from present: p_occurs_since
p_occurs_since(offsetFromPresent [, tableAlias[, column]])
offsetFromPresent - an expression in time duration format, interpreted relative to the present, for example '1 hour'
tableAlias - an optional identifier that will be used as the table alias if provided
column - an optional identifier that will be used as the column if provided
If not present, the default column is p_event_time.
Indicating a different column (such as p_parsed_time) with column can lead to significantly longer query times, as without a restriction on p_event_time, the entire table is searched.
The macro p_occurs_since() takes an offset parameter specified in time duration format and filters the result set down to those events between the current time and the specified offset, using the correct partition or cluster key (minimizing I/O and speeding up the query).
The macro also takes an optional table alias, which can be helpful when referring to multiple tables, such as with a JOIN.
To be used properly, this macro should occur within a filter, such as a WHERE clause.
Examples:
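Two minimal sketches follow, one without and one with a table alias. The schema name public and the selected columns are assumptions for illustration.

```sql
-- Events from the past hour, filtered on the default p_event_time column.
SELECT p_event_time, eventName
FROM panther_logs.public.aws_cloudtrail
WHERE p_occurs_since('1 hour')
LIMIT 100;

-- The same filter with a table alias passed as the second parameter.
SELECT ct.p_event_time, ct.eventName
FROM panther_logs.public.aws_cloudtrail ct
WHERE p_occurs_since('1 hour', ct)
LIMIT 100;
```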
If this is used in a Scheduled Search, then rather than using the current time as the reference, the scheduled run time will be used. For example, if a query is scheduled to run at the start of each hour, then the p_occurs_since('1 hour') macro will expand using a time range of 1 hour starting at the start of each hour (regardless of when the query is actually executed).
In the following example of a macro with a table alias parameter, we look at CloudTrail logs to identify S3 buckets created and deleted within one hour of their creation, a potentially suspicious behavior. To get this information we do a self-join on the aws_cloudtrail table in panther_logs, and we use a macro expansion to limit this search to the past 24 hours on each of the two elements of the self-join (aliased ct1 and ct2 below):
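A sketch of such a self-join is shown here. The schema name public, the JSON path used for the bucket name, and the output column names are assumptions for illustration; adjust them to match the CloudTrail fields in your data.

```sql
SELECT
  ct1.requestParameters:"bucketName"::varchar AS bucket_name,
  ct1.p_event_time AS created_at,
  ct2.p_event_time AS deleted_at,
  datediff('second', ct1.p_event_time, ct2.p_event_time) AS seconds_extant
FROM panther_logs.public.aws_cloudtrail ct1
JOIN panther_logs.public.aws_cloudtrail ct2
  -- pair each creation with deletions of the same bucket
  ON ct1.requestParameters:"bucketName"::varchar = ct2.requestParameters:"bucketName"::varchar
WHERE ct1.eventName = 'CreateBucket'
  AND ct2.eventName = 'DeleteBucket'
  -- restrict both sides of the self-join to the past 24 hours
  AND p_occurs_since('1 day', ct1)
  AND p_occurs_since('1 day', ct2)
  -- keep only deletions within one hour after the creation
  AND ct2.p_event_time BETWEEN ct1.p_event_time
                           AND dateadd('hour', 1, ct1.p_event_time)
ORDER BY created_at;
```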
There are two separate calls to p_occurs_since, each applied to a different table, as indicated by the table alias passed as the second parameter. Each call is expanded before the query is sent to Snowflake.
Filter around a certain time: p_occurs_around
p_occurs_around(timestamp, timeOffset [, tableAlias[, column]])
timestamp - a time in timestamp format, indicating the time to search around
timeOffset - an expression in time duration format, indicating the amount of time to search around the timestamp, for example '1 hour'
tableAlias - an optional identifier that will be used as the table alias if provided
column - an optional identifier that will be used as the column if provided
If not present, the default column is p_event_time.
Indicating a different column (such as p_parsed_time) with column can lead to significantly longer query times, as without a restriction on p_event_time, the entire table is searched.
Note: Please ensure that your time expression can be parsed by the database backend your instance is using. For more information see Timestamp format.
The p_occurs_around() macro allows you to filter for events that occur around a given time. It takes a timestamp in timestamp format indicating the time to search around and an offset in time duration format specifying the interval to search. The search range is from timestamp - timeOffset to timestamp + timeOffset.
For example, the macro p_occurs_around('2022-01-01 10:00:00.000', '10 m') filters for events that occurred between 09:50 am and 10:10 am UTC on January 1, 2022.
The macro also takes an optional table alias, which can be helpful when referring to multiple tables, such as with a JOIN.
To be used properly, this macro should occur within a filter, such as the WHERE clause of a SQL statement.
Examples:
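A minimal sketch using the window described above follows. The schema name public and the selected columns are assumptions for illustration.

```sql
-- Events within 10 minutes on either side of 10:00 UTC on January 1, 2022.
SELECT p_event_time, eventName, sourceIPAddress
FROM panther_logs.public.aws_cloudtrail
WHERE p_occurs_around('2022-01-01 10:00:00.000', '10 m')
ORDER BY p_event_time
LIMIT 100;
```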
How to manage Saved Searches in Data Explorer
Saving your commonly run searches in Data Explorer means you won't need to rewrite them again and again.
Note that the instructions to delete a Saved Search are outlined on Saved and Scheduled Queries.
Save a search in Data Explorer
Below are instructions for how to save a search you've written in Data Explorer. You can also create a Saved Search from Search.
In the left-hand navigation bar of your Panther Console, click Investigate > Data Explorer.
In the SQL editor, write a search using SQL.
You can create a Templated Search by including variables in your SQL expression. Learn more on Templated Searches.
Below the SQL editor, click Save As.
In the Save Search modal that pops up, fill in the form:
Search Name: Add a descriptive name.
Tags: Add tags to help you group similar searches together.
Description: Describe the purpose of the search.
Is this a Scheduled Search?: If you want this Saved Search to run on a schedule (making it a Scheduled Search), switch the toggle to ON.
When you switch this toggle to ON, the options described below will appear.
Is it active?: If you want this Scheduled Search to start running on your selected schedule, switch the toggle to ON.
If you've toggled Is this a Scheduled Search? to ON, configure one of the following interval options: Period or Cron Expression.
Period (select if your query should run on fixed time intervals):
Period(days) and Period(min): Enter the number of days and/or minutes after which the SQL query should run again. For example, setting a period of 0 days and 30 minutes means the query will run every 30 minutes, every day.
Timeout(min): Enter the timeout period in minutes, with a maximum allowed value of 10 minutes. If your query does not complete inside the allowed time window, Panther will retry 3 times before automatically canceling it.
Cron Expression (select if your query should run repeatedly at specific dates, and learn more about how to create a cron expression in How to use the Scheduled Search crontab):
Minutes and Hours: Enter the time of day for the query to run.
Day and Month (day of month): If you wish to have this query run on a specific day and month, enter the day and month.
Day (day of week): If you wish to have this query run on a specific day of the week, enter the day.
Click Save Search.
If you've created a Scheduled Search (by toggling Is this a Scheduled Search? to ON), you can now follow the instructions to create a Scheduled Rule if you'd like the data returned by your search to be passed through a detection, alerting on matches.
Update a Saved Search in Data Explorer
In your Panther Console, navigate to Investigate > Data Explorer in the left sidebar.
In the upper right corner, click Open Saved Search.
An Open a Search modal will pop up.
In the modal, select the Saved Search you'd like to update, and click Open Search.
The Saved Search will populate in the Data Explorer SQL editor.
Make desired changes to the SQL command.
An Update Search modal will pop up.
Make desired changes to the Saved Search's metadata, including the Search Name, Tags, Description, Default Database, and Is this a Scheduled Search? (and related fields).
Click Update Search to save your changes.