Data Explorer

Use Panther's Data Explorer to view normalized data and perform SQL queries

Overview

The Data Explorer in your Panther Console is where you can view your normalized Panther data and perform SQL queries (with autocompletion).
In Data Explorer, you can:
  • Browse collected log data, rule matches, and search standard fields across all data
  • Save, tag, and load your queries
  • Create scheduled queries to run through the rule engine
  • Share results with your team through a shareable link
  • Select entire rows of JSON to use in the rule engine as unit tests
  • Download results in a CSV
  • Preview table data, filter results, and summarize columns without SQL

How to use Data Explorer

To access the Data Explorer:

  1. Log in to your Panther Console.
  2. In the left sidebar, click Investigate > Data Explorer.
You can limit access to the Data Explorer through Role-Based Access Control (RBAC).

Searching effectively in Data Explorer

We recommend the following best practices to ensure that your queries are returning results quickly:
  • Always use a LIMIT clause
    • The LIMIT clause is used to specify the number of records your query will return. By limiting your query, you can make it return results faster. Panther limits the size of results to 100MB by default.
  • Use a time range filter on your queries
    • Snowflake groups files in S3 in micropartitions. When you filter by a time range (such as p_event_time or p_occurs_since()) in your query, Snowflake will only need to access specific partitions, which returns results more quickly.
    • For more information on macros, see the section below: How to use Data Explorer macros.
  • Use p_any fields
    • During log ingestion, Panther extracts common security indicators into the p_any fields. The p_any fields are stored in optimized columns. These fields standardize names for attributes across all data sources, enabling fast data correlation.
  • Query specific fields
    • Using SELECT * FROM ... pulls all of the columns, which can make queries run more slowly. When it makes sense, query only the fields you need to investigate. For example, you may want to query SELECT user_name, event_name FROM ...
  • Summarize
    • Summaries are faster to run compared to querying full records. This is especially helpful when investigating logs over a large period of time, or in a situation where you are unsure how much data exists over the time you are investigating.
    • Instead of querying the full data set, you can use count(*) and group by a time range, which will run more quickly and help you determine a more narrow timeframe to subsequently query.
    • For example, if you look back over a day and GROUP BY hour, you might determine which specific hour you need to investigate in your data. You can then run a query against that hour to narrow your results further.
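Putting these practices together, a summary query might look like the following sketch. It reuses the aws_cloudtrail table that appears in the macro examples later on this page; the table and field names are illustrative, so adjust them to your own schema:
select date_trunc('hour', p_event_time) as event_hour, count(*) as row_count
from panther_logs.public.aws_cloudtrail
where p_occurs_since('1 day')  -- time range filter hits the right partitions
group by event_hour            -- summarize instead of pulling full rows
order by row_count desc
limit 1000                     -- always cap the result set
Once this identifies the busiest (or quietest) hour, you can run a narrower query against that hour with only the specific fields you need.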
If the query is still running slowly after following the best practices above, we recommend the following steps:
  • Count the rows to see how much data you are querying.
    • This will help you determine whether it's a large amount of data and expected that it's taking longer.
  • Reduce the time range you are querying.
  • Reach out to your Panther CS representative or Panther Support for additional help.

Preview Table Data

You can preview example table data without writing SQL. To generate a sample SQL query for that log source, click the eye icon next to the table type:

Filter Data Explorer results

You can filter columns from a Result set in Data Explorer without writing SQL.
In the upper right corner of the Results table, click Filter Columns to select the columns you would like to display in the Results:
Note: The filters applied through this mechanism do not apply to the SQL select statement in your query.

Summarize column data

You can generate a summary (frequency count) of a column from a Result set in Data Explorer without writing SQL.
On the column you want to generate a summary for, click the down arrow and then Summarize to display summary results in a separate tab.
You can also generate a summary for the first time after a query is executed by switching to the Summarize tab and selecting a column from the dropdown.
The summary results for a selected column are displayed in the Summary tab, with the option to sort results by highest count or lowest count first (default is the highest count first).
In addition to the row_count value, the summary also displays first_seen and last_seen values if the result data contains the p_event_time standard field.

How to use Data Explorer macros

All the tables in our supported backend databases (Athena and Snowflake) are partitioned by event time to allow for better performance when querying using a time filter.
For efficiency and ease of use, Panther offers macros that will be expanded into full expressions when sent to the database:
  • Time range filter: p_occurs_between
  • Time offset from present: p_occurs_since
  • Filter around a certain time: p_occurs_around

Macro formatting

Time duration format

Some macros take a time duration as a parameter. The format for this duration is a positive integer number followed by an optional suffix to indicate the unit of time. If no suffix is provided the number is interpreted as seconds.
Supported suffixes are listed below:
  • s, sec, second, seconds — seconds
  • m, min, minute, minutes — minutes
  • h, hr, hrs, hour, hours — hours
  • d, day, days — days
  • w, wk, week, weeks — weeks
  • if no suffix is detected, the default is seconds
Examples:
  • '6 d' - 6 days
  • '2 weeks' - 2 weeks
  • 900 - 900 seconds
  • '96 hrs' - 96 hours

Timestamp format

Please ensure that your time expression can be parsed by the database backend your team is using. Some timestamps that work in Snowflake (e.g., 2021-01-21T11:15:54.346Z) will not be accepted as valid timestamps by Athena. A safe default format looks like 2021-01-02 15:04:05.000 and is assumed to be in the UTC time zone.

Data Explorer macros

Time range filter: p_occurs_between

p_occurs_between(startTime, endTime [, tableAlias])
  • startTime - a time in timestamp format, indicating start of search window
  • endTime - a time in timestamp format, indicating the end of the search window
  • tableAlias - an optional identifier that will be used as the table alias if provided
Note: Please ensure that your time expression can be parsed by the database backend your instance is using. For more information see Timestamp format.
The macro p_occurs_between() takes a start time and end time in timestamp format and filters the result set to those events in the time range, using the correct partition (minimizing I/O and speeding up the query).
The macro also takes an optional table alias which can be helpful when referring to multiple tables, such as a JOIN.
To be used properly this macro should occur within a filter, such as a WHERE clause.
The following Snowflake command contains a macro:
select p_db_name, count(*) as freq from panther_views.public.all_databases
where p_occurs_between(current_date - 1, current_timestamp)
group by p_db_name
limit 1000
The macro will be automatically expanded before the query is sent to the database. The form the expansion takes is database specific. In Snowflake, this expansion is straightforward:
select p_db_name, count(*) as freq from panther_views.public.all_databases
where p_event_time between convert_timezone('UTC',current_date - 1)::timestamp_ntz
and convert_timezone('UTC',current_timestamp)::timestamp_ntz
group by p_db_name
limit 1000
Keep in mind that different database back-ends allow different date formats and operations. Athena does not allow simple arithmetic operations on dates, so care must be taken to use an Athena-friendly time format:
select p_db_name, count(*) as freq from panther_views.public.all_databases
where p_occurs_between(current_date - interval '1' day, current_timestamp)
group by p_db_name
limit 1000
Because of the structure of allowed indexes on partitions in Athena, the expansion looks different:
select p_db_name, count(*) as freq from panther_views.all_databases
where p_event_time between cast (current_date - interval '1' day as timestamp) and cast (current_timestamp as timestamp)
and partition_time between to_unixtime(date_trunc('HOUR', (cast (current_date - interval '1' day as timestamp))))
and to_unixtime(cast (current_timestamp as timestamp))
group by p_db_name
limit 1000
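For a concrete investigation window, you can also pass literal timestamps rather than expressions. The following sketch reuses the all_databases view from the examples above; the timestamps are illustrative, and should follow the safe format described in Timestamp format:
select p_db_name, count(*) as freq from panther_views.public.all_databases
where p_occurs_between('2022-01-01 00:00:00.000', '2022-01-02 00:00:00.000')
group by p_db_name
limit 1000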

Time offset from present: p_occurs_since

p_occurs_since(offsetFromPresent [, tableAlias])
  • offsetFromPresent - an expression in time duration format, interpreted relative to the present, for example '1 hour'
  • tableAlias - an optional identifier that will be used as the table alias if provided
The macro p_occurs_since() takes an offset parameter specified in time duration format and filters the result set down to those events between the current time and the specified offset, using the correct partition or cluster key (minimizing I/O and speeding up the query).
The macro also takes an optional table alias which can be helpful when referring to multiple tables, such as a JOIN.
To be used properly this macro should occur within a filter, such as a WHERE clause.

Examples:

p_occurs_since('6 d')
p_occurs_since('2 weeks')
p_occurs_since(900) // assumes seconds
p_occurs_since('96 hrs')
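Used in a full query, the macro sits in the WHERE clause like any other filter. A sketch, reusing the all_databases view from the p_occurs_between examples above:
select p_db_name, count(*) as freq from panther_views.public.all_databases
where p_occurs_since('2 weeks')
group by p_db_name
limit 1000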
Note: If this is used in a scheduled query, then rather than using the current time as the reference, the scheduled run time will be used. For example, if a query is scheduled to run at the start of each hour, then the p_occurs_since('1 hour') macro will expand using a time range of 1 hour starting at the start of each hour (regardless of when the query is actually executed).
In the following example of a macro with a table alias parameter, we look at CloudTrail logs to identify S3 buckets created and deleted within one hour of their creation, a potentially suspicious behavior. To get this information we do a self-join on the aws_cloudtrail table in panther_logs, and we use a macro expansion to limit this search to the past 24 hours on each of the two elements of the self-join (aliased ct1 and ct2 below):
select
ct1.p_event_time createTime, ct2.p_event_time deleteTime,
timediff('s',createTime, deleteTime) timeExtant,
ct1.requestparameters:"bucketName"::varchar createdBucket,
ct1.useridentity:"arn"::varchar createArn, deleteArn,
ct1.useragent createUserAgent, deleteUserAgent
from panther_logs.public.aws_cloudtrail ct1
join (
select p_event_time, requestparameters:"bucketName"::varchar deletedBucket, errorcode,
eventname deleteEvent, useridentity:"arn"::varchar deleteArn, useragent deleteUserAgent from panther_logs.public.aws_cloudtrail) ct2
on (ct1.requestparameters:"bucketName"::varchar = ct2.deletedBucket
and ct2.p_event_time > ct1.p_event_time
and timediff('s',ct1.p_event_time, ct2.p_event_time) < 3600)
where ct2.deleteEvent = 'DeleteBucket'
and ct1.eventName = 'CreateBucket'
and ct1.errorCode is null and ct2.errorcode is null
and p_occurs_since('1 day',ct2) -- apply to ct2
and p_occurs_since('24 hours',ct1) -- apply to ct1
order by createdBucket, createTime;
There are two separate calls to p_occurs_since each applied to a different table, as indicated by the table alias used as a second parameter. This is expanded into the following Snowflake query:
select
ct1.p_event_time createTime, ct2.p_event_time deleteTime,
timediff('s',createTime, deleteTime) timeExtant,
ct1.requestparameters:"bucketName"::varchar createdBucket,
ct1.useridentity:"arn"::varchar createArn, deleteArn,
ct1.useragent createUserAgent, deleteUserAgent
from panther_logs.public.aws_cloudtrail ct1
join (
select p_event_time, requestparameters:"bucketName"::varchar deletedBucket, errorcode,
eventname deleteEvent, useridentity:"arn"::varchar deleteArn, useragent deleteUserAgent from panther_logs.public.aws_cloudtrail) ct2
on (ct1.requestparameters:"bucketName"::varchar = ct2.deletedBucket
and ct2.p_event_time > ct1.p_event_time
and timediff('s',ct1.p_event_time, ct2.p_event_time) < 3600)
where ct2.deleteEvent = 'DeleteBucket'
and ct1.eventName = 'CreateBucket'
and ct1.errorCode is null and ct2.errorcode is null
and ct2.p_event_time >= current_timestamp - interval '86400 second'
and ct1.p_event_time >= current_timestamp - interval '86400 second'
order by createdBucket, createTime;

Filter around a certain time: p_occurs_around

p_occurs_around(timestamp, timeOffset [, tableAlias])
  • timestamp - a time in timestamp format, indicating the time to search around
  • timeOffset - an expression in time duration format, indicating the amount of time to search around the timestamp, for example '1 hour'
  • tableAlias - an optional identifier that will be used as the table alias if provided
Note: Please ensure that your time expression can be parsed by the database backend your instance is using. For more information see Timestamp format.
The p_occurs_around() macro allows you to filter for events that occur around a given time. It takes a timestamp in timestamp format indicating the time to search around and an offset in time duration format specifying the interval to search. The search range is from timestamp - timeOffset to timestamp + timeOffset.
For example, the macro p_occurs_around('2022-01-01 10:00:00.000', '10 m') filters for events that occurred between 09:50 am and 10:10 am UTC on January 1, 2022.
The macro also takes an optional table alias which can be helpful when referring to multiple tables, such as a JOIN.
To be used properly this macro should occur within a filter, such as the WHERE clause of a SQL statement.

Examples:

p_occurs_around('2022-01-01 10:00:00.000', '6 d')
p_occurs_around('2022-01-01 10:00:00.000', '2 weeks')
p_occurs_around('2022-01-01 10:00:00.000', 900) // assumes seconds
p_occurs_around('2022-01-01 10:00:00.000', '96 hrs')
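As with the other macros, p_occurs_around() belongs in a WHERE clause. The following sketch looks at CloudTrail activity within ten minutes of a suspicious time; it reuses the aws_cloudtrail table and fields from the p_occurs_since example above, and the timestamp is illustrative:
select p_event_time, eventName, useridentity:"arn"::varchar as actor
from panther_logs.public.aws_cloudtrail
where p_occurs_around('2022-01-01 10:00:00.000', '10 m')
order by p_event_time
limit 1000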