Data Explorer

Use Panther's Data Explorer to view normalized data and perform SQL queries

Overview

The Data Explorer in your Panther Console is where you can view your normalized Panther data and perform SQL queries (with autocompletion).
In Data Explorer, you can:
  • Browse collected log data, rule matches, and search standard fields across all data
  • Share results with your team through a shareable link
  • Select entire rows of JSON to use in the rule engine as unit tests
  • Download results as a CSV file
  • Preview table data, filter results, and summarize columns without SQL
Data Explorer results that include bigint values exceeding 32-bit precision are displayed rounded, due to browser limitations in rendering JSON. If you'd like these values to be represented without precision loss, cast them to strings in your SQL command. The actual data stored in the data lake is not affected.
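For example, a minimal sketch of such a cast (the table and column names here are hypothetical; p_occurs_since is a Panther macro described below):
select event_name,
  large_counter::string as large_counter -- cast bigint to string (Snowflake syntax) to avoid rounding in the browser
from panther_logs.public.my_log_source
where p_occurs_since('1 day')
limit 100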

How to use Data Explorer

To access the Data Explorer:

  1. Log in to your Panther Console.
  2. In the left sidebar, click Investigate > Data Explorer.
The Data Explorer page has a column on the left labeled Tables. In the middle of the page, there is a text editor labeled New Query. The bottom of the page has tabs labeled Results and Summarize.
You can limit access to the Data Explorer through Role-Based Access Control (RBAC).

Searching effectively in Data Explorer

We recommend the following best practices to ensure that your queries return results quickly:
  • Always use a LIMIT clause
    • The LIMIT clause is used to specify the number of records your query will return. By limiting your query, you can make it return results faster. Panther limits the size of results to 100MB by default.
  • Use a time range filter on your queries
    • Snowflake groups S3 files into micro-partitions. When you filter by a time range in your query (for example, on p_event_time, or using p_occurs_since()), Snowflake only needs to access specific partitions, which returns results more quickly.
    • For more information on macros, see the section below: How to use Data Explorer macros.
  • Use p_any fields
    • During log ingestion, Panther extracts common security indicators into the p_any fields. The p_any fields are stored in optimized columns. These fields standardize names for attributes across all data sources, enabling fast data correlation.
  • Query specific fields
    • Using SELECT * FROM ... pulls all of the columns, which can make queries run more slowly. When it makes sense, query only the fields you need to investigate. For example, you may want to query SELECT user_name, event_name FROM ...
  • Summarize
    • Summaries are faster to run compared to querying full records. This is especially helpful when investigating logs over a large period of time, or in a situation where you are unsure how much data exists over the time you are investigating.
    • Instead of querying the full data set, you can use count(*) and group by a time range, which will run more quickly and help you determine a narrower timeframe to query next (see the sketch after this list).
    • For example, if you look back over a day and GROUP BY hour, you might determine which specific hour you need to investigate in your data. You can then run a query against that hour to narrow your results further.
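Combining these practices, a minimal sketch of a summary query against the aws_cloudtrail table (assuming CloudTrail data is ingested; p_occurs_since is a Panther macro described below):
-- hourly event counts over the past day: a time-range macro, count(*) with
-- group by, and a LIMIT keep the query fast
select date_trunc('hour', p_event_time) as event_hour, count(*) as row_count
from panther_logs.public.aws_cloudtrail
where p_occurs_since('1 day')
group by event_hour
limit 1000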
If the query is still running slowly after following the best practices above, we recommend the following steps:
  • Count the rows to see how much data you are querying.
    • This will help you determine whether you're querying a large amount of data, in which case a longer run time is expected.
  • Reduce the time range you are querying.
  • Reach out to your Panther CS representative or Panther Support for additional help.

Preview Table Data

You can preview example table data without writing SQL. To generate a sample SQL query for a log source, click the eye icon next to its table:
In Data Explorer's Tables column, an eye icon appears next to each log source's table.

Filter Data Explorer results

You can filter columns from a Result set in Data Explorer without writing SQL.
In the upper right corner of the Results table, click Filter Columns to select the columns you would like to display in the Results:
The Results tab shows the number of results, the query time, and the amount of data scanned, along with a Filter Columns dropdown and a Download CSV button.
Note: The filters applied through this mechanism do not apply to the SQL select statement in your query.

Summarize column data

You can generate a summary (frequency count) of a column from a Result set in Data Explorer without writing SQL.
On the column you want to generate a summary for, click the down arrow and then Summarize to display summary results in a separate tab.
The query results are shown in table format, with fields including p_timeline, type, timestamp, elb, clientIp, clientPort, targetIp, targetPort, requestProcessingTime, and targetProcessingTime.
You can also generate a summary for the first time after a query is executed by switching to the Summarize tab and selecting a column from the dropdown.
In the Summarize tab, the type-ahead dropdown allows selecting a column to summarize
The summary results for a selected column are displayed in the Summary tab, with the option to sort results by highest count or lowest count first (default is the highest count first).
row_count represents the frequency of each unique clientIp in the result set
In addition to the row_count value, the summary also displays first_seen and last_seen values if the result data contains the p_event_time standard field.

How to use Data Explorer macros

All the tables in our supported backend databases (Athena and Snowflake) are partitioned by event time to allow for better performance when querying using a time filter.
For efficiency and ease of use, Panther offers macros that will be expanded into full expressions when sent to the database:
  • Time range filter: p_occurs_between
  • Time offset from present: p_occurs_since
  • Filter around a certain time: p_occurs_around

Macro formatting

Time duration format

Some macros take a time duration as a parameter. The format for this duration is a positive integer followed by an optional suffix indicating the unit of time. If no suffix is provided, the number is interpreted as seconds.
Supported suffixes are listed below:
  • s, sec, second, seconds
  • m, min, minute, minutes
  • h, hr, hrs, hour, hours
  • d, day, days
  • w, wk, week, weeks
Examples:
  • '6 d' - 6 days
  • '2 weeks' - 2 weeks
  • 900 - 900 seconds
  • '96 hrs' - 96 hours

Timestamp format

Please ensure that your time expression can be parsed by the database backend your team is using. Some timestamps that work in Snowflake (e.g., 2021-01-21T11:15:54.346Z) will not be accepted as valid timestamps by Athena. A safe default format looks like 2021-01-02 15:04:05.000 and is assumed to be in the UTC time zone.
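For example, a query using literal timestamps in this safe format with the p_occurs_between macro (described below; the dates are illustrative):
select p_db_name, count(*) as freq from panther_views.public.all_databases
where p_occurs_between('2021-01-02 15:04:05.000', '2021-01-03 15:04:05.000')
group by p_db_name
limit 1000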

Data Explorer macros

Time range filter: p_occurs_between

p_occurs_between(startTime, endTime [, tableAlias])
  • startTime - a time in timestamp format, indicating the start of the search window
  • endTime - a time in timestamp format, indicating the end of the search window
  • tableAlias - an optional identifier that will be used as the table alias if provided
Note: Please ensure that your time expression can be parsed by the database backend your instance is using. For more information see Timestamp format.
The macro p_occurs_between() takes a start time and end time in timestamp format and filters the result set to those events in the time range, using the correct partition (minimizing I/O and speeding up the query).
The macro also takes an optional table alias which can be helpful when referring to multiple tables, such as a JOIN.
To be used properly, this macro should occur within a filter, such as a WHERE clause.
The following Snowflake command contains a macro:
select p_db_name, count(*) as freq from panther_views.public.all_databases
where p_occurs_between(current_date - 1, current_timestamp)
group by p_db_name
limit 1000
The macro will be expanded automatically before the query is sent to the database. The form the expansion takes is database specific. In Snowflake, the expansion is straightforward:
select p_db_name, count(*) as freq from panther_views.public.all_databases
where p_event_time between convert_timezone('UTC',current_date - 1)::timestamp_ntz
and convert_timezone('UTC',current_timestamp)::timestamp_ntz
group by p_db_name
limit 1000
Keep in mind that different database backends allow different date formats and operations. Athena does not allow simple arithmetic operations on dates, so care must be taken to use an Athena-friendly time format:
select p_db_name, count(*) as freq from panther_views.public.all_databases
where p_occurs_between(current_date - interval '1' day, current_timestamp)
group by p_db_name
limit 1000
Because of the structure of allowed indexes on partitions in Athena, the expansion looks different:
select p_db_name, count(*) as freq from panther_views.all_databases
where p_event_time between cast (current_date - interval '1' day as timestamp) and cast (current_timestamp as timestamp)
and partition_time between to_unixtime(date_trunc('HOUR', (cast (current_date - interval '1' day as timestamp))))
and to_unixtime(cast (current_timestamp as timestamp))
group by p_db_name
limit 1000

Time offset from present: p_occurs_since

p_occurs_since(offsetFromPresent [, tableAlias])
  • offsetFromPresent - an expression in time duration format, interpreted relative to the present, for example '1 hour'
  • tableAlias - an optional identifier that will be used as the table alias if provided
The macro p_occurs_since() takes an offset parameter specified in time duration format and filters the result set down to those events between the current time and the specified offset, using the correct partition or cluster key (minimizing I/O and speeding up the query).
The macro also takes an optional table alias which can be helpful when referring to multiple tables, such as a JOIN.
To be used properly, this macro should occur within a filter, such as a WHERE clause.

Examples:

p_occurs_since('6 d')
p_occurs_since('2 weeks')
p_occurs_since(900) -- assumes seconds
p_occurs_since('96 hrs')
Note: If this is used in a scheduled query, then rather than using the current time as the reference, the scheduled run time will be used. For example, if a query is scheduled to run at the start of each hour, then the p_occurs_since('1 hour') macro will expand using a time range of 1 hour starting at the start of each hour (regardless of when the query is actually executed).
In the following example of a macro with a table alias parameter, we look at CloudTrail logs to identify S3 buckets created and deleted within one hour of their creation, a potentially suspicious behavior. To get this information we do a self-join on the aws_cloudtrail table in panther_logs, and we use a macro expansion to limit this search to the past 24 hours on each of the two elements of the self-join (aliased ct1 and ct2 below):
select
ct1.p_event_time createTime, ct2.p_event_time deleteTime,
timediff('s',createTime, deleteTime) timeExtant,
ct1.requestparameters:"bucketName"::varchar createdBucket,
ct1.useridentity:"arn"::varchar createArn, deleteArn,
ct1.useragent createUserAgent, deleteUserAgent
from panther_logs.public.aws_cloudtrail ct1
join (
select p_event_time, requestparameters:"bucketName"::varchar deletedBucket, errorcode,
eventname deleteEvent, useridentity:"arn"::varchar deleteArn, useragent deleteUserAgent from panther_logs.public.aws_cloudtrail) ct2
on (ct1.requestparameters:"bucketName"::varchar = ct2.deletedBucket
and ct2.p_event_time > ct1.p_event_time
and timediff('s',ct1.p_event_time, ct2.p_event_time) < 3600)
where ct2.deleteEvent = 'DeleteBucket'
and ct1.eventName = 'CreateBucket'
and ct1.errorCode is null and ct2.errorcode is null
and p_occurs_since('1 day',ct2) -- apply to ct2
and p_occurs_since('24 hours',ct1) -- apply to ct1
order by createdBucket, createTime;
There are two separate calls to p_occurs_since, each applied to a different table as indicated by the table alias passed as the second parameter. This is expanded into the following Snowflake query:
select
ct1.p_event_time createTime, ct2.p_event_time deleteTime,
timediff('s',createTime, deleteTime) timeExtant,
ct1.requestparameters:"bucketName"::varchar createdBucket,
ct1.useridentity:"arn"::varchar createArn, deleteArn,
ct1.useragent createUserAgent, deleteUserAgent
from panther_logs.public.aws_cloudtrail ct1
join (
select p_event_time, requestparameters:"bucketName"::varchar deletedBucket, errorcode,
eventname deleteEvent, useridentity:"arn"::varchar deleteArn, useragent deleteUserAgent from panther_logs.public.aws_cloudtrail) ct2
on (ct1.requestparameters:"bucketName"::varchar = ct2.deletedBucket
and ct2.p_event_time > ct1.p_event_time
and timediff('s',ct1.p_event_time, ct2.p_event_time) < 3600)
where ct2.deleteEvent = 'DeleteBucket'
and ct1.eventName = 'CreateBucket'
and ct1.errorCode is null and ct2.errorcode is null
and ct2.p_event_time >= current_timestamp - interval '86400 second'
and ct1.p_event_time >= current_timestamp - interval '86400 second'
order by createdBucket, createTime;

Filter around a certain time: p_occurs_around

p_occurs_around(timestamp, timeOffset [, tableAlias])
  • timestamp - a time in timestamp format, indicating the time to search around
  • timeOffset - an expression in time duration format, indicating the amount of time to search around the timestamp, for example '1 hour'
  • tableAlias - an optional identifier that will be used as the table alias if provided
Note: Please ensure that your time expression can be parsed by the database backend your instance is using. For more information see Timestamp format.
The p_occurs_around() macro allows you to filter for events that occur around a given time. It takes a timestamp in timestamp format indicating the time to search around and an offset in time duration format specifying the interval to search. The search range is from timestamp - timeOffset to timestamp + timeOffset.
For example, the macro p_occurs_around('2022-01-01 10:00:00.000', '10 m') filters for events that occurred between 09:50 am and 10:10 am UTC on January 1, 2022.
The macro also takes an optional table alias which can be helpful when referring to multiple tables, such as a JOIN.
To be used properly, this macro should occur within a filter, such as the WHERE clause of a SQL statement.

Examples:

p_occurs_around('2022-01-01 10:00:00.000', '6 d')
p_occurs_around('2022-01-01 10:00:00.000', '2 weeks')
p_occurs_around('2022-01-01 10:00:00.000', 900) -- assumes seconds
p_occurs_around('2022-01-01 10:00:00.000', '96 hrs')
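Putting one of these into a complete query, a minimal sketch against the aws_cloudtrail table (the timestamp is illustrative):
-- events within 10 minutes of the given time
select p_event_time, eventname, useridentity:"arn"::varchar as actorArn
from panther_logs.public.aws_cloudtrail
where p_occurs_around('2022-01-01 10:00:00.000', '10 m')
limit 1000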

How to manage Saved Queries in Data Explorer

Saving your commonly run queries in Data Explorer means you won't need to rewrite them again and again.
Note that the instructions to delete a Saved Query are outlined on Saved and Scheduled Queries.

Save a query in Data Explorer

Below are instructions for how to save a query you've written in Data Explorer. You can also create a Saved Query from Query Builder.
  1. Log in to the Panther Console and click Investigate > Data Explorer in the left sidebar.
  2. In the SQL editor, write a query.
    • If you have enabled the system-wide setting to require a LIMIT clause, make sure your query includes a LIMIT.
      The image shows an example query written in the Scheduled Query text editor in the Panther Console.
  3. Below the SQL editor, click Save As.
  4. In the Save Query modal that pops up, fill in the form:
    • Query Name: Add a descriptive name.
    • Tags: Add tags to help you group similar queries together.
    • Description: Describe the purpose of the query.
    • Is it active?: If you want this Scheduled Query to start running on your selected schedule, switch the toggle to ON.
    • Is this a Scheduled Query?: If you want this Saved Query to run on a schedule (making it a Scheduled Query), switch the toggle to ON.
      • When you switch this toggle to ON, the options described below will appear.
    If you've toggled Is this a scheduled query? to ON, configure one of the following interval options: Period or Cron Expression.
    • Period (select if your query should run on fixed time intervals):
      • Period(days) and Period(min): Enter the number of days and/or minutes after which the SQL query should run again. For example, setting a period of 0 days and 30 minutes means the query will run every 30 minutes.
      • Timeout(min): Enter the timeout period in minutes, with a maximum allowed value of 10 minutes. If your query does not complete inside the allowed time window, Panther will retry 3 times before automatically canceling it.
    • Cron Expression (select if your query should run repeatedly at specific dates or times; learn how to create a cron expression in How to use the Scheduled Query crontab):
      • Minutes and Hours: Enter the time of day for the query to run.
      • Day and Month (day of month): If you wish to have this query run on a specific day and month, enter the day and month.
      • Day (day of week): If you wish to have this query run on a specific day of the week, enter the day.
      • Timeout: Enter the timeout period in minutes, with a maximum allowed value of 10 minutes. If your query does not complete inside the allowed time window, Panther will retry 3 times before automatically canceling it.
        The query creation screen shows fields for Query Name, Tags, Description, and Default Database, with the Is this a Scheduled Query? and Is it active? toggles set to ON and the Period interval option selected.
  5. Click Save Query.
If you've created a Scheduled Query (by toggling Is this a scheduled query? to ON), you can now follow the instructions to create a Scheduled Rule if you'd like the data returned by your query to be passed through a detection, alerting on matches.

Update a Saved Query in Data Explorer

  1. In your Panther Console, navigate to Investigate > Data Explorer in the left sidebar.
  2. In the upper right corner, click Open.
    • An Open a Query modal will pop up.
  3. In the modal, select the Saved Query you'd like to update, and click Open Query.
    • The Saved Query will populate in the Data Explorer SQL editor.
  4. Make desired changes to the SQL command.
  5. Below the editor, click Update.
    The bottom of the SQL editor in Data Explorer is shown. Below the editor are two buttons: Run Query and Update. There is an arrow pointing to Update.
    • An Update Query modal will pop up.
  6. Make desired changes to the Saved Query's metadata, including the Query Name, Tags, Description, Default Database, and Is this a Scheduled Query? (and related fields).
  7. Click Update Query to save your changes.