Data Explorer

Use Panther's Data Explorer to view normalized data and perform SQL queries

Overview

The Data Explorer in your Panther Console is where you can view your normalized Panther data and perform SQL queries (with autocompletion).

In Data Explorer, you can:

Data Explorer results that include bigint data exceeding 32-bit precision will be shown rounded due to browser limitations rendering JSON. If you'd like these values to be represented without precision loss, cast them to strings in the SQL command. Actual data stored in the data lake is not affected.

How to use Data Explorer

To access the Data Explorer:

  1. Log in to your Panther Console.

  2. In the left sidebar, click Investigate > Data Explorer.

You can limit access to the Data Explorer through Role-Based Access Control (RBAC).

Searching effectively in Data Explorer

We recommend the following best practices to ensure that your queries are returning results quickly:

  • Always use a LIMIT clause

    • The LIMIT clause is used to specify the number of records your query will return. By limiting your query, you can make it return results faster. Panther limits the size of results to 100MB by default.

  • Use a time range filter on your queries

    • Snowflake groups files in S3 in micropartitions. When you filter by a time range (such as p_event_time or p_occurs_since()) in your query, Snowflake will only need to access specific partitions, which returns results more quickly.

    • For more information on macros, see the section below: How to use Data Explorer macros.

  • Use p_any fields

    • During log ingestion, Panther extracts common security indicators into the p_any fields. The p_any fields are stored in optimized columns. These fields standardize names for attributes across all data sources, enabling fast data correlation.

  • Query specific fields

    • Using SELECT * FROM ... pulls all of the columns, which can make queries run more slowly. When it makes sense, query only the fields you need to investigate. For example, you may want to query SELECT user_name, event_name FROM ...

  • Summarize

    • Summaries are faster to run compared to querying full records. This is especially helpful when investigating logs over a large period of time, or in a situation where you are unsure how much data exists over the time you are investigating.

    • Instead of querying the full data set, you can use count(*) and group by a time range, which will run more quickly and help you determine a more narrow timeframe to subsequently query.

    • For example, if you look back over a day and GROUP BY hour, you might determine which specific hour you need to investigate in your data. You can then run a query against that hour to narrow your results further.

If the query is still running slowly after following the best practices above, we recommend the following steps:

  • Count the rows to see how much data you are querying.

    • This will help you determine whether it's a large amount of data and expected that it's taking longer.

  • Reduce the time range you are querying.

  • Reach out to your Panther CS representative or Panther Support for additional help.

Preview table data

You can preview example table data without writing SQL. To generate a sample SQL query for that log source, click the eye icon next to the table type:

Filter Data Explorer results

You can filter columns from a Result set in Data Explorer without writing SQL.

In the upper right corner of the Results table, click Filter Columns to select the columns you would like to display in the Results:

Note: The filters applied through this mechanism do not apply to the SQL select statement in your query.

Summarize column data

You can generate a summary (frequency count) of a column from a results set in Data Explorer without writing SQL.

On the column that your want to generate a summary for, click the down arrow and then Summarize to display summary results in a separate tab.

You can also generate a summary for the first time after a query is executed by switching to the Summarize tab and selecting a column from the dropdown.

The summary results for a selected column are displayed in the Summary tab, with the option to sort results by highest count or lowest count first (default is the highest count first).

In addition to the row_count value, the summary also displays first_seen and last_seen values if the result data contains the p_event_time standard field.

How to use Data Explorer macros

All the tables in our supported backend databases (Athena and Snowflake) are partitioned by event time to allow for better performance when querying using a time filter.

For efficiency and ease of use, Panther offers macros that will be expanded into full expressions when sent to the database:

These macros are different from template macros. Learn more about template macros on Templated Searches.

Macro formatting

Time duration format

Some macros take a time duration as a parameter. The format for this duration is a positive integer number followed by an optional suffix to indicate the unit of time. If no suffix is provided the number is interpreted as seconds.

Supported suffixes are list below:

  • s, sec, second, seconds — macro adds specified seconds

  • m, min, minute, minutes — macro adds specified minutes to offset

  • h, hr, hrs, hour, hours — macro adds specified hours to offset

  • d, day, days — macro adds specified days to offset

  • w, wk, week, weeks — macro adds a specified number of weeks to offset

  • if no suffix is detected, the default is seconds

Examples:

  • '6 d' - 6 days

  • '2 weeks' - 2 weeks

  • 900 - 900 seconds

  • '96 hrs' - 96 hours

Timestamp format

Ensure your time expressions can be parsed by the database backend your team is using. Some timestamps that work in Snowflake (i.e. 2021-01-21T11:15:54.346Z) are not accepted as valid timestamps by Athena. The default safe time format should probably look similar to 2021-01-02 15:04:05.000 and is assumed to be in the UTC time zone.

Data Explorer macros

Time range filter: p_occurs_between

p_occurs_between(startTime, endTime, [, tableAlias [, column]])

  • startTime - a time in timestamp format, indicating start of search window

  • endTime - a time in timestamp format, indicating the end of the search window

  • tableAlias - an optional identifier that will be used as the table alias if provided

  • column - an optional identifier that will be used as the column if provided

    • If not present, the default column is p_event_time.

    • Indicating a different column (such as p_parsed_time) with column can lead to significantly longer query times, as without a restriction on p_event_time, the entire table is searched.

Note: Please ensure that your time expression can be parsed by the database backend your instance is using. For more information see Timestamp format.

The macro p_occurs_between() takes a start time and end time in timestamp format and filters the result set to those events in the time range, using the correct partition (minimizing I/O and speeding up the query).

To be used properly this macro should occur within a filter, such as a WHERE clause.

The following Snowflake command contains a macro:

select p_db_name, count(*) as freq from panther_views.public.all_databases
where p_occurs_between(current_date - 1, current_timestamp)
group by p_db_name
limit 1000

The macro that will be automatically expanded before the query is sent to the database. The form the expansion takes is database specific. In Snowflake, this expansion is pretty straightforward:

select p_db_name, count(*) as freq from panther_views.public.all_databases
where p_event_time between convert_timezone('UTC',current_date - 1)::timestamp_ntz
    and convert_timezone('UTC',current_timestamp)::timestamp_ntz
group by p_db_name
limit 1000

Keep in mind that different database back-ends allow different date formats and operations. Athena does not allow simple arithmetic operations on dates, therefore the care must be taken to use an Athena-friendly time format:

select p_db_name, count(*) as freq from panther_views.public.all_databases
where p_occurs_between(current_date - interval '1' day, current_timestamp)
group by p_db_name
limit 1000

Because of the structure of allowed indexes on partitions in Athena, the expansion looks different:

select p_db_name, count(*) as freq from panther_views.all_databases
where p_event_time between cast (current_date - interval '1' day as timestamp) and cast (current_timestamp as timestamp)
  and partition_time between to_unixtime(date_trunc('HOUR', (cast (current_date - interval '1' day as timestamp))))
    and to_unixtime(cast (current_timestamp as timestamp))
group by p_db_name
limit 1000

The macro also takes an optional table alias. This can be helpful when referring to multiple tables, such as with a JOIN:

select aws.awsRegion, ata.digestEndTime
from panther_logs.public.aws_cloudtrail as aws
join panther_logs.public.aws_cloudtraildigest as ata ON aws.awsRegion = ata.awsAccountId
where p_occurs_between('2023-01-01', '2023-06-30', aws)
limit 10

Time offset from present: p_occurs_since

p_occurs_since(offsetFromPresent [, tableAlias[, column]])

  • offsetFromPresent - an expression in time duration format, interpreted relative to the present, for example '1 hour'

  • tableAlias - an optional identifier that will be used as the table alias if provided

  • column - an optional identifier that will be used as the column if provided

    • If not present, the default column is p_event_time.

    • Indicating a different column (such as p_parsed_time) with column can lead to significantly longer query times, as without a restriction on p_event_time, the entire table is searched.

The macro p_occurs_since() takes an offset parameter specified in time duration format and filters the result set down to those events between the current time and the specified offset, using the correct partition or cluster key (minimizing I/O and speeding up the query).

The macro also takes an optional table alias which can be helpful when referring to multiple tables, such as a JOIN.

To be used properly this macro should occur within a filter, such as a WHERE clause.

Examples:

p_occurs_since('6 d')
p_occurs_since('2 weeks')
p_occurs_since(900) // assumes seconds
p_occurs_since('96 hrs')

Note: If this is used in a Scheduled Search, then rather than using the current time as the reference, the scheduled run time will be used. For example, if a query is scheduled to run at the start of each hour, then the p_occurs_since('1 hour') macro will expand using a time range of 1 hour starting at the start of each hour (regardless of when the query is actually executed).

In the following example of a macro with a table alias parameter, we look at CloudTrail logs to identify S3 buckets created and deleted within one hour of their creation, a potentially suspicious behavior. To get this information we do a self-join on the aws_cloudtrail table in panther_logs, and we use a macro expansion to limit this search to the past 24 hours on each of the two elements of the self-join (aliased ct1 and ct2 below):

select 
ct1.p_event_time createTime, ct2.p_event_time deleteTime,
timediff('s',createTime, deleteTime) timeExtant,
ct1.requestparameters:"bucketName"::varchar createdBucket,
ct1.useridentity:"arn"::varchar createArn, deleteArn,
ct1.useragent createUserAgent, deleteUserAgent
from panther_logs.public.aws_cloudtrail ct1
join (
select p_event_time, requestparameters:"bucketName"::varchar deletedBucket, errorcode,
  eventname deleteEvent, useridentity:"arn"::varchar deleteArn, useragent deleteUserAgent  from panther_logs.public.aws_cloudtrail) ct2
on (ct1.requestparameters:"bucketName"::varchar = ct2.DeletedBucket
    and ct2.p_event_time > ct1.p_event_Time
    and timediff('s',ct1.p_event_time, ct2.p_event_time) < 3600)
where ct2.deleteEvent = 'DeleteBucket'
and ct1.eventName = 'CreateBucket'
and ct1.errorCode is null and ct2.errorcode is null
and p_occurs_since('1 day',ct2)  -- apply to ct2
and p_occurs_since('24 hours',ct1)  -- apply to ct1
order by createdBucket, createTime;

There are two separate calls to p_occurs_since each applied to a different table, as indicated by the table alias used as a second parameter. This is expanded into the following Snowflake query:

select 
ct1.p_event_time createTime, ct2.p_event_time deleteTime,
timediff('s',createTime, deleteTime) timeExtant,
ct1.requestparameters:"bucketName"::varchar createdBucket,
ct1.useridentity:"arn"::varchar createArn, deleteArn,
ct1.useragent createUserAgent, deleteUserAgent
from panther_logs.public.aws_cloudtrail ct1
join (
select p_event_time, requestparameters:"bucketName"::varchar deletedBucket, errorcode,
  eventname deleteEvent, useridentity:"arn"::varchar deleteArn, useragent deleteUserAgent  from panther_logs.public.aws_cloudtrail) ct2
on (ct1.requestparameters:"bucketName"::varchar = ct2.deletedBucket
    and ct2.p_event_time > ct1.p_event_Time
    and timediff('s',ct1.p_event_time, ct2.p_event_time) < 3600)
where ct2.deleteEvent = 'DeleteBucket'
and ct1.eventName = 'CreateBucket'
and ct1.errorCode is null and ct2.errorcode is null
and ct2.p_event_time >= current_timestamp - interval '86400 second'
and ct1.p_event_time >= current_timestamp - interval '86400 second'
order by createdBucket, createTime;

Filter around a certain time: p_occurs_around

p_occurs_around(timestamp, timeOffset [, tableAlias[, column]])

  • timestamp - a time in timestamp format, indicating the time to search around

  • timeOffset - an expression in time duration format, indicating the amount of time to search around the timestamp, for example '1 hour'

  • tableAlias - an optional identifier that will be used as the column alias if provided

  • column - an optional identifier that will be used as the column if provided

    • If not present, the default column is p_event_time.

    • Indicating a different column (such as p_parsed_time) with column can lead to significantly longer query times, as without a restriction on p_event_time, the entire table is searched.

Note: Please ensure that your time expression can be parsed by the database backend your instance is using. For more information see Timestamp format.

The p_occurs_around() macro allows you to filter for events that occur around a given time. It takes a timestamp in timestamp format indicating the time to search around and an offset in time duration format specifying the interval to search. The search range is from timestamp - timeOffset to timestamp + timeOffset.

For example, the macro p_occurs_around('2022-01-01 10:00:00.000', '10 m') filters for events that occurred between 09:50 am and 10:10 am UTC on January 1, 2022.

The macro also takes an optional table alias which can be helpful when referring to multiple tables, such as a JOIN.

To be used properly this macro should occur within a filter, such as the WHERE clause of a SQL statement.

Examples:

p_occurs_around('2022-01-01 10:00:00.000', '6 d')
p_occurs_around('2022-01-01 10:00:00.000', '2 weeks')
p_occurs_around('2022-01-01 10:00:00.000', 900) // assumes seconds
p_occurs_around('2022-01-01 10:00:00.000', '96 hrs')

How to manage Saved Searches in Data Explorer

Saving your commonly run searches in Data Explorer means you won't need to rewrite them again and again.

Note that the instructions to delete a Saved Search are outlined on Saved and Scheduled Queries.

Save a search in Data Explorer

Below are instructions for how to save a search you've written in Data Explorer. You can also create a Saved Search from Search.

  1. In the left-hand navigation bar of your Panther Console, click Investigate > Data Explorer in the left sidebar.

  2. In the SQL editor, write a search using SQL.

    • You can create a Templated Search by including variables in your SQL expression. Learn more on Templated Searches.

  3. Below the SQL editor, click Save As.

  4. In the Save Search modal that pops up, fill in the form:

    • Search Name: Add a descriptive name.

    • Tags: Add tags to help you group similar searches together.

    • Description: Describe the purpose of the search.

    • Is this a Scheduled Search?: If you want this Saved Search to run on a schedule (making it a Scheduled Search), switch the toggle to ON.

      • When you switch this toggle to ON, the options described below will appear.

      • Is it active?: If you want this Scheduled Search to start running on your selected schedule, switch the toggle to ON.

    If you've toggled Is this a scheduled query? to ON, configure one of the following interval options: Period or Cron Expression.

    • Period (select if your query should run on fixed time intervals):

      • Period(days) and Period(min): Enter the number of days and/or minutes after which the SQL query should run again. For example: setting a period of 0 days and 30 minutes will mean that the query will run every day, every 30 minutes.

      • Timeout(min): Enter the timeout period in minutes, with a maximum allowed value of 10 minutes. If your query does not complete inside the allowed time window, Panther will retry 3 times before automatically canceling it.

    • Cron Expression (select if your query should run repeatedly at specific dates, and learn more about how to create a cron expression in How to use the Scheduled Search crontab):

      • Minutes and Hours: Enter the time of day for the query to run.

      • Day and Month (day of month): If you wish to have this query run on a specific day and month, enter the day and month.

      • Day (day of week): If you wish to have this query run on a specific day of the week, enter the day.

  5. Click Save Search.

If you've created a Scheduled Search (by toggling Is this a Scheduled Search? to ON), you can now follow the instructions to create a Scheduled Rule if you'd like the data returned by your search to be passed through a detection, alerting on matches.

Update a Saved Search in Data Explorer

  1. In your Panther Console, navigate to Investigate > Data Explorer in the left sidebar.

  2. In the upper right corner, click Open Saved Search.

    • An Open a Search modal will pop up.

  3. In the modal, select the Saved Search you'd like to update, and click Open Search.

    • The Saved Search will populate in the Data Explorer SQL editor.

  4. Make desired changes to the SQL command.

    • An Update Search modal will pop up.

  5. Make desired changes to the Saved Search's metadata, including the Search Name, Tags, Description, Default Database, and Is this a Scheduled Search? (and related fields).

  6. Click Update Search to save your changes.

Last updated

#1924: [don't merge until ~Oct] Notion Logs (Beta)

Change request updated