PantherFlow

PantherFlow is Panther's pipelined query language

Overview

PantherFlow is in open beta starting with Panther version 1.110, and is available to all customers. Please share any bug reports and feature requests with your Panther support team.

PantherFlow is Panther's pipelined query language. It is designed to be simple to understand, yet powerful and expressive.

You can use PantherFlow to explore and analyze your data in Panther. Using its operators and functions, you can perform a variety of data operations, such as filtering, transformations, and aggregations. PantherFlow is schema-flexible, meaning you can seamlessly search across multiple data sources—including those with different schemas—in a single query.

PantherFlow queries use pipes (|) to delineate data operations, which are processed sequentially. This means that the output of the first operator in a query is passed as the input to the second operator, and so on.

panther_logs.public.okta_systemlog
| where p_event_time > time.ago(1d)
| search 'doug'
| summarize agg.count() by eventType

Where to use PantherFlow

Use PantherFlow during investigations to query data in Search. For instructions, see the documentation on using PantherFlow in Search.

To assist your query writing, the PantherFlow code editor in Search has autocomplete and error underlining functionality.

Selections made in the database, table, and date range filters in the upper-right corner of the Search page (including the fields' default values) take effect only if your PantherFlow query does not itself specify a database, table, or date/time range. Any of these values provided in the query override the corresponding filter selections.

How a PantherFlow query works

Typically when people say "PantherFlow query," they're talking about a tabular expression statement, which retrieves a dataset and returns it in some form. A tabular expression statement is usually made up of operators separated by pipes (|). Each operator performs some action on the data—i.e., filters or transforms it—before passing it on to the next operator. Operator order is important, as PantherFlow statements are read sequentially.
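To illustrate why operator order matters, here is a minimal sketch that reuses only the table and operators shown elsewhere on this page. It assumes, as described on this page, that sort p_event_time returns the most recent events first. Sorting before limiting returns up to 10 of the most recent events:

```
panther_logs.public.aws_alb
| sort p_event_time
| limit 10
```

Swapping the two operators changes the result. Here, limit first keeps up to 10 events, which are not necessarily the most recent ones, and sort then orders only those events:

```
panther_logs.public.aws_alb
| limit 10
| sort p_event_time
```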

See a full overview of supported PantherFlow syntax on the PantherFlow Quick Reference page.

Example

Let's explore the following PantherFlow query:

panther_logs.public.aws_alb
| where p_event_time > time.ago(1d)
| sort p_event_time
| limit 10

Put simply, this query reads data from the panther_logs.public.aws_alb table, keeps events that occurred within the last day, sorts those events by time, and returns up to 10 of them, starting with the most recent. Let's take a deeper look:

panther_logs.public.aws_alb
- This first statement identifies the data source.
- This query is reading from the panther_logs.public.aws_alb table. If the query contained only this line, all data in the table would be returned.
| where p_event_time > time.ago(1d)
- The where operator takes an expression to filter the data.
- This query is requesting data where the p_event_time field value is later than one day ago. In other words, it's asking for events from the last day. The time.ago() function subtracts its argument from the current time, and that argument (1d) is a timespan constant representing one day.
| sort p_event_time
- The sort operator lets you order events by one or more field values.
- This query orders data by p_event_time. This will return the most recent events first.
| limit 10
- The limit operator defines how many events you'd like returned, at most.
- This query is requesting 10 events.


Limitations of PantherFlow

While you can create a Saved Search using PantherFlow in the Panther Console, it's not possible to:
- Schedule a Saved Search (i.e., create a Scheduled Search)
- Create a Saved Search using PantherFlow in the developer workflow (i.e., by uploading a saved_query via the Panther Analysis Tool or by using the REST or GraphQL APIs)
Aggregations (i.e., the summarize operator) do not show information on the Search results histogram.
In Search, the Available Fields list does not reflect fields that are added or removed when using operators like project, extend, and summarize.
In some cases, a PantherFlow query may run slower than an equivalent SQL query.

Best practices when using PantherFlow

To ensure your PantherFlow query results return as quickly as possible (and to minimize costs associated with the search), it's recommended to follow these best practices:

Use the limit operator
- Use the limit operator to specify the number of records your query will return. While PantherFlow can return data without a limit, using one can return results faster.
- Example: panther_logs.public.aws_alb | limit 100
Use a time range filter
- Use the where operator to filter by a time range. When you filter on a time field (such as p_event_time) in your query, the query will access fewer micro-partitions, which returns results faster.
- Example: panther_logs.public.aws_alb | where p_event_time > time.ago(1d)
- Learn more about what time functions are available.
Use p_any fields
- During log ingestion, Panther extracts common security indicators into the p_any fields—these fields standardize names for attributes across all data sources. The p_any fields are stored in optimized columns. It's recommended to query p_any fields instead of various differently named fields for multiple log types.
- Learn more on Standard Fields.
- Example: panther_logs.public.aws_alb | where '10.0.0.0' in p_any_ip_addresses
Use the project operator
- A query without project pulls all columns, which can slow down queries. When possible, use project to query only the fields you need to investigate.
- Example: panther_logs.public.aws_alb | project targetIp, targetPort
Summarize
- Summaries are faster to run than querying full records. This is especially helpful when investigating logs over a long period of time, or in a situation where you are unsure how much data exists for the time range you are investigating.
- Instead of querying the full data set, you can use the summarize operator, which will execute faster and help you determine a narrower timeframe to query next.
- Example: panther_logs.public.aws_alb | summarize count=agg.count() by targetIp
- Learn more about what aggregator functions are available.
Filter data early
- Filter data before performing expensive operations such as summarize or join (rather than after using those operations).
- Example: Instead of:
  panther_logs.public.aws_alb | summarize agg.count() by actor | where actor != nil
  use:
  panther_logs.public.aws_alb | where actor != nil | summarize agg.count() by actor
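
The practices above can be combined in a single query. The following is a sketch rather than a tuned query. It reuses the table, fields, and functions from the examples above, and it assumes that the in expression is wrapped in a where operator:

```
panther_logs.public.aws_alb
| where p_event_time > time.ago(1d)
| where '10.0.0.0' in p_any_ip_addresses
| summarize count=agg.count() by targetIp
| limit 100
```

Both filters run before summarize, so the aggregation only processes events from the last day that include the IP address, and limit caps the number of summary rows returned.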

If your query is still running slowly after implementing the best practices above, it's recommended to:

Check the number of returned rows to see how much data you are querying.
- This will help you determine whether it's a large amount of data and therefore expected that it's taking a while.
Reduce the time range you are querying.
Reach out to your Panther Support team for additional help.
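
For example, one way to gauge data volume while also narrowing the time range is to count events over a shorter window before running the full query. This sketch assumes that 1h is accepted as a timespan constant in the same way as 1d. The other names are taken from the examples above:

```
panther_logs.public.aws_alb
| where p_event_time > time.ago(1h)
| summarize count=agg.count() by targetIp
```

If the counts are already large for a one-hour window, a slow query over a longer range is likely expected.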


Last updated 1 year ago
