Panther System Architecture

Diagrams and explanations of the Panther system architecture

Overview of the Panther system

The diagram above flows roughly from left to right, and can be read in the following steps:

  1. Raw log data flows into Panther from various log sources, including SaaS pullers (e.g., Okta) and Data Transport sources (e.g., AWS S3). These raw logs are parsed, filtered and normalized in the Log Processing subsystem.

  2. If enabled, Cloud Security Scanning will scan onboarded cloud infrastructure, then pass the resources it finds into the Detection subsystem.

  3. The Enrichment subsystem optionally adds additional context to the data flowing into the Detection subsystem, which can be used to enhance detection efficacy (e.g., IPinfo, Okta Profiles).

  4. The Detection subsystem applies detections to these inputs: normalized log events from Log Processing, resources from Cloud Security Scanning, and the results of Scheduled Searches.

  5. If a detection generates an alert, it is sent to the Alerting subsystem for dispatch to its appropriate alert destinations (e.g., Slack, Jira, a webhook, etc.). A single alert can be routed to more than one destination.

At the bottom of the diagram, the Control Plane represents the cross-cutting infrastructure responsible for configuring and controlling the subsystems above (the data plane). This is expanded on in the descriptions of each subsystem below. The API Server referenced in the upper right corner is the external entry point into the Control Plane.

General considerations

AWS

  • Each Panther customer has a Panther instance deployed into a dedicated AWS account.

    • A customer can choose to own the AWS account or have Panther manage the account.

    • No data is shared or accessible between customers.

    • The AWS account forms the permission boundary for the application.

    • There is a single VPC used for services requiring networking.

  • Processing is done via AWS Lambda and Fargate instances.

  • The principle of least privilege is followed by using minimally scoped IAM roles for each infrastructure component.

Snowflake

  • Each Panther customer has a Panther Snowflake instance deployed into a dedicated Snowflake account.

    • A customer can choose to own the Snowflake account or have Panther manage the account.

    • No data is shared or accessible between customers.

  • Snowflake secrets are managed in AWS Secrets Manager using RSA keys and rotated daily.

Other

  • All data is encrypted in transit and at rest.

  • All external interactions are conducted using the API:

    • The Panther Console is a React application interfacing with the API server.

    • The public API exposes GraphQL and REST endpoints.

    • All API actions are logged as Panther Audit Logs, which can then be ingested as a log source in Panther.

  • Secrets related to external integrations are managed in DynamoDB using KMS encrypted fields.

  • The system scales up and down according to load.

  • Panther infrastructure is managed by Pulumi.

    • All infrastructure is tagged (e.g., resource name, subsystem), enabling effective billing analysis.

    • Customers owning their AWS account can add their own tags to integrate into their larger organization's billing reporting.

  • Monitoring is performed using a combination of CloudWatch, Sentry, and Datadog.

Log Processing subsystem

All data ingested by this subsystem is delivered via AWS S3 and S3 notifications. Upstream sources that are not S3-based (e.g., SaaS pullers, HTTP Source, Google Cloud Storage Source) use Amazon Data Firehose to aggregate events into S3 objects. These notifications are routed through a master Amazon SNS topic. The Log Processing and Event Sampling workflows each subscribe to this SNS topic.

Log Processing is implemented using AWS Lambda functions. There is an efficient, proprietary Control Plane that orchestrates aggregation and scaling. For each notification received, the following steps are taken:

  1. The integration source associated with the S3 object is looked up in DynamoDB and the associated role is assumed for reading.

  2. The data is read from S3.

  3. Each event is parsed according to the associated schema for that data type.

    • If classification or parsing errors arise, System Errors are generated and the associated "bad" data is stored in the Data Lake within the classification_failures table.

  4. Indicator fields (p_any fields) are extracted, and standard fields are inserted.

  5. Processed events are written as S3 objects and notifications are sent to an internal SNS topic, which the Data Lake and Detection subsystems are subscribed to.
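
The per-notification flow can be sketched in Python roughly as follows. This is a simplified illustration rather than Panther's implementation: the DynamoDB table name, the source metadata fields, and the indicator-field extraction shown here are assumptions.

```python
# Hedged sketch of the per-notification Log Processing flow. Table, role, and
# field names are assumptions for illustration only.
import json
import boto3

SOURCES_TABLE = boto3.resource("dynamodb").Table("panther-log-sources")  # hypothetical name

def process_notification(bucket: str, key: str) -> list[dict]:
    # 1. Look up the integration source for this object and assume its read role.
    source = SOURCES_TABLE.get_item(Key={"bucket": bucket})["Item"]
    creds = boto3.client("sts").assume_role(
        RoleArn=source["logProcessingRole"], RoleSessionName="log-processing"
    )["Credentials"]
    s3 = boto3.client(
        "s3",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )

    # 2. Read the raw data from S3 (decompression omitted for brevity).
    raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode()

    processed = []
    for line in raw.splitlines():
        # 3. Parse each event against the schema for the source's log type.
        event = json.loads(line)
        # 4. Extract indicator (p_any) fields and insert standard fields.
        if "src_ip" in event:
            event.setdefault("p_any_ip_addresses", []).append(event["src_ip"])
        event["p_log_type"] = source["logType"]
        processed.append(event)

    # 5. In the real pipeline, processed events are written back to S3 and a
    #    notification is published to an internal SNS topic for the Data Lake
    #    and Detection subsystems (omitted here).
    return processed
```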

You can optionally configure an event threshold alarm for each onboarded log source to alert if traffic stops unexpectedly.

The S3 notifications also go to the Event Sampling subsystem, which is used for log schema field discovery. As new attributes are found in the data, they are analyzed and added automatically to the schema (and associated Data Lake tables).
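
The idea behind field discovery can be sketched as follows. This is an illustrative simplification, assuming made-up type names and merge behavior, not Panther's actual inference logic.

```python
# Illustrative sketch of schema field discovery: infer a type for each new
# attribute found in sampled events and merge it into the existing schema.
def infer_type(value) -> str:
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "float"
    return "string"

def discover_fields(schema_fields: dict, sampled_event: dict) -> dict:
    """Return the schema fields updated with any attributes not yet present."""
    updated = dict(schema_fields)
    for name, value in sampled_event.items():
        if name not in updated:
            updated[name] = infer_type(value)
    return updated

# A new "session_id" attribute appears in the sampled data and is added as a string.
schema = {"src_ip": "string", "user": "string"}
event = {"src_ip": "10.0.0.1", "user": "alice", "session_id": "abc123"}
print(discover_fields(schema, event))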

Enrichment subsystem

Enrichment in Panther is implemented via Lookup Tables (LUTs). A LUT is a table of data associated with a unique primary key. A LUT also defines a mapping from schemas to that primary key, which enables automatic enrichment in the Detection subsystem. Detections may also use a function call interface to look up data.

IPinfo, for example, is a Panther-managed enrichment provider containing geolocation data. IP addresses in a log event will automatically be enriched with location, ASN, and privacy information. Customers can also create their own custom LUTs to bring context relevant to their business and security concerns.
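
To make this concrete, an event carrying an IP address might come out of enrichment looking roughly like the structure below, with the matched LUT rows attached under the event's p_enrichment field (described further in the Detection subsystem section). The field names and nesting here are assumptions for illustration, not the authoritative schema.

```python
# Illustrative shape of a log event after IPinfo enrichment. Field names and
# nesting are assumptions chosen for the example.
enriched_event = {
    "src_ip": "203.0.113.10",
    "user": "alice",
    "p_enrichment": {
        "ipinfo_location": {      # Panther-managed IPinfo location data
            "src_ip": {           # keyed by the event field that matched the LUT
                "city": "Sydney",
                "country": "AU",
            }
        },
        "ipinfo_asn": {
            "src_ip": {"asn": "AS64500", "domain": "example.net"},
        },
    },
}
```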

LUTs are created either via the Panther Console or in the CLI workflow (using a YAML specification file). Data for the LUT can be made accessible to Panther in a few ways: uploaded in the Console, included as a file in the CLI configuration, or stored as an S3 object. In general, the most useful way to manage LUT data is as an S3 object reference—you can create S3 objects in your own account, and Panther will poll for changes.

The metadata associated with a LUT is stored in DynamoDB. When there is new data, the Lookup Table Processor assumes the specified role from the metadata and processes the S3 data. This creates two outputs: a real-time database in EFS used by the Detection subsystem, and tables in the Data Lake. The tables in the Data Lake can be used by Scheduled Searches to enrich events using joins.

Detection subsystem

The streaming detection processor allows Python-based detections to run on log events from Log Processing and Scheduled Searches, as well as resources from Cloud Security Scanning. It runs as an AWS Lambda function optimized for high-speed execution of Python.

The streaming detection processor evaluates the following types of detections: rules (run against incoming log events), scheduled rules (run against the results of Scheduled Searches), and policies (run against cloud resources from Cloud Security Scanning).

Processing data from these sources follows these steps:

  1. For every active Lookup Table, any matches are applied to the p_enrichment field so that the information is available for detections.

  2. All detections associated with the given LogType, cloud resource, or Scheduled Search are found.

  3. Each detection's rule() function is run on the event/resource. If it returns True, then the other optional functions are run, and an alert is sent to the Alerting subsystem. For rules and scheduled rules, alerts are only sent for the first detection within the detection's deduplication window.

  4. Events associated with the detection are written to an S3 object and an S3 notification is sent to an internal SNS topic.

    • The Data Lake subsystem subscribes to the SNS topic for data ingestion into the rule matches and signals tables.
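
As a concrete illustration of step 3, a Python detection of the kind the processor runs might look like the sketch below. The event fields and enrichment keys are assumptions chosen for the example; rule() decides whether the event matches, and the remaining optional functions shape the resulting alert.

```python
# Hedged sketch of a Python detection. Field names and enrichment keys are
# assumptions for illustration.
SUSPICIOUS_COUNTRIES = {"KP", "IR"}

def rule(event):
    # Read the country attached by enrichment for the event's src_ip field.
    country = (
        event.get("p_enrichment", {})
        .get("ipinfo_location", {})
        .get("src_ip", {})
        .get("country")
    )
    return country in SUSPICIOUS_COUNTRIES

def title(event):
    return f"Login from unusual country for user {event.get('user', 'unknown')}"

def dedup(event):
    # Alerts sharing this string collapse into one within the deduplication window.
    return event.get("user", "unknown")

def severity(event):
    return "HIGH"
```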

When a Scheduled Search is finished executing, the streaming detection processor Lambda is invoked with a reference to the results of the query. The results are read, and each event is processed according to the steps above.

Data Replay allows for testing of detections on historical data. This is implemented via a "mirror" set of infrastructure that is independent of the live infrastructure.

Data Lake subsystem

Panther uses the Snowflake Snowpipe service to ingest data into the Data Lake. This service uses AWS IAM permissions and is therefore not dependent on Snowflake users configured for queries and management. The onboarding of a new data source in Panther triggers the creation of associated tables and Snowpipe infrastructure using the Admin database API Lambda. This Lambda has an associated user with read/write permissions to Panther databases and schemas. Note that there is no direct external connection to invoke this Lambda; rather, it is driven by the internal Control Plane.

Queries are run using the read-only database API Lambda. This Lambda has an associated user with read-only permissions.

Queries are asynchronous. When an API request is made to run a query, the associated SQL is executed in Snowflake and Snowflake returns a queryId. API calls are then made with the queryId to check the status and read the associated results. The status of the execution of a query is tracked in DynamoDB.
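
The asynchronous pattern can be sketched as follows against the Panther GraphQL API. The endpoint path, header name, and GraphQL operation and field names below are assumptions for illustration, not the authoritative API reference.

```python
# Hedged sketch of the asynchronous query pattern: submit SQL, poll the
# returned query ID, then read results once execution completes.
import time
import requests

API_URL = "https://YOUR-PANTHER-INSTANCE/public/graphql"  # placeholder URL
HEADERS = {"X-API-Key": "<api-token>"}

def gql(query: str, variables: dict) -> dict:
    resp = requests.post(
        API_URL, json={"query": query, "variables": variables}, headers=HEADERS, timeout=30
    )
    resp.raise_for_status()
    return resp.json()["data"]

# 1. Submit the SQL. Snowflake begins executing and a query ID is returned.
started = gql(
    "mutation Start($sql: String!) { executeDataLakeQuery(input: {sql: $sql}) { id } }",
    {"sql": "SELECT * FROM panther_logs.public.aws_cloudtrail LIMIT 10"},
)
query_id = started["executeDataLakeQuery"]["id"]

# 2. Poll with the query ID until execution finishes; results are then read
#    with follow-up (paginated) calls, omitted here. Status values are illustrative.
while True:
    status = gql(
        "query Check($id: ID!) { dataLakeQuery(id: $id) { status } }",
        {"id": query_id},
    )["dataLakeQuery"]["status"]
    if status != "running":
        break
    time.sleep(2)

print("query finished with status:", status)
```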

Query results are stored in EFS for 30 days (this retention period is configurable). Customers can use the Search History in Panther to view the results of past searches.

Scheduled Searches used by Detection are run via an AWS Step Function. Upon query execution completion, the streaming detection processor is invoked with a reference to the query results for further processing.

When RBAC per logtype is enabled, there is a unique, managed read-only user per role.

Snowflake secrets are stored in AWS Secrets Manager. RSA keys are used and rotated daily.

Alerting subsystem

The Detection subsystem inserts alerts into a DynamoDB table, and the alert dispatch Lambda consumes the table's DynamoDB stream. This Lambda uses the configured integrations to send alerts to their destinations.

To display alerts in the Panther Console, core alert data is retrieved from DynamoDB, while the alert's associated events are retrieved from the Data Lake.

The alert limiter functionality described below is currently in closed beta.

The alert limiter functionality is intended to prevent "alert storms", which typically arise from misconfigured detections, from overloading your destinations. If more than 1,000 alerts are generated in one hour by the same detection, further alerts are suppressed. (This limit is configurable.) When the limit is reached, the detection continues to run and store events in the Data Lake (so there is no data loss), but no alerts are created. A System Error is generated to notify the customer, who can manually remove the alert suppression in the Console (perhaps after some detection tuning).
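
The limiter logic amounts to counting alerts per detection per hour and suppressing once the threshold is crossed, roughly as in the sketch below. This is an illustrative simplification with assumed names, not Panther's implementation.

```python
# Illustrative sketch of the alert limiter: count alerts per detection per
# hourly bucket and suppress once the (configurable) threshold is exceeded.
import time
from collections import defaultdict

ALERT_LIMIT_PER_HOUR = 1000  # configurable threshold

_counts: dict[tuple[str, int], int] = defaultdict(int)

def should_send_alert(detection_id: str, now: float | None = None) -> bool:
    """Return False (suppress) once the detection exceeds the hourly limit.

    Events are still written to the Data Lake either way; only alert creation
    is suppressed, and a System Error notifies the customer.
    """
    now = now or time.time()
    hour_bucket = int(now // 3600)
    key = (detection_id, hour_bucket)
    _counts[key] += 1
    return _counts[key] <= ALERT_LIMIT_PER_HOUR
```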

There are special authenticated endpoints for Jira and Slack to "call back" to Panther in order to sync alert state (e.g., to update the status of an alert to Resolved).

API subsystem

The Panther API is the entry point for all external interactions with Panther. The Console, GraphQL, and REST clients connect via an AWS ALB. Customers can optionally configure an allowlist for ALB access using IP CIDRs.

API authentication is performed using AWS Cognito. GraphQL and REST clients use tokens, while the Panther Console uses JWTs managed by AWS Cognito. The Console supports Single Sign-On (SSO) via AWS Cognito.
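
A minimal sketch of token-based access to the REST API is shown below. The host, path, query parameters, and header name are placeholders chosen for illustration, not the documented endpoints.

```python
# Hedged sketch of a token-authenticated REST call to the Panther API.
import requests

resp = requests.get(
    "https://YOUR-PANTHER-INSTANCE/rest/alerts",  # hypothetical REST path
    headers={"X-API-Key": "<api-token>"},         # token-based API authentication
    params={"status": "OPEN"},                    # illustrative filter
    timeout=30,
)
resp.raise_for_status()
for alert in resp.json().get("results", []):
    print(alert.get("id"), alert.get("title"))
```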

There is an internal API server that resolves the requests. Some requests are processed entirely within the API server, while others require one or more calls to other internal services implemented via AWS Lambda functions.
