Panther System Architecture
Diagrams and explanations of the Panther system architecture
The diagram above flows roughly from left to right, and can be read in the following steps:
Raw log data flows into Panther from various log sources, including SaaS pullers (e.g., Okta) and Data Transport sources (e.g., AWS S3). These raw logs are parsed, filtered and normalized in the Log Processing subsystem.
The output of Log Processing flows into two subsystems: Data Lake and Detection.
If enabled, Cloud Security Scanning will scan onboarded cloud infrastructure, then pass the resources it finds into the Detection subsystem.
The Enrichment subsystem can optionally add context to the data flowing into the Detection subsystem, which can be used to enhance detection efficacy (e.g., IPinfo, Okta Profiles).
The Detection subsystem applies detections to the following inputs:
From Log Processing: Log events
From Scheduled Searches: Log events
From Cloud Security Scanning: Infrastructure resources
If a detection generates an alert, it is sent to the Alerting subsystem for dispatch to its appropriate alert destinations (e.g., Slack, Jira, a webhook, etc.). A single alert can be routed to more than one destination.
At the bottom of the diagram, the Control Plane represents the cross-cutting infrastructure responsible for configuring and controlling the subsystems above (the data plane). This is expanded on in the descriptions of each subsystem below. The API Server referenced in the upper right corner is the external entry point into the Control Plane.
Each Panther customer has a Panther instance deployed into a dedicated AWS account.
A customer can choose to own the AWS account or have Panther manage the account.
No data is shared or accessible between customers.
The AWS account forms the permission boundary for the application.
There is a single VPC used for services requiring networking.
Processing is done via AWS Lambda and Fargate instances.
Compute resources do not communicate with one another directly; rather, they communicate via AWS services. In other words, there is no "east/west" network traffic, there is only "north/south" network traffic.
The principle of least privilege is followed by using minimally scoped IAM roles for each infrastructure component.
Each Panther customer has a Panther Snowflake instance deployed into a dedicated Snowflake account.
A customer can choose to own the Snowflake account or have Panther manage the account.
No data is shared or accessible between customers.
Snowflake secrets are managed in AWS Secrets Manager using RSA keys and rotated daily.
All data is encrypted in transit and at rest.
All external interactions are conducted using the API:
The Panther Console is a React application interfacing with the API server.
All API actions are logged as Panther Audit Logs, which can then be ingested as a log source in Panther.
Secrets related to external integrations are managed in DynamoDB using KMS encrypted fields.
The system scales up and down according to load.
Panther infrastructure is managed by Pulumi.
All infrastructure is tagged (e.g., resource name, subsystem), enabling effective billing analysis.
Customers owning their AWS account can add their own tags to integrate into their larger organization's billing reporting.
Monitoring is performed using a combination of CloudWatch, Sentry, and Datadog.
All data entering this subsystem is delivered via AWS S3 and S3 notifications. Upstream sources that are not S3-based (e.g., SaaS pullers, HTTP Source, Google Cloud Storage Source) use Amazon Data Firehose to aggregate events into S3 objects. These notifications are routed through a master Amazon SNS topic. The Log Processing and Event Sampling workflows each subscribe to this SNS topic.
Log Processing is implemented using AWS Lambda functions. There is an efficient, proprietary Control Plane that orchestrates aggregation and scaling. For each notification received, the following steps are taken:
The integration source associated with the S3 object is looked up in DynamoDB and the associated role is assumed for reading.
The data is read from S3.
Each event is parsed according to the associated schema for that data type.
If classification or parsing errors arise, System Errors are generated and the associated "bad" data is stored in the Data Lake within the classification_failures table.
Ingestion filters and transformations are applied.
Indicator fields (p_any fields) are extracted, and standard fields are inserted.
You can optionally configure an event threshold alarm for each onboarded log source to alert if traffic stops unexpectedly.
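For illustration, a parsed and normalized event might look like the following sketch. Only the presence of indicator (p_any) and standard fields is guaranteed by the description above; the specific field names and values shown here are assumptions for the example.

```python
# Illustrative only: an Okta-style event after parsing and normalization.
# The p_-prefixed field names below are assumptions for this sketch.
normalized_event = {
    # Fields parsed from the raw log according to its schema
    "actor": {"alternateId": "jane.doe@example.com"},
    "client": {"ipAddress": "203.0.113.7"},
    "eventType": "user.session.start",

    # Standard fields inserted by Log Processing (illustrative names)
    "p_log_type": "Okta.SystemLog",
    "p_event_time": "2024-01-01T12:00:00Z",
    "p_source_label": "okta-prod",

    # Indicator (p_any) fields extracted for fast search and correlation
    "p_any_ip_addresses": ["203.0.113.7"],
    "p_any_emails": ["jane.doe@example.com"],
}
```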
The S3 notifications also go to the Event Sampling subsystem, which is used for log schema field discovery. As new attributes are found in the data, they are analyzed and added automatically to the schema (and associated Data Lake tables).
Enrichment in Panther is implemented via Lookup Tables (LUTs). A LUT is a table containing data associated with a unique primary key. A LUT also has a mapping from schemas to the primary key, which allows for automatic enrichment in the Detection subsystem. Detections may also use a function call interface to look up data.
IPinfo, for example, is a Panther-managed enrichment provider containing geolocation data. IP addresses in a log event will automatically be enriched with location, ASN, and privacy information. Customers can also create their own custom LUTs to bring context relevant to their business and security concerns.
LUTs are created either via the Panther Console or in the CLI workflow (using a YAML specification file). Data for the LUT can be made accessible to Panther in a few ways: uploaded in the Console, included as a file in the CLI configuration, or stored as an S3 object. In general, the most useful way to manage LUT data is as an S3 object reference—you can create S3 objects in your own account, and Panther will poll for changes.
The metadata associated with a LUT is stored in DynamoDB. When there is new data, the Lookup Table Processor assumes the specified role from the metadata and processes the S3 data. This creates two outputs: a real-time database in EFS used by the Detection subsystem, and tables in the Data Lake. The tables in the Data Lake can be used by Scheduled Searches to enrich events using joins.
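Conceptually, a LUT behaves like a keyed map plus a schema-to-key mapping. The sketch below is a simplified model of that behavior, not Panther's implementation; the table contents, key path, and the p_enrichment layout are assumptions for illustration.

```python
# Simplified model of Lookup Table enrichment; names and shapes are hypothetical.
lookup_table = {
    # primary key -> enrichment row
    "203.0.113.7": {"country": "US", "asn": "AS64500", "is_vpn": False},
}

# Mapping from a log schema to the event field that holds the primary key
schema_key_mapping = {"Okta.SystemLog": "client.ipAddress"}


def enrich(event: dict, log_type: str) -> dict:
    """Attach matching LUT rows under p_enrichment so detections can use them."""
    key_path = schema_key_mapping.get(log_type, "")
    value = event
    for part in key_path.split("."):
        value = value.get(part, {}) if isinstance(value, dict) else {}
    match = lookup_table.get(value) if isinstance(value, str) else None
    if match:
        event.setdefault("p_enrichment", {})["ip_context"] = {value: match}
    return event
```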
The streaming detection processor allows Python-based detections to run on log events from Log Processing and Scheduled Searches, as well as resources from Cloud Security Scanning. The streaming detection processor runs as an AWS Lambda function optimized for high speed execution of Python.
The streaming detection processor evaluates the following types of detections:
Streaming detections (rules): Targeted at one or more log schemas (also called LogTypes)
Scheduled detections (scheduled rules): Targeted at the output of one or more Scheduled Searches
Policy detections: Targeted at resources
Processing data from these sources follows these steps:
For every active Lookup Table, any matches are applied to the p_enrichment field so that the information is available for detections.
All detections associated with the given LogType, cloud resource, or Scheduled Search are found.
Each detection's rule() function is run on the event/resource. If it returns True, the other optional functions are run and an alert is sent to the Alerting subsystem. For rules and scheduled rules, an alert is only sent for the first match within the detection's deduplication window.
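As a minimal illustration, a streaming detection might look like the sketch below. The event fields, the p_enrichment layout, and the auxiliary function names (title, dedup) are assumptions for the example rather than a guaranteed contract.

```python
# Minimal sketch of a Python streaming detection (rule); fields are illustrative.
def rule(event) -> bool:
    # Fire on failed logins that were enriched with VPN/anonymizer context
    enrichment = event.get("p_enrichment", {}).get("ip_context", {})
    failed_login = (
        event.get("eventType") == "user.session.start"
        and event.get("outcome", {}).get("result") == "FAILURE"
    )
    return failed_login and any(row.get("is_vpn") for row in enrichment.values())


def title(event) -> str:
    # Optional function: customize the alert title
    actor = event.get("actor", {}).get("alternateId", "unknown user")
    return f"Failed login from suspicious IP for {actor}"


def dedup(event) -> str:
    # Optional function: group repeated matches within the deduplication window
    return event.get("actor", {}).get("alternateId", "unknown")
```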
When a Scheduled Search is finished executing, the streaming detection processor Lambda is invoked with a reference to the results of the query. The results are read, and each event is processed according to the steps above.
Data Replay allows for testing of detections on historical data. This is implemented via a "mirror" set of infrastructure that is independent of the live infrastructure.
Panther uses the Snowflake Snowpipe service to ingest data into the Data Lake. This service uses AWS IAM permissions and is therefore not dependent on Snowflake users configured for queries and management. The onboarding of a new data source in Panther triggers the creation of associated tables and Snowpipe infrastructure using the Admin database API Lambda. This Lambda has an associated user with read/write permissions to Panther databases and schemas. Note that there is no direct outside connection to invoke this Lambda; rather, it is driven by the internal Control Plane.
Queries are run using the read only database API Lambda. This Lambda has an associated user with read only permissions.
Queries are asynchronous. When an API request is made to run a query, the associated SQL is executed in Snowflake and Snowflake returns a queryId. Subsequent API calls with the queryId check the status and read the associated results. Query execution status is tracked in DynamoDB.
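From a client's perspective, this asynchronous pattern means submitting the query and then polling with the returned queryId. The sketch below illustrates that flow under stated assumptions: the client object, its method names, and the status values are hypothetical, not Panther's actual API.

```python
import time

# Hypothetical client wrapper illustrating the asynchronous query pattern.
# Method names and status strings are assumptions for this sketch.
def run_query(client, sql: str):
    query_id = client.execute_async(sql)          # SQL is submitted, queryId returned
    while True:
        status = client.get_status(query_id)      # status tracked server-side (DynamoDB)
        if status == "succeeded":
            return client.get_results(query_id)   # read results once execution completes
        if status == "failed":
            raise RuntimeError(f"query {query_id} failed")
        time.sleep(2)                             # poll until the query finishes
```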
Query results are stored in EFS for 30 days (though this length is configurable). Customers can use the Search History in Panther to view results of past searches.
Scheduled Searches used by Detection are run via an AWS Step Function. Upon query execution completion, the streaming detection processor is invoked with a reference to the query results for further processing.
When RBAC per logtype is enabled, there is a unique, managed read-only user per role.
Snowflake secrets are stored in AWS Secrets Manager. RSA keys are used and rotated daily.
The Detection subsystem inserts alerts into a DynamoDB table, which the alert dispatch Lambda listens to on a stream. This Lambda uses the configured integrations to send alerts to destinations.
To display alerts in the Panther Console, core alert data is retrieved from DynamoDB, while the alert's associated events are retrieved from the Data Lake.
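The dispatch step follows the standard DynamoDB Streams-to-Lambda pattern. The generic sketch below illustrates that pattern only; it is not Panther's code, and the record attribute names and the send_to_destinations helper are assumptions.

```python
# Generic sketch of a Lambda consuming a DynamoDB stream of alert records.
# Attribute names and the dispatch helper below are hypothetical.
def send_to_destinations(alert: dict) -> None:
    # Placeholder: in practice this would call each configured integration
    print(f"dispatching alert {alert['id']} to configured destinations")


def handler(event, context):
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # only newly inserted alerts trigger dispatch
        new_image = record["dynamodb"]["NewImage"]
        alert = {
            "id": new_image["alertId"]["S"],
            "detection": new_image["detectionId"]["S"],
            "severity": new_image["severity"]["S"],
        }
        # Fan out to every configured destination (Slack, Jira, webhooks, ...)
        send_to_destinations(alert)
```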
The alert limiter functionality described below is currently in closed beta.
The alert limiter functionality is intended to prevent "alert storms", typically caused by misconfigured detections, from overloading your destinations. If more than 1,000 alerts are generated in one hour by the same detection, further alerts are suppressed (this limit is configurable). While suppressed, the detection continues to run and store events in the Data Lake (so there is no data loss), but no new alerts are created. A System Error is generated to notify the customer, who can manually remove the suppression in the Console (perhaps after some detection tuning).
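The limiting logic amounts to counting alerts per detection over a rolling hour and suppressing once the threshold is reached. The sketch below is a simplified, in-memory model of that behavior; Panther's actual storage and naming are not described here and are assumptions.

```python
import time
from collections import defaultdict
from typing import Optional

ALERT_LIMIT_PER_HOUR = 1000  # the configurable threshold described above

# In-memory counters for illustration only; a real system would persist state.
_alert_times = defaultdict(list)  # detection_id -> timestamps of recent alerts


def should_create_alert(detection_id: str, now: Optional[float] = None) -> bool:
    """Return False once a detection exceeds the hourly alert limit."""
    now = now if now is not None else time.time()
    recent = [t for t in _alert_times[detection_id] if t >= now - 3600]
    if len(recent) >= ALERT_LIMIT_PER_HOUR:
        _alert_times[detection_id] = recent
        return False  # suppressed: events still reach the Data Lake, no alert is created
    recent.append(now)
    _alert_times[detection_id] = recent
    return True
```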
There are special authenticated endpoints for Jira and Slack to "call back" to Panther in order to sync alert state (e.g., to update the status of an alert to Resolved).
The Panther API is the entry point for all external interactions with Panther. The Console, GraphQL, and REST clients connect via an AWS ALB. Customers can optionally configure an allowlist for ALB access using IP CIDRs.
API authentication is performed using AWS Cognito. GraphQL and REST clients use tokens, while the Panther Console uses JWTs managed by AWS Cognito. The Console supports Single Sign-On (SSO) via AWS Cognito.
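For example, a token-bearing GraphQL client might call the API as in the sketch below. The endpoint URL, the API-key header name, and the placeholder query are assumptions for illustration; consult the API documentation for the actual values.

```python
import json
import urllib.request

# Illustrative only: endpoint URL and header name are assumptions, and the
# token is issued via the Cognito-backed authentication described above.
API_URL = "https://YOUR_PANTHER_INSTANCE/public/graphql"  # hypothetical endpoint
API_TOKEN = "YOUR_API_TOKEN"

payload = {"query": "{ __typename }"}  # placeholder GraphQL query
request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json", "X-API-Key": API_TOKEN},
)
with urllib.request.urlopen(request) as response:
    print(json.load(response))
```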
There is an internal API server that resolves the requests. Some requests are processed entirely within the API server, while others require one or more calls to other internal services implemented via AWS Lambda functions.