Panther System Architecture
Diagrams and explanations of the Panther system architecture
The diagram above flows roughly from left to right, and can be read in the following steps:
Each Panther customer has a Panther instance deployed into a dedicated AWS account.
A customer can choose to own the AWS account or have Panther manage the account.
No data is shared or accessible between customers.
The AWS account forms the permission boundary for the application.
There is a single VPC used for services requiring networking.
Processing is done via AWS Lambda and Fargate instances.
A proprietary control plane dynamically picks the best compute to minimize cost (see below).
Compute resources do not communicate with one another directly; rather, they communicate via AWS services. In other words, there is no "east/west" network traffic; there is only "north/south" network traffic.
Each Panther customer has a Panther Snowflake instance deployed into a dedicated Snowflake account.
A customer can choose to own the Snowflake account or have Panther manage the account.
No data is shared or accessible between customers.
Snowflake secrets are managed in AWS Secrets Manager using RSA key pairs and are rotated daily.
All data is encrypted in transit and at rest.
The Panther Console is a React application interfacing with the API server.
Secrets related to external integrations are stored in DynamoDB using KMS-encrypted fields (see the sketch after this list).
The system scales up and down according to load.
All infrastructure is tagged (e.g., resource name, subsystem), enabling effective billing analysis.
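As a rough illustration of the KMS-encrypted-field pattern used for integration secrets, the sketch below encrypts a value with a KMS key before writing it to DynamoDB and decrypts it on read. The table name, key alias, and attribute names are hypothetical; this is not Panther's actual code.

import boto3

kms = boto3.client("kms")
dynamodb = boto3.client("dynamodb")

def store_integration_secret(integration_id, api_token):
    # Encrypt the sensitive field with KMS before it is written to DynamoDB.
    ciphertext = kms.encrypt(
        KeyId="alias/integration-secrets",  # hypothetical key alias
        Plaintext=api_token.encode(),
    )["CiphertextBlob"]
    dynamodb.put_item(
        TableName="integrations",  # hypothetical table name
        Item={
            "integration_id": {"S": integration_id},
            "api_token": {"B": ciphertext},  # only the ciphertext is stored
        },
    )

def read_integration_secret(integration_id):
    item = dynamodb.get_item(
        TableName="integrations",
        Key={"integration_id": {"S": integration_id}},
    )["Item"]
    # KMS identifies the key from metadata embedded in the ciphertext.
    return kms.decrypt(CiphertextBlob=item["api_token"]["B"])["Plaintext"].decode()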
Log Processing computation is implemented with AWS Lambda and Fargate.
For each notification received, the following steps are taken:
The integration source associated with the S3 object is looked up in DynamoDB and the associated role is assumed for reading.
The data is read from S3.
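A minimal sketch of the lookup-assume-read steps above, assuming hypothetical role, bucket, and key names:

import boto3

def read_source_object(role_arn, bucket, key):
    # Assume the read-only role registered for this log source integration.
    creds = boto3.client("sts").assume_role(
        RoleArn=role_arn,
        RoleSessionName="log-processing",
    )["Credentials"]
    # Use the temporary credentials to read the newly delivered S3 object.
    s3 = boto3.client(
        "s3",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    return s3.get_object(Bucket=bucket, Key=key)["Body"].read()

# Usage (placeholder values):
# raw = read_source_object("arn:aws:iam::111122223333:role/log-read", "customer-logs", "2024/01/01/events.json.gz")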
LUTs are created either via the Panther Console or in the CLI workflow (using a YAML specification file). Data for the LUT can be made accessible to Panther in a few ways: uploaded in the Console, included as a file in the CLI configuration, or stored as an S3 object. In general, the most useful way to manage LUT data is as an S3 object reference—you can create S3 objects in your own account, and Panther will poll for changes.
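For the S3-backed approach, publishing LUT data can be as simple as writing newline-delimited JSON to an object in your own account that Panther then polls. The bucket, key, and columns below are placeholders; the data must match the schema declared in the LUT specification.

import json
import boto3

# Hypothetical lookup rows keyed on a "hostname" primary key column.
rows = [
    {"hostname": "db-01.example.com", "owner": "data-eng", "tier": "critical"},
    {"hostname": "web-07.example.com", "owner": "platform", "tier": "standard"},
]

# Write newline-delimited JSON; Panther polls the object for changes.
boto3.client("s3").put_object(
    Bucket="my-lookup-data",               # placeholder bucket in your own account
    Key="lookups/asset-inventory.jsonl",   # placeholder key referenced in the LUT spec
    Body="\n".join(json.dumps(row) for row in rows).encode(),
)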
The streaming detection processor evaluates several types of detections: rules, scheduled rules, and policies (described in more detail below).
Processing data from these sources follows these steps:
For every active Lookup Table, any matches are applied to the p_enrichment field so that the information is available for detections (see the rule sketch after these steps).
All detections associated with the given LogType, cloud resource, or Scheduled Search are found.
Events associated with the detection are written to an S3 object and an S3 notification is sent to an internal SNS topic.
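A minimal Python rule sketch showing how the p_enrichment data applied in these steps might be consumed; the lookup name ("bad_ip_list"), the selector field, and the event fields are hypothetical:

def rule(event):
    # Only consider console logins (hypothetical event shape).
    if event.get("eventName") != "ConsoleLogin":
        return False
    # p_enrichment maps lookup table name -> selector field -> matched row.
    match = event.get("p_enrichment", {}).get("bad_ip_list", {}).get("sourceIPAddress", {})
    return match.get("reputation") == "malicious"

def title(event):
    return f"Console login from flagged IP {event.get('sourceIPAddress')}"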
Queries are run using the read-only database API Lambda. This Lambda has an associated user with read-only permissions.
Queries are asynchronous. When an API request is made to run a query, the associated SQL is executed in Snowflake, and Snowflake returns a queryId. API calls are then made with the queryId to check the status and read the associated results. The status of each query execution is tracked in DynamoDB.
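The asynchronous pattern described above can be sketched with the Snowflake Python connector; connection parameters are omitted, and the DynamoDB status tracking is represented only by comments:

import snowflake.connector

def submit_query(conn, sql):
    cur = conn.cursor()
    cur.execute_async(sql)  # returns as soon as Snowflake accepts the query
    return cur.sfqid        # the queryId whose status would be tracked in DynamoDB

def is_finished(conn, query_id):
    status = conn.get_query_status(query_id)
    return not conn.is_still_running(status)

def read_results(conn, query_id):
    cur = conn.cursor()
    cur.get_results_from_sfqid(query_id)  # attach the cursor to the completed query
    return cur.fetchall()

# Usage sketch:
# conn = snowflake.connector.connect(...)   # credentials retrieved from Secrets Manager
# qid = submit_query(conn, "SELECT 1")
# ... poll is_finished(conn, qid), then read_results(conn, qid)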
To display alerts in the Panther Console, core alert data is retrieved from DynamoDB, while the alert's associated events are retrieved from the Data Lake.
There are special authenticated endpoints for Jira and Slack to "call back" to Panther in order to sync alert state (e.g., to update the status of an alert to Resolved).
There is an internal API server that resolves the requests. Some requests are processed entirely within the API server, while others require one or more calls to other internal services implemented via AWS Lambda functions.
Raw log data flows into Panther from various log sources, including SaaS pullers and Data Transport sources. These raw logs are parsed, filtered, and normalized in the Log Processing subsystem.
The output of Log Processing flows into two subsystems: Detection and the Data Lake.
If enabled, cloud security scanning will scan onboarded cloud infrastructure, then pass the resources it finds into the Detection subsystem.
The Enrichment subsystem optionally adds additional context to the data flowing into the Detection subsystem, which can be used to enhance detection efficacy.
The Detection subsystem applies detections to the following inputs:
From Log Processing: log events
From Scheduled Searches: log events
From cloud security scanning: infrastructure resources
If a detection generates an alert, it is sent to the Alert Delivery subsystem for dispatch to its appropriate alert destination(s) (e.g., Slack, Jira, a webhook). A single alert can be routed to more than one destination.
At the bottom of the diagram, the Control Plane represents the cross-cutting infrastructure responsible for configuring and controlling the subsystems above (the data plane). This is expanded on in the descriptions of each subsystem below. The public API referenced in the upper right corner is the external entry point into the Control Plane.
The principle of least privilege is followed by using minimally scoped IAM roles for each infrastructure component.
All external interactions are conducted using the public API:
The public API exposes GraphQL and REST endpoints.
All API actions are logged as audit logs, which can then be ingested as a log source in Panther.
Panther infrastructure is managed by .
Customers owning their AWS account can integrate it into their larger organization's billing reporting.
Monitoring is performed using a combination of , , and .
All data input into this subsystem is delivered via AWS S3 and S3 notifications. Upstream sources that are not S3-based (e.g., SaaS pullers) have their events aggregated into S3 objects first. These notifications are routed through a master SNS topic. The Log Processing and Event Sampling workflows each subscribe to this SNS topic.
Each event is parsed according to the associated schema for that data type.
If classification or parsing errors arise, system errors are generated and the associated "bad" data is stored in the Data Lake in the classification_failures table.
Filtering and normalization are applied.
Indicator fields (p_any fields) are extracted, and standard fields are inserted (see the example at the end of this section).
Processed events are written as S3 objects and notifications are sent to an internal SNS topic, to which the Detection and Data Lake subsystems are subscribed.
You can optionally configure an alarm for each onboarded log source to alert if traffic stops unexpectedly.
The S3 notifications also route to the Event Sampling subsystem, which is used for automatic schema field discovery. As new attributes are found in the data, they are analyzed and added automatically to the schema (and associated Data Lake tables).
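To make the steps above concrete, a processed event might look roughly like the following; the original fields and values are illustrative, and the p_ prefix marks fields added by Log Processing:

processed_event = {
    # Original (parsed) log fields
    "eventName": "ConsoleLogin",
    "sourceIPAddress": "198.51.100.7",

    # Standard fields inserted during processing
    "p_log_type": "AWS.CloudTrail",
    "p_event_time": "2024-01-01T12:34:56Z",
    "p_parse_time": "2024-01-01T12:35:02Z",

    # Indicator (p_any) fields extracted for fast searching
    "p_any_ip_addresses": ["198.51.100.7"],
}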
Enrichment in Panther is implemented via Lookup Tables (LUTs). A LUT is a table containing data associated with a unique primary key. A LUT also has a mapping from schemas to the primary key, which allows for automatic enrichment in the Detection subsystem. Detections may also use a function call interface to look up data.
IPinfo, for example, is a Panther-managed enrichment provider containing geolocation data. IP addresses in a log event will automatically be enriched with location, ASN, and privacy information. Customers can also create their own Lookup Tables to bring context relevant to their business and security concerns.
The metadata associated with a LUT is stored in DynamoDB. When there is new data, the Lookup Table Processor assumes the specified role from the metadata and processes the S3 data. This creates two outputs: a real-time database in EFS used by the Detection subsystem, and tables in the Data Lake. The tables in the Data Lake can be used by Scheduled Searches to enrich events using joins.
The streaming detection processor allows Python-based detections to run on log events from Log Processing and Scheduled Searches, as well as resources from cloud security scanning. The streaming detection processor runs as an AWS Lambda function (or Fargate instance) optimized for high-speed execution of Python. (The processor is, however, not simply a Python Lambda, although it was in an earlier iteration of Panther's infrastructure. After years of experience, we have learned that a naive Python Lambda implementation is neither efficient nor cost effective.)
Rules: Targeted at one or more log schemas (also called LogTypes)
Scheduled rules: Targeted at the output of one or more Scheduled Searches
Policies: Targeted at resources
Each detection's rule() function is run on the event/resource. If it returns True, the other optional functions are run, and an alert is sent to the Alert Delivery subsystem. For rules and scheduled rules, alerts are only sent for the first detection within the detection's deduplication period.
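A sketch of a rule using some of the optional functions; the event fields, thresholds, and dedup key are illustrative only:

def rule(event):
    # The other functions below only run when rule() returns True.
    return event.get("status") == "FAILED" and event.get("attempts", 0) > 5

def title(event):
    return f"Repeated failures for user {event.get('user', 'unknown')}"

def dedup(event):
    # Matches sharing this string within the deduplication period are grouped,
    # so only the first one produces a new alert.
    return event.get("user", "unknown")

def severity(event):
    return "HIGH" if event.get("attempts", 0) > 20 else "MEDIUM"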
The Data Lake subsystem subscribes to the SNS topic for data ingestion into the rule matches and rule errors tables.
When a Scheduled Search is finished executing, the streaming detection processor Lambda is invoked with a reference to the results of the query. The results are read, and each event is processed according to the steps above.
Data Replay allows for testing of detections on historical data. This is implemented via a "mirror" set of infrastructure that is independent of the live infrastructure.
Panther uses the Snowflake Snowpipe service to ingest data into the Data Lake. This service uses AWS IAM permissions and is therefore not dependent on Snowflake users configured for queries and management. The onboarding of a new data source in Panther triggers the creation of associated tables and Snowpipe infrastructure using the Admin database API Lambda. This Lambda has an associated user with read/write permissions to Panther databases and schemas. Note that there is no direct outside connection to invoke this Lambda; rather, it is driven by the internal Control Plane.
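As a rough sketch of what onboarding a new log type involves on the Snowflake side, the Admin database API Lambda creates a table and an auto-ingest pipe along these lines; the database, table, and stage names are placeholders and the DDL is simplified:

import snowflake.connector

DDL_STATEMENTS = [
    # Placeholder table for a newly onboarded log type.
    "CREATE TABLE IF NOT EXISTS my_database.public.my_log_type (raw VARIANT)",
    # Auto-ingest pipe: S3 event notifications trigger Snowpipe loads.
    """CREATE PIPE IF NOT EXISTS my_database.public.my_log_type_pipe AUTO_INGEST = TRUE AS
       COPY INTO my_database.public.my_log_type
       FROM @my_database.public.ingest_stage/my_log_type/
       FILE_FORMAT = (TYPE = JSON)""",
]

def onboard_log_type(conn):
    cur = conn.cursor()
    for stmt in DDL_STATEMENTS:
        cur.execute(stmt)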
Query results are stored in EFS for 30 days (though this length is configurable). Customers can view the results of past searches in the Panther Console.
Scheduled Searches used by scheduled rules are run via an AWS Step Function. Upon query execution completion, the streaming detection processor is invoked with a reference to the query results for further processing.
When role-based access control (RBAC) is enabled, there is a unique, managed read-only user per role.
Snowflake secrets are stored in AWS Secrets Manager.
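The daily RSA rotation mentioned earlier can be pictured with a sketch along these lines; the secret name is hypothetical, and registering the new public key on the Snowflake user is only indicated in a comment:

import boto3
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

def rotate_snowflake_key(secret_name="panther/snowflake/rsa-key"):  # hypothetical secret name
    # Generate a fresh RSA key pair for Snowflake key-pair authentication.
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    private_pem = private_key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.PKCS8,
        encryption_algorithm=serialization.NoEncryption(),
    )
    public_pem = private_key.public_key().public_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PublicFormat.SubjectPublicKeyInfo,
    )
    # Store the new private key in AWS Secrets Manager.
    boto3.client("secretsmanager").put_secret_value(
        SecretId=secret_name,
        SecretString=private_pem.decode(),
    )
    # The matching public key is then set on the Snowflake user,
    # e.g. ALTER USER ... SET RSA_PUBLIC_KEY = '<public_pem>'.
    return public_pem.decode()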
The Detection subsystem inserts alerts into a DynamoDB table, which the alert dispatch Lambda listens to on a stream. This Lambda delivers alerts to the configured destinations.
The per-detection rate limit is intended to prevent "alert storms", which arise from (likely) misconfigured detections, from overloading your destinations. If more than 1,000 alerts are generated in one hour by the same detection, further alerts are suppressed. (This limit is configurable.) If the limit is reached, the detection continues to run and store events in the Data Lake (so there is no data loss); however, no alerts are created. In this case, a system error alert is generated to notify the customer, who can manually remove the alert suppression in the Console (perhaps after some detection tuning).
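The suppression rule can be pictured as a simple per-detection, per-hour counter; the in-memory dictionary below is purely illustrative, whereas Panther tracks this state internally:

import time
from collections import defaultdict

ALERT_LIMIT_PER_HOUR = 1000  # the configurable default described above
_counters = defaultdict(int)

def should_create_alert(detection_id, now=None):
    """Return True if an alert for this detection may still be created this hour."""
    hour_bucket = int((now or time.time()) // 3600)
    _counters[(detection_id, hour_bucket)] += 1
    # Past the limit: events are still stored in the Data Lake, but no alert is created.
    return _counters[(detection_id, hour_bucket)] <= ALERT_LIMIT_PER_HOUR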
The public API is the entry point for all external interactions with Panther. The Console, GraphQL, and REST clients connect via an AWS ALB. Customers can optionally configure an allowlist for ALB access using IP CIDRs.
API authentication is performed using AWS Cognito. GraphQL and REST clients use API tokens, while the Panther Console uses user sessions managed by AWS Cognito. The Console supports single sign-on (SSO) via AWS Cognito.
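As an illustration of token-based access to the public API, a client call might look like the following; the endpoint path and header name are assumptions, so consult your instance's API documentation for the exact values:

import requests

PANTHER_HOST = "https://YOUR-PANTHER-DOMAIN"  # placeholder
API_TOKEN = "..."                             # API token created in the Panther Console

def run_graphql(query, variables=None):
    # Assumed path and auth header; verify against your deployment's API docs.
    resp = requests.post(
        f"{PANTHER_HOST}/public/graphql",
        json={"query": query, "variables": variables or {}},
        headers={"X-API-Key": API_TOKEN},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()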