System Health Notifications
Panther's System Health notifications alert you if the Panther platform is not functioning correctly
Overview
Panther's System Health notifications alert users when a part of the Panther platform is not functioning correctly. This includes the following:
Log Source Health Notifications
Log sources turning unhealthy as a result of a failed health check
Logs dropping off entirely from a log source
Alerts failing to deliver to the Alert Destination
Logs failing to classify
Panther failing to fetch S3 objects
Cloud Security Scanning Failure
Panther failing to scan a cloud resource because of an "access denied" error
These types of alerts are classified as System Errors in Panther. System Errors always have a CRITICAL severity level and are sent to alert destinations configured to receive System Errors, even if those destinations are not configured to receive alerts with a CRITICAL severity. They are generated automatically, with the exception of log drop-off alarms, which you configure manually per log source.
It's strongly recommended to configure an alert destination to receive the System Error alert type.
System Error alerts are visible in your Panther Console within Alerts & Errors > System Errors.
How to configure System Health Notification alarms
To ensure that you receive alerts for all types of System Health errors:
Configure an alert destination to receive the System Error alert type.
Configure log drop-off alarms for log sources, which trigger an alert when data is no longer being received.
Note that you do not need to enable alerts for log classification errors, alert delivery failures, S3 GetObject errors, or Cloud Security scanning failures; these are generated automatically.
Configuring an Alert Destination for System Health errors
By default, Panther sends System Error alerts to the Alerts page in your Panther Console. It is also strongly recommended to configure one of your alert destinations to receive them.
Alert destinations configured to receive System Errors will receive them even if the destination is not configured to receive alerts with a CRITICAL severity.
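The routing rule described above can be sketched in a few lines. This is only an illustrative model, not Panther's implementation: the `Destination` and `Alert` classes, the `SYSTEM_ERROR` and `RULE` type names, and the assumption that other alert types must match both the type and severity filters are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Destination:
    """Hypothetical model of an alert destination's filters."""
    name: str
    severities: set = field(default_factory=set)
    alert_types: set = field(default_factory=set)


@dataclass
class Alert:
    """Hypothetical model of an alert."""
    alert_type: str
    severity: str


def should_deliver(alert: Alert, dest: Destination) -> bool:
    # The destination must be configured for this alert type at all.
    if alert.alert_type not in dest.alert_types:
        return False
    # System Errors bypass the severity filter, as described above.
    if alert.alert_type == "SYSTEM_ERROR":
        return True
    # Assumption: every other alert type must also match the severity filter.
    return alert.severity in dest.severities
```

With a destination configured for `HIGH` severity only but with the System Error alert type enabled, a CRITICAL System Error is still delivered, while a CRITICAL alert of another type is not.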
To ensure these alerts are sent to a custom Alert Destination, follow the steps below:
Log in to your Panther Console.
On the left sidebar navigation, click Configure > Alert Destinations.
Choose an existing Alert Destination or add a new Alert Destination.
On the configuration page for the Alert Destination, add System Errors to the Alert Types section.
Configuring log drop-off alarms for log sources
Panther allows you to set up event threshold alarms for individual log sources, which will trigger an alert if data is not received over a specific time interval.
For example, if you configure the threshold to 15 minutes, then you will receive an alert if no events are processed in 15 minutes.
This can be useful for log sources that have been incorrectly linked to Panther or are experiencing issues outside of Panther.
Note: The alert is only sent one time; there is no re-notification for event threshold.
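To make the threshold behavior concrete, here is a minimal sketch of a drop-off check that fires once and does not re-notify. The `DropOffAlarm` class is hypothetical, and two details are assumptions rather than documented behavior: that the alarm fires when the gap reaches the threshold exactly, and that it re-arms once new events arrive.

```python
from datetime import datetime, timedelta


class DropOffAlarm:
    """Hypothetical model of an event threshold alarm for one log source."""

    def __init__(self, threshold: timedelta):
        self.threshold = threshold
        self.last_event_at = None
        self.alerted = False

    def record_event(self, at: datetime) -> None:
        self.last_event_at = at
        # Assumption: the alarm re-arms when data starts flowing again.
        self.alerted = False

    def check(self, now: datetime) -> bool:
        """Return True only the first time the threshold is breached."""
        if self.last_event_at is None or self.alerted:
            return False
        if now - self.last_event_at >= self.threshold:
            self.alerted = True  # one alert only, no re-notification
            return True
        return False
```

With a 15-minute threshold, a check 10 minutes after the last event returns `False`, a check at 15 minutes returns `True`, and later checks return `False` until new events are recorded.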
You can add an alarm to a new or an existing log source:
Setting up an alarm for a new log source
In the left-hand navigation bar of your Panther Console, click Configure > Log Sources.
In the upper-right corner, click Create New.
Complete each step of the onboarding workflow.
See Data Sources and Transports for specific setup instructions by source.
On the success page at the end of the onboarding workflow, the Trigger an alert when no events are processed setting defaults to YES. Leave this enabled.
Enter your desired time period by filling in the Number and Period fields next to How long should Panther wait before it sends you an alert that no events have been processed?.
Types of System Errors
Log Source Health notifications
Panther performs health checks on log sources to ensure that Panther is correctly linked to the source, has the right credentials, and is receiving data from the source consistently.
Log drop-off alerts
Panther allows you to set up event threshold alarms for individual log sources, which will trigger an alert if data is not received over a specific time interval. For instructions on enabling these alerts, see the section above: Configuring log drop-off alarms for log sources.
It is not possible to set up a log drop-off alarm for Panther audit logs when they are enabled as a log source.
Log Classification alerts
Log classification alerts are generated when logs sent to Panther hit a parsing error and fail to classify. When this happens, the following actions take place by default:
Logs that failed to classify are sent to the data lake and are searchable in a table called classification_failures in the panther_monitor database.
An alert is generated immediately after the first log fails to classify. The alert displays all log lines that are failing to classify.
An alert's details page in the Panther Console highlights the log lines that fail to parse correctly, to help you determine which lines in the log type's respective schemas need to be corrected or added.
The alert includes a link to the respective log source's Log Source Ops page where you can view the rate at which events are failing to classify within the Health tab.
Remediate Classification Failures
After a source has received classification errors for a set of events, you will need to identify which schema of your source has failed and for what reason. You can find this information either on the Health tab of the Log Source Operations page or directly from the Data Explorer in a table called classification_failures in the panther_monitor database.
Common causes for Classification Failures include:
A field tagged as required is missing from some of the incoming data
A field is tagged as int but a string was received
A timestamp field has the wrong format definition
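The three causes above can be reproduced with a small validator. This is a sketch for reasoning about failures, not Panther's classifier or schema syntax: the `SCHEMA` dictionary layout, the field names, and the `classification_errors` function are hypothetical.

```python
from datetime import datetime

# Hypothetical schema: field name -> type, required flag, timestamp format.
SCHEMA = {
    "user_id": {"type": "int", "required": True},
    "action": {"type": "string", "required": True},
    "event_time": {
        "type": "timestamp",
        "required": True,
        "format": "%Y-%m-%dT%H:%M:%SZ",
    },
}


def classification_errors(event: dict, schema: dict = SCHEMA) -> list:
    """Return human-readable reasons why an event does not fit the schema."""
    errors = []
    for name, spec in schema.items():
        if name not in event:
            if spec.get("required"):
                errors.append(f"{name}: required field is missing")
            continue
        value = event[name]
        if spec["type"] == "int":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                errors.append(f"{name}: expected int, got {type(value).__name__}")
        elif spec["type"] == "string":
            if not isinstance(value, str):
                errors.append(f"{name}: expected string, got {type(value).__name__}")
        elif spec["type"] == "timestamp":
            try:
                datetime.strptime(value, spec["format"])
            except (TypeError, ValueError):
                errors.append(
                    f"{name}: does not match timestamp format {spec['format']}"
                )
    return errors
```

Running a few failing log lines through a check like this can help narrow down whether the fix is to relax a required field, change a field's type, or correct a timestamp format.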
After you identify the reason and the schema to which the failing events should belong, update the failing field(s) in that schema. The schema changes should be reflected in your sources automatically.
As a last step, mark the alarm on that source as "Resolved" in the Log Source Operations page.
S3 GetObject Error Notifications
S3 GetObject error alerts are generated when Panther fails to fetch S3 objects. When this happens, the following actions take place by default:
Panther stores the S3 objects in the data lake, where they can be queried through the Data Explorer in a table titled panther_monitor.data_audit.
An alert is generated if Panther fails to fetch any S3 object in the last 24 hours. The alert displays the specific S3 objects that are failing.
Alert Delivery Failure
Alert Delivery Failure alerts are generated when Panther fails to deliver an alert to a destination.
If the initial attempt to deliver an alert fails, Panther automatically attempts to re-deliver it. After breaching a certain threshold of alert delivery failures, a system health alert is generated and sent to any alert destinations configured to receive System Error alerts.
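The retry-then-escalate flow can be sketched as follows. The threshold is not specified here, so `max_attempts=3` is a placeholder, and `deliver_with_retries` and its return shape are hypothetical names used only for illustration.

```python
def deliver_with_retries(send, alert, max_attempts=3):
    """Try to deliver an alert; flag a system error once attempts run out.

    `send` is any callable that returns True on successful delivery.
    `max_attempts` is a placeholder; the real threshold is not documented here.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            if send(alert):
                return {"delivered": True, "attempts": attempt, "system_error": False}
        except Exception:
            # Treat an exception the same as a failed delivery and retry.
            pass
    # Threshold breached: this is the point where a System Error would be raised.
    return {"delivered": False, "attempts": max_attempts, "system_error": True}
```

A destination that recovers on the second attempt produces no System Error, while one that keeps failing does.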
Cloud Security Scanning Failure
Cloud Security Scanning Failure alerts are generated when Panther fails to scan a cloud resource because of an "access denied" error.
This occurs when permissions are not configured properly to allow scanning to occur. This is most commonly caused by one of the following scenarios:
Our scanning role (PantherAuditRole) is not configured with sufficient permissions.
This is an extremely rare case, as the permissions of this role rarely change. This can be resolved by updating the PantherAuditRole to the latest version.
An AWS Organizations Service Control Policy (SCP) is preventing our scanning role from carrying out scans.
Commonly this occurs with SCPs that restrict certain regions or services. This can be resolved by either modifying the SCP to add an exception for our scanning role, or by modifying the Cloud Security integration to exclude certain regions or resource types.
An AWS resource-based policy is preventing our scanning role from carrying out scans.
In AWS, permissions are bidirectional. The PantherAuditRole may be granted permission to access a resource, but the resource itself may not grant permission to be accessed by our role. This can be resolved by either modifying the resource-based policy to add an exception for our scanning role, or by modifying the Cloud Security integration to exclude certain resources or resource types.
The alert indicates which resource the scan failed on and the AWS error that caused the scan to fail.
You can use this information to pinpoint the exact permissions issue. For example, if the error states that no resource-based policy allows the kms:ListResourcetags action, the issue is related to a resource-based policy.
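As a rough triage aid, the AWS error text can be mapped to one of the three scenarios above. This sketch assumes the error message names the policy type that blocked the request, as in the resource-based policy example; the `likely_cause` function and its return labels are hypothetical.

```python
def likely_cause(aws_error: str) -> str:
    """Guess which scenario an AWS 'access denied' message points to."""
    msg = aws_error.lower()
    if "service control policy" in msg:
        # An SCP in AWS Organizations is blocking the scanning role.
        return "service_control_policy"
    if "resource-based policy" in msg:
        # The resource's own policy does not allow the scanning role.
        return "resource_based_policy"
    if "identity-based policy" in msg:
        # The scanning role itself lacks the permission.
        return "scanning_role_permissions"
    return "unknown"
```

An `unknown` result means the message did not name a policy type, in which case the full error and the affected resource in the alert are the best starting point.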