System Health Notifications

Panther's System Health notifications alert you if the Panther platform is not functioning correctly.

Overview

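Panther's System Health notifications alert users when a part of the Panther platform is not functioning correctly. This includes the following:

  • Log drop-off alerts

  • Log Classification alerts

  • S3 GetObject errors

  • Alert Delivery failures

  • Cloud Security Scanning failures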

These types of alerts are classified as System Errors in Panther. System Errors always have a CRITICAL severity level and are sent to alert destinations configured to receive System Errors, even if those destinations are not configured to receive alerts with a CRITICAL severity. They are generated automatically, with the exception of log drop-off alarms, which you configure manually for each log source.

It's strongly recommended to configure an alert destination to receive the System Error alert type.

System Error alerts are visible in your Panther Console within Alerts & Errors > System Errors.

How to configure System Health Notification alarms

To ensure that you receive alerts for all types of System Health errors:

  • Configure an alert destination to receive the System Error alert type.

  • Configure log drop-off alarms for your log sources, so that an alert is triggered when data is no longer being received.

    • Note that you do not need to enable alerts for Log Classification errors, Alert Delivery failures, S3 GetObject errors, or Cloud Security Scanning failures. These are generated automatically.

Configuring an Alert Destination for System Health errors

By default, Panther sends System Error alerts to the Alerts page in your Panther Console. It is also strongly recommended to configure one of your alert destinations to receive them.

Alert destinations configured to receive System Errors will receive them even if the destination is not configured to receive alerts with a CRITICAL severity.
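The routing rule above can be modeled with a short sketch. This is an illustrative model only, not Panther's implementation: the `Destination` and `Alert` classes, the `SYSTEM_ERROR` label, and the assumption that other alert types must match on both alert type and severity are all hypothetical.

```python
from dataclasses import dataclass, field

SYSTEM_ERROR = "SYSTEM_ERROR"  # hypothetical label for the System Error alert type


@dataclass
class Destination:
    name: str
    alert_types: set = field(default_factory=set)  # alert types the destination accepts
    severities: set = field(default_factory=set)   # severities the destination accepts


@dataclass
class Alert:
    alert_type: str
    severity: str


def should_deliver(alert: Alert, dest: Destination) -> bool:
    """Return True if the destination should receive the alert."""
    if alert.alert_type not in dest.alert_types:
        return False
    if alert.alert_type == SYSTEM_ERROR:
        # System Errors bypass the destination's severity filter.
        return True
    # Assumption: every other alert type must also match on severity.
    return alert.severity in dest.severities


# A destination that accepts System Errors but only HIGH severity still
# receives a System Error, which always has CRITICAL severity.
slack = Destination("slack", alert_types={SYSTEM_ERROR}, severities={"HIGH"})
print(should_deliver(Alert(SYSTEM_ERROR, "CRITICAL"), slack))  # True
```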

To ensure these alerts are sent to a custom Alert Destination, follow the steps below:

  1. Log in to your Panther Console.

  2. In the left sidebar navigation, click Configure > Alert Destinations.

  3. Choose an existing Alert Destination or add a new Alert Destination.

  4. On the configuration page for the Alert Destination, add System Errors to the Alert Types section.

Configuring log drop-off alarms for log sources

Panther allows you to set up event threshold alarms for individual log sources, which will trigger an alert if data is not received over a specific time interval.

For example, if you set the threshold to 15 minutes, you will receive an alert if no events are processed for 15 minutes.

This can be useful for log sources that have been incorrectly linked to Panther or are experiencing issues outside of Panther.

Note: The alert is only sent one time; there is no re-notification for event threshold.
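The threshold behavior described above, including the one-time notification, can be sketched as a small check. This is a simplified model under stated assumptions: the function name, its parameters, and the choice to treat "exactly at the threshold" as a breach are hypothetical and are not taken from Panther.

```python
from datetime import datetime, timedelta


def drop_off_alert_due(
    last_event_at: datetime,
    now: datetime,
    threshold: timedelta,
    already_alerted: bool,
) -> bool:
    """Return True if a log drop-off alert should be sent now."""
    if already_alerted:
        # The alert is sent only once; there is no re-notification.
        return False
    # Assumption: reaching the threshold exactly counts as a breach.
    return now - last_event_at >= threshold


last_seen = datetime(2024, 1, 1, 12, 0)
threshold = timedelta(minutes=15)
print(drop_off_alert_due(last_seen, last_seen + timedelta(minutes=10), threshold, False))  # False
print(drop_off_alert_due(last_seen, last_seen + timedelta(minutes=20), threshold, False))  # True
print(drop_off_alert_due(last_seen, last_seen + timedelta(minutes=60), threshold, True))   # False
```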

You can add an alarm to a new or an existing log source:

Setting up an alarm for a new log source

  1. In the left-hand navigation bar of your Panther Console, click Configure > Log Sources.

  2. In the upper-right corner, click Create New.

  3. Complete each step of the onboarding workflow.

  4. On the success page at the end of the onboarding workflow, the Trigger an alert when no events are processed setting defaults to YES. Leave this enabled.

    • Enter your desired time period by filling in the Number and Period fields next to How long should Panther wait before it sends you an alert that no events have been processed?

Types of System Errors

Log Source Health notifications

Panther performs health checks on log sources to ensure that Panther is correctly linked to the source, has the right credentials, and is receiving data from the source consistently.

Log drop-off alerts

Panther allows you to set up event threshold alarms for individual log sources, which will trigger an alert if data is not received over a specific time interval. For instructions on enabling these alerts, see the section above: Configuring log drop-off alarms for log sources.

It is not possible to set up a log drop-off alarm for Panther audit logs when they are enabled as a log source.

Log Classification alerts

Log classification alerts are generated when logs sent to Panther hit a parsing error and fail to classify. When this happens, the following actions take place by default:

  • Logs that failed to classify are sent to the data lake and are searchable in a table called classification_failures in the panther_monitor database.

  • An alert is generated immediately after the first log fails to classify. The alert will display all log lines that are failing to classify.

An alert's details page in the Panther Console highlights the log lines that fail to parse correctly, to help you determine which lines in the log type's respective schemas need to be corrected or added.

The alert includes a link to the respective log source's Log Source Ops page where you can view the rate at which events are failing to classify within the Health tab.

Remediate Classification Failures

After a source has received classification errors for a set of events, you need to identify which of the source's schemas failed and why. You can find this information either on the Health tab of the Log Source Operations page or directly from the Data Explorer in a table called classification_failures in the panther_monitor database.

Common causes for Classification Failures include:

  • A field tagged as required is missing from some of the incoming data

  • A field is tagged as int, but a string was received

  • A timestamp field has the wrong format definition
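The three causes above can be illustrated with a small validation sketch. It is not Panther's parser: the `SCHEMA` layout, the field names, and the `find_failures` helper are hypothetical, and the type check is deliberately strict to make each failure mode visible.

```python
from datetime import datetime

# Hypothetical schema: field name -> type, required flag, and (for timestamps) a format.
SCHEMA = {
    "user": {"type": "string", "required": True},
    "count": {"type": "int", "required": False},
    "time": {"type": "timestamp", "required": True, "format": "%Y-%m-%dT%H:%M:%SZ"},
}


def find_failures(event: dict, schema: dict) -> list:
    """Return a list of reasons why the event does not fit the schema."""
    failures = []
    for name, spec in schema.items():
        if name not in event:
            if spec.get("required"):
                failures.append(f"{name}: required field is missing")
            continue
        value = event[name]
        if spec["type"] == "int":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                failures.append(f"{name}: expected int, got {type(value).__name__}")
        elif spec["type"] == "timestamp":
            try:
                datetime.strptime(value, spec["format"])
            except (TypeError, ValueError):
                failures.append(f"{name}: does not match format {spec['format']}")
    return failures


bad_event = {"count": "seven", "time": "01/01/2024"}
for reason in find_failures(bad_event, SCHEMA):
    print(reason)
```

Running the sketch on `bad_event` reports one reason per cause: the missing required `user` field, the string value in the `int` field, and the timestamp that does not match the declared format.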

After you identify the reason for the failure and the schema the failing events belong to, update the definition of the failing field(s) in that schema. The schema changes should be reflected in your sources automatically.

As a last step, mark the alarm on that source as "Resolved" in the Log Source Operations page.

S3 GetObject Error Notifications

S3 GetObject error alerts are generated when Panther fails to fetch S3 objects. When this happens, the following actions take place by default:

  • Panther stores the S3 objects in the data lake, where they can be queried through the Data Explorer in a table titled panther_monitor.data_audit.

  • An alert is generated if Panther fails to fetch any S3 object in the last 24 hours. The alert displays the specific S3 objects that are failing.

Alert Delivery Failure

Alert Delivery Failure alerts are generated when Panther fails to deliver an alert to a destination.

If the initial attempt to deliver an alert fails, Panther automatically attempts to re-deliver it. After breaching a certain threshold of alert delivery failures, a system health alert is generated and sent to any alert destinations configured to receive System Error alerts.
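The retry-then-escalate behavior can be sketched as follows. The threshold of three attempts, the `send` callable, and the function name are hypothetical; this page does not state Panther's actual threshold or retry schedule, and a real system would typically wait between attempts.

```python
from typing import Callable

MAX_ATTEMPTS = 3  # hypothetical value; the real threshold is not stated here


def deliver_with_retries(send: Callable[[], bool], max_attempts: int = MAX_ATTEMPTS) -> bool:
    """Attempt delivery up to max_attempts times.

    Returns True as soon as one attempt succeeds. Returns False once the
    failure threshold is reached, which is the point at which a system
    health alert would be generated.
    """
    for _ in range(max_attempts):
        if send():
            return True
    return False


# Example: a destination that is always unreachable breaches the threshold.
if not deliver_with_retries(lambda: False):
    print("Delivery failed repeatedly; a System Error alert would be generated")
```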

Cloud Security Scanning Failure

Cloud Security Scanning Failure alerts are generated when Panther fails to scan a cloud resource because of an "access denied" error.

This occurs when permissions are not configured properly to allow scanning to occur. This is most commonly caused by one of the following scenarios:

  • Our scanning role (PantherAuditRole) is not configured with sufficient permissions.

    • This is an extremely rare case as the permissions of this role rarely change. This can be resolved by updating the PantherAuditRole to the latest version.

  • An AWS Organizations service control policy (SCP) is preventing our scanning role from carrying out scans.

    • This commonly occurs with SCPs that restrict certain regions or services. This can be resolved by either modifying the SCP to add an exception for our scanning role, or by modifying the Cloud Security integration to exclude certain regions or resource types.

  • An AWS resource-based policy is preventing our scanning role from carrying out scans.

    • In AWS, permissions are bidirectional. The PantherAuditRole may be granted permission to access a resource, but the resource itself may not grant permission to be accessed by our role. This can be resolved by either modifying the resource-based policy to add an exception for our scanning role, or by modifying the Cloud Security integration to exclude certain resources or resource types.

The alert indicates which resource the scan failed on and the AWS error that caused the scan to fail.

You can use this information to pinpoint the exact permissions issue. For example, if the AWS error states that no resource-based policy allows the kms:ListResourceTags action, the issue is related to a resource-based policy.
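As a rough aid for triage, the AWS error text can be mapped to the three scenarios listed earlier. This is a sketch, not part of Panther: the `likely_cause` helper is hypothetical, and it assumes the error contains the policy-type phrases that AWS commonly includes in access denied messages, whose exact wording can vary by service.

```python
def likely_cause(aws_error: str) -> str:
    """Guess which scenario an AWS access denied message points to."""
    msg = aws_error.lower()
    if "service control policy" in msg:
        return "service control policy (SCP)"
    if "resource-based policy" in msg:
        return "resource-based policy"
    if "identity-based policy" in msg:
        return "scanning role permissions"
    return "unknown"


# Illustrative message fragment, not copied from a real alert.
error = "not authorized because no resource-based policy allows the kms:ListResourceTags action"
print(likely_cause(error))  # resource-based policy
```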
