# Data Models

## Overview

Data Models provide a way to configure a set of unified fields across all log types.

Suppose you want to check for a particular source IP address in all events that log network traffic. These LogTypes might span not only different categories (DNS, Zeek, Apache, etc.) but also different vendors. Without a common logging standard, each LogType may represent the source IP under a different name, such as `ipAddress`, `srcIP`, or `ipaddr`. The more LogTypes you want to monitor, the more complex and cumbersome this simple check becomes:

```python
(event.get('ipAddress') == '127.0.0.1' or
 event.get('srcIP') == '127.0.0.1' or
 event.get('ipaddr') == '127.0.0.1')
```

If instead we define a Data Model for each of these LogTypes, we can translate the unified data model field name to the LogType field name and our logic simplifies to:

```python
event.udm('source_ip') == '127.0.0.1'
```
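
To make the translation concrete, the following is a minimal, self-contained sketch of the idea, not Panther's implementation. The `FIELD_MAPPINGS` table, the `Example.*` log type names, and the `udm` helper function are all hypothetical and exist only for illustration; in Panther, the lookup is performed by `event.udm()` using the Data Model enabled for the event's log type.

```python
# Hypothetical per-log-type mappings: unified field name -> log-specific field name.
FIELD_MAPPINGS = {
    "Example.Dns": {"source_ip": "ipAddress"},
    "Example.Proxy": {"source_ip": "srcIP"},
    "Example.Web": {"source_ip": "ipaddr"},
}


def udm(event, unified_name):
    """Return the value of a unified field, or None if no mapping applies."""
    mapping = FIELD_MAPPINGS.get(event.get("p_log_type"), {})
    field = mapping.get(unified_name)
    return event.get(field) if field else None


events = [
    {"p_log_type": "Example.Dns", "ipAddress": "127.0.0.1"},
    {"p_log_type": "Example.Proxy", "srcIP": "10.0.0.8"},
    {"p_log_type": "Example.Web", "ipaddr": "127.0.0.1"},
]

# One check now covers every log type that has a mapping.
matches = [udm(e, "source_ip") == "127.0.0.1" for e in events]
print(matches)
```

Adding another log type in this sketch only requires a new entry in the mapping table; the detection logic itself does not change.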

### Built-in Data Models

By default, Panther comes with built-in data models for several log types, such as `AWS.S3ServerAccess`, `AWS.VPCFlow`, and `Okta.SystemLog`. All currently supported data models can be found [in the panther-analysis repository](https://github.com/panther-labs/panther-analysis/tree/master/data_models).

## How to add Data Models

New Data Models are added in the Panther Console or via the [Panther Analysis Tool](https://docs.panther.com/~/changes/15ann7vKLltCCAGHtdQr/panther-developer-workflows/ci-cd/deployment-workflows/pat). Each log type can have only one enabled Data Model. To change or update an existing Data Model, disable it and create a new, enabled one.

{% tabs %}
{% tab title="Panther Console" %}
To create a new Data Model in the Panther Console:

1. Log in to your Panther Console and navigate to **Build > Data Models**. \
   ![The list of Data Models in the Panther Console is displayed](https://4011785613-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LgdiSWdyJcXPahGi9Rs-2910905616%2Fuploads%2F3kEKCfFeuXl65oaXhAMx%2FScreen%20Shot%202022-08-02%20at%2012.01.19%20PM.png?alt=media\&token=eb76cf98-a41d-45f4-a4e8-0b4ff7f74d23)
2. In the upper right corner, click **Create New**.
3. Fill in the fields under Settings and Data Model Mappings.\
   ![In the Panther Console, the New Data Model screen is displayed. It contains fields for Display Name, ID, and Log Type. Under "Data Model Mappings" there are fields for Name, Field Path, and Field Method.](https://4011785613-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LgdiSWdyJcXPahGi9Rs-2910905616%2Fuploads%2FGU6I5QRP29hXYJyln3RG%2FScreen%20Shot%202022-08-02%20at%2012.03.34%20PM.png?alt=media\&token=f70e7eb2-ed41-4f07-8524-55f8f5814be3)
4. In the upper right corner, click **Save**.

You can now access this Data Model in your rule logic with the `event.udm()` method.
{% endtab %}

{% tab title="Panther Analysis Tool" %}

### How to create a Data Model using PAT

#### Folder setup

All files related to your custom Data Models must be stored in a folder with a name containing `data_models` (this could be a top-level `data_models` directory, or sub-directories with names matching `*data_models*`).

1. Create your Data Model specification file (e.g. `data_models/aws_cloudtrail_datamodel.yml`):

   ```yaml
   AnalysisType: datamodel
   LogTypes: 
     - AWS.CloudTrail
   DataModelID: AWS.CloudTrail
   Filename: aws_cloudtrail_datamodel.py
   Enabled: true
   Mappings:
     - Name: actor_user
       Path: $.userIdentity.userName
     - Name: event_type
       Method: get_event_type
     - Name: source_ip
       Path: sourceIPAddress
     - Name: user_agent
       Path: userAgent
   ```

2. If any mapping defines a `Method`, create the associated Python file (`data_models/aws_cloudtrail_datamodel.py`), as shown below.\
   **Note**: The `Filename` specification field is required if any mapping defines a `Method`. If no mapping uses `Method`, no Python file is required.

   ```python
   from panther_base_helpers import deep_get


   def get_event_type(event):
       # Only classify console logins performed by IAM users
       if event.get('eventName') == 'ConsoleLogin' and deep_get(event, 'userIdentity', 'type') == 'IAMUser':
           # deep_get tolerates a missing or null responseElements object
           login_result = deep_get(event, 'responseElements', 'ConsoleLogin')
           if login_result == 'Failure':
               return "failed_login"
           if login_result == 'Success':
               return "successful_login"
       return None

3. Use this Data Model in a rule:
   1. Add the LogType under the Rule specification `LogTypes` field.
   2. Add the LogType to all of the Rule's `Tests`, in the `p_log_type` field.
   3. Leverage the `event.udm()` method in the Rule's Python logic:

```yaml
AnalysisType: rule
DedupPeriodMinutes: 60
DisplayName: DataModel Example Rule
Enabled: true
Filename: my_new_rule.py
RuleID: DataModel.Example.Rule
Severity: High
LogTypes:
  # Add LogTypes where this rule is applicable
  # and a Data Model exists for that LogType
  - AWS.CloudTrail
Tags:
  - Tags
Description: >
  This rule exists to validate the CLI workflows of the Panther CLI
Runbook: >
  First, find out who wrote this spec format, then notify them with feedback.
Tests:
  - Name: test rule
    ExpectedResult: true
    # Add the LogType to the test specification in the 'p_log_type' field
    Log: {
      "p_log_type": "AWS.CloudTrail"
    }
```

```python
def rule(event):
    # filter events on unified data model field
    return event.udm('event_type') == 'failed_login'


def title(event):
    # use unified data model field in title
    return '{}: User [{}] from IP [{}] has exceeded the failed logins threshold'.format(
        event.get('p_log_type'), event.udm('actor_user'),
        event.udm('source_ip'))
```
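
Before uploading, it can help to exercise a mapping method against sample events. The sketch below is a local check under stated assumptions: it restates `get_event_type` in a form equivalent to the method in step 2 and substitutes a simplified, hypothetical `deep_get` for the one in `panther_base_helpers` so that it runs outside Panther. The sample events are invented and contain only the fields the method reads.

```python
def deep_get(dictionary, *keys, default=None):
    """Simplified local stand-in for panther_base_helpers.deep_get (assumption)."""
    current = dictionary
    for key in keys:
        if not isinstance(current, dict):
            return default
        current = current.get(key)
    return default if current is None else current


def get_event_type(event):
    # Equivalent to the data model method in step 2.
    if event.get('eventName') == 'ConsoleLogin' and deep_get(event, 'userIdentity', 'type') == 'IAMUser':
        result = deep_get(event, 'responseElements', 'ConsoleLogin')
        if result == 'Failure':
            return 'failed_login'
        if result == 'Success':
            return 'successful_login'
    return None


# Invented sample events for illustration only.
failed = {
    'eventName': 'ConsoleLogin',
    'userIdentity': {'type': 'IAMUser'},
    'responseElements': {'ConsoleLogin': 'Failure'},
}
unrelated = {'eventName': 'ListBuckets', 'userIdentity': {'type': 'IAMUser'}}

print(get_event_type(failed))
print(get_event_type(unrelated))
```

Events that the method does not recognize yield `None`, so a rule comparing `event.udm('event_type')` against `'failed_login'` simply does not match them.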

See [Data Model Specification Reference](#datamodel-specification-reference) below for a complete list of required and optional fields.
{% endtab %}
{% endtabs %}

## Using Data Models

### Using Data Models in rules

To use your Data Model in a rule:

* Add the LogType under the Rule specification `LogTypes` field.
* Add the LogType to all of the Rule's `Tests`, in the `p_log_type` field.
* Leverage the `event.udm()` method in the Rule's Python logic:

  ```python
  def rule(event):
      # filter events on unified data model field
      return event.udm('event_type') == 'failed_login'


  def title(event):
      # use unified data model field in title
      return '{}: User [{}] from IP [{}] has exceeded the failed logins threshold'.format(
          event.get('p_log_type'), event.udm('actor_user'),
          event.udm('source_ip'))
  ```

See [examples of Data Models in Panther's GitHub repository](https://github.com/panther-labs/panther-analysis/tree/master/data_models).

### Leveraging existing Data Models

Rules can be updated to use unified data model field names by leveraging the `event.udm()` method. For example:

```python
# DMZ_NETWORK is assumed to be a collection of IP address strings defined elsewhere in this rule
def rule(event):
    return event.udm('source_ip') in DMZ_NETWORK


def title(event):
    return 'Suspicious request originating from ip: ' + event.udm('source_ip')
```

Update the rule specification to include the pertinent LogTypes:

```yaml
AnalysisType: rule
Filename: example_rule.py
Description: A rule that uses datamodels
Severity: High
RuleID: Example.Rule
Enabled: true
LogTypes:
  - Logtype.With.DataModel
  - Another.Logtype.With.DataModel
```
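
The rule above references `DMZ_NETWORK` without defining it. If it were a CIDR range rather than a collection of address strings, a plain `in` test against a string would not work, because network objects from Python's standard `ipaddress` module expect address objects. The following is a minimal sketch under that assumption; the `10.1.0.0/24` range and the `in_dmz` helper are hypothetical and chosen only for illustration.

```python
import ipaddress

# Hypothetical DMZ range, chosen only for illustration.
DMZ_NETWORK = ipaddress.ip_network("10.1.0.0/24")


def in_dmz(source_ip):
    """Return True if source_ip parses as an address inside DMZ_NETWORK."""
    try:
        return ipaddress.ip_address(source_ip) in DMZ_NETWORK
    except ValueError:
        # Covers None, empty strings, and malformed addresses.
        return False


print(in_dmz("10.1.0.42"))
print(in_dmz("192.168.1.1"))
print(in_dmz(None))
```

With a helper like this, the rule body would become `return in_dmz(event.udm('source_ip'))`, which also tolerates events where the unified field resolves to no value.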

### Using Data Models with Enrichment

Panther provides a built-in method on the event object called `event.udm_path`. It returns the original field path that the Data Model maps the given unified field to.

#### AWS.VPCFlow logs example

Using `event.udm_path('destination_ip')` will return `'dstAddr'`, since this is the path defined in the Data Model for that log type.

The following example uses `event.udm_path`:

```python
from panther_base_helpers import deep_get

def rule(event):
    return True

def title(event):
    return event.udm_path('destination_ip')

def alert_context(event):
    enriched_data = deep_get(event, 'p_enrichment', 'lookup_table_name', event.udm_path('destination_ip'))
    return {'enriched_data': enriched_data}
```

This test case was used:

```json
{
  "p_log_type": "AWS.VPCFlow",
  "dstAddr": "1.1.1.1",
  "p_enrichment": {
    "lookup_table_name": {
      "dstAddr": {
        "datakey": "datavalue"
      }
    }
  }
}
```

The test case returns an alert that includes Alert Context with the `datakey` and `datavalue`:

<figure><img src="https://4011785613-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LgdiSWdyJcXPahGi9Rs-2910905616%2Fuploads%2FoWEqQLbegzfPNgWlQNeU%2FMock%20Testing.png?alt=media&#x26;token=69ef559e-96ae-4429-858e-5a8658be6bff" alt="The screen shot shows a passing test in the Panther Console including the alert context with the data key and data value"><figcaption></figcaption></figure>
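
The lookup performed in `alert_context` can be traced outside Panther with a small simulation. In the sketch below, `FakeEvent` is a hypothetical stand-in for Panther's event object, its `udm_path` method only mimics the behavior described above for a single assumed mapping, and `deep_get` is a simplified local substitute for the helper in `panther_base_helpers`.

```python
class FakeEvent(dict):
    """Hypothetical stand-in for Panther's event object, for local illustration."""

    # Assumed mapping, mirroring the AWS.VPCFlow example above.
    _paths = {"destination_ip": "dstAddr"}

    def udm_path(self, name):
        return self._paths.get(name)


def deep_get(dictionary, *keys, default=None):
    """Simplified local stand-in for panther_base_helpers.deep_get (assumption)."""
    current = dictionary
    for key in keys:
        if not isinstance(current, dict):
            return default
        current = current.get(key)
    return default if current is None else current


def alert_context(event):
    # The enrichment data is keyed by the original field path, not the unified name.
    path = event.udm_path("destination_ip")
    enriched = deep_get(event, "p_enrichment", "lookup_table_name", path)
    return {"enriched_data": enriched}


event = FakeEvent({
    "p_log_type": "AWS.VPCFlow",
    "dstAddr": "1.1.1.1",
    "p_enrichment": {"lookup_table_name": {"dstAddr": {"datakey": "datavalue"}}},
})

print(event.udm_path("destination_ip"))
print(alert_context(event))
```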

## DataModel Specification Reference

A complete list of DataModel specification fields:

<table data-header-hidden><thead><tr><th width="184">Field Name</th><th width="150">Required</th><th width="189">Description</th><th>Expected Value</th></tr></thead><tbody><tr><td>Field Name</td><td>Required</td><td>Description</td><td>Expected Value</td></tr><tr><td><code>AnalysisType</code></td><td>Yes</td><td>Indicates whether this specification is defining a rule, policy, data model, or global</td><td><code>datamodel</code></td></tr><tr><td><code>DataModelID</code></td><td>Yes</td><td>The unique identifier of the data model</td><td>String</td></tr><tr><td><code>DisplayName</code></td><td>No</td><td>What name to display in the UI and alerts. The <code>DataModelID</code> will be displayed if this field is not set.</td><td>String</td></tr><tr><td><code>Enabled</code></td><td>Yes</td><td>Whether this data model is enabled</td><td>Boolean</td></tr><tr><td><code>Filename</code></td><td>No (required if any mapping uses <code>Method</code>)</td><td>The path (with file extension) to the Python DataModel body</td><td>String</td></tr><tr><td><code>LogTypes</code></td><td>Yes</td><td>The log type this data model applies to</td><td>Singleton List of strings<br>Note: Although <code>LogTypes</code> accepts a list of strings, you can only specify 1 log type per Data Model.</td></tr><tr><td><code>Mappings</code></td><td>Yes</td><td>Mapping from source field name or method to unified data model field name</td><td>List of Maps</td></tr></tbody></table>
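
The requirements in the table can be expressed as a quick pre-upload check. This is a minimal sketch, assuming the specification has already been loaded into a Python dict; the `check_spec` function and its messages are hypothetical and are not part of Panther or the Panther Analysis Tool, which perform their own validation.

```python
# Required top-level fields, taken from the table above.
REQUIRED_FIELDS = ("AnalysisType", "DataModelID", "Enabled", "LogTypes", "Mappings")


def check_spec(spec):
    """Return a list of problems found in a data model spec given as a dict."""
    problems = [f"missing required field: {name}" for name in REQUIRED_FIELDS if name not in spec]
    if spec.get("AnalysisType") not in (None, "datamodel"):
        problems.append("AnalysisType must be 'datamodel'")
    log_types = spec.get("LogTypes")
    if log_types is not None and (not isinstance(log_types, list) or len(log_types) != 1):
        problems.append("LogTypes must be a list containing exactly one log type")
    # Filename becomes mandatory as soon as any mapping relies on a Method.
    uses_method = any("Method" in m for m in spec.get("Mappings") or [])
    if uses_method and "Filename" not in spec:
        problems.append("Filename is required when a mapping uses Method")
    return problems


spec = {
    "AnalysisType": "datamodel",
    "DataModelID": "AWS.CloudTrail",
    "Enabled": True,
    "LogTypes": ["AWS.CloudTrail"],
    "Mappings": [{"Name": "event_type", "Method": "get_event_type"}],
}
print(check_spec(spec))
```

The sample spec is deliberately missing `Filename`, so the check reports that single problem.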

### DataModel Mappings

Mappings translate LogType fields to unified data model fields. Each mapping entry must define a unified data model field name (`Name`) and either a path (`Path`) or a method (`Method`). The `Path` can be a simple field name or a JSON Path. The method must be implemented in the file listed in the data model specification `Filename` field.

```yaml
Mappings:
  - Name: source_ip
    Path: srcIp
  - Name: user
    Path: $.events[*].parameters[?(@.name == 'USER_EMAIL')].value
  - Name: event_type
    Method: get_event_type
```

{% hint style="info" %}
For more information about JSON Path syntax as implemented by jsonpath-ng, see [the project's page on pypi.org](https://pypi.org/project/jsonpath-ng/).
{% endhint %}
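
The difference between a `Path` mapping and a `Method` mapping can be illustrated with a small resolver. This is a simplified sketch, not how Panther evaluates mappings: `resolve`, the `outcome` field, and the sample event are hypothetical, the `Method` entry holds the function object directly instead of a method name, and only plain top-level field names are supported (JSON Path evaluation is left out).

```python
def get_event_type(event):
    # Hypothetical method; a real one would live in the file named by Filename.
    return "failed_login" if event.get("outcome") == "FAILURE" else None


MAPPINGS = [
    {"Name": "source_ip", "Path": "srcIp"},
    {"Name": "event_type", "Method": get_event_type},
]


def resolve(event, name):
    """Resolve a unified field via a simple-field Path or a Method."""
    for entry in MAPPINGS:
        if entry["Name"] != name:
            continue
        if "Method" in entry:
            # Method mappings compute the value from the whole event.
            return entry["Method"](event)
        # Path mappings read a field; only top-level names are handled here.
        return event.get(entry["Path"])
    return None


event = {"srcIp": "10.0.0.8", "outcome": "FAILURE"}
print(resolve(event, "source_ip"))
print(resolve(event, "event_type"))
print(resolve(event, "user_agent"))
```

A `Path` suits fields that exist verbatim in the log, while a `Method` suits values that must be derived from several fields, as with `event_type`.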

## Unified Data Model Field Reference

The initial set of supported unified data model fields is described below.

<table data-header-hidden><thead><tr><th width="352">Unified Data Model Field Name</th><th>Description</th></tr></thead><tbody><tr><td>Unified Data Model Field Name</td><td>Description</td></tr><tr><td><code>actor_user</code></td><td>ID or username of the user whose action triggered the event.</td></tr><tr><td><code>assigned_admin_role</code></td><td>Admin role ID or name assigned to a user in the event.</td></tr><tr><td><code>destination_ip</code></td><td>Destination IP for the traffic.</td></tr><tr><td><code>destination_port</code></td><td>Destination port for the traffic.</td></tr><tr><td><code>event_type</code></td><td>Custom description for the type of event. Out-of-the-box event types can be found in the global helper <code>panther_event_type_helpers.py</code>.</td></tr><tr><td><code>http_status</code></td><td>Numeric HTTP status code for the traffic.</td></tr><tr><td><code>source_ip</code></td><td>Source IP for the traffic.</td></tr><tr><td><code>source_port</code></td><td>Source port for the traffic.</td></tr><tr><td><code>user_agent</code></td><td>User agent associated with the client in the event.</td></tr><tr><td><code>user</code></td><td>ID or username of the user that was acted upon to trigger the event.</td></tr></tbody></table>
