# Data Models

## Overview

Use Data Models to configure a set of unified fields across all log types, by creating [mappings](#datamodel-mappings) between event fields for various log types and unified Data Model names. You can leverage [Panther-managed Data Models](#panther-managed-data-models), and [create custom ones](#how-to-create-custom-data-models).

### Data Models use case

Suppose you have a detection that checks for a particular source IP address in network traffic logs, and you'd like to use it for multiple log types. These log types might not only span different categories (e.g., DNS, Zeek, Apache), but also different vendors. Without a common logging standard, each of these log types may represent the source IP using a different field name, such as `ipAddress`, `srcIP`, or `ipaddr`. The more log types you'd like to monitor, the more complex and cumbersome the logic of this check becomes. For example, it might look something like:

```python
(event.get('ipAddress') == '127.0.0.1' or 
event.get('srcIP') == '127.0.0.1' or 
event.get('ipaddr') == '127.0.0.1')
```

If we instead define a Data Model for each of these log types, we can translate the event's field name to the Data Model name, meaning the detection can simply reference the Data Model version. The above logic then simplifies to:

```python
event.udm('source_ip') == '127.0.0.1'
```
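To make the translation step concrete, the sketch below imitates it in plain Python. It is an illustration only, not Panther's implementation of `event.udm()`: `FIELD_MAPPINGS`, `unified_get`, and the `Example.*` log type names are hypothetical.

```python
# Hypothetical per-log-type mappings: unified name -> field name in the raw event.
FIELD_MAPPINGS = {
    "Example.DNS": {"source_ip": "ipAddress"},
    "Example.Zeek": {"source_ip": "srcIP"},
    "Example.Apache": {"source_ip": "ipaddr"},
}


def unified_get(event: dict, name: str):
    """Return the value of a unified field, or None if no mapping applies."""
    mapping = FIELD_MAPPINGS.get(event.get("p_log_type"), {})
    field = mapping.get(name)
    return event.get(field) if field is not None else None


events = [
    {"p_log_type": "Example.DNS", "ipAddress": "127.0.0.1"},
    {"p_log_type": "Example.Zeek", "srcIP": "10.0.0.5"},
    {"p_log_type": "Example.Apache", "ipaddr": "127.0.0.1"},
]

# One check now covers every log type, whatever the raw field is called.
matches = [unified_get(e, "source_ip") == "127.0.0.1" for e in events]
print(matches)
```

The detection logic stays the same when a new log type is added; only the mapping table grows.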

## Panther-managed Data Models

By default, Panther comes with built-in Data Models for several log types, such as `AWS.S3ServerAccess`, `AWS.VPCFlow`, and `Okta.SystemLog`. All currently supported data models can be found in the [panther-analysis repository, here](https://github.com/panther-labs/panther-analysis/tree/main/data_models).

The names of the supported Data Model mappings are listed in the [Panther-managed Data Model mapping names table, below](#panther-managed-data-model-mapping-names).

## How to create custom Data Models

Custom Data Models can be created in a few ways: in the Panther Console, using the [Panther Analysis Tool (PAT)](https://docs.panther.com/panther-developer-workflows/detections-repo/pat), or using the [Panther API](https://docs.panther.com/panther-developer-workflows/api). See the tabs below for creation instructions for each method.

Your custom Data Model mappings can use the [names referenced in Panther-managed Data Models](#panther-managed-data-model-mapping-names), or your own custom names. Each mapping `Name` can map to an event field (with `Path` in YAML, or **Field Path** in the Console) or to a method you define (with `Method` in YAML, or **Field Method** in the Console). If you map to a method, you must define it either in a separate Python file referenced by `Filename` in the YAML file (if working in the CLI workflow), or in the **Python Module** field in the Console.

Each log type can only have one enabled Data Model specified (however, a single Data Model can contain multiple mappings). If you want to change or update an existing Data Model, disable the existing one, then create a new, enabled one.
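If you manage many Data Model files, a small local check can catch two enabled Data Models that target the same log type before you upload them. The sketch below is one way to do that, assuming the YAML specifications have already been loaded into dictionaries; `find_conflicting_log_types` and the `Custom.*` IDs are hypothetical, and this is not a PAT feature.

```python
from collections import defaultdict


def find_conflicting_log_types(data_models: list[dict]) -> dict[str, list[str]]:
    """Map each log type to its enabled Data Model IDs when there is more than one."""
    enabled_by_log_type = defaultdict(list)
    for spec in data_models:
        if spec.get("Enabled"):
            for log_type in spec.get("LogTypes", []):
                enabled_by_log_type[log_type].append(spec["DataModelID"])
    return {lt: ids for lt, ids in enabled_by_log_type.items() if len(ids) > 1}


# Hypothetical specifications, reduced to the fields the check needs.
specs = [
    {"DataModelID": "AWS.CloudTrail", "LogTypes": ["AWS.CloudTrail"], "Enabled": True},
    {"DataModelID": "Custom.CloudTrail.V2", "LogTypes": ["AWS.CloudTrail"], "Enabled": True},
    {"DataModelID": "Custom.VPCFlow", "LogTypes": ["AWS.VPCFlow"], "Enabled": False},
]
conflicts = find_conflicting_log_types(specs)
print(conflicts)
```

Disabled Data Models are ignored, so keeping an old version around in a disabled state does not register as a conflict.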

{% tabs %}
{% tab title="Panther Console" %}
To create a new Data Model in the Panther Console:

1. In the left-hand navigation bar of your Panther Console, click **Detections**.
2. Click the **Data Models** tab.\
   ![The list of Data Models in the Panther Console is displayed](https://4011785613-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LgdiSWdyJcXPahGi9Rs-2910905616%2Fuploads%2Fgit-blob-2e8777db5a86fff6ae5642327f9cc6baffe852e3%2FScreen%20Shot%202022-08-02%20at%2012.01.19%20PM.png?alt=media)
3. In the upper-right corner, click **Create New**.
4. Under **Settings**, fill in the form fields.
   * **Display Name**: Enter a user-friendly display name for this Data Model.
   * **ID**: Enter a unique ID for this Data Model.
   * **Log Type**: Select a log type this Data Model should apply to. Only one log type per Data Model is permitted.
   * **Enabled**: Select whether you'd like this Data Model enabled or disabled.\
     ![In the Panther Console, the New Data Model screen is displayed. It contains fields for Display Name, ID, and Log Type. Under "Data Model Mappings" there are fields for Name, Field Path, and Field Method.](https://4011785613-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LgdiSWdyJcXPahGi9Rs-2910905616%2Fuploads%2Fgit-blob-f442523efcb66b97dcf5a2c99fd2639e573cecbf%2FScreen%20Shot%202022-08-02%20at%2012.03.34%20PM.png?alt=media)
5. Under **Data Model Mappings**, create **Name**/**Field Path** or **Name**/**Field Method** pairs.
6. If you used the **Field Method** field, define the method(s) in the **Python Module (optional)** section.
7. In the upper right corner, click **Save**.
   * You can now reference this Data Model in your rules. Learn more in [Referencing Data Models in a rule](#referencing-data-models-in-a-rule).
     {% endtab %}

{% tab title="CLI (PAT)" %}
**How to create a Data Model in the CLI workflow**

**Folder setup**

All files related to your custom Data Models must be stored in a folder with a name containing `data_models` (this could be a top-level `data_models` directory, or sub-directories with names matching `*data_models*`).

**File setup**

1. Create your Data Model specification YAML file (e.g., `data_models/aws_cloudtrail_datamodel.yml`):

   ```yaml
   AnalysisType: datamodel
   LogTypes: 
     - AWS.CloudTrail
   DataModelID: AWS.CloudTrail
   Filename: aws_cloudtrail_data_model.py
   Enabled: true
   Mappings:
     - Name: actor_user
       Path: $.userIdentity.userName
     - Name: event_type
       Method: get_event_type
     - Name: source_ip
       Path: sourceIPAddress
     - Name: user_agent
       Path: userAgent
   ```

   * Set `AnalysisType` to `datamodel`.
   * For `LogTypes`, provide the name of one of your log types. Despite this field taking a list, only one log type per Data Model is supported.
   * Provide a value for the `DataModelID` field.
   * Within `Mappings`, create `Name` / `Path` or `Name` / `Method` pairs.
     * Learn more about `Mappings` syntax [below, in DataModel `Mappings`](#datamodel-mappings).
   * See [Data Model Specification Reference](#datamodel-specification-reference) below for a complete list of required and optional fields.
2. If you included one or more `Method` fields within `Mappings`, create an associated Python file (e.g., `data_models/aws_cloudtrail_data_model.py`, matching the `Filename` value in the YAML file), and define any referenced methods.
   * In this case, you must also add the `Filename` field to the Data Model YAML file. If no `Method` fields are present, no Python file/`Filename` field is required.

     ```python
     from panther_base_helpers import deep_get


     def get_event_type(event):
         # Classify IAM user console logins; other events have no unified event type.
         if event.get('eventName') == 'ConsoleLogin' and deep_get(event, 'userIdentity', 'type') == 'IAMUser':
             # deep_get tolerates a missing or null responseElements.
             login_result = deep_get(event, 'responseElements', 'ConsoleLogin')
             if login_result == 'Failure':
                 return "failed_login"
             if login_result == 'Success':
                 return "successful_login"
         return None
     ```
3. Upload your Data Model to your Panther instance using [the PAT `upload` command](https://docs.panther.com/panther-developer-workflows/detections-repo/pat/pat-commands#upload-uploading-packages-to-panther-directly).
   * You can now reference this Data Model in your rules. Learn more in [Referencing Data Models in a rule](#referencing-data-models-in-a-rule).
     {% endtab %}

{% tab title="API" %}
**How to create a Data Model using the Panther API**

* See the `POST` operation on [Data Models](https://docs.panther.com/panther-developer-workflows/api/rest/data-models).
  {% endtab %}
  {% endtabs %}

### Evaluating whether a field exists in `Path`

Within a `Path` value, you can include logic that checks whether a certain event field exists. If it does, the mapping will be applied; if it doesn't, the mapping doesn't take effect.

For example, the following `Path` value from the [Panther-managed `gsuite_data_model.yml`](https://github.com/panther-labs/panther-analysis/blob/main/data_models/gsuite_data_model.yml) applies only when an entry in `parameters` has a `name` of `ROLE_NAME`:

```yaml
  - Name: assigned_admin_role
    Path: $.events[*].parameters[?(@.name == 'ROLE_NAME')].value
```
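The filter expression `[?(@.name == 'ROLE_NAME')]` keeps only the `parameters` entries whose `name` is `ROLE_NAME`, and `.value` then selects their values. The plain-Python sketch below approximates that selection to show when the path matches. It does not use jsonpath-ng, the function and sample events are hypothetical, and how Panther turns the matches into the final mapped value is not covered here.

```python
def assigned_admin_role(event: dict) -> list:
    """Approximate $.events[*].parameters[?(@.name == 'ROLE_NAME')].value."""
    values = []
    for sub_event in event.get("events", []):
        for parameter in sub_event.get("parameters", []):
            # The filter keeps only parameters named ROLE_NAME.
            if parameter.get("name") == "ROLE_NAME" and "value" in parameter:
                values.append(parameter["value"])
    return values


# Hypothetical events, trimmed to the fields the path touches.
with_role = {"events": [{"parameters": [
    {"name": "USER_EMAIL", "value": "user@example.com"},
    {"name": "ROLE_NAME", "value": "Example Admin Role"},
]}]}
without_role = {"events": [{"parameters": [
    {"name": "USER_EMAIL", "value": "user@example.com"},
]}]}

print(assigned_admin_role(with_role))
print(assigned_admin_role(without_role))
```

The second event produces no matches, which corresponds to the case where the mapping does not take effect.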

## Using Data Models

### Referencing Data Models in a rule

To reference a Data Model field in a rule:

1. In a rule's YAML file, ensure the `LogTypes` field contains the log type of the Data Model you'd like applied:

   ```yaml
   AnalysisType: rule
   DedupPeriodMinutes: 60
   DisplayName: DataModel Example Rule
   Enabled: true
   Filename: my_new_rule.py
   RuleID: DataModel.Example.Rule
   Severity: High
   LogTypes:
     # Add LogTypes where this rule is applicable
     # and a Data Model exists for that LogType
     - AWS.CloudTrail
   Tags:
     - Tags
   Description: >
     This rule exists to validate the CLI workflows of the Panther CLI
   Runbook: >
     First, find out who wrote the spec format, then notify them with feedback.
   Tests:
     - Name: test rule
       ExpectedResult: true
       # Add the LogType to the test specification in the 'p_log_type' field
       Log: {
         "p_log_type": "AWS.CloudTrail"
       }
   ```
2. Add the log type to each of the rule's `Tests` cases, in the `p_log_type` field.
3. Use [the `event.udm()` method](https://docs.panther.com/detections/rules/python/..#udm) in the rule's Python logic:

   ```python
   def rule(event):    
       # filter events on unified data model field
       return event.udm('event_type') == 'failed_login'


   def title(event):
       # use unified data model field in title
       return '{}: User [{}] from IP [{}] has exceeded the failed logins threshold'.format(
           event.get('p_log_type'), event.udm('actor_user'),
           event.udm('source_ip'))
   ```

### Using Data Models with Enrichment

Panther provides a built-in method on the event object called `event.udm_path()`. It returns the original path that was used for the Data Model.

#### AWS.VPCFlow logs example

In the example below, calling `event.udm_path('destination_ip')` will return `'dstAddr'`, since this is the path defined in the Data Model for that log type.

```python
from panther_base_helpers import deep_get

def rule(event):
    return True

def title(event):
    return event.udm_path('destination_ip')

def alert_context(event):
    enriched_data = deep_get(event, 'p_enrichment', 'lookup_table_name', event.udm_path('destination_ip'))
    return {'enriched_data':enriched_data}
```

To test this, we can use this test case:

```json
{
  "p_log_type": "AWS.VPCFlow",
  "dstAddr": "1.1.1.1",
  "p_enrichment": {
    "lookup_table_name": {
      "dstAddr": {
        "datakey": "datavalue"
      }
    }
  }
}
```

The test case returns the following alert. Its Alert Context contains `enriched_data`, set to the enrichment value stored under the `dstAddr` key (`{"datakey": "datavalue"}`).

<figure><img src="https://4011785613-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LgdiSWdyJcXPahGi9Rs-2910905616%2Fuploads%2Fgit-blob-f575f455607be37306b53eacea669faaa322f7bd%2FMock%20Testing.png?alt=media" alt="The screen shot shows a passing test in the Panther Console including the alert context with the data key and data value" width="375"><figcaption></figcaption></figure>
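The same lookup can be traced outside Panther with stand-ins for the event object and helper. In the sketch below, `StubEvent` and the local `deep_get` are hypothetical substitutes written for illustration; they only mimic the behavior described above for `event.udm_path()` and `panther_base_helpers.deep_get`.

```python
def deep_get(dictionary: dict, *keys, default=None):
    """Local stand-in for panther_base_helpers.deep_get: walk nested dict keys."""
    current = dictionary
    for key in keys:
        if not isinstance(current, dict):
            return default
        current = current.get(key)
        if current is None:
            return default
    return current


class StubEvent(dict):
    """Hypothetical event stub; udm_path returns the mapped source path."""

    PATHS = {"AWS.VPCFlow": {"destination_ip": "dstAddr"}}

    def udm_path(self, name):
        return self.PATHS.get(self.get("p_log_type"), {}).get(name)


def alert_context(event):
    # udm_path yields 'dstAddr', which is also the key inside the enrichment data.
    enriched_data = deep_get(
        event, "p_enrichment", "lookup_table_name", event.udm_path("destination_ip")
    )
    return {"enriched_data": enriched_data}


event = StubEvent({
    "p_log_type": "AWS.VPCFlow",
    "dstAddr": "1.1.1.1",
    "p_enrichment": {"lookup_table_name": {"dstAddr": {"datakey": "datavalue"}}},
})
print(alert_context(event))
```

Because the enrichment data is keyed by the original field name rather than the unified name, `event.udm_path()` is what lets one detection find it across log types.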

### Testing Data Models

To test a Data Model, write [unit tests](https://docs.panther.com/detections/testing) for a detection that references a Data Model mapping using [`event.udm()`](https://docs.panther.com/detections/rules/python/..#udm) in its `rule()` logic.

## DataModel specification reference

A complete list of DataModel specification fields:

<table data-header-hidden><thead><tr><th width="172">Field Name</th><th width="100.18672199170126">Required</th><th width="271">Description</th><th>Expected Value</th></tr></thead><tbody><tr><td>Field name</td><td>Required</td><td>Description</td><td>Expected value</td></tr><tr><td><code>AnalysisType</code></td><td>Yes</td><td>Indicates whether this specification is defining a rule, policy, data model, or global</td><td><code>datamodel</code></td></tr><tr><td><code>DataModelID</code></td><td>Yes</td><td>The unique identifier of the data model</td><td>String</td></tr><tr><td><code>DisplayName</code></td><td>No</td><td>What name to display in the UI and alerts. The <code>DataModelID</code> will be displayed if this field is not set.</td><td>String</td></tr><tr><td><code>Enabled</code></td><td>Yes</td><td>Whether this Data Model is enabled</td><td>Boolean</td></tr><tr><td><code>Filename</code></td><td>No</td><td>The path (with file extension) to the Python Data Model body. Needed only if a mapping uses <code>Method</code>.</td><td>String</td></tr><tr><td><code>LogTypes</code></td><td>Yes</td><td>Which log type this Data Model will apply to</td><td>Singleton List of strings<br>Note: Although <code>LogTypes</code> accepts a list of strings, you can only specify one log type per Data Model</td></tr><tr><td><code>Mappings</code></td><td>Yes</td><td>Mapping from source field name or method to unified data model field name</td><td><a href="#datamodel-mappings">List of <code>Mappings</code></a></td></tr></tbody></table>

### DataModel `Mappings`

Mappings translate `LogType` fields to unified Data Model fields. Each `Mappings` entry must define:

* `Name`: How you will reference this mapping in detections, e.g., `event.udm('source_ip')`.
* One of:
  * `Path`: The path to the field in the original log type's schema. This value can be a simple field name or a JSON path. For more information about JSON path syntax, see the [jsonpath-ng documentation on pypi.org](https://pypi.org/project/jsonpath-ng/).
  * `Method`: The name of the method. The method must be defined in the file listed in the Data Model specification's `Filename` field.

Example:

```yaml
Mappings:
  - Name: source_ip
    Path: srcIp
  - Name: user
    Path: $.events[*].parameters[?(@.name == 'USER_EMAIL')].value
  - Name: event_type
    Method: get_event_type
```

The `Path` value of the `user` mapping has logic that checks if the `USER_EMAIL` event field exists. Learn more in [Evaluating whether a field exists in `Path`](#evaluating-whether-a-field-exists-in-path).
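The requirements above can be checked mechanically before uploading. The sketch below validates a list of mappings that has already been loaded from YAML into dictionaries. `validate_mappings` is a hypothetical helper, and it reads "one of" as "exactly one of `Path` or `Method`", which is an assumption rather than documented behavior.

```python
def validate_mappings(mappings: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means every entry is well formed."""
    problems = []
    for index, entry in enumerate(mappings):
        if not entry.get("Name"):
            problems.append(f"entry {index}: missing Name")
        has_path = "Path" in entry
        has_method = "Method" in entry
        if has_path == has_method:
            # Either both keys are present or neither is.
            problems.append(f"entry {index}: define exactly one of Path or Method")
    return problems


good = [
    {"Name": "source_ip", "Path": "srcIp"},
    {"Name": "event_type", "Method": "get_event_type"},
]
bad = [
    {"Name": "user"},
    {"Path": "srcIp", "Method": "get_event_type"},
]
print(validate_mappings(good))
print(validate_mappings(bad))
```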

## Panther-managed Data Model mapping names

The [Panther-managed Data Model](https://github.com/panther-labs/panther-analysis/tree/main/data_models) mapping names are described below. When [creating your own Data Model mappings](#how-to-create-custom-data-models), you may use the names below, in addition to custom ones.

<table data-header-hidden><thead><tr><th width="252">Unified Data Model Field Name</th><th>Description</th></tr></thead><tbody><tr><td>Data Model mapping name</td><td>Description</td></tr><tr><td><code>actor_user</code></td><td>ID or username of the user whose action triggered the event.</td></tr><tr><td><code>assigned_admin_role</code></td><td>Admin role ID or name assigned to a user in the event.</td></tr><tr><td><code>destination_ip</code></td><td>Destination IP for the traffic.</td></tr><tr><td><code>destination_port</code></td><td>Destination port for the traffic.</td></tr><tr><td><code>event_type</code></td><td>Custom description for the type of event. Out-of-the-box support for event types can be found in the global <code>panther_event_type_helpers.py</code>.</td></tr><tr><td><code>http_status</code></td><td>Numeric HTTP status code for the traffic.</td></tr><tr><td><code>source_ip</code></td><td>Source IP for the traffic.</td></tr><tr><td><code>source_port</code></td><td>Source port for the traffic.</td></tr><tr><td><code>user_agent</code></td><td>User agent associated with the client in the event.</td></tr><tr><td><code>user</code></td><td>ID or username of the user that was acted upon to trigger the event.</td></tr></tbody></table>
