Transformations

Mutate data structure upon ingest

Overview

Transformations are functions you can use in custom log source schemas to modify the shape of your data upon ingest into Panther. The data will then be stored in the new format.

Transformations help align stored data to the needs of detection and query logic, removing the need for ad-hoc data manipulation and expediting detection writing and search.

The following transformations are available:

You can further manipulate your data on ingest using the Script Log Parser.

Order of transformation execution

Transformations are applied one after another in a fixed, predictable order: the sequence in which they appear in the transformation list in the Overview, above.

Each transformation in the sequence operates on the data in the state the previous transformation left it. Keep this order in mind when you define transformations, so that a later transformation refers to the data as it exists at that point and you avoid unexpected results.

Combining transformations

Individual transformations can be combined in pairs or longer sequences to achieve more complex results, such as renaming a field and masking its value in one schema definition.

Suppose there is a field that contains personal identification numbers (PINs). For security purposes, you want to rename the field to something less revealing while also applying a mask to redact the PINs.

To achieve this, you can use the rename transformation to change the field's name to something abstract. For instance, you could rename the field to userId.

Next, apply the mask transformation to the userId field to replace the digits of the PIN with a predefined number of asterisks. This way, the PIN remains hidden, ensuring data privacy.

You can define a field schema with both a rename and a mask directive like this:

- name: userId
  type: string
  rename:
    from: PIN
  mask:
    type: redact
    to: "*****"

You will transform a payload like this:

{
  "PIN": "1234"
}

To this:

{
  "userId": "*****"
}
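
To make the sequencing concrete, the following is a minimal Python sketch that models the two directives as plain functions applied one after the other. The function names rename_field and redact_field are hypothetical and only mimic the behavior described above; this is not Panther's implementation.

```python
def rename_field(event: dict, source: str, target: str) -> dict:
    """Return a copy of event with key `source` renamed to `target`."""
    out = dict(event)
    if source in out:
        out[target] = out.pop(source)
    return out


def redact_field(event: dict, field: str, to: str = "REDACTED") -> dict:
    """Return a copy of event with the value of `field` replaced by `to`."""
    out = dict(event)
    if field in out:
        out[field] = to
    return out


event = {"PIN": "1234"}
# The mask step receives the output of the rename step, so it targets "userId".
renamed = rename_field(event, "PIN", "userId")
result = redact_field(renamed, "userId", to="*****")
print(result)  # {'userId': '*****'}
```

Because the second step works on the output of the first, the mask in this sketch is addressed to the new field name rather than the original one.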

rename

The rename transformation changes the name of a field. This can be useful if you want to standardize field names across data sources, improve the clarity of your data's structure, or adjust field names containing invalid characters or reserved keywords.

By defining a field schema with a rename directive, such as:

- name: user
  type: string
  rename: 
    from: "@user"
- name: role
  type: object
  fields:
   - name: level
     type: string
     rename:
       from: type

You will transform a payload like this:

{
  "@user": "john",
  "role": {
    "type": "admin"
  }
}

To this:

{
  "user": "john",
  "role": {
    "level": "admin"
  }
}
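
As an illustration of what the schema above does to an event, here is a small Python sketch that renames a top-level key and a key inside a nested object. The helper rename_path is a hypothetical name used only for this example, and skipping the rename when the parent object is missing is an assumption.

```python
def rename_path(event: dict, parent_path: list, source: str, target: str) -> None:
    """Rename key `source` to `target` inside the object at parent_path (in place)."""
    node = event
    for key in parent_path:
        node = node.get(key)
        if not isinstance(node, dict):
            return  # assumption: a missing parent object means nothing to rename
    if source in node:
        node[target] = node.pop(source)


event = {"@user": "john", "role": {"type": "admin"}}
rename_path(event, [], "@user", "user")       # top-level field
rename_path(event, ["role"], "type", "level")  # field nested under "role"
assert event == {"user": "john", "role": {"level": "admin"}}
```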

copy

The copy transformation copies the value of a nested field into another top-level field. This can be useful if you'd like to flatten your data's JSON structure. If desired, you can then mark your newly defined field as an indicator.

By defining a field schema with a copy directive, such as:

- name: message
  type: string
  copy:
    from: attributes.message
- name: attributes
  type: json

You will transform a payload like this:

{
  "attributes": {
      "message": "hello there", 
      "user": "someone"
  }
}

To this:

{
  "message": "hello there",
  "attributes": {
      "message": "hello there", 
      "user": "someone"
  }
}
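
The effect of copy can be sketched in Python as a lookup along a dot-separated path followed by an assignment at the top level. The helpers get_path and copy_field are hypothetical names for this illustration, and leaving the event unchanged when the source path is missing is an assumption rather than documented behavior.

```python
from copy import deepcopy


def get_path(event: dict, path: str):
    """Return the value at a dot-separated path, or None if any part is missing."""
    node = event
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def copy_field(event: dict, target: str, source_path: str) -> dict:
    """Return a copy of event with the value at source_path also stored at top-level `target`."""
    out = deepcopy(event)
    value = get_path(out, source_path)
    if value is not None:
        out[target] = deepcopy(value)
    return out


event = {"attributes": {"message": "hello there", "user": "someone"}}
result = copy_field(event, "message", "attributes.message")
# The nested value stays where it was; a second copy now exists at the top level.
assert result["message"] == result["attributes"]["message"] == "hello there"
```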

concat

The concat transformation allows you to concatenate multiple fields' values into the value of a new field. The resulting combined field can be used, for example, as a key for enrichment.

Fields whose type is timestamp cannot be used in concatenation operations.

To use concat, declare a string field to store the result of the concatenation. Within concat, define the paths, and optionally a separator. Within paths, you must use absolute paths to specify the existing schema fields you'd like to combine. The order of these fields determines the concatenation order. If separator is not defined, the default separator is an empty string ("").

By defining a field schema with a concat directive, such as:

- name: ip
  type: string
- name: ports
  type: object
  fields:
   - name: https
     type: int
- name: socket
  type: string
  concat:
   separator: ":"
   paths: 
    - ip
    - ports.https

You will transform a payload like this:

{
  "ip": "192.168.0.1",
  "ports": {
    "https": 443
  }
}

To this:

{
  "ip": "192.168.0.1",
  "ports": {
    "https": 443
  },
  "socket": "192.168.0.1:443"
}
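
The following Python sketch models how paths and separator interact, including the empty-string default. The helpers get_path and concat_fields are hypothetical, and both the conversion of non-string values with str() and the skipping of missing fields are assumptions made for this illustration.

```python
def get_path(event: dict, path: str):
    """Return the value at a dot-separated path, or None if any part is missing."""
    node = event
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def concat_fields(event: dict, paths: list, separator: str = "") -> str:
    """Join the values found at `paths`, in the order given, using `separator`."""
    values = [get_path(event, p) for p in paths]
    # Assumption: missing fields are skipped and other values are stringified.
    return separator.join(str(v) for v in values if v is not None)


event = {"ip": "192.168.0.1", "ports": {"https": 443}}
event["socket"] = concat_fields(event, ["ip", "ports.https"], separator=":")
print(event["socket"])  # 192.168.0.1:443
```

Reordering the entries in paths changes the result, and omitting separator joins the values directly, with nothing between them.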

split

The split transformation allows you to extract a specific value from a string field by splitting it based on a separator. The resulting split fields can be treated as individual schema fields, making it possible to designate them as indicators. Split transformation can also help with data normalization into standardized fields, making it easier to handle unstructured data formats.

Only fields with a type of string can be split into other fields (i.e., the value of split:from: must be a field that contains type: string).

To use split, declare a field of any primitive type (i.e., excluding object, array, and JSON) to store the result. Within the split directive, include the following required fields:

  • from: Provide the absolute path of the field to be divided.

  • separator: Provide the character to split on.

  • index: Provide the position of the value within the resulting array produced by the split.

By defining a field schema with a split directive, such as:

- name: socket
  type: string
- name: ip
  type: string
  split:
   from: socket
   separator: ":"
   index: 0
- name: port
  type: int
  split:
   from: socket
   separator: ":"
   index: 1  

You will transform a payload like this:

{
  "socket": "192.168.0.1:443"
}

To this:

{
  "socket": "192.168.0.1:443",
  "ip": "192.168.0.1",
  "port": 443
}
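
To show how from, separator, and index work together, here is a minimal Python sketch. The helper split_field and its cast parameter are hypothetical; cast stands in for the conversion to the declared field type (for example, int for port), and returning None for an out-of-range index is an assumption.

```python
def split_field(value: str, separator: str, index: int, cast=str):
    """Split `value` on `separator` and return the part at `index`, converted with `cast`."""
    parts = value.split(separator)
    if not 0 <= index < len(parts):
        return None  # assumption: an out-of-range index yields no value
    return cast(parts[index])


event = {"socket": "192.168.0.1:443"}
event["ip"] = split_field(event["socket"], ":", 0)
event["port"] = split_field(event["socket"], ":", 1, cast=int)
print(event)  # {'socket': '192.168.0.1:443', 'ip': '192.168.0.1', 'port': 443}
```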

You can also use split to split array elements. For example, using the following schema:

- name: traffic
  type: array
  element: 
    type: object
    fields:
      - name: socket
        type: string
      - name: ip
        type: string
        indicators: [ip]
        split: 
          from: traffic.socket
          separator: ":"
          index: 0
      - name: port
        type: int
        split: 
          from: traffic.socket
          separator: ":"
          index: 1
       

You will transform a payload like this:

{
  "traffic": [
   {
     "socket": "192.168.0.1:443"
   },
   {
     "socket": "192.168.0.2:80"
   } 
  ]
}

To this:

{
  "traffic": [
   {
     "socket": "192.168.0.1:443",
     "ip": "192.168.0.1",
     "port": 443
   },
   {
     "socket": "192.168.0.2:80",
     "ip": "192.168.0.2",
     "port": 80
   } 
  ]
}
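
For array elements, the split is evaluated once per element, each time against that element's own socket value. A rough Python equivalent, using a hypothetical helper split_socket and assuming every element holds a socket string of the form host:port, looks like this:

```python
def split_socket(element: dict) -> dict:
    """Return a copy of one array element with ip and port derived from its socket."""
    parts = element["socket"].split(":")
    return {**element, "ip": parts[0], "port": int(parts[1])}


event = {
    "traffic": [
        {"socket": "192.168.0.1:443"},
        {"socket": "192.168.0.2:80"},
    ]
}
# Each element is transformed independently of the others.
event["traffic"] = [split_socket(element) for element in event["traffic"]]
assert event["traffic"][1] == {"socket": "192.168.0.2:80", "ip": "192.168.0.2", "port": 80}
```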

mask

The mask transformation enables you to conceal sensitive information in your logs. Masking is useful if you need to protect the confidentiality of certain data.

There are two masking techniques:

  • Obfuscation (also known as hashing): This technique hashes data, using an optional salt value. With this technique, the value keeps its referential integrity.

  • Redaction: This technique replaces sensitive values with REDACTED, or some other string you provide. With this technique, the value loses its referential integrity.

Note that masking a certain field means you cannot later use Panther's search tools to query for its original value, but you can search for a hashed value.

Obfuscation (hashing)

Hashing incoming data means you can enhance its security while still retaining its usability in the future. To strengthen the protection hashing provides, you can include a salt.

To use obfuscation, on the target field in your schema, include mask. Under mask, include type, and optionally salt.

The value of type is the hashing algorithm you want to use. Supported values include:

  • sha256

  • md5

  • sha1

  • sha512

The value of the optional salt key is a string of your choice. This value is appended to the field's value before it is hashed.

When using mask, the value of the target field's type must always be set as string. The actual input data can be of any type, but type: string is required because, after the value has been masked, it will be stored as a string in the data lake.

By defining a field schema with a mask directive such as:

- name: username
  type: string # Must be set as string (the input data may be of any type)
  mask:
    type: sha256 
    salt: random_salt # Optional

You will transform a payload like this:

{
  "username": "john"
}

To this:

{
  "username": "98b4ceb956e9ed4539b0721add25cab0bacce4307cf3140c4430c1513476a3e4"
}
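
The hashing step can be approximated with Python's standard hashlib module. This is a sketch under stated assumptions, not Panther's implementation: the helper name hash_mask is hypothetical, the salt is appended to the value as described above, and the UTF-8 encoding and str() conversion of non-string input are assumptions, so the digest it produces may differ from what Panther stores.

```python
import hashlib


def hash_mask(value, algorithm: str = "sha256", salt: str = "") -> str:
    """Hash `value` with `salt` appended, returning the hex digest as a string."""
    # Assumption: non-string input is stringified and encoded as UTF-8.
    data = (str(value) + salt).encode("utf-8")
    return hashlib.new(algorithm, data).hexdigest()


first = hash_mask("john", salt="random_salt")
second = hash_mask("john", salt="random_salt")
other = hash_mask("jane", salt="random_salt")

# Referential integrity: equal inputs give equal digests, different inputs do not.
assert first == second
assert first != other
```

This property is what allows you to search for a hashed value: hashing the value you are looking for with the same algorithm and salt yields the string to query for.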

Redaction

Redacting incoming data means replacing it with a predefined value. This technique is useful if you'd like to ensure the sensitive information is not accessible or recoverable.

To use redaction, on the target field in your schema, include mask. Under mask, include type: redact, and optionally to.

The optional to key takes a string value that will replace the actual event value. If to is not included, its default, REDACTED, is used.

When using mask, the value of the target field's type must always be set as string. The actual input data can be of any type, but type: string is required because, after the value has been masked, it will be stored as a string in the data lake.

By defining a field schema with a mask directive such as:

- name: username
  type: string # Must be set as string (the input data may be of any type)
  mask:
    type: redact 
    to: "XXXX" # Optional, default: "REDACTED"

You will transform a payload like this:

{
  "username": "john"
}

To this:

{
  "username": "XXXX"
}

isEmbeddedJSON

Sometimes JSON values are delivered embedded in a string.

To have Panther parse the escaped JSON inside the string, use an isEmbeddedJSON: true flag. This flag is valid for values of type object, array and json.

By defining a field schema with an isEmbeddedJSON directive such as:

- name: message
  type: object
  isEmbeddedJSON: true
  fields:
    - name: foo
      type: string

You will transform a payload like this:

{
  "timestamp": "2021-03-24T18:15:23Z",
  "message": "{\"foo\":\"bar\"}"
}

To this:

{
  "timestamp": "2021-03-24T18:15:23Z",
  "message": {
    "foo": "bar"
  }
}
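
Conceptually, this is the same as decoding the string value a second time as JSON. The Python sketch below illustrates the idea with the standard json module; the helper parse_embedded_json is a hypothetical name, and leaving non-string values untouched is an assumption.

```python
import json


def parse_embedded_json(event: dict, field: str) -> dict:
    """Return a copy of event in which the string at `field` is decoded as JSON."""
    out = dict(event)
    value = out.get(field)
    if isinstance(value, str):
        out[field] = json.loads(value)
    return out


event = {"timestamp": "2021-03-24T18:15:23Z", "message": '{"foo":"bar"}'}
result = parse_embedded_json(event, "message")
print(result["message"]["foo"])  # bar
```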
