Transformations
Mutate data structure upon ingest
Overview
Transformations are functions you can use in custom log source schemas to modify the shape of your data upon ingest into Panther. The data will then be stored in the new format.
Transformations help align stored data to the needs of detection and query logic, removing the need for ad-hoc data manipulation and expediting detection writing and search.
The following transformations are available: rename, copy, concat, split, mask, and isEmbeddedJSON.
You can further manipulate your data on ingest using the Script Log Parser.
Order of transformation execution
Transformations are performed in a specific order, so they are applied one after another in a predictable manner. The order of execution is the sequence provided in the transformation list in the Overview, above.
Each transformation in the sequence operates on the data in the state left by the previous transformation. Keeping this order in mind when you design a schema helps you avoid unexpected results.
Combining transformations
Individual transformations can be combined in pairs or sequences to achieve more complex data transformations. This allows for greater flexibility and customization to meet specific data requirements and facilitate efficient detection creation and search operations.
Suppose there is a field that contains personal identification numbers (PINs). For security purposes, you want to rename the field to something less revealing while also applying a mask to redact the PINs.
To achieve this, you can use the rename transformation to change the field's name to something abstract. For instance, you could rename the field to userId.
Next, apply the mask transformation to the userId field to replace the digits of the PIN with a predefined number of asterisks. This way, the PIN remains hidden, ensuring data privacy.
You can define a field schema with both a rename and a mask directive like this:
You will transform a payload like this:
To this:
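As a rough illustration of how the two steps chain together, the following Python sketch mimics the behavior described above. It is not Panther's implementation or schema syntax: the helper functions, the sample event, and the five-asterisk replacement are assumptions made for this example.

```python
def rename(event, old_name, new_name):
    """Return a copy of the event with old_name renamed to new_name."""
    out = dict(event)
    if old_name in out:
        out[new_name] = out.pop(old_name)
    return out


def redact(event, field, to="*****"):
    """Return a copy of the event with the field's value replaced by a fixed string."""
    out = dict(event)
    if field in out:
        out[field] = to
    return out


# Hypothetical incoming event; "PIN" and "userId" follow the example in the text.
event = {"PIN": "12345", "action": "login"}

# The mask step sees the event as the rename step left it, so it targets "userId".
renamed = rename(event, "PIN", "userId")
masked = redact(renamed, "userId")
print(masked)
```

The stored event no longer contains the PIN field or its digits; only the renamed, redacted field remains.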
rename
The rename transformation changes the name of a field. This can be useful if you want to standardize field names across data sources, improve the clarity of your data's structure, or adjust field names containing invalid characters or reserved keywords.
By defining a field schema with a rename directive, such as:
You will transform a payload like this:
To this:
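To make the effect concrete, here is a minimal Python sketch of a rename applied to a field whose name contains a special character. The function name and the sample event are hypothetical, and the sketch only models the outcome, not how Panther performs it.

```python
def rename_field(event, old_name, new_name):
    """Return a copy of the event where old_name is replaced by new_name.

    The key keeps its position; all other keys and values are unchanged.
    """
    if old_name not in event:
        return dict(event)
    return {
        (new_name if key == old_name else key): value
        for key, value in event.items()
    }


# Hypothetical event with a field name that starts with a special character.
event = {"@timestamp": "2024-01-01T00:00:00Z", "message": "user logged in"}
renamed = rename_field(event, "@timestamp", "timestamp")
print(renamed)
```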
copy
The copy transformation copies the value of a nested field into another top-level field. This can be useful if you'd like to flatten your data's JSON structure. If desired, you can then mark your newly defined field as an indicator.
By defining a field schema with a copy directive, such as:
You will transform a payload like this:
To this:
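The flattening effect can be sketched in Python as follows. This is an illustration under stated assumptions: the dot-separated path notation, the function name, and the sample event are all hypothetical, and a missing source path is simply left alone here.

```python
def copy_field(event, source_path, target_name):
    """Copy the value at a dot-separated nested path into a top-level field."""
    value = event
    for key in source_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return dict(event)  # assumption: a missing source leaves the event unchanged
        value = value[key]
    out = dict(event)
    out[target_name] = value  # the nested original stays in place
    return out


# Hypothetical nested event; 192.0.2.10 is a documentation-only IP address.
event = {"actor": {"network": {"ip": "192.0.2.10"}}, "action": "login"}
flattened = copy_field(event, "actor.network.ip", "source_ip")
print(flattened)
```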
concat
The concat transformation allows you to concatenate multiple fields' values into the value of a new field. The resulting combined field can be used, for example, as a key for enrichment.
Fields whose type is timestamp cannot be used in concatenation operations.
To use concat, declare a string field to store the result of the concatenation. Within concat, define the paths, and optionally a separator. Within paths, you must use absolute paths to specify the existing schema fields you'd like to combine. The order of these fields determines the concatenation order. If separator is not defined, the default separator is an empty string ("").
By defining a field schema with a concat directive, such as:
You will transform a payload like this:
To this:
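The roles of paths and separator can be sketched in Python like this. The helper names, the dot-separated path notation, and the sample event are assumptions for illustration only; the sketch also assumes every listed path exists in the event.

```python
def lookup(event, path):
    """Resolve a dot-separated absolute path inside a nested dict."""
    value = event
    for key in path.split("."):
        value = value[key]
    return value


def concat_fields(event, paths, separator=""):
    """Join the values found at each path, in the order the paths are listed."""
    return separator.join(str(lookup(event, path)) for path in paths)


# Hypothetical event used to build an enrichment key.
event = {"cloud": {"account": "123", "region": "us-east-1"}, "user": "alice"}

with_separator = concat_fields(event, ["cloud.account", "user"], separator=":")
default_separator = concat_fields(event, ["cloud.account", "user"])
print(with_separator, default_separator)
```

Swapping the order of the entries in paths swaps the order of the values in the result.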
split
The split transformation allows you to extract a specific value from a string field by splitting it based on a separator. The resulting split fields can be treated as individual schema fields, making it possible to designate them as indicators. The split transformation can also help normalize data into standardized fields, making it easier to handle unstructured data formats.
Only fields with a type of string can be split into other fields (i.e., the value of split:from: must be a field that contains type: string).
To use split, declare a field of any primitive type (i.e., excluding object, array, and JSON) to store the result. Within the split directive, include the following required fields:
from: Provide the absolute path of the field to be divided.
separator: Provide the character to split on.
index: Provide the position of the value within the resulting array produced by the split.
By defining a field schema with a split directive, such as:
You will transform a payload like this:
To this:
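The interplay of from, separator, and index can be sketched in Python as below. This is not Panther's implementation: the function name and sample event are hypothetical, and both the zero-based index and the None result for an out-of-range index are assumptions.

```python
def split_field(event, source, separator, index):
    """Split the string at event[source] and return the piece at the given position."""
    parts = event[source].split(separator)
    if 0 <= index < len(parts):
        return parts[index]
    return None  # assumption: an out-of-range index yields no value


# Hypothetical event; the domain could then be stored in its own schema field.
event = {"email": "alice@example.com"}
local_part = split_field(event, "email", "@", 0)
domain = split_field(event, "email", "@", 1)
print(local_part, domain)
```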
You can also use split to split array elements. For example, using the following schema:
You will transform a payload like this:
To this:
mask
The mask transformation enables you to conceal sensitive information in your logs. Masking is useful if you need to protect the confidentiality of certain data.
There are two masking techniques:
Obfuscation (also known as hashing): This technique hashes data, using an optional salt value. With this technique, the value keeps its referential integrity.
Redaction: This technique replaces sensitive values with REDACTED, or some other string you provide. With this technique, the value loses its referential integrity.
Note that masking a certain field means you cannot later use Panther's search tools to query for its original value, but you can search for a hashed value.
Obfuscation (hashing)
Hashing incoming data means you can enhance its security while still retaining its usability in the future. To strengthen the protection hashing provides, you can include a salt.
To use obfuscation, on the target field in your schema, include mask. Under mask, include type, and optionally salt.
The value of type is the hashing algorithm you want to use. Supported values include:
sha256
md5
sha1
sha512
The value of the optional salt key is a string of your choice. This value is appended to the field's value before it is hashed.
When using mask, the value of the target field's type must always be set as string. The actual input data can be of any type, but type: string is required because, after the value has been masked, it will be stored as a string in the data lake.
By defining a field schema with a mask directive such as:
You will transform a payload like this:
To this:
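The hashing step described above can be approximated in Python with the standard hashlib module. This is a sketch, not Panther's implementation: the function name is hypothetical, the salt is appended as the text states, and the UTF-8 encoding and hexadecimal output format are assumptions.

```python
import hashlib


def obfuscate(value, algorithm="sha256", salt=""):
    """Hash a value with the named algorithm after appending an optional salt."""
    data = (str(value) + salt).encode("utf-8")  # non-string input is stringified first
    return hashlib.new(algorithm, data).hexdigest()


# Hypothetical values: the same input always yields the same digest,
# which is what preserves referential integrity across events.
first = obfuscate("alice@example.com", salt="my-salt")
second = obfuscate("alice@example.com", salt="my-salt")
unsalted = obfuscate("alice@example.com")
print(first == second, first == unsalted)
```

Because equal inputs produce equal digests, two events that carried the same original value can still be correlated after masking, as long as the algorithm and salt are the same.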
Redaction
Redacting incoming data means replacing it with a predefined value. This technique is useful if you'd like to ensure the sensitive information is not accessible or recoverable.
To use redaction, on the target field in your schema, include mask. Under mask, include type: redact, and optionally to.
The optional to key takes a string value that will replace the actual event value. If to is not included, its default, REDACTED, is used.
When using mask, the value of the target field's type must always be set as string. The actual input data can be of any type, but type: string is required because, after the value has been masked, it will be stored as a string in the data lake.
By defining a field schema with a mask directive such as:
You will transform a payload like this:
To this:
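The effect of the to key and its default can be sketched in Python as follows. The function name and the sample event are hypothetical; only the REDACTED default comes from the text above.

```python
def redact_field(event, field, to="REDACTED"):
    """Return a copy of the event with the field's value replaced by a fixed string."""
    out = dict(event)
    if field in out:
        out[field] = to  # the stored value is always a string, whatever the input type
    return out


# Hypothetical event with a numeric sensitive value.
event = {"ssn": 123456789, "action": "update"}
default_redaction = redact_field(event, "ssn")
custom_redaction = redact_field(event, "ssn", to="XXX-XX-XXXX")
print(default_redaction, custom_redaction)
```

Every redacted value looks the same, so, unlike hashing, two events cannot be matched on the original value afterwards.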
isEmbeddedJSON
Sometimes JSON values are delivered embedded in a string.
To have Panther parse the escaped JSON inside the string, use an isEmbeddedJSON: true flag. This flag is valid for values of type object, array, and json.
By defining a field schema with an isEmbeddedJSON directive such as:
You will transform a payload like this:
To this:
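The idea of unwrapping JSON that arrives inside a string can be sketched in Python with the standard json module. The function name and the sample event are hypothetical, and the sketch only models the resulting shape of the data, not how Panther parses it.

```python
import json


def parse_embedded_json(event, field):
    """Return a copy of the event where a JSON-encoded string field is parsed."""
    out = dict(event)
    if isinstance(out.get(field), str):
        out[field] = json.loads(out[field])  # raises ValueError on invalid JSON
    return out


# Hypothetical event whose "message" field holds escaped JSON inside a string.
event = {"timestamp": "2024-01-01T00:00:00Z", "message": "{\"foo\": \"bar\"}"}
parsed = parse_embedded_json(event, "message")
print(parsed["message"]["foo"])
```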