# Log Schema Reference

In this guide, you will find common fields used to build YAML-based schemas when onboarding [Custom Log Types](https://docs.panther.com/data-onboarding/custom-log-types) and [Lookup Table](https://docs.panther.com/enrichment/lookup-tables) schemas.

{% hint style="info" %}
Required fields throughout this page are in **bold**.
{% endhint %}

## LogSchema fields

Each log schema contains the following fields:

* **`fields`** ([`[]FieldSchema`](#fieldschema))
  * The fields in each *Log Event*.
* `parser` ([`ParserSpec`](#parserspec))
  * A parser that can convert non-JSON logs to JSON and/or perform custom transformations.

### CI/CD schema fields

Additionally, schemas defined using a CI/CD workflow can contain the following fields:

* **`schema`** (`string`)
  * The name of the schema.
* `description` (`string`)
  * A short description that will appear in the UI.
* `referenceURL` (`string`)
  * A link to an external document that specifies the log structure. Often, this is a link to a third party's documentation.
* `fieldDiscoveryEnabled` (`boolean`)
  * Indicates whether [field discovery](https://docs.panther.com/data-onboarding/custom-log-types/..#enabling-field-discovery) is enabled for this schema.

{% hint style="info" %}
See the Custom Logs page for [information on how to manage schemas through a CI/CD pipeline](https://docs.panther.com/data-onboarding/custom-log-types/..#uploading-log-schemas-with-the-panther-analysis-tool) using Panther Analysis Tool (PAT).
{% endhint %}

### Example

The example below contains the CI/CD fields mentioned above.

```yaml
schema: Custom.MySchema
description: (Optional) A handy description so I know what the schema is for.
referenceURL: (Optional) A link to some documentation on the logs this schema is for.
fieldDiscoveryEnabled: true
parser:
  csv:
    delimiter: ','
    hasHeader: true
fields:
- name: action
  type: string
  required: true
- name: time
  type: timestamp
  timeFormats:
    - unix
```

### ParserSpec

A `ParserSpec` specifies a parser for converting non-JSON input to JSON. Only one of the following fields can be specified:

* `fastmatch` (`FastmatchParser{}`): Use the `fastmatch` parser
  * Learn more in [Fastmatch Log Parser](https://docs.panther.com/data-onboarding/custom-log-types/fastmatch-parser).
* `regex` (`RegexParser{}`): Use the `regex` parser
  * Learn more in [Regex Log Parser](https://docs.panther.com/data-onboarding/custom-log-types/regex-parser).
* `csv` (`CSVParser{}`): Use the `csv` parser
  * Note: The `columns` field is required when there are multiple CSV schemas in the same log source.
  * Learn more in [CSV Log Parser](https://docs.panther.com/data-onboarding/custom-log-types/csv-parser).
* `script`: Use the `script` parser
  * Learn more in [Script Log Parser](https://docs.panther.com/data-onboarding/custom-log-types/script-parser).

See the fields for the `fastmatch`, `regex`, `csv`, and `script` parsers in the tabs below; example parser sketches follow the tabs.

{% tabs %}
{% tab title="fastmatch" %}
**Parser `fastmatch` fields**

* **`match`** (`[]string`): One or more patterns to match log lines against. This field cannot be empty.
* `emptyValues` (`[]string`): Values to consider as `null`.
* `expandFields` (`map[string]string`): Additional fields to be injected by expanding text templates.
* `trimSpace` (`bool`): Trim space surrounding each value.
{% endtab %}

{% tab title="regex" %}
**Parser `regex` fields**

* **`match`** (`[]string`): A pattern to match log lines against (it can be split into parts for documentation purposes). This field cannot be empty.
* `patternDefinitions` (`map[string]string`): Additional named patterns to use in match pattern.
* `emptyValues` (`[]string`): Values to consider as `null`.
* `expandFields` (`map[string]string`): Additional fields to be injected by expanding text templates.
* `trimSpace` (`bool`): Trim space surrounding each value.
{% endtab %}

{% tab title="csv" %}
**Parser `csv` fields**

* **`delimiter`** (`string`): A character to use as field delimiter.
* `hasHeader` (`bool`): Use the first row to derive column names (unless `columns` is also set, in which case the header row is skipped).
* `columns` (`[]string`, required if `hasHeader` is unset, non-empty): Names for each column in the CSV file. If not set, the first row is used as a header.
* `emptyValues` (`[]string`): Values to consider as `null`.
* `trimSpace` (`bool`): Trim space surrounding each value.
* `expandFields` (`map[string]string`): Additional fields to be injected by expanding text templates.
{% endtab %}

{% tab title="script" %}
**Parser `script` fields**

* **`function`** (`string`): The [Starlark](https://bazel.build/rules/language) function to run per event.
{% endtab %}
{% endtabs %}
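
For reference, below are minimal sketches of a `fastmatch` parser and a `csv` parser configuration. The log format, field names, and patterns are hypothetical; adapt them to your own logs.

```yaml
# Hypothetical fastmatch parser for lines like:
#   2022-04-04T17:09:17Z 10.0.0.1 login
parser:
  fastmatch:
    match:
      - '%{time} %{remote_ip} %{action}'
    emptyValues: ['-'] # treat "-" as null
    trimSpace: true
```

```yaml
# Hypothetical csv parser for a headerless file;
# column names are supplied via `columns`
parser:
  csv:
    delimiter: ','
    columns:
      - time
      - remote_ip
      - action
```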

### FieldSchema

A *FieldSchema* defines a field and its value. The field is defined by:

* **`name`** (`string`)
  * The name of the field.
* `required` (`boolean`)
  * Whether the field is required.
* `description` (`string`)
  * Some text documenting the field.
* [`copy`](https://docs.panther.com/data-onboarding/transformations#copy) (`object`)
  * If present, the field's value will be copied from the referenced `object`.
* [`rename`](https://docs.panther.com/data-onboarding/transformations#rename) (`object`)
  * If present, the field's name will be changed.
* [`concat`](https://docs.panther.com/data-onboarding/transformations#concat) (`object`)
  * If present, the field's value will be the combination of the values of two or more other fields.
* [`split`](https://docs.panther.com/data-onboarding/transformations#split) (`object`)
  * If present, the field's value will be extracted from another string field by splitting it based on a separator.
* [`mask`](https://docs.panther.com/data-onboarding/transformations#mask) (`object`)
  * If present, the field's value will be masked.

Its value is defined using the fields of a [`ValueSchema`](#valueschema).
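
As an illustration, the sketch below pairs value definitions with the `rename` and `concat` transformations. The field names here are hypothetical; see the Transformations pages linked above for the full syntax.

```yaml
fields:
  # Expose the original "EventTime" key under a friendlier name
  - name: event_time
    type: timestamp
    timeFormats:
      - rfc3339
    rename:
      from: EventTime
  # Build a single string value out of two other fields
  - name: url
    type: string
    concat:
      separator: ""
      paths:
        - hostname
        - uri
```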

### ValueSchema

A `ValueSchema` defines a value and how it should be processed. Each `ValueSchema` has a `type` field that can be of the following values:

<table data-header-hidden><thead><tr><th width="268">Value Type</th><th>Description</th></tr></thead><tbody><tr><td>Type Values</td><td>Description</td></tr><tr><td><code>string</code></td><td>A string value</td></tr><tr><td><code>int</code></td><td>A 32-bit integer number in the range <code>-2147483648</code>, <code>2147483647</code></td></tr><tr><td><code>smallint</code></td><td>A 16-bit integer number in the range <code>-32768</code>, <code>32767</code></td></tr><tr><td><code>bigint</code></td><td>A 64-bit integer number in the range <code>-9223372036854775808</code>, <code>9223372036854775807</code></td></tr><tr><td><code>float</code></td><td>A 64-bit floating point number</td></tr><tr><td><code>boolean</code></td><td>A boolean value <code>true</code> / <code>false</code></td></tr><tr><td><code>timestamp</code></td><td>A timestamp value</td></tr><tr><td><code>array</code></td><td>A JSON array where each element is of the same type</td></tr><tr><td><code>object</code></td><td>A JSON object of <em>known</em> keys</td></tr><tr><td><code>json</code></td><td>Any valid JSON value (JSON object, array, number, string, boolean)</td></tr></tbody></table>

The fields of a `ValueSchema` depend on the value of the `type`.

<table data-header-hidden><thead><tr><th width="171">Type</th><th width="192">Field</th><th width="162">Value</th><th>Description</th></tr></thead><tbody><tr><td>Type</td><td>Field</td><td>Value</td><td>Description</td></tr><tr><td><code>object</code></td><td><strong><code>fields</code></strong> (required)</td><td><a href="#fieldschema"><code>[]FieldSpec</code></a></td><td>An array of <code>FieldSpec</code> objects describing the fields of the object.</td></tr><tr><td><code>array</code></td><td><strong><code>element</code></strong> (required)</td><td><a href="#valueschema"><code>ValueSchema</code></a></td><td>A <code>ValueSchema</code> describing the elements of an array.</td></tr><tr><td><code>timestamp</code></td><td><strong><code>timeFormats</code></strong> (required)</td><td><code>[]String</code></td><td>An array specifying the formats to use for parsing the timestamp (see <a href="#timestamps">Timestamps</a>)</td></tr><tr><td><code>timestamp</code></td><td><code>isEventTime</code></td><td><code>Boolean</code></td><td>A flag to tell Panther to use this timestamp as the <em>Log Event Timestamp</em>.</td></tr><tr><td><code>string</code></td><td><code>indicators</code></td><td><code>[]String</code></td><td>Tells Panther to extract indicators from this value (see <a href="#indicators">Indicators</a>)</td></tr><tr><td><code>string</code></td><td><code>validate</code></td><td>See <a href="#validate">Validate</a></td><td>Validation rules for the string value</td></tr></tbody></table>

### Timestamps

Timestamps are defined by setting the `type` field to `timestamp` and specifying the timestamp format using the `timeFormats` field.

Panther always stores `timestamp` values in Coordinated Universal Time (UTC). This means:

* If a `timestamp` field value indicates a timezone other than UTC (with a [UTC offset](https://en.wikipedia.org/wiki/UTC_offset)), Panther converts it to UTC.
  * For example, if an incoming `timestamp` field had a value of `2025-07-02T00:15:30-08:00` (where the `-08:00` offset means it's in Pacific Standard Time \[PST]), Panther will store it as `2025-07-02 08:15:30.000000000` (converted to UTC).
* If a `timestamp` field value does not indicate a timezone, Panther assumes it is in UTC and stores it as-is.

See the allowed `timeFormats` values below:

<table><thead><tr><th width="154.00000000000003">timeFormats value</th><th width="276">Example</th><th>Description</th></tr></thead><tbody><tr><td><code>rfc3339</code></td><td><code>2022-04-04T17:09:17Z</code></td><td>The most common timestamp format.</td></tr><tr><td><code>unix_auto</code></td><td><code>1649097448</code> (seconds)<br><br><code>1649097491531</code> (milliseconds)<br><br><code>1649097442000000</code> (microseconds)<br><br><code>1649097442000000000</code> (nanoseconds)</td><td>Timestamp expressed in time passed since UNIX epoch time. It can handle seconds, milliseconds, microseconds, and nanoseconds.</td></tr><tr><td><code>unix</code></td><td><code>1649097448</code></td><td>Timestamp expressed in seconds since UNIX epoch time. It can handle fractions of seconds as a decimal part.</td></tr><tr><td><code>unix_ms</code></td><td><code>1649097491531</code></td><td>Timestamp expressed in milliseconds since UNIX epoch time.</td></tr><tr><td><code>unix_us</code></td><td><code>1649097442000000</code></td><td>Timestamp expressed in microseconds since UNIX epoch time.</td></tr><tr><td><code>unix_ns</code></td><td><code>1649097442000000000</code></td><td>Timestamp expressed in nanoseconds since UNIX epoch time. Scientific float notation is supported.</td></tr></tbody></table>

{% hint style="warning" %}
The `timeFormats` field was introduced in Panther v1.46 to support multiple timestamp formats in custom log schemas. While `timeFormat` is still supported for log sources set up before v1.46, use `timeFormats` for all new schemas.
{% endhint %}

#### Defining a custom format

You can also define a custom format by using [strftime](https://strftime.org) notation. For example:

```yaml
# The field is a timestamp using a custom format like "2020-09-14 14:29:21"
- name: ts
  type: timestamp
  timeFormats:
    - "%Y-%m-%d %H:%M:%S" # note the quotes required for proper YAML syntax
```

Panther's strftime format supports using the `%N` code to parse nanoseconds. For example:

`%H:%M:%S.%N` can be used to parse `11:12:13.123456789`
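
As a sketch, a field combining `%N` with other strftime codes (the field name is hypothetical):

```yaml
# Parses values like "11:12:13.123456789"
- name: ts
  type: timestamp
  timeFormats:
    - "%H:%M:%S.%N"
```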

#### Using multiple time formats

When multiple time formats are defined, each format is tried in order until one parses the value successfully:

```yaml
- name: ts
  type: timestamp
  timeFormats:
    - rfc3339
    - unix
```

Timestamp values can be marked with `isEventTime: true` to tell Panther that it should use this timestamp as the `p_event_time` field. It is possible to set `isEventTime` on multiple fields. This may be useful in situations where logs have optional or mutually exclusive fields holding event time information. Since there can only be a single `p_event_time` for every log event, the priority is defined using the order of fields in the schema.
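
For example, in the sketch below (with hypothetical field names), `event_time` takes priority over `ingest_time` as `p_event_time` because it appears first in the schema:

```yaml
- name: event_time
  type: timestamp
  timeFormats:
    - rfc3339
  isEventTime: true
- name: ingest_time
  type: timestamp
  timeFormats:
    - unix
  isEventTime: true
```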

#### Working with `timeFormats` in schema tests

When writing schema tests to be run with the [`pantherlog test` command](https://docs.panther.com/panther-developer-workflows/pantherlog#test-run-tests-for-a-schema):

* If your schema field has a single `timeFormats` value, the timestamp in the `result` payload keeps the same format as the input (for backwards compatibility).
* If your schema field has multiple `timeFormats` values, you must write the timestamp field value in the `result` payload formatted as `YYYY-MM-DD HH:MM:SS.fffffffff`.

Example with a single `timeFormats` value:

```yaml
- name: singleFormatTimestamp
  type: timestamp
  timeFormats:
    - unix
```

```yaml
input: >
  {
    "singleFormatTimestamp": "1666613239"
  }
result: >
  {
    "singleFormatTimestamp": "1666613239"
  }
```

Example with multiple `timeFormats` values:

```yaml
- name: multipleFormatTimestamp
  type: timestamp
  timeFormats:
    - unix
    - rfc3339
```

```yaml
input: >
  {
    "multipleFormatTimestamp": "1666613239"
  }
result: >
  {
    "multipleFormatTimestamp": "2022-10-24 12:07:19.459326000"
  }
```

### Indicators

Values of `string` type can be used as "indicators." To mark a field as an indicator, set the `indicators` field to an array of indicator scanner names (more than one may be used). This will instruct Panther to store the value of this field in the relevant `p_any_` field.

For a list of values that are valid to use in the `indicators` field, see [Standard Fields](https://docs.panther.com/search/panther-fields#indicator-fields).

For example:

```yaml
# Will scan the value as an IP address and store it in `p_any_ip_addresses`
- name: remote_ip
  type: string
  indicators: [ ip ]

# Will scan the value as a domain name and/or IP address.
# Will store the result in `p_any_domain_names` and/or `p_any_ip_addresses`
- name: target_url
  type: string
  indicators: [ url ]
```

### Validate

Under the `validate` key, you can specify conditions for a field's value that must be met in order for an incoming log to match this schema.

It's also possible to use `validate` on the `element` key (where `type: string`) to perform validation on each element of an array value.

#### `allow` and `deny` validation

You can validate values of `string` type by declaring an allowlist or denylist. Only logs with field values that match (or do not match) the values in `allow`/`deny` will be parsed with this schema. This means you can have multiple log types that have common overlapping fields but differ on values of those fields.

```yaml
# Will only allow 'login' and 'logout' event_type values to match this log type
- name: event_type
  type: string
  validate:
    allow: [ "login", "logout"]
    
# Will match if log has any event_type value other than 'login' and 'logout'
- name: event_type
  type: string
  validate:
    deny: [ "login", "logout"]
    
# Can also be used with string array elements    
# Will match logs with a severities field with value 'info' or 'low' 
- name: severities
  type: array
  element:
    type: string
    validate:
      allow: ["info", "low"]
```

#### `allowContains` and `denyContains` validation

You can validate that string values contain or do not contain specific substrings using `allowContains` and `denyContains`. This is useful when you need to match log types based on partial string content rather than exact values.

```yaml
# Will only match logs where message value contains 'error' or 'fail'
- name: message
  type: string
  validate:
    allowContains: ["error", "fail"]
    
# Will match logs where message value does not contain 'password' or 'secret'
- name: message
  type: string
  validate:
    denyContains: ["password", "secret"]
    
# Can also be used with string array elements
# Will match logs with a tags value containing 'critical' or 'warning'
- name: tags
  type: array
  element:
    type: string
    validate:
      allowContains: ["critical", "warning"]
```

#### `ip` and `cidr` format validation

Values of `string` type can be restricted to match well-known formats. Currently, Panther supports the `ip` and `cidr` formats to require that a string value be a valid IP address or CIDR range.

`ip` and `cidr` validation can be combined with `allow`, `deny`, `allowContains`, or `denyContains` rules, but doing so is somewhat redundant. For example, if you allow two IP addresses, adding an `ip` validation simply ensures that your validation will not produce false positives if the IP addresses in your list are not valid.

```yaml
# Will allow valid ipv4 IP addresses e.g. 100.100.100.100
- name: address
  type: string
  validate:
    ip: "ipv4"
    
# Will allow valid ipv6 CIDR ranges 
# e.g. 2001:0db8:85a3:0000:0000:0000:0000:0000/64
- name: address
  type: string
  validate:
    cidr: "ipv6"
    
# Will allow any valid ipv4 or ipv6 address
- name: address
  type: string
  validate:
    ip: "any"
    
# All elements of the addresses array must be valid ipv4 IP addresses
- name: addresses
  type: array
  element:
    type: string
    validate:
      ip: "ipv4"
```

## Using JSON schema in an IDE

If your code editor or integrated development environment (IDE) supports [JSON Schema](https://json-schema.org/), you can configure it to use [this schema file](https://panther-community-us-east-1.s3.amazonaws.com/latest/logschema/schema.json) for Panther schemas and [this schema-tests file](https://panther-community-us-east-1.s3.amazonaws.com/latest/logschema/schema-tests.json) for schema tests. Doing so will allow you to receive suggestions and error messages while developing Panther schemas and their tests.

### JetBrains custom JSON schemas

See the [JetBrains documentation](https://www.jetbrains.com/help/phpstorm/json.html#ws_json_schema_add_custom) for instructions on how to configure JetBrains IDEs to use custom JSON Schemas.

### VSCode custom JSON schemas

See the [VSCode documentation](https://code.visualstudio.com/Docs/languages/json#_json-schemas-and-settings) for instructions on how to configure VSCode to use JSON Schemas.
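
Because Panther schemas are YAML files, one way to apply the schema in VSCode is through the [Red Hat YAML extension](https://marketplace.visualstudio.com/items?itemName=redhat.vscode-yaml), which applies JSON Schemas to YAML documents. Below is a minimal sketch of a `settings.json` entry; the `schemas/*.yml` glob is a hypothetical path to your schema files.

```json
{
  "yaml.schemas": {
    "https://panther-community-us-east-1.s3.amazonaws.com/latest/logschema/schema.json": "schemas/*.yml"
  }
}
```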

## Stream type

While performing certain actions in the Panther Console, such as [configuring an S3 bucket for Data Transport](https://docs.panther.com/data-onboarding/data-transports/aws/s3) or [inferring a custom schema from raw logs](https://docs.panther.com/data-onboarding/custom-log-types/..#infer-a-schema-from-raw-data), you need to select a log stream type.

View example log events for each type below.

<table><thead><tr><th width="144">Stream type</th><th width="205">Description</th><th>Example log event(s)</th></tr></thead><tbody><tr><td>Auto</td><td>Panther will automatically detect the appropriate stream type.</td><td>n/a</td></tr><tr><td>Lines</td><td>Events are separated by a new line character.</td><td><pre class="language-json"><code class="lang-json">"10.0.0.1","user-1@example.com","France"
"10.0.0.2","user-2@example.com","France"
"10.0.0.3","user-3@example.com","France"
</code></pre></td></tr><tr><td>JSON</td><td>Events are in JSON format.</td><td><pre class="language-json"><code class="lang-json">{ 
    "ip": "10.0.0.1", 
    "un": "user-1@example.com", 
    "country": "France" 
}
OR
{ "ip": "10.0.0.1", "un": "user-1@example.com", "country": "France" }{ "ip": "10.0.0.2", "un": "user-2@example.com", "country": "France" }{ "ip": "10.0.0.3", "un": "user-3@example.com", "country": "France" }
OR
{ "ip": "10.0.0.1", "un": "user-1@example.com", "country": "France" }
{ "ip": "10.0.0.2", "un": "user-2@example.com", "country": "France" }
{ "ip": "10.0.0.3", "un": "user-3@example.com", "country": "France"OR
</code></pre></td></tr><tr><td>JSON Array</td><td><p>Events are inside an array of JSON objects.<br></p><p>OR<br><br>Events are inside an array of JSON objects, which is the value of a key in a top-level object. This is known as an "enveloped array."</p></td><td><pre class="language-json"><code class="lang-json">[
	{ "ip": "10.0.0.1", "username": "user-1@example.com", "country": "France" },
	{ "ip": "10.0.0.2", "username": "user-2@example.com", "country": "France" },
	{ "ip": "10.0.0.3", "username": "user-3@example.com", "country": "France" }
]
OR
{ "events": [
        { "ip": "10.0.0.1", "username": "user-1@example.com", "country": "France" },
	{ "ip": "10.0.0.2", "username": "user-2@example.com", "country": "France" },
	{ "ip": "10.0.0.3", "username": "user-3@example.com", "country": "France" }
    ] 
}
</code></pre></td></tr><tr><td>CloudWatch Logs</td><td>Events come from CloudWatch Logs.</td><td><pre class="language-json"><code class="lang-json">{
  "owner": "111111111111",
  "logGroup": "services/foo/logs",
  "logStream": "111111111111_CloudTrail/logs_us-east-1",
  "messageType": "DATA_MESSAGE",
  "logEvents": [
      {
          "id": "31953106606966983378809025079804211143289615424298221568",
          "timestamp": 1432826855000,
          "message": "{\"ip\": \"10.0.0.1\", \"user\": \"user-1@example.com\", \"country\": \"France\"}"
      },
      {
          "id": "31953106606966983378809025079804211143289615424298221569",
          "timestamp": 1432826855000,
          "message": "{\"ip\": \"10.0.0.2\", \"user\": \"user-2@example.com\", \"country\": \"France\"}"
      },
      {
          "id": "31953106606966983378809025079804211143289615424298221570",
          "timestamp": 1432826855000,
          "message": "{\"ip\": \"10.0.0.3\", \"user\": \"user-3@example.com\", \"country\": \"France\"}"
      }
  ]
}
</code></pre></td></tr><tr><td>XML</td><td>Events are in XML format. Events are positioned at the top level or <a href="#xml-root-element-support">enclosed in a root element</a>. Learn more about how XML is parsed in <a href="#xml-stream-type">XML stream type</a>.</td><td><pre class="language-xml"><code class="lang-xml">&#x3C;log>
    &#x3C;id>1&#x3C;/id>
    &#x3C;data>first log&#x3C;/data>
&#x3C;/log>
&#x3C;log>
    &#x3C;id>2&#x3C;/id>
    &#x3C;data>second log&#x3C;/data>
&#x3C;/log>
OR
&#x3C;logs>
    &#x3C;log>
        &#x3C;id>1&#x3C;/id>
        &#x3C;data>first log&#x3C;/data>
    &#x3C;/log>
    &#x3C;log>
        &#x3C;id>2&#x3C;/id>
        &#x3C;data>second log&#x3C;/data>
    &#x3C;/log>
&#x3C;/logs>
</code></pre></td></tr></tbody></table>

### JSON Array stream type

With the JSON Array stream type, you can indicate whether the array of events is an "enveloped array"—i.e., whether the array is the value of a key in a top-level object.

{% hint style="warning" %}
The JSON Array "enveloped array" option is not supported when testing a schema against a set of logs in the Panther Console.
{% endhint %}

<figure><img src="https://4011785613-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LgdiSWdyJcXPahGi9Rs-2910905616%2Fuploads%2FtATi0eJ6kqV0nddRlcvd%2FScreenshot%202025-11-28%20at%201.37.02%E2%80%AFPM.png?alt=media&#x26;token=046f0643-f2e2-4faf-860f-ec037745dc05" alt="" width="563"><figcaption></figcaption></figure>

### CloudWatch Logs stream type

With the CloudWatch Logs stream type, you can optionally enable [**envelope field retention**](https://docs.panther.com/data-transports/aws/cloudwatch#envelope-field-retention) to preserve the top-level envelope metadata (such as `owner`, `logGroup`, and `logStream`) in a `p_header` field on each processed event.

A CloudWatch Logs subscription delivers events in an envelope like this:

```json
{
  "owner": "111111111111",
  "logGroup": "services/foo/logs",
  "logStream": "111111111111_CloudTrail/logs_us-east-1",
  "messageType": "DATA_MESSAGE",
  "logEvents": [
    {
      "id": "31953106606966983378809025079804211143289615424298221568",
      "timestamp": 1432826855000,
      "message": "{\"ip\": \"10.0.0.1\", \"username\": \"user-1@example.com\", \"country\": \"France\"}"
    },
    {
      "id": "31953106606966983378809025079804211143289615424298221569",
      "timestamp": 1432826855000,
      "message": "{\"ip\": \"10.0.0.2\", \"username\": \"user-2@example.com\", \"country\": \"France\"}"
    }
  ]
}
```

When envelope field retention is enabled, each processed event includes the envelope metadata in the `p_header` field:

```json
{
  "ip": "10.0.0.1",
  "username": "user-1@example.com",
  "country": "France",
  "p_header": {
    "owner": "111111111111",
    "logGroup": "services/foo/logs",
    "logStream": "111111111111_CloudTrail/logs_us-east-1"
  }
}
```

See [Envelope field retention](https://docs.panther.com/data-transports/aws/cloudwatch#envelope-field-retention) for instructions to enable this.

### XML stream type

When parsing XML log events, Panther converts XML elements to JSON objects—at a high level, element names are turned into keys and text content becomes values. [Learn more about how to create a custom schema for XML logs here](https://docs.panther.com/data-onboarding/custom-log-types/..#writing-schemas).

#### **XML root element support**

Panther supports parsing XML files where log events are enclosed within a root element (in addition to supporting files where events are top-level elements). When you specify a root element, Panther will extract individual events contained within it, processing each child element as a separate log event.

To parse events enclosed in a root element, follow these steps when selecting the stream type in Panther:

1. For the stream type, select **XML**.
2. Set the **Are the XML events enclosed in a root element?** toggle to **Yes**.
3. In the **XML Root Element** field, enter the root element name (e.g., `logs`, `events`, `data`).

   <figure><img src="https://4011785613-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LgdiSWdyJcXPahGi9Rs-2910905616%2Fuploads%2Fgit-blob-620f522780e1f5478f8727cf21fe2a2540f2233d%2FScreenshot%202025-09-15%20at%204.29.54%E2%80%AFPM.png?alt=media" alt="Various form fields are circled: A radio button labeled &#x22;XML,&#x22; a toggle set to &#x22;Yes,&#x22; a text field labeled &#x22;XML Root Element,&#x22; etc."><figcaption></figcaption></figure>

#### **XML processing rules**

In greater detail, here is how Panther processes an XML file:

* Each top-level XML element is processed as a separate event, unless a [root element is specified](#xml-root-element-support), in which case Panther extracts events from within that element.
  * Nested elements are enclosed in a nested object.
* Element names become field names.
  * If multiple elements in the same level of nesting have the same name, an array field is created where the key is the shared element name, and the value is an array of the elements' contents (i.e., text content, attributes, nested fields, etc.).
* Text content becomes field values.
  * If an element has only text content (and no attributes or nested elements), the text content is parsed directly as the field value.
  * If an element has both 1) text content and 2) at least one attribute or nested element, the text content is stored as the value of a `text` key.
  * If an element is empty (i.e., has no text content), it's given a value of `null`.
  * If text content is broken up by one or more elements, it is concatenated with a space between each part.
* Element attributes (e.g., `<User role="admin">`) are added as key/value pairs alongside text content in a shared nested object.
  * If an attribute name conflicts with the name of a nested element (which will become a field name in the resulting nested object, like the attribute name), the attribute name is given an `_attr` suffix. If the attribute name is `text` and the element has text content (which will generate a nested `text` key), the attribute field will become `text_attr`.
  * The `xmlns` attribute (declaring an XML namespace) is automatically skipped.

Example XML input:

```xml
<logs> 
    <log>
        <id>1</id>
        <data>first log</data>
        <user></user>
    </log>
    <log>
        <id>2</id>
        <data level="info">second log</data>
    </log>
    <log>
        <id>3</id>
        <data>
            text before element
            <source>app</source>
            text after element
        </data>
    </log>
    <log>
        <data>fourth log
            <severity text="sev">high</severity>
        </data>
    </log>
    <log>
        <data id="123" name="John">
            <name>John1</name>
            <name>John2</name>
        </data>
    </log>
</logs>
```

Panther processes this as:

```json
{"id":"1","data":"first log","user":null}
{"id":"2","data":{"text":"second log","level":"info"}}
{"id":"3","data":{"text":"text before element text after element","source":"app"}}
{"data":{"text":"fourth log","severity":{"text_attr":"sev","text":"high"}}}
{"data":{"id":"123","name_attr":"John","name":["John1","John2"]}}
```
