Join Operator

Overview

Enrich data with join.

| join kind:<kind> <dest>=(<target_query>) on <condition>

When a join is performed, rows from the pipe are matched up with rows from the <target_query> based on the <condition>. The resulting data contains all of the original fields from the pipe, with the fields from the <target_query> added to the <dest> field. The rows returned are controlled with <kind>.

<target_query>

The target_query is the data that will be joined. It can be any valid query, such as simply a table name (some_table) or a more complex statement (some_table | extend id = a + b).

<condition>

The <condition> is an expression that is used to control which rows in the pipe are joined to rows in the <target_query>. It is required for every join <kind> except cross joins. Joins have a $left side (the pipe) and a $right side (the <target_query>). For example, a condition of $left.id == $right.id will match rows that contain the same value for id.

<kind>

Control which rows are emitted with kind.

kind value
Description

inner

Return only rows that match on condition.

leftouter

Return every row from the pipe enriched with the fields from the target_query, rows with no match have null for the columns from target_query.

rightouter

Return every row from the target_query enriched with the fields from the pipe, rows with no match have null for the columns from the pipe.

fullouter

Return rows from both the pipe and the target_query, even those with no match.

cross

Return every row in the pipe combined with every row from the target_query .

Examples

Join on IP addresses

Let's say our pipe contains IP addresses in the sourceIP field and we want to enrich it with data from a table named ip_location containing IP addresses in a field named ip.

We can use a <condition> of $left.sourceIp == $right.ip to match rows based on IP addresses and kind:leftouter to return all rows from the pipe—even those that might be missing from ip_location.

<source_table>
| join kind:leftouter ip=(ip_location) on $left.sourceIP == $right.ip

Join with tor_exit_nodes table

This example uses an inner join to find aws_alb logs that have a clientIp that is listed in the Panther-managed tor_exit_node Enrichment Provider table.

aws_alb
| join kind:inner tor=(panther_lookups.public.tor_exit_nodes) on        
       $left.clientIp == $right.ip
| limit 10

Join with ipinfo_location_datalake table using UDFs

Here is a more complex example, in which we'll enrich ALB logs with the Panther-managed ipinfo_location_datalake Enrichment Provider table.

Note that this query uses the snowflake.func function to call the SQL user-defined functions (UDFs) panther_lookups.public.ipinfo_to_join_key() and panther_lookups.public.ipinfo_to_int(), which take an IP address as an argument. Learn more about these UDFs on IPinfo.

aws_alb
| where p_event_time > time.ago(1d)
| join kind:leftouter ip=(panther_lookups.public.ipinfo_location_datalake) on        
       snowflake.func('panther_lookups.public.ipinfo_to_join_key', $right.joinKey) == snowflake.func('panther_lookups.public.ipinfo_to_join_key', $left.clientIp)
       and snowflake.func('panther_lookups.public.ipinfo_to_int', $left.clientIp) between 
       snowflake.func('panther_lookups.public.ipinfo_to_int', $right.startIP) .. snowflake.func('panther_lookups.public.ipinfo_to_int', $right.endIP)
| project p_event_time, clientIp, city=ip.city, country=ip.country
| limit 10

Join using datatable

It's possible to inject sample data into a join using the datatable operator:

aws_alb
| join kind:inner system_info=(datatable [{"ip":"192.168.1.1", "hostname":"fluffy"}, {"ip":"192.168.1.2", "hostname":"squishy"}]) on        
       $left.clientIp == $right.ip
| limit 10

Last updated