Configuring Databricks for Panther
Overview
This page describes how to configure Databricks for use as your Panther data storage backend. As you complete the steps below, you will collect and store various configuration values, then provide them to Panther.
You should complete the process on this page only after arriving at Step 9 on Setting Up a Cloud Connected Panther Instance.
This process will:
Create a Databricks workspace for Panther (along with associated Databricks infrastructure in AWS)
Create an IAM role in AWS to allow Databricks to read from the Panther S3 staging bucket.
Create an external storage credential.
Create an external storage integration so Databricks can read data from S3 for loading.
Create service principals—one for loading (read/write) and one for querying (read-only).
Create secrets with KMS keys in AWS to hold OAuth credentials for the service principals.
Create a catalog in Databricks for Panther tables, with permissions for the service principals.
Create load, optimize, query, and scheduled query warehouses.
How to configure Databricks for Panther
Prerequisites
You have a Databricks account.
You have completed the instructions on Setting Up a Cloud Connected Panther Instance and can log in to the Panther Console.
You are logged into the AWS console in the AWS account you'd like to use for Panther compute. This is needed because Databricks will load a CloudFormation template to create a workspace.
This AWS account should not be the AWS account where Panther is hosted.
You have the Databricks and AWS permissions required for this setup (listed on the related permissions pages).
Step 1: Make a copy of the configuration table
Throughout the configuration process, you'll collect values that you'll send to Panther at the end. To organize these values, make a copy of the table below.
databricks_load_role_arn
databricks_load_secret_kms_key_arn
databricks_query_secret_kms_key_arn
databricks_load_secret_arn
databricks_query_secret_arn
databricks_load_warehouse_id
databricks_optimize_warehouse_id
databricks_query_warehouse_id
databricks_scheduled_query_warehouse_id
Step 2: Create a Databricks workspace
The instructions below apply only to Databricks customers on Pay as you go pricing. If you have Committed Use Contracts with Databricks:
If your Databricks account was created after November 8, 2023, instead follow the instructions on the Databricks Create a workspace with custom AWS configurations documentation page.
If your Databricks account was created before November 8, 2023, instead follow the instructions on the Databricks Manually create a workspace (existing Databricks accounts) documentation page.
Log in to the Databricks console.
In the left-hand navigation menu, click Workspaces.
Click Create workspace.

Fill out the Create Workspace modal:
Workspace name: enter a memorable name.
Region: select the region that matches your AWS deployment of Panther.
Storage and compute: select Use your existing cloud account.
How would you like to deploy the workspace?: select Automatically with Quickstart.
Click Continue.
A new browser tab will open in AWS, on a Quick create stack screen with the CloudFormation template pre-loaded.
Without making any changes, deploy the CloudFormation template by clicking Create stack.
Return to your Databricks browser tab, and wait a few minutes for the new workspace to appear in the Workspaces list. When it appears, click Open to enter the workspace environment.
Step 3: Enable variant shredding in your workspace
In your newly created Databricks workspace, in the upper-right corner, click your profile icon, then Previews.
To the right of Variant Shredding for Optimized Read Performance on Semi-Structured Data, click the toggle On.

Step 4: Create a Panther role for the storage credential
In the AWS account where you created the Databricks workspace infrastructure, create an IAM role named panther-databricks-s3-reader-role-<region>, accepting all defaults.
In your Panther Console, retrieve the Processed Data Bucket value:
Click the gear icon (Settings) > General.
Click Data Lake.
Under Databricks Configuration, copy the Processed Data Bucket value.

Update the role's trust relationship:
In the AWS console, in the Roles list, click the newly created role to view its details page.
Click Trust relationships.
Click Edit trust policy.
Replace the JSON in the code editor with the JSON below:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL",
          "arn:aws:iam::<your account for this role>:role/panther-databricks-s3-reader-role-<region>"
        ]
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "TBD"
        }
      }
    }
  ]
}
Click Update policy.
Update the role's permissions:
On the role's details page, click Permissions.
Click Add permissions > Create inline policy.
In the Policy editor section, click JSON.
Replace the JSON in the code editor with the JSON below:
{
  "Statement": [
    {
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::<Processed Data Bucket from Panther settings>"
    },
    {
      "Action": "s3:GetObject",
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::<Processed Data Bucket from Panther settings>/*"
    }
  ],
  "Version": "2012-10-17"
}
Click Next.
Under Policy details, enter a Policy name.
Click Create policy.
On the role's details page, copy the ARN, and add it as the databricks_load_role_arn value in your configuration table.
Leave the browser window with the role details page open, as you will return to it in Step 6.
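If you prefer to script this step rather than click through the console, the role and both policies above could be created with boto3 roughly as in the sketch below. This is a sketch, not a definitive procedure: the placeholders are the same values described above, the inline policy name is illustrative, and the ExternalId stays "TBD" until you update it in Step 6.

# Sketch: create the Databricks S3 reader role with boto3.
# Placeholders (<...>) must be replaced with your own values; the
# ExternalId stays "TBD" until you patch it in Step 6.
import json
import boto3

iam = boto3.client("iam")

role_name = "panther-databricks-s3-reader-role-<region>"
processed_data_bucket = "<Processed Data Bucket from Panther settings>"

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": [
            "arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL",
            f"arn:aws:iam::<your account for this role>:role/{role_name}",
        ]},
        "Action": "sts:AssumeRole",
        "Condition": {"StringEquals": {"sts:ExternalId": "TBD"}},
    }],
}

s3_read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
            "Resource": f"arn:aws:s3:::{processed_data_bucket}",
        },
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{processed_data_bucket}/*",
        },
    ],
}

role = iam.create_role(
    RoleName=role_name,
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
# The inline policy name below is illustrative -- any name works.
iam.put_role_policy(
    RoleName=role_name,
    PolicyName="panther-processed-data-read",
    PolicyDocument=json.dumps(s3_read_policy),
)

# databricks_load_role_arn for your configuration table:
print(role["Role"]["Arn"])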
Step 5: Create a storage credential
Create a Databricks storage credential to represent the AWS IAM role you just created:
In the Databricks workspace you created above, click Catalog, then External Data.

Click Credentials.
Click Create credential.
Fill in the Create a new credential form:
Credential Type: select AWS IAM Role.
Credential name: enter panther-storage-credential.
IAM role (ARN): enter the ARN of the IAM role you created above (which is databricks_load_role_arn in the configuration table).
Click Create.
On the Credential created page, copy the External ID value, and store it in a secure location, as you will need it in the next step.
Step 6: Update the IAM role trust relationship policy
Return to the AWS console, to the details page for the panther-databricks-s3-reader-role-<region> IAM role you created above.
Click Trust relationships.
Click Edit trust policy.
In the "sts:ExternalId": "TBD" line, replace TBD with the External ID value you copied in Databricks above.
Click Update policy.
Step 7: Create an external storage location
In the Databricks workspace you created above, click Catalog, then External Data.

Click Create external location.
Click Manual, then Next.
Fill in the Create a new external location manually form:
External location name: enter panther-processed-data.
Storage type: select S3.
URL: enter the Processed Data Bucket value you retrieved from the Settings page in the Panther Console in Step 4.
Storage credential: select panther-storage-credential.
Click Create.
You will be routed to a page with a Permission Denied warning box—click Force create.

Step 8: Create a load service principal in Databricks
Access your Databricks workspace settings:
In the upper-right corner, click your initial.
Click Settings.

In the Settings navigation bar, under Workspace admin, click Identity and access.
To the right of Service principals, click Manage.

Click Add service principal.
In the Add service principal modal, click Add new.
In the Service principal name field, enter panther-load.
Click Add.
In the table, click panther-load to view its details page.
Click Secrets.
Click Generate secret.
Under Lifetime (days), enter 730 (the maximum).
Click Generate.
Copy the Secret and Client ID values and store them in a secure location, as you'll need them in a later step (as an alternative to copying these values, you can leave this browser tab open).
Step 9: Create a load secret KMS key in AWS
In your AWS console, ensure you are in the correct region. Navigate to Key Management Service.
In the left-hand navigation menu, click Customer managed keys.
Click Create Key.
Under Key type, select Symmetric. Under Key usage, select Encrypt and decrypt.
Click Next.
Enter an Alias value, then click Next.
Under Key administrators, optionally select users and/or roles, then click Next.
On the Define key usage permissions - optional page, under Other AWS accounts, click Add another AWS account.
In the field that appears, enter the AWS account ID for the account your Panther deployment is in. You can find this value in the Panther Console, in the general settings footer.
Click Next.
Switch to a browser tab with the Panther Console open, and retrieve the Delta Controller Role ARN and Delta Admin Role ARN values:
Click the gear icon (Settings) > General.
Click Data Lake.
Under Databricks Configuration, note the Delta Controller Role ARN and Delta Admin Role ARN values.

In the AWS console, under Key policy, click Edit, then replace the JSON in the code editor with the JSON below:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Panther",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "<Delta Controller Role ARN from Panther settings>",
          "<Delta Admin Role ARN from Panther settings>"
        ]
      },
      "Action": "kms:Decrypt",
      "Resource": "*"
    },
    {
      "Sid": "root",
      "Effect": "Allow",
      "Action": [
        "kms:*"
      ],
      "Resource": "*",
      "Principal": {
        "AWS": "arn:aws:iam::<AWS Account ID you are working in>:root"
      }
    }
  ]
}
Click Next.
On the Review page, review the configuration, then click Finish.
In the Customer managed keys list, click the alias of the key you just created to view its details page.
Copy the key ARN into the table above for the databricks_load_secret_kms_key_arn row.
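For reference, the key could also be created and its policy attached with boto3 along the lines of the sketch below. The alias name is illustrative, and the role ARN and account ID placeholders are the same values described above; note that the sketch applies the key policy directly instead of walking through the console's key administrator and key usage screens.

# Sketch: create the load-secret KMS key with boto3 and attach the policy above.
import json
import boto3

kms = boto3.client("kms")

key_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Panther",
            "Effect": "Allow",
            "Principal": {"AWS": [
                "<Delta Controller Role ARN from Panther settings>",
                "<Delta Admin Role ARN from Panther settings>",
            ]},
            "Action": "kms:Decrypt",
            "Resource": "*",
        },
        {
            "Sid": "root",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::<AWS Account ID you are working in>:root"},
            "Action": "kms:*",
            "Resource": "*",
        },
    ],
}

key = kms.create_key(
    KeySpec="SYMMETRIC_DEFAULT",
    KeyUsage="ENCRYPT_DECRYPT",
    Policy=json.dumps(key_policy),
)
key_arn = key["KeyMetadata"]["Arn"]

# The alias name is illustrative -- pick any alias you like.
kms.create_alias(
    AliasName="alias/panther-databricks-load-secret",
    TargetKeyId=key_arn,
)

# databricks_load_secret_kms_key_arn for your configuration table:
print(key_arn)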
Step 10: Create a load secret in AWS
In your AWS console, ensure you are in the correct region. Navigate to Secrets Manager.
You should not be in the AWS account hosting your Panther infrastructure.
Click Store a new secret.
Under Secret type, select Other type of secret.
Under Key/value pairs, in the Key/value tab, enter the following key/value pairs:
secret: <the Secret value you generated in Databricks in Step 8>
client-id: <the Client ID value you generated in Databricks in Step 8>
databricks-host: <the URL of your Databricks workspace>. While viewing the workspace you created above in your Databricks console, copy the URL of the page, for example https://dbc-023ca860-3666.cloud.databricks.com.
Under Encryption key, select the databricks_load_secret_kms_key_arn KMS key you created in the previous step.
Click Next.
In the Secret name field, enter panther-databricks-admin-access, then click Next.
Without making any changes on the Configure rotation - optional page, click Next.
Review the secret settings, then click Store.
In the Secrets list, click panther-databricks-admin-access, to view its details page.
In the Resource permissions tile, click Edit permissions.
Under Resource permissions, replace the JSON in the code editor with the JSON below:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Panther",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "<Delta Controller Role ARN from Panther settings>",
          "<Delta Admin Role ARN from Panther settings>"
        ]
      },
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "*"
    }
  ]
}
Click Save.
Copy the ARN of the newly created secret and add it as the databricks_load_secret_arn value in your configuration table.
In the Databricks console, return to the External Data page (click Catalog > External Data).
Under External Locations, click the panther-processed-data location you created above.
Click Permissions.
Click Grant.
Under Principals, search for and select panther-load.
Under Privileges, check the boxes for BROWSE and READ FILES.
Click Confirm.
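The AWS-side portion of this step (creating the secret and attaching its resource policy) could also be scripted with boto3, as in the sketch below. The placeholders are the same values described above, and the external location grant for panther-load still has to be done in the Databricks UI.

# Sketch: create the panther-databricks-admin-access secret with boto3
# and attach the resource policy shown above.
import json
import boto3

sm = boto3.client("secretsmanager")

secret = sm.create_secret(
    Name="panther-databricks-admin-access",
    KmsKeyId="<databricks_load_secret_kms_key_arn>",
    SecretString=json.dumps({
        "secret": "<the Secret value you generated in Databricks in Step 8>",
        "client-id": "<the Client ID value you generated in Databricks in Step 8>",
        "databricks-host": "<the URL of your Databricks workspace>",
    }),
)

resource_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "Panther",
        "Effect": "Allow",
        "Principal": {"AWS": [
            "<Delta Controller Role ARN from Panther settings>",
            "<Delta Admin Role ARN from Panther settings>",
        ]},
        "Action": "secretsmanager:GetSecretValue",
        "Resource": "*",
    }],
}
sm.put_resource_policy(
    SecretId=secret["ARN"],
    ResourcePolicy=json.dumps(resource_policy),
)

# databricks_load_secret_arn for your configuration table:
print(secret["ARN"])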
Step 11: Create a query service principal in Databricks
Access your Databricks workspace settings:
In the upper-right corner, click your initial.
Click Settings.

In the Settings navigation bar, under Workspace admin, click Identity and access.
To the right of Service principals, click Manage.

Click Add service principal.
In the Add service principal modal, click Add new.
In the Service principal name field, enter panther-query.
Click Add.
In the table, click panther-query to view its details page.
Click Secrets.
Click Generate secret.
Under Lifetime (days), enter 730 (the maximum).
Click Generate.
Copy the Secret and Client ID values and store them in a secure location, as you'll need them in a later step (as an alternative to copying these values, you can leave this browser tab open).
Step 12 (Optional): Create a query secret KMS key
In the next step, you'll create an additional secret in AWS. You can either create a new KMS key to associate with this secret, or reuse the KMS key you created in Step 9 (added to your configuration table as databricks_load_secret_kms_key_arn).
If you'd like to reuse the KMS key you created above, copy the value of databricks_load_secret_kms_key_arn to databricks_query_secret_kms_key_arn in the configuration table.
If you'd like to create a new KMS key, repeat Step 9: Create a load secret KMS key in AWS, then add the ARN for the key as databricks_query_secret_kms_key_arn in the configuration table.
Step 13: Create a query secret in AWS
In your AWS console, ensure you are in the correct region. Navigate to Secrets Manager.
This should NOT be created in the AWS account hosting your Panther infrastructure.
Click Store a new secret.
Under Secret type, select Other type of secret.
Under Key/value pairs, in the Key/value tab, enter the following key/value pairs:
secret: <the Secret value you generated in Databricks in Step 11>
client-id: <the Client ID value you generated in Databricks in Step 11>
databricks-host: <the URL of your Databricks workspace>. While viewing the workspace you created above in your Databricks console, copy the URL of the page, for example https://dbc-023ca860-3666.cloud.databricks.com.
Under Encryption key, select the databricks_query_secret_kms_key_arn KMS key you created in the previous step (or the databricks_load_secret_kms_key_arn KMS key, if you are reusing that one).
Click Next.
In the Secret name field, enter panther-databricks-query-access, then click Next.
Without making any changes on the Configure rotation - optional page, click Next.
Review the settings, then click Store.
In the Secrets list, click panther-databricks-query-access, to view its details page.
In the Resource permissions tile, click Edit permissions.
Under Resource permissions, replace the JSON in the code editor with the JSON below:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Panther",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "<Delta Controller Role ARN from Panther settings>",
          "<Delta Admin Role ARN from Panther settings>"
        ]
      },
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "*"
    }
  ]
}
Click Save.
Copy the ARN of the newly created secret and add it as the databricks_query_secret_arn value in your configuration table.
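If you scripted Step 10, the same pattern applies here with the query service principal's values and secret name swapped in (again a sketch with placeholders to fill in):

# Sketch: same pattern as the load secret, using the query
# service principal's credentials and the query KMS key.
import json
import boto3

sm = boto3.client("secretsmanager")

secret = sm.create_secret(
    Name="panther-databricks-query-access",
    KmsKeyId="<databricks_query_secret_kms_key_arn>",
    SecretString=json.dumps({
        "secret": "<the Secret value you generated in Databricks in Step 11>",
        "client-id": "<the Client ID value you generated in Databricks in Step 11>",
        "databricks-host": "<the URL of your Databricks workspace>",
    }),
)
# Attach the same GetSecretValue resource policy as in Step 10, then record
# secret["ARN"] as databricks_query_secret_arn in your configuration table.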
Step 14: Create an S3 bucket and external location
In your AWS console, ensure you are in the correct region. Navigate to S3.
You should not be in the AWS account hosting your Panther infrastructure.
Click Create bucket.
Enter a Bucket name.
Click Create bucket.
In the Databricks workspace you created above, click Catalog, then External Data.

Click Create external location.
Click AWS Quickstart (Recommended), then Next.
In the Bucket Name field, enter the name of the bucket you just created.
Under Personal Access Token, click Generate new token.
Copy this value, as you'll need it in the following steps. Alternatively, you can leave this page open.
Click Launch in Quickstart.
A new browser tab will open in AWS, on a Quick create stack screen with the CloudFormation template pre-loaded.
In the Parameters section, in the Databricks Personal Access Token field, enter the Personal Access Token you generated above in Databricks.
Click Create stack.
After the stack has completed deploying, return to your Databricks console browser tab. On the Create external location with Quickstart screen, click Ok.

Verify that the External Locations list contains the one you just created.
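The bucket itself could also be created with boto3; the sketch below assumes a region other than us-east-1 (where the LocationConstraint must be omitted):

# Sketch: create the catalog storage bucket with boto3.
import boto3

region = "<your AWS region>"
s3 = boto3.client("s3", region_name=region)

# In us-east-1, omit CreateBucketConfiguration entirely.
s3.create_bucket(
    Bucket="<your bucket name>",
    CreateBucketConfiguration={"LocationConstraint": region},
)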
Step 15: Create a Databricks catalog
In the Databricks workspace you created above, click Catalog.
Click Add data > Create a catalog.

Fill in the Create a new catalog form:
Catalog name: enter panther.
Type: select Standard.
Select external location: choose the external location you created in Step 14.
Do not choose panther-processed-data.

Click Create.
On the Catalog created! modal, click View catalog.
Click Permissions.
Click Grant.
In the Grant on panther modal, fill in the form:
Principals: type and select panther-load.
Select the following permissions:
USE CATALOG
USE SCHEMA
BROWSE
SELECT
MODIFY
CREATE SCHEMA
CREATE TABLE

Click Confirm.
Click Grant.
In the Grant on panther modal, fill in the form:
Principals: type and select panther-query.
Select the following permissions:
USE CATALOG
USE SCHEMA
BROWSE
SELECT

Click Confirm.
Step 16: Create a panther-load SQL warehouse
In the Databricks workspace you created above, click Compute.
Click SQL warehouses.
Click Create SQL warehouse.
Fill out the New SQL warehouse form:
Name: enter panther-load.
Cluster size: select 2X-Small.
Scaling: set the Max value to 40 (the maximum allowed).
Type: select Pro.
Do not use Serverless.
Click Create.
In the Manage permissions modal, add the panther-load service principal, then select the Can use permission.

Click Add.
Click the X in the upper-right corner to close the Manage permissions modal.
On the panther-load warehouse details page, copy the ID (next to the name) and add it as the databricks_load_warehouse_id value in your configuration table.
Step 17: Create a panther-optimize SQL warehouse
This warehouse runs nightly table maintenance jobs.
In the Databricks workspace you created above, click Compute.
Click SQL warehouses.
Click Create SQL warehouse.
Fill out the New SQL warehouse form:
Name: enter panther-optimize.
Cluster size: select 2X-Small.
Scaling: set the Max value to 40 (the maximum allowed).
Type: select Serverless.
Click Create.
In the Manage permissions modal, add the panther-load service principal, then select the Can use permission.

Click Add.
Click the X in the upper-right corner to close the Manage permissions modal.
On the panther-optimize warehouse details page, copy the ID (next to the name) and add it as the databricks_optimize_warehouse_id value in your configuration table.
Step 18: Create a panther-query SQL warehouse
In the Databricks workspace you created above, click Compute.
Click SQL warehouses.
Click Create SQL warehouse.
Fill out the New SQL warehouse form:
Name: enter panther-query.
Cluster size: select Medium.
Scaling: set the Max value to 40 (the maximum allowed).
Type: select Serverless or Pro.
Click Create.
In the Manage permissions modal, add the panther-query service principal, then select the Can use permission.

Click Add.
Click the X in the upper-right corner to close the Manage permissions modal.
On the panther-query warehouse details page, copy the ID (next to the name) and add it as the databricks_query_warehouse_id value in your configuration table.
Step 19: Create a panther-scheduled-query SQL warehouse
In the Databricks workspace you created above, click Compute.
Click SQL warehouses.
Click Create SQL warehouse.
Fill out the New SQL warehouse form:
Name: enter panther-scheduled-query.
Cluster size: select 3X-Large.
Scaling: set the Max value to 40 (the maximum allowed).
Type: select Serverless or Pro.
Click Create.
In the Manage permissions modal, add the panther-query service principal, then select the Can use permission.

Click Add.
Click the X in the upper-right corner to close the Manage permissions modal.
On the panther-scheduled-query warehouse details page, copy the ID (next to the name) and add it as the databricks_scheduled_query_warehouse_id value in your configuration table.
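If you'd rather create the four warehouses programmatically, a rough sketch against the Databricks SQL Warehouses REST API (POST /api/2.0/sql/warehouses) is shown below. The endpoint and field names come from the public Databricks API rather than this guide, so verify them against the current API reference before relying on them; authentication uses a personal access token for brevity.

# Sketch: create the four Panther warehouses via the Databricks SQL
# Warehouses REST API. Endpoint and field names are assumptions based on
# the public Databricks API -- verify against current API docs before use.
import requests

HOST = "<the URL of your Databricks workspace>"   # e.g. https://dbc-....cloud.databricks.com
TOKEN = "<a Databricks personal access token with workspace admin rights>"

warehouses = [
    # (name, cluster size, serverless?)
    ("panther-load", "2X-Small", False),            # Pro only, per Step 16
    ("panther-optimize", "2X-Small", True),         # Serverless, per Step 17
    ("panther-query", "Medium", True),              # Serverless or Pro
    ("panther-scheduled-query", "3X-Large", True),  # Serverless or Pro
]

for name, size, serverless in warehouses:
    resp = requests.post(
        f"{HOST}/api/2.0/sql/warehouses",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "name": name,
            "cluster_size": size,
            "min_num_clusters": 1,
            "max_num_clusters": 40,
            "warehouse_type": "PRO",
            "enable_serverless_compute": serverless,
        },
    )
    resp.raise_for_status()
    # Record each returned id in the matching *_warehouse_id row of the table.
    print(name, resp.json()["id"])

Note that the Can use grants for the panther-load and panther-query service principals still need to be applied, either through the UI steps above or the Databricks permissions API.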
Step 20: Send configuration values to Panther
Now that the configuration table you created in Step 1 is completely filled in, share it with the Panther team.
Step 21: Return to the post-setup recommendations
Return to the Post-setup recommendations on Setting Up a Cloud Connected Panther Instance.