The Panther API supports the following data lake operations:
- Listing your data lake databases, tables, and columns
- Executing a data lake (Data Explorer) query using SQL
- Executing a Search query
- Canceling any currently running query
- Fetching the details of any previously executed query
- Listing all currently running or previously executed queries, with optional filters
You can invoke Panther's API using your Console's API Playground or the GraphQL-over-HTTP API. Learn more about these methods on Panther API.
See the sections below for GraphQL queries, mutations, and end-to-end workflow examples around core data lake query operations.
Queries managed via the API must be written in SQL; they cannot use PantherFlow.
Common Data Lake query operations
Below are some of the most common GraphQL data lake query operations in Panther. These examples show the documents you would send with a GraphQL client (or curl) when calling Panther's GraphQL API.
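If you call the GraphQL-over-HTTP API directly, each document below travels as an ordinary POST request whose JSON body carries the document under a `query` key. Here is a minimal sketch using Python's `requests` library; `YOUR_PANTHER_API_URL` and `YOUR_API_KEY` are placeholders for your own values, and the inline document is only an illustration:

```python
# pip install requests
import requests

# A GraphQL-over-HTTP call is a plain POST with a JSON body:
# the document goes under `query`, variables (if any) under `variables`.
response = requests.post(
    "YOUR_PANTHER_API_URL",                 # placeholder: your API URL
    headers={"X-API-Key": "YOUR_API_KEY"},  # placeholder: your API key
    json={"query": "query AllDatabaseEntities { dataLakeDatabases { name description } }"},
)
response.raise_for_status()
print(response.json()["data"])
```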
Database Entities
```graphql
# `AllDatabaseEntities` is a nickname for the operation
query AllDatabaseEntities {
  dataLakeDatabases {
    name
    description
    tables {
      name
      description
      columns {
        name
        description
        type
      }
    }
  }
}
```
```graphql
# `DatabaseEntities` is a nickname for the operation
query DatabaseEntities {
  dataLakeDatabase(name: "panther_logs.public") {
    name
    description
    tables {
      name
      description
      columns {
        name
        description
        type
      }
    }
  }
}
```
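As a quick usage sketch, either document above can be executed with the same `gql` client configuration used in the end-to-end examples further below; the placeholders and the printed summary are illustrative, not part of the API:

```python
# pip install gql aiohttp
from gql import gql, Client
from gql.transport.aiohttp import AIOHTTPTransport

transport = AIOHTTPTransport(
    url="YOUR_PANTHER_API_URL",             # placeholder: your API URL
    headers={"X-API-Key": "YOUR_API_KEY"},  # placeholder: your API key
)
client = Client(transport=transport, fetch_schema_from_transport=True)

# Fetch every database along with its tables
result = client.execute(gql("""
    query AllDatabaseEntities {
        dataLakeDatabases {
            name
            description
            tables {
                name
            }
        }
    }
"""))

for database in result["dataLakeDatabases"]:
    print(f'{database["name"]}: {len(database["tables"])} table(s)')
```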
Executing queries
```graphql
# `IssueDataLakeQuery` is a nickname for the operation
mutation IssueDataLakeQuery {
  executeDataLakeQuery(input: {
    sql: "select * from panther_logs.public.aws_alb limit 50"
  }) {
    id # the unique ID of the query
  }
}
```
```graphql
# `IssueIndicatorSearchQuery` is a nickname for the operation
mutation IssueIndicatorSearchQuery {
  executeIndicatorSearchQuery(input: {
    indicators: ["286103014039", "126103014049"]
    startTime: "2022-04-01T00:00:00.000Z"
    endTime: "2022-04-30T23:59:59.000Z"
    indicatorName: p_any_aws_account_ids # or leave blank for auto-detect
  }) {
    id # the unique ID of the query
  }
}
```
```graphql
# `AbandonQuery` is a nickname for the operation
mutation AbandonQuery {
  cancelDataLakeQuery(input: { id: "1234-5678" }) {
    id # return the ID that got canceled
  }
}
```
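To illustrate how these two mutations fit together, here is a hedged sketch that issues a query and then cancels it using the returned ID. The client setup mirrors the end-to-end examples below; the SQL statement and placeholders are only examples:

```python
# pip install gql aiohttp
from gql import gql, Client
from gql.transport.aiohttp import AIOHTTPTransport

transport = AIOHTTPTransport(
    url="YOUR_PANTHER_API_URL",             # placeholder: your API URL
    headers={"X-API-Key": "YOUR_API_KEY"},  # placeholder: your API key
)
client = Client(transport=transport, fetch_schema_from_transport=True)

# Issue a query and grab its unique ID...
issued = client.execute(gql("""
    mutation IssueDataLakeQuery {
        executeDataLakeQuery(input: {
            sql: "select * from panther_logs.public.aws_alb limit 50"
        }) {
            id
        }
    }
"""))

# ...then cancel it by passing that ID to `cancelDataLakeQuery`
canceled = client.execute(
    gql("""
        mutation AbandonQuery($id: ID!) {
            cancelDataLakeQuery(input: { id: $id }) {
                id # return the ID that got canceled
            }
        }
    """),
    variable_values={"id": issued["executeDataLakeQuery"]["id"]},
)
print("Canceled query", canceled["cancelDataLakeQuery"]["id"])
```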
Fetching results for a data lake or Search query
When you execute a data lake or Search query, it can take anywhere from a few seconds to a few minutes for results to come back. To confirm that the query has completed, you must periodically check its status (polling).
You can use the following query to check the query status, while also fetching its results if available:
```graphql
# `QueryResults` is a nickname for the operation
query QueryResults {
  dataLakeQuery(id: "1234-1234-1234-1234") { # the unique ID of the query
    message
    status
    results {
      edges {
        node
      }
    }
  }
}
```
```graphql
# `QueryResults` is a nickname for the operation
query QueryResults {
  dataLakeQuery(id: "1234-1234-1234-1234") { # the unique ID of the query
    message
    status
    results(input: { cursor: "5678-5678-5678-5678" }) { # the value of `endCursor`
      edges {
        node
      }
      pageInfo {
        endCursor
        hasNextPage
      }
    }
  }
}
```
The expected values of `status` and `results` depend on the query's state (see the polling sketch after this list):

- If the query is still running:
  - `status` will have a value of `running`
  - `results` will have a value of `null`
- If the query has failed:
  - `status` will have a value of `failed`
  - `results` will have a value of `null`, and the error message will be available in the `message` key
- If the query has succeeded:
  - `status` will have a value of `succeeded`
  - `results` will contain the requested page of result nodes
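Based on these statuses, a minimal polling helper might look like the sketch below. The helper name `wait_for_query`, the poll interval, and the placeholders are illustrative; the full pagination flow appears in the end-to-end examples further down:

```python
# pip install gql aiohttp
import time

from gql import gql, Client
from gql.transport.aiohttp import AIOHTTPTransport

transport = AIOHTTPTransport(
    url="YOUR_PANTHER_API_URL",             # placeholder: your API URL
    headers={"X-API-Key": "YOUR_API_KEY"},  # placeholder: your API key
)
client = Client(transport=transport, fetch_schema_from_transport=True)

query_results = gql("""
    query QueryResults($id: ID!) {
        dataLakeQuery(id: $id) {
            message
            status
            results {
                edges {
                    node
                }
            }
        }
    }
""")

def wait_for_query(query_id, poll_seconds=2):
    """Poll a query until it leaves the `running` state, then return
    the first page of result nodes (or raise on failure/cancellation)."""
    while True:
        data = client.execute(query_results, variable_values={"id": query_id})["dataLakeQuery"]
        if data["status"] == "running":
            time.sleep(poll_seconds)  # still running: wait, then poll again
            continue
        if data["status"] != "succeeded":
            # failed or cancelled: surface the error message
            raise Exception(data["message"])
        return [edge["node"] for edge in data["results"]["edges"]]

# usage (illustrative ID): nodes = wait_for_query("1234-1234-1234-1234")
```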
Fetching metadata around a data lake or Search query
In the example above, we requested the results of a Panther query. It is also possible to request additional metadata around the query.
In the following example, we request this metadata along with the first page of results:
```graphql
# `QueryMetadata` is a nickname for the operation
query QueryMetadata {
  dataLakeQuery(id: "1234-1234-1234-1234") { # the unique ID of the query
    name
    isScheduled
    issuedBy {
      ... on User {
        email
      }
      ... on APIToken {
        name
      }
    }
    sql
    message
    status
    startedAt
    completedAt
    results {
      edges {
        node
      }
    }
  }
}
```
Listing data lake and Search queries
```graphql
# `ListDataLakeQueries` is a nickname for the operation
query ListDataLakeQueries {
  dataLakeQueries {
    name
    isScheduled
    issuedBy {
      ... on User {
        email
      }
      ... on APIToken {
        name
      }
    }
    sql
    message
    status
    startedAt
    completedAt
    results { # we're only fetching the first page of results for each query
      edges {
        node
      }
    }
  }
}
```
```graphql
# `ListDataLakeQueries` is a nickname for the operation
query ListDataLakeQueries {
  dataLakeQueries(input: { cursor: "5678-5678-5678-5678" }) { # the value of `endCursor`
    name
    isScheduled
    issuedBy {
      ... on User {
        email
      }
      ... on APIToken {
        name
      }
    }
    sql
    message
    status
    startedAt
    completedAt
    results { # we're only fetching the first page of results for each query
      edges {
        node
      }
    }
    pageInfo {
      endCursor
      hasNextPage
    }
  }
}
```
```graphql
# `ListDataLakeQueries` is a nickname for the operation
query ListDataLakeQueries {
  dataLakeQueries(input: { contains: "aws_alb", isScheduled: true }) {
    name
    isScheduled
    issuedBy {
      ... on User {
        email
      }
      ... on APIToken {
        name
      }
    }
    sql
    message
    status
    startedAt
    completedAt
    results { # we're only fetching the first page of results for each query
      edges {
        node
      }
    }
  }
}
```
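One practical note on the `issuedBy` union: the `... on User` selection returns an `email` key while `... on APIToken` returns a `name` key, so a client can tell the two apart by which key is present in the response. Here is a hedged sketch of that, reusing the client setup from the end-to-end examples (placeholders and printing are illustrative):

```python
# pip install gql aiohttp
from gql import gql, Client
from gql.transport.aiohttp import AIOHTTPTransport

transport = AIOHTTPTransport(
    url="YOUR_PANTHER_API_URL",             # placeholder: your API URL
    headers={"X-API-Key": "YOUR_API_KEY"},  # placeholder: your API key
)
client = Client(transport=transport, fetch_schema_from_transport=True)

result = client.execute(gql("""
    query ListDataLakeQueries {
        dataLakeQueries(input: { contains: "aws_alb", isScheduled: true }) {
            sql
            status
            issuedBy {
                ... on User { email }
                ... on APIToken { name }
            }
        }
    }
"""))

for query in result["dataLakeQueries"]:
    issuer = query.get("issuedBy") or {}
    # a User comes back with an `email` key; an APIToken with a `name` key
    who = issuer.get("email") or issuer.get("name") or "unknown"
    print(f'{who}: {query["status"]}')
```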
End-to-end examples
Below, we will build on the Common Operations examples to showcase an end-to-end flow.
Execute a data lake (Data Explorer) query
```javascript
// npm install graphql graphql-request

import { GraphQLClient, gql } from 'graphql-request';

const client = new GraphQLClient(
  'YOUR_PANTHER_API_URL',
  { headers: { 'X-API-Key': 'YOUR_API_KEY' } }
);

// `IssueQuery` is a nickname for the query. You can fully omit it.
const issueQuery = gql`
  mutation IssueQuery($sql: String!) {
    executeDataLakeQuery(input: { sql: $sql }) {
      id
    }
  }
`;

// `GetQueryResults` is a nickname for the query. You can fully omit it.
const getQueryResults = gql`
  query GetQueryResults($id: ID!, $cursor: String) {
    dataLakeQuery(id: $id) {
      message
      status
      results(input: { cursor: $cursor }) {
        edges {
          node
        }
        pageInfo {
          endCursor
          hasNextPage
        }
      }
    }
  }
`;

(async () => {
  try {
    // an accumulator that holds all result nodes that we fetch
    let allResults = [];
    // a helper to know when to exit the loop
    let hasMore = true;
    // the pagination cursor
    let cursor = null;

    // issue a query
    const mutationData = await client.request(issueQuery, {
      sql: 'select * from panther_logs.public.aws_alb limit 5',
    });

    // Start polling the query until it returns results. From there,
    // keep fetching pages until there are no more left
    do {
      const queryData = await client.request(getQueryResults, {
        id: mutationData.executeDataLakeQuery.id,
        cursor,
      });

      // if it's still running, print a message and keep polling
      if (queryData.dataLakeQuery.status === 'running') {
        console.log(queryData.dataLakeQuery.message);
        continue;
      }

      // if it's not running & it's not completed, then it's
      // either cancelled or it has errored out. In this case,
      // throw an exception
      if (queryData.dataLakeQuery.status !== 'succeeded') {
        throw new Error(queryData.dataLakeQuery.message);
      }

      allResults = [...allResults, ...queryData.dataLakeQuery.results.edges.map(edge => edge.node)];
      hasMore = queryData.dataLakeQuery.results.pageInfo.hasNextPage;
      cursor = queryData.dataLakeQuery.results.pageInfo.endCursor;
    } while (hasMore);

    console.log(`Your query returned ${allResults.length} result(s)!`);
  } catch (err) {
    console.error(err.response);
  }
})();
```
```python
# pip install gql aiohttp

from gql import gql, Client
from gql.transport.aiohttp import AIOHTTPTransport

transport = AIOHTTPTransport(
    url="YOUR_PANTHER_API_URL",
    headers={"X-API-Key": "YOUR_API_KEY"}
)
client = Client(transport=transport, fetch_schema_from_transport=True)

# `IssueQuery` is a nickname for the query. You can fully omit it.
issue_query = gql("""
    mutation IssueQuery($sql: String!) {
        executeDataLakeQuery(input: { sql: $sql }) {
            id
        }
    }
""")

# `GetQueryResults` is a nickname for the query. You can fully omit it.
get_query_results = gql("""
    query GetQueryResults($id: ID!, $cursor: String) {
        dataLakeQuery(id: $id) {
            message
            status
            results(input: { cursor: $cursor }) {
                edges {
                    node
                }
                pageInfo {
                    endCursor
                    hasNextPage
                }
            }
        }
    }
""")

# an accumulator that holds all results that we fetch from all pages
all_results = []
# a helper to know when to exit the loop
has_more = True
# the pagination cursor
cursor = None

# Issue a data lake (Data Explorer) query
mutation_data = client.execute(
    issue_query,
    variable_values={
        "sql": "select * from panther_logs.public.aws_alb limit 5"
    }
)

# Start polling the query until it returns results. From there,
# keep fetching pages until there are no more left
while has_more:
    query_data = client.execute(
        get_query_results,
        variable_values={
            "id": mutation_data["executeDataLakeQuery"]["id"],
            "cursor": cursor
        }
    )

    # if it's still running, print a message and keep polling
    if query_data["dataLakeQuery"]["status"] == "running":
        print(query_data["dataLakeQuery"]["message"])
        continue

    # if it's not running & it's not completed, then it's
    # either cancelled or it has errored out. In this case,
    # raise an exception
    if query_data["dataLakeQuery"]["status"] != "succeeded":
        raise Exception(query_data["dataLakeQuery"]["message"])

    all_results.extend([edge["node"] for edge in query_data["dataLakeQuery"]["results"]["edges"]])
    has_more = query_data["dataLakeQuery"]["results"]["pageInfo"]["hasNextPage"]
    cursor = query_data["dataLakeQuery"]["results"]["pageInfo"]["endCursor"]

print(f"Query returned {len(all_results)} result(s)!")
```
Execute a Search query
```javascript
// npm install graphql graphql-request

import { GraphQLClient, gql } from 'graphql-request';

const client = new GraphQLClient(
  'YOUR_PANTHER_API_URL',
  { headers: { 'X-API-Key': 'YOUR_API_KEY' } }
);

// `IssueQuery` is a nickname for the query. You can fully omit it.
const issueQuery = gql`
  mutation IssueQuery($input: ExecuteIndicatorSearchQueryInput!) {
    executeIndicatorSearchQuery(input: $input) {
      id
    }
  }
`;

// `GetQueryResults` is a nickname for the query. You can fully omit it.
const getQueryResults = gql`
  query GetQueryResults($id: ID!, $cursor: String) {
    dataLakeQuery(id: $id) {
      message
      status
      results(input: { cursor: $cursor }) {
        edges {
          node
        }
        pageInfo {
          endCursor
          hasNextPage
        }
      }
    }
  }
`;

(async () => {
  try {
    // an accumulator that holds all result nodes that we fetch
    let allResults = [];
    // a helper to know when to exit the loop
    let hasMore = true;
    // the pagination cursor
    let cursor = null;

    // issue a query
    const mutationData = await client.request(issueQuery, {
      input: {
        indicators: ["226103014039"],
        startTime: "2022-03-29T00:00:00.001Z",
        endTime: "2022-03-30T00:00:00.001Z",
        indicatorName: "p_any_aws_account_ids"
      }
    });

    // Keep fetching pages until there are no more left
    do {
      const queryData = await client.request(getQueryResults, {
        id: mutationData.executeIndicatorSearchQuery.id,
        cursor,
      });

      // if it's still running, print a message and keep polling
      if (queryData.dataLakeQuery.status === 'running') {
        console.log(queryData.dataLakeQuery.message);
        continue;
      }

      // if it's not running & it's not completed, then it's
      // either cancelled or it has errored out. In this case,
      // throw an exception
      if (queryData.dataLakeQuery.status !== 'succeeded') {
        throw new Error(queryData.dataLakeQuery.message);
      }

      allResults = [...allResults, ...queryData.dataLakeQuery.results.edges.map(edge => edge.node)];
      hasMore = queryData.dataLakeQuery.results.pageInfo.hasNextPage;
      cursor = queryData.dataLakeQuery.results.pageInfo.endCursor;
    } while (hasMore);

    console.log(`Your query returned ${allResults.length} result(s)!`);
  } catch (err) {
    console.error(err.response);
  }
})();
```
```python
# pip install gql aiohttp

from gql import gql, Client
from gql.transport.aiohttp import AIOHTTPTransport

transport = AIOHTTPTransport(
    url="YOUR_PANTHER_API_URL",
    headers={"X-API-Key": "YOUR_API_KEY"}
)
client = Client(transport=transport, fetch_schema_from_transport=True)

# `IssueQuery` is a nickname for the query. You can fully omit it.
issue_query = gql("""
    mutation IssueQuery($input: ExecuteIndicatorSearchQueryInput!) {
        executeIndicatorSearchQuery(input: $input) {
            id
        }
    }
""")

# `GetQueryResults` is a nickname for the query. You can fully omit it.
get_query_results = gql("""
    query GetQueryResults($id: ID!, $cursor: String) {
        dataLakeQuery(id: $id) {
            message
            status
            results(input: { cursor: $cursor }) {
                edges {
                    node
                }
                pageInfo {
                    endCursor
                    hasNextPage
                }
            }
        }
    }
""")

# an accumulator that holds all results that we fetch from all pages
all_results = []
# a helper to know when to exit the loop
has_more = True
# the pagination cursor
cursor = None

# Issue an Indicator Search query
mutation_data = client.execute(
    issue_query,
    variable_values={
        "input": {
            "indicators": ["226103014039"],
            "startTime": "2022-03-29T00:00:00.001Z",
            "endTime": "2022-03-30T00:00:00.001Z",
            "indicatorName": "p_any_aws_account_ids"
        }
    }
)

# Start polling the query until it returns results. From there,
# keep fetching pages until there are no more left
while has_more:
    query_data = client.execute(
        get_query_results,
        variable_values={
            "id": mutation_data["executeIndicatorSearchQuery"]["id"],
            "cursor": cursor
        }
    )

    # if it's still running, print a message and keep polling
    if query_data["dataLakeQuery"]["status"] == "running":
        print(query_data["dataLakeQuery"]["message"])
        continue

    # if it's not running & it's not completed, then it's
    # either cancelled or it has errored out. In this case,
    # raise an exception
    if query_data["dataLakeQuery"]["status"] != "succeeded":
        raise Exception(query_data["dataLakeQuery"]["message"])

    all_results.extend([edge["node"] for edge in query_data["dataLakeQuery"]["results"]["edges"]])
    has_more = query_data["dataLakeQuery"]["results"]["pageInfo"]["hasNextPage"]
    cursor = query_data["dataLakeQuery"]["results"]["pageInfo"]["endCursor"]

print(f"Query returned {len(all_results)} result(s)!")
```