Data Lake Queries
Panther API search operations
Overview
The Panther API supports the following data lake operations:
Listing your data lake databases, tables, and columns
Executing a data lake (Data Explorer) query using SQL
Executing a Search query
Canceling any currently-running query
Fetching the details of any previously executed query
Listing all currently running or previously-executed queries with optional filters
You can invoke Panther's API by using your Console's API Playground, or the GraphQL-over-HTTP API. Learn more about these methods on Panther API.
See the sections below for GraphQL queries, mutations, and end-to-end workflow examples around core data lake query operations.
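As a rough sketch of the GraphQL-over-HTTP route: a call is typically a single POST whose JSON body carries the GraphQL document under `query` and any variables under `variables`. The helper name `buildGraphQLRequest` is hypothetical, and the URL and key are placeholders; the `X-API-Key` header is the one used in the end-to-end examples further down.

```javascript
// Hypothetical helper: builds the arguments for fetch() for one GraphQL call.
// `apiUrl` and `apiKey` are placeholders for your own values.
function buildGraphQLRequest(apiUrl, apiKey, query, variables = {}) {
  return [
    apiUrl,
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'X-API-Key': apiKey,
      },
      // GraphQL-over-HTTP sends the document and its variables as JSON
      body: JSON.stringify({ query, variables }),
    },
  ];
}

const listDatabases = `
  query AllDatabaseEntities {
    dataLakeDatabases {
      name
    }
  }
`;

const [requestUrl, requestOptions] = buildGraphQLRequest(
  'YOUR_PANTHER_API_URL',
  'YOUR_API_KEY',
  listDatabases
);

// To actually send it (Node 18+ or a browser):
//   const response = await fetch(requestUrl, requestOptions);
//   const { data, errors } = await response.json();
```

Keeping the request construction separate from the network call makes it easy to inspect exactly what would be sent before pointing it at a real Panther instance.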
Common Data Lake query operations
Below are some of the most common GraphQL Data Lake query operations in Panther. These examples demonstrate the documents you have to send using a GraphQL client (or curl) to make a call to Panther's GraphQL API.
Database Entities
# `AllDatabaseEntities` is a nickname for the operation
query AllDatabaseEntities {
  dataLakeDatabases {
    name
    description
    tables {
      name
      description
      columns {
        name
        description
        type
      }
    }
  }
}
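Because a GraphQL response mirrors the shape of the query, the result of `AllDatabaseEntities` is a nested list of databases, tables, and columns. The following sketch flattens that structure into one entry per column. The helper name `listColumns` and the sample data are made up for illustration and are not real Panther output.

```javascript
// Hypothetical helper: flattens the `AllDatabaseEntities` response into one
// entry per column. `data` is the `data` object of the GraphQL response.
function listColumns(data) {
  return data.dataLakeDatabases.flatMap(database =>
    database.tables.flatMap(table =>
      table.columns.map(column => ({
        path: `${database.name}.${table.name}.${column.name}`,
        type: column.type,
      }))
    )
  );
}

// Made-up sample shaped like the query above (not real Panther output)
const sampleData = {
  dataLakeDatabases: [
    {
      name: 'example_db',
      description: 'Sample database',
      tables: [
        {
          name: 'example_table',
          description: 'Sample table',
          columns: [
            { name: 'id', description: 'Row ID', type: 'string' },
            { name: 'created_at', description: 'Creation time', type: 'timestamp' },
          ],
        },
      ],
    },
  ],
};

// Print the fully qualified path of every column
console.log(listColumns(sampleData).map(column => column.path));
```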
Executing queries
# `IssueDataLakeQuery` is a nickname for the operation
mutation IssueDataLakeQuery {
  executeDataLakeQuery(input: {
    sql: "select * from panther_logs.public.aws_alb limit 50"
  }) {
    id # the unique ID of the query
  }
}
Fetching results for a data lake or Search query
When you execute a data lake or Search query, it can take a few seconds to a few minutes for results to come back. To confirm that the query has completed, you must check the status of the query (polling).
You can use the following query to check the query status, while also fetching its results if available:
# `QueryResults` is a nickname for the operation
query QueryResults {
  dataLakeQuery(id: "1234-1234-1234-1234") { # the unique ID of the query
    message
    status
    results {
      edges {
        node
      }
    }
  }
}
The expected values of status and results depend on the query's status:
If the query is still running:
  status will have a value of running
  results will have a value of null
If the query has failed:
  status will have a value of failed
  results will have a value of null, and the error message will be available in the message key
If the query has completed:
  status will have a value of succeeded
  results will be populated
All of the above (along with the possible values for status) is documented in the API schema, along with additional fields you are allowed to request. Learn about the different ways to explore the Panther API schema here.
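The status rules above can be sketched as a small function that turns one `dataLakeQuery` response into a simple outcome. The helper name `interpretQueryState` and its return shape are hypothetical. Any status other than running or succeeded is treated as a failure here, which is an assumption consistent with the end-to-end examples below.

```javascript
// Hypothetical helper: maps one `dataLakeQuery` object to a simple outcome.
// Returns { done, rows, error } where `rows` holds the result nodes.
function interpretQueryState(dataLakeQuery) {
  const { status, message, results } = dataLakeQuery;

  // Still running: no results yet, the caller should check again later
  if (status === 'running') {
    return { done: false, rows: [], error: null };
  }

  // Neither running nor succeeded: treat as a failure (assumption) and
  // surface the message that the API returned
  if (status !== 'succeeded') {
    return {
      done: true,
      rows: [],
      error: message ?? `Query ended with status "${status}"`,
    };
  }

  // Succeeded: unwrap the first page of result nodes
  return { done: true, rows: results.edges.map(edge => edge.node), error: null };
}
```

A caller would keep polling while `done` is false, stop with an error when `error` is set, and otherwise read `rows`.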
Fetching metadata around a data lake or Search query
In the example above, we requested the results of a Panther query. It is also possible to request additional metadata around the query.
In the following example, we request this metadata along with the first page of results:
# `QueryMetadata` is a nickname for the operation
query QueryMetadata {
  dataLakeQuery(id: "1234-1234-1234-1234") { # the unique ID of the query
    name
    isScheduled
    issuedBy {
      ... on User {
        email
      }
      ... on APIToken {
        name
      }
    }
    sql
    message
    status
    startedAt
    completedAt
    results {
      edges {
        node
      }
    }
  }
}
Listing data lake and Search queries
# `ListDataLakeQueries` is a nickname for the operation
query ListDataLakeQueries {
  dataLakeQueries {
    name
    isScheduled
    issuedBy {
      ... on User {
        email
      }
      ... on APIToken {
        name
      }
    }
    sql
    message
    status
    startedAt
    completedAt
    results { # we're only fetching the first page of results for each query
      edges {
        node
      }
    }
  }
}
End-to-end examples
Below, we will build on the Common Operations examples to showcase an end-to-end flow.
Execute a data lake (Data Explorer) Query
// npm install graphql graphql-request
import { GraphQLClient, gql } from 'graphql-request';

const client = new GraphQLClient(
  'YOUR_PANTHER_API_URL',
  { headers: { 'X-API-Key': 'YOUR_API_KEY' } }
);

// `IssueQuery` is a nickname for the mutation. You can fully omit it.
const issueQuery = gql`
  mutation IssueQuery($sql: String!) {
    executeDataLakeQuery(input: { sql: $sql }) {
      id
    }
  }
`;

// `GetQueryResults` is a nickname for the query. You can fully omit it.
const getQueryResults = gql`
  query GetQueryResults($id: ID!, $cursor: String) {
    dataLakeQuery(id: $id) {
      message
      status
      results(input: { cursor: $cursor }) {
        edges {
          node
        }
        pageInfo {
          endCursor
          hasNextPage
        }
      }
    }
  }
`;

(async () => {
  try {
    // an accumulator that holds all result nodes that we fetch
    let allResults = [];
    // a helper to know when to exit the loop
    let hasMore = true;
    // the pagination cursor
    let cursor = null;

    // issue a query
    const mutationData = await client.request(issueQuery, {
      sql: 'select * from panther_logs.public.aws_alb limit 5',
    });

    // Start polling the query until it returns results. From there,
    // keep fetching pages until there are no more left
    do {
      const queryData = await client.request(getQueryResults, {
        id: mutationData.executeDataLakeQuery.id,
        cursor,
      });

      // if it's still running, print a message, wait, and keep polling
      if (queryData.dataLakeQuery.status === 'running') {
        console.log(queryData.dataLakeQuery.message);
        // pause so that we don't send requests in a tight loop
        // (the 2-second interval is an arbitrary choice)
        await new Promise(resolve => setTimeout(resolve, 2000));
        continue;
      }

      // if it's not running & it's not completed, then it's
      // either cancelled or it has errored out. In this case,
      // throw an exception
      if (queryData.dataLakeQuery.status !== 'succeeded') {
        throw new Error(queryData.dataLakeQuery.message);
      }

      allResults = [...allResults, ...queryData.dataLakeQuery.results.edges.map(edge => edge.node)];
      hasMore = queryData.dataLakeQuery.results.pageInfo.hasNextPage;
      cursor = queryData.dataLakeQuery.results.pageInfo.endCursor;
    } while (hasMore);

    console.log(`Your query returned ${allResults.length} result(s)!`);
  } catch (err) {
    // request errors carry a `response`; errors thrown above do not
    console.error(err.response ?? err);
  }
})();
Execute a Search query
// npm install graphql graphql-request
import { GraphQLClient, gql } from 'graphql-request';

const client = new GraphQLClient(
  'YOUR_PANTHER_API_URL',
  { headers: { 'X-API-Key': 'YOUR_API_KEY' } }
);

// `IssueQuery` is a nickname for the mutation. You can fully omit it.
const issueQuery = gql`
  mutation IssueQuery($input: ExecuteIndicatorSearchQueryInput!) {
    executeIndicatorSearchQuery(input: $input) {
      id
    }
  }
`;

// `GetQueryResults` is a nickname for the query. You can fully omit it.
const getQueryResults = gql`
  query GetQueryResults($id: ID!, $cursor: String) {
    dataLakeQuery(id: $id) {
      message
      status
      results(input: { cursor: $cursor }) {
        edges {
          node
        }
        pageInfo {
          endCursor
          hasNextPage
        }
      }
    }
  }
`;

(async () => {
  try {
    // an accumulator that holds all result nodes that we fetch
    let allResults = [];
    // a helper to know when to exit the loop
    let hasMore = true;
    // the pagination cursor
    let cursor = null;

    // issue a query
    const mutationData = await client.request(issueQuery, {
      input: {
        indicators: ["226103014039"],
        startTime: "2022-03-29T00:00:00.001Z",
        endTime: "2022-03-30T00:00:00.001Z",
        indicatorName: "p_any_aws_account_ids"
      }
    });

    // Start polling the query until it returns results. From there,
    // keep fetching pages until there are no more left
    do {
      const queryData = await client.request(getQueryResults, {
        id: mutationData.executeIndicatorSearchQuery.id,
        cursor,
      });

      // if it's still running, print a message, wait, and keep polling
      if (queryData.dataLakeQuery.status === 'running') {
        console.log(queryData.dataLakeQuery.message);
        // pause so that we don't send requests in a tight loop
        // (the 2-second interval is an arbitrary choice)
        await new Promise(resolve => setTimeout(resolve, 2000));
        continue;
      }

      // if it's not running & it's not completed, then it's
      // either cancelled or it has errored out. In this case,
      // throw an exception
      if (queryData.dataLakeQuery.status !== 'succeeded') {
        throw new Error(queryData.dataLakeQuery.message);
      }

      allResults = [...allResults, ...queryData.dataLakeQuery.results.edges.map(edge => edge.node)];
      hasMore = queryData.dataLakeQuery.results.pageInfo.hasNextPage;
      cursor = queryData.dataLakeQuery.results.pageInfo.endCursor;
    } while (hasMore);

    console.log(`Your query returned ${allResults.length} result(s)!`);
  } catch (err) {
    // request errors carry a `response`; errors thrown above do not
    console.error(err.response ?? err);
  }
})();
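The two end-to-end examples above share the same poll-then-paginate loop. One possible way to factor it out is sketched below. The helper name `fetchAllResults` and its options `intervalMs` and `maxPolls` are hypothetical, with arbitrary default values. The `request` argument is assumed to be a function that resolves to the same shape as the `GetQueryResults` query, so the helper has no dependency on a particular GraphQL client.

```javascript
// Hypothetical helper. `request(variables)` is assumed to resolve to the same
// shape as the `GetQueryResults` query above: { dataLakeQuery: { ... } }.
async function fetchAllResults(request, id, { intervalMs = 2000, maxPolls = 150 } = {}) {
  const rows = [];
  let cursor = null;
  let polls = 0;

  while (true) {
    const { dataLakeQuery } = await request({ id, cursor });

    // Still running: wait before checking again, up to `maxPolls` checks
    if (dataLakeQuery.status === 'running') {
      polls += 1;
      if (polls >= maxPolls) {
        throw new Error(`Query ${id} was still running after ${polls} checks`);
      }
      await new Promise(resolve => setTimeout(resolve, intervalMs));
      continue;
    }

    // Neither running nor succeeded: cancelled or errored out
    if (dataLakeQuery.status !== 'succeeded') {
      throw new Error(dataLakeQuery.message);
    }

    // Succeeded: collect this page, then follow the cursor if there is more
    rows.push(...dataLakeQuery.results.edges.map(edge => edge.node));
    const { hasNextPage, endCursor } = dataLakeQuery.results.pageInfo;
    if (!hasNextPage) {
      return rows;
    }
    cursor = endCursor;
  }
}

// Possible usage with the client from the examples above:
//   const rows = await fetchAllResults(
//     variables => client.request(getQueryResults, variables),
//     mutationData.executeDataLakeQuery.id
//   );
```

Passing the request function in as an argument also makes the loop easy to exercise with canned responses, without contacting the API.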