Databricks

Supported Databricks Runtime versions: 12.2 - 16.4

Note: Databricks Serverless is not supported for this instrumentation. You may optionally use the DBT agent instead.

Compatibility Matrix

| Databricks Release | Spark Version | Scala Version | Definity Agent |
| --- | --- | --- | --- |
| 16.4 LTS (Scala 2.13) | 3.5.2 | 2.13 | 3.5_2.13-latest |
| 16.4 LTS (Scala 2.12) | 3.5.2 | 2.12 | 3.5_2.12-latest |
| 15.4 LTS | 3.5.0 | 2.12 | 3.5_2.12-latest |
| 14.3 LTS | 3.5.0 | 2.12 | 3.5_2.12-latest |
| 13.3 LTS | 3.4.1 | 2.12 | 3.4_2.12-latest |
| 12.2 LTS | 3.3.2 | 2.12 | 3.3_2.12-latest |

Configuration

Add an init script to your cluster to automatically configure the Definity agent. The script will:

  • Automatically detect your Spark and Scala versions
  • Download the appropriate Definity Spark agent
  • Configure the Definity plugin with default settings
  • Let the cluster start normally even if configuration fails

Quick Evaluation

For a quick evaluation, skip to Step 3 — just set your API token in the script.

1. Store Your API Token as a Databricks Secret

Use the Databricks CLI to create a secret scope and store your Definity API token:

databricks secrets create-scope definity
databricks secrets put-secret definity api-token --string-value "<YOUR_DEFINITY_API_TOKEN>"
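
To confirm the token was stored, the same CLI can list the scope and its secrets (this assumes the Databricks CLI is already authenticated against your workspace):

```shell
# Sanity check: the scope and the api-token key should appear in these listings
databricks secrets list-scopes
databricks secrets list-secrets definity
```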

Then add the following to your cluster's Environment Variables (Cluster configuration → Advanced options → Spark tab → Environment Variables):

DEFINITY_API_TOKEN={{secrets/definity/api-token}}

This makes the token available to the init script at runtime without hardcoding it. See Databricks documentation for more details.

2. Upload the Agent JARs

Download the agent JARs for the Spark/Scala versions you use (see Compatibility Matrix) and upload them to a location accessible from your cluster. The init script auto-detects the Spark and Scala version at startup and fetches the matching JAR.

Supported storage options:

| Storage | ARTIFACT_BASE_PATH example | Notes |
| --- | --- | --- |
| HTTP/HTTPS | "https://your-artifactory.com/repo/libs-release" | Artifactory, Nexus, or any HTTP server |
| S3 | "s3://your-bucket/definity" | Cluster needs an instance profile or IAM role with access |
| DBFS | "/dbfs/FileStore/definity" | Upload via Databricks CLI: databricks fs cp <jar> dbfs:/FileStore/definity/<jar> |
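
The init script in Step 3 chooses a fetch tool based on the ARTIFACT_BASE_PATH scheme. That dispatch can be sketched in isolation (fetch_cmd is a hypothetical helper for illustration, not part of the script itself):

```shell
# Map a base path to the tool the init script uses to fetch the JAR
fetch_cmd() {
  case "$1" in
    s3://*)  echo "aws s3 cp"  ;;  # S3: requires instance profile / IAM access
    /dbfs/*) echo "cp"         ;;  # DBFS: mounted as a local path on the cluster
    *)       echo "curl -f -o" ;;  # HTTP/HTTPS: Artifactory, Nexus, any HTTP server
  esac
}

fetch_cmd "s3://your-bucket/definity"                        # → aws s3 cp
fetch_cmd "/dbfs/FileStore/definity"                         # → cp
fetch_cmd "https://your-artifactory.com/repo/libs-release"   # → curl -f -o
```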

3. Create an Init Script

Create an init script to automatically download and configure the Definity Spark agent. Set ARTIFACT_BASE_PATH to match your setup:

databricks_definity_init.sh
#!/bin/bash

# ============================================================================
# Definity Agent Configuration for Databricks
# Tested Databricks Runtimes: 12.2 LTS - 16.4 LTS (Spark 3.3 - 3.5)
# ============================================================================
# This script automatically detects your Spark and Scala versions and
# installs the appropriate Definity Spark Agent.
#
# If installation fails, the cluster will start normally without the agent.
# ============================================================================

# ============================================================================
# CONFIGURATION
# ============================================================================

# Base path to the agent JARs.
# The script auto-detects Spark/Scala and appends the JAR filename, e.g.:
# {base_path}/definity-spark-agent-3.5_2.12-0.80.2.jar
#
# IMPORTANT: For production use, upload the agent JAR to your own
# artifact repository (Artifactory, Nexus, S3, etc.) and update this URL;
# the definity.run URL below is for demonstration purposes only. For example:
# HTTP/HTTPS : "https://your-artifactory.company.com/repository/libs-release"
# S3 : "s3://your-bucket/definity"
# DBFS : "/dbfs/FileStore/jars"
ARTIFACT_BASE_PATH="https://user:[email protected]/java"

# Version of the Definity agent (e.g. "0.80.2")
DEFINITY_AGENT_VERSION="latest"

# Definity API token. We recommend fetching this from Databricks Secrets
# via a cluster environment variable rather than hardcoding it here.
# See docs for setup instructions.
DEFINITY_API_TOKEN="${DEFINITY_API_TOKEN:-<YOUR_API_TOKEN>}"

# ============================================================================
# AUTO-DETECTION AND INSTALLATION
# ============================================================================

JAR_DIR="/databricks/jars"
mkdir -p "$JAR_DIR"

# Extract Spark version from /databricks/spark/VERSION
FULL_SPARK_VERSION=$(cat /databricks/spark/VERSION)
SPARK_VERSION=$(echo "$FULL_SPARK_VERSION" | grep -oE '^[0-9]+\.[0-9]+')
echo "Detected Spark version: $SPARK_VERSION"

if [ -z "$SPARK_VERSION" ]; then
  echo "Spark major.minor version not found; skipping Definity agent installation"
  exit 0
fi

# Extract Scala version from /databricks/IMAGE_KEY
DBR_VERSION=$(cat /databricks/IMAGE_KEY)
SCALA_VERSION=$(echo "$DBR_VERSION" | grep -oE 'scala([0-9]+\.[0-9]+)' | sed 's/scala//')
echo "Detected Scala version: $SCALA_VERSION"

if [ -z "$SCALA_VERSION" ]; then
  echo "Scala version not found; skipping Definity agent installation"
  exit 0
fi

# Build agent version string with Spark and Scala versions
SPARK_AGENT_VERSION="${SPARK_VERSION}_${SCALA_VERSION}"

# Build the full agent version string
FULL_AGENT_VERSION="${SPARK_AGENT_VERSION}-${DEFINITY_AGENT_VERSION}"

# Fetch the agent JAR
AGENT_JAR_NAME="definity-spark-agent-${FULL_AGENT_VERSION}.jar"
AGENT_JAR_SRC="${ARTIFACT_BASE_PATH}/${AGENT_JAR_NAME}"
echo "Fetching Definity Spark Agent ${FULL_AGENT_VERSION} from ${AGENT_JAR_SRC} ..."

if [[ "$ARTIFACT_BASE_PATH" == s3://* ]]; then
  aws s3 cp "$AGENT_JAR_SRC" "$JAR_DIR/definity-spark-agent.jar"
elif [[ "$ARTIFACT_BASE_PATH" == /dbfs/* ]]; then
  cp "$AGENT_JAR_SRC" "$JAR_DIR/definity-spark-agent.jar"
else
  curl -f -o "$JAR_DIR/definity-spark-agent.jar" "$AGENT_JAR_SRC"
fi

if [ $? -ne 0 ]; then
  echo "Failed to fetch Definity Spark Agent from: $AGENT_JAR_SRC"
  echo "Cluster will start without Definity agent"
  exit 0
fi

echo "Successfully installed Definity Spark Agent"

# Configure Definity plugin
cat > /databricks/driver/conf/00-definity.conf << EOF
spark.plugins=ai.definity.spark.plugin.DefinitySparkPlugin
spark.definity.server="https://app.definity.run"
spark.definity.api.token="$DEFINITY_API_TOKEN"
EOF

echo "Definity Spark Agent configured successfully"
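
The auto-detection logic above can be sanity-checked locally with sample file contents. The two values below are illustrative stand-ins for what a 16.4 LTS (Scala 2.13) cluster would expose in /databricks/spark/VERSION and /databricks/IMAGE_KEY:

```shell
FULL_SPARK_VERSION="3.5.2"        # example contents of /databricks/spark/VERSION
DBR_VERSION="16.4.x-scala2.13"    # example contents of /databricks/IMAGE_KEY

# Same parsing the init script performs
SPARK_VERSION=$(echo "$FULL_SPARK_VERSION" | grep -oE '^[0-9]+\.[0-9]+')
SCALA_VERSION=$(echo "$DBR_VERSION" | grep -oE 'scala([0-9]+\.[0-9]+)' | sed 's/scala//')

echo "definity-spark-agent-${SPARK_VERSION}_${SCALA_VERSION}-latest.jar"
# → definity-spark-agent-3.5_2.13-latest.jar
```

The resulting filename matches the 16.4 LTS (Scala 2.13) row of the Compatibility Matrix.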

4. Attach the Init Script to Your Compute Cluster

In the Databricks UI:

  1. Go to Cluster configuration → Advanced options → Init Scripts.
  2. Add your script with the appropriate source and path (e.g. S3, DBFS, or Workspace).

5. Configure Cluster Name [Optional]

By default, the compute name reported to Definity is derived from the Databricks cluster name. To customize it, navigate to Cluster configuration → Advanced options → Spark and add:

spark.definity.compute.name      my_cluster_name

Advanced Tracking Modes

The default Databricks integration tracks the compute cluster separately from workflows and automatically detects running workflow tasks. You may want to change this behavior in these scenarios:

Single-Task Cluster

If you have a dedicated cluster per task, disable shared compute tracking and provide the Pipeline Tracking Parameters in the init script:

spark.definity.sharedCompute=false
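
One way to apply this is to append the property to the same conf file the init script writes in Step 3 (a sketch; the pipeline tracking parameters from the linked guide would be added alongside it):

```shell
# In the init script, after the main 00-definity.conf is written:
cat >> /databricks/driver/conf/00-definity.conf << 'EOF'
spark.definity.sharedCompute=false
EOF
```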

Manual Task Tracking

To manually control task scopes programmatically, disable Databricks automatic tracking:

spark.definity.databricks.automaticSessions.enabled=false

Then follow the Multi-Task Shared Spark App guide.