Databricks
Supported Databricks Runtime versions: 12.2 - 16.4
❗ Note: Databricks Serverless is not supported for this instrumentation. You may optionally use the DBT agent instead.
Compatibility Matrix
| Databricks Release | Spark Version | Scala Version | Definity Agent |
|---|---|---|---|
| 16.4_LTS (scala 2.13) | 3.5.2 | 2.13 | 3.5_2.13-latest |
| 16.4_LTS (scala 2.12) | 3.5.2 | 2.12 | 3.5_2.12-latest |
| 15.4_LTS | 3.5.0 | 2.12 | 3.5_2.12-latest |
| 14.3_LTS | 3.5.0 | 2.12 | 3.5_2.12-latest |
| 13.3_LTS | 3.4.1 | 2.12 | 3.4_2.12-latest |
| 12.2_LTS | 3.3.2 | 2.12 | 3.3_2.12-latest |
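The mapping from runtime to agent build can be sketched with the same detection logic the init script (Step 3) uses. The sample `VERSION` and `IMAGE_KEY` contents below are illustrative assumptions; on a real cluster the script reads them from disk:

```shell
# Sketch of the version detection used by the init script (Step 3).
# The sample file contents are assumptions for illustration only.
FULL_SPARK_VERSION="3.5.0"           # e.g. contents of /databricks/spark/VERSION
DBR_VERSION="15.4.x-scala2.12"       # e.g. contents of /databricks/IMAGE_KEY
SPARK_VERSION=$(echo "$FULL_SPARK_VERSION" | grep -oE '^[0-9]+\.[0-9]+')
SCALA_VERSION=$(echo "$DBR_VERSION" | grep -oE 'scala[0-9]+\.[0-9]+' | sed 's/scala//')
# Matches the "Definity Agent" column above, e.g. 3.5_2.12-latest
echo "${SPARK_VERSION}_${SCALA_VERSION}-latest"
```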
Configuration
Add an init script to your cluster to automatically configure the Definity agent. The script will:
- Automatically detect your Spark and Scala versions
- Download the appropriate Definity Spark agent
- Configure the Definity plugin with default settings
- Fail open: if configuration fails, the cluster continues to start normally without the agent
For a quick evaluation, skip to Step 3 — just set your API token in the script.
1. Store Your API Token as a Databricks Secret
Use the Databricks CLI to create a secret scope and store your Definity API token:
databricks secrets create-scope definity
databricks secrets put-secret definity api-token --string-value "<YOUR_DEFINITY_API_TOKEN>"
Then add the following to your cluster's Environment Variables (Cluster configuration → Advanced options → Spark tab → Environment Variables):
DEFINITY_API_TOKEN={{secrets/definity/api-token}}
This makes the token available to the init script at runtime without hardcoding it. See Databricks documentation for more details.
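Inside the init script, the token is read from the environment with a placeholder fallback. A minimal sketch of that pattern (the same one used in Step 3):

```shell
# Read DEFINITY_API_TOKEN from the cluster environment; fall back to a
# placeholder if it is unset (same pattern as the init script in Step 3).
DEFINITY_API_TOKEN="${DEFINITY_API_TOKEN:-<YOUR_API_TOKEN>}"
echo "Token configured: ${DEFINITY_API_TOKEN}"
```

If the secret reference was set correctly, the variable contains the real token at cluster startup rather than the placeholder.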
2. Upload the Agent JARs
Download the agent JARs for the Spark/Scala versions you use (see Compatibility Matrix) and upload them to a location accessible from your cluster. The init script auto-detects the Spark and Scala version at startup and fetches the matching JAR.
Supported storage options:
| Storage | ARTIFACT_BASE_PATH example | Notes |
|---|---|---|
| HTTP/HTTPS | "https://your-artifactory.com/repo/libs-release" | Artifactory, Nexus, or any HTTP server |
| S3 | "s3://your-bucket/definity" | Cluster needs an instance profile or IAM role with access |
| DBFS | "/dbfs/FileStore/definity" | Upload via Databricks CLI: databricks fs cp <jar> dbfs:/FileStore/definity/<jar> |
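The init script picks a copy tool based on the path scheme. A minimal dispatch sketch (the helper name `fetch_method` is illustrative, not part of the script):

```shell
# Illustrative helper: pick the copy tool for a given ARTIFACT_BASE_PATH,
# mirroring the scheme dispatch in the init script (Step 3).
fetch_method() {
  case "$1" in
    s3://*)  echo "aws s3 cp" ;;   # needs instance profile / IAM role
    /dbfs/*) echo "cp" ;;          # DBFS FUSE path
    *)       echo "curl -f" ;;     # HTTP/HTTPS (Artifactory, Nexus, ...)
  esac
}
fetch_method "s3://your-bucket/definity"
```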
3. Create an Init Script
Create an init script to automatically download and configure the Definity Spark agent. Set ARTIFACT_BASE_PATH to match your setup:
databricks_definity_init.sh
#!/bin/bash
# ============================================================================
# Definity Agent Configuration for Databricks
# Tested Databricks Runtimes: 12.2 LTS - 16.4 LTS (Spark 3.3 - 3.5)
# ============================================================================
# This script automatically detects your Spark and Scala versions and
# installs the appropriate Definity Spark Agent.
#
# If installation fails, the cluster will start normally without the agent.
# ============================================================================
# ============================================================================
# CONFIGURATION
# ============================================================================
# Base path to the agent JARs.
# The script auto-detects Spark/Scala and appends the JAR filename, e.g.:
# {base_path}/definity-spark-agent-3.5_2.12-0.80.2.jar
#
# IMPORTANT: The definity.run URL shown here is for demonstration purposes only.
# For production use, upload the agent JARs to your own artifact repository
# (Artifactory, Nexus, S3, etc.) and update this path. For example:
# HTTP/HTTPS : "https://your-artifactory.company.com/repository/libs-release"
# S3 : "s3://your-bucket/definity"
# DBFS : "/dbfs/FileStore/jars"
ARTIFACT_BASE_PATH="https://user:[email protected]/java"
# Version of the Definity agent: a release such as "0.80.2", or "latest"
DEFINITY_AGENT_VERSION="latest"
# Definity API token. We recommend fetching this from Databricks Secrets
# via a cluster environment variable rather than hardcoding it here.
# See docs for setup instructions.
DEFINITY_API_TOKEN="${DEFINITY_API_TOKEN:-<YOUR_API_TOKEN>}"
# ============================================================================
# AUTO-DETECTION AND INSTALLATION
# ============================================================================
JAR_DIR="/databricks/jars"
mkdir -p "$JAR_DIR"
# Extract Spark version from /databricks/spark/VERSION
FULL_SPARK_VERSION=$(cat /databricks/spark/VERSION)
SPARK_VERSION=$(echo "$FULL_SPARK_VERSION" | grep -oE '^[0-9]+\.[0-9]+')
echo "Detected Spark version: $SPARK_VERSION"
if [ -z "$SPARK_VERSION" ]; then
echo "Could not detect Spark major.minor version; skipping Definity agent installation"
exit 0
fi
# Extract Scala version from /databricks/IMAGE_KEY
DBR_VERSION=$(cat /databricks/IMAGE_KEY)
SCALA_VERSION=$(echo "$DBR_VERSION" | grep -oE 'scala([0-9]+\.[0-9]+)' | sed 's/scala//')
echo "Detected Scala version: $SCALA_VERSION"
if [ -z "$SCALA_VERSION" ]; then
echo "Could not detect Scala version; skipping Definity agent installation"
exit 0
fi
# Build agent version string with Spark and Scala versions
SPARK_AGENT_VERSION="${SPARK_VERSION}_${SCALA_VERSION}"
# Build the full agent version string
FULL_AGENT_VERSION="${SPARK_AGENT_VERSION}-${DEFINITY_AGENT_VERSION}"
# Fetch the agent JAR
AGENT_JAR_NAME="definity-spark-agent-${FULL_AGENT_VERSION}.jar"
AGENT_JAR_SRC="${ARTIFACT_BASE_PATH}/${AGENT_JAR_NAME}"
echo "Fetching Definity Spark Agent ${FULL_AGENT_VERSION} from ${AGENT_JAR_SRC} ..."
if [[ "$ARTIFACT_BASE_PATH" == s3://* ]]; then
aws s3 cp "$AGENT_JAR_SRC" "$JAR_DIR/definity-spark-agent.jar"
elif [[ "$ARTIFACT_BASE_PATH" == /dbfs/* ]]; then
cp "$AGENT_JAR_SRC" "$JAR_DIR/definity-spark-agent.jar"
else
curl -f -o "$JAR_DIR/definity-spark-agent.jar" "$AGENT_JAR_SRC"
fi
if [ $? -ne 0 ]; then
echo "Failed to fetch Definity Spark Agent from: $AGENT_JAR_SRC"
echo "Cluster will start without Definity agent"
exit 0
fi
echo "Successfully installed Definity Spark Agent"
# Configure Definity plugin
cat > /databricks/driver/conf/00-definity.conf << EOF
spark.plugins=ai.definity.spark.plugin.DefinitySparkPlugin
spark.definity.server="https://app.definity.run"
spark.definity.api.token="$DEFINITY_API_TOKEN"
EOF
echo "Definity Spark Agent configured successfully"
4. Attach the Init Script to Your Compute Cluster
In the Databricks UI:
- Go to Cluster configuration → Advanced options → Init Scripts.
- Add your script with the appropriate source and path (e.g. S3, DBFS, or Workspace).
5. Configure Cluster Name [Optional]
By default, the compute name reported to Definity is derived from the Databricks cluster name. To customize it, navigate to Cluster configuration → Advanced options → Spark and add:
spark.definity.compute.name my_cluster_name
Advanced Tracking Modes
The default Databricks integration tracks the compute cluster separately from workflows and automatically detects running workflow tasks. You may want to change this behavior in these scenarios:
Single-Task Cluster
If you run a dedicated cluster per task, disable shared compute tracking and provide the Pipeline Tracking Parameters in the init script:
spark.definity.sharedCompute=false
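One way to apply this is to extend the conf block the init script writes. A sketch of the single-task variant (written to a temp directory here so it can run anywhere; the real script targets `/databricks/driver/conf/00-definity.conf`):

```shell
# Sketch: single-task variant of the init script's conf block.
# Uses a temp directory for illustration; the init script writes to
# /databricks/driver/conf/00-definity.conf on the driver.
CONF_DIR=$(mktemp -d)
DEFINITY_API_TOKEN="${DEFINITY_API_TOKEN:-<YOUR_API_TOKEN>}"
cat > "$CONF_DIR/00-definity.conf" << EOF
spark.plugins=ai.definity.spark.plugin.DefinitySparkPlugin
spark.definity.server="https://app.definity.run"
spark.definity.api.token="$DEFINITY_API_TOKEN"
spark.definity.sharedCompute=false
EOF
grep "sharedCompute" "$CONF_DIR/00-definity.conf"
```

Any Pipeline Tracking Parameters for the task would be added to the same block.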
Manual Task Tracking
To manually control task scopes programmatically, disable Databricks automatic tracking:
spark.definity.databricks.automaticSessions.enabled=false
Then follow the Multi-Task Shared Spark App guide.