Databricks
Supported Databricks Runtime versions: 12.2 LTS - 16.4 LTS
❗ Note: Databricks Serverless is not supported by this instrumentation; you can use the dbt agent instead.
Compatibility Matrix
| Databricks Release | Spark Version | Scala Version | Definity Agent |
|---|---|---|---|
| 16.4_LTS (Scala 2.13) | 3.5.2 | 2.13 | 3.5_2.13-latest |
| 16.4_LTS (Scala 2.12) | 3.5.2 | 2.12 | 3.5_2.12-latest |
| 15.4_LTS | 3.5.0 | 2.12 | 3.5_2.12-latest |
| 14.3_LTS | 3.5.0 | 2.12 | 3.5_2.12-latest |
| 13.3_LTS | 3.4.1 | 2.12 | 3.4_2.12-latest |
| 12.2_LTS | 3.3.2 | 2.12 | 3.3_2.12-latest |
Configuration
Add an init script to your cluster to automatically configure the Definity agent. The script will:
- Automatically detect your Spark and Scala versions
- Download the appropriate Definity Spark agent
- Configure the Definity plugin with default settings
- Exit without error if configuration fails, so the cluster still starts normally
1. Create an Init Script
Create an init script to automatically download and configure the Definity Spark agent:
databricks_definity_init.sh
#!/bin/bash
# ============================================================================
# Definity Agent Configuration for Databricks
# Tested Databricks Runtimes: 12.2 LTS - 16.4 LTS (Spark 3.3 - 3.5)
# ============================================================================
# This script automatically detects your Spark and Scala versions and
# configures the appropriate Definity Spark Agent.
#
# IMPORTANT: Replace YOUR_TOKEN below with your actual Definity API token
# before running this script.
#
# If configuration fails, the cluster will start normally without the agent.
# ============================================================================
# ============================================================================
# CONFIGURATION
# ============================================================================
# Optional: Set a specific agent version (e.g. "0.75.1")
# Leave empty to use the latest version
DEFINITY_AGENT_VERSION=""
DEFINITY_API_TOKEN="YOUR_TOKEN" # <<< REPLACE WITH YOUR ACTUAL TOKEN
# IMPORTANT: For production use, upload the agent JAR to your own
# artifact repository (Artifactory, Nexus, S3, etc.) and update this URL.
# The definity.run URL shown here is for demonstration purposes only.
# Example: "https://your-artifactory.company.com/repository/libs-release/definity-spark-agent"
ARTIFACT_BASE_URL="https://user:[email protected]/java"
# ============================================================================
# AUTO-DETECTION AND CONFIGURATION
# ============================================================================
JAR_DIR="/databricks/jars"
mkdir -p "$JAR_DIR"
# Extract Spark version from /databricks/spark/VERSION
FULL_SPARK_VERSION=$(cat /databricks/spark/VERSION)
SPARK_VERSION=$(echo "$FULL_SPARK_VERSION" | grep -oE '^[0-9]+\.[0-9]+')
echo "Detected Spark version: $SPARK_VERSION"
if [ -z "$SPARK_VERSION" ]; then
echo "Spark major.minor version is empty or not found. Will not proceed to install definity agent"
exit 0
fi
# Extract Scala version from /databricks/IMAGE_KEY
DBR_VERSION=$(cat /databricks/IMAGE_KEY)
SCALA_VERSION=$(echo "$DBR_VERSION" | grep -oE 'scala([0-9]+\.[0-9]+)' | sed 's/scala//')
echo "Detected Scala version: $SCALA_VERSION"
if [ -z "$SCALA_VERSION" ]; then
echo "Scala version is empty or not found. Will not proceed to install definity agent"
exit 0
fi
# Build agent version string with Spark and Scala versions
SPARK_AGENT_VERSION="${SPARK_VERSION}_${SCALA_VERSION}"
# Build the full agent version string
if [ -z "$DEFINITY_AGENT_VERSION" ]; then
# Use latest version
FULL_AGENT_VERSION="${SPARK_AGENT_VERSION}-latest"
else
# Use specific version
FULL_AGENT_VERSION="${SPARK_AGENT_VERSION}-${DEFINITY_AGENT_VERSION}"
fi
# Download the agent
DEFINITY_JAR_URL="${ARTIFACT_BASE_URL}/definity-spark-agent-${FULL_AGENT_VERSION}.jar"
echo "Downloading Definity Spark Agent ${FULL_AGENT_VERSION} for Spark ${SPARK_VERSION} (Scala ${SCALA_VERSION})..."
if curl -f -o "$JAR_DIR/definity-spark-agent.jar" "$DEFINITY_JAR_URL"; then
  echo "Successfully downloaded Definity Spark Agent"
else
  echo "Failed to download Definity Spark Agent from: $DEFINITY_JAR_URL"
  echo "Cluster will start without Definity agent"
  exit 0
fi
# Configure Definity plugin
cat > /databricks/driver/conf/00-definity.conf << EOF
spark.plugins=ai.definity.spark.plugin.DefinitySparkPlugin
spark.definity.server="https://app.definity.run"
spark.definity.api.token="$DEFINITY_API_TOKEN"
EOF
echo "Definity Spark Agent configured successfully"
For production use, upload the Definity agent JAR to your own artifact repository (S3, Artifactory, Nexus, etc.) and update the ARTIFACT_BASE_URL in the script. Replace YOUR_TOKEN with your actual Definity API token, and consider using Databricks Secrets to manage the token securely.
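One concrete way to wire in Databricks Secrets, assuming a secret scope named definity with key api_token (both hypothetical names): reference the secret in the cluster's environment variables (Cluster configuration → Advanced options → Spark → Environment variables):

DEFINITY_API_TOKEN={{secrets/definity/api_token}}

The init script inherits that variable, so the hard-coded assignment in the CONFIGURATION block can become:

# Read the token from the cluster environment instead of hard-coding it.
DEFINITY_API_TOKEN="${DEFINITY_API_TOKEN:-}"  # empty if the secret reference is not configured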
2. Attach the Init Script to Your Compute Cluster
In the Databricks UI:
- Go to Cluster configuration → Advanced options → Init Scripts.
- Add your script with:
  - Source: S3
  - File path: s3://your-s3-bucket/init-scripts-dir/databricks_definity_init.sh
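If you manage clusters through the Clusters API or Terraform rather than the UI, the same setting maps to the init_scripts field of the cluster spec. A minimal fragment (bucket path and region are placeholders):

"init_scripts": [
  {
    "s3": {
      "destination": "s3://your-s3-bucket/init-scripts-dir/databricks_definity_init.sh",
      "region": "us-east-1"
    }
  }
]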
3. Configure Cluster Name [Optional]
By default, the compute name reported to Definity is derived from the Databricks cluster name. To customize it, navigate to Cluster configuration → Advanced options → Spark and add:
spark.definity.compute.name my_cluster_name
Advanced Tracking Modes
The default Databricks integration tracks the compute cluster separately from workflows and automatically detects running workflow tasks. You may want to change this behavior in these scenarios:
Single-Task Cluster
If you have a dedicated cluster per task, disable shared cluster tracking mode and provide the Pipeline Tracking Parameters in the init script:
spark.definity.sharedCompute=false
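The exact keys are listed in the Pipeline Tracking Parameters guide; as an illustrative sketch (the pipeline and task property names below are assumptions to verify against that guide), the conf file written by the init script would grow to something like:

spark.definity.sharedCompute=false
spark.definity.pipeline.name=my_pipeline
spark.definity.task.name=my_task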
Manual Task Tracking
To control task scopes programmatically, disable Databricks automatic tracking:
spark.definity.databricks.automaticSessions.enabled=false
Then follow the Multi-Task Shared Spark App guide.
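As a rough sketch of that flow in a notebook, assuming the session is scoped with a spark.definity.session.name property as described in that guide (verify the exact key and value format there; run_task_1 and run_task_2 are hypothetical placeholders for your own task logic):

# Scope the first task: set the session property before the task's Spark actions.
spark.conf.set("spark.definity.session.name", "my_pipeline.task_1")
run_task_1()
# Unset the property to close the session before the next task's scope begins.
spark.conf.unset("spark.definity.session.name")

spark.conf.set("spark.definity.session.name", "my_pipeline.task_2")
run_task_2()
spark.conf.unset("spark.definity.session.name")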