Tracking Modes
Single-Task App [Default]
By default, Definity tracks one Spark application as one task. Configure these parameters when creating the Spark session:
- spark.definity.pipeline.name - The pipeline this task belongs to
- spark.definity.pipeline.pit - Point-in-time for the pipeline run
- spark.definity.task.name - The name of this task
Definity will track all work done by this Spark application under a single task.
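For example, a minimal sketch of creating a Spark session with these parameters set (the application, pipeline, point-in-time, and task names below are illustrative placeholders):

from pyspark.sql import SparkSession

# Single-task mode: the whole Spark application is tracked as one task.
# Pipeline, pit, and task values are placeholders for your own naming.
spark = (
    SparkSession.builder
    .appName("daily_sales_job")
    .config("spark.definity.pipeline.name", "daily_sales")
    .config("spark.definity.pipeline.pit", "2024-01-01")
    .config("spark.definity.task.name", "aggregate_sales")
    .getOrCreate()
)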
Multi-Task Shared Spark App
When a single Spark application is reused across multiple logical tasks, enable shared compute mode to track the compute cluster separately from the tasks running on it.
Enable Shared Compute Mode
Set these parameters in your Spark configuration:
spark.definity.sharedCompute=true
spark.definity.compute.name=my_compute_name
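As a sketch, these settings can also be applied when building the session in PySpark (the application and compute names below are placeholders):

from pyspark.sql import SparkSession

# Shared compute mode: the compute cluster is tracked separately from the
# logical tasks that run on it. The compute name is a placeholder.
spark = (
    SparkSession.builder
    .appName("shared_etl_app")
    .config("spark.definity.sharedCompute", "true")
    .config("spark.definity.compute.name", "shared_etl_cluster")
    .getOrCreate()
)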
Start Logical Task Tracking
Use the spark.definity.session property to begin tracking a new task:
# Define a new task scope
spark.conf.set("spark.definity.session", f"pipeline.name={my_pipeline},pipeline.pit={pit_date},task.name={my_task}")
Stop Logical Task Tracking
When a task completes, unset the property to signal completion:
try:
    # Your task logic here
    ...
finally:
    # Signal task completion (recommended in a finally block to catch failures)
    spark.conf.unset("spark.definity.session")
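Putting the two steps together, here is a minimal sketch that wraps each logical task in a small helper. Note that definity_task is a hypothetical convenience wrapper written for illustration, not part of the Definity API; it assumes an existing shared Spark session named spark.

from contextlib import contextmanager

@contextmanager
def definity_task(spark, pipeline, pit, task):
    # Hypothetical helper: starts tracking a logical task on entry and
    # signals completion on exit, even if the task body raises.
    spark.conf.set(
        "spark.definity.session",
        f"pipeline.name={pipeline},pipeline.pit={pit},task.name={task}",
    )
    try:
        yield
    finally:
        spark.conf.unset("spark.definity.session")

# Run two logical tasks on the same shared Spark application
with definity_task(spark, "daily_sales", "2024-01-01", "extract"):
    ...  # task logic

with definity_task(spark, "daily_sales", "2024-01-01", "transform"):
    ...  # task logic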
Platform-Specific Auto-Detection
Some platforms automatically detect task boundaries without manual configuration:
- Databricks: Multi-task workflows are automatically tracked (see Databricks Integration)