Tracking Modes

Single-Task App [Default]

By default, Definity tracks one Spark application as one task. Configure these parameters when creating the Spark session:

  • spark.definity.pipeline.name - The pipeline this task belongs to
  • spark.definity.pipeline.pit - Point-in-time for the pipeline run
  • spark.definity.task.name - The name of this task

Definity will track all work done by this Spark application under a single task.
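For example, these parameters can be supplied when building the Spark session. The snippet below is a minimal sketch; the pipeline, point-in-time, and task values are placeholders for your own identifiers.

from pyspark.sql import SparkSession

# Minimal sketch of single-task mode; all values below are placeholders
spark = (
    SparkSession.builder
    .appName("daily_revenue_job")
    .config("spark.definity.pipeline.name", "daily_revenue")
    .config("spark.definity.pipeline.pit", "2024-01-01")
    .config("spark.definity.task.name", "aggregate_revenue")
    .getOrCreate()
)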

Multi-Task Shared Spark App

When a single Spark application is reused across multiple logical tasks, enable shared compute mode to track the compute cluster separately from the tasks running on it.

Enable Shared Compute Mode

Set these parameters in your Spark configuration:

spark.definity.sharedCompute=true
spark.definity.compute.name=my_compute_name
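For instance, the same properties can be set on the session builder when the shared application starts; the compute name below is a placeholder.

from pyspark.sql import SparkSession

# Minimal sketch of shared compute mode; "etl_shared_cluster" is a placeholder
spark = (
    SparkSession.builder
    .config("spark.definity.sharedCompute", "true")
    .config("spark.definity.compute.name", "etl_shared_cluster")
    .getOrCreate()
)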

Start Logical Task Tracking

Use the spark.definity.session property to begin tracking a new task:

# Define a new task scope
spark.conf.set("spark.definity.session", f"pipeline.name={my_pipeline},pipeline.pit={pit_date},task.name={my_task}")

Stop Logical Task Tracking

When a task completes, unset the property to signal completion:

try:
    # Your task logic here
    ...
finally:
    # Signal task completion (placing this in a finally block ensures it runs even if the task fails)
    spark.conf.unset("spark.definity.session")
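Putting both steps together, a shared application can run several logical tasks back to back. The sketch below is illustrative: run_task_logic and the pipeline, point-in-time, and task names are placeholders.

# Sketch: several logical tasks in one shared Spark application
tasks = ["extract_orders", "enrich_orders", "publish_orders"]

for task_name in tasks:
    spark.conf.set(
        "spark.definity.session",
        f"pipeline.name=orders_pipeline,pipeline.pit=2024-01-01,task.name={task_name}",
    )
    try:
        run_task_logic(task_name)  # placeholder for your task logic
    finally:
        # End this task's tracking before the next one starts
        spark.conf.unset("spark.definity.session")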

Platform-Specific Auto-Detection

Some platforms automatically detect task boundaries without manual configuration: