Spark Agent Changelog
0.88.3 - (2026-05-10)
- Fix used vcores over time metric (better stages state cleanup)
0.88.2 - (2026-05-06)
- Allow capturing driver thread dump for all threads
- Rename agent token prop & env variable
0.88.1 - (2026-05-06)
- Reduce stage event size (config)
0.88.0 - (2026-05-04)
- Add support for sessions recovery in infinite streaming apps
- Add all spark task time components metrics
- Enrich executors event with metrics and errors
- Report Spark's listener bus dropped events
- Fixes & improvements:
  - refactor table info to use catalog table only
  - avoid ops on shutdown
  - increase metrics sampling timer rate
  - use adaptive budget supplier
  - time series metrics fixes: late start time, negative values & artificial spikes
  - locking reduction & timeout protections
  - avoid dups in getAllSessions
  - avoid exposing non active sessions
0.87.3 - (2026-04-19)
- Support discovering input datasets from every query
0.87.2 - (2026-04-15)
- Add driver cpu utilization metrics
- Fix process tree metrics for client mode pyspark
- Fix rate metrics
0.87.1 - (2026-04-12)
- Merge iceberg module to the main agent module
- Remove common profile
0.87.0 - (2026-04-09)
- Support Databricks DLT
- Databricks fixes:
  - Support auto session creation for legacy single task jobs
  - Fix cluster names - include job name when needed and try to clean timestamps
  - Fix session rotation to rotate all sessions when needed
- Add trigger type to stream tf description
0.86.1 - (2026-03-30)
- More fixes for Databricks 17.3
- Fix files sanitization to start from second path part
0.86.0 - (2026-03-25)
- Support Spark 4
- Support Databricks 17.3
0.85.0 - (2026-03-01)
- Query scanning v2
- Tf logs - remove tf-plan calls (and tf_name field)
- Support server disabled response v2
- Add flag to allow reporting logical plan based on the analyzed plan instead of the optimized plan (to avoid physical plan subtree, e.g. when caching)
- Databricks - kinesis & cloud files sources
- Streaming metrics -
  - Global instead of tf-based
  - Add trigger count metric
- Log definity stats periodically
- Fixes:
  - Streaming rotation (inherit all configs & complete stream tf)
  - Analyze input only when first seen and is not a pipeline output
  - Report slow planning events only for non skipped queries (i.e. not for nested tfs)
  - Change "vcore used by all tasks" to be avg
  - Change task id to long instead of int
  - Add root stream tf id to stage & plan events
  - Ignore server errors in shared compute default session
  - Revive global definity disablement option (if default session disabled by conf or server request)
  - Thread safety protections
- Logging -
  - Add more internal error events (scanning errors)
  - Log server error responses
  - Log versions & definity params on definity init
  - Log spark unexpected stop callstack
  - Cleaner definity-logger error/disable handling
0.80.2 - (2026-02-01)
- Bug fixes
  - Databricks
    - improve auto sessions management (+ wheel tasks support)
    - report retry context params
    - avoid failure by heuristics (recent failure time)
  - Use thread local session for broadcast hook
  - known exceptions protection -
    - unity catalog permissions
    - IndexOutOfBoundsException on task distribution metrics
    - Unevaluated expressions errors in plan serializer
    - Databricks
0.80.1 - (2026-01-06)
- Add root tf id to stages
- Refactor hooks
0.80.0 - (2026-01-01)
- Streaming
  - Infinite sessions rotation
  - Batch inputs metrics
- Executors periodic thread dump event
0.79.0 - (2025-12-28)
- Streaming support phase 2 - report streaming metrics
- Fixes
  - report meta metrics for failed tfs
  - limit max inputs per plan node
0.78.0 - (2025-12-17)
- Streaming support phase 1
- Extract collected values from collect queries
- Fixes
  - Logger errors handling - increase retries and remove circuit breaker
  - Databricks - default cluster name - use job name instead of id
  - Databricks - reduce flush requests
  - Improve server override support for more features
  - Disable range partitioner event by default
0.77.2 - (2025-12-10)
- BigQuery diversion - support project & dataset overrides
0.77.1 - (2025-11-23)
- Plan serialization fixes - cleaner node names, catalogs & kafka relations.
- Lineage inputs / outputs fixes -
  - Support inputs group nodes
  - Jdbc / BigQuery - support parsing of complex queries with multiple tables
  - Limit name length
- Output Diversion -
  - Add support for "With" statement
  - BQ - Support input query with multiple tables
  - Wrap input tables with an alias if table suffix is used
- Add spill distribution metrics to stages
- BigQuery connector - support 0.20.0
0.77.0 - (2025-11-16)
- Link physical plan to stages
0.76.2 - (2025-11-11)
- Add spark's managed memory metrics
- Support BigQuery indirect writes bucket diversion
- Fixes
  - Shared compute - pass missing time-series metrics from child to compute
  - Use metrics polling time interval for the default time series bucket size, to avoid gaps
  - Keep sending executors metrics even after executor error
0.76.1 - (2025-11-06)
- Remove running tfs task status
- Move bb logs to debug
0.76.0 - (2025-11-04)
- Support Spark 3.5 with Scala 2.13
0.75.1 - (2025-10-29)
- Databricks - add support for git python tasks
0.75.0 - (2025-10-27)
- Compute support - add shared compute flag & improve compute name
- Fix - avoid task duration metric for vcore utilization
- Ignore add jars commands
0.74.3 - (2025-10-15)
- DBX fixes
  - Fix auto-session for python notebooks
  - Use same pit by default for sessions in multi session apps
0.74.2 - (2025-10-05)
- Bug fixes
  - Don't fail on missing table location
  - Simplify skew key extraction to avoid dbx errors in cast expression
0.74.1 - (2025-09-30)
- BQ - support more versions & extract output metrics
- Add runtime versions params (java, scala, python & bq)
- Extend executor & host info with gc type and machine size
- Fixes - uncaught exception for default session, shuffle hook & more
0.74.0 - (2025-09-17)
- Support tfs batch
- Support params override by server
- Include session thread in driver thread dump
0.73.5 - (2025-09-10)
- GCP & Dataproc info
- Bug fixes (NPE in skew event when taskMetrics is null & broadcast event leak)
0.73.4 - (2025-09-07)
- Fix support for custom metrics in Databricks
0.73.3 - (2025-09-04)
- Fix non-heap memory & cpu usage metrics - use custom impl and add java/python/other breakdown
- Custom metrics - increase default task run limit and report calc duration
- Reduce logs
0.73.1 - (2025-08-24)
- Fix shuffle metadata retrieval for skipped stages
- Add S3 debug metrics
- Add plugin msgs over time metric
- Use java shutdown hooks directly (instead of spark wrapper)
0.73.0 - (2025-08-20)
- Add physical plan events
- Add automatic thread dump - for idle driver & skewed tasks
- Skew significance - on stage end, keep only significant skew events and delete existing insignificant ones
- Range partitioner events fix
- Executors event - add removed reason
- Report definity uncaught exceptions
- Reduce spamming logs
- Remove info field from tfs
- Limit both events & tfs per task run.
0.72.1 - (2025-08-10)
- Stage metrics
- Skew detection improvements and fixes - skewed task metrics, single event, merged intervals & bug fixes.
- Executors avg used memory metric
0.72.0 - (2025-07-23)
- Support Spark 2.4 with Scala 2.12
- Databricks & EMR on EC2 - cluster info
- Executors info
0.71.2 - (2025-07-23)
- Databricks - use job & task names by default
0.71.1 - (2025-07-20)
- Default cluster name
0.71.0 - (2025-07-17)
- Multi session support v2
0.70.2 - (2025-07-06)
- Lineage support for dsv2 & delta files
- Support api token from env
- Add debug events
0.70.1 - (2025-06-26)
- Add fs & s3 metrics
0.70.0 - (2025-06-26)
- Support Spark 2.4 Plugin for memory & skew
0.60.5 - (2025-05-29)
- Add permissive load option to support missing paths
0.60.4 - (2025-05-26)
- Add index & cache refresh events
- Add shuffle info to stage events
- Bug Fix for diversion with temp view
0.60.1 - (2025-04-29)
- Databricks - use cluster name by default
- Metrics - add executors vcore time, executor cores & driver cores
- Bug fixes for local mode
0.60.0 - (2025-04-21)
- Plugin - improved integration (don't require listener config) & allow disabling executor side plugin
- Bug fix (skew event for stage retry)
0.43.0 - (2025-04-20)
- Databricks - support for premature cluster termination
- Nested queries - avoid logging
- Params - add shuffle partitions & dynamic allocation details when enabled
0.42.1 - (2025-04-10)
- Databricks - support sessions auto-stop
- Bug fixes (ignore delta/hudi file indexes, ignore post-query-end root updates).
0.42.0 - (2025-04-09)
- Databricks - Bug fixes & Support for 12.2, python tasks sessions and connect.
- Multi session apps - report default id.
- Bug fixes (definity stats, output diversion & spark connect).
0.41.0 - (2025-04-02)
- Support output diversion for files
- Metrics - fixes for task cpus & new active tasks metric
- Improved agent logs
0.40.2 - (2025-03-21)
- Fix task status on YARN & cluster mode
- Support custom RDD inputs
0.40.0 - (2025-03-16)
- Added Events - skew, slow planning, slow load, broadcasts & stages
- Added metrics - dynamic allocation tracking, shuffle times, cache size & disk spill over time
- Databricks support & fixes
0.30.1 - (2025-03-09)
- Move to use async calls to the server
- Added metrics - driver GC time and task retries cost
- Added tracking for Python UDFs
- Fixed BQ output diversion issue
- Fixed old Delta version empty catalog table bug
0.20.3 - (2025-02-05)
- Support Databricks multitask notebooks jobs
- Heartbeat bug fix
0.20.2 - (2025-02-04)
- Databricks related bug fixes
- Support volume metrics for old Delta version (0.4.x)
0.20.1 - (2025-02-02)
- Support BigQuery output diversion
- Expose task id on spark session conf ("spark.definity.task.id")
0.20.0 - (2025-01-30)
- Server protocol v2 (minimum supported server version: 0.20.0)
0.11.0 - (2025-01-13)
- Add support for custom metrics
- Add avg vcores used over time metric
0.10.2 - (2024-11-20)
- Skew keys detection
- Support BigQuery connector
- Code comparison v2 - clean plans
0.9.16 - (2024-10-21)
- Support Multi Session Apps
- Added new metrics:
  - Skew score
  - Broadcasts - count, failures, time & size
  - vCore time, CPU time & GC time
  - Heap & off-heap utilization metrics
  - Used executors over time
  - Input, output, shuffle write, shuffle read - records & bytes
  - Tasks/stages - retries & failures
  - Lost executors
  - Spill
  - Tasks result size
- Support Nvidia GPU Plugin (rapids-4-spark) for files inputs & outputs
- Added support for JDBC inputs
- Added retry mechanism on server errors
- Added driver allocated and used memory over time
- Added support for Databricks for Spark 3.5
- Support dynamic time-series metrics interval
- DALM
  - Support pipelines mode
  - Support setting db suffix & location
0.8.1 - (2024-07-02)
- Added executors & driver jvm memory watermark metric - on-heap, off-heap, total - used & allocated
- Added "max memory utilization" metrics for driver and executors
- Added time-series metrics for total memory, used memory, total vcores, used vcores (max value per interval)
- Shade all external dependencies to make the agent independent of the runtime environment
- Added support for Spark 3.5