Spark Agent Changelog

0.42.1 - (2025-04-10)

  • Databricks - support session auto-stop
  • Bug fixes (ignore Delta/Hudi file indexes, ignore post-query-end root updates).

0.42.0 - (2025-04-09)

  • Databricks - bug fixes and support for 12.2, Python task sessions and Connect.
  • Multi-session apps - report default id.
  • Bug fixes (definity stats, output diversion & Spark Connect).

0.41.0 - (2025-04-02)

  • Support output diversion for files
  • Metrics - fixes for task cpus & new active tasks metric
  • Improved agent logs

0.40.2 - (2025-03-21)

  • Fix task status on YARN & cluster mode
  • Support custom RDD inputs

0.40.0 - (2025-03-16)

  • Added Events - skew, slow planning, slow load, broadcasts & stages
  • Added metrics - dynamic allocation tracking, shuffle times, cache size & disk spill over time
  • Databricks support & fixes

0.30.1 - (2025-03-09)

  • Move to use async calls to the server
  • Added metrics - driver GC time and task retries cost
  • Added tracking for Python UDFs
  • Fixed BigQuery output diversion issue
  • Fixed old Delta version empty catalog table bug

0.20.3 - (2025-02-05)

  • Support Databricks multi-task notebook jobs
  • Heartbeat bug fix

0.20.2 - (2025-02-04)

  • Databricks related bug fixes
  • Support volume metrics for old Delta version (0.4.x)

0.20.1 - (2025-02-02)

  • Support BigQuery output diversion
  • Expose task id on the Spark session conf ("spark.definity.task.id")
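
Reading the exposed task id could look like the sketch below. It assumes a PySpark session, where `spark.conf` offers `get(key, default)`. The helper name `definity_task_id` is hypothetical and not part of the agent; only the key "spark.definity.task.id" comes from the release note. The helper accepts any object with a `get(key, default)` method, so a plain dict stands in for `spark.conf` here.

```python
from typing import Optional

# Key named in the 0.20.1 release note.
TASK_ID_KEY = "spark.definity.task.id"


def definity_task_id(conf) -> Optional[str]:
    """Return the task id from a conf-like object, or None if it is not set.

    `conf` is anything with a get(key, default) method, for example
    `spark.conf` in PySpark or a plain dict. Hypothetical helper.
    """
    value = conf.get(TASK_ID_KEY, None)
    return str(value) if value is not None else None


# Stand-in for spark.conf so the sketch runs without a cluster.
fake_conf = {TASK_ID_KEY: "12345"}
print(definity_task_id(fake_conf))
```

In a real job the call would presumably be `definity_task_id(spark.conf)`, made after the agent has populated the session conf.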

0.20.0 - (2025-01-30)

  • Server protocol v2 (minimum supported server version: 0.20.0)
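
A deployment script might check the minimum server version before upgrading the agent. The sketch below is an illustration only: the function names are hypothetical, and it assumes versions are plain dotted integers with three parts, as in this changelog. Versions are compared numerically, because a string comparison would wrongly rank "0.9.16" above "0.20.0".

```python
from typing import Tuple

# Minimum server version stated in the 0.20.0 release note.
MIN_SERVER_VERSION = "0.20.0"


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.20.1' into (0, 20, 1). Assumes plain dotted integers."""
    return tuple(int(part) for part in version.strip().split("."))


def server_is_supported(server_version: str,
                        minimum: str = MIN_SERVER_VERSION) -> bool:
    """True if server_version is at least the minimum (numeric comparison)."""
    return parse_version(server_version) >= parse_version(minimum)


for candidate in ("0.9.16", "0.20.0", "0.42.1"):
    print(candidate, server_is_supported(candidate))
```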

0.11.0 - (2025-01-13)

  • Add support for custom metrics
  • Add metric for average vCores used over time

0.10.2 - (2024-11-20)

  • Skew keys detection
  • Support BigQuery Connector
  • Code comparison v2 - clean plans

0.9.16 - (2024-10-21)

  • Support Multi Session Apps
  • Added new metrics:
    • Skew score
    • Broadcasts - count, failures, time & size
    • vCore time, CPU time & GC time
    • Heap & off-heap utilization metrics
    • Used executors over time
    • Input, output, shuffle write, shuffle read - records & bytes
    • Tasks/stages - retries & failures
    • Lost executors
    • Spill
    • Tasks result size
  • Support Nvidia GPU Plugin (rapids-4-spark) for file inputs & outputs
  • Added support for JDBC inputs
  • Added retry mechanism on server errors
  • Added driver allocated and used memory over time
  • Added support for Databricks for Spark 3.5
  • Support dynamic time-series metrics interval
  • DALM
    • Support pipelines mode
    • Support setting db suffix & location

0.8.1 - (2024-07-02)

  • Added executor & driver JVM memory watermark metrics - on-heap, off-heap, total - used & allocated
  • Added "max memory utilization" metrics for driver and executors
  • Added time-series metrics for total memory, used memory, total vCores, used vCores (max value per interval)
  • Shade all external dependencies so the agent is independent of the runtime environment
  • Added support for Spark 3.5
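
The "max value per interval" aggregation mentioned for the 0.8.1 time-series metrics can be illustrated as follows. This is a minimal sketch of the general idea, not the agent's implementation; the function name, the sample format and the bucket numbering are all assumptions.

```python
from typing import Dict, Iterable, Tuple


def max_per_interval(samples: Iterable[Tuple[float, float]],
                     interval_seconds: float) -> Dict[int, float]:
    """Group (timestamp_seconds, value) samples into fixed-width buckets
    and keep the largest value seen in each bucket.

    Bucket index = floor(timestamp / interval_seconds). Hypothetical helper.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    buckets: Dict[int, float] = {}
    for timestamp, value in samples:
        index = int(timestamp // interval_seconds)
        if index not in buckets or value > buckets[index]:
            buckets[index] = value
    return buckets


# Used-memory samples (seconds, GiB) reduced to 10-second intervals.
used_memory = [(0, 1.0), (5, 3.0), (12, 2.0)]
print(max_per_interval(used_memory, 10))
```

Keeping the maximum rather than the mean preserves short peaks, which matter for memory and vCore sizing.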