
Metrics

Definity automatically collects and tracks a comprehensive set of metrics across your data pipelines. These metrics are organized into categories based on what they measure and where they apply.

Table Metrics

Table metrics are collected for the datasets (tables) accessed during pipeline execution. Most are derived from metadata, which keeps resource overhead minimal.

Volume

| Metric Name | Description | Units |
|---|---|---|
| Query Row Count | Row count for this dataset, computed by running a dedicated query on the relevant partitions | rows |
| Row Count | Number of rows read/written in the asset (from Spark metadata) | rows |
| Output Size | Number of bytes read/written for this dataset | bytes |
| Files Size | Total size of the files read/written for this dataset | bytes |
| Partition Count | Number of partitions read/written for this dataset | partitions |
| Files Count | Number of files read/written for this dataset | files |
| Parts Count | Number of file parts read for this dataset | file-parts |
| DBT Rows | dbt metric: number of rows read/written in the asset | rows |

Schema

| Metric Name | Description | Units |
|---|---|---|
| Column Count | Number of columns in the dataset | columns |
| Schema Changes | Number of schema changes since the previous run | integer |
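As an illustration of Column Count and Schema Changes, the following minimal Python sketch compares two schemas represented as plain name-to-type dictionaries. The function names and the counting rule (one change per added, removed, or retyped column) are assumptions made for illustration; Definity's exact counting rule is not specified here.

```python
from typing import Dict

# Hypothetical representation of a schema: column name -> type name.
Schema = Dict[str, str]


def column_count(schema: Schema) -> int:
    """Number of columns in the dataset."""
    return len(schema)


def count_schema_changes(previous: Schema, current: Schema) -> int:
    """Assumed rule: each added, removed, or retyped column counts as one change."""
    added = current.keys() - previous.keys()
    removed = previous.keys() - current.keys()
    retyped = {c for c in previous.keys() & current.keys() if previous[c] != current[c]}
    return len(added) + len(removed) + len(retyped)


prev_run = {"id": "bigint", "name": "string", "ts": "timestamp"}
this_run = {"id": "bigint", "name": "varchar", "amount": "double"}
print(column_count(this_run), count_schema_changes(prev_run, this_run))  # 3 3
```

Here `amount` was added, `ts` was removed, and `name` changed type, giving three changes under the assumed rule.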

Freshness

| Metric Name | Description | Units |
|---|---|---|
| Table Freshness | Time elapsed since the previous table update | seconds |
| Max Time | Watermark value of the time column (maximum date value) | timestamp |
| Data Freshness | Difference between the read/execution time and the max time watermark of the time column (for input datasets) | seconds |
| Live Data Freshness | Difference between the current time and the max time watermark of the time column (for output datasets) | seconds |
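The Max Time, Data Freshness, and Live Data Freshness definitions reduce to timestamp arithmetic. The sketch below is a minimal Python illustration, assuming the time column is available as timezone-aware `datetime` values with `None` standing for nulls; the function names are hypothetical and not part of any Definity API.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


def max_time_watermark(values: Iterable[Optional[datetime]]) -> datetime:
    """Max Time: the latest non-null value in the time column."""
    return max(v for v in values if v is not None)


def data_freshness(watermark: datetime, read_time: datetime) -> float:
    """Input datasets: seconds between the read/execution time and the watermark."""
    return (read_time - watermark).total_seconds()


def live_data_freshness(watermark: datetime, now: Optional[datetime] = None) -> float:
    """Output datasets: seconds between the current time and the watermark."""
    now = now if now is not None else datetime.now(timezone.utc)
    return (now - watermark).total_seconds()


utc = timezone.utc
event_times = [datetime(2024, 1, 1, 0, 0, tzinfo=utc), datetime(2024, 1, 1, 6, 0, tzinfo=utc), None]
wm = max_time_watermark(event_times)
print(data_freshness(wm, datetime(2024, 1, 1, 8, 30, tzinfo=utc)))  # 9000.0
```

An input dataset read at 08:30 whose newest row is stamped 06:00 is 2.5 hours (9000 seconds) stale; Live Data Freshness keeps growing after the write until new data arrives.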

Column Metrics (Distribution)

These metrics analyze individual columns and require Definity to add dedicated queries during task execution. They are not calculated by default, so that no additional system footprint or analysis of customer data occurs without an explicit customer request. Customers must explicitly opt in for each column and metric.

| Metric Name | Description | Units |
|---|---|---|
| Null Percent | Percentage of null values in the column out of total rows | percent |
| Distinct Count | Number of distinct values in the column | rows |
| Distinct Percent | Distinct Count as a percentage of Query Row Count | percent |
| Unique Percent | Distinct Count as a percentage of the number of non-null values in the column | percent |
| Value Histogram | Histogram of values for low-cardinality columns | rows |
| Min Value | Minimum value of a numeric column | |
| Max Value | Maximum value of a numeric column | |
| Average | Average value of a numeric column | |
| Standard Deviation | Standard deviation of a numeric column | |
| NaN Count | Count of NaN values in a numeric column | |
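To show how the percentage metrics relate to one another, here is a minimal in-memory Python sketch over a single column. It is not how Definity computes these metrics (the section above states they run as dedicated queries); it assumes `None` stands for SQL NULL, that NULL is not counted as a distinct value, and that Standard Deviation is the population standard deviation. All function names are hypothetical.

```python
import math
import statistics
from collections import Counter
from typing import Any, Dict, List, Optional


def distribution_metrics(values: List[Any]) -> Dict[str, Any]:
    """Null/distinct/unique metrics for one column; None stands for SQL NULL."""
    total = len(values)  # plays the role of Query Row Count
    non_null = [v for v in values if v is not None]
    distinct = len(set(non_null))  # assumption: NULL is not a distinct value
    return {
        "null_percent": 100.0 * (total - len(non_null)) / total if total else 0.0,
        "distinct_count": distinct,
        "distinct_percent": 100.0 * distinct / total if total else 0.0,
        "unique_percent": 100.0 * distinct / len(non_null) if non_null else 0.0,
        "value_histogram": dict(Counter(non_null)),
    }


def numeric_metrics(values: List[Optional[float]]) -> Dict[str, Optional[float]]:
    """Min/Max/Average/Standard Deviation over non-null, non-NaN values, plus NaN Count."""
    non_null = [v for v in values if v is not None]
    nan_count = sum(1 for v in non_null if math.isnan(v))
    numbers = [v for v in non_null if not math.isnan(v)]
    if not numbers:
        return {"min": None, "max": None, "average": None, "stddev": None, "nan_count": nan_count}
    return {
        "min": min(numbers),
        "max": max(numbers),
        "average": statistics.mean(numbers),
        "stddev": statistics.pstdev(numbers),  # assumption: population standard deviation
        "nan_count": nan_count,
    }


print(distribution_metrics(["a", "b", "a", None, "c", None])["unique_percent"])  # 75.0
print(numeric_metrics([1.0, 2.0, 3.0, 4.0, float("nan"), None])["nan_count"])  # 1
```

With six rows, two nulls, and three distinct values, Distinct Percent is 50% (3 of 6 rows) while Unique Percent is 75% (3 of 4 non-null values), which is why the two metrics use different denominators.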

Partitioned Tables

For partitioned tables, Definity automatically identifies the relevant partitions accessed during task execution and calculates metrics specific to those partitions.

Execution Metrics

Execution metrics track performance and resource usage at the Pipeline, Task, and Transformation levels.

Time

| Metric Name | Description | Units |
|---|---|---|
| Execution Time | Elapsed execution time (end_time - start_time) | seconds |
| Process Time | Sum of the execution times of the tasks | seconds |
| SLA Time | Time elapsed since the last start_time | seconds |
| Skew Time | Total skew time across all Spark stages | seconds |
| Task Idle Time | Total time during which no transformations were running | seconds |
| Executors Idle Time | Total time during which no stages were running | seconds |
| Definity Driver Time | Time overhead in the driver spent on Definity operations | seconds |
| Definity Executors Time | Time overhead in the executors spent on Definity operations | seconds |
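Execution Time, Process Time, and the idle-time metrics differ in how they treat overlapping work. The minimal Python sketch below assumes each task, transformation, or stage is available as a hypothetical `(start, end)` pair in seconds, and that idle time is measured inside the enclosing task's start/end window: Task Idle Time would pass transformation intervals, Executors Idle Time would pass stage intervals. The function names are illustrative only.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def execution_time(start: float, end: float) -> float:
    """Elapsed wall-clock time: end_time - start_time."""
    return end - start


def process_time(task_runs: List[Interval]) -> float:
    """Sum of the individual task execution times (parallel tasks add up)."""
    return sum(end - start for start, end in task_runs)


def idle_time(window: Interval, busy: List[Interval]) -> float:
    """Time inside `window` during which none of the `busy` intervals was running."""
    w_start, w_end = window
    covered = 0.0
    run_start = run_end = None  # current merged run of overlapping intervals
    for start, end in sorted(busy):
        start, end = max(start, w_start), min(end, w_end)  # clip to the window
        if start >= end:
            continue
        if run_end is None or start > run_end:
            if run_end is not None:
                covered += run_end - run_start
            run_start, run_end = start, end
        else:
            run_end = max(run_end, end)  # overlapping: extend the current run
    if run_end is not None:
        covered += run_end - run_start
    return (w_end - w_start) - covered


print(execution_time(0, 100), process_time([(0, 50), (40, 100)]))  # 100 110
print(idle_time((0, 100), [(10, 30), (20, 40), (60, 70)]))  # 60.0
```

Because tasks can run in parallel, Process Time (110 s here) can exceed Execution Time (100 s). Idle time merges overlapping intervals first, so the overlap between 20 s and 30 s is not counted twice.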

Environment

| Metric Name | Description | Units |
|---|---|---|
| Param Count | Number of parameters in the task | integer |
| Count Param Changes | Number of changes in the task parameters | integer |

Resources - Overview

| Metric Name | Description | Units |
|---|---|---|
| Task Count | Number of tasks executed in the pipeline | integer |
| Failed Task Count | Number of failed tasks in the pipeline run | integer |
| Retry Task Count | Number of task retries in the pipeline run | integer |
| TF Count | Number of transformations executed in the task | integer |
| Total Memory Time | Total memory allocated for this task over time, summed over all executors | GB-seconds |
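The GB-seconds unit indicates that Total Memory Time weighs allocated memory by how long it was held. Assuming that reading, a minimal Python sketch multiplies each executor's memory by its allocation duration and sums the results. The `ExecutorAllocation` record is a hypothetical input format, not a Definity or Spark type.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExecutorAllocation:
    """Hypothetical record of one executor's memory allocation."""
    memory_gb: float
    start_s: float  # when the executor was allocated
    end_s: float    # when it was released


def total_memory_time(executors: List[ExecutorAllocation]) -> float:
    """GB-seconds: allocated memory x allocation duration, summed over executors."""
    return sum(e.memory_gb * (e.end_s - e.start_s) for e in executors)


execs = [ExecutorAllocation(8.0, 0.0, 600.0), ExecutorAllocation(8.0, 120.0, 600.0)]
print(total_memory_time(execs))  # 8640.0
```

Two 8 GB executors, one held for 600 s and one for 480 s, yield 4800 + 3840 = 8640 GB-seconds.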

Resources - vCore Time

| Metric Name | Description | Units |
|---|---|---|
| Allocated vCore Time | Total vCore time allocated for this asset execution | seconds |
| Used vCore Time | Actual vCore time used in this asset execution | seconds |
| Utilized vCore Time | Percentage of vCore time not spent idle | percent |
| Executors Allocated vCore Time | Total executor vCore time allocated by Spark tasks | seconds |
| Executors Used vCore Time | Total executor vCore time used by Spark tasks | seconds |
| Retried Spark Tasks vCore Time | Total executor vCore time of retried Spark tasks | seconds |
| Executors Used CPU Time | Total executor CPU time in Spark tasks | seconds |
| Executors Python CPU Time | Total executor CPU time spent in Python code | seconds |
| Executors Java CPU Time | Total executor CPU time spent in Java/Scala code | seconds |
| Executors Other CPU Time | Total executor CPU time spent in other operations | seconds |
| Executors GC Time | Total executor GC time in Spark tasks | seconds |
| Executors Shuffle-Read Time | Total executor shuffle-read time in Spark tasks | seconds |
| Executors Shuffle-Write Time | Total executor shuffle-write time in Spark tasks | seconds |
| Driver GC Time | Total driver GC time | seconds |
| Driver GC Time Percent | Driver GC time as a percentage of driver execution time | percent |
| Idle vCores Percent | Percentage of vCore time spent in idle executors (with dynamic allocation) | percent |
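Several of these metrics are ratios of two time totals. The minimal Python sketch below assumes that vCore time means vCores multiplied by seconds, and that Utilized vCore Time is Used vCore Time divided by Allocated vCore Time; both are interpretations of the table rather than a documented formula. The input pairs and the `used` value are hypothetical.

```python
from typing import List, Tuple


def percent(part: float, whole: float) -> float:
    """part / whole as a percentage; 0.0 when whole is 0."""
    return 100.0 * part / whole if whole else 0.0


def allocated_vcore_time(executors: List[Tuple[int, float]]) -> float:
    """Sum of vCores x allocated seconds over (vcores, seconds) pairs."""
    return sum(vcores * seconds for vcores, seconds in executors)


allocated = allocated_vcore_time([(4, 600.0), (4, 300.0)])  # 3600.0 vCore-seconds
used = 2700.0  # hypothetical vCore-seconds actually spent running Spark tasks
print(percent(used, allocated))  # Utilized vCore Time: 75.0
print(percent(12.0, 240.0))      # Driver GC Time Percent: 5.0
```

Guarding against a zero denominator keeps the ratio defined for runs that never allocated executors.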

Resources - Driver Memory

| Metric Name | Description | Units |
|---|---|---|
| Driver Memory Allocated | Memory allocated for the driver | GB |
| Driver On-Heap Memory Allocated | On-heap memory allocated for the driver | GB |
| Driver Off-Heap Memory Allocated | Off-heap memory allocated for the driver | GB |
| Driver Memory Watermark | Maximum memory used in the driver (watermark) | GB |
| Driver On-Heap Memory Watermark | Maximum on-heap memory used in the driver (watermark) | GB |
| Driver Off-Heap Memory Watermark | Maximum off-heap memory used in the driver (watermark) | GB |
| Utilized Driver Memory | Peak memory used as a percentage of the total memory allocated to the driver | percent |
| Utilized Driver Heap Memory | Peak heap memory used as a percentage of the total heap memory allocated to the driver | percent |
| Utilized Driver Off-Heap Memory | Peak off-heap memory used as a percentage of the total off-heap memory allocated to the driver | percent |
| Number of Driver vCores | Number of vCores in the driver | integer |

Resources - Executor Memory

| Metric Name | Description | Units |
|---|---|---|
| Executor Memory Allocated | Memory allocated for each executor | GB |
| Executor On-Heap Memory Allocated | On-heap memory allocated for each executor | GB |
| Executor Off-Heap Memory Allocated | Off-heap memory allocated for each executor | GB |
| Executor Memory Watermark | Maximum memory used across all executors (watermark) | GB |
| Executor On-Heap Memory Watermark | Maximum on-heap memory used across all executors (watermark) | GB |
| Executor Off-Heap Memory Watermark | Maximum off-heap memory used across all executors (watermark) | GB |
| Utilized Executor Memory | Peak memory used as a percentage of the total memory allocated to the executor | percent |
| Utilized Executor Heap Memory | Peak heap memory used as a percentage of the total heap memory allocated to the executor | percent |
| Utilized Executor Off-Heap Memory | Peak off-heap memory used as a percentage of the total off-heap memory allocated to the executor | percent |
| Number of Executor vCores | Number of vCores in the executors | integer |
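The utilization metrics combine a watermark with an allocation. For executors, the table gives allocation per executor and the watermark across all executors, so a minimal Python sketch takes the highest per-executor peak and divides it by the per-executor allocation. The per-executor peak values are hypothetical inputs; the same ratio would apply to the driver, which has a single peak and a single allocation.

```python
from typing import List


def memory_watermark(peaks_gb: List[float]) -> float:
    """Executor Memory Watermark: the highest peak observed on any executor."""
    return max(peaks_gb)


def utilized_memory_percent(watermark_gb: float, allocated_gb: float) -> float:
    """Peak used memory as a percentage of the allocated memory."""
    return 100.0 * watermark_gb / allocated_gb


peaks = [3.0, 6.0, 4.5]  # hypothetical per-executor peaks, GB
exec_wm = memory_watermark(peaks)
print(exec_wm, utilized_memory_percent(exec_wm, 8.0))  # 6.0 75.0
```

Using the maximum rather than the average highlights the executor closest to running out of memory, which is usually the one that matters for tuning.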

Resources - Executor Managed Memory

| Metric Name | Description | Units |
|---|---|---|
| Executor Managed Storage On-Heap Memory Watermark | Maximum managed storage memory used on-heap across executors | GB |
| Executor Managed Execution On-Heap Memory Watermark | Maximum managed execution memory used on-heap across executors | GB |
| Executor Managed On-Heap Memory Watermark | Maximum total managed memory used on-heap across executors | GB |
| Executor Managed Storage Off-Heap Memory Watermark | Maximum managed storage memory used off-heap across executors | GB |
| Executor Managed Execution Off-Heap Memory Watermark | Maximum managed execution memory used off-heap across executors | GB |
| Executor Managed Off-Heap Memory Watermark | Maximum total managed memory used off-heap across executors | GB |

Resources - Spark Counts

| Metric Name | Description | Units |
|---|---|---|
| Spark Executors Count | Number of Spark executors used in the task | integer |
| Lost Executors Count | Total number of lost executors | integer |
| Spark Job Count | Number of Spark jobs | integer |
| Spark Stage Count | Number of Spark stages | integer |
| Spark Task Count | Number of Spark tasks | integer |
| Spark Task Retries Count | Total number of Spark task retries | integer |
| Spark Task Failures Count | Total number of Spark task failures | integer |
| Spark Stage Retries Count | Total number of Spark stage retries | integer |
| Spark Stage Failures Count | Total number of Spark stage failures | integer |

Resources - Data IO

| Metric Name | Description | Units |
|---|---|---|
| Input Records Read | Total number of input records read by Spark tasks | rows |
| Output Records Written | Total number of output records written by Spark tasks | rows |
| Shuffle Records Read | Total number of shuffle records read by Spark tasks | rows |
| Shuffle Records Written | Total number of shuffle records written by Spark tasks | rows |
| Input Bytes Read | Total input bytes read by Spark tasks | bytes |
| Output Bytes Written | Total output bytes written by Spark tasks | bytes |
| Shuffle Bytes Read | Total shuffle bytes read by Spark tasks | bytes |
| Shuffle Bytes Written | Total shuffle bytes written by Spark tasks | bytes |
| Spark Tasks Result Size | Total bytes sent back to the driver as task results | bytes |

Resources - Broadcasts

| Metric Name | Description | Units |
|---|---|---|
| Broadcasts Count | Number of Spark broadcasts | integer |
| Broadcasts Failures | Number of failed Spark broadcasts | integer |
| Broadcasts Rows | Total number of rows in broadcasts | rows |
| Broadcasts Collect Time | Total time to collect broadcast data | ms |
| Broadcasts Build Time | Total time to build broadcasts | ms |
| Broadcasts Broadcast Time | Total time to broadcast the broadcast data | ms |
| Broadcasts Data Size | Total size of broadcast data | bytes |

Resources - Spill

| Metric Name | Description | Units |
|---|---|---|
| Memory Bytes Spilled | Total memory bytes spilled | bytes |
| Disk Bytes Spilled | Total bytes spilled to disk | bytes |

Resources - Cache

| Metric Name | Description | Units |
|---|---|---|
| Cached Data Maximum Memory Size | Maximum size of cached data in memory | bytes |
| Cached Data Maximum Disk Size | Maximum size of cached data on disk | bytes |

Resources - S3 Operations

| Metric Name | Description | Units |
|---|---|---|
| S3 List Objects Requests | Total number of S3 list objects requests | integer |
| S3 Put Object Requests | Total number of S3 put object requests | integer |
| S3 Get Object Requests | Total number of S3 get object requests | integer |
| S3 Head Object Requests | Total number of S3 head object requests | integer |

Resources - Filesystem Operations

| Metric Name | Description | Units |
|---|---|---|
| Filesystem Open Requests | Total number of filesystem open requests | integer |
| Filesystem List Requests | Total number of filesystem list requests | integer |