Definity automatically collects and tracks a comprehensive set of metrics across your data pipelines. These metrics are organized into categories based on what they measure and where they apply.
Table Metrics
Table metrics are collected for datasets (tables) accessed during pipeline execution. Most are generated from metadata, ensuring minimal resource overhead.
Volume
| Metric Name | Description | Units |
|---|---|---|
| Query Row Count | Row count for this dataset - runs a dedicated query on relevant partitions | rows |
| Row Count | Number of read/written rows in asset (from Spark metadata) | rows |
| Output Size | Number of read/written bytes for this dataset | bytes |
| Files Size | Total size of the read/written files for this dataset | bytes |
| Partition Count | Number of read/written partitions for this dataset | partitions |
| Files Count | Number of read/written files for this dataset | files |
| Parts Count | Number of read file parts for this dataset | file-parts |
| DBT Rows | dbt metric: number of rows read/written in the asset | rows |
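File-level volume metrics can be derived from a listing of a dataset's files without scanning any rows, which is one reason metadata-based collection is cheap. The sketch below assumes a hypothetical listing of `(path, size_bytes)` pairs and Hive-style partition directories (`dt=...`). It illustrates the metric definitions only and is not Definity's collection logic.

```python
import posixpath
from typing import List, Tuple

def volume_from_listing(files: List[Tuple[str, int]]) -> Tuple[int, int, int]:
    """Return (Files Count, Files Size in bytes, Partition Count) from a file listing."""
    files_count = len(files)
    files_size = sum(size for _, size in files)
    # Assumption: each partition is one directory, e.g. .../dt=2024-05-01/part-0000.parquet
    partitions = {posixpath.dirname(path) for path, _ in files}
    return files_count, files_size, len(partitions)

# Hypothetical listing of the files a task wrote.
listing = [
    ("s3://example-bucket/sales/dt=2024-05-01/part-0000.parquet", 1_048_576),
    ("s3://example-bucket/sales/dt=2024-05-01/part-0001.parquet", 524_288),
    ("s3://example-bucket/sales/dt=2024-05-02/part-0000.parquet", 2_097_152),
]
print(volume_from_listing(listing))  # (3, 3670016, 2)
```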
Schema
| Metric Name | Description | Units |
|---|---|---|
| Column Count | Number of columns in dataset | columns |
| Schema Changes | Number of schema changes since the previous run | integer |
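The table does not spell out what counts as one schema change. The minimal sketch below assumes that each added column, each removed column, and each column whose type changed counts as one change, and models a schema as a hypothetical `{column_name: type_name}` dictionary.

```python
from typing import Dict

def count_schema_changes(previous: Dict[str, str], current: Dict[str, str]) -> int:
    """Count added, removed, and retyped columns between two runs' schemas."""
    added = current.keys() - previous.keys()
    removed = previous.keys() - current.keys()
    # Columns present in both runs whose type changed.
    retyped = {c for c in previous.keys() & current.keys() if previous[c] != current[c]}
    return len(added) + len(removed) + len(retyped)

prev = {"id": "bigint", "name": "string", "price": "double"}
curr = {"id": "bigint", "price": "decimal(10,2)", "created_at": "timestamp"}
print(len(curr))                         # Column Count: 3
print(count_schema_changes(prev, curr))  # Schema Changes: 3 (one added, one removed, one retyped)
```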
Freshness
| Metric Name | Description | Units |
|---|---|---|
| Table Freshness | Time elapsed since the previous table update | seconds |
| Max Time | Watermark value of time column (maximum date value) | timestamp |
| Data Freshness | The difference between the read/execution time and the Max Time watermark of the time column (for input datasets) | seconds |
| Live Data Freshness | The difference between the current time and the Max Time watermark of the time column (for output datasets) | seconds |
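The three freshness metrics differ only in which reference time is compared against which watermark. The sketch below uses fixed, hypothetical UTC timestamps so the results are reproducible; a live check would use `datetime.now(timezone.utc)` for the current time. Measuring Table Freshness against the task's execution time is an assumption, since the table does not name the reference point.

```python
from datetime import datetime, timezone

def freshness_seconds(reference: datetime, watermark: datetime) -> float:
    """Seconds elapsed between a watermark and a reference time."""
    return (reference - watermark).total_seconds()

# Hypothetical timestamps for a single run (timezone-aware, UTC).
last_table_update = datetime(2024, 5, 1, 6, 0, tzinfo=timezone.utc)
max_time = datetime(2024, 5, 1, 5, 30, tzinfo=timezone.utc)       # Max Time watermark
execution_time = datetime(2024, 5, 1, 7, 0, tzinfo=timezone.utc)  # when the task read the data
now = datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc)             # stands in for the current time

table_freshness = freshness_seconds(execution_time, last_table_update)  # assumed reference point
data_freshness = freshness_seconds(execution_time, max_time)            # input datasets
live_data_freshness = freshness_seconds(now, max_time)                  # output datasets

print(table_freshness, data_freshness, live_data_freshness)  # 3600.0 5400.0 12600.0
```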
Column Metrics (Distribution)
These metrics analyze individual columns and require Definity to add dedicated queries during task execution. To avoid any system footprint, and any analysis of customer data without an explicit request, they are not calculated by default: customers must explicitly opt in for each specific column and metric.
| Metric Name | Description | Units |
|---|---|---|
| Null Percent | Percentage of null values in the column out of total rows | percent |
| Distinct Count | Number of distinct values in the column | rows |
| Distinct Percent | "Distinct Count" as a percentage of "Query Row Count" | percent |
| Unique Percent | "Distinct Count" as a percentage of the number of non-null values in the column | percent |
| Value Histogram | Histogram of values for low-cardinality columns | rows |
| Min Value | Minimum value for numeric column | |
| Max Value | Maximum value for numeric column | |
| Average | Average value for numeric column | |
| Standard Deviation | Standard deviation for numeric column | |
| NaN Count | Count of NaN values for numeric column | |
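Definity computes these metrics with dedicated queries against the dataset; the Python sketch below only illustrates the table's definitions over an in-memory list standing in for one numeric column. Several details are assumptions not stated above: nulls are excluded from the distinct count (as SQL `COUNT(DISTINCT ...)` does), all NaNs count as a single distinct value (matching Spark's NaN grouping semantics), and Standard Deviation is the sample standard deviation. It also assumes the column has at least two non-null, non-NaN values.

```python
import math
import statistics
from collections import Counter
from typing import List, Optional

def column_metrics(values: List[Optional[float]]) -> dict:
    """Illustrative distribution metrics for one numeric column."""
    row_count = len(values)  # stands in for Query Row Count
    non_null = [v for v in values if v is not None]
    nans = [v for v in non_null if math.isnan(v)]
    numbers = [v for v in non_null if not math.isnan(v)]
    # All NaNs are treated as one distinct value; nulls are not counted.
    distinct = len(set(numbers)) + (1 if nans else 0)
    return {
        "null_percent": 100 * (row_count - len(non_null)) / row_count,
        "distinct_count": distinct,
        "distinct_percent": 100 * distinct / row_count,
        "unique_percent": 100 * distinct / len(non_null),
        "value_histogram": dict(Counter(numbers)),
        "min": min(numbers),
        "max": max(numbers),
        "average": statistics.fmean(numbers),
        "stddev": statistics.stdev(numbers),  # sample standard deviation
        "nan_count": len(nans),
    }

metrics = column_metrics([1.0, 2.0, 2.0, None, float("nan"), 4.0, None, 3.0])
print(metrics["null_percent"], metrics["distinct_count"], metrics["distinct_percent"])  # 25.0 5 62.5
```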
Partitioned Tables
For partitioned tables, Definity automatically identifies the relevant partitions accessed during task execution and calculates metrics specific to those partitions.
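As a rough illustration of restricting metrics to the partitions a task touched, the sketch below models a table as a hypothetical mapping from partition key to rows and computes Row Count and Partition Count only over the accessed partitions. How Definity actually detects the accessed partitions is not shown here.

```python
from typing import Dict, List, Set

def partition_row_counts(partitions: Dict[str, List[dict]], accessed: Set[str]) -> Dict[str, int]:
    """Row counts restricted to the partitions the task actually accessed."""
    return {key: len(rows) for key, rows in partitions.items() if key in accessed}

# Hypothetical table partitioned by date; the task only read the last two days.
table = {
    "dt=2024-05-01": [{"id": 1}, {"id": 2}],
    "dt=2024-05-02": [{"id": 3}],
    "dt=2024-05-03": [{"id": 4}, {"id": 5}, {"id": 6}],
}
accessed = {"dt=2024-05-02", "dt=2024-05-03"}
counts = partition_row_counts(table, accessed)
print(counts)                             # {'dt=2024-05-02': 1, 'dt=2024-05-03': 3}
print(len(counts), sum(counts.values()))  # Partition Count 2, Row Count 4
```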
Execution Metrics
Execution metrics track performance and resource usage at the Pipeline, Task, and Transformation levels.
Time
| Metric Name | Description | Units |
|---|---|---|
| Execution Time | Elapsed execution time (end_time - start_time) | seconds |
| Process Time | Sum of the execution times of all tasks | seconds |
| SLA Time | Time elapsed since the last start_time | seconds |
| Skew Time | Total skew time across all Spark stages | seconds |
| Task Idle Time | Total time during which no transformations were running | seconds |
| Executors Idle Time | Total time during which no stages were running | seconds |
| Definity Driver Time | Time overhead in the driver spent on Definity operations | seconds |
| Definity Executors Time | Time overhead in the executors spent on Definity operations | seconds |
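Execution Time, Process Time, and the idle metrics relate through time intervals. The sketch below assumes that a pipeline's Execution Time spans from its earliest task start to its latest task end, and that Task Idle Time is the task's window minus the union of its (possibly overlapping) transformation intervals; Executors Idle Time would follow the same pattern with stage intervals. Task names and timings are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds

def union_length(intervals: List[Interval]) -> float:
    """Total time covered by possibly overlapping intervals."""
    covered = 0.0
    run_start: Optional[float] = None
    run_end: Optional[float] = None
    for start, end in sorted(intervals):
        if run_end is not None and start <= run_end:
            run_end = max(run_end, end)  # overlaps the current run: extend it
        else:
            if run_end is not None:
                covered += run_end - run_start
            run_start, run_end = start, end
    if run_end is not None:
        covered += run_end - run_start
    return covered

def idle_time(window: Interval, busy: List[Interval]) -> float:
    """Time inside `window` during which no `busy` interval was running."""
    w_start, w_end = window
    clipped = [(max(s, w_start), min(e, w_end)) for s, e in busy if e > w_start and s < w_end]
    return (w_end - w_start) - union_length(clipped)

# Hypothetical pipeline with two tasks that partly overlap in time.
tasks = {"ingest": (0.0, 100.0), "aggregate": (50.0, 120.0)}
transformations = [(10.0, 30.0), (20.0, 50.0), (70.0, 90.0)]  # inside "ingest"

pipeline_execution_time = max(e for _, e in tasks.values()) - min(s for s, _ in tasks.values())
process_time = sum(e - s for s, e in tasks.values())
ingest_idle = idle_time(tasks["ingest"], transformations)

print(pipeline_execution_time, process_time, ingest_idle)  # 120.0 170.0 40.0
```

Because the two tasks overlap, Process Time (170 s) exceeds the pipeline's Execution Time (120 s); the gap is a rough indicator of parallelism.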
Environment
| Metric Name | Description | Units |
|---|---|---|
| Param Count | Number of parameters in the task | integer |
| Count Param Changes | Number of changes in the task params | integer |
Resources - Overview
| Metric Name | Description | Units |
|---|---|---|
| Task Count | Number of tasks executed in the pipeline | integer |
| Failed Task Count | Number of failed tasks in the pipeline run | integer |
| Retry Task Count | Number of task retries in the pipeline run | integer |
| TF Count | Number of transformations executed in the task | integer |
| Total Memory Time | Total memory allocated for this task over time, summed over all executors | GB-seconds |
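The GB-seconds unit suggests memory multiplied by time. The sketch below assumes Total Memory Time is each executor's allocated memory multiplied by how long that executor was allocated, summed over executors; the `Executor` record and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Executor:
    memory_gb: float  # memory allocated to the executor
    start_s: float    # when the executor was added (seconds)
    end_s: float      # when it was removed or the task ended (seconds)

def total_memory_time(executors: List[Executor]) -> float:
    """GB-seconds: allocated memory times allocation lifetime, summed over executors."""
    return sum(e.memory_gb * (e.end_s - e.start_s) for e in executors)

# Three 8 GB executors added at different times (e.g. under dynamic allocation).
executors = [Executor(8.0, 0, 600), Executor(8.0, 120, 600), Executor(8.0, 300, 450)]
print(total_memory_time(executors))  # 9840.0
```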
Resources - vCore Time
| Metric Name | Description | Units |
|---|---|---|
| Allocated vCore Time | Total vCore time allocated for this asset execution | seconds |
| Used vCore Time | Actual utilized vCore time in this asset execution | seconds |
| Utilized vCore Time | Percentage of vCore time that was not idle | percent |
| Executors Allocated vCore Time | Total executor vCore time allocated by Spark tasks | seconds |
| Executors Used vCore Time | Total executor vCore time used by Spark tasks | seconds |
| Retried Spark Tasks vCore Time | Total executor vCore time of retried Spark tasks | seconds |
| Executors Used CPU Time | Total executor CPU time in Spark tasks | seconds |
| Executors Python CPU Time | Total executor CPU time spent in Python code | seconds |
| Executors Java CPU Time | Total executor CPU time spent in Java/Scala code | seconds |
| Executors Other CPU Time | Total executor CPU time spent in other operations | seconds |
| Executors GC Time | Total executor GC time in Spark tasks | seconds |
| Executors Shuffle-Read Time | Total executor shuffle-read time in Spark tasks | seconds |
| Executors Shuffle-Write Time | Total executor shuffle-write time in Spark tasks | seconds |
| Driver GC Time | Total driver GC time | seconds |
| Driver GC Time Percent | Driver GC time as a percentage of driver execution time | percent |
| Idle vCores Percent | Percentage of vCore time spent in idle executors (with dynamic allocation) | percent |
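A minimal sketch of how the allocated, used, and utilized vCore figures might relate, assuming allocated vCore time is each executor's vCore count multiplied by its lifetime and Utilized vCore Time is used time as a percentage of allocated time. The `ExecutorUsage` record is hypothetical. Driver GC Time Percent follows directly from its description.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ExecutorUsage:
    vcores: int          # vCores allocated to the executor
    lifetime_s: float    # seconds the executor was allocated
    busy_core_s: float   # vCore-seconds spent running Spark tasks

def vcore_time(executors: List[ExecutorUsage]) -> Tuple[float, float, float]:
    """Return (allocated, used, utilized percent) vCore time."""
    allocated = sum(e.vcores * e.lifetime_s for e in executors)
    used = sum(e.busy_core_s for e in executors)
    return allocated, used, 100 * used / allocated

def driver_gc_time_percent(driver_gc_s: float, driver_execution_s: float) -> float:
    """Driver GC time as a percentage of driver execution time."""
    return 100 * driver_gc_s / driver_execution_s

executors = [ExecutorUsage(4, 600.0, 1800.0), ExecutorUsage(4, 300.0, 600.0)]
allocated, used, utilized = vcore_time(executors)
print(allocated, used, round(utilized, 1))  # 3600.0 2400.0 66.7
print(driver_gc_time_percent(30.0, 600.0))  # 5.0
```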
Resources - Driver Memory
| Metric Name | Description | Units |
|---|---|---|
| Driver Memory Allocated | The allocated memory for the driver | GB |
| Driver On-Heap Memory Allocated | The allocated on-heap memory for the driver | GB |
| Driver Off-Heap Memory Allocated | The allocated off-heap memory for the driver | GB |
| Driver Memory Watermark | The maximum used memory in the driver (watermark) | GB |
| Driver On-Heap Memory Watermark | The maximum used on-heap memory in the driver (watermark) | GB |
| Driver Off-Heap Memory Watermark | The maximum used off-heap memory in the driver (watermark) | GB |
| Utilized Driver Memory | Peak driver memory usage as a percentage of the total allocated driver memory | percent |
| Utilized Driver Heap Memory | Peak driver heap memory usage as a percentage of the allocated driver heap memory | percent |
| Utilized Driver Off-Heap Memory | Peak driver off-heap memory usage as a percentage of the allocated driver off-heap memory | percent |
| Number of Driver vCores | The number of vCores in the driver | integer |
Resources - Executor Memory
| Metric Name | Description | Units |
|---|---|---|
| Executor Memory Allocated | The allocated memory for each executor | GB |
| Executor On-Heap Memory Allocated | The allocated on-heap memory for each executor | GB |
| Executor Off-Heap Memory Allocated | The allocated off-heap memory for each executor | GB |
| Executor Memory Watermark | The maximum used memory across all executors (watermark) | GB |
| Executor On-Heap Memory Watermark | The maximum used on-heap memory across all executors (watermark) | GB |
| Executor Off-Heap Memory Watermark | The maximum used off-heap memory across all executors (watermark) | GB |
| Utilized Executor Memory | Peak executor memory usage as a percentage of the total allocated executor memory | percent |
| Utilized Executor Heap Memory | Peak executor heap memory usage as a percentage of the allocated executor heap memory | percent |
| Utilized Executor Off-Heap Memory | Peak executor off-heap memory usage as a percentage of the allocated executor off-heap memory | percent |
| Number of Executor vCores | The number of vCores in the executors | integer |
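The utilization percentages in the driver and executor memory tables share one formula: the watermark divided by the allocation. Since executor allocation is listed per executor and the executor watermark is the maximum across executors, the executor figure presumably compares the busiest executor against a single executor's allocation. The values below are hypothetical.

```python
def utilization_percent(watermark_gb: float, allocated_gb: float) -> float:
    """Peak memory usage as a percentage of allocated memory."""
    if allocated_gb <= 0:
        raise ValueError("allocated memory must be positive")
    return 100 * watermark_gb / allocated_gb

# Driver: 4 GB allocated, 3 GB peak.
print(utilization_percent(3.0, 4.0))  # 75.0
# Executors: 8 GB each, busiest executor peaked at 6.5 GB.
print(utilization_percent(6.5, 8.0))  # 81.25
```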
Resources - Executor Managed Memory
| Metric Name | Description | Units |
|---|---|---|
| Executor Managed Storage On-Heap Memory Watermark | Maximum managed storage memory used on-heap across executors | GB |
| Executor Managed Execution On-Heap Memory Watermark | Maximum managed execution memory used on-heap across executors | GB |
| Executor Managed On-Heap Memory Watermark | Maximum total managed memory used on-heap across executors | GB |
| Executor Managed Storage Off-Heap Memory Watermark | Maximum managed storage memory used off-heap across executors | GB |
| Executor Managed Execution Off-Heap Memory Watermark | Maximum managed execution memory used off-heap across executors | GB |
| Executor Managed Off-Heap Memory Watermark | Maximum total managed memory used off-heap across executors | GB |
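In Spark's unified memory model, storage and execution share one managed region, so the total managed watermark is not necessarily the sum of the storage and execution watermarks. The sketch below assumes each watermark is the peak over periodic `(storage_gb, execution_gb)` samples taken from every executor, with the total taken as the peak of the combined usage; the sample data is hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (storage_gb, execution_gb) observed at one moment

def managed_on_heap_watermarks(samples: Dict[str, List[Sample]]) -> Tuple[float, float, float]:
    """Return (storage, execution, total) peaks across all executors' samples."""
    all_samples = [s for per_executor in samples.values() for s in per_executor]
    storage_peak = max(storage for storage, _ in all_samples)
    execution_peak = max(execution for _, execution in all_samples)
    # Peak of the combined usage, which can be lower than the sum of the two peaks.
    total_peak = max(storage + execution for storage, execution in all_samples)
    return storage_peak, execution_peak, total_peak

samples = {
    "executor-1": [(1.0, 2.0), (3.0, 0.5)],
    "executor-2": [(2.0, 2.5), (0.5, 1.0)],
}
print(managed_on_heap_watermarks(samples))  # (3.0, 2.5, 4.5)
```

Here the storage and execution peaks add up to 5.5 GB, but the two never peaked at the same moment, so the combined watermark is 4.5 GB.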
Resources - Spark Counts
| Metric Name | Description | Units |
|---|---|---|
| Spark Executors Count | Number of Spark executors used in the task | integer |
| Lost Executors Count | Total number of lost executors | integer |
| Spark Job Count | Number of Spark jobs | integer |
| Spark Stage Count | Number of Spark stages | integer |
| Spark Task Count | Number of Spark tasks | integer |
| Spark Task Retries Count | Total number of Spark task retries | integer |
| Spark Task Failures Count | Total number of Spark task failures | integer |
| Spark Stage Retries Count | Total number of Spark stage retries | integer |
| Spark Stage Failures Count | Total number of Spark stage failures | integer |
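A failed task attempt that Spark retries contributes to both the failure count and the retry count. The sketch below assumes a hypothetical list of task-attempt records, counts each `(stage, partition)` pair once as a task, counts every attempt after the first as a task-level retry, and counts every failed attempt as a failure. Whether Definity counts tasks or attempts this way is not stated above.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TaskAttempt:
    stage_id: int
    partition: int   # task index within the stage
    attempt: int     # 0 for the first attempt, 1+ for retries
    failed: bool

def spark_task_counts(attempts: List[TaskAttempt]) -> Tuple[int, int, int]:
    """Return (Spark Task Count, Spark Task Retries Count, Spark Task Failures Count)."""
    tasks = {(a.stage_id, a.partition) for a in attempts}
    retries = sum(1 for a in attempts if a.attempt > 0)
    failures = sum(1 for a in attempts if a.failed)
    return len(tasks), retries, failures

attempts = [
    TaskAttempt(stage_id=0, partition=0, attempt=0, failed=False),
    TaskAttempt(stage_id=0, partition=1, attempt=0, failed=True),   # fails on the first attempt
    TaskAttempt(stage_id=0, partition=1, attempt=1, failed=False),  # succeeds on retry
    TaskAttempt(stage_id=1, partition=0, attempt=0, failed=False),
]
print(spark_task_counts(attempts))  # (3, 1, 1)
```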
Resources - Data IO
| Metric Name | Description | Units |
|---|---|---|
| Input Records Read | Total number of input records read by Spark Tasks | rows |
| Output Records Written | Total number of output records written by Spark Tasks | rows |
| Shuffle Records Read | Total number of shuffle records read by Spark Tasks | rows |
| Shuffle Records Written | Total number of shuffle records written by Spark Tasks | rows |
| Input Bytes Read | Total input bytes read by Spark tasks | bytes |
| Output Bytes Written | Total output bytes written by Spark tasks | bytes |
| Shuffle Bytes Read | Total shuffle bytes read by Spark tasks | bytes |
| Shuffle Bytes Written | Total shuffle bytes written by Spark tasks | bytes |
| Spark Tasks Result Size | Total bytes transmitted back to the driver as task results | bytes |
Resources - Broadcasts
| Metric Name | Description | Units |
|---|---|---|
| Broadcasts Count | Number of Spark Broadcasts | integer |
| Broadcasts Failures | Number of failed Spark broadcasts | integer |
| Broadcasts Rows | Total number of broadcast rows | rows |
| Broadcasts Collect Time | Total time to collect broadcasts | ms |
| Broadcasts Build Time | Total time to build broadcasts | ms |
| Broadcasts Broadcast Time | Total time spent broadcasting the broadcasts | ms |
| Broadcasts Data Size | Total size of broadcast data | bytes |
Resources - Spill
| Metric Name | Description | Units |
|---|---|---|
| Memory Bytes Spilled | Total in-memory (deserialized) size of the data that was spilled | bytes |
| Disk Bytes Spilled | Total bytes spilled to disk | bytes |
Resources - Cache
| Metric Name | Description | Units |
|---|---|---|
| Cached Data Maximum Memory Size | Maximum size of cached data held in memory | bytes |
| Cached Data Maximum Disk Size | Maximum size of cached data stored on disk | bytes |
Resources - S3 Operations
| Metric Name | Description | Units |
|---|---|---|
| S3 List Objects Requests | Total number of S3 list objects requests | integer |
| S3 Put Object Requests | Total number of S3 put object requests | integer |
| S3 Get Object Requests | Total number of S3 get object requests | integer |
| S3 Head Object Requests | Total number of S3 head object requests | integer |
Resources - Filesystem Operations
| Metric Name | Description | Units |
|---|---|---|
| Filesystem Open Requests | Total number of filesystem open requests | integer |
| Filesystem List Requests | Total number of filesystem list requests | integer |