Metrics
Table Metrics
All table metrics are generated from metadata, so collecting them has a close-to-zero resource footprint.
Volume
| Metric Name | Description | Units |
|---|---|---|
| Row Count | Total number of rows | rows |
| Output Bytes | Total size of data written | bytes |
| Files Size | Total size of data read | bytes |
| Partition Count | Number of partitions accessed (read or written) | partitions |
| Files Count | Total number of files accessed (read or written) | files |
| Parts Count | Total number of file segments accessed | file-parts |
Schema
| Metric Name | Description | Units |
|---|---|---|
| Column Count | Total number of columns in the table | columns |
| Schema Changes | Number of schema changes since the previous run | columns |
Freshness
| Metric Name | Description | Units |
|---|---|---|
| Table Freshness | Time elapsed since the last table update (for input tables) | seconds |
Column Metrics (Distribution)
These metrics require definity to add dedicated queries to the task execution.
| Metric Name | Description | Units |
|---|---|---|
| Null Percent | Percentage of null values in the column out of total rows | % |
| Distinct Count | Count of distinct values in the column | |
| Distinct Percent | Percentage calculated as Distinct Count / Query Row Count | % |
| Unique Percent | Percentage of distinct values out of non-null entries in the column | % |
| Value Histogram | Histogram of values for low-cardinality fields | |
| Min/Max/Average/Standard Deviation | Summary statistics for numeric columns | |
| NaN Count | Count of NaN values in numeric columns | |
| Max Time | Maximum time value in the column. Supported types: timestamp, date, long/int (epoch time in seconds/ms), string (formats like yyyy-MM-dd HH:mm:ss, yyyy/MM/dd HH:mm:ss) | seconds |
| Data Freshness | Time difference between the read time and the column's Max Time value | seconds |
| Live Data Freshness | Real-time freshness of data, measured as the time elapsed since the column's Max Time value (updated periodically) | seconds |
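The percentage metrics above differ only in their denominators, which a small example makes concrete. The following is a minimal sketch, not definity's implementation: the function names are hypothetical, nulls are modeled as `None`, Query Row Count is assumed to be the total number of rows including nulls, and Distinct Count is assumed to exclude nulls.

```python
from datetime import datetime


def column_distribution(values):
    """Compute illustrative distribution metrics for one column.

    `values` is a list of column values, with None standing in for null.
    Assumption: distinct values are counted over non-null entries only.
    """
    total = len(values)
    non_null = [v for v in values if v is not None]
    distinct_count = len(set(non_null))
    return {
        # Nulls out of all rows.
        "null_percent": 100.0 * (total - len(non_null)) / total if total else 0.0,
        "distinct_count": distinct_count,
        # Distinct Count / Query Row Count (assumed here to be all rows).
        "distinct_percent": 100.0 * distinct_count / total if total else 0.0,
        # Distinct values out of non-null entries.
        "unique_percent": 100.0 * distinct_count / len(non_null) if non_null else 0.0,
    }


def data_freshness_seconds(max_time: datetime, read_time: datetime) -> float:
    """Time difference between the read time and the column's Max Time value."""
    return (read_time - max_time).total_seconds()
```

For a column holding `["a", "b", "a", None]`, this sketch gives a Null Percent of 25, a Distinct Percent of 50 (2 distinct values over 4 rows), and a Unique Percent of about 66.7 (2 distinct values over 3 non-null entries).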
Partitioned Tables
For partitioned tables, definity automatically identifies the relevant partitions accessed during task execution and calculates metrics specific to those partitions.
Execution Metrics
Time
| Metric Name | Relevant Assets | Description |
|---|---|---|
| Execution Time | Pipeline, Task, Transformation | Total elapsed time for execution |
| Process Time | Pipeline | Total execution time across all tasks |
| SLA Time | Pipeline, Task | Time elapsed since the last execution started |
| Skew Time | Task | Time spent in a "skewed" state |
| Task Idle Time | Task | Time during which no queries were executed (driver-only activity) |
Environment
| Metric Name | Relevant Assets | Description |
|---|---|---|
| Param Count | Task | Total number of parameters in the task |
| Count Param Changes | Task | Number of parameter changes in the task |
Resources
| Metric Name | Relevant Assets | Description | Units |
|---|---|---|---|
| Task Count | Pipeline | Number of tasks executed in the pipeline | |
| TF Count | Task | Number of transformations executed in the task | |
| VCore Time | Pipeline, Task, Transformation | Allocated/Used/Utilized VCore time for the asset | VCore-seconds |
| Memory Time | Task, Pipeline | Total memory time allocated for the asset | GB-seconds |
| Memory | Task | Allocated and utilized memory, including driver and executor usage (heap/off-heap) | GB |
| Spark Executors Count | Task | Number of Spark executors (running and failed) | |
| Spark Job/Stage/Task Count | Task, Transformation | Counts of Spark jobs, stages, tasks (including failures and retries) | |
| Data IO | Task | Total counts and size of inputs, outputs, shuffles (read/write) | bytes |
| Broadcasts | Task | Total counts and time for broadcast operations, including failures | |
| Spill | Task | Total size of disk and memory spills during task execution | bytes |
| Definity Driver Time | Task | Driver time overhead for definity operations | seconds |
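The VCore-seconds and GB-seconds units combine a resource amount with the time it was held. The sketch below illustrates that arithmetic for the allocated case only. It is an assumption about how such units are conventionally derived, not a description of how definity computes them; `ExecutorRecord` and both functions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ExecutorRecord:
    """Hypothetical per-executor allocation record (not a definity type)."""
    cores: int
    memory_gb: float
    lifetime_seconds: float


def allocated_vcore_seconds(executors):
    # Each executor contributes its core count multiplied by how long it was held.
    return sum(e.cores * e.lifetime_seconds for e in executors)


def allocated_gb_seconds(executors):
    # Same idea for memory: gigabytes held multiplied by seconds held.
    return sum(e.memory_gb * e.lifetime_seconds for e in executors)
```

Under these assumptions, one executor with 4 cores and 8 GB held for 100 seconds plus another with 2 cores and 4 GB held for 50 seconds adds up to 500 VCore-seconds and 1000 GB-seconds.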