Metrics
Table Metrics
This is a partial list of the metrics definity extracts.
All table metrics are generated from metadata, so collecting them has a close-to-zero resource footprint.
Volume
Metric Name | Description | Units |
---|---|---|
Row Count | Total number of rows | rows |
Output Bytes | Total size of data written | bytes |
Files Size | Total size of data read | bytes |
Partition Count | Number of partitions accessed (read or written) | partitions |
Files Count | Total number of files accessed (read or written) | files |
Parts Count | Total number of file segments accessed | file-parts |
Schema
Metric Name | Description | Units |
---|---|---|
Column Count | Total number of columns in the table | columns |
Schema Changes | Number of schema changes since the previous run | columns |
Freshness
Metric Name | Description | Units |
---|---|---|
Table Freshness | Time elapsed since the last table update (for input tables) | seconds |
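As a rough illustration of the Table Freshness definition above, the sketch below computes the seconds elapsed since a table's last update. The function name `table_freshness_seconds` and the sample timestamps are hypothetical, and where the last-update time comes from (for example, table metadata) is an assumption; this is not definity's implementation.

```python
from datetime import datetime, timezone


def table_freshness_seconds(last_update: datetime, now: datetime) -> float:
    """Seconds elapsed between the table's last update and `now`.

    Both arguments are assumed to be timezone-aware datetimes.
    """
    return (now - last_update).total_seconds()


# Hypothetical values: the table was last updated 30 minutes before the check.
last_update = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
now = datetime(2024, 1, 1, 12, 30, 0, tzinfo=timezone.utc)

print(table_freshness_seconds(last_update, now))  # 1800.0
```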
Column Metrics (Distribution)
These metrics require definity to add dedicated queries to the task execution.
Metric Name | Description | Units |
---|---|---|
Null Percent | Percentage of null values in the column out of total rows | % |
Distinct Count | Count of distinct values in the column | |
Distinct Percent | Percentage calculated as Distinct Count / Query Row Count | % |
Unique Percent | Percentage of distinct values out of non-null entries in the column | % |
Value Histogram | Histogram of values for low-cardinality fields | |
Min/Max/Average/Standard Deviation | Summary statistics for numeric columns | |
NaN Count | Count of NaN values in numeric columns | |
Max Time | Maximum time value in the column. Supported types: timestamp, date, long/int (epoch time in seconds/ms), string (formats like yyyy-MM-dd HH:mm:ss, yyyy/MM/dd HH:mm:ss) | seconds |
Data Freshness | Time difference between the read time and the column's Max Time value | seconds |
Live Data Freshness | Real-time freshness of data, measured as the time elapsed since the column's Max Time value (updated periodically) | seconds |
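To make the ratio definitions in the table concrete, here is a minimal Python sketch that works on an in-memory list of column values. It is not how definity computes these metrics (which, per the note above, is done with dedicated queries). All function names are hypothetical. The sketch assumes that nulls are excluded from Distinct Count, that Query Row Count defaults to the number of values passed in, and that string timestamps are in UTC.

```python
from datetime import datetime, timezone

# The two string formats named in the Max Time row, in strptime notation.
TIME_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M:%S")


def distribution_metrics(values, query_row_count=None):
    """Null/Distinct/Unique percentages for one column (None means null)."""
    total = len(values)
    if query_row_count is None:
        query_row_count = total  # assumption: the query returned exactly these rows
    non_null = [v for v in values if v is not None]
    distinct_count = len(set(non_null))  # assumption: nulls are not counted as a value
    return {
        "null_percent": 100.0 * (total - len(non_null)) / total if total else 0.0,
        "distinct_count": distinct_count,
        "distinct_percent": 100.0 * distinct_count / query_row_count if query_row_count else 0.0,
        "unique_percent": 100.0 * distinct_count / len(non_null) if non_null else 0.0,
    }


def parse_time(value: str) -> datetime:
    """Parse a string timestamp in one of TIME_FORMATS, assumed to be UTC."""
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"unsupported time format: {value!r}")


def data_freshness_seconds(time_values, read_time: datetime) -> float:
    """Seconds between `read_time` and the column's Max Time (nulls skipped)."""
    max_time = max(parse_time(v) for v in time_values if v is not None)
    return (read_time - max_time).total_seconds()


metrics = distribution_metrics(["a", "b", "a", None])
print(metrics["null_percent"])      # 25.0
print(metrics["distinct_percent"])  # 50.0

read_time = datetime(2024, 1, 1, 11, 5, 0, tzinfo=timezone.utc)
freshness = data_freshness_seconds(
    ["2024-01-01 10:00:00", "2024/01/01 11:00:00", None], read_time
)
print(freshness)  # 300.0
```

In this example Unique Percent is 2 distinct values out of 3 non-null entries (about 66.7%), while Distinct Percent divides the same 2 by all 4 rows. That difference in denominator is what separates the two metrics.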
Partitioned Tables
For partitioned tables, definity automatically identifies the relevant partitions accessed during task execution and calculates metrics specific to those partitions.
Execution Metrics
Time
Metric Name | Relevant Assets | Description |
---|---|---|
Execution Time | Pipeline, Task, Transformation | Total elapsed time for execution |
Process Time | Pipeline | Total execution time across all tasks |
SLA Time | Pipeline, Task | Time elapsed since the last execution started |
Skew Time | Task | Time spent in a "skewed" state |
Task Idle Time | Task | Time during which no queries were executed (driver-only activity) |
Environment
Metric Name | Relevant Assets | Description |
---|---|---|
Param Count | Task | Total number of parameters in the task |
Count Param Changes | Task | Number of parameter changes in the task |
Resources
Metric Name | Relevant Assets | Description | Units |
---|---|---|---|
Task Count | Pipeline | Number of tasks executed in the pipeline | |
TF Count | Task | Number of transformations executed in the task | |
VCore Time | Pipeline, Task, Transformation | Allocated/Used/Utilized VCore time for the asset | VCore-seconds |
Memory Time | Pipeline, Task | Total memory time allocated for the asset | GB-seconds |
Memory | Task | Allocated and utilized memory, including driver and executor usage (heap/off-heap) | GB |
Spark Executors Count | Task | Number of Spark executors (running and failed) | |
Spark Job/Stage/Task Count | Task, Transformation | Counts of Spark jobs, stages, tasks (including failures and retries) | |
Data IO | Task | Total counts and size of inputs, outputs, shuffles (read/write) | bytes |
Broadcasts | Task | Total counts and time for broadcast operations, including failures | |
Spill | Task | Total size of disk and memory spills during task execution | bytes |
Definity Driver Time | Task | Driver time overhead for definity operations | seconds |
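The VCore-seconds and GB-seconds units in the table combine a resource amount with a duration. The sketch below illustrates the units only: it sums cores × seconds and GB × seconds over a hypothetical list of allocations. The `Allocation` class and its fields are assumptions for illustration, and the source does not state how definity derives these values.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    """Hypothetical record of resources held for a period of time."""
    vcores: int
    memory_gb: float
    seconds: float


def vcore_seconds(allocations) -> float:
    """Sum of vcores * seconds across allocations (VCore-seconds)."""
    return sum(a.vcores * a.seconds for a in allocations)


def memory_gb_seconds(allocations) -> float:
    """Sum of memory_gb * seconds across allocations (GB-seconds)."""
    return sum(a.memory_gb * a.seconds for a in allocations)


# Two hypothetical executors: 4 cores / 8 GB for 100 s, 2 cores / 4 GB for 50 s.
allocations = [Allocation(4, 8.0, 100.0), Allocation(2, 4.0, 50.0)]

print(vcore_seconds(allocations))      # 500.0
print(memory_gb_seconds(allocations))  # 1000.0
```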