Metrics

Table Metrics

This is a partial list of the metrics definity extracts.

All table metrics are generated from metadata, ensuring close to zero resource footprint.

Volume

| Metric Name | Description | Units |
| --- | --- | --- |
| Row Count | Total number of rows | rows |
| Output Bytes | Total size of data written | bytes |
| Files Size | Total size of data read | bytes |
| Partition Count | Number of partitions accessed (read or written) | partitions |
| Files Count | Total number of files accessed (read or written) | files |
| Parts Count | Total number of file segments accessed | file-parts |

Schema

| Metric Name | Description | Units |
| --- | --- | --- |
| Column Count | Total number of columns in the table | columns |
| Schema Changes | Number of schema changes since the previous run | columns |

Freshness

| Metric Name | Description | Units |
| --- | --- | --- |
| Table Freshness | Time elapsed since the last table update (for input tables) | seconds |

Column Metrics (Distribution)

These metrics require definity to run dedicated queries during task execution.

| Metric Name | Description | Units |
| --- | --- | --- |
| Null Percent | Percentage of null values in the column out of total rows | % |
| Distinct Count | Count of distinct values in the column | |
| Distinct Percent | Percentage calculated as Distinct Count / Query Row Count | % |
| Unique Percent | Percentage of distinct values out of non-null entries in the column | % |
| Value Histogram | Histogram of values for low-cardinality fields | |
| Min/Max/Average/Standard Deviation | Summary statistics for numeric columns | |
| NaN Count | Count of NaN values in numeric columns | |
| Max Time | Maximum time value in the column. Supported types: timestamp, date, long/int (epoch time in seconds/ms), string (formats like yyyy-MM-dd HH:mm:ss, yyyy/MM/dd HH:mm:ss) | seconds |
| Data Freshness | Time difference between the read time and the column's Max Time value | seconds |
| Live Data Freshness | Real-time freshness of data, measured as the time elapsed since the column's Max Time value (updated periodically) | seconds |
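
The percentage and freshness definitions above can be illustrated with a minimal Python sketch over an in-memory column. This is not definity's implementation: the function names are hypothetical, and the sketch assumes that nulls are excluded from Distinct Count (as SQL `COUNT(DISTINCT ...)` does) and that Query Row Count equals the total number of rows read.

```python
from datetime import datetime, timezone


def distribution_metrics(values):
    """Compute a few column metrics for a list where None represents null.

    Assumption: nulls are not counted as a distinct value.
    """
    total = len(values)
    non_null = [v for v in values if v is not None]
    distinct = len(set(non_null))
    return {
        # Nulls out of total rows
        "null_percent": 100.0 * (total - len(non_null)) / total if total else 0.0,
        "distinct_count": distinct,
        # Distinct Count / Query Row Count (assumed here to be the total row count)
        "distinct_percent": 100.0 * distinct / total if total else 0.0,
        # Distinct values out of non-null entries
        "unique_percent": 100.0 * distinct / len(non_null) if non_null else 0.0,
    }


def data_freshness_seconds(max_time, read_time):
    """Data Freshness: read time minus the column's Max Time, in seconds."""
    return (read_time - max_time).total_seconds()


column = ["a", "b", "a", None]
metrics = distribution_metrics(column)

max_time = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
read_time = datetime(2024, 1, 1, 0, 10, 0, tzinfo=timezone.utc)
freshness = data_freshness_seconds(max_time, read_time)
```

With this sample column, one of four rows is null and two distinct values appear among three non-null entries, so Distinct Percent and Unique Percent differ only in their denominator.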

Partitioned Tables

For partitioned tables, definity automatically identifies the relevant partitions accessed during task execution and calculates metrics specific to those partitions.
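
As a rough illustration of partition-scoped metrics, the sketch below aggregates per-partition metadata over only the partitions a task touched. The data layout, field names, and function name are hypothetical; how definity actually identifies accessed partitions is not described here.

```python
def metrics_for_accessed_partitions(partition_stats, accessed):
    """Aggregate metadata over the accessed partitions only.

    partition_stats: dict mapping partition name -> {"rows": int, "files": int}
    accessed: iterable of partition names the task read or wrote
    (both structures are assumptions made for this example)
    """
    selected = [partition_stats[p] for p in accessed if p in partition_stats]
    return {
        "partition_count": len(selected),
        "row_count": sum(s["rows"] for s in selected),
        "files_count": sum(s["files"] for s in selected),
    }


stats = {
    "dt=2024-01-01": {"rows": 1000, "files": 4},
    "dt=2024-01-02": {"rows": 1500, "files": 6},
    "dt=2024-01-03": {"rows": 900, "files": 3},
}
# The task only touched the last two partitions.
scoped = metrics_for_accessed_partitions(stats, ["dt=2024-01-02", "dt=2024-01-03"])
```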

Execution Metrics

Time

| Metric Name | Relevant Assets | Description |
| --- | --- | --- |
| Execution Time | Pipeline, Task, Transformation | Total elapsed time for execution |
| Process Time | Pipeline | Total execution time across all tasks |
| SLA Time | Pipeline, Task | Time elapsed since the last execution started |
| Skew Time | Task | Time spent in a "skewed" state |
| Task Idle Time | Task | Time during which no queries were executed (driver-only activity) |
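
The difference between Execution Time and Process Time for a pipeline can be sketched as follows. The sketch assumes that Execution Time is wall-clock time from the first task start to the last task end and that Process Time is the sum of individual task durations; the function names and the `(start, end)` representation are hypothetical.

```python
def execution_time(tasks):
    """Wall-clock span: earliest start to latest end, in seconds."""
    return max(end for _, end in tasks) - min(start for start, _ in tasks)


def process_time(tasks):
    """Sum of each task's own duration, in seconds."""
    return sum(end - start for start, end in tasks)


# Two tasks as (start, end) offsets in seconds; they overlap between 30 and 60.
pipeline_tasks = [(0, 60), (30, 120)]

elapsed = execution_time(pipeline_tasks)   # 120
processed = process_time(pipeline_tasks)   # 60 + 90 = 150
```

Under these assumptions, Process Time exceeds Execution Time whenever tasks run in parallel.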

Environment

| Metric Name | Relevant Assets | Description |
| --- | --- | --- |
| Param Count | Task | Total number of parameters in the task |
| Count Param Changes | Task | Number of parameter changes in the task |

Resources

| Metric | Relevant Assets | Description | Units |
| --- | --- | --- | --- |
| Task Count | Pipeline | Number of tasks executed in the pipeline | |
| TF Count | Task | Number of transformations executed in the task | |
| VCore Time | Pipeline, Task, Transformation | Allocated/Used/Utilized VCore time for asset | VCore-seconds |
| Memory Time | Task, Pipeline | Total memory time allocated for this task | GB-seconds |
| Memory | Task | Allocated and utilized memory, including driver and executor usage (heap/off-heap) | GB |
| Spark Executors Count | Task | Number of Spark executors (running and failed) | |
| Spark Job/Stage/Task Count | Task, Transformation | Counts of Spark jobs, stages, tasks (including failures and retries) | |
| Data IO | Task | Total counts and size of inputs, outputs, shuffles (read/write) | bytes |
| Broadcasts | Task | Total counts and time for broadcast operations, including failures | |
| Spill | Task | Total size of disk and memory spills during task execution | bytes |
| Definity Driver Time | Task | Driver time overhead for definity operations | seconds |
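
The VCore-seconds and GB-seconds units can be read as a resource quantity multiplied by the time it was held. The following sketch shows that arithmetic under the assumption of a fixed allocation for the whole run; the function names and sample numbers are illustrative only and do not reflect how definity collects these values.

```python
def resource_seconds(amount, seconds):
    """Resource-time product, e.g. vcores * seconds or GB * seconds."""
    return amount * seconds


def utilization(used, allocated):
    """Fraction of allocated resource-time that was actually used."""
    return used / allocated if allocated else 0.0


# Hypothetical task: 4 executors with 2 vcores and 8 GB each, held for 300 s.
executors, vcores_each, gb_each, duration_s = 4, 2, 8, 300

allocated_vcore_seconds = resource_seconds(executors * vcores_each, duration_s)
allocated_gb_seconds = resource_seconds(executors * gb_each, duration_s)

# Suppose the executors were busy for 1800 vcore-seconds in total.
vcore_utilization = utilization(1800, allocated_vcore_seconds)
```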