Lineage
Overview
Definity provides comprehensive data lineage tracking by monitoring tasks, transformations, and datasets in real-time. By embedding agents directly within task executions, Definity enables users to visualize and analyze task-dataset lineage, including column-level lineage, ensuring full transparency into how data is processed and transformed.
Task-Dataset Lineage
Task-dataset lineage represents the relationship between tasks (jobs) and the datasets (tables) they consume and generate. Definity agents observe transformations within each task and build a lineage map that tracks:
- Input datasets: The source tables and files read by a transformation.
- Output datasets: The tables and files generated or modified by a transformation.
- Transformations applied: The logical processing steps performed within a task.
Example: Task-Dataset Lineage
Column-Level Lineage
Definity provides column-level lineage, mapping transformations down to individual column dependencies. This allows users to:
- Trace specific column origins and transformations across tasks.
- Identify the impact of schema changes on downstream consumers.
- Understand dependencies between datasets and pipelines.
Point-In-Time (PIT) Lineage
Definity maintains lineage with Point-In-Time (PIT) execution connection:
This allows users to track the dependencies on the logical execution level and based on the input data snapshot.
This distinction ensures users can trace lineage across different pipeline execution times and historical data states.
Benefits of Definity Lineage
1. Impact Analysis
- Identify how schema changes impact downstream tables and reports.
- Prevent data disruptions by proactively managing dependencies.
2. Debugging & Root Cause Analysis
- Trace errors back to their originating datasets and transformations.
- Understand how incorrect data propagates through pipelines.
3. Governance & Compliance
- Maintain a clear audit trail of how data is transformed and used.
- Ensure regulatory compliance by tracking data movement across systems.
By leveraging Definity's lineage capabilities, teams gain deeper visibility into their data ecosystem, improving data reliability, governance, and operational efficiency