Architecture & Security
Overview
Definity is an enterprise lakehouse optimization and observability platform. It connects to the enterprise data lakehouse and lets data teams monitor, optimize, and protect their data infrastructure.
Definity can be deployed in one of two ways:
- A standard SaaS model hosted in Definity's secure cloud environment. In this model, metrics and metadata are reported to Definity servers.
Only metadata is reported; no data is sent to Definity.
- A fully internal service deployment, for cases where security requirements do not allow SaaS access.
In internal mode, no data leaves the customer's premises. The Definity team has no access to the deployment or to the reported metrics.
Components
Agents
Definity uses dedicated data pipeline agents that are added by configuration to each pipeline in the customer's data platform. The agents are provided as a code library (JAR file / Python library), and the agent code runs as part of the customer's data application. The agents communicate with the Definity platform server in either the SaaS or the internal deployment, whichever is selected.
The agents send metadata and metrics to the server. The server persists the metrics, analyzes them to identify potential issues and optimizations in the users' data pipelines, and exposes a web application for the pipeline owners.
Agent instrumentation
Definity agents are added to the data pipeline and run as part of the pipeline. Specific instrumentation info can be found in the relevant installation section.
Server
SaaS Deployment
The Definity platform server is hosted on Definity's cloud provider. All communication with the agents and the web application is TLS-encrypted and authenticated with an OAuth token.
Definity extracts and transmits only statistical metrics and metadata from the data pipelines. No pipeline data is ingested by or sent to Definity.
Customer Internal Deployment
The Definity server can be deployed as a service in the customer's internal network / VPC. In this mode, no data enters or leaves the customer's premises and the Definity team has no access to the deployment. This option requires a dedicated DevOps engineer or champion on the customer side to handle server upgrades and maintenance.
The server deployment is provided as a standard Helm / Docker deployment, as described in the installation page.
Zero-Code Change integration
Definity achieves zero-code-change integration through agent-based instrumentation:
- The agent automatically hooks into pipeline runtimes (e.g., Spark, dbt) at execution time.
- It extracts execution context, schema evolution, and metric data from runtime metadata.
- No modifications to job code, SQL, or DAGs are required.
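As a rough illustration of what attaching an agent "by configuration" can look like for Spark, the sketch below builds a `spark-submit` command that loads an agent JAR through Spark's standard `--jars` option and `spark.plugins` setting, leaving the job script untouched. The JAR path, the plugin class name, and the `spark.example.agent.*` key are hypothetical placeholders, not Definity's documented settings; the relevant installation section has the actual values.

```python
from typing import Dict, List


def build_spark_submit(app: str, agent_jar: str, plugin_class: str,
                       agent_conf: Dict[str, str]) -> List[str]:
    """Return a spark-submit command that attaches an agent via configuration only."""
    cmd = [
        "spark-submit",
        "--jars", agent_jar,                        # ship the agent library with the job
        "--conf", f"spark.plugins={plugin_class}",  # Spark 3.x plugin hook
    ]
    for key, value in sorted(agent_conf.items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)  # the job script itself is passed through unchanged
    return cmd


# Every value below is a hypothetical placeholder.
cmd = build_spark_submit(
    app="etl_job.py",
    agent_jar="/opt/agents/agent.jar",
    plugin_class="com.example.agent.AgentPlugin",
    agent_conf={"spark.example.agent.server": "https://observability.internal.example"},
)
print(" ".join(cmd))
```

Because everything is expressed as submit-time configuration, the same settings could equally be placed in a cluster-wide defaults file so that individual jobs need no changes at all.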
Data Flow Summary
- Pipeline job starts in customer environment.
- Definity agent attaches at runtime and observes job metadata.
- Only metrics and metadata are transmitted via TLS to Definity’s control plane.
- Insights are stored, analyzed, and surfaced in dashboards and APIs.
| Data Type | Collected | Notes |
|---|---|---|
| Pipeline Metadata | ✅ | Job name, start/stop time, run status, DAG/task IDs |
| Schema Metadata | ✅ | Column names, data types, evolution history |
| Metrics | ✅ | Execution metrics (duration, counts, error rates, resource usage), Column distribution statistics |
No raw data, PII, or business logic leaves the customer’s environment.
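To make the distinction between data and metadata concrete, here is a minimal sketch of how a report covering the three categories in the table could be assembled from aggregates alone. The field names and payload layout are assumptions chosen for illustration, not Definity's actual wire format.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ColumnStats:
    name: str
    dtype: str
    row_count: int
    null_count: int
    mean: Optional[float]  # only set for fully numeric columns


def summarize_column(name: str, values: List[Any]) -> ColumnStats:
    """Reduce a column to aggregate statistics; the values themselves are not kept."""
    present = [v for v in values if v is not None]
    numeric = [v for v in present
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    all_numeric = bool(present) and len(numeric) == len(present)
    return ColumnStats(
        name=name,
        dtype=type(present[0]).__name__ if present else "unknown",
        row_count=len(values),
        null_count=len(values) - len(present),
        mean=sum(numeric) / len(numeric) if all_numeric else None,
    )


def build_payload(job_name: str, status: str,
                  rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a report holding pipeline metadata, schema metadata, and metrics only."""
    columns = sorted({key for row in rows for key in row})
    stats = [summarize_column(c, [row.get(c) for row in rows]) for c in columns]
    return {
        "pipeline": {"job_name": job_name, "status": status},
        "schema": [{"name": s.name, "type": s.dtype} for s in stats],
        "metrics": {"row_count": len(rows), "columns": [asdict(s) for s in stats]},
    }


rows = [
    {"user_email": "a@example.com", "amount": 10.0},
    {"user_email": None, "amount": 32.5},
]
payload = build_payload("daily_orders", "SUCCESS", rows)
```

In this sketch the email address never appears in `payload`: the report carries only the column name, its type, and its row and null counts.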
Supported Runtimes and Targets
| Runtime / Platform | Support | Notes |
|---|---|---|
| Apache Spark (Kubernetes / YARN) | ✅ Full support | Versions & Details |
| Google Dataproc (Cluster & Serverless) | ✅ Full support | Integrated with GCP IAM and Dataproc APIs. Versions & Details |
| Databricks | ✅ Full support | Versions & Details |
| Others | 🧩 Roadmap | Additional runtimes under evaluation. |
Security, Privacy & Compliance
Storage & Retention
- Telemetry Storage: Stored in Definity-managed or customer-managed databases (depending on mode).
- Retention: Default 180 days for SaaS; configurable for private/self-hosted.
- Encryption:
  - In Transit: TLS 1.2+
  - At Rest: AES-256
- Backups: Automated, encrypted, and rotated.
Access Controls & Audit
- Authentication: OAuth 2.0 or SSO-based identity.
- Authorization: RBAC governs all API/UI access.
- Audit Logs: All administrative actions, API requests, and agent registrations are logged immutably.
- Least Privilege: Each agent receives scoped tokens with limited lifetime.
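The idea of scoped, short-lived agent tokens can be sketched as follows. This is an illustrative model only: the scope names, the 15-minute lifetime, and the `AgentToken` structure are assumptions, and a real deployment would rely on tokens signed by the OAuth provider rather than on a plain in-memory object.

```python
import time
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class AgentToken:
    agent_id: str
    scopes: FrozenSet[str]
    expires_at: float  # Unix timestamp in seconds


def issue_token(agent_id: str, scopes: FrozenSet[str], ttl_seconds: float,
                now: Optional[float] = None) -> AgentToken:
    """Issue a token that is valid for ttl_seconds and limited to the given scopes."""
    issued = time.time() if now is None else now
    return AgentToken(agent_id, scopes, issued + ttl_seconds)


def is_allowed(token: AgentToken, required_scope: str,
               now: Optional[float] = None) -> bool:
    """Allow a request only if the token is unexpired and carries the scope."""
    current = time.time() if now is None else now
    return current < token.expires_at and required_scope in token.scopes


# Hypothetical scope name and lifetime; the fixed `now` keeps the example deterministic.
token = issue_token("agent-1", frozenset({"metrics:write"}),
                    ttl_seconds=900, now=1_000.0)
print(is_allowed(token, "metrics:write", now=1_500.0))  # within lifetime, scope held
print(is_allowed(token, "metrics:read", now=1_500.0))   # scope not granted
print(is_allowed(token, "metrics:write", now=2_000.0))  # token expired
```

Keeping the write scope separate from any read scope means a leaked agent token could at most submit metrics for a short window, not read stored telemetry.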
Network
- Outbound-only HTTPS (TCP 443) required from agent to Definity server.
- No inbound ports or firewall exceptions needed.
- Optional customer-controlled private endpoint for Private SaaS deployments.
- All traffic is encrypted with TLS; mutual TLS (mTLS) is available as an option in the Enterprise tier.
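For readers who want to see what these requirements mean on the client side, the following sketch uses Python's standard `ssl` module to build a context that verifies the server certificate, refuses anything below TLS 1.2, and optionally presents a client certificate for mTLS. It illustrates the transport settings only and is not the agent's actual networking code; the certificate paths are left to the caller.

```python
import ssl
from typing import Optional


def make_tls_context(client_cert: Optional[str] = None,
                     client_key: Optional[str] = None) -> ssl.SSLContext:
    """Create a client-side TLS context for outbound HTTPS connections."""
    # Default context: server certificate and hostname are both verified.
    ctx = ssl.create_default_context()
    # Refuse TLS 1.0 / 1.1 handshakes.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Optional mTLS: present a client certificate when one is configured.
    if client_cert is not None:
        ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return ctx


ctx = make_tls_context()
```

The resulting context can be passed to standard clients, for example `http.client.HTTPSConnection(host, 443, context=ctx)`. Since the connection is opened by the agent, nothing needs to listen for inbound traffic on the customer side.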
Isolation and Fault Tolerance
- Agent Isolation: Agents implement multiple protections so that they never interfere with pipeline execution.
- Failure Isolation: If Definity is unreachable, the agent pauses automatically and the job continues executing unaffected.
- Platform Resilience: Stateless services with automatic recovery and zero effect on the customer environment.
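The failure-isolation behavior described above is commonly implemented as a fail-open wrapper around the reporting call. The sketch below shows one such pattern under stated assumptions: the class name, the failure threshold, and the pause-for-the-rest-of-the-run policy are illustrative choices, not a description of the agent's internals.

```python
from typing import Any, Callable, Dict


class FailOpenReporter:
    """Wraps a send function so that reporting errors never reach the pipeline."""

    def __init__(self, send: Callable[[Dict[str, Any]], None],
                 max_failures: int = 3) -> None:
        self._send = send
        self._max_failures = max_failures
        self._failures = 0
        self.paused = False

    def report(self, payload: Dict[str, Any]) -> bool:
        """Try to send; return True if delivered, False if skipped or failed."""
        if self.paused:
            return False
        try:
            self._send(payload)
        except Exception:
            # Swallow the error: the job must not fail because reporting did.
            self._failures += 1
            if self._failures >= self._max_failures:
                self.paused = True  # stop trying for the rest of the run
            return False
        self._failures = 0
        return True


def unreachable_server(payload: Dict[str, Any]) -> None:
    """Stand-in for a network call to a server that cannot be reached."""
    raise ConnectionError("server unreachable")


reporter = FailOpenReporter(unreachable_server, max_failures=2)
results = [reporter.report({"metric": i}) for i in range(3)]
job_output = sum(range(10))  # the "job" still runs to completion
```

After two consecutive failures the reporter pauses itself, so later calls return immediately without attempting a connection, and the surrounding job code never sees an exception.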
Compliance & Governance
| Area | Definity Practices |
|---|---|
| Security Frameworks | SOC 2 Type II approved |
| Data Privacy | GDPR and CCPA compliant; no PII collection. |
| Logging & Monitoring | Centralized, tamper-resistant logging; monitored 24×7. |
| Incident Response | Documented IR policy with SLA-based customer notification. |
| Employee Access | Role-based internal access with MFA and background checks. |