Getting started
Quick and Seamless Onboarding
Start monitoring your data pipelines in under two minutes with our streamlined setup process. We support a variety of platforms, including Apache Spark and DBT. Follow these steps to get started:
1. Select Your Platform
Choose the data processing platform you wish to monitor, such as Spark or DBT.
2. Generate a Secure Access Token
Create a unique token that authorizes your environment to connect with our monitoring service.
3. Configure Required Parameters
Define the necessary settings, such as data source paths, authentication credentials, and pipeline identifiers.
4. Deploy the Monitoring Agent
Run the agent within your infrastructure to begin capturing real-time performance and reliability metrics.
Once deployed, you'll gain immediate visibility into your pipelines, allowing for proactive issue detection and continuous optimization.
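As an illustration of steps 1 to 3, the sketch below collects the platform, access token, and pipeline identifier from environment variables and checks them before the agent is deployed. It is a minimal sketch, not the product's actual configuration format: the `AgentSettings` class, the `settings_from_env` helper, and the `MONITOR_*` variable names are all hypothetical.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional

# Platforms named in this guide; extend as needed.
SUPPORTED_PLATFORMS = {"spark", "dbt"}


@dataclass(frozen=True)
class AgentSettings:
    """Hypothetical container for the values gathered in steps 1 to 3."""
    platform: str
    access_token: str
    pipeline_id: str

    def problems(self) -> List[str]:
        """Return a list of human-readable issues; empty means ready to deploy."""
        issues = []
        if self.platform.lower() not in SUPPORTED_PLATFORMS:
            issues.append(f"unsupported platform: {self.platform!r}")
        if not self.access_token.strip():
            issues.append("access token is empty")
        if not self.pipeline_id.strip():
            issues.append("pipeline identifier is empty")
        return issues


def settings_from_env(env: Optional[Mapping[str, str]] = None) -> AgentSettings:
    """Read settings from environment variables (names are assumptions)."""
    env = os.environ if env is None else env
    return AgentSettings(
        platform=env.get("MONITOR_PLATFORM", ""),
        access_token=env.get("MONITOR_ACCESS_TOKEN", ""),
        pipeline_id=env.get("MONITOR_PIPELINE_ID", ""),
    )
```

Reading the token from the environment rather than hard-coding it keeps the credential out of source control. Calling `settings_from_env().problems()` before deployment surfaces missing values early.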
Getting started - System Prerequisites
General Requirements
For all solutions, the following prerequisites must be met:
- Spark Pipelines: Definity requires Spark pipelines running version 2.x or 3.x.
- Compute Environment:
  - All managed Spark environments are supported, including Databricks, Amazon EMR, and Google Cloud Dataproc.
  - Serverless Spark solutions are supported on most vendors. Contact definity to onboard your serverless pipelines.
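The Spark version requirement above can be checked with a few lines of code before onboarding a pipeline. The helper below is a minimal sketch; the function name `is_supported_spark_version` is hypothetical and is not part of any Definity tooling.

```python
def is_supported_spark_version(version: str) -> bool:
    """Return True when the major version is 2 or 3, as this guide requires."""
    # Take everything before the first dot, e.g. "3" from "3.5.1".
    major = version.strip().split(".", 1)[0]
    return major in {"2", "3"}
```

In a PySpark session, the version string is available as `spark.version`, so the check could be called as `is_supported_spark_version(spark.version)`.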
SaaS Solution Prerequisites
For the SaaS solution, ensure that:
- Your Spark environment has network access to external SaaS solutions.
  - This may require firewall rules, VPC peering, or a private link setup, depending on your cloud provider.
- definity extracts and stores only technical and statistical metadata. No data is ever sent to definity.
- definity is SOC 2 compliant and trusted by Fortune 500 financial services companies.
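One way to confirm the network access requirement is to attempt a TCP connection from inside the Spark environment (for example, from a driver node). The sketch below uses only the Python standard library. The `can_reach` helper is hypothetical, and the host and port of the SaaS endpoint are assumptions to be replaced with the values provided during onboarding.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host name and opens the socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

A `False` result from a cluster node usually points to a firewall rule, a missing peering route, or a DNS issue rather than a problem with the pipeline itself. Note that this only verifies TCP reachability, not authentication.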
Self-Hosted Solution Prerequisites
For the Self-Hosted solution, the following requirements apply:
Installation Options
- The Definity server can be installed using Docker or Helm.
Database requirements
- definity requires PostgreSQL version > 15.0.
- You can choose between:
  - A managed internal or VPC PostgreSQL database (Cloud SQL, RDS, etc.).
  - Letting definity automatically create a Postgres Pod.
    - Persistent storage is required for the deployed database.
In Self-Hosted solutions, the customer is fully responsible for ensuring sufficient database resources, backups, and availability.
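When using an existing managed database, the version requirement can be verified from the string returned by the SQL command `SHOW server_version`. The sketch below parses that string and compares it against 15.0. The helper names are hypothetical, and the comparison reads the "> 15.0" requirement literally, so 15.0 itself is rejected; adjust the comparison if the intended meaning is "15 or later".

```python
import re
from typing import Tuple


def parse_pg_version(text: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '15.4 (Debian 15.4-1)'."""
    match = re.match(r"\s*(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized PostgreSQL version: {text!r}")
    # A missing minor component (e.g. '16beta1') is treated as 0.
    return int(match.group(1)), int(match.group(2) or 0)


def meets_pg_requirement(text: str, minimum: Tuple[int, int] = (15, 0)) -> bool:
    """Return True when the version is strictly greater than the minimum."""
    return parse_pg_version(text) > minimum
```

For example, `meets_pg_requirement("15.4 (Debian 15.4-1.pgdg120+1)")` passes, while a 14.x server does not.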