
Getting started

info

Quick and Seamless Onboarding

Start monitoring your data pipelines in under two minutes with our streamlined setup process. We support several platforms, including Apache Spark and dbt. Follow these steps to get started:

1. Select Your Platform

Choose the data processing platform you wish to monitor, such as Spark or dbt.

2. Generate a Secure Access Token

Create a unique token that authorizes your environment to connect with our monitoring service.

3. Configure Required Parameters

Define the necessary settings, such as data source paths, authentication credentials, and pipeline identifiers.

4. Deploy the Monitoring Agent

Run the agent within your infrastructure to begin capturing real-time performance and reliability metrics.

Once deployed, you'll gain immediate visibility into your pipelines, allowing for proactive issue detection and continuous optimization.
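As a rough illustration of steps 2 and 3, the sketch below collects the settings into one validated object before the agent is started. The environment variable names (`MONITOR_PLATFORM`, `MONITOR_TOKEN`, `MONITOR_PIPELINE_ID`, `MONITOR_SOURCE_PATH`) and the `MonitoringConfig` class are hypothetical and exist only for this example; they are not Definity's actual parameter names, so consult the platform-specific setup pages for the real ones.

```python
from dataclasses import dataclass
from typing import Mapping

SUPPORTED_PLATFORMS = {"spark", "dbt"}

# Hypothetical variable names, chosen for this sketch only.
REQUIRED_KEYS = (
    "MONITOR_PLATFORM",
    "MONITOR_TOKEN",
    "MONITOR_PIPELINE_ID",
    "MONITOR_SOURCE_PATH",
)


@dataclass(frozen=True)
class MonitoringConfig:
    platform: str
    access_token: str
    pipeline_id: str
    data_source_path: str


def load_config(env: Mapping[str, str]) -> MonitoringConfig:
    """Build a config from a mapping of settings, failing early on gaps."""
    # Collect every missing or blank setting so the error lists them all at once.
    missing = [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))

    platform = env["MONITOR_PLATFORM"].strip().lower()
    if platform not in SUPPORTED_PLATFORMS:
        raise ValueError(f"unsupported platform: {platform}")

    return MonitoringConfig(
        platform=platform,
        access_token=env["MONITOR_TOKEN"].strip(),
        pipeline_id=env["MONITOR_PIPELINE_ID"].strip(),
        data_source_path=env["MONITOR_SOURCE_PATH"].strip(),
    )
```

Calling `load_config(os.environ)` keeps the access token out of source code, which is the main reason for reading it from the environment here.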

Getting started - System Prerequisites

General Requirements

For all solutions, the following prerequisites must be met:

  • Spark Pipelines: Definity supports Spark versions 2.x and 3.x.
  • Compute Environment:
    • All managed Spark environments are supported, including Databricks, Amazon EMR, Google Cloud Dataproc, and more.
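A quick way to confirm the version requirement is to check the major version of the running Spark build. The helper below is a minimal sketch, not part of Definity; the function name is an assumption made for this example. It accepts the version string that a PySpark session exposes as `spark.version`.

```python
def is_supported_spark_version(version: str) -> bool:
    """Return True when the major version is 2 or 3 (the 2.x / 3.x range)."""
    # Vendor builds may append suffixes (e.g. "3.3.0-amzn-1"); only the
    # part before the first dot matters for this check.
    major = version.strip().split(".", 1)[0]
    return major in {"2", "3"}
```

In a notebook or job this could be called as `is_supported_spark_version(spark.version)`.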

SaaS Solution Prerequisites

For the SaaS solution, ensure that:

  • Your Spark environment has network access to external SaaS solutions.
    • This may require firewall rules, VPC peering, or private link setup, depending on your cloud provider.
tip
  • Definity extracts and stores only technical and statistical metadata; your data itself is never sent to Definity.
  • Definity is SOC 2 compliant and trusted by Fortune 500 financial services companies.
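Before deploying, it can help to verify from inside the Spark environment that outbound connections actually succeed. The sketch below performs a plain TCP connection test using Python's standard library. The host passed in is whatever endpoint your Definity account uses; none is assumed here, and the default port 443 is an assumption based on typical HTTPS services.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

A `False` result usually points to a firewall rule, missing route, or DNS issue. A `True` result only shows that the TCP path is open; it does not validate TLS, proxies, or credentials.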

Self-Hosted Solution Prerequisites

For the Self-Hosted solution, the following requirements apply:

Installation Options

  • The Definity server can be installed using Docker or Helm.

Database Requirements

  • Definity requires PostgreSQL version > 15.0.
  • You can choose between:
    • A managed internal / VPC PostgreSQL database (Cloud SQL, RDS, etc.).
    • Letting Definity automatically create a Postgres Pod.
      • Persistent storage is required for the deployed database.
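For an existing managed database, the version requirement can be checked against the integer that PostgreSQL reports for `SHOW server_version_num;` (since PostgreSQL 10 this is the major version times 10000 plus the minor version, so 15.4 is `150004`). The helpers below are an illustrative sketch with assumed names. They read "> 15.0" literally, so 15.0 itself is rejected.

```python
def postgres_version_ok(server_version_num: int, minimum_exclusive: int = 150000) -> bool:
    """Return True when the server version is strictly newer than 15.0."""
    return server_version_num > minimum_exclusive


def format_version(server_version_num: int) -> str:
    """Turn a server_version_num such as 150004 into a readable '15.4'."""
    major, minor = divmod(server_version_num, 10000)
    return f"{major}.{minor}"
```

For example, a server reporting `150004` passes the check and formats as `15.4`, while one reporting `140011` (14.11) does not pass.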
danger

In Self-Hosted solutions, the customer is fully responsible for ensuring sufficient database resources, backups, and availability.