v0.9.18 pre-1.0

One config file.
Entire data stack.

DataEngineX unifies data pipelines, ML lifecycle, and AI agents. Config-driven, self-hosted, production-ready. Replaces the Airflow + MLflow + LangChain + FastAPI glue.

pip install dataenginex — or: uv add dataenginex
dex.yaml
data:
  source: s3://my-bucket/raw/
  format: parquet
  quality:
    null_threshold: 0.05

ml:
  backend: mlflow
  training:
    model: xgboost
    target: revenue

ai:
  provider: openai
  retrieval: hybrid
  agents:
    - name: analyst
      tools: [sql, search]

server:
  auth: jwt
  rate_limit: 100/min

observability:
  metrics: prometheus
  tracing: otel

You're maintaining 6 tools.
DataEngineX replaces the glue.

1 config file
0 vendor lock-in
swappable backends

Airflow for orchestration. MLflow for tracking. LangChain for agents. A FastAPI layer wired together by hand. Prometheus bolted on. Each tool brings its own config format, auth system, failure modes, and on-call rotation. Stop building glue. Start shipping products.

Everything included

Six domains. One framework. No assembly required.

Data

Connectors, transforms, and quality checks from a single config. DuckDB and Spark backends built in.
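To make the quality gate concrete, here is a minimal stdlib sketch of what a `null_threshold: 0.05` check enforces. The function names and row format are illustrative assumptions, not DataEngineX's actual API.

```python
def null_ratios(rows: list[dict]) -> dict[str, float]:
    """Fraction of None values per column across all rows."""
    if not rows:
        return {}
    return {
        col: sum(1 for r in rows if r.get(col) is None) / len(rows)
        for col in rows[0]
    }

def check_null_threshold(rows: list[dict], threshold: float = 0.05) -> list[str]:
    """Return the columns whose null ratio exceeds the threshold."""
    return [col for col, ratio in null_ratios(rows).items() if ratio > threshold]

rows = [
    {"clicks": 10, "revenue": 1.2},
    {"clicks": None, "revenue": 3.4},
    {"clicks": 12, "revenue": None},
    {"clicks": 9, "revenue": 2.0},
]
# Both columns are 25% null, well over the 5% threshold.
failing = check_null_threshold(rows, threshold=0.05)
```

In the real framework this runs inside the pipeline; the point is that the gate is a declarative number in `dex.yaml`, not code you write per dataset.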

ML Lifecycle

Experiment tracking, training, serving, and drift detection built in. MLflow, W&B, or the built-in backend — your call.
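One common way drift detection works, sketched below: compare a live feature's mean against the training baseline in standard-deviation units. This is an illustrative technique, not a claim about DataEngineX's internal drift algorithm, which `drift_detection: true` configures for you.

```python
import statistics

def drift_score(baseline: list[float], live: list[float]) -> float:
    """Absolute mean shift of live data, scaled by the baseline's stdev."""
    base_std = statistics.stdev(baseline)
    return abs(statistics.mean(live) - statistics.mean(baseline)) / base_std

baseline = [10.0, 12.0, 11.0, 13.0, 9.0]   # feature values at training time
stable   = [11.0, 10.5, 12.5, 11.5, 10.0]  # serving data, same distribution
shifted  = [19.0, 21.0, 20.0, 22.0, 18.0]  # serving data after drift
```

A score near zero means the feature still looks like training data; a large score is the signal to alert or retrain.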

AI Agents

LLM providers, hybrid BM25+dense retrieval, and LangGraph agent runtime — swappable, not locked in.
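Hybrid retrieval needs a way to merge the keyword ranking with the embedding ranking. Reciprocal rank fusion (RRF) is one standard approach, sketched here; it is an assumption for illustration, not necessarily what `retrieval: hybrid` uses internally.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked doc-id lists: docs ranked high in any list score well."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits  = ["doc3", "doc1", "doc7"]   # keyword-match ranking
dense_hits = ["doc1", "doc9", "doc3"]   # embedding-similarity ranking
fused = rrf_fuse([bm25_hits, dense_hits])
```

Rank fusion sidesteps the hard problem of normalizing BM25 scores against cosine similarities: only positions matter, not raw scores.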

Server

FastAPI with JWT auth, rate limiting, and health checks. API, background workers, and scheduler under one roof.
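What `rate_limit: 100/min` enforces can be sketched as a sliding-window limiter. The class name and interface below are illustrative; in the framework this sits in the request middleware, driven by config.

```python
import time
from collections import deque

class SlidingWindowLimiter:
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.hits: deque = deque()  # timestamps of allowed requests

    def allow(self, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        # Evict timestamps that have fallen out of the window.
        while self.hits and now - self.hits[0] >= self.window:
            self.hits.popleft()
        if len(self.hits) >= self.limit:
            return False
        self.hits.append(now)
        return True

# 100 requests per rolling 60-second window, as in the config above.
limiter = SlidingWindowLimiter(limit=100, window_seconds=60.0)
```

Rejected requests are not recorded, so a client hammering the endpoint does not extend its own lockout.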

Observability

structlog structured logging, Prometheus metrics, and OpenTelemetry tracing — wired up from config, not code.
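Structured logging means machine-parseable key/value lines instead of free-form strings. DataEngineX uses structlog for this; the stdlib sketch below just shows the output shape you get, and is not structlog's API.

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname.lower(),
            "event": record.getMessage(),
            **getattr(record, "ctx", {}),  # extra context fields, if any
        }
        return json.dumps(payload)

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("dex")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("pipeline_complete", extra={"ctx": {"rows": 1200, "source": "s3"}})
# emits: {"level": "info", "event": "pipeline_complete", "rows": 1200, "source": "s3"}
```

Because every line is JSON, Prometheus alerts and trace correlation can key off fields like `event` and `source` instead of regexing log text.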

Deploy

K3s, Helm, and Terraform via infradex. From dev to production Kubernetes cluster without writing manifests by hand.

One file. Everything configured.

dex.yaml is the single source of truth for your entire platform. Sources, transforms, quality rules, model config, agent definitions, API settings, and observability — all in one place.

No more hunting across a dozen repos to find why a pipeline broke. No more "it works in dev" surprises, because dev and prod share the same config schema.

  • Validate config with dex validate dex.yaml
  • Swap backends without changing application code
  • Strict Pydantic validation — config errors caught before runtime
  • Same config format from laptop to production cluster
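The "errors caught before runtime" idea, sketched with stdlib dataclasses below. DataEngineX itself does this with Pydantic models; the toy schema here stands in for the real `server:` section and is not the framework's actual code.

```python
from dataclasses import dataclass, fields

@dataclass
class ServerConfig:
    auth: str
    rate_limit: str

def validate_server(raw: dict) -> ServerConfig:
    """Reject unknown keys, missing keys, and wrong types before startup."""
    allowed = {f.name for f in fields(ServerConfig)}
    unknown = set(raw) - allowed
    if unknown:
        raise ValueError(f"unknown server keys: {sorted(unknown)}")
    missing = allowed - set(raw)
    if missing:
        raise ValueError(f"missing server keys: {sorted(missing)}")
    for key, value in raw.items():
        if not isinstance(value, str):
            raise ValueError(f"{key} must be a string, got {type(value).__name__}")
    return ServerConfig(**raw)

cfg = validate_server({"auth": "jwt", "rate_limit": "100/min"})
```

A typo like `rate_limt` fails at `dex validate` time with a named error, not as a silently ignored setting in production.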
Read the docs
dex.yaml — full example
# DataEngineX — full stack config

data:
  source: s3://my-bucket/raw/
  format: parquet
  backend: duckdb        # or spark
  quality:
    null_threshold: 0.05
    schema_enforcement: strict
    audit_table: quality.audit

ml:
  backend: mlflow
  tracking_uri: http://mlflow:5000
  training:
    model: xgboost
    target: revenue
    features: [clicks, sessions, region]
  serving:
    endpoint: /api/v1/predict
    drift_detection: true

ai:
  provider: openai
  model: gpt-4o-mini
  retrieval: hybrid   # BM25 + dense
  agents:
    - name: analyst
      tools: [sql, search, python]

server:
  host: 0.0.0.0
  port: 17000
  auth: jwt
  rate_limit: 100/min

observability:
  metrics: prometheus
  tracing: otel
  log_level: info

Three components, one ecosystem

Each component is independently useful. Together they cover the full lifecycle.

v0.9.18

dataenginex

pip install dataenginex

Core framework — config system, backend registry, CLI, API server, ML lifecycle, AI agents. The engine everything runs on.

View on GitHub →
NiceGUI

dex-studio

Port 7860

Web UI — single pane of glass built on NiceGUI. Monitor pipelines, browse data, inspect ML experiments, and chat with AI agents.

View on GitHub →
K3s + Helm

infradex

Terraform + Helm

K3s cluster config, Helm charts, and Terraform modules. From blank VPS to production-grade Kubernetes cluster — no manual YAML.

View on GitHub →

Ready to replace the glue?

Install the base package or pick the extras you need.

Base
pip install dataenginex
# or
uv add dataenginex
Extras
pip install "dataenginex[spark]"    # PySpark transforms
pip install "dataenginex[mlflow]"   # MLflow backend
pip install "dataenginex[agents]"   # LangGraph agents
pip install "dataenginex[all]"      # Everything