System Overview
Azothedge Pulse is two cooperating planes over a shared, open data core. The control plane (FastAPI) handles synchronous request/response — tenants, the schema registry, dataset creation, and event acceptance. The data plane (the async worker) moves accepted events from Kafka into the Iceberg lake. Postgres is the system of record; Redpanda is the event bus; Iceberg-on-object-storage is the lake.
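The control-plane side of that split can be sketched without any framework. This is a minimal illustration, not the actual FastAPI handler: `InMemoryBus`, `raw_topic`, the `<tenant_id>.raw` naming, the `required_fields` check (a simplified stand-in for PDM validation against the schema registry), and the `422` rejection status are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryBus:
    """Stand-in for the Redpanda producer: each topic is a list of events."""
    topics: dict[str, list[dict]] = field(default_factory=dict)

    def publish(self, topic: str, event: dict) -> None:
        self.topics.setdefault(topic, []).append(event)


def raw_topic(tenant_id: str) -> str:
    # Hypothetical naming scheme; the source only says topics are per-tenant.
    return f"{tenant_id}.raw"


def collect(bus: InMemoryBus, tenant_id: str, event: dict,
            required_fields: set[str]) -> int:
    """Validate the event, publish it, and return an HTTP-style status code."""
    missing = [name for name in required_fields if name not in event]
    if missing:
        return 422  # assumed rejection status, not confirmed by the source
    bus.publish(raw_topic(tenant_id), event)
    # Accepted: the event is on the bus, but not yet in the lake.
    return 202
```

The point of the sketch is the ordering: the caller gets `202` only once the event has been handed to the bus, and everything after that belongs to the worker.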
Month 1 architecture
flowchart LR
Web[Web/Mobile SDK] -->|POST /v1/collect| API[FastAPI control plane]
Batch[Batch upload] -->|POST /v1/datasets/.../batch| API
API -->|validate PDM| Schema[(Schema Registry<br/>Postgres)]
API -->|publish| Kafka[(Redpanda<br/>per-tenant topics)]
Kafka --> Worker[Ingestion worker]
Worker -->|micro-batch 30s| Iceberg[(Iceberg tables<br/>MinIO)]
API -->|preview| DuckDB[DuckDB]
DuckDB --> Iceberg
API -.->|metrics/traces| Otel[OpenTelemetry]
Worker -.->|metrics| Otel
Otel --> Grafana[Grafana + Prometheus + Loki + Tempo]

Components
| Component | Role | Technology |
|---|---|---|
| Control plane | REST API: tenants, schema registry, datasets, collect, preview | FastAPI · Pydantic v2 · SQLAlchemy 2.0 async |
| Schema registry / system of record | Tenants, PDM entities, datasets, audit, dedup ledger | PostgreSQL 17 (with RLS) |
| Event bus | Per-tenant raw + dead-letter topics | Redpanda (Kafka API) |
| Ingestion worker | Kafka → Iceberg, exactly-once, micro-batched | Python async · aiokafka · pyiceberg |
| Data lake | Per-tenant Iceberg tables | Apache Iceberg on MinIO / S3 |
| Query | Dataset preview over the lake | DuckDB (iceberg_scan) |
| Observability | Metrics, logs, traces | Prometheus · Loki · Tempo · Grafana · OpenTelemetry |
| Admin portal | Tenant + schema + dataset UIs | Next.js 15 |
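As a rough illustration of the Query row, a preview could be a bounded `iceberg_scan` query. The sketch below only builds the SQL text. The `preview_sql` helper, the default limit, and the example table location are hypothetical, and running the statement would require DuckDB's `iceberg` extension to be installed and loaded.

```python
def preview_sql(table_location: str, limit: int = 100) -> str:
    """Build a bounded preview query over one Iceberg table.

    `table_location` is the table's root path in object storage; the layout
    used in the example below is an assumption, not Pulse's actual layout.
    """
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    # Double any single quotes so the path stays inside the SQL string literal.
    escaped = table_location.replace("'", "''")
    return f"SELECT * FROM iceberg_scan('{escaped}') LIMIT {limit}"


# Hypothetical per-tenant table location.
print(preview_sql("s3://lake/acme/events", limit=10))
```

Capping the row count keeps a preview cheap regardless of how large the underlying table has grown.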
Key properties
- Strict isolation. Tenant data is isolated by Postgres row-level security; the runtime connects as a non-superuser role so RLS is actually enforced. The `tenant_id` is also the Kafka topic and Iceberg namespace partition key. See Multi-Tenancy and RLS.
- Asynchronous, lossless ingestion. `collect` returns `202` after publishing to Kafka. The worker commits Kafka offsets only after a successful Iceberg append; a crash mid-batch reprocesses without data loss.
- Exactly-once into the lake. A Postgres `processed_event_ids` gate (claimed per event in its own RLS-bound transaction, never held across the blocking append) ensures a re-delivered event yields exactly one row.
- Observable by default. Both planes export Prometheus metrics and OTLP traces; the worker has a `/healthz` endpoint. See Observability.
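The interaction between commit-after-append and the dedup gate can be simulated with in-memory stand-ins. This is a sketch under stated assumptions, not the worker's implementation: `FakeConsumer`, `FakeLedger`, `FakeLake`, and `process_batch` are hypothetical names, and the only failure modelled is a crash after the append but before the offset commit. A crash between the claim and the append is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class FakeLake:
    """In-memory stand-in for a per-tenant Iceberg table."""
    rows: list[dict] = field(default_factory=list)

    def append(self, batch: list[dict]) -> None:
        self.rows.extend(batch)


@dataclass
class FakeLedger:
    """In-memory stand-in for the processed_event_ids table."""
    claimed: set[tuple[str, str]] = field(default_factory=set)

    def claim(self, tenant_id: str, event_id: str) -> bool:
        """Return True only the first time this (tenant, event) pair is seen."""
        key = (tenant_id, event_id)
        if key in self.claimed:
            return False
        self.claimed.add(key)
        return True


@dataclass
class FakeConsumer:
    """In-memory stand-in for a Kafka consumer with manual offset commits."""
    log: list[dict]
    committed: int = 0

    def poll(self) -> list[dict]:
        # Everything after the last committed offset is (re)delivered.
        return self.log[self.committed:]

    def commit(self, offset: int) -> None:
        self.committed = offset


def process_batch(consumer: FakeConsumer, ledger: FakeLedger, lake: FakeLake,
                  tenant_id: str, crash_before_commit: bool = False) -> int:
    """Append unseen events, then commit the offset; return rows appended."""
    start = consumer.committed
    batch = consumer.poll()
    # Gate: keep only events whose id has not been claimed before.
    fresh = [e for e in batch if ledger.claim(tenant_id, e["event_id"])]
    if fresh:
        lake.append(fresh)
    if crash_before_commit:
        raise RuntimeError("simulated crash after append, before offset commit")
    # The offset moves only after the append has succeeded.
    consumer.commit(start + len(batch))
    return len(fresh)
```

In this model, a first call with `crash_before_commit=True` leaves the rows in the lake and the offset unchanged. A second call sees the same batch again, the ledger rejects every event, nothing is appended twice, and the offset then advances.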
Why these choices
- FastAPI — async-native, OpenAPI-first, Pydantic v2 codegen for SDKs.
- PostgreSQL — native JSONB, RLS, logical replication, pgvector-ready.
- Redpanda — Kafka-protocol-compatible with no JVM/ZooKeeper, a single binary.
- Iceberg — vendor-neutral and multi-engine (Spark/Trino/DuckDB), unlike a proprietary lake.
See also
- Component Map — Pulse's components and capabilities at a glance.
- Data Flow — detailed streaming, batch, and preview paths.
- Quickstart — run all of this locally.