Your First Event
This walkthrough takes you from an empty stack to a queryable event in the lake. It follows the real ingestion path: create a tenant → register and publish a schema → materialize a dataset → collect an event → preview it.
It assumes the stack is running (Quickstart) and the API is at http://localhost:8000.
Every request except tenant creation is tenant-scoped
Every resource request must carry an X-Tenant-ID header. The one exception is /v1/tenants, since that endpoint is how you mint the tenant in the first place. Tenant isolation is enforced in Postgres by row-level security (RLS); see Multi-Tenancy and RLS.
1. Create a tenant
curl -s -X POST http://localhost:8000/v1/tenants \
  -H 'Content-Type: application/json' \
  -d '{"slug": "acme", "name": "Acme Corp"}'
{"id":"019e5897-4cb2-75d2-95af-a3c75845eb44","slug":"acme","name":"Acme Corp","status":"active","metadata":{},"created_at":"...","updated_at":"..."}
Creating a tenant also bootstraps its Iceberg namespace and pre-creates its Kafka topics (pulse.{tenant}.events.raw and pulse.{tenant}.events.dlq). Capture the id from the response:
export TENANT=019e5897-4cb2-75d2-95af-a3c75845eb44
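Copying the id by hand is easy to get wrong in a script. The following is a minimal sketch of pulling it out of the response instead. `json_field` is a hypothetical helper, not part of the product, and it assumes compact single-line JSON like the sample above; a real JSON tool such as jq is more robust if you have one installed.

```shell
# Hypothetical helper: print the value of a string field from compact,
# single-line JSON read on stdin. Assumes the value has no escaped quotes.
json_field() {
  sed -n 's/.*"'"$1"'":"\([^"]*\)".*/\1/p'
}

# Sample response copied from step 1 (timestamps elided as in the original).
response='{"id":"019e5897-4cb2-75d2-95af-a3c75845eb44","slug":"acme","name":"Acme Corp","status":"active","metadata":{},"created_at":"...","updated_at":"..."}'

TENANT=$(printf '%s\n' "$response" | json_field id)
export TENANT
echo "$TENANT"
```

Against a running stack, `response` would hold the output of the curl command in this step rather than a pasted sample.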
2. Register a schema
Schemas compose a class (here, the seeded system class interaction-event) with optional field groups. The definition is a JSON Schema (Draft 2020-12) describing the event's shape.
curl -s -X POST http://localhost:8000/v1/schemas \
  -H "X-Tenant-ID: $TENANT" -H 'Content-Type: application/json' \
  -d '{
    "uri": "https://ns.azothedge.com/pdm/schema/orders-v1",
    "title": "Orders",
    "class_uri": "https://ns.azothedge.com/pdm/class/interaction-event",
    "field_group_uris": [],
    "definition": {
      "type": "object",
      "properties": {
        "order_id": {"type": "string"},
        "amount": {"type": "number"},
        "timestamp": {"type": "string", "format": "date-time"}
      },
      "required": ["order_id", "timestamp"]
    }
  }'
A new schema starts in draft status. Capture the id from the response as SCHEMA_ID; step 3 uses it in the request path.
Schema URIs are globally unique
uri is unique across the whole deployment. Re-running this with the same URI returns a 409 Conflict. Because uniqueness spans the whole deployment rather than a single tenant, use a fresh URI when experimenting.
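Since a repeated URI is answered with 409 Conflict, one way to keep an experiment re-runnable is to give each run its own URI. This is a minimal sketch; `fresh_schema_uri` and its suffix scheme are illustrative assumptions, not a naming convention the platform prescribes.

```shell
# Hypothetical helper: build a schema URI that is unlikely to collide across
# runs by appending the current Unix time and the shell's process ID.
fresh_schema_uri() {
  base="https://ns.azothedge.com/pdm/schema"
  printf '%s/%s-%s-%s\n' "$base" "$1" "$(date +%s)" "$$"
}

SCHEMA_URI=$(fresh_schema_uri orders)
echo "$SCHEMA_URI"
```

If you go this route, pass the same `$SCHEMA_URI` wherever the walkthrough uses the orders-v1 URI (uri in step 2, schema_uri in step 4, schema_id in step 5) so the three stay in agreement.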
3. Publish the schema
A dataset can only be built from a published schema. Publishing also freezes the definition (it becomes immutable).
curl -s -X PATCH http://localhost:8000/v1/schemas/$SCHEMA_ID \
  -H "X-Tenant-ID: $TENANT" -H 'Content-Type: application/json' \
  -d '{"status": "published"}'
4. Create a dataset
A dataset materializes a per-tenant Iceberg table (partitioned by day(timestamp)) on object storage, bound to the schema by its URI.
curl -s -X POST http://localhost:8000/v1/datasets \
  -H "X-Tenant-ID: $TENANT" -H 'Content-Type: application/json' \
  -d '{
    "slug": "orders",
    "name": "Orders",
    "schema_uri": "https://ns.azothedge.com/pdm/schema/orders-v1"
  }'
{"slug":"orders","name":"Orders","schema_uri":"https://ns.azothedge.com/pdm/schema/orders-v1","iceberg_namespace":"pulse_019e58974cb275d295afa3c75845eb44","iceberg_table":"orders","status":"active",...}
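In the sample response, iceberg_namespace looks like the tenant id with its hyphens removed and a pulse_ prefix. The sketch below reproduces that derivation, which can be handy when looking for the table in a catalog browser. The rule is inferred from the single example above rather than from a documented contract, and `iceberg_namespace_for` is a hypothetical helper name.

```shell
# Hypothetical helper: derive the namespace name seen in the sample response
# by removing hyphens from the tenant UUID and prefixing "pulse_".
# Assumption: this mirrors the one example on this page, nothing more.
iceberg_namespace_for() {
  printf 'pulse_%s\n' "$(printf '%s\n' "$1" | sed 's/-//g')"
}

iceberg_namespace_for 019e5897-4cb2-75d2-95af-a3c75845eb44
```

Treat the value returned by the API as authoritative; the helper is only a convenience for matching names by eye.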
5. Collect an event
POST /v1/collect validates the payload against the schema, publishes it to the tenant's raw-events topic, and returns 202 Accepted immediately — ingestion is asynchronous.
curl -s -X POST http://localhost:8000/v1/collect \
  -H "X-Tenant-ID: $TENANT" -H 'Content-Type: application/json' \
  -d '{
    "schema_id": "https://ns.azothedge.com/pdm/schema/orders-v1",
    "timestamp": "2026-05-22T14:30:00Z",
    "payload": {"order_id": "o-1001", "amount": 42.5, "timestamp": "2026-05-22T14:30:00Z"}
  }'
{"event_id":"019e5897-7b35-7ad2-8e26-51bb5f0f3783","status":"accepted"}
A payload that fails PDM validation (or references a schema this tenant can't see) is rejected synchronously with 422 Unprocessable Entity. Malformed or unresolvable events that get past the API and reach the worker are routed to the tenant's dead-letter topic instead.
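A script that drives /v1/collect usually needs to branch on the status code rather than read the body. The sketch below maps the two codes described on this page to a label and treats anything else as unexpected. `collect_outcome` and the event.json file are hypothetical; the `-o` and `-w '%{http_code}'` options in the comment are standard curl options.

```shell
# Hypothetical helper: map an HTTP status code from POST /v1/collect to a label.
# Only 202 and 422 are described on this page; anything else is "unexpected".
collect_outcome() {
  case "$1" in
    202) echo "accepted" ;;
    422) echo "rejected" ;;
    *)   echo "unexpected" ;;
  esac
}

# Against a running stack, the code could be captured like this
# (event.json is a hypothetical file holding the request body):
#   code=$(curl -s -o /dev/null -w '%{http_code}' -X POST \
#     http://localhost:8000/v1/collect \
#     -H "X-Tenant-ID: $TENANT" -H 'Content-Type: application/json' \
#     -d @event.json)
#   collect_outcome "$code"
collect_outcome 202
collect_outcome 422
```

Note that "accepted" only means the event was published to the raw-events topic; events that fail later in the worker surface on the dead-letter topic, not in this response.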
6. Preview the lake
The worker micro-batches events and flushes to Iceberg every 30 seconds (or 10,000 rows, whichever comes first). After a flush, preview the dataset — the API resolves it under your tenant's RLS scope and queries the Iceberg table with DuckDB:
sleep 35
curl -s "http://localhost:8000/v1/datasets/orders/preview?limit=10" \
  -H "X-Tenant-ID: $TENANT"
You'll see your event materialized as a row. Because ingestion is exactly-once, an event delivered more than once with the same event_id still yields exactly one row.
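A fixed sleep 35 works for a walkthrough, but a script may prefer to poll until the row shows up. The following is a minimal sketch of that idea. `retry_until` and `preview_has_order` are hypothetical helpers, and because this page does not show what an empty preview looks like, the check simply greps the response body for the order id sent in step 5.

```shell
# Hypothetical helper: run a command up to MAX times, sleeping DELAY seconds
# between attempts; succeed as soon as the command exits 0.
retry_until() {
  max=$1; delay=$2; shift 2
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -lt "$max" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Hypothetical check: succeeds once the preview body mentions our order id.
# Assumption: the preview response contains payload values as plain text.
preview_has_order() {
  curl -s "http://localhost:8000/v1/datasets/orders/preview?limit=10" \
    -H "X-Tenant-ID: $TENANT" | grep -q 'o-1001'
}

# Against a running stack: retry_until 8 5 preview_has_order
```

Eight attempts five seconds apart cover a little more than one 30-second flush interval; adjust both numbers if your worker is configured differently.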
What just happened
flowchart LR
  you[curl] -->|POST /v1/collect| api[FastAPI]
  api -->|validate PDM| pg[(Postgres<br/>schema registry + RLS)]
  api -->|publish| kafka[(Redpanda<br/>events.raw)]
  kafka --> worker[Ingestion worker]
  worker -->|micro-batch 30s, exactly-once| iceberg[(Iceberg table<br/>MinIO)]
  api -->|GET /preview via DuckDB| iceberg
Next steps
- PDM Schemas — the full data model behind step 2.
- Datasets and the Data Lake — what step 4 created.
- API Reference — every endpoint and field.