
Your First Event

This walkthrough takes you from an empty stack to a queryable event in the lake. It follows the real ingestion path: create a tenant → register and publish a schema → materialize a dataset → collect an event → preview it.

It assumes the stack is running (Quickstart) and the API is at http://localhost:8000.

Every request except tenant creation is tenant-scoped

Resource requests carry an X-Tenant-ID header. The /v1/tenants endpoint is the one exception — it's how you mint the tenant in the first place. Tenant isolation is enforced in Postgres by row-level security; see Multi-Tenancy and RLS.
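Since every later request repeats the same header, a small wrapper can cut down on typing. The sketch below is a convenience only, not part of the product: the function name pulse and the PULSE_API and PULSE_CURL variables are hypothetical, and it assumes TENANT has been exported as in step 1.

```shell
# Hypothetical helper: adds the tenant header to every request.
# usage: pulse METHOD PATH [extra curl arguments...]
pulse() {
  if [ -z "${TENANT:-}" ]; then
    echo "pulse: export TENANT first (see step 1)" >&2
    return 1
  fi
  method=$1 path=$2
  shift 2
  # Remaining arguments (for example -d '{"status": "published"}') pass through to curl.
  # PULSE_CURL is a hypothetical dry-run hook: set it to echo to print the command instead.
  ${PULSE_CURL:-curl} -s -X "$method" "${PULSE_API:-http://localhost:8000}$path" \
    -H "X-Tenant-ID: $TENANT" -H 'Content-Type: application/json' "$@"
}
```

With this in place, the preview call in step 6 could be written as pulse GET "/v1/datasets/orders/preview?limit=10".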

1. Create a tenant

curl -s -X POST http://localhost:8000/v1/tenants \
  -H 'Content-Type: application/json' \
  -d '{"slug": "acme", "name": "Acme Corp"}'
{"id":"019e5897-4cb2-75d2-95af-a3c75845eb44","slug":"acme","name":"Acme Corp","status":"active","metadata":{},"created_at":"...","updated_at":"..."}

Creating a tenant also bootstraps its Iceberg namespace and pre-creates its Kafka topics (pulse.{tenant}.events.raw and pulse.{tenant}.events.dlq). Capture the id from the response:

export TENANT=019e5897-4cb2-75d2-95af-a3c75845eb44
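Copying the id by hand works, but it can also be pulled out of the response in a script. The helper below is a minimal sketch that assumes a flat, single-line JSON response like the one shown above; json_string_field is a hypothetical name, and jq (jq -r .id) is the sturdier choice if it is installed.

```shell
# Hypothetical helper: print the first string value for a given key.
# Assumes single-line JSON on stdin; this is pattern matching, not a JSON parser.
json_string_field() {
  sed -n 's/.*"'"$1"'":[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Example (not run here): capture the tenant id straight from the create call.
# export TENANT=$(curl -s -X POST http://localhost:8000/v1/tenants \
#   -H 'Content-Type: application/json' \
#   -d '{"slug": "acme", "name": "Acme Corp"}' | json_string_field id)
```

The same helper can capture the schema id in step 2, provided that response has the same flat shape.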

2. Register a schema

Schemas compose a class (here, the seeded system class interaction-event) with optional field groups. The definition is a JSON Schema (Draft 2020-12) describing the event's shape.

curl -s -X POST http://localhost:8000/v1/schemas \
  -H "X-Tenant-ID: $TENANT" -H 'Content-Type: application/json' \
  -d '{
    "uri": "https://ns.azothedge.com/pdm/schema/orders-v1",
    "title": "Orders",
    "class_uri": "https://ns.azothedge.com/pdm/class/interaction-event",
    "field_group_uris": [],
    "definition": {
      "type": "object",
      "properties": {
        "order_id": {"type": "string"},
        "amount": {"type": "number"},
        "timestamp": {"type": "string", "format": "date-time"}
      },
      "required": ["order_id", "timestamp"]
    }
  }'

A new schema starts in draft. Capture its id from the response as SCHEMA_ID.

Schema URIs are globally unique

uri is unique across the whole deployment, not per tenant. Re-running this request with the same URI returns 409 Conflict, even under a different tenant, so use a fresh URI when experimenting.
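One way to avoid the 409 while experimenting is to generate a throwaway URI per run. This is only a sketch: fresh_schema_uri is a hypothetical helper, and it assumes the registry accepts any URI under the same prefix used in the example above.

```shell
# Hypothetical helper: build a schema URI with a per-run suffix.
# The suffix is epoch seconds plus the shell's PID, so two calls made in the
# same second from the same shell produce the same URI.
fresh_schema_uri() {
  printf 'https://ns.azothedge.com/pdm/schema/%s-%s-%s\n' "$1" "$(date +%s)" "$$"
}
```

For example, export SCHEMA_URI=$(fresh_schema_uri orders) once, then reuse $SCHEMA_URI in the uri, schema_uri, and schema_id fields of the requests in this walkthrough so that all steps refer to the same schema.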

3. Publish the schema

A dataset can only be built from a published schema. Publishing also freezes the definition (it becomes immutable).

curl -s -X PATCH http://localhost:8000/v1/schemas/$SCHEMA_ID \
  -H "X-Tenant-ID: $TENANT" -H 'Content-Type: application/json' \
  -d '{"status": "published"}'

4. Create a dataset

A dataset materializes a per-tenant Iceberg table (partitioned by day(timestamp)) on object storage, bound to the schema by its URI.

curl -s -X POST http://localhost:8000/v1/datasets \
  -H "X-Tenant-ID: $TENANT" -H 'Content-Type: application/json' \
  -d '{
    "slug": "orders",
    "name": "Orders",
    "schema_uri": "https://ns.azothedge.com/pdm/schema/orders-v1"
  }'
{"slug":"orders","name":"Orders","schema_uri":"https://ns.azothedge.com/pdm/schema/orders-v1","iceberg_namespace":"pulse_019e58974cb275d295afa3c75845eb44","iceberg_table":"orders","status":"active",...}
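In the sample response, iceberg_namespace looks like the tenant id with its hyphens removed and a pulse_ prefix. That pattern is inferred from this one example rather than documented, so treat the sketch below as a guess that is handy for locating the table, and prefer the value the API returns.

```shell
# Hypothetical helper: reproduce the namespace naming seen in the sample response.
# Assumption: namespace = "pulse_" + tenant id without hyphens.
iceberg_namespace() {
  printf 'pulse_%s\n' "$(printf '%s' "$1" | tr -d '-')"
}
```

Running iceberg_namespace "$TENANT" with the tenant id from step 1 yields the namespace shown in the response above.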

5. Collect an event

POST /v1/collect validates the payload against the schema, publishes it to the tenant's raw-events topic, and returns 202 Accepted immediately — ingestion is asynchronous.

curl -s -X POST http://localhost:8000/v1/collect \
  -H "X-Tenant-ID: $TENANT" -H 'Content-Type: application/json' \
  -d '{
    "schema_id": "https://ns.azothedge.com/pdm/schema/orders-v1",
    "timestamp": "2026-05-22T14:30:00Z",
    "payload": {"order_id": "o-1001", "amount": 42.5, "timestamp": "2026-05-22T14:30:00Z"}
  }'
{"event_id":"019e5897-7b35-7ad2-8e26-51bb5f0f3783","status":"accepted"}

A payload that fails PDM validation (or references a schema this tenant can't see) is rejected with 422; malformed or unresolvable events that reach the worker are routed to the tenant's dead-letter topic.
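In a script, it helps to branch on the HTTP status rather than parse the body. The sketch below maps the two status codes described above to an outcome; collect_outcome is a hypothetical name, and any other status is simply reported as unexpected.

```shell
# Hypothetical helper: interpret the status code returned by POST /v1/collect.
collect_outcome() {
  case "$1" in
    202) echo "accepted" ;;   # queued; ingestion continues asynchronously
    422) echo "rejected" ;;   # failed validation, or schema not visible to this tenant
    *)   echo "unexpected" ;; # anything else is outside what this page describes
  esac
}

# Example (not run here): capture only the status code with curl's -w option.
# status=$(curl -s -o /dev/null -w '%{http_code}' -X POST http://localhost:8000/v1/collect \
#   -H "X-Tenant-ID: $TENANT" -H 'Content-Type: application/json' -d @event.json)
# collect_outcome "$status"
```

Here event.json is a placeholder file holding the request body from the example above. Note that a 202 only means the event was queued; events that later fail in the worker end up on the dead-letter topic.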

6. Preview the lake

The worker micro-batches events and flushes to Iceberg every 30 seconds (or 10,000 rows, whichever comes first). After a flush, preview the dataset — the API resolves it under your tenant's RLS scope and queries the Iceberg table with DuckDB:

sleep 35
curl -s "http://localhost:8000/v1/datasets/orders/preview?limit=10" \
  -H "X-Tenant-ID: $TENANT"

You'll see your event materialized as a row. Because ingestion is exactly-once, posting the same event_id twice still yields exactly one row.
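The fixed sleep 35 is the simplest way to wait for a flush. A polling loop is an alternative that returns as soon as the row appears. The sketch below uses two hypothetical helpers, retry_until and order_visible, and assumes the preview response body contains the order_id value as plain text.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# usage: retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Assumption: the event's order_id appears somewhere in the preview response.
order_visible() {
  curl -s "http://localhost:8000/v1/datasets/orders/preview?limit=10" \
    -H "X-Tenant-ID: $TENANT" | grep -q 'o-1001'
}
```

Calling retry_until 8 5 order_visible checks every 5 seconds and gives up after 8 attempts, which covers roughly the same 35 seconds as the fixed sleep.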

What just happened

flowchart LR
    you[curl] -->|POST /v1/collect| api[FastAPI]
    api -->|validate PDM| pg[(Postgres<br/>schema registry + RLS)]
    api -->|publish| kafka[(Redpanda<br/>events.raw)]
    kafka --> worker[Ingestion worker]
    worker -->|micro-batch 30s, exactly-once| iceberg[(Iceberg table<br/>MinIO)]
    api -->|GET /preview via DuckDB| iceberg
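The micro-batch edge in the diagram follows the flush rule from step 6: write when 30 seconds have passed or 10,000 rows are buffered, whichever comes first. The predicate below only illustrates that rule with the thresholds quoted on this page; should_flush is a hypothetical name and this is not the worker's actual code.

```shell
# Illustrative only: the flush rule as a predicate.
# $1 = rows currently buffered, $2 = seconds since the last flush
should_flush() {
  [ "$1" -ge 10000 ] || [ "$2" -ge 30 ]
}

# A single event posted by hand never reaches the row limit, so the
# 30-second timer is what triggers the flush in this walkthrough.
```

This is why step 6 waits slightly longer than 30 seconds before previewing.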

Next steps