PDM Schemas

Pulse models customer data with PDM — its composable schema system. Rather than authoring one monolithic schema per event, you assemble reusable building blocks (classes, field groups, and datatypes) into validatable schemas.

Once a schema is assembled, you publish it. The registry is multi-tenant: every entity is owned by a tenant and is invisible to other tenants (see Multi-Tenancy and RLS).

The building blocks

| Entity | What it is |
| --- | --- |
| Class | Defines the behavioral base of a schema: its identity and core structure. Pulse seeds two: person-profile and interaction-event. |
| Field group | A reusable set of fields that can be mixed into any schema whose class it targets. |
| Datatype | A reusable, named object type referenced by fields in multiple field groups. |
| Schema | A class + zero or more field groups, composed into a concrete, validatable shape. |
| Descriptor | Metadata about a schema: identity fields, relationships, labels. |

How composition works

A schema's definition is a JSON Schema (Draft 2020-12). Composition is expressed by URI references, not foreign keys:

  • $class — the URI of the class the schema extends.
  • $fieldGroups — the URIs of the field groups mixed in.

When you create a schema you supply class_uri and field_group_uris (or embed $class / $fieldGroups in the definition). The registry resolves those references — within your tenant's visibility — and rejects a schema that points at something you can't see.
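The resolution step described above can be sketched roughly as follows. This is an illustrative in-memory model, not the registry's implementation: the `Registry` class, the helper names, and the `urn:example:...` URIs are all hypothetical, and the rule that top-level `class_uri` / `field_group_uris` take precedence over the embedded `$class` / `$fieldGroups` is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Registry:
    """Toy in-memory registry mapping entity URI -> owning tenant (hypothetical)."""
    owners: Dict[str, str] = field(default_factory=dict)

    def visible(self, uri: str, tenant: str) -> bool:
        # Assumption: an entity is visible only to the tenant that owns it.
        return self.owners.get(uri) == tenant


def composition_refs(body: dict) -> List[str]:
    """Collect the class and field-group URIs from a create-schema body.

    Assumption: top-level class_uri / field_group_uris win; otherwise fall
    back to $class / $fieldGroups embedded in the definition.
    """
    definition = body.get("definition", {})
    class_uri = body.get("class_uri") or definition.get("$class")
    group_uris = body.get("field_group_uris") or definition.get("$fieldGroups", [])
    if not class_uri:
        raise ValueError("a schema must reference a class")
    return [class_uri, *group_uris]


def unresolved_refs(registry: Registry, tenant: str, body: dict) -> List[str]:
    """Return the referenced URIs this tenant cannot see (empty means accept)."""
    return [u for u in composition_refs(body) if not registry.visible(u, tenant)]


# Hypothetical URIs, for illustration only.
registry = Registry({
    "urn:example:class:interaction-event": "tenant-a",
    "urn:example:fieldgroup:web-details": "tenant-a",
    "urn:example:fieldgroup:loyalty": "tenant-b",
})
body = {
    "class_uri": "urn:example:class:interaction-event",
    "field_group_uris": [
        "urn:example:fieldgroup:web-details",
        "urn:example:fieldgroup:loyalty",
    ],
}
print(unresolved_refs(registry, "tenant-a", body))
# → ['urn:example:fieldgroup:loyalty']
```

In this sketch, a non-empty result means the schema would be rejected, because `tenant-a` references a field group owned by `tenant-b`.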

The definition lifecycle

Every registry entity (class, field group, datatype, schema, descriptor) moves through a one-way lifecycle:

flowchart LR
    draft --> published --> archived
    draft --> archived

  • A definition is mutable only while draft. Patching the definition of a published entity returns 422.
  • Transitions are forward-only. published → draft returns 422.
  • Delete is soft. DELETE sets status = archived; the row and any backing Iceberg table are preserved.
  • Archiving is reference-checked. Archiving a class or field group that a schema still uses — or a schema a dataset is bound to — returns 409 with a referenced_by list.
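The lifecycle rules above amount to a small forward-only state machine. The sketch below models them in Python; the function names are hypothetical, the `200` success code is an assumption (the text only specifies the 422 and 409 cases), and treating a same-state request as invalid is also an assumption.

```python
from enum import Enum
from typing import Sequence


class Status(str, Enum):
    DRAFT = "draft"
    PUBLISHED = "published"
    ARCHIVED = "archived"


# Forward-only edges, matching the flowchart: draft -> published -> archived,
# plus the draft -> archived shortcut.
ALLOWED = {
    Status.DRAFT: {Status.PUBLISHED, Status.ARCHIVED},
    Status.PUBLISHED: {Status.ARCHIVED},
    Status.ARCHIVED: set(),
}


def transition_status(current: Status, target: Status,
                      referenced_by: Sequence[str] = ()) -> int:
    """Return the HTTP status a transition request would receive in this sketch."""
    if target not in ALLOWED[current]:
        return 422  # backward (or otherwise disallowed) transition
    if target is Status.ARCHIVED and referenced_by:
        return 409  # something still references this entity
    return 200  # assumed success code


def patch_definition_status(current: Status) -> int:
    """Definitions are mutable only while draft."""
    return 200 if current is Status.DRAFT else 422


print(transition_status(Status.PUBLISHED, Status.DRAFT))
# → 422
print(transition_status(Status.PUBLISHED, Status.ARCHIVED, ["urn:example:schema:page-view"]))
# → 409
```

Keeping the allowed edges in a single table makes the forward-only rule easy to audit: any pair not listed is rejected.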

This lifecycle (on registry definitions) is distinct from a dataset's operational lifecycle (active | paused | archived).

Validation at ingestion

When an event arrives at POST /v1/collect, its payload is validated strictly against the referenced schema's definition. Unknown or malformed payloads are rejected with 422; a schema URI the tenant can't see is treated as not-found. The worker re-validates on the way to the lake as defense in depth.
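As a rough illustration of that flow, the sketch below uses a tiny hand-rolled checker as a stand-in for a real JSON Schema Draft 2020-12 validator. It handles only `required`, `properties`, and simple `type` keywords, and it assumes that "strict" means fields not declared in the schema are rejected. The `collect` function, the schema URI, and the `404` and `202` codes are assumptions; the text only specifies 422 and "treated as not-found".

```python
from typing import Dict, List, Tuple

# Subset of JSON Schema primitive types mapped to Python types.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_strict(payload: dict, definition: dict) -> List[str]:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    props = definition.get("properties", {})
    for name in definition.get("required", []):
        if name not in payload:
            errors.append(f"missing required field: {name}")
    for name, value in payload.items():
        if name not in props:
            errors.append(f"unknown field: {name}")  # assumed strict behavior
            continue
        expected = props[name].get("type")
        py_type = JSON_TYPES.get(expected)
        if py_type is None:
            continue  # keyword not covered by this sketch
        # In Python, bool is a subclass of int; JSON Schema treats them apart.
        bool_as_number = isinstance(value, bool) and expected in ("integer", "number")
        if bool_as_number or not isinstance(value, py_type):
            errors.append(f"field {name} is not of type {expected}")
    return errors


def collect(tenant: str, schema_uri: str, payload: dict,
            schemas: Dict[Tuple[str, str], dict]) -> Tuple[int, List[str]]:
    """Look up the schema within the tenant's visibility, then validate."""
    definition = schemas.get((tenant, schema_uri))
    if definition is None:
        return 404, ["schema not found"]  # invisible schemas look missing
    errors = validate_strict(payload, definition)
    return (422, errors) if errors else (202, [])


# Hypothetical schema, keyed by (tenant, URI).
PAGE_VIEW = "urn:example:schema:page-view"
schemas = {
    ("tenant-a", PAGE_VIEW): {
        "type": "object",
        "required": ["event_id"],
        "properties": {
            "event_id": {"type": "string"},
            "duration_ms": {"type": "integer"},
        },
    }
}
print(collect("tenant-a", PAGE_VIEW, {"event_id": "e1", "extra": 1}, schemas))
# → (422, ['unknown field: extra'])
```

A worker that re-validates before writing to the lake could call the same `validate_strict` function, so both checks apply identical rules.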

See also