Observability

One submission corresponds to one distributed trace across every service it touches. Every state-changing and auth event is captured for audit.

Tracing

  • Every service runs the OpenTelemetry SDK and exports to Cloud Trace, Cloud Logging, and Cloud Monitoring.
  • Trace context propagates through Pub/Sub messages and Temporal activities via a custom propagator.
  • The trace ID is logged on every structured log line, so operators can jump from a log entry straight to the full trace.

flowchart LR
  SPA[SPA] -->|trace_id=X| GW[Gateway]
  GW -->|trace_id=X| SUB[submission-svc]
  SUB -->|trace_id=X<br/>via Pub/Sub| WF[workflow-svc]
  WF -->|trace_id=X<br/>via Temporal activity| AI[ai-svc]
  AI -->|trace_id=X| SIGN[signing-svc]

  SUB --> CT[(Cloud Trace)]
  WF --> CT
  AI --> CT
  SIGN --> CT
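
The hop from publisher to subscriber can be sketched with the standard library alone. This is a minimal illustration, assuming the custom propagator carries a W3C `traceparent` value in the Pub/Sub message attributes; the actual carrier format is not stated here. The function names `inject`, `extract`, and `log_line` and the project ID `example-project` are hypothetical, and a real service would use the OpenTelemetry propagation API rather than hand-rolled parsing.

```python
import json
import re
import secrets

# W3C Trace Context header: version-traceid-spanid-flags, all lowercase hex.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_trace_id() -> str:
    """Return a random 128-bit trace ID as 32 hex characters."""
    return secrets.token_hex(16)


def new_span_id() -> str:
    """Return a random 64-bit span ID as 16 hex characters."""
    return secrets.token_hex(8)


def inject(attributes: dict, trace_id: str, span_id: str, sampled: bool = True) -> dict:
    """Copy the message attributes and add a traceparent entry (publisher side)."""
    flags = "01" if sampled else "00"
    out = dict(attributes)
    out["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"
    return out


def extract(attributes: dict):
    """Return (trace_id, parent_span_id), or None if the entry is absent or malformed."""
    match = _TRACEPARENT.match(attributes.get("traceparent", ""))
    if match is None:
        return None
    return match.group(1), match.group(2)


def log_line(message: str, trace_id: str, project_id: str = "example-project") -> str:
    """Build a JSON log line; Cloud Logging reads this field to link the entry to its trace."""
    return json.dumps({
        "message": message,
        "logging.googleapis.com/trace": f"projects/{project_id}/traces/{trace_id}",
    })


# Publisher (for example submission-svc) stamps the outgoing message attributes.
trace_id = new_trace_id()
attrs = inject({"submission_id": "sub-123"}, trace_id, new_span_id())

# Subscriber (for example workflow-svc) continues the same trace.
received_trace_id, parent_span_id = extract(attrs)
print(log_line("workflow started", received_trace_id))
```

The subscriber starts its own span with `parent_span_id` as the parent, which is what keeps one submission on one trace across asynchronous hops.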

SLOs

| SLO | Target |
| --- | --- |
| Submission intake availability | 99.9% |
| AI compliance report latency | p95 < 5 min |
| BIM tile first-byte latency | p95 < 2 s |
| SLA reminder delivery success | 99.5% |

Error budgets and burn-rate alerts are wired to on-call paging.
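
As a worked example of the arithmetic behind an error budget and a burn-rate alert, the sketch below assumes a 30-day rolling window and a two-window paging rule with a threshold of 14.4 (the rate at which 2% of a 30-day budget is spent in one hour). The window length, the threshold, and the function names are assumptions for illustration, not the configured alert policy.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability in the window for an availability target."""
    return (1.0 - slo_target) * window_days * 24 * 60


def burn_rate(observed_error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' the budget is being spent."""
    return observed_error_ratio / (1.0 - slo_target)


def should_page(short_window_ratio: float, long_window_ratio: float,
                slo_target: float, threshold: float = 14.4) -> bool:
    """Page only when both the short and the long window exceed the threshold."""
    return (burn_rate(short_window_ratio, slo_target) >= threshold
            and burn_rate(long_window_ratio, slo_target) >= threshold)


budget = error_budget_minutes(0.999)
print(f"99.9% over 30 days allows {budget:.1f} minutes of unavailability")

# 2% errors in the short window and 1.6% in the long window: both burn faster than 14.4x.
print(should_page(0.02, 0.016, 0.999))
```

For the 99.9% intake target this gives a budget of about 43.2 minutes per 30 days. Requiring both windows to exceed the threshold keeps a brief spike from paging on-call.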

Audit stream

  • Every state-changing domain event and every auth event is published to Pub/Sub.
  • A BigQuery sink subscribes to those topics. The dataset is append-only, retained 10 years.
  • Auditors read a purpose-built BigQuery dataset and a Looker Studio workspace. They never touch service databases directly.
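
A possible shape for one audit event, as it might be serialised before publishing, is sketched below. Every field name in `AuditEvent` is hypothetical, since the real schema is not given here, and the publish call itself is omitted.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditEvent:
    # All field names are illustrative assumptions, not the actual schema.
    event_type: str                # e.g. "submission.state_changed" or "auth.login"
    actor_id: str
    submission_id: Optional[str]   # None for events not tied to a submission
    trace_id: str
    payload: dict
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_message(self) -> bytes:
        """Serialise with sorted keys so the same event always yields the same bytes."""
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":")).encode("utf-8")


event = AuditEvent(
    event_type="submission.state_changed",
    actor_id="user-42",
    submission_id="sub-123",
    trace_id="0" * 32,
    payload={"from": "received", "to": "under_review"},
)
data = event.to_message()  # bytes that would be handed to a Pub/Sub publisher
print(data.decode("utf-8"))
```

Carrying the trace ID in the event lets an auditor move from an audit row to the trace of the request that caused it.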

Tamper evidence

A daily hash-chain digest of the audit stream is signed by Cloud KMS and published to a public transparency log endpoint. Any retroactive tampering with audit rows would break the chain and be externally detectable.

flowchart LR
  EV[Audit events<br/>Pub/Sub] --> BQ[(BigQuery<br/>append-only)]
  BQ -->|nightly| HASH[Hash chain digest]
  HASH --> KMS[KMS sign]
  KMS --> TLOG[Public transparency log]
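
The chaining step can be sketched as follows. This is a minimal illustration, assuming SHA-256, a canonical JSON encoding of each row, and a deterministic row order within each day; the real digest format is not specified here. The Cloud KMS signature and the publication step are left out, and `GENESIS` and all function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the first link


def day_digest(rows: list) -> str:
    """SHA-256 over one day's audit rows in a canonical JSON encoding."""
    h = hashlib.sha256()
    for row in rows:
        h.update(json.dumps(row, sort_keys=True, separators=(",", ":")).encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def chain_link(previous_link: str, rows: list) -> str:
    """Bind today's digest to the previous link, so history cannot change silently."""
    return hashlib.sha256((previous_link + day_digest(rows)).encode("ascii")).hexdigest()


def build_chain(days: list) -> list:
    """Return one link per day, each depending on all earlier days."""
    links, previous = [], GENESIS
    for rows in days:
        previous = chain_link(previous, rows)
        links.append(previous)
    return links


def first_mismatch(days: list, published_links: list):
    """Recompute the chain and return the index of the first day that no longer matches."""
    for index, (expected, actual) in enumerate(zip(published_links, build_chain(days))):
        if expected != actual:
            return index
    return None


days = [
    [{"event_id": "a", "type": "auth.login"}],
    [{"event_id": "b", "type": "submission.created"}],
    [{"event_id": "c", "type": "submission.approved"}],
]
published = build_chain(days)  # the values that would be signed and published

days[1][0]["type"] = "submission.rejected"  # retroactive edit to the second day
print(first_mismatch(days, published))      # prints 1
```

Because each link includes the previous one, changing any past row alters every later link, and anyone holding the published values can detect the change without access to the service.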

PDPA subject requests

Export and erasure are keyed by submission_id or user_id. Erasure tombstones and redacts the record rather than physically deleting it, so the audit chain stays intact.
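
A tombstone-and-redact erasure on a single record might look like the sketch below. The list of personal-data fields, the `erased_at` marker, and the `[redacted]` placeholder are all assumptions for illustration; how redaction interacts with the stored audit rows is not covered here.

```python
from datetime import datetime, timezone

REDACTED = "[redacted]"
PII_FIELDS = ("name", "email", "phone")  # hypothetical list of personal-data fields


def erase_subject(record: dict, now=None) -> dict:
    """Return a copy with personal fields overwritten and a tombstone timestamp set.

    Identifiers such as submission_id and user_id are kept so that references
    to the record elsewhere still resolve.
    """
    now = now or datetime.now(timezone.utc)
    erased = dict(record)
    for key in PII_FIELDS:
        if key in erased:
            erased[key] = REDACTED
    erased["erased_at"] = now.isoformat()
    return erased


record = {
    "user_id": "user-42",
    "submission_id": "sub-123",
    "name": "Example Person",
    "email": "person@example.com",
}
print(erase_subject(record))
```

The row and its identifiers survive, so anything that references the record still resolves, while the personal fields no longer hold recoverable values.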

Runbook hooks

Every SLO alert links to a runbook (maintained in docs/runbooks/, an M7 deliverable). An alert without a runbook link is treated as a bug and must be fixed before the next release.
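
One way to enforce this rule is a check that runs in CI, sketched below. It assumes alert definitions can be loaded as dictionaries with a `name` and an optional `runbook` path; that structure, the alert names, and the function name are hypothetical.

```python
RUNBOOK_PREFIX = "docs/runbooks/"


def alerts_missing_runbook(alerts: list) -> list:
    """Return the names of alerts whose runbook link is absent or outside docs/runbooks/."""
    missing = []
    for alert in alerts:
        link = alert.get("runbook", "")
        if not link.startswith(RUNBOOK_PREFIX):
            missing.append(alert["name"])
    return missing


alerts = [
    {"name": "intake-availability-burn", "runbook": "docs/runbooks/intake-availability.md"},
    {"name": "bim-tile-latency-burn"},  # no runbook: should be reported
]
offenders = alerts_missing_runbook(alerts)
print(offenders)
```

Failing the build when the returned list is non-empty would surface a missing runbook before release rather than during an incident.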