Testing

SISS tests against real dependencies. docker-compose brings up Postgres, Redis, Pub/Sub emulator, Temporal, fake GCS, and an OpenTelemetry collector. Mocks are used for external providers only (e.g., KMS emulator for signing, recorded Vertex AI responses for AI evals).

Test pyramid

Unit

  • Tool: pytest + pytest-asyncio per service.
  • Target: ≥ 80% coverage on core domain modules.
  • Frontend: vitest for component units.
  • What it catches: logic regressions, boundary conditions.

Contract

  • Tool: Pact (consumer-driven).
  • What: every inter-service REST call; every Pub/Sub event schema.
  • Where it runs: on every service PR; producer CI fails on consumer-detected breakage.
  • What it catches: shape drift between services and between SPA ↔ services.

Workflow replay

  • Tool: Temporal replay test framework.
  • What: production workflow history is captured and replayed against proposed workflow code changes in CI.
  • What it catches: breaking workflow changes that would strand in-flight submissions.

Integration

  • Stack: full docker-compose — all services + AlloyDB + Pub/Sub emulator + Temporal + fake GCS + KMS emulator.
  • What: end-to-end happy-path tests per milestone.
  • What it catches: event-flow bugs that don't surface in unit tests.

AI evaluation

  • Data: golden set per prompt, recorded Vertex responses.
  • Metrics: parameter extraction F1, verdict accuracy, malformed-output rate. See the AI pipeline for thresholds.
  • Gate: regression run on every prompt or model change; thresholds block deploy.

End-to-end (browser)

  • Tool: Playwright.
  • Flows: PSP portal happy path (new submission wizard, direct-to-GCS upload); officer console (review + sign); xeokit viewer interactions (M4 onward).
  • Stack: runs against docker-compose (not prod-like cloud stubs).

Load

  • Tool: k6.
  • Scenarios: 200 concurrent PSP uploads; 50 concurrent officer dashboard reads; workflow throughput.
  • Gates: the SLOs — if a scenario breaches p95, the build fails.
  • When: before M6; trend-tracked after.
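The parameter-extraction F1 metric used in the AI-eval tier can be computed as a straightforward set comparison. A minimal sketch; the function names and the 0.9 threshold are illustrative, not the ai-svc API (real thresholds live in the AI pipeline docs):

```python
from typing import List, Set, Tuple

def extraction_f1(predicted: Set[Tuple[str, str]], expected: Set[Tuple[str, str]]) -> float:
    """F1 over (field, value) pairs extracted from a single document."""
    if not predicted and not expected:
        return 1.0  # nothing to extract and nothing extracted: perfect score
    true_pos = len(predicted & expected)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(expected)
    return 2 * precision * recall / (precision + recall)

def gate(per_doc_f1: List[float], threshold: float = 0.9) -> bool:
    """Hypothetical deploy gate: mean F1 across the golden set must clear the threshold."""
    return sum(per_doc_f1) / len(per_doc_f1) >= threshold
```

The same shape works for verdict accuracy and malformed-output rate: score each golden-set item against the recorded Vertex response, aggregate, and let the aggregate block deploy.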

The end-to-end M2 sign-off test

One script, one docker-compose stack, the entire platform:

  1. Log in as PSP → submit a realistic fixture submission.
  2. Watch the Temporal UI advance through the workflow stages.
  3. Log in as CMU officer → accept pre-consult.
  4. Observe ai-svc producing a compliance report; finalise decisions.
  5. Log in as ATD / ATL officer → review + sign off.
  6. Log in as CMU Admin → issue SIGL.
  7. Assert: SIGL + Kertas Perakuan PDFs verify; audit chain is continuous; every expected notification was recorded.

This test is the primary M2 gate.
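The "audit chain is continuous" assertion in step 7 amounts to walking a hash chain. A minimal sketch, assuming each audit record carries the hash of its predecessor; the field names (`prev_hash`, `event`) are illustrative, not the SISS audit schema:

```python
import hashlib
import json

def record_hash(record: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) so the hash is stable
    # regardless of how the record was serialised.
    blob = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()

def chain_is_continuous(records: list) -> bool:
    """True if every record's prev_hash matches the hash of the record before it."""
    prev = None  # the first record must have prev_hash == None
    for rec in records:
        if rec.get("prev_hash") != prev:
            return False
        prev = record_hash(rec)
    return True
```

Any insertion, deletion, or reordering between submission and SIGL issuance breaks a link, so one boolean check covers the whole run.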

What doesn't get mocked

  • AlloyDB — always a real Postgres in tests.
  • Pub/Sub — always the GCP emulator.
  • Redis — real container.
  • Temporal — real server (embedded or container).
  • GCS — fake-gcs-server, but with real signed-URL semantics.
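In practice, routing the real client libraries at these containers is mostly environment variables: the Google clients honour `PUBSUB_EMULATOR_HOST` and `STORAGE_EMULATOR_HOST` and skip real credentials when they are set. A sketch assuming the emulators' default ports; the authoritative values live in docker-compose.yml:

```python
import os

# Assumed default ports; check docker-compose.yml for the actual mappings.
EMULATOR_ENV = {
    "PUBSUB_EMULATOR_HOST": "localhost:8085",          # GCP Pub/Sub emulator default
    "STORAGE_EMULATOR_HOST": "http://localhost:4443",  # fake-gcs-server default
}

def apply_emulator_env(env=None):
    """Point google-cloud-pubsub and google-cloud-storage at the local emulators.

    Call this (e.g. from a session-scoped pytest fixture) before any client
    is constructed; with the variables set, the libraries connect without
    real GCP credentials.
    """
    target = os.environ if env is None else env
    target.update(EMULATOR_ENV)
    return target
```

Temporal, Postgres, and Redis need no such indirection: their clients take a host/port and the compose services answer on them directly.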

What does get mocked

  • External cloud APIs for which emulators aren't reliable (fine-grained Vertex AI, production KMS). Replaced with recorded or local equivalents.
  • Email / SMS providers — MailHog in dev; SendGrid / Twilio only wired up in staging onward.
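With MailHog in the loop, "every expected notification was recorded" can be asserted against a real SMTP round-trip rather than a mocked sender. A sketch assuming MailHog's default ports (SMTP on 1025, HTTP API on 8025); the sender address and helper names are illustrative:

```python
import smtplib
from email.message import EmailMessage

MAILHOG_SMTP_HOST = "localhost"
MAILHOG_SMTP_PORT = 1025  # MailHog's default SMTP listener
MAILHOG_API_URL = "http://localhost:8025/api/v2/messages"  # default HTTP API for assertions

def build_notification(to: str, subject: str, body: str) -> EmailMessage:
    msg = EmailMessage()
    msg["From"] = "noreply@siss.example"  # illustrative sender address
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def send_via_mailhog(msg: EmailMessage) -> None:
    # Requires the MailHog container from docker-compose to be running.
    with smtplib.SMTP(MAILHOG_SMTP_HOST, MAILHOG_SMTP_PORT) as smtp:
        smtp.send_message(msg)
```

A test then sends through the service under test and queries the HTTP API to assert the message landed, which exercises the same SMTP path staging will use with SendGrid.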

Why the "no mocks" stance

A mock verifies your assumptions about a dependency, not the dependency's real behaviour, so bugs hide in the gap between the two. SISS's compliance surface (submissions, certificates, audit) is too critical to accept that drift.