Catch the prompt or model change that broke isolation.

The published research is uncomfortable: stronger embedding models leak more. The team that swaps MiniLM for mpnet-base to improve recall accidentally raises their Retrieval-Pivot Rate from 62% to 95%. The team that tightens a prompt to be more helpful accidentally surfaces a fine-tune adapter's memorised content. Sectum AI's baseline engine catches it.

Start with the OSS See engagements

How regression baselines work

  1. Save the baseline: sectum-ai baseline --save records the current run's metrics (Retrieval-Pivot Rate per embedding model, per-probe confirmed-finding counts, side-channel effect sizes, and erasure residue per surface) into baseline.json.
  2. Re-run after a change: a CI step runs sectum-ai probe then sectum-ai baseline --compare.
  3. Get the verdict: each metric is compared against the baseline; any metric that moved in the worse direction is flagged REGRESSED. The CI step exits 2 — the same exit code as a fresh confirmed finding — so existing CI gates pick it up.
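
The comparison in step 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Sectum AI's implementation: the metric names, the flat dictionary layout, the tolerance parameter, and the rule that a higher value is always worse are hypothetical, chosen only to show how "moved in the worse direction" maps to a REGRESSED verdict and exit code 2.

```python
import json

# Hypothetical layout: flat metric name -> number, where a higher value is
# worse for every metric (leak rates, finding counts, effect sizes, residue).
# The real baseline.json schema may differ.
REGRESSED_EXIT_CODE = 2  # per the text: same code as a fresh confirmed finding


def compare_to_baseline(baseline: dict, current: dict, tolerance: float = 0.0) -> dict:
    """Return {metric: verdict}, where verdict is 'OK' or 'REGRESSED'."""
    verdicts = {}
    for name, current_value in current.items():
        # A metric absent from the baseline is treated as previously zero,
        # so a newly appearing finding count registers as a regression.
        baseline_value = baseline.get(name, 0.0)
        worse = current_value > baseline_value + tolerance
        verdicts[name] = "REGRESSED" if worse else "OK"
    return verdicts


def exit_code(verdicts: dict) -> int:
    """Map the verdicts to the process exit code a CI gate would see."""
    return REGRESSED_EXIT_CODE if "REGRESSED" in verdicts.values() else 0


# Example using the 62% -> 95% Retrieval-Pivot Rate figures quoted above.
baseline = {"retrieval_pivot_rate": 0.62, "per_probe_findings.class_4": 0}
current = {"retrieval_pivot_rate": 0.95, "per_probe_findings.class_4": 0}
verdicts = compare_to_baseline(baseline, current)
print(json.dumps(verdicts, indent=2))
print("exit code:", exit_code(verdicts))
```

A real comparison would likely need a per-metric direction and a noise tolerance, since effect sizes vary between runs; the single tolerance argument here only hints at that.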

What the baseline catches

Embedding model upgrades

Swapping MiniLM for a stronger model is the canonical regression. Recall goes up; Retrieval-Pivot Rate goes up with it. The baseline catches the trade-off before it ships.

Prompt or system-message rewrites

A more helpful system message can flip a probe from “refused” to “leaked.” The baseline catches the cliff.

Adapter / fine-tune rollouts

A newly deployed LoRA adapter that memorised a canary phrase surfaces it in the next probe run. per_probe_findings for Class 9 jumps; the baseline flags the regression.

Cache-key changes

A semantic-cache refactor that accidentally drops the tenant scope from the key shows up as Class 4 findings appearing where there were none before.
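
To make this failure mode concrete, here is a small sketch of a cache key with and without tenant scope. The function names and key layout are hypothetical and not taken from any particular cache library. A real semantic cache would match on embedding similarity rather than an exact hash, but the scoping problem is the same.

```python
import hashlib


def _normalise(prompt: str) -> str:
    # Stand-in for semantic matching: collapse case and whitespace.
    return " ".join(prompt.lower().split())


def scoped_key(tenant_id: str, prompt: str) -> str:
    """Key includes the tenant, so identical prompts from different tenants never collide."""
    material = f"{tenant_id}\x00{_normalise(prompt)}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def unscoped_key(prompt: str) -> str:
    """The regression: tenant dropped, so any tenant can hit another tenant's cached answer."""
    return hashlib.sha256(_normalise(prompt).encode("utf-8")).hexdigest()


prompt = "Summarise our Q3 contract terms"
# With tenant scope, two tenants asking the same question get different keys.
print(scoped_key("tenant-a", prompt) == scoped_key("tenant-b", prompt))  # False
# Without it, the key depends on the prompt alone, whoever asks.
print(unscoped_key(prompt) == unscoped_key(prompt.upper()))  # True
```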

Wire it into CI

The OSS already supports this end-to-end. A minimal GitHub Actions step:

- name: Sectum AI probe + regression check
  run: |
    uv run sectum-ai probe --workdir .sectum-ai --output json > probe-summary.json
    uv run sectum-ai baseline --workdir .sectum-ai --compare
  env:
    # ... secrets resolved by sectum-ai.yaml at runtime

Exit code 0 = no regression. Exit code 2 = a metric moved in the worse direction; CI fails the build. The probe-summary.json captures the headline metrics for your dashboard.
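
For CI systems other than GitHub Actions, the same gate can be driven from a short Python wrapper. This is a sketch only: the two commands are copied from the step above, while describe_exit, run_gate, and the handling of exit codes other than 0 and 2 are hypothetical additions, since only those two codes are documented here.

```python
import subprocess
import sys

# Commands copied from the CI step above.
PROBE = ["uv", "run", "sectum-ai", "probe", "--workdir", ".sectum-ai", "--output", "json"]
COMPARE = ["uv", "run", "sectum-ai", "baseline", "--workdir", ".sectum-ai", "--compare"]


def describe_exit(code: int) -> str:
    """Translate an exit code into a human-readable verdict."""
    if code == 0:
        return "no regression"
    if code == 2:
        return "regression or confirmed finding"
    # Only 0 and 2 are documented above; anything else is treated as an
    # unexpected failure (an assumption, not documented behaviour).
    return f"unexpected exit code {code}"


def run_gate(summary_path: str = "probe-summary.json") -> int:
    """Run the probe, then the baseline comparison, and return the exit code to propagate."""
    with open(summary_path, "w", encoding="utf-8") as out:
        probe = subprocess.run(PROBE, stdout=out)
    if probe.returncode != 0:
        # Mirror the shell step: a failing probe stops the gate before the comparison.
        print(f"probe: {describe_exit(probe.returncode)}", file=sys.stderr)
        return probe.returncode
    compare = subprocess.run(COMPARE)
    print(f"baseline compare: {describe_exit(compare.returncode)}", file=sys.stderr)
    return compare.returncode
```

Calling sys.exit(run_gate()) from a script passes the tool's exit code straight through, so a gate that already fails on a non-zero exit needs no further changes.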

When to upgrade from the OSS

The OSS gives you the engineering CI loop. Upgrade to a hosted SKU when:

Engagement

The OSS is free under Apache-2.0 and is the CI integration path. For continuous, managed verification, with the baseline maintained across runs in a dashboard, start an engagement for a quote.
