Sectum AI vs Promptfoo
TL;DR. Promptfoo is the broadest open-source LLM red-team and evaluation framework: 50+ vulnerability types, 300,000+ developers, 127 Fortune 500 companies, and, as of March 2026, acquired by OpenAI. Its primary unit of analysis is a prompt. Sectum AI is a multi-tenant infrastructure verifier: its unit is a tenant boundary across the AI stack, and its output is a tamper-evident, control-mapped evidence pack that an auditor can accept. Use Promptfoo in your CI for fast prompt-level red-teaming. Use Sectum AI periodically, and at every Article 17 ticket and audit cycle, for tenant-boundary evidence.
What changed in March 2026
OpenAI announced the acquisition of Promptfoo on March 9, 2026. Co-founders Ian Webster and Michael D’Angelo joined OpenAI. The core CLI remains open source under MIT; the enterprise tier continues. The acquisition is strong validation that LLM red-teaming is now a strategic line item, and it means the leading OSS framework now belongs to the model vendor with the largest market position.
The two products
Promptfoo (promptfoo.dev)
Category: open-source LLM evaluation + red-team / vulnerability scanning framework.
License: MIT (OSS core). Commercial enterprise tier on top.
Footprint: 13.2k GitHub stars, 255 contributors, 300,000+ developers, 127 Fortune 500 companies. Used by OpenAI and Anthropic (named on the GitHub README).
Capability surface:
- 50+ vulnerability types — direct/indirect prompt injection, jailbreaking, PII leakage, tool misuse, toxic content.
- Adaptive red-teaming — generates adversarial inputs targeted at the system under test.
- Multi-round testing — multi-turn dialogue attacks.
- MCP testing — checks tool/server invocations.
- Compliance mapping — outputs map to OWASP, NIST, MITRE ATLAS, EU AI Act.
- Provider-agnostic — OpenAI, Anthropic, Gemini, DeepSeek, local models via Ollama, Hugging Face.
- Pricing: Community (OSS) is free up to 10k probes / month; Enterprise pricing is not public and adds team collaboration plus SOC 2 and ISO 27001 coverage for the SaaS itself.
Sectum AI (sectum.ai)
Category: multi-tenant AI verification. Not a red-team framework. The deliverable is auditor-acceptable evidence that the tenant boundary holds across the AI stack — RFC 3161 timestamped, Sigstore Rekor logged, in-toto attested, control-mapped.
License: Apache 2.0 for the OSS core (substrate, attack catalog, adapters, evidence chain, sectum-ai verify). Hosted Sectum Cloud is commercial. The evidence layer in the OSS produces the same artifacts the hosted product does — by design.
Capability surface:
- 11 implemented attack classes focused on the multi-tenant boundary: organic entity-bleed RAG (the flagship), direct tenant-boundary fetch, semantic-cache contamination, KV-cache timing side channel, embedding inversion, agent tool-call hijacking (MCP confused-deputy + token passthrough), persistent memory contamination, LoRA / adapter cross-tenant influence, IKEA-style benign extraction, RAG poisoning, and GDPR Article 17 erasure verification.
- A marker substrate — synthetic tenants seeded with cryptographic canary markers, a hashed ground-truth manifest, and a layered detection pipeline (exact → semantic → calibrated judge) with zero false positives on confirmed findings.
- A tamper-evident evidence chain: RFC 3161 TSA + Sigstore Rekor + in-toto, all independently verifiable via sectum-ai verify.
- Per-finding control mappings: every finding carries owasp_llm, atlas[], and nist[] IDs, rendered in the audit-pack PDF.
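The marker-substrate idea can be illustrated with a minimal Python sketch. It assumes markers are derived with an HMAC and covers only the exact-match layer of the detection pipeline; the function names (`make_marker`, `build_manifest`, `manifest_hash`, `exact_match_findings`) and the demo secret are hypothetical and are not taken from the Sectum AI codebase.

```python
import hashlib
import hmac
import json

def make_marker(secret: bytes, tenant_id: str, index: int) -> str:
    """Derive a canary marker that is unique to one synthetic tenant."""
    tag = hmac.new(secret, f"{tenant_id}:{index}".encode(), hashlib.sha256).hexdigest()
    return f"CANARY-{tag[:16]}"

def build_manifest(secret: bytes, tenants: list[str], per_tenant: int) -> dict[str, list[str]]:
    """Ground truth: which markers were seeded into which tenant."""
    return {t: [make_marker(secret, t, i) for i in range(per_tenant)] for t in tenants}

def manifest_hash(manifest: dict[str, list[str]]) -> str:
    """Hash the manifest in canonical form so a report can pin the test conditions."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def exact_match_findings(manifest: dict[str, list[str]], querying_tenant: str,
                         response: str) -> list[tuple[str, str]]:
    """Return (owner_tenant, marker) pairs for foreign markers found in a response."""
    return [
        (owner, marker)
        for owner, markers in manifest.items()
        if owner != querying_tenant
        for marker in markers
        if marker in response
    ]

secret = b"demo-secret"  # hypothetical; a real run would use a random key
manifest = build_manifest(secret, ["tenant-a", "tenant-b"], per_tenant=2)
leaked = manifest["tenant-a"][0]

# tenant-b asks a question and the answer contains a marker seeded only into tenant-a.
findings = exact_match_findings(manifest, "tenant-b", f"Summary: {leaked} appears in the invoice.")
clean = exact_match_findings(manifest, "tenant-b", "Summary: nothing unusual.")
print(findings, clean, manifest_hash(manifest)[:12])
```

A hit on a high-entropy marker that was only ever seeded into another tenant is very unlikely to be coincidental, which is the intuition behind treating exact-match findings as confirmed. The semantic and judge layers would sit behind this check and are not sketched here.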
The categorical difference
| | Promptfoo | Sectum AI |
|---|---|---|
| Unit of analysis | A prompt or model output | A tenant boundary across the AI stack |
| Primary surface | The LLM endpoint (input/output) | 13 surfaces: API, vector DB, RAG, prompt/completion logs, semantic cache, KV cache, agent memory, MCP, fine-tunes / adapters, eval sets, backups, search indexes, tracing |
| Method | Adversarial input generation + judge scoring | Synthetic-tenant marker substrate + manifest-grounded layered detection |
| Determinism | Judge-based (probabilistic) | Manifest-grounded (zero false positives by construction on confirmed findings) |
| Output | Pass/fail per check, framework-mapped report | Tamper-evident evidence pack: RFC 3161 TSA, Sigstore Rekor inclusion proof, in-toto attestation envelope, control-mapped audit PDF, evidence.json |
| Verification | Re-run Promptfoo, trust the score | sectum-ai verify <pack> — any third party, without Sectum AI |
| Flagship engagement | — | GDPR Article 17 erasure attestation |
| For | Application engineering, ML platform, security | CISOs, DPOs, audit firms |
The most important row is the first one: the unit of analysis. Promptfoo asks “does this prompt trip this vulnerability?” — and it asks that question 50+ different ways with great breadth. Sectum AI asks “can tenant A’s data reach tenant B across this AI infrastructure, and can I prove it?” — and it answers that question with a manifest-traceable, cryptographically-attested evidence pack. Those are different questions, and most AI shops need both.
Where Promptfoo is the right tool
Promptfoo’s leverage is enormous when you need:
- CI-pipeline LLM red-team — every commit, every PR, a Promptfoo run lights up regressions in prompt-level safety / vulnerability behavior.
- Provider-agnostic A/B testing — comparing GPT-4 vs Claude vs Gemini vs DeepSeek on the same prompts and assertions.
- A single broad sweep — 50+ vulnerability types out of the box covering most LLM-app risks.
- OpenAI-validated backing — post-acquisition, Promptfoo is the default OSS choice with substantial corporate support.
If your AI surface is just an LLM endpoint and your security posture needs broad coverage of prompt-level risks, Promptfoo is the right answer. Its OSS license, broad ecosystem, and OpenAI backing make it hard to beat for that job.
Where Sectum AI is the right tool
Sectum AI is built for a different question, and its leverage shows when:
- You operate a multi-tenant AI system and need to prove the tenant boundary holds. Promptfoo’s CrossContextRetrieval-style probes give a red-team signal; Sectum AI produces multi-surface, manifest-traceable, cryptographically-attested evidence an auditor or DPO accepts.
- You’re facing a GDPR Article 17 erasure obligation. Promptfoo doesn’t verify erasure; Sectum AI’s Class 11 does, across all 7 configured erasure surfaces (vector DB, tracing, agent memory, semantic cache, model/fine-tune adapters, search index, eval set), with an attestation pack a DPO can hand to a regulator.
- You’re preparing for SOC 2, ISO 27001, or HIPAA in a multi-tenant SaaS. The auditor’s logical-access / boundary-protection controls (SOC 2 CC6.1 / CC6.6 / CC6.7; ISO 27001 A.5.15 / A.8.3 / A.8.12; HIPAA §164.312(a)(1) / (c)(1) / (e)(1)) need AI-specific evidence. Promptfoo’s report isn’t an attestation; Sectum AI’s evidence pack is.
- You need evidence that anyone can verify without your vendor in the room. sectum-ai verify <pack> validates the chain end-to-end and exits 4 on any tampering; no Sectum AI installation is required, just the OSS verifier.
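The erasure check in the list above can be pictured as a sweep over every configured surface for a marker that should no longer exist. The following Python sketch assumes each surface is exposed through a simple search callable. The surface names come from the text, only four of the seven are shown, and `Surface`, `verify_erasure`, `in_memory_surface`, and the marker value are hypothetical stand-ins rather than Sectum AI's adapters.

```python
from typing import Callable

# A surface is modelled as a function returning every stored record that contains the marker.
Surface = Callable[[str], list[str]]

def verify_erasure(marker: str, surfaces: dict[str, Surface]) -> dict[str, list[str]]:
    """Return the surfaces where the marker still exists; an empty dict means erasure held."""
    residue: dict[str, list[str]] = {}
    for name, search in surfaces.items():
        hits = search(marker)
        if hits:
            residue[name] = hits
    return residue

def in_memory_surface(records: list[str]) -> Surface:
    """Stand-in for a real adapter, backed by a plain list of records."""
    return lambda marker: [r for r in records if marker in r]

marker = "CANARY-3f9a1c"  # hypothetical marker belonging to the erased data subject
surfaces = {
    "vector_db": in_memory_surface([]),
    "tracing": in_memory_surface([f"span input: {marker}"]),  # erasure missed the trace store
    "agent_memory": in_memory_surface([]),
    "semantic_cache": in_memory_surface([]),
}

residue = verify_erasure(marker, surfaces)
print(residue)
```

In this example the sweep reports residue only in the tracing store, which is the kind of per-surface result an erasure attestation would need to record.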
Surface coverage, side by side
| Surface | Promptfoo | Sectum AI |
|---|---|---|
| LLM endpoint (input/output) | ✓ (the primary unit) | ✓ |
| RAG retrieval visible from a prompt | ✓ (with adapters) | ✓ |
| Vector DB direct (cross-tenant integrity) | — | ✓ (Pinecone, pgvector, Weaviate, Chroma live adapters) |
| Semantic cache (cross-tenant key safety) | — | ✓ (Class 4 + live Redis adapter) |
| KV cache (timing side channel) | — | ✓ (Class 5, statistical Cohen’s d effect-size test) |
| Embedding inversion across tenants | — | ✓ (Class 6) |
| Agent tool calls (MCP confused-deputy + token passthrough) | ✓ MCP testing (red-team angle) | ✓ Class 7 (the Asana-class flaw with per-finding evidence) |
| Persistent agent memory across tenants | — | ✓ (Class 8) |
| LoRA / fine-tune cross-tenant influence | — | ✓ (Class 9) |
| Multi-turn benign extraction | ✓ (multi-round attacks) | ✓ (Class 10 — Silent Leaks / IKEA-style) |
| RAG poisoning | ✓ (red-team angle) | ✓ (Class 3) |
| GDPR Article 17 erasure verification | — | ✓ (Class 11 — the Erasure Attestation engagement) |
| Observability backends (Langfuse / LangSmith / Phoenix) | — | ✓ (live observability adapters) |
Promptfoo is broad across LLM-prompt risk; Sectum AI is broad across the tenant boundary on the AI infrastructure. The two breadths run perpendicular.
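The KV-cache row mentions a Cohen's d effect-size test. A minimal sketch of how such a test could flag a timing side channel is shown below, assuming two latency samples: one for prefixes no tenant has used and one for prefixes another tenant may have cached. The sample values and the 0.8 threshold (the conventional "large effect" cutoff) are illustrative assumptions, not Sectum AI's parameters.

```python
import math
import statistics

def cohens_d(a: list[float], b: list[float]) -> float:
    """Effect size between two samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    pooled = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled

# Hypothetical time-to-first-token samples in milliseconds.
control = [120.0, 118.0, 123.0, 121.0, 119.0, 122.0]  # prefixes no tenant has used
probe = [96.0, 99.0, 94.0, 98.0, 97.0, 95.0]          # prefixes another tenant may have cached

d = cohens_d(control, probe)
suspicious = d > 0.8  # illustrative threshold
print(round(d, 2), suspicious)
```

A large positive effect size here means the probe prefixes are served consistently faster than the controls, which would be consistent with cache entries being shared across tenants. An effect size is used instead of a raw difference so the result is expressed relative to measurement noise.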
Evidence model
Promptfoo’s compliance mapping is excellent for a red-team product — OWASP, NIST, MITRE ATLAS, EU AI Act all supported. The output is a structured report with framework references that a security engineer can read and a compliance team can file.
Sectum AI’s evidence model is categorically different:
- The run is canonicalized (deterministic JSON, sorted keys) and hashed (SHA-256).
- The digest is submitted to an RFC 3161 Time-Stamp Authority that returns a token attesting the digest existed at this time.
- The digest + signature is recorded in a Sigstore Rekor transparency log entry with an inclusion proof.
- Everything is wrapped in an in-toto attestation envelope.
- The pack ships with the ground-truth manifest hash, so anyone can independently confirm the test conditions were the ones the pack claims.
sectum-ai verify <pack> recomputes everything and reports VERIFIED or [FAIL] with the failing check named. Mutating a single byte in the JSON makes verify exit 4.
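The first step of the chain and the final check can be sketched in a few lines of Python. The sketch assumes the canonical form is JSON with sorted keys and compact separators, and it omits the RFC 3161, Rekor, and in-toto steps, which need network services and signing keys. The function names, the record layout, and the placeholder control IDs are hypothetical; the exit code 4 mirrors the behavior described in the text, and this is not the sectum-ai verifier.

```python
import hashlib
import json

def canonical_digest(run: dict) -> str:
    """SHA-256 over a deterministic JSON encoding (sorted keys, compact separators)."""
    canonical = json.dumps(run, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def verify_digest(pack_json: str, attested_digest: str) -> int:
    """Recompute a pack's digest and compare it with the attested one.

    Returns 0 on a match and 4 otherwise, mirroring the exit code in the text.
    """
    recomputed = canonical_digest(json.loads(pack_json))
    return 0 if recomputed == attested_digest else 4

run = {
    "manifest_sha256": "placeholder-manifest-hash",
    "findings": [
        {
            "class": 11,
            "surface": "tracing",
            "owasp_llm": "<owasp-id>",  # placeholder control IDs, not real mappings
            "atlas": ["<atlas-id>"],
            "nist": ["<nist-id>"],
        }
    ],
}

attested = canonical_digest(run)  # the value a TSA and transparency log would record
pack_json = json.dumps(run)       # key order on disk does not affect the digest
tampered = pack_json.replace("tracing", "tracinG")  # change a single byte

print(verify_digest(pack_json, attested), verify_digest(tampered, attested))
```

Because this sketch re-canonicalizes before hashing, whitespace or key-order changes alone do not alter the digest, while any change to a value does. Whether the real tool hashes raw bytes or the canonical form is not stated here, so treat that detail as an assumption.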
This is a different shape of artifact. A red-team report says “we ran these probes, here’s what we found.” An evidence pack says “on this date, against this exact test condition, these findings hold, and here’s the cryptographic proof that you can verify yourself.” For an auditor or DPO, that distinction is the entire game.
Using both
In practice, teams that need both kinds of assurance run the two tools side by side:
- Promptfoo in the CI pipeline for fast feedback on prompt-level safety and vulnerability regressions at every PR.
- Sectum AI on a release cadence (and on-demand for customer attestations, audit cycles, and Article 17 tickets) for multi-tenant isolation evidence.
The two are perpendicular: Promptfoo covers prompt-level risk; Sectum AI covers tenant-boundary integrity. Neither replaces the other, and each strengthens the same buyer’s security posture.
Honest positioning
Promptfoo is a great product with strong distribution (OpenAI now backs it). Sectum AI is not in the LLM-red-team category — it focuses on multi-tenant verification with tamper-evident evidence, where Promptfoo doesn’t compete.
For a Promptfoo user looking for cross-tenant retrieval coverage: Promptfoo’s cross-context probes find prompt-level smoke; Sectum AI verifies infrastructure-level fire. Use Promptfoo for the smoke alarm; use Sectum AI for the fire marshal’s report.
Pricing
- Promptfoo (OSS) — MIT, free, 10k probes / month included.
- Promptfoo Enterprise — priced (not public), adds team collaboration, SOC 2, ISO 27001 for the SaaS.
- Open Sectum (OSS) — Apache 2.0, free; the evidence layer is fully open.
- Sectum Cloud — see pricing.
References
- Promptfoo — GitHub, docs, red-teaming guide, pricing, acquisition coverage (PE Collective, March 2026), MIT license.
- Sectum AI — GitHub, docs, attack catalog, evidence chain, sample evidence packs.