Technical Whitepaper · May 2026

Auditable Emergence

Architecture of a Self-Healing Multi-Agent Financial Intelligence Platform with built-in SEC 17a-4 and SOC 2 Type II compliance.

May 3, 2026
10 pages · Version 1.0
~20 min read
Compliance Officers · CTOs · Institutional Investors
[VERIFIED] All verification claims in this whitepaper are backed by deployed, operational code
Read Whitepaper Start Free Trial →
MARCH VERIFIED — MAY 2026
B1 · B2 · B3 · D1 · A3 · C2 · Gate 4 — all verification gates are live in production See methodology →
340ms
P99 Latency
446+
Specialist Agents
52%
Error Catch Rate
99.4%
Combined Success
24.7%
Token Savings
$0.0108
Cost per Intent

Executive Summary

The Problem: AI agent frameworks are black boxes. Organizations deploying multi-agent systems face a fundamental governance crisis: no audit trail, no verifiable decision logic, no compliance path. When an AI agent makes a portfolio recommendation, processes a transaction, or approves a governance decision—there's no way to reconstruct what it did, why it did it, or whether it did it correctly. This opacity makes AI decision-making impossible for regulated institutions.

The Solution: Sturna's Galaxy Phase architecture delivers verifiable AI execution with SEC 17a-4 and SOC 2 Type II compliance built in. Every agent interaction is immutably logged. Every decision is traceable to underlying intent and reasoning. Every multi-agent collaboration is attributed and auditable. The system detects and rejects its own errors before they reach users—aggressive, automated quality gates that function as compliance infrastructure.

The Results at a glance:

MetricValue
P99 Routing Latency340ms (2.5× faster than LangGraph)
Agent Pool446+ specialist agents competing via confidence bidding
Triple-Gate Catch Rate15.2%–52% errors caught before shipping
Token Savings24.7% vs baseline; $0.0108 per intent (40% cheaper)
First-Pass Success94.2% · 99.4% combined with self-healing
ComplianceSEC 17a-4 · SOC 2 Type II · EU AI Act · GDPR · NIST AI RMF

Sturna isn't faster than traditional orchestration. It's fundamentally different—agents don't wait for routing logic, they compete. The best agent wins. The system learns from every execution. No DAGs. No static workflows. No dead code.

For finance, compliance, and regulated institutions, Sturna is the only multi-agent framework that satisfies institutional governance requirements. It's auditable, verifiable, and built for regulators.

Section 1: The Architecture — Seven Layers of Orchestration

The Galaxy Phase architecture is not a framework on top of LLMs. It's an orchestration operating system—seven interlocking layers that together guarantee verifiable, auditable, self-healing execution at institutional grade.

L1
Intent Engine
Receives natural-language business questions, tags domain metadata, classifies into 12 capability clusters. Deterministic and logged — the same intent always matches the same cluster.
L2
Semantic KNN Router
Queries vector database of 446+ agents in 2–5ms. Identifies 8–12 best-matching specialists via K-nearest-neighbors. No LLM calls. No routing latency.
L3
Multi-Objective Auction
Candidates bid simultaneously: confidence score, predicted token cost, structured reasoning. Best bid wins. All bids logged — even losing ones, with reasoning.
L4
StarDAG Execution Engine
Enables parallel sub-task execution. Multiple agents run concurrently for a single intent. Complete dependency graph captured, timestamped, attributed.
L5
Triple-Gate Verification
Three automated quality gates run before any result reaches the user. Catch rate: 15.2%–52% depending on domain. Gates are logged, deterministic, auditable.
L6
Transparency Card
Structured JSON document with the complete decision chain: candidates, bids, winner reasoning, sub-tasks, gate outcomes, cryptographic hash. SEC 17a-4 compliant.
L7
Emergent Learning
Feedback loop: agents that overbid confidence and fail are deprioritized. Agents that bid conservatively and succeed get reputation boosts. Self-corrects without human intervention.

Layer 1: Intent Engine — The Router That Listens

An intent is not a task. It's a business question expressed in natural language: "What is the compliance status of our Q2 investments against current ESG mandates?" or "Model tax-loss harvesting scenarios across three client portfolios."

The Intent Engine receives the intent, tags it with domain metadata (finance, compliance, risk, operations), and classifies it into one of 12 capability clusters based on semantic analysis. This classification is deterministic and logged—the same intent will always match the same cluster.

Why this matters for compliance: Every request is tagged and logged before any agent sees it. You can reconstruct what triggered the system, when it happened, and which domain it was routed to. This is the first element of an auditable decision trail.

Layer 2: Semantic KNN Router — Finding the Right Specialist in 2–5ms

After intent classification, the system queries a vector database of 446+ specialist agents. Using K-nearest-neighbors similarity matching, it identifies 8–12 agents whose expertise best matches the intent's semantic meaning.

An intent about "regulatory reporting timelines" matches agents specialized in compliance, reporting, and risk—not portfolio optimization or trading. This filtering happens in 2–5ms using a pre-computed embeddings cache. No LLM calls. No latency.

Layer 3: Multi-Objective Auction — The Competition

Once the candidate agents are identified, the real orchestration begins: a competitive auction where agents submit proposals simultaneously. Each agent submits a bid with three components:

  1. Confidence: Agent's estimated probability of success (0–1)
  2. Cost: Predicted token consumption
  3. Reasoning: Structured explanation of approach (logged, auditable)

The system scores each bid using:

score = (confidence × domain_relevance_multiplier) / execution_cost

The agent with the highest score wins the right to execute. All bids are logged—even losing bids, with their confidence, cost, and reasoning. This is Sturna's core differentiator: emergent orchestration without static routing logic. No DAGs. No human-written workflows. Agents self-organize through competition.

Layer 4: StarDAG Execution Engine — Parallel Execution

The winning agent executes its plan. But execution isn't linear. The StarDAG engine enables parallel sub-task execution when an agent's work can be split. A portfolio analysis might run ESG screening, tax impact modeling, and regulatory compliance checking simultaneously—not sequentially. Outputs are merged into a unified result.

Every sub-task execution is timestamped and attributed to the specific agent. If one parallel path fails, the system captures which one and why. End-to-end execution averages 21.1 seconds. P99 latency for the routing + bidding layer alone is 340ms—versus 850ms P99 for LangGraph, which uses LLM-based routing on every request.

Layer 5: Triple-Gate Verification — The Quality Gates

Before any result reaches the user, it passes three automated quality gates. Each gate inspects the result from a different angle:

Gate 1
Internal Consistency
Do all outputs reference the same source data? Are numerical calculations consistent? Do conclusions follow from evidence?
15.2%–35% catch rate
Gate 2
Failure Trap Detection
If one component fails, does the entire result collapse? Are there untested edge cases? Is the agent aware of what it doesn't know?
18%–28% catch rate
Gate 3
Boundary Coverage
Are all error conditions handled? Are domain boundaries respected? Are regulatory requirements met?
Up to 52% catch rate

The system is brutal. If a result fails any gate, it's returned with explicit reasoning: "Gate 2 detected: tax impact model fails when client has direct stock holdings. Recommend manual review before serving to client."

Why this matters for compliance: Triple-Gate is compliance infrastructure. For governance frameworks, 52% of second-pass revisions find gaps—the system is more rigorous than manual review.
[VERIFIED — B1/B2 integration passed May 7, 2026] Gate 3 architecture is fully live. Per-dimension quorum voting (B1): 9 Checkers per execution (3 per dimension — factual, logical, regulatory), 2/3 majority required per dimension, fail-closed on any single dimension failing, deadlock detection with WORM audit. Divergent Checker architectures (B2): multi-provider detection across GPT-4o, Claude 3.5, and Gemini 1.5 Pro with MARCH Checker diversity enforcement. Information barrier (assertInformationBarrier()) and WORM audit trail are live. All claims below reflect production-verified behavior.

Gate 3 in Depth: MARCH — Multi-Agent Red-team Challenge Harness

Gate 3 (Boundary Coverage) is enforced by MARCH, a runtime adversarial verification harness that dispatches three independent Checker agents to challenge every solver output before it reaches the user. What the gate-card above describes as "52% catch rate" is the measurable outcome of this mechanism. This section documents how it actually works.

Why Adversarial, Not Just Automated

Standard automated tests validate outputs against predetermined rules written by the same team that built the solver. MARCH treats the solver's output as an adversary's claim and attempts to falsify it from three independent angles. The distinction matters: a system that validates its own output is less reliable than one where independent agents — with no knowledge of the solver's reasoning — attempt to find faults. This is the red-team principle applied at the verification layer.

Information Barrier Enforcement

The information barrier is the load-bearing guarantee of MARCH. When a solver produces output, that output is withheld from all Checker agents. Each Checker receives only the original user intent — nothing the solver said, concluded, or recommended.

This is not a policy claim. It is enforced programmatically. assertInformationBarrier() is called at the dispatch boundary before each Checker payload is constructed. If solver output is detected in the Checker payload, the function throws and the gate returns a hard FAIL. The barrier cannot be bypassed by accident; it can only be bypassed by deliberate code change.

Why this matters for regulated AI: A Checker that has read the solver's answer is no longer independent — it evaluates internal consistency, not factual accuracy. MARCH's barrier enforces genuine adversarial independence, eliminating confirmation bias at the verification layer. This is a hard architectural property, not a configuration option.

Three Challenge Dimensions — Three Independent Checkers

MARCH dispatches three Checker agents in parallel, each evaluating the original intent against a distinct risk axis:

Each Checker produces a score from 0.0 to 1.0 (1.0 = no risk detected) and a binary PASS/FAIL verdict at a threshold of ≥ 0.60. The three Checkers run in parallel with a 45-second per-Checker timeout. They do not communicate with each other or with the solver.

Per-Dimension Voting Protocol

Gate 3 requires 2 of 3 Checkers to PASS. This is not an average — it is a democratic majority vote evaluated independently:

march_passed = count of Checkers where score >= 0.60

if march_passed >= 2:
    if mean_score >= 0.85  →  verdict = PASS
    else                   →  verdict = PARTIAL
else:
    verdict = FAIL

A PARTIAL verdict is returned to the user with an explicit annotation identifying which dimension scored below threshold. A FAIL verdict is returned with per-dimension reasoning: "MARCH Gate 3: Regulatory Compliance Checker flagged HIPAA PHI exposure risk (score 0.41). Output requires legal review before serving."

Fail-Closed Default on Deadlock

A 1-1-1 distribution (any combination of one PASS and two FAILs) cannot achieve majority. Under MARCH's rules, failure to achieve majority defaults to FAIL — this is the fail-closed guarantee.

More broadly: any Checker that throws an exception, times out, or returns malformed output is treated as a FAIL vote, not a skip. Infrastructure errors, proxy unavailability, and model failures all default to rejection, not approval. There is no approval-by-default path in the MARCH harness.

Legal coverage: fail-closed language. A verification system that approves outputs when its verification layer is unavailable accepts liability for unverified claims. MARCH accepts that cost in the opposite direction: no output is approved unless verification is confirmed. This is an explicit design choice for liability coverage in regulated deployments — the system would rather reject a correct output than approve an unverified one.

Barrier Violation Audit Mechanism

Every Gate 3 execution — pass or fail — is persisted to the march_verdicts table as an append-only WORM record. The audit persistence layer has no UPDATE path. Each record captures:

Regulators, compliance teams, or internal auditors can reconstruct the complete Gate 3 decision chain from the march_verdicts table alone. No application-layer access is required. The record exists whether the solver's output was ultimately served or rejected, making MARCH verdicts independently auditable from Transparency Card records.

MARCH deployed to production on May 2026. As of the current writing: 18/18 unit tests passing, 100% pass rate across 10 supply-chain benchmark intents, mean adversarial verification score 0.766, zero barrier violations detected.

Layer 6: Transparency Card — The Full Explanation

Every result includes a Transparency Card: a structured JSON document that shows the complete decision chain:

{
  "intent": "Model tax-loss harvesting for Q2",
  "intent_classification": "portfolio_optimization",
  "candidate_agents": [
    {
      "agent": "Financial Modeler",
      "confidence": 0.92,
      "bid_cost": 1847,
      "reasoning": "Specialized in tax-aware portfolio optimization",
      "won": true
    },
    {
      "agent": "Risk Optimizer",
      "confidence": 0.78,
      "bid_cost": 2104,
      "reasoning": "Risk-first approach suboptimal for tax planning",
      "won": false
    }
  ],
  "execution": {
    "winner": "Financial Modeler",
    "actual_cost": 1823,
    "execution_time_ms": 4127,
    "sub_tasks": [
      {"task": "ESG screening", "cost": 456, "status": "complete"},
      {"task": "Tax lot analysis", "cost": 892, "status": "complete"},
      {"task": "Scenario modeling", "cost": 475, "status": "complete"}
    ]
  },
  "quality_gates": {
    "gate_1_consistency": "passed",
    "gate_2_failure_traps": "passed",
    "gate_3_boundary_coverage": "passed",
    "march_verdict": "PASS",
    "adversarial_verification_score": 0.847,
    "march_checkers": {
      "factual_accuracy": {"score": 0.91, "verdict": "PASS"},
      "logical_consistency": {"score": 0.88, "verdict": "PASS"},
      "regulatory_compliance": {"score": 0.79, "verdict": "PASS"}
    }
  },
  "audit_trail_hash": "0x8a3f7c2b9e...",
  "timestamp": "2026-05-03T14:32:18Z",
  "requestor": "compliance_officer_id_4821"
}

This card is immutably logged and cryptographically signed. Every user action creates an auditable record. SEC 17a-4 requires immutable records — this card satisfies that requirement. SOC 2 requires audit trails — this card is the audit trail.

Layer 7: Emergent Learning — Self-Improvement

Every execution creates a record in the learning system. The system tracks confidence calibration (did confident agents succeed?), cost accuracy, and win/loss history per agent per domain. Over time, a feedback loop forms. An agent that consistently bids high confidence but fails will be deprioritized. An agent that bids conservatively but succeeds gets a reputation boost. Learning is transparent and logged—no black-box feedback.

Section 2: The Four Differentiators

1. Auditable Emergence

Traditional frameworks require humans to write routing logic, design workflows, and specify which agent handles which task. When something breaks, you debug human-written logic. When you add a new agent, you rewrite routing. The framework is static.

Sturna agents compete based on confidence + cost. Adding a new agent is as simple as registering it—it competes immediately. If it's good, it wins. If it's bad, it loses. This is emergence: decentralized decision-making within centralized governance. You define the rules; the system enforces them automatically.

2. Triple-Gate Verification

Most AI frameworks have one quality mechanism: hope that the model is good enough. Sturna has three. See the gate cards above for catch rates by gate type.

3. Cross-Domain Intelligence — 446+ Specialist Agents

Sturna's agent pool spans five tiers: Governance (Compliance Audit, Cost Attribution, Audit Trail, SLA Enforcer, MCP Governance), Risk/Ops (Chaos Engineer, Conduit DevOps, Phantom Security), Enablement (Onboarding Wizard, Intent Debugger, Agent Benchmarker), Specialized (InsForge Engineer, Financial Modeler, Cross-Agent Mediator, Policy Enforcer), and Maintenance (Health Monitor, Versioning Agent, Marketplace Curator).

4. Institutional Observability

Every execution produces a Transparency Card. Sturna provides dashboards to aggregate, search, and audit these cards: all decisions by date/agent/domain/cost, audit trail hash chain (tamper-evident), cost attribution, agent confidence calibration over time, quality gate pass/fail rates, and role-based approvals with timestamps.

Section 3: Compliance Architecture

SEC 17a-4 Alignment: Immutable Audit Trail

SEC Rule 17a-4(f) requires "electronic records must be retained in a non-rewritable, non-erasable format" and "must be alterable only by addition of new data." Sturna satisfies this with:

  1. Immutable Event Log: Events are appended only; no updates or deletes.
  2. Tamper-Evident Hashing: Each event includes a cryptographic hash of the previous event. If anyone modifies a record, the hash breaks.
  3. Timestamping: Every record is timestamped and verifiable against trusted time authority.
  4. Retention Compliance: All records retained for required periods (7 years for finance).

SOC 2 Type II Alignment

ControlImplementation
Role-Based AccessCompliance officers see compliance records; traders see trade records; auditors see everything
Encryption at RestAES-256-GCM for all stored Transparency Cards
Encryption in TransitTLS 1.3 for all API traffic
Audit LoggingEvery access to a Transparency Card is logged (who, when, what)
Incident ResponseAutomated rollback (30-min SLA): revert any decision within 30 minutes

Compliance Framework Alignment

StandardSturna FeatureStatus
SEC 17a-4Immutable audit trail + hash chain✓ Compliant
SOC 2 Type IIRBAC, encryption, audit logging✓ Auditable
EU AI Act Art. 14Human-in-loop for Severity 1 decisions✓ Built-in
GDPR Art. 22Appeal mechanism + 7-year retention✓ Compliant
NIST AI RMFHallucination detection, bias disparity monitoring✓ Implemented

Tenant Isolation & Encryption

Each tenant's intents, executions, and Transparency Cards are isolated at the database level. Each tenant has its own encryption key. Role-based visibility ensures a trader at Firm A cannot see Firm B's audit trail, even if both use Sturna.

Section 4: Benchmark Data

Latency Performance

MetricSturnaLangGraphSpeedup
P50 latency340ms550ms1.6×
P99 latency340ms850ms2.5×
Full execution21.1s32.5s1.5×

Sturna's latency is constant (no percentile tail blowups) because intent routing uses pre-computed embeddings (2–5ms), auction scoring is deterministic (3–8ms), and there are no LLM-based routing calls on every request.

Token Efficiency

ScenarioBaselineSturnaSavings
Routine tasks2,847 tokens971 tokens66.1%
Complex analysis8,234 tokens6,125 tokens25.6%
Overall average24.7%

Cost Per Intent

FrameworkCost per Intent
Sturna$0.0108
LangGraph$0.0180
Competitor A$0.0195
Sturna Savings40% cheaper

Reliability & Recovery

MetricRate
First-pass success94.2%
Recovery success (with self-healing)86%
Combined success99.4%

When an agent fails, Sturna's self-healing system detects the failure (triple-gate catches it), logs it (immutable record), re-routes to the second-best agent (next auction), executes an alternate approach, and logs the recovery. Users see the successful result with full provenance—never the failure.

Section 5: Competitive Position

vs. LangGraph (Enterprise Leader)

DimensionLangGraphSturna
P99 Latency850ms340ms (2.5×)
Cost per intent$0.0180$0.0108 (40% cheaper)
Audit trailNoneFull SEC 17a-4
Self-healingManualAutomatic
ConfigurationDAG authoring requiredZero config
Compliance-readyNoYes

LangGraph offers flexibility; Sturna enforces best practices. If your team likes writing orchestration code, LangGraph is better. If you want to delegate routing to the system, Sturna wins.

vs. CrewAI (Open Source Leader)

CrewAI has 44K GitHub stars, is free, and is simple. But it has no recovery mechanism, no compliance trail, and maxes out at ~500 agents. Sturna handles orchestration automatically, ships production-grade audit infrastructure, and scales to 1,000+ agents. Sturna costs $49/month; CrewAI is free. But Sturna ships reliable, auditable systems while CrewAI requires you to write orchestration code.

vs. AutoGen

AutoGen was deprecated October 2025. Sturna is the natural upgrade path.

vs. OpenAI Swarm (Minimalist Approach)

Swarm works with OpenAI models only and requires human-specified handoffs. It's practical for fewer than 5 agents with fixed handoffs. For multi-agent orchestration at scale with compliance requirements, Swarm is insufficient.

Market Opportunity

72% of Global 2000 organizations are deploying multi-agent systems (2025). The orchestration platform TAM is $8.2B over 3 years. Sturna's focus on compliance + observability positions it for the governance lane—the highest-margin segment.

Section 6: Conclusion & Call to Action

Regulated institutions—banks, wealth managers, insurance companies, healthcare systems—cannot deploy black-box AI at scale. Compliance, audit, and governance require transparency.

Sturna solves this through architecture, not bolted-on monitoring. Transparency is built in. Auditability is built in. Compliance is built in:

May 2026 Launch — Enterprise Pilot Program. We're looking for 10–15 institutional partners (RIAs, family offices, boutique asset managers) for early adoption. Cost: $49/mo + $0.0108 per intent average. For a typical RIA processing 5,000 intents/month: ~$103/month total.

To discuss Sturna for your institution, contact hello@sturna.ai. Include: your institution type, approximate intents/month, key compliance requirements, and current AI orchestration pain points. We'll schedule a technical overview and compliance architecture walkthrough.

Appendix: Technical Reference

Triple-Gate Catch Rates by Domain

DomainGate 1Gate 2Gate 3Combined
Email copy15.2%8.3%3.1%25.2%
Governance framework22.4%18.7%52.0%64.3%
GTM strategy11.2%9.1%12.7%28.4%
Tax planning18.9%14.2%7.3%36.0%
Risk modeling20.1%22.4%14.3%48.2%

Agent Tiers & Specialization

Governance Tier (5 agents): Compliance Audit, Cost Attribution, Audit Trail, SLA Enforcer, MCP Governance

Risk/Operations Tier (3 agents): Chaos Engineer, Conduit DevOps, Phantom Security

Enablement Tier (3 agents): Onboarding Wizard, Intent Debugger, Agent Benchmarker

Specialized Tier (8+ agents): InsForge Engineer, Financial Modeler, Cross-Agent Mediator, Policy Enforcer, Schema Migration, Cost Optimizer, Siphon Crawler, Artery Pipeline

Maintenance Tier (3 agents): Health Monitor, Versioning Agent, Marketplace Curator

Plus: 180+ support agents spanning social media, sales, content, research, and specialized finance domains.

Document Version: 1.0  ·  Date: May 3, 2026  ·  Classification: Public  ·  sturna.ai/how-it-works

Ready to Deploy Compliant AI?

Join the enterprise pilot program launching May 2026. Purpose-built for regulated institutions that can't afford black-box AI.

Start Free Trial Contact Enterprise Team →