How Trust Scores Are Computed

Overview

Dominion Observatory is the behavioral trust layer for the AI agent economy. Unlike static scorers that analyze GitHub metadata or registry descriptions, Observatory collects runtime behavioral telemetry from real agent interactions across the entire MCP ecosystem.

Trust Score Components

Runtime Score (weighted 60%)
Calculated from agent-reported success rates, latency distributions (p50, p95), error patterns, and uptime over rolling 30-day windows. Every report_interaction() call from any agent contributes to this score.
Static Score (weighted 40%)
Derived from server metadata: presence of GitHub repository, documentation quality, authentication support, and category alignment.
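The 60/40 blend above can be sketched as a simple weighted sum. This is a minimal illustration, assuming both component scores are normalized to [0, 1]; only the weights come from the document.

```python
# Weights from the document; everything else is illustrative.
RUNTIME_WEIGHT = 0.60
STATIC_WEIGHT = 0.40

def trust_score(runtime_score: float, static_score: float) -> float:
    """Blend the runtime and static components into one trust score."""
    if not (0.0 <= runtime_score <= 1.0 and 0.0 <= static_score <= 1.0):
        raise ValueError("component scores must be normalized to [0, 1]")
    return RUNTIME_WEIGHT * runtime_score + STATIC_WEIGHT * static_score

# Example: strong runtime behavior, weaker static metadata.
print(trust_score(0.95, 0.70))  # ≈ 0.85
```

Because runtime behavior is weighted 60%, a server with polished metadata but poor real-world reliability cannot score well.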

Category Baselines

Each server category (weather, finance, code, compliance, etc.) has independently computed behavioral baselines. A weather API with 200ms average latency is normal; a code execution server at 200ms would be exceptional. Per-category baselines make anomaly detection meaningful: each server is judged against peers doing comparable work, not against a single ecosystem-wide threshold.
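A hypothetical sketch of how per-category latency baselines might be computed. The category names match the document; the sample latencies are invented for illustration.

```python
from statistics import mean, stdev

# Invented latency samples (ms), grouped by category.
samples = {
    "weather": [180, 210, 195, 220, 200],    # light request/response work
    "code":    [950, 1200, 1100, 1020, 990], # heavier execution workloads
}

# One baseline per category, computed independently.
baselines = {
    cat: {"mean_ms": mean(vals), "std_ms": stdev(vals)}
    for cat, vals in samples.items()
}

# 200 ms sits right at the weather baseline, but would be far
# below the code-execution baseline computed above.
print(baselines["weather"]["mean_ms"])  # 201.0
```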

Anomaly Detection

When an agent reports an interaction, Observatory compares it against both the server's own historical performance and its category baseline. Deviations beyond 2 standard deviations trigger anomaly flags visible in check_anomaly() responses.
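The 2-standard-deviation rule can be sketched as a z-score check. A minimal sketch: in practice the history would combine the server's stored interactions and its category baseline, but here it is a plain list of numbers.

```python
from statistics import mean, stdev

def is_anomalous(observed: float, history: list[float],
                 threshold: float = 2.0) -> bool:
    """Flag a reading more than `threshold` std devs from the historical mean."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return observed != mu  # degenerate history: any change is anomalous
    return abs(observed - mu) / sigma > threshold

latencies = [200, 210, 190, 205, 195]  # ms, invented history
print(is_anomalous(450, latencies))  # True  — far outside 2 std devs
print(is_anomalous(208, latencies))  # False — within normal variation
```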

Compliance Attestation

Observatory generates audit trails compatible with EU AI Act Article 12 (logging and traceability) and Singapore IMDA Agentic AI Governance Framework. The get_compliance_report() endpoint exports timestamped interaction records with full provenance labeling — distinguishing Observatory probes, agent-reported data, and external SDK telemetry.
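To make the provenance labeling concrete, here is a hypothetical shape for one exported interaction record. The field names are illustrative assumptions, not the actual get_compliance_report() schema; only the three provenance values come from the document.

```python
import json

# Hypothetical record shape — field names are assumptions.
record = {
    "timestamp": "2025-06-01T12:00:00Z",
    "server": "weather-mcp",
    "provenance": "agent_reported",  # or "observatory_probe" / "external"
    "outcome": "success",
    "latency_ms": 210,
}
print(json.dumps(record, indent=2))
```

Timestamped records with an explicit source label are what make the export usable as a logging/traceability audit trail.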

Data Provenance

All data is labeled by source: observatory_probe (active monitoring), agent_reported (SDK telemetry from real agent workloads), and external (verified third-party agents). This honest provenance split ensures baselines are never inflated by synthetic traffic.
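The three provenance labels above could be modeled as an enum, with a filter that keeps Observatory's own probe traffic out of baseline computation. The enum values come from the document; the filter function is an illustrative assumption about how the split might be enforced.

```python
from enum import Enum

class Provenance(Enum):
    OBSERVATORY_PROBE = "observatory_probe"  # active monitoring
    AGENT_REPORTED = "agent_reported"        # SDK telemetry from real workloads
    EXTERNAL = "external"                    # verified third-party agents

def baseline_eligible(source: Provenance) -> bool:
    """Keep real workload telemetry; exclude Observatory's own probes
    so synthetic traffic never inflates baselines."""
    return source is not Provenance.OBSERVATORY_PROBE

print(baseline_eligible(Provenance.AGENT_REPORTED))     # True
print(baseline_eligible(Provenance.OBSERVATORY_PROBE))  # False
```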

What Makes This Unique

No other platform collects cross-ecosystem agent-reported runtime behavioral telemetry for MCP servers. Static scorers (Glama, Smithery, Nerq, MCP Scorecard) analyze metadata. Security scanners (BlueRock) monitor authorization patterns. Observatory is the only system where agents report on agents, building a collective reliability map of the entire MCP ecosystem.