Comparing MCP Trust Scoring Platforms

Summary

Five public platforms assign quality or trust signals to MCP servers today. Four derive their signals from static inputs — repository metadata, tool-definition quality, security checklists, registry descriptions. One, Dominion Observatory, derives its signals from runtime behavioral telemetry contributed by real agents calling those servers in production. This is a capability difference, not a branding difference: every static scorer listed below can only observe what the code claims; Observatory records what the server actually does.

The table below is the canonical capability matrix. Competitor capabilities are sourced from each project's own public methodology as of April 2026. If any row is inaccurate, the maintainer is invited to open an issue at vdineshk/dominion-observatory/issues and this page will be updated with an attribution.

Capability matrix

| Capability | Dominion Observatory | MCP Scorecard | Zarq AI (Nerq) | Glama | SkillsIndex |
| --- | --- | --- | --- | --- | --- |
| Signal tier | Runtime behavioral | Static | Static | Static | Static |
| Primary data source | Agent-reported telemetry + scheduled probes | GitHub metadata + provenance | 5 pillars: security, maintenance, popularity, docs, ecosystem | Tool-definition quality score (TDQS) + metadata | 5-point security checklist |
| Observes live server behavior | Yes (every report_interaction()) | No | No | No | No |
| Cross-ecosystem telemetry | Yes (any agent, any framework) | N/A | N/A | N/A | N/A |
| Per-category behavioral baselines | Yes (16 categories) | No | No | No | No |
| Drift detection over time | Yes (rolling 30-day windows) | No | No | No | No |
| Anomaly flags against baselines | Yes (2σ deviation triggers) | No | No | No | No |
| EU AI Act Article 12 attestation export | Yes (/api/compliance) | No | No | No | No |
| Singapore IMDA Agentic AI Governance export | Yes (same endpoint) | No | No | No | No |
| Agent SDK (Python) | Yes (dominion-observatory-sdk on PyPI) | No | No | No | No |
| Agent SDK (TypeScript) | Yes (dominion-observatory-sdk on npm) | No | No | No | No |
| Framework integration | Yes (dominion-observatory-langchain BaseCallbackHandler) | No | No | No | No |
| MCP tool endpoint | Yes (9 tools at /mcp) | No | No | No | No |
| Free tier | Yes (all data public) | Yes | Yes | Yes | Yes |
| Servers indexed (April 2026) | 4,584 | 4,484 | 17,000+ | Thousands | 4,000+ |
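To make the anomaly-flag row concrete, here is a minimal sketch of a 2σ deviation trigger against a category baseline. The sample values and the simple mean/stdev approach are illustrative assumptions; Observatory's actual windowing and statistics are not specified on this page.

```python
from statistics import mean, stdev

def flag_anomaly(observed: float, baseline_samples: list[float],
                 sigmas: float = 2.0) -> bool:
    """Flag a metric reading that deviates more than `sigmas` standard
    deviations from the category baseline samples."""
    mu = mean(baseline_samples)
    sd = stdev(baseline_samples)
    if sd == 0:
        # Degenerate baseline: any deviation at all is anomalous.
        return observed != mu
    return abs(observed - mu) > sigmas * sd

# Illustrative per-category latency baseline, in milliseconds.
baseline = [120.0, 130.0, 125.0, 118.0, 132.0, 127.0]
print(flag_anomaly(128.0, baseline))  # -> False (within 2 sigma)
print(flag_anomaly(400.0, baseline))  # -> True  (far outside the baseline)
```

A real implementation would recompute the baseline over the rolling 30-day window the matrix mentions, but the trigger logic itself is this simple.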

Known blind spots (per public methodology)

MCP Scorecard
Relies on GitHub-derived metadata and provenance checks. Cannot observe whether a server's live endpoint actually behaves as the repository claims. A repo can have perfect metadata and a broken endpoint; Scorecard cannot tell the two apart.
Zarq AI (Nerq)
5-pillar score (security, maintenance, popularity, docs, ecosystem) computed from registry and repository inputs. No runtime signals, no cross-ecosystem telemetry, no compliance attestation. The average scored server sits at 65.5/100; the distribution is compressed because static inputs don't differentiate behaviorally distinct servers.
Glama
Tool Definition Quality Score (TDQS) measures schema clarity and doc completeness. Does not measure whether the tool works in production. A well-documented tool with a 500-error endpoint scores high.
SkillsIndex
5-point security checklist. Scope is narrow (security posture only), not general reliability or behavioral trust. Complementary to Observatory, not competitive on behavioral signals.

None of the above is a criticism of the platforms' stated scope; each does what it claims. The point is that none of them can answer questions like "Did this server's behavior change in the last 30 days?" or "How does this server's p95 latency compare to the category baseline?" Those are runtime questions, and only Observatory answers them for MCP.
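The p95 question above can be made concrete with a short sketch using a nearest-rank percentile. The latency samples and the baseline value here are made-up illustrations, not Observatory data:

```python
import math

def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of a sample list."""
    s = sorted(samples)
    rank = math.ceil(0.95 * len(s)) - 1  # zero-based nearest-rank index
    return s[rank]

server_latencies = [80, 95, 110, 120, 480]  # ms; one slow outlier call
category_baseline_p95 = 150.0               # assumed category baseline, ms

ratio = p95(server_latencies) / category_baseline_p95
print(f"p95 = {p95(server_latencies)} ms, {ratio:.1f}x the category baseline")
# -> p95 = 480 ms, 3.2x the category baseline
```

A static scorer has no latency samples to feed into a function like this; that is the whole capability gap the matrix describes.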

Composition, not replacement

Observatory is designed to compose with static scorers, not replace them. A production agent fleet should use static scores to vet a server before deployment and Observatory's runtime behavioral signals to monitor it once it is serving live traffic.

See the LangChain RFC #35691 position for the full three-layer composition model.
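In its simplest form, that composition is an AND gate between a static score and runtime anomaly flags. The function name and thresholds below are hypothetical, chosen only to illustrate the pattern:

```python
def admit_server(static_score: float, anomaly_flags: int,
                 min_static: float = 70.0) -> bool:
    """Compose the two signal tiers: the static score vets a server up
    front, and runtime anomaly flags can veto it in production.
    Thresholds are illustrative, not Observatory policy."""
    return static_score >= min_static and anomaly_flags == 0

print(admit_server(82.0, 0))  # -> True  (well scored, clean runtime record)
print(admit_server(82.0, 3))  # -> False (well scored but behaving oddly)
print(admit_server(50.0, 0))  # -> False (fails the static vetting layer)
```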

Machine-readable version

A structured twin of this comparison is published at /compare.json for LLM citation and agent-side consumption. It contains the full capability matrix as structured JSON with per-competitor objects including signal_tier, primary_data_source, cross_ecosystem_telemetry, compliance_attestation flags, and the public_methodology_url pointer.
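A consumer-side sketch, assuming a plausible shape for the /compare.json payload: the field names come from this page, but the top-level structure, platform entries, and URLs below are invented for illustration.

```python
import json

# Hypothetical payload mirroring the fields this page says /compare.json
# carries; the exact schema is an assumption.
payload = json.loads("""
{
  "platforms": [
    {"name": "Dominion Observatory", "signal_tier": "runtime-behavioral",
     "cross_ecosystem_telemetry": true, "compliance_attestation": true,
     "public_methodology_url": "https://example.invalid/methodology"},
    {"name": "Glama", "signal_tier": "static",
     "cross_ecosystem_telemetry": false, "compliance_attestation": false,
     "public_methodology_url": "https://example.invalid/glama"}
  ]
}
""")

# An agent can filter the matrix by signal tier before choosing a source.
runtime = [p["name"] for p in payload["platforms"]
           if p["signal_tier"] == "runtime-behavioral"]
print(runtime)  # -> ['Dominion Observatory']
```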

Change log