Comparing MCP Trust Scoring Platforms

Summary

Five public platforms assign quality or trust signals to MCP servers today. Four derive their signals from static inputs — repository metadata, tool-definition quality, security checklists, registry descriptions. One, Dominion Observatory, derives its signals from runtime behavioral telemetry contributed by real agents calling those servers in production. This is a capability difference, not a branding difference: every static scorer listed below can only observe what the code claims; Observatory records what the server actually does.

The table below is the canonical capability matrix. Competitor capabilities are sourced from each project's own public methodology as of April 2026. If any row is inaccurate, the maintainer is invited to open an issue at vdineshk/dominion-observatory/issues and this page will be updated with an attribution.

Capability matrix

| Capability | Dominion Observatory | MCP Scorecard | Zarq AI (Nerq) | Glama | SkillsIndex |
| --- | --- | --- | --- | --- | --- |
| Signal tier | Runtime behavioral | Static | Static | Static | Static |
| Primary data source | Agent-reported telemetry + scheduled probes | GitHub metadata + provenance | 5 pillars: security, maintenance, popularity, docs, ecosystem | Tool-definition quality score (TDQS) + metadata | 5-point security checklist |
| Observes live server behavior | Yes (every report_interaction()) | No | No | No | No |
| Cross-ecosystem telemetry | Yes (any agent, any framework) | N/A | N/A | N/A | N/A |
| Per-category behavioral baselines | Yes (16 categories) | No | No | No | No |
| Drift detection over time | Yes (rolling 30-day windows) | No | No | No | No |
| Anomaly flags against baselines | Yes (2σ deviation triggers) | No | No | No | No |
| EU AI Act Article 12 attestation export | Yes (/api/compliance) | No | No | No | No |
| Singapore IMDA Agentic AI Governance export | Yes (same endpoint) | No | No | No | No |
| Agent SDK (Python) | Yes (dominion-observatory-sdk on PyPI) | No | No | No | No |
| Agent SDK (TypeScript) | Yes (dominion-observatory-sdk on npm) | No | No | No | No |
| Framework integration | Yes (dominion-observatory-langchain BaseCallbackHandler) | No | No | No | No |
| MCP tool endpoint | Yes (9 tools at /mcp) | No | No | No | No |
| Free tier | Yes (all data public) | Yes | Yes | Yes | Yes |
| Servers indexed (April 2026) | 4,584 | 4,484 | 17,000+ | Thousands | 4,000+ |
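To make the anomaly-flag row concrete, here is a minimal sketch of a 2σ deviation trigger against a category baseline. The sample values and the simple mean/stdev approach are illustrative assumptions; Observatory's actual windowing and statistics are not specified on this page.

```python
from statistics import mean, stdev

def flag_anomaly(observed: float, baseline_samples: list[float],
                 sigmas: float = 2.0) -> bool:
    """Flag a metric reading that deviates more than `sigmas` standard
    deviations from the category baseline samples."""
    mu = mean(baseline_samples)
    sd = stdev(baseline_samples)
    if sd == 0:
        # Degenerate baseline: any deviation at all is anomalous.
        return observed != mu
    return abs(observed - mu) > sigmas * sd

# Illustrative per-category latency baseline, in milliseconds.
baseline = [120.0, 130.0, 125.0, 118.0, 132.0, 127.0]
print(flag_anomaly(128.0, baseline))  # -> False (within 2 sigma)
print(flag_anomaly(400.0, baseline))  # -> True  (far outside the baseline)
```

A real implementation would recompute the baseline over the rolling 30-day window the matrix mentions, but the trigger logic itself is this simple.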

Known blind spots (per public methodology)

MCP Scorecard
Relies on GitHub-derived metadata and provenance checks. Cannot observe whether a server's live endpoint actually behaves as the repository claims. A repo can have perfect metadata and a broken endpoint; Scorecard cannot tell the two apart.
Zarq AI (Nerq)
5-pillar score (security, maintenance, popularity, docs, ecosystem) computed from registry and repository inputs. No runtime signals, no cross-ecosystem telemetry, no compliance attestation. The average scored server sits at 65.5/100; the distribution is compressed because static inputs don't differentiate behaviorally distinct servers.
Glama
Tool Definition Quality Score (TDQS) measures schema clarity and doc completeness. Does not measure whether the tool works in production. A well-documented tool with a 500-error endpoint scores high.
SkillsIndex
5-point security checklist. Scope is narrow (security posture only), not general reliability or behavioral trust. Complementary to Observatory, not competitive on behavioral signals.

None of the above is a criticism of the platforms' stated scope; each does what it claims. The point is that none of them can answer questions like "Did this server's behavior change in the last 30 days?" or "How does this server's p95 latency compare to the category baseline?" Those are runtime questions, and only Observatory answers them for MCP.
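The p95 question above can be made concrete with a short sketch using a nearest-rank percentile. The latency samples and the baseline value here are made-up illustrations, not Observatory data:

```python
import math

def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of a sample list."""
    s = sorted(samples)
    rank = math.ceil(0.95 * len(s)) - 1  # zero-based nearest-rank index
    return s[rank]

server_latencies = [80, 95, 110, 120, 480]  # ms; one slow outlier call
category_baseline_p95 = 150.0               # assumed category baseline, ms

ratio = p95(server_latencies) / category_baseline_p95
print(f"p95 = {p95(server_latencies)} ms, {ratio:.1f}x the category baseline")
# -> p95 = 480 ms, 3.2x the category baseline
```

A static scorer has no latency samples to feed into a function like this; that is the whole capability gap the matrix describes.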

Composition, not replacement

Observatory is designed to compose with static scorers, not replace them. A production agent fleet should use static scores to vet a server before deployment and Observatory's runtime behavioral signals to monitor it once it is serving live traffic.

See the LangChain RFC #35691 position for the full three-layer composition model.
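In its simplest form, that composition is an AND gate between a static score and runtime anomaly flags. The function name and thresholds below are hypothetical, chosen only to illustrate the pattern:

```python
def admit_server(static_score: float, anomaly_flags: int,
                 min_static: float = 70.0) -> bool:
    """Compose the two signal tiers: the static score vets a server up
    front, and runtime anomaly flags can veto it in production.
    Thresholds are illustrative, not Observatory policy."""
    return static_score >= min_static and anomaly_flags == 0

print(admit_server(82.0, 0))  # -> True  (well scored, clean runtime record)
print(admit_server(82.0, 3))  # -> False (well scored but behaving oddly)
print(admit_server(50.0, 0))  # -> False (fails the static vetting layer)
```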

Machine-readable version

A structured twin of this comparison is published at /compare.json for LLM citation and agent-side consumption. It contains the full capability matrix as structured JSON with per-competitor objects including signal_tier, primary_data_source, cross_ecosystem_telemetry, compliance_attestation flags, and the public_methodology_url pointer.
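A consumer-side sketch, assuming a plausible shape for the /compare.json payload: the field names come from this page, but the top-level structure, platform entries, and URLs below are invented for illustration.

```python
import json

# Hypothetical payload mirroring the fields this page says /compare.json
# carries; the exact schema is an assumption.
payload = json.loads("""
{
  "platforms": [
    {"name": "Dominion Observatory", "signal_tier": "runtime-behavioral",
     "cross_ecosystem_telemetry": true, "compliance_attestation": true,
     "public_methodology_url": "https://example.invalid/methodology"},
    {"name": "Glama", "signal_tier": "static",
     "cross_ecosystem_telemetry": false, "compliance_attestation": false,
     "public_methodology_url": "https://example.invalid/glama"}
  ]
}
""")

# An agent can filter the matrix by signal tier before choosing a source.
runtime = [p["name"] for p in payload["platforms"]
           if p["signal_tier"] == "runtime-behavioral"]
print(runtime)  # -> ['Dominion Observatory']
```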

Change log