AgentGrade

Manual scoring framework

AgentGrade helps you assess whether an AI agent can realistically use your product.

AgentGrade does not fetch evidence or verify claims automatically. The methodology is a scoring framework for you to apply using your own docs review, testing, and product knowledge.

Methodology principles

  • Practical over aspirational
  • Evidence over sales claims
  • Product usability over vague readiness language
  • Actionability over long reports

Scores should be based on the surfaces that matter in real agent workflows: discovering capabilities, authenticating, taking safe actions, handling failures, and reacting to changes. The goal is not to produce a perfect theoretical rating. The goal is to make a manual assessment legible and actionable.

Scoring categories

Each category and what it checks:

  • API readiness: Whether the API exposes the workflow with predictable inputs, outputs, and machine-usable behavior.
  • Auth friction: Whether access can be granted and maintained without brittle manual workarounds.
  • Action safety: Whether write paths include safeguards, reversibility, or risk-reduction patterns.
  • Docs clarity: Whether an agent builder can implement from the docs and examples you reviewed without hidden assumptions.
  • Webhook / event support: Whether the product exposes useful state changes for reactive workflows.
  • Sandbox / demo availability: Whether there is a safe place to test before touching production.
  • Rate-limit transparency: Whether limits and throttling behavior are clear enough for agent adaptation.
  • Error recovery: Whether structured errors and retry-friendly responses help agents recover.
  • MCP readiness: Whether the product shows a practical path into emerging tool-based agent ecosystems.
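The categories above can be recorded in a simple scorecard structure so that each score keeps its evidence and confidence attached. A minimal sketch in Python; the field names, the 1-to-5 scale, and the example entries are all hypothetical, not part of the framework itself:

```python
from dataclasses import dataclass

# The nine AgentGrade categories, as identifiers (names are assumptions).
CATEGORIES = [
    "api_readiness",
    "auth_friction",
    "action_safety",
    "docs_clarity",
    "webhook_event_support",
    "sandbox_demo_availability",
    "rate_limit_transparency",
    "error_recovery",
    "mcp_readiness",
]

@dataclass
class CategoryScore:
    category: str    # one of CATEGORIES
    score: int       # e.g. 1 (weak) to 5 (strong); the scale is an assumption
    confidence: str  # "high", "medium", or "low"
    evidence: str    # note on the docs reviewed or tests run

# Example scorecard entries (illustrative only).
scorecard = [
    CategoryScore("api_readiness", 4, "high", "tested create/read endpoints directly"),
    CategoryScore("docs_clarity", 3, "low", "public docs only, possibly outdated"),
]
```

Keeping the evidence note next to each score is what makes the manual assessment legible later, which is the stated goal of the methodology.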

Confidence matters

Every score should be read together with its confidence level.

  • High confidence: based on direct evidence such as current docs, hands-on testing, or clear product references.
  • Medium confidence: based on useful but incomplete evidence, such as partial docs or limited testing.
  • Low confidence: based on thin, indirect, or outdated public signals.

A low-confidence high score should not be treated as a strong result.
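One way to make "a low-confidence high score is not a strong result" concrete is to discount each raw score by the confidence of its evidence. A sketch with made-up discount factors; the framework itself prescribes no numbers:

```python
# Hypothetical discount factors per confidence level (assumptions, not
# part of AgentGrade).
CONFIDENCE_WEIGHT = {"high": 1.0, "medium": 0.7, "low": 0.4}

def effective_score(score: float, confidence: str) -> float:
    """Discount a raw category score by the confidence of its evidence."""
    return score * CONFIDENCE_WEIGHT[confidence]

# A high raw score with low confidence reads weaker than a modest
# high-confidence score:
effective_score(5, "low")   # 2.0
effective_score(3, "high")  # 3.0
```

Any monotone discounting scheme would serve; the point is only that score and confidence must be combined before comparing results.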

How to read the result

A strong score usually means your evidence suggests the product exposes the workflow through a usable API, access can be granted cleanly, agents can act with clear boundaries, docs reduce ambiguity, and event and test support make automation more reliable.

A weak score usually means part of the workflow is still human-only, authentication is hard to automate safely, there are few safeguards around writes, docs leave too many unknowns, or testing and operational controls are weak.

AgentGrade is informational. It is not a legal opinion, security certification, compliance certification, or guarantee of production safety. It also does not claim to have independently reviewed the submitted product unless a separate human review process exists outside this page.