Report Schema
The scan report is a structured JSON document organized for action. Understand what each section means so you can turn the data into specific SEO improvements.
Report sections
Standard and Deep customer_v1 reports share the core sections below; the structured_data_benchmarks, structured_data_coverage, and serp_speed_benchmark rows are Deep-only, and the originality row appears on Standard and Deep scans when the analysis completes. The schema_version field is always onpage-report-customer-v1.
| Section | Description |
|---|---|
| meta | Report date, target keyword, location, and URL. |
| on_page_optimization | Final On-Page Optimization Score, grade, numeric confidence, summary, and high-level focus areas. |
| benchmarks | Page-1 averages vs your page, side by side. |
| entity_coverage | Entity and term coverage, including related-entity density versus competitors. This is the core of most optimization work. |
| topic_and_classification | Topic classification, swipe content, and authority questions. |
| internal_linking | Source pages that should link to the analyzed URL. |
| competitor_term_coverage | Term-by-term comparison against ranking competitors. |
| structured_data_benchmarks | Deep scans only. Page-1 structured data totals vs your page, side by side. |
| structured_data_coverage | Deep scans only. Compact schema type prevalence across the accepted ranking competitor cohort. |
| serp_speed_benchmark | Deep scans only. Self-hosted page-experience benchmark of the target URL against the top 3 organic competitor URLs in the same SERP. Optional: Deep responses include the field when the benchmark payload was produced; when present, `status` indicates whether the run completed (`ok`) or was short-circuited (`disabled`, `skipped_*`, `timeout`). Lite and Standard never include it. |
| originality | Standard and Deep scans only. Embedding-based originality and information-gain analysis of the page against the live ranking cohort. Optional: present when the analysis completed; omitted (never null) when it didn't run (too few competitor pages, too little scoreable text, or time budget exhausted). Lite scans never include it. See the detailed field table below. |
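
Because several sections are tier-dependent or optional, a consumer may want to check which ones a given report actually carries before reading them. The sketch below is one possible way to do that, assuming the report has already been parsed into a Python dict whose top-level keys are the section names listed above; the helper name `summarize_sections` and the sample report are illustrative, not part of the service.

```python
# Section names are taken from the list above; the helper itself is hypothetical.
CORE_SECTIONS = [
    "meta",
    "on_page_optimization",
    "benchmarks",
    "entity_coverage",
    "topic_and_classification",
    "internal_linking",
    "competitor_term_coverage",
]
TIER_DEPENDENT_SECTIONS = [
    "structured_data_benchmarks",
    "structured_data_coverage",
    "serp_speed_benchmark",
    "originality",
]


def summarize_sections(report: dict) -> dict:
    """Report which core sections are missing and which optional ones are present."""
    return {
        "schema_version": report.get("schema_version"),
        "missing_core": [name for name in CORE_SECTIONS if name not in report],
        # Optional sections are omitted rather than set to null, so a key test is enough.
        "present_optional": [name for name in TIER_DEPENDENT_SECTIONS if name in report],
    }


# Illustrative report: every core section plus originality, with empty bodies.
sample_report = {name: {} for name in CORE_SECTIONS}
sample_report["schema_version"] = "onpage-report-customer-v1"
sample_report["originality"] = {"score": 62}

section_summary = summarize_sections(sample_report)
print(section_summary)
```

Testing for key presence rather than for a null value matches the stated behavior that optional sections are omitted entirely when they did not run.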
How to use the report for SEO
The report is most useful as a prioritization engine. Don't try to act on everything at once — focus on the highest-impact gaps first.
Optimization score — start with the benchmark
on_page_optimization gives the final score, numeric confidence, and focus areas for the page. Use it as the top-level benchmark before drilling into the supporting sections.
Entity coverage — find missing topics
Look at natural_language_entities with coverage_status of "missing". Start with the highest-importance terms. Edit existing sentences to include them rather than adding new paragraphs.
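The filtering step described here can be sketched in a few lines. This example assumes that natural_language_entities is a list of objects under entity_coverage and that each entry carries a name and a numeric importance value; the keys `entity` and `importance`, and the status value "covered", are hypothetical stand-ins, so check the published JSON Schema for the real field names.

```python
def missing_entities(report: dict, limit: int = 10) -> list:
    """Return entities marked "missing", most important first."""
    entities = report.get("entity_coverage", {}).get("natural_language_entities", [])
    missing = [e for e in entities if e.get("coverage_status") == "missing"]
    # "importance" is an assumed field name, not confirmed by the documentation.
    missing.sort(key=lambda e: e.get("importance", 0), reverse=True)
    return missing[:limit]


# Illustrative data only.
entity_report = {
    "entity_coverage": {
        "natural_language_entities": [
            {"entity": "schema markup", "coverage_status": "missing", "importance": 0.4},
            {"entity": "crawl budget", "coverage_status": "covered", "importance": 0.9},
            {"entity": "canonical tag", "coverage_status": "missing", "importance": 0.8},
        ]
    }
}

top_missing = [e["entity"] for e in missing_entities(entity_report)]
print(top_missing)
```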
Related entity density — judge depth, not just presence
Compare your_url_related_entity_density_score with competitor_related_entity_density_score to see whether important related entities appear with enough depth for the page length.
Authority questions — close topical gaps
The who/what/where/how questions show what angles your page should address. Add the ones that fit your page intent — don't force irrelevant angles.
Internal linking — strengthen your page
add_internal_links_from gives you specific source pages and anchor text suggestions. These are pages on your own site that should link to the analyzed page.
Competitor coverage — benchmark yourself
Compare domains and terms to see which topics the top-ranking pages all cover that you don't. Patterns across multiple competitors are stronger signals than individual outliers.
Benchmarks — quantify the gap
page1_average and your_url report numeric scores for the same metrics. Compare them metric by metric and work on the largest gaps first.
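One way to rank the gaps is to compare the two objects metric by metric. The sketch below assumes page1_average and your_url are flat objects keyed by metric name with numeric values, and it uses a relative gap so that metrics with different units can be compared; both the assumed shape and the metric names in the sample are illustrative.

```python
def benchmark_gaps(benchmarks: dict) -> list:
    """Return (metric, relative_gap) pairs, largest absolute gap first.

    A positive gap means the page is below the page-1 average for that metric.
    """
    page1 = benchmarks.get("page1_average", {})
    yours = benchmarks.get("your_url", {})
    gaps = []
    for metric, average in page1.items():
        mine = yours.get(metric)
        both_numeric = isinstance(average, (int, float)) and isinstance(mine, (int, float))
        if not both_numeric or average == 0:
            continue  # skip metrics that cannot be compared
        gaps.append((metric, (average - mine) / average))
    gaps.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return gaps


# Hypothetical metric names and values.
sample_benchmarks = {
    "page1_average": {"word_count": 1800, "heading_count": 12, "image_count": 9},
    "your_url": {"word_count": 950, "heading_count": 11, "image_count": 9},
}

ranked_gaps = benchmark_gaps(sample_benchmarks)
for metric, gap in ranked_gaps:
    print(f"{metric}: {gap:+.0%}")
```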
Structured data — compare schema coverage
structured_data_benchmarks and structured_data_coverage show structured-data totals and compact schema.org type prevalence across the accepted ranking competitor cohort.
SERP speed (Deep only) — compare page experience
serp_speed_benchmark.target vs serp_speed_benchmark.competitors shows LCP, CLS, approximate TBT, and TTFB side-by-side with the top 3 organic competitors in the same SERP (FCP is captured at the per-probe level too). Recommend page-experience fixes only where the target is materially worse than the competitor median — skip ties and per-probe statuses other than `ok`.
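The "materially worse than the competitor median" rule could be applied along these lines. The metric keys (`lcp_ms`, `cls`, `tbt_ms`, `ttfb_ms`), the per-probe `status` key, and the 10% tolerance are all assumptions made for illustration; only `target`, `competitors`, and the top-level `status` come from the text above.

```python
from statistics import median

# Hypothetical metric keys; lower is better for all four.
SPEED_METRICS = ("lcp_ms", "cls", "tbt_ms", "ttfb_ms")


def slower_than_median(benchmark: dict, tolerance: float = 0.10) -> dict:
    """Return metrics where the target exceeds the competitor median by more than `tolerance`."""
    target = benchmark.get("target", {})
    if benchmark.get("status") != "ok" or target.get("status") != "ok":
        return {}  # run was short-circuited or the target probe failed
    usable = [c for c in benchmark.get("competitors", []) if c.get("status") == "ok"]
    findings = {}
    for metric in SPEED_METRICS:
        values = [c[metric] for c in usable if isinstance(c.get(metric), (int, float))]
        mine = target.get(metric)
        if not values or not isinstance(mine, (int, float)):
            continue
        competitor_median = median(values)
        if mine > competitor_median * (1 + tolerance):
            findings[metric] = {"target": mine, "competitor_median": competitor_median}
    return findings


# Illustrative payload; one competitor probe timed out and is ignored.
sample_speed = {
    "status": "ok",
    "target": {"status": "ok", "lcp_ms": 3200, "cls": 0.05, "tbt_ms": 180, "ttfb_ms": 400},
    "competitors": [
        {"status": "ok", "lcp_ms": 2100, "cls": 0.04, "tbt_ms": 200, "ttfb_ms": 380},
        {"status": "timeout"},
        {"status": "ok", "lcp_ms": 2500, "cls": 0.06, "tbt_ms": 150, "ttfb_ms": 420},
    ],
}

speed_findings = slower_than_median(sample_speed)
print(speed_findings)
```

The tolerance keeps near-ties out of the findings; tune it to whatever "materially worse" means for your pages.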
Originality (Standard/Deep) — add information, not just coverage
Rewrite duplicative_passages first — each one names the competitor domain it matches. Cover content_most_competitors_have briefly (it's table stakes), answer an uncovered topic question that fits the page intent, and close the unique_data_points gap with data only you have. Re-scan to verify the score moved.
The originality section
Standard and Deep scans run an embedding-based originality and information-gain analysis: the page's sentences are compared against the pooled sentences of the live ranking cohort, classified as original, shared, or duplicative, and the gaps are extracted as actionable evidence. The section is optional — when the analysis can't run (too few competitor pages fetched, too little scoreable text, or the time budget is exhausted) the field is omitted entirely, never emitted as null. Lite scans never include it. You can try it on any URL with the free Information Gain Checker.
| Field | Type | Description |
|---|---|---|
| score | integer 0–100 | Information Gain Score: the percentage of the page's scored sentences with no close semantic equivalent anywhere in the ranking cohort. |
| grade | string enum | One of "Highly original" (score 70+), "Moderately original" (40–69), or "Mostly shared" (below 40). |
| sentence_analysis | object | Sentence bucket counts: original (no close cohort equivalent), shared (similar ground covered), duplicative (near-equivalent exists), and total_scored. |
| duplicative_passages[] | array (up to 3) | The page's sentences that most closely match a competitor: snippet plus the matched_domain it matched. First candidates for a rewrite. |
| information_gain.content_most_competitors_have[] | array (up to 5) | Content clusters shared across multiple competitor domains that the page does not cover: snippet plus competitor_count. |
| information_gain.potential_uncovered_topics_for_information_gain[] | string array (up to 5) | Topic questions almost no competitor answers and the page doesn't answer either: whitespace where added information is cheapest. |
| information_gain.unique_data_points | object | Numeric data points unique to the page: your_count, page1_average (number or null), and examples (up to 8 tokens). |
| chart | string or null | Preformatted text chart (score gauge + sentence buckets) for terminal/agent display. Render as-is in monospace. |
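
The grade thresholds and bucket counts in the table lend themselves to a quick consistency check. The sketch below maps a score to its grade using the documented cut-offs and verifies that the three sentence buckets add up to total_scored; that the buckets always sum to the total is an assumption inferred from the field descriptions, and the function names are illustrative.

```python
def grade_for(score: int) -> str:
    """Map an Information Gain Score to its grade using the documented thresholds."""
    if score >= 70:
        return "Highly original"
    if score >= 40:
        return "Moderately original"
    return "Mostly shared"


def check_originality(section: dict) -> list:
    """Return a list of human-readable inconsistencies (empty when none are found)."""
    problems = []
    counts = section["sentence_analysis"]
    bucket_sum = counts["original"] + counts["shared"] + counts["duplicative"]
    # Assumption: every scored sentence lands in exactly one bucket.
    if bucket_sum != counts["total_scored"]:
        problems.append(f"buckets sum to {bucket_sum}, total_scored is {counts['total_scored']}")
    if grade_for(section["score"]) != section["grade"]:
        problems.append(f"grade {section['grade']!r} does not match score {section['score']}")
    return problems


# Illustrative values only.
sample_originality = {
    "score": 62,
    "grade": "Moderately original",
    "sentence_analysis": {"original": 58, "shared": 29, "duplicative": 7, "total_scored": 94},
}

originality_problems = check_originality(sample_originality)
print(originality_problems)  # prints []
```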
Example payload (illustrative values, no real domains):

    "originality": {
      "score": 62,
      "grade": "Moderately original",
      "sentence_analysis": {
        "original": 58,
        "shared": 29,
        "duplicative": 7,
        "total_scored": 94
      },
      "duplicative_passages": [
        {
          "snippet": "An opening definition that restates the consensus answer…",
          "matched_domain": "competitor-example.com"
        }
      ],
      "information_gain": {
        "content_most_competitors_have": [
          {
            "snippet": "A basic checklist most ranking pages include…",
            "competitor_count": 6
          }
        ],
        "potential_uncovered_topics_for_information_gain": [
          "What does this cost at different team sizes?",
          "How do results differ for non-English content?"
        ],
        "unique_data_points": {
          "your_count": 4,
          "page1_average": 9.5,
          "examples": ["37%", "4.2:1", "120ms"]
        }
      },
      "chart": "Originality: 62/100 Grade: Moderately original\n…"
    }

Legacy field mapping
If you're migrating from the older report format, here's how the section names map to the current schema.
| Legacy name | Current path |
|---|---|
| Date / TargetKeyword / Location / URL | meta |
| OnPageOptimizationScore | on_page_optimization |
| Page1AverageVsYourUrl | benchmarks |
| NaturalLanguageAnalysis | entity_coverage.natural_language_entities |
| HighlyRelatedWords | entity_coverage.highly_related_terms |
| KeywordVariations | entity_coverage.keyword_variations |
| RelatedCategory | entity_coverage.related_category_entities |
| SpecificCategoryEntities | entity_coverage.specific_category_entities |
| PageClassification | topic_and_classification.page_classification |
| SwipeContent | topic_and_classification.swipe_content |
| TopicalAuthorityQuestions | topic_and_classification.topical_authority_questions |
| Internal Link Recommendations | internal_linking |
| CompetitorAnalysis | competitor_term_coverage |
| StructuredDataBenchmarks | structured_data_benchmarks |
| StructuredDataCoverage | structured_data_coverage |
| Originality | originality |
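
For code that still refers to legacy names, the mapping table can be turned into a small lookup. This is a minimal sketch assuming the report is a parsed dict; the dotted paths are copied from the table, the four legacy fields that map to meta are each pointed at the whole meta section, and the helper names are illustrative.

```python
LEGACY_TO_CURRENT = {
    "Date": "meta",
    "TargetKeyword": "meta",
    "Location": "meta",
    "URL": "meta",
    "OnPageOptimizationScore": "on_page_optimization",
    "Page1AverageVsYourUrl": "benchmarks",
    "NaturalLanguageAnalysis": "entity_coverage.natural_language_entities",
    "HighlyRelatedWords": "entity_coverage.highly_related_terms",
    "KeywordVariations": "entity_coverage.keyword_variations",
    "RelatedCategory": "entity_coverage.related_category_entities",
    "SpecificCategoryEntities": "entity_coverage.specific_category_entities",
    "PageClassification": "topic_and_classification.page_classification",
    "SwipeContent": "topic_and_classification.swipe_content",
    "TopicalAuthorityQuestions": "topic_and_classification.topical_authority_questions",
    "Internal Link Recommendations": "internal_linking",
    "CompetitorAnalysis": "competitor_term_coverage",
    "StructuredDataBenchmarks": "structured_data_benchmarks",
    "StructuredDataCoverage": "structured_data_coverage",
    "Originality": "originality",
}


def resolve(report: dict, dotted_path: str):
    """Walk a dotted path through nested dicts; return None if any step is absent."""
    node = report
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def get_legacy(report: dict, legacy_name: str):
    """Look up a legacy section name in a current-format report."""
    return resolve(report, LEGACY_TO_CURRENT[legacy_name])


# Illustrative report fragment.
legacy_sample = {"entity_coverage": {"keyword_variations": ["seo audit", "site audit"]}}
print(get_legacy(legacy_sample, "KeywordVariations"))
```

Returning None for an absent path keeps the lookup safe on Lite reports and on optional sections that were omitted.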
Machine-readable schemas
Two JSON Schemas are published as MCP resources, one per tier:
- schema://customer-report-v1 (response_format=customer_v1): full shape for Standard and Deep scans. Carries benchmarks, entity coverage including related-entity density, topic and classification, internal linking, and competitor term coverage. Standard and Deep scans may also carry the originality section, and Deep scans the structured data benchmark and coverage sections.
- schema://customer-report-v1-lite (response_format=customer_v1_lite): reduced shape for Lite scans. Keeps benchmarks, entity coverage including related-entity density (natural language entities, highly related terms, keyword variations only), and competitor term coverage. Omits topic_and_classification and internal_linking entirely, because those are not computed for Lite scans, and never emits originality.
The service picks the right schema automatically based on the job's depth. Explicitly requesting customer_v1 on a Lite job (or customer_v1_lite on a Standard/Deep job) returns a 409 UNSUPPORTED_FORMAT — omit the parameter or use the matching value.
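A client can avoid the 409 by checking the requested format against the job's depth before sending the request. The sketch below encodes the depth-to-format pairing described above; the depth strings "lite", "standard", and "deep" and the helper name are assumptions, and no network call is made.

```python
# (response_format, schema resource) per depth; the depth keys are assumed spellings.
FORMAT_BY_DEPTH = {
    "lite": ("customer_v1_lite", "schema://customer-report-v1-lite"),
    "standard": ("customer_v1", "schema://customer-report-v1"),
    "deep": ("customer_v1", "schema://customer-report-v1"),
}


def expected_format(depth: str, requested=None) -> tuple:
    """Return (response_format, schema_uri) for a depth.

    Raises ValueError when an explicitly requested format does not match the
    depth, which is the case the service would reject with 409 UNSUPPORTED_FORMAT.
    """
    response_format, schema_uri = FORMAT_BY_DEPTH[depth.lower()]
    if requested is not None and requested != response_format:
        raise ValueError(
            f"{requested!r} is not valid for a {depth} job; omit it or use {response_format!r}"
        )
    return response_format, schema_uri


print(expected_format("lite"))
print(expected_format("Deep", requested="customer_v1"))
```

Leaving `requested` as None mirrors omitting the parameter and letting the service choose.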