Report Schema

The scan report is a structured JSON document organized for action. Understand what each section means so you can turn the data into specific SEO improvements.

Report sections

Standard and Deep customer_v1 reports share the core sections below; the structured_data_benchmarks, structured_data_coverage, and serp_speed_benchmark rows are Deep-only, and the originality row appears on Standard and Deep scans when the analysis completes. The schema_version field is always onpage-report-customer-v1.
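
Because several sections are tier-dependent or optional, a consumer may want to confirm the schema version and see which of those sections a given report carries before reading them. The sketch below assumes that schema_version and each section name are top-level keys of the parsed JSON report; the function name is illustrative and not part of the service.

```python
EXPECTED_SCHEMA_VERSION = "onpage-report-customer-v1"

# Sections this document describes as Deep-only or optional.
# A missing key means the section was not produced for this scan.
OPTIONAL_SECTIONS = (
    "structured_data_benchmarks",
    "structured_data_coverage",
    "serp_speed_benchmark",
    "originality",
)


def present_optional_sections(report: dict) -> list[str]:
    """Return the optional sections present in a parsed report.

    Assumes schema_version and section names are top-level keys.
    """
    version = report.get("schema_version")
    if version != EXPECTED_SCHEMA_VERSION:
        raise ValueError(f"unexpected schema_version: {version!r}")
    return [name for name in OPTIONAL_SECTIONS if name in report]
```

Checking for key presence rather than a null value matches the note that originality is omitted, never emitted as null, when the analysis does not run.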

meta

Report date, target keyword, location, and URL.

Fields: report_date, target_keyword, location, url

on_page_optimization

Final On-Page Optimization Score, grade, numeric confidence, summary, and high-level focus areas.

Fields: score, grade, confidence, summary, focus_areas

benchmarks

Page-1 averages vs your page, side by side.

Fields: page1_average, your_url

entity_coverage

Entity and term coverage, including related-entity density versus competitors — the core of most optimization work.

Fields: your_url_related_entity_density_score, competitor_related_entity_density_score, natural_language_entities, highly_related_terms, keyword_variations, related_category_entities, specific_category_entities

topic_and_classification

Topic classification, swipe content, and authority questions.

Fields: page_classification, swipe_content, topical_authority_questions

internal_linking

Source pages that should link to the analyzed URL.

Fields: add_internal_links_from, to_your_url

competitor_term_coverage

Term-by-term comparison against ranking competitors.

Fields: domains, terms

structured_data_benchmarks

Deep scans only. Page-1 structured data totals vs your page, side by side.

Fields: page1_average, your_url

structured_data_coverage

Deep scans only. Compact schema type prevalence across the accepted ranking competitor cohort.

Fields: competitor_count, top_competitor_count, schema_types

serp_speed_benchmark

Deep scans only. Self-hosted page-experience benchmark of the target URL against the top 3 organic competitor URLs in the same SERP. Optional — Deep responses include the field when the benchmark payload was produced; when present, `status` indicates whether the run completed (`ok`) or was short-circuited (`disabled`, `skipped_*`, `timeout`). Lite and Standard never include it.

Fields: status, what_this_measures, measurement_type, device_profile, network_profile, cpu_profile, benchmark_version, web_vitals_version, target, competitors

originality

Standard and Deep scans only. Embedding-based originality and information-gain analysis of the page against the live ranking cohort. Optional — present when the analysis completed; omitted (never null) when it didn't run (too few competitor pages, too little scoreable text, or time budget exhausted). Lite scans never include it. See the detailed field table below.

Fields: score, grade, sentence_analysis, duplicative_passages, information_gain, chart

How to use the report for SEO

The report is most useful as a prioritization engine. Don't try to act on everything at once — focus on the highest-impact gaps first.

Optimization score — start with the benchmark

on_page_optimization gives the final score, numeric confidence, and focus areas for the page. Use it as the top-level benchmark before drilling into the supporting sections.

Entity coverage — find missing topics

Look at natural_language_entities with coverage_status of "missing". Start with the highest-importance terms. Edit existing sentences to include them rather than adding new paragraphs.
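
As a rough illustration of that triage, the sketch below filters the missing entities and orders them by importance. Only coverage_status and its "missing" value come from this page; the numeric importance field and the entity name field are assumptions about the item shape, so adjust them to the published schema.

```python
def missing_entities(report: dict, limit: int = 10) -> list[dict]:
    """Return missing natural-language entities, most important first.

    Assumes each item is a dict with "coverage_status" and a numeric
    "importance" (the latter is a hypothetical field name).
    """
    coverage = report.get("entity_coverage", {})
    entities = coverage.get("natural_language_entities", [])
    missing = [e for e in entities if e.get("coverage_status") == "missing"]
    # Items without an importance value are treated as 0 and sort last.
    missing.sort(key=lambda e: e.get("importance", 0), reverse=True)
    return missing[:limit]
```

The limit keeps the worklist short, in line with the advice to start with the highest-impact gaps rather than act on everything at once.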

Related entity density — judge depth, not just presence

Compare your_url_related_entity_density_score with competitor_related_entity_density_score to see whether important related entities appear with enough depth for the page length.

Authority questions — close topical gaps

The who/what/where/how questions in topical_authority_questions show which angles your page should address. Add the ones that fit your page intent; don't force irrelevant angles.

Internal linking — strengthen your page

add_internal_links_from lists specific source pages and suggested anchor text. These are pages on your own site that should link to the analyzed page.

Competitor coverage — benchmark yourself

Compare domains and terms to see which topics the top-ranking pages all cover that you don't. Patterns across multiple competitors are stronger signals than individual outliers.

Benchmarks — quantify the gap

page1_average vs your_url gives you numerical scores across the same metrics. Use it to prioritize the metrics with the largest gaps.
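
One way to rank those gaps is by relative difference, so metrics in different units can be compared. This is a minimal sketch that assumes page1_average and your_url are flat objects keyed by the same metric names with numeric values; the real benchmark shape may be nested, and the metric names in the usage note are made up.

```python
def largest_benchmark_gaps(benchmarks: dict) -> list[tuple[str, float]]:
    """Rank metrics by relative gap between the page-1 average and your page.

    A positive gap means your page is below the page-1 average.
    Assumes both sides are flat dicts of numeric values (an assumption).
    """
    page1 = benchmarks.get("page1_average", {})
    yours = benchmarks.get("your_url", {})
    gaps = []
    for metric, average in page1.items():
        mine = yours.get(metric)
        if not isinstance(average, (int, float)) or not isinstance(mine, (int, float)):
            continue  # skip non-numeric or missing values
        if average == 0:
            continue  # relative gap is undefined against a zero average
        gaps.append((metric, (average - mine) / average))
    gaps.sort(key=lambda item: abs(item[1]), reverse=True)
    return gaps
```

For example, with a hypothetical word_count of 1800 on page 1 against 1200 on your page, the relative gap is one third, which would rank above a heading count of 9 against 8.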

Structured data — compare schema coverage

On Deep scans, structured_data_benchmarks and structured_data_coverage show structured-data totals and compact schema.org type prevalence across the accepted ranking competitor cohort.

SERP speed (Deep only) — compare page experience

serp_speed_benchmark.target vs serp_speed_benchmark.competitors shows LCP, CLS, approximate TBT, and TTFB side-by-side with the top 3 organic competitors in the same SERP (FCP is captured at the per-probe level too). Recommend page-experience fixes only where the target is materially worse than the competitor median — skip ties and per-probe statuses other than `ok`.
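
The rule above can be sketched as a small filter. The metric key names (lcp_ms, cls, tbt_ms, ttfb_ms), the per-probe status field on target and on each competitor, and the 20% tolerance used for "materially worse" are all assumptions made for illustration; check the published schema for the real field names.

```python
from statistics import median
from typing import Optional

# Hypothetical metric keys; lower is better for all four.
METRICS = ("lcp_ms", "cls", "tbt_ms", "ttfb_ms")


def materially_worse_metrics(benchmark: Optional[dict], tolerance: float = 0.2) -> list[str]:
    """List metrics where the target is clearly worse than the competitor median."""
    # The field is optional, and only completed runs are comparable.
    if not benchmark or benchmark.get("status") != "ok":
        return []
    target = benchmark.get("target", {})
    if target.get("status") != "ok":
        return []
    # Ignore competitor probes that did not finish cleanly.
    ok_competitors = [c for c in benchmark.get("competitors", []) if c.get("status") == "ok"]
    worse = []
    for metric in METRICS:
        values = [c[metric] for c in ok_competitors if isinstance(c.get(metric), (int, float))]
        mine = target.get(metric)
        if not values or not isinstance(mine, (int, float)):
            continue
        # Ties and small differences fall inside the tolerance and are skipped.
        if mine > median(values) * (1 + tolerance):
            worse.append(metric)
    return worse
```

An empty list covers both "nothing to fix" and "benchmark not usable", so callers that need to tell those apart should inspect status separately.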

Originality (Standard/Deep) — add information, not just coverage

Rewrite duplicative_passages first — each one names the competitor domain it matches. Cover content_most_competitors_have briefly (it's table stakes), answer an uncovered topic question that fits the page intent, and close the unique_data_points gap with data only you have. Re-scan to verify the score moved.

The originality section

Standard and Deep scans run an embedding-based originality and information-gain analysis: the page's sentences are compared against the pooled sentences of the live ranking cohort, classified as original, shared, or duplicative, and the gaps are extracted as actionable evidence. The section is optional — when the analysis can't run (too few competitor pages fetched, too little scoreable text, or the time budget is exhausted) the field is omitted entirely, never emitted as null. Lite scans never include it. You can try it on any URL with the free Information Gain Checker.

Field | Type | Description
--- | --- | ---
score | integer 0–100 | Information Gain Score: the percentage of the page's scored sentences with no close semantic equivalent anywhere in the ranking cohort.
grade | string enum | One of "Highly original" (score 70+), "Moderately original" (40–69), or "Mostly shared" (below 40).
sentence_analysis | object | Sentence bucket counts: original (no close cohort equivalent), shared (similar ground covered), duplicative (near-equivalent exists), and total_scored.
duplicative_passages[] | array (up to 3) | The page's sentences that most closely match a competitor: snippet plus the matched_domain it matched. First candidates for a rewrite.
information_gain.content_most_competitors_have[] | array (up to 5) | Content clusters shared across multiple competitor domains that the page does not cover: snippet plus competitor_count.
information_gain.potential_uncovered_topics_for_information_gain[] | string array (up to 5) | Topic questions almost no competitor answers and the page doesn't answer either; whitespace where added information is cheapest.
information_gain.unique_data_points | object | Numeric data points unique to the page: your_count, page1_average (number or null), and examples (up to 8 tokens).
chart | string or null | Preformatted text chart (score gauge + sentence buckets) for terminal/agent display. Render as-is in monospace.

Example payload (illustrative values — no real domains):

"originality": {
  "score": 62,
  "grade": "Moderately original",
  "sentence_analysis": {
    "original": 58,
    "shared": 29,
    "duplicative": 7,
    "total_scored": 94
  },
  "duplicative_passages": [
    {
      "snippet": "An opening definition that restates the consensus answer…",
      "matched_domain": "competitor-example.com"
    }
  ],
  "information_gain": {
    "content_most_competitors_have": [
      {
        "snippet": "A basic checklist most ranking pages include…",
        "competitor_count": 6
      }
    ],
    "potential_uncovered_topics_for_information_gain": [
      "What does this cost at different team sizes?",
      "How do results differ for non-English content?"
    ],
    "unique_data_points": {
      "your_count": 4,
      "page1_average": 9.5,
      "examples": ["37%", "4.2:1", "120ms"]
    }
  },
  "chart": "Originality: 62/100  Grade: Moderately original\n…"
}
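
A consumer has to treat the section as optional and can cross-check the grade against the score bands in the field table. The sketch below assumes originality is a top-level key of the parsed report; both function names are illustrative and not part of the service.

```python
from typing import Optional


def originality_grade(score: int) -> str:
    """Map a 0-100 score to the grade bands listed in the field table."""
    if score >= 70:
        return "Highly original"
    if score >= 40:
        return "Moderately original"
    return "Mostly shared"


def summarize_originality(report: dict) -> Optional[str]:
    """Return a one-line summary, or None when the section was omitted."""
    originality = report.get("originality")
    if originality is None:
        # Omitted section: the analysis did not run, or this is a Lite scan.
        return None
    score = originality["score"]
    buckets = originality["sentence_analysis"]
    return (
        f"{score}/100 ({originality_grade(score)}): "
        f"{buckets['original']} of {buckets['total_scored']} scored sentences are original"
    )
```

With the example payload above, 58 original sentences out of 94 scored is about 61.7%, which is consistent with the reported score of 62 and the "Moderately original" grade.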

Legacy field mapping

If you're migrating from the older report format, here's how the section names map to the current schema.

Legacy name | Current path
--- | ---
Date / TargetKeyword / Location / URL | meta
OnPageOptimizationScore | on_page_optimization
Page1AverageVsYourUrl | benchmarks
NaturalLanguageAnalysis | entity_coverage.natural_language_entities
HighlyRelatedWords | entity_coverage.highly_related_terms
KeywordVariations | entity_coverage.keyword_variations
RelatedCategory | entity_coverage.related_category_entities
SpecificCategoryEntities | entity_coverage.specific_category_entities
PageClassification | topic_and_classification.page_classification
SwipeContent | topic_and_classification.swipe_content
TopicalAuthorityQuestions | topic_and_classification.topical_authority_questions
Internal Link Recommendations | internal_linking
CompetitorAnalysis | competitor_term_coverage
StructuredDataBenchmarks | structured_data_benchmarks
StructuredDataCoverage | structured_data_coverage
Originality | originality

Machine-readable schemas

Two JSON Schemas are published as MCP resources, one shared by Standard and Deep scans and one for Lite scans:

  • schema://customer-report-v1 — full shape for Standard and Deep scans (response_format=customer_v1). Carries benchmarks, entity coverage including related-entity density, topic and classification, internal linking, and competitor term coverage. Standard and Deep scans may also carry the originality section, and Deep scans the structured data benchmark and coverage sections.
  • schema://customer-report-v1-lite — reduced shape for Lite scans (response_format=customer_v1_lite). Keeps benchmarks, entity coverage including related-entity density (natural language entities, highly related terms, keyword variations only), and competitor term coverage. Omits topic_and_classification and internal_linking entirely — those are not computed for Lite scans — and never emits originality.

The service picks the right schema automatically based on the job's depth. Explicitly requesting customer_v1 on a Lite job (or customer_v1_lite on a Standard/Deep job) returns a 409 UNSUPPORTED_FORMAT — omit the parameter or use the matching value.
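
A client can avoid the 409 by checking the requested format against the job depth before sending it. This is a minimal sketch of that pairing; the depth strings "lite", "standard", and "deep" are assumed spellings, and the helper names are hypothetical.

```python
from typing import Optional

# Pairing described above: Lite uses the reduced shape, Standard and Deep the full one.
FORMAT_BY_DEPTH = {
    "lite": "customer_v1_lite",
    "standard": "customer_v1",
    "deep": "customer_v1",
}


def response_format_for(depth: str) -> str:
    """Return the response_format value that matches a job depth."""
    try:
        return FORMAT_BY_DEPTH[depth.lower()]
    except KeyError:
        raise ValueError(f"unknown depth: {depth!r}") from None


def is_request_supported(depth: str, requested_format: Optional[str]) -> bool:
    """True when omitting the parameter or requesting the matching format."""
    return requested_format is None or requested_format == response_format_for(depth)
```

Passing None models omitting the parameter, which is always accepted because the service then picks the schema itself.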