Highest score-gain-per-minute fixes for this page.
html.lang-attr
<html> element is missing a lang attribute
Fix: Add lang="en" (or appropriate language code) to the <html> element
<html lang="en">
sd.twitter-card.present
No twitter:card meta tag found
Fix: Add <meta name="twitter:card" content="summary_large_image">
<meta name="twitter:card" content="summary_large_image">
crawl.canonical.matches
Cannot check canonical match — no canonical tag present
Fix: Add <link rel="canonical" href="..."> matching the final URL
<link rel="canonical" href="https://expressjs.com">
crawl.canonical.present
No <link rel="canonical"> found in <head>
Fix: Add <link rel="canonical" href="..."> to the <head>
<link rel="canonical" href="https://expressjs.com">
sd.title-meta.quality
Issues: title missing; meta description missing
Fix: Title should be 10–60 chars; meta description should be 50–160 chars
<title>Your page title</title>
<meta name="description" content="One-sentence summary of what this page is — 50–160 chars, written for both humans and LLM agents.">
crawl.ai-plugin-manifest.present
~60m
score: 0
/.well-known/ai-plugin.json not found
Fix: Publish /.well-known/ai-plugin.json to declare your ChatGPT/OpenAI plugin metadata
Why: /.well-known/ai-plugin.json was defined by OpenAI for ChatGPT plugins. Many agentic systems still look for this file to auto-discover capabilities — it's effectively the /robots.txt of agent APIs.
// /.well-known/ai-plugin.json
{
  "schema_version": "v1",
  "name_for_human": "Your Product",
  "name_for_model": "your_product",
  "description_for_model": "API for Your Product. Use to answer questions about this product.",
  "api": {
    "type": "openapi",
    "url": "https://expressjs.com/openapi.json"
  }
}
crawl.auth-wall
~0m
score: 100
No auth-wall signals detected
Why: Auth-walled pages require login before crawlers can read content. Flagged for awareness — most AI agents can't authenticate and will see only the login form.
crawl.bot-block
~60m
score: 100
No bot-blocking signals detected
Why: WAF challenges that don't recognize legitimate AI bots silently lock you out of LLM training/inference.
crawl.canonical.matches
~3m
score: 0
Cannot check canonical match — no canonical tag present
Fix: Add <link rel="canonical" href="..."> matching the final URL
Why: Mismatched canonical sends agents to a different URL than the one they fetched.
<link rel="canonical" href="https://expressjs.com">
crawl.canonical.present
~3m
score: 0
No <link rel="canonical"> found in <head>
Fix: Add <link rel="canonical" href="..."> to the <head>
Why: LLM crawlers use canonical to dedupe and pick the authoritative URL.
<link rel="canonical" href="https://expressjs.com">
crawl.hreflang-alternates
~30m
score: 100
No hreflang alternate links found
Why: hreflang tells agents which language/region variant is canonical for each locale, reducing duplicate-content signals across international site variations.
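As a rough illustration of what this check looks for, the sketch below collects `<link rel="alternate" hreflang="...">` entries from a page's HTML using only the Python standard library. The class name `HreflangCollector` and the helper `hreflang_alternates` are hypothetical, and this is not necessarily how the audit tool itself detects alternates.

```python
from html.parser import HTMLParser


class HreflangCollector(HTMLParser):
    """Collects hreflang -> href pairs from <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.alternates = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_tokens = (attributes.get("rel") or "").lower().split()
        hreflang = attributes.get("hreflang")
        href = attributes.get("href")
        # Only count links that declare both a language and a target URL.
        if "alternate" in rel_tokens and hreflang and href:
            self.alternates[hreflang.lower()] = href


def hreflang_alternates(html: str) -> dict:
    """Return a mapping of language/region code to alternate URL."""
    collector = HreflangCollector()
    collector.feed(html)
    collector.close()
    return collector.alternates
```

An empty result corresponds to the "No hreflang alternate links found" finding above.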
crawl.http-status
~30m
score: 100
Primary fetch returned HTTP 200
Why: Non-2xx responses tell agents the page doesn't exist or is broken — score capped at 0.
crawl.js-divergence
~120m
score: 100
Static and browser bodies within 0% — no significant JS divergence
Why: When static HTML and browser-rendered HTML diverge heavily, AI crawlers that can't execute JS only see the shell — structured data, headings, and content are invisible.
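The report does not say how the divergence percentage is computed. One plausible reading, assumed here, is the relative difference in byte length between the static body and the browser-rendered body. The function name `js_divergence` is hypothetical.

```python
def js_divergence(static_html: str, rendered_html: str) -> float:
    """Relative byte-size difference between two bodies, from 0.0 to 1.0.

    Assumption: divergence is measured on body size only. A real check
    might compare extracted text or DOM structure instead.
    """
    static_len = len(static_html.encode("utf-8"))
    rendered_len = len(rendered_html.encode("utf-8"))
    largest = max(static_len, rendered_len)
    if largest == 0:
        return 0.0  # two empty bodies do not diverge
    return abs(rendered_len - static_len) / largest
```

Under this assumption, a page whose rendered body is twice the size of its static body would report 50% divergence.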
crawl.llms-txt.present
~30m
score: 100
llms.txt is present
Why: /llms.txt is the emerging standard for site-level guidance to LLM agents.
# https://expressjs.com/llms.txt
# (https://llmstxt.org)
# Your page title
> One-paragraph site summary aimed at an LLM agent.
## Docs
- [Getting Started](https://expressjs.com/docs)
## Optional
- [Changelog](https://expressjs.com/changelog)
crawl.mcp-manifest.present
~60m
score: 0
/.well-known/mcp.json not found
Fix: Publish /.well-known/mcp.json to declare MCP tool endpoints for AI agent discovery
Why: /.well-known/mcp.json is the emerging discovery endpoint for MCP (Model Context Protocol) tool servers. AI agents that support MCP look here first before prompting a user to configure a server manually.
// /.well-known/mcp.json
{
  "version": "1.0",
  "name": "Your Product",
  "tools": [
    {
      "name": "search",
      "url": "https://expressjs.com/mcp/search"
    }
  ]
}
crawl.redirect-chain
~15m
score: 100
Redirect chain length: 0
Why: Long redirect chains burn crawl budget and can lose query/state across hops.
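To make "redirect chain length" concrete, here is a minimal sketch that follows hops until a URL stops redirecting. To stay runnable offline it takes a hypothetical `next_hop` callable (URL in, redirect target or `None` out) instead of issuing real HTTP requests. The names `redirect_chain` and `max_hops` are assumptions, not part of the audit tool.

```python
from typing import Callable, List, Optional


def redirect_chain(
    start_url: str,
    next_hop: Callable[[str], Optional[str]],
    max_hops: int = 10,
) -> List[str]:
    """Return every URL visited, starting with start_url.

    The chain length reported by a check like this would be
    len(result) - 1, so a URL that answers directly has length 0.
    """
    chain = [start_url]
    seen = {start_url}
    while len(chain) - 1 < max_hops:
        target = next_hop(chain[-1])
        if target is None:
            break  # final URL reached
        chain.append(target)
        if target in seen:
            break  # redirect loop detected, stop following
        seen.add(target)
    return chain
```

In practice `next_hop` would wrap an HTTP client configured not to follow redirects and return the `Location` header of any 3xx response.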
crawl.robots.bot-allow.applebot-extended
~5m
score: 100
applebot-extended is allowed in robots.txt
Why: Explicitly allowing AI crawlers in robots.txt removes the ambiguity that defaults to 'block' in some agent stacks.
# in robots.txt
User-agent: GPTBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
crawl.robots.bot-allow.bytespider
~5m
score: 100
bytespider is allowed in robots.txt
Why: Explicitly allowing AI crawlers in robots.txt removes the ambiguity that defaults to 'block' in some agent stacks.
# in robots.txt
User-agent: GPTBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
crawl.robots.bot-allow.ccbot
~5m
score: 100
ccbot is allowed in robots.txt
Why: Explicitly allowing AI crawlers in robots.txt removes the ambiguity that defaults to 'block' in some agent stacks.
# in robots.txt
User-agent: GPTBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
crawl.robots.bot-allow.claudebot
~5m
score: 100
claudebot is allowed in robots.txt
Why: Explicitly allowing AI crawlers in robots.txt removes the ambiguity that defaults to 'block' in some agent stacks.
# in robots.txt
User-agent: GPTBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
crawl.robots.bot-allow.google-extended
~5m
score: 100
google-extended is allowed in robots.txt
Why: Explicitly allowing AI crawlers in robots.txt removes the ambiguity that defaults to 'block' in some agent stacks.
# in robots.txt
User-agent: GPTBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
crawl.robots.bot-allow.gptbot
~5m
score: 100
gptbot is allowed in robots.txt
Why: Explicitly allowing AI crawlers in robots.txt removes the ambiguity that defaults to 'block' in some agent stacks.
# in robots.txt
User-agent: GPTBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
crawl.robots.bot-allow.perplexitybot
~5m
score: 100
perplexitybot is allowed in robots.txt
Why: Explicitly allowing AI crawlers in robots.txt removes the ambiguity that defaults to 'block' in some agent stacks.
# in robots.txt
User-agent: GPTBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
crawl.robots.present
~5m
score: 100
robots.txt is present
Why: robots.txt is the first file every crawler checks.
# https://expressjs.com/robots.txt
User-agent: *
Allow: /
User-agent: GPTBot
Allow: /
User-agent: ClaudeBot
Allow: /
Sitemap: https://expressjs.com/sitemap.xml
crawl.sitemap.discoverable
~20m
score: 50
Sitemap referenced in robots but file returned 4xx — deploy it
Fix: Deploy a valid sitemap.xml at the referenced URL
Why: Without a sitemap, multi-page agents can't enumerate your site reliably.
# Add to robots.txt:
Sitemap: https://expressjs.com/sitemap.xml
crawl.sitemap.parseable
~30m
score: 0
Sitemap has 0 parseable URLs or is absent
Fix: Ensure sitemap.xml is valid XML with at least one <loc>
Why: Empty/invalid sitemaps look the same to a crawler as none at all.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://expressjs.com/</loc></url>
  <!-- add one <url><loc>…</loc></url> per public page -->
</urlset>
crawl.x-robots-header
~10m
score: 100
No X-Robots-Tag or robots meta tag detected
Why: X-Robots-Tag: noindex tells crawlers (and AI agents) to skip the page. noai and none are increasingly used to opt out of AI training — verify this is intentional.
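A small sketch of how an `X-Robots-Tag` header value could be scanned for the directives named above. The directive set (`noindex`, `none`, `noai`) comes from this rule's description; the function name `blocking_directives` is hypothetical, and the handling of a leading `botname:` scope is a simplifying assumption.

```python
BLOCKING_DIRECTIVES = {"noindex", "none", "noai"}


def blocking_directives(header_value: str) -> set:
    """Return the blocking directives present in an X-Robots-Tag value."""
    found = set()
    for part in header_value.split(","):
        directive = part.strip().lower()
        # Drop an optional "botname:" prefix such as "googlebot: noindex".
        # Simplification: keeps only the text after the last colon.
        directive = directive.rsplit(":", 1)[-1].strip()
        if directive in BLOCKING_DIRECTIVES:
            found.add(directive)
    return found
```

An empty set matches the "No X-Robots-Tag or robots meta tag detected" outcome; any other result is worth confirming as intentional.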
sd.jsonld.count
~15m
score: 0
No JSON-LD blocks found
Fix: Add at least one <script type="application/ld+json"> block
Why: JSON-LD is how LLMs reliably extract entity facts (org, products, articles) without parsing prose.
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Your page title",
  "url": "https://expressjs.com"
}
</script>
sd.jsonld.entity-types
~20m
score: 0
No valid JSON-LD blocks to check for entity types
Fix: Add Organization, WebSite, or Article/Product/Service JSON-LD
Why: Generic JSON-LD without recognised @type doesn't disambiguate the entity for agents.
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "name": "Your page title",
  "url": "https://expressjs.com"
}
</script>
sd.jsonld.valid
~10m
score: 100
No JSON-LD blocks to validate (skipped)
Why: Invalid JSON-LD silently breaks structured-data parsers — they just skip the block.
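The following sketch shows one way to reproduce this check locally: pull every `<script type="application/ld+json">` block out of the HTML and try to parse it as JSON. It only tests JSON syntax, not schema.org semantics, and the names `JsonLdCollector` and `validate_jsonld` are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of each JSON-LD script block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def validate_jsonld(html: str) -> list:
    """Return one boolean per JSON-LD block: True if it parses as JSON."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    results = []
    for raw in collector.blocks:
        try:
            json.loads(raw)
            results.append(True)
        except json.JSONDecodeError:
            results.append(False)
    return results
```

An empty list means there were no blocks to validate, which is the "skipped" case reported above.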
sd.opengraph.complete
~10m
score: 0
Open Graph missing: title, description, type, image, url
Fix: Add missing og: meta tags: og:title, og:description, og:type, og:image, og:url
Why: OpenGraph drives link previews everywhere (Slack, iMessage, agent UIs that render share cards).
<meta property="og:title" content="Your page title">
<meta property="og:description" content="One-sentence summary of what this page is — 50–160 chars, written for both humans and LLM agents.">
<meta property="og:type" content="website">
<meta property="og:url" content="https://expressjs.com">
<meta property="og:image" content="https://expressjs.com/og-image.png">
sd.title-meta.quality
~5m
score: 0
Issues: title missing; meta description missing
Fix: Title should be 10–60 chars; meta description should be 50–160 chars
Why: <title> + meta description are the snippet agents quote when summarising or citing your page.
<title>Your page title</title>
<meta name="description" content="One-sentence summary of what this page is, 50–160 chars, written for both humans and LLM agents.">
sd.twitter-card.present
~2m
score: 0
No twitter:card meta tag found
Fix: Add <meta name="twitter:card" content="summary_large_image">
Why: twitter:card extends OpenGraph with X/Twitter-specific layout hints.
<meta name="twitter:card" content="summary_large_image">
html.heading-hierarchy
~30m
score: 0
No heading elements found
Fix: Add an <h1> and a logical heading hierarchy
Why: Headings give agents the document outline. No h1 = no clear topic.
<!-- Document outline -->
<h1>Page topic in one sentence</h1>
<section>
  <h2>Section heading</h2>
  <h3>Sub-section</h3>
</section>
html.img-alt-coverage
~30m
score: 100
no <img> elements found, rule skipped
Why: Alt text is how agents (and screen readers) caption images. Missing alt = invisible content.
<img src="/diagram.png" alt="Concise description of what the diagram shows">
html.landmarks
~30m
score: 0
No semantic landmark elements found
Fix: Add missing landmark elements: main, nav, header, footer, article
Why: Landmark elements let agents skip nav/footer chrome and zero in on the main content.
<header>
  <nav>…site nav…</nav>
</header>
<main>
  <article>…page content…</article>
</main>
<footer>…site footer…</footer>
html.lang-attr
~1m
score: 0
<html> element is missing a lang attribute
Fix: Add lang="en" (or appropriate language code) to the <html> element
Why: Without lang, agents can't decide whether to translate or which language model to apply.
<html lang="en">
html.link-text-quality
~20m
score: 100
no <a> elements found, rule skipped
Why: "click here" / "learn more" tells agents nothing about the destination. Use descriptive link text.
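For pages that do contain links, a check along these lines could flag non-descriptive anchor text. The phrase list is an assumption that extends the two examples given above, and the names `LinkTextCollector` and `generic_links` are hypothetical.

```python
from html.parser import HTMLParser

# Assumed list of generic phrases; extend it to suit the site.
GENERIC_LINK_TEXT = {"click here", "here", "learn more", "read more", "more"}


class LinkTextCollector(HTMLParser):
    """Collects the visible text of each <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._in_link = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            # Collapse runs of whitespace inside the link text.
            self.links.append(" ".join("".join(self._text).split()))
            self._in_link = False


def generic_links(html: str) -> list:
    """Return the link texts that match a generic phrase."""
    collector = LinkTextCollector()
    collector.feed(html)
    collector.close()
    return [
        text for text in collector.links
        if text.lower().rstrip(".!") in GENERIC_LINK_TEXT
    ]
```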
content.action-surface
~0m
score: 100
Action surface: 0 entry points — 0 form(s), 0 button(s), 0 mailto, 0 tel, 0 ARIA button(s)
Why: Forms, CTA buttons, and contact links are conversion entry points. Agent-driven buyers (using AI to evaluate vendors) look for these signals when deciding whether a site is a real business.
content.boilerplate-ratio
~90m
score: 0
Readability extraction returned null — cannot compute boilerplate ratio
Fix: Ensure page has extractable body content
Why: Lots of nav/footer chrome relative to page body dilutes the signal in agent excerpts.
content.readability-extract
~60m
score: 0
Readability extraction returned null — page may be empty or unparseable
Fix: Ensure the page has substantive body text (≥250 chars) that readability can extract
Why: If readability extraction returns null, agents likely can't isolate the article body.
content.readability-signal
~0m
score: 100
Mozilla Readability returned null — page is not extractable as an article
Why: Mozilla Readability is the same library Firefox Reader View uses; if it can't extract your content, neither can most third-party article scrapers.
content.script-density
~60m
score: 100
1 script/iframe elements (1 <script>, 0 cross-origin <iframe>)
Why: Heavy client-side script density is a red flag that content depends on JS to render.
content.signal-noise
~90m
score: 0
Signal-noise ratio: 0.0% (0 text bytes / 561 total bytes)
Fix: Reduce script/style bloat or increase substantive text content
Why: High script/style-to-text ratio buries the actual content agents are trying to extract.
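The ratio reported above (text bytes over total bytes) can be approximated as in the sketch below, which treats everything outside `<script>` and `<style>` as text. The exact definition of "text bytes" used by the audit is not stated, so this is an assumption, and the names `VisibleText` and `signal_noise_ratio` are hypothetical.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text nodes that are not inside <script> or <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def signal_noise_ratio(html: str) -> float:
    """Return text bytes divided by total bytes, from 0.0 to 1.0."""
    total_bytes = len(html.encode("utf-8"))
    if total_bytes == 0:
        return 0.0
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so indentation does not count as text.
    text = " ".join("".join(parser.chunks).split())
    return len(text.encode("utf-8")) / total_bytes
```

A page consisting only of a script tag yields 0.0, consistent with the 0 text bytes out of 561 reported for this page.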
det.bot-cloaking
~60m
score: 100
gptbot and browser fetches returned identical content
Why: Different content to GPTBot vs a browser is the textbook cloaking pattern. Search engines flag this.
det.fetch-stability
~60m
score: 100
Two luxfaber fetches returned identical body hash
Why: If the same URL serves different bytes on every fetch, no agent can cache/cite reliably.
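A body-hash comparison of this kind can be sketched in a few lines. SHA-256 is an assumption, since the report does not name the hash function, and `body_hash` and `is_stable` are hypothetical names.

```python
import hashlib


def body_hash(body: bytes) -> str:
    """Hex digest of a response body (SHA-256 assumed)."""
    return hashlib.sha256(body).hexdigest()


def is_stable(first_body: bytes, second_body: bytes) -> bool:
    """True when two fetches of the same URL returned identical bytes."""
    return body_hash(first_body) == body_hash(second_body)
```

Pages that embed timestamps, nonces, or per-request tokens in the HTML would fail this comparison even when the visible content is unchanged.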
det.ua-cloaking
~60m
score: 100
luxfaber and browser fetches returned identical content
Why: Serving different content to a Luxfaber UA vs a browser UA = SEO/agent cloaking risk.