Bulk up to 100 • Robots + Sitemap health

Sitemap vs Robots Checker

Paste domains or URLs and we’ll fetch /robots.txt, extract Sitemap: directives (or probe common sitemap URLs when none are declared), and flag missing or broken sitemaps.

Paste up to 100 domains/URLs (one per line)
Tip: If robots.txt contains User-agent: * with Disallow: /, compliant crawlers are blocked from the whole site. A sitemap can still exist, but discovery and indexing will suffer.

Results

Host Robots + Sitemap HTTP Hops Time Issues
“Final status” is the combined health of robots + sitemap.

Quick interpretation

Robots controls crawling. Sitemap helps discovery. Both matter.

  • OK: robots.txt fetches, there is no global block, and the sitemap resolves (200/3xx)
  • Warning: robots.txt missing, sitemap missing, odd content-type, no User-agent: * group, or a global block
  • Error: fetch errors, redirect loops, or HTTP 4xx/5xx on robots.txt or the sitemap
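The three levels above can be read as a severity rollup: any error-level issue wins, then any warning, otherwise OK. Here is a minimal sketch of that idea. The issue codes and the function name are hypothetical, chosen for illustration, and are not the tool's internal names.

```python
# Hypothetical issue codes; the tool's real identifiers are not documented here.
ERROR_ISSUES = {"fetch_error", "redirect_loop", "http_4xx", "http_5xx"}
WARNING_ISSUES = {
    "robots_missing",
    "sitemap_missing",
    "odd_content_type",
    "no_ua_star",
    "global_block",
}


def final_status(issues):
    """Roll a collection of issue codes up into 'Error', 'Warning', or 'OK'."""
    found = set(issues)
    if found & ERROR_ISSUES:
        return "Error"      # any error-level issue dominates
    if found & WARNING_ISSUES:
        return "Warning"
    return "OK"
```

For example, a host with both a global block and a 5xx on the sitemap would roll up to Error, since the most severe issue decides the result.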
Crawl + discovery

Sitemap vs Robots Checker: spot indexing blockers fast

Robots.txt can accidentally block your whole site, while a sitemap helps search engines find your pages. This bulk tool checks both signals together and highlights risky patterns.

What we flag

  • Global block: User-agent: * + Disallow: /
  • No sitemap: no Sitemap: directive and common sitemaps missing
  • Broken sitemap: 4xx/5xx, fetch errors, redirect loops
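To illustrate how hops and redirect loops can be detected, here is a rough sketch that follows redirects by hand. It assumes a hypothetical `fetch` callable that returns `(status, location)` for a URL, so the logic can be tried without network access. The `max_hops=5` limit is an assumption, not a documented setting of this tool.

```python
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


def follow_redirects(url, fetch, max_hops=5):
    """Follow redirects manually; return (final_url, status, hops, issue)."""
    seen = {url}
    hops = 0
    while True:
        status, location = fetch(url)  # hypothetical fetcher: (status, Location header)
        if status not in REDIRECT_CODES or not location:
            return url, status, hops, None
        next_url = urljoin(url, location)  # resolve relative Location values
        hops += 1
        if next_url in seen:
            return next_url, status, hops, "redirect loop"
        if hops >= max_hops:
            return next_url, status, hops, "too many redirects"
        seen.add(next_url)
        url = next_url
```

Tracking visited URLs in a set catches loops of any length, while the hop limit guards against long redirect chains that never repeat.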

FAQ

What’s the difference between robots.txt and sitemap.xml?

robots.txt controls crawl rules (allow/disallow) and can also point to sitemaps via Sitemap: directives. sitemap.xml lists URLs you want search engines to discover. They solve different problems, and you usually want both.
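As an illustration of how Sitemap: directives sit alongside crawl rules, here is a small sketch that pulls them out of a robots.txt body. The function name is hypothetical. It treats the field name case-insensitively and strips `#` comments, which means a sitemap URL containing a `#` fragment would be truncated.

```python
def extract_sitemaps(robots_txt):
    """Return the URLs from Sitemap: lines, in file order."""
    urls = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop trailing comments
        key, sep, value = line.partition(":")    # split on the first colon only
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


sample = """\
User-agent: *
Disallow: /admin/
sitemap: https://example.com/sitemap.xml  # main
Sitemap: https://example.com/news.xml
"""
print(extract_sitemaps(sample))
```

Python's standard library also exposes `urllib.robotparser.RobotFileParser.site_maps()` (Python 3.8+), which may be preferable when full rule parsing is needed as well.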

Is it a problem if robots.txt is missing?

Not always. Many sites work fine without robots.txt. But if you do have one, a bad rule can block crawling, so checking robots.txt is still important.

What does “global block” mean?

It usually means robots.txt has a User-agent: * group containing Disallow: /, which tells every crawler without a more specific group to stay out of the whole site. A sitemap can still exist, but compliant bots won’t crawl URLs that are disallowed.
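A global block can be detected with a small group-aware scan of robots.txt. The sketch below is a simplified illustration with a hypothetical function name. It only looks for an exact `Disallow: /` inside a group that names `*`, and it does not account for Allow rules that might re-open parts of the site.

```python
def has_global_block(robots_txt):
    """True if a group that includes User-agent: * contains Disallow: /."""
    agents = []        # user agents named by the group currently being read
    in_rules = False   # becomes True once the current group has a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            if in_rules:   # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif key in ("allow", "disallow"):
            in_rules = True
            if key == "disallow" and value == "/" and "*" in agents:
                return True
    return False
```

Note that a `Disallow: /` aimed only at a named bot is not reported, because it does not affect crawlers in general.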

My robots has no “Sitemap:” lines. Is that bad?

Not necessarily. Some sites don’t declare one. This tool probes common locations like /sitemap.xml and /sitemap_index.xml to reduce false “missing sitemap” results. Still, adding a Sitemap: line to robots.txt is good practice.
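The fallback described above can be sketched as a candidate list: use declared sitemaps when there are any, otherwise guess common paths on the site's origin. The function and constant names are hypothetical, and defaulting bare domains to `https://` is an assumption made for this example.

```python
from urllib.parse import urlsplit

# The two locations named in the text; a real checker might probe more.
COMMON_SITEMAP_PATHS = ["/sitemap.xml", "/sitemap_index.xml"]


def sitemap_candidates(site, declared):
    """Return declared sitemap URLs, or common guesses when none are declared."""
    if declared:
        return list(declared)
    # Assumption: bare domains such as "example.com" are treated as https.
    parts = urlsplit(site if "://" in site else "https://" + site)
    origin = f"{parts.scheme}://{parts.netloc}"
    return [origin + path for path in COMMON_SITEMAP_PATHS]
```

Calling `sitemap_candidates("example.com", [])` yields the two guessed URLs on `https://example.com`, while any declared URLs are returned unchanged.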

Why do you show “odd content-type” for sitemap?

Sitemaps should usually be XML (or sometimes text). If the server returns HTML, it can be a soft error page (like a 200 OK “Not found” HTML page), or a WAF interstitial.
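A content-type check along these lines can be sketched as follows. The function name, the accepted media types, and the returned messages are assumptions for illustration. Gzip types are accepted here because sitemaps are often served compressed.

```python
ACCEPTED_TYPES = {
    "application/xml",
    "text/xml",
    "text/plain",
    "application/gzip",
    "application/x-gzip",
}


def sitemap_type_issue(content_type, body_start):
    """Return an issue string for a suspicious sitemap response, else None."""
    media = content_type.split(";", 1)[0].strip().lower()  # drop charset etc.
    head = body_start.lstrip().lower()
    # HTML served with a 200 is the classic soft-404 or WAF interstitial.
    if media == "text/html" or head.startswith(("<!doctype html", "<html")):
        return "odd content-type: HTML"
    if media in ACCEPTED_TYPES or media.endswith("+xml"):
        return None
    return f"odd content-type: {media or 'missing'}"
```

Sniffing the first bytes of the body as well as the header catches servers that label an HTML error page as XML.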

Why can sitemap be OK but indexing still slow?

A sitemap helps discovery, but indexing depends on quality, internal linking, canonical tags, server performance, duplicates, and crawl budget. This tool checks availability, not indexing outcomes.

Why do I see 403 / 429 / status 0?

403 and 429 are commonly caused by WAF/rate limits. Status 0 often means connection/TLS/DNS problems, or the server closed the request.
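As a rough aid for reading the HTTP column, the sketch below maps a recorded status to a likely cause. The function name and wording are hypothetical, and status 0 is assumed to be the value recorded when no HTTP response was received at all.

```python
def explain_status(status):
    """Map a recorded status code to a short, likely explanation."""
    if status == 0:
        return "no HTTP response: connection, TLS, or DNS problem"
    if status == 403:
        return "forbidden: often a WAF or bot-protection rule"
    if status == 429:
        return "too many requests: rate limited"
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"
    if 400 <= status < 500:
        return "client error"
    if 500 <= status < 600:
        return "server error"
    return "unexpected status"
```

These are likely causes rather than certainties: a 403 can also come from ordinary access rules on the server.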

Can a sitemap be blocked by robots.txt?

The sitemap file itself can be blocked (by server rules) and URLs inside the sitemap can be disallowed. If crawlers are globally blocked, the sitemap won’t help much for crawling.
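To check whether robots.txt rules cover a sitemap URL or the URLs listed inside it, Python's standard `urllib.robotparser` can evaluate them offline. This is a minimal sketch with a hypothetical helper name. It tests against the `*` user agent by default and reflects the standard library's matching rules, which may differ in detail from those of individual search engines.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the subset of urls that robots_txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())   # parse text directly, no network fetch
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


robots = "User-agent: *\nDisallow: /private/\n"
urls = [
    "https://example.com/sitemap.xml",
    "https://example.com/private/page.html",
]
print(blocked_urls(robots, urls))
```

Running the same check over every URL in a sitemap is a quick way to spot entries that crawlers are told not to fetch.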

Does this tool check all sitemaps listed in robots.txt?

Not in this version. It checks only the first candidate sitemap (the first one declared in robots.txt, or a common guess if none are declared). Any additional Sitemap: lines are not fetched or reported individually.

What does the CSV export include?

It exports robots URL/final URL/status, key robots signals (UA:*, Disallow:/, sitemap count), the tested sitemap URL/final URL/status, content-type/size, and the final summary issues.
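For a sense of what such an export could look like, here is a sketch using Python's `csv` module. The column names are hypothetical and only mirror the fields listed above (plus the host), so the tool's real header row may differ.

```python
import csv
import io

# Hypothetical column names modelled on the fields described in the text.
CSV_COLUMNS = [
    "host",
    "robots_url", "robots_final_url", "robots_status",
    "has_ua_star", "has_disallow_all", "sitemap_count",
    "sitemap_url", "sitemap_final_url", "sitemap_status",
    "sitemap_content_type", "sitemap_size",
    "issues",
]


def rows_to_csv(rows):
    """Serialize result dicts to CSV text; missing fields become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=CSV_COLUMNS,
        restval="",              # fill absent keys with an empty string
        extrasaction="ignore",   # silently drop keys that are not columns
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        row = dict(row)          # copy so the caller's dict is not modified
        row["issues"] = "; ".join(row.get("issues", []))
        writer.writerow(row)
    return buf.getvalue()
```

Joining the issues with a semicolon keeps them in a single cell, so each host stays on one row.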