robots.txt allow/deny tester

Paste a robots.txt, name a URL and a user-agent. See whether the crawler is allowed, which rule matched, and how Google's longest-match-wins precedence resolved it.

robots.txt

url or path

user-agent

user-agent presets

scenario presets

Matching group

All rules in the matching group

kind	pattern	len

How precedence works

The original 1994 spec said the first matching rule wins. Google's REP proposal, standardized as RFC 9309 (published 2022), inverted that: the most specific matching rule wins, defined as the one with the longest pattern, with wildcard characters counted literally. On a length tie, Allow beats Disallow. Bingbot, Yandex, and most modern crawlers follow the same rule. This tool applies the Google/REP precedence.
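The precedence rule above can be sketched in a few lines. This is a minimal illustration, not the tool's actual source: the `Rule` type and the `pick_winner` and `is_allowed` names are hypothetical, the input is assumed to contain only rules that already matched the path, and length is taken as the character count of the pattern as written. Treating "no rule matched" as allowed follows RFC 9309.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    kind: str     # "allow" or "disallow"
    pattern: str  # pattern exactly as written in robots.txt


def pick_winner(matching):
    """Return the winning rule among rules that already matched the path."""
    if not matching:
        return None
    # Longest pattern first; on equal length, Allow (True) outranks Disallow.
    return max(matching, key=lambda r: (len(r.pattern), r.kind == "allow"))


def is_allowed(matching):
    """A path is allowed when no rule matched or the winner is an Allow."""
    winner = pick_winner(matching)
    return winner is None or winner.kind == "allow"
```

For example, with `Disallow: /private/` and `Allow: /private/public/` both matching, the Allow wins because its pattern is longer.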

token	meaning
User-agent: name	open a group of rules for the named crawler. Multiple consecutive User-agent lines apply the group to all of them.
User-agent: *	wildcard group. Used only when no agent-specific group matched.
Disallow: path	block paths starting with this pattern. Empty value means allow everything.
Allow: path	permit paths starting with this pattern. Used to carve exceptions inside a Disallow.
*	wildcard inside a pattern. Matches any sequence of characters including slashes.
$	end-of-path anchor. Only valid as the last character.
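As a rough illustration of how the tokens in the table combine into groups, here is a small parser sketch. It is an assumption about one reasonable implementation, not the tool's code: the `parse_robots` name and the dict shape are hypothetical, comments after # are stripped, fields other than User-agent, Allow, and Disallow are skipped, and an empty Allow or Disallow value is dropped because it can never match a path.

```python
def parse_robots(text):
    """Parse robots.txt text into a list of groups.

    Each group is {"agents": [lowercased names], "rules": [(kind, pattern)]}.
    """
    groups = []
    current = None
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            # Consecutive User-agent lines share one group.
            if not last_was_agent:
                current = {"agents": [], "rules": []}
                groups.append(current)
            current["agents"].append(value.lower())
            last_was_agent = True
        elif field in ("allow", "disallow"):
            # Rules before any User-agent line belong to no group: ignore them.
            if current is not None and value:
                current["rules"].append((field, value))
            last_was_agent = False
        # Any other field (Sitemap, Crawl-delay, ...) is skipped here.
    return groups
```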

Group selection

A crawler looks for the group whose User-agent line matches its product token. Matching is by case-insensitive prefix: a User-agent: Googlebot group applies to Googlebot/2.1 (+http://www.google.com/bot.html). If multiple agent-specific groups match, the one with the longest matching name wins. The wildcard * group is the fallback, used only when no specific group matched.
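Group selection as described above might look like the following sketch. The `select_group` name and the group shape (a dict with an "agents" list of lowercased names) are assumptions made for illustration, and the sketch does not merge several groups that name the same agent.

```python
def select_group(groups, user_agent):
    """Pick the group for a crawler: longest prefix match, else the * group.

    `groups` is a list of dicts, each with an "agents" list of lowercased names.
    Returns None when nothing matches and there is no wildcard group.
    """
    ua = user_agent.lower()
    best, best_len = None, -1
    fallback = None
    for group in groups:
        for name in group["agents"]:
            if name == "*":
                if fallback is None:
                    fallback = group  # first wildcard group is the fallback
            elif name and ua.startswith(name) and len(name) > best_len:
                best, best_len = group, len(name)
    return best if best is not None else fallback
```

With groups for googlebot and googlebot-image both present, a Googlebot-Image/1.0 user-agent selects the second, since it is the longer prefix.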

Pattern matching

Patterns are matched against the path portion of the URL (everything after the host, including the query string). The match is anchored at the start by default. A * wildcard matches any sequence of characters; a $ at the end of the pattern anchors the match to the end of the path. Path matching is case-sensitive.
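One way to implement these matching rules is to translate each pattern into a regular expression. The sketch below is an assumption about a workable approach rather than the tool's implementation; `pattern_to_regex` and `matches` are hypothetical names, and percent-encoding normalization is left out.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape every literal piece, then join the pieces with ".*" for each "*".
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    # \Z anchors at the true end of the string when the pattern ended in "$".
    return re.compile(regex + (r"\Z" if anchored else ""), re.DOTALL)


def matches(pattern, path):
    """True if the pattern matches the path, anchored at the start."""
    # re.match only matches at the beginning of the string, which gives the
    # start anchor; matching is case-sensitive because no IGNORECASE flag is set.
    return pattern_to_regex(pattern).match(path) is not None
```

Under this sketch, `/*.pdf$` matches `/docs/a/report.pdf` but not `/docs/report.pdf?download=1`, because the query string is part of the matched path.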

What this tool does not check

It does not fetch any URL. Sitemap and Crawl-delay lines are recognized but not scored. It does not handle non-standard extensions like Clean-param (Yandex) or Host. Comments after # are stripped per spec. The result reflects what a compliant Google-style crawler would do; old-spec crawlers may differ.

This is a single static HTML file with no network calls. Source: github.com/truffle-dev/tool-robots-txt-check. MIT.