robots.txt allow/deny tester
Paste a robots.txt, name a URL and a user-agent. See whether the crawler is allowed, which rule matched, and how Google's longest-match-wins precedence resolved it.
Matching group
All rules in the matching group
| kind | pattern | len |
|---|---|---|
How precedence works
The original 1994 spec said the first matching rule wins. The Robots Exclusion Protocol (REP) as standardized in RFC 9309, published in September 2022 and modeled on Google's long-standing behavior, inverted that: the most specific matching rule wins, defined as the one with the longest pattern, counting wildcard characters literally. On a length tie, Allow beats Disallow. Bingbot, Yandex, and most modern crawlers follow the same rule. This tool applies the Google/REP precedence.
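
The precedence rule can be sketched in a few lines of Python. This is a minimal illustration, not the tool's source: the `Rule` type and `resolve` function are hypothetical names, and to keep the focus on precedence it assumes plain prefix patterns with no `*` or `$`.

```python
from typing import List, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    kind: str     # "allow" or "disallow"
    pattern: str  # path pattern as written in robots.txt


def resolve(rules: List[Rule], path: str) -> Tuple[bool, Optional[Rule]]:
    """Return (allowed, winning_rule) under longest-match-wins precedence."""
    # Assumption: patterns are plain prefixes (no wildcards in this sketch).
    # Rules with an empty pattern are skipped because they restrict nothing.
    candidates = [r for r in rules if r.pattern and path.startswith(r.pattern)]
    if not candidates:
        return True, None  # nothing matched: crawling is allowed by default
    # Longest pattern wins; on equal length, Allow (True) outranks Disallow.
    winner = max(candidates, key=lambda r: (len(r.pattern), r.kind == "allow"))
    return winner.kind == "allow", winner


rules = [Rule("disallow", "/shop/"), Rule("allow", "/shop/sale/")]
print(resolve(rules, "/shop/sale/item"))  # the longer Allow pattern wins
print(resolve(rules, "/shop/cart"))       # only the Disallow matches
```

Sorting on the pair (length, is-allow) encodes both the longest-match rule and the Allow tie-break in a single comparison.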
| token | meaning |
|---|---|
| User-agent: name | open a group of rules for the named crawler. Multiple consecutive User-agent lines apply the group to all of them. |
| User-agent: * | wildcard group. Used only when no agent-specific group matched. |
| Disallow: path | block paths starting with this pattern. Empty value means allow everything. |
| Allow: path | permit paths starting with this pattern. Used to carve exceptions inside a Disallow. |
| * | wildcard inside a pattern. Matches any sequence of characters including slashes. |
| $ | end-of-path anchor. Only valid as the last character. |
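
As a rough illustration of how the tokens in the table turn into groups, the following Python sketch parses a robots.txt body. The function name `parse_robots` and the dictionary layout are assumptions made for this example, not the tool's actual data model; fields other than User-agent, Allow, and Disallow are simply skipped.

```python
from typing import Dict, List


def parse_robots(text: str) -> List[Dict]:
    """Split robots.txt text into groups of agents and (kind, pattern) rules."""
    groups: List[Dict] = []
    current = None
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments after '#'
        if ":" not in line:
            continue  # blank or malformed line
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            # Consecutive User-agent lines share one group.
            if not last_was_agent:
                current = {"agents": [], "rules": []}
                groups.append(current)
            current["agents"].append(value.lower())
            last_was_agent = True
        elif field in ("allow", "disallow"):
            last_was_agent = False
            if current is not None:  # rules before any User-agent are ignored
                current["rules"].append((field, value))
        # Assumption: other fields (Sitemap, Crawl-delay, ...) are skipped.
    return groups


sample = """\
User-agent: Googlebot
User-agent: Bingbot
Disallow: /private/   # keep crawlers out
Allow: /private/open/

User-agent: *
Disallow:
"""
for group in parse_robots(sample):
    print(group["agents"], group["rules"])
```

An empty `Disallow:` value is kept as an empty pattern here, so the caller can treat it as "allow everything".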
Group selection
A crawler looks for the group whose User-agent line matches its product token. Matching is a case-insensitive prefix comparison: a User-agent: Googlebot group applies to Googlebot/2.1 (+http://www.google.com/bot.html). If several agent-specific groups match, the one with the longest matching name wins. The wildcard * group is the fallback, used only when no specific group matched.
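
Group selection as described above might look like the following Python sketch. The name `select_group` and the group layout (a dict with an `"agents"` list) are hypothetical choices for this example.

```python
from typing import Dict, List, Optional


def select_group(groups: List[Dict], user_agent: str) -> Optional[Dict]:
    """Pick the group with the longest agent name that prefixes user_agent."""
    ua = user_agent.lower()
    best, best_len = None, 0
    fallback = None
    for group in groups:
        for agent in group["agents"]:
            token = agent.lower()
            if token == "*":
                if fallback is None:
                    fallback = group  # first wildcard group is the fallback
            elif token and ua.startswith(token) and len(token) > best_len:
                best, best_len = group, len(token)
    # The wildcard group is used only when no specific group matched.
    return best if best is not None else fallback


groups = [
    {"agents": ["*"], "rules": [("disallow", "/")]},
    {"agents": ["googlebot"], "rules": [("disallow", "/private/")]},
    {"agents": ["googlebot-news"], "rules": [("disallow", "/archive/")]},
]
chosen = select_group(groups, "Googlebot/2.1 (+http://www.google.com/bot.html)")
print(chosen["agents"])  # ['googlebot']
```

Tracking the length of the matched name is what lets a `googlebot-news` group take priority over a plain `googlebot` group for the Googlebot-News crawler.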
Pattern matching
Patterns are matched against the path portion of the URL (everything after the host, including the query string). The match is anchored at the start by default. A * wildcard matches any sequence of characters; a trailing $ anchors the pattern to the end of the path. Path matching is case-sensitive.
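
One way to implement these matching rules is to translate each pattern into a regular expression. The Python sketch below does that; `pattern_to_regex` and `pattern_matches` are illustrative names, and the translation is an assumption about a reasonable approach rather than a description of the tool's internals.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Compile a robots.txt path pattern into a regular expression."""
    # A trailing '$' anchors the pattern to the end of the path.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape the literal pieces; each '*' becomes '.*' (slashes included).
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    if anchored:
        regex += r"\Z"
    return re.compile(regex, re.DOTALL)


def pattern_matches(pattern: str, path: str) -> bool:
    # re.match anchors at the start of the string and is case-sensitive.
    return pattern_to_regex(pattern).match(path) is not None


print(pattern_matches("/private", "/private/page.html"))          # True
print(pattern_matches("/private", "/Private/page.html"))          # False
print(pattern_matches("/*.pdf$", "/docs/a/report.pdf"))           # True
print(pattern_matches("/*.pdf$", "/docs/report.pdf?download=1"))  # False
```

Escaping the literal pieces matters because characters such as `?` and `.` are common in URLs but have special meaning in regular expressions.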
What this tool does not check
It does not fetch any URL. Sitemap and Crawl-delay lines are recognized but not scored, and non-standard extensions such as Clean-param (Yandex) or Host are not handled. Comments after # are stripped per spec. The result reflects what a compliant Google-style crawler would do; crawlers that follow the old first-match spec may differ.
This is a single static HTML file with no network calls. Source: github.com/truffle-dev/tool-robots-txt-check. MIT.