Free robots.txt Builder
Build a robots.txt file for any website — with per-bot rule groups, sitemap support, common-mistake warnings, and instant preview. Avoid the silent rule-grouping bugs that most generators produce. Built by Datastrive, a Chicago managed IT and web hosting provider.
- Live preview & download
- Quick presets
- Catches common mistakes
How robots.txt actually works (and why most generators get it wrong)
The single most misunderstood thing about robots.txt is rule grouping. Here are the rules that catch everyone out, including most online generators:
Each crawler reads only the most specific group that matches it
If you write rules under User-agent: * and then add a User-agent: Googlebot block, Googlebot completely ignores the * rules. It only follows its own block. This is by design (RFC 9309), but it means a “block all bots, except Google” file needs to repeat the global rules inside the Googlebot block too, which most generators don’t do.
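For example, here is a minimal sketch of that “block all bots, except Google” pattern; the /admin/ path is a placeholder, and the point is that any rule you still want Googlebot to follow has to appear inside its own group:

```
# Block everything for all crawlers...
User-agent: *
Disallow: /

# ...except Googlebot, which ignores the * group entirely.
# Any rule it should still follow (such as staying out of /admin/)
# must be repeated inside this group.
User-agent: Googlebot
Disallow: /admin/
```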
Disallow blocks crawling, not indexing
A disallowed page can still appear in Google search results if other sites link to it; Google just won’t crawl the content. To keep a page out of search results entirely, use the noindex meta tag on the page itself, not robots.txt. Counterintuitive but important: blocking via robots.txt can actually prevent the noindex tag from being seen.
Order doesn’t matter, specificity does
Google and most modern crawlers apply the longest matching path rule, not the order in which rules appear. Disallow: /admin/ followed by Allow: /admin/public/ does what you’d expect: it allows the public subdirectory inside an otherwise blocked area. Older crawlers may use first-match-wins, which is why the Allow comes first in some recommendations.
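A minimal sketch of that pattern, with the Allow listed first for the benefit of older first-match-wins crawlers (the paths are placeholders):

```
User-agent: *
# Longest-match crawlers pick the most specific rule for each URL, so
# /admin/public/... stays crawlable even though /admin/ is blocked.
Allow: /admin/public/
Disallow: /admin/
```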
Wildcards work, but only sometimes
The * wildcard (any characters) and $ (end of URL) are widely supported by Google, Bing, and most major crawlers, but they were not part of the original spec. Smaller bots may not honor them. Stick to * and $, and avoid more complex regex-style patterns; they won’t work.
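For instance, a sketch of the kind of patterns that depend on them (the query parameter and file type are placeholders):

```
User-agent: *
# * matches any run of characters: block any URL containing ?sessionid=
Disallow: /*?sessionid=
# $ anchors the end of the URL: block URLs that end in .pdf
Disallow: /*.pdf$
```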
Sitemap declarations are global, not per-User-agent
A Sitemap: line outside any User-agent group applies to all crawlers. Don’t put sitemap URLs inside a User-agent block; some crawlers won’t see them there. Put them at the top or bottom of the file, on their own.
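For example, a sketch with the sitemap declared at the top of the file, outside every group (the domain, sitemap filename, and path are placeholders):

```
# Applies to all crawlers, regardless of the groups below.
Sitemap: https://example.com/sitemap.xml

User-agent: *
Disallow: /admin/
```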
robots.txt is a request, not a security boundary
Compliant crawlers respect it. Malicious bots, security scanners, and AI scrapers are free to ignore it, and many do. If you’re trying to keep something secret, password-protect it or block it at the server level. robots.txt is a polite “please don’t”, not a lock.
The file lives at the root, period
It must be at https://example.com/robots.txt, not /blog/robots.txt and not /wp-content/robots.txt. Each subdomain needs its own (so blog.example.com and example.com have separate files). Many WordPress sites have a “virtual” robots.txt generated by WordPress itself; uploading a real file overrides it.
Frequently asked questions
Where do I upload my robots.txt file?
Upload it to your website’s root directory so it’s accessible at https://yourdomain.com/robots.txt. Common paths: /public_html/robots.txt, /var/www/html/robots.txt, or whatever your hosting provider calls the document root.
For WordPress sites, install an SEO plugin like Yoast or Rank Math; they let you edit robots.txt through the WordPress admin without touching files. If you upload a static file, it overrides WordPress’s virtual one.
Should I have a robots.txt file?
If you don’t have one, that’s the same as allowing everything, which is fine for most small business sites. If you want crawlers to be able to reach everything, you don’t need a robots.txt at all.
You should have one if you want to: block specific directories (admin areas, search results pages, internal-only content), declare your sitemap location for crawlers to find, block specific bots (AI scrapers, aggressive crawlers), or set crawl delays for large sites that need to throttle bot traffic.
What’s the difference between Disallow and noindex?
Disallow in robots.txt tells compliant crawlers not to fetch a page. The page can still appear in search results based on links from other sites — Google just won’t have crawled the content to show a snippet.
noindex is a meta tag on the page itself that tells search engines not to include it in their index. The page must be crawlable for the noindex tag to be seen.
The common mistake: trying to remove a page from Google by adding Disallow, which actually prevents Google from re-crawling and seeing your noindex tag. To remove a page properly, allow crawling, use noindex, then optionally disallow later once it’s removed.
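As an illustrative sketch (not tied to any particular CMS), the noindex signal lives on the page itself, either as a meta tag in the markup or as an HTTP response header set by the server:

```
<!-- In the page's <head>: ask search engines not to index this page.
     The equivalent HTTP response header is  X-Robots-Tag: noindex  -->
<meta name="robots" content="noindex">
```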
What happens if my robots.txt has a syntax error?
Most crawlers fail open — they ignore unparseable lines and continue. So a typo in one rule won’t lock crawlers out of your site, but it might silently fail to do what you intended.
Test your file with Google Search Console’s robots.txt report (under Settings → robots.txt). It shows exactly what Googlebot fetched and flags any parse errors; to check whether a specific URL is blocked, use the URL Inspection tool. For other crawlers, look at their documentation: Bing has Bing Webmaster Tools, Yandex has its own webmaster console.
Can I block AI scrapers like GPTBot?
You can block compliant ones. OpenAI’s GPTBot, Google’s Google-Extended, Anthropic’s anthropic-ai and ClaudeBot, ByteDance’s Bytespider, Common Crawl’s CCBot, and others honor robots.txt. Adding User-agent: GPTBot with Disallow: / stops them from training on your content.
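A sketch of what that opt-out looks like, using a few of the bot names mentioned above; remember that each group stands alone, so the Disallow is repeated per bot:

```
# Opt out of AI training crawlers that honor robots.txt.
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /
```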
Non-compliant scrapers will ignore your file. If blocking AI training is important to you, consider also adding the non-standard X-Robots-Tag: noai, noimageai response header at the server level, and look at services like Cloudflare’s AI bot blocking or Reddit’s robots.txt as comprehensive examples. The “Block AI crawlers” preset above includes the major compliant ones.
Does Crawl-delay work for Google?
No. Google has explicitly stated that it ignores the Crawl-delay directive, and the crawl-rate setting that used to live in Google Search Console has been retired. Googlebot adjusts its crawl rate automatically; if it is crawling too aggressively, Google’s documented remedy is to temporarily return 429 or 503 responses, not anything in robots.txt.
Bing, Yandex, and most other crawlers do honor Crawl-delay. The directive is supported by this builder for those crawlers, but don’t rely on it for Google.
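A sketch of a Crawl-delay group for a crawler that honors it (the 10-second value is arbitrary):

```
# Ask Bingbot to wait roughly 10 seconds between requests.
# Googlebot ignores Crawl-delay entirely.
User-agent: bingbot
Crawl-delay: 10
```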
Need help with the rest of your SEO setup?
Datastrive helps Chicago-area businesses with WordPress, hosting, security, and search engine optimization. If robots.txt is one piece of a bigger SEO and infrastructure puzzle, that’s what we do.