Free robots.txt Builder
Build a robots.txt file for any website — with per-bot rule groups, sitemap support, common-mistake warnings, and instant preview. Avoid the silent rule-grouping bugs that most generators produce. Built by Datastrive, a Chicago managed IT and web hosting provider.
- Live preview & download
- Quick presets
- Catches common mistakes
How robots.txt actually works (and why most generators get it wrong)
The single most misunderstood thing about robots.txt is rule grouping. Here are the rules that catch everyone out, including most online generators:
Each crawler reads only the most specific group that matches it
If you write rules under User-agent: * and then add a User-agent: Googlebot block, Googlebot completely ignores the * rules. It only follows its own block. This is by design (RFC 9309), but it means a “block all bots, except Google” file needs to repeat the global rules inside the Googlebot block too, which most generators don’t do.
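For example, here is a minimal sketch of that “block all bots, except Google” pattern; the /admin/ path is a placeholder, and the point is that any rule you still want Googlebot to follow has to appear inside its own group:

```
# Block everything for all crawlers...
User-agent: *
Disallow: /

# ...except Googlebot, which ignores the * group entirely.
# Any rule it should still follow (such as staying out of /admin/)
# must be repeated inside this group.
User-agent: Googlebot
Disallow: /admin/
```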
Disallow blocks crawling, not indexing
A disallowed page can still appear in Google search results if other sites link to it; Google just won’t crawl the content. To keep a page out of search results entirely, use the noindex meta tag on the page itself, not robots.txt. Counterintuitive but important: blocking via robots.txt can actually prevent the noindex tag from being seen.
Order doesn’t matter, specificity does
Google and most modern crawlers apply the longest matching path rule, not the order in which rules appear. Disallow: /admin/ followed by Allow: /admin/public/ does what you’d expect: it allows the public subdirectory inside an otherwise blocked area. Older crawlers may use first-match-wins, which is why the Allow comes first in some recommendations.
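A minimal sketch of that pattern, with the Allow listed first for the benefit of older first-match-wins crawlers (the paths are placeholders):

```
User-agent: *
# Longest-match crawlers pick the most specific rule for each URL, so
# /admin/public/... stays crawlable even though /admin/ is blocked.
Allow: /admin/public/
Disallow: /admin/
```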
Wildcards work, but only sometimes
The * wildcard (any characters) and $ (end of URL) are widely supported by Google, Bing, and most major crawlers, but they were not part of the original spec. Smaller bots may not honor them. Stick to * and $, and avoid more complex regex-style patterns; they won’t work.
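For instance, a sketch of the kind of patterns that depend on them (the query parameter and file type are placeholders):

```
User-agent: *
# * matches any run of characters: block any URL containing ?sessionid=
Disallow: /*?sessionid=
# $ anchors the end of the URL: block URLs that end in .pdf
Disallow: /*.pdf$
```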
Sitemap declarations are global, not per-User-agent
A Sitemap: line outside any User-agent group applies to all crawlers. Don’t put sitemap URLs inside a User-agent block; some crawlers won’t see them there. Put them at the top or bottom of the file, on their own.
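For example, a sketch with the sitemap declared at the top of the file, outside every group (the domain, sitemap filename, and path are placeholders):

```
# Applies to all crawlers, regardless of the groups below.
Sitemap: https://example.com/sitemap.xml

User-agent: *
Disallow: /admin/
```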
robots.txt is a request, not a security boundary
Compliant crawlers respect it. Malicious bots, security scanners, and AI scrapers are free to ignore it, and many do. If you’re trying to keep something secret, password-protect it or block it at the server level. robots.txt is a polite “please don’t”, not a lock.
The file lives at the root, period
It must be at https://example.com/robots.txt, not /blog/robots.txt and not /wp-content/robots.txt. Each subdomain needs its own (so blog.example.com and example.com have separate files). Many WordPress sites have a “virtual” robots.txt generated by WordPress itself; uploading a real file overrides it.
Frequently asked questions
Where do I upload my robots.txt file?
Upload it to your website’s root directory so it’s accessible at https://yourdomain.com/robots.txt. Common paths: /public_html/robots.txt, /var/www/html/robots.txt, or whatever your hosting provider calls the document root.
For WordPress sites, install an SEO plugin like Yoast or Rank Math; they let you edit robots.txt through the WordPress admin without touching files. If you upload a static file, it overrides WordPress’s virtual one.
Should I have a robots.txt file?
If you don’t have one, that’s the same as allowing everything, which is fine for most small business sites. If you want crawlers to be able to reach everything, you don’t need a robots.txt at all.
You should have one if you want to: block specific directories (admin areas, search results pages, internal-only content), declare your sitemap location for crawlers to find, block specific bots (AI scrapers, aggressive crawlers), or set crawl delays for large sites that need to throttle bot traffic.
What’s the difference between Disallow and noindex?
Disallow in robots.txt tells compliant crawlers not to fetch a page. The page can still appear in search results based on links from other sites — Google just won’t have crawled the content to show a snippet.
noindex is a meta tag on the page itself that tells search engines not to include it in their index. The page must be crawlable for the noindex tag to be seen.
The common mistake: trying to remove a page from Google by adding Disallow, which actually prevents Google from re-crawling and seeing your noindex tag. To remove a page properly, allow crawling, use noindex, then optionally disallow later once it’s removed.
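As an illustrative sketch (not tied to any particular CMS), the noindex signal lives on the page itself, either as a meta tag in the markup or as an HTTP response header set by the server:

```
<!-- In the page's <head>: ask search engines not to index this page.
     The equivalent HTTP response header is  X-Robots-Tag: noindex  -->
<meta name="robots" content="noindex">
```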
What happens if my robots.txt has a syntax error?
Most crawlers fail open — they ignore unparseable lines and continue. So a typo in one rule won’t lock crawlers out of your site, but it might silently fail to do what you intended.
Test your file with Google Search Console’s robots.txt report (under Settings → robots.txt). It shows exactly what Googlebot fetched and flags any parse errors; to check whether a specific URL is blocked, use the URL Inspection tool. For other crawlers, look at their documentation: Bing has Bing Webmaster Tools, Yandex has its own webmaster console.
Can I block AI scrapers like GPTBot?
You can block compliant ones. OpenAI’s GPTBot, Google’s Google-Extended, Anthropic’s anthropic-ai and ClaudeBot, ByteDance’s Bytespider, Common Crawl’s CCBot, and others honor robots.txt. Adding User-agent: GPTBot with Disallow: / stops them from training on your content.
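A sketch of what that opt-out looks like, using a few of the bot names mentioned above; remember that each group stands alone, so the Disallow is repeated per bot:

```
# Opt out of AI training crawlers that honor robots.txt.
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /
```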
Non-compliant scrapers will ignore your file. If blocking AI training is important to you, consider also adding the non-standard X-Robots-Tag: noai, noimageai response header at the server level, and look at services like Cloudflare’s AI bot blocking or Reddit’s robots.txt as comprehensive examples. The “Block AI crawlers” preset above includes the major compliant ones.
Does Crawl-delay work for Google?
No. Google has explicitly stated that it ignores the Crawl-delay directive, and the crawl-rate setting that used to live in Google Search Console has been retired. Googlebot adjusts its crawl rate automatically; if it is crawling too aggressively, Google’s documented remedy is to temporarily return 429 or 503 responses, not anything in robots.txt.
Bing, Yandex, and most other crawlers do honor Crawl-delay. The directive is supported by this builder for those crawlers, but don’t rely on it for Google.
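A sketch of a Crawl-delay group for a crawler that honors it (the 10-second value is arbitrary):

```
# Ask Bingbot to wait roughly 10 seconds between requests.
# Googlebot ignores Crawl-delay entirely.
User-agent: bingbot
Crawl-delay: 10
```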
Need help with the rest of your SEO setup?
Datastrive helps Chicago-area businesses with WordPress, hosting, security, and search engine optimization. If robots.txt is one piece of a bigger SEO and infrastructure puzzle, that’s what we do.