LBD #064: Understanding the robots file

The robots.txt file is your website’s rulebook for search engines. Learn it, control it, and boost your SEO game.

By the end of this issue, you will be able to:

  1. Understand how search engines access and index content
  2. Prioritize pages for more effective SEO efforts
  3. Safeguard critical information using robots.txt

The robots.txt file looks deceptively simple, but it plays a major role in gatekeeping your website, especially when you want to safeguard parts of it. The robots file carries instructions for search engines that let you control crawlability and the pages (and content) that appear in search.
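
To ground the idea, here's a minimal sketch of a robots file that allows everything; an empty Disallow value simply means nothing is blocked:

    # Applies to all crawlers; the empty Disallow blocks nothing
    User-agent: *
    Disallow: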

With AI becoming common, many publications have blocked AI crawlers in their robots files. This means those crawlers cannot access the content and learn from it.
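
For example, publishers that want to opt out of OpenAI's training crawler typically add a rule like this (GPTBot is OpenAI's documented user agent; other AI crawlers use their own names):

    User-agent: GPTBot
    Disallow: /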

Crawlability also matters for plain visibility. Even today, fewer than 10% of all pages get any traffic from Google, which means roughly 90% of pages never receive any search traffic. It's very likely that many of those pages aren't even indexed, let alone ranked. Just to put things in perspective, Google holds roughly 400 billion documents in its index.

A misconfigured robots file can keep critical pages of your business website from showing up on SERPs. It's easy to get the file wrong when you first create it: an unintended disallow directive can block search engines from crawling entire sections of your site.

If you're not careful, especially while dealing with a staging site, you might even disallow all crawling globally (and accidentally push that rule to production).

To leverage these gatekeeping capabilities, here are 3 key things you should know to understand robots files at a fundamental level.

Step 1: Control crawling patterns

Robots.txt allows you to specify which pages or sections of your site are open to crawlers, helping search engines spend their crawl budget on your critical content first.

You can use robots.txt to stop search engines from crawling pages that you don't want to surface in search results, such as admin pages, login pages, or duplicate content.
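
As an illustration, the rules below keep all crawlers out of hypothetical /admin/ and /login/ paths. Keep in mind that robots.txt only stops crawling: a blocked URL can still end up indexed (without its content) if other sites link to it, so use noindex or password protection when a page must stay out of results entirely.

    User-agent: *
    Disallow: /admin/
    Disallow: /login/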

By controlling crawling patterns, you can prevent search engine bots from overloading your server with requests, which can lead to slow website performance or downtime.

Robots.txt also helps protect sensitive information by blocking search engine access to confidential sections of your site, like internal databases or private user profiles. Robots files can also help you deal with duplicate pages.

Canonicalization is another area where the robots file plays a supporting role. It doesn't set canonical URLs directly, but you can keep duplicate variants out of the crawl by disallowing them in the file.

Lastly, robots files can be useful for the strategic indexing of pages on your site. You can shape crawling patterns by keeping priority sections open and pointing crawlers to your sitemap from the file.
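
There is no "priority" directive in robots.txt, but a Sitemap line tells every crawler where to find the pages you care about. A sketch (the URL is a placeholder):

    Sitemap: https://www.example.com/sitemap.xml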

Pro tip: The robots file itself is public, so never put anything secret in it. To further control what you share with search engines, password-protect the sensitive pages themselves.

Step 2: Leveraging wildcards

Wildcards in robots files are the levers that let you control the whole system. They are used to create the rules that become instructions for search engines, so messing up the rules means messing up your traffic from search.

If you've never created a robots file before, read this documentation from Google on creating one. It covers all the directives & wildcards that search engines understand. If you already have one, see the instructions here to update the robots file.

If your site has language variations, wildcards can simplify language-specific rule creation without specifying each language individually. Wildcards can also be used to manage specific file types across your site, ensuring that they are crawled or excluded according to your SEO strategy.
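
As a sketch, the rules below use the two wildcard characters most major crawlers understand: * matches any sequence of characters and $ anchors the end of a URL. The paths are made up for illustration.

    User-agent: *
    # Block PDF files anywhere on the site
    Disallow: /*.pdf$
    # Block the print-friendly version in every language folder (/en/, /fr/, ...)
    Disallow: /*/print/
    # Block internal search results with query parameters
    Disallow: /*?s=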

When you need to make global changes to your robots.txt directives, wildcards provide an efficient way to apply changes across your site. Lastly, wildcards can be used to optimize mobile user experiences by ensuring that search engines crawl and index mobile-friendly versions of your site effectively.

Note: Creating a robots file can be overwhelming for first-time founders. If you need professional help, I can walk you through such technicalities from the ground up.

Step 3: Crawl delays

Crawl delays are critical for conserving server resources, ensuring they remain available for user interactions and search engine crawls, even during peak traffic. Scheduling crawl windows during low-traffic periods helps ensure a better user experience, with faster website performance and load times during peak hours.

Furthermore, you can distribute server load evenly, preventing performance issues or downtime during high-traffic periods.

If your site undergoes regular maintenance, specifying crawl windows to coincide with these periods reduces the chances of users encountering errors or incomplete pages. For websites that frequently update content, crawl windows ensure that search engines are directed to your site when content is most up-to-date.

Crawl windows allow you to align search engine crawls with your SEO strategy, focusing on critical updates and content changes. Crawl-delay settings help manage the behavior of search engine bots, ensuring they do not overwhelm your server resources.
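
A typical crawl-delay rule looks like the sketch below (the value is seconds between requests). Support varies by crawler: Bingbot honors Crawl-delay, while Googlebot ignores the directive entirely and expects you to manage its crawl rate through server-side signals instead.

    User-agent: Bingbot
    Crawl-delay: 10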

Depending on your target audience’s location, crawl windows can be adjusted to coincide with regional traffic peaks for optimized SEO. Crawl windows can be used to ensure that high-value pages receive priority indexing by search engines, enhancing your SEO efforts.

Note: Don't mess with robots files too much. If you want to prioritize certain pages, link to them from the homepage rather than trying to prioritize them in the robots file. Here's an official video from Google on the topic.

Video Guide

Here's everything you need to know about robots.txt in one place. Whenever you have some spare time, watch these videos that cover several FAQs related to robots files.

Today’s action steps →

  1. Find the robots file on your site (usually at yourdomain.com/robots.txt) and see what it looks like
  2. Check out the robots files on other sites and see what they are doing with them
  3. Reverse engineer them using AI tools and figure out what the directives in each robots file mean (see the sketch after this list)
  4. See if you can replicate the same in your own robots file
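
If you'd rather test rules than read them, here's a minimal sketch using Python's built-in urllib.robotparser; the domain and paths are placeholders for whatever site you're auditing.

    # Fetch a live robots.txt and check which paths a given crawler may fetch
    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://www.example.com/robots.txt")  # placeholder domain
    rp.read()  # downloads and parses the file

    # Test a few paths the way Googlebot would see them
    for path in ["/", "/admin/", "/blog/some-post/"]:
        allowed = rp.can_fetch("Googlebot", "https://www.example.com" + path)
        print(path, "->", "crawlable" if allowed else "blocked")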

SEO this week

  1. The crawl rate limiter tool in Google Search Console is set to retire. Read the official announcement
  2. Google’s Bard can now understand YouTube videos. No more watching the whole video to summarize. (The accuracy will always be questionable)
  3. Interesting conversation on which aspect of your site you should focus on
  4. You can now add structured data to your forums
  5. How does SGE work? The patent filed by Google reveals the process

Masters of SEO

  1. A leaked document shares how the search algorithm works
  2. How will the Quality Rater Guidelines evolve with AI?
  3. Here's what has been updated in Google's QRG
  4. SEO without keywords? Here’s what the future looks like
  5. Monitoring backlinks using GA4 and Google Sheets

How can I help you?

I put a lot of effort into each edition of this newsletter, and I want to help you in every possible way. But I can only do so much by myself, so tell me what you need help with. You can get in touch with me on LinkedIn, Twitter, or email to share the thoughts & questions you want addressed. I'd be more than happy to help.


Whenever you’re ready to dominate SERPs, here’s how I can help:

  1. Sit with you 1-on-1 & create a content marketing strategy for your startup. Hire me for consulting
  2. Write blogs, social posts, and emails for you. Get in touch here with queries (Please mention you found me through the newsletter to get noticed quickly)
  3. Join my tribe on Twitter & LinkedIn where I share SEO tips (every single day) & a teaser of the next issue of Letters ByDavey