Technical SEO: The Ultimate Beginner’s Guide to Robots.txt

 

Dmytro Spilka

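Robots.txt is a simple text file that tells search engine crawlers which parts of your website they may access. For example, you can block search engine bots from your entire website or only from individual folders or pages. Reputable crawlers such as Googlebot follow these instructions, but robots.txt is not an access control: bots that choose to ignore it can still request your pages.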

It is vital for search marketers, web designers and developers to understand the basic principles and functions of the robots.txt file. Improper use of the robots.txt file can hurt your search rankings and the overall performance of your website. A single mistake in the file, such as an accidental “Disallow: /”, can put all of your online and search marketing efforts at risk. Hence, it is important to understand the fundamentals of how search engines crawl your site and how to configure the robots.txt file.

Please Note: Not every website has a robots.txt file set up, and not having one doesn’t mean that your site is in danger. It simply means you are not blocking any bots from accessing any of your files.

Common Robots.txt Set-Ups:

User-agent: *
Disallow:
Allows all search bots to access and crawl the entire website.
User-agent: *
Disallow: /
Blocks all search crawlers from accessing your entire site.
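To see how a crawler reads the two set-ups above, here is a minimal sketch using Python’s standard-library urllib.robotparser module, which parses robots.txt text and answers “may this user-agent fetch this URL?”. The mydomain.com URL is only a placeholder, and robotparser is a simplified parser rather than Google’s own implementation.

```python
from urllib.robotparser import RobotFileParser


def parser_for(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text supplied as a string (no network access)."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


# An empty Disallow value means "nothing is disallowed"
allow_all = parser_for("User-agent: *\nDisallow:\n")
# "Disallow: /" matches every path on the site
block_all = parser_for("User-agent: *\nDisallow: /\n")

url = "https://www.mydomain.com/blog/post"  # placeholder URL
print(allow_all.can_fetch("Googlebot", url))  # True
print(block_all.can_fetch("Googlebot", url))  # False
```

The only difference between the two files is a single “/” character, which is why it pays to double-check this line before uploading.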
User-agent: *
Disallow: /wp-admin/
Standard set-up for websites that use the WordPress CMS. We’re blocking /wp-admin/ because we don’t want search bots trying to crawl our back end. The first set-up above is also valid, but it leaves search bots free to request your /wp-admin/ pages.
User-agent: *
Disallow: /myfolder/
Blocking search bots from gaining access to a specific folder on the site.
User-agent: *
Disallow: /myfile
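Blocking search bots from accessing a particular file. Note that rules match the start of the URL path, so this line also blocks URLs such as /myfile.html and anything under /myfiles/.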
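Because rules match the start of the URL path, a trailing slash makes a real difference. This sketch, again using Python’s standard-library urllib.robotparser with placeholder URLs, checks a few paths against the folder and file rules above:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /myfolder/",  # folder rule, with a trailing slash
    "Disallow: /myfile",     # file rule, no trailing slash
])

base = "https://www.mydomain.com"  # placeholder domain
for path in ["/myfolder/page.html", "/myfolder", "/myfile",
             "/myfile.html", "/myfiles/report.pdf", "/about"]:
    # True means the path may be crawled, False means it is blocked
    print(path, rp.can_fetch("Googlebot", base + path))
```

Here /myfolder itself stays crawlable because it does not start with “/myfolder/”, while the “/myfile” rule also catches /myfile.html and /myfiles/report.pdf.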
User-agent: Googlebot
Disallow: /
Blocking a specific search bot (Googlebot, in this case) from accessing the site. Other bots are not affected by this rule.
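A group only applies to the user-agent it names. In this sketch, built with Python’s urllib.robotparser, Googlebot is shut out while a crawler the file doesn’t mention, such as Bingbot, is unaffected. Note that robotparser expects the short user-agent token (as listed in the table at the end of this guide), not a full browser-style user-agent string.

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: Googlebot",
    "Disallow: /",
])

url = "https://www.mydomain.com/blog/post"  # placeholder URL
print(rp.can_fetch("Googlebot", url))  # False: the Googlebot group blocks the whole site
print(rp.can_fetch("Bingbot", url))    # True: no group names Bingbot and there is no * group
```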
User-agent: *
Disallow: /myfolder
Allow: /myfolder/myfile
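Blocking bots from accessing “myfolder” while still allowing them to access “myfile”, even though it is located inside the blocked “myfolder”. Crawlers that follow the robots.txt standard, including Googlebot, apply the most specific (longest) matching rule, which is why the Allow line wins here.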
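Why does the Allow line win even though the Disallow line comes first? Under the robots.txt standard (RFC 9309), the matching rule with the longest path takes precedence, and Allow wins a tie. Python’s urllib.robotparser instead applies the first matching rule in file order, so it would report /myfolder/myfile as blocked. The hypothetical is_allowed function below is a minimal sketch of longest-match precedence; it assumes plain path prefixes and ignores the * and $ wildcards that Google also supports.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    allow: bool  # True for an "Allow" line, False for a "Disallow" line
    path: str    # path prefix, e.g. "/myfolder"


def is_allowed(rules: list[Rule], path: str) -> bool:
    """Longest-match precedence: the matching rule with the longest path wins;
    on a tie, Allow wins. Wildcards (* and $) are NOT handled in this sketch."""
    best = None
    for rule in rules:
        # An empty path matches nothing (an empty Disallow allows everything)
        if not rule.path or not path.startswith(rule.path):
            continue
        if (best is None
                or len(rule.path) > len(best.path)
                or (len(rule.path) == len(best.path) and rule.allow)):
            best = rule
    # If no rule matches, the path may be crawled
    return True if best is None else best.allow


rules = [Rule(allow=False, path="/myfolder"),
         Rule(allow=True, path="/myfolder/myfile")]
print(is_allowed(rules, "/myfolder/myfile"))  # True: the longer Allow rule wins
print(is_allowed(rules, "/myfolder/other"))   # False: only the Disallow rule matches
print(is_allowed(rules, "/about"))            # True: no rule matches
```

Because “/myfolder” has no trailing slash, the same logic also blocks paths such as /myfolder-old.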
User-agent: *
Disallow:
Sitemap: https://www.mydomain.com/sitemap.xml
Allowing all bots and adding the location of your sitemap.xml file to the robots.txt file. It’s recommended to add a Sitemap line with the full URL of your sitemap.xml, as it makes it easier for search engines to discover, crawl and index all of your pages.
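Crawlers read the Sitemap line independently of the User-agent groups. As a quick check, Python’s urllib.robotparser (Python 3.8 or later) exposes any Sitemap entries through its site_maps() method; the sitemap URL below is the placeholder from the example above.

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow:",
    "Sitemap: https://www.mydomain.com/sitemap.xml",
])

# site_maps() returns a list of sitemap URLs, or None if the file lists none
print(rp.site_maps())  # ['https://www.mydomain.com/sitemap.xml']
```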

Do you need a robots.txt file?

Remember what we said above? If you don’t have a robots.txt file, that doesn’t mean you are in trouble. In fact, in many cases, you won’t even need one.

You may need a robots.txt file in the following scenarios:

* You want to block all or some search bots from accessing and crawling your site
* You want to block all or some search bots from accessing some of your folders or files (e.g. the /wp-admin/ folder)
* You are using paid advertising links or affiliate links
* You are developing a new website and do not want it to be accessed and crawled by bots yet

Check if you have a robots.txt file:

1. Type in your site address (e.g. www.mywebsite.com)
2. Add “/robots.txt” to the end of your web address so that it looks like this: www.mywebsite.com/robots.txt

If you don’t have a robots.txt file, your site will usually return a 404 (Not Found) error at that address.
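The same check can be scripted. This is a minimal sketch using only Python’s standard library; the function names robots_url and has_robots_txt are hypothetical, and the code assumes HTTPS when you type a bare address such as www.mywebsite.com. Because robots.txt always lives at the root of the host, any path you pass in is replaced with /robots.txt.

```python
from urllib.error import HTTPError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(site: str) -> str:
    """Build the robots.txt URL for a site address (hypothetical helper)."""
    # Assumption: a bare address like www.mywebsite.com is served over HTTPS
    if "://" not in site:
        site = "https://" + site
    parts = urlsplit(site)
    # robots.txt sits at the root of the host, whatever path was given
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def has_robots_txt(site: str, timeout: float = 10.0) -> bool:
    """Return True if the site serves robots.txt, False on a 404 (network call)."""
    try:
        with urlopen(robots_url(site), timeout=timeout) as resp:
            return resp.status == 200
    except HTTPError as err:
        if err.code == 404:
            return False
        raise  # other errors (403, 500, ...) deserve a closer look


print(robots_url("www.mywebsite.com"))  # https://www.mywebsite.com/robots.txt
```

Calling has_robots_txt("www.mywebsite.com") performs a real HTTP request, so run it against your own domain.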

Audit your robots.txt file:

If you do have a robots.txt file, make sure that it doesn’t block anything you want search engines to crawl.
Most websites will want to allow bots, so use the examples above to check whether your robots.txt is blocking search bots.

You can also test your robots.txt file in Google Search Console (formerly Google Webmaster Tools).
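For a quick scripted audit, the sketch below (Python, standard library only) lists which of your important URLs a given crawler would be blocked from. The blocked_urls helper and the URL list are hypothetical; to audit the live file instead of pasted text, RobotFileParser also provides set_url() and read(). Keep in mind that robotparser applies the first matching rule, so files that mix Allow and Disallow lines may be judged differently than by Googlebot.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs in `urls` that `agent` is NOT allowed to crawl."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [u for u in urls if not rp.can_fetch(agent, u)]


robots_txt = """\
User-agent: *
Disallow: /wp-admin/
"""

# Hypothetical list of pages you expect search engines to reach
important = [
    "https://www.mydomain.com/",
    "https://www.mydomain.com/blog/",
    "https://www.mydomain.com/wp-admin/",
]
print(blocked_urls(robots_txt, important))  # ['https://www.mydomain.com/wp-admin/']
```

If a page you want in search results shows up in the output, the corresponding Disallow line needs revisiting.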

How to add a robots.txt file?

Robots.txt is a simple text file, which you can create in a plain-text editor such as Notepad and then upload to the root directory of your website, so that it is served at /robots.txt.
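If you prefer to generate the file rather than type it by hand, here is a minimal Python sketch. The build_robots_txt function and its dictionary format are hypothetical conveniences, not part of any tool: each key is a user-agent and each value lists the paths to disallow, with an empty list producing an allow-all group.

```python
from typing import Optional


def build_robots_txt(groups: dict[str, list[str]], sitemap: Optional[str] = None) -> str:
    """Build robots.txt text from {user_agent: [disallowed paths]} (hypothetical helper)."""
    blocks = []
    for agent, paths in groups.items():
        lines = [f"User-agent: {agent}"]
        if paths:
            lines += [f"Disallow: {p}" for p in paths]
        else:
            lines.append("Disallow:")  # empty value = allow everything
        blocks.append("\n".join(lines))
    # Separate groups with a blank line
    text = "\n\n".join(blocks) + "\n"
    if sitemap:
        text += f"\nSitemap: {sitemap}\n"
    return text


content = build_robots_txt({"*": ["/wp-admin/"]},
                           sitemap="https://www.mydomain.com/sitemap.xml")
print(content)
```

Save the output as robots.txt, for example with pathlib.Path("robots.txt").write_text(content), and upload it to the root of your site.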

If you are using the Yoast SEO plugin for your WordPress website, simply go to Yoast > Tools > File Editor.

Edit your file and then click save. If you don’t have a robots.txt file yet, you can also create one with Yoast.

List of the most popular user-agents (search bots):

Search engine      User-agent
Google             Googlebot
Google News        Googlebot-News
Google Images      Googlebot-Image
Google Video       Googlebot-Video
Google Mobile      Googlebot-Mobile
Bing               Bingbot/MSNBot
Yandex             YandexBot
Baidu              Baiduspider
Ask.com            AskJeeves
DuckDuckGo         DuckDuckBot
Yahoo              Slurp

Dmytro Spilka

Head Wizard
