Technical SEO: The Ultimate Beginner's Guide to Robots.txt

By Dmytro Spilka

Jul 27, 2016

Robots.txt is a plain text file that tells search engine crawlers which parts of your site they may access. For example, you can keep search engine bots away from your entire website or from individual folders and pages.

It is vital for search marketers, web designers and developers to understand the basic principles and functions of the robots.txt file. Improper use of the robots.txt file can have an adverse effect on your search rankings and the overall performance of the website. A single mistake in the file can put all of your search marketing efforts in danger. Hence, it is important to understand the fundamentals of how search engines work and how to configure the robots.txt file.

Please Note: Not every website has a robots.txt file set up, and not having one doesn’t mean that your site is in danger. It simply means you are not blocking any bots from accessing any of your files.

Common Robots.txt Set-Ups:

User-agent: *
Disallow:

Allows all search bots to access and crawl the entire website.

User-agent: *
Disallow: /

Blocks all search crawlers from accessing your site.

User-agent: *
Disallow: /wp-admin/

Standard set-up for websites that use the WordPress CMS. We’re blocking /wp-admin/ because we don’t want search bots to try to access our backend. The allow-all set-up shown first is also valid, but with it search bots may try to access your WP-Admin.

User-agent: *
Disallow: /myfolder/

Blocking search bots from gaining access to a specific folder on the site.

User-agent: *
Disallow: /myfile

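Blocking search bots from accessing a particular file. Note that the rule matches by prefix, so it also blocks any other URL whose path begins with /myfile.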

User-agent: Googlebot
Disallow: /

Blocking a specific search bot (Google, in this case) from accessing the site.
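
To sanity-check a bot-specific rule like this one before uploading it, a minimal sketch using Python's standard-library urllib.robotparser may help. The page URL and the Bingbot comparison are assumptions made for illustration, and real crawlers can interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above, supplied as lines instead of being fetched.
rules = """\
User-agent: Googlebot
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Hypothetical page URL, used only for illustration.
page = "https://www.mydomain.com/some-page"

googlebot_allowed = parser.can_fetch("Googlebot", page)
bingbot_allowed = parser.can_fetch("Bingbot", page)

print("Googlebot allowed:", googlebot_allowed)
print("Bingbot allowed:", bingbot_allowed)
```

Only the bot named in the User-agent line is affected; a bot with no matching group and no "*" group is treated as allowed.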

User-agent: *
Disallow: /myfolder
Allow: /myfolder/myfile

Blocking bots from accessing “myfolder”, but still allowing them to access “myfile”, even though it is located in the blocked “myfolder”.
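
The Allow line wins here because major crawlers such as Googlebot apply the most specific (longest) matching rule. The sketch below illustrates that idea under simplifying assumptions: the function name is_allowed is hypothetical, only plain prefix matching is handled (no * or $ wildcards), and it is not how any particular search engine is implemented. Some parsers, including Python's built-in urllib.robotparser, apply rules in file order instead, so they may report a different result for this file.

```python
def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Return True if `path` may be crawled under (directive, prefix) rules.

    Simplified sketch: plain prefix matching only, no wildcards.
    """
    best_len = -1
    allowed = True  # a path that matches no rule is allowed
    for directive, prefix in rules:
        # An empty prefix (e.g. "Disallow:") matches nothing.
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        # The longest matching prefix wins; on a tie, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


# The two rules from the example above.
rules = [("Disallow", "/myfolder"), ("Allow", "/myfolder/myfile")]

print(is_allowed("/myfolder/myfile", rules))  # True: the longer Allow rule wins
print(is_allowed("/myfolder/other", rules))   # False: only Disallow matches
print(is_allowed("/about", rules))            # True: no rule matches
```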

User-agent: *
Disallow:
Sitemap: https://www.mydomain.com/sitemap.xml

Allowing all bots and adding the location of a sitemap.xml file to the robots.txt file. It’s recommended to include a line with your sitemap location, as it makes it easier for search engines to discover, crawl and index all of your pages.
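
As a rough check that the Sitemap line is being picked up, the sketch below parses the set-up above with Python's standard-library urllib.robotparser. It assumes Python 3.8 or later (when site_maps() was added), and the page URL is a made-up example.

```python
from urllib.robotparser import RobotFileParser

# The allow-all set-up with a Sitemap line, as in the example above.
rules = """\
User-agent: *
Disallow:
Sitemap: https://www.mydomain.com/sitemap.xml
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# site_maps() returns a list of sitemap URLs, or None if there is no Sitemap line.
sitemaps = parser.site_maps()

# Hypothetical page URL, used only to confirm that nothing is blocked.
everything_allowed = parser.can_fetch("*", "https://www.mydomain.com/any/page")

print(sitemaps)
print(everything_allowed)
```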

Do you need a Robots.txt file?

As noted above, not having a robots.txt file doesn’t mean you are in trouble. In fact, in many cases, you won’t even need one.

You may need a robots.txt file in the following scenarios:

* You want to block all or some search bots from accessing and crawling your site
* You want to block all or some search bots from accessing some of your folders or files (e.g. /wp-admin/ folder)
* You are using paid advertising links or affiliate links
* You are developing a new website and do not want it to be accessed and crawled by bots yet

Check if you have a robots.txt file:

1. Type in your site address (e.g. www.mywebsite.com)
2. Add “/robots.txt” at the end of your web address so that it will look like this: www.mywebsite.com/robots.txt

If there is no file at that address, your site will usually return a 404 (Not Found) error.
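
The same check can be scripted. The following is a minimal sketch using only Python's standard library; the function names robots_url and has_robots_txt are hypothetical, and it assumes https when no scheme is given. Servers that redirect or return other status codes may need extra handling.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def robots_url(site: str) -> str:
    """Build the robots.txt address for a site, e.g. www.mywebsite.com."""
    if "://" not in site:
        site = "https://" + site  # assumption: default to https
    parts = urlsplit(site)
    # robots.txt always lives at the root of the host, whatever the page path is.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def has_robots_txt(site: str, timeout: float = 10.0) -> bool:
    """Return True if the site serves a robots.txt file, False on a 404."""
    try:
        with urllib.request.urlopen(robots_url(site), timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # other errors (403, 500, ...) need a closer look


print(robots_url("www.mywebsite.com"))
```

Calling has_robots_txt("www.mywebsite.com") performs the actual request, so it needs network access.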

Audit your robots.txt file:

If you do have a robots.txt file, make sure that it doesn’t block anything you want to be crawled. Most websites want to allow bots, so compare your file against the examples above to see whether it is blocking search bots.

You can test your robots.txt file within Google Search Console (formerly Google Webmaster Tools). More on that here.
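
For a quick offline audit, a sketch like the one below lists which bots a given robots.txt would block from a path. It relies on Python's standard-library urllib.robotparser; the function name blocked_agents and the sample file are assumptions for illustration, and this parser may not match every search engine's interpretation, so treat the result as a first check rather than a verdict.

```python
from urllib.robotparser import RobotFileParser


def blocked_agents(robots_txt: str, agents: list[str], path: str = "/") -> list[str]:
    """Return the user-agents that may not fetch `path` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, path)]


# Hypothetical file: Googlebot is blocked entirely, everyone else only from /wp-admin/.
sample = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /wp-admin/
"""

agents = ["Googlebot", "Bingbot", "DuckDuckBot"]

print(blocked_agents(sample, agents))                # bots blocked from the homepage
print(blocked_agents(sample, agents, "/wp-admin/"))  # bots blocked from the backend
```

If the first list contains a bot you rely on for traffic, the file is blocking more than intended.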

How to add a Robots.txt file?

Robots.txt is a plain text file that you can create in any text editor, such as Notepad, and then upload to the root folder of your website.
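
If you would rather generate the file than type it by hand, here is a minimal sketch in Python. The helper names build_robots_txt and save_robots_txt are hypothetical, and the rules and sitemap URL are the placeholder values used earlier in this guide.

```python
from pathlib import Path
from typing import Optional


def build_robots_txt(disallow: list[str], sitemap: Optional[str] = None,
                     user_agent: str = "*") -> str:
    """Return robots.txt content for one user-agent group."""
    lines = [f"User-agent: {user_agent}"]
    if disallow:
        lines += [f"Disallow: {path}" for path in disallow]
    else:
        lines.append("Disallow:")  # an empty Disallow allows everything
    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines) + "\n"


def save_robots_txt(content: str, directory: Path) -> Path:
    """Write the content to robots.txt inside `directory` and return the path."""
    target = directory / "robots.txt"
    target.write_text(content, encoding="utf-8")
    return target


content = build_robots_txt(["/wp-admin/"],
                           sitemap="https://www.mydomain.com/sitemap.xml")
print(content)
```

Calling save_robots_txt(content, Path(".")) writes the file to the current folder, ready to be uploaded to the root of your site.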

If you are using the Yoast plugin on your WordPress website, simply go to Yoast > Tools > File Editor.

Edit your file and then click save. If you don’t have a robots.txt file yet, you can also create one with Yoast.

List of the most popular User-agents/search bots:

Search Engine	User-agent
Google	Googlebot
Googlebot News	Googlebot-News
Googlebot Images	Googlebot-Image
Googlebot Video	Googlebot-Video
Google Mobile	Googlebot-Mobile
Bing	Bingbot/MSNBot
Yandex	YandexBot
Baidu	Baiduspider
Ask.com	AskJeeves
Duck Duck Go	DuckDuckBot
Yahoo	Slurp
