Technical SEO: The Ultimate Beginner’s Guide to Robots.txt
July 27th, 2016. Posted by Solvid.
Robots.txt is a simple text file that you can use to control different search crawlers. For example, you can restrict search engine bots from accessing your entire website or individual parts/pages of the site.
It is vital for search marketers, web designers and developers to understand the basic principles and functions of the robots.txt file. Improper use of the robots.txt file can have an adverse effect on your search rankings and the overall performance of the website. A single mistake in the file can put all of your online/search marketing efforts in danger. Hence, it is important to understand the fundamentals of how search engines work and how to configure a robots.txt file.
Please note: Not every website has a robots.txt file set up, and a missing file doesn’t mean that your site is in danger. Not having robots.txt simply means you are not blocking any bots from accessing any of your files.
Common Robots.txt Set-Ups:
User-agent: *
Disallow:
Allows all search bots to access and crawl the entire website.

User-agent: *
Disallow: /
Blocks all search crawlers from accessing your site.

User-agent: *
Disallow: /wp-admin/
Standard set-up for websites that use the WordPress CMS. We’re blocking /wp-admin/ as we don’t really want search bots to try and access our backend. The first set-up (allow everything) is also valid for WordPress, but search bots may then try to access your WP-Admin.

User-agent: *
Disallow: /myfolder/
Blocks search bots from gaining access to a specific folder on the site.

User-agent: *
Disallow: /myfile
Blocks search bots from accessing a particular file. Rules match by prefix, so this also blocks any other path that begins with /myfile.

User-agent: Googlebot
Disallow: /
Blocks a specific search bot (Google, in this case) from accessing the site.

User-agent: *
Disallow: /myfolder
Allow: /myfolder/myfile
Blocks bots from accessing “myfolder”, but still allows them to access “myfile”, even though it is located inside the blocked folder.

User-agent: *
Disallow:
Sitemap: https://www.mydomain.com/sitemap.xml
Allows all bots and adds the location of the sitemap.xml file to robots.txt. It’s recommended to add a line with the location of your sitemap.xml to your robots.txt file, as it makes it easier for search engines to crawl and index all of your pages.
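The set-ups above can be tried out locally before they go live. The following is a minimal sketch, assuming Python 3 and its standard urllib.robotparser module; the helper name is_allowed and the example paths are made up for illustration. One caveat: Python's parser applies the first matching rule, while Google applies the most specific one, so in this sketch the Allow line is listed before the Disallow line so that both readings agree.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A few of the common set-ups, written as robots.txt text.
ALLOW_ALL = "User-agent: *\nDisallow:"
BLOCK_ALL = "User-agent: *\nDisallow: /"
WORDPRESS = "User-agent: *\nDisallow: /wp-admin/"
BLOCK_GOOGLE = "User-agent: Googlebot\nDisallow: /"
# Allow comes first here because Python's parser uses the first matching rule.
FOLDER_EXCEPT_FILE = "User-agent: *\nAllow: /myfolder/myfile\nDisallow: /myfolder"

SITE = "https://www.mydomain.com"  # placeholder domain from the examples above

if __name__ == "__main__":
    print(is_allowed(ALLOW_ALL, "Googlebot", SITE + "/blog/"))
    print(is_allowed(BLOCK_ALL, "Googlebot", SITE + "/blog/"))
    print(is_allowed(WORDPRESS, "Googlebot", SITE + "/wp-admin/options.php"))
    print(is_allowed(BLOCK_GOOGLE, "Bingbot", SITE + "/"))
    print(is_allowed(FOLDER_EXCEPT_FILE, "Googlebot", SITE + "/myfolder/myfile"))
```

Each call answers one question: may this bot fetch this address under this set-up? Real crawlers can interpret edge cases differently, so treat the result as a sanity check rather than a guarantee.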
Do you need Robots.txt file?
Remember what we said above? If you don’t have a robots.txt file, that doesn’t mean you are in trouble. In fact, in many cases, you won’t even need one.
You may need a robots.txt file in the following scenarios:
* You want to block all or some search bots from accessing and crawling your site
* You want to block all or some search bots from accessing some of your folders or files (e.g. the /wp-admin/ folder)
* You are using paid advertising links or affiliate links
* You are developing a new website and do not want it to be accessed and crawled by bots yet
Check if you have a robots.txt file:
1. Type your site address into your browser (e.g. www.mywebsite.com)
2. Add “/robots.txt” at the end of the address so that it looks like this: www.mywebsite.com/robots.txt
If you don’t have a file there, your site will usually return a 404 (page not found) error.
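The same check can be scripted. This is a minimal sketch, assuming Python 3 and only its standard library; the function names robots_url and has_robots_txt are made up for this example, and defaulting to https when no scheme is given is an assumption.

```python
from urllib.error import HTTPError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(site: str) -> str:
    """Build the robots.txt address for a site such as www.mywebsite.com."""
    if "://" not in site:
        site = "https://" + site  # assumption: use HTTPS when no scheme is given
    parts = urlsplit(site)
    # robots.txt lives at the root of the host, so any page path is dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def has_robots_txt(site: str, timeout: float = 10.0) -> bool:
    """Return True if the site serves a robots.txt file, False on a 404."""
    try:
        with urlopen(robots_url(site), timeout=timeout) as response:
            return response.status == 200
    except HTTPError as err:
        if err.code == 404:
            return False
        raise  # other status codes (403, 500, ...) deserve a closer look


if __name__ == "__main__":
    print(robots_url("www.mywebsite.com"))
```

Calling has_robots_txt("www.mywebsite.com") makes a real network request, so it is left out of the demo at the bottom.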
Audit your robots.txt file:
If you do have a robots.txt file, make sure that it isn’t blocking anything you want search engines to reach. Most websites want to allow bots, so use the examples above to work out whether your robots.txt is blocking them.
You can also run a robots.txt test within your Search Console (formerly Google Webmaster Tools). More on that here.
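For a quick local audit, a short script can list which of your important pages a robots.txt file would block. This is a sketch only, assuming Python 3.9+ and its standard urllib.robotparser module; the function name audit and the example paths are hypothetical, and Python's parser may not match every search engine's interpretation exactly.

```python
from urllib.robotparser import RobotFileParser


def audit(robots_txt: str, must_be_crawlable: list[str], user_agent: str = "*") -> list[str]:
    """Return the paths from must_be_crawlable that robots_txt blocks for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in must_be_crawlable if not parser.can_fetch(user_agent, path)]


# Hypothetical file contents: the second rule blocks the blog by mistake.
EXAMPLE = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /blog
"""

if __name__ == "__main__":
    blocked = audit(EXAMPLE, ["/", "/blog/my-post", "/contact"])
    print(blocked)  # → ['/blog/my-post']
```

An empty list means none of the listed pages is blocked for that user-agent.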
How to add Robots.txt?
Robots.txt is a simple text file that you can create in a plain-text editor such as Notepad and then upload to the root folder of your website, so that it is reachable at www.mywebsite.com/robots.txt.
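If you would rather generate the file than type it by hand, something like the following could work. It is a minimal sketch, assuming Python 3; the function names build_robots_txt and save_robots_txt are made up for this example, and you would still upload the resulting file to your site's root folder yourself.

```python
from pathlib import Path
from typing import Optional


def build_robots_txt(disallow: list, sitemap: Optional[str] = None, user_agent: str = "*") -> str:
    """Build robots.txt text with one user-agent group and an optional sitemap line."""
    lines = [f"User-agent: {user_agent}"]
    if disallow:
        lines += [f"Disallow: {path}" for path in disallow]
    else:
        lines.append("Disallow:")  # an empty value means nothing is blocked
    if sitemap:
        lines += ["", f"Sitemap: {sitemap}"]
    return "\n".join(lines) + "\n"


def save_robots_txt(content: str, folder: str = ".") -> Path:
    """Write the content to robots.txt inside folder and return the file path."""
    target = Path(folder) / "robots.txt"
    target.write_text(content, encoding="utf-8")
    return target


if __name__ == "__main__":
    print(build_robots_txt(["/wp-admin/"], "https://www.mydomain.com/sitemap.xml"))
```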
Here is a video on how to add robots.txt with cPanel:
If you are using the Yoast plugin on your WordPress website, simply go to Yoast > Tools > File Editor.
Edit your file and then click save. If you don’t have a robots.txt file yet, you can also create one with Yoast.
List of the most popular User-agents/search bots:
Duck Duck Go