Robots.txt, a file residing in the root directory of your website that gives directions to spiders and crawlers, is one of the most underappreciated items on your SEO list. This file follows the Robots Exclusion Standard, also known as the Robots Exclusion Protocol. It is a standard websites use to tell web crawlers and spiders whether or not to crawl a given webpage.
According to Wikipedia: The standard specifies the instruction format to be used to inform the robot about which areas of the website should not be processed or scanned. Robots are often used by search engines to categorize and archive web sites, or by webmasters to proofread source code. Not all robots cooperate with the standard including email harvesters, spambots and malware robots that scan for security vulnerabilities. The standard is different from, but can be used in conjunction with, Sitemaps, a robot inclusion standard for websites.
Why Should You Care About Robots.txt?
- Improper usage of the robots.txt file can hurt your ranking
- The robots.txt file controls how search engine spiders see and interact with your webpages
- This file is mentioned in several of Google’s guidelines
- This file, and the bots it interacts with, are fundamental parts of how search engines work
What You Should Do First :-
- Check if you have a robots.txt file already.
- If yes, check whether it is blocking important files from crawlers and spiders.
- If no, decide whether you need one.
Determining The Existence Of Robots.txt :-
To check whether a robots.txt file already exists, just enter your site’s URL into the browser’s address bar and append /robots.txt to it.
For Example :- www.technonerdz.org/robots.txt
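The same check can be scripted. The following is only a minimal sketch using Python’s standard library; the helper names `robots_url` and `has_robots_txt` and the `example.com` address are illustrative assumptions, not part of any tool mentioned in this article.

```python
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def robots_url(site: str) -> str:
    """Build the robots.txt URL for a site (hypothetical helper)."""
    # Add a scheme if the address was typed without one.
    if "://" not in site:
        site = "https://" + site
    parts = urlparse(site)
    # robots.txt lives at the root of the host, whatever page was given.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def has_robots_txt(site: str, timeout: float = 10.0) -> bool:
    """Return True if the site answers the robots.txt URL with HTTP 200."""
    request = Request(robots_url(site), headers={"User-Agent": "robots-check-sketch"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers HTTP errors such as 404, DNS failures and timeouts.
        return False


print(robots_url("www.example.com/blog/post"))
```

Calling `has_robots_txt("www.example.com")` performs a real HTTP request, so it needs network access; `robots_url` only builds the address.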
Determining Robots.txt’s Effect On SEO :-
To determine whether your robots.txt is blocking important files that search engines need in order to rank your page, you can use this tool by FeedtheBot. The tool is based mainly on Google’s guidelines for webmasters.
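A rough local version of this kind of check can be sketched with Python’s built-in `urllib.robotparser` module. This is not how the FeedtheBot tool works; the `blocked_paths` helper, the sample rules and the list of "important" paths are all assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt: str, paths: list[str], agent: str = "*") -> list[str]:
    """Return the paths that the given robots.txt text blocks for an agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(agent, path)]


# Hypothetical robots.txt content and a hypothetical list of important files.
sample_rules = """\
User-agent: *
Disallow: /css/
Disallow: /private/
"""
important = ["/index.html", "/css/site.css", "/js/app.js"]

print(blocked_paths(sample_rules, important))
```

With these sample rules, only the stylesheet under /css/ is reported as blocked.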
But to understand completely how robots.txt works, you need to understand its contents yourself.
Keep reading to learn whether your site needs a robots.txt file or not.
Do You Need A Robots.txt File?
Many websites don’t need a robots.txt file, but including one doesn’t hurt either. If you are not sure whether your site needs it, go through the following points; if any of them holds true for you, you should have a robots.txt file.
- You want some of your content to be blocked from search engines and site crawlers.
- You want your live but still unfinished site to stay out of the index until it is fully developed.
- You want to keep unwanted bots from crawling your site and needlessly loading your server (bearing in mind that badly behaved bots may simply ignore the file).
- You need to give proper directions to bots for affiliate or paid links on your site.
- You need one or all of the above things.
If you decide that you are better off without a robots.txt file, that’s OK, but in that case bots will have full access to your site. If you want to create this file, you can follow the easy guidelines below.
How To Create Robots.txt For Your Site :-
Robots.txt is nothing but a text file in your site’s root directory. To create one, just open a text editor and start typing the directives you want for the crawlers.
Directives :-
Allow Indexing Of Everything : If you want the spiders to crawl and index everything on your website add these rules to your robots.txt.
User-Agent: *
Allow: /
Disallow Indexing Of Everything : To block the spiders from your site completely, you need to use these directives.
User-Agent: *
Disallow: /
Disallow Indexing Of A Specific Folder : Add these directives to block crawlers from just one specific folder on your site.
User-Agent: *
Disallow: /folder/
Disallow Access To A Particular Bot : Sometimes you want to block one particular bot, for reasons such as content scraping, spamming or other malicious activity. Name that bot in the User-Agent line; the example here uses Googlebot.
User-Agent: Googlebot
Disallow: /
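Before uploading rules like these, you can try them against a parser. The sketch below uses Python’s standard `urllib.robotparser`, which is only one implementation of the standard, so real crawlers may interpret edge cases differently. The `can_crawl` helper, the `ExampleBot` name and the test paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if the rules allow the agent to fetch the path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# The four rule sets described above, one robots.txt text per entry.
rule_sets = {
    "allow everything": "User-Agent: *\nAllow: /\n",
    "block everything": "User-Agent: *\nDisallow: /\n",
    "block one folder": "User-Agent: *\nDisallow: /folder/\n",
    "block one bot": "User-Agent: Googlebot\nDisallow: /\n",
}

# "ExampleBot" is a made-up crawler name used for illustration.
for name, rules in rule_sets.items():
    for agent in ("Googlebot", "ExampleBot"):
        for path in ("/about.html", "/folder/page.html"):
            print(name, agent, path, can_crawl(rules, agent, path))
```

Each printed line shows the rule set, the bot, the path and whether crawling is allowed, which makes it easy to spot a rule that blocks more than you intended.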
Set Crawl Rate : Setting a crawl rate means advising crawlers and spiders how much traffic they may send to your site in a given amount of time. Note that this can make search engines that honor the directive visit your site less often.
User-agent: bingbot
Crawl-delay: 10
Here the 10 is in seconds.
Note that Google doesn’t support the crawl delay directive in robots.txt, but you can set a crawl limit from its webmaster tools.
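To see how a crawler that honors this directive might read it, here is a small sketch using the `crawl_delay` method of Python’s `urllib.robotparser` (available since Python 3.6). The `crawl_delay_for` helper, its `fallback` parameter and the `ExampleBot` name are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# The crawl-delay rules from the example above.
RULES = """\
User-agent: bingbot
Crawl-delay: 10
"""


def crawl_delay_for(robots_txt: str, agent: str, fallback: float = 0.0) -> float:
    """Return the delay in seconds requested for the agent, or the fallback."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(agent)
    # crawl_delay() returns None when no rule applies to this agent.
    return fallback if delay is None else float(delay)


print(crawl_delay_for(RULES, "bingbot"))     # delay requested for bingbot
print(crawl_delay_for(RULES, "ExampleBot"))  # no rule for this bot, so the fallback
```

A polite crawler would then wait that many seconds between requests, for example with `time.sleep(crawl_delay_for(RULES, "bingbot"))`.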
Conclusion :-
Now that you know how to create and use a robots.txt file, it’s up to you to implement it on your site. For further reading, I recommend this great article from SEOBOOK: Robots.txt Tutorial.
If this article helped you create a highly optimized robots.txt for your site or if you find it useful enough then don’t forget to share it among your peers.