What are Robots.txt files
In this tutorial we'll see what a robots.txt file is, how you can make one, what its uses are, and how you can use it on your own website.
What are Robots.txt files?
A robots.txt file is a plain text file (yes, the kind you can create in Notepad) that resides on your server and tells search engine crawlers which parts of your website they may visit, whatever platform the site is built on. It usually contains only a few lines, but it is powerful: it can influence whether your website shows up on Google at all, and which parts of it are exposed to search engines such as Google, Yahoo and MSN.
What is a Robots.txt file technically?
To understand what robots.txt files are, you first have to understand what a robot (the web kind) is.
A robot is a program that search engines such as Google, Yahoo and MSN send out across the internet to discover new websites, index them and gather information about them. Robots are sometimes called "spiders", "crawlers" or simply "bots".

Image by Elliance
Where do the Robots come from?
Robots are commonly sent out by search engines such as Google, Yahoo, MSN, Altavista, Ask.com and others. They run on the search engines' own servers, which are constantly on the lookout for information on the internet. They gather that information (which ultimately ends up in the search engine's index) by visiting websites, collecting new content from them, following links and analyzing what they find.
What do Robots do?
Robots mainly perform the following types of tasks.
- Site indexing - Taking a copy of each new website the robot finds and storing it on the search engine's servers. This is done by scanning the documents on the website and copying them.
- Reading the site code - Parsing each page's code to extract its text and links. Robots are tolerant of code that does not validate against W3C standards, but badly broken code can keep them from reading a page properly.
- Link checks - Tracing all possible links (incoming and outgoing) from indexed websites, and using them to calculate grading factors for the site such as authority and relevance.
Not all search engine robots are the same. Some advanced ones, like Google's, are known to perform more complex tasks such as categorizing websites and analyzing their search metrics and popularity, but in general all robots perform the tasks above.

'The Life of a Google Robot' image by Google
What does a Robots.txt file do?
A robots.txt file gives instructions to the robots visiting your website, telling them which parts of the site they may crawl.
It is like the help desk at an event, which gives visitors information and guidance about how to reach the venue, the important places, the schedule, the map and so on.
The instructions in the robots.txt file are completely configurable by the webmaster. Keep in mind that they are requests rather than enforcement: well-behaved robots follow them, but nothing forces a robot to comply.
Using the right instructions, a webmaster can decide which search engines are allowed into the website and which documents are or are not available to them. Some robots also honor extensions to the basic syntax, such as a Sitemap line that points to a list of the site's pages, or the non-standard Crawl-delay line that asks a robot to slow down its visits.
Where to spot the Robots.txt file?
The robots.txt file is located in the root folder of your website. On many hosts this is the public_html or httpdocs folder. The root folder is the topmost directory of the website that is accessible to the public.
It is critical to place the robots.txt file in the root folder. Robots only request it from the root of the site (for example www.example.com/robots.txt), so a file placed anywhere else is ignored.
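To illustrate why the location matters, here is a small Python sketch of how a robot might work out where to look for the file. The function name robots_url and the example.com addresses are made up for this example, and the sketch assumes the address includes the http:// or https:// part.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the address where a robot would look for robots.txt.

    Hypothetical helper: it keeps the scheme and host of the page
    and replaces everything after the host with /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Whatever page the robot starts from, the file is expected at the site root.
print(robots_url("https://www.example.com/blog/2020/post.html?page=2"))
```

However deep the starting page is, the result points at the root of the site, which is why a robots.txt file stored in a subfolder is never found.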
Why are Robots.txt files and Robots important to a webmaster?
For a webmaster, robots.txt matters because it helps robots spend their time on the pages that count. That leads to better indexing of the website, and better indexing can in turn help its search engine rankings.
With a well-written robots.txt file, the webmaster can decide how the website is crawled, which strongly influences how it is indexed. That gives them a good deal of control (though not complete control) over how a search engine "sees" the website, which is crucial.
What does a Robots.txt file look like?
These are screenshots of two robots.txt files. The first (Screenshot A) uses a minimal set of commands, while the second (Screenshot B) uses a longer and more complex set.
(Screenshot A)
(Screenshot B)
If you would like to see more robots.txt files, type the domain name of any website followed by /robots.txt into your browser. If the site uses a robots.txt file, it will show up. (Ex: www.google.com/robots.txt, www.yahoo.com/robots.txt)
How and what can you control using Robots.txt?
You can control how search engines "crawl" your website by using the right commands in robots.txt. To learn more about the commands, let's first take a look at the general syntax.
Robots.txt syntax and commands.
1. User-agent:
This command specifies which search engine robot the rules that follow it apply to. (In the examples here, the text in parentheses is an explanation and is not part of the file.)
For example:
User-agent: Googlebot (Means robots/crawlers from Google)
User-agent: Teoma (Means robots/crawlers from Ask.com)
Each search engine documents the exact name its robot answers to. In the robots.txt file, you can address each user agent by name, or address all of them at once by using the asterisk.
User-agent: * (Means all search engine robots/crawlers)
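To see how a named user agent and the asterisk work side by side, here is a minimal sketch using Python's standard urllib.robotparser module. The rules, the /private/ folder and the robot name SomeOtherBot are invented for this example, and real search engines may resolve overlapping rules somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for Googlebot, one for every other robot.
RULES = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# The Googlebot group applies only to Googlebot.
googlebot_allowed = parser.can_fetch("Googlebot", "/private/page.html")
# Any robot without its own group falls back to the "*" group.
otherbot_allowed = parser.can_fetch("SomeOtherBot", "/private/page.html")

print("Googlebot:", googlebot_allowed)
print("SomeOtherBot:", otherbot_allowed)
```

With these rules, Googlebot is turned away from the page while the other robot is let in, because each robot follows the group that names it and only uses the asterisk group when no group does.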
2. Allow/Disallow:
These commands tell a user agent which parts of the website it may crawl (Allow) or may not crawl (Disallow).
The path after the command names the directory or file in question. A rule matches every address whose path starts with the path you give.
For example:
User-agent: *
Disallow: / (Means no robot may crawl anything under the root folder, which is the entire website)
User-agent: *
Disallow: /temp/ (Means no robot may crawl the folder named "temp", while the rest of the website may be crawled)
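The "temp" example above can be tried out with Python's standard urllib.robotparser module. This is only a sketch: the page names and the robot name AnyBot are made up, and the parser stands in for a real search engine robot.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /temp/"])

# Hypothetical pages on the site.
PAGES = (
    "/temp/draft.html",
    "/temp/images/logo.png",
    "/index.html",
    "/temporary.html",
)

# Ask, for each page, whether a robot may crawl it.
results = {page: parser.can_fetch("AnyBot", page) for page in PAGES}

for page, allowed in results.items():
    print(page, "allowed" if allowed else "blocked")
```

Everything inside /temp/ is blocked, including its subfolders. /temporary.html stays allowed because its path does not start with /temp/ (note the trailing slash in the rule).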
How to set up a Robots.txt file?
Setting up a robots.txt file can be tricky if you don't know the basic commands, so make sure you have studied the basics above before you proceed.
Step 1
Open a new text document on your machine.
Step 2
In it, type the following text exactly as shown.
User-agent: *
Disallow:
(This means that all user agents are allowed to crawl your entire website.)
Save it as "robots.txt". Use all lowercase letters: robots look for that exact name, and file names are case-sensitive on many servers.
Step 3
Connect to your server using the file manager or FTP, and go to the root folder (normally public_html or httpdocs; if yours is different, ask your host).
Step 4
Upload the "robots.txt" file to the root folder.
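If you prefer, Steps 1 and 2 can be scripted instead of typed by hand. The sketch below is one possible way to do it in Python. The function name write_allow_all is made up, and a temporary folder stands in for wherever you keep your website files before uploading them.

```python
import tempfile
from pathlib import Path

# The two lines from Step 2: every robot may crawl everything.
ALLOW_ALL = "User-agent: *\nDisallow:\n"


def write_allow_all(directory: str) -> Path:
    """Create an all-lowercase robots.txt file in the given folder."""
    target = Path(directory) / "robots.txt"
    target.write_text(ALLOW_ALL, encoding="utf-8")
    return target


# A temporary folder is used here only so the example cleans up after itself.
with tempfile.TemporaryDirectory() as staging_dir:
    path = write_allow_all(staging_dir)
    written = path.read_text(encoding="utf-8")
    print(path.name)
    print(written)
```

The upload in Steps 3 and 4 is left out on purpose, since it depends on your host.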
Your robots.txt file is now set up. Note that the commands above allow all search engine robots to crawl the entire site without any restriction. If you would like to block certain files or folders from being crawled, use the commands below.
1. Exclude a file from an individual search engine.
User-agent: Googlebot
Disallow: /thepathtoyourfile.html
Replace "Googlebot" with the user agent name of the search engine you want to restrict, and replace "/thepathtoyourfile.html" with the actual path to your file. If you would like to block more than one file, repeat the Disallow line once for each file.
Ex: Disallow: /file1.html
Disallow: /file2.html
2. Exclude a section of your site from all spiders and bots
User-agent: *
Disallow: /1/2/dir-to-be-blocked/
Replace "/1/2/dir-to-be-blocked/" with the actual path to the directory that is to be blocked.
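3. Allow all spiders to index everything
User-agent: *
Disallow:
OR
Leave the robots.txt file blank, without any commands.
4. Allow no spiders to index any part of your site
User-agent: *
Disallow: /
This asks every spider to stay away from your entire site. Robots that obey robots.txt will not crawl any page, although a search engine may still list the address of a page it discovers through links on other websites.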
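Before uploading any of the four recipes above, you can check what it does. The sketch below runs each recipe through Python's standard urllib.robotparser module. The robot name OtherBot and the page /index.html are invented for this example, and a real search engine may read edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The recipes from the list above, one entry per variant.
RECIPES = {
    "1. files, one robot": [
        "User-agent: Googlebot",
        "Disallow: /file1.html",
        "Disallow: /file2.html",
    ],
    "2. folder, all robots": ["User-agent: *", "Disallow: /1/2/dir-to-be-blocked/"],
    "3. allow everything": ["User-agent: *", "Disallow:"],
    "3. blank file": [],
    "4. block everything": ["User-agent: *", "Disallow: /"],
}

# Hypothetical (robot, page) pairs to test against every recipe.
CHECKS = [
    ("Googlebot", "/file1.html"),
    ("OtherBot", "/file1.html"),
    ("OtherBot", "/1/2/dir-to-be-blocked/page.html"),
    ("OtherBot", "/index.html"),
]


def allowed(rules, robot, page):
    """Parse the rules and report whether the robot may crawl the page."""
    parser = RobotFileParser()
    parser.parse(rules)
    return parser.can_fetch(robot, page)


report = {
    name: [allowed(rules, robot, page) for robot, page in CHECKS]
    for name, rules in RECIPES.items()
}

for name, answers in report.items():
    print(name, answers)
```

Each row of the report lists, in the order of CHECKS, whether the robot may crawl the page. The blank file and the empty Disallow line give the same answers, which is why either one works for recipe 3.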
Free Robots.txt file generators
There are quite a number of free online robots.txt file generators. Here are a few.
1. Mcanerin - This tool lets you select the search engine robots you would like to block and creates a robots.txt file that you just need to copy and paste.
2. Global promoter Robots.txt generator - An excellent tool that helps you generate a robots.txt file with the help of a wizard.
Summary
Essentially, robots.txt is an excellent tool for controlling how search engines scan your website and gather information from it. The more carefully you plan your website, the better your search engine positions are likely to be. Yet many websites simply ignore this and leave everything for the search engines to decide. Is that a good thing to do? I'd say it all depends on what you want. If you think that not showing a folder's content to Google will keep unnecessary information from being passed to it, and you know exactly how to accomplish that, then why not use the options?
What are Robots.txt files - To learn more about this author, visit Jeff Foster's website.