
Working with the robots.txt file




What is the robots.txt file?

The robots.txt file is an ASCII text file that gives search engine robots specific instructions about content they are not allowed to index. These instructions determine how a search engine indexes your website's pages. The universal address of the robots.txt file is www.domain.com/robots.txt. This is the first file a robot visits: it picks up the instructions for indexing the site content and follows them. The file uses two text fields. Let's study this example:

User-agent: *
Disallow:

The User-agent field specifies the name of the robot to which the access policy in the Disallow field applies. The Disallow field specifies the URLs that the named robots may not access. An empty Disallow field, as above, bans nothing. An example:

User-agent: *
Disallow: /

Here "*" means all robots and "/" means all URLs. This is read as "No access for any search engine to any URL". Since every URL begins with "/", a Disallow value of "/" with nothing after it bans access to all URLs. If only part of the site is to be closed off, only the banned URL is specified in the Disallow field. Let's consider this example:

# Research access for Googlebot.
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /concepts/new/

Here both fields are repeated. Separate records, each on its own lines, can give different commands to different user agents. The commands above mean that every user agent is banned from /concepts/new/ except Googlebot, which has full access. Characters following # are ignored up to the end of the line, as they are treated as comments.
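The reading of this example can be checked with a short script. This is a minimal sketch that uses Python's standard urllib.robotparser module as a stand-in for a real search engine robot; the robot name SomeOtherBot and the page URLs are made up for illustration, and other robots may interpret the file differently.

```python
from urllib.robotparser import RobotFileParser

# The example rules, written with no blank lines inside a record.
RULES = """\
# Research access for Googlebot.
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /concepts/new/
"""


def can_fetch(rules: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and ask whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    banned_page = "http://www.domain.com/concepts/new/index.html"
    # Googlebot has an empty Disallow field, so it may fetch the page.
    print("Googlebot:", can_fetch(RULES, "Googlebot", banned_page))
    # Every other robot falls under the "*" record and is kept out.
    print("SomeOtherBot:", can_fetch(RULES, "SomeOtherBot", banned_page))
```

Pages outside /concepts/new/ stay open to every robot, because nothing in the file disallows them.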

Working with the robots.txt file

1. The robots.txt file name is always all lowercase (e.g. Robots.txt and robots.Txt are
incorrect).

2. Wildcards are not supported in either field. Only * can be used, in the User-agent field's
command syntax, because it is a special character denoting "all". Googlebot is the only robot
that now supports some wildcard file extensions.
Ref: http://www.google.com/support/webmasters/bin/topic.py?topic=8475

3. The robots.txt file is an exclusion file meant for search engine robot reference and not
obligatory for a website to function. An empty or absent file simply means that all robots are
welcome to index any part of the website.

4. Only one robots.txt file can be maintained per domain.

5. Website owners who do not have administrative rights sometimes cannot create a robots.txt
file. In such situations, the Robots Meta Tag can be configured to serve the same purpose.
Keep in mind that questions have lately been raised about robot behavior regarding the
Robots Meta Tag: some robots might skip it altogether. Protocol makes it obligatory for all
robots to start with the robots.txt file, making it the default starting point for all
robots.

6. Each user agent needs its own User-agent line, and a Disallow line should not carry more
than one command. There is no limit to the number of lines, though: both the User-agent and
Disallow fields can be repeated with different commands any number of times. Do not put
blank lines inside a single record of the two fields, because a blank line ends the record.

7. Use lowercase for robots.txt file content wherever you can. Note that filenames on Unix
systems are case sensitive, so take care that directory and file names match their actual
case on Unix-hosted domains.
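The blank-line rule in point 6 is easy to get wrong, so here is a small sketch of its effect. It assumes Python's standard urllib.robotparser module behaves like a typical robot, which is an assumption and not a guarantee; AnyBot is a made-up robot name.

```python
from urllib.robotparser import RobotFileParser


def blocks_everything(rules: str) -> bool:
    """Return True if the rules stop a generic robot from fetching "/"."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch("AnyBot", "/")


# One record, both fields on consecutive lines.
COMPACT = "User-agent: *\nDisallow: /\n"
# The same two fields, separated by a blank line.
SPLIT = "User-agent: *\n\nDisallow: /\n"

if __name__ == "__main__":
    print("compact record bans all:", blocks_everything(COMPACT))
    print("split record bans all:  ", blocks_everything(SPLIT))
```

In this parser the blank line ends the record before the Disallow line is read, so the ban is silently lost in the second case.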

You can use the robots.txt Validator to check your robots.txt from http://www.searchengineworld.com.

Advantages of the robots.txt file

Protocol demands that all search engine robots start with the robots.txt file. This is the default entry point for robots if the file is present. Specific instructions can be placed on this file to help index your site on the web. Major search engines will never violate the Standard for Robots Exclusion.

1. The robots.txt file can be used to keep out unwanted robots such as email harvesters, image
strippers etc.

2. The robots.txt file can be used to specify the directories on your server that you don't want
robots to access and/or index e.g. temporary, cgi, and private/back-end directories.

3. An absent robots.txt file could generate a 404 error and redirect the robot to your default
404 error page. Careful research showed that sites with a customized 404 error page and no
robots.txt file would serve that page to the robots. The robot is bound to treat it as the
robots.txt file, which can confuse its indexing.

4. The robots.txt file is used to direct select robots to relevant pages to be indexed. This
especially comes in handy where the site has multilingual content or where the robot is
searching for only specific content.

5. The need for the robots.txt file was also felt to stop robots from deluging servers with
rapid-fire requests or re-indexing the same files repeatedly. If you have duplicate content on
your site for any reason, you can keep it from being indexed. This will help you avoid any
duplicate content penalties.

Disadvantages of the robots.txt file

Careless handling of directory names and filenames can lead hackers to snoop around your site by studying the robots.txt file, since you may sometimes list filenames and directories that hold classified content. This is not a serious issue, as deploying effective security checks on the content in question takes care of it. For example, if your traffic log sits at a URL such as www.domain.com/stats and you do not want robots to index it, you would add a command to your robots.txt file. As an example:

User-agent: *
Disallow: /stats/

However, it is easy for a snooper to guess what you are trying to hide, and simply typing the URL www.domain.com/stats into a browser would give access to it. This calls for one of the following remedies:

1. Change file names:

* Change the stats filename from index.php to something different, such as stats-new.php, so that your stats URL becomes www.domain.com/stats/stats-new.php

* Place a simple text file containing the text "Sorry, you are not authorized to view this page" and save it as index.php in your /stats/ directory.

This way the snooper cannot guess your actual filename and get to your banned content.

2. Use login passwords:
Password-protect the sensitive content listed in your robots.txt file.
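
Before publishing a robots.txt file, it can help to look at it the way a snooper would. The sketch below lists every path the file gives away so that each one can be reviewed and protected. The function name disallowed_paths is hypothetical, and the parsing is deliberately simple.

```python
def disallowed_paths(rules: str) -> list[str]:
    """Collect every non-empty Disallow value from robots.txt text."""
    paths = []
    for raw in rules.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /stats/\n"
    for path in disallowed_paths(rules):
        print("publicly listed:", path)
```

Every path printed here is visible to anyone who requests the robots.txt file, so each one should be safe to expose or sit behind a password.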

Optimization of the robots.txt file

The Right Commands in robots.txt:
Use correct commands. The most common error is putting the command meant for the "User-agent" field in the "Disallow" field and vice versa.

Please also note that there is no "Allow" command in the standard robots.txt protocol. Content not blocked in the "Disallow" field is considered allowed. Currently, only two fields are recognized: the "User-agent" field and the "Disallow" field. Experts are considering adding more robot-recognizable commands to make the robots.txt file more Webmaster- and robot-friendly.

Note: Google is the only search engine that is experimenting with certain new robots.txt commands. It recognizes the "Allow" command. Please read the details on the Google site for robots.txt usage.

Bad Syntax:

Do not put multiple file URLs in one Disallow line in the robots.txt file. Use a new Disallow line for every directory that you want to block access to.

Incorrect robots.txt example:

User-agent: *
Disallow: /concepts/ /links/ /images/

Correct robots.txt example:

User-agent: *
Disallow: /concepts/
Disallow: /links/
Disallow: /images/
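
The difference between the two versions can be tested directly. This is a sketch using Python's standard urllib.robotparser module as one example of a robot; AnyBot and the page names are made up, and other robots may react to the bad line in other ways.

```python
from urllib.robotparser import RobotFileParser


def blocked(rules: str, path: str) -> bool:
    """Return True if a generic robot may NOT fetch `path` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch("AnyBot", path)


INCORRECT = "User-agent: *\nDisallow: /concepts/ /links/ /images/\n"
CORRECT = (
    "User-agent: *\n"
    "Disallow: /concepts/\n"
    "Disallow: /links/\n"
    "Disallow: /images/\n"
)

if __name__ == "__main__":
    for path in ("/concepts/a.html", "/links/b.html", "/images/c.gif"):
        print(path, "incorrect:", blocked(INCORRECT, path),
              "correct:", blocked(CORRECT, path))
```

With this parser the combined line is read as one long path that matches none of the three directories, so nothing is blocked at all.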

Files and Directories:

If a specific file has to be disallowed, end the entry with the file extension, without a forward slash at the end. Study the following robots.txt example:

For file:

User-agent: *
Disallow: /hilltop.phpl

For Directory:

User-agent: *
Disallow: /concepts/

Remember that if you have to block access to all files in a directory, you don't have to list each and every file in robots.txt. You can simply block the directory as shown above. Another common error is leaving out the slashes altogether, which sends a very different message than intended.
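
To see why the trailing slash matters, the sketch below compares a directory entry with the same entry minus its final slash. It assumes the robot matches Disallow values as simple path prefixes, which is how Python's standard urllib.robotparser module works; AnyBot and the page names are made up.

```python
from urllib.robotparser import RobotFileParser


def blocked(rules: str, path: str) -> bool:
    """Return True if a generic robot may NOT fetch `path` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch("AnyBot", path)


WITH_SLASH = "User-agent: *\nDisallow: /concepts/\n"
WITHOUT_SLASH = "User-agent: *\nDisallow: /concepts\n"

if __name__ == "__main__":
    for path in ("/concepts/page.html", "/concepts.html"):
        print(path, "with slash:", blocked(WITH_SLASH, path),
              "without slash:", blocked(WITHOUT_SLASH, path))
```

Without the trailing slash the entry also catches /concepts.html, a file that lives outside the directory.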

The Right Location for the robots.txt file:

No robot will access a badly placed robots.txt file. Make sure that the location is www.domain.com/robots.txt.

Capitalization in robots.txt

Never capitalize your syntax commands. Directory and filenames are case sensitive on Unix platforms. The only capitals used per the standard are in "User-agent" and "Disallow".
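
Case sensitivity of paths can be illustrated in the same way. In this sketch the directory name /Private/ and the robot name AnyBot are made up, and Python's standard urllib.robotparser module stands in for a robot that compares paths exactly as written.

```python
from urllib.robotparser import RobotFileParser


def blocked(rules: str, path: str) -> bool:
    """Return True if a generic robot may NOT fetch `path` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch("AnyBot", path)


# The directory on the server is assumed to be spelled "Private".
RULES = "User-agent: *\nDisallow: /Private/\n"

if __name__ == "__main__":
    print("/Private/report.html blocked:", blocked(RULES, "/Private/report.html"))
    print("/private/report.html blocked:", blocked(RULES, "/private/report.html"))
```

Only the spelling that matches the Disallow line is blocked, so the entry has to use the same case as the directory on the server.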

Correct Order for robots.txt:

If you want to block access to all robots except one or more specific ones, mention the specific ones first. Let's study this robots.txt example:

User-agent: *
Disallow: /

User-agent: MSNbot
Disallow:

In the above case, MSNbot would read the first command, which bans all robots, and simply leave the site without indexing. The correct syntax is:

User-agent: MSNbot
Disallow:

User-agent: *
Disallow: /
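
If the file is generated by a script, the ordering rule can be built in once and then forgotten. The sketch below is one possible helper, not a standard tool: build_robots_txt and its policy argument are hypothetical names. It writes named robots first, the catch-all "*" record last, one Disallow per line, and a single blank line between records.

```python
def build_robots_txt(policy: dict[str, list[str]]) -> str:
    """Render {user_agent: [disallowed paths]} as robots.txt text.

    Named robots come first and the "*" record last. An empty path
    list becomes an empty Disallow field, meaning full access.
    """
    # False sorts before True, so "*" moves to the end; the sort is stable.
    agents = sorted(policy, key=lambda agent: agent == "*")
    records = []
    for agent in agents:
        lines = [f"User-agent: {agent}"]
        for path in policy[agent] or [""]:
            lines.append(f"Disallow: {path}".rstrip())
        records.append("\n".join(lines))
    return "\n\n".join(records) + "\n"


if __name__ == "__main__":
    # The catch-all is listed first here on purpose; the helper reorders it.
    print(build_robots_txt({"*": ["/"], "MSNbot": []}), end="")
```

Run as a script, this prints the corrected file shown above, with the MSNbot record ahead of the "*" record.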

The robots.txt file:

Not having a robots.txt file at all could generate a 404 error for search engine robots, which could redirect the robot to the default 404 error page or your customized 404 error page. If this happens seamlessly, it is up to the robot to decide whether the target file is a robots.txt file or an HTML file. Typically this would not cause many problems, but you may not want to risk it. It is always better to put the standard robots.txt file in the root directory than to have none at all.
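
One way to find out whether your own site has this problem is to look at what is actually served at /robots.txt. The check below is only a rough heuristic, and looks_like_robots_txt is a hypothetical helper name. It rejects anything that contains markup and accepts text that has at least one User-agent or Disallow line. Note that it also reports an empty file as not looking like robots.txt, even though an empty file is valid.

```python
def looks_like_robots_txt(body: str) -> bool:
    """Heuristic: True if `body` reads like robots.txt, not an HTML page."""
    saw_field = False
    for raw in body.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments and padding
        if not line:
            continue
        if line.startswith("<"):
            return False  # markup, e.g. a customized 404 error page
        field, sep, _ = line.partition(":")
        if sep and field.strip().lower() in {"user-agent", "disallow"}:
            saw_field = True
    return saw_field


if __name__ == "__main__":
    print(looks_like_robots_txt("User-agent: *\nDisallow:\n"))
    print(looks_like_robots_txt("<html><body>Page not found</body></html>"))
```

The text to check can be copied from a browser after opening www.domain.com/robots.txt.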

The standard robots.txt file for allowing all robots to index all pages is:

User-agent: *
Disallow:

Using # Carefully in the robots.txt file:
Adding comments with "#" after the syntax commands on the same line is not a good idea. Some robots might misinterpret the line, although it is acceptable per the robots exclusion standard. Separate lines are always preferred for comments.
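
An existing file with trailing comments can be tidied up mechanically. The following sketch moves each trailing comment onto its own line just above the command it belonged to. The function name move_trailing_comments is hypothetical, and the code assumes that "#" never appears inside a path.

```python
def move_trailing_comments(rules: str) -> str:
    """Rewrite 'Field: value # note' as '# note' followed by 'Field: value'."""
    out = []
    for raw in rules.splitlines():
        code, sep, comment = raw.partition("#")
        if sep and code.strip():
            out.append("#" + comment.rstrip())  # comment on its own line
            out.append(code.rstrip())           # command without the comment
        else:
            out.append(raw)  # plain command, comment-only line, or blank line
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    original = "User-agent: *\nDisallow: /stats/ # traffic logs\n"
    print(move_trailing_comments(original), end="")
```

Lines that are already comment-only, and the blank lines between records, are left exactly as they were.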

Using the robots.txt file

* Robots are configured to read text. Too much graphic content could render your pages invisible
to the search engine. Use the robots.txt file to block irrelevant and graphic-only content.

* Indiscriminate access to all files, it is believed, can dilute relevance to your site content
after being indexed by robots. This could seriously affect your site's ranking with search
engines. Use the robots.txt file to direct robots to content relevant to your site's theme by
blocking the irrelevant files or directories.

* The robots.txt file can be used for multilingual websites to direct robots to relevant content
for relevant topics for different languages. It ultimately helps the search engines to present
relevant results for specific languages. It also helps the search engine in its advanced
search options where language is a variable.

* Some robots could cause severe server load problems by rapid-firing too many requests at
peak hours. This could affect your business. Excluding robots that are irrelevant to your
site in the robots.txt file takes care of this problem. It is really not a good idea to let
malevolent robots use up precious bandwidth to harvest your emails, images etc.

* Use the robots.txt file to block out folders with sensitive information, text content, demo
areas or content yet to be approved by your editors before it goes live.

The robots.txt file is an effective tool to address certain issues regarding website ranking. Used in conjunction with other SEO strategies, it can significantly enhance a website's presence on the net.

© Copyright 2006, RedAlkemi

Author: Atul Gupta
