TulipTools Internet Business Owners and Online Sellers Community

Full Version: Robots.txt and The Robots Exclusion Protocol
Quote:Controlling how search engines access and index your website

What does robots.txt do?...

You may have a few pages on your site you don't want in Google's index. For example, you might have a directory that contains internal logs, or you may have news articles that require payment to access. You can exclude pages from Google's crawler by creating a text file called robots.txt and placing it in the root directory of your site. The robots.txt file contains a list of the pages that search engines shouldn't access. Creating a robots.txt is straightforward, and it gives you a sophisticated level of control over how search engines can access your web site...

full article: http://googleblog.blogspot.com/2007/01/c...ccess.html
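A minimal robots.txt covering the two scenarios the article mentions might look like this (the directory and file names here are made up for illustration; only the User-agent and Disallow syntax is standard):

```
User-agent: *
Disallow: /internal-logs/
Disallow: /news/premium/
```

Each Disallow line blocks one path prefix for the crawlers matched by the preceding User-agent line; `*` matches all well-behaved crawlers.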

Quote:The Robots Exclusion Protocol

Preventing Googlebot from following a link...

What if you didn't want valentinesday.html and promnight.html appearing in Google's index? The articles in the Breaking News section may only appear for a few hours before being updated and moved to the Articles section. In this case you want the full articles indexed, not the breaking news version. You could put the NOINDEX tag on both those pages. But if the set of pages in the Breaking News section changed frequently, it would be a lot of work to continually update the pages with the NOINDEX tag and then remove it again when they moved into the Articles section. Instead, you can add the NOFOLLOW tag to the breakingnews.html page. This tells Googlebot not to follow any links it finds on that page, thus hiding valentinesday.html and promnight.html and any other pages linked from there. Simply add this line to the <head> section of breakingnews.html:

Code:
<META NAME="ROBOTS" CONTENT="NOFOLLOW">

full article: http://googleblog.blogspot.com/2007/02/r...tocol.html
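To see how a crawler might act on that tag, here is a small standard-library sketch: it collects links from a page, but discards them all if the page carries a robots NOFOLLOW meta tag. The HTML snippet is a made-up stand-in for a page like breakingnews.html; this is an illustration of the idea, not Googlebot's actual implementation.

```python
from html.parser import HTMLParser

class NofollowAwareParser(HTMLParser):
    """Collects hrefs, and notes whether a robots NOFOLLOW meta tag is present."""
    def __init__(self):
        super().__init__()
        self.links = []
        self.nofollow = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "meta":
            name = (attrs.get("name") or "").lower()
            content = (attrs.get("content") or "").lower()
            if name == "robots" and "nofollow" in content:
                self.nofollow = True
        elif tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])

def links_to_follow(html):
    """Return the links a NOFOLLOW-respecting crawler would follow."""
    p = NofollowAwareParser()
    p.feed(html)
    return [] if p.nofollow else p.links

page = """<html><head>
<META NAME="ROBOTS" CONTENT="NOFOLLOW">
</head><body>
<a href="valentinesday.html">Valentine's Day</a>
<a href="promnight.html">Prom Night</a>
</body></html>"""

print(links_to_follow(page))  # no links followed
```

Because the flag is checked only after the whole page is parsed, the meta tag works no matter where in the document it appears relative to the links.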
I have found that I need a robots.txt file on every website. Without it I get a lot of error messages: where oh where is my robots.txt? It seems to be the local hangout for the robots; they don't know what to do with themselves without it. I made a file in Notepad, put:

User-agent: *
Disallow:

in it, called it robots.txt and stuck it in each site. Robotic bliss.

The damn thing just says OK, look at everything, which is usually the default anyway, but they like it, so who am I to complain.
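You can confirm that the two-line file really does allow everything with the standard library's robots.txt parser, a quick sanity check rather than anything definitive (the URL path is made up):

```python
import urllib.robotparser

# Feed the same two lines as the file above to the stdlib parser.
rp = urllib.robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow:"])

# An empty Disallow value means nothing is blocked.
print(rp.can_fetch("*", "/anything/at/all.html"))  # True
```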

Sh!t, I would put in Bozos.txt if it would help list me in the search engines.
