Keeping robots, spiders and wanderers away from your site using robots meta tags and other methods
Trying to keep those search engine spiders, wanderers and other cataloging robots away from your top secret web pages? Here are some measures you can take:
robots.txt

The "Robots Exclusion Protocol", designed to help web administrators and authors of web spiders agree on a way to navigate and catalog sites, requires that you place a plain text file named "robots.txt", containing spidering rules, in the root directory of your site. The file must reside in the root directory of the main site, not in any other directory. For example, if your site is www.dotspecialist.com, the file must be accessible from http://www.dotspecialist.com/robots.txt

The content of the robots.txt file mostly consists of two main commands: "User-agent" and "Disallow". The "User-agent:" command specifies the name or signature of the robot to which the spidering commands following it apply. You can set this to * so that the commands apply to any robot not identified elsewhere in the robots.txt file. The other command, "Disallow:", specifies a partial URL, matched from the start of the path, that the previously identified robot should ignore (not index). If you leave this field empty, the specified robot is permitted to navigate any and all pages in your site.
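To see how "User-agent" and "Disallow" interact, here is a minimal sketch that feeds a sample robots.txt to Python's standard urllib.robotparser module and asks which pages a robot may fetch. The robot names "ExampleBot" and "OtherBot" and the paths /private/ and /drafts/ are made up for illustration; only the host name comes from the example above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only: "ExampleBot" is a made-up
# robot name, and /private/ and /drafts/ are made-up paths.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow:

User-agent: *
Disallow: /private/
Disallow: /drafts/
"""


def build_parser(text):
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
site = "http://www.dotspecialist.com"

# ExampleBot has an empty Disallow, so it may visit every page.
print(parser.can_fetch("ExampleBot", site + "/private/plan.html"))  # True

# Any robot not named elsewhere falls under the "*" section.
print(parser.can_fetch("OtherBot", site + "/private/plan.html"))    # False
print(parser.can_fetch("OtherBot", site + "/index.html"))           # True
```

Keep in mind that this only shows how a well-behaved robot would read the rules. Nothing in robots.txt forces a robot to obey them.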