Harnessing the Power of Robots.txt

Once your web site is up and running, you need to make sure that visiting search engines can access all the pages you want them to consider. Sometimes, though, you may want to keep search engines out of certain areas of the site, or even ban a particular engine from the site altogether. This is where a simple little 2-line text file called robots.txt comes in.

Robots.txt lives in your web site's main directory (on Linux systems that is usually your /public_html/ directory) and looks something like the following:

User-agent: *
Disallow:

The first line specifies which robot the rule applies to (* means all of them); the second line lists the parts of the site that robot is not allowed to visit (leaving it empty allows everything). If you want to address multiple spiders, simply repeat the two lines for each one. For example:

User-agent: googlebot
Disallow:

User-agent: askjeeves
Disallow: /

This allows Google (user-agent name GoogleBot) to visit and index every page, while at the same time banning Ask Jeeves from the site completely. It's worth creating a robots.txt file even if you have nothing to exclude: it'll stop your error logs filling up with entries from search engines trying to request a robots.txt file that doesn't exist.

For more information on robots.txt, see the full list of resources about robots.txt at http://www.websitesecrets101.com/robotstxt-further-reading-resources.
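A common middle ground between "allow everything" and "ban the whole site" is to block individual directories while leaving the rest of the site open. A small illustration (the directory names here are hypothetical, not something the article prescribes):

User-agent: *
Disallow: /cgi-bin/
Disallow: /private/

Each Disallow line adds one path prefix that robots should stay out of; anything not listed remains crawlable.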
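If you want to confirm how a crawler will read your rules before relying on them, here is a minimal sketch using Python 3's standard urllib.robotparser module; the domain is a placeholder, so substitute your own site:

from urllib.robotparser import RobotFileParser

# Placeholder URL: point this at your own site's robots.txt.
parser = RobotFileParser()
parser.set_url("http://www.example.com/robots.txt")
parser.read()  # download and parse the file

# can_fetch(user_agent, url) reports whether that agent may crawl the URL.
print(parser.can_fetch("googlebot", "http://www.example.com/"))  # True under "Disallow:"
print(parser.can_fetch("askjeeves", "http://www.example.com/"))  # False under "Disallow: /"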
