Using a robots.txt file
- Saturday, October 02 2004 @ 10:45 am EDT
- Contributed by: Dirk
When search engines index a Geeklog site, they will also pick up URLs like these:

/submit.php?type=event&mode=&month=08&day=07&year=2004&hour=2
/submit.php?type=event&mode=&month=06&day=17&year=2004&hour=8
etc.
/comment.php?sid=20020513230754519&pid=0&type=article
/comment.php?sid=20020427185655276&pid=0&type=article
etc.
Obviously, it doesn't make a lot of sense to index these particular pages, or the submission forms in general.
There's an easy way to prevent this: Create a robots.txt file.
Every decent search engine will look for a robots.txt file before it starts indexing your site to see if there are any files or directories it shouldn't include in its index. So here's how to tell the search engine spiders to leave comment.php and submit.php alone:
User-agent: *
Disallow: /comment.php
Disallow: /submit.php
Disallow: /forum/createtopic.php
Put these lines in a plain text file, name it "robots.txt", and upload it to the root of your site (usually the directory where Geeklog's index.php resides). You can check that it's in the right place by requesting it directly in your browser, e.g. http://www.example.com/robots.txt (with your own domain, of course); if you can see the rules, the spiders can too.
With the "User-agent" line it's possible to set rules for certain spiders. We use a '*' to allow them all to index our site. The next three lines tell the spiders that they should not index these particular files (the third one is, obviously, for the forum so you don't need it if you don't have the forum plugin installed).
More information about the robots.txt file can be found on robotstxt.org. There's also a robots.txt validator to ensure your robots.txt doesn't have any syntax errors.
It's worth thinking about adding other files or even entire directories as well. For example, does it make sense to index the search form? Probably not. But then again, if you have a lot of links to specific search results, you may want those to be indexed.
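If you decide against indexing the search form, one more line takes care of it; a sketch assuming the form lives at /search.php, as it does in a stock Geeklog install:

Disallow: /search.php

This line goes into the existing "User-agent: *" section, alongside the other Disallow lines.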
Another benefit of a robots.txt file, apart from avoiding unnecessary traffic, is that it makes your site harder for comment spammers to find (they are known to search for the key phrases that appear on the comment submission form).