Geeklog Forums
Google spidering the Calendar
ronack
Forum User
Full Member
Registered: 05/27/03
Posts: 612
While trying to resolve some slowdown and lock-up issues, I looked at my server logs and noticed what looked like a user trying to submit events for every day of every month of every year, but doing it so fast that it looked like a bot. A Whois lookup on the IP showed that it belongs to Google. Has anyone else noticed this? My guess is that it is somehow spidering every day in the calendar. Here are a few lines from the log file:
66.249.66.51 - - [12/Jan/2005:00:01:07 -0500] "GET /submit.php?type=event&day=16&month=06&year=2003 HTTP/1.1" 200 32147
66.249.66.51 - - [12/Jan/2005:00:01:29 -0500] "GET /submit.php?type=event&day=22&month=05&year=2003 HTTP/1.1" 200 32147
66.249.66.51 - - [12/Jan/2005:00:02:01 -0500] "GET /submit.php?type=event&day=05&month=05&year=2006 HTTP/1.1" 200 32007
This can't be good for my bandwidth. I don't really want to stop Google from indexing the site, but I certainly don't want it to crawl every day of every month of every year until it gives up or locks up the server. Any ideas, folks?
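To get a rough idea of how much traffic the crawler causes, the log lines above can be tallied per client IP and script. The following is only a minimal sketch: it assumes the access log is in Apache common log format (as the sample lines appear to be), and the names `LOG_PATTERN`, `summarize`, and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed layout (Apache common log format):
# host ident user [time] "METHOD url PROTOCOL" status bytes
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)$'
)


def summarize(lines):
    """Count requests and bytes sent per (client IP, script path)."""
    hits = Counter()
    bytes_sent = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        # Drop the query string so every day/month/year variant of the
        # same script is counted together.
        key = (match["ip"], urlsplit(match["url"]).path)
        hits[key] += 1
        if match["size"] != "-":
            bytes_sent[key] += int(match["size"])
    return hits, bytes_sent


# The three log lines quoted in the post.
SAMPLE = [
    '66.249.66.51 - - [12/Jan/2005:00:01:07 -0500] "GET /submit.php?type=event&day=16&month=06&year=2003 HTTP/1.1" 200 32147',
    '66.249.66.51 - - [12/Jan/2005:00:01:29 -0500] "GET /submit.php?type=event&day=22&month=05&year=2003 HTTP/1.1" 200 32147',
    '66.249.66.51 - - [12/Jan/2005:00:02:01 -0500] "GET /submit.php?type=event&day=05&month=05&year=2006 HTTP/1.1" 200 32007',
]

hits, bytes_sent = summarize(SAMPLE)
for (ip, path), count in hits.most_common():
    print(f"{ip} {path}: {count} requests, {bytes_sent[(ip, path)]} bytes")
```

Run against the full log file instead of `SAMPLE` (for example by passing an open file object to `summarize`), this would show whether the crawler is really responsible for a large share of the bandwidth.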
machinari
Forum User
Full Member
Registered: 03/22/04
Posts: 1512
If it is a bot, have a look at this article by Dirk.
Dirk
Site Admin
Admin
Registered: 01/12/02
Posts: 13073
Location:Stuttgart, Germany
Quote by ronack: Maybe it would help to put the Calendar in its own directory in future versions of Geeklog.
Why should that help? Search engine spiders follow links and you would have a link to the calendar directory then ...
Quote by ronack: It also appears that Google is ignoring the robots.txt file.
It may take some hours before Googlebot notices that the file has changed, but if the file is syntactically correct, Googlebot will obey it.
bye, Dirk
Dirk
Site Admin
Admin
Registered: 01/12/02
Posts: 13073
Location:Stuttgart, Germany
Quote by ronack: I was thinking that if it was in a directory then you could just exclude that directory.
Simply exclude the calendar.php script ...
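As a rough way to check such an exclusion before waiting for the crawler to pick it up, Python's standard `urllib.robotparser` module can parse a candidate robots.txt and report whether a given URL would be blocked. This is only a sketch under assumptions: the rules below assume that `calendar.php` and `submit.php` sit in the site root (the log shows `/submit.php` there), the helper name `allowed` is made up, and Python's parser is not guaranteed to interpret the file exactly as Googlebot does.

```python
from urllib.robotparser import RobotFileParser

# Candidate robots.txt content. The two paths are assumptions based on
# the thread: the calendar script and the event submission script.
ROBOTS_TXT = """\
User-agent: *
Disallow: /calendar.php
Disallow: /submit.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="Googlebot"):
    """Return True if the parsed rules let `agent` fetch `path`."""
    return parser.can_fetch(agent, path)


for path in (
    "/submit.php?type=event&day=16&month=06&year=2003",
    "/calendar.php",
    "/index.php",
):
    print(path, "allowed" if allowed(path) else "blocked")
```

The rules match by path prefix, so a single `Disallow: /submit.php` line should cover every day, month, and year variant of the submission URL while leaving the rest of the site open to indexing.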
Quote by ronack: I'm not sure what the Googlebot is doing except filling up my log files.
I've seen this before - sometimes it just starts to index the calendar like wild, then it ignores it completely again for weeks. As long as it keeps the bot happy, I don't mind.
bye, Dirk