
Geeklog Forums

Google spidering the Calendar


Status: offline

ronack

Forum User
Full Member
Registered: 05/27/03
Posts: 612
While trying to resolve some slowdown and lock-up issues, I was looking at my server logs and noticed what looked like a user trying to submit events for every day of every month of every year, but doing it very fast, like a bot. A Whois lookup on the IP showed that it belongs to Google. Has anyone else noticed this? My guess is that it is somehow spidering every day in the calendar. Here are a few entries from the log file.
Text Formatted Code

66.249.66.51 - - [12/Jan/2005:00:01:07 -0500] "GET /submit.php?type=event&day=16&month=06&year=2003 HTTP/1.1" 200 32147
66.249.66.51 - - [12/Jan/2005:00:01:29 -0500] "GET /submit.php?type=event&day=22&month=05&year=2003 HTTP/1.1" 200 32147
66.249.66.51 - - [12/Jan/2005:00:02:01 -0500] "GET /submit.php?type=event&day=05&month=05&year=2006 HTTP/1.1" 200 32007

 
This can't be good for my bandwidth. I don't really want to stop Google from indexing the site, but I sure don't want it to request every day of every month of every year until it gives up or locks up the server. Any ideas, folks?
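To put a number on the bandwidth concern, the log entries above can be tallied per client and script. This is only a rough sketch in Python: it assumes the Apache-style line format shown in the excerpt, and the names `LOG_LINE` and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Matches lines in the format quoted above (assumed to be Apache common log format).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)$'
)

def summarize(lines):
    """Count requests and bytes sent per (ip, script path); query strings are dropped."""
    hits = Counter()
    sent = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines this sketch does not understand
        key = (match["ip"], match["url"].split("?", 1)[0])
        hits[key] += 1
        if match["size"] != "-":
            sent[key] += int(match["size"])
    return hits, sent

# The three entries quoted in the post.
sample = [
    '66.249.66.51 - - [12/Jan/2005:00:01:07 -0500] "GET /submit.php?type=event&day=16&month=06&year=2003 HTTP/1.1" 200 32147',
    '66.249.66.51 - - [12/Jan/2005:00:01:29 -0500] "GET /submit.php?type=event&day=22&month=05&year=2003 HTTP/1.1" 200 32147',
    '66.249.66.51 - - [12/Jan/2005:00:02:01 -0500] "GET /submit.php?type=event&day=05&month=05&year=2006 HTTP/1.1" 200 32007',
]

hits, sent = summarize(sample)
for key, count in hits.most_common():
    print(key, count, "requests,", sent[key], "bytes")
```

Run against a full access log (for example `summarize(open(path))`, with `path` pointing at your own log file), this would show how much of the traffic comes from a single crawler hitting one script.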

Status: Banned

machinari

Forum User
Full Member
Registered: 03/22/04
Posts: 1512
If it is a bot, have a look at this article by Dirk.

Status: offline

ronack

Forum User
Full Member
Registered: 05/27/03
Posts: 612
Thanks, good info. I'll admit I failed to do a search on this one.

Maybe it would help to put the Calendar in its own directory in future versions of Geeklog.

Status: offline

ronack

Forum User
Full Member
Registered: 05/27/03
Posts: 612
It also appears that Google is ignoring the robots.txt file.

Status: offline

Dirk

Site Admin
Admin
Registered: 01/12/02
Posts: 13073
Location: Stuttgart, Germany
Quote by ronack: Maybe it would help to put the Calendar in its own directory in future versions of Geeklog.

Why should that help? Search engine spiders follow links and you would have a link to the calendar directory then ...

Quote by ronack: It also appears that Google is ignoring the robots.txt file.

It may take some hours before Googlebot notices that the file has changed, but if the file is syntactically correct, Googlebot will obey it.
bye, Dirk

Status: offline

ronack

Forum User
Full Member
Registered: 05/27/03
Posts: 612
OK Dirk, thanks for the info.

I was thinking that if it were in a directory, then you could just exclude that directory. I'm not sure what the Googlebot is doing except filling up my log files.

Status: offline

Dirk

Site Admin
Admin
Registered: 01/12/02
Posts: 13073
Location: Stuttgart, Germany
Quote by ronack: I was thinking that if it were in a directory, then you could just exclude that directory.

Simply exclude the calendar.php script ...
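As a rough illustration of excluding a script rather than a directory, the sketch below feeds a hypothetical robots.txt to Python's standard `urllib.robotparser` and checks which URLs a crawler would be allowed to fetch. The rules themselves are an assumption: `/calendar.php` is the script named above, and `/submit.php` is added only because that is the path appearing in the quoted log entries. The helper name `is_allowed` is made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; adjust the paths to match your own site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /calendar.php
Disallow: /submit.php
"""

def is_allowed(robots_txt, user_agent, url):
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

for url in (
    "/calendar.php",
    "/submit.php?type=event&day=16&month=06&year=2003",
    "/index.php",
):
    print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

Disallow rules match by prefix, so a rule for a script also covers every query-string variant of it. A check like this only confirms that the file parses as intended; it says nothing about when a given crawler will re-read it.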

Quote by ronack: I'm not sure what the Googlebot is doing except filling up my log files.

I've seen this before - sometimes it just starts to index the calendar like wild, then it ignores it completely again for weeks. As long as it keeps the bot happy, I don't mind.

bye, Dirk

Status: offline

ronack

Forum User
Full Member
Registered: 05/27/03
Posts: 612
I had to completely exclude MSNbot. It ate up gigabytes of bandwidth, and from what I've read, MS isn't even using it.
