Hi, folks-
I used Geeklog to run a website for a university course I was teaching. The term is now finished, and though I'd like to keep the website up for my (and the students') future reference, I'd like to convert it to a static set of HTML pages (to get rid of the PHP/MySQL overhead and allow me to move the site wholesale to a server without PHP/MySQL access). Is there a way to do this -- i.e., basically take a snapshot of a site, keeping the links intact (though not necessarily the search capability or other interactive features) but making everything static?
An alternative is to use Adobe Acrobat's Web Capture feature. This creates a single searchable PDF document containing everything on the site.
05/08/03 03:04am
lhauck
In theory you might be able to use "wget -m".
This should recursively suck down the web site into a directory.
Haven't tried it myself, however...
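Off the top of my head, something like this ought to do it (swap in your site's real URL for the placeholder):

wget -m http://address.of.geeklog/path/to/geeklog

The -m (mirror) switch is shorthand for recursive retrieval with unlimited depth plus timestamping.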
05/08/03 03:26am
Anonymous
Creates a lot of pages as it follows stuff like the calendar to the bitter end...
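If the calendar is the worst offender, wget's reject list might tame it. Untested, and the script name may differ on your install, but something like:

wget -m --reject 'calendar*' http://address.of.geeklog/path/to/geeklog

--reject (-R) takes a comma-separated list of filename suffixes or patterns, so this should skip everything served by calendar.php.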
05/08/03 03:33am
Anonymous
I'm very surprised a university professor would take a huge step backwards and want a website with only static HTML, unless the course material rarely changes. MySQL and PHP definitely have overhead, but they make a website far easier to maintain.
Any site-grabbing software should be able to do what you want. Your own students would be a good source of recommendations.
Well, I'll continue to use Geeklog for future iterations of the course, but I want to start each year fresh (i.e., with most student contributions removed) while also maintaining an archive of previous years.
I also want to shut down all of the interactive features (except possibly searching) between years so I don't accumulate irrelevant comments. (It's a course in evolutionary genetics, and sites about evolution tend to attract the wackos -- many of whom can't resist posting their unhelpful views.)
Ideally, the archived website would remain more or less identical to its original state (so a PDF archive wouldn't work): I want not only an archive for my own edification, but also a demo to show current and prospective employers what I've done.
Well, I get the definite impression that there's no facility within Geeklog itself to do what I want, so wget it is.
FYI, the wget command I've used (and which seems to work) is:
wget -r -N -l 10 -E http://address.of.geeklog/path/to/geeklog
The -r turns on recursive retrieval and -N enables timestamping; the -l 10 sets the recursion depth to 10 (YMMV); the -E appends .html to the filenames of downloaded HTML pages that don't already end in it.
Then I did a scripted search & replace to put .html in the appropriate place in every link in the downloaded files. I also had to replace every instance of the string "php?" (minus quotes) with "php%3F", though I didn't seem to need to do the same with the equals signs within links.
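In case it saves anyone some typing, here's a rough sketch of that post-processing as a pair of sed expressions (assumes GNU sed, double-quoted hrefs, and the filenames produced by wget's -E above; note that sed's -E here just means extended regexes. Test on a copy of the files first):

# first rewrite links with query strings: foo.php?a=b -> foo.php%3Fa=b.html
# then append .html to plain .php links: foo.php -> foo.php.html
find . -name '*.html' -exec sed -i -E \
  -e 's/(href="[^"]*\.php)\?([^"]*)"/\1%3F\2.html"/g' \
  -e 's/(href="[^"]*\.php)"/\1.html"/g' {} +

The first expression handles links carrying a query string; the second catches the plain .php links, and won't re-touch anything the first one already rewrote.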
Et voilà. As far as I can tell, this gives a cosmetically perfect rendition of my Geeklog site as it stood when I ran wget. Obviously all the interactive features (posting, commenting, searching, logging in) are disabled, but that's what I wanted.
Thanks for the suggestions.