Blocking unwanted requests to reduce server load
Status: offline
Dirk
Site Admin
Admin
Registered: 01/12/02
Posts: 13073
Location: Stuttgart, Germany
Most tips to speed up a site revolve around optimizing SQL requests or throwing bigger and better hardware at the problem. But if you check your logfiles these days, you'll find a lot of nonsensical requests that can also have a quite dramatic negative impact on your site's performance. Here are a few tips on how to deal with them ...
Santy and other worms
The outbreak of the original Santy worm, which only attacked phpBB boards, was quickly followed by variants (now called Spyki or PhpInclude worm) that target all PHP scripts out there - including Geeklog.
The Spyki worm targets a common programming mistake in PHP scripts where the author includes another file based on some parameter passed in the URL. The worm simply launches a brute-force attack on all parameters it can find for a script, hoping to hit one that is used that way.
Geeklog itself is not vulnerable to this attack (I can't speak for all the existing plugins and other add-ons, but I'm not aware of any problems with them at the moment). But the sheer number of requests caused by this worm can really slow down a Geeklog site.
So what can we do? The idea is to detect the worm's requests on the server before they're actually executed. That is, we use the webserver's ability to catch these requests and make sure the PHP script they're trying to attack is never executed. This saves CPU time (for calling up the PHP interpreter) and DB load (for creating sessions, loading the blocks, etc.).
On geeklog.net, we currently use this in the site's .htaccess file:
# attempts to stop the Santy worm
RewriteEngine On
RewriteCond %{QUERY_STRING} ^(.*)wget%20 [OR]
RewriteCond %{QUERY_STRING} ^(.*)echr(.*) [OR]
RewriteCond %{QUERY_STRING} ^(.*)esystem(.*) [OR]
RewriteCond %{QUERY_STRING} ^(.*)highlight=%2527 [OR]
RewriteCond %{HTTP_USER_AGENT} lwp-trivial [NC,OR]
RewriteCond %{HTTP_COOKIE} s:(.*):%22test1%22%3b
RewriteRule ^.*$ http://127.0.0.1/ [L,R=301]
As explained above, this tries to detect patterns typical of the worm's requests and then redirects them to 127.0.0.1. While it is unlikely that the worm will even follow that redirect, it at least saves our webserver the trouble of having to execute the nonsensical request.
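By the way, if you'd rather not answer with a redirect at all, mod_rewrite can also refuse the request outright. Here's a rough sketch of an alternative last line (untested on geeklog.net; it uses mod_rewrite's [F] flag to send a plain "403 Forbidden" instead of the redirect to 127.0.0.1):
# alternative: refuse matching requests with "403 Forbidden" instead of redirecting
RewriteRule ^.*$ - [F,L]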
This is not the place to discuss and explain how Apache's mod_rewrite works. Check the Apache manual if you want to learn more about it.
The above rules are derived from similar ones you can find on the web. One site, for example, has comparable rules for mod_security (an Apache 2 module) and also discusses some flaws in the rules above (but they seem to work for us for now ...).
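To give you an idea, a mod_security variant of the rules above might look roughly like this. This is only an untested sketch assuming the SecFilter directives of mod_security 1.x, not what that site actually uses:
# rough mod_security 1.x sketch of the rules above (illustration only)
SecFilterEngine On
SecFilterDefaultAction "deny,log,status:403"
SecFilterSelective QUERY_STRING "wget%20"
SecFilterSelective QUERY_STRING "echr"
SecFilterSelective QUERY_STRING "esystem"
SecFilterSelective QUERY_STRING "highlight=%2527"
SecFilterSelective HTTP_USER_AGENT "lwp-trivial"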
Referrer spam
The same approach can also be used against referrer spam. If you check your logfiles, you'll often find requests allegedly coming from porn or mortgage sites. If you look closely, you'll notice that they only send one request for a story or a forum post and don't load any images. So it's not someone actually following a link to your site, it's just a stupid bot trying to draw attention to that site.
Again, if those requests come in a lot, they can increase the server load quite a bit. So we use the same idea as for the worms to catch them before the PHP script is even executed:
# Referrer spam :-(
RewriteCond %{HTTP_REFERER} ^http://.*hosting4u.gb.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://.*4free.gb.com.*$ [NC]
RewriteRule ^.*$ http://127.0.0.1/ [L,R=301]
The two URLs in this example are from a batch of sites that are being used in referrer spam at the moment. They will probably be gone in a few days or weeks, replaced by others.
That is the main problem with referrer spam: it's a moving target. So most of the time, I'd say don't bother and just ignore them. Use the above only if you see a lot of such requests and your server's load is increasing because of them.
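If you do want to block them but prefer to keep mod_rewrite out of it, mod_setenvif plus a Deny rule can do much the same job. A minimal sketch, assuming mod_setenvif and mod_access are available and reusing the same two example domains from above:
# mark requests with a known spam referrer ...
SetEnvIfNoCase Referer "hosting4u\.gb\.com" spam_referrer
SetEnvIfNoCase Referer "4free\.gb\.com" spam_referrer
# ... and refuse them with a 403
Order Allow,Deny
Allow from all
Deny from env=spam_referrer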
404s and home-made problems
Some days ago, I was debugging a script and wondering why it caused SQL requests even after it had collected all the data it needed to display the page. The reason was that it was trying to load an image that wasn't there. And because I had set up Apache to use Geeklog's 404.php for the "404 Not Found" error message, it called up that script every time it couldn't find the image.
In case you're not aware, you can set up your own 404 page in a .htaccess like this:
ErrorDocument 404 /404.php
So now every time a file is not found, the error message comes nicely wrapped in Geeklog.
However, that also means that every 404 causes the execution of a PHP script and requests to the database. So if you have a lot of 404s, you're creating a lot of load for both your server and your database. Check your logfiles regularly and try to fix those 404s.
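If most of your 404s come from missing static files (images, stylesheets and the like), one option is to keep the Geeklog-wrapped error page for the site itself but switch to a cheap static error page for those directories. A small sketch, assuming an images directory with its own .htaccess and a static page at /errors/404.html (both paths are just examples, not part of a standard Geeklog setup):
# /images/.htaccess: serve a small static page for missing files in this directory,
# so a missing image doesn't call up 404.php and hit the database
ErrorDocument 404 /errors/404.html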
Hope these tips help someone ...
bye, Dirk
Status: offline
THEMike
Forum User
Moderator
Registered: 07/25/03
Posts: 141
Location: Sheffield, UK
I have a new version of my HTTP_REFERER module for Geeklog pending, just ironing out a couple of details. It integrates with SpamX to detect referrer spam and ignore it for logging purposes. Additionally, I have an alternative config value for the SpamX action: rather than sending the default one from your config.php, it sends its own configurable value. Added to that, I have a custom action for SpamX that performs a die(); when spam is detected and thus stops you wasting a single further CPU cycle on HTTP_REFERER spammers.
Obviously, the prudent thing to do is to implement whatever preventative measures we can to ensure our individual sites are not affected - at least, not affected much. I wonder, however, whether a GL site that requires user logon to view or post content ($_CONF['loginrequired'] = 1) is somewhat protected from Santy and other such worms???
~Brian
Status: offline
Dirk
Site Admin
Admin
Registered: 01/12/02
Posts: 13073
Location: Stuttgart, Germany
Quote by bcbrock: I wonder, however, whether a GL site that requires user logon to view or post content ($_CONF['loginrequired'] = 1) is somewhat protected from Santy and other such worms???
Depends on what you mean by "protected".
As I said, a Geeklog site cannot be infected by this worm. However, since the worm stupidly tries to call each and every PHP script it can find, your site's performance can certainly be affected. Login required or not, calling up the script and doing SQL requests only to display a "you have to be logged in" error message will have a certain impact on the load of your webserver and database.
The method described above tries to ease that load by deflecting the worm's requests.
Anyone remember the Code Red worm that attacked Microsoft's IIS webservers some years ago? It couldn't infect Apache webservers, but it still caused server load and annoying logfile entries there. As with the Santy worm, there's nothing you can do at your end to actually stop those attacks (other than taking down your site ...). All you can do is try to minimize the impact they have on your site / server.
bye, Dirk
Status: Banned
machinari
Forum User
Full Member
Registered: 03/22/04
Posts: 1512
Quote by Dirk: So now every time a file is not found, the error message comes nicely wrapped in Geeklog.
However, that also means that every 404 causes the execution of a PHP script and requests to the database. So if you have a lot of 404s, you're creating a lot of load for both your server and your database. Check your logfiles regularly and try to fix those 404s.
Yes yes, 404s were being returned for most of that nasty worm's requests, so scripts were running and the DB was taking a hit (cuz I was using GL's 404.php). Just now implemented some of the above rewrite rules... looking forward to positive results. Thanks, Dirk.