My website has been hosted on one of the Fasthosts shared-hosting servers. Recently, my site was disabled several times because of a huge number of requests, which mainly came from bots crawling the site. The Fasthosts IT Operations Engineer Ewan MacDonald mailed me and said:

> I'm not sure what you are doing with your site exactly but you've been consuming over 75% of the available Apache processes. This has caused massive problems for all the other customers on the webserver. I am running a security scan against your site at the moment. Please note that your site contains 85,000 files amounting to 8.6GB. Our terms state that all files in your webspace must be part of the website, so are all 85,000 files part of the site and accessible through the site? If not, they need to be removed please. I am also going to remove the 2 renamed htdocs folders unless you object? If your site causes the same performance problem whilst the scan is running I will take it offline again until you can provide an explanation of why it's tying up approximately 200 Apache processes.

Then I checked the apache2 log, and I found lots of these:

```
mod_fcgid: can't apply process slot for /var/www/fcgi/php54-cgi
```

Apparently, 360Spider was hitting the site quite heavily, which obviously affected the other websites on the same shared host, and that is why Fasthosts had to take my site down. The 360Spider problem returned later, so they had to disable my site again until I had a script ready to block its access, as it was causing issues for other users of the server. I am sorry that this caused trouble to the other sites on the shared host, but in my opinion it might be better to block such bots at a higher level (e.g. the server level). Just imagine: any other website may face the same problem.

I had optimised my website before in order to reduce CPU usage, by caching pages into static HTML. But with quite a large number of pages (around 5,000 reported in Google Webmaster Tools), some spiders may not be clever enough to figure out the duplicates. Google's spiders are fine, because I can configure the crawl parameters and they obey the robots.txt file. Some other spiders (e.g. 360, youdao) don't actually obey the crawling rules. The only way to ban them is to put them on a black list (I can do that for sure), but other users may face the same problem.

robots.txt is a text file under the root of the website that tells search bots which directories they are allowed to index and which they are not. But not all of the bots follow these 'instructions'.

.htaccess is a hidden text file that can be placed in each website directory. It is used by the Apache rewrite module mod_rewrite to make URLs look nicer, but it can also be used to control these bots. Here are the rules I added to tell these bad bots to go away:

```
SetEnvIfNoCase User-Agent "EasouSpider" bad_bot
SetEnvIfNoCase User-Agent "YisouSpider" bad_bot
SetEnvIfNoCase User-Agent "LinksCrawler" bad_bot
SetEnvIfNoCase User-Agent "360Spider" bad_bot
SetEnvIfNoCase User-Agent "Sogou" bad_bot
# The lines above only tag matching requests with the bad_bot variable;
# a deny rule is needed to actually reject them:
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

As a safety precaution, I have also put the following code in index.php, which generates different pages according to the URL parameters; 99% of the website's pages are generated through this index file:

```php
define('BADBOTS', '/(yisouspider|easouspider|yisou|youdaobot|yodao|360|linkscrawler|soguo)/i');
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
// Stop serving the page if the user agent matches one of the bad bots.
if (preg_match(BADBOTS, $agent)) exit;
```

Basically, what the above PHP does is check the HTTP_USER_AGENT string against these bad bots. preg_match uses a regular expression, and the /i option makes the comparison case-insensitive.

It seems to be working after the above methods: in the Apache log, I now find lots of requests from these bots being turned away.

However, I have also noticed quite a lot of entries like this in the log file:

```
119.188.91.121 - - "GET /?charset=big5&do=System.Online&lang=ch&page=25&per=10&skin=2011anniversary HTTP/1.0" 200 3919 "… …" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"
```

From the HTTP_USER_AGENT you would normally think this is not a bot, but I believe they are. They can send whatever USER_AGENT they like (the value is easy to change), and they usually come from several IPs, so it is not easy to identify all of them using specific IP ranges.
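For reference, the robots.txt conventions mentioned above could be sketched like this. The paths are made up for illustration; the bot names are the ones discussed in this post. Note that Disallow is purely advisory, which is exactly why badly behaved bots have to be blocked in .htaccess or PHP instead:

```
# robots.txt at the web root (illustrative example; paths are hypothetical)
User-agent: Googlebot
Disallow: /cache/        # keep crawlers out of generated duplicate pages

User-agent: 360Spider
Disallow: /              # ask this bot to stay away entirely (it may ignore this)

User-agent: *
Crawl-delay: 10          # non-standard; only some bots honour it
```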
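The same case-insensitive user-agent check can be tried quickly at the command line. This is a sketch, not the script used on the site: the regular expression mirrors the BADBOTS pattern above, and the sample user-agent strings are made up for testing.

```shell
#!/bin/sh
# Case-insensitive match against the bad-bot name list (same names as BADBOTS).
is_bad_bot() {
    printf '%s' "$1" | grep -qiE 'yisouspider|easouspider|yisou|youdaobot|yodao|360|linkscrawler|soguo'
}

# Made-up sample user agents:
if is_bad_bot "Mozilla/5.0 (compatible; YisouSpider/1.0)"; then
    echo "blocked"
fi
if ! is_bad_bot "Mozilla/5.0 (Windows NT 6.1) Gecko/20100101 Firefox/40.0"; then
    echo "allowed"
fi
```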
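To see which crawlers hit a site hardest, the access log can be tallied by user agent. A rough sketch, assuming the common Apache combined log format where the user agent is the final quoted field; the sample log lines here are invented:

```shell
#!/bin/sh
# Build a tiny sample log (replace with the real access.log in practice).
cat > access.log <<'EOF'
1.2.3.4 - - [10/Oct/2013:13:55:36 +0000] "GET / HTTP/1.0" 200 512 "-" "Mozilla/5.0 (compatible; 360Spider)"
5.6.7.8 - - [10/Oct/2013:13:55:37 +0000] "GET /a HTTP/1.0" 200 512 "-" "Mozilla/5.0 (compatible; 360Spider)"
9.9.9.9 - - [10/Oct/2013:13:55:38 +0000] "GET /b HTTP/1.0" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"
EOF

# Split each line on double quotes, take the last quoted field (the user agent),
# then count and rank occurrences.
awk -F'"' '{print $(NF-1)}' access.log | sort | uniq -c | sort -rn
```

The busiest user agent appears first, which makes it easy to decide what to add to the bad-bot list.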