Author Topic: How to prevent bad bots (web crawlers) with mod security

How to prevent bad bots (web crawlers) with mod security
« on: October 10, 2021, 11:07:05 AM »
I'm using Apache + mod_security (with the Comodo WAF rules):

1. Install mod_security
Installation instructions: http://wiki.centos-webpanel.com/mod_security-for-cwp
Optional: select the Comodo WAF rules (I use these rules; CWPanel -> Security -> ModSecurity -> Select Comodo WAF)
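
To confirm the module is actually loaded after installation, a quick check like the one below should work (mod_security 2.x registers itself as security2_module; on CWP the apachectl binary may live under /usr/local/apache/bin):
Code: [Select]
# list loaded Apache modules and look for the ModSecurity entry (security2_module)
apachectl -M | grep -i security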

2. Check which web crawlers are most common on your server
Command to list the top 100 user agents in your Apache logs:
#cat /usr/local/apache/domlogs/*.log | awk -F\" '{print $6}' | sort | uniq -c | sort -nr | head -100

Short wiki about web crawlers: https://linuxreviews.org/Web_crawlers
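
Before blocking a crawler it can help to see how much traffic it actually generates. A check like this should work (SemrushBot is only an example name; adjust it to whatever shows up in your top-agents list):
Code: [Select]
# count requests from one suspected crawler, per domain log file
grep -ci "semrushbot" /usr/local/apache/domlogs/*.log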

3. Add rules to ModSecurity to block some web bots / web crawlers
Add the rules below to /usr/local/apache/modsecurity-cwaf/custom_user.conf (this is the custom user rules file when you are using the Comodo WAF rule set)

Examples:
Code: [Select]
SecRule REQUEST_HEADERS:User-Agent "@contains blexbot" "id:'1000000',t:none,t:lowercase,deny,nolog,msg:'BAD BOT - Detected and Blocked. '"
SecRule REQUEST_HEADERS:User-Agent "@contains semrushbot" "id:'1000001',t:none,t:lowercase,deny,nolog,msg:'BAD BOT - Detected and Blocked. '"
SecRule REQUEST_HEADERS:User-Agent "@contains ahrefsbot" "id:'1000002',t:none,t:lowercase,deny,nolog,msg:'BAD BOT - Detected and Blocked. '"
SecRule REQUEST_HEADERS:User-Agent "@contains dotbot" "id:'1000003',t:none,t:lowercase,deny,nolog,msg:'BAD BOT - Detected and Blocked. '"
SecRule REQUEST_HEADERS:User-Agent "@contains mj12bot" "id:'1000004',t:none,t:lowercase,deny,nolog,msg:'BAD BOT - Detected and Blocked. '"
SecRule REQUEST_HEADERS:User-Agent "@contains barkrowler" "id:'1000005',t:none,t:lowercase,deny,nolog,msg:'BAD BOT - Detected and Blocked. '"
SecRule REQUEST_HEADERS:User-Agent "@contains megaindex" "id:'1000006',t:none,t:lowercase,deny,nolog,msg:'BAD BOT - Detected and Blocked. '"
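
If you prefer to maintain a single rule instead of one per bot, a combined regex variant along these lines should also work (the rule id 1000010 and the bot list are only examples; pick an id that does not collide with your other custom rules):
Code: [Select]
# one combined rule: case-insensitive regex match on the User-Agent header, deny with 403
SecRule REQUEST_HEADERS:User-Agent "@rx (?i)(blexbot|semrushbot|ahrefsbot|dotbot|mj12bot|barkrowler|megaindex)" "id:'1000010',t:none,deny,status:403,nolog,msg:'BAD BOT - Detected and Blocked.'"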

4. Reload Apache
Reload Apache so the updated mod_security custom rules take effect:
#systemctl reload httpd.service
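
A typo in a custom rule can stop Apache from reloading cleanly, so it may be worth testing the configuration first, for example:
Code: [Select]
# check the Apache/ModSecurity configuration for syntax errors, then reload
apachectl configtest
systemctl reload httpd.service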

5. Check one of your domain logs
Check a log to confirm the rules are working; blocked bots should now get a 403 response (403 Forbidden error).
Example: #less /usr/local/apache/domlogs/somedomain.com.log
Code: [Select]
185.191.171.39 - - [10/Oct/2021:13:00:08 +0200] "GET /page/ HTTP/1.1" 403 199 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"
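
To confirm the block without scrolling through the whole log, a one-liner like this should work (domain and bot name are only examples; in the combined log format the HTTP status code is the 9th whitespace-separated field):
Code: [Select]
# count 403 responses returned to one blocked bot in a single domain log
grep -i "semrushbot" /usr/local/apache/domlogs/somedomain.com.log | awk '$9 == 403' | wc -l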




« Last Edit: October 10, 2021, 11:10:15 AM by idovecer »