Remove malicious and marketing bots
This guide covers blocking bad bots, scraper bots, and other unwanted bots from crawling your website, while still allowing Google, the other important search engines, and social networks to crawl it.
We all know that bots crawl websites around the clock. Unfortunately, not all bots are good. Nearly two out of every five visitors to your site are bots trying to steal information, harvest and sell your SEO data, exploit security loopholes, or pretend to be something they are not.
Bandit bots not only steal information and exploit your website's security, they also generate a lot of traffic. Around 61% of all web traffic comes from bots, and that extra load slows down your site, which is bad not only for your real visitors but also for SEO.
There is a whole list of bad marketing bots which you can add to your robots.txt file to prevent them from crawling your website.
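As an illustration, a robots.txt entry for each unwanted bot looks like the snippet below. The bot names shown (MJ12bot, AhrefsBot, SemrushBot) are common examples of SEO/marketing crawlers; substitute the names from your own blacklist.

```
# Ask specific marketing/SEO crawlers to stay away.
# These names are examples; add one block per bot you want to exclude.
User-agent: MJ12bot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

# Everyone else (Google, Bing, social networks) may crawl normally.
User-agent: *
Disallow:
```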
Not all bots will comply with your request not to be crawled: robots.txt is only a request that you make to the bot. In all other cases, you need to block the bots at the server level. This is where .htaccess comes into play.
The .htaccess file is a configuration file used by the Apache web server. Think of it as the security guard of your website. Except in this case, the security guard can see whether the person trying to enter is coming from Mike's home, is wearing a shirt that says “I'm a thief”, or otherwise identifies itself.
The .htaccess file can block most malicious bots. One notable exception is botnet bots, i.e. hijacked computers belonging to normal users. These generally cannot be blocked, because they are regular user computers running regular user software; if you block them, you are blocking humans. For most other bots, though, the .htaccess file is ideal.
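A minimal .htaccess sketch of this idea, assuming mod_rewrite is enabled; the bot names in the pattern are illustrative, so extend the alternation with your own blacklist:

```apache
# Return 403 Forbidden when the User-Agent matches a known bad bot.
# [NC] makes the match case-insensitive; [F] sends 403 Forbidden.
<IfModule mod_rewrite.c>
  RewriteEngine On
  RewriteCond %{HTTP_USER_AGENT} (MJ12bot|AhrefsBot|SemrushBot|HTTrack) [NC]
  RewriteRule .* - [F,L]
</IfModule>
```

Unlike robots.txt, this rejects the request before your site does any work, so the bot cannot simply ignore it.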
Some bots will pretend they are Google. In those cases, detecting a bot based on the User-Agent header alone will not help; you need to verify that the request really comes from Google.
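The reliable way to unmask a fake Googlebot is the double DNS lookup that Google itself documents: reverse-resolve the visitor's IP, check that the name ends in googlebot.com or google.com, then forward-resolve that name and confirm it maps back to the same IP. A minimal Python sketch of that check (the function names are my own):

```python
import socket

def is_google_hostname(hostname: str) -> bool:
    """Google's documented rule: the reverse-DNS name of a genuine
    Googlebot IP ends in googlebot.com or google.com."""
    host = hostname.rstrip(".").lower()
    return host.endswith(".googlebot.com") or host.endswith(".google.com")

def verify_googlebot(ip: str) -> bool:
    """Reverse-DNS the IP, check the domain, then forward-resolve the
    name and confirm it maps back to the same IP (a spoofed PTR record
    fails the forward check)."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except (socket.herror, socket.gaierror):
        return False
    if not is_google_hostname(hostname):
        return False
    try:
        return ip in socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False
```

A bot that merely copies Googlebot's User-Agent string fails at the first step, because its IP does not reverse-resolve into Google's domains.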
Please stay tuned, as I plan to expand this article with other techniques for blocking unwanted crawlers, including detecting whether JavaScript is present (is there a session open?), bot traps, IP range bans, and more.
Also note that the techniques listed in this post work only on Apache servers. If you are running Nginx or any other web server, you will have to look up the equivalent configuration for that server.
Specific bad bots for WordPress
For WordPress websites, there are some well-known folders at the top of the target lists of hackers and their bots, attacked to exploit information such as usernames, passwords, and e-mail addresses.
You can ask crawlers to stay out of these folders by adding the following code to your robots.txt file. Remember that this works only for WordPress!
Vulnerable WordPress folders to protect
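A sketch of what such rules can look like; the folder list below covers the usual WordPress paths, and admin-ajax.php is re-allowed because front-end plugins often need it. Adjust the list to your own setup.

```
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-login.php
Allow: /wp-admin/admin-ajax.php
```

Keep in mind, as noted above, that robots.txt is only a request: it keeps well-behaved crawlers out of these paths, but a malicious bot can read the file and ignore it, so real protection still requires server-level rules.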
Are you interested in taking your WordPress security to a higher level? Check out our Ultimate WordPress Security Guide.