You’ve probably heard of bots on the Internet by now, but you might not be sure what they are or what they do. In brief, Internet bots, also called web robots or simply bots, are software applications that automatically run tasks on the Internet. These tasks can be good or bad, depending on the creator. Good bots typically automate tasks such as website scanning or data collection, and generally make our lives easier. Bad bots, however, are a pain and are used for malicious purposes in the online world. In its annual report on bot traffic, Imperva Incapsula found that nearly 60% of all bot traffic could be attributed to bad bots.
Bad bots are used to hack, spam, spy on, interrupt, and compromise websites of all sizes. If you have an online presence of any kind, chances are you’ve already had some dealings with these annoying bots. There are ways of dealing with them. Bots make up nearly half of all Internet traffic today (48.5% to be exact), so you probably won’t be able to avoid them altogether, but with our help, you won’t be bothered by them on a regular basis.
We have two solutions to get rid of bots on your website: a hard way and an easy way. Let’s start with the hard way:
The hard way
Why is this hard? This solution takes a lot of effort, knowledge, and time on your part. If you’re having a problem with bots spamming your website, the first thing you’ll need to do is find out where they came from. This is all going to get very technical, so try to follow along as best you can. If you get lost, don’t worry, the easy way is only a few paragraphs down.
To find out where the bots came from and block them, you are going to need either the IP address that the bots were sent from or their User Agent String. An IP address is a string of numbers separated by periods that identifies a computer on the Internet. A User Agent String, on the other hand, is the name that the program itself reports. For example, a Google search engine bot goes by Googlebot/2.1.
To find either of these things you will need to access your raw web log. At HostPapa you can find your raw web logs in cPanel under Metrics > Raw Access > Download Current Raw Access Logs. These files are usually quite large and will need to be decompressed through an archiver. You can find many versions of archivers on the web or through app stores. Once the file has been decompressed, open it in an ASCII text editor (like Notepad), which can also be found on the web.
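If you would rather not install an archiver, the decompression step can also be scripted. The sketch below assumes the downloaded log is gzip-compressed (a file ending in .gz) and uses only Python's standard library. The function name read_log_lines is ours, purely for illustration.

```python
import gzip
from pathlib import Path


def read_log_lines(path):
    """Return the lines of an access log, decompressing it first if it ends in .gz."""
    path = Path(path)
    # Assumption: a .gz suffix means gzip compression; anything else is read as plain text.
    opener = gzip.open if path.suffix == ".gz" else open
    # errors="replace" keeps stray non-UTF-8 bytes in a log line from raising an error.
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in handle]
```

Calling read_log_lines on the file you downloaded gives you a list with one entry per request, which is easier to search than a very large file open in a text editor.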
Now you have to scan through the web log to find the bot that you want to block. It helps to know the time that the bot tried to access your website or the page it was interacting with. With either or both of those pieces of information, you should be able to track down an IP address or a User Agent String. Once you have located one or both, jot them down and prepare for the next step.
Keep in mind that this solution can be very patchy and you may not get the end result that you desire. The next step is about blocking the IP address or User Agent String that you found, but this could backfire on your company. Just because bots came from one IP address does not mean that they will come from that same IP address the next time they attack. By blocking random IP addresses you could very well block an entire Internet Service Provider (ISP) along with all of the customers that use that ISP. The same risks come with blocking specific User Agent Strings. Hackers are clever and will often name their bots after browsers or software that everyone uses. This becomes a problem when you try to block a bot named “Safari” and, in the process, block every person using the Safari web browser. If you are not sure of what you are doing, you might be better off using the easy solution below.
If you still feel that this solution is worth the risk, the next step is to download your .htaccess file. WARNING: one wrong change to your .htaccess file can potentially break your website, so make sure you back up your website before making any further changes. To download your .htaccess file, go to the web directory of your website from your control panel and find the file titled “.htaccess”. If you can’t find the file, chances are it doesn’t exist and you’ll have to create one.
If you managed to find the .htaccess file, the next step is to open it in your ASCII text editor, or open a fresh document if you need to create a new .htaccess file. Using a word processor like Microsoft Word or WordPad to create this file can cause your website to fail when you re-upload the .htaccess file, so make sure that you use an ASCII text editor.
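Scanning a large log by eye is slow, so here is a rough sketch of the same search done in code. It assumes your log uses the common Apache "combined" layout (IP address, timestamp, request, status, size, referrer, User Agent String). If your host logs a different format, the pattern will need adjusting. All function names here are hypothetical.

```python
import re
from collections import Counter

# Assumed layout (Apache "combined" format):
# IP ident user [time] "request" status bytes "referer" "user agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def find_hits(lines, page=None, time_prefix=None):
    """Keep entries whose request mentions `page` and whose timestamp starts with `time_prefix`."""
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # Skip lines that are not in the assumed format.
        if page is not None and page not in entry["request"]:
            continue
        if time_prefix is not None and not entry["time"].startswith(time_prefix):
            continue
        hits.append(entry)
    return hits


def top_offenders(entries, limit=5):
    """Count hits per (IP address, User Agent String) pair, most frequent first."""
    counts = Counter((entry["ip"], entry["agent"]) for entry in entries)
    return counts.most_common(limit)
```

For example, find_hits(lines, page="/contact.php", time_prefix="12/Mar/2024:10") narrows the log to requests for one page during one hour, and top_offenders then shows which IP address and User Agent String pairs made the most of those requests.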
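To see how easily a rule can catch innocent visitors, you can test it against values from your own log before touching your website. This is only an illustrative sketch. The function names are made up, the matching is a simplified stand-in for what the server does, and the address-range check (CIDR notation such as 203.0.113.0/24) is our assumption about how a broad IP rule might be written.

```python
import ipaddress
import re


def agents_caught(pattern, user_agents):
    """Return every User Agent String that a case-insensitive pattern would match."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [agent for agent in user_agents if regex.search(agent)]


def addresses_caught(rule, ip_addresses):
    """Return every address covered by a single IP or a CIDR range."""
    # strict=False lets a rule such as 203.0.113.9/24 be read as the whole /24 range.
    network = ipaddress.ip_network(rule, strict=False)
    return [ip for ip in ip_addresses if ipaddress.ip_address(ip) in network]
```

If agents_caught("Safari", agents_from_your_log) returns ordinary browser visitors alongside the bot, the rule is too broad and you should pick a more specific part of the bot's name.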
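As a small safety net, you can keep a timestamped copy of the downloaded .htaccess file before editing it. The sketch below covers only that one file and is not a substitute for a full website backup. The function name and the .bak- naming scheme are our own choices.

```python
import shutil
import time
from pathlib import Path


def backup_file(path):
    """Copy a file to '<name>.bak-<timestamp>' in the same folder and return the new path."""
    source = Path(path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = source.with_name(f"{source.name}.bak-{stamp}")
    # copy2 also preserves the file's modification time where the platform allows it.
    shutil.copy2(source, backup)
    return backup
```

If an edit breaks your website, re-uploading the backup copy under the name .htaccess restores the previous rules.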
To block an IP address, simply add the following lines of code to your .htaccess file (put the actual IP address you want to block in place of the example IP address listed below):
Order Deny,Allow
Deny from 22.214.171.124
If you already have text in your .htaccess file, add the above code to the bottom of the file. You can add another line of code in the same “Deny from ___” format for each IP address you wish to block. You can block as many IP addresses as you need to, but note that the longer your list becomes, the more sluggish your website can become.
Blocking a User Agent String is very similar to blocking an IP address. Let’s say you found a bot that you want to block named “SpamRobot/3.1 (+http://www.randomsite.com/bot.html)”. You would add the following code to your .htaccess file (replacing SpamRobot with the actual bot that you found):
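If you have more than a handful of addresses, typing each line by hand invites typos. Here is a minimal sketch that turns a list of addresses into “Deny from” lines, dropping duplicates and anything that is not a well-formed IP address. The function name deny_lines is hypothetical.

```python
import ipaddress


def deny_lines(ip_addresses):
    """Build 'Deny from' lines for valid, de-duplicated IP addresses, keeping input order."""
    lines = []
    seen = set()
    for raw in ip_addresses:
        try:
            ip = str(ipaddress.ip_address(raw.strip()))
        except ValueError:
            continue  # Skip anything that is not a well-formed IPv4 or IPv6 address.
        if ip not in seen:
            seen.add(ip)
            lines.append(f"Deny from {ip}")
    return lines
```

Dropping duplicates also keeps the list as short as possible, which matters given that a long list can slow your website down.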
BrowserMatchNoCase SpamRobot bad_bot
BrowserMatchNoCase OtherSpamRobot bad_bot
Order Deny,Allow
Deny from env=bad_bot
To add multiple User Agent Strings to block, simply add another BrowserMatchNoCase line of code above the “Order Deny,Allow” line of code. Just like blocking IP addresses, adding too many bots to block can slow down your website.
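The same kind of helper works for User Agent Strings. This sketch builds one BrowserMatchNoCase line per bot name and then adds the “Order Deny,Allow” and “Deny from env=bad_bot” lines after them. To stay on the safe side, it accepts only names made of letters, digits, hyphens, and underscores, which is our own restriction and not a rule of the server. The function name is hypothetical.

```python
import re

# Our own conservative rule: plain names only, so nothing needs quoting or escaping.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def user_agent_block(bot_names, env_var="bad_bot"):
    """Build the BrowserMatchNoCase lines followed by the Order and Deny lines."""
    lines = []
    for name in bot_names:
        if not SAFE_NAME.fullmatch(name):
            raise ValueError(f"unsupported characters in bot name: {name!r}")
        lines.append(f"BrowserMatchNoCase {name} {env_var}")
    lines.append("Order Deny,Allow")
    lines.append(f"Deny from env={env_var}")
    return lines
```

Note that you pass only the name part, such as SpamRobot, and not the full string with its version number and URL.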
Once you’ve finished updating your file, make sure to save it as “.htaccess” WITH the quotation marks included, so that your text editor does not add an extension such as .txt to the name. Upload your updated or brand new file to your website and you should be safe from the IP addresses and User Agent Strings that you’ve identified.
Remember, this fix will not protect your website from all bots. In fact, it only protects you from the specific IP addresses and User Agent Strings that you have blocked. Hackers are smart these days and will often set things up so that blocking them this way blocks several legitimate users as well. IP addresses can also change, meaning that you could end up blocking an innocent user instead of a bot.
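If you generate your rules with a script, the final step of adding them to the bottom of the file can be scripted too. The sketch below works on your local copy of the file, not on the live website, and creates the file if it does not exist yet. The function name append_rules is ours.

```python
from pathlib import Path


def append_rules(htaccess_path, new_lines):
    """Append rule lines to the bottom of an .htaccess file, creating it if needed."""
    path = Path(htaccess_path)
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    # Make sure the old content ends with a newline before adding the new rules.
    if existing and not existing.endswith("\n"):
        existing += "\n"
    path.write_text(existing + "\n".join(new_lines) + "\n", encoding="utf-8")
    return path
```

Because the script names the file itself, there is no risk of an editor quietly saving it as .htaccess.txt.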
The easy way
At the start of the blog, I mentioned that there are good bots and bad bots. Services like Cloudflare, SiteLock, and Sucuri use good bots to automatically deal with incoming bad bots. As you probably saw above, the hard way, despite being free, is a long, arduous solution that may not even protect your website from the majority of bots. Alternatively, SiteLock and Sucuri will take care of spam bots for you, among many other features.
Both of these services continually scan your website for intruders and remove them if they are found. Other fixes to your website include eliminating backdoors and vulnerability remediation. Further intrusions are prevented on your website with a Web Application Firewall, DDoS prevention, and backdoor mitigation. On top of all the security that these services provide, SiteLock also gives users access to a Global CDN to speed up your website.
The important thing to remember about this solution is that it’s easy, automatic, and reliable. You don’t have to worry about editing your files or blocking IP addresses when you have a service like SiteLock or Sucuri, because they will do all of that for you, and much more efficiently. SiteLock, Sucuri, or another security service of your choice will protect your website from incoming bad bots and patch up your code to ensure that nothing malicious can get onto your website.
Unless you run one of the most popular websites, with over 100,000 views per day, chances are the majority of your traffic comes from bots, not humans. Bots play a big part in the Internet activity that happens today. While a lot of that activity can have malicious intent, there are many bots out there made to make our lives easier. Take advantage of the services that the good bots can provide and take all the necessary precautions to protect your website from incoming bad bots.