Too Many Pageviews?

When you publish content on your website, you want people to read it. At the end of the day, a website without visitors is pointless and might as well not exist.

To get site visitors, search engines need to know about the content you have published and add it to their indexes. For this, search engines use “bots” (also called crawlers) that constantly scan the internet for usable content… and sooner or later they will find your website.

The bots of the major search engines follow certain standards, and because of that you can control whether and how they read the content of your website. For example, you can hide certain directories of your site from the bots with a so-called robots.txt file (the Robots Exclusion Standard) and instruct them not to add certain content to the search engine’s index, as shown in the sketch below. Normally, that works quite well.
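
Here is a minimal sketch of what such a file can look like. The directory names are made up purely for illustration; the file itself is plain text named robots.txt and sits at the top level of your site:

```
# Hypothetical robots.txt - the directory names are only examples.
User-agent: *          # applies to all bots that respect the standard
Disallow: /drafts/     # ask bots not to crawl this directory
Disallow: /internal/   # same for this one
```

Keep in mind that this only works for bots that choose to respect it.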

If you run a website, then you’ll know how important it is to monitor your site’s pageviews. You can do this by tracking your site, either with Google Analytics or with a service provider such as Stetic.

If, when evaluating the tracking data, you notice that pageviews are steadily increasing, then things are looking good. But if the number of pageviews jumps by 50% or more in a single day, then you know something is wrong.

Most likely, you have become a victim of “bad bots”. Generally, the aim of bad bots is to search for website sections and data that are not meant for the public. These bad bots are interested in:

  • Harvesting mailto: email addresses for sending spam
  • Stealing images and other content
  • Searching for trademark violations
  • Attacking a server by flooding it with requests

For the most part, hosting providers don’t like unacceptably high pageviews, and they don’t like the unnecessary traffic. It’s even possible they’ll send you an email asking you to upgrade to a more expensive hosting package, or in extreme cases they will block your website completely. Another consequence is that the increased traffic can slow down the loading speed of your website, and that is never in your interest.

As bad bots don’t adhere to the standards, you can’t reliably block them with a robots.txt instruction. If you want to block bad bots from crawling your site, it is much better to define a rule in your .htaccess file.

Here’s an example of an .htaccess rule that specifies some known bad bots:

```
# Block bad bots
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^http://.*amazonaws.com [OR]
RewriteCond %{REMOTE_HOST} ^.*.compute-1.amazonaws.com$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]
```
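
If you want to check that the rule is actually doing something, one quick way (assuming you have a terminal with curl available) is to request your site while pretending to be one of the blocked user agents. With the rule in place, the server should answer with a 403 Forbidden:

```
# Hypothetical test from the command line: fetch the headers of your homepage
# while claiming to be "Wget", one of the blocked agents. Replace the URL with your own.
curl -I -A "Wget/1.21" http://www.example.com/
# Expected first line of the response once the rule is active:
# HTTP/1.1 403 Forbidden
```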

You can write this rule in a plain text document, which you then upload to the top level of your web directory (the same place as your homepage’s index.html file). After uploading, rename the document to “.htaccess”. Pay attention to the leading period/full-stop: it will make the file invisible in most file listings!

Of course, if there is already an .htaccess file in place, then you can simply open it with a text editor and add the code above, roughly as sketched below.
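
As a rough sketch of how the pieces can sit together (the redirect rule here is purely hypothetical, standing in for whatever your file already contains, and the bad-bot block is abbreviated to two conditions for readability):

```
# Hypothetical existing .htaccess - your real file will look different.
RewriteEngine On

# Bad-bot block from above: paste the full list of RewriteCond lines
# here, finishing with the RewriteRule that returns 403 Forbidden.
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]

# Rules you may already have, e.g. a redirect for a renamed page.
RewriteRule ^old-page\.html$ /new-page.html [R=301,L]
```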

This article was first published in German on Michael Malzahn’s Apfelpürée site. 

 

Image courtesy morguefile | quincylemonade

Angus MacPheep

Angus MacPheep is the man behind the mask, the ghost in the machine. Don’t be fooled by his suave good looks and reckless disregard for convention — he’s the real driving-force behind RapidWeaver Central, a madly intuitive aesthete who makes inspirational leaps of faith and conjures pixel-perfect design magic from the uninspiring ether. He’s also a real hit with the ladies.
