
How to protect your website against bots and spammers (part 1)

Discussion in 'Programming' started by healzer, May 19, 2017.

  1. healzer

    healzer Jr. Executive VIP Jr. VIP

    Joined:
    Jun 26, 2011
    Messages:
    2,632
    Likes Received:
    2,274
    Gender:
    Male
    Occupation:
    RevEngineeringMon$y
    Location:
    Somewhere in Europe
    Home Page:
    In this tutorial I'll demonstrate how you can use very simple techniques, using only PHP, to help combat bots and spammers on websites. The techniques proposed here are not award-winning methods, but they can definitely help in certain scenarios.

    Let us first look at how spammers and bots work, how they are developed, and why they are so effective.
    Botting is all about abusing a website by automating some or all of its activities in a completely autonomous way. The best way to illustrate this is with a simple example. For the demo I signed up for a new account on Reddit, because I've heard people develop bots to upvote their own posts and/or comments.

    Next up, I launched Fiddler (developed by a company named Telerik), which is essentially an HTTP(S) proxy logger. It can capture all the HTTP traffic you generate (from your web browser or other software). First things first, I picked a random topic on Reddit and clicked the "upvote" button next to the title (the orange up-arrow):

    [Image: a Reddit topic page with the orange upvote arrow]

    Before doing this, I made sure Fiddler was running and capturing HTTPS traffic (since Reddit uses SSL/TLS for its web traffic).
    If you do not know what the HTTP protocol is, I advise you to learn the very basics first (this seems like a good source: https://code.tutsplus.com/tutorials...ery-web-developer-must-know-part-1--net-31177), otherwise the rest of this tutorial will be a pain to follow and understand.

    The "upvote" click resulted in the following HTTP request details (in raw format):
    [Image: the raw HTTP request captured by Fiddler, with private details blurred]

    I have blurred out all irrelevant/private information.
    The most important part is the request URL: as you can see, it calls the endpoint /api/vote?dir=1&id=t3_TOPIC&sr=CATEGORY

    Further, looking inside the body of the request, we see the same id=t3_TOPIC parameter. (TOPIC corresponds to the Reddit topic ID, obtained from the Reddit page above).

    A spammer/botter's mission is to write a function (in his or her favorite programming language) that reproduces these HTTP requests, so that all he or she has to do is provide a list of URLs from which the TOPIC IDs can be extracted. Just like that, you can upvote all of your comments and dominate Reddit.
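    To make this concrete, here is a rough PHP sketch of how a botter might replay the captured request with cURL. The endpoint and parameter names come from the capture above; the base URL is Reddit's public one, and the session cookies and hidden headers are deliberately left out.

```php
<?php
// Build the vote URL exactly as seen in the captured request:
// /api/vote?dir=1&id=t3_TOPIC&sr=CATEGORY
function buildVoteUrl(string $topicId, string $category): string
{
    return 'https://www.reddit.com/api/vote?' . http_build_query([
        'dir' => 1,                // 1 = upvote
        'id'  => 't3_' . $topicId,
        'sr'  => $category,
    ]);
}

// A botter would replay this for every topic URL in a list.
function sendVote(string $topicId, string $category): void
{
    $ch = curl_init(buildVoteUrl($topicId, $category));
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query(['id' => 't3_' . $topicId]),
        CURLOPT_RETURNTRANSFER => true,
        // A real bot would also replay the cookies and the hidden
        // anti-CSRF values hinted at in the blurred lines above.
    ]);
    curl_exec($ch);
    curl_close($ch);
}
```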

    Fortunately, Reddit has some other protection mechanisms; some of these are hidden in the blurred lines of the image above. But if you invest a couple of hours (or less) figuring out what they are and how they are generated, you can bypass their protection fairly easily.

    Protection by means of randomizing parameters

    Now that you know how bots work internally, you should also know that those parameters (in the HTTP request body) will most likely be hard-coded into the software, because a website is very unlikely to change them, for various reasons.

    We can use this fact to combat spammers and bots by developing a system that is designed to randomize parameters. Randomization can occur after a certain time (or some other criteria). The purpose is to annoy the bot developers and make their life immensely hard, until they finally decide to quit/abandon botting.

    To illustrate how this can be done, I have prepared a very basic demo for you.
    I've created a form with two textboxes and a submit button, a common setup for a contact form. It looks like the image below (left); the HTML code is shown on the right:

    [Image: the contact form (left) and its HTML code (right)]

    Let's now fill in the fields and click the Submit button. I am using Fiddler to capture the HTTP request:

    [Image: the captured HTTP POST request of the contact form]

    Notice the body of the HTTP request. It contains two parameters (NAMO and SUBJ), which correspond to the two fields, as defined by the 'name' attribute in the HTML code.
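    For reference, a form along these lines would produce exactly that request body (the field names NAMO and SUBJ match the capture; the action URL submit.php is made up for the example):

```html
<!-- Minimal contact form; the name attributes become the POST parameters. -->
<form action="submit.php" method="POST">
    <input type="text" name="NAMO" placeholder="Name">
    <input type="text" name="SUBJ" placeholder="Subject">
    <input type="submit" value="Submit">
</form>
```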

    If a spammer wishes to develop a bot to abuse this contact form, he or she will hard-code both parameters and assign them whatever values he or she wishes. Submitting that POST request to the correct endpoint URL will then send an email to the webmaster (or do whatever else is programmed).

    How can we protect contact forms, buttons and all other web based actions?

    Have a look at this HTML+PHP snippet:

    [Image: the HTML+PHP snippet of the protected form]

    You will see that the PHP code at the very top defines a variable "RANDOM_FIRST_NAME". This variable is the concatenation of the current hour of the day and the suffix '__NAMO'. When the time is 13:55 (1:55 pm), the field's name attribute will be set to '13__NAMO'.
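    Reconstructed from the snippet in the image, the naming code might look like this (the variable and suffix names follow the text; treat it as a sketch, not the exact original):

```php
<?php
// The current hour (00-23) becomes a prefix of the field name, so the
// POST parameter name changes every hour, e.g. "13__NAMO" at 1:55 pm.
$RANDOM_FIRST_NAME = date('H') . '__NAMO';

// Inside the form markup, the field then uses the rotating name:
echo '<input type="text" name="' . $RANDOM_FIRST_NAME . '">';
```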

    The second PHP block (lower in the image) checks whether a POST request was received, and then whether the correct name was used for the name field. If the wrong name was used, it throws a "something went wrong" error.
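    The server-side check can simply recompute the same name and reject everything else; a minimal sketch, assuming the hourly scheme above:

```php
<?php
// Recompute the rotating field name on the server side.
function expectedFieldName(): string
{
    return date('H') . '__NAMO';
}

if (($_SERVER['REQUEST_METHOD'] ?? '') === 'POST') {
    $field = expectedFieldName();
    if (!isset($_POST[$field])) {
        // The client used a stale, hard-coded name (e.g. "13__NAMO" after 14:00).
        die('something went wrong');
    }
    $name = $_POST[$field];
    // ... process the submission (send the email, etc.) ...
}
```

    Note that a legitimate visitor who loads the form at 13:59 and submits at 14:01 will also be rejected; accepting the previous hour's name as well is one way to soften that edge case.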

    The beauty of this method is that it can be quite burdensome for bot developers. Assume you are making a bot that reproduces the HTTP requests of our contact form, and you hard-code the request parameters as captured at a certain point in time, say 13:55 (1:55 pm) your local time; all parameter names will then carry the prefix "13__". Five minutes later the clock hits 14:00, and your bot no longer works, because the hard-coded parameter names are now incorrect.

    Circumventing this technique

    If you're a little bit creative, you can bypass this anti-bot technique fairly easily.
    To do so, first make a GET request to the form page and extract the names of all <input> HTML elements. If you do this every time before sending the POST request, you always have the correct parameter names and never need to hard-code them at all! The downside is that this adds overhead (more web traffic) and slows the bot down a little.
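    The scraping step can be sketched like this (extractInputNames is a hypothetical helper; in practice the bot would first fetch the page over HTTP with cURL):

```php
<?php
// Extract every <input> name from a form page, so the bot never
// has to hard-code the parameter names.
function extractInputNames(string $html): array
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html);            // silence warnings on sloppy HTML
    $names = [];
    foreach ($doc->getElementsByTagName('input') as $input) {
        if ($input->hasAttribute('name')) {
            $names[] = $input->getAttribute('name');
        }
    }
    return $names;
}

// The bot GETs the form page, reads the fresh names, then POSTs with them.
$html  = '<form><input name="13__NAMO"><input name="13__SUBJ"></form>';
$names = extractInputNames($html);     // ['13__NAMO', '13__SUBJ']
```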

    There is, however, another clever approach that provides even better protection; it will be explained in part 2.
     
  2. V6Proxies

    great info dear, Thnx
     
  3. healzer
