
[GUIDE] How Ad Networks (and everyone else) Knows You're A Bot

Discussion in 'Making Money' started by AdvertisingGuy, May 9, 2019.

  1. AdvertisingGuy

    AdvertisingGuy Junior Member

    Joined:
    May 8, 2019
    Messages:
    143
    Likes Received:
    145
    Hi all! Long-time lurker, short-time poster.

    There are a lot of BHW users who want to do arbitrage, meaning buying traffic to a site on a CPC basis and getting paid by ad networks on a CPM or CPA/CPI basis. If you do this, it is essential that the traffic you buy is human, or at least is not identified as a bot. If you send bot traffic to the ads that your ad network displays, the ad network is well within its rights to ban you and withhold your payment. Since you most likely paid for the traffic upfront, everything you spent buying it is wasted and the money is gone forever.
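
    To put rough numbers on that risk, here is a minimal sketch of the arbitrage arithmetic. The function name, the parameters, and every figure used with it are hypothetical illustrations, not numbers from any real network; `invalid_share` stands for the fraction of visits the ad network flags as bots and refuses to pay for.

```python
def arbitrage_margin(visits, cpc, cpm, ads_per_visit=1, invalid_share=0.0):
    """Return (cost, revenue, profit) for a batch of bought visits.

    visits        -- number of visits bought on a CPC basis
    cpc           -- price paid per visit (hypothetical)
    cpm           -- payout per 1,000 paid ad impressions (hypothetical)
    ads_per_visit -- ad impressions shown per visit (hypothetical)
    invalid_share -- fraction of visits the network flags and does not pay for
    """
    # The traffic is paid for upfront, so the cost does not depend on validity.
    cost = visits * cpc
    # Only impressions from visits that were not flagged earn revenue.
    paid_impressions = visits * ads_per_visit * (1.0 - invalid_share)
    revenue = paid_impressions / 1000 * cpm
    return cost, revenue, revenue - cost
```

    With 10,000 visits at $0.01 CPC, a $4 CPM and 3 ads per visit, the cost is $100 and the revenue is $120 if every visit counts. If the network flags all of it (`invalid_share=1.0`), revenue drops to zero and the $100 is simply lost.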

    I've been in digital advertising a long time and I've worked with every major ad verification provider. I've also built my own "home rolled" ad verification tools. So while I certainly don't know everything, I know enough to (at least try to!) show you how to protect yourself against bot traffic. The idea here is to go very in-depth with data, screenshots and the works so that you know exactly what you're dealing with.

    Note that I've been posting bits of this in various other threads but having all the information in one place makes it easier to digest, so I've created this thread. I hope this is the appropriate place for the thread--any mods please move it at your discretion.

    More content will follow below!

    Mod Updates -- Bot Flags:
    BOT FLAG #1 -- Automated Browsers
    https://www.blackhatworld.com/seo/guide-how-ad-networks-and-everyone-else-knows-youre-a-bot.1119582/

    BOT FLAG #2 -- DATA CENTER IP ADDRESSES
    https://www.blackhatworld.com/seo/g...nows-youre-a-bot.1119582/page-2#post-12057559

    BOT FLAG #3 -- Invalid Graphics Processing Units (GPUs)
    https://www.blackhatworld.com/seo/g...nows-youre-a-bot.1119582/page-3#post-12060921

    BOT FLAG #4 -- Missing / Invalid Plugins Media + Devices
    https://www.blackhatworld.com/seo/g...nows-youre-a-bot.1119582/page-4#post-12066141

    BOT FLAG # 5 -- Browser Spoofing / Incongruous Browsers
    https://www.blackhatworld.com/seo/g...nows-youre-a-bot.1119582/page-4#post-12069251

    BOT FLAG #6 -- BAD / NO / IMPROPER Referrer URLs
    https://www.blackhatworld.com/seo/g...nows-youre-a-bot.1119582/page-4#post-12089635

    ---

    BOT FLAG #1 -- Automated Browsers

    As can be deduced, browser automation tools (or WebDrivers) automate actions in a web browser. Anything that can be done in a web browser, such as visiting websites, moving the mouse, moving back or forward, or clicking on links, a browser automation tool can do. Automation is often done using frameworks such as Selenium (using Java or Python) or Puppeteer (using JavaScript). These frameworks are primarily used for website testing, but they can be used for scraping data, filling out forms, and taking screenshots.

    An automated browser will open up a full web browser and load all content on a website. Chrome, Firefox, and even Safari have their own WebDrivers used for automated testing. Loading the full website, which includes JavaScript and images, will necessarily load ads as well. Because the browser is being operated by a computer program, and not by a human user interested in the content of a site, automated browsers are and should be flagged as invalid traffic for advertising 100% of the time. If you think about it, every bot that visits websites is an automated browser in one way or another.

    Automated browsers are identified using the webdriver property of the browser's navigator object, which is part of the standard Web APIs.

    To see this in action, open up your browser, and open up developer tools (CTRL+SHIFT+i on your keyboard). From there, go to the (JavaScript) "Console."

    If you type in 'navigator.webdriver' and hit enter in Safari or Firefox (screenshot below), the response will be 'false'. If you do it in Chrome, it will be undefined / null. That's because you, as a normal browser user, are not using an automated browser.

    [​IMG]

    Below is an ultra-lite Python script that will open up an automated browser and send it to Google. Python comes pre-installed on Mac computers if you want to try it for yourself, but you will have to install the selenium package and download the Chrome WebDriver to get it to work.

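    from selenium import webdriver

    # Selenium 3 style call; newer Selenium 4 releases drop executable_path
    # in favor of a Service object.
    browser = webdriver.Chrome(executable_path='chromedriver.exe')
    # Send the automated browser to Google.
    browser.get('http://www.google.com')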

    If you then open up developer tools / console in that automated browser and type in "navigator.webdriver", the response will be true.

    [​IMG]

    There are bot-blocking tools such as Incapsula, Distil Networks and PerimeterX which keep bad bots off websites. If you visit a site using these tools, such as twentytwowords.com or streeteasy.com, with a normal web browser, you should be able to view the site’s content without issue. However, if you point your automated browser towards that same site (screenshot below), the browser will get blocked from entering, even if the same computer, IP address, and browser are used.


    from selenium import webdriver

    # Same script as before, pointed at a site that sits behind a bot-blocker.
    browser = webdriver.Chrome(executable_path='chromedriver.exe')
    browser.get('http://www.streeteasy.com')
    [​IMG]

    Why? Because bot-blockers are identifying the browser as a bot from that navigator.webdriver property.
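
    On the receiving end, the check can be very small. The sketch below is only an assumption about how a verification backend might treat the flag, not how any named vendor actually does it. It supposes a detection script on the page posts back a JSON payload with a hypothetical "webdriver" field holding whatever navigator.webdriver returned.

```python
import json


def looks_automated(payload):
    """Return True when the reported navigator.webdriver value is true.

    `payload` is a hypothetical JSON string sent back by a detection script,
    for example '{"webdriver": true}'. Normal browsers report false, or
    nothing at all (undefined / null), so only an explicit true is flagged.
    """
    signals = json.loads(payload)
    return signals.get("webdriver") is True
```

    A payload of '{"webdriver": true}' is flagged, while '{"webdriver": false}', '{"webdriver": null}' and '{}' all pass. A real system would combine this with many other signals rather than rely on a single field.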
     
    • Thanks Thanks x 35
    Last edited by a moderator: May 22, 2019
  2. blackbrut

    blackbrut Jr. VIP Jr. VIP

    Joined:
    Mar 21, 2016
    Messages:
    322
    Likes Received:
    156
    Home Page:
    Very interesting, would be nice to have more bot flag :)
     
  3. BABL

    BABL Newbie

    Joined:
    May 8, 2019
    Messages:
    15
    Likes Received:
    1
    Occupation:
    Founder
    Location:
    at my desk
    Please do update as it is interesting
     
  4. Gogol

    Gogol Jr. VIP Jr. VIP

    Joined:
    Sep 10, 2010
    Messages:
    5,472
    Likes Received:
    5,185
    Gender:
    Male
    Occupation:
    Programmer
    Location:
    Pale Blue Dot
    Home Page:
    Now, the question is.. How do I set navigator.webdriver to false? AFAIK it's a read-only variable, but people seem to have cracked it already. Like @jamie3000 I guess. :D
    I would be glad to know the answer.
     
  5. AdvertisingGuy

    AdvertisingGuy Junior Member

    Joined:
    May 8, 2019
    Messages:
    143
    Likes Received:
    145
    Thank you! I'll probably post one per day and they take a lot of effort to put together.

    That's a good question! From what I know you can add an extension to your browser to change the field, but I can't tell you exactly how to do it. I have been told a rumor (from a semi-reliable source) that because Google made/owns Chrome there are some special ways for them to determine if someone is using an automated browser beyond that flag. But I don't know what that is and they sure ain't telling.
     
    • Thanks Thanks x 1
  6. Filipo666

    Filipo666 Jr. VIP Jr. VIP

    Joined:
    Aug 14, 2018
    Messages:
    486
    Likes Received:
    480
    Gender:
    Male
    Occupation:
    NEW BST click here ↓
    Location:
    Czech Republic
    Home Page:
    Bookmarked for later, Btw nice profile picture. Haven't seen in a while Don, the boss who can hire himself for the job ahhaah
     
    • Thanks Thanks x 2
  7. Gogol

    Gogol Jr. VIP Jr. VIP

    Joined:
    Sep 10, 2010
    Messages:
    5,472
    Likes Received:
    5,185
    Gender:
    Male
    Occupation:
    Programmer
    Location:
    Pale Blue Dot
    Home Page:
    Hmm, I will give it a go when I have some free time. My idea was, that if I could somehow custom build the chrome driver executable; I could set that var as false.
     
  8. JackFruit

    JackFruit Regular Member

    Joined:
    Jun 9, 2009
    Messages:
    448
    Likes Received:
    189
    Gender:
    Male
    Ok, so how to get around it.
     
  9. Herion

    Herion Jr. VIP Jr. VIP

    Joined:
    Jul 8, 2012
    Messages:
    576
    Likes Received:
    161
    Object.defineProperty and javascript injection. There are limits to what you can do with it on chromium based browsers though.
     
    • Thanks Thanks x 2
  10. Tozzy

    Tozzy Power Member

    Joined:
    Nov 26, 2015
    Messages:
    533
    Likes Received:
    163
    Gender:
    Male
    Location:
    World
    Thanks @AdvertisingGuy
    Any more bot flags you're aware of?
    I heard that some advanced detection techniques involve measuring page load time, tracking user's mouse activity, using flash/java and storing files or so-called permanent cookies etc etc. Maybe there are even more twists you've seen. Maybe you know of any ways to distinguish a wiped or freshly installed browser from one in normal use?
     
  11. AdvertisingGuy

    AdvertisingGuy Junior Member

    Joined:
    May 8, 2019
    Messages:
    143
    Likes Received:
    145
    I am an AdvertisingGuy after all. :D

    I can't comment beyond what I've already said. If you have the right answer and can confirm, put it in this thread. Some of these things can't be gotten around, however.

    Some of those things I'll cover in this series. I don't know of any "permanent cookies" that track users--Verizon used to do this and they were hit with a $1.35M fine by the FCC. According to the below link it was pretty prevalent once upon a time, but I'm not sure if it's done now. A fine from the regulator is a good way to dissuade people.

    https://qz.com/634294/a-short-guide-to-supercookies-whether-youre-being-tracked-and-how-to-opt-out/

    As for the freshly wiped / installed browser, I know it's done with cookies. If you were to close Chrome, delete the Chrome cookie folder(s) and then open Chrome back up, a bunch of new cookie folders would be recreated, especially after you visit your first site. But beyond the cookie, ad tech companies have additional ways to ID users. There are advanced fingerprinting techniques which take your graphics card, IP address / DMA, and browser version / user agent and can ID people to a high degree of accuracy. I'll get into that stuff as well.
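
    To give a feel for the fingerprinting idea, here is a minimal sketch that combines the three attributes mentioned above into one stable ID. It is an illustration under stated assumptions, not how any ad tech vendor actually builds fingerprints: the function name is hypothetical, and real systems use many more signals and fuzzier matching than a plain hash.

```python
import hashlib


def fingerprint(gpu, ip, user_agent):
    """Combine a few device attributes into one repeatable identifier.

    The same GPU string, IP address and user agent always produce the same
    hex digest, even if the browser's cookies were wiped in between.
    """
    # Normalize lightly so trivial whitespace or case changes do not matter.
    raw = "|".join([gpu.strip().lower(), ip.strip(), user_agent.strip()])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

    Because none of the inputs live in the cookie folder, clearing cookies does not change the result, which is why a "fresh" browser can still be matched to earlier visits.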
     
    • Thanks Thanks x 1
  12. Gogol

    Gogol Jr. VIP Jr. VIP

    Joined:
    Sep 10, 2010
    Messages:
    5,472
    Likes Received:
    5,185
    Gender:
    Male
    Occupation:
    Programmer
    Location:
    Pale Blue Dot
    Home Page:
    Hmmmm interesting! I will def try that out! Thanks!

    Initial browser size, initial cursor position could also be tracked I guess. For a bot, it will be same mostly..
     
    • Thanks Thanks x 1
  13. Pistacho

    Pistacho BANNED BANNED

    Joined:
    Apr 24, 2019
    Messages:
    78
    Likes Received:
    51
    Gender:
    Male
    @AdvertisingGuy you forgot one more thing. Ad networks are also checking your IP address's ASN. If you look at the tracking scripts they use, you will see that they call 3rd party JS scripts that return whether your IP is mobile, residential or a bot IP. The way they check for a bot IP is whether the IP address comes from a datacenter.

    You forgot to mention this, as it is extremely easy to spot if you look at your requests under Chrome Developer Tools. Just like Instagram and YouTube care about the type of IPs, ad networks care as well.
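
    A rough sketch of what such a lookup could boil down to on the server side is below. It matches an address against a list of CIDR ranges as a stand-in for a real ASN lookup, and the ranges are reserved documentation blocks used as hypothetical placeholders, not real datacenter ranges. The names are assumptions for illustration only.

```python
import ipaddress

# Hypothetical placeholder ranges (reserved documentation blocks). A real
# service would use a maintained ASN / hosting-provider database instead.
DATACENTER_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_datacenter_ip(ip, ranges=DATACENTER_RANGES):
    """Return True if `ip` falls inside any of the listed ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ranges)
```

    With the placeholder list above, is_datacenter_ip("203.0.113.7") is flagged and is_datacenter_ip("192.0.2.1") is not.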
     
  14. AdvertisingGuy

    AdvertisingGuy Junior Member

    Joined:
    May 8, 2019
    Messages:
    143
    Likes Received:
    145
    You just spoiled yesterday's post! lol. :D :D

    I'll be going a little bit more in-depth into data center IPs tomorrow.
     
  15. DanDD

    DanDD Jr. VIP Jr. VIP

    Joined:
    Mar 10, 2014
    Messages:
    699
    Likes Received:
    178
    here we go again :D btw you can also use any automation library that's based on cef, that will fix everything
     
  16. Pistacho

    Pistacho BANNED BANNED

    Joined:
    Apr 24, 2019
    Messages:
    78
    Likes Received:
    51
    Gender:
    Male
    Yeah I always tell people that if you really must use a browser then puppeteer and Object.defineProperty are your friends. This is a good example of how to do it properly: https://gist.github.com/nicoandmee/7d7dc2e79e2d553a22645d312f39bf3c
     
    • Thanks Thanks x 2
  17. AdvertisingGuy

    AdvertisingGuy Junior Member

    Joined:
    May 8, 2019
    Messages:
    143
    Likes Received:
    145
    Looks like we have a winner! Though I didn't run the script, it looks like line 165 changes the navigator.webdriver property from "true" to "undefined."

    For everyone asking how you do it, that's how you do it.
     
    • Thanks Thanks x 1
  18. TasDePixels

    TasDePixels Junior Member

    Joined:
    Mar 8, 2018
    Messages:
    192
    Likes Received:
    234
    Gender:
    Male
    Occupation:
    Software engineer
    Location:
    Morocco
    I personally use nightmare.js (based on Electron). I switch the user agent signature, proxy and a bunch of other stuff as parameters in every call for full stealth.

    [​IMG]
    it works.

    i also tested the link you provided.
    [​IMG]

    What's sweet about nightmare is that you can easily access a bunch of methods that define your emulated navigator, such as WebRTC etc.
     
    • Thanks Thanks x 1
    Last edited by a moderator: May 10, 2019
  19. British Botter

    British Botter BANNED BANNED

    Joined:
    May 9, 2019
    Messages:
    43
    Likes Received:
    16
    Gender:
    Male
    Nico Mee is a beast some of his work is outstanding!
     
    • Thanks Thanks x 1
  20. BrazilianBusinessman

    BrazilianBusinessman Jr. VIP Jr. VIP

    Joined:
    Mar 18, 2019
    Messages:
    617
    Likes Received:
    282
    Gender:
    Male
    Occupation:
    Detective of my own failure
    Location:
    The place where humans aint slaved by money
    OP, getting into the tech-savvy aspect: since Selenium can be seen as an automated tool, this means any tool that uses it can be flagged, right?

    I think I realized why many creation tools are leaving footprints that aren't related to proxies but to the tool itself