[GUIDE] How Ad Networks (and everyone else) Knows You're A Bot

Discussion in 'Making Money' started by AdvertisingGuy, May 9, 2019.

  1. AdvertisingGuy

    AdvertisingGuy Junior Member

    Joined:
    May 8, 2019
    Messages:
    143
    Likes Received:
    145
    I don't know what that would entail but that sounds very difficult to do.

    I've never used the service, so I can't comment.
     
  2. velkan

    velkan Junior Member

    Joined:
    Feb 14, 2019
    Messages:
    152
    Likes Received:
    29
    Gender:
    Male
    No, I didn't mean HTTP requests. I meant clicking, filling in text fields, and pressing keys entirely separately from the web browser: a bot based on display coordinates. It has nothing to do with the web browser, other than running it, and doesn't rely on the website's code.

    EDIT
    Never mind. I will test it in the field :)
     
  3. AdvertisingGuy

    AdvertisingGuy Junior Member

    Joined:
    May 8, 2019
    Messages:
    143
    Likes Received:
    145
    You mean just typing keys onto your computer? That's absolutely possible to script. But it would make browsing / scraping difficult.

    It's a shame this account got banned because I had a lot of questions! :D

    What timing attacks are used to ID a proxy? In my experience, identifying a proxy is rare, especially if it's a high-anonymity proxy / VPN.
     
    Last edited by a moderator: May 16, 2019
  4. AdvertisingGuy

    AdvertisingGuy Junior Member

    Joined:
    May 8, 2019
    Messages:
    143
    Likes Received:
    145
    BOT FLAG #3 -- Invalid Graphics Processing Units (GPUs)

    To put it as simply as possible, a graphics processing unit (GPU) is a computer chip that controls the output to a display screen. Assuming that you're reading this on a screen (and how else would you read it?), then a graphics card helped get it there.

    If you're on a Windows computer, you can see the manufacturer of your graphics card by going into the search bar and typing in "dxdiag" + enter.

    A Windows tool called "DirectX Diagnostic Tool" will pop up. Then click on the “Display” tab. You’ll see something like the below:

    [​IMG]

    I can see that I have a computer with an Intel(R) UHD Graphics 620. And through the use of a JavaScript API called “WebGL”, most analytics platforms can see this too.

    WebGL was originally designed for browsers to render 2D / 3D graphics without the use of additional plugins. Does anyone here who was active online in the 90’s remember downloading things like Flash or Shockwave plugins? WebGL was designed to help avoid some of that. But in order to properly render an image, the browser needs to know what GPU the user has, so WebGL exposes this kind of information.

    If you want more information, read here: https://dev.opera.com/articles/introduction-to-webgl-part-1/

    The thing is, clever ad networks / ad verification vendors realized that they could use the same parameter to help identify the graphics card that a browser was using. When you visit a website or load an ad, the ad itself actually renders on the page. It uses your graphics card to load the content. Through WebGL, these vendors can see that you’re in a real browser, loading real content. To confirm, go to https://browserleaks.com/webgl.

    If you scroll down, the Unmasked Renderer field, collected via WebGL (through the WEBGL_debug_renderer_info extension), should match your graphics card pretty closely (as should the other WebGL parameters). “Should” and “closely” being the operative words. I’m going to include some technical details below because I think they’re interesting, but if you don’t really care, you can skip the next few paragraphs.

    [​IMG]

    As you can see, my detected GPU is: “ANGLE (Intel(R) UHD Graphics 620 Direct3D11 vs_5_0 ps_5_0)”. The Intel(R) UHD Graphics 620 is from our DxDiag. ANGLE stands for “Almost Native Graphics Layer Engine.” It’s a product developed by Google whose goal is to “allow users to seamlessly run WebGL and other content” by translating OpenGL ES calls to one of the hardware-supported APIs available for the browser user’s (in this case, my own) operating system. Basically this makes using a web browser (like Chrome) faster.

    In my case, because I’m using a Windows computer, ANGLE is using Direct3D11 (a Microsoft 3D graphics rendering API) to render the graphics on the page.

    As for “vs_5_0 ps_5_0”:
    • “vs” stands for “vertex shader” version 5.0
    • “ps” stands for “pixel shader” version 5.0
    This is just instructing Direct3D11 how to render the graphics on the screen using vertex shader v5.0 and pixel shader v5.0.
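
    As a rough illustration of the breakdown above, here is a minimal Python sketch that splits a renderer string of this exact shape into its parts. The function and class names are made up for the example, and the pattern only assumes the single-part format quoted in this post; renderer strings from other browsers or versions may look different and would simply fail to parse.

```python
import re
from typing import NamedTuple, Optional


class AngleRenderer(NamedTuple):
    gpu: str            # e.g. "Intel(R) UHD Graphics 620"
    backend: str        # e.g. "Direct3D11"
    vertex_shader: str  # e.g. "5.0"
    pixel_shader: str   # e.g. "5.0"


# Assumed format (taken from the string quoted in the post):
#   ANGLE (<gpu name> <backend> vs_<major>_<minor> ps_<major>_<minor>)
_ANGLE_RE = re.compile(
    r"^ANGLE \((?P<gpu>.+) (?P<backend>\S+) "
    r"vs_(?P<vs_major>\d+)_(?P<vs_minor>\d+) "
    r"ps_(?P<ps_major>\d+)_(?P<ps_minor>\d+)\)$"
)


def parse_angle_renderer(renderer: str) -> Optional[AngleRenderer]:
    """Return the parsed parts, or None if the string is not in the assumed format."""
    match = _ANGLE_RE.match(renderer.strip())
    if match is None:
        return None
    return AngleRenderer(
        gpu=match["gpu"],
        backend=match["backend"],
        vertex_shader=f'{match["vs_major"]}.{match["vs_minor"]}',
        pixel_shader=f'{match["ps_major"]}.{match["ps_minor"]}',
    )


parsed = parse_angle_renderer(
    "ANGLE (Intel(R) UHD Graphics 620 Direct3D11 vs_5_0 ps_5_0)"
)
print(parsed)
```

    A string that does not follow this layout (for example a bare software renderer name) returns None rather than a guess.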

    The point of all this is that WebGL has identified that we have a graphics card, and a valid graphics card (ad networks check for both). That means that the page rendered, and any ads that ran likely rendered too.

    However, there are such things as headless browsers, where the browser itself runs in the background and you don’t see it. In this case, WebGL wouldn’t report a real graphics card, because headless browsers typically render without using one. Instead, it’ll return results such as the ones below:
    • Google SwiftShader
    • Microsoft Basic Render Driver
    • llvmpipe (LLVM 6.0, 256 bits)
    Google SwiftShader shows up when a Chrome/Chromium-based browser renders in the background without connecting to your actual graphics card. Microsoft Basic Render Driver is the same thing, but with non-Chrome browsers on Windows devices. llvmpipe (LLVM 6.0, 256 bits) shows up on a Linux computer (probably a server) with no graphics card at all! There are other bot indicators, but I'm sure you get the idea. An example from my tracking pixel is below.

    TL;DR -- If a computer doesn’t have a valid graphics card, then it’s likely a bot and will get flagged.

    [​IMG]

    One more thing to note: typically a bot sets off a LOT of these flags simultaneously. The one above has a bad user agent (to be covered in a future post), a data center IP address, and no valid graphics card.
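
    To make the "several flags at once" idea concrete, here is a minimal Python sketch of how collected visit data could be checked on the analysis side. It is not how any particular vendor does it. The marker list only covers the three renderer names mentioned in this post, and the Visit fields (user_agent_ok, datacenter_ip) are hypothetical stand-ins for checks that would be done elsewhere.

```python
from dataclasses import dataclass
from typing import List

# Only the software renderers named in this post; a real list would be longer.
SOFTWARE_RENDERER_MARKERS = (
    "swiftshader",
    "microsoft basic render driver",
    "llvmpipe",
)


def is_software_renderer(renderer: str) -> bool:
    """True if the WebGL renderer string names a known software renderer."""
    lowered = renderer.lower()
    return any(marker in lowered for marker in SOFTWARE_RENDERER_MARKERS)


@dataclass
class Visit:
    webgl_renderer: str   # Unmasked Renderer string, "" if none was reported
    user_agent_ok: bool   # hypothetical: result of a separate user-agent check
    datacenter_ip: bool   # hypothetical: result of a separate IP lookup


def bot_flags(visit: Visit) -> List[str]:
    """Collect the names of every flag this visit sets off."""
    flags = []
    if not visit.webgl_renderer or is_software_renderer(visit.webgl_renderer):
        flags.append("no_valid_gpu")
    if not visit.user_agent_ok:
        flags.append("bad_user_agent")
    if visit.datacenter_ip:
        flags.append("datacenter_ip")
    return flags


suspect = Visit("Google SwiftShader", user_agent_ok=False, datacenter_ip=True)
print(bot_flags(suspect))
```

    Returning the list of flags rather than a single yes/no keeps the point of the paragraph above visible: one flag alone is weak evidence, while several together are hard to explain away.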
     
    • Thanks x 6
  5. DanDD

    DanDD Jr. VIP Jr. VIP

    Joined:
    Mar 10, 2014
    Messages:
    699
    Likes Received:
    178
    There was some Linux software that allowed headless browsers to be detected/run as normal without much RAM usage.
    Don't remember its name, but it was in another thread just like this one.
     
  6. AdvertisingGuy

    AdvertisingGuy Junior Member

    Joined:
    May 8, 2019
    Messages:
    143
    Likes Received:
    145
    I'd be interested to see the software. Is it possible to search the forum? Maybe you posted to it or liked it?

    I think it's possible but only if you hard-code it specially into the browser using a browser extension. But even then it has to be the RIGHT graphics card for your computer, otherwise it will get identified.

    Ad networks / ad verification companies really know their stuff when it comes to catching people who bot.
     
  7. DanDD

    DanDD Jr. VIP Jr. VIP

    Joined:
    Mar 10, 2014
    Messages:
    699
    Likes Received:
    178
    nope mate, it was pretty decent software
    will send it later when on pc

    im unable to find it but heres an interesting read https://www.blackhatworld.com/seo/does-anyone-have-an-undetectable-selenium-jar.962732/

    here it is http://elementalselenium.com/tips/38-headless
     
    • Thanks x 4
    Last edited by a moderator: May 21, 2019
  8. Yee

    Yee Junior Member

    Joined:
    Dec 31, 2018
    Messages:
    141
    Likes Received:
    28
    Occupation:
    Software developer
    So basically you have two options: build a flawless bot, or be creative on the page with JavaScript.
     
    Last edited: May 11, 2019
  9. DanDD

    DanDD Jr. VIP Jr. VIP

    Joined:
    Mar 10, 2014
    Messages:
    699
    Likes Received:
    178
    see above reply, cef headless is the same as others
     
  10. AdvertisingGuy

    AdvertisingGuy Junior Member

    Joined:
    May 8, 2019
    Messages:
    143
    Likes Received:
    145
    Oh, you mean Xvfb!

    Really interesting. I do think that would bypass the graphics card issue that we talked about. But I think it would still fail other tests that I'm in the process of covering, such as spoofing the user agent / incongruous browser flags.

    Thanks for sharing.
     
  11. DanDD

    DanDD Jr. VIP Jr. VIP

    Joined:
    Mar 10, 2014
    Messages:
    699
    Likes Received:
    178
    why mess with shitty automation tools, there are far better alternatives
    look for APIs that use CEF (Chromium Embedded Framework)
    it has the exact same flags as regular Chrome since it automates Chromium
    you got cefsharp for c#/.net and cefpython which is not heavily documented, there are other cef libraries in other languages but nothing comes close to cefsharp
    also save yourself the headache and use browserautomationstudio, it uses cef and its free
     
  12. patadeperro

    patadeperro Jr. VIP Jr. VIP

    Joined:
    Jul 5, 2011
    Messages:
    1,339
    Likes Received:
    539
    I love this thread, and I don't want you to think I am bashing it with my comment. On the contrary, it is VERY IMPORTANT to understand how these companies think, what their behaviours are, and what information they expect. IMHO the easiest way to bypass all this is not by sending bot traffic BUT by tricking real people into taking the action you want them to take. That way the network gets a real person, and nobody will ever know whether the person you sent was interested or not. Since you get paid per click or per thousand impressions, those are the only 2 metrics you need to increase. In other words, if all the traffic you send is robotic traffic, you will be out of the game pretty quickly.
     
  13. peakaboo

    peakaboo Newbie

    Joined:
    May 8, 2019
    Messages:
    22
    Likes Received:
    5
    Gender:
    Female
    Do you want to post your test here so we can try to bypass it? :)
     
  14. Azure Spark

    Azure Spark Newbie

    Joined:
    Mar 22, 2019
    Messages:
    11
    Likes Received:
    5
    Gender:
    Male
    Unlikely, at least in the case of Puppeteer, considering that Chromium is open source and bots are highly lucrative, so botmakers would have probably noticed if they placed some detectable measure in the code.

    Puppeteer also automates Chromium. That isn't the problem. The real issue is all the subtle ways webmasters and adnets can tell whether a visitor is automated, from GPU settings to WebGL vendors. This affects all. Check this out: intoli com/blog/making-chrome-headless-undetectable/
     
    • Thanks x 2
  15. DanDD

    DanDD Jr. VIP Jr. VIP

    Joined:
    Mar 10, 2014
    Messages:
    699
    Likes Received:
    178
    Puppeteer uses the Chrome DevTools Protocol (remote debugging) and it is detectable unless heavily tweaked, like the code shared in pages below. A better approach is nightmarejs, which needs only a user agent change and will pass Distil.
    Also, companies mostly flag you when you get profitable and not the other way around. CEF is a Swiss army knife; if you use it then there's not much to worry about.
    If you are giving them headaches ($$$) then they will flag you no matter how good your bot is, because at the end of the day it is still a bot. No real user interacts as fast as a bot, and no real user stays on the website every second.
     
  16. AdvertisingGuy

    AdvertisingGuy Junior Member

    Joined:
    May 8, 2019
    Messages:
    143
    Likes Received:
    145
    I appreciate the comment and the love! I agree that if 100% of your traffic is bot, you'll get banned right quick. Part of the point of this thread is to show that buying crappy traffic will get you banned eventually. Incentivized traffic / toolbar traffic works well because they are real people with real browsers viewing the ads.

    It's not just my test--I have a simple JS tracking pixel which works almost identically to everything you see on browserleaks.com. It's the verification vendors I speak of--Moat, WhiteOps, DV, IAS, etc. They all have technologies that track bot traffic as well.

    I'm happy to post some sample CSV files, I just don't know how to.

    This is ultimately what I'm getting at. The visitors to your site, whether you buy them or they come organically, need to act like people: clicking on different parts of your site, scrolling around, highlighting text, spending time on each page. If a visitor had all of the parameters of a real visitor, but they didn't do any of these things, that would be a red flag for me.
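
    A minimal Python sketch of that red flag, assuming the page already records per-session counts for the behaviors listed above. The PageSession fields and the 2-second threshold are hypothetical values chosen for illustration, not anything a specific vendor is known to use.

```python
from dataclasses import dataclass


@dataclass
class PageSession:
    clicks: int             # clicks anywhere on the page
    scroll_events: int      # number of scroll events seen
    text_selections: int    # times the visitor highlighted text
    seconds_on_page: float  # time between page load and leaving


def looks_unengaged(session: PageSession, min_seconds: float = 2.0) -> bool:
    """Red flag: no human-like interaction at all, or the visit was too short.

    min_seconds is an arbitrary illustrative threshold.
    """
    no_interaction = (
        session.clicks == 0
        and session.scroll_events == 0
        and session.text_selections == 0
    )
    return no_interaction or session.seconds_on_page < min_seconds


silent_visit = PageSession(clicks=0, scroll_events=0, text_selections=0, seconds_on_page=45.0)
print(looks_unengaged(silent_visit))
```

    A check like this would sit alongside the fingerprint checks rather than replace them, since a visitor can pass every browser parameter and still behave nothing like a person.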
     
    Last edited by a moderator: May 14, 2019
  17. TasDePixels

    TasDePixels Junior Member

    Joined:
    Mar 8, 2018
    Messages:
    192
    Likes Received:
    234
    Gender:
    Male
    Occupation:
    Software engineer
    Location:
    Morocco
    If you want to emulate "real" behavior, it's mandatory that you avoid triggering honeypots (usually links which aren’t visible to a normal user but only to a spider; when a scraper/spider tries to access the link, the alarms are tripped).
    You should also be smart about how you emulate a click, for example. No normal user could or should be able to click inside the outer HTML selector of the link.
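
    From the site owner's side, the honeypot idea described above can be sketched in a few lines of Python. The path below is a hypothetical hidden link invented for the example; the only assumption is that the server can see which paths each visitor requested.

```python
from typing import Iterable

# Hypothetical URL of a link that is present in the HTML but hidden from
# human visitors, so only a crawler following every link would request it.
HONEYPOT_PATHS = frozenset({"/internal/do-not-follow"})


def tripped_honeypot(requested_paths: Iterable[str]) -> bool:
    """True if any requested path is one of the hidden honeypot links."""
    return any(path in HONEYPOT_PATHS for path in requested_paths)


print(tripped_honeypot(["/", "/pricing", "/internal/do-not-follow"]))
```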
     
  18. DanDD

    DanDD Jr. VIP Jr. VIP

    Joined:
    Mar 10, 2014
    Messages:
    699
    Likes Received:
    178
    then browserautomationstudio is the best: real mouse emulation, browser fingerprints etc., and it will save you development time
     
  19. patadeperro

    patadeperro Jr. VIP Jr. VIP

    Joined:
    Jul 5, 2011
    Messages:
    1,339
    Likes Received:
    539
    Two details. First: I have been asking here whether anybody knows a good PPV (pay-per-view) network. In the good old days, PPV networks were basically places where you would bid on a domain or keyword and pay per impression. Most of the time the traffic came from people who had installed toolbars or games that would redirect them to the site you wanted. Do you know any network like that today? I am asking because you commented on toolbar traffic, and today when you ask about PPV most people send you to video marketing (YouTube, Twitch or related platforms).

    Second: I am planning to make a post about how networks detect fraudulent activity, specifically in the CPA arena. I will tag you in the comments to hear your feedback, if you don't mind, because I think we need to elevate the level of the discussion by sharing more in-depth knowledge so we can stay on the cutting edge of blackhat techniques.
     
    • Thanks x 2
  20. AdvertisingGuy

    AdvertisingGuy Junior Member

    Joined:
    May 8, 2019
    Messages:
    143
    Likes Received:
    145
    Totally true. Though I wonder how honeypots work in large-scale, practical settings. I could see website owners using them, but I feel like verification companies would have trouble doing this at scale.

    As I've tried to cover (and will do in the future) ad verification companies / ad networks are able to ID this using the methods I laid out.

    I think you're referring to a DSP? Can you clarify? In general, I try not to comment on specific products to buy b/c I don't want to break BHW rules.

    Please do! If you'd like you can send me a draft without formally posting it. Otherwise, I'll just comment on it if I have anything to add.