
[GUIDE] How Ad Networks (and everyone else) Knows You're A Bot

Discussion in 'Making Money' started by AdvertisingGuy, May 9, 2019.

  1. patadeperro

    patadeperro Jr. VIP Jr. VIP

    Joined:
    Jul 5, 2011
    Messages:
    1,337
    Likes Received:
    539
    I don't know what you mean by DSP. The thing I'm talking about is this: on some pay-per-view networks, you would bid on a domain name or a keyword and pay accordingly. For example, you would bid 0.5 cents on a site like "google.com" and then enter the domain you wanted to show (your domain, for example yahoo.com). The network would have lots of people running its software, and when a user typed google.com, they would charge you 0.5 cents and the user would see yahoo.com. That was basically it. There were no bot networks; the software was open and running, and this advertising was what people "paid" in exchange for downloading background images, video games, etc....

    When I was talking about tagging you in my thread, I was not talking about a paid product. I was talking about an open thread like this, where you could jump in and share your experience, since you seem to know the subject, to complement what I was writing.
     
  2. AdvertisingGuy

    AdvertisingGuy Junior Member

    Joined:
    May 8, 2019
    Messages:
    143
    Likes Received:
    145
    I'm not sure I've heard of that, but it sounds interesting. A DSP is a product that lets you buy banner / video / native ad space. So when someone visits Yahoo.com, you can bid on a banner ad to show on Yahoo.com. My point about not recommending products is that I wouldn't recommend a specific DSP, that's all.

    I would love to, truly. I'm totally up for it. Send me an advance copy, or tag me in a post and I'll comment for sure.
     
  3. TasDePixels

    TasDePixels Junior Member

    Joined:
    Mar 8, 2018
    Messages:
    192
    Likes Received:
    234
    Gender:
    Male
    Occupation:
    Software engineer
    Location:
    Morocco
    "Big websites" like FB or LinkedIn are pretty smart at figuring out whether your bot actions stink or not. I remember getting temporary search bans on Facebook multiple times because I skipped things like hovering before clicking, or because I accessed built links (/search?params) without encoding the URL or while omitting the referral link param.
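    A minimal sketch of the hover-before-click check described above, assuming the site logs DOM events in a hypothetical format (`type`, `target`, and `t` in milliseconds). Keyboard navigation and some assistive tools can produce clicks with no preceding hover, so this is one signal among many, not proof.

```python
from typing import Any, Dict, List

# Hypothetical event-log format, e.g. {"type": "mouseover", "target": "#search", "t": 1200}
Event = Dict[str, Any]


def unhovered_clicks(events: List[Event]) -> List[Event]:
    """Return click events that had no earlier mouseover on the same target."""
    hovered = set()
    suspicious = []
    for event in sorted(events, key=lambda e: e["t"]):  # replay in time order
        if event["type"] == "mouseover":
            hovered.add(event["target"])
        elif event["type"] == "click" and event["target"] not in hovered:
            suspicious.append(event)
    return suspicious
```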
     
  4. AdvertisingGuy

    AdvertisingGuy Junior Member

    Joined:
    May 8, 2019
    Messages:
    143
    Likes Received:
    145
    BOT FLAG #4 -- Missing / Invalid Plugins Media + Devices

    Have you ever talked to someone heavy into the network security space about JavaScript? Before long, the whole conversation sounds like the ravings of an insane person.

    What's hilarious about this, aside from watching some pasty weirdo get redder and redder in the face, is that it's all completely true! Companies can actually see which media devices and browser plugins you have, and ad networks use this info not just to send marketing messages but also to identify bots.

    Plugins
    In JavaScript, the navigator.plugins property returns a "PluginArray" object, which is basically a list of all the plugins installed in your browser. If you were to check this in Chrome Developer Tools (CTRL + SHIFT + I, for those following along at home), it'll look something like this:

    [Screenshot: navigator.plugins in the Chrome DevTools console]

    Obviously, what's contained in the array will depend on the actual plugins in your browser. But if this image were represented as a value, it would look like the below:

    For valid Chrome browsers, it'll look something like this:
    Code:
    [{"description":"Portable Document Format","filename":"internal-pdf-viewer","length":1,"name":"Chrome PDF Plugin","mimeTypes":[{"description":"Portable Document Format","suffixes":"pdf","type":"application\/x-google-chrome-pdf"}]},{"description":"","filename":"mhjfbmdgcfjbbpaeojofohoefgiehjai","length":1,"name":"Chrome PDF Viewer","mimeTypes":[{"description":"","suffixes":"pdf","type":"application\/pdf"}]},{"description":"","filename":"internal-nacl-plugin","length":2,"name":"Native Client","mimeTypes":[{"description":"Native Client Executable","suffixes":"","type":"application\/x-nacl"},{"description":"Portable Native Client Executable","suffixes":"","type":"application\/x-pnacl"}]}]
    
    And for valid Edge Browsers, it'll look something like this:
    Code:
    [{"description":"Portable Document Format","filename":"","length":1,"name":"Edge PDF Viewer","mimeTypes":[{"description":"Edge PDF Viewer","suffixes":"pdf","type":"application\/pdf"}]}]
    
    There are a few things worth noting here. First, the format of these plugins matters: for the Edge browser, you have an array with a dictionary, which contains a key whose value is another array with another dictionary. When analyzing plugins for bot signatures, ad networks will confirm whether the plugins match up with "normal" browser plugins and whether their format is correct. If they don't, the visit will register as a bot.

    Broadly, browser plugins aren't used that much, so most visitors will have one of a few (let's say 100 or so) plugin combinations. A combination outside of these might still be a human being, but if your plugin list is just a string that says "plugins go here lol", that'll get flagged.
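    The two checks above (correct shape, known combination) can be sketched on the server side, assuming the tracking script posts navigator.plugins serialized as JSON in the format shown. The allow-list below is hypothetical and only contains the Chrome and Edge examples from this post; a real one would be mined from known-good traffic. This is a desktop check: per the PostScript below, mobile browsers legitimately return an empty array.

```python
import json
from typing import Any, FrozenSet, Set

# Hypothetical allow-list of plugin-name combinations (the post estimates ~100 exist).
KNOWN_PLUGIN_SETS: Set[FrozenSet[str]] = {
    frozenset({"Chrome PDF Plugin", "Chrome PDF Viewer", "Native Client"}),
    frozenset({"Edge PDF Viewer"}),
}

PLUGIN_KEYS = {"description", "filename", "length", "name", "mimeTypes"}
MIME_KEYS = {"description", "suffixes", "type"}


def plugins_well_formed(plugins: Any) -> bool:
    """Check the array -> dict -> mimeTypes array -> dict shape from the examples."""
    if not isinstance(plugins, list):
        return False
    for plugin in plugins:
        if not isinstance(plugin, dict) or not PLUGIN_KEYS.issubset(plugin):
            return False
        mime_types = plugin["mimeTypes"]
        # "length" should equal the number of MIME types the plugin lists.
        if not isinstance(mime_types, list) or len(mime_types) != plugin["length"]:
            return False
        if not all(isinstance(m, dict) and MIME_KEYS.issubset(m) for m in mime_types):
            return False
    return True


def plugins_look_human(raw_json: str) -> bool:
    """True if the payload parses, has the right shape, and is a known combination."""
    try:
        plugins = json.loads(raw_json)
    except json.JSONDecodeError:
        return False
    if not plugins_well_formed(plugins):
        return False
    names = frozenset(plugin["name"] for plugin in plugins)
    return names in KNOWN_PLUGIN_SETS
```

    Both example arrays from this post pass, while "plugins go here lol" fails either at the JSON parse or at the shape check.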

    Media Devices
    Media devices work similarly. Through the WebRTC media APIs (navigator.mediaDevices.enumerateDevices()), the browser returns an array of the media devices on a user's computer. An example is below:

    WebRTC media devices:
    Code:
    [{"deviceId":"default","groupId":"072ad3d8838fe3142b8b3deda747e19d53efa2c36b094556134e8b2ce21d49cf","kind":"audiooutput","label":null},{"deviceId":"communications","groupId":"072ad3d8838fe3142b8b3deda747e19d53efa2c36b094556134e8b2ce21d49cf","kind":"audiooutput","label":null},{"deviceId":"012cd203cbcb5495c2a4d4893d9a42481a6e4da141aaa878aa441ca156551000","groupId":"834ab8506e0809d2dd8ec962fbbc2c9d6afc1361c9b31079ca89ed47fa3577b2","kind":"audiooutput","label":null},{"deviceId":"75fcaf117e91fd14661137d1c0e775b49de922f08439d8a98c2ceef5435a1b42","groupId":"14bfd1b8c542ab2059c20ec1ad42be92910602c241ddda3ee97b2333e437d619","kind":"audiooutput","label":null},{"deviceId":"8f804409ef25cb511e726fe91e3a8c555d4d6bcee85e3537d9eb95620cc45640","groupId":"072ad3d8838fe3142b8b3deda747e19d53efa2c36b094556134e8b2ce21d49cf","kind":"audiooutput","label":null}]
    
    Once again, this is pretty specific. The above entry lists the audio outputs on the computer. Other parts of these APIs will let an ad network see whether you have a camera and a microphone, and even the local IP address of the computer loading the JavaScript. Pretty crazy. Of course, all of this contributes to the "human-ness" of the visitor on the site: if they have valid audio / video inputs and outputs, proper device IDs, a camera, microphones, etc., then they're much more likely to be a human being than if they don't.

    So to sum up--if a visitor doesn't have valid browser plugins or valid media devices, it's likely a bot.
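    A rough sketch of the media-device side of this check, assuming the tracking script posts the enumerateDevices() result as a list of dicts like the example above. The rule that a desktop visitor should report at least one well-formed device is an illustrative assumption, not a documented ad-network rule.

```python
from collections import Counter
from typing import Any, Dict

# The three device kinds enumerateDevices() reports.
VALID_KINDS = {"audioinput", "audiooutput", "videoinput"}


def media_devices_look_human(devices: Any) -> bool:
    """Desktop heuristic: at least one device, each with string IDs and a valid kind."""
    if not isinstance(devices, list) or not devices:
        return False
    for device in devices:
        if not isinstance(device, dict) or device.get("kind") not in VALID_KINDS:
            return False
        if not isinstance(device.get("deviceId"), str) or not isinstance(device.get("groupId"), str):
            return False
    return True


def device_summary(devices: list) -> Dict[str, int]:
    """Count devices per kind, e.g. to see whether a camera (videoinput) is present."""
    return dict(Counter(device["kind"] for device in devices))
```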

    PostScript:
    Much of what I'm writing about is for desktop visitors because:
    • Desktop visitors generally pay more than mobile
    • There's more user data available on desktop
    • And I just think it's more interesting.
    But on mobile, even if JavaScript is enabled, plugin information and media devices aren't available. When these devices run the JavaScript, they'll return empty arrays, not blanks. So if JavaScript is enabled on a mobile device, blanks mean bot traffic, while empty arrays mean the code is working exactly as intended.
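    The blank-versus-empty-array rule from the PostScript can be written down directly. This sketch assumes the collector reports a missing field as None or an empty string, and an executed-but-empty lookup as []; the labels returned are hypothetical.

```python
from typing import Any


def classify_mobile_field(value: Any, js_enabled: bool) -> str:
    """Apply the post's mobile rule to one plugins or media-devices field."""
    if not js_enabled:
        return "no-js"          # nothing to judge from this field
    if value is None or value == "":
        return "suspect-bot"    # a blank: the collector never ran, or the value was stripped
    if isinstance(value, list):
        return "ok"             # an array (empty on mobile): the code ran as intended
    return "suspect-bot"        # anything else is a malformed payload
```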
     
    Last edited: May 13, 2019
  5. AdvertisingGuy

    AdvertisingGuy Junior Member

    Joined:
    May 8, 2019
    Messages:
    143
    Likes Received:
    145
    BOT FLAG # 5 -- Browser Spoofing / Incongruous Browsers

    The response to these posts has been gratifying and appreciated; it's not easy to write all of this, after all. That being said, some commenters in this thread still think there are ways to alter your browser, using frameworks such as Selenium or even Nightmare.js, to pass bot verification filters. In my research / experience, the technology used to identify spoofed browsers indicates that this is not possible.

    This might be the most important bot flag on this thread. In my experience, the three most common instances of bot traffic to a website are:
    • Data Center IPs
    • Automated Browsers
    • Spoofed Browsers / Incongruous Browsers
    The last one is probably the hardest to spot. The Interactive Advertising Bureau (IAB), the industry trade body that sets standards for online advertising, recognizes two types of Invalid Traffic: General Invalid Traffic (GIVT) and Sophisticated Invalid Traffic (SIVT). Spoofing the browser is definitely in the sophisticated camp. One way to spoof the browser is to change its user agent.

    So what is a user agent?

    HowToGeek does a fine job of explaining this, so I won't try to do any better:

    We can identify a visitor's user agent through JavaScript's navigator object. To demonstrate this, open up the developer console (CTRL + SHIFT + I again) and type in navigator.userAgent. In my case, it's "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36".

    [Screenshot: navigator.userAgent in the DevTools console]

    So what does this tell us? Running it through a user agent parsing library shows that I'm on a Windows 10 desktop computer connecting to the Internet through Chrome version 74 (74.0.3729.131). All true! You can check your own at https://www.whatsmyua.info/.

    Now, if I were using a mobile device with a user agent like Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3679.0 Mobile Safari/537.36, a website owner could also tell the device, down to the brand (Samsung) and the model (Galaxy S5). Test it out on whatsmyua.info and see.
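    To show what a parsing library extracts from these strings, here is a deliberately rough regex sketch. It is not a real UA library: it only knows a few browser families and operating systems, and turning the model code "SM-G900P" into "Samsung Galaxy S5" would need a device database, which is not shown.

```python
import re
from typing import Dict

# Checked in order: Edge UAs also contain "Chrome/", and Android UAs contain "Linux".
BROWSER_PATTERNS = (
    ("Firefox", r"Firefox/([\d.]+)"),
    ("Edge", r"Edge?/([\d.]+)"),
    ("Chrome", r"Chrome/([\d.]+)"),
)
OS_MARKERS = (
    ("Android", "Android"),
    ("Windows 10", "Windows NT 10.0"),  # later Windows versions also send NT 10.0
    ("macOS", "Mac OS X"),
    ("Linux", "Linux"),
)


def parse_user_agent(ua: str) -> Dict[str, str]:
    """Very rough UA parser; production systems use a maintained library."""
    os_name = next((name for name, marker in OS_MARKERS if marker in ua), "unknown")
    browser, version = "unknown", ""
    for family, pattern in BROWSER_PATTERNS:
        match = re.search(pattern, ua)
        if match:
            browser, version = family, match.group(1)
            break
    # Android UAs carry the model code just before " Build/", e.g. "SM-G900P".
    model = re.search(r"; ([^;)]+) Build/", ua)
    return {"os": os_name, "browser": browser, "version": version,
            "model": model.group(1) if model else ""}
```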

    So what does this have to do with spoofing / changing the user agent?

    If you're using a browser automation framework such as Selenium, you can technically change the user agent of the browser, as I do in the code below.

    Code:
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    
    # Firefox 67 user agent string that Chrome will report instead of its own
    firefox_useragent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0'
    
    options = Options()
    options.add_argument(f'--user-agent={firefox_useragent}')  # overrides the User-Agent header and navigator.userAgent
    # Selenium 4 style: the driver path goes in a Service object, the options via options=
    browser = webdriver.Chrome(service=Service("chromedriver.exe"), options=options)
    
    Here, I am using a Chrome WebDriver but changing the user agent to Firefox's before opening my automated browser. My screenshot below shows that it is indeed a Chrome browser with a Firefox user agent. It works!

    [Screenshot: Chrome WebDriver window reporting a Firefox user agent]

    That being said, if I were to use this spoofed WebDriver, even assuming I changed the navigator.webdriver property to "" or false, I would still get flagged as an incongruous browser. There are a couple of ways this happens.
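    The idea behind "incongruous" is that the claimed browser and the observed browser disagree. One simple way to see this, assuming the collector sends both the parsed UA family and the navigator.plugins names from Flag #4: the Selenium example above claims Firefox, but Chrome still exposes its own PDF and Native Client plugins. The signature table is hypothetical and built only from this thread's examples.

```python
from typing import Dict, Iterable, Set

# Hypothetical: plugin names that, in the examples above, only one browser family exposes.
FAMILY_ONLY_PLUGINS: Dict[str, Set[str]] = {
    "Chrome": {"Chrome PDF Plugin", "Chrome PDF Viewer", "Native Client"},
    "Edge": {"Edge PDF Viewer"},
}


def ua_contradicts_plugins(claimed_family: str, plugin_names: Iterable[str]) -> bool:
    """True if the page reports plugins belonging to a browser other than the claimed one."""
    names = set(plugin_names)
    return any(
        family != claimed_family and names & exclusive
        for family, exclusive in FAMILY_ONLY_PLUGINS.items()
    )
```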

    Fonts
    Web browsers / operating systems only have certain fonts available. You ordinarily wouldn't see macOS-proprietary fonts on a Windows Chrome browser, and you probably wouldn't see Linux-only fonts on a macOS device. The FontFaceObserver library "recreates a span element with a known font family, measures its width, sets the target font family, and measures the width again. If there is a difference in the width, you’ll know that the font has rendered and is thus available." Pretty slick! You can read more about it here.

    Below is a list of fonts I have available on my computer:

    Code:
    ["Arial","Arial Black","Arial Narrow","Arial Rounded MT Bold","Book Antiqua","Bookman Old Style","Calibri","Cambria","Cambria Math","Century","Century Gothic","Century Schoolbook","Comic Sans MS","Consolas","Courier","Courier New","Georgia","Helvetica","Impact","Lucida Bright","Lucida Calligraphy","Lucida Console","Lucida Fax","Lucida Handwriting","Lucida Sans","Lucida Sans Typewriter","Lucida Sans Unicode","Microsoft Sans Serif","Monotype Corsiva","MS Gothic","MS PGothic","MS Reference Sans Serif","MS Sans Serif","MS Serif","Palatino","Palatino Linotype","Segoe Print","Segoe Script","Segoe UI","Segoe UI Light","Segoe UI Semibold","Segoe UI Symbol","Tahoma","Times","Times New Roman","Trebuchet MS","Verdana","Wingdings","Wingdings 2","Wingdings 3","Agency FB","Algerian","Baskerville Old Face","Bauhaus 93","Bell MT","Berlin Sans FB","Berlin Sans FB Demi","Bernard MT Condensed","Blackadder ITC","Bodoni MT","Bodoni MT Black","Bodoni MT Condensed","Bodoni MT Poster Compressed","Bookshelf Symbol 7","Bradley Hand ITC","Britannic Bold","Broadway","Brush Script MT","Californian FB","Calisto MT","Candara","Castellar","Centaur","Chiller","Colonna MT","Constantia","Cooper Black","Copperplate Gothic","Copperplate Gothic Bold","Copperplate Gothic Light","Corbel","Curlz MT","Ebrima","Edwardian Script ITC","Elephant","Engravers MT","Eras Bold ITC","Eras Demi ITC","Eras Light ITC","Eras Medium ITC","Felix Titling","Fixedsys","Footlight MT Light","Forte","Freestyle Script","French Script MT","Gabriola","Gigi","Gill Sans MT","Gill Sans MT Condensed","Gill Sans MT Ext Condensed Bold","Gill Sans Ultra Bold","Gill Sans Ultra Bold Condensed","Gloucester MT Extra Condensed","Goudy Old Style","Goudy Stout","Haettenschweiler","Harlow Solid Italic","Harrington","HELV","High Tower Text","Imprint MT Shadow","Informal Roman","Jokerman","Juice ITC","Kristen ITC","Kunstler Script","Magneto","Maiandra GD","Malgun Gothic","Marlett","Matura MT Script Capitals","Microsoft Himalaya","Microsoft JhengHei","Microsoft New Tai Lue","Microsoft PhagsPa","Microsoft Tai Le","Microsoft YaHei","Microsoft Yi Baiti","MingLiU_HKSCS-ExtB","MingLiU-ExtB","Mistral","Modern","Modern No. 20","Mongolian Baiti","MS Reference Specialty","MS UI Gothic","MT Extra","MV Boli","Niagara Engraved","Niagara Solid","NSimSun","OCR A Extended","Old English Text MT","Onyx","Palace Script MT","Papyrus","Parchment","Perpetua","Perpetua Titling MT","Playbill","PMingLiU-ExtB","Poor Richard","Pristina","Rage Italic","Ravie","Rockwell","Rockwell Condensed","Rockwell Extra Bold","Roman","Script","Script MT Bold","Showcard Gothic","SimSun","SimSun-ExtB","Small Fonts","Snap ITC","Stencil","Sylfaen","System","Tempus Sans ITC","Terminal","Tw Cen MT","Tw Cen MT Condensed","Tw Cen MT Condensed Extra Bold","Viner Hand ITC","Vivaldi","Vladimir Script","Wide Latin"]
    
    Now, that's a lot of fonts. But they're Windows fonts: you can tell from all of the MS fonts, and from the total lack of Mac-specific fonts such as "Avenir", "Apple Chancery", "Apple Color Emoji", "Apple SD Gothic Neo" and others. If you see a Windows Chrome browser with only the ["MONO"] font available, then that's a bot for sure, because real Windows visitors have more than a single basic Linux(!) font. A Mac will have Mac fonts. Android phones only have a handful of fonts available.

    Most ad networks have pretty sophisticated algorithms to determine what browsers / operating systems should have which fonts enabled. And if a font is detected that is proprietary to another operating system or browser, that will get flagged 100% of the time. This is probably the easiest way to ID spoofed browsers.
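    A toy version of that OS-versus-fonts check, assuming the collector sends the claimed OS (from the UA) and the list of fonts detected by a width test like FontFaceObserver's. The signature sets are illustrative, taken from fonts named in this post, and far too small for production; installed software such as Microsoft Office can also add "foreign" fonts, which a real model has to tolerate.

```python
from typing import Iterable, Set

# Illustrative signature sets built from fonts named in this post.
MAC_ONLY_FONTS = {"Avenir", "Apple Chancery", "Apple Color Emoji", "Apple SD Gothic Neo"}
WINDOWS_CORE_FONTS = {"Segoe UI", "Calibri", "MS Gothic", "Microsoft Sans Serif"}


def _has_any(found: Set[str], signature: Set[str]) -> bool:
    """Case-insensitive membership test against a signature set."""
    return any(name.casefold() in found for name in signature)


def font_flags(claimed_os: str, fonts: Iterable[str]) -> Set[str]:
    """Return the reasons the detected fonts don't fit the claimed OS."""
    found = {name.casefold() for name in fonts}
    flags: Set[str] = set()
    if len(found) <= 1:
        flags.add("too-few-fonts")
    if claimed_os.startswith("Windows"):
        if not _has_any(found, WINDOWS_CORE_FONTS):
            flags.add("missing-windows-fonts")
        if _has_any(found, MAC_ONLY_FONTS):
            flags.add("mac-fonts-on-windows")
    elif claimed_os == "macOS" and not _has_any(found, MAC_ONLY_FONTS):
        flags.add("missing-mac-fonts")
    return flags
```

    The ["MONO"]-only Windows visitor from the example above would come back with both "too-few-fonts" and "missing-windows-fonts".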

    But let's say you somehow manage to change the font fingerprint. That still doesn't get you around the advanced browser fingerprinting techniques used to ID browsers.

    Advanced Fingerprinting

    I borrowed liberally from the adtechmadness blog, and he does a better job of breaking it down than I could. I’m not going to quote all of it, but much of what I write below is lifted from there, with a few edits from me:

    In 2016, Google released “Picasso: Lightweight Device Class Fingerprinting for Web Clients”. Picasso is a system that allows a server to identify the device class of a web client, where a device class is defined as the combination of browser, OS, and graphics hardware. That is, Picasso is not intended to identify unique web visitors or bots specifically, but rather to distinguish, with high certainty, between different device classes. The basic principle behind it is to use the graphics rendering system of a device as a physically unclonable function. That is, the output of browser graphics such as canvas depends on many different layers, from hardware (GPU), to low-level software (GPU driver, OS rendering), to high-level software (OS- and library-provided graphics APIs). This makes the output highly unique per device class and allows accurate differentiation between them.

    This capability is important in the context of bot detection, as many bots lie about their underlying technology within the user agent string in order to appear legitimate and get targeted with high paying ads.
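    The Picasso paper describes a challenge-response scheme: the server sends a seed, the client draws canvas primitives derived from it, and the server compares the hashed result with the known answer for the device class the user agent claims. The sketch below covers only that server-side bookkeeping; the table contents, device-class names, and function names are hypothetical, and the client-side rendering is not shown.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional

# Hypothetical table built offline by running each seed on known-good devices:
# seed -> {device_class: expected SHA-256 hex digest of the rendering}.
KNOWN_RESPONSES: Dict[str, Dict[str, str]] = {}


def issue_challenge() -> str:
    """Pick (unpredictably) a seed we already have ground-truth answers for."""
    return secrets.choice(sorted(KNOWN_RESPONSES))


def rendering_hash(pixels: bytes) -> str:
    return hashlib.sha256(pixels).hexdigest()


def check_response(seed: str, claimed_class: str, response_hash: str) -> Optional[bool]:
    """True/False if we know the answer for the claimed class, None if we don't."""
    expected = KNOWN_RESPONSES.get(seed, {}).get(claimed_class)
    if expected is None:
        return None  # unknown class: can't confirm or refute the UA's claim
    return hmac.compare_digest(expected, response_hash)
```

    A False result is exactly the "lying user agent" case: the rendering came from different hardware or software than the UA claims.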

    From the Picasso White Paper itself:

    And keep in mind, this is publicly available information released nearly three years ago. Who knows what kind of crazy shit ad networks (and Google in particular) know about but they haven’t even released. It’s probably even more sophisticated.

    So, long story short / TL;DR:

    1. Google (and other companies) have an essentially unbeatable way to identify spoofed browsers / user agents.
    2. If it's a spoofed browser, then it's almost certainly a bot, because why else would anyone spoof their browser?
    3. Therefore, every visitor that has a spoofed browser will get flagged as a bot.
     

  6. pepefrog

    pepefrog Newbie

    Joined:
    Oct 5, 2018
    Messages:
    46
    Likes Received:
    5
    Hey, so with all the complexities you've described around using bots, is this something that can work without bots? Or are bots the only way to get enough traffic for the CPM? Thanks for the thread.
     
  7. AdvertisingGuy

    AdvertisingGuy Junior Member

    Joined:
    May 8, 2019
    Messages:
    143
    Likes Received:
    145
    I assume you mean buying CPC traffic to send visitors to a site, and then monetize those users with CPM ads, right?

    There are hundreds of sites (likely thousands) that do CPC arbitrage and make a profit doing exactly this. But you can't buy bot traffic, because that will get your account banned and your money withheld. As I've illustrated in my posts, they have methods to ID bot traffic, and you'll almost certainly get caught.
     
  8. pepefrog

    pepefrog Newbie

    Joined:
    Oct 5, 2018
    Messages:
    46
    Likes Received:
    5
    I was not aware this was a thing. Is there some sort of list of networks to figure out which has the lowest rates to buy traffic? I'm assuming the best way to do this would be to send a lot of low quality traffic but wouldn't you get more profit from sending people from places like the US for a better CPM? Or are there networks with some sort of fixed cpm?
     
  9. AdvertisingGuy

    AdvertisingGuy Junior Member

    Joined:
    May 8, 2019
    Messages:
    143
    Likes Received:
    145
    There are lots and lots of places to buy traffic from. At this time I can't discuss it too much, but broadly, the more you spend, the more likely the traffic is to be vetted / human. AdWords is very expensive, whereas a lot of services that charge under $0.01 CPC are mostly bots. I'm writing this guide to help ID non-human traffic.

    Maybe I'll write a guide to CPC arbitrage as well.
     
  10. peakaboo

    peakaboo Newbie

    Joined:
    May 8, 2019
    Messages:
    22
    Likes Received:
    5
    Gender:
    Female
    Looking forward to your second [GUIDE] :)

    Those posts that you've written are very technical and hard to implement on our own. In the end - can you suggest an existing open-source filter that works?
     
  11. AdvertisingGuy

    AdvertisingGuy Junior Member

    Joined:
    May 8, 2019
    Messages:
    143
    Likes Received:
    145
    You mean an open-source bot filter? If so, those don't exist. At least I haven't seen one.

    You could train an open-source analytics platform to ID bots--does Matomo/Piwik offer something like this?

    I had a freelancer code up a javascript tracking pixel for a few hundred bucks. You could feasibly do the same thing.
     
  12. marcodada

    marcodada Newbie

    Joined:
    Feb 6, 2019
    Messages:
    46
    Likes Received:
    3
    Thanks for the info, man. If you just edited the first post it would be much better; at first I thought you hadn't updated.
    That information gives me ideas about few things.
     
  13. AdvertisingGuy

    AdvertisingGuy Junior Member

    Joined:
    May 8, 2019
    Messages:
    143
    Likes Received:
    145
    I don't think I can edit the post. Am I wrong here?

    I'm happy to add links to the other bot flags to the first post, but I thought that Flag #1 implied that there would be additional flags after that.
     
  14. peakaboo

    peakaboo Newbie

    Joined:
    May 8, 2019
    Messages:
    22
    Likes Received:
    5
    Gender:
    Female
    You can ask Mods to edit the post. Perhaps adding links to your other flags into the first post would be helpful for people to follow up
     
  15. tether

    tether Newbie

    Joined:
    Apr 14, 2019
    Messages:
    41
    Likes Received:
    4
    Thanks, great posts. Waiting for your CPC arbitrage guide :)
     
  16. AdvertisingGuy

    AdvertisingGuy Junior Member

    Joined:
    May 8, 2019
    Messages:
    143
    Likes Received:
    145
    I have to write it first!
     
  17. AdvertisingGuy

    AdvertisingGuy Junior Member

    Joined:
    May 8, 2019
    Messages:
    143
    Likes Received:
    145
    BOT FLAG #6 -- BAD / NO / IMPROPER Referrer URLs

    Let's take a look at a website with data from two visitors: Visitor A, Visitor B.

    Everything else about these two visitors is the same, meaning they are both coming from real computers with correct user agents, valid GPUs, and residential IP addresses: basically everything that, per this guide, won't trigger a bot flag.

    Based on the two screenshots, which one is more likely to be a bot?

    [Screenshot: window.location.href and document.referrer for Visitor A and Visitor B]

    If you guessed Visitor A, then you guessed right. The reason? Look at the referrers.

    First, let’s talk about the parameters. What are we looking at here?

    window.location.href / document.referrer

    Like most of the things on these threads, these parameters are collected using JavaScript. window.location.href will return the exact page that you’re visiting. document.referrer will return the page you originated from.

    To test, in your web browser (Chrome / Safari / Firefox / etc.), hit CTRL + SHIFT + I on your keyboard. Go to the “console” tab and type in window.location.href. You should get a response like the below:

    [Screenshot: window.location.href in the console]

    Also, type in document.referrer. In my example below, I clicked on the link from my BHW member page, hence that document.referrer. You’ll probably have something different (maybe the main page or the Making Money forum?).

    [Screenshot: document.referrer showing the BHW member page]

    But what if you just copy / paste the link to this thread instead of clicking on it? Try it, and type in document.referrer again.

    [Screenshot: document.referrer returning a blank string]

    You’ll get a blank response! So why is this a bot indicator?

    On-Site Browsing Referrers:

    In JavaScript, document.referrer is generally only populated when a visitor arrives by following a link, typically by clicking it. Think about it: when you're reading your favorite website, do you ever remember a time when, instead of clicking on a link you were interested in, you right-clicked it, copied the link, pasted it into your browser, and hit enter? Hardly ever, right? Most of the time, you would click on the link. Ad networks and ad verification companies know this, and expect a long chain of referrer URLs. Someone like Google, which has very sophisticated fingerprinting / cookie-tracking abilities through its Google Analytics product, has even MORE data. Hence, if your website has a ton of weird referrers, it doesn’t matter if the visits pass all the other flags; this will get them flagged too!

    To explain further, there is a natural progression to a visit. Let's say you're visiting ESPN, a popular sports site. You go to the home page, and if you're a basketball fan, you click on NBA, then select "Home" from the drop-down menu, which brings you to their NBA coverage. From there, you might click on the main story of the day: Magic: Lakers GM Pelinka was 'backstabbing'.

    Once you're done reading that story, because ESPN has an infinite scroll site, it will dynamically generate the next page based on your interests.

    So we have four URLs here. Three of them should have a valid referrer URL; the exception is the homepage, which was a direct visit.


    [Screenshot: human session, with a referrer on every page after the homepage]

    What would it look like if it’s a bot:

    [Screenshot: bot session, with no referrers]

    Look ma, no referrers!

    The long and the short of it is, if you have visitors that aren't clicking or navigating through the page like a normal human being would, then that's likely to be a bot.
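    The ESPN walk-through can be turned into a per-session count, assuming the site logs each page view as a (window.location.href, document.referrer) pair in order. The URLs in the usage test are made up. Accepting any same-site referrer, rather than requiring the exact previous URL, is a deliberately lenient choice so that infinite-scroll pages still pass.

```python
from typing import List, Tuple
from urllib.parse import urlparse

# One session = ordered (window.location.href, document.referrer) pairs.
PageView = Tuple[str, str]


def missing_internal_referrers(session: List[PageView]) -> int:
    """Count page views after the landing page whose referrer isn't on the same site."""
    if not session:
        return 0
    site = urlparse(session[0][0]).netloc
    # The landing page may be a direct visit; every later page should come from this site.
    return sum(1 for _, referrer in session[1:] if urlparse(referrer).netloc != site)
```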

    Incoming Site Referrers:
    Perhaps the most important referrer to a page is the first one: how does that visitor get to the page? If you're a Tier-1 website, then maybe 25%-50% of the visitors to your site will be direct, meaning without any kind of referrer. For the rest of us mortals, traffic will come through different acquisition channels: organic search (SERPs), paid search, social (Facebook, Twitter, Instagram, Pinterest), email, native / paid acquisition (Taboola, Outbrain, Flipboard, Pocket) or non-social site referrers. If you have a steady stream of traffic coming direct, or from the exact same referrer, then the traffic is much more likely to be bots. It should be diversified.
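    A simple way to measure "diversified" is the share of landing-page visits from the single biggest source. This sketch assumes you have the landing-page document.referrer for each visit (blank meaning direct). Any alert threshold is a hypothetical tuning choice and has to be set per site, since a Tier-1 site can legitimately see 25%-50% direct traffic.

```python
from collections import Counter
from typing import Iterable, Tuple
from urllib.parse import urlparse


def top_source_share(referrers: Iterable[str]) -> Tuple[str, float]:
    """Return the most common source (hostname, or "(direct)" for blank) and its share."""
    sources = Counter(urlparse(r).netloc or "(direct)" for r in referrers)
    total = sum(sources.values())
    if total == 0:
        return "(none)", 0.0
    source, count = sources.most_common(1)[0]
    return source, count / total
```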

    Each of these sources has its own bot flags, truly. Imagine that you're getting a lot of email traffic: do the visitors actually log into their email, click on the links, and get to your page? If not, it's suspicious, and if a verification vendor such as Moat (or the Big G itself) IDs this, they are likely to flag it. After all, it's rare for someone to paste in an email referrer URL over and over again.

    Is it a Twitter referrer from an account with 5 followers? Again, that’s weird. Same thing with Facebook. If you’re using Outbrain, for instance, each click on an Outbrain link will have its own referrer URL format. Not only does the format need to match, the URL also needs to actually go to the location you clicked on.

    If you click on an Outbrain link, all of their outgoing links follow the same pattern:

    http://paid.outbrain.com/network/redir?p={}&c={}&v=3

    where the first {} is a huge ID string that redirects the click to the proper page, and the second is the campaign ID.

    Code:
    http://paid.outbrain.com/network/redir?p=b42NnxEikqzugMTOxGFzTDd6_hOnrfSdLs8yeRLF_OTqjvOkBeRK952ftG8hB-KA4jw3jA0ZNUi8meoWKH1YieIRUJTZszeG-uWGhKp0iYEdw8U97Hj_LIuMRPmohe7fKtWFDxiKdWisqQqvhD8IVqdELgVOozAkKX3lXsg6su62saSL8IuWvzE0daO1inPnbQmpH4WVTTg0MWH3g9DQS_IK6GfD5FsjATXL4ecs8ou2p1rd3nedyALEPDvYf6BvZwzByN3GOVyc9ueOpwpCuE_sUsZdyDwaD4XXMThsoYfC1WjMVwSinfAg4Ulcm4PR2PzfUtllBuAtQA8E0GWX8jn-bGMLZSWnEnfNQDKWdZKnvivaY1zs69KWrU6OuFwuRHP3YQ_pVYkFzCjWAYR_2ov2pv1hASE_f2T8LLRM-vMmian6Bm5_3eTF_cR7zJhZEGQNwFMPkxghOV3mekPzD7sOaUU4Kin52Wdu3QikPK2BH8vSdyHYS69aT9lYt-F_I4zEATKcLmJd-KH-iuRnlmehyHL802z5Fpy-FKh3R3YGow3dTA24KKSzJ7wmden8cFvhqkRCzxitkp51j7aI8ctbjCU5qQMmNLRUaptBVta5EyIIZmezkC0ztoXG7xO0EBQt6ka9CQm7h9Yv0xRXiJ2oWRVnZsswNxkO-d2pNefEXpuTRm_BUwZeD8pV972rRN_JyReC2bTRmPJHhkCvI__E-Ibq3-0xfMNlo5Q2-tulW0426lJlJGg8KEInO4y8vJrz6ISNlvmo1FahBooJkndIWSUmoshUDi2b0jwahZCczBISI_IPu6YFNM9yH1X3m3WJmWak4vVqSGE2Xl-mMhfrkINoBdZM7VYs54QpXJSxsuUm1UIQ449IHnm6qnTlf4pIjGHXxoxHvJGurnaFeCd5dowJRqssG3Z44uKUw3pgI2nltTh39CgLSO51NjQLXAz2CIJbDJFRZQxTQFFfu22gfQ1RX2gZxNBZK3diCEWWyCR55Kdty2Qrl4kNotJSw2OhF-RlwDVc8ER_3ouNJdJeYm5fznmttfI5ktlhvxmvlgH_467a7oTyXSluhOfCqerrFnJvanOFTaOyx27EeKSHwfsPS1-cW7yLdKwlzZVup7xJQXuZz7hbfnJwjpzUdAFiKwEDx7RmPPjAgS6vow&c=5e1aa7b6&v=3
    
    Code:
    http://paid.outbrain.com/network/redir?p=_BruuEWFSRgoaRjN_i3pRWU2WQ3OcONu2RrkS0CBuJtgEikDQrl-t2b-ISMciTRhD-gLRpFSZm_GOj3Oke1MQHqedFfkbzWC60JEWaAj-fsIu9k3L-MrolwJnMZ4DG4A1JsMqsfAxwVP7BmrQAvwS2iToBwYoFo1gmYYeZZ2Bk-FMiNB5D7cgs-lK9MMRu8Kvvj9aWdj5ZsV5OZ_ed_wBiVD4EwwVHCH2FLku-Wk2F7EXWmX3jYotL790CdEAGb5oTZpkI8n7rM-6XSHFmZH-QY6g8Fhv5YHj9qll6PaZGlV2IOpkAUstXO7t6WYg6yhSdzrFIm3LLZ8Dtu-lM5sP_ZFuLalBho878GRATvQlsQnvK7xEPG591M1XdWeDYv8nDV_moAIwhEoxBlbMzvZNxZF6duR_uZnV4eO2Uvancso7VPQQym1hwxBt6ONdBIy06KPOP7_CiSPTTgdabBFmLtCt2U64A6v0JIPC5zsmJyXEbdqkClp0QoLtD_dqVDBK75F0UNjtKgTPN45L9C8FnFEjpKIlLDfXOmh-wSzcUjaMliEFxMSsfgLO9-lvbhG0MHWn17tK9_WtD02Hyk2bHIoAC79jvLb8pm4ry5zMA9yfRxUOYc__BpAcv81mteoqmsiNBI-6uIKEqM5Alliok0QqQOvvqfT7UIQ1GlmoKid0FVihEruAE-qNhkzLpa4xdrM2qcvgOJmXMKiKtj-Yh2A-2eEwYF_t8dMuzRreMs_N-7b558eQmqNNbOa1jadGEh0B6ic6ttgkUoQyQhFkg9nd93mdlvhupuwqjH-KN1bCzAqFEkDJwDy_DJPGGP4Sz2FfrIc_B2kF1WB2W0cyyMJbY3FefopH-PO06ysKtv83DuzMf4fgRl--jvPqDfvZmQxlZNSQ8mgw3FXL4729LsxbrIeKKvUzGiHYR1rbH_R6Bl2nKwzwhnqsTVgexUcuDs0pDO44CjK2tGUc4EXAxtJO5AiRuff8ET8ZpVTLw9OxNen2hO8qRsvlzSbuHtcC02iO-_yUjhKW0tGQZWyXTG6jsSKU2z3FPQ1OFqF6bsSk1yaA9fpeZkxPAhyZh4nq0546HIsvABG9_O1us4vRnBLD3-sbOnFCE1l69iicvbI9T2oXiZX6R3EmHFv0MJDOnYVR0VjHT1IkQnXlIOJ3Q&c=c3ec8d4f&v=3
    
    If you change one letter in that initial query string and then paste the link into your browser, it’ll break and won’t redirect properly. Now, maybe a clever bot traffic operator could spoof URLs like the ones above, but if someone started really looking at the referrers, it probably wouldn’t work long-term. I’m sure most ad networks check these referral URLs from time to time to make sure they’re active and working properly. And if they’re not? Then it’s probably bot traffic, and you're buying it!
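    The format part of this check is easy to sketch with the standard library, assuming the referrer follows the redir?p={}&c={}&v=3 pattern shown above. It validates shape only; confirming that the p token actually redirects (the periodic "is it active?" check) would mean requesting the URL, which is not shown here. The short p value in the usage test is made up.

```python
from urllib.parse import parse_qs, urlparse


def looks_like_outbrain_redirect(url: str) -> bool:
    """Check host, path, and the p / c / v=3 parameters described in this post."""
    parsed = urlparse(url)
    if parsed.netloc != "paid.outbrain.com" or parsed.path != "/network/redir":
        return False
    params = parse_qs(parsed.query)  # blank values are dropped by default
    p = params.get("p", [""])[0]
    c = params.get("c", [""])[0]
    v = params.get("v", [""])[0]
    return bool(p) and bool(c) and v == "3"
```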

    Impossible Referrers
    It's also possible to catch bots with a honeypot: a page that is impossible to reach unless you're a bot scraping links and visiting them. That URL has no real referrer, and no human would know about it or be able to get to it. This is a little hard to set up and not as effective as looking at every visit, but it is possible. Someone on this thread mentioned it, so I wanted to include it here as well.
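    The log side of a honeypot is simple once the trap exists. This sketch assumes a hypothetical trap path that is linked only from markup humans never see and is commonly also disallowed in robots.txt so well-behaved crawlers stay away; the hit format (`visitor_id`, `path`) is likewise an assumption.

```python
from typing import Dict, Iterable, Set

# Hypothetical trap URL: linked only from hidden markup, listed nowhere else.
HONEYPOT_PATHS = {"/archive-full-index-7f3a/"}


def honeypot_visitors(hits: Iterable[Dict[str, str]]) -> Set[str]:
    """Return the IDs of visitors that requested any honeypot path."""
    return {hit["visitor_id"] for hit in hits if hit["path"] in HONEYPOT_PATHS}
```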

    Anyway, make sure that the referrers of the visitors coming to your site are legit / normal. Else it will get flagged as a bot!
     
  18. naskootbg

    naskootbg Senior Member

    Joined:
    Nov 8, 2010
    Messages:
    824
    Likes Received:
    266
    Selenium can run on the default Chrome you are using, together with its cookies, history, and plugins. You can show a real referrer by navigating to it before visiting the main page instead of setting a fake one. There are also plugins that block web leaks. But all this makes the automation software heavy, and I'm sure it can still be detected at the wrong step. Somehow hidden captchas still detect me, even though they're much easier to solve.

    HTTP requests: well, this is the best way, because you send requests directly to the server and don't render the JavaScript traps, but it is damn hard to manipulate the big sites that way. This kind of botting can easily be detected as well if it isn't done really professionally. A stealth HTTP bot must reproduce each and every request (redirects, Google Analytics, other JavaScript), capture hidden tokens (some of them loaded at random times), handle cookie changes, and so on. It must reproduce each and every header exactly as the botted site does. So imagine an ad click with no result in analytics, or with wrong cookies / unix timestamp / token: fake, of course.
     
  19. AdvertisingGuy

    AdvertisingGuy Junior Member

    Joined:
    May 8, 2019
    Messages:
    143
    Likes Received:
    145
    Thanks for the response. I am trying to highlight as many wrong steps as possible.

    RE: The referrer, you would have to navigate to that page AND click the link. It's not enough to ID the link and then visit the URL; it would need to be clicked. And it would need to be clicked in a reasonable amount of time: it's not like you can visit the page and then two seconds later ID / click the link. Many links on the page aren't clickable, and then Selenium will throw an error.
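    The "reasonable amount of time" point can be expressed as a minimum load-to-click gap. The two-second threshold below simply echoes the example in this reply and is a hypothetical value to tune on real traffic; the event tuple format is also an assumption.

```python
from typing import Iterable, List, Tuple

MIN_SECONDS_BEFORE_CLICK = 2.0  # hypothetical threshold, tune on real traffic


def too_fast_clicks(events: Iterable[Tuple[str, float, float]]) -> List[str]:
    """events: (visitor_id, page_load_ts, click_ts) in seconds; flag near-instant clicks."""
    return [visitor for visitor, loaded, clicked in events
            if clicked - loaded < MIN_SECONDS_BEFORE_CLICK]
```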

    If you're not rendering the JavaScript on the page, you're most likely a bot. And you're almost certainly not loading ads--this guide is mostly aimed at ad networks / advertising. Again, some vendors don't care and they'll let you scrape to your heart's content (pro-football-reference.com). Others are much more strict when it comes to scraping.
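    One way a site might spot clients that don't render JavaScript: give every served page a per-request ID (a hypothetical scheme), have the page's script echo it back in a beacon, and compare server logs with beacon logs. Humans with script blockers also fail this check, so it is a signal to combine with others rather than a verdict on its own.

```python
from typing import Iterable, Set


def pageviews_without_beacon(server_ids: Iterable[str], beacon_ids: Iterable[str]) -> Set[str]:
    """Page-view IDs the server served but the JavaScript beacon never reported back."""
    return set(server_ids) - set(beacon_ids)


def non_rendering_rate(server_ids: Iterable[str], beacon_ids: Iterable[str]) -> float:
    """Share of served page views with no matching beacon."""
    served = set(server_ids)
    if not served:
        return 0.0
    return len(served - set(beacon_ids)) / len(served)
```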

    I appreciate the comment regardless.
     
  20. AdvertisingGuy

    AdvertisingGuy Junior Member

    Joined:
    May 8, 2019
    Messages:
    143
    Likes Received:
    145
    Just did it. Thanks to @WizGizmo for hooking it up, and to peakaboo for the clever idea. I've got a few more flags coming but these are the majority of them.
     