noellarkin
Senior Member
- Mar 14, 2021
Some observations on how anti-bot systems seem to be evolving.
I'm only going to talk about browser automation based bots on desktop computers, I have almost no experience with HTTP request bots or mobile automation.
- Behavioural detection
Bot actions are our main Achilles heel. Bots are often coded to post, comment, etc. very EFFICIENTLY, because they're automated, goddamnit, and marketers want their ROI. But the number of requests per unit of time is a major indicator of bot activity; most real humans just aren't that efficient. Slowing down your bots will do wonders for ban rates and shadowbans. We marketers are also often completely clueless about how the typical normie uses a website: we're extremely goal-oriented, so we assume everyone else will be as well. For example, most real users never even type a URL into the address bar; they google the site's name instead and click the first search result. Successful bots have these inefficiencies built into their behaviour.
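To make this concrete, here's a minimal pacing sketch in Python. The rate cap and delay distribution are made-up numbers, not something any platform publishes, so tune them against what the site's real users actually do.

```python
import random
import time

def human_pause(mean_seconds=8.0, sigma=0.6):
    """Sleep for a roughly log-normal interval: mostly short pauses,
    occasionally a long one, like real reading/typing gaps."""
    time.sleep(random.lognormvariate(0, sigma) * mean_seconds)

def run_paced(actions, max_per_hour=12):
    """Run zero-argument callables at a rate well below what an 'efficient'
    script would manage, with jittered gaps so intervals never form a clean pattern."""
    base_gap = 3600 / max_per_hour
    for act in actions:
        act()
        time.sleep(base_gap * random.uniform(0.6, 1.8))
        human_pause()
```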
- Traffic monitoring and analytics
If someone creates an account on a website, did they go directly to "site.com/new-account/free" and make the account, or did they click through a link on some other site, for example a recent viral social media post mentioning the site? Traffic source is a very easy way to catch bots.
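A rough sketch of what "arriving through search" might look like with Selenium. Google's markup changes constantly, so the search-box name and result selector here are assumptions, and the function and variable names are mine.

```python
import random
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

def enter_via_search(driver, brand_query, site_domain):
    """Reach the target site through a Google search instead of typing the
    URL, so the visit arrives with a plausible referrer and search cookies."""
    driver.get("https://www.google.com")
    time.sleep(random.uniform(2, 5))
    box = driver.find_element(By.NAME, "q")      # search box name is an assumption
    for ch in brand_query:                       # type like a person, don't paste
        box.send_keys(ch)
        time.sleep(random.uniform(0.05, 0.25))
    box.send_keys(Keys.ENTER)
    time.sleep(random.uniform(2, 4))
    # click the first result that points at the target domain
    link = driver.find_element(By.CSS_SELECTOR, f'a[href*="{site_domain}"]')
    link.click()

# hypothetical usage:
# driver = webdriver.Chrome()
# enter_via_search(driver, "example forum", "example.com")
```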
- Entropy
Some things ought to be randomized, others not so much. Screen resolution shouldn't change after every HTTP request. Your graphics card shouldn't change every time your bot visits a site. If an IP is residential, it shouldn't change with every session; no one moves around that much. Switching between too many ISPs is an issue as well. Entropy is fine in some cases: a font fingerprint may change if you update system fonts or use a different browser setting.
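One way to keep entropy where it belongs is to pin the fingerprint-relevant values per account and reuse them every session. The sketch below is only an illustration; the attribute names and example values are invented, and you'd wire them into whatever browser/profile tooling you actually use.

```python
import json
import os
import random

def load_or_create_profile(account_id, path="profiles"):
    """Generate a fingerprint profile once per account and reuse it on every
    session, instead of re-randomizing per request."""
    os.makedirs(path, exist_ok=True)
    fname = os.path.join(path, f"{account_id}.json")
    if os.path.exists(fname):
        with open(fname) as f:
            return json.load(f)
    profile = {
        # stable for the account's whole life
        "screen_resolution": random.choice(["1920x1080", "1536x864", "1366x768"]),
        "gpu": random.choice(["NVIDIA GTX 1650", "Intel UHD 620", "AMD RX 580"]),
        "os": "Windows 10",
        "timezone": "America/New_York",   # must match the proxy's region
        "isp": None,                      # fill in once, then don't rotate casually
    }
    with open(fname, "w") as f:
        json.dump(profile, f)
    return profile
```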
- Clustering
Account metrics fall into statistical clusters. For example, a website may have historical data showing its user demographic is 60% from country A, 20% from country B, 9% from country C and 1% from country D. If it suddenly gets an influx of users from country D without any plausible explanation (see traffic monitoring and analytics), that's a red flag. This holds true for existing accounts as well: every account has a text-to-hyperlink ratio, the amount of textual content it has posted versus the number of hyperlinks it has posted, and a threshold on that ratio is an easy way to check for a bot.
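The text-to-hyperlink ratio is easy to track on your own side before the platform does it for you. A small sketch, with an invented threshold:

```python
import re

LINK_RE = re.compile(r"https?://\S+")

def link_ratio(posts):
    """Fraction of an account's posts that contain at least one hyperlink."""
    if not posts:
        return 0.0
    with_links = sum(1 for p in posts if LINK_RE.search(p))
    return with_links / len(posts)

def can_post_link(posts, max_ratio=0.15):
    """Only drop a link if the account's history still looks mostly like
    ordinary text posts (the threshold is a guess, tune per platform)."""
    return link_ratio(posts) < max_ratio
```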
- Inconsistency
Browser class not matching OS class, timezone inconsistencies, DNS location not matching IP location, and probably a thousand other things.
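A cheap pre-flight consistency check can catch the obvious mismatches before a session even starts. The sketch below uses ip-api.com purely as an example geolocation lookup (run it through the same proxy your browser will use); the profile fields are the hypothetical ones from the entropy sketch above.

```python
import requests

def preflight_check(profile):
    """Return a list of mismatches between the exit IP's location and the
    account profile; an empty list means the basics line up."""
    geo = requests.get("http://ip-api.com/json/", timeout=10).json()
    problems = []
    if geo.get("timezone") != profile.get("timezone"):
        problems.append(f'IP timezone {geo.get("timezone")} != profile {profile.get("timezone")}')
    if geo.get("countryCode") != profile.get("country"):
        problems.append(f'IP country {geo.get("countryCode")} != profile {profile.get("country")}')
    return problems

# usage: if preflight_check(profile) returns anything, don't start the session.
```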
- Reputation Scores and Account Warmup
Reddit's a great example. Although the system is flawed, it does act as a form of bot deterrence.
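If you want to formalize warmup, a simple ramp on the number of daily actions does most of the work. The curve and ceiling below are arbitrary guesses, not anything Reddit or any other platform documents:

```python
def daily_action_budget(account_age_days, ceiling=25):
    """Start a new account at almost nothing and ramp toward a modest
    ceiling over about a month."""
    if account_age_days < 3:
        return 1                      # lurk, vote, maybe one comment
    ramp = min(1.0, account_age_days / 30)
    return max(1, int(ceiling * ramp))
```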
- IPs: VPNs, Residential, Mobile Proxies
For one thing, VPNs aren't all that bad, especially since they're becoming more popular with the general public. It's more an aggregate confidence score thing than a "this is a deal breaker" thing. There's a human rights angle to this as well: many platforms are hesitant to ban or shadowban VPN users because those platforms may be used by reporters, journalists and whistleblowers. Of course, some platforms don't give a damn either way. It depends on the niche; if you're trying to crack betting sites, well, that's way above my pay grade tbh.
Residential proxies... it depends on the use case and rotation time. If you're using a residential IP and it refreshes every 3 minutes, that's a major red flag for the anti-bot system, because a real user can't possibly be connecting through a different residential connection every 3 minutes. Might as well use a VPN in cases like that.
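If your provider lets you control rotation, enforce a minimum lifetime per exit IP instead of accepting the default rotation. A sketch, where get_new_proxy is a hypothetical wrapper around your provider's API and the 45-minute floor is a guess:

```python
import time

class StickySession:
    """Refuse to hand out a new residential exit IP until the current one
    has lived for a believable amount of time."""

    def __init__(self, get_new_proxy, min_lifetime_s=45 * 60):
        self._get_new_proxy = get_new_proxy   # callable supplied by your provider wrapper
        self._min_lifetime_s = min_lifetime_s
        self._current = None
        self._started = 0.0

    def proxy(self):
        now = time.time()
        if self._current is None or now - self._started >= self._min_lifetime_s:
            self._current = self._get_new_proxy()
            self._started = now
        return self._current
```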
Mobile proxies... the carrier's NAT rotates and assigns IPs to different people, so these IPs can't really be 'blacklisted', but they can certainly be 'watched', especially if you're using a mobile proxy with a desktop browser (how often do real people do that? most people are on residential connections when using a desktop browser).
I've never used datacentre proxies, so I can't comment.
- Monitoring vs Banning
It used to be the case that sites would ban a suspected bot outright. Now they just monitor suspected accounts, gathering more and more behavioural data. ML has changed the game -- if you're running bots and they haven't been banned, that's no guarantee you're doing things right.
- Moving Forward
That's a lot to take in. Here are some recommendations to build better bots.
1. Observe and emulate human behaviour: most people aren't marketers; most people just kill time on social media sites, they aren't proactive. Make your bots lazier and a little more aimless. Make them search around on Google before clicking through to a site; have them avoid the URL bar and type the damn thing into Google search instead. While they're waiting for a page to load, maybe have them open YouTube in another tab for a few seconds before shifting back. (That last one was a joke, but you can see how it would be useful from a cookies POV.)
2. Don't create unlikely scenarios: rotating residential IPs with every page refresh? Highly unlikely. IP says USA and system timezone says Indonesia? Highly unlikely. Simple things; none of this is rocket science, but it all adds up.
3. Do things that most bots wouldn't do: most bots don't make mundane, ordinary posts like "um kinda lazy today wow". Most bots don't comment on a user's post in a normal way; bots are compulsively CTA-driven.
4. Update your scripts from time to time: don't keep running the same script from a year ago. Make some changes and updates so things don't become detectable as a pattern across multiple accounts.
5. Read more: have a healthy respect for anti-bot systems, and bookmark their sites. Read the new research papers on bot detection. Log all the settings used to create and run your bots, so when a batch of accounts gets banned, you'll know why. Run your own rudimentary statistics correlating bans with account-action parameters (see the logging sketch after this list).
6. Give your bots personalities: if your bots are going to make generic non-CTA posts every now and then, have them post stuff that's consistent with a persona. This makes it simpler to come up with ideas for what the bot should write about. It also means a fair amount of database building and content scraping/spinning.
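The logging sketch referenced in point 5: append every account's creation and run parameters to a flat file, then compute crude ban rates per parameter. The field names and CSV layout are just an example; log whatever your own stack actually varies.

```python
import csv
import os

LOG_FIELDS = ["account_id", "created_at", "proxy_type", "isp", "user_agent",
              "warmup_days", "posts_per_day", "banned", "ban_date"]

def log_account(row, path="account_log.csv"):
    """Append one account's creation/run parameters so bans can later be
    correlated against settings."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=LOG_FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(row)

def ban_rate_by(path, field):
    """Crude statistic: ban rate broken down by one logged parameter,
    e.g. ban_rate_by("account_log.csv", "proxy_type")."""
    buckets = {}
    with open(path) as f:
        for row in csv.DictReader(f):
            total, banned = buckets.get(row[field], (0, 0))
            buckets[row[field]] = (total + 1, banned + (row["banned"] == "True"))
    return {k: banned / total for k, (total, banned) in buckets.items()}
```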
This isn't comprehensive by any yardstick. Would love to hear more suggestions and ideas to expand this further.