noellarkin
Senior Member
- Mar 14, 2021
Some observations on how anti-bot systems seem to be evolving.
I'm only going to talk about browser automation based bots on desktop computers, I have almost no experience with HTTP request bots or mobile automation.
- Behavioural detection
Bot actions are our main Achilles heel. Bots are often coded to post, comment, etc. very EFFICIENTLY, because they're automated, goddamnit, and marketers want their ROI. But the number of requests per unit of time is a major indicator of bot activity; most real humans just aren't that efficient. Slowing down your bots will do wonders for ban rates and shadowbans. We marketers are also often completely clueless about how the typical normie uses a website: we're extremely goal-oriented, so we assume everyone else will be as well. For example, most real users never even type a URL into the address bar; they google the site's name instead and click the first search result. Successful bots have these inefficiencies built into their behaviour.
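To make this concrete, here's a minimal pacing sketch in Python. The rate cap and delay distribution are made-up numbers, not something any platform publishes, so tune them against what the site's real users actually do.

```python
import random
import time

def human_pause(mean_seconds=8.0, sigma=0.6):
    """Sleep for a roughly log-normal interval: mostly short pauses,
    occasionally a long one, like real reading/typing gaps."""
    time.sleep(random.lognormvariate(0, sigma) * mean_seconds)

def run_paced(actions, max_per_hour=12):
    """Run zero-argument callables at a rate well below what an 'efficient'
    script would manage, with jittered gaps so intervals never form a clean pattern."""
    base_gap = 3600 / max_per_hour
    for act in actions:
        act()
        time.sleep(base_gap * random.uniform(0.6, 1.8))
        human_pause()
```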
- Traffic monitoring and analytics
If someone creates an account on a website, did they go directly to "site.com/new-account/free" and make the account, or did they click through a link on some other site, for example a recent viral social media post mentioning the site? Traffic source is a very easy way to catch bots.
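A rough sketch of what "arriving through search" might look like with Selenium. Google's markup changes constantly, so the search-box name and result selector here are assumptions, and the function and variable names are mine.

```python
import random
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

def enter_via_search(driver, brand_query, site_domain):
    """Reach the target site through a Google search instead of typing the
    URL, so the visit arrives with a plausible referrer and search cookies."""
    driver.get("https://www.google.com")
    time.sleep(random.uniform(2, 5))
    box = driver.find_element(By.NAME, "q")      # search box name is an assumption
    for ch in brand_query:                       # type like a person, don't paste
        box.send_keys(ch)
        time.sleep(random.uniform(0.05, 0.25))
    box.send_keys(Keys.ENTER)
    time.sleep(random.uniform(2, 4))
    # click the first result that points at the target domain
    link = driver.find_element(By.CSS_SELECTOR, f'a[href*="{site_domain}"]')
    link.click()

# hypothetical usage:
# driver = webdriver.Chrome()
# enter_via_search(driver, "example forum", "example.com")
```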
- Entropy
Some things ought to be randomized, others not so much. Screen resolution shouldn't change after every HTTP request. Your graphics card shouldn't change every time your bot visits a site. If an IP is residential, it shouldn't change with every session; no one moves around that much. Switching between too many ISPs is an issue as well. Entropy is fine in some cases: a font fingerprint may change if you update system fonts or use a different browser setting.
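One way to keep entropy where it belongs is to pin the fingerprint-relevant values per account and reuse them every session. The sketch below is only an illustration; the attribute names and example values are invented, and you'd wire them into whatever browser/profile tooling you actually use.

```python
import json
import os
import random

def load_or_create_profile(account_id, path="profiles"):
    """Generate a fingerprint profile once per account and reuse it on every
    session, instead of re-randomizing per request."""
    os.makedirs(path, exist_ok=True)
    fname = os.path.join(path, f"{account_id}.json")
    if os.path.exists(fname):
        with open(fname) as f:
            return json.load(f)
    profile = {
        # stable for the account's whole life
        "screen_resolution": random.choice(["1920x1080", "1536x864", "1366x768"]),
        "gpu": random.choice(["NVIDIA GTX 1650", "Intel UHD 620", "AMD RX 580"]),
        "os": "Windows 10",
        "timezone": "America/New_York",   # must match the proxy's region
        "isp": None,                      # fill in once, then don't rotate casually
    }
    with open(fname, "w") as f:
        json.dump(profile, f)
    return profile
```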
- Clustering
Account metrics fall into statistical clusters. For example, a website may have historical data showing its user demographic is 60% from country A, 20% from country B, 9% from country C and 1% from country D. If it suddenly gets an influx of users from country D without any plausible explanation (see traffic monitoring and analytics), that's a red flag. This holds true for existing accounts as well: every account has a text-to-hyperlink ratio, the amount of textual content it has posted versus the number of hyperlinks it has posted, and a threshold on that ratio is an easy way to check for a bot.
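The text-to-hyperlink ratio is easy to track on your own side before the platform does it for you. A small sketch, with an invented threshold:

```python
import re

LINK_RE = re.compile(r"https?://\S+")

def link_ratio(posts):
    """Fraction of an account's posts that contain at least one hyperlink."""
    if not posts:
        return 0.0
    with_links = sum(1 for p in posts if LINK_RE.search(p))
    return with_links / len(posts)

def can_post_link(posts, max_ratio=0.15):
    """Only drop a link if the account's history still looks mostly like
    ordinary text posts (the threshold is a guess, tune per platform)."""
    return link_ratio(posts) < max_ratio
```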
- Inconsistency
Browser class not matching OS class, timezone inconsistencies, DNS location not matching IP location, and probably a thousand other things.
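A cheap pre-flight consistency check can catch the obvious mismatches before a session even starts. The sketch below uses ip-api.com purely as an example geolocation lookup (run it through the same proxy your browser will use); the profile fields are the hypothetical ones from the entropy sketch above.

```python
import requests

def preflight_check(profile):
    """Return a list of mismatches between the exit IP's location and the
    account profile; an empty list means the basics line up."""
    geo = requests.get("http://ip-api.com/json/", timeout=10).json()
    problems = []
    if geo.get("timezone") != profile.get("timezone"):
        problems.append(f'IP timezone {geo.get("timezone")} != profile {profile.get("timezone")}')
    if geo.get("countryCode") != profile.get("country"):
        problems.append(f'IP country {geo.get("countryCode")} != profile {profile.get("country")}')
    return problems

# usage: if preflight_check(profile) returns anything, don't start the session.
```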
- Reputation Scores and Account Warmup
Reddit's a great example. Although the system is flawed, it does act as a form of bot deterrence.
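If you want to formalize warmup, a simple ramp on the number of daily actions does most of the work. The curve and ceiling below are arbitrary guesses, not anything Reddit or any other platform documents:

```python
def daily_action_budget(account_age_days, ceiling=25):
    """Start a new account at almost nothing and ramp toward a modest
    ceiling over about a month."""
    if account_age_days < 3:
        return 1                      # lurk, vote, maybe one comment
    ramp = min(1.0, account_age_days / 30)
    return max(1, int(ceiling * ramp))
```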
- IPs: VPNs, Residential, Mobile Proxies
For one thing, VPNs aren't all that bad, especially since they're becoming more popular with the general public. It's more an aggregate confidence score thing than a "this is a deal breaker" thing. There's a human rights angle to this as well: many platforms are hesitant to ban or shadowban VPN users because those platforms may be used by reporters, journalists and whistleblowers. Of course, some platforms don't give a damn either way. It depends on the niche; if you're trying to crack betting sites, well, that's way above my pay grade tbh.
Residential proxies... it depends on the use case and rotation time. If you're using a residential IP and it refreshes every 3 minutes, that's a major red flag for the anti-bot system, because a real user can't possibly be connecting through a different residential connection every 3 minutes. Might as well use a VPN in cases like that.
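If your provider lets you control rotation, enforce a minimum lifetime per exit IP instead of accepting the default rotation. A sketch, where get_new_proxy is a hypothetical wrapper around your provider's API and the 45-minute floor is a guess:

```python
import time

class StickySession:
    """Refuse to hand out a new residential exit IP until the current one
    has lived for a believable amount of time."""

    def __init__(self, get_new_proxy, min_lifetime_s=45 * 60):
        self._get_new_proxy = get_new_proxy   # callable supplied by your provider wrapper
        self._min_lifetime_s = min_lifetime_s
        self._current = None
        self._started = 0.0

    def proxy(self):
        now = time.time()
        if self._current is None or now - self._started >= self._min_lifetime_s:
            self._current = self._get_new_proxy()
            self._started = now
        return self._current
```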
Mobile proxies... the carrier's NAT rotates and assigns IPs to different people, so these IPs can't really be 'blacklisted', but they can certainly be 'watched', especially if you're using a mobile proxy with a desktop browser (how often do real people do that? most people are on residential connections when using a desktop browser).
I've never used datacentre proxies, so I can't comment.
- Monitoring vs Banning
It used to be the case that sites would ban a suspected bot outright. Now they just monitor suspected accounts, gathering more and more behavioural data. ML has changed the game -- if you're running bots and they haven't been banned, that's no guarantee you're doing things right.
- Moving Forward
That's a lot to take in. Here are some recommendations to build better bots.
1. Observe and emulate human behaviour: most people aren't marketers; most people just kill time on social media sites, they aren't proactive. Make your bots lazier and a little more aimless. Make them search around on Google before clicking through to a site; have them avoid the URL bar and type the damn thing into Google search instead. While they're waiting for a page to load, maybe have them open YouTube in another tab for a few seconds before shifting back. (That last one was a joke, but you can see how it would be useful from a cookies POV.)
2. Don't create unlikely scenarios: rotating residential IPs with every page refresh? Highly unlikely. IP says USA and system timezone says Indonesia? Highly unlikely. Simple things; none of this is rocket science, but it all adds up.
3. Do things that most bots wouldn't do: most bots don't make mundane, ordinary posts like "um kinda lazy today wow". Most bots don't comment on a user's post in a normal way; bots are compulsively CTA-driven.
4. Update your scripts from time to time: don't keep running the same script from a year ago. Make some changes and updates so things don't become detectable as a pattern across multiple accounts.
5. Read more: have a healthy respect for anti-bot systems, and bookmark their sites. Read the new research papers on bot detection. Log all the settings used to create and run your bots, so when a batch of accounts gets banned, you'll know why. Run your own rudimentary statistics correlating bans with account-action parameters (see the logging sketch after this list).
6. Give your bots personalities: if your bots are going to make generic non-CTA posts every now and then, have them post stuff that's consistent with a persona. This makes it simpler to come up with ideas for what the bot should write about. It also means a fair amount of database building and content scraping/spinning.
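The logging sketch referenced in point 5: append every account's creation and run parameters to a flat file, then compute crude ban rates per parameter. The field names and CSV layout are just an example; log whatever your own stack actually varies.

```python
import csv
import os

LOG_FIELDS = ["account_id", "created_at", "proxy_type", "isp", "user_agent",
              "warmup_days", "posts_per_day", "banned", "ban_date"]

def log_account(row, path="account_log.csv"):
    """Append one account's creation/run parameters so bans can later be
    correlated against settings."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=LOG_FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(row)

def ban_rate_by(path, field):
    """Crude statistic: ban rate broken down by one logged parameter,
    e.g. ban_rate_by("account_log.csv", "proxy_type")."""
    buckets = {}
    with open(path) as f:
        for row in csv.DictReader(f):
            total, banned = buckets.get(row[field], (0, 0))
            buckets[row[field]] = (total + 1, banned + (row["banned"] == "True"))
    return {k: banned / total for k, (total, banned) in buckets.items()}
```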
This isn't comprehensive by any yardstick. Would love to hear more suggestions and ideas to expand this further.