Discussion in 'C, C++, C#' started by Taegn, Nov 25, 2016.
Selenium PhantomJS seems to be the best.
Everybody has their favourite, but yeah, Selenium works best for me too.
Selenium is amazing, but the underlying Firefox build is garbage. Everyone and their mom knows how to detect a browser running off a Firefox ESR build, which is how Selenium even works at all.
It works for 80% of sites, but for the other 20% you'll need a Chrome-based browser automation platform.
Really? Are you not exaggerating slightly? I have read that Phantom is easy to detect without question (although that can be countered via JS), but not that FF is more detectable than Chrome. The studies into detecting Selenium that I have read detected Chrome and FF equally well.
Unless there is something new (I haven't looked into it for a while), but Googling found nothing, and if "everyone and their mom" knows, it would be easy to find.
Most automation can be detected now. Distil Networks and many others detect Selenium and Phantom through JS and header analysis. Having said that, you'll probably find that whatever you're planning on automating will ban you based on activity rather than browser/software.
You could try Fiddler .NET also.
Correct me if I'm wrong, but doesn't selenium use a real browser?
Also, you can set all the headers in PhantomJS; if you do it carefully you can fully mimic Firefox or Chrome. Just capture some legit FF or Chrome traffic and apply the same headers.
The issue I see with PhantomJS is it crashes all the time.
Correct, it does use a real browser, but I believe there are various JS hooks that can be enumerated.
So client-side JS on web pages can check for Selenium JS hooks to see whether the browser is being automated?
That's my understanding of it yes
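The "JS hooks" idea above can be sketched as a client-side check. The property names below are commonly cited PhantomJS/webdriver indicators; treat the exact list as illustrative and dated rather than a complete, current fingerprint. A mock `window` object is used so the heuristic itself can be exercised anywhere:

```javascript
// Sketch of the kind of client-side check a site might run to spot
// automation hooks. Property names are illustrative, not exhaustive.
function looksAutomated(win) {
  const hooks = [
    '_phantom',               // PhantomJS
    'callPhantom',            // PhantomJS
    '__nightmare',            // Nightmare
    '_Selenium_IDE_Recorder', // Selenium IDE
  ];
  if (hooks.some((name) => name in win)) return true;
  // navigator.webdriver is set to true by spec-compliant drivers
  if (win.navigator && win.navigator.webdriver) return true;
  return false;
}

// Usage with mock window objects (no real browser needed):
console.log(looksAutomated({ navigator: {} }));                  // false
console.log(looksAutomated({ callPhantom: function () {} }));    // true
console.log(looksAutomated({ navigator: { webdriver: true } })); // true
```

In a real page the site would pass the actual `window`, and the list of hook names would need constant updating as tools change.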
You guys are close but not quite. Firefox has different versions, one being ESR (Extended Support Release) and the other the mainstream release. They are practically the same, but ESR is always one version behind, and with each new ESR version the ZennoPoster team (say) has to update their bot to manipulate the new Firefox ESR build.
As Google and other websites tighten their spam filters, they are detecting and actively blocking Firefox ESR builds and other hardcoded footprints left by Selenium/PhantomJS/ZennoPoster/UBot/etc.
The good old fashioned cat/mouse game.
Maybe @nuaru can explain this in simpler terms?
If you have UBot Studio, there is always a group of folks active on Skype talking about intelligent methods to avoid detection, sometimes for UBot Studio, sometimes for something else. You may want to consider joining.
In general, I would worry less about what is detectable and more about what you will actually be able to do. Selenium seems easiest for new users (besides UBot Studio, obviously =D) if you're building an automation engine from scratch. Not sure why people would do that tho...
Phantom is easy to detect; there are a lot of details online about how to do it. It uses (or used) a very old version of WebKit, and the headers were sent in a unique way.
You could also crash the JS and get "phantomjs" strings back, and you could detect specific JS properties.
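The "crash the JS" trick can be sketched like this. In old PhantomJS builds, error stack traces could contain phantomjs:// URLs, leaking the engine. The heuristic below takes the stack as a plain string so it can run anywhere; how a page would obtain it is shown in a comment:

```javascript
// Sketch: detect PhantomJS by deliberately throwing an error and scanning
// the resulting stack trace for "phantomjs" URLs. The check itself is a
// simple string test, shown here against sample stack text.
function stackLeaksPhantom(stackText) {
  return /phantomjs/i.test(stackText || '');
}

// A page would trigger it with something like:
//   try { null[0](); } catch (e) { report(stackLeaksPhantom(e.stack)); }
console.log(stackLeaksPhantom(
  'at file:///app.js:1\nat phantomjs://code/x.js:10')); // true
console.log(stackLeaksPhantom(
  'at https://example.com/app.js:1:1'));                // false
```

This only catches old PhantomJS builds that leaked such URLs; it is one signal among several, not a standalone detector.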
Firefox / chrome on the other hand use the INSTALLED browser on your machine. So headers are sent exactly as the browser. selenium firefox = normal firefox.
As for JS detection of Selenium, I am not aware of any. There is no exposed Selenium JS code to detect, so I don't think it is as easy as it is made out to be above. I would guess someone is passing on second-hand information as opposed to personal experience.
But as stated, some companies are detecting Selenium with high probability, Distil being one of them. I read about this briefly and no one is really sure how. They talk of taking a fingerprint of the browser and its behaviour. Perhaps timings, perhaps JS events. When you call a mouse event in JS, did you trigger the other mouse events expected to occur before it?
If your mouse clicks a button, there should be lots of DOM enter/exit/hover events. That could be a metric. It's bypassable by automating a real mouse, though.
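That hover-before-click idea can be sketched as a simple check over a recorded event log. The event names are standard DOM event types, but the "synthetic if no hover first" rule is a guess at the kind of heuristic described above, not any vendor's actual logic:

```javascript
// Sketch of a behavioural heuristic: a click from a real mouse is normally
// preceded by mouseover/mousemove events on the element, while a synthetic
// element.click() fires in isolation. The event log is a plain array of
// event-type strings, as a site's tracking script might record them.
function clickLooksSynthetic(eventLog) {
  const i = eventLog.lastIndexOf('click');
  if (i === -1) return false; // no click to judge
  const before = eventLog.slice(0, i);
  const sawHover = before.includes('mouseover') || before.includes('mousemove');
  return !sawHover;
}

console.log(clickLooksSynthetic(
  ['mouseover', 'mousemove', 'mousedown', 'mouseup', 'click'])); // false
console.log(clickLooksSynthetic(['click']));                     // true
```

A real system would presumably weigh many such signals (timings, coordinates, scroll behaviour) rather than a single yes/no rule, which is also why driving a real mouse defeats this particular check.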
However: detection is perhaps possible, but it is certainly not so simple that everyone and their grandma knows how. That is false, until I am shown evidence to the contrary.
For browser automation, your best bet is still Selenium with Firefox/Chrome.
No, you cannot set the header order. You can set the headers themselves, but not the order in which they are sent.
For example (not real, but it illustrates the problem): Firefox would send its headers in one fixed order, whereas Phantom sends the same headers in a different order. Easy to identify.
The way around this is to proxy the calls and then modify the header order in the proxy. I cannot remember the tools you could use (it was a while ago I looked), but maybe it was haproxy or Proxifier? Might be wrong there, tho.
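The header-order fingerprint described above can be sketched like this. The "Firefox order" below is purely illustrative (the post itself stresses its examples are not real); a real check would use orders captured from live traffic per browser version:

```javascript
// Sketch of a header-order fingerprint check. The expected order is
// illustrative only; a real fingerprint would come from captured traffic.
const FIREFOX_ORDER = ['host', 'user-agent', 'accept', 'accept-language',
                       'accept-encoding', 'connection'];

function headerOrderMatches(sentHeaders, expectedOrder) {
  // Compare only headers present in both lists, preserving relative order.
  const sent = sentHeaders.map((h) => h.toLowerCase())
                          .filter((h) => expectedOrder.includes(h));
  const expected = expectedOrder.filter((h) => sent.includes(h));
  return sent.join(',') === expected.join(',');
}

console.log(headerOrderMatches(
  ['Host', 'User-Agent', 'Accept', 'Accept-Encoding'], FIREFOX_ORDER)); // true
console.log(headerOrderMatches(
  ['User-Agent', 'Accept', 'Host'], FIREFOX_ORDER));                    // false
```

This also shows why the check is fragile, as noted later in the thread: any add-on or proxy that reorders headers would trip it for real users too.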
That may be how Zenno works, but that isn't how Selenium works. You do not have to use the ESR build; it will use the version installed on the machine.
Install Firefox v45 and Selenium will use v45. Install v40 and Selenium will use that.
Since v45 you need to use Marionette, as FF broke the old Firefox webdriver. FF is pushing forward with the wire protocol (how Selenium works) rather than waiting for Selenium to keep up.
Like most here, I use PhantomJS when I need a headless browser. There are others (not headless) like the iMacros .NET component, CefSharp, and Selenium.
PhantomJS can be easily detected. Here is one test I did using Fiddler to gather information: only when using PhantomJS does Tumblr send me a function to verify my browser.
There really are no quality headless browsers if you want to automate websites; they all have their problems.
The best you are going to do is Selenium running with Xvfb, which will appear headless. You can also give it its own mouse to automate, and containerise it to run multiple instances on a single machine.
I see. But checking header order is a really fragile method, because you can install add-ons in Firefox that change the headers. If sites started filtering based on this, it would break a lot of browsers.
How many people run plugins to randomise the header order?
But as I said, the above was from memory of the type of things you can do to detect PhantomJS. There are a few more, and you can detect it with 100% accuracy (according to the reports). Some are preventable with JS coding to remove them, but things like header order would require recompiling PhantomJS, or the WebKit (and, I think, some old Qt) it is/was using.
The point being: if you want to be undetectable in your automations, Phantom is not really the way to go. If you are running tests on your own website it might make sense, but what with Phantom's bugs you would be better off with Selenium.