Selenium and Playwright both detected by Tripadvisor's bot detection tools

picsou_k6 (Newbie, joined Apr 8, 2024)
Hello there,

I have been working for a couple of months on a project to collect data from Tripadvisor.

I can easily get the comments for any facility using its location ID, but I still have a lot of trouble logging in automatically with either Selenium or Playwright in Python.
When I try to access the website, I get this:

[Screenshot: 2024-04-08_23-56.png]

I am using IP rotation from several providers (IPRoyal, Bright Data, Smartproxy). I guess the problem comes from my browser fingerprint and/or the IPs I am using, and I haven't found a workaround.

My main objective is to get the cookies so I can make API calls.

Any idea? :D
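Since the end goal is to reuse the browser's cookies for API calls, here is a minimal sketch of that hand-off. It assumes the cookies arrive as a list of dicts, which is the shape returned by Selenium's driver.get_cookies() and Playwright's context.cookies(). The cookie_header helper and the sample cookie values are made up for illustration.

```python
def cookie_header(cookies, domain_suffix="tripadvisor.com"):
    """Build a Cookie header value from browser-exported cookie dicts.

    Only cookies whose domain is domain_suffix or a subdomain of it are kept.
    """
    pairs = []
    for c in cookies:
        # Cookie domains often carry a leading dot (".tripadvisor.com").
        domain = c.get("domain", "").lstrip(".")
        if domain == domain_suffix or domain.endswith("." + domain_suffix):
            pairs.append(f"{c['name']}={c['value']}")
    return "; ".join(pairs)


# Hypothetical cookies, shaped like the output of driver.get_cookies().
sample = [
    {"name": "session", "value": "abc123", "domain": ".tripadvisor.com"},
    {"name": "tracker", "value": "zzz", "domain": ".example.org"},
    {"name": "locale", "value": "en", "domain": "www.tripadvisor.com"},
]

headers = {"Cookie": cookie_header(sample)}
print(headers)  # {'Cookie': 'session=abc123; locale=en'}
```

The resulting dict can be passed as request headers to whatever HTTP client makes the API calls. Which cookies the API actually requires is not established anywhere in this thread.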
 
Playwright and Selenium don't do much for your fingerprint: for the most part it stays the same across all sessions. I mostly use plugins such as playwright-extra with its stealth add-on, and playwright-afp (I am the plugin author), which was created specifically to change the browser fingerprint. Another option is GoLogin or a similar service, which lets you create browsers with unique fingerprints.
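To illustrate what "the fingerprint stays the same" means in practice, the sketch below varies a few per-session attributes through Playwright's new_context options (user_agent, viewport, locale, timezone_id). This is not the API of playwright-extra, its stealth add-on, or playwright-afp. The value pools and the random_context_options helper are hypothetical, and the sketch only covers surface attributes, not the deeper signals those plugins are meant to change.

```python
import random

# Hypothetical pools. Real values should be mutually consistent
# (for example, a locale that matches the proxy's country).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36",
]
VIEWPORTS = [
    {"width": 1366, "height": 768},
    {"width": 1536, "height": 864},
    {"width": 1920, "height": 1080},
]
LOCALES = [
    ("en-US", "America/New_York"),
    ("en-GB", "Europe/London"),
    ("fr-FR", "Europe/Paris"),
]


def random_context_options(rng=random):
    """Return keyword arguments for browser.new_context(), picked at random."""
    locale, timezone_id = rng.choice(LOCALES)  # keep locale and timezone paired
    return {
        "user_agent": rng.choice(USER_AGENTS),
        "viewport": rng.choice(VIEWPORTS),
        "locale": locale,
        "timezone_id": timezone_id,
    }


# Usage with Playwright (not executed here):
# from playwright.sync_api import sync_playwright
# with sync_playwright() as p:
#     browser = p.chromium.launch()
#     context = browser.new_context(**random_context_options())
```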
 
Did you try undetected-chromedriver or nodriver from ultrafunkamsterdam?
Nope, I didn't. I will try it right now :)
Yes, I was already looking at the GoLogin solution, but I'm still trying to figure out how to make the API work. I will also check the plugins you recommend, thanks!
In the meantime, this is the code I am running:

Python:
import time
from sys import platform

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from gologin import GoLogin
from gologin import getRandomPort

# random_port = getRandomPort()  # uncomment to use a random port

gl = GoLogin(
    {
        "profile_id": "jolly-darkness",
        "token": "my.very.very.secret.jwt.token",
        # "port": random_port
    }
)

# Pick the chromedriver binary that matches the current OS.
if platform.startswith("linux"):
    chrome_driver_path = "./chromedriver"
elif platform == "darwin":
    chrome_driver_path = "./mac/chromedriver"
elif platform == "win32":
    chrome_driver_path = "chromedriver.exe"
else:
    raise RuntimeError(f"Unsupported platform: {platform}")

# Start the GoLogin profile and attach Selenium to the browser it opens.
debugger_address = gl.start()
chrome_options = Options()
chrome_options.add_experimental_option("debuggerAddress", debugger_address)
# Selenium 4 takes the driver path through a Service object, not executable_path.
driver = webdriver.Chrome(
    service=Service(executable_path=chrome_driver_path), options=chrome_options
)
driver.get("http://www.python.org")
assert "Python" in driver.title
driver.close()
time.sleep(3)
gl.stop()

But I get this error:

Bash:
downloadProfileZip
b'<html>\n<head><title>307 Temporary Redirect</title></head>\n<body>\n<center><h1>307 Temporary Redirect</h1></center>\n<hr><center>nginx/1.25.4</center>\n</body>\n</html>\n'
extracting profile
exception File is not a zip file
uploadEmptyProfile
createEmptyProfile
no proxy
empty profile name
profile= {'statusCode': 400, 'message': [{'target': {'id': 'jolly-darkness'}, 'value': 'jolly-darkness', 'property': 'id', 'children': [], 'constraints': {'isLength': 'id must be longer than or equal to 24 characters'}}], 'error': 'Bad Request', 'profile_id': 'jolly-darkness'}
 
You are using the wrong profile ID. The ID is not the profile name: as the error says, it must be at least 24 characters long. Click the three dots on the profile you want to start, then choose the option to copy the profile ID.
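A small guard like the one below would catch this mistake before any request is sent. It is only a sketch: the 24-character minimum is taken from the error message above, nothing else about the ID format is assumed, and check_profile_id is a hypothetical helper, not part of the gologin package.

```python
MIN_PROFILE_ID_LENGTH = 24  # from the API error: "id must be longer than or equal to 24 characters"


def check_profile_id(profile_id: str) -> str:
    """Return profile_id unchanged, or raise if it is too short to be an ID."""
    if len(profile_id) < MIN_PROFILE_ID_LENGTH:
        raise ValueError(
            f"{profile_id!r} is only {len(profile_id)} characters long; "
            "this looks like a profile name, not a profile ID"
        )
    return profile_id


try:
    check_profile_id("jolly-darkness")
except ValueError as exc:
    print(exc)
```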
 
Is there any version of these plugins for C#?
 
Your IPs aren't the reason. If you are using the default Playwright or Selenium configuration on a standard chromedriver install, it will expose a cdp_ref_id key and some other DevTools Protocol artifacts.

A site can test for these artifacts with JavaScript to check whether CDP is being used to drive the browser. That's most likely why you are getting detected.

Use a modded chromedriver such as https://github.com/ultrafunkamsterdam/undetected-chromedriver together with the stealth plugins mentioned above by @paleksic.

Accessing the site through undetected-chromedriver will most likely change your fingerprint, but run it from a different IP and on a different machine if you can, just to be safe.
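To see what a page script could observe, you can run a small probe in your own automated browser. The sketch below checks navigator.webdriver and looks for property names containing "cdc_", on the assumption that these are the kind of chromedriver artifacts meant above. The cdp_ref_id key itself is not checked, because the thread does not say where it lives. PROBE_JS and automation_signals are hypothetical names, and the probe is written for Selenium's execute_script, which wraps the script in a function body so that "return" works.

```python
# JavaScript probe, intended for driver.execute_script(PROBE_JS) in Selenium.
PROBE_JS = """
return {
    webdriver: navigator.webdriver === true,
    cdcKeys: Object.getOwnPropertyNames(window)
        .concat(Object.getOwnPropertyNames(document))
        .filter(function (k) { return k.indexOf('cdc_') !== -1; })
};
"""


def automation_signals(probe_result):
    """Turn the dict returned by the probe into a list of readable findings."""
    signals = []
    if probe_result.get("webdriver"):
        signals.append("navigator.webdriver is true")
    for key in probe_result.get("cdcKeys", []):
        signals.append(f"chromedriver artifact: {key}")
    return signals


# Usage with Selenium (not executed here):
# print(automation_signals(driver.execute_script(PROBE_JS)))
```

An empty list only means these two checks found nothing. It says nothing about other detection methods a site may use.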
 
But Playwright doesn't use chromedriver at all, only WebSockets.

I had something similar: when I ran a raw Chromium browser to access a site, everything was okay, but the minute I launched the same browser through Playwright and tried to access the same site, I got CAPTCHA'd immediately.

Something in the Playwright initialization process must make it obvious that the browser is a bot. I doubt any enterprise bot detector relies on navigator.webdriver, as it's a very rudimentary check that can easily be bypassed.

I don't know what it is, though. Maybe through some command-line argument that gets passed, or through the way Playwright sends messages to the browser, they are able to pick up on something odd and deny access based on that.
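One way to experiment with the command-line theory is to change what Playwright passes to Chromium at launch. The sketch below builds launch options using Playwright's ignore_default_args and args parameters. The launch_options helper is hypothetical, --enable-automation is assumed to be among Playwright's default Chromium switches, and whether changing these flags makes any difference on a given site is not something this thread has verified.

```python
def launch_options(headless=False):
    """Return keyword arguments for p.chromium.launch() with fewer automation flags."""
    return {
        "headless": headless,
        # Ask Playwright not to pass this default switch to Chromium.
        "ignore_default_args": ["--enable-automation"],
        # Commonly cited switch that stops Blink from flagging automation control.
        "args": ["--disable-blink-features=AutomationControlled"],
    }


# Usage with Playwright (not executed here):
# from playwright.sync_api import sync_playwright
# with sync_playwright() as p:
#     browser = p.chromium.launch(**launch_options())
#     page = browser.new_page()
#     print(page.evaluate("navigator.webdriver"))
```

Comparing the result of the same page with and without these options is a cheap first test before reaching for a patched browser.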

Some poor guy had a problem similar to mine and OP's and got shot down by the repo owners when he tried to ask them about it: https://github.com/microsoft/playwright/issues/12538

Anyway, the above got me thinking and now I'm curious: how could we check via JS or the DOM whether the browser is running in debug mode or whether CDP is enabled? From the little I know, I thought it wasn't possible to deduce the flags a browser was started with, or its debug status, because JS won't have access to that scope.
 
If anyone offers Tripadvisor scraping as a service that works in August 2024, please send me a DM.
 