Python + Selenium Leaving Footprints?

Discussion in 'Programming' started by TiagoS, Jun 10, 2017.

  1. TiagoS

    TiagoS Regular Member

    Joined:
    Jul 5, 2014
    Messages:
    404
    Likes Received:
    223
    Hey guys! I'm developing a bot that scrapes prices from a specific website and I'm running into a problem. Somehow, the website is able to tell that I'm using Selenium with the Chrome webdriver. When I scrape through my bot, it shows fake, higher prices than when I search manually. (I've searched around and it is confirmed that the website does this; other people have mentioned it.) Here is what I have tried so far:

    - Clearing cookies/cache
    - Random delays
    - Changing user agent
    - Using proxies

    I'm looking for ideas about what I might be doing wrong. What kind of footprints am I leaving? What's even worse is that sometimes it works and sometimes it does not (it works 4 out of 10 times).

    I am even willing to pay if someone manages to come up with a solution (PM me if you can).
     
  2. GoDesain

    GoDesain Regular Member

    Joined:
    Feb 26, 2011
    Messages:
    481
    Likes Received:
    208
    That's new to me. Can you share the URL with me?
    I will try to check.
    About clearing cookies and cache (start private mode):
    Code:
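    from selenium import webdriver

    profile = webdriver.FirefoxProfile()
    # Start Firefox in private browsing mode so cookies are not kept between runs
    profile.set_preference("browser.privatebrowsing.autostart", True)
    # Turn off the disk, memory and offline caches
    profile.set_preference("browser.cache.disk.enable", False)
    profile.set_preference("browser.cache.memory.enable", False)
    profile.set_preference("browser.cache.offline.enable", False)
    profile.set_preference("network.http.use-cache", False)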
    Random delay:
    Code:
    from random import randint
    from time import sleep
    sleep(randint(10,100))
    Random useragent:
    Code:
    from fake_useragent import UserAgent
    from selenium import webdriver
    profile = webdriver.FirefoxProfile()
    profile.set_preference("general.useragent.override", UserAgent().random)
    profile.update_preferences()
    Proxy:
    Code:
    from selenium import webdriver
    profile = webdriver.FirefoxProfile()
    # 1 = manual proxy configuration
    profile.set_preference('network.proxy.type', 1)
    profile.set_preference('network.proxy.http', "xxx.xxx.xxx.xxx")
    profile.set_preference('network.proxy.http_port', 3128)
    profile.update_preferences()
     
  3. LostLife

    LostLife Regular Member

    Joined:
    May 12, 2017
    Messages:
    265
    Likes Received:
    292
    Gender:
    Male
    Occupation:
    Software Engineer
    Are you sure the website is not changing prices randomly? Please check manually 20-30 times first, and make sure the cookies are actually being deleted. Can you share the URL, please?
     
  4. living2xl

    living2xl Elite Member

    Joined:
    Dec 9, 2011
    Messages:
    1,810
    Likes Received:
    438
    Occupation:
    Sippin dat juice - Shout it louder!
    Location:
    Not sleeping!
    Why not add a filter that scrapes the same URL several times and discards the scrapes whose prices are higher than some multiple of the median?
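
    A minimal sketch of that median filter, for illustration only. The function name filter_by_median, the 1.2 ratio, and the sample prices are all assumptions, not something taken from the site in question.

```python
from statistics import median


def filter_by_median(prices, max_ratio=1.2):
    """Keep only prices that are at most max_ratio times the batch median."""
    if not prices:
        return []
    cutoff = median(prices) * max_ratio
    return [p for p in prices if p <= cutoff]


# Hypothetical prices from scraping the same URL ten times
samples = [19.99, 19.99, 24.99, 19.99, 25.49, 19.99, 24.99, 19.99, 19.99, 25.49]
clean = filter_by_median(samples)
print(clean)
```

    Note that this only helps while genuine prices are the majority of the batch. If the inflated prices win more often than not (the original post says the bot gets the real price about 4 times out of 10), the median itself is inflated, and keeping the lowest price seen may be the safer rule.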
     
  5. Google Prince

    Google Prince Jr. VIP Jr. VIP

    Joined:
    Dec 24, 2015
    Messages:
    184
    Likes Received:
    138
    Location:
    Google's Search Engine
    It's most likely ChromeDriver, which is known to leave footprints.

    Change the webdriver to Firefox; that might help.
     
  6. jamie3000

    jamie3000 Elite Member Premium Member

    Joined:
    Jun 30, 2014
    Messages:
    2,018
    Likes Received:
    931
    Occupation:
    Owner of BigGuestPosting.com
    Location:
    uk
    Distil Networks (https://www.distilnetworks.com) are able to detect Selenium; I'm not sure whether they ever elaborated on how. I think it was through some sort of JS probing.
     
  7. jamie3000

    jamie3000 Elite Member Premium Member

    Joined:
    Jun 30, 2014
    Messages:
    2,018
    Likes Received:
    931
    Occupation:
    Owner of BigGuestPosting.com
    Location:
    uk
  8. akio_daichi

    akio_daichi Newbie

    Joined:
    Jul 4, 2017
    Messages:
    2
    Likes Received:
    0
    Gender:
    Male
    Can you share the site's URL? I don't think this site is using Distil Networks' services, because they block you immediately.
     
  9. pasdoy

    pasdoy Senior Member

    Joined:
    Jul 17, 2008
    Messages:
    863
    Likes Received:
    274
  10. pasdoy

    pasdoy Senior Member

    Joined:
    Jul 17, 2008
    Messages:
    863
    Likes Received:
    274
  11. Tozzy

    Tozzy Regular Member

    Joined:
    Nov 26, 2015
    Messages:
    428
    Likes Received:
    125
    Gender:
    Male
    Location:
    World
    It is not so hard to detect an automated browser, even with plain JS and no rocket science.
    Who knows how this or that website does it, but one can instantly think of 10 or 20 ways to do it.
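
    To see which of the better-known plain-JS footprints your own session exposes, something like the sketch below could be run against the driver. The probe list is an assumption put together for illustration (the navigator.webdriver flag, PhantomJS globals, a document key starting with $cdc_ as reported for ChromeDriver, an empty plugin list); it is not what any particular site is known to check. probe_footprints and flagged are hypothetical helper names, and the driver only needs an execute_script method.

```python
# JavaScript snippets that often behave differently in an automated browser.
# Illustrative only: a given site may check none of these, or many others.
PROBES = {
    "navigator.webdriver": "return navigator.webdriver === true;",
    "phantomjs_globals": "return !!(window.callPhantom || window._phantom);",
    "chromedriver_cdc_key": (
        "for (var k in document) {"
        " if (k.indexOf('$cdc_') === 0) { return true; } }"
        " return false;"
    ),
    "no_plugins": "return navigator.plugins.length === 0;",
}


def probe_footprints(driver):
    """Run every probe through driver.execute_script; return {name: bool}."""
    return {name: bool(driver.execute_script(js)) for name, js in PROBES.items()}


def flagged(results):
    """Names of the probes that came back True, sorted for stable output."""
    return sorted(name for name, hit in results.items() if hit)
```

    With Selenium this would be called as flagged(probe_footprints(driver)) after driver.get(...) has loaded the page, once in the automated browser and once compared against what a normal browser console returns for the same snippets.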
     
  12. B.Shahin

    B.Shahin Junior Member

    Joined:
    Dec 10, 2016
    Messages:
    108
    Likes Received:
    17
    Gender:
    Male
    This is the first time I have seen something like this.
    I am using Selenium for FB, and it has been working for months with no errors.
    I am also using ChromeDriver!
    So I'm just following here to see if something needs to be added!
     
  13. jamie3000

    jamie3000 Elite Member Premium Member

    Joined:
    Jun 30, 2014
    Messages:
    2,018
    Likes Received:
    931
    Occupation:
    Owner of BigGuestPosting.com
    Location:
    uk
    You can always recompile PhantomJS and other headless browsers from source if you can find which JS enumeration is used for detection.
     
  14. MrPenguin

    MrPenguin Junior Member

    Joined:
    Mar 10, 2010
    Messages:
    114
    Likes Received:
    18
    Use Firefox. Install the random user agent extension. This will randomize a lot of different settings. Should work.
     
  15. rootjazz

    rootjazz Jr. VIP Jr. VIP

    Joined:
    Dec 21, 2012
    Messages:
    922
    Likes Received:
    395
    Occupation:
    Developer
    Location:
    UK
    Lots of things could be at play.

    When you use the browser normally, you generate a pattern of activity (time spent reading, clicking, etc.), and the website will know, if it chooses to, what normal behaviour looks like within some standard deviations.
    Also, JS events will not be triggered: no mouse action could be a flag against you, and just programmatically clicking on things would not give a realistic event pattern. You should be able to check this in a browser debug app.

    I've had to automate some sites where I had to control the actual mouse in order to get around the detections, and then move the mouse in a realistic way (you can look at curve generation / Bezier curves).
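
    For the curve generation part, a rough sketch of a cubic Bezier path between two screen points is below. It only produces coordinates; actually moving the pointer is left out. bezier_path, the step count and the 100-pixel spread are assumptions chosen for illustration, not the method used on any specific site.

```python
import random


def bezier_path(start, end, steps=30, spread=100, rng=random):
    """Return steps + 1 (x, y) points on a cubic Bezier curve from start to end.

    The two control points are placed at random within `spread` pixels of the
    endpoints, so every generated path bends a little differently.
    """
    (x0, y0), (x3, y3) = start, end
    x1, y1 = x0 + rng.uniform(-spread, spread), y0 + rng.uniform(-spread, spread)
    x2, y2 = x3 + rng.uniform(-spread, spread), y3 + rng.uniform(-spread, spread)
    points = []
    for i in range(steps + 1):
        t = i / steps
        # Bernstein weights of the four control points at parameter t
        a, b, c, d = (1 - t) ** 3, 3 * (1 - t) ** 2 * t, 3 * (1 - t) * t ** 2, t ** 3
        points.append((a * x0 + b * x1 + c * x2 + d * x3,
                       a * y0 + b * y1 + c * y2 + d * y3))
    return points


path = bezier_path((10, 10), (400, 250))
```

    The points would then be fed, with small random pauses between them, to whatever is driving the real pointer. A straight line with evenly spaced points and identical delays is exactly the kind of unrealistic event pattern described above.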
     
  16. rolax

    rolax Registered Member

    Joined:
    Dec 26, 2016
    Messages:
    62
    Likes Received:
    9
    Did you just have some movement of the mouse, or did the mouse actually navigate to the pixel location and hover over the buttons you wanted to click?
     
  17. itz_styx

    itz_styx Power Member

    Joined:
    May 8, 2012
    Messages:
    686
    Likes Received:
    359
    Occupation:
    CEO / Admin / Developer
    Location:
    /dev/mem
    There is a good fingerprinting method that I won't name here (no need for whitehats to get ideas). That same method also allows for detecting what type of browser you are using, e.g. Selenium.
     
  18. BloodyNinja

    BloodyNinja Power Member

    Joined:
    Oct 28, 2013
    Messages:
    624
    Likes Received:
    617
    Location:
    Deeptown
    I will only say here that no one has a good solution to this yet.
    There are some known or less known half-measures but a robust solution doesn't exist.
    Most people in this topic have no idea about the problem.
     
  19. Lothric

    Lothric Regular Member

    Joined:
    Apr 25, 2017
    Messages:
    204
    Likes Received:
    50
    This UserAgent().random doesn't look good enough. I just tested it and it spits out garbage like Chrome 29, 41... It doesn't make sense to me. If they use real-world stats, it should return versions like 59, or at least 57. Am I missing something here?
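
    One possible workaround is to reject user agent strings whose Chrome version is too old before using them. The sketch below is only an illustration: the helper names, the default minimum of 57 (taken from the versions mentioned above) and the two sample strings are assumptions.

```python
import re

# Matches the major version in a "Chrome/59.0.3071.115" token
CHROME_VERSION = re.compile(r"Chrome/(\d+)\.")


def chrome_major(ua):
    """Return the Chrome major version found in ua, or None if there is none."""
    match = CHROME_VERSION.search(ua)
    return int(match.group(1)) if match else None


def is_recent_chrome(ua, minimum=57):
    """True when ua is a Chrome user agent at or above the minimum version."""
    major = chrome_major(ua)
    return major is not None and major >= minimum


def pick_recent(candidates, minimum=57):
    """Filter a list of user agent strings down to recent Chrome ones."""
    return [ua for ua in candidates if is_recent_chrome(ua, minimum)]


# Hypothetical sample strings, one outdated and one current
candidates = [
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/29.0.1547.62 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36",
]
recent = pick_recent(candidates)
```

    With fake_useragent, the same check could wrap UserAgent().random in a loop (with a retry limit) that draws again until is_recent_chrome accepts the result. Keep in mind that the user agent should also match the browser actually being driven.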