
Python + Selenium + PhantomJS -- Or is there a better way for Python Browser Automation?

Discussion in 'General Programming Chat' started by eveneven, Jun 23, 2015.

  1. eveneven

    eveneven Regular Member

    Joined:
    Oct 6, 2013
    Messages:
    264
    Likes Received:
    109
    Hi,

    I'm currently learning Python, and I've been looking into browser automation with Selenium (plus scraping with Scrapy and BeautifulSoup). At some point I would like to build some bots, e.g. simple Twitter, Pinterest, Tumblr or scraping bots, to automate some mundane tasks (instead of hiring VAs).

    Is this the best way? I don't want to start off in the wrong direction.


    Thanks!
     
  2. Pohmx

    Pohmx Newbie

    Joined:
    Feb 10, 2015
    Messages:
    22
    Likes Received:
    6
    I've worked with it. It's probably the best and fastest way to build bots.
    You should look into JS too; there are more automation frameworks to be found there.
     
  3. MrBlue

    MrBlue Senior Member

    Joined:
    Dec 18, 2009
    Messages:
    974
    Likes Received:
    680
    Occupation:
    Web/Bot Developer
    Python with Selenium and PhantomJS/CasperJS are both great for browser automation. You can even run Selenium headless. However, in a lot of cases I find that NodeJS with CasperJS runs faster and is easier to scale up.
     
  4. dadiaar

    dadiaar Newbie

    Joined:
    Mar 7, 2013
    Messages:
    31
    Likes Received:
    9
    Occupation:
    Ecommerce
    Location:
    China
    Those tools are good, but you will need more than that if you keep going down this path.

    The first thing is that Selenium tends to wait until all the content of a page has loaded, which is a waste of RAM and time.

    The second point: is it really necessary for what you want? Maybe the only thing you need is the cookie, and you can handle everything else with urllib and AJAX calls. In Chrome, once Pinterest has loaded, press Ctrl+Shift+I, open the Network tab and scroll down the page; you will see the AJAX calls there (they add content without reloading the page).
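A minimal sketch of the "just reuse the cookie" approach with the standard library. The helper names, the placeholder endpoint and the `X-Requested-With` header are assumptions for illustration; copy the real request URL, headers and cookie from the Network tab of the site you are targeting.

```python
import json
import urllib.request

def build_ajax_request(url, cookie, user_agent="Mozilla/5.0"):
    """Build a GET request that replays a logged-in browser session's cookie."""
    headers = {
        "Cookie": cookie,  # copied from the browser session
        "User-Agent": user_agent,
        "Accept": "application/json",
        # Many sites send this on XHR calls; whether a given site checks it is an assumption
        "X-Requested-With": "XMLHttpRequest",
    }
    return urllib.request.Request(url, headers=headers, method="GET")

def fetch_json(request, timeout=15):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Hypothetical usage (endpoint and cookie value are placeholders):
# req = build_ajax_request("https://www.example.com/api/feed?page=2", "sessionid=...")
# data = fetch_json(req)
```

Note that `urllib` stores header names capitalized (`X-requested-with`), which is harmless because HTTP header names are case-insensitive.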

    One last push in the right direction: if you want to run it on a headless Linux server, forget PhantomJS, as many sites can detect it. Use a genuine Firefox binary instead. This is a good way to start:

    Code:
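    from pyvirtualdisplay import Display
    from selenium import webdriver
    from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
    
    # Placeholders: set these to your own Firefox profile, binary and target page
    profile_path = "/path/to/firefox/profile"
    binary_path = "/usr/bin/firefox"
    url = "https://www.example.com/"
    
    # Start a virtual X display so Firefox can run on a server without a screen
    display = Display(visible=0, size=(1366, 768))
    display.start()
    
    profile = webdriver.FirefoxProfile(profile_path)
    firefox_binary = FirefoxBinary(binary_path)
    # Selenium 2/3-era keyword arguments; Selenium 4 moved these into Options
    browser = webdriver.Firefox(firefox_profile=profile, firefox_binary=firefox_binary)
    
    try:
        browser.get(url)
        # ...
    finally:
        browser.quit()
        display.stop()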
    Don't expect good results in the first week. Good luck ;)

    P.S. I have just tried Pinterest, and it seems they load the data through several chained AJAX requests, so it's better to just use Selenium there. Here is the screenshot.
     

  5. weedsmoker

    weedsmoker Junior Member

    Joined:
    May 2, 2011
    Messages:
    190
    Likes Received:
    79
    I didn't have much luck with Selenium, and it's pretty slow too. Something like Firefox + MozRepl (or Marionette) suited me better: it's easy to customize, it communicates through sockets so you can write the client in any language you want, and you can adapt MozRepl to your needs. Selenium was also a bitch to use with a VPN (HMA): it kept claiming ports left and right while wreaking havoc and throwing exceptions for every single shit, so my code was endless try/catch blocks. Fuck that.
    PhantomJS is also nice, but it has issues evaluating some JS code (click events, for example): code you test in the Firefox or Chrome console won't work in PhantomJS. It's a Qt-related problem, because I had a similar situation with the QtWebKit bindings.
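Because MozRepl is a line-based JavaScript REPL on a plain TCP socket, a client can be tiny. A sketch with only the Python standard library, assuming MozRepl's usual defaults (listening on 127.0.0.1:4242, printing a `repl> ` prompt after its greeting and after each result); check the port and prompt of your own setup. The function names are illustrative.

```python
import socket

# Assumed MozRepl defaults; adjust if your add-on is configured differently
MOZREPL_HOST = "127.0.0.1"
MOZREPL_PORT = 4242
PROMPT = b"repl> "

def read_until_prompt(sock, prompt=PROMPT, bufsize=4096):
    """Read until the REPL prompt appears and return the text printed before it."""
    data = b""
    while not data.endswith(prompt):
        chunk = sock.recv(bufsize)
        if not chunk:  # the other side closed the connection
            break
        data += chunk
    if data.endswith(prompt):
        data = data[: -len(prompt)]
    return data.decode("utf-8", errors="replace").strip()

def connect(host=MOZREPL_HOST, port=MOZREPL_PORT, timeout=10.0):
    """Open a connection and discard the greeting banner."""
    sock = socket.create_connection((host, port), timeout=timeout)
    read_until_prompt(sock)
    return sock

def eval_js(sock, code):
    """Send one line of JavaScript and return whatever the REPL prints back."""
    sock.sendall(code.encode("utf-8") + b"\n")
    return read_until_prompt(sock)

# Hypothetical usage with Firefox + MozRepl running locally:
# sock = connect()
# print(eval_js(sock, "1 + 1"))
# sock.close()
```

The same few lines could be written in any language with sockets, which is the flexibility described above.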
     
  6. duriangray

    duriangray Newbie

    Joined:
    Jul 22, 2014
    Messages:
    5
    Likes Received:
    0
    Location:
    California
    I've also been playing around with Selenium/PhantomJS/CasperJS. I had to read a lot of the documentation. PhantomJS pissed me off because it can get slightly buggy when spoofing headers.

    I had trouble getting Selenium to work with some proxies that required authentication.
    I only started about a week ago, but at this point I think the best approach is to inject JS and fire all the events myself: mouse movements, clicks, hover, focus, scroll, key presses, etc.
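A sketch of the "inject JS and fire the events yourself" idea: a Python helper that builds the JavaScript for Selenium's `driver.execute_script(script, element)`, which exposes extra arguments to the script as `arguments[0]`, `arguments[1]`, and so on. The helper name and the set of supported events are illustrative assumptions. Synthetic events carry `isTrusted === false`, so a site can still tell them apart from real input, and a dispatched `keydown` does not type text into a field by itself.

```python
import json

# Events built with the MouseEvent constructor ("hover" is mapped to mouseover)
MOUSE_EVENTS = {"mousemove", "mouseover", "mousedown", "mouseup", "click"}
# Events built with the KeyboardEvent constructor
KEY_EVENTS = {"keydown", "keypress", "keyup"}

def build_event_script(event_type, key=None):
    """Return JS that fires `event_type` on arguments[0] (the target element)."""
    if event_type == "hover":
        event_type = "mouseover"
    if event_type in MOUSE_EVENTS:
        # Aim the event at the centre of the element's bounding box
        return (
            "var el = arguments[0];"
            "var r = el.getBoundingClientRect();"
            "el.dispatchEvent(new MouseEvent(" + json.dumps(event_type) + ", {"
            "bubbles: true, cancelable: true, view: window, "
            "clientX: r.left + r.width / 2, clientY: r.top + r.height / 2}));"
        )
    if event_type in KEY_EVENTS:
        if key is None:
            raise ValueError("key is required for keyboard events")
        return (
            "arguments[0].dispatchEvent(new KeyboardEvent(" + json.dumps(event_type)
            + ", {key: " + json.dumps(key) + ", bubbles: true, cancelable: true}));"
        )
    if event_type == "focus":
        return "arguments[0].focus();"
    if event_type == "scroll":
        return "arguments[0].scrollIntoView();"
    raise ValueError("unsupported event type: " + event_type)

# Hypothetical usage with a Selenium driver and an element found earlier:
# driver.execute_script(build_event_script("hover"), element)
# driver.execute_script(build_event_script("click"), element)
```

Using `json.dumps` for the event name and key keeps the generated JS correctly quoted even for characters like `"` or `\`.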
     
  7. rajanant

    rajanant Registered Member

    Joined:
    Sep 16, 2015
    Messages:
    62
    Likes Received:
    10
    This set of tools is quite good, but when you need to switch between lots of proxies (especially ones with authentication), there is no easy solution. For that task a Qt-based browser is much more useful.
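For comparison, at the plain-HTTP level (no browser) rotating authenticated proxies is fairly simple with the standard library: `urllib.request.ProxyHandler` accepts proxy URLs with `user:password@` credentials and adds the Basic `Proxy-Authorization` header itself. A sketch; the proxy addresses and credentials below are made-up placeholders, and this does not help with a browser driven by Selenium or Qt.

```python
import itertools
import urllib.request

# Hypothetical proxy list in user:password@host:port form
PROXIES = [
    "http://user1:secret1@203.0.113.10:8080",
    "http://user2:secret2@203.0.113.11:8080",
]

def opener_for(proxy_url):
    """Build an opener that routes HTTP and HTTPS traffic through one proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)

def rotating_openers(proxy_urls):
    """Return an endless iterator that hands out the openers round-robin."""
    return itertools.cycle([opener_for(p) for p in proxy_urls])

# Hypothetical usage:
# openers = rotating_openers(PROXIES)
# for url in ["https://www.example.com/a", "https://www.example.com/b"]:
#     with next(openers).open(url, timeout=15) as resp:
#         print(resp.status, url)
```

Building each opener once and cycling over the list avoids re-creating handlers on every request.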
     
  8. MrBlue

    MrBlue Senior Member

    Joined:
    Dec 18, 2009
    Messages:
    974
    Likes Received:
    680
    Occupation:
    Web/Bot Developer
    What language/framework are you using with Qt?
     
  9. rajanant

    rajanant Registered Member

    Joined:
    Sep 16, 2015
    Messages:
    62
    Likes Received:
    10
    I prefer Python. Spynner, for example, is good enough (although it hasn't had any recent updates).
    You can also try C++, but I have no experience with it.
     
  10. kahuna74

    kahuna74 Regular Member

    Joined:
    Aug 19, 2014
    Messages:
    270
    Likes Received:
    102
    Gender:
    Male
    Occupation:
    Software Developer
    Location:
    Grand Rapids, MI
    I've used CasperJS and it was pretty nice, especially if you're comfortable with JavaScript.
     
  11. revproxy

    revproxy BANNED Jr. VIP Premium Member

    Joined:
    Nov 20, 2015
    Messages:
    396
    Likes Received:
    100
    Gender:
    Male
    I wrote a scraper at work.
    I checked all the options: PhantomJS, JavaFX, JCEF...
    I chose Qt/QtWebKit: best performance and fewest limits.
     
  12. NullReferenceX

    NullReferenceX Newbie

    Joined:
    Dec 1, 2015
    Messages:
    41
    Likes Received:
    83
    Occupation:
    Programmer
    Location:
    Germany
    I've written a plugin for Ubot using Selenium. It works, but I wouldn't recommend it to anyone: you will run into a lot of small issues that you have to solve along the way. Just learn the HTTP protocol and you'll be better off. It might take you longer to build a bot, but it will run much faster and use fewer system resources.
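To illustrate what "learn the HTTP protocol" means in practice: at the wire level, one bot step is just a few lines of text sent over a socket. A minimal standard-library sketch that builds and sends a raw HTTP/1.1 GET request over plain TCP (no TLS); the function names and the host in the usage comment are placeholders. For real work, `http.client`, `urllib` or a library such as Requests takes care of TLS, redirects and chunked bodies.

```python
import socket

def build_get_request(host, path="/", headers=None):
    """Return the raw bytes of an HTTP/1.1 GET request."""
    lines = ["GET %s HTTP/1.1" % path, "Host: %s" % host, "Connection: close"]
    for name, value in (headers or {}).items():
        lines.append("%s: %s" % (name, value))
    # Each header line ends with CRLF; an empty line terminates the header block
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

def parse_status_line(raw_response):
    """Extract (version, status code, reason) from the first line of a raw response."""
    status_line = raw_response.split(b"\r\n", 1)[0].decode("ascii")
    parts = status_line.split(" ", 2)
    reason = parts[2] if len(parts) > 2 else ""  # the reason phrase may be empty
    return parts[0], int(parts[1]), reason

def http_get(host, path="/", headers=None, port=80, timeout=10.0):
    """Send the request and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_get_request(host, path, headers))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)

# Hypothetical usage (plain HTTP on port 80):
# raw = http_get("example.com", "/", {"User-Agent": "my-bot/0.1"})
# print(parse_status_line(raw))
```

Sending `Connection: close` lets the reader simply read until EOF instead of parsing `Content-Length` or chunked encoding, which keeps the sketch short.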