
The new age of bot writing: Headless Web Browsers

Discussion in 'Black Hat SEO' started by bartosimpsonio, Jun 14, 2015.

  1. bartosimpsonio

    bartosimpsonio Jr. VIP Jr. VIP Premium Member

    Joined:
    Mar 21, 2013
    Messages:
    12,463
    Likes Received:
    11,167
    Occupation:
    CHEAP
    Location:
    DATASETS
    Home Page:
    Are you still running bots written in Python, C#, C++ or Perl? Worse yet, are you still running bots written in.....(gasp) VB??

    Time has come to update your arsenal!

    This short piece is just a few tips I'd like to share here to get you started in headless browser programming. But first things first.

    What the heck is a headless browser?

    Well, it's just what the name implies. It's a fully featured web browser with no graphical interface. It's the guts of a web browser but without all the fluff and bloat that's required for end users. You're black hats after all, you don't need no pretty interface!

    How do I use a headless browser?

    You tell it to do stuff and it does stuff pretending to be a real human. Right now, how do bots work? They are independent HTTP stacks built by bot writers so the bot will pretend to be a human. But this is really lacking these days. Google itself has a graphical bot crawling the web, and they can "see" your website. They even rank your stuff based on where you put the content, ads and so on (which IMO has created great distortions, but anyway).

    Real black hats automate everything! So if you find yourself doing stuff by hand, you program a browser to do it for you. Of course you could use some macro or scripting package for Firefox, but that's a big waste because the browser takes over 1 GB of memory just to show up on your screen. If you're scripting 1000's of bots, you'll need one of NASA's computers to do it. With headless browsers you have tiny bots that act just like a real browser.

    OK, so how do I get started?

    (Before you do anything, I'll say: if you're not using Linux for your Black Hat work yet, then you are entirely out of the loop. I do a lot of browsing and run some software on Windows, but all my programming and most black hat botting is done on Linux.)

    You need a headless browser you can program. These are some popular ones:



    After you're familiar with the browsers themselves, you can try a higher level library such as :




    PhantomJS
    Right now PhantomJS is the hottest one, but it's quickly getting "too popular" and smart website owners have started to detect and fight it. So you may want to try out SlimerJS. One of the really cool features of SlimerJS is that it's capable of taking snapshots. So if you wondered how folks make those thumbnail images of sites, or full page captures of sites, in bulk, then SlimerJS is your answer.

    Here is the PhantomJS quick start guide. With 2 lines of JavaScript you have a running web browser on your command line!

    Create the script and run it with the phantomjs command after you install the package. For example, if you save your script to testing.js, you run it with
    Code:
    phantomjs testing.js
    With PhantomJS you can load jQuery and do everything to a loaded page you'd do IF YOU WERE IN the page. You can load w w w .yo urco mpetitorhere .c om and then run YOUR JQUERY scripts in their page!

    Here are examples of automating interaction with loaded pages with PhantomJS.


    SlimerJS

    Getting started with SlimerJS is just as simple. Here is the quick start guide to get you rolling.

    Again with SlimerJS the loaded web page is now 100% under your control for botting and scripting. From the documentation : "Once a web page is opened, you may need to execute a javascript function into the context of the web page, in order to retrieve data or to manipulate the page content."


    What do I do with this?

    If you haven't realized the potential here yet, then you're probably in the wrong business!!

    First of all, JavaScript is becoming the de facto language for the WWW. Now all your bots can be written using JavaScript and using REAL browsers. With these headless browsers you can make link harvesters, automated thumbnail generators, automatic posting and publishing on social media. It is unlimited WWW scripting and automation potential: anything a human can do on a web browser, your script can do too.

    Well, this has been barto's Sunday morning Black Coffee Black Hat momento. I hope you find this useful! Cheers!
     
  2. Asif WILSON Khan

    Asif WILSON Khan Executive VIP Jr. VIP

    Joined:
    Nov 10, 2012
    Messages:
    12,124
    Likes Received:
    33,652
    Gender:
    Male
    Occupation:
    Fun Lovin' Criminal
    Location:
    London
    Home Page:
    Dude, you must have read my mind, this is a subject that I have just been doing some research on. Thanks for the thread.
     
  3. Hawkster

    Hawkster Jr. VIP Jr. VIP

    Joined:
    Jun 22, 2013
    Messages:
    3,504
    Likes Received:
    3,719
    Gender:
    Male
    Occupation:
    Listen to everyone - Follow no-one
    Location:
    UK
    Home Page:
    Excellent read thanks buddy
     
  4. CrackFantasy

    CrackFantasy Junior Member

    Joined:
    Jun 22, 2014
    Messages:
    166
    Likes Received:
    77
    So this can't be implemented on Windows?
     
  5. HoNeYBiRD

    HoNeYBiRD Jr. VIP Jr. VIP

    Joined:
    May 1, 2009
    Messages:
    7,282
    Likes Received:
    8,251
    Gender:
    Male
    Occupation:
    Geographer, Tourism Manager
    Location:
    Ghosted
  6. 9to5destroyer

    9to5destroyer Jr. VIP Jr. VIP

    Joined:
    Nov 14, 2011
    Messages:
    359
    Likes Received:
    206
    These guides are pretty good for getting started with Selenium. They include guides for Selenium IDE for Firefox, which is a good start for people who haven't done much automation, and a WebDriver guide, which you can use with GhostDriver (PhantomJS).
    http://www.guru99.com/selenium-tutorial.html
     
  7. bartosimpsonio

    bartosimpsonio Jr. VIP Jr. VIP Premium Member

    Yup! Can be done on any platform. I use Linux, because all these tools are created on Linux and it just feels "natural" to work with them under a UNIX.
     
  8. RoseWinters

    RoseWinters Newbie

    Joined:
    Jun 12, 2015
    Messages:
    34
    Likes Received:
    3
    wow wow wow thanks for this dude!!
     
  9. DarkPixel

    DarkPixel Jr. VIP Jr. VIP

    Joined:
    Oct 4, 2011
    Messages:
    1,348
    Likes Received:
    1,252
    Location:
    ↓↓↓↓
    Home Page:
    Excellent read. The only thing I dislike about headless browsers is that they cannot be embedded in custom applications (to add a GUI, for example). Yeah, of course you could use a simple REPL to interact with it, but that's experimental.
     
  10. bartosimpsonio

    bartosimpsonio Jr. VIP Jr. VIP Premium Member


    I think you can embed them, with some work. I'm just talking in theory here, I've never done this, but for example PhantomJS is the WebKit engine itself adapted to run your JavaScript from the command line and not show a GUI. But it's all in C++, and if you've got the skills to write GUIs in C++, you can probably wire them up together and compile a C++ app. Which means you can probably write commercial bots using this. Again, I haven't tried, but I think it's doable with some skills and some investment in time.
     
  11. fatboy

    fatboy Elite Member

    Joined:
    Aug 13, 2008
    Messages:
    1,618
    Likes Received:
    3,229
    Occupation:
    Retired
    Location:
    Old Peoples Home
    Been playing with SlimerJS and CasperJS for about a month now. Bit of a learning curve for an old git like me, but have to say I am impressed! Trying to convert some of my resource hogging Ubot bots to it, getting there (very) slowly :D
     
  12. tb303

    tb303 Power Member

    Joined:
    Dec 18, 2011
    Messages:
    764
    Likes Received:
    436
    very useful. thanks and bookmarked.
     
  13. Y M C M B

    Y M C M B Power Member

    Joined:
    Sep 19, 2012
    Messages:
    622
    Likes Received:
    117
    bookmarked, will use it for windows though
     
  14. riktubrs

    riktubrs Regular Member

    Joined:
    Dec 8, 2010
    Messages:
    263
    Likes Received:
    68
    Occupation:
    Software Developer
    Interesting.
    All the bots I created for my own use are written with Node.js.
    Where do you position Node.js compared to CasperJS? What do you think are the pros of CasperJS over Node.js?
     
    Last edited: Jun 15, 2015
  15. bluehatface

    bluehatface Regular Member

    Joined:
    Oct 19, 2013
    Messages:
    259
    Likes Received:
    105
    Location:
    Here
    Good shout El Barto...

    For those who don't code in JS, try Selenium - PHP, Java & C#
     
  16. jamie3000

    jamie3000 Supreme Member

    Joined:
    Jun 30, 2014
    Messages:
    1,371
    Likes Received:
    624
    Occupation:
    Finance coder looking for semi-retirement
    Location:
    uk
    I've used Facebook's php web driver with phantomjs loads and it is by far the easiest way to get started with full render botting IMHO. I did find however that when you come to scale your operations up even the low resource demanding phantomjs will take up 10x the resources of a simple HTML parsing bot. Great post though man.
     
  17. Repulsor

    Repulsor Power Member

    Joined:
    Jun 11, 2013
    Messages:
    770
    Likes Received:
    278
    Location:
    PHP Scripting ;)
    Headless web browsers really come in handy in some cases of scraping where there is a lot of JavaScript-generated content, which normal scraping in PHP usually can't get.
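To make that limitation concrete, here is a minimal sketch in Python (standard library only) of what a plain HTML-parsing scraper sees. The sample page, the `LinkCollector` class and the `extract_links` helper are made up for illustration and are not from any of the tools mentioned in this thread. The parser only reads the markup as delivered, so a link that a script would add at runtime never shows up.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags present in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return every <a href> found by parsing the markup without running scripts."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page: one link is in the markup, the other is only
# created when a browser executes the script.
SAMPLE_PAGE = """
<html><body>
  <a href="/static-link">In the markup</a>
  <div id="menu"></div>
  <script>
    document.getElementById("menu").innerHTML =
      '<a href="/js-link">Added by JavaScript</a>';
  </script>
</body></html>
"""

if __name__ == "__main__":
    # Only the static link is found; the script body is treated as text.
    print(extract_links(SAMPLE_PAGE))  # prints ['/static-link']
```

A headless browser would execute the script first and then expose the resulting page, which is why it can see `/js-link` while a parser like this cannot. The price is the extra memory and CPU that other posters mention.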

    Selenium is pretty handy, and Facebook's Git repo for the WebDriver is a good start in leveraging web drivers in your projects.

    https://github.com/facebook/php-webdriver
     
  18. accelerator_dd

    accelerator_dd Jr. VIP Jr. VIP

    Joined:
    May 14, 2010
    Messages:
    2,448
    Likes Received:
    1,010
    Occupation:
    SEO
    Location:
    IM Wonderland
    Selenium is too loud in terms of resources. If you are doing continuous stuff, the browsers take up too much RAM from their countless memory leaks.

    PhantomJS is great where you have tons of JS stuff going on and you just want the end result, but that comes at a performance price for sure. Headless browsers are becoming more and more popular, and they will keep doing so for sure.

    However, I wouldn't use one unless I absolutely have to. The good old TcpClient will always be the fastest (although somewhat robus) option for a long term solution. Any other case - PhantomJS or Selenium is your tool!
     
  19. bartosimpsonio

    bartosimpsonio Jr. VIP Jr. VIP Premium Member

    They're different beasts.

    Somewhere out there there's a question for one of the PhantomJS developers, something like "why don't you make PhantomJS a module for Node.js". His answer is that PhantomJS is built upon a large footprint, namely WebKit (Safari uses it, and Chrome's Blink engine was forked from it). So you have this elephant. Embedding this large code base into a smaller one isn't easy.

    Also, you usually embed the smaller program into the larger program. It'd make more sense to embed Node.js into WebKit. Dunno if that answers your question, but that's 2 extra cents about that.
     
  20. ItsBlinkHere

    ItsBlinkHere Regular Member

    Joined:
    Apr 27, 2014
    Messages:
    409
    Likes Received:
    150
    Location:
    At Large
    I have used PhantomJS before, mainly for a Twitter bot. You can turn off images to make it even faster. Just a tip: if you are going to be using it for botting social networks, then you might actually want to add headers. The cool thing about headless browsers is that by default they don't need headers, but you can add any browser headers you like. I randomize the ones I use. Sometimes it uses Firefox headers, sometimes Chrome headers and sometimes IE headers. I usually automate it with Selenium and Python. It's the fastest for me.