
I'm starting a Perl Bot/Scraper programming tutorial.

Discussion in 'BlackHat Lounge' started by VoidITSolutions, Nov 7, 2013.

  1. VoidITSolutions

    VoidITSolutions BANNED BANNED

    Joined:
    Apr 5, 2013
    Messages:
    164
    Likes Received:
    44
    Like the title says, I'm creating a tutorial that'll teach you how to scrape websites and create bots.
    Part 1 is done and covers just the basics.
    Part 2 is also finished and covers retrieving web pages and submitting forms.

    If I see enough traffic and social shares from the content, I'll create a full-blown series and it'll be out there for all of you BlackHatters for free.

    Being able to create your own bots is a powerful ability.
    I wanted to share it with the community that's taught me so much.

    I'll add Part 3 tomorrow.

    I've also linked to my Perl for Beginners series in the post if you're not familiar with Perl programming.

    Please, if you have SITE suggestions, post them in here.
    If you have direct CODE suggestions or questions, comment on the site.

    Let me know what you think!

    http://www.fryitservice.com/p/creating-web-bots-with-perl-part-1-perl-basics/
     
    • Thanks Thanks x 3
  2. seeplusplus

    seeplusplus Power Member

    Joined:
    Aug 18, 2008
    Messages:
    511
    Likes Received:
    163
    Good stuff!

    How do you rate Perl as a language for this type of thing?
     
  3. VoidITSolutions

    VoidITSolutions BANNED BANNED

    Joined:
    Apr 5, 2013
    Messages:
    164
    Likes Received:
    44
    I love using Perl. I use it to automate everything.
    Perl was made for manipulating data. I find it very easy to code pattern-matching algorithms with, which I think makes it perfect for bot/scraper coding.
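    To give a rough idea of the kind of pattern matching meant here, below is a minimal sketch that pulls link targets out of a chunk of HTML with a single regex. The HTML snippet and the `extract_links` helper are made up for illustration and are not taken from the tutorial. A regex like this is fine for quick jobs but can miss unusual markup, such as single-quoted attributes.

```perl
use strict;
use warnings;

# Hypothetical HTML snippet; a real scraper would fetch this from a site.
my $html = <<'HTML';
<a href="http://example.com/one">One</a>
<a class="x" href="/two">Two</a>
<p>No link here</p>
HTML

# Return every double-quoted href value found inside <a ...> tags.
# (extract_links is an illustrative helper name, not part of any library.)
sub extract_links {
    my ($text) = @_;
    # In list context, a /g match returns all captured groups at once.
    my @links = $text =~ /<a\b[^>]*?\bhref="([^"]*)"/gi;
    return @links;
}

print "$_\n" for extract_links($html);
```

    The list-context match is what keeps this short: one expression both searches the text and collects every capture.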
     
    • Thanks Thanks x 1
  4. Bostoncab

    Bostoncab Elite Member

    Joined:
    Dec 31, 2009
    Messages:
    2,255
    Likes Received:
    514
    Occupation:
    pain in the ass cabbie
    Location:
    Boston,Ma.
    Home Page:
    Nice tutorial. Can you make me a bot that will scrape all of Yelp and put it into a CSV so I can upload it to a new site I am working on?
     
  5. VoidITSolutions

    VoidITSolutions BANNED BANNED

    Joined:
    Apr 5, 2013
    Messages:
    164
    Likes Received:
    44
    Ha. Not for free :D. I'd say PM me if you wanna chat about it.

    My goal here though is to teach others how to build their own bots.
    I don't usually take up freelance gigs.
    Maybe I'll get JrVip and set up a BST later this month, we'll see.

    Either way, I'll be continuing the tutorial tomorrow.
    Hope it helps someone!
     
  6. Kries

    Kries Junior Member

    Joined:
    Aug 13, 2013
    Messages:
    180
    Likes Received:
    27
    Hi, does this work for twitter-relevant projects? Like scraping tweets etc.
     
  7. VoidITSolutions

    VoidITSolutions BANNED BANNED

    Joined:
    Apr 5, 2013
    Messages:
    164
    Likes Received:
    44
    It should work for almost all sites.

    However, you may need to fiddle with the UserAgent string and/or use a proxy.
    I'll be including proxy management in my 3rd or 4th entry. I haven't decided which yet.

    The more IPs you are using, the higher your bot/scraper success rate.

    For the highest success rate, it is important that your IP addresses come from the same geographical location.
    One big red flag that causes captchas and/or banned accounts is creating an account in one country and using it in another.
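    As a rough sketch of what "fiddling with the UserAgent string and/or using a proxy" can look like with LWP::UserAgent (the module mentioned later in this thread), see below. The `build_agent` helper, the User-Agent value and the proxy address are all placeholders invented for this example. No request is actually sent.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Build a user agent with an optional custom User-Agent header and proxy.
# (build_agent is an illustrative helper name, not part of LWP.)
sub build_agent {
    my (%opt) = @_;
    my $ua = LWP::UserAgent->new(timeout => 15);

    # Replace the default "libwww-perl/x.y" identification string.
    $ua->agent($opt{agent}) if defined $opt{agent};

    # Route both http and https requests through the given proxy URL.
    $ua->proxy(['http', 'https'], $opt{proxy}) if defined $opt{proxy};

    return $ua;
}

my $ua = build_agent(
    agent => 'Mozilla/5.0 (X11; Linux x86_64) ExampleBot/0.1',  # placeholder string
    proxy => 'http://127.0.0.1:8080',                           # placeholder proxy
);

print "User-Agent: ", $ua->agent, "\n";
print "HTTP proxy: ", $ua->proxy('http'), "\n";
```

    Once the agent is configured, a page would be fetched with something like `$ua->get($url)`, checking `$res->is_success` on the returned response before using its content.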
     
    Last edited: Nov 8, 2013
  8. CodeNinja

    CodeNinja Newbie

    Joined:
    Sep 25, 2013
    Messages:
    47
    Likes Received:
    26
    As a Rubyist I am going to watch some nice Perl. Just make sure you add syntax highlighting to your code examples.
     
  10. mast3rmind

    mast3rmind Newbie

    Joined:
    Sep 25, 2013
    Messages:
    3
    Likes Received:
    0
    Thanks, very helpful example using LWP::UserAgent. I agree with CodeNinja on the syntax highlighting. Do you have any experience with Mojolicious? It's a powerful web framework that can be used to build some capable crawlers/bots.
     
  11. VoidITSolutions

    VoidITSolutions BANNED BANNED

    Joined:
    Apr 5, 2013
    Messages:
    164
    Likes Received:
    44
    I agree with both of you about the syntax highlighting. I overlooked it but I just added it to all of my posts.
    It makes the posts much more festive :D

    As for Mojolicious, mast3rmind: no, I haven't.
    I'm very stubborn when it comes to third-party tools.
    The only one I'll touch is ScrapeBox.
    Everything else I build from scratch. It takes longer initially, but when a site/program changes its code, it's easy for me to update mine instead of waiting on the developer. I'm also a big fan of the command line and scripting things. Most third-party tools are GUI-based. I suppose I can't speak for Mojo though.
     
  12. VoidITSolutions

    VoidITSolutions BANNED BANNED

    Joined:
    Apr 5, 2013
    Messages:
    164
    Likes Received:
    44
    Part 3 is done - http://www.fryitservice.com/p/creating-web-bots-with-perl-part-3-basic-processing/

    Now y'all can code your own Google scrapers!
     
  13. CodeNinja

    CodeNinja Newbie

    Joined:
    Sep 25, 2013
    Messages:
    47
    Likes Received:
    26
    Have you considered expanding your tutorial to teach web scraping with WWW::Mechanize?
    Mechanize gives a more browserlike experience, without the need to define your own cookie jars.
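    For context, this is roughly what "defining your own cookie jar" means with plain LWP::UserAgent. It is a minimal sketch rather than code from the tutorial, and the commented-out URL is a placeholder. WWW::Mechanize is a subclass of LWP::UserAgent that sets up an in-memory jar by default, which is the convenience being described.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# In-memory cookie jar; HTTP::Cookies->new(file => ..., autosave => 1)
# would persist cookies to disk between runs instead.
my $jar = HTTP::Cookies->new;

# Attach the jar so cookies set by one response are sent on later requests.
my $ua = LWP::UserAgent->new(cookie_jar => $jar);

# A login flow would then look something like this (placeholder URL):
# my $res = $ua->get('http://example.com/login');

print "Cookie jar class: ", ref($ua->cookie_jar), "\n";
```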
     
  14. VoidITSolutions

    VoidITSolutions BANNED BANNED

    Joined:
    Apr 5, 2013
    Messages:
    164
    Likes Received:
    44
    Someone brought it to my attention on Facebook.
    WWW::Mechanize is not my style and it has no real benefits.

    It may be easier for some to use, but I like to program at a lower level (I started programming with Intel assembly, if that tells you anything).

    Maybe in the future I'll make another set of tutorials geared around it.