Recommend a URL Scraper/Harvester...?

Discussion in 'Black Hat SEO Tools' started by Drover, Apr 16, 2010.

  1. Drover

    Drover Regular Member

    Joined:
    Jan 23, 2008
    Messages:
    288
    Likes Received:
    42
    Location:
    Banking Craigslist Again!
    I have several URL scrapers, but they all seem to be built to scrape from Google (or other search engines). I'm looking for a URL scraper/harvester/extractor that will extract links from a page that I specify. I've tried Web Data Extractor, but it only grabs 10k URLs. I'm looking at around 1.7 million.

    Any recommendations/suggestions?

    Thanks in advance.
     
  2. drgraden

    drgraden Newbie

    Joined:
    Mar 9, 2010
    Messages:
    10
    Likes Received:
    0
    Try Jonathan Leger's Web Data Parser (http://webdataparser.com/).

    I don't own it, but own some of his other products, such as The Best Spinner, which are excellent. If Web Data Parser is as good as TBS, it should handle your needs well.

    It's only $67.00!

    Let us know if it works for you.
     
  3. Drover

    Drover Regular Member

    Joined:
    Jan 23, 2008
    Messages:
    288
    Likes Received:
    42
    Location:
    Banking Craigslist Again!
    Thanks. Giving WDP a try now. Seems to crap out around 20K links collected for some reason. It just stops responding. :(

    Any other suggestions?
     
  4. Vinnie03

    Vinnie03 Newbie

    Joined:
    Apr 22, 2009
    Messages:
    14
    Likes Received:
    1
    Drover,

    Do you have an eBay scraper? I've been trying to find one with no luck. The person who was selling one in this forum hasn't replied to my PM. If you could let me know either way, I'd appreciate it.
     
  5. boffmaster

    boffmaster Junior Member

    Joined:
    Jun 1, 2009
    Messages:
    144
    Likes Received:
    32
    A lot of programs will fail after a certain number of requests because the site or ISP you are scraping detects that they are hammering the site and blocks them. You may want to look for something that includes timers and scheduling so it can scrape the site gently.
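
    The "timers" idea can be sketched in a few lines of Python 3. This is only an illustration: the `fetch` and `polite_fetch` names and the 2-second default delay are assumptions, not part of any tool mentioned in this thread, and a real site may need a longer pause or smarter back-off.

```python
import time
import urllib.request


def fetch(url, timeout=10):
    """Download one page and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def polite_fetch(urls, delay=2.0, fetcher=fetch, sleep=time.sleep):
    """Yield (url, body) pairs, pausing `delay` seconds between requests."""
    for i, url in enumerate(urls):
        if i:  # no need to wait before the very first request
            sleep(delay)
        yield url, fetcher(url)
```

    The `fetcher` and `sleep` arguments are there so the pacing logic can be tried out without touching the network.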
     
  6. Derek Foreal

    Derek Foreal Junior Member

    Joined:
    Apr 10, 2010
    Messages:
    190
    Likes Received:
    767
    Gender:
    Male
    Location:
    East Coast USA
    My advice would be to learn a little Python (in my experience it has been surprisingly easy to learn), and you really only need to learn enough to accomplish what you need. There are many good free resources, and "Dive Into Python" is one of them. It's free to read online or as a downloadable PDF, and there's plenty of example code you can download to follow along with the exercises in the book.
    In fact, here's a piece of example "Dive Into Python" code that grabs every link on a given page. It runs at warp speed: in a second or two it will return several hundred links, or however many links are on that page:

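    # Python 2 code. urllister.py is part of the book's downloadable examples.
    import urllib, urllister
    usock = urllib.urlopen("http://your-URL-to-scrape-goes-here.com")
    parser = urllister.URLLister()
    parser.feed(usock.read())   # parse the page, collecting each link's href
    usock.close()
    parser.close()
    for url in parser.urls:
        print url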

    Or in 4 lines with the lxml module:

    from lxml.html import parse
    doc = parse('http://whatever-the-URL-is.com').getroot()
    # 'div.pad a' matches only links inside <div class="pad">; use 'a' for every link
    for link in doc.cssselect('div.pad a'):
        print '%s: %s' % (link.text_content(), link.get('href'))

    These examples don't follow the links, however. The point is just to show how easy it is: it only takes a few lines of code to get all the links on a given page. The possibilities with Python are really endless, and it's saving me a lot of time by automating many of my tasks.
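
    The first snippet above targets Python 2 and relies on the book's `urllister` helper. For Python 3, a rough equivalent using only the standard library might look like the sketch below. `LinkLister` and `extract_links` are made-up names for illustration, and resolving relative links against a base URL is an addition that the book's example does not do.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkLister(HTMLParser):
    """Collect the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:  # skip anchors that have no href at all
            self.urls.append(urljoin(self.base_url, href))


def extract_links(html, base_url=""):
    """Return all link targets found in an HTML string, in page order."""
    parser = LinkLister(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

    Feed it the decoded text of a downloaded page, passing that page's own address as `base_url`.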

    There are modules that will follow all the links, or just the ones you specify, and can go unlimited levels deep. That sounds like what you're looking for, and the Scrapy module should do it nicely for you. It can all run on autopilot: schedule it to run whenever you want and have it write the data into spreadsheets or a MySQL database, whatever you want. Scrapy also has a very active community on Google Groups where everyone helps each other out and you can get support.
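
    To make "following links several levels deep" concrete, here is a plain-Python sketch of a breadth-first crawl with a depth limit. It is not Scrapy and does not use Scrapy's API. The `crawl` function, its parameters, and the in-memory `site` dictionary are all hypothetical stand-ins, and `get_links` is assumed to be any function that returns the links found on a given URL.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(start_url, get_links, max_depth=2, same_host_only=True):
    """Breadth-first crawl; returns every URL discovered, in visit order."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    found = []
    while queue:
        url, depth = queue.popleft()
        found.append(url)
        if depth >= max_depth:
            continue  # record the page but do not expand it further
        for link in get_links(url):
            if link in seen:
                continue  # already queued or visited
            if same_host_only and urlparse(link).netloc != host:
                continue  # stay on the starting site
            seen.add(link)
            queue.append((link, depth + 1))
    return found


# Hypothetical in-memory "site" standing in for real pages.
site = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/c", "http://elsewhere.test/"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": ["http://example.com/d"],
}
pages = crawl("http://example.com/", lambda u: site.get(u, []), max_depth=2)
```

    The `seen` set is what keeps the crawl from looping when pages link back to each other.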

    Check it out. It's not hard at all to learn a little bit of Python, and it's always nice to be able to customize your own tools, especially with how fast things change these days.
     
    Last edited: Apr 23, 2010
  7. ch8878

    ch8878 Elite Member

    Joined:
    Mar 21, 2009
    Messages:
    2,242
    Likes Received:
    428
    Gender:
    Male
    Occupation:
    Gamer
    Location:
    Youtube
    There's a great free one on here somewhere that gets sent by email. I can't remember where!