
Best URL scraper?

Discussion in 'BlackHat Lounge' started by airwolf137, Jun 28, 2017.

  1. airwolf137

    airwolf137 Junior Member

    Joined:
    Sep 15, 2016
    Messages:
    177
    Likes Received:
    11
    Gender:
    Female
    Occupation:
    Student
    Location:
    San Jose
    Hi,
    I want to know the best URL scraper available on the market.
    I already have Scrapebox, but its scraping speed is way too low, even with private proxies. (Posting speed is okay, as I'm getting a 40% success rate.)
    I want to scrape 80k-90k URLs and then work on them.
    Please suggest one.
     
  2. loopline

    loopline Jr. VIP Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    3,726
    Likes Received:
    1,993
    Gender:
    Male
    Home Page:
    Google blocks every single scraper at the exact same speed. That said, Scrapebox is, to my knowledge, the fastest on the market. I have a video, I'll put it below, showing Scrapebox scraping over 1 million URLs per minute from Google.

    It's NOT about the scraper (at least when it comes to Scrapebox), it's about the proxies. So if you want to go faster, you can either find your own proxy sources and then scrape them

    OR

    You can buy prefiltered lists of public proxies

    OR

    You can just let the private/shared proxies run on the detailed harvester with a delay and let it go endlessly. Also note that you can run an unlimited number of instances of Scrapebox, so you can fire up other instances to do other things. Video for that is below as well.

    OR

    You can use backconnect/rotating/reverse proxies. These are a big pool of proxies you can use. Most people find these work well, but it's up to you and what you call "fast".


    So it all boils down to the proxies. Now, other scrapers, due to their architecture, may be slower than Scrapebox in general, but Scrapebox V2 was built from the ground up with scraping in mind, and they spent weeks and weeks and weeks optimizing it for scraping.

    Videos: (embedded videos not preserved)
     
  3. Bozard

    Bozard Newbie

    Joined:
    Jun 11, 2017
    Messages:
    42
    Likes Received:
    17
    Gender:
    Male
    The best URL scraper is the one you make yourself. Python is a simple language and it has a million libraries. The Beautiful Soup library allows you to return every URL on a page with one line of code.
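
To make that concrete, here is a minimal sketch of pulling every link out of a page. It swaps in Python's standard-library `html.parser` rather than Beautiful Soup so it runs without installing anything; with Beautiful Soup the equivalent one-liner would be roughly `[a["href"] for a in soup.find_all("a", href=True)]`. The names `LinkCollector`, `extract_urls`, and `SAMPLE_HTML` are made up for illustration, and fetching the page is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without an href (e.g. <a name="top">).
            if name == "href" and value:
                self.urls.append(urljoin(self.base_url, value))


def extract_urls(html, base_url):
    """Return all link targets found in an HTML string, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.urls


# Hypothetical sample page, used only for illustration.
SAMPLE_HTML = (
    '<p><a href="/about">About</a> '
    '<a href="https://example.org/x">X</a> '
    '<a name="top">no href</a></p>'
)

print(extract_urls(SAMPLE_HTML, "https://example.com/"))
```

Relative links are resolved against the page's own URL, so the harvested list can be used directly without fixing paths afterwards.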
     
  4. bendutchman

    bendutchman Regular Member

    Joined:
    Jun 1, 2012
    Messages:
    237
    Likes Received:
    74
    Occupation:
    genetic engineer
    Location:
    House, Road House
    Scrapebox on a VPS with a 1 gigabit connection, and as many private proxies as you can get your hands on.

    I guarantee that you will be able to scrape millions of URLs in no time.

    PS: Your focus shouldn't be so much on scraping millions of URLs as on coming up with some well-thought-out footprints to get higher-quality search results. Otherwise you will literally spend hours sorting through your harvested URLs to get something of value.
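
As a rough illustration of the footprint idea, the sketch below pairs each footprint with each keyword to build a deduplicated list of search queries. The function name `build_queries` and the sample footprints and keywords are assumptions made up for this example; they are not taken from Scrapebox or from the posts above.

```python
from itertools import product


def build_queries(footprints, keywords):
    """Combine every footprint with every keyword, dropping duplicates."""
    seen = set()
    queries = []
    for footprint, keyword in product(footprints, keywords):
        query = f"{footprint.strip()} {keyword.strip()}"
        if query not in seen:
            seen.add(query)
            queries.append(query)
    return queries


# Hypothetical footprints and keywords, for illustration only.
footprints = ['inurl:blog "leave a comment"', '"powered by wordpress"']
keywords = ["gardening", "hiking"]

for query in build_queries(footprints, keywords):
    print(query)
```

A short list of carefully chosen footprints keeps the query count small and the harvested URLs closer to what you actually want, which is the point made above about sorting time.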