I want to scrape a URL or website with 10,000 pages

Discussion in 'Black Hat SEO' started by atongalee, Nov 12, 2010.

  1. atongalee

    atongalee Regular Member

    Joined:
    Aug 14, 2010
    Messages:
    314
    Likes Received:
    55
    Occupation:
    internet engineer
    Location:
    China, UK & Newyork
    Hi,

    Please, do you know any tools that can do this? Which tool? Please help.
     
  2. onlinewealth

    onlinewealth Junior Member

    Joined:
    Mar 13, 2007
    Messages:
    157
    Likes Received:
    101
    Occupation:
    Direct marketing working at home.
    Location:
    "In a State of Corruption"
    Try HTTrack. Google it.
     
    • Thanks x 1
  3. ┼blackrat┼

    ┼blackrat┼ Senior Member

    Joined:
    Jul 31, 2010
    Messages:
    899
    Likes Received:
    729
    Location:
    Sewer
    ScrapeBox. Does WONDERS.
     
    • Thanks x 1
  4. zappys

    zappys Junior Member

    Joined:
    Apr 25, 2010
    Messages:
    139
    Likes Received:
    62
    It's kind of hard to find all 10,000 pages, but it's possible.

    1. Use ScrapeBox to find all the indexed pages of the website using the site: operator (however, you will get just 1,000 URLs this way).
    2. Use the ScrapeBox internal link add-on to check every internal link on the URLs you already scraped in the first step.
    3. Put the lists together, then remove duplicate URLs.
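
For anyone who wants to see what steps 2 and 3 amount to outside ScrapeBox, here is a rough Python sketch using only the standard library. It is not how ScrapeBox works internally; the function names `internal_links` and `merge_unique` are made up for illustration, and fetching the pages is left out, so you pass in HTML you have already downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(html, page_url):
    """Step 2: return absolute links in `html` that stay on page_url's host."""
    host = urlparse(page_url).netloc
    collector = _LinkCollector()
    collector.feed(html)
    links = []
    for href in collector.hrefs:
        # Resolve relative links and drop "#fragment" parts.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).netloc == host:
            links.append(absolute)
    return links


def merge_unique(*url_lists):
    """Step 3: combine lists and drop duplicates, keeping first-seen order."""
    seen = set()
    merged = []
    for urls in url_lists:
        for url in urls:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```

You would call `internal_links` on each page from the first list, then pass the original list and the newly found links to `merge_unique`. Repeating that on the new URLs until nothing new turns up is a basic crawl.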

    Hope this helped
     
    • Thanks x 2
  5. zimsabre

    zimsabre Regular Member

    Joined:
    Nov 11, 2010
    Messages:
    255
    Likes Received:
    174
    SB can only handle 1M URLs.
     
  6. ┼blackrat┼

    ┼blackrat┼ Senior Member

    Joined:
    Jul 31, 2010
    Messages:
    899
    Likes Received:
    729
    Location:
    Sewer
    You can run an almost unlimited number of SB instances at the same time.
     
  7. bryansbiz

    bryansbiz Newbie

    Joined:
    May 28, 2009
    Messages:
    15
    Likes Received:
    1
    Are you scraping for emails, phone numbers, etc.? Try GSA spider or ScrapeBox.
    With 10,000 pages, the site probably builds its pages from a database call. Find the number ranges of the pages and have GSA crawl through them.
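
If the pages really are numbered, the URL list can be generated directly and then fed to whatever crawler you use. A small Python sketch, assuming a URL pattern like `page.php?id=N`; the pattern, the host and the function name `urls_for_id_range` are hypothetical, so check the real site's URLs first:

```python
def urls_for_id_range(template, first_id, last_id):
    """Build one URL per page ID, including both first_id and last_id."""
    if first_id > last_id:
        raise ValueError("first_id must not exceed last_id")
    return [template.format(id=page_id) for page_id in range(first_id, last_id + 1)]


# Hypothetical pattern; replace it with the one the target site actually uses.
urls = urls_for_id_range("http://example.com/page.php?id={id}", 1, 10000)
```

Some IDs in the range may not exist, so expect a share of the generated URLs to return 404.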
     
  8. marketingman

    marketingman Newbie

    Joined:
    Jul 26, 2010
    Messages:
    18
    Likes Received:
    5
    Try using Visual Web Spider. I have been using it for years to create URL lists for my clients. Google it for the direct site!
     
  9. SNB321

    SNB321 Jr. VIP Premium Member

    Joined:
    Apr 29, 2012
    Messages:
    240
    Likes Received:
    13
    Occupation:
    Seo Consultant
    Location:
    Black Hat World
    ScrapeBox is the most useful tool.