
Scrape all pages of website >> Scrapebox

Discussion in 'Black Hat SEO Tools' started by pokerjk, Nov 11, 2011.

  1. pokerjk

    pokerjk Senior Member

    Joined:
    Dec 26, 2010
    Messages:
    1,167
    Likes Received:
    384
    Occupation:
    Online Marketer
    Location:
    England
    I'm trying to scrape all the pages of a website with Scrapebox, but it's not scraping all of them.

    Google says the site has 770,000 pages.

    I set the footprint to site:domain.com, selected all search engines and set results to the max (1000).

    It only pulls just over 1,500 results. How do I get Scrapebox to scrape ALL 770,000 pages?

    Cheers :rolleyes:
     
  2. pokerjk

    pokerjk Senior Member

    Joined:
    Dec 26, 2010
    Messages:
    1,167
    Likes Received:
    384
    Occupation:
    Online Marketer
    Location:
    England
    Oops. Yes, big typo in the title... tired... :(
     
  3. Kickflip

    Kickflip BANNED BANNED

    Joined:
    Jan 29, 2010
    Messages:
    2,038
    Likes Received:
    2,465
    You could try this to see if it pulls more pages:

    Code:
    1 site:
    2 site:
    3 site:
    4 site:
    5 site:
    6 site:
    7 site:
    8 site:
    9 site:
    0 site:
    q site:
    w site:
    e site:
    r site:
    t site:
    y site:
    u site:
    i site:
    o site:
    p site:
    a site:
    s site:
    d site:
    f site:
    g site:
    h site:
    j site:
    k site:
    l site:
    z site:
    x site:
    c site:
    v site:
    b site:
    n site:
    m site:
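    The list above pairs each single-character keyword with the site: operator so that every query pulls a different slice of the site's indexed pages. Below is a minimal Python sketch that builds the same kind of list, assuming the domain goes straight after site: (domain.com is only the placeholder from the first post). The `length` parameter is a hypothetical extension to two-character keywords, not a Scrapebox setting.

```python
import itertools
import string

# Throwaway keywords: digits plus lowercase letters (36 characters in total)
ALPHABET = string.digits + string.ascii_lowercase


def build_queries(domain: str, length: int = 1) -> list[str]:
    """Return one 'keyword site:domain' query for every keyword of the given length."""
    queries = []
    for combo in itertools.product(ALPHABET, repeat=length):
        keyword = "".join(combo)
        queries.append(f"{keyword} site:{domain}")
    return queries


if __name__ == "__main__":
    # Print the two-character variant, one query per line, ready to paste as keywords
    for query in build_queries("domain.com", length=2):
        print(query)
```

    Some rough arithmetic shows why single characters stall: with at most 1,000 results per query (the max setting from the first post), 36 queries can return at most 36 × 1,000 = 36,000 URLs, and far fewer in practice because the results overlap. Two-character keywords give 36² = 1,296 queries, an upper bound of 1,296,000, which is the first level where 770,000 is even arithmetically reachable.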
    
     
  4. pokerjk

    pokerjk Senior Member

    Joined:
    Dec 26, 2010
    Messages:
    1,167
    Likes Received:
    384
    Occupation:
    Online Marketer
    Location:
    England
    Thanks, that got me more (5,600), but nowhere near the 770,000... :/
     
  5. VaLeRyA

    VaLeRyA Regular Member

    Joined:
    Sep 23, 2008
    Messages:
    379
    Likes Received:
    81
    Location:
    Argentina
    Same problem.
    There must be some tool to extract all URLs from Google.
     
  6. HelloInsomnia

    HelloInsomnia Jr. Executive VIP Jr. VIP Premium Member

    Joined:
    Mar 1, 2009
    Messages:
    1,817
    Likes Received:
    2,913
    You're going to have to keep adding keywords to get more results...

    I would look for a sitemap, you may be able to get all the pages from there.

    Edit: another idea is to load up all the URLs you already have and extract their internal links with the Link Extractor; you may be able to do that over and over again to multiply the results.
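    The "extract internal links, then repeat" loop is a breadth-first crawl. Below is a minimal Python sketch using only the standard library. It assumes you seed it with the URLs you have already scraped and that only links on the same host count as internal. The `max_pages` and `delay` limits are hypothetical safety settings, not anything Scrapebox exposes. It does not read robots.txt or follow JavaScript-generated links.

```python
import time
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(html: str, page_url: str) -> set[str]:
    """Absolute, fragment-free http(s) links on the same host as page_url."""
    parser = LinkParser()
    parser.feed(html)
    host = urlparse(page_url).netloc
    links = set()
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            links.add(absolute)
    return links


def fetch_html(url: str) -> str:
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(seeds: list[str], fetch=fetch_html, max_pages: int = 1000, delay: float = 1.0) -> set[str]:
    """Fetch pages breadth-first, feeding each page's new internal links back into the queue.

    Returns every URL discovered, including queued ones not fetched before max_pages.
    """
    seen = set(seeds)
    queue = deque(seeds)
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # network or HTTP error: skip this page
        fetched += 1
        for link in internal_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
        time.sleep(delay)  # pause between requests
    return seen
```

    Because `fetch` is a parameter, the crawl can be tested against a dictionary of fake pages, and the same loop could be pointed at a different downloader without changing anything else.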
     
  7. kokoloko75

    kokoloko75 Elite Member

    Joined:
    Jan 1, 2011
    Messages:
    1,628
    Likes Received:
    1,936
    Occupation:
    Design director
    Location:
    Paris (France)
    Wow, 770,000 is huge!
    You'd need tons of good proxies to scrape all of this from Google...

    Doesn't the website have a sitemap? (...)
    Can't the URLs be generated manually? Like:
    Code:
    http://www.website.com/page/1
    http://www.website.com/page/2
    http://www.website.com/page/3
    ...
    Beny
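    If the site's URLs follow a numeric pattern like the one above, the list can be generated instead of scraped. Below is a minimal Python sketch. It assumes the /page/N pattern from the example and a known last page number (the 500 below is a placeholder you would have to find by checking the site). It writes one URL per line to a text file.

```python
def paged_urls(base: str, first: int, last: int) -> list[str]:
    """Build base/page/first ... base/page/last (inclusive)."""
    base = base.rstrip("/")
    return [f"{base}/page/{n}" for n in range(first, last + 1)]


def write_list(urls: list[str], path: str) -> None:
    # One URL per line, as a plain-text list
    with open(path, "w", encoding="utf-8") as fh:
        fh.writelines(url + "\n" for url in urls)


if __name__ == "__main__":
    # 500 is a hypothetical last page number
    write_list(paged_urls("http://www.website.com", 1, 500), "urls.txt")
```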
     
  8. pokerjk

    pokerjk Senior Member

    Joined:
    Dec 26, 2010
    Messages:
    1,167
    Likes Received:
    384
    Occupation:
    Online Marketer
    Location:
    England
    No sitemap :/

    That's what I've been doing, extracting the internal links over and over, but it's a bit of a long haul that way.
     
  9. IamTURK

    IamTURK Regular Member

    Joined:
    Jun 5, 2009
    Messages:
    225
    Likes Received:
    17
    Location:
    Turkey
    site:domain.com

    Or you should scrape sitemaps.

    There's no other way.
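    When a sitemap exists, it is usually an XML file in the sitemaps.org format, often at /sitemap.xml or listed on a `Sitemap:` line in robots.txt. It is either a urlset of page URLs or a sitemapindex that points at more sitemaps. Below is a minimal Python sketch under that assumption. `collect_sitemap_urls` and its `fetch` parameter are illustrative names, the start URL is a placeholder, and gzipped sitemaps (.xml.gz) are not handled.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_bytes: bytes) -> tuple[list[str], list[str]]:
    """Return (page URLs, child sitemap URLs) found in one sitemap document."""
    root = ET.fromstring(xml_bytes)
    pages = [e.text.strip() for e in root.findall("sm:url/sm:loc", NS) if e.text]
    children = [e.text.strip() for e in root.findall("sm:sitemap/sm:loc", NS) if e.text]
    return pages, children


def fetch_bytes(url: str) -> bytes:
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def collect_sitemap_urls(start: str, fetch=fetch_bytes) -> list[str]:
    """Follow sitemap indexes recursively and gather every page URL."""
    seen: set[str] = set()
    queue = [start]
    pages: list[str] = []
    while queue:
        url = queue.pop()
        if url in seen:
            continue  # guard against an index that lists itself
        seen.add(url)
        found_pages, found_children = parse_sitemap(fetch(url))
        pages.extend(found_pages)
        queue.extend(found_children)
    return pages


if __name__ == "__main__":
    # Placeholder start URL; substitute the real sitemap location
    for page in collect_sitemap_urls("http://www.domain.com/sitemap.xml"):
        print(page)
```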
     
  10. ppcmaster

    ppcmaster Supreme Member

    Joined:
    Dec 26, 2008
    Messages:
    1,492
    Likes Received:
    161
    Occupation:
    Article writing, link wheel, link building, etc
    Location:
    Lowell, MA
    How about Xenu? Have you used it before? You will want to turn off scraping external links though. It works like a charm for me :)
     
  11. d3x73r

    d3x73r Newbie

    Joined:
    Mar 12, 2011
    Messages:
    13
    Likes Received:
    7
    First of all, thanks for the good tip, ppcmaster, but...

    How do you get the links from Xenu into Scrapebox? I tried creating a sitemap, but it's over 10 MB, so Scrapebox couldn't read it. Then I tried splitting the sitemap, but no luck either: Scrapebox didn't load any URLs. Is there a way in Xenu to filter files like .jpg out of the sitemap? Or is there a way to export the URLs from Xenu to a .txt file?

    all the best
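    For the last two questions, here is a minimal Python sketch. It assumes the Xenu sitemap is a standard sitemaps.org XML file with one loc element per URL. It streams the file with `iterparse`, so a 10 MB file is not a problem, drops image and other asset URLs by extension, and writes plain .txt files with one URL per line, split into chunks. The extension list and the 50,000-per-file default are assumptions; adjust them to whatever your Scrapebox setup accepts.

```python
import sys
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical skip list: extensions treated as non-page files
SKIP_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".bmp", ".css", ".js", ".pdf")


def loc_urls(sitemap_path: str):
    """Yield the text of every <loc> element without building the whole tree."""
    for _event, elem in ET.iterparse(sitemap_path, events=("end",)):
        # Namespaced sitemaps report the tag as "{namespace}loc"
        if (elem.tag == "loc" or elem.tag.endswith("}loc")) and elem.text:
            yield elem.text.strip()
        elem.clear()  # release memory held by finished elements


def is_page(url: str) -> bool:
    """False for URLs whose path ends in a skipped extension (case-insensitive)."""
    path = urlparse(url).path.lower()
    return not path.endswith(SKIP_EXTENSIONS)


def page_urls(sitemap_path: str) -> list[str]:
    return [url for url in loc_urls(sitemap_path) if is_page(url)]


def write_chunks(urls: list[str], prefix: str, per_file: int = 50_000) -> list[str]:
    """Write prefix_1.txt, prefix_2.txt, ... with at most per_file URLs each."""
    files = []
    for start in range(0, len(urls), per_file):
        name = f"{prefix}_{start // per_file + 1}.txt"
        with open(name, "w", encoding="utf-8") as fh:
            fh.writelines(url + "\n" for url in urls[start:start + per_file])
        files.append(name)
    return files


if __name__ == "__main__":
    # Usage: python sitemap_to_txt.py xenu_sitemap.xml
    written = write_chunks(page_urls(sys.argv[1]), "urls")
    print("wrote", ", ".join(written))
```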
     
  12. innocent_kid

    innocent_kid Power Member

    Joined:
    Feb 9, 2010
    Messages:
    503
    Likes Received:
    123
    Only if you have lots of private proxies that pass Google can you do it.