How to find all pages on one site with scrapebox?

Discussion in 'Black Hat SEO Tools' started by Chicilikit, Apr 2, 2012.

  1. Chicilikit

    Chicilikit Senior Member

    Joined:
    Dec 21, 2010
    Messages:
    878
    Likes Received:
    153
    Hello, I need to find all pages on a site that has tens of thousands of pages, but when I try to harvest them with ScrapeBox, it only finds one or two hundred. Where is the problem? Or is there a better way than ScrapeBox to do this?
     
  2. ugjunk

    ugjunk Jr. VIP Jr. VIP Premium Member

    Joined:
    Jan 1, 2011
    Messages:
    2,345
    Likes Received:
    721
    Location:
    Los Angeles
    I am not 100% sure, but I think the footprint was this:

    site:domain.com
     
  3. kkvsam

    kkvsam Senior Member

    Joined:
    Oct 11, 2009
    Messages:
    936
    Likes Received:
    569
    Occupation:
    SYS ADMIN
    There are 3 options for this.
    1. SB Sitemap Scraper add-on
    2. site:domain.com "keywords"
    For the 2nd option you have to add as many keywords as you can.
    3. Generate a sitemap with the link below and extract all links from the sitemap.
    Code:
    http://www.check-domains.com/sitemap/index.php
    
    
    Done...:D
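    For option 3, once a sitemap file exists, pulling the links out of it is mostly XML parsing. Here is a minimal Python sketch, assuming the file follows the standard sitemaps.org format. The function name extract_sitemap_locs and the sample URLs are made up for illustration, and downloading the file is left out:

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol (assumed; some sites omit it).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_sitemap_locs(xml_text):
    """Return (kind, locs) where kind is the root tag name,
    e.g. 'urlset' for a page list or 'sitemapindex' for a list of sitemaps."""
    root = ET.fromstring(xml_text)
    kind = root.tag.replace(SITEMAP_NS, "")
    # Every <loc> element holds one URL, for both sitemap kinds.
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    return kind, locs


# Hypothetical sample data, not taken from any real site.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://domain.com/</loc></url>
  <url><loc>http://domain.com/page-1.html</loc></url>
</urlset>"""

kind, locs = extract_sitemap_locs(SAMPLE)
for url in locs:
    print(url)
```

    If kind comes back as sitemapindex, each entry is itself a sitemap that would need to be fetched and parsed the same way.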
     
    • Thanks x 4
  4. kokoloko75

    kokoloko75 Elite Member

    Joined:
    Jan 1, 2011
    Messages:
    1,628
    Likes Received:
    1,936
    Occupation:
    Design director
    Location:
    Paris (France)
    As Ugjunk said, use the "site:" operator.
    Or, if it's a WordPress blog (...), you can use the sitemap.

    EDIT: Kkvsam was faster than me ;)

    Beny
     
    • Thanks x 1
  5. Chicilikit

    Chicilikit Senior Member

    Joined:
    Dec 21, 2010
    Messages:
    878
    Likes Received:
    153
    Thank you for the advice. I used site:, but it doesn't work. Looks like creating a sitemap is the better option.
     
  6. Chicilikit

    Chicilikit Senior Member

    Joined:
    Dec 21, 2010
    Messages:
    878
    Likes Received:
    153
    Still have not solved this. I tried to create a sitemap, but the site is so big that the generator runs for many hours and always freezes before the end. I don't know what to do now, since the first method does not work. I also think that they already have a sitemap on their server, as Traffic Travis shows there should be one, but I don't know how to download it.
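    One place an existing sitemap is often advertised is the site's robots.txt, in "Sitemap:" lines; /sitemap.xml is also a common location, though neither is guaranteed for any given site. A rough Python sketch of reading those lines, assuming the robots.txt text has already been downloaded (the function name and sample content are hypothetical):

```python
def sitemap_urls_from_robots(robots_txt):
    """Collect the URLs listed on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        # The field name is matched case-insensitively.
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt content for illustration only.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /admin/
# sitemaps below
Sitemap: http://domain.com/sitemap.xml
sitemap: http://domain.com/sitemap-news.xml
"""

for url in sitemap_urls_from_robots(SAMPLE_ROBOTS):
    print(url)
```

    Each URL found this way can then be opened in a browser or downloaded like any other file.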
     
    Last edited: Apr 6, 2012
  7. kingtana

    kingtana Regular Member

    Joined:
    Nov 1, 2008
    Messages:
    277
    Likes Received:
    73
    Location:
    WWW
    Hey there, this can be done like so...

    site:domain.com

    You place site:domain.com in the keywords field

    Make sure you have your proxies set

    5 connections are highly recommended for any search engine

    A harvester timeout setting of 30 seconds works well

    I wouldn't scrape AOL or Bing; Yahoo and Google would work fine. For this I would probably just select Google.

    Have Results set to 1000

    Click Harvest

    Now the big question, what exactly are you doing for proxies?

    That will be the limiting or unlimiting factor in the above equation.
     
  8. shin610

    shin610 Regular Member

    Joined:
    Jun 23, 2010
    Messages:
    224
    Likes Received:
    89
    You have to combine the site: operator with a huge list of keywords.
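    The reason this helps is that each single query only returns a capped number of results, so different keywords surface different pages of the same site. Here is a small Python sketch of building such a query list to paste into the keywords field. The function name build_site_queries and the sample keywords are illustrative assumptions, not part of ScrapeBox:

```python
def build_site_queries(domain, keywords):
    """Combine the site: operator with each keyword, skipping blanks
    and case-insensitive duplicates."""
    queries = []
    seen = set()
    for kw in keywords:
        kw = kw.strip()
        if not kw or kw.lower() in seen:
            continue
        seen.add(kw.lower())
        queries.append(f'site:{domain} "{kw}"')
    return queries


# Hypothetical keyword list; in practice this would be much longer.
for query in build_site_queries("domain.com", ["seo", " tools ", "", "SEO"]):
    print(query)
```

    Writing the output to a text file, one query per line, gives a list that can be imported as keywords.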
     
    • Thanks x 1
  9. Chicilikit

    Chicilikit Senior Member

    Joined:
    Dec 21, 2010
    Messages:
    878
    Likes Received:
    153
    The site: operator with a lot of keywords worked. Even though it still did not find all the pages, it found at least half, I think. The problem was proxies: I tested a huge list and used only a few dozen of the best and fastest, and then it worked. Thanks a lot.
     
  10. raghav

    raghav Power Member

    Joined:
    Jan 4, 2011
    Messages:
    534
    Likes Received:
    302
    If you are still looking for this, or for anybody else who comes to this thread: you can try the Link Extractor add-on with SB and choose internal links only.
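    For anyone curious what "internal links only" means in practice, here is a rough Python sketch of the idea: collect every link on a page and keep only those on the same host. This is not how the add-on itself is implemented; the class and function names and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(page_url, html):
    """Return absolute http(s) links that stay on the same host as page_url."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(page_url).netloc.lower()
    found = []
    for href in parser.hrefs:
        # Resolve relative links and drop any #fragment part.
        absolute = urljoin(page_url, href).split("#", 1)[0]
        parsed = urlparse(absolute)
        same_host = parsed.netloc.lower() == host
        if parsed.scheme in ("http", "https") and same_host and absolute not in found:
            found.append(absolute)
    return found


# Hypothetical page content for illustration only.
SAMPLE_HTML = (
    '<a href="/about.html">About</a>'
    '<a href="http://other.com/x">External</a>'
    '<a href="page2.html#top">Page 2</a>'
    '<a href="mailto:me@domain.com">Mail</a>'
)

for link in internal_links("http://domain.com/index.html", SAMPLE_HTML):
    print(link)
```

    Running this over each newly found page in turn is what lets a crawl cover a site without relying on search engine results.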
     
    • Thanks x 1