
How to scrape all URLs of a website? Scrapebox?

Discussion in 'Black Hat SEO Tools' started by gamingneeds, Oct 12, 2014.

  1. gamingneeds

    gamingneeds Regular Member

    Joined:
    Jul 18, 2008
    Messages:
    401
    Likes Received:
    83
    Hey guys,

    I want to be able to quickly scrape all the urls of a website.

    For example, if I have somenicheblog.com, how can I make a URL list of all the pages on somenicheblog.com?
    I'm sure it's a general feature in Scrapebox but I can use something else if needed.

    Thanks!
     
  2. mindmaster

    mindmaster Jr. VIP Premium Member

    Joined:
    Sep 16, 2010
    Messages:
    2,501
    Likes Received:
    1,136
    Location:
    at my new office
    1. If the website has a sitemap, you can use the Scrapebox sitemap addon for that (or script it yourself, see the sketch below).
    or
    2. If the site does not have a sitemap, you can scrape Google for indexed pages with the site: operator (site:example.com).
    After you scrape those pages from Google, you can run the Link Extractor (internal links) addon on them. After this, remove duplicate URLs from all (scraped + extracted) links.
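    A rough Python sketch of option 1, in case you'd rather script the sitemap step yourself (it assumes the site exposes a standard /sitemap.xml and follows nested sitemap indexes; the domain is just a placeholder):

    Code:
    # Pull every <loc> URL out of a standard XML sitemap.
    import xml.etree.ElementTree as ET
    import urllib.request

    NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

    def sitemap_urls(sitemap_url):
        with urllib.request.urlopen(sitemap_url) as resp:
            root = ET.fromstring(resp.read())
        urls = []
        for loc in root.findall(".//sm:loc", NS):
            link = loc.text.strip()
            if link.endswith(".xml"):          # entry in a sitemap index
                urls.extend(sitemap_urls(link))
            else:
                urls.append(link)
        return urls

    if __name__ == "__main__":
        pages = sorted(set(sitemap_urls("http://somenicheblog.com/sitemap.xml")))
        print("\n".join(pages))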
     
    • Thanks x 3
  3. Absan

    Absan Jr. VIP

    Joined:
    Dec 28, 2013
    Messages:
    257
    Likes Received:
    82
    Occupation:
    SEO and PPC Consultant
    Location:
    Spain
    You can also use Screaming Frog for this task :)
     
    • Thanks x 4
  4. nexoneo

    nexoneo Newbie

    Joined:
    Apr 12, 2011
    Messages:
    16
    Likes Received:
    2
    I used SiteSucker before and it seems to have worked for the fappening
     
  5. lord1027

    lord1027 Elite Member

    Joined:
    Sep 20, 2013
    Messages:
    3,174
    Likes Received:
    2,222
    +1 for Screaming Frog SEO Spider, I'm using it almost daily for this kind of task, works great!
     
  6. mikemiller1

    mikemiller1 Newbie

    Joined:
    Jan 5, 2014
    Messages:
    21
    Likes Received:
    4
    thanks man saved me time...
     
  7. Aty

    Aty Jr. VIP

    Joined:
    Jan 27, 2011
    Messages:
    5,410
    Likes Received:
    3,698
    Screaming Frog is a great option too, however I prefer Scrapebox's Link Extractor. It's faster than SF, plus you will need Scrapebox anyway to remove external URLs, maybe split lists, and so on.
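    If you'd rather do that external-URL cleanup outside Scrapebox, here's a quick Python sketch of the same filtering (it assumes urls.txt is a plain text list of harvested URLs, one per line, and the domain is a placeholder):

    Code:
    # Keep only internal URLs for one domain and drop duplicates.
    from urllib.parse import urlparse

    DOMAIN = "somenicheblog.com"   # placeholder target site

    def is_internal(url):
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        return host == DOMAIN

    with open("urls.txt") as f:
        urls = [line.strip() for line in f if line.strip()]

    internal = sorted({u for u in urls if is_internal(u)})

    with open("internal_urls.txt", "w") as f:
        f.write("\n".join(internal) + "\n")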
     
  8. lilmasta

    lilmasta Jr. VIP Premium Member

    Joined:
    May 21, 2009
    Messages:
    2,164
    Likes Received:
    958
    Occupation:
    IM
    Location:
    sydney
    Click custom footprint and type site: as the footprint. For keywords just write the root name of the website you wanna scrape, for example "domain.com". Only select Google for scraping and click harvest now.
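    If you have more than one site to do, you can prep the same footprint trick as a keyword file instead of typing it in. A tiny sketch (domains.txt and keywords.txt are made-up file names, one root domain per line in the input):

    Code:
    # Turn a list of root domains into site: queries for the harvester.
    with open("domains.txt") as f:
        domains = [line.strip() for line in f if line.strip()]

    with open("keywords.txt", "w") as f:
        for d in domains:
            f.write("site:" + d + "\n")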
     
  9. SEO_Alchemy

    SEO_Alchemy Senior Member

    Joined:
    Sep 8, 2012
    Messages:
    1,134
    Likes Received:
    1,213
    Location:
    USA
    Exactly. Screaming Frog is your #1 choice for this type of task. It functions just like a search engine spider: it crawls whatever site you give it, with all kinds of customizations for filtering (only HTML, only images, leave out nofollow, etc.) and will output all the URLs. For this type of functionality it beats the pants off of Scrapebox (and I love Scrapebox). The only issue you'll find is that for huge sites, you're going to need huge amounts of RAM to be able to crawl/scrape the whole thing.
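    For anyone curious what a spider like that is doing under the hood, here's a bare-bones Python version. It's nowhere near Screaming Frog, just a toy that crawls one domain, skips rel=nofollow links and non-HTML responses, and collects the URLs (the start URL is a placeholder; requests and BeautifulSoup are assumed to be installed):

    Code:
    # Minimal single-domain crawler: collect internal URLs, skipping
    # nofollow links and non-HTML responses. Toy illustration only.
    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin, urlparse, urldefrag

    START = "http://somenicheblog.com/"   # placeholder site
    DOMAIN = urlparse(START).netloc

    seen, queue = set(), [START]
    while queue:
        url = queue.pop()
        if url in seen:
            continue
        seen.add(url)
        try:
            resp = requests.get(url, timeout=10)
        except requests.RequestException:
            continue
        if "text/html" not in resp.headers.get("Content-Type", ""):
            continue                        # "only html" filter
        soup = BeautifulSoup(resp.text, "html.parser")
        for a in soup.find_all("a", href=True):
            if "nofollow" in (a.get("rel") or []):
                continue                    # "leave out nofollow" filter
            link = urldefrag(urljoin(url, a["href"]))[0]
            if urlparse(link).netloc == DOMAIN and link not in seen:
                queue.append(link)

    print("\n".join(sorted(seen)))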
     
  10. hpasha

    hpasha Jr. VIP

    Joined:
    May 15, 2011
    Messages:
    1,187
    Likes Received:
    179
    Location:
    Kepler 186F
    I do it using Scrapebox. Use site:example.com and harvest all the indexed URLs. It's that easy :)