
How to scrape all URLs of a website? Scrapebox?

Discussion in 'Black Hat SEO Tools' started by gamingneeds, Oct 12, 2014.

  1. gamingneeds

    gamingneeds Regular Member

    Joined:
    Jul 18, 2008
    Messages:
    405
    Likes Received:
    83
    Hey guys,

    I want to be able to quickly scrape all the URLs of a website.

    For example, if I have somenicheblog.com, how can I make a URL list of all the pages on somenicheblog.com?
    I'm sure it's a general feature in Scrapebox but I can use something else if needed.

    Thanks!
     
  2. mindmaster

    mindmaster Jr. VIP

    Joined:
    Sep 16, 2010
    Messages:
    2,818
    Likes Received:
    1,250
    1. If the website has a sitemap, you can use Scrapebox's sitemap addon for that.
    or
    2. If the site does not have a sitemap, you can scrape Google for indexed pages with the site: operator (site:example.com).
    After you scrape those pages from Google, you can run the Link Extractor addon (internal links) on them to find more. Then remove duplicate URLs from the combined list (scraped + extracted).
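
    If you would rather script the sitemap route than use an addon, here is a minimal Python sketch. It assumes the sitemap follows the standard sitemaps.org format and that you already have its XML as text; the function names (urls_from_sitemap, dedupe) are made up for this example and are not part of Scrapebox.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text):
    """Return every <loc> value found in a sitemap XML document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def dedupe(urls):
    """Remove blank entries and exact duplicates, keeping first-seen order."""
    cleaned = (url.strip() for url in urls)
    return list(dict.fromkeys(url for url in cleaned if url))


if __name__ == "__main__":
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/</loc></url>"
        "<url><loc>https://example.com/about</loc></url>"
        "</urlset>"
    )
    scraped = ["https://example.com/about", "https://example.com/contact"]
    # Merge sitemap URLs with URLs gathered elsewhere, then drop duplicates.
    for url in dedupe(urls_from_sitemap(sample) + scraped):
        print(url)
```

    The same dedupe step works for the combined scraped + extracted list from option 2. Note that a sitemap index file (one that lists other sitemaps) would return sitemap URLs here, not page URLs, so each of those would need to be parsed in turn.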
     
    • Thanks x 3
  3. Absan

    Absan Jr. VIP

    Joined:
    Dec 28, 2013
    Messages:
    399
    Likes Received:
    119
    Occupation:
    SEO and PPC Consultant
    Location:
    Spain
    You can also use Screaming Frog for this task :)
     
    • Thanks x 4
  4. nexoneo

    nexoneo Newbie

    Joined:
    Apr 12, 2011
    Messages:
    16
    Likes Received:
    2
    I used SiteSucker before and it seems to have worked for the fappenin
     
  5. lord1027

    lord1027 Elite Member

    Joined:
    Sep 20, 2013
    Messages:
    3,177
    Likes Received:
    2,239
    +1 for Screaming Frog SEO Spider. I'm using it almost daily for this kind of task, and it works great!
     
  6. mikemiller1

    mikemiller1 Newbie

    Joined:
    Jan 5, 2014
    Messages:
    22
    Likes Received:
    4
    thanks man saved me time...
     
  7. Aty

    Aty Jr. VIP

    Joined:
    Jan 27, 2011
    Messages:
    5,994
    Likes Received:
    4,087
    Occupation:
    SEO (Senior Erection Officer)
    Location:
    your 6 o'clock
    Screaming Frog is a great option too; however, I prefer Scrapebox's Link Extractor. It's faster than SF, plus you will need Scrapebox anyway to remove external URLs, maybe split lists, and so on.
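
    For anyone without Scrapebox at hand, the two cleanup steps mentioned here (removing external URLs and splitting lists) can be roughly approximated in a few lines of Python. This is only a sketch: keep_internal and split_list are hypothetical helper names, and treating subdomains as internal is an assumption you may want to change.

```python
from urllib.parse import urlsplit


def keep_internal(urls, domain):
    """Keep URLs whose host is `domain` or one of its subdomains."""
    domain = domain.lower()
    internal = []
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        # Assumption: blog.domain.com counts as internal to domain.com.
        if host == domain or host.endswith("." + domain):
            internal.append(url)
    return internal


def split_list(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    harvested = [
        "https://domain.com/page-1",
        "http://blog.domain.com/post",
        "https://other.com/link",
        "https://notdomain.com/x",
    ]
    internal = keep_internal(harvested, "domain.com")
    for number, chunk in enumerate(split_list(internal, 1), start=1):
        print(number, chunk)
```

    URLs without a scheme (for example plain "domain.com/page") have no parseable host here and are dropped, so add "http://" to those first if your list contains them.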
     
  8. lilmasta

    lilmasta Jr. VIP

    Joined:
    May 21, 2009
    Messages:
    2,525
    Likes Received:
    1,176
    Gender:
    Male
    Location:
    192.168.0.1
    Click Custom Footprint and type site: as the footprint. For keywords, just enter the root name of the website that you want to scrape, for example "domain.com". Select only Google for scraping and click harvest now.
     
  9. SEO_Alchemy

    SEO_Alchemy Senior Member

    Joined:
    Sep 8, 2012
    Messages:
    1,133
    Likes Received:
    1,215
    Location:
    USA
    Exactly. Screaming Frog is your #1 choice for this type of task. It functions just like a search engine spider: it crawls whatever site you give it, with all kinds of customizations for filtering (only HTML, only images, leave out nofollow, etc.), and outputs all the URLs. For this type of functionality it beats the pants off of Scrapebox (and I love Scrapebox). The only issue you'll find is that for huge sites, you're going to need huge amounts of RAM to crawl/scrape the whole thing.
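
    To give an idea of what a spider like this does under the hood, here is a rough breadth-first crawler sketch in Python using only the standard library. It is not how Screaming Frog itself is implemented; the names (LinkParser, fetch_html, crawl) and the max_pages cap are assumptions made for this example. The cap is there because the set of seen URLs lives in memory, which is the same reason huge sites need a lot of RAM.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect href values from <a> tags, optionally skipping nofollow links."""

    def __init__(self, skip_nofollow=False):
        super().__init__()
        self.skip_nofollow = skip_nofollow
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        rel = (attrs.get("rel") or "").lower().split()
        if href and not (self.skip_nofollow and "nofollow" in rel):
            self.links.append(href)


def fetch_html(url):
    """Download a page; return an empty string for non-HTML responses."""
    with urlopen(url, timeout=10) as resp:
        if "html" not in resp.headers.get_content_type():
            return ""
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_html, max_pages=500, skip_nofollow=False):
    """Breadth-first crawl of one host; returns the pages fetched, in order."""
    host = urlsplit(start_url).hostname
    seen = {start_url}
    queue = deque([start_url])
    found = []
    while queue and len(found) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        found.append(url)
        parser = LinkParser(skip_nofollow)
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before comparing.
            link = urldefrag(urljoin(url, href)).url
            if urlsplit(link).hostname == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return found
```

    Calling crawl("https://somenicheblog.com/") would return the list of internal pages it managed to reach from the home page. A real run should also respect robots.txt and add a delay between requests, both of which are left out of this sketch.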
     
  10. hpasha

    hpasha Jr. VIP

    Joined:
    May 15, 2011
    Messages:
    1,372
    Likes Received:
    185
    Location:
    Kepler 186F
    I do it using Scrapebox. Use site:example.com and harvest all the indexed URLs. It's that easy :)