
How do I extract ALL urls in an entire website?

Discussion in 'Black Hat SEO Tools' started by ChrisChris, Dec 1, 2014.

  1. ChrisChris

    ChrisChris Junior Member

    Joined:
    Feb 26, 2013
    Messages:
    148
    Likes Received:
    6
    Occupation:
    Marketing, Networking, Malaga
    Location:
    Oakland, CA
    I'd like to extract all the URLs in a website to find expired domains. I'd like to just enter funnyDOTcom and let it extract all URLs from ALL the pages on the site. Simple but sweet.

    Any good programs for this?

    Xenu dies on me after about 2 million URLs.
    Scrapebox's link extractor is useless for this.
    Other options?
     
    Last edited: Dec 1, 2014
  2. zagard

    zagard Jr. VIP Jr. VIP

    Joined:
    Oct 26, 2014
    Messages:
    192
    Likes Received:
    168
    Occupation:
    popoom pompomer
    Location:
    poompoom land
    Screaming Frog?
     
  3. ChrisChris

    ChrisChris Junior Member

    Joined:
    Feb 26, 2013
    Messages:
    148
    Likes Received:
    6
    Occupation:
    Marketing, Networking, Malaga
    Location:
    Oakland, CA


    Ooh, I had forgotten about that.

    Is it capable of scanning even 10m pages on one site, then checking them all for broken links without freezing or dying? I'm just very sick of scanning for broken links and ending up with like 3% scanned and then an "out of memory" error.
     
  4. maddawgmackem

    maddawgmackem Junior Member

    Joined:
    Feb 7, 2014
    Messages:
    166
    Likes Received:
    28
    Haven't used Screaming Frog, but won't Scrapebox and its broken link checker work?
     
  5. ziplack

    ziplack Senior Member

    Joined:
    Feb 18, 2010
    Messages:
    1,193
    Likes Received:
    603
    Location:
    BHW
    I guess Scrapebox can do the job.
     
  6. zagard

    zagard Jr. VIP Jr. VIP

    Joined:
    Oct 26, 2014
    Messages:
    192
    Likes Received:
    168
    Occupation:
    popoom pompomer
    Location:
    poompoom land
    Tested Screaming Frog with dmoz: I got 25 URLs/s at 20 threads, and at that speed 10m URLs is like 5 days :( When I ran it at max threads (200) the rate was around 50 URLs/s, but the program turned unresponsive after 10 minutes because CPU usage was around 94%. :p

    Too slow; maybe I should stick to Scrapebox, as the two people above said.
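
    A quick back-of-the-envelope check of that 5-day figure at the reported 25 URLs/s (Python used purely for illustration):

    ```python
    pages = 10_000_000        # target crawl size: 10m URLs
    rate = 25                 # reported crawl rate, URLs per second
    days = pages / rate / 86_400
    print(round(days, 1))     # ~4.6 days, so "like 5 days" is about right
    ```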
     
    Last edited: Dec 2, 2014
  7. misteryou.

    misteryou. Power Member

    Joined:
    Feb 1, 2012
    Messages:
    573
    Likes Received:
    109
  8. supacatt

    supacatt Junior Member

    Joined:
    Mar 21, 2009
    Messages:
    111
    Likes Received:
    4
    Occupation:
    Internet Marketer
    Location:
    Hollywood, Florida
    Never heard of this site before. Thanks for the link.
     
  9. ChrisChris

    ChrisChris Junior Member

    Joined:
    Feb 26, 2013
    Messages:
    148
    Likes Received:
    6
    Occupation:
    Marketing, Networking, Malaga
    Location:
    Oakland, CA
    Yeah, I'm still using Xenu for the time being, but I'll grab that Screaming Frog bulk checker license to see if it's at least a little more consistent.

    Wish I knew of other alternatives, though, without building one myself.

    Any other bulk broken link checkers out there for finding expired domains for a PBN?
     
  10. ChrisChris

    ChrisChris Junior Member

    Joined:
    Feb 26, 2013
    Messages:
    148
    Likes Received:
    6
    Occupation:
    Marketing, Networking, Malaga
    Location:
    Oakland, CA
    lol, Screaming Frog unresponsive, lovely.

    Yeah, I just want some expired domains with links to high-authority places like Wikipedia, so I thought Scrapebox could handle it, but it freezes after a very short while, much worse than Xenu. Screaming Frog's free version sure doesn't do much; the 500-link limit is good for a test, but no more.

    How do you like Screaming Frog's unlimited $99 version? 5 days is pretty bad, but better than nothing, which is what I've got now lol.
     
  11. Repulsor

    Repulsor Power Member

    Joined:
    Jun 11, 2013
    Messages:
    710
    Likes Received:
    267
    Location:
    PHP Scripting ;)
    I can make a scraper which would go to a link, fetch all links for further fetching, scrape the domains from each and every page, and save them into a database. You could then export the domains as a file of whatever size you need, say 10k domains, and run it through Scrapebox or any other tool to check for broken links. See the rough sketch below.

    Coded well, it would barely load the server at all. Shared hosting would be more than sufficient.
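
    A minimal sketch of the kind of crawler described above, written in Python for illustration (the seed URL, database file, and batch size are placeholder assumptions; a real build would also want politeness delays and resume handling):

    ```python
    # Breadth-first crawl of one site, recording every outbound domain in SQLite
    # so batches can later be exported for a broken-link / availability check.
    import sqlite3
    from collections import deque
    from urllib.parse import urljoin, urlparse

    import requests
    from bs4 import BeautifulSoup

    SEED = "http://example.com/"   # hypothetical starting site
    DB = sqlite3.connect("domains.db")
    DB.execute("CREATE TABLE IF NOT EXISTS domains (name TEXT PRIMARY KEY)")

    def crawl(seed, max_pages=1000):
        site = urlparse(seed).netloc
        queue, seen = deque([seed]), {seed}
        while queue and len(seen) <= max_pages:   # crude page cap
            url = queue.popleft()
            try:
                html = requests.get(url, timeout=10).text
            except requests.RequestException:
                continue
            for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
                link = urljoin(url, a["href"])
                host = urlparse(link).netloc
                if not host:
                    continue
                if host == site:
                    # internal link: queue it for further fetching
                    if link not in seen:
                        seen.add(link)
                        queue.append(link)
                else:
                    # external link: record the domain for the later check
                    DB.execute("INSERT OR IGNORE INTO domains VALUES (?)", (host,))
            DB.commit()

    def export(path="domains_batch.txt", limit=10_000):
        # dump a batch of domains to feed into Scrapebox or another checker
        rows = DB.execute("SELECT name FROM domains LIMIT ?", (limit,)).fetchall()
        with open(path, "w") as f:
            f.write("\n".join(r[0] for r in rows))

    if __name__ == "__main__":
        crawl(SEED)
        export()
    ```

    Exporting in fixed-size batches keeps each file small enough to feed straight into Scrapebox or any other broken-link checker.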
     
  12. jo199

    jo199 Newbie

    Joined:
    Dec 6, 2014
    Messages:
    4
    Likes Received:
    0
    Scrapebox is good, dude.