
How to get all urls from a site?

Discussion in 'Cloaking and Content Generators' started by pamps, Aug 11, 2012.

  1. pamps

    pamps Newbie

    Joined:
    Aug 11, 2012
    Messages:
    2
    Likes Received:
    0
    hi,

    i need to get the list of all urls from a site ( just the url list ), either the ones indexed in google or by "scanning" the site and following all internal links, but i am new to terms like grabbing, harvesting or fetching.

    the site uses what i think is called pretty urls, so no .php?... is shown.

    my difficulty is that i don't even know what this kind of technical task is usually called.

    is there any tool or some help you could recommend?

    many thanks all!
     
  2. Autumn

    Autumn Elite Member

    Joined:
    Nov 18, 2010
    Messages:
    2,197
    Likes Received:
    3,041
    Occupation:
    I figure out ways to make money online and then au
    Location:
    Spamville
    To get the urls that are indexed by Google, you would use the site:domain.com query in Google. Make sure you add &filter=0 to the results URL so that you get every result, including the near-duplicate pages. You could do this by hand or with a tool like Scrapebox.
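That query URL can be assembled by hand; a minimal sketch, where domain.com is a placeholder and num=100 / filter=0 are the usual result-count and duplicate-filter parameters on Google's public results URL:

```shell
# Build a site: search URL by hand (domain.com is a placeholder).
# num=100 asks for 100 results per page; filter=0 disables the
# near-duplicate filter so no indexed pages are hidden.
domain="domain.com"
query_url="https://www.google.com/search?q=site:${domain}&num=100&filter=0"
echo "$query_url"
```

You would then page through the results (or let a scraper do it) and collect the result links.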

    If you want to spider an entire site, the tool I would use is wget, which is simple and reliable. If you are on Windows, there are stand-alone Windows builds available. Of course, if the site publishes a sitemap, you could just download the sitemap and extract the urls from that.
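A rough sketch of that wget approach (example.com is a placeholder; the actual crawl is shown commented out, and a made-up sample log stands in for wget's output so the extraction step runs offline):

```shell
# On a real run you would crawl the site in spider mode (no files saved)
# and log every URL wget visits:
#   wget --spider --recursive --level=inf --no-verbose \
#        --output-file=wget.log https://example.com/
#
# Sample stand-in for wget.log so the extraction below is runnable:
cat > wget.log <<'EOF'
2012-08-11 10:00:01 URL:https://example.com/ [5120] -> "index.html" [1]
2012-08-11 10:00:02 URL:https://example.com/about/ [2048] -> "about" [1]
2012-08-11 10:00:02 URL:https://example.com/about/ [2048] -> "about" [1]
EOF

# Pull the unique URLs out of the log:
grep -oE 'URL:https?://[^ ]+' wget.log | sed 's/^URL://' | sort -u > urls.txt
cat urls.txt
```

The grep/sed step is the whole trick: everything wget touched ends up in the log, so deduplicating the URL field gives you the site's URL list.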
     
    • Thanks Thanks x 1
  3. datatyper

    datatyper Newbie

    Joined:
    Jan 8, 2011
    Messages:
    23
    Likes Received:
    6
    Scrapebox is the best option for link checking; otherwise you can use the SEOquake toolbar.
     
  4. pamps

    pamps Newbie

    Joined:
    Aug 11, 2012
    Messages:
    2
    Likes Received:
    0
    thanks for the suggestions.

    i will try wget since it is free, and i also like the sitemap tip ( indeed, the sitemap, if the site has one, should contain all the urls ).
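Extracting the urls from a sitemap is a one-liner once you have the file. A minimal sketch, assuming a standard sitemaps.org-style sitemap.xml (the sample file here is made up so the commands run offline; on a live site you would first fetch it, e.g. `wget https://example.com/sitemap.xml`):

```shell
# Stand-in sitemap.xml in the standard sitemaps.org format:
cat > sitemap.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/contact/</loc></url>
</urlset>
EOF

# Each <loc> element holds exactly one URL; strip the tags to get the list:
grep -oE '<loc>[^<]+</loc>' sitemap.xml \
  | sed -e 's/<loc>//' -e 's|</loc>||' > sitemap-urls.txt
cat sitemap-urls.txt
```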

    i will also try the SEOquake toolbar to see how it works. as for Scrapebox ( which i understand is a complete solution pack, from what i read on the forum ), maybe later, since it is paid and my knowledge is still very limited.

    Regards.
     
  5. KevinA

    KevinA Regular Member

    Joined:
    Feb 27, 2012
    Messages:
    328
    Likes Received:
    124
    I'm pretty sure Scrapebox does this!
     
  6. curphey

    curphey BANNED

    Joined:
    May 24, 2012
    Messages:
    117
    Likes Received:
    41
    PM me the link and I'll do it in SB for you.
     
  7. Bostoncab

    Bostoncab Elite Member

    Joined:
    Dec 31, 2009
    Messages:
    2,255
    Likes Received:
    514
    Occupation:
    pain in the ass cabbie
    Location:
    Boston,Ma.
    Home Page:
    you could use HTTrack to copy the entire site to your hard drive, or use Scrapebox's sitemap scraper
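A rough sketch of the HTTrack route: once the site is mirrored to disk, the file tree itself is your URL list. Here example.com and ./mirror are placeholders, the real mirror command is commented out, and a stand-in file tree is created so the listing step runs offline:

```shell
# Real run (commented out to keep this sketch offline):
#   httrack "https://example.com/" -O ./mirror "+example.com/*"
#
# Stand-in for the mirrored tree HTTrack would produce:
mkdir -p mirror/example.com/about
: > mirror/example.com/index.html
: > mirror/example.com/about/index.html

# Turn the mirrored file paths back into URLs:
find mirror -name '*.html' | sed 's|^mirror/|https://|' | sort > mirror-urls.txt
cat mirror-urls.txt
```

This trades bandwidth for simplicity: you download the whole site just to learn its URLs, so for large sites the wget spider or sitemap approaches above are lighter.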
     
  8. partymarty4870

    partymarty4870 Elite Member

    Joined:
    Jul 7, 2010
    Messages:
    2,034
    Likes Received:
    1,690
    Location:
    I come from a land downunder
    scrapebox does do this, but you'll burn through your proxies really fast scraping google.

    I'm doing this with free online sitemap generators at the moment to avoid using my proxies.

    To find one, just google "free online sitemap generator" - most limit you to 500 pages, but there are some that will let you do more.
     
    • Thanks Thanks x 1
  9. hatelovemisery

    hatelovemisery Junior Member

    Joined:
    Apr 17, 2011
    Messages:
    118
    Likes Received:
    22
    Location:
    Singapore
    SEOquake sorta sucks, SB all the way. But SEOquake is probably the only free alternative.
     
  10. Bostoncab

    Bostoncab Elite Member

    Joined:
    Dec 31, 2009
    Messages:
    2,255
    Likes Received:
    514
    Occupation:
    pain in the ass cabbie
    Location:
    Boston,Ma.
    Home Page:
    httrack