Aug 11, 2012
I need to get all the URLs from a site (just the list of URLs), either the ones indexed in Google or, alternatively, by "scanning" the site and following all of its internal links. I am new to terms like grabbing, harvesting, or fetching.

The site uses what I think are called pretty URLs, so no .php?... is shown.

My difficulty is that I don't even know what this kind of task is called.

Is there any tool, or any other help, you could recommend?

Many thanks, all!
To get the URLs that Google has indexed, use the query in Google. Make sure you add &filter=0 to the search URL so that you get every result, including the near-duplicate pages. You could do this by hand or with a tool like Scrapebox.

If you want to spider an entire site, the tool I would use is wget, which is simple and reliable. If you are on Windows, stand-alone Windows builds are available. Of course, if the site publishes a sitemap, you could just download it and extract the URLs from that.
Scrapebox is the best option for link checking; otherwise, you can use the SEOquake toolbar.
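For anyone curious what "spidering" actually involves, here is a minimal Python sketch of the same idea (it is not wget itself): start at one page, collect the <a href> links, keep only those on the same host, and repeat breadth-first until nothing new turns up. The names crawl, internal_links and http_get are hypothetical, the max_pages cap of 500 is an arbitrary choice, and the sketch assumes pages are UTF-8 HTML; it does not check robots.txt or run JavaScript, so links generated by scripts are missed.

```python
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(html, page_url):
    """Return absolute http(s) links found on page_url that stay on its host."""
    parser = LinkParser()
    parser.feed(html)
    host = urlparse(page_url).netloc
    links = []
    for href in parser.hrefs:
        # Resolve relative links and drop #fragments so /a and /a#top count once.
        url, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(url)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            links.append(url)
    return links


def http_get(url):
    """Fetch a page; assumption: HTML is UTF-8, robots.txt is not checked."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        # Skip non-HTML responses such as images or PDFs.
        if "html" not in resp.headers.get("Content-Type", ""):
            return ""
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=http_get, max_pages=500):
    """Breadth-first crawl of one site; returns every URL discovered, sorted."""
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        fetched += 1
        try:
            html = fetch(url)
        except (OSError, ValueError):
            continue  # broken link, HTTP error or network error: keep going
        for link in internal_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return sorted(seen)
```

Usage: print("\n".join(crawl("https://example.com/"))), with example.com standing in for the site you want to list. Pretty URLs are no obstacle here, because the crawler simply follows whatever hrefs the pages contain.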
Thanks for the suggestions.

I will try wget, as it is free, and I also like the sitemap tip (indeed, the sitemap, if the site has one, should contain all the URLs).
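If the site does have a sitemap, pulling the URLs out of it is mostly XML parsing. Below is a minimal Python sketch, assuming the file follows the sitemaps.org 0.9 format and lives at /sitemap.xml (a common location, not a guaranteed one; a site's robots.txt may name a different one with a Sitemap: line). It also follows sitemap index files, which list further sitemaps instead of pages. Gzipped sitemaps (.xml.gz) are not handled, and the function names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org 0.9 protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_bytes):
    """Return (page_urls, child_sitemap_urls) from a sitemap or a sitemap index."""
    root = ET.fromstring(xml_bytes)
    # <urlset><url><loc>...</loc></url></urlset> lists pages.
    pages = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]
    # <sitemapindex><sitemap><loc>...</loc></sitemap></sitemapindex> lists more sitemaps.
    children = [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS) if loc.text]
    return pages, children


def http_get(url):
    """Download raw bytes; assumption: the sitemap is plain (not gzipped) XML."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def all_sitemap_urls(sitemap_url, fetch=http_get):
    """Follow sitemap index files and collect every page URL they lead to."""
    seen, pending, pages = set(), [sitemap_url], []
    while pending:
        url = pending.pop()
        if url in seen:
            continue  # guard against an index that lists itself
        seen.add(url)
        found, children = parse_sitemap(fetch(url))
        pages.extend(found)
        pending.extend(children)
    return pages
```

Usage: all_sitemap_urls("https://example.com/sitemap.xml"), again with example.com as a placeholder. This is much lighter on the site than crawling, but it only lists what the owner chose to put in the sitemap.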

I will also try the SEOquake toolbar to see how it works. As for Scrapebox (which, from what I've read on the forum, is a complete solution pack), maybe later, since it is paid and my knowledge is still very limited.

You could use HTTrack to copy the entire site to your hard drive, or use the Scrapebox sitemap scraper.
Scrapebox does do this, but you'll burn through your proxies really fast scraping Google.

I'm doing this with free online sitemap generators for now, to avoid using up my proxies.

To find one, just google "free online sitemap generator". Most are limited to 500 pages, but some will let you do more.
SEOquake sorta sucks; SB all the way. But SEOquake is probably the only free alternative.
