
Entire Site Scraping

Discussion in 'Black Hat SEO' started by anuj291, Aug 7, 2014.

  1. anuj291

    Hi guys,

    I need your help scraping an entire site.

    For example, I want to scrape all the URLs of cnn.com.

    Can you tell me the footprint for that?

    If I just put site:cnn.com with no keywords, I only get about 1,000 results. Is it necessary to use keywords? Is there no way to just scrape everything off the site?
     
  2. Aty

  3. anuj291

    I am using GScraper.
     
  4. evansharb

    Have you looked at its sitemap?
     
    Last edited: Aug 7, 2014
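A minimal sketch of the sitemap approach, assuming the site publishes a standard sitemaps.org XML sitemap. Sites often declare the sitemap location with a `Sitemap:` line in robots.txt, so the sketch reads that first and then pulls every `<loc>` entry out of a `<urlset>` file. The helper names and the example.com URLs are illustrative assumptions, not part of any tool mentioned in this thread.

```python
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def fetch(url):
    """Download a URL and return its body as text (no retries, no proxies)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def sitemaps_from_robots(robots_txt):
    """Return sitemap URLs declared with 'Sitemap:' lines in robots.txt."""
    found = []
    for line in robots_txt.splitlines():
        # partition splits on the first colon only, so the URL's own colon survives.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            found.append(value.strip())
    return found


def urls_from_sitemap(xml_text):
    """Return every page URL listed in a <urlset> sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]
```

Usage, with a placeholder domain: `for sm in sitemaps_from_robots(fetch("https://example.com/robots.txt")): print(urls_from_sitemap(fetch(sm)))`.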
  5. anuj291

    I was thinking the same, but take a site that has pages and a blog: the sitemap will show domain/blog but won't list all the individual blog URLs. So I don't think the sitemap will capture everything.
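Larger sites often split their sitemap into a sitemap index whose `<sitemap><loc>` entries point at child sitemaps (frequently one per section, such as the blog), so post URLs missing from the top-level file may still be listed one level down. A minimal sketch that follows the index recursively, assuming standard sitemaps.org XML; the `fetch` argument is a hypothetical caller-supplied downloader (for example a `urllib` wrapper), and `max_files` is an arbitrary safety cap.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def crawl_sitemaps(start_url, fetch, max_files=1000):
    """Follow <sitemapindex> entries and collect page URLs from every <urlset>.

    fetch(url) -> str is supplied by the caller.
    max_files caps how many sitemap files are downloaded.
    """
    pending, seen, pages = [start_url], set(), []
    while pending and len(seen) < max_files:
        url = pending.pop()
        if url in seen:
            continue
        seen.add(url)
        root = ET.fromstring(fetch(url))
        if root.tag == NS + "sitemapindex":
            # Each <sitemap><loc> points at another sitemap file to visit.
            for loc in root.iter(NS + "loc"):
                if loc.text:
                    pending.append(loc.text.strip())
        elif root.tag == NS + "urlset":
            # Each <url><loc> is an actual page on the site.
            for loc in root.iter(NS + "loc"):
                if loc.text:
                    pages.append(loc.text.strip())
    return pages
```

If a site has no index and no per-section sitemaps, this returns only what the single file lists, and a link crawler is the fallback.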
     
  6. satyawrat

    You could use Screaming Frog SEO Spider, but the free version is limited to 500 URLs.
    Scrapebox would work with the "site:domain.com" footprint, but you would need proxies if you want to scrape a large number of URLs.
    GScraper could probably use the same footprint. I have never used GScraper myself, as Scrapebox has been a good enough solution for me.
     
  7. anuj291

    Otherwise, to scrape through keywords, I will need every keyword in the world!

    Can someone give me the holy grail?
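A common workaround for a per-query result cap (the roughly 1,000 results reported in the opening post) is to split one broad footprint into many narrower ones, each returning its own slice of the indexed pages, and merge the results afterwards. A minimal sketch that builds such a query list; the seed keywords and the `inurl:` section names are illustrative assumptions, and how many extra URLs each variant actually yields depends entirely on the search engine.

```python
from itertools import product


def build_footprints(domain, keywords, sections=()):
    """Return a deduplicated list of site: queries for one domain.

    Each keyword (and each optional URL section) narrows the query so the
    search engine returns a different, smaller slice of the indexed pages.
    """
    base = f"site:{domain}"
    queries = [base]
    queries += [f"{base} {kw}" for kw in keywords]
    queries += [f"{base} inurl:{sec}" for sec in sections]
    queries += [f"{base} inurl:{sec} {kw}" for sec, kw in product(sections, keywords)]
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys(queries))
```

For example, `build_footprints("cnn.com", ["election", "weather"], ["politics"])` yields six queries, starting with `site:cnn.com`. The list can then be fed to whatever tool runs the searches.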
     
  8. anuj291

    Screaming Frog or Xenu could be used.
    Are you saying to extract the sitemap?
    If I run it on just one site, it will crawl all the URLs to find broken links. Hmm, makes sense. I think that will work.

    So scraping is useless then?
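The spider approach that tools like Screaming Frog and Xenu take doesn't depend on search-engine results at all: start from the home page, follow every same-host link, and record each URL once. A minimal breadth-first sketch using only the Python standard library; `fetch` is a hypothetical caller-supplied downloader and `max_pages` an arbitrary limit. A real crawl should also honor robots.txt and throttle its requests, which this sketch does not do.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_site(start_url, fetch, max_pages=500):
    """Breadth-first crawl of one host; returns URLs in discovery order."""
    host = urlparse(start_url).netloc
    queue, seen, found = deque([start_url]), {start_url}, []
    while queue and len(found) < max_pages:
        url = queue.popleft()
        found.append(url)
        try:
            html = fetch(url)
        except Exception:
            continue  # any fetch failure (e.g. a 404): keep the URL, skip its links
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before comparing.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return found
```

Keeping broken URLs in the result mirrors how a link checker reports them rather than silently dropping them.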
     
  9. gary2

    Try the Scraper extension for Chrome. It worked for a friend of mine; maybe it will do the same for you.
     
  10. anuj291

    That won't work. I am looking at huge scrapes.

    I need a proper footprint that can get all the URLs, for GScraper.
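Whichever footprints end up being used, each query returns an overlapping slice of the site, so the exported URL lists have to be merged. A minimal sketch, assuming each scrape is exported as plain text with one URL per line, that normalizes URLs, removes duplicates, and keeps only the target domain and its subdomains. Treating `/world/` and `/world` as the same page is an assumption that holds on many sites but not all.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Lower-case scheme and host, drop the #fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def merge_scrapes(lines, domain):
    """Deduplicate URLs from several scrapes, keeping only the given domain."""
    merged = {}  # dict preserves first-seen order
    for line in lines:
        if not line.strip():
            continue
        url = normalize(line)
        host = urlsplit(url).netloc
        if host == domain or host.endswith("." + domain):
            merged.setdefault(url, None)
    return list(merged)
```

Several exported files can be combined by passing their lines together, for instance via `itertools.chain` over the open file handles.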