Entire Site Scraping

Discussion in 'Black Hat SEO' started by anuj291, Aug 7, 2014.

  1. anuj291

    anuj291 Elite Member

    Joined:
    Feb 1, 2009
    Messages:
    1,556
    Likes Received:
    316
    Hi Guys,

    Need your help in scraping an entire site...

    For example, I want to scrape all the URLs of cnn.com.

    Can you tell me the footprint for that?

    If I just put site:cnn.com with no keywords, I get only about 1,000 results. Is it necessary to use keywords? Is there no way to just scrape everything off the site?
     
  2. Aty

    Aty Jr. VIP

    Joined:
    Jan 27, 2011
    Messages:
    6,140
    Likes Received:
    4,182
    Scrapebox Link Extractor will do the work.
     
  3. anuj291

    anuj291 Elite Member

    I am using Gscraper...
     
  4. evansharb

    evansharb BANNED

    Joined:
    Jul 10, 2014
    Messages:
    65
    Likes Received:
    16
    Have you looked into its sitemap?
     
    Last edited: Aug 7, 2014
  5. anuj291

    anuj291 Elite Member

    I was thinking the same... But suppose a site has pages and a blog. The sitemap will show domain/blog, but it won't have all the individual blog URLs. So I don't think the sitemap will capture everything.
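
    On the sitemap worry: under the sitemaps.org protocol, a top-level sitemap can be an index that points to child sitemaps (one for the blog, for instance), and the page URLs live in those children. Whether a given site is set up this way is not known here, so treat the following as a rough sketch. The helper names and the `fetch` callback (any function that takes a URL and returns XML text) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text):
    """Return (kind, locs); kind is 'index' for a sitemap index, else 'urlset'."""
    root = ET.fromstring(xml_text)
    kind = "index" if root.tag == SITEMAP_NS + "sitemapindex" else "urlset"
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    return kind, locs


def collect_urls(start_url, fetch):
    """Follow sitemap index files down to page URLs.

    `fetch` is a hypothetical callback: fetch(url) -> XML text.
    """
    pages, queue, seen = [], [start_url], set()
    while queue:
        url = queue.pop()
        if url in seen:
            continue
        seen.add(url)
        kind, locs = parse_sitemap(fetch(url))
        if kind == "index":
            queue.extend(locs)   # child sitemaps still need to be fetched
        else:
            pages.extend(locs)   # actual page URLs
    return pages
```

    If the site has no sitemap index, or the sitemap really does omit the blog posts, this finds nothing extra and a crawl is the fallback.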
     
  6. Buzzika

    Buzzika Supreme Member

    Joined:
    Jul 8, 2009
    Messages:
    1,307
    Likes Received:
    1,598
    Gender:
    Male
    Occupation:
    Hustler
    Location:
    Gurgaon
    You could use the Screaming Frog SEO Spider, but the free version is limited to 500 URLs.
    Scrapebox would work using the "site:domain.com" footprint, but you would need proxies if you wish to scrape a large number of URLs.
    Gscraper could also use the same footprint, I think. I have never personally used Gscraper, as Scrapebox has been a good enough solution for me.
     
  7. anuj291

    anuj291 Elite Member

    Otherwise, to scrape by keyword... I would need every possible keyword in the world!

    Can someone give me the holy grail?
     
  8. anuj291

    anuj291 Elite Member

    Screaming Frog or Xenu could be used...
    Are you saying to extract the sitemap?
    If I point it at just one site, it will crawl all the URLs to find broken links... Hmm, makes sense. I think that will work.

    So scraping is useless then...
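
    What a link checker does when pointed at one site is essentially a same-domain crawl: fetch a page, pull out its links, and queue the ones on the same host. Here is a minimal sketch of that idea, not how Screaming Frog or Xenu actually work internally. `crawl`, `LinkParser`, `max_pages` and the `fetch` callback (URL in, HTML text or None out) are all illustrative names.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch, max_pages=1000):
    """Breadth-first crawl limited to the start URL's host.

    `fetch` is a hypothetical callback: fetch(url) -> HTML text or None.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        fetched += 1
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return sorted(seen)
```

    A real `fetch` would wrap an HTTP client and should add delays and respect robots.txt; that part is left out here on purpose.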
     
  9. gary2

    gary2 Jr. VIP

    Joined:
    Jan 20, 2013
    Messages:
    1,732
    Likes Received:
    237
    Occupation:
    Inbound Marketer. Blogger. Author.
    Location:
    Near River
    Try the Scraper extension for Chrome... It worked for a friend of mine, so maybe it would do the same for you.
     
  10. anuj291

    anuj291 Elite Member

    That won't work. I am looking at huge scrapes...

    I need a proper footprint for Gscraper that can get all the URLs...