Scraping Websites

Discussion in 'Black Hat SEO' started by agag2, Aug 8, 2013.

  1. agag2

    agag2 Supreme Member

    Hi

    If I wanted to scrape news websites and extract ALL the content from a site, how would I go about retrieving the URLs for all the content on the site in the absence of an XML file or sitemap?

    Thanks
     
  2. Emp1!

    Emp1! Junior Member

    The same way all scrapers do: by following links.
    You can use HTTrack. It's a one-minute install, and it takes about a minute to understand how it works (at least the basics); then you can download the website :)

    Have fun.
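To make "following links" concrete, here is a minimal Python sketch of a breadth-first crawler. It assumes the site exposes its pages through ordinary `<a href>` links and that decoding pages as UTF-8 is good enough; `crawl_site`, `fetch_html`, `LinkParser` and `max_pages` are hypothetical names chosen for illustration, and this is not how HTTrack works internally. The crawler stays on the start URL's host, strips `#fragments` so the same page is not queued twice, and stops after `max_pages` pages.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch_html(url):
    """Download a page. Assumption: decoding as UTF-8 is acceptable (real sites vary)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_site(start_url, fetch=fetch_html, max_pages=500):
    """Breadth-first crawl that only follows links on the start URL's host."""
    host = urlparse(start_url).netloc
    seen = {start_url}          # every URL ever queued, to avoid duplicates
    queue = deque([start_url])  # FIFO queue gives breadth-first order
    visited = []                # pages that were downloaded successfully
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _ = urldefrag(urljoin(url, href))
            # mailto:, javascript: and off-site links have a different netloc.
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

For example, `crawl_site("https://news.example.com/")` would return the page URLs it managed to download. Passing a different `fetch` function (one that adds delays, a custom User-Agent or caching) keeps the networking separate from the crawling logic.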
     
  3. bubbaranks

    bubbaranks Junior Member

    Use the ScrapeBox internal link scraper, or search with the dork site:targetsite.com
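An internal link scraper boils down to collecting every link on a page and keeping the ones that point to the same host. The sketch below is a generic Python version of that idea, not ScrapeBox's implementation; `split_links` and `HrefCollector` are hypothetical names, and it only sees plain `<a href>` links, not links that JavaScript builds at runtime.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class HrefCollector(HTMLParser):
    """Hypothetical helper: gathers raw href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def split_links(page_url, html):
    """Return (internal, external) absolute URLs found on one page, deduplicated and sorted."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    host = urlparse(page_url).netloc
    internal, external = set(), set()
    for href in collector.hrefs:
        # Make the link absolute and drop any "#fragment".
        absolute, _ = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # ignore mailto:, javascript:, tel: and similar
        (internal if parts.netloc == host else external).add(absolute)
    return sorted(internal), sorted(external)
```

Running it on each downloaded page and queueing the internal URLs you have not seen yet is how you would walk the whole site from a single starting page.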
     
  4. CanCan87

    CanCan87 Newbie

    The one and only beast, SB (ScrapeBox), will help you find the desired sites.
     
  5. starstrafe

    starstrafe Registered Member

    GSA SER has a "crawl online" option that crawls as many levels deep as you wish.

    Regards
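Crawling "as many levels as you wish" amounts to a depth-limited breadth-first search: the start page is level 0, the pages it links to are level 1, and so on. Here is a minimal Python sketch, assuming same-host HTML pages with ordinary `<a href>` links; `crawl_levels`, `fetch_html` and `max_depth` are hypothetical names, and this is not GSA SER's actual code.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class _Links(HTMLParser):
    """Hypothetical helper: collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.found.append(href)


def fetch_html(url):
    """Download a page. Assumption: UTF-8 decoding is good enough."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_levels(start_url, max_depth, fetch=fetch_html):
    """Map each same-host URL to the level at which it was first found."""
    host = urlparse(start_url).netloc
    depth = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if depth[url] >= max_depth:
            continue  # record this URL, but do not follow its links
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        parser = _Links()
        parser.feed(html)
        parser.close()
        for href in parser.found:
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in depth:
                # BFS order guarantees this is the shallowest level for the URL.
                depth[absolute] = depth[url] + 1
                queue.append(absolute)
    return depth
```

Pages on the last level are recorded but never fetched, which is what keeps the crawl bounded; raising `max_depth` by one can multiply the number of requests, so it is worth starting small.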