Scraping Websites

Discussion in 'Black Hat SEO' started by agag2, Aug 8, 2013.

  1. agag2

    agag2 Supreme Member

    Joined:
    Feb 17, 2009
    Messages:
    1,308
    Likes Received:
    254
    Hi

    If I wanted to scrape a news website and extract ALL content from it, how would I go about retrieving the URLs for all content on the site in the absence of an XML file or sitemap?

    Thanks
     
  2. Emp1!

    Emp1! Junior Member

    Joined:
    Dec 10, 2012
    Messages:
    147
    Likes Received:
    167
    Like all scrapers do: by following links.
    You can use HTTrack. It takes about a minute to install and another minute to understand how it works (at least for the basics), and then you can download the website :)

    Have Fun.
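
For anyone who wants to see what "following links" means in code, here is a minimal sketch using only the Python standard library. It is not how HTTrack works internally; it just pulls the same-host links out of one page's HTML. The names `LinkCollector` and `internal_links` and the `news.example` URLs are made up for illustration, and fetching the HTML is left to the caller.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(page_url, html):
    """Return the sorted, de-duplicated same-host URLs linked from html."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    found = set()
    for href in collector.hrefs:
        # Resolve relative links against the page URL and drop "#fragment".
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        # Keep only web links that stay on the same host (hypothetical rule;
        # a real crawl might also want to allow subdomains).
        if parts.scheme in ("http", "https") and parts.netloc == host:
            found.add(absolute)
    return sorted(found)


if __name__ == "__main__":
    sample = '<a href="/world/story-1">One</a> <a href="https://other.example/x">Ext</a>'
    print(internal_links("https://news.example/", sample))
```

Running this on every page you discover, and feeding the new URLs back in, is the basic loop a site crawler repeats until no unseen URLs remain.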
     
  3. bubbaranks

    bubbaranks Junior Member

    Joined:
    Jan 29, 2013
    Messages:
    187
    Likes Received:
    23
    Occupation:
    Living off big G
    Location:
    UK
    Use the ScrapeBox internal link scraper, or a search dork such as site:targetsite.com
     
  4. CanCan87

    CanCan87 Newbie

    Joined:
    Jun 8, 2013
    Messages:
    40
    Likes Received:
    5
    The one and only beast, SB (ScrapeBox), will help you find the desired sites.
     
  5. starstrafe

    starstrafe Registered Member

    Joined:
    Jan 27, 2013
    Messages:
    70
    Likes Received:
    52
    Location:
    spamfolder
    GSA SER has a "crawl online" option that crawls as many levels deep as you wish.

    Regards
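
To illustrate what crawling "as many levels as you wish" could look like, here is a rough breadth-first sketch with a depth limit. It is not GSA SER's actual implementation, just the general idea. `crawl_levels` and `get_links` are hypothetical names: `get_links` is whatever function you supply that returns the links found on a page, and the `site` dictionary below is a fake site used only for the demo.

```python
from collections import deque


def crawl_levels(start_url, get_links, max_depth):
    """Breadth-first crawl from start_url, at most max_depth link hops deep.

    get_links is a caller-supplied function (hypothetical here) that takes a
    URL and returns the URLs linked from that page.
    Returns a dict mapping each discovered URL to its depth.
    """
    depth_of = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        depth = depth_of[url]
        if depth >= max_depth:
            continue  # pages at the depth limit are recorded but not expanded
        for link in get_links(url):
            if link not in depth_of:  # skip URLs that were already seen
                depth_of[link] = depth + 1
                queue.append(link)
    return depth_of


if __name__ == "__main__":
    # Fake site for demonstration: page -> links found on that page.
    site = {
        "/": ["/news", "/sport"],
        "/news": ["/news/1", "/"],
        "/news/1": ["/news/2"],
    }
    print(crawl_levels("/", lambda u: site.get(u, []), max_depth=2))
```

The seen-URL dictionary is what stops the crawl from looping when pages link back to each other, and the depth check is the "levels" setting.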