
Is there a content scraper that does this?

Discussion in 'Black Hat SEO' started by Bostoncab, Jun 18, 2012.

  1. Bostoncab

    Bostoncab Elite Member

    Joined:
    Dec 31, 2009
    Messages:
    2,255
    Likes Received:
    514
    Occupation:
    pain in the ass cabbie
    Location:
    Boston,Ma.
    Home Page:
    I want a content scraper that I can feed a list of URLs scraped from Scrapebox. The scraper should go to each URL, extract the words, images, metatags, and other page elements, and output everything in one .html file.

    Any such thing?
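    A minimal sketch of what's being asked for, using only the Python standard library (no real tool is being described here; the function names, file names, and overall design are invented for illustration). It pulls each page from a URL list and collects its visible text, meta tags, and image sources into one combined HTML file:

```python
# Hypothetical sketch: fetch each URL from a Scrapebox export and dump
# the text, <meta> tags, and image sources into a single HTML file.
# All names here (PageExtractor, combine, combined.html) are made up.
from html.parser import HTMLParser
from urllib.request import urlopen

class PageExtractor(HTMLParser):
    """Collects visible text, meta tags, and image srcs from one page."""
    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.metas = []
        self.images = []
        self._skip = 0  # depth inside <script>/<style> blocks

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            self.metas.append(attrs)
        elif tag == "img" and "src" in attrs:
            self.images.append(attrs["src"])
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only non-empty text outside script/style.
        if not self._skip and data.strip():
            self.text_parts.append(data.strip())

def extract(html):
    p = PageExtractor()
    p.feed(html)
    return p

def combine(urls, out_file="combined.html"):
    chunks = ["<html><body>"]
    for url in urls:
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except OSError:
            continue  # skip dead or unreachable URLs
        page = extract(html)
        chunks.append("<p>%s</p>" % " ".join(page.text_parts))
        chunks.extend('<img src="%s">' % src for src in page.images)
    chunks.append("</body></html>")
    with open(out_file, "w", encoding="utf-8") as f:
        f.write("\n".join(chunks))
```

    Note this just concatenates whatever each page serves; it makes no attempt to tell boilerplate (menus, footers) apart from the article body, which is the hard part CloneX points out below.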
     
  2. CloneX

    CloneX Power Member

    Joined:
    Mar 31, 2012
    Messages:
    597
    Likes Received:
    228
    Not possible, since the HTML tags and content placement of each site are different and vary by platform as well. At best, a bot could be made that is platform-specific, say for WP, but it would still have very low accuracy due to the varied content placement.
     
  3. Bostoncab

    Bostoncab Elite Member

    My idea was to take the top 1,000 results for a keyword from Scrapebox and eliminate the duplicate URLs. Then the "bot" or scraper, whatever you want to call it, goes to each result, harvests all the content, and stuffs it all together on one page. I take that HTML page, copy its content into a WP post, upload all the files (images etc.) that the bot harvested to the WP site, and presto: I have one perfect keyword-targeted page of unique, non-spun content.

    No good?
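    The dedupe step in the workflow above is straightforward to sketch. Assuming a plain list of URLs from a Scrapebox export (the function name and normalization rules here are invented for illustration), duplicates can be dropped by comparing a normalized host and path so that `http://www.site.com/page/` and `https://site.com/page` count as the same page:

```python
# Hypothetical sketch: de-duplicate a raw URL list, treating URLs that
# differ only by scheme, "www.", or a trailing slash as the same page.
from urllib.parse import urlparse

def dedupe_urls(urls):
    seen = set()
    unique = []
    for url in urls:
        parts = urlparse(url.strip())
        host = parts.netloc.lower().removeprefix("www.")
        key = (host, parts.path.rstrip("/") or "/")
        if key not in seen:
            seen.add(key)
            unique.append(url.strip())  # keep first spelling seen
    return unique
```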
     
  4. CloneX

    CloneX Power Member

    Not to create any hype, but something similar is one of the modules of Licorne AIO. It doesn't allow custom URLs, but it scrapes content based on keywords from sources (via Google) and saves images, videos, articles, etc. in an HTML file, thus creating a complete site.

    However, what you describe is not possible, or at least not efficient, for the reasons mentioned in my previous post.

    It would be best if the harvested URLs were all on a single platform or a single site. Then it could be done with even uBot.