Scraping pages from an auto-approve domain

Discussion in 'Black Hat SEO' started by cloakndagger, Feb 23, 2011.

  1. cloakndagger

    cloakndagger Power Member

    Joined:
    Oct 31, 2010
    Messages:
    613
    Likes Received:
    173
    Ok, I've been spamming, er sorry, posting a lot of constructive comments and have just started checking links straight away to find auto-approve links.
    Anyway, I have a few, we're talking 100 or so, but I know other pages/posts are auto-approved judging by the 1500 comments lol.
    How do I get all the pages from the URL using ScrapeBox?
    cheers
    CnC
     
  2. movieman32

    movieman32 Regular Member

    Joined:
    Aug 6, 2008
    Messages:
    371
    Likes Received:
    346
    There are 2 easy ways. Your choice.

    1. Load all your auto-approve URLs into the harvester and trim to root. Remove all duplicate URLs.
    2. Save the list as a text file.

    Now comes your choice.
    1. In the footprint search box, type in site:
    2. Import your list of sites (trimmed to root) into the keyword box.

    Or
    1. Open your trimmed-to-root list and do a find and replace all:
    find http:// and replace with site:http://
    then save the list.
    2. Import the list into the keyword search box.

    Start harvesting.
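
    If you'd rather do the trim-to-root and find-and-replace prep outside of ScrapeBox, here's a rough Python sketch of the same steps (the file names are just examples, use whatever you've got):

    Code:
    from urllib.parse import urlparse

    # Read the raw auto-approve URL list (example file name).
    with open("autoapprove_urls.txt") as f:
        urls = [line.strip() for line in f if line.strip()]

    # "Trim to root": keep only scheme + domain, and dedupe as we go.
    roots = set()
    for url in urls:
        parsed = urlparse(url)
        if parsed.netloc:
            roots.add(f"{parsed.scheme}://{parsed.netloc}/")

    # Write one site: footprint per line, ready for the keyword box.
    with open("site_footprints.txt", "w") as f:
        for root in sorted(roots):
            f.write("site:" + root + "\n")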

    You are going to wind up with loads of tag pages, PDFs, and archive pages.

    What I do is open the harvested list in Excel or OpenOffice and do another find and replace:

    find *tag* and leave the replace field blank - be sure to type in the asterisks, or Excel will only replace the word "tag" with a blank space (you want to delete the whole URL).
    Replace all - this will eliminate tons of pages you can't use for commenting.

    Here are a few more common deletions I use
    *rss*
    *feed*
    *archive*
    *xml*
    *pdf*
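
    If you'd rather not do the Excel pass by hand, the same clean-up can be scripted - a quick Python sketch along these lines (the substrings and file names are just examples, add or drop patterns as you like):

    Code:
    # Drop any harvested URL containing one of the junk substrings.
    # Like the *tag* wildcard in Excel, this matches the string anywhere
    # in the URL, so it can catch the odd false positive (e.g. "vintage").
    JUNK = ("tag", "rss", "feed", "archive", "xml", "pdf")

    with open("harvested_urls.txt") as f:
        urls = [line.strip() for line in f if line.strip()]

    kept = [u for u in urls if not any(j in u.lower() for j in JUNK)]

    with open("comment_targets.txt", "w") as f:
        f.write("\n".join(kept) + "\n")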
     
    • Thanks x 1
    Last edited: Feb 23, 2011
  3. cloakndagger

    cloakndagger Power Member

    Joined:
    Oct 31, 2010
    Messages:
    613
    Likes Received:
    173
    I know I've thanked you, but I'll give you a written thanks as well for such clear methods of doing it.
    Once I've got around 1k domains I'll do the above and fire the list up here.
    Thanks again movieman32 you're a star :)