Scrapebox: How to scrape internal blog posts?

Discussion in 'Black Hat SEO' started by SalieriJazz, Mar 13, 2014.

  1. SalieriJazz

    SalieriJazz Regular Member

    Joined: Jun 29, 2010 | Messages: 285 | Likes Received: 44 | Location: Oups, I forgot

    Well, I did a harvest and made a list of WordPress blogs with DA 20+, and now I want to use the Manual Poster. The problem is that I only have the root domains, not the blog post links. How can I scrape the links so that I get each blog post URL from every domain on my list?

    I tried the addon called Link Extractor, but from my 1,500-URL list it produced 58,000 links, lol... It extracted everything: categories, even image links, every kind of internal link. I'm only looking for the blog post links.

    Thanks a lot guys, cheers!
     
  2. Ventio

    Ventio Regular Member

    Joined: Nov 22, 2013 | Messages: 418 | Likes Received: 136 | Occupation: Content Writer | Location: Ya aunties crib

    I'm not too sure, but I think you can load the sites into the keywords box and use the "site:" operator, then use the Blog Analyzer to see which ones you can comment on.
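    If you go the "site:" route, you first need one query per domain. A minimal sketch, assuming your harvested list is a plain list of root URLs (the function name site_queries and the example domains are hypothetical): it normalizes each entry to a bare host, drops a leading "www.", removes duplicates, and outputs "site:host" lines you could paste in as keywords.

```python
from urllib.parse import urlparse


def site_queries(root_urls):
    """Build one "site:" query per unique host from a list of root URLs (hypothetical helper)."""
    seen = set()
    queries = []
    for url in root_urls:
        url = url.strip()
        # urlparse only fills .netloc when a scheme is present, so add one if it is missing
        if "://" not in url:
            url = "http://" + url
        host = urlparse(url).netloc.lower()
        # Treat www.example.com and example.com as the same site
        if host.startswith("www."):
            host = host[4:]
        if host and host not in seen:
            seen.add(host)
            queries.append(f"site:{host}")
    return queries


if __name__ == "__main__":
    domains = ["http://www.example-blog.com/", "example-blog.com", "https://another-blog.net"]
    for query in site_queries(domains):
        print(query)
```

    The www-stripping is a design choice: without it, the same blog could be queried twice under two hostnames.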
     
  3. divok

    divok Senior Member

    Joined: Jul 21, 2010 | Messages: 1,015 | Likes Received: 634 | Location: http://twitter.com/divok

    If those blogs follow a certain URL pattern, you could use a regular expression to filter your posts.
    Have you removed duplicates?
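    As a rough illustration of both steps (deduplicate, then keep only URLs matching a post pattern), here is a sketch. It assumes the blogs use WordPress-style "month and name" permalinks like /2014/03/post-name/; the pattern, the function name filter_posts and the example URLs are all hypothetical, so adjust the regex to whatever format your blogs actually use.

```python
import re

# Hypothetical pattern: assumes permalinks of the form http://host/YYYY/MM/post-name/
# Category, tag, and image URLs do not have /YYYY/MM/ directly after the host, so they fail.
POST_PATTERN = re.compile(r"^https?://[^/]+/\d{4}/\d{2}/[^/?#]+/?$")


def filter_posts(urls, pattern=POST_PATTERN):
    """Remove duplicate URLs (keeping first-seen order) and keep only those matching pattern."""
    seen = set()
    posts = []
    for url in urls:
        url = url.strip()
        if url in seen:
            continue  # duplicate, already handled
        seen.add(url)
        if pattern.match(url):
            posts.append(url)
    return posts


if __name__ == "__main__":
    links = [
        "http://example.com/2014/03/my-first-post/",
        "http://example.com/category/news/",
        "http://example.com/wp-content/uploads/2014/03/photo.jpg",
        "http://example.com/2014/03/my-first-post/",
        "http://example.com/2013/12/older-post/",
    ]
    for url in filter_posts(links):
        print(url)
```

    Anchoring the year right after the host is what keeps image URLs out: uploaded files often contain /2014/03/ too, but only after /wp-content/uploads/.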
     
  4. SalieriJazz

    SalieriJazz Regular Member

    Joined: Jun 29, 2010 | Messages: 285 | Likes Received: 44 | Location: Oups, I forgot

    Yes, I have.

    "you could use a regular expression to filter your posts."

    How can I do this?
     
  5. Sweetfunny

    Sweetfunny Jr. VIP Premium Member

    Joined: Jul 13, 2008 | Messages: 1,747 | Likes Received: 5,039 | Location: ScrapeBox v2.0

    In the Link Extractor addon, click the "Settings" button. There are filters for "Remove urls containing" and "Remove urls not containing", so if you don't want URLs from tag pages etc., simply filter them out by adding something like this to "Remove urls containing":

    /tag/
    /category/
    .jpg
    .png
    .bmp


    Now you won't have that issue. You could also use the "Remove urls not containing" filter and add something like /2014/ so it only keeps posts published this year, provided the blog uses the year in its post URL format.
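    To see how the two filters combine, here is a sketch that reproduces them as plain substring checks outside ScrapeBox. It is not ScrapeBox's code; the function name apply_filters is hypothetical, and it assumes case-sensitive matching and that a URL passes "Remove urls not containing" if it contains any one of the listed fragments. The thread does not confirm either behavior.

```python
def apply_filters(urls, remove_containing=(), keep_only_containing=()):
    """Sketch of the two Link Extractor filters as plain substring checks.

    Assumptions (not confirmed for ScrapeBox): matching is case-sensitive, and a
    URL passes the "not containing" filter if it contains ANY listed fragment.
    """
    kept = []
    for url in urls:
        # "Remove urls containing": drop the URL if any blocked fragment occurs in it
        if any(fragment in url for fragment in remove_containing):
            continue
        # "Remove urls not containing": if fragments are given, keep only URLs with one of them
        if keep_only_containing and not any(fragment in url for fragment in keep_only_containing):
            continue
        kept.append(url)
    return kept


if __name__ == "__main__":
    links = [
        "http://example.com/2014/03/new-post/",
        "http://example.com/tag/seo/",
        "http://example.com/category/news/",
        "http://example.com/wp-content/uploads/2014/03/logo.png",
        "http://example.com/2013/11/old-post/",
    ]
    posts = apply_filters(
        links,
        remove_containing=["/tag/", "/category/", ".jpg", ".png", ".bmp"],
        keep_only_containing=["/2014/"],
    )
    print(posts)  # ['http://example.com/2014/03/new-post/']
```

    Note that the /2014/ filter alone would not be enough: WordPress upload URLs often contain the year as well, so the image-extension entries are still doing real work.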
