
[TUT] How to easily build HUGE sites lists for GSA SER

Discussion in 'Black Hat SEO Tools' started by Hinkys, Sep 13, 2013.

  1. Hinkys

    Hinkys Jr. VIP Jr. VIP

    Joined:
    Mar 3, 2012
    Messages:
    673
    Likes Received:
    547
    Location:
    Croatia
    Hey guys, this is something I started doing a while back and wanted to share with you because it's THAT good.

    Basically, with this trick you will increase your site lists EXPONENTIALLY.

    I'll show you how to use a seed list of as little as 100 AA blogs to create a list of 100,000 sites (after deduping both URLs and domains) that you can import into GSA SER, with a generous percentage of them ending up as successful and verified. This process shouldn't take more than a few hours.

    This list should then be imported into a dedicated project in GSA (I'll explain how to set that up later in the tutorial).

    Note that you will need Scrapebox for this.

    Last time I did this (a few days ago) I started with 2000 AA blogs taken from one of my projects and ended up with 2mil+ DIFFERENT DOMAINS to import into SER. This resulted in 30k verified links. And that's with an out-of-the-box Captcha Breaker (see why I prefer CB to CS) & highly spun content. If I were to use more readable content and a more optimized version of CB, the submitted / verified numbers would be MUCH higher.

    The idea behind this is the following:
    If you managed to post a comment to a particular blog using GSA SER, there's a good chance that a lot of other GSA SER users have managed to post there as well. This means that there are a lot of other people building their Tier 2 / 3 in those same blogs.

    By scraping all internal links on those blogs, we get a huge list of blog posts that have been violated by SER.

    Then, by extracting all external links from that huge list of blog posts, we get a massive list of potential targets for SER.


    Detailed Version

    Step 1 - Getting a seed list

    First you need a list to start with. When you're doing this for the first time, you can go to your verified folder and copy all URLs from blog comments. Don't worry if you get a very low number of URLs this way (I got around 500), you'll see it's still just as effective.

    For every subsequent run, you can just use the list of verified blog comments you created the last time you did this whole process. Other than that, you can simply harvest a list of blogs and check which ones are auto-approve by posting to them with SB / SER.

    Once you have a list you can use, import it into SB, trim to root, dedup URLs, save the list as "Step 1 - seed list" and proceed to step 2.
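    If you'd rather prep the seed list outside Scrapebox, the trim-to-root + dedup steps are easy to script. A rough Python sketch of what SB's built-in buttons do (function name and sample URLs are just for illustration):

```python
from urllib.parse import urlsplit

def trim_to_root(urls):
    """Reduce each URL to scheme://host/ and drop duplicates, keeping order."""
    seen, roots = set(), []
    for url in urls:
        parts = urlsplit(url.strip())
        if not parts.netloc:
            continue  # skip lines that aren't URLs
        root = f"{parts.scheme}://{parts.netloc}/"
        if root not in seen:
            seen.add(root)
            roots.append(root)
    return roots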

    Step 2 - Extracting internal links

    Fire up the "Link Extractor" plugin in Scrapebox and load the "Step 1 - seed list" file. Set the mode to "internal", use as much connections as your box can handle and start.

    After it's done, import that list into Scrapebox and close the link extractor. Scroll through the list and see if there are many comment links (usually ending with /#comment), category links, tag links, etc. Use "Remove URLs containing" to get rid of as many of those as possible. Ideally you want a list consisting of nothing but blog posts.

    This filtering isn't strictly necessary, but depending on the size of your initial list, the next step could take considerably longer if you skip it.

    After you're finished, save the list and name it "Step 2 - all internal links".
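    If you want to automate that "Remove URLs containing" pass, here's a rough Python sketch; the junk markers are just examples, tweak them to whatever your list actually contains:

```python
# Common junk patterns on WordPress-style blogs (illustrative, not exhaustive).
JUNK_MARKERS = ("#comment", "/tag/", "/category/", "/author/", "/page/", "?replytocom")

def filter_posts(urls, markers=JUNK_MARKERS):
    """Keep only URLs that contain none of the junk markers."""
    return [u for u in urls if not any(m in u.lower() for m in markers)]
```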

    OPTIONALLY
    Set up a dedicated project in SER and feed it those lists to filter out the junk from the actual blog posts.
    As a bonus, this is also a good way to build your AA list.

    Step 3 - Extracting external links

    Before starting, you should split the "Step 2 - all internal links" file into smaller ones, no more than 10k URLs per file, and then process them one at a time. The reason is that Step 3 usually produces up to 200x more URLs than the number of links you start with. (For example, I usually use batches of 5k links, which result in lists of 400-900k de-duped URLs.)

    If you use large lists in this step, you'll end up with a couple million URLs, and in my experience Scrapebox doesn't handle more than 1mil URLs all that well.

    You can use the "Dup Remover" Scrapebox plugin for splitting the files.
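    If you'd rather split the files with a script, something like this does the 10k-per-file batching; the filename scheme is made up for illustration:

```python
def chunk(lines, size=10_000):
    """Split a list of lines into consecutive batches of at most `size`."""
    return [lines[i:i + size] for i in range(0, len(lines), size)]

def split_file(src, size=10_000):
    """Write `src` out as src.part1.txt, src.part2.txt, ... (names illustrative)."""
    with open(src, encoding="utf-8") as f:
        lines = [line.strip() for line in f if line.strip()]
    for n, batch in enumerate(chunk(lines, size), start=1):
        with open(f"{src}.part{n}.txt", "w", encoding="utf-8") as out:
            out.write("\n".join(batch))
```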

    So open the link extractor again and load the first batch of the "Step 2 - all internal links" file. Set the mode to "external" and hit start. Go make yourself a coffee, and once the link extractor has finished, transfer the list to Scrapebox, dedup if needed and save the file as "Step 3 - Needs sorting".

    Repeat the process until you have gone through all the smaller batches of the "Step 2 - all internal links" file.
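    For the curious, the "external" mode boils down to grabbing every <a href> on a page and keeping only the links that point to a different domain. A stdlib-only Python sketch of the idea (page fetching omitted; it just parses HTML you already have, and the names are illustrative):

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

class LinkCollector(HTMLParser):
    """Collect every href from <a> tags in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def external_links(page_html, page_url):
    """Return absolute links that point to a different domain than page_url."""
    host = urlsplit(page_url).netloc
    parser = LinkCollector()
    parser.feed(page_html)
    return [link for link in parser.links
            if urlsplit(link).netloc and urlsplit(link).netloc != host]
```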

    If you want to speed this up, you can run multiple instances of the link extractor simultaneously, as long as your box can handle it (though I wouldn't recommend more than 2-3 per SB instance since it will most likely crash).

    Step 4 - Sorting the list

    Now you should have some quite large lists of sites that need to be sorted somehow. Luckily, GSA SER will do this automatically without too much hassle.

    Setup a new project with the following settings:
    [screenshots of the project settings]

    Note that this setting will filter out all unindexed sites. If you don't care about PR and just want as many links as possible, untick all PR filters; that should skip PR checking entirely, probably making your project faster.

    Also make sure to untick all search engines & site lists; you want this project to ONLY post to the target URL lists you import.

    Double check that you're only using Captcha Sniper / Captcha Breaker for this project, as anything else will very quickly deplete your balance.

    Now just generate some relevant spun content (the type you would be using in all your other projects) and your new "Sitecheck" project is good to go.


    Once you've got it set up, import "Step 3 - Needs sorting" as target URLs for that project. (I don't know GSA SER's limits, but I split everything above 1mil URLs into smaller files and then let GSA go through the files one at a time.)

    This project will now go through the list and fill up your identified / successful / verified lists!

    Double check that you're saving Identified, Submitted & Verified sites in SER.

    Now wait till it's finished, grab a list of verified blog comments / image comments you just created and repeat the process!

    Obviously GSA SER won't be able to post to all of these sites, but you'll be surprised at how many it will!

    Short Version

    1. Get a list of verified blog comments from GSA
    2. Extract all internal links
    3. Extract all external links (from all internal)
    4. Import the list to a project in GSA and let it sort it for you
    5. Repeat from 1. with the verified links you just created

    Example

    If you want to see how good these lists are, I just shared a part of the recent run:
    http://www.blackhatworld.com/blackh...ch-engine-optimization-tools.html#post6041304

    It's 4k PR?-7 AA verified links sorted by type and engine. Also, if you just want to throw dem links right into SER, there's a raw URL .txt export in there too.

    And this is barely 20% of the whole list!

    Problems:

    Ideally you would keep a database of all sites you've already imported to GSA SER (a master list) and, every time you complete Step 3, filter your new list against that master list of all sites already found.
    While GSA SER will handle this when sorting and display "already parsed" for all sites you imported before (assuming you're using just 1 sitecheck project), it still takes MUCH LONGER for GSA SER to process the imported lists than it takes you to compile them with this process. And that's at 200-300 LPM. :eek:

    Right now it takes me a couple of hours to do this process and then around 5 days for GSA to process all found sites.

    I tried keeping a deduped list of ALL sites found this way and then using the "Remove URLs containing entries from..." option in Scrapebox every time I get a new list, but with little luck. Even after a week of doing this, the master list got so large that SB crashes every time I try it.

    :(

    Any tips on how to handle this problem would be quite welcome. :D
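    One idea, sketched in Python: instead of comparing full URL lists in SB, keep the master list as a set of domains and stream each new list through it. Domains are far fewer than URLs, so even a multi-million-URL history fits in memory fine. (The function and names here are illustrative, not an actual SB/SER feature.)

```python
from urllib.parse import urlsplit

def new_urls_only(urls, master_domains):
    """Yield URLs whose domain isn't already in the master set,
    adding each newly seen domain as we go."""
    for url in urls:
        domain = urlsplit(url.strip()).netloc.lower()
        if domain and domain not in master_domains:
            master_domains.add(domain)
            yield url
```

Persist `master_domains` to a plain text file between runs and reload it into a set; that sidesteps loading the full URL master list into Scrapebox at all.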
     

  2. hadoken

    hadoken Regular Member
  3. royserpa

    royserpa Jr. VIP Premium Member
    This is what I've been searching for, OP.
    Problem is I don't wanna buy SB; do you think I could get away with only using GSA SER, man?
     
  4. hadoken

    hadoken Regular Member
    You would not be able to do the link extraction, which is what makes the list even larger.
     
  5. Winternacht

    Winternacht Junior Member
    You could try the option Import URL List --> Select URL list to compare, but on lists over 1 million URLs Scrapebox tends not to perform well.
     
  6. Riders On The Storm

    Riders On The Storm Senior Member
    cool twisty share
     
  7. Bokva

    Bokva Regular Member
    Thanks, Hinkys.
    Very nice and detailed tut.
     
  8. capricious

    capricious Junior Member
    I really appreciate your effort. I have both SB and GSA SER, so this tutorial is ideal for me. Thanks a lot! :)
     
  9. nanavlad

    nanavlad Jr. VIP Premium Member
  10. azura

    azura Newbie
    Nice and detailed tutorial, the possibilities are endless.

    But I want to ask: how effective is this spamming method now?
     
  11. anteros

    anteros Newbie
    Well, Sven (from GSA) already implemented this in SER in the latest update, but scraping with Scrapebox is still faster.
     
  12. 7thAmigo

    7thAmigo Jr. VIP
    You can also buy a list of thousands of verified GSA links from Fiverr.
     
  13. MMOTrading

    MMOTrading Registered Member
    Thanks for sharing :)
     
  14. Hinkys

    Hinkys Jr. VIP
    Well, as anteros said, GSA SER now has a similar feature, only you do it directly from your verified links list. You can do it like that without a problem, but I myself prefer to keep all scraping in Scrapebox and use SER for posting and nothing else.

    Yeah, indeed he did. Even so, when you use SER for posting alone and do all your scraping with SB, SER instantly gets 10x faster. If you want a pimped-out, optimized setup, you have to make sure that SER is using all of its threads for posting and nothing else. :)

    For all else, I hope this really helps.
    If you have any ideas to expand / improve upon this, please share with us here!
     
  15. EnzBots

    EnzBots Junior Member
    thanks for the share! I hope I can use this to make huge money!
     
  16. eagle-flux

    eagle-flux Regular Member
    Can you make a version of the tutorial using only GSA SER? I can't afford to buy Scrapebox right now.
     
  17. Hinkys

    Hinkys Jr. VIP
    This exact process can't be done with GSA SER alone; you need Scrapebox mainly for extracting internal and external links, which is something SER can't do. There's a new feature that lets you post to external links from the sites in your verified list, but it's not nearly as effective as this method.
     
  18. nanavlad

    nanavlad Jr. VIP Premium Member
  19. davbel

    davbel Newbie
    Great tut :D
     
  20. DBMEDIALLC

    DBMEDIALLC Junior Member
    My problem with scraping is just having the public proxies to keep the damn thing going. I use Proxygo's service, which is really reliable, but you only get about 2 hours of solid scraping time with each list. I might just load it up with private ones and run the connections at 10% or something.