
How to keep GSA SER Lists Clean And Increase Your LPM

Discussion in 'Black Hat SEO' started by TrevorB, Oct 6, 2013.

  1. TrevorB

    TrevorB Jr. VIP Jr. VIP Premium Member

    Joined:
    Dec 21, 2011
    Messages:
    1,185
    Likes Received:
    361
    Location:
    Canada
    I'm an avid user of GSA SER and totally love the software that Sven
    has made. As you know, as your lists build up, so does all the garbage
    in them. With this post, I hope to tell you how to keep your lists clean
    as a whistle.


    If you're having low LPM problems, this may fix everything for you.


    First off, I have to stress this: it takes a lot of work and time to complete,
    so if you are one of those users who doesn't like a lot of work, or who doesn't
    mind getting all those download failed messages in SER, you can stop reading
    now. Thank you.


    1) The first thing you want to do is create a project for each engine type that
    SER has. (I told you this is going to be a lot of work.)


    The reason for this is that SER sometimes mis-identifies an engine if you have
    multiple engines selected that share the same footprints on the page. A
    great example of this is the General Blogs engine and a few Article engines.
    I've had many sites get identified as General Blogs when they should have been
    identified as Zendesk, for example.


    Make sure that you do not put any limits on these projects, such as pausing
    after X verified submissions or something similar.


    2) Once you have all the projects created, select them all, then Right Click >
    Modify Project > Move To Group > New. Name this group (Sorting Project).
    Now all those projects will be in a nice group for you.


    3) Once that's completed, open up Scrapebox and open the Dup Remove
    add-on. What you want to do now is merge all the site lists together. Merge
    all your Identified lists and save them as GSA Identified Site Lists. Do this for
    all your site list folders. With that same tool, remove all the duplicate URLs as well.


    If you have huge site lists, with over 1 million URLs in each of the files you
    just created, simply split them up into chunks of around 600,000 URLs.
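
    If you'd rather script this step than use the Scrapebox add-on, here is a minimal
    Python sketch of the same merge / dedupe / split idea. The folder path and output
    names below are placeholders, not SER's real paths; point them at your own site
    list folders (Options > Advanced shows where they live).

    # Merge every site list file in a folder, drop duplicate URLs, and write
    # the result out in chunks of roughly 600,000 URLs.
    import glob, os

    SITE_LIST_DIR = r"C:\GSA\identified"          # placeholder folder
    OUTPUT_PREFIX = "GSA Identified Site Lists"   # naming from step 3
    CHUNK_SIZE = 600_000

    urls = set()
    for path in glob.glob(os.path.join(SITE_LIST_DIR, "*.txt")):
        with open(path, encoding="utf-8", errors="ignore") as f:
            for line in f:
                line = line.strip()
                if line:
                    urls.add(line)

    urls = sorted(urls)
    for i in range(0, len(urls), CHUNK_SIZE):
        name = f"{OUTPUT_PREFIX} {i // CHUNK_SIZE + 1}.txt"
        with open(name, "w", encoding="utf-8") as out:
            out.write("\n".join(urls[i:i + CHUNK_SIZE]) + "\n")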


    4) Once you have your files created, import each one into Scrapebox and
    remove all unwanted extensions, e.g. .gif, .pdf, .jpg, etc. Then export the
    cleaned URL list under the same file name you just cleaned, and Scrapebox
    will overwrite it with the cleaned list.
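
    The same extension clean-up can also be done outside Scrapebox. This is only a
    rough sketch; the extension list just repeats the examples above (add more as
    needed), and the file name is a placeholder.

    # Drop URLs whose path ends in an unwanted file extension, then overwrite
    # the list in place, the same way Scrapebox would.
    from urllib.parse import urlparse

    BAD_EXTENSIONS = (".gif", ".pdf", ".jpg")      # extend to taste
    LIST_FILE = "GSA Identified Site Lists 1.txt"  # placeholder name

    def keep(url):
        return not urlparse(url).path.lower().endswith(BAD_EXTENSIONS)

    with open(LIST_FILE, encoding="utf-8", errors="ignore") as f:
        cleaned = [u.strip() for u in f if u.strip() and keep(u.strip())]

    with open(LIST_FILE, "w", encoding="utf-8") as f:
        f.write("\n".join(cleaned) + "\n")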


    To take this one step further: trim each site to its root domain, remove duplicate
    domains, and save that as the same file name + Domains Only. This helps because
    if the URL to an article, for example, no longer exists, it will return a 404 error
    message; with the domain only, SER should still be able to identify the site.
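
    Scrapebox's trim-to-root does this from the GUI; if you prefer a script, a small
    sketch of the domains-only step could look like the following (file names are
    placeholders, following the "+ Domains Only" naming above).

    # Reduce every URL to scheme://host/ and drop duplicate domains.
    from urllib.parse import urlparse

    SRC = "GSA Identified Site Lists 1.txt"               # placeholder
    DST = "GSA Identified Site Lists 1 Domains Only.txt"  # placeholder

    domains = set()
    with open(SRC, encoding="utf-8", errors="ignore") as f:
        for line in f:
            parts = urlparse(line.strip())
            if parts.scheme and parts.netloc:
                domains.add(f"{parts.scheme}://{parts.netloc}/")

    with open(DST, "w", encoding="utf-8") as f:
        f.write("\n".join(sorted(domains)) + "\n")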


    5) Once you have all your new clean lists created, DELETE all of GSA SER's site
    list files. This is very important to do; if you don't, it just defeats the purpose
    of doing all this.


    6) Now import each of the files into each of the projects you created and let
    SER go to work.


    I do this every couple of months when I do my full clean up of SER and this
    has helped immensely.


    I hope this is easy to understand and helps you out.


    If you have any questions, simply post them, and I will try my best to help.
     
    • Thanks x 9
    Last edited: Oct 6, 2013
  2. TrevorB

    TrevorB Jr. VIP Jr. VIP Premium Member

    Joined:
    Dec 21, 2011
    Messages:
    1,185
    Likes Received:
    361
    Location:
    Canada
    221 views and no comments. Guess not too many people are using SER here.

    Well I tried. lol
     
  3. nanavlad

    nanavlad Jr. VIP Jr. VIP Premium Member

    Joined:
    Dec 2, 2009
    Messages:
    2,420
    Likes Received:
    892
    Gender:
    Male
    Occupation:
    SEO Consultant
    Location:
    Proxy Central
  4. TrevorB

    TrevorB Jr. VIP Jr. VIP Premium Member

    Joined:
    Dec 21, 2011
    Messages:
    1,185
    Likes Received:
    361
    Location:
    Canada
    I think there already is a BST for this, is there not?

    Maybe I should write an e-book on how to get the most out of GSA SER.
     
  5. hadoken

    hadoken Regular Member

    Joined:
    Dec 4, 2012
    Messages:
    300
    Likes Received:
    529
    Location:
    Toronto
    Nice method. A clean list is so much more effective. Half the time, GSA or other software is just wasting its time on bad URLs in the list.
    +rep
     
  6. HelloInsomnia

    HelloInsomnia Jr. Executive VIP Jr. VIP Premium Member

    Joined:
    Mar 1, 2009
    Messages:
    1,817
    Likes Received:
    2,913
    Maybe I'm missing something here, but you can delete duplicate domains and URLs from within GSA using Options -> Advanced -> Tools.

    Then you can open up the verified folder, choose each individual file, and open it in Scrapebox if you wish to remove further things. A much quicker way would be to find and replace in Notepad++ (Find in Files, replacing with a regular expression to do all files at once).
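
    For anyone who wants the "all files at once" part without Notepad++, a rough
    Python equivalent (the folder path and regex are just examples, not the exact
    pattern described above) could be:

    # Apply one regular expression to every file in the verified folder in a
    # single pass - here dropping lines that end in image/PDF extensions.
    import glob, re

    pattern = re.compile(r"\.(?:gif|jpe?g|png|pdf)$", re.IGNORECASE)

    for path in glob.glob(r"C:\GSA\verified\*.txt"):   # placeholder folder
        with open(path, encoding="utf-8", errors="ignore") as f:
            lines = f.readlines()
        kept = [l for l in lines if not pattern.search(l.strip())]
        with open(path, "w", encoding="utf-8") as f:
            f.writelines(kept)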
     
  7. TrevorB

    TrevorB Jr. VIP Jr. VIP Premium Member

    Joined:
    Dec 21, 2011
    Messages:
    1,185
    Likes Received:
    361
    Location:
    Canada
    Yes, you can do all that in SER, but it will not get rid of all the dead links. Take your article links, for instance: when SER verifies them, or even first finds them, the full URL to that article is saved. If and when that article gets deleted, the site will in most cases give a 404 error message; when SER gets these, it can no longer match that site. Now you're left with a dead link in your site lists.

    My process gets all the garbage out and only keeps working URLs. Having a site list with thousands of 100% working URLs speeds SER up tenfold.
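
    Not part of the process above, but if you want a rough feel for how much dead weight a list is carrying before you re-import it, a quick, entirely optional sketch like this can HEAD-check a random sample and count the 404s SER would otherwise waste time on. The file name and sample size are arbitrary.

    # Sample a list and count URLs that look dead (404s, timeouts, DNS failures).
    import random
    import urllib.error
    import urllib.request

    def looks_dead(url, timeout=10):
        req = urllib.request.Request(url, method="HEAD",
                                     headers={"User-Agent": "Mozilla/5.0"})
        try:
            urllib.request.urlopen(req, timeout=timeout)
            return False
        except urllib.error.HTTPError as e:
            return e.code == 404
        except Exception:
            return True  # treat timeouts/DNS errors as dead for this rough check

    with open("GSA Identified Site Lists 1.txt", encoding="utf-8",
              errors="ignore") as f:
        urls = [u.strip() for u in f if u.strip()]

    sample = random.sample(urls, min(200, len(urls)))
    dead = sum(looks_dead(u) for u in sample)
    print(f"{dead}/{len(sample)} sampled URLs look dead")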
     
    • Thanks x 1
  8. davids355

    davids355 Jr. VIP Jr. VIP Premium Member

    Joined:
    Apr 25, 2011
    Messages:
    8,790
    Likes Received:
    6,328
    Home Page:
    Very nice guide, it's just what I was looking for.

    When you do this, how does it affect existing projects? I.e., if you have projects that have been running for some time using your lists, and you then remove and re-import all your lists with the above method, does it not slow down current projects, since they would effectively be going through the new lists from scratch and would come across tons of "already submitted" links?
     
  9. shuttershades

    shuttershades Registered Member

    Joined:
    Oct 26, 2013
    Messages:
    92
    Likes Received:
    15
    Home Page:
    I have:
    GSA SER
    Gscraper
    GSA Captcha Breaker
    GSA Indexer
    W.Article Creator
    30 semi-dedicated proxies
    100 Gmail accounts
    VPS (Processor: Intel(R) Xeon(R) CPU L5530 2.66 GHz...RAM: 6 GB...System type 64-bit)
    95,000 verified backlinks (23.07.2014)
    On GSA SER my LPM is 50-60, but I saw some guys with 300-500 LPM.
    When I try to use Gscraper, my LPM starts at 30,000 and after 2 minutes decreases to 5,000 LPM.
    So, is my VPS not good enough, or is it my settings, or what?
    I need some help with my GSA SER settings and Gscraper... maybe some footprint or keyword suggestions.
    Thx.
     
    Last edited: Jul 27, 2014
  10. DannyZhang

    DannyZhang Regular Member

    Joined:
    Apr 2, 2014
    Messages:
    233
    Likes Received:
    69
    This does sound useful indeed. How long does it take to repost everything and repopulate your site lists?
     
  11. shuttershades

    shuttershades Registered Member

    Joined:
    Oct 26, 2013
    Messages:
    92
    Likes Received:
    15
    Home Page:
    It takes a while, because I lack some information. :(