
How Do You Organise Your ScrapeBox URL Lists?

Discussion in 'Black Hat SEO' started by tbrad, Oct 5, 2010.

  1. tbrad

    tbrad Registered Member

    Joined:
    Aug 11, 2010
    Messages:
    99
    Likes Received:
    11
    I'm still trying to figure out the best way of organizing my URL Lists.

    So far it's pretty basic.

    I have the following files...

    HARVESTED_WordPress.txt
    HARVESTED_BlogEngine.txt
    HARVESTED_Moveabletype.txt
    HARVESTED_Captcha.txt

    In the files above I save the raw harvested URLs: unsorted, un-PageRanked, unchecked.

    And then after I've PageRank-checked them and analyzed them to make sure they're open, I throw out the useless ones and save the good ones to master lists like this...

    MASTER_Wordpress.txt
    MASTER_BlogEngine.txt
    MASTER_MoveableType.txt
    MASTER_Captcha.txt

    Once I build up enough, I'd like to start splitting the lists by PageRank, along the lines of the sketch below.
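
    Here's the rough Python sketch I have in mind. It assumes the checked list has one URL and its PR per line, tab-separated (adjust the split to match however you export the PR results), and the file names just follow my scheme above:

    Code:
    from collections import defaultdict

    buckets = defaultdict(list)

    # hypothetical input: one "URL<TAB>PR" pair per line
    with open("MASTER_Wordpress.txt") as f:
        for line in f:
            parts = line.rstrip("\n").rsplit("\t", 1)
            if len(parts) != 2 or not parts[1].isdigit():
                continue  # skip lines without a numeric PR on the end
            url, pr = parts
            buckets[int(pr)].append(url)

    # one output file per PR bucket, e.g. MASTER_Wordpress_PR3.txt
    for pr, urls in sorted(buckets.items()):
        with open("MASTER_Wordpress_PR{}.txt".format(pr), "w") as out:
            out.write("\n".join(urls) + "\n")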

    How do you guys order yours?
    Any ideas or ways I could improve would be welcome.

    Thanks guys.
     
    Last edited: Oct 5, 2010
  2. jellyfish

    jellyfish Junior Member

    Joined:
    Sep 16, 2008
    Messages:
    184
    Likes Received:
    36
    I like working with spreadsheets (Excel or whatever you like).
    Also, after each blast I'll check which ones were auto-approve and separate those into an additional file.
    What's taking most of my time is joining all the batches (1 mil results in each batch file) and removing duplicate domains/URLs. Wish there was a faster way of doing this :)
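
    Something like this rough Python sketch might speed it up: stream every batch file once and keep a set of the domains already seen. The glob pattern and output name are just placeholders for your own files:

    Code:
    import glob
    from urllib.parse import urlparse

    seen = set()
    with open("MASTER_merged.txt", "w") as out:
        # "batch_*.txt" is a placeholder pattern for the batch files
        for path in sorted(glob.glob("batch_*.txt")):
            with open(path) as f:
                for line in f:
                    url = line.strip()
                    if not url:
                        continue
                    # assumes harvested URLs include the http:// prefix,
                    # so urlparse can pull the domain out of them
                    domain = urlparse(url).netloc.lower()
                    if domain and domain not in seen:
                        seen.add(domain)
                        out.write(url + "\n")

    A set lookup is constant time, so a few million lines go through in seconds, and you only hold the unique domains in memory rather than the whole joined list.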
     
  3. jascoken

    jascoken Senior Member

    Joined:
    Nov 1, 2010
    Messages:
    1,135
    Likes Received:
    751
    Gender:
    Male
    Occupation:
    IT/Web Systems & Development...
    Location:
    Sussex:UK
    Crazyflx has a great post on removing dupes and working with HUGE files:

    crazyflx.com/scrapebox-tips/remove-duplicate-domains-urls-from-huge-txt-files-with-ease