
Quick duplicate url remover command for ScrapeBox users using linux

Discussion in 'Black Hat SEO Tools' started by mataff, Sep 12, 2010.

  1. mataff

    mataff Junior Member

    Joined:
    Sep 21, 2008
    Messages:
    139
    Likes Received:
    54
    Nothing groundbreaking, just a useful command that quickly combines all of the harvested files from ScrapeBox and uniquely sorts them into one file.

    Copy all of the harvested files over to Linux and run:
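
    A minimal sketch, assuming GNU coreutils is available (batch0001.txt etc. are illustrative names for the harvested files, and all_unique.txt is just an example output name):

    time sort -u batch0001.txt batch0002.txt batch0003.txt > all_unique.txt

    sort -u reads every input file, sorts the lines, and drops duplicates in a single pass, so there is no need to dedupe each batch separately.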

    - Add however many batch files you need; there is no limit.
    - The time command at the beginning is optional and only shows how long the process took once it is complete.

    Using the command above on 3 million links in total (on an old 2.2 GHz dual core):

    real 1m34.996s
    user 1m12.981s
    sys 0m2.784s

    Then run a line count on the new unique file to get a total count of URLs within it:
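
    For example, assuming the same illustrative output filename as above:

    wc -l all_unique.txt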

    Much quicker than using Emacs, gVim, or any other Windows-based text editor.

    Hopefully this helps.
     
    Last edited: Sep 12, 2010
  2. boo blizzi

    boo blizzi Regular Member

    Joined:
    May 28, 2009
    Messages:
    361
    Likes Received:
    267
    or you can just export all the files to the same list... then reload them and ScrapeBox will tell you how many URLs... lol ;)
     
  3. mataff

    mataff Junior Member

    Joined:
    Sep 21, 2008
    Messages:
    139
    Likes Received:
    54
    How do you load 15-20 harvested files all into ScrapeBox and sort out the uniques?

    I've been able to add one file at a time and then remove duplicates, but the loading time on each file takes a while.

    EDIT: Just tried importing the first 1 million links from batch0001.txt, removing duplicates, then importing the next million from batch0002.txt, and you get the following message:

    How are you getting around that?
     
    Last edited: Sep 12, 2010