
[GET] Perl script for duplicated domains removal from text file (get unique domains)

Discussion in 'Black Hat SEO Tools' started by xrfanatic, Jun 1, 2015.

  1. xrfanatic

    xrfanatic Jr. VIP

    Joined:
    Aug 28, 2010
    Messages:
    419
    Likes Received:
    176
    Location:
    http://bit.ly/slb64
    Home Page:
    Hi BHW,

    Since the vast majority of us work with loads of URLs daily, I'm sharing this little but very handy script. It removes duplicated domains from a text file and leaves only the unique ones, right from the Windows command line.

    Getting the unique domains from a 100 MB file takes about 60 seconds. There are probably faster solutions, but this one is handy because it's standalone: you don't have to interrupt your SEO software to take care of deduplication while it's busy.

    Usage in Windows cmd:

    perl remduplicates.pl DuplicatedDomains.txt UniqueDomains.txt

    Where:

    DuplicatedDomains.txt - Your file with duplicated domains
    UniqueDomains.txt - New file that will contain only the unique domains once the script finishes.

    If you run this script on Windows, you need to install Strawberry Perl.

    Code:
    http://www.speedyshare.com/FARp2/remduplicates.**
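    In case the download link above ever dies, here is a minimal sketch of what such a script could look like. This is an illustrative rewrite, not the actual file behind the link: it assumes one URL or bare domain per line and keeps the first line seen for each host.
    Code:
    #!/usr/bin/perl
    # Illustrative sketch only -- not the original remduplicates script.
    # Reads URLs/domains (one per line), keeps the first line seen for
    # each host, and writes those lines to the output file.
    use strict;
    use warnings;

    my ($in_file, $out_file) = @ARGV;
    die "Usage: perl remduplicates.pl input.txt output.txt\n"
        unless defined $in_file && defined $out_file;

    open my $in,  '<', $in_file  or die "Cannot open $in_file: $!";
    open my $out, '>', $out_file or die "Cannot open $out_file: $!";

    my %seen;
    while (my $line = <$in>) {
        chomp $line;
        next unless length $line;

        # Reduce the line to its bare host, e.g.
        # "http://www.example.com/page?x=1" -> "www.example.com"
        my $host = $line;
        $host =~ s{^\s*\w+://}{};   # drop "http://", "https://", ...
        $host =~ s{[/?#].*$}{};     # drop everything after the host
        $host = lc $host;

        print {$out} "$line\n" unless $seen{$host}++;
    }

    close $in;
    close $out;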
    Hope it was useful :)

    Cheers !
     
    • Thanks x 2
  2. Repulsor

    Repulsor Power Member

    Joined:
    Jun 11, 2013
    Messages:
    775
    Likes Received:
    280
    Location:
    PHP Scripting ;)
    Free shares like this get little to no attention here on BHW for some reason. I too have shared some nice little hacks before; nobody cares, yet people are ready to pay dollars for the same functionality. It's good though. :D

    Good to find a perl guy! :D
     
    • Thanks x 1
  3. MrBlue

    MrBlue Senior Member

    Joined:
    Dec 18, 2009
    Messages:
    974
    Likes Received:
    680
    Occupation:
    Web/Bot Developer
    In Linux, scrubbing a file of duplicate entries is as simple as this one line:
    Code:
    awk '!x[$0]++' list.txt > cleaned.txt 
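    For the Perl fans in this thread, the equivalent one-liner would be something like this (same idea: a hash of lines already seen, print only the first occurrence):
    Code:
    perl -ne 'print unless $seen{$_}++' list.txt > cleaned.txt
    Note that both the awk and the Perl one-liners deduplicate exact lines; to deduplicate by domain you would still need to strip the scheme and path first, as in the sketch earlier in the thread.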
     
    • Thanks x 1