
How do you compare txt files to get a unique list?

Discussion in 'White Hat SEO' started by blackhatdavid, Dec 16, 2010.

  1. blackhatdavid

    blackhatdavid Regular Member

    Joined:
    Nov 5, 2009
    Messages:
    296
    Likes Received:
    106
    For a while now, I have been trying to find a way to do this - surely someone here can provide an answer...

    I have two text files with two lists of URLs. I want to compare the two and produce a third list that contains ONLY the URLs from list2 that ARE NOT in list1.

    With all the scrapebox users around, I would think this is a common problem. You have one list of sites that you have scraped or purchased and you post to it. Then you get a second list. You don't want to post duplicates, so you only want the new sites that are in the second list.

    HOW ARE YOU GUYS DOING THIS?

    I have tried programs like WinDiff & WinMerge, but I can't see how to use them to do this type of comparison (especially to automatically produce a third file that contains just the unique new results).
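
    (For anyone who wants to script this directly, here is a minimal Python sketch of the set-difference approach being asked about. It assumes one URL per line; the file names list1.txt, list2.txt, and list3.txt are placeholders, not anything from the thread.)

    Code:
    # Keep only the URLs in list2.txt that are NOT in list1.txt.
    # Minimal sketch; file names are placeholders.

    def load_urls(path):
        """Return the non-empty, stripped lines of a file as a set."""
        with open(path, encoding="utf-8") as f:
            return {line.strip() for line in f if line.strip()}

    old_urls = load_urls("list1.txt")  # the list you have already posted to
    new_urls = load_urls("list2.txt")  # the newly scraped/purchased list

    # Set difference: everything in list2 that is not in list1
    unique_new = new_urls - old_urls

    with open("list3.txt", "w", encoding="utf-8") as out:
        out.write("\n".join(sorted(unique_new)))

    print(len(unique_new), "new URLs written to list3.txt")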
     
  2. nouseforaname

    nouseforaname Junior Member

    Joined:
    Apr 7, 2009
    Messages:
    133
    Likes Received:
    28
    Would like to know the same too...
     
  3. Jesperj

    Jesperj Power Member

    Joined:
    Sep 10, 2010
    Messages:
    502
    Likes Received:
    347
    Occupation:
    Web Designer
    Location:
    Far, Far away
    Just load the new URLs into Scrapebox, then select Import and compare URLs; if you want unique domains instead, use Import and compare Domains.

    It will not import the lists; it just compares them and removes the duplicates.
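
    (To illustrate the difference between the two modes, here is a rough Python sketch, not Scrapebox's actual code. urlparse is from the standard library; the file names are placeholders, and the URLs are assumed to include a scheme like http://.)

    Code:
    # URL-level vs. domain-level comparison (rough sketch, placeholder file names).
    from urllib.parse import urlparse

    def load_lines(path):
        with open(path, encoding="utf-8") as f:
            return {line.strip() for line in f if line.strip()}

    def domain(url):
        # netloc is the host part, e.g. 'blog.example.com';
        # this assumes the URL includes a scheme like http://
        return urlparse(url).netloc.lower()

    old = load_lines("list1.txt")
    new = load_lines("list2.txt")

    # Compare URLs: drop only exact duplicate URLs
    unique_urls = new - old

    # Compare Domains: drop any URL whose domain is already in the old list
    old_domains = {domain(u) for u in old}
    unique_domains = {u for u in new if domain(u) not in old_domains}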
     
    • Thanks x 1
  4. allfreetodo

    allfreetodo Registered Member

    Joined:
    Jan 9, 2008
    Messages:
    66
    Likes Received:
    16
    Here you go:

    Code:
    http://www.easymarketingclassifieds.com/Sorter8.zip

    This is a pretty powerful sorter, with extra features for cleaning many different types of lists.

    VirusTotal scan:
    http://www.virustotal.com/file-scan/report.html?id=9eefb6019464983fd44d1fbfc896db7081c6e17408beae753c3cca390dbd79b0-1292511970
     
  5. blackhatdavid

    blackhatdavid Regular Member

    Joined:
    Nov 5, 2009
    Messages:
    296
    Likes Received:
    106
    Thanks for this info. (thanks given)
     
  6. blackhatdavid

    blackhatdavid Regular Member

    Joined:
    Nov 5, 2009
    Messages:
    296
    Likes Received:
    106
    OK guys...for anyone else wanting to do this type of thing...

    I just found the PERFECT tool...and it's FREE!

    What makes this useful (as opposed to Jesperj's suggestion above) is that it can be used on ANY text files, not just URLs.

    You can input one file, and the program will remove all duplicates from that one file. You can input two files, and it gives you several output options. You can get the items that are unique to each file, or items that are in both, etc. Check it out!

    I think I am going to use the first option (input one file) to remove duplicate proxies instead of using Scrapebox. Have you noticed that in SB, if you harvest several thousand proxies and then remove duplicates, it takes FOREVER? This program is FAST!
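
    (For anyone who wants to script the same output options instead of using a GUI tool, here is a rough Python sketch of the modes described above. The file names are placeholders, and this is not the tool's actual code; it just reproduces the same set operations.)

    Code:
    # Rough sketch of the same output options (placeholder file names).

    def load_lines(path):
        with open(path, encoding="utf-8") as f:
            return {line.strip() for line in f if line.strip()}

    def save_lines(path, lines):
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(sorted(lines)))

    a = load_lines("file1.txt")
    b = load_lines("file2.txt")

    save_lines("only_in_file1.txt", a - b)  # unique to file 1
    save_lines("only_in_file2.txt", b - a)  # unique to file 2
    save_lines("in_both.txt", a & b)        # items present in both files

    # One-file mode: loading into a set already removes duplicates,
    # which is why this stays fast even for thousands of proxies.
    save_lines("deduped.txt", load_lines("proxies.txt"))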

    Anyway, here's the link...

    Code:
    http://wonderwebware.com/duplicatefinder/download.html
     
    • Thanks x 5