
Anyone know a free text manipulation tool that could do this?

Discussion in 'Black Hat SEO Tools' started by Kitsune, Aug 2, 2012.

  1. Kitsune

    Kitsune Junior Member

    Joined:
    Mar 2, 2008
    Messages:
    136
    Likes Received:
    30
    For example, I have 2 txt files:
    List1.txt contains
    www.someurl.com/1
    www.someurl.com/2
    www.someurl.com/3
    www.someurl.com/4
    www.someurl.com/5

    List2.txt contains
    www.someurl.com/1
    www.someurl.com/3
    www.someurl.com/15
    www.someurl.com/10
    www.someurl.com/5

    The tool would have to output all the lines from List1.txt that don't also appear in List2.txt.

    As a result the output would look like this:
    www.someurl.com/2
    www.someurl.com/4

    I know Ultimate Demon has this option, but it crashes when working with large lists.
    Is there any other tool that can do this?

    Thanks
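
    What the post describes is a set difference that keeps the order of List1.txt. Below is a minimal Python sketch of that idea using the sample URLs from the post. The function name `lines_not_in` is made up for illustration, and the sketch assumes lines are compared as exact, case-sensitive strings.

    ```python
    def lines_not_in(list1, list2):
        """Return the lines of list1 that do not appear in list2, in original order."""
        exclude = set(list2)  # set lookups are O(1) on average
        return [line for line in list1 if line not in exclude]


    list1 = [
        "www.someurl.com/1",
        "www.someurl.com/2",
        "www.someurl.com/3",
        "www.someurl.com/4",
        "www.someurl.com/5",
    ]
    list2 = [
        "www.someurl.com/1",
        "www.someurl.com/3",
        "www.someurl.com/15",
        "www.someurl.com/10",
        "www.someurl.com/5",
    ]

    result = lines_not_in(list1, list2)
    print("\n".join(result))
    # prints:
    # www.someurl.com/2
    # www.someurl.com/4
    ```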
     
  2. dgofat

    dgofat Junior Member

    Joined:
    Dec 23, 2008
    Messages:
    114
    Likes Received:
    11
    Location:
    127.0.0.1
    try this: textmechanic.com
    this web tool can do practically anything with text
     
  3. Kitsune

    Kitsune Junior Member

    Joined:
    Mar 2, 2008
    Messages:
    136
    Likes Received:
    30
    I know a way to do this with the duplicate line tool from Text Mechanic, but it crashes when working with larger lists :(
     
  4. Scritty

    Scritty Elite Member Premium Member

    Joined:
    May 1, 2010
    Messages:
    2,807
    Likes Received:
    4,496
    Occupation:
    Affiliate Marketer
    Location:
    UK
    You could do this in Excel quite easily with a combination of the "Find" and "Lookup" commands.
    It's even easier with a quick macro.

    Check this out:
    http://www.mrexcel.com/archive/Formulas/565.html

    In pseudocode it's a couple of loops and comparisons - about 15 lines in Excel. Useful tool by the looks of it, though I'm sure there are sites on the net that do the same in your browser.

    Scritty
     
  5. tacopalypse

    tacopalypse Executive VIP Jr. VIP Premium Member

    Joined:
    Nov 30, 2009
    Messages:
    980
    Likes Received:
    2,485
    put both lists in the same column in excel, with list 2 above list 1, and separate them with a dotted line entry "-------------------"

    do remove duplicates on the entire column, then delete everything above the dotted line.

    what remains below the line is the output you want.
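
    The separator trick above can be sketched in Python to see why it works. This is only an illustration, not what Excel does internally: it assumes that removing duplicates keeps the first occurrence of each value and that the separator string never appears in either list. The function names are hypothetical.

    ```python
    SEPARATOR = "-------------------"  # assumed not to occur in either list


    def remove_duplicates_keep_first(rows):
        """Drop repeated rows, keeping only the first occurrence of each."""
        seen = set()
        kept = []
        for row in rows:
            if row not in seen:
                seen.add(row)
                kept.append(row)
        return kept


    def below_separator(list1, list2):
        """Stack list2 above list1, remove duplicates, return what is left below the separator."""
        column = list2 + [SEPARATOR] + list1
        deduped = remove_duplicates_keep_first(column)
        return deduped[deduped.index(SEPARATOR) + 1:]


    print(below_separator(["a", "b", "c"], ["b", "x"]))
    # prints: ['a', 'c']
    ```

    Because list 2 sits on top, any list 1 entry that also appears in list 2 is the later copy and gets dropped. One side effect under these assumptions: repeated lines inside list 1 itself are collapsed to a single copy as well.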
     
  6. kvmcable

    kvmcable Supreme Member

    Joined:
    Dec 28, 2010
    Messages:
    1,355
    Likes Received:
    2,815
    Occupation:
    24 year business owner - old school dude
    Location:
    KFC - BW3
    Textpipe Pro - torrents. Little talked about program but essential if you work with text files. Master it (learning curve) and you can do almost anything with text files.
     
  7. termseo

    termseo Junior Member

    Joined:
    Nov 4, 2010
    Messages:
    103
    Likes Received:
    160
    Occupation:
    Software engineer
    I've just created a simple C# application to solve your problem.

    Download it here

    good luck.
     
  8. sockpuppet

    sockpuppet Junior Member

    Joined:
    Nov 7, 2011
    Messages:
    155
    Likes Received:
    145
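    quite easy with grep:
    grep -v -x -F -f List2.txt List1.txt
    (-x and -F make grep compare whole lines literally, so www.someurl.com/1 in List2.txt won't also remove www.someurl.com/10 from List1.txt)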
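
    A note on matching: when grep reads patterns with -f and is not given -x, a pattern can match anywhere inside a line, so a short URL in the pattern file can knock out longer URLs that merely contain it. The Python sketch below imitates the two behaviours on made-up data to show the difference. It is only an approximation of grep (plain substring checks, no regular expressions), and both function names are hypothetical.

    ```python
    def filter_substring(lines, patterns):
        """Drop a line if any pattern occurs anywhere inside it (roughly grep -v -F -f)."""
        return [line for line in lines if not any(p in line for p in patterns)]


    def filter_whole_line(lines, patterns):
        """Drop a line only if it equals a pattern exactly (roughly grep -v -x -F -f)."""
        exclude = set(patterns)
        return [line for line in lines if line not in exclude]


    list1 = ["www.someurl.com/1", "www.someurl.com/10", "www.someurl.com/2"]
    list2 = ["www.someurl.com/1"]

    print(filter_substring(list1, list2))
    # prints: ['www.someurl.com/2']
    print(filter_whole_line(list1, list2))
    # prints: ['www.someurl.com/10', 'www.someurl.com/2']
    ```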
     
  9. Rua999

    Rua999 Power Member

    Joined:
    Jun 25, 2011
    Messages:
    630
    Likes Received:
    407
    Nice tool, thanks for making it, but there are no virus scan results, and what's the deal with using a French site to upload it to?!

    I just scanned it with AVG and there's no sign of malware. To save time for anyone thinking of downloading it: the download link is "Téléchargement du fichier".
     
  10. Kitsune

    Kitsune Junior Member

    Joined:
    Mar 2, 2008
    Messages:
    136
    Likes Received:
    30
    @termseo
    I created my own tool in Java. It worked great with small lists, but with 100k+ lists it spent ages just loading the text file. I had the same problem with your tool :(
    Also, I don't have Excel. I use OpenOffice, and I found a way to remove duplicate rows, but that messes everything up.
    As for grep, I don't really understand how to use it.
    Well, I guess I'll just have to stick with Ultimate Demon for these tasks and learn how to use one of those more complicated text manipulation tools.

    Thanks for the answers, everyone. I'm sure they will be useful for other BHW members :)
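
    For lists in the 100k+ range, one approach that usually stays fast is to load only the second list into a set and stream the first list line by line, instead of loading both files into a text box or comparing every line against every other line. The sketch below is a minimal illustration under some assumptions: the files are UTF-8, there is one URL per line, blank lines can be skipped, and List2.txt fits in memory. The function name `write_difference` and the output filename are made up.

    ```python
    def write_difference(list1_path, list2_path, out_path):
        """Write the lines of list1 that are absent from list2; return how many were kept."""
        # Load the exclusion list once; set lookups are O(1) on average.
        with open(list2_path, encoding="utf-8") as f:
            exclude = {line.rstrip("\r\n") for line in f}

        kept = 0
        # Stream the first list so it is never held in memory all at once.
        with open(list1_path, encoding="utf-8") as src, \
                open(out_path, "w", encoding="utf-8") as dst:
            for line in src:
                url = line.rstrip("\r\n")
                if url and url not in exclude:
                    dst.write(url + "\n")
                    kept += 1
        return kept
    ```

    Calling `write_difference("List1.txt", "List2.txt", "output.txt")` on the two sample files from the first post should leave www.someurl.com/2 and www.someurl.com/4 in output.txt.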
     
  11. jkwilson78

    jkwilson78 Regular Member Premium Member

    Joined:
    Jun 24, 2010
    Messages:
    224
    Likes Received:
    311
    Try this one called Duplicates Finder:
    http://wonderwebware.com/duplicatefinder/

    It's freeware and can do what you want. It can handle pretty large lists, but if you're looking at a million+ lines it will take quite a while to run (more than a day, but it will still work and not crash).
     
  12. HostStage

    HostStage Jr. VIP Jr. VIP Premium Member UnGagged Attendee

    Joined:
    May 20, 2010
    Messages:
    1,774
    Likes Received:
    1,731
    Occupation:
    BHW - CEO of Webhosting Company
    Location:
    BWH from France
    I haven't done this before, but Text Pipe Pro remains THE tool for text handling. It is quite complicated to use at first but definitely worth a try.
     
  13. csguy

    csguy BANNED BANNED

    Joined:
    Jul 13, 2012
    Messages:
    396
    Likes Received:
    42
    Go to hxxp://texthat.cXm (remove X), paste all your URLs into the URL box, and click "duplicates". It'll remove all duplicate URLs.
     