If I have a .txt file that contains multiple URLs from the same domain and I want to remove duplicate domains, how can I do that in Python?

So if I have 1.txt and it contains:

http://www.domain1.com/page1
http://www.domain1.com/page2
http://www.domain2.com/page1
http://www.domain3.com/page1
http://www.domain3.com/page2

I want to be left with only:

http://www.domain1.com/page2
http://www.domain2.com/page1
http://www.domain3.com/page2

I don't care which URL is kept from a given domain, as long as there is only one URL per domain.

I was thinking I might be able to do this with regex, but I've never really used regex much. Perhaps a dictionary would work, or maybe there is a module in Python that can be imported that already recognizes URLs. I can remove duplicate URLs just fine, but I'm not the world's foremost Python expert, so I'm a little stumped on this one. Any help is appreciated.
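For what it's worth, here is the kind of dictionary-based approach I was imagining, sketched with urllib.parse from the standard library (the function name dedupe_by_domain is just my own label). The idea is to key a dict by the domain part of each URL, so later URLs from the same domain simply overwrite earlier ones:

```python
from urllib.parse import urlparse

def dedupe_by_domain(urls):
    # Keep one URL per domain; a later URL from the same domain
    # overwrites the earlier one, which is fine since any one will do.
    seen = {}
    for url in urls:
        domain = urlparse(url).netloc  # e.g. "www.domain1.com"
        seen[domain] = url
    return list(seen.values())

urls = [
    "http://www.domain1.com/page1",
    "http://www.domain1.com/page2",
    "http://www.domain2.com/page1",
    "http://www.domain3.com/page1",
    "http://www.domain3.com/page2",
]

deduped = dedupe_by_domain(urls)
for url in deduped:
    print(url)
```

In practice the list would come from the file instead, e.g. urls = [line.strip() for line in open("1.txt") if line.strip()]. Note that urlparse treats "www.domain1.com" and "domain1.com" as different domains, so if both forms can appear in the file, the netloc would need normalizing first.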