Filtering URL lists

Discussion in 'Black Hat SEO' started by sbw27, Sep 3, 2009.

    I have scraped a list of URLs to use with a blog commenting application, and I want to exclude all duplicate domains without having to trim each URL down to just the root, i.e. given

    something.com/cool_page.html
    something.com/cool_page56.html
    another.com/anotherpage.html
    another.com/anotherpage56.html

    I would want to ditch the 2nd and 4th URLs so I am only left with one result per domain. I have a URL washer I found on this forum, but it only gives me

    something.com
    another.com

    but not the whole URL for the two domains. Does anyone have any ideas on how to clean large URL lists of duplicate root domains?
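    One way to do this, sketched below in Python: parse out each URL's host, and keep only the first full URL seen for each host. This is an illustrative sketch, not the washer tool mentioned above; the `dedupe_by_domain` function and the example list are mine. Note it treats subdomains (e.g. `blog.something.com`) as distinct from the bare domain; collapsing those properly needs a public-suffix library such as `tldextract`.

    ```python
    from urllib.parse import urlparse

    def dedupe_by_domain(urls):
        """Keep the first full URL seen for each root domain."""
        seen = set()
        kept = []
        for url in urls:
            # urlparse only populates netloc when a scheme is present,
            # so prepend one for scheme-less scraped URLs
            parsed = urlparse(url if "://" in url else "http://" + url)
            domain = parsed.netloc.lower()
            if domain.startswith("www."):
                domain = domain[4:]          # treat www.foo.com as foo.com
            if domain not in seen:
                seen.add(domain)
                kept.append(url)             # keep the full original URL
        return kept

    urls = [
        "something.com/cool_page.html",
        "something.com/cool_page56.html",
        "another.com/anotherpage.html",
        "another.com/anotherpage56.html",
    ]
    print(dedupe_by_domain(urls))
    # ['something.com/cool_page.html', 'another.com/anotherpage.html']
    ```

    For very large lists this stays fast, since the `seen` set makes each lookup constant-time; reading the URLs line-by-line from a file instead of a list would keep memory use proportional to the number of unique domains.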

    Cheers