Filtering URL lists

Discussion in 'Black Hat SEO' started by sbw27, Sep 3, 2009.

  1. sbw27

    sbw27 Regular Member

    Joined:
    Jan 6, 2008
    Messages:
    390
    Likes Received:
    441
    I have scraped a list of URLs to use with a blog commenting application, but I want to exclude all duplicate URLs without having to trim each URL down to just the root, e.g.:

    something.com/cool_page.html
    something.com/cool_page56.html
    another.com/anotherpage.html
    another.com/anotherpage56.html

    I would want to ditch the 2nd and 4th URLs so I am left with only one result per domain. I have a URL washer, which I found on this forum, that will give me

    something.com
    another.com

    But not the whole URLs for the two domains... does anyone have any ideas on how to clean large URL lists of duplicate root domains?

    Cheers
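
    The filtering described above can be sketched in a few lines of Python (not from the thread, just one possible approach): keep the first full URL seen for each domain and drop the rest. Note this compares whole hostnames, so subdomains would count as separate domains; collapsing to the registered root domain would need an extra step.

    ```python
    from urllib.parse import urlparse

    def dedupe_by_domain(urls):
        """Keep only the first full URL seen for each domain."""
        seen = set()
        result = []
        for url in urls:
            # urlparse only fills in netloc when a scheme is present,
            # so prepend one for bare "domain.com/page" style entries
            parsed = urlparse(url if "://" in url else "http://" + url)
            domain = parsed.netloc.lower()
            if domain not in seen:
                seen.add(domain)
                result.append(url)
        return result

    urls = [
        "something.com/cool_page.html",
        "something.com/cool_page56.html",
        "another.com/anotherpage.html",
        "another.com/anotherpage56.html",
    ]
    print(dedupe_by_domain(urls))
    # -> ['something.com/cool_page.html', 'another.com/anotherpage.html']
    ```

    For a large list this is linear in the number of URLs, since membership checks against the `seen` set are constant time.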