After harvesting a huge list of blogs, there are some 'obvious' URLs that I know won't work, especially root domains such as http://www.abc.com
How can I filter those out?
Let's say I have a list of 5 URLs:
http://www.abc.com
http://www.abc.com/p=321312
http://www.abc.com/news/basfsaf
http://www.abc.com/blog/124124222
http://www.22241412.com
Knowing that abc.com and 22241412.com probably aren't commentable,
what can I do to filter them out?
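
To make the idea concrete, here is a rough Python sketch of the kind of filter I have in mind. It assumes a "root domain" URL is one with no path (or just "/"), no query string, and no fragment. The function names is_root_url and drop_root_urls are made up for illustration, and this only checks the shape of the URL, not whether the page is actually commentable.

```python
from urllib.parse import urlsplit


def is_root_url(url):
    """Return True when the URL points at the bare domain.

    Assumption: "root" means an empty or "/" path with no query or fragment.
    """
    parts = urlsplit(url.strip())
    return parts.path in ("", "/") and not parts.query and not parts.fragment


def drop_root_urls(urls):
    """Keep only URLs that point somewhere below the domain root."""
    return [u for u in urls if not is_root_url(u)]


urls = [
    "http://www.abc.com",
    "http://www.abc.com/p=321312",
    "http://www.abc.com/news/basfsaf",
    "http://www.abc.com/blog/124124222",
    "http://www.22241412.com",
]

for url in drop_root_urls(urls):
    print(url)
```

With the 5 URLs above, this should keep only the three that have something after the domain.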
Thanks in advance.