One of the big problems with SEO tools and huge lists of sites to post to is that the lists contain tons of duplicates that aren't easily removed. Before I show you the code, here are the results from running it just now: my original list had 497718 lines (URLs) and after running this it had 433981. That's over 60,000 URLs that my SEO software doesn't have to deal with now.

Examples:

Code:
http://some.com/read.php?tid=10053
http://some.com/read.php?tid=10053&ordertype=desc
http://some.com/read.php?tid=10053&page=e&

All three of those URLs are the same for our purposes. We don't need all the extra junk on the end to feed them into an SEO tool. Most de-duplication tools will leave all three of those in your list.

More examples:

Code:
http://example.com/forum.php?mod=viewthread&tid=974
http://example.com/forum.php?mod=viewthread&tid=974&extra=page%3D1
http://another.com/index.php?title=User:Sdgsdgsdg
http://another.com/index.php?title=User:Sdgsdgsdg&oldid=31992
http://test.com/phpbb3/viewtopic.php?f=2&t=30301
http://test.com/phpbb3/viewtopic.php?f=2&t=30301&view=print

That's three more examples of the same problem. I've spent the last few hours putting together a sed script that cleans up 60 variations of these.
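To see why exact-match de-duplication misses these, here is a minimal sketch using the three read.php URLs from the first example. The variable names are just for illustration, and the single substitution is one rule of the kind used in the script, not the whole thing.

```shell
#!/bin/sh
# Three variants of the same thread, taken from the first example.
urls='http://some.com/read.php?tid=10053
http://some.com/read.php?tid=10053&ordertype=desc
http://some.com/read.php?tid=10053&page=e&'

# Exact-match de-duplication keeps all three, because the strings differ.
plain_count=$(printf '%s\n' "$urls" | sort -u | wc -l | tr -d ' ')

# Cutting each URL off at /read.php? first makes the three lines identical,
# so sort -u can collapse them into one.
fixed_count=$(printf '%s\n' "$urls" | sed 's/\/read\.php?.*/\//' | sort -u | wc -l | tr -d ' ')

echo "sort -u alone:    $plain_count"
echo "sed then sort -u: $fixed_count"
```

Note that the rule does not keep the thread ID. It cuts the URL back to the directory, so every read.php link on the same site ends up as one entry.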
File "replacements.sed":

Code:
# Each rule cuts a URL off at a known forum/CMS path so that variants of
# the same page become identical lines. A bare ? is a literal question
# mark in sed's basic regex syntax.
s/\/read\.php?.*/\//g
s/\/viewthread\.php.*/\//g
s/\/forum\.php.*/\//g
s/\/index\.php?title=User:.*/\//g
s/\/index\.php?title=User%3A.*/\//g
s/\/viewtopic\.php?f=.*/\//g
s/\/showthread\.php?.*/\//g
s/\/review\.asp?.*/\//g
s/\/index\.php\/User:.*/\//g
s/\/index\.php\/User%3A.*/\//g
s/\/viewtopic\.php?p=.*/\//g
s/\/boke\.asp?.*/\//g
s/\/forum\/index\.php?topic=.*/\/forum\//g
s/\/index\.php?action=profile.*/\//g
s/\/index\.php?topic=.*\.new/\//g
s/\/memberlist\.php?.*/\//g
s/\/home\.php?mod=.*/\//g
s/\/?feed=rss.*/\//g
s/\/index\.php?option=com_akobook.*/\/index\.php?option=com_akobook/g
s/\/search\.php.*/\//g
s/\/forum\/topic\.php?.*/\/forum\//g
s/\/asae-comments\.cgi.*/\//g
s/\/contact\.php?.*/\//g
s/\/member\.php?action=profile.*/\//g
s/\/YaBB\.**?.*/\//g
s/\/viewtopic\.php?pid=.*/\//g
s/\/forum?func=view.*/\//g
s/\/viewlinks\.php?.*/\/viewlinks\.php/g
s/\/posting\.php?.*/\//g
s/\/space\.php?uid=.*/\//g
s/\/showtopic\.aspx?.*/\//g
s/?bfm_index=.*//g
s/\/so\.php?id=.*/\//g
s/\/modules\.php?name=Forums.*/\/modules\.php?name=Forums/g
s/\/printthread\.php?.*/\//g
s/\/index\.php?option=com_ckforms.*/\//g
s/\/?contact_form=.*/\//g
s/\/?widgetType=.*/\//g
s/\/index\.php?showuser=.*/\//g
s/\/forum\/member\.php?.*/\/forum\//g
s/\/index\.php?do=forum.*/\/index\.php?do=forum/g
s/\/guest-book\/?cpage=.*/\/guest-book\//g
s/\/rss\.php?.*/\//g
s/\/index\.php?site=guestbook.*/\/index\.php?site=guestbook/g
s/\/profile\.php?mode=viewprofile.*/\//g
s/\/search?updated-.*/\//g
s/\/index\.php?topic=.*\.0$/\//g
s/\/index\.php?topic=.*\.msg.*/\//g
s/\/forum_viewtopic\.php?.*/\//g
s/\/index\.php?site=forum_topic.*/\/index\.php?site=forum/g
s/\/submit_article\.php?id=.*/\//g
s/\/submit\.php?id=.*/\//g
s/\/index\.php?action=post;.*/\//g
s/\/Archiver\.asp?ThreadID=.*/\//g
s/\/member\.php?u=.*/\//g
s/\/guest_book\.php?.*/\/guest_book\.php/g
s/\/index\.php?topic=.*\.0;prev_.*/\//g
s/\/upcoming\.php?page=.*/\//g
s/\/redirect\.php?.*/\//g
s/\/guestbook\.php?page=.*/\/guestbook\.php/g
s/?replytocom=.\+//g

If you're a Linux/Unix user you will know what to do with this file. Save that text as replacements.sed, then run:

Code:
sed -f replacements.sed < urls.txt | sort -u > fixed-urls.txt

So if your list of URLs is in urls.txt, the cleaned list will now be in fixed-urls.txt, and the file will have far fewer duplicates.

If anyone has any others that are a real problem and that I didn't cover in this list, please tell me, with examples, and I will get them added.

PS: I won't be using this thread to teach anyone Linux or the command line. If you don't understand this post, maybe someone else has the time to help.
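For anyone who wants to try the pipeline on a small sample before pointing it at a full list, here is a minimal sketch. It works in a scratch directory made with mktemp, uses a four-line sample built from the example URLs, and includes only two rules of the kind listed in replacements.sed. The sample data and the before/after counting are my additions for illustration.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Sample input: four lines that are really two pages with extra parameters.
cat > urls.txt <<'EOF'
http://test.com/phpbb3/viewtopic.php?f=2&t=30301
http://test.com/phpbb3/viewtopic.php?f=2&t=30301&view=print
http://another.com/index.php?title=User:Sdgsdgsdg
http://another.com/index.php?title=User:Sdgsdgsdg&oldid=31992
EOF

# Two rules only, enough to cover this sample.
cat > replacements.sed <<'EOF'
s/\/index\.php?title=User:.*/\//g
s/\/viewtopic\.php?f=.*/\//g
EOF

# Same pipeline as in the post: rewrite every line, then drop exact duplicates.
sed -f replacements.sed < urls.txt | sort -u > fixed-urls.txt

# Compare line counts before and after.
before=$(wc -l < urls.txt | tr -d ' ')
after=$(wc -l < fixed-urls.txt | tr -d ' ')
echo "before: $before  after: $after  removed: $((before - after))"
cat fixed-urls.txt
```

On this sample the output file has two lines, http://another.com/ and http://test.com/phpbb3/. Comparing the two counts the same way on a real list shows how many lines the rules removed.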