Scrapebox: Trimming down the list

SEOfarmer

After harvesting a huge list of blogs, there are some 'obvious' URLs that I know won't work - especially the root domains such as http://www.abc.com

How can I filter those out?
Let's say I have a list of 5 URLs

http://www.abc.com
http://www.abc.com/p=321312
http://www.abc.com/news/basfsaf
http://www.abc.com/blog/124124222
http://www.22241412.com

Knowing that abc.com and 22241412.com probably aren't commentable,
what can I do to filter them out?

Thanks in advance.
 
You can use Notepad++ and do a regex replace there :)
 
Notepad++ , CTRL+H, select regex mode, in the top field enter ^.*\.com$ (the dot needs to be escaped so it matches a literal period)
in the bottom field leave it blank, press Replace All, done
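If Notepad++ keeps misbehaving, the same filter is easy to check in a few lines of Python (a sketch using the example list from the question; the `/?` is my own addition so a trailing slash is also treated as a root domain):

```python
import re

urls = [
    "http://www.abc.com",
    "http://www.abc.com/p=321312",
    "http://www.abc.com/news/basfsaf",
    "http://www.abc.com/blog/124124222",
    "http://www.22241412.com",
]

# Drop any line that ends in ".com" (optionally followed by "/"),
# i.e. bare root domains with nothing after the domain name.
root_pattern = re.compile(r"^.*\.com/?$")
kept = [u for u in urls if not root_pattern.match(u)]

for u in kept:
    print(u)
```

Running this keeps only the three URLs that have a path after the domain.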
 

Thanks, but this doesn't seem to work.
It says it 'can't find any' but clearly there are some in that document.
Am I doing anything wrong? I attached a screenshot here:

 
Just make a new txt file and rename the .txt extension to .bat.

Put in this:

type input.txt | findstr /v /e ".com" > output.txt

Save it.

When you double-click this .bat file, it will search through input.txt and write every line that does NOT end in .com into output.txt :)

So if you try it with asd.com and asd.com/kasjdbne, only asd.com/kasjdbne will end up in output.txt ;)
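One thing worth noting: the findstr approach only catches .com root domains. A small Python sketch (the `has_path` helper is my own name, not anything from Scrapebox) that keeps only URLs with something after the domain would catch root domains on any TLD:

```python
from urllib.parse import urlparse

def has_path(url):
    """True if the URL has something after the domain (e.g. /blog/123)."""
    path = urlparse(url).path
    return path not in ("", "/")

urls = [
    "http://www.abc.com",
    "http://www.abc.com/",
    "http://www.abc.com/blog/124124222",
    "http://www.22241412.net",
]

commentable = [u for u in urls if has_path(u)]
```

Here both root domains get dropped regardless of their TLD, and the trailing-slash form is handled too.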
 
You can also use loopline's free tools. The duplicate removal tool also has an option to remove root domains.

Code:
http://scrapeboxmarketplace.com/scrapebox-helper-tools

Cheers
 