WebHarvy : tips

fmartin · Jan 31, 2014

WebHarvy is an easy way to scrape websites, but it has several limitations:
-You cannot give it a list of URLs to parse (or I have not found out how).
-You cannot have it pick proxies at random (it cycles through them in order).
-It cannot run multithreaded.

So, here are the tips:
-> Clone your WebHarvy bin folder 10 times.
-> Start each clone and configure it to use 10% of your proxies.
-> Create 10 .cmd files, each one launching 10% of the URLs to parse from the command line:

Example (cmd2):

IF EXIST A:\app\scrapebox\projet\WebHarvy\out\out2.csv ( echo "Already exists A:\app\scrapebox\projet\WebHarvy\out\out2.csv" ) else ( A:\app\webharvy\WebHarvy.exe A:\app\scrapebox\projet\WebHarvy\scrap_pj2.xml -1 A:\app\scrapebox\projet\WebHarvy\out\out2.csv )
IF EXIST A:\app\scrapebox\projet\WebHarvy\out\out3.csv ( echo "Already exists A:\app\scrapebox\projet\WebHarvy\out\out3.csv" ) else ( A:\app\webharvy\WebHarvy.exe A:\app\scrapebox\projet\WebHarvy\scrap_pj3.xml -1 A:\app\scrapebox\projet\WebHarvy\out\out3.csv )

...

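So, you can now start the 10 .cmd files. If one is stopped, it will resume where it left off, because any output file that already exists is skipped.
They all run in parallel, and each clone rotates through its own proxy list, so no two threads will use the same proxy.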
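
If you do not want to write the 10 .cmd files by hand, here is a rough Python sketch that generates them. It only reproduces the IF EXIST pattern from the example above. My assumptions, not WebHarvy facts: one saved config per URL named scrap_pjN.xml, outputs named out\outN.csv, and clone folders named like A:\app\webharvy1 ... A:\app\webharvy10 (hypothetical names, adapt to yours). The "-1" argument is copied as-is from the example.

```python
from pathlib import PureWindowsPath


def build_cmd_files(project_ids, n_workers, base, clone_pattern):
    """Return {cmd_filename: content}, spreading project ids round-robin.

    base          -- folder holding scrap_pjN.xml and the out\\ subfolder
    clone_pattern -- hypothetical clone folder pattern, e.g. r"A:\\app\\webharvy{n}"
    """
    base = PureWindowsPath(base)
    files = {}
    for w in range(n_workers):
        # Each worker uses its own WebHarvy clone (and so its own proxy list).
        exe = PureWindowsPath(clone_pattern.format(n=w + 1)) / "WebHarvy.exe"
        lines = []
        for pid in project_ids[w::n_workers]:
            xml = base / f"scrap_pj{pid}.xml"
            out = base / "out" / f"out{pid}.csv"
            # Skip the job when its output already exists, so a restart resumes.
            lines.append(
                f'IF EXIST "{out}" ( echo Already exists: {out} ) '
                f'else ( "{exe}" "{xml}" -1 "{out}" )'
            )
        files[f"cmd{w + 1}.cmd"] = "\r\n".join(lines) + "\r\n"
    return files


if __name__ == "__main__":
    generated = build_cmd_files(
        list(range(1, 21)),                    # 20 configs: scrap_pj1 .. scrap_pj20
        10,                                    # 10 clones / 10 .cmd files
        r"A:\app\scrapebox\projet\WebHarvy",   # base folder from the example
        r"A:\app\webharvy{n}",                 # assumed clone naming
    )
    print(generated["cmd2.cmd"])
```

Save each entry of the returned dict as a file, then start them all. The round-robin split keeps the 10 files about the same size even if the number of configs is not a multiple of 10.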

harubel · Sep 14, 2016

I have the WebHarvy software and would like to exchange with anyone interested.
