Looking for a good scraping tool, any recommendations?

iTheExpert (BANNED)

Oct 25, 2013
I use GScraper, but what I don't like about it is that I'm not getting platform identification. It won't tell me that a particular URL, say http://www.abcd.com, is, for example, an article directory (Article Beach), PHPLD, a bookmarking platform (Pligg), Oxwall, or a forum profile (XOOPS). I'm looking for a way to identify the platform for all my URLs, or another scraping tool that gives me a list with platform information. Any suggestions? Thank you!
 


The real choices are GScraper, Scrapebox, and Hrefer.

However, mis-identified platforms go with the territory. It's a function of the search term, NOT of the tool that transmits it.
In other words, of the footprint you use to scrape with.
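To make that concrete, a footprint is just a platform-specific string appended to each keyword query. A minimal sketch (the footprint strings below are illustrative examples, not a vetted or valuable list):

```python
# Combine keywords with platform footprints to build search queries.
# These footprint strings are illustrative examples only.
FOOTPRINTS = {
    "PHPLD": '"Powered by PHP Link Directory"',
    "Pligg": '"Powered by Pligg"',
    "XOOPS": '"Powered by XOOPS"',
}

def build_queries(keywords, footprints):
    """Cross every keyword with every footprint string."""
    return [f"{kw} {fp}" for kw in keywords for fp in footprints.values()]

queries = build_queries(["gardening", "fitness"], FOOTPRINTS)
# Each query is fed to the scraper as a single search term.
```

The quality of the results is driven entirely by how specific these strings are, which is why well-maintained footprint lists are guarded so closely.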

95% non-conformity is pretty normal.
I scrape on several servers 24/7, about 2.2 million URLs a day, the vast majority of which are not the platforms they were supposed to be, and I use some really good footprints that I keep up to date all the time. (And sorry, no, I don't give them away, to anyone, ever; they are FAR too valuable to me.)

But I suspect your answer is better footprints and a more thorough post-scrape testing regime rather than a change of tool.
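A post-scrape test can be as simple as fetching each URL and checking the HTML for known platform signatures. A minimal sketch, assuming hypothetical signature strings (they are illustrative, not a complete or authoritative fingerprint set):

```python
# Post-scrape platform check: fetch each scraped URL and look for
# known platform signatures in the HTML. The signature strings are
# illustrative examples, not an exhaustive fingerprint list.
import urllib.request

SIGNATURES = {
    "PHPLD": ["powered by php link directory"],
    "Pligg": ["powered by pligg", "pligg_"],
    "Oxwall": ["oxwall"],
    "XOOPS": ["powered by xoops"],
}

def identify_platform(html):
    """Return the first platform whose signature appears in the page."""
    lowered = html.lower()
    for platform, sigs in SIGNATURES.items():
        if any(sig in lowered for sig in sigs):
            return platform
    return "unknown"

def check_url(url, timeout=10):
    """Fetch a URL and classify it; returns 'fetch failed' on error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            html = resp.read(200_000).decode("utf-8", errors="replace")
    except Exception:
        return "fetch failed"
    return identify_platform(html)
```

Run the scraped list through something like this, keep only the URLs whose detected platform matches what the footprint was supposed to find, and dump the rest.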

Good luck. Once you develop a system, it gets easier.
Hint: a spreadsheet program like Excel or OpenOffice Calc (free) is VERY useful.

Scritty
 

Have you written your own scraper scripts, or do you use Scrapebox?
 