Scraping and Crawling

ugorrogu

Registered Member
Joined
May 27, 2008
Does anybody have any good resources on scraping and crawling websites? I am thinking of writing my own scraper/crawler in C# as a fun little project, but I also want it to be really resource efficient.

Thanks,
UgorrogU
 
Google for "screen scraping c#" and "multithreading c#". That should at least give you some ideas on how to start.
IMO there are probably other languages better suited to this: Ruby, PHP and Python all have some very nice, easy-to-use scraping libraries available.
 

Shodan,

Thanks for the reply. The only reason I want to do it in C# is that I want a solution that does not depend on a server. Also, I haven't done too many projects in C#, and I think this is a good excuse to do one.

ugorrogu
 
The goal of scraping is to isolate the text you want. Find something common to all the items you wish to scrape, such as the tags around them. Then read the source line by line and isolate the interesting lines first. Once you have those, it's just a matter of splitting the string. That's what I do in all my software.
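
To make that concrete, here is a minimal sketch of the line-by-line approach, written in Python for brevity (the same steps should map onto `Contains` and `Split` on strings in C#). The function name `scrape_items`, the marker `<li class="item">` and the sample page are all made up for illustration; a real page would need its own marker.

```python
def scrape_items(html, marker='<li class="item">', end="</li>"):
    """Return the text between `marker` and `end` on every line containing `marker`.

    `marker` and `end` are hypothetical defaults chosen for the sample below.
    """
    results = []
    for line in html.splitlines():
        if marker not in line:
            continue  # not an interesting line, skip it
        # Split once on the opening marker, then once on the closing tag.
        after_marker = line.split(marker, 1)[1]
        text = after_marker.split(end, 1)[0]
        results.append(text.strip())
    return results


# Made-up sample page: two items share the same opening tag, one does not.
sample = """<html><body>
<h1>Products</h1>
<ul>
  <li class="item">Blue widget</li>
  <li class="item"> Red widget </li>
  <li>Not an item</li>
</ul>
</body></html>"""

print(scrape_items(sample))  # ['Blue widget', 'Red widget']
```

This only works when each item sits on a single line and the markup stays stable. If an item spans several lines or the tags change, plain splitting will miss it, and a proper HTML parser is probably the safer choice.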
 
You could also run Perl locally on your PC with ActivePerl or a Perl IDE.
 