Suppose I am harvesting with Elgg and Drupal footprints, but ScrapeBox also harvests links that are not related to either of these platforms. How can I tell ScrapeBox not to harvest those links? Is there a method for this?
There should be only a few such links if you use a correct footprint. If the ratio of unwanted sites is too high, try changing your footprint.
You just scrape everything you can using the footprints, then load the results into a platform-checking program. I use Ultimate Demon or Licorne for this. Sick Marketing has Sick Platform Reader, which is free and very good. You feed all the harvested URLs into the platform checker, and that's how you get your lists. The trick is to filter the scraped list first so that you have a more manageable set to work with. So obviously remove duplicate URLs and duplicate domains, but also, if you're not harvesting blogs, get rid of all URLs that contain "blog", "image", "wiki", and so on. The filtering process will really help you out, as scraping is like throwing a big fishnet and just grabbing all you can (based on footprints, of course). Hope that helps.
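If you want to do that filtering step outside ScrapeBox, here is a rough Python sketch of the idea described above: drop duplicate URLs, keep one URL per domain, and throw away anything containing words like "blog", "image" or "wiki". The function name `filter_urls` and the `UNWANTED` list are just made up for illustration, and it assumes each harvested URL includes the `http://` or `https://` part.

```python
from urllib.parse import urlparse

# Hypothetical exclusion list: substrings for platforms you are NOT harvesting.
UNWANTED = ("blog", "image", "wiki")


def filter_urls(urls, unwanted=UNWANTED):
    """Return URLs with duplicates, duplicate domains and unwanted substrings removed.

    Assumes every URL carries a scheme (http:// or https://); entries
    without one have no parseable domain and are skipped.
    """
    seen_urls = set()
    seen_domains = set()
    kept = []
    for raw in urls:
        url = raw.strip()
        # Skip blank lines and exact duplicate URLs.
        if not url or url in seen_urls:
            continue
        seen_urls.add(url)

        # Skip URLs containing any unwanted substring (case-insensitive).
        lowered = url.lower()
        if any(word in lowered for word in unwanted):
            continue

        # Keep only the first URL seen for each domain, ignoring a leading "www.".
        domain = urlparse(url).netloc.lower()
        if domain.startswith("www."):
            domain = domain[4:]
        if not domain or domain in seen_domains:
            continue
        seen_domains.add(domain)
        kept.append(url)
    return kept
```

You would read your harvested list from a text file, one URL per line, pass the lines to `filter_urls`, and save the result before feeding it into the platform checker. The substring match is deliberately crude, so it also drops any URL that merely happens to contain one of those words; tighten the list if that removes too much.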
Obviously, start with the keywords you are targeting. Try using the keyword scraper: type in a few keywords and SB will fetch some more related keywords.