
Scrapebox Multiple Footprints a la Hrefer

Discussion in 'Black Hat SEO Tools' started by jb2008, Jan 26, 2011.

  1. jb2008

    jb2008 Senior Member

    Joined:
    Jul 15, 2010
    Messages:
    1,158
    Likes Received:
    972
    Occupation:
    Scraping, Harvesting in the Corn Fields
    Location:
    On my VPS servers
    When I first saw the Hrefer interface, as someone accustomed to Scrapebox, I was puzzled. You could have a list of "additive words" (= footprints), not just one? What was this madness?

    Now I see that to truly take scraping to the next level, Scrapebox should at least have the option of multiple footprints in one harvest and, to go with that, mid-harvest proxy updates. I understand that SF is probably very busy at the moment with Scrapeboard and everything, but it's just an idea to make SB into an even bigger monster.

    Although Scrapebox can without doubt harvest massive amounts of URLs (if you've got large keyword lists), there is something 'industrial' about hrefer. SB has the power and is actually faster (in my opinion), but these 'industrial' features are strangely lacking. Is there any reason for that? Right now I see Scrapebox as more of a 'sniper' scraper, while hrefer is more like an Uzi. There's no reason (or is there?) why it can't be both.
     
  2. HealeyV3

    HealeyV3 Power Member

    Joined:
    Mar 4, 2009
    Messages:
    521
    Likes Received:
    344
    I agree. I'd LOVE to load up a list of footprints, and have SB automatically cycle through every one.

    Another thing missing, in my mind, is a way around the 1 million line cap. Why not program it so that if it hits a million lines while scraping, it automatically saves that "dump" file and moves on to the next million? Doesn't really make sense to me.
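    The rollover behaviour described above isn't something SB does; a minimal sketch of the idea in Python, with the file naming and the cap purely illustrative, might look like:

```python
def write_in_chunks(urls, prefix="harvest_dump", cap=1_000_000):
    """Write an iterable of URLs to prefix_0.txt, prefix_1.txt, ...,
    rolling over to a fresh "dump" file every `cap` lines instead of
    stopping at a hard limit."""
    chunk, count, f = 0, 0, None
    try:
        for url in urls:
            if f is None or count >= cap:
                if f:
                    f.close()
                f = open(f"{prefix}_{chunk}.txt", "w")
                chunk += 1
                count = 0
            f.write(url + "\n")
            count += 1
    finally:
        if f:
            f.close()
```

    Because the writer only ever holds one open file and a line counter, the total harvest size no longer matters, which is presumably the point of the feature request.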
     
  3. jb2008

    jb2008 Senior Member

    Joined:
    Jul 15, 2010
    Messages:
    1,158
    Likes Received:
    972
    Occupation:
    Scraping, Harvesting in the Corn Fields
    Location:
    On my VPS servers
    Yes, the duplicates are an issue, but I think there may be a legitimate limit of around 1 million somewhere for performance reasons. A solution to these mega massive lists is to filter by domain on the fly, as hrefer does. But then you still lose a vast chunk of URLs, which you could only get back by extracting the sitemap of each domain or running the site: operator on each one; after that you've got the task of filtering out the irrelevant pages in Notepad++ to come back to the ones which have your footprint.
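    An illustrative sketch of that hrefer-style "filter by domain on the fly" (not hrefer's actual code): keep only the first harvested URL per domain. It also shows why a vast chunk of URLs gets lost, since every later hit on an already-seen domain is dropped even if those pages matched the footprint too.

```python
from urllib.parse import urlparse

def dedupe_by_domain(urls):
    """Keep the first URL seen for each domain, drop the rest."""
    seen = set()
    kept = []
    for url in urls:
        domain = urlparse(url).netloc.lower()
        if domain not in seen:
            seen.add(domain)
            kept.append(url)
    return kept
```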

    An amazing thing about hrefer is that I can close it, and it resumes SEAMLESSLY at the point at which it left. I was really quite shocked when I first observed this.

    Another nifty thing is of course the keyword progress bar, and the precise keyword you are on - with SB you are kind of left in the dark.

    Now I'm just nitpicking. But the three things that stand out for me are:

    1. Multiple footprints instead of just one <--- priority number one
    2. Keyword number (progress) display
    3. Proxy refresh (it could even be as crude as testing another list in another instance of SB and importing it into proxies.txt/creating the new proxies.txt - just the *ability* to refresh proxies mid harvest as hrefer does)
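    The "crude" proxy refresh in point 3 boils down to a file swap: test a fresh list in a separate instance, then replace proxies.txt in one atomic step so the harvester never reads a half-written file. A sketch, assuming a plain ip:port-per-line format and the file name proxies.txt:

```python
import os
import tempfile

def refresh_proxy_file(working_proxies, path="proxies.txt"):
    """Write the tested proxies to a temp file, then atomically
    swap it into place over the old proxy list."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(working_proxies) + "\n")
    os.replace(tmp, path)  # atomic rename on the same filesystem
```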
     
  4. ndev2k

    ndev2k Junior Member

    Joined:
    Nov 4, 2009
    Messages:
    190
    Likes Received:
    41
    Home Page:
    If you have multiple footprints in sb you can actually attach these to all your keywords by clicking the 'M' next to the footprint textbox.

    Simply load your keywords into the keyword section. Save your footprints in a text file and then click the 'M' and load the footprint list. SB puts the footprint with the keywords automatically and you can scrape multiple footprints at once with multiple keywords.
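    Going by that description, the 'M' button effectively takes the cross product of the two lists: every footprint gets paired with every keyword, so N keywords and M footprints yield N*M search queries. A minimal sketch of that pairing (the query format is an assumption, not SB's exact output):

```python
def merge_footprints(footprints, keywords):
    """Pair every footprint with every keyword, producing one
    search query per (footprint, keyword) combination."""
    return [f"{fp} {kw}" for fp in footprints for kw in keywords]
```

    This cross-product behaviour is also why merging blows up the list size, as discussed further down the thread.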
     
    • Thanks x 3
  5. xhpdx

    xhpdx Regular Member

    Joined:
    Sep 21, 2008
    Messages:
    331
    Likes Received:
    2,160
    Occupation:
    Coder
    Location:
    EU
    I've never even noticed the existence of the M button :eek:
     
  6. Scripteen

    Scripteen Elite Member

    Joined:
    Sep 19, 2009
    Messages:
    1,811
    Likes Received:
    1,918
    Home Page:
    It has been there for ages :rolleyes: . Use the "m" button in the harvester to load custom footprints.
     
  7. jb2008

    jb2008 Senior Member

    Joined:
    Jul 15, 2010
    Messages:
    1,158
    Likes Received:
    972
    Occupation:
    Scraping, Harvesting in the Corn Fields
    Location:
    On my VPS servers
    No, the Merge function is not the same as a multiple-additives function because, say, if you had a list of 1 million keywords merged with 10 footprints, it would become a keyword list of 10 million. And if I go over 1 million keywords, SB freezes at the beginning, middle, or most often the end of harvesting.

    With hrefer on the other hand, I can throw in a 5 million keyword list, load it up with 10 additives, set threads to 500 and let her rip. The only thing I do is change proxies every day or two.
     
  8. keinehabe

    keinehabe Supreme Member

    Joined:
    Nov 4, 2008
    Messages:
    1,207
    Likes Received:
    472
    Gender:
    Male
    Occupation:
    -= CEO =-
    Location:
    Heaven
    Home Page:
    I think I saw on the forums a few days back a post made by someone who wrote hand-crafted footprints to scrape multiple blog platforms ... unfortunately I forgot to bookmark the page. If someone finds it there on BHW, maybe they'd be so kind as to bump the thread?

    I've only had Scrapebox for a few days, and I'm still playing with it a little ... the 1 million cap is everywhere and seems fair enough, but for the link indexer addon, for example, it's useless ... I found lists of only 80-100K URLs on the forums :) and those can bring only a few URLs into the domains list to be used for this ...

    btw, a short question if anyone can answer: can Scrapebox use SOCKS4/5 proxies? I don't want to open a new thread lol about this; maybe someone has tested it and figured out how SOCKS proxies can be used for this :)
     
    Last edited: Jan 28, 2011