
Scrapebox: is there a way to exclude blogs you already harvested from fresh harvests?

Discussion in 'Black Hat SEO Tools' started by Ranko Jones, Jun 10, 2011.

  1. Ranko Jones

    Ranko Jones BANNED

    Joined:
    Mar 3, 2011
    Messages:
    1,677
    Likes Received:
    146
    Is there some way to set it up so that SB doesn't harvest blogs you've already done?

    Doing compares afterwards is fine, but it would save tons of harvesting time if it could just exclude ones you've already harvested... like a blacklist, but for harvested URLs...

    I am currently using the time function to scrape fresh blogs, but since I am harvesting all the time, if I stop it to get them for posting, won't it just start back at the beginning and reharvest most of the same ones, thus wasting time?

    So is there a way around this?
     
  2. blackhatdavid

    blackhatdavid Regular Member

    Joined:
    Nov 5, 2009
    Messages:
    296
    Likes Received:
    106
    There is no way to do this. It would eventually be too time-consuming. Each URL you scrape would need to be compared against (possibly) millions that you have scraped before. It would take forever!
     
  3. BigLarry

    BigLarry Regular Member

    Joined:
    Mar 2, 2011
    Messages:
    218
    Likes Received:
    144
    I'm guessing you've sort of already answered your own question.

    You can't stop it harvesting as such, but I think you can stop it reposting by trimming to root and then adding them to your local blacklist.
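    Roughly, that "trim to root then blacklist" step can be reproduced outside of SB in a few lines of Python. This is a minimal sketch of the idea, not anything from Scrapebox itself, and the file names are made up:

    Code:
    # Reduce each harvested URL to its root domain and append it to a local blacklist.
    from urllib.parse import urlparse

    def trim_to_root(url):
        # e.g. http://example.com/blog/post -> http://example.com/
        parts = urlparse(url.strip())
        return f"{parts.scheme}://{parts.netloc}/"

    with open("harvested_urls.txt", "r", encoding="utf-8") as f:
        roots = {trim_to_root(line) for line in f if line.strip()}

    # Append the unique roots to the blacklist so they get skipped next time.
    with open("blacklist.txt", "a", encoding="utf-8") as f:
        for root in sorted(roots):
            f.write(root + "\n")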
     
  4. kez1000

    kez1000 Supreme Member

    Joined:
    Jul 24, 2009
    Messages:
    1,402
    Likes Received:
    1,340
    Location:
    UK
    1. Go to where you have installed SB on your PC, i.e. C:programs/scrapebox/blacklist
    2. Then paste all of the URLs that you do not want to spam again into blacklist.txt.
    3. Now the harvester will delete those URLs if they are harvested again, and SB will also not
    post on those URLs.

    When I scrape SEO WP blogs, SB auto-deletes around 7,000 blogs that I have already spammed :)

    Hope this answers your question
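    The same filtering can also be reproduced by hand outside of SB: load blacklist.txt into a set and drop any harvested URL that appears in it. A minimal sketch, assuming one URL per line in both files (file names are only examples):

    Code:
    def load_urls(path):
        with open(path, "r", encoding="utf-8") as f:
            return {line.strip().lower() for line in f if line.strip()}

    blacklist = load_urls("blacklist.txt")
    harvested = load_urls("harvested_urls.txt")

    # Set difference: keep only URLs that are not already blacklisted.
    fresh = harvested - blacklist

    with open("fresh_urls.txt", "w", encoding="utf-8") as f:
        f.write("\n".join(sorted(fresh)))

    print(len(harvested) - len(fresh), "URLs removed,", len(fresh), "kept")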
     
    • Thanks x 1
  5. Ranko Jones

    Ranko Jones BANNED

    Joined:
    Mar 3, 2011
    Messages:
    1,677
    Likes Received:
    146
    Yeah, I was wondering if the blacklist would do this.
     
  6. kez1000

    kez1000 Supreme Member

    Joined:
    Jul 24, 2009
    Messages:
    1,402
    Likes Received:
    1,340
    Location:
    UK
    The funny thing is that I have had SB for two years and I am only using the
    blacklist now LOL :D
     
  7. Ranko Jones

    Ranko Jones BANNED

    Joined:
    Mar 3, 2011
    Messages:
    1,677
    Likes Received:
    146
    So will it actually prevent it from harvesting them or just delete them before you use them?

    Just wondering, because it would make a bit of a difference to deciding when to tell it to stop harvesting, depending on the answer.
     
  8. jb2008

    jb2008 Senior Member

    Joined:
    Jul 15, 2010
    Messages:
    1,158
    Likes Received:
    972
    Occupation:
    Scraping, Harvesting in the Corn Fields
    Location:
    On my VPS servers
    Don't use the time function. That will restrict your harvests too much. You need different keyword lists; concentrate on diversifying your keyword lists rather than any fancy shit in SB. Get used to the compare function: whether you like it or not, it is 100% necessary for doing most stuff these days.
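    On the keyword side, one way to keep harvests diversified is to split a master keyword file into smaller batches and run each batch as its own harvesting session. A rough sketch; the file name and batch size are arbitrary examples:

    Code:
    CHUNK_SIZE = 5000  # keywords per harvesting session

    with open("keywords_master.txt", "r", encoding="utf-8") as f:
        keywords = [line.strip() for line in f if line.strip()]

    # Write each batch to its own file: keywords_batch_001.txt, keywords_batch_002.txt, ...
    for i in range(0, len(keywords), CHUNK_SIZE):
        batch = keywords[i:i + CHUNK_SIZE]
        name = "keywords_batch_%03d.txt" % (i // CHUNK_SIZE + 1)
        with open(name, "w", encoding="utf-8") as out:
            out.write("\n".join(batch))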
     
  9. Ranko Jones

    Ranko Jones BANNED

    Joined:
    Mar 3, 2011
    Messages:
    1,677
    Likes Received:
    146
    Well, I downloaded maruk's motherload of keywords from here, and he told me I wouldn't have to harvest any more keywords, as that list is pretty exhaustive.

    Thing is as well, I'm thinking you're gonna spend most of your time harvesting the same shit, which is why I wanted something to exclude them in the first place. But does the blacklist actually exclude without harvesting, or do it after?

    If I harvest 100k blogs one day and 150k blogs the next day, then I will spend most of the 150k harvest reharvesting the first 100k.
     
  10. Monrox

    Monrox Power Member

    Joined:
    Apr 9, 2010
    Messages:
    615
    Likes Received:
    579
    You can't prevent them from being harvested. SB downloads the whole results page, then extracts all the links, then puts them in the URL list. G and others can be asked for one match at a time, but that is for using their services officially and won't work for scraping on a large scale (= you'd get banned).

    Also, as someone already mentioned, it is way more computationally efficient to clean everything up in one go afterwards than to exclude single URLs as they become available.
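    To illustrate why the cleanup happens after the fact: each results page comes back as one blob of HTML, every link gets pulled out of it, and deduping is a single cheap set pass over the whole batch at the end. A generic sketch of that flow, not Scrapebox's actual internals:

    Code:
    from html.parser import HTMLParser

    class LinkExtractor(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value and value.startswith("http"):
                        self.links.append(value)

    def extract_links(results_page_html):
        parser = LinkExtractor()
        parser.feed(results_page_html)
        return parser.links

    # Raw HTML of each results page, fetched elsewhere (tiny stand-in here).
    downloaded_pages = [
        '<html><body><a href="http://blog-one.example.com/post">a</a>'
        '<a href="http://blog-two.example.com/post">b</a></body></html>',
    ]

    all_links = []
    for page_html in downloaded_pages:
        all_links.extend(extract_links(page_html))

    unique_links = sorted(set(all_links))  # one dedupe pass over everything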
     
    • Thanks x 1
  11. muchacho

    muchacho Supreme Member

    Joined:
    May 14, 2009
    Messages:
    1,293
    Likes Received:
    187
    Location:
    Lancashire, England.

    I was thinking this, but after a few emails between myself and the owner of Scrapebox, I agree it's bad advice. It would severely slow the process down as the list gets bigger, to such a degree you may as well do the import & compare after it's completed.

    A good way to split the harvesting sessions up is via keywords. I have something like 400k keywords, which I harvest 5-10,000 at a time. When you find all the URLs on the domain of an existing AA URL + find spammers' backlinks, you can find millions upon millions of AA URLs without the need to harvest.

    If you're harvesting for other reasons, then just keep a 'history' folder, split into subfolders depending on what you wish to remove, and stick with the import and compare function.
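    The "history folder" idea boils down to comparing a fresh harvest against every URL list saved from earlier sessions. A small sketch, assuming one URL per line; the folder and file names are just examples:

    Code:
    from pathlib import Path

    # Collect every URL already seen in previous sessions (one .txt file per session).
    seen = set()
    for old_file in Path("history").glob("**/*.txt"):
        for line in old_file.read_text(encoding="utf-8").splitlines():
            if line.strip():
                seen.add(line.strip())

    new_harvest = [
        line.strip()
        for line in Path("todays_harvest.txt").read_text(encoding="utf-8").splitlines()
        if line.strip()
    ]

    fresh = [url for url in new_harvest if url not in seen]
    Path("todays_fresh.txt").write_text("\n".join(fresh), encoding="utf-8")
    # Then move todays_harvest.txt into the history folder so tomorrow's run compares against it too.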
     
    • Thanks x 1
    Last edited: Jun 10, 2011
  12. Ranko Jones

    Ranko Jones BANNED

    Joined:
    Mar 3, 2011
    Messages:
    1,677
    Likes Received:
    146
    Monrox, you look evil, is this intentional? Evil genius look :D?

    I guess it won't be too hard if I make one huge list, or a few, rather than tons of tiny lists. Can you get it to compare multiple files at once, maybe with the Ctrl or Shift functions on open? That would make things quicker...

    Awesome, I just tried it and it seems I can do that; comparing just got a whole lot easier :).
     
    Last edited: Jun 10, 2011