
trouble harvesting G with scrapebox

Discussion in 'Black Hat SEO' started by flc735, Jun 3, 2012.

  1. flc735

    flc735 Regular Member

    Joined:
    Apr 30, 2011
    Messages:
    284
    Likes Received:
    82
    Occupation:
    Writer
    Location:
    Los Angeles, CA
    i know most of the ins and outs of scrapebox. i'm trying to harvest pliggs from a few thousand footprints i have. i just bought 20 sb proxies from newipnow. any time i run it, it stops after a few minutes. i've tried playing with the settings: low max connections, the multi-threaded harvester turned off, a delay set on the single-threaded harvester, all with no luck. it scrapes well, but after a few minutes the proxies all become blocked by google. yet when i recheck them with the harvester, it says they are fine for google.
    so what am i doing wrong? i just want to go to bed and let it run through all the kw's. it doesn't matter to me if it's 50 connections or 1 at a time with a 10 second delay.
     
  2. virtualc08

    virtualc08 Supreme Member

    Joined:
    Mar 23, 2010
    Messages:
    1,379
    Likes Received:
    951
    Is your SB updated? How many keywords are you using?
     
  3. g33k0f9x

    g33k0f9x Regular Member

    Joined:
    Mar 16, 2012
    Messages:
    262
    Likes Received:
    137
    Occupation:
    IM
    Location:
    Orgrimmar
    what's your home internet situation like?
     
  4. flc735

    flc735 Regular Member

    Joined:
    Apr 30, 2011
    Messages:
    284
    Likes Received:
    82
    Occupation:
    Writer
    Location:
    Los Angeles, CA
    yes, it's up to date.
    5,000 pligg footprints (actually 70,000, but at this point i'll be happy with 5k).
    but that's not the issue: i can't get it to scrape more than 50 or so before the proxies are blocked, every time, over the past few days that i have been trying.

    from everything else i've read, people put in 10 private proxies and scrape tens of thousands overnight. i can't even get 1,000 with my 20 private proxies. i have also tried other proxies with no luck, and asked newipnow for sb proxies. they worked great for the first 5 minutes; now they are all blocked after my 5 minute, single-threaded, 1 second delay run.
     
  5. flc735

    flc735 Regular Member

    Joined:
    Apr 30, 2011
    Messages:
    284
    Likes Received:
    82
    Occupation:
    Writer
    Location:
    Los Angeles, CA
    my internet is very fast. no problems there.
    when i test the proxies with the harvester, all 20 pass the G check and all are medium or high speed.
     
  6. flc735

    flc735 Regular Member

    Joined:
    Apr 30, 2011
    Messages:
    284
    Likes Received:
    82
    Occupation:
    Writer
    Location:
    Los Angeles, CA
    ok, wtf!
    so far, when i have fewer than 1,000 footprints in the keyword box, it works (at least for a few seconds).
    then when i try it again with my 70,000 footprints, they are all blocked from the start.
    tried it with 8,000, then 2,000: both blocked.
    then i tried it again with 900, and it works... i turned it off after a few seconds.
    then i tried it with 2,000 again and it didn't work.

    this makes no sense to me. can someone explain? why does the number of kw's affect its ability to scrape at the very beginning?
    i've been using the single-threaded harvester with a delay, 100 results, google only, on every try.
     
  7. scrapefox

    scrapefox Power Member

    Joined:
    Dec 3, 2011
    Messages:
    692
    Likes Received:
    277
    How many connections are you using? If you're just using your 20 private proxies, they will get banned fast for scraping. Have you tried a few hundred public proxies instead?

    edit - One of my servers is idle at the moment for an hour or so. If you'd like to send me the footprints I'd happily try a quick scrape for you.
     
    Last edited: Jun 3, 2012
  8. flc735

    flc735 Regular Member

    Joined:
    Apr 30, 2011
    Messages:
    284
    Likes Received:
    82
    Occupation:
    Writer
    Location:
    Los Angeles, CA
    i've tried everything. i am using the single-threaded harvester with a delay while i try to figure this out. i'm having a little more luck now, but it's still very bad. and frustrating! argh!
    scrapefox, thanks for the offer, but i'm trying to add a bunch of bookmarks and a few other platforms to my magic submitter, so i would prefer to figure it out the hard way for now. if you have a few minutes, i would like to ask you some questions on skype or something to help me along. i'll pm you. feel free to ignore it if you don't have the time. no worries!
     
  9. scrapefox

    scrapefox Power Member

    Joined:
    Dec 3, 2011
    Messages:
    692
    Likes Received:
    277
    Feel free to PM. I'll help if I can, but you might get a better response in the thread. Many heads are better than one :)
     
    • Thanks x 1
  10. BacklinkBasket

    BacklinkBasket Registered Member

    Joined:
    May 12, 2012
    Messages:
    56
    Likes Received:
    4
    I'm having the exact same problems and it's driving me mad.
     
  11. Typlo

    Typlo Jr. VIP Jr. VIP Premium Member

    Joined:
    Sep 22, 2010
    Messages:
    1,285
    Likes Received:
    499
    Location:
    New York
    Home Page:
    Scrape Google + Bing and use a crap-ton of proxies. You'll need to find a decent source for a large number of public proxies that is updated at least a few times a day, ideally one whose URLs CAN'T be plugged straight into the Scrapebox proxy harvester, since that means most other automated tools can't pick them up either. Preferably you'd copy/paste the proxy list manually (and then check it if you'd like).

    I'll throw a few thousand proxies into a text file in the morning and evening. Twice a week I'll manually scan them to get rid of the bad ones. Since there are usually 10,000+ proxies in the file, it doesn't really matter if half are bad; there are still plenty that will pass the Google test and successfully harvest, so I don't worry about constantly checking them.
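    The dump-file workflow above benefits from a quick cleanup pass before any manual scan. This is a hypothetical Python sketch, not anything Scrapebox provides: the `clean_proxy_list` helper and the one-`ip:port`-per-line file format are assumptions.

```python
import re

# Assumed format: one ip:port entry per line, possibly with duplicates
# and junk mixed in from the public-proxy source.
PROXY_RE = re.compile(r"^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}):(\d{1,5})$")

def clean_proxy_list(lines):
    """Return unique, syntactically valid ip:port entries, order preserved."""
    seen = set()
    good = []
    for line in lines:
        entry = line.strip()
        if not entry or entry in seen:
            continue
        m = PROXY_RE.match(entry)
        if not m:
            continue
        octets = [int(x) for x in m.groups()[:4]]
        port = int(m.group(5))
        if all(o <= 255 for o in octets) and 1 <= port <= 65535:
            seen.add(entry)
            good.append(entry)
    return good

raw = [
    "12.34.56.78:8080",
    "12.34.56.78:8080",   # duplicate
    "999.1.1.1:3128",     # impossible octet
    "not a proxy",
    "98.76.54.32:80",
]
print(clean_proxy_list(raw))  # ['12.34.56.78:8080', '98.76.54.32:80']
```

    This only weeds out garbage lines; whether a surviving proxy actually passes the Google test still has to be checked live, as the post describes.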
     
  12. lostgringos

    lostgringos Senior Member

    Joined:
    Dec 5, 2008
    Messages:
    837
    Likes Received:
    266
    Occupation:
    Online Reputation Manager
    Location:
    Dumaguete City, Philippines
    I am having the same problem. My SB hangs, and starts and stops in dribbles. I use Proxy Gobbler to scrape my proxies, and it is very slow at filtering them. I really need something that will filter out the bad proxies fast! My VP proxies go dead fast, and Yahoo spits out a 999 error or something... I am running this on a fast VPS and it still hangs. Maybe I put in too many keywords? Around 2,000.
     
  13. flc735

    flc735 Regular Member

    Joined:
    Apr 30, 2011
    Messages:
    284
    Likes Received:
    82
    Occupation:
    Writer
    Location:
    Los Angeles, CA
    based on my sb experiences, i couldn't fathom doing this. normally, i'll paste in a few thousand proxies, filter them, and repeat every hour or so. would it be better to not test them and just let the harvester pull from the larger untested list?

    i'm getting better results with a smaller keyword list, but it's still far from how fast other people seem to use it.
    right now, i've split my 72k list into 72 1k lists and am getting better results, but still not great. it only gets through a few hundred at most. at this rate, i will have to run 140-210 harvest sessions. ugh.

    bing gives me very poor results with the footprints i am using. do they not have a lot of pliggs in their index, or do i have to alter the footprints to accommodate bing?
    here are a few examples:
    inurl:"/pligg" inurl:/register.php nurse
    inurl:"faq-en.php" intext:"pligg" name
    inurl:"pligg/upcoming.php" hat
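    Splitting a 72k footprint file into 1k-line lists like this is easy to script rather than do by hand. A minimal Python sketch; the `chunk_keywords` helper and the `footprints_001.txt` naming scheme are assumptions, not part of Scrapebox.

```python
def chunk_keywords(keywords, size=1000):
    """Yield successive lists of at most `size` keywords."""
    for i in range(0, len(keywords), size):
        yield keywords[i:i + size]

def write_chunks(keywords, prefix="footprints", size=1000):
    # Assumed naming: footprints_001.txt, footprints_002.txt, ...
    for n, chunk in enumerate(chunk_keywords(keywords, size), start=1):
        with open(f"{prefix}_{n:03d}.txt", "w") as f:
            f.write("\n".join(chunk) + "\n")

# Tiny demo with a chunk size of 3 instead of 1000:
kws = [f'inurl:"/pligg" {w}' for w in ["nurse", "hat", "name", "dog"]]
print([len(c) for c in chunk_keywords(kws, 3)])  # [3, 1]
```

    Each output file can then be loaded into the keyword box as its own harvest session.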
     
  14. flc735

    flc735 Regular Member

    Joined:
    Apr 30, 2011
    Messages:
    284
    Likes Received:
    82
    Occupation:
    Writer
    Location:
    Los Angeles, CA
    have you always had trouble with this or is this something recent?
     
  15. StressKills

    StressKills Registered Member

    Joined:
    Aug 24, 2011
    Messages:
    77
    Likes Received:
    2
    Occupation:
    Web Designer, Martial Art Instructor
    Location:
    Huntington Beach, CA
    Home Page:
    I'm having the same problem with "Powered by Elgg": 20 private proxies and it stops after 3 seconds... WTF
     
  16. blackieman

    blackieman Power Member

    Joined:
    Jan 28, 2008
    Messages:
    762
    Likes Received:
    79
    Advanced operators, e.g. inurl:, trip up google pretty fast. You might consider not using them, harvesting a much larger list instead, and writing a program to go through it afterwards and find the URLs that really match the footprint. It is more work initially to write the program, but g* is really smart now about detecting scraping. I think in the long run this is less work than scraping for new, unburnt proxies every hour.
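    The post-filter idea above might look something like this in Python. The patterns are lifted from the footprint examples earlier in the thread, and `looks_like_pligg` is a hypothetical helper, not part of any tool mentioned here:

```python
import re

# URL fragments taken from the pligg footprints posted above; harvest
# broadly without inurl: operators, then keep only matching URLs.
PLIGG_PATTERNS = [
    re.compile(r"/pligg", re.I),
    re.compile(r"/register\.php", re.I),
    re.compile(r"upcoming\.php", re.I),
    re.compile(r"faq-en\.php", re.I),
]

def looks_like_pligg(url):
    """True if the URL matches any of the assumed pligg footprints."""
    return any(p.search(url) for p in PLIGG_PATTERNS)

harvested = [
    "http://example.com/pligg/register.php",
    "http://example.com/blog/post-about-hats",
    "http://site.org/upcoming.php?category=news",
]
matches = [u for u in harvested if looks_like_pligg(u)]
print(matches)  # only the two pligg-looking URLs survive
```

    Since google never sees the operators, the queries look like ordinary searches, and the filtering cost moves to your own machine where nothing gets banned.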