ScrapeJet - IP gets blocked while scraping?

Discussion in 'Black Hat SEO Tools' started by omida86, Jan 24, 2012.

  1. omida86

    omida86 Power Member

    Joined:
    Feb 15, 2011
    Messages:
    791
    Likes Received:
    181
    Occupation:
    SEO Consultant, Business Web Developer
    Location:
    Earth
    Is it just me, or does the IP often get blocked by Google when scraping? Especially with several profiles...

    NHSEO doesn't use Google but Bing, so the IP never gets blocked since it uses the Bing API... but of course Google is better, so I get better results with ScrapeJet... still, it would be good if they could fix the blocked IP issue...

    Anybody else have the same issue?
     
  2. cloakndagger

    cloakndagger Power Member

    Joined:
    Oct 31, 2010
    Messages:
    613
    Likes Received:
    173
    Use a few proxies. If you're harvesting, it is normal for your proxies to go dead after a while with Google and Yahoo. I hope you are using proxies.
     
  3. omida86

    omida86 Power Member

    Well, with NHSEO it doesn't get blocked since it uses the Bing API...

    But I guess you are right, some private proxies would do wonders.
     
  4. Bryan

    Bryan Power Member

    Joined:
    Aug 25, 2009
    Messages:
    565
    Likes Received:
    292
    I'd use public proxies to scrape (that way you can get 500 public proxies and scrape hard) and private proxies to post.

    Nothing ScrapeJet can do really lol, it's Google.
     
  5. omida86

    omida86 Power Member

    Well, ScrapeJet doesn't have the option at the moment to use proxies only while scraping, so it uses the same proxies while posting, which decreases the success rate.

    What they should do is use Google until it's blocked, and after it's blocked use the Bing API like NHSEO does...
     
  6. softtouch2009

    softtouch2009 Senior Member

    Joined:
    Dec 2, 2009
    Messages:
    1,001
    Likes Received:
    225
    Occupation:
    Programming
    Location:
    ssdnet.biz
    No, the Bing API is just crap. Bing was implemented in the first couple of versions, and the results were not satisfying.
    Google gives way better results, and is faster.

    I use private proxies and have never been blocked so far.

    Note that ScrapeJet accesses Google only once, for one page, then starts posting, and resumes harvesting after posting, for another page, so I don't think Google blocked you because of harvesting. ScrapeJet balances the harvesting very well, and even when not using any proxy, I do not get blocked.
    Is it possible that your proxies are just bad / public and dead? Or do you run multiple instances of ScrapeJet, all using the same proxies, and all accessing Google at the same time?
     
  7. omida86

    omida86 Power Member

    Hmm, no, I'm not running any proxies...

    softtouch, the issue I have is that when I load a lot of keywords (1000s) from a text file, ScrapeJet behaves differently than when I enter 3 keywords in the field. Instead of scraping 100 and posting, it goes through ALL the keywords and scrapes before it starts posting, resulting in my IP being banned. So right now, I either have to load a very small text list or only enter 3 keywords to avoid being banned. It should behave better when loading a huge keyword list from a text file...

    Can you guys add the option of choosing to use proxies only for scraping but not for posting? I wish to use public proxies for scraping but not for posting...

    I think you are right regarding Bing being crap... I tried some searching and Google returns far more results than Bing when scraping for WordPress blogs. Maybe that's why I get so many more backlinks with ScrapeJet than with NHSEO... Thanks for clarifying that.

    Right now ScrapeJet actually does a pretty good job of posting to AA blogs :) Keep up the good work, and I hope Captcha Sniper is integrated soon :)
     
    Last edited: Jan 26, 2012
  8. omida86

    omida86 Power Member

    Actually, I realized the linklock I did was wrong, which caused a lot of issues.

    It should be www.url.com {anchor1|anchor2} (with a space before the opening brace), but I had www.url.com{anchor1|anchor2}

    Now that I corrected it, everything works fine, including loading keywords from a huge text file. My bad.

    Will do more testing and report back.
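
For anyone hitting the same formatting mistake, here is a minimal sketch of a checker for that line format. It assumes only what the post above states: a URL, a space, then pipe-separated anchors inside braces. The function name `parse_link_line` and the regular expression are hypothetical and are not taken from ScrapeJet itself, whose actual parser is not documented in this thread.

```python
import re

# Hypothetical pattern: URL, at least one whitespace character, then {a|b|...}.
LINK_LINE = re.compile(r"^(?P<url>\S+)\s+\{(?P<anchors>[^{}]*)\}$")


def parse_link_line(line):
    """Split one 'url {anchor1|anchor2}' line into (url, [anchors]).

    Raises ValueError when the line does not follow that layout,
    for example when the space before the opening brace is missing.
    """
    match = LINK_LINE.match(line.strip())
    if match is None or "{" in match.group("url"):
        raise ValueError(f"malformed link line: {line!r}")
    # Drop empty entries such as the one produced by a trailing '|'.
    anchors = [a.strip() for a in match.group("anchors").split("|") if a.strip()]
    if not anchors:
        raise ValueError(f"no anchors found in: {line!r}")
    return match.group("url"), anchors


if __name__ == "__main__":
    for candidate in ("www.url.com {anchor1|anchor2}", "www.url.com{anchor1|anchor2}"):
        try:
            print(candidate, "->", parse_link_line(candidate))
        except ValueError as err:
            print("rejected:", err)
```

Running a check like this over a link list before loading it would flag the missing-space case up front instead of leaving it to show up as odd behaviour later.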
     
  9. softtouch2009

    softtouch2009 Senior Member

    It does not do it this way. It does not matter where the keywords come from, harvested or loaded; it's the same for the poster.
    What you describe looks to me (and that's the only way it could happen) like the harvester cannot find any results for a keyword, so it won't post, but resumes harvesting for the next keyword. It would look like it harvests URLs for all keywords, when in fact it did not get results and just continued with the next keyword. I will take a look at it and come up with a solution for this rare case.

    Noted and added to the todo list, thanks.
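
The behaviour described in this reply can be illustrated with a small simulation. This is only a sketch of the control flow as explained above, simplified to one harvest step per keyword; `run_profile`, `harvest`, and `post` are hypothetical names, the harvester is a stub that does no network access, and none of this is ScrapeJet's actual code.

```python
def run_profile(keywords, harvest, post):
    """Alternate harvesting and posting, one keyword at a time.

    `harvest(keyword)` returns a list of URLs; `post(url)` submits to one URL.
    Returns a log of the steps taken so the sequence can be inspected.
    """
    log = []
    for keyword in keywords:
        urls = harvest(keyword)
        if not urls:
            # No results: there is nothing to post, so move on to the next keyword.
            log.append(f"harvest {keyword}: 0 urls, posting skipped")
            continue
        log.append(f"harvest {keyword}: {len(urls)} urls")
        for url in urls:
            post(url)
        log.append(f"post {keyword}: {len(urls)} urls")
    return log


if __name__ == "__main__":
    # Stub data: only the last keyword yields any results.
    fake_results = {"kw1": [], "kw2": [], "kw3": ["http://example.com/a"]}
    posted = []
    for entry in run_profile(fake_results, lambda kw: fake_results[kw], posted.append):
        print(entry)
```

When every keyword comes back empty, the log contains only harvest steps, which from the outside would look like the tool harvesting the whole list before posting anything.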
     
  10. omida86

    omida86 Power Member

    Thanks! To use public proxies only for scraping and PR checking would be awesome.
     
  11. omida86

    omida86 Power Member

    Screen Shot 2012-01-25 at 9.10.17 PM.png

    Softtouch, when running multiple profiles, ScrapeJet gets stuck on the second profile like in the screenshot above. It just says "Harvesting Url for Keyword Done". Am I doing something wrong?

    Thx!
     
  12. softtouch2009

    softtouch2009 Senior Member

    Hm, as far as I know, that's the only report related to this. Possibly there's a little bug, possibly something else caused it, possibly 50 threads are too much for your connection (I run 25 threads at 11 Mbps). Threads are not just "threads" like in ScrapeBox; each of the connections is doing a hell of a job, and 50 could just be too much. This could also cause the 15% CPU I see in your screenshot (my CPU is at 0%-1% only).
    Please forward the scrapejet.sil file you can find in the configuration folder to support; it will give us some clues. But in order to get a valid .sil file, you need to exit/terminate ScrapeJet first.
     
  13. omida86

    omida86 Power Member

    Alright. I will try running 25 threads on a more powerful computer. (I have 10 Mbit.)

    If the issue persists, I will forward the .sil file.
     
  14. omida86

    omida86 Power Member

    Capture.PNG

    So I ran SJ on a powerful quad-core i5 with 6 GB of RAM and a good internet connection for 10 hours.

    The second profile works, but it can't keep up with the first profile. The first profile has 4 times as many submitted links.

    Both profiles have identical settings and the same keyword list.

    Thanks!
     
    Last edited: Jan 26, 2012