
Scraping with ScrapeBox

Discussion in 'Black Hat SEO' started by kurkapan, Jul 16, 2010.

  1. kurkapan

    I have the following problem: I put a huge keyword list into ScrapeBox (around 30,000 keywords) and set it to harvest from all of the engines, with each engine on 150 threads (600 threads total). It starts harvesting, but at some point the different search engines cut me off with error 403 or error 503. Why is that, and how can I prevent it? Am I pushing the threads too hard?
     
  2. Kickflip

    Why are you using such a big keyword list in one go?
     
  3. kurkapan

    Because I want to leave it harvesting, go to sleep, and have a couple of million blogs in the morning... so I can start blasting.
     
  4. symss

    That is certainly a big amount of keywords. Are you using private proxies?
    It may be a problem with your proxies.
     
  5. proxygo

    Too many searches.
    30k keywords is way too much.
    I bet the proxies are timing out.

    The 403 Forbidden error is an HTTP status code that means that accessing the page or resource you were trying to reach is absolutely forbidden for some reason.

    The 503 Service Unavailable error is an HTTP status code that means the web site's server is simply not available at the moment. This is usually due to a temporary overloading or maintenance of the server.
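The practical difference between the two codes is what the scraper should do next: a 503 is temporary and worth retrying after a delay, while a 403 from a search engine usually means the IP is banned and retrying from it is pointless. As a rough illustration (this is not ScrapeBox's internal logic, just a hypothetical sketch), a harvester might map each status to an action like this:

```python
# Hypothetical response-handling policy for a harvester.
# Not ScrapeBox's actual logic; just an illustration of how
# 403 vs. 503 responses call for different reactions.

RETRYABLE = {503}  # temporary server overload: wait, then retry
BLOCKED = {403}    # the engine has banned this IP: rotate the proxy

def next_action(status: int, attempt: int) -> tuple[str, float]:
    """Decide what to do after a response.

    Returns (action, delay_seconds), where action is one of
    'ok', 'retry', or 'rotate_proxy'.
    """
    if status == 200:
        return ("ok", 0.0)
    if status in RETRYABLE:
        # Exponential backoff: 2, 4, 8, ... seconds, capped at 60.
        return ("retry", min(2.0 ** (attempt + 1), 60.0))
    if status in BLOCKED:
        # Retrying from the same IP is pointless; switch proxies.
        return ("rotate_proxy", 0.0)
    return ("retry", 5.0)  # unknown error: modest fixed delay
```

Under a policy like this, a 503 gets retried with growing delays, while a 403 immediately drops the banned proxy, which matches what the error definitions above imply.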
     
  6. sunshyne

    I agree, it might be a proxy issue. I just spent the money to get 10 private ones a month. So far it has been working out well.
     
  7. proxygo

    10 private proxies won't scrape
    across 30k keywords; it's too much.
     
  8. loopline

    Mate, it's probably your proxies. A couple of notes:

    1. Harvesting from all 4 engines will give you a massive amount of duplicates. So much so that I only harvest from one engine for any given keyword. The first couple of pages can vary greatly between engines, but if you're pulling the top 1000 results, probably 95%+ of what you get back will be the same results duplicated.

    2. You can't hammer the engines, as I'm sure you know. That said, if you are running 150 connections on, say, Google, you would need at least 300 proxies so it isn't hitting non-stop from the same IP. Even then you will still get 403'd quickly. I would shoot for 600-1000 proxies if you want to harvest that much and have it run till the list is done.

    I just did a list of 64K keywords, harvesting from Google only, 85 connections, with 400 proxies. Ran it all night last night. It failed out on 12K keywords (as proxies died); the other 52K keywords returned 5.5 million results.
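Both of loopline's points can be sketched numerically. Assuming (hypothetically) that each connection fires about one query per second, the per-proxy hit rate and a cross-engine deduplication pass might look like this; the queries-per-minute figure is an assumption for illustration, not a measured ScrapeBox rate:

```python
# Illustrative numbers only; qpm_per_connection is an assumed rate,
# not anything measured from ScrapeBox.

def queries_per_proxy_per_min(connections: int, proxies: int,
                              qpm_per_connection: float = 60.0) -> float:
    """Average queries each proxy IP sends per minute, assuming the
    load is spread evenly across the proxy pool."""
    return connections * qpm_per_connection / proxies

def dedupe(*result_lists):
    """Merge harvested URL lists from several engines,
    keeping only the first occurrence of each URL."""
    seen, merged = set(), []
    for urls in result_lists:
        for url in urls:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# 150 connections over 300 proxies still averages 30 queries/min per IP;
# doubling the pool to 600 proxies halves that, which is the intuition
# behind the 600-1000 proxy recommendation above.
```

The dedupe pass is the cheap way to measure point 1 for yourself: merge the per-engine harvests and compare the merged count to the raw total to see how much overlap the extra engines actually contributed.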
     
  9. johnrichardjack

    Maybe you should use private proxies for better results.