ScrapeBox URL scrape error 429

Discussion in 'Black Hat SEO Tools' started by xfreedom, Jul 9, 2019.

  1. xfreedom

    xfreedom Newbie

    Joined:
    Oct 25, 2018
    Messages:
    8
    Likes Received:
    0
    Hello all,

    I wonder if someone can help me with this. Whenever I try to scrape URLs from Google, I only get 100 results before it stops. The logs show a 429 error. Then I have to wait about 8 to 12 hours before I can try again.
    About my configuration: I use 10 private proxies with one thread and max timeout. I'm scraping Google FR and I updated my engine list. Also, my keyword contains a "site:" and an "intitle:" operator.
    I'm totally clueless on how to fix this, so if anyone has an idea, I'm all ears. Thanks!
     
  2. Gogol

    Gogol Jr. VIP Jr. VIP

    Joined:
    Sep 10, 2010
    Messages:
    5,293
    Likes Received:
    4,991
    Gender:
    Male
    Occupation:
    Programmer
    Location:
    Pale Blue Dot
    Home Page:
    It probably has to do with the proxies (server response 429 = too many requests). Try another premium proxy provider, or use the proxy harvester to generate Google-passed proxies (or whatever it was called, I forgot lol).
    Tagging @loopline in case he has some better solution for you.
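
    To make "429 = too many requests" concrete, here is a minimal sketch of how a client might decide how long to wait after a 429. It is not how ScrapeBox handles the error internally. The function name `retry_delay`, the 5-second base, and the one-hour cap are assumptions chosen for illustration, and whether Google actually sends a `Retry-After` header with these responses is not confirmed in this thread.

```python
import random


def retry_delay(status_code, attempt, retry_after=None,
                base=5.0, cap=3600.0, jitter=0.0):
    """Return the seconds to wait before retrying, or None if no wait is needed.

    status_code: HTTP status of the last response.
    attempt:     0 for the first retry, 1 for the second, and so on.
    retry_after: raw value of a Retry-After header, if the server sent one.
    base, cap, jitter: hypothetical tuning values, not taken from any tool.
    """
    if status_code != 429:
        return None  # only 429 (Too Many Requests) triggers a wait here

    # Prefer the server's own hint when it is a plain number of seconds.
    if retry_after is not None:
        try:
            return min(float(retry_after), cap)
        except ValueError:
            pass  # Retry-After may also be an HTTP date; fall back to backoff

    # Exponential backoff: base, 2*base, 4*base, ... limited by cap.
    delay = min(base * (2 ** attempt), cap)
    return delay + random.uniform(0, jitter)


if __name__ == "__main__":
    for attempt in range(4):
        print(attempt, retry_delay(429, attempt))
```

    Backing off only helps with rate-based blocks. If the proxy IPs themselves are flagged, waiting on the same IPs may not clear the error.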
     
    Last edited: Jul 9, 2019
  3. xfreedom

    xfreedom Newbie

    Joined:
    Oct 25, 2018
    Messages:
    8
    Likes Received:
    0
    I use private proxies and they pass the Google test, but I'm still unable to scrape URLs. With a simpler keyword I had the same issue, only after a greater number of results, like 300, which isn't much. And that is for only one keyword.

    Anyway, thanks for the answer and tagging @loopline
     
  4. loopline

    loopline Jr. VIP Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    4,944
    Likes Received:
    2,638
    Gender:
    Male
    Home Page:
    429 is an error I've started to see pop up from Google in the past couple of months. It's basically an IP block error for all intents and purposes. Google blocks based on all kinds of things, and not all blocks are created equal. Even the history of the IPs with Google comes into play, so what someone did with the IPs before you got them affects how you can use them with Google.

    Basically here is more info




    and then also some people just prefer to go this method (which is also what I do)

     
  5. proxygo

    proxygo Jr. VIP Jr. VIP

    Joined:
    Nov 2, 2008
    Messages:
    31,745
    Likes Received:
    12,617
    Gender:
    Male
    Occupation:
    Proxies Back Connect
    Location:
    UK - ALWAYS ON BHW
    Home Page:
  6. xfreedom

    xfreedom Newbie

    Joined:
    Oct 25, 2018
    Messages:
    8
    Likes Received:
    0
    OK. I've seen lots of your videos and they're a great help, but you didn't mention that particular error, and you generally talk about scraping different keywords. Since I'm scraping only one, I wanted to know if I could try something else before trying rotating proxies, like maybe changing the user-agent or things like that.
    Anyway, thank you, I'll look into other proxy services to use.
     
  7. loopline

    loopline Jr. VIP Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    4,944
    Likes Received:
    2,638
    Gender:
    Male
    Home Page:
    429 is only something I've started seeing in the past couple of months, so it's fairly new. I'm not saying you have to have back connect proxies, just that the exact proxies you have may have a bad history with Google and so may be getting banned really quickly, is all.
     
  8. xfreedom

    xfreedom Newbie

    Joined:
    Oct 25, 2018
    Messages:
    8
    Likes Received:
    0
    OK, so for the record, or if anyone else gets the same problem: I tried some rotating proxies and I still had the same error. But I found the problem. Since I was searching through Google FR, I changed the "fr" in the search query but didn't change the marker for the next page...
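
    For anyone who hits the same thing, here is a rough sketch of why a next-page marker has to match the language of the results page. This is not ScrapeBox's actual code. The marker strings "Next" and "Suivant", the sample HTML, and the helper `find_next_page` are assumptions for illustration only, and real Google markup is different.

```python
from html.parser import HTMLParser


class NextLinkFinder(HTMLParser):
    """Collect the href of the first <a> whose visible text contains the marker."""

    def __init__(self, marker):
        super().__init__()
        self.marker = marker
        self._href = None     # href of the <a> currently being read, if any
        self._text = []       # text pieces seen inside that <a>
        self.next_url = None  # result: href of the matching link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if self.next_url is None and self.marker in "".join(self._text):
                self.next_url = self._href
            self._href = None


def find_next_page(html, marker):
    """Return the next-page URL, or None when the marker is not on the page."""
    finder = NextLinkFinder(marker)
    finder.feed(html)
    finder.close()
    return finder.next_url


if __name__ == "__main__":
    # Hypothetical French results page: the pagination link says "Suivant".
    fr_page = '<div><a href="/search?start=10"><span>Suivant</span></a></div>'
    print(find_next_page(fr_page, "Next"))     # English marker finds nothing
    print(find_next_page(fr_page, "Suivant"))  # French marker finds the link
```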

    I planned to try the rotating proxies anyway, and after a few tests I can see they work better than dedicated proxies for scraping Google.


    I've got another question, related to the number of results. The difference between the estimate and the actual results is huge: the estimate is around 1k results, while what I get is 300 results. I tried the rc=1 parameter (maybe in the wrong way), and the estimate is still around 1000 results while the actual results are still around 300.
    Any way to improve that?
     
    Last edited: Jul 10, 2019
  9. loopline

    loopline Jr. VIP Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    4,944
    Likes Received:
    2,638
    Gender:
    Male
    Home Page:
    Google often soft-caps results now at somewhere between 300 and 600. They put so much effort into page 1, and they know that only .0001% of people will ever make it past result 300, so they figure if you haven't found it by then, you're not going to. This is especially true the more advanced your search query is.

    Further, to be honest, Google's relevancy can be terrible even at result 900. If you are searching for a ski resort in Colorado, you might get results for lawn care in London, UK or ghost hunting equipment (literally). So I'd guess they may not even want to show results past 300 anyway, hehe
     
  10. xfreedom

    xfreedom Newbie

    Joined:
    Oct 25, 2018
    Messages:
    8
    Likes Received:
    0
    Yeah, I saw the video where you explained the relevance of results past 300. Still, it's a shame, since for this particular case I needed all the URLs from a subsection of a forum with a specific keyword.
    I guess 300 will have to do !

    Thanks again for the answer.
     
  11. loopline

    loopline Jr. VIP Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    4,944
    Likes Received:
    2,638
    Gender:
    Male
    Home Page:
    Just tack on keywords. So let's say your query is

    site:domain.com inurl:subsection

    just do like

    site:domain.com inurl:subsection a
    site:domain.com inurl:subsection b
    site:domain.com inurl:subsection c
    site:domain.com inurl:subsection 1
    site:domain.com inurl:subsection 2
    site:domain.com inurl:subsection car
    site:domain.com inurl:subsection purple
    etc..

    That forces Google to return different sets of results from their database, and then you just remove duplicates when you are done.
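
    The same idea as a short sketch, in case you want to build the keyword list outside ScrapeBox and paste it in. The helper names `expand_queries` and `dedupe` and the sample URLs are made up for illustration. The base query and the extra words are the ones from the list above.

```python
def expand_queries(base_query, extras):
    """Append each extra word to the base query, one query per word."""
    return [f"{base_query} {extra}" for extra in extras]


def dedupe(urls):
    """Drop blank lines and repeated URLs while keeping first-seen order."""
    seen = set()
    unique = []
    for url in urls:
        key = url.strip()
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


if __name__ == "__main__":
    base = "site:domain.com inurl:subsection"
    extras = ["a", "b", "c", "1", "2", "car", "purple"]
    for query in expand_queries(base, extras):
        print(query)

    # Results harvested from several queries overlap, so merge then dedupe.
    harvested = [
        "http://domain.com/subsection/thread-1",
        "http://domain.com/subsection/thread-2",
        "http://domain.com/subsection/thread-1",
    ]
    print(dedupe(harvested))
```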

    Alternatively, if you are working in only 1 domain, use the function that grabs URLs by crawling a site and let ScrapeBox crawl it. Just keep connections low, like 2 or 3, so you don't get blocked by the site.



    You can then filter out any unwanted URLs quickly and easily when it's done.
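
    If you would rather do that filtering in a script than in ScrapeBox, a minimal sketch could look like this. The function `keep_subsection`, the domain, and the "/subsection/" path are placeholders for illustration, not anything ScrapeBox provides.

```python
from urllib.parse import urlparse


def keep_subsection(urls, domain, path_part):
    """Keep URLs on the given domain (or a subdomain) whose path contains path_part."""
    kept = []
    for url in urls:
        parts = urlparse(url)
        host = parts.netloc.lower()
        on_domain = host == domain or host.endswith("." + domain)
        if on_domain and path_part in parts.path:
            kept.append(url)
    return kept


if __name__ == "__main__":
    crawled = [
        "https://domain.com/subsection/thread-1",
        "https://www.domain.com/other/thread-2",
        "https://elsewhere.example/subsection/thread-3",
    ]
    print(keep_subsection(crawled, "domain.com", "/subsection/"))
```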

    Cheers!
     
  12. xfreedom

    xfreedom Newbie

    Joined:
    Oct 25, 2018
    Messages:
    8
    Likes Received:
    0
    Actually my query is site:domain.com intitle:keyword but from what you said, I can just add the inurl:subsection which will indeed provide more results.
    Thanks loopline !
     
  13. loopline

    loopline Jr. VIP Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    4,944
    Likes Received:
    2,638
    Gender:
    Male
    Home Page:
    You're welcome, have a great day!