Scrapebox won't scrape from Google.

Discussion in 'Black Hat SEO' started by Xlr8, Jul 27, 2016.

  1. Xlr8

    Xlr8 BANNED BANNED Jr. VIP

    Joined:
    Oct 17, 2014
    Messages:
    430
    Likes Received:
    174
    Hey,
    Yesterday I took out my copy of Scrapebox, which I bought almost a year ago. Never really got to use it much apart from some Web 2.0 scraping on rare occasions.

    So I bought 10 dedicated proxies and fired up Scrapebox.

    The 1st scrape went well and I harvested the URLs I needed.

    However, when I went for the 2nd harvesting session I ran into a problem: Scrapebox is not scraping links from Google.

    I did not change the settings at all.

    It's not the proxies, as they are working fine and I am able to harvest URLs from Yahoo and Bing.

    When I start harvesting, Scrapebox completes the scrape in 2-3 seconds and returns 0 results, even though I have thousands of keywords/footprints loaded.

    I'd appreciate it if anyone could help me figure out what the issue is.

    PS - I don't know if this has something to do with the problem, but when I test the Google engine in the Harvester Engine configuration it shows NEXT PAGE MARKER: NOT FOUND. Bing and Yahoo both return FOUND.

    EDIT
    Tried using the detailed harvester, but it's just stuck. It keeps rotating the IPs without any results.
    Screenshot_1.png
     
  2. bartosimpsonio

    bartosimpsonio Jr. VIP Jr. VIP Premium Member

    Joined:
    Mar 21, 2013
    Messages:
    12,792
    Likes Received:
    11,442
    Occupation:
    COINZ
    Location:
    BUYAH
    Home Page:
    Makes sense if it found the 1st page, crawled it like you said, but couldn't go to the next page.
     
  3. Xlr8

    Xlr8 BANNED BANNED Jr. VIP

    Joined:
    Oct 17, 2014
    Messages:
    430
    Likes Received:
    174
    I'm sorry. I did not quite understand what you said there.
     
  4. bartosimpsonio

    bartosimpsonio Jr. VIP Jr. VIP Premium Member

    Joined:
    Mar 21, 2013
    Messages:
    12,792
    Likes Received:
    11,442
    Occupation:
    COINZ
    Location:
    BUYAH
    Home Page:
    You mentioned it crawled the first time and then failed. It might not be finding the next page link in order to follow through.
     
  5. Xlr8

    Xlr8 BANNED BANNED Jr. VIP

    Joined:
    Oct 17, 2014
    Messages:
    430
    Likes Received:
    174
    Pardon me if I sound dumb. I'm not good with technical things.

    I manually entered the query that Scrapebox is trying to scrape, and Google returned 10 pages of results.

    I don't understand why Scrapebox wouldn't be able to find the next page link. lol

    Any idea how to solve this issue?
     
  6. Disloyal

    Disloyal Jr. VIP Jr. VIP

    Joined:
    May 3, 2012
    Messages:
    361
    Likes Received:
    223
    Occupation:
    Geek
    Location:
    Ethernet Cable
    I see the delay you set is 0. When you are scraping Google you have to set a delay of at least 30-60 seconds, depending on how advanced the footprints are.

    Your proxies most likely got banned by Google. Ask your provider for a new set, and this time use the detailed harvester and play around with the delay. You might only need a 30 second delay, or maybe even a 300 second delay.
     
    • Thanks Thanks x 1
  7. Pakal

    Pakal Junior Member

    Joined:
    Dec 6, 2015
    Messages:
    120
    Likes Received:
    57
    Gender:
    Male
    Location:
    http://bit.cards
    Google is getting harder and harder to harvest. What I noticed is that Webcrawler and Bing return almost identical data to Google, but with fewer restrictions. I stopped harvesting Google a long time ago because they ban proxies very fast. When I do need to harvest Google, I do it without proxies, using my own IP because it is dynamic. Once my IP gets banned, I stop the harvest and reset my connection. That gives me a new IP, and harvesting goes much faster. Of course, the downside is that you have to watch it, as the IP gets burned really fast.
     
  8. mnkassier

    mnkassier Newbie

    Joined:
    May 15, 2015
    Messages:
    47
    Likes Received:
    6
    How do we know that our proxies are banned by Google? The reason I am asking is that I can search manually through my proxy but I cannot scrape with SB, even though the proxies pass the Google search test in the proxy manager.
     
  9. Xlr8

    Xlr8 BANNED BANNED Jr. VIP

    Joined:
    Oct 17, 2014
    Messages:
    430
    Likes Received:
    174
    Proxies work fine. I checked them and they passed the Google test in Scrapebox.

    Tried changing the delay too.

    However, I noticed that SB is able to scrape from Google when I do not put in any footprints, just the keywords.

    When I put in the footprints, that's when it starts giving me the issue.

    Any idea why this may be happening?
     
    Last edited: Jul 27, 2016
  10. Xlr8

    Xlr8 BANNED BANNED Jr. VIP

    Joined:
    Oct 17, 2014
    Messages:
    430
    Likes Received:
    174
    And now it's back to not harvesting URLs, even without footprints.
     
    Last edited: Jul 27, 2016
  11. Disloyal

    Disloyal Jr. VIP Jr. VIP

    Joined:
    May 3, 2012
    Messages:
    361
    Likes Received:
    223
    Occupation:
    Geek
    Location:
    Ethernet Cable
    See, the thing is, when you test them they say they are good. But when you're scraping and it returns 0 results or an error, it means that Google blocked your proxies.
    We know the proxies work for Bing and Yahoo but not for Google, which means it is a proxy issue (banned). Google is a pain in the ass to scrape from compared to the other SEs.

    It takes like 24 hours for proxies to be unbanned by Google.

    Also, after the proxies are banned the delay won't work, since Google is already blocking them.

    Give this a shot: put in just one of the footprints and run the detailed harvester without any proxies, using only your own IP. If that works, it means your proxies are banned.
    Make sure to set a delay before running it, so you don't get your IP banned.
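
    The test described above boils down to a small decision table. A rough sketch of that reasoning in Python, under stated assumptions: the function name, its arguments, and the result strings are all made up for illustration and are not part of Scrapebox. Each argument is just the number of URLs a test run harvested.

```python
def diagnose(google_via_proxies, other_engines_via_proxies, google_via_own_ip):
    """Guess why Google returns nothing, from three harvested-URL counts.

    Hypothetical helper for illustration only; it does no scraping itself.
    """
    if google_via_proxies > 0:
        # Google still answers through the proxies, so nothing is blocked.
        return "proxies can still harvest Google"
    if other_engines_via_proxies == 0:
        # Nothing works through the proxies at all (not even Bing/Yahoo).
        return "proxies may be dead or misconfigured"
    if google_via_own_ip > 0:
        # Same footprint works from your own IP but not through the proxies.
        return "proxies are likely banned by Google"
    return "inconclusive: check footprints, delay, and harvester settings"


# The situation in this thread: Bing/Yahoo work, Google returns 0 via proxies.
print(diagnose(google_via_proxies=0,
               other_engines_via_proxies=500,
               google_via_own_ip=100))
```

    The last branch matters: if Google also returns nothing from your own IP, the proxies are not necessarily the culprit.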

    I've gotten my proxies banned by Google before... so I know the feeling lol.

    Watch this video:


    It will help @Calmly69 and @mnkassier
     
    • Thanks Thanks x 4
  12. Donald Trump

    Donald Trump Registered Member

    Joined:
    May 15, 2016
    Messages:
    98
    Likes Received:
    13
    This happened to me before too, but I don't use proxies, just one search query at a time from my own IP. Google banned mine before and it worked again the next day. What I do now is set the delay to 60 seconds. That way, if you scrape 300 results, the first 100 come in quickly, then another 100 one minute later, and then the last 100 another minute after that.

    I also just use "search term" "plus another term" in quotes, so I can use Yahoo and Bing as well.
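
    The timing in that 60 second example can be worked out with a few lines of Python. This is only a sketch of the arithmetic, assuming 100 results per page as in the post; the function name and parameters are invented for illustration and have nothing to do with Scrapebox's internals.

```python
def page_arrival_times(total_results, results_per_page=100, delay_seconds=60):
    """Return the second at which each page of results arrives.

    Assumes the first page arrives immediately and every later page
    waits one full delay after the previous one.
    """
    # Ceiling division: 250 results still need 3 pages of 100.
    pages = -(-total_results // results_per_page)
    return [page * delay_seconds for page in range(pages)]


print(page_arrival_times(300))  # [0, 60, 120]
```

    With a 300 second delay the same 300 results would take 10 minutes per query, which is why the delay needs tuning when there are thousands of keywords.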
     
  13. Xlr8

    Xlr8 BANNED BANNED Jr. VIP

    Joined:
    Oct 17, 2014
    Messages:
    430
    Likes Received:
    174
    Damn.
    I did not know Google was this strict about the proxy issue.
    I guess I know what the problem is now. Maybe my proxies have been temporarily banned.
    I'll wait a day or two and then start again with the detailed harvester and a delay.
    Thanks for pointing it out buddy. Appreciate it :)
     
  14. mnkassier

    mnkassier Newbie

    Joined:
    May 15, 2015
    Messages:
    47
    Likes Received:
    6
    Thanks @Disloyal for the explanation. And how should we set the connection settings for 10 private proxies?
     
  15. Skyebug77

    Skyebug77 Jr. VIP Jr. VIP

    Joined:
    Mar 22, 2012
    Messages:
    2,208
    Likes Received:
    1,618
    Occupation:
    Marketing
    Location:
    Portland,Or
    • Thanks Thanks x 2
  16. JustUs

    JustUs Power Member

    Joined:
    May 6, 2012
    Messages:
    626
    Likes Received:
    588
    Google bans proxies very fast when you use advanced operators. Even in a manual search with operators like "site," "inurl," "intext" and the like, if I scroll down the page, don't find what I am looking for, and move on to the next page, Google will throw a captcha by the third or fourth page.

    Unfortunately, Scrapebox does not have the functionality to solve a search engine's (Google's) captchas. It would be nice. With this in mind, either use a lot of public proxies (I would recommend GSA Proxy Scraper), or set a delay of one to three minutes when scraping Google.
     
  17. BellasDad

    BellasDad Registered Member

    Joined:
    Dec 12, 2016
    Messages:
    55
    Likes Received:
    12
    Gender:
    Male
    Had the same issue today, was way frustrated!!!
    Thanks @Disloyal for the vid. Turns out I had SB set to "use custom harvester" rather than "use detailed harvester". I made the change, added a 10 sec delay, and WHAM... aces now.
     
  18. Trololololo

    Trololololo Newbie

    Joined:
    Oct 14, 2015
    Messages:
    12
    Likes Received:
    0
    Does ScrapeBox still not support solving the Google search engine captcha? If not, maybe there is some way to write a custom plugin to do that. Does anybody know if ScrapeBox has a programmer's API?