
Scrapebox proxies don't pass Google test?

Discussion in 'Black Hat SEO Tools' started by raiju, Mar 19, 2016.

  1. raiju

    raiju Junior Member

    Joined:
    Oct 27, 2015
    Messages:
    134
    Likes Received:
    27
    Hello, just a question. Every time I run Scrapebox and use the harvested proxies, none of them pass the Google test. I've done this so many times and tried it with 8k proxies, and not one passed.

    Since I just want to scrape domains and Web 2.0s, should I use private proxies? People told me I can use public proxies, but since I can't get them to work for Google, do I need private ones?
     
  2. redarrow

    redarrow Elite Member

    Joined:
    Apr 1, 2013
    Messages:
    5,152
    Likes Received:
    1,167
    Even some private proxies don't pass Google.

    You need to ask your supplier...
     
  3. JustUs

    JustUs Power Member

    Joined:
    May 6, 2012
    Messages:
    626
    Likes Received:
    585
    See that orange bar on the right? Those are Google passed proxies. See that white bar on the left? Those are total scraped proxies.

    182/10715 ≈ 0.017, so roughly 1.7 percent of all scraped proxies are Google passed.

    The failure rate on a Google check is higher with the Scrapebox proxy scraper because its sources are overused.

    [Image: proxy checker results, Google-passed vs. total scraped proxies]
     
  4. raiju

    raiju Junior Member

    Joined:
    Oct 27, 2015
    Messages:
    134
    Likes Received:
    27
    Damn, so I guess private proxies to scrape with, then?
     
  5. JustUs

    JustUs Power Member

    Joined:
    May 6, 2012
    Messages:
    626
    Likes Received:
    585
    Save the private proxies to post with. Scrape with public proxies, but get a good scraper like GSA, or subscribe to a proxy mailing list.
     
  6. bigbrothers

    bigbrothers Regular Member

    Joined:
    Jul 15, 2014
    Messages:
    391
    Likes Received:
    80
    Gender:
    Male
    Occupation:
    Seo Company
    Well, if you are using proxies with URL Profiler, there are a number of things that could be causing them to fail.

    First of all, it is important to recognise when proxies are actually required. In most cases, you only need proxies to perform tasks such as Google Index checks, Duplicate Content checks and Drop Domain checks.

    You can use proxies for URL Scraping (readability) and HTTP status, but in most cases this is simply not necessary. It becomes necessary if you are profiling a lot of URLs from the same domain and the site in question has security protocols that block repeated requests. You will know if this has happened because you will get 403 Forbidden errors.
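
    (A quick way to see this kind of failure for yourself is to push a single test search through a proxy and look at the response. A minimal sketch, assuming Python 3 with the requests library; the proxy address is a placeholder, and the /sorry/ redirect check is just one common sign that Google has flagged an IP.)
    Code:
    import requests

    def google_passed(proxy, timeout=10):
        """Return True if a Google search through `proxy` comes back clean."""
        try:
            r = requests.get(
                "https://www.google.com/search",
                params={"q": "test"},
                proxies={"http": proxy, "https": proxy},
                headers={"User-Agent": "Mozilla/5.0"},
                timeout=timeout,
            )
        except requests.RequestException:
            return False  # dead, slow, or unreachable proxy
        # A non-200 status, or a redirect to Google's /sorry/ page, means the IP is flagged
        return r.status_code == 200 and "/sorry/" not in r.url

    print(google_passed("http://127.0.0.1:8080"))  # placeholder proxy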

    I recommend private, anonymous proxies. Scrapebox has a handy proxy checker which lets you verify that your proxies are working properly. This is what good proxies look like:

    [Image: proxy pass.png]

    - BB
     
  7. JustUs

    JustUs Power Member

    Joined:
    May 6, 2012
    Messages:
    626
    Likes Received:
    585
    I can pretty much guarantee you that scraping like this will make private proxies fail. This image also shows why I recommend scraping with public proxies; the errors are from proxies that are failing.

    [Image: harvester running, showing proxy connection errors]

    But do as you will.
     
  8. loverdo

    loverdo Newbie

    Joined:
    Aug 12, 2014
    Messages:
    29
    Likes Received:
    14
    I can't find any public Google-passed proxies either. I quit Google and jumped over to Bing.
     
  9. raiju

    raiju Junior Member

    Joined:
    Oct 27, 2015
    Messages:
    134
    Likes Received:
    27
    Do you have any recommendations on how to get public proxies to work for Google? Should I just be scraping a lot more proxies?
     
  10. loopline

    loopline Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    3,799
    Likes Received:
    2,026
    Gender:
    Male
    There isn't a wrong answer. If you can find public proxies by using your own proxy sources, which you can build into Scrapebox, then you can do it.

    Also, Google isn't the only engine, and many public proxies will work for other engines.

    Further, I use private proxies for scraping all the time. I have hundreds of them for other things, so why not use them for scraping? Just use the detailed harvester, add a delay if need be, and as long as you go slow enough that each IP isn't reused too quickly, you're set (see the sketch below). I love private proxies because they are set and forget: no testing, no upkeep, they just work.
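
    (A minimal sketch of that pacing idea, assuming Python 3; the proxy list, query list, and the 30-second per-IP cooldown are illustrative assumptions, not tested values. With N proxies in round robin, a global pause of cooldown / N means each individual IP rests the full cooldown between its uses.)
    Code:
    import time
    from itertools import cycle

    proxies = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]  # placeholders
    queries = ["site:example.com", "inurl:blog"]                # placeholders
    cooldown = 30.0                  # assumed safe rest per IP, in seconds
    delay = cooldown / len(proxies)  # global pause that honours that rest

    for query, proxy in zip(queries, cycle(proxies)):
        # ... issue `query` through `proxy` here ...
        time.sleep(delay)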

    But don't get caught up in what is "right" or "wrong" or what you "should" do; you can find "proof" to support anything.

    Just do what works. Public proxies can be an option if you find good sources or just buy prefiltered lists.

    Private proxies can be an option if you do it correctly as well.
     
  11. immaletyoufinish

    immaletyoufinish Regular Member

    Joined:
    Mar 3, 2016
    Messages:
    219
    Likes Received:
    113
    I got some private proxies, and even though they pass the Google check, I still can't scrape Google with them.
     
  12. loopline

    loopline Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    3,799
    Likes Received:
    2,026
    Gender:
    Male
    Build a custom test in the proxy manager for the type of query you are using. There are a lot of different kinds/levels of blocks, and the proxy tester only checks a basic keyword search.

    https://www.youtube.com/watch?v=P9CbGhfc1aY
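
    (In other words, test the proxy with the exact query style you plan to scrape, since Google often blocks advanced operators such as inurl: or site: at a lower threshold than plain keyword searches. A minimal sketch, assuming Python 3 with the requests library; the proxy address and the /sorry/ redirect heuristic are assumptions, as above.)
    Code:
    import requests

    def passes_query(proxy, query):
        """Test `proxy` with the actual query type you intend to scrape with."""
        try:
            r = requests.get(
                "https://www.google.com/search",
                params={"q": query},
                proxies={"http": proxy, "https": proxy},
                headers={"User-Agent": "Mozilla/5.0"},
                timeout=10,
            )
        except requests.RequestException:
            return False
        return r.status_code == 200 and "/sorry/" not in r.url

    proxy = "http://127.0.0.1:8080"                  # placeholder
    print(passes_query(proxy, "seo"))                # a plain keyword may pass...
    print(passes_query(proxy, 'inurl:"guestbook"'))  # ...while the operator query gets blocked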
     
  13. Bahmer

    Bahmer Regular Member

    Joined:
    Jul 8, 2015
    Messages:
    261
    Likes Received:
    60
  14. Akandor

    Akandor Newbie

    Joined:
    Jan 9, 2015
    Messages:
    35
    Likes Received:
    4
    There are also reverse proxies; they work well.
     
  15. proxygo

    proxygo Jr. VIP

    Joined:
    Nov 2, 2008
    Messages:
    18,470
    Likes Received:
    10,067
    Occupation:
    PROVIDING PROXIES FOR GSA SCRAPING.
    Location:
    BHW
    There are still plenty of Google-passed public proxies; you just have to know where to find them.
    You won't get high numbers via publicly scraped proxies, only from either private proxies
    or port-scanned public proxies, which cannot be scraped.

    Port-scanned public proxies today:
    [Image: port-scanned public proxy results]
     
  16. JustUs

    JustUs Power Member

    Joined:
    May 6, 2012
    Messages:
    626
    Likes Received:
    585

    OMG, the withdrawals; two days in the cardiac care unit with a computer that does not have the site passwords.

    A plug for GSA is that it has a port scanner. But you would not necessarily use a program geared to SEO; for port scanning you could use Nmap and a few other programs if you chose. Port scanning is a time-intensive undertaking.

    As far as proxy lists go, if Tom Pots had an opening on his sales list, you could not go wrong with his proxy lists.
     
  17. proxygo

    proxygo Jr. VIP

    Joined:
    Nov 2, 2008
    Messages:
    18,470
    Likes Received:
    10,067
    Occupation:
    PROVIDING PROXIES FOR GSA SCRAPING.
    Location:
    BHW
    Nmap and the GSA port scanner, as far as I know, aren't TCP port scanners but SYN scanners.
    You want a TCP port scanner for best results. Trust me, I've been doing this since 2003.
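
    (For reference, Nmap actually offers both techniques: -sS is the half-open SYN scan, which is the default when running with raw-socket privileges, and -sT is a full TCP connect scan that completes the three-way handshake. The port list and target range below are placeholders.)

    ## Half-open SYN scan of common proxy ports (Nmap's default as root)
    nmap -sS -p 8080,3128 192.168.1.0/24

    ## Full TCP connect scan of the same ports
    nmap -sT -p 8080,3128 192.168.1.0/24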
     
  18. JustUs

    JustUs Power Member

    Joined:
    May 6, 2012
    Messages:
    626
    Likes Received:
    585
    Here is the GSA proxy port scanner in operation. I started it yesterday and am scanning ports in IP ranges that have several Google-passed proxies, per GSA Proxy Scanner:

    [Image: GSA proxy port scanner in operation]

    As far as Nmap goes, you are misunderstanding what it is. As a tool for pen testing, it must have the ability to connect through a proxy to an IP. If it can connect to an IP through a proxy, it can scan for proxies.

    ## Scan TCP port 80
    nmap -p T:80 192.168.1.1

    The script attempts to connect to www.google.com through the proxy and checks for a valid HTTP response code. Valid HTTP response codes are 200, 301, and 302. If the target is an open proxy, this script causes the target to retrieve a web page from www.google.com.

    nmap --script http-open-proxy.nse \
    --script-args proxy.url=<url>,proxy.pattern=<pattern>

    Script Output

    Interesting ports on scanme.nmap.org (64.13.134.52):
    PORT STATE SERVICE
    8080/tcp open http-proxy
    | proxy-open-http: Potentially OPEN proxy.
    |_ Methods successfully tested: GET HEAD CONNECT
    https://nmap.org/nsedoc/scripts/http-open-proxy.html
    https://nmap.org/book/man-port-scanning-techniques.html

    The default scan is a SYN scan, but that does not mean that other scans are not possible.
    Code:
    local proxy = require "proxy"
    local shortport = require "shortport"
    local stdnse = require "stdnse"
    local string = require "string"
    local table = require "table"
    local url = require "url"
    
    description=[[
    Checks if an HTTP proxy is open.
    
    The script attempts to connect to www.google.com through the proxy and
    checks for a valid HTTP response code. Valid HTTP response codes are
    200, 301, and 302. If the target is an open proxy, this script causes
    the target to retrieve a web page from www.google.com.
    ]]
    
    ---
    -- @args proxy.url Url that will be requested to the proxy
    -- @args proxy.pattern Pattern that will be searched inside the request results
    --
    -- @usage
    -- nmap --script http-open-proxy.nse \
    --      --script-args proxy.url=<url>,proxy.pattern=<pattern>
    -- @output
    -- Interesting ports on scanme.nmap.org (64.13.134.52):
    -- PORT     STATE SERVICE
    -- 8080/tcp open  http-proxy
    -- |  proxy-open-http: Potentially OPEN proxy.
    -- |_ Methods successfully tested: GET HEAD CONNECT
    
    -- Arturo 'Buanzo' Busleiman <[email protected]> / www.buanzo.com.ar / linux-consulting.buanzo.com.ar
    -- Changelog: Added explode() function. Header-only matching now works.
    --   * Fixed set_timeout
    --   * Fixed some \r\n's
    -- 2008-10-02 Vlatko Kosturjak <[email protected]>
    --   * Match case-insensitively against "^Server: gws" rather than
    --     case-sensitively against "^Server: GWS/".
    -- 2009-05-14 Joao Correa <[email protected]>
    --   * Included tests for HEAD and CONNECT methods
    --   * Included url and pattern arguments
    --   * Script now checks for http response status code, when url is used
    --   * If google is used, script checks for Server: gws
    
    author = "Arturo 'Buanzo' Busleiman"
    license = "Same as Nmap--See https://nmap.org/book/man-legal.html"
    categories = {"default", "discovery", "external", "safe"}
    
    --- Performs the custom test, with user's arguments
    -- @param host The host table
    -- @param port The port table
    -- @param test_url The url to send the request
    -- @param pattern The pattern to check for valid result
    -- @return status if any request succeeded
    -- @return response String with supported methods
    function custom_test(host, port, test_url, pattern)
      local lstatus = false
      local response = {}
      -- if pattern is not used, result for test is code check result.
      -- otherwise it is pattern check result.
    
      -- strip hostname
      if not string.match(test_url, "^http://.*") then
        test_url = "http://" .. test_url
        stdnse.debug1("URL missing scheme. URL concatenated to http://")
      end
      local url_table = url.parse(test_url)
      local hostname = url_table.host
    
      local get_status = proxy.test_get(host, port, "http", test_url, hostname, pattern)
      local head_status = proxy.test_head(host, port, "http", test_url, hostname, pattern)
      local conn_status = proxy.test_connect(host, port, "http", hostname)
      if get_status then
        lstatus = true
        response[#response+1] = "GET"
      end
      if head_status then
        lstatus = true
        response[#response+1] = "HEAD"
      end
      if conn_status then
        lstatus = true
        response[#response+1] = "CONNECTION"
      end
      if lstatus then response = "Methods supported: " .. table.concat(response, " ") end
      return lstatus, response
    end
    
    --- Performs the default test
    -- First: Default google request and checks for Server: gws
    -- Second: Request to wikipedia.org and checks for wikimedia pattern
    -- Third: Request to computerhistory.org and checks for museum pattern
    --
    -- If any of the requests is successful, the proxy is considered open
    -- If all get requests return the same result, the user is alerted that
    -- the proxy might be redirecting his requests (very common on wi-fi
    -- connections at airports, cafes, etc.)
    --
    -- @param host The host table
    -- @param port The port table
    -- @return status if any request succeeded
    -- @return response String with supported methods
    function default_test(host, port)
      local fstatus = false
      local cstatus = false
      local get_status, head_status, conn_status
      local get_r1, get_r2, get_r3
      local get_cstatus, head_cstatus
    
      -- Start test n1 -> google.com
      -- making requests
      local test_url = "http://www.google.com"
      local hostname = "www.google.com"
      local pattern  = "^server: gws"
      get_status, get_r1, get_cstatus = proxy.test_get(host, port, "http", test_url, hostname, pattern)
      local _
      head_status, _, head_cstatus = proxy.test_head(host, port, "http", test_url, hostname, pattern)
      conn_status = proxy.test_connect(host, port, "http", hostname)
    
      -- checking results
      -- conn_status use a different flag (cstatus)
      -- because test_connection does not use patterns, so it is unable to detect
      -- cases where you receive a valid code, but the response does not match the
      -- pattern.
      -- if it was using the same flag, program could return without testing GET/HEAD
      -- once more before returning
      local response = {}
      if get_status then fstatus = true; response[#response+1] = "GET" end
      if head_status then fstatus = true; response[#response+1] = "HEAD" end
      if conn_status then cstatus = true; response[#response+1] = "CONNECTION" end
    
      -- if proxy is open, return it!
      if fstatus then return fstatus, "Methods supported: " .. table.concat(response, " ") end
    
      -- if we receive a invalid response, but with a valid
      -- response code, we should make a next attempt.
      -- if we do not receive any valid status code,
      -- there is no reason to keep testing... the proxy is probably not open
      if not (get_cstatus or head_cstatus or conn_status) then return false, nil end
      stdnse.debug1("Test 1 - Google Web Server\nReceived valid status codes, but pattern does not match")
    
      test_url = "http://www.wikipedia.org"
      hostname = "www.wikipedia.org"
      pattern  = "wikimedia"
      get_status, get_r2, get_cstatus = proxy.test_get(host, port, "http", test_url, hostname, pattern)
      head_status, _, head_cstatus = proxy.test_head(host, port, "http", test_url, hostname, pattern)
      conn_status = proxy.test_connect(host, port, "http", hostname)
    
      if get_status then fstatus = true; response[#response+1] = "GET" end
      if head_status then fstatus = true; response[#response+1] = "HEAD" end
      if conn_status then
        if not cstatus then response[#response+1] = "CONNECTION" end
        cstatus = true
      end
    
      if fstatus then return fstatus, "Methods supported: "  .. table.concat(response, " ") end
    
      -- same valid code checking as above
      if not (get_cstatus or head_cstatus or conn_status) then return false, nil end
      stdnse.debug1("Test 2 - Wikipedia.org\nReceived valid status codes, but pattern does not match")
    
      test_url = "http://www.computerhistory.org"
      hostname = "www.computerhistory.org"
      pattern  = "museum"
      get_status, get_r3, get_cstatus = proxy.test_get(host, port, "http", test_url, hostname, pattern)
      conn_status = proxy.test_connect(host, port, "http", hostname)
    
      if get_status then fstatus = true; response[#response+1] = "GET" end
      if conn_status then
        if not cstatus then response[#response+1] = "CONNECTION" end
        cstatus = true
      end
    
      if fstatus then return fstatus, "Methods supported:" .. table.concat(response, " ") end
      if not get_cstatus then
        stdnse.debug1("Test 3 - Computer History\nReceived valid status codes, but pattern does not match")
      end
    
      -- Check if GET is being redirected
      if proxy.redirectCheck(get_r1, get_r2) and proxy.redirectCheck(get_r2, get_r3) then
        return false, "Proxy might be redirecting requests"
      end
    
      -- Check if at least CONNECTION worked
      if cstatus then return true, "Methods supported:" .. table.concat(response, " ") end
    
      -- Nothing works...
      return false, nil
    end
    
    portrule = shortport.port_or_service({8123,3128,8000,8080},{'polipo','squid-http','http-proxy'})
    
    action = function(host, port)
      local supported_methods = "\nMethods successfully tested: "
      local fstatus = false
      local def_test = true
      local test_url, pattern
    
      test_url, pattern = proxy.return_args()
    
      if(test_url) then def_test = false end
      if(pattern) then pattern = ".*" .. pattern .. ".*" end
    
      if def_test
        then fstatus, supported_methods = default_test(host, port)
        else fstatus, supported_methods = custom_test(host, port, test_url, pattern);
      end
    
      -- If any of the tests were OK, then the proxy is potentially open
      if fstatus then
        return "Potentially OPEN proxy.\n" .. supported_methods
      elseif not fstatus and supported_methods then
        return supported_methods
      end
      return
    
    end
    However, Proxygo is another decent proxy list seller. Many find it more convenient to subscribe to proxy lists rather than to scan for proxies, because proxy scanning is a time- and resource-intensive process.

    Any proxy list seller such as Tom Pots or Proxygo is well worth the money they ask, given how time intensive proxy scanning is!
     
    Last edited: Mar 24, 2016
  19. Devil Rider

    Devil Rider BANNED

    Joined:
    Jul 24, 2015
    Messages:
    554
    Likes Received:
    59
    Buddy, there are also reverse proxies; they work well...
     
  20. Iqbal khan

    Iqbal khan Newbie

    Joined:
    Feb 27, 2017
    Messages:
    1
    Likes Received:
    0
    Gender:
    Male

    I am new to Scrapebox. I bought Scrapebox and private proxies from Instant Proxies, and those proxies keep failing the test on Scrapebox with 403 Forbidden. When I try to harvest with Scrapebox public proxies, nothing comes up as passed on Google. Can someone please help me? Regards.
     


    Last edited: Mar 2, 2017