
python crawler

Discussion in 'Black Hat SEO' started by 777lady, Jan 13, 2015.

  1. 777lady

    777lady Newbie

    Hello,

    I'm trying to write a Python SERP crawler that uses proxies.
    The problem is that the proxies work in a browser but not with the Python script: Google blocks the request.

    This is the relevant line in my script:

    response = requests.get(url, proxies=proxies, headers=headers, verify=True)
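    One common pitfall with this call: requests expects `proxies` to be a dict that maps a URL scheme ("http", "https") to a full proxy URL, including the port and, for authenticated private proxies, the credentials. Below is a minimal sketch of building that dict; `make_proxies` is a hypothetical helper, and the host, port, and credentials are placeholders (203.0.113.10 is a documentation-only address).

```python
from urllib.parse import quote


def make_proxies(host, port, user=None, password=None, scheme="http"):
    """Build a proxies dict in the shape requests expects (hypothetical helper).

    host/port/user/password are placeholders for your own proxy details.
    The same proxy URL is used for both http:// and https:// targets.
    """
    auth = ""
    if user is not None and password is not None:
        # Percent-encode credentials so characters like '@' or ':' don't break the URL
        auth = f"{quote(user, safe='')}:{quote(password, safe='')}@"
    proxy_url = f"{scheme}://{auth}{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


proxies = make_proxies("203.0.113.10", 8080, user="me", password="p@ss")
print(proxies["https"])  # http://me:p%40ss@203.0.113.10:8080
```

    The result is passed straight through as `requests.get(url, proxies=proxies, headers=headers)`. Most HTTP proxies are addressed with an `http://` URL even for the `https` key, because HTTPS traffic is tunnelled through them with CONNECT; check what your provider expects.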

    Can anyone help me?

    Thanks

    Lady
     
  2. bartosimpsonio

    bartosimpsonio Jr. VIP Jr. VIP Premium Member

    Are you changing the user agent string? Google blocks the default user agent strings sent by Python libraries.
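    For reference, requests identifies itself as `python-requests/<version>` and urllib as `Python-urllib/3.x` unless you override the header. A minimal sketch of overriding it with the standard library; `build_request` is a hypothetical helper and the UA string is a placeholder. With requests, the equivalent is passing `headers={"User-Agent": ...}`.

```python
import urllib.request

# Placeholder UA string; substitute the browser string you actually want to send.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0"


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that sends user_agent instead of urllib's default."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("https://example.com/")
# urllib normalises header names with str.capitalize(), so the key is "User-agent"
print(req.get_header("User-agent"))  # prints the BROWSER_UA string
```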
     
  3. 777lady

    777lady Newbie

    Yes, I do. I have about 20 user agents, and I switch to a different one on every request.
    Are there maybe some user agents that Google doesn't like?
     
  4. myopic1

    myopic1 Regular Member

    You are correct: a proxy that works in your browser does not apply to a script run from the terminal unless you explicitly set it up in your code.

    Simply do the following:

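    import requests

    arrayOfProxies = ["xxx.xxx.xxx.xxx", "xxx.xxx.xxx.xxx"]
    for x in arrayOfProxies:
        # requests wants a dict mapping URL scheme to proxy URL, not a bare IP string
        proxies = {"http": "http://" + x, "https": "http://" + x}
        r = requests.get(url, headers=headers, proxies=proxies)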

     
    Last edited: Jan 13, 2015
  5. 777lady

    777lady Newbie

    I tried this too, but with the same result. :(
     
  6. TheVegan

    TheVegan Junior Member

    First, I'd say just don't use Google; they're pretty good at blocking. I would use Yahoo or some other search engine... But you can also use Tor: if you're using Python, it's pretty easy to connect through Tor.
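    If you go the Tor route, the usual approach from Python is not Tor Browser itself but Tor's local SOCKS proxy. A minimal sketch, assuming a Tor client is already running locally and that requests is installed with SOCKS support (`pip install "requests[socks]"`); `tor_proxies` is a hypothetical helper.

```python
# Assumes a local Tor client is running. The standalone tor daemon listens for
# SOCKS connections on 127.0.0.1:9050 by default; Tor Browser uses 9150.
TOR_SOCKS_PORT = 9050


def tor_proxies(port=TOR_SOCKS_PORT, host="127.0.0.1"):
    """Proxies dict for requests that routes traffic through Tor's SOCKS port.

    'socks5h' (rather than 'socks5') makes DNS lookups go through Tor as well.
    """
    url = f"socks5h://{host}:{port}"
    return {"http": url, "https": url}


proxies = tor_proxies()
# Usage (needs network and a running Tor client):
#   import requests
#   r = requests.get("https://check.torproject.org/", proxies=proxies, timeout=30)
```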
     
  7. tony_d

    tony_d Elite Member

    That would be about as useful as a chocolate teapot... Who wants a SERP checker that only checks Yahoo? Nobody.
     
  8. seoproranker

    seoproranker Newbie

    Few public proxies will work against G; you need to use dedicated private proxies.

    And if you're not in a hurry, Tor isn't so bad... really, I think... never tried Tor, hehe.
     
  9. myopic1

    myopic1 Regular Member

    Joined:
    Mar 24, 2014
    Messages:
    408
    Likes Received:
    404
    I really don't know why that wouldn't work for you, but I'd put good money on it not being the fault of the requests library, which is a pretty awesome library. My advice would be to:
    A) check that your proxies actually work;
    B) check what sort of authentication your proxies use (if any) and make sure you're providing it;
    C) use a different library, such as urllib2 or something similar.
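    On option C: urllib2 exists only in Python 2; in Python 3 its functionality lives in `urllib.request`. A minimal sketch of the same proxy setup with that module, which also works as a cross-check when ruling out the requests library; `build_proxy_opener` is a hypothetical helper and the proxy URL and credentials are placeholders.

```python
import urllib.request

# Placeholder proxy URL (documentation IP range). Credentials embedded in the URL
# are turned by urllib into a Basic Proxy-Authorization header.
PROXY_URL = "http://user:secret@203.0.113.10:8080"


def build_proxy_opener(proxy_url, user_agent):
    """Return an OpenerDirector that sends every request through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    # Passing our own ProxyHandler replaces the default one build_opener would add
    opener = urllib.request.build_opener(handler)
    # Replace urllib's default 'Python-urllib/3.x' User-Agent
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


opener = build_proxy_opener(PROXY_URL, "Mozilla/5.0 (placeholder)")
# Usage (needs network): html = opener.open("https://example.com/", timeout=30).read()
```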
     
  10. 777lady

    777lady Newbie

    I checked A and B, and I'm using dedicated private proxies. I'll try C now; maybe that will solve the problem. I'll let you know if it works.