
Developing in Python

Discussion in 'Other Languages' started by blasphemous, Jan 28, 2012.

  1. blasphemous

    blasphemous Newbie

    Joined:
    Jan 28, 2012
    Messages:
    2
    Likes Received:
    0
    Just thought I'd share what I'm working on with y'all. I would greatly appreciate constructive criticism and/or advice on making things work better.

    Disclaimer: I just started learning Python 2 days ago [25JAN2012 at the time of writing] (and my computer science education ended at the high school level several years ago). As such, please excuse any grievous transgressions I make in speech, implementation, or action.

    The following is my attempt at a proxy-checker. I realize that I may just be reinventing the wheel (props to BHW's captchaman for his beautiful Proxy Madness 4), but this particular script is going to be a component in a much larger (and hopefully useful) suite.

    More on that later. Sorry about the code not being 100% out-of-the-post functional; I apparently need to wait longer before I can have links in my posts.

    Code:
    # takes untested.txt full of proxies and tries to connect to yahoo.
    # if successful, adds it to confirmed.txt and then tries the next one in the list.
    # untested.txt's format is one proxy entry per line - ip.ip.ip.ip:port
    import socks # SocksiPy - provides the proxy-aware socket used below
    
    class proxyChecker():
    	def test(self, proxtype):
    		goodList = open('confirmed.txt', 'w')
    		for line in open('untested.txt'):
    			if not line.strip():
    				continue # skip blank lines
    			print 'starting new check'
    			pIP, pPort = line.strip().split(':')
    			if self.check(proxtype, pIP, int(pPort)):
    				goodList.write(line)
    				print 'verified: ', line.strip()
    		goodList.close()
    
    	def check(self, proxtype, addr, port):
    		# returns True if the proxy accepts a connection, False otherwise
    		s = socks.socksocket()
    		s.setblocking(True)
    		s.settimeout(10)
    		if proxtype == 'http': # 'is' tests identity, not equality - use == for strings
    			s.setproxy(socks.PROXY_TYPE_HTTP, addr, port)
    		elif proxtype == 'socks':
    			s.setproxy(socks.PROXY_TYPE_SOCKS5, addr, port)
    		try:
    			s.connect(('www.yahoo.com', 80)) # host per the comment up top; the original link was filtered out
    		except Exception, detail:
    			print 'Exception: ', detail
    			return False
    		finally:
    			s.close()
    		return True
    
    Drone = proxyChecker()
    Drone.test('http')
     
    Last edited: Jan 28, 2012
  2. Deusdies

    Deusdies Regular Member

    Joined:
    May 22, 2009
    Messages:
    261
    Likes Received:
    190
    I'm the creator of the tool in my sig. uberBlogCreator was written in Python. Since uBC incorporates a proxy checker, I'll share a portion of the code here :)

    Code:
    import mechanize # needed for the Browser object below
    
    goodProxies = []
    br = mechanize.Browser()
    proxies = open(proxyfile, mode="r") # load the proxy file; proxyfile is defined elsewhere in uBC
    parsed_proxies = [p.strip() for p in proxies.readlines()]
    proxies.close()
    
    for proxy in parsed_proxies:
        try:
            br.set_proxies({"http": proxy})
            br.open("http://www.google.com", timeout=3)
            print "The proxy in testproxy works!"
            goodProxies.append(proxy)
        except: # mechanize raises an exception if it cannot reach the host
            print "The proxy in testproxy does not work"
            continue # 'return' was a syntax error here and would have stopped at the first bad proxy
    
    if goodProxies:
        goodProxyFile = open("goodproxyfile.txt", "w")
        goodProxyFile.write("\n".join(goodProxies)) # write() takes a string, not a list
        goodProxyFile.close()
    
    This doesn't check for SOCKS proxies though, only HTTP (and HTTPS). However, because of the mechanize library, it also supports private proxies (those requiring authentication).
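
    For reference, a minimal sketch of that private-proxy support (the address and credentials below are placeholders):

    Code:
    import mechanize
    
    br = mechanize.Browser()
    br.set_proxies({"http": "127.0.0.1:8080"}) # placeholder proxy
    br.add_proxy_password("user", "password") # credentials for a private proxy
    br.open("http://www.google.com", timeout=3)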
     
    • Thanks x 2
    Last edited: Jan 28, 2012
  3. blasphemous

    blasphemous Newbie

    Joined:
    Jan 28, 2012
    Messages:
    2
    Likes Received:
    0
    Something about the script I just finished (dealing with internet relay chat interfaces) triggers the moderation filter - I can't post it somewhere and link to it, and I'm not sure I should be trying to circumvent a filter anyway.

    So here's the concept:

    A class inherits from irclib's SingleServerIRCBot, and uses
    Code:
    lines = (file.read()).splitlines()
    to populate a list with the configuration details (stored in SW.CONF for ease of modification).

    The list is then used to start the SingleServerIRCBot subclass (each line is one of the constructor arguments, in order).

    I've tested it, and it connects to a server and acknowledges most commands - the only one I'm having trouble with is the command that should make it change its own nickname - akin to a /nick somenick command in a common client.
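
    A minimal sketch of the design as described, assuming python-irclib's SingleServerIRCBot (the SW.CONF field order and the "!nick" command syntax are illustrative guesses):

    Code:
    from ircbot import SingleServerIRCBot
    
    class Drone(SingleServerIRCBot):
    	# SW.CONF is assumed to hold: server, port, channel, nickname - one per line
    	def __init__(self, server, port, channel, nickname):
    		SingleServerIRCBot.__init__(self, [(server, int(port))], nickname, nickname)
    		self.channel = channel
    
    	def on_welcome(self, connection, event):
    		connection.join(self.channel)
    
    	def on_pubmsg(self, connection, event):
    		words = event.arguments()[0].split()
    		# "!nick somenick" makes the bot rename itself, like /nick in a client
    		if len(words) == 2 and words[0] == '!nick':
    			connection.nick(words[1])
    
    lines = open('SW.CONF').read().splitlines()
    Drone(*lines[:4]).start()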

    If/when the restrictions on me are lifted, I'll edit this (or add another post) to include the actual code.

    After the initial post, this one feels a bit bare.

    Next up on my to-do list:
    -Cleaner, faster proxy checking - possibly uBC's code (or something like it). So clean!

    -Proxy scraping/leeching - need to find reliable sources and then figure out how to scrape proxies without extracting extraneous information. Gathering by hand (or manually clicking "gather proxies" and then "save") is time-consuming.

    -Some implementation of Selenium. Parse pages, look for links, map a suspected route to a desired page and then behave like a human who would follow that path.

    -Some sort of mastermind that controls the above components in an intelligent manner.

    I plan on distributing this later, too - across many machines, if possible, all of which need a unique nickname when they connect to the server (the uuid library?). This means I should also have a script that can delegate tasks and rapidly process input/output between humans (me) and the distributed taskforce (scripts).
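
    The uuid library does cover that; a quick illustration (the 'drone-' prefix is arbitrary):

    Code:
    import uuid
    
    # a unique-enough nickname per machine/connection
    nickname = 'drone-' + uuid.uuid4().hex[:8]
    print nickname # e.g. drone-3f2a9c1b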

    Does this sound feasible? What do y'all think?

    This is a really strict moderation filter.
     
  4. jazzc

    jazzc Moderator Staff Member Moderator Jr. VIP

    Joined:
    Jan 27, 2009
    Messages:
    2,468
    Likes Received:
    10,148
    Of course it's feasible; it's also a lot of work :D
     
  5. Divroc

    Divroc Newbie

    Joined:
    Feb 8, 2012
    Messages:
    4
    Likes Received:
    0
    If you want to upload code, just throw it on pastebin or a similar paste site, and link it here.

    Regarding the idea:

    You can just use file.readlines(), which saves space and time. Also, "for var in file" automatically iterates over each line, which is even more efficient than doing a .readlines() call beforehand because it never loads the whole file into memory at once.
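
    In practice (proxies.txt is a stand-in filename):

    Code:
    # eager: reads the whole file into a list in one go
    lines = open('proxies.txt').readlines()
    
    # lazy: yields one line at a time, never holding the whole file in memory
    for line in open('proxies.txt'):
    	print line.strip()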

    Also, I'm not sure that idea is feasible but you'd have to test it. You generally can't control HTTP or SOCKS proxies in the same way one would control drones in a botnet. They're pretty limited in what they can do. It's better to do something like send a simple HTTP request out to thousands of probably-working proxies, if you're trying to spam in some way. Your command and control can all be done from within your program, with no IRC required in my opinion. I'm not sure what kind of commands you had in mind though.
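
    To illustrate the in-program approach: a bare urllib2 loop can push one request through each proxy with no IRC layer at all (the proxy list and target URL below are placeholders):

    Code:
    import urllib2
    
    # minimal sketch: one HTTP request through each proxy in the list
    proxies = ['1.2.3.4:8080', '5.6.7.8:3128'] # placeholders
    for p in proxies:
    	opener = urllib2.build_opener(urllib2.ProxyHandler({'http': p}))
    	try:
    		opener.open('http://www.example.com/', timeout=5)
    		print p, 'ok'
    	except Exception, e:
    		print p, 'failed:', e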
     
  6. weedsmoker

    weedsmoker Junior Member

    Joined:
    May 2, 2011
    Messages:
    190
    Likes Received:
    79
    Stay away from Selenium unless the pages you scrape rely heavily on JavaScript; use the mechanize library instead - it's way faster (and, in my opinion, easier to maintain).
     
  7. Deusdies

    Deusdies Regular Member

    Joined:
    May 22, 2009
    Messages:
    261
    Likes Received:
    190
    Agreed - doesn't Selenium require you to run a Java server or something? Mechanize is much better unless, like weedsmoker said, you need a JS parser. But if you do, I suggest you get PyQt or PySide and learn QtWebKit :)
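
    For the curious, a minimal QtWebKit page-load sketch with PyQt4 (the URL is a placeholder):

    Code:
    from PyQt4.QtCore import QUrl
    from PyQt4.QtGui import QApplication
    from PyQt4.QtWebKit import QWebView
    
    # fetch a JS-rendered page and dump its final HTML
    app = QApplication([])
    view = QWebView()
    view.loadFinished.connect(app.quit) # stop the event loop once loading is done
    view.load(QUrl('http://www.google.com'))
    app.exec_() # blocks until loadFinished fires
    print unicode(view.page().mainFrame().toHtml())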
     
    • Thanks x 1
  8. jazzc

    jazzc Moderator Staff Member Moderator Jr. VIP

    Joined:
    Jan 27, 2009
    Messages:
    2,468
    Likes Received:
    10,148
    @Deusdies

    Selenium 2 comes in various flavors, not only Java. The Selenium 1 server was indeed Java-only.
     
  9. jamb0ss

    jamb0ss Junior Member

    Joined:
    Feb 9, 2012
    Messages:
    125
    Likes Received:
    45
    Occupation:
    Bots programming
    Check out the Scrapy framework.
     
  10. LongBanana

    LongBanana Regular Member

    Joined:
    Oct 23, 2009
    Messages:
    411
    Likes Received:
    247
    Location:
    Chicago, IL
    You should look into Scrapy for scraping proxies.
    For controlling them, I would say bash scripts & cron jobs, or even a simple website using PHP.
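
    A bare-bones sketch of such a spider, using the BaseSpider class from the Scrapy versions of this era (the start URL and the regex are placeholders):

    Code:
    import re
    from scrapy.spider import BaseSpider
    
    class ProxySpider(BaseSpider):
    	name = 'proxies'
    	start_urls = ['http://www.example.com/proxy-list'] # placeholder source
    
    	def parse(self, response):
    		# pull ip:port pairs out of the raw page body
    		for match in re.findall(r'\d{1,3}(?:\.\d{1,3}){3}:\d{2,5}', response.body):
    			print match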
     
  11. terabai

    terabai Newbie

    Joined:
    Aug 22, 2012
    Messages:
    3
    Likes Received:
    0
    The word is Scapy and not Scrapy, I think.
     
  12. jazzc

    jazzc Moderator Staff Member Moderator Jr. VIP

    Joined:
    Jan 27, 2009
    Messages:
    2,468
    Likes Received:
    10,148
    • Thanks x 1
  13. Hermetik

    Hermetik Newbie

    Joined:
    Apr 6, 2011
    Messages:
    46
    Likes Received:
    8
    Anyone awake? Is this the Python megathread?

    Scrapy is wonderful. I have jumped on the nginx+gunicorn+Django bandwagon, and Scrapy has great integration with that stack via django-dynamic-scraper.
     
    • Thanks x 1
  14. Question

    Question Registered Member

    Joined:
    Aug 14, 2011
    Messages:
    51
    Likes Received:
    32
    Scrapy is awesome; however, there is a strong alternative named Grab. I used it once, and for some requirements it might suit better.

    Unfortunately, the docs are in Russian...
     
    • Thanks x 1