Programming Bots

Kontroller · Jul 7, 2011

I'm interested in programming bots to do similiar jobs to Tweet Attacks and Scrapebox (obviously on a much smaller scale).

I have a decent amount of experience in Python and Javascript, and was wondering what programming language you guys would recommend learning in order to achieve my goal. Also if you know any resources from which I could start learning, even if it is just a source code archive.

Thanks

Chris22 · Jul 7, 2011

If you have a good deal of experience with python then to be honest that is all you need. I find python is GREAT for building to script to automate a task fast. Otherwise I'd definitely check out any of the .NET languages.

hotleatherdreams · Jul 7, 2011

im pretty decent at bots using VB6 and windows api. idk jack about python so i cant comment how it compares.

other_henry · Jul 7, 2011

Python is a good language for scraping.

Check out mechanize and beautiful soup which can do most of the work.

wowhaxor · Jul 7, 2011

I think most programs like this are .NET based you can start with something easy like VB and work your way from there - will shorten your learning curve quite a bit.

Weaves87 · Jul 8, 2011

If you know regular expressions, you can get a very nice link scrape type program up and running in python in a very fast amount of time. It didn't take me very long to create a recursive link scrape script, and it wouldn't take much to get it to be able to post to websites either.

The following modules would be of interest if you're trying to do this kind of stuff in python: re, urllib, urllib2.

The .NET languages can do this kind of stuff too, but I generally recommend python because it's dynamic and extremely easy to get something up and running quickly. Really good for testing ideas out.

Hope this helps

edit: let me check and see if I have my old link scraping script laying around somewhere... I had made it about 3-4 years ago and I'm not sure if it's still around somewhere. I'll PM you if I find it, you might find it useful.

edit2: So I found the python code, but I can't PM because I'm less than 15 posts. Hah. And it's not letting me post the original python code here because some of my code is triggering the spam filter (says "You are NOT allowed to post URLs, email addresses...etc")

Hehe.

Sooo I modified the code a bit to please the spam filter here on the forum. It should hopefully still work the same. I make no guarantees though

Code:

import re
import sys
import string
from urllib import *
import urlparse

if len(sys.argv) < 2:
    print "usage: <python> ", sys.argv[0], " <url>"
    sys.exit()

# define some constants
LINKDEPTH   = 2        # don't go depth first into links more than 5 times or else it locks up
QSIGNAL     = 0        # if QSIGNAL == 1 that means emergency stop

linkre = r"a[^h]+href[ \t]*=[ \t]*[\'\"]?([^\'\"> ]*)[\'\"]?[ \t]*"

imgre  = r"img[^s]+src[ \t]*=[ \t]*[\'\"]?([^\'\"> ]*)[\'\"]?[ \t]*"

# compile the regular expressions
locatel = re.compile(linkre)
locatei = re.compile(imgre)

def init():
    return file("output.txt", "w")

def cleanup(f):
    f.write(".\n")
    f.close()

def locate(fn, d, f):
    
    global LINKDEPTH
    
    # don't go over the recursion number
    if d == LINKDEPTH:
        return
         
    try:
        # opening some random URL has a good probability of failing, so..
        fi = urlopen(fn)
        tomatch = string.lower(fi.read())
        fi.close()
    except:
        # if it fails with an exception, just get out of there
        return
        
    # find all the links on the webpage
    ml = locatel.findall(tomatch)
    
    # find all the images on the webpage
    mi = locatei.findall(tomatch)
    
    f.write("\nlinks found at " + fn + " --------------+\n")        
    for x in ml:
        um = urlparse.urlparse(x)[2]
        if um[len(um)-4:] == '.jpg':
            f.write("!   " + urlparse.urljoin(fn, x) + "\n")
        print "going to " + x
        locate(x, d+1, f)
        print "returning from " + x

    f.write("\nimages found at " + fn + " --------------+\n")
    for x in mi:
        f.write("!   " + urlparse.urljoin(fn, x) + "\n")

fl = init()
print "going to " + sys.argv[1]
locate(sys.argv[1], 0, fl)
cleanup(fl)

The snippet above essentially crawls through a site, going 2 levels deep (it'll go inside a link inside a link) and prints out all the links that point to a JPG. I think I used it to scrape some thumbnails and their blown up pictures off a site a few times. Obviously you'd want to adapt the code to do what you want but it kind of shows you the foundation of how you make something like this.

Programming Bots

Kontroller

Newbie

Chris22

Regular Member

hotleatherdreams

Registered Member

other_henry

Junior Member

wowhaxor

Elite Member

Weaves87

Newbie

Main Menu

Marketplace

Making Money

BlackHat World