Programming Bots

Discussion in 'Black Hat SEO' started by Kontroller, Jul 7, 2011.

  1. Kontroller (Newbie)

    Joined: Jul 5, 2011 | Messages: 29 | Likes Received: 0

    I'm interested in programming bots to do jobs similar to those of Tweet Attacks and Scrapebox (obviously on a much smaller scale).

    I have a decent amount of experience in Python and JavaScript, and I was wondering which programming language you guys would recommend learning to achieve my goal. Also, if you know of any resources I could start learning from, even just a source code archive, please share them.

    Thanks
     
  2. Chris22 (Regular Member)

    Joined: Sep 29, 2010 | Messages: 400 | Likes Received: 1,061

    If you have a good deal of experience with Python, then to be honest that is all you need. I find Python is GREAT for quickly building a script to automate a task. Otherwise, I'd definitely check out any of the .NET languages.
     
  3. hotleatherdreams (Registered Member)

    Joined: Mar 29, 2010 | Messages: 78 | Likes Received: 18
    Occupation: ecommerce website CODER - SUCK at design and SEO
    Location: third rock from hell

    I'm pretty decent at building bots using VB6 and the Windows API. I don't know jack about Python, so I can't comment on how it compares.
     
  4. other_henry (Junior Member)

    Joined: Jun 1, 2011 | Messages: 107 | Likes Received: 19
    Occupation: Freelance coder, server guy
    Location: US

    Python is a good language for scraping.

    Check out mechanize and Beautiful Soup, which can do most of the work.
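    Beautiful Soup (the bs4 package) and mechanize are third-party libraries. As a dependency-free sketch of the same idea, Python 3's built-in html.parser module can collect the href of every <a> tag and the src of every <img> tag; with Beautiful Soup installed, soup.find_all("a", href=True) returns the matching tags with even less code. LinkCollector and extract() are hypothetical names made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect absolute URLs from <a href> and <img src>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; attrs is a list of
        # (name, value) pairs, and value may be None for bare attributes
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))


def extract(html, base_url):
    """Return (links, images) found in html, resolved against base_url."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links, parser.images


if __name__ == "__main__":
    sample = '<p><A HREF="/about">About</A> <img src="pics/cat.jpg"></p>'
    links, images = extract(sample, "http://example.com/index.html")
    print(links)   # ['http://example.com/about']
    print(images)  # ['http://example.com/pics/cat.jpg']
```

    A parser copes with attribute order, quoting style, and upper-case tags without any extra regex work, which is the main reason people reach for Beautiful Soup in the first place.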
     
  5. wowhaxor (Elite Member)

    Joined: Apr 28, 2007 | Messages: 2,020 | Likes Received: 3,361
    Location: ?¿?

    I think most programs like this are .NET-based. You can start with something easy like VB and work your way up from there; it will shorten your learning curve quite a bit.
     
  6. Weaves87 (Newbie)

    Joined: Jul 6, 2011 | Messages: 1 | Likes Received: 1

    If you know regular expressions, you can get a very nice link-scraping program up and running in Python very quickly. It didn't take me long to write a recursive link-scraping script, and it wouldn't take much to extend it to post to websites, either.
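    To show what knowing regular expressions buys you here, this is a minimal sketch that pulls href values out of raw HTML with a single pattern and resolves them against the page URL. The pattern and the function name find_links are illustrative assumptions, not the poster's code; a regex like this is fine for quick jobs but will miss unusual markup that a real HTML parser handles.

```python
import re
from urllib.parse import urljoin

# Matches href="...", href='...' or an unquoted href=... inside an <a> tag.
HREF_RE = re.compile(
    r"""<a\s[^>]*?href\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""",
    re.IGNORECASE,
)


def find_links(html, base_url):
    """Return absolute, de-duplicated link URLs in order of first appearance."""
    found = []
    for match in HREF_RE.finditer(html):
        # exactly one of the three alternative groups participates in a match
        raw = next(g for g in match.groups() if g is not None)
        url = urljoin(base_url, raw)  # turn relative links into absolute ones
        if url not in found:
            found.append(url)
    return found


if __name__ == "__main__":
    page = '<a href="/x">x</a> <a href="y.html">y</a>'
    print(find_links(page, "http://example.com/dir/page.html"))
    # ['http://example.com/x', 'http://example.com/dir/y.html']
```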

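    The following modules would be useful if you're trying to do this kind of thing in Python: re, urllib, and urllib2. (In Python 3, urllib and urllib2 were reorganized into urllib.request, urllib.parse, and urllib.error.)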
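    In Python 3, urlopen and Request live in urllib.request, and URL helpers such as urljoin and urlencode live in urllib.parse. This is a minimal sketch of building a GET request with a query string and a custom User-Agent header; the helper names, the example URL, and the default User-Agent string are assumptions for illustration, and only fetch() touches the network.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen


def build_get_request(base_url, path, params, user_agent="Mozilla/5.0 (compatible; example-bot)"):
    """Hypothetical helper: join base_url and path, append params as a query string."""
    url = urljoin(base_url, path) + "?" + urlencode(params)
    # Request stores header names capitalized, e.g. "User-agent"
    return Request(url, headers={"User-Agent": user_agent})


def fetch(req, timeout=10.0):
    """Download the response body and decode it using the declared charset."""
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

    Usage would look like html = fetch(build_get_request("http://example.com/", "search", {"q": "python"})). Keeping request construction separate from fetching makes the URL-building part easy to check without a live server.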

    The .NET languages can do this kind of thing too, but I generally recommend Python because it's dynamic and makes it extremely easy to get something up and running quickly. It's really good for testing ideas out.

    Hope this helps :)

    edit: Let me check whether I still have my old link-scraping script lying around... I made it about 3-4 years ago, so I'm not sure it still exists. I'll PM you if I find it; you might find it useful.

    edit2: So I found the Python code, but I can't PM you because I have fewer than 15 posts. Hah. And it won't let me post the original Python code here because some of it triggers the spam filter (it says "You are NOT allowed to post URLs, email addresses...etc") :rolleyes: Hehe.

    Sooo I modified the code a bit to please the spam filter here on the forum. It should hopefully still work the same. I make no guarantees though :p

    Code:
    # Python 3
    import re
    import sys
    from urllib.parse import urljoin, urlparse
    from urllib.request import urlopen
    
    if len(sys.argv) < 2:
        print("usage: python", sys.argv[0], "<url>")
        sys.exit(1)
    
    # define some constants
    LINKDEPTH = 2   # maximum recursion depth; the page count grows exponentially with each extra level
    
    linkre = r"a[^h]+href[ \t]*=[ \t]*[\'\"]?([^\'\"> ]*)[\'\"]?[ \t]*"
    
    imgre  = r"img[^s]+src[ \t]*=[ \t]*[\'\"]?([^\'\"> ]*)[\'\"]?[ \t]*"
    
    # compile the regular expressions (case-insensitive, so the page text
    # and the URLs in it keep their original case)
    locatel = re.compile(linkre, re.IGNORECASE)
    locatei = re.compile(imgre, re.IGNORECASE)
    
    def init():
        return open("output.txt", "w", encoding="utf-8")
    
    def cleanup(f):
        f.write(".\n")
        f.close()
    
    def locate(fn, d, f):
        # don't go over the recursion limit
        if d == LINKDEPTH:
            return
    
        try:
            # opening some random URL has a good probability of failing, so..
            # (give up on slow servers after 10 seconds)
            with urlopen(fn, timeout=10) as fi:
                tomatch = fi.read().decode("utf-8", errors="replace")
        except Exception:
            # if it fails with an exception, just get out of there
            # (unlike a bare except, this still lets Ctrl+C stop the crawl)
            return
    
        # find all the links on the webpage
        ml = locatel.findall(tomatch)
    
        # find all the images on the webpage
        mi = locatei.findall(tomatch)
    
        f.write("\nlinks found at " + fn + " --------------+\n")
        for x in ml:
            # resolve relative links against the current page before using them
            url = urljoin(fn, x)
            if urlparse(url).path.lower().endswith(".jpg"):
                f.write("!   " + url + "\n")
            print("going to " + url)
            locate(url, d + 1, f)
            print("returning from " + url)
    
        f.write("\nimages found at " + fn + " --------------+\n")
        for x in mi:
            f.write("!   " + urljoin(fn, x) + "\n")
    
    fl = init()
    print("going to " + sys.argv[1])
    locate(sys.argv[1], 0, fl)
    cleanup(fl)
    
    The snippet above crawls through a site two levels deep: it reads the start page and every page linked from it, so it also sees the links inside those linked pages. It prints its progress to the console and writes every link that points to a JPG, plus every image it finds, to output.txt. I think I used it a few times to scrape some thumbnails and their full-size pictures off a site. Obviously you'd want to adapt the code to do what you want, but it shows the basic foundation of how you build something like this.
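    The script above recurses depth-first and can fetch the same page many times (every "home" link sends it back to the start page). A hedged alternative sketch: a breadth-first crawl with a visited set, a depth limit that behaves like LINKDEPTH, and an optional same-host restriction. To keep it testable offline, the page fetcher is passed in as a function; crawl() and its parameters are hypothetical names, and the href regex is a simplified stand-in for the original pattern.

```python
import re
from collections import deque
from urllib.parse import urljoin, urlparse

# Simplified, case-insensitive href matcher.
HREF_RE = re.compile(r"""href\s*=\s*["']?([^"'\s>]+)""", re.IGNORECASE)


def crawl(start_url, fetch, max_depth=2, same_host=True):
    """Breadth-first crawl starting at start_url.

    fetch(url) must return the page HTML as a str, or None on failure.
    Pages at depth max_depth - 1 are the deepest ones fetched, which mirrors
    how LINKDEPTH limits the original script. Returns (pages_fetched, jpg_links).
    """
    host = urlparse(start_url).netloc
    visited = set()
    fetched = []
    jpgs = []
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth or url in visited:
            continue
        visited.add(url)  # never fetch the same URL twice
        html = fetch(url)
        if html is None:
            continue
        fetched.append(url)
        for raw in HREF_RE.findall(html):
            link = urljoin(url, raw)
            parts = urlparse(link)
            if parts.path.lower().endswith(".jpg"):
                # record image links but don't try to crawl them as pages
                if link not in jpgs:
                    jpgs.append(link)
                continue
            if same_host and parts.netloc != host:
                continue  # stay on the starting site
            queue.append((link, depth + 1))
    return fetched, jpgs


if __name__ == "__main__":
    # A fake three-page site so the sketch runs without network access.
    site = {
        "http://s.test/": '<a href="/a.html">a</a> <a href="big.jpg">img</a>',
        "http://s.test/a.html": '<a href="/">home</a> <a href="/b.html">b</a>',
        "http://s.test/b.html": '<a href="/d.jpg">d</a>',
    }
    print(crawl("http://s.test/", site.get, max_depth=3))
    # (['http://s.test/', 'http://s.test/a.html', 'http://s.test/b.html'],
    #  ['http://s.test/big.jpg', 'http://s.test/d.jpg'])
```

    Swapping in a real fetcher (for example, a small wrapper around urllib.request.urlopen that returns None on errors) turns this into a live crawler; the queue-plus-visited-set design keeps the number of requests proportional to the number of distinct pages rather than the number of links.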
     
    Last edited: Jul 8, 2011