
Facebook crawler in python

Discussion in 'Other Scripting Languages' started by killingdude, Dec 29, 2010.

  1. killingdude

    killingdude Newbie

    Joined:
    Dec 29, 2010
    Messages:
    3
    Likes Received:
    0
    Hey,

    I'm building a Facebook crawler in Python 2.7.

    I'm looking for someone to split the work with.

    PM me if you have basic knowledge of Python, JavaScript and captcha breaking with open-source libraries.

    killingdude
     
  2. kkvsam

    kkvsam Senior Member

    Joined:
    Oct 11, 2009
    Messages:
    936
    Likes Received:
    569
    Occupation:
    SYS ADMIN
    What do you mean by a Facebook crawler?
    Email extraction, or something else?
     
  3. killingdude

    killingdude Newbie

    Joined:
    Dec 29, 2010
    Messages:
    3
    Likes Received:
    0
    The main features will be:
    - process a user:pass file
    - log the user in to Facebook
    - proxy support
    - start "crawling" from a selected profile
    - save basic information about this profile in the DB
    -> automatically add friends of this profile
    - visit the next profile, using the information from the crawled one
    - and so on (loop)
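
    The "visit, save, move on to the next one" loop in the list above is essentially a breadth-first traversal. A minimal sketch of just that loop follows, written in Python 3 rather than 2.7. The names crawl, get_neighbors, max_nodes and SAMPLE are made up for illustration, and the in-memory dict stands in for whatever data source is actually used; nothing here talks to a real site.

```python
from collections import deque


def crawl(start, get_neighbors, max_nodes=100):
    """Breadth-first traversal: visit each node once, up to max_nodes."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue and len(order) < max_nodes:
        node = queue.popleft()
        order.append(node)  # the "save basic information" step would go here
        for nxt in get_neighbors(node):
            if nxt not in seen:  # never queue the same node twice
                seen.add(nxt)
                queue.append(nxt)
    return order


# Hypothetical in-memory graph standing in for real data.
SAMPLE = {"a": ["b", "c"], "b": ["a", "d"], "c": ["d"], "d": []}

visited = crawl("a", lambda n: SAMPLE.get(n, []))
print(visited)
```

    The seen set is what keeps the loop from running forever when two nodes link to each other, and max_nodes gives it a hard stop.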
     
    Last edited: Dec 29, 2010
  4. houston27

    houston27 Registered Member

    Joined:
    Nov 26, 2010
    Messages:
    93
    Likes Received:
    73
    Location:
    Miami, FL, USA
    It helps to write modules in C or C++ to extend the Python interpreter with new modules. Those modules can define new functions but also new object types and their methods. The document also describes how to embed the Python interpreter in another application, for use as an extension language.

    Finally, it shows how to compile and link extension modules so that they can be loaded dynamically (at run time) into the interpreter, if the underlying operating system supports this feature.
     
  5. lolzor60

    lolzor60 Newbie

    Joined:
    Dec 28, 2010
    Messages:
    6
    Likes Received:
    7
    I can't really help you here, but good luck with it!
    Would be happy to test it out when it's done!

    Have a good day!
    Regards, from lolzor60.
     
  6. nmxxx

    nmxxx Newbie

    Joined:
    Aug 16, 2010
    Messages:
    17
    Likes Received:
    0
    :edit..
     
  7. wu1239

    wu1239 Newbie

    Joined:
    Jun 4, 2011
    Messages:
    16
    Likes Received:
    0
    done?
     
  8. Baybo.it

    Baybo.it Registered Member

    Joined:
    Aug 9, 2011
    Messages:
    72
    Likes Received:
    39
    Occupation:
    Founder of Baybo.it
    Location:
    San Francisco
    Why not just use the Facebook API via the Open Graph protocol? You can handle login with OAuth2 (and have the script log in), then access the user's info directly through the API.
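
    As a rough sketch of what that looks like once the user has authorized the app and you hold an OAuth2 access token: a Graph API read is a plain HTTPS GET against graph.facebook.com with the token as a query parameter. The helper name graph_url, the token placeholder and the chosen fields are assumptions for illustration (Python 3), and the code only builds the URL without sending anything. Check the current Graph API docs for versioned paths and the permissions each field needs.

```python
from urllib.parse import urlencode

GRAPH_BASE = "https://graph.facebook.com"


def graph_url(object_id, access_token, fields=()):
    """Build a Graph API URL for one object (hypothetical helper)."""
    params = {"access_token": access_token}
    if fields:
        # Requested fields are sent as one comma-separated parameter.
        params["fields"] = ",".join(fields)
    return "{}/{}?{}".format(GRAPH_BASE, object_id, urlencode(params))


# "me" refers to the user the token belongs to; the token is a placeholder.
url = graph_url("me", "TOKEN_PLACEHOLDER", fields=("id", "name"))
print(url)
```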
     
  9. wu1239

    wu1239 Newbie

    Joined:
    Jun 4, 2011
    Messages:
    16
    Likes Received:
    0
    I'm interested in this. Could you PM me your email address?
    My post count is below 15, so I can't PM you.
     
  10. allysona

    allysona Newbie

    Joined:
    Aug 17, 2011
    Messages:
    30
    Likes Received:
    5
    As a word of advice, run through the process manually before you script it.

    You'll save yourself significant time if you verify that your theory works in practice before automating it at scale.

    Also, I've taken up the practice of storing a proxy server alongside each account's details, so that each account always tries the same proxy first...

    Let me know if you run into any blocks getting this built. I've been writing automation tools in Python for a while, though I don't do too much with Facebook.

    PS. pycurl is your friend if you want to make it run multi-threaded. The urllib library stores proxy information as a global var, so unless you want to do some serious monkey patching you're going to hit a wall on the number of threads you can run at once without getting insta-banned.