
Facebook data scraping with python

Discussion in 'Programming' started by Nighcore2, Aug 13, 2019 at 8:36 PM.

  1. Nighcore2

    I have been scraping for a month using 15 private proxies from Blazing SEO, picking one at random for each request and sending one request every 4 seconds.
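
    For reference, the rotation described above (one random proxy per request, one request every 4 seconds) can be sketched roughly like this. The proxy URLs and the names PROXY_POOL, pick_proxies and throttled are hypothetical and only illustrate the idea; they are not taken from my real script.

```python
import random
import time

# Hypothetical proxy URLs; replace them with the real pool.
PROXY_POOL = [
    "http://user:pass@10.0.0.1:8080",
    "http://user:pass@10.0.0.2:8080",
    "http://user:pass@10.0.0.3:8080",
]


def pick_proxies(pool, rng=random):
    """Return a requests-style proxies dict built from one randomly chosen proxy."""
    proxy = rng.choice(pool)
    return {"http": proxy, "https": proxy}


def throttled(fetch, urls, pool, delay=4.0, sleep=time.sleep):
    """Call fetch(url, proxies) for each URL, waiting `delay` seconds between calls."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # No wait before the first request, `delay` seconds before each later one.
            sleep(delay)
        results.append(fetch(url, pick_proxies(pool)))
    return results
```

    With requests, the fetch argument could be something like lambda url, proxies: requests.get(url, headers=d2, proxies=proxies, timeout=10).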

    Code:
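            import collections

            import requests
            from fake_headers import Headers  # assumption: Headers comes from the fake-headers package


            def Facebook():
                codigo = None
                while codigo is None:
                    # Generate a fresh random User-Agent for every attempt.
                    agent = Headers(headers=True)
                    random_headers = agent.generate()
                    user_agent = random_headers['User-Agent']
                    # OrderedDict keeps the headers in a browser-like order.
                    d2 = collections.OrderedDict()
                    # d2.update({'Content-Length': length})
                    d2.update({'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'})
                    d2.update({'Accept-Encoding': 'gzip, deflate'})
                    d2.update({'Accept-Language': 'es-MX,es;q=0.8,en-US;q=0.5,en;q=0.3'})
                    d2.update({'Cache-Control': 'max-age=0'})
                    d2.update({'Connection': 'keep-alive'})
                    d2.update({'Host': 'www.facebook.com'})
                    d2.update({'TE': 'Trailers'})
                    d2.update({'Upgrade-Insecure-Requests': '1'})
                    d2.update({'User-Agent': user_agent})
                    try:
                        # Proxies are disabled while testing for an IP ban.
                        # proxies = {
                        #     'https': ez()
                        # }
                        # p = requests.get('https://www.facebook.com/pg/forocoches/posts/?ref=page_internal',headers=d2)
                        p = requests.get('https://www.facebook.com/forocoches/', headers=d2)
                        print(p.text)
                    except requests.RequestException as error:
                        # Report the failure and retry with fresh headers.
                        print(error)
                        continue
                    # The code that extracts codigo from p.text is not part of this post,
                    # so stop after the first successful request.
                    break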
    

    But p.text doesn't contain what I need, which is the text of the posts on https://www.facebook.com/forocoches/. Right now that would be:
    "Código 2wrhW6jEsQ3 sin el primer dígito en forocoches.com/codigo" ("Code 2wrhW6jEsQ3 without the first digit at forocoches.com/codigo")

    But that text isn't anywhere in the HTML. Here is the HTML output I get:
    http://batman.gyptis.org/zerobin/?3e05b0c45e71c8d9#4mE1KDZxF5al0qPyFSWWDfWOJM6e0Ck3J3f7Ajqyfr0=

    PS: the proxy lines are commented out because I wanted to test whether all my proxies had been IP-banned, but they haven't. Even from my own server, with a clean IP, I don't get the post text.
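
    To be more precise about "isn't in the HTML": the text could still be present in an escaped form, so a check along these lines covers the raw string, the HTML-entity form and the JSON-escaped form. This is only a diagnostic sketch; find_text is a hypothetical helper name, and it assumes lowercase \uXXXX escapes as produced by json.dumps.

```python
import html
import json


def find_text(page_source, needle):
    """Report in which forms, if any, `needle` appears in the page source."""
    # json.dumps escapes non-ASCII characters by default, e.g. "ó" becomes "\u00f3".
    json_escaped = json.dumps(needle)[1:-1]
    return {
        # The text exactly as displayed on the page.
        "raw": needle in page_source,
        # The text after decoding entities such as "&oacute;".
        "html_unescaped": needle in html.unescape(page_source),
        # The text as it would look inside an embedded JSON string.
        "json_escaped": json_escaped in page_source,
    }
```

    Usage would be find_text(p.text, "Código 2wrhW6jEsQ3"). If all three values are False, the text is probably not in the response at all, for example because it is loaded later by JavaScript or only served to logged-in sessions, but that is a guess on my side.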
     