[Python] - How would you parse sponsored posts from the Facebook Newsfeed?

Discussion in 'Other Languages' started by CPAHaus, Jan 3, 2017.

  1. CPAHaus

    CPAHaus Junior Member

    Joined:
    Oct 26, 2016
    Messages:
    115
    Likes Received:
    19
    Gender:
    Male
    Occupation:
    Affiliate / Media buyer / Pussychaser
    Location:
    Behind a Cloaker .php
    Home Page:
    Hey all o/

    So I am playing around with Python + Selenium, and for now I am trying to pull out sponsored posts.

    BUT... I can't figure out how to parse them out.
    I was thinking about searching for the "Sponsored" string in the HTML source, but it seems that's just not the best way to do it.

    Does anyone have a better (working) method?

    Happy New 2017 :)
     
  2. pasdoy

    pasdoy Power Member

    Joined:
    Jul 17, 2008
    Messages:
    792
    Likes Received:
    245
    Selenium has selectors. To find a single element:
    • find_element_by_id
    • find_element_by_name
    • find_element_by_xpath
    • find_element_by_link_text
    • find_element_by_partial_link_text
    • find_element_by_tag_name
    • find_element_by_class_name
    • find_element_by_css_selector
    To find multiple elements (these methods will return a list):

    • find_elements_by_name
    • find_elements_by_xpath
    • find_elements_by_link_text
    • find_elements_by_partial_link_text
    • find_elements_by_tag_name
    • find_elements_by_class_name
    • find_elements_by_css_selector
    http://selenium-python.readthedocs.io/locating-elements.html

    Try XPath: http://www.w3schools.com/xml/xpath_syntax.asp. Don't forget that your element might not be available immediately after the page loads; see the docs on waits: http://selenium-python.readthedocs.io/waits.html. A wait coupled with a selector should work for you.
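
A rough sketch of the wait-plus-selector idea above, not a tested Facebook scraper. It uses the `find_element(By..., ...)` style of locator, since newer Selenium releases dropped the `find_element_by_*` helpers. `POST_XPATH` and `wait_for_posts` are made-up names, and the `role='article'` attribute is only an assumption about the markup; check the real page source before relying on it.

```python
# Hypothetical selector: the real feed markup differs and changes often.
POST_XPATH = "//div[@role='article']"


def wait_for_posts(driver, timeout=10):
    """Return the post elements once at least one is present in the DOM."""
    # Imported inside the function so this module loads without Selenium.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    # Raises selenium.common.exceptions.TimeoutException if nothing shows up.
    return wait.until(
        EC.presence_of_all_elements_located((By.XPATH, POST_XPATH))
    )
```

Usage would be something like `posts = wait_for_posts(driver)` after each scroll, so the script only looks at the feed once new posts have actually been added to the page.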
     
  3. CPAHaus

    CPAHaus Junior Member

    Thanks for the reply @pasdoy

    I've already written a small script with Selenium that scrolls down a few times so the FB newsfeed can load more posts (including sponsored posts), and I've managed to pull out the a href links with BeautifulSoup, but currently it's beyond me how to detect whether a given post is a "sponsored post" or not :)

    I guess I will keep trying :D
     
  4. pasdoy

    pasdoy Power Member

    Nice that you found your way around it. With XPath you can get a list of posts, and then filter out each post where the XPath to SPONSORED is not there. I wonder if that would work. To search within an element object you can do: obj.xpath('./div'), where ./ says to search from this element in the tree.
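
Here is a minimal sketch of that "list the posts, then filter with a relative XPath" idea. It uses the standard library's `xml.etree.ElementTree` on an invented, well-formed sample instead of a live Selenium session. The `SAMPLE` markup, the `class="post"` container and the `sponsored_links` name are all assumptions for illustration; the real feed will look different.

```python
import xml.etree.ElementTree as ET

# Invented sample feed; real markup will not be this tidy.
SAMPLE = """
<feed>
  <div class="post"><span>Sponsored</span><a href="https://example.com/ad">Shop now</a></div>
  <div class="post"><span>2 hrs</span><a href="https://example.com/cat">Cat photo</a></div>
  <div class="post"><div><span>Sponsored</span></div><a href="https://example.com/ad2">Learn more</a></div>
</feed>
"""


def sponsored_links(feed_xml):
    """Return the hrefs found in posts that contain a 'Sponsored' label."""
    root = ET.fromstring(feed_xml)
    links = []
    for post in root.findall("./div[@class='post']"):
        # The leading "." keeps the search inside this one post.
        if post.find(".//span[.='Sponsored']") is None:
            continue  # no label, so not a sponsored post
        links.extend(a.get("href") for a in post.findall(".//a"))
    return links


print(sponsored_links(SAMPLE))
```

With Selenium the same shape should carry over: get the post elements first, then call `find_elements(By.XPATH, ".//...")` on each one rather than on the whole driver.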
     
  5. CPAHaus

    CPAHaus Junior Member

    I will try it out today and let you know how it went :)
    If I manage to get out what I wanted, I will light a money candle in your name bro ;)
     
  6. Blacklistede

    Blacklistede Newbie

    Joined:
    Oct 19, 2016
    Messages:
    39
    Likes Received:
    1
    You just need to know how to identify these sponsored posts. Look up what they have in common.
    If you can't fetch them with BeautifulSoup methods, you could try using regular expressions. They will definitely be able to fetch the post, if you have the right expression of course.
     
  7. pasdoy

    pasdoy Power Member

    Try not to parse HTML with regex. Maybe use JavaScript if you really have to.
     
  8. Blacklistede

    Blacklistede Newbie

    Why wouldn't you parse html with regex?
     
  9. pasdoy

    pasdoy Power Member

  10. pasdoy

    pasdoy Power Member

    It's not that it can't or mustn't be done; it's just that if it can be avoided, why not avoid it?
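
To make the regex-versus-parser point a bit more concrete, here is a small sketch on an invented snippet, using only the standard library. A plain search for the string "Sponsored" also hits attribute values and script text, while a parser can look at text nodes only. `SAMPLE` and `LabelCounter` are hypothetical names, and the markup is made up.

```python
import re
from html.parser import HTMLParser

# Invented snippet: the word shows up in an attribute, a label and a script.
SAMPLE = (
    '<div data-tip="Sponsored"><span>2 hrs</span></div>'
    '<div><span>Sponsored</span></div>'
    '<script>var label = "Sponsored";</script>'
)


class LabelCounter(HTMLParser):
    """Count text nodes whose stripped content equals the label exactly."""

    def __init__(self, label):
        super().__init__()
        self.label = label
        self.count = 0

    def handle_data(self, data):
        if data.strip() == self.label:
            self.count += 1


regex_hits = len(re.findall("Sponsored", SAMPLE))

counter = LabelCounter("Sponsored")
counter.feed(SAMPLE)
parser_hits = counter.count

print(regex_hits, parser_hits)  # 3 1
```

A tighter regex could be written for this exact snippet, but it would have to be rewritten every time the markup shifts. That is roughly the "if it can be avoided, why not" argument.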
     
  11. CPAHaus

    CPAHaus Junior Member

    Is anyone up for 5 minutes of having their brain picked about this Python parsing of FB sponsored ads?
    Thanks :)
     
  12. rolax

    rolax Registered Member

    Joined:
    Dec 26, 2016
    Messages:
    62
    Likes Received:
    9
    Any updates? Were you able to identify the meaningful tags for the sponsored posts?
     
  13. prot0

    prot0 Registered Member

    Joined:
    Jul 11, 2017
    Messages:
    84
    Likes Received:
    26
    Location:
    localhost
    See if some text inside it has a particular id, and use XPath to get the whole box. If you find some text that appears only inside the "Sponsored Post" box, you can select the whole box with XPath by using the "ancestor" axis.
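
In lxml or Selenium, the ancestor idea would be a single expression along the lines of `//span[text()='Sponsored']/ancestor::div[@class='post']`. The standard library's `ElementTree` has no `ancestor::` axis, so the sketch below gets the same result by building a child-to-parent map and climbing it. The sample markup, the `class="post"` container and the `enclosing_posts` name are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Invented sample; the label sits a few levels deep inside its post.
SAMPLE = """
<feed>
  <div class="post" id="p1"><div><span>2 hrs</span></div></div>
  <div class="post" id="p2"><div><div><span>Sponsored</span></div></div></div>
</feed>
"""


def enclosing_posts(feed_xml, label="Sponsored"):
    """Return the ids of the post boxes that wrap a matching label."""
    root = ET.fromstring(feed_xml)
    # ElementTree nodes do not know their parent, so record it up front.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    found = []
    for span in root.iter("span"):
        if (span.text or "").strip() != label:
            continue
        # Climb until we reach the post container (or run out of parents).
        node = span
        while node is not None and node.get("class") != "post":
            node = parent_of.get(node)
        if node is not None:
            found.append(node.get("id"))
    return found


print(enclosing_posts(SAMPLE))  # ['p2']
```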