
XPath: Absolute vs Relative - Reliability? Preference?

Discussion in 'General Scripting Chat' started by macdonjo3, Feb 21, 2014.

  1. macdonjo3

    macdonjo3 Jr. VIP Jr. VIP Premium Member

    Joined:
    Nov 8, 2009
    Messages:
    5,562
    Likes Received:
    4,317
    Location:
    Toronto
    When you guys use XPath in your bots, do you use absolute or relative XPath? Why?

    I use FirePath to grab the path. The absolute XPath seems to use just HTML tags, while the relative one uses HTML tags along with IDs.

    Any thoughts or preferences? Personally, I've noticed relative paths to be less reliable than absolute ones, as the IDs can change.
     
  2. YouFeelMeDawg?

    YouFeelMeDawg? BANNED BANNED

    Joined:
    Aug 10, 2011
    Messages:
    266
    Likes Received:
    371
    It's actually the other way around: absolute is way less reliable than relative. Your statement is incorrect.

    If the IDs change, or whatever attributes you're searching by, in an absolute XPath then your XPath will be broken. On top of that, relative XPaths can be so freaking long!

    With relative xpath, I just need to think in terms of groups where it starts and where it ends.

    So say I was parsing a table: I would look for the //table[@id='something']
    Then take the result from that, turn it into HTML, and look for the rows //tr
    then look for the columns //td.
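
    A minimal sketch of those three steps with lxml. The markup and the id "something" are placeholders, and instead of turning the table back into HTML it queries the found element directly with a leading "." so the search stays inside that table:

```python
from lxml import html

# Hypothetical page; only the second table is the one we want.
PAGE = """
<html><body>
  <table id="other"><tr><td>ignore</td></tr></table>
  <table id="something">
    <tr><td>a1</td><td>a2</td></tr>
    <tr><td>b1</td><td>b2</td></tr>
  </table>
</body></html>
"""

tree = html.fromstring(PAGE)

# Step 1: find the table by its id, wherever it sits in the document.
table = tree.xpath("//table[@id='something']")[0]

# Steps 2 and 3: look for rows, then cells, relative to the node already found.
# The leading "." matters: a bare "//tr" would search the whole document again.
rows = [
    [td.text_content().strip() for td in tr.xpath("./td")]
    for tr in table.xpath(".//tr")
]
print(rows)
```

    Nothing here depends on how deeply the table is nested in the page, which is the point of starting from the id.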

    I don't need to know the whole long-ass absolute XPath, just the relative one.

    This also works great when looking for specific values for my bots, like divs and spans that are nested within each other, li tags, ul tags, and a whole other array of tags. It makes things so much easier when I just have to think in terms of where the pattern starts and ends. That is why I go for relative XPath all the time and skip absolute XPath.


    Plus, it's so much easier using XPath + regular expressions. I understand that as a newbie the first thing you are most likely to do is look for values using StringSplit, but eventually that becomes buggy, so you move on to bigger, better, and more efficient things.
     
  3. macdonjo3

    macdonjo3 Jr. VIP Jr. VIP Premium Member

    Joined:
    Nov 8, 2009
    Messages:
    5,562
    Likes Received:
    4,317
    Location:
    Toronto
    Really? My absolute XPaths don't have IDs or attributes... And my absolute paths are the long ones.

    My relative paths are short with IDs like: .//*[@id='splash-panel']

    My absolute path equivalent is long like: html/body/div[3]/div[2]/div/div/div[6]/div[3]/div[2]/div[1]/div[3]/div[2]/div[1]/div[3]/div[2]/div[1]
     
  4. YouFeelMeDawg?

    YouFeelMeDawg? BANNED BANNED

    Joined:
    Aug 10, 2011
    Messages:
    266
    Likes Received:
    371
    Oops, I meant to say "On top of that, absolute XPaths can be so long".

    Sorry about that, it was a typo. But look at it this way: once you discover XPath and get better at regular expressions, your HTML parsing skills improve a lot. Your parsing code becomes much easier to read, and you can generally write it much faster.

    I code in Python with lxml, so I'm always using XPath.
     
  5. macdonjo3

    macdonjo3 Jr. VIP Jr. VIP Premium Member

    Joined:
    Nov 8, 2009
    Messages:
    5,562
    Likes Received:
    4,317
    Location:
    Toronto
    I'm using Selenium's GhostDriver in Python right now for the headless browser.

    Well, I think sometimes FirePath gives me relative XPaths that end up being ambiguous (matching multiple elements in the HTML). Is it supposed to? That is one of the reasons why I thought absolute XPath could be better.
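
    One way to catch this before it bites is to count how many nodes an expression selects and tighten it until the count is one. A rough sketch with lxml follows; the markup, the class name "price", and the helper count_matches are made up for illustration:

```python
from lxml import html

# Hypothetical markup in which the same class appears in two places.
PAGE = """
<html><body>
  <div id="header"><span class="price">n/a</span></div>
  <div id="product"><span class="price">19.99</span></div>
</body></html>
"""

tree = html.fromstring(PAGE)


def count_matches(tree, xpath):
    """Return how many nodes an XPath expression selects."""
    return len(tree.xpath(xpath))


# Matches both spans, so "the first result" depends on document order.
ambiguous = "//span[@class='price']"

# Scoping the search under a uniquely identified ancestor removes the ambiguity.
scoped = "//div[@id='product']//span[@class='price']"

print(count_matches(tree, ambiguous))
print(count_matches(tree, scoped))
print(tree.xpath(scoped)[0].text)
```

    The scoped expression is still relative, so it keeps working if the product div moves elsewhere in the page.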
     
  6. jamb0ss

    jamb0ss Junior Member

    Joined:
    Feb 9, 2012
    Messages:
    125
    Likes Received:
    45
    Occupation:
    Bots programming
    do not ever use an absolute xpath. never.
    any element can be found with a correct relative xpath.

    for example, we have HTML code like that:
    Code:
    <div>
    
    <div id="donut">
    yummy!
    </div>
    
    <div class="trash">
    my government 
    </div>
    
    </div>
    
    and we use an absolute xpath "/div/div[1]/text()" to get "yummy!"

    but one day an invisible force decides to add one more <div> section:
    Code:
    <div>
    
    <div id="lol">
    o rly?
    </div>
    
    <div id="donut">
    yummy!
    </div>
    
    <div class="trash">
    my government 
    </div>
    
    </div>
    
    in this case our old xpath returns "o rly?"
    this is not what we expect to get, am I right?
    be smart, use a relative xpath "//div[@id='donut']/text()"
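
    The two snippets above can be checked directly. This is a small sketch using lxml's XML parser, with the markup condensed onto single lines; the helper name first_text is made up for illustration:

```python
from lxml import etree

# The two versions of the markup from the example above.
BEFORE = """<div>
<div id="donut">yummy!</div>
<div class="trash">my government</div>
</div>"""

AFTER = """<div>
<div id="lol">o rly?</div>
<div id="donut">yummy!</div>
<div class="trash">my government</div>
</div>"""

ABSOLUTE = "/div/div[1]/text()"
RELATIVE = "//div[@id='donut']/text()"


def first_text(markup, xpath):
    """Parse the markup and return the first matching text node, stripped."""
    root = etree.fromstring(markup)
    matches = root.xpath(xpath)
    return matches[0].strip() if matches else None


for label, markup in (("before", BEFORE), ("after", AFTER)):
    # Print what each expression selects for this version of the page.
    print(label, "| absolute:", first_text(markup, ABSOLUTE),
          "| relative:", first_text(markup, RELATIVE))
```

    The absolute expression silently switches to the new first div, while the relative one keeps selecting the donut.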

    I have five years of experience in web-scraping, so I know what I'm talking about.
     
  7. macdonjo3

    macdonjo3 Jr. VIP Jr. VIP Premium Member

    Joined:
    Nov 8, 2009
    Messages:
    5,562
    Likes Received:
    4,317
    Location:
    Toronto
    Right, but what if the website changes the ID often? For example, the one in "//div[@id='donut']/text()"
     
  8. jamb0ss

    jamb0ss Junior Member

    Joined:
    Feb 9, 2012
    Messages:
    125
    Likes Received:
    45
    Occupation:
    Bots programming
    the probability of this event is much lower
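
    For the rarer case where an ID really does change often, a relative XPath can still match on whatever part stays stable. This sketch is an assumption about how such a page might look (an id with a changing suffix plus a stable class token), not something taken from a real site:

```python
from lxml import html

# Hypothetical markup: the id carries a suffix that changes between page loads.
PAGE = """
<div>
  <div id="lol-17">o rly?</div>
  <div id="donut-8f3a" class="snack featured">yummy!</div>
</div>
"""

tree = html.fromstring(PAGE)

# Option 1: match only the stable prefix of the id (XPath 1.0 starts-with).
by_prefix = tree.xpath("//div[starts-with(@id, 'donut-')]/text()")

# Option 2: ignore the id and match one whole token of the class attribute.
# Padding with spaces avoids matching "snack" inside a longer class name.
by_class = tree.xpath(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' snack ')]/text()"
)

print(by_prefix, by_class)
```

    Which attribute counts as stable depends entirely on the site, so this needs checking against a few page loads first.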
     
  9. MrBlue

    MrBlue Senior Member

    Joined:
    Dec 18, 2009
    Messages:
    950
    Likes Received:
    662
    Occupation:
    Web/Bot Developer
    I prefer CSS selectors over xPath. So much simpler.
     
  10. vuarnet

    vuarnet Newbie

    Joined:
    May 29, 2013
    Messages:
    14
    Likes Received:
    0
    when a website changes you must change... but it won't *always* break your xpath expression... just sometimes. and i must echo what others have said -- relative is much more reliable for me than absolute. relative paths are much more change-tolerant.
     
  11. lorien

    lorien Newbie

    Joined:
    Aug 20, 2010
    Messages:
    19
    Likes Received:
    5
    Occupation:
    data mining, SEO
    Location:
    Russia
    I never use absolute XPath. It is quite unreadable and, I think, more vulnerable to changes in the document than relative XPath.

    > I prefer CSS selectors over xPath. So much simpler.

    CSS selectors are not as powerful as XPath selectors. Sometimes only XPath lets you write a complex selector. When I was faced with the need to mix CSS and XPath, I decided not to use CSS at all. I think mixing CSS and XPath in one script is ugly, so I use only XPath selectors.
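
    As an illustration of that extra power, here is a sketch of two lookups with lxml. The first relies on matching an element's text, which standard CSS selectors do not offer. The table contents are invented for the example:

```python
from lxml import html

# Hypothetical label/value table with no ids or classes to hook into.
PAGE = """
<table>
  <tr><td>Name</td><td>Donut</td></tr>
  <tr><td>Price</td><td>1.50</td></tr>
</table>
"""

tree = html.fromstring(PAGE)

# Find the cell whose text is "Price", then step to the cell right after it.
price = tree.xpath(
    "//td[normalize-space()='Price']/following-sibling::td[1]/text()"
)

# Select a row by a condition on one of its children, then read all its cells.
row = tree.xpath("//tr[td[normalize-space()='Donut']]")[0]
labels = [td.text for td in row.xpath("./td")]

print(price, labels)
```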