XPath: Absolute vs Relative - Reliability? Preference?

macdonjo3 · Feb 21, 2014

When you guys use XPath in your bots, do you use the absolute or relative xpath? Why?

I use FirePath to grab the path, and absolute xpath seems to use just HTML tags, while relative uses HTML tags along with IDs.

Any thoughts or preferences? Personally, I've noticed relative paths to be less reliable than absolute as the IDs can change.

YouFeelMeDawg? · Feb 21, 2014

macdonjo3 said:
Any thoughts or preferences? Personally, I've noticed relative paths to be less reliable than absolute as the IDs can change.

It's actually the other way around: absolute is way less reliable than relative. Your statement is incorrect.

If the IDs change, or whatever attributes you're searching by, in an absolute xpath then your xpath will be broken. On top of that, relative xpaths can be so freaking long!

With relative xpath, I just need to think in terms of groups: where each one starts and where it ends.

So say I was parsing a table, I would look for the table with //table[@id='something'].
Then take the result from that, turn it into HTML, and look for the rows with //tr,
then look for the columns with //td.

I don't need to know the whole long-ass absolute xpath, just the relative one.
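A minimal sketch of that table walk, for illustration only. It assumes a made-up page fragment and uses the standard library's xml.etree.ElementTree (which handles a limited XPath subset) rather than lxml, so it runs without extra packages. The helper name table_cells and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment; the id "something" is the placeholder from the post.
PAGE = """
<html><body>
  <div>
    <table id="something">
      <tr><td>apple</td><td>1</td></tr>
      <tr><td>pear</td><td>2</td></tr>
    </table>
  </div>
</body></html>
"""

def table_cells(page, table_id):
    """Return the cell text of the table with the given id, one list per row."""
    root = ET.fromstring(page.strip())
    # Step 1: anchor on the table by its id, wherever it sits in the page.
    table = root.find(f".//table[@id='{table_id}']")
    if table is None:
        return []
    # Step 2: rows relative to the table. Step 3: cells relative to each row.
    return [[td.text for td in tr.findall("./td")]
            for tr in table.findall(".//tr")]

rows = table_cells(PAGE, "something")
print(rows)
```

Each step searches relative to the previous result, so the path from the top of the page down to the table never has to be spelled out.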

This also works great when looking for specific values for my bots, like divs and spans that are nested in each other, li tags, ul tags, and a whole array of other tags. It makes things so much easier when I just have to think in terms of where the pattern starts and ends. That is why I go for relative xpath all the time and skip absolute xpath.

Plus, it's so much easier using xpath + regular expressions. However, I understand that as a newbie the first thing you are most likely to do is look for values using StringSplit, but eventually that becomes buggy, so you move on to bigger, better, and more efficient things.
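One way the xpath + regular expressions combination could look, as a rough sketch: the path narrows the search to the right elements, and the regex pulls the value out of their text. The markup, the class names, and the extract_prices helper are all invented for the example, and the standard library's ElementTree stands in for lxml.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical markup; the class names are assumptions for this example.
PAGE = """
<ul id="prices">
  <li><span class="name">Widget</span> <span class="price">Price: $19.99</span></li>
  <li><span class="name">Gadget</span> <span class="price">Price: $5.00</span></li>
</ul>
"""

# Matches a dollar amount such as $19.99 and captures the number.
PRICE_RE = re.compile(r"\$(\d+\.\d{2})")

def extract_prices(page):
    """Return every price found in the price spans as a float."""
    root = ET.fromstring(page.strip())
    prices = []
    # The path selects only the price spans...
    for span in root.findall(".//span[@class='price']"):
        # ...and the regex extracts the number from each span's text.
        match = PRICE_RE.search(span.text or "")
        if match:
            prices.append(float(match.group(1)))
    return prices

prices = extract_prices(PAGE)
print(prices)
```

Compared with splitting strings by hand, the regex only has to cope with the text of one small element instead of the whole page.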

macdonjo3 · Feb 21, 2014

YouFeelMeDawg? said:
If the IDs change, or whatever attributes you're searching by, in an absolute xpath then your xpath will be broken. On top of that, relative xpaths can be so freaking long!

Really? My absolute XPaths don't have IDs or attributes... And my absolute paths are the long ones.

My relative paths are short with IDs like: .//*[@id='splash-panel']

My absolute path equivalent is long like: html/body/div[3]/div[2]/div/div/div[6]/div[3]/div[2]/div[1]/div[3]/div[2]/div[1]/div[3]/div[2]/div[1]

YouFeelMeDawg? · Feb 21, 2014

macdonjo3 said:
Really? My absolute XPaths don't have IDs or attributes... And my absolute paths are the long ones.

My relative paths are short with IDs like: .//*[@id='splash-panel']

My absolute path equivalent is long like: html/body/div[3]/div[2]/div/div/div[6]/div[3]/div[2]/div[1]/div[3]/div[2]/div[1]/div[3]/div[2]/div[1]

Oops, I meant to say "On top of that, absolute xpaths can be so long".

Sorry about that, it was a typo. But look at it this way: once you discover xpath and get better at regular expressions, your HTML parsing skills improve a lot. You can make parsing code much easier to read and generally write it much faster.

I code in Python with lxml, so I'm always using xpath.

macdonjo3 · Feb 21, 2014

I'm using Selenium's GhostDriver in Python right now for the headless browser.

Well, I think sometimes FirePath gives me relative XPaths that aren't unique (they match multiple elements in the HTML). Is it supposed to? That is one of the reasons why I thought absolute XPath could be better.
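A quick way to check whether a generated path is ambiguous is to count what it matches before relying on it. The sketch below assumes a made-up page and a hypothetical count_matches helper, and uses the standard library's ElementTree. Selenium's find_elements call also returns a list, so the same length check should carry over.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: two divs share a class, one has a unique id.
PAGE = """
<body>
  <div class="panel">first</div>
  <div class="panel">second</div>
  <div id="splash-panel">splash</div>
</body>
"""

def count_matches(root, path):
    """Return how many elements the path selects under root."""
    return len(root.findall(path))

root = ET.fromstring(PAGE.strip())
ambiguous = count_matches(root, ".//div[@class='panel']")   # class is shared
unique = count_matches(root, ".//*[@id='splash-panel']")    # id is unique
print(ambiguous, unique)
```

A count above one means the path needs another condition (a parent with an id, a second attribute, a position within that parent), not necessarily a switch to a full absolute path.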

jamb0ss · Feb 27, 2014

do not ever use an absolute xpath. never.
any element can be found with a correct relative xpath.

for example, we have HTML code like this:

Code:

<div>
  <div id="donut">
    yummy!
  </div>
  <div class="trash">
    my government
  </div>
</div>

and we use an absolute xpath "/div/div[1]/text()" to get "yummy!"

but one day an invisible force decides to add one more <div> section:

Code:

<div>
  <div id="lol">
    o rly?
  </div>
  <div id="donut">
    yummy!
  </div>
  <div class="trash">
    my government
  </div>
</div>

in this case our old xpath returns "o rly?"
this is not what we expect to get, am I right?
be smart, use a relative xpath "//div[@id='donut']/text()"
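The donut example can be reproduced in a few lines. This is only a sketch: it uses the standard library's ElementTree, which does not accept a leading "/", so the positional path is written from the root element as ./div[1] and stands in for /div/div[1]. The text_at helper is hypothetical.

```python
import xml.etree.ElementTree as ET

BEFORE = """
<div>
  <div id="donut">yummy!</div>
  <div class="trash">my government</div>
</div>
"""

# Same page after a new <div> is added in front of the donut.
AFTER = """
<div>
  <div id="lol">o rly?</div>
  <div id="donut">yummy!</div>
  <div class="trash">my government</div>
</div>
"""

POSITIONAL = "./div[1]"         # stands in for /div/div[1]
BY_ID = ".//div[@id='donut']"   # the id-based path from the post

def text_at(page, path):
    """Return the stripped text of the first element the path selects, or None."""
    node = ET.fromstring(page.strip()).find(path)
    if node is None or node.text is None:
        return None
    return node.text.strip()

for label, page in (("before", BEFORE), ("after", AFTER)):
    print(label, text_at(page, POSITIONAL), text_at(page, BY_ID))
```

After the extra div is added, the positional path picks up "o rly?" while the id-based path still returns "yummy!".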

I have five years of experience in web-scraping, so I know what I'm talking about.

macdonjo3 · Feb 27, 2014

Right, but what if the website often changes the ID used in "//div[@id='donut']/text()"?

jamb0ss · Feb 27, 2014

the probability of that happening is much lower

MrBlue · Mar 28, 2014

I prefer CSS selectors over xPath. So much simpler.

vuarnet · Apr 15, 2014

when a website changes, you must change too... but it won't *always* break your xpath expression... just sometimes. and i must echo what others have said -- relative is much more reliable for me than absolute. relative paths are much more change-tolerant.

lorien · May 9, 2014

I never use absolute XPATH. It is quite unreadable and, I think, it is more vulnerable to changes in the document than relative XPATH.

> I prefer CSS selectors over xPath. So much simpler.

CSS selectors are not as powerful as XPath selectors. Sometimes only XPath's power lets you write a complex selector. When I faced the need to mix CSS and XPath, I decided not to use CSS at all. I think mixing CSS and XPath in one script is ugly, so I use only XPath selectors.
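One illustration of that extra power, as a sketch under assumptions: picking a row by the text of one of its cells, which standard CSS selectors, as far as I know, cannot express. The table below is made up, and the standard library's ElementTree is used in place of lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: the row we want is only identifiable by its label text.
PAGE = """
<table>
  <tr><td>Subtotal</td><td>40</td></tr>
  <tr><td>Total</td><td>42</td></tr>
</table>
"""

root = ET.fromstring(PAGE.strip())

# Select the row that has a <td> child whose text is exactly "Total".
total_row = root.find(".//tr[td='Total']")

# The value sits in the second cell of that row.
total_value = total_row.findall("./td")[1].text
print(total_value)
```

The predicate ties the selection to the visible label, so the path keeps working if rows are added or reordered.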
