
329. Parse the HTML Source using its hierarchy and find XPath path







What is an absolute path?

An absolute path is the full path. It starts from the root, i.e. '/', and names every element along the way until the desired HTML element is reached.

Example:  /html/body/p[2]

What is a relative path?

A relative path is a shortcut path. It starts with '//', which matches the element wherever it appears in the document, so the full hierarchy from the root does not have to be spelled out.

Example:  //p[@id='para2']
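The two example paths can be compared in a few lines of Python. This is only a rough sketch: the markup is a stand-in assumed from the description in this post, not a copy of any real page source, and Python's built-in ElementTree module supports only a subset of XPath. ElementTree evaluates paths starting from the root element (<html>), so /html/body/p[2] is written as ./body/p[2] and //p as .//p.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in markup (hypothetical), modelled on the page described in this post.
PAGE = """<html>
  <head><title>Stand-in page</title></head>
  <body>
    <p id="para1">A paragraph of text</p>
    <p id="para2">Another paragraph of text</p>
  </body>
</html>"""

root = ET.fromstring(PAGE)  # root is the <html> element

# Absolute style: every step of the hierarchy is named.
# ./body/p[2] stands in for /html/body/p[2].
absolute_match = root.find("./body/p[2]")

# Relative style: search all descendants for a <p> with a given id.
# .//p[@id='para2'] stands in for //p[@id='para2'].
relative_match = root.find(".//p[@id='para2']")

print(absolute_match.text)
print(relative_match.text)
```

Both lookups should land on the same element; only the way of describing its location differs.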

Let's implement the absolute and relative XPath:

1. Suppose we want to locate the 'Another paragraph of text' text on the http://compendiumdev.co.uk/selenium/basic_web_page.html page using an XPath path. First we have to find the XPath for the text.

2. View the page source of this page as shown below:


3. By looking at the hierarchy of the above HTML source, let's create an absolute XPath path for locating the 'Another paragraph of text' text.
4. Since we are creating an absolute path, the path should start with '/' (i.e. the root).
5. As per the hierarchy, the <html> tag comes directly after the root, hence /html
6. There are two child tags under the <html> tag, i.e. <head> and <body>. We need to locate the 'Another paragraph of text' text, which is inside a <p> tag. There are two <p> tags and both of them are under the <body> tag. So /html/body
7. As the <p> tags are children of the <body> tag, we can write the absolute XPath path for identifying both paragraphs as /html/body/p
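The walk down the hierarchy in the steps above can be sketched in Python. The markup is again an assumed stand-in based on what this post describes, and ElementTree paths are relative to the <html> root element, so ./body/p plays the role of /html/body/p.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in markup (hypothetical), not the real page source.
PAGE = """<html>
  <head><title>Stand-in page</title></head>
  <body>
    <p id="para1">A paragraph of text</p>
    <p id="para2">Another paragraph of text</p>
  </body>
</html>"""

root = ET.fromstring(PAGE)

# Step 5: the element directly below the document root is <html>.
root_tag = root.tag

# Step 6: <html> has two children, <head> and <body>.
html_children = [child.tag for child in root]

# Step 7: both <p> tags are children of <body>, so the path matches two elements.
body_paragraphs = [p.text for p in root.findall("./body/p")]

print(root_tag)         # html
print(html_children)    # ['head', 'body']
print(body_paragraphs)  # ['A paragraph of text', 'Another paragraph of text']
```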

So from this process we've found that /html/body/p is the absolute XPath path for locating the two paragraph text elements on the page. Let's verify that we can locate them using this absolute path by following the steps below:

1. Launch the Firefox browser
2. Open http://compendiumdev.co.uk/selenium/basic_web_page.html in the Firefox browser
3. Click on the 'FireBug' option and ensure that the 'FireBug' options are displayed as shown below:


4. Click on the 'Firepath' tab and paste the created absolute XPath path, i.e. /html/body/p, into the 'XPath' text box as shown below:

5. Now click on the 'Eval' option and check that both paragraph texts on the page are highlighted as shown below:



But we started with the intention of identifying only one element (i.e. the 'Another paragraph of text' text), not two (i.e. the 'A paragraph of text' and 'Another paragraph of text' texts).

How to resolve this problem?

One way of solving this problem is to add an index to the created absolute XPath path /html/body/p.

6. Add the index value to the created absolute XPath path /html/body/p as shown below:

/html/body/p[1]


7. Click on the 'Firepath' tab and paste the created absolute XPath path, i.e. /html/body/p[1], into the 'XPath' text box as shown below:



8. Now click on the 'Eval' option and check that only the paragraph text 'A paragraph of text' is highlighted as shown below:



9. Now change the index value in the XPath path from 1 to 2 as shown below:

/html/body/p[2]


10. Click on the 'Firepath' tab and paste the created absolute XPath path, i.e. /html/body/p[2], into the 'XPath' text box as shown below:



11. Now click on the 'Eval' option and check that the paragraph text 'Another paragraph of text' is highlighted as shown below:



At this point we have to understand that the absolute path /html/body/p[2] searches for the <p> tag inside the <body> tag only. This XPath path won't search for <p> in other parts of the HTML source, such as the <head> tag.
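A short Python sketch of the indexed paths, under the same assumptions as before: the markup is a hypothetical stand-in for the page, and ElementTree paths start at the <html> root element, so ./body/p[2] stands in for /html/body/p[2]. Note that XPath indexes start at 1, not 0.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in markup (hypothetical), not the real page source.
PAGE = """<html>
  <head><title>Stand-in page</title></head>
  <body>
    <p id="para1">A paragraph of text</p>
    <p id="para2">Another paragraph of text</p>
  </body>
</html>"""

root = ET.fromstring(PAGE)

# p[1] is the first <p> child of <body>, p[2] is the second.
first = root.find("./body/p[1]")
second = root.find("./body/p[2]")

# There is no third <p> under <body>, so find() returns None.
third = root.find("./body/p[3]")

# The same steps under <head> match nothing: the path only looks where it is told to.
in_head = root.findall("./head/p")

print(first.text)
print(second.text)
print(third)
print(in_head)
```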

But with a relative path, we search for the <p> tag across the complete HTML source of the page and find out which <p> tags match our search. Let's implement the relative XPath path in the next steps.

12. Before creating the path, let's first type //p into the 'XPath' field and click the 'Eval' button to find out how many <p> tags match our search, as shown below:


13. View the HTML page source of the highlighted paragraph text elements below:


Observe that both highlighted <p> elements have unique properties, namely id attributes with unique values, as shown below:



14. Suppose we have to identify only the 'A paragraph of text' paragraph text element. We add the id attribute to the relative XPath path as shown below:

//p[@id='para1']

15. Click on the 'Firepath' tab and paste the created relative XPath path, i.e. //p[@id='para1'], into the 'XPath' text box as shown below:



16. Now click on the 'Eval' option and check that the paragraph text 'A paragraph of text' is highlighted as shown below:



17. In a similar manner, the 'Another paragraph of text' paragraph text is highlighted on changing the unique id value from 'para1' to 'para2' and clicking on the 'Eval' button as shown below:
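The relative-path steps can also be tried outside the browser. The sketch below makes the same assumptions as earlier: stand-in markup built from the ids and texts mentioned in this post, and ElementTree's limited XPath support, where .//p plays the role of //p. The function find_paragraph_by_id is a hypothetical helper written for this illustration, not part of any library.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in markup (hypothetical), not the real page source.
PAGE = """<html>
  <head><title>Stand-in page</title></head>
  <body>
    <p id="para1">A paragraph of text</p>
    <p id="para2">Another paragraph of text</p>
  </body>
</html>"""


def find_paragraph_by_id(root, element_id):
    """Hypothetical helper: ElementTree counterpart of //p[@id='...']."""
    return root.find(f".//p[@id='{element_id}']")


root = ET.fromstring(PAGE)

# Step 12: .//p matches every <p> anywhere below the root.
all_paragraphs = root.findall(".//p")
paragraph_ids = [p.get("id") for p in all_paragraphs]

# Steps 14 to 17: a unique id attribute narrows the match to one element.
para1 = find_paragraph_by_id(root, "para1")
para2 = find_paragraph_by_id(root, "para2")

print(len(all_paragraphs))
print(paragraph_ids)
print(para1.text)
print(para2.text)
```

Because the id values are unique, the lookup keeps working even if the paragraphs are reordered or wrapped in another tag, which an indexed absolute path would not survive.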



As a relative XPath path with unique attributes specified is generally preferred, let's use only relative XPath paths from the next post onwards.



Please comment below with feedback or questions.

Different Types of Nodes in HTML will be explained in the next post.




