I have the following HTML:
<data-my-tag>
<data-another-tag>
<p>...</p>
<data-my-tag>
<span>...</span>
</data-my-tag>
</data-another-tag>
</data-my-tag>
I use jsoup to parse it, and I would like to match all elements whose tag name starts with data-.
I only found getElementsByTag, which matches on the entire tag name. The select method only accepts CSS selectors, and there seems to be no way to match data-* tags in jsoup (for example with XPath). Is there any way to match these tags with jsoup?
Unfortunately, jsoup did not support XPath queries when I looked into this. The only way I found is the following:
Document doc = Jsoup.parse(content);
Elements elements = doc.select("*");
elements.stream().filter(e -> e.nodeName().startsWith("data-")).forEach(e -> {
// do what you need with the node
});
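When the markup happens to be well-formed XML, as the sample above is, the same prefix match can be written as an XPath expression. The sketch below is an assumption layered on top of the answer, not part of it: it swaps jsoup for the JDK's built-in javax.xml parser and XPath engine, and the class and method names are made up for illustration. It will not cope with real-world, non-well-formed HTML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DataTagXPath {
    // The sample markup from the question, collapsed onto one line.
    static final String SAMPLE =
            "<data-my-tag><data-another-tag><p>...</p>"
            + "<data-my-tag><span>...</span></data-my-tag>"
            + "</data-another-tag></data-my-tag>";

    // Returns the names of all elements whose tag name starts with "data-".
    // Hypothetical helper; assumes the input is well-formed XML.
    static List<String> dataTagNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//*[starts-with(name(), 'data-')]", doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the markup", e);
        }
    }

    public static void main(String[] args) {
        // Prints the names of the matched elements.
        System.out.println(dataTagNames(SAMPLE));
    }
}
```

For HTML that is not well-formed, the stream filter shown above remains the safer route, because jsoup's parser tolerates broken markup.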
I am trying to click on the News link on the Google search page. The HTML structure looks like this
I tried the following XPaths, but none worked:
//a/child::span[1][contains(.,'News')]
The following xpath resulted in invalid selector: The result of the xpath expression "//a/child::span/following-sibling::text()[contains(.,'News')]" is: [object Text]. It should be an element.
//a/child::span/following-sibling::text()[contains(.,'News')]
Thanks
//a[contains(.,'News')] might return this link, but may result in a list of more than one element that you'd need to handle and select the right element from.
You can use Selenium's SearchContext to specify a container element, or solve it with an XPath one-liner like: //div[@role='navigation']//a[contains(.,'News')] (effectively searching for a link that contains 'News' somewhere in its HTML tree, somewhere inside a div that has a role attribute with the value 'navigation').
You simply need
//a[contains(., "News")]
Note that "News" is not part of the span but of the a element, so your first XPath won't work.
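To see why the first XPath matches nothing while the broader ones do, here is a small sketch that evaluates the expressions with the JDK's built-in XPath engine instead of Selenium. The markup is a hypothetical stand-in (the real Google page structure is not shown in the question); it only assumes what the answers state, namely that "News" is text inside the a element rather than inside its span, and that the link sits in a div with role="navigation".

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NewsLinkXPath {
    // Hypothetical markup: the span is empty and "News" is a text child of <a>.
    static final String MARKUP =
            "<body><div role=\"navigation\">"
            + "<a href=\"/news\"><span class=\"icon\"/>News</a>"
            + "<a href=\"/images\"><span class=\"icon\"/>Images</a>"
            + "</div></body>";

    // Counts how many nodes the given XPath expression selects in the markup.
    static int countMatches(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // The span holds no text, so filtering the span on 'News' selects nothing.
        System.out.println(countMatches(MARKUP, "//a/child::span[1][contains(.,'News')]"));
        // Filtering the link itself on its full string value finds it.
        System.out.println(countMatches(MARKUP, "//a[contains(., 'News')]"));
        // Scoping to the navigation container narrows the search.
        System.out.println(countMatches(MARKUP, "//div[@role='navigation']//a[contains(.,'News')]"));
    }
}
```

In Selenium the same expressions would be passed to By.xpath. The expression ending in text() fails there because it selects a text node, and findElement can only return elements.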
My code:
WebDriver driver = new SafariDriver();
driver.get("http://bet.hkjc.com/football/default.aspx");
WebElement matchs = driver.findElement(By.cssSelector("span.Head to Head"));
System.out.println(matchs);
driver.quit();
How can I crawl Manchester Utd and Celta Vigo?
WebElement matchs = driver.findElement(By.xpath("//a[@title='Head to Head']"));
System.out.println(matchs.getText());
Use the Firebug and FirePath add-ons in Firefox to inspect the element, get its XPath, and put it inside the double quotes in this code:
System.out.println(driver.findElement(By.xpath("")).getText());
If you don't know how to use Firebug and FirePath, refer to this link
You can locate the element with either a CSS selector or an XPath selector.
Using XPath:
driver.findElement(By.xpath("//a[@title='Head to Head']"));
Using a CSS selector:
driver.findElement(By.cssSelector("span > a[title='Head to Head']"));
Or try something like this if that does not match:
driver.findElement(By.cssSelector("td.cteams.ttgR2>span>a[title='Head to Head']"));
Note: in your code you are using span.Head to Head as the CSS selector. A dot denotes a class, so the selector starts by looking for a span with the class "Head" (the spaces that follow are read as descendant combinators), and no such element exists in your DOM; "Head to Head" is the title of the anchor tag.
I initially went through the Firebug and FirePath plugins for Firefox to get the XPath or CSS path.
Explore some blogs to get a clear understanding, and you will be able to write the selectors yourself.
Refer to this link for the same
I assume none of the above answers worked for you, so I am providing another one.
I can see that both texts are under an "a" tag. So the idea is to navigate to the element and use getText(), which returns the visible text.
String word = driver.findElement(By.xpath("//span/a")).getText();
System.out.println(word);
Hope this works for you.
In all of my tests I use getAttribute like this to get the text, and it works fine for me on all drivers:
assertEquals(strCity, txtCity.getAttribute("value"));
I am using jsoup and was wondering how to get nested tags. I can get the section tag, but I am not sure how to get the div tag inside it, as I have a list of elements. How do I fetch a div tag inside a section tag?
This should work:
Elements elements = doc.select("section.page-content-full div.content");
Just use the query selector syntax:
Elements elems = doc.select("section.main-page-content-full>div.content");
If you want just the first element, use the following (note that first() returns a single Element, not Elements):
Element elem = doc.select("section.main-page-content-full>div.content").first();
I would like to select an element that matches a specific string:
<img src='http://iblink.ch/resized/sjg63ngi3h3g4a.jpg' alt='tree'>
Since I don't have a specific class or div to target, I tried the getElementsContainingOwnText("resized") method to get this element.
But it does not find it.
I also tried getElementsContainingText.
Same output :(
Does anyone have any idea?
The text is the part outside the tags: <tag attribute="value">Text</tag>
So you want to select Elements with a certain attribute value like this:
Elements els = doc.select("img[src*=resized]");
Have a look into CSS selectors as they are implemented in Jsoup.
If I have HTML that looks like:
<td class="blah">&nbs;???? </td>
Could I get the ???? value using xpath?
What would it look like?
To use XPath you usually need XML, not HTML, but some parsers (e.g. the one built into PHP) have a relaxed mode that will parse most HTML, too.
If you want to find all <a> that are direct children of <td class="blah"> the XPath you need is
//td[@class = 'blah']/a
or
//td[@class = 'blah']/a[@href = 'http://...']
(depending on whether you only want the one url or all urls)
This will give you a Set of Nodes. You'll need to iterate through it and then check for the nodeType of the firstChild (supposed to be a text node) and the number of child nodes (supposed to be 1). Then the firstChild will contain the ????
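The iteration described above can be sketched with the JDK's built-in XPath support. This is only an illustration under stated assumptions: the input must be well-formed XML (the &nbs; from the question would not parse), and the sample markup, URLs, class name, and method name are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TdLinkXPath {
    // Hypothetical, well-formed sample; only the first cell has class "blah".
    static final String SAMPLE =
            "<table><tr>"
            + "<td class=\"blah\"><a href=\"http://example.com/a\">first</a></td>"
            + "<td class=\"other\"><a href=\"http://example.com/b\">second</a></td>"
            + "</tr></table>";

    // Collects the text of every <a> that is a direct child of <td class="blah">.
    static List<String> linkTexts(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList links = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//td[@class = 'blah']/a", doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < links.getLength(); i++) {
                Node link = links.item(i);
                Node first = link.getFirstChild();
                // Accept only links whose single child is a text node, as described above.
                if (link.getChildNodes().getLength() == 1
                        && first.getNodeType() == Node.TEXT_NODE) {
                    texts.add(first.getNodeValue());
                }
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the markup", e);
        }
    }

    public static void main(String[] args) {
        // Prints the collected link texts.
        System.out.println(linkTexts(SAMPLE));
    }
}
```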
Why would you use an XML parser to parse HTML?
I would suggest using a dedicated Java HTML parser. There are many, but I haven't tried any myself.
As for whether it would work: I suspect it will not. You will get an error when trying to parse it as HTML right at &nbs;, if not earlier.