Jsoup Grab embedded tags - java

I am using Jsoup and was wondering how do you get embedded tags? I can get the section tag but I am not sure how to get the div tag inside as I have a list of elements. My question is how do I fetch a div tag inside a section tag?

This should work:
Elements elements = doc.select("section.page-content-full div.content");

Just use the query selector syntax:
Elements elems = doc.select("section.main-page-content-full>div.content");
If you want just the first element, use the following (note that first() returns a single Element, not Elements):
Element elem = doc.select("section.main-page-content-full>div.content").first();
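The difference between the two answers is the combinator: a space matches any descendant, while > matches direct children only. Here is a minimal sketch of that difference, assuming jsoup is on the classpath. The markup and the class NestedDivExample are made up for illustration; only the class names in the selectors come from the second answer.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class NestedDivExample {
    // Hypothetical markup: one div.content directly inside the section,
    // and a second div.content nested inside the first.
    static final String HTML =
        "<section class=\"main-page-content-full\">"
        + "<div class=\"content\">direct child"
        + "<div class=\"content\">nested deeper</div>"
        + "</div>"
        + "</section>";

    // Child combinator: only div.content elements that are direct children of the section.
    static Elements directChildren(Document doc) {
        return doc.select("section.main-page-content-full > div.content");
    }

    // Descendant combinator: div.content at any depth below the section.
    static Elements allDescendants(Document doc) {
        return doc.select("section.main-page-content-full div.content");
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        System.out.println(directChildren(doc).size());
        System.out.println(allDescendants(doc).size());

        // first() returns null when nothing matched, so check before using it.
        Element first = directChildren(doc).first();
        System.out.println(first == null ? "no match" : first.ownText());
    }
}
```

Which combinator is right depends on whether the div you want sits directly under the section or further down.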

Related

I would like to work on an efficient way to find a list of elements in a page having an id attribute (with any value) using Selenium

I would like to work on an efficient way to find a list of elements in a page having an id attribute (with any value) using Selenium.
I can do that with a for loop, checking getAttribute("id") != null on each element of a list of WebElements found using driver.findElements(By.xpath("//body//*")); but this is not time efficient.
If anyone can suggest an efficient way, it would be helpful. I am using Java with Selenium.
To get a list of all the elements on the web page that have an "id" attribute, you can do the following:
List<WebElement> list = driver.findElements(By.xpath("//*[@id]"));
The XPath that locates elements having an "id" attribute is simply //*[@id].
All the elements with an id attribute can be located with this XPath:
//*[@id]
Store them in a list like this:
List<WebElement> allIDsAttribute = driver.findElements(By.xpath("//*[@id]"));
Then iterate over the list to get the individual web elements:
for (WebElement element : allIDsAttribute) {
    System.out.println(element.getAttribute("id"));
}
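The XPath //*[@id] can also be tried without a browser. The sketch below is not Selenium code: it uses the JDK's built-in javax.xml.xpath on a small, well-formed, made-up document, only to show which elements the expression selects. The class IdXPathDemo and the markup are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class IdXPathDemo {
    // Hypothetical well-formed markup: two elements carry an id, two do not.
    static final String XML =
        "<html><body>"
        + "<div id=\"header\"><span>no id here</span></div>"
        + "<p id=\"intro\">text</p>"
        + "<p>no id either</p>"
        + "</body></html>";

    // Returns the id values of every element matched by //*[@id].
    static List<String> idsOf(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                "//*[@id]", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                ids.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return ids;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Only the div and the first p have an id attribute.
        System.out.println(idsOf(XML));
    }
}
```

The predicate [@id] filters inside the XPath engine, which is why it avoids the per-element getAttribute call from the question.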

JSOUP - find elements starting with

I have the following HTML:
<data-my-tag>
<data-another-tag>
<p>...</p>
<data-my-tag>
<span>...</span>
</data-my-tag>
</data-another-tag>
</data-my-tag>
I use JSOUP to parse it and I would like to match all elements starting with <data-.
I only found getElementsByTag, which matches by the entire tag name. The select method accepts only CSS selectors, and there seems to be no way to match data-* tag names in JSOUP (e.g. with XPath). Is there any way to match these tags via JSOUP?
Unfortunately, JSOUP did not support XPath queries when this was written. The only way I figured out is the following:
Document doc = Jsoup.parse(content);
Elements elements = doc.select("*");
elements.stream().filter(e -> e.nodeName().startsWith("data-")).forEach(e -> {
// do what you need with the node
});

How to compare children of an element in a DOM with jsoup

I am working on a project where I have to be able to detect that an element has repeated children. For example, I want to know that the tbody element has similar children.
My goal is to extract data, and store it in a database, from pages whose structure I do not know in advance.
Use jQuery to get your td elements and iterate over them with each.
You can use JSOUP for this. It is very easy to use as well.
For example, to get all td tags within your document:
String html=... //your html string
Document doc = Jsoup.parse(html);
Elements elements = doc.select("tbody").select("td");
System.out.println(elements.size()); //prints number of td within tbody REGARDLESS of where in the DOM tree they live.
Edit1:
To get all elements, you can do:
for (Element e : doc.getAllElements()) {
    System.out.println(e.tagName()); // prints the tag name
}

Jsoup select and iterate all elements

I connect to a URL through jsoup and get all of its contents, but if I select like this,
doc.select("body")
it returns a single element. I want to get all the elements in the page and iterate over them one by one. For example,
<html>
<head><title>Test</title></head>
<body>
<p>Hello All</p>
Second Page
<div>Test</div>
</body>
</html>
If I select using body I am getting the result in a single line like,
Test Hello All Second Page Test
Instead I want to select all elements and iterate one by one and produce the results like,
Test
Hello All
Second Page
Test
Will that be possible using jsoup?
Thanks,
Karthik
You can select all elements of the document using * selector and then get text of each individually using Element#ownText().
Elements elements = document.body().select("*");
for (Element element : elements) {
System.out.println(element.ownText());
}
To get all of the elements within the body of the document using the jsoup library:
doc.body().children().select("*");
To get just the first level of elements in the document's body:
doc.body().children();
You can use XPath, or any library that supports XPath.
The expression is //text(), which selects every text node in the document.
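The //text() expression can be tried with the JDK's built-in javax.xml.xpath, as sketched below. This assumes the input is well-formed XML, which the sample page from the question happens to be; real-world HTML often is not, and would need an HTML parser first. The class name TextNodesDemo is made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextNodesDemo {
    // The sample page from the question, written on one line so that
    // no whitespace-only text nodes appear between the tags.
    static final String XML =
        "<html><head><title>Test</title></head>"
        + "<body><p>Hello All</p>Second Page<div>Test</div></body></html>";

    // Returns the trimmed content of every non-blank text node matched by //text().
    static List<String> textNodes(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                "//text()", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                String value = nodes.item(i).getNodeValue().trim();
                if (!value.isEmpty()) {
                    texts.add(value);
                }
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // One text node per line, as asked for in the question.
        for (String text : textNodes(XML)) {
            System.out.println(text);
        }
    }
}
```

Unlike the body-only jsoup approaches above, //text() also picks up the title text inside head.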

get elements from html parser

I'm using JSOUP, and trying to get the elements which start with a particular div tag id. For example:
<div id="test123">.
I need to check whether an element's id starts with the string "test" and get all such elements.
I looked at http://jsoup.org/cookbook/extracting-data/selector-syntax and I tried multiple variations using:
doc.select("div:matches(test(*))");
But it still didn't work. Any help would be much appreciated.
Use the attribute-starts-with selector [attr^=value].
Elements elements = doc.select("div[id^=test]");
// ...
This will return all <div> elements with an id attribute starting with test.
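As a quick check of what the selector does and does not match, here is a minimal sketch, assuming jsoup is on the classpath. The markup and the class IdPrefixExample are made up for illustration; only id="test123" appears in the question.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class IdPrefixExample {
    // Hypothetical markup: two divs whose id starts with "test",
    // one div whose id merely contains it, and one non-div element.
    static final String HTML =
        "<div id=\"test123\">a</div>"
        + "<div id=\"testing\">b</div>"
        + "<div id=\"mytest\">c</div>"
        + "<span id=\"test999\">d</span>";

    // Collects the ids of all <div> elements whose id starts with "test".
    static List<String> matchingIds(String html) {
        Document doc = Jsoup.parse(html);
        List<String> ids = new ArrayList<>();
        for (Element div : doc.select("div[id^=test]")) {
            ids.add(div.id());
        }
        return ids;
    }

    public static void main(String[] args) {
        // "mytest" is skipped (wrong prefix) and the span is skipped (not a div).
        System.out.println(matchingIds(HTML));
    }
}
```

Dropping the div prefix, as in [id^=test], would match any element whose id starts with test.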
