get elements from html parser - java

I'm using jsoup and trying to get the div elements whose id starts with a particular string. For example:
<div id="test123">.
I need to check whether an element's id starts with the string "test" and get all such elements.
I looked at http://jsoup.org/cookbook/extracting-data/selector-syntax and tried multiple variations such as:
doc.select("div:matches(test(*))");
But it still didn't work. Any help would be much appreciated.

Use the attribute-starts-with selector [attr^=value].
Elements elements = doc.select("div[id^=test]");
// ...
This will return all <div> elements with an id attribute starting with test.

Related

Selenium - WebDriver - Java: Why am I getting empty strings from my elements?

I have a webpage where some elements get set on page load. I then need to read some of these elements using Selenium. The problem is that when I read them, all I get is an empty string.
This is a partial image of my website:
I'm trying to obtain the Number field.
Here is what the DOM looks like for this element:
<input style="; ; " class="form-control disabled " id="sys_readonly.u_po_coordination.number" value="POI1356285" ng-non-bindable="" readonly="readonly" aria-label="Number">
Here is my Java code to get that value:
String num = driver.findElement(By.id("sys_readonly.u_po_coordination.number")).getText();
I've tried using XPath as well, and I've also tried reading other elements.
The result is ALWAYS the same: a blank string...
What am I missing? The elements exist, because I can set various elements, but reading the ones that should already be set always returns a blank string.
Any help would be greatly appreciated.
Thanks!
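getText() returns an element's visible inner text, and an <input> element has none; its content lives in the value attribute, so read that instead:
String num = driver.findElement(By.id("sys_readonly.u_po_coordination.number")).getAttribute("value");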
Found this website which was pretty useful:
https://www.lambdatest.com/blog/how-to-get-text-of-an-element-in-selenium/
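Instead of getText(), try reading the value attribute with getAttribute("value"). Note that WebElement in the current Selenium Java bindings has no getValue() method.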

Java - jsoup get element with specific string

I would like to select an element that contains a specific string:
<img src='http://iblink.ch/resized/sjg63ngi3h3g4a.jpg' alt='tree'>
Since I don't have a specific class or div to target, I tried the getElementsContainingOwnText("resized") method to get this element, but it does not find it.
I also tried getElementsContainingText.
Same output :(
Does anyone have any idea?
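The text is the part between the opening and closing tags: <tag attribute="value">Text</tag>. The string you are looking for is in the src attribute, not in the text.
So you want to select elements by attribute value instead. The [attr*=value] selector matches elements whose attribute value contains the given substring: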
Elements els = doc.select("img[src*=resized]");
Have a look at the CSS selectors that jsoup implements.

How to get the specific information from an element in JSOUP?

I am trying to parse an html file using jsoup. Here is my code:
Document doc;
doc = Jsoup.connect("http://www.marketimyilmazlar.com/index.php?route=product/product&path=64_80&product_id=14102").get();
Elements elements = doc.getElementsByClass("price");
Then, when I look at the elements variable, its content is the following:
<div class="price">
2.75 TL
<span class="kdv">KDV Dahil</span>
<br />
</div>
What I want is to get the value "2.75 TL". I thought of using the elements.get(int index) method, but I do not know how to use the index variable. Can anyone help me with this?
Thanks
You can use the ownText method, e.g.
Elements elements = doc.getElementsByClass("price");
System.out.println(elements.get(0).ownText()); // 2.75 TL
Quite simple: get the text nodes of the element and take the first one (it is a TextNode, so call text() on it if you need the string). The solution is something like:
element.textNodes().get(0);

Jsoup Select method returns null

I am trying to get the rating of each movie, but I can't seem to use the select method the right way. I am trying to get the 7.0 part from this webpage:
http://www.imdb.com/title/tt0800369/
<div class="star-box giga-star">
<div class="titlePageSprite star-box-giga-star"> 7.0 </div>
I am using this line in Java:
Element rating = doc.select("star-box giga-star").first();
System.out.println(rating);
Thanks in advance!
You can select an element by its class using .star-box-giga-star, and use text() to get the textual content of the element.
doc.select(".star-box-giga-star").text();
The problem with your selector is that "star-box giga-star" is a descendant selector on tag names: it looks for a <giga-star> element inside a <star-box> element. To match by class, use .class or element.class, like div.star-box. To require multiple classes on the same element, use element.class1.class2, or just .class1.class2 if you don't want to specify the element.
Also, if you want to specify a parent-child relationship, you have to use >, so try something like
Document doc = Jsoup.connect("http://www.imdb.com/title/tt0800369/").get();
Element rating = doc
.select("div.star-box.giga-star > div.titlePageSprite.star-box-giga-star")
.first();
System.out.println(rating);
Unfortunately, this will print the whole element:
<div class="titlePageSprite star-box-giga-star">
7.0
</div>
so if you want only the text content of that element, use System.out.println(rating.text());
BTW, since there is only one element with the class star-box-giga-star, you can just use
String rating = doc.select(".star-box-giga-star").text();
as shown in Alex's answer.

Jsoup select and iterate all elements

I connect to a URL through jsoup and get all of its contents, but if I select like this,
doc.select("body")
it returns a single element, but I want to get all the elements in the page and iterate over them one by one. For example, given:
<html>
<head><title>Test</title></head>
<body>
<p>Hello All</p>
Second Page
<div>Test</div>
</body>
</html>
If I select using body, I get the result on a single line:
Test Hello All Second Page Test
Instead, I want to select all elements, iterate over them one by one, and produce results like:
Test
Hello All
Second Page
Test
Will that be possible using jsoup?
Thanks,
Karthik
You can select all elements of the document using the * selector and then get the text of each one individually using Element#ownText().
Elements elements = document.body().select("*");
for (Element element : elements) {
System.out.println(element.ownText());
}
To get all of the elements within the body of the document using the jsoup library:
doc.body().children().select("*");
To get just the first level of elements in the document's body:
doc.body().children();
You can also use XPath, or any library that supports XPath.
The expression is //text()
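As a rough sketch of the XPath suggestion: the JDK's built-in javax.xml.xpath can evaluate //text(), assuming the markup is well-formed XML (real-world HTML often is not, so this is not a drop-in replacement for jsoup). The class name TextNodeXPath and the method textNodes are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of jsoup or any library.
class TextNodeXPath {

    // Returns the trimmed, non-blank text nodes selected by //text(),
    // in document order. Assumes the input is well-formed XML/XHTML.
    static List<String> textNodes(String xml) {
        try {
            org.w3c.dom.Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate("//text()", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                String text = nodes.item(i).getNodeValue().trim();
                if (!text.isEmpty()) {
                    result.add(text); // skip whitespace-only nodes
                }
            }
            return result;
        } catch (Exception e) {
            // Parsing or XPath evaluation failed, e.g. malformed markup.
            throw new IllegalArgumentException("Could not evaluate //text()", e);
        }
    }

    public static void main(String[] args) {
        // The sample page from the question, written as well-formed XML.
        String xml = "<html><head><title>Test</title></head>"
                + "<body><p>Hello All</p>Second Page<div>Test</div></body></html>";
        for (String text : textNodes(xml)) {
            System.out.println(text);
        }
    }
}
```

Unlike the ownText() loop, this walks text nodes rather than elements, so the title text in the head is included and each piece of text comes out on its own line.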
