I have an HTML file like the one below:
<div id ="test"> <u>s</u> </div>
I want to modify it like this using Java:
<div id ="test"> <b>Test</b> </div>
Is it possible in jsoup?
It is possible:
Element el = doc.select("div#test").first();
for (Element elC : el.children()) {
    elC.remove();
}
Element nel = el.appendElement("b");
nel.text("Test");
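As an alternative to removing the children one by one, here is a minimal sketch using Element.empty(), which drops every child node, including the whitespace text nodes that the loop above leaves in place. The class name ReplaceChildren and the inline HTML string are assumptions for illustration, and jsoup is assumed to be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ReplaceChildren {
    // Replaces everything inside <div id="test"> with <b>Test</b>.
    // Assumes the input contains an element with id "test".
    static Element replaceContent(String html) {
        Document doc = Jsoup.parse(html);
        Element el = doc.getElementById("test");
        el.empty();                         // remove all child nodes, text included
        el.appendElement("b").text("Test"); // add the new <b> child and set its text
        return el;
    }

    public static void main(String[] args) {
        Element el = replaceContent("<div id=\"test\"> <u>s</u> </div>");
        System.out.println(el.outerHtml());
    }
}
```

If the surrounding whitespace or other non-element nodes should be kept, the original loop over children() is the better fit.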
Related
I'm trying to ignore one item and not parse it with jsoup,
but the CSS selector ":not" is not working.
I don't understand what is wrong.
My code:
MangaList list = new MangaList();
Document document = getPage("https://3asq.org/");
MangaInfo manga;
for (Element o : document.select("div.page-item-detail:not(.item-thumb#manga-item-5520)")) {
    manga = new MangaInfo();
    manga.name = o.select("h3").first().select("a").last().text();
    manga.path = o.select("a").first().attr("href");
    try {
        manga.preview = o.select("img").first().attr("src");
    } catch (Exception e) {
        manga.preview = "";
    }
    list.add(manga);
}
return list;
HTML code:
<div class="col-12 col-md-6 badge-pos-1">
<div class="page-item-detail manga">
<div id="manga-item-5520" class="item-thumb hover-details c-image-hover" data-post-id="5520">
<a href="https://3asq.org/manga/gosu/" title="Gosu">
<img width="110" height="150" src="https://3asq.org/wp-content/uploads/2020/03/IMG_4497-110x150.jpg" srcset="https://3asq.org/wp-content/uploads/2020/03/IMG_4497-110x150.jpg 110w, https://3asq.org/wp-content/uploads/2020/03/IMG_4497-175x238.jpg 175w" sizes="(max-width: 110px) 100vw, 110px" class="img-responsive" style="" alt="IMG_4497"/> </a>
</div>
<div class="item-summary">
<div class="post-title font-title">
<h3 class="h5">
<span class="manga-title-badges custom noal-manga">Noal-Manga</span> Gosu
</h3>
If I debug your code and print the first match:
System.out.println(document.select("div.page-item-detail").get(0));
(hint: use the expression evaluator in IntelliJ IDEA, Alt+F8, for in-session, real-time debugging)
I get:
<div class="page-item-detail manga">
<div id="manga-item-2003" class="item-thumb hover-details c-image-hover" data-post-id="2003">
<a href="http...
...
</div>
</div>
</div>
It looks like you want to extract the next div tag down with class containing item-thumb ... but only if the id isn't manga-item-5520.
So here's what I did to exclude that one item:
document.select("div.page-item-detail div[class*=item-thumb][id!=manga-item-5520]")
Result size: 19
With the element included:
document.select("div.page-item-detail div[class*=item-thumb]")
Result size: 20
You can also try the following if you want to remain based at the outer div tag rather than the inner div tag.
document.select("div.page-item-detail:has(div[class*=item-thumb][id!=manga-item-5520])")
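The reason the selector in the question excludes nothing is that :not(...) is tested against the outer div.page-item-detail itself, and that element never carries the item-thumb class or the manga-item-5520 id. A minimal sketch of one more variant, :not(:has(...)), which drops the outer div when any descendant has that id. The class name ExcludeById, the helper names, and the two-item sample HTML (with shortened hypothetical hrefs) are assumptions for illustration, not the real page.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ExcludeById {
    // Two list items; only the first contains the id we want to skip.
    static final String SAMPLE =
          "<div class=\"page-item-detail manga\">"
        + "<div id=\"manga-item-5520\" class=\"item-thumb\"><a href=\"/manga/gosu/\">Gosu</a></div>"
        + "</div>"
        + "<div class=\"page-item-detail manga\">"
        + "<div id=\"manga-item-2003\" class=\"item-thumb\"><a href=\"/manga/other/\">Other</a></div>"
        + "</div>";

    // Counts matches for the selector from the question: :not() only looks at the outer div.
    static int countWithQuestionSelector(String html) {
        return Jsoup.parse(html)
            .select("div.page-item-detail:not(.item-thumb#manga-item-5520)").size();
    }

    // Keeps outer divs that have no descendant with id manga-item-5520.
    static List<String> keptLinks(String html) {
        Document doc = Jsoup.parse(html);
        List<String> links = new ArrayList<>();
        for (Element item : doc.select("div.page-item-detail:not(:has(#manga-item-5520))")) {
            links.add(item.select("a").attr("href")); // href of the first link in the item
        }
        return links;
    }

    public static void main(String[] args) {
        System.out.println(countWithQuestionSelector(SAMPLE));
        System.out.println(keptLinks(SAMPLE));
    }
}
```

Because the loop still iterates over the outer divs, the name, path, and preview lookups from the question can stay unchanged inside it.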
I'm trying to parse data from HTML. I need to get all the names from the inner divs with class=vacancy-item, each of which has a different data-id.
Please see the HTML code below:
<section class="home-vacancies" id="vacancy_wrapper">
<div class="home-block-title">job openings</div>
<div class="vacancy-filter">
...................
</div>
<div class="vacancy-wrapper">
<div class="vacancy-item" data-id="9120">
..............
</div>
<div class="vacancy-item" data-id="9119">
..................
</div>
<div class="vacancy-item" data-id="9118">
................................
</div>
<div class="vacancy-item" data-id="9117">
.............................
</div>
Please help.
Here is my code:
doc = Jsoup.connect("URL").get();
//title = doc.select(".page-content div:eq(3)");
title = doc.getElementsByClass("div[class=vacancy-wrapper]");
titleList.clear();
for (Element titles : title) {
String text = titles.getElementsB("vacancy-item").text();
titleList.add(text);
}
Thanks!
getElementsByClass takes a plain class name, not a CSS selector, so getElementsByClass("vacancy-wrapper") would work.
You will also need a second loop to get each vacancy-item's text as a separate element:
Elements title = doc.getElementsByClass("vacancy-wrapper");
for (Element titles : title) {
    Elements items = titles.getElementsByClass("vacancy-item");
    for (Element item : items) {
        String text = item.text();
        // process text
    }
}
Another option would be to use jsoup's select method:
Elements es = doc.select("div.vacancy-wrapper div.vacancy-item");
for (Element vi : es) {
    String text = vi.text();
    // process text
}
This would select all div elements with a class attribute vacancy-item that are under a div with a class attribute vacancy-wrapper.
Using Jsoup:
Element movie_div = doc.select("div.movie").first();
I got the following HTML:
<div class="movie">
<div>
<div>
<strong>Year:</strong> 2014
</div>
<div>
<strong>Country:</strong> USA
</div>
</div>
</div>
How can I use jsoup to extract the country and the year?
For the example html I want the extracted values to be "2014" and "USA".
Thanks.
Use
Element e = doc.select("div.movie").first().child(0);
List<TextNode> textNodes = e.child(0).textNodes();
String year = textNodes.get(textNodes.size()-1).text().trim();
textNodes = e.child(1).textNodes();
String country = textNodes.get(textNodes.size()-1).text().trim();
Did you try something like:
Element movie_div = doc.select("div.movie strong").first();
To get the text value of that element, call:
movie_div.text();
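Note that the text of a strong element is the label ("Year:"), not the value next to it. A minimal sketch that pairs each label with its value by reading the parent's ownText(), which returns the parent's own text without the text of its child elements. The class name MovieFields and the compact sample HTML are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class MovieFields {
    // Maps each <strong> label inside div.movie to the text that follows it.
    static Map<String, String> fields(String html) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Element label : Jsoup.parse(html).select("div.movie strong")) {
            // ownText() skips the <strong> child, leaving only the value
            out.put(label.text(), label.parent().ownText());
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<div class=\"movie\"><div>"
            + "<div><strong>Year:</strong> 2014</div>"
            + "<div><strong>Country:</strong> USA</div>"
            + "</div></div>";
        System.out.println(fields(html));
    }
}
```

Keying on the label text means the lookup does not depend on the order or position of the inner divs.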
If I have code like this:
Elements e = d.select("div[id=result_52]");
System.out.println("elemeeeeeee" + e);
the output of e is as follows:
elemeeeeeee<imagebox id="result_52" class="rsltGrid prod celwidget" name="B00BF9MZ44">
<div class="linePlaceholder"></div>
<div class="image imageContainer">
<a href="http://www.abcdefg.com/VIZIO-E241i-A1-24-Inch-1080p-tilted/dp/B00BF9MZ44/ref=lp_6459736011_1_53/190-4904523-2326018?s=tv&ie=UTF8&qid=1405599829&sr=1-53">
<div class="imageBox">
<img src="http://ecx.images-abcdefg.com/images/I/51PhLnnk7NL._AA160_.jpg" class="productImage cfMarker" alt="Product Details" />
</div>
I want both URLs that appear inside it (the link href and the image src).
You can use # to get the specific id.
Element e = d.select("div#result_52").get(0);
String firstURL = e.select("a").attr("href"); //select the `a` tag and the `href` attribute.
String secondURL = e.select("img").attr("src");
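One caveat: get(0) throws an IndexOutOfBoundsException when the selector matches nothing. A minimal sketch of a null-safe variant, where the class name ResultUrls, the method urls, and the example.com sample HTML are assumptions for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class ResultUrls {
    // Returns {link href, image src} for the element with the given id,
    // or an empty array when no such element exists.
    static String[] urls(String html, String id) {
        Element box = Jsoup.parse(html).getElementById(id);
        if (box == null) {
            return new String[0];
        }
        // Elements.attr() reads the attribute from the first match, or "" if none
        return new String[] {
            box.select("a[href]").attr("href"),
            box.select("img[src]").attr("src")
        };
    }

    public static void main(String[] args) {
        String html = "<div id=\"result_52\"><div class=\"image\">"
            + "<a href=\"http://example.com/item\"><div class=\"imageBox\">"
            + "<img src=\"http://example.com/item.jpg\"></div></a></div></div>";
        for (String url : urls(html, "result_52")) {
            System.out.println(url);
        }
    }
}
```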
I'm trying to parse an HTML string using the htmlparser library.
The HTML looks like this:
<body>
<div class="Level1">
<div class="row">
<div class="txt">
Date of analysis:
</div><div class="content">
02/03/11
</div>
</div>
</div><div class="Level1">
<div class="row">
<div class="txt">
Site:
</div><div class="content">
13.0E
</div>
</div>
</div><div class="Level1">
<div class="row">
<div class="txt">
Network type:
</div><div class="content">
DVB-S
</div>
</div>
</div>
</body>
I need to extract the "content" value for a given "txt" label. I have made a filter that returns the divs with class="Level1", but I don't know how to filter on the content of a div. For example, when the value of txt is Site:, I want to read the content, 13.0E.
NodeList nl = parser.extractAllNodesThatMatch(new AndFilter(new TagNameFilter("div"), new HasAttributeFilter("class", "Level1")));
Can someone help me with this issue? How do I read a div inside a div?
Thanks!!
Instead of
NodeList nl = parser.extractAllNodesThatMatch(new AndFilter(new TagNameFilter("div"), new HasAttributeFilter("class", "Level1")));
it is better to do it like this:
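NodeList nl = parser.parse(null); // you can also filter here
// true = search nested nodes as well, since the txt divs are not top-level
NodeList divs = nl.extractAllNodesThatMatch(
    new AndFilter(new TagNameFilter("DIV"),
        new HasAttributeFilter("class", "txt")), true);
if (divs.size() > 0) {
    Node div = divs.elementAt(0); // elementAt returns a Node
    // toPlainTextString() gives the text inside the div; getText() on a tag returns the tag's own contents
    String text = div.toPlainTextString().trim();
}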