I want to add all attributes of div in html page.
<div data-brackets-id="6217" class="layout-internal col-12 js-autosuggest__search-list-container">
Document doc = Jsoup.connect(url + text).get();
Elements info = doc.select("div[data-brackets-id]");
System.out.println(info);
but it does not works...
You can get the attribute like this:
Elements info = doc.select("div[data-brackets-id]");
// iterate the Elements
for(Element elm: info){
System.out.println(elm.attr("data-brackets-id"));
}
Related
Existing HTMl Document
Link
Like to convert it as:
Link
Using JSoup Java library many fancy parsing can be done. But not able to find clue to add attribute like above requirement. Please help.
To set an attribute have a look at the doc
String html = "<html><head><title>First parse</title></head>"
+ "<body><p>Parsed HTML into a doc.</p>Link</body></html>";
Document doc = Jsoup.parse(html);
Elements links = doc.getElementsByTag("a");
for (Element element : links) {
element.attr("onclick", "openFunction('"+element.attr("href")+"')");
element.attr("href", "#");
}
System.out.println(doc.html());
Will change :
<a href="http://google.com">
into
Link
Use Element#attr. I just used a loop but you can do it however you want.
Document doc = Jsoup.parse("Link");
for (Element e : doc.getElementsByTag("a")){
if (e.text().equals("Link")){
e.attr("onclick", "openFunction('http://google.com')");
System.out.println(e);
}
}
Output
Link
Consider an html document like this one
<div>
<p>...</p>
<p>...</p>
...
<p class="random_class_name">...</p>
...
</div>
How could we select all of the p elements, but excluding the p element with random_class_name class?
Elements ps = body.select("p:not(.random_class_name)");
You can use the pseudo selector :not
If the class name is not known, you still can use a similar expression:
Elements ps = body.select("p:not([class])");
In the second example I use the attribute selector [], in the first the normal syntax for classes.
See the Jsoup docu about css selectors
Document doc = Jsoup.parse(htmlValue);
Elements pElements = doc.select("p");
for (Element element : pElements) {
String class = element.attr("class");
if(class == null){
//.....
}else{
//.....
}
}
I am attempting to use JSoup to scrape some information off a page, which can be identified by a group of tags in a particular order. The order of them is as follows:
<span class="sold" >Sold</span></td>
<td class='prc'>
<div class="g-b bidsold" itemprop="price">
AU $1.00</div>
I am looking to grab each value that is in place of the AU $1.00 field on the page, but they can only be identified by the span class="sold" selector that occurs a few tags beforehand.
I have tried something like select("span.sold:lt(4) + [itemprop=price]") but feel like I'm flailing around in the dark!
The code below should do the trick!!!
Document doc = Jsoup.connect(/*URL of your HTML document*/").get();
Element part = doc.body();
Elements parts = part.getElementsByTag("div");
String attValue;
String requiredContent;
for(Element ent : parts)
{
if(ent.hasAttr("class"))
{
attValue = ent.attr("class");
if(attValue.equals("g-b bidsold"))
{
System.out.println("\n");
requiredContent=ent.text();
System.out.println(requiredContent);
}
}
}
Just make sure to iterate and get the output in an array.
You could also do this:
Elements soldPrices = doc.select("td:has(.sold) + td [itemprop=price]");
That will return elements (the DIVs) that have price itemprops, which have immediately preceeding TDs with elements (the SPANs) with class=sold.
See the Selector syntax for more details.
I've just stared learning Jsoup and the cookbook on their website but I'm just a bit stuck with addling text to an element I've parsed.
try{
Document doc = Jsoup.connect(url).get();
Element add = doc.prependText("a href") ;
Elements links = add.select("a[href]");
for (Element link : links) {
PrintStream sb = System.out.format("%n %s",link.attr("abs:href"));
System.out.print("<br>");
}
}
catch(Exception e){
System.out.print("error --> " + e);
}
Example run with google.com I get
http://www.google.ie/imghp?hl=en&tab=wi<br>
http://maps.google.ie/maps?hl=en&tab=wl<br>
https://play.google.com/?hl=en&tab=w8<br>
But I really want
<a href> http://www.google.ie/imghp?hl=en&tab=wi<br></a>
<a href> http://maps.google.ie/maps?hl=en&tab=wl<br></a>
<a href> https://play.google.com/?hl=en&tab=w8<br></a>
With this code I've gotten all the links off the page but I want to also get the and tags so I can them create my on webpage. I've tried adding a string and prepend text but just can't seem to get it right.
Thanks
with link.attr(...) you get the attribute value.
But you need the whole tag:
Document doc = Jsoup.connect(...).get();
for( Element e : doc.select("a[href]") ) // Select all 'a'-Tags with 'href' attribute
{
String wholeTag = e.toString(); // Get a string as the element is
/* No you you can use the html - in this example for a simple output */
System.out.println(wholeTag);
}
I am trying to extract "Know your tractor" and "Shell Petroleum Company.1955"? Bear in mind that that is just a snippet of the whole code and there are more then one H2/H3 tag. And I would like to get the data from all the H2 and H3 tags.
Heres the HTML: http://i.stack.imgur.com/Pif3B.png
The Code I have just now is:
ArrayList<String> arrayList = new ArrayList<String>();
Document doc = null;
try{
doc = Jsoup.connect("http://primo.abdn.ac.uk:1701/primo_library/libweb/action/search.do?dscnt=0&scp.scps=scope%3A%28ALL%29&frbg=&tab=default_tab&dstmp=1332103973502&srt=rank&ct=search&mode=Basic&dum=true&indx=1&tb=t&vl(freeText0)=tractor&fn=search&vid=ABN_VU1").get();
Elements heading = doc.select("h2.EXLResultTitle span");
for (Element src : heading) {
String j = src.text();
System.out.println(j); //check whats going into the array
arrayList.add(j);
}
How would I extract "Know your tractor" and "Shell Petroleum Company.1955"? Thanks for your help!
Your selector only selects <span> elements which are inside <h2 class="EXLResultTitle">, while you actually need those <h2> elements themself. So, just remove span from the selector:
Elements headings = doc.select("h2.EXLResultTitle");
for (Element heading : headings) {
System.out.println(heading.text());
}
You should be able to figure the selector for <h3 class="EXLResultAuthor"> yourself based on the lesson learnt.
See also:
Jsoup cookbook - CSS selectors
Jsoup Selector API documentation