JSoup - Add onclick function to the anchor href - java

Existing HTMl Document
Link
Like to convert it as:
Link
Using JSoup Java library many fancy parsing can be done. But not able to find clue to add attribute like above requirement. Please help.

To set an attribute have a look at the doc
String html = "<html><head><title>First parse</title></head>"
+ "<body><p>Parsed HTML into a doc.</p>Link</body></html>";
Document doc = Jsoup.parse(html);
Elements links = doc.getElementsByTag("a");
for (Element element : links) {
element.attr("onclick", "openFunction('"+element.attr("href")+"')");
element.attr("href", "#");
}
System.out.println(doc.html());
Will change :
<a href="http://google.com">
into
Link

Use Element#attr. I just used a loop but you can do it however you want.
Document doc = Jsoup.parse("Link");
for (Element e : doc.getElementsByTag("a")){
if (e.text().equals("Link")){
e.attr("onclick", "openFunction('http://google.com')");
System.out.println(e);
}
}
Output
Link

Related

Use JSoup to get all textual links

I'm using JSoup to grab content from web pages.
I want to get all the links on a page that have some contained text (it doesn't matter what the text is) just needs to be non-empty/image etc.
Example of links I want:
Link to Some Page
Since it contains the text "Link to Some Page"
Links I don't want:
<img src="someimage.jpg"/>
My code looks like this. How can I modify it to only get the first type of link?
Document document = // I get my document object
Elements linksOnPage = document.select("a[href]")
for (Element page : linksOnPage) {
String link = page.attr("abs:href");
// I do stuff with the link
}
You could do something like this.
It does it's job though it's probably not the fanciest solution out there.
Note: the function text() gets you a clean text so if there are any HTML code fragements inside it, it won't return them.
Document doc = // get the doc
Elements linksOnPage = document.select("a");
for (Element pageElem : linksOnPage){
String link = "";
if(pageElem.text().trim().equals(""))
continue;
// do smth with it
}
I am using this and it's working fine:
Document document = // I get my document object
Elements linksOnPage = document.select("a:matches(([^\\s]+))");
for (Element page : linksOnPage) {
String link = page.attr("abs:href");
// I do stuff with the link
}

Android - how to parse html by jsoup and fill into the arraylist?

I want to read the date from this HTML link:
http://jadvalbaz.blog.ir/post/%D8%B1%D8%A7%D9%87%D9%86%D9%85%D8%A7%DB%8C-%D8%AD%D9%84-%D8%AC%D8%AF%D9%88%D9%84-%D8%AD%D8%B1%D9%81-%D8%B0
if you look at the view-source
ذات اریه (پنومونی- سینه پهلو)ذر (مورچه ریز)ذرع (مقیاس طول)ذره ای بنیادی از رده هیبرونها که بار الکتریکی ندارد (لاندا)ذره منفی اتم (الکترون)ذریه (نسل)ذل (خواری)ذم (نکوهش)ذهاب (رفتن)ذی (صاحب)
my words are separated by <.br>, I want to read each word to ArrayList, I means how to omit the <.br> and read the words.
here is my code:
Document document = Jsoup.connect(url).get();
for (Element span : document.select("?").select("?")) {
title = span.toString();
name.add(title);
}
How to read them, what to put instead of question mark.
any suggestion?
edit the css of your template and define a class for your words then Use the Element.select(String selector) and Elements.select(String selector) method.
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
Element masthead = doc.select("p.words").first(); // p with class=words
follow below link for more information about extracting data with this methods:
Use selector-syntax to find elements

Grabbing information from an html file

OK, I am trying to grab the data-title and href and assigning them to variables in java.
<tr class="pl-video yt-uix-tile " data-video-id="MBBWVgE0ewk" data-set-video-id="" data-title="Windows Command Line Tutorial - 1 - Introduction to the Command Prompt"><td class="pl-video-handle "></td><td class="pl-video-index"></td><td class="pl-video-thumbnail"><span class="pl-video-thumb ux-thumb-wrap contains-addto"><a href="/watch?v=MBBWVgE0ewk&index=1&list=PL6gx4Cwl9DGDV6SnbINlVUd0o2xT4JbMu"
If you don't mind including a dependency, there is a good library for this kind of things called jsoup.
String html = ...
Document doc = Jsoup.parse(html);
Element tr = doc.select("tr").first();
Element link = tr.select("a").first();
String dataTitle = tr.attr("data-title");
String href = link.attr("href");

adding text before and after a link jSoup

I've just stared learning Jsoup and the cookbook on their website but I'm just a bit stuck with addling text to an element I've parsed.
try{
Document doc = Jsoup.connect(url).get();
Element add = doc.prependText("a href") ;
Elements links = add.select("a[href]");
for (Element link : links) {
PrintStream sb = System.out.format("%n %s",link.attr("abs:href"));
System.out.print("<br>");
}
}
catch(Exception e){
System.out.print("error --> " + e);
}
Example run with google.com I get
http://www.google.ie/imghp?hl=en&tab=wi<br>
http://maps.google.ie/maps?hl=en&tab=wl<br>
https://play.google.com/?hl=en&tab=w8<br>
But I really want
<a href> http://www.google.ie/imghp?hl=en&tab=wi<br></a>
<a href> http://maps.google.ie/maps?hl=en&tab=wl<br></a>
<a href> https://play.google.com/?hl=en&tab=w8<br></a>
With this code I've gotten all the links off the page but I want to also get the and tags so I can them create my on webpage. I've tried adding a string and prepend text but just can't seem to get it right.
Thanks
with link.attr(...) you get the attribute value.
But you need the whole tag:
Document doc = Jsoup.connect(...).get();
for( Element e : doc.select("a[href]") ) // Select all 'a'-Tags with 'href' attribute
{
String wholeTag = e.toString(); // Get a string as the element is
/* No you you can use the html - in this example for a simple output */
System.out.println(wholeTag);
}

How do I parse this HTML with Jsoup

I am trying to extract "Know your tractor" and "Shell Petroleum Company.1955"? Bear in mind that that is just a snippet of the whole code and there are more then one H2/H3 tag. And I would like to get the data from all the H2 and H3 tags.
Heres the HTML: http://i.stack.imgur.com/Pif3B.png
The Code I have just now is:
ArrayList<String> arrayList = new ArrayList<String>();
Document doc = null;
try{
doc = Jsoup.connect("http://primo.abdn.ac.uk:1701/primo_library/libweb/action/search.do?dscnt=0&scp.scps=scope%3A%28ALL%29&frbg=&tab=default_tab&dstmp=1332103973502&srt=rank&ct=search&mode=Basic&dum=true&indx=1&tb=t&vl(freeText0)=tractor&fn=search&vid=ABN_VU1").get();
Elements heading = doc.select("h2.EXLResultTitle span");
for (Element src : heading) {
String j = src.text();
System.out.println(j); //check whats going into the array
arrayList.add(j);
}
How would I extract "Know your tractor" and "Shell Petroleum Company.1955"? Thanks for your help!
Your selector only selects <span> elements which are inside <h2 class="EXLResultTitle">, while you actually need those <h2> elements themself. So, just remove span from the selector:
Elements headings = doc.select("h2.EXLResultTitle");
for (Element heading : headings) {
System.out.println(heading.text());
}
You should be able to figure the selector for <h3 class="EXLResultAuthor"> yourself based on the lesson learnt.
See also:
Jsoup cookbook - CSS selectors
Jsoup Selector API documentation

Categories

Resources