I am new to jsoup, and I would like to get the src of the image in this HTML:
<div class="detail-info-cover">
<img class="detail-info-cover-img" src="http://fmcdn.mfcdn.net/store/manga/33647/cover.jpg? token=eab4a510fcd567ead4d0d902a967be55576be642&ttl=1592125200&v=1591085412" alt="Ghost Writer (MIKAGE Natsu)"> </div>
If you run it you will see the image I want to get.
Do it as follows:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
public class Main {
public static void main(String[] args){
String html = "<div class=\"detail-info-cover\"> \n"
+ "<img class=\"detail-info-cover-img\" src=\"http://fmcdn.mfcdn.net/store/manga/33647/cover.jpg? token=eab4a510fcd567ead4d0d902a967be55576be642&ttl=1592125200&v=1591085412\" alt=\"Ghost Writer (MIKAGE Natsu)\"> </div>";
Document doc = Jsoup.parse(html);
Element image = doc.select("img").first();
String imageUrl = image.absUrl("src");
System.out.println(imageUrl);
}
}
Output:
http://fmcdn.mfcdn.net/store/manga/33647/cover.jpg? token=eab4a510fcd567ead4d0d902a967be55576be642&ttl=1592125200&v=1591085412
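A slightly narrower variant, as a rough sketch: select the image by its class instead of taking the first img in the document, and pass a base URI to Jsoup.parse so that absUrl can also resolve a relative src. The class name CoverImage, the helper coverUrl, the shortened src, and the example.com base URI are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CoverImage {
    // Hypothetical helper: absolute URL of the cover image, or "" if no such image exists.
    static String coverUrl(String html, String baseUri) {
        // The base URI is what absUrl() resolves relative src values against.
        Document doc = Jsoup.parse(html, baseUri);
        // selectFirst returns null when nothing matches, so check before using it.
        Element image = doc.selectFirst("img.detail-info-cover-img");
        return image == null ? "" : image.absUrl("src");
    }

    public static void main(String[] args) {
        String html = "<div class=\"detail-info-cover\">"
                + "<img class=\"detail-info-cover-img\" src=\"/store/manga/33647/cover.jpg\"></div>";
        System.out.println(coverUrl(html, "http://example.com/"));
    }
}
```

If the src is already absolute, as in the question, the base URI makes no difference and attr("src") would return the same value.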
My code returns all the links on a webpage, but I would like to get only the first link, for example when I Google "android". How do I do that?
Document doc = Jsoup.connect(sharedURL).get();
String title = doc.title();
Elements links = doc.select("a[href]");
stringBuilder.append(title).append("\n");
for (Element link : links) {
stringBuilder.append("\n").append(" ").append(link.text()).append(" ").append(link.attr("href")).append("\n");
}
Above is my code.
Use Elements#first and Node#absUrl:
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.select.Elements;
public class Main {
public static void main(String[] args) throws IOException {
Document doc = Jsoup.connect("https://en.wikipedia.org/wiki/Wikipedia").get();
Elements links = doc.select("a[href]");
Node node = links.first();
System.out.println(node.absUrl("href"));
}
}
Output:
https://en.wikipedia.org/wiki/Wikipedia:Protection_policy#semi
I'm converting an XPath expression to jsoup.
Below is my XPath (which is used with Selenium WebDriver):
String number = driver.findElement(By.xpath("//span[@data-dojo-attach-point='subNumber']")).getText();
The jsoup equivalent I tried:
String number = doc.select(" >span >data-dojo-attach-point=subNumber").text();
System.out.println(number);
When I run it, I get the following error:
Could not parse query 'data-dojo-attach-point=subNumber': unexpected token at '=subNumber'
HTML:
<div class="subHeaders">
<div class="subHeaderItem">
<h5 class="smallGray">Number</h5>
<span data-dojo-attach-point="subNumber">94607506</span>
</div>
</div>
Can anyone help with this?
This is how you could retrieve that data with selectFirst(String cssQuery) and then html():
Test class:
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;
public class TestA {
public static void main(String[] args) throws IOException {
//this is where chromedriver.exe should be
String driverPath = "yourDriverPath";
System.setProperty("webdriver.chrome.driver", driverPath);
WebDriver driver = new ChromeDriver();
driver.get("YourURL");
WebDriverWait wait = new WebDriverWait(driver, 15);
String cssSelector = "span[data-dojo-attach-point=subNumber]";
wait.until(ExpectedConditions.presenceOfElementLocated(By.cssSelector(cssSelector)));
Document doc = Jsoup.connect("YourURL").get();
Element subNumber = doc.selectFirst(cssSelector);
System.out.println(subNumber.html());
}
}
Output:
94607506
Note: I've tried the above on my laptop and it works.
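The query in the question failed because, in jsoup's CSS syntax, an attribute match has to be written in square brackets directly after the tag name. To check the selector without Selenium or a network connection, here is a minimal sketch that parses the HTML from the question as a string. The class name SubNumberDemo and the helper subNumber are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SubNumberDemo {
    // Hypothetical helper: text of the span carrying the attribute, or "" if it is missing.
    static String subNumber(String html) {
        Document doc = Jsoup.parse(html);
        // [attr=value] is the CSS counterpart of the XPath predicate [@attr='value'].
        Element span = doc.selectFirst("span[data-dojo-attach-point=subNumber]");
        return span == null ? "" : span.text();
    }

    public static void main(String[] args) {
        String html = "<div class=\"subHeaders\">"
                + "<div class=\"subHeaderItem\">"
                + "<h5 class=\"smallGray\">Number</h5>"
                + "<span data-dojo-attach-point=\"subNumber\">94607506</span>"
                + "</div></div>";
        System.out.println(subNumber(html));
    }
}
```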
Use this CSS selector:
div.subHeaders > div.subHeaderItem > span
String number = doc.select("div.subHeaders > div.subHeaderItem > span").text();
If the page has finished loading, you will get the text. You can use "Try jsoup" to verify that the selector returns it.
Open Try jsoup, click "Fetch URL", enter the URL of the page you are trying to parse, and click "Fetch". Let me know if you are able to get the value.
If you don't mind sharing the URL, post it here and we will help you.
I want to show parsed Elements in my JSP page.
I already have jsoup in my Maven dependencies.
I have a class for parsing with jsoup which returns a string.
package com.user.jsoup;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
public class JsoupClass {
public String testMethod() throws IOException {
Document doc = Jsoup.connect("https://www.google.de").get();
String test = doc.title();
return test;
}
}
My JSP is:
<%@ page import="com.user.jsoup.JsoupClass" %>
<%
JsoupClass jsclass = new JsoupClass();
out.print(jsclass.testMethod());
%>
Unfortunately it won't display anything.
What am I doing wrong?
I solved my problem by adding
System.setProperty("https.proxyHost", "host");
System.setProperty("https.proxyPort", "port");
to my JsoupClass
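For reference, a small sketch of that fix using only the JDK. The properties need to be set before the request is made, and both values are strings. Note that the https.* properties only apply to https URLs; plain http URLs use http.proxyHost and http.proxyPort. The host proxy.example.com and port 8080 are placeholders, not values from my setup, and the class name ProxySetup is made up.

```java
public class ProxySetup {
    // Sets the JVM-wide HTTPS proxy; call this before Jsoup.connect(...).get().
    static void useHttpsProxy(String host, int port) {
        System.setProperty("https.proxyHost", host);
        // System properties are strings, so the port is converted explicitly.
        System.setProperty("https.proxyPort", Integer.toString(port));
    }

    public static void main(String[] args) {
        useHttpsProxy("proxy.example.com", 8080); // placeholder values
        System.out.println(System.getProperty("https.proxyHost")
                + ":" + System.getProperty("https.proxyPort"));
    }
}
```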
I'm using jsoup in Java to parse HTML pages like these two:
This and this.
In the first case, I get the output.
In the second, I have a problem with the connection:
doc = Jsoup.connect(url).get();
Some URLs parse without trouble and give me output, but others produce empty output like this:
Title: [].
I can't understand what the problem is, since both URLs look the same.
This is my code:
Document doc;
try {
doc = Jsoup.connect("http://ekonomika.sme.sk/c/8047766/s-velkymi-chybami-stavali-aj-budovu-centralnej-banky.html").get();
String title = doc.title();
System.out.println("title : " + title);
}
catch (IOException e) {
e.printStackTrace();
}
Take a look at what's in the head of the second URL:
Element h = doc.head();
System.out.println("head : " + h);
You'll see there are some meta refresh tags and an empty title:
<head>
<noscript>
<meta http-equiv="refresh" content="1;URL='/c/8047766/s-velkymi-chybami-stavali-aj-budovu-centralnej-banky.html?piano_d=1'">
</noscript>
<meta http-equiv="refresh" content="10;URL='/c/8047766/s-velkymi-chybami-stavali-aj-budovu-centralnej-banky.html?piano_t=1'">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title></title>
</head>
That explains the empty title: the page you received is only a placeholder that redirects via meta refresh. You have to follow that redirect.
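A meta refresh is not an HTTP redirect, so jsoup's followRedirects setting (which is on by default and handles server-side redirects) does not cover it. One way to follow it by hand is to read the content attribute of a meta[http-equiv=refresh] element, extract the URL part, and resolve it against the page URL. The sketch below shows only the extraction and resolution steps with plain JDK classes; the class name MetaRefresh, the helper target, and the regular expression are my own assumptions, and I have not checked whether the resulting URL actually serves the full article.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaRefresh {
    // Matches e.g. 10;URL='/path?x=1' and captures the target without the optional quotes.
    private static final Pattern REFRESH = Pattern.compile(
            "^\\s*\\d+\\s*;\\s*url\\s*=\\s*['\"]?([^'\"]+)['\"]?\\s*$",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: absolute redirect target, or null if the content has no URL part.
    static String target(String baseUrl, String content) {
        Matcher m = REFRESH.matcher(content);
        if (!m.matches()) {
            return null;
        }
        // Resolve the (possibly relative) target against the URL of the page it came from.
        return URI.create(baseUrl).resolve(m.group(1).trim()).toString();
    }

    public static void main(String[] args) {
        String base = "http://ekonomika.sme.sk/c/8047766/s-velkymi-chybami-stavali-aj-budovu-centralnej-banky.html";
        String content = "10;URL='/c/8047766/s-velkymi-chybami-stavali-aj-budovu-centralnej-banky.html?piano_t=1'";
        System.out.println(target(base, content));
    }
}
```

The returned URL can then be passed to a second Jsoup.connect(...).get() call.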
Here is my parsing code; with this URL I get no output.
package commentparser;
import java.io.IOException;
import static java.lang.Boolean.FALSE;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class CommentParser {
public static void main(String[] args) {
Document doc;
try {
doc = Jsoup.connect("http://ekonomika.sme.sk/c/8047766/s-velkymi-chybami-stavali-aj-budovu-centralnej-banky.html").followRedirects(true).get();
String title = doc.title();
System.out.println("title : " + title);
//Link for discussions
if(doc.select("a[href^=/diskusie/reaction_show]").isEmpty() == FALSE){
Elements description = doc.select("a[href^=/diskusie/reaction_show]");
for (Element link : description) {
// get the value from href attribute
System.out.println("Diskusie: " + link.attr("href"));
}
}
//Author of article
if(doc.select("span[class^=autor]").isEmpty() == FALSE){
Elements description = doc.select("span[class^=autor]");
for (Element link : description) {
// print the text of the author element
//System.out.println("\nlink : " + link.attr("b"));
System.out.println(link.text());
}
}
// get all links
Elements links = doc.select("a[href]");
for (Element link : links) {
// get the value from href attribute
System.out.println("\nlink : " + link.attr("href"));
System.out.println("text : " + link.text());
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
How do I get elements by tag using jsoup (http://jsoup.org/)?
I have the following input and need the following output, but I am not getting the text inside the <source>...</source> tags:
[in:]
<html>
<something>
<source>foo bar bar</source>
<something>
<source>foo foo bar</source>
</html>
[desired out:]
foo bar bar
foo foo bar
I have tried this:
import java.io.*;
import java.util.List;
import org.apache.commons.io.IOUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
public class HelloJsoup {
public static void main(String[] args) throws IOException {
String br = "<html><source>foo bar bar</source></html>";
Document doc = Jsoup.parse(br);
//System.out.println(doc);
for (Element sentence : doc.getElementsByTag("source"))
System.out.print(sentence);
}
}
but it outputs:
<source></source>
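In HTML, source is a void element (it cannot have content), so the HTML parser does not treat the text as belonging to it. You need to use the xmlParser(), which you can pass in to the parse() method: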
// requires: import org.jsoup.parser.Parser;
String br = "<html><source>foo bar bar</source></html>";
Document doc = Jsoup.parse(br, "", Parser.xmlParser());
for (Element sentence : doc.getElementsByTag("source")) {
System.out.println(sentence.text());
}
More on this in the docs: http://jsoup.org/apidocs/org/jsoup/parser/Parser.html#xmlParser()
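Applied to the full input from the question, with both source elements and the unclosed something tags, a self-contained sketch could look like this. The class name SourceTags and the helper sourceTexts are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class SourceTags {
    // Hypothetical helper: text of every <source> element, in document order.
    static List<String> sourceTexts(String markup) {
        // The XML parser keeps the text inside <source> instead of applying HTML rules.
        Document doc = Jsoup.parse(markup, "", Parser.xmlParser());
        List<String> texts = new ArrayList<>();
        for (Element source : doc.getElementsByTag("source")) {
            texts.add(source.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        String markup = "<html><something><source>foo bar bar</source>"
                + "<something><source>foo foo bar</source></html>";
        for (String text : sourceTexts(markup)) {
            System.out.println(text);
        }
    }
}
```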