jsoup reading data that is not present - java

I have an HTML page that I am reading.
If the format I am looking for is not present in that page, I want to skip it and continue with the next page, but that is not working.
Can you please let me know what I am missing?
try
{
Document doc = Jsoup.connect(urlget).get();
Element tables = doc.select("div.itembody");
websiteaddress= tables.text();
}
catch (IOException ee)
{
}
If the page has no itembody element, I get an exception:
Exception in thread "main" java.lang.NullPointerException
I want the loop to continue, rather than the program exiting, when there is an exception.

doc.select returns an object of type Elements (a list of Element objects), not a single Element. If nothing in your HTML matches the query, you get an empty list, and calling first() on it returns null. Change your code to:
try
{
Document doc = Jsoup.connect(urlget).get();
Elements tables = doc.select("div.itembody");
if(tables.isEmpty())
noDivItembodyInHTML();
else
websiteaddress = tables.first().text();
}
catch (IOException ee)
{
}
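Here is a minimal, offline sketch of the same check applied to several pages in a loop. It assumes jsoup is on the classpath and uses Jsoup.parse on inline HTML strings instead of Jsoup.connect(...).get() so that it runs without a network. The class name ItemBodyDemo, the helpers extractItemBody and extractAll, and the sample pages are made up for illustration and are not from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class ItemBodyDemo {

    // Returns the text of the first div.itembody, or null when the page has none.
    public static String extractItemBody(String html) {
        Document doc = Jsoup.parse(html);
        Elements tables = doc.select("div.itembody");
        if (tables.isEmpty()) {
            return null; // nothing matched: first() would also be null here
        }
        return tables.first().text();
    }

    // Collects the itembody text of every page that has one and skips the rest.
    public static List<String> extractAll(List<String> pages) {
        List<String> found = new ArrayList<>();
        for (String html : pages) {
            String text = extractItemBody(html);
            if (text == null) {
                continue; // no div.itembody on this page: go on to the next one
            }
            found.add(text);
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical pages; the second one has no div.itembody.
        List<String> pages = Arrays.asList(
                "<html><body><div class=\"itembody\">www.example.com</div></body></html>",
                "<html><body><p>nothing to see here</p></body></html>",
                "<html><body><div class=\"itembody\">www.example.org</div></body></html>");
        System.out.println(extractAll(pages));
    }
}
```

Running main prints [www.example.com, www.example.org]: the page without the element is skipped and the loop carries on.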

Related

No output for JSoup file

When running the following code:
try {
Document doc = Jsoup.connect("https://pomofocus.io/").get();
Elements text = doc.select("div.sc-kEYyzF");
System.out.println(text.text());
}
catch (IOException e) {
e.printStackTrace();
}
No output occurs. When changing the println to:
System.out.println(text.first().text());
I get a NullPointerException but nothing else.
jsoup doesn't execute JavaScript; it only parses the HTML that the server returns. Check View Source (rather than Inspect) in your browser to see the server's response, and therefore what is selectable.
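A small offline sketch of that behaviour, assuming jsoup is on the classpath. The HTML string is a made-up stand-in for a server response (it is not what pomofocus.io actually returns), and the class name NoJavaScriptDemo and the helper countMatches are hypothetical. The div with class timer only exists inside a script string, so jsoup sees it as script data, not as an element.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class NoJavaScriptDemo {

    // Hypothetical server response: div.timer would only be created by the script in a browser.
    public static final String SERVER_HTML =
            "<html><body><div id=\"root\"></div>"
            + "<script>document.getElementById('root').innerHTML ="
            + " '<div class=\"timer\">25:00</div>';</script></body></html>";

    // Parses the HTML as jsoup would after a fetch and counts the elements matching the query.
    public static int countMatches(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Elements matches = doc.select(cssQuery);
        return matches.size();
    }

    public static void main(String[] args) {
        // Present in the served HTML, so it is selectable.
        System.out.println("div#root:  " + countMatches(SERVER_HTML, "div#root"));
        // Only exists after the script runs in a browser, so jsoup finds nothing.
        System.out.println("div.timer: " + countMatches(SERVER_HTML, "div.timer"));
    }
}
```

An empty result like this is why text() prints an empty string and first() returns null in the question.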

Jsoup how to get text from a specific column of an html table

I'm trying to print out all the locations in the table on this Wikipedia page: https://en.wikipedia.org/wiki/COVID-19_pandemic, but the output is always blank. Is this a problem with my code, or am I searching for the wrong HTML classes?
try {
Document doc = Jsoup.connect("https://en.wikipedia.org/wiki/COVID-19_pandemic").get();
for (Element row : doc.select("table.wikitable.plainrowheaders.sortable.jquery-tablesorter tr")){
if (row.select("th:nth-of-type(2)").text().equals("")){
continue;
}else {
final String location = row.select("th:nth-of-type(2)").text();
System.out.println(location);
}
}
} catch (IOException e) {
e.printStackTrace();
}
When I changed
doc.select("table.wikitable.plainrowheaders.sortable.jquery-tablesorter tr")
to
doc.select("table.wikitable tr")
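I was able to get the country names. The jquery-tablesorter class is most likely added by JavaScript in the browser, so it would not be in the HTML that jsoup receives.
Please try it.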

Java stax: The reference to entity "R" must end with the ';' delimiter

I am trying to parse an XML file using StAX, but the error I get is:
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[414,47]
Message: The reference to entity "R" must end with the ';' delimiter.
It fails on line 414, which contains P&R inside the XML file. The code I have to parse it is:
public List<Vild> getVildData(File file){
XMLInputFactory factory = XMLInputFactory.newFactory();
try {
ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(Files.readAllBytes(file.toPath()));
XMLStreamReader reader = factory.createXMLStreamReader(byteArrayInputStream, "iso8859-1");
List<Vild> vild = saveVild(reader);
reader.close();
return vild;
} catch (IOException e) {
e.printStackTrace();
} catch (XMLStreamException e) {
e.printStackTrace();
}
return Collections.emptyList();
}
private List<Vild> saveVild(XMLStreamReader streamReader) {
List<Vild> vildList = new ArrayList<>();
try{
Vild vild = new Vild();
while (streamReader.hasNext()) {
streamReader.next();
//Creating list with data
}
}catch(XMLStreamException | IllegalStateException ex) {
ex.printStackTrace();
}
return vildList;
}
I read online that a bare & is invalid in XML, but I don't know how to change it before it throws this error inside the saveVild method. Does someone know how to do this efficiently?
Change the question: you're not trying to parse an XML file, you're trying to parse a non-XML file. For that, you need a non-XML parser, and to write such a parser you need to start with a specification of the language you are trying to parse, and you'll need to agree the specification of this language with the other partners to the data interchange.
How much work you could all save by conforming to standards!
Treat broken XML arriving in your shop the way you would treat any other broken goods coming from a supplier: return it to sender marked "unfit for purpose".
The problem here, as you mention, is that the parser finds the & and then expects an entity name terminated by ;
This is fixed by escaping the character, so that the parser finds &amp; instead.
Take a look here for further reference
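If the file cannot be fixed at the source, one possible workaround is to escape bare ampersands before handing the text to StAX. The sketch below is only that, a sketch: the class name AmpersandRepair and its methods are hypothetical, it works on a String (so the file has to be decoded with its real charset first), and the regex would also rewrite a legal bare & inside a CDATA section or a comment.

```java
import java.io.StringReader;
import java.util.regex.Pattern;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class AmpersandRepair {

    // An '&' that does not start a named (&amp;), decimal (&#38;) or hex (&#x26;) reference.
    private static final Pattern BARE_AMPERSAND =
            Pattern.compile("&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Replaces every bare '&' with '&amp;' and leaves existing references untouched.
    public static String escapeBareAmpersands(String xml) {
        return BARE_AMPERSAND.matcher(xml).replaceAll("&amp;");
    }

    // Concatenates all character data in the document; fails if the XML is not well-formed.
    public static String readAllText(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
            StringBuilder text = new StringBuilder();
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.CHARACTERS) {
                        text.append(reader.getText());
                    }
                }
            } finally {
                reader.close();
            }
            return text.toString();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String broken = "<vild><name>P&R</name></vild>"; // made-up sample, not the asker's file
        String repaired = escapeBareAmpersands(broken);
        System.out.println(repaired);
        System.out.println(readAllText(repaired));
    }
}
```

With the sample input, the repaired string is <vild><name>P&amp;R</name></vild> and the parser reports the text as P&R again, since &amp; is decoded on the way in.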

Java parse data from html table with jsoup

I want to get the data from the table at this link:
https://www.nasdaq.com/symbol/aapl/financials?query=balance-sheet
I've tried the code below, but it doesn't work.
public static void main(String[] args) {
try {
Document doc = Jsoup.connect("https://www.nasdaq.com/symbol/aapl/financials?query=balance-sheet").get();
Elements trs = doc.select("td_genTable");
for (Element tr : trs) {
Elements tds = tr.getElementsByTag("td");
Element td = tds.first();
System.out.println(td.text());
}
} catch (IOException e) {
e.printStackTrace();
}
}
Can anybody help me get this to work?
I'm not getting any output from the table. Nothing happens.
After testing your code I got a read timeout. Searching on Google, I found a post that suggests adding a user agent to fix it, and that worked for me. So you can try this:
public static void main(String[] args) {
try {
// add user agent
Document doc = Jsoup.connect("https://www.nasdaq.com/symbol/aapl/financials?query=balance-sheet")
.userAgent("Mozilla/5.0").get();
Elements trs = doc.select("tr");
for (Element tr : trs) {
Elements tds = tr.select(".td_genTable");
// skip header rows: they have no .td_genTable cell, so first() would return null
if (tds.isEmpty()) continue;
// the value is in a sibling cell (see the HTML structure of the page)
Element td = tds.first().siblingElements().first();
System.out.println(td.text());
}
} catch (IOException e) {
e.printStackTrace();
}
}
I added the user agent option and fixed some errors in the queries. This should be a useful starting point ;)

SocketTimeoutException: Read timed out, how to fix it?

I have a Swing application that reads HTML pages using the following code:
String urlzip = null;
try {
Document doc = Jsoup.connect(url).get();
Elements links = doc.select("a[href]");
for (Element link : links) {
if (link.attr("abs:href").contains("BcfiHtm.zip")) {
urlzip = link.attr("abs:href").toString();
}
}
} catch (IOException e) {
textAreaStatus.append("Failed to get new file from internet:"+e.getMessage()+"\n");
e.printStackTrace();
}
return urlzip;
My Swing application then returns a string. It works fine and reads any HTML page that I give it. However, sometimes the application fails with SocketTimeoutException: Read timed out. How can I increase the timeout?
There's an example on this page.
Jsoup.connect("http://example.com").timeout(3000)
This error occurs when reading the response cannot complete in time, because the data is large or the connection is slow. I would suggest increasing the timeout, which is given in milliseconds, to at least one minute, like this:
Jsoup.connect("http://example.com").timeout(60000);
