Java parse data from html table with jsoup

I want to get the data from the table at this link:
https://www.nasdaq.com/symbol/aapl/financials?query=balance-sheet
I've tried the code below, but it doesn't work:
public static void main(String[] args) {
    try {
        Document doc = Jsoup.connect("https://www.nasdaq.com/symbol/aapl/financials?query=balance-sheet").get();
        Elements trs = doc.select("td_genTable");
        for (Element tr : trs) {
            Elements tds = tr.getElementsByTag("td");
            Element td = tds.first();
            System.out.println(td.text());
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Can anybody help me get this to work?
I'm not getting any output from the table. Nothing happens.

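After testing your code I got a read timeout. Searching on Google, I found a post that suggests adding a user agent to fix it, and that worked for me. So you can try this:
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) {
        try {
            // add a user agent so the request does not time out
            Document doc = Jsoup.connect("https://www.nasdaq.com/symbol/aapl/financials?query=balance-sheet")
                    .userAgent("Mozilla/5.0").get();
            Elements trs = doc.select("tr");
            for (Element tr : trs) {
                Elements tds = tr.select(".td_genTable");
                // skip rows (such as headers) without a .td_genTable cell,
                // which would otherwise cause a NullPointerException
                if (tds.isEmpty()) continue;
                // look at the siblings (see the HTML structure of the page)
                Element td = tds.first().siblingElements().first();
                if (td == null) continue; // no sibling cell in this row
                System.out.println(td.text());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
I added the user agent option and fixed some errors in the selectors: td_genTable is a class, so it needs a leading dot (.td_genTable), and the rows are selected with tr. This should be a useful starting point ;)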
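
The row-and-sibling logic in this answer can be tried offline without hitting the site. The following is only a sketch: it assumes jsoup is on the classpath, and the HTML string is made-up markup that mimics the shape the answer relies on (a label cell with class td_genTable followed by value cells), not the real NASDAQ page. The class name TableRowSketch and the method firstValues are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.util.ArrayList;
import java.util.List;

class TableRowSketch {
    // Made-up markup (an assumption): same shape as described in the answer,
    // but NOT copied from the real page.
    static final String HTML =
        "<table>"
      + "<tr><th>Item</th><th>Value 1</th><th>Value 2</th></tr>"
      + "<tr><td class=\"td_genTable\">Cash</td><td>100</td><td>200</td></tr>"
      + "<tr><td class=\"td_genTable\">Inventory</td><td>30</td><td>40</td></tr>"
      + "</table>";

    // For each row that has a .td_genTable cell, collect the text of that
    // cell's first sibling element.
    static List<String> firstValues(String html) {
        Document doc = Jsoup.parse(html);
        List<String> values = new ArrayList<>();
        for (Element tr : doc.select("tr")) {
            Elements labelled = tr.select(".td_genTable");
            if (labelled.isEmpty()) continue;          // header row: no labelled cell
            Element sibling = labelled.first().siblingElements().first();
            if (sibling != null) values.add(sibling.text());
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(firstValues(HTML)); // prints [100, 30]
    }
}
```

Parsing a string with Jsoup.parse keeps the selector logic separate from the network call, so the timeout and user agent issues can be debugged on their own.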

Related

Jsoup how to get text from a specific column of an html table

I'm trying to print out all the locations in the table on this Wikipedia page: https://en.wikipedia.org/wiki/COVID-19_pandemic, but the output is always blank. Is this a problem with my code, or am I searching for the wrong HTML classes?
try {
    Document doc = Jsoup.connect("https://en.wikipedia.org/wiki/COVID-19_pandemic").get();
    for (Element row : doc.select("table.wikitable.plainrowheaders.sortable.jquery-tablesorter tr")) {
        if (row.select("th:nth-of-type(2)").text().equals("")) {
            continue;
        } else {
            final String location = row.select("th:nth-of-type(2)").text();
            System.out.println(location);
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}

When I changed
doc.select("table.wikitable.plainrowheaders.sortable.jquery-tablesorter tr")
to
doc.select("table.wikitable tr")
I was able to get the country names.
Please try it.
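
A small offline sketch of why the shorter selector can succeed where the longer one fails. A chained class selector only matches an element that carries every listed class, and jsoup only sees the HTML the server sends. The assumption here (not stated in the answer) is that jquery-tablesorter is added in the browser by a script, so it is absent from the fetched markup. The table below is made-up markup, not the actual Wikipedia page, and the names SelectorClassSketch and countMatches are hypothetical. jsoup is assumed to be on the classpath.

```java
import org.jsoup.Jsoup;

class SelectorClassSketch {
    // Made-up markup standing in for the served HTML (an assumption):
    // the table has no "jquery-tablesorter" class.
    static final String HTML =
        "<table class=\"wikitable plainrowheaders sortable\">"
      + "<tr><th>Flag</th><th>Location A</th></tr>"
      + "<tr><th>Flag</th><th>Location B</th></tr>"
      + "</table>";

    // Number of elements in the parsed document that match the selector.
    static int countMatches(String html, String selector) {
        return Jsoup.parse(html).select(selector).size();
    }

    public static void main(String[] args) {
        // Requires all four classes on the table: no match here.
        System.out.println(countMatches(HTML,
            "table.wikitable.plainrowheaders.sortable.jquery-tablesorter tr")); // prints 0
        // Requires only "wikitable": both rows match.
        System.out.println(countMatches(HTML, "table.wikitable tr"));           // prints 2
    }
}
```

When a selector copied from the browser's inspector returns nothing, printing doc.html() and checking which classes are really present is a quick way to confirm this.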

Get Google Search Result with Java using Jsoup

First of all, I searched for this problem on Stack Overflow and Google. Unfortunately, I couldn't find a solution.
I am trying to get Google search results for a keyword. Here's my code:
public static void main(String[] args) throws Exception {
    Document doc;
    try {
        doc = Jsoup.connect("https://www.google.com/search?as_q=&as_epq=%22Yorkshire+Capital%22+&as_oq=fraud+OR+allegations+OR+scam&as_eq=&as_nlo=&as_nhi=&lr=lang_en&cr=countryCA&as_qdr=all&as_sitesearch=&as_occt=any&safe=images&tbs=&as_filetype=&as_rights=").userAgent("Mozilla").ignoreHttpErrors(true).timeout(0).get();
        Elements links = (Elements) doc.select("li[class=g]");
        for (Element link : links) {
            Elements titles = link.select("h3[class=r]");
            String title = titles.text();
            Elements bodies = link.select("span[class=st]");
            String body = bodies.text();
            System.out.println("Title: " + title);
            System.out.println("Body: " + body + "\n");
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
And here are the errors: https://prnt.sc/ro4ooi
It says: "Can only iterate over an array or an instance of java.lang.Iterable" (at links).
When I delete the (Elements) cast: https://prnt.sc/ro4pa9
Thank you.

parse html from a web page which uses infinite scroll

I would like to parse the HTML of a web page that uses infinite scroll, such as pinterest.com, so as to get all the items.
public List<String> popularTagsPinterest(String tag) throws Exception {
    List<String> results = new ArrayList<>();
    try {
        Document doc = Jsoup.connect(
                urlPinterest + tag + "&eq=%23" + tag + "&etslf=6622&term_meta[]=%23" + tag + "%7Cautocomplete%7C0")
                .timeout(90000).get();
        Elements img1 = doc.select("a.pinImageWrapper img.pinImg");
        for (Element e : img1) {
            results.add(e.attr("src"));
            System.out.println(e.attr("src"));
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return results;
}

Find the base URL and the AJAX call that the page makes to load each further batch of items; you can then request that call directly.
This page is a good example:
https://blog.scrapinghub.com/2016/06/22/scrapy-tips-from-the-pros-june-2016
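
A rough sketch of that idea in Java, assuming jsoup is on the classpath. Everything specific here is hypothetical: the class PagedScrapeSketch, the method collectImageSources, and the idea that the endpoint returns HTML fragments page by page (many such endpoints return JSON instead, which would need a JSON parser rather than jsoup). The page fetcher is passed in as a function so that the loop can be run against fake pages without any network access.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

class PagedScrapeSketch {
    // Requests page 1, 2, 3, ... through fetchPage and collects the src of
    // every img.pinImg, stopping at the first page with no images or after
    // maxPages pages.
    static List<String> collectImageSources(IntFunction<String> fetchPage, int maxPages) {
        List<String> results = new ArrayList<>();
        for (int page = 1; page <= maxPages; page++) {
            String html = fetchPage.apply(page);
            List<String> found = new ArrayList<>();
            for (Element img : Jsoup.parse(html).select("img.pinImg")) {
                found.add(img.attr("src"));
            }
            if (found.isEmpty()) break; // nothing more to load
            results.addAll(found);
        }
        return results;
    }

    public static void main(String[] args) {
        // Fake pages standing in for the real AJAX responses. A real fetcher
        // might look like (hypothetical URL and parameter):
        //   page -> Jsoup.connect(ajaxUrl + "?page=" + page)
        //               .ignoreContentType(true).execute().body()
        IntFunction<String> fakePages = page ->
            page <= 2 ? "<img class=\"pinImg\" src=\"/img/" + page + ".jpg\">" : "";
        System.out.println(collectImageSources(fakePages, 10)); // prints [/img/1.jpg, /img/2.jpg]
    }
}
```

The actual endpoint and its parameters have to be read from the browser's network tab while scrolling the page; they are not given in this thread.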

Unable to correctly return links for google search result

Hi, my code is unable to return the links listed in a Google search result, and I can't seem to find the solution. It seems to fetch only the Google homepage, even though I passed it the URL of a search result. See my code below:
public class jsoupScraper {
    public static void main (String args[]) {
        Document doc;
        try {
            doc = Jsoup.connect("https://www.google.com.ph/?gfe_rd=cr ei=yaXVV4CAGION2AT9q5OICQ#q=Silence+of+the+lambs")
                    .userAgent("Mozilla").ignoreHttpErrors(true).timeout(0).get();
            Elements links = doc.select("a");
            int count = 0;
            for (Element link : links) {
                if (count < 10) {
                    System.out.println(link.attr("href"));
                    count++;
                } else
                    break;
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Actually, I just copied this from this link: Using JSoup to scrape Google Results, and made only a minor modification, but it still doesn't output correctly. When I tried to run the original code, it just terminated with no result.
Here is a sample output:
https://www.google.com/webhp?tab=ww
https://www.google.com/imghp?hl=en&tab=wi
https://maps.google.com/maps?hl=en&tab=wl
https://play.google.com/?hl=en&tab=w8
https://www.youtube.com/?gl=PH&tab=w1
https://news.google.com/nwshp?hl=en&tab=wn
https://mail.google.com/mail/?tab=wm
https://drive.google.com/?tab=wo
https://www.google.com/intl/en/options/
https://www.google.com/calendar?tab=wc
Thank you in advance for the help. It seems to be the same case with HtmlUnit.
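
One thing worth checking in the URL above: the search terms sit after the # sign, which makes them part of the URL fragment. Fragments are handled by the browser and are not sent to the server in an HTTP request, which would be consistent with getting the homepage links back. The sketch below uses only the Java standard library (Java 10 or later for this URLEncoder overload). The class and method names are hypothetical, the example URL is a shortened version of the one in the question, and the /search?q= form is an assumption based on the search URL used in the other Google question on this page. Whether Google then serves parseable results to a non-browser client is a separate matter.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SearchUrlSketch {
    // Returns the part of the URL that an HTTP request actually carries:
    // everything except the fragment (the part after '#').
    static String withoutFragment(String url) {
        URI uri = URI.create(url);
        String query = uri.getRawQuery();
        return uri.getScheme() + "://" + uri.getRawAuthority() + uri.getRawPath()
                + (query != null ? "?" + query : "");
    }

    // Puts the search terms in the query string, where the server can see them.
    static String buildSearchUrl(String terms) {
        return "https://www.google.com/search?q="
                + URLEncoder.encode(terms, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The search terms are lost once the fragment is dropped.
        System.out.println(withoutFragment("https://www.google.com.ph/#q=Silence+of+the+lambs"));
        // prints https://www.google.com.ph/
        System.out.println(buildSearchUrl("Silence of the lambs"));
        // prints https://www.google.com/search?q=Silence+of+the+lambs
    }
}
```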

JSoup core web text extraction

I am new to JSoup, so sorry if my question is too trivial.
I am trying to extract article text from http://www.nytimes.com/, but when I print the parsed document
I am not able to see any articles in the output:
public class App
{
    public static void main( String[] args )
    {
        String url = "http://www.nytimes.com/";
        Document document;
        try {
            document = Jsoup.connect(url).get();
            System.out.println(document.html()); // Articles not getting printed
            //System.out.println(document.toString()); // Same here
            String title = document.title();
            System.out.println("title : " + title); // Title is fine
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
OK, I have also tried to parse "http://en.wikipedia.org/wiki/Big_data" to retrieve the wiki data. Same issue here as well: I am not getting the wiki data in the output.
Any help or hint will be much appreciated.
Thanks.

Here's how to get the text of all <p class="summary"> elements:
final String url = "http://www.nytimes.com/";
Document doc = Jsoup.connect(url).get();
for( Element element : doc.select("p.summary") )
{
    if( element.hasText() ) // Skip those tags without text
    {
        System.out.println(element.text());
    }
}
If you need all <p> tags, without any filtering, you can use doc.select("p") instead. But in most cases it's better to select only those you need (see the jsoup Selector documentation).
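
To compare the two selectors without depending on the live page, here is a small offline sketch. It assumes jsoup is on the classpath. The HTML string is made up for illustration and is not taken from nytimes.com, and the names SummarySelectorSketch and texts are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import java.util.ArrayList;
import java.util.List;

class SummarySelectorSketch {
    // Made-up markup (an assumption): two summary paragraphs, one of them
    // empty, plus an unrelated paragraph.
    static final String HTML =
        "<p class=\"summary\">First story summary.</p>"
      + "<p class=\"summary\"></p>"
      + "<p>Footer text.</p>";

    // Text of every element matching the selector, skipping empty ones.
    static List<String> texts(String html, String selector) {
        List<String> out = new ArrayList<>();
        for (Element element : Jsoup.parse(html).select(selector)) {
            if (element.hasText()) out.add(element.text());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(texts(HTML, "p.summary")); // prints [First story summary.]
        System.out.println(texts(HTML, "p"));         // prints [First story summary., Footer text.]
    }
}
```

The narrower selector leaves out the unrelated paragraph, which is the reason for selecting only the elements you need.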
