Get data from div class by JSOUP

I need to get the value "8.32" from "rnicper", "36 mg" from "rnstr", and "20/80 PG/VG" from "nirat":
<div class="recline highlight" id="rnic">
<div class="rlab"><span class="nopr indic indic-danger"></span>Nicotine juice <span id="rnstr">36 mg</span> (<span id="nirat">20/80 PG/VG</span>)</div>
<div class="runit" id="rnicml">2.08</div>
<div class="rdrops" id="rnicdr">73</div>
<div class="rgrams" id="rnicg" style="display: none;">2.53</div>
<div class="rpercent" id="rnicper">8.32</div><br>
</div>
I tried various methods, but none of them returns the values:
doc.getElementById("rnicper").outerHtml();
doc.getElementById("rnicper").text();
doc.select("div#rnicper");
doc.getElementsByAttributeValue("id", "rnicper").text();
Please tell me, how can I get this information using Jsoup?
Update for Chintak Patel
AsyncTask<Void, Void, String> asyncTask = new AsyncTask<Void, Void, String>() {
@Override
protected String doInBackground(Void... voids) {
try {
Document doc = Jsoup.connect("http://e-liquid-recipes.com/recipe/2254223/RY4D%20Vanilla%20Swirl%20DL").get();
String content = doc.select("div[id=rnicper]").text();
Log.d("content", content);
return content;
} catch (IOException e) {
// the download failed, so there is no document to select from
e.printStackTrace();
return null;
}
}
};
asyncTask.execute();

The values you are trying to get are not part of the initial HTML; they are set by JavaScript after the page is loaded.
Jsoup only fetches and parses the static HTML; it does not execute JavaScript.
To get what you want, you can use a tool like HtmlUnit or Selenium, which run the page's scripts.
HtmlUnit example:
try (final WebClient webClient = new WebClient()) {
webClient.getOptions().setThrowExceptionOnScriptError(false);
final HtmlPage page = webClient
.getPage("http://e-liquid-recipes.com/recipe/2254223/RY4D%20Vanilla%20Swirl%20DL");
System.out.println(page.getElementById("rnicper").asText());
}
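
The selectors in the question are not the problem. One way to check this is to run them against the question's snippet itself with Jsoup.parse: on that static string they return the expected values, so an empty result from Jsoup.connect(...).get() means the downloaded HTML simply does not contain them. A minimal sketch, assuming jsoup is on the classpath; the class name and the textById helper are hypothetical, and the snippet is presumably what the browser's inspector shows after the page's JavaScript has run:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RnicSnippetDemo {
    // Copy of the markup from the question (already filled in by the page's JavaScript).
    static final String HTML =
            "<div class=\"recline highlight\" id=\"rnic\">"
            + "<div class=\"rlab\"><span class=\"nopr indic indic-danger\"></span>Nicotine juice "
            + "<span id=\"rnstr\">36 mg</span> (<span id=\"nirat\">20/80 PG/VG</span>)</div>"
            + "<div class=\"runit\" id=\"rnicml\">2.08</div>"
            + "<div class=\"rdrops\" id=\"rnicdr\">73</div>"
            + "<div class=\"rgrams\" id=\"rnicg\" style=\"display: none;\">2.53</div>"
            + "<div class=\"rpercent\" id=\"rnicper\">8.32</div><br>"
            + "</div>";

    // Returns the text of the element with the given id, or null if there is no such element.
    static String textById(String html, String id) {
        Document doc = Jsoup.parse(html);
        Element el = doc.getElementById(id);
        return el == null ? null : el.text();
    }

    public static void main(String[] args) {
        System.out.println(textById(HTML, "rnicper")); // 8.32
        System.out.println(textById(HTML, "rnstr"));   // 36 mg
        System.out.println(textById(HTML, "nirat"));   // 20/80 PG/VG
    }
}
```

If the same calls return nothing for the live page, printing doc.html() from the real download shows what Jsoup actually received.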

Add the following class to your Activity and do the work there using Jsoup. This code gets the current version from the Play Store website; you can change the URL and pass div[id=rnicper] to the select() method instead, then handle the result in onPostExecute().
private class GetVersionCode extends AsyncTask<Void, String, String> {
@Override
protected String doInBackground(Void... voids) {
String newVersion = null;
try {
newVersion = Jsoup.connect("https://play.google.com/store/apps/details?id=" + MainActivity.this.getPackageName() + "&hl=en")
.timeout(30000)
.userAgent("Mozilla/5.0 (Windows; U; WindowsNT 5.1; en-US; rv1.8.1.6) Gecko/20070725 Firefox/2.0.0.6")
.referrer("http://www.google.com")
.get()
.select("div[itemprop=softwareVersion]")
.first()
.ownText();
return newVersion;
} catch (Exception e) {
return newVersion;
}
}
@Override
protected void onPostExecute(String onlineVersion) {
super.onPostExecute(onlineVersion);
if (onlineVersion != null && !onlineVersion.isEmpty()) {
if (Float.valueOf(currentVersion) < Float.valueOf(onlineVersion)) {
showAlertDialogForUpdate(currentVersion, onlineVersion);
}
}
Log.e("update", "Current version " + currentVersion + ", Play Store version " + onlineVersion);
}
}
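
One caveat with the onPostExecute above: Float.valueOf treats version names as decimal numbers, so "1.10" is read as 1.1 and compares as older than "1.9", and a three-part name such as "1.2.3" throws NumberFormatException. A minimal sketch of a component-wise comparison instead, assuming purely numeric, dot-separated version names; compareVersions is a hypothetical helper, not part of jsoup or Android:

```java
public class VersionCompare {
    // Compares dot-separated numeric versions component by component.
    // Returns a negative number if a < b, zero if equal, a positive number if a > b.
    // Assumes every component is a non-negative integer (e.g. "1.10.2").
    static int compareVersions(String a, String b) {
        String[] pa = a.trim().split("\\.");
        String[] pb = b.trim().split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            // Missing components count as 0, so "1.2" equals "1.2.0".
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(compareVersions("1.9", "1.10") < 0);            // true
        System.out.println(Float.valueOf("1.9") < Float.valueOf("1.10"));  // false
    }
}
```

In onPostExecute you would then test compareVersions(currentVersion, onlineVersion) < 0 instead of the Float comparison.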

Related

Possible to style jsoup output with CSS?

I successfully retrieved specific text from a website with Jsoup. But is it possible to style the text with CSS? Below you find my code for retrieving text from a website.
public class connect extends AsyncTask<Void, Void, Void> {
String string;
@Override
protected Void doInBackground(Void... voids) {
try {
Document document = Jsoup.connect("MY_URL").get();
Elements elements = document.select("div.MY_DIV_CLASS");
string = elements.text();
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
@Override
protected void onPostExecute(Void aVoid) {
super.onPostExecute(aVoid);
webView.loadData(string, "text/html", "UTF-8");
}
}
Thank you in advance.

Once you have selected an element or elements, you can style them by adding a new class:
elements.addClass("your-class");
or by adding your own style attribute:
elements.attr("style", "text-align: center; color: red;");
These changes are made to the Document object, so to use the updated HTML you will probably want the output of document.html(). Note that elements.text() returns only the text, without any tags, classes, or style attributes.
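
Putting those pieces together: a minimal sketch, assuming jsoup is on the classpath. "div.MY_DIV_CLASS" and "your-class" are the placeholders used above, and the appended style rule is only an illustrative example. The returned string is what you would pass to webView.loadData(...) instead of elements.text():

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class StyleDemo {
    // Adds a class and an inline style to the selected elements, adds a CSS rule
    // for that class in <head>, and returns the whole modified document as HTML.
    static String styled(String html) {
        Document document = Jsoup.parse(html);
        Elements elements = document.select("div.MY_DIV_CLASS");
        elements.addClass("your-class");
        elements.attr("style", "text-align: center; color: red;");
        // Illustrative stylesheet rule for the new class.
        document.head().appendElement("style").text(".your-class { font-weight: bold; }");
        return document.html();
    }

    public static void main(String[] args) {
        System.out.println(styled("<div class=\"MY_DIV_CLASS\">Hello</div>"));
    }
}
```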

How to Load Entire Contents of HTML - Jsoup

I was trying to download HTML table rows using Jsoup, but it parses only part of the HTML content. I also tried the code below to load the full HTML content, but it doesn't work. Any suggestions would be appreciated.
public class AmfiDaily {
public static void main(String[] args) {
AmfiDaily amfiDaily = new AmfiDaily();
amfiDaily.extractAmfiTable("https://www.amfiindia.com/intermediary/other-data/transaction-in-debt-and-money-market-securities");
}
public void extractAmfiTable(String url) {
// try-with-resources closes the writer even if an exception is thrown
try (FileWriter writer = new FileWriter("D:\\FTRACK\\Amfi Report " + java.time.LocalDate.now() + ".csv")) {
Document document = Jsoup.connect(url)
.userAgent("Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0")
.maxBodySize(0)
.timeout(100000*5)
.get();
Elements rows = document.select("tr");
for (Element row : rows) {
Elements cells1 = row.select("td");
for (Element cell : cells1) {
if (cell.text().contains(",")) {
// quote values containing commas so they stay in a single CSV column
writer.write("\"" + cell.text().replace("\"", "\"\"") + "\",");
} else {
writer.write(cell.text().concat(","));
}
}
writer.write("\n");
}
} catch (IOException e) {
e.printStackTrace();
}
}
}

Disable JavaScript to see exactly what Jsoup sees. Part of the page is loaded with AJAX so Jsoup is not able to reach it. But there's an easy way to check where the additional data comes from.
You can use your browser's debugger to check the Network tab and take a look at the requests and responses.
You can see that the table is downloaded from this URL:
https://www.amfiindia.com/modules/LoadModules/MoneyMarketSecurities
You can use this URL directly to get the data you need.
Alternatively, to overcome Jsoup's limitation and load the whole rendered HTML at once, you can use Selenium WebDriver; there is an example here: https://stackoverflow.com/a/54510107/9889778
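
A minimal sketch of the "use the URL directly" approach, assuming jsoup is on the classpath and that the endpoint still returns the table as HTML (not verified here; check the actual response in the Network tab). It also quotes cells that contain commas, since a value such as "1,000" would otherwise be split across two CSV columns. AmfiTableCsv, toCsv, csvField and the output file name are hypothetical:

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.StringJoiner;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AmfiTableCsv {
    // Endpoint found in the browser's Network tab (assumed to return an HTML table).
    static final String DATA_URL = "https://www.amfiindia.com/modules/LoadModules/MoneyMarketSecurities";

    // Quotes a field if it contains a comma, quote or newline; inner quotes are doubled.
    static String csvField(String s) {
        if (s.contains(",") || s.contains("\"") || s.contains("\n")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }

    // Turns every <tr> in the document into one CSV line (header and data cells).
    static String toCsv(Document doc) {
        StringBuilder sb = new StringBuilder();
        for (Element row : doc.select("tr")) {
            StringJoiner line = new StringJoiner(",");
            for (Element cell : row.select("td, th")) {
                line.add(csvField(cell.text()));
            }
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Document doc = Jsoup.connect(DATA_URL)
                .userAgent("Mozilla/5.0")
                .maxBodySize(0)   // do not truncate large responses
                .timeout(60_000)
                .get();
        try (Writer w = Files.newBufferedWriter(Paths.get("amfi.csv"), StandardCharsets.UTF_8)) {
            w.write(toCsv(doc));
        }
    }
}
```

If the response turns out to be bare tr rows without an enclosing table element, the HTML parser discards those row and cell tags, so you would need to wrap the response body in a table element and parse it yourself before selecting.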

Jsoup is not working with a webpage

I am trying to get the URLs of some images from a webpage, but I'm having problems. I'm using try.jsoup.org to parse the HTML with the CSS query img, and I get this result:
<img src="https://d5nxst8fruw4z.cloudfront.net/atrk.gif?account=JwbPi1a4ZP00iy" style="display:none" height="1" width="1" alt="" />
<img src="http://ads.tamtay.vn/www/delivery/avw.php?zoneid=226&cb=INSET_RANDOM_NUMBER_HERE&n=aa2b62d0" border="0" alt="" />
<img src="http://a0.ttimg.vn/866392.ava" style="width: 100%;" />
I know getting these URLs is usually easy with attr("abs:src"), but in this case it doesn't work and returns null.
I tried switching to a different webpage and it works normally, so I think the problem comes from the webpage, not the code. Can anyone help?

Why did you put "abs"? Try with just "src".
Jsoup documentation

Here is the code:
private class Title extends AsyncTask<Void, Void, Void> {
@Override
protected void onPreExecute() {
super.onPreExecute();
}
@Override
protected Void doInBackground(Void... params) {
try {
// Connect to the web site
Document document = Jsoup.connect("http://photo.tamtay.vn").get();
Element image = document.select("img").first();
Log.d("Image", image.attr("abs:src"));
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
@Override
protected void onPostExecute(Void result) {
}
}
image.attr("abs:src") returns null.
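
For reference, this is how jsoup resolves "abs:src": a relative URL is resolved against the document's base URI (for Jsoup.connect(url).get() that is the URL that was fetched), and when the URL cannot be made absolute the result is an empty string rather than null; absUrl("src") behaves the same as attr("abs:src"). A minimal sketch, assuming jsoup is on the classpath; the class, the helper and the /images/a.jpg path are hypothetical:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AbsSrcDemo {
    // Returns the absolute src of the first <img>, or null if the page has no <img> at all.
    static String firstImageAbsSrc(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Element img = doc.select("img").first();
        if (img == null) {
            return null; // avoids a NullPointerException when there is no <img>
        }
        // absUrl returns "" (not null) when the URL cannot be made absolute.
        return img.absUrl("src");
    }

    public static void main(String[] args) {
        String html = "<img src='/images/a.jpg'>";
        System.out.println(firstImageAbsSrc(html, "http://photo.tamtay.vn/")); // http://photo.tamtay.vn/images/a.jpg
        System.out.println("[" + firstImageAbsSrc(html, "") + "]");           // []
    }
}
```

Logging doc.html() from the real download is a quick way to see whether the page served to the app contains the img elements you saw in try.jsoup.org at all.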

Parsing table elements with Jsoup

I'm trying to parse data from this table. Let's say, for example, that I want to parse the second element of the second row (the one that says SLO).
I can see there is a TR inside a TR, and the word SLO doesn't even have an ID or anything. How can I parse this?
This is the code:
class Title extends AsyncTask<Void, Void, Void> {
@Override
protected void onPreExecute() {
super.onPreExecute();
tw1.setText("Loading...");
}
@Override
protected Void doInBackground(Void... params) {
try {
Document doc = Jsoup.connect("https://www.easistent.com/urniki/cc45c5d0d303f954588402a186f5cdba5edb51d6/razredi/16515").get();
Elements eles = doc.select("");
title = eles.toString();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return null;
}
@Override
protected void onPostExecute(Void result) {
super.onPostExecute(result);
tw1.setText(title);
}
}
I don't know what to put in doc.select("") because I've never parsed anything like this; I've only parsed webpage titles and such. Could someone help me with this?

There is plenty of information there for you to use, for example class names or title attributes. The URL you provided doesn't work for me, and I can't copy and paste the HTML from your image, so my example shows just how to parse the span based on its title:
String html = "<span title='Slovenscina'>SLO</span>";
Document doc = Jsoup.parse(html);
Elements eles = doc.select("span[title=Slovenscina]");
String title = eles.text();
System.out.println(title);
Will output:
SLO
This will also work within the rest of the HTML that you provided. I suggest you read more about Jsoup's selector syntax.
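
If you would rather pick the cell by position than by attribute, a minimal sketch, assuming jsoup is on the classpath; cellText is a hypothetical helper and the sample table is made up, not the real timetable markup:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class TableCellDemo {
    // Returns the text of the cell at (row, col), both 0-based, or null if an index is out of range.
    // doc.select("tr") lists rows in document order, including rows of nested tables.
    // Cells are the direct children of the chosen <tr>, so cells of nested tables are not counted.
    static String cellText(String html, int row, int col) {
        Document doc = Jsoup.parse(html);
        Elements rows = doc.select("tr");
        if (row < 0 || row >= rows.size()) {
            return null;
        }
        Elements cells = rows.get(row).children();
        if (col < 0 || col >= cells.size()) {
            return null;
        }
        return cells.get(col).text();
    }

    public static void main(String[] args) {
        String html = "<table>"
                + "<tr><td>1.</td><td>MAT</td></tr>"
                + "<tr><td>2.</td><td><span title='Slovenscina'>SLO</span></td></tr>"
                + "</table>";
        System.out.println(cellText(html, 1, 1)); // SLO
    }
}
```

Positions are fragile if the page layout changes, so a selector on a class name or title attribute, as in the answer above, is usually the sturdier choice.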

Need to parse image src from HTML page then display it

I'm currently trying to develop an app that visits the following site (http://lulpix.com), parses the HTML, and gets the img src from the following section:
<div class="pic rounded-8" style="overflow:hidden;"><div style="margin:0 0 36px 0;overflow:hidden;border:none;height:474px;"><img src="http://lulpix.com/images/2012/April/13/4f883cdde3591.jpg" alt="All clogged up" title="All clogged up" width="319"/></div></div>
It's of course different every time the page is loaded, so I cannot give a direct URL to an asynchronous gallery of images, which is what I intend to do. For instance:
Load page > parse img src > download asynchronously to ImageView > reload lulpix.com > start again
Then place each of these in an image view from which the user can swipe left and right to browse.
So the TL;DR of this is: how can I parse the HTML to retrieve the URL, and has anyone got any experience with libraries for displaying images?
Thank you very much.

Here's an AsyncTask that connects to lulpix and fakes a referrer and user agent (lulpix apparently tries to block scraping with some pretty lame checks). Start it like this from your Activity:
new ForTheLulz().execute();
The resulting Bitmap is downloaded in a pretty lame way (no caching, and no check whether the image has already been downloaded), and error handling is pretty much non-existent, but the basic concept should be OK.
class ForTheLulz extends AsyncTask<Void, Void, Bitmap> {
@Override
protected Bitmap doInBackground(Void... args) {
Bitmap result = null;
try {
Document doc = Jsoup.connect("http://lulpix.com")
.referrer("http://www.google.com")
.userAgent("Mozilla/5.0 (Windows; U; WindowsNT 5.1; en-US; rv1.8.1.6) Gecko/20070725 Firefox/2.0.0.6")
.get();
//parse("http://lulpix.com");
if (doc != null) {
Elements elems = doc.getElementsByAttributeValue("class", "pic rounded-8");
if (elems != null && !elems.isEmpty()) {
Element elem = elems.first();
elems = elem.getElementsByTag("img");
if (elems != null && !elems.isEmpty()) {
elem = elems.first();
String src = elem.attr("src");
if (src != null) {
URL url = new URL(src);
// Just assuming that "src" isn't a relative URL is probably stupid.
InputStream is = url.openStream();
try {
result = BitmapFactory.decodeStream(is);
} finally {
is.close();
}
}
}
}
}
} catch (IOException e) {
// Error handling goes here
}
return result;
}
@Override
protected void onPostExecute(Bitmap result) {
ImageView lulz = (ImageView) findViewById(R.id.lulpix);
if (result != null) {
lulz.setImageBitmap(result);
} else {
//Your fallback drawable resource goes here
//lulz.setImageResource(R.drawable.nolulzwherehad);
}
}
}

I recently used Jsoup to parse invalid HTML, and it works well! Do something like this:
Document doc = Jsoup.parse(str);
Element img = doc.body().select("div[class=pic rounded-8] img").first();
String src = img.attr("src");
Play with the selector string to get it right, but I think the above will work. It first selects the outer div based on the value of its class attribute, and then any descendant img element.
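
A self-contained variant of the same idea, assuming jsoup is on the classpath: the class selector div.pic.rounded-8 matches the div whichever order its classes are listed in (an attribute selector like [class=pic rounded-8] needs the exact attribute value), and absUrl("src") also copes with a relative src. The helper name is hypothetical, and lulpix's current markup has not been checked:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LulpixSrcDemo {
    // Returns the absolute src of the first <img> inside a div with classes "pic" and "rounded-8",
    // or null if no such image exists.
    static String firstPicSrc(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Element img = doc.select("div.pic.rounded-8 img").first();
        return img == null ? null : img.absUrl("src");
    }

    public static void main(String[] args) {
        String html = "<div class=\"pic rounded-8\" style=\"overflow:hidden;\">"
                + "<div style=\"margin:0 0 36px 0;overflow:hidden;border:none;height:474px;\">"
                + "<img src=\"http://lulpix.com/images/2012/April/13/4f883cdde3591.jpg\" "
                + "alt=\"All clogged up\" title=\"All clogged up\" width=\"319\"/></div></div>";
        System.out.println(firstPicSrc(html, "http://lulpix.com/"));
    }
}
```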

No need to use a WebView; check this sample project:
https://github.com/meetmehdi/HTMLImageParser.git
In this sample project I parse the HTML and the img tag, then extract the image URL; the image is downloaded and displayed.
