Long compile and install time for fetching HTML - java

It's taking too long to compile and run the code (5+ minutes, and only for this app).
Also, when it finally finishes, the complete HTML is not displayed in logcat, only part of it.
Can you please point out what's wrong with the code?
Is it because the InputStream is read character by character (the HTML is huge)?
public class MainActivity extends AppCompatActivity {
    public class DownloadTask extends AsyncTask<String, Void, String> {
        @Override
        protected String doInBackground(String... urls) {
            String result = "";
            URL url;
            HttpURLConnection urlConnection = null;
            try {
                url = new URL(urls[0]);
                urlConnection = (HttpURLConnection) url.openConnection();
                InputStream in = urlConnection.getInputStream();
                InputStreamReader reader = new InputStreamReader(in);
                int data = reader.read();
                while (data != -1) {
                    char current = (char) data;
                    result += current;
                    data = reader.read();
                }
                return result;
            } catch (Exception e) {
                e.printStackTrace();
                return "Failed";
            }
        }
    }
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        DownloadTask task = new DownloadTask();
        String result = null;
        try {
            result = task.execute("http://www.amazon.com").get();
        } catch (Exception e) {
            e.printStackTrace();
        }
        Log.i("Result",result);
    }
}

Yes. The more calls you make like that, the worse your performance. You should be reading multiple kilobytes at a time, not single characters. If you need to loop over the data one character at a time, do that afterwards.
Also, use a StringBuilder! Using + on a String in a loop is highly inefficient: for every character you create a new String object and copy everything read so far. StringBuilder avoids that.
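
A minimal sketch of both suggestions combined, reading in chunks and appending to a StringBuilder. The class name BufferedReadSketch, the helper readAll and the 8 KB buffer size are illustrative assumptions, and a StringReader stands in for the InputStreamReader from the question so the example runs without a network connection:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class BufferedReadSketch {
    // Reads everything from the reader in chunks instead of one char at a time.
    public static String readAll(Reader reader) throws IOException {
        StringBuilder result = new StringBuilder();
        char[] buffer = new char[8192]; // assumed chunk size, not a tuned value
        int count;
        // read(char[]) returns the number of chars read, or -1 at end of stream.
        while ((count = reader.read(buffer)) != -1) {
            result.append(buffer, 0, count);
        }
        return result.toString();
    }

    public static void main(String[] args) throws IOException {
        // The StringReader is a stand-in for new InputStreamReader(in).
        String html = readAll(new StringReader("<html><body>Hello</body></html>"));
        System.out.println(html);
    }
}
```

In doInBackground the same loop would replace the read()/+= loop, with the InputStreamReader passed in as the Reader.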

Related

Unable to get full HTML of a page, it stops at a certain point

I am trying to download the HTML of a page. After it downloads, I try to log it. Everything goes smoothly, but the HTML stops at a certain point every time, even though there is a lot more HTML to show.
I tried a different page, my own page which just has some instructions for my company, and it worked perfectly. Is there a limit, maybe? I tried it with and without urlConnection.connect(), and there is no difference.
public class MainActivity extends AppCompatActivity {
    public class DownloadHTML extends AsyncTask<String, Void, String> {
        @Override
        protected String doInBackground(String... urls) {
            URL url;
            String result = "";
            HttpURLConnection urlConnection = null;
            try {
                url = new URL(urls[0]);
                urlConnection = (HttpURLConnection) url.openConnection();
                InputStream in = urlConnection.getInputStream();
                InputStreamReader reader = new InputStreamReader(in);
                int data = reader.read();
                while (data != -1) {
                    char current = (char) data;
                    result += current;
                    data = reader.read();
                }
                return result;
            } catch (Exception e) {
                e.printStackTrace();
                return "Fail";
            }
        }
    }
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        String Result = "";
        DownloadHTML task = new DownloadHTML();
        try {
            Result = task.execute("http://www.posh24.se/kandisar").get();
        } catch (Exception e) {
            e.printStackTrace();
        }
        Log.i("URL", Result);
    }
}
Here is the splitting, and it won't work.
        try {
            Result = task.execute("http://www.posh24.se/kandisar").get();
            String[] splitStrings = Result.split("<div class=\"channelListEntry\">");
            Pattern p = Pattern.compile("<img src=\"(.*?)\"");
            Matcher m = p.matcher(splitStrings[0]);
            while (m.find()) {
                CelebUrls.add(m.group(1));
            }
            p = Pattern.compile("alt=\"(.*?)\"");
            m = p.matcher(splitStrings[0]);
            while (m.find()) {
                CelebNames.add(m.group(1));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        Log.i("URL", Arrays.toString(CelebUrls.toArray()));
    }
}
Modifying your method like this will give you the content of the HTML page decoded as UTF-8.
(In this case it's UTF-8 because the page is encoded that way; if in doubt, you can pass Charset.forName("utf-8") as the second parameter to the InputStreamReader constructor.)
When testing your example implementation I only got output with various unreadable characters.
Ignore the class and method changes; I only made them to have a standalone example.
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ParsingTest {
    static String doInBackground(String address) {
        URL url;
        StringBuilder result = new StringBuilder(1000);
        HttpURLConnection urlConnection = null;
        try {
            url = new URL(address);
            urlConnection = (HttpURLConnection) url.openConnection();
            InputStream in = urlConnection.getInputStream();
            BufferedReader reader = new BufferedReader(new InputStreamReader(in));
            // Read whole lines instead of single characters.
            String line = reader.readLine();
            while (line != null) {
                result.append(line);
                result.append("\n");
                line = reader.readLine();
            }
            return result.toString();
        } catch (Exception e) {
            e.printStackTrace();
            return "Fail";
        }
    }
    public static void main(String[] args) {
        String result = doInBackground("http://www.posh24.se/kandisar");
        System.out.println(result);
    }
}
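
To make the charset remark concrete, here is a small sketch that decodes the same bytes twice, once with the matching charset and once with a wrong one. The class name CharsetSketch, the helper readFirstLine and the sample word are assumptions for illustration; a ByteArrayInputStream stands in for the network stream:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetSketch {
    // Decodes the stream with an explicit charset and returns its first line.
    public static String readFirstLine(InputStream in, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // "k\u00e4ndisar" contains one non-ASCII letter, encoded here as UTF-8 bytes.
        byte[] utf8Bytes = "k\u00e4ndisar".getBytes(StandardCharsets.UTF_8);
        String right = readFirstLine(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8);
        String wrong = readFirstLine(new ByteArrayInputStream(utf8Bytes), StandardCharsets.ISO_8859_1);
        // The wrong charset turns the two-byte letter into two separate characters.
        System.out.println(right.length() + " vs " + wrong.length());
    }
}
```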
If the only part that interests you is the images of the top 100, you can just adjust the while loop to:
String line = reader.readLine();
while (line != null) {
    if (line.contains("<div class=\"channelListEntry\">")) {
        reader.readLine();
        reader.readLine();
        line = reader.readLine().trim();
        // At this point it's probably easier to use a List<String> for the result instead
        result.append(line);
        result.append("\n");
    }
    line = reader.readLine();
}
This is only a simplified example based on the current design of the page, where the img comes 3 lines after the declaration of the div.
If you want, you can also extract the URL of the image and the alt description directly at this point. Instead of using a complicated regex you could rely on String#indexOf.
private static final String SRC = "src=\"";
private static final String ALT = "\" alt=\"";
private static final String END = "\"/>";
public static void extract(String image) {
    int index1 = image.indexOf(SRC);
    int index2 = image.indexOf(ALT);
    int index3 = image.indexOf(END);
    System.out.println(image);
    System.out.println(image.substring(index1 + SRC.length(), index2));
    System.out.println(image.substring(index2 + ALT.length(), index3));
}
Note that if you directly process the content from the page your app does not require the memory to store the full page.
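
As a sketch of how the indexOf approach could be made a little more defensive, the variant below returns the two values instead of printing them and gives up when a marker is missing. The class name ImgExtractSketch, the null return convention and the sample img line are assumptions; the real markup on the page may differ:

```java
public class ImgExtractSketch {
    private static final String SRC = "src=\"";
    private static final String ALT = "\" alt=\"";
    private static final String END = "\"/>";

    // Returns {url, altText}, or null when the line does not have the expected layout.
    public static String[] extract(String image) {
        int srcStart = image.indexOf(SRC);
        int altStart = image.indexOf(ALT);
        int end = image.indexOf(END);
        // indexOf returns -1 for a missing marker; also require the markers in order.
        if (srcStart < 0 || altStart < srcStart + SRC.length() || end < altStart + ALT.length()) {
            return null;
        }
        String url = image.substring(srcStart + SRC.length(), altStart);
        String alt = image.substring(altStart + ALT.length(), end);
        return new String[] { url, alt };
    }

    public static void main(String[] args) {
        // Made-up sample line in the shape the answer describes.
        String[] parts = extract("<img src=\"http://example.com/a.jpg\" alt=\"Some Name\"/>");
        System.out.println(parts[0]);
        System.out.println(parts[1]);
    }
}
```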

Downloading the Source code of a Webpage

I am trying to download the source code of a web page.
The problem is that the whole code does not show up; only a small part is downloaded every time.
public class MainActivity extends AppCompatActivity {
    public class DownloadTask extends AsyncTask<String, Void, String>
    {
        @Override
        protected String doInBackground(String... params) {
            String content = "";
            URL url;
            HttpURLConnection conn = null;
            try {
                url = new URL(params[0]);
                conn = (HttpURLConnection) url.openConnection();
                InputStream is = conn.getInputStream();
                InputStreamReader isr = new InputStreamReader(is);
                int data = isr.read();
                while (data != -1)
                {
                    char c = (char) data;
                    content += c;
                    data = isr.read();
                }
                Log.i("The Code is ", content);
            }
            catch (Exception e)
            {
                e.getStackTrace();
            }
            return content;
        }
    }
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        String result = " ";
        DownloadTask DT = new DownloadTask();
        try {
            result = DT.execute("https://www.google.co.in").get();
        }
        catch (Exception e)
        {
            e.getStackTrace();
        }
        Log.i("The Code is ", result);
    }
}
It's important to close the InputStreamReader. It might not be the cause of the problem, but it's good practice.
while (data != -1)
{
    char c = (char) data;
    content += c;
    data = isr.read();
}
isr.close();
is.close();
I think your first page downloads fine, but when you load pages again and again you might run into problems. As I said, this might not be a fix, but it's important. Hope this helps someone.
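
One way to make sure the reader is closed even when read() throws is try-with-resources. The sketch below is an assumption about how that could look, not the answer's code: the class CloseSketch, the helper readAndClose and the TrackingStream test stream are made-up names, and the stream is an in-memory stand-in for conn.getInputStream():

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class CloseSketch {
    // Hypothetical in-memory stream that remembers whether close() was called.
    public static class TrackingStream extends ByteArrayInputStream {
        public boolean closed = false;

        public TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    public static String readAndClose(InputStream is) throws IOException {
        StringBuilder content = new StringBuilder();
        // try-with-resources closes the reader, and with it the wrapped stream,
        // when the block exits, whether normally or through an exception.
        try (Reader isr = new InputStreamReader(is, StandardCharsets.UTF_8)) {
            int data;
            while ((data = isr.read()) != -1) {
                content.append((char) data);
            }
        }
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        TrackingStream is = new TrackingStream("<html></html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAndClose(is));
        System.out.println("closed: " + is.closed);
    }
}
```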

Android Parsing a website with AsyncTask, urlConnection and InputStreamReader takes too long

I am trying to parse the following page with AsyncTask, URLConnection and InputStreamReader:
public class DownloadTask extends AsyncTask<String, Void, String> {
    URL url;
    URLConnection urlConnection;
    String result = null;
    @Override
    protected String doInBackground(String... urls) {
        try {
            url = new URL(urls[0]);
            urlConnection = (URLConnection) url.openConnection();
            InputStream in = urlConnection.getInputStream();
            InputStreamReader reader = new InputStreamReader(in);
            int data = reader.read();
            while (data != -1) {
                char current = (char) data;
                result += current;
                data = reader.read();
            }
            return result;
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }
}
And I am using it this way:
DownloadTask downloadTask = new DownloadTask();
String data = null;
try {
    data = downloadTask.execute("http://www.imdb.com/movies-in-theaters/").get();
} catch (InterruptedException e) {
    e.printStackTrace();
} catch (ExecutionException e) {
    e.printStackTrace();
}
The problem is that DownloadTask takes more than 3 minutes to finish. After that time it eventually works on the emulator, but not on a real device.
I know that parsing a web page is not a good way to do this, but I am doing it for educational purposes.
Any advice on how I can speed this up?
Thanks!

Skipped 104 frames! The application may be doing too much work on its main thread

I searched Stack Overflow for answers but did not find much. I am doing this for practice, as a kind of Hello World for working with JSON: I am getting a JSON response from the OpenWeather API.
I type the name of a city into an EditText and press a button to search for it and print the JSON string to the logs.
public class MainActivity extends AppCompatActivity {
    EditText city;
    public void getData(View view) {
        String result;
        String cityName = city.getText().toString();
        getWeather weather = new getWeather();
        try {
            result = weather.execute(cityName).get();
            System.out.println(result);
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch (ExecutionException e) {
            e.printStackTrace();
        }
    }
    public class getWeather extends AsyncTask<String, Void, String> {
        @Override
        protected String doInBackground(String... urls) {
            URL url;
            HttpURLConnection connection = null;
            String result = "";
            try {
                String finalString = urls[0];
                finalString = finalString.replace(" ", "%20");
                String fullString = "http://api.openweathermap.org/data/2.5/forecast?q=" + finalString + "&appid=a18dc34257af3b9ce5b2347bb187f0fd";
                url = new URL(fullString);
                connection = (HttpURLConnection) url.openConnection();
                InputStream in = connection.getInputStream();
                InputStreamReader reader = new InputStreamReader(in);
                int data = reader.read();
                while (data != -1) {
                    char current = (char) data;
                    result += current;
                    data = reader.read();
                }
                return result;
            } catch (MalformedURLException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
            return null;
        }
    }
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        city = (EditText) findViewById(R.id.editText);
    }
}
What can I do to not get that message?
weather.execute(cityName).get()
When you call get(), you wait for the AsyncTask to finish. Since you call it on the UI thread, that thread is blocked for the whole duration of the heavy operation.
From the documentation of get():
Waits if necessary for the computation to complete, and then retrieves its result.
Remove get() and handle the result in onPostExecute() instead.
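
The difference between blocking on get() and receiving the result in a callback can be sketched in plain Java, outside Android. Here CompletableFuture stands in for AsyncTask, the callback plays the role of onPostExecute, and fetch is a hypothetical placeholder for the network call; none of these names come from the question:

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

public class CallbackSketch {
    // Hypothetical stand-in for the slow work done in doInBackground.
    static String fetch(String city) {
        return "{\"city\":\"" + city + "\"}";
    }

    // Starts the work on a background thread and returns immediately.
    // The callback receives the result once it is ready.
    public static CompletableFuture<Void> fetchAsync(String city, Consumer<String> onResult) {
        return CompletableFuture.supplyAsync(() -> fetch(city)).thenAccept(onResult);
    }

    public static void main(String[] args) {
        CompletableFuture<Void> done =
                fetchAsync("London", result -> System.out.println("Got: " + result));
        System.out.println("Caller continues without waiting");
        // Only this demo waits here, so the JVM does not exit before the callback runs.
        done.join();
    }
}
```

In the Android code from the question, the corresponding change would be to override onPostExecute(String) in getWeather and print or display the result there, since AsyncTask delivers that call on the UI thread.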

Can't download HTML in Android Studio

When I try to download HTML using this method:
public class DownloadHtml extends AsyncTask<String, Void, String> {
    @Override
    protected String doInBackground(String... urls) {
        String result = "";
        URL url;
        HttpURLConnection connection = null;
        try {
            url = new URL(urls[0]);
            connection = (HttpURLConnection) url.openConnection();
            InputStream inputStream = connection.getInputStream();
            InputStreamReader reader = new InputStreamReader(inputStream);
            int data = reader.read();
            while (data != -1) {
                char currentChar = (char) data;
                result += currentChar;
                data = reader.read();
            }
            return result;
        } catch (Exception e) {
            e.printStackTrace();
            return "Failed";
        }
    }
}
and log the result:
DownloadHtml downloadHtml = new DownloadHtml();
String result = null;
try {
    result = downloadHtml.execute("http://stackoverflow.com").get();
} catch (Exception e) {
    e.printStackTrace();
}
Log.i("Html", result);
I am getting only a small part of it.
Is there a way to get the whole HTML of a web page?
The solution was simple. It looks like Log.i doesn't print everything in one go.
When I tried to extract all the links from the HTML, they were printed successfully.
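
If the full text is still wanted in the log, one possible workaround is to split it into pieces and log each piece separately. The sketch below assumes the commonly cited logcat limit of roughly 4,000 characters per entry and picks 3,000 as a conservative chunk size; the class LogChunkSketch and the helper chunks are made-up names, and System.out.println stands in for Log.i so the example runs outside Android:

```java
import java.util.ArrayList;
import java.util.List;

public class LogChunkSketch {
    // Assumed chunk size, chosen to stay below the per-entry limit of logcat.
    static final int CHUNK_SIZE = 3000;

    // Splits text into consecutive pieces of at most size characters.
    public static List<String> chunks(String text, int size) {
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < text.length(); start += size) {
            parts.add(text.substring(start, Math.min(text.length(), start + size)));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Build a long dummy string in place of the downloaded HTML.
        StringBuilder html = new StringBuilder();
        for (int i = 0; i < 7000; i++) {
            html.append('x');
        }
        for (String part : chunks(html.toString(), CHUNK_SIZE)) {
            // On Android this line would be Log.i("Html", part).
            System.out.println(part.length());
        }
    }
}
```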
