extract specific part of html code - java

I am writing my first Android app and I need to fetch the source of an HTML page.
Currently I am doing this:
private class NetworkOperation extends AsyncTask<Void, Void, String > {
protected String doInBackground(Void... params) {
try {
URL oracle = new URL("http://www.nationalleague.ch/NL/fr/");
URLConnection yc = oracle.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
String inputLine;
String s1 = "";
while ((inputLine = in.readLine()) != null)
s1 = s1 + inputLine;
in.close();
//return
return s1;
}
catch (IOException e) {
e.printStackTrace();
}
return null;
}
}
The problem is that this takes too long. How can I fetch only part of the HTML, for example lines 200 to 300?
Sorry for my bad English.

Ideally, use read(char[] cbuf, int off, int len) instead of readLine(). Another, cruder way is to count the lines and act only on the ones you need:
int i = 0;
while ((inputLine = in.readLine()) != null) {
    i++;
    if (i >= 200 && i <= 300) {
        // DO SOMETHING with inputLine
    }
}
in.close();

You get the HTML document through HTTP. HTTP usually relies on TCP. So... you can't just "skip lines"! The server will always try to send you all data preceding the portion of your interest, and your side of communication must acknowledge the reception of such data.
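Even so, the client can stop reading once it has passed the last line it cares about, so the rest of the page need not be processed. The following is a minimal sketch of that idea, not a definitive implementation; the class name LineRange and the method readLineRange are hypothetical, and an in-memory StringReader stands in for the network stream so that the example runs offline.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

final class LineRange {
    /**
     * Returns lines firstLine..lastLine (1-based, inclusive), each followed by '\n',
     * and stops reading as soon as lastLine has been consumed.
     */
    static String readLineRange(BufferedReader in, int firstLine, int lastLine) throws IOException {
        StringBuilder sb = new StringBuilder();
        String line;
        int lineNo = 0;
        // The lineNo check comes first, so no extra line is read past lastLine.
        while (lineNo < lastLine && (line = in.readLine()) != null) {
            lineNo++;
            if (lineNo >= firstLine) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: in real code the reader would wrap the connection's input stream.
        BufferedReader in = new BufferedReader(new StringReader("a\nb\nc\nd\ne\n"));
        System.out.print(readLineRange(in, 2, 4));
    }
}
```

The lines before firstLine are still received and read; only the work after lastLine is avoided, and the caller should still close the reader afterwards.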

Do not read line by line; use read(char[] cbuf, int off, int len).
Do not concatenate Strings; use a StringBuilder.
Open the BufferedReader (much as you already do):
URL oracle = new URL("http://www.nationalleague.ch/NL/fr/");
BufferedReader in = new BufferedReader(new InputStreamReader(oracle.openStream()));
Instead of reading line by line, read into a char[] (I would use a size of about 8192)
and then use a StringBuilder to append all the chars that were read.
Reading specific lines of the HTML source seems a little risky, because the formatting of the page's source code may change.
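The two recommendations above (a char[] buffer plus a StringBuilder) can be sketched as follows. This is only an illustration under stated assumptions: the class name ReadAll and the method readAll are hypothetical, and a StringReader replaces the BufferedReader over the URL stream so that the code runs without network access.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class ReadAll {
    /** Reads everything from the reader in 8192-char chunks. */
    static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        int n;
        // read() returns the number of chars placed in buf, or -1 at end of stream.
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: real code would pass the BufferedReader opened on the URL.
        System.out.println(readAll(new StringReader("<html>\n<body/>\n</html>")));
    }
}
```

Unlike readLine(), this keeps the original line terminators, because the chars are copied through unchanged.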

Related

BufferedReader does not work well

I am loading the following URL with a BufferedReader, but no content is delivered, even though a plain browser can show the content. As a result, str remains null. Any idea why?
URL url = new URL("http://www.omdbapi.com/?t=zorr&y=&plot=short&r=json");
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
String str;
while ((str = in.readLine()) != null) {}
Log.d("alma", str);
You are ignoring all of the lines that you are reading. You then exit the loop when str becomes null. So, your Log.d() call will always show null.
If you want to use the lines that you are reading, use str inside your currently empty block:
while ((str = in.readLine()) != null) {
// do something with str
}
You might also wish to consider using a third-party library that offers a simpler API. OkHttp3, for example, makes getting a string response from a URL fairly easy.
Try this:
URL url = new URL("http://www.omdbapi.com/?t=zorr&y=&plot=short&r=json");
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
String str;
while ((str = in.readLine()) != null) {
Log.d("alma", str); // this should be here
}

How to read a stream until it ends with a string

I am trying to read from a stream until the content ends with a given string. I came up with a solution, but I do not think it is a good one because of the many String conversions and method calls inside the loop. Could someone suggest a better solution?
private static String readUntilEndsWith(BufferedReader reader,
String endString) throws IOException
{
StringBuffer buffer = new StringBuffer();
while (!buffer.toString().endsWith(endString))
buffer.append(reader.readLine());
return buffer.toString();
}
Read the current line only and evaluate it. If the line is ok, append it into your buffer and keep reading. This way you only evaluate the current line in the reader and not the whole content.
StringBuilder buffer = new StringBuilder();
String line = reader.readLine();
while (line != null && !line.endsWith(endString)) {
buffer.append(line);
line = reader.readLine();
}
if (line != null) {
buffer.append(line);
}
return buffer.toString();
If your endString always falls at the end of a single line, you can rewrite the loop as:
String line;
do {
    line = reader.readLine();
    if (line != null) {
        buffer.append(line);
    }
} while (line != null && !line.endsWith(endString));
This way you do not have to convert the buffer to a String on every iteration.
P.S. I also recommend using StringBuilder instead of StringBuffer.
If, however, your endString might span several lines, you can add a check like this inside the loop:
if (endString.endsWith(line)) {
    if (buffer.toString().endsWith(endString)) {
        return buffer.toString();
    }
}
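For the multi-line case, the tail of the builder can be compared with endString directly, without calling toString() on every iteration. The sketch below is one possible way to do that, not the answerers' code; the class name ReadUntil and the helper endsWith are hypothetical. It also assumes that a '\n' should be put back between lines, since readLine() strips line terminators and a multi-line endString could not match otherwise.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

final class ReadUntil {
    /** Reads lines until the accumulated text ends with endString or the stream ends. */
    static String readUntilEndsWith(BufferedReader reader, String endString) throws IOException {
        StringBuilder sb = new StringBuilder();
        String line;
        boolean first = true;
        while ((line = reader.readLine()) != null) {
            if (!first) {
                sb.append('\n'); // assumption: lines are rejoined with '\n'
            }
            first = false;
            sb.append(line);
            if (endsWith(sb, endString)) {
                break;
            }
        }
        return sb.toString();
    }

    /** Checks the tail of the builder without creating a String copy. */
    static boolean endsWith(StringBuilder sb, String suffix) {
        int start = sb.length() - suffix.length();
        return start >= 0 && sb.indexOf(suffix, start) == start;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader r = new BufferedReader(new StringReader("a\nb\nEND\nc"));
        System.out.println(readUntilEndsWith(r, "b\nEND"));
    }
}
```

If the stream ends before endString is seen, the method simply returns everything that was read.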

Slow download with HttpURLConnection

I'm trying to write a method that downloads a web page.
First, I create an HttpURLConnection.
Second, I call the connect() method.
Third, I read the data through a BufferedReader.
The problem is that some pages are read in a reasonable time, while others are very slow (it can take about 10 minutes!). The slow pages are always the same ones, and they all come from the same website. Opening those pages in a browser takes just a few seconds instead of 10 minutes. Here is the code:
static private String getWebPage(PageNode pagenode)
{
String result;
String inputLine;
URI url;
int cicliLettura=0;
long startTime=0, endTime, openConnTime=0,connTime=0, readTime=0;
try
{
if(Core.logGetWebPage())
startTime=System.nanoTime();
result="";
url=pagenode.getUri();
if(Core.logGetWebPage())
openConnTime=System.nanoTime();
HttpURLConnection yc = (HttpURLConnection) url.toURL().openConnection();
if(url.toURL().getProtocol().equalsIgnoreCase("https"))
yc=(HttpsURLConnection)yc;
yc.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 (.NET CLR 3.5.30729)");
yc.connect();
if(Core.logGetWebPage())
connTime=System.nanoTime();
BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
while ((inputLine = in.readLine()) != null)
{
result=result+inputLine+"\n";
cicliLettura++;
}
if(Core.logGetWebPage())
readTime=System.nanoTime();
in.close();
yc.disconnect();
if(Core.logGetWebPage())
{
endTime=System.nanoTime();
System.out.println(/*result+*/"getWebPage eseguito in "+(endTime-startTime)/1000000+" ms. Size: "+result.length()+" Response Code="+yc.getResponseCode()+" Protocollo="+url.toURL().getProtocol()+" openConnTime: "+(openConnTime-startTime)/1000000+" connTime:"+(connTime-openConnTime)/1000000+" readTime:"+(readTime-connTime)/1000000+" cicliLettura="+cicliLettura);
}
return result;
}catch(IOException e){
System.out.println("Eccezione: "+e.toString());
e.printStackTrace();
return null;
}
}
Here you have two log samples
One of the "normal" pages
getWebPage executed Size: 48261 Response Code=200 Protocol=http openConnTime: 0 connTime:1 readTime:569 cicliLettura=359
One of the "slow" pages http://ricette.giallozafferano.it/Pan-di-spagna-al-cacao.html/allcomments
looks like this
getWebPage executed Size: 1748261 Response Code=200 Protocol=http openConnTime: 0 connTime:1 readTime:596834 cicliLettura=35685
What you're likely seeing here is a result of the way you are collating result. Remember that Strings in Java are immutable - therefore when string concatenation occurs, a new String has to be instantiated, which can often involve copying all the data contained in that String. You have the following code executing for every line:
result=result+inputLine+"\n";
Under the covers, the compiler typically expands this line into roughly the following steps:
A new StringBuilder is created (older compilers used a StringBuffer)
The entire content of result so far is copied into it
inputLine and the newline character are appended to it
The builder is converted to a new String, which copies the characters again
That String is stored as result.
This operation becomes more and more time-consuming as result gets bigger, and your measurements appear to show (albeit from a sample of 2!) that the read time increases drastically with page size.
Instead, append to a StringBuffer directly (a StringBuilder works the same way and avoids the synchronization overhead).
StringBuffer buffer = new StringBuffer();
while ((inputLine = in.readLine()) != null)
{
buffer.append(inputLine).append('\n');
cicliLettura++;
}
String result = buffer.toString();
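To get a feeling for the difference, here is a rough model of how many characters are copied in each case. It is a simplification under stated assumptions, not a measurement: every line is taken to have the same length, buffer regrowth and garbage collection are ignored, and the names ConcatCost, charsCopiedByConcat and charsCopiedByBuilder are hypothetical.

```java
final class ConcatCost {
    /**
     * Characters copied when a String grows by one line per iteration via '+':
     * iteration i builds a fresh String holding i lines, so the total is
     * avgLineLength * (1 + 2 + ... + lines).
     */
    static long charsCopiedByConcat(long lines, long avgLineLength) {
        return avgLineLength * lines * (lines + 1) / 2;
    }

    /** Characters copied when each line is appended once to a single builder. */
    static long charsCopiedByBuilder(long lines, long avgLineLength) {
        return avgLineLength * lines;
    }

    public static void main(String[] args) {
        // Figures from the slow page in the question: 35685 lines, 1748261 chars.
        long lines = 35685;
        long avg = 1748261 / lines;
        System.out.println("concat:  " + charsCopiedByConcat(lines, avg));
        System.out.println("builder: " + charsCopiedByBuilder(lines, avg));
    }
}
```

Under this model the concatenation loop copies on the order of tens of billions of characters for the slow page, against fewer than two million for the builder, which is why the cost grows quadratically with the number of lines rather than linearly.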

"StringBuffer sb = new StringBuffer()" get a null value in Android

I use the code below for my HTTP GET request, but what I get back is null. I don't know why.
public static String getResponseFromGetUrl(String url) throws Exception {
StringBuffer sb = new StringBuffer();
try {
HttpResponse httpResponse = httpclient.execute(httpRequest);
String inputLine = "";
if (httpResponse.getStatusLine().getStatusCode() == HttpStatus.SC_OK) {
InputStreamReader is = new InputStreamReader(httpResponse
.getEntity().getContent());
BufferedReader in = new BufferedReader(is);
while ((inputLine = in.readLine()) != null) {
sb.append(inputLine);
}
in.close();
}
} catch (Exception e) {
e.printStackTrace();
return "net_error";
} finally {
httpclient.getConnectionManager().shutdown();
}
return sb.toString();
}
And this is how I call the function:
String json_str = HttpUtils.getResponseFromGetUrl("www.xxx.com/start");
if ((json_str == null)) Log.d("Chen", "lastestTimestap----" + "json_str == null");
The log message is printed only sometimes, in about 1% of calls, and I don't know what causes it.
This code will not produce a "null". There must be more code you are not showing.
If this is all the code you have I suggest you remove the StringBuffer and replace it with
return "";
More likely you have forgotten to mention some code which is doing something like
Object o = null;
sb.append(o); // appears as "null"
EDIT: Based on your update, I would have to assume you are reading a line like "null"
It is highly unlikely you want to discard the newline between each line. I suggest either you append("\n") as well or just record all the text you get without regard for new lines.
BTW, please don't use StringBuffer, as its replacement StringBuilder has been around for almost ten years. There is a common misconception that using StringBuffer helps with multi-threading, but more often it results in incorrect code, because it is very hard, if not impossible, to use StringBuffer correctly in a multi-threaded context.
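The point about sb.append(o) can be checked in isolation. The snippet below is a small demonstration with hypothetical names (NullAppendDemo, appendNullReference): appending a null reference does not fail, it adds the four characters "null" to the builder.

```java
final class NullAppendDemo {
    static String appendNullReference() {
        StringBuilder sb = new StringBuilder();
        Object o = null;
        sb.append(o); // append(Object) uses String.valueOf(o), which yields "null"
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(appendNullReference()); // prints: null
    }
}
```

So a result that looks like null in a log may be the text "null" rather than a null reference, and the two need different checks (equals("null") versus == null).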

Java String from InputStream [duplicate]

Possible Duplicates:
How do I convert an InputStream to a String in Java?
In Java how do a read an input stream in to a string?
I have an InputStream and simply need to get a single String with its complete contents.
How is this done in Java?
Here is a modification of Gopi's answer that doesn't have the line-ending problem and is also more efficient, as it doesn't need temporary String objects for every line and avoids the redundant copying in BufferedReader and the extra work in readLine().
public static String convertStreamToString( InputStream is, String encoding ) throws IOException
{
StringBuilder sb = new StringBuilder( Math.max( 16, is.available() ) );
char[] tmp = new char[ 4096 ];
try {
InputStreamReader reader = new InputStreamReader( is, encoding );
for( int cnt; ( cnt = reader.read( tmp ) ) > 0; )
sb.append( tmp, 0, cnt );
} finally {
is.close();
}
return sb.toString();
}
You need to construct an InputStreamReader to wrap the input stream, converting between binary data and text. Specify the appropriate encoding based on your input source.
Once you've got an InputStreamReader, you could create a BufferedReader and read the contents line by line, or just read buffer-by-buffer and append to a StringBuilder until the read() call returns -1.
The Guava library makes the second part of this easy - use CharStreams.toString(inputStreamReader).
Here is some example code, adapted from elsewhere.
public String convertStreamToString(InputStream is) throws IOException {
/*
* To convert the InputStream to a String we use the BufferedReader.readLine()
* method. We iterate until the BufferedReader returns null, which means
* there's no more data to read. Each line is appended to a StringBuilder
* and the result is returned as a String.
*/
if (is != null) {
StringBuilder sb = new StringBuilder();
String line;
try {
BufferedReader reader = new BufferedReader(new InputStreamReader(is, "UTF-8"));
while ((line = reader.readLine()) != null) {
sb.append(line).append("\n");
}
} finally {
is.close();
}
return sb.toString();
} else {
return "";
}
}
You can also use Apache Commons IO library
Specifically, you can use IOUtils#toString(InputStream inputStream) method
You could also use a StringWriter as follows: each read from your InputStream is matched with a write (or append) to the StringWriter, and upon completion you can call getBuffer to get a StringBuffer, which can be used directly, or you can call its toString method.
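A minimal sketch of the StringWriter approach, using only the standard library, might look like this. The class name StreamToString and the method convert are hypothetical, and the sketch assumes that the caller knows the charset of the stream and that java.nio.charset.StandardCharsets is available (Java 7 or later).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class StreamToString {
    /** Decodes the whole stream with the given charset and closes it. */
    static String convert(InputStream is, Charset charset) throws IOException {
        StringWriter writer = new StringWriter();
        // Closing the reader also closes the underlying stream.
        try (Reader reader = new InputStreamReader(is, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n); // each read is matched by one write
            }
        }
        return writer.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "first line\nsecond line".getBytes(StandardCharsets.UTF_8);
        System.out.println(convert(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8));
    }
}
```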
Wrap the stream in a Reader to get character-encoding conversion, and then keep reading while collecting the text in a StringBuffer. When done, call toString() on the StringBuffer.
