java: speed up reading foreign characters


My current code needs to read foreign characters from the web. My solution works, but it is very slow because it reads one character at a time using InputStreamReader. Is there any way to speed it up and still get the job done?
// Pull content stream from response
HttpEntity entity = response.getEntity();
InputStream inputStream = entity.getContent();
StringBuilder contents = new StringBuilder();
int ch;
InputStreamReader isr = new InputStreamReader(inputStream, "gb2312");
// FileInputStream file = new InputStream(is);
while( (ch = isr.read()) != -1)
contents.append((char)ch);
String encode = isr.getEncoding();
return contents.toString();

Wrap your InputStreamReader in a BufferedReader. According to its Javadoc, BufferedReader buffers characters "so as to provide for the efficient reading of characters, arrays, and lines."
InputStreamReader isr = new InputStreamReader(inputStream, "gb2312");
Reader reader = new BufferedReader(isr);
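To make the suggestion above concrete, here is a minimal sketch of the question's loop rewritten to read in chunks through a BufferedReader. The class name BufferedDecode, the method name readAll and the 8192-character chunk size are assumptions made for illustration; they do not come from the original post.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original question.
final class BufferedDecode {

    // Decodes the whole stream with the given charset, reading in chunks
    // instead of one character per call.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder contents = new StringBuilder();
        char[] chunk = new char[8192]; // chunk size is an arbitrary choice
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            int n;
            // read(char[]) returns the number of chars read, or -1 at end of stream
            while ((n = reader.read(chunk)) != -1) {
                contents.append(chunk, 0, n);
            }
        }
        return contents.toString();
    }
}
```

In the question's code this could be called as BufferedDecode.readAll(entity.getContent(), Charset.forName("gb2312")), assuming the running JVM supports the gb2312 charset.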

Related

URLConnection doesn't read whole page

In my app I need to download a web page. I do it like this:
URL url = new URL(myUrl);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setReadTimeout(5000); // 5 seconds to download (value is in milliseconds)
conn.setConnectTimeout(5000); // 5 seconds to connect (value is in milliseconds)
conn.setRequestMethod("GET");
conn.setDoInput(true);
conn.connect();
int response = conn.getResponseCode();
is = conn.getInputStream();
String s = readIt(is);
System.out.println("got: " + s);
My readIt function is:
public String readIt(InputStream stream) throws IOException {
int len = 10000;
Reader reader;
reader = new InputStreamReader(stream, "UTF-8");
char[] buffer = new char[len];
reader.read(buffer);
return new String(buffer);
}
The problem is that it doesn't download the whole page. For example, if myUrl is "https://wikipedia.org", only part of the page is printed.
How can I download the whole page?
Update
The second answer to "Read/convert an InputStream to a String" solved my problem. The problem is in the readIt function. You should read the response from the InputStream like this:
static String convertStreamToString(java.io.InputStream is) {
java.util.Scanner s = new java.util.Scanner(is).useDelimiter("\\A");
return s.hasNext() ? s.next() : "";
}

There are a number of mistakes in your code:
You are reading into a character buffer with a fixed size.
You are ignoring the result of the read(char[]) method. It returns the number of characters actually read, and you need to use that.
You are assuming that read(char[]) will read all of the data. In fact, it is only guaranteed to return at least one character, or -1 to indicate that you have reached the end of the stream. When you read from a network connection, you are liable to get only the data that has already been sent by the other end and buffered locally.
When you create the String from the char[], you are assuming that every position in the character array contains a character from your stream.
There are multiple ways to do it correctly, and this is one way:
public String readIt(InputStream stream) throws IOException {
Reader reader = new InputStreamReader(stream, "UTF-8");
char[] buffer = new char[4096];
StringBuilder builder = new StringBuilder();
int len;
while ((len = reader.read(buffer)) != -1) {
builder.append(buffer, 0, len);
}
return builder.toString();
}
Another way is to look for an existing third-party library that offers a readFully(Reader)-style method.

You need to read in a loop till there are no more bytes left in the InputStream.
while (-1 != (len = in.read(buffer))) {
// do stuff here
}
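The loop above leaves out what "do stuff here" might be. One possible way to fill it in, shown as a sketch, is to collect the bytes first and decode them once at the end. The class name StreamDrain, the 4096-byte buffer and the choice of UTF-8 are assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original answer.
final class StreamDrain {

    // Reads until read() reports end of stream (-1), keeping only the bytes
    // that each call actually delivered.
    static String readFully(InputStream in) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096]; // buffer size is an arbitrary choice
        int len;
        while (-1 != (len = in.read(buffer))) {
            collected.write(buffer, 0, len);
        }
        // Decoding once at the end avoids splitting a multi-byte character
        // across two reads. UTF-8 is assumed here.
        return new String(collected.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Because the loop only stops at -1, it still returns the full content when the stream hands over a few bytes at a time, as a network connection may.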

You are reading at most 10000 characters from the input stream, in a single read call.
Use a BufferedReader to make your life easier.
public String readIt(InputStream stream) throws IOException {
BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
StringBuilder out = new StringBuilder();
String newLine = System.getProperty("line.separator");
String line;
while ((line = reader.readLine()) != null) {
out.append(line);
out.append(newLine);
}
return out.toString();
}

Why we use the StringBuilder and BufferedReader in java together?

BufferedReader returns a String from readLine(), and StringBuilder gives back the same text after append(line). So why do we use both together?
HttpEntity he = res.getEntity();
InputStream is= he.getContent();
InputStreamReader ir= new InputStreamReader(is);
BufferedReader br= new BufferedReader(ir);
// StringBuilder sb = new StringBuilder();
String line=br.readLine();
// sb.append(line);
JSONObject jobj= new JSONObject(line);

It is used to concatenate all the lines from the BufferedReader.
Here is your code, corrected:
String line;
while((line=br.readLine())!= null){
sb.append(line);
}
If you still don't see any difference, it means your input stream contains only a single line.

They are unrelated classes.
BufferedReader buffers the IO operations making it more efficient and faster.
StringBuilder lets you build a String in memory more efficiently (appending does not create a new intermediate String object for every concatenation).
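A small sketch may help show how the two classes divide the work: BufferedReader hands back one line per call, and StringBuilder accumulates those lines. The class and method names below are hypothetical, and the JSON text in the usage note is made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper class, not part of the original answers.
final class LineCollector {

    // What the question's code effectively does: only the first line is used.
    static String firstLine(Reader source) throws IOException {
        BufferedReader br = new BufferedReader(source);
        String line = br.readLine();
        return line == null ? "" : line;
    }

    // BufferedReader supplies the lines; StringBuilder joins them.
    static String allLines(Reader source) throws IOException {
        BufferedReader br = new BufferedReader(source);
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            // readLine() strips the line terminator, so add one back
            sb.append(line).append('\n');
        }
        return sb.toString();
    }
}
```

For a response whose body is spread over several lines, for example a pretty-printed JSON object, firstLine returns only the opening "{", while allLines returns the complete text.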

Can someone explain me how this code works

I got it from this page:
Android AsyncTask method that I dont know how to solve
but I am not sure how it works completely. Can someone explain what the while loop is for, and what this part means: "iso-8859-1"?
I understood that the 8 is the number of characters, but I could be wrong.
static InputStream is = null;
static String json = "";
is = httpEntity.getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(
is, "iso-8859-1"), 8);
StringBuilder sb = new StringBuilder();
String line = null;
while ((line = reader.readLine()) != null) {
sb.append(line + "\n");
}
is.close();
json = sb.toString();

Your code reads from an InputStream obtained from the HttpEntity, collects the lines in a StringBuilder, and finally converts the result into the String json.
To understand API calls like these, the Javadoc is your friend.
Here is what I found in the BufferedReader Javadoc:
public BufferedReader(Reader in, int sz)
Creates a buffering character-input stream that uses an input buffer of the specified size.
Parameters: in - A Reader; sz - Input-buffer size
Throws: IllegalArgumentException - If sz is <= 0
http://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html
As a reader, InputStreamReader is used in your code. Here is the relevant javadoc for the InputStreamReader
public InputStreamReader(InputStream in, Charset cs)
Creates an InputStreamReader that uses the given charset.
Parameters: in - An InputStream; cs - A charset
http://docs.oracle.com/javase/7/docs/api/java/io/InputStreamReader.html#InputStreamReader(java.io.InputStream, java.nio.charset.Charset)
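So "iso-8859-1" is the charset used to decode the bytes, and the 8 is the size of the BufferedReader's input buffer in characters. (Your code passes the charset as a name, which uses the overload InputStreamReader(InputStream in, String charsetName).) The while loop reads the response line by line until readLine() returns null at the end of the stream.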
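The roles of the two arguments can be seen in a short sketch. The class CharsetDemo and its decode method are hypothetical names, and the sample bytes in the usage note are chosen for illustration only.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original answer.
final class CharsetDemo {

    // Decodes the bytes with the given charset through a BufferedReader
    // whose internal buffer holds bufferSize characters.
    static String decode(byte[] data, Charset charset, int bufferSize) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), charset), bufferSize)) {
            int ch;
            while ((ch = reader.read()) != -1) {
                sb.append((char) ch);
            }
        }
        return sb.toString();
    }
}
```

The charset changes the result: the UTF-8 bytes of "caf\u00e9" decode back to the same four characters with UTF-8, but to five characters with ISO-8859-1, because each byte becomes its own character. The buffer size only affects how much is read ahead at a time; the decoded text is the same with 8 as with 8192.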

bufferedreader Inputstream reader changes...?

URL u = new URL(url);
String expected = "";
HttpURLConnection uc = (HttpURLConnection) u.openConnection();
InputStream in = new BufferedInputStream(uc.getInputStream());
Reader r= new InputStreamReader(in);
This is my code. It fetches content from a URL, but now I want to use the same code to read content from a file. What do I need to change? I assume something has to replace uc.getInputStream() in this line:
InputStream in = new BufferedInputStream(uc.getInputStream());

Look at the FileInputStream class.
You can use it in a similar way:
InputStream in = new FileInputStream(new File("C:/temp/test.txt"));
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder out = new StringBuilder();
String line;
while ((line = reader.readLine()) != null) {
out.append(line);
}
System.out.println(out.toString()); //Prints the string content read from input stream
reader.close();
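Since both the URL connection and the file end up as an InputStream, the reading code can be shared. The sketch below assumes hypothetical names (SourceAgnosticReader, readLines, readFile) and assumes the content is UTF-8; it uses try-with-resources so the reader is closed even when reading fails.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original answer.
final class SourceAgnosticReader {

    // Works for any InputStream: uc.getInputStream() or a FileInputStream.
    static String readLines(InputStream in) throws IOException {
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) { // UTF-8 is an assumption
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append('\n'); // keep the line breaks readLine() strips
            }
        }
        return out.toString();
    }

    // Only the source of the stream changes when reading from a file.
    static String readFile(String path) throws IOException {
        return readLines(new FileInputStream(path));
    }
}
```

With this split, the URL case would call readLines(uc.getInputStream()) and the file case would call readFile("C:/temp/test.txt").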

Java: problem with reading a file

I'm loading an XML file with this method:
public static String readTextFile(String fullPathFilename) throws IOException {
StringBuffer sb = new StringBuffer(1024);
BufferedReader reader = new BufferedReader(new FileReader(fullPathFilename));
char[] chars = new char[1024];
while(reader.read(chars) > -1){
sb.append(String.valueOf(chars));
}
reader.close();
return sb.toString();
}
But it doesn't load the whole data. Instead of 25634 characters, it loads 10 less (25624). Why is that?
Thanks,
Ivan

BufferedReader gives you the readLine() method, which works well for me. Note that readLine() strips the line terminators, so the result below contains no line breaks.
StringBuffer sb = new StringBuffer( 1024 );
BufferedReader reader = new BufferedReader( new FileReader( fullPathFilename ) );
while( true ) {
String line = reader.readLine();
if(line == null) {
break;
}
sb.append( line );
}
reader.close();

I think there's a bug in your code: a read (typically the last one) might not fill the char[], but you still append the whole array to the string. To account for this you need to do something like:
StringBuilder res = new StringBuilder();
InputStreamReader r = new InputStreamReader(new BufferedInputStream(is));
char[] c = new char[1024];
while(true) {
int charCount = r.read(c);
if (charCount == -1) {
break;
}
res.append(c, 0, charCount);
}
r.close();
Also, how do you know you're expecting 25634 chars?
(And use StringBuilder instead of StringBuffer; the former is not thread-safe and therefore slightly faster.)
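The effect of the bug described above can be reproduced with a short sketch. The class and method names are hypothetical; the first method mirrors the question's loop and the second one the corrected loop.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper class, not part of the original answer.
final class PartialReadDemo {

    // Mirrors the question: ignores how many chars each read delivered.
    static String appendWholeBuffer(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] chars = new char[1024];
        while (reader.read(chars) > -1) {
            sb.append(String.valueOf(chars)); // always appends all 1024 chars
        }
        return sb.toString();
    }

    // Mirrors the fix: appends only the chars that were actually read.
    static String appendReadCount(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] chars = new char[1024];
        int n;
        while ((n = reader.read(chars)) > -1) {
            sb.append(chars, 0, n);
        }
        return sb.toString();
    }
}
```

When a read does not fill the array, appendWholeBuffer also appends stale characters left over from an earlier read, so its result is longer than the input. That means this bug tends to add characters rather than lose them, so it may not be the whole story behind the ten missing characters in the question.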

Perhaps you have 25634 bytes in your file that represent only 25624 characters? This can happen with multibyte character sets like UTF-8. Every InputStreamReader (including FileReader) does this conversion automatically using a charset (either one given explicitly, or the platform-dependent default encoding).
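The difference between byte count and character count is easy to check directly. This is a minimal sketch with hypothetical names (ByteVsCharCount, byteCount, charCount), not code from the original answer.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original answer.
final class ByteVsCharCount {

    // Number of bytes the text occupies when encoded with the charset.
    static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // Number of chars obtained when the bytes are decoded with the charset.
    static int charCount(byte[] data, Charset charset) {
        return new String(data, charset).length();
    }
}
```

For example, "na\u00efve" has 5 characters but takes 6 bytes in UTF-8, because the one accented letter needs two bytes. Under that reading, a file with ten such two-byte characters would be 10 bytes longer than its character count, which would match the numbers in the question.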

Use FileInputStream to read the raw bytes, so that no characters get decoded as UTF-8:
StringBuffer sb = new StringBuffer(1024);
FileInputStream fis = new FileInputStream(filename);
byte[] bytes = new byte[1024];
int n;
while ((n = fis.read(bytes)) > -1) {
// ISO-8859-1 maps every byte to exactly one char, so no multibyte decoding takes place
sb.append(new String(bytes, 0, n, "ISO-8859-1"));
}
fis.close();
return sb.toString();
