Getting only a part of text with InputStreamReader

I am reading from InputStreamReader but I only get the first 10,000 characters of the text that is supposed to come. Any idea what the problem may be? If there is no solution with this class, what are my alternatives?
I found this about InputStreamReader: "The buffer size is 8K." (http://developer.android.com/reference/java/io/InputStreamReader.html). Could this be the answer?
Any pointers are really appreciated.
StringBuilder sb = new StringBuilder();
BufferedReader br = new BufferedReader(new InputStreamReader(
        httpcon.getInputStream(), "utf-8"));
String line = null;
while ((line = br.readLine()) != null) {
    sb.append(line);
}
br.close();
result = sb.toString();

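The 8K buffer is probably not the cause. It is only a working area (8K means 8192, not 8000) that the reader refills as you keep reading, so it does not limit how much text you can receive. Note also that one character is not always one byte: in UTF-8 a character takes between one and four bytes. The fact that you get 10,000 characters, more than one buffer's worth, already shows that the reader went past its buffer. I would check whether the server really sends more than that, and whether whatever you use to display the result (a log viewer, for example) truncates long strings.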
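To see that the buffer size does not cap the amount of text, here is a minimal sketch that reads a stream to the end through a deliberately small buffer. The names ReadAllDemo and readFully are made up for illustration, and UTF-8 is assumed as in the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class ReadAllDemo {
    // Reads the whole stream as UTF-8 text, bufferChars characters at a time.
    static String readFully(InputStream in, int bufferChars) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buffer = new char[bufferChars];
            int n;
            // read() returns -1 only at end of stream; until then the same
            // buffer is simply refilled on every iteration.
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }
}
```

Called with a 1,024-character buffer on a stream holding 50,000 characters, readFully still returns all 50,000, so a result that stops at 10,000 most likely points to a cut somewhere other than the reader.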

Related

Why do we use StringBuilder and BufferedReader together in Java?

BufferedReader already returns a String from readLine(), and StringBuilder gives the same text back after append(line). Then why do we use both together?
HttpEntity he = res.getEntity();
InputStream is= he.getContent();
InputStreamReader ir= new InputStreamReader(is);
BufferedReader br= new BufferedReader(ir);
// StringBuilder sb = new StringBuilder();
String line=br.readLine();
// sb.append(line);
JSONObject jobj= new JSONObject(line);

The StringBuilder is used to concatenate all the lines from the buffered reader.
Here is your code, corrected:
StringBuilder sb = new StringBuilder();
String line;
while ((line = br.readLine()) != null) {
    sb.append(line);
}
JSONObject jobj = new JSONObject(sb.toString());
If you still don't see any difference, your input stream contains only a single line.

They are unrelated classes that solve different problems.
BufferedReader buffers the I/O operations, which makes reading more efficient.
StringBuilder lets you build a String in memory efficiently: appending to it does not create a new String object for every concatenation.
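The division of labor between the two classes can be sketched as follows. This is an illustrative example, not the asker's code: LinesDemo, firstLine and allLines are hypothetical names, and a plain Reader stands in for the HTTP response.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

final class LinesDemo {
    // What the question does: a single readLine() call, so only the first line.
    static String firstLine(Reader source) throws IOException {
        BufferedReader br = new BufferedReader(source);
        return br.readLine();
    }

    // BufferedReader supplies the lines, StringBuilder collects them.
    static String allLines(Reader source) throws IOException {
        BufferedReader br = new BufferedReader(source);
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            // readLine() strips the terminator, so put one back explicitly.
            sb.append(line).append('\n');
        }
        return sb.toString();
    }
}
```

If the response is a JSON document spread over several lines, firstLine returns only the opening line, while allLines returns the complete text that a JSON parser needs.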

Java encoding issue while reading a stream

I am trying to download the contents of an FTP folder. One of the files is an XML file that starts with the standard XML declaration:
< ?xml version="1.0" encoding="utf-8"?>
When I read these files (using java.net.Socket), get the input stream, and convert it to a String, some extra characters appear, and the whole XML document starts with '?', e.g. "?< ?xml version="1.0" encoding="utf-8"?>....."
BufferedInputStream reader = new BufferedInputStream(sock.getInputStream());
Then I get a String from this reader using the following code:
StringBuilder sb = new StringBuilder();
String line;
BufferedReader br = new BufferedReader(new InputStreamReader(reader));
while ((line = br.readLine()) != null) {
    sb.append(line);
}
System.out.println(sb.toString());
I am not sure what is happening here. Why are these special characters introduced? Any suggestions would be appreciated.
Then I used the following code to read the file, and in the console I again see some special characters:
BufferedReader reader = new BufferedReader(new FileReader("c:/Users/appd922/DocumentMeta06122014.xml"));
StringBuffer sb = new StringBuffer();
String line = null;
while ((line = reader.readLine()) != null) {
    sb.append(line);
}
String output = sb.toString();
System.out.println("reading from file"+output);
The output starts with
"reading from fileï»¿< ?xml version.....
Where are these special characters coming from?
Note: ignore the space in the XML declaration above. I could not post it here properly without that space.

Specify the encoding when creating InputStreamReader to read the file from the ftp, for example:
BufferedReader br = new BufferedReader(new InputStreamReader(reader, "utf-8"));
Otherwise, InputStreamReader uses the platform default encoding. Also, specify the encoding when reading the downloaded file: new FileReader(fileName) uses the platform default encoding as well. Use an InputStreamReader over a FileInputStream and specify the encoding, for example:
BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(filePath), "utf-8"));

Those characters are the BOM (byte order mark). In UTF-8 it is the three bytes EF BB BF; decoded with a single-byte encoding such as windows-1252 they show up as ï»¿. If you set the encoding of the InputStreamReader to "UTF-8", they are decoded as a single character, U+FEFF, which is the BOM character.
Unfortunately, you have to handle this character yourself, because Java's UTF-8 decoder does not strip it (see the question "java utf-8 and bom"). Usually you just skip it at the start of the stream. Good luck.
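One way to skip the BOM is to peek at the first decoded character and put it back if it is anything else. The following is a minimal sketch, assuming the data really is UTF-8; BomSkipper and openUtf8 are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class BomSkipper {
    // Returns a reader positioned after a leading U+FEFF, if there is one.
    static BufferedReader openUtf8(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8));
        reader.mark(1);                 // remember the start of the text
        if (reader.read() != '\uFEFF') {
            reader.reset();             // not a BOM: un-read that character
        }
        return reader;
    }
}
```

The returned reader can then be used in the readLine() loop from the question in place of the original BufferedReader.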

Can someone explain how this code works?

I got it from the page
Android AsyncTask method that I dont know how to solve
but I am not sure how it works completely. Can someone explain what the while loop is for, and what the "iso-8859-1" part means?
I understood that the 8 is the number of characters, but I could be wrong.
static InputStream is = null;
static String json = "";
is = httpEntity.getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(
        is, "iso-8859-1"), 8);
StringBuilder sb = new StringBuilder();
String line = null;
while ((line = reader.readLine()) != null) {
    sb.append(line + "\n");
}
is.close();
json = sb.toString();

Your code reads from the InputStream obtained from the HttpEntity, appends each line to a StringBuilder, and finally converts the result to a String (named json, but not parsed yet). The while loop calls readLine() until it returns null, which signals the end of the stream.
To understand the API calls, the javadoc is your friend.
Here is what I found in the BufferedReader javadoc:
public BufferedReader(Reader in, int sz)
Creates a buffering character-input stream that uses an input buffer of the specified size.
Parameters: in - A Reader; sz - Input-buffer size
Throws: IllegalArgumentException - If sz is <= 0
http://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html
So the 8 is the size of the reader's internal buffer in characters, not a limit on how much is read. It is unusually small; the default is 8192.
The Reader passed in your code is an InputStreamReader. Here is the relevant javadoc for InputStreamReader:
public InputStreamReader(InputStream in, Charset cs)
Creates an InputStreamReader that uses the given charset.
Parameters: in - An InputStream; cs - A charset
http://docs.oracle.com/javase/7/docs/api/java/io/InputStreamReader.html#InputStreamReader(java.io.InputStream, java.nio.charset.Charset)
Your code uses the overload that takes the charset name as a String, so "iso-8859-1" is the charset used to decode the bytes into characters.
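To make the roles of the two arguments concrete, here is a small sketch that decodes the same bytes with different charset names and buffer sizes. CharsetDemo and decodeLines are hypothetical names, and a byte array stands in for the HTTP entity.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

final class CharsetDemo {
    // Same loop as in the question, with the charset name and the
    // BufferedReader buffer size passed in as parameters.
    static String decodeLines(byte[] data, String charsetName, int bufferSize)
            throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), charsetName),
                bufferSize)) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        }
    }
}
```

The buffer size (8 or 8192) does not change the returned text, only how often the underlying stream is read. The charset name does change it: the UTF-8 bytes of "é" come back as "é" with "utf-8" but as the two characters "Ã©" with "iso-8859-1".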

How can I read a String including new line from InputStream object in Java?

I have a ServerSocket in Java:
ServerSocket serverSocket = new ServerSocket(1000);
which accepts a clientSocket:
Socket clientSocket;
clientSocket = serverSocket.accept();
Up until now I was reading the input like this:
BufferedReader clientSocketInputStream = new BufferedReader(new InputStreamReader(clientSocket.getInputStream()));
while ((inputLine = clientSocketInputStream.readLine()) != null) {
    String message = inputLine;
    // Hack the computer connecting to this one after here
}
However if the text sent is something like
String stringToBeSent = "Hello\nHowareyou";
then I am in trouble. Because I need this text as it is. 2 different Strings do not help me.
How can I read it as it is?
Thanks.

Two options:
1. Put the multiple strings back together. Note that readLine() strips the line terminator, so this turns every terminator into '\n', and the loop only ends when the client closes the connection.
StringBuilder sb = new StringBuilder();
while ((inputLine = clientSocketInputStream.readLine()) != null) {
    sb.append(inputLine);
    sb.append('\n');
}
String message = sb.toString();
2. Read an array of bytes instead of a String, using a BufferedInputStream instead of a BufferedReader. Then pass the whole byte array to the String constructor together with the correct charset. This requires you to know how many bytes the String occupies.

You could simply read character by character with read() (http://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html#read()) instead of readLine().

BufferedReader.readLine() reads characters until it encounters "\n", "\r" or "\r\n". You can read character by character instead, but that alone does not solve the problem: how would you determine which newline character is part of the message and which one ends it?
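One common way around the "which newline is real" problem is to stop relying on newlines and send the length of each message first, which also covers the need to know the byte count in advance. The sketch below assumes you control both the sending and the receiving side; FramedMessages, send and receive are hypothetical names, and the 4-byte length prefix is a design choice of this example, not something from the question.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class FramedMessages {
    // Sender: a 4-byte length, then the UTF-8 bytes of the message.
    static void send(OutputStream out, String message) throws IOException {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(payload.length);
        data.write(payload);
        data.flush();
    }

    // Receiver: read the length, then exactly that many bytes.
    // Newlines inside the message are ordinary payload bytes.
    static String receive(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt();
        // Real code should reject negative or unreasonably large lengths here.
        byte[] payload = new byte[length];
        data.readFully(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }
}
```

On the server this would be called as receive(clientSocket.getInputStream()), with the client using send(socket.getOutputStream(), stringToBeSent).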

Convert InputStream to String with encoding given in stream data

My input is an InputStream that contains an XML document. The encoding of the XML is unknown; it is declared in the first line of the document.
From this InputStream, I want to get the whole document as a String.
To do this, I use a BufferedInputStream to mark the beginning of the stream and read the first line. I extract the encoding from this first line and then use an InputStreamReader to build a String with the correct encoding.
This does not seem to be the best way to achieve the goal, because it produces an OutOfMemoryError.
Any idea how to do it?
public static String streamToString(final InputStream is) {
    String result = null;
    if (is != null) {
        BufferedInputStream bis = new BufferedInputStream(is);
        bis.mark(Integer.MAX_VALUE);
        final StringBuilder stringBuilder = new StringBuilder();
        try {
            // stream reader used to detect the encoding
            final InputStreamReader readerForEncoding = new InputStreamReader(bis, "UTF-8");
            final BufferedReader bufferedReaderForEncoding = new BufferedReader(readerForEncoding);
            String encoding = extractEncodingFromStream(bufferedReaderForEncoding);
            if (encoding == null) {
                encoding = DEFAULT_ENCODING;
            }
            // stream reader that reads the content with the detected encoding
            bis.reset();
            final InputStreamReader readerForContent = new InputStreamReader(bis, encoding);
            final BufferedReader bufferedReaderForContent = new BufferedReader(readerForContent);
            String line = bufferedReaderForContent.readLine();
            while (line != null) {
                stringBuilder.append(line);
                line = bufferedReaderForContent.readLine();
            }
            bufferedReaderForContent.close();
            bufferedReaderForEncoding.close();
        } catch (IOException e) {
            // reset string builder
            stringBuilder.delete(0, stringBuilder.length());
        }
        result = stringBuilder.toString();
    } else {
        result = null;
    }
    return result;
}

The call to mark(Integer.MAX_VALUE) is what leads to the OutOfMemoryError: with a read limit of about 2 GB, the BufferedInputStream has to keep every byte read after the mark, so the whole document ends up buffered in memory.
You can solve this with an iterative approach. Set the mark readLimit to a reasonable value, say 8K. In 99% of cases this will work, but in pathological cases, e.g. 16K of spaces between the attributes in the declaration, you will need to try again. So use a loop that tries to find the encoding and, if it does not find it within the marked region, tries again with double the readLimit.
To be sure you don't advance the input stream past the mark limit, you should read the InputStream yourself, up to the mark limit, into a byte array. You then wrap the byte array in a ByteArrayInputStream and pass that to the constructor of the InputStreamReader assigned to 'readerForEncoding'.
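The iterative approach could look roughly like the sketch below. It is only an illustration under several assumptions: XmlStreamDecoder, the regular expression and the 1 MB cap are invented for this example, the declaration is assumed to be in an ASCII-compatible encoding (so UTF-16 documents are not handled), and readNBytes(int) requires Java 11 or later. Instead of wrapping the bytes in a ByteArrayInputStream, the sketch decodes the probed bytes directly, which serves the same purpose.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XmlStreamDecoder {
    // Matches encoding="..." or encoding='...' inside the XML declaration.
    private static final Pattern ENCODING =
            Pattern.compile("encoding\\s*=\\s*[\"']([A-Za-z0-9._-]+)[\"']");
    private static final int FIRST_LIMIT = 8 * 1024;
    private static final int MAX_LIMIT = 1024 * 1024; // arbitrary cap for this sketch

    static String streamToString(InputStream is, Charset fallback) throws IOException {
        BufferedInputStream bis = new BufferedInputStream(is);
        Charset charset = fallback;
        for (int limit = FIRST_LIMIT; limit <= MAX_LIMIT; limit *= 2) {
            bis.mark(limit);
            byte[] head = bis.readNBytes(limit); // never reads past the mark limit
            bis.reset();
            // ISO-8859-1 maps every byte to one char, so this decode cannot fail.
            String text = new String(head, StandardCharsets.ISO_8859_1);
            int end = text.indexOf("?>");
            if (end >= 0) {
                Matcher m = ENCODING.matcher(text.substring(0, end));
                if (m.find()) {
                    charset = Charset.forName(m.group(1));
                }
                break;
            }
            if (head.length < limit) {
                break; // whole stream seen, no declaration found
            }
        }
        // Decode the complete document, starting again from the first byte.
        StringBuilder sb = new StringBuilder();
        Reader reader = new InputStreamReader(bis, charset);
        char[] buffer = new char[8192];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            sb.append(buffer, 0, n);
        }
        return sb.toString();
    }
}
```

Because the mark limit starts at 8K and only grows when the declaration has not ended yet, the BufferedInputStream keeps at most that many bytes for the reset instead of the whole document.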

You can use this method to convert an InputStream to a String; it might help you. Note that it decodes with the platform default charset and drops the line terminators.
private String convertStreamToString(InputStream input) throws Exception {
    BufferedReader reader = new BufferedReader(new InputStreamReader(input));
    StringBuilder sb = new StringBuilder();
    String line = null;
    while ((line = reader.readLine()) != null) {
        sb.append(line);
    }
    input.close();
    return sb.toString();
}
