Convert InputStream to String with encoding given in stream data


My input is an InputStream that contains an XML document. The encoding used in the XML is unknown; it is declared in the first line of the document.
From this InputStream, I want to get the whole document as a String.
To do this, I use a BufferedInputStream to mark the beginning of the stream and read the first line. I read this first line to get the encoding, and then I use an InputStreamReader to build a String with the correct encoding.
This does not seem to be the best way to achieve the goal, because it produces an OutOfMemoryError.
Any idea how to do it?
public static String streamToString(final InputStream is) {
String result = null;
if (is != null) {
BufferedInputStream bis = new BufferedInputStream(is);
bis.mark(Integer.MAX_VALUE);
final StringBuilder stringBuilder = new StringBuilder();
try {
// stream reader that handle encoding
final InputStreamReader readerForEncoding = new InputStreamReader(bis, "UTF-8");
final BufferedReader bufferedReaderForEncoding = new BufferedReader(readerForEncoding);
String encoding = extractEncodingFromStream(bufferedReaderForEncoding);
if (encoding == null) {
encoding = DEFAULT_ENCODING;
}
// stream reader that handle encoding
bis.reset();
final InputStreamReader readerForContent = new InputStreamReader(bis, encoding);
final BufferedReader bufferedReaderForContent = new BufferedReader(readerForContent);
String line = bufferedReaderForContent.readLine();
while (line != null) {
stringBuilder.append(line);
line = bufferedReaderForContent.readLine();
}
bufferedReaderForContent.close();
bufferedReaderForEncoding.close();
} catch (IOException e) {
// reset string builder
stringBuilder.delete(0, stringBuilder.length());
}
result = stringBuilder.toString();
}else {
result = null;
}
return result;
}

The call to mark(Integer.MAX_VALUE) is causing the OutOfMemoryError: it permits the BufferedInputStream to buffer up to 2 GB, so everything read after the mark is kept in memory.
You can solve this by using an iterative approach. Set the mark readLimit to a reasonable value, say 8K. In 99% of cases this will work, but in pathological cases, e.g. 16K spaces between the attributes in the declaration, you will need to try again. Thus, have a loop that tries to find the encoding, but if it doesn't find it within the given mark region, it tries again, doubling the requested mark readLimit size.
To be sure you don't advance the input stream past the mark limit, you should read the InputStream yourself, up to the mark limit, into a byte array. You then wrap the byte array in a ByteArrayInputStream and pass that to the constructor of the InputStreamReader assigned to 'readerForEncoding'.
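The approach above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name XmlStreamDecoder, the 1 MB cap, and the UTF-8 fallback are made up for the example (the question's DEFAULT_ENCODING is not shown), and the sketch assumes an ASCII-compatible encoding with no byte order mark. The prefix is decoded directly from the byte array rather than through a ByteArrayInputStream, which is a small simplification.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XmlStreamDecoder {
    // Assumption: fallback used when no encoding is declared.
    private static final String DEFAULT_ENCODING = "UTF-8";
    private static final int INITIAL_LIMIT = 8192;
    // Assumption: give up looking for the declaration after 1 MB.
    private static final int MAX_LIMIT = 1 << 20;
    private static final Pattern ENCODING_ATTR =
            Pattern.compile("encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Looks for the encoding in the XML declaration, then rewinds the stream. */
    static String detectEncoding(BufferedInputStream bis) throws IOException {
        for (int limit = INITIAL_LIMIT; limit <= MAX_LIMIT; limit *= 2) {
            bis.mark(limit);
            // Read at most 'limit' bytes ourselves so the mark stays valid.
            byte[] prefix = new byte[limit];
            int filled = 0;
            int n;
            while (filled < limit
                    && (n = bis.read(prefix, filled, limit - filled)) != -1) {
                filled += n;
            }
            bis.reset();
            // ISO-8859-1 maps each byte to one char, enough to find ASCII markup.
            String head = new String(prefix, 0, filled, StandardCharsets.ISO_8859_1);
            if (!head.startsWith("<?xml")) {
                return DEFAULT_ENCODING; // no declaration at all
            }
            int end = head.indexOf("?>");
            if (end >= 0 || filled < limit) {
                // Declaration is complete, or the whole stream has been seen.
                Matcher m = ENCODING_ATTR.matcher(end >= 0 ? head.substring(0, end) : head);
                return m.find() ? m.group(1) : DEFAULT_ENCODING;
            }
            // Declaration runs past the mark region: retry with a doubled limit.
        }
        return DEFAULT_ENCODING;
    }

    static String streamToString(InputStream is) throws IOException {
        BufferedInputStream bis = new BufferedInputStream(is);
        String encoding = detectEncoding(bis);
        Reader reader = new InputStreamReader(bis, encoding);
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[4096];
        int len;
        while ((len = reader.read(buffer)) != -1) {
            sb.append(buffer, 0, len);
        }
        return sb.toString();
    }
}
```

Because the mark limit stays small, the BufferedInputStream never has to retain more than the declaration region, whatever the size of the document.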

You can use this method to convert an InputStream to a String. This might help you:
private String convertStreamToString(InputStream input) throws Exception{
BufferedReader reader = new BufferedReader(new InputStreamReader(input));
StringBuilder sb = new StringBuilder();
String line = null;
while ((line = reader.readLine()) != null) {
sb.append(line);
}
input.close();
return sb.toString();
}

Related

URLConnection doesn't read whole page

In my app I need to download some web page. I do it in a way like this
URL url = new URL(myUrl);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setReadTimeout(5000000);//5 seconds to download
conn.setConnectTimeout(5000000);//5 seconds to connect
conn.setRequestMethod("GET");
conn.setDoInput(true);
conn.connect();
int response = conn.getResponseCode();
is = conn.getInputStream();
String s = readIt(is);
System.out.println("got: " + s);
My readIt function is:
public String readIt(InputStream stream) throws IOException {
int len = 10000;
Reader reader;
reader = new InputStreamReader(stream, "UTF-8");
char[] buffer = new char[len];
reader.read(buffer);
return new String(buffer);
}
The problem is that it doesn't download the whole page. For example, if myUrl is "https://wikipedia.org", then the output is
How can I download the whole page?
Update
The second answer to "Read/convert an InputStream to a String" solved my problem. The problem is in the readIt function. You should read the response from the InputStream like this:
static String convertStreamToString(java.io.InputStream is) {
java.util.Scanner s = new java.util.Scanner(is).useDelimiter("\\A");
return s.hasNext() ? s.next() : "";
}

There are a number of mistakes in your code:
You are reading into a character buffer with a fixed size.
You are ignoring the result of the read(char[]) method. It returns the number of characters actually read ... and you need to use that.
You are assuming that read(char[]) will read all of the data. In fact, it is only guaranteed to return at least one character ... or -1 to indicate that you have reached the end of stream. When you read from a network connection, you are liable to get only the data that has already been sent by the other end and buffered locally.
When you create the String from the char[] you are assuming that every position in the character array contains a character from your stream.
There are multiple ways to do it correctly, and this is one way:
public String readIt(InputStream stream) throws IOException {
Reader reader = new InputStreamReader(stream, "UTF-8");
char[] buffer = new char[4096];
StringBuilder builder = new StringBuilder();
int len;
while ((len = reader.read(buffer)) > 0) {
builder.append(buffer, 0, len);
}
return builder.toString();
}
Another way to do it is to look for an existing 3rd-party library that provides something like a readFully(Reader) method.

You need to read in a loop till there are no more bytes left in the InputStream.
while (-1 != (len = in.read(buffer))) {
// do stuff here
}
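As a minimal sketch of what "do stuff here" might look like, the loop below collects every chunk into a ByteArrayOutputStream and decodes once at the end. The class name StreamDrain is made up for illustration, and UTF-8 is an assumption about the page's encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class StreamDrain {
    /** Reads the stream until end of stream and decodes the bytes as UTF-8. */
    static String readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int len;
        // read() may return fewer bytes than the buffer holds, so keep going
        // until it reports end of stream with -1.
        while (-1 != (len = in.read(buffer))) {
            out.write(buffer, 0, len);
        }
        // Decode once at the end so multi-byte characters that straddle
        // two chunks are not corrupted.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Decoding after the loop, rather than converting each chunk to a String, is a deliberate choice here: a chunk boundary can fall in the middle of a multi-byte character.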

You are reading at most 10000 characters from the input stream.
Use a BufferedReader to make your life easier.
public String readIt(InputStream stream) throws IOException {
BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
StringBuilder out = new StringBuilder();
String newLine = System.getProperty("line.separator");
String line;
while ((line = reader.readLine()) != null) {
out.append(line);
out.append(newLine);
}
return out.toString();
}

How to choose the buffer size when reading from a URL

Aim: To read a URL that contains information in JSON.
Question: I found the code below for reading a URL. I understand what the code is doing, but I have no idea why the size of the char array is 1024 and not 2048 or something else. How do I decide what array size is good when reading from a URL?
private static String readUrl(String urlString) throws Exception {
BufferedReader reader = null;
try {
URL url = new URL(urlString);
reader = new BufferedReader(new InputStreamReader(url.openStream()));
StringBuffer buffer = new StringBuffer();
int read;
char[] chars = new char[1024]; // why 1024?
while ((read = reader.read(chars)) != -1)
buffer.append(chars, 0, read);
return buffer.toString();
} finally {
if (reader != null)
reader.close();
}
}

As the BufferedReader already has an internal buffer (8192 characters by default in OpenJDK, though the size is implementation-dependent), and as the socket already has a considerably larger receive buffer, it really doesn't make much difference what value you choose. The returns on buffering diminish geometrically with size.

Eclipse warns about a potential resource leak although I have a finally block which closes the outermost stream, what am I missing?

Is there a reason Eclipse gives me the following resource leak warning: "Resource leak: 'br' is never closed"? The code I am talking about is at the bottom of this post.
I thought my finally block had it all covered, my reasoning:
res will only be null if the FileInputStream constructor threw and therefore nothing has to be closed
res will be the inputstream if the InputStreamReader constructor throws (malformed encoding string for example) and then only the InputStream must be closed so ok
etc...
So what am I missing? Or could this be an eclipse bug?
Kind regards!
S.
public static String fileToString(String fileName, String encoding) throws IOException {
InputStream is;
InputStreamReader isr;
BufferedReader br;
Closeable res = null;
try {
is = new FileInputStream(fileName);
res = is;
isr = new InputStreamReader(is, encoding);
res = isr;
br = new BufferedReader(isr);
res = br;
StringBuilder builder = new StringBuilder();
String line = null;
while ((line = br.readLine()) != null) {
builder.append(line);
builder.append(LS);
}
return builder.toString();
} finally {
if (res != null) {
res.close();
}
}
}

Eclipse probably just isn't understanding the shuffling you're doing with the res variable.
I recommend using the try-with-resources statement (available in Java 7 and up, so three and a half years now); it dramatically simplifies these sorts of chains:
public static String fileToString(String fileName, String encoding) throws IOException {
try (
InputStream is = new FileInputStream(fileName);
InputStreamReader isr = new InputStreamReader(is, encoding);
BufferedReader br = new BufferedReader(isr)
) {
StringBuilder builder = new StringBuilder();
String line = null;
while ((line = br.readLine()) != null) {
builder.append(line);
builder.append(LS);
}
return builder.toString();
}
}
If you can't use try-with-resources, you probably want something like the Apache Commons IOUtils class's closeQuietly methods (either literally that one, or your own) rather than shuffling res around, which is awkward to read and I daresay prone to maintenance issues.
Using IOUtils might look like this:
public static String fileToString(String fileName, String encoding) throws IOException {
InputStream is = null;
InputStreamReader isr = null;
BufferedReader br = null;
try {
is = new FileInputStream(fileName);
isr = new InputStreamReader(is, encoding);
br = new BufferedReader(isr);
StringBuilder builder = new StringBuilder();
String line = null;
while ((line = br.readLine()) != null) {
builder.append(line);
builder.append(LS);
}
br.close();
return builder.toString();
}
finally {
IOUtils.closeQuietly(br, isr, is);
}
}
Note how I use a normal close in the try, but then ensure cleanup in the finally.
But try-with-resources is the better answer, as it's more concise and hooks into the new(ish) "suppressed exceptions" stuff.
Side note: There's no reason for the = null initialization of line, you assign it on the next line.
Side note 2: If the file is likely to be of any size, consider finding out how big it is in advance and setting the capacity of the StringBuilder in the constructor. StringBuilder's default capacity is 16, so a file of even a few hundred bytes involves several reallocations of StringBuilder's internal buffer.
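Side note 2 might look something like the sketch below. The class name PresizedFileReader is hypothetical, and using the file's byte length as the initial capacity is an assumption that holds as an upper bound for common encodings such as UTF-8 and ISO-8859-1, where a character never takes less than one byte. The sketch copies characters in chunks instead of line by line so the content is preserved exactly.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

final class PresizedFileReader {
    static String fileToString(Path file, Charset charset) throws IOException {
        // The byte count is used as a capacity hint; the builder still grows
        // on its own if the hint turns out to be too small.
        long size = Files.size(file);
        int capacity = (int) Math.min(size, Integer.MAX_VALUE - 8);
        StringBuilder builder = new StringBuilder(capacity);
        try (BufferedReader br = Files.newBufferedReader(file, charset)) {
            char[] buffer = new char[4096];
            int len;
            while ((len = br.read(buffer)) != -1) {
                builder.append(buffer, 0, len);
            }
        }
        return builder.toString();
    }
}
```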

Read String and bytes from the same file java

I'm looking for a way to switch between reading bytes (as byte[]) and reading lines of Strings from a file. I know that a byte[] can be obtained from a file through a FileInputStream, and a String can be obtained through a BufferedReader, but using both of them at the same time is proving problematic. I know how long each section of bytes is. String encoding can be kept constant from when I write the file. The filetype is a custom one that is still in development, so I can change how I write data to it.
How can I read Strings and byte[]s from the same file in java?

Read as bytes. When you have read a sequence of bytes that you know should be a string, place those bytes in an array, put the array inside a ByteArrayInputStream and use that as the underlying InputStream for a Reader to get the bytes as characters, then read those characters to produce a String.
For the later parts of this process see the related SO question on how to create a String from an InputStream.

Read the file as Strings using a BufferedReader then use String.getBytes().

Why not try this:
BufferedReader bufferedReader = null;
try {
bufferedReader = new BufferedReader(new FileReader("testing.txt"));
String line = bufferedReader.readLine();
while(line != null){
byte[] b = line.getBytes();
line = bufferedReader.readLine(); // advance, otherwise the loop never ends
}
} finally {
if(bufferedReader!=null){
bufferedReader.close();
}
}
or
FileInputStream in = null;
BufferedReader bufferedReader = null;
try {
bufferedReader = new BufferedReader(new FileReader("xanadu.txt"));
String line = bufferedReader.readLine();
while(line != null){
//read your line
line = bufferedReader.readLine(); // advance to the next line
}
in = new FileInputStream("xanadu.txt");
int c;
while ((c = in.read()) != -1) {
//read your bytes (c)
}
} finally {
if (in != null) {
in.close();
}
if(bufferedReader!=null){
bufferedReader.close();
}
}

Read everything as bytes from the buffered input stream, and convert the string sections into Strings using the constructor that accepts a byte array:
String string = new String(bytes, offset, length, "US-ASCII");
Depending on how the data are actually encoded, you may need to use "UTF-8" or something else as the name of the charset.
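A small sketch of this idea, assuming a hypothetical file layout in which every section (binary or text) is written as a 4-byte length followed by that many bytes. The layout and the class name MixedRecordReader are made up for illustration; the question does not describe the actual format.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.Charset;

final class MixedRecordReader {
    /** Reads one length-prefixed section as raw bytes. */
    static byte[] readBlock(DataInputStream in) throws IOException {
        int length = in.readInt();      // assumption: 4-byte big-endian length prefix
        byte[] block = new byte[length];
        in.readFully(block);            // blocks until the whole section is read
        return block;
    }

    /** Reads one length-prefixed section and decodes it as text. */
    static String readText(DataInputStream in, Charset charset) throws IOException {
        byte[] raw = readBlock(in);
        return new String(raw, 0, raw.length, charset);
    }
}
```

Since only one stream ever touches the file, there is no second reader buffering ahead and swallowing bytes that belong to a binary section.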

Splitting strings by newline trouble

I am reading in a file that is being sent through a socket and then trying to split it on newlines (\n). When I read in the file I am using a byte[], and I convert the byte array to a String so that I can split it.
public String getUserFileData()
{
try
{
byte[] mybytearray = new byte[1024];
InputStream is = clientSocket.getInputStream();
int bytesRead = is.read(mybytearray, 0, mybytearray.length);
is.close();
return new String(mybytearray);
}
catch(IOException e)
{
}
return "";
}
Here is the code used to attempt to split the String
public void readUserFile(String userData, Log logger)
{
String[] data;
String companyName;
data = userData.split("\n");
username = data[0];
password = data[1].toCharArray();
companyName = data[2];
quota = Float.parseFloat(data[3]);
company = new Company();
company.readCompanyFile("C:\\Users\\Chris\\Documents\\NetBeansProjects\\ArFile\\ArFile Clients\\" + companyName + "\\"
+ companyName + ".cmp");
cloudFiles = new CloudFiles();
cloudFiles.readCloudFiles(this, logger);
}
It causes this error
Exception in thread "AWT-EventQueue-1" java.lang.ArrayIndexOutOfBoundsException

You can use the readLine method in BufferedReader class.
Wrap the InputStream in an InputStreamReader, and wrap that in a BufferedReader:
InputStream is = clientSocket.getInputStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(is));
Please also check the encoding of the stream - you might need to specify the encoding in the constructor of InputStreamReader.

As stated in comments, using a BufferedReader would be best - you should be using an InputStreamReader anyway in order to convert from binary to text.
// Or use a different encoding - whatever's appropriate
BufferedReader reader = new BufferedReader(
new InputStreamReader(clientSocket.getInputStream(), "UTF-8"));
try {
String line;
// I'm assuming you want to read every incoming line
while ((line = reader.readLine()) != null) {
processLine(line);
}
} finally {
reader.close();
}
Note that it's important to state which encoding you want to use - otherwise it'll use the platform's default encoding, which will vary from machine to machine, whereas presumably the data is in one specific encoding. If you don't know which encoding that is yet, you need to find out. Until then, you simply can't reliably understand the data.
(I hope your real code doesn't have an empty catch block, by the way.)
