Java: problem with reading a file - java

I'm loading an XML file with this method:
public static String readTextFile(String fullPathFilename) throws IOException {
StringBuffer sb = new StringBuffer(1024);
BufferedReader reader = new BufferedReader(new FileReader(fullPathFilename));
char[] chars = new char[1024];
while(reader.read(chars) > -1){
sb.append(String.valueOf(chars));
}
reader.close();
return sb.toString();
}
But it doesn't load the whole data. Instead of 25634 characters, it loads 10 less (25624). Why is that?
Thanks,
Ivan

With BufferedReader you get the readLine() method, which works well for me.
StringBuffer sb = new StringBuffer( 1024 );
BufferedReader reader = new BufferedReader( new FileReader( fullPathFilename ) );
while( true ) {
String line = reader.readLine();
if(line == null) {
break;
}
sb.append( line );
}
reader.close();
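One caveat with this approach: readLine() strips the line terminator from each line, so a string built this way is shorter than the file unless you add the terminators back. A minimal sketch to illustrate it (the class name and sample text are made up for the example):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReadLineDemo {
    // Concatenates lines the same way as the snippet above, without re-adding terminators.
    public static String joinLines(String text) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String xml = "<a>\r\n<b/>\r\n</a>";
        System.out.println(xml.length());            // 15
        System.out.println(joinLines(xml).length()); // 11, the two CRLF pairs are gone
    }
}
```

For XML this usually does not matter, but it does matter if you compare character counts against the file size.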

I think there's a bug in your code: the last read might not fill the whole char[], but you still append all of it to the string, including stale characters left over from the previous read. To account for this you need to do something like:
StringBuilder res = new StringBuilder();
InputStreamReader r = new InputStreamReader(new BufferedInputStream(is));
char[] c = new char[1024];
while(true) {
int charCount = r.read(c);
if (charCount == -1) {
break;
}
res.append(c, 0, charCount);
}
r.close();
Also, how do you know you're expecting 25634 chars?
(Also, use StringBuilder instead of StringBuffer; StringBuilder is not thread-safe, so it is slightly faster.)

Perhaps your file contains 25634 bytes that represent only 25624 characters? This can happen with multibyte encodings like UTF-8. Every InputStreamReader (including FileReader) automatically performs this conversion using a Charset (either an explicitly given one, or the platform-dependent default encoding).
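To make the byte/character difference concrete, here is a small sketch. The class name and the sample string are invented for illustration; only standard library calls are used.

```java
import java.nio.charset.StandardCharsets;

public class ByteVsCharCount {
    // Number of bytes the text occupies when encoded as UTF-8.
    public static int utf8ByteCount(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Sample text with two non-ASCII letters (i with diaeresis, e with acute).
        String text = "na\u00efve caf\u00e9";
        System.out.println("chars: " + text.length());       // chars: 10
        System.out.println("bytes: " + utf8ByteCount(text));  // bytes: 12
    }
}
```

Each of the two accented letters takes two bytes in UTF-8, so the file is two bytes longer than the decoded string. A gap of 10 between file size and character count would fit a file with a handful of such characters.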

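Use FileInputStream to read the raw bytes, so that no byte sequences get decoded as multibyte UTF-8 characters:
// requires: import java.nio.charset.StandardCharsets;
StringBuilder sb = new StringBuilder(1024);
FileInputStream fis = new FileInputStream(filename);
byte[] bytes = new byte[1024];
int count;
while ((count = fis.read(bytes)) > -1) {
    // ISO-8859-1 maps every byte to exactly one char; append only the bytes actually read
    sb.append(new String(bytes, 0, count, StandardCharsets.ISO_8859_1));
}
fis.close();
return sb.toString();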

Related

URLConnection doesn't read whole page

In my app I need to download some web page. I do it in a way like this
URL url = new URL(myUrl);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setReadTimeout(5000); // 5 seconds to download (the value is in milliseconds)
conn.setConnectTimeout(5000); // 5 seconds to connect
conn.setRequestMethod("GET");
conn.setDoInput(true);
conn.connect();
int response = conn.getResponseCode();
is = conn.getInputStream();
String s = readIt(is);
System.out.println("got: " + s);
My readIt function is:
public String readIt(InputStream stream) throws IOException {
int len = 10000;
Reader reader;
reader = new InputStreamReader(stream, "UTF-8");
char[] buffer = new char[len];
reader.read(buffer);
return new String(buffer);
}
The problem is that it doesn't download the whole page. For example, if myUrl is "https://wikipedia.org", then the output is incomplete.
How can I download the whole page?
Update
The second answer to "Read/convert an InputStream to a String" solved my problem. The problem was in the readIt function. You should read the response from the InputStream like this:
static String convertStreamToString(java.io.InputStream is) {
java.util.Scanner s = new java.util.Scanner(is).useDelimiter("\\A");
return s.hasNext() ? s.next() : "";
}
There are a number of mistakes in your code:
You are reading into a character buffer with a fixed size.
You are ignoring the result of the read(char[]) method. It returns the number of characters actually read, and you need to use that.
You are assuming that read(char[]) will read all of the data. In fact, it is only guaranteed to return at least one character, or -1 to indicate that you have reached the end of the stream. When you read from a network connection, you are liable to get only the data that has already been sent by the other end and buffered locally.
When you create the String from the char[], you are assuming that every position in the character array contains a character from your stream.
There are multiple ways to do it correctly, and this is one way:
public String readIt(InputStream stream) throws IOException {
Reader reader = new InputStreamReader(stream, "UTF-8");
char[] buffer = new char[4096];
StringBuilder builder = new StringBuilder();
int len;
while ((len = reader.read(buffer)) > 0) {
builder.append(buffer, 0, len);
}
return builder.toString();
}
Another way to do it is to use an existing third-party library that provides a readFully(Reader)-style method.
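If a third-party library is not an option, the JDK itself has offered InputStream.readAllBytes() since Java 9. A minimal sketch, assuming Java 9+ and a UTF-8 encoded response (the class name is made up for the example):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ReadAllBytesExample {
    // Reads the stream to its end, then decodes all bytes as UTF-8 in one step.
    public static String readIt(InputStream stream) throws IOException {
        return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for conn.getInputStream(); any InputStream works here.
        byte[] page = "<html>caf\u00e9</html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readIt(new ByteArrayInputStream(page)));
    }
}
```

Decoding after all bytes have arrived also avoids splitting a multibyte character across two reads. The whole response is held in memory, though, so this only suits pages of moderate size.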
You need to read in a loop till there are no more bytes left in the InputStream.
while (-1 != (len = in.read(buffer))) { /* do stuff here */ }
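To see why the loop matters, here is a self-contained sketch. TrickleStream is a hypothetical stand-in for a network stream that hands out at most a fixed number of bytes per call; it is not a JDK class, and the sizes are arbitrary.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadLoopDemo {
    /** Hypothetical slow stream: returns at most 'chunk' bytes per read call. */
    static class TrickleStream extends FilterInputStream {
        private final int chunk;

        TrickleStream(InputStream in, int chunk) {
            super(in);
            this.chunk = chunk;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, chunk));
        }
    }

    public static InputStream trickle(byte[] data, int chunk) {
        return new TrickleStream(new ByteArrayInputStream(data), chunk);
    }

    // Keeps calling read() until it reports end of stream (-1).
    public static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int len;
        while (-1 != (len = in.read(buffer))) {
            out.write(buffer, 0, len); // copy only the bytes actually read
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10_000];
        int single = trickle(data, 1500).read(new byte[4096]);
        System.out.println("single read: " + single);                           // single read: 1500
        System.out.println("loop: " + readFully(trickle(data, 1500)).length);   // loop: 10000
    }
}
```

A single read() returns whatever is available at that moment, which is what happened with the truncated page; the loop collects everything up to the end of the stream.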
You are reading at most 10000 characters from the input stream, and only with a single read() call.
Use a BufferedReader to make your life easier.
public String readIt(InputStream stream) throws IOException {
BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
StringBuilder out = new StringBuilder();
String newLine = System.getProperty("line.separator");
String line;
while ((line = reader.readLine()) != null) {
out.append(line);
out.append(newLine);
}
return out.toString();
}

BufferedReader.readLine() returning null value

I am creating this method which takes an InputStream as parameter, but the readLine() function is returning null. While debugging, inputstream is not empty.
else if (requestedMessage instanceof BytesMessage) {
BytesMessage bytesMessage = (BytesMessage) requestedMessage;
byte[] sourceBytes = new byte[(int) bytesMessage.getBodyLength()];
bytesMessage.readBytes(sourceBytes);
String strFileContent = new String(sourceBytes);
ByteArrayInputStream byteInputStream = new ByteArrayInputStream(sourceBytes);
InputStream inputStrm = (InputStream) byteInputStream;
processMessage(inputStrm, requestedMessage);
}
public void processMessage(InputStream inputStrm, javax.jms.Message requestedMessage) {
String externalmessage = tradeEntryTrsMessageHandler.convertInputStringToString(inputStrm);
}
public String convertInputStringToString(InputStream inputStream) throws IOException {
BufferedReader br = new BufferedReader(new InputStreamReader(inputStream));
StringBuilder sb = new StringBuilder();
String line;
while ((line = br.readLine()) != null) {
sb.append(line);
}
br.close();
return sb.toString();
}
Kindly try this:
BufferedReader br = new BufferedReader(new InputStreamReader(inputStream, "UTF-8"));
I believe that the raw data, as received, does not declare a character set, so explicitly specifying UTF-8 (the "U" stands for Universal Character Set, "TF-8" for Transformation Format, 8-bit) might help.
Are you sure you are initializing and passing a valid InputStream to the function?
Also, just FYI maybe you were trying to name your function convertInputStreamToString instead of convertInputStringToString?
Here are two other ways of converting your InputStream to String, try these maybe?
1.
String theString = IOUtils.toString(inputStream, encoding);
2.
public String convertInputStringToString(InputStream is) {
java.util.Scanner s = new java.util.Scanner(is, encoding).useDelimiter("\\A");
return s.hasNext() ? s.next() : "";
}
EDIT:
You needn't explicitly convert ByteArrayInputStream to InputStream. You could do directly:
InputStream inputStrm = new ByteArrayInputStream(sourceBytes);

How to avoid encoding null (\u0000) when reading from InputStream

I am reading a request body from an input stream. Although I explicitly specify the charset when decoding, I still get many \u0000 (null) characters at the end of the string.
InputStream is = exchange.getRequestBody();
byte[] header = new byte[100];
is.read(header);
String s = new String(header, "UTF-8");
How can I avoid this with Standard Java Library? I cannot use third party libraries.
is.read(header) returns the number of bytes that were actually read. Change your code to:
byte[] header = new byte[100];
int n = is.read(header);
String s = new String(header, 0, n, "UTF-8");
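Note that read() can also return -1 (end of stream) or fewer bytes than are still on their way. A slightly more defensive sketch, using only the standard library; the class and method names are made up for the example:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class HeaderReader {
    // Reads up to maxBytes from the stream and decodes only the bytes actually read.
    public static String readHeader(InputStream is, int maxBytes) throws IOException {
        byte[] header = new byte[maxBytes];
        int total = 0;
        while (total < maxBytes) {
            int n = is.read(header, total, maxBytes - total);
            if (n == -1) {
                break; // end of stream before the buffer filled up
            }
            total += n;
        }
        return new String(header, 0, total, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for exchange.getRequestBody().
        InputStream is = new ByteArrayInputStream("short body".getBytes(StandardCharsets.UTF_8));
        String s = readHeader(is, 100);
        System.out.println(s.length()); // 10, no trailing NUL padding
    }
}
```

One caveat: cutting the input at a fixed byte count can split a multibyte UTF-8 character at the boundary, so this is only safe if the first maxBytes bytes are known to end on a character boundary (plain ASCII headers, for example).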
Try using BufferedReader with InputStreamReader. I.e.
InputStreamReader isr = new InputStreamReader(is, "UTF-8");
BufferedReader br = new BufferedReader(isr);
String line = br.readLine();
Use a buffered reader to read it line by line:
InputStream is = exchange.getRequestBody();
BufferedReader reader = new BufferedReader(new InputStreamReader(is));
String line;
while((line = reader.readLine()) != null)
{
// do work
}
If you look at the value of line.toCharArray(), you will see that the null (aka '\u0000') chars show up. A round-about solution to getting rid of those is to add only the chars that you want to another array by using an if statement to exclude the '\u0000' chars. So:
public String removeNullChars(String line) {
String result = "";
// Convert to char array.
char[] chars = line.toCharArray();
// Add only chars that are not equal to '\u0000' to result
for (int i = 0; i < chars.length; i++) {
if (chars[i] != '\u0000') {
result += chars[i];
}
}
return result;
}
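If you do go the route of stripping the characters afterwards, String.replace does the same job in one call. A minimal sketch (the class name is made up for the example):

```java
public class NullCharStripper {
    // Removes every NUL character. String.replace matches literal text, not a regex.
    public static String removeNullChars(String line) {
        return line.replace("\0", "");
    }

    public static void main(String[] args) {
        String padded = "abc\0\0\0";
        System.out.println(removeNullChars(padded).length()); // 3
    }
}
```

This only treats the symptom, though. Decoding just the bytes that were actually read avoids creating the padding in the first place.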

String to byte conversion in Java

I'm new to Java. I'm trying to read a text file using a FileInputStream. I read the text line by line into a String, and now I want to convert the String into bytes, but I'm getting a NumberFormatException. Please help me solve this problem.
FileInputStream fstream = new FileInputStream("C:/Users/data.txt");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
byte[] bytes = null;
String str;
int i=0;
while ((str = br.readLine()) != null)
{
bytes[i] = Byte.parseByte(str,16);
i++;
}
in.close();
Try
byte[] bytes = str.getBytes();
instead of
bytes[i] = Byte.parseByte(str,16);
Also I recommend to specify encoding for InputStreamReader:
BufferedReader br = new BufferedReader(new InputStreamReader(in, "UTF-8"));
Keep in mind that a Java String's length and internal representation are not the same as those of a C string.
You can simply use the getBytes() method from the String class :
str.getBytes()
Or if you don't use the default character set :
str.getBytes(myCharSet);
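For reference, a short sketch of both points. Byte.parseByte only accepts text that is a number in the given radix and within the byte range -128..127, so ordinary text, or even a hex value like "FF", raises a NumberFormatException. getBytes with an explicit charset encodes the characters themselves. The class name and sample strings are made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StringToBytes {
    // Encodes the characters of the string as UTF-8 bytes.
    public static byte[] toUtf8(String str) {
        return str.getBytes(StandardCharsets.UTF_8);
    }

    // Returns true if Byte.parseByte accepts the text as a hexadecimal byte value.
    public static boolean parsesAsHexByte(String str) {
        try {
            Byte.parseByte(str, 16);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toUtf8("Hi!"))); // [72, 105, 33]
        System.out.println(parsesAsHexByte("7F"));           // true
        System.out.println(parsesAsHexByte("FF"));           // false, 255 exceeds Byte.MAX_VALUE
        System.out.println(parsesAsHexByte("hello"));        // false, not a hex number
    }
}
```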
You can use str.getBytes(), which will convert the string into a byte array.
You can try this code:
FileInputStream fstream = new FileInputStream("C:/Users/s.hussain/Desktop/test3.txt");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
byte[] bytes = null;
String str;
int i=0;
while ((str = br.readLine()) != null)
{
bytes = str.getBytes();
i++;
System.out.println(bytes.length);
}
in.close();

Convert InputStream to String with encoding given in stream data

My input is an InputStream which contains an XML document. The encoding used in the XML is unknown; it is defined in the first line of the XML document.
From this InputStream, I want to get the whole document as a String.
To do this, I use a BufferedInputStream to mark the beginning of the file and start reading the first line. I read this first line to get the encoding, and then I use an InputStreamReader to generate a String with the correct encoding.
This does not seem to be the best way to achieve the goal, because it produces an OutOfMemoryError.
Any idea how to do it?
public static String streamToString(final InputStream is) {
String result = null;
if (is != null) {
BufferedInputStream bis = new BufferedInputStream(is);
bis.mark(Integer.MAX_VALUE);
final StringBuilder stringBuilder = new StringBuilder();
try {
// stream reader that handle encoding
final InputStreamReader readerForEncoding = new InputStreamReader(bis, "UTF-8");
final BufferedReader bufferedReaderForEncoding = new BufferedReader(readerForEncoding);
String encoding = extractEncodingFromStream(bufferedReaderForEncoding);
if (encoding == null) {
encoding = DEFAULT_ENCODING;
}
// stream reader that handle encoding
bis.reset();
final InputStreamReader readerForContent = new InputStreamReader(bis, encoding);
final BufferedReader bufferedReaderForContent = new BufferedReader(readerForContent);
String line = bufferedReaderForContent.readLine();
while (line != null) {
stringBuilder.append(line);
line = bufferedReaderForContent.readLine();
}
bufferedReaderForContent.close();
bufferedReaderForEncoding.close();
} catch (IOException e) {
// reset string builder
stringBuilder.delete(0, stringBuilder.length());
}
result = stringBuilder.toString();
}else {
result = null;
}
return result;
}
The call to mark(Integer.MAX_VALUE) is causing the OutOfMemoryError, since it's trying to allocate 2GB of memory.
You can solve this by using an iterative approach. Set the mark readLimit to a reasonable value, say 8K. In 99% of cases this will work, but in pathological cases, e.g. 16K spaces between the attributes in the declaration, you will need to try again. Thus, have a loop that tries to find the encoding; if it doesn't find it within the given mark region, it tries again, doubling the requested mark readLimit size.
To be sure you don't advance the input stream past the mark limit, you should read the InputStream yourself, up to the mark limit, into a byte array. You then wrap the byte array in a ByteArrayInputStream and pass that to the constructor of the InputStreamReader assigned to 'readerForEncoding'.
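A rough sketch of that loop, under a few assumptions that are mine rather than the question's: the declaration is written in an ASCII-compatible encoding (so not UTF-16), there is no byte order mark, and UTF-8 is the fallback. Instead of wrapping the bytes in a ByteArrayInputStream it decodes them as ISO-8859-1 and looks for the encoding attribute with a regular expression, which is a simplification. The class name and the pattern are invented for the example.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlStreamDecoder {
    private static final String DEFAULT_ENCODING = "UTF-8"; // assumed fallback
    private static final Pattern ENCODING_ATTR =
            Pattern.compile("encoding\\s*=\\s*[\"']([A-Za-z0-9._-]+)[\"']");

    public static String streamToString(InputStream is) throws IOException {
        BufferedInputStream bis = new BufferedInputStream(is);
        String encoding = DEFAULT_ENCODING;
        int limit = 8 * 1024;
        while (true) {
            bis.mark(limit);
            byte[] head = new byte[limit];
            int n = 0;
            while (n < limit) { // never read past the mark limit
                int r = bis.read(head, n, limit - n);
                if (r == -1) break;
                n += r;
            }
            bis.reset(); // rewind so the content pass starts at byte 0
            // ISO-8859-1 maps bytes 1:1 to chars, enough to inspect an ASCII declaration.
            String text = new String(head, 0, n, StandardCharsets.ISO_8859_1);
            if (!text.startsWith("<?xml")) break; // no declaration: keep the default
            int end = text.indexOf("?>");
            if (end >= 0) {
                Matcher m = ENCODING_ATTR.matcher(text.substring(0, end));
                if (m.find()) encoding = m.group(1);
                break;
            }
            if (n < limit) break; // stream ended inside the declaration
            limit *= 2;           // declaration longer than the window: retry
        }
        Reader reader = new InputStreamReader(bis, encoding);
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int len;
        while ((len = reader.read(buf)) != -1) {
            sb.append(buf, 0, len);
        }
        return sb.toString();
    }
}
```

The memory used for the mark is bounded by the size of the declaration rather than by Integer.MAX_VALUE. A production version would probably also cap how often the window may double.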
You can use this method to convert an InputStream to a String. This might help you:
private String convertStreamToString(InputStream input) throws Exception{
BufferedReader reader = new BufferedReader(new InputStreamReader(input));
StringBuilder sb = new StringBuilder();
String line = null;
while ((line = reader.readLine()) != null) {
sb.append(line);
}
input.close();
return sb.toString();
}
