I made an InputStream object from a file and an InputStreamReader from that.
InputStream ips = new FileInputStream("c:\\data\\input.txt");
InputStreamReader isr = new InputStreamReader(ips);
I basically read data as bytes into a buffer, but when the time comes to read chars I 'switch mode' and read with the InputStreamReader:
byte[] bbuffer = new byte[20];
char[] cbuffer = new char[20];
while (ips.read(bbuffer, 0, 20) != -1) {
    doSomethingWithbBuffer(bbuffer);
    // check the 20th byte (index 19) and if it is 0 start reading as char
    if (bbuffer[19] == 0) {
        while (isr.read(cbuffer, 0, 20) != -1) {
            doSomethingWithcBuffer(cbuffer);
            // check the 20th char (index 19); if it's '#' return to reading as byte
            if (cbuffer[19] == '#') {
                break;
            }
        }
    }
}
Is this a safe way to read files that have mixed char and byte data?
No, this is not safe. The InputStreamReader may read "too much" data from the underlying stream (it uses internal buffers), corrupting your attempt to read from the underlying byte stream. You can use something like DataInputStream if you want to mix reading characters and bytes.
Alternatively, just read the data as bytes and use the correct character encoding to convert those bytes to characters/Strings.
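A minimal sketch of that byte-only approach, reusing the question's names (the UTF-8 encoding and the doSomethingWithChars handler are assumptions, and StandardCharsets is java.nio.charset.StandardCharsets):
byte[] bbuffer = new byte[20];
int n;
while ((n = ips.read(bbuffer, 0, 20)) != -1) {
    if (n > 0 && bbuffer[n - 1] == 0) {
        // decode the bytes already in hand instead of switching to a Reader
        String chars = new String(bbuffer, 0, n, StandardCharsets.UTF_8);
        doSomethingWithChars(chars); // hypothetical handler
    }
}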
My goal is to read n bytes from a Socket.
Is it better to directly read from the InputStream, or wrap it into a BufferedReader?
Throughout the net you find both approaches, but no one states which to use when.
Socket socket = ...;
InputStream is = socket.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
char[] buffer = new char[CONTENT_LENGTH];

// what is better?
is.read(buffer); // (won't compile as-is: InputStream.read() takes a byte[], not a char[])
br.read(buffer);
Since your goal is to "read n bytes", there is little point creating a character Reader from your input: doing so assumes the stream is character-based, and the nth byte might fall partway into a character.
Since JDK 11 there is a handy call for reading n bytes:
byte[] input = is.readNBytes(n);
If n is small and you repeat the above often, consider one of bis = new BufferedInputStream(is), in.transferTo(out), or len = read(byteArray), which may be more effective for longer streams.
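A hedged sketch of that combination (CONTENT_LENGTH and the socket come from the question; readNBytes needs JDK 11+):
InputStream in = new BufferedInputStream(socket.getInputStream());
byte[] payload = in.readNBytes(CONTENT_LENGTH); // blocks until n bytes are read or EOF
if (payload.length < CONTENT_LENGTH) {
    // the peer closed the connection before sending the full message
}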
When learning Java IO, I found that FileInputStream has an available() method, which can equal the file size when reading local files. So if you can directly learn the size of the file, is it still necessary to decorate the stream with a BufferedInputStream when you need to read the entire file?
like this:
FileInputStream fileInputStream = new FileInputStream("F:\\test.txt");
byte[] data = new byte[fileInputStream.available()];
if (fileInputStream.read(data) != -1) {
    System.out.println(new String(data));
}
or
BufferedReader bufferedReader = new BufferedReader(new FileReader("F:\\test.txt"));
StringBuilder stringBuilder = new StringBuilder();
for (String line; (line = bufferedReader.readLine()) != null; ) {
    stringBuilder.append(line);
}
System.out.println(stringBuilder.toString());
or
BufferedInputStream bufferedInputStream = new BufferedInputStream(new FileInputStream("F:\\test.txt"));
byte[] data = new byte[bufferedInputStream.available()];
if (bufferedInputStream.read(data) != -1) {
    System.out.println(new String(data));
}
What are the pros and cons of these methods? Which one is better?
Thanks.
You are wrong about the meaning of available(). It returns the number of bytes you can read without blocking. From the documentation:
Note that while some implementations of InputStream will return the total number of bytes in the stream, many will not. It is never correct to use the return value of this method to allocate a buffer intended to hold all data in this stream.
So, if you want to convert a stream to a byte array, you should use a corresponding library, such as Apache Commons IO's IOUtils:
byte[] out = IOUtils.toByteArray(stream);
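If you would rather avoid a third-party dependency, the JDK has offered an equivalent since Java 9 (it reads to the end of the stream, regardless of available()):
byte[] out = stream.readAllBytes();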
OK, I know a buffer is actually an array of bytes; however, I have never seen the following declaration (taken from here):
URLConnection con = new URL("http://maps...").openConnection();
InputStream is = con.getInputStream();
byte[] bytes = new byte[con.getContentLength()];
is.read(bytes);
Is this the right way to avoid using a BufferedInputStream object? Here we have an unbuffered stream reading into a byte[]? Shouldn't it be the other way around?
Thanks in advance.
No, it is not the right way. The read() method reads up to N bytes, where N is the length of your array. It can read fewer bytes (even 0) if no more bytes are currently available. The number of bytes actually read is returned by read(); when the end of the stream is reached, it returns -1.
Therefore the right way is to read bytes in a loop:
byte[] buf = new byte[MAX];
int n;
while ((n = stream.read(buf)) >= 0) {
    // deal with the first n bytes of buf
}
Or use Apache commons-io:
InputStream is = ...;
byte[] bytes = IOUtils.toByteArray(is);
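Another option worth knowing, assuming the reported content length is reliable: DataInputStream.readFully() does the loop for you, filling the whole array or throwing EOFException if the stream ends early. A sketch using the question's con and is:
DataInputStream dis = new DataInputStream(is);
byte[] bytes = new byte[con.getContentLength()];
dis.readFully(bytes); // loops internally until the entire array is filled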
I'm trying to read a (Japanese) file that is encoded as a UTF-16 file.
When I read it using an InputStreamReader with a charset of "UTF-16", the file is read correctly:
try {
    InputStreamReader read = new InputStreamReader(new FileInputStream("JapanTest.txt"), "UTF-16");
    BufferedReader in = new BufferedReader(read);
    String str;
    while ((str = in.readLine()) != null) {
        System.out.println(str);
    }
    in.close();
} catch (Exception e) {
    System.out.println(e);
}
However, when I use a FileChannel and read into a byte array, the Strings aren't always converted correctly:
File f = new File("JapanTest.txt");
FileInputStream fis = new FileInputStream(f);
FileChannel channel = fis.getChannel();
MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0L, channel.size());
buffer.position(0);
int get = Math.min(buffer.remaining(), 1024);
byte[] barray = new byte[1024];
buffer.get(barray, 0, get);
Charset charSet = Charset.forName("UTF-16");
// endOfLinePos is a calculated value and defines the number of bytes to read
String rowString = new String(barray, 0, endOfLinePos, charSet);
System.out.println(rowString);
The problem I've found is that I can only read characters correctly if the MappedByteBuffer is at position 0. If I increment the position of the MappedByteBuffer and then read a number of bytes into a byte array, which is then converted to a string using the charset UTF-16, the bytes are not converted correctly. I haven't faced this issue with files encoded in UTF-8, so is this only an issue with UTF-16?
More Details:
I need to be able to read any line from the file channel, so to do this I build a list of line ending byte positions and then use those positions to be able to get the bytes for any given line and then convert them to a string.
The code unit of UTF-16 is 2 bytes, not 1 byte as in UTF-8. The bit patterns and the single-byte code unit make UTF-8 self-synchronizing: a decoder can start reading correctly at any point, and if it lands on a continuation byte it can either backtrack or lose at most a single character.
With UTF-16 you must always work with pairs of bytes: you cannot start reading at an odd byte offset or stop reading at an odd byte offset. You also must know the endianness, and use either UTF-16LE or UTF-16BE when not reading at the start of the file, because there will be no BOM.
You can also encode the file as UTF-8.
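If you stay with UTF-16, a sketch of the constraints above (the offsets are hypothetical; the point is that both start and length are even and the byte order is named explicitly):
// Decode a slice that starts mid-file: no BOM will be present there,
// so the byte order must be stated (StandardCharsets is java.nio.charset).
Charset cs = StandardCharsets.UTF_16LE;
int start = 2 + 2 * 10; // hypothetical: 2-byte BOM plus ten 2-byte code units
int len = 2 * 20;       // hypothetical: twenty 2-byte code units
String row = new String(barray, start, len, cs);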
Possibly the InputStreamReader does some transformations that a plain new String(...) does not. As a workaround (and to verify this assumption) you could try wrapping the data read from the channel, like new InputStreamReader(new ByteArrayInputStream(barray)).
Edit: Forget that :) - Channels.newReader() would be the way to go.
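A sketch of that suggestion (file name taken from the question). The Reader keeps its decoder state across reads, so you never slice bytes at arbitrary positions yourself:
FileChannel ch = new FileInputStream("JapanTest.txt").getChannel();
BufferedReader in = new BufferedReader(Channels.newReader(ch, "UTF-16"));
String str;
while ((str = in.readLine()) != null) {
    System.out.println(str);
}
in.close();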
I have a file that contains some amount of plain text at the start, followed by binary content at the end. The size of the binary content is determined by one of the plain-text lines I read.
I was using a BufferedReader to read the individual lines, but it exposes no method for reading into a byte array. DataInputStream's readUTF doesn't read all the way to the end of the line, and its readLine method is deprecated.
Using the underlying FileInputStream to read returns empty byte arrays. Any suggestions on how to go about this?
private DOTDataInfo parseFile(InputStream stream) throws IOException {
    DOTDataInfo info = new DOTDataInfo();
    BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
    int binSize = 0;
    String line;
    while ((line = reader.readLine()) != null) {
        if (line.length() == 0)
            break;
        DOTProperty prop = parseProperty(line);
        info.getProperties().add(prop);
        if (prop.getName().equals("ContentSize"))
            binSize = Integer.parseInt(prop.getValue());
    }
    byte[] content = new byte[binSize];
    stream.read(content); // It's all empty now. If I use a DataInputStream instead, it's got the values from the file
    return info;
}
You could use RandomAccessFile. Use readLine() to read the plain text at the start (note the limitations of this, as described in the API), and then readByte() or readFully() to read the subsequent binary data.
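A minimal sketch of that approach (the file name and the "ContentSize=" property syntax are assumptions; note that readLine() decodes each byte directly to a char, per its API caveats):
RandomAccessFile raf = new RandomAccessFile("data.dat", "r"); // assumed file name
int binSize = 0;
String line;
while ((line = raf.readLine()) != null && line.length() > 0) {
    if (line.startsWith("ContentSize=")) // hypothetical property syntax
        binSize = Integer.parseInt(line.substring("ContentSize=".length()));
}
byte[] content = new byte[binSize];
raf.readFully(content); // the file pointer is already past the header
raf.close();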
"Using the underlying FileInputStream to read returns empty byte arrays."
That's because you have wrapped the stream in a BufferedReader, which has probably consumed all the bytes from the stream while filling up its buffer.
If you genuinely have a file (rather than something harder to seek in, e.g. a network stream) then I suggest something like this:
Open the file as a FileInputStream
Wrap it in an InputStreamReader and a BufferedReader
Read the text, so you can find out how much content there is
Close the BufferedReader (which will close the InputStreamReader which will close the FileInputStream)
Reopen the file
Skip to (total file length - binary content length)
Read the rest of the data as normal
If you want to avoid reopening the file, you could call mark() at the start, then reset() and skip() to get to the right place. Note, though, that FileInputStream itself doesn't support mark/reset, so you would have to insert a BufferedInputStream and mark that. (I was looking for an InputStream.seek() but I can't see one; InputStream really doesn't have it, although FileChannel.position() can seek on the underlying file.)
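A sketch of the reopen-and-skip steps from the list above (the file name and binaryContentLength are assumptions; the latter would be parsed from the text part):
File file = new File("data.dat"); // assumed name
long toSkip = file.length() - binaryContentLength;
InputStream in = new FileInputStream(file);
while (toSkip > 0) {
    long skipped = in.skip(toSkip);
    if (skipped <= 0) break; // skip() may skip fewer bytes than asked
    toSkip -= skipped;
}
// now read the binary content as normal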
You need to use an InputStream. Readers are for character data. Look into wrapping your input stream with a DataInputStream, like:
stream=new DataInputStream(new BufferedInputStream(new FileInputStream(...)));
The data input stream will give you many useful methods to read various types of data, and of course, the base InputStream methods for reading bytes.
(This is actually exactly what an HTTP server must do to read a request with content.)
readUTF doesn't read a line; it reads a string that was written in (modified) UTF-8 format. Refer to the JavaDoc.
Alas, DataInputStream's readLine() is deprecated, and it does not handle UTF. But this should help (it reads a line from a binary stream, without any lookahead):
public static String lineFrom(InputStream in) throws IOException {
    byte[] buf = new byte[128];
    int pos = 0;
    for (;;) {
        int ch = in.read();
        if (ch == '\n' || ch < 0) break; // stop at newline or end of stream
        buf[pos++] = (byte) ch;
        if (pos == buf.length) buf = Arrays.copyOf(buf, pos + 128); // grow the buffer
    }
    return new String(Arrays.copyOf(buf, pos), "UTF-8");
}
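Applied to the question's parseFile(), a hedged usage sketch (names reused from the question's code):
String line;
while ((line = lineFrom(stream)).length() > 0) {
    DOTProperty prop = parseProperty(line);
    info.getProperties().add(prop);
    if (prop.getName().equals("ContentSize"))
        binSize = Integer.parseInt(prop.getValue());
}
byte[] content = new byte[binSize];
new DataInputStream(stream).readFully(content); // nothing has buffered ahead of us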
The correct way is to use an InputStream of some form, probably a FileInputStream unless this becomes a performance barrier.
What do you mean by "Using the underlying FileInputStream to read returns empty byte arrays"? This seems very unlikely and is probably where your mistake is. Can you show us the example code you've tried?
You can read the text with BufferedReader. When you know where the binary starts you can close the file and open it with RandomAccessFile and read binary from any point in the file.
Or you can read the file as binary and convert the sections you identify as text, using new String(bytes, encoding).
I recommend using DataInputStream. You have the following options:
Read both text and binary content with DataInputStream
Open a BufferedReader, read text and close the stream. Then open a DataInputStream, skip bytes equal to the size of the text and read binary data.
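A sketch of the second option (file name, textByteLength, and binSize are assumptions; the skip count must be the header's size in bytes, not in chars, so readFully is used to consume exactly that many bytes):
DataInputStream dis = new DataInputStream(new FileInputStream("data.dat"));
dis.readFully(new byte[textByteLength]); // consume the plain-text header
byte[] content = new byte[binSize];
dis.readFully(content); // read the binary payload
dis.close();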