Read binary data from a socket

I'm trying to connect to a server and then send it an HTTP request (GET in this case). The idea is to request a file and then receive it from the server.
It should work with both text files and binary files (images, for example). I have no problem with text files, that works perfectly, but I'm having some trouble with binary files.
First, I declare a BufferedReader (for reading the header and text files) and a DataInputStream:
BufferedReader in_text = new BufferedReader(
new InputStreamReader(socket.getInputStream()));
DataInputStream in_binary = new DataInputStream(
new BufferedInputStream(socket.getInputStream()));
Then I read the header with in_text and determine whether it's a text file or a binary file. If it's a text file, I read it correctly into a StringBuilder. If it's a binary file, I declare a byte[filesize] and read the rest of the content from in_binary into it.
byte[] bindata = new byte[filesize];
in_binary.readFully(bindata);
And it doesn't work. I get an EOFException.
I thought that maybe in_binary was still at the start of the stream, since it hadn't read the header itself. So I captured the length of the header and skipped that many bytes in in_binary.
byte[] bindata = new byte[filesize];
in_binary.reset();
in_binary.skip(headersize);
in_binary.readFully(bindata);
I still get the same exception.
What could be happening?
Thanks!
P.S.: I know I could use URLConnection and all of that. That's not the problem.

BufferedReader buffers data (hence the name) - it will almost certainly have read more data from the socket than just the header. Therefore, when you try to read the actual data some has already been read from the socket. If you try reading just a few bytes you'll probably see that they aren't the first bytes of the actual response data.
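To see the read-ahead described above, here is a minimal sketch that uses a ByteArrayInputStream as a stand-in for socket.getInputStream() (an assumption to keep it self-contained; the method name bytesLeftAfterHeaderLine is hypothetical). After readLine() returns only the status line, the raw stream usually has little or nothing left, so a DataInputStream built on the same socket stream has lost those bytes. Exactly how much the reader pulls in is an implementation detail, not a guarantee.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class ReadAheadDemo {
    /** Returns how many bytes remain unread in the raw stream after reading ONE header line. */
    static int bytesLeftAfterHeaderLine(byte[] response) throws IOException {
        // Stand-in for socket.getInputStream(): a fixed, in-memory response.
        InputStream raw = new ByteArrayInputStream(response);
        BufferedReader in_text = new BufferedReader(
                new InputStreamReader(raw, StandardCharsets.ISO_8859_1));
        in_text.readLine();      // we only asked for the status line...
        return raw.available();  // ...but the reader has already pulled much more from raw
    }

    public static void main(String[] args) throws IOException {
        // Header followed by a 4-byte binary body (octal escapes \0..\3).
        byte[] response = "HTTP/1.0 200 OK\r\nContent-Length: 4\r\n\r\n\0\1\2\3"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("total bytes: " + response.length);
        System.out.println("left in raw stream: " + bytesLeftAfterHeaderLine(response));
    }
}
```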
If you know how to use URLConnection, I have to wonder what reason you have for not using it.

As soon as you use any subclass of Reader, you aren't reading binary. You are converting from bytes to characters, using the default encoding of the JVM. If you really want bytes of binary, you need to stick to streams, not readers. Creating both stacks at once is asking for trouble.
Use Apache Commons IO's IOUtils.toByteArray() to read the entire content into memory as a byte[], and then decide what to do with it. The exception is a gigantic amount of data: in that case, set up the buffered input stream, inspect the start of the data to decide what to do, and only construct the reader after you have pushed back the bytes you inspected.
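Following the advice to stick to streams, one way to fix the original code is a sketch like the following, which reads the header byte by byte from the same buffered stream that later supplies the body. The helper names (readHeader, contentLength, readBody) are hypothetical, and the sketch assumes the server sends a Content-Length header and does not use chunked transfer encoding.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class HeaderThenBody {
    /** Reads bytes up to and including the blank line (CR LF CR LF) that ends the header. */
    static String readHeader(InputStream in) throws IOException {
        ByteArrayOutputStream header = new ByteArrayOutputStream();
        int matched = 0; // how many bytes of "\r\n\r\n" have been seen in a row
        while (matched < 4) {
            int b = in.read();
            if (b == -1) throw new EOFException("stream ended inside the header");
            header.write(b);
            // Even positions expect '\r', odd positions expect '\n'.
            if ((matched % 2 == 0 && b == '\r') || (matched % 2 == 1 && b == '\n')) {
                matched++;
            } else {
                matched = (b == '\r') ? 1 : 0;
            }
        }
        // Header lines are ASCII; ISO-8859-1 maps each byte to exactly one char.
        return new String(header.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    /** Extracts Content-Length (assumes the server sends one). */
    static int contentLength(String header) {
        for (String line : header.split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                return Integer.parseInt(line.substring(colon + 1).trim());
            }
        }
        throw new IllegalArgumentException("no Content-Length header");
    }

    /** Reads header and body from ONE byte stream, so no reader can swallow body bytes. */
    static byte[] readBody(InputStream socketStream) throws IOException {
        DataInputStream in = new DataInputStream(new BufferedInputStream(socketStream));
        String header = readHeader(in);
        byte[] body = new byte[contentLength(header)];
        in.readFully(body); // the same buffered stream still holds anything it read ahead
        return body;
    }

    public static void main(String[] args) throws IOException {
        byte[] response = "HTTP/1.0 200 OK\r\nContent-Length: 4\r\n\r\n\0\1\2\3"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] body = readBody(new ByteArrayInputStream(response));
        System.out.println(body.length + " body bytes");
    }
}
```

Because only one stream object ever touches the socket, there is no second buffer that could consume body bytes. For a text response, the body can be decoded afterwards with new String(body, charset), using the charset from the Content-Type header.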

Related

How do an InputStream, InputStreamReader and BufferedReader work together in Java?

I am studying Android development (I'm a beginner in programming in general) and learning about HTTP networking and saw this code in the lesson:
private String readFromStream(InputStream inputStream) throws IOException {
StringBuilder output = new StringBuilder();
if (inputStream != null) {
InputStreamReader inputStreamReader = new InputStreamReader(inputStream, Charset.forName("UTF-8"));
BufferedReader reader = new BufferedReader(inputStreamReader);
String line = reader.readLine();
while (line != null) {
output.append(line);
line = reader.readLine();
}
}
return output.toString();
}
I don't understand exactly what InputStream, InputStreamReader and BufferedReader do. All of them have a read() method, and BufferedReader also has readLine(). Why can't I use only the InputStream, or only add the InputStreamReader? Why do I need to add the BufferedReader? I know it has to do with efficiency, but I don't understand how.
I've been researching and the documentation for the BufferedReader tries to explain this but I still don't get who is doing what:
In general, each read request made of a Reader causes a corresponding
read request to be made of the underlying character or byte stream. It
is therefore advisable to wrap a BufferedReader around any Reader
whose read() operations may be costly, such as FileReaders and
InputStreamReaders. For example,
BufferedReader in = new BufferedReader(new FileReader("foo.in"));
will buffer the input from the specified file. Without buffering, each
invocation of read() or readLine() could cause bytes to be read from
the file, converted into characters, and then returned, which can be
very inefficient.
So, I understand that the InputStream can only read one byte, the InputStreamReader a single character, and the BufferedReader a whole line and that it also does something about efficiency which is what I don't get. I would like to have a better understanding of who is doing what, so as to understand why I need all three of them and what the difference would be without one of them.
I've researched a lot here and elsewhere on the web and don't seem to find any explanation about this that I can understand, almost all tutorials just repeat the documentation info. Here are some related questions that maybe begin to explain this but don't go deeper and solve my confusion: Q1, Q2, Q3, Q4. I think it may have to do with this last question's explanation about system calls and returning. But I would like to understand what is meant by all this.
Could it be that the BufferedReader's readLine() calls the InputStreamReader's read() method which in turn calls the InputStream's read() method? And the InputStream returns bytes converted to int, returning a single byte at a time, the InputStreamReader reads enough of these to make a single character and converts it to int and returns a single character at a time, and the BufferedReader reads enough of these characters represented as integers to make up a whole line? And returns the whole line as a String, returning only once instead of several times? I don't know, I'm just trying to get how things work.
Thanks a lot in advance!

This link, Streams in Java concepts and usage, gives a very nice explanation:
Streams, Readers, Writers, BufferedReader, BufferedWriter: these are the terms you will deal with in Java. These are the classes Java provides for input and output, and it is well worth knowing how they are related and how they are used. This post explores streams in Java and the related classes in detail. So let us start:
Let us define each of these in high level then dig deeper.
Streams
Used to deal with byte level data
Reader/Writer
Used to deal with character-level data. It also supports various character encodings.
BufferedReader/BufferedWriter
To increase performance. Data to be read is buffered into memory for quick access.
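The three levels above can be seen side by side on the same input. A minimal sketch (the sample text and method names are illustrative, not from the linked post): the byte-level stream sees each UTF-8 byte, the Reader sees decoded characters, and the BufferedReader hands back a whole line.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class LayersDemo {
    // "h\u00e9llo" contains an e-acute, which UTF-8 encodes as two bytes.
    static final byte[] DATA = "h\u00e9llo\nworld".getBytes(StandardCharsets.UTF_8);

    /** Stream level: counts raw bytes. */
    static int countBytes() throws IOException {
        InputStream in = new ByteArrayInputStream(DATA);
        int n = 0;
        while (in.read() != -1) n++;
        return n;
    }

    /** Reader level: InputStreamReader decodes bytes into chars using the given charset. */
    static int countChars() throws IOException {
        Reader in = new InputStreamReader(new ByteArrayInputStream(DATA), StandardCharsets.UTF_8);
        int n = 0;
        while (in.read() != -1) n++;
        return n;
    }

    /** Buffered level: BufferedReader adds a buffer plus line splitting via readLine(). */
    static String firstLine() throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(DATA), StandardCharsets.UTF_8));
        return in.readLine();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countBytes() + " bytes, " + countChars() + " chars, first line: " + firstLine());
    }
}
```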
These are for input, but corresponding classes exist for output as well. For example, where an InputStream reads a stream of bytes, an OutputStream writes a stream of bytes.
InputStreams
Java provides many types of InputStream. Each connects to a distinct data source, such as a byte array or a File.
For example, FileInputStream connects to a file data source and can be used to read bytes from a File, while ByteArrayInputStream can be used to treat a byte array as an input stream.
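Because every source is exposed through the same InputStream API, code written against InputStream does not care where the bytes come from. A small sketch (the sum method and the temporary file are hypothetical, for illustration only):

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class SourcesDemo {
    /** Works with any InputStream, whatever data source is behind it. */
    static long sum(InputStream in) throws IOException {
        long total = 0;
        int b;
        while ((b = in.read()) != -1) {
            total += b; // read() returns an unsigned byte value 0..255, or -1 at end
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {10, 20, 30};
        Path tmp = Files.createTempFile("demo", ".bin"); // throwaway file for the demo
        Files.write(tmp, data);
        try (InputStream fromArray = new ByteArrayInputStream(data);
             InputStream fromFile = new FileInputStream(tmp.toFile())) {
            System.out.println(sum(fromArray) + " " + sum(fromFile));
        } finally {
            Files.delete(tmp); // streams are already closed when this runs
        }
    }
}
```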
OutputStream
This helps in writing bytes to a destination. For almost every InputStream there is a corresponding OutputStream, wherever it makes sense.
UPDATE
What is Buffered Stream?
Here I'm quoting from Buffered Streams in the Java documentation (with a technical explanation):
Buffered Streams
Most of the examples we've seen so far use unbuffered I/O. This means
each read or write request is handled directly by the underlying OS.
This can make a program much less efficient, since each such request
often triggers disk access, network activity, or some other operation
that is relatively expensive.
To reduce this kind of overhead, the Java platform implements buffered
I/O streams. Buffered input streams read data from a memory area known
as a buffer; the native input API is called only when the buffer is
empty. Similarly, buffered output streams write data to a buffer, and
the native output API is called only when the buffer is full.
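The effect described in the quote can be made visible by counting how many read calls reach the underlying source. In this sketch, a hypothetical CountingInputStream wraps a ByteArrayInputStream and stands in for the expensive native API (a ByteArrayInputStream makes no OS calls itself); the program reads the same data one byte at a time with and without a BufferedInputStream in between.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class BufferingDemo {
    /** Hypothetical wrapper that counts how many read calls reach the wrapped stream. */
    static class CountingInputStream extends FilterInputStream {
        int calls;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Reads `size` bytes one at a time and returns the number of calls made to the source. */
    static int callsToSource(int size, boolean buffered) throws IOException {
        CountingInputStream source = new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        InputStream in = buffered ? new BufferedInputStream(source) : source;
        while (in.read() != -1) {
            // consume byte by byte, as naive code often does
        }
        return source.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unbuffered: " + callsToSource(10_000, false) + " calls");
        System.out.println("buffered:   " + callsToSource(10_000, true) + " calls");
    }
}
```

Unbuffered, every read() goes to the source (one call per byte plus one for end of stream). Buffered, the source is asked only when the buffer runs empty, so a handful of calls cover all 10,000 bytes with the default buffer size.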
Technical documentation sometimes makes me want to pull my hair out, so here is a more human-friendly explanation from https://yfain.github.io/Java4Kids/:
In general, disk access is much slower than the processing performed
in memory; that’s why it’s not a good idea to access the disk a
thousand times to read a file of 1,000 bytes. To minimize the number
of times the disk is accessed, Java provides buffers, which serve as
reservoirs of data.
In reading File with FileInputStream then BufferedInputStream, the
class BufferedInputStream works as a middleman between FileInputStream
and the file itself. It reads a big chunk of bytes from a file into
memory (a buffer) in one shot, and the FileInputStream object then
reads single bytes from there, which are fast memory-to-memory
operations. BufferedOutputStream works similarly with the class
FileOutputStream.
The main idea here is to minimize disk access. Buffered streams are
not changing the type of the original streams — they just make reading
more efficient. A program performs stream chaining (or stream piping)
to connect streams, just as pipes are connected in plumbing.

InputStream, OutputStream, byte[], ByteBuffer are for binary data.
Reader, Writer, String, char are for text, internally Unicode, so that all scripts in the world may be combined (say Greek and Arabic).
InputStreamReader and OutputStreamWriter form a bridge between the two. If you have an InputStream and know that its bytes are actually text in some encoding (Charset), you can wrap the InputStream:
try (InputStreamReader reader =
new InputStreamReader(stream, StandardCharsets.UTF_8)) {
... read text ...
}
There is a constructor without Charset, but that is not portable, as it uses the default platform encoding.
On older Android versions StandardCharsets may not exist; use "UTF-8" instead.
The derived classes FileInputStream and BufferedReader add functionality to their parents, InputStream and Reader respectively.
A FileInputStream is for input from a File, and a BufferedReader uses a memory buffer, so the actual physical reading is not done character by character (which would be inefficient). With new BufferedReader(otherReader) you add buffering to your original reader.
Once all this is understood, the utility class Files offers methods such as newBufferedReader(Path, Charset), which make the code even more concise.
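For example, reading a text file line by line with Files.newBufferedReader might look like this sketch (the readLines helper and the temporary file are illustrative, and UTF-8 is assumed as the file's encoding). For small files, Files.readAllLines(Path, Charset) is shorter still.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FilesDemo {
    /** Files wires up the file stream, the UTF-8 decoder and the buffer in one call. */
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt"); // throwaway file for the demo
        try {
            Files.write(tmp, Arrays.asList("first", "second"), StandardCharsets.UTF_8);
            System.out.println(readLines(tmp));
        } finally {
            Files.delete(tmp);
        }
    }
}
```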

I have read lots of articles on this very topic. I hope this might help you in some way.
Basically, the BufferedReader maintains an internal buffer.
During a read operation, it reads data from the file in bulk and stores it in its internal buffer.
Each subsequent read operation is then served from that internal buffer.
This reduces the number of interactions between the program and the file or disk, which makes it more efficient.

Does Guava have an overload that aborts the stream if it is too large?

I have a servlet that clients will post XML or JSON data to.
Currently I am reading the posted content using Guava:
String string = CharStreams.toString( new InputStreamReader( inputStream, "UTF-8" ) );
I want to be able to abort the entire operation of reading the posted content if it is larger than n bytes.
Is there a way to do this using Guava, or do I have to implement my own function?

I don't see anything that aborts, but you can use ByteStreams#limit(InputStream, long) to set a maximum number of bytes to read. The InputStream returned will simply return -1 on any read(..) that goes over the limit.
If you really want abort behavior, you could write your own InputStream wrapper that throws an exception if you go above some number of bytes read.
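A sketch of such a wrapper, assuming an IOException is an acceptable way to signal "too large" (the class name SizeLimitedInputStream and the readLimited helper are hypothetical, not part of Guava or the JDK). It counts bytes as they pass through and throws once the total exceeds the limit, instead of quietly returning -1 as ByteStreams.limit does.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

/** Hypothetical wrapper: throws once more than maxBytes have been read through it. */
class SizeLimitedInputStream extends FilterInputStream {
    private final long maxBytes;
    private long count;

    SizeLimitedInputStream(InputStream in, long maxBytes) {
        super(in);
        this.maxBytes = maxBytes;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) tally(1);
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) tally(n);
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        tally(skipped); // skipped bytes still count toward the limit
        return skipped;
    }

    @Override
    public boolean markSupported() {
        return false; // mark/reset would re-read bytes and distort the count
    }

    private void tally(long n) throws IOException {
        count += n;
        if (count > maxBytes) throw new IOException("input exceeds " + maxBytes + " bytes");
    }

    /** Reads the whole limited stream as UTF-8 text using only the JDK. */
    static String readLimited(InputStream raw, long maxBytes) throws IOException {
        Reader reader = new InputStreamReader(new SizeLimitedInputStream(raw, maxBytes), StandardCharsets.UTF_8);
        StringWriter out = new StringWriter();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "{\"ok\":true}".getBytes(StandardCharsets.UTF_8);
        System.out.println(readLimited(new ByteArrayInputStream(body), 1024));
    }
}
```

With Guava, the same wrapper can be slotted into the original line: CharStreams.toString(new InputStreamReader(new SizeLimitedInputStream(inputStream, n), "UTF-8")).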

DataOutputStream.writeBytes adds zero-bytes

I have a small TCP server program and a corresponding client, and they communicate via ServerSocket and Socket classes and DataInputStream/DataOutputStream. And I have a problem with sending Strings to the server.
connection = new Socket("localhost", 2233);
outStream = new DataOutputStream(connection.getOutputStream());
outStream.writeBytes(fileName);
fileName is, at this point in time, a hard-coded String with the value "listener.jardesc". The server reads the string with the following code:
inStream = new DataInputStream(connection.getInputStream());
String fileName = inStream.readLine();
The string is received properly, but three zero-value bytes have been added to the end. Why is that and how can I stop it from happening? (I could, of course, trim the received string or somehow else stop this problem from mattering, but I'd rather prevent the problem completely)

I'm just going to throw this out there: you're using DataInputStream.readLine(), which is deprecated (and has been since JDK 1.1). The API docs state quite clearly that this method "does not properly convert bytes to characters". I would read the data as bytes or use a BufferedReader.
http://docs.oracle.com/javase/1.5.0/docs/api/java/io/DataInputStream.html#readLine%28%29

writeBytes() does not add extra bytes.
The code you've written is invalid, as you aren't writing a newline. Therefore it doesn't work, and blocks forever in readLine().
In trying to debug this, you appear to have read the bytes some other way, probably with read(); to have ignored the value returned by read(); and to have concluded that read() filled the buffer you provided when it didn't, leaving three bytes in their initial state, which is zero.
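A minimal sketch of the fix described above: the sender terminates the name with a newline, and the receiver reads it with BufferedReader.readLine() rather than the deprecated DataInputStream.readLine(). Byte-array streams stand in for the two ends of the socket, and the method names are hypothetical. ISO-8859-1 is used on the reading side because writeBytes() writes the low 8 bits of each char, which round-trips exactly for characters below U+0100.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class LineProtocol {
    /** Client side: end the name with a newline so the server's readLine() can return. */
    static void sendFileName(OutputStream socketOut, String fileName) throws IOException {
        DataOutputStream out = new DataOutputStream(socketOut);
        out.writeBytes(fileName + "\n"); // writeBytes keeps only the low 8 bits of each char
        out.flush();
    }

    /**
     * Server side: read one line. On a real socket, create this BufferedReader once per
     * connection and reuse it, because it reads ahead past the line it returns.
     */
    static String receiveFileName(InputStream socketIn) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(socketIn, StandardCharsets.ISO_8859_1));
        return in.readLine();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream(); // stand-in for the socket
        sendFileName(wire, "listener.jardesc");
        String received = receiveFileName(new ByteArrayInputStream(wire.toByteArray()));
        System.out.println("[" + received + "] length " + received.length());
    }
}
```

If you read raw bytes instead, use the count that read() returns rather than assuming the whole buffer was filled.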

Why people use BufferedReader to read post data

People sometimes use BufferedReader to read post data.
BufferedReader bReader;
String postData = null;
try {
bReader = request.getReader();
char[] buf = new char[1024];
int len;
StringBuilder sBuilder = new StringBuilder();
while ((len = bReader.read(buf)) != -1) {
sBuilder.append(buf, 0, len);
}
postData = sBuilder.toString();
} catch (IOException e) {
bReader = null;
}
When should I use this to get parameters? What about request.getParameter()?

As EJP notes, this approach is used when the request's POST data consists of something other than request parameters.
So ...
When should I use this to get parameters? What about request.getParameter()?
You use it when you are expecting the request POST body to be a document. But it may not be adequate, as explained below.
That code is not particularly efficient, and it could be problematic in other respects.
On the efficiency side, the code is using a BufferedReader AND reading into a large(-ish) character buffer before transferring into a StringBuilder.
Using a BufferedReader and a char[] is kind of pointless. If you are going to do block reads, it is (marginally) better to read from the underlying Reader.
Reading the entire POST data into a StringBuilder (without limiting its length) could leave you open to denial of service attacks aimed at triggering OOMEs. (You will get the same problem if the long requests are legitimate ...).
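A sketch that addresses the efficiency and size points above: it reads in blocks directly from the Reader it is given (in a servlet that would be request.getReader(), which is assumed here rather than shown) and refuses to grow past a caller-chosen limit. readBounded is a hypothetical helper, and the limit counts characters, not bytes, so it only approximates a Content-Length check.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class BoundedBody {
    /** Reads all characters, but throws once more than maxChars would be collected. */
    static String readBounded(Reader reader, int maxChars) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int len;
        while ((len = reader.read(buf)) != -1) {
            if (sb.length() + len > maxChars) {
                throw new IOException("request body exceeds " + maxChars + " characters");
            }
            sb.append(buf, 0, len);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // StringReader stands in for request.getReader() in this sketch.
        System.out.println(readBounded(new StringReader("<doc/>"), 100));
    }
}
```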
There are also larger issues:
Should you process the POST data as a character stream rather than trying to create a single String?
Is it correct to treat the POST data as characters at all? (See the Content-type header.)
Are you using the correct encoding scheme to decode the characters? (See the Content-type header, etcetera)
Should you be using the Content-Length header as a hint for sizing things and/or enforcing request size limits?
In short, the code that you are asking us about looks to be too simplistic to be a general solution to the problem of reading POST data.
If the POST data takes up a lot of memory, will it be sent in two or more parts?
Probably not. Indeed, unless you (the developer of the webapp) implement a scheme which allows the client side to send in smaller chunks, the client may have no choice but to send one big document in the POST data. Of course, depending on what the document is and how it needs to be processed, you may not need to assemble it all in memory. The other point is that you should not rely on the client "doing the right thing" in terms of what it sends you. Your server needs to defend itself at some point.

Sometimes POST data isn't name-value pairs. For example, sometimes it is an XML document.

I do not know much about BufferedReader, but you could try creating an ArrayList of Strings, adding each message to the list as a post is sent, and putting a limit on its size; otherwise you may end up with memory problems, depending on how much memory the server has.

Java file IO truncated while reading large files using BufferedInputStream

I have a function in which I am only given a BufferedInputStream and no other information about the file to be read. I unfortunately cannot alter the method definition as it is called by code I don't have access to. I've been using the code below to read the file and place its contents in a String:
public String[] doImport(BufferedInputStream stream) throws IOException, PersistenceException {
int bytesAvail = stream.available();
byte[] bytesRead = new byte[bytesAvail];
stream.read(bytesRead);
stream.close();
String fileContents = new String(bytesRead);
//more code here working with fileContents
}
My problem is that for large files (>2 GB), this code causes the program to either run extremely slowly or truncate the data, depending on the computer it runs on. Does anyone have a recommendation for dealing with large files in this situation?

You're assuming that available() returns the size of the file; it does not. It returns the number of bytes available to be read, and that may be any number less than or equal to the size of the file.
Unfortunately, there's no way to do what you want in one shot without some other source of information on the length of the file data (e.g., calling java.io.File.length()). Instead, you may have to accumulate data from multiple reads. One way is to use a ByteArrayOutputStream: read into a fixed, finite-size array, then write the data you read into the ByteArrayOutputStream. At the end, pull the byte array out. You'll need to use the three-argument forms of read() and write() and look at the return value of read() so you know exactly how many bytes were read into the buffer on each call.
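The ByteArrayOutputStream approach described above can be sketched as follows (readToEnd is a hypothetical name). It still holds the whole content in one array, and Java arrays are indexed by int, so it cannot go much beyond about 2 GB; for files that large, the data has to be processed chunk by chunk inside the loop instead of being collected. On Java 9 and later, InputStream.readAllBytes() performs the same accumulation.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ReadAll {
    /** Reads until end of stream, not just what available() happens to report. */
    static byte[] readToEnd(InputStream stream) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192]; // fixed, finite-size chunk
        int n;
        while ((n = stream.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, n); // write only the n bytes this call actually produced
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "some file contents".getBytes(StandardCharsets.UTF_8);
        InputStream stream = new BufferedInputStream(new ByteArrayInputStream(data));
        // Decoding assumes the file is UTF-8; the original code used the platform default.
        String fileContents = new String(readToEnd(stream), StandardCharsets.UTF_8);
        System.out.println(fileContents);
    }
}
```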

I'm not sure why you don't think you can read it line-by-line. BufferedInputStream only describes how the underlying stream is accessed, it doesn't impose any restrictions on how you ultimately read data from it. You can use it just as if it were any other InputStream.
Namely, to read it line-by-line you could do
InputStreamReader streamReader = new InputStreamReader(stream); // uses the platform default charset
BufferedReader lineReader = new BufferedReader(streamReader);
String line = lineReader.readLine();
...
[Edit] This response is to the original wording of the question, which asked specifically for a way to read the input file line-by-line.
