In socket I/O, may I know how does a objectinputstream readObject knows how many bytes to read? Is the content length encapsulated inside the bytes itself or does it simply reads all the available bytes in the buffer itself?
I am asking this because I was referring to the Python socket how-to and it says
Now if you think about that a bit, you’ll come to realize a
fundamental truth of sockets: messages must either be fixed length
(yuck), or be delimited (shrug), or indicate how long they are (much
better), or end by shutting down the connection. The choice is
entirely yours, (but some ways are righter than others).
However in another SO answer, #DavidCrawshaw mentioned that `
So readObject() does not know how much data it will read, so it does
not know how many objects are available.
I am interested to know how it works...
You're over-interpreting the answer you cited. readObject() doesn't know how many bytes it will read, ahead of time, but once it starts reading it is just parsing an input stream according to a protocol, that consists of tags, primitive values, and objects, which in turn consist of tags, primitive values, and other objects. It doesn't have to know ahead of time. Consider the similar-ish case of XML. You don't know how long the document will be ahead of time, or each element, but you know when you've read it all, because the protocol tells you.
The readOject() method is using BlockedInputStream to read the byte.If you check the readObject of ObjectInputStream , it is calling
readObject0(false).
private Object readObject0(boolean unshared) throws IOException {
boolean oldMode = bin.getBlockDataMode();
if (oldMode) {
int remain = bin.currentBlockRemaining();
if (remain > 0) {
throw new OptionalDataException(remain);
} else if (defaultDataEnd) {
/*
* Fix for 4360508: stream is currently at the end of a field
* value block written via default serialization; since there
* is no terminating TC_ENDBLOCKDATA tag, simulate
* end-of-custom-data behavior explicitly.
*/
throw new OptionalDataException(true);
}
bin.setBlockDataMode(false);
}
byte tc;
while ((tc = bin.peekByte()) == TC_RESET) {
bin.readByte();
handleReset();
}
which is reading from the stream is using bin.readByte().bin is BlockiedDataInputStream which in turns use PeekInputStream to read it.This class finally is using InputStream.read().
From the description of the read method:
/**
* Reads the next byte of data from the input stream. The value byte is
* returned as an <code>int</code> in the range <code>0</code> to
* <code>255</code>. If no byte is available because the end of the stream
* has been reached, the value <code>-1</code> is returned. This method
* blocks until input data is available, the end of the stream is detected,
* or an exception is thrown.
So basically it reads byte after byte until it encounters -1.So As EJP mentioned, it never know ahead of time how many bytes are there to be read. Hope this will help you in understanfing it.
Related
I just asked a question about why my thread shut down wasn't working. It ended up being due to readLine() blocking my thread before the shutdown flag could be recognised. This was easy to fix by checking ready() before calling readLine().
However, I'm now using a DataInputStream to do the following in series:
int x = reader.readInt();
int y = reader.readInt();
byte[] z = new byte[y]
reader.readFully(z);
I know I could implement my own buffering which would check the running file flag while loading up the buffer. But I know this would be tedious. Instead, I could let the data be buffered within the InputStream class, and wait until I have my n bytes read, before executing a non-blocking read - as I know how much I need to read.
4 bytes for the first integer
4 bytes for the second integer y
and y bytes for the z byte array.
Instead of using ready() to check if there is a line in the buffer, is there some equivalent ready(int bytesNeeded)?
The available() method returns the amount of bytes in the InputStreams internal buffer.
So, one can do something like:
while (reader.available() < 4) checkIfShutdown();
reader.readInt();
You can use InputStream.available() to get an estimate of the amount of bytes that can be read. Quoting the Javadoc:
Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking, which may be 0, or 0 when end of stream is detected. The read might be on the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes.
In other words, if available() returns n, you know you can safely call read(n) without blocking. Note that, as the Javadoc states, the value returned is an estimate. For example, InflaterInputStream.available() will always return 1 if EOF isn't reached. Check the documentation of the InputStream subclass you will be using to ensure it meets your needs.
You are going to need to implement your own equivalent of BufferedInputStream. Either as a sole owner of an InputStream and a thread (possibly borrowed from a pool) to block in. Alternatively, implement with NIO.
I've read the java docs and a number of related questions but am unsure if the following is guaranteed to work:
I have a DataInputStream on a dedicated thread that continually reads small amounts of data, of known byte-size, from a very active connection. I'd like to alert the user when the stream becomes inactive (i.e. network goes down) so I've implemented the following:
...
streamState = waitOnStreamForState(stream, 4);
int i = stream.readInt();
...
private static int
waitOnStreamForState(DataInputStream stream, int nBytes) throws IOException {
return waitOnStream(stream, nBytes, STREAM_ACTIVITY_THRESHOLD, STREAM_POLL_INTERVAL)
? STREAM_STATE_ACTIVE
: STREAM_STATE_INACTIVE;
private static boolean
waitOnStream(DataInputStream stream, int nBytes, long timeout, long pollInterval) throws IOException {
int timeWaitingForAvailable = 0;
while( stream.available() < nBytes ){
if( timeWaitingForAvailable >= timeout && timeout > 0 ){
return false;
}
try{
Thread.sleep(pollInterval);
}catch( InterruptedException e ){
Thread.currentThread().interrupt();
return (stream.available() >= nBytes);
}
timeWaitingForAvailable += pollInterval;
}
return true;
}
The docs for available() explain:
Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking by the next caller of a method for this input stream. The next caller might be the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes.
Does this mean it's possible the next read (inside readInt()) might only, for instance, read 2 bytes, and the subsequent read to finish retrieving the Integer could block? I realize readInt() is a method of the stream 'called next' but I presume it has to loop on a read call until it gets 4 bytes and the docs don't mention subsequent calls. In the above example is it possible that the readInt() call could still block even if waitOnStreamForState(stream, 4) returns STREAM_STATE_ACTIVE?
(and yes, I realize my timeout mechanism is not exact)
Does this mean it's possible the next read (inside readInt()) might only, for instance, read 2 bytes, and the subsequent read to finish retrieving the Integer could block?
That's what it says. However at least the next read() won't block.
I realize readInt() is a method of the stream 'called next' but I presume it has to loop on a read call until it gets 4 bytes and the docs don't mention subsequent calls. In the above example is it possible that the readInt() call could still block even if waitOnStreamForState(stream, 4) returns STREAM_STATE_ACTIVE?
That's what it says.
For example, consider SSL. You can tell that there is data available, but you can't tell how much without actually decrpyting it, so a JSSE implementation is free to:
always return 0 from available() (this is what it used to do)
always return 1 if the underlying socket's input stream has available() > 0, otherwise zero
return the underlying socket input stream's available() value and rely on this wording to get it out of trouble if the actual plaintext data is less. (However the correct value might still be zero, if the cipher data consisted entirely of handshake messages or alerts.)
However you don't need any of this. All you need is a read timeout, set via Socket.setSoTimeout(), and a catch for SocketTimeoutException. There are few if any correct uses of available(): fewer and fewer as time goes on, it seems to me. You should certainly not waste time calling sleep().
im making a form to upload several images to a blob field in a mysql databases.
In a servlet i get all the images uploaded by users using a type="file".
Before inserting the images into the database, i have to check if the type="file" is empyt or not.
For doing this, according to the corrent structure of my code i do this:
if(allegatoInputStream.read()>-1){....
//allegatoInputStream is my InputStream
But, after i call the read() method, my InputStream becomes empty, so i have nothing to insert in my blob field.
i did something like this
InputStream InputStream_FOR_THE_CHECK = blablabla.getInputStream();
if(InputStream_FOR_THE_CHECK.read()>-1){....
But i don't think this is the right way to do what i want to do
You could wrap the stream in a PushbackInputStream, read one byte to see if there is content, then push back the byte if there is. As long as you then use the same instance of PushbackInputStream when you process the file content, that byte will then be part of the stream.
Your read() call reads a byte from the stream and throws it away. You need to store the result in a variable, test it for -1, and if it isn't, use it. It's the value of the next byte in the stream.
You can use inputStream.available(). From the JavaDoc:
Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking by the next invocation of a method for this input stream. The next invocation might be the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes.
Note that while some implementations of InputStream will return the total number of bytes in the stream, many will not. It is never correct to use the return value of this method to allocate a buffer intended to hold all data in this stream.
A subclass' implementation of this method may choose to throw an IOException if this input stream has been closed by invoking the close() method.
In your scenario:
if(allegatoInputStream.available()>0){....
//allegatoInputStream is my InputStream
Why does this part of my client code is always zero ?
InputStream inputStream = clientSocket.getInputStream();
int readCount = inputStream.available(); // >> IS ALWAYS ZERO
byte[] recvBytes = new byte[readCount];
ByteArrayOutputStream baos = new ByteArrayOutputStream();
int n = inputStream.read(recvBytes);
...
Presumably it's because no data has been received yet. available() tries to return the amount of data available right now without blocking, so if you call available() straight after making the connection, I'd expect to receive 0 most of the time. If you wait a while, you may well find available() returns a different value.
However, personally I don't typically use available() anyway. I create a buffer of some appropriate size for the situation, and just read into that:
byte[] data = new byte[16 * 1024];
int bytesRead = stream.read(data);
That will block until some data is available, but it may well return read than 16K of data. If you want to keep reading until you reach the end of the stream, you need to loop round.
Basically it depends on what you're trying to do, but available() is rarely useful in my experience.
From the java docs
Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking by the next invocation of a method for this input stream. The next invocation might be the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes.
Note that while some implementations of InputStream will return the total number of bytes in the stream, many will not. It is never correct to use the return value of this method to allocate a buffer intended to hold all data in this stream.
A subclass' implementation of this method may choose to throw an IOException if this input stream has been closed by invoking the close() method.
The available method for class InputStream always returns 0.
This method should be overridden by subclasses.
Here is a note to understand why it returns 0
In InputStreams, read() calls are said to be "blocking" method calls. That means that if no data is available at the time of the method call, the method will wait for data to be made available.
The available() method tells you how many bytes can be read until the read() call will block the execution flow of your program. On most of the input streams, all call to read() are blocking, that's why available returns 0 by default.
However, on some streams (such as BufferedInputStream, that have an internal buffer), some bytes are read and kept in memory, so you can read them without blocking the program flow. In this case, the available() method tells you how many bytes are kept in the buffer.
According to the documentation, available() only returns the number of bytes that can be read from the stream without blocking. It doesn't mean that a read operation won't return anything.
You should check this value after a delay, to see it increasing.
There are very few correct uses of available(), and this isn't one of them. In this case no data had arrived so it returned zero, which is what it's supposed to do.
Just read until you have what you need. It will block until data is available.
I am writing a utility in Java that reads a stream which may contain both text and binary data. I want to avoid having I/O wait. To do that I create a thread to keep reading the data (and wait for it) putting it into a buffer, so the clients can check avialability and terminate the waiting whenever they want (by closing the input stream which will generate IOException and stop waiting). This works every well as far as reading bytes out of it; as binary is concerned.
Now, I also want to make it easy for the client to read line out of it like '.hasNextLine()' and '.readLine()'. Without using an I/O-wait stream like buffered stream, (Q1) How can I check if a binary (byte[]) contain a valid unicode line (in the form of the length of the first line)? I look around the String/CharSet API but could not find it (or I miss it?). (NOTE: If possible I don't want to use non-build-in library).
Since I could not find one, I try to create one. Without being so complicated, here is my algorithm.
1). I look from the start of the byte array until I find '\n' or '\r' without '\n'.
2). Then, I cut the byte array from the start to that point and using it to create a string (with CharSet if specified) using 'new String(byte[])' or 'new String(byte[], CharSet)'.
3). If that success without exception, we found the first valid line and return it.
4). Otherwise, these bytes may not be a string, so I look further to another '\n' or '\r' w/o '\n'. and this process repeat.
5. If the search ends at the end of available bytes I stop and return null (no valid line found).
My question is (Q2)Is the following algorithm adequate?
Just when I was about to implement it, I searched on Google and found that there are many other codes for new line, for example U+2424, U+0085, U+000C, U+2028 and U+2029.
So my last question is (Q3), Do I really need to detect these code? If I do, Will it increase the chance of false alarm?
I am well aware that recognize something from binary is not absolute. I am just trying to find the best balance.
To sum up, I have an array of byte and I want to extract a first valid string line from it with/without specific CharSet. This must be done in Java and avoid using any non-build-in library.
Thanks you all in advance.
I am afraid your problem is not well-defined. You write that you want to extract the "first valid string line" from your data. But whether somet byte sequence is a "valid string" depends on the encoding. So you must decide which encoding(s) you want to use in testing.
Sensible choices would be:
the platform default encoding (Java property "file.encoding")
UTF-8 (as it is most common)
a list of encodings you know your clients will use (such as several Russian or Chinese encodings)
What makes sense will depend on the data, there's no general answer.
Once you have your encodings, the problem of line termination should follow, as most encodings have rules on what terminates a line. In ASCII or Latin-1, LF,CR-LF and LF-CR would suffice. On Unicode, you need all the ones you listed above.
But again, there's no general answer, as new line codes are not strictly regulated. Again, it would depend on your data.
First of all let me ask you a question, is the data you are trying to process a legacy data? In other words, are you responsible for the input stream format that you are trying to consume here?
If you are indeed controlling the input format, then you probably want to take a decision Binary vs. Text out of the Q1 algorithm. For me this algorithm has one troubling part.
`4). Otherwise, these bytes may not be a string, so I look further to
another '\n' or '\r' w/o '\n'. and this process repeat.`
Are you dismissing input prior to line terminator and take the bytes that start immediately after, or try to reevaluate the string with now 2 line terminators? If former, you may have broken binary data interface, if latter you may still not parse the text correctly.
I think having well defined markers for binary data and text data in your stream will simplify your algorithm a lot.
Couple of words on String constructor. new String(byte[], CharSet) will not generate any exception if the byte array is not in particular CharSet, instead it will create a string full of question marks ( probably not what you want ). If you want to generate an exception you should use CharsetDecoder.
Also note that in Java 6 there are 2 constructors that take charset
String(byte[] bytes, String charsetName) and String(byte[] bytes, Charset charset). I did some simple performance test a while ago, and constructor with String charsetName is magnitudes faster than the one that takes Charset object ( Question to Sun: bug, feature? ).
I would try this:
make the IO reader put strings/lines into a thread safe collection (for example some implementation of BlockingQueue)
the main code has only reference to the synced collection and checks for new data when needed, like queue.peek(). It doesn't need to know about the io thread nor the stream.
Some pseudo java code (missing exception & io handling, generics, imports++) :
class IORunner extends Thread {
IORunner(InputStream in, BlockingQueue outputQueue) {
this.reader = new BufferedReader(new InputStreamReader(in, "utf-8"));
this.outputQueue = outputQueue;
}
public void run() {
String line;
while((line=reader.readLine())!=null)
this.outputQueue.put(line);
}
}
class Main {
public static void main(String args[]) {
...
BlockingQueue dataQueue = new LinkedBlockingQueue();
new IORunner(myStreamFromSomewhere, dataQueue).start();
while(true) {
if(!dataQueue.isEmpty()) { // can also use .peek() != null
System.out.println(dataQueue.take());
}
Thread.sleep(1000);
}
}
}
The collection decouples the input(stream) more from the main code. You can also limit the number of lines stored/mem used by creating the queue with a limited capacity (see blockingqueue doc).
The BufferedReader handles the checking of new lines for you :) The InputStreamReader handles the charset (recommend setting one yourself since the default one changes depending on OS etc.).
The java.text namespace is designed for this sort of natural language operation. The BreakIterator.getLineInstance() static method returns an iterator that detects line breaks. You do need to know the locale and encoding for best results, though.
Q2: The method you use seems reasonable enough to work.
Q1: Can't think of something better than the algorithm that you are using
Q3: I believe it will be enough to test for \r and \n. The others are too exotic for usual text files.
I just solved this to get test stubb working for Datagram - I did byte[] varName= String.getBytes(); then final int len = varName.length; then send the int as DataOutputStream and then the byte array and just do readInt() on the rcv then read bytes(count) using the readInt.
Not a lib, not hard to do either. Just read up on readUTF and do what they did for the bytes.
The string should construct from the byte array recovered that way, if not you have other problems. If the string can be reconstructed, it can be buffered ... no?
May be able to just use read / write UTF() in DataStream - why not?
{ edit: per OP's request }
//Sending end
String data = new String("fdsfjal;sajssaafe8e88e88aa");// fingers pounding keyboard
DataOutputStream dataOutputStream = new DataOutputStream();//
final Integer length = new Integer(data.length());
dataOutputStream.writeInt(length.intValue());//
dataOutputStream.write(data.getBytes());//
dataOutputStream.flush();//
dataOutputStream.close();//
// rcv end
DataInputStream dataInputStream = new DataInputStream(source);
final int sizeToRead = dataInputStream.readInt();
byte[] datasink = new byte[sizeToRead.intValue()];
dataInputStream.read(datasink,sizeToRead);
dataInputStream.close;
try
{
// constructor
// String(byte[] bytes, int offset, int length)
final String result = new String(datasink,0x00000000,sizeToRead);//
// continue coding here
Do me a favor, keep the heat off of me. This is very fast right in the posting tool - code probably contains substantial errors - it's faster for me just to explain it writing Java ~ there will be others who can translate it to other code language ( s ) which you can too if you wish it in another codebase. You will need exception trapping an so on, just do a compile and start fixing errors. When you get a clean compile, start over from the beginnning and look for blunders. ( that's what a blunder is called in engineering - a blunder )