Java: read from webservice and buffer

I am looking for an Object. I am quite sure it exists but I don't know its name.
My application receives a JMS message. The message contains file IDs for another system, which I access by consuming its webservice. As soon as I receive the message, I start reading the data from the webservice and writing it into some kind of object. This amounts to roughly 20000 requests that I stream into some sort of data structure.
In the meantime, or some time after I have finished, a consumer will call a webservice of my application. This will open a stream to the already loaded data and send it as a StreamingResponseBody to the caller, while my other thread may still be writing data to the streamed object.
The object I am looking for is not a fixed-size buffer, because I want to read all the data even if nobody is interested in it (yet) and nobody calls the service to download it.
What object / data structure do you suggest?
I could use a File with an InputStream and an OutputStream, but I am not sure that is the best solution. Performance, however, won't be the problem, as network speed will limit read performance anyway.
As my application will be a cloud application, everything is held in memory, so persistence is not a concern.
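One shape this could take (purely as a sketch: the class name UnboundedChunkStream and its methods are invented for this example, and a single reader is assumed) is an unbounded queue of byte chunks that the loader thread appends to, wrapped so the consumer sees an ordinary InputStream:

import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Unbounded in-memory "pipe": the loader thread appends chunks,
// a single consumer drains them as a plain InputStream.
public class UnboundedChunkStream extends InputStream {

    private static final byte[] EOF = new byte[0];                            // sentinel marking end of data
    private final BlockingQueue<byte[]> chunks = new LinkedBlockingQueue<>(); // grows without limit
    private byte[] current = new byte[0];
    private int pos = 0;

    // Called by the writer thread for every block read from the webservice.
    public void append(byte[] chunk) {
        chunks.add(chunk.clone());
    }

    // Called by the writer thread once all requests have been processed.
    public void finish() {
        chunks.add(EOF);
    }

    @Override
    public int read() throws IOException {
        while (pos >= current.length) {
            if (current == EOF) {
                return -1;                       // end of stream reached
            }
            try {
                current = chunks.take();         // blocks only the reader until more data arrives
                pos = 0;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("Interrupted while waiting for data", e);
            }
        }
        return current[pos++] & 0xFF;
    }
}

The single-byte read() keeps the sketch short; a real implementation would also override read(byte[], int, int) for throughput.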

Related

Unbounded PipedInputStream in Java

I am using an HTTP library to fetch data that is 200 MB in size. Each line in the data is then processed. To save memory, I would like to process the data line by line as it is streamed in, rather than waiting for all 200 MB to be downloaded first.
The http library I am using exposes a method that looks something like OnCharReceived(CharBuffer buffer) that can be overridden so that I can in effect process each chunk of data as it comes in.
I would like to expose this data as an InputStream. My first thought was to use a PipedInputStream and PipedOutputStream pair where in OnCharReceived() I would write to the PipedOutputStream and in my thread read from the PipedInputStream. However, this seems to have the problem that the underlying buffer of the pipe could get full requiring the writing thread to block in OnCharReceived until my thread gets around to processing data. But blocking in OnCharReceived would probably be blocking in the http library's IO thread and would be very bad.
Are there Java classes out there that handle the abstract problem I need to solve here, without me having to roll my own custom implementation? I know of things like BlockingQueue that could be used as part of a larger solution, but are there any simple solutions?
For reasons of legacy code I really need the data exposed as an InputStream.
Edit: To be more precise, I am basing my code on the following example from the Apache HTTP async client library:
https://hc.apache.org/httpcomponents-asyncclient-dev/httpasyncclient/examples/org/apache/http/examples/nio/client/AsyncClientHttpExchangeStreaming.java
If there's a simpler solution, I would not go near Piped[In/Out]putStream; it introduces unnecessarily complicated threading concerns, as you pointed out. Keep in mind you can always write to a temp file and then read from that file as an InputStream. This also has the advantage of closing the HTTP connection as fast as possible and avoiding timeouts.
There might be other solutions depending on the API you are using, but I think the proposed solution still makes sense for the reasons above.
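A minimal sketch of that temp-file approach, built around the OnCharReceived(CharBuffer) callback described in the question (the class name TempFileBuffer and its method names are invented for the example), could look like this:

import java.io.IOException;
import java.io.InputStream;
import java.io.Writer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TempFileBuffer {

    private final Path tempFile;
    private final Writer writer;

    public TempFileBuffer() throws IOException {
        tempFile = Files.createTempFile("download-", ".tmp");
        writer = Files.newBufferedWriter(tempFile, StandardCharsets.UTF_8);
    }

    // Called from the HTTP library's IO thread: it only appends to the file,
    // so it never blocks waiting for a slow reader the way a pipe would.
    public void onCharReceived(CharBuffer buffer) throws IOException {
        writer.append(buffer);
    }

    // Called once the response is complete; legacy code can consume the result.
    public InputStream finish() throws IOException {
        writer.close();
        return Files.newInputStream(tempFile);
    }
}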

Reading packets from InputStream - Proper design

I am developing an Android application that communicates over Bluetooth in a client/server architecture. The actual implementation isn't relevant, but I was wondering about the most appropriate design for handling an Input/OutputStream.
At the moment the server side runs a listener in a separate thread, reading the InputStream and constructing a corresponding packet from the stream based on the first received byte. These constructed packets are pushed into a buffer to be handled by another thread which parses the packets and determines what action to take.
This is more a question about proper technique/design of a communication protocol. I'm wondering if there are any alternative ways to process incoming data over a stream. I realize this is highly dependent on the application, but I have the inclination that there are multiple methods of handling stream data that would be just as applicable.
It is rather an ambiguous question; even a reference to some text with relevant information on proper object-oriented communication design would be extremely helpful.
At the moment the server side runs a listener in a separate thread, reading the InputStream and constructing a corresponding packet from the stream based on the first received byte. These constructed packets are pushed into a buffer to be handled by another thread which parses the packets and determines what action to take.
I believe this isn't very efficient in terms of multithreading. Parsing a packet header is not a very time-consuming operation. I suggest that your primary thread not only handle stream reading, but also parse the packet headers and push the packets into the relevant buffers, which can then be processed concurrently (see the sketch below).
But since your question is quite broad (the actual implementation, or rather the actual task you are trying to fulfill and its requirements, matter very much here), this may also turn out to be inefficient.
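As a rough sketch of that idea (the packet framing, type codes and queue names are invented for the example; the real protocol will differ), the reader thread parses the header itself and hands complete packets to per-type queues drained by worker threads:

import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PacketReader implements Runnable {

    // One queue per packet family; worker threads drain these concurrently.
    private final BlockingQueue<byte[]> controlPackets = new LinkedBlockingQueue<>();
    private final BlockingQueue<byte[]> dataPackets = new LinkedBlockingQueue<>();

    private final DataInputStream in;

    public PacketReader(InputStream in) {
        this.in = new DataInputStream(in);
    }

    @Override
    public void run() {
        try {
            while (true) {
                int type = in.readUnsignedByte();      // first byte identifies the packet type
                int length = in.readUnsignedShort();   // assumed: a 2-byte length field follows
                byte[] payload = new byte[length];
                in.readFully(payload);                 // read the complete packet body

                if (type == 0x01) {                    // invented type code for "control"
                    controlPackets.put(payload);
                } else {
                    dataPackets.put(payload);
                }
            }
        } catch (IOException e) {
            // stream closed by the peer; let the reader thread end
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}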

Why are there streams in the HttpURLConnection API?

From what I understand about HTTP, it works like this: The client assembles a message, consisting of some header fields and (possibly) a body and sends it to the server. The server processes it, assembles its own response message and sends it back to the client.
And so I come to the question:
Why are there all of a sudden streams in HttpURLConnection?
This makes no sense to me. It makes it look like there is a continuous open channel. At what point does the message actually get sent to the server? On connect? On getInputStream? When trying to read from the stream? What if I have payload, does it get sent at a different time then? Can I do write-read-write-read with just a single connection?
I'm sure I just haven't understood it right yet, but right now it just seems like a bad API to me.
I would rather have expected to see something like this:
HttpURLConnection http = url.openConnection();
HttpMessage req = new HttpMessage();
req.addHeader(...);
req.setBody(...);
http.post(req);
// Block until response is available (Future pattern)
HttpMessage res = http.getResponse();
IMHO HttpURLConnection does indeed have a bad API, but handling the input and output messages as streams is a way to deal efficiently with large amounts of data. I think all the other answers (at this moment 5!) are correct. There are still some open questions:
At what point does the message actually get sent to the server? On connect? On getInputStream? When trying to read from the stream?
There are several triggers at which all the collected data (headers, timeout options, ...) is actually transmitted to the server. In most cases you don't have to call connect(); it is done implicitly, e.g. when calling getResponseCode() or getInputStream(). (By the way, I recommend calling getResponseCode() before getInputStream(), because if you get an error code (e.g. 404), getInputStream() will throw an exception and you should call getErrorStream() instead.)
What if I have payload, does it get sent at a different time then?
You have to call getOutputStream() and then write the payload. This should be done (obviously) after you have added the headers. After closing the stream you can expect a response from the server.
Can I do write-read-write-read with just a single connection?
No. Technically this would be possible using keep-alive, but HttpURLConnection handles that under the covers, and you can only do one request-response round trip with an instance of this class.
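Putting these points together, a minimal sketch of the usual call order (the URL, payload and header values are invented for the example) might look like this:

import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class RequestFlow {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://example.com/resource");              // invented URL
        HttpURLConnection http = (HttpURLConnection) url.openConnection();

        // 1. Configure method and headers before anything is sent.
        http.setRequestMethod("POST");
        http.setRequestProperty("Content-Type", "text/plain");
        http.setDoOutput(true);

        // 2. Write the payload; by default it is buffered until the request is triggered.
        try (OutputStream out = http.getOutputStream()) {
            out.write("hello".getBytes(StandardCharsets.UTF_8));
        }

        // 3. getResponseCode() forces the exchange and tells us which stream to read.
        int code = http.getResponseCode();
        try (InputStream in = (code >= 200 && code < 400)
                ? http.getInputStream()
                : http.getErrorStream()) {
            if (in != null) {
                in.transferTo(System.out);                              // Java 9+: dump the body
            }
        }
    }
}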
Making life easier
If you don't want to fight with the horrible API of HttpURLConnection, you could have a look at some abstraction APIs listed on DavidWebb. When using DavidWebb, a typical request looks like this:
Webb webb = Webb.create();
String result = webb.post("http://my-server/path/resource")
        .header("auth-token", myAuthToken)
        .body(myBody)
        .ensureSuccess()
        .asString()
        .getBody();
While the underlying transport does take place using individual packets, there's no guarantee that what you think of as a single HTTP request/response will "fit" in a single HTTP "packet". In turn, there's also no guarantee that a single HTTP "packet" will fit in a single TCP packet, and so on.
Imagine downloading a 20 MB image over HTTP. It's a single HTTP "response", but I guarantee there will be multiple packets going back and forth between the browser and the website serving it up.
Every block is made up of possibly multiple smaller blocks, at each level. Since you might start processing the response before all the different bits of it have arrived, and you really don't want to concern yourself with how many of them there are, a stream is the common abstraction over this.
HTTP works on top of a connection-oriented TCP connection. Internally, a TCP connection is created, the HTTP request is sent over it, the response is received, and then the TCP connection is dropped. That is why there are two different streams.
Because streams are the generic way to push data between two places in Java, and that's what the HTTP connection does. HTTP works over TCP, which is a streamed connection, so this API mimics that.
As for why it isn't abstracted further: consider that there are no size limits on HTTP requests. For example, a file upload can be many MB or even GB in size.
Using a streamed API you can read data from a file or other source and stream it out over the connection at the same time without needing to load all that data into memory at once.
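As a rough illustration of that point (the file name and URL are invented for the example), a large file can be streamed out without ever being held in memory:

import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StreamingUpload {
    public static void main(String[] args) throws Exception {
        Path bigFile = Paths.get("huge-upload.bin");                // invented file name
        URL url = new URL("http://example.com/upload");             // invented URL

        HttpURLConnection http = (HttpURLConnection) url.openConnection();
        http.setRequestMethod("PUT");
        http.setDoOutput(true);
        http.setChunkedStreamingMode(8192);   // send chunks instead of buffering the whole body

        try (InputStream in = Files.newInputStream(bigFile);
             OutputStream out = http.getOutputStream()) {
            in.transferTo(out);               // copies in small blocks; never loads the file fully
        }
        System.out.println("Server answered: " + http.getResponseCode());
    }
}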
TCP is a byte stream. The body of an HTTP request or response is an arbitrary byte stream. Not sure what kind of API you were expecting, but when you have byte stream data you get a byte stream API.
A streaming response can be consumed on the fly without accumulating all the data in local memory, so it is better from a memory point of view, for instance if you are parsing a huge JSON document from the stream and discarding the raw data after it has been consumed. And in theory the parsing can begin as soon as the first byte has arrived.
And it is getInputStream() that does the send/receive part, as well as initiating the creation of the underlying socket.
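For instance, with a streaming JSON parser such as Jackson (not mentioned in the answer above, just one possible choice; the URL is invented), parsing can start directly from the connection's InputStream:

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class StreamingJsonExample {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://example.com/huge.json");          // invented URL
        HttpURLConnection http = (HttpURLConnection) url.openConnection();

        try (InputStream in = http.getInputStream();                // triggers the actual request
             JsonParser parser = new JsonFactory().createParser(in)) {
            // Tokens are handled as they arrive; the raw bytes are discarded afterwards.
            while (parser.nextToken() != null) {
                if (parser.getCurrentToken() == JsonToken.FIELD_NAME) {
                    System.out.println("field: " + parser.getCurrentName());
                }
            }
        }
    }
}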

Check if ObjectInputStream has anything to read without blocking?

I am building a server in Java that communicates with several clients at the same time. The initial approach we had is that the server listens for connections from the clients; once a connection is received and a socket is created, a new thread is spawned to handle the communication with that client, i.e. read the request with an ObjectInputStream, do the desired operation (fetch data from the DB, update it, etc.), and send back a response to the client (if needed), while the server itself goes back to listening for more connections.
This works fine for the time being; however, this approach is not really scalable. It works great for a small number of clients connected at the same time, but since every client spawns another thread, what will happen when too many clients are connected at once?
So my next idea was to maintain a list of sorts holding all connected clients (the socket object and some extra info), use a thread pool to iterate through them and read anything they sent, put any received message in a queue for execution by another thread pool of worker threads, and, once a worker has finished its task, send back a response if one is required.
The latter two steps are pretty trivial to implement. The problem is that with the original thread-per-client implementation I use ObjectInputStream.readObject() to read the message, and this method blocks until there is something to read, which is fine for that approach, but I can't use the same thing for the new approach, since if I block on every socket I will never get to the ones further down the list.
So I need a way to check whether there is anything to read before I call readObject(). So far I have tried the following solutions:
Solution 1:
Use ObjectInputStream.available() to check whether there is anything available to read. This approach failed, since the method seems to always return 0 regardless of whether there is an object in the stream or not, so it does not help at all.
Solution 2:
Use a PushbackInputStream to check for the existence of the first unread byte in the stream; if it exists, push it back and read the object using the ObjectInputStream, and if it doesn't, move on:
boolean available;
int b = pushbackinput.read();
if (b == -1) {
    available = false;
} else {
    pushbackinput.unread(b);
    available = true;
}
if (available) {
    Object message = objectinput.readObject();
    // continue with what you need to do with that object
}
This turned out to be useless too, since read() also blocks when there is no input to read. It seems to return -1 only if the stream was closed; if the stream is still open but empty, it just blocks, so this is no different from simply using ObjectInputStream.readObject().
Can anyone suggest an approach that will actually work?
This is a good question, and you've done some homework.... but it involves going through some history to get things right. Note that your issue actually has more to do with the socket-level communication than with the ObjectInputStream:
The easiest way to do things in the past was to have a separate thread per socket. This was scalable to a point but threads were expensive and slow to create.
In response, for large systems, people created thread pools and would service the sockets on threads when there was work to do. This was complicated.
The Java language was then changed with the java.nio package, which introduced the Selector together with non-blocking IO. This created a reliable (although sometimes confusing) way to service multiple sockets with fewer threads. In your case though, it would not help much, because you want to know when a full Object is ready to be read, not when there's just 'some' data.
In the interim the 'landscape' changed, and Java is now able to more efficiently create and manage threads. 'Current' thinking is that it is better/faster and easier to allocate a single thread per socket again.... see Java thread per connection model vs NIO
In your case, I would suggest that you stick with the thread-per-socket model, and you'll be fine. Java can scale to and handle more threads than you'll have sockets.
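A minimal sketch of that thread-per-socket model (the port, class names and the process() step are invented for the example; error handling is trimmed) could look like this:

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class ThreadPerClientServer {

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(9000)) {        // invented port
            while (true) {
                Socket client = server.accept();                    // blocks until a client connects
                new Thread(() -> handle(client)).start();           // one thread per connection
            }
        }
    }

    private static void handle(Socket client) {
        // Create the ObjectOutputStream first: its header must go out before we
        // block in the ObjectInputStream constructor waiting for the client's header.
        try (ObjectOutputStream out = new ObjectOutputStream(client.getOutputStream());
             ObjectInputStream in = new ObjectInputStream(client.getInputStream())) {
            while (true) {
                Object request = in.readObject();    // blocking here only parks this client's thread
                Object response = process(request);  // placeholder for DB work etc.
                out.writeObject(response);
                out.flush();
            }
        } catch (IOException | ClassNotFoundException e) {
            // client disconnected or sent something unreadable; drop the connection
        }
    }

    private static Object process(Object request) {
        return "ok: " + request;                     // invented processing step
    }
}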

Java: ignoring an input stream - will buffers overflow and bad things happen?

I have a client connecting to my server. The client sends some messages that the server does not care about, and I do not want to waste time parsing them if I'm not going to use them. All the I/O I'm using is plain Java I/O, not NIO.
If I create the input stream and just never read from it, can that buffer fill up and cause problems? If so, is there something I can do or a property I can set to have it just throw away data that it sees?
Now what if the server doesn't create the input stream at all? Will that cause any problems on the client/sending side?
Please let me know.
Thanks,
jbu
When you accept a connection from a client, you get an InputStream. If you don't read from that stream, the client's data will buffer up. Eventually, the buffer will fill up and the client will block when it tries to write more data. If the client writes all of its data before reading a response from the server, you will end up with a pretty classic deadlock situation. If you really don't care about the data from the client, just read (or call skip) until EOF and drop the data. Alternatively, if it's not a standard request/response (like HTTP) protocol, fire up a new thread that continually reads the stream to keep it from getting backed up.
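A minimal sketch of such a draining thread (the class name is invented; the stream would come from the accepted socket) might look like this:

import java.io.IOException;
import java.io.InputStream;

// Usage (invented): new Thread(new StreamDrainer(socket.getInputStream())).start();
public class StreamDrainer implements Runnable {

    private final InputStream in;

    public StreamDrainer(InputStream in) {
        this.in = in;
    }

    @Override
    public void run() {
        byte[] scratch = new byte[8192];
        try {
            // Keep reading and discarding so the client's writes never back up.
            while (in.read(scratch) != -1) {
                // data deliberately ignored
            }
        } catch (IOException e) {
            // connection closed; nothing to clean up for a discarded stream
        }
    }
}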
If you get no useful data from the client, what's the point of allowing it to connect?
I'm not sure of the implications of never reading from a buffer in Java -- I'd guess that eventually the OS would stop accepting data on that socket, but I'm not sure.
Why don't you just call the skip method of your InputStream occasionally with a large number, to ensure that you discard the data?
InputStream in = ...;
byte[] buffer = new byte[4096]; // or whatever
while (in.read(buffer) != -1) {
    // discard everything the client sends
}
If you accept the connection, you should read the data. To tell you the truth, I have never seen (nor could I foresee) a situation where this (a server that ignores all data) would be useful.
I think you get the InputStream once you accept the request, so if you don't acknowledge that request, the underlying framework (e.g. Tomcat) will drop it after some elapsed time.
Regards.
