How to improve file IO performance in a REST service - Java

The scenario I have at hand: from my Spring Boot REST service, read a Word doc from the resources folder and pass the byte array to the client.
I read the Word doc into memory using a FileInputStream, convert the input stream to a byte array using Apache Commons IO IOUtils, and place it in the response body of the REST service.
The problem here is that I always read the file into memory per service request, which is detrimental to the local memory of the process the service is running on.
I can't read the file line by line and return it to the service caller in that fashion, as I need to return the whole byte array to the caller at once.
Another problem I foresee is with how the file is read. I want it to be non-blocking IO instead of blocking IO.
Wondering what would be an efficient way to solve this

Do you actually need to read the file every time a request comes in?
Otherwise you could just read the file on server startup and keep it in memory in a Spring bean, then fetch it from there on every call.

If you don't want to load the file every time, it's better to create a @Bean and do the reading in the init/@PostConstruct phase. You can also add some functionality to your retrieve() method that checks and stores the file modification time via an invocation of File.lastModified(), to decide whether you have to reload the content or not.
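A minimal sketch of what that could look like, assuming the document is read from a location on disk (the path and class name are illustrative; for a classpath resource packaged in a jar the lastModified() check can simply be dropped, and on newer Spring Boot versions the annotation lives in jakarta.annotation instead of javax.annotation):

    import org.springframework.stereotype.Component;

    import javax.annotation.PostConstruct;   // jakarta.annotation.PostConstruct on newer Spring Boot
    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;

    @Component
    public class WordDocCache {

        // Illustrative path; adjust to where the document actually lives.
        private final File docFile = new File("config/template.docx");

        private byte[] cachedBytes;
        private long cachedLastModified;

        // Read the document once at startup instead of once per request.
        @PostConstruct
        public synchronized void load() throws IOException {
            cachedBytes = Files.readAllBytes(docFile.toPath());
            cachedLastModified = docFile.lastModified();
        }

        // Return the cached bytes, reloading only if the file changed on disk.
        public synchronized byte[] retrieve() throws IOException {
            if (docFile.lastModified() != cachedLastModified) {
                load();
            }
            return cachedBytes;
        }
    }

The controller then just returns retrieve() in the response body, so each request serves the same cached array instead of re-reading the file.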

Related

Efficient way to write to a file in a REST service

I have a Java REST service that receives around 1000 requests per second. Each request has a payload of 1 KB. I need to write this payload to a single file. Since there will be 1000 requests per second, should I synchronize the writes to the file done with FileWriter? I also need to acknowledge in the response of each request that the write to the file succeeded. This means I need to flush the write for each request.
If I synchronize the file writes, the REST service performance will be degraded. Is there a way to write to the file without synchronizing the writes?
Did you check how Log4J logs information to a text file? It's pretty fast even with millions of log messages at a given time. You can refer to the WriterAppender implementation for reference.
http://grepcode.com/file/repository.springsource.com/org.apache.log4j/com.springsource.org.apache.log4j/1.2.15/org/apache/log4j/WriterAppender.java#WriterAppender
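This is not the Log4j code itself, but a minimal sketch of the same single-writer idea: request threads enqueue payloads on a BlockingQueue, one background thread appends and flushes in batches, and each caller is acknowledged through a CompletableFuture (all names here are illustrative):

    import java.io.BufferedOutputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.LinkedBlockingQueue;

    public class BatchingFileWriter {

        private static final class Entry {
            final byte[] payload;
            final CompletableFuture<Void> ack = new CompletableFuture<>();
            Entry(byte[] payload) { this.payload = payload; }
        }

        private final BlockingQueue<Entry> queue = new LinkedBlockingQueue<>();
        private final BufferedOutputStream out;

        public BatchingFileWriter(String path) throws IOException {
            this.out = new BufferedOutputStream(new FileOutputStream(path, true));
            Thread writer = new Thread(this::drainLoop, "file-writer");
            writer.setDaemon(true);
            writer.start();
        }

        /** Called by request threads; completes once the payload has been flushed to disk. */
        public CompletableFuture<Void> write(byte[] payload) {
            Entry e = new Entry(payload);
            queue.add(e);
            return e.ack;
        }

        private void drainLoop() {
            List<Entry> batch = new ArrayList<>();
            while (true) {
                batch.clear();
                try {
                    batch.add(queue.take());      // block until at least one entry arrives
                    queue.drainTo(batch);         // grab everything else already waiting
                    for (Entry e : batch) {
                        out.write(e.payload);
                    }
                    out.flush();                  // one flush acknowledges the whole batch
                    batch.forEach(e -> e.ack.complete(null));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return;                       // shut down the writer thread
                } catch (IOException ex) {
                    batch.forEach(e -> e.ack.completeExceptionally(ex));
                }
            }
        }
    }

The request handler calls write() and only returns success when the returned future completes, so each request is still acknowledged after its data is flushed, while contention on the file is limited to the single writer thread.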

In case of server crash, queue data will be lost

I'm working on creating a file uploader. I would like to load files into a temp folder first and then convert them to the needed format. For this I'll create a queue of tasks that will be executed by an executor. But in case of a server crash, this queue will be lost. So could anybody suggest a library, without using another server, that can make my queue persistent?
Instead of using an in-memory queue implementation, you can use persistent options like a DB or a JMS queue. This will avoid losing the data even if the server crashes.
You need to use a DB and store the bytes in it. Invoke two threads: one will only feed the data to the DB and the other will poll it and convert the file. You can maintain a status indicating whether the file has been changed to the format you wanted, and also the format it needs to be changed to.
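A rough sketch of the DB-backed approach with plain JDBC; the upload_task table, its columns, the status values, and the LIMIT 1 syntax are assumptions and will vary by database:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import javax.sql.DataSource;

    // Assumed schema: CREATE TABLE upload_task (id IDENTITY, path VARCHAR, status VARCHAR)
    public class PersistentUploadQueue {

        private final DataSource dataSource;

        public PersistentUploadQueue(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        /** Producer side: record the temp file so the task survives a crash. */
        public void enqueue(String tempFilePath) throws SQLException {
            try (Connection c = dataSource.getConnection();
                 PreparedStatement ps = c.prepareStatement(
                         "INSERT INTO upload_task (path, status) VALUES (?, 'PENDING')")) {
                ps.setString(1, tempFilePath);
                ps.executeUpdate();
            }
        }

        /** Worker side: claim one pending task, or return null if none is waiting. */
        public String pollPending() throws SQLException {
            try (Connection c = dataSource.getConnection();
                 PreparedStatement ps = c.prepareStatement(
                         "SELECT id, path FROM upload_task WHERE status = 'PENDING' LIMIT 1");
                 ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return null;
                }
                markConverting(c, rs.getLong("id"));
                return rs.getString("path");
            }
        }

        private void markConverting(Connection c, long id) throws SQLException {
            try (PreparedStatement ps = c.prepareStatement(
                    "UPDATE upload_task SET status = 'CONVERTING' WHERE id = ?")) {
                ps.setLong(1, id);
                ps.executeUpdate();
            }
        }
    }

After a crash, any rows still marked PENDING (or CONVERTING, depending on how you want to handle retries) are simply picked up again by the worker on restart.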

Java Serialization - Recovering serialized file after process crash

I have the following use case.
A process serializes certain objects to a file using BufferedOutputStream.
After writing each object, the process invokes flush().
The use case is that if the process crashes while writing an object, I want to recover the file up to the previous object that was written successfully.
How can I deserialize such a file? How will Java behave while deserializing such a file?
Will it successfully deserialize up to the objects that were written successfully before the crash?
While reading the last partially written object, what will be the behavior? How can I detect that?
Update1 -
I have tried to simulate a process crash by manually killing the process while objects are being written. I have tried around 10-15 times. Each time I am able to deserialize the file, and the file does not have any partial object.
I am not sure if my test is exhaustive enough and therefore need further advice.
Update2 - Adam pointed out a way to simulate such a test by truncating the file randomly.
Following is the behavior observed over around 100 iterations -
From the truncated file (which should be equivalent to the state of the file when a process crashes), Java can read up to the last complete object successfully.
Upon reaching the last partially written object, Java does not throw a StreamCorruptedException or IOException. It simply throws an EOFException indicating EOF and ignores the partial object.
Each object is either fully deserialized or not before the next one is read; it is not affected if a later object failed to be written or fails to deserialize.
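For reference, a minimal sketch of a read loop that matches this behavior: it keeps calling readObject() and treats EOFException as the end of the usable data (class and method names are illustrative):

    import java.io.EOFException;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.util.ArrayList;
    import java.util.List;

    public class SerializedFileReader {

        // Reads whole objects until the stream ends; a trailing partial object
        // surfaces as an EOFException and is simply ignored.
        public static List<Object> readAll(String path) throws IOException, ClassNotFoundException {
            List<Object> result = new ArrayList<>();
            try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(path))) {
                while (true) {
                    try {
                        result.add(in.readObject());
                    } catch (EOFException eof) {
                        break;   // end of readable data, possibly cut off mid-object
                    }
                }
            }
            return result;
        }
    }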
I suspect you are misusing Java serialization - it's not intended to be a reliable and recoverable means of permanent storage. Use a database for that. If you must, you can use a database to store the serialized form of Java objects, but that would be pretty inefficient.
Yeah, testing such a scenario manually (by killing the process) may be difficult. I would suggest writing a test case where you:
1. Serialize a set of objects and write them to a file.
2. Open the file and truncate it at a random position.
3. Try to load and deserialize it (and see what happens).
4. Repeat 1 to 3 with several other truncate positions.
This way you are sure that you are loading a broken file and that your code handles it properly.
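A minimal sketch of such a test, assuming the read loop sketched earlier (SerializedFileReader.readAll); it truncates a copy of the file at random positions via RandomAccessFile.setLength() to mimic a crash mid-write:

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.ObjectOutputStream;
    import java.io.RandomAccessFile;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;
    import java.util.Random;

    public class TruncationTest {

        public static void main(String[] args) throws Exception {
            Path original = Path.of("objects.ser");
            writeSampleObjects(original, 100);

            Random random = new Random();
            long fullLength = Files.size(original);

            for (int i = 0; i < 100; i++) {
                // Work on a copy so the original stays intact.
                Path copy = Path.of("objects-truncated.ser");
                Files.copy(original, copy, StandardCopyOption.REPLACE_EXISTING);

                // Truncate at a random position to mimic a crash mid-write.
                long cutAt = 1 + (long) (random.nextDouble() * (fullLength - 1));
                try (RandomAccessFile raf = new RandomAccessFile(copy.toFile(), "rw")) {
                    raf.setLength(cutAt);
                }

                // Feed the broken file to the read loop and see what happens.
                try {
                    int count = SerializedFileReader.readAll(copy.toString()).size();
                    System.out.println("cut at " + cutAt + " -> read " + count + " complete objects");
                } catch (Exception e) {
                    System.out.println("cut at " + cutAt + " -> " + e);
                }
            }
        }

        private static void writeSampleObjects(Path path, int count) throws IOException {
            try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(path.toFile()))) {
                for (int i = 0; i < count; i++) {
                    out.writeObject("payload-" + i);   // any Serializable object works
                    out.flush();
                }
            }
        }
    }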
Have you tried appending to an ObjectOutputStream? You can find the solution HERE - just find the post that explains how to create an ObjectOutputStream with append.
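The commonly cited approach (not necessarily the exact code behind that link) is to subclass ObjectOutputStream and override writeStreamHeader() so that appending does not write a second stream header:

    import java.io.IOException;
    import java.io.ObjectOutputStream;
    import java.io.OutputStream;

    // Writes a stream reset instead of a second stream header, so objects
    // appended to an existing serialization file can still be read back
    // with a single ObjectInputStream.
    public class AppendingObjectOutputStream extends ObjectOutputStream {

        public AppendingObjectOutputStream(OutputStream out) throws IOException {
            super(out);
        }

        @Override
        protected void writeStreamHeader() throws IOException {
            reset();   // do NOT write a new header; the file already has one
        }
    }

This subclass is only used when the file already exists (for example with new FileOutputStream(file, true)); the very first write still goes through a plain ObjectOutputStream so the header gets written once.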

Specify InputStream for ServletResponse instead of copying InputStream into OutputStream

In short, I have a servlet which retrieves pictures/videos etc. from an underlying data store.
In order to achieve this I need to copy the file's InputStream to the ServletResponse OutputStream.
From my point of view this is not efficient, since I'll need to copy the file in memory before sending it. It would be more convenient to specify an InputStream from which the OutputStream would read data and send it straight away, after reading some data into the buffer.
I looked at the ServletResponse documentation and it has some buffer for the message data, so I have a few questions regarding it.
Is this the right mechanism?
What if I decide not to send the file at the end of servlet processing?
For example:
If I have copied the InputStream into the OutputStream, and then find out that this is not an authorized request and the user has no right to see this object (a mistake in design, maybe), I would still have sent some data to the client, although that is not what I intended.
To address your first concern, you can easily copy an InputStream to an OutputStream using IOUtils from Apache Commons IO:
IOUtils.copy(fileInputStream, servletOutputStream);
It uses a 4 KB buffer, so memory consumption should not be a concern. In fact, you cannot just send data straight from the InputStream: at the lowest level the operating system still has to read the file contents into some memory location, and in order to send it to a socket you need to provide a memory location where the data to be sent resides. Streams are just a useful abstraction.
About your second question: this is how HTTP works. If you start streaming data to the client, the servlet container sends all response headers first. If you abort in the middle, from the client's perspective it looks like an interrupted download.
Is this the right mechanism?
Basically, it is the only mechanism provided by the Servlet APIs. You need to design your servlet with this in mind.
(It is hard to see how it could be done any other way. A read syscall reads data into memory from a device (the disk). A write syscall writes data from memory to a device (the network interface). There is no syscall to transfer data directly from one device to another. The best you can do is reduce the amount of copying of data within the application. If you use something like IOUtils.copy, it should minimize that as far as possible. The only way you could avoid going through application memory would be to use some special-purpose hardware / operating system combination optimized for content delivery.)
However, this is probably moot anyway. In most cases, the performance bottleneck is likely to be movement of data over the network. Data can probably be read from disk to memory, copied, and written to the network interface orders of magnitude faster than it can move through the network to the user's web browser (or whatever).
(If it is NOT moot, then a practical way to do content delivery would be to use a separate web server implemented in native code that is optimized for delivering static content; e.g. something like nginx.)
What if I decide not to send the file at the end of servlet processing? For example: If I have copied the InputStream into the OutputStream, and then find out that this is not an authorized request and the user has no right to see this object (a mistake in design, maybe), I would still have sent some data to the client, although that is not what I intended.
You should write your servlet to do the access checks BEFORE reading the content into memory. And ideally, before you "commit" the response by sending the response header.
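A minimal sketch of that ordering (javax.servlet imports shown; newer containers use jakarta.servlet, and isAuthorized/openContent are illustrative placeholders for your own access check and data-store lookup):

    import java.io.IOException;
    import java.io.InputStream;

    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    import org.apache.commons.io.IOUtils;

    public class MediaServlet extends HttpServlet {

        @Override
        protected void doGet(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {

            String id = request.getParameter("id");

            // 1. Authorize before touching the content or committing the response.
            if (!isAuthorized(request, id)) {
                response.sendError(HttpServletResponse.SC_FORBIDDEN);
                return;
            }

            // 2. Stream the content; IOUtils.copy moves it through a small buffer,
            //    so the whole file is never held in memory at once.
            response.setContentType("application/octet-stream");
            try (InputStream in = openContent(id)) {
                IOUtils.copy(in, response.getOutputStream());
            }
        }

        // Illustrative placeholders for the access check and the data-store lookup.
        private boolean isAuthorized(HttpServletRequest request, String id) { return true; }
        private InputStream openContent(String id) throws IOException { throw new UnsupportedOperationException(); }
    }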

Special OutputStream to work in memory or on a file depending on the amount of input data

Currently I'm working with an SSH client API that provides stdout and stderr as InputStreams. I have to read all the data from these streams on the client side and provide an API for implementors to work with the data the way they want (just drop it, write it to a DB, process it, etc.). At first I tried to keep all the data read in byte arrays, but with huge amounts of data (which can happen sometimes) this can cause serious memory problems. But I don't want to write all the data of every call into files if that isn't really necessary.
Does anyone know of a solution which reads data into memory until it reaches a limit (like 1 MB), then writes the data from memory to a file and appends all the remaining data of the InputStream to the same file?
Commons IO has a workable solution: DeferredFileOutputStream.
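A minimal sketch of how it could be used here, assuming a 1 MB threshold (the older commons-io constructor is shown; newer versions also offer a builder, and the class/method names around it are illustrative):

    import java.io.File;
    import java.io.IOException;
    import java.io.InputStream;

    import org.apache.commons.io.IOUtils;
    import org.apache.commons.io.output.DeferredFileOutputStream;

    public class StdoutCapture {

        private static final int THRESHOLD = 1024 * 1024; // keep up to 1 MB in memory

        public static DeferredFileOutputStream capture(InputStream stdout) throws IOException {
            File spillFile = File.createTempFile("ssh-stdout", ".tmp");
            DeferredFileOutputStream out = new DeferredFileOutputStream(THRESHOLD, spillFile);
            try {
                IOUtils.copy(stdout, out);    // buffers in memory, spills to the file past the threshold
            } finally {
                out.close();
            }

            if (out.isInMemory()) {
                byte[] data = out.getData(); // everything fit under the threshold
                // hand the byte[] to the implementor's callback ...
            } else {
                File file = out.getFile();   // data was spilled to the temp file
                // hand the file to the implementor's callback ...
            }
            return out;
        }
    }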
Can you avoid reading the stream until you know what you are going to do with it?
If you use this approach you can drop the data, read portions of it and write them to a database as you go, or read and process the data as you read it.
This way you would not need to read more than 1 MB (or less) at any one time.
