I have a Java application that gets requests to create XLSX files.
The application is multi-threaded, which means that 5 users can run a report simultaneously.
My issue is that when the report is huge and 5 users create reports together, I get this message: java.io.IOException: Could not create temporary directory '
This is probably caused by one of the 5 threads deleting java.io.tmpdir, which makes the other 4 threads fail.
How do I resolve that?
One of my suggested solutions is to give each thread a different java.io.tmpdir. Is that something that can be done?
One solution would be to have each thread append a unique prefix when creating its temporary directory, so that there is no concurrent modification of the same folder.
When implementing this, you have to consider how many requests can be processed simultaneously; you cannot create an unlimited number of directories.
Another option is to use a thread pool and a queue to hold requests when more arrive than you can process.
Or, if there is similarity in the content, you can create a template and only change some data dynamically, so only a clone needs to be written.
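
To illustrate the unique-prefix idea, here is a minimal sketch, assuming your reporting code can be pointed at a per-request scratch directory (the xlsx-report- prefix and method names are just examples):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Comparator;
    import java.util.stream.Stream;

    public class ReportTempDirs {

        // Each report request works in its own directory under java.io.tmpdir.
        // createTempDirectory guarantees a unique name, so no two threads ever
        // share (or delete) each other's directory.
        static Path createReport() throws IOException {
            Path reportDir = Files.createTempDirectory("xlsx-report-");
            // ... build the workbook here, writing any scratch files into reportDir ...
            return reportDir.resolve("report.xlsx");
        }

        // Called after the report has been streamed back to the user:
        // removes only this request's directory, never java.io.tmpdir itself.
        static void deleteRecursively(Path dir) throws IOException {
            try (Stream<Path> paths = Files.walk(dir)) {
                paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
            }
        }
    }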
I would first check whether the methods you use to write those .xlsx files are thread safe.
Your threads may be racing to write the same files concurrently.
We have a multi-threaded application and an integration with DITA-OT through Ant, which is called from Java.
We have started to face an issue with multiple concurrent Ant calls to DITA-OT to run transformations: when two or more threads run the Ant call from Java to DITA-OT, it randomly starts to generate an error reading the build_preprocess file.
It seems that at the same time one thread is trying to read the build_preprocess, another thread is deleting it; the build_preprocess is generated in the folder DITA-OT\plugins\org.dita.base.
Is there a way to fix the issue, or to have DITA-OT support concurrent requests to run transformations?
This problem:
Failed to read job file: Content is not allowed in trailing section.
might occur if the same temporary files folder is used by two parallel processes.
So just make sure the "dita.temp.dir" and "output.dir" parameters are set to distinct values for the parallel processes so they do not use the same temporary files folder or output folder.
https://www.dita-ot.org/dev/parameters/parameters-base.html#ariaid-title1
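
For example, when launching the transformation from Java you could give each run its own directories. A rough sketch, assuming the build is started with the ant command via ProcessBuilder (only the dita.temp.dir and output.dir parameter names come from the documentation linked above; everything else is illustrative):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class DitaOtLauncher {

        // Launches one DITA-OT transformation with its own temp and output folders,
        // so parallel runs never share dita.temp.dir or output.dir.
        public static Process runTransformation(Path buildFile, Path outputBase) throws IOException {
            Path tempDir = Files.createTempDirectory("dita-temp-");          // unique per run
            Path outDir  = Files.createTempDirectory(outputBase, "dita-out-");
            return new ProcessBuilder(
                    "ant", "-f", buildFile.toString(),
                    "-Ddita.temp.dir=" + tempDir,
                    "-Doutput.dir=" + outDir)
                .inheritIO()
                .start();
        }
    }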
I have a Java application that creates multiple threads. There is 1 producer thread which reads from a 10 GB file, parses that information, creates objects from it, and puts them into multiple blocking queues (5 queues).
The other 5 consumer threads each read from their own BlockingQueue. Each consumer thread then writes to an individual file, so 5 files in total get created. It takes around 30 minutes to create all the files.
The problem:
The threads are writing to an external mount directory on a Linux box. We've experienced problems where other Linux mounts have gone down and applications crashed, so I want to prevent that in this application.
What I would like to do is keep checking whether the mount (directory) exists before writing to it. I'm assuming that if the directory goes down it will throw a FileNotFoundException. If that is the case, I want it to keep checking whether the directory is there for about 10-20 minutes before completely crashing. Because I don't want to have to read the 10 GB file again, I want the consumer threads to be able to pick up from where they last left off.
What I'm not sure about is what would be best practice:
Is it best to check whether the directory exists in the main class before creating the threads, or to check in each consumer thread?
If I keep checking whether the directory exists in each consumer thread, it seems like repeated code. I can check in the main class, but it takes 30 minutes to create these files; what if the mount goes down during those 30 minutes? If I'm only checking in the main class whether the directory exists, the application will crash. Or, if I'm already writing to a directory, is it impossible for an external directory to go down? Does it get locked?
Thank you.
We have something similar in our application, but in our case we are running a web app, and if our mounted file system goes down we just throw an exception. We'd like to do something more elegant, like you do...
I would recommend using a combination of the following patterns: State and CircuitBreaker (I believe CircuitBreaker is a more specific version of the State pattern), plus Observer/Observable.
These would work in the following way...
Create something that represents your file system, maybe a class called MountedFileSystem, and make all your write calls through this particular class.
This class will catch every FileNotFoundException, and when one occurs, the CircuitBreaker gets triggered. This change works like the State pattern: one state is when things are working 'fine', the other state is when they aren't, meaning the mount has gone away.
Then, in the background, I would have a task running on a thread that checks whether the actual underlying file system is back. When the file system is back, change the state in the MountedFileSystem and fire an event (Observer/Observable) to try writing the files to disk again.
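
A rough sketch of what that MountedFileSystem might look like (class names, timings, and the recovery details are mine, not a finished implementation):

    import java.io.FileNotFoundException;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.NoSuchFileException;
    import java.nio.file.Path;
    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.ScheduledFuture;
    import java.util.concurrent.TimeUnit;

    class MountedFileSystem {
        enum State { AVAILABLE, UNAVAILABLE }

        private final Path mountPoint;
        private volatile State state = State.AVAILABLE;
        private volatile ScheduledFuture<?> probe;
        private final List<Runnable> mountBackListeners = new CopyOnWriteArrayList<>();
        private final ScheduledExecutorService checker = Executors.newSingleThreadScheduledExecutor();

        MountedFileSystem(Path mountPoint) { this.mountPoint = mountPoint; }

        // All writes go through here, so this is the single place the breaker trips.
        void write(Path file, byte[] data) throws IOException {
            if (state == State.UNAVAILABLE) {
                throw new IOException("Mount is down: " + mountPoint);       // circuit is open
            }
            try {
                Files.write(file, data);
            } catch (FileNotFoundException | NoSuchFileException e) {
                state = State.UNAVAILABLE;                                   // trip the breaker
                probe = checker.scheduleWithFixedDelay(this::probeMount, 10, 10, TimeUnit.SECONDS);
                throw e;
            }
        }

        // Observer/Observable part: callers register to be told when the mount is back.
        void onMountBack(Runnable listener) { mountBackListeners.add(listener); }

        private void probeMount() {
            if (Files.isDirectory(mountPoint)) {
                state = State.AVAILABLE;                                     // close the breaker
                probe.cancel(false);                                         // stop probing once recovered
                mountBackListeners.forEach(Runnable::run);
            }
        }
    }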
And as yuan quigfei stated, I am fairly certain you're going to have to rewrite those files. I just don't see how you could resume writing to them, but perhaps someone else has an idea.
1. Write a method that detects whether the folder exists.
2. Call this method before actually writing.
3. Create the 5 threads based on step 2. Once you detect that a file no longer exists, you seem to have no choice but to rewrite it. Of course, you don't need to re-read the input if all your content is already in memory (which requires a lot of memory).
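
A minimal sketch of such a check-and-wait method, using the 10-20 minute window mentioned in the question (the timeout and poll interval values are arbitrary):

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.time.Duration;
    import java.time.Instant;

    final class MountCheck {
        /**
         * Returns true as soon as the directory is visible again, or false
         * after waiting for the whole timeout. Call this before each write.
         */
        static boolean waitForDirectory(Path dir, Duration timeout, Duration pollInterval)
                throws InterruptedException {
            Instant deadline = Instant.now().plus(timeout);
            while (Instant.now().isBefore(deadline)) {
                if (Files.isDirectory(dir)) {
                    return true;
                }
                Thread.sleep(pollInterval.toMillis());
            }
            return Files.isDirectory(dir);
        }
    }

    // Usage, e.g. in each consumer thread before writing:
    //   if (!MountCheck.waitForDirectory(outputDir, Duration.ofMinutes(15), Duration.ofSeconds(30))) {
    //       throw new IllegalStateException("Mount still unavailable after 15 minutes");
    //   }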
I'm working on a multithreaded server in Java.
The server monitors a directory of files. Clients can ask the server:
to download a file from the server directory
to upload a new version of an already existing file to the server, overwriting the old version in the server directory.
To do the transfers, I'm planning to use FileChannels and SocketChannels, using the methods transferFrom and transferTo. According to the documentation, these two methods are thread safe.
The thing is that a single call to these two methods might not be sufficient to read/write the file entirely.
The problem arises if there is more than one request for the same file at the same time. In this scenario, multiple threads could be doing read/write operations on the same file. Now, the single calls to transferFrom/transferTo are thread safe according to the Java documentation, but a single call to these two methods might not be sufficient to read/write the file entirely. If thread A is replying to a download request and thread B is replying to an upload request that refer to the same file, it could happen that:
Thread A starts reading from the file
In thread A, for some reason the read call returns before the EOF
Thread B overwrites the entire file with a single write call
Thread A continues reading from the file
In this case, the downloading client receives a portion of the old version and a portion of the new version.
To solve this I think I should be using some sort of locking, but I'm not sure how to do it in an efficient way. I could create two synchronized methods for reading and writing, but that obviously creates too much contention.
The best solution I have in mind is to use lock striping. Before doing any read/write operation, a hash based on the filename is calculated. Then, the lock at position lockArr[hash % numOfLocks] is acquired.
I also think I should be using ReadWriteLocks, since multiple simultaneous reads should be allowed.
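
Roughly, the striping I have in mind would look like this (just a sketch; numOfLocks and the transfer callbacks are placeholders for the actual transferTo/transferFrom loops):

    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    class StripedFileLocks {
        private final ReadWriteLock[] lockArr;

        StripedFileLocks(int numOfLocks) {
            lockArr = new ReadWriteLock[numOfLocks];
            for (int i = 0; i < numOfLocks; i++) {
                lockArr[i] = new ReentrantReadWriteLock();
            }
        }

        private ReadWriteLock lockFor(String fileName) {
            // Math.floorMod avoids negative indices for negative hash codes
            return lockArr[Math.floorMod(fileName.hashCode(), lockArr.length)];
        }

        // Downloads only need the read lock, so simultaneous reads are allowed.
        void download(String fileName, Runnable transferToLoop) {
            ReadWriteLock lock = lockFor(fileName);
            lock.readLock().lock();
            try { transferToLoop.run(); } finally { lock.readLock().unlock(); }
        }

        // Uploads take the write lock, blocking readers of files in the same stripe.
        void upload(String fileName, Runnable transferFromLoop) {
            ReadWriteLock lock = lockFor(fileName);
            lock.writeLock().lock();
            try { transferFromLoop.run(); } finally { lock.writeLock().unlock(); }
        }
    }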
Now, this is my analysis of the problem and I could be completely wrong. Is there any better solution to this?
Locking means that somebody has to wait for somebody else -- not the best solution.
When the client uploads a file, you should write it out to a temp file on the same disk (usually in the same directory), and then when the file upload is done:
Rename the old version to a temporary name. Any current readers should be forced to close the old one, re-open the temp version, and seek to the correct position.
Rename the uploaded file to the target file name.
Delete the temp version of the old file when any readers are done with it.
In a typical implementation, you'd need a centralized class (let's call it ConcurrentFileAccessor) to manage the interactions between threads.
Readers would need to register with this class and synchronize on some object during the actual read operation. When an upload completes, the writer would have to claim all those locks to block reads, close all the open files, rename the old version, reopen, seek, and then release the locks to allow the readers to continue.
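
The atomic-replace part can be done with java.nio alone; a sketch, assuming the upload has already been received into memory (coordinating the open readers remains the job of the ConcurrentFileAccessor described above):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    final class AtomicReplace {
        // Writes the uploaded bytes next to the target and renames them over it,
        // so a reader opens either the complete old version or the complete new one.
        static void replace(Path target, byte[] uploadedBytes) throws IOException {
            Path temp = Files.createTempFile(target.getParent(), target.getFileName().toString(), ".upload");
            Files.write(temp, uploadedBytes);
            // A move within the same directory is a rename, which POSIX file
            // systems perform atomically; readers never observe a half-written file.
            Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }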
I am working on a concurrent file download process, but I'm not sure what approach to take.
About:
An application bundles a bunch of files together into a zip file. The files are usually available on the hard drive in a common location (for example /tmp). However, there are cases when the files are not there and need to be downloaded from a remote HTTP server.
Question:
How can I download multiple files concurrently and ensure that NO other thread (bundling files) downloads the same file at the same time?
Moreover, how can I ensure that in the case of multiple applications running at the same time (remember that the files are all located in a common location), no instance of the application downloads the same file at the same time?
Please describe a strategy and perhaps a way to implement it. Perhaps a solution to the above issue already exists.
Thank you!
You could use a queue or a DB for the files that need to be downloaded: just keep a 'status' column, and a thread will mark the file as 'fetching'; when done, it will set it to 'done'. Keep a last-change timestamp, and if a file has been downloading for a long time, stop or restart the download.
Using a database for this file queue might ensure that other apps don't fetch the same file multiple times (and you could persist downloads, etc.). Also, you can have multiple downloads running, and the DB could be used to track download speed, progress, etc.
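
A sketch of the status-column idea with plain JDBC; the downloads table, its columns, and how you obtain the Connection are assumptions. The point is only that the UPDATE either claims the row or affects zero rows, so two threads, or two application instances, can never both start fetching the same file:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    final class DownloadClaims {
        /**
         * Tries to claim a file for downloading. Returns true only for the one
         * caller (thread or process) whose UPDATE actually changed the row.
         */
        static boolean tryClaim(Connection db, String fileName) throws SQLException {
            String sql = "UPDATE downloads "
                       + "SET status = 'fetching', last_change = CURRENT_TIMESTAMP "
                       + "WHERE file_name = ? AND status = 'pending'";
            try (PreparedStatement ps = db.prepareStatement(sql)) {
                ps.setString(1, fileName);
                return ps.executeUpdate() == 1;   // 0 means someone else got it first
            }
        }

        static void markDone(Connection db, String fileName) throws SQLException {
            try (PreparedStatement ps = db.prepareStatement(
                    "UPDATE downloads SET status = 'done', last_change = CURRENT_TIMESTAMP WHERE file_name = ?")) {
                ps.setString(1, fileName);
                ps.executeUpdate();
            }
        }
    }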
In the future your question should be formatted with specific code and a specific problem. Your question is very broad and invites a discussion (better suited for chat) rather than a single answer someone else might reuse.
Here is a possible strategy:
In the case of a single app: have some sort of dispatcher thread which reads work from a queue (it could be a persisted queue too, like a DB table or similar) and spawns a new thread for each item that was read from the queue. By read I mean read and remove from the queue.
Have that queue stored in a shared DB (or any shared storage). In this case there may be a separate, single dispatcher app which just reads work items or work portions from the DB and gives work to worker apps. Each worker app asks the dispatcher app for work, which ensures that only the dispatcher app reads from the DB (or whatever other central storage you decide to use). This in turn eliminates the need to synchronize access to your DB (permanent storage).
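
For the single-app case, a minimal dispatcher sketch (the queue contents and the worker logic are placeholders; a fixed thread pool stands in for "spawns new threads"):

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.LinkedBlockingQueue;

    final class Dispatcher implements Runnable {
        private final BlockingQueue<String> workQueue = new LinkedBlockingQueue<>();
        private final ExecutorService workers = Executors.newFixedThreadPool(4);

        void submit(String workItem) { workQueue.add(workItem); }

        @Override
        public void run() {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    // take() reads AND removes, so no two workers ever see the same item
                    String item = workQueue.take();
                    workers.submit(() -> process(item));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        private void process(String item) {
            // placeholder for the actual download / bundling work
            System.out.println("processing " + item);
        }
    }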
We have an application that reads files from a particular folder, processes them, and copies (some business logic) them to another folder.
The problem here is that when there is a very large number of files to be processed, running a single instance of the application, or a single thread, is no longer enough to process these files.
One approach we have for this is to start multiple instances of the application (I feel something is wrong with this approach; suggest an alternative if there is one).
Whether spawning threads or starting multiple instances of the application, care should be taken that if a thread reads one file and starts processing it, another thread does not pick it up.
We are trying to achieve this by having a database table with the list of file names in the folder, so that when a thread first reads a file name from the table, we change its status to in-process or completed and pessimistically lock the row so that other threads cannot read it.
Is there any better solution to the problem ?
You can use most of your existing implementation as the front-end processor to feed file streams to worker threads that you can start/stop as demand dictates. Only the front-end thread opens files, so there is no possibility of one worker interfering with another.
EDIT: Added the word 'no' as it changes the meaning quite a bit...
Also have a look at JDK 7. It has a new file I/O API and a fork/join framework which might help.
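
A sketch of that front-end idea: a single thread lists and opens the files, then hands the already-open streams to a pool of workers (the pool size, directory, and process() body are placeholders):

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class FrontEndProcessor {
        public static void main(String[] args) throws IOException {
            ExecutorService workers = Executors.newFixedThreadPool(4);
            Path inbox = Paths.get("/path/to/inbox");

            // Only this thread touches the directory and opens files,
            // so no two workers can ever grab the same file.
            try (DirectoryStream<Path> files = Files.newDirectoryStream(inbox)) {
                for (Path file : files) {
                    InputStream in = Files.newInputStream(file);
                    workers.submit(() -> process(file, in));
                }
            }
            workers.shutdown();
        }

        private static void process(Path file, InputStream in) {
            try (InputStream stream = in) {
                // placeholder: apply the business logic to stream and copy to the other folder
                System.out.println("processing " + file);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }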
Take a look at Apache Camel (http://camel.apache.org), and its File component (http://camel.apache.org/file2.html). Using Camel allows you to very easily define a set of processing instructions to consume files in a directory atomically, and also to configure a thread pool to deal with multiple files at the same time. Camel in Action's a great book to get you started.
What you describe reminds me of the classical style of development on UNIX.
In this classical style, you would move a file to a work-in-progress directory so that other processes do not pick it up. In general, you could use one directory per processing state and then move files from state to state.
This works essentially because file moves are atomic (at least on Unix file systems and NTFS).
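
A small sketch of the claim-by-move step (directory names are examples); if two threads or processes race for the same file, only one move succeeds and the loser simply skips it:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    final class ClaimByMove {
        /**
         * Moves the file from the inbox to the work-in-progress directory.
         * Returns the new location, or null if another worker already claimed it.
         */
        static Path claim(Path file, Path inProgressDir) {
            try {
                return Files.move(file, inProgressDir.resolve(file.getFileName()),
                                  StandardCopyOption.ATOMIC_MOVE);
            } catch (java.io.IOException alreadyClaimedOrGone) {
                return null;   // someone else moved it first (or it vanished)
            }
        }
    }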
What is nice about this approach is that it is pretty easy to handle problematic situations like crashes, and it automatically comes with a management interface everyone is familiar with (the filesystem GUI, ls, Windows Explorer, ...).