Resuming a multithreaded program that parses a file after crashing - Java

I have a single-threaded program that parses the contents of a file and gives me an output. For the single-threaded program, I'm creating a dump file and updating it regularly for each line read, so that even if the system crashes the program will resume from the last execution point. Now I want to implement this as a multithreaded program, but I'm confused about what to do in the case of a crash. Since multiple threads will be running in parallel, how can I resume from the last execution point? Any suggestions?

Using multiple threads to read a file is not a good idea. If your processing logic is the time-consuming part, you can use one thread to read content from the file and forward it to a pool of processing threads.
Crash recovery is inherently a long operation (Oracle can need several hours to recover), so trying to use multithreading to speed up the recovery procedure itself is not going to help.
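A minimal sketch of that reader-plus-worker-pool layout, assuming each line can be processed independently: workers record completed line numbers, and a watermark of the highest contiguous completed line is persisted so a restart can skip ahead. The file names, pool size, and process() body are illustrative.

import java.io.*;
import java.nio.file.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: one reader thread feeds a worker pool; the highest contiguous
// completed line number is persisted so a restart can skip already-done work.
public class ResumableParser {
    private static final Path INPUT = Paths.get("input.txt");        // illustrative
    private static final Path CHECKPOINT = Paths.get("checkpoint.txt");

    public static void main(String[] args) throws Exception {
        long resumeFrom = readCheckpoint();                // 0 when no checkpoint exists
        ExecutorService pool = Executors.newFixedThreadPool(4);
        ConcurrentSkipListSet<Long> done = new ConcurrentSkipListSet<>();
        AtomicLong watermark = new AtomicLong(resumeFrom); // next line still owed

        try (BufferedReader in = Files.newBufferedReader(INPUT)) {
            String line;
            long lineNo = 0;
            while ((line = in.readLine()) != null) {
                long current = lineNo++;
                if (current < resumeFrom) continue;        // finished before the crash
                String payload = line;
                pool.submit(() -> {
                    process(payload);                      // your parsing logic
                    done.add(current);
                    advanceWatermark(done, watermark);
                });
            }
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    // Move the watermark past every contiguously completed line, then persist it.
    // Lines finished out of order simply wait in 'done' until the gap closes.
    private static synchronized void advanceWatermark(ConcurrentSkipListSet<Long> done,
                                                      AtomicLong watermark) {
        boolean moved = false;
        while (done.remove(watermark.get())) {
            watermark.incrementAndGet();
            moved = true;
        }
        if (moved) {
            try {  // a hardened version would write a temp file and atomically move it
                Files.write(CHECKPOINT, Long.toString(watermark.get()).getBytes());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    private static long readCheckpoint() throws IOException {
        return Files.exists(CHECKPOINT)
                ? Long.parseLong(Files.readString(CHECKPOINT).trim()) : 0L;
    }

    private static void process(String line) { /* parse one line */ }
}

The key point is that the checkpoint only ever advances past a contiguous prefix of completed lines, so at worst a crash re-processes a few in-flight lines rather than losing any.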

Related

How can I check if an executable JAR has started/finished running?

I have multiple executable JARs, let's call them task1.jar, task2.jar, task3.jar, etc...
They mainly access the database and do housekeeping tasks on the server.
They are run by cron jobs, one after another, early in the morning every day. The cron jobs are set about 15 minutes apart.
The cron jobs simply call:
java -jar taskX.jar
Depending on the data in the database, how long each takes to finish will vary.
The task executions must not overlap; otherwise, when they access the database at the same time, the server will crash. It has happened before, when one task took too long to run and hadn't finished before another task started.
So my question is... is there any way to check whether a JAR has started or finished, so that the next JAR can be delayed?
I would prefer to resolve this with Java methods over changing any configs on the server, to which I have limited access.
Edit: #mksmanjit Sorry, I didn't mention this before. There is also one complication - not all of them are run every day. For example, task1.jar and task3.jar may run every day, task2.jar may run every two days, and task4.jar may run every Mon, Wed, Fri, etc... So the situation can get quite complicated.
A couple of ways, as I see it:
1. Let all the JARs (or tasks) write logs to a common or separate log file. This is usually feasible if you only want to capture minor details such as start time, stop time, result, etc., assuming the format you keep is manually readable.
2. As an extension of the above, create a table in the DB to track task activity, with the task name and details such as triggered by, start time, end time, result, etc. (see the sketch after this list).
3. Alternatively, create database triggers that monitor events (such as writes or reads) on a DB table, so that whenever your task executes, the trigger also fires and performs whatever actions you want, such as sending an e-mail, writing to a table, or printing a simple log.
4. Let the JARs (or tasks) also print logs to the console, assuming they are designed to be viewed from a console or UI at runtime.
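A minimal sketch of option 2, run at the top of each task. The task_status table, its columns, and the JDBC URL are assumptions for illustration, not an existing schema:

import java.sql.*;

// Sketch of the DB-table approach: refuse to start while another task is
// RUNNING, otherwise record our own start/end. Table and column names
// (task_status, task_name, status, ...) are assumed for illustration.
public class TaskGate {
    private static final String URL = "jdbc:mysql://localhost/housekeeping";

    public static void main(String[] args) throws Exception {
        String taskName = args[0];                       // e.g. "task2.jar"
        try (Connection c = DriverManager.getConnection(URL, "user", "pass")) {
            while (anotherTaskRunning(c)) {
                Thread.sleep(60_000);                    // wait and re-check
            }
            markStarted(c, taskName);
            try {
                runTask();                               // the actual housekeeping work
            } finally {
                markEnded(c, taskName);
            }
        }
    }

    private static boolean anotherTaskRunning(Connection c) throws SQLException {
        try (PreparedStatement ps = c.prepareStatement(
                "SELECT COUNT(*) FROM task_status WHERE status = 'RUNNING'");
             ResultSet rs = ps.executeQuery()) {
            rs.next();
            return rs.getInt(1) > 0;
        }
    }

    private static void markStarted(Connection c, String task) throws SQLException {
        try (PreparedStatement ps = c.prepareStatement(
                "INSERT INTO task_status (task_name, status, started_at) " +
                "VALUES (?, 'RUNNING', NOW())")) {
            ps.setString(1, task);
            ps.executeUpdate();
        }
    }

    private static void markEnded(Connection c, String task) throws SQLException {
        try (PreparedStatement ps = c.prepareStatement(
                "UPDATE task_status SET status = 'DONE', ended_at = NOW() " +
                "WHERE task_name = ? AND status = 'RUNNING'")) {
            ps.setString(1, task);
            ps.executeUpdate();
        }
    }

    private static void runTask() { /* housekeeping work goes here */ }
}

Note that the check followed by the insert is not atomic; in practice you would close that race with a unique constraint or a SELECT ... FOR UPDATE, but the shape of the idea is the same.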

What would be best practice if I am trying to constantly check whether a directory exists? Java

I have a Java application that creates multiple threads. There is 1 producer thread which reads from a 10 GB file, parses that information, creates objects from it, and puts them into multiple blocking queues (5 queues).
The other 5 consumer threads each read from a BlockingQueue (each consumer thread has its own BlockingQueue). Each consumer thread then writes to an individual file, so 5 files in total get created. It takes around 30 minutes to create all the files.
The problem:
The threads are writing to an external mount directory on a Linux box. We've experienced problems where other Linux mounts have gone down and applications crashed, so I want to prevent that in this application.
What I would like to do is keep checking whether the mount (directory) exists before writing to it. I'm assuming that if the directory goes down, it will throw a FileNotFoundException. If that is the case, I want the application to keep checking whether the directory is there for about 10-20 minutes before completely crashing. Because I don't want to have to read the 10 GB file again, I want the consumer threads to be able to pick up from where they last left off.
What I'm not sure about is what would be best practice:
Is it best to check whether the directory exists in the main class before creating the threads? Or to check in each consumer thread?
If I keep checking whether the directory exists in each consumer thread, it seems like repeated code. I could check in the main class, but it takes 30 minutes to create these files. What if the mount goes down during those 30 minutes? If I'm only checking in the main class, the application will crash. Or, if I'm already writing to a directory, is it impossible for the external directory to go down? Does it get locked?
Thank you.
We have something similar in our application, but in our case we are running a web app, and if our mounted file system goes down we just throw an exception. We want to do something more elegant, like you do...
I would recommend using a combination of the following patterns: State, CircuitBreaker (which I believe is a more specific version of the State pattern), and Observer/Observable.
These would work in the following way...
Create something that represents your file system, maybe a class called MountedFileSystem. Make all your write calls through this particular class.
This class will catch all FileNotFoundExceptions, and when one occurs, the CircuitBreaker gets triggered. The change is like the State pattern: one state is when things are working 'fine', the other is when things aren't working 'fine', meaning the mount has gone away.
Then, in the background, I would have a task running on a thread that checks whether the actual underlying file system is back. When it is, change the state in the MountedFileSystem and fire an event (Observer/Observable) to try writing the files to disk again.
And as yuan quigfei stated, I am fairly certain you're going to have to rewrite those files. I just don't see being able to restart writing to them, but perhaps someone else has an idea.
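A minimal sketch of the MountedFileSystem idea, where the class name, mount path, poll interval, and listener shape are all illustrative assumptions rather than an existing library:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.concurrent.*;

// Sketch of the circuit-breaker idea: writes go through this class; an
// IOException (e.g. NoSuchFileException when the mount is gone) trips the
// breaker, a background task probes the mount, and listeners are notified
// when it comes back.
public class MountedFileSystem {
    public enum State { FINE, BROKEN }

    private final Path mountDir;
    private volatile State state = State.FINE;
    private final CopyOnWriteArrayList<Runnable> onRecovered = new CopyOnWriteArrayList<>();
    private final ScheduledExecutorService probe = Executors.newSingleThreadScheduledExecutor();

    public MountedFileSystem(Path mountDir) {
        this.mountDir = mountDir;
    }

    public void addRecoveryListener(Runnable listener) {
        onRecovered.add(listener);
    }

    public void write(String fileName, String content) throws IOException {
        if (state == State.BROKEN) {
            throw new IOException("Mount is down, breaker is open: " + mountDir);
        }
        try {
            Files.write(mountDir.resolve(fileName),
                        content.getBytes(StandardCharsets.UTF_8),
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            trip();          // breaker opens; callers fail fast until recovery
            throw e;
        }
    }

    private synchronized void trip() {
        if (state == State.BROKEN) return;   // already tripped, probe is running
        state = State.BROKEN;
        // Poll every 30 seconds until the directory is reachable again. A task
        // that throws is not rescheduled, which is how the probe stops itself.
        probe.scheduleWithFixedDelay(() -> {
            if (Files.isDirectory(mountDir)) {
                state = State.FINE;
                onRecovered.forEach(Runnable::run);  // Observer: retry writes
                throw new CancellationException();   // stop this repeating task
            }
        }, 30, 30, TimeUnit.SECONDS);
    }
}

Consumers register a recovery listener that re-attempts their pending writes, which is the Observer half of the design.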
1. Write a method that detects whether the folder exists.
2. Call this method before the actual write (see the sketch below).
3. Create the 5 threads based on step 2. Once you detect that the directory no longer exists, you seem to have no choice but to rewrite the files. Of course, you don't need to re-read the input if all your content is already in memory (given enough memory).
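A small sketch of steps 1 and 2, waiting up to a deadline for the mount to come back before giving up. The 10-20 minute window mirrors the question; the names and the poll interval are illustrative:

import java.nio.file.*;
import java.util.concurrent.TimeUnit;

// Sketch: block until the directory is reachable again, or give up after
// a deadline (the question suggests 10-20 minutes).
public final class DirWatch {
    public static boolean waitForDirectory(Path dir, long maxMinutes)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MINUTES.toNanos(maxMinutes);
        while (System.nanoTime() < deadline) {
            if (Files.isDirectory(dir)) {
                return true;              // mount is back, safe to retry writing
            }
            TimeUnit.SECONDS.sleep(15);   // poll interval
        }
        return false;                     // still down: let the caller fail cleanly
    }
}

Each consumer thread would call waitForDirectory() before retrying a failed write, which avoids duplicating the polling logic across threads.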

Synchronizing process execution in a cluster with 2 nodes in Java

I have a cluster with 2 nodes and a shared file system. Each of these nodes runs a Java process that executes periodically. That process accesses the file system, handles some files, and deletes them after processing.
The problem here is that only one of the scheduled processes should access the files. The other process should skip the execution if the first process is running.
My first attempt to solve this issue was to create a hidden .lock file. When the first process starts executing, it moves the file into another folder and starts handling files. When the other scheduled process starts, it first checks whether the .lock file is present, and if it isn't, it skips the execution. When the first process finishes, it moves the .lock file back to its original folder. I was using the Files.move() method with the ATOMIC_MOVE option, but after a certain amount of time I got unexpected behaviour.
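For illustration, the move-based protocol described above might look roughly like this; the paths are placeholders:

import java.nio.file.*;

// Sketch of the .lock-move protocol described above. Paths are illustrative.
// Moving the lock file doubles as acquiring it: only one node's move succeeds.
public class LockFileGate {
    private static final Path LOCK = Paths.get("/shared/jobs/.lock");
    private static final Path HELD = Paths.get("/shared/jobs/held/.lock");

    public static void main(String[] args) throws Exception {
        try {
            // Atomic move: if the other node already took the lock, this throws.
            Files.move(LOCK, HELD, StandardCopyOption.ATOMIC_MOVE);
        } catch (NoSuchFileException | FileAlreadyExistsException e) {
            return;                       // lock not available: skip this run
        }
        try {
            processFiles();               // handle and delete the shared files
        } finally {
            Files.move(HELD, LOCK, StandardCopyOption.ATOMIC_MOVE);
        }
    }

    private static void processFiles() { /* ... */ }
}

One likely culprit for the unexpected behaviour: ATOMIC_MOVE is only guaranteed within a file system that actually supports it, and network mounts may not honour that atomicity, so two nodes can occasionally both think they hold the lock.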
My second attempt was to use a distributed lock such as Hazelcast's. I did some tests and it seems OK, but this solution feels a bit complicated for a task this simple.
My question is: is there any smarter/simpler solution to this problem, or is my only option to use Hazelcast? How would you solve this issue?

How to execute multiple java programs in parallel from within a single program

I have 4 Java programs (all are alike), and all of them have an infinite loop running in them. Now I want to execute all these programs in parallel from one more program. I want completely independent parallel processing. Is it possible?
It can be done using multithreading. From a single program, you can start all four of your other programs as threads; their parallel execution is then handled entirely by the thread scheduler.
You can still make the threads sleep and wait as your requirements dictate.
"A multithreaded program contains two or more parts that can run concurrently, and each part can handle a different task at the same time, making optimal use of the available resources."

I/O completion ports and stdout processing

I'm using I/O completion ports for a process management library (yes, there's a reason for this). You can find the source for what I'm talking about here: https://github.com/jcommon/process/blob/master/src/main/java/jcommon/process/platform/win32/Win32ProcessLauncher.java (take a look at lines 559 and 1137 -- yes, that class needs to be refactored and cleaned up).
I'm launching a child process and using named pipes (not anonymous pipes, because I need asynchronous, overlapped ReadFile()/WriteFile()) in order to process the child process' stdout and stderr. This mostly works. In a test, I launch 1,000 concurrent processes and monitor their output, ensuring they emit the proper information. Typically either all 1,000 work fine, or 998 do, leaving a couple with problems.
Those couple of processes show that not all of their messages are being received. I know the message is being output, but the thread processing GetQueuedCompletionStatus() for that process returns from the read with ERROR_BROKEN_PIPE.
The expected behavior is that the OS (or the C libs) would flush any remaining bytes in the stdout buffer upon process exit. I would then expect those bytes to be queued to my IOCP before getting a broken pipe error. Instead, those bytes seem to disappear and the read completes with ERROR_BROKEN_PIPE -- which in my code causes it to initiate the teardown of the child process.
I wrote a simple application to test and figure out the behavior (https://github.com/jcommon/process/blob/master/src/test/c/stdout-1.c). This application disables buffering on stdout so all writes should effectively be flushed immediately. Using that program in my tests yields the same issues as launching "cmd.exe /c echo hi". And at any rate, shouldn't the application (or OS?) flush any remaining bytes on stdout when the process exits?
The source is in Java, using direct-mapped JNA, but should be fairly easy for C/C++ engineers to follow.
Thanks for any help you can provide!
Are you sure that the broken pipe error isn't occurring with a non-zero ioSize? If ioSize is not zero, then you should process the data that was read as well as note that the file is now closed.
My C++ code which does this basically ignores ERROR_BROKEN_PIPE and ERROR_HANDLE_EOF and simply waits for either the next read attempt to fail with one of the above errors or the current read to complete with zero bytes read. The code in question works with files and pipes and I've never seen the problem that you describe when running the kind of tests that you describe.
