Python multithreading takes longer to execute multiple jar files - java

I am using ThreadPoolExecutor and giving the exact same task to each worker. The task is to run a jar file and do something with it. The problem I am facing is related to timings.
Case 1: I submit one task to the pool and the worker completes in 8 seconds.
Case 2: I submit the same task twice, and both workers complete in ~10.50 seconds.
Case 3: I submit the same task three times, and all three workers complete in ~13.38 seconds.
Case 4: I submit the same task four times, and all four workers complete in ~18.88 seconds.
If I replace the workers' task with time.sleep(8) (instead of running the jar file), then all 4 workers finish in ~8 seconds. Is this because the OS has to create a Java environment (start a JVM) before executing the Java code, and cannot do that in parallel?
Can someone explain why the execution time increases for the same task when run in parallel? Thanks :)
Here is how I am executing the pool:
from concurrent import futures
from subprocess import run, PIPE

# s3, jar_file_path, event_name, config_dir, tmp_file_path, MAX_WORKERS
# and log are defined elsewhere at module level
def transfer_files(file_name):
    raw_file_obj = s3.Object(bucket_name='foo-bucket', key=file_name)
    body = raw_file_obj.get()['Body']
    # prepare java command
    java_cmd = "java -server -ms650M -mx800M -cp {} commandline.CSVExport --sourcenode=true --event={} --mode=human_readable --configdir={}" \
        .format(jar_file_path, event_name, config_dir)
    # Run decoder_tool by piping in the encoded binary bytes
    log.info("Running java decoder tool for file {}".format(file_name))
    res = run(java_cmd, cwd=tmp_file_path, shell=True, input=body.read(), stderr=PIPE, stdout=PIPE)
    res_output = res.stderr.decode("utf-8")
    if res.returncode != 0:
        if 'Unknown event' in res_output:
            log.error("Exception occurred whilst running decoder tool")
            raise Exception("Unknown event {}".format(event_name))
    log.info("decoder tool output: \n" + res_output)

with futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    # add new task(s) into thread pool
    pool.map(transfer_files, ['fileA_for_workerA', 'fileB_for_workerB'])

Using multithreading doesn't necessarily mean things will execute faster. In Python you also have to contend with the GIL, which allows only one thread to execute Python bytecode at a time. Think of it like one person doing one task versus one person doing two tasks at the same time: he or she has to multitask, doing part of thread 1 first, then switching to thread 2, and so on. The more threads, the more switching the Python interpreter has to do.
The same thing might be happening for Java too. I don't use Java, but it may have similar issues. Here, Is Java a Compiled or an Interpreted programming language ? it says that the JVM converts Java bytecode on the fly, so the JVM probably has to deal with similar overhead.
As for time.sleep(8): it releases the thread for the whole duration and consumes almost no processor time, so it is easy for the scheduler to run a bunch of sleeping tasks side by side.

Related

Jmeter thread group duration

I have this situation:
Thread Group (n_threads=X,duration Y sec)
Loop
Java Sampler
When the test duration ends, JMeter does not stop the threads that are still making requests, and therefore the test does not terminate. How can this be solved?
This can happen when the response time of your "Java Sampler" is higher than your test duration. I would recommend introducing a reasonable timeout into your Java code so the samplers fail/exit instead of waiting forever for the response. If you have no idea what's going on in there, take a thread dump and see where your thread(s) are stuck.
As a workaround, the only way to "terminate" the test that I can think of would be:
Adding another Thread Group with 1 thread
Adding JSR223 Sampler with the following code:
sleep(5000) // wait for 5 seconds, amend according to your desired test duration
log.info('Exceeded test duration')
System.exit(1) // the process will exit with non-zero exit code (error), change it to 0 if needed
See Apache Groovy - Why and How You Should Use It for more information on Groovy scripting concept in JMeter
Also be aware that this code terminates the whole JVM, so it will probably make sense to add the jmeter.save.saveservice.autoflush=true line to user.properties, as forcibly terminating the whole JVM might lead to some loss of results.

Java task scheduling for task items in MySql database with Multi-threading

I am writing a task scheduling module in Java Spring to handle task items which are stored in a MySQL database.
Schema structure of the Task table:
ID | TASK_UUID | TASK_CONTENT(VARCHAR) | CREATED_TS | UPDATED_TS | STATUS(NEW/PROCESSING/COMPLETE)
I would like to implement multiple task scheduler workers that take tasks from the Task table for execution. How can I ensure that the task schedulers do not pick up the same task for execution at the same time? Is there a good Java framework I can make use of?
#Edit 1:
The task execution module is designed to be run on different machines, so synchronized methods will not work.
#Edit 2:
Each machine will get a random or irregular number of tasks. So if an auto-increment sequence is used, the allocation size of the index would have to be irregular too; otherwise some tasks would never be handled.
#Edit 3:
Each machine runs a Quartz Scheduler, configured with a recurring job that fetches and executes tasks. The interval between job runs is about 10 seconds. My goal is to ensure that each machine's scheduler can fetch at least 10 tasks in every Quartz job run.
You could create the method getTask as a synchronized method:
Eg:
synchronized Task getTask() {
    // get a NEW task from the DB
    // update its status to PROCESSING
    // return the task
}
#Edit 1:
If so, just use a SELECT ... FOR UPDATE query to block other queries from accessing the same task row.
Eg:
SELECT * FROM Task t WHERE t.status = 'NEW' ORDER BY t.created_ts LIMIT 1 FOR UPDATE;
UPDATE Task SET status = 'PROCESSING' WHERE id = <the task id>;
You could create a procedure to wrap the queries.
You can work around the atomicity/transaction issue like this:
Use the id of your task, assuming it is auto-incremented. If you have three machines running the task scheduling, take the id modulo three and assign the tasks with results 0, 1, 2 to a fixed machine each. That way different machines won't interfere with each other (no race condition).
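That modulo partitioning can be sketched in plain Java; the machine count, machine index, and task ids below are made up for illustration:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class TaskPartitioner {
    // each machine claims only the tasks whose id maps to its own index,
    // so no two machines ever compete for the same task
    static List<Long> tasksFor(int machineIndex, int machineCount, List<Long> taskIds) {
        return taskIds.stream()
                .filter(id -> id % machineCount == machineIndex)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> ids = LongStream.rangeClosed(1, 9).boxed().collect(Collectors.toList());
        System.out.println(tasksFor(0, 3, ids)); // [3, 6, 9]
        System.out.println(tasksFor(1, 3, ids)); // [1, 4, 7]
    }
}
```

The trade-off, as noted in Edit 2 above, is that the work split is only even if the ids are evenly distributed; a machine going down also leaves its share of tasks unhandled.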

Why does this ConcurrentHashMap stream only run half of entries at a time?

I'm mystified by this curiosity. (I'm using ConcurrentHashMap rather than ConcurrentSkipListSet because the class doesn't implement Comparable.) I've got plenty of free CPUs on the computer and there is no difference between the classes that are run in the stream (other than random number generation). It's suspicious that the even numbers run first (consistently).
Here are the code and output with nRuns=10. I would expect all 10 threads to fire up and run simultaneously (as they usually do in my other uses of ConcurrentHashMap). Could it be due to some static code in LIBSVM that gets called by SvmCrossValidator? That's all I can think of. It seems to me from a basic Java perspective this stream should launch all 10 processes at once.
// instantiate and run nRuns times
ConcurrentHashMap<Integer,SvmCrossValidator> scvMap = new ConcurrentHashMap<>();
for (int i = 0; i < nRuns; i++) {
    scvMap.put(i, new SvmCrossValidator(param, nrFold, inputFilename, nCases, nControls));
}
// parallel stream
scvMap.entrySet().parallelStream().forEach(entry -> {
    System.err.println("SVM run " + entry.getKey() + " started.");
    entry.getValue().run();
    System.err.println("SVM run " + entry.getKey() + " finished.");
});
Output:
SVM run 2 started.
SVM run 0 started.
SVM run 6 started.
SVM run 4 started.
SVM run 8 started.
LONG wait here while these first five grind away...
SVM run 8 finished.
SVM run 9 started.
SVM run 6 finished.
SVM run 7 started.
SVM run 0 finished.
SVM run 1 started.
SVM run 2 finished.
SVM run 3 started.
SVM run 4 finished.
SVM run 5 started.
SVM run 9 finished.
SVM run 1 finished.
SVM run 7 finished.
SVM run 3 finished.
SVM run 5 finished.
I think two things affect this. Firstly, add the thread name to your System.err calls to make the worker threads visible:
System.err.println("SVM run "+entry.getKey()+" started." +' '+Thread.currentThread().getName());
The system property java.util.concurrent.ForkJoinPool.common.parallelism controls the parallelism of the common ForkJoinPool - see the constructor or javadoc for ForkJoinPool:
private ForkJoinPool(byte forCommonPoolOnly)
However, parallelStream() creates a spliterator, which I think also makes splitting choices based on the size of the content; this also limits the number of concurrent sub-tasks, regardless of the size of the ForkJoinPool.
Changing java.util.concurrent.ForkJoinPool.common.parallelism may not affect the outcome unless you make nRuns much bigger, at which point more of the ForkJoinPool.commonPool-worker threads get used.
So, with a few tests on my machine:
nRuns=10 never made use of more than 5 worker threads, even with parallelism=128, and it even re-used the same worker threads although a lot more were available.
nRuns=1000 reached around 130 threads at the same time with parallelism=128. Note that the parallelism value is not exactly the number of worker threads.
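One way to see the pool's influence directly is to run the stream inside an explicitly sized ForkJoinPool: a parallel stream started from a task in such a pool executes on that pool's workers rather than the common pool. A small sketch (the map contents and the pool size of 4 are arbitrary choices for illustration):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;

public class ParallelismDemo {
    // runs a parallel stream over nTasks map entries inside a pool of the
    // given size and returns the names of the worker threads actually used
    static Set<String> runWithParallelism(int parallelism, int nTasks) throws Exception {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        for (int i = 0; i < nTasks; i++) map.put(i, i);

        Set<String> workers = ConcurrentHashMap.newKeySet();
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        // the stream is started from inside a pool task, so it is capped
        // at this pool's parallelism instead of the common pool's
        pool.submit(() ->
            map.entrySet().parallelStream()
               .forEach(e -> workers.add(Thread.currentThread().getName()))
        ).get();
        pool.shutdown();
        return workers;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("distinct worker threads: " + runWithParallelism(4, 200).size());
    }
}
```

With cheap per-entry work like this, the pool may still use fewer threads than its maximum, which matches the observation above that the spliterator, not just the pool size, decides how far the work is split.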

Rerunning a process/command after an interval after its last completion

I have a java program that I want to run every 2 hours, but I am not sure how long it will take to complete. In some cases it may take 1 minute, in others more than 3 hours. Running the same command every two hours would result in several instances running in parallel. Hence, I am trying to make it run 2 hours after the previous run finishes. One option is using Thread.sleep() in Java. Is there anything I can do in Ubuntu?
A very basic way to do this could be running your task via any scheduler, like cron/Quartz/etc. On each task completion, write/create a file to signify that the previous task is complete. On each task start, check for that completion marker; if the previous run has not completed, skip. Or you could go more complex and write another file to queue the task to run immediately after the current task is done. You could apply the same concept to a DB table that tracks processed tasks as well.
Of course, you could write your own task managing layer and implement your own scheduling framework hehe
The following shell script only runs my_java_program if no other instance is running:
[ "$(pgrep my_java_program)" ] || my_java_program
If your Java program is just a bare jar file, say mypgm.jar, then try:
[ "$(pgrep -f mypgm.jar)" ] || java -jar mypgm.jar
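If staying inside Java is an option, ScheduledExecutorService.scheduleWithFixedDelay already gives the "start the next run a fixed delay after the previous one finishes" semantics the question asks for, with no risk of overlapping instances. A minimal sketch, with the 2-hour interval shrunk to milliseconds for illustration:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class FixedDelayDemo {
    // counts how often a 50 ms "job" starts within totalMs; the 100 ms delay
    // is measured from the END of one run to the START of the next
    static int countRuns(long totalMs) throws InterruptedException {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        ses.scheduleWithFixedDelay(() -> {
            runs.incrementAndGet();
            try {
                Thread.sleep(50); // stand-in for the real work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 0, 100, TimeUnit.MILLISECONDS);
        Thread.sleep(totalMs);
        ses.shutdownNow();
        return runs.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("runs in 600 ms: " + countRuns(600));
    }
}
```

For the real use case you would replace the sleep with the actual work and schedule with a delay of 2 and TimeUnit.HOURS; unlike scheduleAtFixedRate, a run that takes 3 hours simply pushes the next start back.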

Separate processes, each run in multithread JAVA

I have 4 separate processes which need to go one after another.
1st process
2nd process
3rd process
4th process
Since every process is connected to the next, each process should run only after the one before it finishes.
Each process has its own duration, which will vary as the program's input data grows.
But a rough sketch would look like this:
Program runs
1st process - lasts 10 seconds
2nd process - has 300 HTTP GET requests, lasts 3 minutes
3rd process - has 600 HTTP GET requests, lasts 6 minutes
4th process - lasts 1 minute
The program is written in Java.
Thanks for any answer!
There is no concurrency support in the java API for your use case because what you're asking for is the opposite of concurrent. You have a set of four mutually dependent operations that need to be run in a specific order. You only need, and should probably only use, one thread to correctly handle this case.
It would be reasonable and prudent to put each operation in its own method or class, based on how complex the operations are.
If you insist on using multiple threads, your main thread should maintain a list of runnables. Iterate through the list: take the next runnable, create a new thread for it, start the thread, and then invoke join() on it. The main thread will block until the runnable completes, and the loop takes you through all the runnables in order. Again, there is no good reason to do this. There may or may not be a bad reason.
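The join() loop described above can be sketched like this (the step names are placeholders for the four real processes):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SequentialRunner {
    // runs each step on its own thread, but join() blocks until a step
    // finishes before the next one starts, so the order is strictly 1-2-3-4
    static List<String> runInOrder(List<String> names) throws InterruptedException {
        List<String> finished = Collections.synchronizedList(new ArrayList<>());
        for (String name : names) {
            Thread t = new Thread(() -> finished.add(name)); // stand-in for real work
            t.start();
            t.join(); // wait for this "process" before launching the next
        }
        return finished;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runInOrder(List.of("1st", "2nd", "3rd", "4th")));
        // prints [1st, 2nd, 3rd, 4th]
    }
}
```

Note that this is exactly as fast as calling the four methods on one thread, which is why the answer recommends the single-threaded approach.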

Categories

Resources