Running a multithreaded program using multiple hosts - java

I have a program which spins up thousands of threads. I am currently using one host for all the threads, which takes a lot of time. If I want to use multiple hosts (say 10 hosts, each running 100 different threads), how should I proceed?

Having thousands of threads on a single JVM sounds like a bad idea - you may spend most of your time context-switching instead of doing the actual work.
To split your work across multiple hosts, you cannot use threads managed by a single JVM. You'll need each host to expose an API that can receive a part of the work and return the result.
One approach would be to use Java RMI (Remote Method Invocation) for this, but your question really lacks many of the details that matter when deciding on an architecture.
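To make the idea concrete, here is a minimal, hypothetical RMI sketch; the WorkService interface, the "work" registry name, port 1099 and the host names are all placeholders, not something taken from the question:

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Remote API each worker host exposes (hypothetical name and method).
interface WorkService extends Remote {
    String process(String chunk) throws RemoteException;
}

// Runs on each worker host.
class WorkServer implements WorkService {
    public String process(String chunk) throws RemoteException {
        // placeholder: do the real work for this chunk here
        return "processed:" + chunk;
    }

    public static void main(String[] args) throws Exception {
        WorkService stub = (WorkService) UnicastRemoteObject.exportObject(new WorkServer(), 0);
        Registry registry = LocateRegistry.createRegistry(1099);
        registry.rebind("work", stub);
    }
}

// Runs on the coordinating host: looks up each worker and sends it a slice of the work.
class WorkClient {
    public static void main(String[] args) throws Exception {
        String[] hosts = {"host1", "host2"};   // placeholder host names
        for (String host : hosts) {
            WorkService worker = (WorkService) LocateRegistry.getRegistry(host, 1099).lookup("work");
            System.out.println(worker.process("chunk-for-" + host));
        }
    }
}
```

The coordinator can then split the work across hosts however it likes; retries, failure handling and load balancing would still have to be built on top of this.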

Creating 1000 threads in one JVM is very bad design; you need to minimise the thread count.
A high thread count will not give you any multi-threading benefit, as context switching will be very frequent and will hurt performance.
If you are thinking of dividing the work across multiple hosts, then you need a parallel processing system like Hadoop/Spark.
They internally handle task allocation and also act as the central system for coordinating all the hosts on which the threads/tasks are running.
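As a rough, hypothetical illustration of that approach, a Spark job written in Java could spread the work items across a cluster roughly like this; the app name, master URL and doWork method are placeholders, and the details depend entirely on what a "run" actually does:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class DistributedRuns {
    public static void main(String[] args) {
        // "spark://master:7077" is a placeholder; Spark decides which cluster node runs each task.
        SparkConf conf = new SparkConf().setAppName("distributed-runs").setMaster("spark://master:7077");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            List<Integer> workItems = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
            // Each partition is processed by a task on some host in the cluster.
            List<String> results = sc.parallelize(workItems, 10)
                                     .map(item -> doWork(item))   // doWork is your placeholder per-item logic
                                     .collect();
            results.forEach(System.out::println);
        }
    }

    private static String doWork(int item) {
        return "done:" + item;
    }
}
```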

Related

File synchronizer architecture

I have to build a file synchronizer: an application that runs around the clock and synchronizes a large amount of data files from many external systems to my local system, essentially using FTP, SFTP and NFS.
There are more than twenty streams; for each of them the logic is slightly different, and it must be configurable.
One of the requirements is that if one of the streams goes down for some reason, it must be possible to recover it without restarting the entire system.
Another requirement is that the transfer rate is balanced. In other words, there must not be one stream (or part of one) fully synchronized while another stream is 10 hours behind.
I have some doubts about the architecture: if I build a single multithreaded system, I would end up with a very high thread count (more than 100, I would say), and fulfilling the two requirements outlined above would make it complicated.
I was thinking of building several processes, or several instances of the same process, even if it seems a little "ugly". That way some load balancing would be done by the operating system, and it would be simpler to kill or start a single flow. Perhaps performance might even be better, as several processes could use much more RAM. Does anyone have any tips/advice? Thanks a lot, and sorry for my poor English. Gian
As @kayaman said, 100 threads is not a lot. But if that means 100 threads per unit of work, and you will have many units of work, implying an increase of several orders of magnitude in the thread count, I would suggest having a look at fibers.
As long as you don't block the fibers, you can have 100,000+ fibers running over a couple of threads (typically as many as there are CPU cores). Each fiber would then just wait for a callback from the process before continuing.
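The answer presumably refers to a fiber library such as Quasar; on current JDKs (21+) the built-in equivalent is virtual threads. A minimal sketch under that assumption, with syncOneStream standing in for the per-stream logic:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class StreamRunner {
    public static void main(String[] args) {
        // Each task gets its own virtual thread; huge numbers of them are cheap because
        // they are multiplexed over a small pool of carrier (OS) threads.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 100_000; i++) {
                int streamId = i;
                executor.submit(() -> syncOneStream(streamId));
            }
        } // close() waits for the submitted tasks to finish
    }

    static void syncOneStream(int streamId) {
        // placeholder: fetch and store the files for this stream;
        // blocking I/O here parks the virtual thread instead of an OS thread
    }
}
```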
To access your endpoints and handle them in similar ways, have a look at Apache Camel - it will allow you to stream FTP, SFTP, etc. and handle each as just another endpoint (in theory you should be able to plug email in as well and stream packets that are emailed to the endpoint).
Regarding balancing the streams, this is business logic you need to implement. If one stream is receiving packets faster than another stream, you should be able to limit the rate by not requesting more packets under certain conditions. We'd need some more information on how you retrieve the packets and which libraries you are using in order to be of better assistance here.
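As a hedged sketch of what the Camel part could look like (it assumes the camel-ftp component on the classpath; the host, credentials, paths and throttle value are placeholders):

```java
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;

public class SyncRoutes {
    public static void main(String[] args) throws Exception {
        DefaultCamelContext context = new DefaultCamelContext();
        context.addRoutes(new RouteBuilder() {
            @Override
            public void configure() {
                // Poll the remote SFTP directory and drop the files locally.
                // "user", "remote.example.com" and the paths are placeholders.
                from("sftp://user@remote.example.com/outbox?password=secret&delay=5000")
                    .throttle(10)                 // limit the rate of exchanges flowing through this route
                    .to("file:data/inbound")
                    .log("Synchronized ${file:name}");
            }
        });
        context.start();
        Thread.sleep(60_000);   // let the route run for a while in this sketch
        context.stop();
    }
}
```

The throttle step is one way to cap a fast stream so the slower ones can keep up, but the actual balancing policy is, as said above, business logic you have to decide on.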

How to nice Java processes running on Tomcat

I'm creating a Java application with a REST front-end, which therefore must be responsive, and once in a while (every X minutes) another service polls the internet. For this, some hundreds of threads are spawned.
Needless to say, when hundreds of threads are running, the server slows down (i.e. becomes unresponsive). I found the option to set a priority on a Thread via setPriority, but it has some flaws: the front-end is still pretty much unresponsive, although it appears to be better than without any Java nicing.
So I'm checking my options: 1) nicing threads; 2) nicing a WAR (I found no such option); 3) spawning another Tomcat and nicing that one, which is possible but would waste precious resources. Maybe assign thread pools to a subset of the cores?
My question is mainly for some pointers in a helpful direction, preferably for option 1, then 2, etc. Or, of course, something I haven't mentioned that results in some dedicated CPU time for the other threads.
I was able to nice the processes by editing the
/etc/systemd/system/tomcat.service
Just add Nice=10, or whatever niceness level you want. All the processes should inherit the priority of the parent.
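For reference, the relevant part of the unit file looks something like this (10 is just the example level; remember to run systemctl daemon-reload and restart Tomcat after editing):

```ini
# /etc/systemd/system/tomcat.service
[Service]
Nice=10
```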

Google App Engine Java Program concurrency

In order to improve the execution speed of a Java program running in Google App Engine, can I create additional Java threads during the runtime to make use of idle machines in the data center?
I've found conflicting data thus far.
If your primary concern is to improve the execution time, take a look at Memcache and Tasks. They can be used to reduce or avoid the latency of reading from or writing to the Datastore or other storage options, fetching URLs, sending emails, etc. If you do a lot of heavy computations that can run in parallel, look at the MapReduce API.
Once you remove all the delays from your program, there will be no reason to use multiple threads within a single request.
Note that App Engine instances can use multithreading to execute multiple requests at the same time, so they tend to use allocated resources efficiently. To enable it, see:
https://developers.google.com/appengine/docs/java/config/appconfig#Java_appengine_web_xml_Using_concurrent_requests
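Concretely, concurrent requests on the Java runtime are switched on with the threadsafe element in appengine-web.xml:

```xml
<?xml version="1.0" encoding="utf-8"?>
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
  <!-- allow App Engine to send multiple requests to the same instance in parallel -->
  <threadsafe>true</threadsafe>
</appengine-web-app>
```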
If you have a problem that calls for a multithreaded solution, you can use threads (as described on the link that you included in your question).
However, based on your reasoning ("to make use of idle machines in the datacenter"), it seems like you're misguided. You should not use threads for that reason. You use the machine hours that you pay for and not more. The only time you will have an idle machine is if you tell App Engine to keep around an extra idle machine so that it doesn't have to start up an extra one when your app gets a big usage spike.
Most of the time, unless you are truly doing parallel computation, you won't need to use multiple threads in App Engine. For instance, the datastore has an asynchronous API so that you can do multiple datastore operations in parallel without having to deal with threads yourself.
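For instance, a hedged sketch of overlapping two datastore writes with the asynchronous API (the entity kind and properties are made up for the example):

```java
import java.util.concurrent.Future;

import com.google.appengine.api.datastore.AsyncDatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Key;

public class AsyncWrites {
    public void saveBoth() throws Exception {
        AsyncDatastoreService datastore = DatastoreServiceFactory.getAsyncDatastoreService();

        Entity first = new Entity("LogEntry");      // "LogEntry" is a placeholder kind
        Entity second = new Entity("LogEntry");
        first.setProperty("message", "a");
        second.setProperty("message", "b");

        // Both puts are issued immediately and proceed in parallel; no explicit threads needed.
        Future<Key> firstKey = datastore.put(first);
        Future<Key> secondKey = datastore.put(second);

        // Block only when the results are actually needed.
        System.out.println(firstKey.get() + " / " + secondKey.get());
    }
}
```

Both puts are in flight at the same time, and the request thread only blocks when the Futures are read.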
Does that make sense?

Java multiprocessing

I have a requirement where I need to process large data with some MySQL operations, and there are multiple runs of a similar kind. A single run takes around 2 hours.
If I run each run in a separate Java thread, there is no major time saving. As per my understanding, Java threads are not multi-process, i.e. they are only a way to obtain concurrency, not to improve CPU utilization.
If there is any way I can make use of multiple processors on the same machine through Java, I guess that could save some time across all the runs.
Please let me know if the problem is clear, and whether you have any ideas for a solution.
Thanks,
Ashish
I think that your problem is in your application or in MySQL.
Java does support multi-threading, and your application should benefit automatically from multiple cores.
Probably there is a common resource that needs to be synchronized.
From what you say ("process large data"), I bet the common resource is the database file and memory.
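To illustrate the point that plain Java threads already run on all cores, here is a sketch of executing the independent runs in a fixed-size pool; processRun is a placeholder for the real data processing and MySQL work:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRuns {
    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        List<Callable<String>> runs = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            int runId = i;
            runs.add(() -> processRun(runId));   // each run is independent, so they can execute concurrently
        }

        // If the runs don't contend on a shared resource (e.g. the same MySQL tables),
        // the JVM will schedule them across all available cores.
        for (Future<String> result : pool.invokeAll(runs)) {
            System.out.println(result.get());
        }
        pool.shutdown();
    }

    static String processRun(int runId) {
        // placeholder for the real data processing + MySQL work
        return "run " + runId + " finished";
    }
}
```

If this still shows no speedup, the runs are almost certainly serializing on a shared resource (most likely MySQL itself), not on the JVM.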
If a single run takes a minute or more (in your case: 120 minutes), then you're better off with multiple processes anyway, as the overhead of the JVM startup is negligible.

When to choose several processes over threads in Java?

For what reasons would one choose several processes over several threads to implement an application in Java?
I'm refactoring an older Java application which is currently divided into several smaller applications (processes) running on the same multi-core machine, communicating with each other via sockets.
I personally think this should be done using threads rather than processes, but what arguments would defend the original design?
I (and others, see attributions below) can think of a couple of reasons:
Historical Reasons
The design is from the days when only green threads were available and the original author/designer figured they wouldn't work for him.
Robustness and Fault Tolerance
You use components which are not thread safe, so you cannot parallelize without resorting to multiple processes.
Some components are buggy and you don't want them to be able to affect more than one process. Say, if a component has a memory or resource leak which eventually could force a process restart, then only the process using the component is affected.
Correct multithreading is still hard to do; depending on your design, it may be harder than multiprocessing. The latter, however, is arguably also not too easy.
You can have a model where a watchdog process actively monitors (and eventually restarts) crashed worker processes. This may also include suspending/resuming processes, which is not safe with threads (thanks to @Jayan for pointing this out).
OS Resource Limits & Governance
If the process, using a single thread, is already using all of the available address space (e.g. for 32bit apps on Windows 2GB), you might need to distribute work amongst processes.
Limiting the use of resources (CPU, memory, etc.) is typically only possible on a per process basis (for example on Windows you could create "job" objects, which require a separate process).
Security Considerations
You can run different processes using different accounts (i.e. "users"), thus providing better isolation between them.
Compatibility Issues
Support multiple/different Java versions: using different processes you can use different Java versions for your application parts (if required by 3rd party libraries).
Location Transparency
You could (potentially) distribute your application over multiple physical machines, thus further increasing the scalability and/or robustness of the application (see @Qwe's answer for more details / the original idea).
If you decide to go with threads, you restrict your app to running on a single machine. This solution doesn't scale (or only scales to some extent) - there are always hardware limits.
Different processes communicating via sockets, on the other hand, can be distributed between machines, so that you could add a virtually unlimited number of them. This scales better, at the cost of slower communication between processes.
Deciding which approach is more suitable is itself a very interesting task. And even once you make the decision, there's no guarantee that it won't look stupid to your successors in a couple of years when requirements change or new hardware becomes available.
