I have a requirement to process a large amount of data with some MySQL operations, and there are multiple runs of a similar kind. A single run takes around 2 hours.
When I ran each run in a separate Java thread, there was no major time saving. As I understand it, Java threads are not separate processes, i.e. they are only a way to obtain parallelism, not to improve CPU utilization.
If there is a way to make use of multiple processors on the same machine from Java, I guess that could save some time across all the runs.
Please let me know whether the problem is clear and whether you have any ideas for a solution.
Thanks,
Ashish
I think that your problem is in your application or in mysql.
Java does support multi-threading and your application should benefit automatically from multiple cores.
Probably there is a common resource that needs to be synchronized.
From what you say ("process large data"), I bet the common resource is the database file and memory.
If a single run takes a minute or more (in your case: 120 minutes), then you're better off with multiple processes anyway, as the overhead of the JVM startup is negligible.
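To check whether threads really can occupy several cores on your machine, a minimal sketch along these lines may help. It assumes the runs are independent and CPU-bound; `ParallelRuns`, `simulateRun` and `runAll` are hypothetical names, and the arithmetic loop is only a stand-in for your real work. If a version like this scales with the core count but your real runs do not, the bottleneck is more likely a shared resource such as the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelRuns {
    // Stand-in for one independent "run" (hypothetical workload).
    // It touches no shared state, so runs never wait on each other's locks.
    static long simulateRun(int runId) {
        long acc = 0;
        for (int i = 1; i <= 1_000_000; i++) {
            acc += ((long) i * runId) % 7;
        }
        return acc;
    }

    // Submits every run to a pool with one worker thread per available core.
    static List<Long> runAll(int runs) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int r = 1; r <= runs; r++) {
                final int runId = r;
                futures.add(pool.submit(() -> simulateRun(runId)));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> f : futures) {
                results.add(f.get()); // blocks until that run has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("cores = " + Runtime.getRuntime().availableProcessors());
        System.out.println(runAll(4));
    }
}
```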
I have a program which spins up thousands of threads. I am currently using one host for all the threads, which takes a lot of time. If I want to use multiple hosts (say 10 hosts, each running 100 different threads), how should I proceed?
Having thousands of threads on a single JVM sounds like a bad idea - you may spend most time context-switching instead of doing the actual work.
To split your work across multiple hosts, you cannot use threads managed by a single JVM. You'll need each host to expose an API that can receive part of the work and return the result of the work done.
One approach would be to use Java RMI (remote method invocation) to complete this task, but really, your question lacks so many details important for the decision of what architecture to choose.
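Whatever transport you end up with, the first step is usually to cut the work into one chunk per host. The sketch below shows only that step, assuming the work items are independent; `WorkPartitioner` and `partition` are hypothetical names, and how each chunk reaches its host (RMI, HTTP, a queue) is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

final class WorkPartitioner {
    // Splits items into `hosts` contiguous chunks whose sizes differ by at most one.
    static <T> List<List<T>> partition(List<T> items, int hosts) {
        if (hosts <= 0) {
            throw new IllegalArgumentException("hosts must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        int base = items.size() / hosts;   // minimum chunk size
        int extra = items.size() % hosts;  // the first `extra` chunks get one more item
        int start = 0;
        for (int h = 0; h < hosts; h++) {
            int size = base + (h < extra ? 1 : 0);
            chunks.add(new ArrayList<>(items.subList(start, start + size)));
            start += size;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> work = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            work.add(i);
        }
        List<List<Integer>> perHost = partition(work, 10);
        for (int h = 0; h < perHost.size(); h++) {
            System.out.println("host " + h + " gets " + perHost.get(h).size() + " items");
        }
    }
}
```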
Creating 1000 threads in one JVM is a very bad design; you need to reduce the count.
A high thread count will not give you the benefit of multi-threading, because context switching will be very frequent and will hurt performance.
If you are thinking of dividing the work across multiple hosts, then you need a parallel processing system like Hadoop or Spark.
They handle task allocation internally and provide a central system for syncing all the hosts on which the threads/tasks are running.
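On a single host, the usual alternative to one thread per task is a small fixed pool that works through a queue of tasks. This is a minimal sketch assuming the tasks are short and independent; `BoundedPoolSketch` and `runTasks` are hypothetical names. It returns how many tasks completed and how many distinct worker threads were actually used.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedPoolSketch {
    // Runs taskCount trivial tasks on at most poolSize threads.
    // Returns {completedTasks, distinctWorkerThreads}.
    static int[] runTasks(int taskCount, int poolSize) {
        Set<String> workerNames = ConcurrentHashMap.newKeySet();
        AtomicInteger completed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                workerNames.add(Thread.currentThread().getName());
                completed.incrementAndGet();
            });
        }
        pool.shutdown(); // accept no new tasks, let queued ones finish
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { completed.get(), workerNames.size() };
    }

    public static void main(String[] args) {
        int poolSize = Runtime.getRuntime().availableProcessors();
        int[] result = runTasks(1000, poolSize);
        System.out.println(result[0] + " tasks ran on " + result[1] + " threads");
    }
}
```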
My application is supposed to have a "realtime with pause" functionality. The user can pause execution, do some things that modify what's going to happen, then unpause and let stuff happen. Stuff happens at regular intervals as specified by the user, can be slow, can be fast.
My goal in using threading here is to improve performance on multicore systems. The amount of data that the application is supposed to crunch at each time interval is supposed to be arbitrarily large (I expect lots and lots of loops over collections, modifying object properties and generating random numbers, but precious little disk access). I don't want the application to be constrained by the capacity of a single core if it can use more to run faster.
Will this actually work this way?
I've run some tests (made a program crunch numbers a lot, and looked at CPU usage during its activity), but it's not really conclusive - usage is certainly in the proximity of 100% on my dual core machine, but hardly ever 100%. Does a single-threaded (main only) Java application use all available cores for computation?
Does a single-threaded (main only) Java application use all available cores for computation?
No, it will normally use a single core.
Making a program do computations in parallel with multiple threads may make it faster, but it's not a magical solution for any kind of problem. Whether this is a suitable solution for your program depends on what your program is doing exactly, and if the algorithm can be parallelized. If, for example, you are doing lots of computations where the next computation depends on the result of the previous computation, then making it multi-threaded will not help a lot, because you can't do the computations at the same time - the next one first has to wait for the answer of the previous one. So, you first have to think about what computations in your program could be run in parallel.
Java has a lot of support for multi-threading. You can program with threads directly, or use an executor service, or use the fork/join framework. Whatever is appropriate depends on what exactly you want to do.
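As an illustration of the fork/join option, here is a minimal sketch that sums an array by splitting it in half recursively. It works because the two halves do not depend on each other, which is exactly the property discussed above. `SumTask`, `parallelSum` and the threshold value are hypothetical choices for the example.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

final class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // below this size, sum sequentially

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                       // schedule the left half asynchronously
        long rightSum = right.compute();   // compute the right half in this thread
        return left.join() + rightSum;     // wait for the left half and combine
    }

    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println(parallelSum(data));
    }
}
```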
Does a single-threaded (main only) Java application use all available cores for computation?
Not usually, but you could make use of some higher-level APIs in Java that use threads for you, so that you are not using threads directly: most obviously fork/join and executors, and less obviously the new Streams API on collections (i.e. parallelStream).
In general, though, to make use of all cores, you need to do some kind of concurrency. Further, it's really hard to see what is going on just by watching your OS monitor (especially with only 2 cores): your OS has other things going on (trying to manage itself, running your IDE, running crontab, running a browser to post to stackoverflow ;).
Finally, just implementing concurrency may not help by itself; you have to do it "right" for your code/algorithm.
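For the `parallelStream` route mentioned above, a minimal sketch might look like this. It assumes the per-element work is a pure function with no shared mutable state, which is the main thing to get "right" here; `ParallelStreamSketch` and `sumOfSquares` are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class ParallelStreamSketch {
    // Each element is processed independently, so the stream may split the
    // list across the common fork/join pool without any explicit threads here.
    static long sumOfSquares(List<Integer> values) {
        return values.parallelStream()
                .mapToLong(v -> (long) v * v)
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> values = IntStream.rangeClosed(1, 1_000_000)
                .boxed()
                .collect(Collectors.toList());
        System.out.println(sumOfSquares(values));
    }
}
```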
A Java thread runs on a single CPU at a time. To use multiple CPUs, you need multiple threads.
Imagine that you have to do various tasks using your hands. You will do them slowly using one hand and more efficiently using both hands. Similarly, in Java or in any other language, multi-threading provides the system with many hands. The good news is that you can have many threads to do different tasks. Running all operations in a single thread will make the program sluggish and sometimes unresponsive. A good practice is to do long-running tasks in a separate thread. For example, loading large chunks of data from a database should be done in a separate thread. Downloading data from the internet should also be done in a separate thread. What happens if you do long-running operations in the main thread? The program hangs and stays unresponsive until the task completes, and the user will think that there is something wrong. I hope you get it.
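A minimal sketch of moving a long-running task off the main thread, using `CompletableFuture` from the standard library. `BackgroundLoadSketch`, `loadData` and `loadInBackground` are hypothetical names, and the `Thread.sleep` call merely stands in for a slow database load or download.

```java
import java.util.concurrent.CompletableFuture;

final class BackgroundLoadSketch {
    // Hypothetical slow operation (stand-in for a database load or a download).
    static String loadData() {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "loaded";
    }

    // Starts the slow operation on a background thread and returns immediately.
    static CompletableFuture<String> loadInBackground() {
        return CompletableFuture.supplyAsync(BackgroundLoadSketch::loadData);
    }

    public static void main(String[] args) {
        CompletableFuture<String> pending = loadInBackground();
        System.out.println("main thread is free to do other work");
        System.out.println(pending.join()); // wait only at the point the result is needed
    }
}
```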
In order to improve the execution speed of a Java program running in Google App Engine, can I create additional Java threads during the runtime to make use of idle machines in the data center?
I've found conflicting data thus far.
If your primary concern is to improve the execution time, take a look at Memcache and Tasks. They can be used to reduce or avoid the latency of reading from or writing to the Datastore or other storage options, fetching URLs, sending emails, etc. If you do a lot of difficult computations that can run in parallel, look at the MapReduce API.
Once you remove all the delays from your program, there will be no reason to use multiple threads within a single request.
Note that App Engine instances can use multithreading to execute multiple requests at the same time, so they tend to use allocated resources efficiently. To enable it, see:
https://developers.google.com/appengine/docs/java/config/appconfig#Java_appengine_web_xml_Using_concurrent_requests
If you have a problem that calls for a multithreaded solution, you can use threads (as described on the link that you included in your question).
However, based on your reasoning ("to make use of idle machines in the datacenter"), it seems like you're misguided. You should not use threads for that reason. You use the machine hours that you pay for and no more. The only time you will have an idle machine is if you tell App Engine to keep an extra idle machine around so that it doesn't have to start one up when your app gets a big usage spike.
Most of the time, unless you are truly doing parallel computation, you won't need to use multiple threads in App Engine. For instance, the datastore has an asynchronous API so that you can do multiple datastore operations in parallel without having to deal with threads yourself.
Does that make sense?
I'm loading about 1 million records into Oracle using a custom Java utility. The Java utility is multi-threaded and has worked numerous times in the past with no problem. My issue is that when I start the load for the very first time, it is lightning fast, around 150K objects per hour. After about an hour or two the performance drops sharply to around 6000 objects per hour. I'm almost certain that my performance hit has something to do with Oracle, but I can't figure out what it is. The Oracle machine has 16GB of RAM and 8 CPUs. I set the following system parameters, which have worked for me in the past:
optimizer_mode=ALL_ROWS
optimizer_index_cost_adj=10
query_rewrite_integrity=ENFORCED
pga_aggregate_target=300M
sga_target=5000M
sga_max_size=5000M
Does anyone with Oracle knowledge know why my performance is great initially but drops off drastically? One additional note: if I stop the load, restart the machine, then start the load again, I continue to see the 6000 objects per hour performance. So it's always the very first load after cloning our Production database that has the best performance. Hopefully someone has an idea, thanks in advance!!
I assume that the load is only inserts and that the distribution of the data changes over time.
Or is it continuous inserts into the same table, like continuously loading Call Detail Records of a phone system?
In principle, Oracle does not easily get slower with increasing and lasting use. But there are some ways to make it run slower:
Locks / latches
I would recommend checking that concurrent use by other Oracle sessions is not causing the problem through short locks or latches. Given that these are inserts, it could be the other threads trying to insert into the same data blocks, depending on the distribution of the data, which might change after some time.
Restricted inserts per block
Please check that max_trans on the tables is not restricted to 1 or 2. I've seen that once and it was really funny to see how Oracle got down to a crawl when only one session can do something in a block.
SGA and kernel problems
With older Oracle releases (Oracle 7 and 8) I've seen numerous occasions on large systems where Oracle started to kill itself. This especially holds for multiprocessor systems, because locking/latching on an MP system is implemented differently: the other processor might get its work done, so an Oracle thread first just spins a little and then tries again. Also, problems with SGA fragmentation or even bad locking of the SGA can cause problems.
Please check that the insert statements use bind variables and batches, or bypass SQL completely. You might also want to try running it in one thread. Is one-thread processing stable over time (although slower)? If so, you have a locking issue somewhere. Google for locks/latches/spins and follow the scenarios listed.
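For the bind-variable and batching advice, a minimal JDBC sketch could look like the following. It uses only standard `java.sql` calls; the table `load_target`, its columns, and the names `BatchInsertSketch` and `insertAll` are hypothetical, and the right batch size for your load is something to measure rather than assume.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class BatchInsertSketch {
    // Hypothetical table and columns; the "?" placeholders are bind variables,
    // so the database can parse the statement once and reuse it for every row.
    static final String SQL = "INSERT INTO load_target (id, payload) VALUES (?, ?)";

    // Inserts all payloads in batches of batchSize; returns the number of batches sent.
    static int insertAll(Connection conn, List<String> payloads, int batchSize) throws SQLException {
        int batchesSent = 0;
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // commit once at the end instead of per row
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            int pending = 0;
            long id = 0;
            for (String payload : payloads) {
                ps.setLong(1, id++);
                ps.setString(2, payload);
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch(); // one round trip for the whole batch
                    batchesSent++;
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // flush the final partial batch
                batchesSent++;
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
        return batchesSent;
    }
}
```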
I'm new here and I'm not very experienced with CPU consumption and multi-threading. I was wondering why my web app is consuming so much CPU. What my program does is update values in the background, so that users don't have to wait for the data to be processed and only need to fetch it upon request. The updating processes are scheduled tasks, using the executor library, that fire off 8 threads every 5 seconds to update my data.
Now I'm wondering why my application is consuming so much CPU. Is it because of bad code or because of a low-spec server? (2 cores, with 2 databases and 1 major application running alongside my web app)
Thank you very much for your help.
You need to profile your application to find out where the CPU is actually being consumed. Java has some basic profiling methods built in, or if your environment permits it, you could run the built-in "hprof" profiler:
java -Xrunhprof ...
(In reality, you probably want to set some extra options: Google "hprof" for more details.)
The latter is easier in principle, but I mention the possibility of adding your own profiling routine because it's more flexible and you can do it e.g. in a Servlet environment where running another profiler is more cumbersome.
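A hand-rolled profiling routine of the kind mentioned above could be as small as the sketch below. It uses `ThreadMXBean` from `java.lang.management` to compare wall-clock time with the CPU time consumed by the current thread; `CpuTimer` and `measure` are hypothetical names. A task whose CPU time is close to its wall time is burning CPU, while a large gap suggests it is mostly waiting (I/O, locks, the database).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class CpuTimer {
    // Runs the task on the calling thread and returns {wallNanos, cpuNanos}.
    // cpuNanos is -1 if this JVM does not support per-thread CPU time.
    static long[] measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean cpuSupported = threads.isCurrentThreadCpuTimeSupported();
        long cpuStart = cpuSupported ? threads.getCurrentThreadCpuTime() : -1;
        long wallStart = System.nanoTime();

        task.run();

        long wallNanos = System.nanoTime() - wallStart;
        long cpuNanos = cpuSupported ? threads.getCurrentThreadCpuTime() - cpuStart : -1;
        return new long[] { wallNanos, cpuNanos };
    }

    public static void main(String[] args) {
        long[] timing = measure(() -> {
            double x = 0;
            for (int i = 1; i < 5_000_000; i++) {
                x += Math.sqrt(i); // CPU-bound stand-in for an update task
            }
            if (x < 0) {
                System.out.println(x); // keeps the loop from being optimized away
            }
        });
        System.out.println("wall ms = " + timing[0] / 1_000_000 + ", cpu ms = " + timing[1] / 1_000_000);
    }
}
```

Wrapping each scheduled update task in a call like this and logging the two numbers is one way to see which task dominates before reaching for a full profiler.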
Paulo,
It is not possible for someone here to say whether the problem is that your code is inefficient or the server is under spec. It could be either or both of those, or something else.
You are going to need to do some research of your own:
Profile the code. This will allow you to identify where your webapp is spending most of its time.
Look at the OS-level stats that are available to you. This might tell you that the real problem is memory usage or disk I/O.
Look at the performance of the back-end database. Is it using a lot of CPU?
Once you have identified the area(s) where the CPU is being used, you need to figure out what the real cause of the problem is and work out how to fix it. And once you've got a potential fix implemented, you can rerun your profiling, etc. to see whether it has helped.