new String(byte[]) too slow in multithreaded applications - java

I noticed that when I call new String(byte[]) from a single thread, it's pretty fast. But when I call it from several different threads it becomes painfully slow.
For example, I have a parser that makes calls to new String(bytes). If I call the parser 50 times sequentially, each parse takes about 100ms, but if I create 50 threads and call the parser from each, every parse takes between 12000ms and 21000ms! (It becomes slower in the later threads.) It seems as if the String(bytes) constructor is defined to be synchronized.
The profiler tracked the bottleneck to new String(bytes) and indeed when I changed it to new String("Hello") the parser became as fast in multi-thread as it was in single thread.
Does anyone know why this is the case? And what's the workaround?
Update:
My verification test was wrong: apparently the Java compiler has some internal optimizations that share String objects instead of creating new ones when I call new String("Hello"). That's why it was faster when I made that change. I will rewrite my test code and update this question.
Answer:
Both @nafas's and @peter's answers below are correct. new String(byte[]) itself was not slow; the profiler mistakenly identified it as the bottleneck. The real culprits were:
Having more threads than available cores.
Garbage collector pausing the execution in the middle of operation because too many temporary objects were created and destroyed.
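A minimal sketch of how the two culprits above could be addressed for the decoding step: run the work on a pool with one thread per available core instead of one thread per parse. The class name BoundedDecode, the method decodeAll and the choice of UTF-8 are assumptions made for illustration; the original parser is not shown in the question.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: decodes many byte[] payloads without oversubscribing the CPU.
final class BoundedDecode {
    static List<String> decodeAll(List<byte[]> payloads) {
        // One worker per available core, regardless of how many payloads there are.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (byte[] payload : payloads) {
                // UTF-8 is an assumption; use whatever charset the data really has.
                Callable<String> task = () -> new String(payload, StandardCharsets.UTF_8);
                pending.add(pool.submit(task));
            }
            List<String> decoded = new ArrayList<>();
            for (Future<String> f : pending) {
                decoded.add(f.get()); // results come back in submission order
            }
            return decoded;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With 50 payloads on a 4-core machine this still performs 50 decodes, but at most 4 of them compete for the CPU at any moment.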

Swap
If RAM is low, the JVM and the OS start swapping to the HDD. Increase the available -Xmx and -Xms for better performance.

I'm pretty sure it's not the String class: Strings are immutable, so they have nothing to do with synchronization or threads.
If the number of cores is relatively low (say 2) and you create 50 threads, that reduces speed by quite a large factor, because you have added extra complexity to your program.
Each time your CPU switches between threads, it costs some time.
There is no hard rule for choosing the number of threads. I normally go with:
number of Cores + 1
NOTE:
If you have, say, lots of API calls, I would go for a larger number of threads, whereas if the threads are purely using CPU (as yours are), I go for a smaller number of threads (threads = #cores + 1).
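The heuristic above can be written down as a small helper. This is only a sketch: PoolSizing and its method names are made up for illustration, and the multiplier for I/O-heavy work is a parameter to tune by measurement, not a value taken from the answer.

```java
// Hypothetical sizing helper for the "cores + 1" heuristic.
final class PoolSizing {
    // CPU-bound work: one thread per core plus one spare.
    static int cpuBoundThreads(int cores) {
        return cores + 1;
    }

    // I/O-heavy work (e.g. many API calls): allow several threads per core.
    // threadsPerCore is an assumption to be tuned, not a recommended constant.
    static int ioBoundThreads(int cores, int threadsPerCore) {
        return cores * threadsPerCore;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("CPU-bound pool size: " + cpuBoundThreads(cores));
        System.out.println("I/O-bound pool size: " + ioBoundThreads(cores, 4));
    }
}
```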

Related

How many threads can be created and executed on a single CPU core?

If I have a multithreaded program and one dual-core CPU on the working machine, how many threads can I create to run in parallel on these 2 cores? In some articles I saw that a CPU core can handle only one thread. Does that mean I can create only 2 threads? Or can I create multiple threads, of which only 2 will be executed by the cores in any one clock tick? I know that this question is simple, but I'm a little confused.
Modern hardware and OS combinations can easily handle thousands of them. And they can all be in the 'running state'.
However, let's say you have 2000 threads, 2 CPUs in the box, and each CPU has 16 cores. Then at most 32 of your running threads are truly 'executing stuff'. The other ones are ready to go if any of the 32 currently executing threads does something that isn't CPU-dependent (for example, waiting for some bytes to flow in from the network, or bytes to be read from a disk), or when enough time passes (eventually the CPU+OS+JVM work together to pre-empt an executing thread, pausing it so that it doesn't hog the resources).
The bottleneck you're most likely to run into first is memory: each thread needs a stack, and stacks take memory. If, for example, you are working with 1MB stacks, creating 2000 threads means you now have 2GB worth of stack space alone, and the Java heap (where all objects live) probably also takes a GB or 2, so that's about 4GB in total. On your average provisioned IaaS box you don't have that much.
A simple solution to that problem is that you can actually control the stack size when you create a new thread object. You can use this to make much smaller stacks, for example 64k stacks: now 2000 threads only take an eighth of a GB. Of course, a 64k stack is not capable of running particularly deeply nested methods. Usually, if you're creating thousands of threads, the code they actually end up running should be relatively simple and not create large stacks or do recursive calls.
If you mess this up, you'll get StackOverflowError. Then you know to either adjust the code or increase the stack sizes.
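As a rough sketch of the stack-size approach described above: java.lang.Thread has a constructor that takes a requested stack size in bytes. The class and method names below are hypothetical. Note that the Thread documentation describes the stackSize argument as a suggestion that the JVM may round, raise to a platform minimum, or ignore entirely, so the memory saving is not guaranteed.

```java
// Hypothetical helper illustrating the stackSize constructor argument.
final class SmallStackThreads {
    static final long STACK_BYTES = 64L * 1024; // the 64k figure used in the text

    // Back-of-the-envelope stack budget: threads * bytes per stack.
    static long totalStackBytes(int threads, long stackBytes) {
        return threads * stackBytes;
    }

    // Runs the task on a thread that requests a 64k stack and waits for it to finish.
    // Returns false only if the calling thread was interrupted while waiting.
    static boolean runOnSmallStack(Runnable task) {
        // Arguments: thread group (null = default), task, name, requested stack size.
        Thread t = new Thread(null, task, "small-stack-worker", STACK_BYTES);
        t.start();
        try {
            t.join();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("2000 x 1MB stacks:  " + totalStackBytes(2000, 1024L * 1024) + " bytes");
        System.out.println("2000 x 64k stacks:  " + totalStackBytes(2000, STACK_BYTES) + " bytes");
        runOnSmallStack(() -> System.out.println("hello from a small-stack thread"));
    }
}
```

The task passed to runOnSmallStack should stay shallow; deep recursion on such a thread is what leads to the StackOverflowError mentioned above.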
If you're not making thousands of threads (merely, say, hundreds), don't worry about it. Just.. make em and run em, trust the OS+CPU+JVM to sort it out for you.

Number of java thread > number of cores and garbage collection

We are using java 7 and working on multithreaded data crunching application. Due to certain constraint we are not using spark or any other map-reduce way to solve this problem. The idea of this project is maximize the performance of application using multi-threading.
My understanding is that at any given point, assuming the CPU is not running anything apart from the OS, the number of threads working simultaneously will be equal to the number of hardware threads (hyper-threads) that the CPU provides. But there is the Java GC, which will kick in every now and then. We have to consider that as well.
Also, I am aware that if I create more threads then I will actually degrade the performance because of the time spent in context switching.
The question is what would be the best way to consider all these things and create appropriate number of threads. Any idea or thought process? Is there any other process that I should consider?
The question is what would be the best way to consider all these things and create appropriate number of threads
I would use Java 8 which does this for you. e.g.
Results result = listOfWork.parallelStream()
        .map(t -> t.doWork())
        .collect(Collectors.reducing(.....));
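The stream snippet above is abbreviated. A runnable variant could look like the sketch below, assuming a made-up unit of work (squaring an integer) and a simple sum as the reduction; neither is from the original answer. By default, parallel streams run on the common ForkJoinPool, whose size is derived from the number of available processors.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, self-contained version of the parallelStream idea.
final class ParallelStreamSketch {
    // Stand-in for t.doWork(): any CPU-bound function would do.
    static long doWork(int n) {
        return (long) n * n;
    }

    // Maps every item through doWork in parallel and sums the results.
    static long total(List<Integer> listOfWork) {
        return listOfWork.parallelStream()
                .mapToLong(n -> doWork(n))
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(total(Arrays.asList(1, 2, 3, 4)));
    }
}
```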
However if you are stuck on Java 7, you can use an ExecutorService.
int procs = Runtime.getRuntime().availableProcessors();
ExecutorService es = Executors.newFixedThreadPool(procs);
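The two lines above only create the pool. A fuller sketch in Java 7 style (anonymous Callable, no lambdas) might look like this; the prime-counting task and the class name FixedPoolSketch are invented purely to have some CPU-bound work to submit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class FixedPoolSketch {
    // Hypothetical CPU-bound task: count primes below a limit by trial division.
    static int countPrimesBelow(int limit) {
        int count = 0;
        for (int n = 2; n < limit; n++) {
            boolean prime = true;
            for (int d = 2; (long) d * d <= n; d++) {
                if (n % d == 0) { prime = false; break; }
            }
            if (prime) count++;
        }
        return count;
    }

    // Runs one task per limit on a pool sized to the number of processors.
    static List<Integer> runAll(final int[] limits) {
        int procs = Runtime.getRuntime().availableProcessors();
        ExecutorService es = Executors.newFixedThreadPool(procs);
        try {
            List<Callable<Integer>> tasks = new ArrayList<Callable<Integer>>();
            for (final int limit : limits) {
                tasks.add(new Callable<Integer>() {
                    @Override
                    public Integer call() {
                        return countPrimesBelow(limit);
                    }
                });
            }
            List<Integer> results = new ArrayList<Integer>();
            // invokeAll blocks until all tasks are done and keeps the task order.
            for (Future<Integer> f : es.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            es.shutdown(); // always release the worker threads
        }
    }
}
```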
But there is java GC which will kick-in every now and then
Unless you are using CMS, it doesn't kick in at the same time, so it doesn't matter what these threads are doing (in terms of tuning your thread pool)
Is there any other process that I should consider?
If you have other processes on the machines which use the CPU a lot you should consider them.
I actually did research on this last semester. When using threads, a good rule of thumb for increased performance for CPU-bound processes is to use as many threads as cores, except on a hyper-threaded system, in which case one should use twice as many threads as cores. The other rule of thumb that can be concluded is for I/O-bound processes. This rule is to quadruple the number of threads per core, except on a hyper-threaded system, where one can quadruple the number of threads per core.

Program Performance in java fluctuates with thread variation

The title I admit is a bit misleading but I am sort of confused why this happens.
I've written a program in Java that takes an argument x and instantiates x threads to do the work of the program. The machine I'm running it on has 8 cores and can handle 32 threads in parallel (each core has 4 hyperthreads). When I run the program with more than 8 threads (e.g. 22), I notice that it runs faster with an even number of threads than with an odd number such as 23. The performance difference is about 10% between the two. Why would this be? Thread overhead doesn't really account for this, and I feel that as long as I'm running fewer than 32 threads, the program should only get faster as I increase the number of threads.
To give you an idea what the program is doing, the program is taking a 1000 * 1000 array and each thread is assigned a portion of that array to update (roundoffs/leftovers in uneven are given to the last thread instantiated).
Is there any good reason for the odd/even thread performance difference?
Two reasons I can imagine:
The need to synchronize the memory access of your cores/threads. This will eventually invalidate CPU core caches and the like, which brings down performance. Try giving the threads really disjoint tasks; don't let them work on the same array. Note that memory isn't managed in individual bytes: caches work in whole cache lines, so threads writing to neighbouring elements can still interfere with each other.
Hyperthreading CPUs often don't have full performance. They may, for example, have to share some floating point units. This doesn't matter when e.g. one thread is integer-math heavy and the other is float-heavy. But having four threads each needing the floating point units probably means waiting, switching contexts, signalling the other thread, switching context back, waiting again...
Just two guesses. For a better answer, you should have given the actual CPU you are using, the partitioning scheme you are using, and a more detailed hint about the computational task.
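Since the partitioning scheme matters here, the sketch below shows the scheme the question describes (equal chunks, leftovers to the last thread) next to an alternative that spreads the leftovers. The question does not include its code, so the class RowPartition and the assumption that the 1000 * 1000 array is split by rows are illustrative guesses. This is not offered as the explanation for the odd/even difference, only as a way to inspect how uneven the work per thread is.

```java
import java.util.Arrays;

// Hypothetical row partitioning for a 1000-row array.
final class RowPartition {
    // Scheme from the question: equal chunks, all leftovers go to the last thread.
    static int[] leftoversToLast(int rows, int threads) {
        int[] counts = new int[threads];
        Arrays.fill(counts, rows / threads);
        counts[threads - 1] += rows % threads;
        return counts;
    }

    // Alternative: hand out the leftover rows one at a time to the first threads,
    // so no thread gets more than one row above the others.
    static int[] balanced(int rows, int threads) {
        int[] counts = new int[threads];
        for (int t = 0; t < threads; t++) {
            counts[t] = rows / threads + (t < rows % threads ? 1 : 0);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(leftoversToLast(1000, 23)));
        System.out.println(Arrays.toString(balanced(1000, 23)));
    }
}
```

With 1000 rows and 23 threads, the first scheme gives the last thread 54 rows against 43 for the others, and the whole run only finishes when that slowest thread does.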

Thread-per-character vs Thread-per-map design

I want an opinion about multithreading design in Java: thread-per-character versus thread-per-map/zone. Which has more advantages (or disadvantages), and which lets a game server handle 3000+ players?
Neither of those are going to give you great scalability. Threads take up quite a bit of space - e.g. by default the stack size is 256K on 32-bit systems, so for 3000 users, you will need 750MB just to start 3000 threads, and that's before they've allocated any memory for data to do actual work.
Thread-per-user will put a hard limit on the number of users available, which may be artificially low compared to what the server might handle with a different design. Thread per zone may be slightly better in this respect but it may also limit the number of zones.
Large numbers of threads have significant task switching overhead. To avoid this, I would try to remove "ownership" of threads from the design and use a work pool instead, such as an ExecutorService. The game processing is split into units of work which you then submit to the pool. The pool is usually set to allow the same number of threads as cores, so that you get the most efficient execution. (If threads are I/O bound, you can use more threads than cores.)
Generally speaking, threads do not scale up. You will have a serious performance problem with 3000+ threads.

How many threads should I use in my Java program?

I recently inherited a small Java program that takes information from a large database, does some processing and produces a detailed image regarding the information. The original author wrote the code using a single thread, then later modified it to allow it to use multiple threads.
In the code he defines a constant;
// number of threads
public static final int THREADS = Runtime.getRuntime().availableProcessors();
Which then sets the number of threads that are used to create the image.
I understand his reasoning that the number of threads cannot be greater than the number of available processors, so he set it to that amount to get the full potential out of the processor(s). Is this correct? Or is there a better way to utilize the full potential of the processor(s)?
EDIT: To give some more clarification, The specific algorithm that is being threaded scales to the resolution of the picture being created, (1 thread per pixel). That is obviously not the best solution though. The work that this algorithm does is what takes all the time, and is wholly mathematical operations, there are no locks or other factors that will cause any given thread to sleep. I just want to maximize the programs CPU utilization to decrease the time to completion.
Threads are fine, but as others have noted, you have to be highly aware of your bottlenecks. Your algorithm sounds like it would be susceptible to cache contention between multiple CPUs - this is particularly nasty because it has the potential to hit the performance of all of your threads (normally you think of using multiple threads to continue processing while waiting for slow or high latency IO operations).
Cache contention is a very important aspect of using multiple CPUs to process a highly parallelized algorithm: make sure that you take your memory utilization into account. If you can construct your data objects so each thread has its own memory that it is working on, you can greatly reduce cache contention between the CPUs. For example, it may be easier to have a big array of ints and have different threads working on different parts of that array, but in Java, the bounds checks on that array are going to be trying to access the same address in memory, which can cause a given CPU to have to reload data from L2 or L3 cache.
Split the data into its own data structures, and configure those data structures so they are thread local (it might even be better to use ThreadLocal, so that each thread only ever touches its own instance).
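One possible reading of the advice above, as a sketch: give each worker a private copy of its part of the data, let it accumulate into a local variable, and publish a single result at the end. The class PrivateSlices and the summing workload are assumptions for illustration. Whether the extra copy pays off depends on the workload, so it should be measured rather than assumed.

```java
import java.util.Arrays;

// Hypothetical example of per-thread data: every worker sums its own private slice.
final class PrivateSlices {
    static long sum(int[] data, int workers) {
        final long[] partial = new long[workers];
        Thread[] threads = new Thread[workers];
        int chunk = (data.length + workers - 1) / workers; // ceiling division
        for (int w = 0; w < workers; w++) {
            final int id = w;
            int from = Math.min(data.length, w * chunk);
            int to = Math.min(data.length, from + chunk);
            // Private copy: no other thread reads or writes this array.
            final int[] slice = Arrays.copyOfRange(data, from, to);
            threads[w] = new Thread(() -> {
                long local = 0;          // accumulate in a local variable
                for (int v : slice) {
                    local += v;
                }
                partial[id] = local;     // one write to shared memory per worker
            });
            threads[w].start();
        }
        try {
            for (Thread t : threads) {
                t.join();                // join makes the workers' writes visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        long total = 0;
        for (long p : partial) {
            total += p;
        }
        return total;
    }
}
```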
The best piece of advice I can give you is test, test, test. Don't make assumptions about how CPUs will perform - there is a huge amount of magic going on in CPUs these days, often with counterintuitive results. Note also that the JIT runtime optimization will add an additional layer of complexity here (maybe good, maybe not).
On the one hand, you'd like to think Threads == CPU/Cores makes perfect sense. Why have a thread if there's nothing to run it?
The detail boils down to "what are the threads doing". A thread that's idle waiting for a network packet or a disk block is CPU time wasted.
If your threads are CPU heavy, then a 1:1 correlation makes some sense. If you have a single "read the DB" thread that feeds the other threads, and a single "dump the data" thread that pulls data from the CPU threads and creates output, those two could most likely easily share a CPU while the CPU-heavy threads keep churning away.
The real answer, as with all sorts of things, is to measure it. Since the number is configurable (apparently), configure it! Run it with 1:1 threads to CPUs, 2:1, 1.5:1, whatever, and time the results. Fast one wins.
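A crude way to "time the results" is sketched below. The spin workload, the task count and the class name ThreadCountTiming are placeholders; a single cold run like this ignores JIT warm-up, so a proper harness such as JMH, or at least several repeated runs, would give more trustworthy numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical timing harness for comparing thread-to-CPU ratios.
final class ThreadCountTiming {
    // Stand-in for the real CPU-bound work.
    static long spin(int iterations) {
        long acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc += (acc ^ i) % 7;
        }
        return acc;
    }

    // Runs `tasks` copies of the workload on `threads` threads; returns elapsed nanoseconds.
    static long time(int threads, int tasks, int iterations) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> work = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                work.add(() -> spin(iterations));
            }
            long start = System.nanoTime();
            pool.invokeAll(work); // blocks until every task has finished
            return System.nanoTime() - start;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Ratios from the text: 1:1, 1.5:1 and 2:1 threads to CPUs.
        for (int threads : new int[] {cores, cores + cores / 2, 2 * cores}) {
            long ms = time(threads, 64, 5_000_000) / 1_000_000;
            System.out.printf("%d threads: %d ms%n", threads, ms);
        }
    }
}
```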
The number that your application needs; no more, and no less.
Obviously, if you're writing an application which contains some parallelisable algorithm, then you can probably start benchmarking to find a good balance in the number of threads, but bear in mind that hundreds of threads won't speed up any operation.
If your algorithm can't be parallelised, then no number of additional threads is going to help.
Yes, that's a perfectly reasonable approach. One thread per processor/core will maximize processing power and minimize context switching. I'd probably leave that as-is unless I found a problem via benchmarking/profiling.
One thing to note is that the JVM does not guarantee availableProcessors() will be constant, so technically, you should check it immediately before spawning your threads. I doubt that this value is likely to change at runtime on typical computers, though.
P.S. As others have pointed out, if your process is not CPU-bound, this approach is unlikely to be optimal. Since you say these threads are being used to generate images, though, I assume you are CPU bound.
number of processors is a good start; but if those threads do a lot of i/o, then might be better with more... or less.
first think of what are the resources available and what do you want to optimise (least time to finish, least impact to other tasks, etc). then do the math.
sometimes it could be better if you dedicate a thread or two to each i/o resource, and let the others fight for CPU. the analysis is usually easier on these designs.
The benefit of using threads is to reduce the wall-clock execution time of your program by allowing it to work on a different part of the job while another part is waiting for something to happen (usually I/O). If your program is totally CPU bound, adding threads beyond the number of processors will only slow it down. If it is fully or partially I/O bound, adding threads may help, but there's a balance point to be struck between the overhead of adding threads and the additional work that will get accomplished. Making the number of threads equal to the number of processors will yield peak performance if the program is totally, or near-totally, CPU-bound.
As with many questions with the word "should" in them, the answer is, "It depends". If you think you can get better performance, adjust the number of threads up or down and benchmark the application's performance. Also take into account any other factors that might influence the decision (if your application is eating 100% of the computer's available horsepower, the performance of other applications will be reduced).
This assumes that the multi-threaded code is written properly etc. If the original developer only had one CPU, he would never have had a chance to experience problems with poorly-written threading code. So you should probably test behaviour as well as performance when adjusting the number of threads.
By the way, you might want to consider allowing the number of threads to be configured at run time instead of compile time to make this whole process easier.
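One way to make the thread count configurable at run time is a system property with availableProcessors() as the fallback, read just before the threads are created rather than once in a static constant. The property name app.threads and the class ThreadConfig are invented for this sketch.

```java
// Hypothetical run-time configuration of the thread count.
final class ThreadConfig {
    // Reads -Dapp.threads=N if present; otherwise falls back to the processor count.
    // Call this right before creating the pool, since availableProcessors() may change.
    static int threadCount() {
        int fallback = Runtime.getRuntime().availableProcessors();
        int configured = Integer.getInteger("app.threads", fallback);
        return Math.max(1, configured); // never return zero or a negative count
    }

    public static void main(String[] args) {
        System.out.println("Using " + threadCount() + " threads");
    }
}
```

Running with java -Dapp.threads=12 ThreadConfig would then override the default without recompiling.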
After seeing your edit, it's quite possible that one thread per CPU is as good as it gets. Your application seems quite parallelizable. If you have extra hardware you can use GridGain to grid-enable your app and have it run on multiple machines. That's probably about the only thing, beyond buying faster / more cores, that will speed it up.
