Creating many threads in java

Creating many threads in java - java

I want to simulate human population & for that I want to assign a thread to each individual. (This count should go to billion)
Each thread will behave as individual and its end will declare that human dead.
I have implemented this using simple thread creation & also by thread pooling. But after some point of time thread allocation just ceases to happen in both the methods. (say after 150000 threads).
I know java threads are bound by OS threads in 1:1 ratio & it will pose a problem.
What other approach will best simulate this problem?

You can have a look at the actor model which would be more adapted than threads in your situation.
In particular, akka is open source and well known for its implementation of this pattern : https://doc.akka.io/docs/akka/2.5.3/scala/guide/actors-intro.html

Related

java application multi-threading design and optimization

I designed a java application. A friend suggested using multi-threading, he claims that running my application as several threads will decrease the run time significantly.
In my main class, I carry several operations that are out of our scope to fill global static variables and hash maps to be used across the whole life time of the process. Then I run the core of the application on the entries of an array list.
for(int customerID : customers){
ConsumerPrinter consumerPrinter = new ConsumerPrinter();
consumerPrinter.runPE(docsPath,outputPath,customerID);
System.out.println("Customer with CustomerID:"+customerID+" Done");
}
for each iteration of this loop XMLs of the given customer is fetched from the machine, parsed and calculations are taken on the parsed data. Later, processed results are written in a text file (Fetched and written data can reach up to several Giga bytes at most and 50 MBs on average). More than one iteration can write on the same file.
Should I make this piece of code multi-threaded so each group of customers are taken in an independent thread?
How can I know the most optimal number of threads to run?
What are the best practices to take into consideration when implementing multi-threading?

Should I make this piece of code multi-threaded so each group of customers are taken
in an independent thread?
Yes multi-threading will save your processing time. While iterating on your list you can spawn new thread each iteration and do customer processing in it. But you need to do proper synchronization meaning if two customers processing requires operation on same resource you must synchronize that operation to avoid possible race condition or memory inconsistency issues.
How can I know the most optimal number of threads to run?
You cannot really without actually analyzing the processing time for n customers with different number of threads. It will depend on number of cores your processor has, and what is the actually processing that is taking place for each customer.
What are the best practices to take into consideration when implementing multi-threading?
First and foremost criteria is you must have multiple cores and your OS must support multi-threading. Almost every system does that in present times but is a good criteria to look into. Secondly you must analyze all the possible scenarios that may led to race condition. All the resource that you know will be shared among multiple threads must be thread-safe. Also you must also look out for possible chances of memory inconsistency issues(declare your variable as volatile). Finally there are something that you cannot predict or analyze until you actually run test cases like deadlocks(Need to analyze Thread dump) or memory leaks(Need to analyze Heap dump).

The idea of multi thread is to make some heavy process into another, lets say..., "block of memory".
Any UI updates have to be done on the main/default thread, like print messenges or inflate a view for example. You can ask the app to draw a bitmap, donwload images from the internet or a heavy validation/loop block to run them on a separate thread, imagine that you are creating a second short life app to handle those tasks for you.
Remember, you can ask the app to download/draw a image on another thread, but you have to print this image on the screen on the main thread.
This is common used to load a large bitmap on a separated thread, make math calculations to resize this large image and then, on the main thread, inflate/print/paint/show the smaller version of that image to te user.
In your case, I don't know how heavy runPE() method is, I don't know what it does, you could try to create another thread for him, but the rest should be on the main thread, it is the main process of your UI.
You could optmize your loop by placing the "ConsumerPrinter consumerPrinter = new ConsumerPrinter();" before the "for(...)", since it does not change dinamically, you can remove it inside the loop to avoid the creating of the same object each time the loop restarts : )

While straight java multi-threading can be used (java.util.concurrent) as other answers have discussed, consider also alternate programming approaches to multi-threading, such as the actor model. The actor model still uses threads underneath, but much complexity is handled by the actor framework rather than directly by you the programmer. In addition, there is less (or no) need to reason about synchronizing on shared state between threads because of the way programs using the actor model are created.
See Which Actor model library/framework for Java? for a discussion of popular actor model libraries.

Which is more memory efficient, Threaded Entities or Threaded Sectors for a Java Game?

I'm working on shoot 'em up game, that I'm planning on flooding the screen with entities, (Bullets, Mobs, and the like). I've tried a global timer to update everything on the screen, but I've gotten some serious fps drop when I flood the screen like I want.
So, I see myself as having two options. I can either give each individual entity a timer Thread, or I can section off the level into chunks and give each chunk its own timer.
With the first scenario, entities with their own timer threads, I will end up with hundreds of entities, each with their own thread running a timer.
In the section option, I will have multiple sections of the map with a timer updating multiple entities at once, with detections for when an entity leaves from one section to another.
I'm not familiar with Programming with Memory Efficiency in mind, so which method would be better for me to use?

You could try a ScheduledExecutorService.
It's part of the Java higher-level concurrency API. You can decide how many threads should exist (it re-uses threads for different tasks to avoid the overhead of creating new ones every time and is therefore expected to be much more efficient than creating new Threads all the time) or use a cached thread pool (which will create as many threads are necessary, but once a Thread has died it will re-use it to run new tasks).
Another advantage of this API is that not only can you run Runnables, you can also use Callables, which may return a value for you to use in the future (so you can perform calculations in different Threads and then use the result of each Thread for a final result).

I was experimenting with something similar and don't have a definite answer. But maybe some of the feedback I got from Java-Gaming.org will be helpful or of interest.
What I tried was this: each entity has its own thread, and collisions are handled via a very detailed map of the screen (basically a second version of the screen). Then, I have another thread that handles the display of the screen.
An "early" version of this, with over 500 entities being animated, is online:
http://hexara.com/pond.html
Later versions use more elaborate shapes and borders (rather than letting entities die and freeze at the edges) and collision logic such as bouncing off of each other and gravity. I was also playing with sprite aspects like "firefly" blinking. I mention "actors" on the web page, but the code isn't strictly that.
Some folks at java-gaming.org strongly thought having so many threads was not efficient. There was a lot of interesting feedback from them, which you might be interested in exploring. I haven't had time yet.
http://www.java-gaming.org/topics/multi-threading-and-collision-detection/25967/view.html
They were discussing things like hyperthreading and the acca framework for Actors.

Trouble understanding Java threads

I learned about multiprocessing from Python and I'm having a bit of trouble understanding Java's approach. In Python, I can say I want a pool of 4 processes and then send a bunch of work to my program and it'll work on 4 items at a time. I realized, with Java, I need to use threads to achieve this same task and it seems to be working really really well so far.
But.. unlike in Python, my cpu(s) aren't getting 100% utilization (they are about 70-80%) and I suspect it's the way I'm creating threads (code is the same between Python/Java and processes are independent). In Java, I'm not sure how to create one thread so I create a thread for every item in a list I want to process, like this:
for (int i = 0; i < 500; i++) {
Runnable task = new MyRunnable(10000000L + i);
Thread worker = new Thread(task);
// We can set the name of the thread
worker.setName(String.valueOf(i));
// Start the thread, never call method run() direct
worker.start();
// Remember the thread for later usage
threads.add(worker);
}
I took it from here. My question is this the correct way to launch threads or is there a way to have Java itself manage the number of threads so it's optimal? I want my code to run as fast as possible and I'm trying to understand how to tell and resolve any issues that maybe arising from too many threads being created.
This is not a major issue, just curious to how it works under the Java hood.

You use an Executor, the implementation of which handles a pool of threads, decides how many, and so forth. See the Java tutorial for lots of examples.
In general, bare threads aren’t used in Java except for very simple things. Instead, there will be some higher-level API that receives your Runnable or Task and knows what to do.

Take a look at the Java Executor API. See this article, for example.
Although creating Threads is much 'cheaper' than it used to be, creating large numbers of threads (one per runnable as in your example) isn't the way to go - there's still an overhead in creating them, and you'll end up with too much context switching.
The Executor API allows you to create various types of thread pool for executing Runnable tasks, so you can reuse threads, flexibly manage the number that are created, and avoid the overhead of thread-per-runnable.
The Java threading model and the Python threading model (not multiprocessing) are really quite similar, incidentally. There isn't a Global Interpreter Lock as in Python, so there's usually less need to fork off multiple processes.

Thread is a "low level" API.
Depending on what you want to do, and the version of java you use, their is better solution.
If you use Java 7, and if your task allow it, you can use the fork/join framework : http://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html
However, take a look at the java concurrency tutorial : http://docs.oracle.com/javase/tutorial/essential/concurrency/executors.html

Difference between process and thread

I was asked a question during interview today. First they asked how to provide Synchronization
between thread. Then they asked how to provide Synchronization between process, because I told them, the variable inside each process can not be shared with other process, so they asked me to explain how two process can communicate with each other and how to provide Synchronization
between them, and where to declare the shared variable? Now the interview finished, but I want to know the answer, can anyone explain me?Thank you.

I think the interviewer(s) may not be using the proper terminology. A process runs in its own space, and has been mentioned in separate answers, you have to use OS-specific mechanisms to communicate between process. This is called IPC for Inter-Process Communication.
Using sockets is a common practice, but can be grossly inefficient, depending on your application. But if working with pure Java, this may be the only option since sockets are universally supported.
Shared memory is another technique, but that is OS-specific and requires OS-specific calls. You would have to use something like JNI for a Java application to access shared memory services. Shared memory access is not synchronized, so you will likely have to use semaphors to synchronize access among multiple processes.
Unix-like systems provide multiple IPC mechansims, and which one to use depends on the nature of your application. Shared memory can be a limited resource, so it may not be the best method. Googling on this topics provides numerous hits providing useful information on the technical details.

A process is a collection of virtual memory space, code, data, and system resources. A thread is code that is to be serially executed within a process. A processor executes threads, not processes, so each application has at least one process, and a process always has at least one thread of execution, known as the primary thread. A process can have multiple threads in addition to the primary thread. Prior to the introduction of multiple threads of execution, applications were all designed to run on a single thread of execution.
When a thread begins to execute, it continues until it is killed or until it is interrupted by a thread with higher priority (by a user action or the kernel's thread scheduler). Each thread can run separate sections of code, or multiple threads can execute the same section of code. Threads executing the same block of code maintain separate stacks. Each thread in a process shares that process's global variables and resources.

To communicate between two processes I suppose you can use a ServerSocket and Socket to manage process synchronization. You would bind to a specific port (acquire lock) and if a process already is bound you can connect to the socket (block) and wait until the server socket is closed.
private static int KNOWN_PORT = 11000;//arbitrary valid port
private ServerSocket socket;
public void acquireProcessLock(){
socket = new ServetSocket(KNOWN_PORT);
INetAddress localhostInetAddres = ...
try{
socket.bind(localhostInetAddres );
}catch(IOException failed){
try{
Socket socket = new Socket(localhostInetAddres ,KNOWN_PORT);
socket.getInputStream().read();//block
}catch(IOException ex){ acquireProcessLock(); } //other process invoked releaseProcessLock()
}
}
public void releaseProcessLock(){
socket.close();
}
Not sure if this is the actual best means of doing it but I think its worth considering.

Synchronization is for threads only it wont work for processes in Java. There is no utility in them working across processes, since the processes do not share any state that would need to be synchronized. A variable in one process will not have the same data as a variable in the other process

From a system point of view, a thread is defined by his "state" and the "instruction pointer".
The instruction pointer (eip) contains the address of the next instruction to be executed.
A thread "state" can be : the registers (eax, ebx,etc), the signals, the open files, the code, the stack, the data managed by this thread (variables, arrays, etc) and also the heap.
A process is a group of threads that share a part of their "state": it might be the code, the data, the heap.
Hope i answer your question ;)
EDIT:
The processes can communicate via the IPCs (Inter process communications). There are 3 mecanisms : shared memory, message queue. Synchronization between processes can me made with the Semaphors
Threads synchronization can me made with mutexes (pthread_mutex_lock, pthread_mutex_unlock, etc)

Check Terracotta Cluster or Terracotta's DSO Clustering documentation to see how this issue can be solved (bytecode manipulation, maintaince the semantics of Java Language Specification on putfield/getfield-level etc.)

the most simplest answer is process means a program under execution and a program is nothing but collection of functions.
where thread is the part of the proccess because all the threads are functions.
in other way we can say that a process may have multiple threads.
always OS allocates the memory for a process and that memory is disributed among the threads of that process.OS does not allocates memory for threads.

In one sentence processes are designed more independently than threads are.
Their major differences can be described at memory level. Different processes share nothing among each other, from register, stock memory to heap memory, which make them safe on their own tracks. However, normally threads are designed to share a common heap memory, which provides a more closely connected way for multiple processes computing task. Creating a more efficient way to take up computation resources.
E.g. If I compute with 3 processes, I have to let them each finish their jobs and wait for their results in a system level, at the mean time, registers and stack memory are always taken up. However, if I do it with 3 threads, then if thread 2 luckily finish its job earlier, because the result it computed had already been stored to the common heap memory pool, we can simply kill it without waiting for others to deliver their results, and this released resources of registers and stock memory can be used on other purposes.

Process:
A process is nothing but a program under execution.
Each process have its own memory address space.
Process are used for heavyweight tasks i.e. is basically execution of applications.
Cost of communication between process is high.
Switching from one process to another require some time for saving and loading registers, memory maps etc.
Process is operating system approach.
Threads:
A thread is light weight sub-process.
Thread share the same address space.
Cost of communication between the thread is low.
Note: At least one process is required for each thread.

I suppose the processes can communicate through a third-party : a file or a database...

Forcing multiple threads to use multiple CPUs when they are available

I'm writing a Java program which uses a lot of CPU because of the nature of what it does. However, lots of it can run in parallel, and I have made my program multi-threaded. When I run it, it only seems to use one CPU until it needs more then it uses another CPU - is there anything I can do in Java to force different threads to run on different cores/CPUs?

There are two basic ways to multi-thread in Java. Each logical task you create with these methods should run on a fresh core when needed and available.
Method one: define a Runnable or Thread object (which can take a Runnable in the constructor) and start it running with the Thread.start() method. It will execute on whatever core the OS gives it -- generally the less loaded one.
Tutorial: Defining and Starting Threads
Method two: define objects implementing the Runnable (if they don't return values) or Callable (if they do) interface, which contain your processing code. Pass these as tasks to an ExecutorService from the java.util.concurrent package. The java.util.concurrent.Executors class has a bunch of methods to create standard, useful kinds of ExecutorServices. Link to Executors tutorial.
From personal experience, the Executors fixed & cached thread pools are very good, although you'll want to tweak thread counts. Runtime.getRuntime().availableProcessors() can be used at run-time to count available cores. You'll need to shut down thread pools when your application is done, otherwise the application won't exit because the ThreadPool threads stay running.
Getting good multicore performance is sometimes tricky, and full of gotchas:
Disk I/O slows down a LOT when run in
parallel. Only one thread should do disk read/write at a time.
Synchronization of objects provides safety to multi-threaded operations, but slows down work.
If tasks are too
trivial (small work bits, execute
fast) the overhead of managing them
in an ExecutorService costs more than
you gain from multiple cores.
Creating new Thread objects is slow. The ExecutorServices will try to re-use existing threads if possible.
All sorts of crazy stuff can happen when multiple threads work on something. Keep your system simple and try to make tasks logically distinct and non-interacting.
One other problem: controlling work is hard! A good practice is to have one manager thread that creates and submits tasks, and then a couple working threads with work queues (using an ExecutorService).
I'm just touching on key points here -- multithreaded programming is considered one of the hardest programming subjects by many experts. It's non-intuitive, complex, and the abstractions are often weak.
Edit -- Example using ExecutorService:
public class TaskThreader {
class DoStuff implements Callable {
Object in;
public Object call(){
in = doStep1(in);
in = doStep2(in);
in = doStep3(in);
return in;
}
public DoStuff(Object input){
in = input;
}
}
public abstract Object doStep1(Object input);
public abstract Object doStep2(Object input);
public abstract Object doStep3(Object input);
public static void main(String[] args) throws Exception {
ExecutorService exec = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
ArrayList<Callable> tasks = new ArrayList<Callable>();
for(Object input : inputs){
tasks.add(new DoStuff(input));
}
List<Future> results = exec.invokeAll(tasks);
exec.shutdown();
for(Future f : results) {
write(f.get());
}
}
}

When I run it, it only seems to use
one CPU until it needs more then it
uses another CPU - is there anything I
can do in Java to force different
threads to run on different
cores/CPUs?
I interpret this part of your question as meaning that you have already addressed the problem of making your application multi-thread capable. And despite that, it doesn't immediately start using multiple cores.
The answer to "is there any way to force ..." is (AFAIK) not directly. Your JVM and/or the host OS decide how many 'native' threads to use, and how those threads are mapped to physical processors. You do have some options for tuning. For example, I found this page which talks about how to tune Java threading on Solaris. And this page talks about other things that can slow down a multi-threaded application.

First, you should prove to yourself that your program would run faster on multiple cores. Many operating systems put effort into running program threads on the same core whenever possible.
Running on the same core has many advantages. The CPU cache is hot, meaning that data for that program is loaded into the CPU. The lock/monitor/synchronization objects are in CPU cache which means that other CPUs do not need to do cache synchronization operations across the bus (expensive!).
One thing that can very easily make your program run on the same CPU all the time is over-use of locks and shared memory. Your threads should not talk to each other. The less often your threads use the same objects in the same memory, the more often they will run on different CPUs. The more often they use the same memory, the more often they must block waiting for the other thread.
Whenever the OS sees one thread block for another thread, it will run that thread on the same CPU whenever it can. It reduces the amount of memory that moves over the inter-CPU bus. That is what I guess is causing what you see in your program.

First, I'd suggest reading "Concurrency in Practice" by Brian Goetz.
This is by far the best book describing concurrent java programming.
Concurrency is 'easy to learn, difficult to master'. I'd suggest reading plenty about the subject before attempting it. It's very easy to get a multi-threaded program to work correctly 99.9% of the time, and fail 0.1%. However, here are some tips to get you started:
There are two common ways to make a program use more than one core:
Make the program run using multiple processes. An example is Apache compiled with the Pre-Fork MPM, which assigns requests to child processes. In a multi-process program, memory is not shared by default. However, you can map sections of shared memory across processes. Apache does this with it's 'scoreboard'.
Make the program multi-threaded. In a multi-threaded program, all heap memory is shared by default. Each thread still has it's own stack, but can access any part of the heap. Typically, most Java programs are multi-threaded, and not multi-process.
At the lowest level, one can create and destroy threads. Java makes it easy to create threads in a portable cross platform manner.
As it tends to get expensive to create and destroy threads all the time, Java now includes Executors to create re-usable thread pools. Tasks can be assigned to the executors, and the result can be retrieved via a Future object.
Typically, one has a task which can be divided into smaller tasks, but the end results need to be brought back together. For example, with a merge sort, one can divide the list into smaller and smaller parts, until one has every core doing the sorting. However, as each sublist is sorted, it needs to be merged in order to get the final sorted list. Since this is "divide-and-conquer" issue is fairly common, there is a JSR framework which can handle the underlying distribution and joining. This framework will likely be included in Java 7.

There is no way to set CPU affinity in Java. http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4234402
If you have to do it, use JNI to create native threads and set their affinity.

You should write your program to do its work in the form of a lot of Callable's handed to an ExecutorService and executed with invokeAll(...).
You can then choose a suitable implementation at runtime from the Executors class. A suggestion would be to call Executors.newFixedThreadPool() with a number roughly corresponding to the number of cpu cores to keep busy.

The easiest thing to do is break your program into multiple processes. The OS will allocate them across the cores.
Somewhat harder is to break your program into multiple threads and trust the JVM to allocate them properly. This is -- generally -- what people do to make use of available hardware.
Edit
How can a multi-processing program be "easier"? Here's a step in a pipeline.
public class SomeStep {
public static void main( String args[] ) {
BufferedReader stdin= new BufferedReader( System.in );
BufferedWriter stdout= new BufferedWriter( System.out );
String line= stdin.readLine();
while( line != null ) {
// process line, writing to stdout
line = stdin.readLine();
}
}
}
Each step in the pipeline is similarly structured. 9 lines of overhead for whatever processing is included.
This may not be the absolute most efficient. But it's very easy.
The overall structure of your concurrent processes is not a JVM problem. It's an OS problem, so use the shell.
java -cp pipline.jar FirstStep | java -cp pipline.jar SomeStep | java -cp pipline.jar LastStep
The only thing left is to work out some serialization for your data objects in the pipeline.
Standard Serialization works well. Read http://java.sun.com/developer/technicalArticles/Programming/serialization/ for hints on how to serialize. You can replace the BufferedReader and BufferedWriter with ObjectInputStream and ObjectOutputStream to accomplish this.

I think this issue is related to Java Parallel Proccesing Framework (JPPF). Using this you can run diferent jobs on diferent processors.

JVM performance tuning has been mentioned before in Why does this Java code not utilize all CPU cores?. Note that this only applies to the JVM, so your application must already be using threads (and more or less "correctly" at that):
http://ch.sun.com/sunnews/events/2009/apr/adworkshop/pdf/5-1-Java-Performance.pdf

You can use below API from Executors with Java 8 version
public static ExecutorService newWorkStealingPool()
Creates a work-stealing thread pool using all available processors as its target parallelism level.
Due to work stealing mechanism, idle threads steal tasks from task queue of busy threads and overall throughput will increase.
From grepcode, implementation of newWorkStealingPool is as follows
/**
* Creates a work-stealing thread pool using all
* {#link Runtime#availableProcessors available processors}
* as its target parallelism level.
* #return the newly created thread pool
* #see #newWorkStealingPool(int)
* #since 1.8
*/
public static ExecutorService newWorkStealingPool() {
return new ForkJoinPool
(Runtime.getRuntime().availableProcessors(),
ForkJoinPool.defaultForkJoinWorkerThreadFactory,
null, true);
}

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.