Java aims to be ‘Threaded’


I need help understanding threads in Java.
A thread is a thread of execution in a program. The Java Virtual Machine allows an application to have multiple threads of execution running concurrently.
What do we mean when we say that Java aims to be ‘Threaded’

This means that various operations can and should be executed concurrently. This can be achieved by using threads. You can use either the "low-level" thread API (Thread, Runnable) or a higher-level API (Timer, Executors).
I hope this is enough to start googling and learning. I'd recommend starting with the low-level threading API to understand how to work with threads and synchronization. Then move on to the facilities of the concurrency package introduced in Java 1.5. Do not start with the higher-level API. You need the low level to understand what happens behind the scenes when you submit a task to an executor.

Threads are a popular way to implement concurrency in programming languages. Java has them. That's what it means.

"Java is threaded" means that Java could execute two or more jobs at the same time.
If you want to learn more about that look at Oracle Java concurrency tutorial: http://docs.oracle.com/javase/tutorial/essential/concurrency/

What do we mean when we say that Java aims to be ‘Threaded’
Well, literally we don't say that, because calling a runtime environment "threaded" means something rather different; see http://en.wikipedia.org/wiki/Threaded_code. (And note that that page takes care to distinguish between "threaded" and "multi-threaded"!)
In fact, we describe Java as being a language that supports "Multi-threaded" programming. The quotation in your question is a succinct description of what that means. A more long-winded description is as follows.
A program normally executes statements in sequence. So for example:
int i = 1;
i = i + j;
if (i < 10) {
...
}
In the above, the statements are executed one after another in sequence.
A thing that controls the execution of statements like that is called a "thread of control" or (more commonly) a thread. You can think of it as an automaton that executes statements one after another, and that is only capable of doing one at a time. It keeps a record of the state of the local variables and the procedure calls. (It typically uses a stack and a set of private registers to do this ... but that's an implementation detail.)
In a multi-threaded program, there are potentially many of these automatons, each executing a different sequence of statements (using its own stack and registers). Each thread is potentially able to communicate with other threads (by observing shared objects, etc.) and can synchronize with them in various ways and for various reasons.
Depending on the hardware (and the operating system), the threads may either all run on the same processor, or they may (at different times) run on different processors. It is typically a combination of the two, and it is typically up to the operating system to decide which of the threads that can do work is allowed to run. (This is handled by the thread scheduler.)
From a Java perspective, multi-threaded programming is implemented at the low level using the Thread class, synchronized methods and blocks, and the Object level wait and notify methods. Higher level APIs provide standard building blocks for solving common problems.
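As a minimal sketch of the low-level pieces just mentioned (Thread plus synchronized methods), the following starts a few threads that all increment one shared counter. The class name SharedCounterDemo and the method runDemo are hypothetical names chosen for illustration; only Thread, start(), join(), and the synchronized keyword come from the JDK and the language.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: several threads increment one shared counter.
class SharedCounterDemo {
    private int count = 0;

    // synchronized: only one thread at a time may run this on a given instance.
    synchronized void increment() {
        count++;
    }

    synchronized int get() {
        return count;
    }

    // Starts `threads` threads, each incrementing `perThread` times,
    // waits for all of them, and returns the final count.
    static int runDemo(int threads, int perThread) {
        SharedCounterDemo counter = new SharedCounterDemo();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.increment();
                }
            });
            workers.add(worker);
            worker.start(); // start() runs the Runnable on a new thread
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // block until this worker has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 10_000)); // prints 40000
    }
}
```

Without the synchronized keyword on increment(), the threads could interleave their read-modify-write steps and the final count would usually come out lower than expected.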

Related

Project loom: what makes the performance better when using virtual threads?

To give some context here, I have been following Project Loom for some time now. I have read The state of Loom. I have done asynchronous programming.
Asynchronous programming (provided by Java NIO) returns the thread to the thread pool when the task waits, and it goes to great lengths not to block threads. This gives a large performance gain: we can now handle many more requests, as they are not directly bound by the number of OS threads. But what we lose here is the context. The same task is now NOT associated with just one thread. All the context is lost once we dissociate tasks from threads. Exception traces do not provide very useful information and debugging is difficult.
In comes Project Loom with virtual threads that become the single unit of concurrency. And now you can perform a single task on a single virtual thread.
It's all fine until now, but the article goes on to state, with Project Loom:
A simple, synchronous web server will be able to handle many more requests without requiring more hardware.
I don't understand how we get performance benefits with Project Loom over asynchronous APIs. The asynchronous APIs make sure not to keep any thread idle. So what does Project Loom do to make it more efficient and performant than asynchronous APIs?
EDIT
Let me re-phrase the question. Let's say we have an http server that takes in requests and does some crud operations with a backing persistent database. Say, this http server handles a lot of requests - 100K RPM. Two ways of implementing this:
The HTTP server has a dedicated pool of threads. When a request comes in, a thread carries the task up until it reaches the DB, wherein the task has to wait for the response from DB. At this point, the thread is returned to the thread pool and goes on to do the other tasks. When DB responds, it is again handled by some thread from the thread pool and it returns an HTTP response.
The HTTP server just spawns virtual threads for every request. If there is an IO, the virtual thread just waits for the task to complete. And then returns the HTTP Response. Basically, there is no pooling business going on for the virtual threads.
Given that the hardware and the throughput remain the same, would any one solution fare better than the other in terms of response times or handling more throughput?
My guess is that there would not be any difference w.r.t performance.

We don't get benefit over asynchronous API. What we potentially will get is performance similar to asynchronous, but with synchronous code.

The answer by @talex puts it crisply. Adding further to it:
Loom is more about a native concurrency abstraction, which additionally helps one write asynchronous code. Given that it's a VM-level abstraction rather than just a code-level one (like what we have been doing until now with CompletableFuture etc.), it lets one implement asynchronous behavior with reduced boilerplate.
With Loom, a more powerful abstraction is the savior. We have seen repeatedly how an abstraction with syntactic sugar lets one write programs more effectively, whether it was functional interfaces in JDK 8 or for-comprehensions in Scala.
With Loom, there isn't a need to chain multiple CompletableFutures (to save on resources); one can write the code synchronously. With each blocking operation encountered (ReentrantLock, I/O, JDBC calls), the virtual thread gets parked. And because these are lightweight threads, the context switch is far cheaper than with kernel threads.
When blocked, the actual carrier thread (the one that was running the run body of the virtual thread) gets engaged in executing some other virtual thread's run. So effectively, the carrier thread is not sitting idle but executing some other work, and it comes back to continue the execution of the original virtual thread whenever that is unparked, just like how a thread pool would work. But here, you have a single carrier thread in a way executing the body of multiple virtual threads, switching from one to another when blocked.
We get the same behavior (and hence performance) as manually written asynchronous code, but without the boilerplate needed to do the same thing.
Consider the case of a web-framework, where there is a separate thread-pool to handle i/o and the other for execution of http requests. For simple HTTP requests, one might serve the request from the http-pool thread itself. But if there are any blocking (or) high CPU operations, we let this activity happen on a separate thread asynchronously.
This thread would collect the information from an incoming request, spawn a CompletableFuture, and chain it with a pipeline (read from database as one stage, followed by computation from it, followed by another stage to write back to database case, web service calls etc). Each one is a stage, and the resultant CompletableFuture is returned back to the web-framework.
When the resultant future is complete, the web-framework uses the results to be relayed back to the client. This is how Play-Framework and others, have been dealing with it. Providing an isolation between the http thread handling pool, and the execution of each request. But if we dive deeper in this, why is it that we do this?
One core reason is to use the resources effectively, particularly around blocking calls. Hence we chain with thenApply etc. so that no thread is blocked on any activity, and we do more with fewer threads.
This works well but is quite verbose. Debugging is indeed painful, and if one of the intermediary stages results in an exception, the control flow goes haywire, requiring further code to handle it.
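To make the contrast concrete, here is a small sketch of the same two-stage work written once as a CompletableFuture pipeline and once as plain sequential code. The names PipelineDemo, loadUser, and score are hypothetical stand-ins for a database read and a computation; supplyAsync, thenApply, exceptionally, and join are the real CompletableFuture methods.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PipelineDemo {
    // Hypothetical stand-in for a blocking database read.
    static String loadUser(int id) {
        return "user-" + id;
    }

    // Hypothetical stand-in for a computation on the loaded data.
    static int score(String user) {
        return user.length();
    }

    // Asynchronous style: each step is a stage chained onto the previous one.
    static int scoreAsync(int id) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<Integer> result = CompletableFuture
                    .supplyAsync(() -> loadUser(id), pool) // stage 1: load
                    .thenApply(PipelineDemo::score)        // stage 2: compute
                    .exceptionally(ex -> -1);              // error handling is its own stage
            // A web framework would normally return the future instead of joining;
            // join() is used here only so the sketch can return a plain value.
            return result.join();
        } finally {
            pool.shutdown();
        }
    }

    // Synchronous style: the same logic as ordinary sequential statements.
    static int scoreBlocking(int id) {
        return score(loadUser(id));
    }

    public static void main(String[] args) {
        System.out.println(scoreAsync(7));    // prints 6
        System.out.println(scoreBlocking(7)); // prints 6
    }
}
```

The blocking version is what one would write on a virtual thread: the code stays sequential and the runtime, rather than the programmer, decides what the underlying thread does while the call waits.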
With Loom, we write synchronous code, and let someone else decide what to do when blocked. Rather than sleep and do nothing.

The http server has a dedicated pool of threads ....
How big a pool? (Number of CPUs)*N + C? With N>1 one can fall back to anti-scaling, as lock contention extends latency, whereas N=1 can under-utilize available bandwidth. There is a good analysis here.
The http server just spawns...
That would be a very naive implementation of this concept. A more realistic one would strive for collecting from a dynamic pool which kept one real thread for every blocked system call + one for every real CPU. At least that is what the folks behind Go came up with.
The crux is to keep the {handlers, callbacks, completions, virtual threads, goroutines : all PEAs in a pod} from fighting over internal resources; thus they do not lean on system-based blocking mechanisms until absolutely necessary. This falls under the banner of lock avoidance, and might be accomplished with various queuing strategies (see libdispatch), etc. Note that this leaves the PEA divorced from the underlying system thread, because they are internally multiplexed between them. This is your concern about divorcing the concepts. In practice, you pass around your favourite language's abstraction of a context pointer.
As 1 indicates, there are tangible results that can be directly linked to this approach; and a few intangibles. Locking is easy -- you just make one big lock around your transactions and you are good to go. That doesn't scale; but fine-grained locking is hard. Hard to get working, hard to choose the fineness of the grain. When to use { locks, CVs, semaphores, barriers, ... } are obvious in textbook examples; a little less so in deeply nested logic. Lock avoidance makes that, for the most part, go away, and be limited to contended leaf components like malloc().
I maintain some skepticism, as the research typically shows a poorly scaled system, which is transformed into a lock avoidance model, then shown to be better. I have yet to see one which unleashes some experienced developers to analyze the synchronization behavior of the system, transform it for scalability, then measure the result. But even if that were a win, experienced developers are a rare(ish) and expensive commodity; the heart of scalability is really financial.

Is wait set a part of the monitor or are they two separate things?

I'm trying to understand monitors in Java, and I came across two new concepts: object headers and the wait set.
My question is: does the concept of the monitor correspond to an explicit implementation, in terms of code, that is itself separate from the wait set or any other synchronization feature?
Or is it a concept that encompasses different features such as the wait set and locks? Basically, what I'm asking is whether the monitor is simply a name given to a group of features that control thread access and behavior.

The authors of the Java language consider locks part of the monitor and the wait set a separate concept (see JLS chapter 17).
On the other hand, the wait set is tightly coupled with the monitor, so there is no harm in thinking of the wait set as part of the monitor.
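A small sketch may help show how the two fit together in code. It assumes a hypothetical one-slot Mailbox class; the synchronized keyword uses the object's monitor lock, while wait() and notifyAll() operate on the same object's wait set, and both must be called while holding that lock.

```java
// Hypothetical one-slot mailbox: the lock and the wait set both belong to `this`.
class Mailbox {
    private String message; // null means the slot is empty

    // Blocks until a message is available, then removes and returns it.
    synchronized String take() throws InterruptedException {
        while (message == null) {
            wait(); // releases this object's lock and joins its wait set
        }
        String result = message;
        message = null;
        notifyAll(); // wake any thread waiting for the slot to become empty
        return result;
    }

    // Blocks until the slot is empty, then stores the message.
    synchronized void put(String m) throws InterruptedException {
        while (message != null) {
            wait();
        }
        message = m;
        notifyAll(); // waiters leave the wait set, then re-acquire the lock before continuing
    }

    // Sends one message from a producer thread to the calling thread.
    static String roundTrip(String text) {
        Mailbox box = new Mailbox();
        Thread producer = new Thread(() -> {
            try {
                box.put(text);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            String received = box.take();
            producer.join();
            return received;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello")); // prints hello
    }
}
```

Calling wait() or notifyAll() outside a synchronized method or block on the same object throws IllegalMonitorStateException, which is one practical sign of how tightly the wait set is tied to the lock.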

Why does Java have no async/await?

Using async/await it is possible to code asynchronous functions in an imperative style. This can greatly facilitate asynchronous programming. After it was first introduced in C#, it was adopted by many languages such as JavaScript, Python, and Kotlin.
EA Async is a library that adds async/await like functionality to Java. The library abstracts away the complexity of working with CompletableFutures.
But why has async/await neither been added to Java SE, nor are there any plans to add it in the future?

The short answer is that the designers of Java try to eliminate the need for asynchronous methods instead of facilitating their use.
According to Ron Pressler's talk, asynchronous programming using CompletableFuture causes three main problems:
1. branching or looping over the results of asynchronous method calls is not possible
2. stacktraces cannot be used to identify the source of errors, profiling becomes impossible
3. it is viral: all methods that do asynchronous calls have to be asynchronous as well, i.e. synchronous and asynchronous worlds don't mix
While async/await solves the first problem it can only partially solve the second problem and does not solve the third problem at all (e.g. all methods in C# doing an await have to be marked as async).
But why is asynchronous programming needed at all? Only to prevent the blocking of threads, because threads are expensive. Thus instead of introducing async/await in Java, in project Loom Java designers are working on virtual threads (aka fibers/lightweight threads) which will aim to significantly reduce the cost of threads and thus eliminate the need of asynchronous programming. This would make all three problems above also obsolete.

Better late than never!!!
Java is 10+ years late in trying to come up with lighter-weight units of execution which can be executed in parallel. As a side note, Project Loom also aims to expose 'delimited continuations' in Java, which, I believe, are nothing more than the good old 'yield' keyword of C# (again almost 20 years late!!)
Java does recognize the need for solving the bigger problem solved by async/await (or actually Tasks in C#, which is the big idea; async/await is more of a syntactic sugar: a highly significant improvement, but still not a necessity for solving the actual problem of OS-mapped threads being heavier than desired).
Look at the proposal for project loom here: https://cr.openjdk.java.net/~rpressler/loom/Loom-Proposal.html
and navigate to last section 'Other Approaches'. You will see why Java does not want to introduce async/await.
Having said this, I don't really agree with the reasoning being provided. Neither in this proposal nor in Stephan's answer.
First let us diagnose Stephan's answer
1. async/await solves point 1 mentioned there. (Stephan also acknowledges this further down the answer.)
2. It is extra work for sure on the part of the framework and tools, but not at all on the part of the programmers. Even with async/await, .NET debuggers are pretty good in this aspect.
3. This I only partially agree with. The whole purpose of async/await is to elegantly mix the asynchronous world with synchronous constructs. But yes, you either need to declare the caller also as async or deal directly with Task in the caller routine. However, Project Loom will not solve it either in a meaningful way. To fully benefit from the lightweight virtual threads, even the caller routine must be executing on a virtual thread. Otherwise what's the benefit? You will end up blocking an OS-backed thread!!! Hence even virtual threads need to be 'viral' in the code. On the contrary, it will be easier in Java to not notice that the routine you are calling is async and will block the calling thread (which will be concerning if the calling routine is itself not executing on a virtual thread). The async keyword in C# makes the intent very clear and forces you to decide (it is possible in C# to block as well if you want, by asking for Task.Result; most of the time the calling routine can just as easily be async itself).
Stephan is right when he says async programming is needed to prevent blocking of (OS) threads, as (OS) threads are expensive. And that's precisely the whole reason why virtual threads (or C# tasks) are needed. You should be able to 'block' on these tasks without losing any sleep. Of course, to not lose sleep, either the calling routine itself should be a task, or the blocking should be on non-blocking IO, with the framework being smart enough not to block the calling thread in that case (the power of continuations).
C# supports this and proposed Java feature aims to support this.
According to the proposed Java api, blocking on virtual thread will require calling vThread.join() method in Java.
How is it really more beneficial than calling await workDoneByVThread()?
Now let us look at project loom proposal reasoning
Continuations and fibers dominate async/await in the sense that async/await is easily implemented with continuations (in fact, it can be implemented with a weak form of delimited continuations known as stackless continuations, that don't capture an entire call-stack but only the local context of a single subroutine), but not vice-versa
I simply don't understand this statement. If someone does, please let me know in the comments.
For me, async/await are implemented using continuations and as far as stack trace is concerned, since the fibres/virtual threads/tasks are within the virtual machine, it must be possible to manage that aspect. In-fact .net tools do manage that.
While async/await makes code simpler and gives it the appearance of normal, sequential code, like asynchronous code it still requires significant changes to existing code, explicit support in libraries, and does not interoperate well with synchronous code
I have already covered this. Not making significant changes to existing code and no explicit support in libraries will actually mean not using this feature effectively. Until and unless Java is aiming to transparently transform all the threads to virtual threads, which it can't and isn't, this statement does not make sense to me.
As a core idea, I find no real difference between Java virtual threads and C# tasks. To the point that project loom is also aiming for work-stealing scheduler as default, same as the scheduler used by .Net by default (https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.taskscheduler?view=net-5.0, scroll to last remarks section ).
Only debate it seems is on what syntax should be adopted to consume these.
C# adopted
A distinct class and interface as compared to existing threads
Very helpful syntactical sugar for marrying async with sync
Java is aiming for:
Same familiar interface of Java Thread
No special constructs apart from try-with-resources support for ExecutorService so that the result for submitted tasks/virtual threads can be automatically waited for (thus blocking the calling thread, virtual/non-virtual).
IMHO, Java's choices are worse than those of C#. Having a separate interface and class actually makes it very clear that the behavior is a lot different. Retaining same old interface can lead to subtle bugs when a programmer does not realize that she is now dealing with something different or when a library implementation changes to take advantage of the new constructs but ends up blocking the calling (non-virtual) thread.
Also no special language syntax means that reading async code will remain difficult to understand and reason about (I don't know why Java thinks programmers are in love with Java's Thread syntax and they will be thrilled to know that instead of writing sync looking code they will be using the lovely Thread class)
Heck, even Javascript now has async await (with all its 'single-threadedness').

I released a new project, JAsync, which implements async/await style in Java using Reactor as its low-level framework. It is in the alpha stage. I need more suggestions and test cases.
This project makes the developer's asynchronous programming experience as close as possible to the usual synchronous programming, including both coding and debugging.
I think my project solves point 1 mentioned by Stephan.
Here is an example:
@RestController
@RequestMapping("/employees")
public class MyRestController {
    @Inject
    private EmployeeRepository employeeRepository;
    @Inject
    private SalaryRepository salaryRepository;

    // The standard JAsync async method must be annotated with the Async annotation, and return a JPromise object.
    @Async()
    private JPromise<Double> _getEmployeeTotalSalaryByDepartment(String department) {
        double money = 0.0;
        // A Mono object can be transformed to the JPromise object. So we get a Mono object first.
        Mono<List<Employee>> empsMono = employeeRepository.findEmployeeByDepartment(department);
        // Transform the Mono object to the JPromise object.
        JPromise<List<Employee>> empsPromise = Promises.from(empsMono);
        // Use await just like es and c# to get the value of the JPromise without blocking the current thread.
        for (Employee employee : empsPromise.await()) {
            // The method findSalaryByEmployee also returns a Mono object. We transform it to the JPromise just like above. And then await to get the result.
            Salary salary = Promises.from(salaryRepository.findSalaryByEmployee(employee.id)).await();
            money += salary.total;
        }
        // The async method must return a JPromise object, so we use the just method to wrap the result in a JPromise.
        return JAsync.just(money);
    }

    // This is a normal webflux method.
    @GetMapping("/{department}/salary")
    public Mono<Double> getEmployeeTotalSalaryByDepartment(@PathVariable String department) {
        // Use the unwrap method to transform the JPromise object back to the Mono object.
        return _getEmployeeTotalSalaryByDepartment(department).unwrap(Mono.class);
    }
}
In addition to coding, JAsync also greatly improves the debugging experience of async code.
When debugging, you can see all variables in the monitor window just like when debugging normal code. I will try my best to solve point 2 mentioned by Stephan.
For point 3, I think it is not a big problem. Async/await is popular in C# and ES even though it does not address that point.

Trouble understanding Java threads

I learned about multiprocessing from Python and I'm having a bit of trouble understanding Java's approach. In Python, I can say I want a pool of 4 processes and then send a bunch of work to my program and it'll work on 4 items at a time. I realized, with Java, I need to use threads to achieve this same task and it seems to be working really really well so far.
But.. unlike in Python, my cpu(s) aren't getting 100% utilization (they are about 70-80%) and I suspect it's the way I'm creating threads (code is the same between Python/Java and processes are independent). In Java, I'm not sure how to create one thread so I create a thread for every item in a list I want to process, like this:
for (int i = 0; i < 500; i++) {
    Runnable task = new MyRunnable(10000000L + i);
    Thread worker = new Thread(task);
    // We can set the name of the thread
    worker.setName(String.valueOf(i));
    // Start the thread; never call run() directly
    worker.start();
    // Remember the thread for later usage
    threads.add(worker);
}
I took it from here. My question: is this the correct way to launch threads, or is there a way to have Java itself manage the number of threads so it's optimal? I want my code to run as fast as possible and I'm trying to understand how to detect and resolve any issues that may be arising from too many threads being created.
This is not a major issue, just curious to how it works under the Java hood.

You use an Executor, the implementation of which handles a pool of threads, decides how many, and so forth. See the Java tutorial for lots of examples.
In general, bare threads aren’t used in Java except for very simple things. Instead, there will be some higher-level API that receives your Runnable or Task and knows what to do.

Take a look at the Java Executor API. See this article, for example.
Although creating Threads is much 'cheaper' than it used to be, creating large numbers of threads (one per runnable as in your example) isn't the way to go - there's still an overhead in creating them, and you'll end up with too much context switching.
The Executor API allows you to create various types of thread pool for executing Runnable tasks, so you can reuse threads, flexibly manage the number that are created, and avoid the overhead of thread-per-runnable.
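As a rough sketch of what that looks like for the loop in the question, the following submits many small tasks to a fixed pool sized to the number of cores instead of creating one thread per task. The class PoolDemo and the work method are hypothetical; Executors.newFixedThreadPool, invokeAll, Future.get, and shutdown are standard java.util.concurrent calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PoolDemo {
    // Hypothetical CPU-bound task: the sum 1 + 2 + ... + n.
    static long work(long n) {
        long sum = 0;
        for (long i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    // Runs taskCount tasks on a pool with one thread per core and adds up the results.
    static long runAll(int taskCount) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                final long n = 1_000L + i;
                tasks.add(() -> work(n));
            }
            long total = 0;
            // invokeAll blocks until every task has completed.
            for (Future<Long> f : pool.invokeAll(tasks)) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown(); // otherwise the pool threads keep the JVM alive
        }
    }

    public static void main(String[] args) {
        System.out.println(runAll(500));
    }
}
```

One thread per core is only a starting point for CPU-bound work; tasks that spend time waiting on I/O may benefit from a larger pool.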
The Java threading model and the Python threading model (not multiprocessing) are really quite similar, incidentally. There isn't a Global Interpreter Lock as in Python, so there's usually less need to fork off multiple processes.

Thread is a "low level" API.
Depending on what you want to do, and the version of Java you use, there may be a better solution.
If you use Java 7, and if your task allows it, you can use the fork/join framework: http://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html
However, take a look at the java concurrency tutorial : http://docs.oracle.com/javase/tutorial/essential/concurrency/executors.html

Forcing multiple threads to use multiple CPUs when they are available

I'm writing a Java program which uses a lot of CPU because of the nature of what it does. However, lots of it can run in parallel, and I have made my program multi-threaded. When I run it, it only seems to use one CPU until it needs more then it uses another CPU - is there anything I can do in Java to force different threads to run on different cores/CPUs?

There are two basic ways to multi-thread in Java. Each logical task you create with these methods should run on a fresh core when needed and available.
Method one: define a Runnable or Thread object (which can take a Runnable in the constructor) and start it running with the Thread.start() method. It will execute on whatever core the OS gives it -- generally the less loaded one.
Tutorial: Defining and Starting Threads
Method two: define objects implementing the Runnable (if they don't return values) or Callable (if they do) interface, which contain your processing code. Pass these as tasks to an ExecutorService from the java.util.concurrent package. The java.util.concurrent.Executors class has a bunch of methods to create standard, useful kinds of ExecutorServices. Link to Executors tutorial.
From personal experience, the Executors fixed & cached thread pools are very good, although you'll want to tweak thread counts. Runtime.getRuntime().availableProcessors() can be used at run-time to count available cores. You'll need to shut down thread pools when your application is done, otherwise the application won't exit because the ThreadPool threads stay running.
Getting good multicore performance is sometimes tricky, and full of gotchas:
Disk I/O slows down a LOT when run in parallel. Only one thread should do disk read/write at a time.
Synchronization of objects provides safety to multi-threaded operations, but slows down work.
If tasks are too trivial (small work bits, execute fast) the overhead of managing them in an ExecutorService costs more than you gain from multiple cores.
Creating new Thread objects is slow. The ExecutorServices will try to re-use existing threads if possible.
All sorts of crazy stuff can happen when multiple threads work on something. Keep your system simple and try to make tasks logically distinct and non-interacting.
One other problem: controlling work is hard! A good practice is to have one manager thread that creates and submits tasks, and then a couple working threads with work queues (using an ExecutorService).
I'm just touching on key points here -- multithreaded programming is considered one of the hardest programming subjects by many experts. It's non-intuitive, complex, and the abstractions are often weak.
Edit -- Example using ExecutorService:
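import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public abstract class TaskThreader {
    // One task: runs the three steps in order on a single input.
    class DoStuff implements Callable<Object> {
        private Object in;

        DoStuff(Object input) {
            in = input;
        }

        @Override
        public Object call() {
            in = doStep1(in);
            in = doStep2(in);
            in = doStep3(in);
            return in;
        }
    }

    public abstract Object doStep1(Object input);
    public abstract Object doStep2(Object input);
    public abstract Object doStep3(Object input);
    // Placeholder for whatever you do with each finished result.
    public abstract void write(Object result);

    // Runs every input through the three steps on a pool sized to the core count.
    public void runAll(List<Object> inputs) throws Exception {
        ExecutorService exec = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        List<Callable<Object>> tasks = new ArrayList<>();
        for (Object input : inputs) {
            tasks.add(new DoStuff(input));
        }
        List<Future<Object>> results = exec.invokeAll(tasks);
        exec.shutdown();
        for (Future<Object> f : results) {
            write(f.get());
        }
    }
}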

When I run it, it only seems to use one CPU until it needs more then it uses another CPU - is there anything I can do in Java to force different threads to run on different cores/CPUs?
I interpret this part of your question as meaning that you have already addressed the problem of making your application multi-thread capable. And despite that, it doesn't immediately start using multiple cores.
The answer to "is there any way to force ..." is (AFAIK) not directly. Your JVM and/or the host OS decide how many 'native' threads to use, and how those threads are mapped to physical processors. You do have some options for tuning. For example, I found this page which talks about how to tune Java threading on Solaris. And this page talks about other things that can slow down a multi-threaded application.

First, you should prove to yourself that your program would run faster on multiple cores. Many operating systems put effort into running program threads on the same core whenever possible.
Running on the same core has many advantages. The CPU cache is hot, meaning that data for that program is loaded into the CPU. The lock/monitor/synchronization objects are in CPU cache which means that other CPUs do not need to do cache synchronization operations across the bus (expensive!).
One thing that can very easily make your program run on the same CPU all the time is over-use of locks and shared memory. Your threads should not talk to each other. The less often your threads use the same objects in the same memory, the more often they will run on different CPUs. The more often they use the same memory, the more often they must block waiting for the other thread.
Whenever the OS sees one thread block for another thread, it will run that thread on the same CPU whenever it can. It reduces the amount of memory that moves over the inter-CPU bus. That is what I guess is causing what you see in your program.

First, I'd suggest reading "Java Concurrency in Practice" by Brian Goetz.
This is by far the best book describing concurrent Java programming.
Concurrency is 'easy to learn, difficult to master'. I'd suggest reading plenty about the subject before attempting it. It's very easy to get a multi-threaded program to work correctly 99.9% of the time, and fail 0.1%. However, here are some tips to get you started:
There are two common ways to make a program use more than one core:
Make the program run using multiple processes. An example is Apache compiled with the Pre-Fork MPM, which assigns requests to child processes. In a multi-process program, memory is not shared by default. However, you can map sections of shared memory across processes. Apache does this with its 'scoreboard'.
Make the program multi-threaded. In a multi-threaded program, all heap memory is shared by default. Each thread still has its own stack, but can access any part of the heap. Typically, most Java programs are multi-threaded, and not multi-process.
At the lowest level, one can create and destroy threads. Java makes it easy to create threads in a portable cross platform manner.
As it tends to get expensive to create and destroy threads all the time, Java now includes Executors to create re-usable thread pools. Tasks can be assigned to the executors, and the result can be retrieved via a Future object.
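As a minimal sketch of the executor-plus-Future idea above: the class name FutureExample and the squaring task are hypothetical, chosen only for illustration, and the lambda syntax assumes Java 8 or later.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class; the name and the task are illustrative only.
class FutureExample {

    // Submits one task to a reusable pool and waits for its result.
    static int squareInBackground(int n) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Callable<Integer> task = () -> n * n;       // runs on a pool thread
            Future<Integer> future = pool.submit(task); // returns immediately
            return future.get();                        // blocks until the task finishes
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(squareInBackground(12)); // prints 144
    }
}
```

In a real program the pool would normally be created once and shared, not created per call as in this sketch.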
Typically, one has a task which can be divided into smaller tasks, but the end results need to be brought back together. For example, with a merge sort, one can divide the list into smaller and smaller parts until every core is busy sorting. However, as each sublist is sorted, it needs to be merged in order to get the final sorted list. Since this "divide-and-conquer" pattern is fairly common, there is a JSR framework (fork/join) which can handle the underlying distribution and joining. This framework was included in Java 7 as part of java.util.concurrent.
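The merge sort described above could look roughly like this with the fork/join classes that shipped in java.util.concurrent (ForkJoinPool, RecursiveAction). This is a sketch and not a tuned implementation: the class name ParallelMergeSort and the THRESHOLD value are assumptions, and ForkJoinPool.commonPool() requires Java 8.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical example class; THRESHOLD is an arbitrary illustrative cut-off.
class ParallelMergeSort extends RecursiveAction {
    private static final int THRESHOLD = 1_000;

    private final int[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    ParallelMergeSort(int[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected void compute() {
        if (to - from <= THRESHOLD) {
            Arrays.sort(data, from, to); // small enough: sort sequentially
            return;
        }
        int mid = (from + to) >>> 1;
        // Divide: sort both halves, potentially on different cores.
        invokeAll(new ParallelMergeSort(data, from, mid),
                  new ParallelMergeSort(data, mid, to));
        // Conquer: combine the two sorted halves.
        merge(mid);
    }

    private void merge(int mid) {
        int[] left = Arrays.copyOfRange(data, from, mid);
        int i = 0, j = mid, k = from;
        while (i < left.length && j < to) {
            data[k++] = (left[i] <= data[j]) ? left[i++] : data[j++];
        }
        while (i < left.length) {
            data[k++] = left[i++];
        }
        // Any remaining right-half elements are already in place.
    }

    // Sorts the array in place using the shared common pool.
    static void sort(int[] data) {
        ForkJoinPool.commonPool().invoke(new ParallelMergeSort(data, 0, data.length));
    }
}
```

Calling ParallelMergeSort.sort(array) sorts the array in place; the framework decides which worker thread runs each half.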

There is no way to set CPU affinity in Java. http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4234402
If you have to do it, use JNI to create native threads and set their affinity.

You should write your program to do its work in the form of a lot of Callables handed to an ExecutorService and executed with invokeAll(...).
You can then choose a suitable implementation at runtime from the Executors class. A suggestion would be to call Executors.newFixedThreadPool() with a number roughly corresponding to the number of cpu cores to keep busy.
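A rough sketch of that structure, assuming a simple summation as the workload (the class ChunkedSum and its chunking scheme are hypothetical, and the lambdas assume Java 8):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class; the chunked summation is illustrative only.
class ChunkedSum {

    // Sums the integers in [0, n) by splitting the range into one chunk per core.
    static long sum(int n) throws InterruptedException, ExecutionException {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            int chunk = (n + cores - 1) / cores; // ceiling division
            for (int start = 0; start < n; start += chunk) {
                final int lo = start;
                final int hi = Math.min(n, start + chunk);
                tasks.add(() -> {
                    long partial = 0; // local to the task, so no locking is needed
                    for (int i = lo; i < hi; i++) {
                        partial += i;
                    }
                    return partial;
                });
            }
            long total = 0;
            // invokeAll blocks until every task has completed.
            for (Future<Long> f : pool.invokeAll(tasks)) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because every task only touches its own local variable, the threads never contend for a shared object; the partial results are combined once, on the calling thread.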

The easiest thing to do is break your program into multiple processes. The OS will allocate them across the cores.
Somewhat harder is to break your program into multiple threads and trust the JVM to allocate them properly. This is -- generally -- what people do to make use of available hardware.
Edit
How can a multi-processing program be "easier"? Here's a step in a pipeline.
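import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

public class SomeStep {
    public static void main(String[] args) throws IOException {
        // System.in and System.out are byte streams, so wrap them in character streams first.
        BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in));
        BufferedWriter stdout = new BufferedWriter(new OutputStreamWriter(System.out));
        String line = stdin.readLine();
        while (line != null) {
            // process line, writing to stdout
            line = stdin.readLine();
        }
        stdout.flush(); // push any buffered output to the next step
    }
}
Each step in the pipeline is similarly structured: a few lines of boilerplate around whatever processing is included.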
This may not be the absolute most efficient. But it's very easy.
The overall structure of your concurrent processes is not a JVM problem. It's an OS problem, so use the shell.
java -cp pipeline.jar FirstStep | java -cp pipeline.jar SomeStep | java -cp pipeline.jar LastStep
The only thing left is to work out some serialization for your data objects in the pipeline.
Standard Serialization works well. Read http://java.sun.com/developer/technicalArticles/Programming/serialization/ for hints on how to serialize. You can replace the BufferedReader and BufferedWriter with ObjectInputStream and ObjectOutputStream to accomplish this.
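For illustration, a middle step that exchanges serialized objects instead of text lines might be sketched like this. The class name ObjectStep and the upper-casing "processing" are made up for the example; it assumes the upstream process writes a standard ObjectOutputStream to its stdout.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;

// Hypothetical pipeline step; upper-casing stands in for real processing.
class ObjectStep {

    // Reads serialized objects from 'in' until end of stream, writing results to 'out'.
    static void run(InputStream in, OutputStream out)
            throws IOException, ClassNotFoundException {
        // Create and flush the output stream first so its header reaches the next step;
        // the ObjectInputStream constructor blocks until it has read the upstream header.
        ObjectOutputStream objOut = new ObjectOutputStream(out);
        objOut.flush();
        ObjectInputStream objIn = new ObjectInputStream(in);
        try {
            while (true) {
                Object item = objIn.readObject(); // throws EOFException at end of stream
                objOut.writeObject(String.valueOf(item).toUpperCase());
            }
        } catch (EOFException endOfInput) {
            // Normal termination: the upstream process closed its output.
        }
        objOut.flush();
    }

    public static void main(String[] args) throws Exception {
        run(System.in, System.out);
    }
}
```

The first and last steps of a pipeline would differ slightly, since one of their ends is not an object stream.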

I think this issue is related to the Java Parallel Processing Framework (JPPF). Using this you can run different jobs on different processors.

JVM performance tuning has been mentioned before in "Why does this Java code not utilize all CPU cores?". Note that this only applies to the JVM, so your application must already be using threads (and more or less "correctly" at that):
http://ch.sun.com/sunnews/events/2009/apr/adworkshop/pdf/5-1-Java-Performance.pdf

With Java 8, you can use the following API from Executors:
public static ExecutorService newWorkStealingPool()
Creates a work-stealing thread pool using all available processors as its target parallelism level.
Thanks to the work-stealing mechanism, idle threads steal tasks from the task queues of busy threads, which can increase overall throughput.
From grepcode, the implementation of newWorkStealingPool is as follows:
/**
 * Creates a work-stealing thread pool using all
 * {@link Runtime#availableProcessors available processors}
 * as its target parallelism level.
 * @return the newly created thread pool
 * @see #newWorkStealingPool(int)
 * @since 1.8
 */
public static ExecutorService newWorkStealingPool() {
    return new ForkJoinPool
        (Runtime.getRuntime().availableProcessors(),
         ForkJoinPool.defaultForkJoinWorkerThreadFactory,
         null, true);
}
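A small usage sketch, assuming tasks of very uneven cost (the class WorkStealingExample and the prime-counting workload are hypothetical and only there to give the pool something to balance):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class; the uneven workloads are illustrative only.
class WorkStealingExample {

    // Counts the primes below each limit; the tasks differ widely in cost.
    static List<Integer> countPrimes(int... limits)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newWorkStealingPool();
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int limit : limits) {
                tasks.add(() -> primesBelow(limit));
            }
            List<Integer> counts = new ArrayList<>();
            // invokeAll returns the futures in the same order as the task list.
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                counts.add(f.get());
            }
            return counts;
        } finally {
            pool.shutdown();
        }
    }

    // Naive trial division, deliberately CPU-bound.
    private static int primesBelow(int limit) {
        int count = 0;
        for (int n = 2; n < limit; n++) {
            boolean prime = true;
            for (int d = 2; (long) d * d <= n; d++) {
                if (n % d == 0) {
                    prime = false;
                    break;
                }
            }
            if (prime) {
                count++;
            }
        }
        return count;
    }
}
```

For example, WorkStealingExample.countPrimes(10, 100, 1000) returns the list [4, 25, 168].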

