Is there a difference between concurrency and parallelism in java?

Is there a difference between concurrency and parallelism in java? - java

I have been doing some research in Google and cant quite get my head around the differences (if any) between concurrent and parallel programs in java. Some of the information I have looked at suggests no differences between both. Is this the case??

It depends on who is defining it. The people who created the Go programming language call code Concurrent if it is broken up into pieces which could be treated in parallel, whereas Parallelism implies that those pieces are actually running at the same time.
Since these are programming principles, the programming language has no bearing on how they are defined. However, Java 8 will have more features to enable both concurrency and parallelism without messing up your code too much. For example, code like this:
List<Integer> coolItemIds = new List<Integer>();
for(Item item : getItems())
{
if(item.isCool())
{
int itemId = item.getId();
coolItemIds.add(item);
}
}
... which is non-concurrent and non-parallel, could be written like this (my syntax is probably wrong, but hopefully you get the idea):
Iterable<Item> items = getItems();
Iterable<Item> coolItems = items.filter(item -> item.isCool());
Iterable<Integer> coolItemIds = coolItems.map(item -> item.getId());
The above code is written in a concurrent manner: none of the given code requires that the coolItems be filtered one at a time, or that you can only call getId() on one item at a time, or even that the items at the beginning of the list need to be filtered or mapped before items at the end. Depending on what type of Iterable is returned from getItems(), the given operations may or may not run in parallel, but the code you've written is concurrent.
Also of interest:
Concurrency is not Parallelism (presentation video)
Concurrency is not Parallelism? (Discussion on StackOverflow

I suppose it depends on your definitions, but my understanding goes roughly like this:
Concurrency refers to things happening in some unspecified order. Multitasking - executing multiple programs by interleaving instructions via time slicing - is an good way to think about this sense of concurrency.
Parallelism (or "true" parallelism) refers to things happening at literally the same time. This requires hardware support (coprocessors, multi-core processors, networked machines, etc.). All parallelism is concurrent, but not all concurrency is parallel.
As far as I'm aware, neither term is Java-specific, or has any Java-specific nuances.

Parallelization (or Parallelism or Parallel computing) is a form of computation in which many calculations are carried out simultaneously. In essence, if a CPU intensive problem can be divided in smaller, independent tasks, then those tasks can be assigned to different processors
Concurrency is more about multitasking which is executing many actions but not necessary CPU intensive problem.

I don't think the two terms have well-defined distinct meanings. They're both terms of art rather than technical terms.
That said, the way i interpret them is that something is concurrent if it can be done at the same time as other things, and parallel if it can be done by multiple threads at the same time. I take this usage largely from the JVM garbage collection documentation, which says things like
The concurrent mark sweep collector, also known as the concurrent collector or CMS, is targeted at applications that are sensitive to garbage collection pauses. It performs most garbage collection activity concurrently, i.e., while the application threads are running
and
CMS collector now uses multiple threads to perform the concurrent marking task in parallel on platforms with multiple processors.
Admittedly, this is a very specific context, and it is probably unwise to generalise from it.

If you program using threads (concurrent programming), it's not necessarily going to be executed as such (parallel execution), since it depends on whether the machine can handle several threads.
Here's a visual example. Threads on a non-threaded machine:
-- -- --
/ \
>---- -- -- -- -- ---->>
Threads on a threaded machine:
------
/ \
>-------------->>
The dashes represent executed code. As you can see, they both split up and execute separately, but the threaded machine can execute several separate pieces at once.
Please refer this What is the difference between concurrent programming and parallel programming?

From oracle documentation page:
In a multithreaded process on a single processor, the processor can switch execution resources between threads, resulting in concurrent execution.
In the same multithreaded process in a shared-memory multiprocessor environment, each thread in the process can run on a separate processor at the same time, resulting in parallel execution.
When the process has fewer or as many threads as there are processors, the threads support system in conjunction with the operating environment ensure that each thread runs on a different processor.
Java SE 7 further enhanced parallel processing by adding ForkJoinPool API.
Refer to below posts for more details:
Parallel programming with threads in Java ( Java specific )
Concurrency vs Parallelism - What is the difference? ( Language agnostic)

Concurrency is an architectural design pattern, which allows you to run multiple operations at once (which can but dont have to be executed in parallel).
In case of a single core execution of such operations, parallelizm can be "simulated" by for example context switching ( assuming your programming language uses threads for parallel execution).
Lets say you have two threads, in one you enqueue jobs. Second one awaits untill any job exists and picks it up for execution. Despite using a single core processor, both of them are running and communicating (via queue).
This is a concurrent execution - even through threads are executed sequentially on a single core (share it).
Parallel version of the same exercise would look similar with with one difference:
Execution of threads would happen on a multi-core processor. Threads would be running in parallel to each other and not sequentially (each on his own core).

Question is quite old, but I'd like to summarize these two in a very clear and concise manner:
Concurrency - think of multitasking by one actor:is when x number of processes/threads (x>1) compete for the same resource. In case of concurrency, when two processes/threads are executed on one CPU, they're not really parallel, meaning, that CPU clock will be switching back and forth from process to process in a super fast way, which will depict the illusion of parallelism, but again - it's 1 CPU shared among different processes/threads. Imagine 5 instructions are to be executed, and they compete to get the resource of CPU, in order to get executed;
Parallelism - think of a multiple tasks where each task is taken care by a separate actor:
is when x number of processes/threads (x>1) execute in parallel, at the same time. Imagine if there are 5 processes/threads, and there are 5 CPU cores, which means that each core can independently execute each thread/process.

Related

Number of java thread > number of cores and garbage collection

We are using java 7 and working on multithreaded data crunching application. Due to certain constraint we are not using spark or any other map-reduce way to solve this problem. The idea of this project is maximize the performance of application using multi-threading.
My understanding is that at any given point, considering the CPU is not running any other thing apart from OS, number of the thread working simultaneously will be equal to number of hyper threading that CPU provides. But there is java GC which will kick-in every now and then. We have to consider that as well.
Also, I am aware that if I create more threads then I will actually degrade the performance because of the time spent in context switching.
The question is what would be the best way to consider all these things and create appropriate number of threads. Any idea or thought process? Is there any other process that I should consider?

The question is what would be the best way to consider all these things and create appropriate number of threads
I would use Java 8 which does this for you. e.g.
Results result = listOfWork.parallelStream()
.map(t -> t.doWork())
.collect(Collectors.reduce(.....));
However if you are stuck on Java 7, you can use an ExecutorService.
int procs = Runtime.getRuntime().availableProcessors();
ExecutorService es = Executors.newFixedThreadPool(procs);
But there is java GC which will kick-in every now and then
Unless you are using CMS, it doesn't kick in at the same time, so it doesn't matter what these threads are doing (in terms of tuning your thread pool)
Is there any other process that I should consider?
If you have other processes on the machines which use the CPU a lot you should consider them.

I actually did research on this last semester. When using threads, a good rule of thumb for increased performance for CPU bound processes is to use an equal number of threads as cores, except in the case of a hyper-threaded system in which case one should use twice as many cores. The other rule of thumb that can be concluded is for I/O bound processes. This rule is to quadruple the number threads per core, except for the case of a hyper-threaded system than one can quadruple the number of threads per core.

Java Fork/Join vs Multithreading in the Multicore World

Couldn't just get why Fork Join is better in terms of multicore utilisation.
An example to illustrate this (Just a theoretical one):
I have an array of webservice endpoints: [E1, E2, E3, E4]
Let's assume each endpoint returns a number.
I have to then sum up the total and return the result.
With this simple story in mind.
I have 2 options:
ExecutorService fixedThreadPool of 4 and span these 4 calls in parallel.
Fork Join framework with 4 tasks.
Assume I have 4 cores.
With Executor service, 4 JVM threads are created and AFAIK, it's completely upto the OS to schedule them on the same core or multiple cores.
If they're scheduled on the same core, then we have the problem of under-utilised cores.
If they're scheduled on different cores, then we're laughing!
All I'm trying to get at is this bit of uncertainty around using multiple cores.
How does Fork Join get around this? Does it internally pass some kind of magical instructions to the OS to use multiple cores?
If my above example is not relevant to draw a comparison between Fork Join vs Executors, how does Fork Join claim that it utilises cores much efficiently than traditional multithreading.

Fork/Join is faster than simple ExecutorSevice only if the task can be recursively subdivided into smaller tasks. In this case Fork/Join can use work stealing in order to ensure that all CPUs are optimally used.
In your case it is not faster.
EDIT: in addition to work stealing, the scheduling of the tasks can be also faster. As the Wikipedia puts it:
The lightweight threads used in fork–join programming will typically have their own scheduler (typically a work stealing one) that maps them onto the underlying thread pool. This scheduler can be much simpler than a fully featured, preemptive operating system scheduler: general-purpose thread schedulers must deal with blocking for locks, but in the fork–join paradigm, threads only block at the join point.

Fork/join is a higher level of abstraction. Usually, if you use higher-level libraries to solve your problem, then you will solve your problem with fewer lines of new code. That's always a Good Thing.
Higher-level usually means more specialized though. Fork/join may or may not fit your problem. If it doesn't fit, then you could waste a lot of effort and make an awful mess by trying to make it fit.
If it does fit though, and you don't use it, then chances are, you will be re-inventing some wheel. That's never a good thing.

What is a Thread? How many threads does my program have? [duplicate]

I have been trying to find a good definition, and get an understanding, of what a thread really is.
It seems that I must be missing something obvious, but every time I read about what a thread is, it's almost a circular definition, a la "a thread is a thread of execution" or " a way to divide into running tasks". Uh uh. Huh?
It seems from what I have read that a thread is not really something concrete, like a process is. It is in fact just a concept. From what I understand of the way this works, a processor executes some commands for a program (which has been termed a thread of execution), then when it needs to switch to processing for some other program for a bit, it stores the state of the program it's currently executing for somewhere (Thread Local Storage) and then starts executing the other program's instructions. And back and forth. Such that, a thread is really just a concept for "one of the paths of execution" of a program that is currently running.
Unlike a process, which really is something - it is a conglomeration of resources, etc.
As an example of a definition that didn't really help me much . . .
From Wikipedia:
"A thread in computer science is short for a thread of execution. Threads are a way for a program to divide (termed "split") itself into two or more simultaneously (or pseudo-simultaneously) running tasks. Threads and processes differ from one operating system to another but, in general, a thread is contained inside a process and different threads in the same process share same resources while different processes in the same multitasking operating system do not."
So am I right? Wrong? What is a thread really?
Edit: Apparently a thread is also given its own call stack, so that is somewhat of a concrete thing.

A thread is an execution context, which is all the information a CPU needs to execute a stream of instructions.
Suppose you're reading a book, and you want to take a break right now, but you want to be able to come back and resume reading from the exact point where you stopped. One way to achieve that is by jotting down the page number, line number, and word number. So your execution context for reading a book is these 3 numbers.
If you have a roommate, and she's using the same technique, she can take the book while you're not using it, and resume reading from where she stopped. Then you can take it back, and resume it from where you were.
Threads work in the same way. A CPU is giving you the illusion that it's doing multiple computations at the same time. It does that by spending a bit of time on each computation. It can do that because it has an execution context for each computation. Just like you can share a book with your friend, many tasks can share a CPU.
On a more technical level, an execution context (therefore a thread) consists of the values of the CPU's registers.
Last: threads are different from processes. A thread is a context of execution, while a process is a bunch of resources associated with a computation. A process can have one or many threads.
Clarification: the resources associated with a process include memory pages (all the threads in a process have the same view of the memory), file descriptors (e.g., open sockets), and security credentials (e.g., the ID of the user who started the process).

A thread is an independent set of values for the processor registers (for a single core). Since this includes the Instruction Pointer (aka Program Counter), it controls what executes in what order. It also includes the Stack Pointer, which had better point to a unique area of memory for each thread or else they will interfere with each other.
Threads are the software unit affected by control flow (function call, loop, goto), because those instructions operate on the Instruction Pointer, and that belongs to a particular thread. Threads are often scheduled according to some prioritization scheme (although it's possible to design a system with one thread per processor core, in which case every thread is always running and no scheduling is needed).
In fact the value of the Instruction Pointer and the instruction stored at that location is sufficient to determine a new value for the Instruction Pointer. For most instructions, this simply advances the IP by the size of the instruction, but control flow instructions change the IP in other, predictable ways. The sequence of values the IP takes on forms a path of execution weaving through the program code, giving rise to the name "thread".

In order to define a thread formally, we must first understand the boundaries of where a thread operates.
A computer program becomes a process when it is loaded from some store into the computer's memory and begins execution. A process can be executed by a processor or a set of processors. A process description in memory contains vital information such as the program counter which keeps track of the current position in the program (i.e. which instruction is currently being executed), registers, variable stores, file handles, signals, and so forth.
A thread is a sequence of such instructions within a program that can be executed independently of other code. The figure shows the concept:
Threads are within the same process address space, thus, much of the information present in the memory description of the process can be shared across threads.
Some information cannot be replicated, such as the stack (stack pointer to a different memory area per thread), registers and thread-specific data. This information suffices to allow threads to be scheduled independently of the program's main thread and possibly one or more other threads within the program.
Explicit operating system support is required to run multithreaded programs. Fortunately, most modern operating systems support threads such as Linux (via NPTL), BSD variants, Mac OS X, Windows, Solaris, AIX, HP-UX, etc. Operating systems may use different mechanisms to implement multithreading support.
Here, you can find more information about the topic. That was also my information-source.
Let me just add a sentence coming from Introduction to Embedded System by Edward Lee and Seshia:
Threads are imperative programs that run concurrently and share a memory space. They can access each others’ variables. Many practitioners in the field use the term “threads” more narrowly to refer to particular ways of constructing programs that share memory, [others] to broadly refer to any mechanism where imperative programs run concurrently and share memory. In this broad sense, threads exist in the form of interrupts on almost all microprocessors, even without any operating system at all (bare iron).

Processes are like two people using two different computers, who use the network to share data when necessary. Threads are like two people using the same computer, who don't have to share data explicitly but must carefully take turns.
Conceptually, threads are just multiple worker bees buzzing around in the same address space. Each thread has its own stack, its own program counter, etc., but all threads in a process share the same memory. Imagine two programs running at the same time, but they both can access the same objects.
Contrast this with processes. Processes each have their own address space, meaning a pointer in one process cannot be used to refer to an object in another (unless you use shared memory).
I guess the key things to understand are:
Both processes and threads can "run at the same time".
Processes do not share memory (by default), but threads share all of their memory with other threads in the same process.
Each thread in a process has its own stack and its own instruction pointer.

I am going to use a lot of text from the book Operating Systems Concepts by ABRAHAM SILBERSCHATZ, PETER BAER GALVIN and GREG GAGNE along with my own understanding of things.
Process
Any application resides in the computer in the form of text (or code).
We emphasize that a program by itself is not a process. A program is a
passive entity, such as a file containing a list of instructions stored on disk
(often called an executable file).
When we start an application, we create an instance of execution. This instance of execution is called a process.
EDIT:(As per my interpretation, analogous to a class and an instance of a class, the instance of a class being a process. )
An example of processes is that of Google Chrome.
When we start Google Chrome, 3 processes are spawned:
• The browser process is responsible for managing the user interface as
well as disk and network I/O. A new browser process is created when
Chrome is started. Only one browser process is created.
• Renderer processes contain logic for rendering web pages. Thus, they
contain the logic for handling HTML, Javascript, images, and so forth.
As a general rule, a new renderer process is created for each website
opened in a new tab, and so several renderer processes may be active
at the same time.
• A plug-in process is created for each type of plug-in (such as Flash
or QuickTime) in use. Plug-in processes contain the code for the
plug-in as well as additional code that enables the plug-in to
communicate with associated renderer processes and the browser
process.
Thread
To answer this I think you should first know what a processor is. A Processor is the piece of hardware that actually performs the computations.
EDIT: (Computations like adding two numbers, sorting an array, basically executing the code that has been written)
Now moving on to the definition of a thread.
A thread is a basic unit of CPU utilization; it comprises a thread ID, a program
counter, a register set, and a stack.
EDIT: Definition of a thread from intel's website:
A Thread, or thread of execution, is a software term for the basic ordered sequence of instructions that can be passed through or processed by a single CPU core.
So, if the Renderer process from the Chrome application sorts an array of numbers, the sorting will take place on a thread/thread of execution. (The grammar regarding threads seems confusing to me)
My Interpretation of Things
A process is an execution instance. Threads are the actual workers that perform the computations via CPU access. When there are multiple threads running for a process, the process provides common memory.
EDIT:
Other Information that I found useful to give more context
All modern day computer have more than one threads. The number of threads in a computer depends on the number of cores in a computer.
Concurrent Computing:
From Wikipedia:
Concurrent computing is a form of computing in which several computations are executed during overlapping time periods—concurrently—instead of sequentially (one completing before the next starts). This is a property of a system—this may be an individual program, a computer, or a network—and there is a separate execution point or "thread of control" for each computation ("process").
So, I could write a program which calculates the sum of 4 numbers:
(1 + 3) + (4 + 5)
In the program to compute this sum (which will be one process running on a thread of execution) I can fork another process which can run on a different thread to compute (4 + 5) and return the result to the original process, while the original process calculates the sum of (1 + 3).

This was taken from a Yahoo Answer:
A thread is a coding construct
unaffect by the architecture of an
application. A single process
frequently may contain multiple
threads. Threads can also directly
communicate with each other since they
share the same variables.
Processes are independent execution
units with their own state
information. They also use their own
address spaces and can only interact
with other processes through
interprocess communication mechanisms.
However, to put in simpler terms threads are like different "tasks". So think of when you are doing something, for instance you are writing down a formula on one paper. That can be considered one thread. Then another thread is you writing something else on another piece of paper. That is where multitasking comes in.
Intel processors are said to have "hyper-threading" (AMD has it too) and it is meant to be able to perform multiple "threads" or multitask much better.
I am not sure about the logistics of how a thread is handled. I do recall hearing about the processor going back and forth between them, but I am not 100% sure about this and hopefully somebody else can answer that.

A thread is nothing more than a memory context (or how Tanenbaum better puts it, resource grouping) with execution rules. It's a software construct. The CPU has no idea what a thread is (some exceptions here, some processors have hardware threads), it just executes instructions.
The kernel introduces the thread and process concept to manage the memory and instructions order in a meaningful way.

A thread is a set of (CPU)instructions which can be executed.
But in order to have a better understanding of what a thread is, some computer architecture knowledge is required.
What a computer does, is to follow instructions and manipulate data.
RAM is the place where the instructions and data are saved, the processor uses those instructions to perform operations on the saved data.
The CPU has some internal memory cells called, registers. It can perform simple mathematical operations with numbers stored in these registers. It can also move data between the RAM and these registers. These are examples of typical operations a CPU can be instructed to execute:
Copy data from memory position #220 into register #3
Add the number in register #3 to the number in register #1.
The collection of all operations a CPU can do is called instruction set. Each operation in the instruction set is assigned a number. Computer code is essentially a sequence of numbers representing CPU operations. These operations are stored as numbers in the RAM. We store input/output data, partial calculations, and computer code, all mixed together in the RAM.
The CPU works in a never-ending loop, always fetching and executing an instruction from memory. At the core of this cycle is the PC register, or Program Counter. It's a special register that stores the memory address of the next instruction to be executed.
The CPU will:
Fetch the instruction at the memory address given by the PC,
Increment the PC by 1,
Execute the instruction,
Go back to step 1.
The CPU can be instructed to write a new value to the PC, causing the execution to branch, or "jump" to somewhere else in the memory. And this branching can be conditional. For instance, a CPU instruction could say: "set PC to address #200 if register #1 equals zero". This allows computers to execute stuff like this:
if x = 0
compute_this()
else
compute_that()
Resources used from Computer Science Distilled.

The answer varies hugely across different systems and different implementations, but the most important parts are:
A thread has an independent thread of execution (i.e. you can context-switch away from it, and then back, and it will resume running where it was).
A thread has a lifetime (it can be created by another thread, and another thread can wait for it to finish).
It probably has less baggage attached than a "process".
Beyond that: threads could be implemented within a single process by a language runtime, threads could be coroutines, threads could be implemented within a single process by a threading library, or threads could be a kernel construct.
In several modern Unix systems, including Linux which I'm most familiar with, everything is threads -- a process is merely a type of thread that shares relatively few things with its parent (i.e. it gets its own memory mappings, its own file table and permissions, etc.) Reading man 2 clone, especially the list of flags, is really instructive here.

Unfortunately, threads do exist. A thread is something tangible. You can kill one, and the others will still be running. You can spawn new threads.... although each thread is not its own process, they are running separately inside the process. On multi-core machines, 2 threads could run at the same time.
http://en.wikipedia.org/wiki/Simultaneous_multithreading
http://www.intel.com/intelpress/samples/mcp_samplech01.pdf

Just as a process represents a virtual computer, the thread
abstraction represents a virtual processor.
So threads are an abstraction.
Abstractions reduce complexity. Thus, the first question is what problem threads solve. The second question is how they can be implemented.
As to the first question: Threads make implementing multitasking easier. The main idea behind this is that multitasking is unnecessary if every task can be assigned to a unique worker. Actually, for the time being, it's fine to generalize the definition even further and say that the thread abstraction represents a virtual worker.
Now, imagine you have a robot that you want to give multiple tasks. Unfortunately, it can only execute a single, step by step task description. Well, if you want to make it multitask, you can try creating one big task description by interleaving the separate tasks you already have. This is a good start but the issue is that the robot sits at a desk and puts items on it while working. In order to get things right, you cannot just interleave instructions but also have to save and restore the items on the table.
This works, but now it's hard to disentangle the separate tasks by simply looking at the big task description that you created. Also, the ceremony of saving and restoring the items on the tabe is tedious and further clutters the task description.
Here is where the thread abstraction comes in and saves the day. It lets you assume that you have an infinite number of robots, each sitting in a different room at its own desk. Now, you can just throw task descriptions in a pot and everything else is taken care of by the thread abstraction's implementer. Remember? If there are enough workers, nobody has to multitask.
Often it is useful to indicate your perspective and say robot to mean real robots and virtual robot to mean the robots the thread abstraction provides you with.
At this point the problem of multitasking is solved for the case when the tasks are fully independent. However, wouldn't it be nice to let the robots go out of their rooms, interact and work together towards a common goal? Well, as you probably guessed, this requires coordination. Traffic lights, queues - you name it.
As an intermediate summary, the thread abstraction solves the problem of multitasking and creates an opportunity for cooperation. Without it, we only had a single robot, so cooperation was unthinkable. However, it has also brought the problem of coordination (synchronization) on us. Now we know what problem the tread abstraction solves and, as a bonus, we also know what new challenge it creates.
But wait, why do we care about multitasking in the first place?
First, multitasking can increase performance if the tasks involve waiting. For example, while the washing machine is running, you can easily start preparing dinner. And while your dinner is in the over, you can hang out the clothes. Note that here you wait because an independent component does the job for you. Tasks that involve waiting are called I/O bound tasks.
Second, if multitasking is done rapidly, and you look at it from a bird's eyes view, it appears as parallelism. It's a bit like how the human eye perceives a series of still images as motion if shown in quick succession. If I write a letter to Alice for one second and to Bob for one second as well, can you tell if I wrote the two letters simultaneously or alternately, if you only look at what I'm doing every two seconds? Search for Multitasking Operating System for more on this.
Now, let's focus on the question of how the thread abstraction can be implemented.
Essentially, implementing the thread abstraction is about writing a task, a main task, that takes care of scheduling all the other tasks.
A fundamental question to ask is: If the scheduler schedules all tasks and the scheduler is also a task, then who schedules the scheduler?
Let's brake this down. Say you write a scheduler, compile it and load it into the main memory of a computer at the address 1024, which happens to be the address that is loaded into the processor's instruction pointer when the computer is started. Now, your scheduler goes ahead and finds some tasks sitting precompiled in the main memory. For example, a task starts at the address 1,048,576. The scheduler wants to execute this task so it loads the task's address (1,048,576) into the instruction pointer. Huh, that was quite an ill considered move because now the scheduler has no way to regain control from the task it has just started.
One solution is to insert jump instructions to the scheduler (address 1024) into the task descriptions before execution. Actually, you shouldn't forget to save the items on the desk the robot is working at, so you also have to save the processor's registers before jumping. The issue here is that it is hard to tell where to insert the jump instructions. If there are too many, they create too much overhead and if there are too few of them, one task might monopolize the processor.
A second approach is to ask the task authors to designate a few places where control can be transferred back to the scheduler. Note that the authors don't have to write the logic for saving the registers and inserting the jump instruction because it suffices that they mark the appropriate places and the scheduler takes care of the rest. This looks like a good idea because task authors probably know that, for example, their task will wait for a while after loading and starting a washing machine, so they let the scheduler take control there.
The problem that neither of the above approaches solve is that of an erroneous or malicious task that, for example, gets caught up in an infinite loop and never jumps to the address where the scheduler lives.
Now, what to do if you cannot solve something in software? Solve it in hardware! What is needed is a programmable circuitry wired up to the processor that acts like an alarm clock. The scheduler sets a timer and its address (1024) and when the timer runs out, the alarm saves the registers and sets the instruction pointer to the address where the scheduler lives. This approach is called preemptive scheduling.
Probably by now you start to sense that implementing the thread abstraction is not like implementing a linked list. The most well-known implementers of the thread abstraction are operating systems. The threads they provide are sometimes called kernel-level threads. Since an operating system cannot afford losing control, all major, general-purpose operating systems uses preemptive scheduling.
Arguably, operating systems feel like the right place to implement the thread abstraction because they control all the hardware components and can suspend and resume threads very wisely. If a thread requests the contents of a file stored on a hard drive from the operating system, it immediately knows that this operation will most likely take a while and can let another task occupy the processor in the meanwhile. Then, it can pause the current task and resume the one that made the request, once the file's contents are available.
However, the story doesn't end here because threads can also be implemented in user space. These implementers are normally compilers. Interestingly, as far as I know, kernel-level threads are as powerful as threads can get. So why do we bother with user-level threads? The reason, of course, is performance. User-level threads are more lightweight so you can create more of them and normally the overhead of pausing and resuming them is small.
User-level threads can be implemented using async/await. Do you remember that one option to achieve that control gets back to the scheduler is to make task authors designate places where the transition can happen? Well, the async and await keywords serve exactly this purpose.
Now, if you've made it this far, be prepared because here comes the real fun!
Have you noticed that we barely talked about parallelism? I mean, don't we use threads to run related computations in parallel and thereby increase throughput? Well, not quiet.. Actually, if you only want parallelism, you don't need this machinery at all. You just create as many tasks as the number of processing units you have and none of the tasks has to be paused or resumed ever. You don't even need a scheduler because you don't multitask.
In a sense, parallelism is an implementation detail. If you think about it, implementers of the thread abstraction can utilize as many processors as they wish under the hood. You can just compile some well-written multithreaded code from 1950, run it on a multicore today and see that it utilizes all cores. Importantly, the programmer who wrote that code probably didn't anticipate that piece of code being run on a multicore.
You could even argue that threads are abused when they are used to achieve parallelism: Even though people know they don't need the core feature, multitasking, they use threads to get access to parallelism.
As a final thought, note that user-level threads alone cannot provide parallelism. Remember the quote from the beginning? Operating systems run programs inside a virtual computer (process) that is normally equipped with a single virtual processor (thread) by default. No matter what magic you do in user space, if your virtual computer has only a single virtual processor, you cannot run code in parallel.
So what do we want? Of course, we want parallelism. But we also want lightweight threads. Therefore, many implementers of the thread abstraction started to use a hybrid approach: They start as many kernel-level threads as there are processing units in the hardware and run many user-level threads on top of a few kernel-level threads. Essentially, parallelism is taken care of by the kernel-level and multitasking by the user-level threads.
Now, an interesting design decision is what threading interface a language exposes. Go, for example, provides a single interface that allows users to create hybrid threads, so called goroutines. There is no way to ask for, say, just a single kernel-level thread in Go. Other languages have separate interfaces for different kinds of threads. In Rust, kernel-level threads live in the standard library, while user-level and hybrid threads can be found in external libraries like async-std and tokio. In Python, the asyncio package provides user-level threads while multithreading and multiprocessing provide kernel-level threads. Interestingly, the threads multithreading provides cannot run in parallel. On the other hand, the threads multiprocessing provides can run in parallel but, as the library's name suggests, each kernel-level thread lives in a different process (virtual machine). This makes multiprocessing unsuitable for certain tasks because transferring data between different virtual machines is often slow.
Further resources:
Operating Systems: Principles and Practice by Thomas and Anderson
Concurrency is not parallelism by Rob Pike
Parallelism and concurrency need different tools
Asynchronous Programming in Rust
Inside Rust's Async Transform
Rust's Journey to Async/Await
What Color is Your Function?
Why goroutines instead of threads?
Why doesn't my program run faster with more CPUs?
John Reese - Thinking Outside the GIL with AsyncIO and Multiprocessing - PyCon 2018

Java threads are not actually executed in parallel?

Until now I was under the impression that 2 threads that start in the same time are also executed in parallel (both running their piece of codes in the same time), but I read some documentation recently and I understood that they actually take turns on the execution of their code, so there is no piece of code for first thread executed in the same time as a piece of code from the second thread.
Is my understanding correct?
If yes, then how multi-threading is faster then one thread execution?
I'm asking this because the only difference is that a single thread executes the code sequential, while multithreading can take turns on the execution, but still should take the same amount of time since it's nothing done in parallel

a) on multi-processor machines, threads can actually run in parallel (one per CPU)
b) If your thread calls Thread.sleep() while waiting for IO etc., it makes resources available to other threads. So multi-threaded applications are actually faster than single-threaded ones when dealing with external resources

Java threads are executed in parallel if there are enough CPUs available for a JVM. You can't run 2 computations on a machine with a single computing element at the same time, so this computing element is used either by first, or by second computation at any given time. Probably what you've read concerned this circumstances.

No, Java threads are executed in parallel (unlike some other platforms like CPython). However, whether that gives performance improvements depends on the code you execute.
If you test with easily parallelizable & CPU intensive tasks like calculating PI with a parallelizable algorithm or resizing lots of images etc., you can easily demonstrate that performance can be increased basically linearly (if you have 2 CPUs = x2, 4 CPUs = x4 etc.)
EDIT:
When you only have one CPU, multi-threading is still beneficial. For example, you can have one thread reading images from the disk while the other thread resizes the images. This will also improve the performance because you can utilize the CPU without waste.
EDIT2:
When you read and resize images (note the plural) in a single thread, then you will see that CPU usage won't be 100% at all times. This is because while the thread is reading from file, it can't perform the resizing. If you had more than one thread, by the time a resize has finished another file would have been ready in-memory. If you are dealing with big images, it's relatively easy to peg the CPU at 100% with this design.

Well the answer of you question depends on the number of CPU a system has .
Keep in mind that a single CPU can process only one thread at a time but the context switching between the threads is so fast that it seems that the threads are running concurrently.
On your second question If yes, then how multi-threading is faster then one thread execution?
Mutlithreading utilizes the CPU cycles . Say if one thread is blocked on some resource , other threads might get a chance to run .
On a side note , go through this blog page if you want to see some basic multithreading tutorials http://javasolutionsonline.blogspot.in/p/java-concurrency.html

umm..threads do run in parallel...but not in your conventional pcs that had single cores..
if you have a multi core chip or many CPUs , then they can run in parallel..
imagine one thread running on every of the quad-cores...
thread give u many other advantages as well , as you must already know

Is concurrent programming more grided or clustered?

I'm trying to wrap my brain around parallel/concurrent programming (in Java) and am getting hung up on some fundamentals that don't seem to be covered in any of the tutorials I've been reading.
When we talk about "multi-threading", or "parallel/concurrent programming", does that mean we're taking a big problem and spreading it over many threads, or are we first explicitly decomposing it into smaller sub-problems, and passing each sub-problem to its own thread?
For example, let's say we have EndWorldHungerTask implements Runnable, and task accomplishes some enormous problem. In order to complete its objective, it has to do some really heavy lifting, say, a hundred million times:
public class EndWorldHungerTask implements Runnable {
public void run() {
for(int i = 0; i < 100000000; i++)
someReallyExpensiveOperation();
}
}
In order to make this "concurrent" or "multi-threaded", would we pass this EndWorldHungerTask to, say, 100 worker threads (where each of the 100 workers are told by the JVM when to be active and work on the next iteration/someReallyExpensiveOperation() call), or would we refactor it manually/explicitly so that each of the 100 workers is iterating over different parts of the loop/work-to-be-done? In both cases, each of the 100 workers is only iterating a million times.
But, under the first paradigm, Java is telling each Thread when to execute. Under the second, the developer needs to manually (in the code) partition the problem ahead of time, and assign each sub-problem to a new Thread.
I guess I'm asking how its "normally done" in Java land. And, not just for this problem, but in general.

I guess I'm asking how its "normally done" in Java land. And, not just for this problem, but in general.
This is highly dependent on the task at hand.
The standard paradigm in Java is that you have to split the work into chunks yourself. Distributing those chunks across multiple threads/cores is a separate problem, and there exist a variety of patterns for that (queues, thread pools, etc).
It is interesting to note that there exist frameworks that can automatically make use of multiple cores to execute things like for loops in parallel (for example, OpenMP). However, I am not aware of any such frameworks for Java.
Finally, it could be the case that the low-level library that does the bulk of the work can make use of multiple cores. In such a case the higher-level code may be able to remain single-threaded and still benefit from multicore hardware. One example might be numerical code using MKL under the covers.

When we talk about "multi-threading", or "parallel/concurrent programming", does that mean we're taking a big problem and spreading it over many threads, or are we first explicitly decomposing it into smaller sub-problems, and passing each sub-problem to its own thread?
I think this depends highly on the problem. There are times where you have the same task that you call 1000s or millions of times using the same code. This is the ExecutorSerivce.submit() type of pattern. You has million of lines from a file and you are running some processing methods on each line. I guess this is your "spreading it over many threads" type of problem. This works for simple thread models.
But there are other cases where the problem space is made up of a large number of non-homogenous tasks. Sometimes you might spawn a single thread to handle some background keep-alive, and other times a thread pool here and there to process some queue of work. Typically the larger the scope of the problem, the more complicated the concurrency model and the more different types of pools and threads are used. I guess this is your "decomposing it into smaller sub-problems" type.
In order to make this "concurrent" or "multi-threaded", would we pass this EndWorldHungerTask to, say, 100 worker threads (where each of the 100 workers are told by the JVM when to be active and work on the next iteration/someReallyExpensiveOperation() call), or would we refactor it manually/explicitly so that each of the 100 workers is iterating over different parts of the loop/work-to-be-done? In both cases, each of the 100 workers is only iterating a million times.
In your case, I don't see how you can solve world hunger (to use your analogy) with one set of thread code. I think that you have to "decompose it into smaller sub-problems" which corresponds to the latter case that I explain above: a whole series of threads running different code. Some of the sub-solutions can be done in thread-pools and some will be done with individual threads, each running separate code.
I guess I'm asking how its "normally done" in Java land. And, not just for this problem, but in general.
"Normally" depends highly on the problem and its complexity. In my experience, I normally use the ExecutorService constructs as much as possible. But with any decent sized problem you will find yourself with a number of different thread-pools, Spring timer threads, custom one-off thread tasks, producer/consumer models, etc., etc..

Normally you would want each thread to execute one task form start to finish, you would gain nothing from leaving the task half done, then halting execution on that thread and "calling" another thread to finish the job. Java offers of course tools for this kind of thread synchronization, but they are really used when a task is depending on another task to complete - not so that another thread may complete the task.
Most of the time you will have a big problem, that consists of several tasks, if this tasks can be executed concurrently then it would make sense to spawn threads to execute this tasks. There is an overhead associated with creating threads, so if all the tasks are sequential and must wait for the other to finish, then it would not be beneficial at all to spawn multiple threads, just one thread so you don't block the main thread.

"multi-threading" <> "parallel/concurrent programming".
Multithreaded apps are often written to take advantage of the high I/O performance of a preemptive multitasker. An example might be a web crawler/downloader. A multithreaded crawler would typically outperform a single-threaded version by a huge factor, even when running on a box with only one CPU core. The actions of a DNS query to get a site address, connecting to the site, downloading a page, writing it to a disk file are all operations that require little CPU but a lot of IO waiting. So, a lot of these unavoidable waits can be performed in parallel by many threads. When a DNS query comes in, an HTTP client connects or a disk operation is complete, the thread that requested it is made ready/running and can move on to the next operation.
The vast majority of apps are, primarily, written as multithreaded for this reason. That's why the box I'm writing this on has 98 processes, (of which 94 have more than one thread), 1360 threads and 3% CPU use - it's got little to do with splitting CPU work up across cores - it's mostly about IO performance.
Parallel/concurrent programming can actually take place with multiple CPU cores. For those apps that have CPU-intensive work that can be decomposed into largish packages for distribution across cores, a speedup factor approaching the number of cores is possible with care.
Naturally there is some bleedover - the I/O bound web-crawler will tend to perform better on a box with more cores, if only because the interrupt/driver overhead has a smaller impact on overall performance, but it wont be better by much.
It doesn't matter how many workers you have available for the EndWorldHunger Task if they are all waiting for the crops to grow.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.