Minimum Time given to a thread

My question is regarding threads in general (e.g., in Java).
The problem: when a thread is in the Runnable state (i.e., it is executing) and it is given an instruction (say, by invoking the method addOneToX(int x)), is it possible for the thread to quit or stop its work before finishing the instruction but after it has started executing it? In other words, most instructions in high-level languages are translated into machine-specific instructions, each of which takes a number of machine cycles (clock cycles) in the CPU. So, accordingly:
1> What is the minimum time given to a thread to be in the Runnable state?
2> How does the thread save its state so that it comes to it later? (i.e: when it quits Runnable state and comes back to it later to continue from where it stopped)

There is no guaranteed minimum time.
The scheduler decides what the time slice will be. Usually you could expect anything from fractions of a millisecond to about 100ms. But often this value will be dynamic. Also, a thread might even encounter an extreme like running only one instruction which happens to be I/O, upon which the thread blocks and is taken off the CPU.
The high-level language instructions are eventually translated into (possibly) multiple CPU instructions. A single CPU instruction is the atomic unit that will be executed without interruption; apart from that, the program may be interrupted at any place between two instructions, even in the middle of a high-level language statement. Note that there are some specific CPU instructions (like atomic get-and-set or get-and-increase) that can be used for thread synchronization.
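To make the point about interruption in the middle of a high-level statement concrete, here is a minimal sketch. The class name IncrementDemo and the method run are made up for this illustration. It assumes that plain++ is carried out as separate load, add and store steps, so updates from different threads can be lost, while AtomicInteger.incrementAndGet() performs the increment as one atomic operation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: compares a plain counter with an atomic one.
class IncrementDemo {
    static int plain = 0;                                   // unsynchronized counter
    static final AtomicInteger atomic = new AtomicInteger(); // atomic counter

    // Starts `threads` workers; each increments both counters `perThread` times.
    // Returns { final plain value, final atomic value }.
    static int[] run(int threads, int perThread) {
        plain = 0;
        atomic.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    plain++;                  // load, add, store: another thread can run in between
                    atomic.incrementAndGet(); // one atomic read-modify-write
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();                     // wait for the worker; also makes its writes visible
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { plain, atomic.get() };
    }

    public static void main(String[] args) {
        int[] r = run(4, 100_000);
        System.out.println("plain  = " + r[0] + " (can be lower than 400000)");
        System.out.println("atomic = " + r[1]);
    }
}
```

The plain counter can never exceed the number of increments, but it may end up lower; how often that happens depends on the machine and the JVM.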
The basic (much simplified) idea of storing the thread's state is: store the registers in RAM and store the pointer to current instruction.

What is the minimum time given to a thread to be in the Runnable state?
Most practical Java implementations use native threads: That is, they allow the operating system to take care of the details of scheduling threads. Most modern operating systems offer a choice of various thread-scheduling algorithms, and most algorithms offer a number of configurable parameters. There is no single answer to your question.
Almost certainly, it will be less than one second. Quite likely less than 100 milliseconds. Other than that, it's hard to say.
How does the thread save its state so that it comes to it later?
The state of a thread (in most programming languages, Java included) consists of its call stack and its CPU registers. The CPU registers include one that points to the top of the call stack, one that points to the current instruction, and usually others.
When it's time to switch threads, the OS interrupts the processor (an interrupt basically forces an immediate function call) and the interrupt handler routine saves all of the CPU registers to a memory location that is reserved for the current thread. Then it restores the registers for some other thread and basically "returns" to wherever that other thread was interrupted.
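The save/restore cycle described above can be imitated in plain Java. This is only a toy simulation, not how a JVM or an OS implements it: the class ContextSwitchDemo, the two-field Context and the one-instruction "program" are all invented for the illustration. A single simulated CPU runs each simulated thread for a short time slice, saves its registers to memory, and restores the registers of the next thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy simulation of a context switch; all names here are hypothetical.
class ContextSwitchDemo {
    // Saved register state of one simulated thread, kept in memory while it is not running.
    static final class Context {
        int programCounter; // index of the next instruction to execute
        int accumulator;    // stands in for the general-purpose registers
    }

    // The simulated CPU has exactly one set of registers.
    static int cpuProgramCounter;
    static int cpuAccumulator;

    // Every simulated thread runs the same program: "add 1 to the accumulator",
    // repeated programLength times. Returns the final accumulator of each thread.
    static int[] run(int threadCount, int programLength, int timeSlice) {
        Context[] all = new Context[threadCount];
        Deque<Context> runQueue = new ArrayDeque<>();
        for (int i = 0; i < threadCount; i++) {
            all[i] = new Context();
            runQueue.add(all[i]);
        }
        while (!runQueue.isEmpty()) {
            Context current = runQueue.poll();
            // Restore: load the saved registers into the CPU.
            cpuProgramCounter = current.programCounter;
            cpuAccumulator = current.accumulator;
            // Run until the time slice is used up or the program ends.
            for (int n = 0; n < timeSlice && cpuProgramCounter < programLength; n++) {
                cpuAccumulator += 1;
                cpuProgramCounter++;
            }
            // Save: copy the CPU registers back into the thread's memory area.
            current.programCounter = cpuProgramCounter;
            current.accumulator = cpuAccumulator;
            if (current.programCounter < programLength) {
                runQueue.add(current); // not finished yet, so it waits for another slice
            }
        }
        int[] results = new int[threadCount];
        for (int i = 0; i < threadCount; i++) {
            results[i] = all[i].accumulator;
        }
        return results;
    }

    public static void main(String[] args) {
        for (int value : run(3, 10, 3)) {
            System.out.println(value);
        }
    }
}
```

Each simulated thread ends with the same result it would have produced if it had never been interrupted, which is the whole purpose of saving and restoring the registers.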

Related

Java memory model: single thread and multi-core CPU

In a Java application, if accesses to an object's state happen on the same thread (in simplest case, in a single-threaded application), there is no need to synchronize to enforce visibility/consistency of changes, as per happens-before relationship specification:
"Two actions can be ordered by a happens-before relationship. If one action happens-before another, then the first is visible to and ordered before the second.
If we have two actions x and y, we write hb(x, y) to indicate that x happens-before y.
If x and y are actions of the same thread and x comes before y in program order, then hb(x, y)."
But modern architectures are multi-core, so a Java thread can potentially execute on any of them at any given time (unless this is not true and Java threads are pinned to specific cores?). So if that is the case, and a thread writes to variable x, caches that in the L1 cache or a CPU register, and then starts running on another core that previously accessed x and also cached it in a register, that value is inconsistent... Is there some mechanism (implicit memory barrier) when thread is taken off a CPU?

can potentially execute on any of them at any given time
Tasks don't just spontaneously migrate between cores. These things have to happen:
the task is pre-empted on the core it was previously running on (marking it as waiting to run in the kernel's global task list)
the kernel's task scheduler on another core sees that task waiting for a CPU and decides to run it.
(Scheduling is a distributed algorithm; each core is effectively running the kernel on that core very much like a multi-threaded process. One core can't tell another core what to do, only put data in memory where the kernel running on that core can look at it.)
This isn't a problem because:
Data caches (L1 and so on) are coherent across all cores that a thread could be scheduled on by an OS. (See "Myths Programmers Believe about CPU Caches".)
Or on hypothetical and unlikely hardware + OS + JVM that runs threads across cores with non-coherent shared memory, the OS would have to flush dirty private cache back to actual shared memory at some point after stopping the task on one core, before putting it in a global task queue where the task scheduler on another core could run it.
Is there some mechanism (implicit memory barrier) when thread is taken off a CPU?
On a real-world system (coherent caches), the OS just has to make sure there's a full memory barrier that drains the store buffer on one core before another core can resume the task.
That barrier is not always implicit as part of something the OS was going to do anyway; the OS kernel might need to explicitly include a barrier just for this. However, saving the register state and marking the task as runnable probably needed at least release stores, so another core that could restore this task's state would also see all user-space stores that task had done.
Still, I've heard of the possibility of breaking a single-threaded process by migrating between CPUs without sufficient barriers. It is something to think about for an OS. It's not at all Java specific; it's about how to not break a single thread running any arbitrary machine code.
Only the registers are truly private, and yes compilers will keep variables in registers. I don't like the term "cached" for that; in asm registers are separate from memory. Compilers can keep the only currently-valid copy of a variable in a register for the duration of a loop, and store it back afterwards.
Every task has its own register state; this is called the "architectural state" and is the context that's saved/restored by a context switch.
Restarting execution of a thread on another core means restoring its saved register state from memory, ending by restoring its program-counter. i.e. jumping to the instruction it stopped at, restoring the program counter into the architectural program counter register. e.g. RIP on x86-64. (64-bit version of the "Instruction Pointer" register)
Note that registers and (virtual) memory contents are the entire state of a user-space process (and open file descriptors and other kernel stuff associated with it). But cache state is not. Registers are not cache. Cache is transparent to software (memory reordering happens because of the store buffer and CPU memory parallelism to hide cache misses, not because of cache, on most ISAs). Registers are the asm equivalent of local variables.
Compiler terminology: "caching"
Hoisting a load out of a loop and keeping the value in a register is sometimes described as "caching" a value in a register, but that's just casual terminology that has you confused here. The compiler-developer terminology for that is "Enregistration" of variables.
Or just "hoisting" a load or "sinking" a store out of a loop; normally you need to load a value into a register before you can use it for other things (at least on a RISC that doesn't have memory-source ALU instructions). By hoisting a load of a loop-invariant value you only have to load once, ahead of the loop, and re-read the register multiple times.
Same for stores; if you know that no other threads are allowed to look at the memory location for a variable, only the final value needs to actually get stored with a store instruction. Any other stores of other values would be "dead" if nothing can read them and we know there's a later store. So you keep the variable in a register for the duration of the loop, then store once at the end. This is called "sinking" a store, and is related to dead store elimination.
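Hoisting a load and sinking a store can be written out by hand at the source level. The sketch below is only an illustration with made-up names (HoistSinkDemo, limit, total); whether a particular JIT compiler actually performs this transformation on the naive version depends on the JVM and the surrounding code.

```java
// Hypothetical demo: the second method is a hand-written version of what an
// optimizer may do to the first one when no other thread is allowed to observe the fields.
class HoistSinkDemo {
    static int limit = 5; // loop-invariant value kept in memory
    static int total = 0; // value updated by the loop

    // Naive form: conceptually loads `limit` and stores `total` on every iteration.
    static int naive(int iterations) {
        total = 0;
        for (int i = 0; i < iterations; i++) {
            total = total + limit;
        }
        return total;
    }

    // Hoisted load and sunk store: one load before the loop, one store after it.
    static int hoistedAndSunk(int iterations) {
        int localLimit = limit; // hoisted load: read the field once
        int localTotal = 0;     // a local variable, which can live in a register
        for (int i = 0; i < iterations; i++) {
            localTotal += localLimit;
        }
        total = localTotal;     // sunk store: only the final value is written back
        return total;
    }

    public static void main(String[] args) {
        System.out.println(naive(1000));
        System.out.println(hoistedAndSunk(1000));
    }
}
```

Both methods return the same value in a single thread; the difference would only be observable by another thread reading total while the loop runs.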

Peter Cordes answered you from the implementation level, but it's worth mentioning the specification level:
If x and y are actions of the same thread and x comes before y in program order, then hb(x, y)."
That pretty much says it all right there. The Java language specification guarantees that if x comes before y at the source code level of your program, then x "happens before" y for purposes of determining memory visibility.
All of the stuff that Peter Cordes said is what guarantees hb(x, y). If any JVM ever failed to do all of that stuff, then it would not be a valid implementation of Java.
Long story shortened: If your code only ever runs in a single thread, then you'll never have to worry about memory visibility.

Java. Difference between Thread.sleep() and ScheduledExecutorService methods [duplicate]

Goal: Execute certain code every once in a while.
Question: In terms of performance, is there a significant difference between:
while (true) {
    execute();
    Thread.sleep(10 * 1000);
}
and
executor.scheduleWithFixedDelay(runnableWithoutSleep, 0, 10, TimeUnit.SECONDS);
?
Of course, the latter option is more kosher. Yet, I would like to know whether I should embark on an adventure called "Spend a few days refactoring legacy code to say goodbye to Thread.sleep()".
Update:
This code runs in super/mega/hyper high-load environment.

You're dealing with sleep times measured in tens of seconds. The possible savings from changing your sleep option here are likely nanoseconds or microseconds.
I'd prefer the latter style every time, but if you have the former and it's going to cost you a lot to change it, "improving performance" isn't a particularly good justification.
EDIT re: 8000 threads
8000 threads is an awful lot; I might move to the scheduled executor just so that you can control the amount of load put on your system. Your point about varying wakeup times is something to be aware of, although I would argue that the bigger risk is a stampede of threads all sleeping and then waking in close succession and competing for all the system resources.
I would spend the time to throw these all in a fixed thread pool scheduled executor. Only have as many running concurrently as you have available of the most limited resource (for example, # cores, or # IO paths) plus a few to pick up any slop. This will give you good throughput at the expense of latency.
With the Thread.sleep() method it will be very hard to control what is going on, and you will likely lose out on both throughput and latency.
If you need more detailed advice, you'll probably have to describe what you're trying to do in more detail.

Since you haven't mentioned the Java version, things might differ.
As I recall from the Java source code, the main difference is in how the two are implemented internally.
For Sun Java 1.6, if you use the second approach, the native code also brings in wait and notify calls to the system. So it is, in a way, more thread-efficient and CPU-friendly.
But then again, you lose control and it becomes more unpredictable for your code; consider that you want to sleep for exactly 10 seconds.
So, if you want more predictability, you can go with option 1.
Also, on a side note: when you encounter things like this in legacy systems, there is an 80% chance that there are now better ways of doing it, but the magic numbers are there for a reason (the remaining 20%), so change it at your own risk :)

There are different scenarios:
The Timer creates a queue of tasks that is continually updated. When the Timer is done, it may not be garbage collected immediately. So creating more Timers only adds more objects onto the heap. Thread.sleep() only pauses the thread, so memory overhead would be extremely low.
Timer/TimerTask also takes into account the execution time of your task, so it will be a bit more accurate. And it deals better with multithreading issues (such as avoiding deadlocks etc.).
If your thread throws an exception and gets killed, that is a problem. But TimerTask will take care of it. It will run irrespective of a failure in the previous run.
The advantage of TimerTask is that it expresses your intention much better (i.e. code readability), and it already has the cancel() feature implemented.
Reference is taken from here

You said you are running in a "mega... high-load environment" so if I understand you correctly you have many such threads simultaneously sleeping like your code example. It takes less CPU time to reuse a thread than to kill and create a new one, and the refactoring may allow you to reuse threads.
You can create a thread pool by using a ScheduledThreadPoolExecutor with a corePoolSize greater than 1. Then when you call scheduleWithFixedDelay on that thread pool, if a thread is available it will be reused.
This change may reduce CPU utilization as threads are being reused rather than destroyed and created, but the degree of reduction will depend on the tasks they're doing, the number of threads in the pool, etc. Memory usage will also go down if some of the tasks overlap, since there will be fewer threads sitting idle at once.
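A minimal sketch of that refactoring, under some assumptions: the class FixedDelayDemo and the method runAll are invented for the example, the counter increment stands in for the real execute() call, and the delay is given in milliseconds so the demo finishes quickly (the question uses 10 seconds). Several periodic jobs share a small pool created with Executors.newScheduledThreadPool instead of each owning a sleeping thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: many periodic jobs on a small, shared scheduled pool.
class FixedDelayDemo {
    // Schedules `tasks` periodic jobs on `poolSize` threads, waits until the jobs
    // have run `totalRunsWanted` times in total, then shuts the pool down.
    // Returns the number of runs that were counted.
    static int runAll(int poolSize, int tasks, int totalRunsWanted, long delayMillis) {
        ScheduledExecutorService executor = Executors.newScheduledThreadPool(poolSize);
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(totalRunsWanted);
        for (int i = 0; i < tasks; i++) {
            executor.scheduleWithFixedDelay(() -> {
                runs.incrementAndGet(); // placeholder for the real execute() work
                done.countDown();
            }, 0, delayMillis, TimeUnit.MILLISECONDS);
        }
        try {
            done.await(10, TimeUnit.SECONDS); // safety timeout for the demo
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow();           // cancels the periodic jobs
        }
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("runs = " + runAll(2, 5, 10, 10));
    }
}
```

Here five jobs are served by two threads; the pool size is the knob that limits how many jobs can run at the same time.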

Waiting Threads Resource Consumption

My Problem:
Do large numbers of threads in the JVM consume a lot of resources (memory, CPU) when the threads are in the TIMED_WAITING state (not sleeping) >99.9% of the time? When the threads are waiting, how much CPU overhead does it cost to maintain them if any are needed at all?
Does the answer also apply to non-JVM related environments (like linux kernels)?
Context:
My program receives a large number of space-consuming packages. It stores counts of similar attributes within the different packages. After a given period of time after receiving a package (could be hours or days), that specific package expires and any count the package contributed to should be decremented.
Currently, I achieve these functionalities by storing all the packages in memory or disk. Every 5 minutes, I delete the expired packages from storage, and scan through the remaining packages to count the attributes. This method uses up a lot of memory, and has bad time complexity (O(n) for time and memory where n is the number of unexpired packages). This makes scalability of the program terrible.
One alternative way to approach this problem is to increment the attribute count every time a package comes by and start a Timer() thread that decrements the attribute count after the package expires. This eliminates the need to store all the bulky packages and cuts the time complexity to O(1). However, this creates another problem, as my program will start having O(n) threads, which could cut into performance. Since most of the threads will be in the TIMED_WAITING state (Java’s Timer() invokes the Object.wait(long) method) the vast majority of their lifecycle, does it still impact the CPU in a very large way?

First, a Java (or .NET) thread != a kernel/OS thread.
A Java Thread is a high-level wrapper that abstracts some of the functionality of a system thread; these kinds of threads are also known as managed threads. At the kernel level a thread only has 2 states, running and not running. There's some management information (stack, instruction pointers, thread id, etc.) that the kernel keeps track of, but there is no such thing at the kernel level as a thread that is in a TIMED_WAITING state (the .NET equivalent is the WaitSleepJoin state). Those "states" only exist within those kinds of contexts (part of why the C++ std::thread does not have a state member).
Having said that, when a managed thread is being blocked, it's being done so in a couple of ways (depending on how it is being requested to be blocked at the managed level); the implementations I've seen in the OpenJDK for the threading code utilize semaphores to handle the managed waits (which is what I've seen in other C++ frameworks that have a sort of "managed" thread class as well as in the .NET Core libraries), and utilize a mutex for other types of waits/locks.
Since most implementations will utilize some sort of locking mechanism (like a semaphore or mutex), the kernel generally does the same thing (at least where your question is concerned); that is, the kernel will take the thread off of the "run" queue and put it in the "wait" queue (a context switch). Getting into thread scheduling and specifically how the kernel handles the execution of the threads is beyond the scope of this Q&A, especially since your question is in regards to Java and Java can be run on quite a few different types of OS (each of which handles threading completely differently).
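The managed-level state can be observed from Java itself with Thread.getState(). The sketch below uses invented names (TimedWaitingDemo, observe): it starts a thread that calls Object.wait(long), polls until the JVM reports TIMED_WAITING (with a safety deadline), and then interrupts the thread to end the wait. It assumes no spurious wake-up happens during the short polling window.

```java
// Hypothetical demo: observe the TIMED_WAITING state that the JVM reports.
class TimedWaitingDemo {
    static Thread.State observe() {
        final Object lock = new Object();
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                try {
                    lock.wait(10_000); // timed wait: releases the lock and blocks
                } catch (InterruptedException e) {
                    // woken early by the observer; nothing else to do
                }
            }
        });
        waiter.start();

        // Poll the managed state until the thread is blocked in wait(long).
        Thread.State seen = waiter.getState();
        long deadline = System.nanoTime() + 5_000_000_000L; // 5 second safety limit
        while (seen != Thread.State.TIMED_WAITING && System.nanoTime() < deadline) {
            Thread.yield();
            seen = waiter.getState();
        }

        waiter.interrupt(); // end the wait so the demo finishes quickly
        try {
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(observe());
    }
}
```

While the thread is in that state it is not scheduled on a CPU; only the polling loop in the observing thread consumes CPU time here.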
Answering your questions more directly:
Do large numbers of threads in the JVM consume a lot of resources (memory, CPU) when the threads are in the TIMED_WAITING state (not sleeping) >99.9% of the time?
To this, there are a couple of things to note: the thread created consumes memory for the JVM (stack, ID, garbage collector, etc.) and the kernel consumes kernel memory to manage the thread at the kernel level. That memory that is consumed does not change unless you specifically say so. So if the thread is sleeping or running, the memory is the same.
The CPU is what will change based on the thread activity and the number of threads requested (remember, a thread also consumes kernel resources, thus has to be managed at a kernel level, so the more threads that have to be handled, the more kernel time must be consumed to manage them).
Keep in mind that the kernel times to schedule and run the threads are extremely minuscule (that's part of the point of the design), but it's still something to consider if you plan on running a lot of threads; additionally, if you know your application will be running on a CPU (or cluster) with only a few cores, the fewer cores you have available to you, the more the kernel has to context switch, adding additional time in general.
When the threads are waiting, how much CPU overhead does it cost to maintain them if any are needed at all?
None. See above, but the CPU overhead used to manage the threads does not change based on the thread context. Extra CPU might be used for context switching and most certainly extra CPU will be utilized by the threads themselves when active, but there's no additional "cost" to the CPU to maintain a waiting thread vs. a running thread.
Does the answer also apply to non-JVM related environments (like linux kernels)?
Yes and no. As stated, the managed contexts generally apply to most of those types of environments (e.g. Java, .NET, PHP, Lua, etc.), but those contexts can vary, and the threading idioms and general functionality are dependent on the kernel being utilized. So while one specific kernel might be able to handle 1000+ threads per process, some might have hard limits, and others might have other issues with higher thread counts per process; you'll have to reference the OS/CPU specs to see what kind of limits you might have.
Since most of the threads will be in the TIMED_WAITING state (Java’s Timer() invokes the Object.wait(long) method) the vast majority of their lifecycle, does it still impact the CPU in a very large way?
No (part of the point of a blocked thread), but something to consider: what if (edge case) all (or >50%) of those threads need to run at the exact same time? If you only have a few threads managing your packages, that might not be an issue, but say you have 500+; 250 threads all being woken at the same time would cause massive CPU contention.
Since you haven't posted any code, it's hard to make specific suggestions to your scenario, but one would be inclined to store a structure of attributes as a class and keep that class in a list or hash map that can be referenced in a Timer (or a separate thread) to see if the current time matches the expiration time of the package, then the "expire" code would run. This cuts down the number of threads to 1 and the access time to O(1); but again, without code, that suggestion might not work in your scenario.
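One way to sketch that suggestion, with the caveat that every name here (ExpiringCounts, add, expire, count) is hypothetical and the real package format is unknown: keep only a small expiry record per package in a priority queue ordered by expiry time, and let a single Timer or scheduled thread call expire periodically. Time is passed in as a parameter so the logic can be exercised without waiting. Note that with a priority queue, adding a record costs O(log n) rather than O(1), and memory is still proportional to the number of unexpired packages, just with much smaller entries.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch: attribute counts that expire, driven by one caller instead of one thread per package.
class ExpiringCounts {
    // Small record kept instead of the bulky package itself.
    private static final class Expiry {
        final long atMillis;
        final String attribute;

        Expiry(long atMillis, String attribute) {
            this.atMillis = atMillis;
            this.attribute = attribute;
        }
    }

    private final Map<String, Integer> counts = new HashMap<>();
    private final PriorityQueue<Expiry> expiries =
            new PriorityQueue<>((a, b) -> Long.compare(a.atMillis, b.atMillis));

    // Called when a package arrives: count its attribute and remember when to undo that.
    public synchronized void add(String attribute, long nowMillis, long ttlMillis) {
        counts.merge(attribute, 1, Integer::sum);
        expiries.add(new Expiry(nowMillis + ttlMillis, attribute));
    }

    // Called periodically by a single timer thread: undo every contribution that is due.
    public synchronized void expire(long nowMillis) {
        while (!expiries.isEmpty() && expiries.peek().atMillis <= nowMillis) {
            String attribute = expiries.poll().attribute;
            // Decrement, and drop the entry entirely once it reaches zero.
            counts.computeIfPresent(attribute, (key, value) -> value > 1 ? value - 1 : null);
        }
    }

    public synchronized int count(String attribute) {
        return counts.getOrDefault(attribute, 0);
    }

    public static void main(String[] args) {
        ExpiringCounts c = new ExpiringCounts();
        c.add("red", 0, 100);
        c.add("blue", 0, 10);
        c.expire(20);
        System.out.println(c.count("red") + " " + c.count("blue"));
    }
}
```

In a real program the periodic call could come from a single-threaded scheduled executor, passing System.currentTimeMillis() as the current time.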
Hope that helps.

What is a Thread? How many threads does my program have? [duplicate]

I have been trying to find a good definition, and get an understanding, of what a thread really is.
It seems that I must be missing something obvious, but every time I read about what a thread is, it's almost a circular definition, a la "a thread is a thread of execution" or " a way to divide into running tasks". Uh uh. Huh?
It seems from what I have read that a thread is not really something concrete, like a process is. It is in fact just a concept. From what I understand of the way this works, a processor executes some commands for a program (which has been termed a thread of execution), then when it needs to switch to processing for some other program for a bit, it stores the state of the program it's currently executing somewhere (Thread Local Storage) and then starts executing the other program's instructions. And back and forth. Such that a thread is really just a concept for "one of the paths of execution" of a program that is currently running.
Unlike a process, which really is something - it is a conglomeration of resources, etc.
As an example of a definition that didn't really help me much . . .
From Wikipedia:
"A thread in computer science is short for a thread of execution. Threads are a way for a program to divide (termed "split") itself into two or more simultaneously (or pseudo-simultaneously) running tasks. Threads and processes differ from one operating system to another but, in general, a thread is contained inside a process and different threads in the same process share same resources while different processes in the same multitasking operating system do not."
So am I right? Wrong? What is a thread really?
Edit: Apparently a thread is also given its own call stack, so that is somewhat of a concrete thing.

A thread is an execution context, which is all the information a CPU needs to execute a stream of instructions.
Suppose you're reading a book, and you want to take a break right now, but you want to be able to come back and resume reading from the exact point where you stopped. One way to achieve that is by jotting down the page number, line number, and word number. So your execution context for reading a book is these 3 numbers.
If you have a roommate, and she's using the same technique, she can take the book while you're not using it, and resume reading from where she stopped. Then you can take it back, and resume it from where you were.
Threads work in the same way. A CPU is giving you the illusion that it's doing multiple computations at the same time. It does that by spending a bit of time on each computation. It can do that because it has an execution context for each computation. Just like you can share a book with your friend, many tasks can share a CPU.
On a more technical level, an execution context (therefore a thread) consists of the values of the CPU's registers.
Last: threads are different from processes. A thread is a context of execution, while a process is a bunch of resources associated with a computation. A process can have one or many threads.
Clarification: the resources associated with a process include memory pages (all the threads in a process have the same view of the memory), file descriptors (e.g., open sockets), and security credentials (e.g., the ID of the user who started the process).

A thread is an independent set of values for the processor registers (for a single core). Since this includes the Instruction Pointer (aka Program Counter), it controls what executes in what order. It also includes the Stack Pointer, which had better point to a unique area of memory for each thread or else they will interfere with each other.
Threads are the software unit affected by control flow (function call, loop, goto), because those instructions operate on the Instruction Pointer, and that belongs to a particular thread. Threads are often scheduled according to some prioritization scheme (although it's possible to design a system with one thread per processor core, in which case every thread is always running and no scheduling is needed).
In fact the value of the Instruction Pointer and the instruction stored at that location is sufficient to determine a new value for the Instruction Pointer. For most instructions, this simply advances the IP by the size of the instruction, but control flow instructions change the IP in other, predictable ways. The sequence of values the IP takes on forms a path of execution weaving through the program code, giving rise to the name "thread".

In order to define a thread formally, we must first understand the boundaries of where a thread operates.
A computer program becomes a process when it is loaded from some store into the computer's memory and begins execution. A process can be executed by a processor or a set of processors. A process description in memory contains vital information such as the program counter which keeps track of the current position in the program (i.e. which instruction is currently being executed), registers, variable stores, file handles, signals, and so forth.
A thread is a sequence of such instructions within a program that can be executed independently of other code. The figure shows the concept:
Threads are within the same process address space, thus, much of the information present in the memory description of the process can be shared across threads.
Some information cannot be replicated, such as the stack (stack pointer to a different memory area per thread), registers and thread-specific data. This information suffices to allow threads to be scheduled independently of the program's main thread and possibly one or more other threads within the program.
Explicit operating system support is required to run multithreaded programs. Fortunately, most modern operating systems, such as Linux (via NPTL), BSD variants, Mac OS X, Windows, Solaris, AIX, and HP-UX, support threads. Operating systems may use different mechanisms to implement multithreading support.
Here, you can find more information about the topic. That was also my information-source.
Let me just add a passage from Introduction to Embedded Systems by Edward Lee and Seshia:
Threads are imperative programs that run concurrently and share a memory space. They can access each others’ variables. Many practitioners in the field use the term “threads” more narrowly to refer to particular ways of constructing programs that share memory, [others] to broadly refer to any mechanism where imperative programs run concurrently and share memory. In this broad sense, threads exist in the form of interrupts on almost all microprocessors, even without any operating system at all (bare iron).

Processes are like two people using two different computers, who use the network to share data when necessary. Threads are like two people using the same computer, who don't have to share data explicitly but must carefully take turns.
Conceptually, threads are just multiple worker bees buzzing around in the same address space. Each thread has its own stack, its own program counter, etc., but all threads in a process share the same memory. Imagine two programs running at the same time, but they both can access the same objects.
Contrast this with processes. Processes each have their own address space, meaning a pointer in one process cannot be used to refer to an object in another (unless you use shared memory).
I guess the key things to understand are:
Both processes and threads can "run at the same time".
Processes do not share memory (by default), but threads share all of their memory with other threads in the same process.
Each thread in a process has its own stack and its own instruction pointer.
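The key points above can be seen in a few lines of Java. This is a minimal sketch with invented names (SharedMemoryDemo, shared): the array lives on the heap and is visible to both threads, while each thread's variable named local lives on that thread's own stack and is invisible to the other.

```java
// Hypothetical demo: two threads share a heap object but have separate stacks.
class SharedMemoryDemo {
    static final int[] shared = new int[2]; // one heap object, visible to every thread

    static int[] run() {
        Thread a = new Thread(() -> {
            int local = 10;        // lives on thread a's stack
            shared[0] = local + 1; // written to shared memory
        });
        Thread b = new Thread(() -> {
            int local = 20;        // a different variable on thread b's stack
            shared[1] = local + 1;
        });
        a.start();
        b.start();
        try {
            a.join(); // joining also makes the threads' writes visible here
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.clone();
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println(r[0] + " " + r[1]); // prints "11 21"
    }
}
```

The two threads write to different array slots, so no further synchronization is needed in this particular sketch.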

I am going to use a lot of text from the book Operating System Concepts by Abraham Silberschatz, Peter Baer Galvin and Greg Gagne, along with my own understanding of things.
Process
Any application resides in the computer in the form of text (or code).
We emphasize that a program by itself is not a process. A program is a
passive entity, such as a file containing a list of instructions stored on disk
(often called an executable file).
When we start an application, we create an instance of execution. This instance of execution is called a process.
EDIT:(As per my interpretation, analogous to a class and an instance of a class, the instance of a class being a process. )
An example of processes is that of Google Chrome.
When we start Google Chrome, three different types of processes are spawned:
• The browser process is responsible for managing the user interface as
well as disk and network I/O. A new browser process is created when
Chrome is started. Only one browser process is created.
• Renderer processes contain logic for rendering web pages. Thus, they
contain the logic for handling HTML, Javascript, images, and so forth.
As a general rule, a new renderer process is created for each website
opened in a new tab, and so several renderer processes may be active
at the same time.
• A plug-in process is created for each type of plug-in (such as Flash
or QuickTime) in use. Plug-in processes contain the code for the
plug-in as well as additional code that enables the plug-in to
communicate with associated renderer processes and the browser
process.
Thread
To answer this I think you should first know what a processor is. A Processor is the piece of hardware that actually performs the computations.
EDIT: (Computations like adding two numbers, sorting an array, basically executing the code that has been written)
Now moving on to the definition of a thread.
A thread is a basic unit of CPU utilization; it comprises a thread ID, a program
counter, a register set, and a stack.
EDIT: Definition of a thread from Intel's website:
A Thread, or thread of execution, is a software term for the basic ordered sequence of instructions that can be passed through or processed by a single CPU core.
So, if the Renderer process from the Chrome application sorts an array of numbers, the sorting will take place on a thread/thread of execution. (The grammar regarding threads seems confusing to me)
My Interpretation of Things
A process is an execution instance. Threads are the actual workers that perform the computations via CPU access. When there are multiple threads running for a process, the process provides common memory.
EDIT:
Other Information that I found useful to give more context
All modern computers can run more than one thread. The number of threads that a computer can execute truly simultaneously depends on the number of cores it has.
Concurrent Computing:
From Wikipedia:
Concurrent computing is a form of computing in which several computations are executed during overlapping time periods—concurrently—instead of sequentially (one completing before the next starts). This is a property of a system—this may be an individual program, a computer, or a network—and there is a separate execution point or "thread of control" for each computation ("process").
So, I could write a program which calculates the sum of 4 numbers:
(1 + 3) + (4 + 5)
In the program to compute this sum (which will be one process running on a thread of execution) I can fork another process which can run on a different thread to compute (4 + 5) and return the result to the original process, while the original process calculates the sum of (1 + 3).
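In Java this example would normally use a second thread inside the same process rather than a forked process. A minimal sketch, with the invented names ParallelSumDemo and sum: a worker thread computes (4 + 5) while the original thread computes (1 + 3), and the two partial results are combined after the worker has finished.

```java
// Hypothetical demo: split (1 + 3) + (4 + 5) across two threads.
class ParallelSumDemo {
    static int sum() {
        int[] right = new int[1];                          // holder for the worker's result
        Thread worker = new Thread(() -> right[0] = 4 + 5); // second thread of execution
        worker.start();

        int left = 1 + 3;                                  // computed by the original thread

        try {
            worker.join();                                 // wait for the worker before reading its result
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return left + right[0];
    }

    public static void main(String[] args) {
        System.out.println(sum()); // prints 13
    }
}
```

For a sum this small the cost of starting a thread far outweighs the work; the point is only to show two paths of execution inside one process.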

This was taken from a Yahoo Answer:
A thread is a coding construct unaffected by the architecture of an application. A single process frequently may contain multiple threads. Threads can also directly communicate with each other since they share the same variables.
Processes are independent execution units with their own state information. They also use their own address spaces and can only interact with other processes through interprocess communication mechanisms.
However, to put in simpler terms threads are like different "tasks". So think of when you are doing something, for instance you are writing down a formula on one paper. That can be considered one thread. Then another thread is you writing something else on another piece of paper. That is where multitasking comes in.
Intel processors are said to have "hyper-threading" (AMD has it too) and it is meant to be able to perform multiple "threads" or multitask much better.
I am not sure about the logistics of how a thread is handled. I do recall hearing about the processor going back and forth between them, but I am not 100% sure about this and hopefully somebody else can answer that.

A thread is nothing more than a memory context (or, as Tanenbaum better puts it, a resource grouping) with execution rules. It's a software construct. The CPU has no idea what a thread is (with some exceptions: some processors have hardware threads); it just executes instructions.
The kernel introduces the thread and process concepts to manage memory and instruction order in a meaningful way.

A thread is a set of CPU instructions which can be executed.
But in order to have a better understanding of what a thread is, some computer architecture knowledge is required.
What a computer does is follow instructions and manipulate data.
RAM is the place where the instructions and data are saved; the processor uses those instructions to perform operations on the saved data.
The CPU has some internal memory cells called registers. It can perform simple mathematical operations with numbers stored in these registers. It can also move data between the RAM and these registers. These are examples of typical operations a CPU can be instructed to execute:
Copy data from memory position #220 into register #3
Add the number in register #3 to the number in register #1.
The collection of all operations a CPU can do is called instruction set. Each operation in the instruction set is assigned a number. Computer code is essentially a sequence of numbers representing CPU operations. These operations are stored as numbers in the RAM. We store input/output data, partial calculations, and computer code, all mixed together in the RAM.
The CPU works in a never-ending loop, always fetching and executing an instruction from memory. At the core of this cycle is the PC register, or Program Counter. It's a special register that stores the memory address of the next instruction to be executed.
The CPU will:
1. Fetch the instruction at the memory address given by the PC,
2. Increment the PC by 1,
3. Execute the instruction,
4. Go back to step 1.
The CPU can be instructed to write a new value to the PC, causing the execution to branch, or "jump" to somewhere else in the memory. And this branching can be conditional. For instance, a CPU instruction could say: "set PC to address #200 if register #1 equals zero". This allows computers to execute stuff like this:
if x = 0
    compute_this()
else
    compute_that()
Resources used from Computer Science Distilled.
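The fetch, increment, execute loop and the conditional jump described above can be sketched as a toy interpreter in Java. This is only an illustration: the opcodes (LOAD, ADD, JUMP_IF_ZERO, HALT), the three-number instruction layout and the class name ToyCpu are invented here and do not correspond to any real instruction set.

```java
public class ToyCpu {
    // Hypothetical opcodes for illustration; real instruction sets differ.
    static final int LOAD = 0, ADD = 1, JUMP_IF_ZERO = 2, HALT = 3;

    // Each instruction is {opcode, a, b}. Returns the registers once HALT is reached.
    static int[] run(int[][] program, int[] memory) {
        int[] reg = new int[4];
        int pc = 0;                                    // program counter
        while (true) {
            int[] ins = program[pc];                   // 1. fetch the instruction at PC
            pc++;                                      // 2. increment PC
            switch (ins[0]) {                          // 3. execute
                case LOAD:                             // reg[a] = memory[b]
                    reg[ins[1]] = memory[ins[2]];
                    break;
                case ADD:                              // reg[a] += reg[b]
                    reg[ins[1]] += reg[ins[2]];
                    break;
                case JUMP_IF_ZERO:                     // set PC to b if reg[a] is zero
                    if (reg[ins[1]] == 0) {
                        pc = ins[2];
                    }
                    break;
                case HALT:
                    return reg;
                default:
                    throw new IllegalArgumentException("unknown opcode " + ins[0]);
            }
        }                                              // 4. go back to step 1
    }

    public static void main(String[] args) {
        int[] memory = {0, 5, 7};                      // memory[0] plays the role of x
        int[][] program = {
            {LOAD, 1, 0},                              // reg1 = x
            {JUMP_IF_ZERO, 1, 4},                      // if x == 0, jump to instruction 4
            {LOAD, 2, 2},                              // "else" branch: reg2 = 7
            {HALT, 0, 0},
            {LOAD, 2, 1},                              // "then" branch: reg2 = 5
            {HALT, 0, 0},
        };
        System.out.println(run(program, memory)[2]);   // prints 5
    }
}
```

In this picture, suspending a thread amounts to saving pc and reg somewhere and restoring them later.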

The answer varies hugely across different systems and different implementations, but the most important parts are:
A thread has an independent thread of execution (i.e. you can context-switch away from it, and then back, and it will resume running where it was).
A thread has a lifetime (it can be created by another thread, and another thread can wait for it to finish).
It probably has less baggage attached than a "process".
Beyond that: threads could be implemented within a single process by a language runtime, threads could be coroutines, threads could be implemented within a single process by a threading library, or threads could be a kernel construct.
In several modern Unix systems, including Linux which I'm most familiar with, everything is threads -- a process is merely a type of thread that shares relatively few things with its parent (i.e. it gets its own memory mappings, its own file table and permissions, etc.) Reading man 2 clone, especially the list of flags, is really instructive here.

Unfortunately, threads do exist. A thread is something tangible. You can kill one, and the others will still be running. You can spawn new threads.... although each thread is not its own process, they are running separately inside the process. On multi-core machines, 2 threads could run at the same time.
http://en.wikipedia.org/wiki/Simultaneous_multithreading
http://www.intel.com/intelpress/samples/mcp_samplech01.pdf

Just as a process represents a virtual computer, the thread abstraction represents a virtual processor.
So threads are an abstraction.
Abstractions reduce complexity. Thus, the first question is what problem threads solve. The second question is how they can be implemented.
As to the first question: Threads make implementing multitasking easier. The main idea behind this is that multitasking is unnecessary if every task can be assigned to a unique worker. Actually, for the time being, it's fine to generalize the definition even further and say that the thread abstraction represents a virtual worker.
Now, imagine you have a robot that you want to give multiple tasks. Unfortunately, it can only execute a single, step by step task description. Well, if you want to make it multitask, you can try creating one big task description by interleaving the separate tasks you already have. This is a good start but the issue is that the robot sits at a desk and puts items on it while working. In order to get things right, you cannot just interleave instructions but also have to save and restore the items on the table.
This works, but now it's hard to disentangle the separate tasks by simply looking at the big task description that you created. Also, the ceremony of saving and restoring the items on the table is tedious and further clutters the task description.
Here is where the thread abstraction comes in and saves the day. It lets you assume that you have an infinite number of robots, each sitting in a different room at its own desk. Now, you can just throw task descriptions in a pot and everything else is taken care of by the thread abstraction's implementer. Remember? If there are enough workers, nobody has to multitask.
Often it is useful to indicate your perspective and say robot to mean real robots and virtual robot to mean the robots the thread abstraction provides you with.
At this point the problem of multitasking is solved for the case when the tasks are fully independent. However, wouldn't it be nice to let the robots go out of their rooms, interact and work together towards a common goal? Well, as you probably guessed, this requires coordination. Traffic lights, queues - you name it.
As an intermediate summary, the thread abstraction solves the problem of multitasking and creates an opportunity for cooperation. Without it, we only had a single robot, so cooperation was unthinkable. However, it has also brought the problem of coordination (synchronization) on us. Now we know what problem the thread abstraction solves and, as a bonus, we also know what new challenge it creates.
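To make the coordination problem concrete, here is a small Java sketch, assuming two threads that update one shared counter (the class SharedCounter and its methods are made up for this example). An increment is a read, an add and a write, so without coordination two threads could interleave those steps and lose updates. Marking the methods synchronized acts like the traffic light: only one thread at a time may be inside them.

```java
public class SharedCounter {
    private int count = 0;

    // synchronized lets only one thread at a time run this read-modify-write.
    synchronized void increment() {
        count++;
    }

    synchronized int get() {
        return count;
    }

    // Two threads each increment the same counter perThread times.
    static int runTwoThreads(int perThread) {
        SharedCounter counter = new SharedCounter();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                counter.increment();
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runTwoThreads(100_000));   // prints 200000
    }
}
```

If synchronized were removed from increment, the final value could come out lower than 200000 on some runs.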
But wait, why do we care about multitasking in the first place?
First, multitasking can increase performance if the tasks involve waiting. For example, while the washing machine is running, you can easily start preparing dinner. And while your dinner is in the oven, you can hang out the clothes. Note that here you wait because an independent component does the job for you. Tasks that involve waiting are called I/O-bound tasks.
Second, if multitasking is done rapidly, and you look at it from a bird's-eye view, it appears as parallelism. It's a bit like how the human eye perceives a series of still images as motion if shown in quick succession. If I write a letter to Alice for one second and to Bob for one second as well, can you tell if I wrote the two letters simultaneously or alternately, if you only look at what I'm doing every two seconds? Search for Multitasking Operating System for more on this.
Now, let's focus on the question of how the thread abstraction can be implemented.
Essentially, implementing the thread abstraction is about writing a task, a main task, that takes care of scheduling all the other tasks.
A fundamental question to ask is: If the scheduler schedules all tasks and the scheduler is also a task, then who schedules the scheduler?
Let's break this down. Say you write a scheduler, compile it and load it into the main memory of a computer at the address 1024, which happens to be the address that is loaded into the processor's instruction pointer when the computer is started. Now, your scheduler goes ahead and finds some tasks sitting precompiled in the main memory. For example, a task starts at the address 1,048,576. The scheduler wants to execute this task so it loads the task's address (1,048,576) into the instruction pointer. Huh, that was quite an ill-considered move because now the scheduler has no way to regain control from the task it has just started.
One solution is to insert jump instructions to the scheduler (address 1024) into the task descriptions before execution. Actually, you shouldn't forget to save the items on the desk the robot is working at, so you also have to save the processor's registers before jumping. The issue here is that it is hard to tell where to insert the jump instructions. If there are too many, they create too much overhead and if there are too few of them, one task might monopolize the processor.
A second approach is to ask the task authors to designate a few places where control can be transferred back to the scheduler. Note that the authors don't have to write the logic for saving the registers and inserting the jump instruction because it suffices that they mark the appropriate places and the scheduler takes care of the rest. This looks like a good idea because task authors probably know that, for example, their task will wait for a while after loading and starting a washing machine, so they let the scheduler take control there.
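This second, cooperative approach can be sketched in Java under a simplifying assumption: each task is written as a step function that does a slice of work and then returns, so every return is a designated place where control goes back to the scheduler. The names CooperativeScheduler, runAll and countingTask are hypothetical, and there are no registers to save because each task keeps its own state in ordinary variables.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.BooleanSupplier;

public class CooperativeScheduler {
    // Runs tasks round-robin. A task's step returns true if it has more work to do.
    static void runAll(List<BooleanSupplier> tasks) {
        Deque<BooleanSupplier> ready = new ArrayDeque<>(tasks);
        while (!ready.isEmpty()) {
            BooleanSupplier task = ready.pollFirst();
            boolean hasMore = task.getAsBoolean();    // run until the task hands control back
            if (hasMore) {
                ready.addLast(task);                  // unfinished tasks go to the back of the queue
            }
        }
    }

    // Hypothetical task: appends name1, name2, ... to a shared log, one entry per step.
    static BooleanSupplier countingTask(String name, int steps, List<String> log) {
        int[] done = {0};                             // the task's private state between steps
        return () -> {
            done[0]++;
            log.add(name + done[0]);
            return done[0] < steps;
        };
    }

    static List<String> demo() {
        List<String> log = new ArrayList<>();
        runAll(List.of(countingTask("A", 2, log), countingTask("B", 3, log)));
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());                   // prints [A1, B1, A2, B2, B3]
    }
}
```

Everything here runs on a single Java thread; the interleaving in the output comes purely from tasks returning control voluntarily.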
The problem that neither of the above approaches solves is that of an erroneous or malicious task that, for example, gets caught up in an infinite loop and never jumps to the address where the scheduler lives.
Now, what to do if you cannot solve something in software? Solve it in hardware! What is needed is a programmable circuitry wired up to the processor that acts like an alarm clock. The scheduler sets a timer and its address (1024) and when the timer runs out, the alarm saves the registers and sets the instruction pointer to the address where the scheduler lives. This approach is called preemptive scheduling.
Probably by now you start to sense that implementing the thread abstraction is not like implementing a linked list. The most well-known implementers of the thread abstraction are operating systems. The threads they provide are sometimes called kernel-level threads. Since an operating system cannot afford losing control, all major, general-purpose operating systems use preemptive scheduling.
Arguably, operating systems feel like the right place to implement the thread abstraction because they control all the hardware components and can suspend and resume threads very wisely. If a thread requests the contents of a file stored on a hard drive from the operating system, the operating system immediately knows that this operation will most likely take a while and can let another task occupy the processor in the meanwhile. Then, once the file's contents are available, it can pause the current task and resume the one that made the request.
However, the story doesn't end here because threads can also be implemented in user space. These implementers are normally compilers. Interestingly, as far as I know, kernel-level threads are as powerful as threads can get. So why do we bother with user-level threads? The reason, of course, is performance. User-level threads are more lightweight so you can create more of them and normally the overhead of pausing and resuming them is small.
User-level threads can be implemented using async/await. Do you remember that one option for getting control back to the scheduler is to make task authors designate places where the transition can happen? Well, the async and await keywords serve exactly this purpose.
Now, if you've made it this far, be prepared because here comes the real fun!
Have you noticed that we barely talked about parallelism? I mean, don't we use threads to run related computations in parallel and thereby increase throughput? Well, not quite. Actually, if you only want parallelism, you don't need this machinery at all. You just create as many tasks as the number of processing units you have and none of the tasks has to be paused or resumed ever. You don't even need a scheduler because you don't multitask.
In a sense, parallelism is an implementation detail. If you think about it, implementers of the thread abstraction can utilize as many processors as they wish under the hood. You can just compile some well-written multithreaded code from 1950, run it on a multicore today and see that it utilizes all cores. Importantly, the programmer who wrote that code probably didn't anticipate that piece of code being run on a multicore.
You could even argue that threads are abused when they are used to achieve parallelism: Even though people know they don't need the core feature, multitasking, they use threads to get access to parallelism.
As a final thought, note that user-level threads alone cannot provide parallelism. Remember the quote from the beginning? Operating systems run programs inside a virtual computer (process) that is normally equipped with a single virtual processor (thread) by default. No matter what magic you do in user space, if your virtual computer has only a single virtual processor, you cannot run code in parallel.
So what do we want? Of course, we want parallelism. But we also want lightweight threads. Therefore, many implementers of the thread abstraction started to use a hybrid approach: They start as many kernel-level threads as there are processing units in the hardware and run many user-level threads on top of a few kernel-level threads. Essentially, parallelism is taken care of by the kernel-level and multitasking by the user-level threads.
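A rough Java analogue of this layering, offered as a sketch rather than a true hybrid implementation: a fixed pool with one kernel-level thread per processing unit runs a much larger number of small tasks. The class ManyTasksFewThreads and its method run are invented for the example. Note the limitation: pool tasks run to completion instead of being paused and resumed, so this only approximates the model described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ManyTasksFewThreads {
    // Runs taskCount small tasks on a pool with one thread per processor
    // and returns the sum of their results.
    static long run(int taskCount) {
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                final int id = i;
                results.add(pool.submit(() -> id * 2));   // a task is far cheaper than a new thread
            }
            long sum = 0;
            for (Future<Integer> f : results) {
                sum += f.get();                            // wait for each task's result
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdown();                               // let the worker threads exit
        }
    }

    public static void main(String[] args) {
        System.out.println(run(10_000));                   // prints 99990000
    }
}
```

Ten thousand tasks are created, but never more than availableProcessors() kernel-level threads.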
Now, an interesting design decision is what threading interface a language exposes. Go, for example, provides a single interface that allows users to create hybrid threads, so-called goroutines. There is no way to ask for, say, just a single kernel-level thread in Go. Other languages have separate interfaces for different kinds of threads. In Rust, kernel-level threads live in the standard library, while user-level and hybrid threads can be found in external libraries like async-std and tokio. In Python, the asyncio package provides user-level threads while threading and multiprocessing provide kernel-level threads. Interestingly, in standard CPython the threads that threading provides cannot execute Python code in parallel because of the global interpreter lock (GIL). On the other hand, the threads multiprocessing provides can run in parallel but, as the library's name suggests, each kernel-level thread lives in a different process (virtual computer). This makes multiprocessing unsuitable for certain tasks because transferring data between different virtual computers is often slow.
Further resources:
Operating Systems: Principles and Practice by Thomas Anderson and Michael Dahlin
Concurrency is not parallelism by Rob Pike
Parallelism and concurrency need different tools
Asynchronous Programming in Rust
Inside Rust's Async Transform
Rust's Journey to Async/Await
What Color is Your Function?
Why goroutines instead of threads?
Why doesn't my program run faster with more CPUs?
John Reese - Thinking Outside the GIL with AsyncIO and Multiprocessing - PyCon 2018

Java threads are not actually executed in parallel?

Until now I was under the impression that two threads that start at the same time are also executed in parallel (both running their code at the same time), but I recently read some documentation and understood that they actually take turns executing their code, so no code from the first thread runs at the same time as code from the second thread.
Is my understanding correct?
If yes, then how is multithreading faster than single-threaded execution?
I'm asking because the only difference is that a single thread executes the code sequentially, while multiple threads take turns executing, so it should still take the same amount of time since nothing is done in parallel.

a) on multi-processor machines, threads can actually run in parallel (one per CPU)
b) If your thread blocks (for example in Thread.sleep(), or while waiting for I/O), it makes the CPU available to other threads. So multi-threaded applications can actually be faster than single-threaded ones when dealing with external resources

Java threads are executed in parallel if there are enough CPUs available to the JVM. You can't run two computations at the same time on a machine with a single computing element, so this computing element is used by either the first or the second computation at any given time. What you read probably concerned this situation.

No, Java threads are executed in parallel (unlike some other platforms like CPython). However, whether that gives performance improvements depends on the code you execute.
If you test with easily parallelizable & CPU intensive tasks like calculating PI with a parallelizable algorithm or resizing lots of images etc., you can easily demonstrate that performance can be increased basically linearly (if you have 2 CPUs = x2, 4 CPUs = x4 etc.)
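As a hedged illustration of such a test, the sketch below estimates pi with the Leibniz series, splitting the terms into one chunk per thread. The class ParallelPi, its methods and the term count are choices made for this example, not something from the answer above, and the actual speedup depends on the machine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPi {
    // Partial sum of the Leibniz series 4 * sum((-1)^k / (2k + 1)) for k in [from, to).
    static double partialSum(long from, long to) {
        double sum = 0.0;
        for (long k = from; k < to; k++) {
            double term = 1.0 / (2 * k + 1);
            sum += (k % 2 == 0) ? term : -term;
        }
        return 4.0 * sum;
    }

    // Splits the first `terms` terms into one chunk per thread and adds the partial sums.
    static double estimate(long terms, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long chunk = (terms + threads - 1) / threads;     // ceiling division
            List<Future<Double>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final long from = t * chunk;
                final long to = Math.min(terms, from + chunk);
                Callable<Double> job = () -> partialSum(from, to);
                parts.add(pool.submit(job));                  // chunks are independent, so they can run in parallel
            }
            double pi = 0.0;
            for (Future<Double> part : parts) {
                pi += part.get();                             // wait for each chunk
            }
            return pi;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("chunk failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        long start = System.nanoTime();
        double pi = estimate(400_000_000L, cores);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(pi + " computed in " + millis + " ms on " + cores + " threads");
    }
}
```

Running main once with cores replaced by 1 and once as written gives a rough feel for the scaling on a particular machine.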
EDIT:
When you only have one CPU, multi-threading is still beneficial. For example, you can have one thread reading images from the disk while the other thread resizes the images. This will also improve the performance because you can utilize the CPU without waste.
EDIT2:
When you read and resize images (note the plural) in a single thread, then you will see that CPU usage won't be 100% at all times. This is because while the thread is reading from file, it can't perform the resizing. If you had more than one thread, by the time a resize has finished another file would have been ready in-memory. If you are dealing with big images, it's relatively easy to peg the CPU at 100% with this design.
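The two-thread design described here can be sketched with a bounded queue between a reader thread and a resizer thread. This is a simplified illustration: readFromDisk and resize are hypothetical stand-ins that only manipulate strings, and the class name ReadResizePipeline and the queue capacity of 4 are arbitrary choices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ReadResizePipeline {
    private static final String DONE = "<done>";     // sentinel telling the resizer to stop

    // Hypothetical stand-ins for the real I/O-bound and CPU-bound work.
    static String readFromDisk(String name) { return "pixels-of-" + name; }
    static String resize(String pixels) { return "small-" + pixels; }

    static List<String> process(List<String> fileNames) {
        BlockingQueue<String> loaded = new ArrayBlockingQueue<>(4);
        List<String> resized = new ArrayList<>();    // written only by the resizer thread

        Thread reader = new Thread(() -> {
            try {
                for (String name : fileNames) {
                    loaded.put(readFromDisk(name));  // blocks if the resizer falls behind
                }
                loaded.put(DONE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread resizer = new Thread(() -> {
            try {
                // take() blocks until the reader has an image ready in memory
                for (String p = loaded.take(); !p.equals(DONE); p = loaded.take()) {
                    resized.add(resize(p));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        reader.start();
        resizer.start();
        try {
            reader.join();
            resizer.join();                          // after join() the results are safe to read
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return resized;
    }

    public static void main(String[] args) {
        // prints [small-pixels-of-a.png, small-pixels-of-b.png]
        System.out.println(process(List.of("a.png", "b.png")));
    }
}
```

While the reader is blocked on the disk, the resizer can keep the CPU busy with the images already in the queue.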

Well, the answer to your question depends on the number of CPUs the system has.
Keep in mind that a single CPU core can run only one thread at a time, but context switching between threads is so fast that the threads seem to run concurrently.
On your second question (how can multithreading be faster than single-threaded execution?):
Multithreading makes better use of CPU cycles. If one thread is blocked on some resource, other threads get a chance to run.
On a side note, go through this blog page if you want to see some basic multithreading tutorials: http://javasolutionsonline.blogspot.in/p/java-concurrency.html

Threads do run in parallel, but not on older conventional PCs that had a single core.
If you have a multi-core chip or several CPUs, then they can run in parallel.
Imagine one thread running on each core of a quad-core chip.
Threads give you many other advantages as well, as you probably already know.
