New and delete of native memory across Java threads

If I have two Java threads executing in native C++ code and one of them new's a native object, is it OK for the other Java thread to do the delete? Naturally, there are plenty of ways to screw up one thread doing a new and another doing the delete. But is there anything 'extra' because they are Java threads? (Like maybe each Java thread gets a separate native heap, or other such nonsense).
I assume there is nothing special about this case, but Valgrind is telling me that I am definitely leaking that memory and I need to get this nonsense out of my head so I can focus on finding the real problem Valgrind is trying to show me.
All Java threads executing native code see the exact same native heap and there is nothing special about them being Java threads vs threads created from native code. Right?

It's impossible to tell without more details.
If your threads are using some sort of publishing protocol like this
thread 'a' allocates some memory and does some work
thread 'b' is waiting on a semaphore
thread 'a' finishes its work and signals the semaphore
thread 'b' consumes the memory and deletes it
then things should be OK. (Usually you'd have more than just the two threads.)
If you don't use some synchronization mechanism then the memory needs to have strong thread safety guarantees. Most things in C++ only give weak guarantees.
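The hand-off protocol above can be sketched in Java. This is only an illustration under assumptions: the class and method names (HandOffDemo, runHandOff) are made up, and a plain Java array stands in for the natively allocated object, so "delete" becomes dropping the reference. The same signalling pattern applies when the threads call into native code.

```java
import java.util.concurrent.Semaphore;

// Hypothetical names throughout; an illustration, not the asker's code.
class HandOffDemo {
    // The published resource. In the native case this would be the pointer
    // returned by new; here it is a plain Java array.
    private static int[] buffer;

    // Starts with zero permits, so acquire() blocks until release() is called.
    private static final Semaphore ready = new Semaphore(0);

    static int runHandOff() {
        int[] result = new int[1];

        Thread a = new Thread(() -> {
            buffer = new int[] {1, 2, 3}; // thread 'a' allocates and does its work
            ready.release();              // signal: the write above is visible after acquire()
        });

        Thread b = new Thread(() -> {
            try {
                ready.acquire();          // thread 'b' waits on the semaphore
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            int sum = 0;
            for (int v : buffer) {
                sum += v;                 // consume
            }
            buffer = null;                // "delete": drop the only reference
            result[0] = sum;
        });

        b.start();
        a.start();
        try {
            a.join();
            b.join();                     // join() makes result[0] visible to the caller
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(runHandOff()); // prints 6
    }
}
```

The semaphore does two jobs here: it makes 'b' wait, and it guarantees that everything 'a' wrote before release() is visible to 'b' after acquire().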

Related

Why are objects visible to all threads, while a reading thread might not see a value written by another thread on a timely basis?

From java in a nutshell
In Java, all Java application threads in a process have their own
stacks (and local variables) but share a single heap. This makes it
very easy to share objects between threads, as all that is required is
to pass a reference from one thread to another.
This leads to a general design principle of Java—that objects are
visible by default. If I have a reference to an object, I can copy
it and hand it off to another thread with no restrictions. A Java
reference is essentially a typed pointer to a location in memory—and
threads share the same address space, so visible by default
is a natural model.
From Java Concurrency in Practice
Visibility is subtle because the things that can go wrong are so
counterintuitive. In a single-threaded environment, if you write a
value to a variable and later read that variable with no intervening
writes, you can expect to get the same value back. This seems only
natural. It may be hard to accept at first, but when the reads and
writes occur in different threads, this is simply not the case. In
general,
there is no guarantee that the reading thread will see a value written by another thread on a timely basis, or even at all. In
order to ensure visibility of memory writes across threads, you must
use synchronization.
When a thread reads a variable without synchronization, it may see a stale value.
So why does Java in a Nutshell say objects are visible to all threads, while Java Concurrency in Practice says there is no guarantee that a reading thread sees a value written by another thread on a timely basis? They don't seem consistent.
Thanks.

"So why does Java in a Nutshell say objects are visible to all threads" -->
As your quote says, in Java objects are allocated on the heap. A 'global' heap available for the entire JVM. Whereas in other languages (e.g. C++) objects can also be allocated on a stack. Objects on a heap can be passed to other threads, using different stacks. Objects on a stack can only be used on the thread using the same stack, as the stack's content will change beyond control of another thread.
"while Java Concurrency in Practice says no guarantee that a reading thread sees a value written by another thread on a timely basis?" -> This is another issue, as this is about the values of memory locations. Though the locations are reachable, compilers and CPUs try to optimize reads from and writes to them and will heavily cache the values on the assumption "I'm the only one reading and writing this memory location". So if one thread modifies a memory location's value, the other thread does not know it has changed and may not read it again. This makes the program much faster. By declaring a variable volatile you are telling the compiler that another thread may change the value at will, and the compiler will generate code that doesn't cache the value.
Finally, multithreading is much more difficult than adding volatile or using synchronized; one really needs to dive into the issues you will encounter when using multiple threads.
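As a minimal sketch of the volatile point (the names VolatileFlagDemo, done and payload are made up for illustration): the reader spins on a volatile flag, so it is guaranteed to see the writer's update eventually. If the volatile keyword were removed, the loop would be allowed to spin forever on a stale value.

```java
// Hypothetical names; a sketch of volatile visibility, not a benchmark.
class VolatileFlagDemo {
    // volatile: every read observes the most recent write by any thread.
    private static volatile boolean done = false;

    // Not volatile, but written before the volatile write and read after
    // the volatile read, so its value is visible too.
    private static int payload = 0;

    static int waitForPayload() {
        done = false;
        payload = 0;
        int[] seen = new int[1];

        Thread reader = new Thread(() -> {
            while (!done) {
                Thread.onSpinWait(); // busy-wait until the writer sets the flag
            }
            seen[0] = payload;
        });
        reader.start();

        payload = 42;  // ordinary write
        done = true;   // volatile write publishes the write above

        try {
            reader.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(waitForPayload()); // prints 42
    }
}
```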

In Java, all Java application threads in a process have their own
stacks (and local variables) but share a single heap. This makes it
very easy to share objects between threads, as all that is required is
to pass a reference from one thread to another.
This leads to a general design principle of Java—that objects are
visible by default.
I suppose that these statements are strictly true ... but they are misleading because they don't convey the whole truth. For example, what does the author mean when he says "...that objects are visible by default"?
Any thread executing on a Java JVM does not have de facto visibility to all the objects on the JVM's heap. If we define visibility as "the ability to access by reference", then a thread only has visibility to objects:
whose references have been published to that thread
whose references are in static fields or fields of objects to which the thread has access
In fact, an important and commonly used thread safety policy in Java concurrent programming is thread confinement. If a thread holds a reference to an object to which only it has access and which is not published to any other thread, then that object is thread safe. That object can be safely mutated by the thread in which it is confined without any further regard to visibility and atomicity ... as long as it is correctly thread confined.
In other words, an object that is thread confined, no matter where it is on the JVM heap, is not visible to any other thread that may be running on that same JVM by virtue of being inaccessible.
since shared objects are stored in the heap shared by threads, why
some threads might not see the most updated value by other threads?
In this age of multi-core processors, each core on which a JVM may be running has its own levels of local cache memory that no other core can see. This gets to the heart of why values written to variables in one thread are not guaranteed to be visible to another thread: without synchronization, the Java Memory Model makes no guarantees about when values written by one thread will become visible to other threads, because it does not specify when cached values will be written back from cache to memory.
It is, in fact, usual for the unsynchronized access of values to be stale (or inconsistent) when those values are accessed by many threads. Depending on the state transition that is occurring, thread safety in a concurrent environment in which many threads may be accessing the same value, may require:
mutual exclusion
atomicity protection
visibility guarantees
or all of the above
in order to achieve a thread safety policy that allows your program to be correct.
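Two common ways to get those guarantees for a shared counter are sketched below, assuming hypothetical names (CounterDemo, run). One counter is guarded by synchronized, which gives mutual exclusion and visibility; the other uses AtomicInteger, which gives an atomic read-modify-write. A plain unsynchronized count++ from several threads would be allowed to lose updates.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical names; shows two ways to make a shared counter thread safe.
class CounterDemo {
    private int lockedCount = 0;                       // guarded by "this"
    private final AtomicInteger atomicCount = new AtomicInteger();

    synchronized void incrementLocked() {
        lockedCount++;                                 // mutual exclusion + visibility
    }

    synchronized int getLocked() {
        return lockedCount;
    }

    // Runs the given number of threads, each incrementing both counters.
    static int[] run(int threads, int perThread) {
        CounterDemo c = new CounterDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    c.incrementLocked();
                    c.atomicCount.incrementAndGet();   // atomic read-modify-write
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return new int[] {c.getLocked(), c.atomicCount.get()};
    }

    public static void main(String[] args) {
        int[] totals = run(4, 10_000);
        System.out.println(totals[0] + " " + totals[1]); // prints 40000 40000
    }
}
```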

What is a Thread? How many threads does my program have? [duplicate]

I have been trying to find a good definition, and get an understanding, of what a thread really is.
It seems that I must be missing something obvious, but every time I read about what a thread is, it's almost a circular definition, a la "a thread is a thread of execution" or " a way to divide into running tasks". Uh uh. Huh?
It seems from what I have read that a thread is not really something concrete, like a process is. It is in fact just a concept. From what I understand of the way this works, a processor executes some commands for a program (which has been termed a thread of execution), then when it needs to switch to processing for some other program for a bit, it stores the state of the program it's currently executing somewhere (Thread Local Storage) and then starts executing the other program's instructions. And back and forth. Such that, a thread is really just a concept for "one of the paths of execution" of a program that is currently running.
Unlike a process, which really is something - it is a conglomeration of resources, etc.
As an example of a definition that didn't really help me much . . .
From Wikipedia:
"A thread in computer science is short for a thread of execution. Threads are a way for a program to divide (termed "split") itself into two or more simultaneously (or pseudo-simultaneously) running tasks. Threads and processes differ from one operating system to another but, in general, a thread is contained inside a process and different threads in the same process share same resources while different processes in the same multitasking operating system do not."
So am I right? Wrong? What is a thread really?
Edit: Apparently a thread is also given its own call stack, so that is somewhat of a concrete thing.

A thread is an execution context, which is all the information a CPU needs to execute a stream of instructions.
Suppose you're reading a book, and you want to take a break right now, but you want to be able to come back and resume reading from the exact point where you stopped. One way to achieve that is by jotting down the page number, line number, and word number. So your execution context for reading a book is these 3 numbers.
If you have a roommate, and she's using the same technique, she can take the book while you're not using it, and resume reading from where she stopped. Then you can take it back, and resume it from where you were.
Threads work in the same way. A CPU is giving you the illusion that it's doing multiple computations at the same time. It does that by spending a bit of time on each computation. It can do that because it has an execution context for each computation. Just like you can share a book with your friend, many tasks can share a CPU.
On a more technical level, an execution context (therefore a thread) consists of the values of the CPU's registers.
Last: threads are different from processes. A thread is a context of execution, while a process is a bunch of resources associated with a computation. A process can have one or many threads.
Clarification: the resources associated with a process include memory pages (all the threads in a process have the same view of the memory), file descriptors (e.g., open sockets), and security credentials (e.g., the ID of the user who started the process).
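On the "how many threads does my program have?" part of the title: on the JVM, one way to look is sketched below. The class name ThreadCountDemo is made up, and the exact count and thread names depend on the JVM, since it starts housekeeping threads of its own besides main.

```java
// Hypothetical class name; lists the live threads in the current JVM process.
class ThreadCountDemo {
    static int liveThreadCount() {
        // getAllStackTraces() returns one map entry per live thread.
        return Thread.getAllStackTraces().size();
    }

    public static void main(String[] args) {
        System.out.println("live threads: " + liveThreadCount());
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            System.out.println("  " + t.getName());
        }
    }
}
```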

A thread is an independent set of values for the processor registers (for a single core). Since this includes the Instruction Pointer (aka Program Counter), it controls what executes in what order. It also includes the Stack Pointer, which had better point to a unique area of memory for each thread or else they will interfere with each other.
Threads are the software unit affected by control flow (function call, loop, goto), because those instructions operate on the Instruction Pointer, and that belongs to a particular thread. Threads are often scheduled according to some prioritization scheme (although it's possible to design a system with one thread per processor core, in which case every thread is always running and no scheduling is needed).
In fact the value of the Instruction Pointer and the instruction stored at that location is sufficient to determine a new value for the Instruction Pointer. For most instructions, this simply advances the IP by the size of the instruction, but control flow instructions change the IP in other, predictable ways. The sequence of values the IP takes on forms a path of execution weaving through the program code, giving rise to the name "thread".

In order to define a thread formally, we must first understand the boundaries of where a thread operates.
A computer program becomes a process when it is loaded from some store into the computer's memory and begins execution. A process can be executed by a processor or a set of processors. A process description in memory contains vital information such as the program counter which keeps track of the current position in the program (i.e. which instruction is currently being executed), registers, variable stores, file handles, signals, and so forth.
A thread is a sequence of such instructions within a program that can be executed independently of other code. The figure shows the concept:
Threads are within the same process address space, thus, much of the information present in the memory description of the process can be shared across threads.
Some information cannot be shared and must exist separately for each thread, such as the stack (a stack pointer to a different memory area per thread), registers and thread-specific data. This information suffices to allow threads to be scheduled independently of the program's main thread and possibly one or more other threads within the program.
Explicit operating system support is required to run multithreaded programs. Fortunately, most modern operating systems, such as Linux (via NPTL), BSD variants, Mac OS X, Windows, Solaris, AIX, and HP-UX, support threads. Operating systems may use different mechanisms to implement multithreading support.
Here, you can find more information about the topic. That was also my information-source.
Let me just add a sentence from Introduction to Embedded Systems by Edward Lee and Seshia:
Threads are imperative programs that run concurrently and share a memory space. They can access each others’ variables. Many practitioners in the field use the term “threads” more narrowly to refer to particular ways of constructing programs that share memory, [others] to broadly refer to any mechanism where imperative programs run concurrently and share memory. In this broad sense, threads exist in the form of interrupts on almost all microprocessors, even without any operating system at all (bare iron).

Processes are like two people using two different computers, who use the network to share data when necessary. Threads are like two people using the same computer, who don't have to share data explicitly but must carefully take turns.
Conceptually, threads are just multiple worker bees buzzing around in the same address space. Each thread has its own stack, its own program counter, etc., but all threads in a process share the same memory. Imagine two programs running at the same time, but they both can access the same objects.
Contrast this with processes. Processes each have their own address space, meaning a pointer in one process cannot be used to refer to an object in another (unless you use shared memory).
I guess the key things to understand are:
Both processes and threads can "run at the same time".
Processes do not share memory (by default), but threads share all of their memory with other threads in the same process.
Each thread in a process has its own stack and its own instruction pointer.
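A small sketch of those points, with made-up names (SharedHeapDemo, run): both threads append to one list that lives in the shared heap, while each thread's loop variable i lives on that thread's own stack and is never seen by the other thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical names; one shared heap object, one private stack per thread.
class SharedHeapDemo {
    static int run(int perThread) {
        // Shared: both threads hold a reference to the same list object.
        List<String> shared = Collections.synchronizedList(new ArrayList<>());

        Runnable work = () -> {
            // i is a local variable: each thread gets its own copy on its own stack.
            for (int i = 0; i < perThread; i++) {
                shared.add(Thread.currentThread().getName() + ":" + i);
            }
        };

        Thread t1 = new Thread(work, "t1");
        Thread t2 = new Thread(work, "t2");
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(run(1000)); // prints 2000
    }
}
```

The list is wrapped with Collections.synchronizedList because sharing memory is easy, but concurrent modification of a plain ArrayList is not safe.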

I am going to use a lot of text from the book Operating System Concepts by Abraham Silberschatz, Peter Baer Galvin and Greg Gagne along with my own understanding of things.
Process
Any application resides in the computer in the form of text (or code).
We emphasize that a program by itself is not a process. A program is a
passive entity, such as a file containing a list of instructions stored on disk
(often called an executable file).
When we start an application, we create an instance of execution. This instance of execution is called a process.
EDIT:(As per my interpretation, analogous to a class and an instance of a class, the instance of a class being a process. )
An example of processes is that of Google Chrome.
When we start Google Chrome, three different types of processes are used:
• The browser process is responsible for managing the user interface as
well as disk and network I/O. A new browser process is created when
Chrome is started. Only one browser process is created.
• Renderer processes contain logic for rendering web pages. Thus, they
contain the logic for handling HTML, Javascript, images, and so forth.
As a general rule, a new renderer process is created for each website
opened in a new tab, and so several renderer processes may be active
at the same time.
• A plug-in process is created for each type of plug-in (such as Flash
or QuickTime) in use. Plug-in processes contain the code for the
plug-in as well as additional code that enables the plug-in to
communicate with associated renderer processes and the browser
process.
Thread
To answer this I think you should first know what a processor is. A Processor is the piece of hardware that actually performs the computations.
EDIT: (Computations like adding two numbers, sorting an array, basically executing the code that has been written)
Now moving on to the definition of a thread.
A thread is a basic unit of CPU utilization; it comprises a thread ID, a program
counter, a register set, and a stack.
EDIT: Definition of a thread from intel's website:
A Thread, or thread of execution, is a software term for the basic ordered sequence of instructions that can be passed through or processed by a single CPU core.
So, if the Renderer process from the Chrome application sorts an array of numbers, the sorting will take place on a thread/thread of execution. (The grammar regarding threads seems confusing to me)
My Interpretation of Things
A process is an execution instance. Threads are the actual workers that perform the computations via CPU access. When there are multiple threads running for a process, the process provides common memory.
EDIT:
Other Information that I found useful to give more context
All modern computers can run more than one thread. The number of threads that can run at the same instant depends on the number of cores in the computer.
Concurrent Computing:
From Wikipedia:
Concurrent computing is a form of computing in which several computations are executed during overlapping time periods—concurrently—instead of sequentially (one completing before the next starts). This is a property of a system—this may be an individual program, a computer, or a network—and there is a separate execution point or "thread of control" for each computation ("process").
So, I could write a program which calculates the sum of 4 numbers:
(1 + 3) + (4 + 5)
In the program to compute this sum (which will be one process running on a thread of execution) I can fork another process which can run on a different thread to compute (4 + 5) and return the result to the original process, while the original process calculates the sum of (1 + 3).
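Here is a rough Java sketch of that idea. It is an assumption on my part that a second thread inside the same process is what is meant, since that is the usual way to do this rather than forking a whole new process. The names ParallelSumDemo and sum are made up.

```java
// Hypothetical names; computes (1 + 3) + (4 + 5) using a helper thread.
class ParallelSumDemo {
    static int sum() {
        int[] right = new int[1];

        // The helper thread computes the right-hand part.
        Thread helper = new Thread(() -> right[0] = 4 + 5);
        helper.start();

        // Meanwhile the original thread computes the left-hand part.
        int left = 1 + 3;

        try {
            helper.join(); // wait for the helper; also makes right[0] visible
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return left + right[0];
    }

    public static void main(String[] args) {
        System.out.println(sum()); // prints 13
    }
}
```

For additions this small, the cost of starting a thread far outweighs the work, so this only shows the shape of the technique.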

This was taken from a Yahoo Answer:
A thread is a coding construct
unaffected by the architecture of an
application. A single process
frequently may contain multiple
threads. Threads can also directly
communicate with each other since they
share the same variables.
Processes are independent execution
units with their own state
information. They also use their own
address spaces and can only interact
with other processes through
interprocess communication mechanisms.
However, to put in simpler terms threads are like different "tasks". So think of when you are doing something, for instance you are writing down a formula on one paper. That can be considered one thread. Then another thread is you writing something else on another piece of paper. That is where multitasking comes in.
Intel processors are said to have "hyper-threading" (AMD has it too) and it is meant to be able to perform multiple "threads" or multitask much better.
I am not sure about the logistics of how a thread is handled. I do recall hearing about the processor going back and forth between them, but I am not 100% sure about this and hopefully somebody else can answer that.

A thread is nothing more than a memory context (or, as Tanenbaum better puts it, a resource grouping) with execution rules. It's a software construct. The CPU has no idea what a thread is (with some exceptions: some processors have hardware threads); it just executes instructions.
The kernel introduces the thread and process concepts to manage memory and instruction order in a meaningful way.

A thread is a set of (CPU) instructions which can be executed.
But in order to have a better understanding of what a thread is, some computer architecture knowledge is required.
What a computer does, is to follow instructions and manipulate data.
RAM is the place where the instructions and data are saved, the processor uses those instructions to perform operations on the saved data.
The CPU has some internal memory cells called registers. It can perform simple mathematical operations with numbers stored in these registers. It can also move data between the RAM and these registers. These are examples of typical operations a CPU can be instructed to execute:
Copy data from memory position #220 into register #3
Add the number in register #3 to the number in register #1.
The collection of all operations a CPU can do is called instruction set. Each operation in the instruction set is assigned a number. Computer code is essentially a sequence of numbers representing CPU operations. These operations are stored as numbers in the RAM. We store input/output data, partial calculations, and computer code, all mixed together in the RAM.
The CPU works in a never-ending loop, always fetching and executing an instruction from memory. At the core of this cycle is the PC register, or Program Counter. It's a special register that stores the memory address of the next instruction to be executed.
The CPU will:
1. Fetch the instruction at the memory address given by the PC,
2. Increment the PC by 1,
3. Execute the instruction,
4. Go back to step 1.
The CPU can be instructed to write a new value to the PC, causing the execution to branch, or "jump" to somewhere else in the memory. And this branching can be conditional. For instance, a CPU instruction could say: "set PC to address #200 if register #1 equals zero". This allows computers to execute stuff like this:
if x = 0
    compute_this()
else
    compute_that()
Resources used from Computer Science Distilled.
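The fetch, increment, execute loop and the conditional jump can be sketched as a toy interpreter. Everything here is invented for illustration: the opcodes, the three-number instruction layout and the TinyCpu name do not correspond to any real instruction set, and the program is kept separate from the registers for simplicity.

```java
// A toy machine with made-up opcodes; not a real instruction set.
class TinyCpu {
    static final int LOAD = 0; // LOAD r, value : registers[r] = value
    static final int ADD = 1;  // ADD r, s      : registers[r] += registers[s]
    static final int JZ = 2;   // JZ r, addr    : if registers[r] == 0, set PC to addr
    static final int JMP = 3;  // JMP addr      : set PC to addr
    static final int HALT = 4; // stop

    static int[] run(int[][] program, int[] registers) {
        int pc = 0; // program counter
        while (true) {
            int[] ins = program[pc];  // 1. fetch the instruction at the PC
            pc = pc + 1;              // 2. increment the PC
            switch (ins[0]) {         // 3. execute
                case LOAD: registers[ins[1]] = ins[2]; break;
                case ADD:  registers[ins[1]] += registers[ins[2]]; break;
                case JZ:   if (registers[ins[1]] == 0) pc = ins[2]; break;
                case JMP:  pc = ins[1]; break;
                case HALT: return registers;
                default: throw new IllegalArgumentException("bad opcode " + ins[0]);
            }
        }                             // 4. go back to step 1
    }

    // Encodes: if x = 0 then r1 = 10 else r1 = 20
    static int branchDemo(int x) {
        int[][] program = {
            {JZ, 0, 3},    // 0: if r0 == 0 jump to address 3
            {LOAD, 1, 20}, // 1: else branch (stands in for compute_that)
            {JMP, 4, 0},   // 2: skip over the then branch
            {LOAD, 1, 10}, // 3: then branch (stands in for compute_this)
            {HALT, 0, 0},  // 4: stop
        };
        int[] registers = {x, 0};
        return run(program, registers)[1];
    }

    public static void main(String[] args) {
        System.out.println(branchDemo(0)); // prints 10
        System.out.println(branchDemo(5)); // prints 20
    }
}
```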

The answer varies hugely across different systems and different implementations, but the most important parts are:
A thread has an independent thread of execution (i.e. you can context-switch away from it, and then back, and it will resume running where it was).
A thread has a lifetime (it can be created by another thread, and another thread can wait for it to finish).
It probably has less baggage attached than a "process".
Beyond that: threads could be implemented within a single process by a language runtime, threads could be coroutines, threads could be implemented within a single process by a threading library, or threads could be a kernel construct.
In several modern Unix systems, including Linux which I'm most familiar with, everything is threads -- a process is merely a type of thread that shares relatively few things with its parent (i.e. it gets its own memory mappings, its own file table and permissions, etc.) Reading man 2 clone, especially the list of flags, is really instructive here.

Unfortunately, threads do exist. A thread is something tangible. You can kill one, and the others will still be running. You can spawn new threads.... although each thread is not its own process, they are running separately inside the process. On multi-core machines, 2 threads could run at the same time.
http://en.wikipedia.org/wiki/Simultaneous_multithreading
http://www.intel.com/intelpress/samples/mcp_samplech01.pdf

Just as a process represents a virtual computer, the thread
abstraction represents a virtual processor.
So threads are an abstraction.
Abstractions reduce complexity. Thus, the first question is what problem threads solve. The second question is how they can be implemented.
As to the first question: Threads make implementing multitasking easier. The main idea behind this is that multitasking is unnecessary if every task can be assigned to a unique worker. Actually, for the time being, it's fine to generalize the definition even further and say that the thread abstraction represents a virtual worker.
Now, imagine you have a robot that you want to give multiple tasks. Unfortunately, it can only execute a single, step by step task description. Well, if you want to make it multitask, you can try creating one big task description by interleaving the separate tasks you already have. This is a good start but the issue is that the robot sits at a desk and puts items on it while working. In order to get things right, you cannot just interleave instructions but also have to save and restore the items on the table.
This works, but now it's hard to disentangle the separate tasks by simply looking at the big task description that you created. Also, the ceremony of saving and restoring the items on the table is tedious and further clutters the task description.
Here is where the thread abstraction comes in and saves the day. It lets you assume that you have an infinite number of robots, each sitting in a different room at its own desk. Now, you can just throw task descriptions in a pot and everything else is taken care of by the thread abstraction's implementer. Remember? If there are enough workers, nobody has to multitask.
Often it is useful to indicate your perspective and say robot to mean real robots and virtual robot to mean the robots the thread abstraction provides you with.
At this point the problem of multitasking is solved for the case when the tasks are fully independent. However, wouldn't it be nice to let the robots go out of their rooms, interact and work together towards a common goal? Well, as you probably guessed, this requires coordination. Traffic lights, queues - you name it.
As an intermediate summary, the thread abstraction solves the problem of multitasking and creates an opportunity for cooperation. Without it, we only had a single robot, so cooperation was unthinkable. However, it has also brought the problem of coordination (synchronization) on us. Now we know what problem the thread abstraction solves and, as a bonus, we also know what new challenge it creates.
But wait, why do we care about multitasking in the first place?
First, multitasking can increase performance if the tasks involve waiting. For example, while the washing machine is running, you can easily start preparing dinner. And while your dinner is in the oven, you can hang out the clothes. Note that here you wait because an independent component does the job for you. Tasks that involve waiting are called I/O bound tasks.
Second, if multitasking is done rapidly, and you look at it from a bird's-eye view, it appears as parallelism. It's a bit like how the human eye perceives a series of still images as motion if shown in quick succession. If I write a letter to Alice for one second and to Bob for one second as well, can you tell if I wrote the two letters simultaneously or alternately, if you only look at what I'm doing every two seconds? Search for Multitasking Operating System for more on this.
Now, let's focus on the question of how the thread abstraction can be implemented.
Essentially, implementing the thread abstraction is about writing a task, a main task, that takes care of scheduling all the other tasks.
A fundamental question to ask is: If the scheduler schedules all tasks and the scheduler is also a task, then who schedules the scheduler?
Let's break this down. Say you write a scheduler, compile it and load it into the main memory of a computer at the address 1024, which happens to be the address that is loaded into the processor's instruction pointer when the computer is started. Now, your scheduler goes ahead and finds some tasks sitting precompiled in the main memory. For example, a task starts at the address 1,048,576. The scheduler wants to execute this task so it loads the task's address (1,048,576) into the instruction pointer. Huh, that was quite an ill-considered move because now the scheduler has no way to regain control from the task it has just started.
One solution is to insert jump instructions to the scheduler (address 1024) into the task descriptions before execution. Actually, you shouldn't forget to save the items on the desk the robot is working at, so you also have to save the processor's registers before jumping. The issue here is that it is hard to tell where to insert the jump instructions. If there are too many, they create too much overhead and if there are too few of them, one task might monopolize the processor.
A second approach is to ask the task authors to designate a few places where control can be transferred back to the scheduler. Note that the authors don't have to write the logic for saving the registers and inserting the jump instruction because it suffices that they mark the appropriate places and the scheduler takes care of the rest. This looks like a good idea because task authors probably know that, for example, their task will wait for a while after loading and starting a washing machine, so they let the scheduler take control there.
The problem that neither of the above approaches solves is that of an erroneous or malicious task that, for example, gets caught up in an infinite loop and never jumps to the address where the scheduler lives.
Now, what to do if you cannot solve something in software? Solve it in hardware! What is needed is a programmable circuitry wired up to the processor that acts like an alarm clock. The scheduler sets a timer and its address (1024) and when the timer runs out, the alarm saves the registers and sets the instruction pointer to the address where the scheduler lives. This approach is called preemptive scheduling.
Probably by now you start to sense that implementing the thread abstraction is not like implementing a linked list. The most well-known implementers of the thread abstraction are operating systems. The threads they provide are sometimes called kernel-level threads. Since an operating system cannot afford losing control, all major, general-purpose operating systems use preemptive scheduling.
Arguably, operating systems feel like the right place to implement the thread abstraction because they control all the hardware components and can suspend and resume threads very wisely. If a thread requests the contents of a file stored on a hard drive from the operating system, the operating system immediately knows that this operation will most likely take a while and can let another task occupy the processor in the meantime. Then, once the file's contents are available, it can pause the current task and resume the one that made the request.
However, the story doesn't end here because threads can also be implemented in user space. These implementers are normally compilers. Interestingly, as far as I know, kernel-level threads are as powerful as threads can get. So why do we bother with user-level threads? The reason, of course, is performance. User-level threads are more lightweight so you can create more of them and normally the overhead of pausing and resuming them is small.
User-level threads can be implemented using async/await. Do you remember that one option for getting control back to the scheduler is to make task authors designate places where the transition can happen? Well, the async and await keywords serve exactly this purpose.
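Java has no async and await keywords, so the sketch below only imitates the idea of author-designated yield points. Each task is written as a step function: one call runs the task up to its next yield point and returns whether there is more work to do. The round-robin scheduler and all names (CooperativeScheduler, countingTask, schedule) are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical names; a cooperative round-robin scheduler on a single thread.
class CooperativeScheduler {
    // Builds a task that logs one entry per step and then yields.
    // The return value of each step says whether the task wants to run again.
    static BooleanSupplier countingTask(String name, int steps, List<String> log) {
        int[] done = {0};
        return () -> {
            done[0]++;
            log.add(name + done[0]);
            return done[0] < steps; // returning hands control back to the scheduler
        };
    }

    // Runs one step of each task in turn until every task has finished.
    static void schedule(Deque<BooleanSupplier> queue) {
        while (!queue.isEmpty()) {
            BooleanSupplier task = queue.pollFirst();
            if (task.getAsBoolean()) {
                queue.addLast(task); // not finished: go to the back of the queue
            }
        }
    }

    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Deque<BooleanSupplier> queue = new ArrayDeque<>();
        queue.add(countingTask("A", 2, log));
        queue.add(countingTask("B", 3, log));
        schedule(queue);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [A1, B1, A2, B2, B3]
    }
}
```

Note the weakness discussed earlier: a step that never returns would monopolize the scheduler, which is exactly what preemptive scheduling protects against.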
Now, if you've made it this far, be prepared because here comes the real fun!
Have you noticed that we barely talked about parallelism? I mean, don't we use threads to run related computations in parallel and thereby increase throughput? Well, not quite. Actually, if you only want parallelism, you don't need this machinery at all. You just create as many tasks as the number of processing units you have and none of the tasks has to be paused or resumed ever. You don't even need a scheduler because you don't multitask.
In a sense, parallelism is an implementation detail. If you think about it, implementers of the thread abstraction can utilize as many processors as they wish under the hood. You can just compile some well-written multithreaded code from 1950, run it on a multicore today and see that it utilizes all cores. Importantly, the programmer who wrote that code probably didn't anticipate that piece of code being run on a multicore.
You could even argue that threads are abused when they are used to achieve parallelism: Even though people know they don't need the core feature, multitasking, they use threads to get access to parallelism.
As a final thought, note that user-level threads alone cannot provide parallelism. Remember the quote from the beginning? Operating systems run programs inside a virtual computer (process) that is normally equipped with a single virtual processor (thread) by default. No matter what magic you do in user space, if your virtual computer has only a single virtual processor, you cannot run code in parallel.
So what do we want? Of course, we want parallelism. But we also want lightweight threads. Therefore, many implementers of the thread abstraction started to use a hybrid approach: They start as many kernel-level threads as there are processing units in the hardware and run many user-level threads on top of a few kernel-level threads. Essentially, parallelism is taken care of by the kernel-level and multitasking by the user-level threads.
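The many-on-few mapping can be approximated in Java with a fixed pool that has one kernel-level thread per processing unit and many small tasks queued on top of it. This is only a sketch: HybridDemo and sumOfSquares are hypothetical names, and tasks submitted to an executor run to completion rather than being suspended and resumed, so they are not true user-level threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration: many lightweight tasks on a few kernel-level threads.
final class HybridDemo {

    // Runs `taskCount` tiny tasks on one kernel-level thread per available
    // processor and returns the sum of their results (1^2 + 2^2 + ...).
    static long sumOfSquares(int taskCount) throws InterruptedException, ExecutionException {
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 1; i <= taskCount; i++) {
                final long n = i;
                results.add(pool.submit(() -> n * n)); // a task, not a new thread
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // wait for each task to finish
            }
            return total;
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumOfSquares(1000));
    }
}
```

Here the pool supplies the parallelism, while the task queue decides which piece of work runs next.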
Now, an interesting design decision is what threading interface a language exposes. Go, for example, provides a single interface that allows users to create hybrid threads, so-called goroutines. There is no way to ask for, say, just a single kernel-level thread in Go. Other languages have separate interfaces for different kinds of threads. In Rust, kernel-level threads live in the standard library, while user-level and hybrid threads can be found in external libraries like async-std and tokio. In Python, the asyncio package provides user-level threads while threading and multiprocessing provide kernel-level threads. Interestingly, the threads that threading provides cannot run Python code in parallel (in CPython, because of the global interpreter lock). On the other hand, the threads multiprocessing provides can run in parallel but, as the library's name suggests, each kernel-level thread lives in a different process (virtual computer). This makes multiprocessing unsuitable for certain tasks because transferring data between different processes is often slow.
Further resources:
Operating Systems: Principles and Practice by Thomas Anderson and Michael Dahlin
Concurrency is not parallelism by Rob Pike
Parallelism and concurrency need different tools
Asynchronous Programming in Rust
Inside Rust's Async Transform
Rust's Journey to Async/Await
What Color is Your Function?
Why goroutines instead of threads?
Why doesn't my program run faster with more CPUs?
John Reese - Thinking Outside the GIL with AsyncIO and Multiprocessing - PyCon 2018

Why would a Java function block on the monitor of Reference$Lock?

I'm looking into a program that hits some sort of concurrency bottleneck when it goes from 4 to 8 threads. Using Yourkit, I've watched the monitor profile, and it tells me that the threads are ending up blocked on the monitor of java.lang.ref.Reference$Lock while in a function which is not synchronized and not calling anything synchronized.
There are zero, no, zilch, references to the class Reference.Lock or Reference in my code. Not in the function at the top of the stack when the profiler shows the threads blocked, and not anywhere else. There are, in fact, no references to any of the subclasses of Reference in any of my code at all. So this question is about the internal working of the JVM, not about my code. Presumably, behind my back, new char[100] gets into this pickle, and I'm asking about how and why.
I'm using java 1.6.0_26 on a Linux box with 16 cores, otherwise relatively idle.
The function at the top of the backtrace does allocate.

java threading memory management issues

I'm working on a program right now that is essentially this: there is a 4 way stop with cars arriving on each road at random times. Each road is served FCFS and the intersection is managed round robin style, 1 car crossing at a time. Each waiting car is a thread. I've gotten the thread synchronization and algorithm working no problem. The issue I can't quite figure out is how to prevent the error: OutOfMemoryError: unable to create new native thread. I realize that this is due to the heap (stack? I always get them switched) becoming full. I can't figure out a way to ensure executed threads are properly managed by the garbage collector and not lingering in memory after execution. I've tried setting my queues (each "road" with the car threads) up with soft references and nulling any hard references out to no avail. Anyone on here have experience with this!? THANKS!!!

"OutOfMemoryError: unable to create new native thread" does not refer to heap memory. It won't help you nulling references or using soft/weak references. Furthermore, increasing the heap size can only make things worse.
Java uses native memory for thread stacks. Each time you start a thread, a new stack is allocated outside of the JVM heap. The stack is not released until the thread terminates. Consider using fewer concurrent threads (you can control the number by using ThreadPoolExecutor, for example), or maybe decrease the stack sizes (using -Xss{size}k).
See also this post, which details many types of out of memory errors.
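As a rough sketch of the ThreadPoolExecutor suggestion, each car can be a task handed to a pool with a hard cap on the number of threads, instead of a thread of its own. BoundedCarsDemo, simulate and the counter standing in for "crossing the intersection" are hypothetical; the real program would submit its own crossing logic.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration: cars are tasks, not threads.
final class BoundedCarsDemo {

    // Handles `cars` arrivals with at most `maxThreads` threads and returns
    // the largest number of threads the pool ever had at the same time.
    static int simulate(int cars, int maxThreads) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads,          // core and maximum pool size
                0L, TimeUnit.MILLISECONDS,       // keep-alive for extra threads (none here)
                new LinkedBlockingQueue<>());    // waiting cars queue up here
        AtomicInteger crossed = new AtomicInteger();
        for (int i = 0; i < cars; i++) {
            pool.execute(crossed::incrementAndGet); // stand-in for "cross the intersection"
        }
        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("cars did not finish in time");
        }
        if (crossed.get() != cars) {
            throw new IllegalStateException("only " + crossed.get() + " cars crossed");
        }
        return pool.getLargestPoolSize();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak threads: " + simulate(10_000, 4));
    }
}
```

However many cars arrive, only `maxThreads` native stacks are ever allocated, which is what avoids the "unable to create new native thread" error.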

Did you try using a thread pool?
Since Java 5 you can create a thread pool in which you decide how many threads the VM should initialize for your algorithm. Threads are created and reused.
I had a similar problem. Threads were not removed by the garbage collector and somehow lived forever.

This will only happen if you have too many running threads (not just references to threads). Like @Markus, I would suggest you switch to a thread pool such as an ExecutorService, as it will manage the creation of threads and it works.
BTW: The concurrency library dates back to 1998, but was only included in Java 5.0 (2005), so if you have to use an older version you can use either the backport or the original library.

How to reclaim the memory used by a Java thread stack?

I've been having this memory leak issue for days and I think I have some clues now. The memory of my Java process keeps growing, yet the heap does not increase. I was told that this is possible if I create many threads, because Java threads use memory outside of the heap.
My Java process is a server-type program, so there are 1000-2000 threads, which are created and destroyed continuously. How do I reclaim the memory used by a Java thread? Do I simply erase all references to the thread object and make sure that it has terminated?

Yes. That is the answer. As long as there is an active reference to any Java object, then that object won't be garbage collected when it's done.
If you're creating and destroying threads and not pooling them, I think you have other issues as well.

From the Java API docs threads die when:
All threads that are not daemon threads have died, either by returning from the call to the run method or by throwing an exception that propagates beyond the run method.
Threads die when they return from their run() method. When they die they are candidates for garbage collection. You should make sure that your threads release all references to objects and exit the run() method.
I don't think that nulling references to your threads will really do the trick.
You should also check out the new threading facilities in Java 5 and up. Check the package java.util.concurrent in the API documentation here.
I also recommend that you check out the book Java Concurrency in Practice. It has been priceless for me.

There are two things that will cause a Thread to be not garbage collected.
Any thread that is still alive will not be garbage collected. A thread is alive until the run method called by Thread.start() exits, either normally or by throwing an exception. Once this happens (and the thread's uncaught exception handler has finished), the thread is dead.
Any live reference to the Thread object for a thread will prevent it from being garbage collected. The live reference could be in your code, or if you are using thread pools, they could be part of the pool data structures.
The memory of my java process keeps growing but yet the heap does not increase.
That would be because each thread has a large (e.g. 1 MB) stack segment that is not allocated in the Java heap.
A thread's stack segment is only allocated when the thread is started, and released as soon as the thread terminates. The same also applies (I think) to the thread's thread-local map. A Thread object that is not "alive" doesn't use much memory at all.
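The difference between a live thread and a merely reachable Thread object can be observed with isAlive(). The sketch below uses hypothetical names (ThreadLifecycleDemo, observe). It cannot show the native stack being released, since that is not visible from Java code, but it does show that the thread is dead after run() returns even though a reference to the Thread object still exists.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration: "alive" is about the running thread, not the object.
final class ThreadLifecycleDemo {

    // Returns {aliveWhileRunning, aliveAfterJoin}.
    static boolean[] observe() throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            started.countDown();       // signal that run() is executing
            try {
                release.await();       // stay alive until told to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        started.await();
        boolean aliveWhileRunning = worker.isAlive();
        release.countDown();           // let run() return
        worker.join();                 // wait for the thread to terminate
        boolean aliveAfterJoin = worker.isAlive(); // `worker` is still referenced here
        return new boolean[] {aliveWhileRunning, aliveAfterJoin};
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = observe();
        System.out.println("while running: " + r[0] + ", after join: " + r[1]);
    }
}
```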
So to sum it up. You appear to have lots of live threads. They won't be garbage collected as long as they are alive, and the only way to make them release their memory is to cause them to die ... somehow.
To reduce memory usage, you need to do one or more of:
Look at the thread code (the run() methods, etc) to figure out why they are still hanging around.
Reduce the size of the thread stacks. (In theory, you can go as low as 64K ...)
Redesign your app so that it doesn't create thousands of threads. (Thread pools and some kind of work queue is one possible approach.)
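For the second option, besides the JVM-wide -Xss flag, a stack size can be requested per thread through the four-argument Thread constructor. The sketch below is illustrative only: SmallStackDemo and runWithStack are made-up names, the 256 KB figure is an arbitrary assumption, and the Java documentation describes the stackSize argument as a hint that a platform may round or ignore.

```java
// Hypothetical illustration: requesting a smaller stack for a single thread.
final class SmallStackDemo {

    // Runs a trivial computation on a thread created with the requested
    // stack size (in bytes) and returns its result.
    static int runWithStack(long stackBytes) throws InterruptedException {
        int[] result = new int[1];
        Thread t = new Thread(
                null,                          // default thread group
                () -> result[0] = 6 * 7,       // shallow work needs little stack
                "small-stack-worker",
                stackBytes);                   // a hint; the JVM may adjust or ignore it
        t.start();
        t.join();                              // join() makes the write to result visible
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runWithStack(256 * 1024));
    }
}
```

Threads with deep recursion or large frames may throw StackOverflowError with a small stack, so this only suits threads whose call depth is known to be shallow.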

That is a lot of threads, each of which imposes a memory overhead, as well as other costs for managing them (context switching etc.). Use a profiler to view the thread activity - you'll likely find that most of the threads are idle most of the time.
I'd suggest the first step is to look at managing the threads using the thread pools provided by java.util.concurrent. Rather than creating threads, look to create tasks that are handed off to the pools. Tweak the pools until you have a much smaller number of threads that are kept reasonably busy. This may well resolve the memory issue; it will certainly improve performance.
