I suppose that ThreadLocal variables are allocated in Thread Local Allocation Buffers (TLABs), am I right?
I was not successful in finding any document stating what exactly makes an object of some class get stored in a TLAB. If you know of one, please post a link.
Actually, the explanation is right there in the blog post you linked to:
A Thread Local Allocation Buffer (TLAB) is a region of Eden that is used for allocation by a single thread. It enables a thread to do object allocation using thread local top and limit pointers, which is faster than doing an atomic operation on a top pointer that is shared across threads.
Every thread allocates memory from its own chunk of Eden, the "Generation 0" part of the heap. Pretty much everything is stored in the TLAB for a period of time - quite possibly your ThreadLocals, too - but they get moved away from there after a gen0 garbage collection. TLABs are there to make allocations faster, not to make the memory inaccessible from other threads. A more accessible description from the same blog you linked to is A little thread privacy, please.
No. Here is how it is:
As of Java 1.4, each Thread has a field called threadLocals where the map is kept. Each ThreadLocal has an index into that structure, so it doesn't use hashCode(). Imagine an array where each ThreadLocal keeps a slot index.
When the thread dies and there are no more references to it, the ThreadLocals are GC'd. Very simple idea.
You can implement your own ThreadLocal(s) by extending Thread and adding a field to hold the reference. Then cast the current Thread to your own class and take the data.
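A minimal sketch of that idea, with made-up names (MyThread, myLocal) purely for illustration:

    // Hypothetical Thread subclass: the per-thread data lives directly on the thread object.
    class MyThread extends Thread {
        Object myLocal;

        MyThread(Runnable target) {
            super(target);
        }
    }

    // Code running on such a thread can then reach the data without any map lookup:
    // Object data = ((MyThread) Thread.currentThread()).myLocal;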
So it's not TLAB, it's still the heap like any other object.
Historically there were implementations with a static WeakHashMap, which were very slow at accessing the data.
It is my understanding that the TLAB is used for allocating all small to medium objects. Your ThreadLocal won't be allocated any differently.
I'm pretty sure that this is up to the discretion of the JVM implementer. They could put the data in TLABs if they wanted to, or in a global table keyed by the thread ID. The Java Language Specification tends to be mute about these sorts of issues so that JVM authors can deploy Java on as many and as diverse platforms as possible.
I think only the pointer to it is, while the data itself resides in some other memory area. See http://blogs.oracle.com/jonthecollector/entry/the_real_thing and http://wikis.sun.com/display/MaxineVM/Threads#Threads-Threadlocalvariables
Related
I couldn't find a comprehensive source that explains the concept clearly. My understanding is that a thread is given some chunk of memory in Eden where it allocates new objects, and a competing thread ends up with its own, roughly contiguous chunk of Eden. What happens if the first thread runs out of free space in its TLAB? Would it request a new chunk of Eden?
The idea of a TLAB is to reduce the need for synchronization between threads. Using TLABs, this need is reduced because every thread has an area it can use and expect to be the only thread using. Assuming that a TLAB can hold 100 objects, a thread would only need to acquire a lock to claim more memory when allocating the 101st object. Without TLABs, this would be required for every object. The downside is of course that you potentially waste space.
Large objects are typically allocated outside of a TLAB as they void the advantage of reducing the frequency of synchronized memory allocation. Some objects might not even fit inside of a TLAB.
You can set the size of a TLAB using the -XX:TLABSize flag, but generally I do not recommend messing with these settings unless you have really discovered a problem that you can solve that way.
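For example, something along these lines lets you experiment with and observe TLAB behaviour on HotSpot (flag availability varies by JVM version, and MyApp is just a placeholder, so treat this as a sketch):

    # HotSpot 8-era flags; newer JDKs expose similar information via -Xlog:gc+tlab
    java -XX:TLABSize=512k -XX:+PrintTLAB MyApp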
I'm reading up on different JVM implementations, and I'm wondering why a stack-based memory management isn't more widespread (not to be confused with escape analysis). Are any of you familiar with attempts on writing JVMs with stack-based memory management?
This is just not really practical.
As soon as you have multithreading, you need to share references to objects between threads. This would mean that threads need to hold references into other threads' stacks, and those references get invalidated as soon as the method that initially created the object returns.
The heap is effectively the shared area of memory that all threads in a process can see, so any object that needs to be seen by multiple threads naturally lives there.
Another way to say this is that the stack is private to a thread, whereas the heap is shared between them.
I know that Java's Runtime object can report the JVM's memory usage. However, I need the memory usage for a particular thread. Any idea how to get this?
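For reference, what I can get so far is only the JVM-wide view, roughly like this minimal sketch using Runtime:

    // JVM-wide memory figures via Runtime - this covers the whole VM, not a single thread.
    public class MemorySnapshot {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            long used = rt.totalMemory() - rt.freeMemory();  // committed heap minus free
            System.out.println("Used heap (bytes): " + used);
            System.out.println("Max heap  (bytes): " + rt.maxMemory());
        }
    }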
I appreciate your answer!
A Thread shares everything except its stack and the CPU cycles with all other threads in the VM. All objects created by the Thread are pooled with all the other objects.
The problem is to define what the memory usage of a Thread is. Is it only those objects it created? What if these objects subsequently are referenced by other threads? Do they only count half, then? What about objects created somewhere else, but are now referenced by this Thread?
I know of no tool trying to measure the memory consumption of separate Threads.
I see only one disadvantage of this: you can get a StackOverflow :) Why not use only the heap?
In Java, C, and C++ the parameters to functions are passed on the stack. The plain variables inside function bodies are created on the stack.
As far as I know, the stack is limited per thread and has some default size, but it is relatively low: 1-8 MB.
Why not use the heap instead of the stack? Both are in memory; the OS just makes a separation, so that addresses A to B are the heap and addresses C to D are the stack.
Take variable arguments: say there are 10 variables of 4 bytes each. If you read an 11th, you may read some "memory trash" - maybe exactly what you want for hacking - or you may get a segmentation fault if the OS detects you as a bad boy. :) So security can't be a reason to use the stack.
Performance is one of many reasons: memory in the stack is trivial to book-keep; it has no holes; it can be mapped directly into the cache; it is attached on a per-thread basis.
In contrast, memory in the heap is, well, a heap of stuff; it is more difficult to book-keep; it can have holes.
Check out this answer (excellent, in my opinion) explaining some other differences.
Others have already mentioned that the stack can be faster due to simplicity of incrementing/decrementing the stack pointer. This is, however, quite a ways from the whole story.
First of all, if you're using a garbage collector that compacts the heap (i.e., most modern collectors) allocation on the heap isn't much different from allocation on the stack. You simply keep a pointer to boundary between allocated and free memory, and to allocate some space, you just move that pointer, just like you would on the stack. Objects that will have extremely short lives (like the locals in most functions) cost next to nothing in a GC cycle too. Keeping a live object accessible takes (a little) work, but an object that's no longer accessible normally involves next to no work.
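To make the "just move a pointer" idea concrete, here is a toy sketch of bump-pointer allocation (purely illustrative; no real JVM is written like this):

    // Toy bump-pointer allocator: hands out offsets into one big pre-reserved region.
    class BumpAllocator {
        private final byte[] region;
        private int top = 0;                     // boundary between allocated and free memory

        BumpAllocator(int size) {
            region = new byte[size];
        }

        // Returns the start offset of the new block, or -1 when the region is exhausted
        // (a real collector would trigger a GC or grab a new chunk at that point).
        int allocate(int size) {
            if (top + size > region.length) {
                return -1;
            }
            int start = top;
            top += size;                         // "allocation" is just moving the pointer
            return start;
        }
    }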
There is, however, often still a substantial advantage to using the stack for most variables. Many typical programs tend to run for fairly extended periods of time using nearly constant amounts of stack space. They enter one function, create some variables, use them for a while, pop them off the stack, then repeat the same cycle in another function.
This means most of the memory toward the top of the stack is almost always in the cache. Most function calls are re-using memory that was just vacated by the previous function call. By reusing the same memory continuously, you end up with considerably better cache usage.
By contrast, when you allocate items in the heap, you typically end up allocating separate space for nearly every item. Your cache is in a constant state of "churn", throwing away the memory for objects you're no longer using to make space for newly allocated ones. Unless you use a minuscule heap, the chances of re-using an address while it's still in the cache are nearly nonexistent.
I'm sure this is answered a million times online, but...
Because you don't want every method call to be a memory allocation (slow). So, you pre-allocate your stack.
Some more reasons listed here (including security).
The answer is that you get holes when you allocate and de-allocate on the heap. This means that it gets more and more difficult to allocate memory since the places that are available are different sizes. The stack only reserves what is needed and gives it all back when you get out of scope. No hassle.
If everything were on the stack, each time you passed those values on, they would have to be copied. However, unlike the heap, the stack doesn't need to be cleverly managed - items on the heap require garbage collection.
So they work in two different ways that suit two different uses. The stack is a quick and lightweight home for values to be held for a short time whereas the heap allows you to pass objects around without copying them.
Neither stack nor heap is perfect for every scenario - that is why they both exist.
Using the heap requires "requesting" a bit of memory from the heap, using new or some similar function. Then, when you're finished with it, you delete it again. This is very useful for variables that are long-lived and/or that take up quite a bit of space (or take up an "unknown at compile-time" amount of space - for example if you read a string into a variable from a file, you don't necessarily know how much space it needs, and it's REALLY annoying to get a message from the program saying "String too large on line X in file Y").
On the other hand, the stack is "free" both when it comes to allocating and de-allocating (technically, any function that uses stack space will need one extra instruction to allocate that stack space, but compared to the several hundred or thousand that a call to new will involve, it's not noticeable). Of course, class objects will still have to have their respective constructors called, which may take almost any amount of time to complete, but that is true regardless of how/where the storage is allocated from.
Does the Java virtual machine ever move objects in memory, and if so, how does it handle updating references to the moved object?
I ask because I'm exploring an idea of storing objects in a distributed fashion (i.e. across multiple servers), but I need the ability to move objects between servers for efficiency reasons. Objects need to be able to contain pointers to each other, even to objects on remote servers. I'm trying to think of the best way to update references to moved objects.
My two ideas so far are:
Maintain a reference indirection somewhere that doesn't move for the lifetime of the object, which we update if the object moves (roughly the handle-table sketch after this list). But - how are these indirections managed?
Keep a list of reverse-references with each object, so we know what has to be updated if the object is moved. Of course, this creates a performance overhead.
I'd be interested in feedback on these approaches, and any suggestions for alternative approaches.
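Regarding the first idea, here is a very rough local sketch of such an indirection. All names (HandleTable, register, resolve, moved) are invented for illustration; a distributed version would map handles to locations rather than to the objects themselves:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Callers hold a stable handle; only the table knows the object's current location.
    // Moving an object then means updating a single table entry.
    class HandleTable<T> {
        private final Map<Long, T> current = new ConcurrentHashMap<>();
        private long nextId = 0;

        synchronized long register(T obj) {
            long id = nextId++;
            current.put(id, obj);
            return id;
        }

        T resolve(long handle) {
            return current.get(handle);       // the extra hop on every access is the cost
        }

        void moved(long handle, T newLocation) {
            current.put(handle, newLocation); // handles held elsewhere stay valid after a move
        }
    }

The open question from the post remains, of course: who owns the table, and when are its entries reclaimed.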
In reference to the comment above about walking the heap.
Different GCs do it in different ways.
Typically, when copying collectors walk the heap, they don't walk all of the objects in the heap; rather, they walk the LIVE objects in the heap. The implication is that an object is live if it's reachable from a "root" object.
So, at this stage it has to touch all of the live objects anyway, as it copies them from the old heap to the new heap. Once the copy of the live objects is done, all that remains in the old heap are either objects already copied, or garbage. At that point the old heap can be discarded completely.
The two primary benefits of this kind of collector are that it compacts the heap during the copy phase, and that it only copies living objects. This is important to many systems because with this kind of collector, object allocation is dirt cheap, literally little more than incrementing a heap pointer. When GC happens, none of the "dead" objects are copied, so they don't slow the collector down. It also turns out in dynamic systems that there's a lot more little, temporary garbage than there is long-standing garbage.
Also, by walking the live object graph, you can see how the GC can "know" about every object, and keep track of them for any address adjustment purposes performed during the copy.
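A heavily simplified sketch of the forwarding-pointer idea behind that address adjustment, using made-up names and a recursive traversal rather than the breadth-first scan a real copying collector would use:

    // Each old-space object remembers where its copy went, so any reference found
    // later can be rewritten to point at the new location.
    class CopySketch {
        static class Obj {
            Obj forwardedTo;                   // set once the object has been copied
            Obj[] fields = new Obj[0];         // references this object holds
        }

        // Copy obj into the "new heap" (if not already done) and return its new address.
        static Obj copy(Obj obj) {
            if (obj == null) {
                return null;
            }
            if (obj.forwardedTo == null) {
                Obj newObj = new Obj();        // stands in for allocation in the new space
                obj.forwardedTo = newObj;      // mark before recursing so cycles terminate
                newObj.fields = new Obj[obj.fields.length];
                for (int i = 0; i < obj.fields.length; i++) {
                    newObj.fields[i] = copy(obj.fields[i]);   // fix up each reference
                }
            }
            return obj.forwardedTo;            // callers rewrite their reference to this
        }
    }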
This is not the forum to talk deeply about GC mechanics, as it's a non-trivial problem, but that's the basics of how a copying collector works.
A generational copying GC will put "older" objects in different heaps, and those end up being collected less often than "newer" heaps. The theory is that the long lasting objects get promoted to older generations and get collected less and less, improving overall GC performance.
The keyword you're after is "compacting garbage collector". JVMs are permitted to use one, meaning that objects can be relocated. Consult your JVM's manual to find out whether yours does, and to see whether there are any command-line options which affect it.
The conceptually simplest way to explain compaction is to assume that the garbage collector freezes all threads, relocates the object, searches heap and stack for all references to that object, and updates them with the new address. Actually it's more complex than that, since for performance reasons you don't want to perform a full sweep with threads stalled, so an incremental garbage collector will do work in preparation for compaction whenever it can.
If you're interested in indirect references, you could start by researching weak and soft references in Java, and also the remote references used by various RPC systems.
I'd be curious to know more about your requirements. As another answer suggests, Terracotta may be exactly what you are looking for.
There is a subtle difference however between what Terracotta provides, and what you are asking for, thus my inquiry.
The difference is that as far as you are concerned, Terracotta does not provide "remote" references to objects - in fact the whole "remote" notion of RMI, JMS, etc. is entirely absent when using Terracotta.
Rather, in Terracotta, all objects reside in a large virtual heap. Threads, whether on Node 1, Node 2, Node 3, Node 4, etc., all have access to any object in the virtual heap.
There's no special programming to learn and no special APIs; objects in the "virtual" heap have exactly the same behavior as objects in the local heap.
In short, what Terracotta provides is a programming model for multiple JVMs that operates exactly the same as the programming model for a single JVM. Threads in separate nodes simply behave like threads in a single node - object mutations, synchronized, wait, notify all behave exactly the same across nodes as across threads - there's no difference.
Furthermore, unlike any solution to come before it, object references are maintained across nodes - meaning you can use ==. It's all a part of maintaining the Java Memory Model across the cluster which is the fundamental requirement to make "regular" Java (e.g. POJOs, synchronized, wait/notify) work (none of that works if you don't / can't preserve object identity across the cluster).
So the question comes back to you to further refine your requirements - for what purpose do you need "remote" pointers?
(Practically) Any garbage collected system has to move objects around in memory to pack them more densely and avoid fragmentation problems.
What you are looking at is a very large and complex subject. I'd suggest you read up on existing remote-object-style APIs: .NET Remoting and, going further back, technologies like CORBA.
Any solution for tracking the references will be complicated by having to deal with all the failure modes that exist in distributed systems. The JVM doesn't have to worry about suddenly finding it can't see half of its heap because a network switch glitched.
When you drill into the design I think a lot of it will come down to how you want to handle different failure cases.
Response to comments:
Your question talks about storing objects in a distributed fashion, which is exactly what .NET remoting and CORBA address. Admittedly neither technology supports migration of these objects (AFAIK). But they both deal extensively with the concepts of object identity which is a critical part of any distributed object system: how do different parts of the system know which objects they are talking about.
I am not overly familiar with the details of the Java garbage collector, and I'm sure the Java and .NET garbage collectors have a lot of complexity in them to achieve maximum performance with minimum impact on the application.
However, the basic idea for garbage collection is:
The VM stops all threads from running managed code
It performs a reachability analysis from the set of known 'roots': static variables and local variables on all the threads. For each object it finds, it follows all references within the object (see the sketch after this list).
Any object not identified by the reachability analysis is garbage.
Objects that are still alive can then be moved down in memory to pack them densely. This means that any references to these objects also have to be updated with the new address. By controlling when a garbage collection can occur, the VM is able to guarantee that there are no object references 'in the air' (i.e. being held in a machine register) that would cause a problem.
Once the process is complete the VM starts the threads executing again.
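The reachability analysis above is essentially a graph traversal from the roots. A toy sketch, with all names invented for illustration:

    import java.util.ArrayDeque;
    import java.util.Collections;
    import java.util.Deque;
    import java.util.IdentityHashMap;
    import java.util.List;
    import java.util.Set;

    // Toy mark phase: everything reachable from the roots is live, the rest is garbage.
    class MarkSketch {
        static class Obj {
            List<Obj> references = Collections.emptyList();
        }

        static Set<Obj> findLive(List<Obj> roots) {
            Set<Obj> live = Collections.newSetFromMap(new IdentityHashMap<>());
            Deque<Obj> pending = new ArrayDeque<>(roots);
            while (!pending.isEmpty()) {
                Obj obj = pending.pop();
                if (live.add(obj)) {               // first time we see this object: mark it live
                    for (Obj ref : obj.references) {
                        if (ref != null) {
                            pending.push(ref);     // and follow every reference it holds
                        }
                    }
                }
            }
            return live;                           // anything not in this set is collectable
        }
    }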
As a refinement of this process the VM can perform generational garbage collection, where separate heaps are maintained based on the 'age' of an object. Objects start in heap 0 and, if they survive several GCs, they migrate to heap 1 and eventually to heap 2 (and so on - .NET supports 3 generations only, though). The advantage of this is that the GC can run heap 0 collections very frequently, and not have to worry about doing the work to prove the long-lived objects (which have ended up in heap 2) are still alive (which they almost certainly are).
There are other refinements to support concurrent garbage collection, and details around threads that are actually executing unmanaged code when the GC is scheduled that add a lot more complexity to this area.
Sounds like you are looking for a distributed cache, something like Terracotta or Oracle's Java object cache (formerly Tangosol).
If you are willing to go that deep down, you can take a look to JBoss Cache architecture docs and grab some of its source code as reference.
This is not exactly what you described, but it works very similarly.
Here's the link.
http://www.jboss.org/jbosscache/
I hope this helps.