I am working on a project involving optimizing energy consumption within a system. Part of that project consists of allocating RAM based on locality, that is, allocating memory segments for a program as close to each other as possible. Is there a way to find out exactly where the memory I allocate is located (which memory chips), and is it possible to force allocation in a deterministic manner? I am interested in both Windows and Linux. The project will be implemented in Java and .NET, so I am also interested in managed APIs to achieve this.
[I am aware that this might not translate into direct energy consumption reduction but the project is supposed to be a proof of concept.]
You're working at the wrong level of abstraction.
Java (and presumably .NET) refers to objects using handles, rather than raw pointers. The underlying Java VM can move objects around in virtual memory at any time; the Java application doesn't see any difference.
Win32 and Linux applications (such as the Java VM) refer to memory using virtual addresses. There is a mapping from virtual address to a physical address on a RAM chip. The kernel can change this mapping at any time (e.g. if the data gets paged to disk then read back into a different memory location) and applications don't see any difference.
So if you're using Java and .NET, I wouldn't change your Java/.NET application to achieve this. Instead, I would change the underlying Linux kernel, or possibly the Java VM.
For a prototype, one approach might be to boot Linux with the mem= parameter to restrict the kernel's memory usage to less than the amount of memory you have, then look at whether you can mmap the spare memory (maybe by mapping /dev/mem as root?). You could then change all calls to malloc() in the Java VM to use your own special memory allocator, which allocates from that free space.
For a real implementation of this, you should do it by changing the kernel and keeping userspace compatibility. Look at the work that's been done on memory hotplug in Linux, e.g. http://lhms.sourceforge.net/
If you want to try this in a language with a big runtime, you'd have to tweak the implementation of that runtime or write a DLL/shared object to do all the memory management for your sample application, at which point the overall system behaviour is unlikely to be much like the usual operation of those runtimes.
The simplest, cleanest test environment to detect the (probably small) advantages of locality of reference would be in C++ using custom allocators. This environment will remove several potential causes of noise in the runtime data (mainly the garbage collection). You will also lose any power overhead associated with starting the CLR/JVM or maintaining its operating state - which would presumably also be welcome in a project to minimise power consumption. You will naturally want to give the test app a processor core to itself to eliminate thread switching noise.
Writing a custom allocator to give you one of the preallocated chunks on your current page shouldn't be too tough, but given that locality of reference in C/C++ is ordinarily achieved by just using the stack, it seems unlikely there is an existing allocator you can simply find, download and use.
In C/C++, if you cast a pointer to an integer type, this tells you the address. However, under Windows and Linux, this is a virtual address -- the operating system determines the mapping to physical memory, and the memory management unit in the processor carries it out.
So, if you care where your data is in physical memory, you'll have to ask the OS. If you just care whether your data is in the same MMU block, then check the OS documentation to see what size blocks it's using (4KB is usual for x86, but I hear kids these days are playing around with much larger "huge" pages).
Java and .NET add a third layer to the mix, though I'm afraid I can't help you there.
Is pre-allocating in bigger chunks (than needed) an option at all? Will it defeat the original purpose?
I think that if you want such tight control over memory allocation you are better off using a compiled language such as C; the JVM isolates the actual implementation of the language from the hardware, chip selection for data storage included.
The approach requires specialized hardware. In ordinary memory sticks and slot arrangements, the layout is designed to dissipate heat as evenly per chip as possible, for example by placing one bit of every bus word on each physical chip.
This is an interesting topic, although I think it is waaaaaaay beyond the capabilities of managed languages such as Java or .NET. One of the major principles of those languages is that you don't have to manage the memory, and consequently they abstract that away for you. C/C++ gives you better control in terms of actually allocating memory, but even in that case, as referenced previously, the operating system can do some hand waving and indirection with memory allocation, making it difficult to determine how things are allocated together. You also make reference to the actual chips; that's even harder, and I would imagine it would be hardware-dependent.
I would seriously consider using a prototyping board where you can code at the assembly level and control every memory allocation explicitly, without any interference from compiler optimizations or operating system security practices. That would give you the most meaningful results, as it would give you the ability to control every aspect of the program and determine definitively that any power consumption improvements are due to your algorithm rather than some invisible optimization performed by the compiler or operating system. I imagine this is some sort of research project (very intriguing), so spending ~$100 on a prototyping board would definitely be worth it in my opinion.
In .NET there is a COM interface exposed for profiling .NET applications that can give you detailed address information. I think you will need to combine this with some calls to the OS to translate virtual addresses though.
As zztop alluded to, the .NET CLR compacts memory every time a garbage collection is done. Large objects, however, are not compacted; these live on the large object heap, which can consist of many segments scattered around from OS calls to VirtualAlloc.
Here are a couple links on the profiling APIs:
http://msdn.microsoft.com/en-us/magazine/cc300553.aspx
David Broman's CLR Profiling API Blog
Related
In the spirit of question Java: Why does MaxPermSize exist?, I'd like to ask why the Oracle JVM uses a fixed upper limit for the size of its memory allocation pool.
The default is 1/4 of your physical RAM (with upper and lower limits); as a consequence, if you have a memory-hungry application you have to manually change the limit (parameter -Xmx), or your app will perform poorly, possibly even crash with an OutOfMemoryError.
Why does this fixed limit even exist? Why does the JVM not allocate memory as needed, like native programs do on most operating systems?
This would solve a whole class of common problems with Java software (just Google to see how many hints there are on the net on solving problems by setting -Xmx).
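For reference, a quick way to see which limit a given JVM actually picked (whether from -Xmx or from the 1/4-of-RAM default) is to ask the Runtime. A minimal sketch:

    public class ShowHeapLimit {
        public static void main(String[] args) {
            // Reports the effective heap ceiling, i.e. the -Xmx value or the ergonomic default.
            long maxBytes = Runtime.getRuntime().maxMemory();
            System.out.printf("Max heap: %d MB%n", maxBytes / (1024 * 1024));
        }
    }

Running it with and without an explicit -Xmx shows how the limit changes.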
Edit:
Some answers point out that this will protect the rest of the system from a Java program with a run-away memory leak; without the limit this would bring the whole system down by exhausting all memory. This is true. However, it is equally true for any other program, and modern OSes already let you limit the maximum memory for a program (Linux ulimit, Windows "Job Objects"). So this does not really answer the question, which is "Why does the JVM do it differently from most other programs / runtime environments?".
Why does this fixed limit even exist? Why does the JVM not allocate memory as needed, like native programs do on most operating systems?
The reason is NOT that the GC needs to know beforehand what the maximum heap size can be. The JVM is clearly capable of expanding its heap ... up to the maximum ... and I'm sure it would be a relatively small change to remove that maximum. (After all, other Java implementations do this.) And it would equally be possible to have a simple way to say "use as much memory as you like" to the JVM.
I'm sure that the real reason is to protect the host operating system against the effects of faulty Java applications using all available memory. Running with an unbounded heap is potentially dangerous.
Basically, many operating systems (e.g. Windows, Linux) suffer serious performance degradation if some application tries to use all available memory. On Linux for example, the system may thrash badly, resulting in everything on the system running incredibly slowly. In the worst case, the system won't be able to start new processes, and existing processes may start crashing when the operating system refuses their (legitimate) requests for more memory. Often, the only option is to reboot.
If the JVM ran with an unbounded heap by default, any time someone ran a Java program with a storage leak ... or that simply tried to use too much memory ... they would risk bringing down the entire operating system.
In summary, having a default heap bound is a good thing because:
it protects the health of your system,
it encourages developers / users to think about memory usage by "hungry" applications, and
it potentially allows GC optimizations. (As suggested by other answers: it is plausible, but I cannot confirm this.)
EDIT
In response to the comments:
It doesn't really matter why Sun's JVMs live within a bounded heap while other applications don't. They do, and the advantages of doing so are (IMO) clear. Perhaps a more interesting question is why other managed languages don't put a bound on their heaps by default.
The -Xmx and ulimit approaches are qualitatively different. In the former case, the JVM has full knowledge of the limits it is running under and gets a chance to manage its memory usage accordingly. In the latter case, the first thing a typical C application knows about it is when a malloc call fails. The typical response is to exit with an error code (if the program checks the malloc result), or die with a segmentation fault. OK, a C application could in theory keep track of how much memory it has used, and try to respond to an impending memory crisis. But it would be hard work.
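To illustrate the first point, here is a minimal sketch (the class name and the 90% threshold are my own choices, not from the question) of how a Java application can use its knowledge of the -Xmx limit to react before an allocation actually fails:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryUsage;

    public class HeapPressureCheck {
        // Returns true once heap usage crosses an arbitrary 90% of the -Xmx bound,
        // giving the application a chance to shed caches before an OutOfMemoryError.
        static boolean underMemoryPressure() {
            MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
            return heap.getMax() > 0 && heap.getUsed() > 0.9 * heap.getMax();
        }

        public static void main(String[] args) {
            System.out.println("Under memory pressure: " + underMemoryPressure());
        }
    }

A C program has no comparable built-in view of its own limit; it typically only finds out when malloc returns NULL.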
The other thing that is different about Java and C/C++ applications is that the former tend to be both more complicated and longer running. In practice, this means that Java applications are more likely to suffer from slow leaks. In the C/C++ case, the fact that memory management is harder means that developers don't attempt to build single applications of that complexity. Rather, they are more likely to build (say) a complex service by having a listener process fork off child processes to do stuff ... and then exit. This naturally mitigates the effect of memory leaks in the child process.
The idea of a JVM responding "adaptively" to requests from the OS to give memory back is interesting. But there is a BIG problem. In order to give a segment of memory back, the JVM first has to clear out any reachable objects in the segment. Typically that means running the garbage collector. But running the garbage collector is the last thing you want to do if the system is in a memory crisis ... because it is pretty much guaranteed to generate a burst of virtual memory paging.
Hm, I'll try summarizing the answers so far.
There is no technical reason why the JVM needs to have a hard limit for its heap size. It could have been implemented without one, and in fact many other dynamic languages do not have this.
Therefore, giving the JVM a heap size limit was simply a design decision by the implementors. Second-guessing why this was done is a bit difficult, and there may not be a single reason. The most likely reason is that it helps protect a system from a Java program with a memory leak, which might otherwise exhaust all RAM and cause other apps to crash or the system to thrash.
Sun could have omitted the feature and simply told people to use the OS-native resource limiting mechanisms, but they probably wanted to always have a limit, so they implemented it themselves.
At any rate, the JVM needs to be aware of any such limit (to adapt its GC strategy), so using an OS-native mechanism would not have saved much programming effort.
Also, there is one reason why such a built-in limit is more important for the JVM than for a "normal" program without GC (such as a C/C++ program):
Unlike a program with manual memory management, a program using GC does not really have a well-defined memory requirement, even with fixed input data. It only has a minimum requirement, i.e. the sum of the sizes of all objects that are actually live (reachable) at a given point in time. However, in practice a program will need additional memory to hold dead, but not yet GCed objects, because the GC cannot collect every object right away, as that would cause too much GC overhead. So GC only kicks in from time to time, and therefore some "breathing room" is required on the heap, where dead objects can await the GC.
This means that the memory required for a program using GC is really a compromise between saving memory and having good throughput (by letting the GC run less often). So in some cases it may make sense to set the heap limit lower than what the JVM would use if it could, to save RAM at the expense of performance. To do this, there needs to be a way to set a heap limit.
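A rough way to see that "breathing room" in practice is to compare heap usage before and after forcing a collection. The sizes below are arbitrary and the numbers are only illustrative:

    import java.util.ArrayList;
    import java.util.List;

    public class BreathingRoom {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            List<byte[]> junk = new ArrayList<>();
            for (int i = 0; i < 100_000; i++) {
                junk.add(new byte[256]);          // roughly 25 MB of reachable data
            }
            junk.clear();                          // it is all garbage now, but not yet collected
            long usedBefore = rt.totalMemory() - rt.freeMemory();
            System.gc();                           // only a hint, but it usually triggers a full collection
            long usedAfter = rt.totalMemory() - rt.freeMemory();
            System.out.printf("used with dead objects: %d KB, after GC (~live set): %d KB%n",
                    usedBefore / 1024, usedAfter / 1024);
        }
    }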
I think part of it has to do with the implementation of the Garbage Collector (GC). The GC is typically lazy, meaning it will only start really trying to reclaim memory internally when the heap is at its maximum size. If you didn't set an upper limit, the runtime would happily continue to inflate until it used every available bit of memory on your system.
That's because from the application's perspective, it's more performant to take more resources than to exert effort to use the resources you already have to full utilization. This tends to make sense for a lot of (if not most) uses of Java, which are server settings where the application is literally the only thing that matters on the server. It tends to be less ideal when you're trying to implement a client in Java, which will run amongst dozens of other applications at the same time.
Remember that with native programs, the programmer typically requests but also explicitly cleans up resources. That isn't typically true with environments that do automatic memory management.
It is due to the design of the JVM. Other JVMs (like the one from Microsoft and some IBM ones) can use all the memory available in the system if needed, without an arbitrary limit.
I believe it allows for GC-optimizations.
I think that the upper limit for memory is linked to the fact that the JVM is a VM.
Just as any physical machine has a given (fixed) amount of RAM, so the VM has one.
The maximal size makes the JVM easier to manage by the operating system and ensures some performance gains (less swapping).
Sun's JVM also runs on quite limited hardware architectures (embedded ARM systems), and there the management of resources is crucial.
One answer that no-one above gave is that the JVM uses both heap and non-heap memory pools. Putting an upper limit on the heap defines not only how much memory is available for the heap memory pools, but it also defines how much memory is available for NON-HEAP usages. I suppose that the JVM could just allocate non-heap at the top of virtual memory and heap at the bottom of virtual memory and grow both toward each other.
Non-heap memory includes the DLLs or SOs that comprise the JVM and any native code being used as well as compiled Java code, thread stacks, native objects, PermGen (meta-data about compiled classes), among other uses. I've seen Java programs crash because so much memory was given to the heap that the application ran out of non-heap memory. This is where I learned that it can be important to reserve memory for non-heap usages by not setting the heap to be too large.
This makes a much bigger difference in a 32-bit world where an application often has only 2GB of virtual address space than it does in a 64-bit world, of course.
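If you want to see how much memory those non-heap areas take in your own JVM, the standard management beans report it. A small sketch (the pool names vary between JVM versions, e.g. PermGen vs Metaspace):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;

    public class NonHeapReport {
        public static void main(String[] args) {
            // Lists every heap and non-heap pool (code cache, PermGen/Metaspace, ...) with its current usage.
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                System.out.printf("%-30s %-9s used = %d KB%n",
                        pool.getName(), pool.getType(), pool.getUsage().getUsed() / 1024);
            }
        }
    }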
Would it not make more sense to separate the upper bound that triggers GC from the maximum that can be allocated? Once the memory allocated hits the upper bound, GC can kick in and release some memory to the free pool.
Sort of like how I clean the desk that I share with my co-worker. I have a large desk, and my threshold of how much junk I can tolerate on the table is much less than the size of my desk. I don't need to fill up every available inch before I garbage collect.
I could also return some of the desk space that I am using to my co-worker, who is sharing my desk... I understand JVMs don't return memory back to the system after they've allocated it to themselves, but it does not have to be that way, no?
It does allocate memory as needed, up to -Xmx ;)
One reason I can think of is that once the JVM allocates an amount of memory for its heap, it will never let it go. So if your heap has no upper bound, the JVM may just grab all the free memory on the system and then never let it go.
The upper bound also tells the JVM when it needs to do a full garbage collection. If your app is still under the upper bound, the JVM will postpone garbage collection and let the memory footprint of your application grow.
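If you want to observe the footprint approaching that upper bound rather than waiting for a full GC, the JVM can notify you when a heap pool crosses a usage threshold. A sketch (the 80% figure is my own choice):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryNotificationInfo;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryType;
    import javax.management.NotificationEmitter;

    public class HeapGrowthWatch {
        public static void main(String[] args) {
            // Ask each heap pool that supports it to flag usage above 80% of its maximum.
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                if (pool.getType() == MemoryType.HEAP && pool.isUsageThresholdSupported()
                        && pool.getUsage().getMax() > 0) {
                    pool.setUsageThreshold((long) (pool.getUsage().getMax() * 0.8));
                }
            }
            // The MemoryMXBean emits a notification when a threshold is exceeded.
            NotificationEmitter emitter = (NotificationEmitter) ManagementFactory.getMemoryMXBean();
            emitter.addNotificationListener((notification, handback) -> {
                if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED.equals(notification.getType())) {
                    System.out.println("Heap is filling up: " + notification.getMessage());
                }
            }, null, null);
            // ... application work would go here ...
        }
    }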
Native programs can die due to out-of-memory errors as well, since native applications also have a memory limit: the memory available on the system, minus the memory already held by other applications.
The JVM also needs a contiguous block of system memory in order for garbage collection to be performed efficiently.
EDIT
Contiguous memory claim or here
The JVM will apparently let some memory go, but it is rare with the default configuration.
Is there a way to share the core library between Java processes (or another way to minimize the JVM's initial memory impact)?
So here's my case. I'm playing with microservices, and I'm running quite a lot of them. I'm setting their heap to 128M, as that's enough for them. But I've noticed that the Linux process consumes much more.
If I understand correctly from here
Max memory = [-Xmx] + [-XX:MaxPermSize] + number_of_threads * [-Xss]
although I am using Java 8, so perm size is probably no longer the issue? Or is it?
There is an initial "core" JVM memory footprint... and I was wondering if you have heard of a way to somehow share that "core" memory between processes (as it's really the same), or any other way to deal with that extra cost when running many Java processes.
Conceptually you're asking if you can fork a JVM - since forking (generally) uses copy-on-write memory semantics this can be an effective space-saving measure. Unfortunately as discussed in this answer forking the JVM is not supported, and generally not practical. Non-Unix systems cannot fork efficiently, and there are numerous other side-effects that a forked JVM would have to resolve in messy ways. Theoretically you could probably fork a JVM process, but you'd be walking squarely into "undefined behavior" territory.
The "right" way to avoid JVM startup costs is to reduce the number of JVMs you need to start up in the first place. Java is a highly-concurrent language that supports shared access to common memory out of the box via its threading model. If you can refactor your code to run concurrently in the same JVM you'll see much better performance.
I'm just starting to learn programming. And as of now, I know a tad bit of memory management in Objective-C. It wasn't easy learning it.
So, just out of curiosity, is the memory management employed in major languages like C, C++, Java, etc., in any way similar to what I've learned?
Memory management comes in two distinct flavours: unmanaged and managed.
Unmanaged is C/C++ where the programmer is responsible for memory allocation.
Managed is like Java/.Net, where memory is allocated for you but cleaned up by the virtual machine ("garbage collected").
Within those two flavours you will find many variations.
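As a tiny Java illustration of the managed flavour: you allocate, drop the reference, and the runtime (not the programmer) reclaims the memory at some later point.

    public class ManagedFlavour {
        public static void main(String[] args) {
            byte[] buffer = new byte[1024 * 1024]; // allocated on the garbage-collected heap
            buffer = null;                         // no free()/delete; the object is now unreachable
            // The garbage collector reclaims it whenever it next runs.
        }
    }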
No, it can vary significantly between platforms - and even within the same platform, there can be various different options. (e.g. in C++, you can use auto pointers, a Boehm GC, etc.)
Java and .NET have mostly similar memory management, mind you.
No, it is different.
In Java and .NET languages, there is the concept of Automatic Memory Management which involves Garbage Collectors. The implementation of Garbage Collectors again varies from language to language and platform to platform.
C/C++ do not have automatic memory management, and it is the programmer's responsibility to manage memory themselves.
In short, it is different for different languages.
I don't know how memory is managed in Objective-C, but C and C++ use manual memory management, whereas Java has garbage collection built in and doesn't allow manual memory management. So they are very different.
Memory management approaches vary wildly across languages and platforms, not only in the level of programmer visibility and control, but also in implementation.
Even so, the basics of allocating and deallocating memory are roughly the same when you get down to the OS level. There are certainly differences, tweaks, and optimizations, but usually the programmer doesn't have to deal with such details.
Objective-C is an interesting hybrid, since version 2.0 of the language added opt-in garbage collection, but retains the ability to use reference counting (retain/release/autorelease) as well. In fact, the same code can run in either mode depending on compilation flags and the settings of other code loaded in the same process. This is atypical for programming languages — usually you get either managed (automatic) or unmanaged (manual) based on the code you write, and sometimes the language/platform doesn't provide a way for you to choose at all (e.g. Java).
One flavor is not necessarily better than the other, and there are still occasional religious debates about whether "real programmers [don't] use garbage collection", but don't worry excessively about it. General knowledge of how the various memory management approaches work never hurt anyone, and it is generally sufficient to understand the approach for the language(s) in which you code.
G'day,
The only thing you could say that is similar about memory management in various languages is the objectives. Namely:
provide a fixed chunk of memory for a process to play in,
protect memory outside of this chunk from being accessed by the process,
provide a dynamic mechanism of allocation for variables/objects/functions/etc. within the process,
ensure that the memory allocations for these items are done on sensible boundaries (sensible from the processor's point of view),
provide a mechanism to free memory as required,
clean up (garbage collect) unused objects,
coalesce fragmented memory into contiguous pools of occupied memory,
etc.
Various languages and runtime environments provide mechanisms to implement at least some of these features.
HTH
cheers,
Why does the JVM require around 10 MB of memory for a simple hello world, but the CLR doesn't? What is the trade-off here, i.e. what does the JVM gain by doing this?
Let me clarify a bit, because I'm not conveying the question that is in my head. There is clearly an architectural difference between the JVM and CLR runtimes. The JVM has a significantly higher memory footprint than the CLR. I'm assuming there is some benefit to this overhead, otherwise why would it exist? I'm asking what the trade-offs are in these two designs. What benefit does the JVM gain from its memory overhead?
I guess one reason is that Java has to do everything itself (another aspect of platform independence). For instance, Swing draws its own components from scratch; it doesn't rely on the OS to draw them. That all has to take place in memory. Lots of stuff that Windows may do, but Linux does not (or does differently), has to be fully contained in Java so that it works the same on both.
Java also always insists that its entire library is "linked" and available. Since it doesn't use DLLs (they wouldn't be available on every platform), everything has to be loaded and tracked by Java.
Java even does a lot of its own floating point, since FPUs often give different results, which has been deemed unacceptable.
So if you think about all the stuff C# can delegate to the OS it's tied to, versus all the stuff Java has to do itself to compensate for differences between operating systems, the difference should be expected.
I've run Java apps on two embedded platforms now. One was a spectrum analyzer where it actually drew the traces; the other is set-top cable boxes.
In both cases, this minimum memory footprint hasn't been an issue. There HAVE been Java-specific issues; that just hasn't been one. The number of objects instantiated and Swing painting speed were bigger issues in these cases.
I don't know if the initial memory footprint, or the footprint of a Hello World application, is important. A difference might be due to the number and sizes of the libraries that are loaded by the JVM / CLR. There can also be an amount of memory that is preallocated for garbage collection pools.
Every application that I know of uses a lot more than Hello World functionality. An application will load and free memory thousands of times throughout its execution. If you are interested in the memory utilization differences of the JVM vs the CLR, here are a couple of links with good information:
http://benpryor.com/blog/2006/05/04/jvm-vs-clr-memory-allocation/
Memory Management Case study (JVM & CLR)
The Memory Management Case study is in PowerPoint. A very interesting presentation.
Seems like java is just using more virtual memory.
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
amwise 20598 0.0 0.5 22052 5624 pts/3 Sl+ 14:59 0:00 mono Test.exe
amwise 20601 0.0 0.7 214312 7284 pts/2 Sl+ 15:00 0:00 java Program
I made a test program in C# and one in Java that print the string "test" and wait for input. I believe that the resident set size (RSS) value more accurately shows the memory usage. The virtual memory usage (VSZ) is less meaningful.
As I understand it applications can reserve a ton of virtual memory without actually using any real memory. For example you can ask the VirtualAlloc function on Windows to either reserve or commit virtual memory.
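For reference, the Java side of that test was essentially the following (the C# version is analogous):

    import java.io.IOException;

    public class Program {
        public static void main(String[] args) throws IOException {
            System.out.println("test");  // print the string
            System.in.read();            // wait for input so the process can be measured
        }
    }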
EDIT:
Here is a pretty picture from my Windows box:
http://awise.us/images/mem.png
Each app was a simple printf followed by a getchar.
Lots of virtual memory usage by Java and the CLR. The C version depends on just about nothing, so its memory usage is relatively tiny.
I doubt it really matters either way. Just pick whichever platform you are more familiar with and then don't write terrible, memory-wasting code. I'm sure it will work out.
EDIT:
This VMMap tool from Microsoft might be useful in figuring out where memory is going.
The JVM counts all its shared libraries whether they use memory or not.
Task manager is rather unreliable when it comes to reporting the memory consumption of programs. You should take it as a guide.
The JVM loads lots of unnecessary core classes from rt.jar on each run. Unfortunately, the cross-dependencies between Java packages (java.lang <-> java.io) make it hard to do a partial runtime init. Not to mention that rt.jar itself is over 40 MB and needs lots of time for lookup and decompression.
Java 6u10 and later seem to load things a bit smarter (there is jqs.exe, the Java Quick Starter service, to keep necessary data in memory and do a faster startup), and Java 7 is said to be better still.
Process Explorer on Windows reports the Private Bytes correctly (private bytes are those memory regions which are not shared by any DLL).
A slightly bigger annoyance is that after 10 years, the JVM still defaults to 64 MB memory usage. It is really annoying to have to use -Xmx almost every time and not be able to run demanding programs in jars with a simple double click (unless I alter the command in the file extension association).
The CLR is counted as part of the OS, so the task manager doesn't report its memory consumption under the application process.
I'm exploring the possibility of running a Java app on a machine with very large amounts of RAM (anywhere from 300GB to 15TB, probably on an SGI Altix 4700 machine), and I'm curious as to how Java's GC is likely to perform in this scenario.
I've heard that IBM's or JRockit's JVMs may be better suited to this than Sun's. Does anyone know of any research or data on JVM performance in this situation?
On the Sun JVM, you can use the option -XX:+UseConcMarkSweepGC to turn on the Concurrent Mark Sweep (CMS) collector, which avoids the "stop the world" phases of the default GC algorithm almost completely, at the cost of a little more overhead.
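If you go this route, you can confirm from inside the application which collectors the JVM actually ended up with; a small sketch using the standard management beans:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class ShowCollectors {
        public static void main(String[] args) {
            // With -XX:+UseConcMarkSweepGC you would expect to see e.g. "ParNew" and "ConcurrentMarkSweep".
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%s: %d collections, %d ms total%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
        }
    }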
The advice to use more than one VM on such a machine is IMHO outdated.
In real-world applications you often have enough shared data that the performance with the CMS collector and one JVM is better.
The question is: do you want to run within a single process (JVM) or not? If you do, then you're going to have a problem. Refer to Tuning Java Virtual Machines, the Oracle Coherence User Guide and similar documentation. The rule of thumb I've operated by is to try and avoid heaps larger than 1 GB. Whereas a full GC of a 512 MB-1 GB heap might take less than a second, a full GC of a 2-4 GB heap could potentially take 5 seconds or longer. Obviously this depends on many factors, but the moral of the story is that GC overhead does not scale linearly, and once you get into the one-second range performance degrades rapidly.
Sun's JVM allows you to configure and optimize the heck out of garbage collection, but it's a science unto itself:
http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html
You might have to do some reading and research, but for that kind of machine, GC settings optimized for the machine and application probably make a big difference.
Since 5.0 the HotSpot JVM uses a concept known as Ergonomics to try to optimise memory usage. This is based on more than just the sheer amount of memory available, and affects heap sizes, generation sizes and garbage collection algorithms.
Start by having a read of this, which explains Ergonomics and more:
https://www.oracle.com/technetwork/java/javase/memorymanagement-whitepaper-150215.pdf
There's also a guy called Brian Goetz who has written numerous articles about how Java allocates and uses memory, all of which (and more) can be found here:
http://www.briangoetz.com/pubs.html
This is not at all answering your question, but if you plan to deploy a huge Java app you might be interested in looking into Azul Systems appliances. They claim to be able to garbage-collect without pausing the application, with heaps up to 670 GB.
You might want to consider running a virtual Terracotta cluster on this machine.
The only people who can really tell you are SGI. Supercomputers don't behave like regular servers, only bigger.
However, I have found that Java performs best when memory is local to the processors accessing it. Note that the GC needs to be able to walk the whole memory end to end, which means it doesn't scale well if you have a design that is essentially lots of computers stuck together, which may be the case here. The memory module size is 32 GB, so you may get better performance if you limit your JVM to fit comfortably within this size.
The accepted answer for this post is rather old and is now outdated. As of September 2014, if you are using Java 7, you should probably switch to the G1 collector. From the Java 7 update 4 release notes:
http://www.oracle.com/technetwork/java/javase/7u4-relnotes-1575007.html
"The G1 collector is targeted for applications that fully utilize the large amount of memory available in today's multiprocessor servers, while still keeping garbage collection latencies under control. Applications that require a large heap, have a big active data set, have bursty or non-uniform workloads or suffer from long Garbage Collection induced latencies should benefit from switching to G1."
Surely the answer as to how the GC's going to perform is "who cares?" ;-)