I am studying Java, and I remember reading somewhere that Java objects carry some overhead inside the JVM, used for administrative purposes by the virtual machine. So my question is: can someone tell me whether, and how, I can get an object's total size in the HotSpot JVM, along with any overhead it may come with?
You can't get the overhead directly. The amount of overhead is implementation dependent, and can vary based on a number of factors (e.g. the precise JVM version, and whether you are on a 32-bit or 64-bit JVM).
However it is reasonably safe to assume that in typical modern JVM implementations like HotSpot, the overhead per object is between 8 and 16 bytes. Arrays typically have an overhead that is 4 bytes larger than other objects (to contain the integer array length).
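To make that arithmetic concrete, here is a back-of-the-envelope sketch. The 16-byte header, the extra 4-byte array length slot, and 8-byte alignment are assumptions about a typical 64-bit HotSpot, not guarantees:

```java
// Back-of-the-envelope object size estimator. The header size (16 bytes,
// 64-bit HotSpot without compressed pointers) and the 8-byte alignment are
// assumptions; real JVMs vary, so treat the results as rough estimates.
public class ObjectSizeEstimate {
    static final int HEADER = 16;               // assumed object header
    static final int ARRAY_HEADER = HEADER + 4; // assumed: header + int length

    // Round a raw byte count up to the assumed 8-byte alignment boundary.
    static int align8(int rawSize) {
        return (rawSize + 7) & ~7;
    }

    // Estimated size of an object carrying the given number of field bytes.
    static int estimate(int fieldBytes) {
        return align8(HEADER + fieldBytes);
    }

    public static void main(String[] args) {
        System.out.println(estimate(0));                   // object with no fields
        System.out.println(estimate(4));                   // one int field
        System.out.println(align8(ARRAY_HEADER + 10 * 4)); // an int[10]
    }
}
```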
See also:
In Java, what is the best way to determine the size of an object?
Memory usage of Java objects: general guide
I found this article rather informative, although I had doubts about some of the values mentioned in its tables.
Here is a snippet for the object header, object overhead, array header, and object reference sizes. Hope it helps someone, if not the OP, as this is quite an old question.
private static int OBJ_HEADER;
private static int ARR_HEADER;
private static int INT_FIELDS = 12;
private static int OBJ_REF;
private static int OBJ_OVERHEAD;
private static boolean IS_64_BIT_JVM;

static {
    // sun.arch.data.model is "32" or "64" on HotSpot; assume 64-bit when unknown
    String arch = System.getProperty("sun.arch.data.model");
    IS_64_BIT_JVM = (arch == null) || !arch.contains("32");
    OBJ_HEADER = IS_64_BIT_JVM ? 16 : 8;
    ARR_HEADER = IS_64_BIT_JVM ? 24 : 12;
    OBJ_REF = IS_64_BIT_JVM ? 8 : 4;
    OBJ_OVERHEAD = OBJ_HEADER + INT_FIELDS + OBJ_REF + ARR_HEADER;
}
I should say that I only know the solution works; I haven't yet figured out why. This is why people should leave comments in their code... Well, when I do figure it out, I will share the logic behind it.
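For what it's worth, my reading (an educated guess, not documented anywhere) is that the constants describe a pre-Java-7 String: OBJ_HEADER for the String object itself, INT_FIELDS = 12 for its three int fields (offset, count, hash), OBJ_REF for the reference to the backing char[], and ARR_HEADER for that array. Under that assumption, the constants would be used like this:

```java
// Hedged sketch of how the snippet's constants could estimate a String's
// memory footprint. Assumes the pre-Java-7 String layout (char[] value,
// int offset, int count, int hash) on a 64-bit JVM; not a guarantee.
public class StringMemoryEstimate {
    // 64-bit values, as computed by the static block in the snippet above
    static final int OBJ_HEADER = 16;
    static final int ARR_HEADER = 24;
    static final int INT_FIELDS = 12; // offset + count + hash
    static final int OBJ_REF = 8;     // reference to the char[]
    static final int OBJ_OVERHEAD = OBJ_HEADER + INT_FIELDS + OBJ_REF + ARR_HEADER;

    // Estimated bytes for a String of the given length, rounded up to 8.
    static long estimatedBytes(int length) {
        return 8L * ((2L * length + OBJ_OVERHEAD + 7) / 8);
    }

    public static void main(String[] args) {
        System.out.println(estimatedBytes(0));  // empty String
        System.out.println(estimatedBytes(10)); // 10 characters
    }
}
```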
Related
I have a class which I want to create tens of thousands to 100 thousand instances of. Therefore I don't want to waste memory unnecessarily.
Only in a few (<100) of them do I need 2 special variables. If I just declare them, but don't initialise them, do they need the same memory as if I had initialised them?
And if yes, do I have another option (besides moving them into a class of their own) to reduce memory usage?
This is my code example (name and propability are only needed a few times):
public class Neuron {
    private String name;
    private float propability;
    private float connection[];
    private float bias;

    public Neuron(float[] connection, float bias) {
        this.connection = connection;
        this.bias = bias;
    }

    public Neuron(float propability, String name, float[] connection, float bias) {
        this.name = name;
        this.propability = propability;
        this.connection = connection;
        this.bias = bias;
    }

    // more code
}
I have to disagree a bit:
private float connection[];
private float bias;
The first one (the array) is a reference type. In other words: a (potential) pointer to some memory area. Obviously: as long as that pointer points to null ("nowhere"), no extra memory is required.
But make no mistake, your object itself needs to fit into memory. Meaning: when you instantiate a new Neuron object, then the JVM requests exactly that amount of memory it needs to store a Neuron object. This means: there is a bit of memory allocated to fit that array reference into it, and of course: the memory for your float primitive values, they are all immediately allocated.
It doesn't matter whether you have 0 or 100.00 or 10394.283 stored in that member field: because the JVM made sure that you have enough memory to fit in the required bits and bytes.
Thus: when you really have millions of objects, each float field in your object adds 32 bits. No matter where the value within that field is coming from.
Sure, if your arrays will later hold 5 or 10 or 1000 entries, then that will make up most of your memory consumption. But initially, when you just create millions of "empty" Neuron objects, you have to "pay" for each and any field that exists in your class.
Meaning: when only 1 out of 100 Neuron objects will ever need these two fields, then you could decide to have:
A BaseNeuron class that doesn't have all 4 fields
one or more classes deriving from that class, adding the fields they need
Note that this can also be the better choice from a design perspective: "empty" values always mean extra code to deal with them. Meaning: if that array can be null, then of course all code dealing with that field has to check whether the array is null before using it. Compare that to a class not having that array at all, versus a class where you know that the array is always set and ready to use.
I am not saying that you absolutely must change your design, but as explained: you can reduce your memory footprint, and you could make your design more clear by doing so.
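A minimal sketch of that split (class names are mine, not the OP's):

```java
// Sketch of the base/derived split described above. Only the rare neurons
// that actually need a name and probability pay for those two fields.
public class BaseNeuron {
    protected float[] connection;
    protected float bias;

    public BaseNeuron(float[] connection, float bias) {
        this.connection = connection;
        this.bias = bias;
    }

    public float getBias() {
        return bias;
    }
}

// Hypothetical subclass used only by the <100 instances that need the extras.
class NamedNeuron extends BaseNeuron {
    private final String name;
    private final float propability; // spelling kept from the original code

    public NamedNeuron(float propability, String name, float[] connection, float bias) {
        super(connection, bias);
        this.name = name;
        this.propability = propability;
    }
}
```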
Even if they are not used in the constructor:
public Neuron(float[] connection, float bias) {
    this.connection = connection;
    this.bias = bias;
}
all instance fields (so name and propability too) are initialized before the constructor executes.
They are initialized with a default value:
private String name; // null
private float propability; // 0F
But assigning these default values costs nothing extra (null and 0F); the field slots exist either way.
So don't bother with that.
I have a class which I want to create tens of thousands to 100 thousand instances of. Therefore I don't want to waste memory unnecessarily.
And if yes, do I have another option (besides moving them into a class of their own) to reduce memory usage?
If these objects have some common data, share that data between the instances.
The flyweight pattern, which relies on the immutability of the shared data, illustrates that practice.
String interning uses the same idea.
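As a minimal sketch of the flyweight idea (all names here are hypothetical): a factory hands out one shared immutable instance per distinct value, instead of allocating a duplicate for every caller.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal flyweight sketch: immutable shared data is created once per key
// and reused by everyone who asks for it. All names are hypothetical.
public class SharedConfig {
    private static final Map<String, SharedConfig> CACHE = new HashMap<>();

    private final String name; // immutable, so safe to share

    private SharedConfig(String name) {
        this.name = name;
    }

    // Return the one shared instance for this name, creating it on first use.
    public static SharedConfig of(String name) {
        return CACHE.computeIfAbsent(name, SharedConfig::new);
    }

    public String name() {
        return name;
    }

    public static void main(String[] args) {
        // both calls return the very same object, so no memory is duplicated
        System.out.println(SharedConfig.of("relu") == SharedConfig.of("relu"));
    }
}
```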
I agree completely with GhostCat's remark: fields consume memory even when they are not used. Not a lot, but they do. And that is true of many classes in Java.
For example, we don't replace all lists with arrays just because arrays generally consume less memory. We only do that when, in our specific case, the list's memory footprint is a real concern.
To sum up: before optimizing and changing your design, the first thing to do is measure, and then decide whether the optimization would gain you gold or peanuts.
With your actual code and a main() method that creates 1 million Neuron instances, I observe that about 131 MB are consumed:
public static void main(String[] args) {
    long beforeUsedMem = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
    List<Neuron> neurons = new ArrayList<>();
    for (int i = 0; i < 1_000_000; i++) {
        neurons.add(new Neuron(new float[] { 0, 15.4F, 1.1F, 2.1F, 3.4F, 4.5F, 8.9F, 158.9F, 8648.9F, 80.9F, 10.9F, 1450.9F, 114.9F, 14.5F, 4444.4F }, 1.9F));
    }
    long afterUsedMem = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
    long actualMemUsed = (afterUsedMem - beforeUsedMem) / 1_000_000;
    System.out.println(actualMemUsed + " MB");
}
Objectively, that is low.
By removing the unused fields, it drops to about 124 MB (7 MB less):
private String name;
private float propability;
Both 131 MB and 124 MB are quite low values for so many created objects: 1 million.
If the class declared a dozen unused fields, things would be different: a non-negligible amount of memory would be wasted, and the class would not be clear at all in terms of design (low cohesion).
But that is not the case here.
Well, yes, they make a difference, but how significant that difference is depends on your use case, your JVM implementation, and the hardware resources you have. If you're running your application on a server with 500 MB of RAM and 1 CPU, and your application creates those objects at a high rate while the garbage collector cannot keep up with that rate, then eventually you will run into memory issues. So technically, by the Java Language Specification, they take memory; practically, based on your use case, it might not be an issue at all.
String secret = "foo";
WhatILookFor.securelyWipe(secret);
And I need to know that it will not be removed by the Java optimizer.
A String cannot be "wiped". It is immutable, and short of some really dirty and dangerous tricks you cannot alter that.
So the safest solution is to not put the data into a string in the first place. Use a StringBuilder or an array of characters instead, or some other representation that is not immutable. (And then clear it when you are done.)
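A minimal sketch of that approach, keeping the secret in a char[] and zeroing it when done (class and method names are mine, not a standard API):

```java
import java.util.Arrays;

// Keep the secret in a mutable char[] so it can really be overwritten.
// A String, being immutable, could linger on the heap until (at least) GC.
public class SecretHolder {
    private final char[] secret;

    public SecretHolder(char[] secret) {
        this.secret = secret;
    }

    // Overwrite the backing array; the characters are gone after this call.
    public void wipe() {
        Arrays.fill(secret, '\0');
    }

    public boolean isWiped() {
        for (char c : secret) {
            if (c != '\0') {
                return false;
            }
        }
        return true;
    }
}
```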
For the record, there are a couple of ways that you can change the contents of a String's backing array. For example, you can use reflection to fish out a reference to the String's backing array, and overwrite its contents. However, this involves doing things that the JLS states have unspecified behaviour so you cannot guarantee that the optimizer won't do something unexpected.
My personal take on this is that you are better off locking down your application platform so that unauthorized people can't gain access to the memory / memory dump in the first place. After all, if the platform is not properly secured, the "bad guys" may be able to get hold of the string contents before you erase it. Steps like this might be warranted for small amounts of security critical state, but if you've got a lot of "confidential" information to process, it is going to be a major hassle to not be able to use normal strings and string handling.
You would need direct access to the memory.
You really wouldn't be able to do this with String, since you don't have reliable access to its backing data, and you don't know whether it has been interned somewhere, or whether a copy was created that you don't know about.
If you really needed to do this, you'd have to do something like:
public class SecureString implements CharSequence {
    char[] data;

    public void wipe() {
        // overwrite every char (random characters would do just as well)
        for (int i = 0; i < data.length; i++) data[i] = '.';
    }

    public int length() { return data.length; }
    public char charAt(int index) { return data[index]; }
    public CharSequence subSequence(int start, int end) { return new String(data, start, end - start); }
}
That being said, if you're worried about data still being in memory, you have to realize that if it was ever in memory at some point, then an attacker probably already got it. The only thing you realistically protect yourself against is a core dump being flushed to a log file.
Regarding the optimizer, I highly doubt it will optimize away the operation. If you really needed to make sure, you could do something like this:
private static final java.util.Random rand = new java.util.Random();

public int wipe() {
    // wipe the array to a random value
    java.util.Arrays.fill(data, (char) rand.nextInt(60000));
    // compute a hash to force the optimizer to perform the wipe
    int hash = 0;
    for (int i = 0; i < data.length; i++) {
        hash = hash * 31 + (int) data[i];
    }
    return hash;
}
This forces the compiler to do the wipe. It makes the method take roughly twice as long to run, but it's a fast operation to begin with, and it doesn't increase the order of complexity.
Store the data off-heap using the "Unsafe" methods. You can then zero it when done and be certain that it won't be copied around the heap by the JVM.
Here is a good post on Unsafe:
http://highlyscalable.wordpress.com/2012/02/02/direct-memory-access-in-java/
If you're going to use a String then I think you are worried about it appearing in a memory dump. I suggest using String.replace() on key-characters so that when the String is used at run-time it will change and then go out of scope after it is used and won't appear correctly in a memory dump. However, I strongly recommend that you not use a String for sensitive data.
A number of variations of this question exist here on Stack Overflow asking how to measure the size of an object (for example this one). And the answers point out, without much elaboration, that it is not possible. Can somebody explain at length why it is not possible, or why it does not make sense, to measure object sizes?
I guess from the tags that you are asking about measurements of object sizes in Java and C#. I don't know much about C#, so the following pertains only to Java.
Also, there is a difference between the shallow and retained size of a single object, and I suppose you are asking about the shallow size (which would be the base from which to derive the retained size).
I also interpret your term "managed environment" to mean that you only want to know the size of an object at runtime in a specific JVM (not, for instance, calculating the size by looking only at source code).
My short answers first:
Does it make sense to measure object sizes? Yes, it does. Any developer of an application which runs under memory constraints is happy to know the memory implications of class layouts and object allocations.
Is it impossible to measure in managed environments? No, it is not. The JVM must know the size of its objects and therefore must be able to report the size of an object, if only we had a way to ask for it.
Long answer:
There are plenty of reasons why the object size cannot be derived from the class definition alone, for example:
The Java language spec only gives lower bounds on memory requirements for primitive types. An int consumes at least 4 bytes, but the real size is up to the VM.
I am not sure what the language spec says about the size of references. Is there any constraint on the number of possible objects in a JVM (which would have implications for the size of the internal storage of object references)? Today's JVMs typically use 4 bytes for a reference pointer.
JVMs may (and do) pad object bytes to align at some boundary, which may extend the object size. Today's JVMs usually align object memory at an 8-byte boundary.
But none of these reasons applies to a JVM at runtime: it works with actual memory layouts, possibly lets its generational garbage collector push objects around, and must therefore be able to report object sizes.
So how do we know about object sizes at runtime?
In Java 1.5 we got java.lang.instrument.Instrumentation#getObjectSize(Object).
The Javadoc says:
Returns an implementation-specific approximation of the amount of
storage consumed by the specified object. The result may include some
or all of the object's overhead, and thus is useful for comparison
within an implementation but not between implementations. The estimate
may change during a single invocation of the JVM.
Reading this with a grain of salt, it tells me that there is a reasonable way to get a good approximation of the shallow size of an object at one point during runtime.
Getting a size for an object this way is easy, but note the caveat: serialization measures the size of the serialized form, not the in-memory size, so the result is only a rough proxy.
Getting the size this way also adds some overhead if the object is large, since we use I/O streams to obtain it.
If you have to get the size of large objects very frequently, be careful.
Have a look at the code below.
import java.io.*;

class ObjectData implements Serializable {
    private int id = 1;
    private String name = "sunrise76";
    private String city = "Newyork";
    private int dimensions[] = {20, 45, 789};
}

public class ObjectSize {
    public static void main(String args[]) {
        try {
            ObjectData data = new ObjectData();
            ByteArrayOutputStream b = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(b);
            oos.writeObject(data);
            System.out.println("Size:" + b.toByteArray().length);
        } catch (Exception err) {
            err.printStackTrace();
        }
    }
}
I'm trying to understand what is the memory footprint of an object in Java. I read this and other docs on object and memory in Java.
However, when I use the sizeof Java library or VisualVM, I get two different results, neither of which fits what I would expect according to the previous reference (http://www.javamex.com).
For my test, I'm using Java SE 7 Developer Preview on a 64-bits Mac with java.sizeof 0.2.1 and visualvm 1.3.5.
I have three classes, TestObject, TestObject2, TestObject3.
public class TestObject {
}

public class TestObject2 extends TestObject {
    int a = 3;
}

public class TestObject3 extends TestObject2 {
    int b = 4;
    int c = 5;
}
My main class:
public class memoryTester {
    public static void main(String[] args) throws Throwable {
        TestObject object1 = new TestObject();
        TestObject2 object2 = new TestObject2();
        TestObject3 object3 = new TestObject3();
        int sum = object2.a + object3.b + object3.c;
        System.out.println(sum);

        SizeOf.turnOnDebug();
        System.out.println(SizeOf.humanReadable(SizeOf.deepSizeOf(object1)));
        System.out.println(SizeOf.humanReadable(SizeOf.deepSizeOf(object2)));
        System.out.println(SizeOf.humanReadable(SizeOf.deepSizeOf(object3)));
    }
}
With java.SizeOf() I get:
{ test.TestObject
} size = 16.0b
16.0b
{ test.TestObject2
a = 3
} size = 16.0b
16.0b
{ test.TestObject3
b = 4
c = 5
} size = 24.0b
24.0b
With visualvm I have:
this (Java frame) TestObject #1 16
this (Java frame) TestObject2 #1 20
this (Java frame) TestObject3 #1 28
According to the documentation I read on the Internet, since I'm on a 64-bit JVM, I should have an object header of 16 bytes, which matches for TestObject.
Then for TestObject2 I should add 4 bytes for the integer field giving the 20 bytes, I should add again 4 bytes of padding, giving a total size of 24 bytes for TestObject2. Am I wrong?
Continuing that way for TestObject3, I have to add 8 bytes more for the two integer fields which should give 32 bytes.
VisualVM seems to ignore padding, whereas java.sizeOf seems to miss 4 bytes, as if they were included in the object header. I can replace an integer by 4 booleans and it gives the same result.
Questions:
Why these two tools give different results?
Should we have padding?
I also read somewhere (I couldn't find the link again) that between a class and its subclass there can be some padding. Is that right? In that case, could an inheritance tree of classes have some memory overhead?
Finally, is there some Java spec/doc which details what Java is doing?
Thanks for your help.
Update:
To answer utapyngo's comment: to get the size of the objects in VisualVM, I create a heap dump, then in the "Classes" part I check the "size" column, next to the "instances" column. The number of instances is 1 for each kind of object.
To answer Nathaniel Ford's comment: I initialized each field and then did a simple sum with them in my main method to make use of them. It didn't change the results.
Yes, padding can happen. It is also possible for objects on the stack to be optimised out entirely. Only the JVM knows the exact sizes at any point in time. As such, techniques that approximate the size from within the Java language all tend to disagree; tools that attach to the JVM tend to be the most accurate. The three main techniques of implementing sizeOf within Java that I am aware of are:
1. Serialize the object and return the length of those bytes (clearly wrong, but useful for relative comparisons).
2. Reflection, with hard-coded size constants for each field found on an object. This can be tuned to be fairly accurate, but changes in the JVM, and whatever padding the JVM may or may not be doing, will throw it off.
3. Create loads of objects, run GC, and compare the changes in JVM heap size.
None of these techniques are accurate.
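For illustration, the third technique can be sketched as below; GC timing makes the number noisy and run-dependent, so treat it strictly as a ballpark figure:

```java
// Rough per-object size estimate via heap deltas (the third technique
// above). System.gc() is only a hint and allocation is noisy, so the
// result is a ballpark figure, not a measurement.
public class HeapDeltaEstimate {
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static long estimatePerObjectBytes(int count) {
        Object[] keep = new Object[count]; // hold references so GC can't reclaim them
        System.gc();
        long before = usedHeap();
        for (int i = 0; i < count; i++) {
            keep[i] = new Object();
        }
        long after = usedHeap();
        return Math.max(0, (after - before) / count);
    }

    public static void main(String[] args) {
        System.out.println(estimatePerObjectBytes(1_000_000) + " bytes/object (approx)");
    }
}
```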
If you are running on the Oracle JVM, v1.5 or later, then there is a way to read the size of an object straight out of the C structure used by the Java runtime. This is not a good idea for production, and if you get it wrong you can crash the JVM. But here is a blog post you may find interesting if you wish to have a go at it: http://highlyscalable.wordpress.com/2012/02/02/direct-memory-access-in-java/
As for documentation on what Java is actually doing, that is JVM specific, version specific and potentially configuration specific too. Each implementation is free to handle objects differently. Even to the extent of optimising objects out entirely, for example, objects that are not passed out from the stack are free not to be allocated on the heap. Some JVMs may even manage to keep the object within the CPU registers entirely. Not your case here, but I include it as an example as to why getting the true size of Java objects is tricky.
So best to take any sizeOf values that you get with a pinch of salt and treat it as a 'guideline' measurement only.
I have successfully written a new Chronology that represents my company's fiscal calendar, based off of JodaTime. I referred to the JodaTime source code quite a bit, to figure out what I needed to do. One of the things I noticed in the BasicChronology class was the use of the inner class YearInfo to cache the 'firstDayOfYearMillis' - the number of milliseconds since 1970-01-01 (ISO). Figuring that, if it was enough of a performance bottleneck that JodaTime was caching it, I should probably add it to my chronology too.
When I did so, though, I made some modifications. Specifically, I moved the getYearInfo method into the YearInfo inner class, as well as making it static. I also moved the array used to store the cached values into the inner class as well. Full definition of the modified class is like this:
/**
 * Caching class for first-day-of-year millis.
 */
private static final class YearInfo {

    /**
     * Cache setup for first-day-of-year milliseconds.
     */
    private static final int CACHE_SIZE = 1 << 10;
    private static final int CACHE_MASK = CACHE_SIZE - 1;
    private static transient final YearInfo[] YEAR_INFO_CACHE = new YearInfo[CACHE_SIZE];

    /**
     * Storage variables for cache.
     */
    private final int year;
    private final long firstDayMillis;
    private final boolean isLeapYear;

    /**
     * Create the stored year information.
     *
     * @param inYear The year to store info about.
     */
    private YearInfo(final int inYear) {
        this.firstDayMillis = calculateFirstDayOfYearMillis(inYear);
        this.isLeapYear = calculateLeapYear(inYear);
        this.year = inYear;
    }

    /**
     * Get year information.
     *
     * @param year The given year.
     *
     * @return Year information.
     */
    private static YearInfo getYearInfo(final int year) {
        YearInfo info = YEAR_INFO_CACHE[year & CACHE_MASK];
        if (info == null || info.year != year) {
            info = new YearInfo(year);
            YEAR_INFO_CACHE[year & CACHE_MASK] = info;
        }
        return info;
    }
}
My question is... What are the performance or design implications of my changes? I've already decided that my changes should be thread-safe (given answers about final member variables). But why was the original implementation done the way it was, and not like this? I get why most of the methods that are effectively used statically aren't static (given the subclasses of BasicChronology), but I'll admit that some of my OO design is a little rusty (having spent the last two years using RPG).
So... thoughts?
Regarding correctness, by switching YEAR_INFO_CACHE to static, you've introduced a minor memory leak. There are a few ways to tell if your static references matter in practice, e.g. do a back-of-the-envelope approximation of how large the cache will grow based on what you know about the data; profile the heap during/after a load test of your application; etc.
You're caching such small objects that you probably can cache a lot of them without a problem. Still, if you find that the cache needs to be bounded, then you have a few options, such as an LRU cache, a cache based on soft references instead of direct (strong) references, etc. But again, I emphasize that for your particular situation, implementing either of these might be a waste of time.
To explain the theoretical problem with static references, I'll refer to other posts, rather than reproducing them here:
1. Are static fields open for garbage collection?
2. Can using too many static variables cause a memory leak in Java?
Also regarding correctness, the code is thread safe not because references are final, but rather because the YearInfo values created by multiple threads for some cache position must be equal, so it doesn't matter which one ends up in the cache.
Regarding design, all of the YearInfo related stuff in the original Joda code is private, so the YearInfo details including caching are well encapsulated. This is a good thing.
Regarding performance, the best thing to do is profile your code and see what's using a significant amount of CPU. For profiling, you want to see whether the time spent in this code matters in the context of your entire application. Run your app under load, and check if this particular part of the code matters. If you don't see a performance problem in this code even without the YearInfo cache, then it's probably not a good use of time to work on / worry about that cache. Here is some information about how to do the check:
1. Performance profiler for a java application
2. How to find CPU-intensive class in Java?
That said, the converse is true -- if what you've got is working, then leave it as is!
I wrote the original code that caches into YearInfo objects. Your solution to encapsulate more logic into the YearInfo class is perfectly fine and should perform just as well. I designed the YearInfo based on intent -- I wanted a crude data pair and nothing more. If Java supported structs I would have used one here.
As for the cache design itself, it was based on profiling results to see if it had any impact. In most places, Joda-Time lazily computes field values, and caching them for later did improve performance. Because this particular cache is fixed in size, it cannot leak memory. The maximum amount of memory it consumes is 1024 YearInfo objects, which is about 20k bytes.
Joda-Time is full of specialized caches like this, and all of them showed measurable performance improvement. I cannot say how effective these techniques are anymore, since they were written and tested against JDK 1.3.