Integer i = 3;
i = i + 1;
Integer j = i;
j = i + j;
How many objects are created as a result of the statements in the sample code above and why? Is there any IDE in which we can see how many objects are created (maybe in a debug mode)?
The answer, surprisingly, is zero.
All the Integers from -128 to +127 are pre-computed by the JVM.
Your code creates references to these existing objects.
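You can observe this with a reference comparison (a minimal sketch; the class name is my own):

public class CacheDemo {
    public static void main(String[] args) {
        Integer a = 3, b = 3;       // both boxed via Integer.valueOf
        System.out.println(a == b); // true: same cached instance

        Integer c = 300, d = 300;   // outside the mandatory cache range
        System.out.println(c == d); // false on typical JVMs: two distinct objects
    }
}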
The strictly correct answer is that the number of Integer objects created is indeterminate. It could be between 0 and 3, or 256¹ or even more², depending on
the Java platform³,
whether this is the first time that this code is executed, and
(potentially) whether other code that relies on boxing of int values runs before it⁴.
The Integer values for -128 to 127 are not strictly required to be precomputed. In fact, JLS 5.1.7, which specifies the boxing conversion, says this:
If the value p being boxed is an integer literal of type int between -128 and 127 inclusive (§3.10.1) ... then let a and b be the results of any two boxing conversions of p. It is always the case that a == b.
Two things to note:
The JLS only requires this for >>literals<<.
The JLS does not mandate eager caching of the values. Lazy caching also satisfies the JLS's behavioral requirements.
Even the javadoc for Integer.valueOf(int) does not specify that the results are cached eagerly.
If we examine the Java SE source code for java.lang.Integer from Java 6 through 8, it is clear that the current Java SE implementation strategy is to precompute the values. However, for various reasons (see above) that is still not enough to allow us to give a definite answer to the "how many objects" question.
1 - It could be 256 if execution of the above code triggers class initialization for Integer in a version of Java where the cache is eagerly initialized during class initialization.
2 - It could be even more, if the cache is larger than the JVM spec requires. The cache size can be increased via a JVM option in some versions of Java.
3 - In addition to the platform's general approach to implementing boxing, a compiler could spot that some or all of the computation can be done at compile time, or optimize the computation away entirely.
4 - Such code could trigger either lazy or eager initialization of the integer cache.
First of all: The answer you are looking for is 0, as others already mentioned.
But let's go a bit deeper. As Stephen mentioned, it depends on when you execute it, because the cache is actually lazily initialized.
If you look at the documentation of java.lang.Integer.IntegerCache:
The cache is initialized on first usage.
This means that the first time you box any int, you actually create:
256 Integer objects (or more: see below)
1 object for the array that stores the Integers
Let's ignore the objects needed to store the class itself (and its methods/fields); they are stored in the metaspace anyway.
From the second time on, you create 0 objects.
Things get more interesting once you make the numbers a bit higher, e.g. with the following example:
Integer i = 1500;
Valid answers here are 0, 1, or any number between 1629 and 2147483776 (this time counting only the created Integer values).
Why? The answer is given in the next sentence of the IntegerCache documentation:
The size of the cache may be controlled by the -XX:AutoBoxCacheMax= option.
So you can actually vary the size of the cache.
Which means that for the line above you can get:
1 new object, if your cache is smaller than 1500
0 new objects, if your cache has been initialized before and contains 1500
1629 new Integer objects, if your cache is set to exactly 1500 and has not been initialized yet; in that case the Integer values from -128 to 1500 will be created
As the sentence above implies, you can get any number of Integer objects here, up to Integer.MAX_VALUE + 129, which is the mentioned 2147483776.
Keep in mind: this is only guaranteed on Oracle/OpenJDK (I checked versions 7 and 8).
As you can see, the completely correct answer is not so easy to get. But just saying 0 will make people happy.
PS: using the mentioned parameter can make the following statement true: Integer.valueOf(1500) == Integer.valueOf(1500)
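For example, the following sketch (class name is my own) prints false with the default cache and true when run with -XX:AutoBoxCacheMax=1500 or higher:

public class CacheSizeDemo {
    public static void main(String[] args) {
        Integer a = Integer.valueOf(1500);
        Integer b = Integer.valueOf(1500);
        // false with the default cache (up to 127),
        // true when run with: java -XX:AutoBoxCacheMax=1500 CacheSizeDemo
        System.out.println(a == b);
    }
}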
The compiler unboxes the Integer objects to ints to do arithmetic with them by calling intValue() on them, and it calls Integer.valueOf to box the int results when they are assigned to Integer variables, so your example is equivalent to:
Integer i = Integer.valueOf(3);
i = Integer.valueOf(i.intValue() + 1);
Integer j = i;
j = Integer.valueOf(i.intValue() + j.intValue());
The assignment j = i; is a completely normal object reference assignment which creates no new objects. It does no boxing or unboxing, and doesn't need to as Integer objects are immutable.
The valueOf method is allowed to cache objects and return the same instance each time for a particular number. It is required to cache ints −128 through +127. For your starting number of i = 3, all the numbers are small and guaranteed to be cached, so the number of objects that need to be created is 0. Strictly speaking, valueOf is allowed to cache instances lazily rather than having them all pre-generated, so the example might still create objects the first time, but if the code is run repeatedly during a program the number of objects created each time on average approaches 0.
What if you start with a larger number whose instances will not be cached (e.g., i = 300)? Then each valueOf call must create one new Integer object, and the total number of objects created each time is 3.
(Or, maybe it's still zero, or maybe it's millions. Remember that compilers and virtual machines are allowed to rewrite code for performance or implementation reasons, so long as its behavior is not otherwise changed. So it could delete the above code entirely if you don't use the result. Or if you try to print j, it could realize that j will always end up with the same constant value after the above snippet, and thus do all the arithmetic at compile time, and print a constant value. The actual amount of work done behind the scenes to run your code is always an implementation detail.)
You can debug the Integer.valueOf(int i) method to find this out for yourself.
This method is called by the autoboxing code that the compiler generates.
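For reference, in OpenJDK 7/8 the method has roughly this shape (paraphrased, not verbatim source):

public static Integer valueOf(int i) {
    // Return the cached instance for values inside the cache range...
    if (i >= IntegerCache.low && i <= IntegerCache.high)
        return IntegerCache.cache[i + (-IntegerCache.low)];
    // ...and allocate a fresh object otherwise.
    return new Integer(i);
}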
Related
In my book there is an example which explains the differences between arrays in Java and C.
In Java we can create an array by writing:
int[] a = new int[5];
In C, the corresponding declaration just allocates storage space on the stack for five integers, and we can access them exactly as we would have done in Java:
int a[5] = {0};
int i;
for (i = 0; i < 5; i++) {
    printf("%2d: %7d\n", i, a[i]);
}
Then the author says the following
Of course our program should not use the number 5 as we did in several places in the example; instead we should use a constant. We can use the C preprocessor to do this:
#define SIZE 5
What are the advantages of defining a constant SIZE 5?
Using a named constant is generally considered good practice because if it is used in multiple places, you only need to change the definition to change the value, rather than change every occurrence - which is error prone.
For example, as mentioned by stark in the comments, it is likely that you'll want to loop over an array. If the size of the array is defined by a named constant called SIZE, then you can use that in the loop bounds. Changing the size of the array then only requires changing the definition of SIZE.
There is also the question of whether #define is really the right solution.
To borrow another comment, from Jonathan Leffer: see static const vs #define vs enum for a discussion of different ways of naming constants. While modern C does allow using a variable as an array size specifier, this technically results in a variable-length array which may incur a small overhead.
You should use a constant, because embedding magic numbers in code makes it harder to read and maintain. For instance, if you see 52 in some code, you don't know what it is. However, if you write #define DECKSIZE 52, then whenever you see DECKSIZE, you know exactly what it means. In addition, if you want to change the deck size, say 36 for durak, you could simply change one line, instead of changing every instance throughout the code base.
Well, imagine that you create a static array of 5 integers, just like int my_arr[5];, and you write a whole program with it, but suddenly you realize that you need more space. If your program is 600-700 lines long, you MUST replace every occurrence tied to the fixed size of your array: every for loop, and everything else related to its size. You can avoid all of this using the preprocessor directive #define, which replaces every occurrence of a keyword with the content you want; it's like a synonym for something. E.g.: #define SIZE 5 will replace every occurrence of the word SIZE in your code with the value 5.
I find the comments here to be superfluous. As long as you use your constant (5 in this case) only once, it doesn't matter where it is. Moreover, having it in place improves readability. And you certainly do not need to use the constant in more than one place; after all, you should infer the size of the array through the sizeof operator anyway. The benefit of the sizeof approach is that it works seamlessly with VLAs.
The drawback of a global #define (or any other global name) is that it pollutes the global namespace. One should understand that global names are a resource to be used conservatively.
#define SIZE 5
This looks like an old, outdated way of declaring constants in C code that was popular in the dinosaur era. I suppose some lovers of this style are still alive.
The preferred way to declare constants in C languages nowadays is:
const int kSize = 5;
I know that -1, 0, 1, and 2 are exceptions to the magic number rule. However, I was wondering if the same is true when they are floats. Do I have to initialize a final variable for them, or can I just use them directly in my program?
I am using them as a percentage in a class. If the input is less than 0.0 or greater than 1.0, then I want it to set the percentage automatically to zero. So: if (0.0 <= input && input <= 1.0).
Thank you
Those numbers aren't really exceptions to the magic number rule. The common sense rule (as far as there is "one" rule), when it isn't simplified to the level of dogma, is basically, "Don't use numbers in a context that doesn't make their meaning obvious." It just so happens that these four numbers are very commonly used in obvious contexts. That doesn't mean they're the only numbers where this applies, e.g. if I have:
long kilometersToMeters(int km) { return km * 1000L; }
there is really no point in naming the number: it's obvious from the tiny context that it's a conversion factor. On the other hand, if I do this in some low-level code:
sendCommandToDevice(1);
it's still wrong, because that should be a constant kResetCommand = 1 or something like it.
So whether 0.0 and 1.0 should be replaced by a constant completely depends on the context.
It really depends on the context. The whole point of avoiding magic numbers is to maintain the readability of your code. Use your best judgement, or provide us with some context so that we may use ours.
Magic numbers are [u]nique values with unexplained meaning or multiple occurrences which could (preferably) be replaced with named constants.
http://en.wikipedia.org/wiki/Magic_number_(programming)
Edit: When to document code with variable names vs. when to just use a number is a hotly debated topic. My opinion is that of the author of the Wiki article linked above: if the meaning is not immediately obvious and it occurs multiple times in your code, use a named constant. If it only occurs once, just comment the code.
If you are interested in other people's (strongly biased) opinions, read
What is self-documenting code and can it replace well documented code?
Usually, every rule has exceptions (and this one too). It is a matter of style to use some mnemonic names for these constants.
For example:
int Rows = 2;
int Cols = 2;
is a pretty valid example where usage of raw values would be misleading.
The meaning of the magic number should be obvious from the context. If it is not - give the thing a name.
Attaching a name for something creates an identity. Given the definitions
const double Moe = 2.0;
const double Joe = 2.0;
...
double Larry = Moe;
double Harry = Moe;
double Garry = Joe;
the use of symbols for Moe and Joe suggests that the default values of Larry and Harry are related to each other in a way that the default value of Garry is not. The decision of whether or not to define a name for a particular constant shouldn't depend upon the value of that constant, but rather upon whether it will non-coincidentally appear in multiple places in the code. If one is communicating with a remote device which requires that a particular byte value be sent to it to trigger a reset, I would consider:
void ResetDevice()
{
// The 0xF9 command is described in the REMOTE_RESET section of the
// Frobnitz 9000 manual
transmitByte(0xF9);
}
... elsewhere
myDevice.ResetDevice();
...
otherDevice.ResetDevice();
to be in many cases superior to
// The 0xF9 command is described in the REMOTE_RESET section of the
// Frobnitz 9000 manual
const int FrobnitzResetCode = 0xF9;
... elsewhere
myDevice.transmitByte(FrobnitzResetCode);
...
otherDevice.transmitByte(FrobnitzResetCode);
The value 0xF9 has no real meaning outside the context of resetting the Frobnitz 9000 device. Unless there is some reason why outside code should prefer to send the necessary value itself rather than calling a ResetDevice method, the constant should have no value to any code outside the method. While one could perhaps use
void ResetDevice()
{
// The 0xF9 command is described in the REMOTE_RESET section of the
// Frobnitz 9000 manual
int FrobnitzResetCode = 0xF9;
transmitByte(FrobnitzResetCode);
}
there's really not much point to defining a name for something which is in such a narrow context.
The only thing "special" about values like 0 and 1 is that used significantly more often than other constants like e.g. 23 in cases where they have no domain-specific identity outside the context where they are used. If one is using a function which requires that the first parameter indicates the number of additional parameters (somewhat common in C) it's better to say:
output_multiple_strings(4, "Bob", Joe, Larry, "Fred"); // There are 4 arguments
...
output_multiple_strings(4, "George", Fred, "James", Lucy); // There are 4 arguments
than
#define NUMBER_OF_STRINGS 4 // There are 4 arguments
output_multiple_strings(NUMBER_OF_STRINGS, "Bob", Joe, Larry, "Fred");
...
output_multiple_strings(NUMBER_OF_STRINGS, "George", Fred, "James", Lucy);
The latter statement implies a stronger connection between the value passed to the first method and the value passed to the second, than exists between the value passed to the first method and anything else in that method call. Among other things, if one of the calls needs to be changed to pass 5 arguments, it would be unclear in the second code sample what should be changed to allow that. By contrast, in the former sample, the constant "4" should be changed to "5".
Recently I noticed declaring an array containing 64 elements is a lot faster (>1000 fold) than declaring the same type of array with 65 elements.
Here is the code I used to test this:
public class Tests {
    public static void main(String[] args) {
        double start = System.nanoTime();
        int job = 100000000; // 100 million
        for (int i = 0; i < job; i++) {
            double[] test = new double[64];
        }
        double end = System.nanoTime();
        System.out.println("Total runtime = " + (end - start) / 1000000 + " ms");
    }
}
This runs in approximately 6 ms, if I replace new double[64] with new double[65] it takes approximately 7 seconds. This problem becomes exponentially more severe if the job is spread across more and more threads, which is where my problem originates from.
This problem also occurs with different types of arrays such as int[65] or String[65].
This problem does not occur with large strings: String test = "many characters";, but does start occurring when this is changed into String test = i + "";
I was wondering why this is the case and if it is possible to circumvent this problem.
You are observing behavior caused by the optimizations done by the JIT compiler of your Java VM. This behavior is reproducible: it is triggered with scalar arrays of up to 64 elements, and is not triggered with arrays larger than 64.
Before going into details, let's take a closer look at the body of the loop:
double[] test = new double[64];
The body has no effect (no observable behavior). That means it makes no difference outside of the program execution whether this statement is executed or not. The same is true for the whole loop. So it might happen that the code optimizer translates the loop into something (or nothing) with the same functional behavior but different timing behavior.
For benchmarks you should at least adhere to the following two guidelines; if you had done so, the difference would have been significantly smaller. A corrected sketch follows after the list.
Warm-up the JIT compiler (and optimizer) by executing the benchmark several times.
Use the result of every expression and print it at the end of the benchmark.
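For illustration, here is a rough sketch of how the Tests class from the question could be reworked to follow both guidelines (the method structure and iteration counts are my own choices):

public class Tests {
    static double run(int size, int iterations) {
        double sum = 0;
        for (int i = 0; i < iterations; i++) {
            double[] test = new double[size];
            test[0] = i;    // give the allocation an observable effect
            sum += test[0]; // and consume the result
        }
        return sum;
    }

    public static void main(String[] args) {
        run(64, 10_000_000); // warm-up so the JIT compiles the loop first
        long start = System.nanoTime();
        double result = run(64, 100_000_000);
        long end = System.nanoTime();
        System.out.println("result = " + result); // print the result
        System.out.println("Total runtime = " + (end - start) / 1_000_000 + " ms");
    }
}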
Now let's go into details. Not surprisingly, there is an optimization that is triggered for scalar arrays not larger than 64 elements. The optimization is part of escape analysis. It puts small objects and small arrays onto the stack instead of allocating them on the heap, or, even better, optimizes them away entirely. You can find some information about it in the following article by Brian Goetz written in 2005:
Urban performance legends, revisited: Allocation is faster than you think, and getting faster
The optimization can be disabled with the command line option -XX:-DoEscapeAnalysis. The magic value 64 for scalar arrays can also be changed on the command line. If you execute your program as follows, there will be no difference between arrays with 64 and 65 elements:
java -XX:EliminateAllocationArraySizeLimit=65 Tests
Having said that, I strongly discourage the use of such command line options. I doubt that they make a huge difference in a realistic application. I would only use them if I were absolutely convinced of the necessity, and not based on the results of some pseudo-benchmarks.
There are any number of ways that there can be a difference, based on the size of an object.
As nosid stated, the JITC may be (most likely is) allocating small "local" objects on the stack, and the size cutoff for "small" arrays may be at 64 elements.
Allocating on the stack is significantly faster than allocating in heap, and, more to the point, stack does not need to be garbage collected, so GC overhead is greatly reduced. (And for this test case GC overhead is likely 80-90% of the total execution time.)
Further, once the value is stack-allocated the JITC can perform "dead code elimination", determine that the result of the new is never used anywhere, and, after assuring there are no side-effects that would be lost, eliminate the entire new operation, and then the (now empty) loop itself.
Even if the JITC does not do stack allocation, it's entirely possible for objects smaller than a certain size to be allocated in the heap differently (e.g., from a different "space") than larger objects. (Normally this would not produce quite so dramatic timing differences, though.)
So, for example, Notification has the following flag:
public static final int FLAG_AUTO_CANCEL = 0x00000010;
This is hexadecimal for the number 16. There are other flags with values:
0x00000020
0x00000040
0x00000080
Each time, it goes up by a power of 2. Converting this to binary, we get:
00010000
00100000
01000000
10000000
Hence, we can use bitwise operators to determine which of the flags are present, since each flag contains only one 1 bit and they are all in different positions.
Question:
This all makes perfect sense, but why not just use booleans? Is this merely stylistic, or are there memory or efficiency benefits?
EDIT:
I understand that by combining them, we can store a lot of information in a single int. Is this used solely so we can pass a lot of boolean type values in a single int instead of having to pass a ton of parameters? I don't mean to trivialize that, it's very convenient, but are there any other benefits?
What you're talking about is called a Bit Field. One advantage is that all the information can be contained in a single variable (with no overhead like that of an ArrayList). This is useful for keeping function signatures tidy, and will have some minor benefits with efficiency because of fewer stack operations, but probably this will be offset by additional bitshift operations. Additionally, you can use (for example) one byte to store 8 fields rather than wasting 7 additional bytes. You can also, if you're clever with it, perform several flag checks in a single operation.
Having said that, personal preference may see the list of booleans as cleaner or preferable. Bitfields are most common in embedded systems where space is limited or something of that nature.
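For illustration, here is a small sketch of setting, testing, and clearing such flags with bitwise operators (FLAG_AUTO_CANCEL's value is from the question; FLAG_ONGOING is a made-up name here):

public class FlagOps {
    static final int FLAG_AUTO_CANCEL = 0x00000010;
    static final int FLAG_ONGOING     = 0x00000020; // made-up second flag

    public static void main(String[] args) {
        int flags = FLAG_AUTO_CANCEL | FLAG_ONGOING;          // set two flags at once

        System.out.println((flags & FLAG_AUTO_CANCEL) != 0);  // true: flag is set

        // Test several flags in a single operation:
        int both = FLAG_AUTO_CANCEL | FLAG_ONGOING;
        System.out.println((flags & both) == both);           // true: both are set

        flags &= ~FLAG_ONGOING;                               // clear one flag
        System.out.println((flags & FLAG_ONGOING) != 0);      // false
    }
}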
In reference to your edit: it's storing the values of the flags in ints, but those are just constants; you aren't editing them, you're setting those bits in (or clearing them from) the flags field, which is a single int. I don't really know why they chose a bit field for this application; perhaps someone who grew up programming space-limited microcontrollers coded that specific class. The general consensus seems to be that bit fields shouldn't be included in new code.
This is a common idiom in C, where resource constraints are a much larger concern, and you usually see it in Java where the Java API is directly mapping an underlying well-known C API. However, it's not a great idea in Java for a wide number of reasons.
As of Java 5, most of the uses for one-bit bit fields are taken care of very nicely by EnumSet, which is internally implemented using a bit field (so it's extremely fast) but is type-safe, easy to read, and Iterable.
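A minimal sketch of what that looks like (the Flag enum and its constants are made up for this example):

import java.util.EnumSet;

public class EnumSetDemo {
    enum Flag { AUTO_CANCEL, ONGOING, NO_CLEAR } // made-up flag names

    public static void main(String[] args) {
        EnumSet<Flag> flags = EnumSet.of(Flag.AUTO_CANCEL, Flag.ONGOING);
        // Type-safe and readable, yet internally backed by a bit field:
        System.out.println(flags.contains(Flag.AUTO_CANCEL)); // true
        flags.remove(Flag.ONGOING);                           // clear a flag
        System.out.println(flags.contains(Flag.ONGOING));     // false
    }
}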
I am designing an entity class which has a field named "documentYear", which might have unsigned integer values such as 1999, 2006, etc. Meanwhile, this field might also be "unknown", that is, not sure which year the document is created.
Therefore, a nullable int type as in C# will be well suited. However, Java does not have a nullable feature as C# has.
I have two options but I don't like them both:
Use java.lang.Integer instead of the primitive type int;
Use -1 to represent the "unknown" value
Does anyone have better options or ideas?
Update: My entity class will have tens of thousands of instances; therefore the overhead of java.lang.Integer might be too heavy for overall performance of the system.
Using the Integer class here is probably what you want to do. The overhead associated with the object is most likely (though not necessarily) trivial to your application's overall responsiveness and performance.
You're going to have to either ditch the primitive type or use some arbitrary int value as your "invalid year".
A negative value is actually a good choice since there is little chance of having a valid year that would cause an integer overflow and there is no valid negative year.
Tens of thousands of instances of Integer is not a lot. Consider expending a few hundred kilobytes rather than optimise prematurely. It's a small price to pay for correctness.
Beware of using sentinel values like null or 0. This basically amounts to lying, since 0 is not a year, and null is not an integer. A common source of bugs, especially if you at some point are not the only maintainer of the software.
Consider using a type-safe null like Option, sometimes known as Maybe. Popular in languages like Scala and Haskell, this is like a container that has one or zero elements. Your field would have the type Option<Integer>, which advertises the optional nature of your year field to the type system and forces other code to deal with possibly missing years.
Here's a library that includes the Option type.
Here's how you would call your code if you were using it:
partyLikeIts.setDocumentYear(Option.some(1999));
Option<Integer> y = doc.getDocumentYear();
if (y.isSome()) {
    // This doc has a year
} else {
    // This doc has no year
}
for (Integer year : y) {
    // This code is only executed if the document has a year.
}
Another option is to have an associated boolean flag that indicates whether or not your year value is valid. This flag being false would mean the year is "unknown." This means you have to check one primitive (boolean) to know if you have a value, and if you do, check another primitive (integer).
Sentinel values often result in fragile code, so it's worth making the effort to avoid the sentinel value unless you are very sure that it will never be a use case.
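A minimal sketch of that flag approach (the class and member names are my own):

class Document {
    private boolean hasYear; // false means the year is unknown
    private int year;        // only meaningful while hasYear is true

    void setYear(int year) {
        this.year = year;
        this.hasYear = true;
    }

    void clearYear() {
        this.hasYear = false;
    }

    boolean hasYear() {
        return hasYear;
    }

    int getYear() {
        if (!hasYear) {
            throw new IllegalStateException("year is unknown");
        }
        return year;
    }
}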
You can use a regular int, but use a value such as Integer.MAX_VALUE or Integer.MIN_VALUE, which are defined constants, as your invalid date. It is also more obviously invalid than -1 or a small negative value; it will certainly not look like a 4-digit year that we are used to seeing.
If you have an integer and are concerned that an arbitrary value for null might be confused with a real value, you could use long instead. It is more efficient than using an Integer, and Long.MIN_VALUE is nowhere near any valid int value.
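Sketched out, that long-based idea could look like this (the names are my own):

class DocumentWithLongYear {
    // Long.MIN_VALUE cannot collide with any int year value.
    private long documentYear = Long.MIN_VALUE;

    void setYear(int year) { this.documentYear = year; }

    boolean hasYear() { return documentYear != Long.MIN_VALUE; }

    int getYear() { return (int) documentYear; } // caller checks hasYear() first
}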
For completeness, another option (definitely not the most efficient), is to use a wrapper class Year.
class Year {
    public int year;
    public Year(int year) { this.year = year; }
}
Year documentYear = null;
documentYear = new Year(2013);
Or, if it is more semantic, or you want multiple types of nullable ints (other than Years), you can imitate C#'s nullable primitives like so:
class Int {
    public int value;
    public Int(int value) { this.value = value; }
    @Override
    public String toString() { return Integer.toString(value); }
}
Using the int primitive vs the Integer type is a perfect example of premature optimization.
If you do the math:
int = 4 bytes
Integer = roughly 16 bytes
So for 10,000 ints it'll cost 40,000 bytes, or 40K. For 10,000 Integers it'll cost 160,000 bytes, or 160K. If you consider the amount of memory required to process images/photos/video data, that's practically negligible.
My suggestion is: quit wasting time prematurely optimizing based on variable types, and look for a good data structure that'll make it easy to process all that data. Regardless of what you do, unless you define 10K primitive variables individually, it's going to end up on the heap anyway.
What's wrong with java.lang.Integer? It's a reasonable solution, unless you're storing very large numbers of these values, maybe.
If you wanted to use primitives, a -1 value would be a good solution as well. The only other option you have is using a separate boolean flag, like someone already suggested. Choose your poison :)
PS: damn you, I was trying to get away with a little white lie on the objects vs structs. My point was that it uses more memory, similar to the boolean flag method, although syntactically the nullable type is nicer of course. Also, I wasn't sure someone with a Java background would know what I meant by a struct.
java.lang.Integer is reasonable for this case. It already implements Serializable, so you can save just the year field down to the HDD and load it back.
Another option might be to use a special value internally (-1 or Integer.MIN_VALUE or similar), but expose the integer as two methods:
public boolean hasValue() {
    return internalValue != -1;
}

public int getValue() {
    if (internalValue == -1) {
        throw new IllegalStateException(
            "Check hasValue() before calling getValue().");
    }
    return internalValue;
}
If you're going to save memory, I would suggest packing several years into a single int, with 0 as nil. Then you can make assumptions in order to optimize. If you are working only with current dates, like the years 1970—2014, you can subtract 1969 from all of them and get into the 1—45 range; such values can be coded with only 6 bits. Dividing your int, which is always 32 bits, into 4 zones of 8 bits each, with one year per zone, lets you pack 4 years in the range 1970—2224 into a single int. The narrower your range is, like only 2000—2014 (4 bits), the more years you can pack into a single int.
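A sketch of that packing scheme (the zone layout and offset are my own choices):

class PackedYears {
    private static final int OFFSET = 1969; // encoded 1..255 => years 1970..2224
    private int packed;                     // four 8-bit zones, zone 0 in the lowest byte

    void setYear(int zone, int year) {
        int encoded = year - OFFSET; // assumed to fit in 1..255
        packed = (packed & ~(0xFF << (zone * 8))) | (encoded << (zone * 8));
    }

    Integer getYear(int zone) {
        int encoded = (packed >>> (zone * 8)) & 0xFF;
        return (encoded == 0) ? null : encoded + OFFSET; // 0 means "no year"
    }
}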
You could use the @Nullable annotation if using Java 7.