The difference between arrays in Java and C - java

In my book there is an example which explains the differences between arrays in Java and C.
In Java we can create an array by writing:
int[] a = new int[5];
This just allocates storage space on the stack for five integers and we can access them exactly as we would have done in Java
int a[5] = {0};
int i;
for (i = 0, i < 5; i++){
printf("%2d: %7d\n", i, a[i]);
}
Then the author says the following
Of course our program should not use a number 5 as we did on several places in the example, instead we use a constant. We can use the C preprocessor to do this:
#define SIZE 5
What are advantages of defining a constant SIZE 5?

Using a named constant is generally considered good practice because if it is used in multiple places, you only need to change the definition to change the value, rather than change every occurrence - which is error prone.
For example, as mentioned by stark in the comments, it is likely that you'll want to loop over an array. If the size of the array is defined by a named constant called SIZE, then you can use that in the loop bounds. Changing the size of the array then only requires changing the definition of SIZE.
There is also the question of whether #define is really the right solution.
To borrow another comment, from Jonathan Leffer: see static const vs #define vs enum for a discussion of different ways of naming constants. While modern C does allow using a variable as an array size specifier, this technically results in a variable-length array which may incur a small overhead.

You should use a constant, because embedding magic numbers in code makes it harder to read and maintain. For instance, if you see 52 in some code, you don't know what it is. However, if you write #define DECKSIZE 52, then whenever you see DECKSIZE, you know exactly what it means. In addition, if you want to change the deck size, say 36 for durak, you could simply change one line, instead of changing every instance throughout the code base.

Well, imagine that you create a static array of 5 integer just like you did int my_arr [5]; ,you code a whole programm with it, but.. suddenly you realise that maybe you need more space. Imagine that you wrote a code of 6-700 lines, you MUST replace every occurence of you array with the fixed number of your choice. Every for loop, and everything that is related with the size of this array. You can avoid all of this using the preprocessor command #define which will replace every occurence of a "keyword" with the content you want, it's like a synonymous for something. Eg: #define SIZE 5 will replace in your code every occurence of the word SIZE with the value 5.

I find comments here to be superflous. As long as you use your constant (5 in this case) only once, it doesn't matter where it is. Moreover, having it in place improves readability. And you certainly do not need to use the constant in more than one place - afterall, you should infer the size of array through sizeof operator anyways. The benefit of sizeof approach is that it works seamlessly with VLAs.
The drawback of global #define (or any other global name) is that it pollutes global namespace. One should understand that global names is a resource to be used conservatively.

#define SIZE 5
This looks like an old outdated way of declaring constants in C code that was popular in dinosaur era. I suppose some lovers of this style are still alive.
The preferred way to declare constants in C languages nowadays is:
const int kSize = 5;

Related

Java in Android Studio: Weirdness when writing value into array of user defined class

I am writing my first app in Android Studio, I am a self-taught novice. When I write data into the subscript of an array I have created as a user-defined class the value is written into an adjacent subscript as well! Have traced to some code where I move data down one position in the array, thought I could do this in one operation, but it seems this messes something up and I need to copy each member of the class individually.
Here is my class
class LeaderboardClass
{
public String DateTime;
public String UserName;
public long Milliseconds; //0 denotes not in use
}
Here is my array declaration
LeaderboardClass[] LeaderboardData = new LeaderboardClass[LeaderboardEntries];
I want to move some data from subscript j to subscript j+1
I tried
LeaderboardData[j + 1] = LeaderboardData[j];
I thought this would copy all the data from subscript j to j+1
Then when I subsequently write to the array I (subscript i) I get the correct entry I made, plus a duplicate entry in the adjacent subscript i+1.
When I rewrite the above to:
LeaderboardData[j + 1].UserName = LeaderboardData[j].UserName;
LeaderboardData[j + 1].DateTime = LeaderboardData[j].DateTime;
LeaderboardData[j + 1].Milliseconds = LeaderboardData[j].Milliseconds;
Everything else behaves as expected. So I was wondering exactly what is happening with my first (presumably incorrect) code?
Thanks.
In Java, there's a difference between primitive values and objects (instances of classes): Primitives are stored by value whereas objects are stored by reference. This means that your code would work as you expect if you were using integers. However, since you are using a class, the array merely stores the references to those objects. Hence, when you do LeaderboardData[j + 1] = LeaderboardData[j]; you are merely copying the reference of that object. Therefore, LeaderboardData[j + 1]and LeaderboardData[j] will point to the same object.
Sidenote: If you run your program with a debugger, you can actually see this in action:
The number behind the # denotes the reference number and if you look closely, you can see that the objects at indices 8 and 9 both have the reference #716.
To fix this, I would suggest that you use lists instead of arrays as they allow you to remove and add new entries. The standard list implementation is an ArrayList but in your use-case, a LinkedList might be more efficient.
Lastly, a closing notes on your code: For variable names (like DateTime, UserName or LeaderboardData should always start with a lowercase letter to distinguish them from classes. That way, you can avoid lots of confusion - especially because Java also has a built-in class called DateTime.

Why it's impossible to create an array of MAX_INT size in Java?

I have read some answers for this question(Why I can't create an array with large size? and https://bugs.openjdk.java.net/browse/JDK-8029587) and I don't understand the following.
"In the GC code we pass around the size of objects in words as an int." As I know the size of a word in JVM is 4 bytes. According to this, if we pass around the size of long array of large size (for example, MAX_INT - 5) in words as an int, we must get OutOfMemoryException with Requested array size exceeds VM limit because the size is too large for int even without size of header. So why arrays of different types have the same limit on max count of elements?
Only addressing the why arrays of different types have the same limit on max count of elements? part:
Because it doesn't matter to much in practical reality; but allows the code implementing the JVM to be simpler.
When there is only one limit; that is the same for all kinds of arrays; then you can deal all arrays with that code. Instead of having a lot of type-specific code.
And given the fact that the people that need "large" arrays can still create them; and only those that need really really large arrays are impacted; why spent that effort?
The answer is in the jdk sources as far as I can tell (I'm looking at jdk-9); also after writing it I am not sure if it should be a comment instead (and if it answers your question), but it's too long for a comment...
First the error is thrown from hotspot/src/share/vm/oops/arrayKlass.cpp here:
if (length > arrayOopDesc::max_array_length(T_ARRAY)) {
report_java_out_of_memory("Requested array size exceeds VM limit");
....
}
Now, T_ARRAY is actually an enum of type BasicType that looks like this:
public static final BasicType T_ARRAY = new BasicType(tArray);
// tArray is an int with value = 13
That is the first indication that when computing the maximum size, jdk does not care what that array will hold (the T_ARRAY does not specify what types will that array hold).
Now the method that actually validates the maximum array size looks like this:
static int32_t max_array_length(BasicType type) {
assert(type >= 0 && type < T_CONFLICT, "wrong type");
assert(type2aelembytes(type) != 0, "wrong type");
const size_t max_element_words_per_size_t =
align_size_down((SIZE_MAX/HeapWordSize - header_size(type)), MinObjAlignment);
const size_t max_elements_per_size_t =
HeapWordSize * max_element_words_per_size_t / type2aelembytes(type);
if ((size_t)max_jint < max_elements_per_size_t) {
// It should be ok to return max_jint here, but parts of the code
// (CollectedHeap, Klass::oop_oop_iterate(), and more) uses an int for
// passing around the size (in words) of an object. So, we need to avoid
// overflowing an int when we add the header. See CRs 4718400 and 7110613.
return align_size_down(max_jint - header_size(type), MinObjAlignment);
}
return (int32_t)max_elements_per_size_t;
}
I did not dive too much into the code, but it is based on HeapWordSize; which is 8 bytes at least. here is a good reference (I tried to look it up into the code itself, but there are too many references to it).

Are 0.0 and 1.0 considered magic numbers?

I know that -1, 0, 1, and 2 are exceptions to the magic number rule. However I was wondering if the same is true for when they are floats. Do I have to initialize a final variable for them or can I just use them directly in my program.
I am using it as a percentage in a class. If the input is less than 0.0 or greater than 1.0 then I want it set the percentage automatically to zero. So if (0.0 <= input && input <= 1.0).
Thank you
Those numbers aren't really exceptions to the magic number rule. The common sense rule (as far as there is "one" rule), when it isn't simplified to the level of dogma, is basically, "Don't use numbers in a context that doesn't make their meaning obvious." It just so happens that these four numbers are very commonly used in obvious contexts. That doesn't mean they're the only numbers where this applies, e.g. if I have:
long kilometersToMeters(int km) { return km * 1000L; }
there is really no point in naming the number: it's obvious from the tiny context that it's a conversion factor. On the other hand, if I do this in some low-level code:
sendCommandToDevice(1);
it's still wrong, because that should be a constant kResetCommand = 1 or something like it.
So whether 0.0 and 1.0 should be replaced by a constant completely depends on the context.
It really depends on the context. The whole point of avoiding magic numbers is to maintain the readability of your code. Use your best judgement, or provide us with some context so that we may use ours.
Magic numbers are [u]nique values with unexplained meaning or multiple occurrences which could (preferably) be replaced with named constants.
http://en.wikipedia.org/wiki/Magic_number_(programming)
Edit: When to document code with variables names vs. when to just use a number is a hotly debated topic. My opinion is that of the author of the Wiki article linked above: if the meaning is not immediately obvious and it occurs multiple times in your code, use a named constant. If it only occurs once, just comment the code.
If you are interested in other people's (strongly biased) opinions, read
What is self-documenting code and can it replace well documented code?
Usually, every rule has exceptions (and this one too). It is a matter of style to use some mnemonic names for these constants.
For example:
int Rows = 2;
int Cols = 2;
Is a pretty valid example where usage of raw values will be misleading.
The meaning of the magic number should be obvious from the context. If it is not - give the thing a name.
Attaching a name for something creates an identity. Given the definitions
const double Moe = 2.0;
const double Joe = 2.0;
...
double Larry = Moe;
double Harry = Moe;
double Garry = Joe;
the use of symbols for Moe and Joe suggests that the default value of Larry and Harry are related to each other in a way that the default value of Garry is not. The decision of whether or not to define a name for a particular constant shouldn't depend upon the value of that constant, but rather whether it will non-coincidentally appear multiple places in the code. If one is communicating with a remote device which requires that a particular byte value be sent to it to trigger a reset, I would consider:
void ResetDevice()
{
// The 0xF9 command is described in the REMOTE_RESET section of the
// Frobnitz 9000 manual
transmitByte(0xF9);
}
... elsewhere
myDevice.ResetDevice();
...
otherDevice.ResetDevice();
to be in many cases superior to
// The 0xF9 command is described in the REMOTE_RESET section of the
// Frobnitz 9000 manual
const int FrobnitzResetCode = 0xF9;
... elsewhere
myDevice.transmitByte(FrobnitzResetCode );
...
otherDevice.transmitByte(FrobnitzResetCode );
The value 0xF9 has no real meaning outside the context of resetting the Frobnitz 9000 device. Unless there is some reason why outside code should prefer to send the necessary value itself rather than calling a ResetDevice method, the constant should have no value to any code outside the method. While one could perhaps use
void ResetDevice()
{
// The 0xF9 command is described in the REMOTE_RESET section of the
// Frobnitz 9000 manual
int FrobnitzResetCode = 0xF9;
transmitByte(FrobnitzResetCode);
}
there's really not much point to defining a name for something which is in such a narrow context.
The only thing "special" about values like 0 and 1 is that used significantly more often than other constants like e.g. 23 in cases where they have no domain-specific identity outside the context where they are used. If one is using a function which requires that the first parameter indicates the number of additional parameters (somewhat common in C) it's better to say:
output_multiple_strings(4, "Bob", Joe, Larry, "Fred"); // There are 4 arguments
...
output_multiple_strings(4, "George", Fred, "James", Lucy); // There are 4 arguments
than
#define NUMBER_OF_STRINGS 4 // There are 4 arguments
output_multiple_strings(NUMBER_OF_STRINGS, "Bob", Joe, Larry, "Fred");
...
output_multiple_strings(NUMBER_OF_STRINGS, "George", Fred, "James", Lucy);
The latter statement implies a stronger connection between the value passed to the first method and the value passed to the second, than exists between the value passed to the first method and anything else in that method call. Among other things, if one of the calls needs to be changed to pass 5 arguments, it would be unclear in the second code sample what should be changed to allow that. By contrast, in the former sample, the constant "4" should be changed to "5".

Using int flags in lieu of booleans

So, for example, Notification has the following flag:
public static final int FLAG_AUTO_CANCEL = 0x00000010;
This is hexadecimal for the number 16. There are other flags with values:
0x00000020
0x00000040
0x00000080
Each time, it goes up by a power of 2. Converting this to binary, we get:
00010000
00100000
01000000
10000000
Hence, we can use a bitwise operators to determine which of the flags are present, etc, since each flag contains only one 1 and they are all in different locations.
Question:
This all makes perfect sense, but why not just use booleans? Is this merely stylistic, or are there memory or efficiency benefits?
EDIT:
I understand that by combining them, we can store a lot of information in a single int. Is this used solely so we can pass a lot of boolean type values in a single int instead of having to pass a ton of parameters? I don't mean to trivialize that, it's very convenient, but are there any other benefits?
What you're talking about is called a Bit Field. One advantage is that all the information can be contained in a single variable (with no overhead like that of an ArrayList). This is useful for keeping function signatures tidy, and will have some minor benefits with efficiency because of fewer stack operations, but probably this will be offset by additional bitshift operations. Additionally, you can use (for example) one byte to store 8 fields rather than wasting 7 additional bytes. You can also, if you're clever with it, perform several flag checks in a single operation.
Having said that, personal preference may see the list of booleans as cleaner or preferable. Bitfields are most common in embedded systems where space is limited or something of that nature.
In reference to your edit: it's storing the values of the flags in ints, but those are just reference constants-- you aren't editing those, you're sticking those bits into (or out of) the flags field, which is a single int. I don't really know why they chose a bitfield for this application; perhaps someone that grew up programming space-limited microcontrollers coded that specific class. The general consensus seems to be that bitfields shouldn't be included in new code.
This is a common idiom in C, where resource constraints are a much larger concern, and you usually see it in Java where the Java API is directly mapping an underlying well-known C API. However, it's not a great idea in Java for a wide number of reasons.
As of Java 5, most of the uses for one-bit bit fields are taken care of very nicely by EnumSet, which is internally implemented using a bit field (so it's extremely fast) but is type-safe, easy to read, and Iterable.

Maximum number of enum elements in Java

What is the maximum number of elements allowed in an enum in Java?
I wanted to find out the maximum number of cases in a switch statement. Since the largest primitive type allowed in switch is int, we have cases from -2,147,483,648 to 2,147,483,647 and one default case. However enums are also allowed... so the question..
From the class file format spec:
The per-class or per-interface constant pool is limited to 65535 entries by the 16-bit constant_pool_count field of the ClassFile structure (§4.1). This acts as an internal limit on the total complexity of a single class or interface.
I believe that this implies that you cannot have more then 65535 named "things" in a single class, which would also limit the number of enum constants.
If a see a switch with 2 billion cases, I'll probably kill anyone that has touched that code.
Fortunately, that cannot happen:
The amount of code per non-native, non-abstract method is limited to 65536 bytes by the sizes of the indices in the exception_table of the Code attribute (§4.7.3), in the LineNumberTable attribute (§4.7.8), and in the LocalVariableTable attribute (§4.7.9).
The maximum number of enum elements is 2746. Reading the spec was very misleading and caused me to create a flawed design with the assumption I would never hit the 64K or even 32K high-water mark. Unfortunately, the number is much lower than the spec seems to indicate. As a test, I tried the following with both Java 7 and Java 8: Ran the following code redirecting it to a file, then compiled the resulting .java file.
System.out.println("public enum EnumSizeTest {");
int max = 2746;
for ( int i=0; i<max; i++) {
System.out.println("VAR"+i+",");
}
System.out.println("VAR"+max+"}");
Result, 2746 works, and 2747 does not.
After 2746 entries, the compiler throws a code too large error, like
EnumSizeTest.java:2: error: code too large
Decompiling this Enum class file, the restriction appears to be caused by the code generated for each enum value in the static constructor (mostly).
Enums definitely have limits, with the primary (hard) limit around 32K values. They are subject to Java class maximums, both of the 'constant pool' (64K entries) and -- in some compiler versions -- to a method size limit (64K bytecode) on the static initializer.
'Enum' initialization internally, uses two constants per value -- a FieldRef and a Utf8 string. This gives the "hard limit" at ~32K values.
Older compilers (Eclipse Indigo at least) also run into an issue as to the static initializer method-size. With 24 bytes of bytecode required to instantiate each value & add it to the values array. a limit around 2730 values may be encountered.
Newer compilers (JDK 7 at least) automatically split large static initializers off into methods named " enum constant initialization$2", " enum constant initialization$3" etc so are not subject to the second limit.
You can disassemble bytecode via javap -v -c YourEnum.class to see how this works.
[It might be theoretically possible to write an "old-style" Enum class as handcoded Java, to break the 32K limit and approach close to 64K values. The approach would be to initialize the enum values by reflection to avoid needing string constants in the pool. I tested this approach and it works in Java 7, but the desirability of such an approach (safety issues) are questionable.]
Note to editors: Utf8 was an internal type in the Java classfile IIRC, it's not a typo to be corrected.
Well, on jdk1.6 I hit this limit. Someone has 10,000 enum in an xsd and when we generate, we get a 60,000 line enum file and I get a nice java compiler error of
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:compile (default-compile) on project framework: Compilation failure
[ERROR] /Users/dhiller/Space/ifp-core/framework/target/generated-sources/com/framework/util/LanguageCodeSimpleType.java:[7627,4] code too large
so quite possibly the limit is much lower than the other answers here OR maybe the comments and such that are generated are taking up too much space. Notice the line number is 7627 in the java compiler error though if the line limit is 7627, I wonder what the line length limit is ;) which may be similar. ie. the limits may be not be based on number of enums but line length limits or number of lines in the file limit so you would have rename enums to A, B, etc. to be very small to fit more enums into the enum.
I can't believe someone wrote an xsd with a 10,000 enum..they must have generated this portion of the xsd.
Update for Java 15+
In JDK 15, the maximal number of constants in enums was raised to about 4103: https://bugs.openjdk.java.net/browse/JDK-8241798
This was achieved by splitting the static initializer into two parts:
Before Java 15 (pseudocode):
enum E extends Enum<E> {
...
static {
C1 = new E(...);
C2 = new E(...);
...
CN = new E(...);
$VALUES = new E[N];
$VALUES[0] = C1;
$VALUES[1] = C2;
...
$VALUES[N-1] = CN;
}
}
Since Java 15:
enum E extends Enum<E> {
...
static {
C1 = new E(...);
C2 = new E(...);
...
CN = new E(...);
$VALUES = $values();
}
private static E[] $values() {
E[] array = new E[N];
array[0] = C1;
array[1] = C2;
...
array[N-1] = CN;
return array ;
}
}
This allowed the static initializer to contain more code (until it hits the 64 kilobytes limit) and therefore to initialize more enum constants.
The maximum size of any method in Java is 65536 bytes. While you can theoretically have a large switch or more enum values, its the maximum size of a method you are likely to hit first.
The Enum class uses an int to track each value's ordinal, so the max would be the same as int at best, if not much lower.
And as others have said, if you have to ask you're Doing It Wrong
There is no maximum number per se for any practical purposes. If you need to define thousands of enums in one class you need to rewrite your program.

Categories

Resources