How can everything be an object? [closed] - java

In Java, for example, there is the primitive data type "int", which represents a 32-bit value, and there is "Integer", which is just a class with a single "int" field (and some methods, of course). That means that a Java "Integer" still uses primitives behind the scenes, and that's the reason Java is not a pure object-oriented programming language.
Where could a value be stored if there were no primitives? For example, I imagine this pseudo-class:
class Integer
{
private Integer i = 12;
public Integer getInteger()
{
return this.i;
}
}
This would be recursive.
How can a programming language be implemented without primitives?
I appreciate any help solving my confusion.

Behind the scenes there will always be primitives, because in the end it is all just bits in memory. But some languages hide the primitives so that you work only with objects. Java lets you work with both objects and primitives.

If by primitives you mean value types, then yes, as a user you can live without them: use Integer instead of int and pay for the overhead of heap allocation and GC. Primitives like 32-bit/64-bit integers and IEEE 754 floating-point numbers will always be faster because there is hardware support for them.
From a compiler writer's point of view, you have to use what the machine supports to make things work.

LISP is a very simple functional language. The basic LISP did not have a primitive int, and one solution for integers was to represent 3 as the successor of the successor of the successor of zero.
This actually had some advantages: integers are open-ended, and there is no overflow, so operations really are commutative, associative, and so on. Some nice optimizations are possible. And of course succ(succ(succ(zero))) could be encoded in a more tuple-like way (probably better not in LISP).
In a later, normal LISP, 3 would be an atom, and 123 would be such an atom too, with math operators.
Then there are symbol-manipulating languages (SNOBOL) that could do math on numerical strings: ['4', '0'] * ['3'].
So names are objects (atoms), like a char 'a' or an int 42.
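The successor-of-zero encoding can be sketched in Java as follows. This is only an illustration: the names Peano, Nat, Zero, Succ, add and toInt are made up, and a machine int is used solely in toInt to display the result.

```java
// Hypothetical sketch of Peano-style naturals: a number is either Zero
// or the successor of another number. No int is stored in any object.
public class Peano {
    interface Nat {
        Nat add(Nat other);
    }

    static final class Zero implements Nat {
        public Nat add(Nat other) { return other; }            // 0 + n = n
    }

    static final class Succ implements Nat {
        final Nat pred;
        Succ(Nat pred) { this.pred = pred; }
        // succ(m) + n = succ(m + n)
        public Nat add(Nat other) { return new Succ(pred.add(other)); }
    }

    // For display only: count the nesting depth using a machine int.
    static int toInt(Nat n) {
        int count = 0;
        while (n instanceof Succ) {
            count++;
            n = ((Succ) n).pred;
        }
        return count;
    }

    public static void main(String[] args) {
        Nat zero = new Zero();
        Nat two = new Succ(new Succ(zero));
        Nat three = new Succ(two);
        System.out.println(toInt(three.add(two))); // prints 5
    }
}
```

The price is obvious: the number n costs n objects, which is why real implementations fall back on machine integers.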

It might help to show you the analogous code in a language that takes the "everything is an object" design principle much more seriously than Java does. Namely, Smalltalk. Imagine what it would be like if Java had only int, not Integer, but everything you used to need to use Integer for was possible with int. That's Smalltalk.
This is an excerpt of the code defining the SmallInteger class in Squeak 5.0:
Integer immediateSubclass: #SmallInteger
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: 'Kernel-Numbers'!
!SmallInteger commentStamp: 'eem 11/20/2014 08:41' prior: 0!
My instances are at least 31-bit numbers, stored in twos complement
form. The allowable range in 32-bits is approximately +- 10^9
(+- 1billion). In 64-bits my instances are 61-bit numbers,
stored in twos complement form. The allowable range is
approximately +- 10^18 (+- 1 quintillion). The actual
values are computed at start-up. See SmallInteger class startUp:,
minVal, maxVal.!
!SmallInteger methodsFor: 'arithmetic' stamp: 'di 2/1/1999 21:31'!
+ aNumber
"Primitive. Add the receiver to the argument and answer with the result
if it is a SmallInteger. Fail if the argument or the result is not a
SmallInteger.
Essential, No Lookup. See Object documentation whatIsAPrimitive."
<primitive: 1>
^ super + aNumber! !
!SmallInteger class methodsFor: 'instance creation' stamp: 'tk 4/20/1999 14:17'!
basicNew
self error: 'SmallIntegers can only be created by performing arithmetic'! !
Don't sweat the fine details of syntax or semantics. What you should get out of this is: SmallInteger is defined as an object class just like everything else in the language, and arithmetic operations are methods just like every other piece of code in the language. But it's a little odd. It has no instance variables, you can only create instances by performing arithmetic, and most of the methods look like they're being defined circularly.
"Under the hood", the implementation maps arithmetic to the appropriate machine instructions (the <primitive: 1> thing is a hint to the implementation about that) and stores SmallIntegers as nothing more than the integer itself. The restricted range, relative to the hardware, is because a couple of bits are reserved to mark memory words as integers, rather than pointers to objects ("tagged pointers").

Without being able to eventually access real data (e.g. primitives or actual bits), directly or indirectly, on a machine, it is no longer a programming language; it is an interface description language.

(I'll rephrase the question to what I believe you're asking. If you think I've got it wrong, feel free to comment.)
How can a type system that's based on composition and inheritance define any useful type, if there are no intrinsic types to start from? Unless the language implementation knows about at least one intrinsic type to start from, any defined types would be doomed to be either recursive or empty. Is this inevitable?
Yes, in every C-family language that I know of, this is pretty much inevitable.
If every type is composed of other types then, at the very least, you need to have an intrinsic type to build upon - for example, an intrinsic type that represents a bit, in order to construct the byte type out of it through composition, then the word type, then various integer types, and so on. Then you'd need to define the operations that can be performed on these types, by manipulating the bits that make up their internal representation.
And even though all you need is one intrinsic type to build upon, it would likely be terribly inefficient - you don't want to waste space or CPU cycles and you do want to take advantage of the various storage locations and instructions that your target architecture offers, including FP registers and other stuff.
Thus, a good compromise between performance and "purity" is to offer in the language some intrinsic types that are likely to be recognizable by modern CPUs (like int32, int64, float, double, etc) and build the rest of the type system upon them. In Java, they decided to call these intrinsic types primitives and make them separate from classes.
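To make the "build everything from a single bit type" idea concrete, here is a small Java sketch of an 8-bit value composed only of booleans, with addition defined by manipulating those bits. The class and method names are hypothetical, and int appears only for loop indices and for converting to and from a readable number.

```java
public class BitAdder {
    // An 8-bit value made of nothing but bits, least significant bit first.
    static boolean[] fromInt(int value) {
        boolean[] bits = new boolean[8];
        for (int i = 0; i < 8; i++) {
            bits[i] = ((value >> i) & 1) == 1;
        }
        return bits;
    }

    static int toInt(boolean[] bits) {
        int value = 0;
        for (int i = 0; i < bits.length; i++) {
            if (bits[i]) value |= 1 << i;
        }
        return value;
    }

    // Ripple-carry addition using only boolean logic; overflow wraps around.
    static boolean[] add(boolean[] a, boolean[] b) {
        boolean[] sum = new boolean[8];
        boolean carry = false;
        for (int i = 0; i < 8; i++) {
            sum[i] = a[i] ^ b[i] ^ carry;
            carry = (a[i] && b[i]) || (carry && (a[i] ^ b[i]));
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(toInt(add(fromInt(100), fromInt(27)))); // prints 127
    }
}
```

Eight loop iterations for one addition, compared with a single machine instruction, shows why languages expose CPU-friendly intrinsic types instead.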

Eventually everything comes back to bits in memory and instructions to the computer. The difference between assembler, compiled, procedural, object oriented, and all the other things is how much abstraction there is between you and the bits and how much benefit (or cost) you get from that abstraction.

Related

Why does Java CharSequence.chars() return an IntStream? [duplicate]

In Java 8, there is a new method String.chars() which returns a stream of ints (IntStream) that represent the character codes. I guess many people would expect a stream of chars here instead. What was the motivation to design the API this way?
As others have already mentioned, the design decision behind this was to prevent the explosion of methods and classes.
Still, personally I think this was a very bad decision. Given that they did not want to add a CharStream, which is reasonable, there should have been different methods instead of chars(). I would think of:
Stream<Character> chars(), which gives a stream of boxed characters and carries a light performance penalty.
IntStream unboxedChars(), which would be used for performance-sensitive code.
However, instead of focusing on why it is done this way currently, I think this answer should focus on showing a way to do it with the API that we have gotten with Java 8.
In Java 7 I would have done it like this:
for (int i = 0; i < hello.length(); i++) {
System.out.println(hello.charAt(i));
}
And I think a reasonable method to do it in Java 8 is the following:
hello.chars()
.mapToObj(i -> (char)i)
.forEach(System.out::println);
Here I obtain an IntStream and map it to an object via the lambda i -> (char)i. This automatically boxes it into a Stream<Character>, and then we can do what we want and still use method references as a plus.
Be aware, though, that you must use mapToObj. If you forget and use map, nothing will complain, but you will still end up with an IntStream, and you might be left wondering why it prints the integer values instead of the characters.
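The difference between the two calls can be seen in a small sketch. The class and method names (CharsDemo, asChars, asInts) are made up for illustration; only chars(), map, mapToObj, boxed and Collectors.toList come from the JDK.

```java
import java.util.List;
import java.util.stream.Collectors;

public class CharsDemo {
    // mapToObj leaves the primitive world: the result is a Stream<Character>.
    static List<Character> asChars(String s) {
        return s.chars()
                .mapToObj(i -> (char) i)
                .collect(Collectors.toList());
    }

    // map stays in IntStream: the char produced by the cast is widened back to int.
    static List<Integer> asInts(String s) {
        return s.chars()
                .map(i -> (char) i)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(asChars("hi")); // prints [h, i]
        System.out.println(asInts("hi"));  // prints [104, 105]
    }
}
```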
Other ugly alternatives for Java 8:
If you remain in an IntStream and ultimately want to print the characters, you can no longer use method references for printing:
hello.chars()
.forEach(i -> System.out.println((char)i));
Moreover, method references to your own methods no longer work! Consider the following:
private void print(char c) {
System.out.println(c);
}
and then
hello.chars()
.forEach(this::print);
This will give a compile error, as there is a possible lossy conversion from int to char.
Conclusion:
The API was designed this way because the designers did not want to add a CharStream. I personally think that the method should return a Stream<Character>; the current workaround is to use mapToObj(i -> (char)i) on the IntStream to be able to work properly with the characters.
The answer from skiwi covered many of the major points already. I'll fill in a bit more background.
The design of any API is a series of tradeoffs. In Java, one of the difficult issues is dealing with design decisions that were made long ago.
Primitives have been in Java since 1.0. They make Java an "impure" object-oriented language, since the primitives are not objects. The addition of primitives was, I believe, a pragmatic decision to improve performance at the expense of object-oriented purity.
This is a tradeoff we're still living with today, nearly 20 years later. The autoboxing feature added in Java 5 mostly eliminated the need to clutter source code with boxing and unboxing method calls, but the overhead is still there. In many cases it's not noticeable. However, if you were to perform boxing or unboxing within an inner loop, you'd see that it can impose significant CPU and garbage collection overhead.
When designing the Streams API, it was clear that we had to support primitives. The boxing/unboxing overhead would kill any performance benefit from parallelism. We didn't want to support all of the primitives, though, since that would have added a huge amount of clutter to the API. (Can you really see a use for a ShortStream?) "All" or "none" are comfortable places for a design to be, yet neither was acceptable. So we had to find a reasonable value of "some". We ended up with primitive specializations for int, long, and double. (Personally I would have left out int but that's just me.)
For CharSequence.chars() we considered returning Stream<Character> (an early prototype might have implemented this) but it was rejected because of boxing overhead. Considering that a String has char values as primitives, it would seem to be a mistake to impose boxing unconditionally when the caller would probably just do a bit of processing on the value and unbox it right back into a string.
We also considered a CharStream primitive specialization, but its use would seem to be quite narrow compared to the amount of bulk it would add to the API. It didn't seem worthwhile to add it.
The penalty this imposes on callers is that they have to know that the IntStream contains char values represented as ints and that casting must be done at the proper place. This is doubly confusing because there are overloaded API calls like PrintStream.print(char) and PrintStream.print(int) that differ markedly in their behavior. An additional point of confusion possibly arises because the codePoints() call also returns an IntStream but the values it contains are quite different.
So, this boils down to choosing pragmatically among several alternatives:
We could provide no primitive specializations, resulting in a simple, elegant, consistent API, but which imposes a high performance and GC overhead;
we could provide a complete set of primitive specializations, at the cost of cluttering up the API and imposing a maintenance burden on JDK developers; or
we could provide a subset of primitive specializations, giving a moderately sized, high performing API that imposes a relatively small burden on callers in a fairly narrow range of use cases (char processing).
We chose the last one.
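The two points of confusion mentioned above (char values travelling as ints, and chars() versus codePoints()) can be illustrated with a short sketch. The class and helper names are invented for this example; the test string contains U+1F600, a character outside the Basic Multilingual Plane, so it occupies two char values.

```java
import java.util.stream.Collectors;

public class CharsVsCodePoints {
    // Treats each element as a number, which is what print(int) would show.
    static String asNumbers(String s) {
        return s.chars()
                .mapToObj(i -> String.valueOf(i))
                .collect(Collectors.joining(","));
    }

    // Casts each element back to char first, which is what print(char) would show.
    static String asChars(String s) {
        return s.chars()
                .mapToObj(i -> String.valueOf((char) i))
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00"; // 'a' followed by U+1F600 (a surrogate pair)
        System.out.println(s.chars().count());      // prints 3 (UTF-16 code units)
        System.out.println(s.codePoints().count()); // prints 2 (Unicode code points)
        System.out.println(asNumbers("hi"));        // prints 104,105
        System.out.println(asChars("hi"));          // prints h,i
    }
}
```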

What are "typing models"?

In Beyond Java (Section 2.2.9), Bruce Tate claims that the "typing model" is one of the problems of C++. What does that mean?
What he means is that objects in C++ don't intrinsically have types. While you might write
enum { POODLE };
struct Dog {
const char* name;
int breed;
};
Dog ralph = {"Ralph", POODLE};
in truth ralph doesn't have a type; it's just a bunch of bits, and the CPU doesn't give a damn about the fact that you call that collection of bits a Dog. For example, the following is valid:
struct Cat {
int color;
char* country_of_origin;
};
Cat ralph_is_that_you = * (Cat*) &ralph;
Watch in wonder as Professor C performs trans-species mutations between Dogs and Cats! The point here is that since ralph is just a sequence of bits, you can simply claim that that sequence of bits is really a Cat and nothing would go wrong... except that the "Cat"'s color would be some random large integer, and you had better not try to read its country of origin. The fundamental problem is that while a variable (as in the name, not the object it represents) has a type, the underlying object does not.
Compare this with Java, where not only variables but also objects have intrinsic types. This may be due partly to the fact that there are no pointers and thus no direct access to memory, but the fact remains that if you cast a Dog to an Object, you can't cast it back down to a Cat, because the object knows that, deep down, it's actually a Dog, not a Cat.
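A minimal Java sketch of that behaviour, with made-up Dog and Cat classes and a hypothetical helper canTreatAsCat: the downcast compiles, but the JVM checks it against the object's real class at run time.

```java
public class CastDemo {
    static class Dog { }
    static class Cat { }

    // Returns false when the run-time check on the cast fails.
    static boolean canTreatAsCat(Object o) {
        try {
            Cat c = (Cat) o; // checked against the object's actual class
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Object ralph = new Dog();                              // upcast: always allowed
        System.out.println(ralph.getClass().getSimpleName()); // prints Dog
        System.out.println(canTreatAsCat(ralph));              // prints false
    }
}
```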
The weak typing present in C++ is rather detrimental, because it makes the compiler's static type checks nearly useless if you want to truly abuse-proof your application, and it also makes secure and robust software hard to write. For example, you need to be very careful whenever you access a "pointer", because it could really be any random bit pattern.
EDIT 1: The comments had very good points, and I'd like to add them here.
kts points out that Sun's Java does indeed have pointers if you look deeply enough. Thanks! I hadn't known, and that's rather cool. However, the fundamental point is that Java objects are typed and C objects aren't. Yes, you can circumvent this, but it is like the difference between opt-in and opt-out spam: yes, you could abuse the Java pointers, but the default is that no abuse is possible. You'd have to opt in.
martin-york points out that the example I showed is a purely C phenomenon. This is true, but
C++ mostly contains C as a subset (the differences are usually too minor to list).
C++ includes reinterpret_cast<T> specifically to allow hacks like this.
Just because it's discouraged doesn't mean it isn't pervasive or dangerous. Basically, even if Java has opt-in pointers (as I'll call them), the fact is that the person using them has probably thought of the consequences. C's casts are so easy that they're at times done without thinking (to quote Stroustrup, "But the new syntax was made deliberately ugly, because casting is still an ugly and often unsafe operation."). There's also the fact that the work needed to circumvent Java's type system is far more than what would make for a clever hack, while circumventing the C type system (and, yes, the C++ type system) is easy enough that I've seen it done just for a minor performance boost.
Anyway, discouraging something doesn't make it not happen. I discourage bad coding, but I haven't seen it get me anywhere...
As for the feature being useful, it admittedly is (just look up "fast inverse square root" on Google or Wikipedia), but it is dangerous enough that, following Stroustrup's maxim that ugly operations should be ugly, the difficulty threshold should be significantly higher.
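For comparison, Java allows this kind of bit reinterpretation only through explicit library calls (Float.floatToRawIntBits and Float.intBitsToFloat), which fits the opt-in point made earlier. Below is a sketch of the well-known fast inverse square root trick written that way; the class and method names are made up, and the result is only an approximation.

```java
public class FastInvSqrt {
    // Approximates 1 / sqrt(x) for positive x by reinterpreting the float's bits.
    static float invSqrt(float x) {
        float half = 0.5f * x;
        int bits = Float.floatToRawIntBits(x); // explicit reinterpretation, no cast trickery
        bits = 0x5f3759df - (bits >> 1);       // the classic magic-constant first guess
        float y = Float.intBitsToFloat(bits);
        y = y * (1.5f - half * y * y);         // one Newton-Raphson refinement step
        return y;
    }

    public static void main(String[] args) {
        System.out.println(invSqrt(4.0f)); // prints a value close to 0.5
    }
}
```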
That it is hard to type C++ code. :-p
Seriously though, they are probably referring to the fact that C++ has a weak static type system, that can be easily circumvented. Some examples: typedefs are not real types, enumerated types are just ints, booleans and integers are equivalent in many cases, and so on.

Is Java fully object-oriented? [duplicate]

Java has primitive data types which don't derive from Object, unlike in Ruby. So can we consider Java a 100% object-oriented language? Another question: why didn't Java design primitive data types the object way?
When Java first appeared (versions 1.x) the JVM was really, really slow.
Not implementing primitives as first-class objects was a compromise they made for the sake of speed, although I think in the long run it was a really bad decision.
"Object oriented" also means lots of things for lots of people.
You can have class-based OO (C++, Java, C#), or you can have prototype-based OO (Javascript, Lua).
100% object oriented doesn't mean much, really. Ruby also has problems that you'll encounter from time to time.
What bothers me about Java is that it doesn't provide the means to abstract ideas efficiently, to extend the language where it has problems. And whenever this issue was raised (see Guy Steele's "Growing a Language") the "oh noes, but what about Joe Sixpack?" argument is given. Even if you design a language that prevents shooting yourself in the foot, there's a difference between accidental complexity and real complexity (see No Silver Bullet) and mediocre developers will always find creative ways to shoot themselves.
For example Perl 5 is not object-oriented, but it is extensible enough that it allows Moose, an object system that allows very advanced techniques for dealing with the complexity of OO. And syntactic sugar is no problem.
No, because it has data types that are not objects (such as int and byte). I believe Smalltalk is truly object-oriented but I have only a little experience with that language (about two months worth some five years ago).
I've also heard claims from the Ruby crowd but I have zero experience with that language.
This is, of course, using the definition of "truly OO" meaning it only has objects and no other types. Others may not agree with this definition.
It appears, after a little research into Python (I had no idea about the name/object distinction despite having coded in it for a year or so - more fool me, I guess), that it may indeed be truly OO.
The following code works fine:
#!/usr/bin/python
i = 7
print id(i)
print type(i)
print i.__str__()
outputting:
6701648
<type 'int'>
7
so even the base integers are objects here.
To get to true 100% OO think Smalltalk for instance, where everything is an object, including the compiler itself, and even if statements: ifTrue: is a message sent to a Boolean with a block of code parameter.
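The idea that a conditional is just a message sent to a Boolean object can be imitated in Java. This is a loose sketch, not how Smalltalk is implemented: the Bool interface, its ifTrueIfFalse method and the describe helper are all invented for illustration.

```java
import java.util.function.Supplier;

public class MessageIf {
    // A boolean as an object: the "if" is a method that receives two blocks.
    interface Bool {
        <T> T ifTrueIfFalse(Supplier<T> whenTrue, Supplier<T> whenFalse);
    }

    // True runs the first block and ignores the second.
    static final Bool TRUE = new Bool() {
        public <T> T ifTrueIfFalse(Supplier<T> whenTrue, Supplier<T> whenFalse) {
            return whenTrue.get();
        }
    };

    // False runs the second block and ignores the first.
    static final Bool FALSE = new Bool() {
        public <T> T ifTrueIfFalse(Supplier<T> whenTrue, Supplier<T> whenFalse) {
            return whenFalse.get();
        }
    };

    static String describe(Bool b) {
        return b.ifTrueIfFalse(() -> "yes", () -> "no");
    }

    public static void main(String[] args) {
        System.out.println(describe(TRUE));  // prints yes
        System.out.println(describe(FALSE)); // prints no
    }
}
```

No if statement appears anywhere: which block runs is decided purely by which object receives the message.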
The problem is that object-oriented is not really well defined and can mean a lot of things. This article explains the problem in more detail:
http://www.paulgraham.com/reesoo.html
Also, Alan Kay (the inventor of Smalltalk and the person who coined(?) the term "object-oriented") famously said that he did not have C++ in mind when he thought about OOP. So I think this could apply to Java as well.
The language being fully OO (whatever that means) is desirable, because it means better orthogonality, which is a good thing. But given that Java is not very orthogonal anyway in other respects, the small bit of its OO incompleteness probably doesn't matter in practice.
Java is not 100% OO.
Java may be going towards 99% OO (think of auto-boxing, Scala).
I would say Java is now 87% OO.
"Why java doesn't design primitive data types as object way ?"
Back in the 90s there were performance reasons, and since Java stays backward compatible, they cannot take the primitives out now.
No, Java is not, since it has primitive data types, which are different from objects (they don't have methods, instance variables, etc.). Ruby, on the other hand, is completely OOP. Everything is an object. I can do this:
1.class
And it will return the class of 1 (Fixnum, which is basically a number). You can't do this in Java.
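The closest Java equivalent goes through boxing, as this small sketch shows. The class name BoxedClass and the helper classOf are made up; passing a primitive to a parameter of type Object makes the compiler box it first.

```java
public class BoxedClass {
    // Passing a primitive here forces autoboxing into its wrapper class.
    static String classOf(Object value) {
        return value.getClass().getName();
    }

    public static void main(String[] args) {
        // Calling a method directly on the literal 1 does not compile,
        // because an int has no methods.
        System.out.println(classOf(1));   // prints java.lang.Integer
        System.out.println(classOf(1.5)); // prints java.lang.Double
        System.out.println(classOf('x')); // prints java.lang.Character
    }
}
```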
Having primitives, as you mentioned, is what keeps Java from being a purely object-oriented programming language. However, the requirement that every program be a class makes it very much oriented toward object-oriented programming.
Ruby, as you mentioned (and it happened to be the first language that came to my mind as well), is a language that does not have primitives; all values are objects. This certainly does make it more object-oriented than Java. On the other hand, to my knowledge, there is no requirement that a piece of code be associated with a class, as is the case with Java.
That said, Java does have objects that wrap the primitives, such as Integer, Boolean, Character and so on. The reason for having primitives is probably the one given in Peter's answer: back when Java was introduced in the mid-90s, memory on systems was mostly in the double-digit megabytes, so having each and every value be an object was a large overhead.
(Large, of course, is relative. I can't remember the exact numbers, but the overhead of an object was around 50-100 bytes of memory. Definitely more than the minimum of several bytes required for primitive values.)
Today, with many desktops having multiple gigabytes of memory, the overhead of objects is less of an issue.
"Why java doesn't design primitive data types as object way ?"
At Sun developer days, quite a few years ago, I remember James Gosling answering this. He said that they would have liked to abstract away from primitives entirely, leaving only standard objects, but then ran out of time and decided to ship with what they had. Very sad.
"So can we consider java as 100% object oriented language?"
No.
"Another question : Why java doesn't design primitive data types as object way?"
Mainly for performance reasons, possibly also to be more familiar to people coming from C++.
One obvious reason Java can't do away with non-object primitives (int, etc.) is that it does not support native data members. Imagine:
class int extends object
{
// need some data member here. but what type?
public native int();
public native int plus(int x);
// several more non-mutating methods
};
On second thoughts, we know Java maintains internal data per object (locks, etc.). Maybe we could define class int with no data members, but with native methods that manipulate this internal data.
Remaining issues: constants, but these can be dealt with similarly to strings. Operators are just syntactic sugar, and + would be mapped to the plus method at compile time, although we need to be careful that int.plus(float) returns float, as does float.plus(int), and so on.
Ultimately I think the justification for primitives is efficiency: the static analysis needed to determine that an int object can be treated purely as JVM integer value may have been considered too big a problem when the language was designed.
I'd say that fully OO languages are those that make their own elements (classes, methods) accessible as objects to work with.
From this point of view, Java is not a fully OOP language, and JavaScript is (even though it has primitives).
According to the book Concepts in Programming Languages, there is something called the Ingalls test, proposed by Dan Ingalls, a leader of the Smalltalk group. That is:
Can you define a new kind of integer, put your new integers into rectangles (which are already part of the window system), ask the system to blacken a rectangle, and have everything work?
And again according to the book, Smalltalk passes this test but C++ and Java do not. Unfortunately the book is not available online, but here are some supporting slides (slide 33 answers your question).
No. Javascript, for example, is.
What would those Integer and Long and Boolean classes be written in?
How would you write an ArrayList or HashMap without primitive arrays?
This is one of those questions that really only matters in an academic sense. Ruby is optimized to treat ints, longs, etc. as primitives whenever possible. Java just made this explicit. If Java's primitives were objects, there would be IntPrimitive, LongPrimitive, etc. classes (by whatever name), which would most likely be final and without special methods (e.g. no IntPrimitive.factorial). That would mean that for most purposes they would still be primitives.
Java clearly is not 100% OO. You can easily program it in a procedural style. Most people do. It's true that the libraries and containers tend not to be as forgiving of this paradigm.
Java is not fully object oriented. I would consider Smalltalk and Eiffel the most popular fully object oriented languages.

What's the limit to the number of members you can have in a java enum?

Assuming you have a hypothetical enum in Java like this (purely for demonstration purposes; this isn't code I'm seriously expecting to use):
enum Example{
FIRST,
SECOND,
THIRD,
...
LAST;
}
What's the maximum number of members you could have inside that enum before the compiler stops you?
Secondly, is there any performance difference at runtime when your code is referencing an enum with say, 10 members as opposed to 100 or 1,000 (other than just the obvious memory overhead required to store the large class)?
The language specification itself doesn't have a limit. Yet, there are many limitations that classfile has that bound the number of enums, with the upper bound being aruond 65,536 (2^16) enums:
Number of Fields
The JVMS 4.1 specifies that ClassFile may have up to 65,536 (2^16) fields. Enums get stored in the classfile as static field, so the maximum number of enum values and enum member fields is 65,536.
Constant Pool
The JVMS also specifies that the Constant Pool may have up to 65,536. Constant Pools store all String literals, type literals, supertype, super interfaces types, method signatures, method names, AND enum value names. So there must be fewer than 2^16 enum values, since the names strings need to share that Constant Pool limit.
Static Method Initialization
The maximum size of a method is 65,535 bytes of bytecode, so the static initializer for the enum has to be smaller than 64 KB. The compiler could split it into several methods (see Bug ID: 4262078) to distribute the initialization into smaller blocks, but it doesn't currently do that.
Long story short, there is no easy answer, and the answer depends not only on the number of enum values there are, but also the number of methods, interfaces, and fields the enums have!
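To see the "enum values are stored as static fields" point concretely, here is a minimal sketch using reflection. The names Sample and EnumFieldsDemo are made up for illustration. Compilers typically also emit an extra synthetic field backing values(), but its name is a compiler detail, so the sketch only counts the fields flagged as enum constants.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical enum used only for this illustration.
enum Sample { FIRST, SECOND, THIRD }

class EnumFieldsDemo {
    // Counts the static fields of a class that hold enum constants.
    static int countConstantFields(Class<?> type) {
        int count = 0;
        for (Field f : type.getDeclaredFields()) {
            if (f.isEnumConstant() && Modifier.isStatic(f.getModifiers())) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // List every declared field with its modifiers; each enum constant
        // shows up as a public static final field of the enum class.
        for (Field f : Sample.class.getDeclaredFields()) {
            System.out.println(Modifier.toString(f.getModifiers()) + " " + f.getName());
        }
        System.out.println("constant fields: " + countConstantFields(Sample.class));
    }
}
```

Each additional constant adds one field, one or more constant pool entries for its name, and some bytecode in the static initializer, which is why all three limits above apply at once.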
The best way to find out the answer to this type of question is to try it. Start with a little Python script to generate the Java files:
# Python 3: read the number of enum constants and print the Java source
n = int(input())
print("class A{public static void main(String[] a){}enum B{")
print(','.join("C%d" % x for x in range(n)))
print('}}')
Now try with 1,10,100,1000... works fine, then BAM:
A.java:2: code too large
C0,C1,C2,C3,C4,C5,C6,C7,C8,C9,C10,C11,C12,C13,C14,C15,C16,C17,C18,C19,C20,C21,C22,...
Seems like I hit some sort of internal limit. Not sure if it's a documented limit, if it's dependent on the specific version of my compiler, or if it's some system-dependent limit. But for me the limit was around 3000 and appears to be related to the source code size. Maybe you could write your own compiler to bypass this limit.
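For anyone who prefers to run the same experiment without Python, the generator can be sketched in Java. This is only an equivalent of the script above, not a different method. The class name EnumSourceGenerator is made up, and the default of 10 constants is an arbitrary assumption. Note that javac reports "code too large" when a single method's bytecode exceeds the 64 KB limit, so the failure is plausibly the enum's static initializer rather than the size of the source text.

```java
class EnumSourceGenerator {
    // Builds the source of class A with a nested enum B
    // that declares n constants named C0 .. C(n-1).
    static String generate(int n) {
        StringBuilder sb = new StringBuilder();
        sb.append("class A{public static void main(String[] a){}enum B{\n");
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('C').append(i);
        }
        sb.append("\n}}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Number of constants comes from the first argument (default 10 is arbitrary).
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        System.out.print(generate(n));
    }
}
```

Redirect the output to A.java and compile it with increasing values of n to find where your compiler gives up.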
The maximum number of enum values will, I think, be just under the 65536 maximum number of fields/constant pool entries in the class. (As I mentioned in a comment above, the actual values shouldn't take up constant pool entries: they can be "inlined" into the bytecode, but the names will.)
As far as the second question is concerned, there's no direct performance difference, but it's conceivable that there'll be small indirect performance differences, partly because of the class file size as you say. Another thing to bear in mind is that when you use enum collections, there are optimised versions of some of the classes for when all of the enum values fit within a certain range (64 values or fewer, so that a set fits in the bits of a single long). So yes, there could be a small difference. I wouldn't get paranoid, though.
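The enum collection optimisation can be observed directly. The sketch below asks EnumSet which concrete class it picked for a small and a large enum. SmallEnum, LargeEnum and EnumSetDemo are made-up names, and the concrete class names are an OpenJDK implementation detail (there the cut-over is at 64 constants), so other JVM libraries may print something else.

```java
import java.util.EnumSet;

// Hypothetical enums for illustration: one with 3 constants, one with 65.
enum SmallEnum { A, B, C }

enum LargeEnum {
    C0, C1, C2, C3, C4, C5, C6, C7, C8, C9,
    C10, C11, C12, C13, C14, C15, C16, C17, C18, C19,
    C20, C21, C22, C23, C24, C25, C26, C27, C28, C29,
    C30, C31, C32, C33, C34, C35, C36, C37, C38, C39,
    C40, C41, C42, C43, C44, C45, C46, C47, C48, C49,
    C50, C51, C52, C53, C54, C55, C56, C57, C58, C59,
    C60, C61, C62, C63, C64
}

class EnumSetDemo {
    // Returns the simple name of the concrete EnumSet class chosen for an enum type.
    static <E extends Enum<E>> String implementationFor(Class<E> type) {
        return EnumSet.noneOf(type).getClass().getSimpleName();
    }

    public static void main(String[] args) {
        // On OpenJDK these are expected to be RegularEnumSet and JumboEnumSet.
        System.out.println(SmallEnum.values().length + " constants: "
                + implementationFor(SmallEnum.class));
        System.out.println(LargeEnum.values().length + " constants: "
                + implementationFor(LargeEnum.class));
    }
}
```

Both variants offer the same API, so the difference only matters if set operations on a very large enum sit on a hot path.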
This is an extension of the comments to the original question.
There are multiple problems with having a LOT of enums.
The main reason is that when you have a lot of data it tends to change, or if not, you often want to add new items. There are exceptions to this, like unit conversions that would never change, but for the most part you want to read data like this from a file into a collection of classes rather than an enum.
Adding new items is problematic because, since it's an enum, you need to physically modify your code unless you are ALWAYS using the enums as a collection, and if you are ALWAYS using them as a collection, why make them enums at all?
Then there is the case where your data doesn't change, like "conversion units" where you are converting feet, inches, etc. You COULD do this as enums and there WOULD be a lot of them, but by coding them as enums you lose the ability to have data drive your program. For instance, a user could select from a pull-down list populated by your "Units", but again, this is not an "ENUM" usage; it's using them as a collection.
The other problem will be repetition around the references to your enum. You will almost certainly have something very repetitive like:
if(userSelectedCard() == cards.HEARTS)
graphic=loadFile("Heart.jpg");
if(userSelectedCard() == cards.SPADES)
graphic=loadFile("Spade.jpg");
Which is just wrong (if you can squint to where you can't read the letters and still see this kind of pattern in your code, you KNOW you are doing it wrong).
If the cards were stored in a card collection, it would be easier to just use:
graphic=cards.getGraphicFor(userSelectedCard());
I'm not saying that this can't be done with an enum as well, but I am saying that I can't see how you would use these as enums without having some nasty code-block like the one I posted above.
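For what it's worth, here is a minimal sketch of how the enum variant could avoid the if-chain, by letting each constant carry its own data. Suit, CardGraphics and graphicFor are made-up names, and the Diamond.jpg and Club.jpg file names are assumptions extrapolated from the two files in the snippet above.

```java
// Hypothetical enum: each constant carries the name of its image file.
enum Suit {
    HEARTS("Heart.jpg"),
    SPADES("Spade.jpg"),
    DIAMONDS("Diamond.jpg"), // assumed file name
    CLUBS("Club.jpg");       // assumed file name

    private final String graphicFile;

    Suit(String graphicFile) {
        this.graphicFile = graphicFile;
    }

    String graphicFile() {
        return graphicFile;
    }
}

class CardGraphics {
    // One lookup replaces the chain of if statements.
    static String graphicFor(Suit selected) {
        return selected.graphicFile();
    }

    public static void main(String[] args) {
        for (Suit s : Suit.values()) {
            System.out.println(s + " -> " + graphicFor(s));
        }
    }
}
```

The file names are still hard-coded here, so this removes the repetition but not the need to recompile when the data changes.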
I'm also not saying that there aren't cases for enums--there are lots of them, but when you get more than a few (7 was a good number), you're probably better off with some other structure.
I guess the exception is when you are modeling real-world stuff that has that many types and each must be addressed with different code, but even then you are probably better off using a data file to bind a name to some code to run and storing them in a hash so you can invoke them with code like: hash.get(nameString).executeCode(). This way, again, your "nameString" is data and not hard-coded, allowing refactoring elsewhere.
If you get in the habit of brutally factoring your code like this, you can reduce many programs by 50% or more in size.
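The name-to-code lookup described above can be sketched with a plain map. This is one possible shape under stated assumptions, not a prescribed design: ActionRegistry, register and run are made-up names, Runnable stands in for the "code to run", and in a real program the names would come from a data file rather than from main.

```java
import java.util.HashMap;
import java.util.Map;

class ActionRegistry {
    // Maps a name (plain data) to the code that should run for it.
    private final Map<String, Runnable> actions = new HashMap<>();

    void register(String name, Runnable action) {
        actions.put(name, action);
    }

    // Runs the action bound to the name; returns false if none is bound.
    boolean run(String name) {
        Runnable action = actions.get(name);
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        ActionRegistry registry = new ActionRegistry();
        registry.register("hearts", () -> System.out.println("load Heart.jpg"));
        registry.register("spades", () -> System.out.println("load Spade.jpg"));

        // The name is data, so it could equally come from user input or a file.
        registry.run("hearts");
        System.out.println("known name? " + registry.run("unknown"));
    }
}
```

Returning a boolean for unknown names is a design choice made for the sketch; throwing an exception or falling back to a default action would work just as well.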
If you have to ask, you're probably doing something wrong. The actual limit is probably fairly high, but an enum with more than 10 or so values would be highly suspect, I think. Break that up into related collections, or a type hierarchy, or something.
