Related
I am unable to get what are the scenarios where we need an immutable class.
Have you ever faced any such requirement? or can you please give us any real example where we should use this pattern.
The other answers seem too focused on explaining why immutability is good. It is very good and I use it whenever possible. However, that is not your question. I'll take your question point by point to try to make sure you're getting the answers and examples you need.
I am unable to get what are the scenarios where we need an immutable class.
"Need" is a relative term here. Immutable classes are a design pattern that, like any paradigm/pattern/tool, is there to make constructing software easier. Similarly, plenty of code was written before the OO paradigm came along, but count me among the programmers that "need" OO. Immutable classes, like OO, aren't strictly needed, but I going to act like I need them.
Have you ever faced any such requirement?
If you aren't looking at the objects in the problem domain with the right perspective, you may not see a requirement for an immutable object. It might be easy to think that a problem domain doesn't require any immutable classes if you're not familiar when to use them advantageously.
I often use immutable classes where I think of a given object in my problem domain as a value or fixed instance. This notion is sometimes dependent on perspective or viewpoint, but ideally, it will be easy to switch into the right perspective to identify good candidate objects.
You can get a better sense of where immutable objects are really useful (if not strictly necessary) by making sure you read up on various books/online articles to develop a good sense of how to think about immutable classes. One good article to get you started is Java theory and practice: To mutate or not to mutate?
I'll try to give a couple of examples below of how one can see objects in different perspectives (mutable vs immutable) to clarify what I mean by perspective.
... can you please give us any real example where we should use this pattern.
Since you asked for real examples I'll give you some, but first, let's start with some classic examples.
Classic Value Objects
Strings and integers are often thought of as values. Therefore it's not surprising to find that String class and the Integer wrapper class (as well as the other wrapper classes) are immutable in Java. A color is usually thought of as a value, thus the immutable Color class.
Counterexample
In contrast, a car is not usually thought of as a value object. Modeling a car usually means creating a class that has changing state (odometer, speed, fuel level, etc). However, there are some domains where it car may be a value object. For example, a car (or specifically a car model) might be thought of as a value object in an app to look up the proper motor oil for a given vehicle.
Playing Cards
Ever write a playing card program? I did. I could have represented a playing card as a mutable object with a mutable suit and rank. A draw-poker hand could be 5 fixed instances where replacing the 5th card in my hand would mean mutating the 5th playing card instance into a new card by changing its suit and rank ivars.
However, I tend to think of a playing card as an immutable object that has a fixed unchanging suit and rank once created. My draw poker hand would be 5 instances and replacing a card in my hand would involve discarding one of those instance and adding a new random instance to my hand.
Map Projection
One last example is when I worked on some map code where the map could display itself in various projections. The original code had the map use a fixed, but mutatable projection instance (like the mutable playing card above). Changing the map projection meant mutating the map's projection instance's ivars (projection type, center point, zoom, etc).
However, I felt the design was simpler if I thought of a projection as an immutable value or fixed instance. Changing the map projection meant having the map reference a different projection instance rather than mutating the map's fixed projection instance. This also made it simpler to capture named projections such as MERCATOR_WORLD_VIEW.
Immutable classes are in general much simpler to design, implement and use correctly. An example is String: the implementation of java.lang.String is significantly simpler than that of std::string in C++, mostly due to its immutability.
One particular area where immutability makes an especially big difference is concurrency: immutable objects can safely be shared among multiple threads, whereas mutable objects must be made thread-safe via careful design and implementation - usually this is far from a trivial task.
Update: Effective Java 2nd Edition tackles this issue in detail - see Item 15: Minimize mutability.
See also these related posts:
non-technical benefits of having string-type immutable
Downsides to immutable objects in Java?
Effective Java by Joshua Bloch outlines several reasons to write immutable classes:
Simplicity - each class is in one state only
Thread Safe - because the state cannot be changed, no synchronization is required
Writing in an immutable style can lead to more robust code. Imagine if Strings weren't immutable; Any getter methods that returned a String would require the implementation to create a defensive copy before the String was returned - otherwise a client may accidentally or maliciously break that state of the object.
In general it is good practise to make an object immutable unless there are severe performance problems as a result. In such circumstances, mutable builder objects can be used to build immutable objects e.g. StringBuilder
Hashmaps are a classic example. It's imperative that the key to a map be immutable. If the key is not immutable, and you change a value on the key such that hashCode() would result in a new value, the map is now broken (a key is now in the wrong location in the hash table.).
Java is practically one and all references. Sometimes an instance is referenced multiple times. If you change such an instance, it would be reflected into all its references. Sometimes you simply don't want to have this to improve robustness and threadsafety. Then an immutable class is useful so that one is forced to create a new instance and reassign it to the current reference. This way the original instance of the other references remain untouched.
Imagine how Java would look like if String was mutable.
Let's take an extreme case: integer constants. If I write a statement like "x=x+1" I want to be 100% confidant that the number "1" will not somehow become 2, no matter what happens anywhere else in the program.
Now okay, integer constants are not a class, but the concept is the same. Suppose I write:
String customerId=getCustomerId();
String customerName=getCustomerName(customerId);
String customerBalance=getCustomerBalance(customerid);
Looks simple enough. But if Strings were not immutable, then I would have to consider the possibility that getCustomerName could change customerId, so that when I call getCustomerBalance, I am getting the balance for a different customer. Now you might say, "Why in the world would someone writing a getCustomerName function make it change the id? That would make no sense." But that's exactly where you could get in trouble. The person writing the above code might take it as just obvious that the functions would not change the parameter. Then someone comes along who has to modify another use of that function to handle the case where where a customer has multiple accounts under the same name. And he says, "Oh, here's this handy getCustomer name function that's already looking up the name. I'll just make that automatically change the id to the next account with the same name, and put it in a loop ..." And then your program starts mysteriously not working. Would that be bad coding style? Probably. But it's precisely a problem in cases where the side effect is NOT obvious.
Immutability simply means that a certain class of objects are constants, and we can treat them as constants.
(Of course the user could assign a different "constant object" to a variable. Someone can write
String s="hello";
and then later write
s="goodbye";
Unless I make the variable final, I can't be sure that it's not being changed within my own block of code. Just like integer constants assure me that "1" is always the same number, but not that "x=1" will never be changed by writing "x=2". But I can be confidant that if I have a handle to an immutable object, that no function I pass it to can change it on me, or that if I make two copies of it, that a change to the variable holding one copy will not change the other. Etc.
We don't need immutable classes, per se, but they can certainly make some programming tasks easier, especially when multiple threads are involved. You don't have to perform any locking to access an immutable object, and any facts that you've already established about such an object will continue to be true in the future.
There are various reason for immutability:
Thread Safety: Immutable objects cannot be changed nor can its internal state change, thus there's no need to synchronise it.
It also guarantees that whatever I send through (through a network) has to come in the same state as previously sent. It means that nobody (eavesdropper) can come and add random data in my immutable set.
It's also simpler to develop. You guarantee that no subclasses will exist if an object is immutable. E.g. a String class.
So, if you want to send data through a network service, and you want a sense of guarantee that you will have your result exactly the same as what you sent, set it as immutable.
My 2 cents for future visitors:
2 scenarios where immutable objects are good choices are:
In multi-threading
Concurrency issues in multi-threaded environment can very well be solved by synchronization but synchronization is costly affair (wouldn't dig here on "why"), so if you are using immutable objects then there is no synchronization to solve concurrency issue because state of immutable objects cannot be changed, and if state cannot be changed then all threads can seamless access the object. So, immutable objects makes a great choice for shared objects in multi-threaded environment.
As key for hash based collections
One of the most important thing to note when working with hash based collection is that key should be such that its hashCode() should always return the same value for the lifetime of the object, because if that value is changed then old entry made into the hash based collection using that object cannot be retrieved, hence it would cause memory leak. Since state of immutable objects cannot be changed so they makes a great choice as key in hash based collection. So, if you are using immutable object as key for hash based collection then you can be sure that there will not be any memory leak because of that (of course there can still be memory leak when the object used as key is not referenced from anywhere else, but that's not the point here).
I'm going to attack this from a different perspective. I find immutable objects make life easier for me when reading code.
If I have a mutable object I am never sure what its value is if it's ever used outside of my immediate scope. Let's say I create MyMutableObject in a method's local variables, fill it out with values, then pass it to five other methods. ANY ONE of those methods can change my object's state, so one of two things has to occur:
I have to keep track of the bodies of five additional methods while thinking about my code's logic.
I have to make five wasteful defensive copies of my object to ensure that the right values get passed to each method.
The first makes reasoning about my code difficult. The second makes my code suck in performance -- I'm basically mimicking an immutable object with copy-on-write semantics anyway, but doing it all the time whether or not the called methods actually modify my object's state.
If I instead use MyImmutableObject, I can be assured that what I set is what the values will be for the life of my method. There's no "spooky action at a distance" that will change it out from under me and there's no need for me to make defensive copies of my object before invoking the five other methods. If the other methods want to change things for their purposes they have to make the copy – but they only do this if they really have to make a copy (as opposed to my doing it before each and every external method call). I spare myself the mental resources of keeping track of methods which may not even be in my current source file, and I spare the system the overhead of endlessly making unnecessary defensive copies just in case.
(If I go outside of the Java world and into, say, the C++ world, among others, I can get even trickier. I can make the objects appear as if they're mutable, but behind the scenes make them transparently clone on any kind of state change—that's copy-on-write—with nobody being the wiser.)
Immutable objects are instances whose states do not change once initiated.
The use of such objects is requirement specific.
Immutable class is good for caching purpose and it is thread safe.
By the virtue of immutability you can be sure that the behavior/state of the underlying immutable object do not to change, with that you get added advantage of performing additional operations:
You can use multiple core/processing(concurrent/parallel processing) with ease(as the sequence of operations will no longer matter.)
Can do caching for expensive operations (as you are sure of the same
result).
Can do debugging with ease(as the history of run will not be a concern
anymore)
Using the final keyword doesn't necessarily make something immutable:
public class Scratchpad {
public static void main(String[] args) throws Exception {
SomeData sd = new SomeData("foo");
System.out.println(sd.data); //prints "foo"
voodoo(sd, "data", "bar");
System.out.println(sd.data); //prints "bar"
}
private static void voodoo(Object obj, String fieldName, Object value) throws Exception {
Field f = SomeData.class.getDeclaredField("data");
f.setAccessible(true);
Field modifiers = Field.class.getDeclaredField("modifiers");
modifiers.setAccessible(true);
modifiers.setInt(f, f.getModifiers() & ~Modifier.FINAL);
f.set(obj, "bar");
}
}
class SomeData {
final String data;
SomeData(String data) {
this.data = data;
}
}
Just an example to demonstrate that the "final" keyword is there to prevent programmer error, and not much more. Whereas reassigning a value lacking a final keyword can easily happen by accident, going to this length to change a value would have to be done intentionally. It's there for documentation and to prevent programmer error.
Immutable data structures can also help when coding recursive algorithms. For example, say that you're trying to solve a 3SAT problem. One way is to do the following:
Pick an unassigned variable.
Give it the value of TRUE. Simplify the instance by taking out clauses that are now satisfied, and recur to solve the simpler instance.
If the recursion on the TRUE case failed, then assign that variable FALSE instead. Simplify this new instance, and recur to solve it.
If you have a mutable structure to represent the problem, then when you simplify the instance in the TRUE branch, you'll either have to:
Keep track of all changes you make, and undo them all once you realize the problem can't be solved. This has large overhead because your recursion can go pretty deep, and it's tricky to code.
Make a copy of the instance, and then modify the copy. This will be slow because if your recursion is a few dozen levels deep, you'll have to make many many copies of the instance.
However if you code it in a clever way, you can have an immutable structure, where any operation returns an updated (but still immutable) version of the problem (similar to String.replace - it doesn't replace the string, just gives you a new one). The naive way to implement this is to have the "immutable" structure just copy and make a new one on any modification, reducing it to the 2nd solution when having a mutable one, with all that overhead, but you can do it in a more efficient way.
One of the reasons for the "need" for immutable classes is the combination of passing everything by reference and having no support for read-only views of an object (i.e. C++'s const).
Consider the simple case of a class having support for the observer pattern:
class Person {
public string getName() { ... }
public void registerForNameChange(NameChangedObserver o) { ... }
}
If string were not immutable, it would be impossible for the Person class to implement registerForNameChange() correctly, because someone could write the following, effectively modifying the person's name without triggering any notification.
void foo(Person p) {
p.getName().prepend("Mr. ");
}
In C++, getName() returning a const std::string& has the effect of returning by reference and preventing access to mutators, meaning immutable classes are not necessary in that context.
They also give us a guarantee. The guarantee of immutability means that we can expand on them and create new patters for efficiency that are otherwise not possible.
http://en.wikipedia.org/wiki/Singleton_pattern
One feature of immutable classes which hasn't yet been called out: storing a reference to a deeply-immutable class object is an efficient means of storing all of the state contained therein. Suppose I have a mutable object which uses a deeply-immutable object to hold 50K worth of state information. Suppose, further, that I wish to on 25 occasions make a "copy" of my original (mutable) object (e.g. for an "undo" buffer); the state could change between copy operations, but usually doesn't. Making a "copy" of the mutable object would simply require copying a reference to its immutable state, so 20 copies would simply amount to 20 references. By contrast, if the state were held in 50K worth of mutable objects, each of the 25 copy operations would have to produce its own copy of 50K worth of data; holding all 25 copies would require holding over a meg worth of mostly-duplicated data. Even though the first copy operation would produce a copy of the data that will never change, and the other 24 operations could in theory simply refer back to that, in most implementations there would be no way for the second object asking for a copy of the information to know that an immutable copy already exists(*).
(*) One pattern that can sometimes be useful is for mutable objects to have two fields to hold their state--one in mutable form and one in immutable form. Objects can be copied as mutable or immutable, and would begin life with one or the other reference set. As soon as the object wants to change its state, it copies the immutable reference to the mutable one (if it hasn't been done already) and invalidates the immutable one. When the object is copied as immutable, if its immutable reference isn't set, an immutable copy will be created and the immutable reference pointed to that. This approach will require a few more copy operations than would a "full-fledged copy on write" (e.g. asking to copy an object which has been mutated since the last copy would require a copy operation, even if the original object is never again mutated) but it avoids the threading complexities that FFCOW would entail.
Why Immutable class?
Once an object is instantiated it state cannot be changed in lifetime. Which also makes it thread safe.
Examples :
Obviously String, Integer and BigDecimal etc. Once these values are created cannot be changed in lifetime.
Use-case :
Once Database connection object is created with its configuration values you might not need to change its state where you can use an immutable class
from Effective Java;
An immutable class is simply a class whose instances cannot be modified. All of
the information contained in each instance is provided when it is created and is
fixed for the lifetime of the object. The Java platform libraries contain many
immutable classes, including String, the boxed primitive classes, and BigInte-
ger and BigDecimal. There are many good reasons for this: Immutable classes
are easier to design, implement and use than mutable classes. They are less prone
to error and are more secure.
An immutable class is good for caching purposes because you don't have to worry about the value changes. Another benefit of an immutable class is that it is inherently thread-safe, so you don't have to worry about thread safety in case of a multi-threaded environment.
Just a theoretical question that could lead to some considerations in terms of design. What if you were to replace POJOs with this reusable class ? It might avoid some boilerplate code but what issues could it bring about ?
// Does not include failsafes, guards, defensive copying, whatever...
class MySingleGetterAndSetterClass{
private HashMap<String,Object> myProperties;
public SingleGetterAndSetter( String name ){
myProperties = new HashMap<String,Object>();
myProperties.put( "name", name );
}
public Object get( string propertyName ){
return myProperties.get( propertyName );
}
public Object set( string propertyName, Object value ){
myProperties.put( propertyName, value );
}
}
The main disadvantages
much slower
uses more memory
less type safety
more error prone
more difficult to maintain
more code to write/read
more thread safety problems (more ways to break) and more difficult to make thread safe.
harder to debug, note the order of fields can be arranged pseudo randomly, different for different objects of the same "type" making them harder to read.
more difficult to refactor
little or not support in code analysis.
no support in code completion.
BTW Some dynamic languages do exactly what you suggest and they have all these issues.
That would lead to very unstable code. None of your getting/setting would be compile-time checked. Generally you want your code to fail-fast, and compile-time is the absolute fastest that can be done.
To make it even relatively safe you'd have to have null-checks/exception handling all over the place, and then how do you consistently handle the case where the value isn't found, all over your code? It would get very bloated very fast.
Not compile checking.
You have to downcasting, this is not good.
Difficult to mantain.
Against OOP,
Your pojos are classes represents an abstraction of something in real world.
If i understood well you want to put their properties inside a map, this is not a good design. Your are against using OOP. If you think in this way you can take all classes in a single big String and search them by position and this would be better than having only a dictionary with property as key.
I overheard two of my colleagues arguing about whether or not to create a new data model class which only contains one string field and a setter and a getter for it. A program will then create a few objects of the class and put them in an array list. The guy who is storing them argue that there should be a new type while the guy who is getting the data said there is not point going through all this trouble while you can simple store string.
Personally I prefer creating a new type so we know what's being stored in the array list, but I don't have strong arguments to persuade the 'getting' data guy. Do you?
Sarah
... a new data model class which only contains one string field and a setter and a getter for it.
If it was just a getter, then it is not possible to say in general whether a String or a custom class is better. It depends on things like:
consistency with the rest of your data model,
anticipating whether you might want to change the representation,
anticipating whether you might want to implement validation when creating an instance, add helper methods, etc,
implications for memory usage or persistence (if they are even relevant).
(Personally, I would be inclined to use a plain String by default, and only use a custom class if for example, I knew that it was likely that a future representation change / refinement would be needed. In most situations, it is not a huge problem to change a String into custom class later ... if the need arises.)
However, the fact that there is proposed to be a setter for the field changes things significantly. Instances of the class will be mutable, where instances of String are not. On the one hand this could possibly be useful; e.g. where you actually need mutability. On the other hand, mutability would make the class somewhat risky for use in certain contexts; e.g. in sets and as keys in maps. And in other contexts you may need to copy the instances. (This would be unnecessary for an immutable wrapper class or a bare String.)
(The simple answer is to get rid of the setter, unless you really need it.)
There is also the issue that the semantics of equals will be different for a String and a custom wrapper. You may therefore need to override equals and hashCode to get a more intuitive semantic in the custom wrapper case. (And that relates back to the issue of a setter, and use of the class in collections.)
Wrap it in a class, if it matches the rest of your data model's design.
It gives you a label for the string so that you can tell what it represents at run time.
It makes it easier to take your entity and add additional fields, and behavior. (Which can be a likely occurrence>)
That said, the key is if it matches the rest of your data model's design... be consistent with what you already have.
Counterpoint to mschaef's answer:
Keep it as a string, if it matches the rest of your data model's design. (See how the opening sounds so important, even if I temper it with a sentence that basically says we don't know the answer?)
If you need a label saying what it is, add a comment. Cost = one line, total. Heck, for that matter, you need a line (or three) to comment your new class, anyway, so what's the class declaration for?
If you need to add additional fields later, you can refactor it then. You can't design for everything, and if you tried, you'd end up with a horrible mess.
As Yegge says, "the worst thing that can happen to a code base is size". Add a class declaration, a getter, a setter, now call those from everywhere that touches it, and you've added size to your code without an actual (i.e., non-hypothetical) purpose.
I disagree with the other answers:
It depends whether there's any real possibility of adding behavior to the type later [Matthew Flaschen]
No, it doesn’t. …
Never hurts to future-proof the design [Alex]
True, but not relevant here …
Personally, I would be inclined to use a plain String by default [Stephen C]
But this isn’t a matter of opinion. It’s a matter of design decisions:
Is the entity you store logically a string, a piece of text? If yes, then store a string (ignoring the setter issue).
If not – then do not store a string. That data may be stored as a string is an implementation detail, it should not be reflected in your code.
For the second point it’s irrelevant whether you might want to add behaviour later on. All that matters is that in a strongly typed language, the data type should describe the logical entity. If you handle things that are not text (but may be represented by text, may contain text …) then use a class that internally stores said text. Do not store the text directly.
This is the whole point of abstraction and strong typing: let the types represent the semantics of your code.
And finally:
As Yegge says, "the worst thing that can happen to a code base is size". [Ken]
Well, this is so ironic. Have you read any of Steve Yegge’s blog posts? I haven’t, they’re just too damn long.
It depends whether there's any real possibility of adding behavior to the type later. Even if the getters and setters are trivial now, a type makes sense if there is a real chance they could do something later. Otherwise, clear variable names should be sufficient.
In the time spent discussing whether to wrap it in a class, it could be wrapped and done with. Never hurts to future-proof the design, especially when it only takes minimal effort.
I see no reason why the String should be wrapped in a class. The basic perception behind the discussion is, the need of time is a String object. If it gets augmented later, get it refactored then. Why add unnecessary code in the name of future proofing.
Wrapping it in a class provides you with more type safety - in your model you can then only use instances of the wrapper class, and you can't easily make a mistake where you put a string that contains something different into the model.
However, it does add overhead, extra complexity and verbosity to your code.
I'm looking at some Java code that are maintained by other parts of the company, incidentally some former C and C++ devs. One thing that is ubiquitous is the use of static integer constants, such as
class Engine {
private static int ENGINE_IDLE = 0;
private static int ENGINE_COLLECTING = 1;
...
}
Besides a lacking 'final' qualifier, I'm a bit bothered by this kind of code. What I would have liked to see, being trained primarily in Java from school, would be something more like
class Engine {
private enum State { Idle, Collecting };
...
}
However, the arguments fail me. Why, if at all, is the latter better than the former?
Why, if at all, is the latter better
than the former?
It is much better because it gives you type safety and is self-documenting. With integer constants, you have to look at the API doc to find out what values are valid, and nothing prevents you from using invalid values (or, perhaps worse, integer constants that are completely unrelated). With Enums, the method signature tells you directly what values are valid (IDE autocompletion will work) and it's impossible to use an invalid value.
The "integer constant enums" pattern is unfortunately very common, even in the Java Standard API (and widely copied from there) because Java did not have Enums prior to Java 5.
An excerpt from the official docs, http://java.sun.com/j2se/1.5.0/docs/guide/language/enums.html:
This pattern has many problems, such as:
Not typesafe - Since a season is just an int you can pass in any other int value where a season is required, or add two seasons together (which makes no sense).
No namespace - You must prefix constants of an int enum with a string (in this case SEASON_) to avoid collisions with other int enum types.
Brittleness - Because int enums are compile-time constants, they are compiled into clients that use them. If a new constant is added between two existing constants or the order is changed, clients must be recompiled. If they are not, they will still run, but their behavior will be undefined.
Printed values are uninformative - Because they are just ints, if you print one out all you get is a number, which tells you nothing about what it represents, or even what type it is.
And this just about covers it. A one word argument would be that enums are just more readable and informative.
One more thing is that enums, like classes. can have fields and methods. This gives you the option to encompass some additional information about each type of state in the enum itself.
Because enums provide type safety. In the first case, you can pass any integer and if you use enum you are restricted to Idle and Collecting.
FYI : http://www.javapractices.com/topic/TopicAction.do?Id=1.
By using an int to refer to a constant, you're not forcing someone to actually use that constant. So, for example, you might have a method which takes an engine state, to which someone might happy invoke with:
engine.updateState(1);
Using an enum forces the user to stick with the explanatory label, so it is more legible.
There is one situation when static constance is preferred (rather that the code is legacy with tonne of dependency) and that is when the member of that value are not/may later not be finite.
Imagine if you may later add new state like Collected. The only way to do it with enum is to edit the original code which can be problem if the modification is done when there are already a lot of code manipulating it. Other than this, I personally see no reason why enum is not used.
Just my thought.
Readabiliy - When you use enums and do State.Idle, the reader immediately knows that you are talking about an idle state. Compare this with 4 or 5.
Type Safety - When use enum, even by mistake the user cannot pass a wrong value, as compiler will force him to use one of the pre-declared values in the enum. In case of simple integers, he could even pass -3274.
Maintainability - If you wanted to add a new state Waiting, then it would be very easy to add new state by adding a constant Waiting in your enum State without casuing any confusion.
The reasons from the spec, which Lajcik quotes, are explained in more detail in Josh Bloch's Effective Java, Item 30. If you have access to that book, I'd recommend perusing it. Java Enums are full-fledged classes which is why you get compile-time type safety. You can also give them behavior, giving you better encapsulation.
The former is common in code that started pre-1.5. Actually, another common idiom was to define your constants in an interface, because they didn't have any code.
Enums also give you a great deal of flexibility. Since Enums are essentially classes, you can augment them with useful methods (such as providing an internationalized resource string corresponding to a certain value in the enumeration, converting back and forth between instances of the enum type and other representations that may be required, etc.)
Assuming you have a hypothetical enum in java like this (purely for demonstration purposes, this isn't code i'm seriously expecting to use):
enum Example{
FIRST,
SECOND,
THIRD,
...
LAST;
}
What's the maximum number of members you could have inside that enum before the compiler stops you?
Secondly, is there any performance difference at runtime when your code is referencing an enum with say, 10 members as opposed to 100 or 1,000 (other than just the obvious memory overhead required to store the large class)?
The language specification itself doesn't have a limit. Yet, there are many limitations that classfile has that bound the number of enums, with the upper bound being aruond 65,536 (2^16) enums:
Number of Fields
The JVMS 4.1 specifies that ClassFile may have up to 65,536 (2^16) fields. Enums get stored in the classfile as static field, so the maximum number of enum values and enum member fields is 65,536.
Constant Pool
The JVMS also specifies that the Constant Pool may have up to 65,536. Constant Pools store all String literals, type literals, supertype, super interfaces types, method signatures, method names, AND enum value names. So there must be fewer than 2^16 enum values, since the names strings need to share that Constant Pool limit.
Static Method Initialization
The maximum limit for a method is 65,535 bytes (in bytecode). So the static initializer for the Enum has to be smaller than 64Kb. While the compiler may split it into different methods (Look at Bug ID: 4262078) to distribute the initializations into small blocks, the compiler doesn't do that currently.
Long story short, there is no easy answer, and the answer depends not only on the number of enum values there are, but also the number of methods, interfaces, and fields the enums have!
The best way to find out the answer to this type of question is to try it. Start with a little Python script to generate the Java files:
n = input()
print "class A{public static void main(String[] a){}enum B{"
print ','.join("C%d" % x for x in range(n))
print '}}'
Now try with 1,10,100,1000... works fine, then BAM:
A.java:2: code too large
C0,C1,C2,C3,C4,C5,C6,C7,C8,C9,C10,C11,C12,C13,C14,C15,C16,C17,C18,C19,C20,C21,C22,...
Seems like I hit some sort of internal limit. Not sure if it's a documented limit, if it's dependent on the specific version of my compiler, or if its some system dependant limit. But for me the limit was around 3000 and appears to be related to the source code size. Maybe you could write your own compiler to bypass this limit.
The maximum number of enum values will I think be just under the 65536 maximum number of fields/constant pool entries in the class. (As I mentioned in a comment above, the actual values shouldn't take up constant pool entries: they can be "inlined" into the bytecode, but the names will.)
As far as the second question is concerned, there's no direct performance difference, but it's conceivable that there'll be small indirect performance differences, partly because of the class file size as you say. Another thing to bear in mind is that when you use enum collections, there are optimised versions of some of the classes for when all of the enum values fit within a certain range (a byte, as I recall). So yes, there could be a small difference. I woudln't get paranoid, though.
This is an extension of the comments to the original question.
There are multiple problems with having a LOT of enums.
The main reason is that when you have a lot of data it tends to change, or if not you often want to add new items. There are exemptions to this like unit conversions that would never change, but for the most part you want to read data like this from a file into a collection of classes rather than an enum.
To add new items is problematic because since it's an enum, you need to physically modify your code unless you are ALWAYS using the enums as a collection, and if you are ALWAYS using them as a collection, why make them enums at all?
The case where your data doesn't change--like "conversion units" where you are converting feet, inches, etc. You COULD do this as enums and there WOULD be a lot of them, but by coding them as enums you lose the ability to have data drive your program. For instance, a user could select from a pull-down list populated by your "Units", but again, this is not an "ENUM" usage, it's using it as a collection.
The other problem will be repetition around the references to your enum. You will almost certainly have something very repetitive like:
if(userSelectedCard() == cards.HEARTS)
graphic=loadFile("Heart.jpg");
if(userSelectedCard() == cards.SPADES)
graphic=loadFile("Spade.jpg");
Which is just wrong (If you can squint to where you can't read the letters and see this kind of pattern in your code, you KNOW you are doing it wrong).
If the cards were stored in a card collection, it would be easier to just use:
graphic=cards.getGraphicFor(userSelectedCard());
I'm not saying that this can't be done with an enum as well, but I am saying that I can't see how you would use these as enums without having some nasty code-block like the one I posted above.
I'm also not saying that there aren't cases for enums--there are lots of them, but when you get more than a few (7 was a good number), you're probably better off with some other structure.
I guess the exception is when you are modeling real-world stuff that has that many types and each must be addressed with different code, but even then you are probably better off using a data file to bind a name to some code to run and storing them in a hash so you can invoke them with code like: hash.get(nameString).executeCode(). This way, again, your "nameString" is data and not hard-coded, allowing refactoring elsewhere.
If you get in the habit of brutally factoring your code like this, you can reduce many programs by 50% or more in size.
If you have to ask, you're probably doing something wrong. The actual limit is probably fairly high, but an enum with more than 10 or so values would be highly suspect, I think. Break that up into related collections, or a type hierarchy, or something.