Changing Implementation/Class at runtime - java

I am looking for real world examples of (open source) programs (or algorithms) that change the concrete class of an object (or variable) at runtime.
An example of such behaviour in Java could look like the code snippet below.
Here, a LinkedList, which performs well in the context of frequent inserts and/or removes, is changed into an ArrayList, which performs well in the context of random access and iteration.
List<Integer> myList = new LinkedList<>();
/* Lots of inserts */
...
myList = new ArrayList<>(myList); // 'change' into a different class
/* Lots of iteration */
...
The Java example above switches between LinkedList and ArrayList for the sake of performance.
However, examples in any language, for any data structure, using any technique*, and for any reason are welcome.
*Technique: plain and simple as in the example above, or using become: in Smalltalk, or __class__ in Python, or ...

You might want to check use cases for the become: method in Smalltalk. It changes the class of an instance at runtime (or, more precisely, it swaps all references to one instance so that they reference a different instance).
become: is commonly used to grow/shrink collections, e.g. a Dictionary with more buckets, a ByteArray with a bigger buffer, etc. It would also make it possible to convert a SmallInteger into a BigInteger (the former are limited in size, the latter are not, but are much slower), and the programmer wouldn't even notice. (That is reasonable only if you have mutable integers, so it is not how this is actually done in Smalltalk. But it could be :)
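For comparison, here is a minimal Java sketch of that promotion idea. Java cannot rebind existing references the way become: does, so a mutable wrapper has to swap its internal representation instead (AdaptiveInt is an invented name, not an existing class):

import java.math.BigInteger;

// Sketch: a mutable integer that silently promotes itself from a long
// to a BigInteger on overflow -- the wrapper's identity stays stable,
// only its internal representation changes.
final class AdaptiveInt {
    private long small;     // fast representation
    private BigInteger big; // null until the first overflow

    AdaptiveInt(long value) { this.small = value; }

    void add(long v) {
        if (big != null) {
            big = big.add(BigInteger.valueOf(v));
            return;
        }
        try {
            small = Math.addExact(small, v); // stay small while we can
        } catch (ArithmeticException overflow) {
            big = BigInteger.valueOf(small).add(BigInteger.valueOf(v));
        }
    }

    @Override public String toString() {
        return big != null ? big.toString() : Long.toString(small);
    }
}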
Another case might be loading an instance from its serialized form back into the running system and updating its class to the newest version.

Yes, look at #become in Smalltalk (for instance the MIT-licensed Pharo.org).
Besides the examples already given, #become is useful, for instance, when you work with proxies. Think of a proxy object within an ORM framework like Glorp: you start with the proxy, and when the real, full object is needed it can be loaded from the database, and all references are switched over easily.
Another example is the Fuel serialization framework in Pharo.
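In Java, which has no #become, the closest idiom for that ORM-proxy use case is a delegating proxy that loads the real object on first use. A minimal sketch (Customer and CustomerProxy are invented names, not Glorp API):

import java.util.function.Supplier;

interface Customer {
    String name();
}

// The proxy stays in place and forwards every call to the lazily
// loaded target; unlike #become, existing references keep pointing
// at the proxy, which is why every method must delegate.
class CustomerProxy implements Customer {
    private final Supplier<Customer> loader; // e.g. a database lookup
    private Customer real;                   // null until first access

    CustomerProxy(Supplier<Customer> loader) { this.loader = loader; }

    private Customer real() {
        if (real == null) real = loader.get(); // load on demand, once
        return real;
    }

    @Override public String name() { return real().name(); }
}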

I don't know if this is relevant, but maybe the use of spies (partial mocks) also fits your description (see http://docs.mockito.googlecode.com/hg/1.9.5/org/mockito/Spy.html):
An example:
import static org.mockito.Mockito.*;

Person person = new Person(); // any ordinary object
person = spy(person);         // wrap it in a spying subclass instance
doReturn("dominiek").when(person).getName();
Behind the scenes, a subclass is created and the behavior of the class is altered according to the user's stubbing declarations.

I've just run across an instance of this in the (Python) NLTK source. The LazyCorpusLoader (an object used to load a dataset from disk) "morphs" into the dataset itself. Here's the relevant section of the linked source code (creating a dataset object and then becoming it):
corpus = self.__reader_cls(root, *self.__args, **self.__kwargs)
# This is where the magic happens! Transform ourselves into
# the corpus by modifying our own __dict__ and __class__ to
# match that of the corpus.
args, kwargs = self.__args, self.__kwargs
name, reader_cls = self.__name, self.__reader_cls
self.__dict__ = corpus.__dict__
self.__class__ = corpus.__class__
Here's the rationale given (in the header of the same file) for this technique:
LazyCorpusLoader is a proxy object which is used to stand in for a
corpus object before the corpus is loaded. This allows NLTK to
create an object for each corpus, but defer the costs associated
with loading those corpora until the first time that they're
actually accessed.
So the purpose of changing the class at runtime in this case is to emulate lazy evaluation.
(Edit: Since I'm quoting verbatim from the NLTK source (Apache 2.0 license), here's the mandatory link to the license itself: http://www.apache.org/licenses/LICENSE-2.0)


How to write new methods in predefined java classes

I want a swap method for array objects that can swap two numbers in the array:
int[] a = new int[5];
a.swap(i, j);
This would swap the elements at indexes i and j.
I don't want yet another method that takes the array as an argument and does the swap, like this:
swap(a, i, j), where a is the array object and i and j are indexes.
This is not possible in Java. The concept (where code is grafted onto a type, the central notion being that the grafted-on code lives separately from the code that defines the original type) is called 'extension methods' or 'monkey patching'.
'Extension methods' tends to refer to the notion that at some level of your source code, such as per file, per directory, per package, or per project, you explicitly declare a repository of extensions, and these then apply to that entire file/directory/package/project. It's a compile-time concept.
'Monkey patching' refers to the notion that at runtime you dynamically extend a class or even a single object; this fits better in languages where 'everything is a dictionary', such as JavaScript or Python.
Java is the kind of language where you'd have extension methods.
As of early November 2019 (source: I asked Brian Goetz at Devoxx 2019 about it), this is intentional, and the current shepherds of the language have no intention of changing it; they believe it is better to be explicit about where source lives.
NB: You can use Project Lombok's @ExtensionMethod feature (of which I am a developer) if you simply must have them.
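For reference, a minimal sketch of what the Lombok route looks like, assuming Lombok is on the classpath (ArrayExtensions and Demo are made-up names):

import lombok.experimental.ExtensionMethod;

class ArrayExtensions {
    // Any public static method whose first parameter matches the
    // receiver type becomes callable as if it were an instance method.
    public static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}

@ExtensionMethod(ArrayExtensions.class)
class Demo {
    void run() {
        int[] a = {1, 2, 3, 4, 5};
        a.swap(0, 4); // Lombok compiles this as ArrayExtensions.swap(a, 0, 4)
    }
}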

Where to patch back the information gathered during program analysis

I'm new to compiler design, with a few years of Java experience.
Using this and the paper, it looks like class hierarchy analysis and rapid type analysis gather the information needed to do de-virtualization. But where do I patch that information back in: the source code or the bytecode? And how do I check the results?
I'm trying to understand how things really happen, but I'm stuck here.
For example, we have this example program taken from the paper specified above:
public class MyProgram {
    public static void main(String[] args) {
        EUCitizen citizen = getCitizen();
        citizen.hasRightToVote(); // Call site 1
        Estonian estonian = getEstonian();
        estonian.hasRightToVote(); // Call site 2
    }

    private static EUCitizen getCitizen() {
        return new Estonian();
    }

    private static Estonian getEstonian() {
        return new Estonian();
    }
}
Using the class hierarchy method we can conclude that, since none of the subclasses override hasRightToVote(), the dynamic method invocation can be replaced with a static procedure call to Estonian#hasRightToVote(). But where do we put that information, and how? How do we tell (feed) the JVM the information we have gathered during analysis?
We can't change the source code and put it there, can we? Could anyone provide an example, so I can start trying new ways to do analysis and still be able to patch that information back?
Thanks.
Class hierarchy analysis is an optimization done by the virtual machine itself at runtime; you do not have to tell the VM anything. It simply does the analysis by itself based on the information available in the class files.
What generally happens is that analysis results are typically stored in some kind of association with a program representation, or are used immediately to effect the optimization, so "nothing" needs to be stored.
You are right: there is generally no "good" way to annotate the source code with an analysis result (you could use Java annotations as a way). And the compiler has already read the source code; it isn't going to read it again.
In general, the program is parsed and a variety of compiler-like structures are built (ASTs, symbol tables, control flow graphs, data flow arcs, ...) by the compiler pretty much before any serious analysis/optimization begins. A low-level model of the program (data flow over the operators) is normally what gets analyzed, and the optimization analyzer will either decorate this structure with its opinions or, often, just directly modify the structure to achieve the effect of the optimization.
With Java, there are two opportunities to do this: in JavaC, and in the JITter. My understanding (probably wrong, and it probably varies across JavaC implementations) is that not much optimization occurs in JavaC at all; it just generates naive JVM bytecode, and all the real work is done in the JITter. The JITter doesn't have source code, but it can do all the same kinds of analysis (control flow, data flow, ...) on the bytecode that one can do on classic compiler structures, and thus achieve the same effect.
I had some doubts about the same thing, and Rohan Padhey cleared them up for me.
In Java, I don't think there is a way to specify monomorphism of virtual method calls in bytecode. The de-virtualization analysis usually happens in the JIT compiler, which compiles bytecode to native code, and it does so using dynamic analysis.
Why patching is a problem:
In Java bytecode, the only method call instructions are invokestatic, invokedynamic, invokevirtual, invokeinterface and invokespecial (the last is used for constructors, etc.). The only type of call that does not involve a virtual method table lookup is invokestatic, since static methods cannot be overridden and used polymorphically on objects.
Hence, while there is no way to specify the target method at compile time, you can replace virtual calls with static calls. How? Consider an object "x" with a method "foo", and a call site:
x.foo(arg1, arg2, ...)
If you know for sure that "x" is of the class "A", then you can transform this to:
A.static_foo(x, arg1, arg2, ...)
where "static_foo" is a newly created static method in class A whose body contains exactly everything that the body of "foo()" in "A" would have done, except that references to "this" inside the body should now be replaced by the first parameter, whatever you may call it.
That is exactly what the Whole-Jimple-Optimization-Pack (WJOP) in Soot does.
As regards static analysis using Soot, there is an optimization pack that does devirtualization using a workaround: https://github.com/Sable/soot/wiki/Whole-program-Devirtualization-Optimizations
But that's just a hack.
Why the JIT does it better:
The JIT does this better because static analysis has to be sound: when doing this transformation you need to be sure that the target of the virtual call will be one class 100% of the time. With JIT compilation, you can find more opportunities for optimization: even if the target is a single class 90% of the time, but not the other 10%, you can just-in-time compile the code to use the most frequently taken route and fall back to the bytecode in the 10% of cases where this prediction is wrong, because you can check for the mistake dynamically. While the fallback is expensive, the common case of correct predictions 90% of the time leads to an overall benefit. With a static transformation, you have to decide whether or not to optimize, and that decision had better be sound.
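Conceptually, the speculative version of the call site behaves like the following sketch (pseudo-Java reusing the invented A/static_foo names from above; a real JIT emits machine code and deoptimizes on guard failure):

class GuardedCallSite {
    static int callFoo(A x, int arg) {
        if (x.getClass() == A.class) {   // cheap type guard, true ~90% of the time
            return A.static_foo(x, arg); // fast, direct (and inlinable) call
        }
        return x.foo(arg);               // slow path: full virtual dispatch
    }
}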

C++ STL datastructures compared to Java

I'm currently learning C++ and trying to get used to the standard data structures that come with it, but they all seem very bare. For example, list doesn't have simple accessors like get(index) that I'm used to in Java. Methods like pop_back and pop_front don't return the removed object either. So you have to do something like:
Object blah = myList.back();
myList.pop_back();
Instead of something simple like:
Object blah = myList.pop_back();
In Java, just about every data structure returns the removed object, so you don't have to make these extra calls. Why are the STL containers for C++ designed like this? Are common operations like these, which I do in Java, not so common in C++?
edit: Sorry, I guess my question was worded very poorly to get all these downvotes, but surely somebody could have edited it. To clarify: I'm wondering why the STL data structures are designed like this in comparison to Java's. Or am I using the wrong set of data structures to begin with? My point is that these seem like common operations you might use on (in my example) a list, and surely everybody does not want to write their own implementation each time.
edit: reworded the question to be more clear.
Quite a few have already answered the specific points you raised, so I'll try to take a look for a second at the larger picture.
One of the most fundamental differences between Java and C++ is that C++ works primarily with values, while Java works primarily with references.
For example, if I have something like:
class X {
    // ...
};
// ...
X x;
In Java, x is only a reference to an object of type X. To have an actual object of type X for it to refer to, I normally write something like X x = new X();. In C++, however, X x; by itself defines an object of type X, not just a reference to one. We can use that object directly, not via a reference (i.e., a pointer in disguise).
Although this may initially seem like a fairly trivial difference, the effects are substantial and pervasive. One effect (probably the most important in this case) is that in Java, returning a value does not involve copying the object itself at all. It just involves copying a reference to the value. This is normally presumed to be extremely inexpensive and (probably more importantly) completely safe -- it can never throw an exception.
In C++, you're dealing directly with values instead. When you return an object, you're not just returning a reference to the existing object, you're returning that object's value, usually in the form of a copy of that object's state. Of course, it's also possible to return a reference (or pointer) if you want, but to make that happen, you have to make it explicit.
The standard containers are (if anything) even more heavily oriented toward working with values rather than references. When you add a value to a collection, what gets added is a copy of the value you passed, and when you get something back out, you get a copy of the value that was in the container itself.
Among other things, this means that while returning a value might be cheap and safe just like in Java, it can also be expensive and/or throw an exception. If the programmer wants to store pointers, s/he can certainly do so -- but the language doesn't require it like Java does. Since returning an object can be expensive and/or throw, the containers in the standard library are generally built around ensuring they can work reasonably well if copying is expensive, and (more importantly) work correctly, even when/if copying throws an exception.
This basic difference in design accounts not only for the differences you've pointed out, but quite a few more as well.
back() returns a reference to the final element of the vector, which makes it nearly free to call. pop_back() calls the destructor of the final element of the vector.
So clearly pop_back() cannot return a reference to an element that is destroyed. So for your syntax to work, pop_back() would have to return a copy of the element before it is destroyed.
Now, in the case where you do not want that copy, we just needlessly made a copy.
The goal of C++ standard containers is to give you nearly bare-metal performance wrapped up in nice, easy to use dressing. For the most part, they do NOT sacrifice performance for ease of use -- and a pop_back() that returned a copy of the last element would be sacrificing performance for ease of use.
There could be a pop-and-get-back method, but it would duplicate other functionality. And it would be less efficient in many cases than back-and-pop.
As a concrete example,
vector<foo> vec; // with some data in it
foo f = std::move( vec.back() ); // tells the compiler that the copy in vec is going away
vec.pop_back(); // removes the last element
Note that the move has to be done before the element is destroyed, to avoid creating an extra temporary copy. A pop_back_and_get_value() would have to destroy the element before it returned, and the assignment would happen after it returned, which is wasteful.
A list doesn't have a get(index) method because accessing a linked list by index is very inefficient. The STL has a philosophy of only providing methods that can be implemented somewhat efficiently. If you want to access a list by index in spite of the inefficiency, it's easy to implement yourself.
The reason that pop_back doesn't return a copy is that the copy constructor of the return value will be called after the function returns (excluding RVO/NRVO). If this copy constructor throws an exception, you have removed the item from the list without properly returning a copy. This means that the method would not be exception-safe. By separating the two operations, the STL encourages programming in an exception-safe manner.
Why are the STL containers for C++ designed like this?
I think Bjarne Stroustrup put it best:
C++ is lean and mean. The underlying principle is that you don't pay for what you don't use.
In the case of a pop() method that would return the item, consider that in order to both remove the item and return it, that item could not be returned by reference. The referent no longer exists because it was just pop()ed. It could be returned by pointer, but only if you make a new copy of the original, and that's wasteful. So it would most likely be returned by value which has the potential to make a deep copy. In many cases it won't make a deep copy (through copy elision), and in other cases that deep copy would be trivial. But in some cases, such as large buffers, that copy could be extremely expensive and in a few, such as resource locks, it might even be impossible.
C++ is intended to be general-purpose, and it is intended to be as fast as possible. General-purpose doesn't necessarily mean "easy to use for simple use cases" but "an appropriate platform for the widest range of applications."
list doesn't even have simple accessors like get(index)
Why should it? A method that lets you access the n-th element of the list would hide the operation's O(n) complexity, and that's the reason C++ doesn't offer it. For the same reason, C++'s std::vector doesn't offer a pop_front() function, since that would also be O(n) in the size of the vector.
Methods like pop_back and pop_front don't return the object in the list either.
The reason is exception safety. Also, since C++ has free functions, it's not hard to write such an extension to the operations of std::list or any standard container.
template<class Cont>
typename Cont::value_type return_pop_back(Cont& c) {
    typename Cont::value_type v = c.back();
    c.pop_back();
    return v;
}
It should be noted, though, that the above function is not exception-safe, meaning if the return v; throws, you'll have a changed container and a lost object.
Concerning pop()-like functions, there are two things (at least) to consider:
1) There is no clear and safe action for a returning pop_back() or pop_front() for cases when there is nothing there to return.
2) These functions would return by value. If there were an exception thrown in the copy constructor of the type stored in the container, the item would be removed from the container and lost. I guess this was deemed to be undesirable and unsafe.
Concerning access to a list, it is a general design principle of the standard library to avoid providing inefficient operations. std::list is a doubly-linked list, and accessing a list element by index means traversing the list from the beginning or the end until you get to the desired position. If you want to do this, you can provide your own helper function. But if you need random access to elements, then you should probably use a structure other than a list.
In Java, a pop on a general interface can return a reference to the popped object.
In C++, the corresponding operation is to return by value.
But in the case of non-movable non-POD objects, the copy construction might throw an exception. Then an element would have been removed, yet never made accessible to the client code. A convenience return-by-value popper can always be defined in terms of the more basic inspector and pure popper, but not vice versa.
This is also a difference in philosophy.
With C++, the standard library only provides basic building blocks, not directly usable functionality (in general). The idea is that you're free to choose from thousands of third-party libraries, but that freedom of choice comes at a great cost in usability, portability, training, etc. In contrast, with Java you have mostly all you need (for typical Java programming) in the standard library, but you're not effectively free to choose (which is another kind of cost).

Refactoring large data object

What are some common strategies for refactoring large "state-only" objects?
I am working on a specific soft-real-time decision support system which does online modeling/simulation of the national airspace. This piece of software consumes a number of live data feeds, and produces a once-per-minute estimate of the "state" of a large number of entities in the airspace. The problem breaks down neatly until we hit what is currently the lowest-level entity.
Our mathematical model estimates/predicts upwards of 50 parameters for a timeline of several hours into the past and future for each of these entities, roughly once per minute. Currently, these records are encoded as a single Java class with a lot of fields (some get collapsed into an ArrayList). Our model is evolving, and the dependencies among the fields are not yet set in stone, so each instance wanders through a convoluted model, accumulating settings as it goes along.
Currently we have something like the following, which uses a builder-pattern approach to build up the contents of the record and enforce the known dependencies (as a check against programmer error as we evolve the model). Once the estimate is done, we convert the below into an immutable form using a .build()-type method.
final class OneMinuteEstimate {
    enum EstimateState { INFANT, HEADER, INDEPENDENT, ... };
    EstimateState state = EstimateState.INFANT;

    // "header" stuff
    DateTime estimatedAtTime = null;
    DateTime stamp = null;
    EntityId id = null;

    // independent fields
    int status1 = -1;
    ...

    // dependent/complex fields...
    ... goes on for 40+ more fields...

    void setHeaderFields(...)
    {
        if (!EstimateState.INFANT.equals(state)) {
            throw new IllegalStateException("Must be in INFANT state to set header");
        }
        ...
    }
}
Once a very large number of these estimates are complete, they are assembled into timelines where aggregate patterns/trends are analyzed. We have looked at using an embedded database but have struggled with performance issues; we'd rather get this sorted out in terms of data modeling and then incrementally move portions of the soft-real-time code into an embedded data store.
Once the "time sensitive" pieces of this are done, the products are flushed to flat files and a database.
Problems:
It's a giant class, with way too many fields.
There is very little behavior encoded in the class; it's mostly a holder for data fields.
Maintaining the build() method is extremely cumbersome.
It feels clumsy to manually maintain a "state machine" abstraction merely for the purpose of ensuring that a large number of dependent modeling components are properly populating a data object, but it has saved us a lot of frustration as the model evolves.
There is a lot of duplication, particularly when the records described above are aggregated into very similar "rollups" which amount to rolling sums/averages or other statistical products of the above structure in time series.
While some of the fields could be clumped together, they are all logically "peers" of one another, and any breakdown we've tried has resulted in having behavior/logic artificially split and needing to reach two levels deep in indirection.
Out-of-the-box ideas are entertained, but this is something we need to evolve incrementally. Before anyone else says it, I'll note that one could suggest that our mathematical model is insufficiently crisp if the data representation for that model is this hard to get hold of. Fair point, and we're working on that, but I think that's a side effect of an R&D environment with a lot of contributors and a lot of concurrent hypotheses in play.
(Not that it matters, but this is implemented in Java. We use HSQLDB or Postgres for output products. We don't use any persistence framework, partly out of a lack of familiarity, partly because we have enough performance trouble with just the database alone and hand-coded storage routines... we're skeptical of moving towards additional abstraction.)
I had much the same problem you did.
At least I think I did; it sounds like I did. The representation was different, but at 10,000 feet it sounds pretty much the same: a crapload of discrete, "arbitrary" variables and a bunch of ad hoc relationships among them (essentially business-driven), subject to change at a moment's notice.
You also have another issue, which you sort of mentioned: the performance requirement. It sounds like faster is better, and a slow perfect solution would likely be tossed out for the fast lousy one, simply because the slower one can't meet the baseline performance requirement, no matter how good it is.
To put it simply, what I did was I designed a simple domain specific rule language for my system.
The entire point of the DSL was to express the relationships implicitly and package them up into modules.
Very crude, contrived example:
D = 7
C = A + B
B = A / 5
A = 10
RULE 1: IF (C < 10) ALERT "C is less than 10"
RULE 2: IF (C > 5) ALERT "C is greater than 5"
RULE 3: IF (D > 10) ALERT "D is greater than 10"
MODULE 1: RULE 1
MODULE 2: RULE 3
MODULE 3: RULE 1, RULE 2
First, this is not representative of my syntax.
But you can see from the modules that it's just 3 simple rules.
The key, though, is that it's obvious from this that Rule 1 depends on C, which depends on A and B, and B in turn depends on A. Those relationships are implied.
So, for that module, all of those dependencies "come with it". You can see that if I generated code for Module 1, it might look something like:
public void module_1() {
    int a = 10;
    int b = a / 5;
    int c = a + b;
    if (c < 10) {
        alert("C is less than 10");
    }
}
Whereas if I created Module 2, all I would get is:
public void module_2() {
    int d = 7;
    if (d > 10) {
        alert("D is greater than 10.");
    }
}
In Module 3 you see the "free" reuse:
public void module_3() {
    int a = 10;
    int b = a / 5;
    int c = a + b;
    if (c < 10) {
        alert("C is less than 10");
    }
    if (c > 5) {
        alert("C is greater than 5");
    }
}
So, even though I have one "soup" of rules, the Modules root the base of the dependencies, and thus filter out the stuff it doesn't care about. Grab a module, shake the tree and keep what's left hanging.
My system used the DSL to generate source code, but you can easily have it create a mini runtime interpreter as well.
Simple topological sorting handled the dependency graph for me.
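For the curious, here is a minimal sketch of that sorting step (all names invented; a real system would extract the dependency map by parsing the rule expressions, and would also detect cycles):

import java.util.*;

// Depth-first topological sort over a variable-dependency map.
// For C = A + B and B = A / 5 the map is {C=[A,B], B=[A], A=[]}
// and the emitted order is A, B, C: each variable after its deps.
class RuleSorter {
    static List<String> order(Map<String, List<String>> dependsOn) {
        List<String> out = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        for (String v : dependsOn.keySet()) visit(v, dependsOn, visited, out);
        return out;
    }

    private static void visit(String v, Map<String, List<String>> dependsOn,
                              Set<String> visited, List<String> out) {
        if (!visited.add(v)) return;                 // already emitted
        for (String d : dependsOn.getOrDefault(v, Collections.emptyList()))
            visit(d, dependsOn, visited, out);       // dependencies first
        out.add(v);                                  // post-order = topo order
    }
}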
So, the nice thing about this is that while there was inevitable duplication in the final, generated logic, at least across modules, there wasn't any duplication in the rule base. What you as a developer/knowledge worker maintain is the rule base.
What is also nice is that you can change an equation and not worry so much about the side effects. For example, if I change C to C = A / 2, then suddenly B drops out completely. But the rule IF (C < 10) doesn't change at all.
With a few simple tools, you can show the entire dependency graph, you can find orphaned variables (like B), etc.
By generating source code, it's going to run as fast as you want.
In my case, it was interesting to see a rule drop a single variable and see 500 lines of source code vanish from the resulting module. That's 500 lines I didn't have to crawl through by hand and remove during maintenance and development. All I had to do was change a single rule in my rule base and let "magic" happen.
I was even able to do some simple peephole optimization and eliminate variables.
It's not that hard to do. Your rule language can be XML or a simple expression parser. No reason to go full-boat Yacc or ANTLR on it if you don't want to. I'll put in a plug for S-expressions: no grammar needed, brain-dead parsing.
Spreadsheets also make a great input tool, actually. Just be strict about the formatting. They kind of suck for merging in SVN (so don't do that), but end users love them.
You may well be able to get away with an actual rule based system. My system wasn't dynamic at runtime, and didn't really need sophisticated goal seeking and inference, so I didn't need the overhead of such a system. But if one works for you out of the box, then happy day.
Oh, and for an implementation note, for those who don't believe you can hit the 64K code limit in a Java method, well I can assure you it can be done :).
Splitting a Large Data Object is very similar to Normalizing a Large Relational Table (first and second normal form). Follow the rules to reach at least second normal form and you may have a good decomposition of the original class.
From experience also working on R&D stuff with soft-real-time performance constraints (and sometimes monster fat classes), I would suggest NOT using OR mappers. In such situations, you'll be better off working "close to the metal", directly with JDBC result sets. This is my suggestion for apps with soft-real-time constraints and massive numbers of data items per package. More importantly, if the number of distinct classes (not class instances, but class definitions) that need to be persisted is large, and you also have memory constraints in your specs, you will also want to avoid ORMs like Hibernate.
Going back to your original question:
What you seem to have is a typical problem of 1) mapping multiple data items onto an OO model and 2) those multiple data items not exhibiting a good way of being grouped or segregated (and any attempt at grouping tends simply not to feel right). Sometimes the domain model does not lend itself to such aggregation, and coming up with an artificial way of doing so typically ends in compromises that don't satisfy all design requirements and desires.
To make matters worse, an OO model typically requires/expects you to have all the items present as fields in a class. Such a class is typically without behavior, so it is just a struct-like construct, a.k.a. a data envelope or data shuttle.
But such situations beg the following questions:
Does your application need to read/write all 40-50+ data items at once, always?
Must all data items always be present?
I do not know the specifics of your problem domain, but in general I've found that we rarely need to deal with all data items at once. This is where the relational model shines, because you don't have to fetch every column of a table at once; you pull only those you need as projections of the table/view in question.
In a situation where we have a potentially large number of data items, but on average the number of data items being passed down the wire is less than the maximum, you'd be better off using a Properties pattern.
Instead of defining a monster envelope class holding all items :
// java pseudocode
class envelope
{
    field1, field2, field3, ..., field_n;
    ...
    setFields(m1, m2, m3, ..., m_n){ field1 = m1; ... };
    ...
}
Define a dictionary (based on a map, for example):
// java pseudocode
public enum EnvelopeField { field1, field2, field3, ..., field_n }

interface Envelope // package visible
{
    // typical map-based read methods.
    Object get(EnvelopeField field);
    boolean isEmpty();

    // new methods similar to existing ones in java.util.Map, but
    // more semantically aligned with envelopes and fields.
    Iterator<EnvelopeField> fields();
    boolean hasField(EnvelopeField field);
}

// a "marker" interface.
// code that only needs to read envelopes must operate on
// these interfaces.
public interface ReadOnlyEnvelope extends Envelope {}

// the read-write version of the envelope; notice that
// it inherits from Envelope, but not from ReadOnlyEnvelope.
// this is done to make it difficult (but not impossible,
// unfortunately) to "cast up" a read-only envelope into a
// mutable one.
public interface MutableEnvelope extends Envelope
{
    Object put(EnvelopeField field, Object value);

    // to "cast down" or "narrow" into a read-only version that
    // cannot directly be "cast up" back into a mutable one.
    ReadOnlyEnvelope readOnly();
}

// the standard interface for map-based envelopes.
public interface MapBasedEnvelope extends
    Map<EnvelopeField, Object>,
    MutableEnvelope
{
}

// package visible, not public
class EnvelopeImpl extends HashMap<EnvelopeField, Object>
    implements MapBasedEnvelope, ReadOnlyEnvelope
{
    // get, put and isEmpty are automatically inherited from HashMap
    ...
    public Iterator<EnvelopeField> fields(){ return this.keySet().iterator(); }
    public boolean hasField(EnvelopeField field){ return this.containsKey(field); }

    // the typecast is redundant, but it makes the intention obvious in code.
    public ReadOnlyEnvelope readOnly(){ return (ReadOnlyEnvelope) this; }
}

public final class EnvelopeFactory
{
    public static MapBasedEnvelope newInstance(){ return new EnvelopeImpl(); }
}
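A quick usage sketch of the intended convention (following the pseudocode names above; the value 42 is arbitrary):

MapBasedEnvelope envelope = EnvelopeFactory.newInstance();
envelope.put(EnvelopeField.field1, 42);

ReadOnlyEnvelope view = envelope.readOnly();
Object v = view.get(EnvelopeField.field1); // readers can get...
// view.put(...) does not compile: writers need a MutableEnvelope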
No need to set up internal read-only flags. All you need to do is narrow your envelope instances to the read-only interface (which only provides getters).
Code that expects to read should operate on read-only envelopes, and code that expects to change fields should operate on mutable envelopes. Creation of the actual instances is compartmentalized in factories.
That is, you use the compiler to enforce things to be read-only (or to allow things to be mutable) by establishing some code conventions: rules governing which interfaces to use where and how.
You can layer your code into sections that need to write separately from code that only needs to read. Once that's done, simple code reviews (or even grep) can identify code that is using the wrong interface.
Problems:
Non-public Parent Interface:
Envelope is not declared as a public interface, to prevent erroneous/malicious code from casting a read-only envelope down to a base envelope and then back up to a mutable envelope. The intended flow is from mutable to read-only only; it is not intended to be bidirectional.
The problem here is that extension of Envelope is restricted to the package that contains it. Whether that is a problem will depend on the particular domain and intended usage.
Factories:
The problem is that factories can (and most likely will) be very complex. Again, the nature of the beast.
Validation:
Another problem introduced with this approach is that now you have to worry about code that expects field X to be present. Having the original monster envelope class partially frees you from that worry because, at least syntactically, all fields are there...
...whether the fields are set or not is another matter, and one that still remains with this new model I'm proposing.
So if you have client code that expects to see field X, that client code has to throw some type of exception if the field is not present (or compute or read a sensible default somehow). In such cases, you will have to:
Identify patterns of field presence. Clients that expect field X to be present might be grouped separately (layered apart) from clients that expect some other field to be present.
Associate custom validators (proxies to read-only envelope interfaces) that either throw exceptions or compute default values for missing fields according to some rules (rules provided programmatically, with an interpreter, or with a rules engine.)
Lack of typing:
This might be debatable, but people used to working with static typing might feel uneasy about losing its benefits with a loosely typed, map-based approach. The counter-argument is that much of the web works on a loose-typing approach, even on the Java side (JSTL, EL).
Problems aside, the larger the maximum number of possible fields and the lower the average number of fields present at any given time, the more effective, performance-wise, this approach will be. It adds code complexity, but that's the nature of the beast.
That complexity doesn't go away; it will be present either in your class model or in your validation code. Serialization and transfer down the wire are much more efficient, though, especially if you expect massive numbers of individual data transfers.
Hope it helps.
Actually, this looks like a frequent problem that game developers face: bloated classes holding numerous variables and methods because of a deep inheritance tree, etc.
There's this blog post about how and why to choose composition over inheritance; maybe it would help.
One way you may be able to intelligently break up a large data class is to look at patterns of access by client classes. For example, if a set of classes only accesses fields 1-20 and another set of classes only accesses fields 25-30, maybe those groups of fields belong in separate classes.

Can I add and remove elements of enumeration at runtime in Java

Is it possible to add and remove elements from an enum in Java at runtime?
For example, could I read in the labels and constructor arguments of an enum from a file?
#saua, it's just a question of whether it can be done out of interest really. I was hoping there'd be some neat way of altering the running bytecode, maybe using BCEL or something. I've also followed up with this question because I realised I wasn't totally sure when an enum should be used.
I'm pretty convinced that the right answer would be to use a collection that ensured uniqueness instead of an enum if I want to be able to alter the contents safely at runtime.
No, enums are supposed to be a complete static enumeration.
At compile time, you might want to generate your enum .java file from another source file of some sort. You could even create a .class file like this.
In some cases you might want a set of standard values but allow extension. The usual way to do this is to have an interface for the values and an enum that implements that interface for the standard ones. Of course, you lose the ability to use switch when all you have is a reference to the interface.
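A minimal sketch of that interface-plus-enum idiom (the status names are invented for illustration):

interface StatusCode {
    int code();
}

// the standard values live in an enum...
enum StandardStatus implements StatusCode {
    OK(200), NOT_FOUND(404);

    private final int code;
    StandardStatus(int code) { this.code = code; }
    @Override public int code() { return code; }
}

// ...while users can still add values the enum doesn't know about
final class TeapotStatus implements StatusCode {
    @Override public int code() { return 418; }
}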
Behind the curtain, enums are POJOs with a private constructor and a bunch of public static final values of the enum's type (see here for an example). In fact, up until Java 5 it was considered best practice to build your own enumeration this way, and Java 5 introduced the enum keyword as a shorthand. See the source for Enum<T> to learn more.
So it should be no problem to write your own 'TypeSafeEnum' with a public static final array of constants that are read by the constructor or passed to it.
Also, do yourself a favor and override equals, hashCode and toString, and if possible create a values() method.
The question is how to use such a dynamic enumeration... you can't read the value "PI=3.14" from a file to create enum MathConstants and then go ahead and use MathConstants.PI wherever you want...
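Still, for illustration, here is a minimal sketch of such a hand-rolled 'TypeSafeEnum' whose constants can be registered at runtime (DynEnum and its methods are invented names, not an existing library; instances are canonicalized, so default identity equals/hashCode are already correct):

import java.util.*;

public final class DynEnum {
    private static final Map<String, DynEnum> VALUES = new LinkedHashMap<>();

    private final String name;

    private DynEnum(String name) { this.name = name; }

    // e.g. called once per line while reading "PI=3.14"-style files
    public static synchronized DynEnum register(String name) {
        if (VALUES.containsKey(name))
            throw new IllegalArgumentException("duplicate constant: " + name);
        DynEnum v = new DynEnum(name);
        VALUES.put(name, v);
        return v;
    }

    public static synchronized DynEnum valueOf(String name) {
        DynEnum v = VALUES.get(name);
        if (v == null) throw new IllegalArgumentException("no constant: " + name);
        return v;
    }

    public static synchronized List<DynEnum> values() {
        return new ArrayList<>(VALUES.values());
    }

    @Override public String toString() { return name; }
}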
I needed to do something like this (for unit testing purposes), and I came across this - the EnumBuster:
http://www.javaspecialists.eu/archive/Issue161.html
It allows enum values to be added, removed and restored.
Edit: I've only just started using this, and found that there are some slight changes needed for Java 1.5, which I'm currently stuck with:
Add array copyOf static helper methods (e.g. take these 1.6 versions: http://www.docjar.com/html/api/java/util/Arrays.java.html)
Change EnumBuster.undoStack to a Stack<Memento>
In undo(), change undoStack.poll() to undoStack.isEmpty() ? null : undoStack.pop();
The string VALUES_FIELD needs to be "ENUM$VALUES" for the Java 1.5 enums I've tried so far
I faced this problem on the formative project of my young career.
The approach I took was to save the values and the names of the enumeration externally, and the end goal was to be able to write code that looked as close to a language enum as possible.
I wanted my solution to look like this:
enum HatType
{
    BASEBALL,
    BRIMLESS,
    INDIANA_JONES
}
HatType mine = HatType.BASEBALL;
// prints "BASEBALL"
System.out.println(mine.toString());
// prints true
System.out.println(mine.equals(HatType.BASEBALL));
And I ended up with something like this:
// in a file somewhere:
// 1 --> BASEBALL
// 2 --> BRIMLESS
// 3 --> INDIANA_JONES
HatDynamicEnum hats = HatEnumRepository.retrieve();
HatEnumValue mine = hats.valueOf("BASEBALL");
// prints "BASEBALL"
System.out.println(mine.toString());
// prints true
System.out.println(mine.equals(hats.valueOf("BASEBALL")));
Since my requirements were that it had to be possible to add members to the enum at run-time, I also implemented that functionality:
hats.addEnum("BATTING_PRACTICE");
HatEnumRepository.storeEnum(hats);
hats = HatEnumRepository.retrieve();
HatEnumValue justArrived = hats.valueOf("BATTING_PRACTICE");
// file now reads:
// 1 --> BASEBALL
// 2 --> BRIMLESS
// 3 --> INDIANA_JONES
// 4 --> BATTING_PRACTICE
I dubbed it the Dynamic Enumeration "pattern", and you can read about the original design and its revised edition.
The difference between the two is that the revised edition was designed after I really started to grok OO and DDD. The first one I designed when I was still writing nominally procedural code, under time pressure no less.
You can load a Java class from source at runtime (using JCI, BeanShell or JavaCompiler).
That would allow you to change the enum values as you wish.
Note: this wouldn't change any classes that refer to these enums, so this might not be very useful in reality.
A working example in widespread use is in modded Minecraft. See the EnumHelper.addEnum() methods on GitHub.
However, note that in rare situations practical experience has shown that adding enum members can lead to some issues with the JVM optimiser. The exact issues may vary with different JVMs. But broadly it seems the optimiser may assume that some internal fields of an enum, specifically the size of the enum's .values() array, will not change. See the issue discussion. The recommended solution there is not to make .values() a hotspot for the optimiser: if you add to an enum's members at runtime, do it once and only once when the application is initialised, and then cache the result of .values() to avoid making it a hotspot.
The way the optimiser works and the way it detects hotspots are obscure and may vary between different JVMs and different builds of the JVM. If you don't want to take the risk of this type of issue in production code, then don't change enums at runtime.
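A tiny sketch of that caching advice (MyEnum is a placeholder for whichever enum was extended during startup):

import java.util.Arrays;
import java.util.List;

enum MyEnum { A, B } // placeholder; imagine members were added at init time

class EnumCache {
    // One snapshot, taken after all runtime additions are finished, so hot
    // code never calls MyEnum.values() again (each call clones the array).
    static final List<MyEnum> ALL = Arrays.asList(MyEnum.values());
}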
You could try to assign properties to the enum you're trying to create and statically construct it from a loaded properties file. Big hack, but it works :)
