Is there anything wrong with replacing class attributes with a HashMap? - java

Just a theoretical question that could lead to some considerations in terms of design. What if you were to replace POJOs with this reusable class? It might avoid some boilerplate code, but what issues could it bring about?
// Does not include failsafes, guards, defensive copying, whatever...
class MySingleGetterAndSetterClass {
    private HashMap<String, Object> myProperties;

    public MySingleGetterAndSetterClass(String name) {
        myProperties = new HashMap<String, Object>();
        myProperties.put("name", name);
    }

    public Object get(String propertyName) {
        return myProperties.get(propertyName);
    }

    public void set(String propertyName, Object value) {
        myProperties.put(propertyName, value);
    }
}

The main disadvantages
much slower
uses more memory
less type safety
more error prone
more difficult to maintain
more code to write/read
more thread safety problems (more ways to break) and more difficult to make thread safe.
harder to debug; note that the fields can be stored in a pseudo-random order that differs between objects of the same "type", making them harder to read.
more difficult to refactor
little or no support in code analysis.
no support in code completion.
BTW, some dynamic languages do exactly what you suggest, and they have all of these issues. A short sketch of the type-safety problem follows below.
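To make the type-safety and error-proneness points concrete, here is a minimal sketch using the class from the question (the property names are made up for illustration); a plain POJO with a getAge() method would reject both mistakes at compile time:

public class Demo {
    public static void main(String[] args) {
        MySingleGetterAndSetterClass person = new MySingleGetterAndSetterClass("Alice");
        person.set("age", 30);

        // Both of these compile cleanly and only fail at runtime:
        Object misspelled = person.get("aeg");          // typo in the key -> silently returns null
        String wrongCast = (String) person.get("age");  // value is an Integer -> ClassCastException

        System.out.println(misspelled + " " + wrongCast);
    }
}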

That would lead to very unstable code. None of your getting/setting would be compile-time checked. Generally you want your code to fail-fast, and compile-time is the absolute fastest that can be done.
To make it even relatively safe you'd have to have null-checks/exception handling all over the place, and then how do you consistently handle the case where the value isn't found, all over your code? It would get very bloated very fast.

No compile-time checking.
You have to downcast, which is not good.
Difficult to maintain.
Against OOP:
Your POJOs are classes that represent an abstraction of something in the real world.
If I understood correctly, you want to put their properties inside a map, and that is not good design. You are working against OOP. Taken to its extreme, you could put all your classes into one big String and look things up by position, which would be hardly worse than having only a dictionary keyed by property name.

Related

Giving a class member a reference to another class's members

On a scale of one to ten, how bad is the following from a perspective of safe programming practices? And if you find it worse than a five, what would you do instead?
My goal below is to get the data in the List of Maps in B into A. In this case, to me, it is ok if it is either a copy of the data or a reference to the original data. I found the approach below fastest, but I have a queasy feeling about it.
public class A {
    private List<Map<String, String>> _list = null;

    public A(B b) {
        _list = b.getList();
    }
}

public class B {
    private List<Map<String, String>> _list = new ArrayList<Map<String, String>>();

    public List<Map<String, String>> getList() {
        // Put some data in _list just for the sake of this example...
        _list.add(new HashMap<String, String>());
        return _list;
    }
}
The underlying problem is a bit more complex:
From a security perspective, this is very, very bad.
From a performance perspective, this is very, very good.
From a testing perspective, it's good because there is nothing in the class that you can't easily reach from a test
From an encapsulation perspective, it's bad since you expose the inner state of your class.
From a coding safety perspective, it's bad because someone will eventually abuse this for some "neat" trick that will cause odd errors elsewhere and you will waste a lot of time to debug this.
From an API perspective, it can be either: It's hard to imagine an API to be more simple but at the same time, it doesn't communicate your intent and things will break badly if you ever need to change the underlying data structure.
When designing software, you need to keep all of these points in the back of your mind. With time, you will get a feeling for which kinds of errors you make and how to avoid them. Computers being as dumb and slow as they are, there is never a perfect solution. You can just strive to make the code as good as you can at the time you write it.
If you want to code defensively, you should always copy any data that you get or expose. Of course, if "data" is your whole data model, then you simply can't copy everything each time you call a method.
Solutions to this deadlock:
Use immutables as often as you can. Immutables and value objects are created and never change after that. These are always safe and the performance is OK unless the creation is very expensive. Lazy creation would help here but that is usually its own can of worms. Guava offers a comprehensive set of collections which can't be changed after creation (see the sketch after this list).
Don't rely too much on Collections.unmodifiable* because the backing collection can still change.
Use copy-on-write data structures. The problem above would go away if the underlying list would clone itself as soon as A or B starts to change it. That would give each its own copy, effectively isolating them from each other. Unfortunately, Java doesn't have support for these built in.
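As a sketch of the immutable-collection approach, assuming Guava is on the classpath and reusing the B class from the question: copying into an ImmutableList at construction time gives A a snapshot that neither A's callers nor B can change afterwards (the Map elements themselves are still shared, as noted below).

import java.util.List;
import java.util.Map;

import com.google.common.collect.ImmutableList;

public class A {
    private final List<Map<String, String>> _list;

    public A(B b) {
        // Snapshot taken once; later changes to B's internal list are not visible here,
        // and ImmutableList rejects any attempt to modify it.
        _list = ImmutableList.copyOf(b.getList());
    }
}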
In this case, to me, it is ok if it is either a copy of the data or a reference to the original data.
That is the sticking point.
Passing the object instance around is the fastest, but allows the caller to change it, and also makes later changes visible (there is no snapshot).
Usually, that is not a problem, since the caller is not malicious (but you may want to protect against coding errors).
If you do not want the caller to make changes, you could wrap it into an immutable wrapper.
If you need a snapshot, you can clone the list.
Either way, this will only snapshot/protect the list itself, not its individual elements. If those are mutable, the same reasoning applies again.
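A minimal JDK-only sketch of the two options just described, wrapping for read-only access versus cloning for a snapshot (class and variable names are illustrative):

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ListSharingDemo {
    public static void main(String[] args) {
        List<Map<String, String>> original = new ArrayList<Map<String, String>>();
        original.add(new HashMap<String, String>());

        // Read-only view: callers cannot modify it, but changes to 'original' shine through.
        List<Map<String, String>> readOnlyView = Collections.unmodifiableList(original);

        // Snapshot: a new list, unaffected by later add/remove on 'original'
        // (the Map elements themselves are still shared).
        List<Map<String, String>> snapshot = new ArrayList<Map<String, String>>(original);

        original.add(new HashMap<String, String>());
        System.out.println(readOnlyView.size()); // 2 - the view reflects the change
        System.out.println(snapshot.size());     // 1 - the snapshot does not
    }
}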
I would say that you will have to choose between efficiency and encapsulation. By directly exposing a member of the class, its state can be changed from outside. That might be unexpected and lead to nasty surprises. I would also say that it increases the coupling between the two classes.
An alternative is to let the information expert principle decide and leave the job to the class that has the information. You will have to judge whether the work that was supposed to be done with class A is really the responsibility of class B.
But really, speed and clean code can be conflicting interests. Sometimes you just have to play dirty to get it quick enough.
All you're creating is a reference to B._list. So, on your scale, a 10 if your intention was to copy the items.
You could iterate over all b._list items and add them to the A._list manually:
public A(B b) {
    // List is an interface, so instantiate a concrete implementation.
    _list = new ArrayList<Map<String, String>>();
    for (Map<String, String> map : b.getList()) {
        Map<String, String> newMap = new HashMap<String, String>();
        for (String key : map.keySet()) {
            newMap.put(key, map.get(key));
        }
        _list.add(newMap);
    }
}
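A slightly shorter variant of the same copy is sketched below; it relies on HashMap's copy constructor, and because the keys and values are immutable Strings, the per-map copy is effectively as deep as the explicit loop above.

public A(B b) {
    _list = new ArrayList<Map<String, String>>();
    for (Map<String, String> map : b.getList()) {
        _list.add(new HashMap<String, String>(map)); // copy each map via its copy constructor
    }
}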

Use a HashMap to store instance variables?

I would like to create a base class that all classes in my program will extend. One thing I wanted to do was find a uniform way to store all instance variables inside the object.
What I have come up with is to use a HashMap to store the key/value pairs for the object and then expose those values through a get and set method.
The code that I have for this so far is as follows:
package ocaff;

import java.util.HashMap;

public class OcaffObject {
    private HashMap<String, Object> data;

    public OcaffObject() {
        this.data = new HashMap<String, Object>();
    }

    public Object get(String value) {
        return this.data.get(value);
    }

    public void set(String key, Object value) {
        this.data.put(key, value);
    }
}
While functionally this works, I am curious if there are any real issues with this implementation or if there is a better way to do this?
In my day to day work I am a PHP programmer and my goal was to mimic functionality that I used in PHP in Java.
I don't think this is a good way to deal with what you want.
Programming in Java is quite different from programming in PHP, from my point of view.
You need to keep things clean and strongly typed, following the real paradigm of clean object-oriented programming.
Some problems with this technique come to mind; here are a few, not in order of importance.
The first problem is performance and memory footprint: this will consume a lot of memory and will perform badly.
The second problem is concurrency: HashMap is not thread safe.
The third problem is type safety: you don't have type safety anymore, you can write whatever you want into a field and nothing checks it, a real anti-pattern.
The fourth problem is debugging: it will be hard to debug your code.
The fifth problem: anyone can read and write any field just by knowing its name.
The sixth problem: when you change the name of a field in the map you don't get any kind of compile-time error, only strange run-time behavior. Refactoring becomes nearly impossible.
Typed fields are much more useful and clean.
If you're taking the time to make a class for this, I would simply add what you need as members of the class. This will give you compile time checking of your class members, greatly reducing your subtle bugs.
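As a sketch of that suggestion (the Article class and its fields are invented for illustration), a plain typed class gives the compiler something to check:

public class Article {
    private String title;
    private int wordCount;

    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }

    public int getWordCount() { return wordCount; }
    public void setWordCount(int wordCount) { this.wordCount = wordCount; }
}
// article.setWordCount("oops") no longer compiles, and renaming a field is a
// safe, tool-assisted refactoring instead of a search for string keys.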

Java = Return Object list/array vs. Result-Object (the same with method parameters)

This might seem to be a strange question: I am struggling to decide whether it is a good practice and "efficient" to work with "Typed Objects" on a very granular level.
public Object[] doSomething() {
    Object[] resultList = new Object[] { new Foo(), new Bar() };
    return resultList;
}
versus
public Result doSomething() {
    Result result = new Result();
    result.foo = new Foo();
    result.bar = new Bar();
    return result;
}

public class Result {
    Foo foo;
    Bar bar;
}
My question is concrete as follows:
In terms of CPU cycles (as a relative figure), how many more resources does the second approach consume (e.g. 100% more)?
The same question with regard to memory consumption.
NB: these two questions are just to understand it better; this is not about premature optimization.
In terms of "good design practice": do you think version 1 is an absolute no-go, or do you think it actually does not matter? Or would you propose never returning "object arrays" (in an object-oriented programming language)?
This is something I always wonder about: should I create dedicated objects for everything (for passing values), or should I rather use generic objects (and generic method parameters)?
The question also applies to
public void doSomething(Query query)
versus
public void doSomething(Foo foo, Bar bar, Aaa a, Bbb b)
thanks
Markus
3.) In terms of "good design practice": do you think version 1 is an absolute no-go, or do you think it actually does not matter? Or would you propose never returning "object arrays" (in an object-oriented programming language / regarding encapsulation ...)?
Version 1 is absolutely a no-go. It's almost completely untyped. The caller has to know the actual types and where they are in the array, and cast appropriately. You lose any useful compile-time type checking, and the code itself is significantly less clear.
I would never return an Object[] unless the values it contained were constructed with new Object().
I don't believe that defining a Result class and returning that consumes any more resources at run time than constructing an Object[]. (Granted, there's a minuscule cost for storing and loading the class definition.) Do you have data that indicate otherwise?
Returning an untyped object array is poor practice for various reasons, among which are:
It's prone to error.
It's harder to maintain.
Casting back to the "real" type is not free, either.
Regarding your other query:
public doSomething(Query query)
versus
public doSomething(Foo foo, Bar bar)
This is less clear-cut. If packaging up a Foo and a Bar into a Query object makes sense in the problem domain, then I would definitely do it. If it's just a packaging up for the sake of minimizing the number of arguments (that is, there's no "query object" concept in your problem domain), then I would probably not do it. If it's a question of run-time performance, then the answer is (as always) to profile.
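A small sketch of that packaging decision (Query, Foo and Bar are just the placeholder names from the question): a parameter object is worth it when it names a real concept, not merely when it shrinks the argument list.

public class Query {
    private final Foo foo;
    private final Bar bar;

    public Query(Foo foo, Bar bar) {
        this.foo = foo;
        this.bar = bar;
    }

    public Foo getFoo() { return foo; }
    public Bar getBar() { return bar; }
}

// doSomething(new Query(foo, bar)) instead of doSomething(foo, bar, a, b, ...)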
I'd have to do an experiment to really know, but I'd guess that the object array would not be significantly faster. It might even be slower. After all, in either case you have to create an object: either the array object or the Result object. With the Result object you have to read the class definition from disk the first time you use it, and the class definition has to float around in memory, so there'd be some extra cost there. But with the array object you have to do casts when you pull the data out, and the JVM has to do bounds checking on the array (what happens if the caller tries to retrieve resultList[12]?), which also involves extra work. My guess is that if you do it only once or twice, the array would be faster (because of the class load time), but if you do it many times, the dedicated object would be faster (because of the cast and array access time). But I admit I'm just guessing.
In any case, even if the array does have a slight performance edge, the loss in readability and maintainability of the code almost surely isn't worth it.
The absolute worst thing that can happen is if values you're returning in the array are of the same class but have different semantic meanings. Like suppose you did this:
public Object[] getCustomerData(int customerid)
{
    String customerName = ... however you get it ...
    BigDecimal currentDue = ...
    BigDecimal pastDue = ...
    return new Object[] { customerName, pastDue, currentDue };
}
... meanwhile, back at the ranch ...
Object[] customerData = getCustomerData(customerid);
BigDecimal pastDue = (BigDecimal) customerData[2];
if (pastDue.compareTo(BigDecimal.ZERO) > 0)
    sendNastyCollectionLetter();
Do you see the error? I retrieve entry #2 as pastDue when it's supposed to be #1. You could easily imagine this happening if a programmer, in a moment of thoughtlessness, counted the fields starting from one instead of zero. Or, in a long list, if he miscounted and said #14 when it's really #15. As both have the same data type, this will compile and run just fine. But we'll be sending inappropriate collection letters to customers who are not overdue. This would be very bad for customer relations.
Okay, maybe this is a bad example -- I just pulled it off the top of my head -- because we would be likely to catch that in testing. But what if the values we switched were rarely used, so that no one thought to include a test scenario for them? Or their effect was subtle, so that an error might slip through testing? For that matter, maybe you wouldn't catch this one in testing if you were rushing a change through, or if the tester slipped up, etc., etc.
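For contrast, here is a sketch of the same return value as a small typed result class (the names mirror the hypothetical example above); the pastDue/currentDue retrieval mix-up cannot even be written, because fields are accessed by name rather than by position.

import java.math.BigDecimal;

public class CustomerData {
    private final String customerName;
    private final BigDecimal currentDue;
    private final BigDecimal pastDue;

    public CustomerData(String customerName, BigDecimal currentDue, BigDecimal pastDue) {
        this.customerName = customerName;
        this.currentDue = currentDue;
        this.pastDue = pastDue;
    }

    public String getCustomerName() { return customerName; }
    public BigDecimal getCurrentDue() { return currentDue; }
    public BigDecimal getPastDue() { return pastDue; }
}

// customerData.getPastDue() - there is no index to miscount.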

String vs. A new Data class

I overheard two of my colleagues arguing about whether or not to create a new data model class which only contains one string field and a setter and a getter for it. A program will then create a few objects of the class and put them in an array list. The guy who is storing them argues that there should be a new type, while the guy who is getting the data says there is no point going through all this trouble when you can simply store a String.
Personally I prefer creating a new type so we know what's being stored in the array list, but I don't have strong arguments to persuade the 'getting' data guy. Do you?
Sarah
... a new data model class which only contains one string field and a setter and a getter for it.
If it was just a getter, then it is not possible to say in general whether a String or a custom class is better. It depends on things like:
consistency with the rest of your data model,
anticipating whether you might want to change the representation,
anticipating whether you might want to implement validation when creating an instance, add helper methods, etc,
implications for memory usage or persistence (if they are even relevant).
(Personally, I would be inclined to use a plain String by default, and only use a custom class if for example, I knew that it was likely that a future representation change / refinement would be needed. In most situations, it is not a huge problem to change a String into custom class later ... if the need arises.)
However, the fact that there is proposed to be a setter for the field changes things significantly. Instances of the class will be mutable, where instances of String are not. On the one hand this could possibly be useful; e.g. where you actually need mutability. On the other hand, mutability would make the class somewhat risky for use in certain contexts; e.g. in sets and as keys in maps. And in other contexts you may need to copy the instances. (This would be unnecessary for an immutable wrapper class or a bare String.)
(The simple answer is to get rid of the setter, unless you really need it.)
There is also the issue that the semantics of equals will be different for a String and a custom wrapper. You may therefore need to override equals and hashCode to get a more intuitive semantic in the custom wrapper case. (And that relates back to the issue of a setter, and use of the class in collections.)
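A sketch of such a wrapper (ProductCode is an invented name for illustration): immutable, no setter, with validation in the constructor and equals/hashCode overridden so it behaves sensibly in sets and as a map key.

public final class ProductCode {
    private final String value;

    public ProductCode(String value) {
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("value must be non-empty");
        }
        this.value = value;
    }

    public String getValue() { return value; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ProductCode)) return false;
        return value.equals(((ProductCode) o).value);
    }

    @Override
    public int hashCode() { return value.hashCode(); }

    @Override
    public String toString() { return value; }
}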
Wrap it in a class, if it matches the rest of your data model's design.
It gives you a label for the string so that you can tell what it represents at run time.
It makes it easier to take your entity and add additional fields and behavior (which can be a likely occurrence).
That said, the key is if it matches the rest of your data model's design... be consistent with what you already have.
Counterpoint to mschaef's answer:
Keep it as a string, if it matches the rest of your data model's design. (See how the opening sounds so important, even if I temper it with a sentence that basically says we don't know the answer?)
If you need a label saying what it is, add a comment. Cost = one line, total. Heck, for that matter, you need a line (or three) to comment your new class, anyway, so what's the class declaration for?
If you need to add additional fields later, you can refactor it then. You can't design for everything, and if you tried, you'd end up with a horrible mess.
As Yegge says, "the worst thing that can happen to a code base is size". Add a class declaration, a getter, a setter, now call those from everywhere that touches it, and you've added size to your code without an actual (i.e., non-hypothetical) purpose.
I disagree with the other answers:
It depends whether there's any real possibility of adding behavior to the type later [Matthew Flaschen]
No, it doesn’t. …
Never hurts to future-proof the design [Alex]
True, but not relevant here …
Personally, I would be inclined to use a plain String by default [Stephen C]
But this isn’t a matter of opinion. It’s a matter of design decisions:
Is the entity you store logically a string, a piece of text? If yes, then store a string (ignoring the setter issue).
If not, then do not store a string. That the data may be stored as a string is an implementation detail; it should not be reflected in your code.
For the second point it’s irrelevant whether you might want to add behaviour later on. All that matters is that in a strongly typed language, the data type should describe the logical entity. If you handle things that are not text (but may be represented by text, may contain text …) then use a class that internally stores said text. Do not store the text directly.
This is the whole point of abstraction and strong typing: let the types represent the semantics of your code.
And finally:
As Yegge says, "the worst thing that can happen to a code base is size". [Ken]
Well, this is so ironic. Have you read any of Steve Yegge’s blog posts? I haven’t, they’re just too damn long.
It depends whether there's any real possibility of adding behavior to the type later. Even if the getters and setters are trivial now, a type makes sense if there is a real chance they could do something later. Otherwise, clear variable names should be sufficient.
In the time spent discussing whether to wrap it in a class, it could be wrapped and done with. Never hurts to future-proof the design, especially when it only takes minimal effort.
I see no reason why the String should be wrapped in a class. The basic premise behind the discussion is that what is needed right now is just a String. If it gets augmented later, refactor it then. Why add unnecessary code in the name of future-proofing?
Wrapping it in a class provides you with more type safety - in your model you can then only use instances of the wrapper class, and you can't easily make a mistake where you put a string that contains something different into the model.
However, it does add overhead, extra complexity and verbosity to your code.

Might EnumMap be considered a reasonable alternative to Java beans?

Curious if anybody has considered using EnumMap in place of Java beans, particularly "value objects" (with no behavior)? To me it seems that one advantage would be that the name of a "property" would be directly accessible from the backing Enum, with no need for reflection, and therefore I'd assume it would be faster.
It may be a little faster than using reflection (I didn't measure it, and didn't find any metrics on Google either); however, there are big disadvantages to this approach:
You're losing type safety. Instead of int getAge() and String getName() everything is Object get(MyEnum.FIELD_NAME). That'll provide for some ugly code and run-time errors right there.
All the javabean niceties we've come to love and enjoy (for example, property-level annotations) are gone.
Since you can have NO BEHAVIOR AT ALL, the applicability of this approach seems rather limited.
The bottom line is - if you really truly need that alleged :-) boost in performance (which you'll have to measure to prove it exists) this may be a viable approach under very specific circumstances. Is it a viable alternative to javabeans at large? Most certainly not.
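For reference, a minimal sketch of what the EnumMap-backed approach looks like (the Field enum and its constants are invented for illustration); it shows exactly where the type safety is lost compared with int getAge():

import java.util.EnumMap;

public class EnumMapBean {
    public enum Field { NAME, AGE }

    private final EnumMap<Field, Object> values = new EnumMap<Field, Object>(Field.class);

    public Object get(Field field) { return values.get(field); }
    public void set(Field field, Object value) { values.put(field, value); }
}

// Usage: every read needs a cast, and nothing stops set(Field.AGE, "forty-two").
// int age = (Integer) bean.get(EnumMapBean.Field.AGE);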
A bean is meant to be mutable, hence the setter methods. EnumMap is comparable in speed to using a HashMap with integers as the key, but its keys are immutable. Beans and EnumMaps serve two different purposes. If all of the keys are known at design time and are guaranteed to never change, then using an EnumMap will be fine.
Updating a bean is much simpler than changing the backing enum of the EnumMap, with much less chance of creating errors downstream in the code.
I wrote a Record class that maps keys to values and works by delegating to a fully synchronized EnumMap. The idea is that a Record can get new fields at runtime whereas the Bean can't. My conclusion is that with this flexibility comes a performance hit. Here's a run comparing the Record class to a fully synchronized Bean. For 10 million operations:
Record set(Thing, a) 458 ms
Bean setThing(a) 278 ms
Record get(Thing) 398 ms
Bean getThing 248 ms
So, there is something to gain in knowing your data objects and writing a class that models them statically. If you want to be able to add new fields to your data at runtime, it will cost you.
I don't understand how you can remove 'class proliferation' with EnumMaps. Unless you have a generic enum with 20-odd properties to reuse for every 'bean', you're still inventing an enum to use for each enum map, e.g.
public enum PersonDTOEnum {
    A, S, L;
}
as opposed to
class Person {
    int a;
    int s;
    String l;
    // getters + setters elided
}
Not to mention that every value is now just an untyped Object.
I had not previously specified this, but I am working with a ResultSet. Therefore I want to provide this answer for the sake of completeness.
Commons BeanUtils' "RowSetDynaClass" could be a happy medium between the excessive boilerplate associated with concrete beans and the limitations of EnumMap.
