I have data that forms a many-to-one relation.
// My class with lot of data
class MyClass{}
Map<Integer, MyClass> idToClass = ...;
MyClass c = new MyClass();
idToClass.put(1, c);
idToClass.put(2, c);
The map is serialized and sent over the network. I'm looking for either a workaround or a map implementation that accounts for duplicate values.
It's not only a question of efficiency: putting the same value in the map under several keys has a semantic meaning (it's necessary for my program to produce correct results).
Edit:
I'm considering keeping all distinct values in an array and the mapping in a Map<Integer,Integer>. I would then make two lookups: first in the map for the index, then in the array for the value.
Two important things to consider:
a) If you are just concerned with serializing "too much" because of duplicated map values, you could prevent that by writing your own custom serialization code. Meaning: instead of serializing this map directly, you would serialize all the values (once), and then the "actual" mapping in a more efficient way. Doing that allows your other code to go unchanged.
b) Step back from your current design/implementation. Meaning: maybe it would make sense to abstract away from this simple map and make the relationship expressed by the shared map values more obvious internally. In other words, instead of dealing with a bare map, you would have a class that represents all of that knowledge (similar to what I suggested above for "a)").
Obviously "a)" might be the "more local" change, but from 10000 miles away ... I am tempted to suggest you focus on "b)".
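A minimal sketch of how "a)" could look, assuming values are deduplicated by identity (the same instance, as in the question) rather than by equals(). DedupedMap, from() and toMap() are hypothetical names invented for illustration. Note also that if you happen to use standard Java serialization (ObjectOutputStream), an object referenced several times within one stream is already written once and restored as a single shared instance, so a scheme like this mainly matters for other wire formats.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: splits a many-to-one map into "distinct values" + "key -> index".
final class DedupedMap<V> {
    final List<V> values = new ArrayList<>();                  // each distinct instance once
    final Map<Integer, Integer> keyToIndex = new HashMap<>();  // key -> position in values

    static <V> DedupedMap<V> from(Map<Integer, V> source) {
        DedupedMap<V> out = new DedupedMap<>();
        // IdentityHashMap compares with ==, so only the very same instance is shared.
        Map<V, Integer> seen = new IdentityHashMap<>();
        for (Map.Entry<Integer, V> e : source.entrySet()) {
            Integer index = seen.get(e.getValue());
            if (index == null) {
                index = out.values.size();
                out.values.add(e.getValue());
                seen.put(e.getValue(), index);
            }
            out.keyToIndex.put(e.getKey(), index);
        }
        return out;
    }

    // Rebuilds the original map; keys that shared a value share it again.
    Map<Integer, V> toMap() {
        Map<Integer, V> map = new HashMap<>();
        for (Map.Entry<Integer, Integer> e : keyToIndex.entrySet()) {
            map.put(e.getKey(), values.get(e.getValue()));
        }
        return map;
    }
}
```

You would send values and keyToIndex over the wire and call toMap() on the receiving side, so the rest of the code keeps working with a plain Map<Integer, MyClass>.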
Related
Is nesting collections in Java something that I should be doing?
I'm currently working on a project where I want to have a bunch of HashMaps, each with a String key and an ArrayList value. When I create an object of another class and add it to the collection, some piece of its information would be matched against the keys of one of the HashMaps, and the object would then be deposited in the associated ArrayList. That way the list can later be accessed through the correct key for a specific HashMap.
Is this a good idea? Or is it too convoluted and if so is there a better way to do this?
There are times to nest, for sure. But in the humble opinion of this seasoned dev, you shouldn't do it unless you have a good reason. All too often you would be much better off with some class that represents the inner collection.
So if you find yourself with a Map<String,List<Foo>>, ask yourself what that List<Foo> really represents. If it's Map<String,List<Student>> then maybe you need Map<String, Roster> or Map<String, Team>. I find this yields faster time to market and fewer bugs. The fact you're asking the question means you think there's a chance that might be true too.
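To make that concrete, here is a rough sketch of replacing Map<String, List<Student>> with Map<String, Roster>. Student, Roster and School are made-up classes for illustration only; the point is just that the inner list gets a name and its own operations.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical element type.
final class Student {
    final String name;
    Student(String name) { this.name = name; }
}

// The former "inner List<Student>", now a class that can grow its own rules.
final class Roster {
    private final List<Student> students = new ArrayList<>();

    void enroll(Student s) { students.add(s); }

    int size() { return students.size(); }

    // Read-only view so callers cannot bypass enroll().
    List<Student> students() { return Collections.unmodifiableList(students); }
}

// Hypothetical owner of the outer map.
final class School {
    private final Map<String, Roster> rostersByCourse = new HashMap<>();

    void enroll(String course, Student s) {
        // Creates the roster on first use, then delegates to it.
        rostersByCourse.computeIfAbsent(course, k -> new Roster()).enroll(s);
    }

    Roster rosterFor(String course) {
        // Unknown courses yield an empty roster instead of null.
        return rostersByCourse.getOrDefault(course, new Roster());
    }
}
```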
Say, for example, I want to make a cash register program. Ignoring, for the sake of being compact, that one wouldn't use floats for currency, my first instinct is to use an enum for the denominations, something along the lines of:
private enum Currency {
ONE_HUNDRED(100.00f),
FIFTY( 50.00f),
TWENTY( 20.00f),
TEN( 10.00f),
FIVE( 5.00f),
TWO( 2.00f),
ONE( 1.00f),
HALF_DOLLAR( 0.50f),
QUARTER( 0.25f),
DIME( 0.10f),
NICKEL( 0.05f),
PENNY( 0.01f);
private final float value;
Currency(float value) {
this.value = value;
}
public float getValue() {
return this.value;
}
@Override
public String toString() {
return this.name().replace("_", " ");
}
}
But the last time I followed instinct, sans forethought, and did something similar for a Morse code converter, someone suggested that I use a map instead, specifically a BiMap. I see the appeal of that collection in that particular scenario, but generally speaking I wanted to ask whether there is any reason to prefer one when the other could be used. If instead of the above code I did this:
Map<String, Float> currency = new LinkedHashMap<>();
currency.put("One Hundred", 100.00f);
currency.put("Fifty", 50.00f);
currency.put("Twenty", 20.00f);
currency.put("Ten", 10.00f);
currency.put("Five", 5.00f);
currency.put("Two", 2.00f);
currency.put("One", 1.00f);
currency.put("Half Dollar", 0.50f);
currency.put("Quarter", 0.25f);
currency.put("Dime", 0.10f);
currency.put("Nickel", 0.05f);
currency.put("Penny", 0.01f);
Would it be superior for any reason?
In cases like these where either could be utilized, are there any performance advantages to using one over another? Is one more preferable/conventional? More maintainable/adaptable?
Is there any rule of thumb I could use for when I should use one over the other?
Here are things I like to keep in mind:
Enums are best used (and in the languages I know of, may only be used) to define a known set of items ahead of time. This has a nice benefit of treating what really boils down to frequently used "data" as code in a very readable way.
In my opinion, any code that relies on frequently hardcoded strings, as you would need if you implemented data like that in a map, is more difficult to read and maintain. This leads to "magic strings", which are a no-no when avoidable.
It's not immediately clear what should exist in the map until you go check, and it's not clear whether it's potentially being modified elsewhere. Consider that if you get an enum value wrong, the code will not even compile. Get a string key wrong, and you might not notice until much later.
Regarding performance, I doubt there is a large difference between the two. Enum constants are ordinary objects; I suppose any benefit comes from accessing the data as a field on the object rather than through a hash lookup.
This article doesn't go into as much depth as I would like, but it may be a good starting point: Memory Consumption of Java Data Types
It is quite common practice to use an enum as the key type of a map, which offers another way of associating data with a set of specific items (rather than setting the data as fields on the enum). I believe this would be my preferred approach, since setting lots of fields on an enum makes it feel too much like a class rather than a means of reference. This doesn't have the same problems as a normal map: because the keys must be enum constants, you don't need to worry about other keys "accidentally" being added to the map. Java as a whole seems to support this approach, as it provides the EnumMap class.
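As a rough illustration of the EnumMap approach, here is a sketch with a shortened, hypothetical Coin enum and values kept in whole cents (my assumption, to sidestep the float issue the question already acknowledges). CoinTable and totalCents are invented names.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, shortened set of denominations; the enum carries no fields.
enum Coin { QUARTER, DIME, NICKEL, PENNY }

final class CoinTable {
    // The data lives in the map, keyed by the enum; values are whole cents.
    static final Map<Coin, Integer> CENTS = new EnumMap<>(Coin.class);
    static {
        CENTS.put(Coin.QUARTER, 25);
        CENTS.put(Coin.DIME, 10);
        CENTS.put(Coin.NICKEL, 5);
        CENTS.put(Coin.PENNY, 1);
    }

    // Sums a drawer described as "how many of each coin".
    static int totalCents(Map<Coin, Integer> counts) {
        int total = 0;
        for (Map.Entry<Coin, Integer> e : counts.entrySet()) {
            total += CENTS.get(e.getKey()) * e.getValue();
        }
        return total;
    }
}
```

A misspelled key such as Coin.QUATER fails at compile time, which is exactly the property a Map<String, ...> lacks.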
I would say that the main difference between your two pieces of code is that with the enum you have a fixed list of denominations which are "type-safe". When operating with strings and maps, it is very easy to misspell a string and introduce bugs that are hard to spot.
I would use an enum in this case; it is more sensible. If this were to be used by other people, enums have their associated values displayed for you in pretty much any IDE, whereas with a map neither the key nor the value is readily available to you. There are other reasons, but that was the one that came to mind.
Would it be superior for any reason?
The Map design would be appropriate for dynamic data, whereas the enum design would be appropriate for fixed data.
In cases like these where either could be utilized, are there any
performance advantages to using one over another?
Insignificant.
Is one more preferable/conventional?
Only when considering the specific problem to be solved.
More maintainable/adaptable?
Again, it depends on the problem you're trying to solve.
Is there any rule of thumb I could use for when I should use one over
the other?
Whether you're working with a limited, non-varying dataset known at compile time.
My coworker tells me that it's lazy to use Maps, and that oftentimes the programmer's purpose would be better served by an actual object. But I don't know the best way to do so. This is further complicated (to me) by the fact that the key is an Enum type.
Say I have a HashMap<MyEnum, MyObj> which is expected to have four keys (one for each value in MyEnum). The MyObj value for a key is the latest of several MyObjs in a database which have the given enum value.
My best guess involves an object with four fields, or maybe two arrays containing the keys and values in order.
Not sure if this is clear or not (It's 5PM on Thursday = I'm brain-dead), so please ask for clarification if necessary.
While there's nothing wrong with using Maps for their intended purpose, Maps are sometimes misused as substitutes for strongly-typed objects.
String firstname = (String)myMap.get("first_name");
... as opposed to:
String firstName = person.getFirstName();
Since Java implements enums as classes, you might want to consider putting the value you're looking for onto your enum class directly:
MyEnum val = getVal();
MyObj obj = val.getMostRecentMyObj();
But I'd pay attention to separation of concerns to determine whether this really makes sense. It could well be that a Map is the appropriate tool for this job.
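One middle ground, sketched below under assumptions, is to keep the map but hide it behind a small class whose methods say what the data means. MyEnum and MyObj here are placeholders standing in for the question's types, and LatestByType with its methods is a hypothetical name.

```java
import java.util.EnumMap;
import java.util.Map;

// Placeholders standing in for the question's MyEnum and MyObj.
enum MyEnum { A, B, C, D }

final class MyObj {
    final String label;
    MyObj(String label) { this.label = label; }
}

// Hypothetical wrapper: callers see "latest object per type", not a raw map.
final class LatestByType {
    private final Map<MyEnum, MyObj> latest = new EnumMap<>(MyEnum.class);

    // A newer object for the same type replaces the older one.
    void record(MyEnum type, MyObj obj) { latest.put(type, obj); }

    MyObj latestFor(MyEnum type) { return latest.get(type); }

    // True once every enum constant has an entry.
    boolean isComplete() { return latest.size() == MyEnum.values().length; }
}
```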
Let's say we have a bunch of data (temp, wind, pressure) that ultimately comes in as a number of float arrays.
For example:
float[] temp = ...; // obtained after performing some processing (takes time)
float[] wind = ...;
Say we want to store these values in memory for different hours of the day. Is it better to put these on a HashMap like:
Map<String, float[]> maphr1 = new HashMap<>();
maphr1.put("temp",temp);
maphr1.put("wind",wind);
...
Or is it better to create a Java object like:
public class HourData {
private float[] temp,wind,pressure;
//getters and setters for above!
}
...
// use it like this
HourData hr1 = new HourData();
hr1.setTemp(temp);
hr1.setWind(wind);
Out of these two approaches, which is better in terms of performance, readability, good OOP practice, etc.?
You're best off having an HourData class that stores a single set of temperature, wind, and pressure values, like this:
public class HourData {
private float temp, wind, pressure;
// Getters and setters for the above fields
}
If you need to store more than one set of values, you can use an array, or a collection of HourData objects. For example:
HourData[] hourDataArray = new HourData[10000];
This is ultimately much more flexible, performant, and intuitive to use than storing the arrays of data in your HourData class.
Flexibility
I say that this approach is more flexible because it leaves the choice of what kind of collection implementation to use (e.g. ArrayList, LinkedList, etc.) to users of the HourData class. Moreover, if they wish to deal with just a single set of values, this approach doesn't force them to deal with an array or collection.
Performance
Suppose you have a list of HourData instances. If you used three float arrays in the way that you described, then accessing the i'th temp, wind, and pressure values may cause three separate pages to be accessed in memory. This happens because all of the temp values will be stored contiguously, followed by all of the wind values, followed by all of the pressure values. If you use a class to group these values together, then accessing the i'th temp, wind, and pressure values is likely to be faster, because the fields of a single object are stored adjacent to each other in memory.
Intuitive
If you use a HashMap, anyone who needs to access any of the fields will have to know the field names in advance. HashMap objects are better suited to key/value pairs where the keys are not known at compile time. Using an HourData class that contains clearly defined fields, one only needs to look at the class API to know that HourData contains values for temp, wind, and pressure.
Also, getter and setter methods for array fields can be confusing. What if I just want to add a single set of temp, wind, and pressure values to the list? Do I have to get each of the arrays, and add the new values to the end of them? This kind of confusion is easily avoided by using a "wrapper" collection around an HourData that deals only with single values.
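A minimal sketch of that idea: an HourData holding single values, plus a thin "wrapper" collection around it. I made HourData immutable here, which is my assumption rather than something the answer requires, and HourLog with averageTemp() is a hypothetical example of the wrapper.

```java
import java.util.ArrayList;
import java.util.List;

// One reading: a single temp, wind and pressure value (immutable variant, an assumption).
final class HourData {
    private final float temp, wind, pressure;

    HourData(float temp, float wind, float pressure) {
        this.temp = temp;
        this.wind = wind;
        this.pressure = pressure;
    }

    float getTemp() { return temp; }
    float getWind() { return wind; }
    float getPressure() { return pressure; }
}

// Hypothetical wrapper collection: adding one reading is a single call.
final class HourLog {
    private final List<HourData> readings = new ArrayList<>();

    void add(HourData reading) { readings.add(reading); }

    int size() { return readings.size(); }

    // Average temperature over all readings, or NaN when there are none.
    float averageTemp() {
        if (readings.isEmpty()) {
            return Float.NaN;
        }
        float sum = 0f;
        for (HourData d : readings) {
            sum += d.getTemp();
        }
        return sum / readings.size();
    }
}
```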
For readability I would definitely go for an object, since it makes more sense, especially since you store different data collections: the wind floats have a different meaning than the temp floats.
Besides this, you can also store other information, like the location and time of your measurement.
Well, if you don't have any key to differentiate different instances of the same object, I would create HourData objects and store them in an ArrayList.
Putting data in a containing object always increases readability.
You mentioned a bunch of data, so I read it as a collection of data.
So the answer is: if something is already available out of the box in the Java Collections Framework, why write your own?
You should look at the Java collection classes and see which one fits your requirements best, whether that is concurrent access, fast retrieval, fast insertion, etc.
Hope this helps
EDIT----
Adding one more dimension to this.
The type of application you are building also affects your approach.
The above discussion rightly mentions readability, flexibility, and performance as the driving criteria for your design.
But the type of application you are building is also an influencing factor.
For example, let's say you are building a web application.
An object that is stored in memory for a long time would be in either application or session scope, so you will have to make it immutable by design or use it in a thread-safe manner.
Business data that remains the same across different implementations should be designed according to OOP best practices, but the infrastructure or application logic should be driven more by your framework.
I feel that what you are describing, keeping an object in memory for a long time, is more of a framework-driven concern; hence I suggested using the Java collections and putting your business objects inside them. The important points are:
Concurrent Access Control
Immutable by design
If you have a limited and already defined list of parameters then it's better to use the second approach.
In terms of performance: you don't need to search for key in hashmap
In terms of readability: data.setTemp(temp) is better than map.put("temp", temp). One of the benefits of the second approach is that typing errors will be caught at compile time
In terms of good OOP practices: first approach has nothing to do with OOP practices. Using the second approach you can easily change the implementation, add new methods, provide several alternative data object implementations, etc.
But you might want to use collections if you don't know the parameters in advance and want to work with an uncategorized (extensible) set of parameters.
Is there any reason why you would use anything other than String as a hashmap key? It seems like Strings are good enough 99% of the time. Plus, you don't have to implement hashCode() and equals(Object o). Thanks
Imagine a case where you want to create a map keyed off a user's type and name. In that case you would be best to write a class composed of 2 fields (and overriding hashCode and equals). Using a single String in this case, and trying to somehow compose these 2 pieces of information down into a single String would be very awkward and bug prone.
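A minimal sketch of such a two-field key, with UserKey as a hypothetical name. It also shows why gluing the two parts into one String is bug prone: "admin" + "bob" and "adminb" + "ob" produce the same String, whereas the two keys below stay distinct.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical composite key: immutable, with equals and hashCode over both fields.
final class UserKey {
    private final String type;
    private final String name;

    UserKey(String type, String name) {
        this.type = Objects.requireNonNull(type);
        this.name = Objects.requireNonNull(name);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof UserKey)) return false;
        UserKey other = (UserKey) o;
        return type.equals(other.type) && name.equals(other.name);
    }

    @Override
    public int hashCode() {
        // Must use the same fields as equals().
        return Objects.hash(type, name);
    }
}
```

Usage is then the same as with any other key, e.g. map.put(new UserKey("admin", "bob"), value) followed by map.get(new UserKey("admin", "bob")).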
Like Mark Elliot said, sometimes there is a natural key which doesn't have a natural String form. Why not use it?
Strings are good enough when you have a String form handy or one is obvious, such as when you're dealing with database entities.
However, performance is another reason.
Of course, if you do implement a custom key you must ABSOLUTELY ensure it is immutable. It's not only a preference, as trashgod says. Also, Comparable is pretty irrelevant.
If you have a mutable key in a hash map, you can break your hashmap completely.
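Here is a small sketch of how that breakage shows up, using a deliberately mutable, made-up MutableKey class. After the key is changed while inside the map, the entry is still counted but can no longer be found, neither with the mutated key nor with a key that looks like the original.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key whose hashCode depends on a field that can change.
final class MutableKey {
    int id; // deliberately mutable, for demonstration only

    MutableKey(int id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableKey && ((MutableKey) o).id == id;
    }

    @Override
    public int hashCode() { return Integer.hashCode(id); }
}

final class MutableKeyDemo {
    // Returns true when the entry has become unreachable after mutating its key.
    static boolean entryIsLost() {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey(1);
        map.put(key, "value");

        key.id = 2; // the entry was filed under the hash for id 1

        boolean missingByMutatedKey = map.get(key) == null;
        boolean missingByOriginalLookalike = map.get(new MutableKey(1)) == null;
        boolean stillCounted = map.size() == 1;
        return missingByMutatedKey && missingByOriginalLookalike && stillCounted;
    }
}
```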
Strings can cover a lot of use cases, but using more complicated objects can give you a lot more flexibility in your code. Seeing that in the US they've just announced the NCAA college basketball tournament seeds, I'll see if I can come up with a tournament-themed example.
Suppose I want to build a HashMap that stores all of the teams that are in each region. There are four regions to the tournament: East, West, Southwest, and Southeast. You could certainly create your map as:
HashMap<String, List<Team>> teamsInRegions = new HashMap<String, List<Team>>();
But then, what if you already have an object that represents a region? You likely would, since each region has a lot more information associated with it than a name. For instance, I figure a Region class might look like:
public class Region {
private String name;
private Calendar firstRound;
private String firstRoundLocation;
private Calendar secondRound;
private String secondRoundLocation;
private Calendar thirdRound;
private String thirdRoundLocation;
....
}
You get the idea. If the HashMap were keyed by a String, you could certainly get away with having a reasonable Region.toString() method and using that as the key, but using the more complex object will allow you to write more flexible code.
While others may disagree, I find myself using Maps in somewhat unpredictable ways sometimes, where calling HashMap.keySet() gives me a Set of fully populated, complex objects with a plethora of information. And having the Map there means I can use those objects to make decisions elsewhere as well.
For instance, suppose I wanted to build up a list of all the locations of all the games in the basketball tournament. If I had my HashMap with complex objects, that would be as simple as:
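// Assumes myMap is a Map<Region, List<Team>> and that Region.getAllLocales()
// returns a Collection<String> of that region's game locations.
Set<Region> regions = myMap.keySet();
Set<String> gameLocations = new HashSet<>();
for (Region region : regions) {
    gameLocations.addAll(region.getAllLocales());
}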
And so on. YMMV.
Sometimes you want an object instead of a string
Suppose that you have a map (or a dictionary, or some other hash table) of customers and restaurant orders.
Instead of having
Customer c = new Customer("Bob");
Order o = new Order("Fries");
HashMap<String, Order> map = new HashMap<String, Order>();
map.put(c.getName(), o);
You could access everything with a Customer object with
HashMap<Customer, Order> map = new HashMap<Customer, Order>();
map.put(c, o);
Now why would you prefer this?
Maybe in your restaurant, you have multiple Customers called "Bob" on that particular night, so just having a HashMap of names isn't enough anymore. Your Customer class will reflect the differences between the "Bob"s, but just a string representing each of them will not do.
By having a HashMap of Customers, you can simply change how the Customer class implements hashCode and equals, and everything will work perfectly (hopefully). If you had proceeded with strings, it would not be as easy to make modifications and uniquely identify each object instance.
Strings are probably the most common use case, but it really depends on what you want to store. You would not want to turn a Long into a String just to keep your key a String. Just use the type that your data structures suggest. For types like Long you don't have to implement hashCode/equals either. If you are thinking about custom classes, implementing equals and hashCode might be a good idea anyway. With e.g. Apache Commons Lang's EqualsBuilder and HashCodeBuilder it is also pretty straightforward.
I suppose that you could say that String is "good enough" because you can turn typical key types into Strings. However, doing this could well be expensive relative to the cost of using the real key type as the hash table key. Consider for example, converting an Integer (or int) to a String each time you want to use it as a hash table key ...
The other point is that implementing hashCode and equals on a key type is dead-easy. So easy in fact that IDEs like Eclipse have the ability to generate these two methods for you.
Advantages:
1) String's immutable => Thread Safe, no concurrency issues
2) String's hashcode is cached (look for private int hash in the JDK's String class). Saves a lot of processing time.
3) String pool for literals (reuse of String objects); strings created with the new operator can be added to the pool with intern()
4) Security, no one can edit a key
5) Equals and hashcode is implemented well
Downside:
1) Because Strings are immutable and may be pooled, a password held in a String cannot be wiped and stays in clear text in memory, exposed to anyone who has access to the memory of the Java application. A core dump of the application, or a memory dump generated in /tmp, can put passwords at real risk.
Solution: For passwords, use char[]; you can erase its contents by overwriting them with blanks or any other character, which reduces the risk of exposing the password.
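A small sketch of the char[] idea. PasswordUse and checkAndWipe are hypothetical names, and comparing against a plain expected array is only for illustration (a real system would normally verify against a stored hash). The relevant part is the finally block that overwrites the array.

```java
import java.util.Arrays;

final class PasswordUse {
    // Hypothetical check: compares the password, then always wipes the caller's array.
    static boolean checkAndWipe(char[] password, char[] expected) {
        try {
            return Arrays.equals(password, expected);
        } finally {
            // Overwrite the contents so the clear text does not linger in this array.
            Arrays.fill(password, '\0');
        }
    }
}
```

This only clears the one array you control; any copies made elsewhere (for example by converting to a String) are not affected.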