I'm implementing a graph (as in Vertices, Edges, not cartesian). I'm modelling the graph as a physical collection of Nodes (a class I've made).
I want to have a collection of Forces, as Vectors (in the Maths sense), to represent the forces acting upon each node, and ideally I would like to be able to perform a lookup with a Node as a key, which sounds to me like some kind of Hash Lookup Table.
What's a good collection to use, or will I have to make my own?
If anything needs clarifying, just ask.
Thanks
If I have understood your needs correctly, you basically want to do a one-to-many mapping of Node->Vector.
Provided your Node properly implements hashCode() and equals(), you could use a Multimap from Google Guava. This provides the Map<Node,Collection<Vector>> mapping automatically.
The benefit of using Multimap is that you don't need to do this:
Collection<Vector> vectors = nodeToVectorMapping.get(node);
if (vectors == null) {
vectors = new HashSet<Vector>();
nodeToVectorMapping.put(node, vectors);
}
vectors.add(vector);
instead, you only need to do this:
nodeToVectorMapping.put(node,vector);
The Multimap takes care of checking whether the inner Collection exists or not. If you find yourself going into a multithreaded environment, the 'do it by hand' approach would involve synchronising to ensure that two threads didn't create the Collection at the same time, and so on. Google's Guava helps a lot with all of that, and a lot more besides.
As a big fan of Google Collections (the original home of Multimap before it was absorbed into the larger Guava project), I should also point you in the direction of MapMaker, which has all sorts of amazing goodness in it that you will perhaps find useful - size limitations, concurrency levels, lazy initialisation of Values based upon keys, that sort of thing. I've used these in a highly-concurrent application and they've saved my life on many an occasion! :)
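If pulling in Guava is not an option, the hand-written get-or-create shown above can be shortened with Map.computeIfAbsent from the JDK (Java 8+). This is only a sketch: ForceTable, Node and Vec are hypothetical stand-ins for your own classes, and it is a plain HashMap rather than a Multimap, so it is not thread-safe.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class ForceTable {
    /** Hypothetical node type; it keeps the default identity-based equals/hashCode. */
    static final class Node {
        final String name;
        Node(String name) { this.name = name; }
    }

    /** Hypothetical 2D vector. */
    static final class Vec {
        final double x, y;
        Vec(double x, double y) { this.x = x; this.y = y; }
    }

    private final Map<Node, List<Vec>> forces = new HashMap<>();

    /** Creates the inner list on first use, then appends the force. */
    void put(Node node, Vec force) {
        forces.computeIfAbsent(node, k -> new ArrayList<>()).add(force);
    }

    /** Returns the forces on a node, or an empty list if none were recorded. */
    List<Vec> get(Node node) {
        return forces.getOrDefault(node, Collections.emptyList());
    }

    public static void main(String[] args) {
        ForceTable table = new ForceTable();
        Node a = new Node("a");
        table.put(a, new Vec(1, 0));
        table.put(a, new Vec(0, 2));
        System.out.println(table.get(a).size());             // 2
        System.out.println(table.get(new Node("b")).size()); // 0
    }
}
```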
You could simply add the vector as a field in your Node class.
public class Node {
private ForceVector force = ForceVector.getZeroForceVector();
public ForceVector getForceVector() {
return force;
}
public void addForceVector(ForceVector forceToAdd) {
force = force.add(forceToAdd);
}
}
I'm imagining ForceVector to be some (immutable) class you have written to describe a force vector.
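For what it's worth, such a class could look roughly like this. It is a minimal sketch that assumes two-dimensional forces with double components; the x/y fields and the getters are my own invention, only getZeroForceVector() and add() come from the snippet above.

```java
public final class ForceVector {
    private static final ForceVector ZERO = new ForceVector(0.0, 0.0);

    // Assumption: forces are 2D; add a z component if your graph lives in 3D.
    private final double x;
    private final double y;

    public ForceVector(double x, double y) {
        this.x = x;
        this.y = y;
    }

    public static ForceVector getZeroForceVector() {
        return ZERO;
    }

    /** Returns a new vector; neither operand is modified. */
    public ForceVector add(ForceVector other) {
        return new ForceVector(x + other.x, y + other.y);
    }

    public double getX() { return x; }
    public double getY() { return y; }

    public static void main(String[] args) {
        ForceVector total = getZeroForceVector()
                .add(new ForceVector(1.0, 2.0))
                .add(new ForceVector(0.5, -2.0));
        System.out.println(total.getX() + ", " + total.getY()); // 1.5, 0.0
    }
}
```

Because instances never change, sharing the single ZERO instance between all nodes is safe.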
If you have several forces acting on each node, you need to map a node to a collection of forces, for instance by using a HashMap<Node, Set<Vector>>. Just remember to properly implement equals and hashCode for your Nodes.
Others may suggest putting the forces acting on a node in a field of the node class. This may or may not be a good alternative. If you aim to have a graph framework that is reusable in other applications, you may be better off with a separate node-to-forces map.
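To illustrate why equals and hashCode matter for the map key, here is a small sketch. GraphNode and its integer id are assumptions (your Node may be identified differently), and the forces are represented by plain strings just to keep the example short.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public final class NodeForces {
    /** Hypothetical node identified by an integer id. */
    static final class GraphNode {
        private final int id;
        GraphNode(int id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof GraphNode)) return false;
            return id == ((GraphNode) o).id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id); // must agree with equals
        }
    }

    public static void main(String[] args) {
        Map<GraphNode, Set<String>> forces = new HashMap<>();
        Set<String> onNode = new HashSet<>();
        onNode.add("gravity"); // placeholder for a real force vector
        forces.put(new GraphNode(7), onNode);

        // A different instance with the same id finds the entry.
        System.out.println(forces.containsKey(new GraphNode(7))); // true
        System.out.println(forces.containsKey(new GraphNode(8))); // false
    }
}
```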
Java offers us Collections, where every option is best used in a certain scenario.
But what would be a good solution for the combination of following tasks:
Quickly iterate through every element in the list (order does not matter)
Check if the list contains (a) certain element(s)
Some options that were considered which may or may not be good practice:
It could be possible to, for example, first use a LinkedList, and then convert it to a HashSet when the amount of elements is unknown in advance (and if duplicates will not be present)
Pick a solution for one of both tasks and use the same implementation for the other task (if switching to another implementation is not worth it)
Perhaps some implementation exists that does both (failed to find one)
Is there a 'best' solution to this, and if so, what is it?
EDIT: For potential future visitors, this page contains many implementations with big O runtimes.
A HashSet can be iterated through quickly and provides efficient lookups.
HashSet<Object> set = new HashSet<>();
set.add("Hello");
for (Object obj : set) {
System.out.println(obj);
}
if (set.contains("Hello")) {
System.out.println("Found");
}
Quickly iterate through every element in the list (order does not matter)
If the order does not matter, any Collection implementation will do: each of them implements Iterable and iterates in O(n), and since iterating over every element means visiting each one at least once, nothing can be better than O(n). In practice, of course, one implementation may be better suited than another, since you usually have several considerations to take into account.
Check if the list contains (a) certain element(s)
This is typically the use case for a Set: you will have much better time complexity for contains operations. One thing to note here is that a Set does not have a predefined order when iterating over elements. It can change between implementations and it is risky to make assumptions about it.
Now to your question:
From my perspective, if you have the choice to choose the data structure of a class yourself, go with the most natural one for that use case. If you can imagine that you have to call contains a lot, then a Set might be suited for your use case. You can also use a List and each time you need to call contains (multiple times) you can create a Set with all elements from the List before. Of course, if you call this method often, it would be expensive to create the Set for each invocation. You may use a Set in the first place.
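The "build a Set from the List before calling contains multiple times" idea could be sketched as follows. The class and method names are made up for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public final class ContainsDemo {
    /** Counts how many of the queries occur in items, building the Set only once. */
    static int countPresent(List<String> items, List<String> queries) {
        Set<String> lookup = new HashSet<>(items); // one O(n) copy
        int hits = 0;
        for (String q : queries) {
            if (lookup.contains(q)) { // expected O(1) per call
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c");
        System.out.println(countPresent(items, Arrays.asList("a", "x", "c"))); // 2
    }
}
```

The copy only pays off when it is amortised over several contains calls; for a single lookup, List.contains is just as good.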
Your comment stated that you have a world of players and you want to check if a player is part of a certain world object. Since the world owns the players, it should also contain a Collection of some kind to store them. Now, in this case I would recommend a Map with a common identifier of the player as key, and the player itself as value.
public class World {
private Map<String, Player> players = new HashMap<>();
public Collection<Player> getPlayers() { ... }
public Optional<Player> getPlayer(String nickname) { ... }
// ...
}
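One way the elided method bodies might be filled in is shown below. This is a sketch under assumptions: GameWorld, the Player class with a nickname field, and addPlayer are all hypothetical, since the real classes were not shown.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public final class GameWorld {
    /** Hypothetical player type, keyed by nickname. */
    static final class Player {
        final String nickname;
        Player(String nickname) { this.nickname = nickname; }
    }

    private final Map<String, Player> players = new HashMap<>();

    void addPlayer(Player p) {
        players.put(p.nickname, p);
    }

    /** Read-only view for iterating over all players (order is unspecified). */
    Collection<Player> getPlayers() {
        return Collections.unmodifiableCollection(players.values());
    }

    /** Lookup by nickname; empty if the player is not in this world. */
    Optional<Player> getPlayer(String nickname) {
        return Optional.ofNullable(players.get(nickname));
    }

    public static void main(String[] args) {
        GameWorld world = new GameWorld();
        world.addPlayer(new Player("alice"));
        System.out.println(world.getPlayer("alice").isPresent()); // true
        System.out.println(world.getPlayer("bob").isPresent());   // false
    }
}
```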
I've got an ArrayList that can be anywhere from 0 to 5000 items long (pretty big objects, too).
At one point I compare it against another ArrayList, to find their intersection. I know this is O(n^2).
Is creating a HashMap alongside this ArrayList, to achieve constant-time lookup, a valid strategy here, in order to reduce the complexity to O(n)? Or is the overhead of another data structure simply not worth it? I believe it would take up no additional space (besides for the references).
(I know, I'm sure 'it depends on what I'm doing', but I'm seriously wondering if there's any drawback that makes it pointless, or if it's actually a common strategy to use. And yes, I'm aware of the quote about prematurely optimizing. I'm just curious from a theoretical standpoint).
First of all, a short side note:
And yes, I'm aware of the quote about prematurely optimizing.
What you are asking about here is not "premature optimization"!
You are not talking about replacing a multiplication with some odd bitwise operations "because they are faster (on a 90's PC, in a C-program)". You are thinking about the right data structure for your application pattern. You are considering the application cases (though you did not tell us many details about them). And you are considering the implications that the choice of a certain data structure will have on the asymptotic running time of your algorithms. This is planning, or maybe engineering, but not "premature optimization".
That being said, and to tell you what you already know: It depends.
To elaborate on this a bit: It depends on the actual operations (methods) that you perform on these collections, how frequently you perform them, how time-critical they are, and how memory-sensitive the application is.
(For 5000 elements, the latter should not be a problem, as only references are stored - see the discussion in the comments)
In general, I'd also be hesitant to really store the Set alongside the List, if they are always supposed to contain the same elements. This wording is intentional: You should always be aware of the differences between both collections. Primarily: A Set can contain each element only once, whereas a List may contain the same element multiple times.
For all hints, recommendations and considerations, this should be kept in mind.
But even if it is taken for granted that the lists will always contain elements only once in your case, then you still have to make sure that both collections are maintained properly. If you really just stored them, you could easily cause subtle bugs:
private Set<T> set = new HashSet<T>();
private List<T> list = new ArrayList<T>();
// Fine
void add(T element)
{
set.add(element);
list.add(element);
}
// Fine
void remove(T element)
{
set.remove(element);
list.remove(element); // May be expensive, but ... well
}
// Added later, 100 lines below the other methods:
void removeAll(Collection<T> elements)
{
set.removeAll(elements);
// Ooops - something's missing here...
}
To avoid this, one could even consider creating a dedicated collection class - something like a FastContainsList that combines a Set and a List, and forwards the contains call to the Set. But you'll quickly notice that it will be hard (or maybe impossible) to not violate the contracts of the Collection and List interfaces with such a collection, unless the clause that "You may not add elements twice" becomes part of the contract...
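A rough sketch of such a FastContainsList follows. To sidestep the contract problem it deliberately does not implement List or Collection, it only exposes a handful of methods, and "no duplicates" is part of its own contract. The exact method set is an assumption about what you would need.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public final class FastContainsList<T> {
    private final Set<T> set = new HashSet<>();
    private final List<T> list = new ArrayList<>();

    /** Adds the element unless it is already present; returns whether it was added. */
    public boolean add(T element) {
        if (!set.add(element)) {
            return false;
        }
        list.add(element);
        return true;
    }

    /** Removes the element from both structures; returns whether it was present. */
    public boolean remove(T element) {
        if (!set.remove(element)) {
            return false;
        }
        list.remove(element); // linear scan of the list
        return true;
    }

    /** Goes through remove(), so the two structures cannot drift apart. */
    public void removeAll(Collection<? extends T> elements) {
        for (T e : elements) {
            remove(e);
        }
    }

    public boolean contains(Object o) { return set.contains(o); }
    public T get(int index) { return list.get(index); }
    public int size() { return list.size(); }
}
```

Every mutating method updates both fields in one place, which is the whole point: there is no way to touch the Set without the List.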
So again, all this depends on what you want to do with these methods, and which interface you really need. If you don't need the indexed access of List, then it's easy. Otherwise, referring to your example:
At one point I compare it against another ArrayList, to find their intersection. I know this is O(n^2).
You can avoid this by creating the sets locally:
static <T> List<T> computeIntersection(List<T> list0, List<T> list1)
{
Set<T> set0 = new LinkedHashSet<T>(list0);
Set<T> set1 = new LinkedHashSet<T>(list1);
set0.retainAll(set1);
return new ArrayList<T>(set0);
}
This will have a running time of O(n). Of course, if you do this frequently, but rarely change the contents of the lists, there may be options to avoid the copies, but for the reason mentioned above, maintaining the required data structures may become tricky.
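As one example of saving a copy, only one of the two lists needs to be turned into a Set; the other can simply be scanned. This is a variation on the method above rather than a replacement for it, and the names are made up. Like the LinkedHashSet version, it returns each common element once, in the order of the first list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public final class Intersection {
    static <T> List<T> intersect(List<T> list0, List<T> list1) {
        Set<T> lookup = new HashSet<>(list1); // the only copy that is made
        List<T> result = new ArrayList<>();
        for (T element : list0) {
            // remove() returns true only the first time an element is matched,
            // so duplicates in list0 are not reported twice.
            if (lookup.remove(element)) {
                result.add(element);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(intersect(Arrays.asList(1, 2, 2, 3, 4),
                                     Arrays.asList(2, 4, 5))); // [2, 4]
    }
}
```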
Let's say we have a bunch of data (temp,wind,pressure) that ultimately comes in as a number of float arrays.
For example:
float[] temp = //get after performing some processing (takes time)
float[] wind =
Say we want to store these values in memory for different hours of the day. Is it better to put these on a HashMap like:
HashMap maphr1 = new HashMap();
maphr1.put("temp",temp);
maphr1.put("wind",wind);
...
Or is it better to create a Java object like:
public class HourData {
private float[] temp,wind,pressure;
//getters and setters for above!
}
...
// use it like this
HourData hr1 = new HourData();
hr1.setTemp(temp);
hr1.setWind(wind);
Out of these two approaches, which is better in terms of performance, readability, good OOP practice, etc.?
You're best off having an HourData class that stores a single set of temperature, wind, and pressure values, like this:
public class HourData {
private float temp, wind, pressure;
// Getters and setters for the above fields
}
If you need to store more than one set of values, you can use an array, or a collection of HourData objects. For example:
HourData[] hourDataArray = new HourData[10000];
This is ultimately much more flexible, performant, and intuitive to use than storing the arrays of data in your HourData class.
Flexibility
I say that this approach is more flexible because it leaves the choice of what kind of collection implementation to use (e.g. ArrayList, LinkedList, etc.) to users of the HourData class. Moreover, if he/she wishes to deal just with a single set of values, this approach doesn't force them to deal with an array or collection.
Performance
Suppose you have a list of HourData instances. If you used three float arrays in the way that you described, then accessing the i'th temp, wind, and pressure values may cause three separate pages to be accessed in memory. This happens because all of the temp values will be stored contiguously, followed by all of the wind values, followed by all of the pressure values. If you use a class to group these values together, then accessing the i'th temp, wind, and pressure values will be faster because they will all be stored adjacent to each other in memory.
Intuitive
If you use a HashMap, anyone who needs to access any of the fields will have to know the field names in advance. HashMap objects are better suited to key/value pairs where the keys are not known at compile time. Using an HourData class that contains clearly defined fields, one only needs to look at the class API to know that HourData contains values for temp, wind, and pressure.
Also, getter and setter methods for array fields can be confusing. What if I just want to add a single set of temp, wind, and pressure values to the list? Do I have to get each of the arrays, and add the new values to the end of them? This kind of confusion is easily avoided by using a "wrapper" collection around an HourData that deals only with single values.
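Put together, the single-value HourData plus a surrounding collection might look like this. It is a sketch only: I have made the class immutable with a constructor instead of setters, and averageTemp is a made-up helper to show how the list is consumed.

```java
import java.util.ArrayList;
import java.util.List;

public final class HourDataDemo {
    /** One set of readings; immutable here, which is an assumption, not a requirement. */
    static final class HourData {
        private final float temp, wind, pressure;

        HourData(float temp, float wind, float pressure) {
            this.temp = temp;
            this.wind = wind;
            this.pressure = pressure;
        }

        float getTemp() { return temp; }
        float getWind() { return wind; }
        float getPressure() { return pressure; }
    }

    /** Hypothetical helper: mean temperature, or 0 for an empty list. */
    static float averageTemp(List<HourData> hours) {
        float sum = 0f;
        for (HourData h : hours) {
            sum += h.getTemp();
        }
        return hours.isEmpty() ? 0f : sum / hours.size();
    }

    public static void main(String[] args) {
        List<HourData> hours = new ArrayList<>();
        hours.add(new HourData(10f, 3f, 1013f)); // adding one set of values is one call
        hours.add(new HourData(20f, 5f, 1009f));
        System.out.println(averageTemp(hours)); // 15.0
    }
}
```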
For readability I would definitely go for an object, since it makes more sense, especially since you store different data collections: the wind values have a different meaning from the temp values.
Besides this you can also store other information like the location and time of your measurement.
Well, if you don't have any key to differentiate different instances of the same object, I would create HourData objects and store them in an ArrayList.
Putting data in a contained object always increases the readability.
You have mentioned a bunch of data, so I would rather read it as a collection of data.
So the answer is: if something is already available out of the box in the Java collections framework, why would you want to write one yourself?
You should look at Java collection classes and see which fits your requirement better, whether it is concurrent access, fast retrieve time or fast add time etc etc..
Hope this helps
EDIT----
Adding one more dimension to this.
The type of application you are building also affects your approach.
The above discussion rightly mentions readability, flexibility , performance as driving criteria for your design.
But the type of application you are building is also one of the influencing factors.
For example, Lets say you are building a web application.
An object which is stored in memory for a long time would be in either Application or Session scope, so you will have to make it immutable by design or use it in a thread-safe manner.
The business data which remains the same across different implementations should be designed as per OOP or best practices, but the infrastructure or application logic should be driven more by your framework.
I feel that what you are talking about, keeping an object in memory for a long time, is more of a framework-driven concern; hence I suggested using the Java collections and putting your business objects inside them. The important points are:
Concurrent Access Control
Immutable by design
If you have a limited and already defined list of parameters then it's better to use the second approach.
In terms of performance: you don't need to search for key in hashmap
In terms of readability: data.setTemp(temp) is better than map.put("temp", temp). One of the benefits of the second approach is that typing errors will be caught at compile time
In terms of good OOP practices: first approach has nothing to do with OOP practices. Using the second approach you can easily change the implementation, add new methods, provide several alternative data object implementations, etc.
But you might want to use collections if you don't know the parameters and if you want to work with uncategorized(extensible) set of parameters.
I need a java data structure/solution that meets these requirements. What best fits these?
1) Object's insertion order must be kept
2) Objects must be unique (These are database objects that are uniquely identified by a UUID).
3) If a newer object with the same ID is added, the older version of the object should be over-written/removed
4) The Solution should be accessible by many threads.
5) When the first object added to the Structure is read/used, it should be removed from the data structure
There are a couple of possibilities here. The simplest might be to start with a LinkedHashSet. That will provide you with the uniqueness and predictable ordering that you require. Then, you could wrap the resulting set to make it thread-safe:
Set<T> s = Collections.synchronizedSet(new LinkedHashSet<T>(...));
Note: Since a Set doesn't really define a method for retrieving items from it, your code would have to manually invoke Set.remove(Object).
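Reading and removing the oldest element from such a set could be done along these lines. pollFirst is a made-up helper name; the important detail is that iteration over a Collections.synchronizedSet wrapper has to happen inside a block synchronized on the wrapper itself.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

public final class SyncSetQueue {
    /**
     * Removes and returns the oldest element, or null if the set is empty.
     * Expects a set created with Collections.synchronizedSet(new LinkedHashSet<>()).
     */
    static <T> T pollFirst(Set<T> syncSet) {
        synchronized (syncSet) { // makes "find first, then remove" one atomic step
            Iterator<T> it = syncSet.iterator();
            if (!it.hasNext()) {
                return null;
            }
            T first = it.next();
            it.remove();
            return first;
        }
    }

    public static void main(String[] args) {
        Set<String> s = Collections.synchronizedSet(new LinkedHashSet<String>());
        s.add("a");
        s.add("b");
        s.add("a"); // ignored, "a" is already present
        System.out.println(pollFirst(s)); // a
        System.out.println(pollFirst(s)); // b
        System.out.println(pollFirst(s)); // null
    }
}
```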
Alternatively, you could wrap a LinkedHashMap, which does provide a hook for the delete-on-read semantics you require:
class DeleteOnReadMap<K, V> implements Map<K, V> {
private Map<K, V> m = new LinkedHashMap<K, V>();
// implement the Map "read" methods with delete-on-read semantics
public V get(Object key) {
// ...
}
// (other read methods here)
// implement remaining Map methods by forwarding to inner Map
public V put(K key, V value) {
return m.put(key, value);
}
// (remaining Map methods here)
}
Finally, wrap an instance of your custom Map to make it thread-safe:
Map<K, V> m = Collections.synchronizedMap(new DeleteOnReadMap<K, V>(...));
My thought is something like the following:
Collections.synchronizedMap(new LinkedHashMap<K, V>());
I think that takes care of everything except requirement 5, but you can do that by using the remove() method instead of get().
This won't be quite as efficient as a ConcurrentMap would be: synchronization locks the entire map on every access, whereas I think ConcurrentMap implementations can use read-write locks and selective locking on only part of the map to allow multiple non-conflicting accesses to go on simultaneously. If you wanted, you could probably get better performance by writing your own subclass of some existing Map implementation.
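A sketch of the LinkedHashMap idea, keyed by UUID. Instead of Collections.synchronizedMap I have used synchronized methods on a small wrapper (UniqueFifo, a made-up name) so that "read the first entry and remove it" is a single atomic operation. Note that in an insertion-ordered LinkedHashMap, re-inserting an existing key overwrites the value but keeps the original position; whether that is what requirement 3 wants is an assumption on my part.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

public final class UniqueFifo<V> {
    private final Map<UUID, V> items = new LinkedHashMap<>();

    /** Adds or overwrites; an overwritten entry keeps its original position. */
    public synchronized void put(UUID id, V value) {
        items.put(id, value);
    }

    /** Removes and returns the oldest value, or null if the structure is empty. */
    public synchronized V pollFirst() {
        Iterator<Map.Entry<UUID, V>> it = items.entrySet().iterator();
        if (!it.hasNext()) {
            return null;
        }
        V value = it.next().getValue();
        it.remove();
        return value;
    }

    public synchronized int size() {
        return items.size();
    }

    public static void main(String[] args) {
        UniqueFifo<String> fifo = new UniqueFifo<>();
        UUID a = UUID.randomUUID();
        UUID b = UUID.randomUUID();
        fifo.put(a, "a-v1");
        fifo.put(b, "b-v1");
        fifo.put(a, "a-v2"); // overwrites a-v1 in place
        System.out.println(fifo.pollFirst()); // a-v2
        System.out.println(fifo.pollFirst()); // b-v1
    }
}
```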
1) Object's insertion order must be kept
This is any "normal" data structure - array, ArrayList, tree. So avoid self-balancing or self-sorting data structures: heaps, hashtables, or move-to-front trees (splay trees, for example.) Then again, you could use one of those structures, but then you have to keep track of its insertion order in each node.
2) Objects must be unique (These are database objects that are uniquely identified by a UUID).
Keep a unique identifier associated with each object. If this is a C program, then the pointer to that node is unique (I guess this applies in Java as well.) If the node's pointer is not sufficient to maintain "uniqueness", then you need to add a field to each node which you guarantee to have a unique value.
3) If a newer object with the same ID is added, the older version of the object should be over-written/removed
Where do you want to place the node? Do you want to replace the existing node? Or do you want to delete the old node, and then add the new one to the end? This is important because it is related to your requirement #1, where the order of insertion must be preserved.
4) The Solution should be accessible by many threads.
The only way I can think of to do this is to implement some sort of locking. Java lets you wrap structures and code within a synchronized block.
5) When the first object added to the Structure is read/used, it should be removed from the data structure
Kinda like a "dequeue" operation.
Seems like an ArrayList is a pretty good option for this: simply because of #5. The only problem is that searches are linear. But if you have a relatively small amount of data, then it isn't really that much of a problem.
Otherwise, like others have said: a HashMap or even a Tree of some sort would work - but that will depend on the frequency of accesses. (For example, if the "most recent" element is most likely to be accessed, I'd use a linear structure. But if accesses will be of "random" elements, I'd go with a HashMap or Tree.)
The solutions talking about LinkedHashSet would be a good starting point.
However, you would have to override the equals and hashCode methods on the objects that you are going to be putting in the set in order to satisfy your requirement number 3.
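One caveat worth showing in code: even with equals and hashCode based on the UUID, Set.add leaves the existing element in place when an equal one is already present, so the older version has to be removed explicitly. A minimal sketch, where Row and addOrReplace are hypothetical names:

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.UUID;

public final class ReplaceInSet {
    /** Hypothetical database object; equality is by id only. */
    static final class Row {
        final UUID id;
        final String payload;

        Row(UUID id, String payload) {
            this.id = id;
            this.payload = payload;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Row && id.equals(((Row) o).id);
        }

        @Override
        public int hashCode() {
            return id.hashCode();
        }
    }

    /** Replaces any element equal to row; the new row moves to the end of the order. */
    static void addOrReplace(Set<Row> set, Row row) {
        set.remove(row);
        set.add(row);
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        Set<Row> set = new LinkedHashSet<>();
        set.add(new Row(id, "old"));
        set.add(new Row(id, "new"));                         // no effect: "old" stays
        System.out.println(set.iterator().next().payload);   // old
        addOrReplace(set, new Row(id, "new"));
        System.out.println(set.iterator().next().payload);   // new
    }
}
```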
Sounds like you have to create your own data structure, but it sounds like a pretty easy class assignment.
Basically you start with anything like an Array or Stack but then you have to extend it for the rest of the functionality.
You can look at the 'Contains' method as you will need that.
I almost always find it sufficient to use the concrete classes for the interfaces listed in the title. Usually when I use other types (such as LinkedList or TreeSet), the reason is for functionality and not performance - for example, a LinkedList for a queue.
I do sometimes construct an ArrayList with an initial capacity larger than the default of 10 and a HashMap with more than the default 16 buckets, but (especially for business CRUD) I never find myself thinking "hmmm...should I use a LinkedList instead of an ArrayList if I am just going to insert and iterate through the whole List?"
I am just wondering what everyone else here uses (and why) and what type of applications they develop.
Those are definitely my default, although often a LinkedList would in fact be the better choice for lists, as the vast majority of lists seem to just iterate in order, or get converted to an array via toArray anyway.
But in terms of keeping consistent maintainable code, it makes sense to standardize on those and use alternatives for a reason, that way when someone reads the code and sees an alternative, they immediately start thinking that the code is doing something special.
I always type the parameters and variables as Collection, Map and List unless I have a special reason to refer to the sub type, that way switching is one line of code when you need it.
I could see explicitly requiring an ArrayList sometimes if you need the random access, but in practice that really doesn't happen.
For some kinds of lists (e.g. listeners) it makes sense to use a CopyOnWriteArrayList instead of a normal ArrayList. For almost everything else the basic implementations you mentioned are sufficient.
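The listener case can be sketched like this. ListenerDemo and its Listener interface are invented for the example; the point is that iteration over a CopyOnWriteArrayList works on a snapshot, so a listener can unregister itself while being notified without a ConcurrentModificationException.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public final class ListenerDemo {
    /** Hypothetical listener interface. */
    interface Listener {
        void onEvent(String event);
    }

    private final List<Listener> listeners = new CopyOnWriteArrayList<>();
    private int delivered = 0; // demo counter only, not thread-safe

    void addListener(Listener l) { listeners.add(l); }
    void removeListener(Listener l) { listeners.remove(l); }
    int listenerCount() { return listeners.size(); }
    int delivered() { return delivered; }

    /** Iterates over a snapshot of the list taken when the loop starts. */
    void fire(String event) {
        for (Listener l : listeners) {
            l.onEvent(event);
            delivered++;
        }
    }

    public static void main(String[] args) {
        ListenerDemo demo = new ListenerDemo();
        demo.addListener(new Listener() {
            @Override
            public void onEvent(String event) {
                demo.removeListener(this); // unregisters itself mid-notification
            }
        });
        demo.addListener(event -> { });
        demo.fire("first");
        System.out.println(demo.listenerCount()); // 1
    }
}
```

With a plain ArrayList, modifying the list from inside the for-each loop like this would normally fail.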
Yep, I use those as defaults. I generally have a rule that on public class methods, I always return the interface type (ie. Map, Set, List, etc.), since other classes (usually) don't need to know what the specific concrete class is. Inside class methods, I'll use the concrete type only if I need access to any extra methods it may have (or if it makes understanding the code easier), otherwise the interface is used.
It's good to be pretty flexible with any rules you do use, though, as a dependency on concrete class visibility is something that can change over time (especially as your code gets more complex).
Indeed, always use the base interfaces Collection, List and Map instead of their implementations. To make things even more flexible you could hide your implementations behind static factory methods, which allow you to switch to a different implementation in case you find something better (I doubt there will be big changes in this field, but you never know). Another benefit is that the syntax is shorter thanks to generics.
Map<String, LongObjectClasName> map = CollectionUtils.newMap();
instead of
Map<String, LongObjectClasName> map = new HashMap<String, LongObjectClasName>();
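import java.util.ArrayList;
import java.util.List;
import java.util.Vector;
import java.util.concurrent.CopyOnWriteArrayList;

public class CollectionUtils {
.....
// The factory methods are static so that callers can write CollectionUtils.newList()
public static <T> List<T> newList() {
return new ArrayList<T>();
}
public static <T> List<T> newList(int initialCapacity) {
return new ArrayList<T>(initialCapacity);
}
public static <T> List<T> newSynchronizedList() {
return new Vector<T>();
}
public static <T> List<T> newConcurrentList() {
return new CopyOnWriteArrayList<T>();
}
public static <T> List<T> newSynchronizedList(int initialCapacity) {
return new Vector<T>(initialCapacity);
}
...
}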
Having just come out of a class about data structure performance, I'll usually look at the kind of algorithm I'm developing or the purpose of the structure before I choose an implementation.
For example, if I'm building a list that has a lot of random accesses into it, I'll use an ArrayList because its random access performance is good, but if I'm inserting things into the list a lot, I might choose a LinkedList instead. (I know modern implementations remove a lot of performance barriers, but this was the first example that came to mind.)
You might want to look at some of the Wikipedia pages for data structures (especially those dealing with sorting algorithms, where performance is especially important) for more information about performance, and the article about Big O notation for a general discussion of measuring the performance of various functions on data structures.
I don't really have a "default", though I suppose I use the implementations listed in the question more often than not. I think about what would be appropriate for whatever particular problem I'm working on, and use it. I don't just blindly default to using ArrayList, I put in 30 seconds of thought along the lines of "well, I'm going to be doing a lot of iterating and removing elements in the middle of this list so I should use a LinkedList".
And I almost always use the interface type for my reference, rather than the implementation. Remember that List is not the only interface that LinkedList implements. I see this a lot:
LinkedList<Item> queue = new LinkedList<Item>();
when what the programmer meant was:
Queue<Item> queue = new LinkedList<Item>();
I also use the Iterable interface a fair amount.
If you are using LinkedList for a queue, you might consider using the Deque interface and ArrayDeque implementing class (introduced in Java 6) instead. To quote the Javadoc for ArrayDeque:
This class is likely to be faster than Stack when used as a stack, and faster than LinkedList when used as a queue.
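A small sketch of ArrayDeque in both roles; the helper names are made up. One thing to keep in mind is that ArrayDeque does not accept null elements, unlike LinkedList.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.Queue;

public final class DequeDemo {
    /** FIFO: elements come out in the order they went in. */
    static <T> List<T> drainAsQueue(List<T> input) {
        Queue<T> queue = new ArrayDeque<>(input);
        List<T> out = new ArrayList<>();
        while (!queue.isEmpty()) {
            out.add(queue.poll());
        }
        return out;
    }

    /** LIFO: the last element pushed is the first one popped. */
    static <T> List<T> drainAsStack(List<T> input) {
        Deque<T> stack = new ArrayDeque<>();
        for (T item : input) {
            stack.push(item);
        }
        List<T> out = new ArrayList<>();
        while (!stack.isEmpty()) {
            out.add(stack.pop());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drainAsQueue(Arrays.asList(1, 2, 3))); // [1, 2, 3]
        System.out.println(drainAsStack(Arrays.asList(1, 2, 3))); // [3, 2, 1]
    }
}
```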
I tend to use one of the *Queue classes for queues. However, LinkedList is a good choice if you don't need thread safety.
Using the interface type (List, Map) instead of the implementation type (ArrayList, HashMap) is irrelevant within methods - it's mainly important in public APIs, i.e. method signatures (and "public" doesn't necessarily mean "intended to be published outside your team").
When a method takes an ArrayList as a parameter, and you have something else, you're screwed and have to copy your data pointlessly. If the parameter type is List, callers are much more flexible and can, e.g. use Collections.EMPTY_LIST or Collections.singletonList().
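To make that concrete, here is a sketch of a method declared against List and a few of the things callers can then pass in without copying. totalLength is a made-up example method; Collections.emptyList() is the type-safe counterpart of Collections.EMPTY_LIST.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

public final class ListParamDemo {
    /** Accepts any List implementation, not just ArrayList. */
    static int totalLength(List<String> words) {
        int total = 0;
        for (String w : words) {
            total += w.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> none = Collections.emptyList();
        System.out.println(totalLength(none));                                    // 0
        System.out.println(totalLength(Collections.singletonList("abc")));        // 3
        System.out.println(totalLength(Arrays.asList("ab", "cd")));               // 4
        System.out.println(totalLength(new LinkedList<>(Arrays.asList("a"))));    // 1
    }
}
```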
I too typically use ArrayList, but I will use TreeSet or HashSet depending on the circumstances. When writing tests, however, Arrays.asList and Collections.singletonList are also frequently used. I've mostly been writing thread-local code, but I could also see using the various concurrent classes as well.
Also, there were times I used ArrayList when what I really wanted was a LinkedHashSet (before it was available).