I have a class A which can contain many instances of class B, which may in turn contain many instances of class C, which can contain many instances of class D.
Now, in class A I have a method getAllD. Currently every time this is called there is a lot of iterating that takes place, and a rather large list is freshly created and returned. This cannot be very efficient.
I was wondering how I could do this better. The question "Combine multiple Collections into a single logical Collection?" seems to touch upon a similar topic, but I'm not really sure how I could apply it to my situation.
All comments are much appreciated!
I would combine Iterables.concat with Iterables.transform to obtain a live view of Ds:
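import com.google.common.base.Function;
import com.google.common.collect.Iterables;
import java.util.Collection;
public class A {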
private Collection<B> bs;
/**
* @return a live concatenated view of the Ds contained in the Cs
* contained in the Bs contained in this A.
*/
public Iterable<D> getDs() {
Iterable<C> cs = Iterables.concat(Iterables.transform(bs, BToCsFunction.INSTANCE));
Iterable<D> ds = Iterables.concat(Iterables.transform(cs, CToDsFunction.INSTANCE));
return ds;
}
private enum BToCsFunction implements Function<B, Collection<C>> {
INSTANCE;
@Override
public Collection<C> apply(B b) {
return b.getCs();
}
}
private enum CToDsFunction implements Function<C, Collection<D>> {
INSTANCE;
@Override
public Collection<D> apply(C c) {
return c.getDs();
}
}
}
public class B {
private Collection<C> cs;
public Collection<C> getCs() {
return cs;
}
}
public class C {
private Collection<D> ds;
public Collection<D> getDs() {
return ds;
}
}
This works well if your goal is simply to iterate over the Ds and you don't really need a collection view. It avoids the instantiation of a big temporary collection.
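If Guava isn't available, a comparable lazy traversal can be sketched with Java 8 streams. This is a swapped-in technique, not part of the answer above, and unlike the Iterables view a Stream can be consumed only once, so the method hands out a fresh stream per call. The minimal B/C/D classes and the name streamDs are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.stream.Stream;

class D {
    final String name;
    D(String name) { this.name = name; }
}

class C {
    final List<D> ds = new ArrayList<>();
    Collection<D> getDs() { return ds; }
}

class B {
    final List<C> cs = new ArrayList<>();
    Collection<C> getCs() { return cs; }
}

class A {
    final List<B> bs = new ArrayList<>();

    // Lazily flattens B -> C -> D; no list of all Ds is built
    // unless the caller collects the stream.
    Stream<D> streamDs() {
        return bs.stream()
                 .flatMap(b -> b.getCs().stream())
                 .flatMap(c -> c.getDs().stream());
    }
}
```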
The answer to your question is going to depend on the specifics of your situation. Are these collections static or dynamic? How big is your collection of B's in A? Are you only going to access the Ds from A, or will you sometimes want to be farther down in the tree or returning Bs or Cs? How frequently are you going to want to access the same set of Ds from a particular A? Can a D (or C or B) be associated with more than 1 A?
If everything is dynamic, then the best chance of improving performance is to give each C a reference to its parent A and to update that A whenever the C's list of Ds changes. This way, you can keep a collection of Ds in your A object and update it whenever one of the Cs gains or loses a D.
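A minimal sketch of the parent-reference idea, assuming each D belongs to exactly one C (otherwise A would need per-D counts). The B layer is omitted for brevity, and the method names addD, removeD, dAdded and dRemoved are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class D {
    final String name;
    D(String name) { this.name = name; }
}

class A {
    // Aggregate of all Ds, maintained incrementally by the Cs.
    private final Set<D> allDs = new LinkedHashSet<>();

    void dAdded(D d) { allDs.add(d); }
    void dRemoved(D d) { allDs.remove(d); }

    // No tree walk: just a read-only view of the maintained aggregate.
    Set<D> getAllD() { return Collections.unmodifiableSet(allDs); }
}

class C {
    private final A parent;               // back reference to the owning A
    private final List<D> ds = new ArrayList<>();

    C(A parent) { this.parent = parent; }

    void addD(D d) {
        ds.add(d);
        parent.dAdded(d);                 // keep A's aggregate in sync
    }

    void removeD(D d) {
        if (ds.remove(d)) {
            parent.dRemoved(d);
        }
    }
}
```

The trade-off is that every mutation of a C now costs an extra set update, in exchange for getAllD() doing no iteration at all.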
If everything is static and there is some reuse of the D collections from each A, then caching may be a good choice, particularly if there are a lot of Bs. A would have a map with a key of B and a value of a collection of Ds. The getAllDs() method would first check to see if the map had a key for B and if so return its collection of Ds. If not, then it would generate the collection, store it into the cache map, and return the collection.
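A sketch of the caching idea, assuming the tree really is static (nothing here ever invalidates the cache) and that B keeps the default identity-based equals/hashCode. The per-B method getAllDs(B) and the field layouts are hypothetical; HashMap.computeIfAbsent (Java 8+) performs the check-then-store step.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class D { }

class C {
    final List<D> ds = new ArrayList<>();
}

class B {
    final List<C> cs = new ArrayList<>();
}

class A {
    // Cache of B -> flattened Ds; only valid while the tree does not change.
    private final Map<B, List<D>> cache = new HashMap<>();

    List<D> getAllDs(B b) {
        // Builds the list on the first request for this B and
        // returns the stored list on every later request.
        return cache.computeIfAbsent(b, key -> {
            List<D> result = new ArrayList<>();
            for (C c : key.cs) {
                result.addAll(c.ds);
            }
            return Collections.unmodifiableList(result);
        });
    }
}
```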
You could also use a tree to store the objects, particularly if they were fairly simple. For example, you could create an XML DOM object and use XPath expressions to pull out the subset of Ds that you wanted. This would allow far more dynamic access to the sets of objects you were interested in.
Each of these solutions has different tradeoffs in terms of cost to set up, cost to maintain, timeliness of results, flexibility of use, and cost to fetch results. Which you should choose is going to depend on your context.
Actually, I think chaining iterators with Guava's Iterators.concat (or IteratorChain from Apache Commons) would work fine for your case:
import com.google.common.collect.Iterators;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
class A {
Collection<B> children;
// Lazily chains the D iterators of all child Bs; no combined list of Ds is built.
Iterator<D> getAllD() {
List<Iterator<D>> iters = new ArrayList<Iterator<D>>();
for (B child : children) {
iters.add(child.getAllD());
}
return Iterators.concat(iters.iterator());
}
}
class B {
Collection<C> children;
Iterator<D> getAllD() {
List<Iterator<D>> iters = new ArrayList<Iterator<D>>();
for (C child : children) {
iters.add(child.getAllD());
}
return Iterators.concat(iters.iterator());
}
}
class C {
Collection<D> children;
Iterator<D> getAllD() {
return children.iterator();
}
}
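Without Guava or Commons, the same chaining can be sketched with a small hand-written iterator. ChainedIterator is a hypothetical helper that behaves in the spirit of Iterators.concat / IteratorChain: it only advances to the next inner iterator once the current one is exhausted.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;

final class ChainedIterator<T> implements Iterator<T> {
    private final Iterator<? extends Iterator<? extends T>> inputs;
    private Iterator<? extends T> current = Collections.emptyIterator();

    ChainedIterator(Iterator<? extends Iterator<? extends T>> inputs) {
        this.inputs = inputs;
    }

    @Override
    public boolean hasNext() {
        // Skip past exhausted (or empty) inner iterators.
        while (!current.hasNext() && inputs.hasNext()) {
            current = inputs.next();
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

In the classes above, return new ChainedIterator<D>(iters.iterator()) could stand in for the Iterators.concat call.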
"This cannot be very efficient."
Iterating in memory is pretty damn fast. Also, creating one ArrayList of 10k elements won't be drastically more expensive than creating 10 ArrayLists of 1k elements each. So, in conclusion, you should probably start with the most straightforward iteration. Chances are that it works just fine.
Even if you have a gazillion elements, it is probably wise to implement straightforward iteration anyway, for comparison. Otherwise you won't know whether your clever approach is actually an optimization or is slowing things down.
Having said that, if you want to optimize for sequential read access of all Ds, I'd maintain an external "index". The index could be a LinkedList, ArrayList, TreeList etc., depending on your situation. For example, if you aren't sure how long the index will get, it is probably wise to avoid ArrayList. If you want to efficiently remove arbitrary elements given a reference to the element, an ordered set (such as LinkedHashSet) might be much better than a list, etc.
When you do this, you have to worry about keeping the index consistent with the actual references in your classes, i.e. more complexity = more places for bugs to hide. So, unless performance testing shows it is necessary, it is really not advisable to attempt this optimization.
(By the way, avoiding the instantiation of new collection objects is unlikely to make things much faster unless you are writing EXTREMELY performance-critical code. Object instantiation in modern JVMs takes only a few tens of nanoseconds or so. Also, you could mistakenly use an ArrayList with a small initial capacity or something and make things worse.)
How to change the object of the class itself from inside itself? I have one class like this:
public class AdvancedArrayList<T> extends ArrayList<T>
I want to assign a new object to it from inside the class, but I am unable to do that. See this code:
AdvancedArrayList<T> abcd = this; // works fine
this = abcd; // ERROR: variable expected
And this is my entire class:
public class AdvancedArrayList<T> extends ArrayList<T> {
public void reverse(){
Collections.reverse(this);
}
public void removeDuplicates(){
ArrayList<T> wordDuplicate = new ArrayList<>();
ArrayList<T> tempList= new ArrayList<>();
for (T dupObject : wordDuplicate) {
if (!tempList.contains(dupObject)) {
tempList.add(dupObject);
}
}
this = tempList;
}
}
So, how can I change the object of the class itself?
Your #removeDuplicates method is attempting to mutate the list object it belongs to (known as this within the class), and as such should make those modifications to the list itself (rather than to a copy). Note also that because you iterate over your newly made list wordDuplicate, the for loop will not actually execute (there are no values inside that list). I would personally keep track of seen elements using a Set (namely a HashSet, which has roughly O(1), i.e. nearly constant-time, lookup), while taking advantage of the Iterator#remove method:
public void removeDuplicates() {
Set<T> seen = new HashSet<>();
Iterator<T> itr = this.iterator(); //similar to the mechanics behind for-each
while (itr.hasNext()) { //while another element can be found
T next = itr.next(); //get the next element
if (!seen.add(next)) { //#add returns "true" only if it was not in the Set previously
itr.remove(); //if it was previously in the set, remove it from the List
}
}
}
Note also that the semantics you are describing nicely fit the definition of a Set, which is a collection that contains no duplicates. There are variants of Set (e.g. LinkedHashSet) that also preserve insertion order when you iterate over the elements.
For a simpler, non-iterator solution, you could also do a "dirty swap" of the elements after your calculation, which would be similar to what you attempted to achieve via this = list:
public void removeDuplicates() {
Set<T> seen = new LinkedHashSet<>(); //to preserve iteration order
for (T obj : this) { //for each object in our current list
seen.add(obj); //will only add non-duplicates!
}
this.clear(); //wipe out our old list
this.addAll(seen); //add back our non-duplicated elements
}
This can be quicker to write, but would be less performant than the in-place removal (since the entire list must be wiped and added to again). Lastly, if this is a homework problem, try to avoid copy-pasting code directly, and instead write it out yourself at the very least. Merely working towards a solution can help to solidify the information down the line.
It's not possible in Java. this cannot be reassigned. Why (and when) do you need to re-assign an object from within itself?
When you have code such as obj.replace(otherObj) with an imaginary implementation of void replace(Object other) { this = other; }, then why not simply write obj = otherObj to update the reference?
Further clarifications after comments:
ArrayList is not designed to be subclassed. If you want to write your own list collection, extend AbstractList. Your custom list can then use composition to delegate to an inner list.
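A sketch of that composition approach, with AdvancedList as a hypothetical name: because the class delegates to an inner list instead of being one, removeDuplicates can swap the delegate, which is the closest legal equivalent of the this = tempList the question attempted.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

class AdvancedList<T> extends AbstractList<T> {
    private List<T> inner = new ArrayList<>();   // the delegate

    @Override public T get(int index) { return inner.get(index); }
    @Override public int size() { return inner.size(); }
    @Override public T set(int index, T element) { return inner.set(index, element); }

    @Override public void add(int index, T element) {
        inner.add(index, element);
        modCount++;                              // lets iterators detect concurrent modification
    }

    @Override public T remove(int index) {
        modCount++;
        return inner.remove(index);
    }

    public void reverse() {
        Collections.reverse(inner);
    }

    public void removeDuplicates() {
        // Build a de-duplicated copy (first occurrences, original order),
        // then replace the delegate instead of reassigning `this`.
        inner = new ArrayList<>(new LinkedHashSet<>(inner));
        modCount++;
    }
}
```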
I have a class and an enum that looks something like this
class Container{
static int next_id = 0;
final int id = next_id++;
State state = State.one;
}
enum State{
one, two, three, four, five;
}
I want to maintain several collections of Container, but ensure that any instance of Container is present in only one collection. The collections need to be thread-safe, and I cannot store the Container directly in a hash-based collection, as its hash will change based on its current state.
--edit--
To further clarify, the goal of this is to be able to retrieve all Containers that are in a given state, without having to inspect every single container's state as there are several thousand containers.
You won't be able to ensure that without writing some code yourself, as maintaining consistency, i.e. a Container belonging to exactly one collection depending on its state, requires not only the collections' but also the members' support.
Disregarding that the target language is Java, I would probably suggest linked lists, something like this:
class Container{
static int next_id = 0;
final int id = next_id++;
final Node<Container> node;
State state;
public Container() {
node = new Node<>(this);
setState(State.one);
}
public void setState(State state) {
if(this.state != null) node.unlink();
this.state = state;
if(this.state != null) this.state.containers.add(node);
}
}
enum State{
one, two, three, four, five;
final List<Container> containers = ...;
}
This code only works efficiently because, knowing the node, an element can be unlinked from the linked list in O(1) (add is of course O(1) as well). Java's LinkedList does not expose its nodes, so removing a given element requires a linear search, which is inefficient.
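To make the node-based idea concrete, here is a minimal sketch of a hand-rolled doubly linked list whose nodes are exposed, so a Container can unlink itself in O(1). Node, NodeList and their fields are hypothetical (the JDK has no such public types), the id counter is omitted, and nothing here is thread-safe.

```java
final class Node<T> {
    final T value;
    Node<T> prev, next;
    NodeList<T> owner;                    // list this node is currently in, or null

    Node(T value) { this.value = value; }

    void unlink() {                       // O(1): no search needed
        if (owner == null) return;
        if (prev != null) prev.next = next; else owner.head = next;
        if (next != null) next.prev = prev; else owner.tail = prev;
        owner.size--;
        prev = next = null;
        owner = null;
    }
}

final class NodeList<T> {
    Node<T> head, tail;
    int size;

    void add(Node<T> node) {              // O(1) append at the tail
        node.unlink();                    // a node sits in at most one list
        node.owner = this;
        node.prev = tail;
        node.next = null;
        if (tail != null) tail.next = node; else head = node;
        tail = node;
        size++;
    }
}

enum State {
    one, two, three, four, five;
    final NodeList<Container> containers = new NodeList<>();
}

class Container {
    final Node<Container> node = new Node<>(this);
    State state;

    Container() { setState(State.one); }

    void setState(State newState) {
        node.unlink();                    // leave the old state's list
        state = newState;
        if (newState != null) newState.containers.add(node);
    }
}
```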
In light of this, the next best thing is a hash-based approach. Some of the code can still remain:
Setting the state is somewhat complex, so the logic should be put in a setter, not in the client doing container.state = ...
As long as the collections of containers are application-wide (hint: in a managed environment, like an application server, they hardly ever are!), they can be maintained directly by the enum. Otherwise, give the containers some kind of Context that holds an EnumMap<State, Map<Integer, Container>> or similar.
The result:
class Container{
static int next_id = 0;
final int id = next_id++;
State state;
public Container() {
setState(State.one);
}
public void setState(State state) {
if(this.state != null) this.state.containers.remove(id);
this.state = state;
if(this.state != null) this.state.containers.put(id, this);
}
}
enum State{
one, two, three, four, five;
final Map<Integer, Container> containers = new HashMap<>();
}
By using the id as the key, you don't have to worry about your own hashCode and equals implementations.
Regarding visibility: You really, really want to make a bunch of this private - or default (I'm looking at the maps in State) at the very least.
Regarding thread safety: I don't want to go out on a limb here (I'm no expert), but using ConcurrentHashMaps will probably do much of what you need. Also, synchronize setState; if you don't, two concurrent updates could insert the container into two maps. Finally, final int id = next_id++; is not thread-safe as far as I know, because ++ is not atomic. You could use an AtomicInteger here.
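Putting those three suggestions together gives roughly the following sketch. It is an assumption-laden illustration, not a vetted concurrent design: the accessor names members, put and remove are hypothetical, and a concurrent reader may briefly see a container in neither map while setState is switching it.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

enum State {
    one, two, three, four, five;

    // ConcurrentHashMap so lookups by state are safe from any thread.
    private final Map<Integer, Container> containers = new ConcurrentHashMap<>();

    Collection<Container> members() {
        return Collections.unmodifiableCollection(containers.values());
    }

    void put(Container c) { containers.put(c.id, c); }
    void remove(Container c) { containers.remove(c.id); }
}

class Container {
    // AtomicInteger instead of a plain static int, because ++ is not atomic.
    private static final AtomicInteger NEXT_ID = new AtomicInteger();

    final int id = NEXT_ID.getAndIncrement();
    private State state;

    Container() { setState(State.one); }

    // synchronized so two concurrent updates of the same container
    // cannot leave it registered under two states.
    synchronized void setState(State newState) {
        if (state != null) state.remove(this);
        state = newState;
        if (state != null) state.put(this);
    }

    synchronized State getState() { return state; }
}
```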
Just override the hashCode() method so that it does not factor in the state field; the hash then stays the same no matter what state holds.
Even better, because your Container's id variable is what you are sorting by, and there exists only one Container per id, you could just do:
@Override
public int hashCode() {
return this.id;
}
If you don't want the hash codes to be linear, you could XOR the id with some constant; since XOR with a fixed value is reversible, the result is still guaranteed to be unique per id:
@Override
public int hashCode() {
return (2807 ^ this.id);
}
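For completeness, a sketch of a Container whose hashCode ignores state, paired with an id-based equals. Strictly, the explicit equals is optional here (with one instance per id, the default identity equals is already consistent with an id-based hash), but it makes the contract visible. The field names follow the question; the non-thread-safe counter is kept from the original.

```java
class Container {
    private static int nextId = 0;        // not thread-safe; an AtomicInteger would be
    final int id = nextId++;
    State state = State.one;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Container)) return false;
        return id == ((Container) o).id;  // identity is the id alone, never the state
    }

    @Override
    public int hashCode() {
        return id;                        // stable: unaffected by state changes
    }
}

enum State { one, two, three, four, five }
```

With this, a Container stored in a HashSet or as a HashMap key stays findable after its state changes.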
You can probably get the effect that you have described (add it to the proper collection on state change and have it automatically disappear from the prior collection) by using a HashMap for each of your collections. I say 'probably' because there may be other, unstated, requirements.
As you note, when you update the state, the hash code will change. This means you will no longer be able to retrieve the container from the HashMap it was put into under the prior state, so it effectively disappears from lookups in that map (the stale entry itself remains, and will still show up when iterating). Whenever you update the state, you should put the container into the HashMap for the new state.
Your unit test would call the "update state" method with various state changes and validate that when you get from each collection that it is found in only one collection and it is the right one.
java.util.Set specifies only methods that return all elements (via an Iterator or an array).
Why is there no option to return any single value from a Set?
It makes a lot of sense in real life. For example, I have a bowl of strawberries and I want to take just one of them. I totally don't care which one.
Why can't I do the same in Java?
This is not answerable. You'd have to ask the original designers of the Java collections framework.
One plausible reason is that methods with non-deterministic behavior tend to be problematic:
They make unit testing harder.
They make bugs harder to track down.
They are more easily misunderstood and misused by programmers who haven't bothered to read the API documentation.
For hashtable-based set implementations, the behavior of a "get some element" method is going to be non-deterministic, or at least difficult to determine / predict.
By the way, you can trivially get some element of a non-empty set as follows:
Object someObject = someSet.iterator().next();
Getting a truly (pseudo-)random element is a bit more tricky / expensive because you can't index the elements of a set. (You need to extract all of the set elements into an array ...)
On revisiting this, I realized that there is another reason. It is simply that Set is based on the mathematical notion of a set, and the elements of a set in mathematics have no order. It is simply meaningless to talk about the first element of a mathematical set.
A java.util.Set is an unordered collection; you can see it as a bag that contains things, but not in any particular order. It would not make sense to have a get(int index) method, because elements in a set don't have an index.
The designers of the standard Java library didn't include a method to get a random element from a Set. If you want to know why, that's something you can only speculate about. Maybe they didn't think it was necessary, or maybe they didn't even think about it.
It's easy to write a method yourself that gets a random element out of a Set.
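One possible version, as a sketch: SetUtils and randomElement are hypothetical names. It skips a random number of elements via the iterator, so it costs O(n) time but makes no array copy.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Random;
import java.util.Set;

final class SetUtils {
    private SetUtils() { }

    // Picks a uniformly random element by advancing the iterator
    // a random number of steps (0 .. size-1).
    static <E> E randomElement(Set<E> set, Random random) {
        if (set.isEmpty()) {
            throw new NoSuchElementException("set is empty");
        }
        int skip = random.nextInt(set.size());
        Iterator<E> it = set.iterator();
        for (int i = 0; i < skip; i++) {
            it.next();
        }
        return it.next();
    }
}
```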
If you don't care about the index of the elements, try using a Queue instead of a Set.
Queue<String> q = new ArrayDeque<String>();
q.element(); // retrieves the first object but doesn't remove it (throws if the queue is empty)
q.poll(); // retrieves and removes the first object (returns null if the queue is empty)
While a plain Set has no particular order, SortedSet and NavigableSet provide a guaranteed order and methods that support it. You can use first() and last():
SortedSet<E> set = ...
E e1 = set.first(); // a value
E e2 = set.last(); // also a value.
Actually, the iterator is a lot better than using get(position) (which is something you can do on a java.util.List). For one thing, it allows the collection to be modified during iteration (via Iterator.remove). The reason sets don't have get(position) is probably that most of them don't guarantee insertion order. You can always do something like new ArrayList<>(mySet).get(position).
If you are not concerned with performance, you can create a new type and back the data with an ArrayList.
(Please note before downvoting: this is just a naive implementation of the idea, not a proposed final solution.)
import ...
public class PickableSet<E> extends AbstractSet<E>{
private final List<E> arrayList = new ArrayList<E>();
private final Set<E> hashSet = new HashSet<E>();
private final Random random = new Random();
public boolean add( E e ) {
return hashSet.add( e ) && arrayList.add( e );
}
public int size() {
return arrayList.size();
}
public Iterator<E> iterator() {
return hashSet.iterator();
}
public E pickOne() {
return arrayList.get( random.nextInt( arrayList.size() ) );
}
}
Of course, since pickOne() is not part of the Set interface, you'll have to cast to invoke it:
Set<String> set = new PickableSet<String>();
set.add("one");
set.add("other");
String oneOfThem = ((PickableSet<String>) set).pickOne();
See also: https://gist.github.com/1986763
Well, you can with a little bit of work like this
Set<String> s = new HashSet<String>();
// ... populate s; it must be non-empty, otherwise nextInt(0) throws
Random r = new Random();
String[] elements = s.toArray(new String[0]);
String res = elements[r.nextInt(elements.length)];
This grabs a randomly selected object from the set.
I have a factory that creates objects of class MyClass, returning already generated ones when they exist. As I have the creation method (getOrCreateMyClass) taking multiple parameters, which is the best way to use a Map to store and retrieve the objects?
My current solution is the following, but it doesn't sound too clear to me.
I use the hashCode method (slightly modified) of class MyClass to build an int based on the parameters of class MyClass, and I use it as the key of the Map.
import java.util.HashMap;
import java.util.Map;
public class MyClassFactory {
static Map<Integer, MyClass> cache = new HashMap<Integer, MyClass>();
private static class MyClass {
private String s;
private int i;
public MyClass(String s, int i) {
this.s = s;
this.i = i;
}
public static int getHashCode(String s, int i) {
final int prime = 31;
int result = 1;
result = prime * result + i;
result = prime * result + ((s == null) ? 0 : s.hashCode());
return result;
}
@Override
public int hashCode() {
return getHashCode(this.s, this.i);
}
}
public static MyClass getOrCreateMyClass(String s, int i) {
int hashCode = MyClass.getHashCode(s, i);
MyClass a = cache.get(hashCode);
if (a == null) {
a = new MyClass(s, i);
cache.put(hashCode , a);
}
return a;
}
}
Your getOrCreateMyClass doesn't seem to add to the cache if it creates.
I think this will also not perform correctly when hashcodes collide. Identical hashcodes do not imply equal objects. This could be the source of the bug you mentioned in a comment.
You might consider creating a generic Pair class with proper equals and hashCode methods and using a Pair<String, Integer> as the map key for your cache.
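Since the JDK has no general-purpose Pair, a sketch can use a small dedicated key class instead. Key is a hypothetical name; the point is that it compares both parameters in equals, so two parameter sets with colliding hash codes can no longer return the wrong instance. ConcurrentHashMap.computeIfAbsent (Java 8+) also addresses the threading concern mentioned below.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

final class MyClass {
    final String s;
    final int i;
    MyClass(String s, int i) { this.s = s; this.i = i; }
}

final class MyClassFactory {
    // Key holding both parameters, with real equals/hashCode.
    private static final class Key {
        final String s;
        final int i;
        Key(String s, int i) { this.s = s; this.i = i; }

        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return i == k.i && Objects.equals(s, k.s);   // null-safe comparison
        }

        @Override public int hashCode() { return Objects.hash(s, i); }
    }

    private static final Map<Key, MyClass> cache = new ConcurrentHashMap<>();

    static MyClass getOrCreateMyClass(String s, int i) {
        // Creates and stores a MyClass only on a cache miss.
        return cache.computeIfAbsent(new Key(s, i), k -> new MyClass(k.s, k.i));
    }
}
```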
Edit:
The issue of extra memory consumption by storing both a Pair<String, Integer> key and a MyClass value might be best dealt with by making the Pair<String, Integer> into a field of MyClass and thereby having only one reference to this object.
With all of this though, you might have to worry about threading issues that don't seem to be addressed yet, and which could be another source of bugs.
And whether it is actually a good idea at all depends on whether the creation of MyClass is much more expensive than the creation of the map key.
Another Edit:
ColinD's answer is also reasonable (and I've upvoted it), as long as the construction of MyClass is not expensive.
Another approach that might be worth consideration is to use a nested map Map<String, Map<Integer, MyClass>>, which would require a two-stage lookup and complicate the cache updating a bit.
You really shouldn't be using the hash code as the key in your map. A hash code is not guaranteed to be different for two non-equal instances of a class; indeed, your hashCode method can definitely produce the same hash code for two non-equal instances. You do need to implement equals on MyClass so that two instances are equal when the String and int they contain are equal. I'd also recommend making the s and i fields final to provide a stronger guarantee of the immutability of each MyClass instance if you're going to be using it this way.
Beyond that, I think what you actually want here is an interner, that is, something that guarantees you'll only ever store at most one instance of a given MyClass in memory at a time. The correct solution to this is a Map<MyClass, MyClass>, more specifically a ConcurrentMap<MyClass, MyClass> if there's any chance of getOrCreateMyClass being called from multiple threads. Now, you do need to create a new instance of MyClass in order to check the cache when using this approach, but that's inevitable really, and it's not a big deal because MyClass is cheap to create.
Guava has something that does all the work for you here: its Interner interface and corresponding Interners factory/utility class. Here's how you might use it to implement getOrCreateMyClass:
private static final Interner<MyClass> interner = Interners.newStrongInterner();
public static MyClass getOrCreateMyClass(String s, int i) {
return interner.intern(new MyClass(s, i));
}
Note that using a strong interner will, like your example code, keep each MyClass it holds in memory as long as the interner is in memory, regardless of whether anything else in the program has a reference to a given instance. If you use newWeakInterner instead, when there isn't anything elsewhere in your program using a given MyClass instance, that instance will be eligible for garbage collection, helping you not waste memory with instances you don't need around.
If you choose to do this yourself, you'll want to use a ConcurrentMap cache and use putIfAbsent. You can take a look at the implementation of Guava's strong interner for reference I imagine... the weak reference approach is much more complicated though.
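A do-it-yourself strong interner along those lines might look like this sketch. It relies on MyClass having value-based equals/hashCode; the field layout follows the question, and the interner field name is an assumption.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class MyClass {
    private final String s;
    private final int i;

    MyClass(String s, int i) { this.s = s; this.i = i; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MyClass)) return false;
        MyClass other = (MyClass) o;
        return i == other.i && Objects.equals(s, other.s);
    }

    @Override public int hashCode() { return Objects.hash(s, i); }
}

final class MyClassFactory {
    // Strong interner: each canonical MyClass is both key and value.
    private static final ConcurrentMap<MyClass, MyClass> interner = new ConcurrentHashMap<>();

    static MyClass getOrCreateMyClass(String s, int i) {
        MyClass candidate = new MyClass(s, i);
        // putIfAbsent returns the already-stored instance, or null if ours was stored.
        MyClass existing = interner.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }
}
```

Like Interners.newStrongInterner(), this keeps every interned instance reachable for as long as the map lives.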
I have a list. The list can contain multiple items of the same enum type.
Let's say I have an enum TOY, which has the values BALL, DOLL, PLAYSTATION. I want to know how many PLAYSTATION items are in a list of type TOY (i.e. List<Toy> toys).
What is the best possible solution for this? I don't want to keep iterating through the list every time.
You can use Apache commons-collections' HashBag. It has a getCount(Object) method which will suit you.
java.util.Collections has a method called frequency(Collection c, Object type).
Usage in my question:
int amountOfPlayStations = Collections.frequency(toys, TOY.PLAYSTATION);
Why don't you create a decorator for the type of list you're using that internally keeps a count of how many items of each enum value have been added or removed? That way you could use it as a normal list, but also query how many items of each type it currently contains.
All you'd need to do would be to override the add/remove/addAll etc. methods and update your counters before passing the call on to the real list. The best part is that you could decorate any list type with your new wrapper.
At the very least, a utility method like:
public int count(List<Toy> haystack, Toy needle) {
int result = 0;
for (Toy t : haystack) {
if (t == needle) {
result++;
}
}
return result;
}
Would let you concisely refer to the number of PLAYSTATIONs from elsewhere in the code. Alternatively if you knew the list was unlikely to change, building a Map<Toy, Integer> would let you build up the counts for all items once.
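Building that map once could look like this sketch, using an EnumMap (compact and ordered by the enum's declaration order). ToyCounts and countAll are hypothetical names, and the list is assumed to contain no nulls, since EnumMap rejects null keys.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

enum Toy { BALL, DOLL, PLAYSTATION }

final class ToyCounts {
    private ToyCounts() { }

    // One pass over the list builds the counts for every Toy value at once.
    static Map<Toy, Integer> countAll(List<Toy> toys) {
        Map<Toy, Integer> counts = new EnumMap<>(Toy.class);
        for (Toy t : toys) {
            counts.merge(t, 1, Integer::sum);   // start at 1, then increment
        }
        return counts;
    }
}
```

Values that never occur are absent from the map, so callers should use getOrDefault(toy, 0).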
If you don't want to have to iterate over the entire collection each time, another alternative would be to write a ForwardingList implementation. The main benefits of this over the HashBag suggestion are:
it supports generics
it implements the List interface, so you can pass it to any method that expects a List
There is a downside to this approach however, in that you have to write a bit of plumbing code to get it up and running.
Below is a quick example of how you could do it. Note that if you do this you should override all methods that add/delete from the list, otherwise you may end up in an inconsistent state:
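import com.google.common.collect.ForwardingList;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;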
public class CountingList<E> extends ForwardingList<E> {
private List<E> backingList = new LinkedList<E>();
private Map<E, Integer> countMap = new HashMap<E, Integer>();
@Override
protected List<E> delegate() {
return backingList;
}
@Override
public boolean add(E element) {
backingList.add(element);
if(countMap.containsKey(element)) {
countMap.put(element, countMap.get(element) + 1);
} else {
countMap.put(element, 1);
}
return true;
}
public int getCount(E element) {
Integer count = countMap.get(element);
return count != null ? count.intValue() : 0;
}
}
Extend a java.util.List implementation and override all mutator methods, i.e. the ones used to add or delete elements, and also the ones used to clear the list. Add a reference to a private java.util.Map that holds the number of items per type. Add accessor methods that return the current number of elements per type.
The HashBag (suggested by Bozho) seems to be your best bet. But a more general option would be Google Collections' Collections2.filter with an appropriate Predicate:
List<Toy> toys;
Collection<Toy> playstations = Collections2.filter(toys, new Predicate<Toy>() {
@Override
public boolean apply(Toy toy) {
return toy == Toy.PLAYSTATION;
}
});
Besides all those solutions (I have a weakness for the Collections.frequency call), I would recommend you take a look at Google Collections, and particularly at Collections2.transform (http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/collect/Collections2.html#transform(java.util.Collection, com.google.common.base.Function)), which could give you a live view on items.