What is the fastest way to get the elements of a collection? - java

I have a List<Pair<String, String>> into which I want to copy the data from a Collection.
What is the best way to read the collection and add items to list?
List<Pair<String, String>> identityMemoPairs = new LinkedList<Pair<String, String>>();
Collection result = handler.getResult();
while(result.iterator().hasNext()){
IdentityM im =(IdentityM) result.iterator().next();
identityMemoPairs.add(Pair.of(im.identity,im.memo));
}

Your code is wrong, because you create a new iterator at each iteration of the while loop (actually you create two of them). Each fresh iterator will point to the beginning of the result collection. Thus you have created an infinite loop.
To fix this, call result.iterator() only once and store the result in a variable.
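A minimal sketch of that fix, assuming the result can be treated as a Collection<IdentityM>. The IdentityM class below is a hypothetical stand-in with the two fields used in the question, and AbstractMap.SimpleImmutableEntry stands in for the Pair type, whose library the question does not name.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

final class CopyWithSingleIterator {
    // Hypothetical stand-in for the asker's IdentityM class.
    static final class IdentityM {
        final String identity;
        final String memo;

        IdentityM(String identity, String memo) {
            this.identity = identity;
            this.memo = memo;
        }
    }

    static List<Map.Entry<String, String>> copy(Collection<IdentityM> result) {
        List<Map.Entry<String, String>> pairs = new LinkedList<Map.Entry<String, String>>();
        // Obtain the iterator once; hasNext() and next() must share the same instance.
        Iterator<IdentityM> it = result.iterator();
        while (it.hasNext()) {
            IdentityM im = it.next();
            pairs.add(new SimpleImmutableEntry<String, String>(im.identity, im.memo));
        }
        return pairs;
    }
}
```

Because the same iterator instance is advanced by next(), hasNext() eventually returns false and the loop terminates.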
But even better (better to read, less error-prone) would be the for-each loop, which is (almost always) the preferred variant to iterate over collections:
for (IdentityM im : (Collection<IdentityM>)result) {
identityMemoPairs.add(Pair.of(im.identity,im.memo));
}
The compiler automatically transforms this into code that uses the iterator, so there is no performance difference. In general, performance is not a concern when iterating over a collection, as long as you avoid a few pitfalls such as calling get(i) on a LinkedList.
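To illustrate the get(i) pitfall, here is a small sketch with two hypothetical helper methods that compute the same sum. Both are correct; they differ only in how they traverse the list.

```java
import java.util.List;

final class LinkedListTraversal {
    // Indexed access: on a LinkedList each get(i) has to walk the nodes to reach
    // index i, so the loop as a whole takes quadratic time.
    static long sumByIndex(List<Integer> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }

    // For-each uses the list's iterator, which advances one node per step,
    // so the loop takes linear time for any List implementation.
    static long sumByIterator(List<Integer> list) {
        long sum = 0;
        for (int value : list) {
            sum += value;
        }
        return sum;
    }
}
```

With an ArrayList both variants are linear, which is why the indexed loop only becomes a problem for linked implementations.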
Note that the compiler will give you a warning here, which has nothing to do with the iteration, but with the use of the raw type Collection (instead of Collection<IdentityM>). Please check if handler.getResult() actually returns a Collection<IdentityM> and change the type of the result variable to this if possible.
Another question is whether you really need that list to be a list of pairs. Usually using plain pair classes is not recommended, because their name does not show the meaning of the object they represent. For example, it is better to use a class PersonName which has fields for first and last name instead of a Pair<String, String>. Why can't you just use a List<IdentityM>? And if you can use this, are you sure you can't use a Collection<IdentityM>? (List and Collection are often exchangeable.) Then you could avoid the copying altogether.

Your code is, in my opinion, fine as it is. However, you could probably make the handler return a collection of Pairs directly, so that you could call identityMemoPairs.addAll() instead of iterating over the collection yourself. That only makes the code "prettier"; it does not improve performance.

Related

Using temp variable using collectors

I have the following code. I am trying to understand if it would make any changes to memory.
Approach 1: Using collectors I can directly return map like so:
List<Customer> customerList = new ArrayList<>();
customerList.add(new Customer("1", "pavan"));
customerList.add(new Customer("2", "kumar"));
return customerList.stream().collect(Collectors.toMap(t->t.getId(), t->t));
Approach 2: Using an explicit map to collect results, like so:
Map<String,Customer> map = new HashMap<String, Customer>();
map = customerList.stream().collect(Collectors.toMap(t->t.getId(), t->t));
return map;
Compared to the first, does the second approach make any difference to memory/ GC, if I iterate over a million times?
Aside from instantiating a Map that you don't need in the second example, both pieces of code are identical. You immediately replace the Map reference you created with the one returned by the stream. Most likely the JIT compiler would eliminate the redundant allocation.
The collect method of the streams API will instantiate a Map for you; the code has been well optimised, which is one of the advantages of using the Stream API over doing it yourself.
To answer your specific question, you can iterate over both sections of code as many times as you like and it won't make any difference to the GC impact.
The code is far from identical. Specifically, the documentation for Collectors.toMap says this about the Map it returns:
There are no guarantees on the type, mutability, serializability, or thread-safety of the Map returned.
There are absolutely no guarantees whatsoever that the returned Map is actually a HashMap. It could be any other Map implementation, so assigning it to a HashMap is just wrong.
Then there is the way you build the Collector. It could be simplified to:
customerList.stream().collect(Collectors.toMap(Customer::getId, Function.identity()));
The method reference Customer::getId, as opposed to the lambda expression, will create one less method (since lambda expressions are de-sugared to methods and method references are not).
Also, Function.identity() instead of t -> t will create fewer objects if used in multiple places.
Then there is the way a HashMap works internally. If you don't specify an initial capacity, it might have to resize, which is an expensive operation. Collectors.toMap starts with a default HashMap with a capacity of 16 and a load factor of 0.75, which means you can put 12 entries into it before the next resize.
You can't avoid that with the two-argument Collectors.toMap, since its supplier is always HashMap::new, which uses the default capacity of 16 and load factor of 0.75.
Knowing this, you can drop the stream entirely:
Map<String, Customer> map = new HashMap<String, Customer>((int)Math.ceil(customerList.size() / 0.75));
customerList.forEach(x -> map.put(x.getId(), x));
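If you would rather keep the stream, the four-argument overload of Collectors.toMap accepts a map supplier, which lets you choose both the Map type and its initial capacity. The sketch below is one way to do that; the Customer class is a hypothetical stand-in for the one in the question, and the merge function simply rejects duplicate ids.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class PresizedToMap {
    // Hypothetical stand-in for the Customer class in the question.
    static final class Customer {
        private final String id;
        private final String name;

        Customer(String id, String name) {
            this.id = id;
            this.name = name;
        }

        String getId() { return id; }
        String getName() { return name; }
    }

    static Map<String, Customer> byId(List<Customer> customers) {
        // Size the table so that customers.size() entries fit without a resize.
        int capacity = (int) Math.ceil(customers.size() / 0.75);
        return customers.stream().collect(Collectors.toMap(
                Customer::getId,
                Function.identity(),
                // Reject duplicate ids instead of silently picking one.
                (first, second) -> {
                    throw new IllegalStateException("Duplicate id: " + first.getId());
                },
                // Explicit supplier: the result is a pre-sized HashMap.
                () -> new HashMap<String, Customer>(capacity)));
    }
}
```

Whether this is measurably faster than the plain forEach loop is not something this sketch demonstrates; it only shows that the Map type and capacity can be controlled without dropping the stream.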
In the second approach, you are instantiating a Map, and reassigning the reference to the one returned by the call to stream.collect().
Obviously, the first Map object referenced by "map" is lost.
The first approach does not have this problem.
In short, yes, this makes a minor difference in terms of memory usage, but it is likely negligible considering you have a million entries to iterate over.

Iterable vs Iterator as a return behavior (Best Practice?)

I'd just like to know your opinion on changing the return type of all collection-returning functions to Iterable.
Returning a collection seems to me to be the most common code in Java nowadays, and everybody returns a List/Set/Map 99% of the time, but shouldn't the standard be to return something like
public final Iterable<String> myMethod() {
    return new Iterable<String>() {
        @Override
        public Iterator<String> iterator() { return myVar.getColl(); }
    };
}
Is this bad at all? You know, all the DAO classes and similar code would look like
Iterable<String> getName(){}
Iterable<Integer> getNums(){}
Iterable<String> getStuff(){}
instead of
List<String> getName(){}
List<Integer> getNums(){}
Set<String> getStuff(){}
After all, 99% of the time you will use it in a for loop...
What do you think?
This would be a really bad plan.
I wouldn't say that 90% of the time you just use it in a for loop. Maybe 40-50%. The rest of the time, you need more information: size, contains, or get(int).
Additionally, the return type is a sort of documentation by itself. Returning a Set guarantees that the elements will be unique. Returning a List documents that the elements will be in a consistent order.
I wouldn't recommend returning specific collection implementations like HashSet or ArrayList, but I would usually prefer to return a Set or a List rather than a Collection or an Iterable, if the option is available.
List, Set & Map are interfaces, so they're not tied to a particular implementation. So they are good candidates for returning types.
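As a small sketch of returning interface types rather than implementations, the hypothetical NameDao below exposes a List where order matters and a Set where uniqueness matters. The class name and the hard-coded rows are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class NameDao {
    // Hypothetical data; a real DAO would read this from a data store.
    private final List<String> rows = Arrays.asList("ann", "bob", "ann");

    // List: callers may rely on a consistent order and on index access.
    List<String> getNames() {
        return Collections.unmodifiableList(new ArrayList<String>(rows));
    }

    // Set: the return type itself documents that elements are unique.
    Set<String> getDistinctNames() {
        return Collections.unmodifiableSet(new LinkedHashSet<String>(rows));
    }
}
```

The callers never see ArrayList or LinkedHashSet, so the implementation can change later without touching the method signatures.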
The difference between List (and similar types) and Iterable/Iterator is the kind of access. A List gives you random access: you have direct access to all the data. An Iterable avoids the need to have all the data available at once, which is ideal when you have a lot of data and it is not efficient to hold it all in memory. Example: iterating over a large database result set.
So it depends on what you are accessing. If your data can be huge and must be iterated to avoid performance degradation, then enforce that by returning iterators. In other cases a List is fine.
Edit: returning an iterator means the only thing callers can do is loop through the items. If you need this trade-off to ensure performance, fine, but as said, only use it when needed.
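To make the "not all data available at once" case concrete, here is a sketch of an Iterable that produces its values on demand. A simple integer range stands in for a large result set; the LazyRange class and its range method are hypothetical names.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

final class LazyRange {
    // Returns an Iterable that yields start, start + 1, ..., end - 1 on demand,
    // without ever holding all values in memory at the same time.
    static Iterable<Integer> range(final int start, final int end) {
        return new Iterable<Integer>() {
            @Override
            public Iterator<Integer> iterator() {
                return new Iterator<Integer>() {
                    private int next = start;

                    @Override
                    public boolean hasNext() {
                        return next < end;
                    }

                    @Override
                    public Integer next() {
                        if (!hasNext()) {
                            throw new NoSuchElementException();
                        }
                        return next++;
                    }

                    @Override
                    public void remove() {
                        throw new UnsupportedOperationException();
                    }
                };
            }
        };
    }
}
```

A caller can use this in a for-each loop, but cannot ask for size(), contains() or get(i), which is exactly the trade-off described above.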
Well, what you coded is partially right.
Callers often need other methods on the returned value, such as:
size
contains()
get(index)
exists()
So you should rethink your new architecture, or extend it so that callers can always get what they need.

Filtering List without using iterator

I need to filter a List of size 1000 or more and get a sublist out of it.
I don't want to use an iterator.
1) At present I am iterating over the List and comparing the elements in Java. This is a time-consuming task, and I need to improve the performance of my code.
2) I also tried Google Collections (Guava), but I think it also iterates in the background.
Predicate<String> validList = new Predicate<String>() {
    public boolean apply(String aid) {
        return aid.contains("1_15_12");
    }
};
Collection<String> finalList =com.google.common.collect.Collections2.filter(Collection,validList);
Can anyone suggest how I can get the sublist faster without iterating, or, if an iterator has to be used, how to get the result comparatively faster?
Consider what happens if you call size() on your sublist. That has to check every element, as every element may change the result.
If you have a very specialized way of using your list which means you don't touch every element in it, don't use random access, etc, perhaps you don't want the List interface at all. If you could tell us more about what you're doing, that would really help.
A List is an ordered collection of objects, so you must iterate over it in order to filter it.
To expand on my comment:
I think iteration is inevitable when filtering, as each element has to be checked.
Regarding Collections2.filter, it differs from a simple filter: the returned Collection is still "predicated". That means an IllegalArgumentException will be thrown if an element that does not satisfy the predicate is added to the Collection.
If performance is really your concern, most probably the predicate is pretty slow. What you can do is use Lists.partition on your list, filter the partitions in parallel (you have to write this yourself), and then concatenate the results.
There might be better ways to solve your problem, but we would need more information about the predicate and the data in the List.
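As a sketch of parallel filtering, the version below uses a Java 8 parallel stream rather than Guava's Lists.partition, so it is a different technique from the one named above, chosen because it needs no hand-written thread handling. The predicate is the one from the question; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

final class ParallelFilter {
    // Every element is still checked once; the work is only split across threads.
    static List<String> filter(List<String> ids) {
        return ids.parallelStream()
                .filter(aid -> aid.contains("1_15_12"))
                .collect(Collectors.toList());
    }
}
```

For a list of around 1000 short strings the parallel overhead may well outweigh the gain, so this is worth measuring before adopting it.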

Using Java List when array is enough

Is it advisable to use Java Collections List in the cases when you know the size of the list before hand and you can also use array there? Are there any performance drawbacks?
Can a list be initialised with elements in a single statement like an array (list of all elements separated by commas) ?
Is it advisable to use Java Collections List in the cases when you know the size of the list before hand and you can also use array there ?
In some (probably most) circumstances, yes, it is definitely advisable to use collections anyway. In some circumstances it is not.
On the pro side:
If you use a List instead of an array, your code can use methods like contains, add, remove and so on.
A lot of library classes expect collection-typed arguments.
You don't need to worry that the next version of the code may require a more dynamically sized array ... which would make an initial array-based approach a liability.
On the con side:
Collections are a bit slower, and more so if the base type of your array is a primitive type.
Collections do take more memory, especially if the base type of your array is a primitive type.
But performance is rarely a critical issue, and in many cases the performance difference is not relevant to the big picture.
And in practice, there is often a cost in performance and/or code complexity involved in working out what the array's size should be. (Consider the hypothetical case where you used a char[] to hold the concatenation of a series of strings. You can work out how big the array needs to be, e.g. by adding up the component string sizes. But it is messy!)
Collections/lists are more flexible and provide more utility methods. For most situations, any performance overhead is negligible.
And for this single statement initialization, use:
Arrays.asList(yourArray);
From the docs:
Returns a fixed-size list backed by the specified array. (Changes to the returned list "write through" to the array.) This method acts as bridge between array-based and collection-based APIs, in combination with Collection.toArray. The returned list is serializable and implements RandomAccess.
My guess is that this is the most efficient way to convert to a list, but I may be wrong.
1) A Collection is the most basic type and only implies that there is a collection of objects. If there is no order or duplication, use java.util.Set; if there is possible duplication and ordering, use java.util.List; if there is ordering but no duplication, use java.util.SortedSet.
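The "fixed-size" and "write through" parts of that documentation can be seen in a short sketch. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class AsListDemo {
    // Changing the list view changes the backing array as well.
    static String[] writeThrough() {
        String[] array = {"one", "two", "three"};
        List<String> view = Arrays.asList(array);
        view.set(0, "uno");
        return array;
    }

    // The view is fixed-size, so structural changes such as add() are rejected.
    static boolean addIsRejected() {
        List<String> view = Arrays.asList("one", "two");
        try {
            view.add("three");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }
}
```

If you need a list that can grow, copy the view into a new ArrayList.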
2) Curly brackets to instantiate an Array, Arrays.asList() plus generics for the type inference
List<String> myStrings = Arrays.asList(new String[]{"one", "two", "three"});
There is also a trick using anonymous types but personally I'm not a big fan:
List<String> myStrings = new ArrayList<String>() {
    // this is the inside of an anonymous class
    {
        // this is the inside of an instance initializer block in the anonymous class
        this.add("one");
        this.add("two");
        this.add("three");
    }
};
Yes, it is advisable.
Some of the various list constructors (like ArrayList) even take arguments so you can "pre-allocate" sufficient backing storage, alleviating the need for the list to "grow" to the proper size as you add elements.
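A minimal sketch of that pre-allocation, using a hypothetical squares helper. The constructor argument is a capacity hint for the backing array, not the size of the list.

```java
import java.util.ArrayList;
import java.util.List;

final class Preallocated {
    static List<Integer> squares(int n) {
        // The list starts empty; n only sizes the backing array up front,
        // so the n add() calls below never trigger a reallocation.
        List<Integer> result = new ArrayList<Integer>(n);
        for (int i = 0; i < n; i++) {
            result.add(i * i);
        }
        return result;
    }
}
```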
There are different things to consider: Is the type of the array known? Who accesses the array?
There are several issues with arrays, e.g.:
you can not create generic arrays
arrays are covariant: if A extends B, then A[] is a subtype of B[], which can lead to ArrayStoreExceptions
you cannot make the elements of an array immutable
...
See also Item 25, "Prefer lists to arrays", of the Effective Java book.
That said, sometimes arrays are convenient, e.g. with the Object... (varargs) parameter syntax.
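The covariance problem from the list above can be reproduced in a few lines. This is a sketch with a hypothetical class name; the equivalent assignment with generic lists is rejected at compile time instead.

```java
final class ArrayCovariance {
    static boolean storeFails() {
        // Legal because String[] is a subtype of Object[].
        Object[] objects = new String[1];
        try {
            // Compiles, but the runtime type of the array is String[].
            objects[0] = Integer.valueOf(42);
            return false;
        } catch (ArrayStoreException expected) {
            return true;
        }
        // By contrast, List<Object> list = new ArrayList<String>(); does not compile.
    }
}
```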
How can a list be initialised with elements in a single statement like an array = {list of all elements separated by commas} ?
Arrays.asList(): http://download.oracle.com/javase/6/docs/api/java/util/Arrays.html#asList%28T...%29
Is it advisable to use Java Collections List in the cases when you know the size of the list before hand and you can also use array there ? Performance drawbacks ???
If an array is enough, then use an array, just to keep things simple. You may even get slightly better performance out of it. Keep in mind that if you...
ever need to pass the resulting array to a method that takes a Collection, or
ever need to work with List methods such as .contains, .lastIndexOf, or what not, or
need to use Collections methods, such as reverse...
then you may just as well go for the Collection/List classes from the beginning.
How can a list be initialised with elements in a single statement like an array = {list of all elements separated by commas} ?
You can do
List<String> list = Arrays.asList("foo", "bar");
or
List<String> arrayList = new ArrayList<String>(Arrays.asList("foo", "bar"));
or
List<String> list = new ArrayList<String>() {{ add("foo"); add("bar"); }};
Is it advisable to use Java Collections List in the cases when you know the size of the list before hand and you can also use array there? Performance drawbacks?
It can be perfectly acceptable to use a List instead of an array, even if you know the size before hand.
How can a list be initialised with elements in a single statement like an array = {list of all elements separated by commas}?
See Arrays.asList().

Best way to remove repeats in a collection in Java?

This is a two-part question:
First, I am interested to know what the best way to remove repeating elements from a collection is. The way I have been doing it up until now is to simply convert the collection into a set. I know sets cannot have repeating elements so it just handles it for me.
Is this an efficient solution? Would it be better/more idiomatic/faster to loop and remove repeats? Does it matter?
My second (related) question is: What is the best way to convert an array to a Set? Assuming an array arr, the way I have been doing it is the following:
Set x = new HashSet(Arrays.asList(arr));
This converts the array into a list, and then into a set. Seems to be kinda roundabout. Is there a better/more idiomatic/more efficient way to do this than the double conversion way?
Thanks!
Do you have any information about the collection, like say it is already sorted, or it contains mostly duplicates or mostly unique items? With just an arbitrary collection I think converting it to a Set is fine.
Arrays.asList() doesn't create a brand new list. It actually just returns a List which uses the array as its backing store, so it's a cheap operation. So your way of making a Set from an array is how I'd do it, too.
Use HashSet's standard Collection conversion constructor. According to The Java Tutorials:
Here's a simple but useful Set idiom. Suppose you have a Collection, c, and you want to create another Collection containing the same elements but with all duplicates eliminated. The following one-liner does the trick.
Collection<Type> noDups = new HashSet<Type>(c);
It works by creating a Set (which, by definition, cannot contain a duplicate), initially containing all the elements in c. It uses the standard conversion constructor described in The Collection Interface section.
Here is a minor variant of this idiom that preserves the order of the original collection while removing duplicate elements.
Collection<Type> noDups = new LinkedHashSet<Type>(c);
The following is a generic method that encapsulates the preceding idiom, returning a Set of the same generic type as the one passed.
public static <E> Set<E> removeDups(Collection<E> c) {
    return new LinkedHashSet<E>(c);
}
Assuming you really want set semantics, creating a new Set from the duplicate-containing collection is a great approach. It's very clear what the intent is, it's more compact than doing the loop yourself, and it leaves the source collection intact.
For creating a Set from an array, creating an intermediate List is a common approach. The wrapper returned by Arrays.asList() is lightweight and efficient. There's not a more direct API in core Java to do this, unfortunately.
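One alternative in core Java that skips the intermediate List is Collections.addAll, which copies an array (or varargs) straight into an existing collection. The sketch below uses a hypothetical toSet helper.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class ArrayToSet {
    static <T> Set<T> toSet(T[] array) {
        Set<T> set = new HashSet<T>();
        // Adds every array element to the set; duplicates are dropped by the set.
        Collections.addAll(set, array);
        return set;
    }
}
```

In practice the difference from new HashSet<T>(Arrays.asList(array)) is small, since the asList wrapper does not copy the array either.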
I think your approach of putting items into a set to produce the collection of unique items is the best one. It's clear, efficient, and correct.
If you're uncomfortable using Arrays.asList() on the way into the set, you could simply run a foreach loop over the array to add items to the set, but I don't see any harm (for non-primitive arrays) in your approach. Arrays.asList() returns a list that is "backed by" the source array, so it doesn't have significant cost in time or space.
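The "non-primitive arrays" caveat matters because Arrays.asList on an int[] produces a list with a single element, the array itself. The sketch below shows that pitfall and one Java 8 way around it; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

final class PrimitiveArrayToSet {
    // Arrays.asList(int[]) is a List<int[]> holding the array as its only element.
    static int asListSize(int[] values) {
        return Arrays.asList(values).size();
    }

    // Box each int first, then collect the boxed values into a Set.
    static Set<Integer> toSet(int[] values) {
        return Arrays.stream(values).boxed().collect(Collectors.toSet());
    }
}
```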
1. Duplicates
Concurring with the other answers: using a Set should be the most efficient way to remove duplicates. HashSet should run in O(n) time on average. Looping and removing repeats would run in O(n^2) time. So using a Set is recommended in most cases. There are some cases (e.g. limited memory) where iterating might make sense.
2. Converting an array to a Set
Arrays.asList() is a cheap operation that doesn't copy the array and has minimal memory overhead. Alternatively, you can add the elements manually by iterating through the array.
public static <T> Set<T> arrayToSet(T[] array) {
    // Declare the type parameter and avoid raw types.
    Set<T> set = new HashSet<T>(array.length / 2);
    for (T item : array) {
        set.add(item);
    }
    return set;
}
Barring any specific performance bottlenecks that you know of (say a collection of tens of thousands of items) converting to a set is a perfectly reasonable solution and should be (IMO) the first way you solve this problem, and only look for something fancier if there is a specific problem to solve.
