Set<Type> union = new HashSet<Type>(s1);
And
Set<Type> union = new HashSet<Type>();
Set<Type> s1 = new HashSet<Type>();
union.addAll(s1);
The first will be more efficient because the first Set will be created with the correct amount of space in the backing data structure, whereas in the second piece of code the Set may have to resize to make room for the new elements.
As far as end results go, they are the same.
Assuming that Set s1 contains the same contents in the first and second examples, the end results should come out to be the same.
(However, the second example as originally posted, which used new Set rather than new HashSet, will not compile because Set is an interface rather than a concrete class.)
One advantage of using the HashSet(Collection) constructor is that it will have an initial capacity that is enough to hold the Collection (in this case, the Set s1) that is passed into the constructor:
Constructs a new set containing the
elements in the specified collection.
The HashMap is created with default
load factor (0.75) and an initial
capacity sufficient to contain the
elements in the specified collection.
However, with the HashSet() constructor the initial capacity is 16, so if the Set added via Collection.addAll holds more elements than that capacity allows at the 0.75 load factor (more than 12), the backing data structure has to be resized:
Constructs a new, empty set; the
backing HashMap instance has default
initial capacity (16) and load factor
(0.75).
Therefore, using the HashSet(Collection) constructor to create the HashSet would probably be a better option in terms of performance and efficiency.
However, from the standpoint of code readability, the variable name union seems to imply that the newly created Set is a union of another Set, so the version using the addAll method is probably the more understandable code.
If the idea is just to make a new Set from an existing one, then the newly created Set should probably be named differently, such as newSet, copyOfS1 or something to that effect.
There is no difference. Both will create a new set that contains all of the elements in the set s1.
Of course, your second code sample won't even compile, since you can't directly instantiate the Set interface, but I'm assuming that you meant to say new HashSet instead of new Set.
There is a little difference, in that the new HashSet(s1) will make sure beforehand that all elements will fit, and no rehashing is needed.
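As a rough illustration of the two approaches compared above, the sketch below builds the copy both ways and checks that the results are equal. The class name SetCopyDemo, the String element type, and the sample values are assumptions made for the example; it shows only that the contents match, not the difference in resizing behaviour.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SetCopyDemo {
    // Copy via the HashSet(Collection) constructor: capacity is chosen up front.
    static Set<String> copyWithConstructor(Set<String> source) {
        return new HashSet<String>(source);
    }

    // Copy via the no-arg constructor plus addAll: starts at the default
    // capacity and may have to grow while elements are added.
    static Set<String> copyWithAddAll(Set<String> source) {
        Set<String> copy = new HashSet<String>();
        copy.addAll(source);
        return copy;
    }

    public static void main(String[] args) {
        Set<String> s1 = new HashSet<String>(Arrays.asList("a", "b", "c"));
        Set<String> first = copyWithConstructor(s1);
        Set<String> second = copyWithAddAll(s1);
        System.out.println(first.equals(second)); // prints "true"
    }
}
```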
How can I know if a List has a fixed size?
List<String> fixed = Arrays.asList(new String[100]);
This creates a fixed-size List, but the String array it wraps is no longer reachable through a variable of its own; it can only be accessed through the list.
Ref: Is it possible to find out if some list is fixed size or not?
Is it possible to find out if some list is fixed size or not?
In theory - No. Fixed sizedness is an emergent property of the implementation of a list class. You can only determine if a list has that property by trying to add an element.
And note that a simple behavioral test would not reliably distinguish between a fixed sized list and a bounded list or a list that was permanently or temporarily read-only.
In practice, a fixed sized list will typically have a different class to an ordinary one. You can test the class of an object to see if it is or isn't a specific class. So if you understand what classes would be used to implement fixed sized lists in your code-base, then you can test if a specific list is fixed sized.
For example the Arrays.asList(...) method returns a List object whose actual class is java.util.Arrays.ArrayList. That is a private nested class, but you could use reflection to find it, and then use Object.getClass().equals(...) to test for it.
However, this approach is fragile. Your code could break if the implementation of Arrays was modified, or if you started using other forms of fixed sized list as well.
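A minimal sketch of such a class test follows. Instead of reflection it takes the class object from a sample Arrays.asList(...) instance, which is a substitution made here for brevity; the class name FixedSizeCheck and the method isArraysAsList are hypothetical. It recognises only lists produced by Arrays.asList and shares the fragility described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FixedSizeCheck {
    // Class of the list returned by Arrays.asList, taken from a sample instance.
    // This depends on an implementation detail of java.util.Arrays.
    private static final Class<?> AS_LIST_CLASS = Arrays.asList("sample").getClass();

    // True only for lists created by Arrays.asList; other fixed-size lists are missed.
    static boolean isArraysAsList(List<?> list) {
        return list.getClass().equals(AS_LIST_CLASS);
    }

    public static void main(String[] args) {
        List<String> fixed = Arrays.asList(new String[100]);
        List<String> growable = new ArrayList<String>(fixed);
        System.out.println(isArraysAsList(fixed));    // prints "true"
        System.out.println(isArraysAsList(growable)); // prints "false"
    }
}
```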
You can use a Set; a Set can hold an unlimited number of objects.
I have some classes that have ArrayList fields that are only sometimes used. I generally initialize these fields like so:
private List<Widget> widgets = new ArrayList<>();
I understand about using the overload constructors to set the initial capacity, so I'm wondering if I should declare these fields as such:
private List<Widget> widgets = new ArrayList<>(0);
The dilemma is that if I initialize the list with 0, then the list will always have to reallocate its backing array to add even one item. But if I use the default constructor, which gives a default capacity of 10, then I may have a bunch of items (and there can be many) that are wasting memory with unused capacity.
I know some of you are going to push back asking 'how often' and 'how many items are you expecting' but I'm really looking for the "best practices" approach. All things being ~equal, should one initialize with (0) or () on a list that is sometimes used?
It's our department policy to always initialize lists, so I may not simply leave the lists as null, besides, that would just side-step the question.
Premature optimisation is the root of all evil. - D. Knuth.
This seems like the kind of "performance issue" which actually never has any effect on performance. For one thing, how sure are you that these empty lists are actually initialised? I suspect that most modern compilers delay initialisation of objects until they know for sure that there will be a call on them. So if you pass the no arg constructor it will most likely never be used unless something is added to the list. On the other hand, if you use the 0 argument constructor, it guarantees that it has to resize every one that it uses.
These are the three laws of performance optimisation:
1. Never assume that you know what the compiled code is actually doing, or that you can spot small optimisations better than the compiler can.
2. Never optimise without using a profiler to work out where the bottleneck is. If you think that you know, refer to rule number (1).
3. Don't bother unless your application has a performance issue. Then refer to rule (2).
On the off chance that you somehow still believe that you understand compilers, check out this question: Why is it faster to process a sorted array than an unsorted array?
If the list is not always used use lazy initialization
private List<Widget> widgets;

private List<Widget> getList() {
    if (widgets == null) {
        widgets = new ArrayList<>();
    }
    return widgets;
}
If you set it to 0 the ArrayList will have to resize anyhow, so really you're shooting yourself in the foot. The only time you would benefit from an explicit declaration of size would be if you already know the maximum bounds that you will be reaching in your list.
As stated, this is a micro-optimization, it's more likely you will find other things that you can significantly improve than the initial size of your ArrayList.
I tend to disagree that these optimisations are bad. If you declare an ArrayList that holds n elements (10 by default) and you add one more, the ArrayList will internally grow its backing array (by about half in current OpenJDK implementations, rather than doubling). When you remove this element later, the backing array will not shrink.
ArrayList utilizes processor cache a lot and actually is so fast that you don't need to optimize it any further. Still, if you have to create millions of tiny ArrayList instances, it may be worth rethinking your overall design rather than worrying about the default ArrayList capacity.
As has already been said, Lazy initialization can help you by postponing the moment when you have to initialize the list (and therefore choose its initial size).
If lazy initialization is not possible because your department policy does not allow a field to be initialized to null (which I do not see much sense in), a workaround might be to initialize an empty list as
List<Widget> widget = new ArrayList<>(0);
and only when (and if) you actually need to work with the list, you create a new list object:
widget = new ArrayList<>(someSize);
and hopefully at that moment you could know the max size that the list can reach (or at least its order of magnitude).
I know, this is a very stupid trick, but it adheres to your policy.
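One way this workaround might look inside a class is sketched below. WidgetHolder, the nested Widget placeholder, and the expectedSize parameter are all assumptions made for illustration; the field is never null, but the zero-capacity list is swapped for a sized one on first use.

```java
import java.util.ArrayList;
import java.util.List;

class WidgetHolder {
    // Hypothetical element type standing in for the Widget class in the question.
    static class Widget { }

    // Never null (per the stated policy), but starts with zero capacity.
    private List<Widget> widgets = new ArrayList<>(0);

    // Replace the empty placeholder with a sized list the first time it is needed.
    void addWidget(Widget w, int expectedSize) {
        if (widgets.isEmpty()) {
            widgets = new ArrayList<>(expectedSize);
        }
        widgets.add(w);
    }

    int count() {
        return widgets.size();
    }
}
```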
Our class is learning about hash tables, and one of my study questions involves me creating a dictionary using a hash table with separate chaining. However, the catch is that we are not allowed to use Java's provided methods for creating hash tables. Rather, our lecture notes mention that separate chaining involves each cell in an array pointing to a linked list of entries.
Thus, my understanding is that I should create an array of size n (where n is prime), and insert an empty linked list into each position in the array. Then, I use my hash function to hash strings and insert them into the corresponding linked list in the proper array position. I created my hash function, and so far my Dictionary constructor takes in a size and creates an array of that size (actually, of size 4999, both prime and large as discussed in class). Am I on the right track here? Should I now insert a new linked list into each position and then work on insert/remove methods?
What you have sounds good so far.
Bear in mind that an array of object references has each cell null by default, and you can write your insert and remove functions to work with that. If you choose to create a linked list object that contains no data (sometimes called a sentinel node) it may be advantageous to create a single immutable (read-only) instance to put in every empty slot, rather than create 4,999 separate instances with new (where most don't hold any data).
It sounds like you are on the right track.
Some extra pointers:
It's not worth creating a LinkedList in each bucket until it is actually used. So you can leave the buckets as null until they are added to. Just remember to write your accessor functions to take account of this.
It's not always efficient to create a large array immediately. It can be better to start with a small array, keep track of the capacity used, and enlarge the array when necessary (which involves re-bucketing the values into the new array)
It's a good idea to make your class implement the whole of the Map<K,V> interface - just to get some practice implementing the other standard Java collection methods.
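The pointers above could be sketched roughly as follows: buckets stay null until used, and the array is enlarged and re-bucketed once it fills up. This is only an illustration under several assumptions: the class name ChainedDictionary is made up, keys and values are Strings, String.hashCode() stands in for the asker's own hash function, the growth rule (2n + 1, not guaranteed prime) is arbitrary, and the initial capacity is assumed to be at least 1.

```java
class ChainedDictionary {
    // One entry in a bucket's singly linked chain.
    private static final class Node {
        final String key;
        String value;
        Node next;

        Node(String key, String value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private Node[] buckets; // a null cell means an empty bucket
    private int size;

    ChainedDictionary(int capacity) {
        buckets = new Node[capacity]; // assumes capacity >= 1
    }

    // Stand-in hash function; floorMod keeps the index non-negative.
    private static int indexFor(String key, int length) {
        return Math.floorMod(key.hashCode(), length);
    }

    String get(String key) {
        for (Node n = buckets[indexFor(key, buckets.length)]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }

    void put(String key, String value) {
        int i = indexFor(key, buckets.length);
        for (Node n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) { n.value = value; return; } // overwrite existing key
        }
        buckets[i] = new Node(key, value, buckets[i]); // prepend to the chain
        if (++size > buckets.length) resize();
    }

    boolean remove(String key) {
        int i = indexFor(key, buckets.length);
        Node prev = null;
        for (Node n = buckets[i]; n != null; prev = n, n = n.next) {
            if (n.key.equals(key)) {
                if (prev == null) buckets[i] = n.next; else prev.next = n.next;
                size--;
                return true;
            }
        }
        return false;
    }

    // Enlarge the array and re-bucket every existing node into it.
    private void resize() {
        Node[] bigger = new Node[buckets.length * 2 + 1];
        for (Node head : buckets) {
            for (Node n = head; n != null; ) {
                Node next = n.next;
                int i = indexFor(n.key, bigger.length);
                n.next = bigger[i];
                bigger[i] = n;
                n = next;
            }
        }
        buckets = bigger;
    }

    int size() {
        return size;
    }
}
```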
I was wondering if I could create an array without having to enter a value. I don't fully understand how they work, but I'm doing an inventory program and want my array to be set up in a way that the user can enter products and their related variables until they are done, then it needs to use a method to calculate the total cost for all the products. What would be the best way to do that?
Use an ArrayList.
This will allow you to create a dynamic array.
http://download.oracle.com/javase/1.5.0/docs/api/java/util/ArrayList.html
Here is an example/overview:
http://www.anyexample.com/programming/java/java_arraylist_example.xml
Yes, you can do this. Instead of using a plain array, for example new int[10], use something like the Vector class, or perhaps ArrayList (check out the API docs for the differences). Using an ArrayList looks like this:
ArrayList<String> myList = new ArrayList<String>();
myList.add("Item 1");
myList.add("Item 2");
myList.add("Item 3");
// ... etc
In other words, it grows dynamically as you add things to it.
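Applied to the inventory scenario in the question, a sketch might look like the following. The Inventory and Product classes and their fields (name, unitPrice, quantity) are assumptions, since the question does not say which variables each product carries.

```java
import java.util.ArrayList;
import java.util.List;

class Inventory {
    // Hypothetical product type; the field names are assumptions for this sketch.
    static class Product {
        final String name;
        final double unitPrice;
        final int quantity;

        Product(String name, double unitPrice, int quantity) {
            this.name = name;
            this.unitPrice = unitPrice;
            this.quantity = quantity;
        }
    }

    // Grows as products are added; no size has to be chosen up front.
    private final List<Product> products = new ArrayList<Product>();

    void add(String name, double unitPrice, int quantity) {
        products.add(new Product(name, unitPrice, quantity));
    }

    // Sums unit price times quantity over every product entered so far.
    double totalCost() {
        double total = 0.0;
        for (Product p : products) {
            total += p.unitPrice * p.quantity;
        }
        return total;
    }
}
```

A caller would invoke add(...) once per product the user enters and call totalCost() when the user is done.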
As Orbit pointed out, use ArrayList or Vector for your data storage requirements; they don't need a specific size to be assigned at declaration.
You should get familiar with the Java Collections Framework, which includes ArrayList as others have pointed out. It's good to know what other collection objects are available as one might better fit your needs than another for certain requirements. For instance, if you want to make sure your "list" contains no duplicate elements a HashSet might be the answer.
http://download.oracle.com/javase/tutorial/collections/index.html
The other answers already told how to do it right. For completeness, in Java every array has a fixed size (length) which is determined at creation and never changes. (An array also has a component type, which never changes.)
So, you'll have to create a new (bigger) array when your old array is full, and copy the old content over. Luckily, the ArrayList class does that for you when its internal backing array is full, so you can concentrate on the actual business task at hand.
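To make the copy-on-full idea concrete, here is a minimal sketch of a growable int container built on a plain array. The class name GrowableIntArray, the starting capacity of 2, and the doubling policy are assumptions for illustration and are not a description of how ArrayList itself is implemented.

```java
import java.util.Arrays;

class GrowableIntArray {
    private int[] data = new int[2]; // deliberately tiny starting capacity
    private int size;

    void add(int value) {
        if (size == data.length) {
            // Array lengths are fixed, so allocate a bigger array and copy over.
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return data[index];
    }

    int size() {
        return size;
    }
}
```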
This is a two-part question:
First, I am interested to know what the best way to remove repeating elements from a collection is. The way I have been doing it up until now is to simply convert the collection into a set. I know sets cannot have repeating elements so it just handles it for me.
Is this an efficient solution? Would it be better/more idiomatic/faster to loop and remove repeats? Does it matter?
My second (related) question is: What is the best way to convert an array to a Set? Assuming an array arr, the way I have been doing it is the following:
Set x = new HashSet(Arrays.asList(arr));
This converts the array into a list, and then into a set. Seems to be kinda roundabout. Is there a better/more idiomatic/more efficient way to do this than the double conversion way?
Thanks!
Do you have any information about the collection, like say it is already sorted, or it contains mostly duplicates or mostly unique items? With just an arbitrary collection I think converting it to a Set is fine.
Arrays.asList() doesn't create a brand new list. It actually just returns a List which uses the array as its backing store, so it's a cheap operation. So your way of making a Set from an array is how I'd do it, too.
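The "backing store" behaviour can be seen in a small sketch: a change to the array shows through the list view, while the HashSet built from that view holds its own copy of the elements. The class name AsListViewDemo, the helper toSet, and the sample values are assumptions for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AsListViewDemo {
    // The asker's idiom with generics: wrap the array as a view, then copy into a set.
    static <T> Set<T> toSet(T[] array) {
        return new HashSet<T>(Arrays.asList(array));
    }

    public static void main(String[] args) {
        String[] arr = {"a", "b", "b"};
        List<String> view = Arrays.asList(arr); // a view over arr, not a copy
        arr[0] = "z";
        System.out.println(view.get(0));          // prints "z"

        Set<String> unique = toSet(arr);          // the set copies the elements
        arr[1] = "y";
        System.out.println(unique.size());        // prints "2"
        System.out.println(unique.contains("y")); // prints "false"
    }
}
```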
Use HashSet's standard Collection conversion constructor. According to The Java Tutorials:
Here's a simple but useful Set idiom.
Suppose you have a Collection, c, and
you want to create another Collection
containing the same elements but with
all duplicates eliminated. The
following one-liner does the trick.
Collection<Type> noDups = new HashSet<Type>(c);
It works by creating a Set (which, by
definition, cannot contain a
duplicate), initially containing all
the elements in c. It uses the
standard conversion constructor
described in The Collection
Interface section.
Here is a minor variant of this idiom
that preserves the order of the
original collection while removing
duplicate elements.
Collection<Type> noDups = new LinkedHashSet<Type>(c);
The following is a generic method that
encapsulates the preceding idiom,
returning a Set of the same generic
type as the one passed.
public static <E> Set<E> removeDups(Collection<E> c) {
    return new LinkedHashSet<E>(c);
}
Assuming you really want set semantics, creating a new Set from the duplicate-containing collection is a great approach. It's very clear what the intent is, it's more compact than doing the loop yourself, and it leaves the source collection intact.
For creating a Set from an array, creating an intermediate List is a common approach. The wrapper returned by Arrays.asList() is lightweight and efficient. There's not a more direct API in core Java to do this, unfortunately.
I think your approach of putting items into a set to produce the collection of unique items is the best one. It's clear, efficient, and correct.
If you're uncomfortable using Arrays.asList() on the way into the set, you could simply run a foreach loop over the array to add items to the set, but I don't see any harm (for non-primitive arrays) in your approach. Arrays.asList() returns a list that is "backed by" the source array, so it doesn't have significant cost in time or space.
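The "non-primitive arrays" caveat is worth a short sketch: passing an int[] to Arrays.asList() produces a list with a single element (the array itself), so for primitives the foreach loop is the simpler route. The class name PrimitiveArrayToSet and the choice of LinkedHashSet, used here to keep first-seen order, are assumptions for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class PrimitiveArrayToSet {
    // Boxes each int explicitly; LinkedHashSet keeps first-seen order.
    static Set<Integer> toSet(int[] values) {
        Set<Integer> set = new LinkedHashSet<Integer>();
        for (int v : values) {
            set.add(v);
        }
        return set;
    }

    public static void main(String[] args) {
        int[] numbers = {3, 1, 3, 2, 1};
        List<int[]> wrapped = Arrays.asList(numbers); // one element: the array itself
        System.out.println(wrapped.size());  // prints "1"
        System.out.println(toSet(numbers));  // prints "[3, 1, 2]"
    }
}
```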
1.
Duplicates
Concurring with the other answers: using a Set should be the most efficient way to remove duplicates. HashSet should run in O(n) time on average. Looping and removing repeats would run in the order of O(n^2). So using a Set is recommended in most cases. There are some cases (e.g. limited memory) where iterating might make sense.
2.
Arrays.asList() is a cheap operation that doesn't copy the array, with minimal memory overhead. Alternatively, you can add the elements manually by iterating through the array:
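// The type parameter <T> must be declared before the return type.
public static <T> Set<T> arrayToSet(T[] array) {
    // Initial capacity is only a sizing hint; the set grows if needed.
    Set<T> set = new HashSet<T>(array.length / 2);
    for (T item : array)
        set.add(item);
    return set;
}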
Barring any specific performance bottlenecks that you know of (say a collection of tens of thousands of items) converting to a set is a perfectly reasonable solution and should be (IMO) the first way you solve this problem, and only look for something fancier if there is a specific problem to solve.