I have a large list (about 12,000 objects) of custom Objects inside which I will have to search for a specific Object a number of times. As of now, I am mostly using brute force to find the object, but it becomes extremely slow as the list grows larger. This is how I search as of now:
List<MyObject> objectsToSearch; //List containing about 12000 objects
MyObject objectToCompare = new MyObject("this is a parameter"); //Object to compare with list
for(MyObject compareFrom : objectsToSearch){
if(compareFrom.equals(objectToCompare)){
System.out.println("Object found");
}
}
Surely there must be a better way to achieve this. Increasing performance becomes especially important since I will be needing to perform this operation multiple times.
Despite my research I haven't found any detailed tutorial. How do I achieve this?
Make your class MyObject implement Comparable, sort the list once using Collections.sort(), and then use Collections.binarySearch() for each lookup.
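A minimal sketch of the sort-then-binary-search approach. The class SortableItem and its field param are hypothetical stand-ins for MyObject, and it assumes the objects can be ordered by that single String. Note that the no-comparator overloads of Collections.sort() and Collections.binarySearch() expect the elements to implement Comparable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for MyObject, ordered by its single String field.
class SortableItem implements Comparable<SortableItem> {
    final String param;

    SortableItem(String param) {
        this.param = param;
    }

    @Override
    public int compareTo(SortableItem other) {
        return param.compareTo(other.param);
    }
}

class BinarySearchDemo {
    // Assumes sortedList is already sorted by natural ordering.
    static boolean containsSorted(List<SortableItem> sortedList, SortableItem key) {
        // binarySearch returns a non-negative index only when the key is found
        return Collections.binarySearch(sortedList, key) >= 0;
    }

    public static void main(String[] args) {
        List<SortableItem> objectsToSearch = new ArrayList<>();
        objectsToSearch.add(new SortableItem("c"));
        objectsToSearch.add(new SortableItem("a"));
        objectsToSearch.add(new SortableItem("b"));

        Collections.sort(objectsToSearch); // O(n log n), done once up front

        if (containsSorted(objectsToSearch, new SortableItem("b"))) { // O(log n) per lookup
            System.out.println("Object found");
        }
    }
}
```

The list has to be re-sorted (or the new element inserted at the right position) whenever it changes, so this fits best when the list is built once and searched many times.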
Are you sure that you really need a List? It seems that you are just checking the object for presence. If you don't need to keep the order of the objects and if it's sufficient to keep each object only once, consider using a Set instead, most probably HashSet.
Do not forget to implement equals() and hashCode() for the objects you store, otherwise HashSet will not recognize two equal objects as the same element.
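A minimal sketch of the HashSet approach, assuming the object's identity is fully described by one String. The class Item and its field name are hypothetical, not the asker's actual MyObject.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class; equality is based on the single field.
class Item {
    private final String param;

    Item(String param) {
        this.param = param;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Item)) return false;
        return Objects.equals(param, ((Item) o).param);
    }

    @Override
    public int hashCode() {
        // Must be consistent with equals(): equal objects yield equal hash codes
        return Objects.hashCode(param);
    }
}

class HashSetLookupDemo {
    public static void main(String[] args) {
        Set<Item> objectsToSearch = new HashSet<>();
        objectsToSearch.add(new Item("this is a parameter"));
        objectsToSearch.add(new Item("something else"));

        // Average constant-time lookup instead of scanning the whole list
        if (objectsToSearch.contains(new Item("this is a parameter"))) {
            System.out.println("Object found");
        }
    }
}
```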
If you have implemented equals method for your custom class correctly you can use the contains(Object o) method.
if(objectsToSearch.contains(objectToCompare)){
System.out.println("Object found");
}
Is there a Java collection that only allows unique objects and also has a get(int index) method?
I first thought of a TreeSet, but it has no get method.
What I want to be able to do:
// replace object with any class that implement the right things to make it work
Collection<Object> collection = dunno<Object>();
Object o = new Object();
Object o2 = new Object();
collection.add(o);
collection.add(o);
collection.size(); // should get 1
collection.get(0); // should return o
// let's suppose that o2 is lower than o (if the collection doesn't sort the way I want, I can change it anyway)
collection.add(o2);
collection.get(0); // should return o2
So basically, something like a TreeSet but with a get method. Does anyone know of something like that?
There is no such collection in the standard library, and I am also not aware of anything like that in other widespread libraries like Guava or Apache Commons.
Thus the answer is: you will have to implement your own collection for that. A straightforward solution would use a set and a list together to provide the required interface, which will work but increases the memory footprint, since every element is referenced twice.
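The set-plus-list idea could be sketched roughly as follows. The class name IndexedSet is made up for illustration, and this version keeps insertion order rather than the sorted order the question asks for; a sorted variant would have to insert each new element at its sorted position instead of appending.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: unique elements plus positional access (insertion order).
class IndexedSet<E> {
    private final Set<E> seen = new HashSet<>();      // enforces uniqueness
    private final List<E> ordered = new ArrayList<>(); // provides get(index)

    // Returns false and changes nothing if the element is already present.
    boolean add(E e) {
        if (!seen.add(e)) {
            return false;
        }
        ordered.add(e);
        return true;
    }

    E get(int index) {
        return ordered.get(index);
    }

    int size() {
        return ordered.size();
    }
}

class IndexedSetDemo {
    public static void main(String[] args) {
        IndexedSet<String> collection = new IndexedSet<>();
        collection.add("o");
        collection.add("o"); // ignored, already present
        System.out.println(collection.size());
        System.out.println(collection.get(0));
    }
}
```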
Override equals() and hashCode() in your custom class.
Use an ArrayList and check list.contains() before adding, so that duplicates are never inserted.
The LinkedHashSet maintains the order of insertion while maintaining also the uniqueness of the items inserted.
Although it doesn’t have a dedicated get(i) method you can implement one by just iterating through the set.
You will pay for the performance of get() though.
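A minimal sketch of such a get(i) helper. The class and method names are hypothetical; the cost is linear in the index because the set has to be walked from the start.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

class LinkedHashSetGet {
    // Returns the element at the given position in iteration order; O(index).
    static <E> E get(Set<E> set, int index) {
        if (index < 0 || index >= set.size()) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        Iterator<E> it = set.iterator();
        for (int i = 0; i < index; i++) {
            it.next(); // skip the elements before the requested position
        }
        return it.next();
    }

    public static void main(String[] args) {
        Set<String> set = new LinkedHashSet<>();
        set.add("first");
        set.add("second");
        set.add("first"); // duplicate, ignored
        System.out.println(get(set, 1));
    }
}
```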
You should use a LinkedHashSet.
This collection only accepts unique objects and keeps track of their insertion order.
Is it possible to find out if a list is fixed size or not?
I mean, for example this code:
String[] arr = {"a", "b"};
List<String> list = Arrays.asList(arr);
returns fixed size List backed by an array. But is it possible to understand programmatically if List is fixed-size or not without trying to add/remove elements and catching the exception? For example:
try {
list.add("c");
}
catch(UnsupportedOperationException e) {
// Fixed-size?
}
A list created from a String[] by
List<String> list = Arrays.asList(arr);
will have Arrays as its enclosing class, while one created by, for example, new ArrayList() won't have an enclosing class. So the following should work to check whether the List was produced by calling Arrays.asList():
static <T> boolean wasListProducedAsAResultOfCallingTheFunctionArrays_asList(List<T> l) {
return Arrays.class.equals(l.getClass().getEnclosingClass());
}
Beware that this method relies on undocumented behavior. It will give wrong answers if another nested List subclass is ever added to the Arrays class.
Is it possible to find out if some list is fixed size or not?
In theory - No. Fixed sizedness is an emergent property of the implementation of a list class. You can only determine if a list has that property by trying to add an element.
And note that a simple behavioral test would not reliably distinguish between a fixed sized list and a bounded list or a list that was permanently or temporarily read-only.
In practice, a fixed sized list will typically have a different class to an ordinary one. You can test the class of an object to see if it is or isn't a specific class. So if you understand what classes would be used to implement fixed sized lists in your code-base, then you can test if a specific list is fixed sized.
For example the Arrays.asList(...) method returns a List object whose actual class is java.util.Arrays.ArrayList. That is a private nested class, but you could use reflection to find it, and then use Object.getClass().equals(...) to test for it.
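A rough sketch of that reflection check. The class and method names here are hypothetical, and the binary name "java.util.Arrays$ArrayList" is an internal detail of the JDK rather than a documented API, so treat this as an illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArraysAsListCheck {
    // Binary name of the private nested class; an assumption about JDK internals.
    private static final String ARRAYS_LIST_CLASS = "java.util.Arrays$ArrayList";

    static boolean isArraysAsList(List<?> list) {
        try {
            return Class.forName(ARRAYS_LIST_CLASS).equals(list.getClass());
        } catch (ClassNotFoundException e) {
            // The internal class was renamed or removed, so we cannot tell
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isArraysAsList(Arrays.asList("a", "b")));
        System.out.println(isArraysAsList(new ArrayList<String>()));
    }
}
```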
However, this approach is fragile. Your code could break if the implementation of Arrays was modified, or if you started using other forms of fixed sized list as well.
No.
The List API is identical regardless of whether a List is expandable or not, something that was deliberate.
There is also nothing in the List API that allows you to query it to determine this feature.
You can't completely reliably determine this information by reflection, because you will be depending on internal details of the implementation, and because there is an unbounded number of classes that are potentially fixed-size. For example, in addition to Arrays.asList, there is also Arrays.asList().subList, which happens to return a different class. There can also be wrappers around the base list like Collections.checkedList, Collections.synchronizedList and Collections.unmodifiableList. There are also other fixed-size lists: Collections.emptyList, Collections.singletonList, and Collections.nCopies. Outside the standard library, there are things like Guava's ImmutableList. It's also pretty trivial to hand-roll a list for something by extending AbstractList (for a fixed-size list you need only implement the size() and get(int) methods).
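To illustrate how easy it is to hand-roll such a list, here is a small sketch (the names are hypothetical). It implements only get(int) and size(); AbstractList's default add() then throws UnsupportedOperationException, and a class-based check aimed at Arrays would not recognize it.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;

class FixedSizeListDemo {
    // Hypothetical helper: a read-only, fixed-size view over the given items.
    static List<String> fixedSizeListOf(final String... items) {
        return new AbstractList<String>() {
            @Override
            public String get(int index) {
                return items[index];
            }

            @Override
            public int size() {
                return items.length;
            }
        };
    }

    public static void main(String[] args) {
        List<String> list = fixedSizeListOf("a", "b");
        // The enclosing class is FixedSizeListDemo, not Arrays
        System.out.println(Arrays.class.equals(list.getClass().getEnclosingClass()));
        try {
            list.add("c");
        } catch (UnsupportedOperationException e) {
            System.out.println("add() is not supported");
        }
    }
}
```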
Even if you detect that your list is not fixed-size, the specification of List.add allows it to refuse elements for other reasons. For example, Collections.checkedList wrappers throw a ClassCastException for elements of unwanted type.
And even if you know your list is expandable, and allows arbitrary elements, that doesn't mean you want to use it. Perhaps it's synchronized, or not synchronized, or isn't serializable, or it's a slow linked list, or has some other quality that you don't want.
If you want control over the type, mutability, serializability, or thread-safety of the list, or you want to be sure that no other code has kept a reference to it, the usual practice is to create a new one yourself. Doing so is not expensive even when it turns out to be unnecessary (memcopies are blazing fast), and it lets you reason more definitely about what your code will actually do at runtime. If you'd really like to avoid creating unnecessary copies, try whitelisting instead of blacklisting list classes. For example:
if (list.getClass() != ArrayList.class) {
list = new ArrayList<>(list);
}
(Note: That uses getClass instead of instanceof, because instanceof would also be true for any weird subclasses of ArrayList.)
There are immutable collections in Java 9, but there is still no common @Immutable annotation, for example, or a common marker interface that we could query to get this information.
The simplest way I can think of would be simply to get the name of the class of such an instance:
String nameList = List.of(1, 2, 3).getClass().getName();
System.out.println(nameList.contains("Immutable"));
but that still relies on internal details, since it queries the name of the common class ImmutableCollections, that is not public and obviously can change without notice.
I have a program that collects objects over time. Those objects are often, but not always duplicates of objects the program has already received. The number of unique objects can sometimes be up in the tens of thousands. As my lists grow, it takes more time to identify whether an object has appeared or not before.
My current method is to store everything in an ArrayList, al; use Collections.sort(al); and use Collections.binarySearch(al, key) to determine whether I've used an object. Every time I come across a new object, however, I have to insert it and sort again.
I'm wondering if there's just a better way to do this. Contains tends to slow up too quickly. I'm looking for something as close to O(1) as possible.
Thanks much.
This is java. For the purpose of understanding what I'm talking about, I basically need a method that does this:
public boolean objectAlreadyUsed(Object o) {
return false; // placeholder: have we seen this object already?
}
Instead of using an ArrayList, why wouldn't you use a Set implementation (likely a HashSet)? You'll get constant-time lookup, no sorting needed.
N.B. your objects will need to correctly override hashCode() and equals().
This begs the question - why not use a data structure that doesn't allow duplicates (e.g. Set)? If you attempt to add a duplicate item, the method will return false and the data structure will remain unchanged.
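Since Set.add() already reports whether the element was new, the method from the question can be sketched in one line. The class name SeenTracker is hypothetical, and this assumes the objects override equals() and hashCode() appropriately.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical wrapper around a HashSet of everything seen so far.
class SeenTracker {
    private final Set<Object> seen = new HashSet<>();

    // Returns true if o was seen before; otherwise records it and returns false.
    public boolean objectAlreadyUsed(Object o) {
        return !seen.add(o); // add() returns false when o was already present
    }

    public static void main(String[] args) {
        SeenTracker tracker = new SeenTracker();
        System.out.println(tracker.objectAlreadyUsed("a"));
        System.out.println(tracker.objectAlreadyUsed("a"));
    }
}
```

This does the lookup and the insertion in a single average constant-time operation, with no sorting.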
Make sure the objects have correct equals() and hashCode() methods, and store them in a HashSet. Lookup then becomes constant time.
If retaining unwanted objects becomes an issue, by the way, you could consider using one of the many WeakHashSet implementations available on the Internet -- it will hold the objects but still allow them to be garbage collected if necessary.
Is it possible to allow duplicate values in the Set collection?
Is there any way to make the elements unique and have some copies of them?
Is there any functions for Set collection for having duplicate values in it?
Ever considered using a java.util.List instead?
Otherwise I would recommend a Multiset from Google Guava (the successor to Google Collections, which this answer originally recommended -ed.).
The very definition of a Set disallows duplicates. I think perhaps you want to use another data structure, like a List, which will allow dups.
Is there any way to make the elements unique and have some copies of them?
If for some reason you really do need to store duplicates in a set, you'll either need to wrap them in some kind of holder object, or else override equals() and hashCode() of your model objects so that they do not evaluate as equivalent (and even that will fail if you are trying to store references to the same physical object multiple times).
I think you need to re-evaluate what you are trying to accomplish here, or at least explain it more clearly to us.
From the javadocs:
"sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element"
So if your objects were to override equals() so that it returns false for whatever objects you intend on storing, then you could store them separately in a Set (you should override hashCode() as well).
However, the very definition of a Set in Java is,
"A collection that contains no duplicate elements."
So you're really better off using a List or something else here. Perhaps a Map, if you'd like to store duplicate values based on different keys.
Sun's view on "bags" (AKA multisets):
We are extremely sympathetic to the desire for type-safe collections. Rather than adding a "band-aid" to the framework that enforces type-safety in an ad hoc fashion, the framework has been designed to mesh with all of the parameterized-types proposals currently being discussed. In the event that parameterized types are added to the language, the entire collections framework will support compile-time type-safe usage, with no need for explicit casts. Unfortunately, this won't happen in the 1.2 release. In the meantime, people who desire runtime type safety can implement their own gating functions in "wrapper" collections surrounding JDK collections.
(source; note it is old and possibly obsolete -ed.)
Apart from Google's collections API, you can use Apache Commons Collections.
Apache Commons Collections:
http://commons.apache.org/collections/
Javadoc for Bag
I don't believe that you can have duplicate values within a set. A set is defined as a collection of unique values. You may be better off using an ArrayList.
These sound like interview questions, so I'll answer them like interview questions...
Is it possible to allow duplicate values in the Set collection?
Yes, but it requires that the person implementing the Set violate the design contract upon which Set is built. Basically, I could write a class that implements Set and doesn't enforce Set's promises.
In addition, other violations are possible. I could use a Set implementation that relies upon Java's hashCode() contract. Then if I provided an Object that violates that contract, I might be able to place two objects into the set which are equal but yield different hash codes (because they might never be checked for equality against each other, due to being in different hash bucket chains).
Is there any way to make the elements unique and have some copies of them?
It basically depends on how you define uniqueness. If an object's uniqueness is determined by its value, then one can have multiple copies of the same unique object; however, if the object's uniqueness is determined by its instance, then by definition it would not be possible to have multiple copies of the same object. You could however have multiple references to them.
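The difference between uniqueness by value and uniqueness by instance can be seen in a small sketch. It uses Collections.newSetFromMap over an IdentityHashMap to get a set that compares elements with == instead of equals(); the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Set;

class IdentityVsValueDemo {
    // Returns { size of the value-based set, size of the identity-based set }.
    static int[] sizes() {
        String first = new String("a");
        String second = new String("a"); // equal by value, but a distinct instance

        Set<String> byValue = new HashSet<>();
        Set<String> byIdentity =
                Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());

        // "first" is added twice to show that the same instance is never stored twice
        for (String s : new String[] {first, second, first}) {
            byValue.add(s);
            byIdentity.add(s);
        }
        return new int[] {byValue.size(), byIdentity.size()};
    }

    public static void main(String[] args) {
        int[] result = sizes();
        System.out.println("by value: " + result[0] + ", by identity: " + result[1]);
    }
}
```

The value-based set keeps one element, while the identity-based set keeps both instances, yet still only one copy of each reference.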
Is there any functions for Set collection for having duplicate values in it?
The Set interface doesn't have any functions for detecting or reporting duplicates. However, it is based on the Collection interface, which List also extends, so it is possible to pass duplicates into a Set; a properly implemented Set will just ignore the duplicates and present one copy of every element determined to be unique.
I don't think so. The only way would be to use a List. You can also play tricks with equals(), hashCode() or compareTo(), but it is going to be awkward.
No chance. You cannot have duplicate values in the Set interface.
If you want duplicates then you can try an ArrayList.
As mentioned, choose the right collection for the task; a List will likely be what you need. Messing with equals(), hashCode() or compareTo() to break identity, simply to wedge an instance into the wrong collection, is generally a bad idea. Worse yet, it may break code in other areas of the application that depend on these methods producing valid comparison results, and such errors can be very difficult to debug or track down.
I was also asked this question in an interview. I think the answer is that, of course, a Set will not allow duplicate elements, and an ArrayList or another collection should be used instead. However, overriding equals() for the type of object being stored in the set lets you manipulate the comparison logic, and hence you may be able to store duplicate elements in the Set. It's more of a hack which allows non-unique elements in the Set, and of course it is not recommended in production-level code.
You can do so by overriding hashcode as given below:
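import java.util.HashSet;
import java.util.Set;

public class Test
{
    static int a = 0;

    // Deliberately breaks the hashCode() contract: every call returns a new value.
    @Override
    public int hashCode()
    {
        a++;
        return a;
    }

    public static void main(String[] args)
    {
        Set<Test> s = new HashSet<Test>();
        Test t1 = new Test();
        Test t2 = t1; // second reference to the same object
        s.add(t1);
        s.add(t2); // hashes differently, so the set treats it as a new element
        System.out.println(s);
        System.out.println("--Done--");
    }
}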
Well, in this case we are trying to defeat the purpose of a specific collection. If we want to allow duplicate records, simply use a List or a multimap.
A Set stores unique values; if you want to store duplicate values, go for a List. If you still want duplicate values inside a Set, create a Set of ArrayLists, so that you can put the duplicate elements into an inner list.
Set<ArrayList<String>> s = new HashSet<ArrayList<String>>();
ArrayList<String> arr = new ArrayList<String>();
arr.add("First");
arr.add("Second");
arr.add("Third");
arr.add("Fourth");
arr.add("First"); // the duplicate lives in the inner list, not in the set itself
s.add(arr);
You can use a TreeMap instead:
The key is the element you wish to store
and the value is the frequency of that element.
Insertion and removal require custom handling.
Insertion: check whether the map already contains the element; if yes, increment its frequency. O(log N)
Removal: if the element's frequency is 1, remove it; otherwise decrease the frequency by 1. O(log N)
More details can be found in the Javadoc for TreeMap.
Overall time complexity remains the same as TreeSet, O(log N), but is worse than HashSet's O(1).
firstEntry() -> provides the smallest element's entry, time complexity: O(log N)
lastEntry() -> provides the greatest element's entry, time complexity: O(log N)
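The insertion and removal handling described above could be sketched like this. The class name TreeMapBag and its method names are made up for illustration; only the TreeMap calls themselves are standard library API.

```java
import java.util.TreeMap;

// Hypothetical sorted multiset: element -> number of occurrences.
class TreeMapBag<E extends Comparable<E>> {
    private final TreeMap<E, Integer> counts = new TreeMap<>();

    // Insertion: start at 1, or increment the existing frequency. O(log N)
    void add(E e) {
        counts.merge(e, 1, Integer::sum);
    }

    // Removal: drop the key at frequency 1, otherwise decrement. O(log N)
    boolean remove(E e) {
        Integer c = counts.get(e);
        if (c == null) {
            return false; // element not present
        }
        if (c == 1) {
            counts.remove(e);
        } else {
            counts.put(e, c - 1);
        }
        return true;
    }

    int count(E e) {
        return counts.getOrDefault(e, 0);
    }

    // Both throw NoSuchElementException when the bag is empty.
    E smallest() {
        return counts.firstKey();
    }

    E greatest() {
        return counts.lastKey();
    }
}

class TreeMapBagDemo {
    public static void main(String[] args) {
        TreeMapBag<String> bag = new TreeMapBag<>();
        bag.add("b");
        bag.add("a");
        bag.add("b");
        System.out.println(bag.count("b") + " " + bag.smallest() + " " + bag.greatest());
    }
}
```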
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class SET {
    public static void main(String[] args) {
        Set<AB> set = new HashSet<AB>();
        // AB does not override equals()/hashCode(), so all three instances are kept
        set.add(new AB(10, "pawan@email"));
        set.add(new AB(10, "pawan@email"));
        set.add(new AB(10, "pawan@email"));
        Iterator<AB> it = set.iterator();
        while (it.hasNext()) {
            AB o = it.next();
            System.out.println(o);
        }
    }
}

class AB {
    int id;
    String email;

    public AB() {
        System.out.println("DC");
    }

    AB(int id, String email) {
        this.id = id;
        this.email = email;
    }

    @Override
    public String toString() {
        return "" + id + "\t" + email;
    }
}
This is a two-part question:
First, I am interested to know what the best way to remove repeating elements from a collection is. The way I have been doing it up until now is to simply convert the collection into a set. I know sets cannot have repeating elements so it just handles it for me.
Is this an efficient solution? Would it be better/more idiomatic/faster to loop and remove repeats? Does it matter?
My second (related) question is: What is the best way to convert an array to a Set? Assuming an array arr, the way I have been doing it is the following:
Set x = new HashSet(Arrays.asList(arr));
This converts the array into a list, and then into a set. Seems to be kinda roundabout. Is there a better/more idiomatic/more efficient way to do this than the double conversion way?
Thanks!
Do you have any information about the collection, like say it is already sorted, or it contains mostly duplicates or mostly unique items? With just an arbitrary collection I think converting it to a Set is fine.
Arrays.asList() doesn't create a brand new list. It actually just returns a List which uses the array as its backing store, so it's a cheap operation. So your way of making a Set from an array is how I'd do it, too.
Use HashSet's standard Collection conversion constructor. According to The Java Tutorials:
Here's a simple but useful Set idiom. Suppose you have a Collection, c, and you want to create another Collection containing the same elements but with all duplicates eliminated. The following one-liner does the trick.
Collection<Type> noDups = new HashSet<Type>(c);
It works by creating a Set (which, by definition, cannot contain a duplicate), initially containing all the elements in c. It uses the standard conversion constructor described in the The Collection Interface section.
Here is a minor variant of this idiom that preserves the order of the original collection while removing duplicate element.
Collection<Type> noDups = new LinkedHashSet<Type>(c);
The following is a generic method that encapsulates the preceding idiom, returning a Set of the same generic type as the one passed.
public static <E> Set<E> removeDups(Collection<E> c) {
    return new LinkedHashSet<E>(c);
}
Assuming you really want set semantics, creating a new Set from the duplicate-containing collection is a great approach. It's very clear what the intent is, it's more compact than doing the loop yourself, and it leaves the source collection intact.
For creating a Set from an array, creating an intermediate List is a common approach. The wrapper returned by Arrays.asList() is lightweight and efficient. There's not a more direct API in core Java to do this, unfortunately.
I think your approach of putting items into a set to produce the collection of unique items is the best one. It's clear, efficient, and correct.
If you're uncomfortable using Arrays.asList() on the way into the set, you could simply run a foreach loop over the array to add items to the set, but I don't see any harm (for non-primitive arrays) in your approach. Arrays.asList() returns a list that is "backed by" the source array, so it doesn't have significant cost in time or space.
1. Duplicates
Concurring with the other answers: using a Set should be the most efficient way to remove duplicates. Building a HashSet runs in O(n) time on average, whereas looping and removing repeats runs in the order of O(n^2). So using a Set is recommended in most cases. There are some cases (e.g. limited memory) where iterating might make sense.
2. Array to Set
Arrays.asList() is a cheap operation that doesn't copy the array and has minimal memory overhead. Alternatively, you can add the elements manually by iterating through the array.
public static <T> Set<T> arrayToSet(T[] array) {
    Set<T> set = new HashSet<T>(array.length / 2);
    for (T item : array)
        set.add(item);
    return set;
}
Barring any specific performance bottlenecks that you know of (say a collection of tens of thousands of items) converting to a set is a perfectly reasonable solution and should be (IMO) the first way you solve this problem, and only look for something fancier if there is a specific problem to solve.