How can I make a list without duplicate data?

I want a Set or List without duplicates, but I also want to access its elements by index, for example: setvariable.get(0).getName(), to get at the object.
The Java documentation does not recommend using conditions to eliminate duplicates from a list.
So how should I do it?
I'm trying to create an association between two objects, course and student, but I don't want either of them to hold duplicated data.
Thank you very much in advance!
Note:
I was working on a project but ran into gaps in my understanding of association in OOP, so I went back to practise. I'm now doing exercises on the OOP topics I had difficulty with, and I'm thinking ahead of the exercise by adding conditions that complicate a real program, such as duplicates. Because the teacher solves every exercise using the index, I got confused trying to keep the index, but having seen the explanations, it is not actually necessary.
Solution: Use a Set, because the index is not necessary in this case.

I was going to explain that Java doesn't have an "indexable" set class ... but then I looked at what you are actually trying to implement.
I am going to assume that the association you are trying to implement is many-to-many. In the real world, a student may take many courses, and a course may be taken by many students. I assume you are trying to model that in your program.
The natural way to represent a (queryable) many-to-many association between two Java classes is using a pair of Map objects; e.g.
Map<Student, Set<Course>> enrolledCourses;
Map<Course, Set<Student>> enrolledStudents;
Depending on the kind of queries you need to perform, the maps could be HashMap or TreeMap. Each time you add or remove a relation between a Student and a Course you will need to update both Map objects.
Assuming that you maintain the maps correctly, there won't be any duplicate copies of either Student or Course objects.
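As a rough illustration of keeping the two maps in sync, here is a minimal sketch. The class name EnrollmentRegistry and its methods are hypothetical, and students and courses are represented by plain String identifiers (an assumption made to keep the example short; the answer above uses Student and Course objects as keys).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical registry that updates both directions of the association together.
// Students and courses are plain strings here purely to keep the sketch short.
class EnrollmentRegistry {
    private final Map<String, Set<String>> coursesByStudent = new HashMap<>();
    private final Map<String, Set<String>> studentsByCourse = new HashMap<>();

    // Returns false if the pair was already registered, so no duplicate is stored.
    boolean enroll(String student, String course) {
        boolean added = coursesByStudent.computeIfAbsent(student, k -> new HashSet<>()).add(course);
        studentsByCourse.computeIfAbsent(course, k -> new HashSet<>()).add(student);
        return added;
    }

    // Removes the pair from both maps; returns false if it was not registered.
    boolean withdraw(String student, String course) {
        Set<String> courses = coursesByStudent.get(student);
        if (courses == null || !courses.remove(course)) {
            return false;
        }
        studentsByCourse.get(course).remove(student);
        return true;
    }

    Set<String> coursesOf(String student) {
        return Collections.unmodifiableSet(
                coursesByStudent.getOrDefault(student, Collections.emptySet()));
    }

    Set<String> studentsIn(String course) {
        return Collections.unmodifiableSet(
                studentsByCourse.getOrDefault(course, Collections.emptySet()));
    }
}
```

Routing every change through enroll and withdraw is what keeps the two maps consistent; callers only ever receive read-only views.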
On the other hand, if the association doesn't need to be queryable, you can get away with Set valued fields; e.g.
// In the Course class
Set<Student> enrolledStudents;
// In the Student class
Set<Course> enrolledCourses;
You could also use a List class and call contains before each insertion to reject duplicates manually. That will lead to O(N) insertion and deletion operations, but that may be acceptable under normal real world assumptions. (Course enrollment numbers will typically be capped, and a student will only be allowed to enroll in a limited number of courses.)
Note that your idea of using list positions as some kind of identifier is not practical. When you remove an element in the middle of a list, all following elements in the list change positions. Positions are not stable in the long term.
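The shifting of positions can be seen in a small sketch. The class ListPositionDemo and the sample names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListPositionDemo {
    // Removes the element at the given index and returns whatever now sits at that index.
    static String removeAndPeek(List<String> names, int index) {
        names.remove(index);
        return names.get(index);
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("Ana", "Ben", "Cai"));
        // "Ben" was at index 1; after removing it, index 1 refers to "Cai".
        System.out.println(removeAndPeek(names, 1)); // prints "Cai"
    }
}
```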

Do you mean that there should be no duplicates among the combinations, or that each list should not contain any duplicates?
For the second option you can check your list by using .contains before adding any additional element:
List<String> testList = new ArrayList<>();
String element = "example";
// Only add the element if the list does not already contain it.
if (!testList.contains(element)) {
testList.add(element);
}

Related

Nesting collections in Java (list of hashmaps containing String key, arrayList value)

Is nesting collections in Java something that I should be doing?
I'm currently working on a project where I want to have a bunch of hashmaps that would contain a String key and an arrayList value. That way when I create and add an object of another class to the collection, it would be able to use some piece of information that if it matched up with one of the keys of one of the hashmaps it would then be deposited in the associated arrayList value. That way the list can later on be accessed through the correct key for a specific hashmap.
Is this a good idea? Or is it too convoluted and if so is there a better way to do this?

There are times to nest, for sure. But in the humble opinion of this seasoned dev, you shouldn't do it unless you have a good reason. All too often you would be much better off with some class that represents the inner collection.
So if you find yourself with a Map<String,List<Foo>> ask yourself what that List<Foo> really represents. If it's Map<String,List<Student>> then maybe you need Map<String, Roster> or Map<String, Team>. I find this yields faster time to market and fewer bugs. The fact you're asking the question means you think there's a chance that might be true too.
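To make the idea concrete, here is a minimal sketch of what a Roster class might look like. Roster is named in the answer, but its contents, the School class, and the use of String for students are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical wrapper that gives the inner list a name and a home for its rules.
class Roster {
    private final List<String> students = new ArrayList<>();

    // The no-duplicates rule lives here instead of at every call site.
    boolean add(String student) {
        if (students.contains(student)) {
            return false;
        }
        return students.add(student);
    }

    List<String> students() {
        return Collections.unmodifiableList(students);
    }
}

// Hypothetical owner of the outer map: Map<String, Roster> instead of Map<String, List<String>>.
class School {
    private final Map<String, Roster> rosterByCourse = new HashMap<>();

    // Creates the roster on first use, so callers never see a missing entry.
    Roster rosterFor(String course) {
        return rosterByCourse.computeIfAbsent(course, k -> new Roster());
    }
}
```

The outer map stays simple, and any rule about what a roster may contain is written once, inside Roster.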

Is switching between Collections worth it?

Java offers us Collections, where every option is best used in a certain scenario.
But what would be a good solution for the combination of following tasks:
Quickly iterate through every element in the list (order does not matter)
Check if the list contains (a) certain element(s)
Some options that were considered which may or may not be good practice:
It could be possible to, for example, first use a LinkedList, and then convert it to a HashSet when the amount of elements is unknown in advance (and if duplicates will not be present)
Pick a solution for one of both tasks and use the same implementation for the other task (if switching to another implementation is not worth it)
Perhaps some implementation exists that does both (failed to find one)
Is there a 'best' solution to this, and if so, what is it?
EDIT: For potential future visitors, this page contains many implementations with big O runtimes.

A HashSet can be iterated through quickly and provides efficient lookups.
HashSet<Object> set = new HashSet<>();
set.add("Hello");
for (Object obj : set) {
System.out.println(obj);
}
if (set.contains("Hello")) {
System.out.println("Found");
}

Quickly iterate through every element in the list (order does not matter)
If the order does not matter, any Collection implementation will do: each of them implements Iterable, and iterating over every element means visiting each one at least once, so nothing can beat O(n). In practice, of course, one implementation may suit you better than another, since you usually have several considerations to take into account.
Check if the list contains (a) certain element(s)
This is the typical use case for a Set; you will get much better time complexity for contains operations. One thing to note here is that a Set does not have a predefined order when iterating over elements. It can change between implementations and it is risky to make assumptions about it.
Now to your question:
From my perspective, if you are free to choose the data structure of a class yourself, go with the most natural one for that use case. If you expect to call contains a lot, then a Set might be suited for your use case. You can also use a List and, each time you need to call contains (multiple times), first create a Set with all elements from the List. Of course, if you call this method often, it would be expensive to create the Set for each invocation. In that case, use a Set in the first place.
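A minimal sketch of the List-plus-temporary-Set approach described above, assuming String elements; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ContainsDemo {
    // Counts how many queries occur in the list, building the lookup Set only once.
    static int countPresent(List<String> items, List<String> queries) {
        Set<String> lookup = new HashSet<>(items); // one O(n) copy
        int found = 0;
        for (String query : queries) {
            if (lookup.contains(query)) { // expected O(1) per lookup
                found++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c");
        System.out.println(countPresent(items, Arrays.asList("a", "x", "c"))); // prints 2
    }
}
```

The copy pays off only when several lookups follow it; for a single contains call, scanning the List directly is cheaper.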
Your comment stated that you have a world of players and you want to check if a player is part of a certain world object. Since the world owns the players, it should also contain a Collection of some kind to store them. In this case I would recommend a Map with a common identifier of the player as key, and the player itself as value.
public class World {
private Map<String, Player> players = new HashMap<>();
public Collection<Player> getPlayers() { ... }
public Optional<Player> getPlayer(String nickname) { ... }
// ...
}
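The method bodies above are left elided in the answer. One possible way to fill them in is sketched below; the Player class, the addPlayer method, and the name GameWorld are assumptions for illustration, not part of the original answer.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical Player; only the nickname matters for this sketch.
class Player {
    private final String nickname;

    Player(String nickname) {
        this.nickname = nickname;
    }

    String getNickname() {
        return nickname;
    }
}

// Hypothetical variant of the World class, keyed by nickname.
class GameWorld {
    private final Map<String, Player> players = new HashMap<>();

    // Returns false if a player with the same nickname is already present.
    boolean addPlayer(Player player) {
        return players.putIfAbsent(player.getNickname(), player) == null;
    }

    // Read-only view for iterating over all players.
    Collection<Player> getPlayers() {
        return Collections.unmodifiableCollection(players.values());
    }

    // Hash lookup by nickname; empty when no such player exists.
    Optional<Player> getPlayer(String nickname) {
        return Optional.ofNullable(players.get(nickname));
    }
}
```

This covers both tasks from the question: players.values() supports iteration over every element, and the key lookup answers the membership check.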

ArrayList vs HashSet vs HashMap and questions regarding data-structure design

I need to create 3 datastructures.
The first is a collection of Persons (PersonRegister):
public class Person {
private final int uniquePersonId; // Unique identifier
private long personalNumber;
private String name;
// additional code
}
The second is a collection of Insurances (InsuranceRegister):
public class Insurance {
private final int uniqueInsuranceId; // Unique identifier
private int uniquePersonId; // Is used as a link between the insurance and person
private Date date;
private boolean active;
// additional code
}
The third is a collection of Claims (ClaimsRegister):
public class Claim {
private final int uniqueClaimId; // Unique identifier
private int uniquePersonId; // Is used as a link between the claim and person
private Date date;
// additional code
}
Each of these classes overrides the equals() and hashCode() methods.
None of the data structures will need any removal, as old data will be used for statistics etc.
It is also important that the data structures contain no duplicates.
Methods that will be used on these datastructures are for example:
Find the insurance/claim of a specific personId.
Find all claims/insurances submitted at a given timeframe.
Find a person based on personId or personalNumber
Find all persons with a specific lastName
Find all active insurances to a specific person.
Find all insurances/claims of a specific subclass type (e.g. CarInsurance or HomeInsurance, both are subclasses of Insurance)
The list goes on: any kind of query that fits comprehensive statistics and search functionality.
A lot of these methods will be using an enhanced for loop with an iterator.
As it stands now, the only way to link a Person to its insurances is by comparing the uniquePersonId variables to each other. Would it be better to also have each Person own a List with object references to its Insurances and Claims?
Additionally, should each Insurance/Claim have an object reference to its parent/owner? Or is this considered bad practice under "separation of concerns"?
It would make methods such as determining boolean totalCustomer (true if you have more than 3 active insurances) much easier to place inside the Person class. Any suggestions?
Anyhow, to the main question. What Collections would be best suited for each datastructure? (Limited to Java Collections)
Currently/temporarily I have 3 ArrayLists with a if(!list.contains(newElement)) { add(newElement) }; to prevent duplicates.
Isn't ArrayList faster than HashSet or HashMap when iterating through a for loop? But is it substantially fast enough to be of value?
I've been thinking that HashSet would be better practice considering no duplicates are allowed, or is my "solution to duplicates" good enough? HashMap could make sense for Persons considering uniquePersonId will be used every time an Insurance or Claim is searched for.
However I still want the functionality of searching for personalNumber or any other member of person for that matter. Will iterating as described in "Iterate through a HashMap" still be efficient enough, and make for good programming design?
I have spent countless hours on stackoverflow and google trying to figure this out. Any suggestions and help would be very much appreciated. Thanks.
edit: "Difference between HashSet and HashMap?" was linked as a possible duplicate question; however, that thread only explains the difference between HashSet and HashMap, with a link to Oracle's Collections tutorial. I'm aware of most of the differences. I have also read the Collections tutorial. But because of the wide usage of the structures, I'm still having some difficulties finding the most fitting Collection. My main goal is regarding what would make for the best data-structure design. Sorry if I was being unclear.

I suggest using a HashSet to store your various objects. Also make your objects (Person, etc.) implement Comparable, so that you can override the compareTo method and use it for sorting. Note that a HashSet itself has no order: to sort, copy its elements into a List and call Collections.sort(), which can also take a Comparator for custom sorting.
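A minimal sketch of that suggestion follows. The classes RegisteredPerson and PersonRegister are hypothetical, trimmed-down stand-ins for the Person class in the question, and the choice of the id as both identity and natural order is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, trimmed-down person: identity and natural order are both based on the id.
class RegisteredPerson implements Comparable<RegisteredPerson> {
    final int id;
    final String name;

    RegisteredPerson(int id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof RegisteredPerson && ((RegisteredPerson) o).id == id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }

    @Override
    public int compareTo(RegisteredPerson other) {
        return Integer.compare(id, other.id);
    }
}

class PersonRegister {
    private final Set<RegisteredPerson> persons = new HashSet<>();

    // Returns false when a person with the same id is already registered.
    boolean add(RegisteredPerson p) {
        return persons.add(p);
    }

    // A HashSet has no order, so sorting happens on a List copy.
    List<RegisteredPerson> sortedById() {
        List<RegisteredPerson> copy = new ArrayList<>(persons);
        Collections.sort(copy); // uses compareTo
        return copy;
    }

    // Custom ordering through a Comparator.
    List<RegisteredPerson> sortedByName() {
        List<RegisteredPerson> copy = new ArrayList<>(persons);
        copy.sort(Comparator.comparing((RegisteredPerson p) -> p.name));
        return copy;
    }
}
```

If lookups by id dominate, a HashMap keyed by the id would serve the same no-duplicates purpose while also giving direct retrieval.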

How simple is too simple for a (nested) class?

I am new to OOP, and still trying to wrap my head around just how encapsulated things should be. This question is about best practices, NOT about how to achieve functionality.
For an assignment, we are asked to make linked lists whose nodes contain two Strings: the name of the person spreading a disease, and the name of the person becoming infected. Each case of infection is only a record of who is involved, an infection doesn't actually do anything.
The assignment description suggests we add the two names as fields to the Nodes of the linked list. But my fledgling OOP-radar is booping, and I am unsure of whether or not I should instead create a nested Infection class within the node, or a top-level class of its own, which stores the two Strings.
So my internal conflict (and question) here is: at what point does an object become too simple to merit being an object anymore, while still keeping within the OOP-paradigm? Should I create an Infection class, or add data to the Node to keep it simple?

I would approach it the same way the Collections API does: create generic data structures that can hold any kind of object, and let the objects define their own internal structure and functionality.
The type could be generified, that would be the best practice.
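As a rough sketch of that approach, the list below is generic and the two names live in a small Infection class. The names Infection and SimpleLinkedList, and the exact set of methods, are assumptions; the assignment may require a different interface.

```java
// Hypothetical payload type: just a record of who infected whom.
final class Infection {
    final String spreader;
    final String infected;

    Infection(String spreader, String infected) {
        this.spreader = spreader;
        this.infected = infected;
    }
}

// Generic singly linked list; the node knows nothing about infections.
class SimpleLinkedList<T> {
    private static final class Node<T> {
        final T value;
        Node<T> next;

        Node(T value) {
            this.value = value;
        }
    }

    private Node<T> head;
    private Node<T> tail;
    private int size;

    // Appends a value at the end of the list.
    void add(T value) {
        Node<T> node = new Node<>(value);
        if (head == null) {
            head = node;
        } else {
            tail.next = node;
        }
        tail = node;
        size++;
    }

    // Walks the list from the head; O(n) like any linked list.
    T get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        Node<T> current = head;
        for (int i = 0; i < index; i++) {
            current = current.next;
        }
        return current.value;
    }

    int size() {
        return size;
    }
}
```

With this split, the list can be reused for any payload, and Infection can later grow fields or behaviour without touching the node.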

Use a Java hash map even when there is no "mapping"?

I want to store some objects and then be able to retrieve them later as efficiently as possible. I will also remove some of them under certain conditions. It seems a hash map would be the right choice.
But, from what I've seen, hash maps always associate a value with another? For example, "john" and "555-5555", his phone number.
Now, my situation. Suppose I have a bunch of people, and each person is connected to other people. So, I need each person to store its contacts.
What I'm doing is have each person have a hashmap, and then I'd add to the hash otherPerson, otherPerson. Basically, the key is the value. Am I doing it wrong?
EDIT I don't think the HashSet would solve my problem because I have to retrieve the value to update it and there is no get method. Remove returns a boolean, so I can't even remove it to put it back again, which would probably be a bad idea anyway.

If all you need is to check whether A is one of B's contacts, then a Set is the right choice. It has contains() for that purpose.
Otherwise, the most suitable might be a Map, as you need an efficient retrieval operation. You said you currently use the same object as key and value, but I'm not sure how you get the key in the first place. Say you'd like to get contact A from B's contacts, and you use something like 'B.contacts.get(A)': where do you get A from? If you already have A, what is the point of getting it from the map again? (Maybe there are multiple instances of the same person?)
Unless there are multiple instances of the same person, I'd say define a unique, ID-like attribute for each Person and use that as the key for the contacts map. Also, do you define equals()/hashCode() for the person class? Map and Set use hashCode() and equals() to find a match. Depending on your usage, you might need to consider rewriting them for efficiency.
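A minimal sketch of a contacts map keyed by a unique ID, with equals() and hashCode() based on that ID. The class name Contact, the String id, and the phone field are assumptions chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical person type whose equality is defined by a unique id.
class Contact {
    private final String id;
    private String phone;
    private final Map<String, Contact> contacts = new HashMap<>();

    Contact(String id, String phone) {
        this.id = id;
        this.phone = phone;
    }

    String getId() {
        return id;
    }

    String getPhone() {
        return phone;
    }

    void setPhone(String phone) {
        this.phone = phone;
    }

    // The id is the key, so adding the same person twice just overwrites the entry.
    void addContact(Contact other) {
        contacts.put(other.getId(), other);
    }

    // Lookup by id; returns null when the id is not among the contacts.
    Contact findContact(String contactId) {
        return contacts.get(contactId);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Contact && ((Contact) o).id.equals(id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}
```

Because the key is only the ID, a caller can retrieve and update a contact without already holding the object, which was the concern raised in the question's edit.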

I don't think the HashSet would solve my problem because I have to retrieve the value to update it and there is no get method.
This is a puzzling statement. Why would you want to retrieve a value using a get method to update it? Surely, if you know which object you need to retrieve from the set/map, you don't need to retrieve it.
For example:
HashSet<Person> relations = ...
Person p = ...
if (relations.remove(p)) {
// we removed an object such that p.equals(obj) is true.
}
Now if you are worried that the object that was removed was equal to, but not identical to p, it seems to me that something is wrong with your design. Either:
you should not be creating multiple Person instances that are equal, or
you should not be caring that Person instances are not identical, or
you should not have overridden equals(Object).
In short, the problem is that you are not managing object identity properly.

Well, the data structure you'd be looking for here, would be a HashSet (or some other kind of set), I think (if your framework/library offers it). A set just says "I have the following items" instead of "I have the following items mapped to the following values". Which would be what you're modeling here.
As for HashSet vs. other implementations (if present): That all depends on what you're doing. If you need fast lookup, i. e. "is this element in the set?" questions, then hashing is a good thing. Other underlying data structures are perhaps better optimized for other set operations, such as union, intersection, etc.

A hash table/map simply requires that you have a way to get the values you're interested in looking up later; that's what the key is for.
However, in your specific case, it sounds like you're looking for a way to store relationships between people, and what you're keeping track of is whether or not person A has a relationship with person B. A better representation for that sort of thing is an adjacency list.
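An adjacency list for this case could be sketched as a map from each person to the set of people they are connected to. The class name ContactGraph, the String identifiers, and the assumption that relationships are symmetric are all illustrative choices.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical adjacency list: each person maps to the set of their contacts.
class ContactGraph {
    private final Map<String, Set<String>> neighbours = new HashMap<>();

    // Stores an undirected relationship between a and b (assumed symmetric).
    void connect(String a, String b) {
        neighbours.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        neighbours.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // True if a relationship between a and b has been stored.
    boolean areConnected(String a, String b) {
        return neighbours.getOrDefault(a, Collections.emptySet()).contains(b);
    }

    // Read-only view of one person's contacts; empty for an unknown person.
    Set<String> contactsOf(String person) {
        return Collections.unmodifiableSet(
                neighbours.getOrDefault(person, Collections.emptySet()));
    }
}
```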

Am I missing something or don't you simply need an ArrayList<Person>?

I would just store the contacts in a List<Person>. E.g.
public class Person {
private List<Person> contacts;
}
With regard to editing the individual contact, it is really not the parent Person's responsibility to do that. It should at highest add/remove contacts. You can perfectly do that by contacts.add(otherPerson) or contacts.remove(otherPerson).
When you want to edit an individual Person, which may be one of the contacts, just get a handle to it independently, e.g. personDAO.find(personId), and then update it accordingly. It is also the Person's own responsibility to edit its own details. With a good ORM under the hood, the changes will be reflected in the contact lists of other Persons.

If you need to iterate through the people, or require them to have ordering, consider TreeMap or TreeSet instead of hashing.
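For the ordering case, a TreeSet removes duplicates and keeps its elements sorted at the same time. A small sketch, assuming names are plain strings ordered alphabetically; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class SortedContactsDemo {
    // Returns the names without duplicates, in natural (alphabetical) order.
    static List<String> sortedUnique(List<String> names) {
        return new ArrayList<>(new TreeSet<>(names));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Cy", "Ann", "Bob", "Ann");
        System.out.println(sortedUnique(names)); // prints [Ann, Bob, Cy]
    }
}
```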
