Hash items in a 2d array, but only on one index


So, I have a 2d array (really, a List of Lists) that I need to squish down and remove any duplicates, but only for a specific field.
The basic layout is a list of Matches, with each Match having an ID number and a date. I need to remove all duplicates such that each ID only appears once. If an ID appears multiple times in the List of Matches, then I want to take the Match with the most recent date.
My current solution has me taking the List of Matches, adding it to a HashSet, and then converting that back to an ArrayList. However all that does is remove any exact Match duplicates, which still leaves me with the same ID appearing multiple times if they have different dates.
Set<Match> deDupedMatches = new HashSet<Match>();
deDupedMatches.addAll(originalListOfMatches);
List<Match> finalList = new ArrayList<Match>(deDupedMatches);
If my original data coming in is
{(1, 1-1-1999),(1, 2-2-1999),(1, 1-1-1999),(2, 3-3-2000)}
then what I get back is
{(1, 1-1-1999),(1, 2-2-1999),(2, 3-3-2000)}
But what I am really looking for is a solution that would give me
{(1, 2-2-1999),(2, 3-3-2000)}
I had some vague idea of hashing the original list in the same basic way, but only using the IDs. Basically I would end up with "buckets" based on the ID that I could iterate over, and any bucket that had more than one Match in it I could choose the correct one for. The thing that is hanging me up is the actual hashing. I am just not sure how or if I can get the Matches broken up in the way that I am thinking of.

If I understand your question correctly, you want one entry per distinct ID, keeping the Match with the latest date for each ID.
A Set compares whole Match objects (through equals() and hashCode()), so it has no way of knowing that only the ID field should count.
What I would do to get around this problem is use a HashMap, which links distinct keys to values.
Keys cannot be repeated; values can.
I would do something like this while looping through:
if (map.putIfAbsent(match.getID(), match) != null &&
        map.get(match.getID()).getDate() < match.getDate()) {
    map.replace(match.getID(), match);
}
This runs once for each Match as you loop through your matches.
It puts the current Match in under its ID if that ID doesn't exist in the map yet.
.putIfAbsent returns the previous value, which is null if the ID was not in the map.
That return value also tells you whether there was already an item in the map at that ID (two birds with one stone).
After that it is safe to compare the two dates (the one in the map and the one from the iteration; the < is a placeholder for your own comparison method).
If the new one is later, replace the current Match.
And finally, to get your list, use map.values(), for example new ArrayList<>(map.values()).
This removes duplicate IDs and leaves only the latest Match for each.
Apologies for typos or code errors, this was done on a phone. Please notify me of any errors in the comments.
Java 7 does not have .putIfAbsent and .replace on Map, but they can be substituted with .containsKey and .put.
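For reference, here is a minimal, self-contained sketch of the putIfAbsent/replace approach described above. The Match record, its field names, the use of LocalDate for the date, and the class name LatestMatchPerId are all assumptions made for illustration (the asker's real Match class is not shown), and records require Java 16 or later.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LatestMatchPerId {

    // Hypothetical stand-in for the asker's Match class.
    record Match(int id, LocalDate date) {}

    // Keeps one Match per ID: the one with the most recent date.
    static List<Match> latestPerId(List<Match> matches) {
        // LinkedHashMap keeps IDs in first-seen order; a plain HashMap works too.
        Map<Integer, Match> latest = new LinkedHashMap<>();
        for (Match m : matches) {
            // putIfAbsent returns null if the ID was new, else the Match already stored.
            Match existing = latest.putIfAbsent(m.id(), m);
            if (existing != null && existing.date().isBefore(m.date())) {
                latest.replace(m.id(), m);
            }
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        List<Match> input = List.of(
                new Match(1, LocalDate.of(1999, 1, 1)),
                new Match(1, LocalDate.of(1999, 2, 2)),
                new Match(1, LocalDate.of(1999, 1, 1)),
                new Match(2, LocalDate.of(2000, 3, 3)));
        // Two entries remain: ID 1 with the 1999-02-02 date, and ID 2.
        System.out.println(latestPerId(input));
    }
}
```

If the date is stored in another type, only the isBefore comparison needs to change.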

Related

using one dimension array, how to identify if duplicated or not

allows me to enter the Id numbers without fear of duplication. my program should be able to display the list of ID numbers at the end

You're looking for a HashSet, which is built into Java. This is a data structure into which you can insert elements and check whether a certain element exists, both in constant time on average. So when you read in an element, first check whether the set contains it; if so, skip it, otherwise put it in your list of IDs and also insert it into your HashSet. See the java.util.HashSet documentation for details.
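A small sketch of that idea, assuming the IDs are ints that have already been read into an array (the class and method names are made up for illustration). It relies on Set.add returning false when the element is already present, which folds the "contains" check and the insert into one call.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class UniqueIds {

    // Returns the IDs in input order, skipping any ID that was already seen.
    static List<Integer> withoutDuplicates(int[] enteredIds) {
        Set<Integer> seen = new HashSet<>();
        List<Integer> ids = new ArrayList<>();
        for (int id : enteredIds) {
            // add() returns false when the set already contains the element.
            if (seen.add(id)) {
                ids.add(id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(withoutDuplicates(new int[] {7, 3, 7, 9, 3})); // prints [7, 3, 9]
    }
}
```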

Remove all the strings in an array list containing certain characters

So I have an ArrayList that contains strings such as:
ArrayList<String> list = new ArrayList<>();
list.add("bookshelf");
list.add("bookstore");
list.add("library");
list.add("pencil");
Now I want to search for and remove all the strings in the ArrayList that contain the word "book". As far as I understand, list.remove("book"); will only remove an element that is exactly equal to "book", not the strings that contain the word "book". How can I solve this?

You can use removeIf like this:
list.removeIf(s -> s.contains("book"));

Note: this answer applies to Java 7 and below (it works on higher versions as well, of course, but YCF_L's answer is simpler to implement on Java 8 and above).
The requirement is to iterate over the list, check every element, and remove it if it meets a certain condition.
That puts us in a risky scenario: we modify the list while iterating over it, which is problematic because removing an element changes the list's size and shifts the elements after it.
To work around this problem we can iterate over the list by index from the last element back to the first; this way, removing an element at index n does not affect access to any element at index < n.
I'll leave the implementation details to you in order not to "spoon feed" and destroy your exercise :)
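For readers who want to compare against their own attempt, one possible sketch of the backwards loop follows. The class and method names are hypothetical; only List.get, List.remove(int) and String.contains are used, so it also compiles on Java 7.

```java
import java.util.ArrayList;
import java.util.List;

class RemoveContaining {

    // Removes every string that contains the given fragment, walking from the end.
    static void removeContaining(List<String> list, String fragment) {
        for (int i = list.size() - 1; i >= 0; i--) {
            if (list.get(i).contains(fragment)) {
                // remove(int) only shifts elements at indexes > i,
                // and those have already been visited.
                list.remove(i);
            }
        }
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>();
        list.add("bookshelf");
        list.add("bookstore");
        list.add("library");
        list.add("pencil");
        removeContaining(list, "book");
        System.out.println(list); // prints [library, pencil]
    }
}
```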

1 to 1 association of 2 string lists in java

I am a relatively new programmer and am working on my first project to build a portfolio. In my project I have 2 rather large lists of strings (about 3.1 million) and I need to "associate" the elements in each one with a 1 to 1 relationship from predetermined values (elements are selected according to a set method) not just linearly (from top to bottom). For example:
lista(0) = list1(5);
listb(0) = list2(2);
lista(1) = list1(1);
listb(1) = list2(4);
lista(2) = list1(3);
listb(2) = list2(1);
The point of this is to reorder the lists in a manner that can be recreated at a later time or by a different program by "remembering" a set of values. I am using 2 lists because I need to be able to search one list for a String then pull the value from the corresponding element in the other list.
I have tried many different methods like storing each list in an arrayList then accessing the elements in the preset order and storing them in new arrayLists in the new order, then removing the elements from the old arrayLists. This would be ideal but didn't work because removing elements from a really large arrayList was very slow. I figured that removing an element from the lists will prevent it from being used again.
I tried storing them in String arrays, accessing each element in the predefined order, storing it in another array, and then nulling out the source element so that it won't be used again. But the null gaps made searching a nightmare: whenever the program hit a null element during the predefined "move", I had to add null checks and then more movement, which made things more complicated and harder to reproduce later.
I need an easy, and efficient way to create these associations between these 2 lists and ANY ideas are welcome.
This is my first post to Stack Overflow and I apologize if it's formatted improperly or confusing, but please be gentle.

If you need to pull one value from a given string, why not use a map? The key is the element from the first list and the value is the corresponding element from the second list.

Use a Map<String,String>, which stores both the key and the value as strings. The best part is that with a HashMap, looking up or removing an element takes O(1) time on average.

As mentioned before, Map is an option, more specifically HashMap; another option could be Hashtable. Make sure you look at what each has to offer. One major difference is that HashMap allows null keys and values but is not synchronized, whereas Hashtable is synchronized and accepts neither null keys nor null values.
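A rough sketch of how the map suggestion could look for the two lists in the question. The names PairedLookup, associate, order1 and order2 are invented for illustration, and the sketch assumes that the two index arrays have the same length and that the selected elements of the first list are unique (a repeated key would overwrite the earlier pairing).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PairedLookup {

    // Pairs list1.get(order1[i]) with list2.get(order2[i]) for every i.
    static Map<String, String> associate(List<String> list1, int[] order1,
                                         List<String> list2, int[] order2) {
        Map<String, String> pairs = new HashMap<>();
        for (int i = 0; i < order1.length; i++) {
            pairs.put(list1.get(order1[i]), list2.get(order2[i]));
        }
        return pairs;
    }

    public static void main(String[] args) {
        Map<String, String> pairs = associate(
                List.of("a", "b", "c"), new int[] {2, 0, 1},
                List.of("x", "y", "z"), new int[] {1, 2, 0});
        System.out.println(pairs.get("c")); // prints y
    }
}
```

Nothing has to be removed from the source lists, so the slow ArrayList removals and the null gaps described in the question do not arise. If the pairing order itself must be replayed later, a LinkedHashMap would keep insertion order.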

Efficient Way to Find Index of Object in ArrayList

I have an ArrayList which I fill with objects of type Integer in a serial fashion (i.e. one-by-one) from the end of the ArrayList (i.e. using the method add(object)). Every time I do this, the other objects in the ArrayList are of course left-shifted by one index.
In my code I want to find the index of a random object in the ArrayList. I want to avoid using the indexOf method because I have a very big ArrayList and the looping will take an enormous amount of time. Are there any workarounds? Some idea how to keep in some data structure maybe the indexes of the objects that are in the ArrayList?
EDIT: Apparently my question was not clear, or I had a misunderstanding of the arraylist.add(object) method (which is also very possible!). What I want is something like a sliding window, with objects being inserted at one end of the ArrayList and dropped from the other, and as an object is inserted at one end the others are shifted by one index. I could use arraylist.add(0, object) to insert the objects at the left of the ArrayList, right-shifting the previous objects by one index each time, but after a Google search I found that this is a very processing-intensive operation - O(N) if I remember right. Thus, I thought "ok, let's insert the objects from the right end of the ArrayList, no problem!", assuming that each insertion would still move the previous objects by one index (to the left this time).
Also, when I use the term "index" I simply mean the position of the object in the ArrayList - maybe there is some more formal term "index" which means something different.

You have a couple of options. Here are the two basic options:
You can maintain a Map<Object,Integer> that holds indexes, in parallel to the array. When you append an element to the array you can just add it to the map. When you remove an element from the beginning you will have to iterate through the entire map and subtract one from every index.
If it's appropriate for your situation and the Map does not meet your performance requirements, you could add an index field to your objects and store the index directly when you add it to the array. When you remove an element from the beginning you will have to iterate through all objects in the list and subtract one from their index. Then you can obtain the index in constant time given an object.
These still have the performance hit of updating the indexes after a remove. Now, after you choose one of these options, you can avoid having to iterate through the map / list to update after removal if you make a simple improvement:
Instead of storing the index of each object, store with each object the value of a running counter (the total number of objects added so far) at the moment it is added. Then to get the actual index, simply subtract the counter value of the first object from that of the one you are looking for. E.g. when you add:
add a to end;
a.counter = counter++;
remove first object;
(The initial value of counter when starting the program doesn't really matter.) Then to find an object "x":
index = x.counter - first object.counter;
Whether you store counter as a new field or in a map is up to you. Hope that helps.
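The counter idea can be sketched roughly as follows, here with the counters kept in a map. SlidingWindow and its method names are hypothetical, an ArrayDeque stands in for the list, and the sketch assumes that the elements in the window are distinct (two equal elements would share one map entry).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class SlidingWindow<T> {
    private final Deque<T> items = new ArrayDeque<>();
    private final Map<T, Long> counterOf = new HashMap<>();
    private long nextCounter = 0;   // total number of elements added so far
    private long firstCounter = 0;  // counter value of the current first element

    // Appends at the end and remembers the counter value for this element.
    void addLast(T item) {
        items.addLast(item);
        counterOf.put(item, nextCounter++);
    }

    // Drops the first element; no other stored counter needs updating.
    T removeFirst() {
        T removed = items.removeFirst();
        counterOf.remove(removed);
        firstCounter++;
        return removed;
    }

    // Current position of the item in the window, or -1 if it is not present.
    int indexOf(T item) {
        Long counter = counterOf.get(item);
        return counter == null ? -1 : (int) (counter - firstCounter);
    }
}
```

For example, after adding "a", "b" and "c", indexOf("c") is 2; after one removeFirst() it becomes 1, without touching the stored counter for "c".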
By the way, a linked list will have better performance when removing objects from the front of the list, but worse performance when accessing an object by index. It may be more appropriate depending on your balance of add/remove vs. random access (if you only care about the index but never actually need to retrieve an object by index, random access performance doesn't matter). If you really need to optimize further you could consider using a fixed-capacity ring buffer instead (back inserts, front removes, and random access will all be O(1)).
Of course, option 3 is to reconsider your algorithm at a higher level; perhaps there is a way to accomplish the behavior you are seeking that does not require finding the objects in the list.

storing sets of integers to check if a certain set has already been mentioned

I've come across an interesting problem which I would love to get some input on.
I have a program that generates sets of numbers (based on some predefined conditions). Each set contains up to 6 integers in the range 1 to 100, and the numbers within a set do not have to be unique.
I would like to somehow store every set that is created so that I can quickly check if a certain set with the exact same numbers (order doesn't matter) has previously been generated.
Speed is a priority in this case as there might be up to 100k sets stored before the program stops (maybe more, but most of the time probably less)! Would anyone have any recommendations as to what data structures I should use and how I should approach this problem?
What I have currently is this:
Sort each set before storing it into a HashSet of Strings. The string is simply each number in the sorted set with some separator.
For example, the set {4, 23, 67, 67, 71} would get encoded as the string "4-23-67-67-71" and stored into the HashSet. Then for every new set generated, sort it, encode it and check if it exists in the HashSet.
Thanks!
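The approach described in the question (sort, encode as a string, look up in a HashSet) might look like this minimal sketch. The names SeenSets, keyOf and addIfNew are illustrative, not the asker's actual code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

class SeenSets {
    private final Set<String> seen = new HashSet<>();

    // Order-independent key: sort a copy, then join the numbers with '-'.
    static String keyOf(int[] numbers) {
        int[] sorted = numbers.clone();
        Arrays.sort(sorted);
        return Arrays.stream(sorted)
                .mapToObj(String::valueOf)
                .collect(Collectors.joining("-"));
    }

    // Returns true if this set is new, false if an equal set was stored before.
    boolean addIfNew(int[] numbers) {
        return seen.add(keyOf(numbers));
    }

    public static void main(String[] args) {
        System.out.println(keyOf(new int[] {71, 4, 67, 23, 67})); // prints 4-23-67-67-71
    }
}
```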

if you break it into pieces it seems to me that
creating a set (generate 6 numbers, sort, stringify) runs in O(1)
checking if this string exists in the hashset is O(1)
inserting into the hashset is O(1)
you do this n times, which gives you O(n).
this is already optimal as you have to touch every element once anyways :)
you might run into problems depending on the range of your random numbers.
e.g. assume you generate only numbers between one and one, then there's obviously only one possible outcome ("1-1-1-1-1-1") and you'll have only collisions from there on. however, as long as the number of possible sequences is much larger than the number of elements you generate i don't see a problem.
one tip: if you know the number of generated elements beforehand it would be wise to give the hashset a suitable initial capacity (e.g. new HashSet<String>( 100000 ) ). note that the argument is an initial capacity, not an element count: with the default load factor of 0.75, a capacity of at least n / 0.75 guarantees that no rehashing happens for n elements.
p.s. now with other answers popping up i'd like to note that while there may be room for improvement on a microscopic level (i.e. using language specific tricks), your overall approach can't be improved.

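Create a class SetOfIntegers
Implement a hashCode() method that generates reasonably well-distributed hash values, together with a matching equals(); both must ignore the order of the numbers
Store the instances themselves in a HashSet (or as keys of a HashMap) rather than storing the raw hash values: two different sets can share a hash value, and only equals() can tell them apart
Use contains(instance) to check whether an equal set is already present
This way you avoid formatting your sets as strings (and, if hashCode() and equals() are written to be order-independent, the sorting as well).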
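One way such a class could look is sketched below, as an assumption rather than the answerer's actual code. It defines equals() alongside hashCode() and stores the instances in a HashSet, so that two different sets that happen to share a hash value are not mistaken for duplicates. For brevity this version still sorts a copy of the numbers once in the constructor; it only avoids the string formatting.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class SetOfIntegers {
    private final int[] sorted; // sorted copy, so the original order is irrelevant

    SetOfIntegers(int... numbers) {
        sorted = numbers.clone();
        Arrays.sort(sorted);
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof SetOfIntegers
                && Arrays.equals(sorted, ((SetOfIntegers) other).sorted);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(sorted);
    }

    public static void main(String[] args) {
        Set<SetOfIntegers> seen = new HashSet<>();
        System.out.println(seen.add(new SetOfIntegers(4, 23, 67, 67, 71))); // prints true
        System.out.println(seen.add(new SetOfIntegers(71, 67, 4, 67, 23))); // prints false
    }
}
```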

Just use a java.util.BitSet for each set, adding integers to it with the set(int bitIndex) method. You don't have to sort anything; just check a HashSet (or HashMap) for an already existing equal BitSet before adding a new one. It will be very fast. Don't use sorting of the values and toString for this purpose if speed is important. Note that a BitSet records only whether a number is present, so repeated numbers within a set are not distinguished: {67, 67} and {67} produce the same BitSet.
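A short sketch of the BitSet variant, with hypothetical class and method names. It relies on BitSet implementing content-based equals() and hashCode(), and it assumes that repeated numbers inside one set do not need to be told apart, since setting the same bit twice has no further effect.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

class SeenBitSets {
    private final Set<BitSet> seen = new HashSet<>();

    // One bit per possible value; values are assumed to lie in 1..100.
    static BitSet toBitSet(int[] numbers) {
        BitSet bits = new BitSet(101);
        for (int n : numbers) {
            bits.set(n);
        }
        return bits;
    }

    // Returns true if no equal BitSet was stored before.
    boolean addIfNew(int[] numbers) {
        // The BitSet must not be modified after it has been added to the set.
        return seen.add(toBitSet(numbers));
    }

    public static void main(String[] args) {
        SeenBitSets s = new SeenBitSets();
        System.out.println(s.addIfNew(new int[] {4, 23, 67}));     // prints true
        System.out.println(s.addIfNew(new int[] {67, 4, 23}));     // prints false
        System.out.println(s.addIfNew(new int[] {4, 23, 67, 67})); // prints false
    }
}
```

The last call shows the limitation: the repeated 67 is lost, so this only suits the problem if such sets should count as equal.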
