Comparing existing data entries in Java

I have a HashMap relating Keys to Strings, and I need to compare some of the Strings against each other. However, some of the Strings may or may not be in the HashMap.
Example: Let's say I have 4 Strings that I plan to compare to each other if possible, but only 3 of them end up in the HashMap. How can I compare the Strings that are present without trying to compare them to the String that isn't, and without doing a bunch of nested ifs and elses?
edit: Alohci's solution was easy and fast, and it worked.

Loop through the .values collection of the HashMap
Store the first entry.
Compare each remaining entry with the stored one.
As soon as you find one that doesn't match, throw your error.
If you reach the end of the loop then all the strings match.
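
The steps above can be sketched roughly as follows. The class and method names are made up for illustration, and the method returns false instead of throwing, so the caller can decide which error to raise. Only Strings actually present in the map are compared, so missing entries need no special handling.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper, not part of the original answer.
class AllValuesMatch {
    // Returns true when every value in the map equals the first one.
    // An empty or single-entry map trivially matches.
    static boolean allValuesMatch(Map<String, String> map) {
        Iterator<String> it = map.values().iterator();
        if (!it.hasNext()) {
            return true;
        }
        String first = it.next();                     // store the first entry
        while (it.hasNext()) {
            if (!Objects.equals(first, it.next())) {  // stop at the first mismatch
                return false;
            }
        }
        return true;                                  // reached the end: all match
    }
}
```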

It sounds like you need a reverse mapping, that maps all the values to their set of keys.
Map<Key,Value> forwardMap;
Map<Value, Set<Key>> reverseMap;
You can then see if all of the entries you are looking at are in the set. Make sure that you put the reverse mapping in when you add/remove the forward mapping.
The benefit of this approach is that the test will be O(n), where n is the number of keys you are testing, rather than O(m), where m is the size of the forward map.
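
A minimal sketch of that idea, assuming String keys and non-null String values. The TwoWayMap class and its method names are hypothetical; the point is only that both maps are updated together and that keys missing from the forward map are skipped during the test.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical wrapper that keeps a forward map and a reverse map in sync.
class TwoWayMap {
    private final Map<String, String> forward = new HashMap<>();
    private final Map<String, Set<String>> reverse = new HashMap<>();

    void put(String key, String value) {
        remove(key); // drop any stale reverse entry first
        forward.put(key, value);
        reverse.computeIfAbsent(value, v -> new HashSet<>()).add(key);
    }

    void remove(String key) {
        String old = forward.remove(key);
        if (old != null) {
            Set<String> keys = reverse.get(old);
            keys.remove(key);
            if (keys.isEmpty()) {
                reverse.remove(old);
            }
        }
    }

    // True when every tested key that is present maps to the same value.
    // Keys that are not in the forward map are skipped.
    boolean presentKeysShareValue(Collection<String> keysToTest) {
        Set<String> sameValueKeys = null;
        for (String key : keysToTest) {
            String value = forward.get(key);
            if (value == null) {
                continue;                           // not in the map: skip
            }
            if (sameValueKeys == null) {
                sameValueKeys = reverse.get(value); // keys sharing the first value found
            } else if (!sameValueKeys.contains(key)) {
                return false;
            }
        }
        return true;
    }
}
```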

Related

storing sets of integers to check if a certain set has already been mentioned

I've come across an interesting problem which I would love to get some input on.
I have a program that generates a set of numbers (based on some predefined conditions). Each set contains up to 6 integers ranging from 1 to 100, and the numbers do not have to be unique.
I would like to somehow store every set that is created so that I can quickly check if a certain set with the exact same numbers (order doesn't matter) has previously been generated.
Speed is a priority in this case as there might be up to 100k sets stored before the program stops (maybe more, but most of the time probably less)! Would anyone have any recommendations as to what data structures I should use and how I should approach this problem?
What I have currently is this:
Sort each set before storing it into a HashSet of Strings. The string is simply each number in the sorted set with some separator.
For example, the set {4, 23, 67, 67, 71} would get encoded as the string "4-23-67-67-71" and stored into the HashSet. Then for every new set generated, sort it, encode it and check if it exists in the HashSet.
Thanks!
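
For reference, the approach described in the question could look roughly like this. The SeenSets class and its method names are invented for the sketch; the input array is copied before sorting so the caller's data is left untouched.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper illustrating the sort-then-encode approach.
class SeenSets {
    private final Set<String> seen = new HashSet<>();

    // Sorts a copy of the numbers and joins them with "-".
    static String encode(int[] numbers) {
        int[] sorted = numbers.clone();
        Arrays.sort(sorted);
        return Arrays.stream(sorted)
                .mapToObj(Integer::toString)
                .collect(Collectors.joining("-"));
    }

    // Returns true if this combination is new, false if it was seen before.
    boolean addIfNew(int[] numbers) {
        return seen.add(encode(numbers));
    }
}
```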

if you break it into pieces it seems to me that
creating a set (generate 6 numbers, sort, stringify) runs in O(1)
checking if this string exists in the hashset is O(1)
inserting into the hashset is O(1)
you do this n times, which gives you O(n).
this is already optimal as you have to touch every element once anyways :)
you might run into problems depending on the range of your random numbers.
e.g. assume you generate only numbers between one and one, then there's obviously only one possible outcome ("1-1-1-1-1-1") and you'll have only collisions from there on. however, as long as the number of possible sequences is much larger than the number of elements you generate i don't see a problem.
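one tip: if you know the number of generated elements beforehand it would be wise to size the hashset up front (e.g. new HashSet<String>( 100000 )). note that the constructor argument is an initial capacity rather than an element count, and the set grows once its size exceeds capacity times the load factor (0.75 by default), so a value of roughly n / 0.75 avoids resizing.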
p.s. now with other answers popping up i'd like to note that while there may be room for improvement on a microscopic level (i.e. using language specific tricks), your overall approach can't be improved.

Create a class SetOfIntegers
Implement a hashCode() method that will generate reasonably unique hash values
Use HashMap to store your elements like put(hashValue,instance)
Use containsKey(hashValue) to check if the same hashValue is already present
This way you will avoid sorting and conversion/formatting of your sets.
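
A hedged sketch of such a class follows. It departs from the steps above in two ways, both of which are my assumptions rather than part of the answer: it also implements equals() and is stored in a HashSet, because two different sets can share a hash value and a lookup keyed only on the hash would then report a false match; and it still sorts a copy of the numbers to obtain an order-independent form, so only the string formatting is avoided.

```java
import java.util.Arrays;

// Sketch of the SetOfIntegers class named in the answer; the fields and
// constructor are assumptions made for illustration.
final class SetOfIntegers {
    private final int[] sorted; // canonical, order-independent form

    SetOfIntegers(int... numbers) {
        sorted = numbers.clone();
        Arrays.sort(sorted);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof SetOfIntegers
                && Arrays.equals(sorted, ((SetOfIntegers) o).sorted);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(sorted);
    }
}
```

With this in place, seen.add(new SetOfIntegers(4, 23, 67, 67, 71)) on a HashSet<SetOfIntegers> returns false when an equal combination is already stored.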

Just use a java.util.BitSet for each set, adding integers to the set with the set(int bitIndex) method. You don't have to sort anything. Check a HashMap for an already existing BitSet before adding a new BitSet to it; it will be very fast. Don't ever sort the values and use toString for that purpose if speed is important.
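
A small sketch of the BitSet idea, using a HashSet of BitSets instead of a HashMap for brevity (the class name is hypothetical). One caveat worth stating: a BitSet only records whether a number is present, so repeated numbers collapse, and {4, 23, 67, 67} is treated the same as {4, 23, 67}. If multiplicity matters, as the question's example suggests, this encoding is not sufficient on its own.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; BitSet supplies equals() and hashCode() based on its set bits.
class BitSetSeen {
    private final Set<BitSet> seen = new HashSet<>();

    // Returns true if this combination of distinct numbers is new.
    boolean addIfNew(int[] numbers) {
        BitSet bits = new BitSet(101); // values are assumed to be in 1..100
        for (int n : numbers) {
            bits.set(n);
        }
        return seen.add(bits);
    }
}
```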

How to get common values from two different maps in the fastest possible way?

I have two hash maps. I want to find the values that are common to both maps. One way is to iterate through the first map, get each value, and match it against the values of the other map by iterating over that one as well. But this takes a lot of time. Is there a faster way to find the common values?

firstMap.keySet().retainAll(secondMap.keySet()) does what you want.
I am still not sure that this is really the fastest way. If you control how these two maps are populated, you could perhaps maintain a third map that accumulates the shared keys as the data is added.

I haven't tried this, and I am not sure whether it's going to be faster or not, but you could consider converting the HashMaps to HashSets and then calling Set1.retainAll(Set2)

Obviously a very old post, but if anyone else is looking for an answer where the values specifically have to match up, you can do something like this:
map1.entrySet().retainAll(map2.entrySet());
If you use keySet() you'll retain the common keys, which has its own use, but if you want to match keys AND values, use entrySet().
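
One thing to keep in mind with the calls above: keySet() and entrySet() are views backed by the map, so calling retainAll on them removes entries from the original map. The sketch below copies into a new HashSet first so both maps stay intact. The class and method names are made up for illustration.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helpers that leave both input maps unmodified.
class CommonParts {
    // Values that occur in both maps, regardless of key.
    static <K, V> Set<V> commonValues(Map<K, V> a, Map<K, V> b) {
        Set<V> result = new HashSet<>(a.values());
        result.retainAll(new HashSet<>(b.values())); // hashed copy keeps each contains() check cheap
        return result;
    }

    // Entries whose key AND value match in both maps.
    static <K, V> Set<Map.Entry<K, V>> commonEntries(Map<K, V> a, Map<K, V> b) {
        Set<Map.Entry<K, V>> result = new HashSet<>(a.entrySet());
        result.retainAll(b.entrySet());
        return result;
    }
}
```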

Java - I have a HashSet of Strings, and I want to somehow sort these strings by their length

I really just need some way to find all the strings in the hashSet with the greatest length (whether it be one string or multiple). I figured I should sort the set on string length somehow first and then maybe iterate through it (going from longest strings to shortest so I can stop iteration after I've seen all the strings of greatest length). Can anyone help me figure out how best to go about this (mainly just concerned with finding out how to sort them by length efficiently)? Thanks.

The efficiency gained by storing the strings in a HashSet for fast lookups will not help you when trying to find the longest string. You'll need to update your data structure for that.
One option would be to store two separate data structures - a TreeSet<Set<String>> of sets of strings, where the comparator for the TreeSet just compares the length of the strings, plus the earlier HashSet. You could insert a string into this hybrid data structure efficiently by just updating the appropriate set in the TreeSet to contain the new string and inserting the string into the HashSet like before. It would also let you efficiently find all the largest strings simply by querying the TreeSet for its largest element.
Hope this helps!
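
A rough sketch of such a hybrid structure. As a variation on the TreeSet of sets described above, it uses a TreeMap keyed by string length, which I find simpler to express; the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical hybrid structure: a HashSet for lookups plus a length index.
class LengthIndexedStrings {
    private final Set<String> all = new HashSet<>();
    private final TreeMap<Integer, Set<String>> byLength = new TreeMap<>();

    // Adds the string to both structures; returns false if it was already present.
    boolean add(String s) {
        if (!all.add(s)) {
            return false;
        }
        byLength.computeIfAbsent(s.length(), len -> new HashSet<>()).add(s);
        return true;
    }

    boolean contains(String s) {
        return all.contains(s);
    }

    // All strings of the greatest length, or an empty set if nothing was added.
    Set<String> longest() {
        if (byLength.isEmpty()) {
            return Collections.<String>emptySet();
        }
        return Collections.unmodifiableSet(byLength.lastEntry().getValue());
    }
}
```

If the longest strings are needed only once, a single pass over the HashSet that tracks the maximum length seen so far would also do, without any extra structure.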

Linked list in hash tables

Suppose we wish to repeatedly search a linked list of length N elements, each of which contains a very long string key. How might we take advantage of the hash value when searching the list for an element with a given key?

Insert the keys into a hash table. Then you can search in (theoretically) constant time.

You need to have the hashes prepared before searching the list, and you need to be able to access the hash of a string in constant time. Then you can first compare the hashes and only compare the strings when the hashes match. You can also use a hashtable instead of a linked list.
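
A minimal sketch of the hash-first comparison, assuming each list node stores its key together with a precomputed hash. The HashedList and Node names are invented for illustration. Java's String already caches its hash code after the first call, so the stored field mainly makes the idea explicit.

```java
import java.util.LinkedList;
import java.util.List;

// Hypothetical linked list whose nodes carry a precomputed hash of the key.
class HashedList {
    static final class Node {
        final String key;
        final int hash;

        Node(String key) {
            this.key = key;
            this.hash = key.hashCode(); // computed once, when the node is created
        }
    }

    private final List<Node> nodes = new LinkedList<>();

    void add(String key) {
        nodes.add(new Node(key));
    }

    // Returns the position of the key in the list, or -1 if it is absent.
    int indexOf(String target) {
        int targetHash = target.hashCode(); // computed once per search
        int i = 0;
        for (Node n : nodes) {
            // Cheap int comparison first; full string comparison only on a hash match.
            if (n.hash == targetHash && n.key.equals(target)) {
                return i;
            }
            i++;
        }
        return -1;
    }
}
```

The search is still linear in the number of nodes; what it saves is the cost of comparing long strings character by character for most non-matching nodes.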

The hash value of a String (hashCode) is a bit like an id for the string. Not completely unique, but usually pretty unique. You can use a HashMap to store the String keys and their values. HashMap, as its name suggests, uses the Strings' hash values to efficiently store and retrieve values.

Not sure what constraints you're working under here (i.e. this might take way too much memory depending on how big the strings are), but if you have to maintain the linked list you could create a HashMap that maps the strings to their position in the list which would allow you to retrieve any string from the list with 2 constant time operations.

Put them in a HashSet. Lookups will then use the hash of each String that was inserted.

Efficient way to subtract values in two HashMaps by key

I am wondering how to efficiently subtract the values of two maps when their keys match. Currently I have 2 HashMap<String,Integer> and do it like this:
for (String key : map1.keySet()) {
    if (map2.containsKey(key)) {
        // subtract
    }
}
Is there a better way to do it?

Theoretically speaking, this is about as fast as it can be done, unless you can somehow find the matching keys between the two HashMaps in less than O(n) time.
Iterate over keys in first map's keySet() - O(n)
See if key is in other map - O(1)
Do your operation - O(1)
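
The three steps can be written as a single loop. This is only a sketch: the class and method names are made up, the result goes into a new map, and values are assumed to be non-null. Iterating over entrySet() and calling get() once per key avoids a separate contains check.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; assumes neither map contains null values.
class MapSubtract {
    // For every key present in both maps, stores map1's value minus map2's value.
    static Map<String, Integer> subtractMatching(Map<String, Integer> map1,
                                                 Map<String, Integer> map2) {
        Map<String, Integer> result = new HashMap<>();
        for (Map.Entry<String, Integer> e : map1.entrySet()) { // O(n) iteration
            Integer other = map2.get(e.getKey());               // O(1) lookup
            if (other != null) {
                result.put(e.getKey(), e.getValue() - other);   // O(1) operation
            }
        }
        return result;
    }
}
```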

I realise this is an old thread, but do check out Guava from Google:
https://code.google.com/p/guava-libraries/wiki/CollectionUtilitiesExplained#Maps
You can use Maps.difference and then get the entries in common, the entries only on the left, only on the right, and so on.

I think there isn't a better method unless you use a different approach, and/or different data structures. You can for example create a class named ValuePair that can contain (up to) two values, which represent the values you are currently storing in two different maps, but you instead store all the pairs in a single map, and when it comes to "subtract" you can iterate in a single set of keys. Please note that a pair can be incomplete, so that no subtraction is done.
But that's probably overkill.

Have you considered using Apache Commons Collections?
CollectionUtils.subtract( collection1, collection2 );
