Java sorting collection/api - java

I am wondering what API or collection would be best for using a Scanner to read through a document, count the number of times each word appears, and create an alphabetical list of the words, with a sublist for each word showing how many times it is followed by each other word.
This is for a class, so please just point me in the right direction as I am completely new to Java and packages, but I don't want any actual coding tips, thank you.

I imagine you could do something like that with Map<String, Map<String, Integer>>. Essentially, each word is associated with a map that contains all the words that follow it along with their frequency (i.e., the number of times they appear). So what you'd have is:
Map<String, Map<String, Integer>> frequencyTable = new HashMap<String, Map<String, Integer>>();
For sorting, you could create a class that holds a word and its frequency. Then you can use a TreeSet with a comparator (or implement compareTo on your class) to enforce ordering. Then your map would look like this:
Map<String, TreeSet<Frequency>> frequencyTable = new HashMap<String, TreeSet<Frequency>>();
Here Frequency is the class that holds a string and the number of times it appears. The only difficulty is looking up a word each time you need to update its frequency, because you will have to iterate over the set.
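The nested-map idea above could be sketched roughly as follows. This is only an illustration, not a definitive solution: the class name FollowerCounts and the method count are made up for the example, words are assumed to be whitespace-separated and are lower-cased, and TreeMap is swapped in for HashMap at both levels so that the words and their followers come out alphabetically.

```java
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

// Hypothetical helper, named for this example only.
class FollowerCounts {
    // Builds word -> (following word -> count); TreeMap keeps both levels alphabetical.
    static Map<String, Map<String, Integer>> count(String text) {
        Map<String, Map<String, Integer>> table = new TreeMap<>();
        String previous = null;
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                // assumption: tokens are split on whitespace and case is ignored
                String word = scanner.next().toLowerCase();
                // make sure every word has an entry, even if nothing follows it
                table.computeIfAbsent(word, k -> new TreeMap<>());
                if (previous != null) {
                    // bump the count of "word follows previous"
                    table.get(previous).merge(word, 1, Integer::sum);
                }
                previous = word;
            }
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(count("the cat saw the dog and the cat"));
    }
}
```

Because the inner structure is a map rather than a TreeSet of Frequency objects, updating a count is a direct lookup and no iteration over a set is needed.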

If it's about sorting, here are a few options:
Use Collections.sort(List<T> list) along with the Comparable interface if you only want to sort one way.
Use Collections.sort(List<T> list, Comparator<? super T> c) along with the Comparator interface to sort in more than one way.
If uniqueness is important, you can also use a TreeSet with a comparator.
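As a rough sketch of the first two options: the WordCount class, its fields, and the sortedWords helper below are invented for illustration. The natural order (Comparable) is alphabetical, and a separate Comparator gives a second ordering by count.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical value class for this example.
class WordCount implements Comparable<WordCount> {
    final String word;
    final int count;

    WordCount(String word, int count) {
        this.word = word;
        this.count = count;
    }

    // natural order: alphabetical by word
    @Override
    public int compareTo(WordCount other) {
        return word.compareTo(other.word);
    }

    // alternative order: highest count first
    static final Comparator<WordCount> BY_COUNT_DESC = new Comparator<WordCount>() {
        @Override
        public int compare(WordCount a, WordCount b) {
            return Integer.compare(b.count, a.count);
        }
    };

    // Sorts a copy of the list and returns just the words, in sorted order.
    static List<String> sortedWords(List<WordCount> items, Comparator<WordCount> order) {
        List<WordCount> copy = new ArrayList<>(items);
        if (order == null) {
            Collections.sort(copy);        // uses compareTo
        } else {
            Collections.sort(copy, order); // uses the supplied comparator
        }
        List<String> words = new ArrayList<>();
        for (WordCount wc : copy) {
            words.add(wc.word);
        }
        return words;
    }
}
```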

Related

Creating Dictionary in java?

Everywhere on the net, this is the way shown:
Map<String, String> map = new HashMap<String, String>();
map.put("dog", "type of animal");
System.out.println(map.get("dog"));
My point is: should it not be a TreeMap, considering a dictionary has to be sorted? Agreed, lookup won't be as fast with a TreeMap, but considering the sorting it seems the best data structure.
UPDATE: one more requirement is to return the lexicographically nearest word if the searched word is not present. I am not sure how to achieve that.
If you need the map sorted by its keys, then use TreeMap, which "...provides guaranteed log(n) time cost for the containsKey, get, put and remove operations." If not, use the more general HashMap (which "...provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets..."), or one of the other Map implementations, depending on your need.
If you want to get the value for a given key and an exact match in the HashMap is unlikely, then a HashMap won't give you the benefit of direct lookup.
With a TreeMap you can get the list of keys, which is already ordered, and perform a binary search on it, comparing keys lexicographically. Continue the binary search until the lexicographic distance between the two keys is minimal or 0.
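Instead of a hand-written binary search, TreeMap's own floorKey and ceilingKey methods return the neighbouring keys directly. The sketch below assumes that "nearest" simply means the closest neighbour in sorted order, preferring the lower one; the NearestWord class and that tie-breaking rule are assumptions made for the example, and a real distance measure could be substituted.

```java
import java.util.TreeMap;

// Hypothetical helper, named for this example only.
class NearestWord {
    // Returns the word itself if present, otherwise a neighbouring key in sort order.
    static String nearest(TreeMap<String, String> dictionary, String word) {
        // floorKey: greatest key <= word (the word itself on an exact match)
        String lower = dictionary.floorKey(word);
        if (lower != null) {
            return lower;
        }
        // nothing at or below the word: fall back to the least key >= word
        // (null if the dictionary is empty)
        return dictionary.ceilingKey(word);
    }
}
```

Both lookups run in O(log n), the same cost the TreeMap documentation quotes for get and containsKey.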
Dictionary is no longer a term used in the language. You'll get multiple answers.
I know that Objective-C uses a class called NSDictionary that is a key/value data structure. The fact that it's named Dictionary leads me to believe the name describes the ordering of the objects; I imagine the key has to be a string or char.
So, it depends on the entire question.
When someone says they want to create a Key/Value data structure that is ordered alphabetically, or a "Dictionary", the answer is:
TreeMap<String, Object> map = new TreeMap<>();
If someone is asking how to create a Key/Value object similar to a Dictionary in whatever language, they will likely get any of the java.util classes that implement the Map<K, V> interface, for example HashMap, TreeMap. A good answer would be a TreeMap.
In this case telling someone to use a HashMap is not debatable, because the answer is as vague as the question.

How to sort map inside of List in java

List<Map<String,String>> consolidErr = new LinkedList<Map<String,String>>();
Map<String,String> m1 = new HashMap<String,String>();
m1.put("id","1");
m1.put("value","value1");
Map<String,String> m2 = new HashMap<String,String>();
m2.put("id","2");
m2.put("value","value2");
Map<String,String> m3 = new HashMap<String,String>();
m3.put("id","3");
m3.put("value","value3");
// add m1, m3 and m2 to the list, deliberately out of order
consolidErr.add(m1);
consolidErr.add(m3);
consolidErr.add(m2);
I want to sort the maps by their ids, so that the list ends up ordered m1, m2, m3. At the moment I iterate over the list, keep the id of the first map as a checker and compare it with the next one (a bubble sort). It works, but is there a better way using built-in methods? Please share your ideas.
The simplest way to do this in java (or at least, with the least mess) is to use a custom comparator.
The idea is that if you have objects with a natural sort order (anything that implements Comparable) you can just ask for the sorting, e.g.
Collections.sort(List<Integer> ..
otherwise you can just pass in a Comparator that describes how you want objects compared, with any custom logic you want, e.g. (roughly - this is off the top of my head and doesn't have error checking, but should be enough to give you the idea) -
List<Map<String,String>> consolidErr = ...
Collections.sort(consolidErr, new Comparator<Map<String,String>>() {
    public int compare(Map<String,String> a, Map<String,String> b) {
        return a.get("id").compareTo(b.get("id"));
    }
});
In Java 8, we can sort the list of maps in a single line.
list.sort(Comparator.comparing((Map<String,String> mp) -> mp.get("id")));
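One thing to watch with the comparators above is that the ids are strings, so "10" sorts before "2". A minimal sketch that compares the ids numerically instead is given below; the SortMapsById class and the row helper are made up for the example, and it assumes every map has an "id" entry that parses as an int.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, named for this example only.
class SortMapsById {
    // Convenience factory for a map with "id" and "value" entries.
    static Map<String, String> row(String id, String value) {
        Map<String, String> m = new HashMap<>();
        m.put("id", id);
        m.put("value", value);
        return m;
    }

    // Sorts in place by numeric id, so "10" lands after "9".
    // Assumption: every map contains an "id" that Integer.parseInt accepts.
    static void sortById(List<Map<String, String>> rows) {
        rows.sort(Comparator.comparingInt(
                (Map<String, String> m) -> Integer.parseInt(m.get("id"))));
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row("1", "value1"));
        rows.add(row("10", "value10"));
        rows.add(row("2", "value2"));
        sortById(rows);
        System.out.println(rows);
    }
}
```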
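I would instead use a PriorityQueue in place of your list. If you provide the Comparator to the constructor when creating it, the queue always hands back its smallest element first when you poll it, even after new elements are inserted. Note that iterating over a PriorityQueue does not visit the elements in sorted order; only poll() and peek() respect the ordering.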

comparing hashmap data

I was wondering if it is possible to compare items in multiple hashMaps to each other:
HashMap<String,String> valueMap = new HashMap<String, String>();
HashMap<String,Integer> formulaMap = new HashMap<String, Integer>();
What I would basically like to do is something like:
if(the second string in valueMap is the same as the first string in formulaMap){
}
Is there a short way to achieve this or do I have to compare the strings before they are included into the hashMaps. My Integer at this stage of the program is required to take a null value. I can achieve my goals with a multi-dimensional array, but a solution like this would be more elegant and less time consuming.
By using a LinkedHashMap you can have a map that respects the insertion order of its entries. All you have to do is iterate over the entrySet of the map until you reach the position you're looking for.
Plus: if you also need ordering, you can have a look at TreeMap, which keeps its elements ordered by a criterion defined by you (you can pass a Comparator as a parameter to the map's constructor).
This order applies to the keys of the map though, so if you need value ordering you're going to have to come up with a slightly more complex solution (such as sorting the entry set directly and adding the values to another map, for example).

Comparing TreeMap contents gives incorrect answer

I use a TreeMap as a 'key' inside another TreeMap
ie
TreeMap<TreeMap<String, String>, Object>
In my code 'Object' is a personal construct, but for this instance I have used a String.
I have created a pair of TreeMaps to test the TreeMap compareTo() and hashCode() methods. This starts with the following...
public class TreeMapTest {
public void testTreeMap()
{
TreeMap<String, String> first = new TreeMap<String, String>();
TreeMap<String, String> second = new TreeMap<String, String>();
first.put("one", "une");
first.put("two", "deux");
first.put("three", "trois");
second.put("une", "one");
second.put("deux", "two");
second.put("trois", "three");
TreeMap<TreeMap<String, String>, String> english = new TreeMap<TreeMap<String, String>, String>();
TreeMap<TreeMap<String, String>, String> french = new TreeMap<TreeMap<String, String>, String>();
english.put(first, "english");
french.put(second, "french");
From here I now call the english item to see if it contains the key
if (english.containsKey(second))
{
System.out.println("english contains the key");
//throws a ClassCastException: java.util.TreeMap cannot be cast to
//java.lang.Comparable. Reading the docs suggests this is what happens if the key is
//not of a supported type.
//This error does not occur if I use a HashMap structure for all maps; why is
//this key type supported for one map structure but not another?
}
However, I should note that both HashMap and TreeMap point to the same hashCode() method in the AbstractMap parent.
My first thought was to convert my TreeMap to a HashMap, but this seemed a bit sloppy! So I decided to apply the hashCode() method to the 2 TreeMap objects.
int hc1 = first.hashCode();
int hc2 = second.hashCode();
if(hc1 == hc2)
{
System.out.println("values are equal " + hc1 + " " + hc2);
}
prints the following
values are equal 3877431 & 3877431
For me the hashcode should be different as the key values are different, I can't find details on the implementation difference of the hashCode() method between HashMap and TreeMap.
Please note the following.
Changing only the keys to HashMap doesn't stop the ClassCastException. Changing all the maps to HashMap does. So there is something in the containsKey() method of TreeMap that isn't working properly, or I have misunderstood. Can anyone explain what?
The section where I get the hashCode of the first and second map objects always produces the same output (no matter whether I use a HashMap or a TreeMap here). However, the if(english.containsKey(second)) doesn't print any message when HashMaps are used, so there is obviously something in the HashMap implementation that is different for the compareTo() method.
My principal questions are:
Where can I find details of the types of keys that can be used in TreeMap objects (to prevent future ClassCastException errors)?
If I can't use a certain type of object as a key, why am I allowed to insert it as a key into the TreeMap in the first place? (Surely if I can insert it I should be able to check whether the key exists?)
Can anyone suggest another construct that has ordered insert / retrieval to replace my TreeMap key objects?
Or have I potentially found strange behaviour? From my understanding I should be able to do a drop-in replacement of TreeMap for HashMap, or have I stumbled upon a fringe scenario?
Thanks in advance for your comments.
David.
ps. The problem isn't a problem in my code, as I use a personal utility to create a hash that depends on the key and value pairs (i.e. I calculate key hash values differently from value hash values... sorry if that is a confusing sentence!). I assume that the hashCode method just sums all the values together without considering whether an item is a key or a value.
pps. I'm not sure if this is a good question or not, any pointers on how to improve it?
Edit.
from the responses people seem to think I'm doing some sort of fancy language dictionary stuff, not a surprise from my example, so sorry for that. I used this as an example as it came easily to my brain, was quick to write and demonstrated my question.
The real problem is as follows.
I'm accessing a legacy DB structure, and it doesn't talk nicely to anything (result sets aren't forward and reverse readable etc). So I grab the data and create objects from them.
The smallest object represents a single row in a table (this is the object for which, in the above example, I have used the string value 'english' or 'french').
I have a collection of these rowObjects; each row has an obvious key (this is the TreeMap that points to the related rowObject).
I don't know if that makes things any clearer!
Edit 2.
I feel I need to elaborate a little further on my choice of originally using
HashMap<HashMap<String,String>, dataObject>
for my data structure, then converting to TreeMap to gain an ordered view.
In edit 1 I said that the legacy DB doesn't play nicely (this is an issue with the JDBC.ODBC I suspect, and I'm not about to acquire a JDBC to communicate with the DB). The truth is I apply some modifications to the data as I create my Java 'dataObject'. This means that although the DB may spit out the results in ascending or descending order, I have no way of knowing what order they are inserted into my dataObject. Using a LinkedHashMap seems like a nice solution (see duffymo's suggestion), but I later need to extract the data in an ordered fashion, not just consecutively (LinkedHashMap only preserves insertion order), and I'm not inclined to mess around with ordering everything and making copies when I need to insert a new item in between 2 others. TreeMap would do this for me... but if I create a specific object for the key it will simply contain a TreeMap as a member, and obviously I will then need to supply a compareTo and hashCode method. So why not just extend TreeMap (although duffymo has a point about throwing that solution out)!
This is not a good idea. Map keys must be immutable to work properly, and yours are not.
What are you really trying to do? When I see people doing things like this with data structures, it makes me think that they really need an object but have forgotten that Java's an object-oriented language.
Looks like you want a crude dictionary to translate between languages. I'd create a LanguageLookup class that embedded those Maps and provide some methods to make it easier for users to interact with it. Better abstraction and encapsulation, more information hiding. Those should be your design objectives. Think about how to add other languages besides English and French so you can use it in other contexts.
public class LanguageLookup {
private Map<String, String> dictionary;
public LanguageLookup(Map<String, String> words) {
this.dictionary = ((words == null) ? new HashMap<String, String>() : new HashMap<String, String>(words));
}
public String lookup(String from) {
return this.dictionary.get(from);
}
public boolean hasWord(String word) {
return this.dictionary.containsKey(word);
}
}
In your case, it looks like you want to translate an English word to French and then see if the French dictionary contains that word:
Map<String, String> englishToFrenchWords = new HashMap<String, String>();
englishToFrenchWords.put("one", "une");
Map<String, String> frenchToEnglishWords = new HashMap<String, String>();
frenchToEnglishWords.put("une", "one");
LanguageLookup englishToFrench = new LanguageLookup(englishToFrenchWords);
LanguageLookup frenchToEnglish = new LanguageLookup(frenchToEnglishWords);
String french = englishToFrench.lookup("one");
boolean hasUne = frenchToEnglish.hasWord(french);
Your TreeMap is not Comparable, so you can't add it to a SortedMap, and it's not immutable, so you shouldn't use it as a key in a HashMap. You could use an IdentityHashMap, but I suspect an EnumMap is a better choice.
enum Language { ENGLISH, FRENCH }
Map<Language, Map<Language, Map<String, String>>> dictionaries =
new EnumMap<>(Language.class);
Map<Language, Map<String, String>> fromEnglishMap = new EnumMap<>(Language.class);
dictionaries.put(Language.ENGLISH, fromEnglishMap);
fromEnglishMap.put(Language.FRENCH, first);
Map<Language, Map<String, String>> fromFrenchMap = new EnumMap<>(Language.class);
dictionaries.put(Language.FRENCH, fromFrenchMap);
fromFrenchMap.put(Language.ENGLISH, second);
Map<String, String> fromEnglishToFrench= dictionaries.get(Language.ENGLISH)
.get(Language.FRENCH);
On the question of why HashMap works and TreeMap does not:
A TreeMap is a "sorted map", meaning that the entries are sorted according to the key. This means that the keys must be comparable, either by implementing the Comparable interface or by giving the TreeMap a Comparator when it is constructed. Maps usually do NOT implement Comparable, and I would highly suggest you do not create a custom type to add this feature. As duffymo mentions, using maps as keys is a BAD idea.
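The behaviour described in the question can be reproduced with a small sketch. The MapKeyDemo class and its method names are invented for illustration, and the toString-based comparator is only there to show that a Comparator removes the need for Comparable keys; it is not a recommendation to use maps as keys. The last method shows why the two hash codes in the question match: Map.hashCode() is specified as the sum of the entry hash codes, and each entry's hash is key.hashCode() XOR value.hashCode(), so swapping keys and values leaves the result unchanged.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical demo class, named for this example only.
class MapKeyDemo {
    // Builds a TreeMap from alternating key/value strings.
    static TreeMap<String, String> map(String... kv) {
        TreeMap<String, String> m = new TreeMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    // True if using a TreeMap as a key of a comparator-less TreeMap fails.
    static boolean naturalOrderingFails() {
        try {
            TreeMap<TreeMap<String, String>, String> outer = new TreeMap<>();
            outer.put(map("one", "une"), "english");
            outer.containsKey(map("une", "one"));
            return false;
        } catch (ClassCastException e) {
            // TreeMap tried to cast the key to Comparable
            return true;
        }
    }

    // With an explicit comparator the outer map no longer needs Comparable keys.
    // Assumption: ordering by toString() is purely illustrative.
    static boolean comparatorWorks() {
        Comparator<TreeMap<String, String>> byText =
                Comparator.comparing((TreeMap<String, String> m) -> m.toString());
        TreeMap<TreeMap<String, String>, String> outer = new TreeMap<>(byText);
        outer.put(map("one", "une"), "english");
        return outer.containsKey(map("one", "une"))
                && !outer.containsKey(map("une", "one"));
    }

    // Swapping keys and values does not change Map.hashCode().
    static boolean swappedMapsShareHashCode() {
        return map("one", "une", "two", "deux").hashCode()
                == map("une", "one", "deux", "two").hashCode();
    }
}
```

HashMap never compares keys for ordering; it only uses hashCode() and equals(), which is why the same key type is accepted there.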

Finding the highest-n values in a Map

I have a large map of String->Integer and I want to find the highest 5 values in the map. My current approach involves translating the map into an array list of pair(key, value) object and then sorting using Collections.sort() before taking the first 5. It is possible for a key to have its value updated during the course of operation.
I think this approach is acceptable single threaded, but if I had multiple threads all triggering the transpose and sort frequently it doesn't seem very efficient. The alternative seems to be to maintain a separate list of the highest 5 entries and keep it updated when relevant operations on the map take place.
Could I have some suggestions/alternatives on optimizing this please? Am happy to consider different data structures if there is benefit.
Thanks!
Well, you can find the highest 5 values in a Map in O(n) time with a single pass, whereas sorting everything costs O(n log n).
The easiest way is to simply loop over the entry set of the Map.
for (Entry<String, Integer> entry: map.entrySet()) {
if (entry.getValue() > smallestMaxSoFar)
updateListOfMaximums();
}
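One possible way to fill in the loop above is to keep the current maximums in a small min-heap, so that the smallest of them is always on top and can be replaced cheaply. This is a sketch under assumptions: the TopN class and largest method are names chosen for the example, and the map is assumed to contain no null values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, named for this example only.
class TopN {
    // Returns the n largest values, highest first, in O(size * log n) time.
    static List<Integer> largest(Map<String, Integer> map, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // min-heap: the smallest of the values kept so far sits on top
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            if (heap.size() < n) {
                heap.add(entry.getValue());
            } else if (entry.getValue() > heap.peek()) {
                // new value beats the smallest kept maximum: swap it in
                heap.poll();
                heap.add(entry.getValue());
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        Collections.sort(result, Collections.reverseOrder());
        return result;
    }
}
```

For a fixed n such as 5 the heap operations are effectively constant time, so the whole pass is linear in the size of the map.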
You could use two Maps:
// Maps name to value
Map<String, Integer> byName;
// Maps value to names
NavigableMap<Integer, Collection<String>> byValue;
and make sure to always keep them in sync (possibly wrap both in another class which is responsible for put, get, etc). For the highest values use byValue.navigableKeySet().descendingIterator().
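A wrapper class along those lines might look like the sketch below. The class name ValueIndexedMap and its methods are hypothetical, synchronized methods are just one simple way to keep the two maps consistent across threads, and highest returns distinct values rather than one value per key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical wrapper, named for this example only.
class ValueIndexedMap {
    private final Map<String, Integer> byName = new HashMap<>();
    private final NavigableMap<Integer, Set<String>> byValue = new TreeMap<>();

    // Inserts or updates a name, keeping both maps in sync.
    synchronized void put(String name, int value) {
        Integer old = byName.put(name, value);
        if (old != null) {
            // remove the name from its previous value bucket
            Set<String> names = byValue.get(old);
            names.remove(name);
            if (names.isEmpty()) {
                byValue.remove(old);
            }
        }
        byValue.computeIfAbsent(value, k -> new HashSet<>()).add(name);
    }

    // Returns up to n distinct values, highest first.
    synchronized List<Integer> highest(int n) {
        List<Integer> result = new ArrayList<>();
        for (Integer v : byValue.descendingKeySet()) {
            if (result.size() >= n) {
                break;
            }
            result.add(v);
        }
        return result;
    }
}
```

Each update costs O(log n) instead of a full sort, which fits the case where values change often and the top entries are read frequently.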
I think this approach is acceptable single threaded, but if I had multiple threads all triggering the transpose and sort frequently it doesn't seem very efficient. The alternative seems to be to maintain a separate list of the highest 5 entries and keep it updated when relevant operations on the map take place.
There is an approach in between that you can take as well. When a thread requests a "sorted view" of the map, create a copy of the map and then handle the sorting on that.
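public List<Integer> getMaxFive() {
    Map<String, Integer> copy;
    synchronized (lockObject) {
        // copy under the lock so writers cannot modify the map mid-copy
        copy = new HashMap<String, Integer>(originalMap);
    }
    // sort the copied values outside the lock, highest first
    List<Integer> list = new ArrayList<Integer>(copy.values());
    Collections.sort(list, Collections.reverseOrder());
    return list.subList(0, Math.min(5, list.size()));
}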
Ideally if you have some state (such as this map) accessed by multiple threads, you are encapsulating the state behind some other class so that each thread is not updating the map directly.
I would create a method like:
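private static int[] getMaxFromMap(Map<String, Integer> map, int qty) {
    // work on a copy so the caller's map is not emptied as a side effect
    List<Integer> values = new ArrayList<Integer>(map.values());
    int[] max = new int[qty];
    for (int a = 0; a < qty && !values.isEmpty(); a++) {
        max[a] = Collections.max(values);
        // drop every occurrence of this maximum, so the results are distinct values
        values.removeAll(Collections.singleton(max[a]));
    }
    return max;
}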
Taking advantage of Collections.max() and Collections.singleton()
There are two ways of doing that easily:
Put the map into a heap structure and retrieve the n elements you want from it.
Iterate through the map and update a list of the n highest values using each entry.
If you want to retrieve an unknown or a large number of highest values, the first method is the way to go. If you have a fixed, small number of values to retrieve, the second might be easier to understand for some programmers.
Personally, I prefer the first method.
Please try another data structure. Suppose there's a class named MyClass whose attributes are key (String) and value (int). MyClass, of course, needs to implement the Comparable interface. Another approach is to create a class named MyClassComparator which implements Comparator.
The compareTo method (or compare, if you go the Comparator route) should be defined like this:
compareTo(parameters){
return Integer.compare(value2, value1); // descending, and safe from int overflow
}
The rest is easy. Putting the objects in a List and invoking the Collections.sort(parameters) method will do the sorting part.
I don't know offhand what sorting algorithm Collections.sort(parameters) uses (the documentation describes a stable merge sort). But if you expect data to keep arriving over time, you may want an insertion sort, since it works well on nearly sorted data and is an online algorithm.
If modifications are rare, I'd implement some SortedByValHashMap<K,V> extends HashMap<K,V> (similar to LinkedHashMap) that keeps the entries ordered by value.
