Comparator is removing duplicate objects from treemap - java

I'm working on a highscore system that reads from a file line-by-line and adds all lines into a treemap, sorts the treemap and adds the scores and names into a new file, highest score being at the top.
I've gotten the system close, but for some unknown reason the code is removing duplicate entries. For example, I have 3 scores:
1 : Sander
1 : Sander
2 : Mark
Printing my treemap would look like this:
I would like the code to show Sander twice.
I've been stuck for quite some time and would appreciate some help. Here is my code:
public void sortScores() throws IOException {
File input = new File("scores.txt");
File output = new File("outputscores.txt");
FileInputStream fis = new FileInputStream(input);
FileOutputStream fos = new FileOutputStream(output);
BufferedReader in = new BufferedReader(new InputStreamReader(fis));
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(fos));
String aLine;
while ((aLine = in.readLine()) != null) {
String[] scoreAndName = aLine.split(" : ");
int score1 = Integer.parseInt(scoreAndName[0]);
String name1 = scoreAndName[1];
unsortMap1.put(score1, name1);
}
Map<Integer, String> treeMap = new TreeMap<Integer, String>(
new Comparator<Integer>() {
@Override
public int compare(Integer o1, Integer o2) {
if (o1 >= o2) {
return -1;
} else {
return 1;
}
}
});
treeMap.putAll(unsortMap1);
System.out.println(treeMap);
}

That's exactly the way Map is supposed to work. From documentation:
An object that maps keys to values. A map cannot contain duplicate keys; each key can map to at most one value.
put(key, value) replaces the old value with the new one when the key is already present.
While you can make it work using Map<Integer, List<String>> (for example), that's cumbersome and doesn't look clean. List is a better data structure for this task.
I would define a class Score, use it to populate the list, and sort it with a custom comparator. You could also make Score implement Comparable instead of using a Comparator.
Here's an example that uses Comparator.comparingInt().
List<Score> scores = new ArrayList<>();
scores.add(new Score(1, "Name")); //adding an element
scores.sort(Comparator.comparingInt(Score::getScore)); //sorting
class Score {
private final int score;
private final String name;
Score(int score, String name) {
this.score = score;
this.name = name;
}
public int getScore() {
return score;
}
public String getName() {
return name;
}
}
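To tie this back to the question, here is a minimal sketch of the list-based approach applied to lines in the "score : name" format. The class name ScoreSortSketch and the helper sortLines are made up for illustration, and file I/O is left out so the example stays self-contained. Because List.sort is stable, entries with equal scores are all kept, in their input order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ScoreSortSketch {
    // Same shape as the Score class above, nested so the sketch compiles on its own.
    static final class Score {
        private final int score;
        private final String name;

        Score(int score, String name) {
            this.score = score;
            this.name = name;
        }

        int getScore() { return score; }
        String getName() { return name; }
    }

    // Hypothetical helper: parses "score : name" lines and returns them highest score first.
    static List<String> sortLines(List<String> lines) {
        List<Score> scores = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split(" : ", 2);
            scores.add(new Score(Integer.parseInt(parts[0].trim()), parts[1]));
        }
        // A list keeps duplicates; reversed() puts the highest score at the top.
        scores.sort(Comparator.comparingInt(Score::getScore).reversed());
        List<String> out = new ArrayList<>();
        for (Score s : scores) {
            out.add(s.getScore() + " : " + s.getName());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        input.add("1 : Sander");
        input.add("1 : Sander");
        input.add("2 : Mark");
        sortLines(input).forEach(System.out::println);
    }
}
```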

You have chosen the wrong collection type.
A Map<Integer, String> will store only one String value for any possible Integer. So given a score (e.g. 1) it can store a single name. Not the same name multiple times.
It appears that you are trying to sort records representing scores. To do that, you are going to need to create a custom class whose instances represent the records.
You can sort the records a number of ways:
Put them into an array and use Arrays.sort
Put them into a list and use Collections.sort
Implement your own sort algorithm. (Not recommended. It is better not to waste your time "reinventing the wheel".)
You will either need to implement a Comparator to order the records [1], or make your record class implement Comparable.
For more details, please read the respective javadocs.
But you cannot use a TreeMap or TreeSet (or any other Map or Set implementations) for this. Your records are not unique, and those data structures will remove duplicates [2].
1 - From Java 8, the Comparator interface has some static helper methods for creating comparators; e.g. Comparator.comparingInt().
2 - Not strictly true. You could do this if your records had a 3rd field containing a unique identifier which you used as a tie-breaker in the comparator. Or you could use a Map<YourRecord, Integer>, where the integer represents a count of records that are "equal".
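The tie-breaker idea from the second note could look roughly like this. It is only a sketch: the Entry class, its id field (here simply the input position) and the sortedLines helper are assumptions made for illustration, not part of the original code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class TieBreakSketch {
    static final class Entry {
        final long id;     // hypothetical unique identifier used only as a tie-breaker
        final int score;
        final String name;

        Entry(long id, int score, String name) {
            this.id = id;
            this.score = score;
            this.name = name;
        }
    }

    // Highest score first; equal scores fall back to the unique id,
    // so compare() returns 0 only for the very same record.
    static final Comparator<Entry> ORDER = (a, b) -> {
        int c = Integer.compare(b.score, a.score);
        return c != 0 ? c : Long.compare(a.id, b.id);
    };

    static List<String> sortedLines(int[] scores, String[] names) {
        TreeSet<Entry> set = new TreeSet<>(ORDER);
        for (int i = 0; i < scores.length; i++) {
            set.add(new Entry(i, scores[i], names[i]));
        }
        List<String> out = new ArrayList<>();
        for (Entry e : set) {
            out.add(e.score + " : " + e.name);
        }
        return out;
    }
}
```

With the id in the comparison, two "1 : Sander" records are no longer considered equal by the TreeSet, so both survive.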

Related

Should I sort a hashmap that contains frequency with bucketsort or heapsort?

I have a hashmap in Java in this form HashMap<String, Integer> frequency. The key is a string where I hold the name of a movie and the value is the frequency of the said movie.
My program takes input from users so whenever someone is adding a video to favorite I go in the hashmap and I increment its frequency.
Now the problem is that at some point I need to get the k most frequent movies. I've found that I could use bucketsort or heapsort in this leetcode problem (check the first comment), however I am not sure which is more efficient in my case. My hashmap constantly updates, therefore I need to call the sorting algorithm again every time a frequency changes.
From my understanding, it takes O(N) time to build the map, where 'N' is the number of movies even with duplicates as it needs to add to the frequency, which gets me 'M' unique movie titles. Would that mean that heapsort will result in O(M * log(k)) and bucketsort O(M) for any given k?
Having a map that sorts on values (the thing you map to) isn't a thing, unfortunately. You could instead have a set whose keys sort themselves on frequency, but given that frequency is the key at that point, you couldn't look up entries in this set without knowing the frequency beforehand which eliminates the point of the exercise.
One strategy that comes to mind is to have 2 separate data structures. One serves to let you look up the actual object based on the name of the movie, the other is to be self-sorting:
@Data
public class MovieFrequencyTuple implements Comparable<MovieFrequencyTuple> {
@NonNull private final String name;
private int frequency;
public void incrementFrequency() {
frequency++;
}
@Override public int compareTo(MovieFrequencyTuple other) {
int c = Integer.compare(frequency, other.frequency);
if (c != 0) return -c;
return name.compareTo(other.name);
}
}
and with that available to you:
SortedSet<MovieFrequencyTuple> frequencies = new TreeSet<>();
Map<String, MovieFrequencyTuple> movies = new HashMap<>();
public int increment(String movieName) {
MovieFrequencyTuple tuple = movies.get(movieName);
if (tuple == null) {
tuple = new MovieFrequencyTuple(movieName);
movies.put(movieName, tuple);
}
// Self-sorting data structures will just fail
// to do the job if you modify a sorting order on
// an object already in the collection. Thus,
// we take it out, modify, put it back in.
frequencies.remove(tuple);
tuple.incrementFrequency();
frequencies.add(tuple);
return tuple.getFrequency();
}
public int get(String movieName) {
MovieFrequencyTuple tuple = movies.get(movieName);
if (tuple == null) return 0;
return tuple.getFrequency();
}
public List<String> getTop10() {
var out = new ArrayList<String>();
for (MovieFrequencyTuple tuple : frequencies) {
out.add(tuple.getName());
if (out.size() == 10) break;
}
return out;
}
Each operation is amortized O(1) or O(logn), even the top10 operation. So, if you run a million times 'increment a movie's frequency, then obtain the top 10', with n = # of times we do that, then the worst case scenario is O(nlogn) performance.
NB: Uses lombok for constructors, getters, etc - if you don't like that, have your IDE generate these things.
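For comparison, the heap-based alternative mentioned in the question can be sketched as a one-off selection over the existing HashMap<String, Integer>. This is not the approach of the answer above: it recomputes the top k on demand in roughly O(M log k) for M distinct titles. The class and method names are hypothetical, and the alphabetical tie-break is an arbitrary choice made so the result is deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class TopKSketch {
    // Returns the k most frequent titles, most frequent first.
    // The map must not be modified while this method runs.
    static List<String> topK(Map<String, Integer> frequency, int k) {
        List<String> out = new ArrayList<>();
        if (k <= 0) {
            return out;
        }
        // Min-heap: the weakest candidate sits on top so it can be evicted cheaply.
        // On equal frequency, the alphabetically later title counts as weaker.
        Comparator<Map.Entry<String, Integer>> weakestFirst = (a, b) -> {
            int c = Integer.compare(a.getValue(), b.getValue());
            return c != 0 ? c : b.getKey().compareTo(a.getKey());
        };
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(weakestFirst);
        for (Map.Entry<String, Integer> e : frequency.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll(); // drop the current weakest, keeping at most k entries
            }
        }
        while (!heap.isEmpty()) {
            out.add(heap.poll().getKey()); // weakest to strongest
        }
        Collections.reverse(out);          // strongest first
        return out;
    }
}
```

If the top k is requested after almost every update, the two-structure design above is likely the better fit, since it avoids rescanning all M entries each time.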

Why does adding "|" to Set in Java flip the elements order?

I have two Lists and am trying to form a String from one element of each List. When I put " " between the two elements, the sorted order is maintained. But once I put "|" in the middle, which is what I want, the order of the elements in the Set gets switched around.
How can I add "|" and still maintain the sorted order in the Set students?
Here is the code:
Set<String> students = new HashSet<>();
Set<String> fn = new HashSet<>();
Set<String> nums = new HashSet<>();
List<String> firstNames = new ArrayList<>(fn);
Collections.sort(firstNames);
List<String> favNumbers = new ArrayList<>(nums);
Collections.sort(favNumbers);
for(int i=0; i<firstNames.size(); i++) {
students.add(firstNames.get(i) + "|" + favNumbers.get(i));
}
System.out.println(students);
With ... + " " + ..., the order is [Joshua 4, Lyon 7], but if "|" is used in place of " ", the order becomes [Lyon|7, Joshua|4], when what I want is [Joshua|4, Lyon|7].
A HashSet does not provide any ordering guarantees about its contents, using whatever ordering the underlying HashMap generates, which is in turn based on the hashCode() of the elements.
When you change the contents of a string, you get a different hash code--simple as that. The order in a HashMap is undefined and could change if you inserted additional elements triggering a rehash.
If you want a set with a guaranteed order, you can use a SortedSet implementation (such as TreeSet), but you'd need to write a proper class and implement suitable Comparators. Alternately, you could use LinkedHashSet, which maintains elements in insertion order at the expense of additional overhead.
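As a rough illustration of the LinkedHashSet option, the sketch below mirrors the loop from the question but collects into a LinkedHashSet, so iteration follows insertion order regardless of the separator. The class and method names are invented for the example. Note that, as in the question, names and numbers are sorted independently, so the pairing is only meaningful if that is really what is intended.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class OrderedSetSketch {
    static Set<String> join(Collection<String> fn, Collection<String> nums) {
        List<String> firstNames = new ArrayList<>(fn);
        Collections.sort(firstNames);
        List<String> favNumbers = new ArrayList<>(nums);
        Collections.sort(favNumbers);

        // LinkedHashSet iterates in insertion order, independent of hash codes.
        Set<String> students = new LinkedHashSet<>();
        int n = Math.min(firstNames.size(), favNumbers.size());
        for (int i = 0; i < n; i++) {
            students.add(firstNames.get(i) + "|" + favNumbers.get(i));
        }
        return students;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("Lyon");
        names.add("Joshua");
        List<String> numbers = new ArrayList<>();
        numbers.add("7");
        numbers.add("4");
        System.out.println(join(names, numbers));
    }
}
```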
You should be using object oriented design, as Java is an object oriented language. Instead of trying to represent the various features of a student as independent collection, create a Student POJO which contains these features. Then, create custom comparators to sort by either name or favorite number.
public class Student {
private String firstName;
private String lastName;
private int favNumber;
// getters and setters
public static Comparator<Student> NameComparator
= new Comparator<Student>() {
public int compare(Student s1, Student s2) {
String f1 = s1.getFirstName();
String f2 = s2.getFirstName();
String l1 = s1.getLastName();
String l2 = s2.getLastName();
if (l1.equalsIgnoreCase(l2)) {
return f1.compareToIgnoreCase(f2);
}
else {
return l1.compareToIgnoreCase(l2);
}
}
};
public static Comparator<Student> FavComparator
= new Comparator<Student>() {
public int compare(Student s1, Student s2) {
return Integer.compare(s1.getFavNumber(), s2.getFavNumber());
}
};
}
Now if you have a list of students, List<Student> list, you can sort via:
Collections.sort(list, Student.NameComparator);
Or, to sort by favorite numbers, use:
Collections.sort(list, Student.FavComparator);
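On Java 8 and later, comparators like these can also be built from the static helpers on Comparator instead of anonymous classes. The following is a sketch under that assumption; the nested Student class, the constant names BY_NAME and BY_FAV_NUMBER, and the firstNamesSorted helper are stand-ins so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class StudentSortSketch {
    static final class Student {
        private final String firstName;
        private final String lastName;
        private final int favNumber;

        Student(String firstName, String lastName, int favNumber) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.favNumber = favNumber;
        }

        String getFirstName() { return firstName; }
        String getLastName() { return lastName; }
        int getFavNumber() { return favNumber; }
    }

    // Last name first, then first name, both ignoring case.
    static final Comparator<Student> BY_NAME =
            Comparator.comparing(Student::getLastName, String.CASE_INSENSITIVE_ORDER)
                      .thenComparing(Student::getFirstName, String.CASE_INSENSITIVE_ORDER);

    // Ascending favorite number.
    static final Comparator<Student> BY_FAV_NUMBER =
            Comparator.comparingInt(Student::getFavNumber);

    // Hypothetical helper: sorts a copy and returns the first names in that order.
    static List<String> firstNamesSorted(List<Student> students, Comparator<Student> order) {
        List<Student> copy = new ArrayList<>(students);
        copy.sort(order);
        List<String> out = new ArrayList<>();
        for (Student s : copy) {
            out.add(s.getFirstName());
        }
        return out;
    }
}
```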
From the documentation for HashSet:
It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time.
So you cannot rely on HashSet to preserve any kind of order.
It looks to me like you just need to preserve the order of insertion, in which case you're better off not using a Set and rather a List, e.g.,
List<String> students = new ArrayList<>();

How to get individual array names as strings for array of arrays

// Facility table attributes to be read in json format
String facilityName[], recApp[], recFacility[], sendApp[],
sendFacility[], enable[], doneness[], retryLimit[],
watchDelay[], retryDelay[], ackTimeout[],
keepConOpen[], sendTimeout[], cleanupDelay[],
host[], port[];
String facilityTableAttrs[][] = new String[][] {
facilityName, recApp, recFacility, sendApp,
sendFacility, enable, doneness, retryLimit,
watchDelay, retryDelay, ackTimeout, keepConOpen,
sendTimeout, cleanupDelay, host, port};
I have an array of arrays called facilityTableAttrs declared as above.
I have 2 questions:
1) Is it possible to do the above array declaration in a single step ?
2) I wish to get the individual array names of these 1D array using something like:
for(i = 0; i < facilityTableAttrs.length; i++) {
System.out.println(facilityTableAttrs[i].toString());
}
but it fails. How to get the individual array names as string?
The arrays in your first declaration don't seem to be initialized anywhere.
As such they are null, and invoking toString on any of them will cause a NullPointerException to be thrown, hence "it fails".
By the way, invoking toString on a non-null array would actually print something similar to the Object.toString representation, which is not what you want (Arrays.toString(myArray) is your friend here).
You could initialize each and every single array and populate them optionally, before adding them to the main String[][] but I recommend you don't.
Instead, investigate the collections framework.
What you could use here is a Map<String, List<String>>.
Or better even, a custom object with properties such as List<String> facilityName, List<String> recApp, etc.
Finally, note the variable naming, which is camelBack according to code conventions.
This is not possible with arrays. You need to use map, like so:
Map<String, List<String>> myMap = new HashMap<String, List<String>>();
You need to choose correct data structure for your problem.
Arrays are used only for storing values; they have no notion of binding names to them.
Maps, on the other hand, are great at binding names (keys that are unique) to any type of value.
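A minimal sketch of the map-based idea, assuming the attribute names from the question are used as keys. The class name and the emptyTable helper are hypothetical; LinkedHashMap is chosen here only so the names come back in the order they were declared.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FacilityAttrsSketch {
    // Builds one empty value list per attribute name, keeping declaration order.
    static Map<String, List<String>> emptyTable(String... attributeNames) {
        Map<String, List<String>> table = new LinkedHashMap<>();
        for (String name : attributeNames) {
            table.put(name, new ArrayList<>());
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, List<String>> attrs =
                emptyTable("facilityName", "recApp", "recFacility", "sendApp");
        attrs.get("recApp").add("some value"); // placeholder data
        // The "array names" are now simply the keys of the map.
        for (Map.Entry<String, List<String>> e : attrs.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```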
I propose to use a wrapper class:
public class Facility {
private final String name;
private final List<String> values;
public Facility(String name) {
this.name = name;
this.values = new ArrayList<>();
}
public String getName() {
return name;
}
public List<String> getValues() {
return values;
}
}
and then do:
Facility[] facilities = new Facility[] {
new Facility("facility 1"),
new Facility("facility 2"),
new Facility("facility 3"),
new Facility("facility 4"),
};
for(Facility facility : facilities) {
System.out.println(facility.getName());
}
To add a value to a facility you'd do:
Facility facility = facilities[0];
facility.getValues().add("bla");
If you need to look up facilities by name, then use a Map instead of an array:
...
// see createLookup method below
Map<String, Facility> facilities = createLookup(
new Facility("facility 1"),
new Facility("facility 2"),
new Facility("facility 3"),
new Facility("facility 4"));
// print names
for(Facility facility : facilities.values()) {
System.out.println(facility.getName());
}
// add a value
Facility facility = facilities.get("facility 3");
facility.getValues().add("bla");
}
private Map<String, Facility> createLookup(Facility... facilities) {
// use TreeMap to have sorted keys
Map<String, Facility> lookup = new TreeMap<>();
for(Facility facility : facilities) {
lookup.put(facility.getName(), facility);
}
return lookup;
}

Which collections to use?

Suppose I want to store phone numbers of persons. Which kind of collection should I use for key value pairs? And it should be helpful for searching. The name may get repeated, so there may be the same name having different phone numbers.
If you want key-value pairs, a Map is a better choice than a plain Collection.
So what should that map store?
As for the key, the first thing to ensure is that it is unique, to avoid collisions.
class Person {
long uniqueID;
String name;
String lastname;
}
So we will use the uniqueID of Person as the key.
What about the value?
This case is harder, as a single Person can have many phone numbers. But for a simple task, let's assume that a person can have only one phone number. Then what you are looking for is
class PhoneNumberRegistry {
Map<Long,String> phoneRegistry = new HashMap<>();
}
Here the Long key is taken from the person. When you use your own class as a Map key, you should implement its hashCode and equals methods.
Then your registry could look like
class PhoneNumberRegistry {
Map<Person,String> phoneRegistry = new HashMap<>();
}
If you want to store more than one number per person, you will need to change the value type of the map.
You can use Set<String> to store multiple numbers without duplicates. But to have full control, you should introduce a new type that stores not only the number but also what kind of number it is.
class PhoneNumberRegistry {
Map<Person,HashSet<String>> phoneRegistry = new HashMap<>();
}
But then you will have to solve various problems, such as: which phone number should I return?
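The multi-number registry could be sketched along these lines. The addNumber and numbersOf methods are hypothetical additions, and the equals/hashCode implementation assumes that uniqueID alone identifies a person, so two people who share a name remain distinct keys.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class PhoneRegistrySketch {
    static final class Person {
        final long uniqueID;
        final String name;

        Person(long uniqueID, String name) {
            this.uniqueID = uniqueID;
            this.name = name;
        }

        // Identity is the uniqueID only (an assumption of this sketch).
        @Override
        public boolean equals(Object o) {
            return o instanceof Person && ((Person) o).uniqueID == uniqueID;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(uniqueID);
        }
    }

    private final Map<Person, Set<String>> phoneRegistry = new HashMap<>();

    // Creates the number set on first use; a Set silently ignores repeated numbers.
    void addNumber(Person person, String number) {
        phoneRegistry.computeIfAbsent(person, p -> new LinkedHashSet<>()).add(number);
    }

    // Returns an empty set for unknown persons instead of null.
    Set<String> numbersOf(Person person) {
        return phoneRegistry.getOrDefault(person, Collections.emptySet());
    }
}
```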
Your problem has different solutions. For example, I'll go with a LIST: List<Person>, where Person is a class like this:
public class Person{
private String name;
private List<String> phoneNumbers;
// ...
}
For collections searching/filtering I suggest Guava Collections2.filter method.
You should use this:
Hashtable<String, ArrayList<String>> addressbook = new Hashtable<>();
ArrayList<String> persons = new ArrayList<String>();
persons.add("Tom Butterfly");
persons.add("Maria Wanderlust");
addressbook.put("+0490301234567", persons);
addressbook.put("+0490301234560", persons);
A Hashtable is safe in that it does not allow null keys or values, and an ArrayList is fast for collecting a small number of elements. Be aware that multiple persons with different names may share the same number.
Also be aware that 2 persons can have the same number and the same name!
String name = "Tom Butterfly";
// binarySearch needs a sorted list and a consistent comparator,
// so scan the entries instead and stop at the first match.
for (Map.Entry<String, ArrayList<String>> entry : addressbook.entrySet()) {
if (entry.getValue().contains(name)) {
System.out.println("Number is " + entry.getKey());
break;
}
}
Maybe
List<Pair<String, String>> (for one number per person)
or
List<Pair<String, String[]>> (for multiple numbers per person)
will fit your needs.

How to optimize the updating of values in an ArrayList<Integer>

I want to store all values of a certain variable in a dataset and the frequency for each of these values. To do so, I use an ArrayList<String> to store the values and an ArrayList<Integer> to store the frequencies (since I can't use int). The number of different values is unknown, that's why I use ArrayList and not Array.
Example (simplified) dataset:
a,b,c,d,b,d,a,c,b
The ArrayList<String> with values looks like: {a,b,c,d} and the ArrayList<Integer> with frequencies looks like: {2,3,2,2}.
To fill these ArrayLists I iterate over each record in the dataset, using the following code.
public void addObservation(String obs){
if(values.size() == 0){// first value
values.add(obs);
frequencies.add(new Integer(1));
return;//added
}else{
for(int i = 0; i<values.size();i++){
if(values.get(i).equals(obs)){
frequencies.set(i, new Integer((int)frequencies.get(i)+1));
return;//added
}
}
// only gets here if value of obs is not found
values.add(obs);
frequencies.add(new Integer(1));
}
}
However, since the datasets I will use this for can be very big, I want to optimize my code, and using frequencies.set(i, new Integer((int)frequencies.get(i)+1)); does not seem very efficient.
That brings me to my question; how can I optimize the updating of the Integer values in the ArrayList?
Use a HashMap<String,Integer>
Create the HashMap like so
HashMap<String,Integer> hm = new HashMap<String,Integer>();
Then your addObservation method will look like
public void addObservation(String obs) {
if( hm.containsKey(obs) )
hm.put( obs, hm.get(obs)+1 );
else
hm.put( obs, 1 );
}
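On Java 8 and later, the same update can be written with Map.merge, which inserts 1 for a new key and otherwise adds 1 to the existing count in a single call. A small sketch, with the class name and the frequencyOf helper made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;

class FrequencyCounterSketch {
    private final Map<String, Integer> counts = new HashMap<>();

    // One hash lookup per observation instead of a linear scan over a list.
    void addObservation(String obs) {
        counts.merge(obs, 1, Integer::sum);
    }

    // Returns 0 for values that were never observed.
    int frequencyOf(String obs) {
        return counts.getOrDefault(obs, 0);
    }

    public static void main(String[] args) {
        FrequencyCounterSketch counter = new FrequencyCounterSketch();
        for (String obs : "a,b,c,d,b,d,a,c,b".split(",")) {
            counter.addObservation(obs);
        }
        System.out.println(counter.frequencyOf("b"));
    }
}
```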
I would use a HashMap or a Hashtable as tskzzy suggested. Depending on your needs I would also create an object that has the name, count as well as other metadata that you might need.
So the code would be something like:
Hashtable<String, FrequencyStatistics> statHash = new Hashtable<String, FrequencyStatistics>();
for (String value : values) {
if (statHash.get(value) == null) {
FrequencyStatistics newStat = new FrequencyStatistics(value);
statHash.put(value, newStat);
} else {
statHash.get(value).incrementCount();
}
}
Now, your FrequencyStatistics object's constructor would automatically set its initial count to 1, while the incrementCount() method would increment the count and perform any other statistical calculations that you might require. This should also be more extensible in the future than storing a hash of the String with only its corresponding Integer.
