Mapping a set of strings to a specific key value - java

Say I have 3 sets of string values:
fruit: apple, berry, banana
color: red, blue, orange
vehicle: car, plane, truck
I'm looking for the most efficient way with Java to retrieve the parent value for each set such as:
getParentValue("banana") ---> fruit
Solution 1:
create a bunch of if/else statements or switch case:
if (fruitSet.contains(elem)) {
return "fruit";
}
else if (colorSet.contains(elem)) {
return "color";
} ...
This yields an O(n) lookup, where n is the number of sets.
Solution 2:
Create a hashmap which stores every child to parent value,
Key/Value:
apple/fruit
berry/fruit
banana/fruit
red/color
blue/color
orange/color
...
This yields O(1) lookup time, but it generates a large hash map because every member is stored as a key. For some reason this solution feels ugly.
I am looking for some opinions or other approaches which might be more elegant.

The best approach is definitely your solution #2: if you want to be able to look up the category given a member, then the most efficient way is to have a Map from the member to the category. That's exactly what Map is for.
(Note that regardless of your approach, you'll have to store all the members. Storing them as keys is no uglier than storing them in some less-efficient way.)
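A minimal sketch of that member-to-category map, assuming the categories are registered up front. The class name CategoryLookup and the method addCategory are made up for illustration; only getParentValue comes from the question.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CategoryLookup {
    // Inverted index: each member points directly at its category.
    private final Map<String, String> parentByMember = new HashMap<>();

    // Registers every member of one category (hypothetical helper).
    void addCategory(String category, List<String> members) {
        for (String member : members) {
            parentByMember.put(member, category);
        }
    }

    // Returns the category, or null when the member is unknown.
    String getParentValue(String member) {
        return parentByMember.get(member);
    }
}
```

Each lookup is a single hash probe, independent of the number of categories. If a member could belong to two categories, the later registration would overwrite the earlier one in this sketch.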

Here is some code that only has 3 entries in the Map. Note that the lookup then has to scan the list of each category, so it is no longer O(1).
public static void main(String[] args) {
Map<String, List<String>> myMap = new HashMap<>();
myMap.put("fruit", new ArrayList<String>(Arrays.asList("apple","berry","banana")));
myMap.put("color", new ArrayList<String>(Arrays.asList("red","blue","orange")));
myMap.put("vehicle", new ArrayList<String>(Arrays.asList("car","plane","truck")));
System.out.println(getKey(myMap, "blue"));
}
public static String getKey(Map<String, List<String>> map, String value) {
for (String key : map.keySet()) {
List<String> list = map.get(key);
if (list.contains(value)) {
return key;
}
}
return null;
}

The HashMap idea is quite good, but if you want to improve it further, you can map each member to an integer and store the category names in an array. This can help if some categories have very long names.
For example,
something1/very very long string
something2/very very long string
After the conversion:
apple/0
berry/0
banana/0
red/1
blue/1
orange/1
...
something1/i
something2/i
And your array will be:
arr = {"fruit", "color", ....., "very very long string", ....};
Unless your list has a great many entries (tens of millions) with very long category strings, it isn't worth it.
For example, if your average string is 16 characters long and you have a million rows, that is only about 16 MB, which is nothing for a modern computer, and you would save only about 12 MB since an integer is 4 bytes.
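The integer-index variant could look roughly like the sketch below. The class and method names are hypothetical, and a List stands in for the array so that categories can be added incrementally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IndexedCategoryLookup {
    // Position i holds the name of category i.
    private final List<String> categories = new ArrayList<>();
    // Each member maps to the position of its category.
    private final Map<String, Integer> indexByMember = new HashMap<>();

    void addCategory(String category, List<String> members) {
        int index = categories.size();
        categories.add(category);
        for (String member : members) {
            indexByMember.put(member, index);
        }
    }

    // Returns the category name, or null when the member is unknown.
    String getParentValue(String member) {
        Integer index = indexByMember.get(member);
        return index == null ? null : categories.get(index);
    }
}
```

One caveat on the expected saving: a Java HashMap stores references, so if every entry of a category points at the same String instance, the long name is held in memory only once either way, and the boxed Integer values are objects themselves.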

Related

Should I sort a hashmap that contains frequency with bucketsort or heapsort?

I have a hashmap in Java in this form HashMap<String, Integer> frequency. The key is a string where I hold the name of a movie and the value is the frequency of the said movie.
My program takes input from users so whenever someone is adding a video to favorite I go in the hashmap and I increment its frequency.
Now the problem is that at some point I need to get the k most frequent movies. I've found that I could use bucketsort or heapsort in this leetcode problem (check the first comment), however I am not sure if it is more efficient in my case. My hashmap constantly updates, therefore I would need to run the sorting algorithm again every time a frequency changes.
From my understanding, it takes O(N) time to build the map, where 'N' is the number of movies even with duplicates as it needs to add to the frequency, which gets me 'M' unique movie titles. Would that mean that heapsort will result in O(M * log(k)) and bucketsort O(M) for any given k?
Having a map that sorts on values (the thing you map to) isn't a thing, unfortunately. You could instead have a set whose keys sort themselves on frequency, but given that frequency is the key at that point, you couldn't look up entries in this set without knowing the frequency beforehand which eliminates the point of the exercise.
One strategy that comes to mind is to have 2 separate data structures. One serves to let you look up the actual object based on the name of the movie, the other is to be self-sorting:
@Data
public class MovieFrequencyTuple implements Comparable<MovieFrequencyTuple> {
    @NonNull private final String name;
    private int frequency;

    public void incrementFrequency() {
        frequency++;
    }

    @Override public int compareTo(MovieFrequencyTuple other) {
        // Higher frequency sorts first; ties are broken by name.
        int c = Integer.compare(frequency, other.frequency);
        if (c != 0) return -c;
        return name.compareTo(other.name);
    }
}
and with that available to you:
SortedSet<MovieFrequencyTuple> frequencies = new TreeSet<>();
Map<String, MovieFrequencyTuple> movies = new HashMap<>();
public int increment(String movieName) {
MovieFrequencyTuple tuple = movies.get(movieName);
if (tuple == null) {
tuple = new MovieFrequencyTuple(movieName);
movies.put(movieName, tuple);
}
// Self-sorting data structures will just fail
// to do the job if you modify a sorting order on
// an object already in the collection. Thus,
// we take it out, modify, put it back in.
frequencies.remove(tuple);
tuple.incrementFrequency();
frequencies.add(tuple);
return tuple.getFrequency();
}
public int get(String movieName) {
MovieFrequencyTuple tuple = movies.get(movieName);
if (tuple == null) return 0;
return tuple.getFrequency();
}
public List<String> getTop10() {
var out = new ArrayList<String>();
for (MovieFrequencyTuple tuple : frequencies) {
out.add(tuple.getName());
if (out.size() == 10) break;
}
return out;
}
Each operation is amortized O(1) or O(log n), even the top-10 operation. So if you run 'increment a movie's frequency, then obtain the top 10' a million times, with n being the number of times we do that, then the worst-case scenario is O(n log n) performance.
NB: Uses lombok for constructors, getters, etc - if you don't like that, have your IDE generate these things.
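For comparison, the bounded-heap approach the question asks about can be sketched as below. This is only an illustration under assumptions: the class TopK and its method are invented names, and the heap is rebuilt from the map on every call, which costs O(M log k) for M unique titles.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class TopK {
    // Returns up to k keys with the highest counts, most frequent first.
    static List<String> topK(Map<String, Integer> frequency, int k) {
        List<String> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        // Min-heap on the count: the root is the weakest of the current top k.
        Comparator<Map.Entry<String, Integer>> byCount = Map.Entry.comparingByValue();
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(byCount);
        for (Map.Entry<String, Integer> entry : frequency.entrySet()) {
            heap.offer(entry);
            if (heap.size() > k) {
                heap.poll(); // drop the least frequent candidate
            }
        }
        // Polling yields ascending counts, so reverse at the end.
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result);
        return result;
    }
}
```

The order among titles with equal counts is unspecified here. Whether this beats the self-sorting set depends on how often the top k is requested relative to how often frequencies change.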

Sort a List of Map<String, String> [duplicate]

I saw this thread on sorting a List of Map<String, String>, and I know mine could sound like a duplicate, but it is slightly different.
My example is:
List<Map<String, String>> myList = new ArrayList<Map<String, String>>();
...
for(MyClass1 c1 : c1)
{
...
for(MyClass2 c2 : c12)
{
SimpleBindings myBindindings= new SimpleBindings();
myBindindings.put(c1.getName(), c2.getName());
myList.add(myBindindings);
}
}
...
Concretely I can have
{
(John, Mike)
(John, Jack)
(Sam, Jack)
(Gloria, Anna)
(Jane, Carla)
...
}
and would like that my list is sorted by the maps key:
{
(Gloria, Anna)
(Jane, Carla)
(John, Mike)
(John, Jack)
(Sam, Jack)
...
}
Are you sure that
List<Map<String, String>>
is the appropriate data type?
To me it looks like you are in fact simply looking for
TreeMap<String, String>
i.e. a sorted map key -> value?
Or do you mean to use a List<StringPair> (for that, please choose a more appropriate name than StringPair, and implement that class to your needs)? I have the impression that, for lack of an obvious Pair<String, String> class in Java, you have been abusing SimpleBindings as a pair class. The proper way to have pairs in Java is to implement a new class with a proper class name; "pair" is technical, not semantic.
You could also do
List<String[]>
and implement a Comparator<String[]> for sorting. But that doesn't save you any work over implementing a NamePair class and making it comparable yourself.
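A small sketch of the dedicated pair class, using the data from the question. NamePair, its field names, and the helper sortedByFirst are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class NamePair {
    final String first;
    final String second;

    NamePair(String first, String second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public String toString() {
        return "(" + first + ", " + second + ")";
    }
}

class NamePairSorting {
    // Returns a copy sorted by the first name only; the input list is untouched.
    static List<NamePair> sortedByFirst(List<NamePair> pairs) {
        List<NamePair> copy = new ArrayList<>(pairs);
        copy.sort(Comparator.comparing((NamePair p) -> p.first));
        return copy;
    }
}
```

Because the sort is stable, pairs sharing the same first name keep their original relative order, which matches the (John, Mike), (John, Jack) ordering in the desired output.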
You need to implement Comparator to accomplish this...
Collections.sort(myList, new Comparator<ObjectBeingCompared>() {
@Override
public int compare(ObjectBeingCompared obj1, ObjectBeingCompared obj2) {
//Of course you will want to return a negative number, 0, or a positive number based on whatever you like
//this is just a simple example
//return a negative number if obj1 should be ordered before obj2
//return 0 if obj1 and obj2 are equal in ordering
//return a positive number if obj1 should be ordered after obj2
return obj1.compareTo(obj2);
}
});
The HashMap data structure is used to allow access to its elements in O(1) time.
Because it is a mutable container, its set of keys can change over time. This means that you cannot guarantee a long-term order for a list of maps.
In your example you match two Strings and use SimpleBindings as a pair of data.
In your simple example you should not use a Map<String,String> data structure to represent a pair of data.
If your SimpleBindings really consists of two strings, all you need to do is implement Comparable in your own SimpleBinding class like this:
class SimpleBinding implements Comparable<SimpleBinding> {
private final String key;
private final String value;
public SimpleBinding(String key, String value) {
Objects.requireNonNull(key);
Objects.requireNonNull(value);
this.key = key;
this.value = value;
}
@Override
public int compareTo(SimpleBinding that) {
return this.key.compareTo(that.key);
}
}
And then you just use Collections.sort(bindings) to get a sorted result.
In case you do not have access to the class, you should use the Comparator interface like this:
enum SimpleBindingComparator implements Comparator<SimpleBinding> {
DEFAULT {
@Override
public int compare(SimpleBinding first, SimpleBinding second) {
// assumes the key field (or an equivalent getter) is accessible from here
return first.key.compareTo(second.key);
}
};
}
Then you sort your bindings like this: Collections.sort(bindings, SimpleBindingComparator.DEFAULT);
But if your case is more complex than this and you really store a Map in the list, you should define the logic that represents the order. In your case it can be said that the order must be maintained by c1.getName().
One choice is to not create a List at all but a map of lists, Map<String, List<String>>. This is a so-called multimap, where a single key maps to multiple values. See Guava's Multimap, and if you want it to be sorted, I suggest reading about TreeMultimap.

Which collections to use?

Suppose I want to store phone numbers of persons. Which kind of collection should I use for key value pairs? And it should be helpful for searching. The name may get repeated, so there may be the same name having different phone numbers.
If you want to use key-value pairs, a Map is a good choice rather than a plain Collection.
So what should that map store?
As for the key: the first thing you want to ensure is that your key is unique, to avoid collisions.
class Person {
long uniqueID;
String name;
String lastname;
}
So we will use the uniqueID of Person as the key.
What about the value?
This case is harder, as a single Person can have many phone numbers. But for a simple task, let's assume that a person can have only one phone number. Then what you are looking for is
class PhoneNumberRegistry {
Map<Long,String> phoneRegistry = new HashMap<>();
}
where the Long is taken from the person. Alternatively, you can use the Person itself as the key; when you use your own class as a Map key, you should implement the hashCode and equals methods.
Then your registry could look like
class PhoneNumberRegistry {
Map<Person,String> phoneRegistry = new HashMap<>();
}
If you want to store more than one number per person, you will need to change the type of the value in the map.
You can use Set<String> to store multiple numbers without duplicates. But to have full control you should introduce a new type that stores not only the number but also what kind of number it is.
class PhoneNumberRegistry {
Map<Person,HashSet<String>> phoneRegistry = new HashMap<>();
}
But then you will have to solve various problems, such as: which phone number should I return?
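A rough sketch of the registry with Person as the key, assuming that uniqueID alone identifies a person. The constructor and the methods addNumber and numbersOf are hypothetical additions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class Person {
    final long uniqueID;
    final String name;

    Person(long uniqueID, String name) {
        this.uniqueID = uniqueID;
        this.name = name;
    }

    // Two Person objects are equal when their IDs match, even if the names differ.
    @Override
    public boolean equals(Object o) {
        return o instanceof Person && ((Person) o).uniqueID == uniqueID;
    }

    // Must be consistent with equals so that HashMap finds the right bucket.
    @Override
    public int hashCode() {
        return Long.hashCode(uniqueID);
    }
}

class PhoneNumberRegistry {
    private final Map<Person, Set<String>> phoneRegistry = new HashMap<>();

    // Adds a number; the Set silently ignores duplicates.
    void addNumber(Person person, String number) {
        phoneRegistry.computeIfAbsent(person, p -> new LinkedHashSet<>()).add(number);
    }

    // Returns all numbers of the person, or an empty set if none are known.
    Set<String> numbersOf(Person person) {
        return phoneRegistry.getOrDefault(person, Collections.emptySet());
    }
}
```

Two people with the same name but different IDs get separate entries, which covers the repeated-name case from the question.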
Your problem has several solutions. For example, I would go with a List<Person>, where Person is a class like this:
public class Person{
private String name;
private List<String> phoneNumbers;
// ...
}
For collections searching/filtering I suggest Guava Collections2.filter method.
You should use this:
Hashtable<String, ArrayList<String>> addressbook = new Hashtable<>();
ArrayList<String> persons = new ArrayList<String>();
persons.add("Tom Butterfly");
persons.add("Maria Wanderlust");
addressbook.put("+0490301234567", persons);
addressbook.put("+0490301234560", persons);
A Hashtable guarantees that there are no null keys or values, and an ArrayList is fast for collecting a small number of elements. Keep in mind that several persons with different names may share the same number.
Also keep in mind that two persons can have the same number and the same name!
String name = "Tom Butterfly";
// Scan every entry: a binary search does not apply here,
// because the keys are not sorted by person name.
for (Map.Entry<String, ArrayList<String>> entry : addressbook.entrySet()) {
if (entry.getValue().contains(name)) {
System.out.println("Number is " + entry.getKey());
}
}
Maybe
List<Pair<String, String>> (for one number per person)
or
List<Pair<String, String[]>> (for multiple numbers per person)
will fit your needs.

How to optimize the updating of values in an ArrayList<Integer>

I want to store all values of a certain variable in a dataset and the frequency for each of these values. To do so, I use an ArrayList<String> to store the values and an ArrayList<Integer> to store the frequencies (since I can't use int). The number of different values is unknown, that's why I use ArrayList and not Array.
Example (simplified) dataset:
a,b,c,d,b,d,a,c,b
The ArrayList<String> with values looks like: {a,b,c,d} and the ArrayList<Integer> with frequencies looks like: {2,3,2,2}.
To fill these ArrayLists I iterate over each record in the dataset, using the following code.
public void addObservation(String obs){
if(values.size() == 0){// first value
values.add(obs);
frequencies.add(new Integer(1));
return;//added
}else{
for(int i = 0; i<values.size();i++){
if(values.get(i).equals(obs)){
frequencies.set(i, new Integer((int)frequencies.get(i)+1));
return;//added
}
}
// only gets here if value of obs is not found
values.add(obs);
frequencies.add(new Integer(1));
}
}
However, since the datasets I will use this for can be very big, I want to optimize my code, and using frequencies.set(i, new Integer((int)frequencies.get(i)+1)); does not seem very efficient.
That brings me to my question; how can I optimize the updating of the Integer values in the ArrayList?
Use a HashMap<String,Integer>
Create the HashMap like so
HashMap<String,Integer> hm = new HashMap<String,Integer>();
Then your addObservation method will look like
public void addObservation(String obs) {
if( hm.containsKey(obs) )
hm.put( obs, hm.get(obs)+1 );
else
hm.put( obs, 1 );
}
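On Java 8 or later, the same counting can be written with Map.merge. The sketch below is one possible shape; the class FrequencyCounter and the method frequencyOf are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

class FrequencyCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    // Inserts 1 for a new value, otherwise adds 1 to the stored count.
    void addObservation(String obs) {
        counts.merge(obs, 1, Integer::sum);
    }

    // Returns 0 for values that were never observed.
    int frequencyOf(String obs) {
        return counts.getOrDefault(obs, 0);
    }
}
```

Each observation costs one hash lookup on average, instead of a linear scan over the list of values.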
I would use a HashMap or a Hashtable as tskzzy suggested. Depending on your needs I would also create an object that has the name, count as well as other metadata that you might need.
So the code would be something like:
Hashtable<String, FrequencyStatistics> statHash = new Hashtable<String, FrequencyStatistics>();
for (String value : values) {
if (statHash.get(value) == null) {
FrequencyStatistics newStat = new FrequencyStatistics(value);
statHash.put(value, newStat);
} else {
statHash.get(value).incrementCount();
}
}
Now, your FrequencyStatistics object's constructor would automatically set its initial count to 1, while the incrementCount() method would increment the count and perform any other statistical calculations that you might require. This should also be more extensible in the future than storing a hash of the String with only its corresponding Integer.

how to manipulate list in java

Edit: My list is sorted as it is coming from a DB
I have an ArrayList that has objects of class People. People has two properties: ssn and terminationReason. So my list looks like this
ArrayList:
ssn TerminationReason
123456789 Reason1
123456789 Reason2
123456789 Reason3
568956899 Reason2
000000001 Reason3
000000001 Reason2
I want to change this list so that there are no duplicates and the termination reasons are separated by commas.
So the above list would become
New ArrayList:
ssn TerminationReason
123456789 Reason1, Reason2, Reason3
568956899 Reason2
000000001 Reason3, Reason2
I have something going where I am looping through the original list and matching ssn's but it does not seem to work.
Can someone help?
Code I was using was:
String ssn = "";
Iterator it = results.iterator();
ArrayList newList = new ArrayList();
People ob;
while (it.hasNext())
{
ob = (People) it.next();
if (ssn.equalsIgnoreCase(""))
{
newList.add(ob);
ssn = ob.getSSN();
}
else if (ssn.equalsIgnoreCase(ob.getSSN()))
{
//should I get last object from new list and append this termination reason?
ob.getTerminationReason()
}
}
To me, this seems like a good case to use a Multimap, which would allow storing multiple values for a single key.
Google Collections (now part of Guava) has a Multimap implementation.
This may mean that the Person object's ssn and terminationReason fields may have to be taken out to be a key and value, respectively. (And those fields will be assumed to be String.)
Basically, it can be used as follows:
Multimap<String, String> m = HashMultimap.create();
// In reality, the following would probably be iterating over the
// Person objects returned from the database, and calling the
// getSSN and getTerminationReasons methods.
m.put("0000001", "Reason1");
m.put("0000001", "Reason2");
m.put("0000001", "Reason3");
m.put("0000002", "Reason1");
m.put("0000002", "Reason2");
m.put("0000002", "Reason3");
for (String ssn : m.keySet())
{
// For each SSN, the termination reasons can be retrieved.
Collection<String> termReasonsList = m.get(ssn);
// Do something with the list of reasons.
}
If necessary, a comma-separated list of a Collection can be produced:
StringBuilder sb = new StringBuilder();
for (String reason : termReasonsList)
{
sb.append(reason);
sb.append(", ");
}
sb.delete(sb.length() - 2, sb.length());
String commaSepList = sb.toString();
This could once again be set to the terminationReason field.
An alternative, as Jonik mentioned in the comments, is to use the StringUtils.join method from Apache Commons Lang to create a comma-separated list.
It should also be noted that the Multimap doesn't specify whether an implementation should or should not allow duplicate key/value pairs, so one should look at which type of Multimap to use.
In this example, the HashMultimap is a good choice, as it does not allow duplicate key/value pairs. This would automatically eliminate any duplicate reasons given for one specific person.
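If adding a library is not an option, roughly the same grouping can be done with the JDK alone. This is a sketch under assumptions: each row is modelled as a two-element array {ssn, terminationReason}, and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ReasonGrouping {
    // Groups reasons by SSN and joins them with ", ", keeping first-seen order.
    static Map<String, String> joinReasonsBySsn(List<String[]> rows) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String[] row : rows) {
            // row[0] is the SSN, row[1] the termination reason.
            grouped.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        Map<String, String> joined = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            joined.put(e.getKey(), String.join(", ", e.getValue()));
        }
        return joined;
    }
}
```

Unlike HashMultimap, this version keeps duplicate reasons and preserves the order in which SSNs and reasons arrive; swapping the ArrayList for a LinkedHashSet would drop duplicates.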
What you might need is a hash-based structure; a HashMap may be usable.
Override equals() and hashCode() inside your People class.
Make hashCode depend on the person's SSN. This way you will have all People objects with the same SSN in the same "bucket".
Keep in mind that the Map interface implementation classes use key/value pairs for holding your objects, so you will have something like myHashMap.put(ssn, peopleObject);
List<People> newlst = new ArrayList<People>();
People last = null;
for (People p : listFromDB) {
if (last == null || !last.ssn.equals(p.ssn)) {
last = new People();
last.ssn = p.ssn;
last.terminationReason = "";
newlst.add(last);
}
if (last.terminationReason.length() > 0) {
last.terminationReason += ", ";
}
last.terminationReason += p.terminationReason;
}
And you get the aggregated list in newlst.
Update: If you are using MySQL, you can use the GROUP_CONCAT function to extract data in your required format. I don't know whether other DB engines have similar function or not.
Update 2: Removed the unnecessary sorting.
Two possible problems:
This won't work if your list isn't sorted
You aren't doing anything with ob.getTerminationReason(). I think you mean to add it to the previous object.
EDIT: Now that I see you've edited your question.
As your list is sorted, (by ssn I presume)
String currentSSN = null;
List<People> peopleList = getSortedList();//gets sorted list from DB.
/*Uses foreach construct instead of iterators*/
for (People person : peopleList){
if (currentSSN != null && person.getSSN().equals(currentSSN)){
//same person
System.out.print(person.getTerminationReason()+" ");//writes termination reason
}
else{//person has changed. New row.
currentSSN = person.getSSN();
System.out.println(" ");//new row.
System.out.print(person.getSSN()+ " ");//writes row header.
System.out.print(person.getTerminationReason()+" ");//writes the first reason of the new row
}
}
If you don't want to display the contents of your list, you could use it to create a Map and then use it as shown below.
If your list is not sorted
Maybe you should try a different approach, using a Map. Here, ssn would be the key of the map, and the values could be lists of People.
Map<String,List<People>> mymap = getMap();//loads a Map from input data.
for(String ssn:mymap.keySet()){
dorow(ssn,mymap.get(ssn));
}
public void dorow(String ssn, List<People> reasons){
System.out.print(ssn+" ");
for (People people:reasons){
System.out.print(people.getTerminationReason()+" ");
}
System.out.println("-----");//row separator.
}
Last but not least, you should override the hashCode() and equals() methods of the People class.
For example:
@Override
public int hashCode(){
return 3*this.terminationReason.hashCode();
}
