I am trying to extract and count the number of distinct elements in the values of a map. The thing is that it's not just one map but many of them, obtained from a list of objects.
Specifically, I have a Tournament class with a List<Event> member named event. An Event has a Map<Localization, Set<Timeslot>> unavailableLocalizations member. I would like to count the distinct values across all of those maps.
So far I managed to count the distincts in just one map like this:
event.getUnavailableLocalizations().values().stream().distinct().count()
But what I can't figure out is how to do that for all the maps instead of for just one.
I guess I would need some way to take each event's map's values and put all that into the stream, then the rest would be just as I did.
Let's do it step by step:
listOfEvents.stream()                            // stream the events
    .map(Event::getUnavailableLocalizations)     // for each event, get the map
    .map(Map::values)                            // get the values
    .flatMap(Collection::stream)                 // flatMap to merge all the values into one stream
    .distinct()                                  // remove duplicates
    .count();                                    // count
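A self-contained sketch of that pipeline, with simplified stand-ins for the real Event/Localization/Timeslot types (plain Strings here, purely for illustration):

```java
import java.util.*;
import java.util.stream.*;

public class Distincts {
    // minimal stand-in for the real Event class; Localization and Timeslot simplified to String
    static class Event {
        private final Map<String, Set<String>> unavailable;
        Event(Map<String, Set<String>> unavailable) { this.unavailable = unavailable; }
        Map<String, Set<String>> getUnavailableLocalizations() { return unavailable; }
    }

    static long countDistinctValues(List<Event> listOfEvents) {
        return listOfEvents.stream()                          // stream the events
                .map(Event::getUnavailableLocalizations)      // one map per event
                .map(Map::values)                             // the map's value collections
                .flatMap(Collection::stream)                  // merge all values into one stream
                .distinct()                                   // remove duplicates
                .count();                                     // count
    }

    public static void main(String[] args) {
        Event e1 = new Event(Map.of("court1", Set.of("t1", "t2")));
        Event e2 = new Event(Map.of("court2", Set.of("t1", "t2"), "court3", Set.of("t3")));
        // the value {t1, t2} appears in both maps, so only 2 distinct values remain
        System.out.println(countDistinctValues(List.of(e1, e2))); // 2
    }
}
```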
Suppose I read whole files:
JavaPairRDD<String, String> filesRDD = sc.wholeTextFiles(inputDataPath);
Then, I have the following mapper:
JavaRDD<List<String>> processingFiles = filesRDD.map(fileNameContent -> {
    List<String> results = new ArrayList<String>();
    for ( some loop ) {
        if (condition) {
            results.add(someString);
        }
    }
    . . .
    return results;
});
For the sake of argument, suppose that inside the mapper I need to make a list of strings, which I return from each file. Now, each string in each list can be viewed independently and needs to be processed later on independently. I don't want Spark to process each list at once, but each string of each list at once. Later when I use collect() I get a list of lists.
One way to put this is: how to parallelize this list of lists for each string individually not for each list individually?
Instead of mapping filesRDD to get a list of lists, flatMap it and you can get an RDD of strings.
EDIT: Adding comment out of request
Map is a 1:1 function where 1 input row -> 1 output row. Flatmap is a 1:N function where 1 input row -> many (or 0) output rows. If you use flatMap, you can design it so your output RDD is an RDD of strings, whereas currently your output RDD is an RDD of lists of strings. It sounds like this is what you want. I'm not a java-spark user, so I can't give you syntax specifics. Check here for help on syntax
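The same 1:1 vs 1:N distinction can be illustrated with plain java.util.stream, which shares the map/flatMap vocabulary (this is only an analogy, not the Spark API):

```java
import java.util.*;
import java.util.stream.*;

public class MapVsFlatMap {
    // map: 1 input -> 1 output; the result is a list of lists
    static List<List<String>> withMap(List<String> files) {
        return files.stream()
                .map(content -> Arrays.asList(content.split(" ")))
                .collect(Collectors.toList());
    }

    // flatMap: 1 input -> many (or 0) outputs; the lists are flattened into one stream
    static List<String> withFlatMap(List<String> files) {
        return files.stream()
                .flatMap(content -> Arrays.stream(content.split(" ")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> files = List.of("a b", "c", "d e f");
        System.out.println(withMap(files));     // [[a, b], [c], [d, e, f]]
        System.out.println(withFlatMap(files)); // [a, b, c, d, e, f]
    }
}
```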
I have a Map<Long, String> which stores my timestamp as the key and my data as the value.
Map<Long, String> eventTimestampHolder = new HashMap<Long, String>();
Now I want to get the 100 most recent entries from the above map by looking at the timestamp in the key, and add those values to a List<String>. In short: populate a List with the 100 most recent data items.
What is the best way to do this? Can I use a TreeMap here, and will it sort my keys by timestamp properly?
My timestamps are in milliseconds and look like this: 1417686422238.
If by "recent" you mean recently added, then you can try a LinkedHashMap, which maintains insertion order. Then you can iterate over the first 100 items.
You can iterate over the map like this:
for (Map.Entry<Long, String> entry : eventTimestampHolder.entrySet()) {
    Long key = entry.getKey();
    String value = entry.getValue();
}
For any key that can be sorted, you should use a SortedMap (unless other requirements make it unsuitable). A TreeMap is a sorted map. Since you need the most recent k entries, you need the largest keys first. This can be done by going through the first k keys in the map's descendingKeySet, a one-liner in Java 8:
eventTimestampHolder.descendingKeySet().stream().limit(k).collect(Collectors.toList()); // in your case, k = 100
If you want not just the keys, but the values as well, then you could find the k'th key, and then use
// the 2nd arg is a boolean indicating whether the k'th entry will be included or not
eventTimestampHolder.tailMap(kthTimestamp, true);
One thing to remember when using tailMap is that it will be backed by the original eventTimestampHolder map, and any changes to that will be reflected in the returned tail map.
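A sketch combining these ideas, assuming millisecond-timestamp keys (the method name mostRecent is made up for illustration):

```java
import java.util.*;
import java.util.stream.*;

public class RecentData {
    // returns the values of the k largest (most recent) keys, newest first
    static List<String> mostRecent(NavigableMap<Long, String> byTimestamp, int k) {
        return byTimestamp.descendingMap().values().stream()
                .limit(k)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> eventTimestampHolder = new TreeMap<>();
        eventTimestampHolder.put(1417686422238L, "oldest");
        eventTimestampHolder.put(1417686422239L, "middle");
        eventTimestampHolder.put(1417686422240L, "newest");
        System.out.println(mostRecent(eventTimestampHolder, 2)); // [newest, middle]
    }
}
```

descendingMap() is a view, so this costs no copying of the underlying TreeMap; like tailMap, it reflects later changes to the original map.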
I am trying to write a balancing algorithm but am struggling to understand how to achieve this.
I have a HashMap as below:
Map<String, Integer> criteria = new HashMap<>();
criteria.put("0",5);
criteria.put("1",8);
criteria.put("2",0);
criteria.put("3",7);
...
Also, I have a two-dimensional array where I need to have the values fetched from the HashMap and put into the array in a balanced way.
String[][] arr = new String[3][1];
The end result I am looking for is something like this:
arr[0][0] = "0"; // the key of the HashMap
arr[1][0] = "1";
arr[2][0] = "2,3"; // as the key "2" has the smallest value so far, the key "3" is appended to this slot
So, basically the keys of the HashMap should get evenly distributed into the Array based on its values.
Thanks a lot in advance for the help.
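The question leaves the algorithm open; one possible approach is a greedy least-loaded assignment (a sketch, not necessarily optimal, with the helper name distribute made up), which happens to reproduce the example above:

```java
import java.util.*;

public class Balancer {
    // greedy sketch: visit keys in sorted order and append each one to the
    // slot whose assigned values currently sum to the least
    static String[] distribute(Map<String, Integer> criteria, int slots) {
        int[] load = new int[slots];
        StringBuilder[] out = new StringBuilder[slots];
        for (int i = 0; i < slots; i++) out[i] = new StringBuilder();

        for (String key : new TreeMap<>(criteria).keySet()) {
            int min = 0; // index of the currently lightest slot
            for (int i = 1; i < slots; i++)
                if (load[i] < load[min]) min = i;
            load[min] += criteria.get(key);
            if (out[min].length() > 0) out[min].append(',');
            out[min].append(key);
        }

        String[] result = new String[slots];
        for (int i = 0; i < slots; i++) result[i] = out[i].toString();
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> criteria = new HashMap<>();
        criteria.put("0", 5);
        criteria.put("1", 8);
        criteria.put("2", 0);
        criteria.put("3", 7);
        System.out.println(Arrays.toString(distribute(criteria, 3))); // [0, 1, 2,3]
    }
}
```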
I am using multi key bags in order to count occurrences of certain combinations of values, and I was wondering if there is an elegant way to convert these bags into nested SortedMaps such as TreeMaps, with the number of nested TreeMaps equal to the number of components in the multi key. For instance, let's say I have a multi key bag with a defined key:
multiKey = new String[]{"age", "height", "gender"}
thus, the object I would like to obtain from it would be:
TreeMap<Integer, TreeMap<Integer, TreeMap<Integer, Integer>>>
and populate it with the values from the multi key bag. So, the nested structure would contain the values from the multi key like this:
TreeMap<"age", TreeMap<"height", TreeMap<"gender", count>>>
where "age" is replaced by the corresponding value from the bag, "height" as well and so on.. count is the number of occurrences of that particular combination (which is returned by the multi key bag itself).
Of course, the number of components of the multi key is dynamic.
If the multiKey would have only two components, then the resulting object would be:
TreeMap<Integer, TreeMap<Integer, Integer>>
Retrieving the values from the bag and populating the (nested) TreeMaps does not represent an issue. Only the conversion. Any help is appreciated.
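Since the nesting depth is only known at runtime, one workaround is to type the inner maps as Object and descend with computeIfAbsent (a sketch assuming integer key components; the helper name add is mine):

```java
import java.util.*;

public class NestedCounts {
    // key holds one component per level (e.g. {age, height, gender});
    // intermediate values are further TreeMaps, the innermost value is the count
    @SuppressWarnings("unchecked")
    static void add(TreeMap<Integer, Object> root, int[] key, int count) {
        TreeMap<Integer, Object> node = root;
        for (int i = 0; i < key.length - 1; i++)
            node = (TreeMap<Integer, Object>) node
                    .computeIfAbsent(key[i], k -> new TreeMap<Integer, Object>());
        // accumulate the occurrence count at the leaf
        node.merge(key[key.length - 1], count, (a, b) -> (Integer) a + (Integer) b);
    }

    public static void main(String[] args) {
        TreeMap<Integer, Object> root = new TreeMap<>();
        add(root, new int[]{25, 180, 1}, 3);
        add(root, new int[]{25, 180, 1}, 2);
        add(root, new int[]{30, 170, 0}, 1);
        System.out.println(root); // {25={180={1=5}}, 30={170={0=1}}}
    }
}
```

The price of the dynamic depth is the loss of compile-time type safety (hence the unchecked casts).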
Instead of using a bunch of wrappers, why don't you just create your own class that groups related data together? It seems like this would very much simplify the process.
Nonetheless, if what you actually want is to be able to perform complex queries on your data (pseudo-code):
SELECT ALL (MALES >= 25 && HEIGHT < 6'1) && (FEMALES < 40 && HEIGHT > 5'0)
Then you should probably look into using a database. I'm not saying that a Tree is bad, but if your goal is to be able to easily/quickly perform complex queries, then a database is the way to go. Of course, you could write your own classes/methods to perform these calculations for you, but why reinvent the wheel if you don't have to?
I have an ArrayList with 5 elements, each of which is an enum value. I want to build a method which returns another ArrayList with the most common element(s) of the list.
Example 1:
[Activities.WALKING, Activities.WALKING, Activities.WALKING, Activities.JOGGING, Activities.STANDING]
Method would return: [Activities.WALKING]
Example 2:
[Activities.WALKING, Activities.WALKING, Activities.JOGGING, Activities.JOGGING, Activities.STANDING]
Method would return: [Activities.WALKING, Activities.JOGGING]
WHAT HAVE I TRIED:
My idea was to declare a count for every activity but that means that if I want to add another activity, I have to go and modify the code to add another count for that activity.
Another idea was to declare a HashMap<Activities, Integer> and iterate the array to insert each activity and its occurrence count. But then how would I extract the activities with the most occurrences?
Can you help me out guys?
The most common way of implementing something like this is counting with a Map: define a Map<MyEnum, Integer> which stores a zero for each element of your enumeration. Then walk through your list and increment the counter for each element that you find, maintaining the current max count as you go. Finally, walk through the counter map's entries and add to the output list the keys of all entries whose count matches max.
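A sketch of that counting approach, using an EnumMap and a stand-in Activities enum (the real enum may have more constants):

```java
import java.util.*;

public class MostCommon {
    enum Activities { WALKING, JOGGING, STANDING } // stand-in for the real enum

    static List<Activities> mostCommon(List<Activities> activities) {
        Map<Activities, Integer> counts = new EnumMap<>(Activities.class);
        int max = 0;
        for (Activities a : activities) {
            int c = counts.merge(a, 1, Integer::sum); // increment this activity's counter
            if (c > max) max = c;                     // maintain the current max
        }
        List<Activities> result = new ArrayList<>();
        for (Map.Entry<Activities, Integer> e : counts.entrySet())
            if (e.getValue() == max) result.add(e.getKey());
        return result;
    }

    public static void main(String[] args) {
        System.out.println(mostCommon(List.of(
                Activities.WALKING, Activities.WALKING,
                Activities.JOGGING, Activities.JOGGING,
                Activities.STANDING))); // [WALKING, JOGGING]
    }
}
```

An EnumMap means adding a new constant to Activities needs no code change here, which addresses the concern about per-activity counters.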
In statistics, this is called the "mode" (in your specific case, "multi mode" is also used, as you want all values that appear most often, not just one). A vanilla Java 8 solution looks like this:
Map<Activities, Long> counts =
    Stream.of(WALKING, WALKING, JOGGING, JOGGING, STANDING)
          .collect(Collectors.groupingBy(s -> s, Collectors.counting()));
long max = Collections.max(counts.values());
List<Activities> result = counts
    .entrySet()
    .stream()
    .filter(e -> e.getValue().longValue() == max)
    .map(Entry::getKey)
    .collect(Collectors.toList());
Which yields:
[WALKING, JOGGING]
jOOλ is a library that supports modeAll() on streams. The following program:
System.out.println(
    Seq.of(WALKING, WALKING, JOGGING, JOGGING, STANDING)
       .modeAll()
       .toList()
);
Yields:
[WALKING, JOGGING]
(disclaimer: I work for the company behind jOOλ)