Find most common/frequent element in an ArrayList in Java

I have an array list with 5 elements each of which is an Enum. I want to build a method which returns another array list with the most common element(s) in the list.
Example 1:
[Activities.WALKING, Activities.WALKING, Activities.WALKING, Activities.JOGGING, Activities.STANDING]
Method would return: [Activities.WALKING]
Example 2:
[Activities.WALKING, Activities.WALKING, Activities.JOGGING, Activities.JOGGING, Activities.STANDING]
Method would return: [Activities.WALKING, Activities.JOGGING]
WHAT HAVE I TRIED:
My idea was to declare a count for every activity but that means that if I want to add another activity, I have to go and modify the code to add another count for that activity.
Another idea was to declare a HashMap<Activities, Integer> and iterate over the list, inserting each activity together with its occurrence count. But then how would I extract the Activities with the most occurrences?
Can you help me out guys?

The most common way of implementing something like this is counting with a Map: define a Map<MyEnum,Integer> which stores zero for each element of your enumeration. Then walk through your list, and increment the counter for each element that you find in the list. At the same time, maintain the current max count. Finally, walk through the counter map entries, and add to the output list the key of every entry whose count equals max.
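The counting approach described above could look roughly like the sketch below. The Activities enum here is a hypothetical stand-in for the asker's type, and the class and method names are made up for illustration. It uses an EnumMap with merge() rather than pre-filling zeros, which is a small deviation from the description but gives the same counts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

final class MostCommonSketch {

    // Hypothetical enum standing in for the asker's Activities type.
    enum Activities { WALKING, JOGGING, STANDING }

    /** Returns every element that occurs most often; an empty input gives an empty list. */
    static List<Activities> mostCommon(List<Activities> input) {
        Map<Activities, Integer> counts = new EnumMap<>(Activities.class);
        int max = 0;
        for (Activities a : input) {
            // merge() stores 1 for a new key, otherwise adds 1 to the stored count.
            int c = counts.merge(a, 1, Integer::sum);
            max = Math.max(max, c);
        }
        List<Activities> result = new ArrayList<>();
        for (Map.Entry<Activities, Integer> e : counts.entrySet()) {
            if (e.getValue() == max) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(mostCommon(Arrays.asList(
                Activities.WALKING, Activities.WALKING,
                Activities.JOGGING, Activities.JOGGING,
                Activities.STANDING))); // [WALKING, JOGGING]
    }
}
```

Because the map is keyed by the enum itself, adding a new activity to the enum requires no change to the counting code. An EnumMap iterates in declaration order, so ties come back in the order the constants are declared.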

In statistics, this is called the "mode" (in your specific case, "multi mode" is also used, as you want all values that appear most often, not just one). A vanilla Java 8 solution looks like this:
Map<Activities, Long> counts =
Stream.of(WALKING, WALKING, JOGGING, JOGGING, STANDING)
.collect(Collectors.groupingBy(s -> s, Collectors.counting()));
long max = Collections.max(counts.values());
List<Activities> result = counts
.entrySet()
.stream()
.filter(e -> e.getValue().longValue() == max)
.map(Entry::getKey)
.collect(Collectors.toList());
Which yields:
[WALKING, JOGGING]
jOOλ is a library that supports modeAll() on streams. The following program:
System.out.println(
Seq.of(WALKING, WALKING, JOGGING, JOGGING, STANDING)
.modeAll()
.toList()
);
Yields:
[WALKING, JOGGING]
(disclaimer: I work for the company behind jOOλ)

Related

java 8 double findFirst on stream

I have a HashMap that contains a single value which is an ArrayList that also contains a single value as well. I need to extract the single value from the ArrayList. At the moment I'm doing it like this:
map.values()
.stream()
.findFirst()
.orElse(new ArrayList<>()).stream().findFirst().orElse(null)
This works, but I hope for a more elegant way to get the first element of a stream inside the first element of a stream.
In other words, I want to eliminate the double stream().findFirst().
Is that possible?
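One possible way to avoid the nested call is to flatten the lists into a single stream first. This is only a sketch, and the helper name firstNested is made up for illustration. Note one behavioural difference: it returns the first element of the first non-empty list, whereas the original expression only ever looks at the first list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FirstOfFirstSketch {

    /** Returns the first element of the first non-empty list among the map's values, or null. */
    static <K, V> V firstNested(Map<K, List<V>> map) {
        return map.values().stream()
                .flatMap(List::stream)   // one stream over the elements of all lists
                .findFirst()
                .orElse(null);
    }

    public static void main(String[] args) {
        Map<String, List<String>> map = new HashMap<>();
        map.put("key", new ArrayList<>(Arrays.asList("value")));
        String found = firstNested(map);
        System.out.println(found); // value

        String none = firstNested(new HashMap<String, List<String>>());
        System.out.println(none);  // null
    }
}
```

For a map that holds exactly one list with exactly one element, as described in the question, both versions return the same value.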

Remove all instances of an item in a list if it appears more than once

Given a List of numbers: { 4, 5, 7, 3, 5, 4, 2, 4 }
The desired output would be: { 7, 3, 2 }
The solution I am thinking of is create below HashMap from the given List:
Map<Integer, Integer> numbersCountMap = new HashMap();
where the key is a value from the list and the value is its occurrence count.
Then loop through the HashMap entry set and, wherever a number has a count greater than one, remove that number from the list.
for (Map.Entry<Int, Int> numberCountEntry : numbersCountMap.entrySet()) {
    if(numberCountEntry.getValue() > 1) {
        testList.remove(numberCountEntry.getKey());
    }
}
I am not sure whether this is an efficient solution to this problem, as the remove(Integer) operation on a list can be expensive. Also I am creating additional Map data structure. And looping twice, once on the original list to create the Map and then on the map to remove duplicates.
Could you please suggest a better way? Maybe Java 8 has a better way of implementing this.
Also, can we do it in a few lines using Streams and other new structures in Java 8?

With streams you can use:
Map<Integer, Long> grouping = integers.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
grouping.values().removeIf(c -> c > 1);
Set<Integer> result = grouping.keySet();
Or, as @Holger mentioned, all you want to know is whether each integer occurs more than once in your list, so just do:
Map<Integer, Boolean> grouping = integers.stream()
.collect(Collectors.toMap(Function.identity(),
x -> false, (a, b) -> true,
HashMap::new));
grouping.values().removeIf(b -> b);
// or
grouping.values().removeAll(Collections.singleton(true));
Set<Integer> result = grouping.keySet();

While YCF_L's answer does the thing and yields the correct result, I don't think it's a good solution to go with, since it mixes functional and procedural approaches by mutating the intermediary collection.
A functional approach would assume either of the following solutions:
Using intermediary variable:
Map<Integer, Boolean> map =
integers.stream()
.collect(toMap(identity(), x -> true, (a, b) -> false));
List<Integer> result = map.entrySet()
.stream()
.filter(Entry::getValue)
.map(Entry::getKey)
.collect(toList());
Note that we don't even care about the mutability of the map variable. Thus we can omit the 4th parameter of toMap collector.
Chaining two pipelines (similar to Alex Rudenko's answer):
List<Integer> result =
integers.stream()
.collect(toMap(identity(), x -> true, (a, b) -> false))
.entrySet()
.stream()
.filter(Entry::getValue)
.map(Entry::getKey)
.collect(toList());
This code is still safe, but less readable. Chaining two or more pipelines is discouraged.
Pure functional approach:
List<Integer> result =
integers.stream()
.collect(collectingAndThen(
toMap(identity(), x -> true, (a, b) -> false),
map -> map.entrySet()
.stream()
.filter(Entry::getValue)
.map(Entry::getKey)
.collect(toList())));
The intermediary state (the grouped map) does not get exposed to the outside world. So we may be sure nobody will modify it while we're processing the result.

It's overengineered for just this problem. Also, your code is faulty:
It's Integer, not Int (minor niggle)
More importantly, a remove call removes the first matching element, and to make matters considerably worse, remove on lists is overloaded: There's remove(int) which removes an element by index, and remove(Object) which removes an element by looking it up. In a List<Integer>, it is very difficult to know which one you're calling. You want the 'remove by lookup' one.
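To make the overload issue concrete, here is a small sketch (class and method names are made up for illustration) showing which overload is picked for an int versus an Integer argument, and how removeAll differs from both:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class RemoveOverloadSketch {

    /** Fresh mutable copy of the numbers from the question. */
    static List<Integer> sample() {
        return new ArrayList<>(Arrays.asList(4, 5, 7, 3, 5, 4, 2, 4));
    }

    public static void main(String[] args) {
        List<Integer> byIndex = sample();
        byIndex.remove(4);                     // int argument: removes the element at index 4
        System.out.println(byIndex);           // [4, 5, 7, 3, 4, 2, 4]

        List<Integer> byValue = sample();
        byValue.remove(Integer.valueOf(4));    // Object argument: removes only the first 4
        System.out.println(byValue);           // [5, 7, 3, 5, 4, 2, 4]

        List<Integer> allGone = sample();
        allGone.removeAll(Collections.singleton(4)); // removes every occurrence of 4
        System.out.println(allGone);           // [5, 7, 3, 5, 2]
    }
}
```

In the question's loop, getKey() returns an Integer, so remove(Object) is the overload that gets called, but it still removes only the first occurrence each time.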
On complexity:
On modern CPUs, it's not that simple. The CPU works on 'pages' of memory, and because fetching a new page takes on the order of 500 cycles or more, it makes more sense to simplify matters and consider any operation that does NOT require a new page of memory to be loaded, to be instantaneous.
That means that if we're talking about a list of, say, 10,000 numbers or fewer? None of it matters. It'll fly by. Any debate about 'efficiency' is meaningless until we get to larger counts.
Assuming that 'efficiency' is still relevant:
integers don't have hashcode collisions.
hashmaps with few to no key hash collisions are effectively O(1) on all single element ops such as 'add' and 'get'.
arraylist's .remove(Object) method is O(n). It takes longer the larger the list is. In fact, it is doubly O(n): it takes O(n) time to find the element you want to remove, and then O(n) time again to actually remove it. For fundamental informatics twice O(n) is still O(n) but pragmatically speaking, arrayList.remove(item) is pricey.
You're calling .remove about O(n) times, making the total complexity O(n^2). Not great, and not the most efficient algorithm. Practically or fundamentally.
An efficient strategy is probably to just commit to copying. A copy operation is O(n). For the whole thing, instead of O(n) per item. Sorting is O(n log n). This gives us a trivial algorithm:
Sort the input. Note that you can do this with an int[] too; as long as Java collections cannot hold primitives directly, an int[] is far more efficient than a List<Integer>.
loop through the sorted input. Don't immediately copy, but use an intermediate: For the 0th item in the list, remember only 'the last item was FOO' and 'how many times did I see foo?'. Then, for any item, check if it is the same as the previous. If yes, increment count. If not, check the count: if it was 1, add it to the output, otherwise don't. In any case, update the 'last seen value' to the new value and set the count to 1. At the end, make sure to add the last remembered value if the count is 1, and make sure your code works even for empty lists.
That's O(n log n) + O(n) complexity, which is O(n log n) - a lot better than your O(n^2) take.
Use int[], and add another step that you first go through juuust to count how large the output would be (because arrays cannot grow/shrink), and now you have a time complexity of O(n log n) + 2*O(n) which is still O(n log n), and the lowest possible memory complexity, as sort is in-place and doesn't cost any extra.
If you really want to tweak it, you can get by with no extra space at all (you can write the reduced list inside the input).
One problem with this strategy is that you mess up the ordering in the input. The algorithm would produce 2, 3, 7. If you want to preserve order, you can combine the hashmap solution with the 'sort, and make a copy as you loop' solution.
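The sort-and-scan steps above can be sketched roughly as follows. This is one possible reading, not the answerer's own code: it detects runs of equal values by their boundaries instead of tracking an explicit 'last seen' value and count, it sorts a copy so the caller's array is left alone, and it over-allocates the output and trims it rather than doing a separate counting pass. The class and method names are made up for illustration.

```java
import java.util.Arrays;

final class SortAndScanSketch {

    /** Returns the values that occur exactly once, in ascending order. */
    static int[] singles(int[] input) {
        int[] sorted = input.clone();   // copy so the caller's array keeps its order
        Arrays.sort(sorted);            // O(n log n)

        int[] out = new int[sorted.length];
        int outLen = 0;
        int i = 0;
        while (i < sorted.length) {
            int j = i + 1;
            // Advance j past the run of values equal to sorted[i].
            while (j < sorted.length && sorted[j] == sorted[i]) {
                j++;
            }
            if (j - i == 1) {           // run of length 1: the value is unique
                out[outLen++] = sorted[i];
            }
            i = j;
        }
        return Arrays.copyOf(out, outLen);
    }

    public static void main(String[] args) {
        int[] result = singles(new int[] {4, 5, 7, 3, 5, 4, 2, 4});
        System.out.println(Arrays.toString(result)); // [2, 3, 7]
    }
}
```

The scan touches each element once, so the total cost is dominated by the sort. An empty input simply skips the loop and yields an empty array.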

You may count the frequency of each number into LinkedHashMap to keep insertion order if it's relevant, then filter out the single numbers from the entrySet() and keep the keys.
List<Integer> data = Arrays.asList(4, 5, 7, 3, 5, 4, 2, 4);
List<Integer> singles = data.stream()
.collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()))
.entrySet().stream()
.filter(e -> e.getValue() == 1)
.map(Map.Entry::getKey)
.collect(Collectors.toList());
System.out.println(singles);
Printed output:
[7, 3, 2]

You can use 3-argument reduce method and walk through the stream only once, maintaining two sets of selected and rejected values.
final var nums = Stream.of(4, 5, 7, 3, 5, 4, 2, 4);
final var init = new Tuple<Set<Integer>>(new LinkedHashSet<Integer>(), new LinkedHashSet<Integer>());
final var comb = (BinaryOperator<Tuple<Set<Integer>>>) (a, b) -> a;
final var accum = (BiFunction<Tuple<Set<Integer>>, Integer, Tuple<Set<Integer>>>) (t, elem) -> {
    if (t.fst().contains(elem)) {
        t.snd().add(elem);
        t.fst().remove(elem);
    } else if (!t.snd().contains(elem)) {
        t.fst().add(elem);
    }
    return t;
};
Assertions.assertEquals(nums.reduce(init, accum, comb).fst(), Set.of(7, 3, 2));
In this example, Tuple was defined as record Tuple<T> (T fst, T snd) { }

I decided against the sublist method due to poor performance on large data sets. The following alternative is faster, and holds its own against stream solutions, probably because Set access to an element is in constant time. The downside is that it requires extra data structures. Given an ArrayList list of elements, this seems to work quite well.
Set<Integer> dups = new HashSet<>(list.size());
Set<Integer> result = new HashSet<>(list.size());
for (int i : list) {
    if (dups.add(i)) {
        result.add(i);
        continue;
    }
    result.remove(i);
}

Find Max of Multiple Lists

I'm pretty new to java streams and am trying to determine how to find the max from each list, in a list of lists, and end with a single list that contains the max from each sublist.
I can accomplish this by using a for loop and stream like so:
// databaseRecordsLists is a List<List<DatabaseRecord>>
List<DatabaseRecord> mostRecentRecords = new ArrayList<>();
for (List<DatabaseRecord> databaseRecords : databaseRecordsLists) {
    mostRecentRecords.add(databaseRecords.stream()
        .max(Comparator.comparing(DatabaseRecord::getTimestamp))
        .orElseThrow(NoSuchElementException::new));
}
I've looked into the flatMap api, but then I'll only end up with a single map of all DatabaseRecord objects, where I need a max from each individual list.
Any ideas on a cleaner way to accomplish this?

You don't need flatMap. Create a Stream<List<DatabaseRecord>>, and map each List<DatabaseRecord> of the Stream to the max element. Then collect all the max elements into the output List.
List<DatabaseRecord> mostRecentRecords =
databaseRecordsLists.stream()
.map(list -> list.stream()
.max(Comparator.comparing(DatabaseRecord::getTimestamp))
.orElseThrow(NoSuchElementException::new))
.collect(Collectors.toList());

Based on the comments, I suggest ignoring empty collections instead. Otherwise no result is returned and a NoSuchElementException is thrown, even though an empty collection might (?) be a valid state. If so, you can improve the current solution:
databaseRecordsLists.stream()
.filter(list -> !list.isEmpty()) // Only non-empty ones
.map(list -> list.stream()
.max(Comparator.comparing(DatabaseRecord::getTimestamp)) // Get these with max
.orElseThrow(NoSuchElementException::new)) // Never happens
.collect(Collectors.toList()); // To List
If you use a version higher than Java 8:
As of Java 10, orElseThrow(NoSuchElementException::new) can be substituted with orElseThrow().
As of Java 11, you can use Predicate.not(..), therefore the filter part would look like: .filter(Predicate.not(List::isEmpty)).
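Putting both notes together, a Java 11+ version might look like the sketch below. The DatabaseRecord class is a hypothetical minimal stand-in for the asker's type (only an id and a timestamp are assumed), and the class and method names are made up for illustration.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class LatestPerListSketch {

    // Hypothetical minimal stand-in for the asker's DatabaseRecord.
    static final class DatabaseRecord {
        private final String id;
        private final Instant timestamp;

        DatabaseRecord(String id, Instant timestamp) {
            this.id = id;
            this.timestamp = timestamp;
        }

        String getId() { return id; }
        Instant getTimestamp() { return timestamp; }
    }

    /** Returns the most recent record of every non-empty sublist, in sublist order. */
    static List<DatabaseRecord> mostRecent(List<List<DatabaseRecord>> lists) {
        return lists.stream()
                .filter(Predicate.not(List::isEmpty))   // Java 11: skip empty sublists
                .map(list -> list.stream()
                        .max(Comparator.comparing(DatabaseRecord::getTimestamp))
                        .orElseThrow())                 // Java 10: no-arg orElseThrow()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        DatabaseRecord a = new DatabaseRecord("a", Instant.ofEpochSecond(10));
        DatabaseRecord b = new DatabaseRecord("b", Instant.ofEpochSecond(20));
        DatabaseRecord c = new DatabaseRecord("c", Instant.ofEpochSecond(5));
        List<DatabaseRecord> empty = List.of();

        List<DatabaseRecord> latest = mostRecent(List.of(List.of(a, b), empty, List.of(c)));
        latest.forEach(r -> System.out.println(r.getId())); // prints b, then c
    }
}
```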

Java Stream. Extracting distinct values of multiple maps

I am trying to extract and count the number of different elements in the values of a map. The thing is that it's not just a map, but many of them and they're to be obtained from a list of maps.
Specifically, I have a Tournament class with a List<Event> event member. An event has a Map<Localization, Set<Timeslot>> unavailableLocalizations member. I would like to count the distinct values across all of those timeslot values.
So far I managed to count the distinct values in just one map like this:
event.getUnavailableLocalizations().values().stream().distinct().count()
But what I can't figure out is how to do that for all the maps instead of for just one.
I guess I would need some way to take each event's map's values and put all that into the stream, then the rest would be just as I did.

Let's do it step by step:
listOfEvents.stream() //stream the events
.map(Event::getUnavailableLocalizations) //for each event, get the map
.map(Map::values) //get the values
.flatMap(Collection::stream) //flatMap to merge all the values into one stream
.distinct() //remove duplicates
.count(); //count
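Note that the pipeline above compares whole Set<Timeslot> values with each other. If the goal is instead to count the distinct individual timeslots, one more flatMap is needed. The sketch below assumes that reading of the question. The Event class is a hypothetical stand-in, with String and Integer used in place of Localization and Timeslot.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DistinctTimeslotsSketch {

    // Hypothetical stand-in for the asker's Event; String replaces Localization, Integer replaces Timeslot.
    static final class Event {
        private final Map<String, Set<Integer>> unavailableLocalizations;

        Event(Map<String, Set<Integer>> unavailableLocalizations) {
            this.unavailableLocalizations = unavailableLocalizations;
        }

        Map<String, Set<Integer>> getUnavailableLocalizations() {
            return unavailableLocalizations;
        }
    }

    /** Counts distinct sets of timeslots, as in the pipeline above. */
    static long countDistinctSets(List<Event> events) {
        return events.stream()
                .map(Event::getUnavailableLocalizations)
                .flatMap(m -> m.values().stream())   // Stream<Set<Integer>>
                .distinct()
                .count();
    }

    /** Counts distinct individual timeslots across all events. */
    static long countDistinctTimeslots(List<Event> events) {
        return events.stream()
                .map(Event::getUnavailableLocalizations)
                .flatMap(m -> m.values().stream())   // Stream<Set<Integer>>
                .flatMap(Set::stream)                // Stream<Integer>
                .distinct()
                .count();
    }

    public static void main(String[] args) {
        Event first = new Event(Map.of("A", Set.of(1, 2), "B", Set.of(2, 3)));
        Event second = new Event(Map.of("A", Set.of(1, 2)));
        List<Event> events = List.of(first, second);

        System.out.println(countDistinctSets(events));      // 2
        System.out.println(countDistinctTimeslots(events)); // 3
    }
}
```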

Java Streams: Organize a collection into a map and select smallest key

I'm pretty sure this is not possible in one line, but I just wanted to check:
List<WidgetItem> selectedItems = null;
Map<Integer, List<WidgetItem>> itemsByStockAvailable = WidgetItems.stream()
.collect(Collectors.groupingBy(WidgetItem::getAvailableStock));
selectedItems = itemsByStockAvailable.get(
itemsByStockAvailable.keySet().stream().sorted().findFirst().get());
Basically I'm collecting all widget items into a map where the key is the availableStock quantity and the value is a list of all widgets that have that quantity (since multiple widgets might have the same value). Once I have that map, I would want to select the map's value that corresponds to the smallest key. The intermediate step of creating a Map isn't necessary, it's just the only way I could think of to do this.

It appears what you want is to keep all the widget items that were grouped with the lowest available stock. In that case, you can collect the grouped data into a TreeMap to ensure the ordering based on increasing values of the stock and retrieve the first entry with firstEntry()
List<WidgetItem> selectedItems =
widgetItems.stream()
.collect(Collectors.groupingBy(
WidgetItem::getAvailableStock,
TreeMap::new,
Collectors.toList()
))
.firstEntry()
.getValue();
The advantage is that it is done in one pass over the initial list.

Essentially you want to get all the input elements which are minimal according to the custom comparator Comparator.comparingInt(WidgetItem::getAvailableStock). In general this problem can be solved without storing everything in an intermediate map, which creates unnecessary garbage. It can also be solved in a single pass. Some interesting solutions are already present in this question. For example, you may use the collector implemented by Stuart Marks:
List<WidgetItem> selectedItems = widgetItems.stream()
.collect(maxList(
Comparator.comparingInt(WidgetItem::getAvailableStock).reversed()));
Such collectors are readily available in my StreamEx library. The most suitable one in your case is MoreCollectors.minAll(Comparator):
List<WidgetItem> selectedItems = widgetItems.stream()
.collect(MoreCollectors.minAll(
Comparator.comparingInt(WidgetItem::getAvailableStock)));

If you want to avoid creating the intermediate map, you can first determine the smallest stock value, filter by that value and collect to list.
int minStock = widgetItems.stream()
.mapToInt(WidgetItem::getAvailableStock)
.min()
.getAsInt(); // or throw if list is empty
List<WidgetItem> selectItems = widgetItems.stream()
.filter(w -> minStock == w.getAvailableStock())
.collect(toList());
Also, do not use sorted().findFirst() to find the min value of a stream. Use min instead.

You can find the smallest key in a first pass and then get all the items having that smallest key:
widgetItems.stream()
.map(WidgetItem::getAvailableStock)
.min(Comparator.naturalOrder())
.map(min ->
widgetItems.stream()
.filter(item -> item.getAvailableStock().equals(min))
.collect(toList()))
.orElse(Collections.emptyList());

I would collect the data into a NavigableMap, which involves only a small change to your original code:
List<WidgetItem> selectedItems = null;
NavigableMap<Integer, List<WidgetItem>> itemsByStockAvailable =
WidgetItems.stream()
.collect(Collectors.groupingBy(WidgetItem::getAvailableStock,
TreeMap::new, Collectors.toList()));
selectedItems = itemsByStockAvailable.firstEntry().getValue();
