create map of int occurrences using Java 8

I am aware my question is very similar to Count int occurrences with Java8, but I still cannot solve my case, which should be even simpler.
I need to count how many times each integer repeats in a stream of integers (they will come from a file, and there may be up to 1,000,000 of them). I thought it might be useful to create a map where the Integer is the key and the number of occurrences is the value.
The exception is
Error:(61, 66) java: method collect in interface java.util.stream.IntStream cannot be applied to given types;
  required: java.util.function.Supplier<R>, java.util.function.ObjIntConsumer<R>, java.util.function.BiConsumer<R,R>
  found: java.util.stream.Collector<Object,?,java.util.Map<Object,java.lang.Long>>
  reason: cannot infer type-variable(s) R
    (actual and formal argument lists differ in length)
However, in Java 8 there is a Collectors.groupingBy, which should suffice:
Collector<T, ?, Map<K, D>> groupingBy(Function<? super T, ? extends K> classifier, Collector<? super T, A, D> downstream)
The problem is that my code does not compile and I cannot see why.
I simplified it to this:
Map<Integer, Integer> result = IntStream.range(0, 100)
    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
What is the reason for not compiling?
Thank you in advance :)

IntStream has a single collect method, whose second argument operates on an int rather than an Object, so it does not accept a Collector. Calling boxed() turns the IntStream into a Stream<Integer>, which does.
Also note that Collectors.counting() produces a Long, not an Integer.
Map<Integer, Long> result = IntStream.range(0, 100).boxed()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
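If you would rather stay on the IntStream, here is a minimal sketch (my own, not from the original answer) of its three-argument collect; the ints are only boxed when they become map keys:
Map<Integer, Long> counts = IntStream.range(0, 100).collect(
        HashMap::new,                                         // supplier: the mutable result container
        (map, value) -> map.merge(value, 1L, Long::sum),      // ObjIntConsumer: count each int
        (left, right) -> right.forEach((k, v) -> left.merge(k, v, Long::sum))); // combiner for parallel use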

I have solved the task at hand using Peter's idea.
I'm posting the solution in case somebody is studying Java 8 and does not want to repeat my mistakes.
The task was to:
read numbers from file
find how often each number occurs
find how many pairs can be found for numbers occurring more than once. For example, if number 3 occurs 4 times, we will have 6 pairs (I used Apache's CombinatoricsUtils.binomialCoefficient for that).
My solution:
long result = Arrays.stream(
        Files.lines(Paths.get(fileName)).mapToInt(Integer::parseInt)
             .collect(() -> new int[BYTE_MAX_VALUE],                       // one counter slot per possible value
                      (array, value) -> array[value] += 1,                 // count each number
                      (a1, a2) -> Arrays.setAll(a1, i -> a1[i] + a2[i])))  // merge partial counts
        .map(i -> combinatorics(i, 2)).sum();

If you're open to using a third party library with primitive collections, you can potentially avoid the boxing operations. For example, if you use Eclipse Collections, you can write the following.
IntBag integers = Interval.oneTo(100).collectInt(i -> i % 10).toBag();
Assert.assertEquals(10, integers.occurrencesOf(0));
Assert.assertEquals(10, integers.occurrencesOf(1));
Assert.assertEquals(10, integers.occurrencesOf(9));
An IntHashBag is implemented by using an IntIntHashMap, so neither the keys (your integers) nor the values (the counts) are boxed.
The same can be accomplished if you loop through your file and add your results to an IntHashBag from an IntStream.
MutableIntBag integers = IntBags.mutable.empty();
IntStream.range(1, 101).map(i -> i % 10).forEach(integers::add);
Assert.assertEquals(10, integers.occurrencesOf(0));
Assert.assertEquals(10, integers.occurrencesOf(1));
Assert.assertEquals(10, integers.occurrencesOf(9));
Note: I am a committer for Eclipse Collections.


Remove all instances of an item in a list if it appears more than once

Given a List of numbers: { 4, 5, 7, 3, 5, 4, 2, 4 }
The desired output would be: { 7, 3, 2 }
The solution I am thinking of is to create the below HashMap from the given List:
Map<Integer, Integer> numbersCountMap = new HashMap();
where the key is a value from the list and the value is its occurrence count.
Then loop through the HashMap entry set and, wherever a number has a count greater than one, remove that number from the list.
for (Map.Entry<Int, Int> numberCountEntry : numbersCountMap.entrySet()) {
    if (numberCountEntry.getValue() > 1) {
        testList.remove(numberCountEntry.getKey());
    }
}
I am not sure whether this is an efficient solution, as the remove(Integer) operation on a list can be expensive. I am also creating an additional Map data structure and looping twice: once over the original list to create the map, and then over the map to remove duplicates.
Could you please suggest a better way? Maybe Java 8 has a better way of implementing this.
Also, can we do it in a few lines using streams and the other new constructs in Java 8?
Using streams you can do:
Map<Integer, Long> grouping = integers.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
grouping.values().removeIf(c -> c > 1);
Set<Integer> result = grouping.keySet();
Or, as @Holger mentioned, all you really need to know is whether an integer appears more than once in your list, so just do:
Map<Integer, Boolean> grouping = integers.stream()
.collect(Collectors.toMap(Function.identity(),
x -> false, (a, b) -> true,
HashMap::new));
grouping.values().removeIf(b -> b);
// or
grouping.values().removeAll(Collections.singleton(true));
Set<Integer> result = grouping.keySet();
While YCF_L's answer does the job and yields the correct result, I don't think it's a good solution to go with, since it mixes functional and procedural approaches by mutating the intermediate collection.
A functional approach would be either of the following solutions:
Using an intermediate variable:
Map<Integer, Boolean> map =
integers.stream()
.collect(toMap(identity(), x -> true, (a, b) -> false));
List<Integer> result = map.entrySet()
.stream()
.filter(Entry::getValue)
.map(Entry::getKey)
.collect(toList());
Note that we don't even care about the mutability of the map variable. Thus we can omit the 4th parameter of toMap collector.
Chaining two pipelines (similar to Alex Rudenko's answer):
List<Integer> result =
integers.stream()
.collect(toMap(identity(), x -> true, (a, b) -> false))
.entrySet()
.stream()
.filter(Entry::getValue)
.map(Entry::getKey)
.collect(toList());
This code is still safe, but less readable. Chaining two or more pipelines is discouraged.
Pure functional approach:
List<Integer> result =
integers.stream()
.collect(collectingAndThen(
toMap(identity(), x -> true, (a, b) -> false),
map -> map.entrySet()
.stream()
.filter(Entry::getValue)
.map(Entry::getKey)
.collect(toList())));
The intermediary state (the grouped map) does not get exposed to the outside world. So we may be sure nobody will modify it while we're processing the result.
It's overengineered for just this problem. Also, your code is faulty:
It's Integer, not Int (minor niggle)
More importantly, a remove call removes the first matching element, and to make matters considerably worse, remove on lists is overloaded: There's remove(int) which removes an element by index, and remove(Object) which removes an element by looking it up. In a List<Integer>, it is very difficult to know which one you're calling. You want the 'remove by lookup' one.
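A tiny illustration (my own, not from the original answer) of the two overloads:
List<Integer> list = new ArrayList<>(List.of(10, 20, 30));
list.remove(1);                    // remove(int index): removes the element at index 1, i.e. 20
list.remove(Integer.valueOf(30));  // remove(Object): looks up the value 30 and removes it
System.out.println(list);          // prints [10]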
On complexity:
On modern CPUs, it's not that simple. The CPU works on 'pages' of memory, and because fetching a new page takes on the order of 500 cycles or more, it makes more sense to simplify matters and consider any operation that does NOT require loading a new page of memory to be instantaneous.
That means that if we're talking about a list of, say, 10,000 numbers or fewer? None of it matters. It'll fly by. Any debate about 'efficiency' is meaningless until we get to larger counts.
Assuming that 'efficiency' is still relevant:
Integers don't have hashCode collisions (an Integer's hash code is its own int value).
HashMaps with few to no key hash collisions are effectively O(1) for all single-element operations such as 'add' and 'get'.
arraylist's .remove(Object) method is O(n). It takes longer the larger the list is. In fact, it is doubly O(n): it takes O(n) time to find the element you want to remove, and then O(n) time again to actually remove it. For fundamental informatics twice O(n) is still O(n) but pragmatically speaking, arrayList.remove(item) is pricey.
You're calling .remove about O(n) times, making the total complexity O(n^2). Not great, and not the most efficient algorithm. Practically or fundamentally.
An efficient strategy is probably to just commit to copying. A copy operation is O(n) for the whole thing, instead of O(n) per item, and sorting is O(n log n). This gives us a trivial algorithm:
Sort the input. Note that you can do this with an int[] too; until java 16 is out and you can use primitives in collections, int[] is an order of magnitude more efficient than a List<Integer>.
loop through the sorted input. Don't immediately copy, but use an intermediate state: remember 'the last item was FOO' and 'how many times have I seen it?'. Then, for each item, check whether it is the same as the previous one. If yes, increment the count. If not, check the count: if it was 1, add the previous value to the output, otherwise don't; in either case, update the 'last seen value' to the new value and reset the count to 1. At the end, make sure to add the last remembered value if its count is 1, and make sure your code works even for empty lists (a sketch of this loop follows at the end of this answer).
That's O(n log n) + O(n) complexity, which is O(n log n) - a lot better than your O(n^2) take.
Use int[], and add another step that you first go through juuust to count how large the output would be (because arrays cannot grow/shrink), and now you have a time complexity of O(n log n) + 2*O(n) which is still O(n log n), and the lowest possible memory complexity, as sort is in-place and doesn't cost any extra.
If you really want to tweak it, you can go with a space complexity of 0 (you can write the reduced list inside the input).
One problem with this strategy is that it messes up the ordering of the input: the algorithm would produce 2, 3, 7. If you want to preserve the original order, you can combine the hashmap solution with a 'copy as you loop' approach.
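For reference, here is a minimal sketch (my own, not from the original answer) of the sort-and-count loop described above. It returns the values that occur exactly once, in sorted rather than original order:
static int[] singles(int[] input) {
    int[] sorted = input.clone();          // keep the caller's array intact
    Arrays.sort(sorted);
    int[] out = new int[sorted.length];    // worst case: every value is unique
    int outSize = 0;
    int count = 0;
    int last = 0;                          // only meaningful while count > 0
    for (int value : sorted) {
        if (count > 0 && value == last) {
            count++;                       // same as the previous value: extend the run
        } else {
            if (count == 1) {
                out[outSize++] = last;     // previous run had length 1, so it's unique
            }
            last = value;                  // start a new run
            count = 1;
        }
    }
    if (count == 1) {
        out[outSize++] = last;             // don't forget the final run
    }
    return Arrays.copyOf(out, outSize);
}
For the example input {4, 5, 7, 3, 5, 4, 2, 4} this returns {2, 3, 7}.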
You may count the frequency of each number into a LinkedHashMap (to keep insertion order, if that's relevant), then filter the entrySet() for the numbers that occur only once and keep their keys.
List<Integer> data = Arrays.asList(4, 5, 7, 3, 5, 4, 2, 4);
List<Integer> singles = data.stream()
.collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()))
.entrySet().stream()
.filter(e -> e.getValue() == 1)
.map(Map.Entry::getKey)
.collect(Collectors.toList());
System.out.println(singles);
Printed output:
[7, 3, 2]
You can use the 3-argument reduce method and walk through the stream only once, maintaining two sets: selected and rejected values.
final var nums = Stream.of(4, 5, 7, 3, 5, 4, 2, 4);
final var init = new Tuple<Set<Integer>>(new LinkedHashSet<Integer>(), new LinkedHashSet<Integer>());
final var comb = (BinaryOperator<Tuple<Set<Integer>>>) (a, b) -> a;
final var accum = (BiFunction<Tuple<Set<Integer>>, Integer, Tuple<Set<Integer>>>) (t, elem) -> {
    if (t.fst().contains(elem)) {
        t.snd().add(elem);
        t.fst().remove(elem);
    } else if (!t.snd().contains(elem)) {
        t.fst().add(elem);
    }
    return t;
};
Assertions.assertEquals(nums.reduce(init, accum, comb).fst(), Set.of(7, 3, 2));
In this example, Tuple was defined as record Tuple<T>(T fst, T snd) { }
I decided against the sublist method due to poor performance on large data sets. The following alternative is faster and holds its own against stream solutions, probably because Set access to an element is constant time. The downside is that it requires extra data structures. Given an ArrayList list of elements, this seems to work quite well.
Set<Integer> dups = new HashSet<>(list.size());
Set<Integer> result = new HashSet<>(list.size());
for (int i : list) {
    if (dups.add(i)) {
        result.add(i);
        continue;
    }
    result.remove(i);
}

How can I collect a list of strings to a map, where each string is a key? [duplicate]

I am working on an exercise to count words in a phrase.
I have a regex I'm happy with to split the phrase into word tokens, so I can complete the work with basic loops - no problem.
But I'd like to use streams to collect the strings into a map instead of using basic loops.
I need each word as a key and, for now, I'd just like the integer 1 as the value.
Having done some research online I should be able to collect the list of words into a map like so:
public Map<String, Integer> phrase(String phrase) {
List<String> words = //... tokenized words from phrase
return words.stream().collect(Collectors.toMap(word -> word, 1));
}
I have tried this, and several variations (casting word, using Function.identity()), but keep getting the error:
The method toMap(Function<? super T,? extends K>, Function<? super T,? extends U>) in the type Collectors is not applicable for the arguments ((<no type> s) -> {}, int)
Any example I've found to date only uses the string as the value, but otherwise indicates that this should be OK.
What do I need to change to make this work?
To get over the compilation error, you need:
return words.stream().collect(Collectors.toMap(word -> word, word -> 1));
however, this would result in all the values of the Map being 1, and if you have duplicate elements in words, you'll get an exception.
You'll need to either use Collectors.groupingBy or Collectors.toMap with a merge function to handle duplicate values.
For example
return words.stream().collect(Collectors.groupingBy(word -> word, Collectors.counting()));
or
return words.stream().collect(Collectors.toMap(word -> word, word -> 1, Integer::sum));
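A quick comparison with hypothetical input shows the difference in value types, which matters here because the method is declared to return Map<String, Integer>:
List<String> words = List.of("to", "be", "or", "not", "to", "be");
Map<String, Long> viaGroupingBy = words.stream()
        .collect(Collectors.groupingBy(word -> word, Collectors.counting()));
Map<String, Integer> viaToMap = words.stream()
        .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum));
System.out.println(viaGroupingBy); // e.g. {not=1, be=2, or=1, to=2}
System.out.println(viaToMap);      // e.g. {not=1, be=2, or=1, to=2}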

RxJava - groupBy, toMap and flatMap not working well together?

I would expect this small example to print all numbers which are divisible by 3.
@Test
public void test() {
    Observable.range(1, 100)
              .groupBy(n -> n % 3)
              .toMap(g -> g.getKey())
              .flatMap(m -> m.get(0))
              .subscribe(System.out::println);
}
Instead, the println is not printing anything, and I don't get why.
I reduced this example from a more complex one. I understand this can be done in a different way, but I need it this way, as there are more groups involved which need to be manipulated in the flatMap at the same time.
Thanks for your help!
Use the method filter(Predicate<? super T> predicate) instead of groupBy(..) to emit the elements that satisfy a specified predicate.
Observable.range(1, 100)
          .filter(n -> n % 3 == 0)
          .subscribe(System.out::println);
The Java Stream API works on the same principle:
IntStream.range(1, 100).filter(n -> n % 3 == 0).forEach(System.out::println);
// prints 3, 6, 9, 12... each on its own line
The toMap operator collects the source items into a HashMap in which, for each key, only the latest value is kept.
Quoting the documentation:
If more than one source item maps to the same key, the HashMap will contain the latest of those items.
So in your case the toMap operator transforms the input sequence into the following HashMap:
{0=99, 1=100, 2=98}
To filter out all numbers not divisible by 3 from the specified range, just use the filter operator as @Nikolas advised.
As @akarnokd stated in a comment above, the example works with the latest versions of RxJava 1.x and 2.x, but doesn't work on the version we are using (com.netflix.rxjava:rxjava-core:jar:0.16.1, which is ancient), probably due to a bug in the library itself.

java 8 make a stream of the multiples of two

I'm practicing streams in Java 8 and I'm trying to make a Stream<Integer> containing the multiples of 2. There are several tasks in one main class, so I won't post the whole block, but what I have so far is this:
Integer twoToTheZeroth = 1;
UnaryOperator<Integer> doubler = (Integer x) -> 2 * x;
Stream<Integer> result = ?;
My question here probably isn't strongly related to streams but rather to the syntax: how should I use the doubler to get the result?
Thanks in advance!
You can use Stream.iterate.
Stream<Integer> result = Stream.iterate(twoToTheZeroth, doubler);
or using the lambda directly
Stream.iterate(1, x -> 2*x);
The first argument is the "seed" (i.e. the first element of the stream); the operator is then applied successively to produce each subsequent element as the stream is consumed.
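Since the resulting stream is infinite, you would typically cap it with limit. A short usage sketch (my own, not from the original answer):
Stream.iterate(1, x -> 2 * x)
      .limit(10)
      .forEach(System.out::println); // prints 1, 2, 4, ..., 512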
EDIT:
As Vinay points out, this will result in the stream being filled with 0s eventually (this is due to int overflow). To prevent that, maybe use BigInteger:
Stream.iterate(BigInteger.ONE,
x -> x.multiply(BigInteger.valueOf(2)))
.forEach(System.out::println);
Arrays.asList(1,2,3,4,5).stream().map(x -> x * x).forEach(x -> System.out.println(x));
so you can apply the doubler in the map call in the same way.

2D Array stream reduce in java

I'm new to using streams in Java and I have a question about them.
I have a double[][] whose elements I want to sum. I have written the following, taking an approach similar to C# LINQ, but it doesn't seem to work.
Arrays.stream(myArray).reduce(0,(acc, i) -> acc + Arrays.stream(i).sum());
The error is that acc is inferred to be a double[], so it can't perform double[] + double.
In C# LINQ the accumulator is assumed to be the same type as the seed (0 in this case). What am I missing here?
Thanks in advance.
If you look at the signature of reduce, the type of the identity has to be the type of the stream's elements, which would be double[] in this case. That would also give acc the type double[].
There is an overload where you can supply an accumulator of a different type, but you also need to pass a combiner, to combine 2 accumulators.
You can do this:
double result = Arrays.stream(myArray)
.reduce(0D, (acc, i) -> acc + Arrays.stream(i).sum(), Double::sum);
Where 0D is a double literal, and Double::sum is used to combine 2 accumulators.
Alternatively, it might be more convenient to do this:
double result = Arrays.stream(myArray)
.flatMapToDouble(DoubleStream::of)
.sum();
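A quick check with made-up data (my own, not from the question) shows both forms give the same result:
double[][] myArray = { {1, 2, 3}, {4, 5, 6} };
double viaReduce = Arrays.stream(myArray)
        .reduce(0D, (acc, row) -> acc + Arrays.stream(row).sum(), Double::sum);
double viaFlatMap = Arrays.stream(myArray)
        .flatMapToDouble(DoubleStream::of)
        .sum();
System.out.println(viaReduce + " " + viaFlatMap); // prints 21.0 21.0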
