Java Lambda stream into different collections

I have a Java lambda stream that parses a file and stores the results into a collection, based on some basic filtering.
I'm just learning lambdas so bear with me here if this is ridiculously bad. But please feel free to point out my mistakes.
For a given file:
#ignored
this
is
#ignored
working
fine
The code:
List<String> matches;
Stream<String> g = Files.lines(Paths.get(givenFile));
matches = g.filter(line -> !line.startsWith("#"))
.collect(Collectors.toList());
["this", "is", "working", "fine"]
Now, how would I go about collecting the ignored lines into a second list within this same stream? Something like:
List<String> matches;
List<String> ignored; // to store lines that start with #
Stream<String> g = Files.lines(Paths.get(exclusionFile.toURI()));
matches = g.filter(line -> !line.startsWith("#"))
// how can I add a condition to throw these
// non-matching lines into the ignored collection?
.collect(Collectors.toList());
I realize it would be pretty trivial to open a new stream, alter the logic a bit, and .collect() the ignored lines easily enough. But I don't want to have to loop through this file twice if I can do it all in one stream.

Instead of two streams, you can use Collectors.partitioningBy:
List<String> strings = Arrays.asList("#ignored", "this", "is", "#ignored", "working", "fine");
Map<Boolean, List<String>> map = strings.stream().collect(Collectors.partitioningBy(s -> s.startsWith("#")));
System.out.println(map);
output
{false=[this, is, working, fine], true=[#ignored, #ignored]}
Here I used a Boolean key, but you can change it to a meaningful string or enum by using groupingBy instead.
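Applied to the original file-reading example, a minimal sketch (assuming givenFile is a valid path, inside a method that declares throws IOException; the try-with-resources just makes sure the file handle is closed) could be:
// Partition the file's lines in a single pass: true -> ignored, false -> matches.
try (Stream<String> lines = Files.lines(Paths.get(givenFile))) {
    Map<Boolean, List<String>> partitioned =
            lines.collect(Collectors.partitioningBy(line -> line.startsWith("#")));
    List<String> ignored = partitioned.get(true);  // [#ignored, #ignored]
    List<String> matches = partitioned.get(false); // [this, is, working, fine]
}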
EDIT
If the strings can start with some other special characters, you could use groupingBy:
List<String> strings = Arrays.asList("#ignored", "this", "is", "#ignored", "working", "fine", "!Someother", "*star");
Function<String, String> classifier = s -> {
    if (s.matches("^[!@#$%^&*].*")) {
        return Character.toString(s.charAt(0));
    } else {
        return "others";
    }
};
Map<String, List<String>> maps = strings.stream().collect(Collectors.groupingBy(classifier));
System.out.println(maps);
Output
{!=[!Someother], #=[#ignored, #ignored], *=[*star], others=[this, is, working, fine]}
Also, you can nest groupingBy and partitioningBy.
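For example, a small sketch of nesting with the same strings list: partition by "starts with a special character", and group each partition by its first character (iteration order of the inner HashMap is not guaranteed):
Map<Boolean, Map<String, List<String>>> nested = strings.stream()
        .collect(Collectors.partitioningBy(
                s -> s.matches("^[!@#$%^&*].*"),
                Collectors.groupingBy(s -> Character.toString(s.charAt(0)))));
// e.g. {false={t=[this], i=[is], w=[working], f=[fine]}, true={!=[!Someother], #=[#ignored, #ignored], *=[*star]}}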

I think the closest you could come to a generic approach for this would be something like peek:
g.peek(line -> {
    if (line.startsWith("#")) {
        ignored.add(line);
    }
})
.filter(line -> !line.startsWith("#"))
.collect(Collectors.toList());
I mention it because, unlike with the partitioning Collector, you could, at least in theory, chain together however many peeks you want--but, as you can see, you have to duplicate logic, so it's not ideal.

Related

Append lists according to their size in Java

I am trying to append two lists according to their size, with the bigger list in front.
I have a few lists like this:
List<Pair<Double, String>> masterList = new ArrayList<>();
and this is the working Java code that I tried first - a simple if-else:
if (listOne.size() >= listTwo.size()) {
    masterList.addAll(listOne);
    masterList.addAll(listTwo);
} else {
    masterList.addAll(listTwo);
    masterList.addAll(listOne);
}
masterList.addAll(otherList); // and at the end all other lists can be added without any condition
I am fairly new to Java, so I was studying it and came across Comparators and Lambdas. So I tried to use those for my code, something like this:
List<Pair<Double, String>> masterList = Stream.concat(listOne.stream(), listTwo.stream())
        .filter(Comparator.comparingInt(List::size))
        .collect(Collectors.toList());
But I am not able to achieve proper results.
Can someone point out my mistake, I am still trying to learn.
The for-loop is very nice and a Stream isn't necessary, but to answer the question, you may:
not use concat, as it already joins the lists and you lose the distinction between the different lists
not use filter, but rather sorted
then flatMap to go from Stream<List<Pair<>>> to Stream<Pair<>>
List<Pair<Double, String>> masterList = Stream.of(listOne, listTwo)
.sorted(Comparator.comparing(List::size, Comparator.reverseOrder()))
.flatMap(List::stream)
.collect(Collectors.toList());
masterList.addAll(otherList);
It may be possible to use Stream.concat to join the contents of otherList and thus to get rid of masterList.addAll
Also, here is an example using the Comparator.reversed() method:
List<Pair<Double, String>> masterList = Stream.concat(
Stream.of(listOne, listTwo) // Stream<List<Pair>>
.sorted(Comparator.<List>comparingInt(List::size).reversed())
.flatMap(List::stream), // Stream<Pair>
otherList.stream() // Stream<Pair>
)
.collect(Collectors.toList());
However, a ternary operator should do fine as well to detect a longer list to place in the beginning:
List<Pair<Double, String>> masterList2 = Stream.concat(
(listOne.size() >= listTwo.size()
? Stream.of(listOne, listTwo)
: Stream.of(listTwo, listOne)
)
.flatMap(List::stream),
otherList.stream()
)
.collect(Collectors.toList());

Existing condition into Java 8 stream

How can I convert the condition below to the Java 8 streams way?
List<String> name = Arrays.asList("A", "B", "C");
String id;
if (name.contains("A")) {
    id = "123";
} else if (name.contains("B")) {
    id = "234";
} else if (name.contains("C")) {
    id = "345";
}
I am in the process of learning Streams and was wondering how I can convert this one. I tried forEach, map, and filter, but couldn't get it to work.
Yet another (but compact) solution:
Arrays.asList("B", "C", "A", "D").stream()
.map(s -> s.equals("A") ? new SimpleEntry<>(1, "123")
: s.equals("B") ? new SimpleEntry<>(2, "234")
: s.equals("C") ? new SimpleEntry<>(3, "345")
: null)
.filter(x -> x != null)
.reduce((a, b) -> a.getKey() < b.getKey() ? a : b)
.map(Entry::getValue)
.ifPresent(System.out::println);
I cannot see why you have to convert it to a stream; this doesn't seem like a stream API case to me.
But if you want to easily add new items and make the code more readable, I suggest using a map instead.
private static final ImmutableMap<String, String> nameToId = new ImmutableMap.Builder<String, String>()
.put("A", "123")
.put("B", "234")
.put("C", "345")
.build();
Now you can add new items without changing much code and just call nameToId.get(name) to fetch id by name.
You can add more flexibility here using streams:
Stream.of("A", "B", "C").map(nameToId::get).collect(Collectors.toList());
Inspired by Serghey Bishyr's answer using a map, I also used a map (but an ordered one), and I go through the keys of the map instead of the list to find the appropriate id. That might of course not be the best solution, but you can play with Streams that way ;-)
Map<String, String> nameToId = new LinkedHashMap<>();
// the following order reflects the order of your conditions!
// (if your first condition checked "B", you would move "B" to the first position)
nameToId.put("A", "123");
nameToId.put("B", "234");
nameToId.put("C", "345");

List<String> name = Arrays.asList("A", "B", "C");
String id = nameToId.keySet()
        .stream()
        .filter(name::contains)
        .findFirst()
        .map(nameToId::get)
        .orElse(null);
You gain nothing really... don't try to put too much into the filtering predicates or mapping functions, because then your Stream solution might not be that readable anymore.
The problem you describe is to get a single value (id) from application of a function to two input sets: the input values and the mappings.
id = f(list,mappings)
So basically your question is to find an f that is based on streams (in other words, solutions that return a list don't solve your problem).
First of all, the original if-else-if-else construct mixes three concerns:
input validation (only considering the value set "A","B","C")
mapping an input value to an output value ("A" -> "123", "B" -> "234", "C" -> "345")
defining an implicit prioritization of input values according to their natural order (not sure if that is intentional or coincidental), "A" before "B" before "C"
When you want to apply this to a stream of input values, you have to make all of them explicit:
a Filter function that ignores all input values without a mapping
a Mapper function that maps the input to the id
a Reduce function (BinaryOperator) that performs the prioritization logic implied by the if-else-if-else construct
Mapping Function
The mapper is a discrete function mapping an input value to an Optional of the output value:
Function<String, Optional<String>> idMapper = s -> {
    if ("A".equals(s)) {
        return Optional.of("123");
    } else if ("B".equals(s)) {
        return Optional.of("234");
    } else if ("C".equals(s)) {
        return Optional.of("345");
    }
    return Optional.empty();
};
For more mappings an immutable map should be used:
Map<String, String> mapping = Collections.unmodifiableMap(new HashMap<String, String>() {{
    put("A", "123");
    put("B", "234");
    put("C", "345");
}}); // the instance initializer is just one way to initialize the map :)

Function<String, Optional<String>> idMapper = s -> Optional.ofNullable(mapping.get(s));
Filter Function
As we only allow input values for which we have a mapping, we could use the keyset of the mapping map:
Predicate<String> filter = s -> mapping.containsKey(s);
Reduce Function
To find the top-priority element of the stream using their natural order, use this BinaryOperator:
BinaryOperator<String> prioritizer = (a, b) -> a.compareTo(b) < 0 ? a : b;
If there is another logic to prioritize, you have to adapt the implementation accordingly.
This operator is used in a .reduce() call. If you prioritize based on natural order, you could use .min(Comparator.naturalOrder()) on the stream instead, because the natural order is exactly what the prioritizer above implements.
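To illustrate the equivalence, a quick sketch (values is a hypothetical List<String>):
// Both yield the smallest element by natural order, e.g. "A" for ["C", "A", "B"].
Optional<String> viaReduce = values.stream()
        .reduce((a, b) -> a.compareTo(b) < 0 ? a : b);
Optional<String> viaMin = values.stream()
        .min(Comparator.naturalOrder());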
Stream Pipeline
Now you first have to reduce the stream to a single value using the prioritizer; the result is an Optional, which you flatMap by applying the idMapper function (flatMap so you don't end up with Optional<Optional<String>>):
Optional<String> id = Arrays.asList("C", "B", "A")
        .stream()
        .filter(filter)       // concern: input validation
        .reduce(prioritizer)  // concern: prioritization
        .flatMap(idMapper);   // concern: id-mapping
Final Result
To wrap it up, for your particular problem, the most concise version (without defining functions first) using a stream and input validation would be:
//define the mapping in an immutable map (that's just one way to do it)
final Map<String, String> mapping = Collections.unmodifiableMap(new HashMap<String, String>() {{
    put("A", "123");
    put("B", "234");
    put("C", "345");
}});
Optional<String> result = Arrays.asList("C", "D", "A", "B")
        .stream()
        .filter(mapping::containsKey)
        .min(Comparator.naturalOrder())
        .flatMap(s -> Optional.ofNullable(mapping.get(s)));
which is the sought-for f:
BiFunction<List<String>, Map<String, String>, Optional<String>> f =
        (list, map) -> list.stream()
                .filter(map::containsKey)
                .min(Comparator.naturalOrder())
                .flatMap(s -> Optional.ofNullable(map.get(s)));
There is certainly some appeal to this approach, but the elegance-through-simplicity of the if-else approach cannot be denied either ;)
But for the sake of completeness, let's look at complexity. Assuming the number of mappings and the number of input values is rather large (otherwise it wouldn't really matter).
Solutions based on iterating over the map and searching with contains (as in your if-else construct):
Best case: O(1) (first branch in the if-else construct, first item in the list)
Worst case: O(n^2) (last branch in the if-else construct, last item in the list)
For the streaming solution with reduce, you have to iterate completely through the input list (O(n)), while the map lookup is O(1):
Best case: O(n)
Worst case: O(n)
Thanks to Hamlezz for the reduce idea and to Holger for pointing out that applying the mapper function directly to the stream does not yield the same result (as the first match wins, not the first entry in the if-else construct) and for the min(Comparator.naturalOrder()) option.

Simplifying loop with Java 8

I have a method that adds maps to a cache and I was wondering what I could do more to simplify this loop with Java 8.
What I have done so far:
Standard looping we all know:
for (int i = 0; i < catalogNames.size(); i++) {
    List<GenericCatalog> list = DummyData.getCatalog(catalogNames.get(i));
    Map<String, GenericCatalog> map = new LinkedHashMap<>();
    for (GenericCatalog item : list) {
        map.put(item.name.get(), item);
    }
    catalogCache.put(catalogNames.get(i), map);
}
Second iteration using forEach:
catalogNames.forEach(e -> {
    Map<String, GenericCatalog> map = new LinkedHashMap<>();
    DummyData.getCatalog(e).forEach(d -> {
        map.put(d.name.get(), d);
    });
    catalogCache.put(e, map);
});
And a third iteration that removes unnecessary braces:
catalogNames.forEach(objName -> {
    Map<String, GenericCatalog> map = new LinkedHashMap<>();
    DummyData.getCatalog(objName).forEach(obj -> map.put(obj.name.get(), obj));
    catalogCache.put(objName, map);
});
My question now is what can be further done to simplify this?
I do understand that it's not really necessary to do anything else with this method at this point, but I was curious about the possibilities.
There is a small issue with solutions 2 and 3: they might cause side effects.
Side-effects in behavioral parameters to stream operations are, in
general, discouraged, as they can often lead to unwitting violations
of the statelessness requirement, as well as other thread-safety
hazards.
As an example of how to transform a stream pipeline that
inappropriately uses side-effects to one that does not, the following
code searches a stream of strings for those matching a given regular
expression, and puts the matches in a list.
ArrayList<String> results = new ArrayList<>();
stream.filter(s -> pattern.matcher(s).matches())
.forEach(s -> results.add(s)); // Unnecessary use of side-effects!
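The documentation then shows the side-effect-free alternative, which collects the matches instead:
List<String> results = stream.filter(s -> pattern.matcher(s).matches())
        .collect(Collectors.toList()); // No side-effects!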
So instead of using forEach to populate the HashMap it is better to use Collectors.toMap(..). I am not 100% sure about your data structure, but I hope it is close enough.
There is a List and corresponding Map:
List<Integer> ints = Arrays.asList(1,2,3);
Map<Integer,List<Double>> catalog = new HashMap<>();
catalog.put(1,Arrays.asList(1.1,2.2,3.3,4.4));
catalog.put(2,Arrays.asList(1.1,2.2,3.3));
catalog.put(3,Arrays.asList(1.1,2.2));
Now we would like to get a new Map where the key is an element from the original List and the value is another Map itself. The nested Map's key is the transformed element from the catalog List and its value is the List element itself. Crazy description and even crazier code below:
Map<Integer, Map<Integer, Double>> result = ints.stream().collect(
Collectors.toMap(
el -> el,
el -> catalog.get(el).stream().
collect(Collectors.toMap(
c -> c.intValue(),
c -> c
))
)
);
System.out.println(result);
// {1={1=1.1, 2=2.2, 3=3.3, 4=4.4}, 2={1=1.1, 2=2.2, 3=3.3}, 3={1=1.1, 2=2.2}}
I hope this helps.
How about utilizing Collectors from the stream API? Specifically, Collectors#toMap
Map<String, Map<String, GenericCatalog>> cache = catalogNames.stream()
        .collect(Collectors.toMap(Function.identity(),
                name -> DummyData.getCatalog(name).stream()
                        .collect(Collectors.toMap(t -> t.name.get(), Function.identity(),
                                // these two arguments are only needed if a HashMap can't be used
                                (oldValue, newValue) -> newValue, // merge function: keep the latest entry
                                LinkedHashMap::new))));
This avoids mutating an existing collection, and provides you your own individual copy of a map (which you can use to update a cache, or whatever you desire).
Also I would disagree with arbitrarily putting end braces at the end of a line of code - most style guides would also be against this as it somewhat disturbs the flow of the code to most readers.

How to partition a list by predicate using java8?

I have a list which I want to split into a few smaller lists: say, all the items that start with "aaa", all that start with "bbb", and some more predicates.
How can I do so using Java 8?
I saw this post, but it only splits into 2 lists.
public void partition_list_java8() {
    Predicate<String> startWithS = p -> p.toLowerCase().startsWith("s");
    Map<Boolean, List<String>> decisionsByS = playerDecisions.stream()
            .collect(Collectors.partitioningBy(startWithS));
    logger.info(decisionsByS);
    assertTrue(decisionsByS.get(Boolean.TRUE).size() == 3);
}
I saw this post, but it is very old, from before Java 8.
As explained in RealSkeptic's comment, a Predicate can return only two results: true and false. This means you would be able to split your data into only two groups.
What you need is some kind of Function which allows you to determine a common result for elements that should be grouped together. In your case such a result could be the first character in lowercase (assuming that all strings are non-empty, i.e. have at least one character).
Now with Collectors.groupingBy(function) you can group all elements into separate Lists and store them in a Map whose key is the common result used for grouping (like the first character).
So your code can look like
Function<String, Character> firstChar = s -> Character.toLowerCase(s.charAt(0));
List<String> a = Arrays.asList("foo", "Abc", "bar", "baz", "aBc");
Map<Character, List<String>> collect = a.stream()
.collect(Collectors.groupingBy(firstChar));
System.out.println(collect);
Output:
{a=[Abc, aBc], b=[bar, baz], f=[foo]}
You can use Collectors.groupingBy to turn your stream into a map of (grouping) -> (list of things in that grouping). If you don't care about the groupings themselves, call values() on that map to get a Collection<List<String>> of your partitions.
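If you have several arbitrary predicates rather than a single classifier, one possible sketch is to label each predicate and classify each element by the first label whose predicate matches (the labels, the LinkedHashMap for first-match-wins order, and the "other" fallback are my own choices, not from the question):
// Label each predicate; insertion order decides which predicate wins on overlap.
Map<String, Predicate<String>> predicates = new LinkedHashMap<>();
predicates.put("aaa", s -> s.contains("aaa"));
predicates.put("bbb", s -> s.contains("bbb"));

Map<String, List<String>> partitions = list.stream()
        .collect(Collectors.groupingBy(s -> predicates.entrySet().stream()
                .filter(e -> e.getValue().test(s))
                .map(Map.Entry::getKey)
                .findFirst()
                .orElse("other")));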

Split java.util.stream.Stream

I have a text file that contains URLs and emails. I need to extract all of them from the file. Each URL and email can be found more than once, but the result shouldn't contain duplicates.
I can extract all URLs using the following code:
Files.lines(filePath)
        .map(urlPattern::matcher)
        .filter(Matcher::find)
        .map(Matcher::group)
        .distinct();
I can extract all emails using the following code:
Files.lines(filePath)
        .map(emailPattern::matcher)
        .filter(Matcher::find)
        .map(Matcher::group)
        .distinct();
Can I extract all URLs and emails reading the stream returned by Files.lines(filePath) only one time?
Something like splitting stream of lines to stream of URLs and stream of emails.
You can use the partitioningBy collector, though it's still not a very elegant solution:
Map<Boolean, List<String>> map = Files.lines(filePath)
.filter(str -> urlPattern.matcher(str).matches() ||
emailPattern.matcher(str).matches())
.distinct()
.collect(Collectors.partitioningBy(str -> urlPattern.matcher(str).matches()));
List<String> urls = map.get(true);
List<String> emails = map.get(false);
If you don't want to apply regexp twice, you can make it using the intermediate pair object (for example, SimpleEntry):
public static String classify(String str) {
return urlPattern.matcher(str).matches() ? "url" :
emailPattern.matcher(str).matches() ? "email" : null;
}
Map<String, Set<String>> map = Files.lines(filePath)
.map(str -> new AbstractMap.SimpleEntry<>(classify(str), str))
.filter(e -> e.getKey() != null)
.collect(Collectors.groupingBy(e -> e.getKey(),
Collectors.mapping(e -> e.getValue(), Collectors.toSet())));
Using my free StreamEx library the last step would be shorter:
Map<String, Set<String>> map = StreamEx.of(Files.lines(filePath))
.mapToEntry(str -> classify(str), Function.identity())
.nonNullKeys()
.grouping(Collectors.toSet());
You can perform the matching within a Collector:
Map<String, Set<String>> map = Files.lines(filePath)
        .collect(HashMap::new,
                (hm, line) -> {
                    Matcher m = emailPattern.matcher(line);
                    if (m.matches())
                        hm.computeIfAbsent("mail", x -> new HashSet<>()).add(line);
                    else if (m.usePattern(urlPattern).matches())
                        hm.computeIfAbsent("url", x -> new HashSet<>()).add(line);
                },
                (m1, m2) -> m2.forEach((k, v) -> m1.merge(k, v,
                        (s1, s2) -> { s1.addAll(s2); return s1; })));
Set<String> mail = map.get("mail"), url = map.get("url");
Note that this can easily be adapted to find multiple matches within a line:
Map<String, Set<String>> map = Files.lines(filePath)
        .collect(HashMap::new,
                (hm, line) -> {
                    Matcher m = emailPattern.matcher(line);
                    while (m.find())
                        hm.computeIfAbsent("mail", x -> new HashSet<>()).add(m.group());
                    m.usePattern(urlPattern).reset();
                    while (m.find())
                        hm.computeIfAbsent("url", x -> new HashSet<>()).add(m.group());
                },
                (m1, m2) -> m2.forEach((k, v) -> m1.merge(k, v,
                        (s1, s2) -> { s1.addAll(s2); return s1; })));
Since you can't reuse a Stream, the only option would be to "do it manually", I think:
Files.lines(filePath).forEach(s -> /* match and sort into two lists */);
If there's another solution for this though I'd be happy to learn about it!
The overall question should be: Why would you want to stream only once?
Extracting the URLs and extracting the emails are different operations and thus should be handled in their own streaming operations. Even if the underlying stream source contains hundreds of thousands of records, the time for iteration can be neglected when compared to the mapping and filtering operations.
The only thing you should consider as a possible performance issue is the IO operation. The cleanest solution therefore is to read the file only once and then stream on a resulting collection twice:
List<String> allLines = Files.readAllLines(filePath);
allLines.stream() ... // here do the URLs
allLines.stream() ... // here do the emails
Of course this requires some memory.
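Concretely, a sketch along those lines, reusing urlPattern and emailPattern from the question:
// Read the file once, then run two independent pipelines over the in-memory lines.
List<String> allLines = Files.readAllLines(filePath);

Set<String> urls = allLines.stream()
        .map(urlPattern::matcher)
        .filter(Matcher::find)
        .map(Matcher::group)
        .collect(Collectors.toSet());

Set<String> emails = allLines.stream()
        .map(emailPattern::matcher)
        .filter(Matcher::find)
        .map(Matcher::group)
        .collect(Collectors.toSet());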
