Difference between traditional imperative style of programming and functional style of programming - java

I have a problem statement here: I need to iterate over a list, find the first integer which is greater than 3 and is even, then double it and return it.
Here are some helper methods that log how many operations get performed:
public static boolean isGreaterThan3(int number) {
    System.out.println("WhyFunctional.isGreaterThan3 " + number);
    return number > 3;
}

public static boolean isEven(int number) {
    System.out.println("WhyFunctional.isEven " + number);
    return number % 2 == 0;
}

public static int doubleIt(int number) {
    System.out.println("WhyFunctional.doubleIt " + number);
    return number << 1;
}
With Java 8 streams I could do it like:
List<Integer> integerList = Arrays.asList(1, 2, 3, 5, 4, 6, 7, 8, 9, 10);
integerList.stream()
           .filter(WhyFunctional::isGreaterThan3)
           .filter(WhyFunctional::isEven)
           .map(WhyFunctional::doubleIt)
           .findFirst();
and the output is
WhyFunctional.isGreaterThan3 1
WhyFunctional.isGreaterThan3 2
WhyFunctional.isGreaterThan3 3
WhyFunctional.isGreaterThan3 5
WhyFunctional.isEven 5
WhyFunctional.isGreaterThan3 4
WhyFunctional.isEven 4
WhyFunctional.doubleIt 4
Optional[8]
so 8 operations in total.
And in imperative style, before Java 8, I could code it like:
for (Integer integer : integerList) {
    if (isGreaterThan3(integer)) {
        if (isEven(integer)) {
            System.out.println(doubleIt(integer));
            break;
        }
    }
}
and the output is
WhyFunctional.isGreaterThan3 1
WhyFunctional.isGreaterThan3 2
WhyFunctional.isGreaterThan3 3
WhyFunctional.isGreaterThan3 5
WhyFunctional.isEven 5
WhyFunctional.isGreaterThan3 4
WhyFunctional.isEven 4
WhyFunctional.doubleIt 4
8
and the operations are the same. So my question is: what difference does it make if I use streams rather than a traditional for loop?

The Stream API introduces the new idea of streams, which allows you to decouple the task in a new way. For example, based on your task, it's possible that you want to do different things with the doubled even numbers greater than three. In one place you want to find the first one, in another you need ten such numbers, in a third you want to apply more filtering. You can encapsulate the algorithm of finding such numbers like this:
static IntStream numbers() {
    return IntStream.range(1, Integer.MAX_VALUE)
                    .filter(WhyFunctional::isGreaterThan3)
                    .filter(WhyFunctional::isEven)
                    .map(WhyFunctional::doubleIt);
}
Here it is. You've just created an algorithm to generate such numbers (without actually generating them), and you don't care how they will be used. One user might call:
int num = numbers().findFirst().get();
Another user might need to get ten such numbers:
int[] tenNumbers = numbers().limit(10).toArray();
A third user might want to find the first matching number which is also divisible by 7:
int result = numbers().filter(n -> n % 7 == 0).findFirst().get();
It would be more difficult to encapsulate the algorithm in traditional imperative style.
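For contrast, here is a sketch of what one imperative encapsulation might look like (the signature, the limit parameter, and the extraFilter callback are my own additions, not from the question): every future use has to be anticipated in the method's parameters instead of being composed onto the stream by the caller.
// requires java.util.List, java.util.ArrayList, java.util.function.IntPredicate
static List<Integer> numbers(int limit, IntPredicate extraFilter) {
    List<Integer> result = new ArrayList<>();
    for (int i = 1; i < Integer.MAX_VALUE && result.size() < limit; i++) {
        if (isGreaterThan3(i) && isEven(i)) {
            int doubled = doubleIt(i);
            if (extraFilter.test(doubled)) {   // extra condition baked into the signature
                result.add(doubled);
            }
        }
    }
    return result;
}
With this shape, the three callers above become numbers(1, n -> true).get(0), numbers(10, n -> true), and numbers(1, n -> n % 7 == 0): workable, but only because the method anticipated exactly these needs.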
In general the Stream API is not about performance (though parallel streams may work faster than the traditional solution). It's about the expressive power of your code.

The imperative style complects the computational logic with the mechanism used to achieve it (iteration). The functional style, on the other hand, decomplects the two. You code against an API to which you supply your logic and the API has the freedom to choose how and when to apply it.
In particular, the Streams API has two ways to apply the logic: either sequentially or in parallel. The latter was actually the driving force behind the introduction of both lambdas and the Streams API itself into Java.
The freedom to choose when to perform computation gives rise to laziness: whereas in the imperative style you have a concrete collection of data, in the functional style you have a collection paired with the logic to transform it. The logic can be applied "just in time", when you actually consume the data. This further allows you to spread out the building up of a computation: each method can receive a stream and apply a further step of computation to it, or it can consume it in different ways (by collecting into a list, by finding just the first item and never applying the computation to the rest, by calculating an aggregate value, etc.).
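A minimal sketch of this building-up (the method names are mine, assuming only java.util.stream):
// each method adds a step; nothing is evaluated yet
static IntStream doubled(IntStream source) {
    return source.map(n -> n * 2);
}

static IntStream evensOnly(IntStream source) {
    return source.filter(n -> n % 2 == 0);
}

// only the terminal operation pulls data through, and only as much as it needs:
evensOnly(doubled(IntStream.rangeClosed(1, 1_000_000_000)))
        .findFirst()
        .ifPresent(System.out::println);   // prints 2 after touching a single element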
As a particular example of the new opportunities offered by laziness, I was able to write a Spring MVC controller which returned a Stream whose data source was a database—and at the time I returned the stream, the data was still in the database. Only the View layer would pull the data, implicitly applying the transformation logic it had no knowledge of, never having to retain more than a single stream element in memory. This turned a solution with classically O(n) space complexity into O(1), making it insensitive to the size of the result set.

Using the Stream API you are describing an operation instead of implementing it. One commonly known advantage of letting the Stream API implement the operation is the option of using different execution strategies like parallel execution (as already said by others).
Another feature which seems to be a bit underestimated is the possibility to alter the operation itself, in a way that is impossible in an imperative programming style as it would imply modifying the code:
IntStream is = IntStream.rangeClosed(1, 10).filter(i -> i > 4);
if (evenOnly) is = is.filter(i -> (i & 1) == 0);
if (doubleIt) is = is.map(i -> i << 1);
is.findFirst().ifPresent(System.out::println);
Here, the decision whether to filter out odd numbers or double the result is made before the terminal operation is commenced. In imperative programming, you either have to recheck the flags within the loop or code multiple alternative loops. It should be mentioned that checking such conditions within a loop isn't that bad on today's JVMs, as the optimizer is capable of moving them out of the loop at runtime, so coding multiple loops is usually unnecessary.
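For comparison, a sketch of the flag-rechecking imperative version of the same pipeline (reusing the evenOnly and doubleIt flags from above):
for (int i = 1; i <= 10; i++) {
    if (i > 4 && (!evenOnly || (i & 1) == 0)) {  // flags consulted on every iteration
        System.out.println(doubleIt ? i << 1 : i);
        break;
    }
}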
But consider the following example:
Stream<String> s = Stream.of("java8 streams", "are cool");
if (singleWords) s = s.flatMap(Pattern.compile("\\s")::splitAsStream);
s.collect(Collectors.groupingBy(str -> str.charAt(0)))
 .forEach((k, v) -> System.out.println(k + " => " + v));
Since flatMap is the equivalent of a nested loop, coding the same in an imperative style isn't that simple any more, as we have either a simple loop or a nested loop depending on a runtime value. Usually, you have to resort to splitting the code into multiple methods if you want to share it between both kinds of loop.
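To illustrate, one possible imperative rendering of the example (a sketch; note it already has to resort to a trick, normalizing the non-split case into a one-element array, so that a single nested loop can serve both modes):
Map<Character, List<String>> groups = new HashMap<>();
for (String s : Arrays.asList("java8 streams", "are cool")) {
    String[] parts = singleWords ? s.split("\\s") : new String[] { s };  // fake the non-split case
    for (String part : parts) {
        groups.computeIfAbsent(part.charAt(0), k -> new ArrayList<>()).add(part);
    }
}
groups.forEach((k, v) -> System.out.println(k + " => " + v));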
I already encountered a real-life example where the composition of a complex operation had multiple conditional flatMap steps. The equivalent imperative code is insane…

1) The functional approach allows a more declarative way of programming: you just provide a list of functions to apply and don't need to write iterations manually, so your code is sometimes more concise.
2) If you switch to a parallel stream (https://docs.oracle.com/javase/tutorial/collections/streams/parallelism.html) the program can be parallelized automatically and executed faster. This is possible because you don't explicitly code the iteration, you just list the functions to apply, so the compiler/runtime can parallelize it.
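For example, the pipeline from the question needs only one changed call to run in parallel (a sketch; with ten elements the overhead would outweigh any gain, so this only illustrates the mechanism):
integerList.parallelStream()
           .filter(WhyFunctional::isGreaterThan3)
           .filter(WhyFunctional::isEven)
           .map(WhyFunctional::doubleIt)
           .findFirst();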

In this simple example, there is little difference, and the JVM will try to do the same amount of work in each case.
Where you start to see a difference is in more complicated examples like
integerList.parallelStream()
Making the code concurrent for a loop is much harder. Note: you wouldn't actually do this here, as the overhead would be too high and you only want the first element.
BTW, the first example returns the result and the second prints it.

Related

Java performance issue: Need to iterate more than 8 million records with a target-branch check

We have a system that processes flat files and (with only a couple of validations) inserts them into a database.
This code:
// the file can contain 8 million lines
for (String line : lines) {
    // obj is parsed from the current line (parsing code omitted here)
    if (!Class.isBranchNoValid(validBranchNoArr, obj.branchNo)) {
        continue;
    }
    list.add(line);
}
definition of isBranchNoValid:
// the array length ranges from 2 to 5 only
public static boolean isBranchNoValid(String[] validBranchNoArr, String branchNo) {
    for (int i = 0; i < validBranchNoArr.length; i++) {
        if (validBranchNoArr[i].equals(branchNo)) {
            return true;
        }
    }
    return false;
}
The validation is at line level (we have to filter out, i.e. skip, each line whose branchNo is not in the array). Earlier, this filtering wasn't there.
Now a severe performance degradation is troubling us.
I understand (maybe I am wrong) that this repeated function call causes a lot of stack creation, resulting in very frequent GC invocations.
I can't figure out a way (if it's even possible) to perform this filter without that high cost in performance (a small difference is fine).
This is not a stack problem, for sure, because your function is not recursive: nothing is kept on the stack between calls, and after each call the local variables are discarded since they are no longer needed.
You could put the valid numbers in a Set and use that as an optimization, but in your case I am not sure it will bring any benefit at all, since you have at most 5 elements.
So there are several possible bottlenecks in your scenario:
1. reading the lines of the file
2. parsing each line to construct the object to insert into the database
3. checking the applicability of the object (i.e. the branch-no filter)
4. inserting into the DB
Generally, you'd say IO is the slowest, so 1. and 4. You're saying nothing except 3. changed, right? That is weird.
Anyway, if you want to optimize that, I wouldn't pass the array around 8 million times, and I wouldn't iterate it every time either. Since your valid branches are known, create a HashSet from them - it has O(1) lookup.
Set<String> validBranches = Arrays.stream(branches)
                                  .collect(Collectors.toCollection(HashSet::new));
Then, iterate the lines
for (String line : lines) {
    YourObject obj = parse(line);
    if (validBranches.contains(obj.branchNo)) {
        writeToDb(obj);
    }
}
or, in the stream version
Files.lines(yourPath)
     .map(this::parse)
     .filter(o -> validBranches.contains(o.branchNo))
     .forEach(this::writeToDb);
I'd also check whether it isn't more efficient to first collect a batch of objects and then write them to the DB. Also, it's possible that handling the lines in parallel gains some speed, in case the parsing is time-intensive.
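A sketch of such batching (BATCH_SIZE and writeBatchToDb are placeholders of mine, to be filled in with e.g. a JDBC batch insert):
final int BATCH_SIZE = 1_000;              // a guess; tune against your DB
List<YourObject> batch = new ArrayList<>(BATCH_SIZE);
for (String line : lines) {
    YourObject obj = parse(line);
    if (validBranches.contains(obj.branchNo)) {
        batch.add(obj);
        if (batch.size() == BATCH_SIZE) {
            writeBatchToDb(batch);         // one round trip per batch instead of per row
            batch.clear();
        }
    }
}
if (!batch.isEmpty()) {
    writeBatchToDb(batch);                 // flush the remainder
}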

Is it possible to replace all loop constructs in Java with stream-based constructs?

I am exploring the possibilities of the Java Stream API in order to find out if one could possibly replace all loop-based constructs with stream-based constructs.
As an example hinting that this might actually be possible, consider the following:
Is it possible to use the Stream API to split a string containing words delimited by spaces into an array of strings, like the following call of String.split(String) would do?
String[] tokens = "Peter Paul Mary".split(" ");
I am aware of the fact that a stream-based solution could itself make use of the String.split(String) method, like so:
Stream.of("Peter Paul Mary".split(" "))
      .collect(Collectors.toList());
or make use of Pattern.splitAsStream(CharSequence) (the implementation of which certainly uses a loop-based approach) but I am looking for a Stream-only solution, meaning something along the lines of this Haskell snippet:
words :: String -> [String]
words s = case dropWhile Char.isSpace s of
            "" -> []
            s' -> w : words s''
                  where (w, s'') = break Char.isSpace s'
I am asking this because I am still wondering if the introduction of the Stream API will lead to a profound change in the way we handle object collections in Java or just add another option to it, thus making it more challenging to maintain a larger codebase rather than to simplify this task in the long run.
EDIT: Although there is an accepted answer (see below), it only shows that it's possible in this special case. I am still interested in any hints for the general case, as asked in the question.
A distinct non-answer here: you are asking the wrong question!
It doesn't matter if all "loop related" lines of Java code can be converted into something streamish.
Because: good programming is the ability to write code that humans can easily read and understand.
So when somebody puts up a rule that says "we only use streams from here on for everything we do", then that rule significantly reduces your options when writing code. Instead of being able to carefully decide "should I use streams?" versus "should I go with a simple old-school loop construct?", you are down to "how do I get this working with streams?"!
From that point of view, you should focus on coming up with "rules" that work for all the people on your development team. That could mean to emphasize the use of stream constructs. But you definitely want to avoid absolutism, and leave it to each developer to write the code that implements a given requirement in the "most readable" way. If that is possible with streams, fine. If not, don't force people to use them.
And beyond that: depending on what exactly you are doing, using streams also comes with a performance cost. So even when you can find a stream solution for a problem, you have to understand its runtime cost. You surely want to avoid using streams in places where they cost too much (especially when that code is already on your "critical path" regarding performance, like code called zillions of times per second).
Finally: to a certain degree, this is a question of skills. Meaning: when you are trained in using streams, it is much easier for you to A) read "streamish" code that others wrote and B) come up with "streamish" solutions that are in fact easy to read. In other words: this again depends on the context you are working in. The other week I was educating another team on "clean code", and my last slide was "Clean Code, Java 8 streams/lambdas". One guy asked me "what are streams?" There is no point in forcing such a community to do anything with streams tomorrow.
Just for fun (this is one horrible way to do it), and I don't know whether it fits your needs:
List<String> result = ",,,abc,def".codePoints()
        .boxed()
        // .parallel()
        .collect(Collector.of(
                () -> {
                    List<StringBuilder> inner = new ArrayList<>();
                    inner.add(new StringBuilder());
                    return inner;
                },
                (List<StringBuilder> list, Integer character) -> {
                    StringBuilder last = list.get(list.size() - 1);
                    if (character == ',') {
                        list.add(new StringBuilder());
                    } else {
                        last.appendCodePoint(character);
                    }
                },
                (left, right) -> {
                    left.get(left.size() - 1).append(right.remove(0));
                    left.addAll(right);
                    return left;
                },
                list -> list.stream()
                        .map(StringBuilder::toString)
                        .filter(x -> !x.equals(""))
                        .collect(Collectors.toList())
        ));

Understanding Map and Reduce in Java 8/9 functional programming (lambda expression). How map() and reduce() increases performance?

This one line of functional programming code performs the operation 2*3 + 4*3 + 6*3 + 8*3 + 10*3:
int sum = IntStream.rangeClosed(1, 10)      /* closed range */
                   .filter(x -> x % 2 == 0) /* filter to even numbers in range */
                   .map(x -> x * 3)         /* map: triple each */
                   .sum();                  /* the actual sum operation happens here */
System.out.println(sum);                    /* prints 90 */
I understand what it is doing. I would like to know what is happening under the hood in terms of memory allocation. We have similar old-style alternatives to the above operation, like the one below. It is very easy to understand, but the lambda-based code above is more expressive.
int sum = 0;
for (int i = 1; i <= 10; i++) {
    if (i % 2 == 0) {
        sum = sum + i * 3;
    }
}
System.out.println(sum); /* prints 90 */
First, the lambda expressions will be desugared to static methods inside your class file (use javap to see that).
For the Predicate, a .class file will be generated, which you can see via the -Djdk.internal.lambda.dumpProxyClasses=/Your/Path parameter set when you invoke your class.
The same goes for the Function used in the map operation.
Since your lambdas are stateless, a single instance of the Predicate and the Function will be created and re-used for each operation. Had it been a stateful lambda, a new instance would be generated for each element processed.
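One way to observe this, if you're curious (a sketch of mine; the JLS merely permits such reuse, so the exact output is JVM-dependent - the values shown are what current HotSpot prints):
import java.util.function.IntPredicate;

public class LambdaInstances {
    static IntPredicate stateless() {
        return x -> x % 2 == 0;     // captures nothing
    }

    static IntPredicate capturing(int mod) {
        return x -> x % mod == 0;   // captures mod
    }

    public static void main(String[] args) {
        System.out.println(stateless() == stateless());   // true: cached, reused instance
        System.out.println(capturing(2) == capturing(2)); // false: fresh instance per call
    }
}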
And regarding your question title: map and reduce do not by themselves increase performance (unless there are tons of elements and you can parallelize the process with a benefit). Your simple loop will be faster, though not that much faster than the stream. You have also chosen a pretty simple example; suppose you had chosen one that does some heavy grouping followed by a custom collection, etc. - there, the verbosity saved by the stream approach would be significant.
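To make that concrete, a small grouping comparison (a sketch assuming a List<String> words): the stream version is a single expression, while the pre-Java-8 version maintains the map by hand.
Map<Integer, List<String>> byLength = words.stream()
        .collect(Collectors.groupingBy(String::length));

// pre-Java-8 equivalent
Map<Integer, List<String>> byLength2 = new HashMap<>();
for (String w : words) {
    List<String> bucket = byLength2.get(w.length());
    if (bucket == null) {
        bucket = new ArrayList<>();
        byLength2.put(w.length(), bucket);
    }
    bucket.add(w);
}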

Is there a way to compare two methods by function rather than value? [duplicate]

This question already has answers here:
Is finding the equivalence of two functions undecidable?
(9 answers)
Closed 6 years ago.
Is there a way to compare whether two methods are equivalent by function (i.e. they do the same thing) rather than equivalent by value (i.e. all of the code in the method is the same)?
For example these two methods are coded differently, but perform the same function.
public int doIt(int a, int b) {
    a = a + 1;
    b = b + 1;
    return a + b;
}

public int doIt2(int z, int x) {
    int total = z + x + 2;
    return total;
}
I was looking for a way to do this in Eclipse, but am interested if this is even possible beyond a trivial method.
The only way to be 100% sure is to mathematically prove it.
There are ways:
1. theorem proving
2. model checking
etc.
These approaches can be very hard, though; sometimes it takes days to prove equivalence even for trivial programs, and days to produce an adequate abstraction level.
There are also heuristic approaches, but obviously they are not 100% accurate (that is what heuristic means).
A simple heuristic approach would be to try both methods on 1000 inputs and see whether the results are the same.
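A sketch of that heuristic applied to the question's doIt/doIt2 (the harness method and its name are my own):
// requires java.util.Random
static boolean probablyEquivalent(int trials) {
    Random rnd = new Random();
    for (int i = 0; i < trials; i++) {
        int a = rnd.nextInt(), b = rnd.nextInt();
        if (doIt(a, b) != doIt2(a, b)) {
            System.out.println("Counterexample: a=" + a + ", b=" + b);
            return false;
        }
    }
    return true;  // no counterexample found: evidence, not a proof
}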
EDIT: here is a list of model-checking tools I found on Wikipedia. I haven't used any of them, and they may not be exactly what you are looking for:
https://en.wikipedia.org/wiki/List_of_model_checking_tools
Ignoring side effects, two functions are functionally equivalent if, for the same input, they produce the same output.
This only works for pure code, though. There's no way I know of to monitor for side effects in general, since the side effects a function carries out could be anything.
Note that there wouldn't be a way to completely verify this without testing every possible input. If the input is just a small enum, that might be easy. If it's two ints, for example, the number of combinations is huge.
In general, the purpose of refactoring is to have a function behave the same before and after it is refactored. Developers generally verify this by creating extensive unit tests covering normal, edge, and exception cases.
In the OP's two functions, doIt and doIt2, unit testing would demonstrate that they return the same answer for typical integer inputs a and b.
But what if a or b were the largest integer Java can store, Integer.MAX_VALUE? What if there were a side effect from a = a + 1?
In edge cases like these, two functions that appear similar on the surface may yield different results. (As it happens, Java's int arithmetic wraps on overflow, so these two particular methods still agree even at MAX_VALUE; but that is exactly the kind of property that needs testing rather than assuming.)

How to handle the case that a single simple test case will drive the whole implementation?

While learning "Test Driven Development", I found an interesting example in the book "The Productive Programmer":
You need to find all factors of a "complete number" (more commonly called a perfect number). A complete number is one where the sum of all its factors (excluding the number itself) equals the number. So 6 is the smallest complete number, and its factors are 1, 2, 3.
If I want to TDD this, first I want to write the simplest test case:
@Test
public void completeNumber6() {
    CompleteNumber completeNumber = new CompleteNumber(6);
    // expected value first; arrays need assertArrayEquals, not assertEquals
    assertArrayEquals(new int[] {1, 2, 3}, completeNumber.findFactors());
}
But there's a problem: this simplest case will drive the entire implementation of findFactors(), which seems like too much to me.
The author gives some suggestions: we can split the requirement into several steps:
1. check if one number is a factor of another
2. provide a way to collect factors into a collection
3. check each smaller number to see if it's a factor of the given number, and collect the ones that are
4. check if the sum of the collected factors equals the given number
And we can TDD the first 2 steps first:
@Test public void testIsFactor() {}
@Test public void testAddFactor() {}
So there will be two public (at least non-private) methods after that:
boolean isFactor(int n1, int n2)
void addFactor(int factor)
The problem is that these two methods should be private after the whole implementation, since they should only be used by findFactors internally!
But if they are changed to private, what shall we do with the existing test cases for them?
The author suggests we can change them to private and use the Java reflection API to access and test them. That sounds possible, but I'm not sure it's good practice.
I also asked some friends, and they gave some other options:
1. Keep the methods isFactor and addFactor non-private as is; that's acceptable
2. Extract classes FactorChecker and FactorCollector for the two methods (sketched below)
3. Change them to private and delete their test cases, since their functionality is covered by the later test cases (for steps 3 & 4)
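For illustration, option 2 might look roughly like this (the class and method shapes are my guesses): both collaborators stay public and unit-testable, and findFactors composes them internally.
class FactorChecker {
    boolean isFactor(int candidate, int number) {
        return number % candidate == 0;
    }
}

class FactorCollector {
    private final List<Integer> factors = new ArrayList<>();

    void addFactor(int factor) {
        factors.add(factor);
    }

    List<Integer> factors() {
        return factors;
    }
}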
I'm really puzzled now: which approach is the best practice for TDD?
It seems to me that the fact that the question talks about complete numbers is somewhat irrelevant: you can calculate the factors of any whole number. Given that, I'd start by implementing findFactors(1) and then work my way up.
That makes this a slight variation on the classic Prime Factors Kata, the only difference being that you add 1 to the list of factors.
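For reference, a findFactors sketch by trial division (my own code, not from the book or the kata):
// requires java.util.List, java.util.ArrayList
static List<Integer> findFactors(int n) {
    List<Integer> factors = new ArrayList<>();
    for (int candidate = 1; candidate <= n / 2; candidate++) {
        if (n % candidate == 0) {
            factors.add(candidate);  // proper factors only: 1 up to n/2
        }
    }
    return factors;
}
// findFactors(6) -> [1, 2, 3]; 6 is complete because 1 + 2 + 3 == 6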
