How to check general conditions? - Java

I have translated code from Matlab (array-oriented) to Java (OOP). The problem appeared when I had to translate this Matlab feature:
min_omegaF_restricted=min(omegaF(p>cost));
Here, omegaF is a vector with the net worth of each firm, p is a vector of prices for each firm, and cost is a vector of costs for each firm.
The above calculates the minimal net worth among the surviving firms whose demanded price is higher than their cost.
In Java this can be translated to:
double min_omegaF_restricted = Double.POSITIVE_INFINITY;
for (int f = 0; f < F; f++) {
    Firm fo = firms.get(f);
    if (fo.p > fo.cost)
        min_omegaF_restricted = Math.min(min_omegaF_restricted, fo.omegaF);
}
Is there an option to generalize this kind of statement to an arbitrary condition (fo.p > fo.cost)?

Yes. Functional interfaces and lambdas in Java 8 make this simple and easy on the eyes. Create a Predicate which tests an object and returns a boolean value.
Predicate<Firm> predicate = firm -> firm.p > firm.cost;
Then you can defer to the predicate in the loop like so:
double min_omegaF_restricted = Double.POSITIVE_INFINITY;
for (int f = 0; f < F; f++) {
    Firm fo = firms.get(f);
    if (predicate.test(fo))
        min_omegaF_restricted = Math.min(min_omegaF_restricted, fo.omegaF);
}
What's more, with the new Stream API you can express the whole computation functionally, without the explicit for loop.
double min_omegaF_restricted = firms.stream()
        .filter(predicate)
        .mapToDouble(f -> f.omegaF)
        .min()
        .orElse(Double.POSITIVE_INFINITY);
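Taking the generalization one step further, both the condition and the attribute can be passed in as parameters. The helper below is a sketch of that idea; minWhere is a hypothetical name, not a library method:

import java.util.List;
import java.util.function.Predicate;
import java.util.function.ToDoubleFunction;

static <T> double minWhere(List<T> items, Predicate<T> condition, ToDoubleFunction<T> attribute) {
    return items.stream()
            .filter(condition)        // keep only items satisfying the condition
            .mapToDouble(attribute)   // extract the attribute to minimize
            .min()
            .orElse(Double.POSITIVE_INFINITY);
}

Usage then reduces the original loop to a single call:

double min_omegaF_restricted = minWhere(firms, f -> f.p > f.cost, f -> f.omegaF);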

Related

Understanding map and reduce in Java 8/9 functional programming (lambda expressions). How do map() and reduce() increase performance?

This one line of functional programming code performs the operation 2*3 + 4*3 + 6*3 + 8*3 + 10*3:
int sum = IntStream.rangeClosed(1, 10) /* closed range */
        .filter(x -> x % 2 == 0)       /* keep the even numbers in the range */
        .map(x -> x * 3)               /* map each to its triple */
        .sum();                        /* the actual summation happens here */
System.out.println(sum);               /* prints 90 */
I understand what it is doing. I would like to know what is happening under the hood in terms of memory allocation. The old-style alternative below is very easy to understand, but the lambda-based code above is more expressive.
int sum = 0;
for (int i = 1; i <= 10; i++) {
    if (i % 2 == 0) {
        sum = sum + i * 3;
    }
}
System.out.println(sum); /* prints 90 */
First, the lambda expressions will be de-sugared into static methods inside your class file (use javap to see them).
For the Predicate a class will be generated at runtime (which you can dump by setting the -Djdk.internal.lambda.dumpProxyClasses=/Your/Path parameter when you invoke your class).
The same thing goes for the Function for the map operation.
Since your lambdas are stateless, a single instance of the Predicate and the Function will be created and re-used for each operation. If a lambda were stateful, a new instance would be generated for each element processed.
And, regarding your question title: map and reduce do not increase performance (unless there are tons of elements and you can parallelize the process with a benefit). Your simple loop will be faster, though not that much faster than the stream. You have also chosen a pretty simple example; for an example that does some heavy grouping followed by a custom collection, the stream version would be far more concise than the hand-written equivalent.
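To illustrate the parallel case mentioned above (a sketch; whether .parallel() actually wins depends on input size and per-element cost, so measure before relying on it):

// The same pipeline over a larger range, split across the common ForkJoinPool.
int sum = IntStream.rangeClosed(1, 10_000)
        .parallel()
        .filter(x -> x % 2 == 0)
        .map(x -> x * 3)
        .sum();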

Java efficiency comparison: 2 Loops with 2 conditions VS 1 Loop with 4 conditions

I have to search a list of objects to find the two lowest and the two highest values of a pair of attributes. In Java, what is more efficient: to scan the list once, collecting all four values, or to scan it twice, collecting two values per pass? In my experience one loop should be better, but there may be compiler optimizations I don't know about.
This is the code for the single loop:
Cell[] getEdges(List<Cell> coordinateList)
{
    int minX = coordinateList.get(0).getX();
    int minY = coordinateList.get(0).getY();
    int maxX = coordinateList.get(0).getX();
    int maxY = coordinateList.get(0).getY();
    Cell[] edgePair = new Cell[2];
    for (Cell currentCell : coordinateList)
    {
        if (currentCell.getX() < minX)
        {
            minX = currentCell.getX();
        }
        if (currentCell.getY() < minY)
        {
            minY = currentCell.getY();
        }
        if (currentCell.getX() > maxX)
        {
            maxX = currentCell.getX();
        }
        if (currentCell.getY() > maxY)
        {
            maxY = currentCell.getY();
        }
    }
    edgePair[0] = new Cell(minX, minY);
    edgePair[1] = new Cell(maxX, maxY);
    return edgePair;
}
This is the code for the two-loop version (the max version is the same, just with the conditions flipped):
Cell getMinEdge(List<Cell> coordinateList)
{
    int minX = coordinateList.get(0).getX();
    int minY = coordinateList.get(0).getY();
    for (Cell currentCell : coordinateList)
    {
        if (currentCell.getX() < minX)
        {
            minX = currentCell.getX();
        }
        if (currentCell.getY() < minY)
        {
            minY = currentCell.getY();
        }
    }
    return new Cell(minX, minY);
}
Thanks in advance for any suggestions.
My intuition is that the version of the code with one loop should be faster than the version with two loops. I doubt that the optimizer would be able to combine the two loops into one, and performing the list iteration twice is going to take longer than performing it once.
However, my advice to you would be:
Don't rely too much on your (or my, or other people's) intuition to tell you what is more efficient.
Don't spend too much time up-front thinking about what could / would / should be fastest.
Instead:
Write the code in a simple, natural way, and then benchmark it to measure your application's performance with real input (or a good approximation of it); a minimal benchmark sketch follows this list.
Only spend time trying to make the code go faster if the performance measurements say that it needs to go faster.
Once you have decided that you need to optimize, use a profiler to tell you what parts of the code to focus your attention on. Don't rely on your intuition, because intuition is often wrong.
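For illustration, a benchmark along these lines could compare the two approaches. This is a minimal sketch, assuming the JMH library is on the classpath and that getEdges/getMinEdge (plus a hypothetical getMaxEdge counterpart) are accessible from the benchmark class:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class EdgeBenchmark {
    List<Cell> cells;

    @Setup
    public void setup() {
        Random r = new Random(42); // fixed seed for reproducible input
        cells = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            cells.add(new Cell(r.nextInt(), r.nextInt()));
        }
    }

    @Benchmark
    public Cell[] oneLoop() {
        return getEdges(cells); // single pass, four comparisons per element
    }

    @Benchmark
    public Cell[] twoLoops() { // two passes, two comparisons each
        return new Cell[] { getMinEdge(cells), getMaxEdge(cells) };
    }
}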
While I strongly agree that early micro-optimization is bad, this is a case where two different algorithms are being considered. And algorithms are indeed what makes for good or bad performance, as opposed to questions like whether (++x), (x++) or (x += 1) is "more efficient".
Hence, +1 for the question.
(That being said, most probably you'll have so few items in your coordinate list that even the less optimal algorithm will not delay the program in any noticeable way.)
Generalizing a bit, you are asking if, when you need to reduce a list of size n in k different ways, it is better to do k separate reductions or a single reduction that combines the k ways.
The answer being that, if possible, one should strive to combine the k reductions into one. Or, in short: avoid repeated loops, if possible.
I want to point out that this is only weak advice. If, for example, you already had two methods for determining the minima and maxima, it would probably not be worth writing a third one that does both in one run. (Except if it turns out that this is the absolute performance killer.)
But in this case, you have the choice, and combining it all in one loop is also the most logical and hence most understandable thing to do.
If I saw code like this
for (X x: xs) { maxX = ...; }
for (X x: xs) { maxY = ...; }
I would ask myself at the second loop: "Why wasn't that done in the first loop?"
I would then check the previous code again to see whether I had overlooked something that prevented computing maxY right away. Since I wouldn't find anything, I would have to accept it somehow... but still with the feeling that I might have overlooked something.

Why - in Java 1.8 - is Function<T,R> used and not Function<R,T>?

The order seems odd because in regular Java the return type is always specified first. As in:
public static double sum(Iterable<Number> nums) { ... }
Why then, in the Function and BiFunction classes has the choice been made to specify them the other way around? As in:
interface Function<T,R>
interface BiFunction<T,U,R>
I'm not asking here for opinions as to which is better, but specifically:
a) Is there any technical or other (non-stylistic) benefit in preferring one order over the other? Or is it an arbitrary choice?
b) Is anyone aware of any documented explanation, or any stated reason from an authoritative source, why one was chosen over the other?
Aside: the order seems even more odd if extended to higher arities. For example, a hypothetical QuadFunction:
interface QuadFunction<A,B,C,D,R> { ... }
(At the time of writing the highest arity in the library is 2 - i.e. BiFunction.)
See: http://download.java.net/jdk8/docs/api/java/util/function/package-summary.html
It is to be consistent with prior existing notation.
The mathematical integer division function, extended to the rational numbers:
(\): I x I -> Q
The functional programming version of the above (as in Haskell or OCaml):
division :: Integer -> (Integer -> Rational)
or
division :: Integer -> Integer -> Rational
All three say "the division function takes two integers and returns a rational number". In a functional paradigm it is backwards to name your return type first. C has taught us to say "we return a rational number from the division function, which takes two integers" (e.g. float division(int a, int b) {}).
In Java, the return type is on the left of methods because Java wants to look like C. The designers of C thought int main(int argc, char *argv[]) looked better than main(int argc, char *argv[]) int. When writing code, at least for me, I more often than not know what a method will return before I know what it will need. (Edit 1: and we write lines like String s = removeSpaces(textLine), so the return on the left matches the variable on the left.)
In C#, Func<T, TResult> is ordered the same way as the Java 8 Function.
My guess is that it's more intuitive for method chaining, which might be a typical use case for lambdas, i.e.
IntStream.range(1, 10).map(Ints::random).filter(x -> x % 2 == 0)
So the method sequence here reads left to right, and lambdas go left to right. So why not have the type parameters go left to right too?
Escalating this a bit further - the reason might be that the English language reads left to right. :-)
UPDATE
I was very surprised to find out that the same thing happens in mathematics written in modern Arabic notation:
[The original answer showed two images: the same complex-number expression in Latin notation and in Arabic notation.]
In that example the Arabic notation mirrors the Latin character by character. One can track this by the angle sign and the i (imaginary unit) character, which has a dot in both cases. The linked wiki article also shows a reversed lim arrow (compared to the direction of Java 8's lambda arrow). This could mean that an Arabic Java, if it were ever developed, would look a bit different. :-)
Disclaimer: I have a background in maths, but I had no idea of Arabic notation when I was answering this question.
In ordinary procedural and OO programming, functions/methods generally take a list of parameters and return some result:
int max(int num1, int num2)
When rewriting function signatures as callback-based (such as for parallel or asynchronous processing), it has been a longstanding practice to convert the signature by appending the return callback as the last parameter:
void maxAsync(int num1, int num2, Callback<int> callback) // pseudo-Java
A current example of this pattern can be found in GWT RPC processing.
This style originated in the Lisp style of languages with the so-called continuation-passing style, where functions are chained by passing a function to a function as a parameter. Since in Lisp arguments are evaluated left-to-right, the function that's consuming the values needs to be at the end of the list. This arrangement has been adopted by imperative languages for continuity and because it's been traditional to tack on additional optional parameters (boolean flags and the like) at the end of the parameter list.
It is the explicit intent to make it more convenient to program in a functional style in Java. Now, in mathematics, a function is generally written like
f: A -> B
(i.e., a function from As to Bs). This corresponds also to the notation in functional languages, Scala and already existing functional libraries for Java.
In other words: it is just the right thing.
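As a small illustration of my own (not from the original answers): with the input type first, composed functions read left to right via andThen, matching the mathematical A -> B -> C.

import java.util.function.Function;

public class ComposeDemo {
    public static void main(String[] args) {
        Function<String, Integer> length = String::length; // String -> Integer
        Function<Integer, Double> half = n -> n / 2.0;     // Integer -> Double

        // The types chain left to right: String -> Integer -> Double
        Function<String, Double> h = length.andThen(half);
        System.out.println(h.apply("hello"));              // prints 2.5
    }
}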
Note that a functional interface is not a method and a method is not a functional interface, hence it is not clear what the syntax of the former has to do with the latter.
Just my opinion: to look the same way as Function in Guava does. Having the order the other way around would cause a lot of confusion, I guess.
http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/base/Function.html

What is the fastest way to compute an epsilon closure?

I am working on a program to convert Non-deterministic finite state automata (NFAs) to Deterministic finite state automata (DFAs). To do this, I have to compute the epsilon closure of every state in the NFA that has an epsilon transition. I have already figured out a way to do this, but I always assume that the first thing I think of is usually the least efficient way to do something.
Here is an example of how I would compute a simple epsilon closure:
Input for the transition function, in the format startState, symbol = endState (EPS denotes an epsilon transition):
1, EPS = 2
This results in the new combined state { 1, 2 }.
Now obviously this is a very simple example. I would need to be able to compute any number of epsilon transitions from any number of states. To this end, my solution is a recursive function that computes the epsilon closure on the given state by looking at the state it has an epsilon transition into. If that state has (an) epsilon transition(s) then the function is called recursively within a for loop for as many epsilon transitions as it has. This will get the job done but probably isn't the fastest way. So my question is this: what is the fastest way to compute an epsilon closure in Java?
Depth-first search (or breadth-first search - it doesn't really matter) over the graph whose edges are your epsilon transitions. In other words, your solution is optimal provided you efficiently track which states you've already added to the closure.
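A minimal iterative sketch of that search (my own illustration with hypothetical types: states are plain ints and eps.get(s) lists the states reachable from s by a single epsilon move):

import java.util.*;

class EpsilonClosure {
    static Set<Integer> closure(int start, Map<Integer, List<Integer>> eps) {
        Set<Integer> closure = new HashSet<>();
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            int state = stack.pop();
            if (closure.add(state)) { // add returns false if the state was already visited
                for (int next : eps.getOrDefault(state, List.of())) {
                    stack.push(next);
                }
            }
        }
        return closure;
    }
}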
JFLAP does this. You can see their source - specifically ClosureTaker.java. It's a depth-first search (which is what Peter Taylor suggested), and since JFLAP uses it I assume that's the near-optimal solution.
Did you look into an algorithms book? I doubt you'll find a significantly better approach, but the actual performance of this algorithm may very well depend on the concrete data structure you use to implement your graph. You can also share work, depending on the order in which you simplify your graph; think about subgraphs which are epsilon-connected and referenced from two different nodes.
I am not sure whether this can be done in an optimal way, or whether you have to resort to heuristics.
Scan the literature on algorithms.
Just so that people looking only for the specific snippet of code referenced by @Xodarap's answer don't find themselves needing to download both the source code and an application to view the code in the jar file, I took the liberty of attaching said snippet.
public static State[] getClosure(State state, Automaton automaton) {
    List<State> list = new ArrayList<>();
    list.add(state);
    for (int i = 0; i < list.size(); i++) {
        state = list.get(i);
        Transition[] transitions = automaton.getTransitionsFromState(state);
        for (int k = 0; k < transitions.length; k++) {
            Transition transition = transitions[k];
            LambdaTransitionChecker checker = LambdaCheckerFactory
                    .getLambdaChecker(automaton);
            // if lambda transition
            if (checker.isLambdaTransition(transition)) {
                State toState = transition.getToState();
                if (!list.contains(toState)) {
                    list.add(toState);
                }
            }
        }
    }
    return list.toArray(new State[0]);
}
It goes without saying that all credit goes to @Xodarap and the JFLAP project.

Java: micro-optimizing array manipulation

I am trying to make a Java port of a simple feed-forward neural network.
This obviously involves lots of numeric calculations, so I am trying to optimize my central loop as much as possible. The results should be correct within the limits of the float data type.
My current code looks as follows (error handling & initialization removed):
/**
 * Simple implementation of a feedforward neural network. The network supports
 * including a bias neuron with a constant output of 1.0 and weighted synapses
 * to hidden and output layers.
 *
 * @author Martin Wiboe
 */
public class FeedForwardNetwork {
    private final int outputNeurons;    // No of neurons in output layer
    private final int inputNeurons;     // No of neurons in input layer
    private int largestLayerNeurons;    // No of neurons in largest layer
    private final int numberLayers;     // No of layers
    private final int[] neuronCounts;   // Neuron count in each layer, 0 is input layer
    private final float[][][] fWeights; // Weights between neurons.
                                        // fWeight[fromLayer][fromNeuron][toNeuron]
                                        // is the weight from fromNeuron in
                                        // fromLayer to toNeuron in layer fromLayer+1.
    private float[][] neuronOutput;     // Temporary storage of output from previous layer

    public float[] compute(float[] input) {
        // Copy input values to input layer output
        for (int i = 0; i < inputNeurons; i++) {
            neuronOutput[0][i] = input[i];
        }

        // Loop through layers
        for (int layer = 1; layer < numberLayers; layer++) {
            // Loop over neurons in the layer and determine weighted input sum
            for (int neuron = 0; neuron < neuronCounts[layer]; neuron++) {
                // Bias neuron is the last neuron in the previous layer
                int biasNeuron = neuronCounts[layer - 1];

                // Get weighted input from bias neuron - output is always 1.0
                float activation = 1.0F * fWeights[layer - 1][biasNeuron][neuron];

                // Get weighted inputs from rest of neurons in previous layer
                for (int inputNeuron = 0; inputNeuron < biasNeuron; inputNeuron++) {
                    activation += neuronOutput[layer - 1][inputNeuron] * fWeights[layer - 1][inputNeuron][neuron];
                }

                // Store neuron output for next round of computation
                neuronOutput[layer][neuron] = sigmoid(activation);
            }
        }

        // Return output from network = output from last layer
        float[] result = new float[outputNeurons];
        for (int i = 0; i < outputNeurons; i++)
            result[i] = neuronOutput[numberLayers - 1][i];
        return result;
    }

    private static float sigmoid(final float input) {
        return (float) (1.0F / (1.0F + Math.exp(-1.0F * input)));
    }
}
I am running the JVM with the -server option, and as of now my code is between 25% and 50% slower than similar C code. What can I do to improve this situation?
Thank you,
Martin Wiboe
Edit #1: After seeing the vast number of responses, I should probably clarify the numbers in our scenario. During a typical run, the method will be called about 50,000 times with different inputs. A typical network would have numberLayers = 3 layers with 190, 2 and 1 neurons, respectively. The innermost loop will therefore have about 2*191+3 = 385 iterations (when counting the added bias neuron in layers 0 and 1).
Edit #2: After implementing the various suggestions in this thread, our implementation is practically as fast as the C version (within ~2%). Thanks for all the help! All of the suggestions have been helpful, but since I can only mark one answer as the correct one, I will give it to @Durandal for both suggesting array optimizations and being the only one to precalculate the for-loop header.
Some tips.
In your innermost loop, think about how you are traversing your CPU cache and re-arrange your matrix so you are accessing the outermost array sequentially. This will result in you accessing your cache in order rather than jumping all over the place. A cache hit can be two orders of magnitude faster than a cache miss.
E.g. restructure fWeights so it is accessed as
activation += neuronOutput[layer - 1][inputNeuron] * fWeights[layer - 1][neuron][inputNeuron];
Don't perform work inside the loop (every time) which can be done outside the loop (once). Don't perform the [layer - 1] lookup every time when you can place it in a local variable. Your IDE should be able to refactor this easily.
Multi-dimensional arrays in Java are not as efficient as they are in C. They are actually multiple layers of single-dimensional arrays. You can restructure the code so you're only using a single-dimensional array; a sketch of this appears after these tips.
Don't return a new array when you can pass the result array as an argument. (This saves creating a new object on each call.)
Rather than performing layer - 1 all over the place, why not use a local variable layer1 = layer - 1 and write layer1 + 1 instead of layer?
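To make the flattening concrete, here is a sketch under stated assumptions: fWeightsFlat is a hypothetical replacement field holding one float[] per layer, stored transposed so that each target neuron's weights are contiguous (this also applies the first tip).

float[] weights = fWeightsFlat[layer - 1];     // hypothetical flattened, transposed layer
int fromCount = neuronCounts[layer - 1] + 1;   // +1 for the bias neuron
int base = neuron * fromCount;                 // start of this neuron's weight row
float activation = weights[base + biasNeuron]; // bias output is always 1.0
for (int inputNeuron = 0; inputNeuron < biasNeuron; inputNeuron++) {
    activation += neuronOutput[layer - 1][inputNeuron] * weights[base + inputNeuron];
}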
Disregarding the actual math, array indexing in Java can be a performance hog in itself. Consider that Java has no real multidimensional arrays, but rather implements them as arrays of arrays. In your innermost loop, you access over multiple indices, some of which are in fact constant in that loop. Part of the array access can be moved outside of the loop:
final float[] neuronOutputSlice = neuronOutput[layer - 1];
final float[][] fWeightsSlice = fWeights[layer - 1];
for (int inputNeuron = 0; inputNeuron < biasNeuron; inputNeuron++) {
    activation += neuronOutputSlice[inputNeuron] * fWeightsSlice[inputNeuron][neuron];
}
It is possible that the server JIT performs similar loop-invariant code motion; the only way to find out is to change it and profile. On the client JIT this should improve performance no matter what.
Another thing you can try is to precalculate the for-loop exit conditions, like this:
for (int neuron = 0; neuron < neuronCounts[layer]; neuron++) { ... }
// transform to precalculated exit condition (move invariant array access outside loop)
for (int neuron = 0, neuronCount = neuronCounts[layer]; neuron < neuronCount; neuron++) { ... }
Again the JIT may already do this for you, so profile if it helps.
Is there a point to multiplying with 1.0F that eludes me here?:
float activation = 1.0F * fWeights[layer - 1][biasNeuron][neuron];
Other things that could potentially improve speed at the cost of readability: inline the sigmoid() function manually (the JIT has a very tight limit for inlining and the function might be larger).
It can be slightly faster to run a loop backwards (where it doesn't change the outcome, of course), since testing the loop index against zero is a little cheaper than checking against a local variable. The innermost loop is a potential candidate again, but don't expect the output to be 100% identical in all cases, since adding floats a + b + c is potentially not the same as a + c + b. A sketch of the reversed loop follows.
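For illustration, the reversed inner loop might look like this (a sketch, reusing the slice variables from the snippet above; note the changed float-summation order):

// Count down to zero so the loop test compares against a constant.
for (int inputNeuron = biasNeuron - 1; inputNeuron >= 0; inputNeuron--) {
    activation += neuronOutputSlice[inputNeuron] * fWeightsSlice[inputNeuron][neuron];
}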
For a start, don't do this:
// Copy input values to input layer output
for (int i = 0; i < inputNeurons; i++) {
    neuronOutput[0][i] = input[i];
}
But this:
System.arraycopy(input, 0, neuronOutput[0], 0, inputNeurons);
First thing I would look into is seeing if Math.exp is slowing you down. See this post on a Math.exp approximation for a native alternative.
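For reference, the approximation discussed in such posts is typically along these lines (a sketch of Schraudolph's method; it trades accuracy for speed, so verify the error is acceptable before substituting it for Math.exp):

// Schraudolph-style exp approximation: constructs the IEEE 754 bit pattern
// directly. Fast, but only roughly accurate and valid over a limited range.
public static double fastExp(double val) {
    final long tmp = (long) (1512775 * val + 1072632447);
    return Double.longBitsToDouble(tmp << 32);
}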
Replace the expensive floating point sigmoid transfer function with an integer step transfer function.
The sigmoid transfer function is a model of organic analog synaptic learning, which in turn seems to be a model of a step function.
The historical precedent for this is that Hinton designed the back-prop algorithm directly from the first principles of cognitive science theories about real synapses, which in turn were based on real analog measurements, which turn out to be sigmoid.
But the sigmoid transfer function seems to be an organic model of the digital step function, which of course cannot be directly implemented organically.
Rather than model a model, replace the expensive floating point implementation of the organic sigmoid transfer function with the direct digital implementation of a step function (less than zero = -1, greater than zero = +1).
The brain cannot do this, but backprop can!
This not only linearly and drastically improves performance of a single learning iteration, it also reduces the number of learning iterations required to train the network: supporting evidence that learning is inherently digital.
Also supports the argument that Computer Science is inherently cool.
Purely based upon code inspection, your innermost loop has to compute references into a three-dimensional array, and it's being done a lot. Depending upon your array dimensions, you could be having cache issues due to having to jump around memory with each loop iteration. Maybe you could rearrange the dimensions so the inner loop tries to access memory elements that are closer to one another than they are now?
In any case, profile your code before making any changes and see where the real bottleneck is.
I suggest using a fixed-point system rather than a floating-point system. On almost all processors using int is faster than float. The simplest way to do this is simply to shift everything left by a certain amount (4 or 5 bits are good starting points) and treat the bottom 4 bits as the fractional part; a sketch follows below.
Your innermost loop is doing floating-point maths, so this may give you quite a boost.
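A minimal sketch of the idea, assuming a Q4 format (4 fractional bits); the helper names are mine, not from the original answer:

// Q4 fixed point: a value v is stored as round(v * 16).
static final int SHIFT = 4;

static int toFixed(float f) { return Math.round(f * (1 << SHIFT)); }

static float toFloat(int q) { return q / (float) (1 << SHIFT); }

// An integer multiply yields 2*SHIFT fractional bits, so shift right once to renormalize.
static int mulFixed(int a, int b) { return (a * b) >> SHIFT; }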
The key to optimization is to first measure where the time is spent. Surround various parts of your algorithm with calls to System.nanoTime():
long start_time = System.nanoTime();
doStuff();
long time_taken = System.nanoTime() - start_time;
I'd guess that while using System.arraycopy() would help a bit, you'll find your real costs in the inner loop.
Depending on what you find, you might consider replacing the float arithmetic with integer arithmetic.
