Java Multithreading Value Corruption

Asking a question from https://www.baeldung.com/java-thread-safety.
Code given is
public class MathUtils {
public static BigInteger factorial(int number) {
BigInteger f = new BigInteger("1");
for (int i = 2; i <= number; i++) {
f = f.multiply(BigInteger.valueOf(i));
}
return f;
}
}
The website linked above says that this method is stateless and that multiple threads can run it at the same time and get correct results. My question is: doesn't the value of the number variable get corrupted when many threads call this?

The specific mechanism you need to be aware of is the stack.
Each thread gets its own stack. The stack is a traditional last-in-first-out setup: You can 'push' things on it, and you can 'pop' things off it, which will retrieve and remove the most recently pushed thing.
The stack is used for both local variables and execution pointers. Imagine this code:
public static void main(String[] args) {
a(5);
System.out.println("Done");
}
public static void a(int x) {
b();
System.out.println("In a: " + x);
}
public static void b() {}
Now imagine you're a CPU. You are just pointed at an instruction and are supposed to run it. You don't know java and you don't know what a loop is. You just know about basic instructions, including 'GO TO'. But that's all you have.
How would you know where to go back to once b() is done running? How would you know that you have to jump back to the midpoint of the a method and continue at System.out.println("In a")?
The stack is the answer. When executing b(), what happens under the hood is:
PUSH [position in this method we're at right now]
GOTO [position of the start of the b method]
And the b() method ends in an instruction that means: POP a number off of the stack and then GOTO that number.
Local variables are ALSO stored on the stack. So, the instruction set of all this is essentially:
1 PUSH 4 // position to return back to once a is done
2 PUSH 5 // from a(5)
3 GOTO 8 // a method
4 PUSH 7
5 CREATE_OBJECT "Done" // pushes pos of new object on stack
6 GOTO [position of System.out.println]
7 EXIT_APPLICATION
8 PUSH 10 // position to return back to
9 GOTO 16 // b() method
10 CREATE_OBJECT "In a: " // pushes pos of new object on stack
11 FLIP // flip the top two stack entries
12 CONCAT_STRINGS
13 PUSH 15 // position to return back to
14 GOTO [position of System.out.println]
15 RET // pop number and go to it
16 RET
(This is HIGHLY oversimplified; CPUs and bytecode are way more complicated than this, but it gets the point across, hopefully!)
The stack as a concept explains how java works:
Methods are 're-entrant', and each local variable and parameter is a unique copy every time the method runs. That's because these are represented by things on the stack, and you can of course just keep adding things to it. In that sense the stack is a relative thing: "System.out.println(b)", if b is a local var or parameter, is like an instruction to read a line in a book 'two lines above where you are currently reading' (that'll be a new line as long as you keep reading), vs an instruction to read 'line 8 in this book', which is the same line every time.
Java is pass-by-value, which means everything you get is a copy:
int x = 5;
add1(x);
System.out.println(x);
public void add1(int x) {
x = x + 1;
}
the above prints 5 and not 6, because add1(x) is shorthand for:
PUSH current_value_of_whatever_the_x_variable_holds
CALL add1
and add1 is going to operate on that pushed value, and not on your x variable. It gets a little convoluted when we involve objects, because objects in java are represented by their reference: a pointer. Imagine an object is a house, and a reference is more like an address. I can have an address to my house, and then hand you a copy of that address on a piece of paper. You can take a pen and change that paper all you like, it does not affect either my address book or my house. But if you drive over to the house and toss a brick through the window, even though I handed you a copy of a page of my address book, that's still my window. So:
List<String> list = new ArrayList<String>();
list.add("Hello");
add1(list);
System.out.println(list);
public void add1(List<String> list) {
list.add("World!");
}
This would print [Hello, World!], because . is the java equivalent of 'drive over to the house that address list is pointing to'. Had I written list = List.of("Hello", "World!") inside add1, nothing would appear to happen, as = is the java equivalent of: Wipe out the address card and write a new address on it. Which doesn't affect my house nor my address book.
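The house-and-address analogy can be sketched as a small runnable program. The class and method names below (PassByValueDemo, addWorld, replaceList) are made up for illustration; only the list calls come from the text above.

```java
import java.util.ArrayList;
import java.util.List;

class PassByValueDemo {
    // Follows the reference and changes the object itself ("drives to the house").
    static void addWorld(List<String> list) {
        list.add("World!");
    }

    // Only overwrites the method's own copy of the reference
    // ("rewrites the address card"); the caller's list is untouched.
    static void replaceList(List<String> list) {
        list = List.of("Something", "else");
    }

    static List<String> run() {
        List<String> list = new ArrayList<>();
        list.add("Hello");
        addWorld(list);     // visible to the caller
        replaceList(list);  // not visible to the caller
        return list;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [Hello, World!]
    }
}
```

The reassignment inside replaceList compiles fine; it just has no effect outside that method.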

Multiple threads can indeed call MathUtils.factorial() safely.
Each activation of factorial will have its own copy of f, accessible to it alone. That is the meaning of a "local variable".
The argument number is not modified, and acts like a local variable of 'factorial' in any case.
As to your question in the comment: no, there's one copy of the code; there's no need to have more than one. But each thread has its own execution of the code, so if it helps to think of that as a 'separate copy', not much conceptual harm is done.
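As a rough check of this claim, here is a sketch that runs factorial on several threads at once and compares each result with a single-threaded computation. The class name FactorialThreadsDemo and the helper allResultsMatch are hypothetical, and the use of CompletableFuture with the common pool is just one convenient way to get concurrent calls.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class FactorialThreadsDemo {
    // Same logic as MathUtils.factorial: only parameters and locals are touched.
    static BigInteger factorial(int number) {
        BigInteger f = BigInteger.ONE;
        for (int i = 2; i <= number; i++) {
            f = f.multiply(BigInteger.valueOf(i));
        }
        return f;
    }

    // Starts factorial(n) for n = 0..count-1 as asynchronous tasks, then checks
    // every result against a computation done on the calling thread.
    static boolean allResultsMatch(int count) {
        List<CompletableFuture<BigInteger>> futures = new ArrayList<>();
        for (int n = 0; n < count; n++) {
            final int number = n; // each task captures its own copy
            futures.add(CompletableFuture.supplyAsync(() -> factorial(number)));
        }
        for (int n = 0; n < count; n++) {
            if (!futures.get(n).join().equals(factorial(n))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allResultsMatch(50)); // prints true
    }
}
```

A passing run does not prove thread safety in general, but here it matches the reasoning: every call works only on its own number, i and f.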

Whenever any thread (the main thread included) starts executing the factorial(int number) method, that thread stores its own copy of number on its own stack as a local variable, so no other thread can change that value.
However, if the value of number comes from a shared object (shared by multiple threads), and several threads copy it onto their stacks and some thread then changes the shared value, there could be data inconsistency (check 'volatile').

Related

Java AtomicInteger vs one element array for streams

int[] arr = new int[]{0};
l.stream().forEach(x -> {if (x > 10 && x < 15) { arr[0] += 1;}});
l is List<Integer>. Here I use one element arr array to store value that is changed inside the stream. An alternative solution is to use an instance of AtomicInteger class. But I don't understand what is the difference between these two approaches in terms of memory usage, time complexity, safety...
Please note: I am not trying to use AtomicInteger (or array) in this particular piece of code. This code is used only as an example. Thanks!

Knowing which is the best way is important and @rzwitserloot's explanation covers that in great detail. In your specific example, you could avoid the issue by doing it like this.
List<Integer> list = List.of(1,2,11,12,15,11,11,9,10,2,3);
int count = list.stream().filter(x->x > 10 && x < 15).reduce(0, (a,b)->a+1);
// or
int count = list.stream().filter(x->x > 10 && x < 15).mapToInt(x->1).sum();
Both return the value 4
In the first example, reduce sets an initial value of 0 and then adds 1 to it (b is syntactically required but not used). To sum the actual elements rather than 1, replace 1 with b in the reduce method.
In the second example, the values are replaced with 1 in the stream and then summed. Since the method sum() doesn't exist for streams of objects, the 1 needs to be mapped to an int to create an IntStream. To sum the actual elements here, use mapToInt(x->x).
As suggested in the comments, you can also do it like this.
long count = list.stream().filter(x->x > 10 && x < 15).count();
count() returns a long, so it would have to be cast to an int if that is what you want.

You should always use AtomicInteger:
The performance impact is negligible. Technically, new int[1] is 'faster', but they are the same size, or, the array is actually larger in heap (but unlikely; depends on your OS architecture, usually they'd end up being the same size), and the array does not spend any cycles on guaranteeing proper concurrency protections, but there are really only two options: [A] the concurrency protections are required (because it's a lambda that runs in another thread), and thus the int array is a non-starter; it would result in hard to find bugs, quite horrible, or [B] they aren't required, and the hotspot engine is likely going to figure that out and eliminate this cost entirely. Even if it doesn't, the overhead of concurrency protection when there is no contention is low in any case.
It is more readable. Only slightly so, but new int[1] is weirder than new AtomicInteger(), I'd say. AtomicInteger at least suggests: I want a mutable int that I'm going to mess with from other contexts.
It is more convenient. System.out.println-ing an atomicinteger prints the value. sysouting an array prints garbage.
The convenience methods in AtomicInteger might be relevant. Maybe compareAndSet is useful.
But why?
Lambdas are not transparent in the following 3 things:
Checked exceptions (you cannot throw a checked exception inside a lambda even if the context around your lambda catches it).
Mutable local vars (you cannot touch, let alone change, any variable declared outside of the lambda, unless it is (effectively) final).
Control flow. You can't use break, continue, or return from inside a lambda and have it act like it wasn't: You can't break or continue a loop located outside of your lambda and you can't return from the method outside of your lambda (you can only return from the lambda itself).
These are all very bad things when the lambda runs 'in context', but they are all very good things when the lambda doesn't run in context.
Here is an example:
new TreeSet<Integer>((a, b) -> a - b);
Here I have created a TreeSet (which is a set that keeps its elements sorted automatically). To make one, you pass in code that determines for any 2 elements which one is 'the higher one', and TreeSet takes care of everything else. That TreeSet can survive your method (just store it in a field or pass it to a method that ends up storing it in a field) and could even escape your thread (have another thread read that field). That means when that code (a - b in this code) is invoked, we could be 5 days from the creation of that TreeSet, in another thread, with the code that 'surrounds' your new TreeSet statement having loooong gone.
In this scenario, all those transparencies make no sense at all:
What does it mean to break back to a loop that has long since completed and the system doesn't even know what it is about anymore?
That catch block uses context that is long gone, such as local vars or the parameters. It can't survive, so if your a - b were to throw something that is checked, the fact that you've wrapped your new TreeSet<> in a try/catch block is meaningless.
What does it mean to access a variable that no longer exists? For that matter, if it still does exist but the lambda runs in a separate thread, do we now start making local vars volatile and declare them on heap instead of stack just in case?
Of course, if your lambda runs within context, as in, you pass the lambda to some method and that method 'uses it or loses it': Runs your lambda a certain amount of times and then forgets all about it, then those lacking transparencies are really annoying.
It's annoying that you can't do this:
public List<String> toLines(List<Path> files) throws IOException {
return files.stream()
.filter(x -> x.toString().endsWith(".txt"))
.flatMap(x -> Files.readAllLines(x).stream())
.toList();
}
The only reason the above code fails to compile is that Files.readAllLines() throws IOException. We declared that we throw this onwards, but that won't work. You have to kludge up this code, make it bad, by trying to somehow transit that exception out of the lambda or otherwise work around it (the right answer is NOT to use the stream API at all here; write it with a normal for loop!).
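For comparison, a minimal sketch of the plain for-loop version, where the checked IOException simply propagates through the throws clause. The class name ToLinesDemo and the demo() helper (which writes two temporary files to exercise the method) are assumptions added for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ToLinesDemo {
    // No lambda involved, so the IOException travels out of the method normally.
    static List<String> toLines(List<Path> files) throws IOException {
        List<String> allLines = new ArrayList<>();
        for (Path file : files) {
            if (file.toString().endsWith(".txt")) {
                allLines.addAll(Files.readAllLines(file));
            }
        }
        return allLines;
    }

    // Hypothetical helper: creates one .txt and one .log temp file,
    // reads them through toLines, then cleans up.
    static List<String> demo() {
        try {
            Path txt = Files.createTempFile("demo", ".txt");
            Path log = Files.createTempFile("demo", ".log");
            Files.write(txt, List.of("first", "second"));
            Files.write(log, List.of("ignored"));
            List<String> lines = toLines(List.of(txt, log));
            Files.delete(txt);
            Files.delete(log);
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [first, second]
    }
}
```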
Whilst trying to dance around checked exceptions in lambdas is generally just not worth it, you CAN work around the problem of wanting to share a variable with outer context:
int sum = 0;
listOfInts.forEach(x -> sum += x);
The above doesn't work - sum is from the outer scope and thus must be effectively final, and it isn't. There's no particular reason it can't work, but java won't let you. The right answer here is to use int sum = listOfInts.stream().mapToInt(Integer::intValue).sum(); instead, but you can't always find a terminal op that just does what you want. Sometimes you need to kludge around it.
That's where new int[1] and AtomicInteger come in. These are references - and the reference is final, so you CAN use them in the lambda. But the reference points at an object and you can change it at will, hence, you can use this 'trick' to 'share' a variable:
AtomicInteger sum = new AtomicInteger();
listOfInts.forEach(x -> sum.addAndGet(x));
That DOES work.
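To make the safety argument concrete, here is a sketch that counts with an AtomicInteger from a parallel stream (where the lambda may run on several threads) and with a one-element array from a sequential stream. The class name SharedCounterDemo, the method names and the generated test data are all made up for this example.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SharedCounterDemo {
    // Safe even when the lambda runs on several threads at once.
    static int countWithAtomic(List<Integer> values) {
        AtomicInteger count = new AtomicInteger();
        values.parallelStream().forEach(x -> {
            if (x > 10 && x < 15) {
                count.incrementAndGet();
            }
        });
        return count.get();
    }

    // Fine for a sequential stream only; count[0] += 1 is not atomic, so with
    // parallelStream() updates could be lost.
    static int countWithArray(List<Integer> values) {
        int[] count = {0};
        values.stream().forEach(x -> {
            if (x > 10 && x < 15) {
                count[0] += 1;
            }
        });
        return count[0];
    }

    // 100,000 values cycling through 0..19, so 4 of every 20 match (11..14).
    static List<Integer> sampleData() {
        return IntStream.range(0, 100_000).map(i -> i % 20).boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> data = sampleData();
        System.out.println(countWithAtomic(data)); // prints 20000
        System.out.println(countWithArray(data));  // prints 20000
    }
}
```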

How does a value get returned in a Java method? [duplicate]

This question already has answers here:
What happens to the stack when exiting a method?
(6 answers)
Closed 3 years ago.
So I'm wondering how a value is returned in Java. If I call function B from function A, then we would allocate B's stack frame onto the call stack. Let's say in B we create a variable called Var and we want to return Var to A. If B's stack frame is popped off the stack during the return, and Var is a part of B's stack frame, doesn't Var cease to exist? So how do we return a variable from method B to A?

To see how returning a value works, it helps to drop down to low-level programming. Say a method runs and produces a result to return. On many architectures the result is placed in a CPU register reserved for return values. In MIPS assembly, for example, that register is called $v0 (there are two return-value registers, $v0 and $v1).
Consider the following example: we have 3 methods, A(), B() and C(). A() needs a value from B(), and B() needs a value from C(). At run time, when B() calls C(), a special register called the return address register ($ra) records where execution should resume in B() (let's say position 0x100). Let's also suppose that C() lives at position 0x200 in memory. Once C() finishes executing, $v0 contains the value that C() returned. To go back and continue executing B(), we jump to the address saved in $ra. That is done with the jr (jump register) instruction: jr $ra. B() can now read C()'s result from $v0. Once it has been read, the value in $v0 is no longer safe, since any later call may overwrite it, but by then the job is done. The same happens for B(): its return value overwrites $v0 and is used by A().
In Java this job is actually done by the JVM, and the steps might not be exactly the same, but in general that is how it works.

Fully answering this will require understanding of java byte codes, which I'm leaving as an exercise. Java byte codes are detailed here:
https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html
For example 'lreturn' has this description:
Return long from method
The current method must have return type long. The value must be of
type long. If the current method is a synchronized method, the monitor
entered or reentered on invocation of the method is updated and
possibly exited as if by execution of a monitorexit instruction
(§monitorexit) in the current thread. If no exception is thrown, value
is popped from the operand stack of the current frame (§2.6) and
pushed onto the operand stack of the frame of the invoker. Any other
values on the operand stack of the current method are discarded.
The interpreter then returns control to the invoker of the method,
reinstating the frame of the invoker.
The starting point is the java utility "javap", which would be invoked with "-v" to produce a verbose display of a target class:
javap -v Tester.class
For example, for the code:
private final long longV;
public long getLong(long adj) {
return longV + adj;
}
These byte codes are generated:
public long getLong(long);
descriptor: (J)J
flags: ACC_PUBLIC
Code:
stack=4, locals=3, args_size=2
0: aload_0
1: getfield #135 // Field longV:J
4: lload_1
5: ladd
6: lreturn
LineNumberTable:
line 77: 0
LocalVariableTable:
Start Length Slot Name Signature
0 7 0 this Lmy/tests/Tester;
0 7 1 adj J
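To try this yourself, here is a small complete class in the same spirit. The name ReturnDemo, its constructor and the main method are assumptions added so that it compiles and runs on its own; after compiling it, javap -v ReturnDemo.class should show a similar lreturn sequence for getLong.

```java
class ReturnDemo {
    private final long longV;

    ReturnDemo(long longV) {
        this.longV = longV;
    }

    // The sum is computed on this frame's operand stack; lreturn then hands
    // the value (not the variable) over to the caller's frame.
    public long getLong(long adj) {
        return longV + adj;
    }

    public static void main(String[] args) {
        long result = new ReturnDemo(40L).getLong(2L);
        System.out.println(result); // prints 42
    }
}
```

So the local variables of the called method do cease to exist, but the returned value has already been copied to the caller by then.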

Passing reference to an object [duplicate]

This question already has answers here:
Is Java "pass-by-reference" or "pass-by-value"?
(93 answers)
Closed 7 years ago.
I am checking whether the specified word can be formed on this boggle board with the canForm method. The board has a graph field which indicates adjacent tiles. I do a DFS and set answer to true if the word can be formed.
I understand why the code as it is below doesn't work: answer is a primitive, its value is copied at every recursion and the initial answer (in the public method) stays false.
If I change boolean answer to Set<String> answer = new HashSet<>() for instance, pass the reference to the set in recursion, eventually add the successfully formed word and test for emptiness in the end, it works.
But why does it not work if I simply declare Boolean answer = new Boolean(false) and pass this container? It passes the reference to the object all right, but it mysteriously changes the reference at assignment answer = true (as seen through the debugger), and the initial answer isn't reset. I don't understand.
public boolean canForm(String word) {
boolean answer = false;
int n = M * N;
char initial = word.charAt(0);
// for each tile that is the first letter of word
for (int u = 0; u < n; u++) {
char c = getLetter(u / N, u % N);
if (c == initial) {
boolean[] marked = new boolean[n];
marked[u] = true;
canForm(u, word, 1, marked, answer);
}
}
return answer;
}
private void canForm(int u, String word, int d, boolean[] marked, boolean answer) {
if (word.length() == d) {
answer = true;
return;
}
for (int v : graph.adj(u)) {
char c = getLetter(v / N, v % N);
if (c == word.charAt(d) && !marked[v]) {
marked[v] = true;
canForm(v, word, d + 1, marked, answer);
}
}
}

Ah, you're using Java. This matters greatly.
Java is, exclusively, a pass by value language. As such, when you call canForm(int, String, int, boolean[], boolean), you are doing two things:
You are creating 5 new variables in the scope of the canForm method
You are initializing them with values from the call site
Altering the values of the new variables you created will not have any effect on the values of those variables at the call site. Any re-assignments you make will be lost when the method call ends and will have no impact on the values at the call site.
However, in the case of arrays or objects, the "value" being passed is actually a reference to an object. This can be a bit confusing, but it's like the caller and the method each have their own personal key to a shared mailbox. Either could lose their key or replace it with a key to a different mailbox without affecting the other's ability to access the mailbox.
As such, the method can alter the value of the reference (marked = new boolean[n]) without altering the caller's reference. However, if the method changes content WITHIN the referenced structure (marked[0] = false) it will be seen by the caller. It's like the method opened the shared mailbox and changed the mail inside. Regardless of which key you use to open it, you'll see the same changed state.
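Applied to the Boolean case from the question, a short sketch (class and method names are made up): Boolean is immutable, so answer = true cannot change the object that was passed in. Autoboxing turns it into a reassignment of the method's own reference, while writing into an array element changes the shared object.

```java
class BooleanHolderDemo {
    // `answer = true` autoboxes to Boolean.TRUE and re-points the local
    // reference; the caller still holds its original Boolean.
    static void setViaBoolean(Boolean answer) {
        answer = true;
    }

    // Writes into the array object that caller and method both reference.
    static void setViaArray(boolean[] answer) {
        answer[0] = true;
    }

    static boolean[] run() {
        Boolean boxed = Boolean.FALSE;
        setViaBoolean(boxed);

        boolean[] holder = {false};
        setViaArray(holder);

        return new boolean[] {boxed, holder[0]};
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println(r[0] + " " + r[1]); // prints false true
    }
}
```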
A great analysis: http://javadude.com/articles/passbyvalue.htm
In general:
If you want to return a value from a method, that should be the return value of the method. This makes the code easier to understand, less likely to have side effects and may make it easier for the compiler to optimize.
If you want to return two values, create an object to hold both of them (A general purpose type safe container called a "Tuple" is available in many languages and frameworks).
If you really need to move state between function calls -- and you usually don't -- wrap it with an object to get the equivalent of pass by reference semantics. That's essentially what you're doing when you add your results to a shared Set. Just be aware: programming with side effects, be they shared objects or global state, is defect prone. As you pass a mutable object around, it becomes hard to keep all the potential mechanisms for change in your head. If you can do a job exclusively through return values, you should try to do so. Some might call this a "functional [programming] style."
When you give two methods with different intent the same name, it creates confusion, both for readers and in some cases for the compiler. Be specific. We're in no danger of running out of characters.
Finally, you may want to read up on tail recursion. Due to that loop, I believe this implementation may be a stack overflow waiting to happen -- just give it a string that's longer than your stack is deep.
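Putting the first and fourth points together, here is one possible sketch of the search that reports its result through the return value. It is not the asker's class: BoggleDemo, the constructor taking rows as strings, the helper name search and the 8-neighbour adjacency (standing in for graph.adj) are all assumptions. It also un-marks a tile when backtracking, which the original code does not do.

```java
class BoggleDemo {
    private final char[][] board;
    private final int rows;
    private final int cols;

    // Hypothetical constructor: each string is one row of the board.
    BoggleDemo(String... rowsOfLetters) {
        rows = rowsOfLetters.length;
        cols = rowsOfLetters[0].length();
        board = new char[rows][];
        for (int r = 0; r < rows; r++) {
            board[r] = rowsOfLetters[r].toCharArray();
        }
    }

    public boolean canForm(String word) {
        if (word.isEmpty()) {
            return false;
        }
        boolean[][] marked = new boolean[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (search(r, c, word, 0, marked)) {
                    return true;
                }
            }
        }
        return false;
    }

    // True if word.substring(d) can be traced starting at tile (r, c)
    // without reusing a tile that is already on the current path.
    private boolean search(int r, int c, String word, int d, boolean[][] marked) {
        if (r < 0 || r >= rows || c < 0 || c >= cols) {
            return false;
        }
        if (marked[r][c] || board[r][c] != word.charAt(d)) {
            return false;
        }
        if (d == word.length() - 1) {
            return true;
        }
        marked[r][c] = true;
        boolean found = false;
        for (int dr = -1; dr <= 1 && !found; dr++) {
            for (int dc = -1; dc <= 1 && !found; dc++) {
                if (dr != 0 || dc != 0) {
                    found = search(r + dr, c + dc, word, d + 1, marked);
                }
            }
        }
        marked[r][c] = false; // backtrack so other paths may use this tile
        return found;
    }

    public static void main(String[] args) {
        BoggleDemo b = new BoggleDemo("CAT", "XOG", "DZS");
        System.out.println(b.canForm("DOGS")); // prints true
        System.out.println(b.canForm("CATS")); // prints false
    }
}
```

The recursion depth here is bounded by the word length, so very long words would still need an iterative approach.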

Does the JVM optimize aliased variables?

I'm curious. If I make a few local variables that are used simply as aliases to other variables (that is, a new "alias" variable is simply assigned the value of some other variable and the alias' assignment never changes), does the JVM optimize this by using the original variables directly?
Say, for example, that I'm writing an (immutable) quaternion class (a quaternion is a vector with four values). I write a multiply() method that takes another Quaternion, multiplies the two, and returns the resulting Quaternion. To save on typing and increase readability, I make several aliases before performing the actual computation:
public class Quaternion {
private double[] qValues;
public Quaternion(double q0, double q1, double q2, double q3) {
qValues = new double[] {q0, q1, q2, q3};
}
// ...snip...
public Quaternion multiply(Quaternion other) {
double a1 = qValues[0],
b1 = qValues[1],
c1 = qValues[2],
d1 = qValues[3],
a2 = other.qValues[0],
b2 = other.qValues[1],
c2 = other.qValues[2],
d2 = other.qValues[3];
return new Quaternion(
a1*a2 - b1*b2 - c1*c2 - d1*d2,
a1*b2 + b1*a2 + c1*d2 - d1*c2,
a1*c2 - b1*d2 + c1*a2 + d1*b2,
a1*d2 + b1*c2 - c1*b2 + d1*a2
);
}
}
So, in this case, would the JVM do away with a1, b1, etc. and simply use qValues[n] and other.qValues[n] directly?

There is no such thing as an alias as you've described it in Java. When you assign a value from one memory location to a new variable, the JVM makes a copy of that value. If it were to create an alias, changing the underlying arrays during the calculation from another thread would alter the result. This does not happen in your example because you specifically told the JVM to make copies of the values first.
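A tiny sketch of the copy semantics described here (the class name CopyNotAliasDemo is made up; qValues and a1 follow the question's naming):

```java
class CopyNotAliasDemo {
    static double[] run() {
        double[] qValues = {1.0, 2.0, 3.0, 4.0};
        double a1 = qValues[0]; // copies the current value into a1
        qValues[0] = 99.0;      // a later change to the array element
        return new double[] {a1, qValues[0]};
    }

    public static void main(String[] args) {
        double[] r = run();
        System.out.println(r[0] + " " + r[1]); // prints 1.0 99.0
    }
}
```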
If you're worried about performance, don't be. Program correctness trumps all performance concerns. Any program that produces incorrect results faster is worthless. I'm not saying that accessing the arrays directly inside the calculation will necessarily produce incorrect results as I haven't seen the rest of the code, but I am saying that this type of micro-optimization is not worth your effort without first finding a performance problem and next performing timing tests to verify where that problem lies.

The javac compiler won't do it. Disassembling a simple piece of code like this:
int a = 1;
int b = a;
System.out.println("" + (a - b));
Shows:
0: iconst_1
1: istore_1
2: iload_1
3: istore_2
...
But this is what the interpreter will be executing (and even the interpreter sometimes can do some basic optimization). The JIT compiler will handle these kinds of optimizations and many others; in the case of your method, it's even small enough to be inlined, so you don't even get the method call overhead once the JIT kicks in.
(e.g., in my example, the JIT can very easily do constant propagation and just do away with the variables and the calculation, just using "" + 0 as the argument to the println() method.)
But, at the end, just follow what JIT hackers always say: write your code to be maintainable. Don't worry about what the JIT will or will not do.
(Note: David is correct about variables not being "aliases", but copies of the original values.)

As other answers have pointed out, there is no concept of aliasing variables in Java, and the value of a variable is stored per variable declared.
Using local variables to store values of an array for future calculations is a better idea as it makes code more readable, and eliminates extra reads from an array.
That being said, creating local variables does increase the size of the allocated java stack frame in the thread for the method. This would not be an issue in this specific question, but greater number of local variables would increase the stack size required for execution. This would be especially relevant if recursion is involved.

Your program should work, but the answer to your question is no. The JVM will not treat a1 as an alias of qValues[0]; instead it copies the value of the latter to the former.
Check this good reference: http://www.yoda.arachsys.com/java/passing.html

variable definition and assignment detect asm bytecode

I am trying to use the ASM bytecode tree API for static analysis of Java Code.
I have a ClassNode cn, MethodNode m and the list of instructions in that method say InsnList list.
Suppose for a given instruction( i.e. AbstractInsnNode) s, I need to find all the definitions/assignments of the variable at s in the above instruction list. To make it more clear, suppose a variable var is defined and initialized on line 2, then assigned some other value on line number 8 and then used on line number 12. Line number 12 is my s, in this case. Also, assume lots of conditional code in the lines in between.
Is this possible to do with ASM? How??
Thanks and Regards,
SJ
For clarity,
public void funcToAnalyze(String k, SomeClass v) {
int numIter = 0;
/*
Do cool stuff here.... modifies member variables and passed params too
*/
if (v.rank > 1 || numIter>200) {
magicFunction(k, 1);
}
}
Here, suppose the conditional is the JumpInsnNode (current instruction) and I need to find if (and where) any of the variables in the conditional (v.rank and numIter in this case) are modified or assigned anywhere in the above code. Keep it simple, just member variables (no static function or delegation to function of another class).

The SourceInterpreter computes SourceValues for each Frame for a corresponding instruction in MethodNode. Basically it tells which instructions could place a value into a given variable or stack slot.
Also see ASM User Guide for more information about ASM analysis package.
However, if you just need to detect whether a certain variable has been assigned, then all you have to do is look for xSTORE instructions with the corresponding variable indexes.
