How to follow the origin of a value in Java? - java

I have a variable that very rarely gets an incorrect value. Since the system is quite complex I'm having trouble tracing all the code paths that value goes through - there are multiple threads involved, it can be saved and then loaded from a DB and so on. I'm going to try to use a code graph generator to see if I can spot the problem by looking at the ways the setter can be called, but maybe there's some other technique. Perhaps wrapping the value with a class that traces the places and changes it goes through? I'm not sure the question is clear enough, but I'd appreciate input from somebody who encountered such a situation.
[Edit] The problem is not easily reproducible and I can't catch it in a debugger. I'm looking for a static analysis or logging technique to help track down the issue.
[Edit 2] Just to make things clearer, the value I'm talking about is a timestamp represented as the number of milliseconds from the Unix epoch (01/01/1970) in a 64-bit long variable. At some unknown point the top 32 bits of the value are truncated generating completely incorrect (and unrecoverable) timestamps.
[Edit 3] OK, thanks to some of your suggestions and to a couple of hours of poring through the code, I found the culprit. The millisecond-based timestamp was converted into a second-based timestamp by dividing it by 1000 and stored in an int variable. At a later point in code, the second-based timestamp (an int) was multiplied by 1000 and stored into a new long variable. Since both 1000 and the second-based timestamp were int values, the multiplication was done in 32-bit arithmetic and its result was truncated before being converted to long. This was a subtle one, thanks to everyone who helped.
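The failure mode described in Edit 3 can be reproduced in a few lines. This is a minimal sketch, not the original code: the class and method names are hypothetical and the sample timestamp is an arbitrary illustrative value. It shows that `int * int` is evaluated in 32-bit arithmetic, so only the already-overflowed result is widened to `long`.

```java
// Hypothetical demo class; names are illustrative, not taken from the original system.
class TimestampTruncationDemo {

    // Buggy: both operands are int, so the product overflows in 32 bits
    // and only then is widened to long for the return value.
    static long toMillisBuggy(int seconds) {
        return seconds * 1000;
    }

    // Fixed: the long literal promotes the multiplication to 64 bits.
    static long toMillisFixed(int seconds) {
        return seconds * 1000L;
    }

    public static void main(String[] args) {
        long millis = 1_700_000_000_000L;      // arbitrary sample timestamp
        int seconds = (int) (millis / 1000);   // fits comfortably in an int
        System.out.println("buggy: " + toMillisBuggy(seconds));
        System.out.println("fixed: " + toMillisFixed(seconds));
    }
}
```

Making either operand a `long` (a `1000L` literal or an explicit cast) is enough to move the whole multiplication into 64-bit arithmetic.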

If you are using a setter and only a setter to set your value you can add these lines in order to track the thread and stack trace:
public void setTimestamp(long value) {
    if (log.isDebugEnabled()) {
        log.debug("Setting the value to " + value + ". Old value is " + this.timestamp);
        log.debug("Thread is " + Thread.currentThread().getName());
        // A new Throwable captures the current stack; we could also iterate on Thread.currentThread().getStackTrace()
        log.debug("Stacktrace is", new Throwable());
    }
    // Check for a bad value: the high 32 bits are all zero.
    // The parentheses are required because == binds more tightly than &.
    if ((value & 0xffffffff00000000L) == 0L) {
        log.warn("Danger Will Robinson", new IllegalArgumentException("High 32 bits are zero: " + value));
    }
    this.timestamp = value;
}
Also, go over the class that contains the field, and make sure that every reference to it is done via the setter (even in private/protected methods)
Edit
Perhaps FindBugs can help in terms of static analysis, I'll try to find the exact rule later.

The fact that 32 bits of the long get changed, rather than the whole value, suggests strongly that this is a threading problem (two threads update the variable at the same time). Since Java does not guarantee atomic access to a non-volatile long value, if two threads update it at the same time, it could end up with half the bits set one way and half the other. This means that the best way to approach the issue is from a threading point of view. Odds are that there is nothing setting the variable in a way that a static analysis tool will show you is an incorrect value, but rather the synchronization and locking strategy around this variable needs to be examined for potential holes.
As a quick fix, you could wrap that value in an AtomicLong.
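For the threading hypothesis, a minimal sketch of the AtomicLong wrapping might look like the following. The class name `AtomicTimestampHolder` and the `advanceTo` helper are hypothetical. Declaring the field as `volatile long` would also make individual reads and writes atomic; `AtomicLong` additionally offers atomic read-modify-write operations such as `compareAndSet`.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical holder class; only the AtomicLong usage is the point here.
class AtomicTimestampHolder {
    private final AtomicLong timestamp = new AtomicLong();

    void setTimestamp(long value) {
        timestamp.set(value); // the 64-bit write is a single atomic action
    }

    long getTimestamp() {
        return timestamp.get(); // never observes a half-written value
    }

    // Atomically moves the timestamp forward only; returns true if it was updated.
    boolean advanceTo(long candidate) {
        long current = timestamp.get();
        while (candidate > current) {
            if (timestamp.compareAndSet(current, candidate)) {
                return true;
            }
            current = timestamp.get(); // another thread won the race; re-read and retry
        }
        return false;
    }
}
```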

I agree - if the value is only changed via a setter (no matter what the origin) - and it better be - then the best way is to modify the setter to do the tracking for you (print a stack trace at every setting, possibly only when the value set is a specific one if that cuts down on the clutter).
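The wrapper idea raised in the question and the setter-side tracking suggested here can be combined. The sketch below is one possible shape, with hypothetical names (`TracedLong`, `Change`); it records every assignment together with the thread name and the stack at the point of the call, so the history can be dumped once a bad value is noticed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper that remembers who changed the value and from where.
final class TracedLong {

    // One recorded assignment.
    static final class Change {
        final long oldValue;
        final long newValue;
        final String threadName;
        final StackTraceElement[] stack;

        Change(long oldValue, long newValue, String threadName, StackTraceElement[] stack) {
            this.oldValue = oldValue;
            this.newValue = newValue;
            this.threadName = threadName;
            this.stack = stack;
        }
    }

    private long value;
    private final List<Change> history = new ArrayList<>();

    synchronized void set(long newValue) {
        history.add(new Change(value, newValue,
                Thread.currentThread().getName(),
                new Throwable().getStackTrace())); // stack at the point of the call
        value = newValue;
    }

    synchronized long get() {
        return value;
    }

    // Returns a copy so callers can inspect it without holding the lock.
    synchronized List<Change> history() {
        return new ArrayList<>(history);
    }
}
```

The history grows without bound in this sketch, so a long-running system would probably want to cap it or record only suspicious values.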

Multithreaded programming is just hard, but there are IDE tools to help. If you have IntelliJ IDEA, you can use the analyze dataflow feature to work out where things get changed. It won't show you a live flow (it's a static analysis tool), but it can give you a great start.
Alternatively, you can use some Aspects and just print out the value of the variable everywhere, but the resulting debugging info will be too overwhelming to be that meaningful.
The solution is to avoid state shared between threads. Use immutable objects, and program functionally.

Two things:
First, to me, it smells as though some caller is treating their timestamp in an integer context, losing your high 32 bits. It may be, as Yishai surmised, threading-related, but I'd look first at the operations being performed. However, naturally, you need to ensure that your value is being updated "atomically" - whether with an AtomicLong, as he suggested, or with some other mechanism.
That speculation aside, given that what you're losing is the high 32 bits, and you know it's milliseconds since the epoch, your setter can enforce validity: if the supplied value is less than the timestamp at program start, it's wrong, so reject it, and of course, print a stack trace.
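A validity-enforcing setter along those lines could be sketched as follows. The class name and the `STARTUP_MILLIS` constant are hypothetical, and the sketch assumes the class is loaded at or near program start. Throwing is used here as the way to reject the value, since the exception's stack trace points at the offending caller.

```java
// Hypothetical class; assumes it is loaded at or near program start.
class ValidatedTimestamp {
    // Captured once, when the class is initialized.
    private static final long STARTUP_MILLIS = System.currentTimeMillis();

    private long timestamp;

    synchronized void setTimestamp(long value) {
        if (value < STARTUP_MILLIS) {
            // Reject the value; the stack trace of this exception identifies the caller.
            throw new IllegalArgumentException(
                    "Timestamp " + value + " predates program start " + STARTUP_MILLIS);
        }
        timestamp = value;
    }

    synchronized long getTimestamp() {
        return timestamp;
    }
}
```

If older timestamps can legitimately arrive (for example, values reloaded from a database), a fixed lower bound such as a known deployment date would be a safer threshold than the startup time.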

1) Supposing that foo is the name of your variable, you could add something like this to the setter method:
try {
    throw new Exception();
}
catch (Exception e) {
    // String concatenation works whether foo is a primitive or an object
    System.out.println("foo == " + foo);
    e.printStackTrace();
}
How well this will work depends on how frequently the setter is being called. If it's being called thousands of times over the run of your program, you might have trouble finding the bad value in all the stack traces. (I've used this before to troubleshoot a problem like yours. It worked for me.)
2) If you can run your app in a debugger and you can identify bad values for your variable programmatically, then you could set a breakpoint in the setter conditional on whatever it is that makes the value bad. But this requires that you can write a test for badness, which maybe you can't do.
3) Since you said (in a subsequent edit) that the problem is the high 32 bits being zeroed, you can specifically test for that before printing your stack trace. That should cut down the amount of debugging output enough to be manageable.
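The test mentioned in point 3 might be sketched like this; the class and method names are hypothetical. Since 2^32 milliseconds is only about 49.7 days, any epoch-millisecond timestamp from after February 1970 has non-zero high bits, so a value with cleared high bits is suspicious. A truncated value that was sign-extended would instead have all high bits set, which is why the sketch also offers a check for "fits in a signed int".

```java
// Hypothetical helper for filtering stack-trace output down to suspicious values.
class HighBitsCheck {

    // True when the upper 32 bits are all zero.
    static boolean highBitsCleared(long value) {
        return (value >>> 32) == 0L;
    }

    // True when the value survives a round trip through int,
    // which also catches sign-extended (negative) truncation results.
    static boolean fitsInInt(long value) {
        return value == (int) value;
    }

    // Intended to be called from the setter: dumps the stack only for suspicious values.
    static void traceIfSuspicious(long value) {
        if (highBitsCleared(value) || fitsInInt(value)) {
            new Throwable("Suspicious timestamp: " + value).printStackTrace();
        }
    }
}
```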

In your question, you speak of a "variable" that has an incorrect value, and suggest that you could try "wrapping the value with a class". Perhaps I'm reading too much into your choice of words, but would like to see a bit more about the design.
Is the value in question a primitive? Is it a field of a large, complex object that is shared between threads? If it is a field of some object, is that object a DTO or does it implement domain-specific behavior?
In general, I'd agree with the previous comments re instrumenting the object of which the "variable" is a field, but more information about the nature and usage of this variable would help guide more precise suggestions.

Based on your description, I don't know if that means it's not feasible to actually debug the app in real time, but if it is, depending on your IDE there are a bunch of debugging options available.
I know that with Eclipse, you can set conditional breakpoints in the setter method for example. You can specify to suspend only when the value gets set to a specific value, and you can also filter by thread, in case you want to focus on a specific thread.

I would rather set a breakpoint inside the setter. Eclipse allows you to do that.
Some IDEs also allow you to halt the program (pausing before the next instruction executes) whenever the value of a variable changes.

IMO the best way to debug this type of problem is using a field modification breakpoint. (Especially if you're using reflection extensively)
I'm not sure how to do this in eclipse, but in intellij you can just right click on the field and do an "add breakpoint".

Related

Java - need help untangling compact Java notations: orElse, Optional, Lazy

I'm attempting to understand what's happening in this bit of Java code, as its owner is no longer around, and possibly to fix or simplify it. I'm guessing these blocks had a lot more in them at some point and what's left in place was not cleaned up properly.
It seems all occurrences of orElse(false) don't set anything to false and can be removed.
Then the second removeDiscontinued method is returning a boolean that I don't think is used anywhere. Is this just me or this is written in a way that makes it hard to read?
I'm hesitant removing anything from it since I haven't used much of the syntax like orElse, Lazy, Optional. Some help would be much appreciated.
private void removeDiscontinued(Optional<Map<String, JSONArrayCache>> dptCache, Lazy<Set<String>> availableTps) {
dptCache.map(pubDpt -> removeDiscontinued(pubDpt.keySet(), availableTps)).orElse(false);
}
private boolean removeDiscontinued(Set<String> idList, Lazy<Set<String>> availableTps) {
if (availableTps.get().size() > 0) {
Optional.ofNullable(idList).map(trIds -> trIds.removeIf(id -> !availableTps.get().contains(id)))
.orElse(false);
}
return true;
}
This code is indeed extremely silly. I know why - there's a somewhat common, extremely misguided movement around. This movement makes claims that are generally interpreted as 'write it 'functional' and then it is just better'.
That interpretation is obvious horse exhaust. It's just not true.
We can hold a debate on who is to blame for this - is it folks hearing the arguments / reading the blogposts and drawing the wrong conclusions, or is it the 'functional fanfolks' fanning the flames, so to speak, making ridiculous claims that do not hold up?
Point is: This code is using functional style when it is utterly inappropriate to do so and it has turned into a right mess as a result. The code is definitely bad; the author of this code is not a great programmer, but perhaps most of the blame goes to the functional evangelists. At any rate, it's very difficult to read; no wonder you're having a hard time figuring out what this stuff does.
The fundamental issue
The fundamental issue is that this functional style strongly likes being a side-effect free process: You start with some data, then the functional pipeline (a chain of stream map, orElse, etc operations) produces some new result, and then you do something with that. Nothing within the pipeline should be changing anything, it's just all in service of calculating new things.
Both of your methods fail to do so properly - the return value of the 'pipeline' is ignored in both of them, it's all about the side effects.
You don't want this: The primary point of the pipelines is that they can skip steps, and will aggressively do so if they think they can, and the pipeline assumes no side-effects, so it makes wrong calls.
That orElse is not actually optional - it doesn't seem to do anything, except: It forces the pipeline to run, except the spec doesn't quite guarantee that it will, so this code is in that sense flat out broken, too.
These methods also take in Optional as an argument type which is completely wrong. Optional is okay as a return value for a functional pipeline (such as Stream's own max() etc methods). It's debatable as a return value anywhere else, and it's flat out silly and a style error so bad you should configure your linter to aggressively flag it as not suitable for production code if they show up in a field declaration or as a method argument.
So get rid of that too.
Let's break down what these methods do
Both of them call map on an Optional. An optional is either 'NONE', which is like null (as in, there is no value), or it is a SOME, which means there is exactly one value. (Java's own API calls these states empty and present.)
In these specific methods, that map operation more or less boils down to:
If the optional is NONE, do nothing, silently. Otherwise, perform the operation in the parens.
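That behaviour is easy to verify in isolation. The following sketch (hypothetical class and method names) counts how often the function passed to `map` actually runs: zero times for an empty Optional, once for a present one. The trailing `orElse(-1)` only supplies a fallback for the discarded result, much like the `orElse(false)` in the question.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: counts invocations of the function passed to Optional.map.
class OptionalMapDemo {

    static int countInvocations(Optional<String> input) {
        AtomicInteger calls = new AtomicInteger();
        input.map(s -> {
            calls.incrementAndGet(); // side effect inside the mapper
            return s.length();
        }).orElse(-1);               // result is ignored, as in the question's code
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(countInvocations(Optional.empty()));
        System.out.println(countInvocations(Optional.of("abc")));
    }
}
```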
Thus, to get rid of the Optional in the argument of your first method, just remove that, and then update the calling code so that it decides what to do in case of no value, instead of this pair of methods (which decided: If passing in an optional.NONE, silently do nothing. "Silently do nothing" is an extremely stupid default behaviour mode, which is a large part of why Optional is not great). Clearly the calling code has an Optional from somewhere - either it made it (with e.g. Optional.ofNullable, in which case undo that too), or it got one from elsewhere, for example because it does a stream operation and that returned an optional, in which case, replace:
Optional<Map<String, JSONArrayCache>> optional = ...;
removeDiscontinued(optional, availableTps);
with:
optional.ifPresent(v -> removeDiscontinued(v, availableTps));
or perhaps simply:
if (optional.isPresent()) {
    removeDiscontinued(optional.get(), availableTps);
} else {
    // code to run otherwise
}
If you don't see how it could be null, great! Optional is significantly worse than NullPointerException in many cases, and so it is here as well: You do NOT want your code to silently do nothing when some value is absent in a place where the programmer of said code wasn't aware of that possibility - an exception is vastly superior: You then know there is a problem, and the exception tells you where. In contrast to the 'silently do not do anything' approach, where it's much harder to tell something is off, and once you realize something is off, you have no idea where to look. Takes literally hundreds of times longer to find the problem.
Thus, then just go with:
removeDiscontinued(optional.get(), availableTps);
which will throw a NoSuchElementException if the unexpected happens, which is good.
The methods themselves
Get rid of those pipelines, functional is not the right approach here, as you're only interested in the side effects:
private void removeDiscontinued(Map<String, JSONArrayCache> dptCache, Lazy<Set<String>> availableTps) {
    Set<String> keys = dptCache.keySet();
    if (availableTps.get().size() > 0) {
        // Remove every id that is NOT in the available set, as the original code did
        keys.removeIf(id -> !availableTps.get().contains(id));
    }
}
That's it - that's all you need, that's what that code does in a very weird, sloppy, borderline broken way.
Specifically:
That boolean return value is just a red herring - the author needed that code to return something so that they could use it as argument in their map operation. The value is completely meaningless. If a styleguide that promises: "Your code will be better if you write it using this style" ends up with extremely confusing pointless variables whose values are irrelevant, get rid of the style guide, I think.
The ofNullable wrap is pointless: That method is private and its only caller cannot possibly pass null there, unless dptCache is an instance of some bizarro broken implementation of the Map interface that deigns to return null when its keySet() method is invoked: If that's happening, definitely fix the problem at the source, don't work around it in your codebase, no sane java reader would expect .keySet to return null there. That ofNullable is just making this stuff hard to read, it doesn't do anything here.
Note that the if (availableTps.get().size() > 0) check is just an optimization. You can leave it out if you want. That optimization isn't going to have any impact unless that dptCache object is a large map (thousands of keys at least).
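To check the simplified behaviour outside the original codebase, here is a self-contained sketch. It makes two assumptions: `Lazy` and `JSONArrayCache` are not JDK types, so a plain `Supplier<Set<String>>` stands in for `Lazy<Set<String>>` and the map's value type is left generic. It keeps the original condition, removing ids that are not in the available set, and relies on `keySet()` being a view, so removing a key also removes the map entry.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Stand-in sketch: Supplier replaces the non-JDK Lazy type, V replaces JSONArrayCache.
class RemoveDiscontinuedSketch {

    static <V> void removeDiscontinued(Map<String, V> dptCache, Supplier<Set<String>> availableTps) {
        Set<String> available = availableTps.get();
        // An empty "available" set leaves the cache untouched, as in the original code.
        if (!available.isEmpty()) {
            // keySet() is a live view: removing a key here removes the entry from the map.
            dptCache.keySet().removeIf(id -> !available.contains(id));
        }
    }
}
```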

returning boolean variable or returning condition both are same?

I have been given a review comment not to use a variable in the return statement and to use the condition directly in the return statement instead.
Is there any difference between line 3 and 4 in the code below?
String str = "Hello Sir";
boolean flag = str.contains("Hello");
return(flag);
// instead ask to use below
return(str.contains("Hello"));
I prefer to use variable, as in complex calculations those are helpful in debugging.
There is really no difference here. That variable lives on the stack, so does the value that is returned directly.
So, theoretically, there might be very minor performance differences between them.
But rest assured: readability is much more important here, therefore I am with you: you can use such an additional variable when it helps the reader. But when you follow clean code principles, another option would be to have a method that only computes that condition and returns the result.
Please note: the "common" practice is to avoid additional variables, so many tools such as PMD or even IDEs suggest you to directly return (see here for a discussion of this aspect).
And finally, coming back on performance. If your method is invoked often enough, the JIT will inline/compile it anyway, and optimize it. If the method isn't invoked often enough, what would we care about a nanosecond more or less of execution time ...
I don't see a difference.
Basically it is returning the value directly vs. returning a variable containing the value.
Edit: OK, the answer looked like a rewrite of the question. What I meant is that it's passing a value (true/false) versus passing a variable for the system to unwrap its value (var -> true/false),
so, better performance for the first option, but nothing worth going against your personal preference for.

What could possibly go wrong if I replace all "int"s with "long"s?

Let's say I had over 2 billion objects with a unique int id field and decided that I needed to extend that. I replace all the relevant int tokens in the code with long and perform the conversion in my database. Intuition tells me something would go horribly horribly wrong, but I can't figure out what. What would go horribly horribly wrong?
If you take proper precautions like:
testing your code properly,
doing a trial conversion on a development server using a snapshot of the production data,
using proper configuration management on your production system,
backing up your production database before you start the real conversion,
etcetera
then the worst case is that you might need to recover from the backup if the conversion fails ... or you discover bad things soon afterwards.
In short, this is not that different to any change that modifies the database schema.
This is actually a fairly low risk change, in that anything that is going to go wrong will most likely be found at compile time. Just be careful of anything you are calling through reflection, placing into raw collections, etc.
This situation is somewhat based on mistakes I've seen/made. It's also somewhat contrived, but it sounds like you're looking for imaginative answers...
Suppose you've got a JDBC wrapper in which parameters are bound to queries using a bind method that's overloaded as void bind(String name, int x), void bind(String name, long x), ... rather than separately-named setInt, setLong, ... methods.
And suppose someone got into the habit of always including a redundant cast when calling these methods (e.g. query.bind("personId", (int) person.getId());) to make the target method explicit, so that, for instance, the return type of person.getId() couldn't be changed to String without introducing a compile error (so anyone changing that aspect of the person class would be forced to come look at this query and make sure it still makes sense).
Now suppose that in your refactor to change the person ID from int to long, you forget to change one of these casts from (int) to (long), and the int cast that used to be redundant is now a narrowing operation.
Three years down the road, when the identifier sequence rolls past 2^31, suddenly that cast (which ironically was supposed to be defensive) is now truncating identifiers, quietly corrupting your database.
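The scenario can be sketched without a real JDBC wrapper. In this hypothetical stand-in, the two `bind` overloads simply report which overload was chosen and what value it received, which is enough to show the leftover `(int)` cast silently narrowing a `long` identifier once it passes 2^31 - 1.

```java
// Hypothetical stand-in for the overloaded bind methods described above.
class BindOverloadDemo {

    static String bind(String name, int x) {
        return name + ":int:" + x;
    }

    static String bind(String name, long x) {
        return name + ":long:" + x;
    }

    public static void main(String[] args) {
        long personId = 2_147_483_648L; // first id that no longer fits in an int

        // Leftover "defensive" cast: compiles fine, selects the int overload, truncates the id.
        System.out.println(bind("personId", (int) personId));

        // Without the cast the long overload is chosen and the id is preserved.
        System.out.println(bind("personId", personId));
    }
}
```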

Correct way to get a value?

As part of my AP curriculum I am learning java and while working on a project I wondered which of the following is best way to return a value?
public double getQuarters(){
return quarters;
}
or
public void getQuarters(){
System.out.println(quarters);
}
***Note: I know that the second option is not "technically" returning a value, but it's still showing me the value, so why bother?
Your answer would be correct. The second method doesn't return any value at all, so while you might be able to see the output, your program can't. The second method could still be useful for testing or even for a command line application, but it should be named something like printQuarters instead.
public double getQuarters(){
return quarters;
}
Use this in order to encapsulate quarters and hide it from being accessed directly by other classes. That means you have to declare it as private double quarters. Let's look at the second option:
public void getQuarters(){
System.out.println(quarters);
}
However, this seems wrong as getQuarters is not returning anything. Hence it would make more sense to refactor it as
public void printQuarters(){
System.out.println(quarters);
}
You answered your own question. For most definitions of the word "best", you should go with the first option.
Your question, however, does touch on the object-oriented programming topic of accessors and mutators. In your example, "getQuarters" is an accessor. It is usually best to use accessors to retrieve your values. This is one way to adhere to the Open/Closed Principle.
Also, the Java community has a coding convention for this and many tools and libraries depend on code following those conventions.
If all you need to do is display the value when this method is called, and you are ok with console output, then your System.out.println method will do the job. HOWEVER, a function that actually returns the variable is much more semantically correct and useful.
For example, while you may only need to print the variable for your current project, what if you came back later and decided that you were instead going to output your variable to a file? If you wrote your getQuarters function with a println statement, you would need to rewrite the whole thing. On the other hand, if you wrote the function as a return, you wouldn't need to change anything. All you'd have to do is add new code for the file output, and consume the function where needed.
A returning function is therefore much more versatile, although more so in larger code projects.
You return values to a specific point in your program, so that the program can use it to function.
You print values at a specific point in your program, so that you as an end user can see what value you got back for some function.
Depending on the function - for instance, yours - the result of quarters is no longer regarded in the program; all it did was print a value to the screen, and the application doesn't have a [clean|easy] way to get that back to use it.
If your program needs the value to function, then it must be a return. If you need to debug, then you can use System.out.println() where necessary.
However, more times than not, you will be using the return statement.
Option 1 is far superior.
It can be easily Unit Tested.
What if the spec changes and sometimes you want to print the result, other times put it into a database? Option 1 splits apart the logic of obtaining the value from what to do with it. Now, for a single method getQuarters no big deal, but eventually you may have getDimes, getEuros, etc...
What if there may be an error condition on quarters, like the value is illegal? In option 1, you could return a "special" value, like -1.0, or throw an Exception. The client then decides what to do.
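Putting these points together, one possible shape is sketched below. The class name `CoinJar` and the non-negative check are assumptions made for illustration. The getter returns the value, so it can be unit tested and reused, and printing becomes a separate method built on top of it.

```java
// Hypothetical class illustrating "return the value, print separately".
class CoinJar {
    private final double quarters; // private: only reachable through the accessor

    CoinJar(double quarters) {
        if (quarters < 0) {
            // Error condition handled by throwing; the client decides what to do.
            throw new IllegalArgumentException("quarters must not be negative: " + quarters);
        }
        this.quarters = quarters;
    }

    // Accessor: hands the value back to the calling code.
    double getQuarters() {
        return quarters;
    }

    // Display is a separate concern that reuses the accessor.
    void printQuarters() {
        System.out.println(getQuarters());
    }
}
```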

Is it inefficient to reference a hashmap in another class multiple times?

Class A
class A {
    public HashMap<Integer, Double> myHashMap;
    public A() {
        myHashMap = new HashMap<>();
    }
}
Class B
class B {
    private A anInstanceOfA;
    public B(A a) {
        this.anInstanceOfA = a;
    }
    void aMethod() {
        anInstanceOfA.myHashMap.get(1); // getting the HashMap value for key = 1
        // proceed to use this value, but instead of storing it in a variable
        // I use anInstanceOfA.myHashMap.get(1) each time I need that value.
    }
}
In aMethod() I use anInstanceOfA.myHashMap.get(1) to get the value for key = 1. I do that multiple times in aMethod() and I'm wondering if there is any difference in efficiency between using anInstanceOfA.myHashMap.get(1) multiple times or just assigning it to a variable and using the assigned variable multiple times.
I.E
void aMethod() {
    Double theValue = anInstanceOfA.myHashMap.get(1);
    // proceed to use theValue in my calculations. Is there a difference in efficiency?
}
In theory the JVM can optimise away the difference to be very small (compared to what the rest of the program is doing). However I prefer to make it a local variable as I believe it makes the code clearer (as I can give it a meaningful name)
I suggest you do what you believe is simpler and clearer, unless you have measured a performance difference.
The question seems to be that you want to know if it is more expensive to call get(1) multiple times instead of just once.
The answer to this is yes. The question is if it is enough to matter. The definitive answer is to ask the JVM by profiling. You can, however, guess by looking at the get method in your chosen implementation and consider if you want to do all that work every time.
Note, that there is another reason that you might want to put the value in a variable, namely that you can give it a telling name, making your program easier to maintain in the future.
This seems like a micro-optimization, that really doesn't make much difference in the scheme of things.
As @peter already suggested, 'optimizing' for style/readability is a better rationale for choosing the second option over the first one. Optimizing for speed only starts making sense if you really do a lot of calls, or if the call is very expensive -- both are probably not the case in your current example.
Put it in a local variable, for multiple reasons:
It will be much faster. Reading a local variable is definitely cheaper than a HashMap lookup, probably by a factor of 10-100x.
You can give the local variable a good, meaningful name
Your code will probably be shorter / simpler overall, particularly if you use the local variable many times.
You may get bugs during future maintenance if someone modifies one of the get calls but forgets to change the others. This is a problem whenever you are duplicating code. Using a local variable minimises this risk.
In concurrent situations, the value could theoretically change if the HashMap is modified by some other code. You normally want to get the value once and work with the same value. Although if you are running into problems of this nature you should probably be looking at other solutions first (locking, concurrent collections etc.)
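As a small illustration of the local-variable approach, the sketch below (hypothetical class, method and variable names) looks the value up once, gives it a meaningful name, and reuses it. A single lookup also makes it natural to handle a missing key in one place.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example: one lookup, one well-named local, several uses.
class LookupOnceDemo {

    static double weightedTotal(Map<Integer, Double> rates) {
        Double baseRate = rates.get(1); // single lookup for key 1
        if (baseRate == null) {
            return 0.0;                 // missing key handled in exactly one place
        }
        // Every use below sees the same value, even if the map changes afterwards.
        return baseRate * 2 + baseRate * 3;
    }

    public static void main(String[] args) {
        Map<Integer, Double> rates = new HashMap<>();
        rates.put(1, 1.5);
        System.out.println(weightedTotal(rates));
    }
}
```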
