PMD: Avoid instantiating new objects inside loops - Java

I've got an issue with the PMD rule Avoid instantiating new objects inside loops. Here is some example code:
import java.awt.Dimension;

public class PMDDemo {
    public static void main(final String[] args) {
        final Dimension[] arr = new Dimension[10];
        for (int i = 0; i < arr.length; i++) {
            arr[i] = new Dimension(i, i); // rule violation here
        }
    }
}
PMD gives me the above-mentioned rule violation at the marked spot in the code. How am I supposed to create n instances of a class without creating them within a loop?
I know that some of PMD's rules are controversial (like the onlyOneExit rule). But up to now I at least understood the idea behind them. I don't understand the reasoning behind this rule. Can someone help me with that?

For your specific use case the rule makes no sense, as you keep the references to the new objects after the loop. So there is no real alternative to your solution.
More generally speaking, creating short-lived objects in Java is cheap* (apart from the hidden cost that the GC will run more often). In particular, the allocation is almost free, and GC time mostly depends on the quantity of reachable objects: dead objects do not increase GC time for typical GC algorithms.
The JIT can also perform various optimisations, such as escape analysis, if it detects that unnecessary objects are created.
Obviously, creating useless objects is not a recommended practice, but trying to reuse objects is often counterproductive.
As a practical example, you can have a look at this post which shows that creating a new set within a loop is cheaper than creating one before the loop and clearing it at each iteration.
* Thanks @RichardTingle for the link
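To make that comparison concrete, here is a minimal sketch (hypothetical method names, not taken from the linked post) contrasting the two styles:

import java.util.HashSet;
import java.util.Set;

public class AllocationStyles {

    // Style A: a fresh set per iteration (exactly what the PMD rule flags).
    static void freshSetPerIteration() {
        for (int i = 0; i < 1000; i++) {
            Set<Integer> seen = new HashSet<>(); // short-lived: dies young, cheap for the GC
            seen.add(i);
        }
    }

    // Style B: one reused set, cleared each iteration (the "optimization"
    // that the linked post found to be slower).
    static void reusedSet() {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            seen.clear(); // clear() nulls out the whole bucket table on every pass
            seen.add(i);
        }
    }
}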

for (int i = 0; i < arr.length; i++) {
    arr[i] = new Dimension(i, i); // rule violation here
}
The above PMD violation can be resolved by:
for (int i = 0; i < arr.length; i++) {
    arr[i] = createNewDimension(i, i); // no violation reported here
}

private static Dimension createNewDimension(final int w, final int h) {
    return new Dimension(w, h);
}
According to the rule, we should not use the new operator directly inside a loop; moving the instantiation into a private method makes the warning go away.

Related

For loop over a list that has to manipulate elements by index

I have a list of beans and I want to manipulate each element by its index. I tried the way below; is there any other way of doing this which is easier and more generic?
List<UserBean> resultBean = query.setFirstResult(offset).setMaxResults(limit).getResultList();
for (int i = 0; i < resultBean.size(); i++) {
    resultBean.get(i).setChabi(encode(decryptChabi(resultBean.get(i).getChabi())));
}
As you can see from the Java Language Specification (JLS), 14.14, there are two kinds of for loops. The basic for loop, which uses an index, and the enhanced for loop, which doesn't.
You used the basic for loop but violated the DRY (Don't Repeat Yourself) principle in that you're calling resultBean.get(i) twice. To cure that, you can introduce a variable which makes the code much more readable:
for (int i = 0; i < resultBean.size(); i++) {
    UserBean user = resultBean.get(i);
    user.setChabi(encode(decryptChabi(user.getChabi())));
}
In your example, you don't even need the index variable, so you can replace the basic for loop with an enhanced for loop which would be even more concise:
for (UserBean user : resultBean) {
    user.setChabi(encode(decryptChabi(user.getChabi())));
}
Whatever you do, prefer code that is easy to read over code that is fast/easy to write.

GdxRuntimeException: #iterator() cannot be used nested

I'm working on this game which has a World. In this World there are many Units.
The problem stems from the fact that World serves (among other things) two main tasks:
Iterate through each Unit so that they can update their properties based on time passed and so forth.
Find potential targets for each Unit.
In World, this happens:
for (Actor a : stage.getActors())
{
    a.draw(batch, 1);
    a.act(10);
    findTargets((Unit) a);
}
findTargets() is defined as such:
public ArrayList<Unit> findTargets(Unit source) {
    double sight = source.SIGHT;
    ArrayList<Unit> targets = new ArrayList<Unit>();
    for (Actor a : stage.getActors()) {
        if (!(a instanceof Unit)) // check the type before casting
            continue;
        Unit target = (Unit) a;
        if (target.equals(source)) continue;
        if (target.getPos().dst(source.getPos()) < sight) {
            targets.add(target);
        }
    }
    return targets;
}
The problem is obvious: findTargets() also iterates over every unit, resulting in a nested iteration. However, I'm unsure how I should proceed to "un-nest" this, as I'm only seeing a catch-22: every unit does in effect have to iterate over every other unit to see whether they're within its sight range.
Some fresh eyes on this would be greatly appreciated.
There may be ways to refactor your design to avoid the nesting. But the simplest solution might be to just use old-school indexed for loops for both the outer and inner loop, or just the inner one. Don't use the iterator, as that is not allowed here for nested loops. getActors() returns a libGDX Array, so just traverse that by index:
for (int i = 0; i < stage.getActors().size; i++) {
    // ...etc --> use stage.getActors().get(i) or stage.getActors().items[i]
}
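Applied to findTargets(), a minimal sketch (assuming stage.getActors() returns a com.badlogic.gdx.utils.Array<Actor>, as in scene2d) could look like this:

public ArrayList<Unit> findTargets(Unit source) {
    ArrayList<Unit> targets = new ArrayList<Unit>();
    Array<Actor> actors = stage.getActors();  // com.badlogic.gdx.utils.Array
    for (int i = 0; i < actors.size; i++) {   // indexed loop: no iterator, safe to nest
        Actor a = actors.get(i);
        if (!(a instanceof Unit) || a == source) continue;
        Unit target = (Unit) a;
        if (target.getPos().dst(source.getPos()) < source.SIGHT) {
            targets.add(target);
        }
    }
    return targets;
}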

Game optimisation in Java: assigning ArrayList<Object> elements to a local variable inside a for loop

Is this usage of elements of an ArrayList:
for (int i = 0; i < array_list.size(); i++) {
    Object obj = array_list.get(i);
    // do **lots** of stuff with **obj**
}
faster than this one:
for (int i = 0; i < array_list.size(); i++) {
    // do **lots** of stuff with **array_list.get(i)**
}
It depends on how many times array_list.get(i) is called in the second version. If it is called only once, there is no difference between the two methods.
If it's invoked multiple times, saving the value in a variable may be more efficient (it depends on the compiler and the JIT optimizations).
Sample scenario where the first method may be more efficient, compiled using Oracle JDK's javac compiler, assuming the list contains String objects:
for (int i = 0; i < array_list.size(); i++) {
    String obj = array_list.get(i);
    System.out.println(obj);
    if (!obj.isEmpty()) {
        String o = obj.substring(1);
        System.out.println(o + obj);
    }
}
In this case, obj is saved as a local variable and loaded whenever it is used.
for (int i = 0; i < array_list.size(); i++) {
    System.out.println(array_list.get(i));
    if (!array_list.get(i).isEmpty()) {
        String o = array_list.get(i).substring(1);
        System.out.println(o + array_list.get(i));
    }
}
In this case, multiple invocations of List.get are observed in the bytecode.
The performance difference between calling get once and using a local variable is almost always negligible. But... if you insist on doing it the hardcore way, this is the fast way to go:
ArrayList<Object> array_list = ...
// cache list.size() in a variable!
for (int i = 0, e = array_list.size(); i < e; ++i) {
    // get the object only once, into a local variable
    Object object = array_list.get(i);
    // do things with object
}
It caches the list's size in a local variable e, to avoid invoking array_list.size() on each loop iteration, and it caches each element in a local variable to avoid repeated get(index) calls. Be aware that whatever you actually do with the objects in the loop will most likely be more expensive by orders of magnitude than the loop itself.
Therefore, prefer code readability and simply use the enhanced for loop syntax:
ArrayList<Object> array_list = ...
for (Object object : array_list) {
    // do things with object
}
No hassles, short and clear. That's worth far more than a few saved clock cycles in most cases.

Is defaulting to an empty lambda better or worse than checking for a potentially null lambda?

I'm working on a small scene graph implementation in Java 8. The basic scene node looks something like this:
public class SceneNode {
    private final List<SceneNode> children = new ArrayList<>();

    protected Runnable preRender;
    protected Runnable postRender;
    protected Runnable render;

    public final void render() {
        preRender.run();
        render.run();
        for (SceneNode child : children) {
            child.render();
        }
        postRender.run();
    }
}
This works fine if the Runnables default to () -> {}. However, alternatively I could allow them to be null, but that means the render() method has to look like this:
public final void render() {
    if (null != preRender) { preRender.run(); }
    if (null != render) { render.run(); }
    for (SceneNode child : children) {
        child.render();
    }
    if (null != postRender) { postRender.run(); }
}
So my question is, is the implicit cost of the branching introduced by the null check likely to cost more or less than whatever the JVM ends up compiling an empty lambda into? It seems like it should end up costing more to check for null, because a potential branch limits optimization, while presumably the Java compiler or JVM should be smart enough to compile an empty lambda into a no-op.
Interestingly, it seems that checking for null is a little bit faster than calling an empty lambda or an empty anonymous class, when the JVM is run with the -client argument. When running with -server, the performance is the same for all approaches.
I have done a micro-benchmark with Caliper to test this.
Here is the test class (the latest Caliper from git is necessary to compile it):
@VmOptions("-client")
public class EmptyLambdaTest {

    public Runnable emptyLambda = () -> {};

    public Runnable emptyAnonymousType = new Runnable() {
        @Override
        public void run() {}
    };

    public Runnable nullAbleRunnable;

    @Benchmark
    public int timeEmptyLambda(int reps) {
        int dummy = 0;
        for (int i = 0; i < reps; i++) {
            emptyLambda.run();
            dummy |= i;
        }
        return dummy;
    }

    @Benchmark
    public int timeEmptyAnonymousType(int reps) {
        int dummy = 0;
        for (int i = 0; i < reps; i++) {
            emptyAnonymousType.run();
            dummy |= i;
        }
        return dummy;
    }

    @Benchmark
    public int timeNullCheck(int reps) {
        int dummy = 0;
        for (int i = 0; i < reps; i++) {
            if (nullAbleRunnable != null) {
                nullAbleRunnable.run();
            }
            dummy |= i;
        }
        return dummy;
    }
}
And here are the benchmark results (the original result charts are omitted here): running with -client, the null check comes out slightly ahead; running with -server, all three approaches perform identically.
Is defaulting to an empty lambda better or worse than checking for a potentially null lambda?
This is essentially the same as asking if it is better to test for a null String parameter or try to substitute an empty String.
The answer is that it depends on whether you want to treat the null as a programming error ... or not.
My personal opinion is that unexpected nulls should be treated as programming errors, and that you should allow the program to crash with an NPE. That way, the problem will come to your attention earlier and will be easier to track down and fix ... than if you substituted some "make good" value to stop the NPE from being thrown.
But of course, that doesn't apply for expected null values; i.e. when the API javadocs say that a null is a permissible value, and say what it means.
This also relates to how you design your APIs. In this case, the issue is whether your API spec (i.e. the javadoc!) should insist on the programmer providing a no-op lambda, or treat null as meaning the same thing. That boils down to a compromise between:
API client convenience,
API implementor work, and
robustness; e.g. when using the value of an incorrectly initialized variable ...
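As a concrete sketch of how the node could support both views at once (the setter is hypothetical, not part of the question's code): normalize null away at the API boundary, so render() never needs a check:

public class SceneNode {
    private static final Runnable NO_OP = () -> {};
    private Runnable preRender = NO_OP;

    // Lenient boundary: callers may pass null, the internals never see it.
    public void setPreRender(Runnable callback) {
        this.preRender = (callback != null) ? callback : NO_OP;
    }

    // Strict alternative, treating null as a programming error:
    // this.preRender = java.util.Objects.requireNonNull(callback, "preRender");
}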
I'm more concerned about the implications of the runtime performance of using an empty lambda vs using a null and having to do a null check.
My intuition is that testing for null would be faster, but any difference in performance will be small, and that the chances are that it won't be significant to the overall performance of the application.
(UPDATE - Turns out that my intuition is "half right" according to @Balder's micro-benchmarking. For a -client mode JVM, null checking is a bit faster, but not enough to be concerning. For a -server mode JVM, the JIT compiler is apparently optimizing both cases to native code with identical performance.)
I suggest that you treat this as you would (or at least should) treat any potential optimization problem:
Put off any optimization until your application is working.
Benchmark the application to see if it is already fast enough.
Profile the application to see where the real hotspots are.
Develop and test a putative optimization.
Rerun the benchmarks to see if it improved things.
Go to step 2.

Hashmap vs Array performance

Is it (performance-wise) better to use arrays or HashMaps when the indexes of the array are known? Keep in mind that the 'objects array/map' in the example is just an example; in my real project it is generated by another class, so I can't use individual variables.
ArrayExample:
SomeObject[] objects = new SomeObject[2];
objects[0] = new SomeObject("Obj1");
objects[1] = new SomeObject("Obj2");

void doSomethingToObject(String Identifier) {
    SomeObject object;
    if (Identifier.equals("Obj1")) {
        object = objects[0];
    } else if (Identifier.equals("Obj2")) {
        object = objects[1];
    }
    // do stuff
}
HashMapExample:
HashMap objects = new HashMap();
objects.put("Obj1", new SomeObject());
objects.put("Obj2", new SomeObject());

void doSomethingToObject(String Identifier) {
    SomeObject object = (SomeObject) objects.get(Identifier);
    // do stuff
}
The HashMap one looks much much better but I really need performance on this so that has priority.
EDIT: Well, arrays it is then; suggestions are still welcome.
EDIT: I forgot to mention, the size of the Array/HashMap is always the same (6)
EDIT: It appears that HashMaps are faster
Array: 128ms
Hash: 103ms
With fewer cycles, the HashMap was even twice as fast.
test code:
import java.util.HashMap;
import java.util.Random;

public class Optimizationsest {

    private static Random r = new Random();
    private static HashMap<String, SomeObject> hm = new HashMap<String, SomeObject>();
    private static SomeObject[] o = new SomeObject[6];
    private static String[] Indentifiers = {"Obj1", "Obj2", "Obj3", "Obj4", "Obj5", "Obj6"};
    private static int t = 1000000;

    public static void main(String[] args) {
        CreateHash();
        CreateArray();
        long loopTime = ProcessArray();
        long hashTime = ProcessHash();
        System.out.println("Array: " + loopTime + "ms");
        System.out.println("Hash: " + hashTime + "ms");
    }

    public static void CreateHash() {
        for (int i = 0; i <= 5; i++) {
            hm.put("Obj" + (i + 1), new SomeObject());
        }
    }

    public static void CreateArray() {
        for (int i = 0; i <= 5; i++) {
            o[i] = new SomeObject();
        }
    }

    public static long ProcessArray() {
        StopWatch sw = new StopWatch(); // simple timing helper (not shown in the post)
        sw.start();
        for (int i = 1; i <= t; i++) {
            checkArray(Indentifiers[r.nextInt(6)]);
        }
        sw.stop();
        return sw.getElapsedTime();
    }

    private static void checkArray(String Identifier) {
        SomeObject object;
        if (Identifier.equals("Obj1")) {
            object = o[0];
        } else if (Identifier.equals("Obj2")) {
            object = o[1];
        } else if (Identifier.equals("Obj3")) {
            object = o[2];
        } else if (Identifier.equals("Obj4")) {
            object = o[3];
        } else if (Identifier.equals("Obj5")) {
            object = o[4];
        } else if (Identifier.equals("Obj6")) {
            object = o[5];
        } else {
            object = new SomeObject();
        }
        object.kill();
    }

    public static long ProcessHash() {
        StopWatch sw = new StopWatch();
        sw.start();
        for (int i = 1; i <= t; i++) {
            checkHash(Indentifiers[r.nextInt(6)]);
        }
        sw.stop();
        return sw.getElapsedTime();
    }

    private static void checkHash(String Identifier) {
        SomeObject object = (SomeObject) hm.get(Identifier);
        object.kill();
    }
}
HashMap uses an array underneath so it can never be faster than using an array correctly.
Random.nextInt() is many times slower than what you are testing; even when using an array to test an array, it is going to bias your results.
The reason your array benchmark is so slow is due to the equals comparisons, not the array access itself.
HashTable is usually much slower than HashMap because it does much the same thing but is also synchronized.
A common problem with micro-benchmarks is the JIT, which is very good at removing code which doesn't do anything. If you are not careful, you will only be testing whether you have confused the JIT enough that it cannot work out that your code doesn't do anything.
This is one of the reasons you can write micro-benchmarks which outperform C++ systems: Java is a simpler language and easier to reason about, and it is thus easier to detect code which does nothing useful. This can lead to tests which show that Java does "nothing useful" much faster than C++ ;)
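One common guard against this, shown here as a hypothetical extra method for the Optimizationsest class above, is to fold every result into a checksum that gets printed, so the JIT cannot prove the measured work is dead:

public static long ProcessHashGuarded() {
    StopWatch sw = new StopWatch();
    sw.start();
    long checksum = 0;
    for (int i = 1; i <= t; i++) {
        // fold every lookup into a checksum the JIT cannot prove unused
        checksum += hm.get(Indentifiers[i % 6]).hashCode();
    }
    sw.stop();
    System.out.println("checksum: " + checksum); // publishing the result keeps the loop alive
    return sw.getElapsedTime();
}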
Arrays, when the indexes are known, are faster (HashMap uses an array of linked lists behind the scenes, which adds a bit of overhead on top of the array accesses, not to mention the hashing operations that need to be done).
And FYI, HashMap<String, SomeObject> objects = new HashMap<String, SomeObject>(); makes it so you won't have to cast.
For the example shown, HashTable wins, I believe. The problem with the array approach is that it doesn't scale. I imagine you want to have more than two entries in the table, and the conditional branch tree in doSomethingToObject will quickly get unwieldy and slow.
Logically, HashMap is definitely a fit in your case. From a performance standpoint it also wins, since in the case of arrays you will need to do a number of string comparisons (in your algorithm), while in a HashMap you just use the hash code if the load factor is not too high. Both the array and the HashMap will need to be resized if you add many elements, but in the case of the HashMap you will need to redistribute the elements as well. In this use case the HashMap loses.
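If that resize/redistribution cost matters, the HashMap can be presized; the two-argument constructor is standard java.util.HashMap API:

// Capacity 16 with the default load factor 0.75 holds up to 12 entries,
// so the 6 fixed objects never trigger a rehash.
HashMap<String, SomeObject> objects = new HashMap<String, SomeObject>(16, 0.75f);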
Arrays will usually be faster than Collections classes.
PS: You mentioned HashTable in your post ("The HashTable one looks much much better"). HashTable has even worse performance than HashMap, because it does much the same thing but is also synchronized, so I assume the mention of HashTable was a typo.
The example is strange. The key problem is whether your data is dynamic. If it is, you could not write your program that way (as in the array case). In other words, comparing your array and hash implementations is not fair: the hash implementation works for dynamic data, but the array implementation does not.
If you only have static data (6 fixed objects), either the array or the hash just works as a data holder. You could even define static objects.
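A sketch of that fully static approach (the field names are hypothetical): with fixed objects there is no lookup at all, just direct field access:

public class Holders {
    // Six fixed objects as static fields: no string comparison, no hashing.
    static final SomeObject OBJ1 = new SomeObject();
    static final SomeObject OBJ2 = new SomeObject();
    // ... OBJ3 through OBJ6 in the same way ...

    void doSomethingToObj1() {
        SomeObject object = OBJ1; // resolved directly, no lookup cost
        // do stuff
    }
}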

Categories

Resources