Why does Guava Enums.ifPresent use synchronized under the hood?

Guava's Enums.getIfPresent(Class, String) calls Enums.getEnumConstants under the hood:
@GwtIncompatible // java.lang.ref.WeakReference
static <T extends Enum<T>> Map<String, WeakReference<? extends Enum<?>>> getEnumConstants(
    Class<T> enumClass) {
  synchronized (enumConstantCache) {
    Map<String, WeakReference<? extends Enum<?>>> constants = enumConstantCache.get(enumClass);
    if (constants == null) {
      constants = populateCache(enumClass);
    }
    return constants;
  }
}
Why does it need a synchronized block? Wouldn't that incur a heavy performance penalty? Java's Enum.valueOf(Class, String) does not appear to need one. Furthermore, if synchronization is indeed necessary, why do it so inefficiently? One would hope that if the enum is already present in the cache, it can be retrieved without locking, and that the lock is taken only when the cache needs to be populated.
For Reference: Maven Dependency
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>23.2-jre</version>
</dependency>
Edit: By locking I'm referring to double-checked locking.

I've accepted @maaartinus's answer, but wanted to write a separate "answer" about the circumstances behind the question and the interesting rabbit hole it led me down.
tl;dr - Use Java's Enum.valueOf, which is thread safe and, unlike Guava's Enums.getIfPresent, does not synchronize. Also, in the majority of cases it probably doesn't matter.
Long story:
I'm working on a codebase that uses lightweight Java threads (Quasar Fibers). In order to harness the power of Fibers, the code they run should be primarily async and non-blocking, because Fibers are multiplexed onto Java/OS threads. It becomes very important that individual Fibers do not "block" the underlying thread. If the underlying thread is blocked, it blocks all Fibers running on it and performance degrades considerably. Guava's Enums.getIfPresent is one of those blockers, and I'm certain it can be avoided.
Initially, I started using Guava's Enums.getIfPresent because it does not throw on invalid enum values: it returns an absent Optional, which is easy to turn into null. Java's Enum.valueOf, by contrast, throws IllegalArgumentException (which to my taste is less preferable than a null value).
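A minimal sketch of the null-returning wrapper around Enum.valueOf, which corresponds to the "JAVA" strategy in the benchmark. The names SafeEnums and valueOfOrNull are made up for illustration; the explicit null check is there because Enum.valueOf throws NullPointerException for a null name.

```java
final class SafeEnums {
    private SafeEnums() {}

    /** Returns the constant with the given name, or null if there is none. */
    static <T extends Enum<T>> T valueOfOrNull(Class<T> enumClass, String name) {
        if (name == null) {
            return null; // Enum.valueOf would throw NullPointerException here
        }
        try {
            return Enum.valueOf(enumClass, name);
        } catch (IllegalArgumentException e) {
            return null; // unknown name: map the exception to null
        }
    }
}
```

This takes no lock of its own, but as the benchmark numbers suggest, the exception path is comparatively slow when invalid names are common.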
Here is a crude benchmark comparing various methods of converting to enums:
Java's Enum.valueOf with catching IllegalArgumentException to return null
Guava's Enums.ifPresent
Apache Commons Lang EnumUtils.getEnum
Apache Commons Lang 3 EnumUtils.getEnum
My Own Custom Immutable Map Lookup
Notes:
Apache Commons Lang 3 uses Java's Enum.valueOf under the hood, so the two are effectively identical
Earlier versions of Apache Commons Lang use a WeakHashMap solution very similar to Guava's but without synchronization. They favor cheap reads and more expensive writes (my knee-jerk reaction says that's how Guava should have done it)
Java's decision to throw IllegalArgumentException is likely to have a small cost associated with it when dealing with invalid enum values. Throwing/catching exceptions isn't free.
Guava is the only method here that uses synchronization
Benchmark Setup:
uses an ExecutorService with a fixed thread pool of 10 threads
submits 100K Runnable tasks to convert enums
each Runnable task converts 100 enums
each method of converting enums will convert 10 million strings (100K x 100)
Benchmark Results from a run:
Convert valid enum string value:
JAVA -> 222 ms
GUAVA -> 964 ms
APACHE_COMMONS_LANG -> 138 ms
APACHE_COMMONS_LANG3 -> 149 ms
MY_OWN_CUSTOM_LOOKUP -> 160 ms
Try to convert INVALID enum string value:
JAVA -> 6009 ms
GUAVA -> 734 ms
APACHE_COMMONS_LANG -> 65 ms
APACHE_COMMONS_LANG3 -> 5558 ms
MY_OWN_CUSTOM_LOOKUP -> 92 ms
These numbers should be taken with a heavy grain of salt and will change depending on other factors. But they were good enough for me to conclude to go with Java's solution for the codebase using Fibers.
Benchmark Code:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import com.google.common.base.Enums;
import com.google.common.collect.ImmutableMap;
import com.google.common.collect.ImmutableMap.Builder;
public class BenchmarkEnumValueOf {
enum Strategy {
JAVA,
GUAVA,
APACHE_COMMONS_LANG,
APACHE_COMMONS_LANG3,
MY_OWN_CUSTOM_LOOKUP;
private final static ImmutableMap<String, Strategy> lookup;
static {
Builder<String, Strategy> immutableMapBuilder = ImmutableMap.builder();
for (Strategy strategy : Strategy.values()) {
immutableMapBuilder.put(strategy.name(), strategy);
}
lookup = immutableMapBuilder.build();
}
static Strategy toEnum(String name) {
return name != null ? lookup.get(name) : null;
}
}
public static void main(String[] args) {
final int BENCHMARKS_TO_RUN = 1;
System.out.println("Convert valid enum string value:");
for (int i = 0; i < BENCHMARKS_TO_RUN; i++) {
for (Strategy strategy : Strategy.values()) {
runBenchmark(strategy, "JAVA", 100_000);
}
}
System.out.println("\nTry to convert INVALID enum string value:");
for (int i = 0; i < BENCHMARKS_TO_RUN; i++) {
for (Strategy strategy : Strategy.values()) {
runBenchmark(strategy, "INVALID_ENUM", 100_000);
}
}
}
static void runBenchmark(Strategy strategy, String enumStringValue, int iterations) {
ExecutorService executorService = Executors.newFixedThreadPool(10);
long timeStart = System.currentTimeMillis();
for (int i = 0; i < iterations; i++) {
executorService.submit(new EnumValueOfRunnable(strategy, enumStringValue));
}
executorService.shutdown();
try {
executorService.awaitTermination(1000, TimeUnit.SECONDS);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
long timeDuration = System.currentTimeMillis() - timeStart;
System.out.println("\t" + strategy.name() + " -> " + timeDuration + " ms");
}
static class EnumValueOfRunnable implements Runnable {
Strategy strategy;
String enumStringValue;
EnumValueOfRunnable(Strategy strategy, String enumStringValue) {
this.strategy = strategy;
this.enumStringValue = enumStringValue;
}
@Override
public void run() {
for (int i = 0; i < 100; i++) {
switch (strategy) {
case JAVA:
try {
Enum.valueOf(Strategy.class, enumStringValue);
} catch (IllegalArgumentException e) {}
break;
case GUAVA:
Enums.getIfPresent(Strategy.class, enumStringValue);
break;
case APACHE_COMMONS_LANG:
org.apache.commons.lang.enums.EnumUtils.getEnum(Strategy.class, enumStringValue);
break;
case APACHE_COMMONS_LANG3:
org.apache.commons.lang3.EnumUtils.getEnum(Strategy.class, enumStringValue);
break;
case MY_OWN_CUSTOM_LOOKUP:
Strategy.toEnum(enumStringValue);
break;
}
}
}
}
}

I guess the reason is simply that enumConstantCache is a WeakHashMap, which is not thread-safe.
Two threads writing to the cache at the same time could end up in an endless loop or the like (at least that happened with HashMap when I tried it years ago).
I guess you could use DCL, but it may not be worth it (as stated in a comment).
Further on if synchronization is indeed necessary, why do it so inefficiently? One would hope if enum is present in cache, it can be retrieved without locking. Only lock if cache needs to be populated.
This may get too tricky. For visibility using volatile, you need a volatile read paired with a volatile write. You could get the volatile read easily by declaring enumConstantCache to be volatile instead of final. The volatile write is trickier. Something like
enumConstantCache = enumConstantCache;
may work, but I'm not sure about that.
10 threads, each one having to convert String values to Enums and then perform some task
The Map access is usually way faster than anything you do with the obtained value, so I guess you'd need many more threads for this to become a problem.
Unlike HashMap, the WeakHashMap needs to perform some cleanup (called expungeStaleEntries). This cleanup gets performed even in get (via getTable). So get is a modifying operation and you really don't want to execute it concurrently.
Note that reading a WeakHashMap without synchronization means performing mutation without locking. It's plain wrong, and not only in theory.
You'd need your own version of WeakHashMap that performs no mutations in get (which is simple) and guarantees some sane behavior when it is written to by a different thread during a read (which may or may not be possible).
I guess something like SoftReference<ImmutableMap<String, Enum<?>>> with some re-loading logic could work well.
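For comparison, here is a sketch of a lookup whose reads never touch a WeakHashMap. It is a different technique from the SoftReference idea: it uses the JDK's java.lang.ClassValue to attach an immutable name-to-constant map to each enum class. The class name EnumLookup is hypothetical, it returns java.util.Optional rather than Guava's Optional, and it is not how Guava implements this. Whether ClassValue is entirely lock-free internally is not something this sketch relies on or demonstrates.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class EnumLookup {
    private EnumLookup() {}

    // One immutable map per enum class, built lazily the first time the class is looked up.
    private static final ClassValue<Map<String, Enum<?>>> CONSTANTS =
            new ClassValue<Map<String, Enum<?>>>() {
                @Override
                protected Map<String, Enum<?>> computeValue(Class<?> type) {
                    Map<String, Enum<?>> byName = new HashMap<>();
                    for (Object constant : type.getEnumConstants()) {
                        Enum<?> e = (Enum<?>) constant;
                        byName.put(e.name(), e);
                    }
                    return Collections.unmodifiableMap(byName);
                }
            };

    /** Returns the constant with the given name, or an empty Optional if there is none. */
    static <T extends Enum<T>> Optional<T> getIfPresent(Class<T> enumClass, String name) {
        Enum<?> found = CONSTANTS.get(enumClass).get(name);
        return Optional.ofNullable(enumClass.cast(found));
    }
}
```

The map is never modified after it is built, so a lookup is a plain read of an immutable structure and never throws for an unknown name.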

Related

Thread safety writing in a map

I have a block of code provided below:
Map<String, BigDecimal> salesMap = new HashMap<>();
orderItems.parallelStream().forEach(orderItem -> {
synchronized (this) {
int itemId = orderItem.getItemId();
Item item = settingsClient.getItemByItemId(itemId);
String revenueCenterName = itemIdAndRevenueCenterNameMap.get(itemId);
updateSalesMap(salesMap, "Gross Sales: " + revenueCenterName, orderItem.getNetSales().toPlainString());
}
});
private void updateSalesMap(Map<String,BigDecimal> salesMap, String key, String amount) {
BigDecimal bd = getSalesAmount(salesMap, key);
int scale = 2;
if (StringUtils.isBlank(amount)) {
amount = "0.00";
}
BigDecimal addMe = BigDecimal.valueOf(Double.valueOf(amount)).setScale(scale, RoundingMode.HALF_UP);
salesMap.put(key, bd.add(addMe));
}
The code works fine, but if I don't use the synchronized block, I end up with varying data in the map. As far as I know, streams are thread safe, so I'm curious about what's happening. I tried to use ConcurrentHashMap but it seems nothing changed.
My idea is that the map data is not written to RAM and reads/writes are done in the thread cache, and hence we end up with varying data.
Is that correct? If so, I will use the volatile keyword rather than a synchronized block.
Note: I just found that I can't declare a variable volatile inside a method.

As far I know, the streams are thread safe, so I get curious about whats happening.
They are, as long as you only operate on the stream itself. The problem is that you manipulate another variable at the same time (the map in this case). The idea of streams is that the operations on each element are totally independent; see the ideas of functional programming.
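One way to stay within that model is to let the stream build the map itself instead of mutating an outside one. The sketch below uses Collectors.toMap with a merge function; the Sale class is a hypothetical stand-in for the question's order items, and the key and amount are assumed to be already computed.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class SalesByCollector {
    private SalesByCollector() {}

    /** Hypothetical stand-in for an order item: a map key and a net-sales amount. */
    static final class Sale {
        final String key;
        final BigDecimal amount;

        Sale(String key, BigDecimal amount) {
            this.key = key;
            this.amount = amount;
        }
    }

    /** Sums the amounts per key without touching any shared mutable state. */
    static Map<String, BigDecimal> totals(List<Sale> sales) {
        return sales.parallelStream()
                .collect(Collectors.toMap(sale -> sale.key, sale -> sale.amount, BigDecimal::add));
    }
}
```

Each worker thread fills its own partial map and the collector merges them with BigDecimal::add, so no synchronized block is needed.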
I tried to use ConcurrentHashMap but it seems nothing changed.
The issue comes from your approach. The general idea is that atomic operations on ConcurrentHashMap are thread safe. However, if you perform two thread safe operations together, it won't be atomic and thread safe. You need to synchronize it yourself or come up with some other solution.
In updateSalesMap() method you first get value from the map, do some calculations and then update the value. This sequence of operations isn't atomic - performing them on ConcurrentHashMap won't change much.
One possible way to make this update atomic would be to use ConcurrentHashMap.compute() (see its Javadocs).
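A sketch of what the compute() variant could look like. The class name AtomicSalesMap is invented for this example; it parses the amount with new BigDecimal(String) instead of going through Double as the question does, and it replaces StringUtils.isBlank with a plain check so the example has no external dependency.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class AtomicSalesMap {
    private final ConcurrentMap<String, BigDecimal> salesMap = new ConcurrentHashMap<>();

    /** Adds the amount to the running total for the key as one atomic step. */
    void updateSalesMap(String key, String amount) {
        String safeAmount = (amount == null || amount.trim().isEmpty()) ? "0.00" : amount;
        BigDecimal addMe = new BigDecimal(safeAmount).setScale(2, RoundingMode.HALF_UP);
        // The read of the current value and the write of the new one happen inside compute().
        salesMap.compute(key, (k, current) -> current == null ? addMe : current.add(addMe));
    }

    BigDecimal get(String key) {
        return salesMap.get(key);
    }
}
```

With this, the parallel forEach can call updateSalesMap without a synchronized block, because the read-modify-write for a key is no longer split across two map calls.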

You are doing a read using getSalesAmount(salesMap, key) and a write using salesMap.put(key, bd.add(addMe)) in separate statements. Splitting the update this way makes it non-atomic, and that does not change whatever kind of Map you use. The synchronized block will solve this, of course.
Alternatively, you can use ConcurrentHashMap's compute(K key, BiFunction<? super K, ? super V, ? extends V> remappingFunction) for the kind of atomicity you are looking for.

I made the updateSalesMap method synchronized, and that works for me:
protected synchronized void updateSalesMap(Map<String, BigDecimal> salesMap, String s, String amount) {
BigDecimal bd = updateSalesAmount(salesMap, s);
int scale = 2;
if (StringUtils.isBlank(amount)) {
amount = "0.00";
}
BigDecimal addMe = BigDecimal.valueOf(Double.valueOf(amount)).setScale(scale, RoundingMode.HALF_UP);
salesMap.put(s, bd.add(addMe));
}

Confused about java concurrency results

So I am studying up on java concurrency by trying to create bad concurrent examples, watch them fail and then fix them.
But the code never seems to be breaking... What am I missing here?
I have a "shared object", being my HotelWithMaximum instance. As far as I can tell, this class is not thread safe:
package playground.concurrent;
import java.util.ArrayList;
import java.util.List;
public class HotelWithMaximum {
private static final int MAXIMUM = 20;
private List<String> visitors = new ArrayList<String>();
public void register(IsVisitor visitor) {
System.out.println("Registering : " + visitor.getId());
System.out.println("Amount of visitors atm: " + visitors.size());
if(visitors.size() < MAXIMUM) {
//At some point, I do expect a thread to be interfering here where the condition is actually evaluated to
//true, but some other thread interfered, adds another visitor, causing the previous thread to go over the limit
System.out.println("REGISTERING ---------------------------------------------------------------------");
//The interference might also happen here i guess...
visitors.add(visitor.getId());
}
else{
System.out.println("We cant register anymore, we have reached our limit! " + visitors.size());
}
}
public int getAmountOfRegisteredVisitors() {
return visitors.size();
}
public void printVisitors() {
for(String visitor: visitors) {
System.out.println(visitors.indexOf(visitor) + " - " + visitor);
}
}
}
The visitors are 'Runnables' (they implement my interface IsVisitor which extends from Runnable), and they are implemented like this:
package playground.concurrent.runnables;
import playground.concurrent.HotelWithMaximum;
import playground.concurrent.IsVisitor;
public class MaxHotelVisitor implements IsVisitor{
private final String id;
private final HotelWithMaximum hotel;
public MaxHotelVisitor(String id, HotelWithMaximum hotel) {
this.hotel = hotel;
this.id = id;
}
public void run() {
System.out.println(String.format("My name is %s and I am trying to register...", id));
hotel.register(this);
}
public String getId() {
return this.id;
}
}
Then, to make all of this run in an example, I have the following code in a different class:
public static void executeMaxHotelExample() {
ThreadPoolExecutor executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(6);
HotelWithMaximum hotel = new HotelWithMaximum();
for(int i = 0; i<100; i++) {
executor.execute(new MaxHotelVisitor("MaxHotelVisitor-" + i, hotel));
}
executor.shutdown();
try{
boolean finished = executor.awaitTermination(30, TimeUnit.SECONDS);
if(finished) {
System.out.println("FINISHED WITH THE MAX HOTEL VISITORS EXAMPLE");
hotel.printVisitors();
}
}
catch(InterruptedException ie) {
System.out.println("Something interrupted me....");
}
}
public static void main(String[] args) {
executeMaxHotelExample();
}
Now, what am I missing? Why does this never seem to fail? The hotel class is not thread safe, right? And the only thing to make it 'enough' thread safe for this example (since no other code is messing with the thread unsafe List in the hotel class ), I should just make the register method "synchronized", right?
The result of the "printVisitors()" method in the main method, always looks like this:
FINISHED WITH THE MAX HOTEL VISITORS EXAMPLE
0 - MaxHotelVisitor-0
1 - MaxHotelVisitor-6
2 - MaxHotelVisitor-7
3 - MaxHotelVisitor-8
4 - MaxHotelVisitor-9
5 - MaxHotelVisitor-10
6 - MaxHotelVisitor-11
7 - MaxHotelVisitor-12
8 - MaxHotelVisitor-13
9 - MaxHotelVisitor-14
10 - MaxHotelVisitor-15
11 - MaxHotelVisitor-16
12 - MaxHotelVisitor-17
13 - MaxHotelVisitor-18
14 - MaxHotelVisitor-19
15 - MaxHotelVisitor-20
16 - MaxHotelVisitor-21
17 - MaxHotelVisitor-22
18 - MaxHotelVisitor-23
19 - MaxHotelVisitor-24
There are never more than 20 visitors in the list... I find that quite weird...

ThreadPoolExecutor is from the java.util.concurrent package
The Java Concurrency Utilities framework in the java.util.concurrent package is a library that contains thread-safe types that are used to handle concurrency in Java applications
So ThreadPoolExecutor is taking care of the synchronization
take note: ThreadPoolExecutor uses BlockingQueue to manage its job queue
java.util.concurrent.BlockingQueue is an interface that request all implementations to be thread-safe.
From my understanding one of the main goals of java.util.concurrent was that you can to a large extent operate without the need to use java's low-level concurrency primitives synchronized, volatile, wait(), notify(), and notifyAll() which are difficult to use.
Also note that ThreadPoolExecutor implements ExecutorService which does not guarantee all implementations are thread-safe but according to the documentation
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorService.html
Actions in a thread prior to the submission of a Runnable or Callable task to an ExecutorService happen-before any actions taken by that task, which in turn happen-before the result is retrieved via Future.get().
Explanation of happen-before:
Java™ Language Specification defines the happens-before relation on memory operations such as reads and writes of shared variables. The results of a write by one thread are guaranteed to be visible to a read by another thread only if the write operation happens-before the read operation.
In other words - generally not thread-safe. BUT
The methods of all classes in java.util.concurrent and its subpackages extend these guarantees to higher-level synchronization.

the code never seems to be breaking... What am I missing here?
The Java Language Specification gives implementors a lot of leeway to make the most efficient use of any given multi-processor architecture.
If you obey the rules for writing "safe" multi-threaded code, then that's supposed to guarantee that a correctly implemented JVM will run your program in the way that you expect. But if you break the rules, that does not guarantee that your program will misbehave.
Finding concurrency bugs by testing is a hard problem. A non "thread-safe" program might work 100% of the time on one platform (i.e., architecture/OS/JVM combination), it might always fail on some other platform, and its performance in some third platform might depend on what other processes are running, on the time of day or, on other variables that you can only guess at.

You are right.
You can reproduce the concurrency issues when you use more threads at the same time, say Executors.newFixedThreadPool(100) instead of 6. Then more threads will try to register at the same time and the probability of a collision is higher. Because the race condition/overflow can only happen once per run, you will have to run your main several times to see more than 20 visitors.
Furthermore, you need to add a Thread.yield() at both places where you expect the "interference", to make it more likely to happen. If the execution is very short/fast, there will likely be no task switch and the execution will appear atomic (but this is not guaranteed).
You might also write the code using ThreadWeaver which does byte code manipulation (adding yields) to make such issues more likely.
With both changes I get 30 and more visitors in the hotel from time to time. I have 2x2 CPUs.
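For the fix the question proposes, here is a sketch of a hotel whose register method is synchronized, so the size check and the add form one critical section. Visitor ids are plain strings instead of the question's IsVisitor, and SynchronizedHotel and runExample are names invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class SynchronizedHotel {
    private static final int MAXIMUM = 20;
    private final List<String> visitors = new ArrayList<>();

    /** Check and add happen under the same lock, so no thread can slip in between. */
    synchronized boolean register(String visitorId) {
        if (visitors.size() < MAXIMUM) {
            visitors.add(visitorId);
            return true;
        }
        return false;
    }

    synchronized int getAmountOfRegisteredVisitors() {
        return visitors.size();
    }

    /** Registers visitorCount visitors from a pool of the given size and returns the final count. */
    static int runExample(int threads, int visitorCount) {
        SynchronizedHotel hotel = new SynchronizedHotel();
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < visitorCount; i++) {
            String id = "MaxHotelVisitor-" + i;
            executor.execute(() -> hotel.register(id));
        }
        executor.shutdown();
        try {
            executor.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for visitors", e);
        }
        return hotel.getAmountOfRegisteredVisitors();
    }
}
```

The getter is synchronized as well so that a reader sees the list state published by the writers.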

Massive tasks alternative pattern for Runnable or Callable

For massively parallel computing I tend to use executors and callables. When I have thousands of objects to be computed, I don't feel good about instantiating thousands of Runnables, one for each object.
So I have two approaches to solve this:
I. Split the workload among a small number x of workers, giving each a share of the y objects (splitting the object list into x partitions of size y/x each)
public static <V> List<List<V>> partitions(List<V> list, int chunks) {
final ArrayList<List<V>> lists = new ArrayList<List<V>>();
final int size = Math.max(1, list.size() / chunks + 1);
final int listSize = list.size();
for (int i = 0; i <= chunks; i++) {
final List<V> vs = list.subList(Math.min(listSize, i * size), Math.min(listSize, i * size + size));
if(vs.size() == 0) break;
lists.add(vs);
}
return lists;
}
II. Creating x-workers which fetch objects from a queue.
Questions:
Is creating thousand of Runnables really expensive and to be avoided?
Is there a generic pattern/recommendation how to do it by solution II?
Are you aware of a different approach?

Creating thousands of Runnable (objects implementing Runnable) is not more expensive than creating a normal object.
Creating and running thousands of Threads can be very heavy, but you can use Executors with a pool of threads to solve this problem.

As for the different approach, you might be interested in java 8's parallel streams.

Combining various answers here :
Is creating thousand of Runnables really expensive and to be avoided?
No, it's not in and of itself. It's how you will make them execute that may prove costly (spawning a few thousand threads certainly has its cost).
So you would not want to do this :
List<Computation> computations = ...
List<Thread> threads = new ArrayList<>();
for (Computation computation : computations) {
Thread thread = new Thread(new Computation(computation));
threads.add(thread);
thread.start();
}
// If you need to wait for completion:
for (Thread t : threads) {
t.join();
}
Because it would 1) be unnecessarily costly in terms of OS resources (native threads, each with its own stack), 2) spam the OS scheduler with a vastly concurrent workload, most certainly leading to plenty of context switches and associated cache invalidations at the CPU level, and 3) be a nightmare to catch and deal with exceptions (your threads should probably define an uncaught exception handler, and you'd have to deal with it manually).
You'd probably prefer an approach where a finite Thread pool (of a few threads, "a few" being closely related to your number of CPU cores) handles many many Callables.
List<Computation> computations = ...
ExecutorService pool = Executors.newFixedThreadPool(someNumber);
List<Future<Result>> results = new ArrayList<>();
for (Computation computation : computations) {
    results.add(pool.submit(new ComputationCallable(computation)));
}
for (Future<Result> result : results) {
    doSomething(result.get()); // blocks until this computation has finished
}
The fact that you reuse a limited number of threads should yield a really nice improvement.
Is there a generic pattern/recommendation how to do it by solution II?
There are. First, your partition code (getting from a List to a List<List>) can be found inside collection tools such as Guava, with more generic and fail-proofed implementations.
But more than this, two patterns come to mind for what you are achieving :
Use the Fork/Join Pool with Fork/Join tasks (that is, spawn a task with your whole list of items, and each task will fork sub tasks with half of that list, up to the point where each task manages a small enough list of items). It's divide and conquer. See: http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ForkJoinTask.html
If your computation were to be "add integers from a list", it could look like (there might be a boundary bug in there, I did not really check) :
public static class Adder extends RecursiveTask<Integer> {
protected List<Integer> globalList;
protected int start;
protected int stop;
public Adder(List<Integer> globalList, int start, int stop) {
super();
this.globalList = globalList;
this.start = start;
this.stop = stop;
System.out.println("Creating for " + start + " => " + stop);
}
@Override
protected Integer compute() {
if (stop - start > 1000) {
// Too many arguments, we split the list
Adder subTask1 = new Adder(globalList, start, start + (stop-start)/2);
Adder subTask2 = new Adder(globalList, start + (stop-start)/2, stop);
subTask2.fork();
return subTask1.compute() + subTask2.join();
} else {
// Manageable size of arguments, we deal in place
int result = 0;
for(int i = start; i < stop; i++) {
result += globalList.get(i); // add the list element, not the index
}
return result;
}
}
}
public void doWork() throws Exception {
List<Integer> computation = new ArrayList<>();
for(int i = 0; i < 10000; i++) {
computation.add(i);
}
ForkJoinPool pool = new ForkJoinPool();
RecursiveTask<Integer> masterTask = new Adder(computation, 0, computation.size());
Future<Integer> future = pool.submit(masterTask);
System.out.println(future.get());
}
Use Java 8 parallel streams to launch multiple parallel computations easily (under the hood, parallel streams run on the common Fork/Join pool by default).
Others have shown how this might look like.
Are you aware of a different approach?
For a different take at concurrent programming (without explicit task / thread handling), have a look at the actor pattern. https://en.wikipedia.org/wiki/Actor_model
Akka comes to mind as a popular implementation of this pattern...

@Aaron is right, you should take a look into Java 8's parallel streams:
<V> void processInParallel(List<V> list) {
    list.parallelStream().forEach(item -> {
        // do something
    });
}
If you need to specify chunks, you could use a ForkJoinPool as described here:
<V> void processInParallel(List<V> list, int chunks) {
    ForkJoinPool forkJoinPool = new ForkJoinPool(chunks);
    forkJoinPool.submit(() -> {
        list.parallelStream().forEach(item -> {
            // do something with each item
        });
    });
}
You could also have a functional interface as an argument:
<V> void processInParallel(List<V> list, int chunks, Consumer<V> processor) {
    ForkJoinPool forkJoinPool = new ForkJoinPool(chunks);
    forkJoinPool.submit(() -> {
        list.parallelStream().forEach(item -> processor.accept(item));
    });
}
Or in shorthand notation:
<V> void processInParallel(List<V> list, int chunks, Consumer<V> processor) {
    new ForkJoinPool(chunks).submit(() -> list.parallelStream().forEach(processor::accept));
}
And then you would use it like:
processInParallel(myList, 2, item -> {
// do something with each item
});
Depending on your needs, the ForkJoinPool#submit() returns an instance of ForkJoinTask, which is a Future and you may use it to check for the status or wait for the end of your task.
You'd most probably want the ForkJoinPool instantiated only once (not instantiate it on every method call) and then reuse it to prevent CPU choking if the method is called multiple times.
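A sketch of that advice: one shared pool, plus a join() so the caller actually waits for the work (the snippets above submit the task and return immediately). The class name and the parallelism of 4 are arbitrary choices for illustration. Running a parallel stream from inside a ForkJoinPool task so that it uses that pool is widely relied on, but as far as I know it is implementation behavior rather than a documented guarantee.

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.function.Consumer;

final class ParallelProcessing {
    private ParallelProcessing() {}

    // Created once and reused for every call; 4 is an arbitrary value for this sketch.
    private static final ForkJoinPool POOL = new ForkJoinPool(4);

    /** Processes every item in parallel and returns only after all items are done. */
    static <V> void processInParallel(List<V> list, Consumer<? super V> processor) {
        POOL.submit(() -> list.parallelStream().forEach(processor)).join();
    }
}
```

If the processor throws, join() rethrows the failure in the calling thread, which is usually easier to handle than a silently failed background task.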

Is creating thousand of Runnables really expensive and to be avoided?
Not at all, the runnable/callable interfaces have only one method to implement each, and the amount of "extra" code in each task depends on the code you are running. But certainly no fault of the Runnable/Callable interfaces.
Is there a generic pattern/recommendation how to do it by solution II?
Pattern 2 is more favorable than pattern 1. This is because pattern 1 assumes that each worker will finish at the exact same time. If some workers finish before other workers, they could just be sitting idle since they only are able to work on the y/x-size queues you assigned to each of them. In pattern 2 however, you will never have idle worker threads (unless the end of the work queue is reached and numWorkItems < numWorkers).
An easy way to use the preferred pattern, pattern 2, is to use the ExecutorService invokeAll(Collection<? extends Callable<T>> list) method.
Here is an example usage:
List<Callable<?>> workList = // a single list of all of your work
ExecutorService es = Executors.newCachedThreadPool();
es.invokeAll(workList);
Fairly readable and straightforward usage, and the ExecutorService implementation will automatically use solution 2 for you, so you know that each worker thread has their use time maximized.
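For comparison, this is roughly what solution II looks like when written out by hand: a fixed number of workers that keep pulling items from one shared queue until it is empty. QueueWorkers and processAll are hypothetical names, and the one-hour timeout is an arbitrary upper bound.

```java
import java.util.Collection;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class QueueWorkers {
    private QueueWorkers() {}

    /** Processes all items with the given number of workers; items must not contain null. */
    static <T> void processAll(Collection<T> items, int workers, Consumer<? super T> processor) {
        Queue<T> queue = new ConcurrentLinkedQueue<>(items);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {
                T item;
                // poll() returns null once the queue is drained, which ends this worker.
                while ((item = queue.poll()) != null) {
                    processor.accept(item);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
```

Only as many Runnables as workers are created, and a worker that finishes its item early simply takes the next one, so none sits idle while work remains.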
Are you aware of a different approach?
Solutions 1 and 2 are two common approaches for generic work. There are many different implementations available for you to choose from (such as java.util.concurrent, Java 8 parallel streams, or Fork/Join pools), but the concept behind each implementation is generally the same. The only exception is if you have specific tasks in mind with non-standard running behavior.

Java Multithreading large arrays access

My main class generates multiple threads based on some rules (20-40 threads that live for a long time).
Each thread creates several short-lived threads; I am using an executor for these.
I need to work on multi-dimensional arrays in the short-lived threads. I wrote it as in the code below, but I think it is not efficient, since I pass the array so many times to so many threads/tasks. I tried to access it directly from the threads (by declaring it as public, with no success). I will be happy to get comments and advice on how to improve it.
As a next step I am also looking at returning a one-dimensional array as a result (it might be better just to update it in the AssetFactory class), and I am not sure how to.
Please see the code below.
thanks
Paz
import java.util.concurrent.*;
import java.util.logging.Level;
public class AssetFactory implements Runnable {
    private volatile boolean stop = false;
    private volatile String feed;
    private double[][][] PeriodRates = new double[10][500][4];
    private String TimeStr, Bid, periodicalRateIndicator;
    private final BlockingQueue<String> workQueue;
    ExecutorService IndicatorPool = Executors.newCachedThreadPool();
    public AssetFactory(BlockingQueue<String> workQueue) {
        this.workQueue = workQueue;
    }
    @Override
    public void run() {
        while (!stop) {
            try {
                feed = workQueue.take();
                periodicalRateIndicator = CheckPeriod(TimeStr, Bid);
                if (periodicalRateIndicator.length() > 0) {
                    IndicatorPool.submit(new CalcMvg(periodicalRateIndicator, PeriodRates));
                }
                if ("Stop".equals(feed)) {
                    stop = true;
                }
            } // try
            catch (InterruptedException ex) {
                logger.log(Level.SEVERE, null, ex);
                stop = true;
            }
        } // while
    } // run
    // CheckPeriod and logger are not shown in the question
} // AssetFactory
Here is the CalcMvg class:
public class CalcMvg implements Runnable {
    private double[][][] PeriodRates = new double[10][500][4];
    public CalcMvg(String Periods, double[][][] PeriodRates) {
        System.out.println(Periods);
        this.PeriodRates = PeriodRates;
    }
    @Override
    public void run() {
        try {
            // do some work with the data of the PeriodRates array, e.g. print it (no changes to the array)
            System.out.println(PeriodRates[1][1][1]);
        }
        catch (Exception ex) {
            System.out.println(Thread.currentThread().getName() + ex.getMessage());
            logger.log(Level.SEVERE, null, ex);
        }
    } // run
} // mvg class

There are several things going on here which seem to be wrong, but it is hard to give a good answer with the limited amount of code presented.
First the actual coding issues:
There is no need to define a variable as volatile if only one thread ever accesses it (stop, feed)
You should declare variables that are only used in a local context (run method) locally in that function and not globally for the whole instance (almost all variables). This allows the JIT to do various optimizations.
The InterruptedException should terminate the thread, because it is thrown as a request to terminate the thread's work.
In your code example the workQueue doesn't seem to do anything but to put the threads to sleep or stop them. Why doesn't it just immediately feed the actual worker-threads with the required workload?
And then the code structure issues:
You use threads to feed threads with work. This is inefficient, as you only have a limited amount of cores that can actually do the work. As the execution order of threads is undefined, it is likely that the IndicatorPool is either mostly idle or overfilling with tasks that have not yet been done.
If you have a finite set of work to be done, the ExecutorCompletionService might be helpful for your task.
I think you will gain the best speed increase by redesigning the code structure. Imagine the following (assuming that I understood your question correctly):
There is a blocking queue of tasks that is fed by some data source (e.g. file-stream, network).
A set of worker-threads equal to the amount of cores is waiting on that data source for input, which is then processed and put into a completion queue.
A specific data set is the "terminator" for your work (e.g. "null"). If a thread encounters this terminator, it finishes its loop and shuts down.
Now the following holds true for this construct:
Case 1: The data source is the bottleneck. It cannot be sped up by using multiple threads, as your hard disk/network won't work faster if you ask more often.
Case 2: The processing power of your machine is the bottleneck, as you cannot process more data than the worker threads/cores on your machine can handle.
In both cases the conclusion is that the worker threads need to be the ones that ask for new data as soon as they are ready to process it, since either they need to be put on hold or they need to throttle the incoming data. This will ensure maximum throughput.
When all worker threads have terminated, the work is done. This can be tracked, e.g., with a CyclicBarrier or Phaser.
Pseudo-code for the worker threads:
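public void run() {
    DataType e;
    try {
        // Pull work until the data source signals the end with null.
        while ((e = dataSource.next()) != null) {
            process(e);
        }
        // Signal that this worker is done and wait for the others.
        barrier.await();
    } catch (InterruptedException ex) {
        // Interruption is a request to stop: restore the flag and exit.
        Thread.currentThread().interrupt();
    } catch (BrokenBarrierException ex) {
        // Another worker was interrupted or failed while waiting; just exit.
    }
}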
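To make the construct above a bit more concrete, here is a self-contained sketch of it. Everything named here (WorkerPoolSketch, Item, the summing "process" step, the queue capacity of 64) is my own assumption for illustration. Since a BlockingQueue rejects null elements, a dedicated terminator object replaces the null terminator, and a Phaser tracks when all workers have finished.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Phaser;
import java.util.concurrent.atomic.AtomicLong;

public class WorkerPoolSketch {

    // Hypothetical work item; stands in for the real data set.
    static final class Item {
        final int value;
        Item(int value) { this.value = value; }
    }

    // The "terminator": compared by identity, one is queued per worker.
    private static final Item TERMINATOR = new Item(0);

    // Feeds the values 1..itemCount through workerCount workers and returns their sum.
    static long processAll(int itemCount, int workerCount) {
        // Bounded queue: a full queue blocks the producer, which throttles the input.
        BlockingQueue<Item> queue = new ArrayBlockingQueue<>(64);
        AtomicLong total = new AtomicLong();
        Phaser phaser = new Phaser(1); // the calling thread is the first party

        for (int w = 0; w < workerCount; w++) {
            phaser.register(); // register before start() to avoid a race
            Thread worker = new Thread(() -> {
                try {
                    Item item;
                    // Workers pull data as soon as they are ready for it.
                    while ((item = queue.take()) != TERMINATOR) {
                        total.addAndGet(item.value); // placeholder for process(e)
                    }
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt(); // treat as a request to stop
                } finally {
                    phaser.arriveAndDeregister(); // this worker is done
                }
            });
            worker.start();
        }

        try {
            for (int i = 1; i <= itemCount; i++) {
                queue.put(new Item(i));
            }
            for (int w = 0; w < workerCount; w++) {
                queue.put(TERMINATOR);
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while feeding the queue", ex);
        }

        phaser.arriveAndAwaitAdvance(); // returns once every worker has arrived
        return total.get();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("sum = " + processAll(100, cores));
    }
}
```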
I hope this is helpful in your case.

Passing the array as an argument to the constructor is a reasonable approach, although unless you intend to copy the array it isn't necessary to initialize PeriodRates with a large array. It seems wasteful to allocate a large block of memory and then reassign its only reference straight away in the constructor. I would initialize it like this:
private final double [][][] PeriodRates;
public CalcMvg(String Periods, double[][][] PeriodRates) {
System.out.println(Periods);
this.PeriodRates = PeriodRates;
}
The other option is to define CalcMvg as an inner class of AssetFactory and declare PeriodRates as final. This would allow instances of CalcMvg to access PeriodRates in the outer instance of AssetFactory.
Returning the result is more difficult since it involves publishing the result across threads. One way to do this is to use synchronized methods:
private double[] result = null;
private synchronized void setResult(double[] result) {
this.result = result;
}
public synchronized double[] getResult() {
if (result == null) {
throw new RuntimeException("Result has not been initialized for this instance: " + this);
}
return result;
}
There are more advanced multi-threading concepts available in the Java libraries, e.g. Future, that might be appropriate in this case.
Regarding your concerns about the number of threads, allowing a library class to manage the allocation of work to a thread pool might solve this concern. Something like an Executor might help with this.
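For what it's worth, here is a minimal sketch of the Future/Executor approach. MovingAverageTask is a made-up stand-in for CalcMvg (it just averages each row), and the pool size of 2 is arbitrary; the point is that Future.get() hands the result back to the calling thread safely, so no synchronized getter/setter pair is needed.

```java
import java.util.Arrays;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureResultSketch {

    // Hypothetical stand-in for CalcMvg: returns the mean of each row.
    static final class MovingAverageTask implements Callable<double[]> {
        private final double[][] rates;

        MovingAverageTask(double[][] rates) {
            this.rates = rates;
        }

        @Override
        public double[] call() {
            double[] means = new double[rates.length];
            for (int i = 0; i < rates.length; i++) {
                double sum = 0;
                for (double rate : rates[i]) {
                    sum += rate;
                }
                means[i] = sum / rates[i].length;
            }
            return means;
        }
    }

    // Runs the task on a pool thread and returns its result to the caller.
    static double[] compute(double[][] rates) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            Future<double[]> future = executor.submit(new MovingAverageTask(rates));
            return future.get(); // blocks until the task has produced its result
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the result", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("calculation failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        double[][] rates = {{1, 2, 3}, {4, 6}};
        System.out.println(Arrays.toString(compute(rates)));
    }
}
```

In a real application you would keep one long-lived executor and submit many tasks to it, rather than creating a pool per call as this sketch does.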

Java Reflection Performance

Does creating an object using reflection rather than calling the class constructor result in any significant performance differences?

Yes - absolutely. Looking up a class via reflection is an order of magnitude more expensive.
Quoting Java's documentation on reflection:
Because reflection involves types that are dynamically resolved, certain Java virtual machine optimizations can not be performed. Consequently, reflective operations have slower performance than their non-reflective counterparts, and should be avoided in sections of code which are called frequently in performance-sensitive applications.
Here's a simple test I hacked up in 5 minutes on my machine, running Sun JRE 6u10:
public class Main {
public static void main(String[] args) throws Exception
{
doRegular();
doReflection();
}
public static void doRegular() throws Exception
{
long start = System.currentTimeMillis();
for (int i=0; i<1000000; i++)
{
A a = new A();
a.doSomeThing();
}
System.out.println(System.currentTimeMillis() - start);
}
public static void doReflection() throws Exception
{
long start = System.currentTimeMillis();
for (int i=0; i<1000000; i++)
{
A a = (A) Class.forName("misc.A").newInstance();
a.doSomeThing();
}
System.out.println(System.currentTimeMillis() - start);
}
}
With these results:
35 // no reflection
465 // using reflection
Bear in mind the lookup and the instantiation are done together, and in some cases the lookup can be refactored away, but this is just a basic example.
Even if you just instantiate, you still get a performance hit:
30 // no reflection
47 // reflection using one lookup, only instantiating
Again, YMMV.

Yes, it's slower.
But remember the damn #1 rule--PREMATURE OPTIMIZATION IS THE ROOT OF ALL EVIL
(Well, may be tied with #1 for DRY)
I swear, if someone came up to me at work and asked me this I'd be very watchful over their code for the next few months.
You must never optimize until you are sure you need it, until then, just write good, readable code.
Oh, and I don't mean write stupid code either. Just be thinking about the cleanest way you can possibly do it--no copy and paste, etc. (Still be wary of stuff like inner loops and using the collection that best fits your need--Ignoring these isn't "unoptimized" programming, it's "bad" programming)
It freaks me out when I hear questions like this, but then I forget that everyone has to go through learning all the rules themselves before they really get it. You'll get it after you've spent a man-month debugging something someone "Optimized".
EDIT:
An interesting thing happened in this thread. Check the #1 answer, it's an example of how powerful the compiler is at optimizing things. The test is completely invalid because the non-reflective instantiation can be completely factored out.
Lesson? Don't EVER optimize until you've written a clean, neatly coded solution and proven it to be too slow.

You may find that A a = new A() is being optimised out by the JVM.
If you put the objects into an array, they don't perform so well. ;)
The following prints...
new A(), 141 ns
A.class.newInstance(), 266 ns
new A(), 103 ns
A.class.newInstance(), 261 ns
public class Run {
private static final int RUNS = 3000000;
public static class A {
}
public static void main(String[] args) throws Exception {
doRegular();
doReflection();
doRegular();
doReflection();
}
public static void doRegular() throws Exception {
A[] as = new A[RUNS];
long start = System.nanoTime();
for (int i = 0; i < RUNS; i++) {
as[i] = new A();
}
System.out.printf("new A(), %,d ns%n", (System.nanoTime() - start)/RUNS);
}
public static void doReflection() throws Exception {
A[] as = new A[RUNS];
long start = System.nanoTime();
for (int i = 0; i < RUNS; i++) {
as[i] = A.class.newInstance();
}
System.out.printf("A.class.newInstance(), %,d ns%n", (System.nanoTime() - start)/RUNS);
}
}
This suggests the difference is about 150 ns on my machine.

If there really is need for something faster than reflection, and it's not just a premature optimization, then bytecode generation with ASM or a higher level library is an option. Generating the bytecode the first time is slower than just using reflection, but once the bytecode has been generated, it is as fast as normal Java code and will be optimized by the JIT compiler.
Some examples of applications which use code generation:
Invoking methods on proxies generated by CGLIB is slightly faster than Java's dynamic proxies, because CGLIB generates bytecode for its proxies, but dynamic proxies use only reflection (I measured CGLIB to be about 10x faster in method calls, but creating the proxies was slower).
JSerial generates bytecode for reading/writing the fields of serialized objects, instead of using reflection. There are some benchmarks on JSerial's site.
I'm not 100% sure (and I don't feel like reading the source now), but I think Guice generates bytecode to do dependency injection. Correct me if I'm wrong.
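For readers who haven't used them, this is roughly what the JDK dynamic proxy baseline mentioned above looks like. Greeter, SimpleGreeter and the call counter are invented for the example. Each call on the proxy is routed through an InvocationHandler, which here forwards to the target with Method.invoke; that reflective hop is the part a bytecode-generating library can avoid.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

public class DynamicProxySketch {

    // Hypothetical interface to be proxied.
    public interface Greeter {
        String greet(String name);
    }

    // Hypothetical plain implementation that the proxy delegates to.
    static final class SimpleGreeter implements Greeter {
        @Override
        public String greet(String name) {
            return "Hello, " + name;
        }
    }

    // Wraps the target in a JDK dynamic proxy that counts calls.
    static Greeter counting(Greeter target, AtomicInteger calls) {
        InvocationHandler handler = (proxy, method, args) -> {
            calls.incrementAndGet();
            try {
                return method.invoke(target, args); // reflective dispatch
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow what the target itself threw
            }
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] {Greeter.class},
                handler);
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Greeter greeter = counting(new SimpleGreeter(), calls);
        System.out.println(greeter.greet("world"));
        System.out.println("calls: " + calls.get());
    }
}
```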

"Significant" is entirely dependent on context.
If you're using reflection to create a single handler object based on some configuration file, and then spending the rest of your time running database queries, then it's insignificant. If you're creating large numbers of objects via reflection in a tight loop, then yes, it's significant.
In general, design flexibility (where needed!) should drive your use of reflection, not performance. However, to determine whether performance is an issue, you need to profile rather than get arbitrary responses from a discussion forum.

There is some overhead with reflection, but it's a lot smaller on modern VMs than it used to be.
If you're using reflection to create every simple object in your program then something is wrong. Using it occasionally, when you have good reason, shouldn't be a problem at all.

Yes there is a performance hit when using Reflection but a possible workaround for optimization is caching the method:
Method md = null; // Call while looking up the method at each iteration.
millis = System.currentTimeMillis( );
for (idx = 0; idx < CALL_AMOUNT; idx++) {
md = ri.getClass( ).getMethod("getValue", null);
md.invoke(ri, null);
}
System.out.println("Calling method " + CALL_AMOUNT+ " times reflexively with lookup took " + (System.currentTimeMillis( ) - millis) + " millis");
// Call using a cache of the method.
md = ri.getClass( ).getMethod("getValue", null);
millis = System.currentTimeMillis( );
for (idx = 0; idx < CALL_AMOUNT; idx++) {
md.invoke(ri, null);
}
System.out.println("Calling method " + CALL_AMOUNT + " times reflexively with cache took " + (System.currentTimeMillis( ) - millis) + " millis");
will result in:
[java] Calling method 1000000 times reflexively with lookup took 5618 millis
[java] Calling method 1000000 times reflexively with cache took 270 millis

Interestingly enough, calling setAccessible(true), which skips the security checks, reduces the cost by about 20%.
Without setAccessible(true)
new A(), 70 ns
A.class.newInstance(), 214 ns
new A(), 84 ns
A.class.newInstance(), 229 ns
With setAccessible(true)
new A(), 69 ns
A.class.newInstance(), 159 ns
new A(), 85 ns
A.class.newInstance(), 171 ns
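The timings above don't show where setAccessible(true) was called. Since it is a method of Constructor (and Method/Field), not of Class, my assumption is that the constructor was looked up first, roughly like the sketch below. The class and method names are made up, and this only shows the call pattern; it is not a benchmark.

```java
import java.lang.reflect.Constructor;

public class SetAccessibleSketch {

    public static class A {
    }

    // Instantiates A reflectively, optionally skipping the per-call access checks.
    static A newViaConstructor(boolean skipAccessChecks) {
        try {
            Constructor<A> ctor = A.class.getDeclaredConstructor();
            if (skipAccessChecks) {
                ctor.setAccessible(true); // suppress Java language access checks
            }
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not instantiate A", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(newViaConstructor(false));
        System.out.println(newViaConstructor(true));
    }
}
```

In a measurement you would look up the Constructor and call setAccessible(true) once, outside the timed loop, and only time newInstance().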

Reflection is slow, though object allocation is not as hopeless as other aspects of reflection. Achieving equivalent performance with reflection-based instantiation requires you to write your code so the JIT can tell which class is being instantiated. If the identity of the class can't be determined, then the allocation code can't be inlined. Worse, escape analysis fails, and the object can't be stack-allocated. If you're lucky, the JVM's run-time profiling may come to the rescue if this code gets hot, and may determine dynamically which class predominates and may optimize for that one.
Be aware the microbenchmarks in this thread are deeply flawed, so take them with a grain of salt. The least flawed by far is Peter Lawrey's: it does warmup runs to get the methods jitted, and it (consciously) defeats escape analysis to ensure the allocations are actually occurring. Even that one has its problems, though: for example, the tremendous number of array stores can be expected to defeat caches and store buffers, so this will wind up being mostly a memory benchmark if your allocations are very fast. (Kudos to Peter on getting the conclusion right though: that the difference is "150ns" rather than "2.5x". I suspect he does this kind of thing for a living.)

Yes, it is significantly slower. We were running some code that did that, and while I don't have the metrics available at the moment, the end result was that we had to refactor that code to not use reflection. If you know what the class is, just call the constructor directly.

In doReflection(), the overhead is probably due to Class.forName("misc.A") (which requires a class lookup, potentially scanning the class path on the filesystem) rather than the newInstance() call on the class. I am wondering what the stats would look like if Class.forName("misc.A") were done only once, outside the for-loop; it doesn't really have to be done on every iteration of the loop.
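A sketch of what that restructuring could look like, with the lookup hoisted out of the loop. The nested class A here is a stand-in for misc.A, and I deliberately give no timings because they depend on the JVM and machine.

```java
import java.lang.reflect.Constructor;

public class HoistedLookupSketch {

    // Stand-in for misc.A from the benchmark above.
    public static class A {
        public int doSomeThing() {
            return 1;
        }
    }

    // Looks the class up once, then only instantiates inside the loop.
    static int instantiate(int runs) {
        try {
            Class<?> clazz = Class.forName(A.class.getName()); // one lookup
            Constructor<?> ctor = clazz.getDeclaredConstructor(); // one constructor lookup
            int checksum = 0;
            for (int i = 0; i < runs; i++) {
                A a = (A) ctor.newInstance();
                checksum += a.doSomeThing(); // use the object so the work is not trivially dead
            }
            return checksum;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective instantiation failed", e);
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int checksum = instantiate(1_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("checksum " + checksum + ", took " + elapsedMs + " ms");
    }
}
```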

Yes, creating an object by reflection will always be slower, because the JVM cannot optimize the code at compile time. See the Sun/Java Reflection tutorials for more details.
See this simple test:
public class TestSpeed {
public static void main(String[] args) {
long startTime = System.nanoTime();
Object instance = new TestSpeed();
long endTime = System.nanoTime();
System.out.println(endTime - startTime + "ns");
startTime = System.nanoTime();
try {
Object reflectionInstance = Class.forName("TestSpeed").newInstance();
} catch (InstantiationException e) {
e.printStackTrace();
} catch (IllegalAccessException e) {
e.printStackTrace();
} catch (ClassNotFoundException e) {
e.printStackTrace();
}
endTime = System.nanoTime();
System.out.println(endTime - startTime + "ns");
}
}

Often you can use Apache Commons BeanUtils or PropertyUtils, which use introspection (basically they cache the metadata about the classes so they don't always need to use reflection).
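To show the caching idea without pulling in a dependency, here is a sketch that uses the JDK's own java.beans.Introspector instead of the Apache Commons API. PropertyReaderSketch, Person and the cache key format are all made up for the example. The getter is discovered once per class and property, and later reads reuse the cached Method.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PropertyReaderSketch {

    // Hypothetical bean used for the demonstration.
    public static class Person {
        private final String name;

        public Person(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }
    }

    // Cache of getters, keyed by "className#property" (a simplification that
    // ignores the case of same-named classes from different class loaders).
    private static final Map<String, Method> GETTERS = new ConcurrentHashMap<>();

    // Reads a bean property, looking the getter up only on first use.
    static Object read(Object bean, String property) {
        String key = bean.getClass().getName() + "#" + property;
        Method getter = GETTERS.computeIfAbsent(key, k -> findGetter(bean.getClass(), property));
        try {
            return getter.invoke(bean);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not read " + property, e);
        }
    }

    // The expensive part: introspects the class to find the read method.
    private static Method findGetter(Class<?> type, String property) {
        try {
            for (PropertyDescriptor pd : Introspector.getBeanInfo(type).getPropertyDescriptors()) {
                if (pd.getName().equals(property) && pd.getReadMethod() != null) {
                    return pd.getReadMethod();
                }
            }
        } catch (IntrospectionException e) {
            throw new IllegalStateException("introspection failed for " + type, e);
        }
        throw new IllegalArgumentException("no readable property '" + property + "' on " + type);
    }

    public static void main(String[] args) {
        Person person = new Person("Ada");
        System.out.println(read(person, "name")); // first call populates the cache
        System.out.println(read(person, "name")); // second call reuses the cached Method
    }
}
```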

I think it depends on how light or heavy the target method is. If the target method is very light (e.g. a getter/setter), it could be 1 to 3 times slower. If the target method takes about 1 millisecond or more, the performance will be very close. Here is the test I did with Java 8 and reflectasm:
public class ReflectionTest extends TestCase {
@Test
public void test_perf() {
Profiler.run(3, 100000, 3, "m_01 by reflect", () -> Reflection.on(X.class)._new().invoke("m_01")).printResult();
Profiler.run(3, 100000, 3, "m_01 direct call", () -> new X().m_01()).printResult();
Profiler.run(3, 100000, 3, "m_02 by reflect", () -> Reflection.on(X.class)._new().invoke("m_02")).printResult();
Profiler.run(3, 100000, 3, "m_02 direct call", () -> new X().m_02()).printResult();
Profiler.run(3, 100000, 3, "m_11 by reflect", () -> Reflection.on(X.class)._new().invoke("m_11")).printResult();
Profiler.run(3, 100000, 3, "m_11 direct call", () -> X.m_11()).printResult();
Profiler.run(3, 100000, 3, "m_12 by reflect", () -> Reflection.on(X.class)._new().invoke("m_12")).printResult();
Profiler.run(3, 100000, 3, "m_12 direct call", () -> X.m_12()).printResult();
}
public static class X {
public long m_01() {
return m_11();
}
public long m_02() {
return m_12();
}
public static long m_11() {
long sum = IntStream.range(0, 10).sum();
assertEquals(45, sum);
return sum;
}
public static long m_12() {
long sum = IntStream.range(0, 10000).sum();
assertEquals(49995000, sum);
return sum;
}
}
}
The complete test code is available at GitHub:ReflectionTest.java
