I have what probably is a basic question. When I create 100 million Hashtables it takes approximately 6 seconds (runtime = 6 seconds per core) on my machine if I do it on a single core. If I do this multi-threaded on 12 cores (my machine has 6 cores that allow hyperthreading) it takes around 10 seconds (runtime = 112 seconds per core).
This is the code I use:
Main
public class Tests
{
public static void main(String args[])
{
double start = System.currentTimeMillis();
int nThreads = 12;
double[] runTime = new double[nThreads];
TestsThread[] threads = new TestsThread[nThreads];
int totalJob = 100000000;
int jobsize = totalJob/nThreads;
for(int i = 0; i < threads.length; i++)
{
threads[i] = new TestsThread(jobsize,runTime, i);
threads[i].start();
}
waitThreads(threads);
for(int i = 0; i < runTime.length; i++)
{
System.out.println("Runtime thread:" + i + " = " + (runTime[i]/1000000) + "ms");
}
double end = System.currentTimeMillis();
System.out.println("Total runtime = " + (end-start) + " ms");
}
private static void waitThreads(TestsThread[] threads)
{
for(int i = 0; i < threads.length; i++)
{
while(threads[i].finished == false) //keep waiting until the thread is done
{
//System.out.println("waiting on thread:" + i);
try {
Thread.sleep(1);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
}
}
Thread
import java.util.Hashtable;
public class TestsThread extends Thread
{
int jobSize = 0;
double[] runTime;
boolean finished;
int threadNumber;
TestsThread(int job, double[] runTime, int threadNumber)
{
this.finished = false;
this.jobSize = job;
this.runTime = runTime;
this.threadNumber = threadNumber;
}
public void run()
{
double start = System.nanoTime();
for(int l = 0; l < jobSize ; l++)
{
Hashtable<Integer, Integer> test = new Hashtable<Integer, Integer>(); // the Hashtable creation the question refers to
}
double end = System.nanoTime();
double difference = end-start;
runTime[threadNumber] += difference;
this.finished = true;
}
}
I do not understand why creating the objects simultaneously in multiple threads takes longer per thread than doing it serially in only one thread. If I remove the line where I create the Hashtable, the problem disappears. If anyone could help me with this I would be greatly thankful.
Update: This problem has an associated bug report and was fixed in Java 1.7u40. It was never an issue for Java 8, which uses an entirely different hashing algorithm.
Since you are not using the created objects, that operation will get optimized away, so you are only measuring the overhead of creating the threads; and that overhead naturally grows with the number of threads you start.
I have to correct my answer regarding a detail I didn't know yet: there is something special about the classes Hashtable and HashMap. They both invoke sun.misc.Hashing.randomHashSeed(this) in the constructor. In other words, their instances escape during construction, which has an impact on memory visibility. This implies that their construction, unlike, say, an ArrayList's, cannot be optimized away, and multi-threaded construction slows down due to what happens inside that method (i.e. synchronization).
As said, that's special to these classes and, of course, to this implementation (my setup: 1.7.0_13). For ordinary classes the construction time goes straight to zero for such code.
Here is a more sophisticated benchmark. Watch the difference between DO_HASH_MAP = true and DO_HASH_MAP = false (when false it creates an ArrayList instead, which has no such special behavior).
import java.util.*;
import java.util.concurrent.*;
public class AllocBench {
static final int NUM_THREADS = 1;
static final int NUM_OBJECTS = 100000000 / NUM_THREADS;
static final boolean DO_HASH_MAP = true;
public static void main(String[] args) throws InterruptedException, ExecutionException {
ExecutorService threadPool = Executors.newFixedThreadPool(NUM_THREADS);
Callable<Long> task=new Callable<Long>() {
public Long call() {
return doAllocation(NUM_OBJECTS);
}
};
long startTime=System.nanoTime(), cpuTime=0;
for(Future<Long> f: threadPool.invokeAll(Collections.nCopies(NUM_THREADS, task))) {
cpuTime+=f.get();
}
long time=System.nanoTime()-startTime;
System.out.println("Number of threads: "+NUM_THREADS);
System.out.printf("entire allocation required %.03f s%n", time*1e-9);
System.out.printf("time x numThreads %.03f s%n", time*1e-9*NUM_THREADS);
System.out.printf("real accumulated cpu time %.03f s%n", cpuTime*1e-9);
threadPool.shutdown();
}
static long doAllocation(int numObjects) {
long t0=System.nanoTime();
for(int i=0; i<numObjects; i++)
if(DO_HASH_MAP) new HashMap<Object, Object>(); else new ArrayList<Object>();
return System.nanoTime()-t0;
}
}
What if you do it on 6 cores? Hyperthreading isn't the same as having double the cores, so you might want to try the number of physical cores too.
Also, the OS won't necessarily schedule each of your threads onto its own core.
Since all you are doing is measuring the time and churning memory, your bottleneck is likely to be your L3 cache or the bus to main memory. In this case, coordinating the work between threads can produce so much overhead that it makes things worse instead of better.
This is too long for a comment, but your inner loop can be just:
double start = System.nanoTime();
for(int l = 0; l < jobSize ; l++){
Map<String,Integer> test = new HashMap<String,Integer>();
}
// runtime is an AtomicLong for thread safety
runtime.addAndGet(System.nanoTime() - start); // time in nano-seconds.
Taking the time can be as slow as creating a HashMap, so you might not be measuring what you think you are if you call the timer too often.
BTW, Hashtable is synchronized, and you might find that using HashMap is faster and possibly more scalable.
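Since the snippet above references a runtime AtomicLong declared elsewhere, here is a minimal self-contained sketch of the same idea; the class name, thread count, and job size are assumptions rather than parts of the original code.
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
public class TimingSketch {
    // shared, thread-safe accumulator for the per-thread timings, in nanoseconds
    static final AtomicLong runtime = new AtomicLong();
    public static void main(String[] args) throws InterruptedException {
        int nThreads = 12;
        final int jobSize = 100000000 / nThreads;
        Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(new Runnable() {
                public void run() {
                    long start = System.nanoTime(); // take the time once per thread, not once per iteration
                    for (int l = 0; l < jobSize; l++) {
                        Map<String, Integer> test = new HashMap<String, Integer>();
                    }
                    runtime.addAndGet(System.nanoTime() - start);
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            t.join(); // wait for completion instead of polling a flag
        }
        System.out.println("Accumulated time: " + (runtime.get() / 1e6) + " ms");
    }
}
Joining the threads replaces the sleep-and-poll loop from the question, and taking the time only once per thread avoids the measurement overhead warned about above.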
Related
I have a rather big ArrayList.
I have to go through every index and do an expensive calculation.
My first idea to speed it up was to put it into a thread.
It works, but it is still extremely slow. I tinkered with the calculation to make it less expensive, but it's still too slow. The best solution I came up with is basically this one:
public void calculate(){
calculatePart(0);
calculatePart(1);
}
public void calculatePart(int offset) {
new Thread() {
@Override
public void run() {
int i = offset;
while(arrayList.size() > i) {
//Do the calculation
i +=2;
}
}
}.start();
}
Yet this feels like a lazy, unprofessional solution. That is why I'm asking if there is a cleaner and even faster solution.
Assuming that the task performed on each element doesn't lead to data races, you can leverage parallelism. To maximize the number of computations happening at the same time, you have to give work to each of the processors available in your system.
In Java, you can get the number of processors (cores) available using this:
int parallelism = Runtime.getRuntime().availableProcessors();
The idea is to create a number of threads equal to the number of available processors.
So, if you have 4 processors available, you can create 4 threads and ask them to process items at a stride of 4. Suppose you have a list of size 10 which needs to be processed in parallel.
Then,
Thread 1 processes items at index 0,4,8
Thread 2 processes items at index 1,5,9
Thread 3 processes items at index 2,6
Thread 4 processes items at index 3,7
I tried to simulate your scenario with the following code:
import java.util.Arrays;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
public class SpeedUpTest {
public static void main(String[] args) throws InterruptedException, ExecutionException {
long seqTime, twoThreadTime, multiThreadTime;
List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
long time = System.currentTimeMillis();
sequentialProcessing(list);
seqTime = System.currentTimeMillis() - time;
int parallelism = 2;
ExecutorService executorService = Executors.newFixedThreadPool(parallelism);
time = System.currentTimeMillis();
List<Future> tasks = new ArrayList<>();
for (int offset = 0; offset < parallelism; offset++) {
int finalParallelism = parallelism;
int finalOffset = offset;
Future task = executorService.submit(() -> {
int i = finalOffset;
while (list.size() > i) {
try {
processItem(list.get(i));
} catch (InterruptedException e) {
e.printStackTrace();
}
i += finalParallelism;
}
});
tasks.add(task);
}
for (Future task : tasks) {
task.get();
}
twoThreadTime = System.currentTimeMillis() - time;
parallelism = Runtime.getRuntime().availableProcessors();
executorService = Executors.newFixedThreadPool(parallelism);
tasks = new ArrayList<>();
time = System.currentTimeMillis();
for (int offset = 0; offset < parallelism; offset++) {
int finalParallelism = parallelism;
int finalOffset = offset;
Future task = executorService.submit(() -> {
int i = finalOffset;
while (list.size() > i) {
try {
processItem(list.get(i));
} catch (InterruptedException e) {
e.printStackTrace();
}
i += finalParallelism;
}
});
tasks.add(task);
}
for (Future task : tasks) {
task.get();
}
multiThreadTime = System.currentTimeMillis() - time;
log("RESULTS:");
log("Total time for sequential execution : " + seqTime / 1000.0 + " seconds");
log("Total time for execution with 2 threads: " + twoThreadTime / 1000.0 + " seconds");
log("Total time for execution with " + parallelism + " threads: " + multiThreadTime / 1000.0 + " seconds");
}
private static void log(String msg) {
System.out.println(msg);
}
private static void processItem(int index) throws InterruptedException {
Thread.sleep(5000);
}
private static void sequentialProcessing(List<Integer> list) throws InterruptedException {
for (int i = 0; i < list.size(); i++) {
processItem(list.get(i));
}
}
}
OUTPUT:
RESULTS:
Total time for sequential execution : 50.001 seconds
Total time for execution with 2 threads: 25.102 seconds
Total time for execution with 4 threads: 15.002 seconds
Speaking purely theoretically: if you have X elements and your calculation must perform N operations on each one, then your processor must perform X*N operations in total.
Parallel threads can make this faster only if some of the operations in the calculation involve waiting (e.g. file or network operations); that waiting time can be used by other threads. But if all operations are pure CPU work (e.g. mathematics) and the thread never waits, the time required to perform X*N operations stays the same.
Also, each thread must give other threads the ability to take control of the CPU at some point. This happens automatically between method calls, or if you have a Thread.yield() call in your code.
For example, a method like:
public void run()
{
long a=0;
for (long i=1; i < Long.MAX_VALUE; i++)
{
a+=i;
}
}
will not give another thread a chance to take control of the CPU until it has fully completed and exited.
In Java, I have simple multithreaded code:
public class ThreadedAlgo {
public static final int threadsCount = 3;
public static void main(String[] args) {
// start timer prior to computation
long time = System.currentTimeMillis();
// create threads
Thread[] threads = new Thread[threadsCount];
class ToDo implements Runnable {
public void run() { ... }
}
// create job objects
for (int i = 0; i < threadsCount; i++) {
ToDo job = new ToDo();
threads[i] = new Thread(job);
}
// start threads
for (int i = 0; i < threadsCount; i++) {
threads[i].start();
}
// wait for threads above to finish
for (int i = 0; i < threadsCount; i++) {
try {
threads[i].join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
// display time after computation
System.out.println("Execution time: " + (System.currentTimeMillis() - time));
}
}
It works fine. Now I want to run it with 2 or 3 threads and compute the time each thread spends on its computation. Then I will compare the times: denote them t1 and t2; if |t1 - t2| < some small epsilon, I will say that my algorithm performs with fine granularity under the given conditions, that is, the time spent by the threads is roughly the same.
How can I measure the time of a thread?
Use System.nanoTime() at the beginning and end of the thread (job) methods to calculate the total time spent in each invocation. In your case, all threads will be executed with the same (default) priority, so time slices should be distributed pretty fairly. If your threads are interlocked, use 'fair locks' for the same reason, e.g. new ReentrantLock(true).
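A minimal sketch of that approach, assuming the work is done in a placeholder doWork() method and the per-thread results are collected in a plain array (both names are mine, not from your code):
public class ThreadTiming {
    public static void main(String[] args) throws InterruptedException {
        final int threadsCount = 3;
        final long[] durations = new long[threadsCount]; // one slot per thread, filled before join() returns
        Thread[] threads = new Thread[threadsCount];
        for (int i = 0; i < threadsCount; i++) {
            final int index = i;
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    long start = System.nanoTime();
                    doWork(); // placeholder for the body of your ToDo.run()
                    durations[index] = System.nanoTime() - start;
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join(); // join() also guarantees the writes to durations are visible here
        }
        for (int i = 0; i < threadsCount; i++) {
            System.out.println("Thread " + i + " took " + (durations[i] / 1000000) + " ms");
        }
    }
    private static void doWork() {
        // stand-in for the real computation
        long a = 0;
        for (long i = 0; i < 100000000L; i++) {
            a += i;
        }
    }
}
You can then compare the recorded durations and check whether |t1 - t2| is below your epsilon.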
Add the timing logic inside your run() methods.
I am new to Java and trying to write a method that finds the maximum value in a 2D array of longs.
The method searches each row in a separate thread, and the threads maintain a shared current maximum value. Whenever a thread finds a value larger than its own local maximum, it compares this value with the shared maximum and updates its current local maximum and possibly the shared maximum as appropriate. I need to make sure that appropriate synchronization is implemented so that the result is correct regardless of how the computations interleave.
My code is verbose and messy, but for starters, I have this function:
static long sharedMaxOf2DArray(long[][] arr, int r){
MyRunnableShared[] myRunnables = new MyRunnableShared[r];
for(int row = 0; row < r; row++){
MyRunnableShared rr = new MyRunnableShared(arr, row, r);
Thread t = new Thread(rr);
t.start();
myRunnables[row] = rr;
}
return myRunnables[0].sharedMax; //should be the same as any other one (?)
}
For the adapted runnable, I have this:
public static class MyRunnableShared implements Runnable{
long[][] theArray;
private int row;
private long rowMax;
public long localMax;
public long sharedMax;
private static Lock sharedMaxLock = new ReentrantLock();
MyRunnableShared(long[][] a, int r, int rm){
theArray = a;
row = r;
rowMax = rm;
}
public void run(){
localMax = 0;
for(int i = 0; i < rowMax; i++){
if(theArray[row][i] > localMax){
localMax = theArray[row][i];
sharedMaxLock.lock();
try{
if(localMax > sharedMax)
sharedMax = localMax;
}
finally{
sharedMaxLock.unlock();
}
}
}
}
}
I thought this use of a lock would be a safe way to prevent multiple threads from messing with the sharedMax at a time, but upon testing/comparing with a non-concurrent maximum-finding function on the same input, I found the results to be incorrect. I'm thinking the problem might come from the fact that I just say
...
t.start();
myRunnables[row] = rr;
...
in the sharedMaxOf2DArray function. Perhaps a given thread needs to finish before I put it in the array of myRunnables; otherwise, I will have "captured" the wrong sharedMax? Or is it something else? I'm not sure about the timing of things.
I'm not sure if this is a typo or not, but your Runnable implementation declares sharedMax as an instance variable:
public long sharedMax;
rather than a shared one:
public static long sharedMax;
In the former case, each Runnable gets its own copy and will not "see" the values of others. Changing it to the latter should help. Or, change it to:
public long[] sharedMax; // array of size 1 shared across all threads
and you can now create an array of size one outside the loop and pass it in to each Runnable to use as shared storage.
As an aside: please note that there will be tremendous lock contention since every thread checks the common sharedMax value by holding a lock for every iteration of its loop. This will likely lead to poor performance. You'd have to measure, but I'd surmise that letting each thread find the row maximum and then running a final pass to find the "max of maxes" might actually be comparable or quicker.
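If you go with the shared-array suggestion, a minimal sketch could look like this; the class name, the use of a synchronized block instead of a ReentrantLock, and the one-lock-per-row structure from the aside are my choices rather than the original code:
public class SharedMaxRunnable implements Runnable {
    private final long[][] theArray;
    private final int row;
    private final long[] sharedMax; // array of size 1, shared by all workers
    public SharedMaxRunnable(long[][] a, int row, long[] sharedMax) {
        this.theArray = a;
        this.row = row;
        this.sharedMax = sharedMax;
    }
    public void run() {
        // find the row maximum locally, without touching shared state
        long localMax = Long.MIN_VALUE;
        for (long v : theArray[row]) {
            if (v > localMax) {
                localMax = v;
            }
        }
        // a single lock acquisition per row instead of one per new local maximum
        synchronized (sharedMax) {
            if (localMax > sharedMax[0]) {
                sharedMax[0] = localMax;
            }
        }
    }
}
The caller creates new long[] { Long.MIN_VALUE } once, passes it to every runnable, and must join() all threads before reading sharedMax[0].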
From JavaDocs:
public interface Callable
A task that returns a result and may
throw an exception. Implementors define a single method with no
arguments called call.
The Callable interface is similar to Runnable, in that both are
designed for classes whose instances are potentially executed by
another thread. A Runnable, however, does not return a result and
cannot throw a checked exception.
Well, you can use a Callable to calculate the result for one 1D array and wait for the end with an ExecutorService. You can then compare the results of the Callables to find the maximum. The code may look like this:
Random random = new Random(System.nanoTime());
long[][] myArray = new long[5][5];
for (int i = 0; i < 5; i++) {
myArray[i] = new long[5];
for (int j = 0; j < 5; j++) {
myArray[i][j] = random.nextLong();
}
}
ExecutorService executor = Executors.newFixedThreadPool(myArray.length);
List<Future<Long>> myResults = new ArrayList<>();
// create a callable for each 1d array in the 2d array
for (int i = 0; i < myArray.length; i++) {
Callable<Long> callable = new SearchCallable(myArray[i]);
Future<Long> callResult = executor.submit(callable);
myResults.add(callResult);
}
// This will make the executor accept no new threads
// and finish all existing threads in the queue
executor.shutdown();
// Wait until all threads are finish
while (!executor.isTerminated()) {
}
// now compare the results and fetch the biggest one
long max = 0;
for (Future<Long> future : myResults) {
try {
max = Math.max(max, future.get());
} catch (InterruptedException | ExecutionException e) {
// something bad happened...!
e.printStackTrace();
}
}
System.out.println("The result is " + max);
And your Callable:
public class SearchCallable implements Callable<Long> {
private final long[] mArray;
public SearchCallable(final long[] pArray) {
mArray = pArray;
}
@Override
public Long call() throws Exception {
long max = 0;
for (int i = 0; i < mArray.length; i++) {
max = Math.max(max, mArray[i]);
}
System.out.println("I've got the maximum " + max + ", and you guys?");
return max;
}
}
Your code has serious lock contention and thread safety issues. Even worse, it doesn't actually wait for any of the threads to finish before the return myRunnables[0].sharedMax, which is a really bad race condition. Also, using explicit locking via ReentrantLock or even synchronized blocks is usually the wrong way of doing things unless you're implementing something low level (e.g. your own concurrent data structure).
Here's a version that uses the Future concurrent primitive and an ExecutorService to handle the thread creation. The general idea is:
Submit a number of concurrent jobs to your ExecutorService
Add the Future returned from submit(...) to a List
Loop through the list calling get() on each Future and aggregating the result
This version has the added benefit that there is no lock contention (or locking in general) between the worker threads, as each just returns the max for its slice of the array.
import java.util.concurrent.*;
import java.util.*;
public class PMax {
public static long pmax(final long[][] arr, int numThreads) {
ExecutorService pool = Executors.newFixedThreadPool(numThreads);
try {
List<Future<Long>> list = new ArrayList<Future<Long>>();
for(int i=0;i<arr.length;i++) {
// put sub-array in a final so the inner class can see it:
final long[] subArr = arr[i];
list.add(pool.submit(new Callable<Long>() {
public Long call() {
long max = Long.MIN_VALUE;
for(int j=0;j<subArr.length;j++) {
if( subArr[j] > max ) {
max = subArr[j];
}
}
return max;
}
}));
}
// find the max of each slice's max:
long max = Long.MIN_VALUE;
for(Future<Long> future : list) {
long threadMax = future.get();
System.out.println("threadMax: " + threadMax);
if( threadMax > max ) {
max = threadMax;
}
}
return max;
} catch( RuntimeException e ) {
throw e;
} catch( Exception e ) {
throw new RuntimeException(e);
} finally {
pool.shutdown();
}
}
public static void main(String args[]) {
int x = 1000;
int y = 1000;
long max = Long.MIN_VALUE;
long[][] foo = new long[x][y];
for(int i=0;i<x;i++) {
for(int j=0;j<y;j++) {
long r = (long)(Math.random() * 100000000);
if( r > max ) {
// save this to compare against pmax:
max = r;
}
foo[i][j] = r;
}
}
int numThreads = 32;
long pmax = pmax(foo, numThreads);
System.out.println("max: " + max);
System.out.println("pmax: " + pmax);
}
}
Bonus: If you're calling this method repeatedly then it would probably make sense to pull the ExecutorService creation out of the method and have it be reused across calls.
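For instance, a sketch of that refactoring, assuming the caller owns the pool's lifetime (the class name and the explicit shutdown() method are my own):
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
public class PMaxService {
    // created once and reused across calls instead of once per pmax(...) call
    private final ExecutorService pool;
    public PMaxService(int numThreads) {
        this.pool = Executors.newFixedThreadPool(numThreads);
    }
    public long pmax(long[][] arr) throws Exception {
        List<Future<Long>> futures = new ArrayList<Future<Long>>();
        for (final long[] subArr : arr) {
            futures.add(pool.submit(new Callable<Long>() {
                public Long call() {
                    long max = Long.MIN_VALUE;
                    for (long v : subArr) {
                        if (v > max) {
                            max = v;
                        }
                    }
                    return max;
                }
            }));
        }
        // combine each slice's maximum
        long max = Long.MIN_VALUE;
        for (Future<Long> f : futures) {
            max = Math.max(max, f.get());
        }
        return max;
    }
    public void shutdown() {
        pool.shutdown(); // call once when the service is no longer needed
    }
}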
Well, that definitely is an issue, but without more code it is hard to tell whether it is the only one.
There is basically a race condition between the read of myRunnables[0].sharedMax and the modification of sharedMax in the other threads.
Think about what happens if the scheduler decides not to let any of the new threads run for now: when you are done creating the threads, you will return the answer without it having been modified even once! (Of course, there are other possible scenarios...)
You can overcome this by join()ing all the threads before returning the answer, as sketched below.
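A sketch of that fix, reusing MyRunnableShared from the question (and assuming sharedMax has been made genuinely shared, e.g. static, as the other answer points out):
static long sharedMaxOf2DArray(long[][] arr, int r) throws InterruptedException {
    MyRunnableShared[] myRunnables = new MyRunnableShared[r];
    Thread[] threads = new Thread[r];
    for (int row = 0; row < r; row++) {
        myRunnables[row] = new MyRunnableShared(arr, row, r);
        threads[row] = new Thread(myRunnables[row]);
        threads[row].start();
    }
    // wait for every worker to finish before reading the result
    for (Thread t : threads) {
        t.join();
    }
    return myRunnables[0].sharedMax;
}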
I've programmed a (very simple) benchmark in Java. It simply increments a double value up to a specified value and takes the time.
When I use it single-threaded or with a low number of threads (up to 100) on my 6-core desktop, the benchmark returns reasonable and repeatable results.
But when I use, for example, 1200 threads, the average multicore duration is significantly lower than the singlecore duration (about 10 times or more). I've made sure that the total number of incrementations is the same, no matter how many threads I use.
Why does the performance drop so much with more threads? Is there a trick to solve this problem?
I'm posting my source, but I don't think there is a problem with it.
Benchmark.java:
package sibbo.benchmark;
import java.text.DecimalFormat;
import java.util.LinkedList;
import java.util.List;
public class Benchmark implements TestFinishedListener {
private static final double TARGET = 1e10;
private static final int THREAD_MULTIPLICATOR = 2;
public static void main(String[] args) throws InterruptedException {
Benchmark b = new Benchmark(TARGET);
b.start();
}
private int coreCount;
private List<Worker> workers = new LinkedList<>();
private List<Worker> finishedWorkers = new LinkedList<>();
private double target;
public Benchmark(double target) {
this.target = target;
getSystemInfos();
printInfos();
}
private void getSystemInfos() {
coreCount = Runtime.getRuntime().availableProcessors();
}
private void printInfos() {
System.out.println("Usable cores: " + coreCount);
System.out.println("Multicore threads: " + coreCount * THREAD_MULTIPLICATOR);
System.out.println("Loops per core: " + new DecimalFormat("###,###,###,###,##0").format(TARGET));
System.out.println();
}
public synchronized void start() throws InterruptedException {
Thread.currentThread().setPriority(Thread.MAX_PRIORITY);
System.out.print("Initializing singlecore benchmark... ");
Worker w = new Worker(this, 0);
workers.add(w);
Thread.sleep(1000);
System.out.println("finished");
System.out.print("Running singlecore benchmark... ");
w.runBenchmark(target);
wait();
System.out.println("finished");
printResult();
System.out.println();
// Multicore
System.out.print("Initializing multicore benchmark... ");
finishedWorkers.clear();
for (int i = 0; i < coreCount * THREAD_MULTIPLICATOR; i++) {
workers.add(new Worker(this, i));
}
Thread.sleep(1000);
System.out.println("finished");
System.out.print("Running multicore benchmark... ");
for (Worker worker : workers) {
worker.runBenchmark(target / THREAD_MULTIPLICATOR);
}
wait();
System.out.println("finished");
printResult();
Thread.currentThread().setPriority(Thread.NORM_PRIORITY);
}
private void printResult() {
DecimalFormat df = new DecimalFormat("###,###,###,##0.000");
long min = -1, av = 0, max = -1;
int threadCount = 0;
boolean once = true;
System.out.println("Result:");
for (Worker w : finishedWorkers) {
if (once) {
once = false;
min = w.getTime();
max = w.getTime();
}
if (w.getTime() > max) {
max = w.getTime();
}
if (w.getTime() < min) {
min = w.getTime();
}
threadCount++;
av += w.getTime();
if (finishedWorkers.size() <= 6) {
System.out.println("Worker " + w.getId() + ": " + df.format(w.getTime() / 1e9) + "s");
}
}
System.out.println("Min: " + df.format(min / 1e9) + "s, Max: " + df.format(max / 1e9) + "s, Av per Thread: "
+ df.format((double) av / threadCount / 1e9) + "s");
}
@Override
public synchronized void testFinished(Worker w) {
workers.remove(w);
finishedWorkers.add(w);
if (workers.isEmpty()) {
notify();
}
}
}
Worker.java:
package sibbo.benchmark;
public class Worker implements Runnable {
private double value = 0;
private long time;
private double target;
private TestFinishedListener l;
private final int id;
public Worker(TestFinishedListener l, int id) {
this.l = l;
this.id = id;
new Thread(this).start();
}
public int getId() {
return id;
}
public synchronized void runBenchmark(double target) {
this.target = target;
notify();
}
public long getTime() {
return time;
}
@Override
public void run() {
synWait();
value = 0;
long startTime = System.nanoTime();
while (value < target) {
value++;
}
long endTime = System.nanoTime();
time = endTime - startTime;
l.testFinished(this);
}
private synchronized void synWait() {
try {
wait();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
You need to understand that the OS (or Java thread scheduler, or both) is trying to balance between all of the threads in your application to give them all a chance to perform some work, and there is a non-zero cost to switch between threads. With 1200 threads, you have just reached (and probably far exceeded) the tipping point wherein the processor is spending more time context switching than doing actual work.
Here is a rough analogy:
You have one job to do in room A. You stand in room A for 8 hours a day, and do your job.
Then your boss comes by and tells you that you have to do a job in room B also. Now you need to periodically leave room A, walk down the hall to room B, and then walk back. That walking takes 1 minute per day. Now you spend 3 hours, 59.5 minutes working on each job, and one minute walking between rooms.
Now imagine that you have 1200 rooms to work in. You are going to spend more time walking between rooms than doing actual work. This is the situation that you have put your processor into. It is spending so much time switching between contexts that no real work gets done.
EDIT: Now, as per the comments below, maybe you spend a fixed amount of time in each room before moving on; your work will progress, but the number of context switches between rooms still affects the overall runtime of a single task.
Ok, I think I've found my problem, but so far no solution.
When measuring the time each thread needs to do its part of the work, there are different possible minimums for different total numbers of threads, while the maximum is the same every time: the case where a thread is started first, is then paused very often, and finishes last. For example, this maximum could be 10 seconds. Assuming the total number of operations stays the same no matter how many threads I use, the number of operations done by a single thread changes with the thread count: with one thread it has to do 1000 operations, but with ten threads each one only has to do 100. With ten threads, the minimum amount of time a single thread can take is therefore much lower than with one thread (it would be 1 second, reached when a thread does its work without interruption), so calculating the average time each thread needs for its work is meaningless.
EDIT
The solution would be to simply measure the amount of time between the start of the first thread and the completion of the last.
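A minimal sketch of that measurement; the class name, thread count, and workload below are placeholders rather than pieces of the benchmark above:
public class WallClockBenchmark {
    public static void main(String[] args) throws InterruptedException {
        final int threadCount = 12;
        final long incrementsPerThread = 1000000000L / threadCount;
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    double value = 0;
                    while (value < incrementsPerThread) {
                        value++;
                    }
                }
            });
        }
        // one timestamp before the first start() and one after the last join():
        // this captures the whole run, including scheduling and context-switch overhead
        long start = System.nanoTime();
        for (Thread w : workers) {
            w.start();
        }
        for (Thread w : workers) {
            w.join();
        }
        long totalNanos = System.nanoTime() - start;
        System.out.println("Wall-clock time: " + (totalNanos / 1e9) + " s");
    }
}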
Are primitive data types like int and short thread-safe in Java? I executed the following code and sometimes did not see the expected result of 500.
public class SampleThree extends Thread
{
static long wakeUpTime = System.currentTimeMillis() + (1000*20);
static int inT;
public static void main(String args[])
{
System.out.println("initial:" + inT);
for(int i=0; i<500; i++)
new SampleThree().start();
try {
Thread.sleep(wakeUpTime - System.currentTimeMillis() + (1000*30));
System.out.println("o/p:" + inT);
}
catch(Exception e){
e.printStackTrace();
}
}
public void run()
{
try {
long s = wakeUpTime - System.currentTimeMillis();
System.out.println("will sleep ms: " + s);
Thread.sleep(s);
inT++; // System.out.println(inT);
}
catch(Exception e) {
e.printStackTrace();
}
}
}
Here 500 threads concurrently update the int variable inT. The main thread, after waiting for the concurrent updates to complete, prints the value of inT.
Find a similar example here.
There are three ways in which they're not safe:
long and double aren't even guaranteed to be updated atomically (you could see half of a write from a different thread)
The memory model doesn't guarantee that you'll see the latest updates from one thread in another thread, without extra memory barriers of some kind
The act of incrementing a variable isn't atomic anyway
Use AtomicInteger etc for thread-safe operations.
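For illustration, a sketch of the question's counter rewritten with AtomicInteger; the sleep-based scheduling from the original is replaced with join(), and the class name is my own:
import java.util.concurrent.atomic.AtomicInteger;
public class AtomicCounterDemo {
    static final AtomicInteger inT = new AtomicInteger();
    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[500];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    inT.incrementAndGet(); // atomic read-modify-write, unlike inT++
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join(); // waiting via join() also guarantees visibility of the updates
        }
        System.out.println("o/p: " + inT.get()); // always 500
    }
}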
Primitive types are not thread safe. Check this tutorial.
I would suggest using classes in java.util.concurrent.atomic. They are designed for thread-safety and in some cases the JVM can take advantage of hardware features to optimize.
To read or write a value in a multithreaded environment, the program needs proper synchronization or locking to prevent data races. This has nothing to do with which data type is accessed. In an ideal world we would share nothing, or share only immutable objects, which are always thread safe.
In theory, reads and writes are not even guaranteed to be atomic for long/double, according to https://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.7
HOWEVER, implementations tend to be atomic in practice: the following code prints nothing, with or without the volatile keyword, in my environment (64-bit Ubuntu 18.04, 64-bit Intel CPU, Oracle JDK 8), so the accesses are atomic in this situation, which I guess applies to all Intel/AMD 64-bit CPUs.
We could do the same for double as well, although it is a little tricky to construct double values with a property that is easy to check; see the sketch after the code below.
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class LongThreadSafe {
// multiple threads read and write this value.
// according to the java spec, only volatile long is guaranteed to be atomic
private static long value = 0;
private static final int max = (1 << 30) - 1;
private static final int threadCount = 4;
static ExecutorService executorService = Executors.newFixedThreadPool(threadCount);
static CyclicBarrier barrier = new CyclicBarrier(threadCount);
public static void main(String[] args) throws InterruptedException {
for (int i = 0; i < threadCount; i++) {
executorService.submit(() -> {
try {
// all threads start to work at the same time
barrier.await();
} catch (Exception e) {
e.printStackTrace();
}
for (int j = 1; j < max; j++) {
// read value into v2
long v2 = value;
// check v2
int low = (int) v2;
int high = (int) (v2 >> 32);
if ((high << 1) != low) {
System.out.println("invalid number found high=" + high + ", low=" + low);
}
// write LongThreadSafe.value again
LongThreadSafe.value = ((long) j << 32) | (long) (j << 1);
}
});
}
executorService.shutdown();
executorService.awaitTermination(10, TimeUnit.MINUTES);
}
}
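And here is a sketch of the double variant mentioned above: it encodes the same high/low relationship into the double's bit pattern with Double.longBitsToDouble and checks it with Double.doubleToRawLongBits. The class name is mine, and the constructed bit patterns deliberately never form a NaN, so the raw bits survive the round trip.
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class DoubleThreadSafe {
    // multiple threads read and write this value; its bit pattern always encodes (high, high << 1)
    private static double value = 0; // all-zero bits, which satisfies the check below
    private static final int max = (1 << 30) - 1;
    private static final int threadCount = 4;
    static ExecutorService executorService = Executors.newFixedThreadPool(threadCount);
    static CyclicBarrier barrier = new CyclicBarrier(threadCount);
    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < threadCount; i++) {
            executorService.submit(() -> {
                try {
                    // all threads start to work at the same time
                    barrier.await();
                } catch (Exception e) {
                    e.printStackTrace();
                }
                for (int j = 1; j < max; j++) {
                    // read the double and recover its exact bit pattern
                    long bits = Double.doubleToRawLongBits(value);
                    int low = (int) bits;
                    int high = (int) (bits >> 32);
                    if ((high << 1) != low) {
                        System.out.println("torn double found high=" + high + ", low=" + low);
                    }
                    // write a new double whose bits encode the same relationship
                    value = Double.longBitsToDouble(((long) j << 32) | (long) (j << 1));
                }
            });
        }
        executorService.shutdown();
        executorService.awaitTermination(10, TimeUnit.MINUTES);
    }
}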