Concurrency of RandomAccessFile in Java


I am creating a RandomAccessFile object to write to a file (on SSD) by multiple threads. Each thread tries to write a direct byte buffer at a specific position within the file and I ensure that the position at which a thread writes won't overlap with another thread:
file_.getChannel().write(buffer, position);
where file_ is an instance of RandomAccessFile and buffer is a direct byte buffer.
For the RandomAccessFile object, since I'm not using fallocate to allocate the file, and the file's length is changing, will this utilize the concurrency of the underlying media?
If it is not, is there any point in using the above function without calling fallocate while creating the file?

I made some testing with the following code :
public class App {
public static CountDownLatch latch;
public static void main(String[] args) throws InterruptedException, IOException {
File f = new File("test.txt");
RandomAccessFile file = new RandomAccessFile("test.txt", "rw");
latch = new CountDownLatch(5);
for (int i = 0; i < 5; i++) {
Thread t = new Thread(new WritingThread(i, (long) i * 10, file.getChannel()));
t.start();
}
latch.await();
file.close();
InputStream fileR = new FileInputStream("test.txt");
byte[] bytes = IOUtils.toByteArray(fileR);
for (int i = 0; i < bytes.length; i++) {
System.out.println(bytes[i]);
}
}
public static class WritingThread implements Runnable {
private long startPosition = 0;
private FileChannel channel;
private int id;
public WritingThread(int id, long startPosition, FileChannel channel) {
super();
this.startPosition = startPosition;
this.channel = channel;
this.id = id;
}
private ByteBuffer generateStaticBytes() {
ByteBuffer buf = ByteBuffer.allocate(10);
byte[] b = new byte[10];
for (int i = 0; i < 10; i++) {
b[i] = (byte) (this.id * 10 + i);
}
buf.put(b);
buf.flip();
return buf;
}
@Override
public void run() {
Random r = new Random();
while (r.nextInt(100) != 50) {
try {
System.out.println("Thread " + id + " is Writing");
this.channel.write(this.generateStaticBytes(), this.startPosition);
this.startPosition += 10;
} catch (IOException e) {
e.printStackTrace();
}
}
latch.countDown();
}
}
}
So far what I've seen:
Windows 7 (NTFS partition): Run linearly (aka one thread writes and when it is over, another one gets to run)
Linux Parrot 4.8.15 (ext4 partition) (Debian based distro), with Linux Kernel 4.8.0: Threads intermingle during the execution
Again as the documentation says:
File channels are safe for use by multiple concurrent threads. The
close method may be invoked at any time, as specified by the Channel
interface. Only one operation that involves the channel's position or
can change its file's size may be in progress at any given time;
attempts to initiate a second such operation while the first is still
in progress will block until the first operation completes. Other
operations, in particular those that take an explicit position, may
proceed concurrently; whether they in fact do so is dependent upon the
underlying implementation and is therefore unspecified.
So I'd suggest to first give it a try and see if the OS(es) you are going to deploy your code to (possibly the filesystem type) support parallel execution of a FileChannel.write call
Edit: As pointed out, the above does not mean that threads can write concurrently to the file. It is actually the opposite, as the write call behaves according to the contract of WritableByteChannel, which clearly specifies that only one thread at a time can write through a given channel:
If one thread initiates a write operation upon a channel then any
other thread that attempts to initiate another write operation will
block until the first operation is complete

As the documentation states, and as Adonis already mentions, a write can only be performed by one thread at a time. You won't achieve performance gains through concurrency. Moreover, you should only worry about performance if it's an actual issue, because writing concurrently to a disk may actually degrade performance (probably less so for SSDs than for HDDs).
The underlying media is in most cases (SSD, HDD, Network) single-threaded - actually, there is no such thing as a thread on hardware level, threads are nothing but an abstraction.
In your case the media is an SSD.
While the SSD may internally write data to multiple modules concurrently (they may reach a level of parallelism where writes are as fast as, or even faster than, reads), the internal mapping data structures are a shared resource and therefore contended, especially under frequent updates such as concurrent writes. Nevertheless, updates to these data structures are quite fast and therefore nothing to worry about unless they become a problem.
But apart from this, those are just internals of the SSD. On the outside you communicate over a Serial ATA interface, thus one byte at a time (actually packets in a Frame Information Structure, FIS). On top of this sits an OS/filesystem that again probably has contended data structures and/or applies its own optimizations such as write-behind caching.
Further, since you know what your media is, you can optimize specifically for it, and SSDs are really fast when a single thread writes a large piece of data.
Thus, instead of using multiple threads for writing, you may create a large in-memory buffer (possibly a memory-mapped file) and write concurrently into this buffer. The memory itself is not contended, as long as you ensure each thread accesses its own region of the buffer. Once all threads are done, you write this one buffer to the SSD (not needed if using a memory-mapped file).
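A minimal sketch of the memory-mapped variant, under stated assumptions: the class name, the thread count, the slice size and the temp file are all made up for illustration. Each thread gets its own duplicate() view of the mapping, restricted to a disjoint region, so no two threads touch the same bytes.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

class MappedSliceWriter {
    static final int THREADS = 4;    // hypothetical values for the demo
    static final int SLICE = 1024;

    /** Fills a temp file through a shared mapping and returns the file's bytes. */
    static byte[] writeAndReadBack() {
        try {
            Path path = Files.createTempFile("mapped", ".bin");
            try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw")) {
                raf.setLength((long) THREADS * SLICE);
                MappedByteBuffer map = raf.getChannel()
                        .map(FileChannel.MapMode.READ_WRITE, 0, raf.length());
                Thread[] workers = new Thread[THREADS];
                for (int t = 0; t < THREADS; t++) {
                    final int id = t;
                    // duplicate() shares the memory but has its own position and limit.
                    final ByteBuffer slice = map.duplicate();
                    slice.position(id * SLICE);
                    slice.limit((id + 1) * SLICE);
                    workers[t] = new Thread(() -> {
                        while (slice.hasRemaining()) {
                            slice.put((byte) id);
                        }
                    });
                    workers[t].start();
                }
                for (Thread w : workers) {
                    w.join();
                }
                map.force(); // ask the OS to flush the mapped pages to the device
            }
            return Files.readAllBytes(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Whether this beats a single large sequential write depends on the OS and the device, so it is worth measuring before committing to it.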
See also this good summary about developing for SSDs:
A Summary – What every programmer should know about solid-state drives
The point of pre-allocation (or, to be more precise, file_.setLength(), which actually maps to ftruncate) is that resizing the file may use extra cycles, and you may want to avoid that. But again, this may depend on the OS/filesystem.
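For illustration, a sketch of sizing the file once with setLength() and then issuing positional writes from several threads, as in the question. The class name, thread count and chunk size are assumptions; whether the positional writes actually overlap in time is still up to the platform.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

class PreallocatedWrites {
    /** Four threads each write a 10-byte chunk at their own offset; returns the file content. */
    static byte[] run() {
        final int threads = 4;   // hypothetical values for the demo
        final int chunk = 10;
        try {
            Path path = Files.createTempFile("prealloc", ".bin");
            try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "rw")) {
                file.setLength((long) threads * chunk); // size the file once, up front
                FileChannel channel = file.getChannel();
                Thread[] workers = new Thread[threads];
                for (int t = 0; t < threads; t++) {
                    final int id = t;
                    workers[t] = new Thread(() -> {
                        ByteBuffer buf = ByteBuffer.allocateDirect(chunk);
                        for (int i = 0; i < chunk; i++) {
                            buf.put((byte) (id * chunk + i));
                        }
                        buf.flip();
                        long pos = (long) id * chunk;
                        try {
                            // A positional write may be partial, so loop until drained.
                            while (buf.hasRemaining()) {
                                pos += channel.write(buf, pos);
                            }
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    });
                    workers[t].start();
                }
                for (Thread w : workers) {
                    w.join();
                }
            }
            return Files.readAllBytes(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```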

Related

How to consume a process' stdout as a stream, without blocking?

In Java (or clojure) I would like to spin up an external process and consume its stdout as a stream. Ideally, I would like to consume the process' output stream every time that the external process flushes it, but am not sure how that can be accomplished, and how it can be accomplished without blocking.
Going around consuming a Java ProcessPipeInputStream for a shelled out process (for example a Unix ProcessPipeInputStream), I find the inherited InputStream methods a bit low-level to work with, and am not sure if there's a non-blocking way to consume from the stream every time the producer-side flushes or otherwise in a non-blocking fashion.
Many code examples block on the output stream in an infinite loop, thereby hogging a thread for the listening. My hope is this blocking behavior can be avoided altogether.
Bottom line:
Is there a non-blocking way to be notified on an input stream, every time that the producing side of it flushes?

You need to create a separate Thread that consumes from such a stream, allowing the rest of your program to do whatever it is meant to be doing in parallel.
class ProcessOutputReader implements Runnable {
private InputStream processOutput;
public ProcessOutputReader(final InputStream processOutput) {
this.processOutput = processOutput;
}
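@Override
public void run() {
try {
int nextByte;
while ((nextByte = processOutput.read()) != -1) {
// do whatever you need to do byte-by-byte.
processByte(nextByte);
}
} catch (IOException e) {
// read() can fail, e.g. if the stream is closed; decide how to recover.
e.printStackTrace();
}
}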
}
class Main {
public static void main(final String[] args) {
final Process proc = ...;
final ProcessOutputReader reader = new ProcessOutputReader(proc.getInputStream());
final Thread processOutputReaderThread = new Thread(reader);
processOutputReaderThread.setDaemon(true); // allow the VM to terminate if this is the only thread still active.
processOutputReaderThread.start();
...
// if you wanna wait for the whole process output to be processed at some point you can do this:
try {
processOutputReaderThread.join();
} catch (final InterruptedException ex) {
// you need to decide how to recover from if your wait was interrupted.
}
}
}
If, instead of processing byte-by-byte, you want to deal with each flush as a single piece... I'm not sure it is 100% guaranteed that you can capture each process flush. After all, the process's own IO framework (Java, C, Python, etc.) may handle the "flush" operation differently, and you may end up receiving multiple blocks of bytes for any given flush in that external process.
In any case you can attempt to do that by using the InputStream's available method like so:
@Override
public void run() {
try {
int nextByte;
while ((nextByte = processOutput.read()) != -1) {
final int available = processOutput.available();
byte[] block = new byte[available + 1];
// read() returns an int in the range 0..255, so narrow it explicitly.
block[0] = (byte) nextByte;
final int actuallyAvailable = processOutput.read(block, 1, available);
if (actuallyAvailable < available) {
if (actuallyAvailable == -1) {
block = new byte[] { (byte) nextByte };
} else {
block = Arrays.copyOf(block, actuallyAvailable + 1);
}
}
// do whatever you need to do on that block now.
processBlock(block);
}
} catch (IOException e) {
// read() and available() can both fail; decide how to recover.
e.printStackTrace();
}
}
I'm not 100% sure of this, but I think one cannot trust available to return a guaranteed lower bound on the number of bytes that can be retrieved without blocking, nor that the next read operation will return that number of bytes if so requested; that is why the code above checks the actual number of bytes read (actuallyAvailable).
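A simpler variant that avoids available() altogether, as a sketch: read(byte[]) blocks until at least one byte is there and then returns however many bytes it could read, so each returned chunk is a reasonable stand-in for "what the producer has flushed so far". The class and method names are made up, and the loop would still run on its own thread as above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BlockReader {
    /** Reads the stream until EOF, returning each chunk as delivered by read(byte[]). */
    static List<byte[]> readBlocks(InputStream in, int bufferSize) {
        List<byte[]> blocks = new ArrayList<>();
        byte[] buf = new byte[bufferSize];
        try {
            int n;
            // read(byte[]) returns the number of bytes read (at most buf.length),
            // or -1 at end of stream.
            while ((n = in.read(buf)) != -1) {
                if (n > 0) {
                    blocks.add(Arrays.copyOf(buf, n));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return blocks;
    }
}
```

In real use you would hand each chunk to a callback instead of collecting them in a list; the list just keeps the sketch easy to check.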

Does this non-standard Java synchronization pattern work?

Let's say I have two threads running like this:
Thread A which performs computation while updating pixels of a shared image
Thread B periodically reads the image and copies it to the screen
Thread A performs work quickly, say 1 million updates per second, so I suspect it would be a bad idea to lock and unlock on a lock/mutex/monitor that often. But if there is no lock and no way of establishing a happens-before relation from thread A to thread B, then by the Java memory model (JMM spec) thread B is not guaranteed at all to see any of A's updates to the image.
So I was thinking that the minimum solution is for threads A and B to both synchronize periodically on the same shared lock, but not actually perform any work while inside the synchronized block - this is what makes the pattern non-standard and dubious. To illustrate in half-real half-pseudo code:
class ComputationCanvas extends java.awt.Canvas {
private Object lock = new Object();
private int[] pixels = new int[1000000];
public ComputationCanvas() {
new Thread(this::runThreadA).start();
new Thread(this::runThreadB).start();
}
private void runThreadA() {
while (true) {
for (1000 steps) {
update pixels directly
without synchronization
}
synchronized(lock) {} // Blank
}
}
private void runThreadB() {
while (true) {
Thread.sleep(100);
synchronized(lock) {} // Blank
this.repaint();
}
}
@Override
public void paint(Graphics g) {
g.drawImage(pixels, 0, 0);
}
}
Does adding empty synchronization blocks in this way correctly achieve the effect of transferring data from thread A to thread B? Or is there some other solution I failed to imagine?

Yes it works. But it works horribly.
Happens before only works when the release of the writer happens before the acquire of the reader. Your implementation assumes that whatever you're writing will complete before the subsequent reading/updating from ThreadB. Causing your data to be flushed all the time by synchronized will cause performance problems, although to what extent I cannot say for sure. Sure, you've made your synchronization finer grained, have you tested it yet?
A better solution might use a singleton/transfer SPSC (single producer/single consumer) queue to store the current snapshot of the writing thread and use that whenever you update.
int[] data = ...
Queue<int[]> queue = new ...
// Thread A
while (true) {
for (1000 iterations or so) {
...
}
queue.add(data);
}
// Thread B
while (true) {
int[] snapshot = queue.take();
this.repaint();
}
The advantage of this is that you don't need to busywait, you can just wait for the queue to block or until the next write. You can skip writes that you don't have time to update. You don't need to depend on the arbitrary thread scheduler to plan data flushes for you.
Remember that thread-safe data structures are great for passing data between threads.
Edit: oops, forgot to say that depending on how your updates go, you might want to use an array copy to prevent your data from being garbled from random writes that aren't cached.
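To make the array-copy remark concrete, here is a small sketch of the handoff with a one-slot queue. The class, the array size and the method names are hypothetical; the point is only that the producer publishes a private clone, so the consumer never sees a half-updated array.

```java
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class SnapshotHandoff {
    private final int[] pixels = new int[16];   // touched only by the producer thread
    private final BlockingQueue<int[]> queue = new ArrayBlockingQueue<>(1);

    /** Producer side: mutate freely, then try to publish a private copy. */
    boolean updateAndPublish(int value) {
        Arrays.fill(pixels, value);
        // offer() never blocks; if the previous snapshot has not been taken yet,
        // this one is simply dropped and false is returned.
        return queue.offer(pixels.clone());
    }

    /** Consumer side: returns the pending snapshot, or null if there is none. */
    int[] pollSnapshot() {
        return queue.poll();
    }
}
```

With this shape the consumer may lag by one snapshot; if you always want the newest one, clear the queue before offering.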

Best way to write huge number of files

I am writing a lot of files, like below.
public void call(Iterator<Tuple2<Text, BytesWritable>> arg0)
throws Exception {
// TODO Auto-generated method stub
while (arg0.hasNext()) {
Tuple2<Text, BytesWritable> tuple2 = arg0.next();
System.out.println(tuple2._1().toString());
PrintWriter writer = new PrintWriter("/home/suv/junk/sparkOutPut/"+tuple2._1().toString(), "UTF-8");
writer.println(new String(tuple2._2().getBytes()));
writer.close();
}
}
Is there a better way to write the files, without creating and closing a PrintWriter every time?

There is no significantly better way to write lots of files. What you are doing is inherently I/O intensive.
UPDATE - @Michael Anderson is right, I think. Using multiple threads to write the files will (probably) speed things up considerably. However, the I/O is still going to be the ultimate bottleneck in a couple of respects:
Creating, opening and closing files involves file & directory metadata access and update. This entails non-trivial CPU.
The file data and metadata changes need to be written to disc. That is possibly multiple disc writes.
There are at least 3 syscalls for each file written.
Then there are thread switching overheads.
Unless the quantity of data written to each file is significant (multiple kilobytes per file), I doubt that the techniques like using NIO, direct buffers, JNI and so on will be worthwhile. The real bottlenecks will be in the kernel: file system operations and low-level disk I/O.
... without closing or creating printwriter every time.
No. You need to create a new PrintWriter ( or Writer or OutputStream ) for each file.
However, this ...
writer.println(new String(tuple2._2().getBytes()));
... looks rather peculiar. You appear to be:
calling getBytes() on a String (?),
converting the byte array to a String
calling the println() method on the String, which will copy it and then convert it back into bytes before finally outputting them.
What gives? What is the point of the String -> bytes -> String conversion?
I'd just do this:
writer.println(tuple2._2());
This should be faster, though I wouldn't expect the percentage speed-up to be that large.
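If the goal is to get the raw bytes onto disk, a sketch that skips the String round trip entirely may be closer to what you want. Note that println(Object) prints the object's toString() form, not its raw bytes. The sketch below uses a plain byte[] plus a length to stay free of Hadoop dependencies; as far as I recall, BytesWritable.getBytes() returns a backing array that is only valid up to getLength(), which is why the length is passed separately. The class and method names are made up.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class RawBytesWriter {
    /** Writes the first `length` bytes of `data` to dir/name without decoding them. */
    static Path write(Path dir, String name, byte[] data, int length) {
        Path target = dir.resolve(name);
        try (OutputStream out = Files.newOutputStream(target)) {
            out.write(data, 0, length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    /** Small round trip into a temp directory, for demonstration only. */
    static byte[] demo() {
        try {
            Path dir = Files.createTempDirectory("sparkOut");
            byte[] backing = {1, 2, 3, 0, 0};   // only the first 3 bytes are valid
            return Files.readAllBytes(write(dir, "a.bin", backing, 3));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```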

I'm assuming you're after the fastest way. Because everyone knows fastest is best ;)
One simple way is to use a bunch of threads to do your writing for you.
However, you're not going to get much benefit from this unless your filesystem scales well. (I use this technique on Lustre-based cluster systems, in cases where "lots of files" could mean 10k; there, many of the writes go to different servers/disks.)
The code would look something like this: (Note I think this version is not right as for small numbers of files this fills the work queue - but see the next version for the better version anyway...)
public void call(Iterator<Tuple2<Text, BytesWritable>> arg0) throws Exception {
int nThreads=5;
ExecutorService threadPool = Executors.newFixedThreadPool(nThreads);
ExecutorCompletionService<Void> ecs = new ExecutorCompletionService<>(threadPool);
int nJobs = 0;
while (arg0.hasNext()) {
++nJobs;
final Tuple2<Text, BytesWritable> tuple2 = arg0.next();
ecs.submit(new Callable<Void>() {
@Override public Void call() throws Exception {
System.out.println(tuple2._1().toString());
String path = "/home/suv/junk/sparkOutPut/"+tuple2._1().toString();
try(PrintWriter writer = new PrintWriter(path, "UTF-8") ) {
writer.println(new String(tuple2._2().getBytes()));
}
return null;
}
});
}
for(int i=0; i<nJobs; ++i) {
ecs.take().get();
}
}
Better yet is to start writing your files as soon as you have data for the first one, not when you've got data for all of them - and for this writing to not block the calculation thread(s).
To do this you split your application into several pieces communicating over a (thread safe) queue.
Code then ends up looking more like this:
public void main() {
SomeMultithreadedQueue<Data> queue = ...;
int nGeneratorThreads=1;
int nWriterThreads=5;
int nThreads = nGeneratorThreads + nWriterThreads;
ExecutorService threadPool = Executors.newFixedThreadPool(nThreads);
ExecutorCompletionService<Void> ecs = new ExecutorCompletionService<>(threadPool);
AtomicInteger completedGenerators = new AtomicInteger(0);
// Start some generator threads.
for(int i=0; i<nGeneratorThreads; ++i) {
ecs.submit( () -> {
while(...) {
Data d = ... ;
queue.push(d);
}
if(completedGenerators.incrementAndGet()==nGeneratorThreads) {
queue.push(null);
}
return null;
});
}
// Start some writer threads
for(int i=0; i<nWriterThreads; ++i) {
ecs.submit( () -> {
Data d;
while((d = queue.take())!=null) {
String path = d.path();
try(PrintWriter writer = new PrintWriter(path, "UTF-8") ) {
writer.println(new String(d.getBytes()));
}
}
// Re-post the end marker so the remaining writers also stop.
queue.push(null);
return null;
});
}
for(int i=0; i<nThreads; ++i) {
ecs.take().get();
}
}
Note that I've not provided an implementation of the queue class; you can easily wrap the standard Java thread-safe ones to get what you need.
There's still lots more that can be done to reduce latency, etc. Here are some of the further things I've used to get the times down:
Don't even wait for all the data to be generated for a given file. Pass another queue containing packets of bytes to write.
Watch out for allocations - you can reuse some of your buffers.
There's some latency in the nio stuff - you can get some performance improvements by using C writes and JNI and direct buffers.
Thread switching can hurt, and the latency in the queues can hurt, so you might want to batch up your data slightly. Balancing this against the first point can be tricky.
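For reference, a compact sketch of the generator/writer split using only standard classes. LinkedBlockingQueue does not accept null, so a sentinel object (one per writer) marks the end instead. The class name, the sentinel and the counter that stands in for the actual file write are all assumptions made for the demo.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class QueueWriters {
    // Unique sentinel, compared by identity, that tells a writer to stop.
    private static final String POISON = new String("<end>");

    /** Feeds items to nWriters consumer threads; returns how many items were handled. */
    static int run(List<String> items, int nWriters) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicInteger written = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(nWriters);
        for (int i = 0; i < nWriters; i++) {
            pool.submit(() -> {
                try {
                    String item;
                    while ((item = queue.take()) != POISON) {
                        written.incrementAndGet(); // stand-in for writing one file
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        try {
            for (String s : items) {
                queue.put(s);
            }
            for (int i = 0; i < nWriters; i++) {
                queue.put(POISON); // one end marker per writer
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return written.get();
    }
}
```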

Java Multithreading large arrays access

My main class spawns multiple long-lived threads (20-40) based on some rules.
Each of those threads creates several short-lived threads; I use an executor for these.
The short-lived threads need to work on multi-dimensional arrays. I wrote it as in the code below, but I suspect it is inefficient because I pass the array to so many threads/tasks. I tried accessing it directly from the threads by declaring it public, without success. Comments or advice on how to improve this would be welcome.
As a next step I also want to return a one-dimensional array as the result (it might be better to update it directly in the AssetFactory class), and I am not sure how to do that.
Please see the code below.
thanks
Paz
import java.util.concurrent.*;
import java.util.logging.Level;
public class AssetFactory implements Runnable{
private volatile boolean stop = false;
private volatile String feed ;
private double[][][] PeriodRates= new double[10][500][4];
private String TimeStr,Bid,periodicalRateIndicator;
private final BlockingQueue<String> workQueue;
ExecutorService IndicatorPool = Executors.newCachedThreadPool();
public AssetFactory(BlockingQueue<String> workQueue) {
this.workQueue = workQueue;
}
@Override
public void run(){
while (!stop) {
try{
feed = workQueue.take();
periodicalRateIndicator = CheckPeriod(TimeStr, Bid) ;
if (periodicalRateIndicator.length() >0) {
IndicatorPool.submit(new CalcMvg(periodicalRateIndicator,PeriodRates));
}
if ("Stop".equals(feed)) {
stop = true ;
}
} // try
catch (InterruptedException ex) {
logger.log(Level.SEVERE, null, ex);
stop = true;
}
} // while
} // run
} // AssetFactory class
Here is the CalcMVG class
public class CalcMvg implements Runnable {
private double [][][] PeriodRates = new double[10][500][4];
public CalcMvg(String Periods, double[][][] PeriodRates) {
System.out.println(Periods);
this.PeriodRates = PeriodRates ;
}
@Override
public void run(){
try{
// do some work with the data of PeriodRates array, e.g. print it (no changes to the array)
System.out.println(PeriodRates[1][1][1]);
}
catch (Exception ex){
System.out.println(Thread.currentThread().getName() + ex.getMessage());
logger.log(Level.SEVERE, null, ex);
}
}//run
} // mvg class

There are several things going on here which seem to be wrong, but it is hard to give a good answer with the limited amount of code presented.
First the actual coding issues:
There is no need to define a variable as volatile if only one thread ever accesses it (stop, feed)
You should declare variables that are only used in a local context (run method) locally in that function and not globally for the whole instance (almost all variables). This allows the JIT to do various optimizations.
An InterruptedException should terminate the thread, because it is thrown as a request to stop the thread's work.
In your code example the workQueue doesn't seem to do anything but to put the threads to sleep or stop them. Why doesn't it just immediately feed the actual worker-threads with the required workload?
And then the code structure issues:
You use threads to feed threads with work. This is inefficient, as you only have a limited amount of cores that can actually do the work. As the execution order of threads is undefined, it is likely that the IndicatorPool is either mostly idle or overfilling with tasks that have not yet been done.
If you have a finite set of work to be done, the ExecutorCompletionService might be helpful for your task.
I think you will gain the best speed increase by redesigning the code structure. Imagine the following (assuming that I understood your question correctly):
There is a blocking queue of tasks that is fed by some data source (e.g. file-stream, network).
A set of worker-threads equal to the amount of cores is waiting on that data source for input, which is then processed and put into a completion queue.
A specific data set is the "terminator" for your work (e.g. "null"). If a thread encounters this terminator, it finishes its loop and shuts down.
Now the following holds true for this construct:
Case 1: The data source is the bottleneck. It cannot be sped up by using multiple threads, as your hard disk/network won't work faster if you ask more often.
Case 2: The processing power of your machine is the bottleneck, as you cannot process more data than the worker threads/cores on your machine can handle.
In both cases the conclusion is that the worker threads need to be the ones that fetch new data as soon as they are ready to process it, since either they need to be put on hold or they need to throttle the incoming data. This ensures maximum throughput.
Once all worker threads have terminated, the work is done. This can be tracked, for example, with a CyclicBarrier or Phaser.
Pseudo-code for the worker threads:
public void run() {
DataType e;
try {
while ((e = dataSource.next()) != null) {
process(e);
}
barrier.await();
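} catch (InterruptedException | BrokenBarrierException ex) {
// Restore the interrupt flag and let the thread end.
Thread.currentThread().interrupt();
}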
}
I hope this is helpful on your case.

Passing the array as an argument to the constructor is a reasonable approach, although unless you intend to copy the array it isn't necessary to initialize PeriodRates with a large array. It seems wasteful to allocate a large block of memory and then reassign its only reference straight away in the constructor. I would initialize it like this:
private final double [][][] PeriodRates;
public CalcMvg(String Periods, double[][][] PeriodRates) {
System.out.println(Periods);
this.PeriodRates = PeriodRates;
}
The other option is to define CalcMvg as an inner class of AssetFactory and declare PeriodRates as final. This would allow instances of CalcMvg to access PeriodRates in the outer instance of AssetFactory.
Returning the result is more difficult since it involves publishing the result across threads. One way to do this is to use synchronized methods:
private double[] result = null;
private synchronized void setResult(double[] result) {
this.result = result;
}
public synchronized double[] getResult() {
if (result == null) {
throw new RuntimeException("Result has not been initialized for this instance: " + this);
}
return result;
}
There are more advanced multi-threading concepts available in the Java libraries, e.g. Future, that might be appropriate in this case.
Regarding your concerns about the number of threads, allowing a library class to manage the allocation of work to a thread pool might solve this concern. Something like an Executor might help with this.
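As a sketch of the Future route: the task becomes a Callable that returns the one-dimensional result, and the submitting thread picks it up with get(). The class name and the trivial "take the first value of each period" computation are placeholders, not the asker's real logic.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FutureResult {
    /** Runs a read-only computation over the shared array on a pool thread. */
    static double[] compute(double[][][] rates) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // The lambda returns a value, so it is submitted as a Callable<double[]>.
            Future<double[]> future = pool.submit(() -> {
                double[] out = new double[rates.length];
                for (int i = 0; i < rates.length; i++) {
                    out[i] = rates[i][0][0]; // placeholder computation
                }
                return out;
            });
            // get() blocks until the task is done and publishes the result safely.
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

In the real code you would reuse the existing pool instead of creating one per call, and keep the Future around until the result is actually needed.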

"Atomically" update an entire array

I have a single writer thread and single reader thread to update and process a pool of arrays(references stored in map). The ratio of writes to read is almost 5:1(latency of writes is a concern).
The writer thread needs to update few elements of an array in the pool based on some events. The entire write operation(all elements) needs to be atomic.
I want to ensure that reader thread reads the previous updated array if writer thread is updating it(something like volatile but on entire array rather than individual fields). Basically, I can afford to read stale values but not block.
Also, since the writes are so frequent, it would be really expensive to create new objects or lock the entire array while read/write.
Is there a more efficient data structure that could be used or use cheaper locks ?

How about this idea: The writer thread does not mutate the array. It simply queues the updates.
The reader thread, whenever it enters a read session that requires a stable snapshot of the array, applies the queued updates to the array, then reads the array.
class Update
{
int position;
Object value;
}
LinkedBlockingQueue<Update> updates = new LinkedBlockingQueue<>(); // unbounded; an ArrayBlockingQueue would preallocate all Integer.MAX_VALUE slots
void write()
{
updates.put(new Update(...));
}
Object[] read()
{
Update update;
while((update=updates.poll())!=null)
array[update.position] = update.value;
return array;
}

Is there a more efficient data structure?
Yes, absolutely! They're called persistent data structures. They are able to represent a new version of a vector/map/etc merely by storing the differences with respect to a previous version. All versions are immutable, which makes them appropriate for concurrency (writers don't interfere/block readers, and vice versa).
In order to express change, one stores references to a persistent data structure in a reference type such as AtomicReference, and changes what those references point to - not the structures themselves.
Clojure provides a top-notch implementation of persistent data structures. They're written in pure, efficient Java.
The following program shows how one could approach your described problem using persistent data structures.
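import clojure.lang.IPersistentVector;
import clojure.lang.PersistentVector;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.atomic.AtomicReference;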
public class AtomicArrayUpdates {
public static Map<Integer, AtomicReference<IPersistentVector>> pool
= new HashMap<>();
public static Random rnd = new Random();
public static final int SIZE = 60000;
// For simulating the reads/writes ratio
public static final int SLEEP_TIMÉ = 5;
static {
for (int i = 0; i < SIZE; i++) {
pool.put(i, new AtomicReference<IPersistentVector>(PersistentVector.EMPTY));
}
}
public static class Writer implements Runnable {
@Override public void run() {
while (true) {
try {
Thread.sleep(SLEEP_TIMÉ);
} catch (InterruptedException e) {}
int index = rnd.nextInt(SIZE);
IPersistentVector vec = pool.get(index).get();
// note how we repeatedly assign vec to a new value
// cons() means "append a value".
vec = vec.cons(rnd.nextInt(SIZE + 1));
// assocN(): "update" at index 0
vec = vec.assocN(0, 42);
// appended values are nonsense, just an example!
vec = vec.cons(rnd.nextInt(SIZE + 1));
pool.get(index).set(vec);
}
}
}
public static class Reader implements Runnable {
@Override public void run() {
while (true) {
try {
Thread.sleep(SLEEP_TIMÉ * 5);
} catch (InterruptedException e) {}
IPersistentVector vec = pool.get(rnd.nextInt(SIZE)).get();
// Now you can do whatever you want with vec.
// nothing can mutate it, and reading it doesn't block writers!
}
}
}
public static void main(String[] args) {
new Thread(new Writer()).start();
new Thread(new Reader()).start();
}
}

Another idea, given that the array contains only 20 doubles.
Have two arrays, one for write, one for read.
Reader locks the read array during read.
read()
lock();
read stuff
unlock();
Writer first modifies the write array, then tryLock the read array, if locking fails, fine, write() returns; if locking succeeds, copy the write array to the read array, then release the lock.
write()
update write array
if tryLock()
copy write array to read array
unlock()
Reader can be blocked, but only for the time it takes to copy the 20 doubles, which is short.
Reader should use spin lock, like do{}while(tryLock()==false); to avoid being suspended.
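The pseudo-code above could be fleshed out roughly as follows. This is a sketch with made-up names; it uses a plain lock() on the reader side rather than the spin loop, and it hands the reader a copy so nothing is read outside the lock.

```java
import java.util.concurrent.locks.ReentrantLock;

class DoubleBufferedArray {
    private final double[] writeArray;   // touched only by the writer thread
    private final double[] readArray;    // guarded by readLock
    private final ReentrantLock readLock = new ReentrantLock();

    DoubleBufferedArray(int size) {
        writeArray = new double[size];
        readArray = new double[size];
    }

    /** Writer thread only. Returns true if the change was also published. */
    boolean write(int index, double value) {
        writeArray[index] = value;
        if (!readLock.tryLock()) {
            return false; // reader is busy; a later write will publish this change too
        }
        try {
            System.arraycopy(writeArray, 0, readArray, 0, readArray.length);
            return true;
        } finally {
            readLock.unlock();
        }
    }

    /** Reader thread: copies out a consistent snapshot while holding the lock. */
    double[] read() {
        readLock.lock();
        try {
            return readArray.clone();
        } finally {
            readLock.unlock();
        }
    }
}
```

One consequence of skipping the copy when tryLock fails is that the last write before a quiet period may stay unpublished until the next write arrives.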

I would do as follows:
synchronize the whole thing and see if the performance is good enough. Considering you only have one writer thread and one reader thread, contention will be low and this could work well enough
private final Map<Key, double[]> map = new HashMap<> ();
public synchronized void write(Key key, double value, int index) {
double[] array = map.get(key);
array[index] = value;
}
public synchronized double[] read(Key key) {
return map.get(key);
}
if it is too slow, I would have the writer make a copy of the array, change some values and put the new array back into the map. Note that array copies are very fast: copying a 20-item array would most likely take less than 100 nanoseconds
//If all the keys and arrays are constructed before the writer/reader threads
//start, no need for a ConcurrentMap - otherwise use a ConcurrentMap
private final Map<Key, AtomicReference<double[]>> map = new HashMap<> ();
public void write(Key key, double value, int index) {
AtomicReference<double[]> ref = map.get(key);
double[] oldArray = ref.get();
double[] newArray = oldArray.clone();
newArray[index] = value;
//you might want to check the return value to see if it worked
//or you might just skip the update if another writes was performed
//in the meantime
ref.compareAndSet(oldArray, newArray);
}
public double[] read(Key key) {
return map.get(key).get(); //check for null
}
since the writes are so frequent, it would be really expensive to create new objects or lock the entire array while read/write.
How frequent? Unless there are hundreds of them every millisecond you should be fine.
Also note that:
object creation is fairly cheap in Java (think around 10 CPU cycles = a few nanoseconds)
garbage collection of short lived object is generally free (as long as the object stays in the young generation, if it is unreachable it is not visited by the GC)
whereas long lived objects have a GC performance impact because they need to be copied across to the old generation

The following variation is inspired by both my previous answer and one of zhong.j.yu's.
Writers don't interfere/block readers and vice versa, and there are no thread safety/visibility issues, or delicate reasoning going on.
public class V2 {
static Map<Integer, AtomicReference<double[]>> commited = new HashMap<>();
static Random rnd = new Random();
static class Writer {
private Map<Integer, double[]> writeable = new HashMap<>();
void write() {
int i = rnd.nextInt(writeable.size());
// manipulate writeable.get(i)...
commited.get(i).set(writeable.get(i).clone());
}
}
static class Reader{
void read() {
double[] arr = commited.get(rnd.nextInt(commited.size())).get();
// do something useful with arr...
}
}
}

You need two static references, readArray and writeArray, and a simple mutex to track when writeArray has been changed.
Have a locked function called changeWriteArray make changes to a deep copy of writeArray:
synchronized String[] changeWriteArray(String[] writeArrayCopy, other params go here){
// here make changes to deepCopy of writeArray
//then return deepCopy
return writeArrayCopy;
}
Notice that changeWriteArray is functional programming with effectively no side effects, since it returns a copy that is neither readArray nor writeArray.
Whoever calls changeWriteArray must call it as writeArray = changeWriteArray(writeArray.deepCopy()).
The mutex is changed by both changeWriteArray and updateReadArray but is only checked by updateReadArray. If the mutex is set, updateReadArray simply points the readArray reference at the actual block of writeArray.
EDIT:
@vemv, concerning the answer you mentioned: while the ideas are the same, the difference is significant. The two references are static so that no time is spent actually copying the changes into readArray; rather, the readArray reference is moved to point to writeArray. Effectively we are swapping by means of a temporary array that changeWriteArray generates as necessary. Also, the locking here is minimal: reading does not require locking, in the sense that you can have more than one reader at any given time.
In fact, with this approach you can keep a count of concurrent readers and wait for the counter to reach zero before updating readArray with writeArray; again, reading requires no lock at all.
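A rough sketch of the reference swap described above, under stated assumptions: readArray is declared volatile so that the swap is visible to readers, writers are serialized with synchronized, and instance fields are used instead of statics to keep the example self-contained. The class name SwappedArray, the set method and the dirty flag (standing in for the "mutex") are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical illustration of swapping references instead of copying into readArray.
class SwappedArray {
    private volatile String[] readArray; // what readers see; never modified in place
    private String[] writeArray;         // latest written version, guarded by 'this'
    private boolean dirty;               // the "mutex" flag: set when writeArray changed

    SwappedArray(String[] initial) {
        readArray = initial.clone();
        writeArray = readArray;
    }

    // Writers change a copy, so arrays already handed to readers stay untouched.
    // A shallow copy is enough here because String elements are immutable.
    synchronized void set(int index, String value) {
        String[] copy = Arrays.copyOf(writeArray, writeArray.length);
        copy[index] = value;
        writeArray = copy;
        dirty = true;
    }

    // Publish pending writes by moving the reference; no elements are copied here.
    synchronized void updateReadArray() {
        if (dirty) {
            readArray = writeArray;
            dirty = false;
        }
    }

    // Lock-free read of the last published version.
    String[] read() {
        return readArray;
    }
}
```

Readers only ever dereference a volatile field, so any number of them can run at once; how often updateReadArray is called decides how stale their view may get.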

Improving on @zhong.j.yu's answer, it is really a good idea to queue the writes instead of trying to perform them when they occur. However, we must tackle the case where updates arrive so fast that the reader would choke on the continuous stream of them. My idea: what if the reader only performs the writes that were queued before the read, and ignores subsequent writes (those would be handled by the next read)?
You will need to write your own synchronized queue. It will be based on a linked list and will contain only two methods:
public synchronized void enqueue(Write write);
This method atomically enqueues a write. Writers could pile up waiting for the lock if writes came in faster than they can be enqueued, but I think there would have to be hundreds of thousands of writes every second for that to happen.
public synchronized Element cut();
This atomically empties the queue and returns its head (or tail) as the Element object. It will contain a chain of other Elements (Element.next, etc., just the usual linked list stuff), all of them representing the chain of writes since the last read. The queue is then empty, ready to accept new writes. The reader can then walk the Element chain (which is standalone by then, untouched by subsequent writes), perform the writes, and finally perform the read. While the reader processes the read, new writes are enqueued in the queue, but those will be the next read's problem.
I wrote this once, albeit in C++, to represent a sound data buffer. There were more writes (driver sends more data), than reads (some mathematical stuff over the data), while the writes had to finish as soon as possible. (The data came in real-time, so I needed to save them before next batch was ready in the driver.)
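A minimal Java sketch of such a queue, assuming a singly linked chain and the two synchronized methods described above. The names CutQueue and Element and the generic parameter T (standing in for the Write type) are hypothetical; this is not the C++ code mentioned above.

```java
// Hypothetical names throughout; T stands in for the answer's Write type.
class CutQueue<T> {
    static final class Element<T> {
        final T write;
        Element<T> next;
        Element(T write) { this.write = write; }
    }

    private Element<T> head;
    private Element<T> tail;

    // Atomically append one write to the end of the chain.
    synchronized void enqueue(T write) {
        Element<T> e = new Element<>(write);
        if (tail == null) {
            head = e;
        } else {
            tail.next = e;
        }
        tail = e;
    }

    // Atomically detach the whole chain and return its head (null if empty).
    synchronized Element<T> cut() {
        Element<T> chain = head;
        head = null;
        tail = null;
        return chain;
    }
}
```

The reader calls cut() once, walks the returned chain via next applying each write in order, and then performs its read; anything enqueued after the cut lands in a fresh chain for the next read.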

I've got a funny solution using three arrays and a volatile boolean toggle. Basically, each thread has its own array. Additionally, there's a shared array controlled via the toggle.
When the writer finishes and the toggle allows it, it copies the newly written array into the shared array and flips the toggle.
Similarly, before the reader starts, when the toggle allows it, it copies the shared array into its own array and flips the toggle.
public class MolecularArray {
private final double[] writeArray;
private final double[] sharedArray;
private final double[] readArray;
private volatile boolean writerOwnsShared;
MolecularArray(int length) {
writeArray = new double[length];
sharedArray = new double[length];
readArray = new double[length];
}
void read(Consumer<double[]> reader) {
if (!writerOwnsShared) {
copyFromTo(sharedArray, readArray);
writerOwnsShared = true;
}
reader.accept(readArray);
}
void write(Consumer<double[]> writer) {
writer.accept(writeArray);
if (writerOwnsShared) {
copyFromTo(writeArray, sharedArray);
writerOwnsShared = false;
}
}
private void copyFromTo(double[] from, double[] to) {
System.arraycopy(from, 0, to, 0, from.length);
}
}
It depends on the "single writer thread and single reader" assumption.
It never blocks.
It uses a constant (albeit huge) amount of memory.
Repeated calls to read without any intervening write do no copying and vice versa.
The reader does not necessarily see the most recent data, but it sees the data from the first write started after the previous read, if any.
I guess this could be improved using two shared arrays.
