My application requires concurrent access to a data file using memory mapping. My goal is to make it scalable in a shared memory system. After studying the source code of the memory-mapped file library implementation, I cannot figure out:
Is it legal to read from a MappedByteBuffer in multiple threads? Does a get block other gets at the OS (*nix) level?
If a thread calls put on a MappedByteBuffer, is the content immediately visible to another thread calling get?
Thank you.
To clarify a point: The threads are using a single instance of MappedByteBuffer, not multiple instances.
Buffers are not thread safe and their access should be controlled by appropriate synchronisation; see the Thread Safety section in http://docs.oracle.com/javase/6/docs/api/java/nio/Buffer.html . ByteBuffer is a subclass of the Buffer class and therefore has the same thread safety issue.
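One documented-safe way to share a single buffer's contents between threads is to give each thread its own duplicate() view: a duplicate shares the underlying bytes but has independent position and limit, so relative reads in one thread cannot corrupt another thread's cursor. A minimal sketch, with a heap buffer standing in for the mapped one (visibility of concurrent writes still needs its own synchronization):

```java
import java.nio.ByteBuffer;

public class BufferPerThread {
    // Hypothetical shared buffer; with a real file this would be a
    // MappedByteBuffer obtained from FileChannel.map(...).
    static final ByteBuffer shared = ByteBuffer.allocate(1024);

    // Each thread calls this once and keeps the result: duplicate()
    // shares the bytes but has its own position/limit/mark.
    static ByteBuffer viewForThread() {
        return shared.duplicate();
    }

    public static void main(String[] args) throws Exception {
        shared.putInt(0, 42); // absolute write; does not move the position
        Runnable reader = () -> {
            ByteBuffer view = viewForThread();
            int v = view.getInt(); // relative read on the private view
            if (v != 42) throw new AssertionError("unexpected: " + v);
        };
        Thread t1 = new Thread(reader), t2 = new Thread(reader);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("both threads read 42");
    }
}
```

Note that duplicate() only solves the cursor problem; happens-before between a writer thread and readers still has to come from synchronization, a volatile, or similar.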
Trying to make the use of memory-mapped files scalable in a shared memory system looks highly suspicious to me. Memory-mapped files are used for performance, and once you step into shared systems, chasing performance should be a low priority. Not that you should aim for a slow system, but you will have so many other problems that simply making it work should be your first (and only?) priority at the beginning. I won't be surprised if in the end you need to replace your concurrent, memory-mapped access to the data file with something else.
For some ideas like the use of an Exchanger, see Can multiple threads see writes on a direct mapped ByteBuffer in Java? and Options to make Java's ByteBuffer thread safe .
Related
Use case: a single data structure (hashtable, array, etc.) whose members are accessed frequently by multiple threads and modified infrequently by those same threads. How do I maintain performance while guaranteeing thread safety (i.e., preventing dirty reads)?
Java: Concurrent version of the data structure (concurrent hashmap, Vector, etc).
Python: No need if only threads are accessing it, because of the GIL. If multiple processes will be reading and updating the data structure, then use threading.Lock. Force each process's code to acquire the lock before, and release it after, accessing the data structure.
Does that sound reasonable? Will Java's concurrent data structures impose too much of a penalty on read speed? Is there a higher-level concurrency mechanism in Python?
Instead of reasoning about performance, I highly recommend measuring it for your application. Don't risk threading problems for a performance improvement that you most probably won't ever notice.
So: write thread-safe code without any performance tricks, use a decent profiler to find the percentage of time spent in data-structure access, and then decide whether that part is worth any improvement.
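As a concrete starting point for the read-mostly structure in the question (a sketch, assuming a map-shaped structure), ConcurrentHashMap keeps reads lock-free while still preventing dirty reads:

```java
import java.util.concurrent.ConcurrentHashMap;

public class ReadMostlyMap {
    // ConcurrentHashMap reads never see a torn ("dirty") entry;
    // writes are synchronized internally, readers pay no lock on get().
    static final ConcurrentHashMap<String, Integer> table = new ConcurrentHashMap<>();

    static void update(String key, int value) { table.put(key, value); }
    static Integer lookup(String key)         { return table.get(key); }

    public static void main(String[] args) {
        update("hits", 1);             // infrequent writer
        System.out.println(lookup("hits")); // frequent readers
    }
}
```

This is the "simple, easily maintainable code" baseline to profile before reaching for anything fancier.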
I bet there will be other bottlenecks, not the shared data structure.
If you like, come back to us with your code and the profiler results.
This is a general programming question. Let's say I have a thread running a specific simulation, where speed is quite important. At every iteration I want to extract data from it and write it to a file.
Is it better practice to hand the data over to a different thread and let the simulation thread focus on its job, or, since speed is very important, have the simulation thread do the data recording too, without any copying of data? (In my case it is 3-5 deques of integers with a size of 1000-10000.)
Firstly, it surely depends on how much data we are copying, but what else can it depend on? Can the cost of synchronization and copying be worth it? Is it good practice to create small Runnables at each iteration to handle the recording task, in the case of 50 or more iterations per second?
If you truly want low latency on this stat capturing, and you want it during the simulation itself then two techniques come to mind. They can be used together very effectively. Please note that these two approaches are fairly far from the standard Java trodden path, so measure first and confirm that you need these techniques before abusing them; they can be difficult to implement correctly.
The fastest way to write the data to a file during a simulation, without slowing the simulation down, is to hand the work off to another thread. However, care has to be taken over how the hand-off occurs, as a memory barrier in the simulation thread will slow the simulation. Given that the writer only cares that the values arrive eventually, I would consider using the memory barrier that sits behind AtomicLong.lazySet: it requests a thread-safe write to a memory address without blocking until the write actually becomes visible to the other thread. Unfortunately, direct access to this memory barrier is currently only available via lazySet or via sun.misc.Unsafe, which obviously is not part of the public Java API. That should not be too large a hurdle, though, as it is present on all current JVM implementations, and Doug Lea has talked about moving parts of it into the mainstream.
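A sketch of the single-writer hand-off described above, using AtomicLong.lazySet as the ordered publish (the slot layout and array size are illustrative):

```java
import java.util.concurrent.atomic.AtomicLong;

public class LazyPublish {
    static final int SIZE = 1024;
    static final long[] stats = new long[SIZE];
    // The writer publishes "slots up to n are filled" with lazySet: an
    // ordered store that skips the full fence a volatile set() would pay.
    static final AtomicLong published = new AtomicLong(-1);

    static void record(int slot, long value) {
        stats[slot] = value;     // plain store into the slot
        published.lazySet(slot); // ordered store; visible to readers "eventually"
    }

    public static void main(String[] args) throws Exception {
        Thread writer = new Thread(() -> {
            for (int i = 0; i < SIZE; i++) record(i, i * 10L);
        });
        writer.start();
        writer.join();
        // A reader may lag behind, but once it observes published >= n,
        // the plain stores into stats[0..n] are guaranteed visible too.
        System.out.println("published up to slot " + published.get());
    }
}
```

The key property is that lazySet orders the preceding plain stores before the publish, so a reader that sees the new slot index also sees the data written into it.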
To avoid the slow, blocking file I/O that Java uses, make use of a memory-mapped file. This lets the OS perform async I/O on your behalf, and it is very efficient. It also supports use of the same memory barrier mentioned above.
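A minimal sketch of writing through a memory-mapped file (the file name and region size are arbitrary): writes land in the page cache and the OS flushes them to disk asynchronously, so the writing thread never blocks on the device.

```java
import java.io.File;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.StandardOpenOption;

public class MappedWrite {
    // Map a small region of a throwaway temp file, write ten long slots
    // through the mapping, and read one back to show the round trip.
    static long writeAndReadBack() throws Exception {
        File f = File.createTempFile("stats", ".bin");
        f.deleteOnExit();
        try (FileChannel ch = FileChannel.open(f.toPath(),
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            for (int i = 0; i < 10; i++) buf.putLong(i * 8, i * 100L);
            return buf.getLong(3 * 8); // read back slot 3
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("slot 3 = " + writeAndReadBack());
    }
}
```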
For examples of both techniques, I strongly recommend reading the source code of HFT Chronicle by Peter Lawrey. In fact, HFT Chronicle may be just the library for you to use here: it offers a highly efficient, simple-to-use, disk-backed queue that can sustain a million or so messages per second.
In my work on a stress-testing HTTP client, I stored the stats in an array and, when the array was ready to send to the GUI, I would create a new array for the tester client and hand off the full array to the network layer. This means you don't pay for any copying, just for the allocation of a fresh array (an ultra-fast operation on the JVM, involving hand-coded assembler macros to utilize the best SIMD instructions available for the task).
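The swap-instead-of-copy hand-off can be sketched like this, with a plain BlockingQueue standing in for the network layer (batch size and queue type are illustrative):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class StatsSwap {
    static final int BATCH = 4;
    // Completed batches are handed to a consumer; the producer never
    // copies elements, it only reallocates the batch array.
    static final BlockingQueue<long[]> outbox = new LinkedBlockingQueue<>();

    static long[] current = new long[BATCH];
    static int filled = 0;

    // Called from the simulation thread only.
    static void record(long sample) {
        current[filled++] = sample;
        if (filled == BATCH) {         // batch full: hand the whole array off...
            outbox.add(current);
            current = new long[BATCH]; // ...and start a fresh one (cheap allocation)
            filled = 0;
        }
    }

    public static void main(String[] args) {
        for (long i = 1; i <= 8; i++) record(i);
        System.out.println("batches handed off: " + outbox.size());
    }
}
```

The simulation thread's cost per hand-off is one queue insert plus one array allocation, regardless of batch size.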
I would also suggest not throwing yourself head-first into the realm of optimal memory-barrier usage; the difference between a plain volatile write and AtomicReference.lazySet() is only measurable if your thread does almost nothing else but exercise the memory barrier (at least millions of writes per second). Depending on your target I/O throughput, you may not even need NIO to meet the goal. Try simple, easily maintainable code first rather than digging elbows-deep into highly specialized APIs without a confirmed need.
I would like to measure the heap usage of a specific Thread from within the app that creates it.
That app is multi-threaded and has other threads that I am not interested in measuring.
Please do not point me at profilers or other external reporting tools. I am only interested in code I can run at runtime within the same application.
If possible, I am looking for something similar to using ThreadMXBean to measure CPU time. If there is such a solution, I have not found it (and I have been searching for a while).
Also, is there a way to know which memory pools a thread is using? I was thinking of using something similar to:
HashSet<String> poolNames = getUsedPoolNames(thisThread); // hypothetical helper
long heapUsage = 0L;
for (MemoryPoolMXBean bean : ManagementFactory.getMemoryPoolMXBeans()) {
    if (poolNames.contains(bean.getName())) {
        heapUsage += bean.getUsage().getUsed();
    }
}
Would something like this work? What would getUsedPoolNames(~) look like?
How are memory pools and thread linked?
I don't think there's something at the API level that will get you that. Not easily, anyway. Memory isn't really reserved against threads; it's reserved against objects, and objects are not thread-specific.
There is a nifty tool out there that can do all this by parsing memory dumps, but I'm not sure if it works against a live JVM.
What you would need to do is find the objects associated with the thread (I'm not sure how to do that) and then navigate their references to calculate the retained heap, since the memory being used is more than just the object itself: the objects whose references it holds also take up memory. This is not a trivial task, and you're not going to get it from a simple API call. Plus, it's not really thread-specific, as another thread could hold references to the same objects (objects can be used by multiple threads).
Part of this is difficult because threads/objects are meant to be ignorant about the memory pools. That's something for the JVM to manage.
So, um, dunno.
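One partial workaround worth mentioning: on HotSpot, the com.sun.management extension of ThreadMXBean reports cumulative bytes allocated by a thread. That's allocation, not retained heap, but it is sometimes a good-enough proxy for "how much is this thread churning". A HotSpot-specific sketch:

```java
import java.lang.management.ManagementFactory;

public class PerThreadAlloc {
    // HotSpot-specific: the com.sun.management.ThreadMXBean subinterface
    // adds per-thread *allocated* bytes (cumulative), not retained heap.
    static long measureAllocation() {
        com.sun.management.ThreadMXBean mx =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long tid = Thread.currentThread().getId();
        long before = mx.getThreadAllocatedBytes(tid);
        byte[] junk = new byte[1 << 20]; // allocate ~1 MiB on this thread
        long after = mx.getThreadAllocatedBytes(tid);
        if (junk.length == 0) throw new AssertionError(); // keep junk live until here
        return after - before;
    }

    public static void main(String[] args) {
        System.out.println("allocated ~" + measureAllocation() + " bytes");
    }
}
```

The cast only succeeds on JVMs whose ThreadMXBean implements the com.sun.management interface (HotSpot does), so guard it with an instanceof check in portable code.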
I use MappedByteBuffers to achieve thread safety between readers and writers of a file via volatile variables (the writer updates its position and readers read the writer's position). (This is a file upload system; the incoming file is a stream, if that matters.) There are more tricks, obviously (sparse files, power-of-two mapping growth), but it all boils down to that.
I can't find a faster way to write to a file while concurrently reading the same file without caching it completely in memory (which I cannot do due to sheer size).
Is there any other method of IO that guarantees visibility within the same process for readers to written bytes? MappedByteBuffer makes its guarantees, indirectly, via the Java Memory Model, and I'd expect any other solution to do the same (read: non platform specific and more).
Is this the fastest way? Am I missing something in the docs?
I did some tests quite a few years ago on what was then decent hardware, and MappedByteBuffer was about 20% faster than any other I/O technique. It does have the disadvantage for writing that you need to know the file size in advance.
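For reference, the writer-position handshake the question describes can be sketched with a volatile cursor; a heap buffer stands in for the real MappedByteBuffer so the sketch stays self-contained:

```java
import java.nio.ByteBuffer;

public class WriterCursor {
    // In the real system this would be a MappedByteBuffer over the
    // upload file; a heap buffer keeps the example runnable anywhere.
    static final ByteBuffer data = ByteBuffer.allocate(4096);
    // Readers never read past this cursor; the volatile write/read pair
    // provides the happens-before edge that makes data[0..writePos) visible.
    static volatile int writePos = 0;

    // Single writer thread only.
    static void append(byte[] chunk) {
        int p = writePos;
        for (byte b : chunk) data.put(p++, b); // plain absolute writes
        writePos = p;                          // volatile store publishes them
    }

    // Any number of reader threads.
    static byte[] snapshot() {
        int limit = writePos;                  // volatile load
        byte[] out = new byte[limit];
        for (int i = 0; i < limit; i++) out[i] = data.get(i);
        return out;
    }

    public static void main(String[] args) {
        append("hello".getBytes());
        System.out.println(new String(snapshot()));
    }
}
```

The same publish/subscribe discipline works when the cursor itself lives inside the mapped file, which is what cross-mapping (and cross-process) visibility requires.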
I'm new to Java programming. I have several questions about how to implement a RingFiFoBuffer:
Can I store big XML files into this buffer? If yes how big?
Can several threads insert/delete/fetch records from the RingBuffer simultaneously?
How many records can I store?
Is there any tutorial I can look at to see how to write the code?
I only found http://commons.apache.org/collections/apidocs/org/apache/commons/collections/buffer/CircularFifoBuffer.html
Questions 1 and 3: That is only limited by the memory you assign to the Java process that executes your program.
Question 2: Accessing a Collection like the referenced CircularFifoBuffer usually requires synchronizing it. The linked JavaDoc already contains the code for synchronizing it:
Buffer fifo = BufferUtils.synchronizedBuffer(new CircularFifoBuffer());
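If you'd rather avoid the Commons dependency, the JDK's own ArrayBlockingQueue gives you a bounded, thread-safe FIFO with the synchronization built in; a minimal sketch (capacity and contents are illustrative):

```java
import java.util.concurrent.ArrayBlockingQueue;

public class FifoDemo {
    // Bounded, thread-safe FIFO straight from the JDK: put() blocks when
    // full and take() blocks when empty, so no external locking is needed.
    static String demo() throws Exception {
        ArrayBlockingQueue<String> fifo = new ArrayBlockingQueue<>(3);
        fifo.put("<doc>a</doc>"); // entries can be whole XML documents as strings
        fifo.put("<doc>b</doc>");
        return fifo.take();       // FIFO order: the first element comes out first
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```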
Can I store big XML files into this buffer? If yes how big?
You are only limited by your disk space with memory mapped files.
Can several threads insert/delete/fetch records from the RingBuffer simultaneously?
That depends on your implementation. Usually ring buffers are shared between threads.
How many records can I store?
This is something you usually fix when you create the ring buffer, so it's up to you. It's usually sensible to keep sizes to a minimum, as larger ring buffers can often be slower than tighter ones. So the practical limit may depend on your application and the hardware used.
Is there any tutorial I can look at to see how to write the code?
The best example I know of is the Disruptor library. It's pretty advanced but has better documentation than any other I can think of (including libraries I have written ;)
http://code.google.com/p/disruptor/
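For intuition about what Disruptor-style code is doing under the hood, here is a deliberately minimal single-producer/single-consumer ring buffer (not production code; the capacity is assumed to be a power of two so indices can be masked):

```java
public class TinyRing {
    // Fixed-capacity ring buffer for exactly one producer and one consumer.
    private final long[] slots;
    private final int mask;
    private volatile long head = 0; // next slot to read  (consumer-owned)
    private volatile long tail = 0; // next slot to write (producer-owned)

    TinyRing(int capacityPow2) {
        slots = new long[capacityPow2];
        mask = capacityPow2 - 1;
    }

    // Producer side: returns false instead of blocking when full.
    boolean offer(long v) {
        if (tail - head == slots.length) return false; // full
        slots[(int) (tail & mask)] = v;
        tail = tail + 1; // volatile store publishes the slot to the consumer
        return true;
    }

    // Consumer side: returns null when empty.
    Long poll() {
        if (head == tail) return null; // empty
        long v = slots[(int) (head & mask)];
        head = head + 1; // volatile store frees the slot for the producer
        return v;
    }

    public static void main(String[] args) {
        TinyRing ring = new TinyRing(4);
        for (long i = 1; i <= 4; i++) ring.offer(i);
        System.out.println(ring.offer(5)); // full, rejected
        System.out.println(ring.poll());   // oldest element out first
    }
}
```

Each counter has a single writer, so the plain increments are safe; the volatile reads/writes provide the visibility between the two threads. Disruptor adds batching, cache-line padding, and multi-consumer coordination on top of the same core idea.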