I need to create a File System Manager (more or less) which can read or write data to files.
My problem is how do I handle concurrency?
I can do something like
public class FileSystemManager {
    private ReadWriteLock readWriteLock = new ReentrantReadWriteLock();
    public byte[] read(String path) {
        readWriteLock.readLock().lock();
        try {
            ...
        } finally {
            readWriteLock.readLock().unlock();
        }
    }
    public void write(String path, byte[] data) {
        readWriteLock.writeLock().lock();
        try {
            ...
        } finally {
            readWriteLock.writeLock().unlock();
        }
    }
}
But this would mean that, for example, all writes are serialized behind a single lock, even if the first invocation targets /tmp/file1.txt and the second targets /tmp/file2.txt.
Any ideas how to go about this?
Suggest Message Passing For Concurrency Not Threads
In general, this kind of locking happens beneath the Java level. Are you really planning on reading and writing the same files and directories? Implementing directories?
Right now there is lots of unsafe threading code that may start blowing up as threads start really running together on multicore hardware.
My suggestion would be to manage concurrency via message passing. You can roll your own if this is an educational exercise, or use one of the many queuing and messaging systems out there. In this kind of system you have only one reader/writer process, but possibly many clients. You avoid the need for file and thread concurrency management.
One advantage of the message passing paradigm is that it will be much, much easier to create a distributed system for load balancing.
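As a rough sketch of the single reader/writer idea inside one JVM: the class name `SingleThreadedFileManager` is hypothetical, and the "message queue" here is simply the internal queue of a single-thread executor. Clients submit requests and one worker thread performs all file I/O, so client code needs no file locks.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical name: every read and write is funnelled through one worker thread.
class SingleThreadedFileManager implements AutoCloseable {
    // The executor's internal queue plays the role of the message queue.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Clients send a "write" message; the future completes when the write is done.
    CompletableFuture<Void> write(Path path, byte[] data) {
        return CompletableFuture.runAsync(() -> {
            try {
                Files.write(path, data);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, worker);
    }

    // Reads go through the same queue, so they never overlap with a write.
    CompletableFuture<byte[]> read(Path path) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return Files.readAllBytes(path);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, worker);
    }

    @Override
    public void close() {
        worker.shutdown(); // lets queued requests finish, then stops the worker
    }
}
```

The trade-off is that all requests are serialized, including ones for unrelated files; that is the price of not managing locks at all.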
Can't you create a different object for each path and then use synchronized blocks that synchronize on "this"?
You can store the ReadWriteLock instances in a map keyed on path, just make sure that you get concurrent access to the map correct (possibly using ConcurrentHashMap).
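The map-of-locks idea could look roughly like this. The class and method names (`PerPathLocks`, `lockFor`, `withReadLock`, `withWriteLock`) are made up for illustration, and the sketch assumes callers pass paths in one normalized form, since "/tmp/a" and "/tmp/../tmp/a" would otherwise get different locks.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical helper: one ReadWriteLock per path instead of one global lock.
class PerPathLocks {
    private final ConcurrentMap<String, ReadWriteLock> locks = new ConcurrentHashMap<>();

    // computeIfAbsent is atomic, so two threads asking for the same path
    // always end up with the same lock instance.
    ReadWriteLock lockFor(String path) {
        return locks.computeIfAbsent(path, p -> new ReentrantReadWriteLock());
    }

    // Runs the action while holding the read lock for this path only.
    <T> T withReadLock(String path, Supplier<T> action) {
        ReadWriteLock lock = lockFor(path);
        lock.readLock().lock();
        try {
            return action.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    // Runs the action while holding the write lock for this path only.
    void withWriteLock(String path, Runnable action) {
        ReadWriteLock lock = lockFor(path);
        lock.writeLock().lock();
        try {
            action.run();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With this, a write to /tmp/file1.txt no longer blocks a write to /tmp/file2.txt. Note that the map is never pruned in this sketch, so it grows with the number of distinct paths seen.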
If you actually care about locking the file using operating system primitives, you might try looking into java.nio.channels.FileChannel. It supports fine-grained locking of file regions, among other things. Also check out java.nio.channels.FileLock.
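A minimal sketch of an OS-level lock around a write, assuming a hypothetical helper named `LockedFileWriter`. Keep in mind that a FileLock is held on behalf of the whole JVM: it protects against other processes, and a second thread in the same JVM that tries to lock an overlapping region gets an OverlappingFileLockException rather than waiting.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: replaces a file's content while holding an exclusive file lock.
class LockedFileWriter {
    static void writeExclusively(Path path, byte[] data) {
        // Resources close in reverse order: the lock is released before the channel closes.
        try (FileChannel channel = FileChannel.open(path,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) { // exclusive lock on the whole file
            channel.truncate(0); // discard old content only once the lock is held
            ByteBuffer buffer = ByteBuffer.wrap(data);
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

To lock only part of a file, `channel.lock(position, size, shared)` takes a region and a shared/exclusive flag instead of locking everything.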
I would look deeply into Java 5 and the java.util.concurrent package. I'd also recommend reading Brian Goetz' "Java Concurrency in Practice".
Related
I am using hadoop for writing data I scrape.
I have a spring service that is called from multiple threads to write some content to the HDFS.
@Service
public class WriteService
{
    public void write(String path, String content)
    {
        FileSystem fs = FileSystem.get(conf);
    }
}
I am not sure whether the FileSystem object can be a member of the WriteService, and I can't find out whether it is thread-safe or not.
I am using the DistributedFileSystem object.
Do you know if it is thread-safe and whether I can use it as a member of my service?
Thank you
HDFS uses a so-called write-once-read-many (WORM) model. This makes it more robust when it comes to concurrency issues.
But, to answer the question, it is not safe in general. You still need to think about concurrency control requirements.
If config.setBoolean("fs.hdfs.impl.disable.cache", true); is called first, FileSystem.get(config) creates a new instance on each call instead of returning a shared cached one, so it can be used from multiple threads.
Assuming a Win32FileSystem and that beginMultiThreading runs many times simultaneously on a shared MultiThreadingClass object, what is the most likely way that this can cause a data race or some other threading issue? I know that this is probably not thread-safe, because (1) the argument to setPath gets reused. I also see that (2) path is not a final variable in java.io.File. However, I can't seem to find a part where this code could error out on its own due to a threading issue.
import java.io.File;
import java.io.IOException;

public class MultiThreadingClass {
    private Holder h = new Holder();
    private String path = "c:\\somepath";
    public void beginMultiThreading() throws IOException {
        h.setPath(new File(path));
        h.begin();
    }
}
public class Holder {
    private File path;
    public void setPath(File path) {
        this.path = path;
    }
    // getCanonicalPath() declares IOException, so it is propagated to the caller
    public void begin() throws IOException {
        System.out.println(path.getCanonicalPath() + "some string");
    }
}
As @Duncan says, the code is currently thread-safe. But it doesn't do any file writing at this time. As you are using File objects, I have an expectation that you will be dealing with files. Once you start to write files, there are further considerations:
Writing to a single file from multiple threads needs to be synchronized. To my knowledge, this is not "out of the box" functionality.
Writing to the same file from different JVMs or even from different class loaders in the same JVM is much harder. (With most web frameworks, writing to a logging file from multiple web apps is an example of writing to a single file from different class loaders). You are back to using a lock file or a platform-specific mutex of some sort.
Caveat: It is a while since I have had to do this, so there may be more support in the latest Java concurrency package or NIO package that someone else can expand on.
Your example code has no multi-threading at all. So I'll assume that either multiple threads are operating on their own MultiThreadingClass instance, or that they are sharing a common instance between them.
Either way, this code is thread safe. The only shared state is a private string object, which is not adjusted as part of your methods.
From what I know and have researched, the synchronized keyword in Java lets you synchronize a method or code block to handle multi-threaded access. If I want to lock a file for writing in a multi-threaded environment, I should use the classes in the Java NIO package to get the best results. Yesterday, I asked a question about handling a shared servlet for file I/O operations, and BalusC's comments helped with the solution, but the code in this answer confuses me. I'm not asking the community to "burn that post" or "downvote him" (note: I haven't downvoted it, and I have nothing against the answer); I'm asking for an explanation of whether the code fragment can be considered good practice:
private static File theFile = new File("theonetoopen.txt");
private void someImportantIOMethod(Object stuff){
    /*
    This is the line that confuses me. You can use any object as a lock, but
    is it good to use a File object for this purpose?
    */
    synchronized(theFile) {
        //Your file output writing code here.
    }
}
The problem is not about locking on a File object - you can lock on any object and it does not really matter (to some extent).
What strikes me is that you are using a non-final monitor, so if another part of your code reassigns theFile (theFile = new File(...);), the next thread that comes along will lock on a different object, and you no longer have any guarantee that your code won't be executed by two threads simultaneously.
Had theFile been final, the code would be ok, although it is preferable to use private monitors, just to make sure there is not another piece of code that uses it for other locking purposes.
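A sketch of the "private final monitor" variant, with a hypothetical class name `GuardedFileAppender`. The lock object is private, so no other code can synchronize on it, and final, so it can never be swapped for a different object.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical class: all appends to one file are serialized by a private lock.
class GuardedFileAppender {
    // Dedicated lock object: private (nobody else can lock on it) and final (never reassigned).
    private final Object lock = new Object();
    private final File file;

    GuardedFileAppender(File file) {
        this.file = file;
    }

    void append(String line) {
        synchronized (lock) {
            // Only one thread at a time gets here for this appender instance.
            try (FileWriter out = new FileWriter(file, true)) {
                out.write(line + System.lineSeparator());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

This only helps if every writer goes through the same instance. If several instances can target the same file, the lock would need to be shared between them, for example as a static final field.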
If you only need to lock the file within a single application then it's OK (assuming final is added).
Note that the solution won't work if you load the class more than once using different class loaders. For example, if you have a web application that is deployed twice in the same web server, each instance of the application will have its own lock object.
As you mention, if you want the locking to be robust and have the file locked from other programs too, you should use FileLock (see the docs, on some systems it is not guaranteed that all programs must respect the lock).
Had you seen: final Object lock = new Object() would you be asking?
As @assylias pointed out, the problem is that the lock is not final here.
Every object in Java can act as a lock for synchronization. They are called intrinsic locks. Only one thread at a time can execute a block of code guarded by a given lock.
More on that: http://docs.oracle.com/javase/tutorial/essential/concurrency/locksync.html
Using the synchronized keyword on a whole method can have a performance impact on your application. That's why a synchronized block around just the critical code is sometimes preferable.
Remember that the lock reference must not change. The best solution is to declare it final.
I know that many OSes perform some sort of locking on the filesystem to prevent inconsistent views. Are there any guarantees that Java and/or Android make about thread-safety of file access? I would like to know as much about this as possible before I go ahead and write the concurrency code myself.
If I missed a similar question that was answered feel free to close this thread. Thanks.
Android is built on top of Linux, so inherits Linux's filesystem semantics. Unless you explicitly lock a file, multiple applications and threads can open it for read/write access. Unless you actually need cross-process file synchronization, I would suggest using normal Java synchronization primitives for arbitrating access to the file.
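The normal reading/writing classes (FileInputStream, etc.) do not provide any thread-safety AFAIK. To lock a file against other processes, you can use FileChannel. Note that a FileLock is held on behalf of the whole JVM, so it does not arbitrate between threads inside the same JVM. This would look something like: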
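FileInputStream in = new FileInputStream("file.txt");
FileChannel channel = in.getChannel();
// Read-only channel: ask for a shared lock; the exclusive no-arg lock() would throw NonWritableChannelException
FileLock lock = channel.lock(0L, Long.MAX_VALUE, true);
try {
    // do some reading
} finally {
    lock.release();
}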
I would read the File Lock doc, and take care with the threading!
I have a critical section of my (Java) code which basically goes like the snippet below. The messages come in from an NIO server.
void messageReceived(User user, Message message) {
    synchronized(entryLock) {
        userRegistry.updateLastMessageReceived(user,time());
        server.receive(user,message);
    }
}
However, a high percentage of my messages are not really going to change the server state. They're merely the client saying "hello, I'm still here". I really don't want to have to handle that inside the synchronized block.
I could use a synchronized map or something like that, but it's still going to incur a synchronization penalty.
What I would really like to do is to have something like a drop box, like this
void messageReceived(User user, Message message) {
    dropbox.add(new UserReceived(user,time()));
    if(message.getType() != message.TYPE_KEPT_ALIVE) {
        synchronized(entryLock) {
            server.receive(user,message);
        }
    }
}
I have a cleanup routine to automatically put clients that aren't active to sleep. So instead of synchronizing on every kept alive message to update the registry, the cleanup routine can simply compile the kept alive messages in a single synchronization block.
So naturally, recognizing a need for this, the first thing I did was start writing a solution. Then I decided this was a non-trivial class, and a problem that was more than likely fairly common. So here I am.
tl;dr Is there a Java library or other solution I can use to facilitate atomically adding to a list of objects in an asynchronous manner? Collecting from the list in an asynchronous manner is not required. I just don't want to synchronize on every add to the list.
ConcurrentLinkedQueue claims to be:
This implementation employs an efficient "wait-free" algorithm based on one described in Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms by Maged M. Michael and Michael L. Scott.
I'm not sure what the quotes on "wait-free" entail but the Concurrent* classes are good places to look for structures like you're looking for.
You might also be interested in the following: Effective Concurrency: Lock-Free Code — A False Sense of Security. It talks about how hard these things are to get right, even for experts.
Well, there are a few things you must bear in mind.
First, there is very little "synchronization cost" if there is little contention (more than one thread trying to enter the synchronized block at the same time).
Second, if there is contention, you're going to incur some cost no matter what technique you're using. Paul is right about ConcurrentLinkedQueue; "wait-free" means that concurrency control is not done using locks, but you will still always pay some price for contention. You may also want to look at ConcurrentHashMap, because I'm not sure a list is what you're looking for. Using both classes is quite simple and common.
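For the "drop box" from the question, a sketch built on ConcurrentLinkedQueue might look like the following. The class name `KeepAliveDropbox` and its methods are hypothetical; producers add without taking any lock, and the cleanup routine drains whatever has accumulated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical drop box: many threads add, one cleanup routine drains in batches.
class KeepAliveDropbox<T> {
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();

    // Called from message-handling threads; no synchronized block needed.
    void add(T event) {
        queue.add(event);
    }

    // Called by the cleanup routine; removes and returns everything queued so far.
    List<T> drain() {
        List<T> batch = new ArrayList<>();
        T event;
        while ((event = queue.poll()) != null) {
            batch.add(event);
        }
        return batch;
    }
}
```

The cleanup routine can then apply the whole batch to the registry inside a single synchronized block, which is the effect the question was after. Events added while drain() is running may land in either this batch or the next one.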
If you want to be more adventurous, you might find some non-locking synchronization primitives in java.util.concurrent.atomic.
One thing we could do is to use a simple ArrayList for keep-alive messages:
Keep adding to this list whenever a keep-alive message comes in.
The other thread would synchronize on a lock X and read and process the keep-alives. Note that this thread does not remove from the list; it only reads/copies.
Finally, in messageReceived itself you check whether the list has grown beyond, say, 1000 entries, in which case you synchronize on the lock X and clear the list.
List keepAliveList = new ArrayList();
void messageReceived(User user, Message message) {
    if(message.getType() == message.TYPE_KEPT_ALIVE) {
        if(keepAliveList.size() > THRESHOLD) {
            synchronized(X) {
                processList.addAll(keepAliveList);
                keepAliveList.clear();
            }
        }
        // Caveat: ArrayList is not thread-safe, so this unsynchronized add
        // is only safe if messageReceived is called from a single thread.
        keepAliveList.add(message);
    }
}
//on another thread
void checkKeepAlives() {
    synchronized(X) {
        processList.addAll(keepAliveList);
    }
    processKeepAlives(processList);
}