I have this method:
GenericDatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(baos, null);
public void WriteToFile(Record record) {
    this.baos.reset();
    try (FileOutputStream fileOut = new FileOutputStream(avroFile, true)) {
        datumWriter.write(record, encoder);
        encoder.flush();
        fileOut.write("RecordStart\n".getBytes());
        baos.writeTo(fileOut);
        fileOut.write("\nRecordEnd\n".getBytes());
        this.baos.flush();
    } catch (IOException e) {
        logger.error("Error while writing: ", e);
    }
}
The above method is being called by multiple threads, and each thread writes a record between RecordStart and RecordEnd. There are cases where the output gets interleaved, i.e. we do not get our record between RecordStart and RecordEnd.
One solution to avoid this is to use synchronized, but that hurts performance since it makes threads wait.
So I want some suggestions on how we can avoid multiple threads writing to the same file at the same time, which causes the interleaving of records.
You can only benefit from parallel processing when your operations can actually be parallelized. By that I mean:
If you are writing to a single file, that specific step of the computation must be done sequentially, whether via synchronized or via a file lock, or else you'll get scrambled data.
What you can do to improve performance is reduce the synchronized/locked block to the minimum possible, leaving only the very last step (the actual write) inside it. Other than that, you could write to multiple files.
I would prefer a file lock because it keeps the method more general, in case you ever decide to expand it to write to multiple files. It also prevents other processes (outside your program) from using the file in the meantime.
Take a look at this question.
Answering the specific question:
So I want some suggestions on how we can avoid multiple threads writing to the same file at the same time, which causes the interleaving of records.
Without losing performance... I don't think there is a way. The very nature of writing to a single file demands that it be sequential.
Most systems I've seen that write all their logs to a single file use a queue and a dedicated writer that keeps writing record by record as long as the queue has something to offer, so everything gets written eventually, provided the system is not constantly receiving more records than the disk can handle.
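A minimal sketch of that queue-based approach (all names here are illustrative, not from the question's code): worker threads only enqueue the already-encoded record bytes, and a single writer thread owns the file, so no two records can interleave.
import java.io.OutputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: many producers, one writer thread that owns the file.
class AsyncRecordWriter {
    private final BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(1024);

    AsyncRecordWriter(OutputStream fileOut) {
        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    byte[] record = queue.take();            // blocks until a record arrives
                    fileOut.write("RecordStart\n".getBytes());
                    fileOut.write(record);
                    fileOut.write("\nRecordEnd\n".getBytes());
                    fileOut.flush();
                }
            } catch (Exception e) {
                // log and let the writer die; a real version would handle shutdown cleanly
            }
        });
        writer.setDaemon(true);
        writer.start();
    }

    // called by the worker threads instead of writing to the file themselves
    void submit(byte[] encodedRecord) throws InterruptedException {
        queue.put(encodedRecord);
    }
}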
Related
I've designed a Server-Client app using Java, and more than one user can connect to the Server.
The Server provides some features, such as:
downloading files
creating files
writing/appending to files etc.
I spotted some actions that need to be synchronized when two or more users send the same request.
For example: when users want to download the same file at the same time, how can I synchronize this action using synchronized blocks or any other method?
//main (connecting 2 users in the server)
ServerSocket server= new ServerSocket(8080, 50);
MyThread client1=new MyThread(server);
MyThread client2=new MyThread(server);
client1.start();
client2.start();
Here is the method I would like to synchronize:
//outstream = new BufferedWriter(new OutputStreamWriter(sock.getOutputStream()));//output to client
//instream = new BufferedReader(new InputStreamReader(sock.getInputStream()));//input from client
public void downloadFile(File file, String name) throws FileNotFoundException, IOException {
    synchronized (this) {
        if (file.exists()) {
            BufferedReader readfile = new BufferedReader(new FileReader(name + ".txt"));
            String newpath = "../Socket/" + name + ".txt";
            BufferedWriter socketfile = new BufferedWriter(new FileWriter(newpath));
            String line;
            while ((line = readfile.readLine()) != null) {
                outstream.write(line + "\n");
                outstream.flush();
                socketfile.write(line);
            }
            outstream.write("EOF\n");
            outstream.flush();
            socketfile.close();
            outstream.write("Downloaded\n");
            outstream.flush();
        } else {
            outstream.write("FAIL\n");
        }
        outstream.flush();
    }
}
Note: This method is in a class that extends Thread and is used when I want to "download" the file in the overridden run() method.
Does this example assure me that when 2 users want to download the same file, one of them will have to wait, and the other one will get it? Thanks for your time!
Locking in concurrent programming is used to provide mutual exclusion for some piece of code. For locking you can use either synchronized or an unstructured lock such as ReentrantLock.
The main goal of any lock is to provide mutual exclusion for the piece of code placed inside it, which means that this piece will be executed by only one thread at a time. The section inside the lock is called the critical section.
To achieve proper locking it is not enough just to place the critical code there. You also have to make sure that the variables modified inside the critical section are modified only there. If you lock some piece of code but references to those variables are also passed to some concurrently executing thread without any locking, the lock won't save you and you will get a data race. Locks only secure the execution of a critical section: they only guarantee that the code placed in the critical section is executed by one thread at a time.
//outstream = new BufferedWriter(new OutputStreamWriter(sock.getOutputStream()));//output to client
//instream = new BufferedReader(new InputStreamReader(sock.getInputStream()));//input from client
public void downloadFile(File file, String name) throws FileNotFoundException, IOException {
    synchronized (this) {
Who is the owner of this method? The client handler? If so, then it won't work. You should lock on the same object, shared by all the threads that require locking. In your case every client handler will have its own lock, and the other threads know nothing about it. You can lock on the class itself instead (e.g. MyThread.class); that will work.
synchronize(this) vs synchronize(MyClass.class)
After doing that you will have proper locking for reading (downloading) the file. But what about writes? Imagine the case where, during reading, some other thread wants to modify that file. You only have locks around reading: you are reading the beginning of the file while the other thread is modifying the end of it. The writing thread will succeed and you will get a logically corrupted file, with the beginning from one version and the end from the other. Of course, file systems and the standard Java library try to take care of such cases (by using locks in readers/writers, locking file offsets, etc.), but in general it is a possible scenario. So you will also need the same lock for writes, and the read and write methods should share and use the same lock.
And now we've come to a situation where we have correct behavior but low performance. This is our tradeoff, but we can do better. We are currently using the same lock for every write and read method, which means we can read or write only one file at a time, across all files. That is stricter than necessary, because we can modify or read different files without any possible corruption. So the better approach is to associate a lock with a file, not with the whole method. And here nio comes to help you.
How can I lock a file using java (if possible)
https://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileLock.html
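If all the access happens inside one JVM (as in this server), an in-process alternative to FileLock is to keep one lock object per file path. This is only a sketch, and the class and field names are made up for illustration:
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: one lock object per file path, shared by every client thread.
final class FileLocks {
    private static final ConcurrentHashMap<String, Object> LOCKS = new ConcurrentHashMap<>();

    static Object forPath(String path) {
        // the same path always yields the same lock object
        return LOCKS.computeIfAbsent(path, p -> new Object());
    }
}

// usage inside downloadFile()/writeFile():
// synchronized (FileLocks.forPath(name + ".txt")) {
//     ... read or write that one file ...
// }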
And actually you can read a file concurrently if the offsets are different, although for obvious physical reasons you can't read the same part of a file concurrently. But concurrent reading with careful offset handling seems like a lot of overhead from my point of view, and I'm not sure you will need it. Anyway, here is some info: Concurrent reading of a File (java preferred)
I came across this scenario and did not understand why it is happening. Can someone please help me understand the behaviour of the nio FileLock?
I opened a file using FileOutputStream and, after acquiring an exclusive lock using nio FileLock, I wrote some data into the file. I did not release the lock. I then opened another FileOutputStream on the same file, intending to acquire a lock and do a write operation, and expected this to fail. But opening the second FileOutputStream overwrote the already-locked file, which had data written into it, even before I tried to get the second lock. Is this expected? My understanding was that acquiring an exclusive lock would prevent any changes to the locked file. How can I prevent my file from being overwritten while trying to get another lock (e.g. if another process tries to lock the same file from a different VM)?
Sample program I tried:
File fileToWrite = new File("C:\\temp\\myfile.txt");
FileOutputStream fos1 = new FileOutputStream(fileToWrite);
FileOutputStream fos2 = null;
FileLock lock1, lock2 = null;
lock1 = fos1.getChannel().tryLock();
if (lock1 != null) {
    // wrote data to myfile.txt after acquiring lock
    fos1.write(data.getBytes());
    // opened myfile.txt again and this replaced the file
    fos2 = new FileOutputStream(fileToWrite);
    // got an OverlappingFileLockException here
    lock2 = fos2.getChannel().tryLock();
    fos2.write(newdata.getBytes());
}
lock1.release();
fos1.close();
if (lock2 != null)
    lock2.release();
fos2.close();
I also tried splitting the above into two programs. I executed the first and started the second while the first was waiting (sleeping). The file locked by program 1 got overwritten by program 2. Sample below:
Program1:
File fileToWrite = new File("C:\\temp\\myfile.txt");
FileOutputStream fos1 = new FileOutputStream(fileToWrite);
FileLock lock1 = null;
lock1 = fos1.getChannel().tryLock();
if (lock1 != null) {
    // wrote data to myfile.txt after acquiring lock
    fos1.write(data.getBytes());
    System.out.println("wrote data and waiting");
    // start the other program while sleeping
    Thread.sleep(10000);
    System.out.println("finished wait");
}
lock1.release();
fos1.close();
Program2:
File fileToWrite = new File("C:\\temp\\myfile.txt");
System.out.println("opening 2nd out stream");
// this overwrote the file
FileOutputStream fos2 = new FileOutputStream(fileToWrite);
FileLock lock2 = null;
lock2 = fos2.getChannel().tryLock();
// lock is null here
System.out.println("lock2=" + lock2);
if (lock2 != null) {
    // wrote data to myfile.txt after acquiring lock
    System.out.println("writing NEW data");
    fos2.write(newdata.getBytes());
}
if (lock2 != null)
    lock2.release();
fos2.close();
Thanks
When you acquire a FileLock, you acquire it for the entire JVM. That's why creating more FileOutputStreams and overwriting the same file within the same JVM will never be prevented by a FileLock: the JVM already owns the lock. Thus, the OverlappingFileLockException is not meant to tell you that the lock isn't available (that would be signaled by tryLock returning null); it's meant to tell you that there is a programming error, an attempt to acquire a lock you already own.
When trying to access the same file from a different JVM, you stumble across the fact that the lock doesn't necessarily prevent other processes from writing into the locked region; it only prevents them from locking that region. And since you are using the constructor that truncates existing files, the truncation may happen before your attempt to acquire the lock.
One solution is to use new FileOutputStream(fileToWrite, true) to avoid truncating the file. This works regardless of whether you open the file within the same JVM or from a different process.
However, maybe you don't want to append to the file. I guess you want to overwrite it in the case where you successfully acquired the lock. The constructors of FileOutputStream don't help you there, as they force you to choose between truncating and appending.
The solution is to abandon the old API and open the FileChannel directly (requires at least Java 7). Then you have plenty of standard open options, where truncating and appending are distinct flags. Omitting both allows overwriting without eagerly truncating the file:
try (FileChannel fch = FileChannel.open(fileToWrite.toPath(),
        StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
    try (FileLock lock = fch.tryLock()) {
        if (lock != null) {
            // you can directly write into the channel,
            // but in case you really need an OutputStream:
            OutputStream fos = Channels.newOutputStream(fch);
            fos.write(testData.getBytes());
            // you may explicitly truncate the file to the actually written content:
            fch.truncate(fch.position());
            System.out.println("waiting while holding lock...");
            LockSupport.parkNanos(TimeUnit.SECONDS.toNanos(5));
        } else {
            System.out.println("couldn't acquire lock");
        }
    }
}
Since it requires Java 7 anyway, you can use automatic resource management (try-with-resources) for the cleanup. Note that this code uses CREATE, which implies the already familiar behavior of creating the file if it doesn't exist, in contrast to CREATE_NEW, which would require that the file doesn't exist.
Due to the specified options, the open operation may create the file but will not truncate it. All subsequent operations are performed only if acquiring the lock succeeded.
File locks are only specified to work against other file locks.
From the Javadoc:
Whether or not a lock actually prevents another program from accessing the content of the locked region is system-dependent and therefore unspecified. The native file-locking facilities of some systems are merely advisory, meaning that programs must cooperatively observe a known locking protocol in order to guarantee data integrity. On other systems native file locks are mandatory, meaning that if one program locks a region of a file then other programs are actually prevented from accessing that region in a way that would violate the lock. On yet other systems, whether native file locks are advisory or mandatory is configurable on a per-file basis. To ensure consistent and correct behavior across platforms, it is strongly recommended that the locks provided by this API be used as if they were advisory locks.
I'm working with I/O and found java.io.FileInputStream.getChannel() on the internet. I want to know the exact purpose of getChannel(). Why do we need to use java.io.FileInputStream.getChannel()?
Example: http://www.tutorialspoint.com/java/io/fileinputstream_getchannel.htm
By creating the channel, the stream will be safe for access by multiple concurrent threads.
And from FileChannel class:
File channels are safe for use by multiple concurrent threads. The close method may be invoked at any time, as specified by the Channel interface. Only one operation that involves the channel's position or can change its file's size may be in progress at any given time; attempts to initiate a second such operation while the first is still in progress will block until the first operation completes. Other operations, in particular those that take an explicit position, may proceed concurrently; whether they in fact do so is dependent upon the underlying implementation and is therefore unspecified.
getChannel() simply returns a FileChannel for the original file.
FileChannel offers a way of reading, writing, mapping, and manipulating a file. It is quite a low-level utility class, and if you are new to Java I would not recommend using it; have a look at FileReader or FileWriter instead.
The getChannel() method is also important when you want to acquire a lock on a specific region of a file, or even lock the entire file.
public void blockFiles() throws FileNotFoundException {
    FileInputStream fis = new FileInputStream(new File("some.txt"));
    FileChannel fileChannel = fis.getChannel();
    try {
        // the channel of a FileInputStream is read-only, so request a *shared* lock;
        // a plain tryLock() asks for an exclusive lock and would throw NonWritableChannelException
        FileLock fileLock = fileChannel.tryLock(0L, Long.MAX_VALUE, true);
        if (fileLock != null) {
            System.out.println("File is locked; other cooperating processes can't lock it exclusively");
            fileLock.release();
        }
        fileChannel.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
It is useful when you have multiple processes working on the same file and you want to prevent concurrency problems.
I'm making a DAW in Java. Actually it's more basic than that: I modelled it after an old Tascam 4-track recorder I once owned. I'm trying to monitor audio while recording, with as little latency (delay) between the two as possible. If I write the audio bytes in the same thread I'm reading them in, there's a significant amount of latency (if you want to see the code I have I'll post it, but it seemed irrelevant since I think it needs to be rewritten). What I had been thinking about doing is using producer and consumer threads with a queue to store chunks of bytes in between. My producer thread would read bytes from a TargetDataLine and store them in a queue, probably using a method that returns the number of bytes read so I can check for EOF in my while loop. A concurrent consumer thread would take the chunks of bytes stored in the queue (when there are bytes to be written) and write them to a SourceDataLine. My thought is that two threads running simultaneously will be able to write the bytes almost at the same time they're read, or at least do better than what I have now, but I want to know how other people have solved this problem.
I would also need to make sure my consumer thread waits if there are no bytes in the queue and is notified when bytes are added, so it starts writing again. If someone could post an example of the proper way to synchronize the two threads I would appreciate it. I know they have to be in synchronized code blocks; should I use multiple locks? I'm not asking for an example specific to audio, just a general example that adds something to a collection and then removes it. Any help is appreciated. Thanks.
in "classic" java you can (and probbaly should) use a single lock object for producer-consumer implementations. something like
public final static Object LOCK = new Object();
then in your produce() method you'll have code like this:
synchronized(LOCK) {
//place stuff in queue
LOCK.notifyAll(); //wake up any sleepers
}
and in your consume() method you'll have the other side:
synchronized(LOCK) {
    //wait while the queue is empty. the while loop is important - we might wake up
    //spuriously, or another consumer might grab everything, leaving us with nothing
    while (nothing in queue) {
        try {
            LOCK.wait();
        } catch (InterruptedException ex) {
            //most code just retries here; a well-behaved version would restore the interrupt flag
        }
    }
    return something from queue;
}
but this is old-school. more modern versions of java have classes that neatly wrap all of this low-level voodoo for you, for example ArrayBlockingQueue. you could just define a "global" static queue and then use put() (or offer()) and take() for your produce() and consume() implementations respectively.
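a minimal sketch of that approach (class and method names are just illustrative): put() blocks when the queue is full and take() blocks when it's empty, so all the wait/notify plumbing disappears.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: the recording thread produces chunks, the playback thread consumes them.
class AudioPipe {
    private static final BlockingQueue<byte[]> QUEUE = new ArrayBlockingQueue<>(64);

    // called by the thread reading from the TargetDataLine
    static void produce(byte[] chunk) throws InterruptedException {
        QUEUE.put(chunk);    // blocks if the queue is full
    }

    // called by the thread writing to the SourceDataLine
    static byte[] consume() throws InterruptedException {
        return QUEUE.take(); // blocks until a chunk is available
    }
}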
but if you're really concerned with latency i'd go the extra mile and use a library written exactly for low-latency inter-thread communication. a good example of such a library is the disruptor, which claims much better latencies than ArrayBlockingQueue.
Is there a way to have one thread in java make a read call to some FileInputStream or similar and have a second thread processing the bytes being loaded at the same time?
I've tried a number of things - my current attempt has one thread running this:
FileChannel inStream;
try {
    inStream = (new FileInputStream(inFile)).getChannel();
} catch (FileNotFoundException e) {
    e.printStackTrace();
}
int result;
try {
    result = inStream.read(inBuffer);
} ...
And a second thread wanting to access the bytes as they are being loaded. Clearly the read call in the first thread blocks until the buffer is full, but I want to be able to access the bytes loaded into the buffer before that point. Currently, everything I try leaves the buffer and its backing array unchanged until the read completes. This not only defeats the point of the threading but also suggests the data is being loaded into some intermediate buffer somewhere and then copied into my buffer later, which seems daft.
One option would be to do a bunch of smaller reads into the array with offsets on subsequent reads, but that adds extra overhead.
Any ideas?
When you read data sequentially, the OS reads ahead, fetching the data before you need it. As the system is already doing this for you, you may not get the benefit you might expect.
why can't I just make my Filechannel or FileInputStream "flow" into my ByteBuffer or some byte array?
That is sort of what it does already.
If you want a more seamless loading of the data, you can use a memory-mapped file, as it "appears" in the memory of the program immediately and is loaded in the background as you use it.
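A minimal sketch of the memory-mapped approach (assuming inFile from the question; the rest of the names are illustrative):
import java.io.File;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.StandardOpenOption;

class MappedRead {
    static void process(File inFile) throws IOException {
        try (FileChannel ch = FileChannel.open(inFile.toPath(), StandardOpenOption.READ)) {
            // the mapping is available immediately; the OS pages the data in as it is touched
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            while (map.hasRemaining()) {
                byte b = map.get();
                // ... hand b (or a bulk slice) to the processing code ...
            }
        }
    }
}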
What I usually do with requirements like this is use multiple buffer class instances, preferably sized to allow efficient loading, say a multiple of the cluster size. As soon as the first buffer gets loaded up, queue it off (i.e. push its pointer/instance onto a producer-consumer queue) to the thread that will process it, and immediately create (or depool) another buffer instance and start loading that one. To control overall data flow, you can create a suitable number of buffer objects at startup and store them in a 'pool queue' (another producer-consumer queue); you can then circulate the objects full of data from the pool to the file-read thread, then to the buffer-processing thread, and then back to the pool.
This keeps the file->processing queue 'topped up' with buffer-objects full of data, no bulk copying required, no unavoidable delays, no inefficient inter-thread comms of single bytes, no messy locking of buffer-indexes, no chance that the file-read thread and data-processing thread can ever operate on the same buffer object.
If you want/need to use a threadPool to perform the processing, you can easily do so but you may need a sequence-number in the buffer objects if you need any resulting output from this subsystem to be in the same order as it was read from the file.
The buffer objects may also contain result data members, exception/errorMessage fields, anything that you might want. The file and/or result data could easily be forwarded on to other threads from the data-processing thread (e.g. a logger or a GUI progress display) before being repooled. Since it's all just pointer/instance queueing, the huge amount of data will flow around your system quickly and efficiently.
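A minimal sketch of that pooled-buffer pipeline, with made-up sizes and names; the 'empty' queue is the pool and the 'filled' queue feeds the processing thread:
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of the two-queue buffer pool described above.
class BufferPool {
    static final int POOL_SIZE = 8;
    static final int BUF_SIZE = 64 * 1024;

    final BlockingQueue<byte[]> empty = new ArrayBlockingQueue<>(POOL_SIZE);   // the pool
    final BlockingQueue<byte[]> filled = new ArrayBlockingQueue<>(POOL_SIZE);  // file -> processing

    BufferPool() throws InterruptedException {
        for (int i = 0; i < POOL_SIZE; i++) {
            empty.put(new byte[BUF_SIZE]);
        }
    }

    // file-read thread:  byte[] buf = empty.take();  fill buf from the file;  filled.put(buf);
    // processing thread: byte[] buf = filled.take(); process buf;             empty.put(buf);
}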
I would recommend using a SynchronousQueue. The reader will retrieve data from the queue and the writer will "publish" the data from your file.
Use a PipedInputStream/PipedOutputStream pair to create a familiar-looking pipe with a buffer.
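A minimal sketch of the piped-stream idea (all names here are illustrative): the file-reading thread writes into the pipe and the processing thread reads from it as soon as bytes arrive.
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

class PipeExample {
    public static void main(String[] args) throws IOException {
        PipedOutputStream pipeOut = new PipedOutputStream();
        PipedInputStream pipeIn = new PipedInputStream(pipeOut, 64 * 1024); // 64 KB pipe buffer

        new Thread(() -> {
            try {
                pipeOut.write(new byte[] {1, 2, 3}); // stand-in for bytes read from the file
                pipeOut.close();                     // closing signals end-of-stream to the reader
            } catch (IOException ignored) {
            }
        }).start();

        int b;
        while ((b = pipeIn.read()) != -1) { // blocks until data arrives or the pipe is closed
            System.out.println(b);
        }
    }
}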
Also, you could use a FileInputStream to read the file byte by byte if necessary. For a regular file, fis.read() returns -1 at the end of the data rather than blocking, and you can always check available().