I am creating a program which needs to read from a file that is still being written.
The main question is this: if the reading and writing are performed using InputStream and OutputStream classes running on separate threads, what are the catches and edge cases I need to be aware of in order to prevent data corruption?
In case anyone is wondering whether I have considered other, non-InputStream-based approaches: yes, I have, but unfortunately that is not possible in this project, since the program uses libraries that only work with InputStream and OutputStream.
Also, several readers have asked why this complication is necessary. Why not perform the reading after the file has been written completely?
The reason is efficiency. The program will perform the following:
Download a series of byte chunks of 1.5 MB each. The program will receive thousands of such chunks, totaling up to 30 GB. Chunks are downloaded concurrently in order to maximize bandwidth, so they may arrive out of order.
Send each chunk for processing as soon as it arrives. Note that chunks are sent for processing in order: if chunk m arrives before chunk m-1 does, it is buffered on disk until chunk m-1 arrives and has been sent for processing.
Process the chunks in order, from chunk 0 up to chunk n, until every chunk has been processed.
Send the processed result back.
If we were to wait for the whole file to be transferred, it would introduce a huge delay into what is supposed to be a real-time system.
Use a RandomAccessFile. Via getChannel() (or similar) you can use a ByteBuffer.
You will not be able to "insert" or "delete" middle parts of the file. For that purpose your original approach would be fine, but using two files.
For concurrency: to keep things in sync you could maintain a single object model of the file and make your changes there. Only the pending changes need to be kept in memory; other hierarchical data can be reread and reparsed as needed.
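A minimal sketch of the reading side of that idea (the helper name and the retry contract are my own illustration, not from the original answer): the reader uses a positional channel read, so it never disturbs the writer's file position.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Illustrative helper: read whatever the writer has produced so far,
// starting at readPosition; returns 0 if nothing new is available yet.
static int readAvailable(RandomAccessFile raf, long readPosition, ByteBuffer buffer)
        throws IOException {
    FileChannel channel = raf.getChannel();
    if (readPosition >= channel.size()) {
        return 0; // the writer hasn't reached this offset yet; retry later
    }
    return channel.read(buffer, readPosition); // positional read, no shared cursor
}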
So your problem (as you've now clarified it) is that you can't start processing until chunk #1 has arrived, and you need to buffer every chunk #N (N > 1) until you can process it.
I would write each chunk to its own file and create a custom InputStream that reads every chunk in order. While downloading, the chunk file would be named something like chunk.1.downloading, and when the whole chunk is loaded it would be renamed to chunk.1.
The custom InputStream checks whether the file chunk.N exists (where N = 1...X). If not, it blocks. Each time a chunk has been downloaded completely, the InputStream is notified and checks whether the downloaded chunk is the next one to be processed. If yes, it reads as normal; otherwise it blocks again.
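A minimal sketch of such a stream (the class name, chunk numbering starting at 0, and the wait/notify mechanism are my own illustration):

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustrative sketch: reads chunk.0, chunk.1, ... in order, blocking until
// each fully downloaded chunk file has been renamed into place.
class ChunkInputStream extends InputStream {
    private final File dir;
    private final int chunkCount;
    private int next = 0;
    private InputStream current;

    ChunkInputStream(File dir, int chunkCount) {
        this.dir = dir;
        this.chunkCount = chunkCount;
    }

    // Called by the downloader whenever a chunk finishes; wakes a blocked reader.
    synchronized void chunkArrived() {
        notifyAll();
    }

    @Override
    public synchronized int read() throws IOException {
        while (true) {
            if (current != null) {
                int b = current.read();
                if (b >= 0) return b;
                current.close(); // finished this chunk, move on to the next one
                current = null;
                next++;
            }
            if (next >= chunkCount) return -1; // all chunks consumed
            File chunk = new File(dir, "chunk." + next);
            if (chunk.exists()) {
                current = new FileInputStream(chunk);
            } else {
                try {
                    wait(); // block until chunkArrived() signals a new chunk
                } catch (InterruptedException e) {
                    throw new IOException("Interrupted while waiting for chunk " + next, e);
                }
            }
        }
    }
}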
You should use PipedInputStream and PipedOutputStream:
static Thread newCopyThread(InputStream is, OutputStream os) {
    Thread t = new Thread() {
        @Override
        public void run() {
            byte[] buffer = new byte[2048];
            try {
                while (true) {
                    int size = is.read(buffer);
                    if (size < 0) break; // end of stream reached
                    os.write(buffer, 0, size);
                }
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                // close in finally so the streams are released even on error
                try { is.close(); } catch (IOException ignored) {}
                try { os.close(); } catch (IOException ignored) {}
            }
        }
    };
    return t;
}
public static void main(String[] args) throws IOException, InterruptedException {
    ByteArrayInputStream bi = new ByteArrayInputStream("abcdefg".getBytes());
    PipedInputStream is = new PipedInputStream();
    PipedOutputStream os = new PipedOutputStream(is);
    Thread p = newCopyThread(bi, os);         // producer: source -> pipe
    Thread c = newCopyThread(is, System.out); // consumer: pipe -> stdout
    p.start();
    c.start();
    p.join();
    c.join();
}
A logic that handles the rollback of a write to a file: is this possible?
From my understanding, a BufferedWriter only writes when .close() or .flush() is invoked.
I would like to know: is it possible to roll back a write, or undo changes to a file, when an error has occurred?
This means that the BufferedWriter would act as temporary storage for the changes made to a file.
How big is what you're writing? If it isn't too big, you could write to a ByteArrayOutputStream, so that you're writing in memory and not affecting the final file you want to write to. Only once you've written everything to memory, and have done whatever verification you want, do you write to the output file. You can pretty much be guaranteed that if the file gets written to at all, it will be written in its entirety (unless you run out of disk space). Here's an example:
import java.io.*;

class Solution {
    public static void main(String[] args) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try {
            // Do whatever writing you want to do here. If this fails, you were only
            // writing to memory and so haven't affected the disk in any way.
            os.write("abcdefg\n".getBytes());

            // Possibly check here to make sure everything went OK

            // All is well, so write the output file. This should never fail unless
            // you're out of disk space or you don't have permission to write to the
            // specified location.
            try (OutputStream os2 = new FileOutputStream("/tmp/blah")) {
                os2.write(os.toByteArray());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
If you have to (or just want to) use Writers instead of OutputStreams, here's the equivalent example:
Writer writer = new StringWriter();
try {
    // again, this represents the writing process that you worry might fail...
    writer.write("abcdefg\n");
    try (Writer os2 = new FileWriter("/tmp/blah2")) {
        os2.write(writer.toString());
    }
} catch (IOException e) {
    e.printStackTrace();
}
It is impossible to roll back or undo changes already applied to files/streams, but there are alternatives that achieve a similar effect:
One simple trick is to clean the destination and redo the process. To clean the file:
PrintWriter writer = new PrintWriter(FILE_PATH);
writer.print("");
// other operations
writer.close();
This removes the content entirely, and you can re-run the process.
Or, if you are sure that only the last line(s) are the problem, you can remove just those lines instead of the whole file; see:
Delete last line in text file
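For illustration, a rollback of just the last line could look like this (a sketch using RandomAccessFile; the method name is mine, and it assumes '\n' line endings):

import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch: remove the last line by truncating the file at the last line break.
static void deleteLastLine(String path) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(path, "rw")) {
        long length = raf.length();
        if (length == 0) return;              // nothing to roll back
        long pos = length - 1;
        raf.seek(pos);
        if (raf.read() == '\n') pos--;        // skip a trailing newline, if any
        while (pos > 0) {
            raf.seek(pos);
            if (raf.read() == '\n') break;    // end of the second-to-last line
            pos--;
        }
        raf.setLength(pos > 0 ? pos + 1 : 0); // keep everything up to that break
    }
}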
I'm trying to write two programs, one that writes to a text file and one that reads from it. I tried using java.io but ran into concurrency problems. However, when I switched to java.nio, I ran into even bigger problems, probably not related to concurrency (since I lock the file in both programs when trying to read/write) but to the actual way of reading from or writing to a file.
Writer program code (the part that is relevant):
Path filePath = Paths.get("map.txt");
FileChannel fileChannel;
ByteBuffer buffer;
StringBuilder existingObjects = new StringBuilder();

while (true) {
    for (FlyingObject fo : airbornUnitsList) {
        existingObjects.append(fo.toString() + System.lineSeparator());
    }
    if (existingObjects.length() > System.lineSeparator().length())
        existingObjects.setLength(existingObjects.length() - System.lineSeparator().length());
    buffer = ByteBuffer.wrap(existingObjects.toString().getBytes());
    fileChannel = FileChannel.open(filePath, StandardOpenOption.READ, StandardOpenOption.WRITE);
    fileChannel.lock();
    fileChannel.write(buffer);
    fileChannel.close();
    existingObjects.delete(0, existingObjects.length());
    sleep(100);
}
FlyingObject is a simple class with some fields and an overridden toString() method and airbornUnitsList is a list of those objects, so I'm basically iterating through the list, appending the FlyingObject objects to StringBuilder object, removing the last "new line" from StringBuilder, putting it into the buffer and writing to the file. As you can see, I have locked the file prior to writing to the file and then unlocked it afterwards.
Reader program code (the part that is relevant):
Path filePath = Paths.get("map.txt");
FileChannel fileChannel;
ByteBuffer buffer;
StringBuilder readObjects = new StringBuilder();

while (true) {
    fileChannel = FileChannel.open(filePath, StandardOpenOption.READ, StandardOpenOption.WRITE);
    fileChannel.lock();
    buffer = ByteBuffer.allocate(100);
    numOfBytesRead = fileChannel.read(buffer);
    while (numOfBytesRead != -1) {
        buffer.flip();
        readObjects.append(new String(buffer.array()));
        buffer.clear();
        numOfBytesRead = fileChannel.read(buffer);
    }
    fileChannel.close();
    System.out.println(readObjects);
}
Even when I manually write a few lines in the file and then run the Reader program, it doesn't read it correctly. What could be the issue here?
EDIT: After playing with the buffer size a bit, I realized that the file is read incorrectly because the buffer is smaller than the content of the file. Could this be related to the file encoding?
I found out what the problem was.
Firstly, in the writer program, I needed to add fileChannel.truncate(0); after opening the file channel. That way, the old content of the file is deleted and the new content is written from the beginning. Without that line, I would just overwrite the old content with the new, and if the new content were shorter than the old, the old content would remain at the positions not covered by the new content. I would only be able to skip the truncate call if I were sure that the new content is at least as long as the old content and would overwrite it completely, which wasn't the case for me.
Secondly, regarding the reader, the reason it wasn't reading the whole file correctly is that the loop would end before the last part of the file content was appended to the StringBuilder. After I modified the code and changed the order of operations a bit, like this:
numOfBytesRead = 0;
while (numOfBytesRead != -1) {
    numOfBytesRead = fileChannel.read(buffer);
    if (numOfBytesRead > 0) {
        buffer.flip();
        // decode only the bytes actually read, not the whole backing array
        readObjects.append(new String(buffer.array(), 0, numOfBytesRead));
        buffer.clear();
    }
}
it worked without problems.
The following code creates a file, but the file cannot be opened, and its size does not remotely correspond to the size of the file I am trying to download (using the WhatsApp updater link as an example):
private static boolean download() {
    try {
        String outfile = "/sdcard/whatsapp.apk";
        URL download = new URL("https://www.whatsapp.com/android/current/WhatsApp.apk");
        ReadableByteChannel rbc = Channels.newChannel(download.openStream());
        FileOutputStream fileOut = new FileOutputStream(outfile);
        fileOut.getChannel().transferFrom(rbc, 0, 1 << 24);
        fileOut.close();
        rbc.close();
        return true;
    } catch (IOException ioe) {
        return false;
    }
}
EDIT: this is a shortened version of my full code (the full code allows network ops on the main thread and trusts all certificates); I have also changed the code in the question accordingly.
Tests show that no IOException is being thrown and the code completes without error. So why is the downloaded file not usable?
From the Javadoc:
Fewer than the requested number of bytes will be transferred if the source channel has fewer than count bytes remaining, or if the source channel is non-blocking and has fewer than count bytes immediately available in its input buffer.
This means that it is not guaranteed that this will save the entire file at once. You should put this call in a loop which breaks once the entire download has completed.
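A minimal sketch of that loop, reusing the question's URL, output path, and 1 << 24 chunk size:

import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URL;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;

private static boolean download() {
    try {
        URL download = new URL("https://www.whatsapp.com/android/current/WhatsApp.apk");
        try (ReadableByteChannel rbc = Channels.newChannel(download.openStream());
             FileOutputStream fileOut = new FileOutputStream("/sdcard/whatsapp.apk")) {
            FileChannel out = fileOut.getChannel();
            long position = 0;
            long transferred;
            // transferFrom returns 0 once the source stream is exhausted, so
            // keep advancing the write position until nothing more arrives
            while ((transferred = out.transferFrom(rbc, position, 1 << 24)) > 0) {
                position += transferred;
            }
        }
        return true;
    } catch (IOException ioe) {
        return false;
    }
}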
I'm writing a Java REST service to support parallel upload of parts of a large file. I write these parts to separate files and merge them using a file channel. I have a sample implemented in Golang that does the same thing, but when it merges the parts it takes no time at all. When I use a file channel, or read from one stream and write to the final file, it takes a long time. The difference, I think, is that Golang is able to keep the data on the disk as it is and merge the parts without actually moving the data. Is there any way I can do the same in Java?
Here is my code that merges the parts; I loop over this method for all parts:
private void mergeFileUsingChannel(String destinationPath, String sourcePath, long partSize, long offset) throws Exception {
    FileChannel outputChannel = null;
    FileChannel inputChannel = null;
    try {
        outputChannel = new FileOutputStream(new File(destinationPath)).getChannel();
        outputChannel.position(offset);
        inputChannel = new FileInputStream(new File(sourcePath)).getChannel();
        inputChannel.transferTo(0, partSize, outputChannel);
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (inputChannel != null)
            inputChannel.close();
        if (outputChannel != null) {
            outputChannel.close();
        }
    }
}
The documentation of FileChannel.transferTo states:
"Many operating systems can transfer bytes directly from the filesystem cache to the target channel without actually copying them."
So the code you have written is correct, and the inefficiency you are seeing is probably related to the underlying file-system type.
One small optimization I could suggest would be to open the file in append mode.
"Whether the advancement of the position and the writing of the data are done in a single atomic operation is system-dependent"
Beyond that, you may have to think of a way to work around the problem, for example by creating a large enough contiguous file as a first step.
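For illustration, the destination could be preallocated once before any parts arrive (a sketch; the method name is mine, and the total size would have to come from the upload metadata):

import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch: create the destination at its final size up front, so each part
// can later be transferred into its own region without extending the file.
static void preallocate(String destinationPath, long totalSize) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(destinationPath, "rw")) {
        raf.setLength(totalSize);
    }
}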
EDIT: I also noticed that you are not explicitly closing your FileOutputStream. It would be best to hold on to it and close it, so that all the file descriptors are released.
If you have ever used P2P downloading software, you know it can download a file with multiple threads, yet it creates only one file. So I wonder how the threads write data into that file: sequentially, or in parallel?
Imagine that you want to dump a big database table to a file: how would you make this job faster?
You can use multiple threads to write to a file, e.g. a log file, but you have to coordinate your threads, as @Thilo points out. Either you need to synchronize file access and only write whole records/lines, or you need a strategy for allocating regions of the file to different threads, e.g. re-building a file with known offsets and sizes.
This is rarely done for performance reasons, since most disk subsystems perform best when written to sequentially and disk IO is the bottleneck. If the CPU time to create the record or line of text (or the network IO) is the bottleneck, it can help.
Imagine that you want to dump a big database table to a file: how would you make this job faster?
Writing it sequentially is likely to be the fastest.
The Java NIO package was designed to allow this. Take a look, for example, at http://docs.oracle.com/javase/1.5.0/docs/api/java/nio/channels/FileChannel.html.
You can map several regions of one file to different buffers, and each buffer can be filled separately by a separate thread.
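A minimal sketch of that approach (the file name, region size, thread count, and fill pattern are illustrative):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedRegions {
    public static void main(String[] args) throws Exception {
        final int regionSize = 1 << 20; // 1 MB per thread, for illustration
        final int threads = 4;
        try (RandomAccessFile raf = new RandomAccessFile("out.dat", "rw");
             FileChannel channel = raf.getChannel()) {
            raf.setLength((long) regionSize * threads);
            Thread[] workers = new Thread[threads];
            for (int i = 0; i < threads; i++) {
                // each thread gets its own mapped region, so there is no shared position
                MappedByteBuffer region = channel.map(
                        FileChannel.MapMode.READ_WRITE, (long) i * regionSize, regionSize);
                final byte fill = (byte) ('A' + i);
                workers[i] = new Thread(() -> {
                    while (region.hasRemaining()) {
                        region.put(fill); // fill this thread's region independently
                    }
                });
                workers[i].start();
            }
            for (Thread w : workers) {
                w.join();
            }
        }
    }
}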
The synchronized declaration enables doing this. Try the code below, which I use in a similar context.
package hrblib;

import java.io.*;

public class FileOp {

    static int nStatsCount = 0;

    static public String getContents(String sFileName) {
        try {
            BufferedReader oReader = new BufferedReader(new FileReader(sFileName));
            StringBuilder sContent = new StringBuilder();
            String sLine;
            boolean bFirst = true;
            while ((sLine = oReader.readLine()) != null) {
                if (!bFirst) {
                    sContent.append("\r\n");
                }
                sContent.append(sLine);
                bFirst = false;
            }
            oReader.close();
            return sContent.toString();
        }
        catch (IOException oException) {
            throw new IllegalArgumentException("Invalid file path/File cannot be read: \n" + sFileName);
        }
    }

    static public void setContents(String sFileName, String sContent) {
        try {
            File oFile = new File(sFileName);
            if (!oFile.exists()) {
                oFile.createNewFile();
            }
            if (oFile.canWrite()) {
                BufferedWriter oWriter = new BufferedWriter(new FileWriter(sFileName));
                oWriter.write(sContent);
                oWriter.close();
            }
        }
        catch (IOException oException) {
            throw new IllegalArgumentException("Invalid folder path/File cannot be written: \n" + sFileName);
        }
    }

    public static synchronized void appendContents(String sFileName, String sContent) {
        try {
            File oFile = new File(sFileName);
            if (!oFile.exists()) {
                oFile.createNewFile();
            }
            if (oFile.canWrite()) {
                BufferedWriter oWriter = new BufferedWriter(new FileWriter(sFileName, true));
                oWriter.write(sContent);
                oWriter.close();
            }
        }
        catch (IOException oException) {
            throw new IllegalArgumentException("Error appending/File cannot be written: \n" + sFileName);
        }
    }
}
You can have multiple threads write to the same file, but only one at a time. All threads need to enter a synchronized block before writing to the file.
In the P2P example, one way to implement it is to find the size of the file and create an empty file of that size. Each thread downloads a different section of the file; when a thread needs to write, it enters a synchronized block, moves the file pointer using seek, and writes the contents of its buffer.
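A minimal sketch of that scheme (the class and method names are mine):

import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch: downloader threads write finished sections into a preallocated file;
// the synchronized method ensures only one thread moves the pointer at a time.
class SectionWriter {
    private final RandomAccessFile file;

    SectionWriter(String path, long totalSize) throws IOException {
        file = new RandomAccessFile(path, "rw");
        file.setLength(totalSize); // empty file at its final size
    }

    synchronized void writeSection(long offset, byte[] data, int length) throws IOException {
        file.seek(offset);         // move the file pointer to this section
        file.write(data, 0, length);
    }
}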
What kind of file is this? Why does it need to be fed by multiple threads? It depends on the characteristics (I don't know a better word for it) of the file's usage.
Transferring a file from several places over the network (in short: torrent-like)
If you are transferring an existing file, the program should:
as soon as it knows the size of the file, create it with empty content; this prevents a later out-of-disk error (if there isn't enough space, that shows up at creation time, before anything has been downloaded) and also helps performance;
if you organize the transfer well (and why not), each thread will be responsible for a distinct portion of the file, so the file writes will be distinct;
even if two threads somehow pick the same portion of the file, it causes no error, because they write the same data to the same file positions.
Appending data blocks to a file (in short: logging)
If the threads just append fixed- or variable-length records to a file, you should use a common writer thread. It should use a relatively large write buffer so that it can serve the client threads quickly (just taking their strings) and flush with optimal scheduling and block sizes. It should use a dedicated disk or even a dedicated machine.
Also, there can be several performance issues, which is why there are logging servers around, even expensive commercial ones.
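A minimal sketch of such a common writer thread (the queue capacity, buffer size, and file name are illustrative):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: client threads enqueue lines; a single thread owns the file and buffer.
class LogWriter implements Runnable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(10_000);

    void log(String line) {
        queue.offer(line); // non-blocking for clients; drops the line if the queue is full
    }

    @Override
    public void run() {
        try (BufferedWriter out = new BufferedWriter(new FileWriter("app.log", true), 1 << 16)) {
            while (!Thread.currentThread().isInterrupted()) {
                out.write(queue.take());
                out.newLine();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop cleanly on interrupt
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}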
Reading and writing at random times and random positions (in short: database)
This requires a complex design, with mutexes etc. I have never done this kind of thing, but I can imagine it. Ask Oracle for some tricks :)