I would like to know the difference between the following:
FileChannel fc = FileChannel.open(Paths.get("RandomFile"), StandardOpenOption.READ, StandardOpenOption.WRITE);
RandomAccessFile ra = new RandomAccessFile("RandomFile", "rw");
Since Java 7 the FileChannel class implements SeekableByteChannel, so it has everything it needs to access the file randomly.
Can we say that the two are completely equivalent?
FileChannel has many more features, since it also implements GatheringByteChannel, InterruptibleChannel, and ScatteringByteChannel. Besides that, it can lock files, transfer data directly to other channels, and work with direct byte buffers; see the API documentation.
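For illustration, a minimal sketch of two of those extras, file locking and channel-to-channel transfer (the file names are placeholders):
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

try (FileChannel src = FileChannel.open(Paths.get("source.dat"),
             StandardOpenOption.READ);
     FileChannel dst = FileChannel.open(Paths.get("copy.dat"),
             StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
    try (FileLock lock = dst.lock()) { // exclusive lock while we write
        // let the OS move the bytes, possibly without copying
        // them through user space at all
        src.transferTo(0, src.size(), dst);
    }
}
None of this is reachable from a RandomAccessFile alone (although RandomAccessFile.getChannel() gives you a FileChannel).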
Related
I have a binary file that contains big endian data. I am using this code to read it in
FileChannel fileInputChannel = new FileInputStream(fileInput).getChannel();
ByteBuffer bb = ByteBuffer.allocateDirect((int)fileInputChannel.size());
while (bb.remaining() > 0)
fileInputChannel.read(bb);
fileInputChannel.close();
bb.flip();
I have to do something identical for zip files. In other words, decompress a file from a zip archive and read it in the right byte order. I understand I can read it in via ZipInputStream, but then I would have to handle the "endianness" myself. With ByteBuffer you can use ByteOrder.
Is there an NIO alternative for zip files ?
If you have your ZipInputStream, just use Channels.newChannel to convert it to a Channel and then proceed as you wish. Keep in mind that a ZipInputStream might not be able to predict its uncompressed size, so you may have to guess an appropriate buffer size and re-allocate a bigger buffer when needed. Also, since the underlying API uses byte arrays, there is no benefit in using direct ByteBuffers with ZipInputStream; I recommend ByteBuffer.allocate instead of ByteBuffer.allocateDirect for this use case.
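A rough sketch of that approach (the entry handling, initial buffer size, and growth policy are assumptions, not part of the answer):
import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

ZipInputStream zin = new ZipInputStream(new FileInputStream("data.zip"));
ZipEntry entry = zin.getNextEntry();             // first entry; adapt as needed
ReadableByteChannel ch = Channels.newChannel(zin);
ByteBuffer bb = ByteBuffer.allocate(64 * 1024);  // heap buffer; size is a guess
while (ch.read(bb) != -1) {
    if (!bb.hasRemaining()) {                    // buffer full: grow it
        ByteBuffer bigger = ByteBuffer.allocate(bb.capacity() * 2);
        bb.flip();
        bigger.put(bb);
        bb = bigger;
    }
}
bb.flip();       // ready for reading; ByteBuffer is big-endian by default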
By the way, you can replace while (bb.remaining() > 0) with while (bb.hasRemaining()). And since Java 7 you can use FileChannel.open to open a FileChannel without the detour via FileInputStream.
I'm loading a 2D array from file, it's 15,000,000 * 3 ints big (it will be 40,000,000 * 3 eventually). Right now, I use dataInputStream.readInt() to sequentially read the ints. It takes ~15 seconds. Can I make it significantly (at least 3x) faster or is this about as fast as I can get?
Yes, you can. From a benchmark of 13 different ways of reading files:
If you have to pick the fastest approach, it would be one of these:
FileChannel with a MappedByteBuffer and array reads.
FileChannel with a direct ByteBuffer and array reads.
FileChannel with a wrapped array ByteBuffer and direct array access.
For the best Java read performance, there are four things to remember:
Minimize I/O operations by reading an array at a time, not a byte at a time. An 8 KB array is a good size (that's why it's the default for BufferedInputStream).
Minimize method calls by getting data an array at a time, not a byte at a time. Use array indexing to get at bytes in the array.
Minimize thread synchronization locks if you don't need thread safety. Either make fewer method calls to a thread-safe class, or use a non-thread-safe class like FileChannel and MappedByteBuffer.
Minimize data copying between the JVM/OS, internal buffers, and application arrays. Use FileChannel with memory mapping, or a direct or wrapped array ByteBuffer.
A minimal sketch of the direct-ByteBuffer approach follows.
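As a sketch of the second approach above, FileChannel with a direct ByteBuffer and bulk array reads (the file name and chunk size are placeholders):
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

try (FileChannel ch = FileChannel.open(Paths.get("ints.bin"),
        StandardOpenOption.READ)) {
    ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
    byte[] chunk = new byte[64 * 1024];
    while (ch.read(buf) != -1) {
        buf.flip();
        int n = buf.remaining();
        buf.get(chunk, 0, n);   // one bulk copy out of the direct buffer
        // process chunk[0..n) here
        buf.clear();
    }
}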
Map your file into memory!
Java 7 code:
FileChannel channel = FileChannel.open(Paths.get("/path/to/file"),
    StandardOpenOption.READ);
MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY,
    0, channel.size());
// use buf
See here for more details.
If you use Java 6, you'll have to:
RandomAccessFile file = new RandomAccessFile("/path/to/file", "r");
FileChannel channel = file.getChannel();
// then call channel.map(...) as above to obtain buf
You can even call .asIntBuffer() on the buffer if you want, and you can read only what you actually need, when you need it. And it does not impact your heap.
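For instance, assuming the file holds nothing but raw big-endian ints in rows of three, as in the question, a continuation of the Java 7 snippet above might look like:
import java.nio.IntBuffer;

IntBuffer ints = buf.asIntBuffer();  // big-endian view, matching the file
int[] row = new int[3];
while (ints.hasRemaining()) {
    ints.get(row);                   // bulk-read one 3-int row
    // use row[0], row[1], row[2]
}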
Does
final OutputStream output = new FileOutputStream(file);
truncate the file if it already exists? Surprisingly, the API documentation for Java 6 does not say. Nor does the API documentation for Java 7. The specification for the language itself has nothing to say about the semantics of the FileOutputStream class.
I am aware that
final OutputStream output = new FileOutputStream(file, true);
causes appending to the file. But appending and truncating are not the only possibilities. If you write 100 bytes into a 1000 byte file, one possibility is that the final 900 bytes are left as they were.
FileOutputStream without the append option does truncate the file.
Note that FileOutputStream opens a stream, not a random access file, so I guess it makes sense that it behaves that way, although I agree that the documentation could be more explicit about it.
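A quick way to convince yourself (the file name is a placeholder):
import java.io.File;
import java.io.FileOutputStream;

File f = new File("trunc-test.bin");
try (FileOutputStream out = new FileOutputStream(f)) {
    out.write(new byte[1000]);           // file is now 1000 bytes
}
try (FileOutputStream out = new FileOutputStream(f)) { // no append flag
    out.write(new byte[100]);
}
System.out.println(f.length());          // prints 100: the old tail is gone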
I tried this on Windows 2008 x86 with Java 1.6.0_32-b05.
I created two processes that wrote continually to the same file: one wrote 1 MB of the character 'b', the other 4 MB of the character 'a'. Unless I used
out = new RandomAccessFile(which, "rw");
out.setLength(0);
out.getChannel().lock();
I found that a third reader process could read what appeared to be a file that started with 1 MB of 'b's followed by 'a's.
I also found that writing first to a temporary file and then renaming it onto the target file (via File.renameTo) worked.
I would not depend on FileOutputStream on Windows to truncate a file that may be being read by a second process.
Neither
new FileOutputStream(file)
nor
new FileOutputStream(file, false)
truncated it in my tests. Nor did this:
out = new FileOutputStream(which, false);
out.getChannel().truncate(0);
out.getChannel().force(true);
However,
out = new FileOutputStream(which, false);
out.getChannel().truncate(0);
out.getChannel().force(true);
out.getChannel().lock();
does work.
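Put together, a self-contained version of the variant that worked might look like this (the path and error handling are simplified assumptions):
import java.io.FileOutputStream;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

try (FileOutputStream out = new FileOutputStream("shared.dat", false);
     FileChannel ch = out.getChannel()) {
    ch.truncate(0);                   // drop the old contents
    ch.force(true);                   // push data and metadata to the device
    try (FileLock lock = ch.lock()) { // keep concurrent readers out
        // write the new contents here
    }
}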
FileOutputStream is meant for writing binary data, which is most often overwritten.
If you are manipulating text data, you would be better off using a FileWriter, which has convenient append support.
I want to get a Stream from some arbitrary position in an existing file; for example, I need to read/write from/to a file starting at the 101st byte.
Is it safe to use something like that?
final FileInputStream fin = new FileInputStream(f);
fin.skip(100);
The javadoc for skip says that it may sometimes skip fewer bytes than specified.
What should I do then?
You can't write using a FileInputStream; you need a RandomAccessFile if you want to write to arbitrary locations in a file. Unfortunately, there is no easy way to use a RandomAccessFile as an InputStream/OutputStream (it looks like #aix may have a good suggestion for adapting RandomAccessFile to InputStream/OutputStream), but there are various example adapters available online.
Another alternative is to use a FileChannel. You can set the position of the FileChannel directly, then use the Channels utility methods to get InputStream/OutputStream adapters on top of the channel.
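A sketch of that FileChannel route (the path is a placeholder; the position matches the question's example):
import java.io.InputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

FileChannel ch = FileChannel.open(Paths.get("thefile"),
    StandardOpenOption.READ, StandardOpenOption.WRITE);
ch.position(100);                 // next read/write starts at byte 101
InputStream in = Channels.newInputStream(ch);
// read from in; Channels.newOutputStream(ch) gives the writing counterpart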
How about the following:
final RandomAccessFile raf = new RandomAccessFile(f, mode);
raf.seek(100);
final FileInputStream fin = new FileInputStream(raf.getFD());
// read from fin
In the Sun JVM, is the FileChannel.size() method guaranteed to return the correct size of the file, including any pending updates? In other words, is the following test guaranteed to pass (assuming that nothing else is writing to the file):
RandomAccessFile randomAccessFile = new RandomAccessFile(file, "rw");
FileChannel fileChannel = randomAccessFile.getChannel();
fileChannel.write(buffer);
long size1 = fileChannel.size();
fileChannel.force(true);
long size2 = fileChannel.size();
assertEquals(size1, size2);
No strong guarantees, IMO; it would depend on the OS. But from what I can see, the sizes should be equal...
More:
force(true) on Linux is implemented via fsync(fd); on Windows it is FlushFileBuffers.
However, you do not check whether write finished writing the whole buffer (buffer.hasRemaining()); I am not sure whether either implementation (Windows, Linux) allows partial writes here.
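To rule that out, the buffer could be drained in a loop before comparing sizes, e.g.:
while (buffer.hasRemaining()) {
    fileChannel.write(buffer); // loop until the buffer is fully written
}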