Does
final OutputStream output = new FileOutputStream(file);
truncate the file if it already exists? Surprisingly, the API documentation for Java 6 does not say. Nor does the API documentation for Java 7. The specification for the language itself has nothing to say about the semantics of the FileOutputStream class.
I am aware that
final OutputStream output = new FileOutputStream(file, true);
causes appending to the file. But appending and truncating are not the only possibilities. If you write 100 bytes into a 1000 byte file, one possibility is that the final 900 bytes are left as they were.
FileOutputStream without the append option does truncate the file.
Note that FileOutputStream opens a stream, not a random access file, so I guess it makes sense that it behaves that way, although I agree that the documentation could be more explicit about it.
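If you want to check the truncating behaviour on your own platform, here is a minimal sketch. The class and method names (TruncateDemo, lengthAfterRewrite) are made up for illustration; it writes 1000 bytes, reopens the file without the append flag, writes 100 bytes, and reports the resulting length.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class TruncateDemo {
    // Writes `first` bytes, reopens the file without the append flag,
    // writes `second` bytes, and returns the resulting file length.
    static long lengthAfterRewrite(int first, int second) throws IOException {
        File file = File.createTempFile("truncate-demo", ".bin");
        try {
            try (OutputStream out = new FileOutputStream(file)) {
                out.write(new byte[first]);
            }
            try (OutputStream out = new FileOutputStream(file)) {
                out.write(new byte[second]);
            }
            return file.length();
        } finally {
            file.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        // A result equal to the second size means the old contents were discarded.
        System.out.println(lengthAfterRewrite(1000, 100));
    }
}
```

This only covers the single-process case; it says nothing about what a concurrent reader in another process sees.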
I tried this on Windows 2008 x86 with Java 1.6.0_32-b05.
I created two processes that wrote continually to the same file, one writing 1 MB of the character 'b' and the other 4 MB of the character 'a'. Unless I used
out = new RandomAccessFile(which, "rw");
out.setLength(0);
out.getChannel().lock();
I found that a third reader process could read what appeared to be a file that started with 1 MB of 'b's followed by 'a's.
I found that writing first to a temporary file and then renaming it to the target file with File.renameTo also worked.
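A rough sketch of the write-to-a-temporary-file-then-rename approach follows. Note that it swaps in Files.move (Java 7+) with REPLACE_EXISTING instead of File.renameTo, because renameTo only reports failure through its boolean return value and its behaviour when the target exists is platform-dependent. The names ReplaceViaRename and replaceContents are hypothetical, and I have not verified this against the multi-process Windows scenario described above.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

class ReplaceViaRename {
    // Writes the new contents to a temporary file in the same directory,
    // then moves it over the target so readers never see a half-written file.
    static void replaceContents(File target, byte[] data) throws IOException {
        File dir = target.getAbsoluteFile().getParentFile();
        File temp = File.createTempFile("replace", ".tmp", dir);
        try {
            try (OutputStream out = new FileOutputStream(temp)) {
                out.write(data);
            }
            Files.move(temp.toPath(), target.toPath(),
                    StandardCopyOption.REPLACE_EXISTING);
        } finally {
            temp.delete(); // does nothing if the move already succeeded
        }
    }
}
```

Creating the temporary file in the same directory as the target keeps both on the same file system, which is what makes a plain rename possible.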
I would not depend on FileOutputStream on Windows to truncate a file that may be being read by a second process:
Not new FileOutputStream(file)
Nor new FileOutputStream(file, false) (does not truncate)
Nor this:
out = new FileOutputStream(which, false);
out.getChannel().truncate(0);
out.getChannel().force(true);
However
out = new FileOutputStream(which, false);
out.getChannel().truncate(0);
out.getChannel().force(true);
out.getChannel().lock();
does work
FileOutputStream is meant to write binary data, which is most often overwritten.
If you are manipulating text data, it is better to use a FileWriter, which has convenient append methods.
Related
I have read that the BufferedOutputStream class improves efficiency and should be used with FileOutputStream in this way:
BufferedOutputStream bout = new BufferedOutputStream(new FileOutputStream("myfile.txt"));
and that the statement below also works for writing to the same file:
FileOutputStream fout = new FileOutputStream("myfile.txt");
The recommended way is to use a buffer for read and write operations, and that is the only reason I prefer buffering too.
But my question is how to measure the performance of the two statements above. Is there any tool that would be useful for analysing their performance?
As I am new to the Java language, I am very curious to know about it.
Buffering is only helpful if you are doing inefficient reading or writing. For reading, it lets you read line by line at a reasonable cost, even though you could gobble up bytes or chars faster by calling read(byte[]) or read(char[]) directly. For writing, it lets you accumulate pieces of what you want to send in the buffer and send them only on flush (see the autoFlush constructor argument of PrintWriter and PrintStream).
But if you are already reading or writing in large chunks as fast as you can, buffering doesn't improve performance.
For an example of efficient reading from a file:
File f = ...;
FileInputStream in = new FileInputStream(f);
byte[] bytes = new byte[(int) f.length()]; // f.length() must fit in an int, so the file needs to be smaller than 2 GB :)
in.read(bytes); // filling the whole array in one call isn't guaranteed by the API (check the return value), but I've found it works in every situation I've tried
Versus inefficient reading:
File f = ...;
BufferedReader in = new BufferedReader(new FileReader(f)); // BufferedReader wraps a Reader, not a File
String line = null;
while ((line = in.readLine()) != null) {
    // If every readLine call was reading directly from the FS / hard drive,
    // it would slow things down tremendously. That's why having a buffer
    // capture the file contents and effectively reading from the buffer is
    // more efficient.
}
These numbers came from a MacBook Pro laptop using an SSD.
BufferedFileStreamArrayBatchRead (809716.60-911577.03 bytes/ms)
BufferedFileStreamPerByte (136072.94 bytes/ms)
FileInputStreamArrayBatchRead (121817.52-1022494.89 bytes/ms)
FileInputStreamByteBufferRead (118287.20-1094091.90 bytes/ms)
FileInputStreamDirectByteBufferRead (130701.87-956937.80 bytes/ms)
FileInputStreamReadPerByte (1155.47 bytes/ms)
RandomAccessFileArrayBatchRead (120670.93-786782.06 bytes/ms)
RandomAccessFileReadPerByte (1171.73 bytes/ms)
Where there is a range in the numbers, it varies based on the size of the buffer being used. A larger buffer results in more speed up to a point, typically somewhere around the size of the caches within the hardware and operating system.
As you can see, reading bytes individually is always slow. Batching the reads into chunks is easily the way to go. It can be the difference between 1k per ms and 136k per ms (or more).
These numbers are a little old, and they will vary wildly by setup but they will give you an idea. The code for generating the numbers can be found here, edit Main.java to select the tests that you want to run.
An excellent (and more rigorous) framework for writing benchmarks is JMH. A tutorial for learning how to use JMH can be found here.
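For a quick first impression before setting up JMH, a crude System.nanoTime comparison can be sketched like this. The class and method names (ReadTiming, countPerByte, elapsedNanos) are invented for the example, and a single un-warmed run like this is exactly the kind of measurement JMH exists to improve on, so treat the printed numbers as rough indications only.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class ReadTiming {
    // Reads the stream one byte at a time and returns how many bytes were seen.
    static long countPerByte(InputStream in) throws IOException {
        long count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    // Times one full per-byte pass over the file, with or without buffering.
    static long elapsedNanos(File file, boolean buffered) throws IOException {
        long start = System.nanoTime();
        try (InputStream raw = new FileInputStream(file);
             InputStream in = buffered ? new BufferedInputStream(raw) : raw) {
            countPerByte(in);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("read-timing", ".bin");
        try {
            try (OutputStream out = new FileOutputStream(file)) {
                out.write(new byte[1 << 20]); // 1 MiB of zeros as test data
            }
            System.out.println("unbuffered ns: " + elapsedNanos(file, false));
            System.out.println("buffered ns:   " + elapsedNanos(file, true));
        } finally {
            file.delete();
        }
    }
}
```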
How do I read the last n bytes from a file without using RandomAccessFile?
The last 6 bytes in my files contain crucial information when writing the files back. I need to write my original files, and then append the last 6 bytes elsewhere.
Any guidance? Thanks
You have to do it by using RandomAccessFile. Instances of this class support both reading and writing to a random access file. A random access file behaves like a large array of bytes stored in the file system.
RandomAccessFile randomAccessFile = new RandomAccessFile(your_file, "r");
randomAccessFile.seek(your_file.length() - n);
randomAccessFile.read(byteArray, 0, n);
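A slightly fuller sketch of the same idea, assuming a hypothetical helper named TailReader.lastBytes. It uses readFully rather than read, since a plain read is allowed to return fewer bytes than requested, and it copes with files shorter than n.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class TailReader {
    // Returns the last n bytes of the file, or the whole file if it is shorter.
    static byte[] lastBytes(File file, int n) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long length = raf.length();
            int count = (int) Math.min(n, length);
            byte[] tail = new byte[count];
            raf.seek(length - count);
            raf.readFully(tail); // blocks until the array is filled or throws EOFException
            return tail;
        }
    }
}
```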
You could implement an OutputStream that "decorates" your current stream by extending FilterOutputStream to preserve the last six bytes written. When writing is complete, query your custom decorator for the last six bytes.
The implementation could use a simple ring buffer that records all single-byte writes, or up to the last six bytes of each block write.
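One way such a decorator might look, as a sketch only: the class name TailOutputStream and its tail() method are my own invention, and it assumes a tail size greater than zero.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class TailOutputStream extends FilterOutputStream {
    private final byte[] ring; // ring buffer holding the most recent bytes
    private long written;      // total number of bytes written so far

    TailOutputStream(OutputStream out, int tailSize) {
        super(out);
        ring = new byte[tailSize];
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        ring[(int) (written % ring.length)] = (byte) b;
        written++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        // Only the last ring.length bytes of a block can end up in the tail.
        int skip = Math.max(0, len - ring.length);
        written += skip;
        for (int i = off + skip; i < off + len; i++) {
            ring[(int) (written % ring.length)] = b[i];
            written++;
        }
    }

    // Returns the most recently written bytes, oldest first.
    byte[] tail() {
        int n = (int) Math.min(written, ring.length);
        byte[] result = new byte[n];
        for (int i = 0; i < n; i++) {
            result[i] = ring[(int) ((written - n + i) % ring.length)];
        }
        return result;
    }
}
```

Wrap the stream you already write to, for example new TailOutputStream(existingStream, 6), and call tail() once writing is complete.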
Try this:
FileInputStream fis = new FileInputStream(file);
fis.getChannel().position(fis.getChannel().size() - 6);
byte[] a= new byte[6];
fis.read(a);
Is there any approach to convert large XML file(500+MBs) from 'Windows-1252' encoding to 'UTF-8' encoding in java?
Sure:
Open a FileInputStream wrapped in an InputStreamReader with the Windows-1252 encoding for the input
Open a FileOutputStream wrapped in an OutputStreamWriter with the UTF-8 encoding for the output
Create a buffer char array (e.g. 16K)
Repeatedly read into the array and write however much has been read:
char[] buffer = new char[16 * 1024];
int charsRead;
while ((charsRead = input.read(buffer)) > 0) {
output.write(buffer, 0, charsRead);
}
Don't forget to close the output afterwards! (Otherwise there could be buffered data which never gets written to disk.)
Note that as it's XML, you may well need to manually change the XML declaration as well, as it should be specifying that it's in Windows-1252...
The fact that this works on a streaming basis means you don't need to worry about the size of the file - it only reads up to 16K characters in memory at a time.
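Putting the steps above together, a complete version might look like the sketch below. The class and method names (Recode, windows1252ToUtf8) are placeholders, and try-with-resources (Java 7+) is used so that both streams are closed even if an exception is thrown.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Recode {
    // Streams the source file through a 16K char buffer, decoding it as
    // Windows-1252 and re-encoding it as UTF-8.
    static void windows1252ToUtf8(File source, File target) throws IOException {
        Charset cp1252 = Charset.forName("windows-1252");
        try (Reader input = new InputStreamReader(new FileInputStream(source), cp1252);
             Writer output = new OutputStreamWriter(new FileOutputStream(target),
                     StandardCharsets.UTF_8)) {
            char[] buffer = new char[16 * 1024];
            int charsRead;
            while ((charsRead = input.read(buffer)) != -1) {
                output.write(buffer, 0, charsRead);
            }
        }
    }
}
```

This copies the text as-is, so the encoding named in the XML declaration still has to be fixed separately.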
Is this a one-off or a job that you need to run repeatedly and make efficient?
If it's a one-off, I don't see the need for Java coding. Just run the query ".", for example
java net.sf.saxon.Query -s:input.xml -qs:. -o:output.xml
making sure you allocate, say, 3 GB of memory.
If you're doing it repeatedly and want a streamed approach, you have to choose between handling it as text (as Jon Skeet suggests) or as XML. The advantage of doing it as XML is primarily that the XML declaration will get taken care of, and character references will be converted to characters. The simplest is to use a JAXP identity transformation:
Source in = new StreamSource(new File("input.xml"));
TransformerFactory f = TransformerFactory.newInstance();
Result out = new StreamResult(new File("output.xml"));
f.newTransformer().transform(in, out);
If this is a one-off, Java may not be the most appropriate tool. Consider iconv:
iconv -f windows-1252 -t utf-8 <source.xml >target.xml
This has all the benefits of streaming without requiring you to write any code.
Unlike Michael's solution, this won't take care of the XML declaration. Edit this manually if necessary or, now that you're using UTF-8, omit it.
I have a function in which I am only given a BufferedInputStream and no other information about the file to be read. I unfortunately cannot alter the method definition as it is called by code I don't have access to. I've been using the code below to read the file and place its contents in a String:
public String[] doImport(BufferedInputStream stream) throws IOException, PersistenceException {
int bytesAvail = stream.available();
byte[] bytesRead = new byte[bytesAvail];
stream.read(bytesRead);
stream.close();
String fileContents = new String(bytesRead);
//more code here working with fileContents
}
My problem is that for large files (>2Gb), this code causes the program to either run extremely slowly or truncate the data, depending on the computer the program is executed on. Does anyone have a recommendation regarding how to deal with large files in this situation?
You're assuming that available() returns the size of the file; it does not. It returns the number of bytes available to be read, and that may be any number less than or equal to the size of the file.
Unfortunately there's no way to do what you want in just one shot without having some other source of information on the length of the file data (e.g., by calling java.io.File.length()). Instead, you may have to accumulate the data from multiple reads. One way is by using ByteArrayOutputStream. Read into a fixed, finite-size array, then write the data you read into a ByteArrayOutputStream. At the end, pull the byte array out. You'll need to use the three-argument forms of read() and write() and look at the return value of read() so you know exactly how many bytes were read into the buffer on each call.
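The accumulate-from-multiple-reads loop described above could be sketched like this. StreamDrain and readAll are hypothetical names, and the 8192-byte chunk size is an arbitrary choice.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class StreamDrain {
    // Reads the stream to the end in fixed-size chunks and returns all bytes.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        // read() returns how many bytes it actually placed in the chunk, or -1 at end of stream.
        while ((n = in.read(chunk, 0, chunk.length)) != -1) {
            collected.write(chunk, 0, n);
        }
        return collected.toByteArray();
    }
}
```

Because Java arrays are indexed by int, a single byte[] cannot hold more than about 2 GB, so for files of the size mentioned in the question the data would have to be processed chunk by chunk inside the loop instead of collected in memory.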
I'm not sure why you don't think you can read it line-by-line. BufferedInputStream only describes how the underlying stream is accessed, it doesn't impose any restrictions on how you ultimately read data from it. You can use it just as if it were any other InputStream.
Namely, to read it line-by-line you could do
InputStreamReader streamReader = new InputStreamReader(stream);
BufferedReader lineReader = new BufferedReader(streamReader);
String line = lineReader.readLine();
...
[Edit] This response is to the original wording of the question, which asked specifically for a way to read the input file line-by-line.
So - I've got a third party library that needs a File as input. I've got a byte array.
I don't want to write the bytes to disk .. I'd like to keep this in memory. Any idea on how I can create a File from the provided byte array (without writing to disk)?
Sorry, not possible. A File is inherently an on-disk entity, unless you have a RAM disk - but that's not something you can create in Java.
That's exactly the reason why APIs should not be based on File objects (or should at least be overloaded to accept an InputStream).
There's one possibility, but it's a real long-shot.
If the API uses new FileReader(file) or new FileInputStream(file) then you're hosed, but...
If it converts the file to a URL or URI (using toURL() or toURI()) then, since File is not final, you can pass in a subclass of File in which you control the construction of the URL/URI and, more importantly, the handler.
But the chances are VERY slim!
So I see there is an accepted answer (and this is old), but I found a way to do this. I was using the IDOL On Demand API and needed to convert a byte array to a File.
Here is an example of taking a byte array of an image and turning it into a File:
//imageByte is the byte array that is already defined
BufferedImage image = null;
ByteArrayInputStream bis = new ByteArrayInputStream(imageByte);
image = ImageIO.read(bis);
bis.close();
// write the image to a file
File outputfile = new File("image.png");
ImageIO.write(image, "png", outputfile);
And so outputfile is a File that can be used later in your program.