Writing a file in Java without O_SYNC semantics

In C, when I call open() to open a file descriptor, I have to explicitly pass the O_SYNC flag to ensure that writes to this file will be persisted to disk by the time write() returns. If I want, I can omit O_SYNC from open(), and then my writes will return much more quickly because they only have to make it into a filesystem cache before returning. Later on, I can force all outstanding writes to this file to be written to disk by calling fsync(), which blocks until that operation has finished. (More details on all this are available in the Linux man pages.)
Is there any way to do this in Java? The most similar thing I could find was using a BufferedOutputStream and calling .flush() on it, but if I'm doing writes to randomized file offsets I believe this would mean the internal buffer for the output stream could end up consuming a lot of memory.

Use the Java 7 NIO FileChannel#force method:
RandomAccessFile aFile = new RandomAccessFile("file.txt", "rw");
FileChannel channel = aFile.getChannel();
// ... write to the channel ...
// Flushes all unwritten data from the channel to the disk.
channel.force(true);
An important detail from the JavaDoc:
If the file does not reside on a local device then no such guarantee is made.
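Separately, RandomAccessFile exposes the same distinction through its mode string: its JavaDoc notes that "rws" writes content and metadata synchronously (like O_SYNC), while "rwd" writes only content (like O_DSYNC). A minimal sketch, with an arbitrary file name:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class SyncModes {
    public static void main(String[] args) throws IOException {
        // "rwd" ~ O_DSYNC: each write reaches the device before returning
        // (content only); use "rws" for O_SYNC-like content + metadata sync.
        try (RandomAccessFile raf = new RandomAccessFile("file.txt", "rwd")) {
            raf.write("hello".getBytes());
        }
    }
}
```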

Based on Sergey Tachenov's comment, I found that you can use FileChannel for this. Here's some sample code that I think does the trick:
import java.nio.*;
import java.nio.channels.*;
import java.nio.file.*;
import java.nio.file.attribute.*;
import java.io.*;
import java.util.*;
import java.util.concurrent.*;
import static java.nio.file.StandardOpenOption.*;
public class Main {
public static void main(String[] args) throws Exception {
// Open the file as a FileChannel.
Set<OpenOption> options = new HashSet<>();
options.add(WRITE);
// options.add(SYNC); <------- This would force O_SYNC semantics.
try (FileChannel channel = FileChannel.open(Paths.get("./test.txt"), options)) {
// Generate a bit of data to write.
ByteBuffer buffer = ByteBuffer.allocate(4096);
for (int i = 0; i < 10; i++) {
buffer.put(i, (byte) i);
}
// Choose a random offset between 0 and 1023 and write to it.
long offset = ThreadLocalRandom.current().nextLong(0, 1024);
channel.write(buffer, offset);
}
}
}
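The sample above deliberately leaves SYNC off, so writes only have to reach the filesystem cache. The fsync() analogue is then a later call to force() on the channel. A minimal sketch (file name is arbitrary):

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import static java.nio.file.StandardOpenOption.*;

public class ForceDemo {
    public static void main(String[] args) throws Exception {
        try (FileChannel channel = FileChannel.open(Paths.get("test.txt"), CREATE, WRITE)) {
            // Fast write: only has to reach the filesystem cache.
            channel.write(ByteBuffer.wrap("fast, cached write".getBytes()));
            // fsync() analogue: blocks until data (and metadata) are on disk.
            channel.force(true);
        }
    }
}
```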

Related

Compare contents of pdf in Java [duplicate]

How would you write a Java function boolean sameContent(Path file1, Path file2) which determines if the two given paths point to files which store the same content? Of course, first, I would check if the file sizes are the same. This is a necessary condition for storing the same content. But then I'd like to hear your approaches. If the two files are stored on the same hard drive (as in most of my cases) it's probably not the best idea to jump too many times between the two streams.
This is exactly what the FileUtils.contentEquals method of Apache Commons IO does; see its API documentation.
Try something like:
File file1 = new File("file1.txt");
File file2 = new File("file2.txt");
boolean isTwoEqual = FileUtils.contentEquals(file1, file2);
It does the following checks before actually doing the comparison:
Existence of both files.
Both paths passed are regular files, not directories.
The lengths in bytes are the same.
They are two different files, not one and the same.
Only then does it compare the contents.
If you don't want to use any external libraries, then simply read the files into byte arrays and compare them using Arrays.equals (won't work pre Java 7):
byte[] f1 = Files.readAllBytes(file1);
byte[] f2 = Files.readAllBytes(file2);
boolean same = Arrays.equals(f1, f2);
If the files are large, then instead of reading the entire files into arrays, you should use BufferedInputStream and read the files chunk-by-chunk as explained here.
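That chunked approach can be sketched as follows. The method name is mine; the BufferedInputStream does block-sized reads underneath while the loop stays byte-by-byte, which sidesteps the partial-read pitfall of InputStream.read(byte[]):

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChunkCompare {
    // Compares two files; the buffered streams fetch data in large blocks.
    public static boolean isSameContent(Path file1, Path file2) throws IOException {
        if (Files.size(file1) != Files.size(file2)) {
            return false; // different sizes cannot hold the same content
        }
        try (InputStream is1 = new BufferedInputStream(Files.newInputStream(file1));
             InputStream is2 = new BufferedInputStream(Files.newInputStream(file2))) {
            int b1, b2;
            do {
                b1 = is1.read();
                b2 = is2.read();
                if (b1 != b2) {
                    return false;
                }
            } while (b1 != -1); // both streams ended together: equal
        }
        return true;
    }
}
```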
Since Java 12 there is the method Files.mismatch, which returns -1 if there is no mismatch in the content of the files. Thus the function would look like the following:
private static boolean sameContent(Path file1, Path file2) throws IOException {
return Files.mismatch(file1, file2) == -1;
}
If the files are small, you can read both into the memory and compare the byte arrays.
If the files are not small, you can either compute the hashes of their content (e.g. MD5 or SHA-1) one after the other and compare the hashes (but this still leaves a very small chance of error), or you can compare their content but for this you still have to read the streams alternating.
Here is an example:
boolean sameContent(Path file1, Path file2) throws IOException {
final long size = Files.size(file1);
if (size != Files.size(file2))
return false;
if (size < 4096)
return Arrays.equals(Files.readAllBytes(file1), Files.readAllBytes(file2));
try (InputStream is1 = Files.newInputStream(file1);
InputStream is2 = Files.newInputStream(file2)) {
// Compare byte-by-byte.
// Note that this can be sped up drastically by reading large chunks
// (e.g. 16 KBs) but care must be taken as InputStream.read(byte[])
// does not necessarily read a whole array!
int data;
while ((data = is1.read()) != -1)
if (data != is2.read())
return false;
}
return true;
}
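The hash-based alternative mentioned above can be sketched with DigestInputStream, which streams each file through a digest without loading it whole (class and method names are illustrative):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class HashCompare {
    // Streams a file through SHA-256 and returns the digest.
    static byte[] hashOf(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), md)) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // The digest is updated as a side effect of reading.
            }
        }
        return md.digest();
    }

    public static boolean sameHash(Path a, Path b) throws IOException, NoSuchAlgorithmException {
        return Arrays.equals(hashOf(a), hashOf(b));
    }
}
```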
This should help you with your problem:
package test;
import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;
public class CompareFileContents {
public static void main(String[] args) throws IOException {
File file1 = new File("test1.txt");
File file2 = new File("test2.txt");
File file3 = new File("test3.txt");
boolean compare1and2 = FileUtils.contentEquals(file1, file2);
boolean compare2and3 = FileUtils.contentEquals(file2, file3);
boolean compare1and3 = FileUtils.contentEquals(file1, file3);
System.out.println("Are test1.txt and test2.txt the same? " + compare1and2);
System.out.println("Are test2.txt and test3.txt the same? " + compare2and3);
System.out.println("Are test1.txt and test3.txt the same? " + compare1and3);
}
}
If it for unit test, then AssertJ provides a method named hasSameContentAs. An example:
Assertions.assertThat(file1).hasSameContentAs(file2)
I know I'm pretty late to the party on this one, but memory-mapped IO is a pretty simple way to do this if you want to use straight Java APIs and no third-party dependencies. It's only a few calls to open the files, map them, and then compare them using ByteBuffer.equals(Object).
This is probably going to give you the best performance if you expect the particular file to be large because you're offloading a majority of the IO legwork onto the OS and the otherwise highly optimized bits of the JVM (assuming you're using a decent JVM).
Straight from the FileChannel JavaDoc:
For most operating systems, mapping a file into memory is more expensive than reading or writing a few tens of kilobytes of data via the usual read and write methods. From the standpoint of performance it is generally only worth mapping relatively large files into memory.
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
public class MemoryMappedCompare {
public static boolean areFilesIdenticalMemoryMapped(final Path a, final Path b) throws IOException {
try (final FileChannel fca = FileChannel.open(a, StandardOpenOption.READ);
final FileChannel fcb = FileChannel.open(b, StandardOpenOption.READ)) {
final MappedByteBuffer mbba = fca.map(FileChannel.MapMode.READ_ONLY, 0, fca.size());
final MappedByteBuffer mbbb = fcb.map(FileChannel.MapMode.READ_ONLY, 0, fcb.size());
return mbba.equals(mbbb);
}
}
}
It's compatible with Java 6+, library-free, and doesn't read the entire content into memory at once.
public static boolean sameFile(File a, File b) {
if (a == null || b == null) {
return false;
}
if (a.getAbsolutePath().equals(b.getAbsolutePath())) {
return true;
}
if (!a.exists() || !b.exists()) {
return false;
}
if (a.length() != b.length()) {
return false;
}
boolean eq = true;
FileChannel channelA = null;
FileChannel channelB = null;
try {
channelA = new RandomAccessFile(a, "r").getChannel();
channelB = new RandomAccessFile(b, "r").getChannel();
long channelsSize = channelA.size();
ByteBuffer buff1 = channelA.map(FileChannel.MapMode.READ_ONLY, 0, channelsSize);
ByteBuffer buff2 = channelB.map(FileChannel.MapMode.READ_ONLY, 0, channelsSize);
for (int i = 0; i < channelsSize; i++) {
if (buff1.get(i) != buff2.get(i)) {
eq = false;
break;
}
}
} catch (IOException ex) {
// FileNotFoundException is an IOException, so one catch block suffices.
Logger.getLogger(HotUtils.class.getName()).log(Level.SEVERE, null, ex);
} finally {
// Close the channels; the original version leaked them.
if (channelA != null) try { channelA.close(); } catch (IOException ignored) { }
if (channelB != null) try { channelB.close(); } catch (IOException ignored) { }
}
return eq;
}
package test;
import org.junit.jupiter.api.Test;
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import static org.junit.jupiter.api.Assertions.assertEquals;
public class CSVResultDifference {
@Test
public void csvDifference() throws IOException {
Path file_F = FileSystems.getDefault().getPath("C:\\Projekts\\csvTestX", "yolo2.csv");
long size_F = Files.size(file_F);
Path file_I = FileSystems.getDefault().getPath("C:\\Projekts\\csvTestZ", "yolo2.csv");
long size_I = Files.size(file_I);
assertEquals(size_F, size_I);
}
}
it worked for me :)

How do I Execute Java from Java?

I have this DownloadFile.java, which downloads the file as it should:
import java.io.*;
import java.net.URL;
public class DownloadFile {
public static void main(String[] args) throws IOException {
String fileName = "setup.exe";
// The file that will be saved on your computer
URL link = new URL("http://onlinebackup.elgiganten.se/software/elgiganten/setup.exe");
// The file that you want to download
// Code to download
InputStream in = new BufferedInputStream(link.openStream());
ByteArrayOutputStream out = new ByteArrayOutputStream();
byte[] buf = new byte[1024];
int n = 0;
while (-1 != (n = in.read(buf))) {
out.write(buf, 0, n);
}
out.close();
in.close();
byte[] response = out.toByteArray();
FileOutputStream fos = new FileOutputStream(fileName);
fos.write(response);
fos.close();
// End download code
System.out.println("Finished");
}
}
I want to execute this from a mouse event in Gui.java.
private void jLabel17MouseClicked(java.awt.event.MouseEvent evt){
}
How do I do this?
Your current method is a static method, which is fine, but all the data that it extracts is held tightly within the main method, preventing other classes from using it. Fortunately this can be corrected.
My suggestion:
Re-write your DownloadFile code so that it is not simply a static main method, but rather a method that can be called easily by other classes and that returns the data from the file of interest. This way outside classes can call the method and then receive the data that the method extracted.
Give it a String parameter that will allow the calling code to pass in the URL address.
Give it a File parameter for the file that it should write data to.
Consider having it return data (a byte array?), if this data will be needed by the calling program.
Or if it does not need to return data, perhaps it could return boolean to indicate if the download was successful or not.
Make sure that your method throws all exceptions (such as IOException and MalformedURLException) that it needs to throw.
Also, if this is to be called by a Swing GUI, be sure to call this type of code in a background thread, such as in a SwingWorker, so that this code does not tie up the Swing event thread, rendering your GUI frozen for a time.
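Putting those suggestions together, one possible shape for the refactor (class, method, and parameter names are all illustrative, and Files.copy replaces the manual buffer loop):

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class Downloader {
    // Reusable version of the main() logic: the URL and target file are
    // parameters, and the boolean result reports success or failure.
    public static boolean downloadFile(String address, Path target) {
        try (InputStream in = new URL(address).openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

From the mouse handler you would then run it off the event thread, e.g. `new SwingWorker<Boolean, Void>() { protected Boolean doInBackground() { return Downloader.downloadFile(url, target); } }.execute();`.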

How to use java.nio.channels.FileChannel to write a byte[] to a file - Basics

I do not have experience using Java channels. I would like to write a byte array to a file. Currently, I have the following code:
String outFileString = DEFAULT_DECODED_FILE; // Valid file pathname
FileSystem fs = FileSystems.getDefault();
Path fp = fs.getPath(outFileString);
FileChannel outChannel = FileChannel.open(fp, EnumSet.of(StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE));
// Please note: result.getRawBytes() returns a byte[]
ByteBuffer buffer = ByteBuffer.allocate(result.getRawBytes().length);
buffer.put(result.getRawBytes());
outChannel.write(buffer); // File successfully created/truncated, but no data
With this code, the output file is created, and truncated if it exists. Also, in the IntelliJ debugger, I can see that buffer contains data. Also, the line outChannel.write() is successfully called without throwing an exception. However, after the program exits, the data does not appear in the output file.
Can somebody (a) tell me if the FileChannel API is an acceptable choice for writing a byte array to a file, and (b) if so, how should the above code be modified to get it to work?
As gulyan points out, you need to flip() your byte buffer before writing it. Alternately, you could wrap your original byte array:
ByteBuffer buffer = ByteBuffer.wrap(result.getRawBytes());
To guarantee the write is on disk, you need to use force():
outChannel.force(false);
Or you could close the channel:
outChannel.close();
You should call:
buffer.flip();
before the write.
This prepares the buffer for reading.
Also, you should call
buffer.clear();
before putting data into it.
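To see what flip() does to the buffer's bookkeeping, a tiny demonstration: after two puts the position is 2 and the limit is the capacity; flip() moves the limit down to the position and resets the position to zero, so a subsequent read (or channel write) covers exactly the bytes that were put:

```java
import java.nio.ByteBuffer;

public class FlipDemo {
    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(8);
        buffer.put((byte) 1).put((byte) 2);   // position = 2, limit = 8
        buffer.flip();                        // position = 0, limit = 2
        System.out.println(buffer.position() + " " + buffer.limit()); // prints "0 2"
    }
}
```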
To answer your first question
tell me if the FileChannel API is an acceptable choice for writing a byte array to a file
It's ok but there's simpler ways. Try using a FileOutputStream. Typically this would be wrapped by a BufferedOutputStream for performance but the key is both of these extend OutputStream which has a simple write(byte[]) method. This is much easier to work with than the channel/buffer API.
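A minimal sketch of that simpler route (path and method name are mine); there is no flipping or buffer bookkeeping, and try-with-resources flushes and closes the stream:

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class SimpleWrite {
    // Writes a byte array to a file using plain OutputStream calls.
    public static void writeBytes(String path, byte[] data) throws IOException {
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(path))) {
            out.write(data);
        }
    }
}
```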
Here is a complete example of FileChannel.
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
public class FileChannelTest {
// This is a Filer location where write operation to be done.
private static final String FILER_LOCATION = "C:\\documents\\test";
// This is a text message that to be written in filer location file.
private static final String MESSAGE_WRITE_ON_FILER = "Operation has been committed.";
public static void main(String[] args) throws FileNotFoundException {
// Initialized the File and File Channel
RandomAccessFile randomAccessFileOutputFile = null;
FileChannel outputFileChannel = null;
try {
// Create a random access file with 'rw' permission..
randomAccessFileOutputFile = new RandomAccessFile(FILER_LOCATION + File.separator + "readme.txt", "rw");
outputFileChannel = randomAccessFileOutputFile.getChannel();
//Read line of code one by one and converted it into byte array to write into FileChannel.
final byte[] bytes = (MESSAGE_WRITE_ON_FILER + System.lineSeparator()).getBytes();
// Defined a new buffer capacity.
ByteBuffer buffer = ByteBuffer.allocate(bytes.length);
// Put the byte array into the buffer.
buffer.put(bytes);
// Flip the buffer: set the limit to the current position and the position to zero, ready for the channel to read it.
buffer.flip();
/**
* Writes a sequence of bytes to this channel from the given buffer.
*/
outputFileChannel.write(buffer);
System.out.println("File Write Operation is done!!");
} catch (IOException ex) {
System.out.println("Oops Unable to proceed file write Operation due to ->" + ex.getMessage());
} finally {
try {
outputFileChannel.close();
randomAccessFileOutputFile.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}

Splitting a .gz file into specified file sizes in Java using byte[] array

I have written a code to split a .gz file into user specified parts using byte[] array. But the for loop is not reading/writing the last part of the parent file which is less than the array size. Can you please help me in fixing this?
package com.bitsighttech.collection.packaging;
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.log4j.Logger;
public class FileSplitterBytewise
{
private static Logger logger = Logger.getLogger(FileSplitterBytewise.class);
private static final long KB = 1024;
private static final long MB = KB * KB;
private FileInputStream fis;
private FileOutputStream fos;
private DataInputStream dis;
private DataOutputStream dos;
public boolean split(File inputFile, String splitSize)
{
int expectedNoOfFiles =0;
try
{
double parentFileSizeInB = inputFile.length();
Pattern p = Pattern.compile("(\\d+)\\s([MmGgKk][Bb])");
Matcher m = p.matcher(splitSize);
m.matches();
String FileSizeString = m.group(1);
String unit = m.group(2);
double FileSizeInMB = 0;
try {
if (unit.toLowerCase().equals("kb"))
FileSizeInMB = Double.parseDouble(FileSizeString) / KB;
else if (unit.toLowerCase().equals("mb"))
FileSizeInMB = Double.parseDouble(FileSizeString);
else if (unit.toLowerCase().equals("gb"))
FileSizeInMB = Double.parseDouble(FileSizeString) * KB;
} catch (NumberFormatException e) {
logger.error("invalid number [" + FileSizeInMB + "] for expected file size");
}
double fileSize = FileSizeInMB * MB;
int fileSizeInByte = (int) Math.ceil(fileSize);
double noOFFiles = parentFileSizeInB/fileSizeInByte;
expectedNoOfFiles = (int) Math.ceil(noOFFiles);
int splinterCount = 1;
fis = new FileInputStream(inputFile);
dis = new DataInputStream(new BufferedInputStream(fis));
fos = new FileOutputStream("F:\\ff\\" + "_part_" + splinterCount + "_of_" + expectedNoOfFiles);
dos = new DataOutputStream(new BufferedOutputStream(fos));
byte[] data = new byte[(int) fileSizeInByte];
while ( splinterCount <= expectedNoOfFiles ) {
int i;
for(i = 0; i<data.length-1; i++)
{
data[i] = dis.readByte();
}
dos.write(data);
splinterCount ++;
}
}
catch(Exception e)
{
logger.error("Unable to split the file " + inputFile.getName() + " in to " + expectedNoOfFiles);
return false;
}
logger.debug("Successfully split the file [" + inputFile.getName() + "] in to " + expectedNoOfFiles + " files");
return true;
}
public static void main(String args[])
{
String FilePath1 = "F:\\az.gz";
File file= new File(FilePath1);
FileSplitterBytewise fileSplitter = new FileSplitterBytewise();
String splitlen = "1 MB";
fileSplitter.split(file, splitlen);
}
}
I'd suggest to make more methods. You've got a complicated string-handling section of code in split(); it would be best to make a method that takes the human-friendly string as input and returns the number you're looking for. (It would also make it far easier for you to test this section of the routine; there's no way you can test it now.)
Once it is split off and you're writing test cases, you'll probably find that the error message you generate if the string doesn't contain kb, mb, or gb is extremely confusing -- it blames the number 0 for the mistake rather than pointing out the string does not have the expected units.
Using an int to store the file size means your program will never handle files larger than two gigabytes. You should stick with long or double. (double feels wrong for something that is actually confined to integer values but I can't quickly think why it would fail.)
byte[] data = new byte[(int) fileSizeInByte];
Allocating several gigabytes like this is going to destroy your performance -- that's a potentially huge memory allocation (and one that might be considered under control of an adversary; depending upon your security model, this might or might not be a big deal). Don't try to work with the entire file in one piece.
You appear to be reading and writing the files one byte at a time. That's a guarantee to very slow performance. Doing some performance testing for another question earlier today, I found that my machine could read (from a hot cache) 2000 times faster using 131kb blocks than two-byte blocks. One-byte blocks would be even worse. A cold cache would be significantly worse for such small sizes.
fos = new FileOutputStream("F:\\ff\\" + "_part_" + splinterCount + "_of_" + expectedNoOfFiles);
You only appear to ever open one file output stream. Your post probably should have said "only the first works", because it looks like you've not yet tried it on a file that creates three or more pieces.
catch(Exception e)
At this point, you've got the ability to discover errors in your program; you choose to ignore them completely. Sure, you log an error message, but you cannot actually debug your program with the data you log. You should log at a minimum the exception type, message, and maybe even full stack-trace. This combination of data is immensely useful when trying to solve problems, especially in a few months when you've forgotten the details of how it works.
Can you please help me in fixing this?
I would use;
drop the DataInput/OutputStreams, you don't need them.
use in.read(data) to read a whole block instead of one byte at a time. Reading one byte at a time is much slower!
read the whole of the data array; your loop condition i < data.length - 1 reads one byte too few.
stop when you reach the end of the file, it might not be a whole multiple of the size.
only write as much as you have read, if your blocks at 1 MB byte there is 100 KB left you should only read/write 100 KB at the end.
close your files when have finished, esp as you have a buffered stream.
your split() writes everything to the same file (so it's not actually splitting). You need to create, write to, and close the output files in a loop.
don't use fields when you could be/should be using local variables.
use the length as a long in bytes.
the pattern ignores incorrect input, and it doesn't match the tests you check for; e.g. your pattern allows "1 G" or "1 k", but these will be treated as 1 MB.
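Several of those fixes (block reads, one output file per part, a short final part, closing everything) can be sketched like this; the naming scheme and the use of readNBytes (Java 9+) are my own choices:

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class SplitSketch {
    // Splits the input into chunkSize-byte parts named inputPath + ".partN"
    // and returns how many parts were written.
    public static int split(String inputPath, int chunkSize) throws IOException {
        byte[] buffer = new byte[chunkSize];
        int part = 0;
        try (InputStream in = new FileInputStream(inputPath)) {
            int read;
            // readNBytes fills the buffer unless the stream ends first,
            // so the last part is exactly as long as the leftover data.
            while ((read = in.readNBytes(buffer, 0, chunkSize)) > 0) {
                part++;
                try (OutputStream out = new FileOutputStream(inputPath + ".part" + part)) {
                    out.write(buffer, 0, read);
                }
            }
        }
        return part;
    }
}
```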

Why doesn't Java properly re-create this image from an InputStream?

I've looked at this every way I can think... The problem is that I end up writing the PERFECT number of bytes, and the files are VERY similar - but some bytes are different. I opened the Java generated file in Scite as well as the original, and even though they are close, they are not the same. Is there any way to fix this? I've tried doing everything possible - I've used different wrappers, readers, writers and different methods of taking the byte array (or taking it as chars - tried both) and making it into a file.
The image in question, for the test, is at http://www.google.com/images/srpr/nav_logo13.png. Here's the code:
import java.awt.Image;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import javax.imageio.ImageIO;
public class ImgExample
{
private String address = "http://www.google.com";
/**
* Returns a 3 dimensional array that holds the RGB values of each pixel at the position of the current
* webcam picture. For example, getPicture()[1][2][3] is the pixel at (2,1) and the BLUE value.
* [row][col][0] is alpha
* [row][col][1] is red
* [row][col][2] is green
* [row][col][3] is blue
*/
public int[][][] getPicture()
{
Image camera = null;
try {
int maxChars = 35000;
//The image in question is 28,736 bytes, but I want to make sure it's bigger
//for testing purposes as in my case, it's an image stream so it's unpredictable
byte[] buffer = new byte[maxChars];
//create the connection
HttpURLConnection conn = (HttpURLConnection)(new URL(this.address+"/images/srpr/nav_logo13.png")).openConnection();
conn.setUseCaches(false);
//wrap a buffer around our input stream
BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()));
int bytesRead = 0;
while ( bytesRead < maxChars && reader.ready() )
{
//reader.read returns an int - I'm assuming this is okay?
buffer[bytesRead] = (byte)reader.read();
bytesRead++;
if ( !reader.ready() )
{
//This is here to make sure the stream has time to download the next segment
Thread.sleep(10);
}
}
reader.close();
//Great, write out the file for viewing
File writeOutFile = new File("testgoog.png");
if ( writeOutFile.exists() )
{
writeOutFile.delete();
writeOutFile.createNewFile();
}
FileOutputStream fout = new FileOutputStream(writeOutFile, false);
//FileWriter fout = new FileWriter(writeOutFile, false);
//needed to make sure I was actually reading 100% of the file in question
System.out.println("Bytes read = "+bytesRead);
//write out the byte buffer from the first byte to the end of all the chars read
fout.write(buffer, 0, bytesRead);
fout.flush();
fout.close();
//Finally use a byte stream to create an image
ByteArrayInputStream byteImgStream = new ByteArrayInputStream(buffer);
camera = ImageIO.read(byteImgStream);
byteImgStream.close();
} catch ( Exception e ) { e.printStackTrace(); }
return ImgExample.imageToPixels(camera);
}
public static int[][][] imageToPixels (Image image)
{
//there's a bunch of code here that works in the real program, no worries
//it creates a 3d arr that goes [x][y][alpha, r, g, b val]
//e.g. imageToPixels(camera)[1][2][3] gives the pixel's blue value for row 1 col 2
return new int[][][]{{{-1,-1,-1}}};
}
public static void main(String[] args)
{
ImgExample ex = new ImgExample();
ex.getPicture();
}
}
The problem as I see it is that you're using Readers. In Java, Readers are for processing character streams, not binary streams, and the character conversions that it does are most likely what's changing your bytes on you.
Instead, you should read() from the InputStream directly. InputStream's read() will block until data is available, but returns -1 when the end of the stream is reached.
Edit: You can also wrap the InputStream in a BufferedInputStream.
BufferedReader is intended for reading character streams, not byte/binary streams.
BufferedReader.read() returns the character read, as an integer in the range 0 to 65535. The bytes have already been decoded into characters through a charset, so binary data gets mangled, and casting the result back to a byte keeps only the low 8 bits.
I think you want to use InputStream.read() directly, not wrapped in a BufferedReader/InputStreamReader.
And finally, not related to the problem, but if you open a FileOutputStream with append=false, there isn't really any point in deleting a file that already exists; append=false truncates it anyway.
I think your problem is you are using an InputStreamReader. From the javadocs
An InputStreamReader is a bridge from byte streams to character streams: It reads bytes and decodes them into characters using a specified charset. The charset that it uses may be specified by name or may be given explicitly, or the platform's default charset may be accepted.
You don't want the conversion to character streams.
You also shouldn't be using ready() like that. You are just wasting time with it and the sleeps. read() will block until data arrives anyway, and it will block for the correct length of time, not an arbitrary guess. The canonical copy loop in Java goes like this:
int count;
byte[] buffer; // whatever size you like
while ((count = in.read(buffer)) > 0)
{
out.write(buffer, 0, count);
}
