I am trying to compare two files block by block. If the blocks are equal, I move on to the next block and compare again.
If the final blocks are also equal, the method should return true; in every other case it should return false.
I don't understand how to correctly advance to the next block or how to detect the end of the file.
private static boolean getBlocks(File file1, File file2, int count) throws IOException {
    RandomAccessFile raf1 = new RandomAccessFile(file1, "r");
    RandomAccessFile raf2 = new RandomAccessFile(file2, "r");
    int point = count * 512;
    FileChannel fc1 = raf1.getChannel();
    FileChannel fc2 = raf2.getChannel();
    MappedByteBuffer buffer1 = fc1.map(FileChannel.MapMode.READ_ONLY, point, 512);
    MappedByteBuffer buffer2 = fc2.map(FileChannel.MapMode.READ_ONLY, point, 512);
    byte[] bytes1 = new byte[512];
    byte[] bytes2 = new byte[512];
    buffer1.get(bytes1);
    buffer2.get(bytes2);
    if (bytes1.length == bytes2.length) {
        for (int i = 0; i < bytes1.length; i++) {
            if (bytes1[i] != bytes2[i]) {
                return false;
            }
        }
        if (true) {
            count++;
            getBlocks(file1, file2, point);
        }
    }
    buffer1.clear();
    buffer2.clear();
    return true;
}
This example shows how to read a file byte by byte until EOF is reached: http://www.java2s.com/Code/Java/File-Input-Output/Testingforendoffilewhilereadingabyteatatime.htm
When using MappedByteBuffer, this answer should help: https://stackoverflow.com/a/12509314/1321564
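For reference, here is a minimal iterative sketch (not taken from the linked answers) of the block-by-block comparison described in the question: both files are read in fixed 512-byte blocks with plain streams, and the method returns false at the first differing block or true once both streams reach EOF together.
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BlockCompare {
    private static final int BLOCK_SIZE = 512;

    static boolean sameContent(File file1, File file2) throws IOException {
        if (file1.length() != file2.length()) {
            return false; // different lengths can never be equal
        }
        try (InputStream in1 = new BufferedInputStream(new FileInputStream(file1));
             InputStream in2 = new BufferedInputStream(new FileInputStream(file2))) {
            byte[] block1 = new byte[BLOCK_SIZE];
            byte[] block2 = new byte[BLOCK_SIZE];
            while (true) {
                int n1 = readBlock(in1, block1);
                int n2 = readBlock(in2, block2);
                if (n1 != n2) {
                    return false;   // should not happen, since the lengths match
                }
                if (n1 == -1) {
                    return true;    // both streams hit EOF: the files are equal
                }
                for (int i = 0; i < n1; i++) {
                    if (block1[i] != block2[i]) {
                        return false;
                    }
                }
            }
        }
    }

    // Fill 'block' as far as possible; return the number of bytes read, or -1 at EOF.
    private static int readBlock(InputStream in, byte[] block) throws IOException {
        int total = 0;
        while (total < block.length) {
            int n = in.read(block, total, block.length - total);
            if (n == -1) {
                return total == 0 ? -1 : total;
            }
            total += n;
        }
        return total;
    }
}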
I'm working on a Java project to store and retrieve files from MongoDB using the GridFS specification. I'm using the code snippets provided in the MongoDB Java driver documentation at https://mongodb.github.io/mongo-java-driver/4.1/driver/tutorials/gridfs/.
While using openDownloadStream to retrieve the file, I noticed that if the file is divided into more than one chunk, it returns only the first chunk, not the full file.
ObjectId fileId;
GridFSDownloadStream downloadStream = gridFSBucket.openDownloadStream(fileId);
int fileLength = (int) downloadStream.getGridFSFile().getLength();
byte[] bytesToWriteTo = new byte[fileLength];
downloadStream.read(bytesToWriteTo); /*read file contents */
downloadStream.close();
System.out.println(new String(bytesToWriteTo, StandardCharsets.UTF_8));
Any solutions to this?
Looking at the class GridFSDownloadStreamImpl, which implements GridFSDownloadStream, it looks like the method read(byte[]) reads chunk by chunk:
@Override
public int read(final byte[] b) {
    return read(b, 0, b.length);
}

@Override
public int read(final byte[] b, final int off, final int len) {
    checkClosed();
    if (currentPosition == length) {
        return -1;
    } else if (buffer == null) {
        buffer = getBuffer(chunkIndex);
    } else if (bufferOffset == buffer.length) {
        chunkIndex += 1;
        buffer = getBuffer(chunkIndex);
        bufferOffset = 0;
    }
    int r = Math.min(len, buffer.length - bufferOffset);
    System.arraycopy(buffer, bufferOffset, b, off, r);
    bufferOffset += r;
    currentPosition += r;
    return r;
}
Therefore, you have to loop until all expected bytes are actually read:
byte[] bytesToWriteTo = new byte[fileLength];
int bytesRead = 0;
while (bytesRead < fileLength) {
    // read into the array at the current offset so earlier chunks are not overwritten
    int newBytesRead = downloadStream.read(bytesToWriteTo, bytesRead, fileLength - bytesRead);
    if (newBytesRead == -1) {
        throw new IOException("Stream ended before " + fileLength + " bytes were read");
    }
    bytesRead += newBytesRead;
}
downloadStream.close();
Note that I was not able to test the above code, so please use it with caution.
I ended up using the readAllBytes() method, and it returns the whole file.
GridFSDownloadStream downloadStream = gridFSBucket.openDownloadStream(fileId);
byte[] bytesToWriteTo = downloadStream.readAllBytes();
downloadStream.close();
I am creating a method that will take in a file and split it into shardCount pieces and generate a parity file.
When I run this method, it appears that I am writing out extra data into my parity file. This is my first time using FileChannel and ByteBuffers, so I'm not certain I completely understand how to use them despite staring at the documentation for about 8 hours.
This code is a simplified version of the parity section.
public static void splitAndGenerateParityFile(File file, int shardCount, String fileID) throws IOException {
    RandomAccessFile rin = new RandomAccessFile(file, "r");
    FileChannel fcin = rin.getChannel();

    //Create parity files
    File parity = new File(fileID + "_parity");
    if (parity.exists()) throw new FileAlreadyExistsException("Could not create parity file! File already exists!");
    RandomAccessFile parityRAF = new RandomAccessFile(parity, "rw");
    FileChannel parityOut = parityRAF.getChannel();

    long bytesPerFile = (long) Math.ceil(rin.length() / shardCount);

    //Make buffers for each section of the file we will be reading from
    ArrayList<ByteBuffer> shardBuffers = new ArrayList<>(shardCount);
    for (int i = 0; i < shardCount; i++) {
        ByteBuffer bb = ByteBuffer.allocate(1024);
        shardBuffers.add(bb);
    }
    ByteBuffer parityBuffer = ByteBuffer.allocate(1024);

    //Generate parity
    boolean isParityBufferEmpty = true;
    for (long i = 0; i < bytesPerFile; i++) {
        isParityBufferEmpty = false;
        int pos = (int) (i % 1024);
        byte p = 0;
        if (pos == 0) {
            //Read chunk of file into each buffer
            for (int j = 0; j < shardCount; j++) {
                ByteBuffer bb = shardBuffers.get(j);
                bb.clear();
                fcin.read(bb, bytesPerFile * j + i);
                bb.rewind();
            }
            //Dump parity buffer
            if (i > 0) {
                parityBuffer.rewind();
                parityOut.write(parityBuffer);
                parityBuffer.clear();
                isParityBufferEmpty = true;
            }
        }
        //Get parity
        for (ByteBuffer bb : shardBuffers) {
            if (pos >= bb.limit()) break;
            p ^= bb.get(pos);
        }
        //Put parity in buffer
        parityBuffer.put(pos, p);
    }

    if (!isParityBufferEmpty) {
        parityBuffer.rewind();
        parityOut.write(parityBuffer);
        parityBuffer.clear();
    }

    fcin.close();
    rin.close();
    parityOut.close();
    parityRAF.close();
}
Please let me know if there is anything wrong with either the parity algorithm or the file IO, or if there's anything I can do to optimize this. I'm happy to hear about other (better) ways of doing file IO.
Here is the solution I found (though it may need more tuning):
public static void splitAndGenerateParityFile(File file, int shardCount, String fileID) throws IOException {
    int BUFFER_SIZE = 4 * 1024 * 1024;
    RandomAccessFile rin = new RandomAccessFile(file, "r");
    FileChannel fcin = rin.getChannel();

    //Create parity files
    File parity = new File(fileID + "_parity");
    if (parity.exists()) throw new FileAlreadyExistsException("Could not create parity file! File already exists!");
    RandomAccessFile parityRAF = new RandomAccessFile(parity, "rw");
    FileChannel parityOut = parityRAF.getChannel();

    //Create shard files
    ArrayList<File> shards = new ArrayList<>(shardCount);
    for (int i = 0; i < shardCount; i++) {
        File f = new File(fileID + "_part_" + i);
        if (f.exists()) throw new FileAlreadyExistsException("Could not create shard file! File already exists!");
        shards.add(f);
    }

    long bytesPerFile = (long) Math.ceil(rin.length() / shardCount);
    ArrayList<ByteBuffer> shardBuffers = new ArrayList<>(shardCount);

    //Make buffers for each section of the file we will be reading from
    for (int i = 0; i < shardCount; i++) {
        ByteBuffer bb = ByteBuffer.allocate(BUFFER_SIZE);
        shardBuffers.add(bb);
    }
    ByteBuffer parityBuffer = ByteBuffer.allocate(BUFFER_SIZE);

    //Generate parity
    boolean isParityBufferEmpty = true;
    for (long i = 0; i < bytesPerFile; i++) {
        isParityBufferEmpty = false;
        int pos = (int) (i % BUFFER_SIZE);
        byte p = 0;
        if (pos == 0) {
            //Read chunk of file into each buffer
            for (int j = 0; j < shardCount; j++) {
                ByteBuffer bb = shardBuffers.get(j);
                bb.clear();
                fcin.position(bytesPerFile * j + i);
                fcin.read(bb);
                bb.flip();
            }
            //Dump parity buffer
            if (i > 0) {
                parityBuffer.flip();
                while (parityBuffer.hasRemaining()) {
                    parityOut.write(parityBuffer);
                }
                parityBuffer.clear();
                isParityBufferEmpty = true;
            }
        }
        //Get parity
        for (ByteBuffer bb : shardBuffers) {
            if (!bb.hasRemaining()) break;
            p ^= bb.get();
        }
        //Put parity in buffer
        parityBuffer.put(p);
    }

    if (!isParityBufferEmpty) {
        parityBuffer.flip();
        parityOut.write(parityBuffer);
        parityBuffer.clear();
    }

    fcin.close();
    rin.close();
    parityOut.close();
    parityRAF.close();
}
As suggested by VGR, I replaced rewind() with flip(). I also switched to relative operations instead of absolute ones; the absolute get/put methods do not adjust the position or the limit, so that was likely the cause of the error. I also changed the buffer size to 4 MB, since I am interested in generating parity for large files.
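To illustrate the difference: flip() sets the limit to the current position before rewinding, so only the bytes that were just read are drained, whereas rewind() leaves the limit at the capacity and would expose stale bytes. A sketch of one fill/drain cycle, assuming the same fcin, parityOut and BUFFER_SIZE as in the method above:
ByteBuffer buf = ByteBuffer.allocate(BUFFER_SIZE);
int bytesRead = fcin.read(buf);   // relative read: advances buf's position
buf.flip();                       // limit = position, position = 0 -> ready to drain
while (buf.hasRemaining()) {
    parityOut.write(buf);         // relative write: advances the position toward the limit
}
buf.clear();                      // position = 0, limit = capacity -> ready to refill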
When experimenting with ZLib compression, I have run across a strange problem. Decompressing a zlib-compressed byte array of random data fails reproducibly if the source array is at least 32752 bytes long. Here's a little program that reproduces the problem; you can see it in action on IDEOne. The compression and decompression methods are standard code picked up from tutorials.
public class ZlibMain {

    private static byte[] compress(final byte[] data) {
        final Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        final byte[] bytesCompressed = new byte[Short.MAX_VALUE];
        final int numberOfBytesAfterCompression = deflater.deflate(bytesCompressed);
        final byte[] returnValues = new byte[numberOfBytesAfterCompression];
        System.arraycopy(bytesCompressed, 0, returnValues, 0, numberOfBytesAfterCompression);
        return returnValues;
    }

    private static byte[] decompress(final byte[] data) {
        final Inflater inflater = new Inflater();
        inflater.setInput(data);
        try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream(data.length)) {
            final byte[] buffer = new byte[Math.max(1024, data.length / 10)];
            while (!inflater.finished()) {
                final int count = inflater.inflate(buffer);
                outputStream.write(buffer, 0, count);
            }
            outputStream.close();
            final byte[] output = outputStream.toByteArray();
            return output;
        } catch (DataFormatException | IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(final String[] args) {
        roundTrip(100);
        roundTrip(1000);
        roundTrip(10000);
        roundTrip(20000);
        roundTrip(30000);
        roundTrip(32000);
        for (int i = 32700; i < 33000; i++) {
            if (!roundTrip(i)) break;
        }
    }

    private static boolean roundTrip(final int i) {
        System.out.printf("Starting round trip with size %d: ", i);
        final byte[] data = new byte[i];
        for (int j = 0; j < data.length; j++) {
            data[j] = (byte) j;
        }
        shuffleArray(data);
        final byte[] compressed = compress(data);
        try {
            final byte[] decompressed = CompletableFuture.supplyAsync(() -> decompress(compressed))
                    .get(2, TimeUnit.SECONDS);
            System.out.printf("Success (%s)%n", Arrays.equals(data, decompressed) ? "matching" : "non-matching");
            return true;
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            System.out.println("Failure!");
            return false;
        }
    }

    // Implementing Fisher–Yates shuffle
    // source: https://stackoverflow.com/a/1520212/342852
    static void shuffleArray(byte[] ar) {
        Random rnd = ThreadLocalRandom.current();
        for (int i = ar.length - 1; i > 0; i--) {
            int index = rnd.nextInt(i + 1);
            // Simple swap
            byte a = ar[index];
            ar[index] = ar[i];
            ar[i] = a;
        }
    }
}
Is this a known bug in ZLib? Or do I have an error in my compress / decompress routines?
It is an error in the logic of the compress/decompress methods; I am not deeply familiar with the implementation, but with some debugging I found the following:
When the buffer of 32752 bytes is compressed, the deflater.deflate() method returns a value of 32767, which is exactly the size to which you initialized the buffer in this line:
final byte[] bytesCompressed = new byte[Short.MAX_VALUE];
If you increase the buffer size, for example to
final byte[] bytesCompressed = new byte[4 * Short.MAX_VALUE];
then you will see that the input of 32752 bytes is actually deflated to 32768 bytes. So in your code, the compressed array does not contain all the data it should.
When you then try to decompress, the inflater.inflate() method returns zero, which indicates that more input data is needed. But since you only check inflater.finished(), you end up in an endless loop.
So you can either increase the buffer size when compressing, which probably just postpones the problem to bigger inputs, or, better, rewrite the compress/decompress logic to process the data in chunks.
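As an illustration of that last point, here is a sketch (not from the original post) of a decompress loop that cannot spin forever: if inflate() produces no output while the Inflater still needs input, the compressed data is truncated and the method fails fast.
private static byte[] decompressSafely(final byte[] data) {
    final Inflater inflater = new Inflater();
    inflater.setInput(data);
    try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream(data.length)) {
        final byte[] buffer = new byte[8192];
        while (!inflater.finished()) {
            final int count = inflater.inflate(buffer);
            if (count == 0 && inflater.needsInput()) {
                // no progress and no more input: the deflate stream is incomplete
                throw new IllegalStateException("Truncated deflate stream: more input required");
            }
            outputStream.write(buffer, 0, count);
        }
        return outputStream.toByteArray();
    } catch (DataFormatException | IOException e) {
        throw new RuntimeException(e);
    } finally {
        inflater.end();
    }
}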
Apparently the compress() method was faulty.
This one works:
public static byte[] compress(final byte[] data) {
    try (final ByteArrayOutputStream outputStream = new ByteArrayOutputStream(data.length)) {
        final Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        final byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            final int count = deflater.deflate(buffer);
            outputStream.write(buffer, 0, count);
        }
        final byte[] output = outputStream.toByteArray();
        return output;
    } catch (IOException e) {
        throw new IllegalStateException(e);
    }
}
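As an alternative, java.util.zip also provides DeflaterOutputStream, which performs the chunked deflation internally. A roughly equivalent compress() might look like this (a sketch, not tested against the original test harness):
public static byte[] compressWithStream(final byte[] data) {
    final ByteArrayOutputStream outputStream = new ByteArrayOutputStream(data.length);
    // DeflaterOutputStream compresses whatever is written to it and emits the
    // final deflate blocks when closed, so no manual deflate() loop is needed.
    try (DeflaterOutputStream deflaterStream = new DeflaterOutputStream(outputStream)) {
        deflaterStream.write(data);
    } catch (IOException e) {
        throw new IllegalStateException(e);
    }
    return outputStream.toByteArray();
}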
I want to convert an audio file into a byte array. I have currently done it like this, and I want to know if it works:
private static AudioFormat getFormat() {
    float sampleRate = 44100;
    int sampleSizeInBits = 16;
    int channels = 1;
    boolean signed = true;
    boolean bigEndian = true;
    return new AudioFormat(sampleRate, sampleSizeInBits, channels, signed, bigEndian);
}

public static byte[] listenSound(File f) {
    AudioInputStream din = null;
    AudioInputStream outDin = null;
    PCM2PCMConversionProvider conversionProvider = new PCM2PCMConversionProvider();
    try {
        AudioInputStream in = AudioSystem.getAudioInputStream(f);
        AudioFormat baseFormat = in.getFormat();
        AudioFormat decodedFormat = new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED,
                baseFormat.getSampleRate(),
                16,
                baseFormat.getChannels(),
                baseFormat.getChannels() * 2,
                baseFormat.getSampleRate(),
                false);
        din = AudioSystem.getAudioInputStream(decodedFormat, in);
        if (!conversionProvider.isConversionSupported(getFormat(), decodedFormat)) {
            System.out.println("Conversion Not Supported.");
            System.exit(-1);
        }
        outDin = conversionProvider.getAudioInputStream(getFormat(), din);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n = 0;
        byte[] buffer = new byte[1024];
        while (true) {
            n++;
            if (n > 1000)
                break;
            int count = 0;
            count = outDin.read(buffer, 0, 1024);
            if (count > 0) {
                out.write(buffer, 0, count);
            }
        }
        in.close();
        din.close();
        outDin.close();
        out.flush();
        out.close();
        //byte[] b = out.toByteArray();
        //for (int i = 0; i < b.length; i++)
        //    System.out.println("b = " + b[i]);
        return out.toByteArray();
    } catch (Exception e) {
        e.printStackTrace();
    }
    return null;
}
That byte array is the time-domain data; I need to be sure it is correct before transforming it into the frequency domain with a discrete Fourier transform.
Thanks for your help!
Typically the actual number of bytes processed is returned from a call like
count = outDin.read(buffer, 0, 1024);
so in addition to your current hard break after processing 1000 chunks, if the API does in fact return a byte count, you should check it:
int size_chunk = 1024;
byte[] buffer = new byte[size_chunk];
boolean keep_streaming = true;
int n = 0;
while (keep_streaming) {
    n++;
    if (n > 1000) { // troubleshooting ONLY remove later
        keep_streaming = false;
    }
    int count = outDin.read(buffer, 0, size_chunk);
    if (count > 0) {
        out.write(buffer, 0, count);
    }
    if (count < size_chunk) { // input stream has been consumed
        keep_streaming = false;
    }
}
You did not supply a link to the API doc, so I cannot confirm, but assuming outDin.read returns the number of bytes actually read, the code above will output only the bytes that match the input, and so will produce a smaller output if the input is less than 1 MB of data. (Your original logic blindly generated a 1 MB output and stopped only after seeing 1000 chunks.) It also assumes you intend to truncate the input after 1 MB of data, as per these lines:
if (n > 1000) { // troubleshooting ONLY remove later
    keep_streaming = false;
}
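For reference, the conventional way to detect end of stream on any InputStream (AudioInputStream included) is to loop until read() returns -1. A minimal sketch using the same outDin and out variables as above:
byte[] buffer = new byte[1024];
int count;
// read() returns -1 only when the stream is exhausted, so no chunk counter is needed
while ((count = outDin.read(buffer, 0, buffer.length)) != -1) {
    out.write(buffer, 0, count);
}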
I want to return a file (read or loaded) from a method and then remove this file.
public File method() {
    File f = loadFile();
    f.delete();
    return f;
}
But when I delete the file, it is removed from disk, and at the return statement only a handle to a non-existent file remains. So what is the most effective way to do this?
You can't keep a File handle to a deleted file; instead, you can read the data into a byte array temporarily, delete the file, and then return the byte array:
public byte[] method() throws IOException {
    File f = loadFile();
    byte[] data;
    try (FileInputStream fis = new FileInputStream(f)) {
        data = new byte[(int) f.length()];
        fis.read(data);
    }
    f.delete(); // delete only after the stream is closed
    return data;
}
// Edit: Approach 2
FileInputStream input = new FileInputStream(f);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buf = new byte[1024];
int bytesRead = input.read(buf);
while (bytesRead != -1) {
    baos.write(buf, 0, bytesRead);
    bytesRead = input.read(buf);
}
baos.flush();
byte[] bytes = baos.toByteArray();
You can reconstruct the file data from the byte array later if needed.
However, my suggestion is to use IOUtils.toByteArray(InputStream input) from Jakarta Commons; why rewrite what is already available? (A java.nio alternative is sketched below.)
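If adding a dependency is not an option, java.nio.file.Files can do the same thing in two calls. A minimal sketch of the whole method, assuming the file fits comfortably in memory as in the approaches above:
public byte[] method() throws IOException {
    File f = loadFile();
    byte[] data = Files.readAllBytes(f.toPath()); // read the whole file into memory
    Files.delete(f.toPath());                     // then remove it from disk
    return data;
}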
Assuming you want to return the file to the browser, this is how I did it:
File pdf = new File("file.pdf");
if (pdf.exists()) {
    try {
        InputStream inputStream = new FileInputStream(pdf);
        httpServletResponse.setContentType("application/pdf");
        httpServletResponse.addHeader("content-disposition", "inline;filename=file.pdf");
        copy(inputStream, httpServletResponse.getOutputStream());
        inputStream.close();
        pdf.delete();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
private static int copy(InputStream input, OutputStream output) throws IOException {
    byte[] buffer = new byte[512];
    int count = 0;
    int n = 0;
    while (-1 != (n = input.read(buffer))) {
        output.write(buffer, 0, n);
        count += n;
    }
    return count;
}
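On Java 9 and later, the manual copy() helper can be replaced by InputStream.transferTo(OutputStream). A sketch of the same response-writing block under that assumption:
try (InputStream inputStream = new FileInputStream(pdf)) {
    httpServletResponse.setContentType("application/pdf");
    httpServletResponse.addHeader("content-disposition", "inline;filename=file.pdf");
    // transferTo copies all remaining bytes to the response output stream
    inputStream.transferTo(httpServletResponse.getOutputStream());
}
pdf.delete();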