FileChannel and ByteBuffer writing extra data - java

I am creating a method that will take in a file and split it into shardCount pieces and generate a parity file.
When I run this method, it appears that I am writing out extra data into my parity file. This is my first time using FileChannel and ByteBuffers, so I'm not certain I completely understand how to use them despite staring at the documentation for about 8 hours.
This code is a simplified version of the parity section.
public static void splitAndGenerateParityFile(File file, int shardCount, String fileID) throws IOException {
RandomAccessFile rin = new RandomAccessFile(file, "r");
FileChannel fcin = rin.getChannel();
//Create parity files
File parity = new File(fileID + "_parity");
if (parity.exists()) throw new FileAlreadyExistsException("Could not create parity file! File already exists!");
RandomAccessFile parityRAF = new RandomAccessFile(parity, "rw");
FileChannel parityOut = parityRAF.getChannel();
long bytesPerFile = (long) Math.ceil(rin.length() / shardCount);
//Make buffers for each section of the file we will be reading from
for (int i = 0; i < shardCount; i++) {
ByteBuffer bb = ByteBuffer.allocate(1024);
shardBuffers.add(bb);
}
ByteBuffer parityBuffer = ByteBuffer.allocate(1024);
//Generate parity
boolean isParityBufferEmpty = true;
for (long i = 0; i < bytesPerFile; i++) {
isParityBufferEmpty = false;
int pos = (int) (i % 1024);
byte p = 0;
if (pos == 0) {
//Read chunk of file into each buffer
for (int j = 0; j < shardCount; j++) {
ByteBuffer bb = shardBuffers.get(j);
bb.clear();
fcin.read(bb, bytesPerFile * j + i);
bb.rewind();
}
//Dump parity buffer
if (i > 0) {
parityBuffer.rewind();
parityOut.write(parityBuffer);
parityBuffer.clear();
isParityBufferEmpty = true;
}
}
//Get parity
for (ByteBuffer bb : shardBuffers) {
if (pos >= bb.limit()) break;
p ^= bb.get(pos);
}
//Put parity in buffer
parityBuffer.put(pos, p);
}
if (!isParityBufferEmpty) {
parityBuffer.rewind();
parityOut.write(parityBuffer);
parityBuffer.clear();
}
fcin.close();
rin.close();
parityOut.close();
parityRAF.close();
}
Please let me know if there is anything wrong with either the parity algorithm or the file IO, or if there's anything I can do to optimize this. I'm happy to hear about other (better) ways of doing file IO.

Here is the solution I found (though it may need more tuning):
public static void splitAndGenerateParityFile(File file, int shardCount, String fileID) throws IOException {
int BUFFER_SIZE = 4 * 1024 * 1024;
RandomAccessFile rin = new RandomAccessFile(file, "r");
FileChannel fcin = rin.getChannel();
//Create parity files
File parity = new File(fileID + "_parity");
if (parity.exists()) throw new FileAlreadyExistsException("Could not create parity file! File already exists!");
RandomAccessFile parityRAF = new RandomAccessFile(parity, "rw");
FileChannel parityOut = parityRAF.getChannel();
//Create shard files
ArrayList<File> shards = new ArrayList<>(shardCount);
for (int i = 0; i < shardCount; i++) {
File f = new File(fileID + "_part_" + i);
if (f.exists()) throw new FileAlreadyExistsException("Could not create shard file! File already exists!");
shards.add(f);
}
long bytesPerFile = (rin.length() + shardCount - 1) / shardCount; //Ceiling division; the long division inside Math.ceil truncated before rounding
ArrayList<ByteBuffer> shardBuffers = new ArrayList<>(shardCount);
//Make buffers for each section of the file we will be reading from
for (int i = 0; i < shardCount; i++) {
ByteBuffer bb = ByteBuffer.allocate(BUFFER_SIZE);
shardBuffers.add(bb);
}
ByteBuffer parityBuffer = ByteBuffer.allocate(BUFFER_SIZE);
//Generate parity
boolean isParityBufferEmpty = true;
for (long i = 0; i < bytesPerFile; i++) {
isParityBufferEmpty = false;
int pos = (int) (i % BUFFER_SIZE);
byte p = 0;
if (pos == 0) {
//Read chunk of file into each buffer
for (int j = 0; j < shardCount; j++) {
ByteBuffer bb = shardBuffers.get(j);
bb.clear();
fcin.position(bytesPerFile * j + i);
fcin.read(bb);
bb.flip();
}
//Dump parity buffer
if (i > 0) {
parityBuffer.flip();
while (parityBuffer.hasRemaining()) {
parityOut.write(parityBuffer);
}
parityBuffer.clear();
isParityBufferEmpty = true;
}
}
//Get parity
for (ByteBuffer bb : shardBuffers) {
if (!bb.hasRemaining()) break;
p ^= bb.get();
}
//Put parity in buffer
parityBuffer.put(p);
}
if (!isParityBufferEmpty) {
parityBuffer.flip();
while (parityBuffer.hasRemaining()) {
parityOut.write(parityBuffer);
}
parityBuffer.clear();
}
fcin.close();
rin.close();
parityOut.close();
parityRAF.close();
}
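As suggested by VGR, I replaced rewind() with flip(). I also switched from absolute to relative operations. The absolute get and put methods do not move the buffer's position, and rewind() resets the position without changing the limit, so every write sent the whole 1024-byte buffer to the parity file no matter how many bytes had actually been filled in; that was likely the cause of the extra data. I also increased the buffer size to 4 MB because I am interested in generating parity for large files.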
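To make the rewind() versus flip() difference concrete, here is a minimal sketch that only uses an in-memory ByteBuffer. The class and method names are made up for illustration; the point is how many bytes a subsequent channel write would see as remaining in each case.

```java
import java.nio.ByteBuffer;

class FlipVersusRewind {
    // Absolute puts followed by rewind(): the pattern from the question.
    static int remainingAfterAbsolutePutAndRewind(int capacity, int bytesWritten) {
        ByteBuffer buf = ByteBuffer.allocate(capacity);
        for (int i = 0; i < bytesWritten; i++) {
            buf.put(i, (byte) 1);   // absolute put: position stays at 0
        }
        buf.rewind();               // position = 0, limit is left at capacity
        return buf.remaining();     // what a channel write would send
    }

    // Relative puts followed by flip(): the pattern from the fix.
    static int remainingAfterRelativePutAndFlip(int capacity, int bytesWritten) {
        ByteBuffer buf = ByteBuffer.allocate(capacity);
        for (int i = 0; i < bytesWritten; i++) {
            buf.put((byte) 1);      // relative put: position advances by one
        }
        buf.flip();                 // limit = old position, position = 0
        return buf.remaining();
    }

    public static void main(String[] args) {
        System.out.println(remainingAfterAbsolutePutAndRewind(1024, 10)); // 1024
        System.out.println(remainingAfterRelativePutAndFlip(1024, 10));   // 10
    }
}
```

With only 10 parity bytes filled in, the rewind() variant still exposes all 1024 bytes to the write, which matches the extra data seen in the parity file.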

Related

Incomplete file returned by GridFS

I'm working on a Java project to store and retrieve files from MongoDB using GridFS specification. I'm using the code snippets provided in MongoDB Java driver documentation from https://mongodb.github.io/mongo-java-driver/4.1/driver/tutorials/gridfs/.
While using openDownloadStream to retrieve a file, I noticed that if the file is divided into more than one chunk, only the first chunk is returned instead of the full file.
ObjectId fileId;
GridFSDownloadStream downloadStream = gridFSBucket.openDownloadStream(fileId);
int fileLength = (int) downloadStream.getGridFSFile().getLength();
byte[] bytesToWriteTo = new byte[fileLength];
downloadStream.read(bytesToWriteTo); /*read file contents */
downloadStream.close();
System.out.println(new String(bytesToWriteTo, StandardCharsets.UTF_8));
Any solutions to this?
Looking at the class GridFSDownloadStreamImpl which implements GridFSDownloadStream, it looks like the method read(byte[]) reads chunk by chunk:
@Override
public int read(final byte[] b) {
return read(b, 0, b.length);
}
@Override
public int read(final byte[] b, final int off, final int len) {
checkClosed();
if (currentPosition == length) {
return -1;
} else if (buffer == null) {
buffer = getBuffer(chunkIndex);
} else if (bufferOffset == buffer.length) {
chunkIndex += 1;
buffer = getBuffer(chunkIndex);
bufferOffset = 0;
}
int r = Math.min(len, buffer.length - bufferOffset);
System.arraycopy(buffer, bufferOffset, b, off, r);
bufferOffset += r;
currentPosition += r;
return r;
}
Therefore, you have to loop until all expected bytes are actually read:
byte[] bytesToWriteTo = new byte[fileLength];
int bytesRead = 0;
while(bytesRead < fileLength) {
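int newBytesRead = downloadStream.read(bytesToWriteTo, bytesRead, fileLength - bytesRead); // continue after the bytes already read instead of overwriting from offset 0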
if(newBytesRead == -1) {
throw new Exception();
}
bytesRead += newBytesRead;
}
downloadStream.close();
Note that I was not able to test above code so please use with caution.
I ended up using the readAllBytes() method, which returns the whole file.
GridFSDownloadStream downloadStream = gridFSBucket.openDownloadStream(fileId);
byte[] bytesToWriteTo = downloadStream.readAllBytes(); // no need to pre-allocate an array; readAllBytes() returns a new one
downloadStream.close();
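The same behaviour can be reproduced without MongoDB, since any InputStream is allowed to return fewer bytes than requested. The sketch below is only an illustration: `chunked` is a hypothetical helper that mimics a stream handing out at most one chunk per call, and `readFully` is the loop from above written against a plain InputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class ReadFullyExample {
    // Keeps calling read() until exactly `length` bytes have arrived.
    static byte[] readFully(InputStream in, int length) throws IOException {
        byte[] result = new byte[length];
        int total = 0;
        while (total < length) {
            // Offset and remaining length change on every pass.
            int n = in.read(result, total, length - total);
            if (n == -1) {
                throw new EOFException("Stream ended after " + total + " of " + length + " bytes");
            }
            total += n;
        }
        return result;
    }

    // Hypothetical stand-in for a chunked source: at most chunkSize bytes per read.
    static InputStream chunked(byte[] data, int chunkSize) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, chunkSize));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        byte[] once = new byte[data.length];
        // A single read only delivers the first chunk.
        System.out.println(chunked(data, 3).read(once, 0, once.length));
        // The loop delivers everything.
        System.out.println(readFully(chunked(data, 3), data.length).length);
    }
}
```

On Java 9 or newer, InputStream.readNBytes and readAllBytes do this looping for you.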

Huffman Code writing bits to a file for compression

I was asked to use Huffman coding to compress an input file and write the result to an output file. I have finished implementing the Huffman tree structure and generating the Huffman codes, but I don't know how to write those codes to a file so that it ends up smaller than the original.
Right now I have the codes as strings (e.g. the Huffman code for 'c' is "0100"). Could someone please help me write those bits to a file?
Here is a possible implementation for writing a stream of bits (the output of Huffman coding) to a file.
import java.io.IOException;
import java.io.OutputStream;
class BitOutputStream {
private final OutputStream out;
private int buffer = 0; // bits collected so far; the first bit written ends up most significant
private int count = 0; // number of bits currently held in buffer
public BitOutputStream(OutputStream out) {
this.out = out;
}
public void write(boolean x) throws IOException {
this.buffer = (this.buffer << 1) | (x ? 1 : 0);
this.count++;
if (this.count == 8) {
this.out.write(this.buffer); // write(int) emits the low 8 bits
this.buffer = 0;
this.count = 0;
}
}
public void close() throws IOException {
if (this.count > 0) {
// pad the final partial byte with zero bits on the right
this.out.write(this.buffer << (8 - this.count));
}
this.out.close();
}
}
By calling the write method you will be able to write to a file (OutputStream) bit by bit.
Edit
For your specific problem, to save each character's Huffman code you can simply use this if you don't want to use some other fancy class:
String huffmanCode = "0100"; // lets say its huffman coding output for c
BitSet huffmanCodeBit = new BitSet(huffmanCode.length());
for (int i = 0; i < huffmanCode.length(); i++) {
if(huffmanCode.charAt(i) == '1')
huffmanCodeBit.set(i);
}
String path = Resources.getResource("myfile.out").getPath();
try (ObjectOutputStream outputStream = new ObjectOutputStream(new FileOutputStream(path))) {
outputStream.writeObject(huffmanCodeBit); // try-with-resources flushes and closes the stream
} catch (IOException e) {
e.printStackTrace();
}
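If the codes are already available as strings of '0' and '1', another option is to pack them straight into bytes. This is only a sketch under that assumption; `BitPacking` and `packBits` are illustrative names, and the last byte is padded with zero bits on the right.

```java
import java.io.ByteArrayOutputStream;

class BitPacking {
    // Packs a string of '0'/'1' characters into bytes, first character = most significant bit.
    static byte[] packBits(String bits) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int current = 0; // bits collected for the byte in progress
        int count = 0;   // how many bits `current` holds
        for (int i = 0; i < bits.length(); i++) {
            current = (current << 1) | (bits.charAt(i) == '1' ? 1 : 0);
            count++;
            if (count == 8) {
                out.write(current);
                current = 0;
                count = 0;
            }
        }
        if (count > 0) {
            // Final partial byte: shift the bits to the top, zero-fill the rest.
            out.write(current << (8 - count));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] packed = packBits("0100");
        System.out.println(packed.length);                          // 1
        System.out.println(Integer.toHexString(packed[0] & 0xFF));  // 40
    }
}
```

The resulting array can be written with any OutputStream. Because of the padding, a decoder also needs to know how many bits are valid, for example by storing the bit count or the original text length in a header.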

write/read variable byte encoded string representation to/from file in JAVA

Hi everyone! I recently learned about variable byte encoding.
For example, if a file contains this sequence of numbers: 824 5 214577
then applying variable byte encoding produces 000001101011100010000101000011010000110010110001.
Now I want to know how to write that to another file so as to produce a compressed version of the original, and similarly how to read it back. I'm using Java.
I have tried this:
LinkedList<Integer> numbers = new LinkedList<Integer>();
numbers.add(824);
numbers.add(5);
numbers.add(214577);
String code = VBEncoder.encodeToString(numbers);//returns 000001101011100010000101000011010000110010110001 into code
File file = new File("test.compressed");
DataOutputStream out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(file)));
out.writeBytes(code);
out.flush();
This just writes the binary representation into the file as text, one character per bit, which is not what I'm expecting.
I have also tried this:
LinkedList<Byte> code = VBEncoder.encode(numbers);//returns a linked list of Byte (the class is described below)
File file = new File("test.compressed");
DataOutputStream out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(file)));
for(Byte b:code){
out.write(b.toInt());
System.out.println(b.toInt());
}
out.flush();
// here is the description of the class Byte
class Byte {
int[] abyte;
Byte() {
abyte = new int[8];
}
public void readInt(int n) {
String bin = Integer.toBinaryString(n);
for (int i = 0; i < (8 - bin.length()); i++) {
abyte[i] = 0;
}
for (int i = 0; i < bin.length(); i++) {
abyte[i + (8 - bin.length())] = bin.charAt(i) - 48;
}
}
public void switchFirst() {
abyte[0] = 1;
}
public int toInt() {
int res = 0;
for (int i = 0; i < 8; i++) {
res += abyte[i] * Math.pow(2, (7 - i));
}
return res;
}
public static Byte fromString(String codestring) {
Byte b = new Byte();
for(int i=0; i < 8; i++)
b.abyte[i] = (codestring.charAt(i)=='0')?0:1;
return b;
}
public String toString() {
String res = "";
for (int i = 0; i < 8; i++) {
res += abyte[i];
}
return res;
}
}
It prints this in the console:
6
184
133
13
12
177
This second attempt seems to work: the output file is 6 bytes, whereas the first attempt produced 48 bytes.
The problem with the second attempt is that I can't read the file back successfully.
InputStreamReader inStream = new InputStreamReader(new FileInputStream(file));
int c = -1;
while((c = inStream.read()) != -1){
System.out.println( c );
}
I get this:
6
184
8230
13
12
177
So maybe I'm doing it the wrong way. I'd appreciate any advice. Thanks!
It is solved; I was just not reading the file the right way. InputStreamReader decodes the bytes into characters using a charset, which is why the byte 133 came back as 8230. Reading the raw bytes works:
DataInputStream inStream = null;
inStream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
int c = -1;
while((c = inStream.read()) != -1){
Byte b = new Byte();
b.readInt(c);
System.out.println( c +":" + b.toString());
}
Now I get this result:
6:00000110
184:10111000
133:10000101
13:00001101
12:00001100
177:10110001
The point of writing the original sequence of integers as variable-byte-encoded bytes is that it reduces the size of the file: written as plain 4-byte ints the sequence would take 12 bytes (3 * 4 bytes), but now it takes just 6 bytes.
To get the numbers back, I collect the bytes and decode them:
int c = -1;
LinkedList<Byte> bytestream = new LinkedList<Byte>();
while((c = inStream.read()) != -1){
Byte b = new Byte();
b.readInt(c);
bytestream.add(b);
}
LinkedList<Integer> numbers = VBEncoder.decode(bytestream);
for(Integer number:numbers) System.out.println(number);
//here is the code of VBEncoder.decode
public static LinkedList<Integer> decode(LinkedList<Byte> code) {
LinkedList<Integer> numbers = new LinkedList<Integer>();
int n = 0;
while (!code.isEmpty()) {
Byte b = code.poll();
int bi = b.toInt();
if (bi < 128) {
n = 128 * n + bi;
} else {
n = 128 * n + (bi - 128);
numbers.add(n);
n = 0;
}
}
return numbers;
}
I get back the sequence:
824
5
214577
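For comparison, the same scheme can be written directly against byte arrays, without the bit-array Byte class. This is a sketch rather than the VBEncoder used above (whose encode method is not shown); it assumes non-negative ints and follows the same convention, where the last byte of each number has its high bit set.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class VariableByteCodec {
    // Encodes each number as 7-bit groups, most significant group first;
    // the final byte of a number is marked by setting its high bit.
    static byte[] encode(List<Integer> numbers) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int n : numbers) {
            int[] groups = new int[5]; // an int needs at most five 7-bit groups
            int count = 0;
            do {
                groups[count++] = n & 0x7F; // collected least significant first
                n >>>= 7;
            } while (n != 0);
            groups[0] |= 0x80; // least significant group is written last
            for (int i = count - 1; i >= 0; i--) {
                out.write(groups[i]);
            }
        }
        return out.toByteArray();
    }

    // Mirrors the decode logic above: accumulate until a byte >= 128 ends the number.
    static List<Integer> decode(byte[] code) {
        List<Integer> numbers = new ArrayList<>();
        int n = 0;
        for (byte b : code) {
            int value = b & 0xFF; // treat the byte as unsigned
            if (value < 128) {
                n = (n << 7) | value;
            } else {
                n = (n << 7) | (value - 128);
                numbers.add(n);
                n = 0;
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        byte[] code = encode(Arrays.asList(824, 5, 214577));
        for (byte b : code) {
            System.out.println(b & 0xFF); // 6, 184, 133, 13, 12, 177
        }
        System.out.println(decode(code)); // [824, 5, 214577]
    }
}
```

The byte array can be written with a FileOutputStream and read back with a FileInputStream; as noted above, the reading side must use a byte stream rather than a Reader.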

Compare files block by block (bytes) java

I am trying to compare two files block by block. If two blocks are equal, I get the next blocks and compare those.
If the final blocks are equal, return true; in every other case, return false.
I don't understand how to get the next block correctly or how to detect the end of the file.
private static boolean getBlocks(File file1, File file2, int count) throws IOException {
RandomAccessFile raf1 = new RandomAccessFile(file1, "r");
RandomAccessFile raf2 = new RandomAccessFile(file2, "r");
int point = count * 512;
FileChannel fc1 = raf1.getChannel();
FileChannel fc2 = raf2.getChannel();
MappedByteBuffer buffer1 = fc1.map(FileChannel.MapMode.READ_ONLY, point, 512);
MappedByteBuffer buffer2 = fc2.map(FileChannel.MapMode.READ_ONLY, point, 512);
byte[] bytes1 = new byte[512];
byte[] bytes2 = new byte[512];
buffer1.get(bytes1);
buffer2.get(bytes2);
if (bytes1.length == bytes2.length) {
for (int i = 0; i < bytes1.length; i++) {
if(bytes1[i] != bytes2[i]) {
return false;
}
}
if (true) {
count++;
getBlocks(file1, file2, point);
}
}
buffer1.clear();
buffer2.clear();
return true;
}
This shows how to read a file byte by byte until EOF is reached: http://www.java2s.com/Code/Java/File-Input-Output/Testingforendoffilewhilereadingabyteatatime.htm
When using MappedByteBuffer this should be the answer: https://stackoverflow.com/a/12509314/1321564
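As an alternative to mapping and recursion, the comparison can be done with a plain loop over FileChannel reads. The following is a minimal sketch, not taken from either link; `BlockCompare`, `sameContent` and `fill` are illustrative names, and the 512-byte block size is kept from the question. End of file is detected by read() returning -1.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class BlockCompare {
    static final int BLOCK_SIZE = 512;

    // Returns true only if both files have the same length and identical blocks.
    static boolean sameContent(Path a, Path b) throws IOException {
        try (FileChannel ca = FileChannel.open(a, StandardOpenOption.READ);
             FileChannel cb = FileChannel.open(b, StandardOpenOption.READ)) {
            if (ca.size() != cb.size()) {
                return false; // different lengths can never match
            }
            ByteBuffer ba = ByteBuffer.allocate(BLOCK_SIZE);
            ByteBuffer bb = ByteBuffer.allocate(BLOCK_SIZE);
            while (true) {
                boolean endA = fill(ca, ba);
                fill(cb, bb);
                ba.flip();
                bb.flip();
                if (!ba.equals(bb)) {
                    return false; // equals() compares the remaining bytes
                }
                if (endA) {
                    return true;  // last (possibly partial or empty) block matched
                }
                ba.clear();
                bb.clear();
            }
        }
    }

    // Reads until the buffer is full or EOF; returns true once EOF was seen.
    static boolean fill(FileChannel ch, ByteBuffer buf) throws IOException {
        while (buf.hasRemaining()) {
            if (ch.read(buf) == -1) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sameContent(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```

Each pass advances the channel position automatically, so there is no block counter to maintain, and the loop stops as soon as a block differs.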

Inserting an image to a particular position in a word document using docx4j

I want to add an image to particular position in my word document using docx4j. I don't want inline insertion. The code below performs adding the image inline with text. But I want floating insertion where I can explicitly give the location of where the image should be placed in the page. Please help me.
public R addUserPic(P parag, WordprocessingMLPackage wordMLPackage)
throws Exception {
File file = new File("src/main/resources/PictureNew.png");
byte[] bytes = convertImageToByteArray(file);
BinaryPartAbstractImage imagePart = BinaryPartAbstractImage
.createImagePart(wordMLPackage, bytes);
int docPrId = 1;
int cNvPrId = 2;
Inline inline = imagePart.createImageInline("Filename hint",
"Alternative text", docPrId, cNvPrId, false);
ObjectFactory factory = new ObjectFactory();
R run = factory.createR();
org.docx4j.wml.Drawing drawing = factory.createDrawing();
run.getContent().add(drawing);
drawing.getAnchorOrInline().add(inline);
return run;
}
private static byte[] convertImageToByteArray(File file)
throws FileNotFoundException, IOException {
InputStream is = new FileInputStream(file);
long length = file.length();
if (length > Integer.MAX_VALUE) {
throw new IOException("File too large: " + file.getName()); // a byte array cannot hold more than Integer.MAX_VALUE bytes
}
byte[] bytes = new byte[(int) length];
int offset = 0;
int numRead = 0;
while (offset < bytes.length
&& (numRead = is.read(bytes, offset, bytes.length - offset)) >= 0) {
offset += numRead;
}
if (offset < bytes.length) {
System.out.println("Could not completely read file "
+ file.getName());
}
is.close();
return bytes;
}
The thread you have cross posted in, at http://www.docx4java.org/forums/docx-java-f6/how-to-create-a-floating-image-t1224.html answers your question.
