Huffman Code writing bits to a file for compression - java

I was asked to use Huffman coding to compress an input file and write the result to an output file. I have finished implementing the Huffman tree and generating the Huffman codes, but I don't know how to write those codes to a file so that the file is smaller than the original.
Right now I have the codes as strings (e.g. the Huffman code for 'c' is "0100"). Can someone please help me write those bits to a file?

Here is a possible implementation for writing a stream of bits (the output of Huffman coding) to a file:
import java.io.IOException;
import java.io.OutputStream;

class BitOutputStream implements AutoCloseable {
    private final OutputStream out;
    private int currentByte = 0; // bits collected so far, first bit ends up as the MSB
    private int count = 0;       // number of bits currently in currentByte

    public BitOutputStream(OutputStream out) {
        this.out = out;
    }

    // Append one bit; a full byte is written every time 8 bits have been collected.
    public void write(boolean x) throws IOException {
        currentByte = (currentByte << 1) | (x ? 1 : 0);
        count++;
        if (count == 8) {
            out.write(currentByte); // write(int) keeps the low 8 bits, no offset needed
            currentByte = 0;
            count = 0;
        }
    }

    // Pad the last partial byte with zero bits (only if there is one), then close.
    @Override
    public void close() throws IOException {
        if (count > 0) {
            out.write(currentByte << (8 - count));
        }
        out.close();
    }
}
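By calling write you can write to the file (OutputStream) bit by bit. The last byte written by close() contains padding bits, so the decoder also needs to know the total number of bits, for example by storing it at the start of the file.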
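To decode the file you also need to read the bits back one at a time. A minimal sketch of a matching reader, assuming the writer packs the first bit of each byte into its most significant position; BitInputStream is a hypothetical class name, not a JDK class:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical reader: returns the bits of each byte, most significant bit first.
class BitInputStream implements AutoCloseable {
    private final InputStream in;
    private int currentByte = 0; // byte currently being consumed
    private int bitsLeft = 0;    // bits of currentByte not returned yet

    BitInputStream(InputStream in) {
        this.in = in;
    }

    // Returns 0 or 1, or -1 once the underlying stream is exhausted.
    int readBit() throws IOException {
        if (bitsLeft == 0) {
            currentByte = in.read();
            if (currentByte == -1) {
                return -1;
            }
            bitsLeft = 8;
        }
        bitsLeft--;
        return (currentByte >>> bitsLeft) & 1;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}

class BitInputStreamDemo {
    public static void main(String[] args) throws IOException {
        // 0xB0 is 1011 0000 in binary
        try (BitInputStream bits = new BitInputStream(
                new ByteArrayInputStream(new byte[] {(byte) 0xB0}))) {
            StringBuilder sb = new StringBuilder();
            int b;
            while ((b = bits.readBit()) != -1) {
                sb.append(b);
            }
            System.out.println(sb); // prints 10110000
        }
    }
}
```

The reader returns the padding bits too, which is why the total bit count has to be stored somewhere.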
Edit
For your specific problem, to store each character's Huffman code you can simply use BitSet if you don't want to write a custom class:
String huffmanCode = "0100"; // let's say this is the Huffman code for 'c'
BitSet huffmanCodeBit = new BitSet(huffmanCode.length());
for (int i = 0; i < huffmanCode.length(); i++) {
    if (huffmanCode.charAt(i) == '1') {
        huffmanCodeBit.set(i);
    }
}
// Write the raw bits, not a serialized BitSet: ObjectOutputStream.writeObject adds
// serialization headers that make the file larger. toByteArray() stores bit i in
// bit (i % 8) of byte i / 8 and drops trailing zero bits, so the code length must
// be stored separately to decode it.
String path = "myfile.out";
try (FileOutputStream outputStream = new FileOutputStream(path)) {
    outputStream.write(huffmanCodeBit.toByteArray());
} catch (IOException e) {
    e.printStackTrace();
}

Related

huffman code encoder - Write to output file

I am a second-year computer science student. I was asked to prepare a project on Huffman coding.
I got stuck while building the encoder: I receive a file and have to encode it in bytes according to the Huffman code.
My question is how to encode the file in bytes. Here is what I did:
For example, the input file contains "abca cadbara". I wrote the encoding to another file, but as a string of '0' and '1' characters rather than as bytes.
Here is the relevant part of the code:
public static void writeOutputFile (String[] input_names, String[] output_names, Map<Character, String> codes)
{
FileInputStream input;
FileOutputStream output;
try
{
input = new FileInputStream(input_names[0]);
output = new FileOutputStream(output_names[0]);
for (int i = 0; i < (int) input.getChannel().size(); i++)
{
int x = input.read();
String codeOutput = codes.get((char) x);
//output.write(Integer.parseInt(codeOutput, 2));
for (int j = 0; j < codeOutput.length(); j++) {
output.write((int) codeOutput.charAt(j));
}
}
input.close();
output.close();
}
catch (Exception e)
{
e.printStackTrace();
}
}
How can I use bytes and not the string?
Thanks for the help.
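public static void writeOutputFile(String[] input_names,
                                   String[] output_names,
                                   Map<Character, String> codes) {
    try (FileInputStream input = new FileInputStream(input_names[0]);
         FileOutputStream output = new FileOutputStream(output_names[0])) {
        // Collect the codes of all input bytes into one string of '0'/'1' characters.
        StringBuilder bits = new StringBuilder();
        int x;
        while ((x = input.read()) != -1) {
            bits.append(codes.get((char) x));
        }
        // Pack every 8 code bits into one real byte (first bit = most significant);
        // the last byte is padded with zero bits.
        byte[] packed = new byte[(bits.length() + 7) / 8];
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') {
                packed[i / 8] |= (byte) (0x80 >>> (i % 8));
            }
        }
        output.write(packed); // one write call instead of one per character
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Don't call String.getBytes() on the bit string: it writes one byte per '0' or '1' character, so the output is about eight times larger than the packed bits. Pack every 8 code bits into one byte instead, and store the number of valid bits as well (for example at the start of the file) so the decoder can ignore the padding.
Use try-with-resources so you don't have to close the resources yourself; separate multiple resources with ;.
Don't write one byte at a time to an unbuffered FileOutputStream, because every write is a separate I/O call. Build the data first and write it once, or wrap the stream in a BufferedOutputStream.
When concatenating in a loop, use StringBuilder to avoid creating new Strings.
I made your code a bit more concise; feel free to adapt it.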
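To check that the packed output really is smaller and can be decoded again, here is a self-contained round-trip sketch. The file layout (a 4-byte bit count from DataOutputStream.writeInt followed by the packed bits) is an assumption, and the code table in the demo is a hand-made prefix code for illustration, not the output of an actual Huffman tree:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Sketch: layout is [int bitCount][packed bits, first bit = MSB of the first byte].
class HuffmanBits {
    static byte[] encode(String text, Map<Character, String> codes) throws IOException {
        StringBuilder bits = new StringBuilder();
        for (char ch : text.toCharArray()) {
            bits.append(codes.get(ch));
        }
        byte[] packed = new byte[(bits.length() + 7) / 8];
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') {
                packed[i / 8] |= (byte) (0x80 >>> (i % 8));
            }
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(bits.length()); // lets the decoder skip the padding bits
            out.write(packed);
        }
        return bytes.toByteArray();
    }

    // Walk the bits and emit a character whenever the accumulated prefix is a code.
    static String decode(byte[] data, Map<Character, String> codes) throws IOException {
        Map<String, Character> reverse = new HashMap<>();
        codes.forEach((ch, code) -> reverse.put(code, ch));
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int bitCount = in.readInt();
            byte[] packed = new byte[(bitCount + 7) / 8];
            in.readFully(packed);
            StringBuilder result = new StringBuilder();
            StringBuilder current = new StringBuilder();
            for (int i = 0; i < bitCount; i++) {
                current.append((packed[i / 8] >>> (7 - i % 8)) & 1);
                Character ch = reverse.get(current.toString());
                if (ch != null) {
                    result.append(ch);
                    current.setLength(0);
                }
            }
            return result.toString();
        }
    }
}

class HuffmanBitsDemo {
    public static void main(String[] args) throws IOException {
        Map<Character, String> codes = new HashMap<>(); // illustrative prefix code
        codes.put('a', "0");
        codes.put('b', "10");
        codes.put('c', "110");
        codes.put('d', "1110");
        codes.put(' ', "11110");
        codes.put('r', "11111");
        byte[] encoded = HuffmanBits.encode("abca cadbara", codes);
        System.out.println(encoded.length); // 8: 4-byte header + 29 bits in 4 bytes
        System.out.println(HuffmanBits.decode(encoded, codes)); // abca cadbara
    }
}
```

The 12-character input shrinks to 8 bytes even with the header; for real files the header cost is negligible.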

How to use XOR to develop an OTPInputStream in Java

I want to develop an OTPInputStream in Java that extends InputStream, takes another input stream of key data, and provides an encrypting/decrypting input stream. I also need a test program that shows how to use OTPInputStream with XOR and arbitrary data.
I tried the code below, but I get this error:
java.io.FileInputStream cannot be cast to java.lang.CharSequence
What should I do here?
public class Bitwise_Encryption {
static String file = "" ;
static String key = "VFGHTrbg";
private static int[] encrypt(FileInputStream file, String key) {
int[] output = new int[((CharSequence) file).length()];
for(int i = 0; i < ((CharSequence) file).length(); i++) {
int o = (Integer.valueOf(((CharSequence) file).charAt(i)) ^ Integer.valueOf(key.charAt(i % (key.length() - 1)))) + '0';
output[i] = o;
}
return output;
}
private static String decrypt(int[] input, String key) {
String output = "";
for(int i = 0; i < input.length; i++) {
output += (char) ((input[i] - 48) ^ (int) key.charAt(i % (key.length() - 1)));
}
return output;
}
public static void main(String args[]) throws FileNotFoundException {
FileInputStream file = new FileInputStream("directory");
encrypt(file,key);
//decrypt();
int[] encrypted = encrypt(file,key);
System.out.println("Encrypted Data is :");
for(int i = 0; i < encrypted.length; i++)
System.out.printf("%d,", encrypted[i]);
System.out.println("");
System.out.println("---------------------------------------------------");
System.out.println("Decrypted Data is :");
System.out.println(decrypt(encrypted,key));
}
}
I think what you want is file.read() to read one byte at a time and file.getChannel().size() to get the size of the file. A FileInputStream is not a CharSequence, which is why the cast fails.
Try something like this:
private static int[] encrypt(FileInputStream file, String key) throws IOException {
    int fileSize = (int) file.getChannel().size(); // size() returns a long
    int[] output = new int[fileSize];
    for (int i = 0; i < output.length; i++) {
        char char1 = (char) file.read();
        // key.length() - 1 matches decrypt() in the question (the last key char is never used)
        int o = (char1 ^ key.charAt(i % (key.length() - 1))) + '0';
        output[i] = o;
    }
    return output;
}
You will have to add some error handling, because file.read() returns -1 when the end of the file has been reached. Also, reading one byte at a time means many I/O operations and can slow down performance. Instead, you can read the data into a buffer, like this:
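private static int[] encrypt(FileInputStream file, String key) throws IOException {
    int fileSize = (int) file.getChannel().size(); // size() returns a long
    int[] output = new int[fileSize];
    int read = 0;
    int offset = 0;
    byte[] buffer = new byte[1024];
    // read() returns -1 at end of file; also stop once output is full
    while ((read = file.read(buffer)) > 0 && offset < output.length) {
        for (int i = 0; i < read && i + offset < output.length; i++) {
            char char1 = (char) (buffer[i] & 0xFF); // unsigned, same as file.read()
            // use the position in the whole file (i + offset) to pick the key char
            int o = (char1 ^ key.charAt((i + offset) % (key.length() - 1))) + '0';
            output[i + offset] = o;
        }
        offset += read;
    }
    return output;
}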
This reads up to 1024 bytes at a time from the file into your buffer; then you can loop through the buffer to do your logic. The offset value stores the current position in output. You also have to make sure that i + offset doesn't exceed your array size.
UPDATE
After working with it, I decided to switch to Base64 encoding/decoding to avoid non-printable characters:
private static String encrypt(InputStream file, String key) throws Exception {
int read = 0;
byte[] buffer = new byte[1024];
try(ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
while((read = file.read(buffer)) > 0) {
baos.write(buffer, 0, read);
}
return base64Encode(xorWithKey(baos.toByteArray(), key.getBytes()));
}
}
private static String decrypt(String input, String key) {
byte[] decoded = base64Decode(input);
return new String(xorWithKey(decoded, key.getBytes()));
}
private static byte[] xorWithKey(byte[] a, byte[] key) {
byte[] out = new byte[a.length];
for (int i = 0; i < a.length; i++) {
out[i] = (byte) (a[i] ^ key[i%key.length]);
}
return out;
}
private static byte[] base64Decode(String s) {
return Base64.getDecoder().decode(s.trim());
}
private static String base64Encode(byte[] bytes) {
return Base64.getEncoder().encodeToString(bytes);
}
This approach is cleaner: it doesn't need to know the size of your InputStream, and it XORs raw bytes instead of converting them to chars. It copies the InputStream into a ByteArrayOutputStream, XORs the bytes with the key, and Base64-encodes the result so that it contains only printable characters.
I have tested this and it works both for encrypting and decrypting.
I got the idea from this answer:
XOR operation with two strings in java
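The question asks for an OTPInputStream that extends InputStream, which none of the code above actually does. A minimal sketch of that idea, assuming the key is supplied as a second InputStream that is at least as long as the data: it extends FilterInputStream (itself a subclass of InputStream) and XORs each data byte with the next key byte. Because XOR is its own inverse, the same class both encrypts and decrypts. The class name comes from the question; the rest (including the demo key and message) is illustrative:

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Sketch: XORs every byte of the wrapped stream with the next byte of the key stream.
class OTPInputStream extends FilterInputStream {
    private final InputStream key;

    OTPInputStream(InputStream data, InputStream key) {
        super(data);
        this.key = key;
    }

    private int nextKeyByte() throws IOException {
        int k = key.read();
        if (k == -1) {
            throw new IOException("key stream is shorter than the data");
        }
        return k;
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        return b == -1 ? -1 : (b ^ nextKeyByte()) & 0xFF;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        for (int i = 0; i < n; i++) { // n is -1 at end of stream, so nothing happens
            buf[off + i] ^= (byte) nextKeyByte();
        }
        return n;
    }

    // Skip by reading, so the key stream stays aligned with the data.
    @Override
    public long skip(long n) throws IOException {
        long skipped = 0;
        while (skipped < n && read() != -1) {
            skipped++;
        }
        return skipped;
    }

    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public void close() throws IOException {
        try {
            super.close();
        } finally {
            key.close();
        }
    }
}

class OTPDemo {
    public static void main(String[] args) throws IOException {
        byte[] key = "arbitrary key data!!".getBytes(StandardCharsets.UTF_8);
        byte[] plain = "attack at dawn".getBytes(StandardCharsets.UTF_8);
        byte[] cipher = new OTPInputStream(new ByteArrayInputStream(plain),
                new ByteArrayInputStream(key)).readAllBytes();
        byte[] back = new OTPInputStream(new ByteArrayInputStream(cipher),
                new ByteArrayInputStream(key)).readAllBytes();
        System.out.println(new String(back, StandardCharsets.UTF_8)); // attack at dawn
    }
}
```

Wrapping the ciphertext in a second OTPInputStream with the same key restores the original bytes. readAllBytes() needs Java 9 or later; with older versions, copy the stream in a read loop as in the answer above.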

write/read variable byte encoded string representation to/from file in JAVA

Hi everyone! I recently learned about variable byte encoding.
For example, if a file contains this sequence of numbers: 824 5 214577
then, applying variable byte encoding, this sequence would be encoded as 000001101011100010000101000011010000110010110001.
Now I want to know how to write that to another file so as to produce a kind of compressed version of the original, and similarly how to read it back. I'm using Java.
I have tried this:
LinkedList<Integer> numbers = new LinkedList<Integer>();
numbers.add(824);
numbers.add(5);
numbers.add(214577);
String code = VBEncoder.encodeToString(numbers);//returns 000001101011100010000101000011010000110010110001 into code
File file = new File("test.compressed");
DataOutputStream out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(file)));
out.writeBytes(code);
out.flush();
This just writes the binary representation into the file as '0' and '1' characters, and that is not what I'm expecting.
I have also tried this:
LinkedList<Byte> code = VBEncoder.encode(numbers); // returns a linked list of Byte (described below)
File file = new File("test.compressed");
DataOutputStream out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(file)));
for(Byte b:code){
out.write(b.toInt());
System.out.println(b.toInt());
}
out.flush();
// here goes the description of the class Byte
class Byte {
int[] abyte;
Byte() {
abyte = new int[8];
}
public void readInt(int n) {
String bin = Integer.toBinaryString(n);
for (int i = 0; i < (8 - bin.length()); i++) {
abyte[i] = 0;
}
for (int i = 0; i < bin.length(); i++) {
abyte[i + (8 - bin.length())] = bin.charAt(i) - 48;
}
}
public void switchFirst() {
abyte[0] = 1;
}
public int toInt() {
int res = 0;
for (int i = 0; i < 8; i++) {
res += abyte[i] * Math.pow(2, (7 - i));
}
return res;
}
public static Byte fromString(String codestring) {
Byte b = new Byte();
for(int i=0; i < 8; i++)
b.abyte[i] = (codestring.charAt(i)=='0')?0:1;
return b;
}
public String toString() {
String res = "";
for (int i = 0; i < 8; i++) {
res += abyte[i];
}
return res;
}
}
It prints this in the console:
6
184
133
13
12
177
This second attempt seems to work: the output file is 6 bytes, while with the first attempt it was 48 bytes.
But the problem with the second attempt is that I can't read the file back correctly.
InputStreamReader inStream = new InputStreamReader(new FileInputStream(file));
int c = -1;
while((c = inStream.read()) != -1){
System.out.println( c );
}
I get this:
6
184
8230
13
12
177
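So maybe I'm doing it the wrong way. Any good advice is welcome, thanks!
It is solved; I was just not reading the file the right way. InputStreamReader decodes the bytes as characters using the platform's default charset (in windows-1252, byte 133 is '…', U+2026, which is 8230), whereas DataInputStream.read() returns the raw bytes. Below is the right way:
DataInputStream inStream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));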
int c = -1;
while((c = inStream.read()) != -1){
Byte b = new Byte();
b.readInt(c);
System.out.println( c +":" + b.toString());
}
Now I get this result:
6:00000110
184:10111000
133:10000101
13:00001101
12:00001100
177:10110001
The point of writing the original sequence of integers as variable-byte-encoded bytes is that it reduces the file size: written normally as ints, this sequence would take 12 bytes (3 * 4 bytes), but now it takes just 6 bytes. To get the numbers back, I read the bytes into a list and decode them:
int c = -1;
LinkedList<Byte> bytestream = new LinkedList<Byte>();
while((c = inStream.read()) != -1){
Byte b = new Byte();
b.readInt(c);
bytestream.add(b);
}
LinkedList<Integer> numbers = VBEncoder.decode(bytestream);
for(Integer number:numbers) System.out.println(number);
// here goes the code of VBEncoder.decode
public static LinkedList<Integer> decode(LinkedList<Byte> code) {
LinkedList<Integer> numbers = new LinkedList<Integer>();
int n = 0;
for (int i = 0; !(code.isEmpty()); i++) {
Byte b = code.poll();
int bi = b.toInt();
if (bi < 128) {
n = 128 * n + bi;
} else {
n = 128 * n + (bi - 128);
numbers.add(n);
n = 0;
}
}
return numbers;
}
I get back the sequence:
824
5
214577
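The VBEncoder.encode method is not shown in the question. A self-contained sketch of both directions on plain byte arrays, assuming non-negative ints and the same convention as VBEncoder.decode above (the last byte of each number has its high bit set, earlier bytes have it clear); VByte is a hypothetical class name:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Sketch of variable byte coding on raw bytes (non-negative ints only).
class VByte {
    static byte[] encode(List<Integer> numbers) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int n : numbers) {
            Deque<Integer> groups = new ArrayDeque<>();
            do {
                groups.push(n % 128); // 7-bit groups, least significant pushed first
                n /= 128;
            } while (n > 0);
            while (groups.size() > 1) {
                out.write(groups.pop()); // most significant groups first, high bit clear
            }
            out.write(groups.pop() | 0x80); // last group carries the stop bit
        }
        return out.toByteArray();
    }

    static List<Integer> decode(byte[] bytes) {
        List<Integer> numbers = new ArrayList<>();
        int n = 0;
        for (byte b : bytes) {
            int bi = b & 0xFF; // treat the byte as unsigned 0..255
            if (bi < 128) {
                n = 128 * n + bi;
            } else {
                numbers.add(128 * n + (bi - 128));
                n = 0;
            }
        }
        return numbers;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("test", ".compressed");
        Files.write(file, encode(Arrays.asList(824, 5, 214577)));
        System.out.println(Files.size(file)); // 6
        System.out.println(decode(Files.readAllBytes(file))); // [824, 5, 214577]
    }
}
```

Working on byte[] avoids the custom Byte class and the int/String conversions: Files.write and Files.readAllBytes move the raw bytes without any charset decoding, which is exactly what went wrong with InputStreamReader.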

Stream of short[]

Hi, I need to calculate the order-m entropy of a file, where m is the number of bits (m <= 16).
So:
$$H_m(X) = -\sum_{i=0}^{2^m-1} p_{i,m} \log_2 p_{i,m}$$
where $p_{i,m}$ is the probability of the m-bit sequence with value i.
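As a quick sanity check of the formula (a worked example, not from the original post): a file consisting of the single byte 00001111 has, for m = 1, four 0 bits and four 1 bits, so $p_{0,1} = p_{1,1} = 1/2$ and

$$H_1(X) = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{2}\log_2\tfrac{1}{2}\right) = 1$$

In general $0 \le H_m(X) \le m$, and the upper bound is reached only when all $2^m$ sequences are equally likely.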
So I thought I would create an input stream to read the file and then calculate the probability of each m-bit sequence.
For m = 8 it's easy, because each sequence is one byte.
Since m <= 16, I thought of using the primitive type short: save each short of the file in a short[] array and then use bitwise operators to extract all the m-bit sequences in the file.
Is this a good idea?
Anyway, I'm not able to create a stream of shorts. This is what I've done:
public static void main(String[] args) {
readFile(FILE_NAME_INPUT);
}
public static void readFile(String filename) {
short[] buffer = null;
File a_file = new File(filename);
try {
File file = new File(filename);
FileInputStream fis = new FileInputStream(filename);
DataInputStream dis = new DataInputStream(fis);
int length = (int)file.length() / 2;
buffer = new short[length];
int count = 0;
while(dis.available() > 0 && count < length) {
buffer[count] = dis.readShort();
count++;
}
System.out.println("length=" + length);
System.out.println("count=" + count);
for(int i = 0; i < buffer.length; i++) {
System.out.println("buffer[" + i + "]: " + buffer[i]);
}
fis.close();
}
catch(EOFException eof) {
System.out.println("EOFException: " + eof);
}
catch(FileNotFoundException fe) {
System.out.println("FileNotFoundException: " + fe);
}
catch(IOException ioe) {
System.out.println("IOException: " + ioe);
}
}
But I lose a byte (when the file has an odd length), and I don't think this is the best way to proceed.
This is what I plan to do with bitwise operators:
List<Integer> list = new ArrayList<>();
for (short n : buffer) {
    // m-bit groups of each 16-bit value, from the high bits down;
    // the mask is (1 << m) - 1, because 2^m in Java is XOR, not a power
    for (int i = 16 - m; i >= 0; i -= m) {
        list.add(((n & 0xFFFF) >> i) & ((1 << m) - 1));
    }
}
Here I'm assuming I use shorts.
If I use bytes, how can I write a loop like that for m > 8?
That loop doesn't work, because I would have to concatenate multiple bytes, each time joining a different number of bits.
Any ideas?
Thanks
I think you just need to have a byte array:
public static byte[] readFile(String filename) {
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    try (FileInputStream fis = new FileInputStream(filename)) {
        int b; // read() returns an int: 0..255, or -1 at end of file
        while ((b = fis.read()) != -1) {
            outputStream.write(b);
        }
    } catch (IOException ioe) {
        System.out.println("IOException: " + ioe);
    }
    byte[] byteData = outputStream.toByteArray();
    return byteData;
}
Then you can manipulate byteData as per your bitwise operations.
--
If you want to work with shorts, you can combine the bytes like this:
short[] buffer = new short[(byteData.length + 1) / 2];
int j = 0;
for (int i = 0; i < byteData.length - 1; i += 2) {
    // mask with 0xFF so a negative byte is not sign-extended into the high bits
    buffer[j] = (short) (((byteData[i] & 0xFF) << 8) | (byteData[i + 1] & 0xFF));
    j++;
}
To handle an odd number of bytes, put the last byte (in the low 8 bits) into the final slot:
if (byteData.length % 2 == 1) buffer[buffer.length - 1] = (short) (byteData[byteData.length - 1] & 0xFF);
With the array sized (byteData.length + 1) / 2, that last slot is free exactly when the length is odd: after the loop, j equals buffer.length - 1 in that case.
Then manipulate buffer.
The second approach, working directly with bytes, is more involved and is a question of its own, so try the approach above first.
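For the byte-based approach (m > 8, or m that doesn't divide 8 or 16), one option is to treat the file as a single bit stream: keep a small int accumulator, append 8 bits per byte, and take out m bits whenever at least m are available. A sketch, assuming consecutive non-overlapping m-bit groups and that trailing bits which don't fill a whole group are ignored; BlockEntropy is a hypothetical name, and data would be the byteData array read above:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: order-m entropy over consecutive, non-overlapping m-bit groups (1 <= m <= 16).
class BlockEntropy {
    static double entropy(byte[] data, int m) {
        if (m < 1 || m > 16) {
            throw new IllegalArgumentException("m must be in 1..16");
        }
        Map<Integer, Integer> counts = new HashMap<>();
        int mask = (1 << m) - 1;
        int acc = 0;     // bit accumulator, never holds more than m + 7 <= 23 bits
        int accBits = 0; // number of valid bits in acc
        int total = 0;   // number of complete m-bit groups
        for (byte b : data) {
            acc = (acc << 8) | (b & 0xFF); // append 8 new bits on the right
            accBits += 8;
            while (accBits >= m) {
                int symbol = (acc >>> (accBits - m)) & mask; // oldest m bits
                counts.merge(symbol, 1, Integer::sum);
                total++;
                accBits -= m;
            }
            acc &= (1 << accBits) - 1; // keep only the bits not used yet
        }
        // H_m = -sum p log2 p over the observed groups (unseen groups contribute 0)
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xAB, (byte) 0xCD, (byte) 0xEF};
        // m = 12 splits the 24 bits into 0xABC and 0xDEF, crossing a byte boundary
        System.out.println(entropy(data, 12));
    }
}
```

Because the accumulator holds fewer than m + 8 bits at any time, an int is enough for every m up to 16, and no short[] conversion is needed.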

Limit size byte[] Java android

I have to fill a byte[] in my Android application. Sometimes it is bigger than 4 KB.
I initialize my byte[] like this:
int size = ReadTools.getPacketSize(ptr.dataInputStream);
byte[] myByteArray = new byte[size];
Here, size = 22625. I fill my byte[] like this:
while (i != size) {
myByteArray[i] = ptr.dataInputStream.readByte();
i++;
}
But when I print the content of my byte[], it has a size of 4060.
Does Java split my byte[] if it is bigger than 4060 bytes? And if so, how can I get a byte[] larger than 4060?
Here is my full code:
public class ReadSocket extends Thread{
DataInputStream inputStream;
BufferedReader reader;
GlobalContent ptr;
public ReadSocket(DataInputStream inputStream, GlobalContent ptr)
{
this.inputStream = inputStream;
this.ptr = ptr;
}
public void run() {
int i = 0;
int j = 0;
try {
ptr.StatusThreadReadSocket = 1;
while(ptr.dataInputStream.available() == 0)
{
if(ptr.StatusThreadReadSocket == 0)
{
ptr.dataInputStream.close();
break;
}
}
if(ptr.StatusThreadReadSocket == 1)
{
int end = ReadTools.getPacketSize(ptr.dataInputStream);
byte[] buffer = new byte[end];
while (i != end) {
buffer[j] = ptr.dataInputStream.readByte();
i++;
j++;
}
ptr.StatusThreadReadSocket = 0;
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
...
}
Java doesn't split anything. You should post the minimal code which reproduces your error, and tell where ReadTools comes from.
There are two likely explanations:
ReadTools.getPacketSize() returns 4096.
You inadvertently reassign myByteArray to another array.
You should really post your full code and tell us which library you use. It will likely have a method like
read(byte[] buffer, int offset, int length);
which will save you some typing and also give better performance if all you need is to bulk-read the input into memory.
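As an illustration of the bulk-reading suggestion: java.io.DataInputStream has readFully(byte[]), which blocks until the whole array is filled and throws EOFException if the stream ends first. A minimal sketch, assuming the packet size is obtained separately (in the question it comes from ReadTools.getPacketSize(), which isn't shown); readPacket is a hypothetical helper name:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class PacketReader {
    // Reads exactly `size` bytes: blocks until they have all arrived and throws
    // EOFException if the stream ends before the array is full.
    static byte[] readPacket(DataInputStream in, int size) throws IOException {
        byte[] packet = new byte[size];
        in.readFully(packet);
        return packet;
    }

    public static void main(String[] args) throws IOException {
        byte[] source = new byte[22625]; // stand-in for data arriving on the socket
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(source))) {
            System.out.println(readPacket(in, 22625).length); // 22625
        }
    }
}
```

Unlike a loop of readByte() calls, this is a single call per packet, and the array length is exactly the size you asked for.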
