I am working on Huffman Compression algorithm. I have the code for each character. For example
f=1100
d=111
e=1101
b=101
c=100
a=0
Now, in order to achieve compression, I need to write the codes as bits to a binary file. Right now I am only able to write them as bytes, which does nothing but increase the size of the compressed file. How do I write the codes as bits to a binary file in Java?
Well, if you have the text "fdebcafdebca" you would need to write that as the bits:
110011111011011000110011111011011000
Separated and padded:
11001111 10110110 00110011 11101101 10000000 //4 bits of padding here
In hexadecimal:
CF B6 33 ED 80
So you'd write the byte array 0xCF 0xB6 0x33 0xED 0x80 to a file. That's 5 bytes = 40 bits, with 4 bits wasted as padding. The text originally takes 12 bytes, so the saving is modest once you also store the tree. Padding is unavoidable whenever the encoded bits don't end on a byte boundary.
Although not recommended at all, if you have a string then you could do this:
public class BitWriter {
private byte nthBit = 0;
private int index = 0;
private byte[] data;
public BitWriter( int nBits ) {
this.data = new byte[(int)Math.ceil(nBits / 8.0)];
}
public void writeBit(boolean bit) {
if( nthBit >= 8) {
nthBit = 0;
index++;
if( index >= data.length) {
throw new IndexOutOfBoundsException();
}
}
byte b = data[index];
int mask = (1 << (7 - nthBit));
if( bit ) {
b = (byte)(b | mask);
}
data[index] = b;
nthBit++;
}
public byte[] toArray() {
byte[] ret = new byte[data.length];
System.arraycopy(data, 0, ret, 0, data.length);
return ret;
}
public static void main( String... args ) {
BitWriter bw = new BitWriter(6);
String strbits = "101010";
for( int i = 0; i < strbits.length(); i++) {
bw.writeBit( strbits.charAt(i) == '1');
}
byte[] b = bw.toArray();
for( byte a : b ) {
System.out.format("%02X", a);
//A8 == 10101000
}
}
}
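To tie the worked example above to code, here is a rough sketch that packs the question's code table straight into bytes without going through a string of '0' and '1' characters. The class name HuffmanPackDemo and the pack method are made up for illustration, and the input "fdebcafdebca" is assumed because it is the 12-character text whose codes give the 36 bits shown above.

```java
import java.io.ByteArrayOutputStream;
import java.util.Map;

class HuffmanPackDemo {
    // Code table copied from the question.
    static final Map<Character, String> CODES = Map.of(
            'a', "0", 'b', "101", 'c', "100",
            'd', "111", 'e', "1101", 'f', "1100");

    // Packs the code of each character most significant bit first and
    // zero-pads the final byte.
    static byte[] pack(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int current = 0; // bits collected for the byte in progress
        int count = 0;   // how many bits are in current
        for (char ch : text.toCharArray()) {
            String code = CODES.get(ch);
            if (code == null) {
                throw new IllegalArgumentException("no code for " + ch);
            }
            for (char c : code.toCharArray()) {
                current = (current << 1) | (c == '1' ? 1 : 0);
                if (++count == 8) {
                    out.write(current);
                    current = 0;
                    count = 0;
                }
            }
        }
        if (count > 0) {
            out.write(current << (8 - count)); // pad the last byte with zeros
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        for (byte b : pack("fdebcafdebca")) {
            System.out.printf("%02X ", b); // CF B6 33 ED 80
        }
        System.out.println();
    }
}
```

A real compressor would also have to record how many bits are valid (or use an end-of-stream code), since the decoder cannot tell the padding zeros from the code for a.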
Related
DNA molecules are denoted by one of four values: A, C, G, or T. I need to convert a string of the characters A, C, G, and T to an array of bytes, encoding each character with two bits: A as 00, C as 01, G as 10, and T as 11. I don't understand how to convert characters to 2 bits. I tried shifting and masking, but got the wrong result.
At the very beginning, I check whether the expected characters are in the line. Then I convert each character into a bit value and insert it into an array. When I enter ACGT, the output is 0 1 3 2. This is where I am stuck, because I don't understand how to convert the value to 2 bits.
Scanner text = new Scanner(System.in);
String str = text.nextLine();
if (str.contains("A") && str.contains("C") && str.contains("G") && str.contains("T")){
System.out.println("");
}
else
{
System.out.println("wrong command format");
}
byte mas[] = str.getBytes();
System.out.println("String in byte array : " + Arrays.toString(mas));
for (int i = 0; i < mas.length; i++){
byte mask = 3;
byte number = mas[i];
byte result = (byte)((number >> 1) & mask);
System.out.println(result);
}
}
}
It seems that you want to save the bits in a byte. The following example might give some ideas.
public class Main
{
private static final int A = 0x00; // b00
private static final int C = 0x01; // b01
private static final int G = 0x02; // b10
private static final int T = 0x03; // b11
public static void main(String[] args) throws Exception
{
byte store = 0;
store = setByte(store, 0, A);
store = setByte(store, 1, C);
store = setByte(store, 2, G);
store = setByte(store, 3, T);
System.out.println(Integer.toBinaryString(store));
//11111111111111111111111111100100
System.out.println(getByte(store, 0)); //0
System.out.println(getByte(store, 1)); //1
System.out.println(getByte(store, 2)); //2
System.out.println(getByte(store, 3)); //3
}
//Behavior :: Store "value" into "store".
//Reminder :: Valid index 0 - 3. Valid value 0 - 3.
private static byte setByte(byte store, int index, int value)
{
store = (byte)(store & ~(0x3 << (2 * index)));
return store |= (value & 0x3) << (2 * index);
}
private static byte getByte(byte store, int index)
{
return (byte)((store >> (2 * index)) & 0x3);
}
}
I haven't tested this, but it may help you.
byte test = 69;
byte insert = 0b01;
byte index = 2;
final byte ones = 0b00000011;
//Clear out the data at specified index
test = (byte) (test & ~(ones << index));
//Insert data
test |= (byte) (insert << index);
It works as follows:
Clear the 2 bits at the index in the byte (using bitwise AND).
Insert the 2 data bits at the index in the byte (using bitwise OR).
You can "convert" the chars ACGT to 0, 1, 2, 3 using bit arithmetic.
byte[] bytes = str.getBytes();
for (int i = 0; i < bytes.length; i++) {
bytes[i] = (byte)(bytes[i] >> 1 & 3 ^ bytes[i] >> 2 & 1);
}
I suspect your initial check should be:
if (!str.matches("[ACGT]+")) {
System.out.println("wrong command format");
return;
}
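Putting the pieces from the answers above together, a rough sketch of packing a whole ACGT string at four bases per byte might look like the following. The names DnaPacker, code, pack and baseAt are hypothetical, and the layout (first base in the two highest bits of each byte) is just one possible choice.

```java
class DnaPacker {
    // Maps A, C, G, T to 0, 1, 2, 3 and rejects anything else.
    static int code(char base) {
        switch (base) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a base: " + base);
        }
    }

    // Packs four bases per byte; unused slots in the last byte stay 00.
    static byte[] pack(String dna) {
        byte[] out = new byte[(dna.length() + 3) / 4];
        for (int i = 0; i < dna.length(); i++) {
            int shift = 6 - 2 * (i % 4); // 6, 4, 2, 0 within each byte
            out[i / 4] |= (byte) (code(dna.charAt(i)) << shift);
        }
        return out;
    }

    // Reads the base stored at position i of a packed array.
    static char baseAt(byte[] packed, int i) {
        int shift = 6 - 2 * (i % 4);
        return "ACGT".charAt((packed[i / 4] >> shift) & 0x3);
    }

    public static void main(String[] args) {
        byte[] packed = pack("ACGT");
        System.out.println(packed[0]); // 27, i.e. binary 00011011
    }
}
```

Because the padding slots read back as A, the original length has to be stored somewhere alongside the bytes.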
How can I iterate bits in a byte array?
You'd have to write your own implementation of Iterable<Boolean> which took an array of bytes, and then created Iterator<Boolean> values which remembered the current index into the byte array and the current index within the current byte. Then a utility method like this would come in handy:
private static Boolean isBitSet(byte b, int bit)
{
return (b & (1 << bit)) != 0;
}
(where bit ranges from 0 to 7). Each time next() was called you'd have to increment your bit index within the current byte, and increment the byte index within byte array if you reached "the 9th bit".
It's not really hard - but a bit of a pain. Let me know if you'd like a sample implementation...
public class ByteArrayBitIterable implements Iterable<Boolean> {
private final byte[] array;
public ByteArrayBitIterable(byte[] array) {
this.array = array;
}
public Iterator<Boolean> iterator() {
return new Iterator<Boolean>() {
private int bitIndex = 0;
private int arrayIndex = 0;
public boolean hasNext() {
return (arrayIndex < array.length) && (bitIndex < 8);
}
public Boolean next() {
Boolean val = (array[arrayIndex] >> (7 - bitIndex) & 1) == 1;
bitIndex++;
if (bitIndex == 8) {
bitIndex = 0;
arrayIndex++;
}
return val;
}
public void remove() {
throw new UnsupportedOperationException();
}
};
}
public static void main(String[] a) {
ByteArrayBitIterable test = new ByteArrayBitIterable(
new byte[]{(byte)0xAA, (byte)0xAA});
for (boolean b : test)
System.out.println(b);
}
}
Original:
for (int i = 0; i < byteArray.Length; i++)
{
byte b = byteArray[i];
byte mask = 0x01;
for (int j = 0; j < 8; j++)
{
bool value = b & mask;
mask << 1;
}
}
Or using Java idioms
for (byte b : byteArray ) {
for ( int mask = 0x01; mask != 0x100; mask <<= 1 ) {
boolean value = ( b & mask ) != 0;
}
}
An alternative would be to use a BitInputStream like the one you can find here and write code like this:
BitInputStream bin = new BitInputStream(new ByteArrayInputStream(bytes));
while(true){
int bit = bin.readBit();
// do something
}
bin.close();
(Note: Code doesn't contain EOFException or IOException handling for brevity.)
But I'd go with Jon Skeet's variant and do it on my own.
I needed some bit streaming in my application. Here you can find my BitArray implementation. It is not a real iterator pattern but you can ask for 1-32 bits from the array in a streaming way. There is also an alternate implementation called BitReader later in the file.
I know, probably not the "coolest" way to do it, but you can extract each bit with the following code.
int n = 156;
String bin = Integer.toBinaryString(n);
System.out.println(bin);
char arr[] = bin.toCharArray();
for(int i = 0; i < arr.length; ++i) {
System.out.println("Bit number " + (i + 1) + " = " + arr[i]);
}
10011100
Bit number 1 = 1
Bit number 2 = 0
Bit number 3 = 0
Bit number 4 = 1
Bit number 5 = 1
Bit number 6 = 1
Bit number 7 = 0
Bit number 8 = 0
You can iterate through the byte array, and for each byte use the bitwise operators to iterate though its bits.
Alternatively, you can use BitSet for this:
byte[] bytes=...;
BitSet bitSet=BitSet.valueOf(bytes);
//bitSet.length() stops after the highest set bit, so count the bits from the array instead
for(int i=0;i<bytes.length*8;i++){
boolean bit=bitSet.get(i);
//use your bit
}
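One thing to watch with BitSet: BitSet.valueOf(byte[]) treats the array as little-endian, so index 0 is the lowest bit of the first byte, and length() only reaches as far as the highest set bit. A small sketch to see the ordering (the class and method names are made up for illustration):

```java
import java.util.BitSet;

class BitSetOrderDemo {
    // Renders every bit of the array in BitSet index order as a '0'/'1' string.
    static String bitsInBitSetOrder(byte[] bytes) {
        BitSet bitSet = BitSet.valueOf(bytes);
        StringBuilder sb = new StringBuilder();
        // Loop over bytes.length * 8 rather than bitSet.length(),
        // which would skip the zero bits above the highest set bit.
        for (int i = 0; i < bytes.length * 8; i++) {
            sb.append(bitSet.get(i) ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = {0x01, 0x40};
        System.out.println(bitsInBitSetOrder(bytes));       // 1000000000000010
        System.out.println(BitSet.valueOf(bytes).length()); // 15, not 16
    }
}
```

If you need the bits most significant first within each byte, the shift-and-mask loops shown earlier are probably the simpler route.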
I was making a web application that uses image steganography in Java, but I got stuck. When I use the encoding and decoding algorithm in my desktop application, I get correct results. When I use the same algorithm in the web application, the results are wrong.
Encoding the text is done as follows:
private static BufferedImage add_text(BufferedImage image, String text)
{
//convert all items to byte arrays: image, message, message length
byte img[] = get_byte_data(image);
byte msg[] = text.getBytes();
byte len[] = bit_conversion(msg.length);
try
{
encode_text(img, len, 0); //0 first position
encode_text(img, msg, 32); //4 bytes of space for length: 4bytes*8bit = 32 bits
}
catch(Exception e)
{
JOptionPane.showMessageDialog(null,"Target File cannot hold message!", "Error",JOptionPane.ERROR_MESSAGE);
}
return image;
}
It uses three functions.
get_byte_data() is as follows:
private static byte[] get_byte_data(BufferedImage image)
{
WritableRaster raster = image.getRaster();
DataBufferByte buffer = (DataBufferByte)raster.getDataBuffer();
return buffer.getData();
}
The second function is bit_conversion(). It is as follows:
private static byte[] bit_conversion(int i)
{
byte byte3 = (byte)((i & 0xFF000000) >>> 24); //0
byte byte2 = (byte)((i & 0x00FF0000) >>> 16); //0
byte byte1 = (byte)((i & 0x0000FF00) >>> 8 ); //0
byte byte0 = (byte)((i & 0x000000FF) );
return(new byte[]{byte3,byte2,byte1,byte0});
}
The third and final one is encode_text(), which encodes the text in the image:
private static byte[] encode_text(byte[] image, byte[] addition, int offset)
{
//check that the data + offset will fit in the image (each data byte needs 8 image bytes)
if(addition.length * 8 + offset > image.length)
{
throw new IllegalArgumentException("File not long enough!");
}
//loop through each addition byte
for(int i=0; i<addition.length; ++i)
{
//loop through the 8 bits of each byte
int add = addition[i];
for(int bit=7; bit>=0; --bit, ++offset) //ensure the new offset value carries on through both loops
{
//assign an integer to b, shifted by bit spaces AND 1
//a single bit of the current byte
int b = (add >>> bit) & 1;
//assign the bit by taking: [(previous byte value) AND 0xfe] OR bit to add
//changes the last bit of the byte in the image to be the bit of addition
image[offset] = (byte)((image[offset] & 0xFE) | b );
}
}
return image;
}
Decode:
public static String decode(String path, String name)
{
byte[] decode;
try
{
//user space is necessary for decrypting
BufferedImage image = user_space(getImage(image_path(path,name,"png")));
decode = decode_text(get_byte_data(image));
return(new String(decode));
}
catch(Exception e)
{
JOptionPane.showMessageDialog(null,
"There is no hidden message in this image!","Error",
JOptionPane.ERROR_MESSAGE);
return "";
}
}
The decode_text function:
private static byte[] decode_text(byte[] image)
{
int length = 0;
int offset = 32;
//loop through 32 bytes of data to determine text length
for(int i=0; i<32; ++i) //i=24 will also work, as only the 4th byte contains real data
{
length = (length << 1) | (image[i] & 1);
}
byte[] result = new byte[length];
//loop through each byte of text
for(int b=0; b<result.length; ++b )
{
//loop through each bit within a byte of text
for(int i=0; i<8; ++i, ++offset)
{
//assign bit: [(new byte value) << 1] OR [(text byte) AND 1]
result[b] = (byte)((result[b] << 1) | (image[offset] & 1));
}
}
return result;
}
What could be the reason for the different results? Please help, and suggest a solution as well.
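One way to narrow this down might be to test the bit logic on a plain byte array, with no image loading or saving involved. The sketch below is a cut-down variant of the functions above, not the original code: the names LsbRoundTrip, embed, extract, hide and reveal are made up, and it fixes the charset to UTF-8 as an assumption. If this round-trips in both environments, the difference probably lies outside the bit manipulation (for example in how the image is read or written, or in the platform default charset), but that is a guess.

```java
import java.nio.charset.StandardCharsets;

class LsbRoundTrip {
    // Hides each bit of data in the lowest bit of successive carrier bytes,
    // most significant bit first, starting at offset.
    static void embed(byte[] carrier, byte[] data, int offset) {
        if (offset + data.length * 8 > carrier.length) {
            throw new IllegalArgumentException("carrier too small");
        }
        for (byte value : data) {
            for (int bit = 7; bit >= 0; bit--, offset++) {
                int b = (value >>> bit) & 1;
                carrier[offset] = (byte) ((carrier[offset] & 0xFE) | b);
            }
        }
    }

    // Rebuilds count bytes from the lowest bits of the carrier.
    static byte[] extract(byte[] carrier, int offset, int count) {
        byte[] out = new byte[count];
        for (int i = 0; i < count; i++) {
            for (int bit = 0; bit < 8; bit++, offset++) {
                out[i] = (byte) ((out[i] << 1) | (carrier[offset] & 1));
            }
        }
        return out;
    }

    // Stores a 4-byte length followed by the message bytes.
    static void hide(byte[] carrier, String text) {
        byte[] msg = text.getBytes(StandardCharsets.UTF_8);
        byte[] len = {
            (byte) (msg.length >>> 24), (byte) (msg.length >>> 16),
            (byte) (msg.length >>> 8), (byte) msg.length
        };
        embed(carrier, len, 0);
        embed(carrier, msg, 32);
    }

    static String reveal(byte[] carrier) {
        byte[] len = extract(carrier, 0, 4);
        int length = ((len[0] & 0xFF) << 24) | ((len[1] & 0xFF) << 16)
                | ((len[2] & 0xFF) << 8) | (len[3] & 0xFF);
        return new String(extract(carrier, 32, length), StandardCharsets.UTF_8);
    }
}
```

Note that this only works if the carrier bytes survive unchanged between hiding and revealing.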
I need to store a couple binary sequences that are 16 bits in length into a byte array (of length 2). The one or two binary numbers don't change, so a function that does conversion might be overkill. Say for example the 16 bit binary sequence is 1111000011110001. How do I store that in a byte array of length two?
String val = "1111000011110001";
byte[] bval = new BigInteger(val, 2).toByteArray();
There are other options, but I find the BigInteger class best for this kind of problem because it can convert to a byte array. I prefer it because I can instantiate it from a String in various bases, such as 2, 8, or 16, and also output it in those bases.
Edit: Mondays ... :P
public static byte[] getRoger(String val) throws NumberFormatException,
NullPointerException {
byte[] result = new byte[2];
byte[] holder = new BigInteger(val, 2).toByteArray();
if (holder.length == 1) result[0] = holder[0];
else if (holder.length > 1) {
result[1] = holder[holder.length - 2];
result[0] = holder[holder.length - 1];
}
return result;
}
Example:
int bitarray = 12321;
String val = Integer.toString(bitarray, 2);
System.out.println(new StringBuilder().append(bitarray).append(':').append(val)
.append(':').append(Arrays.toString(getRoger(val))).append('\n'));
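For the fixed 16-bit case, a sketch without BigInteger may be easier to reason about. It also shows why the BigInteger result needs trimming: toByteArray() returns a two's-complement value, so a 16-bit string with the top bit set comes back as three bytes. The class name SixteenBits and the method toTwoBytes are hypothetical, and putting the high byte first is an assumption about the desired order (the helper above fills the array the other way round).

```java
import java.math.BigInteger;
import java.util.Arrays;

class SixteenBits {
    // Parses exactly 16 '0'/'1' characters into two bytes, high byte first.
    static byte[] toTwoBytes(String bits) {
        if (bits.length() != 16) {
            throw new IllegalArgumentException("expected 16 bits");
        }
        int value = Integer.parseInt(bits, 2);
        return new byte[] { (byte) (value >>> 8), (byte) value };
    }

    public static void main(String[] args) {
        String val = "1111000011110001";
        // BigInteger adds a leading zero byte to keep the value positive.
        System.out.println(Arrays.toString(new BigInteger(val, 2).toByteArray())); // [0, -16, -15]
        System.out.println(Arrays.toString(toTwoBytes(val)));                      // [-16, -15]
    }
}
```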
I have been disappointed with all of the solutions I have found for converting strings of bits to byte arrays and vice versa. All have been buggy (even the BigInteger solution above), and very few are as efficient as they should be.
I realize the OP was only concerned with converting a bit string to an array of two bytes, for which the BigInteger approach seems to work fine. However, since this post is currently the first search result for "bit string to byte array java" in Google, I am going to post my general solution here for people dealing with huge strings and/or huge byte arrays.
Note that my solution below is the only solution I have run that passes all of my test cases. Many online solutions to this relatively simple problem simply do not work.
Code
/**
* Zips (compresses) bit strings to byte arrays and unzips (decompresses)
* byte arrays to bit strings.
*
* @author ryan
*
*/
public class BitZip {
private static final byte[] BIT_MASKS = new byte[] {1, 2, 4, 8, 16, 32, 64, -128};
private static final int BITS_PER_BYTE = 8;
private static final int MAX_BIT_INDEX_IN_BYTE = BITS_PER_BYTE - 1;
/**
* Decompress the specified byte array to a string.
* <p>
* This function does not pad with zeros for any bit-string result
* with a length indivisible by 8.
*
* @param bytes The bytes to convert into a string of bits, with byte[0]
* consisting of the least significant bits in the byte array.
* @return The string of bits representing the byte array.
*/
public static final String unzip(final byte[] bytes) {
int byteCount = bytes.length;
int bitCount = byteCount * BITS_PER_BYTE;
char[] bits = new char[bitCount];
{
int bytesIndex = 0;
int iLeft = Math.max(bitCount - BITS_PER_BYTE, 0);
while (bytesIndex < byteCount) {
byte value = bytes[bytesIndex];
for (int b = MAX_BIT_INDEX_IN_BYTE; b >= 0; --b) {
bits[iLeft + b] = ((value % 2) == 0 ? '0' : '1');
value >>= 1;
}
iLeft = Math.max(iLeft - BITS_PER_BYTE, 0);
++bytesIndex;
}
}
return new String(bits).replaceFirst("^0+(?!$)", "");
}
/**
* Compresses the specified bit string to a byte array, ignoring leading
* zeros before the most significant set bit.
*
* @param bits The string of bits (composed strictly of '0' and '1' characters)
* to convert into an array of bytes.
* @return The bits, as a byte array with byte[0] containing the least
* significant bits.
*/
public static final byte[] zip(final String bits) {
if ((bits == null) || bits.isEmpty()) {
// No observations -- return nothing.
return new byte[0];
}
char[] bitChars = bits.toCharArray();
int bitCount = bitChars.length;
int left;
for (left = 0; left < bitCount; ++left) {
// Ignore leading zeros.
if (bitChars[left] == '1') {
break;
}
}
if (bitCount == left) {
// Only '0's in the string.
return new byte[] {0};
}
int cBits = bitCount - left;
byte[] bytes = new byte[((cBits) / BITS_PER_BYTE) + (((cBits % BITS_PER_BYTE) > 0) ? 1 : 0)];
{
int iRight = bitCount - 1;
int iLeft = Math.max(bitCount - BITS_PER_BYTE, left);
int bytesIndex = 0;
byte _byte = 0;
while (bytesIndex < bytes.length) {
while (iLeft <= iRight) {
if (bitChars[iLeft] == '1') {
_byte |= BIT_MASKS[iRight - iLeft];
}
++iLeft;
}
bytes[bytesIndex++] = _byte;
iRight = Math.max(iRight - BITS_PER_BYTE, left);
iLeft = Math.max((1 + iRight) - BITS_PER_BYTE, left);
_byte = 0;
}
}
return bytes;
}
}
Performance
I was bored at work so I did some performance testing comparing against the accepted answer here for when N is large. (Pretending to ignore the fact that the BigInteger approach posted above doesn't even work properly as a general approach.)
This is running with a random bit string of size 5M and a random byte array of size 1M:
String -> byte[] -- BigInteger result: 39098ms
String -> byte[] -- BitZip result: 29ms
byte[] -> String -- Integer result: 138ms
byte[] -> String -- BitZip result: 71ms
And the code:
public static void main(String[] argv) {
int testByteLength = 1000000;
int testStringLength = 5000000;
// Independently random.
final byte[] randomBytes = new byte[testByteLength];
final String randomBitString;
{
StringBuilder sb = new StringBuilder();
Random rand = new Random();
for (int i = 0; i < testStringLength; ++i) {
int value = rand.nextInt(1 + i);
sb.append((value % 2) == 0 ? '0' : '1');
randomBytes[i % testByteLength] = (byte) value;
}
randomBitString = sb.toString();
}
byte[] resultCompress;
String resultDecompress;
{
Stopwatch s = new Stopwatch();
TimeUnit ms = TimeUnit.MILLISECONDS;
{
s.start();
{
resultCompress = compressFromBigIntegerToByteArray(randomBitString);
}
s.stop();
{
System.out.println("String -> byte[] -- BigInteger result: " + s.elapsed(ms) + "ms");
}
s.reset();
}
{
s.start();
{
resultCompress = zip(randomBitString);
}
s.stop();
{
System.out.println("String -> byte[] -- BitZip result: " + s.elapsed(ms) + "ms");
}
s.reset();
}
{
s.start();
{
resultDecompress = decompressFromIntegerParseInt(randomBytes);
}
s.stop();
{
System.out.println("byte[] -> String -- Integer result: " + s.elapsed(ms) + "ms");
}
s.reset();
}
{
s.start();
{
resultDecompress = unzip(randomBytes);
}
s.stop();
{
System.out.println("byte[] -> String -- BitZip result: " + s.elapsed(ms) + "ms");
}
s.reset();
}
}
}
If you have binary strings (literally String objects that contain only 1's and 0's), how would you output them as bits into a file?
This is for a text compressor I was working on; it's still bugging me, and it'd be nice to finally get it working. Thanks!
Easiest is to simply take 8 consecutive characters, turn them into a byte and output that byte. Pad with zeros at the end if you can recognize the end-of-stream, or add a header with length (in bits) at the beginning of the file.
The inner loop would look something like:
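byte[] buffer = new byte[ ( string.length() + 7 ) / 8 ];
for ( int i = 0; i < buffer.length; ++i ) {
    byte current = 0;
    for ( int j = 7; j >= 0; --j ) {
        int pos = i * 8 + j;
        // Positions past the end of the string count as zero padding.
        if ( pos < string.length() && string.charAt( pos ) == '1' )
            current |= 1 << j; // character j of the group lands in bit j
    }
    output( current ); // output() stands for whatever writes the byte
}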
You'll need to make some adjustments, but that's the general idea.
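As a rough illustration of the "header with length" idea, the sketch below writes the bit count as a 4-byte int followed by the packed bytes, then reads the string back. The class name BitFile and the file layout are assumptions made up for this example, and the bits are packed most significant bit first, which is a choice rather than a requirement.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class BitFile {
    // Writes the bit count, then the bits packed eight per byte.
    static void write(Path file, String bits) throws IOException {
        byte[] packed = new byte[(bits.length() + 7) / 8];
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') {
                packed[i / 8] |= (byte) (1 << (7 - i % 8));
            }
        }
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(bits.length()); // header: number of valid bits
            out.write(packed);
        }
    }

    // Reads the header and returns exactly that many bits, dropping the padding.
    static String read(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int bitCount = in.readInt();
            byte[] packed = new byte[(bitCount + 7) / 8];
            in.readFully(packed);
            StringBuilder sb = new StringBuilder(bitCount);
            for (int i = 0; i < bitCount; i++) {
                boolean set = ((packed[i / 8] >> (7 - i % 8)) & 1) == 1;
                sb.append(set ? '1' : '0');
            }
            return sb.toString();
        }
    }
}
```

With this layout a 9-bit string takes 6 bytes on disk (4 for the header, 2 for the bits), so the header only pays off for longer inputs.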
If you're lucky, java.math.BigInteger may do everything for you.
String s = "11001010001010101110101001001110";
byte[] bytes = (new java.math.BigInteger(s, 2)).toByteArray();
This does depend on the byte order (big-endian) and right-aligning (if the number of bits is not a multiple of 8) being what you want but it may be simpler to modify the array afterwards than to do the character conversion yourself.
public class BitOutputStream extends FilterOutputStream
{
private int buffer = 0;
private int bitCount = 0;
public BitOutputStream(OutputStream out)
{
super(out);
}
public void writeBits(int value, int numBits) throws IOException
{
while(numBits>0)
{
numBits--;
int mix = ((value&1)<<bitCount++);
buffer|=mix;
value>>=1;
if(bitCount==8)
align8();
}
}
@Override
public void close() throws IOException
{
align8(); /* Flush any remaining partial bytes */
super.close();
}
public void align8() throws IOException
{
if(bitCount > 0)
{
bitCount=0;
write(buffer);
buffer=0;
}
}
}
And then...
if (nextChar == '0')
{
bos.writeBits(0, 1);
}
else
{
bos.writeBits(1, 1);
}
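To read such a file back you need a reader that consumes bits in the same order the writer emits them, which for the class above is the lowest bit of each byte first. A minimal sketch of such a counterpart follows. BitInputStreamSketch and readBit are names made up here, not part of any library, and returning -1 at end of stream is an assumption.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class BitInputStreamSketch extends FilterInputStream {
    private int buffer = 0;   // the byte currently being consumed
    private int bitsLeft = 0; // unread bits remaining in buffer

    BitInputStreamSketch(InputStream in) {
        super(in);
    }

    // Returns the next bit (0 or 1), lowest bit of each byte first,
    // or -1 once the underlying stream is exhausted.
    int readBit() throws IOException {
        if (bitsLeft == 0) {
            buffer = read();
            if (buffer < 0) {
                return -1;
            }
            bitsLeft = 8;
        }
        int bit = buffer & 1;
        buffer >>= 1;
        bitsLeft--;
        return bit;
    }
}
```

The reader cannot tell padding bits in the final byte from real data, so the number of valid bits still has to be stored or signalled separately.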
Assuming the String length is a multiple of eight bits (you can pad it otherwise), take advantage of Java's built-in parsing in the Integer.parseInt method to do something like this:
String s = "11001010001010101110101001001110";
byte[] data = new byte[s.length() / 8];
for (int i = 0; i < data.length; i++) {
data[i] = (byte) Integer.parseInt(s.substring(i * 8, (i + 1) * 8), 2);
}
Then you should be able to write the bytes to a FileOutputStream pretty simply.
On the other hand, if you are looking for efficiency, you should consider not using a String to store the bits to begin with, and instead build up the bytes directly in your compressor.