Bit manipulation and output in Java - java

If you have binary strings (literally String objects that contain only 1's and 0's), how would you output them as bits into a file?
This is for a text compressor I was working on; it's still bugging me, and it'd be nice to finally get it working. Thanks!

Easiest is to simply take 8 consecutive characters, turn them into a byte and output that byte. Pad with zeros at the end if you can recognize the end-of-stream, or add a header with length (in bits) at the beginning of the file.
The inner loop would look something like:
byte[] buffer = new byte[ ( string.length + 7 ) / 8 ];
for ( int i = 0; i < buffer.length; ++i ) {
byte current = 0;
for ( int j = 7; j >= 0; --j )
if ( string[ i * 8 + j ] == '1' )
current |= 1 << j;
output( current );
}
You'll need to make some adjustments, but that's the general idea.

If you're lucky, java.math.BigInteger may do everything for you.
String s = "11001010001010101110101001001110";
byte[] bytes = (new java.math.BigInteger(s, 2)).toByteArray();
This does depend on the byte order (big-endian) and right-aligning (if the number of bits is not a multiple of 8) being what you want but it may be simpler to modify the array afterwards than to do the character conversion yourself.

public class BitOutputStream extends FilterOutputStream
{
private int buffer = 0;
private int bitCount = 0;
public BitOutputStream(OutputStream out)
{
super(out);
}
public void writeBits(int value, int numBits) throws IOException
{
while(numBits>0)
{
numBits--;
int mix = ((value&1)<<bitCount++);
buffer|=mix;
value>>=1;
if(bitCount==8)
align8();
}
}
#Override
public void close() throws IOException
{
align8(); /* Flush any remaining partial bytes */
super.close();
}
public void align8() throws IOException
{
if(bitCount > 0)
{
bitCount=0;
write(buffer);
buffer=0;
}
}
}
And then...
if (nextChar == '0')
{
bos.writeBits(0, 1);
}
else
{
bos.writeBits(1, 1);
}

Assuming the String has a multiple of eight bits, (you can pad it otherwise), take advantage of Java's built in parsing in the Integer.valueOf method to do something like this:
String s = "11001010001010101110101001001110";
byte[] data = new byte[s.length() / 8];
for (int i = 0; i < data.length; i++) {
data[i] = (byte) Integer.parseInt(s.substring(i * 8, (i + 1) * 8), 2);
}
Then you should be able to write the bytes to a FileOutputStream pretty simply.
On the other hand, if you looking for effeciency, you should consider not using a String to store the bits to begin with, but build up the bytes directly in your compressor.

Related

How can I create a stream of bits from a byte array? [duplicate]

How can i iterate bits in a byte array?
You'd have to write your own implementation of Iterable<Boolean> which took an array of bytes, and then created Iterator<Boolean> values which remembered the current index into the byte array and the current index within the current byte. Then a utility method like this would come in handy:
private static Boolean isBitSet(byte b, int bit)
{
return (b & (1 << bit)) != 0;
}
(where bit ranges from 0 to 7). Each time next() was called you'd have to increment your bit index within the current byte, and increment the byte index within byte array if you reached "the 9th bit".
It's not really hard - but a bit of a pain. Let me know if you'd like a sample implementation...
public class ByteArrayBitIterable implements Iterable<Boolean> {
private final byte[] array;
public ByteArrayBitIterable(byte[] array) {
this.array = array;
}
public Iterator<Boolean> iterator() {
return new Iterator<Boolean>() {
private int bitIndex = 0;
private int arrayIndex = 0;
public boolean hasNext() {
return (arrayIndex < array.length) && (bitIndex < 8);
}
public Boolean next() {
Boolean val = (array[arrayIndex] >> (7 - bitIndex) & 1) == 1;
bitIndex++;
if (bitIndex == 8) {
bitIndex = 0;
arrayIndex++;
}
return val;
}
public void remove() {
throw new UnsupportedOperationException();
}
};
}
public static void main(String[] a) {
ByteArrayBitIterable test = new ByteArrayBitIterable(
new byte[]{(byte)0xAA, (byte)0xAA});
for (boolean b : test)
System.out.println(b);
}
}
Original:
for (int i = 0; i < byteArray.Length; i++)
{
byte b = byteArray[i];
byte mask = 0x01;
for (int j = 0; j < 8; j++)
{
bool value = b & mask;
mask << 1;
}
}
Or using Java idioms
for (byte b : byteArray ) {
for ( int mask = 0x01; mask != 0x100; mask <<= 1 ) {
boolean value = ( b & mask ) != 0;
}
}
An alternative would be to use a BitInputStream like the one you can find here and write code like this:
BitInputStream bin = new BitInputStream(new ByteArrayInputStream(bytes));
while(true){
int bit = bin.readBit();
// do something
}
bin.close();
(Note: Code doesn't contain EOFException or IOException handling for brevity.)
But I'd go with Jon Skeets variant and do it on my own.
I needed some bit streaming in my application. Here you can find my BitArray implementation. It is not a real iterator pattern but you can ask for 1-32 bits from the array in a streaming way. There is also an alternate implementation called BitReader later in the file.
I know, probably not the "coolest" way to do it, but you can extract each bit with the following code.
int n = 156;
String bin = Integer.toBinaryString(n);
System.out.println(bin);
char arr[] = bin.toCharArray();
for(int i = 0; i < arr.length; ++i) {
System.out.println("Bit number " + (i + 1) + " = " + arr[i]);
}
10011100
Bit number 1 = 1
Bit number 2 = 0
Bit number 3 = 0
Bit number 4 = 1
Bit number 5 = 1
Bit number 6 = 1
Bit number 7 = 0
Bit number 8 = 0
You can iterate through the byte array, and for each byte use the bitwise operators to iterate though its bits.
Alternatively, you can use BitSet for this:
byte[] bytes=...;
BitSet bitSet=BitSet.valueOf(bytes);
for(int i=0;i<bitSet.length();i++){
boolean bit=bitSet.get(i);
//use your bit
}

Bit shuffling to change encoding from little endian

So in the program I need to read an number from the user which needs to be changed from little endian encoding to whatever encoding the user wants to change it to. The encoding entered by the user is just a 4 digits number which just means which byte should be where after the encoding. e.g. 4321 means put the 4th byte first followed by the 3rd and so on. the encoding can take other form such as 3214 etc.
This is my code, would really appreciate if someone point out where I am missing out.
import java.util.Scanner;
class encoding {
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
int n = sc.nextInt();
String byteOrder = sc.next();
long[] bitMask = { // little endian
Long.parseLong("11111111000000000000000000000000", 2),
Long.parseLong("00000000111111110000000000000000", 2),
Long.parseLong("00000000000000001111111100000000", 2),
Long.parseLong("00000000000000000000000011111111", 2)
};
int[] bytes = {
(int)(bitMask[0] & n),
(int)(bitMask[1] & n),
(int)(bitMask[2] & n),
(int)(bitMask[3] & n)
};
int result = 0;
shuffleBytes(bytes, byteOrder);
for (int i = 0; i < 4; i++) {
bytes[i] = bytes[i] << (i * 8);
result |= bytes[i];
}
System.out.println(result);
}
static void shuffleBytes(int[] bytes, String encoding) {
for (int i = 0; i < 4; i++) {
int index = Integer.parseInt(encoding.substring(i, i+1))-1;
int copy = bytes[i];
bytes[i] = bytes[index];
bytes[index] = copy;
}
}
}
Fixing your current solution
There are two problems:
1. Forgot to right-align bytes
In ...
int[] bytes = {
(int)(bitMask[0] & n),
(int)(bitMask[1] & n),
(int)(bitMask[2] & n),
(int)(bitMask[3] & n)
};
... you forgot to shift each "byte" to the right. As a result, you end up with a list of "bytes" of the form 0x……000000, 0x00……0000, 0x0000……00, 0x000000……. This is not a problem yet, but after shuffleBytes you shift each of these entries again using bytes[i] = bytes[i] << (i * 8);. As a result, the relevant parts (__) end up at a completely different spot or are shifted completely out of the integer.
To fix this, shift each (int)(bitMask[…] & n) to the right:
int[] bytes = {
(int)(bitMask[0] & n) >> (3*8),
(int)(bitMask[1] & n) >> (2*8),
(int)(bitMask[2] & n) >> (1*8),
(int)(bitMask[3] & n) >> (0*8)
};
2. Swapping more than once
In ...
static void shuffleBytes(int[] bytes, String encoding) {
for (int i = 0; i < 4; i++) {
int index = Integer.parseInt(encoding.substring(i, i+1))-1;
int copy = bytes[i];
bytes[i] = bytes[index];
bytes[index] = copy;
}
}
... you swap some bytes multiple times because you operate in-place. To understand what happens consider the following minimal example where we want to swap two bytes using order = "21". We inspect the variables before/after each iteration of the for loop.
The original input is bytes = {x, y} and order = "21"
We moved bytes[0] to bytes[1]. Now we have bytes = {y, x}.
But we are not finished yet. The loop continues and moves bytes[1] to bytes[0]. You assumed that bytes[1] would still be y at this point. However, because of the previous iteration this entry now holds x instead. Therefore, the result is bytes = {x, y}.
Here nothing changed, but for more entries you might also end up with something that is neither the original order nor the expected output order.
The easiest way to fix this is to write the result into a new array:
static int[] shuffleBytes(int[] bytes, String encoding) {
int[] result = new int[bytes.length];
for (int i = 0; i < 4; i++) {
int index = Integer.parseInt(encoding.substring(i, i+1))-1;
result[index] = bytes[i];
}
return result; // also adapt main() to use this return value
}
Alternative Solution
Even though you could fix your solution as described above I'm not too happy with it. Therefore, I propose this alternative solution which is cleaner, shorter, and more efficient.
import java.util.Scanner;
public class Encoding {
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
int input = sc.nextInt();
System.out.format("input = 0x%08x = %1$d%n", input);
String newOrder = sc.next();
int output = reorder(input, newOrder);
System.out.format("output = 0x%08x = %1$d%n", output);
}
/** #param newOrder permutation of "1234" */
static int reorder(int input, String newOrder) {
int output = 0;
for (char byte1Based : newOrder.toCharArray()) {
output <<= 8;
int shift = (byte1Based - '1') * 8;
output |= ((0xFF << shift) & input) >> shift;
}
return output;
}
}

java - how to create and manipulate a bit array with length of 10 million bits

I just came across a problem; it was easy to solve in pseudo code, but when I started coding it in java; I started to realize I didn't know where to start...
Here is what I need to do:
I need a bit array of size 10 million (bits) (let's call it A).
I need to be able to set the elements in this array to 1 or 0 (A[99000]=1).
I need to iterate through the 10 million elements.
The "proper" way in Java is to use the already-existing BitSet class pointed out by Hunter McMillen. If you're figuring out how a large bit-array is managed purely for the purpose of thinking through an interesting problem, then calculating the position of a bit in an array of bytes is just basic modular arithmetic.
public class BitArray {
private static final int ALL_ONES = 0xFFFFFFFF;
private static final int WORD_SIZE = 32;
private int bits[] = null;
public BitArray(int size) {
bits = new int[size / WORD_SIZE + (size % WORD_SIZE == 0 ? 0 : 1)];
}
public boolean getBit(int pos) {
return (bits[pos / WORD_SIZE] & (1 << (pos % WORD_SIZE))) != 0;
}
public void setBit(int pos, boolean b) {
int word = bits[pos / WORD_SIZE];
int posBit = 1 << (pos % WORD_SIZE);
if (b) {
word |= posBit;
} else {
word &= (ALL_ONES - posBit);
}
bits[pos / WORD_SIZE] = word;
}
}
Use BitSet (as Hunter McMillen already pointed out in a comment). You can easily get and set bits. To iterate just use a normal for loop.
Here is a more optimized implementation of phatfingers 'BitArray'
class BitArray {
private static final int MASK = 63;
private final long len;
private long bits[] = null;
public BitArray(long size) {
if ((((size-1)>>6) + 1) > 2147483647) {
throw new IllegalArgumentException(
"Field size to large, max size = 137438953408");
}else if (size < 1) {
throw new IllegalArgumentException(
"Field size to small, min size = 1");
}
len = size;
bits = new long[(int) (((size-1)>>6) + 1)];
}
public boolean getBit(long pos) {
return (bits[(int)(pos>>6)] & (1L << (pos&MASK))) != 0;
}
public void setBit(long pos, boolean b) {
if (getBit(pos) != b) { bits[(int)(pos>>6)] ^= (1L << (pos&MASK)); }
}
public long getLength() {
return len;
}
}
Since we use fields of 64 we extend the maximum size to 137438953408-bits which is roughly what fits in 16GB of ram. Additionally we use masks and bit shifts instead of division and modulo operations the reducing the computation time. The improvement is quite substantial.
byte[] A = new byte[10000000];
A[99000] = 1;
for(int i = 0; i < A.length; i++) {
//do something
}
If you really want bits, you can use boolean and let true = 1, and false = 0.
boolean[] A = new boolean[10000000];
//etc

Java BitSets writing to file [duplicate]

This question already has answers here:
Is it possible to read/write bits from a file using JAVA?
(9 answers)
Closed 9 years ago.
I am working on Huffman Compression algorithm. I have the code for each character. For example
f=1100
d=111
e=1101
b=101
c=100
a=0
Now in order to achieve compression I need to write the codes as bits to a binary file. I am right now able to wrote them as bytes, which is doing nothing but increasing the size of compressed file. How do I write the codes as bits to binary file in Java?
Well if you have the text "fdebcafdbca" you would need to write that
as the bits:
110011111011011000110011111011011000
Separated and padded:
11001111 10110110 00110011 11101101 10000000 //4 bits of padding here
In hexadecimal:
CF B6 33 ED 80
So you'd write the byte array of 0xCF 0xB6 0x33 0xED 0x80 to a file. That's 5 bytes = 40 bits, 4 wasted
bits. The text originally takes 12 bytes, so not much saving as you need to store the tree as well. You cannot avoid using padding if they don't align to a byte boundary.
Although not recommended at all, if you have a string then you could do this:
public class BitWriter {
private byte nthBit = 0;
private int index = 0;
private byte[] data;
public BitWriter( int nBits ) {
this.data = new byte[(int)Math.ceil(nBits / 8.0)];
}
public void writeBit(boolean bit) {
if( nthBit >= 8) {
nthBit = 0;
index++;
if( index >= data.length) {
throw new IndexOutOfBoundsException();
}
}
byte b = data[index];
int mask = (1 << (7 - nthBit));
if( bit ) {
b = (byte)(b | mask);
}
data[index] = b;
nthBit++;
}
public byte[] toArray() {
byte[] ret = new byte[data.length];
System.arraycopy(data, 0, ret, 0, data.length);
return ret;
}
public static void main( String... args ) {
BitWriter bw = new BitWriter(6);
String strbits = "101010";
for( int i = 0; i < strbits.length(); i++) {
bw.writeBit( strbits.charAt(i) == '1');
}
byte[] b = bw.toArray();
for( byte a : b ) {
System.out.format("%02X", a);
//A8 == 10101000
}
}
}

Write "compressed" Array to increase IO performance?

I have an int and float array each of length 220 million (fixed). Now, I want to store/upload those arrays to/from memory and disk. Currently, I am using Java NIO's FileChannel and MappedByteBuffer to solve this. It works fine, but it takes near about 5 seconds (Wall Clock Time) for storing/uploading array to/from memory to disk. Now, I want to make it faster.
Here, I should mention most of those array elements are 0 ( nearly 52 %).
like:
int arr1 [] = { 0 , 0 , 6 , 7 , 1, 0 , 0 ...}
Can anybody help me, is there any nice way to improve speed by not storing or loading those 0's. This can compensated by using Arrays.fill (array , 0).
The following approach requires n / 8 + nz * 4 bytes on disk, where n is the size of the array, and nz the number of non-zero entries. For 52% zero entries, you'd reduce storage size by 52% - 3% = 49%.
You could do:
void write(int[] array) {
BitSet zeroes = new BitSet();
for (int i = 0; i < array.length; i++)
zeroes.set(i, array[i] == 0);
write(zeroes); // one bit per index
for (int i = 0; i < array.length; i++)
if (array[i] != 0)
write(array[y]);
}
int[] read() {
BitSet zeroes = readBitSet();
array = new int[zeroes.length];
for (int i = 0; i < zeroes.length; i++) {
if (zeroes.get(i)) {
// nothing to do (array[i] was initialized to 0)
} else {
array[i] = readInt();
}
}
}
Edit: That you say this is slightly slower implies that the disk is not the bottleneck. You could tune the above approach by writing the bitset as you construct it, so you don't have to write the bitset to memory before writing it to disk. Also, by writing the bitset word by word interspersed with the actual data we can do only a single pass over the array, reducing cache misses:
void write(int[] array) {
writeInt(array.length);
int ni;
for (int i = 0; i < array.length; i = ni) {
ni = i + 32;
int zeroesMap = 0;
for (j = i + 31; j >= i; j--) {
zeroesMap <<= 1;
if (array[j] == 0) {
zeroesMap |= 1;
}
}
writeInt(zeroesMap);
for (j = i; j < ni; j++)
if (array[j] != 0) {
writeInt(array[j]);
}
}
}
}
int[] read() {
int[] array = new int[readInt()];
int ni;
for (int i = 0; i < array.length; i = ni) {
ni = i + 32;
zeroesMap = readInt();
for (j = i; j < ni; j++) {
if (zeroesMap & 1 == 1) {
// nothing to do (array[i] was initialized to 0)
} else {
array[j] = readInt();
}
zeroesMap >>= 1;
}
}
return array;
}
(The preceeding code assumes array.length is a multiple of 32. If not, write the last slice of the array in whatever way you like)
If that doesn't reduce proceccing time either, compression is not the way to go (I don't think any general purpose compression algorithm will be faster than the above).
Depending upon the distribution, consider Run-length Encoding:
Run-length encoding (RLE) is a very simple form of data compression in which runs of data (that is, sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. This is most useful on data that contains many such runs.
It is simple ... which is good, and possibly bad, here ;-)
In case you are willing to write the serialization-desirialization code yourself, instead of storing all the zeroes you can store a series of ranges that indicate where those zeros are(with a special marker), together with the actual non-zero data.
So the array in your example: { 0 , 0 , 6 , 7 , 1, 0 , 0 ...}
can be stored as:
%0-1, 6, 7, 1 %5-6
when reading this data, if you hit % it means you have a range in from of you, you read the start and the end and fill an zeroes. Then you go on and see a non #, this means you hit an actual value.
In a sparse array that has large sequences of consecutive values this will yield great compression.
There is a standard compression utils in java: java.util.zip - it's general purpose library but due to sheer availability is an ok solution. Specialized compressions, encoding should be researched, if need arises and I rarely recommend zip as the soultion of choise.
Here is a sample how to handle zip via Deflater/Inflater.
Most people know ZipInput/Output Stream (and esp. Gzip). All of them have downsdes in handling the copy from mem->zlib and esp. GZip which is a total disaster as having CRC32 calling the native code (calling native code removes the ability to optimize and introduces some more performance hits).
Few important notes: do not boost zip compression high, that will kill any performance whatsoever - of course one can experiment and fit their best ratio between CPU and disk activity.
The code also demonstrates one of the real shortcomings of java.util.zip - it doesn't support direct buffers. The support is beyond trivial, yet no one bother to do it. Direct buffers will save few memory copies and reduces the memory footprint.
Last note: there is java version of (j)zlib and it beats the native impl. on compression quite nicely.
package t1;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.Random;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;
public class ZInt {
private static final int bucketSize = 1<<17;//in real world should not be const, but we bored horribly
static final int zipLevel = 2;//feel free to experiement, higher compression (5+)is likely to be total waste
static void write(int[] a, File file, boolean sync) throws IOException{
byte[] bucket = new byte[Math.min(bucketSize, Math.max(1<<13, Integer.highestOneBit(a.length >>3)))];//128KB bucket
byte[] zipOut = new byte[bucket.length];
final FileOutputStream fout = new FileOutputStream(file);
FileChannel channel = fout.getChannel();
try{
ByteBuffer buf = ByteBuffer.wrap(bucket);
//unfortunately java.util.zip doesn't support Direct Buffer - that would be the perfect fit
ByteBuffer out = ByteBuffer.wrap(zipOut);
out.putInt(a.length);//write length aka header
if (a.length==0){
doWrite(channel, out, 0);
return;
}
Deflater deflater = new Deflater(zipLevel, false);
try{
for (int i=0;i<a.length;){
i = put(a, buf, i);
buf.flip();
deflater.setInput(bucket, buf.position(), buf.limit());
if (i==a.length)
deflater.finish();
//hacking and using bucket here is tempting since it's copied twice but well
for (int n; (n= deflater.deflate(zipOut, out.position(), out.remaining()))>0;){
doWrite(channel, out, n);
}
buf.clear();
}
}finally{
deflater.end();
}
}finally{
if (sync)
fout.getFD().sync();
channel.close();
}
}
static int[] read(File file) throws IOException, DataFormatException{
FileChannel channel = new FileInputStream(file).getChannel();
try{
byte[] in = new byte[(int)Math.min(bucketSize, channel.size())];
ByteBuffer buf = ByteBuffer.wrap(in);
channel.read(buf);
buf.flip();
int[] a = new int[buf.getInt()];
if (a.length==0)
return a;
int i=0;
byte[] inflated = new byte[Math.min(1<<17, a.length*4)];
ByteBuffer intBuffer = ByteBuffer.wrap(inflated);
Inflater inflater = new Inflater(false);
try{
do{
if (!buf.hasRemaining()){
buf.clear();
channel.read(buf);
buf.flip();
}
inflater.setInput(in, buf.position(), buf.remaining());
buf.position(buf.position()+buf.remaining());//simulate all read
for (;;){
int n = inflater.inflate(inflated,intBuffer.position(), intBuffer.remaining());
if (n==0)
break;
intBuffer.position(intBuffer.position()+n).flip();
for (;intBuffer.remaining()>3 && i<a.length;i++){//need at least 4 bytes to form an int
a[i] = intBuffer.getInt();
}
intBuffer.compact();
}
}while (channel.position()<channel.size() && i<a.length);
}finally{
inflater.end();
}
// System.out.printf("read ints: %d - channel.position:%d %n", i, channel.position());
return a;
}finally{
channel.close();
}
}
private static void doWrite(FileChannel channel, ByteBuffer out, int n) throws IOException {
out.position(out.position()+n).flip();
while (out.hasRemaining())
channel.write(out);
out.clear();
}
private static int put(int[] a, ByteBuffer buf, int i) {
for (;buf.hasRemaining() && i<a.length;){
buf.putInt(a[i++]);
}
return i;
}
private static int[] generateRandom(int len){
Random r = new Random(17);
int[] n = new int[len];
for (int i=0;i<len;i++){
n[i]= r.nextBoolean()?0: r.nextInt(1<<23);//limit bounds to have any sensible compression
}
return n;
}
public static void main(String[] args) throws Throwable{
File file = new File("xxx.xxx");
int[] n = generateRandom(3000000); //{0,2,4,1,2,3};
long start = System.nanoTime();
write(n, file, false);
long elapsed = System.nanoTime() - start;//elapsed will be fairer if the sync is true
System.out.printf("File length: %d, for %d ints, ratio %.2f in %.2fms %n", file.length(), n.length, ((double)file.length())/4/n.length, java.math.BigDecimal.valueOf(elapsed, 6) );
int[] m = read(file);
//compare, Arrays.equals doesn't return position, so it sucks/kinda
for (int i=0; i<n.length; i++){
if (m[i]!=n[i]){
System.err.printf("Failed at %d%n",i);
break;
}
}
System.out.printf("All done!");
};
}
Please note, the code is not a proper benchmark!
The delayed replies comes from the fact it was quite boring to code, yet another zip example, sorry

Categories

Resources