The standard API does not include an AtomicBitSet implementation. I could roll my own on top of AtomicIntegerArray, but would prefer not to.
Is anyone aware of an existing implementation released under a licence compatible with Apache 2? I require only basic operations to set and check bits.
Edit:
The code is both performance and memory critical so I'd like to avoid synchronization or an integer per flag if possible.
I would use an AtomicIntegerArray and I would use 32 flags per integer which would give you the same density as BitSet but without needing locks for thread safety.
import java.util.concurrent.atomic.AtomicIntegerArray;

public class AtomicBitSet {
    private final AtomicIntegerArray array;

    public AtomicBitSet(int length) {
        int intLength = (length + 31) >>> 5; // unsigned / 32, rounded up
        array = new AtomicIntegerArray(intLength);
    }

    public void set(long n) {
        int bit = 1 << n;          // int shift: distance is taken mod 32
        int idx = (int) (n >>> 5); // which int holds bit n
        while (true) {
            int num = array.get(idx);
            int num2 = num | bit;
            // done if the bit was already set or our CAS won; otherwise another thread raced us, so retry
            if (num == num2 || array.compareAndSet(idx, num, num2))
                return;
        }
    }

    public boolean get(long n) {
        int bit = 1 << n;
        int idx = (int) (n >>> 5);
        int num = array.get(idx);
        return (num & bit) != 0;
    }
}
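A minimal sketch of the same layout, assuming Java 8 or later: AtomicIntegerArray.getAndUpdate runs the compare-and-set retry loop for you. The class name AtomicBitSet8 and the clear method are illustrative additions, not part of the answer above, and indices are assumed to satisfy 0 <= n < length.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical variant of AtomicBitSet: 32 flags per int, no locks.
final class AtomicBitSet8 {
    private final AtomicIntegerArray array;

    AtomicBitSet8(int length) {
        // one int per 32 flags, rounded up
        array = new AtomicIntegerArray((length + 31) >>> 5);
    }

    /** Sets bit n; returns true if this call changed it from 0 to 1. */
    boolean set(int n) {
        int bit = 1 << (n & 31); // position inside the int
        // getAndUpdate may re-run the lambda under contention; it is side-effect free
        int prev = array.getAndUpdate(n >>> 5, v -> v | bit);
        return (prev & bit) == 0;
    }

    /** Clears bit n; returns true if this call changed it from 1 to 0. */
    boolean clear(int n) {
        int bit = 1 << (n & 31);
        int prev = array.getAndUpdate(n >>> 5, v -> v & ~bit);
        return (prev & bit) != 0;
    }

    boolean get(int n) {
        return (array.get(n >>> 5) & (1 << (n & 31))) != 0;
    }
}
```

Returning the previous state lets set double as a test-and-set, which the void set above does not expose.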
Look at http://www.javamex.com/tutorials/synchronization_concurrency_7_atomic_integer_long.shtml
Not exactly using BitSet, but AtomicInteger.
Related
I have a peculiar problem for which I am looking for an efficient solution. I have a byte array which contains the most significant n bytes of an unsigned 4-byte integer (most significant byte first). The value of the remaining bytes (if any) is unknown. I need to check whether the partially known integer value could fall within a certain range (+ or - x) of a known integer. It's also valid for the integer represented by the byte array under test to wrap around.
I have a solution which works (below). The problem is that it performs far more comparisons than I believe are necessary, and many of those comparisons are duplicated when the least significant bytes are unknown. I'm fairly sure it can be done more efficiently but can't figure out how. The unknown-least-significant-bytes scenario is an edge case, so I might be able to live with it, but it is part of a system that needs low latency, so any help would be great.
Thanks in advance.
static final int BYTES_IN_INT = 4;
static final int BYTE_SHIFT = 8; // bits per byte
// partial integer byte array length guaranteed to be 1-4 so no checking necessary
static boolean checkPartialIntegerInRange(byte[] partialIntegerBytes, int expectedValue, int range)
{
boolean inRange = false;
if(partialIntegerBytes.length == BYTES_IN_INT)
{
// normal scenario, all bytes known
inRange = Math.abs(ByteBuffer.wrap(partialIntegerBytes).getInt() - expectedValue) <= range;
}
else
{
// we don't know least significant bytes, could have any value
// need to check if partially known int could lie in the range
int partialInteger = 0;
int mask = 0;
// build partial int and mask
for (int i = 0; i < partialIntegerBytes.length; i++)
{
int shift = ((BYTES_IN_INT - 1) - i) * BYTE_SHIFT;
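// shift bytes to correct position; mask to an unsigned byte first,
// otherwise a negative byte sign-extends over the higher bytes
partialInteger |= ((partialIntegerBytes[i] & 0xFF) << shift);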
// build up mask to mask off expected value for comparison
mask |= (0xFF << shift);
}
// check partial int falls in range
for (int i = -(range); i <= range; i++)
{
if (partialInteger == ((expectedValue + i) & mask))
{
inRange = true;
break;
}
}
}
return inRange;
}
EDIT: Thanks to the contributors below. Here is my new solution. Comments welcome.
static final int BYTES_IN_INT = 4;
static final int BYTE_SHIFT = 8; // bits per byte
static final int UBYTE_MASK = 0xFF;
static final long UINT_MASK = 0xFFFFFFFFL;
public static boolean checkPartialIntegerInRange(byte[] partialIntegerBytes, int expectedValue, int range)
{
boolean inRange;
if(partialIntegerBytes.length == BYTES_IN_INT)
{
inRange = Math.abs(ByteBuffer.wrap(partialIntegerBytes).getInt() - expectedValue) <= range;
}
else
{
int partialIntegerMin = 0;
int partialIntegerMax = 0;
for(int i=0; i < BYTES_IN_INT; i++)
{
int shift = ((BYTES_IN_INT - 1) - i) * BYTE_SHIFT;
if(i < partialIntegerBytes.length)
{
partialIntegerMin |= (((partialIntegerBytes[i] & UBYTE_MASK) << shift));
partialIntegerMax = partialIntegerMin;
}
else
{
partialIntegerMax |=(UBYTE_MASK << shift);
}
}
long partialMinUnsigned = partialIntegerMin & UINT_MASK;
long partialMaxUnsigned = partialIntegerMax & UINT_MASK;
long rangeMinUnsigned = (expectedValue - range) & UINT_MASK;
long rangeMaxUnsigned = (expectedValue + range) & UINT_MASK;
if(rangeMinUnsigned <= rangeMaxUnsigned)
{
inRange = partialMinUnsigned <= rangeMaxUnsigned && partialMaxUnsigned >= rangeMinUnsigned;
}
else
{
inRange = partialMinUnsigned <= rangeMaxUnsigned || partialMaxUnsigned >= rangeMinUnsigned;
}
}
return inRange;
}
Suppose you have one clockwise interval (x, y), which may wrap around, and one normal interval (low, high), both including their endpoints. Determining whether they intersect can be done as follows (not tested):
if (x <= y) {
// (x, y) is a normal interval, use normal interval intersect
return low <= y && high >= x;
}
else {
// (x, y) wraps
return low <= y || high >= x;
}
To compare as unsigned integers, you can use longs (cast up with x & 0xffffffffL to counteract sign extension), Integer.compareUnsigned (Java 8 and later), or, if you prefer, add/subtract/xor both operands with Integer.MIN_VALUE and then compare them as ordinary signed ints.
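A sketch of the wrap-aware intersection test above with the unsigned comparisons filled in, assuming Java 8+ for Integer.compareUnsigned. The class and method names are hypothetical; (x, y) may wrap past 0xFFFFFFFF, while (low, high) is assumed not to.

```java
// Hypothetical helper: closed intervals compared as unsigned 32-bit values.
final class UnsignedIntervals {
    static boolean intersects(int x, int y, int low, int high) {
        if (Integer.compareUnsigned(x, y) <= 0) {
            // (x, y) is a normal interval: standard overlap test
            return Integer.compareUnsigned(low, y) <= 0
                && Integer.compareUnsigned(high, x) >= 0;
        }
        // (x, y) wraps, so it covers [x, 0xFFFFFFFF] and [0, y]
        return Integer.compareUnsigned(low, y) <= 0
            || Integer.compareUnsigned(high, x) >= 0;
    }

    // Unsigned "<=" via the MIN_VALUE trick: flipping the sign bit maps
    // unsigned order onto signed order.
    static boolean unsignedLE(int a, int b) {
        return (a ^ Integer.MIN_VALUE) <= (b ^ Integer.MIN_VALUE);
    }
}
```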
Convert your unsigned bytes to an integer. Right-shift it by 32 - 8n bits, where n is the number of known bytes, so your meaningful bytes become the least significant bytes. Right-shift your min/max integers by the same amount. If your shifted test value is equal to either shifted bound, it might be in the range. If it's strictly between them, it's definitely in the range.
Presumably the sign bit on your integers is always zero (if not, just forcibly convert the negative to zero, since your test value can't be negative). Because that's only one bit, it shouldn't matter unless you were given all four bytes, and even in that special case it's not much of a problem.
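A sketch of the shift-based idea above, assuming 1 to 4 known high-order bytes and a range [expected - range, expected + range] that does not wrap past zero. Building the partial value directly into the low bytes is equivalent to the right shift described. The class and method names are hypothetical.

```java
// Hypothetical helper for the non-wrapping case only.
final class PartialShiftCheck {
    static boolean couldBeInRange(byte[] known, int expected, int range) {
        int partial = 0;
        for (byte b : known) {
            partial = (partial << 8) | (b & 0xFF); // & 0xFF stops sign extension
        }
        int shift = 32 - 8 * known.length;         // number of unknown bits
        int lo = (expected - range) >>> shift;     // keep only the known byte positions
        int hi = (expected + range) >>> shift;
        // Strictly between lo and hi: definitely in range.
        // Equal to lo or hi: possibly in range, depending on the unknown bytes.
        return Integer.compareUnsigned(lo, partial) <= 0
            && Integer.compareUnsigned(partial, hi) <= 0;
    }
}
```

Each call is a handful of shifts and two comparisons regardless of range, instead of a loop over every candidate value.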
I just came across a problem that was easy to solve in pseudocode, but when I started coding it in Java, I realized I didn't know where to start...
Here is what I need to do:
I need a bit array of size 10 million (bits) (let's call it A).
I need to be able to set the elements in this array to 1 or 0 (A[99000]=1).
I need to iterate through the 10 million elements.
The "proper" way in Java is to use the already-existing BitSet class pointed out by Hunter McMillen. If you're figuring out how a large bit array is managed purely for the purpose of thinking through an interesting problem, then calculating the position of a bit in an array of ints is just basic modular arithmetic.
public class BitArray {
private static final int ALL_ONES = 0xFFFFFFFF;
private static final int WORD_SIZE = 32;
private int bits[] = null;
public BitArray(int size) {
bits = new int[size / WORD_SIZE + (size % WORD_SIZE == 0 ? 0 : 1)];
}
public boolean getBit(int pos) {
return (bits[pos / WORD_SIZE] & (1 << (pos % WORD_SIZE))) != 0;
}
public void setBit(int pos, boolean b) {
int word = bits[pos / WORD_SIZE];
int posBit = 1 << (pos % WORD_SIZE);
if (b) {
word |= posBit;
} else {
word &= (ALL_ONES - posBit);
}
bits[pos / WORD_SIZE] = word;
}
}
Use BitSet (as Hunter McMillen already pointed out in a comment). You can easily get and set bits. To iterate just use a normal for loop.
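A minimal sketch of the BitSet approach for the 10-million-bit array in the question. The class and method names are illustrative; set, clear, get, and nextSetBit are standard java.util.BitSet methods.

```java
import java.util.BitSet;

// Illustrative wrapper around java.util.BitSet for the question's A[].
final class BitSetDemo {
    static BitSet build() {
        BitSet a = new BitSet(10_000_000); // capacity hint; all bits start clear
        a.set(99000);                      // A[99000] = 1
        a.set(5);
        a.clear(5);                        // A[5] = 0
        return a;
    }

    // "Normal for loop" iteration over every index, as suggested above.
    static int countByGet(BitSet a, int size) {
        int count = 0;
        for (int i = 0; i < size; i++) {
            if (a.get(i)) count++;
        }
        return count;
    }

    // Visits only the set bits, skipping runs of zeros.
    static int countSet(BitSet a) {
        int count = 0;
        for (int i = a.nextSetBit(0); i >= 0; i = a.nextSetBit(i + 1)) {
            count++;
        }
        return count;
    }
}
```

The nextSetBit loop is handy when the array is sparse; the plain loop is simpler when every index must be visited anyway.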
Here is a more optimized implementation of phatfingers' BitArray:
class BitArray {
private static final int MASK = 63;
private final long len;
private long bits[] = null;
public BitArray(long size) {
if ((((size-1)>>6) + 1) > 2147483647) {
throw new IllegalArgumentException(
"Field size too large, max size = 137438953408");
}else if (size < 1) {
throw new IllegalArgumentException(
"Field size too small, min size = 1");
}
len = size;
bits = new long[(int) (((size-1)>>6) + 1)];
}
public boolean getBit(long pos) {
return (bits[(int)(pos>>6)] & (1L << (pos&MASK))) != 0;
}
public void setBit(long pos, boolean b) {
if (getBit(pos) != b) { bits[(int)(pos>>6)] ^= (1L << (pos&MASK)); }
}
public long getLength() {
return len;
}
}
Since we use 64-bit words, the maximum size grows to 137438953408 bits, which is roughly what fits in 16 GB of RAM. Additionally, we use masks and bit shifts instead of division and modulo operations, reducing the computation time. The improvement is quite substantial.
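A small check of the arithmetic behind the masks and shifts above: for a non-negative position, pos >> 6 equals pos / 64 and pos & 63 equals pos % 64. The class and method names are illustrative. For negative positions the two forms disagree, so the shift version relies on pos >= 0.

```java
// Illustrative helpers mirroring the index math in the long-based BitArray.
final class WordIndex {
    static int wordOf(long pos) { return (int) (pos >> 6); } // which long in the array
    static int bitOf(long pos)  { return (int) (pos & 63); } // which bit inside that long

    public static void main(String[] args) {
        long pos = 99000;
        System.out.println(wordOf(pos) + " " + (pos / 64)); // same value for pos >= 0
        System.out.println(bitOf(pos) + " " + (pos % 64));  // same value for pos >= 0
    }
}
```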
byte[] A = new byte[10000000];
A[99000] = 1;
for(int i = 0; i < A.length; i++) {
//do something
}
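If you want to treat the elements as bits, you can use boolean and let true = 1 and false = 0. Note that a boolean[] typically still uses one byte per element, so it is no more compact than the byte[] version.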
boolean[] A = new boolean[10000000];
//etc
I need a quick hash function for integers:
int hash(int n) { return ...; }
Is there something that exists already in Java?
The minimal properties that I need are:
hash(n) & 1 does not appear periodic when used with a bunch of consecutive values of n.
hash(n) & 1 is approximately equally likely to be 0 or 1.
HashMap, as well as Guava's hash-based utilities, use the following method on hashCode() results to improve bit distributions and defend against weaker hash functions:
/*
* This method was written by Doug Lea with assistance from members of JCP
* JSR-166 Expert Group and released to the public domain, as explained at
* http://creativecommons.org/licenses/publicdomain
*
* As of 2010/06/11, this method is identical to the (package private) hash
* method in OpenJDK 7's java.util.HashMap class.
*/
static int smear(int hashCode) {
hashCode ^= (hashCode >>> 20) ^ (hashCode >>> 12);
return hashCode ^ (hashCode >>> 7) ^ (hashCode >>> 4);
}
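A sketch for eyeballing the question's criteria with the smear function above: it prints hash(n) & 1 for a run of consecutive n. The class and helper names are illustrative, and this is a visual check, not a statistical test.

```java
// Illustrative harness around the smear function quoted above.
final class SmearLowBits {
    static int smear(int hashCode) {
        hashCode ^= (hashCode >>> 20) ^ (hashCode >>> 12);
        return hashCode ^ (hashCode >>> 7) ^ (hashCode >>> 4);
    }

    // Concatenates smear(n) & 1 for n = from .. from + count - 1.
    static String lowBits(int from, int count) {
        StringBuilder sb = new StringBuilder(count);
        for (int n = from; n < from + count; n++) {
            sb.append(smear(n) & 1);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(lowBits(0, 64));
        System.out.println(lowBits(1 << 20, 64));
    }
}
```

For n below 16 every shifted term is zero, so smear(n) == n and the low bits simply alternate 0101...; smear mixes higher bits downward, so it only changes the low bit once consecutive inputs reach bit 4 and above.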
So, I read this question, thought hmm this is a pretty math-y question, it's probably out of my league. Then, I ended up spending so much time thinking about it that I actually believe I've got the answer: No function can satisfy the criteria that f(n) & 1 is non-periodic for consecutive values of n.
Hopefully someone will tell me how ridiculous my reasoning is, but until then I believe it's correct.
Here goes: Any binary integer n can be represented as either 1...0 or 1...1, and only the least significant bit of that bitmap will affect the result of n & 1. Further, the next consecutive integer n + 1 will always contain the opposite least significant bit. So, clearly any series of consecutive integers will exhibit a period of 2 when passed to the function n & 1. So then, is there any function f(n) that will sufficiently distribute the series of consecutive integers such that periodicity is eliminated?
Any function f(n) = n + c fails, as c must end in either 0 or 1, so the LSB will either flip or stay the same depending on the constant chosen.
The above also eliminates subtraction for all trivial cases, but I have not taken the time to analyze the carry behavior yet, so there may be a crack here.
Any function f(n) = c*n fails, as the LSB will always be 0 if c ends in 0 and always be equal to the LSB of n if c ends in 1.
Any function f(n) = n^c fails, by similar reasoning. A power function would always have the same LSB as n.
Any function f(n) = c^n fails, for the same reason.
Division and modulus were a bit less intuitive to me, but basically, the LSB of either option ends up being determined by a subtraction (already ruled out). The modulus will also obviously have a period equal to the divisor.
Unfortunately, I don't have the rigor necessary to prove this, but I believe any combination of the above operations will ultimately fail as well. This leads me to believe that we can rule out any transcendental function, because these are implemented with polynomials (Taylor series? not a terminology guy).
Finally, I held out hope on the train ride home that counting the bits would work; however, this is actually a periodic function as well. The way I thought about it was, imagine taking the sum of the digits of any decimal number. That sum obviously would run from 0 through 9, then drop to 1, run from 1 to 10, then drop to 2... It has a period, the range just keeps shifting higher the higher we count. We can actually do the same thing for the sum of the binary digits, in which case we get something like: 0,1,1,2,2,....5,5,6,6,7,7,8,8....
Did I leave anything out?
TL;DR I don't think your question has an answer.
[SO decided to convert my "trivial answer" to comment. Trying to add little text to it to see if it can be fooled]
Unless you need the range of the hashing function to be wider...
The NumberOfSetBits function seems to vary quite a lot more than the hashCode, and as such seems more appropriate for your needs. It turns out there is already a fairly efficient algorithm on SO.
See Best algorithm to count the number of set bits in a 32-bit integer.
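A sketch of the suggestion above using Integer.bitCount, Java's built-in population count, as the hash. The class and method names are hypothetical. The low bit of the result is the parity of the number of set bits in n.

```java
// Hypothetical hash: the number of set bits in n.
final class SetBitsHash {
    static int hash(int n) {
        return Integer.bitCount(n); // population count, 0..32
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int n = 0; n < 32; n++) {
            sb.append(hash(n) & 1); // parity of consecutive n
        }
        System.out.println(sb);
    }
}
```

Note that the range of this hash is only 0 to 32, which matters if more than the low bit is used.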
I did some experimentation (see test program below); computation of 2^n in Galois fields and floor(A*sin(n)) both did very well at producing a sequence of "random" bits. I tried multiplicative congruential random number generators, some algebra, and CRC (which is analogous to k*n in Galois fields), none of which did well.
The floor(A*sin(n)) approach is the simplest and quickest; the 2^n calculation in GF32 takes approximately 64 multiplies and 1024 XORs in the worst case, but the periodicity of output bits is extremely well understood in the context of linear-feedback shift registers.
package com.example.math;
public class QuickHash {
interface Hasher
{
public int hash(int n);
}
static class MultiplicativeHasher1 implements Hasher
{
/* multiplicative random number generator
* from L'Ecuyer is x[n+1] = 1223106847 x[n] mod (2^32-5)
* http://dimsboiv.uqac.ca/Cours/C2012/8INF802_Hiv12/ref/paper/RNG/TableLecuyer.pdf
*/
final static long a = 1223106847L;
final static long m = (1L << 32)-5;
/*
* iterative step towards computing mod m
* (j*(2^32)+k) mod (2^32-5)
* = (j*(2^32-5)+j*5+k) mod (2^32-5)
* = (j*5+k) mod (2^32-5)
* repeat twice to get a number between 0 and 2^31+24
*/
private long quickmod(long x)
{
long j = x >>> 32;
long k = x & 0xffffffffL;
return j*5+k;
}
// treat n as unsigned before computation
@Override public int hash(int n) {
long h = a*(n&0xffffffffL);
long h2 = quickmod(quickmod(h));
return (int) (h2 >= m ? (h2-m) : h2);
}
@Override public String toString() { return getClass().getSimpleName(); }
}
/**
* computes (2^n) mod P where P is the polynomial in GF2
* with coefficients 2^(k+1) represented by the bits k=31:0 in "poly";
* coefficient 2^0 is always 1
*/
static class GF32Hasher implements Hasher
{
static final public GF32Hasher CRC32 = new GF32Hasher(0x82608EDB, 32);
final private int poly;
final private int ofs;
public GF32Hasher(int poly, int ofs) {
this.ofs = ofs;
this.poly = poly;
}
static private long uint(int x) { return x&0xffffffffL; }
// modulo GF2 via repeated subtraction
int mod(long n) {
long rem = n;
long q = uint(this.poly);
q = (q << 32) | (1L << 31);
long bitmask = 1L << 63;
for (int i = 0; i < 32; ++i, bitmask >>>= 1, q >>>= 1)
{
if ((rem & bitmask) != 0)
rem ^= q;
}
return (int) rem;
}
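int mul(int x, int y)
{
// GF(2) polynomial multiplication is carry-less: XOR shifted copies of x
// instead of adding them, then reduce the (up to 63-bit) product mod P
long a = uint(x), b = uint(y), prod = 0;
for (int i = 0; i < 32; ++i)
{
if (((b >>> i) & 1) != 0)
prod ^= a << i;
}
return mod(prod);
}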
int pow2(int n) {
// compute 2^n mod P using repeated squaring
int y = 1;
int x = 2;
while (n > 0)
{
if ((n&1) != 0)
y = mul(y,x);
x = mul(x,x);
n = n >>> 1;
}
return y;
}
@Override public int hash(int n) {
return pow2(n+this.ofs);
}
@Override public String toString() {
return String.format("GF32[%08x, ofs=%d]", this.poly, this.ofs);
}
}
static class QuickHasher implements Hasher
{
@Override public int hash(int n) {
return (int) ((131111L*n)^n^(1973*n)%7919);
}
@Override public String toString() { return getClass().getSimpleName(); }
}
// adapted from http://www.w3.org/TR/PNG-CRCAppendix.html
static class CRC32TableHasher implements Hasher
{
final private int table[];
static final private int polyval = 0xedb88320;
public CRC32TableHasher()
{
this.table = make_table();
}
/* Make the table for a fast CRC. */
static public int[] make_table()
{
int[] table = new int[256];
int c;
int n, k;
for (n = 0; n < 256; n++) {
c = n;
for (k = 0; k < 8; k++) {
if ((c & 1) != 0)
c = polyval ^ (c >>> 1);
else
c = c >>> 1;
}
table[n] = (int) c;
}
return table;
}
public int iterate(int state, int i)
{
return this.table[(state ^ i) & 0xff] ^ (state >>> 8);
}
@Override public int hash(int n) {
int h = -1;
h = iterate(h, n >>> 24);
h = iterate(h, n >>> 16);
h = iterate(h, n >>> 8);
h = iterate(h, n);
return h ^ -1;
}
@Override public String toString() { return getClass().getSimpleName(); }
}
static class TrigHasher implements Hasher
{
@Override public String toString() { return getClass().getSimpleName(); }
@Override public int hash(int n) {
double s = Math.sin(n);
return (int) Math.floor((1<<31)*s);
}
}
private static void test(Hasher hasher) {
System.out.println(hasher+":");
for (int i = 0; i < 64; ++i)
{
int h = hasher.hash(i);
System.out.println(String.format("%08x -> %08x %%2 = %d",
i,h,(h&1)));
}
for (int i = 0; i < 256; ++i)
{
System.out.print(hasher.hash(i) & 1);
}
System.out.println();
analyzeBits(hasher);
}
private static void analyzeBits(Hasher hasher) {
final int N = 65536;
final int maxrunlength=32;
int[][] runs = {new int[maxrunlength], new int[maxrunlength]};
int[] count = new int[2];
int prev = -1;
System.out.println("Run length test of "+N+" bits");
for (int i = 0; i < maxrunlength; ++i)
{
runs[0][i] = 0;
runs[1][i] = 0;
}
int runlength_minus1 = 0;
for (int i = 0; i < N; ++i)
{
int b = hasher.hash(i) & 0x1;
count[b]++;
if (b == prev)
++runlength_minus1;
else if (i > 0)
{
++runs[prev][runlength_minus1];
runlength_minus1 = 0;
}
prev = b;
}
++runs[prev][runlength_minus1];
System.out.println(String.format("%d zeros, %d ones", count[0], count[1]));
for (int i = 0; i < maxrunlength; ++i)
{
System.out.println(String.format("%d runs of %d zeros, %d runs of %d ones", runs[0][i], i+1, runs[1][i], i+1));
}
}
public static void main(String[] args) {
Hasher[] hashers = {
new MultiplicativeHasher1(),
GF32Hasher.CRC32,
new QuickHasher(),
new CRC32TableHasher(),
new TrigHasher()
};
for (Hasher hasher : hashers)
{
test(hasher);
}
}
}
The simplest hash for an int value is the int value itself.
See Java Integer class
public int hashCode()
public static int hashCode(int value)
Returns:
a hash code value for this object, equal to the primitive int value represented by this Integer object.
If I have an integer in Java how do I count how many bits are zero except for leading zeros?
We know that integers in Java have 32 bits but counting the number of set bits in the number and then subtracting from 32 does not give me what I want because this will also include the leading zeros.
As an example, the number 5 has one zero bit because in binary it is 101.
Take a look at the API documentation of Integer:
32 - Integer.numberOfLeadingZeros(n) - Integer.bitCount(n)
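The one-liner above wrapped in a method; the class and method names are illustrative. 32 - numberOfLeadingZeros(n) is the count of significant bits, and subtracting bitCount(n) leaves the zeros among them. For negative n there are no leading zeros, so all 32 bits count as significant.

```java
// Illustrative wrapper around the Integer one-liner.
final class ZeroBits {
    static int nonLeadingZeroBits(int n) {
        int significant = 32 - Integer.numberOfLeadingZeros(n); // bits up to the highest one
        return significant - Integer.bitCount(n);               // minus the one bits
    }

    public static void main(String[] args) {
        System.out.println(nonLeadingZeroBits(5)); // 101 in binary -> 1
    }
}
```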
To count non-leading zeros in Java you can use this algorithm:
public static int countNonleadingZeroBits(int i)
{
int result = 0;
while (i != 0)
{
if ((i & 1) == 0) // parentheses needed: == binds tighter than &
{
result += 1;
}
i >>>= 1;
}
return result;
}
This algorithm will be reasonably fast if your inputs are typically small, but if your input is typically a larger number it may be faster to use a variation on one of the bit hack algorithms on this page.
Count the total number of "bits" in your number, and then subtract the number of ones from the total number of bits.
This is what I would have done.
public static int countBitsSet(int num) {
int count = num & 1; // start with the first bit.
while((num >>>= 1) != 0) // shift the bits and check there are some left.
count += num & 1; // count the next bit if its there.
return count;
}
public static int countBitsNotSet(int num) {
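// significant bits (32 minus the leading zeros) minus the set bits
return 32 - Integer.numberOfLeadingZeros(num) - countBitsSet(num);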
}
Using some built-in functions:
public static int zeroBits(int i)
{
if (i == 0) {
return 0;
}
else {
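// 1-based position of the highest one bit; exact integer math, and also
// correct for negative i, where the log10 of a negative value would be NaN
int highestBit = 32 - Integer.numberOfLeadingZeros(i);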
return highestBit - Integer.bitCount(i);
}
}
Since evaluation order in Java is defined, we can do this:
public static int countZero(int n) {
for (int i=1,t=0 ;; i<<=1) {
if (n==0) return t;
if (n==(n&=~i)) t++;
}
}
Note that this relies on the LHS of an equality being evaluated first; try the same thing in C or C++ and the compiler is free to make you look foolish by setting your printer on fire.
What's the best way to unstub the following functions?
// Convert a bit-vector to an integer.
int bitvec2int(boolean[] b)
{
[CODE HERE]
}
// Convert an integer x to an n-element bit-vector.
boolean[] int2bitvec(int x, int n)
{
[CODE HERE]
}
Or is there a better way to do that sort of thing than passing boolean arrays around?
This comes up in an Android app where we need an array of 20 booleans to persist and the easiest way to do that is to write an integer or string to the key-value store.
I'll post the way we (Bee and I) wrote the above as an answer. Thanks!
Use java.util.BitSet instead. It stores one bit per flag, so it is far more compact than a boolean[], and it provides bulk operations on the bits.
Also, you should really ask yourself whether these 20 booleans should really be an enum, in which case you can use EnumSet, which is the Java solution to the bit fields technique from C (see: Effective Java 2nd Edition: Use EnumSet instead of bit fields).
BitSet to/from int conversion
You might as well just use BitSet and drop the int, but just in case you need these:
static BitSet toBitSet(int i) {
BitSet bs = new BitSet(Integer.SIZE);
for (int k = 0; k < Integer.SIZE; k++) {
if ((i & (1 << k)) != 0) {
bs.set(k);
}
}
return bs;
}
static int toInt(BitSet bs) {
int i = 0;
for (int pos = -1; (pos = bs.nextSetBit(pos+1)) != -1; ) {
i |= (1 << pos);
}
return i;
}
Two different techniques were deliberately used for instructional purposes. For robustness, the BitSet to int conversion should ensure that 32 bits is enough.
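A sketch of the robustness check mentioned above: refuse to convert a BitSet whose highest set bit does not fit in 32 bits. The class and method names are illustrative; BitSet.length() returns the index of the highest set bit plus one.

```java
import java.util.BitSet;

// Illustrative checked variant of toInt.
final class BitSetInts {
    static int toIntChecked(BitSet bs) {
        if (bs.length() > Integer.SIZE) { // highest set bit is at index 32 or above
            throw new IllegalArgumentException("BitSet needs " + bs.length() + " bits");
        }
        int i = 0;
        for (int pos = bs.nextSetBit(0); pos >= 0; pos = bs.nextSetBit(pos + 1)) {
            i |= 1 << pos;
        }
        return i;
    }
}
```

Without the check, 1 << pos silently wraps for pos >= 32 (the shift distance is taken mod 32), so bit 32 would be folded into bit 0.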
EnumSet example
This example is based on the one given in the book:
import java.util.*;
public enum Style {
BOLD, ITALIC, UNDERLINE, STRIKETHROUGH;
public static void main(String[] args) {
Set<Style> s1 = EnumSet.of(BOLD, UNDERLINE);
System.out.println(s1); // prints "[BOLD, UNDERLINE]"
s1.addAll(EnumSet.of(ITALIC, UNDERLINE));
System.out.println(s1.contains(ITALIC)); // prints "true"
}
}
From the API:
This representation is extremely compact and efficient. The space and time performance of this class should be good enough to allow its use as a high-quality, typesafe alternative to traditional int-based "bit flags."
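For the persistence use case in the question (20 flags written to a key-value store as an int), a sketch that packs an EnumSet into an int bitmask using each constant's ordinal. The class, enum, and method names are hypothetical. The stored masks change meaning if the enum constants are ever reordered, and the approach assumes at most 32 constants.

```java
import java.util.EnumSet;

// Hypothetical EnumSet <-> int bitmask conversion for storage.
final class FlagStore {
    enum Flag { BOLD, ITALIC, UNDERLINE, STRIKETHROUGH }

    static int toMask(EnumSet<Flag> flags) {
        int mask = 0;
        for (Flag f : flags) {
            mask |= 1 << f.ordinal(); // bit k <-> constant with ordinal k
        }
        return mask;
    }

    static EnumSet<Flag> fromMask(int mask) {
        EnumSet<Flag> flags = EnumSet.noneOf(Flag.class);
        for (Flag f : Flag.values()) {
            if ((mask & (1 << f.ordinal())) != 0) {
                flags.add(f);
            }
        }
        return flags;
    }
}
```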
// Convert a big-endian bit-vector to an integer.
int bitvec2int(boolean[] b)
{
int x = 0;
for(boolean i : b) x = x << 1 | (i?1:0);
return x;
}
// Convert an integer x to an n-element big-endian bit-vector.
boolean[] int2bitvec(int x, int n)
{
boolean[] b = new boolean[n];
for(int i = 0; i < n; i++) b[i] = ((1 << (n - i - 1)) & x) != 0;
return b;
}
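A usage sketch for the two conversions above, copied into a static helper class (the class name is illustrative) so the round trip can be run on its own. It assumes n <= 32, which covers the 20 flags in the question.

```java
// Illustrative container for the big-endian bit-vector conversions.
final class BitVec {
    static int bitvec2int(boolean[] b) {
        int x = 0;
        for (boolean bit : b) x = (x << 1) | (bit ? 1 : 0); // first element is most significant
        return x;
    }

    static boolean[] int2bitvec(int x, int n) {
        boolean[] b = new boolean[n];
        for (int i = 0; i < n; i++) b[i] = ((1 << (n - i - 1)) & x) != 0;
        return b;
    }

    public static void main(String[] args) {
        boolean[] flags = int2bitvec(6, 4);
        System.out.println(java.util.Arrays.toString(flags)); // [false, true, true, false]
        System.out.println(bitvec2int(flags));                // 6
    }
}
```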