Java - Vector of binary flags

If I'm being told that a specific function will return a vector of binary flags (32-bit int value), what does that mean? Can you give an example that demonstrates that?
Thanks a lot.

A thirty-two bit integer can be viewed as a vector of thirty-two ones and zeroes, and bitwise arithmetic can be used to extract individual bits.
For example, if FLAG_FOO is a constant whose value is a power of two — say, 1024 — then (flag_vector & FLAG_FOO) != 0 tests whether that specific bit is set. (The parentheses are needed in Java because != binds more tightly than &.) This is because & is "bitwise and"; it evaluates to an integer whose bits are one where both operands' bits are one, and zero where either operand's bit is zero. For example, binary 00100110 & binary 10000011 is 00000010. (Except that you're using thirty-two bit integers, obviously, instead of just eight.)
Conversely, "bitwise or", |, can be used to construct such a value; for example, flag_vector = FLAG_FOO | FLAG_BAR | FLAG_BAZ would have three bits (flags) set.
This is used, for example, in the java.util.regex.Pattern class, whose static compile method is overloaded to take a second argument composed of such flags. Pattern.compile("a.c", Pattern.CASE_INSENSITIVE | Pattern.DOTALL) creates a pattern based on the string a.c, with the "case-insensitive" and ".-represents-any-character,-even-newline" options enabled.
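To make this concrete, here is a minimal, self-contained sketch; FLAG_FOO, FLAG_BAR and FLAG_BAZ are made-up constants, not from any real API:

```java
public class Flags {
    // Hypothetical flags; each is a distinct power of two,
    // so each occupies exactly one bit of the 32-bit int.
    static final int FLAG_FOO = 1 << 10; // 1024
    static final int FLAG_BAR = 1 << 3;  // 8
    static final int FLAG_BAZ = 1;       // 1

    public static void main(String[] args) {
        // Build the flag vector with bitwise OR.
        int flagVector = FLAG_FOO | FLAG_BAZ;

        // Test individual bits with bitwise AND.
        System.out.println((flagVector & FLAG_FOO) != 0); // true: FOO is set
        System.out.println((flagVector & FLAG_BAR) != 0); // false: BAR is not
    }
}
```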

Related

What is the difference between `bitCount()` and `bitLength()` of a `BigInteger`

The descriptions of bitCount() and bitLength() are rather cryptic:
public int bitCount()
Returns the number of bits in the two's complement representation of this BigInteger that differ from its sign bit. This method is useful when implementing bit-vector style sets atop BigIntegers.
Returns:
number of bits in the two's complement representation of this BigInteger that differ from its sign bit.
public int bitLength()
Returns the number of bits in the minimal two's-complement representation of this BigInteger, excluding a sign bit. For positive BigIntegers, this is equivalent to the number of bits in the ordinary binary representation. (Computes (ceil(log2(this < 0 ? -this : this+1))).)
Returns:
number of bits in the minimal two's-complement representation of this BigInteger, excluding a sign bit.
What is the real difference between these two methods and when should I use which?
I have used bitCount occasionally to count the number of set bits in a positive integer, but I've only rarely used bitLength, and usually when I meant bitCount, because the differences between the descriptions are too subtle for me to instantly grok.
Google Attractor: Java BigInteger bitCount vs bitLength
A quick demonstration:
public void test() {
    BigInteger b = BigInteger.valueOf(0x12345L);
    System.out.println("b = " + b.toString(2));
    System.out.println("bitCount(b) = " + b.bitCount());
    System.out.println("bitLength(b) = " + b.bitLength());
}
prints
b = 10010001101000101
bitCount(b) = 7
bitLength(b) = 17
So, for positive integers:
bitCount() returns the number of set bits in the number.
bitLength() returns one more than the index of the highest set bit, i.e. the length of the binary representation of the number (⌊log2(n)⌋ + 1 for positive n).
Another basic function is missing:
bitCount() is useful to find the cardinality of a set of integers;
bitLength() is useful to find the largest integer that is a member of this set;
getLowestSetBit() is still needed to find the smallest integer that is a member of this set (it is also needed to implement fast iterators over bitsets).
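As a sketch of that last point (my example, not from the answer): iterating the members of a set encoded in a BigInteger with getLowestSetBit() could look like this:

```java
import java.math.BigInteger;

public class BitSetIteration {
    public static void main(String[] args) {
        // The set {0, 5, 12}, encoded as bits 0, 5 and 12.
        BigInteger set = BigInteger.ZERO.setBit(0).setBit(5).setBit(12);

        // getLowestSetBit() returns the index of the rightmost one bit;
        // clearing that bit each time walks the members in ascending order.
        for (BigInteger b = set; b.signum() != 0; b = b.clearBit(b.getLowestSetBit())) {
            System.out.println(b.getLowestSetBit()); // prints 0, then 5, then 12
        }
    }
}
```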
There are efficient ways to:
reduce a very large bitset to its bitCount() without shifting each stored word (e.g. each 64-bit word) through a slow loop over all 64 bits. The population count of a word can be computed with a small, bounded number of arithmetic operations: no loop, no loop-condition tests, parallelism is possible, and fewer than 64 operations per 64-bit word, so the per-word cost is O(1) time.
compute the bitLength(): you just need the number of words used to store the bitset (or the highest used index in the array of words), plus a few arithmetic operations on the single word stored at that index; for a 64-bit word, at most 8 arithmetic operations suffice, so the cost is O(1) time.
but for getLowestSetBit(): you still need to scan the low words for as long as they are all zero, so parallelization is difficult and the cost is O(N) time, where N is the bitLength() of the bitset. Within the first non-zero word, a binary search (or a trailing-zero count) locates the lowest set bit; one can wonder whether the costly tests and branches on that word can be replaced by pure arithmetic, so that full parallelism gives an O(1)-time answer for that last word.
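For a single 64-bit word, the JDK already exposes these O(1) building blocks (Long.bitCount, Long.numberOfLeadingZeros, Long.numberOfTrailingZeros), which HotSpot typically compiles to single machine instructions:

```java
public class WordOps {
    public static void main(String[] args) {
        long word = 0b1001_0000_0000_0100L; // bits 2, 12 and 15 set

        // Population count: number of set bits, no loop over bits.
        System.out.println(Long.bitCount(word)); // 3

        // bitLength equivalent: one past the index of the highest set bit.
        System.out.println(64 - Long.numberOfLeadingZeros(word)); // 16

        // Index of the lowest set bit (returns 64 if the word is zero).
        System.out.println(Long.numberOfTrailingZeros(word)); // 2
    }
}
```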
In my opinion the 3rd problem requires a more efficient storage for bitsets than a flat array of words: we need a representation using a binary tree instead:
Suppose you want to store 64 bits in a bitset
this set is equivalent to storing 2 subsets A and B, of 32 bits for each
but instead of naively storing {A, B} you can store {A or B, (A or B) xor A, (A or B) xor B}, where "or" and "xor" are bit-for-bit operations (this adds 50% of information, by storing not just the two separate elements but their "sum" and their respective differences from this sum).
You can apply this recursively for 128 bits, 256 bits, and so on; in fact you can avoid the 50% overhead at each step by summing more elements. Using the "xor" differences instead of the elements themselves can accelerate some operations (not shown here), much like other compression schemes that are efficient on sparse sets.
This allows faster scanning of zeroes because you can skip very fast, in O(log2(N)) time the null bits and locate words that have some non-zero bits: they have (A or B)==0.
Another common usage of bitsets is to let them represent their complement, but this is not easy when the number of integers the set could have as members is very large (e.g. when representing a set of 64-bit integers): the bitset should then reserve at least one bit to indicate that it does NOT directly store the integers that are members of the set, but instead stores only the integers that are NOT members.
And an efficient tree-like representation of the bitset should let each node of the binary tree choose whether it stores the members or the non-members, depending on the cardinality of members in each subrange (each subrange represents the subset of all integers between k and k+2^n-1, where k is the node number in the binary tree and each node stores a single word of n bits, one of those bits recording whether the word contains members or non-members).
There's an efficient way to store binary trees in a flat indexed array, if the tree is dense enough, i.e. has few words whose bits are all 0 or all 1. If that is not the case (for very "sparse" sets), you need something pointer-based such as a B-tree, where each page can be either a flat "dense" range or an ordered index of subtrees: store the flat dense ranges in leaf nodes allocated in one flat array, and store the other nodes separately in another array. Instead of a pointer from one node to another for a sub-branch of the B-tree, use an index into that array; the index itself can carry one bit indicating whether it points to another page of branches or to a leaf node.
But the current default implementation of bitsets in the Java collections does not use these techniques, so BitSets are still not efficient enough to store very sparse sets of large integers. You need your own library to reduce the storage requirement while still allowing fast lookup, in O(log2(N)) time, to determine whether an integer is a member of the set represented by the optimized bitset.
But anyway the default Java implementation is sufficient if you just need bitCount() and bitLength() and your bitsets are used for dense sets, for sets of small integers (for a set of 16-bit integers, a naive approach storing 64K bit, i.e. using 8KB of memory at most, is generally enough).
For very sparse sets of large integers (e.g. fewer than one set bit per 128 bits), it will always be more efficient to store a sorted array of integer values, or a hash table if the bitset would set no more than 1 bit per range of 32 bits; you can still add an extra bit to these structures to store the "complement" bit.
But I've not found getLowestSetBit() to be efficient enough: the BigInteger package still cannot support very sparse bitsets without huge memory costs, even though BigInteger can easily represent the "complement" bit as a "sign bit" via its signum() and subtract methods, which are efficient.
Very large and very sparse bitsets are needed, for example, for some well-known operations such as searches in very large databases of RDF tuples in a knowledge base, each tuple being indexed by a very large GUID (represented by 128-bit integers): you need to be able to perform binary operations such as unions, differences, and complements.

Bit manipulation in Java - 2s complement and flipping bits

I was recently looking into some problems with bit manipulation in Java and I came up with two questions.
1) Firstly, I came up to the problem of flipping all the bits in a number.
I found this solution:
public class Solution {
public int flipAllBits(int num) {
int mask = (1 << (int)Math.floor(Math.log(num)/Math.log(2))+1) - 1;
return num ^ mask;
}
}
But what happens when k = 32 bits? Can the 1 be shifted 33 times?
What I understand from the code (although it doesn't really make sense) is that the mask is 0111111...1 (31 1's) and not 32 1's, as one would expect. Therefore, when num is a really large number, this would fail.
2) Another question I had was determining when something is a bit sequence in 2s complement or just a normal bit sequence. For example I read that 1010 when flipped is 0110 which is -10 but also 6. Which one is it and how do we know?
Thanks.
1) The Math calls are not necessary. Flipping all the bits of any integral type in Java (or C) is not an arithmetic operation; it is a bitwise operation. With the ^ operator, simply using -1 as the other operand works regardless of the size of the type. The tilde operator ~ is the other option.
int i = 0xf0f0f0f0;
System.out.println(Integer.toHexString(i));
i ^= -1;
System.out.println(Integer.toHexString(i));
i = ~i;
System.out.println(Integer.toHexString(i));
2) Since two's complement maps the entire range of integers onto the entire range of integers, it is not possible to detect whether a number is or is not two's complement unless you know the range of numbers from which the complement might have been calculated and the two sets (before and after) are mutually exclusive.
That mask computation is fairly inscrutable; I'm going to guess that it attempts (since you mention it's wrong) to make a mask up to and including the highest set bit. Whether that's useful for "flipping all bits" is another possible point of discussion, since to me at least, "all bits" means all 32 of them, not some number that depends on the value. But if that's what you want, then that's what you want. Especially combined with that second question, this looks like a mistake to me, so you'd be implementing the wrong thing from the start; see near the bottom.
Anyway, the mask can be generated with some reasonably nice bitmath, which does not create any doubt about possible edge cases (eg Math.log(0) is probably bad, and k=32 corresponds with negative numbers which are also probably bad to put into a log):
int m = num | (num >> 16);
m |= m >> 8;
m |= m >> 4;
m |= m >> 2;
m |= m >> 1;
return num ^ m;
Note that this function has an odd property: it almost always returns a number that is unsigned-lower than its input, except at 0. It does flip bits, so the name is not completely wrong, but flipAllBits(flipAllBits(x)) != x (usually), while the name suggests it should be an involution.
As for the second question, there is nothing to determine. Two's complement is a scheme by which you can interpret a bitvector - any bitvector. So it's really a choice you make: to interpret a given bitvector that way or some other way. In Java the "default" interpretation is two's complement (e.g. toString will print an int according to its two's complement meaning), but you don't have to go along with it; you can (with care) treat an int as unsigned, or as an array of booleans, or as several bitfields packed together, etc.
If you wanted to invert all the bits but made the common mistake to assume that the number of bits in an int is variable (and that you therefore needed to compute a mask that covers "all bits"), I have some great news for you, because inverting all bits is a lot easier:
return ~num;
If you were reading "invert all bits" in the context of two's complement, it would have the above meaning, so all bits, including those left of the highest set bit.
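A quick demonstration of the two notions of "flip" side by side (values chosen arbitrarily):

```java
public class FlipDemo {
    public static void main(String[] args) {
        int num = 0b101; // 5

        // Flip only up to the highest set bit (mask = 0b111):
        System.out.println(num ^ 0b111); // 2

        // Flip all 32 bits; in two's complement, ~x == -x - 1:
        System.out.println(~num); // -6
    }
}
```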

How does the implementation of this bitwise operator make sense?

In an earlier question regarding how to maximize the JFrame I saw this bit of code and it worked. I took out the
name.getExtendedState()
and it still worked? What does the use of the "getter" and the OR symbol accomplish?
name.setExtendedState(name.getExtendedState()|JFrame.MAXIMIZED_BOTH);
Using name.getExtendedState()|JFrame.MAXIMIZED_BOTH means that you're adding MAXIMIZED_BOTH to the existing extended state. If you say only JFrame.MAXIMIZED_BOTH, that means you're replacing the extended state with only that bit, and throwing away anything in the current extended state.
From the API getExtendedState() :
Gets the state of this frame. The state is represented as a bitwise mask.
NORMAL
Indicates that no state bits are set.
ICONIFIED
MAXIMIZED_HORIZ
MAXIMIZED_VERT
MAXIMIZED_BOTH
Concatenates MAXIMIZED_HORIZ and MAXIMIZED_VERT.
The bitwise OR will combine the returned value with the value of JFrame.MAXIMIZED_BOTH.
For example, if NORMAL were 10110 and MAXIMIZED_BOTH were 01100, ORing the two would yield 11110:
Normal 10110
MaxBoth 01100
Result 11110
Quoted from Wikipedia: http://en.wikipedia.org/wiki/Bitwise_operation#OR
A bitwise OR takes two bit patterns of equal length and performs the
logical inclusive OR operation on each pair of corresponding bits. The
result in each position is 1 if the first bit is 1 or the second bit
is 1 or both bits are 1; otherwise, the result is 0. For example:
0101 (decimal 5)
OR 0011 (decimal 3)
= 0111 (decimal 7)
So if getExtendedState() returns a number made up of binary flags (i.e. a bit field), ORing it (using the pipe operator |) simply keeps ALL the existing flags in the object's state while also setting the bit(s) that correspond to the state JFrame.MAXIMIZED_BOTH.
This is because ORing sets a bit to 1 if it is 1 in either the first operand OR the second operand.
Hope that helps explain it.
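The set-versus-replace distinction can be sketched with plain ints (the constants below are illustrative stand-ins for the java.awt.Frame values, not taken as authoritative):

```java
public class StateFlags {
    // Illustrative stand-ins for the Frame state constants.
    static final int ICONIFIED       = 1;
    static final int MAXIMIZED_HORIZ = 2;
    static final int MAXIMIZED_VERT  = 4;
    static final int MAXIMIZED_BOTH  = MAXIMIZED_HORIZ | MAXIMIZED_VERT;

    public static void main(String[] args) {
        int state = ICONIFIED;

        // OR adds MAXIMIZED_BOTH while keeping the existing bits:
        state |= MAXIMIZED_BOTH;
        System.out.println((state & ICONIFIED) != 0); // true: ICONIFIED preserved

        // Assignment replaces the whole state, losing ICONIFIED:
        state = MAXIMIZED_BOTH;
        System.out.println((state & ICONIFIED) != 0); // false: ICONIFIED lost
    }
}
```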

java long datatype conversion to unsigned value

I'm porting some C++ code to Java code.
There is no unsigned data type in Java that can hold 64 bits.
I have a hashcode which is stored in Java's long datatype (which of course is signed).
long vp = hashcode / 38; // hashcode is of type 'long'
Since 38 here is greater than 2, the resulting number fits in the positive range of a long and can safely be used for any other arithmetic in Java.
The question is what if the signed bit in 'hashcode' is set to 1. I don't want to get a negative value in variable vp. I wanted a positive value as if the datatype is an unsigned one.
P.S: I don't want to used Biginteger for this purpose because of performance issues.
Java's primitive integral types are signed, and there isn't really anything you can do about it. However, depending on what you need it for, this may not matter.
Since integer arithmetic is done in two's complement, signed and unsigned values are exactly the same at the binary level. The difference is in how you interpret them, and in certain operations. Specifically, right shift, division, modulus and comparison differ. Unsigned right shifts can be done with the >>> operator. As long as you don't need one of the differing operators, you can use longs perfectly well.
If you can use third-party libraries, you can e.g. use Guava's UnsignedLongs class to treat long values as unsigned for many purposes, including division. (Disclosure: I contribute to Guava.)
Here is how I solved this: unsigned-right-shift hashcode by 1 bit (an unsigned division by 2, using >>>), then divide the shifted number by 19 (which is 38/2). Essentially this divides the number by 38 exactly as in C++, and I got the same value as in C++.
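For what it's worth, since Java 8 the shift trick is unnecessary: Long.divideUnsigned divides the bit pattern as if it were unsigned. A sketch comparing both approaches (the hashcode value is chosen arbitrarily):

```java
public class UnsignedDiv {
    public static void main(String[] args) {
        long hashcode = -1L; // all 64 bits set; as unsigned: 2^64 - 1

        // The shift trick: >>> shifts in a zero, halving the unsigned
        // value; 38 = 2 * 19, and floor(floor(x/2)/19) == floor(x/38).
        long vp1 = (hashcode >>> 1) / 19;

        // Java 8+: divide the bit pattern directly as unsigned.
        long vp2 = Long.divideUnsigned(hashcode, 38);

        System.out.println(vp1 == vp2); // true
    }
}
```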

Why is the binary representation of -127>>1 11000000?

I know the binary representation of -127 is 10000001 (complement).
Can anybody tell me why, when I right shift it by 1 digit, I get 11000000?
(-127) = 10000001
(-127>>1) = 11000000 ???
Thanks.
If your programming language does a sign-extending right shift (as Java does), then the left-most 1 comes from extending the sign. That is, because the top bit was set in the original number it remains set in the result for each shift (so shifting by more than 1 has all 1's in the top most bits corresponding to the number of shifts done).
This is language dependent - in C and C++, right-shifting a negative signed value is implementation-defined (though sign extension is typical), while shifting an unsigned value never sign-extends. Java has a special >>> operator to shift without extending (in Java all numeric primitive types are signed, including the misleadingly named byte).
Right-shifting in some languages will pad with whatever is in the most significant bit (in this case 1). This is so that the sign will not change on shifting a negative number, which would turn into a positive one if this was not in place.
-127 as a WORD (2 bytes) is 1111111110000001. If you right shift this by 1 bit, and represent it as a single byte the result is 11000000 This is probably what you are seeing.
Because, if you divide -127 (two's-complement encoded as 10000001) by 2 and round down (towards -infinity, not towards zero), you get -64 (two's-complement encoded as 11000000).
Bit-wise, the reason is: when right-shifting signed values, you do sign-extension -- rather than shifting in zeroes, you duplicate the most significant bit. When working with two's-complement signed numbers, this ensures the correct result, as described above.
Assembly languages (and the machine languages they encode) typically have separate instructions for unsigned and signed right-shift operations (also called "logical shift right" vs. "arithmetic shift right"); and compiled languages typically pick the appropriate instruction when shifting unsigned and signed values, respectively.
It's sign extending, so that a negative number right shifted is still a negative number.
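In Java the difference is directly observable by comparing >> with >>>:

```java
public class ShiftDemo {
    public static void main(String[] args) {
        byte b = -127; // bit pattern 10000001

        // Arithmetic shift (>>): the sign bit is duplicated.
        int arithmetic = (b >> 1) & 0xFF; // low byte of the result
        System.out.println(Integer.toBinaryString(arithmetic)); // 11000000

        // Logical shift (>>>) on the zero-extended low byte: a 0 shifts in.
        int logical = (b & 0xFF) >>> 1;
        System.out.println(Integer.toBinaryString(logical)); // 1000000
    }
}
```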
