Confused about the use of ptrdiff_t in C++ - java

I need to translate this line of code into Java and I am
not sure what to do about ptrdiff_t, or what it does here. By the way, mask_block
is of type size_t.
size_t lowest_bit = mask_block & (-(ptrdiff_t)mask_block);
Thanks

Beware! This is bit magic!
( x & ~(x-1) ) returns the lowest set bit of an expression. The author of the original code decided to use ( x & (-x) ), which is effectively the same due to the two's complement representation of integers. But (the original author thought that) to get -x you need to use signed types and, as pointed out earlier, ptrdiff_t is signed while size_t is unsigned.
As Java does not have unsigned types, mask_block will be int and mask_block & (-mask_block) will work without any issue.
Note that because unsigned arithmetic in C++ is well-defined (it wraps modulo 2^N, so -x produces the same bit pattern either way), the cast is superfluous in C++ as well.
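Both formulations can be checked directly in Java (a minimal sketch; the variable name mask_block follows the question, and size_t is mapped to long here):

```java
public class LowestBitTrick {
    public static void main(String[] args) {
        long mask_block = 48; // size_t maps most naturally to long in Java
        long viaNegation = mask_block & -mask_block;        // the original trick
        long viaDecrement = mask_block & ~(mask_block - 1); // the equivalent form
        System.out.println(viaNegation);  // 16, the lowest set bit of 48
        System.out.println(viaDecrement); // 16 as well
    }
}
```

No cast is needed in Java: unary minus is already a signed negation.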

ptrdiff_t is the type that should be used for the (integer) difference between two pointers, that is, the result of subtracting one pointer from another. It is a signed integer and should be large enough to store the size of the largest possible array (so in Java, that would simply be an int, I'd guess).

ptrdiff_t is the name of a type, like int or ::std::string. The C++ standard promises that this type will be an integer type large enough to hold the difference between any two pointers that you can subtract. Of course, the idea of subtracting pointers is a rather foreign concept in Java. In order to be able to do this, ptrdiff_t must be able to hold negative numbers.
The sub-expression in which ptrdiff_t is used is a cast expression, like a Java typecast. However, unlike a Java typecast, C++ cast expressions are more dangerous and uglier. They can be used for all kinds of different type conversions that Java would balk at. And sometimes they will yield surprising results.
In this case, it looks like someone needed a value which was an unsigned integer of some kind (maybe an unsigned long or something) to be able to be negative. They needed to turn it into a signed value. ptrdiff_t is typically the largest size integer the platform supports. So if you're going to turn an arbitrary unsigned integer type into a signed one ptrdiff_t would be the type to use that would be least likely to result in some kind of odd truncation or sign change with C++'s rather ugly cast operation.
In particular, it looks like the type they wanted was size_t, which is another type in the C++ standard. It is an unsigned type (just like I was guessing), and is guaranteed to be an integer type that's big enough to hold the size of any possible object in memory. It's usually the same size as ptrdiff_t.
The reason the person who wrote the code wanted to do this was an interesting bit manipulation trick. To show you the trick, I'll show you how this expression plays out in a number of scenarios.
Suppose mask_block is 48. Let's say that on this hypothetical platform, size_t is 16 bits (which is very small, but this is just an example). In binary, mask_block looks like this:
0000 0000 0011 0000
And -(ptrdiff_t)mask_block is -48, which looks like this:
1111 1111 1101 0000
So, 48 & -48 is this:
0000 0000 0001 0000
Which is 16. Notice that this is the value of the lowest set bit in 48. Let's try 50. 50 looks like this:
0000 0000 0011 0010
And -50 looks like this:
1111 1111 1100 1110
So, 50 & -50 looks like this:
0000 0000 0000 0010
Which is 2. Notice again how this is the value of the lowest set bit in 50.
So this is just a trick to find the value of the lowest set bit in mask. The fact the variable is called lowest_bit should be a clue there. :-)
Of course, this trick isn't completely portable. Some platforms that C (and maybe C++ by now) runs on do not use two's complement representation, and this trick won't work there.
In Java, you can just write long lowest_bit = mask_block & -mask_block; and get the same effect. Java guarantees two's complement integers and doesn't even have unsigned ones, so it should work just fine.
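A small sketch using the values from the walkthrough above; note that the standard library's Long.lowestOneBit implements the same trick:

```java
public class LowestOneBitDemo {
    public static void main(String[] args) {
        for (long mask_block : new long[] {48, 50}) {
            long lowest_bit = mask_block & -mask_block;
            System.out.println(mask_block + " -> " + lowest_bit); // 48 -> 16, 50 -> 2
            // Java's standard library has the same operation built in:
            System.out.println(lowest_bit == Long.lowestOneBit(mask_block)); // true
        }
    }
}
```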

x & -x is a bit hack that clears all bits of x excluding its lowest bit.
For all non-zero values of x, it is 1 << lb, where lb is the position of the least significant bit (counting starting with 0).
Why is it cast to ptrdiff_t? Without further knowledge it is difficult to say; I'm not even sure the cast is needed. ptrdiff_t is guaranteed to be a signed integral type and size_t is always an unsigned integral type, so I guess the author of the C++ code wanted to be sure that the value is signed and has the same size as a pointer. It should be sufficient to port the code to Java by simply ignoring the cast, as in Java all integers are signed anyway.
The resulting code will also be more portable than the original C/C++ version, which assumes that the machine uses 2's complement to represent integers, although it is (at least in theory) not guaranteed by the C or C++ standard. In Java, however, it is guaranteed that the JVM must use 2's complement.


Why is AS3 right bit shift different than the same thing in Java?

Hard to explain without some code, so:
var junk:uint = uint(4294280300);
trace(junk.toString(2)); // returns 11111111111101011000010001101100
junk = junk >> 8;
trace(junk.toString(2)); // returns 11111111111111111111010110000100
and here is the Java part
long junk = 4294280300L;
System.out.println(Long.toBinaryString(junk)); // returns 11111111111101011000010001101100
junk = junk >> 8;
System.out.println(Long.toBinaryString(junk)); // returns 111111111111010110000100
What am I doing wrong? How can I achieve the same result in Java? I have tried using >>> instead of >> but it doesn't seem to work.
I don't know ActionScript at all, but it's surely due to differences in the internal representation of numbers.
The type uint in ActionScript does indeed seem to be an unsigned integer stored on 32 bits.
Additionally, the number appears to be converted into a signed integer before the right shift is performed. This counter-intuitive behavior explains the result.
You don't have this problem in Java because long is a 64-bit integer, and the value 4294280300 fits into 64 bits with room to spare.
You would have observed the same result as in ActionScript if you had used an int instead of a long.
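A quick sketch of that last point: reinterpreting the same 32 bits as an int reproduces the ActionScript output.

```java
public class ShiftWidthDemo {
    public static void main(String[] args) {
        long asLong = 4294280300L;
        int asInt = (int) asLong; // same 32-bit pattern, now read as signed

        // The long is positive, so the arithmetic shift brings in zeros:
        System.out.println(Long.toBinaryString(asLong >> 8));
        // 111111111111010110000100

        // The int is negative, so the arithmetic shift brings in ones,
        // just like the ActionScript >>:
        System.out.println(Integer.toBinaryString(asInt >> 8));
        // 11111111111111111111010110000100
    }
}
```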
Let's look at what JavaScript does to better understand what appears to happen in ActionScript: JavaScript stores all numbers as double-precision floating point, and you are sure not to lose precision on integers that fit in 53 bits.
Trying the same value, you'll see that you get the same result as in ActionScript if you use >>, but the same as Java if you use >>>.
On the JavaScript side, it appears that >>, a.k.a. arithmetic shift, first converts the value to a 32-bit signed integer, while >>>, a.k.a. logical shift, does not.
That's weird, and it wouldn't be surprising if ActionScript does something similar.
Interestingly, Python has no >>> operator, always does an arithmetic shift, and works even beyond 64 bits.
Given the popularity of this question or this one, >> vs. >>> is a common source of confusion in languages where the two operators exist.

Often big numbers become negative

Since I started using Eclipse for Project Euler, I noticed that big numbers sometimes become seemingly random negative numbers. I suppose this has something to do with crossing the boundary of the type.
I'll be glad if you could explain to me how these negative numbers are generated and what the logic behind them is. Also, how can I avoid them (preferably not with the BigInteger class)? Thanks! =)
Overflow wraps around: once a value passes the maximum of its type, it continues from the minimum (most negative) value. In your case it's obviously larger numbers, but the principle stays the same.
Examples of limits in java are:
int: −2,147,483,648 to 2,147,483,647.
long: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
The binary patterns 0000, 0001, etc. show how the numbers are represented; the pattern right after the largest positive value is the most negative one.
EDIT: In Project Euler you often have to think of a way to work around the large numbers. The problems are designed with numbers that big so that you can't use the ordinary way of problem solving. However, if you find that you really need them, I suggest studying BigInteger anyway. You will find it useful in the long run, and it's not all that complicated. Here is a link with lots of understandable examples:
BigInteger Example
In mathematics numbers are infinite; in computers they are not. There is a MAX_VALUE for each int-like type (int, short, long), for example Integer.MAX_VALUE. When you try to increase a number past this value, it wraps around and becomes negative. That is how the internal binary representation of numbers works.
int i = Integer.MAX_VALUE;
i++; // i becomes negative.
Here's a two's complement representation for 2-bit integer: (U means Unsigned, S means Signed)
U | bits | S
---------------
0 | 00 | 0
1 | 01 | 1 \ overflow here:
2 | 10 | -2 / 1 + 1 = -2
3 | 11 | -1
Arithmetic is done mostly like in the unsigned case, modulo the number of representable values (4 in our case).
The logic is the same for bigger types. int in Java is 32 bit. Use long for 64 bits.
You are probably overflowing the size of your data type, since the most significant bit is the sign bit. I don't think that Java has unsigned data types, so you may try using a larger data type such as long if you want to hold bigger numbers than int. If you are still overflowing a long though, you're pretty much stuck with BigInteger.
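A minimal sketch of the wrap-around described above, plus one way to fail loudly instead of silently (Math.addExact requires Java 8 or later):

```java
public class OverflowDemo {
    public static void main(String[] args) {
        int i = Integer.MAX_VALUE;        // 2147483647
        System.out.println(i + 1);        // -2147483648: wrapped around

        // Widening to long before the addition avoids the overflow here:
        System.out.println((long) i + 1); // 2147483648

        // Math.addExact throws instead of wrapping silently:
        try {
            Math.addExact(i, 1);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```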

java long datatype conversion to unsigned value

I'm porting some C++ code to Java code.
There is no unsigned datatype in Java that can hold 64 bits.
I have a hashcode which is stored in Java's long datatype (which of course is signed).
long vp = hashcode / 38; // hashcode is of type 'long'
Since 38 here is greater than 2, the resulting number can safely be used for any other arithmetic in Java.
The question is what if the signed bit in 'hashcode' is set to 1. I don't want to get a negative value in variable vp. I wanted a positive value as if the datatype is an unsigned one.
P.S: I don't want to used Biginteger for this purpose because of performance issues.
Java's primitive integral types are signed, and there isn't really anything you can do about it. However, depending on what you need it for, this may not matter.
Since integers are all done in two's complement, signed and unsigned are exactly the same at the binary level. The difference is how you interpret them, and in certain operations. Specifically, right shift, division, modulus and comparison differ. Unsigned right shifts can be done with the >>> operator. As long as you don't need one of the missing operators, you can use longs perfectly well.
If you can use third-party libraries, you can e.g. use Guava's UnsignedLongs class to treat long values as unsigned for many purposes, including division. (Disclosure: I contribute to Guava.)
Well, here is how I solved this: right-shift hashcode by 1 bit (an unsigned division by 2, so use >>> in Java), then divide that shifted number by 19 (which is 38/2). Essentially I divided the number by 38 exactly as it is done in C++, and I got the same value as in C++.
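A sketch comparing that trick with Long.divideUnsigned (available since Java 8); note that the halving step must use the unsigned shift >>>, and the trick relies on the divisor 38 being even:

```java
public class UnsignedDivideBy38 {
    public static void main(String[] args) {
        long hashcode = -1L; // interpreted unsigned: 18446744073709551615

        // The answer's trick: unsigned halve, then divide by 38/2 = 19.
        long vp1 = (hashcode >>> 1) / 19;

        // The library equivalent since Java 8:
        long vp2 = Long.divideUnsigned(hashcode, 38);

        System.out.println(vp1 == vp2); // true
        System.out.println(vp1);        // 485440633518672410
    }
}
```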

For bitwise operations in Java, do I need to care that most numeric types in Java are signed?

This is a really basic question, but I've never fully convinced myself that my intuitive answer of "it makes no difference" is correct, so maybe someone has a good way to understand this:
If all I want to do with one of the primitive numeric types in Java is bitwise arithmetic, can I simply treat it as if it was an unsigned value or do I need to avoid negative numbers, i.e. keep the highest order bit always set to 0? For example, can I use an int as if it was an unsigned 32-bit number, or should I only use the lowest 31 bits?
I'm looking for as general an answer as possible, but let me give an example: Let's say I want to store 32 flags. Can I store all of them in a single int, if I use something like
store = store & ~(1 << index) | (value << index)
to set flag index to value and something like
return (store & (1 << index)) != 0
to retrieve flag index? Or could I run into any sort of issues with this or similar code if I ever set the flag with index 31 to 1?
I know I need to always be using >>> instead of >> for right shifting, but is this the only concern? Or could there be other things going wrong related to the two's complement representation of negative numbers when I use the highest bit?
I know I need to always be using >>> instead of >> for right shifting, but is this the only concern?
Yes, this is the only concern. Shifting left works the same on signed and unsigned numbers; same goes for ANDing, ORing, and XORing. As long as you use >>> for shifting right, you can use all 32 bits of a signed int.
There are legitimate reasons to use >> as well in that context (a common case is when making a mask that should be 0 or -1 directly, without having to negate a mask that is 0 or 1), so there is really no concern at all. Just be careful of what you're doing to make sure it matches your intent.
Operations that care about signedness (ie they have distinct signed and unsigned forms with different semantics) are:
right shift
division (no unsigned operator in Java; Java 8 added Integer.divideUnsigned and Long.divideUnsigned)
modulo (no unsigned operator in Java; Java 8 added remainderUnsigned)
comparisons (except equality) (no unsigned operators in Java; Java 8 added compareUnsigned)
Operations that don't care about signedness are:
and
or
xor
addition
subtraction
two's complement negation (-x means ~x + 1)
one's complement (~x means -x - 1)
left shift
multiplication
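To tie this back to the flag example from the question: the question's own expressions work unchanged for index 31, even though the stored int turns negative (a sketch; the helper names here are made up):

```java
public class FlagStore {
    private static int store = 0;

    static void setFlag(int index, int value) { // value must be 0 or 1
        store = store & ~(1 << index) | (value << index);
    }

    static boolean getFlag(int index) {
        return (store & (1 << index)) != 0;
    }

    public static void main(String[] args) {
        setFlag(31, 1);                   // uses the sign bit
        System.out.println(getFlag(31));  // true
        System.out.println(store);        // -2147483648: negative, but harmless
        setFlag(31, 0);
        System.out.println(getFlag(31));  // false
    }
}
```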

why the binary representation of -127>>1 is 11000000?

I know the binary representation of -127 is 10000001 (two's complement).
Can anybody tell me why, when I right shift it by 1 digit, I get 11000000?
(-127) = 10000001
(-127>>1) = 11000000 ???
Thanks.
If your programming language does a sign-extending right shift (as Java does), then the left-most 1 comes from extending the sign. That is, because the top bit was set in the original number it remains set in the result for each shift (so shifting by more than 1 has all 1's in the top most bits corresponding to the number of shifts done).
This is language dependent - IIRC C and C++ sign-extend on right shift for a signed value and do not for an unsigned value. Java has a special >>> operator to shift without extending (in Java all numeric primitive values except char are signed, including the misleadingly named byte).
Right-shifting in some languages will pad with whatever is in the most significant bit (in this case 1). This is so that the sign will not change on shifting a negative number, which would turn into a positive one if this was not in place.
-127 as a WORD (2 bytes) is 1111111110000001. If you right shift this by 1 bit and represent it as a single byte, the result is 11000000. This is probably what you are seeing.
Because, if you divide -127 (two's-complement encoded as 10000001) by 2 and round down (towards -infinity, not towards zero), you get -64 (two's-complement encoded as 11000000).
Bit-wise, the reason is: when right-shifting signed values, you do sign-extension -- rather than shifting in zeroes, you duplicate the most significant bit. When working with two's-complement signed numbers, this ensures the correct result, as described above.
Assembly languages (and the machine languages they encode) typically have separate instructions for unsigned and signed right-shift operations (also called "logical shift right" vs. "arithmetic shift right"); and compiled languages typically pick the appropriate instruction when shifting unsigned and signed values, respectively.
It's sign extending, so that a negative number right shifted is still a negative number.
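The behaviors discussed above can be seen side by side in Java (a small sketch; the & 0xFF just narrows the result back to the 8 bits the question uses):

```java
public class SignExtendDemo {
    public static void main(String[] args) {
        int x = -127;                      // 32-bit pattern ...10000001

        // Arithmetic shift: the sign bit is duplicated, so -127 >> 1 == -64.
        System.out.println(x >> 1);        // -64

        // Viewed as a single byte, -64 is exactly the 11000000 in question:
        System.out.println(Integer.toBinaryString((x >> 1) & 0xFF)); // 11000000

        // Logical shift: a zero comes in instead, giving a large positive int.
        System.out.println(x >>> 1);       // 2147483584
    }
}
```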
