Count number of bits set to 1, why non-negative integers? - java

In Elements of Programming Interviews in Java, the authors say: "Writing a program to count the number of bits that are set to 1 in a nonnegative integer is a good way to get up to speed with primitive types".
Why do they qualify this with 'nonnegative'?

Non-negative means positive or zero.
You are expected to use shifts and bit operations.
Bit shifts can be sign-extending, which can be misleading. To avoid this issue, you are asked to handle non-negative values only.
But you can handle negative values too, if you want: Java has the unsigned right shift >>>, which shifts in zeros instead of copies of the sign bit.
Sure, this is an exercise. In real code you'll use a library implementation: java.lang.Integer has bitCount(int i). I assume bitCount will execute on hardware as a bit-counting instruction, such as x86 popcnt, so it should be better than manual bit manipulation, or at least use a very decent bit-manipulation implementation rather than one you'd write as an exercise. There's a whole tag, hamming-weight, about counting those bits.
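As a minimal sketch (the method and class names are just illustrative), here is the exercise version next to the library call. Using >>> rather than >> is what makes the loop terminate for negative inputs, since zeros rather than sign bits are shifted in:
public class BitCount {
    static int countSetBits(int x) {
        int count = 0;
        while (x != 0) {
            count += x & 1; // add the lowest bit
            x >>>= 1;       // unsigned shift: no sign extension
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSetBits(7));      // 3
        System.out.println(countSetBits(-1));     // 32 (all bits set)
        System.out.println(Integer.bitCount(-1)); // library equivalent: 32
    }
}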

Negative integers are stored a little differently, in two's complement.
In a 4-bit signed integer, 1111b would be -1, whereas 1000b would be -8, making the count of 1s dependent on the width of the type. That way the answer depends on the number of bits, and the question is no longer well defined.

Related

Why sum of two MaxIntegers returns -2? [duplicate]

What is an integer overflow error?
Why do I care about such an error?
What are some methods of avoiding or preventing it?
Integer overflow occurs when you try to express a number that is larger than the largest number the integer type can handle.
If you try to express the number 300 in one byte, you have an integer overflow (maximum is 255). 100,000 in two bytes is also an integer overflow (65,535 is the maximum).
You need to care about it because mathematical operations won't behave as you expect. A + B doesn't actually equal the sum of A and B if you have an integer overflow.
You avoid it by not creating the condition in the first place (usually either by choosing your integer type to be large enough that you won't overflow, or by limiting user input so that an overflow doesn't occur).
The easiest way to explain it is with a trivial example. Imagine we have a 4-bit unsigned integer: 0 would be 0000 and 1111 would be 15. So if you increment 15, instead of getting 16 you'll circle back around to 0000, as 16 is actually 10000 and we cannot represent that with fewer than 5 bits. Ergo overflow...
In practice the numbers are much bigger and it circles to a large negative number on overflow if the int is signed but the above is basically what happens.
Another way of looking at it is to consider it as largely the same thing that happens when the odometer in your car rolls over to zero again after hitting 999999 km/mi.
When you store an integer in memory, the computer stores it as a series of bytes. These can be represented as a series of ones and zeros.
For example, in an 8-bit integer, zero will be represented as 00000000, and 127 as 01111111. If you add one to 127, this "flips" the bits to 10000000, but in the standard two's complement representation this pattern is actually used to represent -128. This "overflows" the value.
With unsigned numbers, the same thing happens: 255 (11111111) plus 1 would become 100000000, but since there are only 8 "bits", this ends up as 00000000, which is 0.
You can avoid this by doing proper range checking for your correct integer size, or using a language that does proper exception handling for you.
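A minimal sketch in Java of both behaviors discussed above: plain + wraps silently, while the checked Math.addExact (available since Java 8) throws an exception instead:
public class OverflowDemo {
    public static void main(String[] args) {
        int max = Integer.MAX_VALUE;    // 2147483647
        System.out.println(max + 1);    // -2147483648: silent wraparound

        try {
            Math.addExact(max, 1);      // checked addition throws on overflow
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}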
An integer overflow error occurs when an operation makes an integer value greater than its maximum.
For example, if the maximum value you can have is 100000, and your current value is 99999, then adding 2 will make it 'overflow'.
You should care about integer overflows because data can be changed or lost inadvertently, and you can avoid them with either a larger integer type (see long int in most languages) or with an arbitrary-precision scheme that converts long strings of digits to very large integers.
Overflow is when the result of an arithmetic operation doesn't fit in the data type of the operation. You can have overflow with a byte-sized unsigned integer if you add 255 + 1, because the result (256) does not fit in the 8 bits of a byte.
You can have overflow with a floating point number if the result of a floating point operation is too large to represent in the floating point data type's exponent or mantissa.
You can also have underflow with floating point types when the result of a floating point operation is too small to represent in the given floating point data type. For example, if the floating point data type can handle exponents in the range of -100 to +100, and you square a value with an exponent of -80, the result will have an exponent around -160, which won't fit in the given floating point data type.
You need to be concerned about overflows and underflows in your code because it can be a silent killer: your code produces incorrect results but might not signal an error.
Whether you can safely ignore overflows depends a great deal on the nature of your program - rendering screen pixels from 3D data has a much greater tolerance for numerical errors than say, financial calculations.
Overflow checking is often turned off in default compiler settings. Why? Because the additional code to check for overflow after every operation takes time and space, which can degrade the runtime performance of your code.
Do yourself a favor and at least develop and test your code with overflow checking turned on.
From Wikipedia:
In computer programming, an integer overflow occurs when an arithmetic operation attempts to create a numeric value that is larger than can be represented within the available storage space. For instance, adding 1 to the largest value that can be represented constitutes an integer overflow. The most common result in these cases is for the least significant representable bits of the result to be stored (the result is said to wrap).
You should care about it especially when choosing the appropriate data types for your program or you might get very subtle bugs.
From http://www.first.org/conference/2006/papers/seacord-robert-slides.pdf :
An integer overflow occurs when an integer is increased beyond its maximum value or decreased beyond its minimum value. Overflows can be signed or unsigned.
P.S.: The PDF has a detailed explanation of overflows and other integer error conditions, and also of how to tackle/avoid them.
I'd like to be a bit contrarian to all the other answers so far, which somehow accept crappy broken math as a given. The question is tagged language-agnostic and in a vast number of languages, integers simply never overflow, so here's my kind-of sarcastic answer:
What is an integer overflow error?
An obsolete artifact from the dark ages of computing.
Why do I care about it?
You don't.
How can it be avoided?
Use a modern programming language in which integers don't overflow. (Lisp, Scheme, Smalltalk, Self, Ruby, Newspeak, Ioke, Haskell, take your pick ...)
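(For what it's worth, Java itself can get the same unlimited-range behavior through java.math.BigInteger rather than primitives; a quick sketch:)
import java.math.BigInteger;

public class NoOverflow {
    public static void main(String[] args) {
        BigInteger big = BigInteger.valueOf(Long.MAX_VALUE);
        System.out.println(big.add(BigInteger.ONE)); // 9223372036854775808: no wrap
    }
}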
I find showing the Two’s Complement representation on a disc very helpful.
Here is a representation for 4-bit integers. The maximum value is 2^3-1 = 7.
For 32 bit integers, we will see the maximum value is 2^31-1.
When we add 1 to 2^31-1, we move clockwise by one position and land on -2^31, which is what is called integer overflow.
Ref : https://courses.cs.washington.edu/courses/cse351/17wi/sections/03/CSE351-S03-2cfp_17wi.pdf
This happens when you attempt to use an integer for a value that is larger than its internal representation can support, given the number of bytes used. For example, if the maximum integer size is 2,147,483,647 and you attempt to store 3,000,000,000, you will get an integer overflow error.

Java HashMap array size

I am reading the implementation details of the Java 8 HashMap. Can anyone tell me why the initial array size of Java's HashMap is 16 specifically? What is so special about 16? And why is it always a power of two? Thanks
The reason why powers of 2 appear everywhere is that when expressing numbers in binary (as they are in circuits), certain math operations on powers of 2 are simpler and faster to perform (just think about how easy math with powers of 10 is in the decimal system we use). For example, multiplication is not a very efficient process in computers - circuits use a method similar to the one you use when multiplying two numbers each with multiple digits. Multiplying or dividing by a power of 2 requires the computer to just move bits to the left for multiplying or to the right for dividing.
And as for why 16 for HashMap? 10 is a commonly used default for dynamically growing structures (arbitrarily chosen), and 16 is not far off - but is a power of 2.
You can do modulus very efficiently for a power of 2. n % d = n & (d-1) when d is a power of 2, and modulus is used to determine which index an item maps to in the internal array - which means it occurs very often in a Java HashMap. Modulus requires division, which is also much less efficient than using the bitwise and operator. You can convince yourself of this by reading a book on Digital Logic.
The reason why bitwise and works this way for powers of two is because every power of 2 is expressed as a single bit set to 1. Let's say that bit is t. When you subtract 1 from a power of 2, you set every bit below t to 1, and every bit above t (as well as t) to 0. Bitwise and therefore saves the values of all bits below position t from the number n (as expressed above), and sets the rest to 0.
But how does that help us? Remember that when dividing by a power of 10, you can count the number of zeroes following the 1, and take that number of digits starting from the least significant of the dividend in order to find the remainder. Example: 637989 % 1000 = 989. A similar property applies to binary numbers with only one bit set to 1, and the rest set to 0. Example: 100101 % 001000 = 000101
There's one more thing about choosing hash & (n - 1) over modulo, and that is negative hashes: hashCode() is of type int, which of course can be negative. Modulo of a negative number (in Java) is also negative, while & never is.
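A quick sketch checking both claims (the class name is just illustrative): the mask equals the modulus for non-negative values, and stays non-negative where % does not:
public class MaskVsMod {
    public static void main(String[] args) {
        int d = 16; // a power of two, like HashMap's default capacity
        for (int n = 0; n < 100; n++) {
            if ((n & (d - 1)) != (n % d)) throw new AssertionError();
        }
        int h = -7; // a negative hash code
        System.out.println(h % d);       // -7: Java's % keeps the sign
        System.out.println(h & (d - 1)); // 9: always a valid array index
    }
}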
Another reason is that you want all of the slots in the array to be equally likely to be used. Since hash() is evenly distributed over 32 bits, if the array size didn't divide into the hash space, then there would be a remainder causing lower indexes to have a slightly higher chance of being used. Ideally, not just the hash, but (hash() % array_size) is random and evenly distributed.
But this only really matters for data with a small hash range (like a byte or character).

What Primitive To Use In Java

I am a little confused about when to use which primitive. If I am defining a number, how do I know whether to use byte, short, int, or long? I know they have different sizes, but does that mean that I can only use one of them for a certain number?
So simply, my question is, when do I use each of the four primitives listed above?
If I am, let's say, defining a number, how do I know what to use: byte, short, int, or long?
Depending on your use case. There's no huge penalty to using an int as opposed to a short unless you have billions of numbers. Simply consider the range a variable might use. In most cases it is reasonable to use int, whose range is -2,147,483,648 to 2,147,483,647, while long handles numbers in the range of roughly ±9.22×10^18. If you aren't sure, long won't specifically hurt.
The only reason you might want to use byte specifically is if you are storing byte data, such as parts of a file, or are doing something like network communication or serialization where the number of bytes is important. Remember that Java's bytes are also signed (-128 to 127). The same goes for short: it might be useful to save 2 GB of memory for a billion-element array, but it is not specifically useful for much other than, again, serialization with a specific byte alignment.
does that mean that I can only use one of them for a certain number?
No, you can use any that is large enough to handle that number. Of course, decimal values need a double or a float; double is usually ideal due to its higher precision and few drawbacks. Some libraries (e.g. for 3D drawing) might use floats, however. Remember that you can cast numeric types, e.g. (byte) someInt or (float) functionReturningADouble().
A byte can be a number from -128 to 127
A short can be a number from -32,768 to 32,767
An int can be from -2^31 to 2^31 - 1
A long is -2^63 to 2^63 - 1
Usually, unless space is an issue, an int is fine.
In order to use a primitive you need to know the data range of the number stored in it. This depends heavily on the thing that you are modeling:
Use byte when your number is in the range [-128..127]
Use short when your number is in the range [-32768..32767]
Use int when your number is in the range [-2^31..2^31-1]
Use long when your number is in the range [-2^63..2^63-1]
In addition, byte data type is used for representing "raw" binary data, in which case it is not used like a number.
When you want an unlimited range, use BigInteger. This flexibility comes at a cost: operations on BigInteger can be orders of magnitude more expensive than identical operations on primitives.
Do you know what values your variable will potentially hold?
If it will be between -128 and 127, use a byte.
If it will be between -32,768 and 32,767, use a short.
If it will be between -2^31 and 2^31 - 1, use an int.
If it will be between -2^63 and 2^63 - 1, use a long.
Generally an int will be fine for most cases, but you might use a byte or a short if memory is a concern, or a long if you need larger values.
Try writing a little test program. What happens when you put a value that's too large in one of these types of variables?
Source: https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
See also: the year 2038 problem
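Picking up the suggestion above, here is one such little test program (a sketch; the class name is illustrative). A narrowing cast silently keeps only the low-order bits:
public class NarrowingDemo {
    public static void main(String[] args) {
        int big = 300;
        byte b = (byte) big;   // keeps only the low 8 bits
        System.out.println(b); // 44 (300 - 256)
        // byte b2 = 300;      // won't compile: literal out of byte range
    }
}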
The general guidance should definitely be: If you don't have any good reason to pick a specific primitive, pick int.
This has at least three advantages:
Every kind of arithmetic or logical operation will return an int. Hence if you use anything smaller you will have to cast a lot to actually assign values back (see the sketch after this list).
It's the accepted standard for "some number". If you pick anything else, other developers will wonder about the reasons.
It will be the most efficient, although it's likely that the performance difference from using anything else will be minuscule.
That said, if you have an array of millions of elements, then yes, you definitely should figure out which primitive fits the input range; I refer you to the other answers for a summary of how large each one is.
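The sketch promised above, illustrating the first point (the class name is illustrative): arithmetic on bytes is promoted to int, so assigning back needs a cast:
public class PromotionDemo {
    public static void main(String[] args) {
        byte a = 1, b = 2;
        // byte c = a + b;       // won't compile: a + b is promoted to int
        byte c = (byte) (a + b); // explicit cast required
        System.out.println(c);   // 3
    }
}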
@dasblinkenlight
When you are working on a large project, you need to optimize the code for each requirement, and in particular reduce the memory the project occupies.
Hence choosing among int, short, long, and byte plays a vital role in optimizing memory. When an integer holds only a small value you could technically still declare it as long, float, or double, but to be a good programmer you should try to follow the standard of picking the smallest suitable type:
byte - 8 bits, so it can hold a value from -128 to 127.
short - 16 bits, so -32,768 to a maximum value of 32,767.
int - 32 bits, so -2^31 to a maximum value of 2^31-1.
long - 64 bits, so -2^63 to a maximum value of 2^63-1.
float and double - for decimal values.
boolean - for yes-or-no decisions; holds true or false.
char - for characters.
Hope this answer helps you.

How to turn a division into a bitwise shift when power of two?

I have the following division that I need to do often:
int index = pos / 64;
Division can be expensive at the CPU level. I am hoping there is a way to do it with a bitwise shift. I would also like to understand how you can go from a division to a shift; in other words, I don't want to just memorize the bitwise expression.
int index = pos >> 6 will do it, but this is unnecessary. Any reasonable compiler will do this sort of thing for you. Certainly the Sun/Oracle compiler will.
The general rule is that i/(2^n) can be implemented with i >> n. Similarly i*(2^n) is i << n.
You need to be concerned with negative-number representation if i is signed. Two's complement produces reasonable results (provided the right shift is arithmetic, i.e. the sign bit is copied). Sign-magnitude does not.
The compiler will implement it for you in the most efficient way, as long as you understand what you need and ask the compiler to do exactly that. If a shift is the most efficient way in this case, the compiler will use a shift.
Keep in mind though that if you are performing signed division (i.e pos is signed), then it cannot be fully implemented by a shift alone. Shift by itself will generate invalid results for negative values of pos. If the compiler decides to use shifts for this operations, it will also have to perform some post-shift corrections on the intermediate result to make it agree with the requirements of the language specification.
For this reason, if you are really looking for maximum possible efficiency of your division operations, you have to remember not to use signed types thoughtlessly. Prefer to use unsigned types whenever possible, and use signed types only when you have to.
P.S. A caveat: Java's integer division actually truncates toward zero (JLS §15.17.2), while the arithmetic right shift >> rounds toward negative infinity, so pos >> 6 and pos / 64 disagree for negative pos. The above remarks therefore apply to Java just as they do to C/C++.
http://www.java-samples.com/showtutorial.php?tutorialid=58
For each power of 2 you want to divide by, shift right once per factor of two: to divide by 4 you would shift right twice; by 8, three times; by 16, four times; by 32, five; by 64, six. In general the shift count is the base-2 logarithm of the divisor. So to divide by 64 you can shift right 6 times: myvalue = myvalue >> 6;
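A small check of that rule (a sketch; the class name is illustrative). Note the negative case, where shifting and dividing disagree, as discussed above:
public class ShiftVsDivide {
    public static void main(String[] args) {
        int pos = 1000;
        System.out.println(pos / 64); // 15
        System.out.println(pos >> 6); // 15: 64 = 2^6, so shift by 6

        int neg = -1;
        System.out.println(neg / 64); // 0: division truncates toward zero
        System.out.println(neg >> 6); // -1: shift floors toward -infinity
    }
}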

Can java floats be sorted by their byte representations?

I'm working in Hadoop, and I need to provide a comparator to sort objects as raw network order byte arrays. This is easy for me to do with integers -- I just compare each byte in order. I also need to do this for floats. I think, but I can't find a reference, that the IEEE 754 format for floats used by Java can be sorted by just comparing each byte as a signed 8 bit value.
Can anyone confirm or refute this?
Edit: the representation is IEEE 754 32 bit floating point. I actually have a (larger) byte buffer and an offset and length within that buffer. I found some utility methods already there that make it easy to turn this into a float, so I guess this question is moot. I'm still curious if anyone knows the answer.
Positive floats have the same ordering as their bit representations viewed as 2s complement integers. Negative floats do not.
For example, the bit representation of -2.0f is 0xc0000000 and -1.0f is 0xbf800000. If you try to use a comparison on the representations, you get -2.0f > -1.0f, which is incorrect.
There's also the issue of NaNs (which compare unordered against all floating point data, whereas the representations do not), but you may not care about them.
This almost works:
int bits = Float.floatToIntBits(x);
bits ^= (bits >> 31) & Integer.MAX_VALUE; // for negative floats, flips bits 0-30 and keeps the sign bit
Here negative floats have bits 0-30 inverted (because you want the opposite order to what the original sign/magnitude representation would give you, whilst preserving the sign bit.)
Caveats:
NaNs are included in the ordering (best to consider the results undefined if NaNs are involved.)
+0 now compares as greater than -0 (the built-in relational operators consider them equal.)
It works for all other values though, including denormals and infinities.
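A quick sketch wrapping that transformation into a sortable key (class and method names are just illustrative), with the NaN and ±0 caveats above still applying:
public class FloatSortKey {
    // Returns an int whose signed ordering matches the float ordering.
    static int sortableBits(float x) {
        int bits = Float.floatToIntBits(x);
        return bits ^ ((bits >> 31) & Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        float[] values = {-2.0f, -1.0f, 0.0f, 1.5f};
        for (int i = 0; i < values.length - 1; i++) {
            // prints true for each adjacent pair
            System.out.println(sortableBits(values[i]) < sortableBits(values[i + 1]));
        }
    }
}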
Use Float.floatToIntBits(float) and compare the integers.
Edit: This works only for positive numbers, including positive infinity, but not NaN. For negative numbers you have to reverse the ordering. And positive numbers are of course greater than negative numbers.
Well, if you are transmitting data over the network, you should have some form of semantic representation for when you are transmitting an int and when you are transmitting a float. Since the information is machine-agnostic, the data type width should also be defined somewhere or predefined by the spec (i.e. 32-bit or 64-bit floats). So what you should really do is accumulate your bytes into the appropriate data type, then use the natural language data types to do the comparison.
To give a really accurate answer, though, we would need to see your transmit and receive code to see whether you are autoboxing primitives via some kind of decorated I/O stream or some such thing. For a better answer, please provide more detail.
