What is the point of casting long to integer? - java

public class ConstantsAndCasting {
    public static void main(String args[]) {
        long hugeNum = 23456013477456L;
        int smallNum = (int) hugeNum;
        System.out.println(hugeNum);
        System.out.println(smallNum);
    }
}
The output to the code listed above is:
23456013477456
1197074000
It appears as though when casting a long to an integer value, Java retains 32 bits starting from the right and working left. What results is a value that is closer to 0 than to the original long value. This totally makes sense from a machine perspective, but what is the practical use for this? It seems like you'd be better off using the random number generator to produce 10 random characters.
Thanks in advance!

It's a little unclear how to answer that... As with any type conversion, it is used when you have a value of one type (in this case, long) but you need a value of a different type (in this case, int).
Yes, sometimes it will give you unwanted results, because of the limitations of the casting operation (which, in turn, are based on the limitations of the data types). But picking one of those out, and saying "so what's the point in ever doing this operation", would be like saying "if I have two ints, a and b, each valued at 2,000,000,000 - adding them doesn't get the desired result... so what's the point of adding ints?"

The primary useful purpose of casting a long to an int is to take a long value which is in the range -2147483648..2147483647 and use that number with code that expects an int value in that range. The behavior for values outside that range was chosen because:
The designers of Java wanted to fully specify its behavior whenever practical.
Taking the bottom 32 bits and ignoring the top 32 bits was faster than any other consistent course of action which would work sensibly with values in the range -2147483648..2147483647.
There are a number of situations where other courses of action might sometimes allow more efficient code generation if consistency wasn't required, or more useful semantics if speed wasn't required. The approach that was taken, however, offers the best compromise between speed, usefulness, and consistency.
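To see that "taking the bottom 32 bits" is exactly what the cast does, compare it with an explicit mask (a small sketch reusing the value from the question):

long hugeNum = 23456013477456L;
int cast = (int) hugeNum;                   // 1197074000
int masked = (int) (hugeNum & 0xFFFFFFFFL); // 1197074000, identical
System.out.println(cast == masked);         // true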

Related

Do I need to use type long to store 2166136261 in Java?

I'm attempting to implement a simple FNV hash function in Java. The function requires the value 2166136261 as a function parameter. In one online source, the number is assigned as follows in hexadecimal:
private static final int FNV_32_INIT = 0x811C9DC5;
But I don't think this is correct, since 2166136261 is larger than 2147483647, which is the largest value an int can hold.
Am I right in assuming that I would need at least a variable of type long in order to store the value, regardless of decimal or hexadecimal?
Edit:
The value used as the offset basis in this algorithm is arbitrary, so it doesn't matter whether FNV_32_INIT = 2166136261 or some other value:
Regarding the offset_basis value:
The switch from FNV-0 to FNV-1 was purely to change the offset_basis to a non-zero value. The selection of that non-zero value is arbitrary. The string: chongo <Landon Curt Noll> /\../\ was used because the tester was looking at an EMail message from Landon in Landon's standard EMail signature line. Actually the person who did that did not see very well. Landon, today, uses ()'s instead of <>'s and his ASCII bats use "oo" eyes instead of ".." as in: chongo (Landon Curt Noll) /\oo/\ We didn't bother correcting her error because it does not matter. In the general case, almost any offset_basis will serve so long as it is non-zero.
Source
Yes, you're correct.
Whether it's expressed in decimal, hex, binary, octal, base 64, or any other radix, the expressions all represent the same magnitude.
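To make this concrete, here is a minimal FNV-1a sketch (the method name and the byte[] signature are my own choices; the constants are the 32-bit parameters from the FNV reference page). The offset basis is held in a long so the positive decimal value 2166136261 fits, and the running hash is masked back to 32 bits after each step:

static int fnv1a32(byte[] data) {
    final long FNV_32_INIT = 2166136261L;  // 0x811C9DC5
    final long FNV_32_PRIME = 16777619L;   // 0x01000193
    long hash = FNV_32_INIT;
    for (byte b : data) {
        hash ^= (b & 0xFF);                         // FNV-1a: xor, then multiply
        hash = (hash * FNV_32_PRIME) & 0xFFFFFFFFL; // keep the low 32 bits
    }
    return (int) hash; // same bit pattern as the unsigned 32-bit result
}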

ULP calculation floating-point

I'm currently writing a "new language" at school, and I have to implement a Math class.
One of the specifications is to implement an ulp method, like Math.ulp() in Java. We are working on float types.
I found many interesting sources, but I'm still not able to calculate the ulp of a float.
AFAIK, if x is written in the normalized binary form x = ±1.b1 b2 ... bn × 2^e (a significand of n+1 bits and an exponent e), then ulp(x) = 2^(e-n).
But how can I get this normalized form for a float without any lib? And how do I get the parameters e and n+1?
Thank you for your help,
Best regards,
I'm not sure Java has the possibility of aliasing to get the bit pattern of the floating point number (there is a close enough alternative, as the second part of the answer shows). But if it does, then e is bits 23 through 30 (bit 31 is the sign) minus a bias (as in the Wikipedia description of the format, the bias is 127 unless the number is subnormal), while n is fixed: in this case 23 bits, or 24 if it includes the implicit leading 1.
It's recommended you use a lib that does this job for you properly.
Another option, which I was (rather indirectly) pointed to in the comments, is to convert the float bits to an int. I will write a snippet of code directly; I'm not fully versed in Java, so the code may not work immediately due to lacking package/class specifiers (floatToIntBits and intBitsToFloat are static methods of the class java.lang.Float). This is different from the above suggestion since it's a bit unorthodox, but it has better performance than the code you suggested in the question itself.
float ulp(float x) {
    if (Float.isNaN(x)) return Float.NaN; // special handling, to be safe
    int repr = Float.floatToIntBits(x);
    repr++; // step to the adjacent representable float
    float next = Float.intBitsToFloat(repr);
    return Math.abs(next - x); // magnitude, so it works independently of sign
}
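As a quick sanity check (assuming the snippet above is wrapped in some class), the result can be compared against Java's built-in Math.ulp:

System.out.println(ulp(1.0f));      // 1.1920929E-7
System.out.println(Math.ulp(1.0f)); // 1.1920929E-7, identical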

Java - operations on native short always return int, why? and how? [duplicate]

Why does the Java API use int, when short or even byte would be sufficient?
Example: The DAY_OF_WEEK field in class Calendar uses int.
If the difference is too minimal, then why do those datatypes (short, int) exist at all?
Some of the reasons have already been pointed out. For example, the fact that "...(Almost) All operations on byte, short will promote these primitives to int". However, the obvious next question would be: WHY are these types promoted to int?
So to go one level deeper: The answer may simply be related to the Java Virtual Machine Instruction Set. As summarized in the Table in the Java Virtual Machine Specification, all integral arithmetic operations, like adding, dividing and others, are only available for the type int and the type long, and not for the smaller types.
(An aside: The smaller types (byte and short) are basically only intended for arrays. An array like new byte[1000] will take 1000 bytes, and an array like new int[1000] will take 4000 bytes)
Now, of course, one could say that "...the obvious next question would be: WHY are these instructions only offered for int (and long)?".
One reason is mentioned in the JVM Spec mentioned above:
If each typed instruction supported all of the Java Virtual Machine's run-time data types, there would be more instructions than could be represented in a byte
Additionally, the Java Virtual Machine can be considered an abstraction of a real processor, and introducing a dedicated arithmetic logic unit for smaller types would not be worth the effort: it would need additional transistors, but could still only execute one addition per clock cycle. The dominant architecture when the JVM was designed was 32 bits, just right for a 32-bit int. (Operations that involve a 64-bit long value are implemented as a special case.)
(Note: The last paragraph is a bit oversimplified, considering possible vectorization etc., but should give the basic idea without diving too deep into processor design topics)
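The promotion is visible directly in the bytecode. A tiny probe (the class and method names are made up for illustration), to be inspected with javap -c:

class PromotionDemo {
    static int addBytes(byte a, byte b) {
        // javap -c shows iload_0, iload_1, iadd here: both bytes are
        // loaded and added as ints, because no "badd" instruction exists
        return a + b;
    }
}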
EDIT: A short addendum, focusing on the example from the question, but in a more general sense: one could also ask whether it would be beneficial to store fields using the smaller types. For example, one might think that memory could be saved by storing Calendar.DAY_OF_WEEK as a byte. But here, the Java Class File Format comes into play: all the fields in a class file occupy at least one "slot", which has the size of one int (32 bits). (The "wide" fields, double and long, occupy two slots.) So explicitly declaring a field as short or byte would not save any memory either.
(Almost) all operations on byte and short will promote them to int; for example, you cannot write:
short x = 1;
short y = 2;
short z = x + y; //error
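To make it compile, the result has to be cast back down explicitly:

short z = (short) (x + y); // ok, but the cast is needed every single time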
Arithmetic is easier and more straightforward when using int; there is no need to cast.
In terms of space, it makes very little difference. byte and short would complicate things, and I don't think this micro-optimization is worth it, since we are talking about a fixed number of variables.
byte is relevant and useful when you program for embedded devices or deal with files/networks. Also, these primitives are limited; what if the calculations might exceed their limits in the future? Try to think about an extension of the Calendar class that might evolve to bigger numbers.
Also note that on 64-bit processors, locals will be kept in registers and won't use any resources, so using int, short and other primitives won't make any difference at all. Moreover, many Java implementations align variables* (and objects).
* byte and short occupy the same space as int if they are local variables, class variables or even instance variables. Why? Because in (most) computer systems, variable addresses are aligned, so for example if you use a single byte, you'll actually end up with two bytes: one for the variable itself and another for the padding.
On the other hand, in arrays, a byte takes 1 byte, a short takes 2 bytes and an int takes 4 bytes, because in arrays only the start, and maybe the end, has to be aligned. This will make a difference if you want to use, for example, System.arraycopy(); then you'll really notice a performance difference.
Because arithmetic operations are easier when using integers compared to shorts. Assume that the constants were indeed modeled by short values. Then you would have to use the API in this manner:
short month = Calendar.JUNE;
month = (short) (month + 1); // july
Notice the explicit casting. Short values are implicitly promoted to int values when they are used in arithmetic operations. (On the operand stack, shorts are even expressed as ints.) This would be quite cumbersome to use, which is why int values are often preferred for constants.
Compared to that, the gain in storage efficiency is minimal, because there exists only a fixed number of such constants. We are talking about 40 constants. Changing their storage from int to short would save you 40 * 16 bits = 80 bytes. See this answer for further reference.
The design complexity of a virtual machine is a function of how many kinds of operations it can perform. It's easier to have four implementations of an instruction like "multiply"--one each for 32-bit integer, 64-bit integer, 32-bit floating-point, and 64-bit floating-point--than to have, in addition to the above, versions for the smaller numerical types as well. A more interesting design question is why there should be four types, rather than fewer (performing all integer computations with 64-bit integers and/or doing all floating-point computations with 64-bit floating-point values). The reason for using 32-bit integers is that Java was expected to run on many platforms where 32-bit types could be acted upon just as quickly as 16-bit or 8-bit types, but operations on 64-bit types would be noticeably slower. Even on platforms where 16-bit types would be faster to work with, the extra cost of working with 32-bit quantities would be offset by the simplicity afforded by only having 32-bit types.
As for performing floating-point computations on 32-bit values, the advantages are a bit less clear. There are some platforms where a computation like float a=b+c+d; could be performed most quickly by converting all operands to a higher-precision type, adding them, and then converting the result back to a 32-bit floating-point number for storage. There are other platforms where it would be more efficient to perform all computations using 32-bit floating-point values. The creators of Java decided that all platforms should be required to do things the same way, and that they should favor the hardware platforms for which 32-bit floating-point computations are faster than longer ones, even though this severely degraded both the speed and the precision of floating-point math on a typical PC, as well as on many machines without floating-point units. Note, btw, that depending upon the values of b, c, and d, using higher-precision intermediate computations when computing expressions like the aforementioned float a=b+c+d; will sometimes yield results which are significantly more accurate than would be achieved if all intermediate operands were computed at float precision, but will sometimes yield a value which is a tiny bit less accurate. In any case, Sun decided everything should be done the same way, and they opted for using minimal-precision float values.
Note that the primary advantages of smaller data types become apparent when large numbers of them are stored together in an array; even if there were no advantage to having individual variables of types smaller than 64 bits, it's worthwhile to have arrays which can store smaller values more compactly; having a local variable be a byte rather than a long saves seven bytes; having an array of 1,000,000 numbers hold each number as a byte rather than a long saves 7,000,000 bytes. Since each array type only needs to support a few operations (most notably read one item, store one item, copy a range of items within an array, or copy a range of items from one array to another), the added complexity of having more array types is not as severe as the complexity of having more types of directly-usable discrete numerical values.
If you used the philosophy where integral constants are stored in the smallest type that they fit in, then Java would have a serious problem: whenever programmers write code using integral constants, they have to pay careful attention to their code to check if the type of the constants matter, and if so look up the type in the documentation and/or do whatever type conversions are needed.
So now that we've outlined a serious problem, what benefits could you hope to achieve with that philosophy? I would be unsurprised if the only runtime-observable effect of that change would be what type you get when you look the constant up via reflection. (and, of course, whatever errors are introduced by lazy/unwitting programmers not correctly accounting for the types of the constants)
Weighing the pros and the cons is very easy: it's a bad philosophy.
Actually, there'd be a small advantage. If you have a
class MyTimeAndDayOfWeek {
    byte dayOfWeek;
    byte hour;
    byte minute;
    byte second;
}
then on a typical JVM it needs as much space as a class containing a single int. The memory consumption gets rounded up to the next multiple of 8 or 16 bytes (IIRC, that's configurable), so the cases where there are real savings are rather rare.
This class would be slightly easier to use if the corresponding Calendar methods returned a byte. But there are no such Calendar methods, only get(int), which must return an int because of other fields. Each operation on smaller types promotes to int, so you need a lot of casting.
Most probably, you'll either give up and switch to an int or write setters like
void setDayOfWeek(int dayOfWeek) {
    this.dayOfWeek = checkedCastToByte(dayOfWeek);
}
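(checkedCastToByte is left undefined above; a minimal sketch of what it might look like:)

static byte checkedCastToByte(int value) {
    if (value < Byte.MIN_VALUE || value > Byte.MAX_VALUE)
        throw new IllegalArgumentException("out of byte range: " + value);
    return (byte) value;
}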
Then the type of DAY_OF_WEEK doesn't matter, anyway.
Using variables smaller than the bus size of the CPU means more cycles are necessary. For example when updating a single byte in memory, a 64-bit CPU needs to read a whole 64-bit word, modify only the changed part, then write back the result.
Also, using a smaller data type incurs overhead when the variable is stored in a register, since the behavior of the smaller data type has to be accounted for explicitly. Since the whole register is used anyway, there is nothing to be gained by using a smaller data type for method parameters and local variables.
Nevertheless, these data types might be useful for representing data structures that require specific widths, such as network packets, or for saving space in large arrays, sacrificing speed.

Best way to avoid number polarity reversal if multiplying two big numbers in Java

My question is related to this
How can I check if multiplying two numbers in Java will cause an overflow?
In my application, x and y are calculated on the fly and somewhere in my formula I have to multiply x and y.
int x=64371;
int y=64635;
System.out.println((x*y));
I get the wrong output: -134347711
I can quickly fix the above by changing the variables x and y from type int to long, and I get the correct answer for the above case. But there is no guarantee that x and y won't grow beyond the max capacity of long as well.
Question
Why do I get a negative number here, even though I am not storing the final result in any variable? (for curiosity's sake)
Since I won't know the values of x and y in advance, is there any quicker way to avoid this overflow? Maybe by dividing all x and y by a certain large constant for the entire run of the application, or should I take the log of x and y before multiplying them? (actual question)
EDIT:
Clarification
The application runs on a big data set and takes hours to complete, so it would be nicer to have a solution which is not too slow.
The final result is only used for comparison (the values just need to be somewhat proportional to the original result), and it is acceptable to have +-5% error in the final value if that gives huge performance gain.
If you know that the numbers are likely to be large, use BigInteger instead. This is guaranteed not to overflow, and then you can either check whether the result is too large to fit into an int or long, or you can just use the BigInteger value directly.
BigInteger is an arbitrary-precision class, so it's going to be slower than using a direct primitive value (which can probably be stored in a processor register), so figure out whether you're realistically going to be overflowing a long (an int times an int will always fit in a long), and choose BigInteger if your domain really requires it.
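For instance, a minimal sketch using the values from the question:

import java.math.BigInteger;

BigInteger x = BigInteger.valueOf(64371);
BigInteger y = BigInteger.valueOf(64635);
System.out.println(x.multiply(y)); // 4160619585, no overflow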
You get a negative number because of an integer overflow: using two's complement representation, Java interprets any integer with the most significant bit set to 1 as negative.
There are very clever methods involving bit manipulation for detecting situations when an addition or subtraction would result in an overflow or an underflow. If you do not know how big your results are going to be, it is best to switch to BigInteger. Your code would look very different, though, because Java lacks operator overloading facilities that would make mathematical operations on BigInteger objects look familiar. The code will be somewhat slower, too. However, you will be guaranteed against overflows and underflows.
EDIT :
it is acceptable to have +-5% error in the final value if that gives huge performance gain.
+-5% error is a huge allowance for error! If this is indeed acceptable in your system, then using double or even float could work. These types are imprecise, but their range is enormously larger than that of an int, and they do not overflow so easily. You have to be extremely careful, though, because floating-point data types are inherently inexact. You need to always keep in mind the way the data is represented, to avoid common precision problems.
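For example, converting to double before multiplying preserves the magnitude of the product; the rounding error of a single multiplication is at most about 1 part in 2^52, far inside the stated 5% tolerance:

double approx = (double) x * (double) y; // 4.160619585E9 for the values above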
Why do I get a negative number here, even though I am not storing
the final result in any variable? (for curiosity's sake)
x and y are int types. When you multiply them, the result is put into a piece of memory temporarily, and its type is determined by the types of the operands: int * int will always yield an int, even if it overflows. If you cast one of them to a long, then the multiplication will be done as a long, and you will not get an overflow.
Since I won't know the values of x and y, is there any quicker way to
avoid this overflow? Maybe by dividing all x and y by a certain large
constant for the entire run of the application, or should I take the log of x
and y before multiplying them? (actual question)
If x and y are positive then you can check
if (x * y < 0) {
    // overflow
} else {
    // do something with x*y
}
Unfortunately this is not fool-proof; you may overflow right into positive numbers again. For example, System.out.println(Integer.MAX_VALUE * 3); will output 2147483645.
However, this technique will always work for adding 2 integers.
As others have said, BigInteger is sure not to overflow.
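On Java 8 and later there is also a standard-library check that throws instead of silently wrapping:

int checked = Math.multiplyExact(x, y);                // throws ArithmeticException on int overflow
long widened = Math.multiplyExact((long) x, (long) y); // widen first: never overflows for two ints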
The negative value is just (64371 * 64635) - 2^32. Java does not perform widening primitive conversion at run time.
Multiplication of ints always results in an int, even if it's not stored in a variable. Your product is 4160619585, which requires an unsigned 32-bit type (which Java does not have), or a larger word size (or BigInteger, as someone seems to have mentioned already).
You could add logs instead, but the moment you try to exp the result, you would get a number that won't round correctly into a signed 32-bit int.
Since both multiplicands are int, doing the multiplication using long via casting would avoid an overflow in your specific case:
System.out.println(x * (long) y);
You don't want to use logarithms because transcendental functions are slow and floating point arithmetic is imprecise - the result is likely to not be equal to the correct integer answer.

how can i generate a unique int from a unique string?

I have an object with a String that holds a unique id.
(such as "ocx7gf" or "67hfs8")
I need to supply it an implementation of int hashCode() which will be unique, obviously.
how do i cast a string to a unique int in the easiest/fastest way?
10x.
Edit - OK. I already know that String.hashCode is possible. But it is not recommended in any place. Actually, if any other method is not recommended - should I use it or not if I have my object in a collection and I need the hashcode? Should I concat it to another string to make it more successful?
No, you don't need to have an implementation that returns a unique value, "obviously", as obviously the majority of implementations would be broken.
What you want to do, is to have a good spread across bits, especially for common values (if any values are more common than others). Barring special knowledge of your format, then just using the hashcode of the string itself would be best.
With special knowledge of the limits of your id format, it may be possible to customise and result in better performance, though false assumptions are more likely to make things worse than better.
Edit: On good spread of bits.
As stated here and in other answers, being completely unique is impossible and hash collisions are possible. Hash-using methods know this and can deal with it, but it does impact upon performance, so we want collisions to be rare.
Further, hashes are generally re-hashed, so our 32-bit number may end up being reduced to, e.g., one in the range 0 to 22, and we want as good a distribution within that as possible, too.
We also want to balance this with not taking so long to compute our hash, that it becomes a bottleneck in itself. An imperfect balancing act.
A classic example of a bad hash method is one for a co-ordinate pair of X, Y ints that does:
return X ^ Y;
While this does a perfectly good job of returning 2^32 possible values out of the 4^32 possible inputs, in real world use it's quite common to have sets of coordinates where X and Y are equal ({0, 0}, {1, 1}, {2, 2} and so on) which all hash to zero, or matching pairs ({2,3} and {3, 2}) which will hash to the same number. We are likely better served by:
return ((X << 16) | (X >>> 16)) ^ Y;
Now, there are just as many possible values for which this is dreadful as for the former, but it tends to serve better in real-world cases.
Of course, it is a different job if you are writing a general-purpose class (with no idea what possible inputs there are) than if you have a better idea of the purpose at hand. For example, if I were using Date objects but knew that they would all be dates only (time part always midnight) and only within a few years of each other, then I might prefer a custom hash code that used only the day, month and lower digits of the year, over the standard one. The writer of Date, though, can't work on such knowledge and has to try to cater for everyone.
Hence, if I for instance knew that a given string is always going to consist of 6 case-insensitive characters in the range [a-z] or [0-9] (which yours seem to, but it isn't clear from your question that they do), then I might use an algorithm that assigns a value from 0 to 35 (the 36 possible values for each character) to each character, and then walks through the string, each time multiplying the current value by 36 and adding the value of the next char.
Assuming a good spread in the ids, this would be the way to go, especially if I made the order such that the lower-significant digits in my hash matched the most frequently changing char in the id (if such a call could be made), hence surviving re-hashing to a smaller range well.
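A sketch of that scheme (assuming ids really are six characters from [a-z0-9]; the method name is made up):

static int idHash(String id) {
    int h = 0;
    for (int i = 0; i < id.length(); i++) {
        // Character.digit maps '0'..'9' to 0..9 and 'a'..'z' to 10..35
        h = h * 36 + Character.digit(id.charAt(i), 36);
    }
    return h; // may wrap negative for six chars (36^6 > 2^31), which is fine for a hash
}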
However, lacking such knowledge of the format for sure, I can't make that call with certainty, and I could well be making things worse (slower algorithm for little or even negative gain in hash quality).
One advantage you have is that since it's an ID in itself, then presumably no other non-equal object has the same ID, and hence no other properties need be examined. This doesn't always hold.
You can't get a unique integer from a String of unlimited length. There are 4 billionish (2^32) unique integers, but an almost infinite number of unique strings.
String.hashCode() will not give you unique integers, but it will do its best to give you differing results based on the input string.
EDIT
Your edited question says that String.hashCode() is not recommended. This is not true; it is recommended, unless you have some special reason not to use it. If you do have a special reason, please provide details.
Looks like you've got a base-36 number there (a-z + 0-9). Why not convert it to an int using Integer.parseInt(s, 36)? Obviously, if there are too many unique IDs, it won't fit into an int, but in that case you're out of luck with unique integers and will need to get by using String.hashCode(), which does its best to be close to unique.
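For the example ids in the question this fits comfortably:

int id = Integer.parseInt("ocx7gf", 36); // 1472892927, below Integer.MAX_VALUE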
Unless your strings are limited in some way or your integers hold more bits than the strings you're trying to convert, you cannot guarantee the uniqueness.
Let's say you have a 32-bit integer and a 64-character character set for your strings. That means six bits per character. That will allow you to store five characters in an integer. More than that and it won't fit.
Represent each string character by a five-bit binary code, e.g. a by 00001, b by 00010, etc.; thus 32 combinations are possible. For example, cat might be written as 00011 00001 10100 (c = 3, a = 1, t = 20); then convert this binary into decimal, e.g. this would be 3124, thus cat would be 3124. Similarly, you can get cat back from 3124 by converting it to binary first and mapping each five-bit group back to a character.
One way to do it is to assign each letter a value (a = 1, b = 2, and so on) and give each position in the string its own multiplier: everything in the first position (reading left to right) is multiplied by one prime, the next position by the next prime, and so on, such that the final position is multiplied by a prime larger than the number of possible symbols in that position (26+1 with a space, or 52+1 with capitals, and so on for other supported characters). If the number is mapped back to the first position (the leftmost character), any number you generate from a unique string maps back to 1 or 6 or whatever the first letter is, giving a unique value.
Dog might be 30, 3(15), 101(7), i.e. 782, while God is 33, 3(15), 101(4), i.e. 482. More important than unique strings being generated, they can be useful in generation if the original digit is kept: 30(782) would be unique from some 12(782) for the purposes of differentiating similar strings if you ever managed to go over the unique possibilities. Dog would always be Dog, but it would never be Cat or Mouse.
