I understand that, in Java, there are no unsigned numbers, they're all signed with wizardry behind the scenes to make sense of it.
So, I am puzzled by this, and it's probably that wizardry that I mentioned before that's eluding me.
private static int broadcast = 0xffffffff; //4294967295
I'm using IntelliJ for an IDE, and in that, the above statement works fine. If I replace the hex number with the decimal #, I get a complaint that the number is too large. It's the same number, what am I missing?
How I'm using it all:
package com.company;
import java.net.InetAddress;
import java.net.UnknownHostException;
public class Main {
private final static int broadcast = 0xffffffff; //4294967295, or 255.255.255.255
private final static int firstClassE = 0xf0000000; //4026531840, or 240.0.0.0
public static int GetIntInetAddress(InetAddress toConvert)
{
final byte[] addr = toConvert.getAddress();
final int ipAddr =
((addr[0] & 0xFF) << (3 * 8)) +
((addr[1] & 0xFF) << (2 * 8)) +
((addr[2] & 0xFF) << (1 * 8)) +
(addr[3] & 0xFF);
return ipAddr;
}
public static Boolean IsClassEAddress(InetAddress address)
{
int curAddr = GetIntInetAddress(address);
System.out.println(String.format("curAddr: %d, firstClassE: %d, broadcast: %d", curAddr, firstClassE, broadcast));
return (curAddr >= firstClassE && curAddr < broadcast) ? true : false;
}
public static void main(String[] args) throws UnknownHostException
{
String ip = "10.20.30.40";
InetAddress someIP = InetAddress.getByName(ip);
if (IsClassEAddress(someIP))
{
// Raise a flag
System.out.println("Class E IP address detected.");
}
// Output of program is:
// curAddr: 169090600, firstClassE: -268435456, broadcast: -1
}
}
In IntelliJ, there's another strange example of this behaviour. When I examine an address, the inspector shows both the proper value and the negative value, as I've highlighted with red arrows in the pic below. Using Windows calc, I put in -84, converted to hex, and received FFF...FAC. When I put in 172, I received just AC... why do I get the same hex number, just preceded by 1s in the most significant positions?
This is specified in JLS section 3.10.1, which has different rules for decimal literals and literals of other bases:
It is a compile-time error if a decimal literal of type int is larger than 2147483648 (2^31), or if the decimal literal 2147483648 appears anywhere other than as the operand of the unary minus operator (§15.15.4).
vs
The following hexadecimal, octal, and binary literals represent the decimal value -1:
0xffff_ffff,
0377_7777_7777, and
0b1111_1111_1111_1111_1111_1111_1111_1111
It is a compile-time error if a hexadecimal, octal, or binary int literal does not fit in 32 bits.
So that's why the compiler is behaving that way - it's basically as per the spec.
As for why the specification was written that way... I suspect it's because constants written in non-decimal bases are usually there for bitmasking techniques and the like - where really you just care about the bits in the value rather than the sign and magnitude of the integer it represents.
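A small sketch of those literal rules in action, assuming Java 8 or later for Integer.toUnsignedString and Integer.toUnsignedLong (the class and method names here are made up for illustration):

```java
// Sketch: the same 32 bits read as a signed int, as an unsigned value, and as a long.
final class LiteralDemo {
    // Hex int literal: all 32 bits set, which is -1 as a signed int.
    static final int ALL_ONES = 0xffffffff;

    // Unsigned reading of an int's bits, returned as a decimal string.
    static String unsignedDecimal(int bits) {
        return Integer.toUnsignedString(bits);
    }

    public static void main(String[] args) {
        System.out.println(ALL_ONES);                  // -1
        System.out.println(unsignedDecimal(ALL_ONES)); // 4294967295
        System.out.println(Integer.toHexString(-1));   // ffffffff
        // int tooBig = 4294967295;  // does not compile: decimal int literal out of range
        long asLong = 4294967295L;                     // fine as a long literal
        System.out.println(asLong == Integer.toUnsignedLong(ALL_ONES)); // true
    }
}
```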
0xffffffff is -1 as an int (although 4294967295 if it was unsigned).
Depending on how you are using the value it probably doesn't matter. For example, writing the value out to a binary file or using it as a bitmask will write out the same bytes.
If, in Java, you actually need to use the value 4294967295 as a positive number, you need to use a long.
private static long broadcast = 4294967295L;
Note the trailing "L" to mark it as a long.
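If you would rather keep the addresses in an int and only want the comparisons to treat the bits as unsigned, Java 8 and later offer Integer.compareUnsigned. The following is only a sketch of how the range check from the question might look under that assumption; the class name and helper names are hypothetical:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class ClassECheck {
    private static final int FIRST_CLASS_E = 0xf0000000; // 240.0.0.0
    private static final int BROADCAST = 0xffffffff;     // 255.255.255.255

    // Packs the four address bytes into an int, most significant byte first.
    static int toInt(byte[] addr) {
        return ((addr[0] & 0xFF) << 24)
             | ((addr[1] & 0xFF) << 16)
             | ((addr[2] & 0xFF) << 8)
             |  (addr[3] & 0xFF);
    }

    // True for 240.0.0.0 up to (but not including) 255.255.255.255,
    // comparing the bit patterns as unsigned values.
    static boolean isClassE(int addr) {
        return Integer.compareUnsigned(addr, FIRST_CLASS_E) >= 0
            && Integer.compareUnsigned(addr, BROADCAST) < 0;
    }

    public static void main(String[] args) throws UnknownHostException {
        for (String ip : new String[] {"10.20.30.40", "240.0.0.1", "255.255.255.255"}) {
            int addr = toInt(InetAddress.getByName(ip).getAddress());
            System.out.println(ip + " -> " + Integer.toUnsignedString(addr)
                    + ", class E: " + isClassE(addr));
        }
    }
}
```

Integer.toUnsignedString is used only for display, so the log line shows 4294967295 rather than -1 for the broadcast address.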
Just when I thought I had a fair grasp on how Java treats all integers, bytes, etc. as signed numbers, it threw me another curve ball and got me wondering whether I really understand this treatment after all.
This is a piece of assembly code that is supposed to jump to an address if a condition is met (which it is). The PC just before the jump is C838h, and after the condition check it is supposed to be C838h + FCh (h = hex). I thought FCh would be treated as signed, so the PC would jump backwards: FCh = -4 as a two's complement negative number. But to my surprise, Java ADDED FCh to the PC, making it incorrectly jump to C934h instead of back to C834h.
C832: LD B,04 06 0004 (PC=C834)
C834: LD (HL), A 77 9800 (PC=C835)
C835: INC L:00 2C 0001 (PC=C836)
C836: JR NZ, n 20 00FC (PC=C934)
I tested this in a java code and indeed the result was the same:
int a = 0xC838;
int b = 0xFC;
int result = a + b;
System.out.printf("%04X\n", result); //prints C934 (incorrect)
To fix this I had to cast FCh into a byte after checking if the first bit is 1, which it is: 11111100
int a = 0xC838;
int b = (byte) 0xFC;
int result = a + b;
System.out.printf("%04X\n", result); //prints C834 (correct)
In short, I guess my question is this: I thought Java would know that FCh is a negative number, but that is not the case unless I cast it to a byte. Why? Sorry, I know this question has been asked many times, and I seem to be asking it myself a lot.
0xfc is a positive number. If you want a negative number, then write a negative number. -0x4 would do just fine.
But if you want to apply this to non-constant data, you'll need to tell Java that you want it sign-extended in some way.
The core of the problem is that you have a 32-bit signed integer, but you want it treated like an 8-bit signed integer. The easiest way to achieve that would be to just use byte as you did above.
If you really don't want to write byte, you can write (0xfc << 24) >> 24:
class Main
{
public static void main(String[] args)
{
int a = 0xC838;
int b = (0xfc << 24) >> 24;
int result = a + b;
System.out.printf("%04X\n", result);
}
}
(The 24 derives from the difference of the sizes of int (32 bits) and byte (8 bits)).
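For non-constant data, the same idea can be wrapped in a helper. This is a sketch only: the names signExtend8 and jump are made up, and wrapping the result to 16 bits is an assumption about the emulated program counter, not something stated in the question.

```java
final class RelativeJump {
    // Sign-extends the low 8 bits of value to a full int (same result as a byte cast).
    static int signExtend8(int value) {
        return (value << 24) >> 24;
    }

    // Applies an 8-bit signed displacement to a program counter.
    // Assumption: the PC is 16 bits wide, so the result is masked to 0xFFFF.
    static int jump(int pc, int displacement) {
        return (pc + signExtend8(displacement)) & 0xFFFF;
    }

    public static void main(String[] args) {
        System.out.println(signExtend8(0xFC));           // -4
        System.out.println((byte) 0xFC);                 // -4
        System.out.printf("%04X%n", jump(0xC838, 0xFC)); // C834
        System.out.printf("%04X%n", jump(0xC838, 0x04)); // C83C
    }
}
```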
I cannot figure out why this doesn't work. I am attempting to mask the least significant 32 bits of a long in Java, but the AND does not clear the 33rd, 34th, and higher bits. Here is my example:
class Main {
public static void main(String[] args) {
long someVal = 17592096894893l; //hex 0xFFFFAAFAFAD
long mask = 0xFF; //binary
long result = mask & someVal;
System.out.println("Example 1 this works on one byte");
System.out.printf("\n%x %s", someVal, Long.toBinaryString(someVal) );
System.out.printf("\n%x %s", result, Long.toBinaryString(result) );
long someVal2 = 17592096894893l; //hex 0xFFFFAAFAFAD
mask = 0xFFFFFFFF; //binary
result = mask & someVal2;
System.out.println("\nExample 2 - this does not work");
System.out.printf("\n%x %s", someVal2, Long.toBinaryString(someVal2) );
System.out.printf("\n%x %s", result, Long.toBinaryString(result) );
}
}
I was expecting the upper bits of the result to be zero, since the AND should keep only the low 32 bits. Here is the output I get.
Example 1 - this works
ffffaafafad 11111111111111111010101011111010111110101101
ad 10101101
Example 2 - this does not work
ffffaafafad 11111111111111111010101011111010111110101101
ffffaafafad 11111111111111111010101011111010111110101101
I would like to be able to keep only the least significant 4 bytes of the long value.
I believe what you’re seeing here is the fact that Java converts integers to longs using sign extension.
For starters, what should this code do?
int myInt = -1;
long myLong = myInt;
System.out.println(myLong);
This should intuitively print out -1, and that’s indeed what happens. I mean, it would be kinda weird if in converting an int to a long, we didn’t get the same number we started with.
Now, let’s take this code:
int myInt = 0xFFFFFFFF;
long myLong = myInt;
System.out.println(myLong);
What does this print? Well, 0xFFFFFFFF is the hexadecimal version of the signed 32-bit number -1. That means that this code is completely equivalent to the above code, so it should (and does) print the same value, -1.
But the value -1, encoded as a long, doesn’t have representation 0x00000000FFFFFFFF. That would be 2^32 - 1, not -1. Rather, since it’s 64 bits long, -1 is represented as 0xFFFFFFFFFFFFFFFF. Oops - all the upper bits just got activated! That makes it not very effective as a bitmask.
The rule in Java is that if you convert an int to a long, if the very first bit of the int is 1, then all 32 upper bits of the long will get set to 1 as well. That’s in place so that converting an integer to a long preserves the numeric value.
If you want to make a bitmask that’s actually 64 bits long, initialize it with a long literal rather than an int literal:
mask = 0xFFFFFFFFL; // note the L
Why does this make a difference? Without the L, Java treats the code as
1. Create the integer value 0xFFFFFFFF = -1, giving 32 one bits.
2. Convert that integer value into a long. To do so, use sign extension to convert it to the long value -1, giving 64 one bits in a row.
However, if you include the L, Java interprets things like this:
1. Create the long value 0xFFFFFFFF = 2^32 - 1, which is 32 zero bits followed by 32 one bits.
2. Assign that value to mask.
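To see the two interpretations side by side, here is a short sketch using the value from the question written as a hex long literal (the class and method names are made up for illustration):

```java
final class MaskDemo {
    // Keeps only the low 32 bits of a long; the L suffix makes the mask a long literal.
    static long low32(long value) {
        return value & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        long someVal = 0xFFFFAAFAFADL;
        long intMask = 0xFFFFFFFF;    // int literal -1, sign-extended to 64 one bits
        long longMask = 0xFFFFFFFFL;  // long literal 2^32 - 1

        System.out.printf("%x%n", intMask);           // ffffffffffffffff
        System.out.printf("%x%n", longMask);          // ffffffff
        System.out.printf("%x%n", someVal & intMask); // ffffaafafad
        System.out.printf("%x%n", low32(someVal));    // faafafad
    }
}
```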
Hope this helps!
I am generating the modulus and exponent of a public key on both a Java and a .NET system, but the hex output differs between the two. I need Java to produce the same format as .NET. When I convert the byte array to hex, Java adds two extra zeros to the front of the modulus; when I use toString(16), it drops a leading zero from the exponent. The .NET output is the format I want. Please see the results from .NET and Java below.
If I use toString(16), I get the results below. toString(16) does not add the two zeros to the modulus, but it removes a zero from the exponent, whereas .NET keeps a leading 0 in the exponent and has no extra two zeros on the modulus, which is what I want.
String modlusHexString = publicKey.getModulus().toString(16).toUpperCase();
String exponentHexString = publicKey.getPublicExponent().toString(16).toUpperCase();
ModlusHex toString(16): D9B4E023A7CEF604499E184CBA7B7847FE35A824D15FF902EADB952FB54620158A564EFDDB0A66A7647CBDB339359BF6756F5851A73CC1D24859A064DD7AE30B2330F965C54682B10E886D35FE005F42B056C7ABF64D3F6D592AEDC0234417507A0A1432E51C7867E3ACC4867A1AE03EF9E62050180882B18771D5703C8BADCB3AC767CD1A1C0F9344F10B8C82EF5D0ACA4422512EA3ECCB5B71097BDEDAD9BBBE11697D1E61814CF3BBDEB48BDC2C95AA10DFC3F7F794E307D49B5455F928A9BB3ED2F28D6E2974238EFB2D9A822EC1832177CB988206204DF1D9DB7D291E2816576BEBD669184894B526F0B5D10C7D19FA67E79DADDF97D4A3082D4812A27B
ExponentHex toString(16): 10001
I tried below method also to convert BigInteger of modulus and exponent to Hex but no luck-
static String toHex(byte[] ba) {
StringBuilder hex = new StringBuilder(ba.length * 2);
for (byte b : ba) {
hex.append(String.format("%02x", b, 0xff));
}
return hex.toString().toUpperCase();
}
Modlus Hex: 00D9B4E023A7CEF604499E184CBA7B7847FE35A824D15FF902EADB952FB54620158A564EFDDB0A66A7647CBDB339359BF6756F5851A73CC1D24859A064DD7AE30B2330F965C54682B10E886D35FE005F42B056C7ABF64D3F6D592AEDC0234417507A0A1432E51C7867E3ACC4867A1AE03EF9E62050180882B18771D5703C8BADCB3AC767CD1A1C0F9344F10B8C82EF5D0ACA4422512EA3ECCB5B71097BDEDAD9BBBE11697D1E61814CF3BBDEB48BDC2C95AA10DFC3F7F794E307D49B5455F928A9BB3ED2F28D6E2974238EFB2D9A822EC1832177CB988206204DF1D9DB7D291E2816576BEBD669184894B526F0B5D10C7D19FA67E79DADDF97D4A3082D4812A27B
Exponent : 010001
Following is .NET generated HEX of public key modulus and exponent which is correct
.NET
Modulus HEX:
F86020AFD75A03911BE8818BCB506B5DAC2760C68BB46F2A53E10E3E0972A7FFFFF71EAC0E0D73E8FDB4C332C759E781E54C0F1F4637656E6E995873B9580027F49606811C7B5DB458C1BAC3E3D6EB7B77BFE1E55A822F23797E4CB27C7BD88C1B782AEF5235DAE55B937ABB0FFD30AF64F8D69DA07946441D9E4704FA98BD026E00A92851FABC8AB347AB75615ACC8A7CCFDE56B0797DDB70FCA1F28F23F86548AABE6DD89B5CC859BC6352D077F765AAFCF15695215850A8D7E1F9DF187AE0EF1934E096B739E884757F810B3320EFA72BBE2A957CE465E2010A5FD9C96A5F6456658D3BA0DF51B472AFEBEF31C3609B58C6A03C671DA33650039822465179
Exponent : 010001
The problem you are facing results from the behavior of the methods BigInteger.toString(int radix) and BigInteger.toByteArray().
When you call the BigInteger.toString(int radix) method, it returns only the significant digits of the number. So if the value is supposed to be, for example, 05ABFF, it returns only 5ABFF. This is natural when the radix is 10 (we don't expect the big integer 13 to be converted to something like 013), but it is somewhat counter-intuitive when the radix is 16, as you expect the output to have an even length, exactly two characters for each byte. But that's not how it works.
But your own toHex() method is based on the value returned from BigInteger.toByteArray(), and here you have your other problem. This method always returns the minimum number of bytes necessary to represent the number in two's complement, including a sign bit. Now consider the number 0xD9B4E023. Read as a 32-bit signed integer it would be negative, because its top bit is set, so for BigInteger to represent it as a positive value it needs an extra leading byte to carry the sign. Hence the additional byte that translates to 00 in your method.
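Both effects can be reproduced with small values. The sketch below uses the first four bytes of your modulus and the exponent 65537; the class name and the rawHex helper are made up for illustration:

```java
import java.math.BigInteger;

final class SignByteDemo {
    // Hex of the raw two's-complement bytes, two digits per byte.
    static String rawHex(BigInteger value) {
        StringBuilder sb = new StringBuilder();
        for (byte b : value.toByteArray()) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BigInteger highBitSet = new BigInteger("D9B4E023", 16);
        BigInteger exponent = BigInteger.valueOf(65537);

        System.out.println(highBitSet.toString(16).toUpperCase()); // D9B4E023
        System.out.println(rawHex(highBitSet));                    // 00D9B4E023
        System.out.println(exponent.toString(16));                 // 10001
        System.out.println(rawHex(exponent));                      // 010001
    }
}
```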
I can think of two possible solutions:
static String toHex(byte[] ba) {
StringBuilder hex = new StringBuilder(ba.length * 2);
boolean skipZeroBytes = true;
for (byte b : ba) {
// As soon as we hit the first non-zero byte, we stop skipping bytes
if (b != 0) {
skipZeroBytes = false;
}
// If the current byte is zero, and we are in skipping mode, skip
if (skipZeroBytes) {
continue;
}
hex.append(String.format("%02X", b));
}
if (skipZeroBytes) {
// If we are still in skipping mode, it means all the bytes in the
// array were zero and we skipped them all. So just return the
// representation of a zero.
return "00";
} else {
return hex.toString();
}
}
What we do here is skip all the initial zero bytes until we hit the first non-zero byte, and only then we start interpreting it. Small note: using the format %02X with a capital X gives you uppercase hexadecimal digits and saves the need to call toUpperCase() later.
The other, simpler method is to add the missing zero to the result of BigInteger.toString(int radix):
static String toHex2(BigInteger bi) {
String hex = bi.toString(16).toUpperCase();
if (hex.length() % 2 == 1) {
return "0" + hex;
} else {
return hex;
}
}
byte s[] = getByteArray()
for(.....)
Integer.toHexString((0x000000ff & s[i]) | 0xffffff00).substring(6);
I understand that you are trying to convert the byte into a hex string. What I don't understand is how that is done. For instance, if s[i] was 00000001 (decimal 1), then could you please explain:
1. Why 0x000000ff & 00000001? Why not directly use 00000001?
2. Why is the result from #1 then combined as (result | 0xffffff00)?
3. Finally, why is substring(6) applied?
Thanks.
It's basically because bytes are signed in Java. If you promote a byte to an int, it will sign extend, meaning that the byte 0xf2 will become 0xfffffff2. Sign extension is a method to keep the value the same when widening it, by copying the most significant (sign) bit into all the higher-order bits. Both those values above are -14 in two's complement notation. If instead you had widened 0xf2 to 0x000000f2, it would be 242, probably not what you want.
So the & operation is to strip off any of those extended bits, leaving only the least significant 8 bits. However, since you're going to be forcing those bits to 1 in the next step anyway, this step seems a bit of a waste.
The | operation following that will force all those upper bits to be 1 so that you're guaranteed to get an 8-character string from ffffff00 through ffffffff inclusive (since toHexString doesn't give you leading zeroes, it would translate 7 into "7" rather than the "07" that you want).
The substring(6) is then applied so that you only get the last two of those eight hex digits.
It seems a very convoluted way of ensuring you get a two-character hex string to me when you can just use String.format ("%02x", s[i]). However, it's possible that this particular snippet of code may predate Java 5 when String.format was introduced.
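To make the three steps concrete, here is the expression from the question split into intermediate variables. This is just a sketch; the class and method names are made up:

```java
final class ByteHexSteps {
    // The expression from the question, split into its three steps.
    static String toTwoHexDigits(byte b) {
        int low8 = 0x000000ff & b;                  // undo sign extension: keep the low 8 bits
        int padded = low8 | 0xffffff00;             // force the upper 24 bits to 1
        String eight = Integer.toHexString(padded); // always 8 digits: "ffffffXX"
        return eight.substring(6);                  // keep the last two digits
    }

    public static void main(String[] args) {
        System.out.println(toTwoHexDigits((byte) 1));         // 01
        System.out.println(toTwoHexDigits((byte) 0xf2));      // f2
        System.out.println(Integer.toHexString((byte) 0xf2)); // fffffff2
        System.out.println(Integer.toHexString(7));           // 7
    }
}
```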
If you run the following program:
public class testprog {
public static void compare (String s1, String s2) {
if (!s1.equals(s2))
System.out.println ("Different: " + s1 + " " + s2);
}
public static void main(String args[]) {
byte b = -128;
while (b < 127) {
compare (
Integer.toHexString((0x000000ff & b) | 0xffffff00).substring(6),
String.format("%02x", b, args));
b++;
}
compare (
Integer.toHexString((0x000000ff & b) | 0xffffff00).substring(6),
String.format("%02x", b, args));
System.out.println ("Done");
}
}
you'll see that the two expressions are identical - it just spits out Done since the two expressions produce the same result in all cases.
I'm currently looking at a simple programming problem that might be fun to optimize - at least for anybody who believes that programming is art :) So here is it:
How to best represent longs as Strings while keeping their natural order?
Additionally, the String representation should match ^[A-Za-z0-9]+$. (I'm not too strict here, but avoid using control characters or anything that might cause headaches with encodings, is illegal in XML, has line breaks, or similar characters that will certainly cause problems)
Here's a JUnit test case:
@Test
public void longConversion() {
final long[] longs = { Long.MIN_VALUE, Long.MAX_VALUE, -5664572164553633853L,
-8089688774612278460L, 7275969614015446693L, 6698053890185294393L,
734107703014507538L, -350843201400906614L, -4760869192643699168L,
-2113787362183747885L, -5933876587372268970L, -7214749093842310327L, };
// keep it reproducible
//Collections.shuffle(Arrays.asList(longs));
final String[] strings = new String[longs.length];
for (int i = 0; i < longs.length; i++) {
strings[i] = Converter.convertLong(longs[i]);
}
// Note: Comparator is not an option
Arrays.sort(longs);
Arrays.sort(strings);
final Pattern allowed = Pattern.compile("^[A-Za-z0-9]+$");
for (int i = 0; i < longs.length; i++) {
assertTrue("string: " + strings[i], allowed.matcher(strings[i]).matches());
assertEquals("string: " + strings[i], longs[i], Converter.parseLong(strings[i]));
}
}
and here are the methods I'm looking for
public static class Converter {
public static String convertLong(final long value) {
// TODO
}
public static long parseLong(final String value) {
// TODO
}
}
I already have some ideas on how to approach this problem. Still, I thought I might get some nice (creative) suggestions from the community.
Additionally, it would be nice if this conversion would be
as short as possible
easy to implement in other languages
EDIT: I'm quite glad to see that two very reputable programmers ran into the same problem as I did: using '-' for negative numbers can't work as the '-' doesn't reverse the order of sorting:
-0001
-0002
0000
0001
0002
Ok, take two:
class Converter {
public static String convertLong(final long value) {
return String.format("%016x", value - Long.MIN_VALUE);
}
public static long parseLong(final String value) {
String first = value.substring(0, 8);
String second = value.substring(8);
long temp = (Long.parseLong(first, 16) << 32) | Long.parseLong(second, 16);
return temp + Long.MIN_VALUE;
}
}
This one takes a little explanation. Firstly, let me demonstrate that it is reversible and the resultant conversions should demonstrate the ordering:
for (long aLong : longs) {
String out = Converter.convertLong(aLong);
System.out.printf("%20d %16s %20d\n", aLong, out, Converter.parseLong(out));
}
Output:
-9223372036854775808 0000000000000000 -9223372036854775808
9223372036854775807 ffffffffffffffff 9223372036854775807
-5664572164553633853 316365a0e7370fc3 -5664572164553633853
-8089688774612278460 0fbba6eba5c52344 -8089688774612278460
7275969614015446693 e4f96fd06fed3ea5 7275969614015446693
6698053890185294393 dcf444867aeaf239 6698053890185294393
734107703014507538 8a301311010ec412 734107703014507538
-350843201400906614 7b218df798a35c8a -350843201400906614
-4760869192643699168 3dedfeb1865f1e20 -4760869192643699168
-2113787362183747885 62aa5197ea53e6d3 -2113787362183747885
-5933876587372268970 2da6a2aeccab3256 -5933876587372268970
-7214749093842310327 1be00fecadf52b49 -7214749093842310327
As you can see Long.MIN_VALUE and Long.MAX_VALUE (the first two rows) are correct and the other values basically fall in line.
What is this doing?
Assuming signed byte values you have:
-128 => 0x80
-1 => 0xFF
0 => 0x00
1 => 0x01
127 => 0x7F
Now if you add 0x80 to those values you get:
-128 => 0x00
-1 => 0x7F
0 => 0x80
1 => 0x81
127 => 0xFF
which is the correct order (with overflow).
Basically the above is doing that with 64 bit signed longs instead of 8 bit signed bytes.
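The 8-bit version of that table can be reproduced with a few lines. A sketch only, with a made-up class and method name:

```java
final class OffsetOrderDemo {
    // Maps a signed byte to 0..255 so that unsigned order matches signed order.
    static int offset(byte b) {
        return (b + 0x80) & 0xFF;
    }

    public static void main(String[] args) {
        // Prints the same mapping as the second table: 0x00, 0x7F, 0x80, 0x81, 0xFF.
        for (byte b : new byte[] {-128, -1, 0, 1, 127}) {
            System.out.printf("%4d => 0x%02X%n", b, offset(b));
        }
    }
}
```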
The conversion back is a little more roundabout. You might think you can use:
return Long.parseLong(value, 16);
but you can't. Pass in 16 f's to that function (-1) and it will throw an exception. It seems to be treating that as an unsigned hex value, which long cannot accommodate. So instead I split it in half and parse each piece, combining them together, left-shifting the first half by 32 bits.
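If Java 8 or later is available, Long.parseUnsignedLong accepts the full 16-digit range, so the split-and-shift step should not be needed. The following is a sketch of that variant, not the code used above; it flips the sign bit with XOR, which gives the same bits as adding or subtracting Long.MIN_VALUE. The class name is made up.

```java
final class UnsignedHexConverter {
    static String convertLong(long value) {
        // Flipping the sign bit is the same as subtracting Long.MIN_VALUE modulo 2^64.
        return String.format("%016x", value ^ Long.MIN_VALUE);
    }

    static long parseLong(String value) {
        // parseUnsignedLong (Java 8+) accepts values up to ffffffffffffffff.
        return Long.parseUnsignedLong(value, 16) ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        for (long v : new long[] {Long.MIN_VALUE, -1L, 0L, 1L, Long.MAX_VALUE}) {
            String s = convertLong(v);
            System.out.printf("%20d %s %20d%n", v, s, parseLong(s));
        }
    }
}
```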
EDIT: Okay, so just adding the negative sign for negative numbers doesn't work... but you could convert the value into an effectively "unsigned" long such that Long.MIN_VALUE maps to "0000000000000000", and Long.MAX_VALUE maps to "FFFFFFFFFFFFFFFF". Harder to read, but will get the right results.
Basically you just need to add 2^63 to the value before turning it into hex - but that may be a slight pain to do in Java due to it not having unsigned longs... it may be easiest to do using BigInteger:
private static final BigInteger OFFSET = BigInteger.valueOf(Long.MIN_VALUE)
.negate();
public static String convertLong(long value) {
BigInteger afterOffset = BigInteger.valueOf(value).add(OFFSET);
return String.format("%016x", afterOffset);
}
public static long parseLong(String text) {
BigInteger beforeOffset = new BigInteger(text, 16);
return beforeOffset.subtract(OFFSET).longValue();
}
That wouldn't be terribly efficient, admittedly, but it works with all your test cases.
If you don't need a printable String, you can encode the long in four chars after you've shifted the value by Long.MIN_VALUE (-0x8000000000000000) to emulate an unsigned long:
public static String convertLong(long value) {
value += Long.MIN_VALUE;
return "" +
(char)(value>>48) + (char)(value>>32) +
(char)(value>>16) + (char)value;
}
public static long parseLong(String value) {
return (
(((long)value.charAt(0))<<48) +
(((long)value.charAt(1))<<32) +
(((long)value.charAt(2))<<16) +
(long)value.charAt(3)) + Long.MIN_VALUE;
}
Chars that fall in the surrogate range are not a problem, since the natural order of a String is defined by the UTF-16 code unit values of its chars and not by Unicode code point values.
There's a technique in RFC2550 -- an April 1st joke RFC about the Y10K problem with 4-digit dates -- that could be applied to this purpose. Essentially, each time the integer's string representation grows to require another digit, another letter or other (printable) character is prepended to retain desired sort-order. The negative rules are more arcane, yielding strings that are harder to read at a glance... but still easy enough to apply in code.
Nicely, for positive numbers, they're still readable.
See:
http://www.faqs.org/rfcs/rfc2550.html