Java parsing long from string

I'm currently trying to parse some long values stored as Strings in Java. The problem I have is this:
String test = "fffff8000261e000";
long number = Long.parseLong(test, 16);
This throws a NumberFormatException:
java.lang.NumberFormatException: For input string: "fffff8000261e000"
However, if I knock the first 'f' off the string, it parses it fine.
I'm guessing this is because the number is large and what I'd normally do is put an 'L' on the end of the long to fix that problem. I can't however work out the best way of doing that when parsing a long from a string.
Can anyone offer any advice?
Thanks

There are two different ways of answering your question, depending on exactly what sort of behavior you're really looking for.
Answer #1: As other people have pointed out, your string (interpreted as a positive hexadecimal integer) is too big for the Java long type. So if you really need (positive) integers that big, then you'll need to use a different type, perhaps java.math.BigInteger, which also has a constructor taking a String and a radix.
Answer #2: I wonder, though, if your string represents the "raw" bytes of the long. In your example it would represent a negative number. If that's the case, then Java's built-in long parser doesn't handle values where the high bit is set (i.e. where the first digit of a 16 digit string is greater than 7).
If you're in case #2, then here is one (pretty inefficient) way of handling it:
String test = "fffff8000261e000";
long number = new java.math.BigInteger(test, 16).longValue();
which produces the value -8796053053440. (If your string is more than 16 hex digits long, it would silently drop any higher bits.)
If efficiency is a concern, you could write your own bit-twiddling routine that takes the hex digits off the end of the string two at a time, perhaps building a byte array, then converting to long. Some similar code is here:
How to convert a Java Long to byte[] for Cassandra?
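On Java 8 or later, case #2 can probably be handled without BigInteger at all: Long.parseUnsignedLong reads the string as an unsigned 64-bit value and hands back the matching bit pattern as a long. A minimal sketch, assuming Java 8+ (the class and method names are made up for illustration):

```java
class UnsignedHexDemo {
    // Parses up to 16 hex digits as an unsigned 64-bit value and
    // returns the raw bits as a (possibly negative) long.
    static long parseHexBits(String hex) {
        return Long.parseUnsignedLong(hex, 16);
    }

    public static void main(String[] args) {
        long number = parseHexBits("fffff8000261e000");
        System.out.println(number);                        // -8796053053440
        System.out.println(Long.toUnsignedString(number)); // 18446735277656498176
        System.out.println(Long.toHexString(number));      // fffff8000261e000
    }
}
```

Unlike BigInteger.longValue(), this throws a NumberFormatException when the input needs more than 64 bits instead of silently dropping the higher bits.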

The primitive long variable can hold values in the range from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 inclusive.
The calculation shows that fffff8000261e000 hexadecimal is 18,446,735,277,656,498,176 decimal, which is obviously out of bounds. By contrast, ffff8000261e000 hexadecimal (the same string with the first 'f' removed) is 1,152,912,708,553,793,536 decimal, which is just as obviously within bounds.
As everybody here proposed, use BigInteger to account for such cases. For example, BigInteger bi = new BigInteger("fffff8000261e000", 16); will solve your problem. Also, new java.math.BigInteger("fffff8000261e000", 16).toString() will yield 18446735277656498176 exactly.
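If you want to check up front whether a hex string fits in a signed long, one possible sketch is to look at the bit length of the BigInteger. The helper name fitsInLong is hypothetical, and the check assumes the string has no sign, so the value is non-negative:

```java
import java.math.BigInteger;

class RangeCheckDemo {
    // True if the hex string, read as a non-negative integer,
    // fits in a signed 64-bit long (i.e. needs at most 63 bits).
    static boolean fitsInLong(String hex) {
        BigInteger value = new BigInteger(hex, 16);
        return value.bitLength() <= 63;
    }

    public static void main(String[] args) {
        System.out.println(fitsInLong("fffff8000261e000")); // false
        System.out.println(fitsInLong("ffff8000261e000"));  // true
    }
}
```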

The number you are parsing is too large to fit in a Java long. Adding an L wouldn't help. If long had been an unsigned data type, it would have fit.
One way to cope is to divide the string in two parts and then use bit shift when adding them together:
String s= "fffff8000261e000";
long number;
long n1, n2;
if (s.length() < 16) {
number = Long.parseLong(s, 16);
}
else {
String s1 = s.substring(0, 1);
String s2 = s.substring(1, s.length());
n1=Long.parseLong(s1, 16) << (4 * s2.length());
n2= Long.parseLong(s2, 16);
number = n1 + n2;
System.out.println( Long.toHexString(n1));
System.out.println( Long.toHexString(n2));
System.out.println( Long.toHexString(number));
}
Note:
If the number is bigger than Long.MAX_VALUE the resulting long will be a negative value, but the bit pattern will match the input.

Related

Why am I getting a number format exception when the data type accepts the values I'm parsing?

I do not want the answer, I would just like guidance and for someone to point to why my code is not performing as expected
My task is to flip an integer into binary, reformat the binary to a 32 bit number and then return the unsigned integer. So far my code successfully makes the conversions and flips the bits, however I am getting a NumberFormatException when I attempt to parse the string value into a long that I'll convert to an unsigned integer.
What is the issue with my code? What have I got misconstrued here? I know there are loads of solutions to this problem online but I prefer working things out my own way?
Could I please get some guidance? Thank you
public class flippingBits {
public static void main(String[] args) {
//change the number to bits
long bits = Long.parseLong(Long.toBinaryString(9));
String tempBits = String.valueOf(bits);
//reformat so that you get 32 bits
tempBits = String.format("%" + (32) + "s", tempBits).replace(" ", "0");
//flip the bits
tempBits = tempBits.replace("1", "5");
tempBits = tempBits.replace("0", "1");
tempBits = tempBits.replace("5", "0");
//Pass this to a long data type so that you can then eventually convert the new bits
// to an unsigned integer
long backToNum = Long.parseLong(tempBits);
}
}

You're directly parsing the bits into a long value instead of converting the bits into an equivalent value.
You need to use the following method (Long.parseUnsignedLong()):
long backToNum = Long.parseUnsignedLong(tempBits, 2); //output: 4294967286
The second argument represents radix:
To interpret a number written in a particular representation, it is necessary to know the radix or base of that representation. This allows the number to be converted into a real value.
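To see why the radix matters here, a small sketch may help. Without a radix, the 32-character string of 0s and 1s is read as a decimal number, which is far beyond the range of long, hence the exception. The class and helper names below are made up for illustration:

```java
class RadixDemo {
    // True if s can be parsed by Long.parseLong with the default radix 10.
    static boolean parsesAsDecimal(String s) {
        try {
            Long.parseLong(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The flipped 32-bit pattern from the question, as 32 binary digits.
        String bits = Long.toBinaryString(4294967286L);

        System.out.println(Long.parseLong("1001"));    // 1001 (read as decimal)
        System.out.println(Long.parseLong("1001", 2)); // 9 (read as binary)
        System.out.println(parsesAsDecimal(bits));     // false: 32 decimal digits overflow a long
        System.out.println(Long.parseLong(bits, 2));   // 4294967286
    }
}
```

Since the value has only 32 bits, plain Long.parseLong with radix 2 is enough in this case; parseUnsignedLong only becomes necessary once the 64th bit is set.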

Why am I not able to mask 32 bits on a long data type in Java

I cannot figure out why this doesn't work. I am attempting to mask the least significant 32 bits of a long in Java, but the AND does not clear the 33rd, 34th and higher bits. Here is my example:
class Main {
public static void main(String[] args) {
long someVal = 17592096894893l; //hex 0xFFFFAAFAFAD
long mask = 0xFF; //binary
long result = mask & someVal;
System.out.println("Example 1 this works on one byte");
System.out.printf("\n%x %s", someVal, Long.toBinaryString(someVal) );
System.out.printf("\n%x %s", result, Long.toBinaryString(result) );
long someVal2 = 17592096894893l; //hex 0xFFFFAAFAFAD
mask = 0xFFFFFFFF; //binary
result = mask & someVal2;
System.out.println("\nExample 2 - this does not work");
System.out.printf("\n%x %s", someVal2, Long.toBinaryString(someVal2) );
System.out.printf("\n%x %s", result, Long.toBinaryString(result) );
}
}
I was expecting the AND operation to zero out everything above the low 32 bits. Here is the output I get.
Example 1 - this works
ffffaafafad 11111111111111111010101011111010111110101101
ad 10101101
Example 2 - this does not work
ffffaafafad 11111111111111111010101011111010111110101101
ffffaafafad 11111111111111111010101011111010111110101101
I would like to be able to mask the first least significant 4 bytes of the long value.

I believe what you’re seeing here is the fact that Java converts integers to longs using sign extension.
For starters, what should this code do?
int myInt = -1;
long myLong = myInt;
System.out.println(myLong);
This should intuitively print out -1, and that’s indeed what happens. I mean, it would be kinda weird if in converting an int to a long, we didn’t get the same number we started with.
Now, let’s take this code:
int myInt = 0xFFFFFFFF;
long myLong = myInt;
System.out.println(myLong);
What does this print? Well, 0xFFFFFFFF is the hexadecimal version of the signed 32-bit number -1. That means that this code is completely equivalent to the above code, so it should (and does) print the same value, -1.
But the value -1, encoded as a long, doesn’t have representation 0x00000000FFFFFFFF. That would be 2^32 - 1, not -1. Rather, since it’s 64 bits long, -1 is represented as 0xFFFFFFFFFFFFFFFF. Oops - all the upper bits just got activated! That makes it not very effective as a bitmask.
The rule in Java is that if you convert an int to a long, if the very first bit of the int is 1, then all 32 upper bits of the long will get set to 1 as well. That’s in place so that converting an integer to a long preserves the numeric value.
If you want to make a bitmask that’s actually 64 bits long, initialize it with a long literal rather than an int literal:
mask = 0xFFFFFFFFL; // note the L
Why does this make a difference? Without the L, Java treats the code as
Create the integer value 0xFFFFFFFF = -1, giving 32 one bits.
Convert that integer value into a long. To do so, use sign extension to convert it to the long value -1, giving 64 one bits in a row.
However, if you include the L, Java interprets things like this:
Create the long value 0xFFFFFFFF = 2^32 - 1, which is 32 zero bits followed by 32 one bits.
Assign that value to mask.
Hope this helps!
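For reference, here is a small sketch that puts the two masks side by side, using the value from the question written as a hex literal. The class and method names are only illustrative:

```java
class MaskDemo {
    // Keeps only the least significant 32 bits of value.
    static long low32(long value) {
        return value & 0xFFFFFFFFL; // long literal: upper 32 bits are zero
    }

    public static void main(String[] args) {
        long intMask = 0xFFFFFFFF;   // int literal -1, sign-extended to 64 one bits
        long longMask = 0xFFFFFFFFL; // long literal, 32 zero bits then 32 one bits
        long someVal = 0xFFFFAAFAFADL;

        System.out.printf("%x%n", intMask);            // ffffffffffffffff
        System.out.printf("%x%n", longMask);           // ffffffff
        System.out.printf("%x%n", someVal & intMask);  // ffffaafafad (nothing masked)
        System.out.printf("%x%n", low32(someVal));     // faafafad
    }
}
```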

Represent long in least amount of characters

I need to represent both very large and small numbers in the shortest string possible. The numbers are unsigned. I have tried just straight Base64 encode, but for some smaller numbers, the encoded string is longer than just storing the number as a string. What would be the best way to most optimally store a very large or short number in the shortest string possible with it being URL safe?

I have tried just straight Base64 encode, but for some smaller numbers, the encoded string is longer than just storing the number as a string
Base64 encoding of binary byte data will make it longer, by about a third. It is not supposed to make it shorter, but to allow safe transport of binary data in formats that are not binary safe.
However, base 64 is more compact than decimal representation of a number (or of byte data), even if it is less compact than base 256 (the raw byte data). Encoding your numbers in base 64 directly will make them more compact than decimal. This will do it:
private static final String base64Chars =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
static String encodeNumber(long x) {
char[] buf = new char[11];
int p = buf.length;
do {
buf[--p] = base64Chars.charAt((int)(x % 64));
x /= 64;
} while (x != 0);
return new String(buf, p, buf.length - p);
}
static long decodeNumber(String s) {
long x = 0;
for (char c : s.toCharArray()) {
int charValue = base64Chars.indexOf(c);
if (charValue == -1) throw new NumberFormatException(s);
x *= 64;
x += charValue;
}
return x;
}
Using this encoding scheme, Long.MAX_VALUE will be the string H__________, which is 11 characters long, compared to its decimal representation 9223372036854775807 which is 19 characters long. Numbers up to about 16 million will fit in a mere 4 characters. That's about as short as you'll get it. (Technically there are two other characters which do not need to be encoded in URLs: . and ~. You can incorporate those to get base 66, which would be a smidgin shorter for some numbers, although that seems a bit pedantic.)
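Since the question says the numbers are unsigned, it may be worth noting that a long holding an unsigned value above Long.MAX_VALUE is negative in Java, and x % 64 then yields a negative index. Below is a hedged variant that treats the long as 64 raw bits by using shifts and masks instead of division. The class and method names are my own, and the alphabet is assumed to be the same one as above:

```java
class UnsignedBase64Demo {
    private static final String DIGITS =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

    // Encodes x, interpreted as an unsigned 64-bit value, in base 64.
    static String encodeUnsigned(long x) {
        char[] buf = new char[11]; // 64 bits / 6 bits per digit, rounded up
        int p = buf.length;
        do {
            buf[--p] = DIGITS.charAt((int) (x & 63)); // lowest 6 bits
            x >>>= 6;                                 // unsigned shift
        } while (x != 0);
        return new String(buf, p, buf.length - p);
    }

    // Decodes a string produced by encodeUnsigned back into the raw 64 bits.
    // Note: inputs longer than 11 digits are not rejected in this sketch.
    static long decodeUnsigned(String s) {
        long x = 0;
        for (char c : s.toCharArray()) {
            int v = DIGITS.indexOf(c);
            if (v == -1) throw new NumberFormatException(s);
            x = (x << 6) | v;
        }
        return x;
    }

    public static void main(String[] args) {
        String s = encodeUnsigned(-1L); // all 64 bits set, i.e. 2^64 - 1 unsigned
        System.out.println(s.length());                               // 11
        System.out.println(Long.toUnsignedString(decodeUnsigned(s))); // 18446744073709551615
    }
}
```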

To extend on Stephen C's answer, here is a piece of code to convert to base 62 (you can increase this by adding more characters to the digits String; just pick what characters are valid for you):
public static String toString(long n) {
String digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
int base = digits.length();
String s = "";
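do { // do-while so that 0 is encoded as "0" rather than an empty string
long d = n % base;
s = digits.charAt((int) d) + s; // charAt takes an int index
n = n / base;
} while (n > 0);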
return s;
}
This will never result in the string representation being longer than the decimal one.

Assuming that you don't do any compression, and that you restrict yourself to URL safe characters, then the following procedure will give you the most compact encoding possible.
Make a list of all URL safe characters
Count them. Suppose you have N.
Represent your number in base N, representing 0 by the first character, 1 by the 2nd and so on.
So, what about compression ...
If you assume that the numbers you are representing are uniformly distributed across their range, then there is no real opportunity for compression.
Otherwise, there is potential for compression. If you can reduce the size of the common numbers then you can typically achieve a saving by compression. This is how Huffman encoding works.
But the downside is that compression at this level is not perfect across the range of numbers. It reduces the size of some numbers, but it inevitably increases the size of others.
So what does this mean for your use-case?
I think it means that you are looking at the problem the wrong way. You should not be aiming for a minimal encoded size for every number. You should be aiming to minimize the size on average ... averaged over the actual distribution of your numbers.

Java code to convert from Base-10 to Base-9

How to convert a long number in base 10 to base 9 without converting to string ?

FWIW, all values are actually in base 2 inside your machine (I bet you already knew that). It only shows up as base 10 because string conversion creates string representations in base 10 (e.g. when you print), because methods like parseLong assume the input string is in base 10, and because the compiler treats unprefixed literals as base 10 when you actually write code. In other words, everything is in binary, the computer only converts stuff into and from base 10 for the convenience of us humans.
It follows that we should be easily able to change the output base to be something other than 10, and hence get string representations for the same value in base 9. In Java this is done by passing an optional extra base parameter into the Long.toString method.
long x=10;
System.out.println(Long.toString(x,9));

Long base10 = 10L;
Long.valueOf(base10.toString(), 9); // parses the digits "10" as a base 9 number, giving 9

What does "convert to base 9 without converting to string" actually mean?
Base-9, base-10, base-2 (binary), base-16 (hexadecimal), are just ways to represent numbers. The value itself does not depend on how you represent it. int x = 255 is exactly the same as int x = 0xff as far as the compiler is concerned.
If you don't want to "convert to string" (I read this as meaning you are not concerned with the representation of the value), then what do you want to do exactly?

You can't convert to base 9 without converting to string.
When you write
long a = 123;
you're making the implicit assumption that it's in base 10. If you want to interpret that as a base 9 number that's fine, but there's no way Java (or any other language I know of) is suddenly going to see it that way and so 8+1 will return 9 and not 10. There's native support for base 2, 8, 16 and 10 but for any other base you'll have to treat it as a string. (And then, if you're sure you want this, convert it back to a long)
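A quick sketch of the point about literals: the base only affects how the number is written in source code or printed, not the stored value. The binary literal assumes Java 7 or later, and the class name is just for illustration:

```java
class LiteralDemo {
    // Returns the textual representation of value in the given base (2..36).
    static String inBase(int value, int base) {
        return Integer.toString(value, base);
    }

    public static void main(String[] args) {
        int dec = 255;        // base 10
        int hex = 0xff;       // base 16
        int oct = 0377;       // base 8
        int bin = 0b11111111; // base 2 (Java 7+)

        System.out.println(dec == hex && hex == oct && oct == bin); // true
        System.out.println(inBase(dec, 9));                         // 313
    }
}
```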

You have to apply the algorithm that converts number from one base to another by applying repeated modulo operations. Look here for a Java implementation. I report here the code found on that site. The variable M must contain the number to be converted, and N is the new base.
Caveat: for the snippet to work properly, M>=0 && N>=2 && N<=10 must be true. The extension with N>10 is left to the interested reader (you have to use letters instead of digits).
import java.util.ArrayDeque;
import java.util.Deque;

static String conversion(int M, int N) // return string, accept two integers
{
    Deque<Integer> stack = new ArrayDeque<>(); // create a stack
    while (M >= N) // now the repetitive loop is clearly seen
    {
        stack.push(M % N); // store a digit
        M = M / N; // find new M
    }
    // now it's time to collect the digits together
    StringBuilder str = new StringBuilder().append(M); // start with the most significant digit M
    while (!stack.isEmpty())
        str.append(stack.pop()); // get from the stack next digit
    return str.toString();
}

If you LITERALLY can do anything but convert to string do the following:
public static long toBase(long num, int base) {
long result;
StringBuilder buffer = new StringBuilder();
buffer.append(Long.toString(num, base));
return Long.parseLong(buffer.toString());
}
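If the goal really is to avoid strings entirely, the same "number whose decimal digits are the base 9 digits" can probably be built with arithmetic alone. This is only a sketch under the assumptions that num is non-negative and 2 <= base <= 10; the names are hypothetical, and the result overflows for large inputs because the base 9 form has more digits than a long can hold in decimal:

```java
class BaseDigitsDemo {
    // Returns a long whose decimal digits are the digits of num written in the
    // given base. Assumes num >= 0 and 2 <= base <= 10; no overflow check.
    static long toBaseDigits(long num, int base) {
        long result = 0;
        long place = 1; // decimal place value of the next digit
        while (num > 0) {
            result += (num % base) * place;
            place *= 10;
            num /= base;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toBaseDigits(10, 9)); // 11
        System.out.println(toBaseDigits(81, 9)); // 100
    }
}
```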

How to convert Java long's as Strings while keeping natural order

I'm currently looking at a simple programming problem that might be fun to optimize - at least for anybody who believes that programming is art :) So here is it:
How to best represent long's as Strings while keeping their natural order?
Additionally, the String representation should match ^[A-Za-z0-9]+$. (I'm not too strict here, but avoid using control characters or anything that might cause headaches with encodings, is illegal in XML, has line breaks, or similar characters that will certainly cause problems)
Here's a JUnit test case:
@Test
public void longConversion() {
final long[] longs = { Long.MIN_VALUE, Long.MAX_VALUE, -5664572164553633853L,
-8089688774612278460L, 7275969614015446693L, 6698053890185294393L,
734107703014507538L, -350843201400906614L, -4760869192643699168L,
-2113787362183747885L, -5933876587372268970L, -7214749093842310327L, };
// keep it reproducible
//Collections.shuffle(Arrays.asList(longs));
final String[] strings = new String[longs.length];
for (int i = 0; i < longs.length; i++) {
strings[i] = Converter.convertLong(longs[i]);
}
// Note: Comparator is not an option
Arrays.sort(longs);
Arrays.sort(strings);
final Pattern allowed = Pattern.compile("^[A-Za-z0-9]+$");
for (int i = 0; i < longs.length; i++) {
assertTrue("string: " + strings[i], allowed.matcher(strings[i]).matches());
assertEquals("string: " + strings[i], longs[i], Converter.parseLong(strings[i]));
}
}
and here are the methods I'm looking for
public static class Converter {
public static String convertLong(final long value) {
// TODO
}
public static long parseLong(final String value) {
// TODO
}
}
I already have some ideas on how to approach this problem. Still, I thought I might get some nice (creative) suggestions from the community.
Additionally, it would be nice if this conversion would be
as short as possible
easy to implement in other languages
EDIT: I'm quite glad to see that two very reputable programmers ran into the same problem as I did: using '-' for negative numbers can't work as the '-' doesn't reverse the order of sorting:
-0001
-0002
0000
0001
0002

Ok, take two:
class Converter {
public static String convertLong(final long value) {
return String.format("%016x", value - Long.MIN_VALUE);
}
public static long parseLong(final String value) {
String first = value.substring(0, 8);
String second = value.substring(8);
long temp = (Long.parseLong(first, 16) << 32) | Long.parseLong(second, 16);
return temp + Long.MIN_VALUE;
}
}
This one takes a little explanation. Firstly, let me demonstrate that it is reversible and the resultant conversions should demonstrate the ordering:
for (long aLong : longs) {
String out = Converter.convertLong(aLong);
System.out.printf("%20d %16s %20d\n", aLong, out, Converter.parseLong(out));
}
Output:
-9223372036854775808 0000000000000000 -9223372036854775808
9223372036854775807 ffffffffffffffff 9223372036854775807
-5664572164553633853 316365a0e7370fc3 -5664572164553633853
-8089688774612278460 0fbba6eba5c52344 -8089688774612278460
7275969614015446693 e4f96fd06fed3ea5 7275969614015446693
6698053890185294393 dcf444867aeaf239 6698053890185294393
734107703014507538 8a301311010ec412 734107703014507538
-350843201400906614 7b218df798a35c8a -350843201400906614
-4760869192643699168 3dedfeb1865f1e20 -4760869192643699168
-2113787362183747885 62aa5197ea53e6d3 -2113787362183747885
-5933876587372268970 2da6a2aeccab3256 -5933876587372268970
-7214749093842310327 1be00fecadf52b49 -7214749093842310327
As you can see Long.MIN_VALUE and Long.MAX_VALUE (the first two rows) are correct and the other values basically fall in line.
What is this doing?
Assuming signed byte values you have:
-128 => 0x80
-1 => 0xFF
0 => 0x00
1 => 0x01
127 => 0x7F
Now if you add 0x80 to those values you get:
-128 => 0x00
-1 => 0x7F
0 => 0x80
1 => 0x81
127 => 0xFF
which is the correct order (with overflow).
Basically the above is doing that with 64 bit signed longs instead of 8 bit signed bytes.
The conversion back is a little more roundabout. You might think you can use:
return Long.parseLong(value, 16);
but you can't. Pass in 16 f's to that function (-1) and it will throw an exception. It seems to be treating that as an unsigned hex value, which long cannot accommodate. So instead I split it in half and parse each piece, combining them together, left-shifting the first half by 32 bits.
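On Java 8 or later the split is probably not needed: Long.parseUnsignedLong accepts all 16 hex digits. Here is a sketch of the same idea under that assumption. It uses XOR with Long.MIN_VALUE, which flips the sign bit and gives the same result as subtracting or adding Long.MIN_VALUE with wraparound. The class name OrderedHex is made up:

```java
class OrderedHex {
    // Flips the sign bit so that signed order matches unsigned (and hex string) order.
    static String convertLong(long value) {
        return String.format("%016x", value ^ Long.MIN_VALUE);
    }

    // Reverses convertLong: parse as unsigned hex, then flip the sign bit back.
    static long parseLong(String value) {
        return Long.parseUnsignedLong(value, 16) ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(convertLong(Long.MIN_VALUE)); // 0000000000000000
        System.out.println(convertLong(0L));             // 8000000000000000
        System.out.println(convertLong(Long.MAX_VALUE)); // ffffffffffffffff
        System.out.println(parseLong(convertLong(-1L))); // -1
    }
}
```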

EDIT: Okay, so just adding the negative sign for negative numbers doesn't work... but you could convert the value into an effectively "unsigned" long such that Long.MIN_VALUE maps to "0000000000000000", and Long.MAX_VALUE maps to "FFFFFFFFFFFFFFFF". Harder to read, but will get the right results.
Basically you just need to add 2^63 to the value before turning it into hex - but that may be a slight pain to do in Java due to it not having unsigned longs... it may be easiest to do using BigInteger:
private static final BigInteger OFFSET = BigInteger.valueOf(Long.MIN_VALUE)
.negate();
public static String convertLong(long value) {
BigInteger afterOffset = BigInteger.valueOf(value).add(OFFSET);
return String.format("%016x", afterOffset);
}
public static long parseLong(String text) {
BigInteger beforeOffset = new BigInteger(text, 16);
return beforeOffset.subtract(OFFSET).longValue();
}
That wouldn't be terribly efficient, admittedly, but it works with all your test cases.

If you don't need a printable String, you can encode the long in four chars after you've shifted the value by Long.MIN_VALUE (-0x8000000000000000L) to emulate an unsigned long:
public static String convertLong(long value) {
value += Long.MIN_VALUE;
return "" +
(char)(value>>48) + (char)(value>>32) +
(char)(value>>16) + (char)value;
}
public static long parseLong(String value) {
return (
(((long)value.charAt(0))<<48) +
(((long)value.charAt(1))<<32) +
(((long)value.charAt(2))<<16) +
(long)value.charAt(3)) + Long.MIN_VALUE;
}
Usage of surrogate pairs is not a problem, since the natural order of a String is defined by the UTF-16 code unit values of its chars and not by the Unicode code point values.

There's a technique in RFC2550 -- an April 1st joke RFC about the Y10K problem with 4-digit dates -- that could be applied to this purpose. Essentially, each time the integer's string representation grows to require another digit, another letter or other (printable) character is prepended to retain desired sort-order. The negative rules are more arcane, yielding strings that are harder to read at a glance... but still easy enough to apply in code.
Nicely, for positive numbers, they're still readable.
See:
http://www.faqs.org/rfcs/rfc2550.html
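To give a feel for the idea, here is a heavily simplified sketch for non-negative values only. It is not the exact scheme from the RFC: it just prepends one letter that encodes the number of decimal digits ('A' for 1 digit, 'B' for 2, and so on up to 'S' for 19), so that shorter numbers sort before longer ones. The class and method names are hypothetical, and negative numbers are deliberately left out:

```java
class LengthPrefixed {
    // Encodes a non-negative long as <length letter><decimal digits>.
    static String encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("non-negative values only in this sketch");
        }
        String digits = Long.toString(value);
        char prefix = (char) ('A' + digits.length() - 1); // 'A'..'S' for 1..19 digits
        return prefix + digits;
    }

    // Reverses encode by dropping the length letter.
    static long decode(String s) {
        return Long.parseLong(s.substring(1));
    }

    public static void main(String[] args) {
        System.out.println(encode(9));                            // A9
        System.out.println(encode(10));                           // B10
        System.out.println(encode(9).compareTo(encode(10)) < 0);  // true
    }
}
```

The result still matches ^[A-Za-z0-9]+$ and stays readable, at the cost of one extra character per number.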
