How can I convert a non-numeric String to an Integer?
For instance, I have:
String unique = "FUBAR";
What's a good way to represent the String as an Integer with no collisions? For example, "FUBAR" should always be represented by the same number and shouldn't collide with any other String. For instance, String a = "A"; should be represented as the Integer 1 and so on. What method does this (preferably for all Unicode strings, though in my case ASCII values could be sufficient)?
This is impossible. Think about it: an Integer is only 32 bits wide, so it has just 2^32 distinct values, while there are infinitely many Strings. By the pigeonhole principle, there must exist at least two strings that have the same Integer value no matter what technique you use for conversion. In fact, infinitely many strings share each value.
If you're just looking for an efficient mapping, then I suggest that you just use the int returned by hashCode(). Note that it is a full 32-bit signed int, so it can be negative, and distinct strings can share a hash code.
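To make the pigeonhole argument concrete, here is a small sketch (the class name HashCollisionDemo is made up for illustration). It relies on the documented String.hashCode() formula, s[0]*31^(n-1) + ... + s[n-1], under which "Aa" and "BB" both work out to 65*31 + 97 = 66*31 + 66 = 2112:

```java
class HashCollisionDemo {
    // Returns true when two different strings share the same hashCode().
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }
}
```

HashCollisionDemo.collide("Aa", "BB") returns true, so even two-letter ASCII strings already collide.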
You can map Strings to unique IDs using a table. There is no way to do this generically.
final Map<String, Integer> map = new HashMap<>();
public int idFor(String s) {
Integer id = map.get(s);
if (id == null)
map.put(s, id = map.size());
return id;
}
Note: having unique IDs doesn't guarantee that there will be no collisions in a hash collection.
http://vanillajava.blogspot.co.uk/2013/10/unique-hashcodes-is-not-enough-to-avoid.html
If you know the character set used in your strings, then you can think of the string as a number in a base other than 10. For example, hexadecimal numbers contain the letters A to F.
Therefore, if you know that your strings only contain characters from an 8-bit character set, you can treat the string as a base-256 number. In pseudocode this would be:
number n = 0;
for each letter in string
n = 256 * n + (letter's position in character set)
If your character set contains 65536 characters (for example, Java's 16-bit char values), then just multiply 'n' by that number on each step. But beware: the 32 bits of an int will overflow quickly, since they hold at most four 8-bit characters. You probably need to use a type that can hold a larger number.
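// Requires: import java.math.BigDecimal; import java.nio.charset.StandardCharsets;
private BigDecimal createBigDecimalFromString(String data)
{
    // Treat the UTF-8 bytes as digits of a base-256 number (byte i has weight 256^i).
    BigDecimal value = BigDecimal.ZERO;
    // StandardCharsets.UTF_8 avoids the checked UnsupportedEncodingException.
    byte[] tmp = data.getBytes(StandardCharsets.UTF_8);
    int numBytes = tmp.length;
    for(int i = numBytes - 1; i >= 0; i--)
    {
        BigDecimal exponent = new BigDecimal(256).pow(i);
        // Mask with 0xFF: Java bytes are signed, so non-ASCII bytes would otherwise be negative.
        value = value.add(exponent.multiply(new BigDecimal(tmp[i] & 0xFF)));
    }
    return value;
}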
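Maybe a little bit late, but I'm going to give my 10 cents to simplify it (internally it is similar to the BigDecimal approach suggested by @Romain Hippeau):
public static BigInteger getNumberId(final String value) {
    // new BigInteger(byte[]) reads the bytes as a signed two's-complement value
    // and throws NumberFormatException for an empty array (i.e. an empty String).
    return new BigInteger(value.getBytes(StandardCharsets.UTF_8));
}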
Regardless of the accepted answer, it is possible to represent any String as an integer by computing that String's Gödel number, which is a unique product of prime powers for every possible String. That said, it's quite impractical and slow to compute. For most Strings you would need a BigInteger rather than a normal Integer, and to decode a Gödel number into its corresponding String you need a defined Charset.
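A minimal sketch of the Gödel-number idea, assuming the i-th character contributes the i-th prime raised to (char value + 1). The +1 is my own choice so that a NUL character still leaves a trace, and the class and method names are hypothetical. BigInteger.nextProbablePrime() is only probabilistic in general, which I'm assuming is good enough for illustration:

```java
import java.math.BigInteger;

class GodelNumber {
    // Encodes s as 2^(c0+1) * 3^(c1+1) * 5^(c2+1) * ..., where ci is the i-th char value.
    static BigInteger encode(String s) {
        BigInteger result = BigInteger.ONE;
        BigInteger prime = BigInteger.valueOf(2);
        for (int i = 0; i < s.length(); i++) {
            int exponent = s.charAt(i) + 1;
            result = result.multiply(prime.pow(exponent));
            prime = prime.nextProbablePrime(); // next prime for the next position
        }
        return result;
    }
}
```

Even a two-letter string such as "ab" already yields 2^98 * 3^99, which shows why an int or long is hopeless here.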
public class HelloWorld{
public static void main(String []args){
String str = "100.00";
Short sObj2 = Short.valueOf(str);
System.out.println(sObj2);
}
}
Getting below exception :
Exception in thread "main" java.lang.NumberFormatException: For input string: "100.00"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Short.parseShort(Short.java:118)
at java.lang.Short.valueOf(Short.java:174)
at java.lang.Short.valueOf(Short.java:200)
at HelloWorld.main(HelloWorld.java:5)
How to resolve above issue?
First, a Short is not a byte (your question summary indicates you are trying to convert a string to a byte). A Short holds integer values from -32,768 to 32,767 (inclusive). Short.valueOf accepts only an optional sign followed by digits, so trying to parse a floating point value such as "100.00" into an integer data type causes this exception.
If you simply want code that will run without an exception, either of the following should work:
public class HelloWorld{
public static void main(String []args){
String str = "100";
Short sObj2 = Short.valueOf(str);
System.out.println(sObj2);
}
}
This first example makes it run by changing the string to an integer value.
or
public class HelloWorld{
public static void main(String []args){
String str = "100.00";
Double sObj2 = Double.valueOf(str);
System.out.println(sObj2);
}
}
This second one works by parsing a string representing a floating point value into a variable type that supports floating points.
Try this
String str = "100";
Short sObj2 = Short.valueOf(str);
or if you want to deal with decimal values,
String str = "100.00";
Float fObj2 = Float.valueOf(str);
To begin with, as your post title suggests, you want to convert from a String data type to a byte data type. This doesn't necessarily mean just displaying a value without generating a NumberFormatException. I'm assuming you actually want to work with those particular data types.
To throw a small twist into things, you want to convert from a string representation of a numerical value that looks like a float or double ("100.00"). It's the decimal point within the numerical string that trips up the conversion, so it needs to be handled before converting.
Some things to consider:
As a String, you can represent any number you like in any format you like. It can be as big as you want and it can be as small as you want. It can even be a number that is imaginary or doesn't exist, but the bottom line is that it will always be a String, and you can only do String things with it. Converting a String numerical value to an actual numerical data type such as byte, short, int, long, double, float, etc. is a different ball game altogether. Some String numerical values are easy to convert, while others require more attention to detail.
A byte data type is an 8-bit signed two's complement integer. It has a minimum value of -128 and a maximum value of 127 (inclusive).
A short data type is a 16-bit signed two's complement integer. It has a minimum value of -32,768 and a maximum value of 32,767 (inclusive).
The int (integer) data type is a 32-bit signed two's complement integer, which has a minimum value of -2147483648 and a maximum value of 2147483647.
The long data type is a 64-bit two's complement integer. The signed long has a minimum value of -9223372036854775808 and a maximum value of 9223372036854775807.
At the end of it all these four data types all maintain integer values with each data type also maintaining a minimum and maximum of values. You need to also consider this to some extent when doing data type conversions. If you are going to create a conversion method to convert from one data type to another you need to ensure that you do not exceed that minimum and maximum allowable value for the data type you want to convert to. Not a big deal if you want to convert a byte data type to a short data type or a short to an integer since we know that the lesser will always play in the larger but this is not necessarily so when a larger is to play in a lesser (short to byte).
Your conversion method needs to check the value to convert so as to ensure it will actually fit into the desired data type. Java has constants to assist you with this so that you don't have to remember these minimums and maximums, for example: Integer.MIN_VALUE and Integer.MAX_VALUE or Byte.MIN_VALUE and Byte.MAX_VALUE.
When dealing with numerical strings you may also want to ensure that the string you're dealing with is actually a string representation of a numerical value and not an alphanumeric value such as a hexadecimal string, or just a plain entry error where a character other than a digit has crept into the string somehow. In my opinion, the string "100.00" is a string representation of both an alphanumeric value (because of the period) and a numeric value, since it is a string representation of a double data type. What it will truly be depends upon how you handle the period (decimal point) in the string within your conversion method.
Let's take another look at that string value ("100.00"). Another thing you may want to consider is: what if our string value was "100.74"? How do you want to handle this particular value? Do you want to round down to 100 or round up to 101 before you convert it to a data type that requires an integer value?
Let's convert the String representation of the value "100.00" to a short data type. Now keep in mind that the methods I provide below by default will always convert a string representation of a double data type (if supplied) downwards, for example 100.45 or 100.99 will be 100. If you want to properly round up or down for this type of value then supply a boolean true in the optional roundUpDown argument:
private short StringToShort(final String input, final boolean... roundUpDown) {
// Make sure there no dead whitespaces...
String inputValue = input.replaceAll(" ", "");
int i = 0; // default return value is 0
// If inputValue contains nothing ("") then return 0
if(inputValue.equals("")) { return 0; }
// Is inputValue an actual numerical value?
// Throws an exception if not.
// Handles negative and decimal point...
if (!inputValue.matches("-?\\d+(\\.\\d+)?")) {
throw new IllegalArgumentException("\nStringToShort() Method Error!\n"
+ "The value supplied is not numeric (" + inputValue + ").\n");
}
// Was the optional roundUpDown argument supplied?
boolean round = false; // default is false
if (roundUpDown.length > 0) { round = roundUpDown[0]; }
// Convert the String to a Integer value
if (inputValue.contains(".")) {
// Must be a double type representation supplied
Double value = Double.parseDouble(inputValue);
if (round) { i = (int) Math.round(value); }
else { i = (int) value.intValue(); }
}
else {
// Must be a Integer type representation supplied
i = Integer.parseInt(inputValue);
}
// Is the Integer value now too small or too
// large to be a Short data type?
if (i > Short.MAX_VALUE || i < Short.MIN_VALUE) {
throw new IllegalArgumentException("\nStringToShort() Method Error!\n"
+ "The value supplied is too small or too large (" + inputValue + ").\n"
+ "Only values from " + Short.MIN_VALUE + " to " + Short.MAX_VALUE
+ " are allowed!\n");
}
// Finally, cast and return a short data type...
return (short) i;
}
If you read all the comments within the code you can see that we've covered all the issues discussed above. Now, according to your post title, you wanted to convert to byte. Well, it's pretty much the very same method but with perhaps five or so small changes done, see if you can spot them:
private byte StringToByte(final String input, final boolean... roundUpDown) {
// Make sure there no dead whitespaces...
String inputValue = input.replaceAll(" ", "");
int i = 0; // default return value is 0
// If inputValue contains nothing ("") then return 0
if(inputValue.equals("")) { return 0; }
// Is inputValue an actual numerical value?
// Throws an exception if not.
// Handles negative and decimal point...
if (!inputValue.matches("-?\\d+(\\.\\d+)?")) {
throw new IllegalArgumentException("\nStringToByte() Method Error!\n"
+ "The value supplied is not numeric (" + inputValue + ").\n");
}
// Was the optional roundUpDown argument supplied?
boolean round = false; // default is false
if (roundUpDown.length > 0) { round = roundUpDown[0]; }
// Convert the String to a Integer value
if (inputValue.contains(".")) {
// Must be a double type representation supplied
Double value = Double.parseDouble(inputValue);
if (round) { i = (int) Math.round(value); }
else { i = (int) value.intValue(); }
}
else {
// Must be a Integer type representation supplied
i = Integer.parseInt(inputValue);
}
// Is the Integer value now too small or too
// large to be a Byte data type?
if (i > Byte.MAX_VALUE || i < Byte.MIN_VALUE) {
throw new IllegalArgumentException("\nStringToByte() Method Error!\n"
+ "The value supplied is too small or too large (" + inputValue + ").\n"
+ "Only values from " + Byte.MIN_VALUE + " to " + Byte.MAX_VALUE
+ " are allowed!\n");
}
// Finally, cast and return a byte data type...
return (byte) i;
}
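For comparison, here is a shorter sketch of the same checks using BigDecimal, assuming you are free to use it. setScale(0, mode) does the rounding, and shortValueExact() / byteValueExact() throw an ArithmeticException when the result does not fit. The class and method names are hypothetical. Note that new BigDecimal(...) is more permissive than the regex above: it also accepts exponent notation such as "1e2".

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class DecimalStringNarrowing {
    // Parses text, rounds it to a whole number with the given mode, and range-checks it as a short.
    static short toShort(String text, RoundingMode mode) {
        return new BigDecimal(text.trim()).setScale(0, mode).shortValueExact();
    }

    // Same idea for byte; out-of-range values raise ArithmeticException.
    static byte toByte(String text, RoundingMode mode) {
        return new BigDecimal(text.trim()).setScale(0, mode).byteValueExact();
    }
}
```

RoundingMode.DOWN truncates like the default behaviour of the methods above, while RoundingMode.HALF_UP rounds 100.74 to 101.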
I need to represent both very large and small numbers in the shortest string possible. The numbers are unsigned. I have tried just straight Base64 encode, but for some smaller numbers, the encoded string is longer than just storing the number as a string. What would be the best way to most optimally store a very large or short number in the shortest string possible with it being URL safe?
I have tried just straight Base64 encode, but for some smaller numbers, the encoded string is longer than just storing the number as a string
Base64 encoding of binary byte data will make it longer, by about a third. It is not supposed to make it shorter, but to allow safe transport of binary data in formats that are not binary safe.
However, base 64 is more compact than decimal representation of a number (or of byte data), even if it is less compact than base 256 (the raw byte data). Encoding your numbers in base 64 directly will make them more compact than decimal. This will do it:
private static final String base64Chars =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
static String encodeNumber(long x) {
char[] buf = new char[11];
int p = buf.length;
do {
buf[--p] = base64Chars.charAt((int)(x % 64));
x /= 64;
} while (x != 0);
return new String(buf, p, buf.length - p);
}
static long decodeNumber(String s) {
long x = 0;
for (char c : s.toCharArray()) {
int charValue = base64Chars.indexOf(c);
if (charValue == -1) throw new NumberFormatException(s);
x *= 64;
x += charValue;
}
return x;
}
Using this encoding scheme, Long.MAX_VALUE will be the string H__________, which is 11 characters long, compared to its decimal representation 9223372036854775807 which is 19 characters long. Numbers up to about 16 million will fit in a mere 4 characters. That's about as short as you'll get it. (Technically there are two other characters which do not need to be encoded in URLs: . and ~. You can incorporate those to get base 66, which would be a smidgin shorter for some numbers, although that seems a bit pedantic.)
To extend Stephen C's answer, here is a piece of code to convert to base 62. You can increase the base by adding more characters to the digits String; just pick whichever characters are valid for you:
public static String toString(long n) {
    String digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    int base = digits.length();
    if (n == 0) {
        return "0"; // the loop below would otherwise return an empty string
    }
    String s = "";
    while (n > 0) {
        int d = (int) (n % base); // charAt needs an int index
        s = digits.charAt(d) + s;
        n = n / base;
    }
    return s;
}
This will never result in the string representation being longer than the decimal one.
Assuming that you don't do any compression, and that you restrict yourself to URL safe characters, then the following procedure will give you the most compact encoding possible.
Make a list of all URL safe characters
Count them. Suppose you have N.
Represent your number in base N, representing 0 by the first character, 1 by the 2nd and so on.
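The three steps above can be sketched as follows. I'm assuming the "URL safe" list is the 66 unreserved characters from RFC 3986 (letters, digits, '-', '.', '_', '~') and that the numbers are non-negative longs; the class name is made up:

```java
class UrlSafeBaseN {
    // Step 1: the URL-safe characters (assumed: RFC 3986 unreserved set).
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

    // Steps 2 and 3: N = ALPHABET.length(), then write n in base N.
    static String encode(long n) {
        if (n < 0) throw new IllegalArgumentException("non-negative values only");
        int base = ALPHABET.length();
        StringBuilder sb = new StringBuilder();
        do {
            sb.append(ALPHABET.charAt((int) (n % base))); // least significant digit first
            n /= base;
        } while (n != 0);
        return sb.reverse().toString();
    }

    static long decode(String s) {
        int base = ALPHABET.length();
        long n = 0;
        for (int i = 0; i < s.length(); i++) {
            int digit = ALPHABET.indexOf(s.charAt(i));
            if (digit < 0) throw new NumberFormatException(s);
            n = n * base + digit;
        }
        return n;
    }
}
```

Swapping in a different ALPHABET changes the base automatically, since N is taken from its length.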
So, what about compression ...
If you assume that the numbers you are representing are uniformly distributed across their range, then there is no real opportunity for compression.
Otherwise, there is potential for compression. If you can reduce the size of the common numbers then you can typically achieve a saving by compression. This is how Huffman encoding works.
But the downside is that compression at this level is not perfect across the range of numbers. It reduces the size of some numbers, but it inevitably increases the size of others.
So what does this mean for your use-case?
I think it means that you are looking at the problem the wrong way. You should not be aiming for a minimal encoded size for every number. You should be aiming to minimize the size on average ... averaged over the actual distribution of your numbers.
Two strings representing two numbers have been provided as input. The numbers in the string can be so large that they may not be represented by the Java data type int. The objective is to compare the two numbers and output that number as a string
for example we have to compare:
"874986754789289867753896798679854698798789857387687546456"
and
"98347598375689758967756458678976893478967586857687569874"
Both are out of range of the long and int data types in Java,
and after comparing we have to output that number as a string.
You could start by looking at each string's length. If one of them is longer and you know they are both unsigned values without leading zeros, the longer string holds the bigger number. If they both have the same length, compare the strings char by char from left to right. The first position where the digits differ decides it: the number with the bigger digit there is bigger.
Whether you're trying to compare or subtract isn't clear, but Java's BigInteger class has both operations. This is probably what you need:
BigInteger a = new BigInteger("874986754789289867753896798679854698798789857387687546456");
BigInteger b = new BigInteger("98347598375689758967756458678976893478967586857687569874");
System.out.println( (a.compareTo(b) > 0 ? "a" : "b") + " is larger.");
If you need to subtract the numbers I'd definitely use BigInteger, but if all you need to do is compare them you could write your own method. One of your input strings is longer than the other, so barring leading zeroes that would tell you right away which is larger. If the strings are the same length you could compare them character by character in a loop to see which is larger. The String charAt and toCharArray methods give you two different ways to implement this approach.
A double can hold values this large, but only approximately: it keeps about 15 to 17 significant decimal digits, so two different large numbers may parse to the same double. If an approximate comparison is enough, you can use parsers to transform the Strings into numbers.
public static void main(String [] args) {
    String numberOne = "874986754789289867753896798679854698798789857387687546456";
    double parsedNumberOne = Double.parseDouble(numberOne);
    String numberTwo = "98347598375689758967756458678976893478967586857687569874";
    double parsedNumberTwo = Double.parseDouble(numberTwo);
    // Compare the parsed values, not the original strings.
    int result = Double.compare(parsedNumberOne, parsedNumberTwo);
    System.out.println("Result = " + result);
}
I declared the numbers explicitly instead of reading user input, but you can easily modify that to fit your code. Double.compare returns a positive value if the first number is larger, a negative value if it is smaller, and 0 if the two doubles are equal.
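Just comparing the two strings using compareTo() is enough, provided both strings have the same number of digits and no sign or leading zeros. The comparison is lexicographic, based on each character's Unicode (ASCII) value, so for strings of different lengths it can give the wrong answer (for example, "9" compares greater than "10").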
String number1 = "874986754789289867753896798679854698798789857387687546457";
String number2 = "874986754789289867753896798679854698798789857387687546456";
int result = number1.compareTo(number2);
if (result == 0) {
System.out.println("Both are equal");
} else if (result > 0) {
System.out.println("Number1 is greater");
} else {
System.out.println("Number2 is greater");
}
If it is a school assignment in which you can't use built-in solutions and need to write your own method to compare such strings, then you need to:
check the sign of the numbers (if one is negative and one is positive, the answer is obvious)
remove leading zeroes - 0000123 represents 123
after removing leading zeroes, compare the lengths of the numbers (among positive numbers, the longer one is larger)
if both numbers have the same length, iterate over them from left to right until you find two different digits (if you don't find any, the numbers are equal).
Members of String class you may need
toCharArray()
charAt(int index)
length()
If you can use built-in solutions, you can use these strings to create BigInteger instances and use the compareTo method, which tells you whether one number is larger than (positive result), equal to (0), or smaller than (negative result) the other.
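The hand-written steps above could look roughly like this. It is a sketch for non-negative decimal strings only (sign handling is left out), and the class and method names are hypothetical:

```java
class DecimalStringCompare {
    // Returns a negative, zero, or positive value, like compareTo.
    static int compareUnsigned(String a, String b) {
        String x = stripLeadingZeros(a);
        String y = stripLeadingZeros(b);
        // A longer number (without leading zeros) is always larger.
        if (x.length() != y.length()) {
            return Integer.compare(x.length(), y.length());
        }
        // Same length: the first differing digit from the left decides.
        for (int i = 0; i < x.length(); i++) {
            char cx = x.charAt(i);
            char cy = y.charAt(i);
            if (cx != cy) {
                return cx < cy ? -1 : 1;
            }
        }
        return 0;
    }

    // Drops leading zeros but keeps at least one digit, so "000" becomes "0".
    static String stripLeadingZeros(String s) {
        int i = 0;
        while (i < s.length() - 1 && s.charAt(i) == '0') {
            i++;
        }
        return s.substring(i);
    }
}
```

To output the larger number as a string, return a when the result is positive and b otherwise.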
You need to compare characters (converted to numbers) one by one (or group by group) from left to right, starting at the most significant digit, once you know both strings have the same length. Use methods like charAt(int) or substring(int, int) respectively.
I need to generate a unique integer id for a string.
Reason:
I have a database application that can run on different databases. These databases contain parameters with parameter types that are generated from external XML data.
The current situation is that I use the ordinal number of the Enum. But when a parameter is inserted or removed, the ordinals get mixed up:
(FOOD = 0 , TOYS = 1) <--> (FOOD = 0, NONFOOD = 1, TOYS = 2)
The number of parameter types is between 200 and 2000, so I am a bit scared of using hashCode() for a string.
P.S.: I am using Java.
Thanks a lot
I would use a mapping table in the database to map these Strings to an auto-increment value. This mapping should then be cached in the application.
Use a cryptographic hash. MD5 would probably be sufficient and relatively fast. It will be unique enough for your set of input.
How can I generate an MD5 hash?
The only problem is that the hash is 128 bits, so a standard 64-bit integer won't hold it.
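A minimal sketch of turning the MD5 digest into a number, assuming a BigInteger is an acceptable ID type since the 128 bits won't fit in a long. The class and method names are made up; MessageDigest is the standard JDK API:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5Id {
    // Returns the 128-bit MD5 digest of s as a non-negative BigInteger.
    static BigInteger md5AsInteger(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // signum 1: treat the bytes as unsigned
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

Collisions are still possible in principle; for a few thousand parameter names they are just extremely unlikely.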
If you need to be absolutely certain that the IDs are unique (no collisions), your strings are up to 32 chars long, and your number must have no more than 10 digits (approx. 32 bits), you obviously cannot do it with a one-way function id=F(string).
The natural way is to keep some mapping of the string to unique numbers (typically a sequence), either in the DB or in the application.
If you know the type of string values (length, letter patterns), you can count the total number of strings in this set and if it fits within 32 bits, the count function is your integer value.
Otherwise, the string itself is your integer value (integer in math terms, not Java).
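By Enum, do you mean a Java enum? Then you could give each enum value a unique int yourself instead of using its ordinal number:
public enum MyEnum {
    FOOD(0),
    TOYS(1); // the constant list must end with a semicolon before any members
    private final int id;
    private MyEnum(int id)
    {
        this.id = id;
    }
    // Stable ID that does not change when constants are added or reordered.
    public int getId()
    {
        return id;
    }
}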
I came across this post that's sensible: How to convert string to unique identifier in Java
In it the author describes his implementation:
public static long longHash(String string) {
long h = 98764321261L;
int l = string.length();
char[] chars = string.toCharArray();
for (int i = 0; i < l; i++) {
h = 31*h + chars[i];
}
return h;
}
I'm currently looking at a simple programming problem that might be fun to optimize - at least for anybody who believes that programming is art :) So here is it:
How to best represent longs as Strings while keeping their natural order?
Additionally, the String representation should match ^[A-Za-z0-9]+$. (I'm not too strict here, but avoid using control characters or anything that might cause headaches with encodings, is illegal in XML, has line breaks, or similar characters that will certainly cause problems)
Here's a JUnit test case:
@Test
public void longConversion() {
final long[] longs = { Long.MIN_VALUE, Long.MAX_VALUE, -5664572164553633853L,
-8089688774612278460L, 7275969614015446693L, 6698053890185294393L,
734107703014507538L, -350843201400906614L, -4760869192643699168L,
-2113787362183747885L, -5933876587372268970L, -7214749093842310327L, };
// keep it reproducible
//Collections.shuffle(Arrays.asList(longs));
final String[] strings = new String[longs.length];
for (int i = 0; i < longs.length; i++) {
strings[i] = Converter.convertLong(longs[i]);
}
// Note: Comparator is not an option
Arrays.sort(longs);
Arrays.sort(strings);
final Pattern allowed = Pattern.compile("^[A-Za-z0-9]+$");
for (int i = 0; i < longs.length; i++) {
assertTrue("string: " + strings[i], allowed.matcher(strings[i]).matches());
assertEquals("string: " + strings[i], longs[i], Converter.parseLong(strings[i]));
}
}
and here are the methods I'm looking for
public static class Converter {
public static String convertLong(final long value) {
// TODO
}
public static long parseLong(final String value) {
// TODO
}
}
I already have some ideas on how to approach this problem. Still, I thought I might get some nice (creative) suggestions from the community.
Additionally, it would be nice if this conversion would be
as short as possible
easy to implement in other languages
EDIT: I'm quite glad to see that two very reputable programmers ran into the same problem as I did: using '-' for negative numbers can't work as the '-' doesn't reverse the order of sorting:
-0001
-0002
0000
0001
0002
Ok, take two:
class Converter {
public static String convertLong(final long value) {
return String.format("%016x", value - Long.MIN_VALUE);
}
public static long parseLong(final String value) {
String first = value.substring(0, 8);
String second = value.substring(8);
long temp = (Long.parseLong(first, 16) << 32) | Long.parseLong(second, 16);
return temp + Long.MIN_VALUE;
}
}
This one takes a little explanation. Firstly, let me demonstrate that it is reversible and the resultant conversions should demonstrate the ordering:
for (long aLong : longs) {
String out = Converter.convertLong(aLong);
System.out.printf("%20d %16s %20d\n", aLong, out, Converter.parseLong(out));
}
Output:
-9223372036854775808 0000000000000000 -9223372036854775808
9223372036854775807 ffffffffffffffff 9223372036854775807
-5664572164553633853 316365a0e7370fc3 -5664572164553633853
-8089688774612278460 0fbba6eba5c52344 -8089688774612278460
7275969614015446693 e4f96fd06fed3ea5 7275969614015446693
6698053890185294393 dcf444867aeaf239 6698053890185294393
734107703014507538 8a301311010ec412 734107703014507538
-350843201400906614 7b218df798a35c8a -350843201400906614
-4760869192643699168 3dedfeb1865f1e20 -4760869192643699168
-2113787362183747885 62aa5197ea53e6d3 -2113787362183747885
-5933876587372268970 2da6a2aeccab3256 -5933876587372268970
-7214749093842310327 1be00fecadf52b49 -7214749093842310327
As you can see Long.MIN_VALUE and Long.MAX_VALUE (the first two rows) are correct and the other values basically fall in line.
What is this doing?
Assuming signed byte values you have:
-128 => 0x80
-1 => 0xFF
0 => 0x00
1 => 0x01
127 => 0x7F
Now if you add 0x80 to those values you get:
-128 => 0x00
-1 => 0x7F
0 => 0x80
1 => 0x81
127 => 0xFF
which is the correct order (with overflow).
Basically the above is doing that with 64 bit signed longs instead of 8 bit signed bytes.
The conversion back is a little more roundabout. You might think you can use:
return Long.parseLong(value, 16);
but you can't. Pass in 16 f's to that function (-1) and it will throw an exception. It seems to be treating that as an unsigned hex value, which long cannot accommodate. So instead I split it in half and parse each piece, combining them together, left-shifting the first half by 32 bits.
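On Java 8 and later, Long.parseUnsignedLong avoids the split-and-shift workaround. A sketch, assuming Java 8+ (the class name OrderedHex is hypothetical); XOR with Long.MIN_VALUE flips the sign bit, which gives the same result as subtracting Long.MIN_VALUE with overflow:

```java
class OrderedHex {
    // Maps Long.MIN_VALUE..Long.MAX_VALUE onto 0000000000000000..ffffffffffffffff.
    static String convertLong(long value) {
        return String.format("%016x", value ^ Long.MIN_VALUE);
    }

    // Reads the 16 hex digits as an unsigned 64-bit value, then flips the sign bit back.
    static long parseLong(String text) {
        return Long.parseUnsignedLong(text, 16) ^ Long.MIN_VALUE;
    }
}
```

The output strings are the same as in the listing above, so the ordering argument carries over unchanged.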
EDIT: Okay, so just adding the negative sign for negative numbers doesn't work... but you could convert the value into an effectively "unsigned" long such that Long.MIN_VALUE maps to "0000000000000000", and Long.MAX_VALUE maps to "FFFFFFFFFFFFFFFF". Harder to read, but will get the right results.
Basically you just need to add 2^63 to the value before turning it into hex - but that may be a slight pain to do in Java due to it not having unsigned longs... it may be easiest to do using BigInteger:
private static final BigInteger OFFSET = BigInteger.valueOf(Long.MIN_VALUE)
.negate();
public static String convertLong(long value) {
BigInteger afterOffset = BigInteger.valueOf(value).add(OFFSET);
return String.format("%016x", afterOffset);
}
public static long parseLong(String text) {
BigInteger beforeOffset = new BigInteger(text, 16);
return beforeOffset.subtract(OFFSET).longValue();
}
That wouldn't be terribly efficient, admittedly, but it works with all your test cases.
If you don't need a printable String, you can encode the long in four chars after you've shifted the value by Long.MIN_VALUE (-0x8000000000000000) to emulate an unsigned long:
public static String convertLong(long value) {
value += Long.MIN_VALUE;
return "" +
(char)(value>>48) + (char)(value>>32) +
(char)(value>>16) + (char)value;
}
public static long parseLong(String value) {
return (
(((long)value.charAt(0))<<48) +
(((long)value.charAt(1))<<32) +
(((long)value.charAt(2))<<16) +
(long)value.charAt(3)) + Long.MIN_VALUE;
}
Usage of surrogate pairs is not a problem, since the natural order of a string is defined by the UTF-16 values in its chars and not by the Unicode code point values.
There's a technique in RFC2550 -- an April 1st joke RFC about the Y10K problem with 4-digit dates -- that could be applied to this purpose. Essentially, each time the integer's string representation grows to require another digit, another letter or other (printable) character is prepended to retain desired sort-order. The negative rules are more arcane, yielding strings that are harder to read at a glance... but still easy enough to apply in code.
Nicely, for positive numbers, they're still readable.
See:
http://www.faqs.org/rfcs/rfc2550.html
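A simplified sketch in the spirit of that length-prefix idea, not the RFC's actual scheme: it handles non-negative longs only and prepends one letter encoding the digit count ('A' for 1 digit up to 'S' for 19), so shorter numbers sort first. All names are hypothetical:

```java
class LengthPrefixed {
    // Prepends a letter for the number of decimal digits, e.g. 5 -> "A5", 10 -> "B10".
    static String convert(long value) {
        if (value < 0) throw new IllegalArgumentException("non-negative values only");
        String digits = Long.toString(value);
        char prefix = (char) ('A' + digits.length() - 1);
        return prefix + digits;
    }

    // Drops the length letter and parses the remaining decimal digits.
    static long parse(String text) {
        return Long.parseLong(text.substring(1));
    }
}
```

The result matches ^[A-Za-z0-9]+$ and stays readable, but negative numbers would need the more arcane rules mentioned above.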