How to hash a string to integer with a range in Java

Suppose I have n strings and want to map each one to an integer in the range 0 to n-1, using a function that takes the string and n and always returns the same, unique value for a given string. For example, with the 4 strings "str1", "str2", "str3", "str4", the mapping would cover 0-3 with no duplicates.
I tried str.hashCode() % n. This gives a stable mapping, but the result is not always within 0 to n-1 (hashCode() can be negative, and then so is the remainder). I found something similar for PHP here:
https://madcoda.com/2014/04/how-to-hash-a-string-to-integer-with-a-range-php/

For the record
In Java, hash random strings down to an integer:
Math.abs(str.hashCode() % 7)
Result will be 0 inclusive to 6 inclusive.
Note:
If the input strings are really random and the same length etc (for example ... the input is a whole lot of uuids) then the output here will be randomly balanced.
If the input is - say - many human names, it is very unlikely the output will be randomly balanced.
Note:
What the OP was asking literally in the headline is answered here.
In fact, what the OP was asking about (in the body) has no connection at all to hashing. (That would just be a lookup table, a regex or such.)
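A minimal sketch of both readings of the question follows. The class and method names (StringIndexer, bucket, indexFor) are made up for illustration. bucket uses Math.floorMod (Java 8+), which always returns a non-negative result for a positive n; like Math.abs(str.hashCode() % n), it is stable but not unique, since different strings can share a bucket. indexFor is the lookup-table idea: it hands out 0, 1, 2, ... in order of first appearance, which is unique but only works if the table is kept around.

```java
import java.util.HashMap;
import java.util.Map;

class StringIndexer {
    // Stable but NOT unique: maps any string into 0..n-1 (n must be positive).
    static int bucket(String s, int n) {
        return Math.floorMod(s.hashCode(), n);
    }

    // Lookup table: remembers the index given to each string seen so far.
    private final Map<String, Integer> indexOf = new HashMap<>();

    // Stable AND unique: the first new string gets 0, the next 1, and so on.
    int indexFor(String s) {
        Integer index = indexOf.get(s);
        if (index == null) {
            index = indexOf.size();
            indexOf.put(s, index);
        }
        return index;
    }
}
```

Note that Math.abs(str.hashCode()) % n, with the abs applied first, is not a safe alternative: Math.abs(Integer.MIN_VALUE) is still negative.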

Related

How `Projections.max("registrationNo")` will work when registrationNo is String

I have an entity called StudentDetails with a field RegistrationNo of type String. My client follows a pattern like xxxxx-xxxx: (number in sequence 00000 to 10000)-(current year number). To store RegistrationNo in this pattern we declared it as String. Every time a new student joins, we have to increment the sequence number and store it.
Without realising that it is stored as a String, I tried to fetch the last number using Projections.max("registrationNo"). Luckily it returned the max number, though I don't know how. But the problem came back when the sequence number reached 6 digits, like xxxxxx-xxxx: Projections.max("registrationNo") does not return the 6-digit number. It only returns the max of the 5-digit numbers.
Why does the projection return the max of the 5-digit numbers but not the 6-digit number?
By the way, I solved the problem by using the id of the record to find the last RegistrationNo. But it still puzzles me how Projections.max("registrationNo") worked for some time.

As already stated in the comments the problem is most likely a string comparison, i.e. calling max() on a property of type string will result in the "maximum" string value being returned, even if those strings represent numbers.
A string comparison is normally done by comparing characters from start to end until there is a difference or the end of one input string is reached (in which case the longer string would be the greater one).
Thus as long as your sequence numbers are of equal length it should work since comparing 10000 and 00001 will result in the "correct" characters being compared.
However, once the string lengths differ, a plain string comparison no longer works, since characters at the same position no longer represent the same decimal place. Hence comparing 98765 to 123456 will result in 98765 being greater, since the first characters to be compared are 9 and 1, and 1 < 9 almost all of the time (unless you changed that somehow).
To solve this you can take a couple of routes which depend on your environment and goals:
store the sequence number in a separate numeric property
make the sequence strings longer right from the start, i.e. allow for a bigger range
add a specialized comparator in the code or the database (just as a hint, I'd have to look up how to do it)
From a performance point of view I'd probably take the first route.
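To make the difference concrete, here is a small sketch in plain Java (not Hibernate). The class and method names are hypothetical, and it assumes the registration number always contains a '-' after the sequence part, as in the pattern described in the question.

```java
class RegistrationNumbers {
    // Extracts the numeric sequence part before the dash, e.g. "98765-2014" -> 98765.
    static int sequenceOf(String registrationNo) {
        int dash = registrationNo.indexOf('-');
        return Integer.parseInt(registrationNo.substring(0, dash));
    }

    // Character-by-character comparison, which is what a max() over strings relies on.
    static int compareAsStrings(String a, String b) {
        return a.compareTo(b);
    }

    // Comparison by the numeric value of the sequence part.
    static int compareBySequence(String a, String b) {
        return Integer.compare(sequenceOf(a), sequenceOf(b));
    }
}
```

With "98765-2014" and "123456-2014", compareAsStrings reports the first as greater (because '9' > '1'), while compareBySequence reports it as smaller. Two 5-digit values such as "10000-2014" and "00001-2014" are ordered the same way by both methods.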

Find if a permutation (using + and -) on a string of integers matches a number

Basically what I am doing is taking a string of integers (e.g. "1234"), and I am able to insert a + or - anywhere in this string, as much or as little as I want. For example, I can do "1 + 2 + 3 + 4", "12 + 34", "123 - 4", etc. It is required to use all integers of the string; I cannot exclude any.
What I am trying to do is take another array of integers, and find if it was possible to get that number using the permutations mentioned in the first paragraph. I am somewhat lost on where to start looking for this. I could possibly create a recursive loop function to create every possible combination of the string and see if each result matches but this seems like it will be terribly slow. Another thought was to index them into an array - that way I could simply look up the answers after calculating them once.
Anyone have any suggestions?

I could possibly create a recursive loop function to create every possible combination of the string and see if each result matches but this seems like it will be terribly slow.
Doing an exhaustive search is your only option here. Fortunately, the timing isn't going to be too bad even for moderately long strings of up to 7..10 characters, because you do not need to "redo" additions and subtractions of a prior string when you process the "tail".
An outline of a possible implementation could be as follows:
Put all desired results from your array of integers in a hash set
Make a recursive method that takes the result so far, the string, and the position of the next "cut"
When the next "cut" is at the end of the string, check the result so far against the hash set from step 1
Otherwise, try these two possibilities in a loop on k (the number of digits taken after the "cut"):
Use a k-digit number from the "cut" as a positive number, and make a recursive invocation with the "cut" moved by k digits. This is equivalent to inserting a + at the cut
Use a k-digit number from the "cut" as a negative number, and make a recursive invocation with the "cut" moved by k digits. This is equivalent to inserting a - at the cut
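The outline above could be sketched roughly as follows. This is one possible reading, not a definitive implementation: the names SignInsertion, reachesAny and search are made up, the first number is taken without a leading sign, and the digit string is assumed to be short enough (at most 18 digits) for every value to fit in a long.

```java
import java.util.Set;

class SignInsertion {
    // Returns true if some way of inserting + and - into digits yields a value in targets.
    static boolean reachesAny(String digits, Set<Long> targets) {
        // Choose the first number (k digits, no sign in front of it).
        for (int k = 1; k <= digits.length(); k++) {
            long first = Long.parseLong(digits.substring(0, k));
            if (search(digits, k, first, targets)) {
                return true;
            }
        }
        return false;
    }

    // soFar is the value of everything before position cut.
    private static boolean search(String digits, int cut, long soFar, Set<Long> targets) {
        if (cut == digits.length()) {
            return targets.contains(soFar);   // whole string consumed: check the hash set
        }
        for (int k = 1; cut + k <= digits.length(); k++) {
            long term = Long.parseLong(digits.substring(cut, cut + k));
            // Inserting + at the cut, then inserting - at the cut.
            if (search(digits, cut + k, soFar + term, targets)
                    || search(digits, cut + k, soFar - term, targets)) {
                return true;
            }
        }
        return false;
    }
}
```

Each prefix result is computed once and passed down, so the "tail" never redoes the additions and subtractions of the part before the cut.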

I'll give some starting help, with the approach for such a solution:
formal problem statement;
data model;
algorithm;
heuristics, cleverness.
For N digits there are some 3^N possibilities.
The solution must model the running data as:
the digits, as int[]
the sum
index from which to advance, prior digits were done.
the number part already tried, plus its sign. The sign must be kept separately (as -1 or +1), as the coming digit may be 0;
(What I leave out is the collecting of the entire result.)
The brute force solution then could be:
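boolean solve(int[] digits, int sum) {
    return solve(digits, sum, 1, 0, 0);
}

// signum * part is the number currently being built; sum is what the rest must still add up to.
boolean solve(int[] digits, int sum, int signum, int part, int index) {
    if (index >= digits.length) {
        return signum * part == sum;
    }
    // Before the digit at index do either nothing, +, or -.
    // With + or -, the finished part is subtracted from sum and the
    // digit at index starts the next part (it must not be dropped).
    return solve(digits, sum, signum, part * 10 + digits[index], index + 1)
        || solve(digits, sum - signum * part, 1, digits[index], index + 1)
        || solve(digits, sum - signum * part, -1, digits[index], index + 1);
}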
Mind that you could also split the digits in half and try to insert (nothing, +, -) there.
There are pruning opportunities to reduce the number of tries. First, the above can be done in a loop, and not all alternatives need to be tried. The order of evaluation might favor more likely candidates:
if digit 0 ...
if part > sum first - then +
...
Unfortunately, as far as I know, the mix of + and - makes a number-theoretic approach illusory.
@dasblinkenlight mentions even better data models that avoid repeating the evaluation in the alternatives. That would be even more interesting, but might fail miserably due to time constraints. And I wanted to come up with something concrete, without providing an entirely ready-made solution.

It is reasonable to take a brute force approach if you can rely on the input string not to be too long. If it contains n digits then you can construct 3^(n-1) formulae from it (between each pair of digits you can insert '+', '-', or nothing, for n-1 internal positions). For a 12-digit input string that's 3^11 = 177147 formulae, which should be computable quite quickly. Of course, you would build and compute each one once, and compare the result to all the alternatives. Don't redo the computation for each array element.
It may be that there's a dynamic programming approach to this, but I'm not immediately seeing it, at least not one that would be substantially better than brute force.

Handling large numbers

I have this problem:
A positive integer is called a palindrome if its representation in the decimal system is the same when read from left to right and from right to left. For a given positive integer K of not more than 1000000 digits, write the value of the smallest palindrome larger than K to output. Numbers are always displayed without leading zeros.
Input
The first line contains integer t, the number of test cases. Integers K are given in the next t lines.
Output
For each K, output the smallest palindrome larger than K.
Example
Input:
2
808
2133
Output:
818
2222
My code converts the input to a string and evaluates either end of the string making adjustments accordingly and moves inwards. However, the problem requires that it can take values up to 10^6 digits long, if I try to parse large numbers I get a number format exception i.e.
Integer.parseInt(LARGENUMBER);
or
Long.parseLong(LARGENUMBER);
and LARGENUMBER is out of range. Can anyone think of a workaround, or of how to handle such large numbers?

You could probably use the BigInteger class to handle large integers like this.
However, I wouldn't count on it being efficient at such massive sizes, because (at least in older JDKs) it uses O(n^2) algorithms for multiplication and for conversion to and from decimal strings.

Think of your steps that you do now. Do you see something that seems a little superfluous since you're converting the number to a string to process it?

While this problem talks about integers, it's doing so only to restrict the input and output characters and format. This is really a string-manipulation question that needs some care. Since this is the case, you don't need to read the input in as integers at all, only as strings.
This will make validating the palindrome simple. The only thing you should need to work out is choosing the next higher one.
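One way to choose the next higher palindrome purely with string operations is sketched below. This is an illustration of the idea rather than the asker's code; the class and method names are hypothetical, and the input is assumed to be a non-empty string of decimal digits without leading zeros.

```java
class NextPalindrome {
    static String next(String k) {
        int n = k.length();

        // Special case: 9, 99, 999, ... -> 11, 101, 1001, ...
        boolean allNines = true;
        for (int i = 0; i < n; i++) {
            if (k.charAt(i) != '9') { allNines = false; break; }
        }
        if (allNines) {
            StringBuilder sb = new StringBuilder("1");
            for (int i = 1; i < n; i++) sb.append('0');
            return sb.append('1').toString();
        }

        // Mirror the left half onto the right half.
        char[] p = k.toCharArray();
        for (int i = 0; i < n / 2; i++) p[n - 1 - i] = p[i];
        String mirrored = new String(p);
        // Same length, so lexicographic order equals numeric order.
        if (mirrored.compareTo(k) > 0) return mirrored;

        // Otherwise add 1 to the left half (including the middle digit),
        // carrying leftwards, then mirror again.
        int i = (n - 1) / 2;
        while (p[i] == '9') { p[i] = '0'; i--; }
        p[i]++;
        for (int j = 0; j < n / 2; j++) p[n - 1 - j] = p[j];
        return new String(p);
    }
}
```

For the sample input, next("808") gives "818" and next("2133") gives "2222". The work is linear in the number of digits, so a million-digit K never needs to be parsed as a number.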

how can i generate a unique int from a unique string?

I have an object with a String that holds a unique id
(such as "ocx7gf" or "67hfs8").
I need to supply it with an implementation of int hashCode() which will be unique, obviously.
How do I convert a string to a unique int in the easiest/fastest way?
Thanks.
Edit - OK. I already know that String.hashCode is possible, but I don't see it recommended anywhere. And if no other method is recommended either, should I use it or not, given that my object is in a collection and I need the hash code? Should I concatenate it with another string to make it work better?

No, you don't "obviously" need an implementation that returns a unique value; if that were required, the majority of implementations would be broken.
What you want to do, is to have a good spread across bits, especially for common values (if any values are more common than others). Barring special knowledge of your format, then just using the hashcode of the string itself would be best.
With special knowledge of the limits of your id format, it may be possible to customise and result in better performance, though false assumptions are more likely to make things worse than better.
Edit: On good spread of bits.
As stated here and in other answers, being completely unique is impossible and hash collisions are possible. Hash-using methods know this and can deal with it, but it does impact upon performance, so we want collisions to be rare.
Further, hashes are generally re-hashed, so our 32-bit number may end up being reduced to e.g. one in the range 0 to 22, and we want as good a distribution within that range as possible too.
We also want to balance this with not taking so long to compute our hash, that it becomes a bottleneck in itself. An imperfect balancing act.
A classic example of a bad hash method is one for a co-ordinate pair of X, Y ints that does:
return X ^ Y;
While this does a perfectly good job of returning 2^32 possible values out of the 4^32 possible inputs, in real world use it's quite common to have sets of coordinates where X and Y are equal ({0, 0}, {1, 1}, {2, 2} and so on) which all hash to zero, or matching pairs ({2,3} and {3, 2}) which will hash to the same number. We are likely better served by:
return ((X << 16) | (X >>> 16)) ^ Y;
Now, there are just as many possible values for which this is dreadful as for the former, but it tends to serve better in real-world cases.
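The two coordinate hashes can be compared side by side in a small sketch. The class and method names are hypothetical, and Integer.rotateLeft(x, 16) is used here as the Java spelling of "swap the upper and lower 16 bits of X".

```java
class CoordinateHash {
    // The "bad" hash: equal coordinates and swapped pairs all collide.
    static int xorHash(int x, int y) {
        return x ^ y;
    }

    // Rotating x by 16 bits first moves its low bits away from y's low bits.
    static int rotatedHash(int x, int y) {
        return Integer.rotateLeft(x, 16) ^ y;
    }
}
```

With xorHash, the points {1, 1} and {2, 2} both hash to 0, and {2, 3} and {3, 2} both hash to 1. With rotatedHash, those four points get four different values.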
Of course, there is a different job if you are writing a general-purpose class (no idea what possible inputs there are) or have a better idea of the purpose at hand. For example, if I was using Date objects but knew that they would all be dates only (time part always midnight) and only within a few years of each other, then I might prefer a custom hash code that used only the day, month and lower-digits of the years, over the standard one. The writer of Date though can't work on such knowledge and has to try to cater for everyone.
Hence, If I for instance knew that a given string is always going to consist of 6 case-insensitive characters in the range [a-z] or [0-9] (which yours seem to, but it isn't clear from your question that it does) then I might use an algorithm that assigned a value from 0 to 35 (the 36 possible values for each character) to each character, and then walk through the string, each time multiplying the current value by 36 and adding the value of the next char.
Assuming a good spread in the ids, this would be the way to go, especially if I made the order such that the lower-significant digits in my hash matched the most frequently changing char in the id (if such a call could be made), hence surviving re-hashing to a smaller range well.
However, lacking such knowledge of the format for sure, I can't make that call with certainty, and I could well be making things worse (slower algorithm for little or even negative gain in hash quality).
One advantage you have is that since it's an ID in itself, then presumably no other non-equal object has the same ID, and hence no other properties need be examined. This doesn't always hold.
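As an illustration of the base-36 idea described above, a key class might look like the sketch below. The class name IdKey is made up, and the format assumption (only letters and digits) is exactly the kind of assumption the answer warns about, so the sketch falls back to String.hashCode() whenever a character does not fit.

```java
class IdKey {
    private final String id;

    IdKey(String id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object other) {
        // The id alone decides equality; no other properties are examined.
        return other instanceof IdKey && ((IdKey) other).id.equals(id);
    }

    @Override
    public int hashCode() {
        int hash = 0;
        for (int i = 0; i < id.length(); i++) {
            // Character.digit maps 0-9 and a-z (either case) to 0..35, anything else to -1.
            int value = Character.digit(id.charAt(i), 36);
            if (value < 0) {
                return id.hashCode();   // format assumption violated: use the default
            }
            hash = hash * 36 + value;   // may overflow and wrap, which is fine for a hash
        }
        return hash;
    }
}
```

Whether this beats simply returning id.hashCode() depends on the real ids, so it would be worth measuring before adopting it.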

You can't get a unique integer from a String of unlimited length. There are 4 billionish (2^32) unique integers, but an almost infinite number of unique strings.
String.hashCode() will not give you unique integers, but it will do its best to give you differing results based on the input string.
EDIT
Your edited question says that String.hashCode() is not recommended. This is not true, it is recommended, unless you have some special reason not to use it. If you do have a special reason, please provide details.

Looks like you've got a base-36 number there (a-z + 0-9). Why not convert it to an int using Integer.parseInt(s, 36)? Obviously, if there are too many unique IDs, it won't fit into an int, but in that case you're out of luck with unique integers and will need to get by using String.hashCode(), which does its best to be close to unique.
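A small sketch of this suggestion, with a hypothetical helper name. Integer.parseInt(s, 36) throws NumberFormatException when the value does not fit into an int (six base-36 characters can slightly exceed Integer.MAX_VALUE) or when a character is not a base-36 digit, so the sketch falls back to String.hashCode() in that case, at which point the result is no longer guaranteed to be unique.

```java
class Base36Id {
    // Unique for every id that parses as a base-36 int; otherwise only a hash.
    static int toInt(String id) {
        try {
            return Integer.parseInt(id, 36);
        } catch (NumberFormatException e) {
            return id.hashCode();
        }
    }
}
```

Note that parseInt treats upper and lower case letters the same, so "ocx7gf" and "OCX7GF" map to the same int.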

Unless your strings are limited in some way or your integers hold more bits than the strings you're trying to convert, you cannot guarantee the uniqueness.
Let's say you have a 32 bit integer and a 64-character character set for your strings. That means six bits per character. That will allow you to store five characters into an integer. More than that and it won't fit.

Represent each character of the string by a five-bit binary code, e.g. a by 00001, b by 00010, etc., so that 32 combinations are possible. For example, cat would be written as 00011 00001 10100. Then convert this binary number into decimal, which here gives 3124, so cat would be 3124. Similarly, you can get cat back from 3124 by converting it to binary first and mapping each five-bit group back to its character.
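A sketch of this five-bit packing, with made-up names. It assumes lowercase letters a-z only (a = 1 ... z = 26) and at most six characters, because six groups of five bits are all that fit into a 32-bit int.

```java
class FiveBitPacker {
    // Packs up to six lowercase letters into an int, five bits per letter.
    static int pack(String s) {
        if (s.length() > 6) {
            throw new IllegalArgumentException("at most 6 letters fit into an int");
        }
        int packed = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 'a' || c > 'z') {
                throw new IllegalArgumentException("only a-z supported: " + c);
            }
            packed = (packed << 5) | (c - 'a' + 1);
        }
        return packed;
    }

    // Reverses pack by reading five bits at a time from the low end.
    static String unpack(int packed) {
        StringBuilder sb = new StringBuilder();
        while (packed != 0) {
            sb.append((char) ('a' + (packed & 31) - 1));
            packed >>>= 5;
        }
        return sb.reverse().toString();
    }
}
```

Because every letter gets its own five bits, two different strings within these limits can never produce the same int, and unpack(pack(s)) returns s.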

One way to do it is to assign each letter a value, and each place of the string its own multiplier, i.e. a = 1, b = 2, and so on. Then everything in the first position (read left to right) would be multiplied by a prime number, the next by the next prime number and so on, such that the final position is multiplied by a prime larger than the number of possible values in that position (26+1 for a space, or 52+1 with capitals, and so on for other supported characters). If the number is mapped back to the first position (leftmost character), any number you generate from a unique string, mapping back to 1 or 6 or whatever the first letter is, gives a unique value.
Dog might be 30,3(15),101(7) or 782, while God 33,3(15),101(4) or 482. More importantly than unique strings being generated they can be useful in generation if the original digit is kept, like 30(782) would be unique to some 12(782) for the purposes of differentiating like strings if you ever managed to go over the unique possibilities. Dog would always be Dog, but it would never be Cat or Mouse.

Another question, this time regarding breaking a string down for validity

Thanks a bunch for the tip on the static to all of you folks who answered! Feeling a little less frustrated now.
I am not going to ask questions step by step through my whole assignment, but I want to make sure that this is the way to go about one of the next tasks. I have written the following, which compiles fine (the purpose is to check the string to make sure that it is numeric, and the user may have also entered the ISBN as a number with or without dashes):
private String validateISBN(String bookNum)
{
    // Collect only the digit characters, skipping dashes
    char[] book = new char[bookNum.length()];
    int j = 0;
    for (int i = 0; i < bookNum.length(); i++) {
        if (Character.isDigit(bookNum.charAt(i))) {
            book[j] = bookNum.charAt(i);
            j++;
        }
    }
I haven't written the next part, which has to allow for an X as the last digit in the string (which is apparently how ISBN numbers work). I would assume that if the above is correct (or close), all I need to do is check that the tenth character (index 9) is a digit or an X, by writing something like:
    if (Character.isDigit(book[9]) || book[9] == 'x' || book[9] == 'X')
Is that about right (ISBN numbers are always 10 numbers or 9 numbers and an X at the end)?

The last digit of ISBN-10 is the check digit. Since your method is supposed to check if the entered ISBN is correct, you have to calculate the check digit on your own and compare it to the given one (plus making sure all characters are digits).
If you don't know what this means, read the section on validating ISBN-10 at http://en.wikipedia.org/wiki/International_Standard_Book_Number#Check_digits
The 2001 edition of the official manual of the International ISBN Agency says that the ISBN-10 check digit (which is the last digit of the ten-digit ISBN) must range from 0 to 10 (the symbol X is used instead of 10) and must be such that the sum of all the ten digits, each multiplied by the integer weight, descending from 10 to 1, is a multiple of the number 11. Modular arithmetic is convenient for calculating the check digit using modulus 11. Each of the first nine digits of the ten-digit ISBN (excluding the check digit itself) is multiplied by a number in a sequence from 10 to 2, and the remainder of the sum, with respect to 11, is computed. The resulting remainder, plus the check digit, must equal either 0 or 11; therefore, the check digit is 11 minus the remainder of the sum of the products, taken modulo 11 so that a remainder of 0 gives a check digit of 0.
On a related note: there is also the 13-digit ISBN-13...
Maybe your solution might look like this:
Remove the dashes, break down the string to an int array, compute the tenth-digit (see wiki-link), if all is fine return the input string.
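The steps above might be sketched like this. The class and method names are hypothetical, and the method returns a boolean rather than the input string to keep the example short. It uses the rule quoted above: the ten digits, weighted 10 down to 1, must sum to a multiple of 11, with X (only allowed in the last position) counting as 10.

```java
class Isbn10 {
    static boolean isValid(String input) {
        String s = input.replace("-", "");        // remove the dashes
        if (s.length() != 10) {
            return false;
        }
        int sum = 0;
        for (int i = 0; i < 10; i++) {
            char c = s.charAt(i);
            int value;
            if (c >= '0' && c <= '9') {
                value = c - '0';
            } else if (i == 9 && (c == 'X' || c == 'x')) {
                value = 10;                       // X stands for 10, check digit only
            } else {
                return false;                     // any other character is invalid
            }
            sum += (10 - i) * value;              // weights 10, 9, ..., 1
        }
        return sum % 11 == 0;
    }
}
```

For example, isValid("0-306-40615-2") is true (the weighted sum is 132 = 12 * 11), while changing the last digit to 3 makes it false.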
