Check whether a string is parsable into Long without try-catch? - java

Long.parseLong("string") throws a NumberFormatException if the string is not parsable into a long.
Is there a way to validate the string faster than using try-catch?
Thanks

You could write a fairly complex regular expression, but it isn't worth it. Using exceptions here is perfectly normal.
It's a naturally exceptional situation: you assume the string holds an integer, but it actually holds something else. An exception should be thrown and handled properly.
If you look inside the parseLong code, you'll see many different checks and operations. Doing all of that yourself before parsing will decrease performance (if we are talking about parsing millions of numbers; otherwise it doesn't matter). So the only thing you can do, if you really need to improve performance by avoiding exceptions, is copy the parseLong implementation into your own function and return a failure marker (for example null from a method that returns a boxed Long, since long has no NaN value) instead of throwing in the corresponding cases.

From commons-lang StringUtils:
public static boolean isNumeric(String str) {
if (str == null) {
return false;
}
int sz = str.length();
for (int i = 0; i < sz; i++) {
if (Character.isDigit(str.charAt(i)) == false) {
return false;
}
}
return true;
}

You could do something like
if(s.matches("\\d+")){
}
This uses a regular expression to check that String s consists only of digits (\\d+ rather than \\d*, so that the empty string is rejected).
But what do you stand to gain? Another if condition?

org.apache.commons.lang3.math.NumberUtils.isParsable(yourString) will determine if the string can be parsed by one of: Integer.parseInt(String), Long.parseLong(String), Float.parseFloat(String) or Double.parseDouble(String)
Since you are interested in Longs you could have a condition that checks for isParsable and doesn't contain a decimal
if (NumberUtils.isParsable(yourString) && !StringUtils.contains(yourString,".")){ ...

This is a valid question because there are times when you need to infer what type of data is being represented in a string. For example, you may need to import a large CSV into a database and represent the data types accurately. In such cases, calling Long.parseLong and catching an exception can be too slow.
The following code only handles ASCII decimal:
public class LongParser {
// Since tryParseLong represents the value as negative during processing, we
// counter-intuitively want to keep the sign if the result is negative and
// negate it if it is positive.
private static final int MULTIPLIER_FOR_NEGATIVE_RESULT = 1;
private static final int MULTIPLIER_FOR_POSITIVE_RESULT = -1;
private static final int FIRST_CHARACTER_POSITION = 0;
private static final int SECOND_CHARACTER_POSITION = 1;
private static final char NEGATIVE_SIGN_CHARACTER = '-';
private static final char POSITIVE_SIGN_CHARACTER = '+';
private static final int DIGIT_MAX_VALUE = 9;
private static final int DIGIT_MIN_VALUE = 0;
private static final char ZERO_CHARACTER = '0';
private static final int RADIX = 10;
/**
* Parses a string representation of a long significantly faster than
* <code>Long.parseLong</code>, and avoids the noteworthy overhead of
* throwing an exception on failure. Based on the parseInt code from
* http://nadeausoftware.com/articles/2009/08/java_tip_how_parse_integers_quickly
*
* @param stringToParse
* The string to try to parse as a <code>long</code>.
*
* @return the boxed <code>long</code> value if the string was a valid
* representation of a long; otherwise <code>null</code>.
*/
public static Long tryParseLong(final String stringToParse) {
if (stringToParse == null || stringToParse.isEmpty()) {
return null;
}
final int inputStringLength = stringToParse.length();
long value = 0;
/*
* The absolute value of Long.MIN_VALUE is greater than the absolute
* value of Long.MAX_VALUE, so during processing we'll use a negative
* value, then we'll multiply it by signMultiplier before returning it.
* This allows us to avoid a conditional add/subtract inside the loop.
*/
int signMultiplier = MULTIPLIER_FOR_POSITIVE_RESULT;
// Get the first character.
char firstCharacter = stringToParse.charAt(FIRST_CHARACTER_POSITION);
if (firstCharacter == NEGATIVE_SIGN_CHARACTER) {
// The first character is a negative sign.
if (inputStringLength == 1) {
// There are no digits.
// The string is not a valid representation of a long value.
return null;
}
signMultiplier = MULTIPLIER_FOR_NEGATIVE_RESULT;
} else if (firstCharacter == POSITIVE_SIGN_CHARACTER) {
// The first character is a positive sign.
if (inputStringLength == 1) {
// There are no digits.
// The string is not a valid representation of a long value.
return null;
}
} else {
// Store the (negative) digit (although we aren't sure yet if it's
// actually a digit).
value = -(firstCharacter - ZERO_CHARACTER);
if (value > DIGIT_MIN_VALUE || value < -DIGIT_MAX_VALUE) {
// The first character is not a digit (or a negative sign).
// The string is not a valid representation of a long value.
return null;
}
}
// Establish the "maximum" value (actually minimum since we're working
// with negatives).
final long rangeLimit = (signMultiplier == MULTIPLIER_FOR_POSITIVE_RESULT)
? -Long.MAX_VALUE
: Long.MIN_VALUE;
// Capture the maximum value that we can multiply by the radix without
// overflowing.
final long maxLongNegatedPriorToMultiplyingByRadix = rangeLimit / RADIX;
for (int currentCharacterPosition = SECOND_CHARACTER_POSITION;
currentCharacterPosition < inputStringLength;
currentCharacterPosition++) {
// Get the current digit (although we aren't sure yet if it's
// actually a digit).
long digit = stringToParse.charAt(currentCharacterPosition)
- ZERO_CHARACTER;
if (digit < DIGIT_MIN_VALUE || digit > DIGIT_MAX_VALUE) {
// The current character is not a digit.
// The string is not a valid representation of a long value.
return null;
}
if (value < maxLongNegatedPriorToMultiplyingByRadix) {
// The value will be out of range if we multiply by the radix.
// The string is not a valid representation of a long value.
return null;
}
// Multiply by the radix to slide all the previously parsed digits.
value *= RADIX;
if (value < (rangeLimit + digit)) {
// The value would be out of range if we "added" the current
// digit.
return null;
}
// "Add" the digit to the value.
value -= digit;
}
// Return the value (adjusting the sign if needed).
return value * signMultiplier;
}
}

You can use java.util.Scanner
Scanner sc = new Scanner(s);
if (sc.hasNextLong()) {
long num = sc.nextLong();
}
This does range checking etc, too. Of course it will say that "99 bottles of beer" hasNextLong(), so if you want to make sure that it only has a long you'd have to do extra checks.
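A minimal sketch of the extra check mentioned above, assuming the whole string should be exactly one long token. The names ScannerLongCheck and parseWholeInput are made up for illustration, and Locale.ROOT is set only so that the result does not depend on the default locale. Scanner is still more lenient than Long.parseLong (for instance, it skips surrounding whitespace).

```java
import java.util.Locale;
import java.util.OptionalLong;
import java.util.Scanner;

final class ScannerLongCheck {
    // Returns the value only if the input consists of a single long token.
    static OptionalLong parseWholeInput(String s) {
        try (Scanner sc = new Scanner(s)) {
            sc.useLocale(Locale.ROOT);
            if (!sc.hasNextLong()) {
                return OptionalLong.empty();
            }
            long value = sc.nextLong();
            // Reject trailing tokens such as " bottles of beer".
            return sc.hasNext() ? OptionalLong.empty() : OptionalLong.of(value);
        }
    }
}
```

With this, "99 bottles of beer" is rejected while "42" is accepted.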

This case is common for forms and programs where you have the input field and are not sure if the string is a valid number. So using try/catch with your java function is the best thing to do if you understand how try/catch works compared to trying to write the function yourself. In order to setup the try catch block in .NET virtual machine, there is zero instructions of overhead, and it is probably the same in Java. If there are instructions used at the try keyword then these will be minimal, and the bulk of the instructions will be used at the catch part and that only happens in the rare case when the number is not valid.
So while it "seems" like you can write a faster function yourself, you would have to optimize it better than the Java compiler in order to beat the try/catch mechanism you already use, and the benefit of a more optimized function is going to be very minimal since number parsing is quite generic.
If you run timing tests with your compiler and the Java catch mechanism you already described, you will probably not notice more than a marginal slowdown, and by marginal I mean it should be almost nothing.
Get the java language specification to understand the exceptions more and you will see that using such a technique in this case is perfectly acceptable since it wraps a fairly large and complex function. Adding on those few extra instructions in the CPU for the try part is not going to be such a big deal.
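For reference, the try/catch approach can be wrapped once so callers never see the exception. This is only a sketch; the class name SafeLong and the choice of OptionalLong as the return type are assumptions, not something from the question.

```java
import java.util.OptionalLong;

final class SafeLong {
    // Wraps Long.parseLong so that invalid input yields an empty result
    // instead of a NumberFormatException.
    static OptionalLong tryParse(String s) {
        try {
            return OptionalLong.of(Long.parseLong(s));
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }
}
```

The catch branch only runs for invalid input, so the cost of the exception is paid only in the failure case.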

I think that's the only way of checking whether a String is a valid long value, but you can implement a method yourself to do that, keeping in mind the largest long value.

There are much faster ways to parse a long than Long.parseLong. If you want to see an example of a method that is not optimized then you should look at parseLong :)
Do you really need to take into account "digits" that are non-ASCII?
Do you really need to make several method calls passing around a radix even though you're probably parsing base 10?
:)
Using a regexp is not the way to go: it makes it harder to determine whether your number is too big for a long. How do you use a regexp to determine that 9223372036854775807 can be parsed to a long but that 9223372036854775907 cannot?
That said, the answer to a really fast long parsing method is a state machine, no matter whether you want to test if the string is parseable or to actually parse it. It's just not a generic state machine accepting complex regexps but a hardcoded one.
I can both write you a method that parses a long and another one that determines if a long can be parsed that totally outperforms Long.parseLong().
Now what do you want? A state testing method? In that case a state testing method may not be desirable if you want to avoid computing twice the long.
Simply wrap your call in a try/catch.
And if you really want something faster than the default Long.parseLong, write one that is tailored to your problem: base 10 if you're base 10, not checking digits outside ASCII (because you're probably not interested in Japanese's itchi-ni-yon-go etc.).

Hope this helps with the positive values. I used this method once for validating database primary keys.
private static final int MAX_LONG_STR_LEN = Long.toString(Long.MAX_VALUE).length();
public static boolean validId(final CharSequence id)
{
//avoid null
if (id == null)
{
return false;
}
int len = id.length();
//avoid empty or oversize
if (len < 1 || len > MAX_LONG_STR_LEN)
{
return false;
}
long result = 0;
// ASCII '0' at position 48
int digit = id.charAt(0) - 48;
//first char cannot be '0' in my "id" case
if (digit < 1 || digit > 9)
{
return false;
}
else
{
result += digit;
}
//start from 1, we already did the 0.
for (int i = 1; i < len; i++)
{
// ASCII '0' at position 48
digit = id.charAt(i) - 48;
//only numbers
if (digit < 0 || digit > 9)
{
return false;
}
result *= 10;
result += digit;
//if we hit 0x7fffffffffffffff
// we are at 0x8000000000000000 + digit - 1
// so negative
if (result < 0)
{
//overflow
return false;
}
}
return true;
}

Try to use this regular expression:
^(-9223372036854775808|0)$|^((-?)((?!0)\d{1,18}|[1-8]\d{18}|9[0-1]\d{17}|92[0-1]\d{16}|922[0-2]\d{15}|9223[0-2]\d{14}|92233[0-6]\d{13}|922337[0-1]\d{12}|92233720[0-2]\d{10}|922337203[0-5]\d{9}|9223372036[0-7]\d{8}|92233720368[0-4]\d{7}|922337203685[0-3]\d{6}|9223372036854[0-6]\d{5}|92233720368547[0-6]\d{4}|922337203685477[0-4]\d{3}|9223372036854775[0-7]\d{2}|922337203685477580[0-7]))$
It checks all possible numbers for Long.
Note, however, that Long.parseLong also accepts a leading +, and long literals in Java source code may contain an L suffix and _ separators. This regexp does not accept any of those forms (or leading zeros), so extend it if you need them.
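A quick way to try the expression is to compile it once and test the boundary values. This is just a usage sketch: the class name LongRegex and the method looksLikeLong are made up, and the pattern is the one above with the backslashes doubled for a Java string literal.

```java
import java.util.regex.Pattern;

final class LongRegex {
    // The regular expression from above, split across lines for readability.
    private static final Pattern LONG_PATTERN = Pattern.compile(
        "^(-9223372036854775808|0)$|^((-?)((?!0)\\d{1,18}|[1-8]\\d{18}|9[0-1]\\d{17}"
        + "|92[0-1]\\d{16}|922[0-2]\\d{15}|9223[0-2]\\d{14}|92233[0-6]\\d{13}"
        + "|922337[0-1]\\d{12}|92233720[0-2]\\d{10}|922337203[0-5]\\d{9}"
        + "|9223372036[0-7]\\d{8}|92233720368[0-4]\\d{7}|922337203685[0-3]\\d{6}"
        + "|9223372036854[0-6]\\d{5}|92233720368547[0-6]\\d{4}"
        + "|922337203685477[0-4]\\d{3}|9223372036854775[0-7]\\d{2}"
        + "|922337203685477580[0-7]))$");

    static boolean looksLikeLong(String s) {
        return s != null && LONG_PATTERN.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeLong("9223372036854775807"));  // Long.MAX_VALUE
        System.out.println(looksLikeLong("9223372036854775808"));  // one above the maximum
        System.out.println(looksLikeLong("-9223372036854775808")); // Long.MIN_VALUE
    }
}
```

The first and third values are accepted and the second is rejected, matching the range of long.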

Guava Longs.tryParse("string") returns null instead of throwing an exception if parsing fails. But this method is marked as Beta right now.

You could try using a regular expression to check the form of the string before trying to parse it?

A simple implementation to validate an integer that fits in a long would be:
public static boolean isValidLong(String str) {
if (str == null || str.isEmpty()) return false; // charAt(0) below would throw on ""
int len = str.length();
if (str.charAt(0) == '+') {
return str.matches("\\+\\d+") && (len < 20 || len == 20 && str.compareTo("+9223372036854775807") <= 0);
} else if (str.charAt(0) == '-') {
return str.matches("-\\d+") && (len < 20 || len == 20 && str.compareTo("-9223372036854775808") <= 0);
} else {
return str.matches("\\d+") && (len < 19 || len == 19 && str.compareTo("9223372036854775807") <= 0);
}
}
It doesn't handle octal, 0x prefix or so but that is seldom a requirement.
For speed, the "matches" expressions are easy to code in a loop.
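As a sketch of what such a loop could look like (the class DigitLoop and the method allAsciiDigits are hypothetical names), the following checks that everything from a given offset onward is an ASCII digit, which is what \\d+ matches in Java by default:

```java
final class DigitLoop {
    // Returns true if s has at least one character at or after 'from'
    // and all of those characters are ASCII digits.
    static boolean allAsciiDigits(String s, int from) {
        if (from >= s.length()) {
            return false; // no digits at all, e.g. "+" or ""
        }
        for (int i = from; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }
}
```

For a signed input you would call it with from = 1, for an unsigned one with from = 0.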

Related

Removing a substring from a string, repeatedly

Problem:
Remove the substring t from a string s, repeatedly and print the number of steps involved to do the same.
Explanation/Working:
For Example: t = ab, s = aabb. In the first step, we check if t is
contained within s. Here, t is contained in the middle i.e. a(ab)b.
So, we will remove it and the resultant will be ab and increment the
count value by 1. We again check if t is contained within s. Now, t is
equal to s i.e. (ab). So, we remove that from s and increment the
count. So, since t is no more contained in s, we stop and print the
count value, which is 2 in this case.
So, here's what I have tried:
Code 1:
static int maxMoves(String s, String t) {
int count = 0,i;
while(true)
{
if(s.contains(t))
{
i = s.indexOf(t);
s = s.substring(0,i) + s.substring(i + t.length());
}
else break;
++count;
}
return count;
}
I am just able to pass 9/14 test cases on Hackerrank, due to some reason (I am getting "Wrong Answer" for rest of the cases). After a while, I found out that there is something called replace() method in Java. So, I tried using that by replacing the if condition and came up with a second version of code.
Code 2:
static int maxMoves(String s, String t) {
int count = 0,i;
while(true)
{
if(s.contains(t))
s.replace(t,""); //Marked Statement
else break;
++count;
}
return count;
}
But for some reason (I don't know why), the "Marked Statement" in the above code gets executed infinitely (I noticed this when I replaced the "Marked Statement" with System.out.println(s.replace(t,""));). I don't know the reason for this.
Since, I am passing only 9/14 test cases, there must be some logical error that is leading to a "Wrong Answer". How do I overcome that if I use Code 1? And if I use Code 2, how do I avoid infinite execution of the "Marked Statement"? Or is there anyone who would like to suggest me a Code 3?
Thank you in advance :)
Try saving the new (returned) string instead of ignoring it.
s = s.replace(t,"");
replace returns a new string; you seemed to think that it alters the given string in-place.
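A small sketch to illustrate the point. Note also that replace removes every occurrence in one call, so counting calls to replace is not the same as counting single removals; the helper removeFirst below is a made-up name for removing just one occurrence.

```java
final class ReplaceDemo {
    // Removes only the first occurrence of t from s (one "step" in the question's terms).
    static String removeFirst(String s, String t) {
        int i = s.indexOf(t);
        return i < 0 ? s : s.substring(0, i) + s.substring(i + t.length());
    }

    public static void main(String[] args) {
        String s = "ababab";
        s.replace("ab", "");                               // return value ignored: s is unchanged
        System.out.println(s);                             // prints "ababab"
        System.out.println(s.replace("ab", "").isEmpty()); // prints "true": all occurrences removed at once
        System.out.println(removeFirst(s, "ab"));          // prints "abab"
    }
}
```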
Try adding some simple parameter checks of the strings. The strings shouldn't be equal to null and they should have a length greater than 0 to allow for counts greater than 0.
static int maxMoves(String s, String t) {
int count = 0,i;
if(s == null || s.length() == 0 || t == null || t.length() == 0)
return 0;
while(true)
{
if(s.contains(t) && !s.equals(""))
s = s.replace(t,""); //Marked Statement
else break;
++count;
}
return count;
}
You might be missing on the edge cases in the code 1.
In code 2, you are not storing the new string formed after the replace function.
The replace function replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence.
Try this out:
public static int findCount(String s, String t){
if( null == s || s.isEmpty() || null == t || t.isEmpty()) // compare contents, not references
return 0;
int count =0;
while(true){
if(s.contains(t)){
count++;
int i = s.indexOf(t);
s = s.substring(0, i)+s.substring(i+t.length(), s.length());
// s = s.replace(t,"");
}
else
break;
}
return count;
}
String r1="ramraviraravivimravi";
String r2="ravi";
int count=0,i;
while(r1.contains(r2))
{
count++;
i=r1.indexOf(r2);
StringBuilder s1=new StringBuilder(r1);
s1.delete(i,i+r2.length());
System.out.println(s1.toString());
r1=s1.toString();
}
System.out.println(count);
First of all, there is no logical difference between the two versions.
All the answers so far fix the error in Code 2, but none explains how to pass all (14/14) test cases.
Here is a test case where your code fails:
s = "abcabcabab";
t = "abcab";
Your answer: 1
Expected answer: 2
According to your code:
In the 1st step, removing t from index 0 of s,
s reduces to "cabab", so the count will be 1 only.
But the actual answer should be 2:
In the first step, remove t from index 3 of s;
s reduces to "abcab", count = 1.
In the 2nd step, remove t from index 0;
s reduces to "", count = 2.
So the answer is 2.
If anyone knows how to handle such cases, please let me know.
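One possible way to handle such cases, sketched under the assumption that the inputs are small: try removing every occurrence of t, recurse on each result, and keep the best count, caching results per intermediate string. The class name and the memo map are my own choices; this is exponential in the worst case and is not claimed to be the intended Hackerrank solution.

```java
import java.util.HashMap;
import java.util.Map;

final class MaxMovesSearch {
    // Cache of best move counts per intermediate string, reset for each top-level call.
    private static final Map<String, Integer> memo = new HashMap<>();

    static int maxMoves(String s, String t) {
        if (s == null || t == null || t.isEmpty()) {
            return 0;
        }
        memo.clear();
        return search(s, t);
    }

    private static int search(String s, String t) {
        Integer cached = memo.get(s);
        if (cached != null) {
            return cached;
        }
        int best = 0;
        // Try each occurrence of t as the one to remove next.
        for (int i = s.indexOf(t); i >= 0; i = s.indexOf(t, i + 1)) {
            String rest = s.substring(0, i) + s.substring(i + t.length());
            best = Math.max(best, 1 + search(rest, t));
        }
        memo.put(s, best);
        return best;
    }
}
```

For s = "abcabcabab" and t = "abcab" this explores both removal positions and returns 2.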

Java | Create an explicit addition function only using recursion and conditionals

Preface
By finding some free time in my schedule, I quested myself into improving my recursion skills (unfortunately). As practice, I want to recreate all the operators by using recursion, the first one being addition. Although I'm kind of stuck.
Question
As implied, I want to recreate the addition operator by only using recursion and conditionals. Although I got a good portion of the code done, there is still one problem as I included a single addition operator. Here is the code (which runs fine and adds as intended in all variations of positive, negative, and zero inputs). I also included some mediocre comments as help.
public class Test {
public static void main(String[] args) {
// Numbers to add
int firstNumb = -5, secondNumb = 3;
// Call the add function and save the result
int result = add(firstNumb, secondNumb);
// Print result
System.out.println(result);
}
/*
* Function recursively takes a number from 'giver' one at a time and
* "gives"/"adds" it to 'receiver'. Once nothing more to "give" (second == 0),
* then return the number that received the value, 'receiver'.
*/
public static int add(int receiver, int giver) {
/*
* Base Case since nothing more to add on. != to handle signed numbers
* instead of using > or <
*/
if (giver != 0) {
/*
* Recursive Call.
*
* The new 'giver' param is the incremental value of the number
* towards 0. Ex: -5 -> -4 , 5 -> 4 (so I guess it may decrement).
*
* The new 'receiver' param is the incremental value based on the
* opposite direction the 'giver' incremented (as to why the
* directionalIncrement() function needs both values to determine
* direction.
*/
return add(directionalIncrement(receiver, giver),
directionalIncrement(giver, -giver));
} else {
// Return 'receiver' which now contains all values from 'giver'
return receiver;
}
}
// Increments (or decrements) the 'number' based on the sign of the 'direction'
public static int directionalIncrement(int number, int direction) {
// Get incremental value (1 or -1) by dividing 'direction' by absolute
// value of 'direction'
int incrementalValue = direction / abs(direction);
// Increment (or decrement I guess)
return number + incrementalValue;
}
// Calculates absolute value of a number
public static int abs(int number) {
// If number is positive, return number, else make it positive by multiplying by -1 then return
number = (number > 0.0F) ? number : -number;
return number;
}
}
The problem is the line that contains return number + incrementalValue;. As mentioned before, the code works with this although doesn't meet my own specifications of not involving any addition operators.
I changed the line to return add(number, incrementalValue); but then it never breaks out of the recursion and indeed throws the title of this website, a StackOverflowError.
All help appreciated. Thanks in advance.
Note
Constraint does not include any implicit increment/decrement (i++/i--) nor does it include bitwise. Try and answer towards the specific problem I am having in my own implementation.
public static int add(int a, int b) {
if(b == 0) return a;
int sum = a ^ b; //SUM of two integer is A XOR B
int carry = (a & b) << 1; //CARRY of two integer is A AND B
return add(sum, carry);
}
Shamefully taken from here. All credit goes to its author.
public static int add (int a, int b) {
if (b == 0) return a;
if (b > a) return add (b, a);
// Terminates only when the smaller operand is non-negative.
return add (++a, --b);
}
Just with ++/--.

Print out Yijing Hexagram Symbols

I encountered a problem while coding and I can't seem to find where I messed up or even why I get a wrong result.
First, let me explain the task.
It's about "Yijing Hexagram Symbols".
The left one is the original and the right one is the result that my code should give me.
Basically every "hexagram" contains 6 lines that can be either divided or not.
So there are a total of
2^6 = 64 possible "hexagrams"
The task is to calculate and code a method to print all possible combinations.
That's what I have so far:
public class test {
public String toBin (int zahl) {
if(zahl ==0) return "0";
if (zahl ==1 ) return "1";
return ""+(toBin( zahl/2)+(zahl%2));
}
public void show (String s) {
for (char c : s.toCharArray()){
if (c == '1'){
System.out.println("--- ---");
}
if(c=='0'){
System.out.println("-------");
}
}
}
public void ausgeben (){
for(int i = 0 ; i < 64; i++) {
show (toBin(i));
}
}
}
The problem is, when I test the 'show' method with "10" I get 3 lines and not 2 as intended.
public class runner {
public static void main(String[] args){
test a = new test();
a.ausgeben();
a.show("10");
}
}
Another problem I've encountered is that, since I'm converting to binary, I sometimes don't have enough lines: for example, 10 should be 001010 as six binary digits, but the leading zeros are missing. How can I add them in an easy way without changing much?
I am fairly new to all this so if I didn't explain anything enough or made any mistakes feel free to tell me.
You may find it easier if you use the Integer.toBinaryString method combined with the String.format and String.replace methods.
String binary = String.format("%6s", Integer.toBinaryString(zahl)).replace(' ', '0');
This converts the number to binary, formats it in a field six spaces wide (with leading spaces as necessary), and then replaces the spaces with '0'.
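Wrapped in a helper, the one-liner above could look like the sketch below. The class name HexagramBits and the method toSixBits are made up, and the input is assumed to be in the range 0 to 63.

```java
final class HexagramBits {
    // Binary representation of n, left-padded with zeros to six characters.
    // Assumes 0 <= n <= 63; larger values produce more than six digits.
    static String toSixBits(int n) {
        return String.format("%6s", Integer.toBinaryString(n)).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(toSixBits(10)); // prints "001010"
    }
}
```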
Well, there are many ways to pad a string with zeros, or create a binary string that is already padded with zeros.
For example, you could do something like:
public String padToSix( String binStr ) {
return "000000".substring( 0, 6 - binStr.length() ) + binStr;
}
This checks how long your string is and takes as many zeros as are needed to fill it up to six from the "000000" string.
Or you could simply replace your conversion method (which is recursive, and that's not really necessary) with one that specializes in six-digit numbers:
public static String toBin (int zahl) {
char[] digits = { '0','0','0','0','0','0' };
int currDigitIndex = 5;
while ( currDigitIndex >= 0 && zahl > 0 ) {
digits[currDigitIndex] += (zahl % 2);
currDigitIndex--;
zahl /= 2;
}
return new String(digits);
}
This one modifies the character array ( which initially has only zeros ) from the right to the left. It adds the value of the current bit to the character at the given place. '0' + 0 is '0', and '0' + 1 is '1'. Because you know in advance that you have six digits, you can start from the right and go to the left. If your number has only four digits, well, the two digits we haven't touched will be '0' because that's how the character array was initialized.
There are really a lot of methods to achieve the same thing.
Your problem reduces to printing all binary strings of length 6. I would go with this code snippet:
String format = "%06d";
for(int i = 0; i < 64; i++)
{
show(String.format(format, Integer.valueOf(Integer.toBinaryString(i))));
System.out.println();
}
If you don't wish to print leading zeros, replace String.format(..) with Integer.toBinaryString(i).

Efficiently parse single digit arithmetic expression

How would you efficiently (optimizing for runtime but also keeping space at a minimum) parse and evaluate a single digit arithmetic expression in Java.
The following arithmetic expressions are all valid:
eval("-5")=-5
eval("+4")=4
eval("4")=4
eval("-7+2-3")=-8
eval("5+7")=12
My approach is to iterate over all elements, keeping track of the current arithmetic operation using a flag, and evaluate digit by digit.
public int eval(String s){
int result = 0;
boolean add = true;
for(int i = 0; i < s.length(); i++){
char current = s.charAt(i);
if(current == '+'){
add = true;
} else if(current == '-'){
add = false;
} else {
if(add){
result += Character.getNumericValue(current);
} else {
result -= Character.getNumericValue(current);
}
}
}
return result;
}
Is this the only optimal solution? I have tried to use stacks to keep track of the arithmetic operator, but I am not sure this is any more efficient. I also have not tried regular expressions. I only ask because I gave the above solution in an interview and was told it is sub-optimal.
This seems a bit more compact. It certainly requires fewer lines and conditionals. The key is addition is the "default" behavior and each minus sign you encounter changes the sign of what you want to add; provided you remember to reset the sign after each addition.
public static int eval(String s){
int result = 0;
int sign = 1;
for(int i = 0; i < s.length(); i++){
char current = s.charAt(i);
switch (current)
{
case '+': break;
case '-': sign *= -1; break;
default:
result += sign * Character.getNumericValue(current);
sign = 1;
break;
}
}
return result;
}
As a note, I don't think yours produces correct results when subtracting a negative, e.g., "4--3". Your code produces 1, rather than the correct value of 7. On the other hand, mine allows expressions such as "5+-+-3", which would produce the result 8 (I suppose that's correct? :). However, you didn't list validation as a requirement, and neither of us is checking for sequential digits, alpha characters, white space, etc. If we assume the data is properly formatted, the above implementation should work. I don't see how adding data structures (such as stacks or queues) could possibly be helpful here. I'm also assuming just addition and subtraction.
These test cases produce the following results:
System.out.println(eval("1+2+3+4"));
System.out.println(eval("1--3"));
System.out.println(eval("1+-3-2+4+-3"));
10
4
-3
You need to look up 'recursive descent expression parser' or the Dijkstra shunting-yard algorithm. Your present approach is doomed to failure the moment you have to cope with operator precedence or parentheses. You also need to forget about regular expressions and resign yourself to writing a proper scanner.
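To give an idea of what a recursive descent parser looks like for this problem, here is a rough sketch for single-digit operands. Multiplication and parentheses are not part of the original question; they are included here only as an assumption to show how precedence falls out of the grammar. The class name TinyParser is made up, and whitespace is not handled.

```java
final class TinyParser {
    private final String src;
    private int pos = 0;

    private TinyParser(String src) {
        this.src = src;
    }

    static int eval(String s) {
        TinyParser p = new TinyParser(s);
        int value = p.expr();
        if (p.pos != s.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + p.pos);
        }
        return value;
    }

    // expr := term (('+' | '-') term)*
    private int expr() {
        int value = term();
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (c == '+') { pos++; value += term(); }
            else if (c == '-') { pos++; value -= term(); }
            else break;
        }
        return value;
    }

    // term := factor ('*' factor)*
    private int term() {
        int value = factor();
        while (pos < src.length() && src.charAt(pos) == '*') {
            pos++;
            value *= factor();
        }
        return value;
    }

    // factor := ('+' | '-') factor | '(' expr ')' | digit
    private int factor() {
        if (pos >= src.length()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        char c = src.charAt(pos++);
        if (c == '-') return -factor();
        if (c == '+') return factor();
        if (c == '(') {
            int value = expr();
            if (pos >= src.length() || src.charAt(pos++) != ')') {
                throw new IllegalArgumentException("Missing closing parenthesis");
            }
            return value;
        }
        if (c >= '0' && c <= '9') return c - '0';
        throw new IllegalArgumentException("Unexpected character: " + c);
    }
}
```

Each grammar rule becomes one method, so "2+3*4" evaluates the multiplication first simply because term() is called from expr().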

Find word in dictionary of unknown size using only a method to get a word by index

A few days ago I had an interview at a big company (the name is not important :)), and the interviewer asked me to solve the following task:
Predefined:
There is a dictionary of words of unspecified size; we only know that all words in the dictionary are sorted (for example, alphabetically). We also have just one method:
String getWord(int index) throws IndexOutOfBoundsException
Needs:
Develop an algorithm in Java to find a given input word in the dictionary. For this we should implement the method
public boolean isWordInTheDictionary(String word)
Limitations:
We cannot change the internal structure of the dictionary, we have no access to its internal structure, and we do not know the number of elements in the dictionary.
Issues:
I have developed a modified binary search and will post my (working) variant of the algorithm, but are there other variants with logarithmic complexity? My variant has complexity O(log N).
My variant of implementation:
public class Dictionary {
private static final int BIGGEST_TOP_MASK = 0xF00000;
private static final int LESS_TOP_MASK = 0x0F0000;
private static final int FULL_MASK = 0xFFFFFF;
private String[] data;
private static final int STEP = 100; // for real test step should be Integer.MAX_VALUE
private int shiftIndex = -1;
private static final int LESS_MASK = 0x0000FF;
private static final int BIG_MASK = 0x00FF00;
public Dictionary() {
data = getData();
}
String getWord(int index) throws IndexOutOfBoundsException {
return data[index];
}
public String[] getData() {
return new String[]{"a", "aaaa", "asss", "az", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "test", "u", "v", "w", "x", "y", "z"};
}
public boolean isWordInTheDictionary(String word) {
boolean isFound = false;
int constantIndex = STEP; // predefined step
int flag = 0;
int i = 0;
while (true) {
i++;
if (flag == FULL_MASK) {
System.out.println("Word is not found ... Steps " + i);
break;
}
try {
String data = getWord(constantIndex);
if (null != data) {
int compareResult = word.compareTo(data);
if (compareResult > 0) {
if ((flag & LESS_MASK) == LESS_MASK) {
constantIndex = prepareIndex(false, constantIndex);
if (shiftIndex == 1)
flag |= BIGGEST_TOP_MASK;
} else {
constantIndex = constantIndex * 2;
}
flag |= BIG_MASK;
} else if (compareResult < 0) {
if ((flag & BIG_MASK) == BIG_MASK) {
constantIndex = prepareIndex(true, constantIndex);
if (shiftIndex == 1)
flag |= LESS_TOP_MASK;
} else {
constantIndex = constantIndex / 2;
}
flag |= LESS_MASK;
} else {
// YES!!! We found word.
isFound = true;
System.out.println("Steps " + i);
break;
}
}
} catch (IndexOutOfBoundsException e) {
if (flag > 0) {
constantIndex = prepareIndex(true, constantIndex);
flag |= LESS_MASK;
} else constantIndex = constantIndex / 2;
}
}
return isFound;
}
private int prepareIndex(boolean isBiggest, int constantIndex) {
shiftIndex = (int) Math.ceil(getIndex(shiftIndex == -1 ? constantIndex : shiftIndex));
if (isBiggest)
constantIndex = constantIndex - shiftIndex;
else
constantIndex = constantIndex + shiftIndex;
return constantIndex;
}
private double getIndex(double constantIndex) {
if (constantIndex <= 1)
return 1;
return constantIndex / 2;
}
}
It sounds like the part they really want you to think about is how to handle the fact that you don't know the size of the dictionary. I think they assume that you can give them a binary search. So the real question is how do you manipulate the range of the search as it progresses.
Once you have found a value in the dictionary that is greater than your search target (or out of bounds), the rest looks like standard binary search. The hard part is how do you optimally expand the range when the target value is greater than the dictionary value that you've looked up. It looks like you are expanding by a factor of 1.5. This could be really problematic with a huge dictionary and a small fixed initial step like you have (100). Think if there were 50 million words how many times your algorithm would have to expand the range upwards if you're searching for 'zebra'.
Here's an idea: use the ordered nature of the collection to your advantage by assuming the first letter of each word is evenly distributed amongst the letters of the alphabet (this will never be true, but without knowing more about the collection of words it's probably the best you can do). Then weight the amount of your range expansion by how far from the end you would expect the dictionary word to be.
So if you took your initial step of 100 and looked up the dictionary word at that index and it was 'aardvark', you would expand your range a lot more for the next step than if it was 'walrus.' Still O(log n) but probably much better for most collections of words.
Here is an alternative implementation that uses Collections.binarySearch. It fails if one of the words in the list starts with the character '\uffff' (that is, Unicode 0xFFFF, which is not a valid Unicode character).
public static class ListProxy extends AbstractList<String> implements RandomAccess
{
@Override public String get( int index )
{
try {
return getWord( index );
} catch( IndexOutOfBoundsException ex ) {
return "\uffff";
}
}
@Override public int size()
{
return Integer.MAX_VALUE;
}
}
public static boolean isWordInTheDictionary( String word )
{
return Collections.binarySearch( new ListProxy(), word ) >= 0;
}
Update: I modified it so that it implements RandomAccess, since the binarySearch in Collections would otherwise use an iterator-based search on such a large list, which would be extremely slow. This should now be decently fast, since the binary search will need only 31 iterations even though the List pretends to be as large as possible.
Here is a slightly modified version that remembers the smallest failed index, so that its proclaimed size converges to the actual size of the dictionary along the way and almost all exceptions are avoided in successive lookups. You would, however, need to create a new ListProxy instance whenever the size of the dictionary could have changed.
public static class ListProxy extends AbstractList<String> implements RandomAccess
{
    private int size = Integer.MAX_VALUE;
    @Override public String get( int index )
    {
        try {
            if( index < size )
                return getWord( index );
        } catch( IndexOutOfBoundsException ex ) {
            size = index;
        }
        return "\uffff";
    }
    @Override public int size()
    {
        return size;
    }
}
private static ListProxy listProxy = new ListProxy();
public static boolean isWordInTheDictionary( String word )
{
    return Collections.binarySearch( listProxy , word ) >= 0;
}
You have the right idea, but I think your implementation is overly complicated. You want to do a binary search, but you don't know what the upper bound is. So instead of starting at the middle, you start at index 1 (assuming dictionary indexes start at 0).
If the word you're looking for is "less than" the current dictionary word, halve the distance between the current index and your "low" value. ("low" starts at 0, of course).
If the word you're looking for is "greater than" the word at the index you just examined, then either halve the distance between the current index and your "high" value ("high" starts at 2) or, if index and "high" are the same, double the index.
If doubling the index gives you an out of range exception, you halve the distance between the current value and the doubled value. So if going from 16 to 32 throws an exception, try 24. And, of course, keep track of the fact that 32 is more than the max.
So a search sequence might look like 1, 2, 4, 8, 16, 12, 14 - found!
It's the same concept as a binary search, but rather than starting with low = 0, high = n-1, you start with low = 0, high = 2, and double the high value when you need to. It's still O(log N), although the constant is going to be a bit larger than with a "normal" binary search.
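The doubling idea can be sketched roughly as follows. This is not the exact probe sequence described above, just one way to write the same concept. It assumes the dictionary is exposed as an IntFunction<String> standing in for getWord, and that it throws IndexOutOfBoundsException past the end; the class name DoublingSearch and the helper wordOrNull are hypothetical.

```java
import java.util.function.IntFunction;

class DoublingSearch {
    /** Returns the word at index, or null when the index is past the end of the dictionary. */
    static String wordOrNull(IntFunction<String> getWord, int index) {
        try {
            return getWord.apply(index);
        } catch (IndexOutOfBoundsException e) {
            return null;
        }
    }

    static boolean contains(IntFunction<String> getWord, String word) {
        int low = 0;  // every index below low holds a word smaller than the target
        int high = 1; // exclusive upper bound, doubled until it is large enough
        while (true) {
            String w = wordOrNull(getWord, high - 1);
            if (w == null || w.compareTo(word) >= 0) {
                break; // the target, if present, lies in [low, high)
            }
            low = high;
            if (high > Integer.MAX_VALUE / 2) {
                high = Integer.MAX_VALUE; // cannot double without overflowing
                break;
            }
            high *= 2;
        }
        // Ordinary binary search on [low, high); out-of-range probes count as "too high".
        while (low < high) {
            int mid = low + (high - low) / 2;
            String w = wordOrNull(getWord, mid);
            int cmp = (w == null) ? 1 : w.compareTo(word);
            if (cmp == 0) {
                return true;
            }
            if (cmp < 0) {
                low = mid + 1;
            } else {
                high = mid;
            }
        }
        return false;
    }
}
```

Both phases take a logarithmic number of probes, so the whole lookup stays O(log N) without ever needing to know the dictionary size up front.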
You can incur a one-time cost of O(n), if you know that the dictionary will not change. You can add all the words in the dictionary to a hashtable, and then any subsequent calls to isWordInDictionary() will be O(1) (in theory).
Use the getWord() API to copy the entire contents of the dictionary into a more sensible data structure (e.g. hash table, trie, perhaps even augmented by a Bloom filter). ;-)
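A rough sketch of the one-time copy, assuming the dictionary never changes and that getWord throws IndexOutOfBoundsException past the last word. The IntFunction<String> parameter stands in for getWord, and the class name DictionaryCache is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntFunction;

class DictionaryCache {
    /** Reads every word once (O(n)) so that later membership checks are hash lookups. */
    static Set<String> load(IntFunction<String> getWord) {
        Set<String> words = new HashSet<>();
        // i >= 0 guards against int overflow on an (unlikely) maximal dictionary
        for (int i = 0; i >= 0; i++) {
            try {
                words.add(getWord.apply(i));
            } catch (IndexOutOfBoundsException e) {
                break; // walked past the last word
            }
        }
        return words;
    }
}
```

After the set is built, isWordInDictionary reduces to a call to contains on it. The trade-off is memory proportional to the dictionary size.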
In a different language:
#!/usr/bin/perl
$t=0;
$cur=1;
$under=0;
$EOL=int(rand(1000000))+1;
$TARGET=int(rand(1000000))+1;
if ($TARGET>$EOL)
{
    $x=$EOL;
    $EOL=$TARGET;
    $TARGET=$x;
}
print "Looking for $TARGET with EOL $EOL\n";
# Returns 0 on a match, 1 if the guess is too high, -1 if it is too low,
# and -2 if the guess is past the end of the list.
sub testWord($)
{
    my($a)=@_;
    ++$t;
    return 0 if ($a eq $TARGET);
    return -2 if ($a > $EOL);
    return 1 if ($a > $TARGET);
    return -1;
}
while ($r = testWord($cur))
{
    print "Tested $cur, got $r\n";
    if ($r == 1) { $over=$cur; }
    if ($r == -1) { $under=$cur; }
    if ($r == -2) { $over = $cur; }
    if ($over)
    {
        # an upper bound is known: bisect between $under and $over
        $cur = int(($over-$under)/2)+$under;
        $cur++ if ($cur <= $under);
        $cur-- if ($cur >= $over);
    }
    else
    {
        # no upper bound yet: keep doubling
        $cur *= 2;
    }
}
print "Found $TARGET at $cur in $t tests\n";
The main benefit of this one is it is a bit simpler to understand. I think it may be more efficient if your first guesses are below the target since I don't think you are taking advantage of the space you have already "searched", but that is just with a quick glance at your code. Since it is looking for numbers for simplicity, it doesn't have to deal with not finding the target, but that is an easy extension.
@Sergii Zagriichuk hope the interview went well. Good luck with that.
I think, just as @alexcoco said, binary search is the answer.
The other options I see are only available if you could extend the dictionary. You could make it slightly better: e.g. you could count the words starting with each letter and keep track of those counts, so that you would effectively have to work only on a subset of the words.
Or, as others are saying, implement your own dictionary structure entirely.
I know this doesn't answer your question properly, but I cannot see other possibilities.
BTW, it would be nice to see your algorithm.
EDIT:
Expanding on my comment under answer of bshields...
@Sergii Zagriichuk even better, I think, would be to remember the last index where we got null (no word). Then at each run you could check whether that is still true. If not, expand the range to a 'previous index' obtained by reversing the binary search behaviour, so that we get null again. This way you would always adjust the size of the range of your search algorithm, adapting to the current state of the dictionary as needed. Plus, the changes would have to be significant to cause a range adjustment, so the adjustment wouldn't have any real negative impact on the algorithm. Also, dictionaries tend to be static in nature, so this should work :)
On the one hand, yes, you are right about the binary search implementation. On the other hand, if the dictionary is static and does not change between lookups, we could suggest a different algorithm. There is a common problem here: searching strings is different from searching an int array, because each comparison getWord(int i).compareTo(string) is O(min(length0, length1)) rather than O(1).
Suppose we get a request to find the words w0, w1, ... wN. During the lookups we could build up a tree of the indices visited (some kind of suffix tree would probably be good enough for this task).
On the next lookup request, for the set a1, a2, ... aM, we could first narrow the range by finding the position in that tree, which decreases the average time.
The problems with this implementation are concurrency and memory usage, so the next step is a strategy to keep the search tree small.
PS: main aim was to check ideas and problems you suggest.
Well, I think the information that the dictionary is sorted can be utilized in a better way.
Say you are looking for the word "Zebra", and the first guess returned "abcg".
We can use this information when choosing the second guess index. In my case the returned word starts with 'a', whereas I am looking for something starting with 'z'. So rather than making a static jump, I can make a calculated jump based on the current result and the desired result. If my next jump then takes me to the word "yvu", I know I am very near, so I will make a much smaller jump than in the previous case.
Here is my solution. It uses O(log n) operations. The first part of the code tries to find an estimate of the length, and the second part takes advantage of the fact that the dictionary is sorted and performs a binary search.
boolean isWordInTheDictionary(String word){
    if (word == null){
        return false;
    }
    // Estimate the length of the dictionary: double len until getWord throws.
    // Assumes getWord takes an int index and throws IndexOutOfBoundsException past the end.
    int len = 1;
    while(true){
        try{
            getWord(len);
        }catch(IndexOutOfBoundsException e){
            // found an upper bound, break from the loop
            break;
        }
        if (len > Integer.MAX_VALUE / 2){
            len = Integer.MAX_VALUE; // cannot double without overflowing
            break;
        }
        len = len * 2;
    }
    // Binary search over the half-open range [beg, end) using the estimated length
    int beg = 0;
    int end = len;
    while(beg < end){
        int idx = beg + (end - beg) / 2;
        String tempWrd;
        try{
            tempWrd = getWord(idx);
        }catch(IndexOutOfBoundsException e){
            tempWrd = null;
        }
        if(tempWrd == null){
            // idx is past the end of the dictionary
            end = idx;
            continue;
        }
        int cmp = word.compareTo(tempWrd);
        if (cmp > 0){
            beg = idx + 1;
        }else if(cmp < 0){
            end = idx;
        }else{
            // found the word
            return true;
        }
    }
    return false;
}
Assuming the dictionary is 0-based, I would decompose the search in two parts.
First, given that the index parameter to getWord() is an integer, and assuming that the index must be a number between 0 and the maximum positive integer, perform a binary search over that range in order to find the maximum valid index (irrespective of the word values). This operation is O(log N), since it is a simple binary search.
Once the size of the dictionary has been obtained, a second, ordinary binary search (again of complexity O(log N)) will yield the desired answer.
Since O(log N)+O(log N) is O(log N), this algorithm complies with your requirement.
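The two phases could look roughly like this. It is a sketch under the assumptions stated above, plus one more: that an invalid index makes getWord throw IndexOutOfBoundsException. The IntFunction<String> parameter stands in for getWord, and the class and method names are hypothetical.

```java
import java.util.function.IntFunction;

class TwoPhaseSearch {
    /** True if getWord accepts the index, false if it throws for it. */
    static boolean isValid(IntFunction<String> getWord, int index) {
        try {
            getWord.apply(index);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    /** Phase 1: binary search over all ints for the smallest invalid index, i.e. the size. */
    static int size(IntFunction<String> getWord) {
        int lo = 0;
        int hi = Integer.MAX_VALUE; // invariant: the size lies in [lo, hi]
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (isValid(getWord, mid)) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Phase 2: ordinary binary search over [0, size - 1]. */
    static boolean contains(IntFunction<String> getWord, String word) {
        int lo = 0;
        int hi = size(getWord) - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            int cmp = getWord.apply(mid).compareTo(word);
            if (cmp == 0) {
                return true;
            }
            if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return false;
    }
}
```

Phase 1 always costs about 31 probes regardless of the dictionary size, which is the price of not assuming anything about the upper bound. If the dictionary is static, the size can be computed once and reused.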
I'm in a hiring process in which I was asked this same problem...
My approach was a bit different, and with the dictionary (web service) I have, it's about 30% more efficient (for the words I've tested).
Here is the solution:
https://github.com/gustavompo/wordfinder
I'll not post the whole solution here because it's decoupled through classes and methods, but the core algorithm is this:
public WordFindingResult FindWord(string word)
{
    var callsCount = 0;
    var lowerLimit = new WordFindingLimit(0, null);
    var upperLimit = new WordFindingLimit(int.MaxValue, null);
    var wordToFind = new Word(word);
    var wordIndex = _initialIndex;
    while (callsCount <= _maximumCallsCount)
    {
        if (CouldNotFindWord(lowerLimit, upperLimit))
            return new WordFindingResult(callsCount, -1, string.Empty, WordFindingResult.ErrorCodes.NOT_FOUND);
        var wordFound = RetrieveWordAt(wordIndex);
        callsCount++;
        if (wordToFind.Equals(wordFound))
            return new WordFindingResult(callsCount, wordIndex, wordFound.OriginalWordString);
        else if (IsIndexTooHigh(wordToFind, wordFound))
        {
            upperLimit = new WordFindingLimit(wordIndex, wordFound);
            wordIndex = IndexConsideringTooHighPreviousResult(lowerLimit, wordIndex);
        }
        else
        {
            lowerLimit = new WordFindingLimit(wordIndex, wordFound);
            wordIndex = IndexConsideringTooLowPreviousResult(lowerLimit, upperLimit, wordToFind);
        }
    }
    return new WordFindingResult(callsCount, -1, string.Empty, WordFindingResult.ErrorCodes.CALLS_LIMIT_EXCEEDED);
}
private int IndexConsideringTooHighPreviousResult(WordFindingLimit maxLowerLimit, int current)
{
    return BinarySearch(maxLowerLimit.Index, current);
}
private int IndexConsideringTooLowPreviousResult(WordFindingLimit maxLowerLimit, WordFindingLimit minUpperLimit, Word target)
{
    if (AreLowerAndUpperLimitsDefined(maxLowerLimit, minUpperLimit))
        return BinarySearch(maxLowerLimit.Index, minUpperLimit.Index);
    var scoreByIndexPosition = maxLowerLimit.Index / maxLowerLimit.Word.Score;
    var indexOfTargetBasedInScore = (int)(target.Score * scoreByIndexPosition);
    return indexOfTargetBasedInScore;
}
