Scala: Function test a string for unique char - java

Solved! Solution at the bottom.
I'm porting some Java code to Scala for fun and I stumbled upon a pretty nifty bit-shifting trick in Java. The Java code below takes a String as input and tests whether it consists of unique characters.
public static boolean isUniqueChars(String str) {
    if (str.length() > 256) return false;
    // Each bit of checker records whether a letter ('a' = bit 0) was already seen.
    int checker = 0;
    for (int i = 0; i < str.length(); i++) {
        int val = str.charAt(i) - 'a';
        if ((checker & (1 << val)) > 0) return false;
        checker |= (1 << val);
    }
    return true;
}
Full listing is here: https://github.com/marvin-hansen/ctci/blob/master/java/Chapter%201/Question1_1/Question.java
How the code exactly works is explained here:
How does this Java code which determines whether a String contains all unique characters work?
Porting this directly to Scala doesn't really work so I'm looking for a more functional way to re-write the stuff above.
I have tried BigInt & BitSet
def isUniqueChars2(str: String): Boolean = {
  // A Char is a 16-bit UTF-16 code unit, so there are 65536 possible values
  if (str.length > 65536) return false
  // BigInt is immutable, so the updated value has to be reassigned
  var checker = BigInt(0)
  for (i <- 0 until str.length) {
    val value = str.charAt(i)
    if (checker.testBit(value)) return false
    checker = checker.setBit(value)
  }
  true
}
This works, but without bit-shifting and without the lowercase assumption.
Performance is rather unknown ....
However, I would like to do a more functional style solution.
Thanks to user3189923 for the solution.
def isUniqueChars(str : String) = str.distinct == str
That's it. Thank you.

str.distinct == str
In general, method distinct preserves order of occurrence after removing duplicates. Consider
implicit class RichUnique(val str: String) extends AnyVal {
def isUniqueChars() = str.distinct == str
}
and so
"abc".isUniqueChars
res: Boolean = true
"abcc".isUniqueChars
res: Boolean = false

How about:
str.toSet.size == str.size
?
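For comparison with the original Java version, the two Scala one-liners above can be approximated in Java roughly as follows. This is only a sketch: the class name UniqueChars and both method names are made up for illustration, and, like the Scala versions, it compares UTF-16 chars rather than full code points.

```java
import java.util.HashSet;
import java.util.Set;

public class UniqueChars {
    // Counterpart of str.distinct == str: if removing duplicates
    // leaves the length unchanged, every char was unique.
    static boolean isUniqueDistinct(String str) {
        return str.chars().distinct().count() == str.length();
    }

    // Counterpart of str.toSet.size == str.size, except that it
    // stops at the first duplicate instead of building the whole set.
    static boolean isUniqueSet(String str) {
        Set<Character> seen = new HashSet<>();
        for (char c : str.toCharArray()) {
            if (!seen.add(c)) {
                return false; // add() returns false if c was already present
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isUniqueDistinct("abc"));  // true
        System.out.println(isUniqueDistinct("abcc")); // false
        System.out.println(isUniqueSet("abcc"));      // false
    }
}
```

Neither variant assumes lowercase input, so both accept any characters at the cost of more memory than a single int bitmask.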

Related

java.text.Collator treats "v" and "w" as the same letter for Swedish language/locale

The following test passes correctly with Java 8.
Comparator<String> stringComparator = Collator.getInstance(new Locale("sv", "SE"));
Assert.assertTrue(stringComparator.compare("aaaa", "bbbb") < 0);
Assert.assertTrue(stringComparator.compare("waaa", "vbbb") < 0);
Assert.assertTrue(stringComparator.compare("vaaa", "wbbb") < 0);
This orders waaa before vbbb and vaaa before wbbb. Apparently it treats v and w as the same letter.
In fact, according to Wikipedia, in Swedish language:
By 2006, 'W' had grown in usage because of new loanwords, so 'W' officially became a letter, and the 'V' = 'W' sorting rule was deprecated. Pre-2006 books and software generally use the rule. After the rule was deprecated, some books and software continued to apply it.
Does anyone have a general workaround to this, so that v and w are treated as separate letters within Swedish locale?
Create your own RuleBasedCollator.
Check the value of the string returned by
((RuleBasedCollator)Collator.getInstance(new Locale("sv", "SE"))).getRules()
and modify it to suit your needs and then create a new collator with your modified rules.
And probably submit a JDK bug report too, for good measure.
This orders waaa before vbbb and vaaa before wbbb. Apparently it
treats v and w as the same letter.
JDK indeed doesn't treat 'w' and 'v' as the same characters even in Swedish locale. The letter 'v' comes before 'w'.
Assert.assertEquals(1, stringComparator.compare("w", "v"));//TRUE
However, based on the Swedish collation rules, JDK orders 'wa' ahead of 'vb'.
Assert.assertEquals(1, stringComparator.compare("wa", "vb"));//FALSE
You could create a custom comparator, which wraps the collator and manually handles v and w the way you want.
I have made two implementations of this.
The first one is short and elegant: it uses Guava's lexicographical comparator together with the tricky regex that Holger provided in a comment.
private static final Pattern VW_BOUNDARY = Pattern.compile("(?=[vw])|(?<=[vw])", Pattern.CASE_INSENSITIVE);
public static Comparator<String> smallCorrectVwWrapper(Comparator<Object> original) {
    // Split around every v/w, then compare the resulting segment lists element by element.
    return Comparator.comparing(
        s -> Arrays.asList(VW_BOUNDARY.split((String) s)),
        Comparators.lexicographical(original));
}
The second implementation is a big and complex thing that does the same thing, but manually implemented, without libraries and regexes.
public static Comparator<String> correctVwWrapper(Comparator<Object> original) {
return (s1, s2) -> compareSplittedVw(original, s1, s2);
}
/**
* Compares the two string by first splitting them into segments separated by W
* and V, then comparing the segments one by one.
*/
private static int compareSplittedVw(Comparator<Object> original, String s1, String s2) {
List<String> l1 = splitVw(s1);
List<String> l2 = splitVw(s2);
int minSize = Math.min(l1.size(), l2.size());
for (int ix = 0; ix < minSize; ix++) {
int comp = original.compare(l1.get(ix), l2.get(ix));
if (comp != 0) {
return comp;
}
}
return Integer.compare(l1.size(), l2.size());
}
private static boolean isVw(int ch) {
return ch == 'V' || ch == 'v' || ch == 'W' || ch == 'w';
}
/**
* Splits the string into segments separated by V and W.
*/
public static List<String> splitVw(String s) {
var b = new StringBuilder();
var result = new ArrayList<String>();
for (int offset = 0; offset < s.length();) {
int ch = s.codePointAt(offset);
if (isVw(ch)) {
if (b.length() > 0) {
result.add(b.toString());
b.setLength(0);
}
result.add(Character.toString((char) ch));
} else {
b.appendCodePoint(ch);
}
offset += Character.charCount(ch);
}
if (b.length() > 0) {
result.add(b.toString());
}
return result;
}
Usage:
public static void main(String[] args) throws Exception {
Comparator<String> stringComparator = correctVwWrapper(Collator.getInstance(new Locale("sv", "SE")));
System.out.println(stringComparator.compare("a", "z") < 0); // true
System.out.println(stringComparator.compare("wa", "vz") < 0); // false
System.out.println(stringComparator.compare("wwa", "vvz") < 0); // false
System.out.println(stringComparator.compare("va", "wz") < 0); // true
System.out.println(stringComparator.compare("v", "w") < 0); // true
}
It is a little more work to implement a wrapping Collator, but it should not be too complicated.
I know this is an old question, but I recently had this issue and thought I would share my half-assed solution. It is based on what @DodgyCodeExceptions wrote, but I include the code I used.
MyComparator comparator = new MyComparator();
Locale locale = new Locale("sv", "SE");
collator = Collator.getInstance(locale);
String collRuleSVStr = ((RuleBasedCollator) collator).getRules();
// For some reason removing this part of the string get us what we want.
String newCollRulesSVStr = collRuleSVStr.replace("Ø & V ; w , W& Y,", "");
RuleBasedCollator newColl = new RuleBasedCollator(newCollRulesSVStr );
comparator.setCollator(newColl);
I used the getRules() method to get the rules string and printed it out.
This isn't the whole string but just the parts containing the rules for letters:
<a,A<b,B<c,C<d,D<ð,Ð<e,E<f,F<g,G<h,H<i,I<j,J<k,K<l,L<m,M<n,N<o,O<p,P<q,Q<r,R<s, S & SS,ß<t,T& TH, Þ &TH, þ <u,U<v,V<w,W<x,X<y,Y<z,Z&AE,Æ&AE,æ&OE,Œ&OE,œ& Z < å , Å< ä , Ä < a̋, A̋ < æ , Æ < ö , Ö < ő , Ő ; ø , Ø & V ; w , W& Y, ü , Ü; ű, Ű
The parts relevant to V and W are:
U<v,V<w,W<x,
and
Ø & V ; w , W& Y,
The first part looks ok and is the same for other languages such as Norwegian.
So I figured the second part had to be causing the problem, so I simply removed it from the string and created a new collator. I am not proficient enough in the syntax to tell you exactly why this part causes Wa to come before Vb, but simply creating a custom Collator without that part works.
Perhaps someone with some insight can explain.
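One possible explanation, offered as a reading of the rule syntax rather than a confirmed analysis of the JDK's Swedish rules: in a RuleBasedCollator rule string, `<` marks a primary difference, `;` a secondary difference and `,` a tertiary difference, while `&` resets the position. Under that reading, `& V ; w , W` moves w to sit right after V as a mere secondary variant. The sketch below uses two toy rule strings (both are assumptions made up for illustration, not the JDK rules) to show the effect on its own.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

public class VwRuleDemo {
    // Toy alphabet (assumption): v and w are separate letters (primary difference).
    static final String PRIMARY_RULES = "< a, A < b, B < v, V < w, W";
    // Toy alphabet (assumption): w is only a secondary variant of v.
    static final String SECONDARY_RULES = "< a, A < b, B < v, V ; w, W";

    static int compare(String rules, String s1, String s2) throws ParseException {
        return new RuleBasedCollator(rules).compare(s1, s2);
    }

    public static void main(String[] args) throws ParseException {
        // Separate letters: the first letter decides, so "wa" sorts after "vb".
        System.out.println(compare(PRIMARY_RULES, "wa", "vb") > 0);
        // Secondary variant: v and w tie at the primary level,
        // so a < b decides and "wa" sorts before "vb".
        System.out.println(compare(SECONDARY_RULES, "wa", "vb") < 0);
        // With nothing else to compare, the secondary difference still puts v before w.
        System.out.println(compare(SECONDARY_RULES, "w", "v") > 0);
    }
}
```

If this reading is right, removing the `& V ; w , W` fragment leaves only the earlier `<v,V<w,W` rule in force, which would explain why the modified collator keeps v and w apart.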

Bitwise operations in Java: Test if in "1010101111011" a bit is set?

Let's say, I have some user input (as a String) like "11010011011".
Now I want to check if a bit at a particular position is set (each digit should act as a flag).
Note: I am receiving the user's input as a String.
How can I do that?
You could work with the string as is - say you want to check the first bit on the left:
if (input.charAt(0) == '1') { /* the leftmost bit is set */ }
Alternatively if you want to work with a BitSet you can initialise it in a loop:
String input = "11010011011";
BitSet bs = new BitSet(input.length());
int i = 0;
for (char c : input.toCharArray()) {
if (c == '1') bs.set(i);
i++;
}
Then to check if the i-th bit is set:
boolean isSet = bs.get(i);
If you want to use bitwise operations, then first convert the string to integer and test with bitmasks:
int val = Integer.parseInt("11010011011", 2);
System.out.println(val & (1<<0)); // First bit from the right (least significant); non-zero if set
System.out.println(val & (1<<1)); // Second bit from the right
System.out.println(val & (1<<2)); // Third bit from the right
.....
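Note that the answers above count positions from different ends: charAt(0) and the BitSet loop start at the leftmost character, while 1<<0 addresses the rightmost (least significant) bit. A small sketch that makes both conventions explicit; the class and method names are hypothetical, and it assumes the input contains only 0 and 1, has at most 63 digits, and that pos is within range.

```java
public class BitFlags {
    // Position 0 is the leftmost character, matching input.charAt(0).
    static boolean isSetFromLeft(String bits, int pos) {
        long val = Long.parseLong(bits, 2);
        // The leftmost character is the most significant bit.
        long mask = 1L << (bits.length() - 1 - pos);
        return (val & mask) != 0;
    }

    // Position 0 is the rightmost character, matching val & (1 << 0).
    static boolean isSetFromRight(String bits, int pos) {
        long val = Long.parseLong(bits, 2);
        return (val & (1L << pos)) != 0;
    }

    public static void main(String[] args) {
        String input = "11010011011";
        System.out.println(isSetFromLeft(input, 0));  // true
        System.out.println(isSetFromLeft(input, 2));  // false
        System.out.println(isSetFromRight(input, 0)); // true
        System.out.println(isSetFromRight(input, 2)); // false
    }
}
```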

Recursive method to determine if a string is a hex number - Java

This is a homework question that I am having a bit of trouble with.
Write a recursive method that determines if a String is a hex number.
Write javadocs for your method.
A String is a hex number if each and every character is either
0 or 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9
or a or A or b or B or c or C or d or D or e or E or f or F.
At the moment all I can see to test this is if the character at 0 of the string is one of these values he gave me then that part of it is a hex.
Any hints or suggestions to help me out?
This is what I have so far:
public boolean isHex(String string){
if (string.charAt(0)==//what goes here?){
//call isHex again on s.substring(1)
}else return false;
}
If you're looking for a good hex digit method:
boolean isHexDigit(char c) {
return Character.isDigit(c) || (Character.toUpperCase(c) >= 'A' && Character.toUpperCase(c) <= 'F');
}
Hints or suggestions, as requested:
All recursive methods call themselves with a different input (well, hopefully a different input!)
All recursive methods have a stop condition.
Your method signature should look something like this
boolean isHexString(String s) {
// base case here - an if condition
// logic and recursion - a return value
}
Also, don't forget that hex strings can start with "0x". This might be (more) difficult to do, so I would get the regular function working first. If you tackle it later, try to keep in mind that 0xABCD0x123 shouldn't pass. :-)
About substring: Straight from the String class source:
public String substring(int beginIndex, int endIndex) {
if (beginIndex < 0) {
throw new StringIndexOutOfBoundsException(beginIndex);
}
if (endIndex > count) {
throw new StringIndexOutOfBoundsException(endIndex);
}
if (beginIndex > endIndex) {
throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
}
return ((beginIndex == 0) && (endIndex == count)) ? this :
new String(offset + beginIndex, endIndex - beginIndex, value);
}
offset is a member variable of type int
value is a member variable of type char[]
and the constructor it calls is
String(int offset, int count, char value[]) {
this.value = value;
this.offset = offset;
this.count = count;
}
In this older implementation it's clearly an O(1) method, calling an O(1) constructor, because the new String shares the original char[]. (Since Java 7 update 6, substring copies the characters instead, which makes it O(n) in the length of the substring.) Sharing is possible because String is immutable. You can't change the value of a String, only create a new one. (Let's leave out things like reflection and sun.misc.Unsafe since they are extraneous solutions!) Since it can't be changed, you also don't have to worry about some other Thread messing with it, so it's perfectly fine to pass around like the village bicycle.
Since this is homework, I only give some hints instead of code:
Write a method that always tests the first character of a String if it fulfills the requirements. If not, return false, if yes, call the method again with the same String, but the first character missing. If it is only 1 character left and it is also a hex character then return true.
Pseudocode:
public boolean isHex(String testString) {
If String has 0 characters -> return true;
Else
If first character is a hex character -> call isHex with the remaining characters
Else if the first character is not a hex character -> return false;
}
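For readers who have already worked through the exercise, the pseudocode above could be turned into Java along these lines. This is one possible sketch, not the assignment's official solution: the class name HexCheck is made up, and treating the empty string as hex simply follows the base case in the pseudocode.

```java
public class HexCheck {
    // True for exactly the characters listed in the assignment: 0-9, a-f, A-F.
    static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'a' && c <= 'f')
            || (c >= 'A' && c <= 'F');
    }

    /**
     * Recursively determines whether every character of s is a hex digit.
     *
     * @param s the string to test; must not be null
     * @return true if s is empty or consists only of hex digits
     */
    static boolean isHex(String s) {
        if (s.isEmpty()) {
            return true; // base case: nothing left to reject
        }
        // Check the first character, then recurse on the remainder.
        return isHexDigit(s.charAt(0)) && isHex(s.substring(1));
    }

    public static void main(String[] args) {
        System.out.println(isHex("1aF")); // true
        System.out.println(isHex("12g")); // false
    }
}
```

If an empty string should not count as a hex number, the base case can instead stop at a single remaining character, as the hint above suggests.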
When solving problems recursively, you generally want to solve a small part (the 'base case'), and then recurse on the rest.
You've figured out the base case - checking if a single character is hex or not.
Now you need to 'recurse on the rest'.
Here's some pseudocode (Python-ish) for reversing a string - hopefully you will see how similar methods can be applied to your problem (indeed, all recursive problems)
def ReverseString(str):
# base case (simple)
if len(str) <= 1:
return str
# recurse on the rest...
return last_char(str) + ReverseString(all_but_last_char(str))
Sounds like you should recursively iterate the characters in string and return the boolean AND of whether or not the current character is in [0-9A-Fa-f] with the recursive call...
You have already received lots of useful answers. In case you want to train your recursive skills (and Java skills in general) a bit more I can recommend you to visit Coding Bat. You will find a lot of exercises together with automated tests.

Suggestion on equalsIgnoreCase in Java to C implementation

All,
In a bid to improve my C skills, I decided to start porting various Java libraries and library functions to C. That way, everyone at least knows what my implementation is supposed to do. Here is the link to the C source code that simulates the equalsIgnoreCase() method of Java's String class: C source code. I have tested the code and it looks fine as far as my testing skills are concerned. My aim was to use operations and data types that are as basic as possible. It would be great if the gurus here could:
1 > Give me any suggestion to improve the code quality
2 > Enlighten me with any missing coding standard/practices
3 > Locate bugs in my logic.
100 lines of code is not too long to post here.
You calculate the string length twice. In C, the procedure to calculate the string length starts at the beginning of the string and runs along all of it (not necessarily in steps of 1 byte) until it finds the terminating null byte. If your strings are 2Mbyte long, you "walk" along 4Mbyte unnecessarily.
in <ctype.h> there are the two functions tolower() and toupper() declared. You can use one of them (tolower) instead of extractFirstCharacterASCIIVal(). The advantage of using the library function is that it is not locked in to ASCII and may even work with foreign characters when you go 'international'.
You use awkward (very long) names for your variables (and functions too). eg: ch1 and ch2 do very well for characters in file 1 and file 2 respectively :-)
return 1; at the end of main usually means something went wrong with the program. return 0; is idiomatic for successful termination.
Edit: for comparison with tcrosley version
#include <ctype.h>
int cmpnocase(const char *s1, const char *s2) {
while (*s1 && *s2) {
if (tolower((unsigned char)*s1) != tolower((unsigned char)*s2)) break;
s1++;
s2++;
}
return (*s1 != *s2);
}
With C++, you could replace performComparison(char* string1, char * string2) with stricmp.
However stricmp is not part of the standard C library. Here is a version adapted to your example. Note you don't need the extractFirstCharacterASCIIVal function, use tolower instead. Also note there is no need to explicitly calculate the string length ahead of time, as strings in C are terminated by the NULL character '\0'.
#include <ctype.h>
int performComparison(char* string1, char * string2)
{
    unsigned char c1, c2;
    int v;
    do {
        /* tolower() expects a value representable as unsigned char */
        c1 = (unsigned char) *string1++;
        c2 = (unsigned char) *string2++;
        v = tolower(c1) - tolower(c2);
    } while ((v == 0) && (c1 != '\0') && (c2 != '\0') );
    return v != 0;
}
If you do want to use your own extractFirstCharacterASCIIVal function instead of the tolower macro, to make the code more transparent then you should code it like so:
if ((str >= 'a') && (str <= 'z'))
{
returnVal = str - ('a' - 'A');
}
else
{
returnVal = str;
}
to make it more obvious what you are doing. Also you should include a comment that this assumes the characters a..z and A..Z are contiguous. (They are in ASCII, but not always in other encodings.)

Check whether a string is parsable into Long without try-catch?

Long.parseLong("string") throws a NumberFormatException if string is not parsable into long.
Is there a way to validate the string faster than using try-catch?
Thanks
You can create rather complex regular expression but it isn't worth that. Using exceptions here is absolutely normal.
It's natural exceptional situation: you assume that there is an integer in the string but indeed there is something else. Exception should be thrown and handled properly.
If you look inside the parseLong code, you'll see that there are many different verifications and operations. If you want to do all that stuff before parsing, it'll decrease performance (if we are talking about parsing millions of numbers, because otherwise it doesn't matter). So, the only thing you can do if you really need to improve performance by avoiding exceptions is: copy the parseLong implementation to your own function and return a sentinel (a long has no NaN, so for example null from a boxed Long) instead of throwing exceptions in all corresponding cases.
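If the try/catch approach is kept, it can be wrapped once so that callers never see the exception. A minimal sketch, assuming Java 8 or later for OptionalLong; the class name SafeParse and the method name tryParse are made up for illustration.

```java
import java.util.OptionalLong;

public class SafeParse {
    // Returns the parsed value, or an empty OptionalLong if s is not a valid long.
    static OptionalLong tryParse(String s) {
        try {
            return OptionalLong.of(Long.parseLong(s));
        } catch (NumberFormatException e) {
            // Also covers null input and values outside the long range.
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("42").isPresent());                  // true
        System.out.println(tryParse("4x").isPresent());                  // false
        System.out.println(tryParse("9223372036854775808").isPresent()); // false
    }
}
```

This does not make the failure path any cheaper; it only keeps the exception handling in one place.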
From commons-lang StringUtils:
public static boolean isNumeric(String str) {
if (str == null) {
return false;
}
int sz = str.length();
for (int i = 0; i < sz; i++) {
if (Character.isDigit(str.charAt(i)) == false) {
return false;
}
}
return true;
}
You could do something like
if(s.matches("\\d*")){
}
This uses a regular expression to check whether String s consists only of digits. Note that \\d* also matches the empty string and does not check the range.
But what do you stand to gain? another if condition?
org.apache.commons.lang3.math.NumberUtils.isParsable(yourString) will determine if the string can be parsed by one of: Integer.parseInt(String), Long.parseLong(String), Float.parseFloat(String) or Double.parseDouble(String)
Since you are interested in Longs you could have a condition that checks for isParsable and doesn't contain a decimal
if (NumberUtils.isParsable(yourString) && !StringUtils.contains(yourString,".")){ ...
This is a valid question because there are times when you need to infer what type of data is being represented in a string. For example, you may need to import a large CSV into a database and represent the data types accurately. In such cases, calling Long.parseLong and catching an exception can be too slow.
The following code only handles ASCII decimal:
public class LongParser {
// Since tryParseLong represents the value as negative during processing, we
// counter-intuitively want to keep the sign if the result is negative and
// negate it if it is positive.
private static final int MULTIPLIER_FOR_NEGATIVE_RESULT = 1;
private static final int MULTIPLIER_FOR_POSITIVE_RESULT = -1;
private static final int FIRST_CHARACTER_POSITION = 0;
private static final int SECOND_CHARACTER_POSITION = 1;
private static final char NEGATIVE_SIGN_CHARACTER = '-';
private static final char POSITIVE_SIGN_CHARACTER = '+';
private static final int DIGIT_MAX_VALUE = 9;
private static final int DIGIT_MIN_VALUE = 0;
private static final char ZERO_CHARACTER = '0';
private static final int RADIX = 10;
/**
* Parses a string representation of a long significantly faster than
* <code>Long.parseLong</code>, and avoids the noteworthy overhead of
* throwing an exception on failure. Based on the parseInt code from
* http://nadeausoftware.com/articles/2009/08/java_tip_how_parse_integers_quickly
*
* @param stringToParse
* The string to try to parse as a <code>long</code>.
*
* @return the boxed <code>long</code> value if the string was a valid
* representation of a long; otherwise <code>null</code>.
*/
public static Long tryParseLong(final String stringToParse) {
if (stringToParse == null || stringToParse.isEmpty()) {
return null;
}
final int inputStringLength = stringToParse.length();
long value = 0;
/*
* The absolute value of Long.MIN_VALUE is greater than the absolute
* value of Long.MAX_VALUE, so during processing we'll use a negative
* value, then we'll multiply it by signMultiplier before returning it.
* This allows us to avoid a conditional add/subtract inside the loop.
*/
int signMultiplier = MULTIPLIER_FOR_POSITIVE_RESULT;
// Get the first character.
char firstCharacter = stringToParse.charAt(FIRST_CHARACTER_POSITION);
if (firstCharacter == NEGATIVE_SIGN_CHARACTER) {
// The first character is a negative sign.
if (inputStringLength == 1) {
// There are no digits.
// The string is not a valid representation of a long value.
return null;
}
signMultiplier = MULTIPLIER_FOR_NEGATIVE_RESULT;
} else if (firstCharacter == POSITIVE_SIGN_CHARACTER) {
// The first character is a positive sign.
if (inputStringLength == 1) {
// There are no digits.
// The string is not a valid representation of a long value.
return null;
}
} else {
// Store the (negative) digit (although we aren't sure yet if it's
// actually a digit).
value = -(firstCharacter - ZERO_CHARACTER);
if (value > DIGIT_MIN_VALUE || value < -DIGIT_MAX_VALUE) {
// The first character is not a digit (or a negative sign).
// The string is not a valid representation of a long value.
return null;
}
}
// Establish the "maximum" value (actually minimum since we're working
// with negatives).
final long rangeLimit = (signMultiplier == MULTIPLIER_FOR_POSITIVE_RESULT)
? -Long.MAX_VALUE
: Long.MIN_VALUE;
// Capture the maximum value that we can multiply by the radix without
// overflowing.
final long maxLongNegatedPriorToMultiplyingByRadix = rangeLimit / RADIX;
for (int currentCharacterPosition = SECOND_CHARACTER_POSITION;
currentCharacterPosition < inputStringLength;
currentCharacterPosition++) {
// Get the current digit (although we aren't sure yet if it's
// actually a digit).
long digit = stringToParse.charAt(currentCharacterPosition)
- ZERO_CHARACTER;
if (digit < DIGIT_MIN_VALUE || digit > DIGIT_MAX_VALUE) {
// The current character is not a digit.
// The string is not a valid representation of a long value.
return null;
}
if (value < maxLongNegatedPriorToMultiplyingByRadix) {
// The value will be out of range if we multiply by the radix.
// The string is not a valid representation of a long value.
return null;
}
// Multiply by the radix to slide all the previously parsed digits.
value *= RADIX;
if (value < (rangeLimit + digit)) {
// The value would be out of range if we "added" the current
// digit.
return null;
}
// "Add" the digit to the value.
value -= digit;
}
// Return the value (adjusting the sign if needed).
return value * signMultiplier;
}
}
You can use java.util.Scanner
Scanner sc = new Scanner(s);
if (sc.hasNextLong()) {
long num = sc.nextLong();
}
This does range checking etc, too. Of course it will say that "99 bottles of beer" hasNextLong(), so if you want to make sure that it only has a long you'd have to do extra checks.
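The extra check mentioned above could look roughly like this: after reading the long, verify that no further token follows. It is a sketch under assumptions: the class and method names are hypothetical, Locale.ROOT is chosen only to make the behavior independent of the default locale, and Scanner still tolerates surrounding whitespace and may accept locale-style digit grouping.

```java
import java.util.Locale;
import java.util.Scanner;

public class ScannerCheck {
    // True if s contains exactly one token and that token is a valid long.
    static boolean isOnlyLong(String s) {
        try (Scanner sc = new Scanner(s)) {
            sc.useLocale(Locale.ROOT);
            if (!sc.hasNextLong()) {
                return false; // first token is missing, malformed or out of range
            }
            sc.nextLong();
            return !sc.hasNext(); // reject trailing tokens such as "bottles"
        }
    }

    public static void main(String[] args) {
        System.out.println(isOnlyLong("99"));                 // true
        System.out.println(isOnlyLong("99 bottles of beer")); // false
    }
}
```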
This case is common for forms and programs where you have an input field and are not sure if the string is a valid number. So using try/catch with the built-in Java function is the best thing to do, if you understand how try/catch works, compared to trying to write the function yourself. Setting up a try/catch block in the .NET virtual machine costs zero instructions of overhead, and it is probably the same in Java. If any instructions are used at the try keyword, they will be minimal; the bulk of the instructions are used in the catch part, and that only happens in the rare case when the number is not valid.
So while it "seems" like you can write a faster function yourself, you would have to optimize it better than the Java compiler in order to beat the try/catch mechanism you already use, and the benefit of a more optimized function is going to be very minimal since number parsing is quite generic.
If you run timing tests with your compiler and the java catch mechanism you already described, you will probably not notice any above marginal slowdown, and by marginal I mean it should be almost nothing.
Get the java language specification to understand the exceptions more and you will see that using such a technique in this case is perfectly acceptable since it wraps a fairly large and complex function. Adding on those few extra instructions in the CPU for the try part is not going to be such a big deal.
I think that's the only way of checking if a String is a valid long value. but you can implement yourself a method to do that, having in mind the biggest long value.
There are much faster ways to parse a long than Long.parseLong. If you want to see an example of a method that is not optimized then you should look at parseLong :)
Do you really need to take into account "digits" that are non-ASCII?
Do you really need to make several method calls passing around a radix even though you're probably parsing base 10?
:)
Using a regexp is not the way to go: it's harder to determine if your number is too big for a long. How do you use a regexp to determine that 9223372036854775807 can be parsed to a long but that 9223372036854775907 cannot?
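One way to keep a regexp for the shape and still get the range right is to leave the range check to BigInteger. This is a swapped-in technique for illustration, not the state machine described in this answer, and it is unlikely to be fast; the class and method names are made up.

```java
import java.math.BigInteger;
import java.util.regex.Pattern;

public class LongRange {
    // Shape only: optional sign followed by ASCII decimal digits.
    private static final Pattern SHAPE = Pattern.compile("[+-]?[0-9]+");

    // True if s is a decimal integer whose value fits into a long.
    static boolean fitsInLong(String s) {
        if (s == null || !SHAPE.matcher(s).matches()) {
            return false;
        }
        // A value fits into a long exactly when it needs at most
        // 63 bits in addition to the sign bit.
        return new BigInteger(s).bitLength() <= 63;
    }

    public static void main(String[] args) {
        System.out.println(fitsInLong("9223372036854775807")); // true
        System.out.println(fitsInLong("9223372036854775907")); // false
    }
}
```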
That said, the answer to a really fast long parsing method is a state machine and that no matter if you want to test if it's parseable or to parse it. Simply, it's not a generic state machine accepting complex regexp but a hardcoded one.
I can both write you a method that parses a long and another one that determines if a long can be parsed that totally outperforms Long.parseLong().
Now what do you want? A state testing method? In that case a state testing method may not be desirable if you want to avoid computing twice the long.
Simply wrap your call in a try/catch.
And if you really want something faster than the default Long.parseLong, write one that is tailored to your problem: base 10 if you're base 10, not checking digits outside ASCII (because you're probably not interested in Japanese's itchi-ni-yon-go etc.).
Hope this helps with the positive values. I used this method once for validating database primary keys.
private static final int MAX_LONG_STR_LEN = Long.toString(Long.MAX_VALUE).length();
public static boolean validId(final CharSequence id)
{
//avoid null
if (id == null)
{
return false;
}
int len = id.length();
//avoid empty or oversize
if (len < 1 || len > MAX_LONG_STR_LEN)
{
return false;
}
long result = 0;
// ASCII '0' at position 48
int digit = id.charAt(0) - 48;
//first char cannot be '0' in my "id" case
if (digit < 1 || digit > 9)
{
return false;
}
else
{
result += digit;
}
//start from 1, we already did the 0.
for (int i = 1; i < len; i++)
{
// ASCII '0' at position 48
digit = id.charAt(i) - 48;
//only numbers
if (digit < 0 || digit > 9)
{
return false;
}
result *= 10;
result += digit;
//if we hit 0x7fffffffffffffff
// we are at 0x8000000000000000 + digit - 1
// so negative
if (result < 0)
{
//overflow
return false;
}
}
return true;
}
Try to use this regular expression:
^(-9223372036854775808|0)$|^((-?)((?!0)\d{1,18}|[1-8]\d{18}|9[0-1]\d{17}|92[0-1]\d{16}|922[0-2]\d{15}|9223[0-2]\d{14}|92233[0-6]\d{13}|922337[0-1]\d{12}|92233720[0-2]\d{10}|922337203[0-5]\d{9}|9223372036[0-7]\d{8}|92233720368[0-4]\d{7}|922337203685[0-3]\d{6}|9223372036854[0-6]\d{5}|92233720368547[0-6]\d{4}|922337203685477[0-4]\d{3}|9223372036854775[0-7]\d{2}|922337203685477580[0-7]))$
It checks all possible numbers for Long.
Note that Long.parseLong also accepts a leading +, which this regexp rejects. The L suffix and _ separators are valid only in Java source literals and are rejected by Long.parseLong as well. If this regexp is not enough for you, you can extend it with additional alternatives.
Guava Longs.tryParse("string") returns null instead of throwing an exception if parsing fails. But this method is marked as Beta right now.
You could try using a regular expression to check the form of the string before trying to parse it?
A simple implementation to validate an integer that fits in a long would be:
public static boolean isValidLong(String str) {
if (str == null || str.isEmpty()) return false; // charAt(0) below would throw on an empty string
int len = str.length();
if (str.charAt(0) == '+') {
return str.matches("\\+\\d+") && (len < 20 || len == 20 && str.compareTo("+9223372036854775807") <= 0);
} else if (str.charAt(0) == '-') {
return str.matches("-\\d+") && (len < 20 || len == 20 && str.compareTo("-9223372036854775808") <= 0);
} else {
return str.matches("\\d+") && (len < 19 || len == 19 && str.compareTo("9223372036854775807") <= 0);
}
}
It doesn't handle octal, 0x prefix or so but that is seldom a requirement.
For speed, the ".matches" expressions are easy to replace with a hand-written loop.
