Can someone explain how this regular expression behaves? - java

String [] numbers =s.split("(?<=\\G.{50})");
I know what split is, but why do I need [], what do those do? And most importantly, can someone explain "(?<=\\G.{50})" thoroughly?

String[] declares an array of Strings. split returns such an array, with one element for each piece of the input that is left after cutting it wherever the regular expression matches.
The regular expression here uses a zero-width positive lookbehind, (?<=...), as documented at https://docs.oracle.com/javase/8/docs/api/index.html?java/util/regex/Pattern.html . It matches any position that is immediately preceded by the end of the previous match (\G, written \\G in a Java String literal; on the first attempt \G is the start of the input) followed by exactly 50 characters other than line terminators. Because the match is zero-width, the split consumes no characters.
In short - this just splits your input s into 50-character chunks, with a shorter final chunk if the length is not a multiple of 50. (Not sure I would have used a regular expression for this - but it works...)
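To see the chunking behaviour described above on a small input, here is a minimal sketch. The class name ChunkSplit and the helper chunk are made up for illustration, and the chunk size is a parameter (3 instead of 50) so the output stays readable.

```java
import java.util.Arrays;

public class ChunkSplit {
    // Hypothetical helper: splits s at every position that lies exactly
    // `size` characters after the previous split point (or the start of input).
    static String[] chunk(String s, int size) {
        return s.split("(?<=\\G.{" + size + "})");
    }

    public static void main(String[] args) {
        // Ten characters, chunk size 3: the last chunk holds the remainder.
        System.out.println(Arrays.toString(chunk("abcdefghij", 3))); // [abc, def, ghi, j]
    }
}
```

Since . does not match line terminators by default, input that contains newlines would probably need a different approach, such as a plain loop over substring.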

Related

^ and $ in Java regular expression

I know that ^ and $ mean "matches the beginning of the line" and "matches the end of the line".
However, when I did some coding today, I didn't notice any difference between including them and excluding them in a regular expression used in Java.
For example, I want to match a positive Integer using
^[1-9]\\d*$
, and when I exclude them in the regular expression like
[1-9]\\d*
, it seems that there is no difference. I have tried to test with a String that "contains" an integer like ###123###, and the second regular expression can still recognize it is not valid like the first one.
So are the two regular expressions above completely equivalent? Thanks!
Do you need to search a string like 2343, or [SPACE]2345, or abc234?
The anchored regex will only find the number in the first string. The un-anchored will find them in all strings.
It all depends on what your requirements are. Are you analyzing lines in a text file, where each line contains only digits? Or are you analyzing the text in a prose document or source code, where digits may be interspersed among a whole bunch of other stuff?
In the former case, the anchors are good. In the latter, they are bad.
More info: http://www.regular-expressions.info/anchors.html
They are different: the first pattern checks the whole line, from beginning to end, while the second doesn't care where in the line the match occurs.
For more check: regex-bounds
Well...no, the regular expressions aren't equivalent.
You intend to match a positive integer, and your regular expression does that: it matches one character between 1 and 9, then any number of digit characters after that (which includes zero of them).
The difference between the two is the anchoring, as you've noted - the first regex will only match values that literally begin with a 1 through 9, then have zero or more digits, then have nothing else in the string. If you test with String.matches(), which always matches against the whole string, you won't see a difference; you will with Matcher.find().
The regex to match a positive integer anywhere in the string is the unanchored one:
[1-9]\\d*
...and the regex to match any line that consists only of a positive integer is the anchored one:
^[1-9]\\d*$
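The effect of the anchors only shows up when the regex is used to search inside a string. The following sketch runs the two patterns from the question against ###123### with Matcher.find(), and then with String.matches() for comparison. The class and method names are invented for the example.

```java
import java.util.regex.Pattern;

public class AnchorDemo {
    static final Pattern ANCHORED = Pattern.compile("^[1-9]\\d*$");
    static final Pattern UNANCHORED = Pattern.compile("[1-9]\\d*");

    // True if the pattern matches somewhere in the input (a search, not a full match).
    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "###123###";
        System.out.println(found(ANCHORED, input));     // false: the anchors reject the # characters
        System.out.println(found(UNANCHORED, input));   // true: 123 is found inside the string
        // String.matches() always tests the whole string, so the anchors add nothing here.
        System.out.println(input.matches("[1-9]\\d*")); // false
        System.out.println("123".matches("[1-9]\\d*")); // true
    }
}
```

This is probably why the question saw no difference: a test based on matches() behaves as if the anchors were already there.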

What all characters can be used as String Delimiters in Java?

I am trying to break a String into pieces using the delimiter ":".
String sepIds[]=ids.split(":");
It is working fine. But when I replace ":" with " * " and use " * " as delimiter, it doesn't work.
String sepIds[]=ids.split("*"); //doesn't work
It just hangs up there, and doesn't execute further.
What mistake am I making here?
String#split takes a regular expression as parameter. In regex some chars have special meanings so they need to be escaped, for example:
"foo*bar".split("\\*")
the result will be as you expect:
[foo, bar]
You could also use the method Pattern#quote to simplify the task.
"foo*bar".split(Pattern.quote("*"))
String.split expects a regular expression argument. * has got a meaning in regex. So if you want to use them then you need to escape them like this:
String sepIds[]=ids.split("\\*");
The argument of .split() is a regular expression, not a string literal. Therefore you need to escape * since it is a special regex character. Write:
ids.split("\\*");
This is how you would split on one or more whitespace characters:
ids.split("\\s+");
Note that Guava has Splitter which is very, very fast and can split against literals:
Splitter.on('*').split(ids);
'*' and '.' are special characters; you have to escape them with a backslash.
String sepIds[]=ids.split("\\*");
To read more about java patterns please visit that page.
That is expected behaviour. The documentation for the String split function says that the input string is treated as a regular expression (with a link explaining how that works). As Germann points out, '*' is a special character in regular expressions.
Java's String.split() uses regular expressions to split up the string (unlike similar functions in C# or python). * is a special character in regular expressions and you need to escape it with a \ (backslash). So you should use instead:
String sepIds[]=ids.split("\\*");
You can find more information on regular expressions all over the internet; a fairly complete list of the special characters supported by Java is here: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
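The answers above can be pulled together in one small sketch. It compares the hand-escaped delimiter with Pattern.quote, and also shows that a bare * is rejected with a PatternSyntaxException when the pattern is compiled. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class SplitOnStar {
    // Escape the metacharacter by hand.
    static String[] splitEscaped(String ids) {
        return ids.split("\\*");
    }

    // Let Pattern.quote turn the delimiter into a literal.
    static String[] splitQuoted(String ids) {
        return ids.split(Pattern.quote("*"));
    }

    // Reports whether splitting on the raw delimiter fails to compile as a regex.
    static boolean rawDelimiterFails(String ids, String delimiter) {
        try {
            ids.split(delimiter);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitEscaped("foo*bar"))); // [foo, bar]
        System.out.println(Arrays.toString(splitQuoted("foo*bar")));  // [foo, bar]
        System.out.println(rawDelimiterFails("foo*bar", "*"));        // true
    }
}
```

Pattern.quote is the safer choice when the delimiter comes from a variable, since it works whatever characters the delimiter contains.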

Java replaceAll to javascript regex

I want to move a user input test from Java to JavaScript. The code is supposed to remove wildcard characters from the user input string, at any position. I'm attempting to convert the following Java notation to JavaScript, but keep getting the error
"Invalid regular expression: /(?<!\")~[\\d\\.]*|\\?|\\*/: Invalid group".
I have almost no experience with regex expressions. Any help will be much appreciated:
JAVA:
str = str.replaceAll("(?<!\")~[\\d\\.]*|\\?|\\*","");
My failing javascript version:
input = input.replace( /(?<!\")~[\\d\\.]*|\\?|\\*/g, '');
The problem, as anubhava points out, is that JavaScript before ES2018 doesn't support lookbehind assertions. Sad but true. The lookbehind assertion in your original regex is (?<!\"). Specifically, it only lets the ~ match when the character immediately before it is not a double quotation mark.
However, all is not lost. There are some tricks you can use to achieve the same result as a lookbehind. In this case, the lookbehind is there only to prevent the character prior to the tilde from being replaced as well. We can accomplish this in JavaScript by matching the character anyway, but then including it in the replacement:
input = input.replace( /([^"])~[\d.]*|\?|\*/g, '$1' );
Note that for the alternations \? and \*, there will be no groups, so $1 will evaluate to the empty string, so it doesn't hurt to include it in the replacement.
NOTE: this is not 100% equivalent to the original regular expression. In particular, lookaround assertions (like the lookbehind above) also prevent the input stream from being consumed, which can sometimes be very helpful when matching things that are right next to each other. However, in this case, I can't think of a way that that would be a problem. To make a completely equivalent regex would be more difficult, but I believe this meets the need of the original regex.
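To make the "not 100% equivalent" caveat concrete, here is a rough sketch that runs both forms of the pattern through Java's replaceAll rather than JavaScript, purely so the two can be compared side by side. The class and method names are invented. One visible difference is a tilde at the very start of the input: the capture-group form needs a character before the tilde, the lookbehind form does not.

```java
public class WildcardStrip {
    // Original Java form: negative lookbehind, nothing before the tilde is consumed.
    static String withLookbehind(String s) {
        return s.replaceAll("(?<!\")~[\\d.]*|\\?|\\*", "");
    }

    // Workaround form: consume the preceding non-quote character and put it back via $1.
    static String withCaptureGroup(String s) {
        return s.replaceAll("([^\"])~[\\d.]*|\\?|\\*", "$1");
    }

    public static void main(String[] args) {
        String typical = "foo~0.5 \"bar\"~2 a?b*c";
        System.out.println(withLookbehind(typical));   // foo "bar"~2 abc
        System.out.println(withCaptureGroup(typical)); // foo "bar"~2 abc

        // A leading tilde has no preceding character for the group to capture.
        System.out.println(withLookbehind("~3x"));     // x
        System.out.println(withCaptureGroup("~3x"));   // ~3x
    }
}
```

If leading tildes matter, changing the group to (^|[^"]) is one possible adjustment, though that is an assumption about the requirements rather than something the original answer states.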

Regex to detect number within String

I'm confronted with a String:
[something] -number OR number [something]
I want to be able to parse the number. I do not know at which position it occurs. I cannot build a substring because there's no obvious separator.
Is there any method how I could extract the number from the String by matching a pattern like
[-]?[0..9]+
, where the minus is optional? The String can contain special characters, which actually drives me crazy defining a regex.
-?\b\d+\b
That's broken down by:
-? (optional minus sign)
\b word boundary
\d+ 1 or more digits
[EDIT 2] - nod to Alan Moore
Unfortunately Java doesn't have verbatim strings, so you'll have to escape the regex above as:
String regex = "-?\\b\\d+\\b";
I'd also recommend a site like http://regexlib.com/RETester.aspx or a program like Expresso to help you test and design your regular expressions
[EDIT] - after some good comments
I haven't done something like .*?(-?\d+).* (from @Voo) because I wasn't sure if you wanted to match the entire string, or just the digits. Both versions should tell you if there are digits in the string, and if you want the actual digits, use the first regex and look at group(0), the whole match. There are clever ways to name groups or use multiple captures, but that would be a complicated answer to a straightforward question...
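As a minimal sketch of how the first regex might be used, the following pulls the first number out of a string with Matcher.find() and parses it. The class name NumberExtractor and the method firstNumber are made up for the example, and it assumes the number fits in an int (Integer.parseInt throws NumberFormatException otherwise).

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberExtractor {
    private static final Pattern NUMBER = Pattern.compile("-?\\b\\d+\\b");

    // Returns the first integer found in text, or an empty OptionalInt if there is none.
    static OptionalInt firstNumber(String text) {
        Matcher m = NUMBER.matcher(text);
        if (m.find()) {
            // group() with no argument is group(0): the entire match, sign included.
            return OptionalInt.of(Integer.parseInt(m.group()));
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("[something] -42"));   // OptionalInt[-42]
        System.out.println(firstNumber("17 [something]"));    // OptionalInt[17]
        System.out.println(firstNumber("no digits here"));    // OptionalInt.empty
    }
}
```

Because of the word boundaries, digits glued to letters (as in abc123) are not matched by this pattern.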

Regex for java's String.matches method?

Basically my question is this, why is:
String word = "unauthenticated";
word.matches("[a-z]");
returning false? (Developed in java1.6)
Basically I want to see if a string passed to me has alpha chars in it.
The String.matches() function matches your regular expression against the whole string (as if your regex had ^ at the start and $ at the end). If you want to search for a regular expression somewhere within a string, use Matcher.find().
The correct method depends on what you want to do:
Check to see whether your input string consists entirely of alphabetic characters (String.matches() with [a-z]+)
Check to see whether your input string contains any alphabetic character (and perhaps some others) (Matcher.find() with [a-z])
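The two checks listed above could look roughly like this. The class and method names are hypothetical; only String.matches and Matcher.find come from the answer.

```java
import java.util.regex.Pattern;

public class AlphaCheck {
    private static final Pattern LOWER = Pattern.compile("[a-z]");

    // Whole string must consist of one or more lowercase letters.
    static boolean isAllLowercaseLetters(String s) {
        return s.matches("[a-z]+");
    }

    // At least one lowercase letter somewhere in the string.
    static boolean containsLowercaseLetter(String s) {
        return LOWER.matcher(s).find();
    }

    public static void main(String[] args) {
        String word = "unauthenticated";
        System.out.println(word.matches("[a-z]"));              // false: pattern covers one character only
        System.out.println(isAllLowercaseLetters(word));        // true
        System.out.println(isAllLowercaseLetters("123a456"));   // false
        System.out.println(containsLowercaseLetter("123a456")); // true
    }
}
```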
Your code is checking to see if the word matches one character. What you want to check is if the word matches any number of alphabetic characters like the following:
word.matches("[a-z]+");
With [a-z] you match exactly ONE character.
What you’re probably looking for is [a-z]* (or [a-z]+ if the empty string should not match)
