Necessary to escape a Java regular expression in matches()?

I'm currently doing a test on an HTTP Origin to determine if it came from SSL:
(HttpHeaders.Names.ORIGIN).matches("/^https:\\/\\//")
But I'm finding it's not working. Do I need to escape matches() strings like a regular expression or can I leave it like https://? Is there any way to do a simple string match?
Seems like it would be a simple question, but surprisingly I'm not getting anywhere even after using a RegEx tester http://www.regexplanet.com/advanced/java/index.html. Thanks.

Java's regex doesn't need delimiters. Simply do:
.matches("https://.*")
Note that matches validates the entire input string, hence the .* at the end. And if the input contains line break chars (which . will not match), enable DOT-ALL:
.matches("(?s)https://.*")
Of course, you could also simply do:
.startsWith("https://")
which takes a plain string (no regex pattern).
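A minimal sketch of the difference, assuming a sample origin value of "https://example.com" (the class and method names are made up for illustration):

```java
public class OriginCheck {
    // Hypothetical helper: regex check. matches() must consume the whole
    // input, so the trailing .* is required; (?s) lets . match line breaks.
    static boolean isHttpsByRegex(String origin) {
        return origin.matches("(?s)https://.*");
    }

    // Hypothetical helper: plain prefix check, no regex involved.
    static boolean isHttpsByPrefix(String origin) {
        return origin.startsWith("https://");
    }

    public static void main(String[] args) {
        String origin = "https://example.com"; // assumed sample value

        // The slashes are treated as literal characters, so this never matches.
        System.out.println(origin.matches("/^https:\\/\\//")); // false
        // Without .* the pattern has to cover the entire string.
        System.out.println(origin.matches("https://"));        // false
        System.out.println(isHttpsByRegex(origin));            // true
        System.out.println(isHttpsByPrefix(origin));           // true
    }
}
```

For a fixed prefix like this, startsWith is probably the simpler choice, since nothing needs escaping.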

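How about this regex:
"^(https:)//.*"
It works in your tester. Forward slashes need no escaping in a Java regex, and "\/" is not a valid escape sequence in a Java string literal, so the slashes are left bare here.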

Related

Regex for detecting struts `.do` pages - tests failing

I am testing the following regex that exists in an older project I have inherited:
.*\\.do
Within Java, the regex is declared as:
private static final String[] ACCESS_REGEX = {".*\\.do", ""};
And is essentially checked using the wrapper for Pattern.matches method: value.matches(check).
This old regex is working fine for various incoming requests such as home.do and I am doing a test on various regex test sites (listed below):
http://www.regexplanet.com/advanced/java/index.html
http://www.freeformatter.com/java-regex-tester.html
However, I can't seem to get the regex to match various strings that I believe should match. I thought the regex above matches strings that end with .do and have some characters in front. However, when I test for these, no matches are found.
Example Test Strings:
home.do
\home.do
mmm\mmm\home.do
\mmm\home.do
home.do
Remember the special meaning the \ character has, both in regular expressions and in Java string literals!
The regular expression should be
.*\.do
This works very well on http://www.freeformatter.com/java-regex-tester.html.
In a Java string literal you also need to escape the \ character, hence the regular expression in Java must be
.*\\.do
The issue is that the online regex tools you're using expect plain regex (without special characters escaped), which is .*\.do in your particular case - mind the single backslash.
On the other hand, when defined in a string literal in Java, regexes need special characters escaped, hence ".*\\.do" in your Java code.
Use unescaped regexes in the online test tools.
You need a single backslash rather than a double backslash in the regexp.
You want to escape the dot only. Remove the first backslash and the tests will pass.
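The two levels of escaping can be checked directly in Java. This is a small sketch; the class name and the sample inputs are chosen only for illustration:

```java
public class DoPattern {
    // Java string literal: the doubled backslash yields the regex .*\.do
    static final String ACCESS_REGEX = ".*\\.do";

    // What a tester sees when .*\\.do is pasted in unchanged: the regex then
    // demands a literal backslash followed by any single character and "do".
    static final String PASTED_VERBATIM = ".*\\\\.do";

    static boolean isDoPage(String value) {
        return value.matches(ACCESS_REGEX);
    }

    public static void main(String[] args) {
        System.out.println(ACCESS_REGEX);                      // .*\.do
        System.out.println(isDoPage("home.do"));               // true
        System.out.println(isDoPage("mmm\\mmm\\home.do"));     // true
        System.out.println(isDoPage("homexdo"));               // false
        System.out.println("home.do".matches(PASTED_VERBATIM)); // false
    }
}
```

Printing the string constant is a quick way to see the regex the engine actually receives, which is also the form an online tester expects.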

Java replaceAll to javascript regex

I want to move a user input test from Java to JavaScript. The code is supposed to remove wildcard characters from the user input string, at any position. I'm attempting to convert the following Java notation to JavaScript, but I keep getting the error
"Invalid regular expression: /(?<!\")~[\\d\\.]*|\\?|\\*/: Invalid group".
I have almost no experience with regex expressions. Any help will be much appreciated:
JAVA:
str = str.replaceAll("(?<!\")~[\\d\\.]*|\\?|\\*","");
My failing javascript version:
input = input.replace( /(?<!\")~[\\d\\.]*|\\?|\\*/g, '');
The problem, as anubhava points out, is that JavaScript doesn't support lookbehind assertions (they were only added to the language in ES2018, so older engines reject them with this error). Sad but true. The lookbehind assertion in your original regex is (?<!\"). Specifically, it requires that the tilde is not immediately preceded by a double quotation mark.
However, all is not lost. There are some tricks you can use to achieve the same result as a lookbehind. In this case, the lookbehind is there only to prevent the character prior to the tilde from being replaced as well. We can accomplish this in JavaScript by matching the character anyway, but then including it in the replacement:
input = input.replace( /([^"])~[\d.]*|\?|\*/g, '$1' );
Note that for the alternations \? and \*, there will be no groups, so $1 will evaluate to the empty string, so it doesn't hurt to include it in the replacement.
NOTE: this is not 100% equivalent to the original regular expression. In particular, lookaround assertions (like the lookbehind above) also prevent the input stream from being consumed, which can sometimes be very helpful when matching things that are right next to each other. However, in this case, I can't think of a way that that would be a problem. To make a completely equivalent regex would be more difficult, but I believe this meets the need of the original regex.
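To make the caveat concrete, here is a sketch in Java (used only so both variants can run side by side; the class name and sample inputs are invented). The two versions agree on typical input and differ when the tilde is the very first character, because the capture-group version needs some character in front of it:

```java
public class WildcardStrip {
    // Original Java approach: negative lookbehind, nothing before ~ is consumed.
    static String withLookbehind(String s) {
        return s.replaceAll("(?<!\")~[\\d.]*|\\?|\\*", "");
    }

    // Workaround: consume the preceding character and put it back via $1.
    // For the \? and \* alternatives group 1 is unset and contributes nothing.
    static String withCapture(String s) {
        return s.replaceAll("([^\"])~[\\d.]*|\\?|\\*", "$1");
    }

    public static void main(String[] args) {
        String typical = "foo~0.5 \"bar\"~2 a?b*";
        System.out.println(withLookbehind(typical)); // foo "bar"~2 ab
        System.out.println(withCapture(typical));    // foo "bar"~2 ab

        // A tilde at position 0 has no preceding character to capture.
        System.out.println(withLookbehind("~3 x"));  // " x" (leading space kept)
        System.out.println(withCapture("~3 x"));     // ~3 x
    }
}
```

If a leading tilde matters, one possible adjustment is to capture `(^|[^"])` instead of `([^"])`, so that the start of the input also counts as a valid position.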

How to check that a string is NOT blank using a Java regular expression?

There is a regular expression for finding a blank string, and I want only its negation. I also saw this question, but its answer does not work in Java (see the examples). Another solution does not work for me either (see the third line of the example).
For example
Pattern.compile("/^$|\\s+/").matcher(" ").matches() - false
Pattern.compile("/^$|\\s+/").matcher(" a").matches()- false
Pattern.compile("^(?=\\s*\\S).*$").matcher("\t\n a").matches() - false
return false in both cases.
P.S. If something is not clear ask me questions.
UPDATED
I want to use this regular expression in a @Pattern annotation without creating a custom annotation and a programmatic validator for it. That's why I want a "plain" regexp solution that does not rely on the find function.
It's not clear what you mean by negation.
If you mean "a string that contains at least one non-blank character," then you can use this:
Pattern.compile("\\S").matcher(str).find()
If it's really necessary to use matches, then you can do it with this.
Pattern.compile("\\A\\s*\\S.*\\Z").matcher(str).matches()
This just matches 0 or more whitespace characters, followed by a non-whitespace character, followed by any characters up to the end of the string (excluding line terminators unless DOTALL mode is enabled).
If you mean "a string that is all non-blank with at least one such character," then you can use this:
Pattern.compile("\\A\\S+\\Z").matcher(str).matches()
You need to study the Java regex syntax. In Java, regular expressions are compiled from strings, so there's no need for special delimiters like /.../ or %r{...} as you'll see in other languages.
How about this:
if(!string.trim().isEmpty()) {
// do something
}
Use the regex \s, which matches a whitespace character: [ \t\n\x0B\f\r].
Pattern.compile("\\s")
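The find-based and matches-based checks can be compared on the inputs from the question. This is a sketch; the class and method names are invented, and the (?s) flag on the full-match pattern is an added assumption so that text containing line breaks after the first visible character is also accepted:

```java
import java.util.regex.Pattern;

public class NotBlank {
    // find() only needs one non-whitespace character anywhere in the input.
    private static final Pattern HAS_NON_BLANK = Pattern.compile("\\S");

    // matches() must cover the whole input: optional leading whitespace,
    // one non-whitespace character, then anything (DOTALL via (?s)).
    private static final Pattern FULL_MATCH = Pattern.compile("(?s)\\s*\\S.*");

    static boolean byFind(String s) {
        return HAS_NON_BLANK.matcher(s).find();
    }

    static boolean byMatches(String s) {
        return FULL_MATCH.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(byFind(" "));          // false
        System.out.println(byFind("\t\n a"));     // true
        System.out.println(byMatches(" "));       // false
        System.out.println(byMatches("\t\n a"));  // true
        System.out.println(byMatches(" a\nb"));   // true
    }
}
```

Because the full-match pattern works with matches(), the same regex string should be usable wherever whole-input matching is applied.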

Java: validating a certain string pattern

I am trying to validate a string in an 'iterative way' and all my attempts just fail!
I find it a bit complicated, and I'm guessing you could teach me how to do it right.
I assume most of you will suggest using regex patterns, but I don't really know how. And in general, how can a regex be defined for infinite "sets"?
The string I want to validate is
"ANYTHING|NUMBER_ONLY,ANYTHING|NUMBER_ONLY..."
for example: "hello|5,word|10" and "hello|5,word|10," are both valid.
Note: I don't mind whether the string ends with or without a comma ','.
The Kleene star (*) lets you define "infinite sets" in regular expressions. The following pattern should do the trick:
[^,|]+\|\d+(,[^,|]+\|\d+)*,?
A----------B--------------C-
Part A matches the first element. Part B matches any following elements (notice the star). Part C is the optional comma at the end.
WARNING: Remember to escape the backslashes when writing this pattern as a Java string literal.
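Written out as a Java string literal, the pattern above might be used like this (a sketch; the class and method names are hypothetical):

```java
import java.util.regex.Pattern;

public class PairListValidator {
    // Same pattern as above with every backslash doubled for the Java literal:
    // first element, any number of ",element" repetitions, optional final comma.
    private static final Pattern PAIRS =
            Pattern.compile("[^,|]+\\|\\d+(,[^,|]+\\|\\d+)*,?");

    static boolean isValid(String s) {
        // matches() requires the whole string to fit the pattern.
        return PAIRS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("hello|5,word|10"));   // true
        System.out.println(isValid("hello|5,word|10,"));  // true
        System.out.println(isValid("hello|five"));        // false
        System.out.println(isValid("hello|5,,word|10"));  // false
    }
}
```

Compiling the Pattern once as a constant avoids recompiling it on every call, which String.matches would do.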
I'd suggest splitting your string into an array using the | delimiter and validating each part separately. Each part (except the first one) should match the following pattern: \d+(,.*)?
UPDATED
Split by , and validate each part with .*\|\d+ (the | must be escaped, otherwise it is treated as alternation).
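A rough sketch of the split-by-comma variant, with the pipe escaped so it is read as a literal character rather than as alternation (class and method names are made up; rejecting input that contains no elements at all is an added assumption):

```java
public class SplitValidator {
    static boolean isValid(String s) {
        // split(",") drops trailing empty strings, so one final comma is tolerated.
        String[] parts = s.split(",");
        if (s.isEmpty() || parts.length == 0) {
            return false; // assumption: at least one element is required
        }
        for (String part : parts) {
            // Each element: anything, a literal pipe, then digits only.
            if (!part.matches(".*\\|\\d+")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid("hello|5,word|10"));   // true
        System.out.println(isValid("hello|5,word|10,"));  // true
        System.out.println(isValid("hello|5,,word|10"));  // false
        System.out.println(isValid("hello|five"));        // false
    }
}
```

Note that `.*` also accepts an empty name or a name containing further pipes, so this check is somewhat looser than a single pattern built on `[^,|]+`.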

How do I write a regular expression to find the following pattern?

I am trying to write a regular expression to do a find and replace operation. Assume Java regex syntax. Below are examples of what I am trying to find:
12341+1
12241+1R1
100001+1R2
So, I am searching for a string beginning with one or more digits, followed by a "1+1" substring, followed by 0 or more characters. I have the following regex:
^(\d+)(1\\+1).*
This regex will successfully find the examples above, however, my goal is to replace the strings with everything before "1+1". So, 12341+1 would become 1234, and 12241+1R1 would become 1224. If I use the first grouped expression $1 to replace the pattern, I get the wrong result as follows:
12341+1 becomes 12341
12241+1R1 becomes 12241
100001+1R2 becomes 100001
Any ideas?
Your existing regex works fine, just that you are missing a \ before \d
String str = "100001+1R2";
str = str.replaceAll("^(\\d+)(1\\+1).*","$1");
IMHO, the regex is correct.
Perhaps you wrote it wrong in the code. If you want to code the regex ^(\d+)(1\+1).* in a string, you have to write something like String regex = "^(\\d+)(1\\+1).*".
Your output is the result of ^(\d+)(1+1).* replacement, as you miss some backslash in the string (e.g. "^(\\d+)(1\+1).*").
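With the backslashes doubled in the Java literal, the replacement can be checked against the three samples from the question (a sketch; the class and method names are invented):

```java
public class StripFromMarker {
    // Regex ^(\d+)(1\+1).* written as a Java string literal.
    // \d+ is greedy, but it backtracks by one digit so that "1+1" can match.
    static String stripFromMarker(String s) {
        return s.replaceAll("^(\\d+)(1\\+1).*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(stripFromMarker("12341+1"));    // 1234
        System.out.println(stripFromMarker("12241+1R1"));  // 1224
        System.out.println(stripFromMarker("100001+1R2")); // 10000
    }
}
```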
Your regex looks fine to me - I don't have access to java but in JavaScript the code..
"12341+1".replace(/(\d+)(1\+1)/g, "$1");
Returns 1234 as you'd expect. This works on a string with many 'codes' in too e.g.
"12341+1 54321+1".replace(/(\d+)(1\+1)/g, "$1");
gives 1234 5432.
Personally, I wouldn't use a Regex at all (it'd be like using a hammer on a thumbtack), I'd just create a substring from (Pseudocode)
stringName.substring(0, stringName.indexOf("1+1"))
But it looks like other posters have already mentioned the non-greedy operator.
In most regex syntaxes you can add a '?' after a '+' or '*' to indicate that you want it to match as little as possible before moving on in the pattern. (Thus: ^(\d+?)(1\+1) matches as few digits as it can until it finds "1+1" and then, NOT INCLUDING the "1+1", it continues matching, whereas your original would see the 1 and match it as well.)
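Both ideas from this answer can be sketched in Java as follows. The class and method names are hypothetical, and returning the input unchanged when "1+1" is absent is an added assumption (a bare substring call would throw in that case):

```java
public class BeforeMarker {
    // Plain string approach: cut everything from the first "1+1" onwards.
    static String bySubstring(String s) {
        int idx = s.indexOf("1+1");
        return idx < 0 ? s : s.substring(0, idx); // assumption: no marker, no change
    }

    // Regex approach with a reluctant quantifier on the digits.
    static String byReluctant(String s) {
        return s.replaceAll("^(\\d+?)(1\\+1).*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(bySubstring("12341+1"));     // 1234
        System.out.println(bySubstring("100001+1R2"));  // 10000
        System.out.println(byReluctant("12241+1R1"));   // 1224
        System.out.println(bySubstring("12345"));       // 12345
    }
}
```

The substring version does not check that the prefix consists of digits, so it fits best when the input format is already known.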
