Why do strings with newlines not match regular expressions in Java?

I have a String string that contains a newline (\n). When I try to match it with a regular expression pattern it returns false, although there should be a match.
package com.stackoverflow;

public class ExgExTest {
    public static void main(String[] args) {
        String pattern = ".*[0-9]{2}[A-Z]{2}.*";
        String string = "123ABC\nDEF";
        if (string.matches(pattern)) {
            System.out.println("Matches.");
        } else {
            System.out.println("Does not match.");
        }
    } // END: main()
} // END: class
How can I match multiline strings with a regular expression?

You need to use the DOTALL (s) flag for this:
String pattern = "(?s).*[0-9]{2}[A-Z]{2}.*";
Note the (?s), which makes the dot match newlines as well.
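As a minimal sketch of the same idea, DOTALL can also be switched on with the Pattern.DOTALL compile flag instead of the inline (?s). The class name DotAllDemo and its two helper methods are made up for this illustration; only Pattern, Matcher and String.matches come from the JDK.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
class DotAllDemo {
    static final String REGEX = ".*[0-9]{2}[A-Z]{2}.*";

    // Default mode: '.' does not match line terminators, so a full-string
    // match fails as soon as the input contains '\n'.
    static boolean matchesDefault(String input) {
        return Pattern.compile(REGEX).matcher(input).matches();
    }

    // DOTALL mode: '.' also matches '\n' (same effect as the inline (?s) flag).
    static boolean matchesDotAll(String input) {
        return Pattern.compile(REGEX, Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "123ABC\nDEF";
        System.out.println(matchesDefault(input));          // false
        System.out.println(matchesDotAll(input));           // true
        System.out.println(input.matches("(?s)" + REGEX));  // true
    }
}
```

The compile flag is convenient when the pattern text comes from somewhere else and should not be modified; the inline (?s) is convenient with String.matches, which takes no flags.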

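Pattern.quote(pattern) escapes all special characters in a pattern so that it is matched as a literal string. It is the right tool for matching literal text, but it does not help here, because this pattern relies on its metacharacters.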
Documentation.

Related

Java Regex to replace a pattern in a certain string

I want to replace a word starting with # in a string that contains a set of words with the same word, with the # omitted.
Example:
"word1 word2 #user" should be replaced with "word1 word2 user".
Can someone help me?
You can use regex. Let's start with
yourText = yourText.replaceAll("#(\\S+)", "$1");
In the regex:
\S represents any non-whitespace character
+ represents one or more
\S+ represents one or more non-whitespace characters
(\S+) the parentheses create a group containing one or more non-whitespace characters; this group is indexed as 1
In the replacement:
$1 lets us use the content of group 1.
In other words, it finds # followed by non-whitespace characters and replaces the match with just the non-whitespace part.
But this solution doesn't require # to be at the start of a word. To enforce that, we can check whether # is preceded by
whitespace \s,
or the start of the string ^.
To test whether something precedes our element without including it in the match, we can use a look-behind (?<=...).
So our final solution can look like
yourText = yourText.replaceAll("(?<=^|\\s)#(\\S+)", "$1");
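The difference between the two replacements above can be sketched as follows. The class name HashPrefixDemo and the method names are hypothetical; the two regexes are the ones from the answer.

```java
// Hypothetical demo class comparing the two regexes from the answer.
class HashPrefixDemo {
    // Removes '#' wherever it is followed by non-whitespace characters.
    static String stripAnywhere(String text) {
        return text.replaceAll("#(\\S+)", "$1");
    }

    // Removes '#' only when it is at the start of the string or preceded by whitespace.
    static String stripAtWordStart(String text) {
        return text.replaceAll("(?<=^|\\s)#(\\S+)", "$1");
    }

    public static void main(String[] args) {
        System.out.println(stripAnywhere("word1 word2 #user"));    // word1 word2 user
        System.out.println(stripAnywhere("a#b #c"));               // ab c
        System.out.println(stripAtWordStart("a#b #c"));            // a#b c
    }
}
```

Only the look-behind version leaves a # in the middle of a word untouched.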
Yes, use String.replaceAll():
String foo = "#user";
foo = foo.replaceAll("#", "");
Your use case is not very clear, but here are my assumptions with a code example:
omit all symbols with the replaceAll function
omit just the first symbol with the substring function
import java.util.Arrays;
import java.util.List;

public class TestRegex {
    public static void main(String[] args) {
        String omitInStart = "#user";
        String omitInMiddle = "#user";
        String omitInEnd = "#user";
        String omitFewSymbols = "#us#er";
        List<String> listForOmit = Arrays.asList(omitInStart, omitInMiddle, omitInEnd, omitFewSymbols);
        // Remove every '#' from each string.
        listForOmit.forEach(e -> System.out.println(omitWithReplace(e)));
        // Remove only the first character of each string.
        listForOmit.forEach(e -> System.out.println(omitFirstSymbol(e)));
    }

    private static String omitFirstSymbol(String stringForOmit) {
        return stringForOmit.substring(1);
    }

    private static String omitWithReplace(String stringForOmit) {
        String symbolForOmit = "#";
        return stringForOmit.replaceAll(symbolForOmit, "");
    }
}

How to check specific special character in String

I have the String value below. How can I find only these four kinds of special characters, [], :, {} and - (square brackets, colon, curly brackets and hyphen), in a given String?
String str = "[1-10],{10-20},dhoni:kholi";
Kindly help me as I am new to Java.
I think you can use a regular expression like this.
class MyRegex
{
    public static void main (String[] args) throws java.lang.Exception
    {
        String str = "[1-10],{10-20},dhoni:kholi";
        String text = str.replaceAll("[a-zA-Z0-9]",""); // replacing all numbers and alphabets with ""
        System.out.print(text); // result string
    }
}
Hope this will help you.
If it is only these characters that you want to check, you can use the String.replaceAll method with a regular expression:
System.out.println("[Hello {}:-,World]".replaceAll("[^\\]\\[:\\-{}]", ""));
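If the goal is to test whether any of those characters occur, rather than to strip everything else, a character class with Matcher.find() is one possible approach. This is only a sketch: the class SpecialCharDemo and its methods are invented for illustration, and the character set is assumed to be exactly [ ] { } : and -.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the set of "special" characters is an assumption.
class SpecialCharDemo {
    // Character class for [ ] { } : and -; the dash is placed last so it is literal.
    private static final Pattern SPECIAL = Pattern.compile("[\\[\\]{}:-]");

    // True if at least one of the special characters occurs in the input.
    static boolean containsSpecial(String s) {
        return SPECIAL.matcher(s).find();
    }

    // Collects the special characters in the order in which they occur.
    static String onlySpecial(String s) {
        StringBuilder found = new StringBuilder();
        Matcher m = SPECIAL.matcher(s);
        while (m.find()) {
            found.append(m.group());
        }
        return found.toString();
    }

    public static void main(String[] args) {
        String str = "[1-10],{10-20},dhoni:kholi";
        System.out.println(containsSpecial(str)); // true
        System.out.println(onlySpecial(str));     // [-]{-}:
    }
}
```

Unlike the first replaceAll answer, this does not keep the commas, because the comma is not in the character class.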

Reading patterns from a file vs. string literals

I have a problem with my regex. I used the following code to get all my regexes out of an ArrayList, compile them and search for matches:
public boolean match(String command) {
    for (String regex : regexA) {
        System.out.println(regex);
        Pattern regPatter = Pattern.compile(regex);
        Matcher regMatcher = regPatter.matcher(command);
        if (regMatcher.find())
            return true;
    }
    return false;
}
I test it like this:
public static void main(String[] args) {
    RegexMatcher reg = new RegexMatcher(new File("C:\\Users\\XXX\\Desktop\\regex.txt"));
    System.out.println(reg.match("password cisco"));
}
It prints the following:
pas[a-z]\\s*\\w+
er\\w*\\s+(?!s).*
us[a-z]*\\s+((?!cisco).)*$
tr[a-z]*\\s+i[a-z]*\\s+\\w*\\s*
f[a-z]*\\s+f.*\\s*
en[a-z]*\\s+v.*
false
It returns false. But if I do it differently, like this, it works:
public boolean match(String command) {
    Pattern regPatter = Pattern.compile("pas[a-z]\\s*\\w+");
    Matcher regMatcher = regPatter.matcher(command);
    if (regMatcher.find())
        return true;
    return false;
}
So my problem is that if I enter the string directly in Pattern.compile() it works, but if I do it as in my first match() method, it doesn't.
Your regex.txt file should contain just single backslashes (\), not double ones, i.e. it should be:
pas[a-z]\s*\w+
er\w*\s+(?!s).*
us[a-z]*\s+((?!cisco).)*$
tr[a-z]*\s+i[a-z]*\s+\w*\s*
f[a-z]*\s+f.*\s*
en[a-z]*\s+v.*
In Java string literals, backslashes are used to "escape" special characters, e.g. "\n" results in a string containing just a single newline character, not a "\" followed by an "n".
Similarly, the double backslash "\\" results in a string containing a single backslash. That is what you want for a regex.
Files don't need to escape anything (they have newlines etc. already encoded), so they don't need to escape backslashes, which is why they only need single ones.
In a string literal, backslashes must be escaped. That means that the string \foo, when written as a string literal in Java source code, must be written "\\foo".
Your second example uses the string literal "pas[a-z]\\s*\\w+", which in fact corresponds to the actual string pas[a-z]\s*\w+. But the string in the list isn't that string; it is pas[a-z]\\s*\\w+.
To put it simply, regexes read from a file are escaped differently from regexes in a Java string literal. For example, the correct string literal is "\\w", but the correct line in the file is
\w
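The effect can be reproduced with a temporary file, as in the sketch below. The class FilePatternDemo and the method matchesAny are hypothetical stand-ins for the RegexMatcher class in the question, whose source is not shown; the file is a throwaway temp file rather than the regex.txt from the question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for the RegexMatcher class from the question.
class FilePatternDemo {
    // True if any regex (one per line, as read from a file) is found in the command.
    static boolean matchesAny(List<String> regexLines, String command) {
        for (String regex : regexLines) {
            if (Pattern.compile(regex).matcher(command).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("regex", ".txt");

        // The literal "pas[a-z]\\s*\\w+" is the text pas[a-z]\s*\w+ (single
        // backslashes), which is what the line in the file should look like.
        Files.write(file, Arrays.asList("pas[a-z]\\s*\\w+"));
        System.out.println(matchesAny(Files.readAllLines(file), "password cisco")); // true

        // A file line with doubled backslashes makes the regex engine look for
        // literal '\' characters, which the command does not contain.
        Files.write(file, Arrays.asList("pas[a-z]\\\\s*\\\\w+"));
        System.out.println(matchesAny(Files.readAllLines(file), "password cisco")); // false

        Files.delete(file);
    }
}
```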

Java class validating a Pattern expression compile

I've written a really simple regular expression to validate a phone number, and I can see that it works in the regex engine provided by zytrax.com. When I use it in the class below to compile a pattern, I get an error about the escaped characters in the string passed to Pattern.compile.
package Test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindMainTestExcercisePN {
    private static String phone;
    private static Matcher matcher;

    private boolean getCheckNumber(String pn) {
        boolean valid = matcher.matches();
        return valid;
    }

    private void PhoneNumber(String input) {
        Pattern pattern = Pattern.compile("^(?:(?:\\+?\\s*1\\s*(?:[.-\\s*]?)(?:[.\\s*-]?))?(?:(\\s*([0-9]|[0-9]|[0-9])\\s*)|([0-9]|[0-9]|[0-9]))\\s*(?:[.-\\s*]?)?)?([0-9]|[0-9]|[0-9]{2})\\s*(?:[.\\s*-]?)(?:[.-\\s*]?)?([0-9]|[0-9]|[0-9]|[0-9]{4})\\s*");
        matcher = pattern.matcher(input);
    }

    public static void main(String[] a) {
        FindMainTestExcercisePN ex15 = new FindMainTestExcercisePN();
        phone = "1-098-234-5454";
        ex15.PhoneNumber(phone);
        boolean bool = ex15.getCheckNumber(phone);
        System.out.println("The number is valid= " + bool);
    }
}
If you take out the escapes it works just fine (for example, 1-345-345-3324), so any suggestions, please?
This expression is illegal:
[.-\\s*]
In a character class, the dash character is a range operator, e.g. [0-9] means "any character in the range 0 to 9". Here, however, you have coded a range .-\s, which attempts to express "any character in the range dot to 'any whitespace'", which is clearly nonsense.
To code a literal dash in a character class, code it first or last.
If the intention of this expression is "a dot, dash, whitespace or star", then code:
[.\\s*-]
If the star is not intended as a literal, but you want to express "a dot or dash, or any number of whitespace", use this:
([.-]?|\\s*)
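A quick way to check which of these character classes Java accepts is to compile them and catch PatternSyntaxException. This is a small sketch; the class CharClassDashDemo and the helper compiles are names invented for the example.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper for trying out character classes.
class CharClassDashDemo {
    // Returns true if the regex compiles, false if Pattern rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Dash between '.' and \s is read as a range operator and is rejected.
        System.out.println(compiles("[.-\\s*]"));      // false
        // Dash coded last is a literal dash.
        System.out.println(compiles("[.\\s*-]"));      // true
        System.out.println("-".matches("[.\\s*-]"));   // true
        System.out.println("*".matches("[.\\s*-]"));   // true
    }
}
```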
Your method getCheckNumber always returns true.

java regular expression returning false

I am a newbie to Java regular expressions. I wrote the following code to validate numbers: if we enter any non-digit input, it should return false. For me, the code below always returns false. What's wrong here?
package regularexpression;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberValidator {
    private static final String NUMBER_PATTERN = "\\d";
    Pattern pattern;

    public NumberValidator() {
        pattern = Pattern.compile(NUMBER_PATTERN);
    }

    public boolean validate(String line){
        Matcher matcher = pattern.matcher(line);
        return matcher.matches();
    }

    public static void main(String[] args) {
        NumberValidator validator = new NumberValidator();
        boolean validate = validator.validate("123");
        System.out.println("validate:: "+validate);
    }
}
From Java documentation:
The matches method attempts to match the entire input sequence against the pattern.
Your regular expression matches a single digit, not a number. Add + after \\d to match one or more digits:
private static final String NUMBER_PATTERN = "\\d+";
As a side note, you can combine initialization and declaration of pattern, making the constructor unnecessary:
Pattern pattern = Pattern.compile(NUMBER_PATTERN);
matches "returns true if, and only if, the entire region sequence matches this matcher's pattern."
The string is 3 digits, which doesn't match the pattern \d, meaning 'a digit'.
Instead you want the pattern \d+, meaning 'one or more digits.' This is expressed in a string as "\\d+"
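The difference between matches() (whole input) and find() (any substring) can be seen side by side in the sketch below. The class MatchesVsFindDemo and its helper methods are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class contrasting Matcher.matches() and Matcher.find().
class MatchesVsFindDemo {
    // True only if the entire input matches the regex.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if any part of the input matches the regex.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("\\d", "123"));    // false: one digit is not the whole input
        System.out.println(wholeMatch("\\d+", "123"));   // true
        System.out.println(partialMatch("\\d", "123"));  // true: a digit occurs somewhere
        System.out.println(wholeMatch("\\d+", "12a"));   // false: 'a' is not a digit
    }
}
```

For validation, matches() with \d+ is the appropriate choice, since find() would also accept input such as 12a.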
