regular expression with letters, numbers and some punctuation

I am very new to regular expressions. I need to construct a regular expression which, if used in the code below, would produce a newLine containing only letters (upper- and lowercase), digits, #, -, _ and the period (.).
The following expression does not work: ([^\\w][^#][^-][^_][^\\.]). It replaces some of the letters and not all of the unwanted characters. Why doesn't it work?
String line = in.nextLine();
String newLine = line.replaceAll( "([^\\w][^#][^-][^_][^\\.])", " ");

System.out.println(s.replaceAll("[^a-zA-Z0-9-_#.]",""));
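As for why the original pattern fails: [^\\w][^#][^-][^_][^\\.] is five negated classes in a row, so it matches only runs of five consecutive characters, and four of those classes happily accept letters. A single negated class that lists every allowed character avoids both problems. A minimal sketch, where the class name and the sanitize helper are illustrative and the hyphen is moved to the end of the class so it cannot be read as a range:

```java
class AllowedCharsDemo {
    // Removes every character that is not a letter, digit, '#', '-', '_' or '.'.
    // The hyphen comes last inside the class, so it is taken literally.
    static String sanitize(String line) {
        return line.replaceAll("[^a-zA-Z0-9_#.-]", "");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("file_name-1.txt #tag! (draft)"));
    }
}
```

To replace unwanted characters with a space, as in the question, pass " " as the second argument instead of "".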

Related

How to surround all Bracket groups with * in a string

I have been trying to get a string replaceAll to work in Java that was originally from a JavaScript code block. I have the following
String regexSearch = "((?!([ \\*]))|^)\\[[A-Za-z0-9\\s]*\\](?!\\*)"; //Java Version must escape special characters again
String regexReplacement = "*$&*";
String inputString = "This is a User, [USER 1], and a second user [USER 2]";
Pattern p = Pattern.compile(regexSearch);
Matcher m = p.matcher(inputString);
System.out.println(m.replaceAll(regexReplacement));
My desired output is
This is a User, *[USER 1]*, and a second user *[USER 2]*
I keep getting illegal group reference errors.
Requirements are as follows. Any text that is surrounded by square brackets "[" and "]" will be surrounded by "*" while still retaining the brackets. However if within the bracketed text there is a "|" character then this will not apply.

Your initial ((?!([ \*]))|^)\[[A-Za-z0-9\s]*\](?!\*) regex attempts (but fails) to match [...] strings when not enclosed with * chars. In Java, you would write it as
(?<!\*)\[[A-Za-z0-9\s]*](?!\*)
String regexSearch = "(?<!\\*)\\[[A-Za-z0-9\\s]*](?!\\*)";
However, you may use a more lenient expression like
String regexSearch = "\\[[^\\]\\[|]*]";
Or, if you need to keep the original behavior to fail the matches inside asterisks:
String regexSearch = "(?<!\\*)\\[[^\\]\\[|]*](?!\\*)";
See the regex demo.
It matches:
(?<!\*) - a negative lookbehind that fails the match if there is a * char immediately to the left of the current location
\[ - a [ char
[^\]\[|]* - 0 or more chars other than [, ] and |
] - a ] char
(?!\*) - a negative lookahead that fails the match if there is a * char immediately to the right of the current location.
So, it will match from the [ till the closest ] without matching other [ and | inside, i.e. it will match innermost substrings between square brackets. It will also allow any other special and non-special chars inside brackets, like hyphens, apostrophes, etc., whereas [A-Za-z0-9\s] only allowed ASCII letters, digits and whitespace.
Java demo:
String regexSearch = "\\[[^\\]\\[|]*]";
String regexReplacement = "*$0*";
String inputString = "This is a User, [USER 1], and a second user [USER 2] not [USER | 3]";
Pattern p = Pattern.compile(regexSearch);
Matcher m = p.matcher(inputString);
System.out.println(m.replaceAll(regexReplacement));
// => This is a User, *[USER 1]*, and a second user *[USER 2]* not [USER | 3]
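The "illegal group reference" error in the question comes from the replacement string rather than from the pattern: $& is JavaScript's syntax for the whole match, while Java's Matcher.replaceAll expects $0 (or $1, $2, ... for capture groups). A small sketch of the difference, with class and method names made up for illustration:

```java
import java.util.regex.Pattern;

class WholeMatchReferenceDemo {
    private static final Pattern BRACKETED = Pattern.compile("\\[[^\\]\\[|]*]");

    // Wraps each bracket group in asterisks; $0 is Java's reference to the whole match.
    static String surround(String input) {
        return BRACKETED.matcher(input).replaceAll("*$0*");
    }

    // Returns true if Java rejects the JavaScript-style "$&" replacement.
    static boolean javaScriptStyleFails(String input) {
        try {
            BRACKETED.matcher(input).replaceAll("*$&*");
            return false;
        } catch (IllegalArgumentException e) {
            return true; // the exception reported as "illegal group reference"
        }
    }

    public static void main(String[] args) {
        String input = "a [USER 1] b [USER | 3]";
        System.out.println(surround(input));
        System.out.println(javaScriptStyleFails(input));
    }
}
```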

You don't need to worry about matching the whole line, the following is sufficient:
\[(.*?)\]
Replace each match with *[$1]*.
Here's a demo on RegExr.
Further explanation: taking each element in the regex in turn:
\[ - we need to escape the opening square bracket because square brackets are reserved characters in regular expressions (they delimit character classes).
(.*?) - the .*? matches zero or more of any character lazily. This is surrounded in parentheses to indicate it's a capture group.
\] - the closing square bracket (the escape is optional here, but harmless).
We then replace this with an asterisk followed by an open square bracket (*[), the first capture group ($1), and then the closing square bracket and another asterisk (]*).

It can be done as simply as this:
String s = inputString.replaceAll("\\[.*?]", "*$0*");
No capture groups needed.
Result
This is a User, *[USER 1]*, and a second user *[USER 2]*
Explanation
\\[ Match '[', escaped since '[' has special meaning, double-escaped because of Java
.*? Match any text on single line, match as little as possible
] Match ']', no need to escape since it's not in a character class
* Literal '*'
$0 Entire matched text '[XXX]'
* Literal '*'
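The lazy quantifier is what keeps each bracket group separate. A quick comparison on the question's input (the class and method names are illustrative): with a greedy .* the single match runs from the first [ to the last ] on the line.

```java
class LazyVsGreedyDemo {
    // Lazy: stops at the nearest ']', so each bracket group is matched on its own.
    static String lazy(String s) {
        return s.replaceAll("\\[.*?]", "*$0*");
    }

    // Greedy: extends to the last ']' on the line, swallowing everything in between.
    static String greedy(String s) {
        return s.replaceAll("\\[.*]", "*$0*");
    }

    public static void main(String[] args) {
        String input = "This is a User, [USER 1], and a second user [USER 2]";
        System.out.println(lazy(input));
        System.out.println(greedy(input));
    }
}
```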

This should do it.
String.replaceAll -- the first argument is a regex.
The second argument is the replacement string. $0 refers to the entire match; $1 would need a capture group, and this pattern has none.
String regexSearch = "\\[.*?]";
String inputString = "This is a User, [USER 1], and a second user [USER 2]";
inputString = inputString.replaceAll(regexSearch, "*$0*");
System.out.println(inputString);
Prints
This is a User, *[USER 1]*, and a second user *[USER 2]*

Try replacing every [ with *[ and every ] with ]* using the String method .replace(target, replacement) in Java. Note that this ignores the requirement about | inside the brackets.

Scanning letters and floats using the java scanner

I have a string which looks like this:
"m 535.71429,742.3622 55.71428,157.14286 c 0,0 165.71429,-117.14286 -55.71428,-157.14286 z"
and I want the Java Scanner to output the following strings: "m", "535.71429", "742.3622", "55.71428", "157.14286", "c", ...
so everything separated by a comma or a space, but I am having trouble getting it to work.
This is what my code looks like:
Scanner scanner = new Scanner(path_string);
scanner.useDelimiter(",||//s");
String s = scanner.next();
if (s.equals("m")){
s = scanner.next();
point[0] = Float.parseFloat(s);
s = scanner.next();
point[1] = Float.parseFloat(s);
....
but the strings that come out are: "m", " ", "5", "3", ...

I think the trouble is with //s (it should be \\s), and the doubled | does not help either. You have to use this pattern:
scanner.useDelimiter("(,|\\s)");
Regex patterns:
abc… Letters
123… Digits
\d Any Digit
\D Any Non-digit character
. Any Character
\. Period
[abc] Only a, b, or c
[^abc] Not a, b, nor c
[a-z] Characters a to z
[0-9] Numbers 0 to 9
\w Any word character (letter, digit or underscore)
\W Any non-word character
{m} m Repetitions
{m,n} m to n Repetitions
* Zero or more repetitions
+ One or more repetitions
? Optional character
\s Any Whitespace
\S Any Non-whitespace character
^…$ Starts and ends
(…) Capture Group
(a(bc)) Capture Sub-group
(.*) Capture all
(ab|cd) Matches ab or cd
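We double the \ because the backslash is the escape character in a Java string literal, so "\\s" reaches the regex engine as \s; | has no special meaning in a string literal and needs no escaping.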
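It may help to see why the original delimiter produced single characters: in ",||//s" the doubled | leaves an empty alternative, which can match between any two characters, and //s is two literal slashes followed by the letter s rather than the whitespace class. A minimal sketch with a character-class delimiter follows; [,\\s]+ is a variation on the pattern above that also tolerates runs of separators, and the class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class PathTokenDemo {
    // Splits an SVG-style path string on commas and whitespace.
    static List<String> tokens(String path) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(path)) {
            scanner.useDelimiter("[,\\s]+"); // one or more commas or whitespace characters
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("m 535.71429,742.3622 55.71428,157.14286 c 0,0 z"));
    }
}
```

Each numeric token can then be passed to Float.parseFloat or Double.parseDouble, while the letter tokens are handled separately.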

If you want the output to be strings, Float.parseFloat(s) is of no use for your problem. Is your array a float array?
Because if it is, you should not get any output but a NumberFormatException, because the string "m" cannot be parsed into a float.
Furthermore, to solve the problem of the single values, you could use a StringBuilder which constructs your numbers and ignores the letters and commas. Any special handling of the letters would have to be implemented separately.
Finally, unless float is absolutely necessary, use double instead. Its extra precision is safer and might save you from further problems in your program!

Why isn't my regex matching uppercase characters and underscores?

I have the following Java code:
public static void main(String[] args) {
String var = "ROOT_CONTEXT_MATCHER";
boolean matches = var.matches("/[A-Z][a-zA-Z0-9_]*/");
System.out.println("The value of 'matches' is: " + matches);
}
This prints: The value of 'matches' is: false
Why doesn't my var match the regex? If I am reading my regex correctly, it matches any String:
Beginning with an upper-case char, A-Z; then
Consisting of zero or more:
Lower-case chars a-z; or
Upper-case chars A-Z; or
Digits 0-9; or
An underscore
The String "ROOT_CONTEXT_MATCHER":
Starts with an A-Z char; and
Consists of 19 subsequent characters that are all upper-case A-Z or are an underscore
What's going on here?!?

The issue is with the forward slash characters at the beginning and at the end of the regex. They don't have any special meaning here and are treated as literals. Simply remove them to get it fixed:
boolean matches = var.matches("[A-Z][a-zA-Z0-9_]*");
If you intended to use metacharacters for boundary matching, the correct characters are ^ for the beginning of the line, and $ for the end of the line:
boolean matches = var.matches("^[A-Z][a-zA-Z0-9_]*$");
although these are not needed here because String#matches would match the entire string.
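The last point can be checked directly: String#matches succeeds only if the pattern covers the whole input, whereas Matcher#find looks for the pattern anywhere in it. A small sketch (class and method names are illustrative):

```java
import java.util.regex.Pattern;

class MatchesVsFindDemo {
    private static final String IDENTIFIER = "[A-Z][a-zA-Z0-9_]*";

    // True only when the entire string fits the pattern.
    static boolean wholeStringMatches(String s) {
        return s.matches(IDENTIFIER);
    }

    // True when any substring fits the pattern.
    static boolean foundAnywhere(String s) {
        return Pattern.compile(IDENTIFIER).matcher(s).find();
    }

    // The question's version: the slashes are literal characters that must appear in the input.
    static boolean withSlashes(String s) {
        return s.matches("/" + IDENTIFIER + "/");
    }

    public static void main(String[] args) {
        System.out.println(wholeStringMatches("ROOT_CONTEXT_MATCHER"));
        System.out.println(wholeStringMatches("root_CONTEXT"));
        System.out.println(foundAnywhere("root_CONTEXT"));
        System.out.println(withSlashes("ROOT_CONTEXT_MATCHER"));
    }
}
```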

You need to remove the regex delimiters, i.e. /, from the Java regex:
boolean matches = var.matches("[A-Z][a-zA-Z0-9_]*");
That can be further shortened to:
boolean matches = var.matches("[A-Z]\\w*");
since \\w is equivalent to [a-zA-Z0-9_] (a word character).

Java Regular Expression: match any number of digits in round brackets if the closing bracket is the last char in the String

I need some help to save my day (or my night). I would like to match:
Any number of digits
Enclosed by round brackets "()" [The brackets contain nothing else than digits]
If the closing bracket ")" is the last character in the String.
Here's the code I have come up with:
// this how the text looks, the part I want to match are the digits in the brackets at the end of it
String text = "Some text 45 Some text, text and text (1234)";
String regex = "[no idea how to express this.....]"; // this is where the regex should be
Pattern regPat = Pattern.compile(regex);
Matcher matcher = regPat.matcher(text);
String matchedText = "";
if (matcher.find()) {
matchedText = matcher.group();
}
Please help me out with the magic expression. I have only managed to match any number of digits, but not when they are enclosed in brackets and at the end of the line...
Thanks!

You can try this regex:
String regex = "\\(\\d+\\)$";

If you need to extract just the digits, you can use this regex:
String regex = "\\((\\d+)\\)$";
and get the value of matcher.group(1). (Explanation: The ( and ) characters preceded by backslashes match the round brackets literally; the ( and ) characters not preceded by backslashes tell the matcher that the part inside, i.e. just the digits, forms a capture group, and the part matching the group can be obtained by matcher.group(1), since this is the first, and only, capture group in the regex.)
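Put together with the question's code, the capture-group version might look like the sketch below. The class name, the trailingDigits helper and the use of Optional are illustrative choices, not part of the original answers:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingDigitsDemo {
    // Escaped parentheses match literally; the unescaped pair captures the digits.
    private static final Pattern TRAILING = Pattern.compile("\\((\\d+)\\)$");

    // Returns the digits inside a final "(...)" group, if the text ends with one.
    static Optional<String> trailingDigits(String text) {
        Matcher matcher = TRAILING.matcher(text);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(trailingDigits("Some text 45 Some text, text and text (1234)"));
        System.out.println(trailingDigits("Some text (12) and more"));
    }
}
```

By default $ in Java also matches just before a final line terminator; if the closing bracket must be the very last character, \\z is the stricter anchor.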

This is the required regex for your condition
\\(\\d+\\)$

Java regex space matches $ and * ??

I'm not understanding something about how Java's regex matching for \s works. In the simple class below, \s seems to match [at least] $ and *, which is worrisome. When I don't include \s, the last char of each word gets chopped. And, neither regex seems to catch the ending " in the string. Would somebody please explain what's going on? Or point me to a useful resource? Thanks.
public class SanitizeText {
public static void main(String[] args)
{
String s = "123. ... This is Evil !##$ Wicked %^&* _ Mean ()+<> and ;:' - Nasty. \\ =\"";
String t = "123. ... This is Evil !##$ Wicked %^&* _ Mean ()+<> and ;:' - Nasty. \\ =\"";
s = s.replaceAll(".[^\\w\\s.]", " "); // Does the \s match non-space chars? Sees like at least $ and * are matched.
s = s.replaceAll(" {2,}", " ");
t = t.replaceAll(".[^\\w.]", " "); // Why does this regex chopping the trailing char of each word ??
t = t.replaceAll(" {2,}", " ");
System.out.println ("s: " + s);
System.out.println ("t: " + t);
}
}
// produces:
// s: 123. ... This is Evil $ Wicked * _ Mean and Nasty . "
// t: 123 .. Thi i Evi Wicke Mea an Nast "

\\s does not match non-space chars.
The regex .[^\\w\\s.] will match Any character, followed by a non-word, non-space, non-period character.
It seems to work exactly like that to me.

Answer to Why does this regex chopping the trailing char of each word ??
.[^\\w.] matches any character (the .) followed by a non-word, non-dot character and replaces both with a space. So it matches the last letter of each word together with the following whitespace.
Answer to Does the \s match non-space chars? Sees like at least $ and * are matched.
No. You are matching a char (.) followed by a non word, non whitespace character. So two characters each time.
.[^\\w\\s.]
will match on
Wicked %^&* _
      ^^       (1. the space and %)
        ^^     (2. ^ and &)
and the * is not matched, because it is followed by whitespace; therefore it is not replaced.
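The two-character behaviour is easy to reproduce on short inputs. In the sketch below (class and method names are illustrative), dropping the leading . so that only the unwanted character is replaced is a suggested fix, not something the answers above state:

```java
class TwoCharMatchDemo {
    // The question's first pattern: consumes one extra character before each unwanted one.
    static String dotPlusClass(String s) {
        return s.replaceAll(".[^\\w\\s.]", " ");
    }

    // The question's second pattern: whitespace also counts as unwanted,
    // so the letter before each space is consumed as well.
    static String dotPlusClassNoWhitespace(String s) {
        return s.replaceAll(".[^\\w.]", " ");
    }

    // Suggested variant: replace only the unwanted character itself.
    static String classOnly(String s) {
        return s.replaceAll("[^\\w\\s.]", " ");
    }

    public static void main(String[] args) {
        System.out.println(dotPlusClass("ab!c"));
        System.out.println(dotPlusClassNoWhitespace("Evil Wicked"));
        System.out.println(classOnly("ab!c"));
    }
}
```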
