Regex matching a space a digit and 8 characters - java

I want to match a string containing:
a space
any number of digits
a space
1-8 characters (alphanumeric and special characters)
Example:
01 Stack
This is what I tried:
\\s\\d+\\s[^.]{1, 8} (here I tried "anything except .")

Try this, to catch (and restrict to) the punctuation and alphanumerics: \s\d+\s[\p{Punct}\p{Alnum}]{1,8}; wrap it all in ^...$ if you want the begin/end line anchors.
If "any number of digits" means one or more digits, then the pattern above is fine. If it means zero or more digits, then the \d+ needs to become \d*.
As an aside, the pattern [^.] will match anything that's not a period. It includes a bit too much, I think, and excludes a bit too much. So I'm opting for the more specific pattern [\p{Punct}\p{Alnum}].
See documentation here.
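A minimal sketch of how the suggested pattern behaves in Java. The class and method names are made up for illustration, and matches() is used in place of explicit ^...$ anchors, since it already requires the whole input to match:

```java
import java.util.regex.Pattern;

class SpaceDigitsToken {
    // Pattern from the answer above; matches() anchors it to the whole input.
    static final Pattern P =
        Pattern.compile("\\s\\d+\\s[\\p{Punct}\\p{Alnum}]{1,8}");

    // Hypothetical helper: true if the whole string fits the pattern.
    static boolean isValid(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid(" 01 Stack"));       // true
        System.out.println(isValid("01 Stack"));        // false: no leading whitespace
        System.out.println(isValid(" 01 Overflowing")); // false: 11 characters after the space
    }
}
```

Note that the question's own example, 01 Stack, only matches if it really starts with a space.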

Try \\s\\d+\\s[^.]{1,8}? It looks like the only problem here is a superfluous space.
Also, \\S matches everything except whitespace, [^ ] matches everything except a space, and . matches everything (except line terminators, by default).

I don't understand the use of [^.]. Outside a character class, . matches "any character", but inside square brackets it is a literal period, so [^.] matches any character except a period, spaces included. If the intent is to match non-space characters, use \\S instead.

Related

Why adding white space makes my regex wrong?

(^\s*\d+\)(.*) | ) | (^\s*Q\d+\.\s*(.*))
The above regex is not matching Q1. qeqwewqeqeq qerqer
But If I remove white space before and after |
(^\s*\d+\)(.*) | )|(^\s*Q\d+\.\s*(.*))
It matches my string.
What does white space mean? Is it equal to \s? It affects my readability.
Yes, whitespace affects your regex. No, it is not equivalent to \s.
The \s shorthand character class is equivalent to the character class [ \t\r\n\f] - i.e. a character class that will match any whitespace character. So, while your formatting spaces are included in \s, they are not equivalent to it.
As has been said in the comments, literal whitespace is important in regexes. In fact, I believe it's causing an error in your first alternate (the sub-pattern (^\s*\d+\)(.*) | )).
If I'm reading the intent of that sub-pattern right, it's supposed to match text of the form
2) some_text
But it will:
Only match this text if it's followed by a space
Also match a single literal space
A better way to construct this sub-pattern would be (^\s*\d+\)(.*)), disposing of the end space and the alternation altogether. Furthermore, in order to improve readability, we can do this:
(^\s*(?:Q\d+\.|\d+\))\s*(.*))
Which only alternates on the question number format, rather than the whole pattern.
Demo on Regex101
The contents of a regex are 100% applicable to the determination of whether or not an input matches. Your imagination does not change regex processing.
The regex "\dignore this part\d" will not match the input "12" but will match the input "1ignore this part2". No matter how much you imagine that "ignore this part" will be skipped, it is still part of the regular expression.
In your case, the extra spaces are your form of "ignore this part".
Inside a regex pattern, spaces are meaningful atoms that match spaces. If you need to format your pattern with spaces and tabs and newlines - with whitespace that will not be accounted for by the regex engine - you may use the (?x) modifier, or the Pattern.COMMENTS flag.
Then, to match a literal space in a pattern compiled with the (?x) option, you need to escape it. Or you may consider matching any whitespace with \s:
\s A whitespace character: [ \t\n\x0B\f\r]
Note that in case you add (?U) modifier, Pattern.UNICODE_CHARACTER_CLASS flag, \s will match all Unicode whitespace (like [\p{Zs}\t\r\n]).
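As a rough illustration of the Pattern.COMMENTS behaviour described above, here is a sketch that lays out the simplified question-number pattern with formatting spaces. The class name and the body helper are assumptions made for this example only:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentsFlagDemo {
    // The spaces in this pattern are formatting only, because of Pattern.COMMENTS.
    static final Pattern QUESTION = Pattern.compile(
        "^ \\s* (?: Q\\d+\\. | \\d+\\) ) \\s* (.*)",
        Pattern.COMMENTS);

    // Hypothetical helper: text after the question number, or null if no match.
    static String body(String line) {
        Matcher m = QUESTION.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(body("Q1. qeqwewqeqeq qerqer")); // qeqwewqeqeq qerqer
        System.out.println(body("2) some_text"));           // some_text

        // Without the flag, a space in the pattern is a literal atom.
        System.out.println(Pattern.matches("a b", "ab"));   // false

        // With the flag, an unescaped space is ignored and an escaped one is literal.
        System.out.println(
            Pattern.compile("a b", Pattern.COMMENTS).matcher("ab").matches());    // true
        System.out.println(
            Pattern.compile("a\\ b", Pattern.COMMENTS).matcher("a b").matches()); // true
    }
}
```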

regex for allowing only certain special characters and also including the alphanumerical characters

I'm struggling with regex and need it for a program.
The input should allow only alphanumeric characters plus these special characters: comma, colon, space, /, and -.
I have tried (^[a-zA-Z0-9,:\S/-]*$)
As far as I understand it (please correct me if I'm wrong):
a-zA-Z0-9 - The alphanumerical keys.
,: - Comma and colon
\S - Space
/ - I'm not sure how to represent a forward slash, so I escaped it
- - Dash; also not sure whether it needs to be escaped.
I would appreciate it if this could be corrected, along with an explanation of each part.
Thanks in advance.
You can replace a-zA-Z0-9 with just \\w, which is short for [a-zA-Z_0-9] (note that it also admits the underscore). Furthermore, \\S is any character that is not whitespace; you should use \\s instead. You don't need to escape /, nor - if it's the first or the last character in the class; placed between two characters it would be interpreted as a range, and then you'd have to escape it. So, you can write your regex as ^([\w,:\s/-]*)$
The \S shorthand matches any character except whitespace, just the opposite of what you want. Lowercase \s matches whitespace [\t\v\n\r\f ]. But if you only want spaces, just put a space in the character class.
A hyphen - needs to be escaped inside character classes, unless it's the first or last character in the character class, but you could always escape it just to be sure.
Slashes / don't need to be escaped. They're escaped in other languages where you use them as pattern delimiters. ie: /regex/i.
Besides hyphens and shorthands, only backslashes \\ and closing brackets \] need to be escaped.
Remember that in Java you always need to use double backslashes in string literals (one is consumed by the Java compiler, the other by the regex engine).
Regex
pattern = "^[a-zA-Z0-9 ,:/\\-]*$"
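To see what that pattern accepts, here is a small sketch; the class and method names are illustrative, not from the question. Because of the * quantifier, the empty string is accepted too, which may or may not be what the program needs:

```java
import java.util.regex.Pattern;

class AllowedCharsDemo {
    // Letters, digits, space, comma, colon, slash and hyphen only.
    static final Pattern ALLOWED = Pattern.compile("^[a-zA-Z0-9 ,:/\\-]*$");

    // Hypothetical helper: true if every character is in the allowed set.
    static boolean isAllowed(String input) {
        return ALLOWED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("Room 12, A/B: x-y")); // true
        System.out.println(isAllowed("hello_world"));       // false: underscore
        System.out.println(isAllowed("tab\there"));         // false: tab is not a plain space
        System.out.println(isAllowed(""));                  // true: * allows zero characters
    }
}
```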
Move the Start of Line ^ and End of Line $ outside the group - like
^([a-zA-Z0-9,:\S/-]*)$
That should do it.

How to replace all non-digit characters in a string?

I need to replace all non-digit characters in the string. For instance:
String: 987sdf09870987=-0\\\`42
Replaced: 987**sdf**09870987**=-**0**\\\`**42
That is, every non-digit char sequence is wrapped in ** characters. How can I do that with String::replaceAll()?
(?![0-9]+$).*
the regex doesn't match what I want. How can I do that?
(\\D+)
You can use this and replace with **$1**. See demo.
https://regex101.com/r/fM9lY3/2
You can use a negated character class for a non-digit and use the 0th group back-reference to avoid overhead with capturing groups (it is minimal here, but still is):
String x = "987sdf09870987=-0\\\\\\`42";
x = x.replaceAll("[^0-9]+", "**$0**");
System.out.println(x);
See demo on IDEONE. Output: 987**sdf**09870987**=-**0**\\\`**42.
Also, in Java regex, character classes look neater than multiple escape symbols, that is why I prefer this [^0-9]+ pattern meaning match 1 or more (+) symbols other than (because of ^) digits from 0 to 9 ([0-9]).
A couple of words about your (?![0-9]+$).* regex. It consists of a negative lookahead (?![0-9]+$) that checks if from the current position onward there are no digits only (if there are only digits up to the end of string, the match fails), and .* matching any characters but a newline. You can see example of what it is doing here. I do not think it can help you since you need to actually match non-numbers, not just check if digits are absent.
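The difference can be checked with a short sketch. The class name, helper and sample input are made up for illustration: the asker's pattern grabs the whole string as soon as the lookahead passes, while the negated class isolates each non-digit run.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    static final Pattern ASKER = Pattern.compile("(?![0-9]+$).*");

    // Hypothetical helper: first substring the asker's pattern matches, or null.
    static String firstMatch(String s) {
        Matcher m = ASKER.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The lookahead passes at index 0, so .* swallows digits and non-digits alike.
        System.out.println(firstMatch("987sdf42")); // 987sdf42

        // For comparison, [^0-9]+ matches only the non-digit run.
        System.out.println("987sdf42".replaceAll("[^0-9]+", "**$0**")); // 987**sdf**42
    }
}
```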

Require Help for Regular Expression

I am checking that a JTextField value is in XX.YY.Z format, e.g.
10.01.5
No space is allowed at the beginning or the end.
EDIT:
How can I specify the last character as alphanumeric, i.e. Z can be a digit or a letter?
\d matches a digit, and \. matches a dot.
\d\d\.\d\d\.\d
i.e. "\\d\\d\\.\\d\\d\\.\\d".
I don't have much experience in Java, but this is what I would do in PHP:
^\d\d\.\d\d\.\d$
\d represents one digit, \d\d represents two digits
\. matches a literal dot (an unescaped . would match any character)
^ a caret ensures that the string must start with the number (no spaces at the beginning)
$ a dollar sign ensures that there will be no spaces or other characters at the end.
You could use quantifiers
\d{2}\.\d{2}\.\d
This expresses the same pattern, and your regex becomes easier to read and to change.
more on Quantifiers
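Putting the quantifier version to work in Java might look like the sketch below. The class and method names are assumptions, the text argument stands in for whatever JTextField.getText() returns, and the \p{Alnum} variant is one possible reading of the EDIT (last character may be an ASCII letter or a digit):

```java
import java.util.regex.Pattern;

class VersionFieldCheck {
    // XX.YY.Z with a digit in the last position.
    static final Pattern DIGIT_LAST =
        Pattern.compile("\\d{2}\\.\\d{2}\\.\\d");

    // Variant for the EDIT: the last position may be a digit or an ASCII letter.
    static final Pattern ALNUM_LAST =
        Pattern.compile("\\d{2}\\.\\d{2}\\.\\p{Alnum}");

    // Hypothetical helper; matches() rejects leading or trailing spaces
    // because it requires the whole input to match.
    static boolean isValid(String text, boolean allowLetterLast) {
        Pattern p = allowLetterLast ? ALNUM_LAST : DIGIT_LAST;
        return p.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("10.01.5", false));  // true
        System.out.println(isValid("10.01.A", false));  // false
        System.out.println(isValid("10.01.A", true));   // true
        System.out.println(isValid(" 10.01.5", true));  // false: leading space
        System.out.println(isValid("10x01y5", false));  // false: dots are escaped
    }
}
```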

Help with regex

I'm constructing a regex which will accept at least 1 alpha numerical character and any number of spaces.
Right now I've got...[A-Za-z0-9]+[ \t\r\n]* which I understand to be at least 1 alphanumeric OR at least 1 space. How would I fix this?
EDIT: To answer the comments below I want it to accept strings which contain ATLEAST 1 alphanumeric AND any number of (including no) spaces. Right now it will accept JUST a whitespace.
EDIT2: To clarify, I don't want the any number of whitespace (including 0) to be accepted unless there is at least 1 alphanumeric character
\s*\p{Alnum}[\p{Alnum}\s]*
Your regex, [A-Za-z0-9]+[ \t\r\n]*, requires the string to start with a letter or digit (or, more accurately, it doesn't start matching until it sees one). Adding \s* allows the match to start with whitespace, but you still won't match any alphanumerics after the first whitespace character that follows an alphanumeric (for example, it won't match the xyz in abc xyz). Changing the trailing \s* to [\p{Alnum}\s]* fixes that problem.
On a side note, \p{Alnum} is exactly equivalent to [A-Za-z0-9] in Java, which is not the case in all regex flavors. I used \p{Alnum}, not just because it's shorter, but because it gives more protection from typos like [A-z] (which is syntactically valid, but almost certainly not what the author really meant).
EDIT: Performance should be considered, too. I originally included a + after the first \p{Alnum}, but I realized that wasn't a good idea. If this were part of a longer regex, and the regex didn't match right away, it could end up wasting a lot of time trying to match the same groups of characters with \p{Alnum}+ or [\p{Alnum}\s]*. The leading \s* is okay, though, because \s doesn't match any of the characters that \p{Alnum} matches.
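A quick sketch of how this pattern treats the cases from the question; the class and method names are illustrative only, and matches() is used so the whole input has to fit:

```java
import java.util.regex.Pattern;

class AlnumWithSpaces {
    // Optional leading whitespace, one required alphanumeric,
    // then any mix of alphanumerics and whitespace.
    static final Pattern P =
        Pattern.compile("\\s*\\p{Alnum}[\\p{Alnum}\\s]*");

    // Hypothetical helper: true if the whole string fits the pattern.
    static boolean accepts(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("abc xyz")); // true
        System.out.println(accepts("  a "));    // true
        System.out.println(accepts("   "));     // false: whitespace only
        System.out.println(accepts(""));        // false: no alphanumeric
    }
}
```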
One or more word characters followed by zero or more whitespace characters:
\w+\s*
Hey, try this: ([^\s]+\s*). [^\s] means catch everything that is not whitespace, while \s* means that whitespace is optional (if you really want at least one whitespace character, put + instead of *).
Edit: sorry, mine catches everything, not only alphanumerics (use ([a-zA-Z0-9]+\s) for alphanumerics).
This should do the trick:
\s*\p{Alnum}+\s*
\p{Alnum} is an alphanumeric character: [\p{Alpha}\p{Digit}]
* says "zero or more times"
+ says "at least one" (not "or" as you seem to believe, or is written |)
| means "or"
\s is a whitespace character: [ \t\n\x0B\f\r]
EDIT: To answer the comments below I want it to accept strings which contain AT LEAST 1 alphanumeric AND any number of (including no) spaces.
The pattern I suggested requires at least one alpha numeric character.
EDIT2: To clarify, I don't want the any number of whitespace (including 0) to be accepted unless there is at least 1 alphanumeric character
The pattern I suggested will not accept a string of only whitespace characters.
