Java regex for matching certain parts of JSON substrings - java

I am trying to write a regex that looks for strings with the following pattern:
Begin with an opening brace { followed by a double-quote "
Then allow a string of 1+ alphanumeric characters a-zA-Z0-9
Then another double-quote " followed by a colon : and an opening bracket [
Then allow any string of 0+ alphanumeric characters a-zA-Z0-9
So some strings that would match the regex:
{"hello":[blah
{"hello":[
{"1":[
And some strings that would not match:
{hello:[blah
hello":[
{"2:[
So far, the best I've been able to come up with is:
String regex = "{\"[a-zA-Z0-9]+\":\[[a-zA-Z0-9]*";
if(myString.matches(regex))
// do something
But I know I'm way off base. Can any regex gurus help reel me in? Thanks in advance!

String regex = "{\"[a-zA-Z0-9]+\":\[[a-zA-Z0-9]*";
The problem here is that you need an extra backslash before the square bracket. The regex must contain \[ in order to match a literal square bracket, which means the string literal must contain \\[ so that the backslash itself is escaped for the Java compiler (a lone \[ in a string literal is an illegal escape sequence and will not compile). Similarly, you need to escape the { in the regex, as it is a metacharacter (for bounded repetition counts) and java.util.regex.Pattern rejects an unescaped { at the start of a pattern:
String regex = "\\{\"[a-zA-Z0-9]+\":\\[[a-zA-Z0-9]*";
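As a quick check, the corrected pattern can be run against the sample strings from the question. This is only a minimal sketch: the class name JsonPrefixDemo and the helper isMatch are made up for illustration, and it assumes the whole string must match, as with String.matches.

```java
import java.util.regex.Pattern;

class JsonPrefixDemo {
    // Regex seen by the engine: \{"[a-zA-Z0-9]+":\[[a-zA-Z0-9]*
    // In the Java literal every regex backslash is doubled; \" is a plain string escape.
    static final Pattern PREFIX =
            Pattern.compile("\\{\"[a-zA-Z0-9]+\":\\[[a-zA-Z0-9]*");

    // Hypothetical helper: true only if the entire input matches the pattern.
    static boolean isMatch(String s) {
        return PREFIX.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "{\"hello\":[blah", "{\"hello\":[", "{\"1\":[",  // expected to match
            "{hello:[blah", "hello\":[", "{\"2:["             // expected not to match
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isMatch(s));
        }
    }
}
```

Compiling the pattern once into a static Pattern avoids recompiling it on every call, which String.matches would do.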

Related

Regular expression for matching texts before and after string

I would like to match URL strings which can be specified in the following manner.
xxx.yyy.com (For example, the regular expression should match all strings like 4xxx.yyy.com, xxx4.yyy.com, xxx.yyy.com, 4xxx4.yyy.com, 444xxx666.yyy.com, abcxxxdef.yyy.com etc).
I have tried to use
([a-zA-Z0-9]+$)xxx([a-zA-Z0-9]+$).yyy.com
([a-zA-Z0-9]*)xxx([a-zA-Z0-9]*).yyy.com
But they don't work. Please help me write a correct regular expression. Thanks in advance.
Note: I'm trying to do this in Java.
If you want to make sure xxx is present and allow any non-whitespace characters before and after it, you can use \S* on both sides. If you want to match the whole string, add anchors at the start and end.
Note that the dot must be escaped to match it literally.
^\S*xxx\S*\.yyy\.com$
^ Start of string
\S*xxx\S* Match xxx between optional non whitespace chars
\.yyy Match .yyy
\.com Match .com
$ End of string
In Java, double the backslashes:
String regex = "^\\S*xxx\\S*\\.yyy\\.com$";
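A small sketch of how that Java string could be used, with the example hosts from the question. The class name HostMatchDemo and the helper isMatch are hypothetical; find() is used here so that the ^ and $ anchors in the pattern do the work.

```java
import java.util.regex.Pattern;

class HostMatchDemo {
    // Regex seen by the engine: ^\S*xxx\S*\.yyy\.com$
    static final Pattern HOST = Pattern.compile("^\\S*xxx\\S*\\.yyy\\.com$");

    // Hypothetical helper: the anchors make find() behave like a whole-string match.
    static boolean isMatch(String host) {
        return HOST.matcher(host).find();
    }

    public static void main(String[] args) {
        String[] hosts = {
            "4xxx.yyy.com", "xxx4.yyy.com", "xxx.yyy.com",
            "444xxx666.yyy.com", "abcxxxdef.yyy.com",   // contain xxx: expected true
            "abc.yyy.com", "xxx.yyyXcom"                // expected false
        };
        for (String h : hosts) {
            System.out.println(h + " -> " + isMatch(h));
        }
    }
}
```

Because \S also matches dots, this pattern accepts extra subdomain labels around xxx; the character-class variant is the stricter choice if that matters.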
Or specify the characters on the left and right that you would allow to match in the character class:
^[0-9A-Za-z!@#$%^&*()_+]*xxx[0-9A-Za-z!@#$%^&*()_+]*\.yyy\.com$

Regex-How to prevent repeated special characters?

I don't have any experience with regular expressions. I need a regular expression that does not allow special characters (+-*/& etc.) to be repeated consecutively.
The string can contain digits, letters, and special characters.
This should be valid : abc,df
This should be invalid : abc-,df
I would really appreciate it if you could help me. Thanks in advance.
Two solutions presented so far match a string that is not allowed.
But the title is How to prevent..., so I assume that the regex
should match the allowed string. This means that the regex should:
match the whole string if it does not contain 2
consecutive special characters,
not match otherwise.
You can achieve this by putting together the following parts:
^ - start of string anchor,
(?!.*[...]{2}) - a negative lookahead for 2 consecutive special
characters (marked here as ...), in any place,
.+ - a regex matching the whole (non-empty) string,
$ - end of string anchor.
So the whole regex should be:
^(?!.*[!@#$%^&*()\-_+={}[\]|\\;:'",<.>\/?]{2}).+$
Note that within a character class (between [ and ]) a backslash
must precede the following characters to escape them: - (if it is in
the middle of the sequence), the closing square bracket,
the backslash itself and / (where / is the regex delimiter).
Or if you want to apply the regex to individual words (not the whole
string), then the regex should be:
\b(?!\S*[!@#$%^&*()\-_+={}[\]|\\;:'",<.>\/?]{2})\S+
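The whole-string variant can be sketched in Java roughly as follows. The class name, the SPECIAL constant and the isAllowed helper are assumptions for illustration, and the set of special characters is only modelled on the class above. One Java-specific change is made: java.util.regex treats an unescaped [ inside a character class as the start of a nested class, so the opening bracket is escaped here too.

```java
import java.util.regex.Pattern;

class NoRepeatedSpecialsDemo {
    // Assumed set of "special" characters. Both [ and ] are escaped because
    // Java opens a nested character class on an unescaped [ inside a class.
    static final String SPECIAL = "[!@#$%^&*()\\-_+={}\\[\\]|\\\\;:'\",<.>/?]";

    // ^(?!.*SPECIAL{2}).+$ : non-empty string with no two consecutive specials.
    static final Pattern ALLOWED =
            Pattern.compile("^(?!.*" + SPECIAL + "{2}).+$");

    // Hypothetical helper: true if the whole string is allowed.
    static boolean isAllowed(String s) {
        return ALLOWED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("abc,df"));   // single special characters only
        System.out.println(isAllowed("abc-,df"));  // "-," is two specials in a row
    }
}
```

The negative lookahead rejects the string as soon as any two adjacent special characters exist anywhere, so the trailing .+ only has to confirm the string is non-empty.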
[\,\+\-\*\/\&]{2,} matches two or more consecutive special characters. Add more characters inside the square brackets if you want.
Demo https://regex101.com/r/CBrldL/2
Use the following regex to match the invalid string.
[^A-Za-z0-9]{2,}
[^\w!\s]{2,} This would be the shortest version to match any two consecutive special characters (ignoring spaces).
If you want to count spaces as special characters, use [^\w]{2,}

Regex match double quote which does not follow slash character

I have string like this:
"abcd\" efg\" hi" jklm"
I want to get the substring between the first two " characters that are not part of an escaped \"
For example, in the above string, I want to get abcd\" efg\" hi
Currently, I replace \" with another character, then use the regex "([^"]*)" to extract the substring between the first two " characters. Is there any way to do this with a regex directly, without replacing \" with another character first?
Use this regex:
[^\\]?"(.*?[^\\])"
Explanation:
[^\\]? match an optional single character which is not backslash
"(.*? match a quote followed by anything (non-greedy)
[^\\])" match a quote preceded by anything other than backslash
This regex matches the shortest content between an opening quote and the first closing quote that is not preceded by a backslash.
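In Java, the regex above could be applied along these lines. This is a sketch only: the class name QuotedSubstringDemo and the helper firstQuoted are hypothetical. Note how each regex backslash becomes four characters in the Java literal, two for the string layer and two for the regex layer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedSubstringDemo {
    // Regex seen by the engine: [^\\]?"(.*?[^\\])"
    static final Pattern QUOTED = Pattern.compile("[^\\\\]?\"(.*?[^\\\\])\"");

    // Hypothetical helper: content of the first quoted run, or null if none is found.
    static String firstQuoted(String input) {
        Matcher m = QUOTED.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Runtime value of input: "abcd\" efg\" hi" jklm"
        String input = "\"abcd\\\" efg\\\" hi\" jklm\"";
        System.out.println(firstQuoted(input)); // prints abcd\" efg\" hi
    }
}
```

Since the capture group must end with a non-backslash character, an empty quoted string "" is not matched by this pattern.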

Java regular expression value.split("\\."), "the back slash dot" divides by character?

From what I understand, the backslash dot (\.) means one character of any character? So because backslash is an escape, it should be backslash backslash dot ("\\.")
What does this do to a string? I just saw this in existing code I am working on. From what I understand, it will split the string into individual characters. Why do this instead of using String.toCharArray()? So does this split the string into an array of strings, each containing only one character?
My guess is that you are missing that backslash ('\') characters are escape characters in Java String literals. So when you want to use a '\' escape in a regex written as a Java String you need to escape it; e.g.
Pattern.compile("\."); // Java syntax error
// A regex that matches a (any) character
Pattern.compile(".");
// A regex that matches a literal '.' character
Pattern.compile("\\.");
// A regex that matches a literal '\' followed by one character
Pattern.compile("\\\\.");
The String.split(String separatorRegex) method splits a String into substrings separated by substrings matching the regex. So str.split("\\.") will split str into substrings separated by a single literal '.' character.
The regex . would match any character, as you state. However, an escaped dot \. (written "\\." in a Java string literal) matches only literal dot characters. Thus "192.168.1.1" split on "\\." would result in {"192", "168", "1", "1"}.
Your wording isn't completely clear, but I think this is what you're asking.
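The difference between the two separators can be seen in a short sketch. The class and method names are made up for illustration. With an unescaped dot every character is a separator, so only empty strings remain, and String.split drops trailing empty strings from its result.

```java
import java.util.Arrays;

class SplitDotDemo {
    // Regex \. : split on literal dots only.
    static String[] splitOnLiteralDot(String s) {
        return s.split("\\.");
    }

    // Regex . : every character is a separator, leaving only empty pieces,
    // which split() removes from the end of the result.
    static String[] splitOnAnyChar(String s) {
        return s.split(".");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnLiteralDot("192.168.1.1"))); // [192, 168, 1, 1]
        System.out.println(splitOnAnyChar("192.168.1.1").length);              // 0
    }
}
```

So split("\\.") does not produce single characters; it produces the pieces between the dots.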

How to escape a square bracket for Pattern compilation?

I have comma separated list of regular expressions:
.{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]
I have done a split on the comma. Now I'm trying to match each regex against a generated password. The problem is that Pattern.compile does not like square brackets that are not escaped.
Can someone please give me a simple function that takes a string like [0-9] and returns the escaped string \[0-9\]?
For some reason, the above answer didn't work for me. For those like me who come after, here is what I found.
I was expecting a single backslash to escape the bracket, however, you must use two if you have the pattern stored in a string. The first backslash escapes the second one into the string, so that what regex sees is \]. Since regex just sees one backslash, it uses it to escape the square bracket.
\\]
In regex, that will match a single closing square bracket.
If you're trying to match a newline, on the other hand, a single backslash is enough. There you're using the string escape sequence to insert a newline character into the string. Regex doesn't see \n; it sees the newline character and matches that. The bracket needs two backslashes because \] is not a string escape sequence, it's a regex escape sequence.
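The two escaping layers can be compared side by side in a small sketch. The class name EscapeLayersDemo and the helper found are hypothetical.

```java
import java.util.regex.Pattern;

class EscapeLayersDemo {
    // Hypothetical helper: true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Java literal "\\]" is the two-character regex \] : a literal closing bracket.
        System.out.println(found("\\]", "a]b"));
        // Java literal "\n" is one newline character; the regex engine matches it literally.
        System.out.println(found("\n", "line1\nline2"));
        // Java literal "\\n" is the regex escape \n, which also matches a newline.
        System.out.println(found("\\n", "line1\nline2"));
    }
}
```

All three calls find a match; the difference is only in which layer, the Java compiler or the regex engine, interprets the backslash.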
You can use Pattern.quote(String).
From the docs:
public static String quote(String s)
Returns a literal pattern String for the specified String.
This method produces a String that can be used to create a Pattern that would match the string s as if it were a literal pattern.
Metacharacters or escape sequences in the input sequence will be given no special meaning.
You can use the \Q and \E escape sequences: anything between \Q and \E is treated as a literal.
\Q[0-9]\E
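A short sketch of both quoting approaches, for the case where the text [0-9] really should be matched literally. The class name QuoteDemo and its helpers are made up for illustration; Pattern.quote itself is part of java.util.regex.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Wraps the input so that the regex engine treats it as literal text.
    static String quoted(String literal) {
        return Pattern.quote(literal);
    }

    // Hypothetical helper: searches for the literal text inside the input.
    static boolean containsLiteral(String literal, String input) {
        return Pattern.compile(Pattern.quote(literal)).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(quoted("[0-9]"));                    // prints \Q[0-9]\E
        System.out.println(containsLiteral("[0-9]", "x[0-9]y")); // true: literal text found
        System.out.println(containsLiteral("[0-9]", "x5y"));     // false: not a digit class
        // Writing \Q...\E by hand is equivalent for this input.
        System.out.println(Pattern.compile("\\Q[0-9]\\E").matcher("[0-9]").matches()); // true
    }
}
```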
Pattern.compile() likes square brackets just fine. If you take the string
".{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]"
and split it on commas, you end up with five perfectly valid regexes: the first one matches eight non-line-separator characters, the second matches an ASCII digit, and so on. Unless you really want to match strings like ".{8}" and "[0-9]", I don't see why you would need to escape anything.
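To illustrate, the five rules can be compiled as they are and checked against a password. This is a sketch under assumptions: the class name PasswordRulesDemo and the helper satisfiesAll are hypothetical, and it assumes each rule only needs to match somewhere in the password, hence find() rather than matches().

```java
import java.util.regex.Pattern;

class PasswordRulesDemo {
    static final String RULES = ".{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]";

    // Hypothetical helper: true if every comma-separated rule matches somewhere.
    static boolean satisfiesAll(String password) {
        for (String rule : RULES.split(",")) {
            // No escaping needed: each piece is already a valid regex.
            if (!Pattern.compile(rule).matcher(password).find()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(satisfiesAll("Abcdef1!")); // true: all five rules match
        System.out.println(satisfiesAll("abcdefgh")); // false: no digit, symbol or upper case
    }
}
```

Splitting on a plain comma works here only because none of the rules contains a comma itself.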
