Use "[", "]" symbols in regex in Java

I want to know how to use the "[" and "]" symbols in a regex. I have to split a string that contains "[" and "]" in Java, but I am getting a compile time error. Could anyone help me?
String strng = "[11 11] 2000";
String ready[] = strng.split("[] ");
Compile time error : Unclosed character class;

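You need to escape those characters as \[ and \], because in a regex [ and ] delimit a character class.
E.g. [A-Z] matches any single uppercase character from A to Z.
Or [ABC] matches any one of the three characters A, B or C.
In a Java string literal each backslash is doubled. Note that the following pattern matches the literal sequence "[] " (both brackets followed by a space), so it only splits where those three characters appear together:
strng.split("\\[\\] ");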

Your regex needs to be structured as a character class containing the characters '[', ']', and ' '. Since character class syntax itself uses square brackets, the brackets inside the character class need to be escaped with backslashes. Finally, since the escape sequences are part of a Java string literal, you need to escape each backslash, like this:
String ready[] = strng.split("[\\[\\] ]");
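As a quick check of the character-class approach, here is a minimal sketch that runs the split on the string from the question. The class name BracketSplitDemo and the helper nonEmptyTokens are made up for illustration. Adjacent delimiters (such as "] ") produce empty strings in the result, so the helper filters them out.

```java
import java.util.Arrays;

class BracketSplitDemo {
    // Splits on any single '[', ']' or space character.
    static String[] splitOnBracketsAndSpaces(String input) {
        return input.split("[\\[\\] ]");
    }

    // Hypothetical helper: drops the empty strings produced by adjacent delimiters.
    static String[] nonEmptyTokens(String input) {
        return Arrays.stream(splitOnBracketsAndSpaces(input))
                .filter(token -> !token.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String strng = "[11 11] 2000";
        System.out.println(Arrays.toString(splitOnBracketsAndSpaces(strng)));
        System.out.println(Arrays.toString(nonEmptyTokens(strng)));
    }
}
```

The raw split keeps a leading empty string (the input starts with a delimiter) and one between "]" and the following space; whether to filter them depends on what the caller needs.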

String strng = "[11 11] 2000";
String ready[] = strng.split("\\[|\\]");
System.out.println("ready = " + ready[1]);
ready = 11 11
You need to escape the special characters with an escape character.
So [ has to be written as \[ and ] has to be written as \].
In a Java string literal, a \ is itself written as \\.
So effectively \\[ and \\] remove the special meaning of [ and ] and let you use them as normal characters.


Regular expression for unicode in java Dash version

Is it possible to improve the following with a regular expression? The code works, but I want to know whether there is a way to match any of the dash characters that exist in Unicode, so that I can normalize them to a single dash.
Words:
48553−FS002
48553-FS002
48553 FS002
48553-FS002-ESD12
Java
String reference = "48553−FS002";
String separator = reference.replaceFirst ( "\\w+(\\W)?\\w+", "$1" );
if(!separator.equals ( " " )) {
reference = reference.replaceAll ( separator, "-" );
}
Alternatively, I could search by Unicode code point. I was reading about dash characters and about Unicode support in Java regex, but I haven't managed to make it work.
If you need to match any non-word character except a space, you may use
reference = reference.replaceAll("[^\\w ]", "-");
Or, with character class subtraction:
reference = reference.replaceAll("[\\W&&[^ ]]", "-");
You can use the following pattern to match your hyphen or dash like patterns:
[\p{Pd}\u00AD\u2212]
Here,
\p{Pd} - matches any Punctuation, Dash symbols
\u00AD - matches a soft hyphen
\u2212 - matches a minus symbol.
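To show how that pattern is written as a Java string literal, here is a small sketch that normalizes dash-like characters to a plain hyphen. The class name DashNormalizer and the method normalizeDashes are invented for this example; only the character class comes from the answer above.

```java
class DashNormalizer {
    // Dash punctuation (category Pd), the soft hyphen and the minus sign.
    // Each regex backslash is doubled because this is a Java string literal.
    private static final String DASH_LIKE = "[\\p{Pd}\\u00AD\\u2212]";

    static String normalizeDashes(String reference) {
        return reference.replaceAll(DASH_LIKE, "-");
    }

    public static void main(String[] args) {
        // The first sample uses the Unicode minus sign, the second an en dash.
        System.out.println(normalizeDashes("48553\u2212FS002"));
        System.out.println(normalizeDashes("48553\u2013FS002"));
        // Spaces are not dash-like, so this one is left unchanged.
        System.out.println(normalizeDashes("48553 FS002"));
    }
}
```

Unlike the [^\\w ] variant, this leaves other punctuation alone and only touches characters that are actually dash-like.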
If you know your strings only contain word characters and separators, as seems to be the case, then you can just use
reference = reference.replaceAll("[^ \\w]", "-");

Java Regexp to match words only (', -, space)

What is the Java regular expression to match all words containing only:
Letters from a to z and A to Z
The ', - and space characters, which must not be at the beginning or the end
Examples
test'test match
test' doesn't match
'test doesn't match
-test doesn't match
test- doesn't match
test-test match
You can use the following pattern: ^(?!-|'|\\s)[a-zA-Z]*(?!-|'|\\s)$
Below are the examples:
String s1 = "abc";
String s2 = " abc";
String s3 = "abc ";
System.out.println(s1.matches("^(?!-|'|\\s)[a-zA-Z]*(?!-|'|\\s)$"));
System.out.println(s2.matches("^(?!-|'|\\s)[a-zA-Z]*(?!-|'|\\s)$"));
System.out.println(s3.matches("^(?!-|'|\\s)[a-zA-Z]*(?!-|'|\\s)$"));
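Note that the pattern above only admits letters between the two lookaheads, so it rejects test'test and test-test from the question. One possible alternative, offered as a sketch rather than as this answer's method, is to require runs of letters joined by single separators. The class name WordPatternDemo is made up, and the assumption that two separators may not appear in a row is mine, not the question's.

```java
class WordPatternDemo {
    // One or more letters, then zero or more groups of (one separator + letters).
    // Assumption: consecutive separators such as "a--b" should be rejected.
    private static final String WORD = "[a-zA-Z]+(?:[-' ][a-zA-Z]+)*";

    static boolean isWord(String candidate) {
        // String.matches() tests the whole input, so no ^ and $ anchors are needed.
        return candidate.matches(WORD);
    }

    public static void main(String[] args) {
        String[] samples = {"test'test", "test'", "'test", "-test", "test-", "test-test"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + isWord(sample));
        }
    }
}
```

Because every separator must be followed by at least one letter, a separator can never end up at the start or the end of the string.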
If you mean the space character, the class is: [a-zA-Z ]
It accepts lowercase a-z, uppercase A-Z and the space character. If the string contains anything else, the test will fail.
Here's my solution:
/(\w{2,}(-|'|\s)\w{2,})/g
You can take it for a spin on Regexr.
It first matches a word with \w, then any of the three separators with "or" logic using |, and then another word. The braces {} make sure the words on either end are at least 2 characters long, so contractions like don't aren't captured. You could change those bounds to restrict the word length differently, or replace them with + to drop the length restriction.
Caveat: \w also matches digits and _ underscores. If you don't want that, you could replace it with [a-zA-Z] like so:
/([a-zA-Z]{2,}(-|'|\s)[a-zA-Z]{2,})/g

How to split a string with double quotes " as the delimiter?

I tried splitting like this-
tableData.split("\\"")
but it does not work.
It seems that you tried to escape it the same way you would escape |, which is "\\|". But the difference between | and " is that
| is a metacharacter in the regex engine (it represents the OR operator)
" is a metacharacter in the Java language in string literals (it represents the start/end of the string)
To escape any String metacharacter (like ") you need to place before it another String metacharacter responsible for escaping, which is \ (see note 1 below). So to create a String containing " characters, like this is "quote", you would need to write it as
String s = "this is \"quote\"";
//                  ^^     ^^ these represent " literal, not end of string
The same idea applies if we would like to create a \ literal (we need to escape it by placing another \ before it). For instance, to create a string representing c:\foo\bar we would need to write it as
String s = "c:\\foo\\bar";
//            ^^   ^^ these will represent \ literal
So as you see, \ is used to escape metacharacters (make them simple literals).
This character is used in the Java language for strings, but it is also used in the regex engine to escape its metacharacters:
\, ^, $, ., |, ?, *, +, (, ), [, {.
If you would like to create a regex that matches the [ character, you need the regex \[, but the String representing this regex in Java needs to be written as
String leftBracketRegex = "\\[";
//                         ^^ - Remember what was said earlier?
// To create a \ literal in a String we need to escape it
So to split on [ we would need to invoke split("\\[") because the regex representing [ is \[, which needs to be written as "\\[" in Java.
Since " is not a special character in regex but is special in a String, we need to escape it only in the string literal, by writing it as
split("\"");
1) \ is also used to create other characters, such as the line separator \n and the tab \t. It can also be used to create Unicode characters like \uXXXX, where XXXX is the index of the character in the Unicode table in hexadecimal form.
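The two levels of escaping described above can be seen side by side in a short sketch. The class and method names are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;

class EscapeLevelsDemo {
    // The double quote is special only to the Java compiler,
    // so a single backslash in the literal is enough.
    static String[] splitOnQuote(String input) {
        return input.split("\"");
    }

    // The left bracket is special to the regex engine, so the regex needs
    // a backslash, and the Java literal needs that backslash doubled.
    static String[] splitOnLeftBracket(String input) {
        return input.split("\\[");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnQuote("a\"b\"c")));
        System.out.println(Arrays.toString(splitOnLeftBracket("x[y[z")));
        // The literal with two backslashes holds just two characters: a backslash and a bracket.
        System.out.println("\\[".length());
    }
}
```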
You have escaped the \ by writing it twice. Try
tableData.split("\"")
Why does this happen?
A backslash escapes the following character. Since the next character is another backslash, the second backslash is escaped, and the double quote is not.
So "\\" is already a complete literal holding a single \, and the trailing " is left over, where the pattern should really be just ".
Edit:
Also keep in mind that String.split() interprets its parameter as a regular expression, which has several special characters of its own that have to be escaped in the resulting string.
So if you want to split by a . (which is a special regex character), you need to specify it as String.split("\\."). The first backslash escapes the second backslash, so the resulting string is \. and the regex engine sees an escaped dot.
For regex characters you could also just use Pattern.quote() to escape your desired delimiter, but this is far outside the scope of the original question.
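For completeness, here is a minimal sketch of the Pattern.quote() route, which avoids hand-escaping regex metacharacters. The class QuoteDelimiterDemo and the helper splitLiteral are names invented for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuoteDelimiterDemo {
    // Treats the delimiter as literal text, whatever characters it contains.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLiteral("a.b.c", ".")));
        System.out.println(Arrays.toString(splitLiteral("1|2|3", "|")));
        // Without quoting, "." matches every character and nothing is left over.
        System.out.println("a.b.c".split(".").length);
    }
}
```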
Try with single backslash \
tableData.split("\"")
Try like this by escaping " with single backslash \ :
tableData.split("\"")
You are not escaping properly. The snippet code will not even compile because of it. The correct way to do it is
tableData.split("\"");
A single backslash will do the trick.
Like this:
tableData.split("\"");
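A char literal needs no backslash for the double quote, because " is not special inside single quotes. However, String.split() only accepts a String, so the char has to be converted first:
tableData.split(String.valueOf('"'));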

Java regex for matching certain parts of JSON substrings

I am trying to write a regex that looks for strings with the following pattern:
Begin with an opening brace { followed by a double-quote "
Then allows for a string of 1+ alphanumeric characters a-zA-Z0-9
Then another double-quote " followed by a colon : and an opening bracket [
Then allows for any string of 0+ alphanumeric characters a-zA-Z0-9
So some strings that would match the regex:
{"hello":[blah
{"hello":[
{"1":[
And some strings that would not match:
{hello:[blah
hello":[
{"2:[
So far, the best I've been able to come up with is:
String regex = "{\"[a-zA-Z0-9]+\":\[[a-zA-Z0-9]*";
if(myString.matches(regex))
// do something
But I know I'm way off base. Can any regex gurus help reel me in? Thanks in advance!
String regex = "{\"[a-zA-Z0-9]+\":\[[a-zA-Z0-9]*";
The problem here is that you need an extra backslash before the square bracket. The regex has to contain \[ in order to match a square bracket, which means the string literal has to contain \\[ so that the Java compiler produces a single backslash. Similarly, you should also escape the { in the regex, as it is a metacharacter (for bounded repetition counts):
String regex = "\\{\"[a-zA-Z0-9]+\":\\[[a-zA-Z0-9]*";
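As a sanity check, the corrected regex can be run against the sample strings from the question. This is only a sketch; the class name JsonPrefixDemo and the method matchesPrefix are made up here.

```java
import java.util.regex.Pattern;

class JsonPrefixDemo {
    // Same regex as above, compiled once for reuse.
    private static final Pattern PREFIX =
            Pattern.compile("\\{\"[a-zA-Z0-9]+\":\\[[a-zA-Z0-9]*");

    static boolean matchesPrefix(String candidate) {
        // matches() requires the whole string to fit the pattern.
        return PREFIX.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "{\"hello\":[blah", "{\"hello\":[", "{\"1\":[",
            "{hello:[blah", "hello\":[", "{\"2:["
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + matchesPrefix(sample));
        }
    }
}
```

The first three samples are accepted and the last three are rejected, which agrees with the lists in the question.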

Regular expression to replace (") with (\")?

Why does the following not work:
String test = "hello\"world".replaceAll("\"", "\\\"");
System.out.println(test);
What I'm trying to do is replace any occurrence of " with \".
So I want to get as output:
hello\"world
Regular expressions are overkill for this.
myString.replace("\"", "\\\"")
should do just fine and is more readable to someone familiar with the core libraries.
The replace method just replaces one substring with another.
Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence. The replacement proceeds from the beginning of the string to the end, for example, replacing "aa" with "b" in the string "aaa" will result in "ba" rather than "ab".
You need two more \ characters to escape the escape character, for a total of 5 \s.
\\ - escape the escape character
\\ - to display the character
\ - to escape the quote.
Try:
String test = "hello\"world".replaceAll("\"", "\\\\\"");
String test = "hello\"world".replaceAll("\"", "\\\\\"");
System.out.println(test);
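The approaches above can be compared in one small sketch. It also shows Matcher.quoteReplacement(), a standard library method that escapes backslashes and dollar signs in a replacement string; using it here is my addition, not something the answers propose. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;

class QuoteEscapeDemo {
    // Literal replacement: no regex involved, so three backslashes in the literal are enough.
    static String withReplace(String s) {
        return s.replace("\"", "\\\"");
    }

    // Regex replacement: the backslash must also be escaped for the replacement syntax.
    static String withReplaceAll(String s) {
        return s.replaceAll("\"", "\\\\\"");
    }

    // Lets the library escape the replacement string instead of doing it by hand.
    static String withQuoteReplacement(String s) {
        return s.replaceAll("\"", Matcher.quoteReplacement("\\\""));
    }

    public static void main(String[] args) {
        String input = "hello\"world";
        System.out.println(withReplace(input));
        System.out.println(withReplaceAll(input));
        System.out.println(withQuoteReplacement(input));
        // The version from the question: the backslash is consumed as an escape,
        // so the quote is replaced by itself and the string is unchanged.
        System.out.println(input.replaceAll("\"", "\\\""));
    }
}
```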
