Java regex, replace certain characters except if it matches a pattern - java

I have this string "person","hobby","key" and I want to remove the double quotes around every word except key, so the output will be person,hobby,"key"
String str = "\"person\",\"hobby\",\"key\"";
System.out.println(str+"\n");
str=str.replaceAll("/*regex*/","");
System.out.println(str); //person,hobby,"key"

You may use the following pattern:
\"(?!key\")(.+?)\"
And replace with $1
Details:
\" - Match a double quotation mark character.
(?!key\") - Negative Lookahead (not followed by the word "key" and another double quotation mark).
(.+?) - Match one or more characters (lazy) and capture them in group 1.
\" - Match another double quotation mark character.
Substitution: $1 - back reference to whatever was matched in group 1.
Here's a full example:
String str = "\"person\",\"hobby\",\"key\"";
String pattern = "\"(?!key\")(.+?)\"";
String result = str.replaceAll(pattern, "$1");
System.out.println(result); // person,hobby,"key"
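One caveat with the lookahead pattern: it relies on "key" being the last item. If "key" comes first, as in "key","hobby", the closing quote of "key" can be taken as an opening quote and the result is "key,hobby". A minimal sketch of an alternative follows, assuming Java 9+ (for Matcher.replaceAll with a function) and fields that never contain embedded quotes. The class and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuoteStripper {
    // Matches one complete quoted field; group 1 is the text between the quotes.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Removes the surrounding quotes from every field except the one equal to `keep`.
    static String stripQuotesExcept(String input, String keep) {
        return QUOTED.matcher(input).replaceAll(m ->
                m.group(1).equals(keep)
                        ? Matcher.quoteReplacement(m.group())    // leave the kept field untouched
                        : Matcher.quoteReplacement(m.group(1))); // drop the quotes
    }

    public static void main(String[] args) {
        System.out.println(stripQuotesExcept("\"person\",\"hobby\",\"key\"", "key")); // person,hobby,"key"
        System.out.println(stripQuotesExcept("\"key\",\"hobby\"", "key"));            // "key",hobby
    }
}
```

Because each match consumes a whole quoted field, the position of the kept field no longer matters.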

Related

Regex to find the first word in a string java without using the string name

I have a string that can contain a sentence with symbols and numbers, and the sentence can vary in length.
For example:
String myString = " () Huawei manufactures phones";
The next time, myString might contain the following words:
String myString = " * Audi has amazing cars &^";
How can I use a regex to get the first word from the string, so that the word I get from the first myString is "Huawei" and the word I get from the second myString is "Audi"?
Below is what I have tried, but it fails when there are spaces or symbols before the first word:
String regexString = myString.replaceAll("\\s.*", "");
You may use this regex with a capture group for matching:
^\W*\b(\w+).*
and replace with: $1
Java Code:
s = s.replaceAll("^\\W*\\b(\\w+).*", "$1");
RegEx Details:
^: Start
\W*: Match 0 or more non-word characters
\b: Word boundary
(\w+): Match 1+ word characters and capture it in group #1
.*: Match anything afterwards
See how you get on with:
s = s.replaceAll("^[^\\p{Alpha}]*", "");
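Note that the two replacements above behave differently: the capture-group version leaves only the first word, while the \p{Alpha} version only strips the leading non-letters and keeps the rest of the sentence. Both also return the input unchanged when it contains no word at all. As a rough sketch of a match-based variant that makes the "no word found" case explicit (the class and method names are made up for illustration):

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstWord {
    // Skips leading non-word characters, then captures the first run of word characters.
    private static final Pattern FIRST_WORD = Pattern.compile("^\\W*(\\w+)");

    // Returns the first word, or an empty Optional when the string has no word characters.
    static Optional<String> firstWord(String s) {
        Matcher m = FIRST_WORD.matcher(s);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstWord(" () Huawei manufactures phones")); // Optional[Huawei]
        System.out.println(firstWord(" * Audi has amazing cars &^"));     // Optional[Audi]
        System.out.println(firstWord(" &^ "));                            // Optional.empty
    }
}
```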

JAVA Check for multiple strings ending with a regex pattern in a sentence

I have a sentence jafjaklf domain1-12-123.eng.abc.com amkfg,fmgsklfgm domain2-134-135.eng.abc.com. I want to replace the words ending with .eng.abc.com with "".
I used the regex pattern:
\b(.*\.eng\.abc\.com)\b
But it matches " domain1-12-123.eng.abc.com amkfg,fmgsklfgm domain2-134-135.eng.abc.com".
Could anyone help me with the pattern?
It seems that the "words" you want to match may contain non-word chars. I suggest matching those parts with a \S, non-whitespace pattern:
\b\S*\.eng\.abc\.com\b
Details:
\b - a word boundary
\S* - 0+ chars other than whitespace
\.eng\.abc\.com - a literal .eng.abc.com substring
\b - end of word.
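Do not forget to double the backslashes in the Java string literal. The Java demo below also prepends \s* to the pattern, so the whitespace before each removed word is dropped too.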
Java demo:
String s = "jafjaklf domain1-12-123.eng.abc.com amkfg,fmgsklfgm domain2-134-135.eng.abc.com";
String pat = "\\s*\\b\\S*\\.eng\\.abc\\.com\\b";
String res = s.replaceAll(pat ,"");
System.out.println(res);
// => jafjaklf amkfg,fmgsklfgm
You don't need to use regex, you can use String.endsWith and String.substring:
String str = "domain1-12-123.eng.abc.com amkfg,fmgsklfgm domain2-134-135.eng.abc.com";
if (str.endsWith(".eng.abc.com")) {
    str = str.substring(0, str.length() - 12);
}
System.out.println(str); // domain1-12-123.eng.abc.com amkfg,fmgsklfgm domain2-134-135
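The endsWith check above only trims the suffix when the whole string ends with it; it does not remove the individual words asked about in the question. A per-token variant might look like the sketch below. It assumes words are separated by whitespace and that collapsing runs of whitespace to single spaces is acceptable. It still uses a regex for the split, but only to find whitespace. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

final class SuffixFilter {
    // Drops every whitespace-separated token that ends with `suffix`
    // and joins the remaining tokens with single spaces.
    static String dropTokensEndingWith(String text, String suffix) {
        return Arrays.stream(text.trim().split("\\s+"))
                .filter(token -> !token.endsWith(suffix))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String s = "jafjaklf domain1-12-123.eng.abc.com amkfg,fmgsklfgm domain2-134-135.eng.abc.com";
        System.out.println(dropTokensEndingWith(s, ".eng.abc.com")); // jafjaklf amkfg,fmgsklfgm
    }
}
```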
if(str.matches(".*com.?$") || str.matches(".*abc.?$") || str.matches(".*eng.?$"))

Java regex exact match with question mark and word boundary

In Java, I am trying to determine whether a user-supplied string (meaning I do not know what the input will be) is contained exactly within another string, on word boundaries. So an input of "the" should not be matched in the text "there is no match". I am running into issues when there is punctuation in the input string, however, and could use some help.
With no punctuation, this works just fine:
String input = "string contain";
Pattern p = Pattern.compile("\\b" + Pattern.quote(input) + "\\b");
//both should and do match
System.out.println(p.matcher("does this string contain the input").find());
System.out.println(p.matcher("does this string contain? the input").find());
However when the input has a question mark in it, the matching with the word boundary doesn't seem to work:
String input = "string contain?";
Pattern p = Pattern.compile("\\b" + Pattern.quote(input) + "\\b");
//should not match - doesn't
System.out.println(p.matcher("does this string contain the input").find());
//expected match - doesn't
System.out.println(p.matcher("does this string contain? the input").find());
//should not match - doesn't
System.out.println(p.matcher("does this string contain?fail the input").find());
Any help would be appreciated.
There's no word boundary between ? and the space that follows it, because neither is a word character; that's why your pattern doesn't match. You can change it to this:
Pattern.compile("(^|\\W)" + Pattern.quote(input) + "($|\\W)");
That matches the beginning of the input or a non-word character, then the pattern, then the end of the input or a non-word character. Or, better, use a negative lookbehind and a negative lookahead:
Pattern p = Pattern.compile("(?<!\\w)" + Pattern.quote(input) + "(?!\\w)");
This means, before and after your pattern there must not be a word character.
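As a quick check of the lookaround version against the strings from the question, here is a small sketch (the class and method names are illustrative, not from the original post):

```java
import java.util.regex.Pattern;

final class ExactPhrase {
    // True when `phrase` occurs in `text` with no word character directly before or after it.
    static boolean containsPhrase(String text, String phrase) {
        Pattern p = Pattern.compile("(?<!\\w)" + Pattern.quote(phrase) + "(?!\\w)");
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String input = "string contain?";
        System.out.println(containsPhrase("does this string contain the input", input));      // false
        System.out.println(containsPhrase("does this string contain? the input", input));     // true
        System.out.println(containsPhrase("does this string contain?fail the input", input)); // false
        System.out.println(containsPhrase("there is no match", "the"));                       // false
    }
}
```

Unlike the (^|\\W) ... ($|\\W) form, the lookarounds do not consume the neighbouring characters, so the match covers only the phrase itself.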
You can use:
Pattern p = Pattern.compile("(\\s|^)" + Pattern.quote(input) + "(\\s|$)");
//---------------------------^^^^^^^----------------------------^^^^^^^
For these strings you will get:
does this string contain the input -> false
does this string contain? the input -> true
does this fail the input string contain? -> true
does this string contain?fail the input -> false
string contain? the input -> true
The idea is to match strings that contain your input surrounded by whitespace, or that start or end with your input.
You are matching using word boundaries: \b.
Java's regex implementation deems the following characters word characters:
\w := [a-zA-Z_0-9]
Any non-word characters are simply ones outside the above group
[^\w] := [^a-zA-Z_0-9]
Word boundary is a transition from [a-zA-Z_0-9] to [^a-zA-Z_0-9] and vice-versa.
For the input "does this string contain? the input" and the literal pattern \\b\\Qstring contain?\\E\\b, the final \\b would have to sit between ? and the following white space. Both are non-word characters, so that position is neither a word-to-non-word nor a non-word-to-word transition, which means it is not a word boundary.
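The transition rule can be observed directly with a few tiny patterns. This is only an illustrative sketch; the helper name found is made up.

```java
import java.util.regex.Pattern;

final class BoundaryDemo {
    // True when the regex matches somewhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(found("n\\b", "contain? the"));   // true: 'n' (word) is followed by '?' (non-word)
        System.out.println(found("\\?\\b", "contain? the")); // false: '?' and ' ' are both non-word
        System.out.println(found("\\?\\b", "contain?fail")); // true: '?' (non-word) is followed by 'f' (word)
    }
}
```

By the same rule there is a boundary between ? and f, so a trailing \b does not by itself reject the "contain?fail" case.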

how to parse a double-quote delimited string that can contain escaped double quotes

I need to parse a line from a stream that looks like this: command "string1" "string2". The strings can contain spaces and escaped double quotes. I need to split it so that I get command, string1 and string2 as array elements. I think split() with a regex matching " but not \" ( .split("(?<!\\\\)\"") ) would do the job, but I hear that this is not a good idea.
Is there any better way of doing this in Java?
Something like this should do the trick, assuming you want to remove the external double quotes when applicable (if you don't, it's just a matter of changing the first capturing group to also include the quotes):
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Demo {
    private static final Pattern WORD =
            Pattern.compile("\"((?:[^\\\\\"]|\\\\.)*)\"|([^\\s\"]+)");

    public static void main(String[] args) {
        String cmd =
                "command " +
                "\"string with blanks\" " +
                "\"anotherStringBetweenQuotes\" " +
                "\"a string with \\\"escaped\\\" quotes\" " +
                "stringWithoutBlanks";
        Matcher matcher = WORD.matcher(cmd);
        while (matcher.find()) {
            // Group 1 holds a quoted token (without its quotes), group 2 a bare word.
            String capturedGroup = matcher.group(1) != null ? matcher.group(1) : matcher.group(2);
            System.out.println("Matched: " + capturedGroup);
        }
    }
}
Output:
Matched: command
Matched: string with blanks
Matched: anotherStringBetweenQuotes
Matched: a string with \"escaped\" quotes
Matched: stringWithoutBlanks
The regex is a bit complicated, so it deserves some explanation (the fragments below are written as they appear in the Java string literal, with doubled backslashes):
[^\\\\\"] matches everything but a backslash or double quotes
\\\\. matches a backslash followed by any character (including double quotes), namely escaped characters
(?:[^\\\\\"]|\\\\.)* matches any sequence of escaped or non-escaped characters, but without capturing the group (because of the (?:))
\"((?:[^\\\\\"]|\\\\.)*)\" matches any such sequence wrapped in double quotes and captures the inside of the quotes
([^\\s\"]+) matches any non-empty sequence of non-blank characters, and captures it in a group
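Note that the captured text still contains the backslashes of escaped quotes, as the output above shows. If the unescaped value is wanted, one possible follow-up step is sketched below. It assumes that a backslash always escapes exactly the next character; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CommandLineSplitter {
    // Same token pattern as above: a quoted string (group 1) or a bare word (group 2).
    private static final Pattern WORD =
            Pattern.compile("\"((?:[^\\\\\"]|\\\\.)*)\"|([^\\s\"]+)");

    // Splits a line into tokens and removes the escaping backslashes inside quoted tokens.
    static List<String> split(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            if (m.group(1) != null) {
                // Quoted token: turn each backslash escape (\x) into the bare character x.
                tokens.add(m.group(1).replaceAll("\\\\(.)", "$1"));
            } else {
                tokens.add(m.group(2));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("command \"string1\" \"a \\\"quoted\\\" word\""));
        // [command, string1, a "quoted" word]
    }
}
```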

What is wrong in regexp in Java

I want to get the word text2, but it returns null. Could you please correct it?
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR\\w+&&(\\w+)'\\)\\)");
Matcher matcher = patter1.matcher(str);
String result = null;
if (matcher.find()) {
result = matcher.group(1);
}
System.out.println(result);
One way to do it is to spell out everything inside the parentheses:
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR[(]{2}&&\\w+\\s*'&&(\\w+)'[)]{2}");
Matcher matcher = patter1.matcher(str);
String result = "";
if (matcher.find()) {
result = matcher.group(1);
}
System.out.println(result);
You can also use [^()]* inside the parentheses to just get to the value inside single apostrophes:
Pattern patter1 = Pattern.compile("SETVAR[(]{2}[^()]*'&&(\\w+)'[)]{2}");
                                               ^^^^^^
Let me break down the regex for you:
SETVAR - match SETVAR literally, then...
[(]{2} - match 2 ( literally, then...
[^()]* - match 0 or more characters other than ( or ) up to...
'&& - match a single apostrophe and two & symbols, then...
(\\w+) - match and capture into Group 1 one or more word characters
'[)]{2} - match a single apostrophe and then 2 ) symbols literally.
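To see the difference side by side, the sketch below runs the pattern from the question and the [^()]* pattern against the same input. The helper firstGroup is an illustrative name, not part of the original answer.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SetVarDemo {
    // Returns group 1 of the first match, or an empty Optional when the pattern does not match.
    static Optional<String> firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String str = "Text SETVAR((&&text1 '&&text2'))";
        // Pattern from the question: \w+ cannot match "((", so nothing is found.
        System.out.println(firstGroup("SETVAR\\w+&&(\\w+)'\\)\\)", str));          // Optional.empty
        // Pattern with [^()]*: skips ahead to the value in single quotes.
        System.out.println(firstGroup("SETVAR[(]{2}[^()]*'&&(\\w+)'[)]{2}", str)); // Optional[text2]
    }
}
```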
Your regex doesn't match your string because you didn't account for the opening parentheses. Also, \\w+ matches only runs of word characters, so it won't match the space or the & characters.
Instead, you can use a negated character class, [^']+, which matches one or more characters other than a single quote:
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR\\(\\([^']+'&&(\\w+)'\\)\\)");