Java string replace using regex - java

I have strings with values "Address Line1", "Address Line2" ... etc.
I want to add a space if there is any numeric value in the string like
"Address Line 1", "Address Line 2".
I can do this using contains and replace like this
String sample = "Address Line1";
if (sample.contains("1")) {
sample = sample.replace("1"," 1");
}
But how can I do this using regex?

sample = sample.replaceAll("\\d+"," $0");

To use a regex you need the replaceAll method instead of replace.
As the regex you can use:
\\d+ to match any group of one or more contiguous digits. We need the whole run of digits here, because matching one digit at a time would turn foo123 into something like foo 1 2 3.
(?<=[a-zA-Z])\\d if you want to add a space only before a digit that has an alphabetic character before it. The (?<=[a-zA-Z]) part is a look-behind; it only checks that the tested digit is preceded by a character in the range a-z or A-Z.
As the replacement you can use " $0", which means a space followed by group 0, i.e. the part currently matched by the regex.
So try
sample = sample.replaceAll("\\d+", " $0");
or
sample = sample.replaceAll("(?<=[a-zA-Z])\\d", " $0");
The second version changes "hello 1 world2" into "hello 1 world 2". Notice that only the 2 gets an additional space.
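As a quick check of the two patterns side by side, here is a minimal sketch. The class and method names (SpaceBeforeDigits, beforeEveryRun, afterLetterOnly) are made up for illustration and are not part of the question.

```java
public class SpaceBeforeDigits {
    // Inserts a space before every run of digits, wherever the run appears.
    static String beforeEveryRun(String s) {
        return s.replaceAll("\\d+", " $0");
    }

    // Inserts a space only before a digit that directly follows an ASCII letter.
    static String afterLetterOnly(String s) {
        return s.replaceAll("(?<=[a-zA-Z])\\d", " $0");
    }

    public static void main(String[] args) {
        System.out.println(beforeEveryRun("Address Line1"));   // Address Line 1
        System.out.println(beforeEveryRun("foo123"));          // foo 123
        System.out.println(afterLetterOnly("hello 1 world2")); // hello 1 world 2
    }
}
```

Note that beforeEveryRun would also add a second space in front of a digit that already has one, which is the case the look-behind variant avoids.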

First create a Pattern object for what you want to search for and compile it. In your case the Pattern object will be as follows:
Pattern p=Pattern.compile("1");
Now create a Matcher object for your string:
Matcher m=p.matcher(sample);
Now check whether the Matcher has found the pattern in the string and, if it has, call replaceAll to replace it:
if(m.find())
{
sample=m.replaceAll(" 1");
}
The complete code is as follows:
import java.io.*;
import java.util.regex.*;
class demo
{
public static void main(String args[])
{
String sample = "Address Line1";
Pattern p=Pattern.compile("1");
Matcher m=p.matcher(sample);
if(m.find())
{
sample=m.replaceAll(" 1");
}
System.out.println(sample);
}
}

Related

Regex to identify JSON in a String to split using : separator

I have an input String which is being split using : (colon) as separator. This String can contain all string values OR possibly a json object itself as part of this incoming string.
For example:
Case#1: 123:HARRY_POTTER:ENGLAND:MALE
Case#2: 123:HARRY_POTTER:[{"key":"City", "value":"LONDON"}]:MALE
There is code in place that uses str.split(":"), which handles case#1. For case#2 the program breaks, because the json part of the string contains : characters that should be ignored while splitting.
I need (1) a regex that can identify the json in the string and (2) a regex that does not split if : is preceded and followed by " (":") as it appears in a JSON string.
So if the string is identified as containing json I can use str.split(<regex-to-split-string-with-json>)
I arrived at these regexes to match a " preceding the :, none of which work, unfortunately:
Negative Look Behind: (?<!\"): and (?<!\")[:]
Positive Look Behind: (?<=\"): and (?<=\")[:]
Please suggest!
Please try the regex (?<=[a-zA-Z0-9\\]]): (the trailing colon is part of the pattern). It works as expected for both of your cases:
import java.util.Arrays;
public class Solution {
public static void main(String[] args) {
String str = "123:HARRY_POTTER:[{\"key\":\"City\", \"value\":\"LONDON\"}]:MALE";
String regex = "(?<=[a-zA-Z0-9\\]]):";
String arr[] = str.split(regex);
System.out.println("Length: " + arr.length);
System.out.println(Arrays.toString(arr));
}
}
Output:
Length: 4
[123, HARRY_POTTER, [{"key":"City", "value":"LONDON"}], MALE]
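One caveat worth checking before relying on this: the look-behind only accepts a colon that follows a letter, a digit or ]. The sketch below reuses the same regex on a made-up input ("123:HARRY_:MALE", not from the question) where a field ends in an underscore; the class name is hypothetical.

```java
import java.util.Arrays;

public class LookbehindSplitCheck {
    // Same split regex as in the answer above.
    static String[] split(String s) {
        return s.split("(?<=[a-zA-Z0-9\\]]):");
    }

    public static void main(String[] args) {
        // Every colon follows a letter or digit, so all of them are split points.
        System.out.println(Arrays.toString(split("123:HARRY_POTTER:ENGLAND:MALE")));
        // Hypothetical input: the colon after '_' is not preceded by
        // [a-zA-Z0-9\]], so it is not treated as a separator.
        System.out.println(Arrays.toString(split("123:HARRY_:MALE")));
    }
}
```

The second call yields only two parts, so this approach fits data whose fields always end in an alphanumeric character or ].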
For the example data, perhaps it would be enough to either match from [ till ] or match any char except :
But as this data comes in mixed, it is not easy to determine what is actual valid json. You would still have to validate that afterwards.
Note that this is a brittle solution, and does not take any nesting of the square brackets into account.
\[[^]\[\r\n]+]|[^\r\n:]+
\[[^]\[\r\n]+] Match from [...]
| Or
[^\r\n:]+ Match 1+ times any char except : or a newline
Example
String regex = "\\[[^]\\[\\r\\n]+]|[^\\r\\n:]+";
String string = "123:HARRY_POTTER:[{\"key\":\"City\", \"value\":\"LONDON\"}]:MALE";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
Output
123
HARRY_POTTER
[{"key":"City", "value":"LONDON"}]
MALE
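If the fields are needed as a list rather than printed, the matching loop can be wrapped in a small helper. This is only a sketch: FieldTokenizer and fields are hypothetical names, and the pattern is written with both square brackets escaped inside the character class, which is intended to be equivalent to the regex above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldTokenizer {
    // Either a [...] block without nested brackets, or a run of non-colon characters.
    private static final Pattern FIELD =
            Pattern.compile("\\[[^\\]\\[\\r\\n]+\\]|[^\\r\\n:]+");

    // Collects every match in order; empty fields (as in "a::b") produce no entry.
    static List<String> fields(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fields("123:HARRY_POTTER:ENGLAND:MALE"));
        System.out.println(fields(
                "123:HARRY_POTTER:[{\"key\":\"City\", \"value\":\"LONDON\"}]:MALE"));
    }
}
```

Unlike str.split(":"), this drops empty fields, so it only suits input where every position is filled.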

How to parse string using regex

I'm pretty new to java, trying to find a way to do this better. Potentially using a regex.
String text = test.get(i).toString();
// text looks like this in string form:
// EnumOption[enumId=test,id=machine]
String checker = text.replace("[","").replace("]","").split(",")[1].split("=")[1];
// checker becomes machine
My goal is to parse that text string and just return back machine. Which is what I did in the code above.
But that looks ugly. I was wondering what kinda regex can be used here to make this a little better? Or maybe another suggestion?
Use a regex' lookbehind:
(?<=\bid=)[^],]*
(?<= ) // Start matching only after what matches inside
\bid= // Match "\bid=" (= word boundary then "id="),
[^],]* // Match and keep the longest sequence without any ']' or ','
In Java, use it like this:
import java.util.regex.*;
class Main {
public static void main(String[] args) {
Pattern pattern = Pattern.compile("(?<=\\bid=)[^],]*");
Matcher matcher = pattern.matcher("EnumOption[enumId=test,id=machine]");
if (matcher.find()) {
System.out.println(matcher.group(0));
}
}
}
This results in
machine
Assuming you’re using the Polarion ALM API, you should use the EnumOption’s getId method instead of deparsing and re-parsing the value via a string:
String id = test.get(i).getId();
Using the replace and split functions doesn't take the structure of the data into account.
If you want to use a regex, you can just use a capturing group without any lookarounds, where enum can be any value except a ] and comma, and id can be any value except ].
The value of id will be in capture group 1.
\bEnumOption\[enumId=[^=,\]]+,id=([^\]]+)\]
Explanation
\bEnumOption Match EnumOption preceded by a word boundary
\[enumId= Match [enumId=
[^=,\]]+, Match 1+ times any char except = , and ]
id= Match literally
( Capture group 1
[^\]]+ Match 1+ times any char except ]
)\] Close group 1, then match ]
Pattern pattern = Pattern.compile("\\bEnumOption\\[enumId=[^=,\\]]+,id=([^\\]]+)\\]");
Matcher matcher = pattern.matcher("EnumOption[enumId=test,id=machine]");
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Output
machine
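When the input might not match at all, it can help to wrap the capture-group pattern in a helper that makes the "no match" case explicit. A minimal sketch, assuming the same input shape as above; EnumOptionId and extractId are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EnumOptionId {
    // Same pattern as above; the id value is captured in group 1.
    private static final Pattern ID =
            Pattern.compile("\\bEnumOption\\[enumId=[^=,\\]]+,id=([^\\]]+)\\]");

    // Returns the id if the text has the expected shape, otherwise an empty Optional.
    static Optional<String> extractId(String text) {
        Matcher m = ID.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractId("EnumOption[enumId=test,id=machine]").orElse("<none>"));
        System.out.println(extractId("something else").orElse("<none>"));
    }
}
```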
If there can be more comma separated values, you could also only match id making use of negated character classes [^][]* before and after matching id to stay inside the square bracket boundaries.
\bEnumOption\[[^][]*\bid=([^,\]]+)[^][]*\]
In Java, where an unescaped [ inside a character class opens a nested class, escape the brackets:
String regex = "\\bEnumOption\\[[^\\]\\[]*\\bid=([^,\\]]+)[^\\]\\[]*\\]";
A regex can of course be used, but sometimes is less performant, less readable and more bug-prone.
I would advise you not use any regex that you did not come up with yourself, or at least understand completely.
PS: I think your solution is actually quite readable.
Here's another non-regex version:
String text = "EnumOption[enumId=test,id=machine]";
text = text.substring(text.lastIndexOf('=') + 1);
text = text.substring(0, text.length() - 1);
Not doing you a favor, but the downvote hurt, so here you go:
String input = "EnumOption[enumId=test,id=machine]";
Matcher matcher = Pattern.compile("EnumOption\\[enumId=(.+),id=(.+)\\]").matcher(input);
if(!matcher.matches()) {
throw new RuntimeException("unexpected input: " + input);
}
System.out.println("enumId: " + matcher.group(1));
System.out.println("id: " + matcher.group(2));

get everything after a particular string

I have a String coming as "process_client_123_Tree" and "process_abc_pqr_client_123_Tree". I want to extract everything after "process_client_" and "process_abc_pqr_client_" and store it in a String variable.
Here currentKey variable can contain either of above two strings.
String clientId = // how to use currentKey here so that I can get remaining portion in this variable
What is the right way to do this? Should I just use split here or some regex?
import java.util.regex.*;
class test
{
public static void main(String args[])
{
Pattern pattern=Pattern.compile("^process_(client_|abc_pqr_client_)(.*)$");
Matcher matcher = pattern.matcher("process_client_123_Tree");
while(matcher.find())
System.out.println("String 1 Group 2: "+matcher.group(2));
matcher = pattern.matcher("process_abc_pqr_client_123_Tree");
while(matcher.find())
System.out.println("String 2 Group 2: "+matcher.group(2));
System.out.println("Another way..");
System.out.println("String 1 Group 2: "+"process_client_123_Tree".replace("process_client_", ""));
System.out.println("String 2 Group 2: "+"process_abc_pqr_client_123_Tree".replace("process_abc_pqr_client_", ""));
}
}
Output:
$ java test
String 1 Group 2: 123_Tree
String 2 Group 2: 123_Tree
Another way..
String 1 Group 2: 123_Tree
String 2 Group 2: 123_Tree
Regex breakup:
^ match start of line
process_(client_|abc_pqr_client_) matches "process_" followed by "client_" or "abc_pqr_client_" (captured as group 1)
(.*)$ . means any char and * means 0 or more times, so it matches the remaining chars in the string until the end ($) and captures them as group 2
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Matchit{
public static void main(String []args){
String str = "process_abc_pqr_client_123_Tree";
Pattern p = Pattern.compile("process_abc_pqr_client_(.*)|process_client_(.*)");
Matcher m = p.matcher(str);
if (m.find()) {
// Only the group of the alternative that matched is non-null:
// group(1) for "process_abc_pqr_client_", group(2) for "process_client_".
String value = m.group(1) != null ? m.group(1) : m.group(2);
System.out.println("Found value: " + value);
}
}
}
Gets you:
Found value: 123_Tree
The parentheses in the regexp define the match groups. The pipe is a logical or. Dot means any character and star means any number. So, I create a pattern object with that regexp and then use a matcher object to get the part of the string that has been matched.
A regex pattern could be: "process_(?:abc_pqr_)?client_(\\w+)"
(?:abc_pqr_)? is the optional part
(?: opens a non-capturing group; )? closes it and makes it match zero or one times
\w+ matches one or more word characters [A-Za-z0-9_]
Matches will be in group(1), the first capturing group.
To limit the match on the right, match lazily up to the right-hand token:
"process_(?:abc_pqr_)?client_(\\w+?)_trace_count"
where \w+? matches as few as possible word characters to meet condition.
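The two patterns from this answer can be tried out with a small sketch like the one below. ClientIdExtractor, afterPrefix and beforeTraceCount are hypothetical names, and the "_trace_count" input is a made-up example built from the suffix mentioned above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClientIdExtractor {
    // Optional "abc_pqr_" part, then everything made of word characters after "client_".
    private static final Pattern AFTER_PREFIX =
            Pattern.compile("process_(?:abc_pqr_)?client_(\\w+)");

    // Lazy variant: stops as early as possible before "_trace_count".
    private static final Pattern BEFORE_SUFFIX =
            Pattern.compile("process_(?:abc_pqr_)?client_(\\w+?)_trace_count");

    static String afterPrefix(String key) {
        Matcher m = AFTER_PREFIX.matcher(key);
        return m.find() ? m.group(1) : null; // null when the key has another shape
    }

    static String beforeTraceCount(String key) {
        Matcher m = BEFORE_SUFFIX.matcher(key);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(afterPrefix("process_client_123_Tree"));             // 123_Tree
        System.out.println(afterPrefix("process_abc_pqr_client_123_Tree"));     // 123_Tree
        System.out.println(beforeTraceCount("process_client_123_trace_count")); // 123
    }
}
```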

Need help in Regex to exclude splitting string within "

I need to split a String using a comma as the separator, but if part of the string is enclosed in " the splitting has to stop for that portion, from the opening " to the closing one, even if it contains commas in between.
Can anyone please help me solve this using a regex with lookaround?
Resurrecting this question because it had a simple regex solution that wasn't mentioned. This situation sounds very similar to "regex-match a pattern unless..."
\"[^\"]*\"|(,)
The left side of the alternation matches complete double-quoted strings. We will ignore these matches. The right side matches and captures commas to Group 1, and we know they are the right ones because they were not matched by the expression on the left.
Here is working code (see online demo):
import java.util.regex.*;
import java.util.List;
class Program {
public static void main (String[] args) {
String subject = "\"Messages,Hello\",World,Hobbies,Java\",Programming\"";
Pattern regex = Pattern.compile("\"[^\"]*\"|(,)");
Matcher m = regex.matcher(subject);
StringBuffer b = new StringBuffer();
while (m.find()) {
if(m.group(1) != null) m.appendReplacement(b, "SplitHere");
else m.appendReplacement(b, Matcher.quoteReplacement(m.group(0))); // keep quoted text literally, even if it contains $ or \
}
m.appendTail(b);
String replaced = b.toString();
String[] splits = replaced.split("SplitHere");
for (String split : splits)
System.out.println(split);
} // end main
} // end Program
Reference
How to match pattern except in situations s1, s2, s3
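A variation on the same idea that avoids the "SplitHere" placeholder is to cut the input at the positions of the captured commas. This is a sketch rather than the answer's own method; QuoteAwareSplit and splitOutsideQuotes are hypothetical names, and escaped quotes inside a quoted part are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteAwareSplit {
    // Left side consumes whole quoted strings; right side captures a real separator.
    private static final Pattern TOKEN = Pattern.compile("\"[^\"]*\"|(,)");

    static List<String> splitOutsideQuotes(String s) {
        List<String> parts = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        int start = 0;
        while (m.find()) {
            if (m.group(1) != null) {      // a comma outside quotes
                parts.add(s.substring(start, m.start()));
                start = m.end();
            }                              // quoted strings are skipped over
        }
        parts.add(s.substring(start));     // text after the last separator
        return parts;
    }

    public static void main(String[] args) {
        String subject = "\"Messages,Hello\",World,Hobbies,Java\",Programming\"";
        for (String part : splitOutsideQuotes(subject)) {
            System.out.println(part);
        }
    }
}
```

Working with match positions also means the input may safely contain the text "SplitHere" itself.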
Please try this:
(?<!\G\s*"[^"]*),
If you put this regex in your program, it should be:
String regex = "(?<!\\G\\s*\"[^\"]*),";
But 2 things are not clear:
Does the " only start next to a , or can it start in the middle of the content, such as AAA, BB"CC,DD" ? The regex above only deals with a " that starts next to a , .
If the content itself contains a ", how is it escaped? With "" or \"? The regex above does not deal with any escaped " format.

Regex for 2 different strings accounting for optional elements

I have two strings, "2007 AL PLAIN TEXT 5567 (NS)" and "5567". From both strings I only want to extract one group, which is 5567. How do I write a java regex for this? The format will be a 4 digit year, a 2 digit jurisdiction, the string plain text, then the number I want to extract and finally (NS). The problem is that everything except the number can be optional. How do I write a regex for this that captures only the number 5567 in a group?
You can do it in one line:
String num = input.replaceAll("(.*?)?(\\b\\w{4,}\\b)(\\s*\\(NS\\))?$", "$2");
Assuming your target is "a word at least 4 alphanumeric characters long".
You need to use the ? quantifier, which makes the match optional. '?:' groups a match but doesn't create a backreference for that group. Here is the code:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Regexp
{
public static void main(String args[])
{
String x = "2007 AL PLAIN TEXT 5567 (NS)";
String y = "5567";
Pattern pattern = Pattern.compile( "(?:.*[^\\d])?(\\d{4,}){1}(?:.*)?");
Matcher matcher = pattern.matcher(x);
while (matcher.find())
{
System.out.format("Text found in x: => \"%s\"\n",
matcher.group(1));
}
matcher = pattern.matcher(y);
while (matcher.find())
{
System.out.format("Text found in y: => \"%s\"\n",
matcher.group(1));
}
}
}
Output:
$ java Regexp
Text found in x: => "5567"
Text found in y: => "5567"
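Another option is to spell out each optional part of the format, so that the number is the only capturing group. This is a sketch under assumptions not confirmed by the question: the year is exactly 4 digits, the jurisdiction is two uppercase letters (as in "AL"), the literal text is "PLAIN TEXT", and the parts are separated by whitespace. OptionalParts and extractNumber are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalParts {
    // Each leading part and the trailing "(NS)" is optional; group 1 is the number.
    private static final Pattern FORMAT = Pattern.compile(
            "^(?:\\d{4}\\s+)?(?:[A-Z]{2}\\s+)?(?:PLAIN TEXT\\s+)?(\\d+)(?:\\s*\\(NS\\))?$");

    // Returns the number, or null if the input does not follow the assumed format.
    static String extractNumber(String input) {
        Matcher m = FORMAT.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractNumber("2007 AL PLAIN TEXT 5567 (NS)")); // 5567
        System.out.println(extractNumber("5567"));                         // 5567
    }
}
```

Because the whole input has to match, text that does not follow the assumed format yields null instead of a partial result.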
