Regex for 2 different strings accounting for optional elements


I have two strings, "2007 AL PLAIN TEXT 5567 (NS)" and "5567". From both of them I want to extract a single group, the number 5567. The full format is a 4-digit year, a 2-character jurisdiction (AL in the example), the literal text PLAIN TEXT, then the number I want to extract, and finally (NS). The problem is that everything except the number is optional. How do I write a Java regex that captures only the number 5567 in a group?

You can do it in one line:
String num = input.replaceAll("(.*?)?(\\b\\w{4,}\\b)(\\s*\\(NS\\))?$", "$2");
This assumes your target is "a word at least 4 alphanumeric characters long" at the end of the input, optionally followed by (NS).

Use the ? quantifier, which makes the preceding element optional. A (?:...) group groups part of the pattern but doesn't create a capturing group (and so no backreference) for it. Here is the code:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Regexp
{
    public static void main(String args[])
    {
        String x = "2007 AL PLAIN TEXT 5567 (NS)";
        String y = "5567";
        Pattern pattern = Pattern.compile( "(?:.*[^\\d])?(\\d{4,}){1}(?:.*)?");
        Matcher matcher = pattern.matcher(x);
        while (matcher.find())
        {
            System.out.format("Text found in x: => \"%s\"\n",
                matcher.group(1));
        }
        matcher = pattern.matcher(y);
        while (matcher.find())
        {
            System.out.format("Text found in y: => \"%s\"\n",
                matcher.group(1));
        }
    }
}
Output:
$ java Regexp
Text found in x: => "5567"
Text found in y: => "5567"
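
If the layout is as strict as the question describes, a stricter alternative is to spell out each optional part and anchor the pattern at both ends. The following is only a sketch under assumptions that the question does not confirm: the jurisdiction is taken to be two uppercase letters, the middle text is the literal PLAIN TEXT, and the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalPartsDemo {
    // Assumed layout: [year] [jurisdiction] [PLAIN TEXT] number [(NS)]
    private static final Pattern RECORD = Pattern.compile(
        "^(?:\\d{4}\\s+)?"        // optional 4-digit year
        + "(?:[A-Z]{2}\\s+)?"     // optional 2-letter jurisdiction (assumed uppercase)
        + "(?:PLAIN TEXT\\s+)?"   // optional literal text
        + "(\\d+)"                // group 1: the number to extract
        + "(?:\\s*\\(NS\\))?$");  // optional "(NS)" suffix

    // Returns the captured number, or null when the input does not fit the layout.
    static String extractNumber(String input) {
        Matcher m = RECORD.matcher(input.trim());
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractNumber("2007 AL PLAIN TEXT 5567 (NS)"));
        System.out.println(extractNumber("5567"));
    }
}
```

Because every prefix part is optional, an input consisting of a single 4-digit number is always read as the number itself, never as a year.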

Related

Regex for finding only single alphabets in a string and ignore consecutive double

I have searched a lot but cannot find a regex that selects only single (non-repeated) letters and doubles them, while letters that are already doubled remain untouched.
I tried
String str = "yahoo";
str = str.replaceAll("(\\w)\\1+", "$0$0");
But since (\\w)\\1+ selects only the characters that are already doubled, my output becomes yahoooo. I tried adding a negation, !(\\w)\\1+, but that didn't work: the output was the same as the input. I have also tried
str.replaceAll(".", "$0$0");
But that doubles every character, including those that are already doubled.
Please help me write a regex that replaces every single character with a doubled one while leaving already-doubled characters untouched.
Example
abc -> aabbcc
yahoo -> yyaahhoo (o should remain untouched)
opinion -> ooppiinniioonn
aaaaaabc -> aaaaaabbcc

You can match using this regex:
((.)\2+)|(.)
And replace it with:
$1$3$3
RegEx Explanation:
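((.)\2+): Match a character and capture it in group #2, then use \2+ to match all consecutive repeats of that character. The whole run of repeated characters is captured in group #1
|: OR
(.): Match any character and capture it in group #3
In the replacement, whichever group did not take part in the match is empty, so a repeated run is emitted unchanged via $1 and a single character is doubled via $3$3.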
Code Demo:
import java.util.List;
class Ideone {
    public static void main(String[] args) {
        List<String> input = List.of("aaa", "abc", "yahoo",
            "opinion", "aaaaaabc");
        for (String s: input) {
            System.out.println( s + " => " +
                s.replaceAll("((.)\\2+)|(.)", "$1$3$3") );
        }
    }
}
Output:
aaa => aaa
abc => aabbcc
yahoo => yyaahhoo
opinion => ooppiinniioonn
aaaaaabc => aaaaaabbcc

The solution by @anubhava, if viable in Java, is probably the best way to go. For a more brute-force approach, we can iterate over the matches of the following pattern:
(\\w)\\1+|\\w
This first tries to match a run of two or more identical letters and, failing that, a single letter. For each match, we leave a multi-letter run unchanged and double any single letter. Here is short Java code that does this:
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class DoubleSingles {
    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("abc", "yahoo", "opinion", "aaaaaabc");
        String pattern = "(\\w)\\1+|\\w";
        Pattern r = Pattern.compile(pattern);
        for (String input : inputs) {
            Matcher m = r.matcher(input);
            StringBuffer buffer = new StringBuffer();
            while (m.find()) {
                if (m.group().matches("(\\w)\\1+")) {
                    // A run of repeated letters: keep it unchanged.
                    m.appendReplacement(buffer, m.group());
                }
                else {
                    // A single letter: double it.
                    m.appendReplacement(buffer, m.group() + m.group());
                }
            }
            m.appendTail(buffer);
            System.out.println(input + " => " + buffer.toString());
        }
    }
}
This prints:
abc => aabbcc
yahoo => yyaahhoo
opinion => ooppiinniioonn
aaaaaabc => aaaaaabbcc

I can read the question in two ways.
If the goal is an even number of each repeated word character:
Search for (\w)\1? and replace with $1$1.
If only single characters should be doubled and longer runs left untouched:
Search for (\w)\1?(\1*) and replace with $1$1$2.
Both variants capture a word character \w in $1 and optionally match the same character once more. The second variant also captures any further repeats of that character in $2 and appends them in the replacement.
FYI: When writing the pattern as a Java string, remember to escape the backslashes, e.g. \1 -> \\1, \w -> \\w, ...
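
To make the difference between the two readings concrete, here is a small sketch with both patterns already escaped for Java. The class and method names are invented for this illustration.

```java
class DoublingVariants {
    // Variant 1: pad every run of a word character to an even length.
    static String evenRuns(String s) {
        return s.replaceAll("(\\w)\\1?", "$1$1");
    }

    // Variant 2: double lone characters, keep longer runs exactly as they are.
    static String doubleSingles(String s) {
        return s.replaceAll("(\\w)\\1?(\\1*)", "$1$1$2");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"aaa", "yahoo", "aaaaaabc"}) {
            System.out.println(s + " => " + evenRuns(s) + " | " + doubleSingles(s));
        }
    }
}
```

The two variants only differ on runs of odd length greater than one: for "aaa" the first yields "aaaa" and the second leaves "aaa" alone.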

Add all the numbers which have + symbol and replace the same with the added value

I would like to find every group of numbers that are joined by + and replace each group with its sum.
Test string: 82+18-10.2+3+37=6 + 7
Here 82+18 can be added and replaced with the value 100.
The test string then becomes: 100-10.2+3+37=6 + 7
Next, 2+3+37 can be added and replaced in the test string as
follows: 100-10.42=6 + 7
Now 6 + 7 must not be added, because there is a space after the value
6.
My idea was to extract the numbers which are supposed to be added like below:
82+18
2+3+37
and then add them up and substitute the result using String's replace() method.
The regex I tried:
(?=([0-9]{1,}[\\+]{1}[0-9]{1,}))
Sample Input:
82+18-10.2+3+37=6 + 7
Java Code for identifying the groups to be added and replaced:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class ReplaceAddition {
    static String regex = "(?=([0-9]{1,}[\\+]{1}[0-9]{1,}))";
    static String testStr = "82+18-10.2+3+37=6 + 7 ";
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        Matcher matcher = pattern.matcher(testStr);
        while (matcher.find()) {
            System.out.println(matcher.group(0));
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println(matcher.group(i));
            }
        }
    }
}
Output:
82+18
2+18
2+3
3+37
I can't work out what I'm missing. Any help would be appreciated.

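The lookahead (?=...) is zero-width, so after each match the matcher moves forward by only one character and reports overlapping results such as 2+18. I simplified the regexp by removing the positive lookahead operator
(?=...)
and the enclosing capturing parentheses
(...)
After these changes, the regexp is as follows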
static String regex = "[0-9]{1,}[\\+]{1}[0-9]{1,}";
When I run it, I'm getting the following result:
82+18
2+3
This is closer to what is expected, but still not right, because we get 2+3 instead of 2+3+37. To handle any number of summands instead of just two, the expression can be refined to:
static String regex = "[0-9]{1,}(?:[\\+]{1}[0-9]{1,})+";
What I added here is a non-capturing group
(?:...)
followed by a plus sign, meaning one or more repetitions. Now the program produces the output
82+18
2+3+37
as expected.
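
The pattern above only finds the groups; the question also asks to replace each group with its sum. One possible way to do both in a single pass is Matcher.replaceAll with a replacement function, available since Java 9. This is a sketch rather than the answer's own method: the class and method names are hypothetical, the pattern is an equivalent shorter spelling of the one above, and each number is assumed to fit in an int.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class AdditionFolder {
    // A number followed by one or more "+number" parts.
    private static final Pattern SUM = Pattern.compile("[0-9]+(?:\\+[0-9]+)+");

    // Replaces every matched addition with its total (requires Java 9+).
    static String foldAdditions(String text) {
        return SUM.matcher(text).replaceAll(match -> {
            // Assumption: every operand and the total fit in an int.
            int total = Arrays.stream(match.group().split("\\+"))
                              .mapToInt(Integer::parseInt)
                              .sum();
            return Integer.toString(total);
        });
    }

    public static void main(String[] args) {
        System.out.println(foldAdditions("82+18-10.2+3+37=6 + 7"));
    }
}
```

For the sample input this prints 100-10.42=6 + 7; the trailing 6 + 7 is left alone because the pattern allows no spaces around the plus sign.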

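Another solution (it uses var, so it needs Java 10 or later):
import java.util.regex.Pattern;
public class SumAdditions
{
    public static void main(String[] args)
    {
        final var p = Pattern.compile("(?:\\d+(?:\\+\\d+)+)");
        var text = new StringBuilder("82+18-10.2+3+37=6 + 7 ");
        var m = p.matcher(text);
        while(m.find())
        {
            var sum = 0;
            var split = m.group(0).split("\\+");
            for(var str : split)
            {
                sum += Integer.parseInt(str);
            }
            // Replace the matched expression in place with its sum.
            text.replace(m.start(0),m.end(0),""+sum);
            // The text changed, so restart matching from the beginning.
            m.reset();
        }
        System.out.println(text);
    }
}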
The regex (?:\\d+(?:\\+\\d+)+) finds:
(?: Noncapturing group
\\d+ One or more digits, followed by
(?: Noncapturing group
\\+ A plus symbol, and
\\d+ One or more digits
)+ One or more times
) Once
So, this regex matches a run of two or more numbers separated by '+'.

Regex matcher not giving expected result. Not matching number properly

I cannot understand why the 2nd group gives me only 0. I expect 3000. Please also point me to a resource where I can understand this better.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexMatches {
    public static void main( String args[] ) {
        // String to be scanned to find the pattern.
        String line = "This order was placed for QT3000! OK?";
        String pattern = "(.*)(\\d+)(.*)";
        // Create a Pattern object
        Pattern r = Pattern.compile(pattern);
        // Now create matcher object.
        Matcher m = r.matcher(line);
        if (m.find( )) {
            System.out.println("Found value: " + m.group(0) );
            System.out.println("Found value: " + m.group(1) );
            System.out.println("Found value: " + m.group(2) );//?
            System.out.println("Found value: " + m.group(3) );
        }else {
            System.out.println("NO MATCH");
        }
    }
}

Make the pattern more precise: add QT before the \d pattern, or use .*? instead of the first .* to match as few chars as possible.
String pattern = "(.*QT)(\\d+)(.*)";
or
String pattern = "(.*?)(\\d+)(.*)";
will do.
The (.*QT)(\\d+)(.*) pattern will match and capture into Group 1 any 0+ chars other than line break chars, as many as possible, up to and including the last occurrence of QT that is followed by the subsequent subpatterns, then will match and capture 1+ digits into Group 2, and then will match and capture the rest of the line into Group 3.
The .*? in the alternative pattern will match and capture into Group 1 any 0+ chars other than line break chars, as few as possible, up to the first chunk of 1 or more digits.
You may also use a simpler pattern like String pattern = "QT(\\d+)"; to get all digits after QT. The result will then be in Group 1 (you won't have the text before and after the number).

The * quantifier is greedy: it tries to match as many characters as possible. The first (.*) therefore consumes everything up to the last digit, leaving only the final 0 for (\d+).
You can make it non-greedy (lazy) by changing it to *?
Then your regex becomes:
(.*?)(\d+)(.*)
And you will match 3000 in the 2nd capturing group.
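
The effect of the greedy and the lazy quantifier can be compared side by side. This is a minimal sketch using the line from the question; the class name and the helper method secondGroup are made up for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Returns group 2 of the first match, or null if the pattern does not match.
    static String secondGroup(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        System.out.println(secondGroup("(.*)(\\d+)(.*)", line));   // greedy: prints 0
        System.out.println(secondGroup("(.*?)(\\d+)(.*)", line));  // lazy: prints 3000
    }
}
```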

get everything after a particular string

I have a String that arrives as either "process_client_123_Tree" or "process_abc_pqr_client_123_Tree". I want to extract everything after "process_client_" or "process_abc_pqr_client_" and store it in a String variable.
Here the currentKey variable can contain either of the two strings above.
String clientId = // how to use currentKey here so that I get the remaining portion in this variable
What is the right way to do this? Should I just use split here, or a regex?

import java.util.regex.*;
class test
{
    public static void main(String args[])
    {
        Pattern pattern=Pattern.compile("^process_(client_|abc_pqr_client_)(.*)$");
        Matcher matcher = pattern.matcher("process_client_123_Tree");
        while(matcher.find())
            System.out.println("String 1 Group 2: "+matcher.group(2));
        matcher = pattern.matcher("process_abc_pqr_client_123_Tree");
        while(matcher.find())
            System.out.println("String 2 Group 2: "+matcher.group(2));
        System.out.println("Another way..");
        System.out.println("String 1 Group 2: "+"process_client_123_Tree".replace("process_client_", ""));
        System.out.println("String 2 Group 2: "+"process_abc_pqr_client_123_Tree".replace("process_abc_pqr_client_", ""));
    }
}
Output:
$ java test
String 1 Group 2: 123_Tree
String 2 Group 2: 123_Tree
Another way..
String 1 Group 2: 123_Tree
String 2 Group 2: 123_Tree
Regex breakup:
^ matches the start of the string
process_(client_|abc_pqr_client_) matches "process_" followed by "client_" or "abc_pqr_client_" (captured as group 1)
(.*)$ . means any char and * means 0 or more times, so this matches the remaining chars up to the end of the string ($) and captures them as group 2

import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Matchit{
    public static void main(String []args){
        String str = "process_abc_pqr_client_123_Tree";
        Pattern p = Pattern.compile("process_abc_pqr_client_(.*)|process_client_(.*)");
        Matcher m = p.matcher(str);
        if (m.find( )) {
            // Group 1 is null when the second alternative matched, so fall back to group 2.
            String rest = m.group(1) != null ? m.group(1) : m.group(2);
            System.out.println("Found value: " + rest );
        }
    }
}
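Gets you:
Found value: 123_Tree
The parentheses in the regexp define the capturing groups. The pipe is a logical or. Dot means any character and star means zero or more repetitions. So, I create a Pattern object with that regexp and then use a Matcher object to get the part of the string that was captured. Note that group 1 is only set when the first alternative matches; for an input starting with "process_client_" the value is in group 2.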

A regex pattern could be: "process_(?:abc_pqr_)?client_(\\w+)"
(?:abc_pqr_)? is the optional part
(?: opens a non-capturing group; )? closes it and makes it match zero or one times
\w+ matches one or more word characters [A-Za-z0-9_]
The match will be in group(1), the first capturing group.
To bound the capture on the right, match lazily up to the right-hand token
"process_(?:abc_pqr_)?client_(\\w+?)_trace_count"
where \w+? matches as few word characters as possible to meet the condition.
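
Applied to the currentKey variable from the question, the first pattern could be used roughly as follows. This is a sketch only: the class name and the clientId helper are invented here, and the choice to return null for keys of any other shape is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClientIdExtractor {
    private static final Pattern KEY =
        Pattern.compile("process_(?:abc_pqr_)?client_(\\w+)");

    // Returns the text after the prefix, or null when the key has a different shape.
    static String clientId(String currentKey) {
        Matcher m = KEY.matcher(currentKey);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(clientId("process_client_123_Tree"));
        System.out.println(clientId("process_abc_pqr_client_123_Tree"));
    }
}
```

Using matches() instead of find() requires the whole key to fit the pattern, which avoids picking up a prefix that merely occurs somewhere inside a longer string.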

Java string replace using regex

I have strings with values "Address Line1", "Address Line2" ... etc.
I want to add a space if there is any numeric value in the string like
"Address Line 1", "Address Line 2".
I can do this using contains and replace like this
String sample = "Address Line1";
if (sample.contains("1")) {
    sample = sample.replace("1"," 1");
}
But how can I do this using regex?

sample = sample.replaceAll("\\d+"," $0");

To use a regex you need the replaceAll method instead of replace.
As the regex you can use
\\d+ to match any group of one or more consecutive digits. We need all consecutive digits here because matching only one digit at a time would turn foo123 into something like foo 1 2 3
(?<=[a-zA-Z])\\d if you want to add a space only before a digit that has an alphabetic character before it. The (?<=[a-zA-Z]) part is a look-behind; it just checks whether the tested digit has a character from the range a-z or A-Z before it.
and as the replacement you can use " $0", which means a space followed by group 0, i.e. the part currently matched by the regex.
So try with
sample = sample.replaceAll("\\d+", " $0");
or
sample = sample.replaceAll("(?<=[a-zA-Z])\\d", " $0");
The second variant will change "hello 1 world2" into "hello 1 world 2" - notice that only 2 gets an additional space.
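
The difference between the two replacements shows up on input that already contains a spaced digit. A small sketch, with class and method names that are purely illustrative:

```java
class DigitSpacing {
    // Inserts a space before every run of digits.
    static String spaceBeforeDigits(String s) {
        return s.replaceAll("\\d+", " $0");
    }

    // Inserts a space only before a digit that directly follows a letter.
    static String spaceAfterLetter(String s) {
        return s.replaceAll("(?<=[a-zA-Z])\\d", " $0");
    }

    public static void main(String[] args) {
        System.out.println(spaceBeforeDigits("Address Line1"));
        System.out.println(spaceAfterLetter("Address Line1"));
        System.out.println(spaceAfterLetter("hello 1 world2"));
    }
}
```

Both methods turn "Address Line1" into "Address Line 1". On "hello 1 world2" the first one would add a second space before the 1, while the look-behind version leaves that part alone.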

First create a Pattern object for what you want to search for and compile it. In your case the Pattern object will be as follows:
Pattern p=Pattern.compile("1");
Now create a Matcher object for your string
Matcher m=p.matcher(sample);
Now add a condition to check whether the Matcher has found your pattern, and if it has, call replaceAll to replace it
if(m.find())
{
    sample=m.replaceAll(" 1");
}
The complete code is as follows:
import java.util.regex.*;
class demo
{
    public static void main(String args[])
    {
        String sample = "Address Line1";
        Pattern p=Pattern.compile("1");
        Matcher m=p.matcher(sample);
        if(m.find())
        {
            sample=m.replaceAll(" 1");
        }
        System.out.println(sample);
    }
}
