Regex to add digit between delimiter characters if missing


I haven't used regex much and I need a little help. I have numbers separated by a dot character, something like this:
0.0.1
1.1.12.1
20.3.4.00.1
Now I would like to ensure that each number between . has two digits:
00.00.01
01.01.12.01
20.03.04.00.01
How can I accomplish that? Thank you for your help.

You can use String.split() to accomplish this:
public static void main(String[] args) {
String[] splitString = "20.3.4.00.1".split("\\.");
String output = "";
for(String a : splitString)
{
if(a.length() < 2)
{
a = "0" + a;
}
output += a + ".";
}
output = output.substring(0, output.length() - 1);
System.out.println(output);
}

use this pattern
\b(?=\d(?:\.|$))
and replace with 0
\b # <word boundary>
(?= # Look-Ahead
\d # <digit 0-9>
(?: # Non Capturing Group
\. # "."
| # OR
$ # End of string/line
) # End of Non Capturing Group
) # End of Look-Ahead
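A minimal Java sketch of applying this pattern with String.replaceAll. The class and method names (PadWithLookahead, pad) are made up for illustration, and each backslash is doubled because the pattern sits in a Java string literal.

```java
class PadWithLookahead {
    // Inserts "0" at every word boundary where the next character is a digit
    // that is itself followed by a dot or by the end of the string,
    // i.e. in front of every single-digit segment.
    static String pad(String s) {
        return s.replaceAll("\\b(?=\\d(?:\\.|$))", "0");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"0.0.1", "1.1.12.1", "20.3.4.00.1"}) {
            System.out.println(s + " -> " + pad(s));
        }
    }
}
```

Segments that already have two or more digits are left alone, because the lookahead only succeeds when the digit after the boundary is immediately followed by a dot or the end of the input.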

You can iterate over the matching groups retrieved from matching the following expression: /([^.]+)/g.
Example:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class StackOverFlow {
public static String text;
public static String pattern;
static {
text = "20.3.4.00.1";
pattern = "([^.]+)";
}
public static String appendLeadingZero(String text) {
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(text);
StringBuilder sb = new StringBuilder();
while (m.find()) {
String firstMatchingGroup = m.group(1);
if (firstMatchingGroup.length() < 2) {
sb.append("0" + firstMatchingGroup);
} else {
sb.append(firstMatchingGroup);
}
sb.append(".");
}
return sb.substring(0, sb.length() - 1);
}
public static void main(String[] args) {
System.out.println(appendLeadingZero(text));
}
}

I am going with the assumption that you want to ensure every integer is at least two digits, both between . and on the ends. This is what I came up with
public String ensureTwoDigits(String original){
return original.replaceAll("(?<!\\d)(\\d)(?!\\d)","0$1");
}
Test case
public static void main(String[] args) {
Foo f = new Foo();
List<String> values = Arrays.asList("1",
"1.1",
"01.1",
"01.01.1.1",
"01.2.01",
"01.01.01");
values.forEach(s -> System.out.println(s + " -> " + f.ensureTwoDigits(s)));
}
Test output
1 -> 01
1.1 -> 01.01
01.1 -> 01.01
01.01.1.1 -> 01.01.01.01
01.2.01 -> 01.02.01
01.01.01 -> 01.01.01
The regex (?<!\\d)(\\d)(?!\\d) uses both negative lookbehind and negative lookahead to check that a single digit has no other digits around it. Without them, it would put a zero in front of every digit. The replacement string "0$1" says put a 0 in front of the first capturing group. There really is only one, that being (\\d) -- the single digit occurrence.
EDIT: I should note that I realize this is not a strict match to the original requirements. It won't matter what you use between single digits -- letters, various punctuation, etc. will all return just fine with a zero in front of any single digit. If you want it to fail or skip strings that may contain characters other than digits and ., the regex would need to be changed.
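One way to get the stricter behaviour is to validate the whole string first and only then pad. The sketch below is an assumption about what "strict" should mean (digit runs separated by single dots, nothing else); the class name StrictPad, the method name ensureTwoDigitsStrict and the choice to throw IllegalArgumentException are all hypothetical.

```java
import java.util.regex.Pattern;

class StrictPad {
    // Assumed input format: one or more digit runs separated by single dots.
    private static final Pattern DOTTED_DIGITS = Pattern.compile("\\d+(?:\\.\\d+)*");

    static String ensureTwoDigitsStrict(String original) {
        // Reject anything that contains characters other than digits and dots,
        // or that has empty segments such as "1..2".
        if (!DOTTED_DIGITS.matcher(original).matches()) {
            throw new IllegalArgumentException("Not a dot-separated list of numbers: " + original);
        }
        // Same padding regex as above: a digit with no digit on either side.
        return original.replaceAll("(?<!\\d)(\\d)(?!\\d)", "0$1");
    }

    public static void main(String[] args) {
        System.out.println(ensureTwoDigitsStrict("01.2.01"));
        System.out.println(ensureTwoDigitsStrict("1.1.12.1"));
    }
}
```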

you can use this simple regex:
\\b\\d\\b
and replace with 0$0

Related

Add all the numbers which have + symbol and replace the same with the added value

I would like to find every group of numbers joined by + and replace it with the sum.
Test string: 82+18-10.2+3+37=6 + 7
Here 82+18 can be added and replaced with the value 100.
The test string then becomes: 100-10.2+3+37=6 + 7
Next, 2+3+37 can be added and replaced in the test string as
follows: 100-10.42=6 + 7
6 + 7 is not added because there is a space after the value 6.
My idea was to extract the groups that are supposed to be added, like this:
82+18
2+3+37
and then add each group and replace it using the replace() method of String.
Tried Regex:
(?=([0-9]{1,}[\\+]{1}[0-9]{1,}))
Sample Input:
82+18-10.2+3+37=6 + 7
Java Code for identifying the groups to be added and replaced:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class ReplaceAddition {
static String regex = "(?=([0-9]{1,}[\\+]{1}[0-9]{1,}))";
static String testStr = "82+18-10.2+3+37=6 + 7 ";
public static void main(String[] args) {
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(testStr);
while (matcher.find()) {
System.out.println(matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
}
}
}
}
Output:
82+18
2+18
2+3
3+37
I can't figure out what I'm missing. Help would be appreciated.

I tried simplifying the regexp by removing the positive lookahead operator
(?=...)
And the enclosing parenthesis
(...)
After these changes, the regexp is as follows
static String regex = "[0-9]{1,}[\\+]{1}[0-9]{1,}";
When I run it, I'm getting the following result:
82+18
2+3
This is closer to the expected, but still not perfect, because we're getting "2+3" instead of 2+3+37. In order to handle any number of added numbers instead of just two, the expression can be further tuned up to:
static String regex = "[0-9]{1,}(?:[\\+]{1}[0-9]{1,})+";
What I added here is a non-capturing group
(?:...)
with a plus sign meaning one or more repetition. Now the program produces the output
82+18
2+3+37
as expected.

Another solution is like so:
public static void main(String[] args)
{
final var p = Pattern.compile("(?:\\d+(?:\\+\\d+)+)");
var text = new StringBuilder("82+18-10.2+3+37=6 + 7 ");
var m = p.matcher(text);
while(m.find())
{
var sum = 0;
var split = m.group(0).split("\\+");
for(var str : split)
{
sum += Integer.parseInt(str);
}
text.replace(m.start(0),m.end(0),""+sum);
m.reset();
}
System.out.println(text);
}
The regex (?:\\d+(?:\\+\\d+)+) finds:
(?: Noncapturing group
\\d+ One or more digits, followed by
(?: Noncapturing group
\\+ A plus symbol, and
\\d+ One or more digits
)+ One or more times
) Once
So, this regex matches two or more numbers separated by '+'.
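The same regex can also be applied in a single pass, without editing the text and resetting the matcher. This is a sketch that assumes Java 9 or later, where Matcher.replaceAll accepts a function; the names SumPlusRuns and collapse are hypothetical.

```java
import java.util.regex.Pattern;

class SumPlusRuns {
    // Two or more digit runs joined by '+', with no spaces in between.
    private static final Pattern PLUS_RUN = Pattern.compile("\\d+(?:\\+\\d+)+");

    static String collapse(String text) {
        // For each match, split on '+', add the parts, and substitute the sum.
        return PLUS_RUN.matcher(text).replaceAll(match -> {
            long sum = 0;
            for (String part : match.group().split("\\+")) {
                sum += Long.parseLong(part);
            }
            return Long.toString(sum);
        });
    }

    public static void main(String[] args) {
        System.out.println(collapse("82+18-10.2+3+37=6 + 7"));
    }
}
```

The replacement consists of digits only, so it needs no escaping for $ or backslash.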

Regex to allow only one punctuation character in Java string

I need to parse raw data and allow only strings that contain letters and at most one punctuation character.
Here is what I have done so far:
public class ProcessRawData {
public static void main(String[] args) {
String myData = "Australia India# America#!";
ProcessRawData data = new ProcessRawData();
data.process(myData);
}
public void process(String rawData) {
String[] splitData = rawData.split(" ");
for (String s : splitData) {
System.out.println("My Data Elements: " + s);
Pattern pattern = Pattern.compile("^[\\p{Alpha}\\p{Punct}]*$");
Matcher matcher = pattern.matcher(s);
if (matcher.matches()) {
System.out.println("Allowed");
} else {
System.out.println("Not allowed");
}
}
}
}
It prints below,
My Data Elements: Australia
Allowed
My Data Elements: India#
Allowed
My Data Elements: America#!
Allowed
Expected: America#! should be reported as not allowed, since it contains more than one punctuation character.
I guess I need to use quantifiers, but I am not sure where to place them so that ONLY one punctuation character is allowed.
Can someone help?

You should compile your Pattern outside the loop.
When using matches(), there's no need for ^ and $, since it'll match against the entire string anyway.
If you need at most one punctuation character, you need to match a single optional punctuation character, preceded and/or followed by optional alphabet characters.
Note that using \\p{Alpha} and \\p{Punct} excludes digits. No digit will be allowed. If you want to consider a digit as a special character, replace \\p{Punct} with \\P{Alpha} (uppercase P means not Alpha).
public static void main(String[] args) {
process("Australia India# Amer$ca America#! America1");
}
public static void process(String rawData) {
Pattern pattern = Pattern.compile("\\p{Alpha}*\\p{Punct}?\\p{Alpha}*");
for (String s : rawData.split(" ")) {
System.out.println("My Data Elements: " + s);
if (pattern.matcher(s).matches()) {
System.out.println("Allowed");
} else {
System.out.println("Not allowed");
}
}
}
Output
My Data Elements: Australia
Allowed
My Data Elements: India#
Allowed
My Data Elements: Amer$ca
Allowed
My Data Elements: America#!
Not allowed
My Data Elements: America1
Not allowed
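For the variant mentioned above, where a digit counts as the one special character, the only change is \\P{Alpha} in place of \\p{Punct}. A minimal sketch; the class name OneSpecialCharacter and the method name isAllowed are hypothetical.

```java
import java.util.regex.Pattern;

class OneSpecialCharacter {
    // Letters, then at most one character of any kind that is not a letter, then letters.
    private static final Pattern AT_MOST_ONE_NON_LETTER =
            Pattern.compile("\\p{Alpha}*\\P{Alpha}?\\p{Alpha}*");

    static boolean isAllowed(String s) {
        return AT_MOST_ONE_NON_LETTER.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : "Australia India# America#! America1 Am3r1ca".split(" ")) {
            System.out.println(s + ": " + (isAllowed(s) ? "Allowed" : "Not allowed"));
        }
    }
}
```

With this pattern America1 is allowed, while Am3r1ca is not because it contains two non-letters.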

You may use
^\\p{Alpha}*(?:\\p{Punct}\\p{Alpha}*)?$
Explanation:
^ - start of string
\\p{Alpha}* - zero or more letters
(?:\\p{Punct}\\p{Alpha}*)? - one or zero (due to the ? quantifier) sequences of:
\\p{Punct} - a single occurrence of a punctuation symbol
\\p{Alpha}* - zero or more letters
$ - end of string.
Using it with String#matches will allow dropping the ^ and $ anchors since the pattern will then be anchored by default:
if (input.matches("\\p{Alpha}*(?:\\p{Punct}\\p{Alpha}*)?")) { ... }

You can do it with a simple negative look-ahead:
((?!\\p{Punct}{2}).)*
So your code becomes simply:
public void process(String rawData) {
for (String s : rawData.split(" ")) {
if (s.matches("((?!\\p{Punct}{2}).)*")) {
System.out.println("Allowed");
} else {
System.out.println("Not allowed");
}
}
}
The regex asserts that no character is a {Punct} immediately followed by another {Punct}. It only rejects adjacent punctuation characters, so a string such as A#b! still matches.
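If punctuation characters that are not next to each other should also count, the look-ahead can be moved to the front and made to scan the whole string. This is a sketch of that variation, not the pattern from the answer; the names AtMostOnePunct and isAllowed are hypothetical, and like the original it does not restrict the other characters to letters.

```java
class AtMostOnePunct {
    // The negative look-ahead fails as soon as two punctuation characters
    // can be found anywhere in the string, adjacent or not.
    static boolean isAllowed(String s) {
        return s.matches("(?!(?:.*\\p{Punct}){2}).*");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Australia", "India#", "America#!", "A#b!"}) {
            System.out.println(s + ": " + (isAllowed(s) ? "Allowed" : "Not allowed"));
        }
    }
}
```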

I hope that would be helpful.
public static void process(String rawData) {
String[] splitData = rawData.split(" ");
for (String s : splitData) {
Pattern pNum = Pattern.compile("[0-9]");
Matcher match = pNum.matcher(s);
if (match.find()) {
System.out.println(s + ": Not Allowed");
continue;
}
Pattern p = Pattern.compile("[^a-z]", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(s);
int count = 0;
while (m.find()) {
count = count + 1;
}
if (count > 1) {
System.out.println(s + ": Not Allowed");
} else {
System.out.println(s + ": Allowed");
}
}
}
Output
Australia: Allowed
India#: Allowed
America#!: Not Allowed
America1: Not Allowed

You can use the following regex:
^[A-Za-z]*[!"\#$%&'()*+,\-.\/:;<=>?@\[\\\]^_`{|}~]?[A-Za-z]*$
This allows at most one punctuation character, at any position in the string.

Splitting a nested string keeping quotation marks

I am working on a project in Java that requires having nested strings.
For an input string that in plain text looks like this:
This is "a string" and this is "a \"nested\" string"
The result must be the following:
[0] This
[1] is
[2] "a string"
[3] and
[4] this
[5] is
[6] "a \"nested\" string"
Note that I want the \" sequences to be kept.
I have the following method:
public static String[] splitKeepingQuotationMarks(String s);
and I need to create an array of strings out of the given s parameter by the given rules, without using the Java Collection Framework or its derivatives.
I am unsure about how to solve this problem.
Can a regex expression be made that would get this solved?
UPDATE based on questions from comments:
each unescaped " has its closing unescaped " (they are balanced)
each escaping character \ also must be escaped if we want to create literal representing it (to create text representing \ we need to write it as \\).

You can use the following regex:
"[^"\\]*(?:\\.[^"\\]*)*"|\S+
Java demo:
String str = "This is \"a string\" and this is \"a \\\"nested\\\" string\"";
Pattern ptrn = Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"|\\S+");
Matcher matcher = ptrn.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
Explanation:
"[^"\\]*(?:\\.[^"\\]*)*" - a double quote that is followed with any 0+ characters other than a " and \ ([^"\\]) followed with 0+ sequences of any escaped sequence (\\.) followed with any 0+ characters other than a " and \
| - or...
\S+ - 1 or more non-whitespace characters
NOTE
@Pshemo's suggestion - "\"(?:\\\\.|[^\"])*\"|\\S+" (or "\"(?:\\\\.|[^\"\\\\])*\"|\\S+", which would be more correct) - is the same expression, but much less efficient since it uses an alternation group quantified with *. This construct involves much more backtracking, as the regex engine has to try two alternatives at each position. My unroll-the-loop based version matches chunks of text at once, and is thus much faster and more reliable.
UPDATE
Since a String[] is required as output, you need two passes: count the matches, create the array, and then run the matcher again to fill it:
int cnt = 0;
String str = "This is \"a string\" and this is \"a \\\"nested\\\" string\"";
Pattern ptrn = Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"|\\S+");
Matcher matcher = ptrn.matcher(str);
while (matcher.find()) {
cnt++;
}
System.out.println(cnt);
String[] result = new String[cnt];
matcher.reset();
int idx = 0;
while (matcher.find()) {
result[idx] = matcher.group(0);
idx++;
}
System.out.println(Arrays.toString(result));

Another regex approach that works uses a negative lookbehind: "words" (\w+) OR "quote followed by anything up to the next quote that ISN'T preceded by a backslash", and set your match to "global" (don't return on first match)
(\w+|".*?(?<!\\)")
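In Java, "global" matching means calling find() in a loop. The sketch below wraps this pattern in the method signature from the question and fills a String[] in two passes, so no collection classes are needed for the result. The class name LookbehindSplit is hypothetical. Two limitations of the pattern are worth keeping in mind: \w+ skips any punctuation outside quotes, and a quoted section that ends in an escaped backslash (\\ directly before the closing quote) is not recognised as closed.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindSplit {
    // A run of word characters, or a quote up to the next quote
    // that is not preceded by a backslash.
    private static final Pattern TOKEN = Pattern.compile("(\\w+|\".*?(?<!\\\\)\")");

    static String[] splitKeepingQuotationMarks(String s) {
        Matcher m = TOKEN.matcher(s);
        // First pass: count the tokens so the array can be sized.
        int count = 0;
        while (m.find()) {
            count++;
        }
        // Second pass: rewind the matcher and copy each token into the array.
        String[] result = new String[count];
        m.reset();
        for (int i = 0; m.find(); i++) {
            result[i] = m.group();
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "This is \"a string\" and this is \"a \\\"nested\\\" string\"";
        System.out.println(Arrays.toString(splitKeepingQuotationMarks(input)));
    }
}
```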

An alternative method that does not use a regex:
import java.util.ArrayList;
import java.util.Arrays;
public class SplitKeepingQuotationMarks {
public static void main(String[] args) {
String pattern = "This is \"a string\" and this is \"a \\\"nested\\\" string\"";
System.out.println(Arrays.toString(splitKeepingQuotationMarks(pattern)));
}
public static String[] splitKeepingQuotationMarks(String s) {
ArrayList<String> results = new ArrayList<>();
StringBuilder last = new StringBuilder();
boolean inString = false;
boolean wasBackSlash = false;
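for (char c : s.toCharArray()) {
if (Character.isSpaceChar(c) && !inString) {
if (last.length() > 0) {
results.add(last.toString());
last.setLength(0); // Clears the s.b.
}
wasBackSlash = false;
} else {
last.append(c);
// Only an unescaped quote opens or closes a quoted section.
if (c == '"' && !wasBackSlash)
inString = !inString;
// A backslash escapes the next character unless it is itself escaped.
wasBackSlash = (c == '\\') && !wasBackSlash;
}
}
// Avoid adding an empty token when the input ends with a space.
if (last.length() > 0)
results.add(last.toString());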
return results.toArray(new String[results.size()]);
}
}
Output:
[This, is, "a string", and, this, is, "a \"nested\" string"]

Java Regex replacing every digit in beginning

How can I use a regex to replace each digit at the beginning of a word with an underscore, and, in the rest of the word, replace every character other than letters, digits, dashes and dots with an underscore?
I tried this regex:
^(\d+)|[^\w-.]
However, it replaces all digits in the beginning with a single underscore character.
So, 34567fgf-kl.)*/676hh is converted to _fgf-kl.___676hh while I need every digit in the beginning to be replaced with one underscore character like _____fgf-kl.___676hh.
Is it possible to achieve using a regex?

You can do it like this with Matcher.appendReplacement used with Matcher.find:
String fileText = "34567fgf-kl.)*/676hh";
String pattern = "^\\d+|[^\\w.-]+";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(fileText);
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, repeat("_", m.group(0).length()));
}
m.appendTail(sb); // append the rest of the contents
System.out.println(sb);
And the repeat is
public static String repeat(String s, int n) {
if(s == null) {
return null;
}
final StringBuilder sb = new StringBuilder(s.length() * n);
for(int i = 0; i < n; i++) {
sb.append(s);
}
return sb.toString();
}
Also, repeat can be replaced with String repeated = StringUtils.repeat("_", m.group(0).length()); using Commons Lang StringUtils.repeat().
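On newer Java versions the helper is not needed at all. The sketch below assumes Java 11 or later, where String.repeat is available, together with the function-accepting Matcher.replaceAll from Java 9; the names UnderscoreRuns and sanitize are hypothetical.

```java
import java.util.regex.Pattern;

class UnderscoreRuns {
    // A run of leading digits, or a run of characters that are
    // not word characters, dots or dashes.
    private static final Pattern RUN = Pattern.compile("^\\d+|[^\\w.-]+");

    static String sanitize(String s) {
        // Replace each run with as many underscores as it has characters.
        return RUN.matcher(s).replaceAll(m -> "_".repeat(m.group().length()));
    }

    public static void main(String[] args) {
        System.out.println(sanitize("34567fgf-kl.)*/676hh"));
    }
}
```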

You can use a negative-lookbehind to individually match each leading digit, i.e. any digit that doesn't have a non-digit before it.
(?<!\D.{0,999})\d|[^\w-.]
Due to constraints in lookbehind, it cannot be unlimited. The above code can handle at most 999 leading digits.

You can also use replaceAll() with regex:
(^\d)|(?<=\d\G)\d|[^-\w.\n]
which means match:
(^\d) - digit on beginning of a line,
| - or
(?<=\d\G)\d - digit if it is preceded by previously matched digit,
| - or
[^-\w.\n] - any character that is not a dash, a word character (\w is [A-Za-z_0-9]), a dot or a newline (\n). As [^-\w.\n] is a rather broad category, you may want to exclude more characters or character groups from matching; it is enough to add them inside the brackets.
\n is added in case the string is multiline. For a single-line string, \n is redundant.
Example in Java:
public class Test {
public static void main(String[] args) {
String example = "34567fgf-kl.)*/676hh";
System.out.println(example.replaceAll("(^\\d)|(?<=\\d\\G)\\d|[^\\w.-]", "_"));
}
}
with output:
_____fgf-kl.___676hh

Regex required for space delimited strings java

I have an operation that deals with many space-delimited strings. I am looking for a regex for the String matches function that returns true if each of the first two space-separated substrings starts with a capital letter, and false otherwise.
Examples:
"AL_RIT_121 PA_YT_32 rit cell 22 pulse"
will return true, as the first two substrings AL_RIT_121 and PA_YT_32 start with the capital letters A and P respectively.
"AL_RIT_252 pa_YT_21 mal cell reg 32 1 ri"
will return false, as the p is lower case.

Pattern.compile("^\\p{Lu}\\S*\\s+\\p{Lu}")
will work with the .find() method. There's no reason to use matches on a prefix test, but if you have an external constraint, just do
Pattern.compile("^\\p{Lu}\\S*\\s+\\p{Lu}.*", Pattern.DOTALL)
To break this down:
^ matches the start of the string,
\\p{Lu} matches any upper-case letter,
\\S* matches zero or more non-space characters, including _
\\s+ matches one or more space characters, and
the second \\p{Lu} matches the upper-case letter starting the second word.
In the second variant, .* combined with Pattern.DOTALL matches the rest of the input.
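A minimal sketch of the find() variant, run against the two examples from the question. The class name PrefixCheck and the method name firstTwoWordsCapitalized are hypothetical.

```java
import java.util.regex.Pattern;

class PrefixCheck {
    // Compiled once: upper-case letter, rest of the first word,
    // whitespace, then an upper-case letter starting the second word.
    private static final Pattern TWO_CAPITALIZED =
            Pattern.compile("^\\p{Lu}\\S*\\s+\\p{Lu}");

    static boolean firstTwoWordsCapitalized(String s) {
        // find() is enough because the pattern is anchored with ^
        // and only describes the start of the input.
        return TWO_CAPITALIZED.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(firstTwoWordsCapitalized("AL_RIT_121 PA_YT_32 rit cell 22 pulse"));
        System.out.println(firstTwoWordsCapitalized("AL_RIT_252 pa_YT_21 mal cell reg 32 1 ri"));
    }
}
```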

Simply string.matches("[A-Z]\\w+ [A-Z].*")

You can use a specific regex if those two examples demonstrate your input format:
^(?:[A-Z]+_[A-Z]+_\d+\s*){2}
Which means:
^ - Match the beginning of the string
(?: - Start a non-capturing group (used to repeat the following)
[A-Z]+ - Match one or more uppercase characters
_ - Match an underscore
[A-Z]+ - Match one or more uppercase characters
_ - Match an underscore
\d+ - Match one or more digits (0-9)
\s* - Match zero or more whitespace characters
){2} - Repeat the above group exactly twice, once for each of the first two words
You would use it in Java like this:
Pattern pattern = Pattern.compile("^(?:[A-Z]+_[A-Z]+_\\d+\\s*){2}");
Matcher matcher = pattern.matcher(inputString);
// find() rather than matches(): the pattern only describes the start of the string
if (matcher.find()) {
System.out.println("Match found.");
}

Check this out:
public static void main(String[] args)
{
String text = "AL_RIT_121 pA_YT_32 rit cell 22 pulse";
boolean areFirstTwoWordsCapitalized = areFirstTwoWordsCapitalized(text);
System.out.println("areFirstTwoWordsCapitalized = <" + areFirstTwoWordsCapitalized + ">");
}
private static boolean areFirstTwoWordsCapitalized(String text)
{
boolean rslt = false;
String[] words = text.split("\\s");
int wordIndx = 0;
boolean frstWordCap = false;
boolean scndWordCap = false;
for(String word : words)
{
wordIndx++;
//System.out.println("word = <" + word + ">");
Pattern ptrn = Pattern.compile("^[A-Z].+");
Matcher mtchr = ptrn.matcher(word);
while(mtchr.find())
{
String match = mtchr.group();
//System.out.println("\tMatch = <" + match + ">");
if(wordIndx == 1)
{
frstWordCap = true;
}
else if(wordIndx == 2)
{
scndWordCap = true;
}
}
}
rslt = frstWordCap && scndWordCap;
return rslt;
}

Try this:
public class RegularExp
{
/**
* @param args
*/
public static void main(String[] args) {
String regex = "[A-Z][^\\s.]*\\s[A-Z].*";
String str = "APzsnnm lmn Dlld";
System.out.println(str.matches(regex));
}
}
