Basically, I want to split a string on known keywords, regardless of whether whitespace separates each keyword from its neighbours. Below is my current implementation, given the parameter String line = "sum:=5;":
private static String[] nextLineAsToken(String line) {
return line.split("\\s+(?=(:=|<|>|=))");
}
Expected:
String[] {"sum", ":=", "5;"};
Actual:
String[] {"sum:=5;"};
I have a feeling this isn't possible, but it would be great to hear from you guys.
Thanks.
Here is example code that splits your input into capture groups. Whitespace around := is ignored. Each group is then printed in a for loop:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Example {
public static void main(String[] args) {
final String regex = "(\\w*)\\s*(:=)\\s*(\\d*;)";
final String string = "sum:=5;";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
}
}
And this is the output:
Full match: sum:=5;
Group 1: sum
Group 2: :=
Group 3: 5;
Your main problem is that you coded \s+ instead of \s*, which requires at least one space at each split point instead of making spaces optional. The other problem is that your regex only splits before operators, not after them.
Use this regex:
\s*(?=(:=|<|>|(?<!:)=))|(?<=(=|<|>))\s*
Or as Java:
return line.split("\\s*(?=(:=|<|>|(?<!:)=))|(?<=(=|<|>))\\s*");
This uses a lookahead to split before operators and a lookbehind to split after operators.
\s* has been added to consume any spaces between terms.
Note also the negative lookbehind (?<!:) within the lookahead, which prevents splitting between : and =.
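For reference, here is the split-based answer wrapped in a small runnable class. The class name KeywordSplit and the sample inputs are illustrative and not from the question; the regex is exactly the one given above.

```java
import java.util.Arrays;

public class KeywordSplit {
    // Alternative 1: optional spaces, then a split point just before :=, <, > or a lone =.
    // Alternative 2: a split point just after =, < or >, then optional spaces.
    // (?<!:) stops "=" from counting as an operator when it is the second half of ":=".
    static final String OPERATOR_SPLIT =
            "\\s*(?=(:=|<|>|(?<!:)=))|(?<=(=|<|>))\\s*";

    static String[] nextLineAsToken(String line) {
        return line.split(OPERATOR_SPLIT);
    }

    public static void main(String[] args) {
        for (String line : new String[] {"sum:=5;", "x=1", "a<b"}) {
            // prints e.g. "sum:=5; -> [sum, :=, 5;]"
            System.out.println(line + " -> " + Arrays.toString(nextLineAsToken(line)));
        }
    }
}
```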
Related
I have a string 5x^3-2x^2+5x.
I want a regex that splits this string into
5x^3,
-2x^2,
5x
I tried "(-)|(\\+)",
but this did not work, because it drops the minus sign of negative terms.
You can split your string using this regex,
\+|(?=-)
This splits the string on + and consumes it, but when there is a - it splits just before it without consuming it, because (?=-) is a zero-width lookahead. The - therefore stays with the term that follows.
Check out this Java code,
String s = "5x^3-2x^2+5x";
System.out.println(Arrays.toString(s.split("\\+|(?=-)")));
This gives your expected output:
[5x^3, -2x^2, 5x]
Edit:
Although OP said in a comment on the post that there won't be negative powers, in case you do have them, you can use this regex, which handles negative powers as well:
\+|(?<!\^)(?=-)
Check this updated Java code,
List<String> list = Arrays.asList("5x^3-2x^2+5x", "5x^3-2x^-2+5x");
for (String s : list) {
System.out.println(s + " --> " +Arrays.toString(s.split("\\+|(?<!\\^)(?=-)")));
}
New output,
5x^3-2x^2+5x --> [5x^3, -2x^2, 5x]
5x^3-2x^-2+5x --> [5x^3, -2x^-2, 5x]
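A self-contained sketch of the negative-power variant, assuming Java 8 or later: from Java 8 on, String.split drops the empty leading string produced by a zero-width match at index 0, which matters for an input that starts with -. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class PolynomialTerms {
    // Split on "+" (consumed) or just before a "-" (kept, via zero-width lookahead),
    // unless that "-" directly follows "^", i.e. it belongs to a negative exponent.
    static final String TERM_SPLIT = "\\+|(?<!\\^)(?=-)";

    static String[] splitTerms(String polynomial) {
        return polynomial.split(TERM_SPLIT);
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("5x^3-2x^2+5x", "5x^3-2x^-2+5x", "-5x^3+2x-7");
        for (String s : inputs) {
            System.out.println(s + " --> " + Arrays.toString(splitTerms(s)));
        }
    }
}
```

The third input shows a leading negative term and a constant; both stay attached to their sign.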
Alternatively,
-?[^\r\n+-]+(?=[+-]|$)
or a similar expression might also work, in case your equations contain constants.
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegularExpression{
public static void main(String[] args){
final String regex = "-?[^\r\n+-]+(?=[+-]|$)";
final String string = "5x^3-2x^2+5x\n"
+ "5x^3-2x^2+5x-5\n"
+ "-5x^3-2x^2+5x+5";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
}
}
If you wish to simplify, modify, or explore the expression, you can paste it into regex101.com, which explains it in the top-right panel and lets you try it against sample inputs.
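In the program below, every single piece of the expression (coefficient, variable, ^, exponent) gets its own capture group, so you can debug it and combine the sub-patterns as you need. Note that the combined pattern only matches a three-term expression of exactly this shape, not arbitrary input.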
import java.util.regex.*;
class Main
{
public static void main(String[] args)
{
String txt="5x^3-2x^2+5x";
String re1="([-+]?\\d+)"; // coefficient of term 1 (sign optional, the first term may be unsigned)
String re2="((?:[a-z][a-z0-9_]*))"; // variable name 1
String re3="(\\^)"; // literal ^ 1
String re4="([-+]?\\d+)"; // exponent 1 (sign optional)
String re5="([-+]\\d+)"; // signed coefficient of term 2
String re6="((?:[a-z][a-z0-9_]*))"; // variable name 2
String re7="(\\^)"; // literal ^ 2
String re8="([-+]?\\d+)"; // exponent 2 (sign optional)
String re9="([-+]\\d+)"; // signed coefficient of term 3
String re10="((?:[a-z][a-z0-9_]*))"; // variable name 3
Pattern p = Pattern.compile(re1+re2+re3+re4+re5+re6+re7+re8+re9+re10,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(txt);
if (m.find())
{
// print every captured piece in parentheses: (5)(x)(^)(3)(-2)(x)(^)(2)(+5)(x)
StringBuilder out = new StringBuilder();
for (int i = 1; i <= m.groupCount(); i++)
{
out.append("(").append(m.group(i)).append(")");
}
System.out.println(out);
}
}
}
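Because the combined pattern above is fixed at three terms, a more general sketch is to describe a single term and call find() repeatedly. This assumes every term has an explicit integer coefficient and a one-letter variable; the class TermParser, the "coefficient|variable|exponent" output format and the default exponent of 1 are illustrative choices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TermParser {
    // One term: optional sign + integer coefficient, a single-letter variable,
    // and an optional "^exponent" (the exponent may itself be negative).
    private static final Pattern TERM =
            Pattern.compile("([-+]?\\d+)([a-z])(?:\\^(-?\\d+))?");

    // Returns each term as "coefficient|variable|exponent"; exponent is "1" when omitted.
    static List<String> parseTerms(String polynomial) {
        List<String> terms = new ArrayList<>();
        Matcher m = TERM.matcher(polynomial);
        while (m.find()) { // each find() resumes right after the previous term
            String exponent = m.group(3) == null ? "1" : m.group(3);
            terms.add(m.group(1) + "|" + m.group(2) + "|" + exponent);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(parseTerms("5x^3-2x^2+5x")); // [5|x|3, -2|x|2, +5|x|1]
    }
}
```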
I would like to find all the numbers that are joined by + and replace each such group with its sum.
Test string: 82+18-10.2+3+37=6 + 7
Here 82+18 can be added and replaced with the value 100.
The test string then becomes: 100-10.2+3+37=6 + 7
Again, 2+3+37 can be added and replaced in the test string as
follows: 100-10.42=6 + 7
Now 6 + 7 is not added, because there is a space after the value
6.
My idea was to extract the numbers that are supposed to be added, like this:
82+18
2+3+37
and then add them and replace each group using the string's replace() method.
Tried Regex:
(?=([0-9]{1,}[\\+]{1}[0-9]{1,}))
Sample Input:
82+18-10.2+3+37=6 + 7
Java Code for identifying the groups to be added and replaced:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class ReplaceAddition {
static String regex = "(?=([0-9]{1,}[\\+]{1}[0-9]{1,}))";
static String testStr = "82+18-10.2+3+37=6 + 7 ";
public static void main(String[] args) {
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(testStr);
while (matcher.find()) {
System.out.println(matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
}
}
}
}
Output:
82+18
2+18
2+3
3+37
I can't see what I'm missing. Help would be appreciated.
I tried simplifying the regexp by removing the positive lookahead operator (?=...) and the enclosing parentheses (...).
After these changes, the regexp is as follows
static String regex = "[0-9]{1,}[\\+]{1}[0-9]{1,}";
When I run it, I'm getting the following result:
82+18
2+3
This is closer to the expected result, but still not perfect, because we get "2+3" instead of "2+3+37". To handle any number of added numbers instead of just two, the expression can be tuned further to:
static String regex = "[0-9]{1,}(?:[\\+]{1}[0-9]{1,})+";
What I added here is a non-capturing group (?:...) followed by a plus sign, meaning one or more repetitions. Now the program produces the output
82+18
2+3+37
as expected.
Another solution is like so:
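import java.util.regex.Pattern;
public class ReplaceAddition {
public static void main(String[] args)
{
// Requires Java 10+ (uses var). Matches runs of integers joined by '+', e.g. 82+18 or 2+3+37
final var p = Pattern.compile("(?:\\d+(?:\\+\\d+)+)");
var text = new StringBuilder("82+18-10.2+3+37=6 + 7 ");
var m = p.matcher(text);
while(m.find())
{
var sum = 0;
var split = m.group(0).split("\\+");
for(var str : split)
{
sum += Integer.parseInt(str);
}
// Replace the run with its sum, then reset so the matcher rescans the edited text
text.replace(m.start(0),m.end(0),""+sum);
m.reset();
}
System.out.println(text); // 100-10.42=6 + 7
}
}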
The regex (?:\\d+(?:\\+\\d+)+) finds:
(?: Noncapturing group
\\d+ One or more digits, followed by
(?: Noncapturing group
\\+ A plus symbol, and
\\d+ One or more digits
)+ Repeated one or more times
) Once
So, this regex matches two or more numbers separated by '+'.
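The reset() loop above rescans the text from the beginning after every replacement. A single-pass alternative, sketched below under the assumption that every summand and sum fits in an int, uses Matcher.appendReplacement and appendTail with the same pattern. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SumRuns {
    // A run of two or more integers joined by "+", e.g. "82+18" or "2+3+37".
    private static final Pattern PLUS_RUN = Pattern.compile("\\d+(?:\\+\\d+)+");

    static String replaceSums(String text) {
        Matcher m = PLUS_RUN.matcher(text);
        StringBuffer out = new StringBuffer(); // StringBuffer overload works on every Java version
        while (m.find()) {
            int sum = 0;
            for (String part : m.group().split("\\+")) {
                sum += Integer.parseInt(part);
            }
            // Copies the text before this match, then the sum in place of the match.
            m.appendReplacement(out, Integer.toString(sum));
        }
        m.appendTail(out); // copies whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceSums("82+18-10.2+3+37=6 + 7 ")); // 100-10.42=6 + 7
    }
}
```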
String regex = "^;[A-Z0-9]{5};[\\d]{1,};[\\d]{1,}.[\\d]{1,}";
String str = ";ABC12;10;250.3";
System.out.println(str.matches(regex));
The above regex works fine.
Consider the following strings:
str1 = ";ABC12;10;250.3"
str2 = ";ABB62;5;2.3"
str3 = ";ABF02;8;25120.3"
str4 = ";AKC12;11;2504.303"
Now I have them joined into one comma-separated string: strToMatch = str1,str2,str3,str4
How do I change the regex above in order to match this string?
Note: there can be any number of comma-separated values in the string, and I also need to make sure that strToMatch does not end with a comma.
You can group the record pattern with round brackets and repeat it, with each repetition preceded by a comma:
String regex = "^;[A-Z0-9]{5};\\d+;\\d+\\.\\d+(,;[A-Z0-9]{5};\\d+;\\d+\\.\\d+)*$";
Try this pattern instead: (;[A-Z0-9]{5};[\\d]{1,};[\\d]{1,}\\.[\\d]{1,},?)+
This differs from your pattern in two ways. First, I use \\. to require a literal ., because an unescaped dot means "any character" in regex.
Second, I use the grouping brackets (...) and a trailing + to say "match this one or more times". Because the comma is optional after the last record, I added a ?. Note that this also accepts a trailing comma.
If you want to get single matches to process using a Matcher later on, a simple modification should do the trick: (;[A-Z0-9]{5};[\\d]{1,};[\\d]{1,}\\.[\\d]{1,}),?
The + is gone and the ,? is outside the grouping brackets, because those are now capturing brackets (as well).
Example:
final Pattern pattern = Pattern.compile("(;[A-Z0-9]{5};[\\d]{1,};[\\d]{1,}\\.[\\d]{1,}),?");
final Matcher matcher = pattern.matcher(";ABC12;10;250.3,;ABB62;5;2.3,;ABF02;8;25120.3,;AKC12;11;2504.303");
while (matcher.find()) {
System.out.println("Whole match: " + matcher.group());
for (int i = 1; i <= matcher.groupCount(); ++i) {
System.out.println("Group #" + i + ": " + matcher.group(i));
}
}
I found the following way of solving the problem:
String strToMatch = ";ABC12;10;250.3,;ABB62;5;2.3,;ABF02;8;25120.3,;AKC12;11;2504.303";
if(strToMatch.endsWith(",") || strToMatch.startsWith(","))
return false;
else{
String[] str = strToMatch.split(",");
String regex = ";[A-Z0-9]{5};\\d+;\\d+\\.\\d+";
for (String s : str){
if(!s.matches(regex)) // every comma-separated value must match the record pattern
return false;
}
return true;
}
Any simpler way than this?
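A possibly simpler alternative to splitting is a single full-string pattern: one record followed by zero or more ",record" groups. Because Matcher.matches() must consume the whole input, a leading or trailing comma fails automatically. This is a sketch; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class RecordListCheck {
    // One record, e.g. ";ABC12;10;250.3" (the dot is escaped, so it means a literal ".").
    private static final String RECORD = ";[A-Z0-9]{5};\\d+;\\d+\\.\\d+";
    // One record, then zero or more ",record" groups; no room for a stray comma.
    private static final Pattern RECORD_LIST = Pattern.compile(RECORD + "(?:," + RECORD + ")*");

    static boolean isValid(String strToMatch) {
        return RECORD_LIST.matcher(strToMatch).matches(); // whole input must match
    }

    public static void main(String[] args) {
        System.out.println(isValid(";ABC12;10;250.3,;ABB62;5;2.3,;ABF02;8;25120.3,;AKC12;11;2504.303")); // true
        System.out.println(isValid(";ABC12;10;250.3,")); // false: trailing comma
    }
}
```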
I need to split a string on commas, but if part of the string is enclosed in double quotes ("), the commas between the opening and closing quote must not split it.
Can anyone please help me solve this with a regex using lookaround?
Resurrecting this question because it had a simple regex solution that wasn't mentioned. This situation sounds very similar to "regex-match a pattern unless...".
\"[^\"]*\"|(,)
The left side of the alternation matches complete double-quoted strings. We will ignore these matches. The right side matches and captures commas to Group 1, and we know they are the right ones because they were not matched by the expression on the left.
Here is working code:
import java.util.regex.*;
import java.util.List;
class Program {
public static void main (String[] args) {
String subject = "\"Messages,Hello\",World,Hobbies,Java\",Programming\"";
Pattern regex = Pattern.compile("\"[^\"]*\"|(,)");
Matcher m = regex.matcher(subject);
StringBuffer b = new StringBuffer();
while (m.find()) {
if(m.group(1) != null) m.appendReplacement(b, "SplitHere");
else m.appendReplacement(b, Matcher.quoteReplacement(m.group(0))); // quote so $ or \ in the text is copied literally
}
m.appendTail(b);
String replaced = b.toString();
String[] splits = replaced.split("SplitHere");
for (String split : splits)
System.out.println(split);
} // end main
} // end Program
Reference
How to match pattern except in situations s1, s2, s3
Please try this:
(?<!\G\s*"[^"]*),
As a Java string literal, it is:
String regex = "(?<!\\G\\s*\"[^\"]*),";
But two things are not clear:
Does the " only appear right after a comma, or can it start in the middle of a value, such as AAA, BB"CC,DD"? The regex above only handles a quote that starts right after a comma.
If a value contains " itself, how is it escaped: with "" or \"? The regex above does not handle any escaped-quote format.
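Since the question asks specifically for lookaround, another commonly used approach is to split only on commas that are followed by an even number of double quotes up to the end of the string. This sketch assumes the quotes are balanced and never escaped (the same open questions raised above), and it rescans the rest of the string for every comma, so it is quadratic on long input. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class QuotedCsvSplit {
    // Split on a comma only if an even number of double quotes follows it up to the
    // end of the input, i.e. the comma is outside every quoted section.
    // Assumes balanced, unescaped quotes.
    static final String COMMA_OUTSIDE_QUOTES = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static String[] splitOutsideQuotes(String s) {
        return s.split(COMMA_OUTSIDE_QUOTES);
    }

    public static void main(String[] args) {
        String subject = "\"Messages,Hello\",World,Hobbies,Java\",Programming\"";
        System.out.println(Arrays.toString(splitOutsideQuotes(subject)));
    }
}
```

On the subject string from the first answer this yields the same four parts: "Messages,Hello", World, Hobbies and Java",Programming".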
I'm currently trying to solve a problem from codingbat.com with regular expressions.
I'm new to this, so step-by-step explanations would be appreciated. I could solve this with String methods relatively easily, but I am trying to use regular expressions.
Here is the prompt:
Given a string and a non-empty word string, return a string made of each char just before and just after every appearance of the word in the string. Ignore cases where there is no char before or after the word, and a char may be included twice if it is between two words.
wordEnds("abcXY123XYijk", "XY") → "c13i"
wordEnds("XY123XY", "XY") → "13"
wordEnds("XY1XY", "XY") → "11"
etc
My code thus far:
String regex = ".?" + word+ ".?";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);
String newStr = "";
while(m.find())
newStr += m.group().replace(word, "");
return newStr;
The problem is that when two occurrences of word are separated by a single character, the program misses that character as the "before" character of the second occurrence, because m.find() has already consumed it.
For example, wordEnds("abc1xyz1i1j", "1") should return "cxziij", but my method returns "cxzij", without repeating the "i".
I would appreciate a non-messy solution with an explanation I can apply to other general regex problems.
This is a one-liner solution:
String wordEnds = input.replaceAll(".*?(.)" + word + "(?:(?=(.)" + word + ")|(.).*?(?=$|." + word + "))", "$1$2$3");
This matches your edge case as a look ahead within a non-capturing group, then matches the usual (consuming) case.
Note that your requirements don't call for iteration; only your question title assumes it is necessary.
Note also that to be absolutely safe, you should escape all characters in word in case any of them are special "regex" characters, so if you can't guarantee that, you need to use Pattern.quote(word) instead of word.
Here's a test of the usual case and the edge case, showing it works:
public static String wordEnds(String input, String word) {
word = Pattern.quote(word); // add this line to be 100% safe
return input.replaceAll(".*?(.)" + word + "(?:(?=(.)" + word + ")|(.).*?(?=$|." + word + "))", "$1$2$3");
}
public static void main(String[] args) {
System.out.println(wordEnds("abcXY123XYijk", "XY"));
System.out.println(wordEnds("abc1xyz1i1j", "1"));
}
Output:
c13i
cxziij
Use a positive lookbehind and a positive lookahead, which are zero-width assertions:
(?<=(.)|^)1(?=(.)|$)
(?<=(.)|^) looks for a character just before 1 and captures it in group 1 (or matches the start of the input). It is a zero-width assertion: it does not move the match forward, it only tests, and that lets us capture the value.
1 matches the word itself; you can replace it with any word.
(?=(.)|$) looks for a character just after 1 and captures it in group 2 (or matches the end of the input).
Groups 1 and 2 contain your values; keep calling find() until the end of the input.
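So the code looks like this:
String s1 = "abcXY123XYiXYjk";
String s2 = java.util.regex.Pattern.quote("XY");
String s3 = "";
String r = "(?<=(.)|^)"+s2+"(?=(.)|$)";
Pattern p = Pattern.compile(r);
Matcher m = p.matcher(s1);
// group 1 or 2 is null when the word is at the very start or end, so map null to ""
while(m.find()) s3 += java.util.Objects.toString(m.group(1), "") + java.util.Objects.toString(m.group(2), "");
//s3 now contains c13iij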
Use a regex as follows, where a is the input string and b is the word:
String c = "";
Matcher m = Pattern.compile("(.|)" + Pattern.quote(b) + "(?=(.?))").matcher(a);
while (m.find()) c += m.group(1) + m.group(2);
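Here is the lookbehind/lookahead approach packaged as a complete wordEnds method and run against the prompt's examples. Unlike the replaceAll one-liner, it also handles a word at the very start or end of the string, such as "XY123XY" (the one-liner needs a character both before and after some occurrence, so it leaves that input unchanged). Mapping null groups to "" with Objects.toString is an assumption of this sketch; the class name is illustrative.

```java
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordEnds {
    static String wordEnds(String str, String word) {
        // The lookbehind captures the char before the word (group 1) and the lookahead
        // captures the char after it (group 2). Neither consumes input, so a char sitting
        // between two occurrences is reported for both. The ^ and $ branches let a word at
        // the very start or end still match; the missing group is then null.
        Pattern p = Pattern.compile("(?<=(.)|^)" + Pattern.quote(word) + "(?=(.)|$)");
        Matcher m = p.matcher(str);
        StringBuilder result = new StringBuilder();
        while (m.find()) {
            result.append(Objects.toString(m.group(1), ""))
                  .append(Objects.toString(m.group(2), ""));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(wordEnds("abcXY123XYijk", "XY")); // c13i
        System.out.println(wordEnds("XY123XY", "XY"));       // 13
        System.out.println(wordEnds("XY1XY", "XY"));         // 11
        System.out.println(wordEnds("abc1xyz1i1j", "1"));    // cxziij
    }
}
```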