What RegEx separates terms of Polynomial - java

I have a String 5x^3-2x^2+5x
I want a regex which splits this string as
5x^3,
-2x^2,
5x
I tried "(-)|(\\+)",
but this did not work, because it did not take the terms with a negative sign into account.

You can split your string using this regex,
\+|(?=-)
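It works like this: the first alternative matches and consumes a + character, so the + is removed from the result. The second alternative, (?=-), is a zero-width lookahead that matches just before a -, so the string is split there but the - is not consumed and stays attached to the following term.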
Check out this Java code,
String s = "5x^3-2x^2+5x";
System.out.println(Arrays.toString(s.split("\\+|(?=-)")));
Gives your expected output below,
[5x^3, -2x^2, 5x]
Edit:
Although the OP said in a comment on the question that there won't be negative powers, in case you do have negative powers you can use this regex, which does not split on the minus sign of an exponent:
\+|(?<!\^)(?=-)
Check this updated Java code,
List<String> list = Arrays.asList("5x^3-2x^2+5x", "5x^3-2x^-2+5x");
for (String s : list) {
System.out.println(s + " --> " +Arrays.toString(s.split("\\+|(?<!\\^)(?=-)")));
}
New output,
5x^3-2x^2+5x --> [5x^3, -2x^2, 5x]
5x^3-2x^-2+5x --> [5x^3, -2x^-2, 5x]
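As a quick check of the lookbehind variant, the sketch below wraps the split in a helper and runs it on a polynomial that starts with a negative term. The names TermSplitter and splitTerms are made up for this illustration, and it assumes Java 8 or later, where a zero-width match at the very start of the input does not produce a leading empty string.

```java
import java.util.Arrays;

class TermSplitter {
    // Split on '+' (consumed) or just before a '-' that is not an exponent sign (kept).
    static String[] splitTerms(String polynomial) {
        return polynomial.split("\\+|(?<!\\^)(?=-)");
    }

    public static void main(String[] args) {
        // The leading '-' stays attached to the first term.
        System.out.println(Arrays.toString(splitTerms("-5x^3-2x^-2+5x")));
        // prints [-5x^3, -2x^-2, 5x]
    }
}
```

On older Java versions the same input may yield an extra empty first element, so filter it out there if needed.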

Maybe,
-?[^\r\n+-]+(?=[+-]|$)
or a similar expression would work too, in case the equations also contain constant terms.
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegularExpression{
public static void main(String[] args){
final String regex = "-?[^\r\n+-]+(?=[+-]|$)";
final String string = "5x^3-2x^2+5x\n"
+ "5x^3-2x^2+5x-5\n"
+ "-5x^3-2x^2+5x+5";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
}
}
If you wish to simplify, modify, or explore the expression, regex101.com explains it in the panel at the top right, and you can also see there how it matches against some sample inputs.

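The program below captures every piece of the polynomial (coefficient, variable, caret, exponent) in its own group, so you can debug it and combine the sub-patterns as you need. As written it is hard-wired to exactly three terms, the last of which has no exponent, so adapt it for other inputs.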
import java.util.regex.*;
class Main
{
public static void main(String[] args)
{
String txt="5x^3-2x^2+5x";
String re1="([-+]?\\d+)"; // Coefficient 1 (sign optional on the leading term)
String re2="((?:[a-z][a-z0-9_]*))"; // Variable Name 1
String re3="(\\^)"; // Caret 1
String re4="([-+]?\\d+)"; // Exponent 1 (sign optional)
String re5="([-+]\\d+)"; // Signed Coefficient 2
String re6="((?:[a-z][a-z0-9_]*))"; // Variable Name 2
String re7="(\\^)"; // Caret 2
String re8="([-+]?\\d+)"; // Exponent 2 (sign optional)
String re9="([-+]\\d+)"; // Signed Coefficient 3
String re10="((?:[a-z][a-z0-9_]*))"; // Variable Name 3
Pattern p = Pattern.compile(re1+re2+re3+re4+re5+re6+re7+re8+re9+re10,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(txt);
if (m.find())
{
String int1=m.group(1);
String var1=m.group(2);
String c1=m.group(3);
String int2=m.group(4);
String signed_int1=m.group(5);
String var2=m.group(6);
String c2=m.group(7);
String int3=m.group(8);
String signed_int2=m.group(9);
String var3=m.group(10);
System.out.print("("+int1.toString()+")"+"("+var1.toString()+")"+"("+c1.toString()+")"+"("+int2.toString()+")"+"("+signed_int1.toString()+")"+"("+var2.toString()+")"+"("+c2.toString()+")"+"("+int3.toString()+")"+"("+signed_int2.toString()+")"+"("+var3.toString()+")"+"\n");
}
}
}

Related

How to detect keyword in String without spaces?

Basically my desired outcome is to split a string based on known keywords, regardless of whether whitespace separates the keywords. Below is an example of my current implementation; expect the param String line = "sum:=5;":
private static String[] nextLineAsToken(String line) {
return line.split("\\s+(?=(:=|<|>|=))");
}
Expected:
String[] {"sum", ":=", "5;"};
Actual:
String[] {"sum:=5;"};
I have a feeling this isn't possible, but it would be great to hear from you guys.
Thanks.
Here is example code that you can use to split your input into groups. Whitespace characters such as regular spaces are ignored. The groups are then printed in the for loop:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Example {
public static void main(String[] args) {
final String regex = "(\\w*)\\s*(:=)\\s*(\\d*;)";
final String string = "sum:=5;";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
}
}
And this is the output:
Full match: sum:=5;
Group 1: sum
Group 2: :=
Group 3: 5;
Your main problem is you coded \s+ instead of \s*, which required there to be spaces to split, instead of spaces being optional. The other problem is your regex only splits before operators.
Use this regex:
\s*(?=(:=|<|>|(?<!:)=))|(?<=(=|<|>))\s*
Or as Java:
return line.split("\\s*(?=(:=|<|>|(?<!:)=))|(?<=(=|<|>))\\s*");
Which uses a look ahead to split before operators and a look behind to split after operators.
\s* has been added to consume any spaces between terms.
Note also the negative look behind (?<!:) within the look ahead to prevent splitting between : and =.
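To see the split in action, here is a minimal sketch that applies the regex above to the input from the question and to one input with a single-character operator. The class and method names (OperatorSplitDemo, tokens) are invented for the example; only inputs without spaces around the operator are shown.

```java
import java.util.Arrays;

class OperatorSplitDemo {
    // Regex from the answer above: split before an operator and after an operator.
    static final String OPERATOR_SPLIT = "\\s*(?=(:=|<|>|(?<!:)=))|(?<=(=|<|>))\\s*";

    static String[] tokens(String line) {
        return line.split(OPERATOR_SPLIT);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokens("sum:=5;"))); // [sum, :=, 5;]
        System.out.println(Arrays.toString(tokens("a<b")));     // [a, <, b]
    }
}
```

Because both alternatives can match with zero width, the operator itself is never consumed and ends up as its own array element.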

Java regex must match at beginning or end of String

I'm writing a program that takes two Strings as input and searches the second one to see whether the first one is present. To return true, the first String has to be at the beginning or end of a word inside the second String. It cannot be in the middle of a word in the second String.
Example 1 (must return false):
String s1 = "press";
String s2 = "Regular expressions is hard to read";
Example 2 (must return true):
String s1 = "ONE";
String s2 = "ponep,onep!";
Example 3 (must return true):
String s1 = "ho";
String s2 = "Wow! How awesome is that!";
Here is my code, it returns false instead of true in the third example:
public static void main(String[] args) {
Scanner scanner = new Scanner(System.in);
String part = scanner.nextLine();
String line = scanner.nextLine();
Pattern pattern = Pattern.compile("((.+\\s+)*|(.+,+)*"+part+"\\w.*)"+"|"+"(.+"+part+"(\\s+.+)*)",Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(line);
System.out.println(matcher.matches());
}
please help
Check out the word boundary matcher, \b. It is a zero-length matcher that only matches at the boundary of a word (a position between a word character \w and a non-word character \W, or at the start or end of the input next to a word character).
Your regex is then essentially \bkeyword|keyword\b: the keyword either at the beginning or at the end of a word.
boolean check(String s1, String s2) {
Pattern pattern = Pattern.compile("\\b" + Pattern.quote(s1) + "|" + Pattern.quote(s1) + "\\b", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(s2);
return matcher.find();
}
The key points: I've added Pattern.quote(s1) to ensure that if the first string is something like ab|c, it matches those 4 characters literally rather than being interpreted as a regex. I've also switched the check at the end to matcher.find(), so we can write a simpler regex, since the only concern is whether a matching substring exists.
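The three examples from the question can be run against this approach as follows. This is only a small harness around the same \b pattern; the class name WordEdgeCheck is an assumption made for the example.

```java
import java.util.regex.Pattern;

class WordEdgeCheck {
    // True when s1 occurs at the start or the end of some word in s2 (case-insensitive).
    static boolean check(String s1, String s2) {
        String literal = Pattern.quote(s1);
        Pattern pattern = Pattern.compile("\\b" + literal + "|" + literal + "\\b",
                Pattern.CASE_INSENSITIVE);
        return pattern.matcher(s2).find();
    }

    public static void main(String[] args) {
        System.out.println(check("press", "Regular expressions is hard to read")); // false
        System.out.println(check("ONE", "ponep,onep!"));                            // true
        System.out.println(check("ho", "Wow! How awesome is that!"));               // true
    }
}
```

In the first example "press" only occurs inside "expressions", with word characters on both sides, so neither alternative finds a boundary.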
My suggestion would be:
1. Split the second string on the specified delimiters (space or comma, if that's your case).
2. Create regexps that match the specified word at either the beginning or the end of a word.
3. Map the split words through the regexps to get a list of boolean results.
4. Return true if the list contains any true.
Sample code:
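import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
class Test {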
public static void main(String[] args) {
String first = "ho";
String second = "Wow! How awesome is that!";
String[] words = second.split("\\s|,");
List<Boolean> results = Arrays.stream(words)
.map(String::toLowerCase)
.map(word -> match(first.toLowerCase(), word)).collect(Collectors.toList());
System.out.println(results);
System.out.println(results.contains(true));
}
private static boolean match(String patternWord, String matchedWord) {
Pattern patten1 = Pattern.compile("^" + patternWord + "\\S*");
Matcher matcher1 = patten1.matcher(matchedWord);
Pattern pattern2 = Pattern.compile("\\S*" + patternWord + "$");
Matcher matcher2 = pattern2.matcher(matchedWord);
return matcher1.matches() || matcher2.matches();
}
}

Add all the numbers which have + symbol and replace the same with the added value

I would like to group all the numbers to add if they are supposed to be added.
Test String: 82+18-10.2+3+37=6 + 7
Here 82+18 can be added and replaced with the value 100.
Then the test string becomes: 100-10.2+3+37=6 + 7
Again, 2+3+37 can be added and replaced in the test string as
follows: 100-10.42=6 + 7
Now 6 + 7 cannot be added, because there is a space after the value
6.
My idea was to extract the numbers which are supposed to be added like below:
82+18
2+3+37
And then add it and replace the same using the replace() method in string
Tried Regex:
(?=([0-9]{1,}[\\+]{1}[0-9]{1,}))
Sample Input:
82+18-10.2+3+37=6 + 7
Java Code for identifying the groups to be added and replaced:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class ReplaceAddition {
static String regex = "(?=([0-9]{1,}[\\+]{1}[0-9]{1,}))";
static String testStr = "82+18-10.2+3+37=6 + 7 ";
public static void main(String[] args) {
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(testStr);
while (matcher.find()) {
System.out.println(matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
}
}
}
}
Output:
82+18
2+18
2+3
3+37
Couldn't understand where I'm missing. Help would be appreciated...
I tried simplifying the regexp by removing the positive lookahead operator
(?=...)
And the enclosing parenthesis
(...)
After these changes, the regexp is as follows
static String regex = "[0-9]{1,}[\\+]{1}[0-9]{1,}";
When I run it, I'm getting the following result:
82+18
2+3
This is closer to the expected, but still not perfect, because we're getting "2+3" instead of 2+3+37. In order to handle any number of added numbers instead of just two, the expression can be further tuned up to:
static String regex = "[0-9]{1,}(?:[\\+]{1}[0-9]{1,})+";
What I added here is a non-capturing group
(?:...)
with a plus sign meaning one or more repetition. Now the program produces the output
82+18
2+3+37
as expected.
Another solution is like so:
public static void main(String[] args)
{
final var p = Pattern.compile("(?:\\d+(?:\\+\\d+)+)");
var text = new StringBuilder("82+18-10.2+3+37=6 + 7 ");
var m = p.matcher(text);
while(m.find())
{
var sum = 0;
var split = m.group(0).split("\\+");
for(var str : split)
{
sum += Integer.parseInt(str);
}
text.replace(m.start(0),m.end(0),""+sum);
m.reset();
}
System.out.println(text);
}
The regex (?:\\d+(?:\\+\\d+)+) finds:
(?: Noncapturing group
\\d+ One or more digits, followed by
(?: Noncapturing group
\\+ A plus symbol, and
\\d+ One or more digits
)+ One or more times
) Once
So this regex matches a run of two or more integers separated by '+'.
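A variation on the same idea, sketched under the assumption that Java 9 or later is available: Matcher.replaceAll accepts a function, so each match can be replaced by its own sum in a single pass, without editing the text and resetting the matcher. The names AdditionFolder and foldAdditions are made up for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class AdditionFolder {
    // Integers joined by '+' with no spaces, e.g. "82+18" or "2+3+37".
    private static final Pattern ADDITION = Pattern.compile("\\d+(?:\\+\\d+)+");

    static String foldAdditions(String text) {
        // Each match is replaced by the sum of its own operands.
        return ADDITION.matcher(text).replaceAll(match ->
                String.valueOf(Arrays.stream(match.group().split("\\+"))
                        .mapToInt(Integer::parseInt)
                        .sum()));
    }

    public static void main(String[] args) {
        System.out.println(foldAdditions("82+18-10.2+3+37=6 + 7")); // 100-10.42=6 + 7
    }
}
```

As in the question, 6 + 7 is left alone because of the spaces around the plus sign.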

Java regex input from txt file

I have a text file that includes some mathematical expressions.
I need to parse the text into components (words, sentences, punctuation, numbers and arithmetic signs) using regular expressions, evaluate the mathematical expressions, and return the text in its original form with the expressions replaced by their calculated values.
I have done this without regular expressions (and without the calculation). Now I am trying to do it using regular expressions.
I do not fully understand how to do this correctly. The input text is like this:
Pete like mathematic 5+3 and jesica too sin(3).
In the output I need:
Pete like mathematic 8 and jesica too 0,14.
I need some advice with regex and calculation from people who know how to do this.
My code:
final static Pattern PUNCTUATION = Pattern.compile("([\\s.,!?;:]){1,}");
final static Pattern LETTER = Pattern.compile("([а-яА-Яa-zA-Z&&[^sin]]){1,}");
List<Sentence> sentences = new ArrayList<Sentence>();
List<PartOfSentence> parts = new ArrayList<PartOfSentence>();
StringTokenizer st = new StringTokenizer(text, " \t\n\r:;.!?,/\\|\"\'",
true);
The code with regex (not working):
while (st.hasMoreTokens()) {
String s = st.nextToken().trim();
int size = s.length();
for (int i=0; i<s.length();i++){
//with regex. not working variant
Matcher m = LETTER.matcher(s);
if (m.matches()){
parts.add(new Word(s.toCharArray()));
}
m = PUNCTUATION.matcher(s);
if (m.matches()){
parts.add(new Punctuation(s.charAt(0)));
}
Sentence buf = new Sentence(parts);
if (buf.getWords().size() != 0) {
sentences.add(buf);
parts = new ArrayList<PartOfSentence>();
} else
parts.add(new Punctuation(s.charAt(0)));
Without regex (working):
if (size < 1)
continue;
if (size == 1) {
switch (s.charAt(0)) {
case ' ':
continue;
case ',':
case ';':
case ':':
case '\'':
case '\"':
parts.add(new Punctuation(s.charAt(0)));
break;
case '.':
case '?':
case '!':
parts.add(new Punctuation(s.charAt(0)));
Sentence buf = new Sentence(parts);
if (buf.getWords().size() != 0) {
sentences.add(buf);
parts = new ArrayList<PartOfSentence>();
} else
parts.add(new Punctuation(s.charAt(0)));
break;
default:
parts.add(new Word(s.toCharArray()));
}
} else {
parts.add(new Word(s.toCharArray()));
}
}
This is not a trivial problem to solve as even matching numbers can become quite involved.
Firstly, a number can be matched by the regular expression "(\\d*(\\.\\d*)?\\d(e\\d+)?)" to account for decimal places and exponent formats.
Secondly, there are (at least) three types of expressions that you want to solve: binary, unary and functions. For each one, we create a pattern to match in the solve method.
Thirdly, there are numerous libraries that can implement the reduce method; the code below uses exp4j.
The implementation below does not handle compound or nested expressions, e.g. sin(5) + cos(3), or spaces in expressions.
private static final String NUM = "(\\d*(\\.\\d*)?\\d(e\\d+)?)";
public String solve(String expr) {
expr = solve(expr, "(" + NUM + "(!|\\+\\+|--))"); //unary operators
expr = solve(expr, "(" + NUM + "([-+/*]" + NUM + ")+)"); // binary operators; '-' comes first in the class so it is a literal, not a range
expr = solve(expr, "((sin|cos|tan)\\(" + NUM + "\\))"); // functions
return expr;
}
private String solve(String expr, String pattern) {
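Matcher m = Pattern.compile(pattern).matcher(expr);
StringBuffer sb = new StringBuffer();
// assume a reduce method :String -> String that solves expressions;
// each match is replaced by its own result
while(m.find()){
m.appendReplacement(sb, Matcher.quoteReplacement(reduce(m.group())));
}
m.appendTail(sb);
return sb.toString();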
}
//evaluate expression using exp4j, format to 2 decimal places,
//remove trailing 0s and dangling decimal point
private String reduce(String expr){
double res = new ExpressionBuilder(expr).build().evaluate();
return String.format("%.2f",res).replaceAll("0*$", "").replaceAll("\\.$", "");
}
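For readers who want to try the idea without an expression library, the following is a minimal sketch that uses only the JDK. It is not the exp4j-based code above: it assumes Java 9+ for Matcher.replaceAll with a function, handles only integer additions of two operands and sin of a non-negative number in radians, and the names InlineMath and evaluate are invented for the example. It formats with Locale.ROOT, so the result uses a decimal point (0.14) rather than the comma shown in the question.

```java
import java.util.Locale;
import java.util.regex.Pattern;

class InlineMath {
    private static final Pattern SUM = Pattern.compile("(\\d+)\\+(\\d+)");
    private static final Pattern SIN = Pattern.compile("sin\\((\\d+(?:\\.\\d+)?)\\)");

    static String evaluate(String text) {
        // Integer additions such as "5+3" become their sum.
        String result = SUM.matcher(text).replaceAll(m ->
                String.valueOf(Integer.parseInt(m.group(1)) + Integer.parseInt(m.group(2))));
        // sin(x) calls (x in radians) become a value rounded to two decimals.
        return SIN.matcher(result).replaceAll(m ->
                String.format(Locale.ROOT, "%.2f", Math.sin(Double.parseDouble(m.group(1)))));
    }

    public static void main(String[] args) {
        System.out.println(evaluate("Pete like mathematic 5+3 and jesica too sin(3)."));
        // prints: Pete like mathematic 8 and jesica too 0.14.
    }
}
```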
I think you could start by looking for "function" matches in your input String. Anything that does not match a function is simply returned unchanged.
For example, this short code does, I hope, what you are after:
The class with the main method:
import java.util.ArrayList;
import java.util.StringTokenizer;
public class App {
StringTokenizer st = new StringTokenizer("Pete likes Mathematics 3+3 and Jessica too 6+3.", " \t\n\r:;.!?,/\\|\"\'", true);
public static void main(String[] args) {
new App();
}
public App(){
ArrayList<String> renderedStrings = new ArrayList<String>();
while(st.hasMoreTokens()){
String s = st.nextToken();
if(!AdditionPatternFuntion.render(s, renderedStrings)){
renderedStrings.add(s);
}
}
for(String s : renderedStrings){
System.out.print(s);
}
}
}
The class AdditionPatternFuntion, which does the real job:
import java.util.ArrayList;
import java.util.StringTokenizer;
import java.util.regex.Pattern;
class AdditionPatternFuntion{
public static boolean render(String s, ArrayList<String> renderedStrings){
// one or more digits on each side of '+', so multi-digit operands also match
Pattern pattern = Pattern.compile("(\\d+\\+\\d+)");
boolean match = pattern.matcher(s).matches();
if(match){
StringTokenizer additionTokenier = new StringTokenizer(s, "+", false);
// Integer.parseInt replaces the deprecated new Integer(String) constructor
int firstOperand = Integer.parseInt(additionTokenier.nextToken());
int secondOperand = Integer.parseInt(additionTokenier.nextToken());
renderedStrings.add(String.valueOf(firstOperand + secondOperand));
}
return match;
}
}
When I run it with this input:
Pete likes Mathematics 3+3 and Jessica too 6+3.
I get this output:
Pete likes Mathematics 6 and Jessica too 9.
To handle the sin() function you can do the same: create a new class, SinPatternFunction for instance, and have it follow the same approach.
I think you should even create an abstract class FunctionPattern with an abstract method render, which the AdditionPatternFunction and SinPatternFunction classes then implement.
Finally, you could create a class, let's call it PatternFunctionHandler, which creates a list of pattern functions (a SinPatternFunction, an AdditionPatternFunction, and so on), then calls render on each one and returns the result.
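The design described above could be sketched roughly as follows. This is one possible reading, not the answerer's code: it uses an interface instead of an abstract class, render returns an Optional instead of filling a list, and all signatures are assumptions. Because the tokenizer treats '.' as a delimiter, the sin pattern here only accepts integer arguments.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.StringTokenizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical base type: returns the rendered value, or empty if the token is not handled.
interface PatternFunction {
    Optional<String> render(String token);
}

class AdditionPatternFunction implements PatternFunction {
    private static final Pattern P = Pattern.compile("(\\d+)\\+(\\d+)");

    public Optional<String> render(String token) {
        Matcher m = P.matcher(token);
        if (!m.matches()) return Optional.empty();
        int sum = Integer.parseInt(m.group(1)) + Integer.parseInt(m.group(2));
        return Optional.of(String.valueOf(sum));
    }
}

class SinPatternFunction implements PatternFunction {
    private static final Pattern P = Pattern.compile("sin\\((\\d+)\\)");

    public Optional<String> render(String token) {
        Matcher m = P.matcher(token);
        if (!m.matches()) return Optional.empty();
        double value = Math.sin(Integer.parseInt(m.group(1))); // argument in radians
        return Optional.of(String.format(Locale.ROOT, "%.2f", value));
    }
}

class PatternFunctionHandler {
    private static final List<PatternFunction> FUNCTIONS =
            List.of(new AdditionPatternFunction(), new SinPatternFunction());

    static String render(String text) {
        // Same delimiters as in the answer; delimiters are returned as tokens too.
        StringTokenizer st = new StringTokenizer(text, " \t\n\r:;.!?,/\\|\"'", true);
        StringBuilder out = new StringBuilder();
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            // Use the first function that recognizes the token; otherwise keep it unchanged.
            String rendered = token;
            for (PatternFunction f : FUNCTIONS) {
                Optional<String> r = f.render(token);
                if (r.isPresent()) {
                    rendered = r.get();
                    break;
                }
            }
            out.append(rendered);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("Pete likes Mathematics 3+3 and Jessica too sin(3)."));
        // prints: Pete likes Mathematics 6 and Jessica too 0.14.
    }
}
```

Adding another function then means writing one more small class and adding it to the list.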
Your specified requirement is to use regular expressions to:
1. Divide text into components (words, ...)
2. Return text with inner arithmetic expressions evaluated
You have started on the first step using regular expressions, but have not quite completed it. After completing it, there remains to:
1. Recognize and parse components that form arithmetic (sub)expressions.
2. Evaluate the recognized (sub)expression components and produce a value. For evaluating (sub)expressions in infix notation, there exists a very helpful answer.
3. Substitute the value replacements back into the original string, which should be simple.
For dividing text into components defined strictly enough to allow later unambiguous evaluation of the subexpressions, I coded a sample, trying out named capturing groups in Java. This sample handles only integer numbers, but floating point should be simple to add.
Sample output on some test inputs was as follows:
Matching 'Pete like mathematic 5+3 and jesica too sin(3).'
WORD('Pete'),WS(' '),WORD('like'),WS(' '),WORD('mathematic'),WS(' '),NUM('5'),OP('+'),NUM('3'),WS(' '),WORD('and'),WS(' '),WORD('jesica'),WS(' '),WORD('too'),WS(' '),FUNC('sin'),FOPENP('('),NUM('3'),CLOSEP(')'),DOT('.')
Matching 'How about solving sin(3 + cos(x)).'
WORD('How'),WS(' '),WORD('about'),WS(' '),WORD('solving'),WS(' '),FUNC('sin'),FOPENP('('),NUM('3'),WS(' '),OP('+'),WS(' '),FUNC('cos'),FOPENP('('),WORD('x'),CLOSEP(')'),CLOSEP(')'),DOT('.')
Matching 'Or arcsin(4.2) we do not know about?'
WORD('Or'),WS(' '),WORD('arcsin'),OPENP('('),NUM('4'),DOT('.'),NUM('2'),CLOSEP(')'),WS(' '),WORD('we'),WS(' '),WORD('do'),WS(' '),WORD('not'),WS(' '),WORD('know'),WS(' '),WORD('about'),PUNCT('?')
Matching ''sin sin sin' the catholic priest has said...'
PUNCT('''),WORD('sin'),WS(' '),WORD('sin'),WS(' '),WORD('sin'),PUNCT('''),WS(' '),WORD('the'),WS(' '),WORD('catholic'),WS(' '),WORD('priest'),WS(' '),WORD('has'),WS(' '),WORD('said'),DOT('.'),DOT('.'),DOT('.')
On named capturing group usage, I found it inconvenient that neither the compiled Pattern nor the acquired Matcher API provides access to the names of the groups that are present. Sample code below.
import java.util.*;
import java.util.regex.*;
import static java.util.stream.Collectors.joining;
public class Lexer {
// differentiating _function call opening parentheses_ from expressions one
static final String S_FOPENP = "(?<fopenp>\\()";
static final String S_FUNC = "(?<func>(sin|cos|tan))" + S_FOPENP;
// expression or text opening parentheses
static final String S_OPENP = "(?<openp>\\()";
// expression or text closing parentheses
static final String S_CLOSEP = "(?<closep>\\))";
// separate dot, should help with introducing floating-point support
static final String S_DOT = "(?<dot>\\.)";
// other recognized punctuation
static final String S_PUNCT = "(?<punct>[,!?;:'\"])";
// whitespace
static final String S_WS = "(?<ws>\\s+)";
// integer number pattern
static final String S_NUM = "(?<num>\\d+)";
// treat '* / + -' as mathematical operators. Can be in dashed text.
static final String S_OP = "(?<op>\\*|/|\\+|-)";
// word -- refrain from using \w character class that also includes digits
static final String S_WORD = "(?<word>[a-zA-Z]+)";
// put the predefined components together into single regular expression
private static final String S_ALL = "(" +
S_OPENP + "|" + S_CLOSEP + "|" + S_FUNC + "|" + S_DOT + "|" +
S_PUNCT + "|" + S_WS + "|" + S_NUM + "|" + S_OP + "|" + S_WORD +
")";
static final Pattern ALL = Pattern.compile(S_ALL); // ... & form Pattern
// named capturing groups defined in regular expressions
static final List<String> GROUPS = Arrays.asList(
"func", "fopenp",
"openp", "closep",
"dot", "punct", "ws",
"num", "op",
"word"
);
// divide match into components according to capturing groups
static final List<String> tokenize(Matcher m) {
List<String> tokens = new LinkedList<>();
while (m.find()){
for (String group : GROUPS) {
String grResult = m.group(group);
if (grResult != null)
tokens.add(group.toUpperCase() + "('" + grResult + "')");
}
}
return tokens;
}
// some sample inputs to test
static final List<String> INPUTS = Arrays.asList(
"Pete like mathematic 5+3 and jesica too sin(3).",
"How about solving sin(3 + cos(x)).",
"Or arcsin(4.2) we do not know about?",
"'sin sin sin' the catholic priest has said..."
);
// test
public static void main(String[] args) {
for (String input: INPUTS) {
Matcher m = ALL.matcher(input);
System.out.println("Matching '" + input + "'");
System.out.println(tokenize(m).stream().collect(joining(",")));
}
}
}

How to split 2 strings using regular expression?

I am trying to split a string into two strings using regular expression
For example
String original1 = "Calpol Plus 100MG";
The above string should split into
String string1 = "Calpol Plus"; and String string2 = "100MG";
I tried using the .split(" ") method on string but it works only if the original string is "Calpol 100MG"
As I am new to regex I searched a few regular expressions and made a regex as "[^0-9MG]"
but it still doesn't work on a string like "Syrup 10ML"
I want to use a general regex which would work on both the types of string.
Just split your input on one or more space characters that come immediately before a <number>MG or <number>ML string.
string.split("\\s+(?=\\d+M[LG])"); // Use this regex "\\s+(?=\\d+(?:\\.\\d+)?M[LG])" if there is a possibility of floating point numbers.
Example:
String original1 = "Calpol Plus 100MG";
String strs[] = original1.split("\\s+(?=\\d+M[LG])");
for (int i=0; i<strs.length; i++) {
System.out.println(strs[i]);
}
To assign the results to variables:
String original1 = "Calpol Plus 100MG";
String strs[] = original1.split("\\s+(?=\\d+M[LG])");
String string1 = strs[0];
String string2 = strs[1];
System.out.println(string1);
System.out.println(string2);
Output:
Calpol Plus
100MG
Code 2:
String original1 = "Syrup 10ML";
String strs[] = original1.split("\\s+(?=\\d+M[LG])");
String string1 = strs[0];
String string2 = strs[1];
System.out.println(string1);
System.out.println(string2);
Output:
Syrup
10ML
Explanation:
\s+ Matches one or more space characters.
(?=\\d+M[LG]) Positive lookahead asserts that match must be followed by one or more digits \d+ and further followed by MG or ML
Try something like:
String original1 = "Calpol Plus 100MG";
Pattern p = Pattern.compile("[A-Za-z ]+|[0-9]*.*");
Matcher m = p.matcher(original1);
while (m.find()) {
System.out.println(m.group());
}
I present two solutions:
1. You can create a pattern that matches the whole String and use groups to extract the desired information.
2. You can use look-ahead assertions to ensure you split in front of a digit.
Which solution works best for you depends on the variety of inputs you have. If you use groups you will always find the last amount-part. If you use split you may be able to extract more complex amount-groups more easily, like "2 tea-spoons" (with the first solution you would need to extend the [A-Za-z] class to include -, e.g. by using [-A-Za-z] instead) or "2.5L" (with the first solution you would need to extend the [0-9] class to include ., e.g. by using [0-9.] instead).
Source:
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
* Created for http://stackoverflow.com/q/27329519/1266906
*/
public class RecipeSplitter {
/**
* {@code ^} the Pattern has to be applied from the start of the String on
* {@code (.*)} match any characters into Group 1
* {@code \\s+} followed by at least one whitespace
* {@code ([0-9]+\\s*[A-Za-z]+)} followed by Group 2 which is made up by at least one digit, optional whitespace and
* at least one character
* {@code $} the Pattern has to be applied so that at the End of the Pattern the End of the String is reached
*/
public static final Pattern INGREDIENT_PATTERN = Pattern.compile("^(.*)\\s+([0-9]+\\s*[A-Za-z]+)$");
/**
* {@code \\s+} at least one whitespace
* {@code (?=[0-9])} next is a digit ({@code ?=} will ensure it is there but doesn't include it into the match so we don't
* remove it)
*/
public static final Pattern WHITESPACE_FOLLOWED_BY_DIGIT_PATTERN = Pattern.compile("\\s+(?=[0-9])");
public static void matchWholeString(String input) {
Matcher matcher = INGREDIENT_PATTERN.matcher(input);
if (matcher.find()) {
System.out.println(
"\"" + input + "\" was split into \"" + matcher.group(1) + "\" and \"" + matcher.group(2) + "\"");
} else {
System.out.println("\"" + input + "\" was not of the expected format");
}
}
public static void splitBeforeNumber(String input) {
String[] strings = WHITESPACE_FOLLOWED_BY_DIGIT_PATTERN.split(input);
System.out.println("\"" + input + "\" was split into " + Arrays.toString(strings));
}
public static void main(String[] args) {
matchWholeString("Calpol Plus 100MG");
// "Calpol Plus 100MG" was split into "Calpol Plus" and "100MG"
matchWholeString("Syrup 10ML");
// "Syrup 10ML" was split into "Syrup" and "10ML"
splitBeforeNumber("Calpol Plus 100MG");
// "Calpol Plus 100MG" was split into [Calpol Plus, 100MG]
splitBeforeNumber("Syrup 10ML");
// "Syrup 10ML" was split into [Syrup, 10ML]
}
}
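As a sketch of the extension mentioned above for amounts like "2.5L", the group-based pattern can be given an optional decimal part. Instead of widening the digit class to [0-9.], this variation spells out the fraction explicitly so that stray dots are not accepted; the names DecimalAmountSplitter and split are assumptions for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DecimalAmountSplitter {
    // Group 1: name; group 2: integer or decimal amount, optional space, then a unit.
    private static final Pattern NAME_AND_AMOUNT =
            Pattern.compile("^(.*)\\s+([0-9]+(?:\\.[0-9]+)?\\s*[A-Za-z]+)$");

    // Returns {name, amount}, or null when the input does not end with an amount.
    static String[] split(String input) {
        Matcher m = NAME_AND_AMOUNT.matcher(input);
        return m.matches() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        String[] parts = split("Syrup 2.5ML");
        System.out.println(parts[0] + " | " + parts[1]); // Syrup | 2.5ML
    }
}
```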
