How to split a string into 2 strings using a regular expression? - Java

I am trying to split a string into two strings using a regular expression.
For example,
String original1 = "Calpol Plus 100MG";
The above string should be split into
String string1 = "Calpol Plus"; and String string2 = "100MG";
I tried using the .split(" ") method on the string, but it works only if the original string is "Calpol 100MG".
As I am new to regex, I looked at a few regular expressions and came up with "[^0-9MG]",
but it still doesn't work on a string like "Syrup 10ML".
I want a general regex that works on both types of string.

Just split your input on the one or more whitespace characters that come directly before the <number>MG or <number>ML part.
string.split("\\s+(?=\\d+M[LG])"); // Use "\\s+(?=\\d+(?:\\.\\d+)?M[LG])" instead if the number may contain a decimal point.
Example:
String original1 = "Calpol Plus 100MG";
String strs[] = original1.split("\\s+(?=\\d+M[LG])");
for (int i=0; i<strs.length; i++) {
System.out.println(strs[i]);
}
To assign the results to variables:
String original1 = "Calpol Plus 100MG";
String strs[] = original1.split("\\s+(?=\\d+M[LG])");
String string1 = strs[0];
String string2 = strs[1];
System.out.println(string1);
System.out.println(string2);
Output:
Calpol Plus
100MG
Code 2:
String original1 = "Syrup 10ML";
String strs[] = original1.split("\\s+(?=\\d+M[LG])");
String string1 = strs[0];
String string2 = strs[1];
System.out.println(string1);
System.out.println(string2);
Output:
Syrup
10ML
Explanation:
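\s+ matches one or more whitespace characters.
(?=\d+M[LG]) is a positive lookahead that asserts the match must be followed by one or more digits (\d+) and then MG or ML. Because a lookahead is zero-width, the digits and the unit are not consumed and stay in the second part.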
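The comment in this answer mentions a variant for decimal amounts. A minimal, self-contained sketch that applies that variant to the question's inputs, plus one decimal input, might look like this; the class name DoseSplitter and the method split are illustrative names, not part of the original answer.

```java
import java.util.Arrays;

public class DoseSplitter {
    // One or more whitespace characters directly followed by a dose such as 100MG, 10ML or 2.5ML.
    // The lookahead (?=...) only checks the dose, so it is not consumed and stays in the second part.
    private static final String DOSE_SPLIT = "\\s+(?=\\d+(?:\\.\\d+)?M[LG])";

    public static String[] split(String input) {
        return input.split(DOSE_SPLIT);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Calpol Plus 100MG", "Syrup 10ML", "Syrup 2.5ML"}) {
            System.out.println(Arrays.toString(split(s)));
        }
    }
}
```

An input without a recognized dose comes back as a one-element array, so check the array length before reading index 1. Only uppercase MG and ML are recognized; prefixing the regex with (?i) would be one way to relax that.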

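Try something like:
String original1 = "Calpol Plus 100MG";
// Letters and spaces, or a number followed by its unit; [0-9]+ (rather than [0-9]*.*) avoids an extra empty match at the end of the input
Pattern p = Pattern.compile("[A-Za-z ]+|[0-9]+\\S*");
Matcher m = p.matcher(original1);
while (m.find()) {
System.out.println(m.group().trim()); // trim() drops the trailing space captured by [A-Za-z ]+
}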

I present two solutions:
You can create a pattern that matches the whole String and use groups to extract the desired information.
You can use a look-ahead assertion to ensure you split in front of a digit.
Which solution works best for you depends on the variety of inputs you have. If you use groups, you will always find the last amount part. If you use split, you can more easily extract more complex amounts like "2 tea-spoons" (with the first solution you would need to extend the [A-Za-z] class to include -, e.g. by using [-A-Za-z] instead) or "2.5L" (with the first solution you would need to extend the [0-9] class to include ., e.g. by using [0-9.] instead).
Source:
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
* Created for http://stackoverflow.com/q/27329519/1266906
*/
public class RecipeSplitter {
/**
* {@code ^} the match has to start at the beginning of the String
* {@code (.*)} match any characters into Group 1
* {@code \\s+} followed by at least one whitespace
* {@code ([0-9]+\s*[A-Za-z]+)} followed by Group 2, which is made up of at least one digit, optional whitespace and
* at least one letter
* {@code $} the match has to end at the end of the String
*/
public static final Pattern INGREDIENT_PATTERN = Pattern.compile("^(.*)\\s+([0-9]+\\s*[A-Za-z]+)$");
/**
* {@code \\s+} at least one whitespace
* {@code (?=[0-9])} next is a digit ({@code (?=} ensures it is there but doesn't include it in the match, so we don't
* remove it)
*/
public static final Pattern WHITESPACE_FOLLOWED_BY_DIGIT_PATTERN = Pattern.compile("\\s+(?=[0-9])");
public static void matchWholeString(String input) {
Matcher matcher = INGREDIENT_PATTERN.matcher(input);
if (matcher.find()) {
System.out.println(
"\"" + input + "\" was split into \"" + matcher.group(1) + "\" and \"" + matcher.group(2) + "\"");
} else {
System.out.println("\"" + input + "\" was not of the expected format");
}
}
public static void splitBeforeNumber(String input) {
String[] strings = WHITESPACE_FOLLOWED_BY_DIGIT_PATTERN.split(input);
System.out.println("\"" + input + "\" was split into " + Arrays.toString(strings));
}
public static void main(String[] args) {
matchWholeString("Calpol Plus 100MG");
// "Calpol Plus 100MG" was split into "Calpol Plus" and "100MG"
matchWholeString("Syrup 10ML");
// "Syrup 10ML" was split into "Syrup" and "10ML"
splitBeforeNumber("Calpol Plus 100MG");
// "Calpol Plus 100MG" was split into [Calpol Plus, 100MG]
splitBeforeNumber("Syrup 10ML");
// "Syrup 10ML" was split into [Syrup, 10ML]
}
}
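The class extensions suggested above ([-A-Za-z] and [0-9.]) can be sketched by widening INGREDIENT_PATTERN. This is an illustrative variant, assuming amounts only need an extra '.' and units only need an extra '-'; the class ExtendedRecipeSplitter and its split method are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtendedRecipeSplitter {
    // Same shape as INGREDIENT_PATTERN, but the amount may contain '.' (e.g. 2.5L)
    // and the unit may contain '-' (e.g. tea-spoons).
    static final Pattern EXTENDED_PATTERN =
            Pattern.compile("^(.*)\\s+([0-9.]+\\s*[-A-Za-z]+)$");

    // Returns {name, amount}, or null if the input does not have the expected format.
    public static String[] split(String input) {
        Matcher matcher = EXTENDED_PATTERN.matcher(input);
        if (!matcher.matches()) {
            return null;
        }
        return new String[] {matcher.group(1), matcher.group(2)};
    }

    public static void main(String[] args) {
        String[] milk = split("Milk 2.5L");
        System.out.println(milk[0] + " | " + milk[1]);   // Milk | 2.5L
        String[] sugar = split("Sugar 2 tea-spoons");
        System.out.println(sugar[0] + " | " + sugar[1]); // Sugar | 2 tea-spoons
    }
}
```

Note that [0-9.]+ also accepts malformed amounts such as "2..5"; a stricter form like [0-9]+(?:\.[0-9]+)? is an option if that matters.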

Related

Splitting a String that has a particular structure

I have a string that looks something like this:
"330 Daniel T92435"
I need to obtain the name "Daniel", and I could simply write
string.substring(4,11);
But the position of the name ("Daniel") could vary,
and I don't want to use the split() method.
I was wondering whether there is a way to make substring read data until a whitespace is found.
If the input string always has the structure "someSymbols Name someSymbols", you can use the following regular expression to extract the name:
"[^\\s]+\\s+(\\p{Alpha}+)\\s+[^\\s]+"
\\p{Alpha} - an alphabetic character (by default ASCII only, i.e. [a-zA-Z]);
\\s - a whitespace character;
[^\\s] - any character other than whitespace.
In the code below, Pattern is an object representing the regular expression, and Matcher is an object that walks over the given string and finds the parts of it that match the pattern.
public static String findName(String source) {
Pattern pattern = Pattern.compile("[^\\s]+\\s+(\\p{Alpha}+)\\s+[^\\s]+");
Matcher matcher = pattern.matcher(source);
String result = "no match was found";
if (matcher.find()) {
result = matcher.group(1); // group 1 corresponds to the first element enclosed in parentheses (\\p{Alpha}+)
}
return result;
}
main()
public static void main(String[] args) {
System.out.println(findName("330 Daniel T92435"));
}
Output
Daniel
You can use the String.indexOf(" ") method:
int start = string.indexOf(" ")+1;
string.substring(start,start + 7);
Edit: if you want to select the word after the first space and don't know how long it will be, you can search for the next space instead:
int start = string.indexOf(" ")+1;
int end = string.indexOf(" ", start+1);
string.substring(start,end >= 0 ? end : string.length());
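The edited indexOf approach can be wrapped in a small, self-contained helper. This sketch uses the hypothetical name secondWord and also guards the case where the string contains no space at all: in the snippet above, indexOf would return -1, start would become 0, and the whole string would silently be returned.

```java
public class NameExtractor {
    // Returns the word between the first and the second space, or the rest of the
    // string if there is no second space. Returns "" if there is no space at all.
    public static String secondWord(String s) {
        int firstSpace = s.indexOf(' ');
        if (firstSpace < 0) {
            return "";
        }
        int start = firstSpace + 1;
        int end = s.indexOf(' ', start);
        return s.substring(start, end >= 0 ? end : s.length());
    }

    public static void main(String[] args) {
        System.out.println(secondWord("330 Daniel T92435")); // Daniel
        System.out.println(secondWord("1 Anna"));            // Anna
    }
}
```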

What RegEx separates terms of Polynomial

I have the string 5x^3-2x^2+5x.
I want a regex that splits this string into
5x^3,
-2x^2,
5x
I tried "(-)|(\\+)",
but this did not work, because it consumes the - and so loses the sign of the negative terms.
You can split your string using this regex:
\+|(?=-)
It splits on + and consumes it, but before a - it splits without consuming the -, because (?=-) is a zero-width lookahead.
Check out this Java code:
String s = "5x^3-2x^2+5x";
System.out.println(Arrays.toString(s.split("\\+|(?=-)")));
This gives your expected output:
[5x^3, -2x^2, 5x]
Edit:
Although the OP said in a comment that there won't be negative powers, just in case you do have them, you can use this regex, which does not split on a - that directly follows ^:
\+|(?<!\^)(?=-)
Check this updated Java code:
List<String> list = Arrays.asList("5x^3-2x^2+5x", "5x^3-2x^-2+5x");
for (String s : list) {
System.out.println(s + " --> " +Arrays.toString(s.split("\\+|(?<!\\^)(?=-)")));
}
New output:
5x^3-2x^2+5x --> [5x^3, -2x^2, 5x]
5x^3-2x^-2+5x --> [5x^3, -2x^-2, 5x]
Maybe
-?[^\r\n+-]+(?=[+-]|$)
or a similar expression would work as well, and it also handles constants in the equations.
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegularExpression{
public static void main(String[] args){
final String regex = "-?[^\r\n+-]+(?=[+-]|$)";
final String string = "5x^3-2x^2+5x\n"
+ "5x^3-2x^2+5x-5\n"
+ "-5x^3-2x^2+5x+5";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
}
}
If you wish to simplify, modify, or explore the expression, regex101.com explains it in its top-right panel and can show how it matches against sample inputs.
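The program below breaks the expression into its individual parts (coefficients, variable names, carets and exponents), so you can debug it and combine the regex pieces as you need. Note that the combined pattern only matches a polynomial with exactly this three-term shape.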
import java.util.regex.*;
class Main
{
public static void main(String[] args)
{
String txt="5x^3-2x^2+5x";
String re1="([-+]?\\d+)"; // Integer Number 1 (sign optional, so the leading 5 matches)
String re2="((?:[a-z][a-z0-9_]*))"; // Variable Name 1
String re3="(\\^)"; // Caret 1
String re4="([-+]?\\d+)"; // Integer Number 2 (exponent, sign optional)
String re5="([-+]\\d+)"; // Signed Integer 1
String re6="((?:[a-z][a-z0-9_]*))"; // Variable Name 2
String re7="(\\^)"; // Caret 2
String re8="([-+]?\\d+)"; // Integer Number 3 (exponent, sign optional)
String re9="([-+]\\d+)"; // Signed Integer 2
String re10="((?:[a-z][a-z0-9_]*))"; // Variable Name 3
Pattern p = Pattern.compile(re1+re2+re3+re4+re5+re6+re7+re8+re9+re10,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(txt);
if (m.find())
{
String int1=m.group(1);
String var1=m.group(2);
String c1=m.group(3);
String int2=m.group(4);
String signed_int1=m.group(5);
String var2=m.group(6);
String c2=m.group(7);
String int3=m.group(8);
String signed_int2=m.group(9);
String var3=m.group(10);
System.out.print("("+int1.toString()+")"+"("+var1.toString()+")"+"("+c1.toString()+")"+"("+int2.toString()+")"+"("+signed_int1.toString()+")"+"("+var2.toString()+")"+"("+c2.toString()+")"+"("+int3.toString()+")"+"("+signed_int2.toString()+")"+"("+var3.toString()+")"+"\n");
}
}
}
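Splitting is often only the first step. As a sketch under stated assumptions, each term produced by the lookbehind regex \+|(?<!\^)(?=-) from the earlier answer can be parsed into a coefficient and an exponent with a second pattern; unlike the fixed ten-group pattern, this works for any number of terms. The class PolynomialTerms, the "coefficient:exponent" output format, and the assumption that every term contains x (no constants) are illustrative choices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PolynomialTerms {
    // A term: optional sign and digits (the coefficient), the variable x,
    // and an optional ^exponent that may itself be negative.
    private static final Pattern TERM = Pattern.compile("([-+]?\\d*)x(?:\\^(-?\\d+))?");

    // Returns one "coefficient:exponent" string per term, e.g. "-2:2" for -2x^2.
    public static List<String> parse(String polynomial) {
        List<String> result = new ArrayList<>();
        // Split on '+', and before every '-' that does not directly follow '^'.
        for (String term : polynomial.split("\\+|(?<!\\^)(?=-)")) {
            Matcher m = TERM.matcher(term);
            if (!m.matches()) {
                throw new IllegalArgumentException("Unsupported term: " + term);
            }
            String coefficient = m.group(1);
            if (coefficient.isEmpty() || coefficient.equals("+")) {
                coefficient = "1";  // "x" means 1x
            } else if (coefficient.equals("-")) {
                coefficient = "-1"; // "-x" means -1x
            }
            String exponent = m.group(2) == null ? "1" : m.group(2); // no ^ means x^1
            result.add(coefficient + ":" + exponent);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("5x^3-2x^2+5x"));  // [5:3, -2:2, 5:1]
        System.out.println(parse("5x^3-2x^-2+5x")); // [5:3, -2:-2, 5:1]
    }
}
```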

Replace all characters between two delimiters using regex

I'm trying to replace all characters between two delimiters with another character using regex. The replacement should have the same length as the removed string.
String string1 = "any prefix [tag=foo]bar[/tag] any suffix";
String string2 = "any prefix [tag=foo]longerbar[/tag] any suffix";
String output1 = string1.replaceAll(???, "*");
String output2 = string2.replaceAll(???, "*");
The expected outputs would be:
output1: "any prefix [tag=foo]***[/tag] any suffix"
output2: "any prefix [tag=foo]*********[/tag] any suffix"
I've tried "\\[tag=.*?](.*?)\\[/tag]", but this replaces the whole sequence with a single "*".
I think that "(.*?)" is the problem here, because it captures everything at once.
How would I write something that replaces every character separately?
You can use the regex
\w(?=\w*?\[)
which matches every word character that is followed, through word characters only, by a "[". In Java: string1.replaceAll("\\w(?=\\w*?\\[)", "*")
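The one-line regex above can be tried in a self-contained program like the following; the class LookaheadMask and the method mask are hypothetical names. Note that the pattern does not check for [tag=...] at all: any run of word characters directly followed by [ is masked, and non-word characters (such as spaces) between the tags are left alone.

```java
public class LookaheadMask {
    // Replaces every word character that is followed, through word characters only,
    // by a '['. In the sample inputs that is exactly the text between ']' and "[/tag]".
    public static String mask(String s) {
        return s.replaceAll("\\w(?=\\w*?\\[)", "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("any prefix [tag=foo]bar[/tag] any suffix"));
        System.out.println(mask("any prefix [tag=foo]longerbar[/tag] any suffix"));
    }
}
```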
You can capture the characters inside one by one and replace each of them with *:
public static String replaceByStar(String str) {
String pattern = "(.*\\[tag=.*\\].*)\\w(.*\\[\\/tag\\].*)";
while (str.matches(pattern)) {
str = str.replaceAll(pattern, "$1*$2");
}
return str;
}
Used like this, it prints both of your expected outputs:
public static void main(String[] args) {
System.out.println(replaceByStar("any prefix [tag=foo]bar[/tag] any suffix"));
System.out.println(replaceByStar("any prefix [tag=foo]longerbar[/tag] any suffix"));
}
So the pattern "(.*\\[tag=.*\\].*)\\w(.*\\[\\/tag\\].*)" works like this:
(.*\\[tag=.*\\].*) captures the beginning, possibly including some characters after the opening tag
\\w matches the one character you want to replace
(.*\\[\\/tag\\].*) captures the end, possibly including some characters before the closing tag
The substitution $1*$2:
The input is matched as (text$1)oneChar(text$2) and replaced by (text$1)*(text$2). The loop repeats this until no word character is left between the tags.
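A different technique, not used in either answer, is to match each tag pair once and build the replacement with Matcher.appendReplacement, so the number of stars comes from the length of the captured content instead of a loop. This is a sketch assuming Java 11 or later (String.repeat needs Java 11; the StringBuilder overload of appendReplacement needs Java 9); TagMasker and mask are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagMasker {
    // Group 1: opening [tag=...], group 2: the content to mask, group 3: closing [/tag].
    private static final Pattern TAG =
            Pattern.compile("(\\[tag=[^\\]]*\\])(.*?)(\\[/tag\\])");

    public static String mask(String input) {
        Matcher m = TAG.matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            // Keep both tags and replace the content with the same number of '*'.
            String masked = m.group(1) + "*".repeat(m.group(2).length()) + m.group(3);
            m.appendReplacement(sb, Matcher.quoteReplacement(masked));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(mask("any prefix [tag=foo]bar[/tag] any suffix"));
        System.out.println(mask("any prefix [tag=foo]longerbar[/tag] any suffix"));
    }
}
```

Matcher.quoteReplacement is used because the tag text could contain $ or \, which appendReplacement would otherwise interpret. Unlike \w, this masks every character between the tags, spaces included.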

Java RegExp Replace

Hello, I've been trying to make a replacement, without success:
public class demo {
public static void main(String[] args){
String url = "/demoapi/api/user/123";
String newurl = "/user/?user=$1";
Pattern pattern = Pattern.compile("/^\\/demoapi\\/api\\/user\\/([0-9]\\d*)$/i");
Matcher match = pattern.matcher(url);
}
}
I want to replace $1 with 123. How do I do this?
Thank you!
Simply use replaceAll, but never forget to escape the $:
"/user/?user=$1".replaceAll("\\$1", "123");
I think you are looking for something like this:
String url = "/demoapi/api/user/123";
String newurl = "/user/?user=$1";
Pattern pattern = Pattern.compile(".*/user/(\\d*)");
Matcher match = pattern.matcher(url);
if(match.matches()){
newurl = newurl.replace("$1", match.group(1));
}
System.out.println(newurl);
Hope this helps.
You don't need to write out the whole ^\\/demoapi\\/api\\/user\\/ prefix in the pattern; just ^.*\\/ will match up to the last / symbol. So your Java code would be:
String url = "/demoapi/api/user/123";
String newurl = "/user/?user=$1";
String m1 = url.replaceAll("(?i)^.*\\/([0-9]+)$", "$1");
String m2 = newurl.replaceAll("\\$1", m1);
System.out.println(m2);
Output:
/user/?user=123
Explanation:
(?i) Turn on the case insensitive mode.
^.*\\/ Matches up to the last / symbol.
([0-9]+)$ Captures the last digits.
OR
String url = "/demoapi/api/user/123";
String m1 = url.replaceAll(".*/(\\d*)$", "/user/?user=$1");
You need the / before (\\d*) so that the group captures all of the digits, i.e. 123. Without it, the greedy .* consumes the digits first, so the group would capture only the last digit (3) with \\d+, or nothing at all with \\d*.
You can use any of the following methods:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) {
String url = "/demoapi/api/user/123";
String newurl = "/user/?user=$1";
String s1 = newurl.replaceAll("\\$1", Matcher.quoteReplacement("123"));
System.out.println("s1 : " + s1);
// OR
String s2 = newurl.replaceAll(Pattern.quote("$1"),Matcher.quoteReplacement("123"));
System.out.println("s2 : " + s2);
// OR
String s3 = newurl.replaceAll("\\$1", "123");
System.out.println("s3 : " + s3);
// OR
String s4 = newurl.replace("$1", "123");
System.out.println("s4 : " + s4);
}
}
Explanation of Methods Used :
Pattern.quote(String s) : Returns a literal pattern String for the
specified String. This method produces a String that can be used to
create a Pattern that would match the string s as if it were a
literal pattern. Metacharacters or escape sequences in the input
sequence will be given no special meaning.
Matcher.quoteReplacement(String s) : Returns a literal replacement
String for the specified String. This method produces a String that
will work as a literal replacement s in the appendReplacement method
of the Matcher class. The String produced will match the sequence of
characters in s treated as a literal sequence. Slashes ('\') and
dollar signs ('$') will be given no special meaning.
String.replaceAll(String regex, String replacement) : Replaces each
substring of this string that matches the given regular expression
with the given replacement.
An invocation of this method of the form str.replaceAll(regex, repl)
yields exactly the same result as the expression
Pattern.compile(regex).matcher(str).replaceAll(repl)
Note that backslashes (\) and dollar signs ($) in the replacement
string may cause the results to be different than if it were being
treated as a literal replacement string; see Matcher.replaceAll. Use
Matcher.quoteReplacement(java.lang.String) to suppress the special
meaning of these characters, if desired.
String.replace(CharSequence target, CharSequence replacement) :
Replaces each substring of this string that matches the literal
target sequence with the specified literal replacement sequence. The
replacement proceeds from the beginning of the string to the end, for
example, replacing "aa" with "b" in the string "aaa" will result in
"ba" rather than "ab".
Compact Search: .*?(\d+)$
This is all you need:
String replaced = yourString.replaceAll(".*?(\\d+)$", "/user/?user=$1");
Explanation
.*? lazily matches as few characters as possible before the final digits
(\d+) matches one or more digits (this is capture Group 1)
The $ anchor asserts that we are at the end of the string
We replace with /user/?user= and Group 1, $1
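The pattern in the question is written with JavaScript-style /.../i delimiters. Java's Pattern.compile has no delimiter syntax, so the slashes and the trailing i are treated as ordinary pattern characters, and a ^ placed after a literal / can never match; in Java the i flag becomes Pattern.CASE_INSENSITIVE. A sketch of the question's pattern translated to Java, with the hypothetical class DemoUrlRewriter, could look like this:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DemoUrlRewriter {
    // Java equivalent of /^\/demoapi\/api\/user\/([0-9]\d*)$/i:
    // no delimiters, '/' needs no escaping, and the i flag becomes CASE_INSENSITIVE.
    private static final Pattern USER_URL =
            Pattern.compile("^/demoapi/api/user/([0-9]\\d*)$", Pattern.CASE_INSENSITIVE);

    // Returns the rewritten URL, or the input unchanged if it does not match.
    public static String rewrite(String url) {
        Matcher m = USER_URL.matcher(url);
        if (!m.matches()) {
            return url;
        }
        // String.replace is literal, so "$1" in the template needs no escaping here.
        return "/user/?user=$1".replace("$1", m.group(1));
    }

    public static void main(String[] args) {
        System.out.println(rewrite("/demoapi/api/user/123")); // /user/?user=123
        System.out.println(rewrite("/DemoApi/API/user/7"));   // /user/?user=7
    }
}
```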

How to identify string pattern within a string but ignore if the match falls inside of identified pattern

I want to search a string for occurrences of substrings that match a specific pattern.
Then I will write the unique list of found strings, separated by commas.
The pattern is "$FOR_something", as long as it does not fall inside "#LOOKING( )" or "/* */" and the _something part does not contain any other special characters.
For example, if I have this string,
"Not #LOOKING( $FOR_one $FOR_two) /* $FOR_three */ not $$$FOR_four or $FOR_four_b, but $FOR_five; and $FOR_six and not $FOR-seven or $FOR_five again"
From the above string, the resulting list of found patterns should be:
$FOR_five, $FOR_six
I started with this example:
import java.lang.StringBuffer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class testIt {
public static void main(String args[]) {
String myWords = "Not #LOOKING( $FOR_one $FOR_two) /* $FOR_three */ not $$$FOR_four or $FOR_four_b, but $FOR_five; and $FOR_six and not $FOR-seven or $FOR_five again";
StringBuffer sb = new StringBuffer(0);
if ( myWords.toUpperCase().contains("$FOR") )
{
Pattern p = Pattern.compile("\\$FOR[\\_][a-zA-Z_0-9]+[\\s]*", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(myWords);
String myFors = "";
while (m.find())
{
myFors = myWords.substring( m.start() , m.end() ).trim();
if ( sb.length() == 0 ) sb = sb.append(myFors);
else
{
if ( !(sb.toString().contains(myFors))) sb = sb.append(", " + myFors );
}
}
}
System.out.println(sb);
}
}
But it is not giving me what I want. What I want is:
$FOR_five, $FOR_six
Instead, I get all of the $FOR_somethings. I don't know how to ignore the occurrences inside /* */ or #LOOKING( ).
Any suggestions?
I would say this problem goes beyond what a single regex handles well. The $$$ patterns can be fixed with a negative lookbehind; the others can't be fixed as easily.
What I would recommend is to first use tokenizing or manual string parsing to discard unwanted data, such as /* ... */ or #LOOKING( .... ). These parts could, however, also be removed with other regexes, such as:
myWords = myWords.replaceAll("/\\*[^*/]+\\*/", ""); // removes /* ... */ (replaceAll returns a new String)
myWords = myWords.replaceAll("#LOOKING\\([^)]+\\)", ""); // removes #LOOKING( ... )
Once the context-based content is stripped, you can use, e.g., the following regex:
(?<!\\$)\\$FOR_\\p{Alnum}+(?=[\\s;])
Explanation:
(?<!\\$) // Match iff not prefixed with $
\\$FOR_ // Matches $FOR_
\\p{Alnum}+ // Matches one or more alphanumericals [a-zA-Z0-9]
(?=[\\s;]) // Match iff followed by space or ';'
Note that the (?<!...) and (?=...) constructs are lookbehind and lookahead assertions. They are zero-width, so they are not included in the match itself; they act only as prefix/suffix conditions in the above sample.
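Putting the steps of this answer together, a minimal end-to-end sketch might look like the following. It assigns the results of replaceAll back to a variable, collects matches into a LinkedHashSet to drop duplicates while keeping their order, and joins them with ", ". The class ForFinder and the method findFors are hypothetical, and the lookahead is widened with |$ (an assumption beyond the answer) so that a $FOR_ name at the very end of the input is also accepted.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ForFinder {
    // $FOR_ plus alphanumerics, not preceded by another $, and followed by whitespace, ';'
    // or (an assumption beyond the original answer) the end of the input.
    private static final Pattern FOR_NAME =
            Pattern.compile("(?<!\\$)\\$FOR_\\p{Alnum}+(?=[\\s;]|$)");

    public static String findFors(String text) {
        // replaceAll returns a new String, so the results must be assigned.
        String stripped = text.replaceAll("/\\*[^*/]+\\*/", "")        // drop /* ... */
                              .replaceAll("#LOOKING\\([^)]+\\)", ""); // drop #LOOKING( ... )
        Set<String> found = new LinkedHashSet<>(); // unique, in first-seen order
        Matcher m = FOR_NAME.matcher(stripped);
        while (m.find()) {
            found.add(m.group());
        }
        return String.join(", ", found);
    }

    public static void main(String[] args) {
        String myWords = "Not #LOOKING( $FOR_one $FOR_two) /* $FOR_three */ not $$$FOR_four or "
                + "$FOR_four_b, but $FOR_five; and $FOR_six and not $FOR-seven or $FOR_five again";
        System.out.println(findFors(myWords)); // $FOR_five, $FOR_six
    }
}
```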
