Java regex patterns conjunction pattern - java

Is there any way to get a Pattern that is the conjunction of two others, such that a String matches it if and only if it matches both of them?
Some math:
S — set of strings
P — set of patterns (where each pattern has one or more string representation (e.g. “[0-9]” and “\d” are the same pattern))
Sᵢ — subset of strings (Sᵢ ⊂ S) that match pᵢ pattern (where instead of i could be any index).
In equation form: “Sᵢ = {s | s ∈ S, s matches pᵢ, pᵢ ∈ P}”, which means: “Sᵢ is the set of strings that match the pattern pᵢ”.
Or in another notation: “Sᵢ ⊂ S, ∀pᵢ ∈ P ∀s ∈ S (s matches pᵢ ≡ s ∈ Sᵢ)”, which means: “Sᵢ is a subset of the strings, and a string is an element of Sᵢ exactly when it matches the pattern pᵢ”.
Let's define conjunction of patterns: “p₁ ∧ p₂ = p₃ ≡ S₁ ∩ S₂ = S₃” — that means: “Set of strings that match conjunction of patterns p₁ and p₂ is intersection of sets of strings that match p₁ pattern and that match p₂ pattern”.

Assuming you want exact matches (that is, tomato does not match omat), then you need to wrap each p_i between (?=^(?: and )$), then join them.
If you want inexact matches (tomato does match omat), then you need to wrap each p_i between (?=.*?(?: and )), then join them. Note that in this case, there is the potential for catastrophic backtracking.
In both cases, you can add .* after joining if you want to eat the word (remember, lookaheads match the empty string).
Explanation
In the exact case, the outside is wrapped in a lookahead so that no characters are eaten. Inside are anchors ^ and $ (this provides the exactness) surrounding a non-capturing group. The non-capturing group is so if you have an or expression at the upper level in one of p_i, the anchors apply to the entire group rather than to the first expression.
The inexact case is exactly the same, except instead of anchoring, we eat characters until we get to the match position.
You can see a detailed example on www.debuggex.com.
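As a rough illustration of the exact-match recipe above, the following sketch wraps each pattern in (?=^(?: ... )$), joins the pieces, and appends .* so that matches() consumes the input. The class name ConjunctionSketch and the helper allOf are made up for this example; it also assumes that the patterns contain no numbered backreferences and that the input contains no line terminators.

```java
import java.util.regex.Pattern;

class ConjunctionSketch {
    // Hypothetical helper (not from the answer): builds
    // (?=^(?:p1)$)(?=^(?:p2)$)....* from the given pattern strings.
    static Pattern allOf(String... patterns) {
        StringBuilder sb = new StringBuilder();
        for (String p : patterns) {
            // The non-capturing group keeps a top-level alternation inside the anchors.
            sb.append("(?=^(?:").append(p).append(")$)");
        }
        sb.append(".*"); // lookaheads consume nothing, so eat the input here
        return Pattern.compile(sb.toString());
    }

    public static void main(String[] args) {
        Pattern both = allOf("[a-z]+", ".*mat.*");
        System.out.println(both.matcher("tomato").matches()); // lowercase and contains "mat"
        System.out.println(both.matcher("TOMATO").matches()); // fails [a-z]+
        System.out.println(both.matcher("potato").matches()); // fails .*mat.*
    }
}
```

Because every lookahead is anchored at both ends, the order in which the patterns are joined should not matter.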

private static String combineRE(String p1, String p2){
int groups1 = 0, groups2=0;
StringBuilder newP = new StringBuilder("(?=");
newP.append(p1);
newP.append("$)(?=");
Pattern capturingGroup = Pattern.compile("(?<!\\\\)(\\\\\\\\)*\\((?!\\?)");
Matcher m = capturingGroup.matcher(p1);
while(m.find()) groups1 ++;
m = capturingGroup.matcher(p2);
while(m.find()) groups2 ++;
String new2 = p2;
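// Renumber from the highest index down, so a reference that was already
// shifted (e.g. \1 -> \2) is not shifted again on a later iteration.
for(int i=groups2; i>=1; i--)
new2 = new2.replaceAll("(?<!\\\\)\\\\"+i, "\\\\" + (i+groups1));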
newP.append(new2);
newP.append("$).*");
return newP.toString();
}
This function uses the basic structure (?=p1$)(?=p2$).*, while renumbering the numbered backreferences in the second pattern. It uses a regular expression to count the capturing-group openers (unescaped ( characters not followed by ?) in each pattern, then updates the backreferences in the second pattern before placing it in the resulting pattern. I've set up a testing environment on ideone; please add all the test cases you can think of, but I think this answers your question.
http://ideone.com/Wm8cRc
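Renumbering backreferences one index at a time is sensitive to ordering: if \1 is rewritten to \2 before the original \2 has been handled, the same reference can be shifted twice. One way to sidestep that, sketched below, is to shift every reference in a single pass with appendReplacement. This is a different technique from the replaceAll loop in the answer; the names BackrefShiftSketch and shiftBackrefs are hypothetical, and the sketch assumes the pattern contains no octal escapes, no \Q...\E sections, and no digits directly following a reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackrefShiftSketch {
    // An unescaped backslash followed by digits: not preceded by a backslash,
    // then any number of escaped-backslash pairs, then the reference itself.
    private static final Pattern BACKREF =
            Pattern.compile("(?<!\\\\)((?:\\\\\\\\)*)\\\\(\\d+)");

    // Hypothetical helper: adds shift to every numbered backreference in one pass.
    static String shiftBackrefs(String regex, int shift) {
        Matcher m = BACKREF.matcher(regex);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int shifted = Integer.parseInt(m.group(2)) + shift;
            // Keep the escaped-backslash pairs, then emit the renumbered reference.
            String replacement = m.group(1) + "\\" + shifted;
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The second pattern has two groups; the first pattern is assumed to have one.
        System.out.println(shiftBackrefs("(a)(b)\\1\\2", 1));
    }
}
```

Each reference is visited exactly once, so the result does not depend on the order of the indices.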
Round 2:
There's no good way to generate a Pattern that will find() a substring matching two patterns. I toyed briefly with (?=p1(?<p2)).*(?<(?=p1)p2) and other such nonsense before giving up and writing an algorithm instead. First, I slightly modified my combineRE from before:
private static String combineRE(String p1, String p2, boolean anchors){
int groups1 = 0, groups2=0;
StringBuilder newP = new StringBuilder((anchors)?"^(?=":"(?=");
newP.append(p1);
if (anchors) newP.append('$');
newP.append(")(?=");
Pattern capturingGroup = Pattern.compile("(?<!\\\\)(\\\\\\\\)*\\((?!\\?)");
Matcher m = capturingGroup.matcher(p1);
while(m.find()) groups1 ++;
m = capturingGroup.matcher(p2);
while(m.find()) groups2 ++;
String new2 = p2;
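// Renumber from the highest index down, so a reference that was already
// shifted (e.g. \1 -> \2) is not shifted again on a later iteration.
for(int i=groups2; i>=1; i--)
new2 = new2.replaceAll("(?<!\\\\)\\\\"+i, "\\\\" + (i+groups1));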
newP.append(new2);
if (anchors) newP.append('$');
newP.append(')');
if (anchors) newP.append(".*");
return newP.toString();
}
You'll see that it now supports optional anchors. I used this functionality in my new function:
private static String[] findAllCombinedRE(String p1, String p2, String haystack, boolean overlap){
ArrayList<String> toReturn = new ArrayList<String>();
Pattern pCombo = Pattern.compile(combineRE(p1,p2, false));
String pComboMatch = combineRE(p1,p2, true);
Matcher m = pCombo.matcher(haystack);
int s = 0;
while (m.find(s)){
String match = haystack.substring(m.start());
s = m.start()+1;
for (int i=match.length(); i>0; i--){
String sMatch = match.substring(0,i);
if (Pattern.matches(pComboMatch, sMatch)){
toReturn.add(sMatch);
/**
* Note that at this point we can calculate
* Matcher-like information:
*
* group() = sMatch;
* start() = m.start();
* end() = m.start() + i;
*
* If it so suited us, we could pass this information
* back in a wrapped object.
*/
if (!overlap){
s = m.start()+i;
break;
}
}
}
}
return toReturn.toArray(new String[]{});
}
It uses an anchor free version of the two Regular Expressions to find all strings that might match, then chops off one letter at a time until the string matches the anchored version. It also includes a boolean to control for overlapping matches.
http://ideone.com/CBoBN5 Works pretty well.

Related

How to split a string and save the 2 characters that I split with?

I am trying to split a given string using the Java split method. The string should be divided at two different characters (+ and -), and I want to keep each of those characters in the array as well, at the same index as the substring that follows it.
for example :
input : String s = "4x^2+3x-2"
output :
arr[0] = 4x^2
arr[1] = +3x
arr[2] = -2
I know how to get the + or - characters in a different index between the numbers but it is not helping me,
any suggestions please?
You can approach this problem in many ways. I'm sure there are clever and fancy ways to split this expression. I will show you the simplest problem-solving process that can help you.
State the problem you need to solve, the input and output
Problem: Split a math expression into subexpressions at + and - signs
Input: 4x^2+3x-2
Output: 4x^2,+3x,-2
Create a pseudo code with some logic you might think works
Given an expression string
Create an empty list of expressions
Create a subExpression string
For each character in the expression
Check if the character is + or - then
add the subExpression to the list and start a new subexpression with that character
otherwise, append the character to the subExpression
In the end, add the remaining subexpression to the list
Implement the pseudo-code in the programming language of your choice
String expression = "4x^2+3x-2";
List<String> expressions = new ArrayList<>();
StringBuilder subExpression = new StringBuilder();
for (int i = 0; i < expression.length(); i++) {
char character = expression.charAt(i);
if (character == '-' || character == '+') {
expressions.add(subExpression.toString());
subExpression = new StringBuilder(String.valueOf(character));
} else {
subExpression.append(character);
}
}
expressions.add(subExpression.toString());
System.out.println(expressions);
Output
[4x^2, +3x, -2]
You will end with one algorithm that works for your problem. You can start to improve it.
Try this code:
String s = "4x^2+3x-2";
s = s.replace("+", "#+");
s = s.replace("-", "#-");
String[] ss = s.split("#");
for (int i = 0; i < ss.length; i++) {
Log.e("XOP",ss[i]);
}
This code replaces + and - with #+ and #- respectively and then splits the string with #. That way the + and - operators are not lost in the result.
If you require # as input character then you can use any other Unicode character instead of #.
Try this one:
String s = "4x^2+3x-2";
String[] arr = s.split("(?=[+-])"); // split before each sign so it stays with its term
for(int i=0;i<arr.length;i++){
System.out.println(arr[i]);
}
Personally I like it better to have positive matches of patterns, especially if the split pattern itself is empty.
So for instance you could use a Pattern and Matcher like this:
Pattern p = Pattern.compile("(^|[+-])([^+-]*)");
Matcher m = p.matcher("4x^2+3x-2");
while (m.find()) {
System.out.printf("%s or %s %s%n", m.group(), m.group(1), m.group(2));
}
This matches the start of the string or a plus or minus: ^|[+-], followed by any amount of characters that are not a plus or minus: [^+-]*.
Do note that the ^ first matches the start of the string, and is then used to negate a character class when used between brackets. Regular expressions are tricky like that.
Bonus: you can also use the two groups (within the parentheses in the pattern) to match the operators - if any.
All this is presuming that you want to use/test regular expressions; generally things like this require a parser rather than a regular expression.
A one-liner (Java 9+, since it relies on Matcher.results()) for persons thinking that this is too complex:
var expressions = Pattern.compile("(?:^|[+-])[^+-]*")
.matcher("4x^2+3x-2")
.results()
.map(r -> r.group())
.collect(Collectors.toList());

Finding and retrieving consecutive matches

Say I want to match a string that should solely consist of parts adhering to a specific (regex) pattern and retrieve the elements in a loop. For this it seems that Matcher.find() was invented. However, find will match any string, not just one that is directly after the pattern, so intermediate characters are skipped.
So - for instance - I want to match \\p{XDigit}{2} (two hexadecimal digits) in such a way that:
aabb matches;
_aabb doesn't match;
aa_bb doesn't match;
aabb_ doesn't match.
by using find (or any other iterated call to the regex) so I can directly process each byte in the array. So I want to process aa and bb separately, after matching.
OK, that's it, the most elegant way of doing this wins the accept.
Notes:
the hexadecimal parsing is just an example of a simple repeating pattern;
preferably I would like to keep the regex to the minimal required to match the element;
yes, I know about using (\\p{XDigit}{2})*, but I don't want to scan string twice (as it should be usable on huge input strings).
It appears you want to get all (multiple) matches that appear at the start of the string or right after a successful match. You may combine \G operator with a lookahead that will assure the string only matches some repeated pattern.
Use
(?:\G(?!^)|^(?=(?:\p{XDigit}{2})*$))\p{XDigit}{2}
See the regex demo
Details
(?: - start of a non-capturing group with 2 alternatives:
\G(?!^) - the end of the previous successful match
| - or
^(?=(?:\p{XDigit}{2})*$) - start of a string (^) that is followed with 0+ occurrences of \p{XDigit}{2} pattern up to the end of the string ($)
) - end of the non-capturing group
\p{XDigit}{2} - 2 hex chars.
Java demo:
String regex = "(?:\\G(?!^)|^(?=(?:[0-9a-fA-F]{2})*$))[0-9a-fA-F]{2}";
String[] strings = {"aabb","_aabb","aa_bb", "aabb_"};
Pattern pattern = Pattern.compile(regex);
for (String s : strings) {
System.out.println("Checking " + s);
Matcher matcher = pattern.matcher(s);
List<String> res = new ArrayList<>();
while (matcher.find()) {
res.add(matcher.group(0));
}
if (res.size() > 0) {
System.out.println(res);
} else {
System.out.println("No match!");
}
}
Output:
Checking aabb
[aa, bb]
Checking _aabb
No match!
Checking aa_bb
No match!
Checking aabb_
No match!
OK, I may finally have had a brainstorm: the idea is to remove the find() method out of the condition of the while loop. Instead I should simply keep a variable holding the location and only stop parsing when the whole string has been processed. The location can also be used to produce a more informative error message.
The location starts at zero and is updated to the end of the match. Each time a new match is found the start of the match is compared with the location, i.e. end of the last match. An error occurs if:
the pattern is not found;
the pattern is found, but not at the end of the last match.
Code:
private static byte[] parseHex(String hex){
byte[] bytes = new byte[hex.length() / 2];
int off = 0;
// the pattern is normally a constant
Pattern hexByte = Pattern.compile("\\p{XDigit}{2}");
Matcher hexByteMatcher = hexByte.matcher(hex);
int loc = 0;
// so here we would normally do the while (hexByteMatcher.find()) ...
while (loc < hex.length()) {
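// optimization in case we have a maximum size of the pattern;
// the end is clamped so an odd-length input reaches the error below
// instead of throwing IndexOutOfBoundsException
hexByteMatcher.region(loc, Math.min(loc + 2, hex.length()));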
// instead we try and find the pattern, and produce an error if not found at the right location
if (!hexByteMatcher.find() || hexByteMatcher.start() != loc) {
// only a single throw, message includes location
throw new IllegalArgumentException("Hex string invalid at offset " + loc);
}
// the processing of the pattern, in this case a double hex digit representing a byte value
bytes[off++] = (byte) Integer.parseInt(hexByteMatcher.group(), 16);
// set the next location to the end of the match
loc = hexByteMatcher.end();
}
return bytes;
}
The method can be improved by adding \\G (end of last match) to the regex: \\G\\p{XDigit}{2}. This way the regular expression fails immediately if the pattern cannot be found starting at the end of the last match (or at the start of the string).
For regular expressions with an expected maximum size (2 in this case) it is of course also possible to adjust the end of the region that needs to be matched.
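The \G variant mentioned above could look roughly like the sketch below. It is not the code from the answer: the class name HexParseSketch is made up, and the region() optimisation is dropped, because with \G in the pattern a plain find() loop already stops at the first position that does not continue the previous match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HexParseSketch {
    // \G pins each match to the end of the previous one (or to the start of the input).
    private static final Pattern HEX_BYTE = Pattern.compile("\\G\\p{XDigit}{2}");

    static byte[] parseHex(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        Matcher m = HEX_BYTE.matcher(hex);
        int off = 0;
        int loc = 0; // end of the last successful match
        while (m.find()) {
            bytes[off++] = (byte) Integer.parseInt(m.group(), 16);
            loc = m.end();
        }
        // If the loop stopped early, loc points at the first offending offset.
        if (loc != hex.length()) {
            throw new IllegalArgumentException("Hex string invalid at offset " + loc);
        }
        return bytes;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(parseHex("aabb")));
    }
}
```

The string is scanned once, and the error message still reports where parsing stopped.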

Java: Stringtokenizer To Array

Given a polynomial, I'm attempting to write code that orders its terms by degree and adds like terms together. For instance, given
String term = "323x^3+2x+x-5x+5x^2"; //Given
What I'd like = "323x^3+5x^2-2x" //result
So far I've tokenized the given polynomial by this...
term = term.replace("+" , "~+");
term = term.replace("-", "~-");
System.out.println(term);
StringTokenizer multiTokenizer = new StringTokenizer(term, "~");
int numberofTokens = multiTokenizer.countTokens();
String[] tokensArray = new String[numberofTokens];
int x=0;
while (multiTokenizer.hasMoreTokens())
{
System.out.println(multiTokenizer.nextToken());
}
Resulting in
323x^3~+2x~+x~-5x~+5x^2
323x^3
+2x
+x
-5x
+5x^2
How would I go about splitting the coefficient from the x value, saving each coefficient in an array, and then putting the degrees in a different array with the same index as its coefficient? I will then use this algorithm to add like terms:
for (i = 0; i <= biggest_Root; i++) {
    total = 0; // restart the sum for each degree
    for (j = 0; j < items_in_list; j++) {
        if (degree_array[j] == i) { // == compares; = would assign
            total += b1[j];
        }
    }
    array_of_totals[i] = total;
}
Any and all help is much appreciated!
You can also update the terms so they all have coefficients:
s/([+-])x/${1}1x/g
So +x^2 becomes +1x^2.
Your individual coefficients can be pulled out by simple regex expressions.
Something like this should suffice:
/([+-]?\d+)x(?!\^)/ // match for x (not followed by an exponent)
/([+-]?\d+)x\^2/ // match for x^2
/([+-]?\d+)x\^3/ // match for x^3
/([+-]?\d+)x\^4/ // match for x^4
Then
sum_of_coefficient[degree] += match
where "match" is the parseInt of the regex match (with a special case where the coefficient is 1 and has no number, e.g. +x)
sum_of_coefficient[3] = 323
sum_of_coefficient[1] = +2+1-5 = -2
sum_of_coefficient[2] = 5
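The per-degree sums above can be sketched in Java with a single term pattern instead of one regex per degree. This is only an illustration under some assumptions: the variable is always x, coefficients and degrees are integers, and the names LikeTermsSketch and sumByDegree are invented for the example.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LikeTermsSketch {
    // Sign, digits, the variable x and "^degree" are each optional in a term.
    private static final Pattern TERM =
            Pattern.compile("([+-]?)(\\d*)(x?)(?:\\^(\\d+))?");

    static Map<Integer, Integer> sumByDegree(String polynomial) {
        Map<Integer, Integer> sums = new TreeMap<>();
        Matcher m = TERM.matcher(polynomial);
        while (m.find()) {
            String digits = m.group(2);
            boolean hasX = !m.group(3).isEmpty();
            if (digits.isEmpty() && !hasX) {
                continue; // empty match or stray characters: nothing to add
            }
            // A missing number means a coefficient of 1, as in "+x".
            int coefficient = digits.isEmpty() ? 1 : Integer.parseInt(digits);
            if ("-".equals(m.group(1))) {
                coefficient = -coefficient;
            }
            int degree = !hasX ? 0 : (m.group(4) == null ? 1 : Integer.parseInt(m.group(4)));
            sums.merge(degree, coefficient, Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        System.out.println(sumByDegree("323x^3+2x+x-5x+5x^2"));
    }
}
```

For the polynomial in the question this gives 323 for degree 3, 5 for degree 2 and -2 for degree 1, matching the sums listed above.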
Using a "Regular Expression" Pattern to Simplify the Parsing
(and make the code cooler and more concise)
Here is a working example that parses the coefficient, variable and degree for each term, based on the terms you've parsed so far. It just inserts the terms shown in your example into a list of Strings and then processes each string the same way.
This program runs and produces output, and if you like it you can splice it into your program. To try it:
$ javac parse.java
$ java parse
Limitations and Potential Improvements:
Technically speaking the coefficient and degrees could be fractional, so the regular expression could easily be changed to handle those kinds of numbers. And then instead of Integer.parseInt() you could use Float.parseFloat() instead to convert the matched value to a variable you can use.
import java.util.*;
import java.util.regex.*;
public class parse {
public static void main(String args[]) {
/*
* Substitute this List with your own list or
* array from the code you've written already...
*
* vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv */
List<String>terms = new ArrayList<String>();
terms.add("323x^3");
terms.add("+2x");
terms.add("+x");
terms.add("-5x");
terms.add("+5x^2");
/* ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ */
for (String term : terms) {
System.out.print("Term: " + term + ": \n");
Pattern pattern = Pattern.compile("([+-]*\\d*)([A-Za-z]*)\\^*(\\d*)");
Matcher matcher = pattern.matcher(term);
if (matcher.find()) {
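int coefficient = 1;
String coefficientText = matcher.group(1);
if (coefficientText.equals("-")) {
coefficient = -1; // a bare minus sign, as in "-x", means -1
} else {
try {
coefficient = Integer.parseInt(coefficientText);
} catch (NumberFormatException e) {} // empty or bare "+": keep 1
}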
String variable = matcher.group(2);
int degree = 1;
try {
degree = Integer.parseInt(matcher.group(3));
} catch (Exception e) {}
System.out.println(" coefficient = " + coefficient);
System.out.println(" variable = " + variable);
System.out.println(" degree = " + degree);
/*
* Here, do what you need to do with
* variable, coefficient and degree
*/
}
}
}
}
Explanation of the Regular Expression in the Example Code:
This is the regular expression used:
([+-]*\\d*)([A-Za-z]*)\\^*(\\d*)
Each parenthesized section represents part of the term I want to match and extract into my result. It puts whatever is matched in a group corresponding to the set of parentheses. The first set of parentheses goes into group 1, the second into group 2, etc.
The first matcher (grouped by ( )), is ([+-]*\\d*)
That is designed to match (i.e. extract) the coefficient (if any) and put it into group 1. It expects zero or more occurrences of '+' or '-' characters, followed by zero or more digits. I probably should have written [+-]?\\d*, which would match zero or one + or - characters.
The next grouped matcher is ([A-Za-z]*) That says match zero or more capital or lowercase letters.
That is trying to extract the variable name, if any and put it into group 2.
Following that, there is an ungrouped \\^*, which matches 0 or more ^ characters. It's not grouped in parenthesis, because we want to account for the ^ character in the text, but not stash it anywhere. We're really interested in the exponent number following it. Note: Two backslashes are how you make one backslash in a Java string. The real world regular expression we're trying to represent is \^*. The reason it's escaped here is because ^ unescaped has special meaning in regular expressions, but we just want to match/allow for the possibility of an actual caret ^ at that position in the algebraic term we're parsing.
The final pattern group is (\\d*). Outside of a string literal, as most regexes in the wild are, that would simply be \d*. It's escaped because, by default, an unescaped d in a regex means match a literal d at the current position in the text, whereas the escaped \d is a special regex pattern that matches any digit [0-9] (as the Pattern javadoc explains). * means expect (match) zero or more digits at that point. Alternatively, + would mean expect 1 or more digits in the text at the current position, and ? would mean 0 or 1 digits are expected in the text at the current position. So, essentially, the last group is designed to match and extract the exponent (if any) after the optional caret, putting that number into group 3.
Remember the ( ) (parenthesized) groupings are just so that we can extract those areas parsed into separate groups.
If this doesn't all make perfect sense, study regular expressions in general and read the Java Pattern class javadoc online. They are NOT as scary as they first look, and they are an extremely worthwhile study for any programmer, as they carry over to most popular scripting languages and compilers, so learn them once and you have an extremely powerful tool for life.
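The limitation noted earlier, that coefficients and degrees could be fractional, might be handled along the lines of the sketch below. It is an assumption-laden variant rather than the program above: DecimalTermSketch and parseTerm are hypothetical names, Double.parseDouble is used where the text mentions Float.parseFloat, the sign is limited to a single optional + or -, and a bare constant is given degree 0.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DecimalTermSketch {
    // Digits with an optional fractional part, e.g. 2 or 2.5
    private static final String NUMBER = "\\d+(?:\\.\\d+)?";
    private static final Pattern TERM = Pattern.compile(
            "([+-]?)(" + NUMBER + ")?([A-Za-z]*)(?:\\^(" + NUMBER + "))?");

    // Returns {coefficient, degree} for a single term such as "-2.5x^1.5".
    static double[] parseTerm(String term) {
        Matcher m = TERM.matcher(term);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a term: " + term);
        }
        String number = m.group(2); // null when the term has no digits
        boolean hasVariable = !m.group(3).isEmpty();
        if (number == null && !hasVariable) {
            throw new IllegalArgumentException("Not a term: " + term);
        }
        double coefficient = number == null ? 1.0 : Double.parseDouble(number);
        if ("-".equals(m.group(1))) {
            coefficient = -coefficient;
        }
        double degree = !hasVariable ? 0.0
                : (m.group(4) == null ? 1.0 : Double.parseDouble(m.group(4)));
        return new double[] {coefficient, degree};
    }

    public static void main(String[] args) {
        double[] parsed = parseTerm("-2.5x^1.5");
        System.out.println("coefficient = " + parsed[0] + ", degree = " + parsed[1]);
    }
}
```

Using matches() rather than find() means a malformed term is rejected instead of being partially parsed.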
This looks like a homework question, so I won't divulge the entire answer here but here's how I'd get started
public class Polynomial {
private String rawPolynomial;
private int lastTermIndex = 0;
private Map<Integer, Integer> terms = new HashMap<>();
public Polynomial(String poly) {
this.rawPolynomial = poly;
}
public void simplify() {
while(true){
String term = getNextTerm(rawPolynomial);
if ("".equalsIgnoreCase(term)) {
return;
}
Integer degree = getDegree(term);
Integer coeff = getCoefficient(term);
System.out.println(String.format("%dx^%d", coeff, degree));
terms.merge(degree, coeff, Integer::sum);
}
}
private String getNextTerm(String poly) {
...
}
private Integer getDegree(String poly) {
...
}
private Integer getCoefficient(String poly) {
...
}
@Override public String toString() {
return terms.toString();
}
}
and some tests to get you started -
public class PolynomialTest {
@Test public void oneTermPolynomialRemainsUnchanged() {
Polynomial poly = new Polynomial("3x^2");
poly.simplify();
assertTrue("3x^2".equalsIgnoreCase(poly.toString()));
}
}
You should be able to fill in the blanks, hope this helps. I'll be happy to help you further if you're stuck somewhere.

Pattern Matcher Vs String Split, which should I use?

First time posting.
Firstly I know how to use both Pattern Matcher & String Split.
My question is which is best for me to use in my example and why?
Or suggestions for better alternatives.
Task:
I need to extract an unknown NOUN between two known regexp in an unknown string.
My Solution:
get the Start and End of the noun (from Regexp 1&2) and substring to extract the noun.
String line = "unknownXoooXNOUNXccccccXunknown";
int goal = 12 ;
String regexp1 = "Xo+X";
String regexp2 = "Xc+X";
I need to locate the index position AFTER the first regex.
I need to locate the index position BEFORE the second regex.
A) I can use pattern matcher
Pattern p = Pattern.compile(regexp1);
Matcher m = p.matcher(line);
if (m.find()) {
int afterRegex1 = m.end();
} else {
throw new IllegalArgumentException();
//TODO Exception Management;
}
B) I can use String Split
String[] split = line.split(regexp1,2);
if (split.length != 2) {
throw new UnsupportedOperationException();
//TODO Exception Management;
}
int afterRegex1 = line.indexOf(split[1]);
Which Approach should I use and why?
I don't know which is more efficient on time and memory.
Both are near enough as readable to myself.
I'd do it like this:
String line = "unknownXoooXNOUNXccccccXunknown";
String regex = "Xo+X(.*?)Xc+X";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(line);
if (m.find()) {
String noun = m.group(1);
}
The (.*?) is used to make the inner match on the NOUN reluctant. This protects us from a case where our ending pattern appears again in the unknown portion of the string.
EDIT
This works because the (.*?) defines a capture group. There's only one such group defined in the pattern, so it gets index 1 (the parameter to m.group(1)). These groups are indexed from left to right starting at 1. If the pattern were defined like this
String regex = "(Xo+X)(.*?)(Xc+X)";
Then there would be three capture groups, such that
m.group(1); // yields "XoooX"
m.group(2); // yields "NOUN"
m.group(3); // yields "XccccccX"
There is a group 0, but that matches the whole pattern, and it's equivalent to this
m.group(); // yields "XoooXNOUNXccccccX"
For more information about what you can do with the Matcher, including ways to get the start and end positions of your pattern within the source string, see the Matcher JavaDocs
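If the positions are what you are after, the same capture group can report them through Matcher.start(int) and Matcher.end(int). The sketch below is a small illustration; the class name NounLocatorSketch and the method locate are made up for it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NounLocatorSketch {
    private static final Pattern NOUN = Pattern.compile("Xo+X(.*?)Xc+X");

    // Returns {start, end} of capture group 1, or null if the pattern is not found.
    static int[] locate(String line) {
        Matcher m = NOUN.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new int[] {m.start(1), m.end(1)};
    }

    public static void main(String[] args) {
        String line = "unknownXoooXNOUNXccccccXunknown";
        int[] span = locate(line);
        System.out.println(span[0] + ".." + span[1] + " -> " + line.substring(span[0], span[1]));
    }
}
```

For the sample line the group starts at index 12, which is the goal value given in the question.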
You should use String.split() for readability unless you're in a tight loop.
Per split()'s javadoc, split() does the equivalent of Pattern.compile(), which you can optimize away if you're in a tight loop.
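Hoisting the compile step out of a tight loop could look like the sketch below, which uses Pattern.split with a limit in place of String.split. The names PrecompiledSplitSketch and afterDelimiter are hypothetical, and returning null when the delimiter is missing is just one possible choice.

```java
import java.util.regex.Pattern;

class PrecompiledSplitSketch {
    // Compiled once and reused, instead of once per String.split call.
    private static final Pattern DELIMITER = Pattern.compile("Xo+X");

    // Returns the text after the first delimiter, or null if there is none.
    static String afterDelimiter(String line) {
        String[] parts = DELIMITER.split(line, 2);
        return parts.length == 2 ? parts[1] : null;
    }

    public static void main(String[] args) {
        System.out.println(afterDelimiter("unknownXoooXNOUNXccccccXunknown"));
    }
}
```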
It looks like you want to get a unique occurrence. For this do simply
input.replaceAll(".*Xo+X(.*)Xc+X.*", "$1")
For efficiency, precompile the Pattern and call pattern.matcher(input).replaceAll("$1") instead.
In case your input contains line breaks, use Pattern.DOTALL or the (?s) flag.
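Put together, the precompiled and line-break-tolerant form of this replaceAll approach might look like the following sketch. ExtractOnceSketch and extract are invented names; note that when the pattern does not match, replaceAll simply returns the input unchanged.

```java
import java.util.regex.Pattern;

class ExtractOnceSketch {
    // DOTALL lets . match line terminators, so the surrounding .* can span lines.
    private static final Pattern AROUND_NOUN =
            Pattern.compile(".*Xo+X(.*)Xc+X.*", Pattern.DOTALL);

    static String extract(String input) {
        return AROUND_NOUN.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(extract("unknownXoooXNOUNXccccccXunknown"));
        System.out.println(extract("first\nXoXNOUNXcX\nlast"));
    }
}
```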
In case you want to use split, consider using Guava's Splitter. It behaves more sanely and also accepts a Pattern, which is good for speed.
If you really need the locations you can do it like this:
String line = "unknownXoooXNOUNXccccccXunknown";
String regexp1 = "Xo+X";
String regexp2 = "Xc+X";
Matcher m=Pattern.compile(regexp1).matcher(line);
if(m.find())
{
int start=m.end();
if(m.usePattern(Pattern.compile(regexp2)).find())
{
final int end = m.start();
System.out.println("from "+start+" to "+end+" is "+line.substring(start, end));
}
}
But if you just need the word in between, I recommend the way Ian McLaird has shown.

Iterating through String with .find() in Java regex

I'm currently trying to solve a problem from codingbat.com with regular expressions.
I'm new to this, so step-by-step explanations would be appreciated. I could solve this with String methods relatively easily, but I am trying to use regular expressions.
Here is the prompt:
Given a string and a non-empty word string, return a string made of each char just before and just after every appearance of the word in the string. Ignore cases where there is no char before or after the word, and a char may be included twice if it is between two words.
wordEnds("abcXY123XYijk", "XY") → "c13i"
wordEnds("XY123XY", "XY") → "13"
wordEnds("XY1XY", "XY") → "11"
etc
My code thus far:
String regex = ".?" + word+ ".?";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);
String newStr = "";
while(m.find())
newStr += m.group().replace(word, "");
return newStr;
The problem is that when there are multiple instances of word in a row, the program misses the character preceding the word because m.find() progresses beyond it.
For example: wordEnds("abc1xyz1i1j", "1") should return "cxziij", but my method returns "cxzij", not repeating the "i"
I would appreciate a non-messy solution with an explanation I can apply to other general regex problems.
This is a one-liner solution:
String wordEnds = input.replaceAll(".*?(.)" + word + "(?:(?=(.)" + word + ")|(.).*?(?=$|." + word + "))", "$1$2$3");
This matches your edge case as a look ahead within a non-capturing group, then matches the usual (consuming) case.
Note that your requirements don't require iteration, only your question title assumes it's necessary, which it isn't.
Note also that to be absolutely safe, you should escape all characters in word in case any of them are special "regex" characters, so if you can't guarantee that, you need to use Pattern.quote(word) instead of word.
Here's a test of the usual case and the edge case, showing it works:
public static String wordEnds(String input, String word) {
word = Pattern.quote(word); // add this line to be 100% safe
return input.replaceAll(".*?(.)" + word + "(?:(?=(.)" + word + ")|(.).*?(?=$|." + word + "))", "$1$2$3");
}
public static void main(String[] args) {
System.out.println(wordEnds("abcXY123XYijk", "XY"));
System.out.println(wordEnds("abc1xyz1i1j", "1"));
}
Output:
c13i
cxziij
Use a positive lookbehind and a positive lookahead, which are zero-width assertions:
(?<=(.)|^)1(?=(.)|$)
(?<=(.)|^) looks for a character just before 1 and captures it in group 1. This is a zero-width assertion that doesn't move forward in the match; it is just a test, and thus allows us to capture the value.
1 matches 1; you can replace it with any word.
(?=(.)|$) looks for a character after 1 and captures it in group 2.
$1 and $2 then contain your values. Keep finding until the end.
So this should be like
String s1 = "abcXY123XYiXYjk";
String s2 = java.util.regex.Pattern.quote("XY");
String s3 = "";
String r = "(?<=(.)|^)"+s2+"(?=(.)|$)";
Pattern p = Pattern.compile(r);
Matcher m = p.matcher(s1);
while(m.find()) s3 += m.group(1)+m.group(2);
//s3 now contains c13iij
works here
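One detail worth guarding against with this approach: when the word sits at the very start or end of the string, the corresponding group does not participate in the match and group() returns null, which string concatenation would turn into the text "null". The sketch below wraps the same lookaround pattern in a method that skips such groups; WordEndsSketch is a made-up class name, and the sketch assumes the input has no line terminators.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordEndsSketch {
    static String wordEnds(String str, String word) {
        // Only the word itself is consumed, so neighbouring matches can share a character.
        Pattern p = Pattern.compile("(?<=(.)|^)" + Pattern.quote(word) + "(?=(.)|$)");
        Matcher m = p.matcher(str);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            if (m.group(1) != null) {
                out.append(m.group(1)); // char before the word, if any
            }
            if (m.group(2) != null) {
                out.append(m.group(2)); // char after the word, if any
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(wordEnds("abcXY123XYijk", "XY"));
        System.out.println(wordEnds("abc1xyz1i1j", "1"));
    }
}
```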
Use regex as follows:
Matcher m = Pattern.compile("(.|)" + Pattern.quote(b) + "(?=(.?))").matcher(a);
for (int i = 1; m.find(); c += m.group(1) + m.group(2), i++);
Check this demo.
