I have the string p[name]=[1111];[2222] and I need to extract three parts from it: p[name]=, [1111] and [2222]. The string can vary, for example p[name]=[1111] or p[name]=[1111];[2222,[1,2,3],1];[3333].
I'm trying to use a regex for this, but I can't find a working solution.
My regex is
(p\\[[a-zA-Z0-9]+\\]=)(?:(\\[.[^;]+\\]);?)+
When I run this code I get only two groups:
Pattern p = Pattern.compile("(p\\[[a-zA-Z0-9^=]+\\]=)(?:;*(\\[.[^;]+\\]))+");
Matcher m = p.matcher("p[name]=[1111];[2222]");
if (m.find()) {
for(int i = 1, l = m.groupCount(); i <= l; ++i) {
System.out.println(m.group(i));
}
}
Result is
p[name]=
[2222]
Why not simply do this?
Pattern p = Pattern.compile("p\\[[a-z0-9]+]=|\\[[0-9]+]", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("p[name]=[1111];[2222]");
while(m.find()) {
System.out.println(m.group());
}
However, if you want to check the string structure at the same time, you can use this kind of pattern:
Pattern p = Pattern.compile("(p\\[[a-z0-9]+]=)|\\G(?<!^)(\\[[0-9]+])(?:;|$)", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("p[name]=[1111];[2222]");
while(m.find()) {
System.out.println(m.group(1) != null ? m.group(1) : m.group(2));
}
I can suggest this regex, which should also work with nested input such as p[name]=[1111];[2222,[1,2,3],1]:
Pattern p = Pattern.compile("([p]+)|\\[[a-z0-9,\\[\\]]+\\]");
Matcher m = p.matcher("p[name]=[1111];[2222]");
// Iterate over every match; m.group() is the whole matched token
while (m.find()) {
    System.out.println(m.group());
}
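For the nested case mentioned in the question (p[name]=[1111];[2222,[1,2,3],1];[3333]), java.util.regex has no recursive patterns, so a small depth-counting scan is one alternative to a regex. The following is a minimal sketch, not one of the answers above; the class name BracketSplitter and the method split are hypothetical, and it assumes the prefix ends at the first '=' and that brackets are balanced.

```java
import java.util.ArrayList;
import java.util.List;

class BracketSplitter {
    // Returns the "p[name]=" prefix followed by each top-level [...] group.
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        int eq = input.indexOf('=');
        if (eq < 0) {
            return parts; // no prefix found, nothing to split
        }
        parts.add(input.substring(0, eq + 1));

        int depth = 0;   // current bracket nesting level
        int start = -1;  // index where the current top-level group began
        for (int i = eq + 1; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '[') {
                if (depth == 0) {
                    start = i;
                }
                depth++;
            } else if (c == ']' && depth > 0) {
                depth--;
                if (depth == 0) {
                    parts.add(input.substring(start, i + 1));
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        for (String part : split("p[name]=[1111];[2222,[1,2,3],1];[3333]")) {
            System.out.println(part);
        }
    }
}
```

Separators between groups (here ';') are simply skipped, so this sketch does not validate the overall structure of the string.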
m.find() returns false when it should return true.
solrQueries[i] contains the string:
"fl=trending:0,id,business_attr,newarrivals:0,bestprice:0,score,mostviewed:0,primarySortOrder,fastselling:0,modelNumber&defType=pedismax&pf=&mm=2<70%&bgids=1524&bgboost=0.1&shards.tolerant=true&stats=true"
The code is:
Pattern p = Pattern.compile("&mm=(\\d+)&");
for(int i=0; i<solrQueries.length; i++) {
Matcher m = p.matcher(solrQueries[i].toLowerCase());
System.out.println(p.matcher(solrQueries[i].toLowerCase()));
if (m.find()) {
System.out.println(m.group(1));
mmValues[i] = m.group(1);
}
Oh,
Pattern p = Pattern.compile("(?i)&mm=(\\d+)");
works fine now.
Thank you, @Wiktor Stribiżew
You executed m.find() twice (first, in System.out.println(m.find()); and then in if (m.find())). And since there is only 1 match - even if the regex matches - you would get nothing after the second run.
Use
public String[] fetchMmValue(String[] solrQueries) {
    String[] mmValues = new String[solrQueries.length];
    Pattern p = Pattern.compile("(?i)&mm=(\\d+)");
    for (int i = 0; i < solrQueries.length; i++) {
        Matcher m = p.matcher(solrQueries[i]);
        if (m.find()) {
            // System.out.println(m.group(1)); // this is just for debugging
            mmValues[i] = m.group(1);
        }
    }
    return mmValues;
}
If you want to get all chars other than & after &mm=, use another regex:
"&mm=([^&]+)"
where [^&]+ matches 1 or more chars other than &.
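To make the difference between the three patterns concrete, here is a small sketch that runs each of them against a shortened version of the query string from the question. The class name MmValueDemo and the helper firstGroup are hypothetical, introduced only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MmValueDemo {
    // Returns capture group 1 of the first match, or null if there is no match.
    static String firstGroup(String regex, String query) {
        Matcher m = Pattern.compile(regex).matcher(query);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Shortened form of the query from the question
        String query = "defType=pedismax&pf=&mm=2<70%&bgids=1524&bgboost=0.1";

        // The original pattern requires '&' right after the digits, but '<' follows "2"
        System.out.println(firstGroup("&mm=(\\d+)&", query));    // null
        // Dropping the trailing '&' captures the leading digits only
        System.out.println(firstGroup("(?i)&mm=(\\d+)", query)); // 2
        // [^&]+ captures everything up to the next '&'
        System.out.println(firstGroup("&mm=([^&]+)", query));    // 2<70%
    }
}
```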
I am using the StringUtils library (import org.apache.commons.lang3.StringUtils;) to split a string like:
String str = "ZXCVFMS2ZZ1012ZZ1012ZZ1000ZZ0923ZZ0990ZZ0990ZZ0990ZZ1020DEFZXCVFMS3ZZ1012ZZ1012ZZ1000ZZ0923ZZ0990ZZ0990ZZ0990ZZ1020DEFZXCVFMERRORDEF";
I need to extract each substring that starts with zxcv* and ends with *def, as in:
String tmp1 = "ZXCVFMS2ZZ1012ZZ1012ZZ1000ZZ0923ZZ0990ZZ0990ZZ0990ZZ1020DEF";
String tmp2 = "ZXCVFMS3ZZ1012ZZ1012ZZ1000ZZ0923ZZ0990ZZ0990ZZ0990ZZ1020DEF";
Any help?
Solution, thanks to @assylias:
Pattern p = Pattern.compile("ZXCV.*?DEF");
Matcher m = p.matcher(str);
List<String> result = new ArrayList<> ();
while (m.find()) {
result.add(m.group());
}
How about using replaceAll?
String tmp = str.replaceAll(".*(zxcv.*def).*", "$1"); //zxcvVariableCanChancedef
UPDATE following your edit
If you have a repeating pattern, you could use a Matcher. To avoid matching the whole string, use the ? quantifier to make the match lazy.
Pattern p = Pattern.compile("zxcv.*?def");
String input = "15684zxcvVariableCanChancedefABCDEND15684zxcvVariableCanChancedefABCDEND";
Matcher m = p.matcher(input);
List<String> result = new ArrayList<> ();
while (m.find()) {
result.add(m.group());
}
This can be done without any additional libraries using core java.util.regex functionality. For example:
String str = "15684zxcvVariableCanChancedefABCDEND";
Pattern pattern = Pattern.compile(".*(zxcv.*def).*");
Matcher matcher = pattern.matcher(str);
if (matcher.matches()) {
System.out.println(matcher.group(1)); // ==> zxcvVariableCanChancedef
}
String line = "15684zxcvAAAAAAAncedefABCDEND15684zxcvBBBBBBBBBBdefABCDEND";
Last occurrence :
Matcher matcher = Pattern.compile(".*(zxcv.*def).*").matcher(line);
String tmp = matcher.find() ? matcher.group(1) : null;
System.out.println(tmp);
First occurrence :
Matcher matcher = Pattern.compile(".*?(zxcv.*?def).*").matcher(line);
Biggest occurrence (from first zxcv to last def) :
Matcher matcher = Pattern.compile(".*?(zxcv.*def).*").matcher(line);
All occurrences
Matcher matcher = Pattern.compile(".*?(zxcv.*?def)").matcher(line);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
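The four patterns above can be compared side by side. This is a small sketch using the same line value; the class name OccurrenceDemo and its helper methods are hypothetical names chosen for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OccurrenceDemo {
    // Returns capture group 1 of the first match found, or null if none.
    static String firstGroup(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.find() ? m.group(1) : null;
    }

    // Collects capture group 1 of every successive match.
    static List<String> allGroups(String regex, String line) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(line);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "15684zxcvAAAAAAAncedefABCDEND15684zxcvBBBBBBBBBBdefABCDEND";
        // Greedy prefix pushes the group to the last zxcv...def
        System.out.println("last:    " + firstGroup(".*(zxcv.*def).*", line));
        // Lazy prefix and lazy body stop at the first zxcv...def
        System.out.println("first:   " + firstGroup(".*?(zxcv.*?def).*", line));
        // Lazy prefix with greedy body spans first zxcv to last def
        System.out.println("biggest: " + firstGroup(".*?(zxcv.*def).*", line));
        // Without the trailing .* the matcher can continue after each match
        System.out.println("all:     " + allGroups(".*?(zxcv.*?def)", line));
    }
}
```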
I am not sure this is correct because I wrote it in a plain text editor; I don't have a Java IDE on this computer. I hope it helps.
public String XXX()
{
    // tmp is the input string, assumed to be declared elsewhere (e.g. as a field)
    int firstStorage = 0;
    int secondStorage = 0;
    // find the index where "zxcv" starts
    for (int i = 0; i + 4 <= tmp.length(); i++)
    {
        if (tmp.substring(i, i + 4).equals("zxcv"))
        {
            firstStorage = i;
            break;
        }
    }
    // find the index of the last character of the next "def"
    for (int i = firstStorage; i + 3 <= tmp.length(); i++)
    {
        if (tmp.substring(i, i + 3).equals("def"))
        {
            secondStorage = i + 2;
            break;
        }
    }
    return tmp.substring(firstStorage, secondStorage + 1);
}
Let me know whether it works. Have a nice day!
String str = "15684zxcvVariableCanChancedefABCDEND15684zxcvVariableCanChancedefABCDEND";
List<string> strList = new List<string>();
while (str.IndexOf("zxc") >= 0 && str.IndexOf("def") >= 0)
{
var startIndex = str.IndexOf("zxc");
var stopIndex = str.IndexOf("def");
var item = str.Substring(startIndex, stopIndex - startIndex + 3);
strList.Add(item);
str = str.Substring(0, startIndex) + str.Substring(stopIndex+3);
}
I have a string that comes out like this: 1.[Aagaard,Lindsay][SeniorPolicyAdvisor][TREASURYBOARDSECRETARIAT][DEPUTYPREMIERANDPRESIDENTOFTHETREASURYBOARD,Toronto][416-327-0948][lindsay.aagaard#ontario.ca]2.[Aalto,Margaret][ProbationOfficer][CHILDRENANDYOUTHSERVICES][THUNDERBAY,ThunderBay][807-475-1310][margaret.aalto#ontario.ca]
I want to split it into an arraylist like this:
1.
Aagaard,Lindsay
SeniorPolicyAdvisor
etc.
Any suggestions?
I read the JavaDoc and used Pattern and Matcher like so:
Pattern p = Pattern.compile("\\[(.*?)\\]");
Matcher m = p.matcher(tableContent);
while(m.find()) {
System.out.println(m.group(1));
}
First delete the first and the last brackets and then split by '][':
String arr = "[Aalto,Margaret][ProbationOfficer][CHILDRENANDYOUTHSERVICES]";
String[] items = arr.substring(1, arr.length() - 1).split("\\]\\[");
Simply this:
String[] list = str.split("\\[");
for(int i = 0 ; i < list.length ; i++) {
list[i] = list[i].replace("]", "");
}
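If the record numbers (1., 2., ...) should end up in the list as well, one option is a single pattern with two alternatives. This is a minimal sketch rather than any of the posted answers; the class name RecordTokenizer and the method tokenize are hypothetical, and it assumes that brackets are not nested and that record numbers only appear outside brackets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RecordTokenizer {
    // Group 1: a record number such as "1."; group 2: the text inside [...]
    private static final Pattern TOKEN = Pattern.compile("(\\d+\\.)|\\[(.*?)\\]");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // Exactly one of the two groups participates in each match
            tokens.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "1.[Aagaard,Lindsay][SeniorPolicyAdvisor][416-327-0948]2.[Aalto,Margaret][ProbationOfficer]";
        for (String token : tokenize(input)) {
            System.out.println(token);
        }
    }
}
```

Digits inside a bracketed field, such as the phone number, are consumed as part of that field's match, so they are not mistaken for record numbers.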
I would like to match the smallest sub string that starts with d and ends with a and contains o.
Example : "djswxaeqobdnoa" => "dnoa"
With this code :
Pattern pattern = Pattern.compile("d.*?o.*?a");
Matcher matcher = pattern.matcher("fondjswxaeqobdnoajezbpfrehanxi");
while (matcher.find()) {
System.out.println(matcher.group());
}
The entire input string "djswxaeqobdnoa" is printed instead of just "dnoa". Why? How can I match the smallest substring?
Here is a solution:
String shortest = null;
Pattern pattern = Pattern.compile("(?=(d.*?o.*?a))");
Matcher matcher = pattern.matcher("ondjswxaeqobdnoajezbpfrehanxi");
while (matcher.find()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
if (shortest == null || matcher.group(i).length() < shortest.length()) {
shortest = matcher.group(i);
}
}
}
djswxaeqobdnoa
d....*..o..*.a
That's one match of your regular expression consuming the full String.
You are matching the whole String, hence the whole String is returned by your group invocation.
If you want specific matches of each segment of your Pattern, you only need to group those segments.
For instance:
Pattern pattern = Pattern.compile("(d.*?)(o.*?)a");
Matcher matcher = pattern.matcher("djswxaeqobdnoa");
while (matcher.find()) {
System.out.println(matcher.group());
// specific groups are 1-indexed
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
Output
djswxaeqobdnoa
djswxaeq
obdno
Your regex is d.*?o.*?a and the string you are matching against is djswxaeqobdnoa.
The regex engine reports the leftmost match, so matching starts at the first d. From there, the first .*? extends only as far as the nearest o, and the second .*? extends from that o only as far as the nearest following a, which here is the last character. Laziness shortens where a match ends, not where it starts, so the match covers the whole string.
Thanks, with this code it works:
String shortest = null;
Pattern pattern = Pattern.compile("(?=(d.*?o.*?a))");
Matcher matcher = pattern.matcher("ondjswxaeqobdnoajezbpfrehanxi");
while (matcher.find()) {
    for (int i = 1; i <= matcher.groupCount(); i++) {
        if (shortest == null || matcher.group(i).length() < shortest.length()) {
            shortest = matcher.group(i);
        }
    }
}
Which is better?
Let's say I have a string which contains this:
HelloxxxHelloxxxHello
I compile a pattern to look for 'Hello'
Pattern pattern = Pattern.compile("Hello");
Matcher matcher = pattern.matcher("HelloxxxHelloxxxHello");
It should find three matches. How can I get a count of how many matches there were?
I've tried various loops and using the matcher.groupCount() but it didn't work.
matcher.find() does not find all matches, only the next match.
Solution for Java 9+
long matches = matcher.results().count();
Solution for Java 8 and older
You'll have to do the following. (Starting from Java 9, there is a nicer solution)
int count = 0;
while (matcher.find())
count++;
Btw, matcher.groupCount() is something completely different.
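To illustrate the difference: groupCount() reports how many capturing groups the pattern defines, regardless of the input, whereas the number of matches has to be counted by calling find() repeatedly. A small sketch follows; the class name GroupCountDemo and the helper countMatches are hypothetical names used only here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // Counts non-overlapping matches by advancing the matcher until find() fails.
    static int countMatches(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "HelloxxxHelloxxxHello";

        Pattern plain = Pattern.compile("Hello");
        System.out.println(plain.matcher(input).groupCount()); // 0: no capturing groups
        System.out.println(countMatches(plain, input));        // 3

        Pattern grouped = Pattern.compile("(H)(ello)");
        System.out.println(grouped.matcher(input).groupCount()); // 2: two capturing groups
        System.out.println(countMatches(grouped, input));        // 3
    }
}
```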
Complete example:
import java.util.regex.*;
class Test {
public static void main(String[] args) {
String hello = "HelloxxxHelloxxxHello";
Pattern pattern = Pattern.compile("Hello");
Matcher matcher = pattern.matcher(hello);
int count = 0;
while (matcher.find())
count++;
System.out.println(count); // prints 3
}
}
Handling overlapping matches
When counting matches of aa in aaaa the above snippet will give you 2.
aaaa
aa
  aa
To get 3 matches, i.e. this behavior:
aaaa
aa
 aa
  aa
You have to search for a match at index <start of last match> + 1 as follows:
String hello = "aaaa";
Pattern pattern = Pattern.compile("aa");
Matcher matcher = pattern.matcher(hello);
int count = 0;
int i = 0;
while (matcher.find(i)) {
count++;
i = matcher.start() + 1;
}
System.out.println(count); // prints 3
This should work for matches that might overlap:
public static void main(String[] args) {
String input = "aaaaaaaa";
String regex = "aa";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
int from = 0;
int count = 0;
while(matcher.find(from)) {
count++;
from = matcher.start() + 1;
}
System.out.println(count);
}
From Java 9, you can use the stream provided by Matcher.results()
long matches = matcher.results().count();
If you want to use Java 8 streams and are allergic to while loops, you could try this:
public static int countPattern(String references, Pattern referencePattern) {
Matcher matcher = referencePattern.matcher(references);
return Stream.iterate(0, i -> i + 1)
.filter(i -> !matcher.find())
.findFirst()
.get();
}
Disclaimer: this only works for disjoint matches.
Example:
public static void main(String[] args) throws ParseException {
Pattern referencePattern = Pattern.compile("PASSENGER:\\d+");
System.out.println(countPattern("[ \"PASSENGER:1\", \"PASSENGER:2\", \"AIR:1\", \"AIR:2\", \"FOP:2\" ]", referencePattern));
System.out.println(countPattern("[ \"AIR:1\", \"AIR:2\", \"FOP:2\" ]", referencePattern));
System.out.println(countPattern("[ \"AIR:1\", \"AIR:2\", \"FOP:2\", \"PASSENGER:1\" ]", referencePattern));
System.out.println(countPattern("[ ]", referencePattern));
}
This prints out:
2
0
1
0
This is a solution for non-disjoint (overlapping) matches with streams:
public static int countPattern(String references, Pattern referencePattern) {
    return StreamSupport.stream(Spliterators.spliteratorUnknownSize(
            new Iterator<Integer>() {
                Matcher matcher = referencePattern.matcher(references);
                int from = 0;
                @Override
                public boolean hasNext() {
                    return matcher.find(from);
                }
                @Override
                public Integer next() {
                    // restart one character after the last match start so overlaps are counted
                    from = matcher.start() + 1;
                    return 1;
                }
            },
            Spliterator.IMMUTABLE), false).reduce(0, (a, c) -> a + c);
}
Use the code below to count the number of matches the regex finds in your input:
Pattern p = Pattern.compile(regex, Pattern.MULTILINE | Pattern.DOTALL); // "regex" is your predefined regex
Matcher m = p.matcher(pattern); // "pattern" is the string to match the regex against
int count = 0;
boolean b = m.matches();
if (b)
    count++;
while (m.find())
    count++;
This is generalized code rather than a specific solution, so tailor it to suit your needs.
Please feel free to correct me if there is any mistake.