I need a regex that can match lines like these:
1234 <CIRCLE> 12 12 12 </CIRCLE>
1234 <RECTANGLE> 12 12 12 12 </RECTANGLE>
I came up with this regex:
(\\d+?) <([A-Z]+?)> (\\d+?) (\\d+?) (\\d+?) (\\d*)? (</[A-Z]+?>)
It works fine when I'm trying to match the rectangle, but it doesn't work for the circle.
The problem is that my fifth group is not capturing, although it should.
Try
(\\d+?) <([A-Z]+?)> (\\d+?) (\\d+?) (\\d+?) (\\d+ )?(</[A-Z]+?>)
(I changed the last "\d" group to make the space optional too.)
That is because only the (\\d*)? part is optional; the spaces before and after it are still mandatory. When the last (\\d*) matches nothing, the regex therefore requires two consecutive spaces before the closing tag. Try something like
(\\d+?) <([A-Z]+?)> (?:(\\d+?) ){3,4}(</[A-Z]+?>)
Also, if you want to make sure that the closing tag is the same as the opening one, you can use a backreference: \\1 refers to whatever the first group matched, \\2 to the second, and so on. So you could update your regex to something like
(\\d+?) <([A-Z]+?)> (?:(\\d+?) ){3,4}(</\\2>)
//       ^^^^^^^^^----------------------^^^
// group 2 captures the tag name on the left; \\2 on the right must match the same text
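As a quick check of the backreference idea, here is a minimal sketch. The class name `ShapeLineDemo` and the helper `describe` are made up for illustration, and the pattern is a simplified variant: it uses the non-capturing form `(?: ...)`, greedy quantifiers, and one extra group around the repeated numbers so they can be read back in one piece.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ShapeLineDemo {
    // Group 1: leading id, group 2: tag name, group 3: the 3 or 4 numbers.
    // \\2 forces the closing tag to repeat the text captured by group 2.
    private static final Pattern SHAPE =
            Pattern.compile("(\\d+) <([A-Z]+)> ((?:\\d+ ){3,4})</\\2>");

    /** Returns "id|TAG|numbers" for a matching line, or null if it does not match. */
    static String describe(String line) {
        Matcher m = SHAPE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return m.group(1) + "|" + m.group(2) + "|" + m.group(3).trim();
    }

    public static void main(String[] args) {
        System.out.println(describe("1234 <CIRCLE> 12 12 12 </CIRCLE>"));          // 1234|CIRCLE|12 12 12
        System.out.println(describe("1234 <RECTANGLE> 12 12 12 12 </RECTANGLE>")); // 1234|RECTANGLE|12 12 12 12
        System.out.println(describe("1234 <CIRCLE> 12 12 12 </RECTANGLE>"));       // null
    }
}
```

The last call returns null because the closing tag differs from the opening one, which is exactly what the backreference is meant to catch.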
Solution for just the numbers:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.annotation.Nonnull; // JSR-305 annotation; not part of the JDK, needs an extra dependency
public class Q26005150
{
private static final Pattern P = Pattern.compile("(\\d+)");
public static void main(String[] args)
{
final String s1 = "1234 <CIRCLE> 12 12 12 </CIRCLE>";
final String s2 = "1234 <RECTANGLE> 12 12 12 12 </RECTANGLE>";
final List<Integer> l1 = getAllMatches(s1);
final List<Integer> l2 = getAllMatches(s2);
System.out.println("l1 = " + l1);
System.out.println("l2 = " + l2);
}
private static List<Integer> getAllMatches(@Nonnull final String s)
{
final Matcher m = P.matcher(s);
final List<Integer> matches = new ArrayList<Integer>();
while(m.find())
{
matches.add(Integer.valueOf(m.group(1)));
}
return matches;
}
}
Outputs:
l1 = [1234, 12, 12, 12]
l2 = [1234, 12, 12, 12, 12]
Solution for the Numbers and the Tags
private static final Pattern P = Pattern.compile("(</?(\\w+)>|(\\d+))");
public static void main(String[] args)
{
final String s1 = "1234 <CIRCLE> 12 12 12 </CIRCLE>";
final String s2 = "1234 <RECTANGLE> 12 12 12 12 </RECTANGLE>";
final List<String> l1 = getAllMatches(s1);
final List<String> l2 = getAllMatches(s2);
System.out.println("l1 = " + l1);
System.out.println("l2 = " + l2);
}
private static List<String> getAllMatches(@Nonnull final String s)
{
final Matcher m = P.matcher(s);
final List<String> matches = new ArrayList<String>();
while(m.find())
{
final String match = m.group(1);
matches.add(match);
}
return matches;
}
Outputs:
l1 = [1234, <CIRCLE>, 12, 12, 12, </CIRCLE>]
l2 = [1234, <RECTANGLE>, 12, 12, 12, 12, </RECTANGLE>]
Assuming the labels between "<" and ">" have to match and the numbers in between are identical,
use this pattern:
^\d+\s<([A-Z]+)>\s(\d+\s)(\2)+<\/(\1)>$
Or, if the numbers in the middle do not have to be identical and/or are optional:
^\d+\s<([A-Z]+)>\s(\d+\s)*<\/(\1)>$
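The two patterns above can be compared side by side in Java with a small sketch. The class and method names are hypothetical, and the `\/` escape is dropped because a forward slash needs no escaping in a Java regex.

```java
import java.util.regex.Pattern;

class TagLinePatterns {
    // All numbers between the tags must be identical: \\2 repeats the first "number + space".
    private static final Pattern IDENTICAL =
            Pattern.compile("^\\d+\\s<([A-Z]+)>\\s(\\d+\\s)(\\2)+</(\\1)>$");

    // Any numbers (including none) between the tags; only the tag names must agree.
    private static final Pattern ANY =
            Pattern.compile("^\\d+\\s<([A-Z]+)>\\s(\\d+\\s)*</(\\1)>$");

    static boolean identicalNumbers(String line) {
        return IDENTICAL.matcher(line).matches();
    }

    static boolean anyNumbers(String line) {
        return ANY.matcher(line).matches();
    }

    public static void main(String[] args) {
        String same = "1234 <CIRCLE> 12 12 12 </CIRCLE>";
        String mixed = "1234 <CIRCLE> 12 13 12 </CIRCLE>";
        System.out.println(identicalNumbers(same));  // true
        System.out.println(identicalNumbers(mixed)); // false
        System.out.println(anyNumbers(mixed));       // true
    }
}
```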
I need your help for one tricky problem.
I have one map like below
Map<Integer, String> ruleMap = new HashMap<>();
ruleMap.put(1, "A");
ruleMap.put(12, "Test - 12");
ruleMap.put(1012, "Test Metadata 12 A");
and I have one rule in which I am separating ids and collecting them into a list. By iterating over the list, I am replacing its ids by its respective value from map.
code :
String rule = "1 AND 12 OR 1012 AND 12 or 1012";
List<String> idList = new ArrayList<>();
Pattern p = Pattern.compile("[0-9]+");
Matcher m = p.matcher(rule);
while (m.find()) {
idList.add(m.group());
}
for (String id: idList) {
if (ObjectUtils.isNonEmpty(ruleMap.get(Integer.parseInt(id)))) {
rule = rule.replaceFirst(id, ruleMap.get(Integer.parseInt(id)));
}
}
System.out.println(rule);
I am getting output like this -
A AND Test - Test - 12 OR Test Metadata 12 A AND 12 or Test Metadata 12 A
As you can see, the first occurrence of id 12 was replaced with its value, but on the second occurrence of id 12 the already inserted value Test - 12 was changed to Test - Test - 12.
Can anyone help me with this?
You need to do the regex replacement with proper word boundaries. I suggest using a regex iterator (as you were already doing), but with the pattern \b\d+\b so that only whole numbers match. Then, at each match, look the number up in the rule map and, if it is present, use its value as the replacement. Use Matcher#appendReplacement to build the output string as you go.
Map<String, String> ruleMap = new HashMap<>();
ruleMap.put("1", "A");
ruleMap.put("12", "Test - 12");
ruleMap.put("1012", "Test Metadata 12 A");
String rule = "1 AND 12 OR 1012 AND 12 or 1012";
Pattern pattern = Pattern.compile("\\b\\d+\\b");
Matcher m = pattern.matcher(rule);
StringBuffer buffer = new StringBuffer();
while(m.find()) {
if (ruleMap.containsKey(m.group(0))) {
m.appendReplacement(buffer, ruleMap.get(m.group(0)));
}
}
m.appendTail(buffer);
System.out.println(rule + "\n" + buffer.toString());
This prints:
1 AND 12 OR 1012 AND 12 or 1012
A AND Test - 12 OR Test Metadata 12 A AND Test - 12 or Test Metadata 12 A
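One caveat worth sketching: `appendReplacement` treats `$` and `\` in the replacement string specially, so a map value such as "Cost $12" would be read as a group reference. The variant below wraps the value in `Matcher.quoteReplacement`. The class name `RuleExpander`, the method `expand`, and the sample values are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RuleExpander {
    private static final Pattern ID = Pattern.compile("\\b\\d+\\b");

    /** Replaces every whole-number id that has an entry in ruleMap; other ids stay as they are. */
    static String expand(String rule, Map<String, String> ruleMap) {
        Matcher m = ID.matcher(rule);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = ruleMap.get(m.group());
            if (value != null) {
                // quoteReplacement makes '$' and '\' in the value literal
                m.appendReplacement(out, Matcher.quoteReplacement(value));
            }
        }
        m.appendTail(out); // copies everything after the last replaced match
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> ruleMap = new HashMap<>();
        ruleMap.put("1", "A");
        ruleMap.put("12", "Cost $12");
        System.out.println(expand("1 AND 12 OR 99", ruleMap)); // A AND Cost $12 OR 99
    }
}
```

Because the input is scanned once from left to right, text that was already inserted is never matched again, which avoids the "Test - Test - 12" effect from the question.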
If you happen to use Java 9 or higher you could also use Matcher#replaceAll:
Map<Integer, String> ruleMap = new HashMap<>();
ruleMap.put(1, "A");
ruleMap.put(12, "Test - 12");
ruleMap.put(1012, "Test Metadata 12 A");
String rule = "1 AND 12 OR 1012 AND 12 or 1012";
rule = Pattern.compile("\\b\\d+\\b")
.matcher(rule)
.replaceAll(m -> ruleMap.getOrDefault(
Integer.parseInt(m.group()), m.group()));
System.out.println(rule);
I want to split the string (1,0) and get the results 1 and 0. I have tried this code:
String str ="(1,0)";
String parts[]= str.split("(,)");
System.out.println(parts[0]);
System.out.println(parts[1]);
But I got this:
(1
0)
Here's a way to isolate all the numbers using java.util.regex and put them into an ArrayList for easy use. It doesn't use the .split() method; instead it finds each run of digits directly.
import java.util.ArrayList;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Main {
public static void main(String[] args) {
String str = "(1,0)";
Pattern p = Pattern.compile("\\d+"); // one or more digits
Matcher m = p.matcher(str);
ArrayList<Integer> vals = new ArrayList<>();
while(m.find())
vals.add(Integer.parseInt(m.group()));
System.out.println(vals); // [1, 0]
}
}
A simple solution would be as follows:
public class Main {
public static void main(String[] args) {
String str = "(1,0)";
String parts[] = str.replace("(", "").replace(")", "").split(",");
System.out.println(parts[0]);
System.out.println(parts[1]);
}
}
Output:
1
0
If you split on (, the first value in the returned array will be "".
I'd recommend using regex directly, e.g.
String input = "(1,0)";
Matcher m = Pattern.compile("\\(([^,\\)]+),([^,\\)]+)\\)").matcher(input);
if (! m.matches())
throw new IllegalArgumentException("Invalid input: " + input);
System.out.println(m.group(1));
System.out.println(m.group(2));
Of course, if you insist on using split(), it can be done like this:
String input = "(1,0)";
String[] parts = input.split("[(,)]");
if (parts.length != 3 || ! parts[0].isEmpty())
throw new IllegalArgumentException("Invalid input: " + input);
System.out.println(parts[1]);
System.out.println(parts[2]);
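To see why the original attempt kept the parentheses, a small sketch comparing the two split patterns may help. In a regex, "(,)" is a capturing group around a comma, so it splits on the comma only, whereas "[(,)]" is a character class matching any one of the three characters. The class name `SplitDemo` and the helper `splitOn` are made up for illustration.

```java
import java.util.Arrays;

class SplitDemo {
    /** Thin wrapper around String.split so the two patterns can be compared. */
    static String[] splitOn(String input, String regex) {
        return input.split(regex);
    }

    public static void main(String[] args) {
        String input = "(1,0)";
        // Capturing group around ',': the parentheses are regex syntax, not literals.
        System.out.println(Arrays.toString(splitOn(input, "(,)")));   // [(1, 0)]
        // Character class: '(' at index 0 produces a leading empty string,
        // while the trailing empty string after ')' is dropped by split.
        System.out.println(Arrays.toString(splitOn(input, "[(,)]"))); // [, 1, 0]
    }
}
```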
If you know how to use regex, go for that. (Personally, I prefer plain string manipulation here because it is simpler.) If not, learn how to use it or do something like this:
String input = "(64,128)";
String[] numbers = input.substring(1, input.length() - 1).split(",");
Try this, assuming your format is consistent.
String str = "(1,0)";
String[] tokens = str.substring(1,str.length()-1).split(",");
System.out.println(Arrays.toString(tokens));
Prints
[1, 0]
or if printed separately
1
0
I have this string:
values="[72, 216, 930],[250],[72],[228, 1539],[12]";
I am trying to combine two patterns in order to get the last number in the first type of [] (several numbers) and the number in the second type of [] (a single number).
pattern="\\, ([0-9]+)\\]|\\[([0-9]+)\\]"
But it outputs null for some of the matches:
930, null, null, 1539, null
How do I solve this problem?
Here, we might not need a left boundary at all: we can anchor on the ] to the right and collect the digits immediately before it, with an expression like this:
([0-9]+)\]
If you like, we can also bound it from the left, similar to this expression:
([\[\s,])([0-9]+)(\])
Try this.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = ", ([0-9]+)]";
final String string = "[72, 216, 930],[250],[72],[228, 1539],[12]";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
Output:
Full match: , 930]
Group 1: 930
Full match: , 1539]
Group 1: 1539
package Sample;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class StackOverFlow{
final static String regex = "\\d*]";
final static String string = "[72, 216, 930],[250],[72],[228, 1539],[12]";
final static Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final static Matcher matcher = pattern.matcher(string);
public static void main(String[] args) {
while (matcher.find()) {
String val = matcher.group(0).replace("]", "");
System.out.println(val);
}
}
}
Output:
930
250
72
1539
12
To make sure that the data is actually in between square brackets, you could use a capturing group, start the match with [ and end the match with ]
\[(?:\d+,\h+)*(\d+)]
In Java
\\[(?:\\d+,\\h+)*(\\d+)]
\[ Match [
(?:\d+,\h+)* Repeat 0+ times matching 1+ digit, comma and 1+ horizontal whitespace chars
(\d+) Capture in group 1 matching 1+ digit
] Match closing square bracket
For example:
String regex = "\\[(?:\\d+,\\h+)*(\\d+)]";
String string = "[72, 216, 930],[250],[72],[228, 1539],[12]";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Result:
930
250
72
1539
12
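As for the null values in the question: with an alternation, each match fills only the capturing group of the alternative that actually matched, and the other group is null. A minimal sketch, keeping the question's two-alternative pattern and picking whichever group is set (the class name `LastInBrackets` and the method `lastNumbers` are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastInBrackets {
    // Alternative 1 fills group 1 (", 930]"); alternative 2 fills group 2 ("[250]").
    private static final Pattern P =
            Pattern.compile(", ([0-9]+)\\]|\\[([0-9]+)\\]");

    static List<String> lastNumbers(String values) {
        List<String> result = new ArrayList<>();
        Matcher m = P.matcher(values);
        while (m.find()) {
            // Exactly one of the two groups took part in this match.
            result.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(lastNumbers("[72, 216, 930],[250],[72],[228, 1539],[12]"));
        // [930, 250, 72, 1539, 12]
    }
}
```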
It seems like for a structure such as this, it's likely beneficial to parse the whole thing into memory, then index into the elements you're particularly interested in to your heart's content. Should the structure change unexpectedly/dynamically, you won't need to rewrite your regex, just index as needed as many times as you wish:
import java.util.*;
class Main {
public static void main(String[] args) {
String values = "[72, 216, 930],[250],[72],[228, 1539],[12]";
String[] data = values.substring(1, values.length() - 1).split("\\]\\s*,\\s*\\[");
ArrayList<String[]> result = new ArrayList<>();
for (String d : data) {
result.add(d.split("\\s*,\\s*"));
}
System.out.println(result.get(0)[result.get(0).length-1]); // => 930
System.out.println(result.get(1)[0]); // => 250
}
}
This is my first time using regex.
I have a Java regex that splits a String on a pattern consisting of a set of characters.
String line = "F01T8B02S00003H04Z05C0.12500";
Pattern pattern = Pattern.compile("([BCFHSTZ])");
String[] commands = pattern.split(line);
for (String command : commands) {
System.out.print(command);
}
The output of the above code is (018020000304050.12500).
What I actually want is output like this: ("F", "01", "T", "8", "B", "02", "S", "00003", "H", "04", "Z", "05", "C", "0.12500").
In other words, the desired output contains both the pattern characters and the split values.
Can you please suggest how to do this?
I've created a sample, try it and let me know if it's what you want.
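import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {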
public static void main(String[] args) {
String line = "F01T8B02S00003H04Z05C0.12500";
String pattern = "([A-Z][a-z]*)(((?=[A-Z][a-z]*|$))|\\d+(\\.\\d+)?)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
HashMap<String, String> mHash = new LinkedHashMap<>();
while (m.find()) {
mHash.put(m.group(1), m.group(2));
}
System.out.println(mHash.toString());
}
}
Output is:
F 01
T 8
B 02
S 00003
H 04
Z 05
C 0.12500
Edit with LinkedHashMap
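import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {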
public static void main(String[] args) {
String line = "F01T8B02S00003H04Z05C0.12500";
String pattern = "([A-Z][a-z]*)(((?=[A-Z][a-z]*|$))|\\d+(\\.\\d+)?)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
HashMap<String, String> mHash = new LinkedHashMap<>();
while (m.find()) {
mHash.put(m.group(1), m.group(2));
}
System.out.println(mHash.toString());
}
}
Output is :
{F=01, T=8, B=02, S=00003, H=04, Z=05, C=0.12500}
You can use String#split with lookarounds on [A-Z], which keeps each delimiter as a separate item:
String line = "F01T8B02S00003H04Z05C0.12500";
String[] result = line.split("((?<=[A-Z])|(?=[A-Z]))");
System.out.println(java.util.Arrays.toString(result));
Which will result in the String-array:
[F, 01, T, 8, B, 02, S, 00003, H, 04, Z, 05, C, 0.12500]
All of the above answers were correct.
I also got a solution with the code below.
String line = "F01T8B02S00003H04Z05C0.12500A03";
String[] commands = line.split("(?<=[BCFHSTZ])|(?=[BCFHSTZ])");
for (String str: commands) {
System.out.print(str);
}
Thanks all for help.
I have a file which contains the following string data
...
v -0.570000 -0.950000 -0.100000
v 0.570000 -0.950000 -0.100000
v -0.570000 -0.760000 -0.100000
v 0.570000 -0.760000 -0.100000
...
f 34 22
f 3 35 3
f 345 22
f 55 632 76
f 55 632
....
From this file I want to extract all the numbers from the lines starting with 'v' and 'f'. I have written the following regex for it.
v(?:\s([0-9\-\.]+))+
Output:
group 1: -0.100000
f(?:\s([0-9]+))+
Output:
group 1: 22
But as you can see the output is only extracting the last numbers from each line, I want the output as follows:
Output:
group 1: -0.570000
group 2: -0.950000
group 3: -0.100000
Output:
group 1: 34
group 2: 22
Can someone please help me here?
Here is the solution I finally used, in case someone needs it.
I used two regexes (one to find the lines, one to find the values in each line) instead of the string splitting suggested in the comments.
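import java.util.LinkedList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class ObjParser {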
private static final Pattern vertexLinePattern = Pattern.compile("^v\\s(.+)", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
private static final Pattern faceLinePattern = Pattern.compile("^f\\s(.+)", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
private static final Pattern vertexValuePattern = Pattern.compile("([0-9\\-\\.]+)");
private static final Pattern faceValuePattern = Pattern.compile("([0-9]+)");
private List<Float> vertices = new LinkedList<Float>();
private List<Short> faces = new LinkedList<Short>();
public void parseVertices(String data) {
Matcher matcher1 = vertexLinePattern.matcher(data);
while(matcher1.find()) {
String line = matcher1.group(1);
Matcher matcher2 = vertexValuePattern.matcher(line);
while(matcher2.find()) {
vertices.add(Float.parseFloat(matcher2.group(1)));
}
}
}
public void parseFaces(String data) {
Matcher matcher1 = faceLinePattern.matcher(data);
while(matcher1.find()) {
String line = matcher1.group(1);
Matcher matcher2 = faceValuePattern.matcher(line);
while(matcher2.find()) {
short no = (short)(Integer.parseInt(matcher2.group(1)) - 1); // -1 due to 1 based index in OBJ files
faces.add(no);
}
}
}
public List<Float> getVertices() {
return vertices;
}
public List<Short> getFaces() {
return faces;
}
}
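The behaviour described in the question comes from how repeated capturing groups work: when a group sits inside a repetition, only its last iteration is kept. The sketch below shows that on a single face line and contrasts it with a `find()` loop, which is the same idea the two-regex solution relies on. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    private static final Pattern REPEATED = Pattern.compile("f(?:\\s([0-9]+))+");
    private static final Pattern NUMBER = Pattern.compile("[0-9]+");

    /** Group 1 of the repeated pattern: only the last iteration survives. */
    static String lastCapture(String line) {
        Matcher m = REPEATED.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    /** Collects every number on the line by calling find() repeatedly. */
    static List<Integer> allNumbers(String line) {
        List<Integer> out = new ArrayList<>();
        Matcher m = NUMBER.matcher(line);
        while (m.find()) {
            out.add(Integer.parseInt(m.group()));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lastCapture("f 34 22")); // 22
        System.out.println(allNumbers("f 34 22"));  // [34, 22]
    }
}
```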