I want to parse a range of data (e.g. 100-2000) in Java. Is this code correct:
String patternStr = "^(\\\\d+)-(\\\\d+)$";
Pattern pattern = Pattern.compile(patternStr);
Matcher matcher = pattern.matcher(inputStr);
if(matcher.find()){
// Doing some parser
}
Too many backslashes, and you can use matches() without anchors (^$).
String inputStr = "100-2000";
String patternStr = "(\\d+)-(\\d+)";
Pattern pattern = Pattern.compile(patternStr);
Matcher matcher = pattern.matcher(inputStr);
if (matcher.matches()) {
System.out.println(matcher.group(1) + " - " + matcher.group(2));
}
As for your question "Is this code correct", all you had to do was wrap the code in a class with a main method and run it, and you'd get the answer: No.
No, you're double (well, quadruple)-escaping the digits.
It should be: "^(\\d+)-(\\d+)$".
Meaning:
Start of input: ^
Group 1: 1+ digit(s): (\\d+)
Hyphen literal: -
Group 2: 1+ digit(s): (\\d+)
End of input: $
Notes
The groups are useful for back-references and for reading the matched parts with group(n). Here you're using neither, so you can drop the parentheses around the \\d+ expressions.
You are parsing the representation of a range in this example.
If you want an actual character range in a regex, you can use the [min-max] character-class idiom, where "min" and "max" are single characters, for instance [0-9]. It matches one character, not a multi-digit number.
As mentioned by Andreas, you can use String.matches without the Pattern-Matcher idiom and the ^ and $, if you want to match the whole input.
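To make the notes above concrete, here is a minimal sketch of parsing the range with matches() and no anchors. The class name RangeParser and the parse method are hypothetical, and returning null for non-matching input is just one possible design choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RangeParser {
    // Compiled once. matches() below must consume the whole input,
    // so no ^ or $ is needed in the pattern.
    private static final Pattern RANGE = Pattern.compile("(\\d+)-(\\d+)");

    /** Returns {min, max}, or null if the input is not of the form digits-digits. */
    static int[] parse(String input) {
        Matcher m = RANGE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        // Integer.parseInt throws NumberFormatException if a number exceeds the int range.
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    public static void main(String[] args) {
        int[] range = parse("100-2000");
        System.out.println(range[0] + " to " + range[1]); // 100 to 2000

        // String.matches is enough when only a yes/no answer is needed.
        System.out.println("100-2000".matches("\\d+-\\d+")); // true
    }
}
```

If you only need validation, the String.matches call on the last line is sufficient; the Pattern and Matcher pair is worth it when you need the captured numbers.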
Related
I need to write a pattern to remove the currency symbol and commas, e.g. from Fr.-145,000.01
After matching, the pattern should return -145000.01.
The pattern I am using:
^[^0-9\\-]*([0-9\\-\\.\\,]*?)[^0-9\\-]*$
This will return -145,000.01
Then I remove the comma to get -145000.01. Is it possible to change the pattern so that I get -145000.01 directly?
String pattern = "^[^0-9\\-]*([0-9\\-\\.\\,]*?)[^0-9\\-]*$";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(str);
if(m.matches()) {
System.out.println(m.group(1));
}
I expect the output to have the comma removed.
You can simplify it with String.replaceAll() and a simpler regex (provided you expect the input to be reasonably sane, i.e. without multiple decimal points embedded in the numbers or multiple negative signs)
String str = "Fr.-145,000.01";
String cleaned = str.replaceAll("[^\\d-.]\\.?", ""); // Strings are immutable, so keep the result: -145000.01
If you are going down this route, I would sanity check it by parsing the output with BigDecimal or Double.
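A minimal sketch of that sanity check, assuming BigDecimal is the target type. The class name AmountCleaner is hypothetical, and the hyphen is moved to the end of the character class here so it cannot be read as a range; otherwise the regex is the one above.

```java
import java.math.BigDecimal;

class AmountCleaner {
    /**
     * Strips everything that is not a digit, '.' or '-' (plus a dot that directly
     * follows such a character, as in "Fr."), then lets BigDecimal validate the rest.
     */
    static BigDecimal clean(String raw) {
        String digits = raw.replaceAll("[^\\d.-]\\.?", "");
        // The BigDecimal constructor throws NumberFormatException for
        // leftovers such as "1.2.3" or "--5".
        return new BigDecimal(digits);
    }

    public static void main(String[] args) {
        System.out.println(clean("Fr.-145,000.01")); // -145000.01
        try {
            clean("Fr.1.2.3");
        } catch (NumberFormatException e) {
            System.out.println("rejected malformed input");
        }
    }
}
```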
One approach would be to just collect our desired digits, ., + and - in a capturing group followed by an optional comma, and then join them:
([+-]?[0-9][0-9.]+),?
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "([+-]?[0-9][0-9.]+),?";
final String string = "Fr.-145,000.01\n"
+ "Fr.-145,000\n"
+ "Fr.-145,000,000\n"
+ "Fr.-145\n"
+ "Fr.+145,000.01\n"
+ "Fr.+145,000\n"
+ "Fr.145,000,000\n"
+ "Fr.145\n"
+ "Fr.145,000,000,000.01";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
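The test above only prints each match. A minimal sketch of the joining step could look like this; the class name GroupJoiner and the join method are hypothetical. Note that this pattern needs at least two characters per piece, so a single-digit amount would not be picked up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupJoiner {
    private static final Pattern PART = Pattern.compile("([+-]?[0-9][0-9.]+),?");

    /** Concatenates group 1 of every match, which drops the commas between them. */
    static String join(String input) {
        Matcher m = PART.matcher(input);
        StringBuilder joined = new StringBuilder();
        while (m.find()) {
            joined.append(m.group(1));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        System.out.println(join("Fr.-145,000.01"));  // -145000.01
        System.out.println(join("Fr.145,000,000"));  // 145000000
    }
}
```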
Demo
String str = "Fr.-145,000.01";
Pattern regex = Pattern.compile("^[^0-9-]*(-?[0-9]+)(?:,([0-9]{3}))?(?:,([0-9]{3}))?(?:,([0-9]{3}))?(\\.[0-9]+)?[^0-9-]*$");
Matcher matcher = regex.matcher(str);
System.out.println(matcher.replaceAll("$1$2$3$4$5"));
Output:
-145000.01
It looks for a number with up to 3 commas (up to 999,999,999,999.99) and replaces the whole input with just the sign, digits and decimal part.
My approach would be to remove all the unnecessary parts using replaceAll.
The unnecessary parts are, apparently:
Any sequence which is not digits or minus at the beginning of the string.
Commas
The first pattern is represented by ^[^\\d-]+. The second is merely ,.
Put them together with an |:
Pattern p = Pattern.compile("(^[^\\d-]+)|,");
Matcher m = p.matcher(str);
String result = m.replaceAll("");
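For completeness, a runnable version of the snippet above, as a sketch. The class name PrefixAndCommaStripper is hypothetical. One thing to keep in mind: only a leading non-numeric prefix is removed, so trailing text such as a currency code would survive.

```java
import java.util.regex.Pattern;

class PrefixAndCommaStripper {
    // Either a leading run of characters that are not digits or '-', or any comma.
    private static final Pattern JUNK = Pattern.compile("(^[^\\d-]+)|,");

    static String strip(String s) {
        return JUNK.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("Fr.-145,000.01")); // -145000.01
        System.out.println(strip("145,000 CHF"));    // 145000 CHF (suffix is kept)
    }
}
```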
You could use 2 capturing groups and make use of repeated matching with the \G anchor, which asserts the position at the end of the previous match.
(?:^[^0-9+-]+(?=[.+,\d-]*\.\d+$)([+-]?\d{1,3})|\G(?!^)),(\d{3})
In Java
String regex = "(?:^[^0-9+-]+(?=[.+,\\d-]*\\.\\d+$)([+-]?\\d{1,3})|\\G(?!^)),(\\d{3})";
Explanation
(?: Non capturing group
^[^0-9+-]+ Match 1+ times not a digit, + or -
(?= Positive lookahead, assert that what follows is:
[.+,\d-]*\.\d+$ Match 0+ times what is allowed and assert ending on . and 1+ digits
) Close positive lookahead
( Capturing group 1
[+-]?\d{1,3}) Match optional + or - followed by 1-3 digits
| Or
\G(?!^) Assert position at the end of the previous match, not at the start
), Close the non capturing group and match ,
(\d{3}) Capture in group 2 matching 3 digits
In the replacement use the 2 capturing groups $1$2
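A minimal sketch of how this could be wired up with replaceAll; the class name ContinuedMatch and the normalize method are hypothetical. In Java, a group that did not participate in a match contributes nothing to the replacement, which is why $1$2 also works for the \G continuation matches. Input without a decimal part fails the lookahead and is returned unchanged.

```java
import java.util.regex.Pattern;

class ContinuedMatch {
    private static final Pattern P = Pattern.compile(
            "(?:^[^0-9+-]+(?=[.+,\\d-]*\\.\\d+$)([+-]?\\d{1,3})|\\G(?!^)),(\\d{3})");

    /** Drops the leading text and the thousands separators in one pass. */
    static String normalize(String s) {
        // First match: prefix + sign + leading digits + ",ddd" -> "$1$2".
        // Following matches start at \G and only consume further ",ddd" blocks.
        return P.matcher(s).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        System.out.println(normalize("Fr.-145,000.01"));     // -145000.01
        System.out.println(normalize("Fr.-145,000,000.01")); // -145000000.01
    }
}
```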
I have this input text:
142d 000781fe0000326f BPD false 65535 FSK_75 FSK_75 -51.984 -48
I want to use regular expression to extract 000781fe0000326f and -51.984, so the output looks like this
000781fe0000326f-51.984
I can use [0-9]{5,7}(?:[a-z][a-z0-9_]*) and ([-]?\\d*\\.\\d+)(?![-+0-9\\.]) to extract 000781fe0000326f and -51.984, respectively.
Is there a way to ignore or exclude everything between 000781fe0000326f and -51.984, i.e. everything that would be captured by the non-greedy filler (.*?)?
String ref="[0-9]{5,7}(?:[a-z][a-z0-9_]*)_____([-]?\\d*\\.\\d+)(?![-+0-9\\.])";
Pattern p = Pattern.compile(ref,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(input);
while (m.find())
{
String all = m.group();
//list3.add(all);
}
For your example data you might use an alternation | to match either one of the regexes in your question and then concatenate the matches.
Note that in your regex you could write (?:[a-z][a-z0-9_]*) as [a-z][a-z0-9_]* and you don't have to escape the dot in a character class.
For example:
[0-9]{5,7}[a-z][a-z0-9_]*|-?\d*\.\d+(?![-+0-9.])
String regex = "[0-9]{5,7}[a-z][a-z0-9_]*|-?\\d*\\.\\d+(?![-+0-9.])";
String string = "142d 000781fe0000326f BPD false 65535 FSK_75 FSK_75 -51.984 -48";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
String result = "";
while (matcher.find()) {
result += matcher.group(0);
}
System.out.println(result); // 000781fe0000326f-51.984
There's no way to combine strings together like that in pure regex, but it's easy to create a group for the first match, a group for the second match, and then use m.group(1) + m.group(2) to concatenate the two groups together and create your desired combined string.
Also note that [0-9] simplifies to \d, a character set with only one token in it simplifies to just that token, [a-z0-9_] with the i flag simplifies to \w, and there's no need to escape a . inside a character set:
String input = "142d 000781fe0000326f BPD false 65535 FSK_75 FSK_75 -51.984 -48";
String ref="(\\d{5,7}(?:[a-z]\\w*)).*?((?:-?\\d*\\.\\d+)(?![-+\\d.]))";
Pattern p = Pattern.compile(ref,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(input);
while (m.find())
{
String all = m.group(1) + m.group(2);
System.out.println(all);
}
You cannot really ignore the words in between, but you can include them all in the match.
Something like this will include all of them:
[0-9]{5,7}(?:[a-z][a-z0-9_]*)[a-zA-Z0-9_ ]*([-]?\d*\.\d+)(?![-+0-9.])
But that is not what you want.
I think the best bet is either having 2 regular expressions and then combining the result, or splitting the string on spaces/tab characters and checking the 'n'th elements as required
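A minimal sketch of the splitting option, assuming the columns always appear in the same order. The class name FieldPicker is hypothetical, and the positions 1 and 7 are read off the sample line, not from any known format specification.

```java
class FieldPicker {
    /** Joins the whitespace-separated fields at two zero-based positions. */
    static String pick(String line, int first, int second) {
        // "\\s+" covers runs of spaces and tabs alike.
        String[] fields = line.trim().split("\\s+");
        return fields[first] + fields[second];
    }

    public static void main(String[] args) {
        String input = "142d 000781fe0000326f BPD false 65535 FSK_75 FSK_75 -51.984 -48";
        // Assumed layout: the id is field 1 and the decimal value is field 7.
        System.out.println(pick(input, 1, 7)); // 000781fe0000326f-51.984
    }
}
```

This avoids regex matching of the values altogether, at the cost of breaking if a column is ever added or removed.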
I have the Java string below.
String s = "City: [name:NYK][distance:1100] [name:CLT][distance:2300] [name:KTY][distance:3540] Price:";
Using the Matcher and Pattern classes from the java.util.regex package, I have to get the output string in the following format:
Output: [NYK:1100][CLT:2300][KTY:3540]
Can you suggest a RegEx pattern which can help me get the above output format?
You can use the regex \[name:([A-Z]+)\]\[distance:(\d+)\] with Pattern like this:
String regex = "\\[name:([A-Z]+)\\]\\[distance:(\\d+)\\]";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);
StringBuilder result = new StringBuilder();
while (matcher.find()) {
result.append("[");
result.append(matcher.group(1));
result.append(":");
result.append(matcher.group(2));
result.append("]");
}
System.out.println(result.toString());
Output
[NYK:1100][CLT:2300][KTY:3540]
The regex \[name:([A-Z]+)\]\[distance:(\d+)\] captures two groups: the first, in \[name:([A-Z]+)\], takes the uppercase letters after name:, and the second, in \[distance:(\d+)\], takes the digits after distance:.
Another solution, from @tradeJmark, is to use named groups with this regex:
String regex = "\\[name:(?<name>[A-Z]+)\\]\\[distance:(?<distance>\\d+)\\]";
So you can easily get the results of each group by the name of group instead of the index like this :
while (matcher.find()) {
result.append("[");
result.append(matcher.group("name"));
//----------------------------^^
result.append(":");
result.append(matcher.group("distance"));
//------------------------------^^
result.append("]");
}
If the format of the string is fixed, and you always have just 3 [...] groups inside to deal with, you may define a block that matches [name:...] and captures the 2 parts into separate groups and use a quite simple code with .replaceAll:
String s = "City: [name:NYK][distance:1100] [name:CLT][distance:2300] [name:KTY][distance:3540] Price:";
String matchingBlock = "\\s*\\[name:([A-Z]+)]\\[distance:(\\d+)]";
String res = s.replaceAll(String.format(".*%1$s%1$s%1$s.*", matchingBlock),
"[$1:$2][$3:$4][$5:$6]");
System.out.println(res); // [NYK:1100][CLT:2300][KTY:3540]
The block pattern matches:
\\s* - 0+ whitespaces
\\[name: - a literal [name: substring
([A-Z]+) - Group n capturing 1 or more uppercase ASCII chars (\\w+ can also be used)
]\\[distance: - a literal ][distance: substring
(\\d+) - Group m capturing 1 or more digits
] - a ] symbol.
In the .*%1$s%1$s%1$s.* pattern, the groups will have 1 to 6 IDs (referred to with $1 - $6 backreferences from the replacement pattern) and the leading and final .* will remove start and end of the string (add (?s) at the start of the pattern if the string can contain line breaks).
I have the following code to extract the string within double quotes using Regex.
String str ="\"Java\",\"programming\"";
final Pattern pattern = Pattern.compile("\"([^\"]*)\"");
final Matcher matcher = pattern.matcher(str);
while(matcher.find()){
System.out.println(matcher.group(1));
}
The output I get now is Java programming. But from the String str I want only the content in the second pair of double quotes, which is programming. Can anyone tell me how to do that using regex?
If you take your example, and change it slightly to:
String str ="\"Java\",\"programming\"";
final Pattern pattern = Pattern.compile("\"([^\"]*)\"");
final Matcher matcher = pattern.matcher(str);
int i = 0;
while(matcher.find()){
System.out.println("match " + ++i + ": " + matcher.group(1) + "\n");
}
You should find that it prints:
match 1: Java
match 2: programming
This shows that you are able to loop over all of the matches. If you only want the last match, then you have a number of options:
Store the match in the loop, and when the loop is finished, you have the last match.
Change the regex to ignore everything until your pattern, with something like: Pattern.compile(".*\"([^\"]*)\"")
If you really want explicitly the second match, then the simplest solution is something like Pattern.compile("\"([^\"]*)\"[^\"]*\"([^\"]*)\""). This gives two matching groups.
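The first two options above could be sketched as follows. The class name QuotedTokens and its methods are hypothetical, and returning null when nothing matches is only one possible choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedTokens {
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");
    // The greedy .* consumes as much as possible, so group 1 is the last quoted token.
    private static final Pattern LAST_QUOTED = Pattern.compile(".*\"([^\"]*)\"");

    /** Collects every quoted token in order of appearance. */
    static List<String> all(String s) {
        List<String> tokens = new ArrayList<>();
        Matcher m = QUOTED.matcher(s);
        while (m.find()) {
            tokens.add(m.group(1));
        }
        return tokens;
    }

    /** Option 1: loop over all matches and keep the final one (null if none). */
    static String last(String s) {
        List<String> tokens = all(s);
        return tokens.isEmpty() ? null : tokens.get(tokens.size() - 1);
    }

    /** Option 2: let a greedy prefix skip ahead to the last match (null if none). */
    static String lastViaGreedyPrefix(String s) {
        Matcher m = LAST_QUOTED.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "\"Java\",\"programming\"";
        System.out.println(all(str).get(1));          // programming (second match)
        System.out.println(last(str));                // programming
        System.out.println(lastViaGreedyPrefix(str)); // programming
    }
}
```

Collecting into a list also covers the "explicitly the second match" case via get(1), without a longer regex.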
If you want the last token inside double quotes, add an end-of-line anchor ($):
final Pattern pattern = Pattern.compile("\"([^\"]*)\"$");
In this case, you can replace while with if if your input is a single line.
Great answer from Paul. You can also try this pattern:
final Pattern pattern = Pattern.compile(",\"(\\w+)\"");
Java program
String str ="\"Java\",\"programming\"";
final Pattern pattern = Pattern.compile(",\"(\\w+)\"");
final Matcher matcher = pattern.matcher(str);
while(matcher.find()){
System.out.println(matcher.group(1));
}
Explanation
,\": matches a comma, followed by a quotation mark "
(\\w+): matches one or more word characters (letters, digits or underscore)
\": matches the last quotation mark "
Then the group(\\w+) is captured (group 1 precisely)
Output
programming
I ping a host and get its standard output as the result. The regex below does not work correctly. Where did I make a mistake?
String REGEXP ="time=(\\\\d+)ms";
Pattern pattern = Pattern.compile(REGEXP);
Matcher matcher = pattern.matcher(result);
if (matcher.find()) {
result = matcher.group(1);
}
You only need \\d+ in your regex. Some notes:
Matcher looks for the pattern it was created with and tries to find every occurrence of that pattern in the string being matched.
Use while (matcher.find()) in case of multiple occurrences, and read matcher.group(1) inside the loop.
Each () represents a capturing group.
You have too many backslashes. Assuming you want to get the number from a string like "time=32ms", then you need:
String REGEXP ="time=(\\d+)ms";
Pattern pattern = Pattern.compile(REGEXP);
Matcher matcher = pattern.matcher(result);
if (matcher.find()) {
result = matcher.group(1);
}
Explanation: The search pattern you are looking for is "\d", meaning a decimal digit, and the "+" means 1 or more occurrences.
To get the "\" to the matcher, it needs to be escaped, and the escape character is also "\".
The brackets define the matching group that you want to pick out.
With "\\\\d+", the matcher sees this as "\\d+", which would match a backslash followed by one or more "d"s. The first backslash protects the second backslash, and the third protects the fourth.
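A small sketch that makes the difference visible. The class name EscapeDemo and the firstGroup helper are hypothetical, and "time=32ms" is the sample input assumed above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeDemo {
    /** Returns group 1 of the first match, or null if the regex does not match. */
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Two backslashes in the Java literal: the regex engine sees \d+ (digits).
        System.out.println(firstGroup("time=(\\d+)ms", "time=32ms"));    // 32

        // Four backslashes: the engine sees \\d+, a literal backslash plus d's.
        System.out.println(firstGroup("time=(\\\\d+)ms", "time=32ms"));  // null

        // That pattern only matches input that really contains a backslash.
        System.out.println(firstGroup("time=(\\\\d+)ms", "time=\\ddms")); // \dd
    }
}
```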