Java Regex: Grouping consecutive 1 or 0 in a binary string

I want to capture all the consecutive groups in a binary string
1000011100001100111100001
should give me
1
0000
111
0000
11
00
1111
0000
1
I wrote the regex ([1?|0?]+) in my Java application to group the consecutive 1s or 0s in a string like 10000111000011.
But when I run my code, nothing is printed to the console:
String name ="10000111000011";
regex("(\\[1?|0?]+)" ,name);
public static void regex(String regex, String searchedString) {
Pattern pattern = Pattern.compile(regex);
Matcher regexMatcher = pattern.matcher(searchedString);
while (regexMatcher.find())
if (regexMatcher.group().length() > 0)
System.out.println(regexMatcher.group());
}
To avoid a syntax error in the regex at runtime, I changed ([1?|0?]+) to (\\[1?|0?]+).
Why doesn't the regex produce any groups?

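First, just as an explanation: your original regex ([1?|0?]+) defines a character class ([ ... ]) that matches any of the characters 1, ?, | or 0 one or more times (+). I think you meant to use ( ... ) instead, among other things, which would make the | an alternation between an optional 1 and an optional 0. But that's not what you want either (I think ;). The escaped version in your code, (\\[1?|0?]+), is different again: \\[ is a literal [, so the pattern needs a [ or a ] in the input and never matches your string, which is why nothing is printed.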
Now, the solution might be this:
([01])\1*
This matches a 0 or a 1 and captures it, then matches any number of repetitions of that same digit (\1 is a backreference to whatever was captured by the first capture group, in this case the 0 or the 1).
Check it out at ideone.
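As a rough illustration of the backreference pattern in use, here is a minimal sketch that collects the runs into a list. The class name RunGrouper and the method groupRuns are made up for this example and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this example.
class RunGrouper {
    // ([01]) captures one digit; \1* then consumes every immediate repeat of it.
    private static final Pattern RUN = Pattern.compile("([01])\\1*");

    // Returns the maximal runs of identical digits, in order of appearance.
    static List<String> groupRuns(String binary) {
        List<String> runs = new ArrayList<>();
        Matcher m = RUN.matcher(binary);
        while (m.find()) {
            // group() is the whole run; group(1) would be only its first digit.
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(groupRuns("1000011100001100111100001"));
        // prints [1, 0000, 111, 0000, 11, 00, 1111, 0000, 1]
    }
}
```

Note that the loop reads m.group() rather than m.group(1), since group 1 holds just the single captured digit.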

You can try this:
(1+|0+)
Sample Code:
final String regex = "(1+|0+)";
final String string = "10000111000011\n"
+ "11001111110011";
final Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Group " + 1 + ": " + matcher.group(1));
}

Related

Java Regex get all numbers

I need to retrieve all numbers from a String, example :
"a: 1 | b=2 ; c=3.2 / d=4,2"
I want get this result :
1
2
3.2
4,2
So, I don't know how to express that as a regex in Java.
Currently, I have this:
(?<=\D)(?=\d)|(?<=\d)(?=\D)
It splits letters and numbers (but the decimal values are not respected), and the result is:
1
2
3
2 (problem)
4
2 (problem)
Can you help me ?
Thanks :D
You might use a capturing group with a character class:
[a-z][:=]\h*(\d+(?:[.,]\d+)?)
Explanation
[a-z] Match a char a-z
[:=] Match either : or =
\h* Match 0+ horizontal whitespace chars
( Capture group 1
\d+(?:[.,]\d+)? Match 1+ digits with an optional decimal part with either . or ,
) Close group
Regex demo | Java demo
For example
String regex = "[a-z][:=]\\h*(\\d+(?:[.,]\\d+)?)";
String string = "a: 1 | b=2 ; c=3.2 / d=4,2";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Output
1
2
3.2
4,2
You can do it as follows:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args) throws InterruptedException {
// Test
String s = "a: 1 | b=2 ; c=3.2 / d=4,2";
showNumbers(s);
}
static void showNumbers(String s) {
Pattern regex = Pattern.compile("\\d[\\d,.]*");
Matcher matcher = regex.matcher(s);
while (matcher.find()) {
System.out.println(matcher.group());
}
}
}
Output:
1
2
3.2
4,2
You can use
/\d+(?:[.,]\d+)?/g
demo
You may just need a regex like (\d(?:[.,]\d)?) that
finds a digit
optionally followed by a dot/comma and another digit
The multi-digit version is (\d+(?:[.,]\d+)?)
String value = "a: 1 | b=2 ; c=3.2 / d=4,2";
Pattern p = Pattern.compile("(\\d(?:[.,]\\d)?)");
Matcher m = p.matcher(value);
while (m.find()) {
System.out.println(m.group());
}
// 1
// 2
// 3.2
// 4,2
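The difference between the single-digit and the multi-digit version only shows up on longer numbers. The sketch below runs both on an input with multi-digit values; the class NumberExtractor, the method extract and the sample input are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this example.
class NumberExtractor {
    // Collects every full match of the given regex in the input.
    static List<String> extract(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "a=12 ; b=3.25"; // sample input chosen for this example

        // Single-digit version: splits longer numbers into pieces.
        System.out.println(extract("\\d(?:[.,]\\d)?", input));
        // prints [1, 2, 3.2, 5]

        // Multi-digit version: keeps each number whole.
        System.out.println(extract("\\d+(?:[.,]\\d+)?", input));
        // prints [12, 3.25]
    }
}
```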

Merge two pattern into one

I need to write a pattern to remove the currency symbol and commas, e.g. Fr.-145,000.01
After matching, the pattern should return -145000.01.
The pattern I am using:
^[^0-9\\-]*([0-9\\-\\.\\,]*?)[^0-9\\-]*$
This returns -145,000.01
Then I remove the comma to get -145000.01. Is it possible to change the pattern so that I get -145000.01 directly?
String pattern = "^[^0-9\\-]*([0-9\\-\\.\\,]*?)[^0-9\\-]*$";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(str);
if(m.matches()) {
System.out.println(m.group(1));
}
I would like the output to have the commas already removed.
You can simplify it with String.replaceAll() and a simpler regex (provided you expect the input to be reasonably sane, i.e. without multiple decimal points embedded in the numbers or multiple negative signs)
String str = "Fr.-145,000.01";
str = str.replaceAll("[^\\d-.]\\.?", ""); // strings are immutable, so keep the returned value
If you are going down this route, I would sanity check it by parsing the output with BigDecimal or Double.
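A minimal sketch of that sanity check, assuming BigDecimal is the target type. The class CurrencyCleaner and the method parseAmount are hypothetical names for this example.

```java
import java.math.BigDecimal;

// Hypothetical helper class, named only for this example.
class CurrencyCleaner {
    // Strips everything that is not a digit, '-' or '.', plus a '.' that
    // directly follows a stripped character (such as the dot in "Fr."),
    // then lets BigDecimal validate what is left.
    static BigDecimal parseAmount(String raw) {
        String cleaned = raw.replaceAll("[^\\d-.]\\.?", "");
        // Throws NumberFormatException if the cleaned text is not a valid number.
        return new BigDecimal(cleaned);
    }

    public static void main(String[] args) {
        System.out.println(parseAmount("Fr.-145,000.01"));
        // prints -145000.01
    }
}
```

An input with stray signs or several decimal points survives the replaceAll step but is rejected by the BigDecimal constructor, which is the point of the check.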
One approach would be to just collect our desired digits, ., + and - in a capturing group followed by an optional comma, and then join them:
([+-]?[0-9][0-9.]+),?
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "([+-]?[0-9][0-9.]+),?";
final String string = "Fr.-145,000.01\n"
+ "Fr.-145,000\n"
+ "Fr.-145,000,000\n"
+ "Fr.-145\n"
+ "Fr.+145,000.01\n"
+ "Fr.+145,000\n"
+ "Fr.145,000,000\n"
+ "Fr.145\n"
+ "Fr.145,000,000,000.01";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
Demo
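The test above prints each captured piece separately; the joining step could be sketched as follows. The class AmountJoiner and the method join are names invented for this example, and the method handles one amount per call.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this example.
class AmountJoiner {
    // Group 1 holds a chunk of sign, digits and dots; the trailing comma is
    // matched but left outside the group so it is dropped when joining.
    private static final Pattern CHUNK = Pattern.compile("([+-]?[0-9][0-9.]+),?");

    // Concatenates every captured chunk found in a single amount string.
    static String join(String amount) {
        StringBuilder joined = new StringBuilder();
        Matcher m = CHUNK.matcher(amount);
        while (m.find()) {
            joined.append(m.group(1));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        System.out.println(join("Fr.-145,000.01"));
        // prints -145000.01
    }
}
```

Because the pattern requires at least two characters per chunk, a single-digit amount such as Fr.5 would not be captured by this sketch.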
String str = "Fr.-145,000.01";
Pattern regex = Pattern.compile("^[^0-9-]*(-?[0-9]+)(?:,([0-9]{3}))?(?:,([0-9]{3}))?(?:,([0-9]{3}))?(\\.[0-9]+)?[^0-9-]*$");
Matcher matcher = regex.matcher(str);
System.out.println(matcher.replaceAll("$1$2$3$4$5"));
Output:
-145000.01
It looks for a number with up to 3 commas (up to 999,999,999,999.99) and replaces the whole input with just the sign, digits and decimal part.
My approach would be to remove all the unnecessary parts using replaceAll.
The unnecessary parts are, apparently:
Any sequence which is not digits or minus at the beginning of the string.
Commas
The first pattern is represented by ^[^\\d-]+. The second is merely ,.
Put them together with an |:
Pattern p = Pattern.compile("(^[^\\d-]+)|,");
Matcher m = p.matcher(str);
String result = m.replaceAll("");
You could use 2 capturing groups and make use of repeated matching with the \G anchor, which asserts the position at the end of the previous match.
(?:^[^0-9+-]+(?=[.+,\d-]*\.\d+$)([+-]?\d{1,3})|\G(?!^)),(\d{3})
In Java
String regex = "(?:^[^0-9+-]+(?=[.+,\\d-]*\\.\\d+$)([+-]?\\d{1,3})|\\G(?!^)),(\\d{3})";
Explanation
(?: Non capturing group
^[^0-9+-]+ Match 1+ times not a digit, + or -
(?= Positive lookahead, assert that what follows is:
[.+,\d-]*\.\d+$ Match 0+ times what is allowed and assert ending on . and 1+ digits
) Close positive lookahead
( Capturing group 1
[+-]?\d{1,3}) Match optional + or - followed by 1-3 digits
| Or
\G(?!^) Assert position at the end of the previous match, not at the start
), Close non capturing group and match ,
(\d{3}) Capture in group 2 matching 3 digits
In the replacement use the 2 capturing groups $1$2
See the Regex demo | Java demo
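A minimal sketch of how the replacement could be applied in Java, assuming one amount per string. The class GroupingSeparatorStripper and the method strip are hypothetical names for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this example.
class GroupingSeparatorStripper {
    private static final Pattern P = Pattern.compile(
        "(?:^[^0-9+-]+(?=[.+,\\d-]*\\.\\d+$)([+-]?\\d{1,3})|\\G(?!^)),(\\d{3})");

    // Replaces each match with groups 1 and 2. On the \G matches group 1 did
    // not participate, and Java substitutes an empty string for it.
    static String strip(String amount) {
        return P.matcher(amount).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        System.out.println(strip("Fr.-145,000.01"));
        // prints -145000.01
        System.out.println(strip("Fr.-145,000,000.01"));
        // prints -145000000.01
    }
}
```

Because of the lookahead, an input without a decimal part (for example Fr.-145,000) does not match and is returned unchanged.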

How to optimize a regex pattern?

I'm trying to fetch substring(s) from a text and using regex for it.
Sample text:
bla bla 1:30-2pm bla bla 5-6:30am some text 1-2:15am
I'm looking for the time frame entries (1:30-2pm, 5-6:30am, 1-2:15am).
Here's my regex:
\d{1,2}(:\d{1,2})? – \d{1,2}(:\d{1,2})?(am|pm)
java snippet:
public static List<String> foo(String text, String regex) {
List<String> entries = new ArrayList<>();
Matcher matcher = Pattern.compile(regex).matcher(text);
while (matcher.find()) {
entries.add(matcher.group());
}
return entries;
}
Can you help me optimize the regex pattern? There might be some use cases that i missed.
If we'd like to improve the expression, we might add optional spaces, in case the input has spaces around the hyphen. Other than that, your expression looks great:
(\d{1,2})(:\d{1,2})?(\s+)?-(\s+)?(\d{1,2})(:\d{1,2})?(am|pm)
We have also added capturing groups, if we wish to get the data.
Demo 1
Or:
(\d{1,2})(:\d{1,2})?(\s+)?(am|pm)?(\s+)?-(\s+)?(\d{1,2})(:\d{1,2})?(\s+)?(am|pm)
Demo 2
whichever would be desired.
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "(\\d{1,2})(:\\d{1,2})?(\\s+)?-(\\s+)?(\\d{1,2})(:\\d{1,2})?(am|pm)";
final String string = "bla bla 1:30-2pm bla bla 5-6:30am some text 1-2:15am\n"
+ "bla bla 1:30 - 2pm bla bla 5 - 6:30am some text 1 - 2:15am";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
I suggest using a regex like
String regex = "(?i)(?<!\\d)(?:0?[1-9]|1[0-2])(?::[0-5]\\d)?\\p{Pd}(?:0?[1-9]|1[0-2])(?::[0-5]\\d)?[ap]m\\b";
See the regex demo
Details
(?i) - case insensitive flag (for AM, PM, am, pm values etc.)
(?<!\d) - no digit immediately to the left is allowed
(?:0?[1-9]|1[0-2]) - an optional 0 and then a digit from 1 to 9, or 1 and then 0, 1 or 2
(?::[0-5]\d)? - an optional group: a colon, a digit from 0 to 5 and then any one digit
\p{Pd} - any Unicode dash punctuation character (hyphen-minus, en dash, etc.)
(?:0?[1-9]|1[0-2])(?::[0-5]\d)? - see above
[ap]m\b - a or p and then m and a word boundary.
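To see the stricter pattern at work, here is a small sketch that collects the matches from the sample text in the question. The class TimeRangeFinder and the method find are names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this example.
class TimeRangeFinder {
    // Hours are limited to 1-12 and minutes to 00-59; \p{Pd} accepts any dash.
    private static final Pattern RANGE = Pattern.compile(
        "(?i)(?<!\\d)(?:0?[1-9]|1[0-2])(?::[0-5]\\d)?\\p{Pd}(?:0?[1-9]|1[0-2])(?::[0-5]\\d)?[ap]m\\b");

    // Returns every time range found in the text, in order of appearance.
    static List<String> find(String text) {
        List<String> entries = new ArrayList<>();
        Matcher m = RANGE.matcher(text);
        while (m.find()) {
            entries.add(m.group());
        }
        return entries;
    }

    public static void main(String[] args) {
        System.out.println(find("bla bla 1:30-2pm bla bla 5-6:30am some text 1-2:15am"));
        // prints [1:30-2pm, 5-6:30am, 1-2:15am]
    }
}
```

Unlike the \d{1,2} version, this pattern rejects out-of-range hours such as 13-14pm.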

Regex matcher not giving expected result. Not matching number properly

I cannot understand why 2nd group is giving me only 0. I expect 3000. And do point me to a resource where I can understand better.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexMatches {
public static void main( String args[] ) {
// String to be scanned to find the pattern.
String line = "This order was placed for QT3000! OK?";
String pattern = "(.*)(\\d+)(.*)";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.find( )) {
System.out.println("Found value: " + m.group(0) );
System.out.println("Found value: " + m.group(1) );
System.out.println("Found value: " + m.group(2) );//?
System.out.println("Found value: " + m.group(3) );
}else {
System.out.println("NO MATCH");
}
}
}
Make the pattern more precise: add QT before the \d+ pattern, or use .*? instead of the first .* to match as few chars as possible.
String pattern = "(.*QT)(\\d+)(.*)";
or
String pattern = "(.*?)(\\d+)(.*)";
will do. See a Java demo.
The (.*QT)(\\d+)(.*) will match and capture into Group 1 any 0+ chars other than line break chars, as many as possible, up to the last occurrence of QT (followed with the subsequent subpatterns), then will match and capture 1+ digits into Group 2, and then will match and capture into Group 3 the rest of the line.
The .*? in the alternative pattern will match and capture into Group 1 any 0+ chars other than line break chars, as few as possible, up to the first chunk of 1 or more digits.
You may also use a simpler pattern like String pattern = "QT(\\d+)"; to get all digits after QT, and the result will be in Group 1 then (you won't have the text before and after the number).
The * quantifier will try to match as many as possible, because it is a greedy quantifier.
You can make it non-greedy (lazy) by changing it to *?
Then, your regex will become :
(.*?)(\d+)(.*)
And you will match 3000 in the 2nd capturing group.
Here is a regex101 demo
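The effect of greedy versus lazy matching on the captured groups can be checked with a short sketch like the one below. The class GreedyVsLazy and the method group are hypothetical names for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this example.
class GreedyVsLazy {
    // Returns capture group n of the first match, or null if nothing matches.
    static String group(String regex, String line, int n) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.find() ? m.group(n) : null;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";

        // Greedy: the first (.*) grabs as much as it can and leaves \d+ one digit.
        System.out.println(group("(.*)(\\d+)(.*)", line, 2));  // prints 0

        // Lazy: (.*?) stops as soon as \d+ can start matching.
        System.out.println(group("(.*?)(\\d+)(.*)", line, 2)); // prints 3000

        // Anchoring on the literal prefix avoids the question entirely.
        System.out.println(group("QT(\\d+)", line, 1));        // prints 3000
    }
}
```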

Regex to find Integers in particular string lines

I have this regex to find integers in a string (with newlines). However, I want to filter the results: the regex should find the numbers in certain lines and not in others.
String:
String test= "ytrt.ytrwyt.ytreytre.test1,0,2,0"
+"sfgtr.ytyer.qdfre.uyeyrt.test2,0,8,0"
+"sfgtr.ytyer.qdfre.uyeyrt.test3,0,3,0";
pattern = "(?<=,)\\d+";
pr = Pattern.compile(pattern);
match = pr.matcher(test);
System.out.println();
if (match.find()) {
System.out.println("Found: " + match.group());
}
This regex finds the integers after a comma, for all the lines. What if I want a particular regex to find the integers only in the line containing "test1", "test2", or "test3"? How should I do this? I want to create three different regexes, but my regex skills are weak.
First regex should print out 2. The second 8 and the third 3.
You can expand your pattern to include test[123] in the lookbehind, which would match test1, test2, or test3:
String pattern = "(?<=test[123][^,]{0,100},[^,]{1,100},)\\d+";
Pattern pr = Pattern.compile(pattern);
Matcher match = pr.matcher(test);
System.out.println();
while (match.find()) {
System.out.println("Found: " + match.group());
}
The ,[^,] portion skips everything between the two commas that follow testN.
I use {0,100} in place of * and {1,100} in place of + inside lookbehind expressions, because the Java regex engine requires lookbehinds to have a pre-defined limit on their length. If you need to allow skipping more than 100 characters, adjust the maximum length accordingly.
Demo.
You can use the following Pattern and loop for this:
String test= "ytrt.ytrwyt.ytreytre.test1,0,2,0"
+ System.getProperty("line.separator")
+"sfgtr.ytyer.qdfre.uyeyrt.test2,0,8,0"
+ System.getProperty("line.separator")
+"sfgtr.ytyer.qdfre.uyeyrt.test3,0,3,0";
// test   the literal "test"
// \d+    one or more digits
// ,      comma
// \d+    one or more digits
// ,      comma
// (\d+)  group 1, your digits
Pattern p = Pattern.compile("test\\d+,\\d+,(\\d+)");
Matcher m = p.matcher(test);
while (m.find()) {
// prints back-reference to group 1
System.out.printf("Found: %s%n", m.group(1));
}
Output
Found: 2
Found: 8
Found: 3
You could also use capturing groups to extract the test number and the other number from the string:
String pattern = "test([123]),\\d+,(\\d+),";
...
while (match.find()) {
// get and parse the number after "test" (first capturing group)
int testNo = Integer.parseInt(match.group(1));
// get and parse the number you wanted to extract (second capturing group)
int num = Integer.parseInt(match.group(2));
System.out.println("test"+testNo+": " + num);
}
Which prints
test1: 2
test2: 8
test3: 3
Note: In this example the strings are parsed for demonstration purposes only, but parsing could be useful if you want to do something with the numbers, like storing them in an array.
Update: If you also want to match strings like "ytrt.ytrwyt.test1.ytrwyt,0,2,0" you could change the pattern to "test([123])\\D*,\\d+,(\\d+)," to allow any number of non-digits to follow test1, test2 or test3 (preceding the comma separated ints).
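As a rough sketch of the updated pattern together with the idea of storing the parsed numbers, the example below collects them into a map keyed by test number. The class TestNumberMap, the method collect and the sample input lines are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this example.
class TestNumberMap {
    // \D* lets any non-digits sit between "testN" and the first comma.
    private static final Pattern P = Pattern.compile("test([123])\\D*,\\d+,(\\d+),");

    // Maps each test number (group 1) to the second integer after it (group 2).
    static Map<Integer, Integer> collect(String input) {
        Map<Integer, Integer> result = new LinkedHashMap<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            result.put(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample input invented for this example.
        String input = "ytrt.ytrwyt.test1.ytrwyt,0,2,0\n"
                     + "sfgtr.ytyer.test2,0,8,0\n"
                     + "sfgtr.uyeyrt.test3,0,3,0";
        System.out.println(collect(input));
        // prints {1=2, 2=8, 3=3}
    }
}
```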
