Regex positive lookbehind woes - java

My goal is to match the first 0 and everything after that zero in a decimal value. If the first decimal place is a zero then I want to match the decimal too. If there is no decimal then capture nothing.
Here are some examples of what I want:
180.570123 // should capture the "0123" on the end
180.570 // should capture the "0" on the end
180.0123 // should capture the ".0123" on the end
180.0 // should capture the ".0" on the end
180123 // should capture nothing
180 // should capture nothing
If the first decimal place is a 0 then making the match is easy:
(\.0.*)
My problem is matching when the first decimal place is not a 0. I believe positive lookbehind will fix this issue, but I am not able to get it to work correctly. Here is one regex I have tried:
(?<=^.*\..*)0.*
This regex will eventually be used in Java.
UPDATE:
I am going to use this regex to get rid of numbers and possibly a decimal point on the end of a string using Java's replaceAll method. I will do this by replacing the capture group with an empty string. Here is a better example of what I want.
String case1 = "180.570123";
String case2 = "180.570";
String case3 = "180.0123";
String case4 = "180.0";
String case5 = "180123";
String case6 = "180";
String result = null;
result = case1.replaceAll( "THE REGEX I NEED", "" );
System.out.println( result ); // should print 180.57
result = case2.replaceAll( "THE REGEX I NEED", "" );
System.out.println( result ); // should print 180.57
result = case3.replaceAll( "THE REGEX I NEED", "" );
System.out.println( result ); // should print 180
result = case4.replaceAll( "THE REGEX I NEED", "" );
System.out.println( result ); // should print 180
result = case5.replaceAll( "THE REGEX I NEED", "" );
System.out.println( result ); // should print 180123
result = case6.replaceAll( "THE REGEX I NEED", "" );
System.out.println( result ); // should print 180
Also, I am testing these regexes at http://gskinner.com/RegExr/

You can use this expression:
\.[1-9]*(0\d*)
And what you want will be in the first capturing group. (Except the decimal point.)
If you want to capture the decimal point too, you can use:
(?:\.[1-9]+|(?=\.))(\.?0\d*)
Example:
Pattern p = Pattern.compile("(?:\\.[1-9]+|(?=\\.))(\\.?0\\d*)");
String[] strs = {"180.570123", "180.570", "180.0123", "180.0", "180123", "180", "180.2030405"};
for (String s : strs) {
Matcher m = p.matcher(s);
System.out.printf("%-12s: Match: %s%n", s,
m.find() ? m.group(1) : "n/a");
}
Output:
180.570123 : Match: 0123
180.570 : Match: 0
180.0123 : Match: .0123
180.0 : Match: .0
180123 : Match: n/a
180 : Match: n/a
180.2030405 : Match: 030405
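For the replaceAll use case in the question's update, the whole match is replaced, so the pattern has to put the non-zero decimals back rather than only capture the tail. A minimal sketch, assuming the input is always a plain decimal literal like the examples (the class name TrailingZeroCut and the method cut are made up for illustration): one alternative removes a fraction that starts with 0, the other keeps the leading non-zero digits in group 1 and restores them with $1.

```java
import java.util.regex.Pattern;

class TrailingZeroCut {
    // Alternative 1: the fraction starts with 0, so drop the point and everything after it.
    // Alternative 2: keep ".<non-zero digits>" in group 1, drop the first 0 and the rest.
    private static final Pattern CUT =
            Pattern.compile("\\.0\\d*$|(\\.[1-9]+)0\\d*$");

    static String cut(String s) {
        // When alternative 1 matches, group 1 did not participate
        // and "$1" expands to an empty string.
        return CUT.matcher(s).replaceAll("$1");
    }

    public static void main(String[] args) {
        String[] cases = {"180.570123", "180.570", "180.0123", "180.0", "180123", "180"};
        for (String c : cases) {
            System.out.println(c + " -> " + cut(c));
        }
    }
}
```

Strings without a decimal point, or without a 0 in the fraction, are returned unchanged because neither alternative matches.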

I would write a small function to do the extracting instead of regex.
private String getZeroPart(final String s) {
final String[] strs = s.split("\\.");
if (strs.length != 2 || strs[1].indexOf("0") < 0) {
return null;
} else {
return strs[1].startsWith("0") ? "." + strs[1] : strs[1].substring(strs[1].indexOf("0"));
}
}
to test it:
final String[] ss = { "180.570123", "180.570", "180.0123",
"180.0", "180123", "180", "180.2030405","180.5555" };
for (final String s : ss) {
System.out.println(getZeroPart(s));
}
output:
0123
0
.0123
.0
null
null
030405
null
Update:
Based on the edit to the question, I changed the method so that it returns the trimmed number:
private String cutZeroPart(final String s) {
final String[] strs = s.split("\\.");
if (strs.length != 2 || strs[1].indexOf("0") < 0) {
return s;
} else {
return strs[1].startsWith("0") ? strs[0] : s.substring(0, strs[0].length() + strs[1].indexOf("0") + 1);
}
}
output:
180.570123 -> 180.57
180.570 -> 180.57
180.0123 -> 180
180.0 -> 180
180123 -> 180123
180 -> 180
180.2030405 -> 180.2
180.5555 -> 180.5555

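Lookbehinds might be overkill. This one worked well (the lazy \d*? makes the capture start at the first 0 after the decimal point rather than the last one):
/\.\d*?(0\d*)/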

You can try this:
Matcher m = Pattern.compile("(?<=(\\.))(\\d*?)(0.*)").matcher("180.2030405");
String res="";
if(m.find()){
if(m.group(2).equals("")){
res = "."+m.group(3);
}
else{
res = m.group(3);
}
}

Related

Regex to capture groups and ignore last two characters where one is optional

I need to capture two groups from an input string. The values differ in structure as they come in.
The following are examples of the incoming strings:
Comment = "This is a comment";
NumericValue = 123456;
What I am trying to accomplish is to capture the string value from the left of the equals sign as one group and the value after the equals sign as a second group. The semicolon should never be included.
The caveat is that if the second group is a string, the quotes from each end must not be included in that capture group.
The expected results would be:
Comment = "This is a comment";
key group => Comment
value group => This is a comment
NumericValue = 123456;
key group => NumericValue
value group => 123456
The following is what I have so far. This works fine for capturing the numeric value, but leaves the end double quote when capturing the string value.
(?<key>\w+)\s*=\s*(?:[\"]?)(?<group>.+(?:(?=[\"]?;)))
EDIT
When applying the regex against a string value, it must allow capture of semicolons and double quotes within the string and ignore only the closing ones.
So, if we have an input of:
Comment = "This is a "comment"; This is still a comment";
The second capture group should be:
This is a "comment"; This is still a comment
An option is to use an alternation where you would have to check for group 2 or group 3:
(?<key>\w+)\h*=\h*(?:"(.*?)"|([^"\r\n]+));$
(?<key>\w+) Group key match 1+ word chars
\h*=\h* Match an = between optional horizontal whitespace chars
(?: Non capturing group
"(.*?)" Capture in group 2 any chars between " (0+ times, as few as possible)
| Or
([^"\r\n]+) Capture group 3, match 1+ times any char except " or a newline
); Close non capturing group and match ;
$ End of string
In Java
String regex = "(?<key>\\w+)\\h*=\\h*(?:\"(.*?)\"|([^\"\\r\\n]+));$";
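Because the value ends up in either group 2 or group 3, the calling code has to check which one participated. A minimal sketch of that check, assuming one key/value pair per input line (the class name KeyValueParser and the method parse are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueParser {
    // Same pattern as above: group "key" is group 1, the quoted value is
    // group 2 and the unquoted value is group 3.
    private static final Pattern PAIR = Pattern.compile(
            "(?<key>\\w+)\\h*=\\h*(?:\"(.*?)\"|([^\"\\r\\n]+));$");

    // Returns {key, value}, or null when the line does not match.
    static String[] parse(String line) {
        Matcher m = PAIR.matcher(line);
        if (!m.find()) {
            return null;
        }
        // Only one of the two value groups takes part in a match.
        String value = m.group(2) != null ? m.group(2) : m.group(3);
        return new String[] { m.group("key"), value };
    }

    public static void main(String[] args) {
        String[] lines = {
            "Comment = \"This is a comment\";",
            "Comment = \"This is a \"comment\"; This is still a comment\";",
            "NumericValue = 123456;"
        };
        for (String line : lines) {
            String[] kv = parse(line);
            System.out.println(kv[0] + " => " + kv[1]);
        }
    }
}
```

The lazy (.*?) keeps extending until the closing quote is followed by the final semicolon at the end of the string, which is why embedded quotes and semicolons stay in the value.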
Edited based on comment to include ; and " in the comments as per the examples given:
(?<key>\w+)\s*=\s*(?:[\"]?)(?<value>((")(?!;?$)|;(?!$)|[^;"])+)"?;?$
The following one additionally doesn't allow ; or " to appear in the numeric text. However, to include this, I had to rename the capturing groups because the name cannot be used for more than one group.
(?<key>\w+)\s*=\s*((?:")(?<valueT>((")(?!;?$)|;(?!$)|[^;"])+)";?$|(?<valueN>[^;"]+);?$)
Here is a class that tests it.
For readability, I have separated the key and value regexes in the class. I have added the test cases in a method within the class. However, this still doesn't handle the case of a numeric text containing ; or ". Also, the line needs to be trimmed before being subjected to the pattern test (which I think is feasible).
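import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameValuePairRegex{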
public static void main( String[] args ){
String SPACE = "\\s*";
String EQ = "=";
String OR = "|";
/* The original regex tried by you (for comparison). */
String orig = "(?<key>\\w+)\\s*=\\s*(?:[\\\"]?)(?<value>.+(?:(?=;)))";
String key = "(?<key>\\w+)";
String valuePatternForText = "(?:\")(?<valueT>((\")(?!;?$)|;(?!$)|[^;\"])+)\";?$";
String valuePatternForNumbers = "(?<valueN>[^;\"]+);?$";
String p = key + SPACE + EQ + SPACE + "(" + valuePatternForText + OR + valuePatternForNumbers + ")";
Pattern nvp = Pattern.compile( p );
System.out.println( nvp.pattern() );
print( input(), nvp );
}
private static void print( List<String> input, Pattern ep ) {
for( String e : input ) {
System.out.println( e );
Matcher m = ep.matcher( e );
boolean found = m.find();
if( !found ) {
System.out.println( "\t\tNo match" );
continue;
}
String valueT = m.group( "valueT" );
String valueN = m.group( "valueN" );
System.out.print( "\t\t" + m.group( "key" ) + " -> " + ( valueT == null ? "" : valueT ) + " " + ( valueN == null ? "" : valueN ) );
System.out.println( );
}
}
private static List<String> input(){
List<String> neg = new ArrayList<>();
Collections.addAll( neg,
"Comment = \"This is a comment\";",
"Comment = \"This is a comment with semicolon ;\";",
"Comment = \"This is a comment with semicolon ; and quote\"\";",
"Comment = \"This is a comment\"",
"Comment = \"This is a \"comment\"; This is still a comment\";",
"NumericValue = 123456;",
"NumericValue = 123;456;",
"NumericValue = 123\"456;",
"NumericValue = 123456" );
return neg;
}
}
Original answer:
The following changed regex is fulfilling the requirements you mentioned. I added the exclusion of ; and " from the value part.
Original that you tried:
(?<key>\w+)\s*=\s*(?:[\"]?)(?<group>.+(?:(?=[\"]?;)))
The changed one:
(?<key>\w+)\s*=\s*(?:[\"]?)(?<value>[^;"]+)
Regular expressions are fun, but look how clean and easy to read this would be without using a regular expression:
int equals = s.indexOf('=');
String key = s.substring(0, equals).trim();
String value = s.substring(equals + 1).trim();
if (value.endsWith(";")) {
value = value.substring(0, value.length() - 1).trim();
}
if (value.startsWith("\"") && value.endsWith("\"")) {
value = value.substring(1, value.length() - 1);
}
Don’t assume this is slower just because it uses more lines of code than a regular expression. A regex engine executes far more code internally than the snippet above.

Java : Pattern matcher returns new lines unexpectedly

I have a use case where I have to handle any escaped or unescaped character as a delimiter to split a sentence. So far, the unescaped/escaped characters we have are:
" " (space),"\\t","|", "\\|",";","\\;","," etc
Which is working so far with a regex, defined as :
String delimiter = " ";
String regex = "(?:\\\\.|[^"+ delimiter +"\\\\]++)*";
The input string is :
String input = "234|Tamarind|something interesting ";
Now, below is the code that splits and prints:
List<String> matchList = new ArrayList<>( );
Matcher regexMatcher = pattern.matcher( input );
while ( regexMatcher.find() )
{
matchList.add( regexMatcher.group() );
}
System.out.println( "Unescaped/escaped test result with size: " + matchList.size() );
matchList.stream().forEach( System.out::println );
However, extra strings (blank lines in the output) are being stored unexpectedly. The output looks like:
Unescaped/escaped test result with size: 5
234|Tamarind|something
interesting
.
Is there a better way to do this so that there won't be any extra strings?
It is easy: make sure you match at least one character. That means you may remove the ++ quantifier and replace * with +.
Full Java demo:
String delimiter = " ";
String regex = "(?:\\\\.|[^"+ delimiter +"\\\\])+";
// System.out.println(regex); // => (?:\\.|[^ \\])+
Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);
String input = "234|Tamarind|something interesting ";
List<String> matchList = new ArrayList<>( );
Matcher regexMatcher = pattern.matcher( input );
while ( regexMatcher.find() )
{
// System.out.println("'"+regexMatcher.group()+"'");
matchList.add( regexMatcher.group() );
}
System.out.println( "Unescaped/escaped test result with size: " + matchList.size() );
matchList.stream().forEach( System.out::println );
Output:
Unescaped/escaped test result with size: 2
234|Tamarind|something
interesting
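To see where the extra entries come from, it helps to print every match in quotes. The sketch below (the class name EmptyMatchDemo and the helper findAll are made up, and compiling the question's pattern with Pattern.DOTALL is an assumption, since the question does not show its compile call) runs both patterns on the same input. With the outer * the pattern can match zero characters, so find() also succeeds with an empty match at each delimiter and at the end of the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyMatchDemo {
    // Collects every match of regex in input, including empty ones.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "234|Tamarind|something interesting ";
        String withStar = "(?:\\\\.|[^ \\\\]++)*"; // may match nothing
        String withPlus = "(?:\\\\.|[^ \\\\])+";   // needs at least one character

        // Quote each entry so that empty matches are visible.
        for (String s : findAll(withStar, input)) {
            System.out.println("* '" + s + "'");
        }
        for (String s : findAll(withPlus, input)) {
            System.out.println("+ '" + s + "'");
        }
    }
}
```

The first loop prints five entries, three of them empty; the second prints only the two tokens.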

how do I extract data between two characters in java

String text = "/'Team1 = 6', while /'Team2 = 4', and /'Team3 = 2'";
String[] body = text.split("/|,");
String b1 = body[1];
String b2 = body[2];
String b3 = body[3];
Desired results:
b1 = 'Team1 = 6'
b2 = 'Team2 = 4'
b3 = 'Team3 = 2'
Use regex. Something like this:
String text = "/'Team1 = 6', while /'Team2 = 4', and /'Team3 = 2'";
Matcher m = Pattern.compile("(\\w+\\s=\\s\\d+)").matcher(text);
// \w+ matches the team name (eg: Team1). \s=\s matches " = " and \d+ matches the score.
while (m.find()){
System.out.print(m.group(1)+"\n");
}
This prints:
Team1 = 6
Team2 = 4
Team3 = 2
There are a few ways you can do this, but in your case I'd use regex.
I don't know Java, but I think something like this regex pattern should work:
Pattern.compile("/'(.*?)'")
A random regex tester site with this pattern is here: https://regex101.com/r/MCRfMm/1
I'm going to say "friends don't let friends use regex" and recommend parsing this out. The built-in class StreamTokenizer will handle the job.
private static void testTok( String in ) throws Exception {
System.out.println( "Input: " + in );
StreamTokenizer tok = new StreamTokenizer( new StringReader( in ) );
tok.resetSyntax();
tok.wordChars( 'a', 'z' );
tok.wordChars( 'A', 'Z' );
tok.wordChars( '0', '9' );
tok.whitespaceChars( 0, ' ' );
String prevToken = null;
for( int type; (type = tok.nextToken()) != StreamTokenizer.TT_EOF; ) {
// System.out.println( tokString( type ) + ": nval=" + tok.nval + ", sval=" + tok.sval );
if( type == '=' ) {
tok.nextToken();
System.out.println( prevToken + "=" + tok.sval );
}
prevToken = tok.sval;
}
}
Output:
Input: /'Team1 = 6', while /'Team2 = 4', and /'Team3 = 2'
Team1=6
Team2=4
Team3=2
One advantage of this technique is that the individual tokens like "Team1", "=" and "6" are all parsed separately, whereas the regex presented so far is already complex to read and would have to be made even more complex to isolate each of those tokens separately.
You can split on "a slash, optionally preceded by a comma followed by zero or more non-slash characters":
String[] body = text.split("(?:,[^/]*)?/");
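A small sketch of what that split returns for the question's input (the class name SplitOnSlashDemo and the method parts are made up for illustration). Because the text starts with a delimiter, the array begins with an empty string, which is why the question's indexes 1 to 3 line up with the three teams.

```java
class SplitOnSlashDemo {
    // Delimiter: a slash, optionally preceded by a comma and any non-slash characters.
    static String[] parts(String text) {
        return text.split("(?:,[^/]*)?/");
    }

    public static void main(String[] args) {
        String text = "/'Team1 = 6', while /'Team2 = 4', and /'Team3 = 2'";
        String[] body = parts(text);
        for (int i = 0; i < body.length; i++) {
            // body[0] is the empty string before the leading slash.
            System.out.println("body[" + i + "] = " + body[i]);
        }
    }
}
```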
public class MyClass {
public static void main(String args[]) {
String text = "/'Team1 = 6', while /'Team2 = 4', and /'Team3 = 2'";
char []textArr = text.toCharArray();
char st = '/';
char ed = ',';
boolean lookForEnd = false;
int st_idx =0;
for(int i =0; i < textArr.length; i++){
if(textArr[i] == st){
st_idx = i+1;
lookForEnd = true;
}
else if(lookForEnd && textArr[i] == ed){
System.out.println(text.substring(st_idx,i));
lookForEnd = false;
}
}
// we still didn't find ',' therefore print everything from lastFoundIdx of '/'
if(lookForEnd){
System.out.println(text.substring(st_idx));
}
}
}
/*
'Team1 = 6'
'Team2 = 4'
'Team3 = 2'
*/
You could use split with a regex built on an alternation: match either a forward slash at the start of the string, or a comma followed by one or more non-comma characters and then a forward slash. A positive lookahead after the alternation asserts that what follows is a '.
(?:^/|,[^,]+/)(?=')
Explanation
(?: Start non capturing group
^/ Assert the start of the string, then match a forward slash
| Or
,[^,]+/ Match a comma, then one or more non-comma characters (negated character class), then a forward slash
) Close non capturing group
(?=') Positive lookahead to assert what follows is '
Getting a match instead of split
If you want to match a pattern like 'Team1 = 6', you could use:
'[^=]+=[^']+'
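A minimal sketch of using that pattern with Matcher.find() to collect the quoted pairs directly (the class name QuotedPairFinder and the method findPairs are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedPairFinder {
    // A quote, text up to '=', the '=', text up to the next quote, and that quote.
    private static final Pattern PAIR = Pattern.compile("'[^=]+=[^']+'");

    static List<String> findPairs(String text) {
        List<String> pairs = new ArrayList<>();
        Matcher m = PAIR.matcher(text);
        while (m.find()) {
            pairs.add(m.group());
        }
        return pairs;
    }

    public static void main(String[] args) {
        String text = "/'Team1 = 6', while /'Team2 = 4', and /'Team3 = 2'";
        for (String pair : findPairs(text)) {
            System.out.println(pair);
        }
    }
}
```

Unlike the split variants, this keeps the surrounding quotes and produces no leading empty element.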

REGEX: Get double (positive or negative) from string [duplicate]

Let's say I have a string like this:
eXamPLestring>1.67>>ReSTOfString
My task is to extract only 1.67 from the string above.
I assume regex will be useful, but I can't figure out how to write the proper expression.
If you want to extract all ints and floats from a String, you can follow my solution:
private ArrayList<String> parseIntsAndFloats(String raw) {
ArrayList<String> listBuffer = new ArrayList<String>();
Pattern p = Pattern.compile("[0-9]*\\.?[0-9]+");
Matcher m = p.matcher(raw);
while (m.find()) {
listBuffer.add(m.group());
}
return listBuffer;
}
If you want to parse also negative values you can add [-]? to the pattern like this:
Pattern p = Pattern.compile("[-]?[0-9]*\\.?[0-9]+");
And if you also want to set , as a separator you can add ,? to the pattern like this:
Pattern p = Pattern.compile("[-]?[0-9]*\\.?,?[0-9]+");
To test the patterns you can use this online tool: http://gskinner.com/RegExr/
Note: For this tool remember to unescape if you are trying my examples (you just need to take off one of the \)
You could try matching the digits using a regular expression
\\d+\\.\\d+
This could look something like
Pattern p = Pattern.compile("\\d+\\.\\d+");
Matcher m = p.matcher("eXamPLestring>1.67>>ReSTOfString");
while (m.find()) {
Float.parseFloat(m.group());
}
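Here's how to do it in one line:
String f = input.replaceAll("^.*?(-?[\\d.]+).*$|^.*$", "$1");
This returns a blank String if no float is found. The capture group itself must not be made optional with a trailing ?: for input that does not start with a number, the lazy .*? would match nothing, the group would be skipped, and .* would swallow the whole input, so the result would be blank even when a float is present.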
If you actually want a float, you can do it in one line:
float f = Float.parseFloat(input.replaceAll(".*?(-?[\\d.]+).*", "$1"));
but since a blank cannot be parsed as a float, you would have to do it in two steps - testing if the string is blank before parsing - if it's possible for there to be no float.
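The two-step version could be sketched with a Matcher instead of replaceAll, so that the "no float found" case is explicit rather than an empty string. The class name FirstFloat, the method first and the choice of Optional<Float> as the return type are assumptions for illustration, and the pattern is a slightly stricter one than [\d.]+ so that a lone dot is not picked up.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstFloat {
    // Optional minus sign, optional integer part, optional point, at least one digit.
    private static final Pattern NUMBER = Pattern.compile("-?\\d*\\.?\\d+");

    // Step 1: look for a number. Step 2: parse it only if one was found.
    static Optional<Float> first(String input) {
        Matcher m = NUMBER.matcher(input);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(Float.parseFloat(m.group()));
    }

    public static void main(String[] args) {
        System.out.println(first("eXamPLestring>1.67>>ReSTOfString")); // Optional[1.67]
        System.out.println(first("no digits here"));                   // Optional.empty
    }
}
```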
String s = "eXamPLestring>1.67>>ReSTOfString>>0.99>>ahgf>>.9>>>123>>>2323.12";
Pattern p = Pattern.compile("\\d*\\.\\d+");
Matcher m = p.matcher(s);
while(m.find()){
System.out.println(">> "+ m.group());
}
Gives only floats
>> 1.67
>> 0.99
>> .9
>> 2323.12
You can use the regex \d*\.?,?\d* which will work for floats like 1.0 and 1,0
Have a look at this link, they also explain a few things that you need to keep in mind when building such a regex.
[-+]?[0-9]*\.?[0-9]+
example code:
String[] strings = new String[3];
strings[0] = "eXamPLestring>1.67>>ReSTOfString";
strings[1] = "eXamPLestring>0.57>>ReSTOfString";
strings[2] = "eXamPLestring>2547.758>>ReSTOfString";
Pattern pattern = Pattern.compile("[-+]?[0-9]*\\.?[0-9]+");
for (String string : strings)
{
Matcher matcher = pattern.matcher(string);
while(matcher.find()){
System.out.println("# float value: " + matcher.group());
}
}
output:
# float value: 1.67
# float value: 0.57
# float value: 2547.758
/**
* Extracts the first number out of a text.
* Works for 1.000,1 and also for 1,000.1 returning 1000.1 (1000 plus 1 decimal).
* When only a , or a . is used it is assumed as the float separator.
*
* @param sample The sample text.
*
* @return A float representation of the number.
*/
static public Float extractFloat(String sample) {
Pattern pattern = Pattern.compile("[\\d.,]+");
Matcher matcher = pattern.matcher(sample);
if (!matcher.find()) {
return null;
}
String floatStr = matcher.group();
if (floatStr.matches("\\d+,+\\d+")) {
floatStr = floatStr.replaceAll(",+", ".");
} else if (floatStr.matches("\\d+\\.+\\d+")) {
floatStr = floatStr.replaceAll("\\.\\.+", ".");
} else if (floatStr.matches("(\\d+\\.+)+\\d+(,+\\d+)?")) {
floatStr = floatStr.replaceAll("\\.+", "").replaceAll(",+", ".");
} else if (floatStr.matches("(\\d+,+)+\\d+(\\.+\\d+)?")) {
floatStr = floatStr.replaceAll(",", "").replaceAll("\\.\\.+", ".");
}
try {
return Float.valueOf(floatStr);
} catch (NumberFormatException ex) {
throw new AssertionError("Unexpected non float text: " + floatStr);
}
}

REGEX : How to escape []?

I'm working on strings like "[ro.multiboot]: [1]". How do I just select 1(it can also be 0) out of this string?
I am looking for a regex in Java.
Usually, you would do something like (assuming 0 and 1 were the only options):
^.*\[([01])\].*$
If you only wanted the value for ro.multiboot, you could change it to something like:
^.*\[ro.multiboot\].*\[([01])\].*$
(depending on how complex any of the non-bracketed stuff is allowed to be).
These would both basically only extract the value between square brackets if it were zero or one, and capture it into a capture variable so you could use it.
Of course, regex is not a world-wide standard, nor are the environments in which you use it. That means it depends a lot on your actual environment how you will actually code this up.
For Java, the following sample program may help:
import java.util.regex.*;
class Test {
public static void main(String args[]) {
Pattern p = Pattern.compile("^.*\\[ro.multiboot\\].*\\[([01])\\].*$");
String str;
Matcher m;
str = "[ro.multiboot]: [0]";
m = p.matcher (str);
if (m.find()) {
System.out.println ("str0 has " + m.group(1));
}
str = "[ro.multiboot]: [1]";
m = p.matcher (str);
if (m.find()) {
System.out.println ("str1 has " + m.group(1));
}
str = "[ro.multiboot]: [2]";
m = p.matcher (str);
if (m.find()) {
System.out.println ("str2 has " + m.group(1));
}
}
}
This results in (as expected):
str0 has 0
str1 has 1
@paxdiablo's regexps are correct, but a complete answer to "How do I just select 1 (it can also be 0) out of this string?" is:
1. very simple solution
String input = "[ro.multiboot]: [1]";
String matched = input.replaceFirst( "^.*\\[ro.multiboot\\].*\\[([01])\\].*$", "$1" );
2. same functionality, more complicated but with better performance
String input = "[ro.multiboot]: [1]";
Pattern p = Pattern.compile( "^.*\\[ro.multiboot\\].*\\[([01])\\].*$" );
Matcher m = p.matcher( input );
String matched = null;
if ( m.matches() ) matched = m.group( 1 );
Performance is better because the pattern is compiled just once (for example, when you are matching an array of such Strings).
Notes:
in both examples, the group is the part of the regexp between the unescaped ( and )
in Java source you have to write \\[, because \[ is a compile error: it is not a valid escape sequence in a String literal
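When the bracketed text is a fixed literal, another way to avoid hand-escaping is Pattern.quote, which makes [, ] and . match literally. A minimal sketch, assuming the line always has the shape "[ro.multiboot]: [0]" or "[ro.multiboot]: [1]" (the class name BracketEscapeDemo and the method flag are made up for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketEscapeDemo {
    // Pattern.quote turns the key into a literal, so its brackets and dot
    // need no manual escaping; only the value brackets are escaped by hand.
    private static final Pattern FLAG = Pattern.compile(
            Pattern.quote("[ro.multiboot]") + ":\\s*\\[([01])\\]");

    // Returns "0" or "1", or null when the line does not match.
    static String flag(String line) {
        Matcher m = FLAG.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(flag("[ro.multiboot]: [1]")); // 1
        System.out.println(flag("[ro.multiboot]: [0]")); // 0
        System.out.println(flag("[ro.multiboot]: [2]")); // null
    }
}
```

Since the dot in the key is quoted as well, a line such as "[roXmultiboot]: [1]" is not accepted.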
