I work with strings a lot in my programs.
Is there a more efficient way to write this line of Java code:
String str2 = str.replaceAll("\\s+", " ").trim();
You could try using a precompiled pattern:
private Pattern p = Pattern.compile( "\\s+" );
And then use it like so:
str2 = p.matcher( str.trim() ).replaceAll( " " );
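A more complex version that doesn't require trimming (note that it removes only whitespace that is followed by a literal space, so whitespace runs that end in a tab or newline are not fully collapsed):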
private Pattern p = Pattern.compile( "^\\s+|\\s+(?= )|\\s+$" );
str2 = p.matcher( str ).replaceAll( "" ); // no space
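As a rough sketch of the precompiled-pattern idea, the helper below keeps the pattern in a static final field so it is compiled once and reused across calls. The class and method names (WhitespaceNormalizer, normalize) are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Pattern;

class WhitespaceNormalizer {
    // Compiled once; Pattern instances are immutable and can be shared.
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    // Collapses every run of whitespace to one space and strips both ends.
    static String normalize(String str) {
        return WHITESPACE_RUN.matcher(str.trim()).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println("[" + normalize("  a \t b\n\nc  ") + "]"); // prints [a b c]
    }
}
```

Whether this is measurably faster than String.replaceAll depends on how often the call runs; the gain comes only from skipping the repeated Pattern.compile.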
I have a use case where I have to treat any character, escaped or unescaped, as the delimiter for splitting a sentence. So far the unescaped/escaped delimiters we have are:
" " (space), "\\t", "|", "\\|", ";", "\\;", "," etc.
This works so far with a regex defined as:
String delimiter = " ";
String regex = "(?:\\\\.|[^"+ delimiter +"\\\\]++)*";
The input string is :
String input = "234|Tamarind|something interesting ";
Now, below is the code that splits and prints:
Pattern pattern = Pattern.compile( regex ); // compile the regex defined above
List<String> matchList = new ArrayList<>( );
Matcher regexMatcher = pattern.matcher( input );
while ( regexMatcher.find() )
{
matchList.add( regexMatcher.group() );
}
System.out.println( "Unescaped/escaped test result with size: " + matchList.size() );
matchList.stream().forEach( System.out::println );
However, extra strings (empty ones that print as blank lines) are being stored unexpectedly. So the output looks like:
Unescaped/escaped test result with size: 5
234|Tamarind|something
interesting
.
Is there a better way to do this so that there won't be any extra strings?
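It is easy: make sure each match contains at least one character. With *, the pattern can also match the empty string, which it does at every delimiter and at the end of the input; those empty matches are the extra entries. So remove the possessive ++ quantifier inside the group and replace the outer * with +. See the regex demo.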
Full Java demo:
String delimiter = " ";
String regex = "(?:\\\\.|[^"+ delimiter +"\\\\])+";
// System.out.println(regex); // => (?:\\.|[^ \\])+
Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);
String input = "234|Tamarind|something interesting ";
List<String> matchList = new ArrayList<>( );
Matcher regexMatcher = pattern.matcher( input );
while ( regexMatcher.find() )
{
// System.out.println("'"+regexMatcher.group()+"'");
matchList.add( regexMatcher.group() );
}
System.out.println( "Unescaped/escaped test result with size: " + matchList.size() );
matchList.stream().forEach( System.out::println );
Output:
Unescaped/escaped test result with size: 2
234|Tamarind|something
interesting
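The same fix can be wrapped in a small helper that accepts other single-character delimiters. This is only a sketch: the class name EscapedSplitter and the rule of backslash-escaping any non-alphanumeric delimiter inside the character class are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedSplitter {
    // Returns the non-empty tokens separated by an unescaped delimiter.
    // A backslash followed by any character stays inside the token.
    static List<String> split(String input, char delimiter) {
        // Backslash-escape punctuation so it is literal inside the character class.
        String d = Character.isLetterOrDigit(delimiter)
                ? String.valueOf(delimiter)
                : "\\" + delimiter;
        Pattern pattern = Pattern.compile("(?:\\\\.|[^" + d + "\\\\])+", Pattern.DOTALL);
        List<String> tokens = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("234|Tamarind|something interesting ", ' '));
        System.out.println(split("a\\|b|c", '|'));
    }
}
```

Because the quantifier is +, a delimiter at the start or end of the input, or two delimiters in a row, produce no empty tokens.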
I have the following data, which I would like to match with a regular expression.
PALMKERNEL OIL Mal/Indo dlrs tonne cif Rotterdam
Dec15/Jan16 890.00
Jan16/Feb16 900.00 +10.00
My code below doesn't seem to work. First, how do I express that after the 890 there could be either nothing or a +10.00 (or any number in that format)? I tried to use ?: but sometimes it completely ignores the month information that I am trying to capture. In this case I do not want to capture the +10.00 or any characters after the price of 890 or 900.
(PALMKERNEL OIL Mal\b)/(Indo dlrs tonne cif Rotterdam\b)\s*([^\s]+)\s*(\d*.?\d*)\s*([^\s]+|[+\d*.?\d*])\s*(\d*.?\d*)\s*([^\s]+|(?:[+\d*.?\d*]))
For the part with the dates and prices, this regular expression handles the two variants in your sample string.
Pattern pat = Pattern.compile(
"\\w{3}\\d{1,2}/\\w{3}\\d{1,2}" +
"\\s*(\\d+\\.\\d\\d)(\\s+\\+\\d+\\.\\d\\d)?" );
String s1 = "Dec15/Jan16 890.00";
String s2 = "Jan16/Feb16 900.00 +10.00";
Matcher m1 = pat.matcher( s1 );
if( m1.matches() )
System.out.println("m1 " + m1.group(1) + ":" + m1.group(2) );
Matcher m2 = pat.matcher( s2 );
if( m2.matches() )
System.out.println("m2 " + m2.group(1) + ":" + m2.group(2) );
Output:
m1 890.00:null
m2 900.00: +10.00
There isn't enough information in the question to tell whether a third variant can appear in place of +10.00.
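Since the question asks for the month information and the price but not the trailing change, one possible variation is sketched below. It captures both month/year tokens and the price, and matches the change in a non-capturing group. The class name PriceLineParser is hypothetical, and accepting a minus sign in the change is an assumption; the sample data only shows a plus.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PriceLineParser {
    // Groups: 1 = first month/year, 2 = second month/year, 3 = price.
    // The trailing change (e.g. +10.00) is matched but not captured.
    // Allowing '-' there is an assumption; the sample data only shows '+'.
    private static final Pattern LINE = Pattern.compile(
        "([A-Za-z]{3}\\d{2})/([A-Za-z]{3}\\d{2})\\s+(\\d+\\.\\d{2})(?:\\s+[+-]\\d+\\.\\d{2})?");

    // Returns {firstMonth, secondMonth, price}, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] lines = { "Dec15/Jan16 890.00", "Jan16/Feb16 900.00 +10.00" };
        for (String line : lines) {
            System.out.println(String.join(" | ", parse(line)));
        }
    }
}
```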
I am trying to do a replacement using regex. The relevant piece of code is as follows:
String msg =" <ClientVerificationResult>\n " +
" <VerificationIDCheck>Y</VerificationIDCheck>\n" +
" </ClientVerificationResult>\n";
String regex = "(<VerificationIDCheck>)([Y|N])(</VerificationIDCheck>)";
String replacedMsg= msg.replaceAll(regex, "$2".matches("Y") ? "$1YES$3" : "$1NO$3") ;
System.out.println(replacedMsg);
The output of this is
<ClientVerificationResult>
<VerificationIDCheck>NO</VerificationIDCheck>
</ClientVerificationResult>
When it should be
<ClientVerificationResult>
<VerificationIDCheck>YES</VerificationIDCheck>
</ClientVerificationResult>
I guess the problem is that "$2".matches("Y") is returning false. I have tried doing "$2".equals("Y"); and weird combinations inside matches() like "[Y]" or "([Y])", but still nothing.
If I print "$2" the output is Y. Any hints on what I am doing wrong?
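You cannot use Java code as the replacement argument of replaceAll; it must be a plain string. The ternary expression is evaluated once, before replaceAll is even called, so it never sees the captured text. Better to use the Pattern and Matcher APIs and evaluate matcher.group(2) in your replacement logic.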
Suggested Code:
String msg =" <ClientVerificationResult>\n " +
" <VerificationIDCheck>Y</VerificationIDCheck>\n" +
" </ClientVerificationResult>\n";
String regex = "(<VerificationIDCheck>)([YN])(</VerificationIDCheck>)";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher( msg );
StringBuffer sb = new StringBuffer();
while (m.find()) {
String repl = m.group(2).matches("Y") ? "YES" : "NO";
m.appendReplacement(sb, m.group(1) + repl + m.group(3));
}
m.appendTail(sb);
System.out.println(sb); // replaced string
You are checking the literal string "$2" to see if it matches "Y". This will never happen.
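On Java 9 or later, the same per-match logic can be written more compactly with the Matcher.replaceAll overload that takes a function. The sketch below assumes that JDK level; the class name VerificationFlagExpander is invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VerificationFlagExpander {
    private static final Pattern FLAG =
        Pattern.compile("(<VerificationIDCheck>)([YN])(</VerificationIDCheck>)");

    // Replaces Y with YES and N with NO inside the VerificationIDCheck element.
    static String expand(String msg) {
        // The lambda runs once per match, so group(2) is the actual captured text.
        // quoteReplacement keeps any '$' or '\' in the result from being reinterpreted.
        return FLAG.matcher(msg).replaceAll(mr ->
            Matcher.quoteReplacement(
                mr.group(1) + (mr.group(2).equals("Y") ? "YES" : "NO") + mr.group(3)));
    }

    public static void main(String[] args) {
        System.out.println(expand("<VerificationIDCheck>Y</VerificationIDCheck>"));
        System.out.println(expand("<VerificationIDCheck>N</VerificationIDCheck>"));
    }
}
```

On Java 8 this overload does not exist, so the appendReplacement loop remains the way to go there.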
I have a string which is in following format:
I am extracting this Hello:A;B;C, also Hello:D;E;F
How do I extract the strings A;B;C and D;E;F?
I have written the code snippet below, but it is not able to extract the last match, D;E;F:
Pattern pattern = Pattern.compile("(?<=Hello:).*?(?=,)");
The $ anchor matches at the end of the input (or at the end of each line when Pattern.MULTILINE is set).
Thus this should work:
Pattern pattern = Pattern.compile("(?<=Hello:).*?(?=,|$)");
So you look ahead for either a comma or the end of the input.
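A minimal, self-contained sketch of that pattern in use, collecting every match into a list (the class name HelloExtractor is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HelloExtractor {
    // Text preceded by "Hello:" and followed by a comma or the end of the input.
    private static final Pattern AFTER_HELLO = Pattern.compile("(?<=Hello:).*?(?=,|$)");

    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = AFTER_HELLO.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [A;B;C, D;E;F]
        System.out.println(extract("I am extracting this Hello:A;B;C, also Hello:D;E;F"));
    }
}
```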
Try this:
String test = "I am extracting this Hello:Word;AnotherWord;YetAnotherWord, also Hello:D;E;F";
// any word optionally followed by ";" three times, the whole thing followed by either two non-word characters or EOL
Pattern pattern = Pattern.compile("(\\w+;?){3}(?=\\W{2,}|$)");
Matcher matcher = pattern.matcher(test);
while (matcher.find()) {
System.out.println(matcher.group());
}
Output:
Word;AnotherWord;YetAnotherWord
D;E;F
Assuming you mean omitting certain patterns in a string:
String s = "I am extracting this Hello:A;B;C, also Hello:D;E;F" ;
ArrayList<String> tokens = new ArrayList<String>();
tokens.add( "A;B;C" );
tokens.add( "D;E;F" );
for( String tok : tokens )
{
if( s.contains( tok ) )
{
s = s.replace( tok, "");
}
}
System.out.println( s );
I'm working on strings like "[ro.multiboot]: [1]". How do I just select 1(it can also be 0) out of this string?
I am looking for a regex in Java.
Usually, you would do something like (assuming 0 and 1 were the only options):
^.*\[([01])\].*$
If you only wanted the value for ro.multiboot, you could change it to something like:
^.*\[ro.multiboot\].*\[([01])\].*$
(depending on how complex any of the non-bracketed stuff is allowed to be).
Both extract the bracketed value only if it is zero or one, and capture it in a group so you can use it. Note that the unescaped dot in ro.multiboot matches any character; write ro\.multiboot to match a literal dot.
Of course, regex syntax is not standardized across tools, and neither are the environments you use it in, so how you actually code this up depends a lot on your environment.
For Java, the following sample program may help:
import java.util.regex.*;
class Test {
public static void main(String args[]) {
Pattern p = Pattern.compile("^.*\\[ro.multiboot\\].*\\[([01])\\].*$");
String str;
Matcher m;
str = "[ro.multiboot]: [0]";
m = p.matcher (str);
if (m.find()) {
System.out.println ("str0 has " + m.group(1));
}
str = "[ro.multiboot]: [1]";
m = p.matcher (str);
if (m.find()) {
System.out.println ("str1 has " + m.group(1));
}
str = "[ro.multiboot]: [2]";
m = p.matcher (str);
if (m.find()) {
System.out.println ("str2 has " + m.group(1));
}
}
}
This results in (as expected):
str0 has 0
str1 has 1
@paxdiablo's regexes are correct, but a complete answer to "How do I just select 1(it can also be 0) out of this string?" is:
1. very simple solution
String input = "[ro.multiboot]: [1]";
String matched = input.replaceFirst( "^.*\\[ro.multiboot\\].*\\[([01])\\].*$", "$1" );
2. same functionality, more complicated but with better performance
String input = "[ro.multiboot]: [1]";
Pattern p = Pattern.compile( "^.*\\[ro.multiboot\\].*\\[([01])\\].*$" );
Matcher m = p.matcher( input );
String matched = null;
if ( m.matches() ) matched = m.group( 1 );
Performance is better as long as the compiled Pattern is reused, because the pattern is then compiled just once (for example, when you are matching an array of such strings).
Notes:
in both examples the group is the part of the regex between ( and ) (when the parentheses are not escaped)
in a Java string literal you have to write \\[, because \[ is not a valid string escape sequence and fails to compile
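Putting these notes together, a stricter variant might look like the sketch below. It assumes each line has exactly the shape "[ro.multiboot]: [N]"; the class name MultibootReader and the -1 return value for non-matching lines are illustrative choices, not part of the answers above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultibootReader {
    // The dot is escaped so it matches a literal '.'.
    // Compiled once so it can be reused for many lines.
    private static final Pattern MULTIBOOT =
        Pattern.compile("^\\[ro\\.multiboot\\]:\\s*\\[([01])\\]$");

    // Returns 0 or 1, or -1 when the line is not a ro.multiboot entry with such a value.
    static int read(String line) {
        Matcher m = MULTIBOOT.matcher(line.trim());
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        String[] lines = { "[ro.multiboot]: [0]", "[ro.multiboot]: [1]", "[ro.multiboot]: [2]" };
        for (String line : lines) {
            System.out.println(line + " -> " + read(line));
        }
    }
}
```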