How to put an argument in a RegEx? - java

I am trying to use a method argument inside a RegEx pattern, but I can't work out how, because the argument is a plain String that is contained in a bigger string. Here is the task itself, which I took from codingbat.com, so that everything is clear:
The precondition and explanation of the task:
Given a string and a non-empty word string, return a version of the
original String where all chars have been replaced by pluses ("+"),
except for appearances of the word string which are preserved
unchanged.
My code:
public String plusOut(String str, String word) {
if(str.matches(".*(<word>.*<word>){1,}.*") || str.matches(".*(<word>.*<word>.*<word>){1,}.*")) {
return str.replaceAll(".", "+"); //after finding the argument I can easily exclude it but for now I have a bigger problem in the if-condition
} else {
return str;
}
}
Is there a way in Java to match an argument? The above code doesn't work for obvious reasons (<word> is matched literally). How can I use the argument word in the RegEx string?
UPDATE
This is the closest I got but it works only for the last char of the word String.
public String plusOut(String str, String word)
{
if(str.matches(".*("+ word + ".*" + word + "){1,}.*") || str.matches(".*(" + word + ".*" + word + ".*" + word + "){1,}.*") || str.matches(".*("+ word + "){1,}.*"))
{
return str.replaceAll(".(?<!" + word + ")", "+");
} else {
return str;
}
}
Input/Output
plusOut("12xy34", "xy") → "+++y++" (Expected "++xy++")
plusOut("12xy34", "1") → "1+++++" (Expected "1+++++")
plusOut("12xy34xyabcxy", "xy") → "+++y+++y++++y" (Expected "++xy++xy+++xy")
It's because of the ? in the RegEx.

You can't do it with patterns alone; you'll have to write some code around the pattern. Try this (it needs imports for java.util.Arrays, java.util.regex.Matcher and java.util.regex.Pattern):
public static String plusOut(String input, String word) {
StringBuilder builder = new StringBuilder();
Pattern pattern = Pattern.compile(Pattern.quote(word));
Matcher matcher = pattern.matcher(input);
int start = 0;
while(matcher.find()) {
char[] replacement = new char[matcher.start() - start];
Arrays.fill(replacement, '+');
builder.append(new String(replacement)).append(word);
start = matcher.end();
}
if(start < input.length()) {
char[] replacement = new char[input.length() - start];
Arrays.fill(replacement, '+');
builder.append(new String(replacement));
}
return builder.toString();
}

You need to concatenate it using Java's + operator:
if(str.matches("<"+word+">")){ // Now word will be replaced by the value
//do Anything
}

You cannot place arguments inside the regex pattern. You can create a regex object by concatenating variables with the regex pattern parts like this:
public String plusOut(String str, String word)
{
if(str.matches(".*("+ word + ".*" + word + "){1,}.*") || str.matches(".*(" + word + ".*" + word + ".*" + word + "){1,}.*"))
{
return str.replaceAll(".", "+");
}
else
{
return str;
}
}
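Plain concatenation treats the contents of word as regex syntax, so a word containing characters such as ( or + would change the pattern. The following is a minimal sketch, not the approach of any answer above, that assumes Pattern.quote is acceptable for the task and uses an alternation to keep each occurrence of word while replacing every other character. The class name PlusOutSketch is made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlusOutSketch {
    static String plusOut(String str, String word) {
        // Pattern.quote makes every character of word literal. The alternation
        // tries the whole word first and falls back to any single character.
        Pattern pattern = Pattern.compile(Pattern.quote(word) + "|.", Pattern.DOTALL);
        Matcher matcher = pattern.matcher(str);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            // Keep the match if it is the word, otherwise emit a single plus.
            String replacement = matcher.group().equals(word) ? word : "+";
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }
}
```

With this sketch, PlusOutSketch.plusOut("12xy34", "xy") gives "++xy++", the expected value from the question.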

Related

Java - How to add "\\" infront of "(" or ")" in a String

I am reading a document and removing some words from it.
I have the following function:
//Takes a string and removes the word
private static String removeWord(String string, String word) {
if (string.contains(word)) {
String tempWord = word.trim();
string = string.replaceAll(tempWord, "");
}
return string;
}
I have the following issue when I try to replace for example:
Hello world (
Gives me the following Error:
Caused by: java.util.regex.PatternSyntaxException: Unclosed group near index 14
After some research I found out that this is because replaceAll() expects a regular expression, and parentheses are used to mark capturing groups in a regex.
So I did this:
private static String removeWord(String string, String word) {
if (string.contains(word)) {
String [] temp = word.split(" ");
word = "";
for (int i = 0; i < temp.length ; i++) {
if (temp[i].equals("(")){
word += " "+ "\\(";
}else if (temp[i].equals(")")){
word += " "+ "\\)";
} else {
word += temp[i] + " ";
}
}
String tempWord = word.trim();
string = string.replaceAll(tempWord, "");
}
return string;
}
This code isn't the best solution, because sometimes the string looks like (Hello world.
How can I improve this part of the code?
You seem to be trying to escape a regex manually. My advice is: Don't.
Even if you have successfully handled (), you still have a ton of other characters that have special meaning in regex to escape, such as *+[]\? just to name a few.
Luckily, there is a very convenient method called Pattern.quote that does this for you automatically:
private static String removeWord(String string, String word) {
if (string.contains(word)) {
String tempWord = word.trim();
string = string.replaceAll(Pattern.quote(tempWord), "");
}
return string;
}
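As a small sketch of the same idea: Pattern.quote turns the word into a literal pattern for replaceAll, and String.replace (which never interprets its argument as a regex) is an alternative when no regex features are needed. The class and method names below are made up for illustration.

```java
import java.util.regex.Pattern;

class RemoveWordSketch {
    // Regex-based removal; the quoted word is matched character for character.
    static String removeWordQuoted(String text, String word) {
        return text.replaceAll(Pattern.quote(word.trim()), "");
    }

    // Literal removal; String.replace does not treat its argument as a regex.
    static String removeWordLiteral(String text, String word) {
        return text.replace(word.trim(), "");
    }
}
```

Both methods accept a word such as "world (" without throwing a PatternSyntaxException.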
private static String removeWord(String string, String word) {
return string.replaceFirst("\\W+" + word + "\\W+","");
}
\W matches a non-word character. You can also use replaceAll if you want to replace all occurrences, and if you want to replace a specific number of occurrences you can call replaceFirst in a loop.

Recursion for printing out words in sentence backwards

I'm working on getting a method that prints the words of a sentence out backwards. I'm very close to a solution but am hitting a minor snag.
Here is my code:
public static String reverseString(String str) {
if (str.equals(""))
return "";
else {
int i = str.length() - 1;
while (!Character.isWhitespace(str.charAt(i))) {
if (i - 1 < 0)
break;
i--;
}
return str.substring(i,str.length()) + reverseString(str.substring(0,i));
}
}
Now the problem is that the output from my test:
String test = "This is a test.";
System.out.println(reverseString(test));
Is giving me this back:
test. a isThis
Now, when I try to bump up the index of the substring being returned and add in the spaces manually, it cuts off the "T" in "This". That is, if I decide to instead return as follows:
return str.substring(i+1,str.length()) + " " + reverseString(str.substring(0,i));
then I get back
test. a is his
Does anyone have any advice or pointers on my implementation in general?
You can change the return statement to this:
return str.substring(i, str.length()).trim() + " " + reverseString(str.substring(0, i));
Split the sentence using String.split and then iterate over the resulting array backwards. To split at whitespace do
test.split(" +");
The split method takes a regular expression, and the one above means: split at one or more consecutive spaces.
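The split-and-iterate idea can be sketched as follows. This is only an illustration under the assumption that words are separated by plain spaces; the class and method names are made up.

```java
class ReverseWordsSketch {
    static String reverseWords(String sentence) {
        // trim() avoids an empty first element when the sentence starts with spaces.
        String[] words = sentence.trim().split(" +");
        StringBuilder reversed = new StringBuilder();
        // Walk the array from the last word to the first.
        for (int i = words.length - 1; i >= 0; i--) {
            reversed.append(words[i]);
            if (i > 0) {
                reversed.append(' ');
            }
        }
        return reversed.toString();
    }
}
```

For the test sentence from the question, ReverseWordsSketch.reverseWords("This is a test.") returns "test. a is This".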
Recursive approach:
public String reverse(final String s) {
final int pos = s.indexOf(' ');
if (pos > -1) {
return reverse(s.substring(pos + 1).trim()) + " " + s.substring(0, pos).trim();
}
return s;
}
This approach builds each substring selectively, based on whether the character at the split position is whitespace. For the input This is a test. the method below returns test. a is This. The idea is that a leading space on the extracted word is converted into a trailing space.
public static String reverseString(String str) {
if (str.equals(""))
return "";
else {
int i = str.length() - 1;
while (!Character.isWhitespace(str.charAt(i))) {
if (i - 1 < 0)
break;
i--;
}
String substring;
if(Character.isWhitespace(str.charAt(i)))
{
substring= str.substring(i+1,str.length())+" ";
}
else
{
substring= str.substring(i,str.length());
}
return substring + reverseString(str.substring(0,i));
}
}
Working with your code you would just need to add an additional space in front of whatever string you want to reverse such as with this code
reverseString(" " + str)
when you first execute the method.

How to remove only trailing spaces of a string in Java and keep leading spaces?

The trim() function removes both the trailing and leading space, however, if I only want to remove the trailing space of a string, how can I do it?
Since JDK 11
If you are on JDK 11 or higher you should probably be using stripTrailing().
Earlier JDK versions
Using the regular expression \s++$, you can replace all trailing space characters (includes space and tab characters) with the empty string ("").
final String text = " foo ";
System.out.println(text.replaceFirst("\\s++$", ""));
Output
foo
Here's a breakdown of the regex:
\s – any whitespace character,
++ – match one or more of the previous token (possessively); i.e., match one or more whitespace character. The + pattern is used in its possessive form ++, which takes less time to detect the case when the pattern does not match.
$ – the end of the string.
Thus, the regular expression will match as much whitespace as it can that is followed directly by the end of the string: in other words, the trailing whitespace.
The investment into learning regular expressions will become more valuable, if you need to extend your requirements later on.
Another option is to use Apache Commons StringUtils, specifically StringUtils.stripEnd
String stripped = StringUtils.stripEnd(" my lousy string "," ");
I modified the original java.lang.String.trim() method a bit and it should work:
public String trim(String str) {
int len = str.length();
int st = 0;
char[] val = str.toCharArray();
while ((st < len) && (val[len - 1] <= ' ')) {
len--;
}
return str.substring(st, len);
}
Test:
Test test = new Test();
String sample = " Hello World "; // A String with trailing and leading spaces
System.out.println(test.trim(sample) + " // No trailing spaces left");
Output:
Hello World // No trailing spaces left
As of JDK11 you can use stripTrailing:
String result = str.stripTrailing();
The most practical answer is @Micha's; Ahmad's is the reverse of what you wanted. Here's what I came up with, in case you'd prefer not to use unfamiliar tools or want to see a concrete approach.
public String trimEnd( String myString ) {
for ( int i = myString.length() - 1; i >= 0; --i ) {
if ( myString.charAt(i) == ' ' ) {
continue;
} else {
myString = myString.substring( 0, ( i + 1 ) );
break;
}
}
return myString;
}
Used like:
public static void main( String[] args ) {
String s = " Some text here ";
System.out.println( s + "|" );
s = trimEnd( s );
System.out.println( s + "|" );
}
Output:
Some text here |
Some text here|
The best way in my opinion:
public static String trimEnd(String source) {
int pos = source.length() - 1;
while ((pos >= 0) && Character.isWhitespace(source.charAt(pos))) {
pos--;
}
pos++;
return (pos < source.length()) ? source.substring(0, pos) : source;
}
This does not allocate any temporary object to do the job and is faster than using a regular expression. Also it removes all whitespaces, not just ' '.
Here's a very short, efficient and easy-to-read version:
public static String trimTrailing(String str) {
if (str != null) {
for (int i = str.length() - 1; i >= 0; --i) {
if (str.charAt(i) != ' ') {
return str.substring(0, i + 1);
}
}
}
return str;
}
As an alternative to str.charAt(i) != ' ' you can also use !Character.isWhitespace(str.charAt(i)) if you want to use a broader definition of whitespace.
The Spring framework provides a useful method, org.springframework.util.StringUtils.trimTrailingWhitespace.
This code is intended to be read as easily as possible by using descriptive names (and avoiding regular expressions).
It does use Java 8's Optional so is not appropriate for everyone.
public static String removeTrailingWhitspace(String string) {
while (hasWhitespaceLastCharacter(string)) {
string = removeLastCharacter(string);
}
return string;
}
private static boolean hasWhitespaceLastCharacter(String string) {
return getLastCharacter(string)
.map(Character::isWhitespace)
.orElse(false);
}
private static Optional<Character> getLastCharacter(String string) {
if (string.isEmpty()) {
return Optional.empty();
}
return Optional.of(string.charAt(string.length() - 1));
}
private static String removeLastCharacter(String string) {
if (string.isEmpty()) {
throw new IllegalArgumentException("String must not be empty");
}
return string.substring(0, string.length() - 1);
}
String value= "Welcome to java ";
So we can use
value = value.trim();

How to get a string between two characters?

I have a string,
String s = "test string (67)";
I want to get the number 67, which is the string between ( and ).
Can anyone please tell me how to do this?
There's probably a really neat RegExp, but I'm noob in that area, so instead...
String s = "test string (67)";
s = s.substring(s.indexOf("(") + 1);
s = s.substring(0, s.indexOf(")"));
System.out.println(s);
A very useful solution to this issue, which doesn't require you to call indexOf yourself, is to use the Apache Commons libraries.
StringUtils.substringBetween(s, "(", ")");
This method even handles the case where the closing string occurs multiple times, which is not easy to do by searching for the closing string with indexOf.
You can download this library from here:
https://mvnrepository.com/artifact/org.apache.commons/commons-lang3/3.4
Try it like this
String s="test string(67)";
String requiredString = s.substring(s.indexOf("(") + 1, s.indexOf(")"));
The method's signature for substring is:
s.substring(int start, int end);
By using regular expression :
String s = "test string (67)";
Pattern p = Pattern.compile("\\(.*?\\)");
Matcher m = p.matcher(s);
if(m.find())
System.out.println(m.group().subSequence(1, m.group().length()-1));
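As a variation on the regex above, a capturing group can return the inner text directly instead of trimming the parentheses off the full match. This is a minimal sketch; the class and method names are made up, and returning null when nothing matches is an assumption about the desired behaviour.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BetweenParensSketch {
    static String firstBetweenParens(String s) {
        // Group 1 captures, non-greedily, whatever sits between ( and ).
        Matcher m = Pattern.compile("\\((.*?)\\)").matcher(s);
        return m.find() ? m.group(1) : null;
    }
}
```

BetweenParensSketch.firstBetweenParens("test string (67)") returns "67".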
Java supports Regular Expressions, but they're kind of cumbersome if you actually want to use them to extract matches. I think the easiest way to get at the string you want in your example is to just use the Regular Expression support in the String class's replaceAll method:
String x = "test string (67)".replaceAll(".*\\(|\\).*", "");
// x is now the String "67"
This simply deletes everything up-to-and-including the first (, and the same for the ) and everything thereafter. This just leaves the stuff between the parenthesis.
However, the result of this is still a String. If you want an integer result instead then you need to do another conversion:
int n = Integer.parseInt(x);
// n is now the integer 67
In a single line, I suggest:
String input = "test string (67)";
input = input.substring(input.indexOf("(")+1, input.lastIndexOf(")"));
System.out.println(input);
You could use apache common library's StringUtils to do this.
import org.apache.commons.lang3.StringUtils;
...
String s = "test string (67)";
s = StringUtils.substringBetween(s, "(", ")");
....
Take a test String such as test string (67), from which you need to get the String that is nested in between two Strings.
String str = "test string (67) and (77)", open = "(", close = ")";
Some possible ways are listed below. Simple generic solution:
String subStr = str.substring(str.indexOf( open ) + 1, str.indexOf( close ));
System.out.format("String[%s] Parsed IntValue[%d]\n", subStr, Integer.parseInt( subStr ));
Apache Commons Lang 3: the StringUtils class's substringBetween() function gets the String that is nested in between two Strings. Only the first match is returned.
String substringBetween = StringUtils.substringBetween(str, open, close);
System.out.println("Commons Lang3 : "+ substringBetween);
Replaces the given String with the String which is nested in between two Strings.
Pattern with Regular-Expressions: (\()(.*?)(\)).*
The Dot Matches (Almost) Any Character
.? = .{0,1}, .* = .{0,}, .+ = .{1,}
String patternMatch = patternMatch(generateRegex(open, close), str);
System.out.println("Regular expression Value : "+ patternMatch);
Regular-Expression with the utility class RegexUtils and some functions.
Pattern.DOTALL: the dot matches any character, including a line terminator.
Pattern.MULTILINE: ^ and $ match at the start and end of each line, not only at the start and end of the entire input sequence.
public static String generateRegex(String open, String close) {
return "(" + RegexUtils.escapeQuotes(open) + ")(.*?)(" + RegexUtils.escapeQuotes(close) + ").*";
}
public static String patternMatch(String regex, CharSequence string) {
final Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);
final Matcher matcher = pattern .matcher(string);
String returnGroupValue = null;
if (matcher.find()) { // while() { Pattern.MULTILINE }
System.out.println("Full match: " + matcher.group(0));
System.out.format("Character Index [Start:End]«[%d:%d]\n",matcher.start(),matcher.end());
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
if( i == 2 ) returnGroupValue = matcher.group( 2 );
}
}
return returnGroupValue;
}
String s = "test string (67)";
int start = 0; // '(' position in string
int end = 0; // ')' position in string
for(int i = 0; i < s.length(); i++) {
if(s.charAt(i) == '(') // Looking for '(' position in string
start = i;
else if(s.charAt(i) == ')') // Looking for ')' position in string
end = i;
}
String number = s.substring(start+1, end); // you take value between start and end
String result = s.substring(s.indexOf("(") + 1, s.indexOf(")"));
public String getStringBetweenTwoChars(String input, String startChar, String endChar) {
try {
int start = input.indexOf(startChar);
if (start != -1) {
int end = input.indexOf(endChar, start + startChar.length());
if (end != -1) {
return input.substring(start + startChar.length(), end);
}
}
} catch (Exception e) {
e.printStackTrace();
}
return input; // return null; || return "" ;
}
Usage :
String input = "test string (67)";
String startChar = "(";
String endChar = ")";
String output = getStringBetweenTwoChars(input, startChar, endChar);
System.out.println(output);
// Output: "67"
Another way of doing it, using the split method:
public static void main(String[] args) {
String s = "test string (67)";
String[] ss;
ss= s.split("\\(");
ss = ss[1].split("\\)");
System.out.println(ss[0]);
}
Use Pattern and Matcher
public class Chk {
public static void main(String[] args) {
String s = "test string (67)";
ArrayList<String> arL = new ArrayList<String>();
ArrayList<String> inL = new ArrayList<String>();
Pattern pat = Pattern.compile("\\(\\w+\\)");
Matcher mat = pat.matcher(s);
while (mat.find()) {
arL.add(mat.group());
System.out.println(mat.group());
}
for (String sx : arL) {
Pattern p = Pattern.compile("(\\w+)");
Matcher m = p.matcher(sx);
while (m.find()) {
inL.add(m.group());
System.out.println(m.group());
}
}
System.out.println(inL);
}
}
The "generic" way of doing this is to parse the string from the start, throwing away all the characters before the first bracket, recording the characters after the first bracket, and throwing away the characters after the second bracket.
I'm sure there's a regex library or something to do it though.
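The parsing approach described above can be sketched like this. It is only an illustration with made-up names, and returning null when no complete pair of brackets is found is an assumption.

```java
class BracketScanSketch {
    static String between(String input, char open, char close) {
        StringBuilder recorded = new StringBuilder();
        boolean recording = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (!recording) {
                // Throw away everything up to and including the first opening bracket.
                if (c == open) {
                    recording = true;
                }
            } else if (c == close) {
                // The closing bracket ends the scan; the rest is ignored.
                return recorded.toString();
            } else {
                recorded.append(c);
            }
        }
        return null; // no complete open...close pair was found
    }
}
```

BracketScanSketch.between("test string (67)", '(', ')') returns "67".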
The least generic way I found to do this with Regex and Pattern / Matcher classes:
String text = "test string (67)";
String START = "\\("; // A literal "(" character in regex
String END = "\\)"; // A literal ")" character in regex
// Captures the word character(s) between the above two character(s)
String regex = START + "(\\w+)" + END;
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(text);
while(matcher.find()) {
// group(1) is the captured text without the surrounding parentheses
System.out.println(matcher.group(1));
}
This may help for more complex regex problems where you want to get the text between two set of characters.
Another possible solution is to use lastIndexOf, which searches for a character or String starting from the end.
In my scenario, I had following String and I had to extract <<UserName>>
1QAJK-WKJSH_MyApplication_Extract_<<UserName>>.arc
So, indexOf and StringUtils.substringBetween were not helpful, as they start searching from the beginning.
So, I used lastIndexOf
String str = "1QAJK-WKJSH_MyApplication_Extract_<<UserName>>.arc";
String userName = str.substring(str.lastIndexOf("_") + 1, str.lastIndexOf("."));
And, it gives me
<<UserName>>
String s = "test string (67)";
System.out.println(s.substring(s.indexOf("(")+1,s.indexOf(")")));
Something like this:
public static String innerSubString(String txt, char prefix, char suffix) {
if(txt != null && txt.length() > 1) {
int start = 0, end = 0;
char token;
for(int i = 0; i < txt.length(); i++) {
token = txt.charAt(i);
if(token == prefix)
start = i;
else if(token == suffix)
end = i;
}
if(start + 1 < end)
return txt.substring(start+1, end);
}
return null;
}
This is simple: use the \D+ regex and the job is done.
It selects all characters except digits, so there is no need to complicate things:
/\D+/
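In Java, that idea might look like the sketch below, which removes every run of non-digits with replaceAll. The class and method names are made up. Note that if the input contains several numbers, their digits are joined together.

```java
class DigitsOnlySketch {
    static String digitsOnly(String s) {
        // \D+ matches one or more non-digit characters; each run is deleted.
        return s.replaceAll("\\D+", "");
    }
}
```

DigitsOnlySketch.digitsOnly("test string (67)") returns "67".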
This returns the original string if the regex does not match:
var iAm67 = "test string (67)".replaceFirst("test string \\((.*)\\)", "$1");
Add a matches check to the code to detect that case:
String str = "test string (67)";
String regx = "test string \\((.*)\\)";
if (str.matches(regx)) {
var iAm67 = str.replaceFirst(regx, "$1");
}
---EDIT---
I use https://www.freeformatter.com/java-regex-tester.html#ad-output to test regexes.
It turns out it is better to add ? after * so that the match is as short as possible (non-greedy), like this:
String str = "test string (67)(69)";
String regx1 = "test string \\((.*)\\).*";
String regx2 = "test string \\((.*?)\\).*";
String ans1 = str.replaceFirst(regx1, "$1");
String ans2 = str.replaceFirst(regx2, "$1");
System.out.println("ans1:"+ans1+"\nans2:"+ans2);
// ans1:67)(69
// ans2:67
String s = "(69)";
System.out.println(s.substring(s.lastIndexOf('(')+1,s.lastIndexOf(')')));
A small extension to the top (MadProgrammer) answer:
public static String getTextBetween(final String wholeString, final String str1, String str2){
String s = wholeString.substring(wholeString.indexOf(str1) + str1.length());
s = s.substring(0, s.indexOf(str2));
return s;
}

Search and replace words in Java

I have a string, with characters a-z, A-Z, 0-9, (, ), +, -, etc.
I want to find every word within that string and replace it with the same word with 'word' (single quotes added). Words in that string can be preceded/followed by "(", ")", and spaces.
How do I go about doing that?
Input:
(Movie + 2000)
Output:
('Movie' + '2000')
Keep it simple! This does what you need:
String input = "(Movie + 2000)";
input = input.replaceAll("\\b", "'");
// input is now "('Movie' + '2000')"
This uses the regex \b, which is a "word boundary". What could be simpler?
As stated in the comments, regex is a good way to go:
String input = "(Movie + 2000)";
input = input.replaceAll("[A-Za-z0-9]+", "'$0'");
You don't give a precise definition of 'word', so I assume it is any combination of letters and numbers.
EDIT: OK, thanks to @Buhb for explaining why this solution is not the best one. A better solution was given by @Bohemian.
public class Main {
/**
* @param args
*/
public static void main(String[] args) {
String str1 = "Hello string";
String str2 = "str";
System.out.println(replace(str1, str2, "'" + str2 + "'"));
}
static String replace(String str, String pattern, String replace) {
int s = 0;
int e = 0;
StringBuffer result = new StringBuffer();
while ((e = str.indexOf(pattern, s)) >= 0) {
result.append(str.substring(s, e));
result.append(replace);
s = e + pattern.length();
}
result.append(str.substring(s));
return result.toString();
}
}
Output: Hello 'str'ing
It makes sense to return string only if a replacement took place, see below:
if(s>0)
return result.toString();
else
return null;
