Search and replace words in Java

I have a string, with characters a-z, A-Z, 0-9, (, ), +, -, etc.
I want to find every word within that string and replace it with the same word with 'word' (single quotes added). Words in that string can be preceded/followed by "(", ")", and spaces.
How do I go about doing that?
Input:
(Movie + 2000)
Output:
('Movie' + '2000')

Keep it simple! This does what you need:
String input = "(Movie + 2000)";
input = input.replaceAll("\\b", "'"); // replaceAll returns a new String, so assign the result
// input is now "('Movie' + '2000')"
This uses the regex \b, which is a "word boundary". What could be simpler?

As stated in the comments, regex is a good way to go:
String input = "(Movie + 2000)";
input = input.replaceAll("[A-Za-z0-9]+", "'$0'");
You don't give a precise definition of 'word', so I assume it is any combination of letters and numbers.
EDIT OK, thanks to @Buhb for explaining why this solution is not the best one. A better solution was given by @Bohemian.
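To compare the two regex answers side by side, here is a minimal sketch (the class and method names are made up for illustration). Both give the same result for the sample input; one visible difference is that \b treats an underscore as part of a word, while the character class [A-Za-z0-9]+ does not.

```java
final class WordQuoter {

    // Wraps every run of ASCII letters and digits in single quotes.
    // $0 in the replacement refers to the whole match.
    static String quoteAlphanumericRuns(String input) {
        return input.replaceAll("[A-Za-z0-9]+", "'$0'");
    }

    // Inserts a single quote at every word boundary.
    // Here an underscore counts as a word character.
    static String quoteAtWordBoundaries(String input) {
        return input.replaceAll("\\b", "'");
    }
}
```

For "(Movie + 2000)" both methods return "('Movie' + '2000')". For "(foo_bar)" the first returns "('foo'_'bar')" and the second returns "('foo_bar')", so which one is right depends on what you count as a word.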

public class Main {
/**
* @param args
*/
public static void main(String[] args) {
String str1 = "Hello string";
String str2 = "str";
System.out.println(replace(str1, str2, "'" + str2 + "'"));
}
static String replace(String str, String pattern, String replace) {
int s = 0;
int e = 0;
StringBuffer result = new StringBuffer();
while ((e = str.indexOf(pattern, s)) >= 0) {
result.append(str.substring(s, e));
result.append(replace);
s = e + pattern.length();
}
result.append(str.substring(s));
return result.toString();
}
}
Output: Hello 'str'ing
WBR

It makes sense to return string only if a replacement took place, see below:
if(s>0)
return result.toString();
else
return null;

Related

How to put an argument in a RegEx?

So, I am trying to use an argument in a RegEx pattern, and I can't work out a pattern because the argument is a simple String that is contained in the bigger string. Here is the task itself, which I took from codingbat.com, so that everything is clear:
THE Precondition and explanation of the task.
Given a string and a non-empty word string, return a version of the
original String where all chars have been replaced by pluses ("+"),
except for appearances of the word string which are preserved
unchanged.
My code:
public String plusOut(String str, String word) {
if(str.matches(".*(<word>.*<word>){1,}.*") || str.matches(".*(<word>.*<word>.*<word>){1,}.*")) {
return str.replaceAll(".", "+"); //after finding the argument I can easily exclude it but for now I have a bigger problem in the if-condition
} else {
return str;
}
}
Is there a way in Java to match an argument? The above code doesn't work for obvious reasons (<word>). How to use the argument word in the string RegEx?
UPDATE
This is the closest I got but it works only for the last char of the word String.
public String plusOut(String str, String word)
{
if(str.matches(".*("+ word + ".*" + word + "){1,}.*") || str.matches(".*(" + word + ".*" + word + ".*" + word + "){1,}.*") || str.matches(".*("+ word + "){1,}.*"))
{
return str.replaceAll(".(?<!" + word + ")", "+");
} else {
return str;
}
}
Input/Output
plusOut("12xy34", "xy") → "+++y++" (Expected "++xy++")
plusOut("12xy34", "1") → "1+++++" (Expected "1+++++")
plusOut("12xy34xyabcxy", "xy") → "+++y+++y++++y" (Expected "++xy++xy+++xy")
It's because of the ? in the RegEx.
You can't do it with only patterns, you'll have to write some code apart from the pattern. Try this:
public static String plusOut(String input, String word) {
StringBuilder builder = new StringBuilder();
Pattern pattern = Pattern.compile(Pattern.quote(word));
Matcher matcher = pattern.matcher(input);
int start = 0;
while(matcher.find()) {
char[] replacement = new char[matcher.start() - start];
Arrays.fill(replacement, '+');
builder.append(new String(replacement)).append(word);
start = matcher.end();
}
if(start < input.length()) {
char[] replacement = new char[input.length() - start];
Arrays.fill(replacement, '+');
builder.append(new String(replacement));
}
return builder.toString();
}
You need to concatenate it using + operator of Java
if(str.matches("<"+word+">")){ // Now word will be replaced by the value
//do Anything
}
You cannot place arguments inside the regex pattern. You can create a regex object by concatenating variables with the regex pattern parts like this:
public String plusOut(String str, String word)
{
if(str.matches(".*("+ word + ".*" + word + "){1,}.*") || str.matches(".*(" + word + ".*" + word + ".*" + word + "){1,}.*"))
{
return str.replaceAll(".", "+");
}
else
{
return str;
}
}
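One caveat with plain concatenation: any regex metacharacters inside word (such as + or .) are then interpreted as regex syntax. A minimal sketch, assuming the argument should be matched literally, wraps it in Pattern.quote first. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class QuotedWordMatch {

    // Returns true if 'word' occurs at least twice in 'str', treating 'word' as literal text.
    static boolean containsAtLeastTwice(String str, String word) {
        // Pattern.quote escapes the argument so that "a+b" means the three characters a, +, b.
        String literal = Pattern.quote(word);
        Pattern p = Pattern.compile(literal + ".*" + literal, Pattern.DOTALL);
        return p.matcher(str).find();
    }
}
```

With this, containsAtLeastTwice("a+b a+b", "a+b") is true, while containsAtLeastTwice("aab aab", "a+b") is false. Without the quoting, the second call would match, because a+b as a regex means "one or more a, then b".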

How to get a string between two characters?

I have a string,
String s = "test string (67)";
I want to get the no 67 which is the string between ( and ).
Can anyone please tell me how to do this?
There's probably a really neat RegExp, but I'm noob in that area, so instead...
String s = "test string (67)";
s = s.substring(s.indexOf("(") + 1);
s = s.substring(0, s.indexOf(")"));
System.out.println(s);
A very useful solution to this issue, which doesn't require you to call indexOf yourself, is to use the Apache Commons libraries.
StringUtils.substringBetween(s, "(", ")");
This method also handles the case where the closing string occurs multiple times, which is not easy to get right by searching for the closing string with indexOf.
You can download this library from here:
https://mvnrepository.com/artifact/org.apache.commons/commons-lang3/3.4
Try it like this
String s="test string(67)";
String requiredString = s.substring(s.indexOf("(") + 1, s.indexOf(")"));
The method's signature for substring is:
s.substring(int start, int end);
By using regular expression :
String s = "test string (67)";
Pattern p = Pattern.compile("\\(.*?\\)");
Matcher m = p.matcher(s);
if(m.find())
System.out.println(m.group().subSequence(1, m.group().length()-1));
Java supports Regular Expressions, but they're kind of cumbersome if you actually want to use them to extract matches. I think the easiest way to get at the string you want in your example is to just use the Regular Expression support in the String class's replaceAll method:
String x = "test string (67)".replaceAll(".*\\(|\\).*", "");
// x is now the String "67"
This simply deletes everything up to and including the ( (the last one, since .* is greedy), and likewise the ) and everything thereafter. This just leaves the stuff between the parentheses.
However, the result of this is still a String. If you want an integer result instead then you need to do another conversion:
int n = Integer.parseInt(x);
// n is now the integer 67
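If the input might not contain a number in parentheses at all, both steps can fail: replaceAll leaves the string unchanged and parseInt throws. A minimal sketch that does the extraction with a capturing group and reports "nothing found" through OptionalInt (the class and method names are my own, not from any library):

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ParenthesizedNumber {

    // Group 1 captures one or more digits directly enclosed in parentheses.
    private static final Pattern NUMBER_IN_PARENS = Pattern.compile("\\((\\d+)\\)");

    // Returns the first "(digits)" value in the input, or an empty OptionalInt.
    static OptionalInt firstNumber(String s) {
        Matcher m = NUMBER_IN_PARENS.matcher(s);
        if (!m.find()) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(m.group(1)));
        } catch (NumberFormatException e) {
            // The digits do not fit into an int.
            return OptionalInt.empty();
        }
    }
}
```

Usage: ParenthesizedNumber.firstNumber("test string (67)").getAsInt() gives 67, and firstNumber("no number here").isPresent() is false.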
In a single line, I suggest:
String input = "test string (67)";
input = input.substring(input.indexOf("(")+1, input.lastIndexOf(")"));
System.out.println(input);
You could use apache common library's StringUtils to do this.
import org.apache.commons.lang3.StringUtils;
...
String s = "test string (67)";
s = StringUtils.substringBetween(s, "(", ")");
....
Take a test string such as "test string (67)", from which you need to get the String nested between two other Strings.
String str = "test string (67) and (77)", open = "(", close = ")";
Some possible ways are listed below. Simple generic solution:
String subStr = str.substring(str.indexOf( open ) + 1, str.indexOf( close ));
System.out.format("String[%s] Parsed IntValue[%d]\n", subStr, Integer.parseInt( subStr ));
Apache Software Foundation commons.lang3.
StringUtils class substringBetween() function gets the String that is nested in between two Strings. Only the first match is returned.
String substringBetween = StringUtils.substringBetween(str, open, close); // search the original str, not the already-extracted subStr
System.out.println("Commons Lang3 : "+ substringBetween);
Replaces the given String with the String which is nested in between two Strings.
Pattern with Regular-Expressions: (\()(.*?)(\)).*
The Dot Matches (Almost) Any Character
.? = .{0,1}, .* = .{0,}, .+ = .{1,}
String patternMatch = patternMatch(generateRegex(open, close), str);
System.out.println("Regular expression Value : "+ patternMatch);
Regular-Expression with the utility class RegexUtils and some functions.
Pattern.DOTALL: Matches any character, including a line terminator.
Pattern.MULTILINE: Makes ^ and $ match at the start and end of each line, not only at the start and end of the entire input sequence.
public static String generateRegex(String open, String close) {
return "(" + RegexUtils.escapeQuotes(open) + ")(.*?)(" + RegexUtils.escapeQuotes(close) + ").*";
}
public static String patternMatch(String regex, CharSequence string) {
final Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);
final Matcher matcher = pattern .matcher(string);
String returnGroupValue = null;
if (matcher.find()) { // while() { Pattern.MULTILINE }
System.out.println("Full match: " + matcher.group(0));
System.out.format("Character Index [Start:End]«[%d:%d]\n",matcher.start(),matcher.end());
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
if( i == 2 ) returnGroupValue = matcher.group( 2 );
}
}
return returnGroupValue;
}
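Since the test string above contains two bracketed values and only the first match is returned, here is a minimal sketch that collects every match instead. It uses only java.util.regex; the class and method names are assumptions for illustration, and Pattern.quote stands in for the escaping helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AllBetween {

    // Returns every substring found between 'open' and 'close', in order of appearance.
    static List<String> allBetween(String text, String open, String close) {
        // Quote the delimiters so "(" and ")" are matched literally; (.*?) is a lazy capture.
        Pattern p = Pattern.compile(
                Pattern.quote(open) + "(.*?)" + Pattern.quote(close), Pattern.DOTALL);
        Matcher m = p.matcher(text);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group(1)); // group 1 is the text between the delimiters
        }
        return result;
    }
}
```

For "test string (67) and (77)" with "(" and ")" this returns a list containing "67" and "77".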
String s = "test string (67)";
int start = 0; // '(' position in string
int end = 0; // ')' position in string
for(int i = 0; i < s.length(); i++) {
if(s.charAt(i) == '(') // Looking for '(' position in string
start = i;
else if(s.charAt(i) == ')') // Looking for ')' position in string
end = i;
}
String number = s.substring(start+1, end); // you take value between start and end
String result = s.substring(s.indexOf("(") + 1, s.indexOf(")"));
public String getStringBetweenTwoChars(String input, String startChar, String endChar) {
try {
int start = input.indexOf(startChar);
if (start != -1) {
int end = input.indexOf(endChar, start + startChar.length());
if (end != -1) {
return input.substring(start + startChar.length(), end);
}
}
} catch (Exception e) {
e.printStackTrace();
}
return input; // return null; || return "" ;
}
Usage :
String input = "test string (67)";
String startChar = "(";
String endChar = ")";
String output = getStringBetweenTwoChars(input, startChar, endChar);
System.out.println(output);
// Output: "67"
Another way of doing it, using the split method:
public static void main(String[] args) {
String s = "test string (67)";
String[] ss;
ss= s.split("\\(");
ss = ss[1].split("\\)");
System.out.println(ss[0]);
}
Use Pattern and Matcher
public class Chk {
public static void main(String[] args) {
String s = "test string (67)";
ArrayList<String> arL = new ArrayList<String>();
ArrayList<String> inL = new ArrayList<String>();
Pattern pat = Pattern.compile("\\(\\w+\\)");
Matcher mat = pat.matcher(s);
while (mat.find()) {
arL.add(mat.group());
System.out.println(mat.group());
}
for (String sx : arL) {
Pattern p = Pattern.compile("(\\w+)");
Matcher m = p.matcher(sx);
while (m.find()) {
inL.add(m.group());
System.out.println(m.group());
}
}
System.out.println(inL);
}
}
The "generic" way of doing this is to parse the string from the start, throwing away all the characters before the first bracket, recording the characters after the first bracket, and throwing away the characters after the second bracket.
I'm sure there's a regex library or something to do it though.
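The manual parsing described above can be sketched roughly like this (a minimal version with made-up names, assuming round brackets and no nesting):

```java
final class BracketScanner {

    // Returns the text between the first '(' and the next ')', or null if there is no complete pair.
    static String firstBracketed(String s) {
        StringBuilder inside = new StringBuilder();
        boolean recording = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!recording) {
                // Throw away everything before the first opening bracket.
                if (c == '(') {
                    recording = true;
                }
            } else if (c == ')') {
                // Closing bracket reached: everything after it is ignored.
                return inside.toString();
            } else {
                inside.append(c);
            }
        }
        return null;
    }
}
```

firstBracketed("test string (67)") returns "67", and a string with no closing bracket returns null.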
The least generic way I found to do this with Regex and Pattern / Matcher classes:
String text = "test string (67)";
String START = "\\("; // A literal "(" character in regex
String END = "\\)"; // A literal ")" character in regex
// Captures the word characters between the above two characters
String regex = START + "(\\w+)" + END;
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(text);
while(matcher.find()) {
// group(1) is the captured text without the surrounding parentheses
System.out.println(matcher.group(1));
}
This may help for more complex regex problems where you want to get the text between two set of characters.
Another possible solution is to use lastIndexOf, which searches for a character or String from the end of the string backward.
In my scenario, I had the following String and I had to extract <<UserName>>
1QAJK-WKJSH_MyApplication_Extract_<<UserName>>.arc
So indexOf and StringUtils.substringBetween were not helpful, as they start searching from the beginning.
So I used lastIndexOf:
String str = "1QAJK-WKJSH_MyApplication_Extract_<<UserName>>.arc";
String userName = str.substring(str.lastIndexOf("_") + 1, str.lastIndexOf("."));
And, it gives me
<<UserName>>
String s = "test string (67)";
System.out.println(s.substring(s.indexOf("(")+1,s.indexOf(")")));
Something like this:
public static String innerSubString(String txt, char prefix, char suffix) {
if(txt != null && txt.length() > 1) {
int start = 0, end = 0;
char token;
for(int i = 0; i < txt.length(); i++) {
token = txt.charAt(i);
if(token == prefix)
start = i;
else if(token == suffix)
end = i;
}
if(start + 1 < end)
return txt.substring(start+1, end);
}
return null;
}
This is simple: use the \D+ regex and the job is done.
It selects all characters except digits, so there is no need to complicate things.
/\D+/
Note that this returns the original string if the regex does not match:
var iAm67 = "test string (67)".replaceFirst("test string \\((.*)\\)", "$1");
To guard against that, add a matches check to the code:
String str = "test string (67)";
String regx = "test string \\((.*)\\)";
if (str.matches(regx)) {
var iAm67 = str.replaceFirst(regx, "$1");
}
---EDIT---
I use https://www.freeformatter.com/java-regex-tester.html#ad-output to test regexes.
It turns out it's better to add ? after * so that the match is as short as possible (non-greedy), something like this:
String str = "test string (67)(69)";
String regx1 = "test string \\((.*)\\).*";
String regx2 = "test string \\((.*?)\\).*";
String ans1 = str.replaceFirst(regx1, "$1");
String ans2 = str.replaceFirst(regx2, "$1");
System.out.println("ans1:"+ans1+"\nans2:"+ans2);
// ans1:67)(69
// ans2:67
String s = "(69)";
System.out.println(s.substring(s.lastIndexOf('(')+1,s.lastIndexOf(')')));
Little extension to top (MadProgrammer) answer
public static String getTextBetween(final String wholeString, final String str1, String str2){
String s = wholeString.substring(wholeString.indexOf(str1) + str1.length());
s = s.substring(0, s.indexOf(str2));
return s;
}

How to Split a string in java based on limit

I have the following String and I want to split it into a number of substrings (taking ',' as a delimiter) when its length reaches 36. It does not have to split exactly at the 36th position.
String message = "This is some(sampletext), and has to be splited properly";
I want to get the output as two substrings follows:
1. 'This is some (sampletext)'
2. 'and has to be splited properly'
Thanks in advance.
A solution based on regex:
String s = "This is some sample text and has to be splited properly";
Pattern splitPattern = Pattern.compile(".{1,15}\\b");
Matcher m = splitPattern.matcher(s);
List<String> stringList = new ArrayList<String>();
while (m.find()) {
stringList.add(m.group(0).trim());
}
Update:
trim() can be dropped by changing the pattern to end in a space or the end of the string:
String s = "This is some sample text and has to be splited properly";
Pattern splitPattern = Pattern.compile("(.{1,15})\\b( |$)");
Matcher m = splitPattern.matcher(s);
List<String> stringList = new ArrayList<String>();
while (m.find()) {
stringList.add(m.group(1));
}
group(1) means that I only need the first part of the pattern (.{1,15}) as output.
.{1,15} - a sequence of any characters (".") with any length between 1 and 15 ({1,15})
\b - a word break (a non-word character before or after any word)
( |$) - space or end of string
In addition I've added () surrounding .{1,15} so I can use it as a whole group (m.group(1)).
Depending on the desired result, this expression can be tweaked.
Update:
If you want to split message by comma only if its length would be over 36, try the following expression:
Pattern splitPattern = Pattern.compile("(.{1,36})\\b(,|$)");
The best solution I can think of is to make a function that iterates through the string. In the function you could keep track of whitespace characters, and for each 16th position you could add a substring to a list based on the position of the last encountered whitespace. After it has found a substring, you start anew from the last encountered whitespace. Then you simply return the list of substrings.
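A rough sketch of that idea, assuming a hard break is acceptable when a chunk contains no space at all. The names wrap and limit are mine, and lastIndexOf does the "last encountered whitespace" bookkeeping:

```java
import java.util.ArrayList;
import java.util.List;

final class WhitespaceWrapper {

    // Splits 'text' into chunks of at most 'limit' characters, breaking at spaces where possible.
    static List<String> wrap(String text, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        List<String> parts = new ArrayList<>();
        int start = 0;
        while (text.length() - start > limit) {
            // Last space that still keeps the chunk within the limit.
            int breakAt = text.lastIndexOf(' ', start + limit);
            if (breakAt <= start) {
                // No usable space in this window: cut in the middle of the word.
                parts.add(text.substring(start, start + limit));
                start += limit;
            } else {
                parts.add(text.substring(start, breakAt));
                start = breakAt + 1; // continue after the space
            }
        }
        if (start < text.length()) {
            parts.add(text.substring(start));
        }
        return parts;
    }
}
```

With the sample sentence and a limit of 16 this yields "This is some", "sample text and", "has to be" and "splited properly".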
Here's a tidy answer:
String message = "This is some sample text and has to be splited properly";
String[] temp = message.split("(?<=^.{1,16}) ");
String part1 = message.substring(0, message.length() - temp[temp.length - 1].length() - 1);
String part2 = message.substring(message.length() - temp[temp.length - 1].length());
This should work on all inputs, except when there are sequences of chars without whitespace longer than 16. It also creates the minimum amount of extra Strings by indexing into the original one.
public static void main(String[] args) throws IOException
{
String message = "This is some sample text and has to be splited properly";
List<String> result = new ArrayList<String>();
int start = 0;
while (start + 16 < message.length())
{
int end = start + 16;
while (!Character.isWhitespace(message.charAt(end--)));
result.add(message.substring(start, end + 1));
start = end + 2;
}
result.add(message.substring(start));
System.out.println(result);
}
If you have a simple text as the one you showed above (words separated by blank spaces) you can always think of StringTokenizer. Here's some simple code working for your case:
public static void main(String[] args) {
String message = "This is some sample text and has to be splited properly";
while (message.length() > 0) {
String token = "";
StringTokenizer st = new StringTokenizer(message);
while (st.hasMoreTokens()) {
String nt = st.nextToken();
String foo = "";
if (token.length()==0) {
foo = nt;
}
else {
foo = token + " " + nt;
}
if (foo.length() < 16)
token = foo;
else {
System.out.print("'" + token + "' ");
message = message.substring(token.length() + 1, message.length());
break;
}
if (!st.hasMoreTokens()) {
System.out.print("'" + token + "' ");
message = message.substring(token.length(), message.length());
}
}
}
}

How do i remove the whitespace?

I'm trying to create a palindrome tester program for my AP Java class and I need to remove the white spaces in my code completely but it's not letting me do so.
import java.util.Scanner;
public class Palin{
public static boolean isPalindrome(String stringToTest) {
String workingCopy = removeJunk(stringToTest);
String reversedCopy = reverse(workingCopy);
return reversedCopy.equalsIgnoreCase(workingCopy);
}
public static String removeJunk(String string) {
int i, len = string.length();
StringBuffer dest = new StringBuffer(len);
char c;
for (i = (len - 1); i >= 0; i-=1) {
c = string.charAt(i);
if (Character.isLetterOrDigit(c))
{
dest.append(c);
}
}
return dest.toString();
}
public static String reverse(String string) {
StringBuffer sb = new StringBuffer(string);
return sb.reverse().toString();
}
public static void main(String[] args) {
System.out.print("Enter Palindrome: ");
Scanner sc = new Scanner(System.in);
String string = sc.next();
String str = string;
String space = "";
String result = str.replaceAll("\\W", space);
System.out.println(result);
System.out.println();
System.out.println("Testing palindrome:");
System.out.println(" " + string);
System.out.println();
if (isPalindrome(result)) {
System.out.println("It's a palindrome!");
} else {
System.out.println("Not a palindrome!");
}
System.out.println();
}
}
Any help would be greatly appreciated.
Seems like your code is fine except for the following. You are using
String string = sc.next();
which will not read the whole line of input, hence you will lose part of the text. I think you should use the following instead of that line.
String string = sc.nextLine();
If you just want to remove the beginning and ending whitespace, you can use the built in function trim(), e.g. " abcd ".trim() is "abcd"
If you want to remove it everywhere, you can use the replaceAll() method with the whitespace class as the parameter, e.g. " abcd ".replaceAll("\\s","").
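A small sketch contrasting the options that come up in this thread (the class and method names are made up for illustration): trim() only strips the ends, the \s class removes whitespace everywhere, and \W removes every non-word character, punctuation included.

```java
final class WhitespaceRemoval {

    // Removes leading and trailing whitespace only.
    static String stripEnds(String s) {
        return s.trim();
    }

    // Removes every whitespace character (spaces, tabs, line breaks) anywhere in the string.
    static String removeAllWhitespace(String s) {
        return s.replaceAll("\\s+", "");
    }

    // Removes everything that is not a letter, digit or underscore.
    static String removeNonWordChars(String s) {
        return s.replaceAll("\\W", "");
    }
}
```

For " ab cd " the first method gives "ab cd" and the second gives "abcd"; for "A man, a plan!" the third gives "Amanaplan".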
Use a StringTokenizer to remove " "
StringTokenizer st = new StringTokenizer(string," ",false);
String t="";
while (st.hasMoreElements()) t += st.nextElement();
String result = t;
System.out.println(result);
I haven't actually tested this, but have you considered the String.replaceAll(String regex, String replacement) method?
public static String removeJunk (String string) {
return string.replaceAll (" ", "");
}
Another thing to look out for is that while removing all non-digit/alpha characters removeJunk also reverses the string (it starts from the end and then appends one character at a time).
So after reversing it again (in reverse) you are left with the original and it will always claim that the given string is a palindrome.
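A minimal sketch of the check with that problem removed: the cleanup step walks the string front to back, so only reverse() reverses anything. Names are illustrative, not taken from the question.

```java
final class PalindromeCheck {

    // Keeps letters and digits in their original order and drops everything else.
    static String keepLettersAndDigits(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Compares the cleaned string with its reverse, ignoring case.
    static boolean isPalindrome(String s) {
        String cleaned = keepLettersAndDigits(s);
        String reversed = new StringBuilder(cleaned).reverse().toString();
        return reversed.equalsIgnoreCase(cleaned);
    }
}
```

Now isPalindrome("A man, a plan, a canal: Panama") is true and isPalindrome("hello") is false.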
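You should use the String replace(CharSequence target, CharSequence replacement) method.
Although the name suggests that only the first occurrence will be replaced, in fact all occurrences are replaced. The advantage of this method is that its arguments are treated as literal text rather than as a regular expression, so nothing needs escaping.
So give string.replace(" ", "") a try. Note that the replace(char, char) overload cannot be used to delete a character, because '' (an empty char literal) does not compile in Java.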

Breaking Strings into chars that are in upper case

I'm making a method to read a whole class code and do some stuff with it.
What I want to do is get the name of the method, and make a String with it.
Something like removeProduct
I'll make a String "Remove Product"
How can I split the name method in capital cases?
How can I build this new string with the first letter of each word as capital case?
I'm doing it with substring; is there an easier and better way to do it?
ps: I'm sure my Brazilian English didn't help with the title. If anyone can make it look better, I'd appreciate it.
Don't bother reinventing the wheel; use the method in commons-lang:
String input = "methodName";
String[] words = StringUtils.splitByCharacterTypeCamelCase(input);
String humanised = StringUtils.join(words, ' ');
You can use a regular expression to split the name into the various words, and then capitalize the first one:
public static void main(String[] args) {
String input = "removeProduct";
//split into words
String[] words = input.split("(?=[A-Z])");
words[0] = capitalizeFirstLetter(words[0]);
//join
StringBuilder builder = new StringBuilder();
for ( String s : words ) {
builder.append(s).append(" ");
}
System.out.println(builder.toString());
}
private static String capitalizeFirstLetter(String in) {
return in.substring(0, 1).toUpperCase() + in.substring(1);
}
Note that this needs better corner case handling, such as not appending a space at the end and handling 1-char words.
Edit: I meant to explain the regex. The regular expression (?=[A-Z]) is a zero-width assertion (positive lookahead) matching a position where the next character is between 'A' and 'Z'.
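A minimal sketch that covers the corner cases mentioned above (no trailing space, 1-char and empty names), assuming Java 8 or later, where a zero-width match at the start of the input does not produce an empty leading element. The class and method names are hypothetical.

```java
final class CamelCaseHumanizer {

    // Turns "removeProduct" into "Remove Product".
    static String humanize(String name) {
        if (name == null || name.isEmpty()) {
            return name;
        }
        // Split before every uppercase ASCII letter; the lookahead consumes nothing.
        String[] words = name.split("(?=[A-Z])");
        // Capitalize the first word; substring(1) is simply "" for a 1-char word.
        words[0] = Character.toUpperCase(words[0].charAt(0)) + words[0].substring(1);
        // String.join puts the separator only between elements, so no trailing space.
        return String.join(" ", words);
    }
}
```

humanize("removeProduct") returns "Remove Product" and humanize("x") returns "X". Runs of capitals are still split letter by letter, so "getURL" becomes "Get U R L".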
You can do this in 2 steps:
1 - Make the first letter of the string uppercase.
2 - Insert a space before an uppercase letter which is preceded by a lowercase letter.
For step 1 you can use a function and for step 2 you can use String.replaceAll method:
String str = "removeProduct";
str = capitalizeFirst(str);
str = str.replaceAll("(?<=[^A-Z])([A-Z])"," $1");
static String capitalizeFirst(String input) {
return input.substring(0, 1).toUpperCase() + input.substring(1);
}
@MrWiggles is right.
Just one more way to do this without being fancy :)
import java.util.StringTokenizer;
public class StringUtil {
public static String captilizeFirstLetter(String token) {
return Character.toUpperCase(token.charAt(0)) + token.substring(1);
}
public static String convert(String str) {
final StringTokenizer st = new StringTokenizer(str,
"A B C D E F G H I J K L M N O P Q R S T U V W X Y Z", true);
final StringBuilder sb = new StringBuilder();
String token;
if (st.hasMoreTokens()) {
token = st.nextToken();
sb.append(StringUtil.captilizeFirstLetter(token) + " ");
}
while (st.hasMoreTokens()) {
token = st.nextToken();
if (st.hasMoreTokens()) {
token = token + st.nextToken();
}
sb.append(StringUtil.captilizeFirstLetter(token) + " ");
}
return sb.toString().trim();
}
public static void main(String[] args) throws Exception {
String words = StringUtil.convert("helloWorldHowAreYou");
System.out.println(words);
}
}
public String convertMethodName(String methodName) {
StringBuilder sb = new StringBuilder().append(Character.toUpperCase(methodName.charAt(0)));
for (int i = 1; i < methodName.length(); i++) {
char c = methodName.charAt(i);
if (Character.isUpperCase(c)) {
sb.append(' ');
}
sb.append(c);
}
return sb.toString();
}
Handling it this way may give you some finer control in case you want to add in functionality later for other situations (multiple caps in a row, etc.). Basically, for each character, it just checks to see if it's within the bounds of capital letters (character codes 65-90, inclusive), and if so, adds a space to the buffer before the word begins.
EDIT: Using Character.isUpperCase()
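As an example of that finer control, here is a sketch of one possible rule for multiple caps in a row: a space goes before an uppercase letter only if the previous character is not uppercase, or if the next character is lowercase (so an acronym stays together and the word after it is split off). This rule and the names below are my assumptions, not part of the answer above.

```java
final class MethodNameFormatter {

    // Turns "parseHTTPResponse" into "Parse HTTP Response".
    static String toTitle(String methodName) {
        if (methodName == null || methodName.isEmpty()) {
            return methodName;
        }
        StringBuilder sb = new StringBuilder();
        sb.append(Character.toUpperCase(methodName.charAt(0)));
        for (int i = 1; i < methodName.length(); i++) {
            char c = methodName.charAt(i);
            char prev = methodName.charAt(i - 1);
            boolean nextIsLower = i + 1 < methodName.length()
                    && Character.isLowerCase(methodName.charAt(i + 1));
            // Start a new word at the beginning of a capital run, or at the
            // last capital of a run when a lowercase letter follows it.
            if (Character.isUpperCase(c) && (!Character.isUpperCase(prev) || nextIsLower)) {
                sb.append(' ');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

toTitle("removeProduct") gives "Remove Product", toTitle("getURL") gives "Get URL", and toTitle("parseHTTPResponse") gives "Parse HTTP Response".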
