I am reading a document and removing some words from it.
I have the following function:
//Takes a string and removes the word
private static String removeWord(String string, String word) {
if (string.contains(word)) {
String tempWord = word.trim();
string = string.replaceAll(tempWord, "");
}
return string;
}
I have the following issue when I try to replace for example:
Hello world (
Gives me the following Error:
Caused by: java.util.regex.PatternSyntaxException: Unclosed group near index 14
After some research I found out that this is because replaceAll() expects a regular expression, and parentheses are used to mark capturing groups in a regex.
So I did this:
private static String removeWord(String string, String word) {
if (string.contains(word)) {
String [] temp = word.split(" ");
word = "";
for (int i = 0; i < temp.length ; i++) {
if (temp[i].equals("(")){
word += " "+ "\\(";
}else if (temp[i].equals(")")){
word += " "+ "\\)";
} else {
word += temp[i] + " ";
}
}
String tempWord = word.trim();
string = string.replaceAll(tempWord, "");
}
return string;
}
This code isn't the best solution, because sometimes the string looks like (Hello world.
How can I improve this part of the code?
You seem to be trying to escape a regex manually. My advice is: Don't.
Even if you have successfully handled (), you still have a ton of other characters that have special meaning in regex to escape, such as *+[]\? just to name a few.
Luckily, there is a very convenient method called Pattern.quote that does this for you automatically:
private static String removeWord(String string, String word) {
if (string.contains(word)) {
String tempWord = word.trim();
string = string.replaceAll(Pattern.quote(tempWord), "");
}
return string;
}
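As a rough illustration of the point above, here is a minimal sketch comparing Pattern.quote with the plain String.replace method, which treats both arguments as literal text. The class and method names (RemoveWordDemo, removeWithQuote, removeWithReplace) are made up for this example and are not part of the question.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only Pattern.quote, replaceAll and replace are real JDK calls.
class RemoveWordDemo {
    // Removes every literal occurrence of word; Pattern.quote turns off regex metacharacters.
    static String removeWithQuote(String text, String word) {
        return text.replaceAll(Pattern.quote(word.trim()), "");
    }

    // Same effect without any regex: String.replace matches its argument literally.
    static String removeWithReplace(String text, String word) {
        return text.replace(word.trim(), "");
    }

    public static void main(String[] args) {
        String text = "Say Hello world ( again";
        // Neither call throws PatternSyntaxException for the unbalanced "(".
        System.out.println(removeWithQuote(text, "Hello world ("));
        System.out.println(removeWithReplace(text, "Hello world ("));
    }
}
```

If no regex features are needed at all, the String.replace variant is probably the simpler choice.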
private static String removeWord(String string, String word) {
return string.replaceFirst("\\W+" + word + "\\W+","");
}
\W matches a non-word character. You can also use replaceAll if you want to replace all occurrences, and if you want to replace a specific number of occurrences, you can call replaceFirst in a loop. Note that word is still interpreted as a regex here, so wrap it in Pattern.quote if it may contain special characters.
So, I am trying to use an argument in a RegEx pattern and I can't find a pattern that works, because the argument is a simple String which is contained in the bigger string. Here is the task itself, which I took from codingbat.com, so that everything is clear:
THE Precondition and explanation of the task.
Given a string and a non-empty word string, return a version of the
original String where all chars have been replaced by pluses ("+"),
except for appearances of the word string which are preserved
unchanged.
My code:
public String plusOut(String str, String word) {
if(str.matches(".*(<word>.*<word>){1,}.*") || str.matches(".*(<word>.*<word>.*<word>){1,}.*")) {
return str.replaceAll(".", "+"); //after finding the argument I can easily exclude it but for now I have a bigger problem in the if-condition
} else {
return str;
}
}
Is there a way in Java to match an argument? The above code doesn't work for obvious reasons (<word>). How to use the argument word in the string RegEx?
UPDATE
This is the closest I got but it works only for the last char of the word String.
public String plusOut(String str, String word)
{
if(str.matches(".*("+ word + ".*" + word + "){1,}.*") || str.matches(".*(" + word + ".*" + word + ".*" + word + "){1,}.*") || str.matches(".*("+ word + "){1,}.*"))
{
return str.replaceAll(".(?<!" + word + ")", "+");
} else {
return str;
}
}
Input/Output
plusOut("12xy34", "xy") → "+++y++" (Expected "++xy++")
plusOut("12xy34", "1") → "1+++++" (Expected "1+++++")
plusOut("12xy34xyabcxy", "xy") → "+++y+++y++++y" (Expected "++xy++xy+++xy")
It's because of the ? in the RegEx.
You can't do it with only patterns, you'll have to write some code apart from the pattern. Try this:
public static String plusOut(String input, String word) {
StringBuilder builder = new StringBuilder();
Pattern pattern = Pattern.compile(Pattern.quote(word));
Matcher matcher = pattern.matcher(input);
int start = 0;
while(matcher.find()) {
char[] replacement = new char[matcher.start() - start];
Arrays.fill(replacement, '+');
builder.append(new String(replacement)).append(word);
start = matcher.end();
}
if(start < input.length()) {
char[] replacement = new char[input.length() - start];
Arrays.fill(replacement, '+');
builder.append(new String(replacement));
}
return builder.toString();
}
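For comparison, the same result can be sketched without any regex by walking the string and checking for the word at each position. This is only an alternative sketch, not the method above; the class name PlusOutDemo is made up, and it relies on the task's precondition that word is non-empty.

```java
// Hypothetical demo class: a regex-free variant of plusOut.
class PlusOutDemo {
    // Assumes word is non-empty (the task's precondition); an empty word would never advance i.
    static String plusOut(String str, String word) {
        StringBuilder out = new StringBuilder(str.length());
        int i = 0;
        while (i < str.length()) {
            if (str.startsWith(word, i)) {
                out.append(word);      // keep the word unchanged
                i += word.length();    // and skip past it
            } else {
                out.append('+');       // every other character becomes a plus
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(plusOut("12xy34", "xy"));        // ++xy++
        System.out.println(plusOut("12xy34xyabcxy", "xy")); // ++xy++xy+++xy
    }
}
```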
You need to concatenate it using Java's + operator:
if(str.matches("<"+word+">")){ // Now word will be replaced by the value
//do Anything
}
You cannot place arguments inside the regex pattern. You can create a regex object by concatenating variables with the regex pattern parts like this:
public String plusOut(String str, String word)
{
if(str.matches(".*("+ word + ".*" + word + "){1,}.*") || str.matches(".*(" + word + ".*" + word + ".*" + word + "){1,}.*"))
{
return str.replaceAll(".", "+");
}
else
{
return str;
}
}
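One caveat with plain concatenation is that any regex metacharacters inside word are interpreted as part of the pattern. A minimal sketch of the same idea with the word quoted first; the class and method names (QuotedConcatDemo, containsTwice) are invented for this example:

```java
import java.util.regex.Pattern;

// Hypothetical demo class showing concatenation of a quoted argument into a pattern.
class QuotedConcatDemo {
    // True if word occurs at least twice (non-overlapping) in str as literal text.
    static boolean containsTwice(String str, String word) {
        String q = Pattern.quote(word); // metacharacters in word lose their special meaning
        // (?s) lets . match line terminators too, so multi-line input is handled.
        return str.matches("(?s).*" + q + ".*" + q + ".*");
    }

    public static void main(String[] args) {
        System.out.println(containsTwice("12xy34xy", "xy")); // true
        System.out.println(containsTwice("abc abc", "a.c")); // false: the dot is literal here
    }
}
```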
I have a string,
String s = "test string (67)";
I want to get the number 67, which is the string between ( and ).
Can anyone please tell me how to do this?
There's probably a really neat RegExp, but I'm noob in that area, so instead...
String s = "test string (67)";
s = s.substring(s.indexOf("(") + 1);
s = s.substring(0, s.indexOf(")"));
System.out.println(s);
A very useful solution to this issue that doesn't require you to call indexOf is to use the Apache Commons libraries.
StringUtils.substringBetween(s, "(", ")");
This method also copes with multiple occurrences of the closing string, which is not as easy when you search for the closing string with indexOf yourself.
You can download this library from here:
https://mvnrepository.com/artifact/org.apache.commons/commons-lang3/3.4
Try it like this
String s="test string(67)";
String requiredString = s.substring(s.indexOf("(") + 1, s.indexOf(")"));
The method's signature for substring is:
s.substring(int start, int end);
Using a regular expression:
String s = "test string (67)";
Pattern p = Pattern.compile("\\(.*?\\)");
Matcher m = p.matcher(s);
if(m.find())
System.out.println(m.group().subSequence(1, m.group().length()-1));
Java supports Regular Expressions, but they're kind of cumbersome if you actually want to use them to extract matches. I think the easiest way to get at the string you want in your example is to just use the Regular Expression support in the String class's replaceAll method:
String x = "test string (67)".replaceAll(".*\\(|\\).*", "");
// x is now the String "67"
This simply deletes everything up to and including the last ( (the leading .* is greedy), and likewise the ) and everything after it. That leaves just the text between the parentheses.
However, the result of this is still a String. If you want an integer result instead then you need to do another conversion:
int n = Integer.parseInt(x);
// n is now the integer 67
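For comparison, here is a minimal sketch of the same extraction with a capturing group, which some may find easier to extend than replaceAll. The names ParenNumberDemo and firstNumberInParens are made up, and returning -1 when nothing matches is just an assumption for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: extract the digits between parentheses via group(1).
class ParenNumberDemo {
    // "\\(" and "\\)" match literal parentheses; (\\d+) captures the digits between them.
    private static final Pattern IN_PARENS = Pattern.compile("\\((\\d+)\\)");

    // Returns the first parenthesised integer, or -1 if there is none (an arbitrary choice).
    static int firstNumberInParens(String s) {
        Matcher m = IN_PARENS.matcher(s);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        System.out.println(firstNumberInParens("test string (67)")); // 67
    }
}
```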
In a single line, I suggest:
String input = "test string (67)";
input = input.substring(input.indexOf("(") + 1, input.lastIndexOf(")"));
System.out.println(input);
You could use apache common library's StringUtils to do this.
import org.apache.commons.lang3.StringUtils;
...
String s = "test string (67)";
s = StringUtils.substringBetween(s, "(", ")");
....
Test string: test string (67), from which you need to get the String that is nested between two other Strings.
String str = "test string (67) and (77)", open = "(", close = ")";
Some possible ways are listed below. Simple generic solution:
String subStr = str.substring(str.indexOf( open ) + 1, str.indexOf( close ));
System.out.format("String[%s] Parsed IntValue[%d]\n", subStr, Integer.parseInt( subStr ));
Apache Software Foundation commons.lang3:
The StringUtils class's substringBetween() function gets the String that is nested between two Strings. Only the first match is returned.
String substringBetween = StringUtils.substringBetween(str, open, close);
System.out.println("Commons Lang3 : "+ substringBetween);
Replace the given String with the String that is nested between two Strings:
Pattern with Regular-Expressions: (\()(.*?)(\)).*
The Dot Matches (Almost) Any Character
.? = .{0,1}, .* = .{0,}, .+ = .{1,}
String patternMatch = patternMatch(generateRegex(open, close), str);
System.out.println("Regular expression Value : "+ patternMatch);
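Regular expression with the utility class RegexUtils (a helper class not shown here; Pattern.quote from java.util.regex can be used to escape the delimiters instead) and some functions.
Pattern.DOTALL: the dot matches any character, including a line terminator.
Pattern.MULTILINE: ^ and $ match at the start and end of each line, not only at the start and end of the whole input.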
public static String generateRegex(String open, String close) {
return "(" + RegexUtils.escapeQuotes(open) + ")(.*?)(" + RegexUtils.escapeQuotes(close) + ").*";
}
public static String patternMatch(String regex, CharSequence string) {
final Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);
final Matcher matcher = pattern .matcher(string);
String returnGroupValue = null;
if (matcher.find()) { // while() { Pattern.MULTILINE }
System.out.println("Full match: " + matcher.group(0));
System.out.format("Character Index [Start:End]«[%d:%d]\n",matcher.start(),matcher.end());
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
if( i == 2 ) returnGroupValue = matcher.group( 2 );
}
}
return returnGroupValue;
}
String s = "test string (67)";
int start = 0; // '(' position in string
int end = 0; // ')' position in string
for(int i = 0; i < s.length(); i++) {
if(s.charAt(i) == '(') // Looking for '(' position in string
start = i;
else if(s.charAt(i) == ')') // Looking for ')' position in string
end = i;
}
String number = s.substring(start+1, end); // you take value between start and end
String result = s.substring(s.indexOf("(") + 1, s.indexOf(")"));
public String getStringBetweenTwoChars(String input, String startChar, String endChar) {
try {
int start = input.indexOf(startChar);
if (start != -1) {
int end = input.indexOf(endChar, start + startChar.length());
if (end != -1) {
return input.substring(start + startChar.length(), end);
}
}
} catch (Exception e) {
e.printStackTrace();
}
return input; // return null; || return "" ;
}
Usage :
String input = "test string (67)";
String startChar = "(";
String endChar = ")";
String output = getStringBetweenTwoChars(input, startChar, endChar);
System.out.println(output);
// Output: "67"
Another way of doing it, using the split method:
public static void main(String[] args) {
String s = "test string (67)";
String[] ss;
ss= s.split("\\(");
ss = ss[1].split("\\)");
System.out.println(ss[0]);
}
Use Pattern and Matcher
public class Chk {
public static void main(String[] args) {
String s = "test string (67)";
ArrayList<String> arL = new ArrayList<String>();
ArrayList<String> inL = new ArrayList<String>();
Pattern pat = Pattern.compile("\\(\\w+\\)");
Matcher mat = pat.matcher(s);
while (mat.find()) {
arL.add(mat.group());
System.out.println(mat.group());
}
for (String sx : arL) {
Pattern p = Pattern.compile("(\\w+)");
Matcher m = p.matcher(sx);
while (m.find()) {
inL.add(m.group());
System.out.println(m.group());
}
}
System.out.println(inL);
}
}
The "generic" way of doing this is to parse the string from the start, throwing away all the characters before the first bracket, recording the characters after the first bracket, and throwing away the characters after the second bracket.
I'm sure there's a regex library or something to do it though.
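The parsing approach described above could look roughly like this. It is only a sketch: the names BetweenBracketsDemo and between are made up, and returning null when no complete pair is found is an assumption.

```java
// Hypothetical demo class: single-pass parse that keeps only the text between two delimiters.
class BetweenBracketsDemo {
    static String between(String s, char open, char close) {
        StringBuilder inside = new StringBuilder();
        boolean recording = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!recording) {
                if (c == open) {
                    recording = true;        // everything before the first open is thrown away
                }
            } else if (c == close) {
                return inside.toString();    // everything after the close is ignored
            } else {
                inside.append(c);            // record characters between the delimiters
            }
        }
        return null; // no complete open...close pair was found
    }

    public static void main(String[] args) {
        System.out.println(between("test string (67)", '(', ')')); // 67
    }
}
```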
The least generic way I found to do this with Regex and Pattern / Matcher classes:
String text = "test string (67)";
String START = "\\("; // A literal "(" character in regex
String END = "\\)"; // A literal ")" character in regex
// Captures the word(s) between the above two character(s)
String regex = START + "(\\w+)" + END;
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
// group(1) is the captured text without the surrounding parentheses
System.out.println(matcher.group(1));
}
This may help for more complex regex problems where you want to get the text between two sets of characters.
Another possible solution is to use lastIndexOf, which searches for the character or String starting from the end.
In my scenario, I had the following String and I had to extract <<UserName>>
1QAJK-WKJSH_MyApplication_Extract_<<UserName>>.arc
So indexOf and StringUtils.substringBetween were not helpful, as they start looking for the character from the beginning.
So, I used lastIndexOf
String str = "1QAJK-WKJSH_MyApplication_Extract_<<UserName>>.arc";
String userName = str.substring(str.lastIndexOf("_") + 1, str.lastIndexOf("."));
And, it gives me
<<UserName>>
String s = "test string (67)";
System.out.println(s.substring(s.indexOf("(")+1,s.indexOf(")")));
Something like this:
public static String innerSubString(String txt, char prefix, char suffix) {
if(txt != null && txt.length() > 1) {
int start = 0, end = 0;
char token;
for(int i = 0; i < txt.length(); i++) {
token = txt.charAt(i);
if(token == prefix)
start = i;
else if(token == suffix)
end = i;
}
if(start + 1 < end)
return txt.substring(start+1, end);
}
return null;
}
A simple \D+ regex does the job.
It selects all characters except digits, so there is no need to complicate things:
/\D+/
This returns the original string if the regex does not match:
var iAm67 = "test string (67)".replaceFirst("test string \\((.*)\\)", "$1");
Add a matches check to the code:
String str = "test string (67)";
String regx = "test string \\((.*)\\)";
if (str.matches(regx)) {
var iAm67 = str.replaceFirst(regx, "$1");
}
---EDIT---
I use https://www.freeformatter.com/java-regex-tester.html#ad-output to test regexes.
It turns out it's better to add ? after * so that the group matches as little as possible. Something like this:
String str = "test string (67)(69)";
String regx1 = "test string \\((.*)\\).*";
String regx2 = "test string \\((.*?)\\).*";
String ans1 = str.replaceFirst(regx1, "$1");
String ans2 = str.replaceFirst(regx2, "$1");
System.out.println("ans1:"+ans1+"\nans2:"+ans2);
// ans1:67)(69
// ans2:67
String s = "(69)";
System.out.println(s.substring(s.lastIndexOf('(')+1,s.lastIndexOf(')')));
A little extension to the top (MadProgrammer) answer:
public static String getTextBetween(final String wholeString, final String str1, String str2){
String s = wholeString.substring(wholeString.indexOf(str1) + str1.length());
s = s.substring(0, s.indexOf(str2));
return s;
}
I'm trying to create a palindrome tester program for my AP Java class and I need to remove the white spaces in my code completely but it's not letting me do so.
import java.util.Scanner;
public class Palin{
public static boolean isPalindrome(String stringToTest) {
String workingCopy = removeJunk(stringToTest);
String reversedCopy = reverse(workingCopy);
return reversedCopy.equalsIgnoreCase(workingCopy);
}
public static String removeJunk(String string) {
int i, len = string.length();
StringBuffer dest = new StringBuffer(len);
char c;
for (i = (len - 1); i >= 0; i-=1) {
c = string.charAt(i);
if (Character.isLetterOrDigit(c))
{
dest.append(c);
}
}
return dest.toString();
}
public static String reverse(String string) {
StringBuffer sb = new StringBuffer(string);
return sb.reverse().toString();
}
public static void main(String[] args) {
System.out.print("Enter Palindrome: ");
Scanner sc = new Scanner(System.in);
String string = sc.next();
String str = string;
String space = "";
String result = str.replaceAll("\\W", space);
System.out.println(result);
System.out.println();
System.out.println("Testing palindrome:");
System.out.println(" " + string);
System.out.println();
if (isPalindrome(result)) {
System.out.println("It's a palindrome!");
} else {
System.out.println("Not a palindrome!");
}
System.out.println();
}
}
Any help would be greatly appreciated.
Seems like your code is fine except for the following. You are using
String string = sc.next();
which will not read the whole line of input, hence you will lose part of the text. I think you should use the following instead of that line.
String string = sc.nextLine();
If you just want to remove the beginning and ending whitespace, you can use the built in function trim(), e.g. " abcd ".trim() is "abcd"
If you want to remove it everywhere, you can use the replaceAll() method with the whitespace class \s as the pattern, e.g. " abcd ".replaceAll("\\s","") (\W, by contrast, matches every non-word character, not just whitespace).
Use a StringTokenizer to remove " "
StringTokenizer st = new StringTokenizer(string," ",false);
String t="";
while (st.hasMoreElements()) t += st.nextElement();
String result = t;
System.out.println(result);
I haven't actually tested this, but have you considered the String.replaceAll(String regex, String replacement) method?
public static String removeJunk (String string) {
return string.replaceAll (" ", "");
}
Another thing to look out for is that, while removing all non-digit/alpha characters, removeJunk also reverses the string (it starts from the end and appends one character at a time).
So after reversing it again (in reverse), reversedCopy holds the cleaned string in its original order and workingCopy holds it reversed. The comparison itself still works, but the double reversal is easy to misread; iterating forward in removeJunk makes the intent clearer.
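One way to make the intent easier to follow is to have removeJunk keep the original character order and reverse exactly once. A minimal sketch, with the class name PalindromeDemo made up for the example and StringBuilder swapped in for StringBuffer:

```java
// Hypothetical demo class: palindrome check with a forward-iterating removeJunk.
class PalindromeDemo {
    // Keeps only letters and digits, preserving their original order.
    static String removeJunk(String s) {
        StringBuilder dest = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                dest.append(c);
            }
        }
        return dest.toString();
    }

    // Compares the cleaned string with its single reversal, ignoring case.
    static boolean isPalindrome(String s) {
        String cleaned = removeJunk(s);
        String reversed = new StringBuilder(cleaned).reverse().toString();
        return reversed.equalsIgnoreCase(cleaned);
    }

    public static void main(String[] args) {
        System.out.println(isPalindrome("A man, a plan, a canal: Panama")); // true
        System.out.println(isPalindrome("hello"));                          // false
    }
}
```

Because removeJunk already drops spaces and punctuation, a separate replaceAll call before the test is not strictly needed in this sketch.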
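You should use the String replace method rather than replaceAll.
Although the name suggests that only the first occurrence will be replaced, in fact all occurrences are replaced. The advantage of this method is that it treats its arguments literally rather than as regular expressions. Note that the replace(char oldChar, char newChar) overload cannot delete a character, because Java has no empty char literal ('' does not compile); use the replace(CharSequence, CharSequence) overload instead.
So give string.replace(" ", "") a try.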
I have a string, with characters a-z, A-Z, 0-9, (, ), +, -, etc.
I want to find every word within that string and replace it with the same word with 'word' (single quotes added). Words in that string can be preceded/followed by "(", ")", and spaces.
How do I go about doing that?
Input:
(Movie + 2000)
Output:
('Movie' + '2000')
Keep it simple! This does what you need:
String input = "(Movie + 2000)";
input = input.replaceAll("\\b", "'"); // Strings are immutable, so keep the returned value
// input is now "('Movie' + '2000')"
This uses the regex \b, which is a "word boundary". What could be simpler?
As stated in the comments, regex is a good way to go:
String input = "(Movie + 2000)";
input = input.replaceAll("[A-Za-z0-9]+", "'$0'");
You don't give a precise definition of 'word', so I assume it is any combination of letters and numbers.
EDIT: OK, thanks to @Buhb for explaining why this solution is not the best one. A better solution was given by @Bohemian.
public class Main {
/**
* @param args
*/
public static void main(String[] args) {
String str1 = "Hello string";
String str2 = "str";
System.out.println(replace(str1, str2, "'" + str2 + "'"));
}
static String replace(String str, String pattern, String replace) {
int s = 0;
int e = 0;
StringBuffer result = new StringBuffer();
while ((e = str.indexOf(pattern, s)) >= 0) {
result.append(str.substring(s, e));
result.append(replace);
s = e + pattern.length();
}
result.append(str.substring(s));
return result.toString();
}
}
Output: Hello 'str'ing
WBR
It makes sense to return string only if a replacement took place, see below:
if(s>0)
return result.toString();
else
return null;
I'm making a method to read a whole class code and do some stuff with it.
What I want to do is get the name of the method, and make a String with it.
Something like removeProduct
I'll make a String "Remove Product"
How can I split the method name at the capital letters?
How can I build this new string with the first letter of each word capitalized?
I'm doing it with substring; is there an easier and better way to do it?
PS: I'm sure my Brazilian English didn't help with the title. If anyone can make it look better, I'd appreciate it.
Don't bother reinventing the wheel; use the method in commons-lang:
String input = "methodName";
String[] words = StringUtils.splitByCharacterTypeCamelCase(input);
String humanised = StringUtils.join(words, ' ');
You can use a regular expression to split the name into the various words, and then capitalize the first one:
public static void main(String[] args) {
String input = "removeProduct";
//split into words
String[] words = input.split("(?=[A-Z])");
words[0] = capitalizeFirstLetter(words[0]);
//join
StringBuilder builder = new StringBuilder();
for ( String s : words ) {
builder.append(s).append(" ");
}
System.out.println(builder.toString());
}
private static String capitalizeFirstLetter(String in) {
return in.substring(0, 1).toUpperCase() + in.substring(1);
}
Note that this needs better corner case handling, such as not appending a space at the end and handling 1-char words.
Edit: I meant to explain the regex. The regular expression (?=[A-Z]) is a zero-width assertion (positive lookahead) matching a position where the next character is between 'A' and 'Z'.
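A possible way to tidy up the corner cases mentioned above (trailing space, one-character names, empty input) is sketched below. The names HumaniseDemo and humanise are made up, and the sketch assumes Java 8 or later, where a zero-width match at the start of the input does not produce an empty leading element from split.

```java
// Hypothetical demo class: lookahead split plus String.join, with basic corner-case handling.
class HumaniseDemo {
    static String humanise(String methodName) {
        if (methodName == null || methodName.isEmpty()) {
            return methodName; // nothing to split or capitalize
        }
        // Split before every uppercase letter (zero-width positive lookahead).
        String[] words = methodName.split("(?=[A-Z])");
        // Capitalize the first word; works for one-character words too.
        words[0] = Character.toUpperCase(words[0].charAt(0)) + words[0].substring(1);
        // String.join adds separators only between elements, so there is no trailing space.
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(humanise("removeProduct")); // Remove Product
    }
}
```

Runs of capitals are still split letter by letter with this pattern, so acronyms would need a more selective regex.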
You can do this in 2 steps:
1 - Make the first letter of the string uppercase.
2 - Insert a space before an uppercase letter which is preceded by a non-uppercase character.
For step 1 you can use a function and for step 2 you can use String.replaceAll method:
String str = "removeProduct";
str = capitalizeFirst(str);
str = str.replaceAll("(?<=[^A-Z])([A-Z])"," $1");
static String capitalizeFirst(String input) {
return input.substring(0, 1).toUpperCase() + input.substring(1);
}
@MrWiggles is right.
Just one more way to do this without being fancy :)
import java.util.StringTokenizer;
public class StringUtil {
public static String captilizeFirstLetter(String token) {
return Character.toUpperCase(token.charAt(0)) + token.substring(1);
}
public static String convert(String str) {
final StringTokenizer st = new StringTokenizer(str,
"A B C D E F G H I J K L M N O P Q R S T U V W X Y Z", true);
final StringBuilder sb = new StringBuilder();
String token;
if (st.hasMoreTokens()) {
token = st.nextToken();
sb.append(StringUtil.captilizeFirstLetter(token) + " ");
}
while (st.hasMoreTokens()) {
token = st.nextToken();
if (st.hasMoreTokens()) {
token = token + st.nextToken();
}
sb.append(StringUtil.captilizeFirstLetter(token) + " ");
}
return sb.toString().trim();
}
public static void main(String[] args) throws Exception {
String words = StringUtil.convert("helloWorldHowAreYou");
System.out.println(words);
}
}
public String convertMethodName(String methodName) {
StringBuilder sb = new StringBuilder().append(Character.toUpperCase(methodName.charAt(0)));
for (int i = 1; i < methodName.length(); i++) {
char c = methodName.charAt(i);
if (Character.isUpperCase(c)) {
sb.append(' ');
}
sb.append(c);
}
return sb.toString();
}
Handling it this way may give you some finer control in case you want to add in functionality later for other situations (multiple caps in a row, etc.). Basically, for each character, it just checks to see if it's within the bounds of capital letters (character codes 65-90, inclusive), and if so, adds a space to the buffer before the word begins.
EDIT: Using Character.isUpperCase()
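As an example of the finer control mentioned above, here is a sketch of how the loop might be extended so that several capitals in a row stay together as one word. The class name CamelCaseDemo and the exact splitting rule are assumptions made for illustration, not part of the original method.

```java
// Hypothetical demo class: character loop that keeps runs of capitals (acronyms) together.
class CamelCaseDemo {
    static String convertMethodName(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (i == 0) {
                sb.append(Character.toUpperCase(c)); // capitalize the first letter
                continue;
            }
            boolean prevNotUpper = !Character.isUpperCase(name.charAt(i - 1));
            boolean nextLower = i + 1 < name.length()
                    && Character.isLowerCase(name.charAt(i + 1));
            // Start a new word at a capital that follows a non-capital,
            // or at the last capital of a run that is followed by a lowercase letter.
            if (Character.isUpperCase(c) && (prevNotUpper || nextLower)) {
                sb.append(' ');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(convertMethodName("removeProduct"));     // Remove Product
        System.out.println(convertMethodName("parseHTTPResponse")); // Parse HTTP Response
    }
}
```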