Java replace string sequences ignoring case?

Java replace string sequences ignoring case? - java

For a project, I've given a sequence that the program searches for and tries to find a sequence of letters. If it finds any, it makes those sequences capitalized and prints out the line.
For instance in sequence bcdaaab when I run the program, and specify the string aa as what to find,
what should be printed is: bcdAAAb
what is being printied is: bcdAAab (notice the third a is not capped).
This is because I'm just using the replaceAll function for strings which does not ignore case. I want to know if there is any way to make it so it ignores case when I'm searching. If I can't use replaceAll at all, could someone suggest other algorithm?

The problem you're running into is that even with case-insensitive matching, it replaces the first 'aa' and then searches for the next match starting after the original replacement. Since the next position after the replacement is a single 'a' followed by a different letter, it doesn't consider it a match.
This will take care of your problem:
"bcdaaab".replaceAll("(?i)a((?=a)|(?<=a))","A");
Instead of replacing 'aa' with 'AA', use this to replace one 'a' at a time. It utilizes a lookahead and a lookbehind to basically say "is there another 'a' next to me?"
If you don't want to do that for whatever reason, you can always do a while(matcher.find(offset)) loop.

This will help you
String abc ="NewDc";
System.out.println(abc.replaceAll("(?i)dc"," Data Center."));

try this
String s1 = "bcdaaab";
String s2 = "aa";
StringBuilder sb = new StringBuilder(s1);
for (int i = 0;; i++) {
i = s1.indexOf(s2, i);
if (i == -1) {
break;
}
sb.replace(i, i + s2.length(), s2.toUpperCase());
}
System.out.println(sb);
output
bcdAAAb
In case for replacement aA result also should be bcdAAAb you can use
String s1 = "bcdaaab";
String s2 = "aA".toUpperCase();
StringBuilder sb = new StringBuilder(s1);
s1 = s1.toUpperCase();
for (int i = 0;; i++) {
i = s1.indexOf(s2, i);
if (i == -1) {
break;
}
sb.replace(i, i + s2.length(), s2.toUpperCase());
}
System.out.println(sb);

Related

Java regex: Replace all characters with `+` except instances of a given string

I have the following problem which states
Replace all characters in a string with + symbol except instances of the given string in the method
so for example if the string given was abc123efg and they want me to replace every character except every instance of 123 then it would become +++123+++.
I figured a regular expression is probably the best for this and I came up with this.
str.replaceAll("[^str]","+")
where str is a variable, but its not letting me use the method without putting it in quotations. If I just want to replace the variable string str how can I do that? I ran it with the string manually typed and it worked on the method, but can I just input a variable?
as of right now I believe its looking for the string "str" and not the variable string.
Here is the output its right for so many cases except for two :(
List of open test cases:
plusOut("12xy34", "xy") → "++xy++"
plusOut("12xy34", "1") → "1+++++"
plusOut("12xy34xyabcxy", "xy") → "++xy++xy+++xy"
plusOut("abXYabcXYZ", "ab") → "ab++ab++++"
plusOut("abXYabcXYZ", "abc") → "++++abc+++"
plusOut("abXYabcXYZ", "XY") → "++XY+++XY+"
plusOut("abXYxyzXYZ", "XYZ") → "+++++++XYZ"
plusOut("--++ab", "++") → "++++++"
plusOut("aaxxxxbb", "xx") → "++xxxx++"
plusOut("123123", "3") → "++3++3"

Looks like this is the plusOut problem on CodingBat.
I had 3 solutions to this problem, and wrote a new streaming solution just for fun.
Solution 1: Loop and check
Create a StringBuilder out of the input string, and check for the word at every position. Replace the character if doesn't match, and skip the length of the word if found.
public String plusOut(String str, String word) {
StringBuilder out = new StringBuilder(str);
for (int i = 0; i < out.length(); ) {
if (!str.startsWith(word, i))
out.setCharAt(i++, '+');
else
i += word.length();
}
return out.toString();
}
This is probably the expected answer for a beginner programmer, though there is an assumption that the string doesn't contain any astral plane character, which would be represented by 2 char instead of 1.
Solution 2: Replace the word with a marker, replace the rest, then restore the word
public String plusOut(String str, String word) {
return str.replaceAll(java.util.regex.Pattern.quote(word), "#").replaceAll("[^#]", "+").replaceAll("#", word);
}
Not a proper solution since it assumes that a certain character or sequence of character doesn't appear in the string.
Note the use of Pattern.quote to prevent the word being interpreted as regex syntax by replaceAll method.
Solution 3: Regex with \G
public String plusOut(String str, String word) {
word = java.util.regex.Pattern.quote(word);
return str.replaceAll("\\G((?:" + word + ")*+).", "$1+");
}
Construct regex \G((?:word)*+)., which does more or less what solution 1 is doing:
\G makes sure the match starts from where the previous match leaves off
((?:word)*+) picks out 0 or more instance of word - if any, so that we can keep them in the replacement with $1. The key here is the possessive quantifier *+, which forces the regex to keep any instance of the word it finds. Otherwise, the regex will not work correctly when the word appear at the end of the string, as the regex backtracks to match .
. will not be part of any word, since the previous part already picks out all consecutive appearances of word and disallow backtrack. We will replace this with +
Solution 4: Streaming
public String plusOut(String str, String word) {
return String.join(word,
Arrays.stream(str.split(java.util.regex.Pattern.quote(word), -1))
.map((String s) -> s.replaceAll("(?s:.)", "+"))
.collect(Collectors.toList()));
}
The idea is to split the string by word, do the replacement on the rest, and join them back with word using String.join method.
Same as above, we need Pattern.quote to avoid split interpreting the word as regex. Since split by default removes empty string at the end of the array, we need to use -1 in the second parameter to make split leave those empty strings alone.
Then we create a stream out of the array and replace the rest as strings of +. In Java 11, we can use s -> String.repeat(s.length()) instead.
The rest is just converting the Stream to an Iterable (List in this case) and joining them for the result

This is a bit trickier than you might initially think because you don't just need to match characters, but the absence of specific phrase - a negated character set is not enough. If the string is 123, you would need:
(?<=^|123)(?!123).*?(?=123|$)
https://regex101.com/r/EZWMqM/1/
That is - lookbehind for the start of the string or "123", make sure the current position is not followed by 123, then lazy-repeat any character until lookahead matches "123" or the end of the string. This will match all characters which are not in a "123" substring. Then, you need to replace each character with a +, after which you can use appendReplacement and a StringBuffer to create the result string:
String inputPhrase = "123";
String inputStr = "abc123efg123123hij";
StringBuffer resultString = new StringBuffer();
Pattern regex = Pattern.compile("(?<=^|" + inputPhrase + ")(?!" + inputPhrase + ").*?(?=" + inputPhrase + "|$)");
Matcher m = regex.matcher(inputStr);
while (m.find()) {
String replacement = m.group(0).replaceAll(".", "+");
m.appendReplacement(resultString, replacement);
}
m.appendTail(resultString);
System.out.println(resultString.toString());
Output:
+++123+++123123+++
Note that if the inputPhrase can contain character with a special meaning in a regular expression, you'll have to escape them first before concatenating into the pattern.

You can do it in one line:
input = input.replaceAll("((?:" + str + ")+)?(?!" + str + ").((?:" + str + ")+)?", "$1+$2");
This optionally captures "123" either side of each character and puts them back (a blank if there's no "123"):

So instead of coming up with a regular expression that matches the absence of a string. We might as well just match the selected phrase and append + the number of skipped characters.
StringBuilder sb = new StringBuilder();
Matcher m = Pattern.compile(Pattern.quote(str)).matcher(input);
while (m.find()) {
for (int i = 0; i < m.start(); i++) sb.append('+');
sb.append(str);
}
int remaining = input.length() - sb.length();
for (int i = 0; i < remaining; i++) {
sb.append('+');
}

Absolutely just for the fun of it, a solution using CharBuffer (unexpectedly it took a lot more that I initially hoped for):
private static String plusOutCharBuffer(String input, String match) {
int size = match.length();
CharBuffer cb = CharBuffer.wrap(input.toCharArray());
CharBuffer word = CharBuffer.wrap(match);
int x = 0;
for (; cb.remaining() > 0;) {
if (!cb.subSequence(0, size < cb.remaining() ? size : cb.remaining()).equals(word)) {
cb.put(x, '+');
cb.clear().position(++x);
} else {
cb.clear().position(x = x + size);
}
}
return cb.clear().toString();
}

To make this work you need a beast of a pattern. Let's say you you are operating on the following test case as an example:
plusOut("abXYxyzXYZ", "XYZ") → "+++++++XYZ"
What you need to do is build a series of clauses in your pattern to match a single character at a time:
Any character that is NOT "X", "Y" or "Z" -- [^XYZ]
Any "X" not followed by "YZ" -- X(?!YZ)
Any "Y" not preceded by "X" -- (?<!X)Y
Any "Y" not followed by "Z" -- Y(?!Z)
Any "Z" not preceded by "XY" -- (?<!XY)Z
An example of this replacement can be found here: https://regex101.com/r/jK5wU3/4
Here is an example of how this might work (most certainly not optimized, but it works):
import java.util.regex.Pattern;
public class Test {
public static void plusOut(String text, String exclude) {
StringBuilder pattern = new StringBuilder("");
for (int i=0; i<exclude.length(); i++) {
Character target = exclude.charAt(i);
String prefix = (i > 0) ? exclude.substring(0, i) : "";
String postfix = (i < exclude.length() - 1) ? exclude.substring(i+1) : "";
// add the look-behind (?<!X)Y
if (!prefix.isEmpty()) {
pattern.append("(?<!").append(Pattern.quote(prefix)).append(")")
.append(Pattern.quote(target.toString())).append("|");
}
// add the look-ahead X(?!YZ)
if (!postfix.isEmpty()) {
pattern.append(Pattern.quote(target.toString()))
.append("(?!").append(Pattern.quote(postfix)).append(")|");
}
}
// add in the other character exclusion
pattern.append("[^" + Pattern.quote(exclude) + "]");
System.out.println(text.replaceAll(pattern.toString(), "+"));
}
public static void main(String [] args) {
plusOut("12xy34", "xy");
plusOut("12xy34", "1");
plusOut("12xy34xyabcxy", "xy");
plusOut("abXYabcXYZ", "ab");
plusOut("abXYabcXYZ", "abc");
plusOut("abXYabcXYZ", "XY");
plusOut("abXYxyzXYZ", "XYZ");
plusOut("--++ab", "++");
plusOut("aaxxxxbb", "xx");
plusOut("123123", "3");
}
}
UPDATE: Even this doesn't quite work because it can't deal with exclusions that are just repeated characters, like "xx". Regular expressions are most definitely not the right tool for this, but I thought it might be possible. After poking around, I'm not so sure a pattern even exists that might make this work.

The problem in your solution that you put a set of instance string str.replaceAll("[^str]","+") which it will exclude any character from the variable str and that will not solve your problem
EX: when you try str.replaceAll("[^XYZ]","+") it will exclude any combination of character X , character Y and character Z from your replacing method so you will get "++XY+++XYZ".
Actually you should exclude a sequence of characters instead in str.replaceAll.
You can do it by using capture group of characters like (XYZ) then use a negative lookahead to match a string which does not contain characters sequence : ^((?!XYZ).)*$
Check this solution for more info about this problem but you should know that it may be complicated to find regular expression to do that directly.
I have found two simple solutions for this problem :
Solution 1:
You can implement a method to replace all characters with '+' except the instance of given string:
String exWord = "XYZ";
String str = "abXYxyzXYZ";
for(int i = 0; i < str.length(); i++){
// exclude any instance string of exWord from replacing process in str
if(str.substring(i, str.length()).indexOf(exWord) + i == i){
i = i + exWord.length()-1;
}
else{
str = str.substring(0,i) + "+" + str.substring(i+1);//replace each character with '+' symbol
}
}
Note : str.substring(i, str.length()).indexOf(exWord) + i this if statement will exclude any instance string of exWord from replacing process in str.
Output:
+++++++XYZ
Solution 2:
You can try this Approach using ReplaceAll method and it doesn't need any complex regular expression:
String exWord = "XYZ";
String str = "abXYxyzXYZ";
str = str.replaceAll(exWord,"*"); // replace instance string with * symbol
str = str.replaceAll("[^*]","+"); // replace all characters with + symbol except *
str = str.replaceAll("\\*",exWord); // replace * symbol with instance string
Note : This solution will work only if your input string str doesn't contain any * symbol.
Also you should escape any character with a special meaning in a regular expression in phrase instance string exWord like : exWord = "++".

Remove "yak" from a string

I am working on a codingbat problem:Given a string, return a version where all the "yak" are removed, but the "a" can be any char. The "yak" strings will not overlap. I am look at the solution but there is one part of that code I do not understand.....
How come the first part of the if statement "i = i+ 2" could return the string and you don't need anything else? I mean after all these three conditions are met and just write i = i + 2,and that's it. that is going to give you a String as a result. I don't get it, please help.
public String stringYak(String str) {
String result = "";
for (int i=0; i<str.length(); i++) {
// Look for i starting a "yak" -- advance i in that case
if (i+2<str.length() && str.charAt(i)=='y' && str.charAt(i+2)=='k') {
i = i + 2;
} else { // Otherwise do the normal append
result = result + str.charAt(i);
}
}
return result;
}

The code is selecting what characters to put in the new string.
We go through the characters one by one.
If we run into a "y.k", skip this whole section
Else add the character to the new string.
[a][b][y][c][k][d] => New String: [a] (a is okay)
.|..........................
[a][b][y][c][k][d] => New String: [a][b] (b is okay)
......|.....................
[a][b][y][c][k][d] => New String: [a][b] (Oops! We have run into the y.k
pattern, skip it) ..........|.................
[a][b][y][c][k][d] => New String: [a][b][d] (d is okay)
.....................|......
Final String: [a][b][d]

This may not be the style of answer you want, but a simple call to String#replaceAll() should work:
String str = "Some string yak containing yok.";
str = str.replaceAll("y.k", "");

Doing i = i + 2 is not going to give you any String it is simply incrementing the for loop so you do not need to re-evaluate the next two chars as they have already been evaluated.
The key point is in the else part, which will append the char if it is not y and the character after-next is also not k

String manipulation of function names

For this Kata, i am given random function names in the PEP8 format and i am to convert them to camelCase.
(input)get_speed == (output)getSpeed ....
(input)set_distance == (output)setDistance
I have a understanding on one way of doing this written in pseudo-code:
loop through the word,
if the letter is an underscore
then delete the underscore
then get the next letter and change to a uppercase
endIf
endLoop
return the resultant word
But im unsure the best way of doing this, would it be more efficient to create a char array and loop through the element and then when it comes to finding an underscore delete that element and get the next index and change to uppercase.
Or would it be better to use recursion:
function camelCase takes a string
if the length of the string is 0,
then return the string
endIf
if the character is a underscore
then change to nothing,
then find next character and change to uppercase
return the string taking away the character
endIf
finally return the function taking the first character away
Any thoughts please, looking for a good efficient way of handing this problem. Thanks :)

I would go with this:
divide given String by underscore to array
from second word until end take first letter and convert it to uppercase
join to one word
This will work in O(n) (go through all names 3 time). For first case, use this function:
str.split("_");
for uppercase use this:
String newName = substring(0, 1).toUpperCase() + stre.substring(1);
But make sure you check size of the string first...
Edited - added implementation
It would look like this:
public String camelCase(String str) {
if (str == null ||str.trim().length() == 0) return str;
String[] split = str.split("_");
String newStr = split[0];
for (int i = 1; i < split.length; i++) {
newStr += split[i].substring(0, 1).toUpperCase() + split[i].substring(1);
}
return newStr;
}
for inputs:
"test"
"test_me"
"test_me_twice"
it returns:
"test"
"testMe"
"testMeTwice"

It would be simpler to iterate over the string instead of recursing.
String pep8 = "do_it_again";
StringBuilder camelCase = new StringBuilder();
for(int i = 0, l = pep8.length(); i < l; ++i) {
if(pep8.charAt(i) == '_' && (i + 1) < l) {
camelCase.append(Character.toUpperCase(pep8.charAt(++i)));
} else {
camelCase.append(pep8.charAt(i));
}
}
System.out.println(camelCase.toString()); // prints doItAgain

The question you pose is whether to use an iterative or a recursive approach. For this case I'd go for the recursive approach because it's straightforward, easy to understand doesn't require much resources (only one array, no new stackframe etc), though that doesn't really matter for this example.
Recursion is good for divide-and-conquer problems, but I don't see that fitting the case well, although it's possible.
An iterative implementation of the algorithm you described could look like the following:
StringBuilder buf = new StringBuilder(input);
for(int i = 0; i < buf.length(); i++){
if(buf.charAt(i) == '_'){
buf.deleteCharAt(i);
if(i != buf.length()){ //check fo EOL
buf.setCharAt(i, Character.toUpperCase(buf.charAt(i)));
}
}
}
return buf.toString();
The check for the EOL is not part of the given algorithm and could be ommitted, if the input string never ends with '_'

Formatting String Array efficiently in Java

I was working on some string formatting, and I was curious if I was doing it the most efficient way.
Assume I have a String Array:
String ArrayOne[] = {"/test/" , "/this/is/test" , "/that/is/" "/random/words" }
I want the result Array to be
String resultArray[] = {"test", "this_is_test" , "that_is" , "random_words" }
It's quite messy and brute-force-like.
for(char c : ArrayOne[i].toCharArray()) {
if(c == '/'){
occurances[i]++;
}
}
First I count the number of "/" in each String like above and then using these counts, I find the indexOf("/") for each string and add "_" accordingly.
As you can see though, it gets very messy.
Is there a more efficient way to do this besides the brute-force way I'm doing?
Thanks!

You could use replaceAll and replace, as follows:
String resultArray[] = new String[ArrayOne.length];
for (int i = 0; i < ArrayOne.length; ++i) {
resultArray[i] = ArrayOne[i].replaceAll("^/|/$", "").replace('/', '_');
}
The replaceAll method searches the string for a match to the regex given in the first argument, and replaces each match with the text in the second argument.
Here, we use it first to remove leading and trailing slashes. We search for slashes at the start of the string (^/) or the end of the string (/$), and replace them with nothing.
Then, we replace all remaining slashes with underscores using replace.

Simplify & condense multiple editorial operations on an array. Java

I have some raw output that I want to clean up and make presentable but right now I go about it in a very ugly and cumbersome way, I wonder if anyone might know a clean and elegant way in which to perform the same operation.
int size = charOutput.size();
for (int i = size - 1; i >= 1; i--)
{
if(charOutput.get(i).compareTo(charOutput.get(i - 1)) == 0)
{
charOutput.remove(i);
}
}
for(int x = 0; x < charOutput.size(); x++)
{
if(charOutput.get(x) == '?')
{
charOutput.remove(x);
}
}
String firstOne = Arrays.toString(charOutput.toArray());
String secondOne = firstOne.replaceAll(",","");
String thirdOne = secondOne.substring(1, secondOne.length() - 1);
String output = thirdOne.replaceAll(" ","");
return output;

ZouZou has the right code for fixing the final few calls in your code. I have some suggestions for the for loops. I hope I got them right...
These work after you get the String represented by charOutput, using a method such as the one suggested by ZouZou.
Your first block appears to remove all repeated letters. You can use a regular expression for that:
Pattern removeRepeats = Pattern.compile("(.)\\1{1,}");
// "(.)" creates a group that matches any character and puts it into a group
// "\\1" gets converted to "\1" which is a reference to the first group, i.e. the character that "(.)" matched
// "{1,}" means "one or more"
// So the overall effect is "one or more of a single character"
To use:
removeRepeats.matcher(s).replaceAll("$1");
// This creates a Matcher that matches the regex represented by removeRepeats to the contents of s, and replaces the parts of s that match the regex represented by removeRepeats with "$1", which is a reference to the first group captured (i.e. "(.)", which is the first character matched"
To remove the question mark, just do
Pattern removeQuestionMarks = Pattern.compile("\\?");
// Because "?" is a special symbol in regex, you have to escape it with a backslash
// But since backslashes are also a special symbol, you have to escape the backslash too.
And then to use, do the same thing as was done above except with replaceAll("");
And you're done!
If you really wanted to, you can combine a lot of regex into two super-regex expressions (and one normal regex expression):
Pattern p0 = Pattern.compile("(\\[|\\]|\\,| )"); // removes brackets, commas, and spaces
Pattern p1 = Pattern.compile("(.)\\1{1,}"); // Removes duplicate characters
Pattern p2 = Pattern.compile("\\?");
String removeArrayCharacters = p0.matcher(charOutput.toString()).replaceAll("");
String removeDuplicates = p1.matcher(removeArrayCharacters).replaceAll("$1");
return p2.matcher(removeDuplicates).replaceAll("");

Use a StringBuilder and append each character you want, at the end just return myBuilder.toString();
Instead of this:
String firstOne = Arrays.toString(charOutput.toArray());
String secondOne = firstOne.replaceAll(",","");
String thirdOne = secondOne.substring(1, secondOne.length() - 1);
String output = thirdOne.replaceAll(" ","");
return output;
Simply do:
StringBuilder sb = new StringBuilder();
for(Character c : charOutput){
sb.append(c);
}
return sb.toString();
Note that you are doing a lot of unnecessary work (by iterating through the list and removing some elements). What you can actually do is just iterate one time and then if the condition fullfits your requirements (the two adjacent characters are not the same and no question mark) then append it to the StringBuilder directly.
This task could also be a job for a regular expression.

If you don't want to use Regex try this version to remove consecutive characters and '?':
int size = charOutput.size();
if (size == 1) return Character.toString((Character)charOutput.get(0));
else if (size == 0) return null;
StringBuilder sb = new StringBuilder();
for (int i = 0; i < size - 1; i++) {
Character temp = (Character)charOutput.get(i);
if (!temp.equals(charOutput.get(i+1)) && !temp.equals('?'))
sb.append(temp);
}
//for the last element
if (!charOutput.get(size-1).equals(charOutput.get(size-2))
&& !charOutput.get(size-1).equals('?'))
sb.append(charOutput.get(size-1));
return sb.toString();

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Java replace string sequences ignoring case? - java

This will help you String abc ="NewDc"; System.out.println(abc.replaceAll("(?i)dc"," Data Center."));

Related

Java regex: Replace all characters with `+` except instances of a given string

Remove "yak" from a string

String manipulation of function names

Formatting String Array efficiently in Java

Simplify & condense multiple editorial operations on an array. Java

Categories

Resources