What is an efficient way to replace many characters in a string?

String handling in Java is something I'm trying to learn to do well. Currently I want to take in a string and replace any characters I find.
Here is my current inefficient (and kinda silly IMO) function. It was written to just work.
public String convertWord(String word)
{
return word.toLowerCase().replace('á', 'a')
.replace('é', 'e')
.replace('í', 'i')
.replace('ú', 'u')
.replace('ý', 'y')
.replace('ð', 'd')
.replace('ó', 'o')
.replace('ö', 'o')
.replaceAll("[-]", "")
.replaceAll("[.]", "")
.replaceAll("[/]", "")
.replaceAll("[æ]", "ae")
.replaceAll("[þ]", "th");
}
I ran 1.000.000 runs of it and it took 8182ms. So how should I proceed in changing this function to make it more efficient?
Solution found:
Converting the function to this
public String convertWord(String word)
{
StringBuilder sb = new StringBuilder();
char[] charArr = word.toLowerCase().toCharArray();
for(int i = 0; i < charArr.length; i++)
{
// Single character case
if(charArr[i] == 'á')
{
sb.append('a');
}
// Char to two characters
else if(charArr[i] == 'þ')
{
sb.append("th");
}
// Remove
else if(charArr[i] == '-')
{
}
// Base case
else
{
sb.append(charArr[i]); // append the lowercased char, not the original one
}
}
return sb.toString();
}
Running this function 1.000.000 times takes 518ms. So I think that is efficient enough. Thanks for the help guys :)

You could create a String[] lookup table with Character.MAX_VALUE + 1 entries, one per possible char value. (The table also takes care of the mapping to lower case.)
However complex the replacements become, the time to perform them stays the same, because each character costs a single array lookup.
private static final String[] REPLACEMENT = new String[Character.MAX_VALUE+1];
static {
for(int i=Character.MIN_VALUE;i<=Character.MAX_VALUE;i++)
REPLACEMENT[i] = Character.toString(Character.toLowerCase((char) i));
// substitute
REPLACEMENT['á'] = "a";
// remove
REPLACEMENT['-'] = "";
// expand
REPLACEMENT['æ'] = "ae";
}
public String convertWord(String word) {
StringBuilder sb = new StringBuilder(word.length());
for(int i=0;i<word.length();i++)
sb.append(REPLACEMENT[word.charAt(i)]);
return sb.toString();
}

My suggestion would be:
Convert the String to a char[] array
Run through the array, testing each character one by one (e.g. with a switch statement) and replacing it if needed
Convert the char[] array back to a String
I think this is probably the fastest performance you will get in pure Java.
EDIT: I notice you are doing some changes that change the length of the string. In this case, the same principle applies, however you need to keep two arrays and increment both a source index and a destination index separately. You might also need to resize the destination array if you run out of target space (i.e. reallocate a larger array and arraycopy the existing destination array into it)
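A minimal sketch of the two-array idea described above, assuming only a handful of the question's replacement rules. The class name CharArrayReplacer and the growth policy are illustrative choices, not something from the question. The accented characters are written as Unicode escapes so the snippet does not depend on the source file encoding.

```java
import java.util.Arrays;

final class CharArrayReplacer {
    // Hypothetical helper: covers only a subset of the question's rules.
    static String convertWord(String word) {
        char[] src = word.toLowerCase().toCharArray();
        // Start with the source length and grow only when expansions need room.
        char[] dst = new char[src.length];
        int d = 0; // destination index, advanced independently of the source index
        for (int s = 0; s < src.length; s++) {
            // Make sure the longest replacement (two chars) still fits.
            if (d + 2 > dst.length) {
                dst = Arrays.copyOf(dst, dst.length * 2 + 2);
            }
            switch (src[s]) {
                case '\u00e1': dst[d++] = 'a'; break;                 // a with acute
                case '\u00e9': dst[d++] = 'e'; break;                 // e with acute
                case '\u00f3': dst[d++] = 'o'; break;                 // o with acute
                case '\u00f6': dst[d++] = 'o'; break;                 // o with diaeresis
                case '\u00e6': dst[d++] = 'a'; dst[d++] = 'e'; break; // ae ligature
                case '\u00fe': dst[d++] = 't'; dst[d++] = 'h'; break; // thorn
                case '-': case '.': case '/': break;                  // removed
                default: dst[d++] = src[s];
            }
        }
        // Only the first d chars of dst are valid output.
        return new String(dst, 0, d);
    }
}
```

Whether this beats a StringBuilder in practice would need measuring; StringBuilder does essentially the same resizing internally.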

My implementation is based on a lookup table.
public static String convertWord(String str) {
char[] words = str.toCharArray();
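// Characters to look for; the last three have no entry in replace, so they are removed
char[] find = {'á','é','ú','ý','ð','ó','ö','æ','þ','-','.',
'/'};
// replace[w] is the substitute for find[w]
String[] replace = {"a","e","u","y","d","o","o","ae","th"};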
StringBuilder out = new StringBuilder(str.length());
for (int i = 0; i < words.length; i++) {
boolean matchFailed = true;
for(int w = 0; w < find.length; w++) {
if(words[i] == find[w]) {
if(w < replace.length) {
out.append(replace[w]);
}
matchFailed = false;
break;
}
}
if(matchFailed) out.append(words[i]);
}
return out.toString();
}

My first choice would be to use a StringBuilder, because you need to remove some chars from the string.
My second choice would be to iterate through the array of chars and add each treated char to another array with the initial size of the string. Then you would need to copy the array to trim any unused positions.
After that, I would run some performance tests to see which one is better.

I doubt that you can really speed up the character replacement itself. As for the regular expression replacements, you can compile the regexes beforehand.
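As a rough sketch of the precompiling idea: String.replaceAll compiles its regex on every call, whereas a Pattern held in a static field is compiled once. The class name PrecompiledReplace and the choice of rules are assumptions for illustration, and the non-ASCII characters are written as Unicode escapes.

```java
import java.util.regex.Pattern;

final class PrecompiledReplace {
    // Compiled once when the class is loaded, then reused on every call.
    private static final Pattern REMOVE = Pattern.compile("[-./]");
    private static final Pattern AE = Pattern.compile("\u00e6");    // ae ligature
    private static final Pattern THORN = Pattern.compile("\u00fe"); // thorn

    // Hypothetical helper covering only the regex-based rules of the question.
    static String convertWord(String word) {
        String s = REMOVE.matcher(word.toLowerCase()).replaceAll("");
        s = AE.matcher(s).replaceAll("ae");
        return THORN.matcher(s).replaceAll("th");
    }
}
```

This still makes one pass over the string per rule, so a single-pass loop is likely to remain faster.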

Use the function String.replaceAll.
Here is a nice article on something similar to what you want: link

Any time we have problems like this we use regular expressions, as they are by far the fastest way to deal with what you are trying to do.
Have you already tried regular expressions?

What I see as inefficient is that you check characters again after they have already been replaced, which is wasted work.
I would get the char array of the String, iterate over it, and run each character through a series of if-else checks like this:
char[] array = word.toCharArray();
for (int i = 0; i < array.length; ++i) {
    char currentChar = array[i];
    // char is a primitive, so compare with == rather than equals()
    if (currentChar == 'é')
        array[i] = 'e';
    else if (currentChar == 'ö')
        array[i] = 'o';
    // else if (...) and so on for the remaining characters
}

I just implemented this utility class that replaces a char or a group of chars of a String. It is equivalent to bash tr and perl tr///, aka, transliterate. I hope it helps someone!
package your.package.name;
/**
* Utility class that replaces chars of a String, aka, transliterate.
*
* It's equivalent to bash 'tr' and perl 'tr///'.
*
*/
public class ReplaceChars {
public static String replace(String string, String from, String to) {
return new String(replace(string.toCharArray(), from.toCharArray(), to.toCharArray()));
}
public static char[] replace(char[] chars, char[] from, char[] to) {
char[] output = chars.clone();
for (int i = 0; i < output.length; i++) {
for (int j = 0; j < from.length; j++) {
if (output[i] == from[j]) {
output[i] = to[j];
break;
}
}
}
return output;
}
/**
* For tests!
*/
public static void main(String[] args) {
// Example from: https://en.wikipedia.org/wiki/Caesar_cipher
String string = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG";
String from = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
String to = "XYZABCDEFGHIJKLMNOPQRSTUVW";
System.out.println();
System.out.println("Cesar cypher: " + string);
System.out.println("Result: " + ReplaceChars.replace(string, from, to));
}
}
This is the output:
Cesar cypher: THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG
Result: QEB NRFZH YOLTK CLU GRJMP LSBO QEB IXWV ALD

Related

I'm stuck. I need to adjust my loops so they continue to compare my two arrays but not print out all the extra characters

I have to compare two string arrays. If the any of the characters in myArray match a character in argArray then I need to swap the case of the character in myArray. I'm almost there but am getting extra output.
This is what I have so far -
public class Main {
public static void main(String[] args) {
Main ob = new Main();
ob.reverse("bcdxyz#3210.");
}
public String reverse(String arg) {
String reverseCap = "";
String myStr = "abc, XYZ; 123.";
char[] argArray = arg.toCharArray();
char[] myArray = myStr.toCharArray();
for (int i =0; i < myArray.length; i++) {
for (int j =0; j < argArray.length; j++){
if (myArray[i] == argArray[j] && Character.isLowerCase(myArray[i])){
reverseCap += Character.toUpperCase(myArray[i]);
} else if (myArray[i] == argArray[j] && Character.isUpperCase(myArray[i])){
reverseCap += Character.toLowerCase(myArray[i]);
} else {
reverseCap += myArray[i];
}
}
}
System.out.println(reverseCap);
return null;
}
I want reverseCap to be "aBC, xyz; 123." but am getting the following -
"aaaaaaaaaaaaBbbbbbbbbbbbcCcccccccccc,,,,,,,,,,,, XXXXXXXXXXXXYYYYYYYYYYYYZZZZZZZZZZZZ;;;;;;;;;;;; 111111111111222222222222333333333333............
".
I've been staring at this for hours so I figured it was time to ask for help before I pluck my eyes out.

Marce noted the problem of adding characters to reverseCap on every iteration. Here is a solution that solves that problem and performs the case changes in place. Checking for a match first and then changing the case simplifies the logic a bit. Note that myArray[i] needs to be lowercased before it is compared with argArray[j], because the former may be an uppercase character; this is not needed for argArray[j] because those characters are assumed to be all lowercase. Finally, once the inner loop has found a match, further iterations of it are no longer needed, so it breaks out.
public class Main {
public static void main(String[] args) {
Main ob = new Main();
String testStr = "abc, XYZ; 123.";
String testArg = "bcdxyz#3210.";
System.out.println(testStr + " using " + testArg + " =>");
System.out.println(ob.reverse(testStr, testArg));
}
public String reverse(String myStr, String myArg) {
char[] myArray = myStr.toCharArray();
char[] argArray = myArg.toCharArray();
for (int i =0; i < myArray.length; i++) {
for (int j =0; j < argArray.length; j++) {
if (Character.toLowerCase(myArray[i]) == argArray[j]) {
if (Character.isLowerCase(myArray[i])) {
myArray[i] = Character.toUpperCase(myArray[i]);
} else if (Character.isUpperCase(myArray[i])) {
myArray[i] = Character.toLowerCase(myArray[i]);
}
break;
}
}
}
return String.valueOf(myArray);
}
}

With this part
} else {
reverseCap += myArray[i];
}
you're adding a character to reverseCap with every iteration, regardless if the characters match or not.
In your specific example, you could just leave that out, since every character in myStr also appears in arg, but if you want to add characters to reverseCap, even if they don't appear in arg, you'll need a way of checking if you already added a character to reverseCap.
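One way to make sure each character is added exactly once is to finish the search first and append afterwards. The sketch below assumes, as the expected output in the question suggests, that the comparison should ignore case and that arg holds lowercase letters. The names CaseSwap and swapMatching are made up for the example.

```java
final class CaseSwap {
    // Hypothetical rework of the question's reverse(): every character of myStr
    // is appended exactly once, after the lookup in arg has finished.
    static String swapMatching(String myStr, String arg) {
        StringBuilder out = new StringBuilder(myStr.length());
        for (char c : myStr.toCharArray()) {
            // indexOf replaces the inner loop; lowercasing assumes arg is lowercase.
            boolean matched = arg.indexOf(Character.toLowerCase(c)) >= 0;
            if (matched && Character.isLowerCase(c)) {
                out.append(Character.toUpperCase(c));
            } else if (matched && Character.isUpperCase(c)) {
                out.append(Character.toLowerCase(c));
            } else {
                out.append(c); // no match, or not a letter: keep as-is
            }
        }
        return out.toString();
    }
}
```

With the strings from the question, swapMatching("abc, XYZ; 123.", "bcdxyz#3210.") should give "aBC, xyz; 123.".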

Change
String reverseCap = "";
to
char[] reverseCap = new char[myStr.length()];
and then for each occurrence of
reverseCap +=
change that to read
reverseCap[i] =
Finally, convert reverseCap to a String:
String result = String.valueOf(reverseCap);
You are currently returning null. Consider returning result, and moving the System.out.println(...) into the main() method.
Update:
I think a better way to approach this is to use a lookup map containing upper/lower case pairs and their inverse to get the replacement character. The nested for loops are a bit gnarly.
/**
* Example: for the string "bcdxyz#3210."
* the lookup map is
* {B=b, b=B, C=c, c=C, D=d, d=D, X=x, x=X, Y=y, y=Y, Z=z, z=Z}
* <p>
* Using a map to get the inverse of a character is faster than repetitively
* looping through the string.
* </p>
* @param arg
* @return
*/
public String reverse2(String arg) {
Map<Character, Character> inverseLookup = createInverseLookupMap(arg);
String myStr = "abc, XYZ; 123.";
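String result = myStr.chars()
// Cast to char first: an int key would be boxed to Integer and never match the Character keys
.mapToObj(ch -> (char) ch)
.map(ch -> String.valueOf(inverseLookup.getOrDefault(ch, ch)))
.collect(Collectors.joining());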
return result;
}
private Map<Character, Character> createInverseLookupMap(String arg) {
Map<Character, Character> lookupMap = arg.chars()
.filter(ch -> Character.isLetter(ch))
.mapToObj(this::getPairs)
.flatMap(List::stream)
.collect(Collectors.toMap(Pair::key, Pair::value));
System.out.println(lookupMap);
return lookupMap;
}
private List<Pair> getPairs(int ch) {
char upperVariant = (char) Character.toUpperCase(ch);
return List.of(
new Pair(upperVariant, Character.toLowerCase(upperVariant)),
new Pair(Character.toLowerCase(upperVariant), upperVariant));
}
static record Pair(Character key, Character value) {
}
But if one is not used to the Java streaming API, this might look a bit gnarly too.

What is the best way to replace a letter with the letter following it in the alphabet in Java?

I'm a programming newbie and I am doing a coderbyte exercise that says "
Replace every letter in the string with the letter following it in the alphabet (ie. c becomes d, z becomes a)"
I'm thinking of the following methods:
declare a string called "abcdefghijklmnopqrstuvwxyz" and compare each string's char index position with the alphabet's index position, and then just bring the alphabet char that is located at the i+1 index location. But I don't know how it would work from z to a.
I've seen some techniques using ASCII values for every char but I've never done that before and not sure how it works
convert the given string into a char[] array, but then I'm not sure how I would tell the system to get me the next alphabet char
What would be the easiest way to do this?
EDIT
this is my code so far, but it doesn't work.
import java.util.*;
import java.io.*;
class Main {
public static String LetterChanges(String str) {
// code goes here
String alphabet = "abcdefghijklmnopqrstuvwxyz";
String newWord = "";
for (int i = 0; i < str.length(); i++){
for (int j = 0; j < alphabet.length(); i++){
if (str[i] == alphabet[i]){
if (alphabet[i+1].isVowel()){
newWord = newWord + toUpperCase(alphabet[i+1]);
}
else{
newWord = newWord + alphabet[i+1];
}
}
}
}
return str;
}
public static void main (String[] args) {
// keep this function call here
Scanner s = new Scanner(System.in);
System.out.print(LetterChanges(s.nextLine()));
}
}
Can't I ask for the index position of a char that is part of a String? In C I could do that.
Other than that not sure why it doesn't work.

I would definitely go with method 1.
I believe what you're looking for is the indexOf method on a String.
First off, I would create a method that, given a character, finds the next letter in the alphabet and returns it. This could be done by finding the letter in your alphabet string and then fetching the letter at index+1. As you also pointed out, you would need to take care of the edge case of turning 'z' into 'a'; this could be done with an if-statement or by having an extra letter 'a' at the end of your alphabet string.
Now all that remains is to create a loop that runs over all characters in the message, calls the previously made method on each character, and constructs a new string from the output.
Hope this helps you figure out a solution.
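A small sketch of that approach, using the extra trailing 'a' to handle the wrap-around. The class and method names are made up for the example, and leaving characters outside a-z unchanged is an assumption, since the answer does not say what should happen to them.

```java
final class NextLetter {
    // The trailing 'a' handles the wrap-around from 'z' without an if-statement.
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyza";

    // Returns the letter following c, or c itself if it is not in a-z (assumption).
    static char next(char c) {
        int idx = ALPHABET.indexOf(c); // finds the first match, so 'a' gives index 0
        return idx < 0 ? c : ALPHABET.charAt(idx + 1);
    }

    // Applies next() to every character and builds the new string.
    static String shift(String message) {
        StringBuilder sb = new StringBuilder(message.length());
        for (int i = 0; i < message.length(); i++) {
            sb.append(next(message.charAt(i)));
        }
        return sb.toString();
    }
}
```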

Assuming that there would be only lower case English letters in the given String, the most performant way would be to add 1 to every character, and either use an if-statement checking whether the initial character was z or use the modulo operator % as @sp00m has pointed out in the comment.
Performing a search in the alphabet string (option 1 in your list) is redundant, as is extracting a char[] array from the given string (option 3).
Checking the edge case:
public static String shiftLetters(String str) {
StringBuilder result = new StringBuilder();
for (int i = 0; i < str.length(); i++) {
char next = str.charAt(i);
if (next == 'z') result.append('a'); // checking the edge case
else result.append((char) (next + 1));
}
return result.toString();
}
Applying modulo operator:
public static String shiftLetters(String str) {
StringBuilder result = new StringBuilder();
for (int i = 0; i < str.length(); i++) {
char next = (char) ((str.charAt(i) - 'a' + 1) % 26 + 'a');
result.append(next);
}
return result.toString();
}
main()
public static void main(String[] args) {
System.out.println(shiftLetters("abc"));
System.out.println(shiftLetters("wxyz"));
}
Output:
bcd // "abc"
xyza // "wxyz"

Efficient Algorithm for character replacement in a string

I have two strings
111TTT0000TT11T00
001101
Now I want to replace all appearances of T in string 1 with character from string 2. Like first T with 0, second T with 0, third T with 1 and so on.
One way of doing so is to use a while loop and compare every character, but in a programming sense that's not a good way of achieving it. Can anybody solve it with a better algorithm using Java?
public void DataParse(String point, String code)
{
//////////tln("Point:"+point);
//////////tln("code:"+code);
// //////////tln(baseString_temp);
int counter=0;
while(baseString_temp.contains(point))
{
if(code!=null)
{
String input=String.valueOf(code.charAt(counter));
//zzzzz(input);
baseString_temp=baseString_temp.replaceFirst(point,input);
counter=counter+1;
}
}
////////////System.out(baseString_temp);
}

Every time you use contains and replaceFirst, you force your program to enumerate the string's characters from the beginning. I believe it would be better to do it in a single pass:
public static String replaceToken(String primary, String secondary, char token) {
char [] charArray =primary.toCharArray();
int counter = 0;
for(int i=0; i<charArray.length; i++){
if(charArray[i]==token){
charArray[i] = secondary.charAt(counter);
counter++;
if(counter>=secondary.length()) break;
}
}
return new String(charArray);
}
public static void main(String[] args) {
String result = replaceToken("111TTT0000TT11T00", "001101", 'T');
}
If you really want to use a regular expression, here you are:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public static String replaceSequence(String primary, String secondary, String sequence){
Pattern pattern = Pattern.compile(sequence + "+");
Matcher matcher = pattern.matcher(primary);
int counter = 0;
char [] charArray = primary.toCharArray();
while(matcher.find() && counter<secondary.length()){
for(int i = matcher.start(); i<matcher.end(); i++){
charArray[i] = secondary.charAt(counter++);
if(counter>=secondary.length()) break;
}
}
return new String(charArray);
}
But, based on description of your task, I prefer first approach.

There's a couple of things. Because Strings are immutable,
baseString_temp=baseString_temp.replaceFirst(point,input);
will always create a new String object (Also, it goes through the string from the beginning, looking for point). If you use a StringBuilder, you only allocate memory once, and then you can mutate it. Actually, using an array like in Ken's answer would be even better, as it allocates less and has less overhead from method calls.
Also, I'd imagine contains() uses a loop of its own, and in the worst case goes to the end of the string. You only need to iterate over the string once, and replace as you go along.
Working example:
public class Test {
private static String replace(char what, String input, String repls) {
StringBuilder sb = new StringBuilder(input);
int replIdx = 0;
for (int i = 0; i < input.length(); i++) {
if (input.charAt(i) == what) {
sb.setCharAt(i, repls.charAt(replIdx++));
}
}
return sb.toString();
}
public static void main(String[] args) {
System.out.println(replace('T', "111TTT0000TT11T00", "001101"));
}
}

Substring alternative

So I'm creating a program that will output the first character of a string and then the first character of another string. Then the second character of the first string and the second character of the second string, and so on.
I created what is below, I was just wondering if there is an alternative to this using a loop or something rather than substring
public class Whatever
{
public static void main(String[] args)
{
System.out.println (interleave ("abcdefg", "1234"));
}
public static String interleave(String you, String me)
{
if (you.length() == 0) return me;
else if (me.length() == 0) return you;
return you.substring(0,1) + interleave(me, you.substring(1));
}
}
OUTPUT: a1b2c3d4efg

Well, if you really don't want to use substrings, you can use String's toCharArray() method, then you can use a StringBuilder to append the chars. With this you can loop through each of the array's indices.
Doing so, this would be the outcome:
public static String interleave(String you, String me) {
char[] a = you.toCharArray();
char[] b = me.toCharArray();
StringBuilder out = new StringBuilder();
int maxLength = Math.max(a.length, b.length);
for( int i = 0; i < maxLength; i++ ) {
if( i < a.length ) out.append(a[i]);
if( i < b.length ) out.append(b[i]);
}
return out.toString();
}
Your code is efficient enough as it is, though. This can be an alternative, if you really want to avoid substrings.

This is a loop implementation (not handling null value, just to show the logic):
public static String interleave(String you, String me) {
StringBuilder result = new StringBuilder();
for (int i = 0 ; i < Math.max(you.length(), me.length()) ; i++) {
if (i < you.length()) {
result.append(you.charAt(i));
}
if (i < me.length()) {
result.append(me.charAt(i));
}
}
return result.toString();
}

The solution I am proposing is based on the expected output. In your particular case, consider using the split method of String, since you are interleaving one character at a time.
So do something like this:
String[] xs = "abcdefg".split("");
String[] ys = "1234".split("");
Now loop over the larger array and interleave, making sure to perform a length check on the smaller one before accessing it.
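A sketch of that loop, assuming Java 8 or later, where split("") yields one single-character string per char without a leading empty element. The class name SplitInterleave is made up for the example.

```java
final class SplitInterleave {
    static String interleave(String you, String me) {
        // On Java 8+, split("") gives one single-character string per char.
        String[] xs = you.split("");
        String[] ys = me.split("");
        StringBuilder out = new StringBuilder(you.length() + me.length());
        // Walk up to the length of the larger array, checking bounds on each side.
        for (int i = 0; i < Math.max(xs.length, ys.length); i++) {
            if (i < xs.length) out.append(xs[i]);
            if (i < ys.length) out.append(ys[i]);
        }
        return out.toString();
    }
}
```

Splitting allocates a String per character, so the charAt-based loops are probably cheaper; this only shows that the split idea works.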

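To implement this as a loop you would have to maintain the current position and keep adding characters until one string finishes, then tack the rest of the other one on. For any larger strings you should use a StringBuilder. Something like this:
int i = 0;
String result = "";
// Interleave while both strings still have a character at position i
while (i < you.length() && i < me.length())
{
    // Start from "" so the chars are concatenated rather than added as numbers
    result += "" + you.charAt(i) + me.charAt(i);
    i++;
}
// Append whatever is left of the longer string
if (i == you.length())
    result += me.substring(i);
else
    result += you.substring(i);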

An improved (in some sense) version of @BenjaminBoutier's answer.
StringBuilder is an efficient way to concatenate Strings in a loop.
public static String interleave(String you, String me) {
StringBuilder result = new StringBuilder();
int min = Math.min(you.length(), me.length());
String longest = you.length() > me.length() ? you : me;
int i = 0;
while (i < min) { // mix characters
result.append(you.charAt(i));
result.append(me.charAt(i));
i++;
}
while (i < longest.length()) { // add the remaining characters of the longer string
result.append(longest.charAt(i));
i++;
}
return result.toString();
}

How to remove surrogate characters in Java?

I am facing a situation where I get surrogate characters in text that I am saving to MySQL 5.1. As UTF-16 is not supported there, I want to remove these surrogate pairs manually with a Java method before saving the text to the database.
I have written the following method for now and I am curious to know if there is a direct and optimal way to handle this.
Thanks in advance for your help.
public static String removeSurrogates(String query) {
StringBuffer sb = new StringBuffer();
for (int i = 0; i < query.length() - 1; i++) {
char firstChar = query.charAt(i);
char nextChar = query.charAt(i+1);
if (Character.isSurrogatePair(firstChar, nextChar) == false) {
sb.append(firstChar);
} else {
i++;
}
}
if (Character.isHighSurrogate(query.charAt(query.length() - 1)) == false
&& Character.isLowSurrogate(query.charAt(query.length() - 1)) == false) {
sb.append(query.charAt(query.length() - 1));
}
return sb.toString();
}

Here's a couple things:
Character.isSurrogate(char c):
A char value is a surrogate code unit if and only if it is either a low-surrogate code unit or a high-surrogate code unit.
Checking for pairs seems pointless, why not just remove all surrogates?
x == false is equivalent to !x
StringBuilder is better in cases where you don't need synchronization (like a variable that never leaves local scope).
I suggest this:
public static String removeSurrogates(String query) {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < query.length(); i++) {
char c = query.charAt(i);
// !isSurrogate(c) in Java 7
if (!(Character.isHighSurrogate(c) || Character.isLowSurrogate(c))) {
sb.append(c);
}
}
return sb.toString();
}
Breaking down the if statement
You asked about this statement:
if (!(Character.isHighSurrogate(c) || Character.isLowSurrogate(c))) {
sb.append(c);
}
One way to understand it is to break each operation into its own function, so you can see that the combination does what you'd expect:
static boolean isSurrogate(char c) {
return Character.isHighSurrogate(c) || Character.isLowSurrogate(c);
}
static boolean isNotSurrogate(char c) {
return !isSurrogate(c);
}
...
if (isNotSurrogate(c)) {
sb.append(c);
}

Java strings are stored as sequences of 16-bit chars, but what they represent is sequences of Unicode characters. In Unicode terminology, they are stored as code units, but model code points. Thus, it's somewhat meaningless to talk about removing surrogates, which don't exist in the character / code point representation (unless you have rogue single surrogates, in which case you have other problems).
Rather, what you want to do is to remove any characters which will require surrogates when encoded. That means any character which lies beyond the basic multilingual plane. You can do that with a simple regular expression:
return query.replaceAll("[^\u0000-\uffff]", "");
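The same idea can also be sketched without a regular expression by walking the string's code points and keeping only those in the basic multilingual plane. This assumes Java 8 or later for codePoints(), and the class and method names are illustrative. Like the regex, it leaves unpaired surrogates in place, since those are reported as BMP code points.

```java
final class SupplementaryFilter {
    // Drops every code point above U+FFFF, i.e. everything that needs a surrogate pair.
    static String removeSupplementary(String query) {
        StringBuilder sb = new StringBuilder(query.length());
        query.codePoints()
             .filter(Character::isBmpCodePoint) // keep only U+0000..U+FFFF
             .forEach(sb::appendCodePoint);
        return sb.toString();
    }
}
```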

Why not simply:
for (int i = 0; i < query.length(); i++) {
    char c = query.charAt(i);
    if (!Character.isHighSurrogate(c) && !Character.isLowSurrogate(c))
        sb.append(c);
}
You should probably replace them with "?" instead of erasing them outright.

Just curious. If char is high surrogate is there a need to check the next one? It is supposed to be low surrogate. The modified version would be:
public static String removeSurrogates(String query) {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < query.length(); i++) {
char ch = query.charAt(i);
if (Character.isHighSurrogate(ch))
i++; // skip the next char, as it is supposed to be the low surrogate
else
sb.append(ch);
}
return sb.toString();
}

If you only want to remove them, all of these solutions work,
but if you want to replace them, the code below is better:
StringBuffer sb = new StringBuffer();
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if(Character.isHighSurrogate(c)){
sb.append('*');
}else if(!Character.isLowSurrogate(c)){
sb.append(c);
}
}
return sb.toString();
