Efficient Algorithm for character replacement in a string


I have two strings
111TTT0000TT11T00
001101
Now I want to replace every occurrence of T in string 1 with successive characters from string 2: the first T with 0, the second T with 0, the third T with 1, and so on.
One way of doing so is to use a while loop and compare every character, but that doesn't seem like a good way of achieving it. Can anybody suggest a better algorithm in Java?
public void DataParse(String point, String code)
{
//////////tln("Point:"+point);
//////////tln("code:"+code);
// //////////tln(baseString_temp);
int counter=0;
while(baseString_temp.contains(point))
{
if(code!=null)
{
String input=String.valueOf(code.charAt(counter));
//zzzzz(input);
baseString_temp=baseString_temp.replaceFirst(point,input);
counter=counter+1;
}
}
////////////System.out(baseString_temp);
}

Every time you call contains and replaceFirst, you force the program to scan the string's characters from the beginning again. I believe it is better to do it in a single pass:
public static String replaceToken(String primary, String secondary, char token) {
char [] charArray =primary.toCharArray();
int counter = 0;
for(int i=0; i<charArray.length; i++){
if(charArray[i]==token){
charArray[i] = secondary.charAt(counter);
counter++;
if(counter>=secondary.length()) break;
}
}
return new String(charArray);
}
public static void main(String[] args) {
String result = replaceToken("111TTT0000TT11T00", "001101", 'T');
}
If you really want to use a regular expression, here you are:
public static String replaceSequence(String primary, String secondary, String sequence){
Pattern pattern = Pattern.compile(sequence + "+");
Matcher matcher = pattern.matcher(primary);
int counter = 0;
char [] charArray = primary.toCharArray();
while(matcher.find() && counter<secondary.length()){
for(int i = matcher.start(); i<matcher.end(); i++){
charArray[i] = secondary.charAt(counter++);
if(counter>=secondary.length()) break;
}
}
return new String(charArray);
}
But, based on the description of your task, I prefer the first approach.

There's a couple of things. Because Strings are immutable,
baseString_temp=baseString_temp.replaceFirst(point,input);
will always create a new String object (Also, it goes through the string from the beginning, looking for point). If you use a StringBuilder, you only allocate memory once, and then you can mutate it. Actually, using an array like in Ken's answer would be even better, as it allocates less and has less overhead from method calls.
Also, I'd imagine contains() uses a loop of its own, and in the worst case goes to the end of the string. You only need to iterate over the string once, and replace as you go along.
Working example:
public class Test {
private static String replace(char what, String input, String repls) {
StringBuilder sb = new StringBuilder(input);
int replIdx = 0;
for (int i = 0; i < input.length(); i++) {
if (input.charAt(i) == what) {
sb.setCharAt(i, repls.charAt(replIdx++));
}
}
return sb.toString();
}
public static void main(String[] args) {
System.out.println(replace('T', "111TTT0000TT11T00", "001101"));
}
}

Related

a program which identifies the differences between pairs of strings

My problem is that I need to identify characters which differ between the two given strings in a visually striking way. Output the two input strings on two lines, and then identify the differences on the line below using periods (for identical characters) and asterisks (for differing characters). For example:
ATCCGCTTAGAGGGATT
GTCCGTTTAGAAGGTTT
*....*.....*..*..
I have tried to compare the two strings with each other, but I don't know how to make the program check every character in the strings and see whether they match.
This is what I have done so far:
System.out.println("String 1: ");
String var1 = Scanner.nextLine();
System.out.println("String 2: ");
String var2 = Scanner.nextLine();
if (same (var1, var2))
System.out.println(".........");
else
System.out.println("********");
public static boolean same (String var1, String var2){
if (var1.equals(var2))
{
return true;
}
else
{
return false;
}
Can anyone help me with this?

You need to loop through your strings and compare the characters one by one. To run through a string you can use a for loop. Use an int as the counter and the method length() to obtain the string's size.
for (int i = 0; i < string1.length(); i++) {
// do stuff
}
Then, since the counter goes through every position of the string, you can obtain the character at a specific position using the method charAt():
char char1 = string1.charAt(i);
char char2 = string2.charAt(i);
Then compare the characters to check whether they are the same. If they are, print a dot (.); if they are not, print an asterisk (*):
if(char1 == char2) {
System.out.print(".");
} else {
System.out.print("*");
}
In the part above I assumed your two strings have the same length. If that's not the case, you can first determine which one is the smallest (and so which is the biggest):
String smallestString;
String biggestString;
if (string1.length() > string2.length()) {
smallestString = string2;
biggestString = string1;
} else {
smallestString = string1;
biggestString = string2;
}
Then make your for loop go through the smallest string, otherwise you will get an IndexOutOfBoundsException.
for (int i = 0; i < smallestString.length(); i++) {
// do stuff
}
At the end of this for loop, print asterisks for the characters that are left in the biggest string:
for(int j=smallestString.length(); j<biggestString.length(); j++) {
System.out.print("*");
}
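Putting the steps above together, a minimal sketch might look like the following. The class name DiffLine and the method name diff are illustrative and not part of the question; extra characters in the longer string are marked as differences, as in the final loop above.

```java
// Hypothetical helper; the names DiffLine and diff are illustrative only.
class DiffLine {
    // Builds the marker line: '.' where characters match, '*' where they differ
    // or where only the longer string still has characters.
    static String diff(String string1, String string2) {
        int shortest = Math.min(string1.length(), string2.length());
        int longest = Math.max(string1.length(), string2.length());
        StringBuilder markers = new StringBuilder(longest);
        for (int i = 0; i < shortest; i++) {
            markers.append(string1.charAt(i) == string2.charAt(i) ? '.' : '*');
        }
        // Positions that exist only in the longer string count as differences.
        for (int j = shortest; j < longest; j++) {
            markers.append('*');
        }
        return markers.toString();
    }

    public static void main(String[] args) {
        String a = "ATCCGCTTAGAGGGATT";
        String b = "GTCCGTTTAGAAGGTTT";
        System.out.println(a);
        System.out.println(b);
        System.out.println(diff(a, b)); // prints *....*.....*..*..
    }
}
```

Building the marker line in a StringBuilder and printing it once keeps the comparison logic separate from the output, which makes it easier to test.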

This is what I've come up with. Mind you, there are better ways to do this; I've written it with about as much effort as you put into your question.
public class AskBetterQuestion{
public static void main(String[] args) {
// TODO Auto-generated method stub
String w1="ATCCGCTTAGAGGGATT";
String w2="GTCCGTTTAGAAGGTTT";
char[] first = w1.toCharArray();
char[] second = w2.toCharArray();
int minLength = Math.min(first.length, second.length);
char[] out=new char[minLength];
for(int i = 0; i < minLength; i++)
{
if (first[i] == second[i])
{
out[i]='.'; // identical characters
}
else out[i]='*'; // differing characters
}
System.out.println(w1);
System.out.println(w2);
System.out.print(out);
}
}

Elegant solution to replace substrings

I have a challenging problem that I am having trouble with. I have an unmodified string, for instance abcdefg and an array of objects that contains a string and indices.
For instance, object 1 contains d and indices [1, 2];
Then I would replace whatever letter is at substring [1,2] with d, with the resulting string looking like adcdefg.
The problem comes when the replacing text is of a different length than the replaced text. I need some way to keep track of the length changes, or else the indices of further replacements would be inaccurate.
Here is what I have so far:
for (CandidateResult cResult : candidateResultList) {
int[] index = cResult.getIndex();
finalResult = finalResult.substring(0, index[0]) + cResult.getCandidate()
+ finalResult.substring(index[1], finalResult.length()); //should switch to stringbuilder
}
return finalResult;
This does not take care of the corner case mentioned above.
Additionally, this is not homework if anyone was wondering. This is actually for an ocr trainer program that I'm creating.

Here's an implementation, I haven't tested it yet but you can try to get an idea. I'll add comments to the code as needed.
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
/** This class represents a replacement of the characters in the range [s, e)
 * of the original String with a new string, str.
 **/
class Replacement {
    int s, e;
    String str;
    public Replacement(int s, int e, String str) {
        this.s = s;
        this.e = e;
        this.str = str;
    }
}
String stringReplace(String str, List<Replacement> replacements) {
    // Sort Replacements by starting index
    Collections.sort(replacements, new Comparator<Replacement>() {
        @Override public int compare(Replacement r1, Replacement r2) {
            return Integer.compare(r1.s, r2.s);
        }
    });
    StringBuilder sb = new StringBuilder();
    int repPos = 0;
    for (int i = 0; i < str.length(); i++) {
        // Next pending replacement, or null once all have been applied
        Replacement rep = repPos < replacements.size() ? replacements.get(repPos) : null;
        if (rep != null && rep.s == i) { // Replacement starts here, at i == s
            sb.append(rep.str);          // Append the replacement
            i = rep.e - 1;               // The loop's i++ then moves i to e
            repPos++;                    // Advance repPos by 1
        } else {
            sb.append(str.charAt(i));    // No replacement, append char
        }
    }
    return sb.toString();
}
[Edit:] After seeing ptyx's answer, I think that way is probably more elegant. If you sort in reverse order, you shouldn't have to worry about the different lengths:
String stringReplace(String str, List<Replacement> replacements) {
    // Sort Replacements in reverse order by index
    Collections.sort(replacements, new Comparator<Replacement>() {
        @Override public int compare(Replacement r1, Replacement r2) {
            return -Integer.compare(r1.s, r2.s); // Note reverse order
        }
    });
    // By replacing in reverse order, shouldn't affect next replacement.
    StringBuilder sb = new StringBuilder(str);
    for (Replacement rep : replacements) {
        sb.replace(rep.s, rep.e, rep.str);
    }
    return sb.toString();
}

Assuming no overlapping ranges to replace, process your replacements in reverse position order - done.
It doesn't matter what you replace [5, 6] with: it will never modify [0, 4], so you don't need any index mapping for an earlier range such as [1, 2].

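This seems to do what you ask. Basically, you shift each replacement's indices by the total length change caused by the previous ones (this assumes the replacers are ordered by start index and do not overlap).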
public static void main(String[] args) {
Replacer[] replacers = {
new Replacer(new int[]{ 1 , 2 }, "ddd") ,
new Replacer(new int[]{ 2 , 3 }, "a")
};
System.out.println(
m("abcdefg", replacers));
}
public static String m(String s1, Replacer[] replacers){
StringBuilder builder = new StringBuilder(s1);
int translate = 0;
for (int i = 0 ; i < replacers.length ; i++) {
translate += replacers[i].replace(builder, translate);
}
return builder.toString();
}
public static class Replacer{
int[] arr;
String toRep;
public Replacer(int[] arr, String toRep) {
this.arr = arr;
this.toRep = toRep;
}
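public int replace(StringBuilder b, int translate){
b.replace(arr[0] + translate, arr[1] + translate, toRep);
// Return the change in length so that later indices can be shifted by it
return toRep.length() - (arr[1] - arr[0]);
}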
}

Substring alternative

So I'm creating a program that will output the first character of a string and then the first character of another string. Then the second character of the first string and the second character of the second string, and so on.
I created what is below, I was just wondering if there is an alternative to this using a loop or something rather than substring
public class Whatever
{
public static void main(String[] args)
{
System.out.println (interleave ("abcdefg", "1234"));
}
public static String interleave(String you, String me)
{
if (you.length() == 0) return me;
else if (me.length() == 0) return you;
return you.substring(0,1) + interleave(me, you.substring(1));
}
}
OUTPUT: a1b2c3d4efg

Well, if you really don't want to use substrings, you can use String's toCharArray() method, then you can use a StringBuilder to append the chars. With this you can loop through each of the array's indices.
Doing so, this would be the outcome:
public static String interleave(String you, String me) {
char[] a = you.toCharArray();
char[] b = me.toCharArray();
StringBuilder out = new StringBuilder();
int maxLength = Math.max(a.length, b.length);
for( int i = 0; i < maxLength; i++ ) {
if( i < a.length ) out.append(a[i]);
if( i < b.length ) out.append(b[i]);
}
return out.toString();
}
Your code is efficient enough as it is, though. This can be an alternative, if you really want to avoid substrings.

This is a loop implementation (not handling null value, just to show the logic):
public static String interleave(String you, String me) {
StringBuilder result = new StringBuilder();
for (int i = 0 ; i < Math.max(you.length(), me.length()) ; i++) {
if (i < you.length()) {
result.append(you.charAt(i));
}
if (i < me.length()) {
result.append(me.charAt(i));
}
}
return result.toString();
}

The solution I am proposing is based on the expected output. In your particular case, consider using the split method of String, since you are interleaving one character at a time.
So do something like this:
String[] xs = "abcdefg".split("");
String[] ys = "1234".split("");
Now loop over the larger array and interleave, making sure you perform a length check on the smaller one before accessing it.
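A minimal sketch of this split-based idea follows. The class name SplitInterleave is illustrative, and the sketch assumes Java 8 or later, where split("") returns one single-character string per element (earlier versions produced a leading empty string).

```java
// Hypothetical class name; assumes Java 8+ behaviour of split("").
class SplitInterleave {
    static String interleave(String you, String me) {
        String[] xs = you.split("");
        String[] ys = me.split("");
        StringBuilder out = new StringBuilder(you.length() + me.length());
        int longest = Math.max(xs.length, ys.length);
        for (int i = 0; i < longest; i++) {
            // Length checks keep the shorter array from being read past its end.
            if (i < xs.length) out.append(xs[i]);
            if (i < ys.length) out.append(ys[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(interleave("abcdefg", "1234")); // prints a1b2c3d4efg
    }
}
```

Compared with charAt or toCharArray, this allocates a small String per character, so it is more a matter of readability than of speed.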

To implement this as a loop you would have to maintain the current position in both strings and keep adding characters until one string runs out, then tack the rest of the other one on. For larger strings, use a StringBuilder. Something like this (untested):
int i = 0;
String result = "";
while (i < you.length() && i < me.length())
{
    // Append separately: you.charAt(i) + me.charAt(i) would add the two char values as ints
    result += you.charAt(i);
    result += me.charAt(i);
    i++;
}
if (i == you.length())
    result += me.substring(i);
else
    result += you.substring(i);

An improved (in some sense) version of @BenjaminBoutier's answer.
StringBuilder is generally the most efficient way to concatenate Strings in a loop.
public static String interleave(String you, String me) {
StringBuilder result = new StringBuilder();
int min = Math.min(you.length(), me.length());
String longest = you.length() > me.length() ? you : me;
int i = 0;
while (i < min) { // mix characters
result.append(you.charAt(i));
result.append(me.charAt(i));
i++;
}
while (i < longest.length()) { // add the remaining characters of longest
result.append(longest.charAt(i));
i++;
}
return result.toString();
}

Replace every instance of a second string in a first string with a third string, without using replaceAll

This is what I came up with, but the last test case doesn't work. Any suggestions?
public class tester
{
public static String replaceAll(String a, String b, String c){
for(;;){
int i = a.indexOf(b);
if(i==-1){
break;
}
a = a.substring(0,i)+ c + a.substring(i + b.length());
}
return a;
}
public static void main(String args[]){
System.out.println(replaceAll("hello my friend, how are you?", "h", "y"));
System.out.println(replaceAll("CS221 is great!!","great","awesome"));
System.out.println(replaceAll("aaaa","a","aaa"));
}
}

Assuming I'm reading your code right:
Seeks "a" in "aaaa". Finds it at index zero.
Replaces it with "aaa", yielding "aaaaaa".
Seeks "a" in "aaaaaa". Finds it at index zero.
Replaces it with "aaa", yielding "aaaaaaaa".
...
So the string will keep growing indefinitely. You could advance the search index past the inserted text so that you never search inside your own replacement and blow up the string like this.
Also, if you're going to be doing a lot of string concatenation, I'd recommend using a StringBuilder to prevent loads of copying.
Which is to say that something like this should work:
public static String replaceAll(String a, String b, String c){
StringBuilder sb = new StringBuilder();
int i = 0;
while(i < a.length()){
int j = a.indexOf(b, i);
if(j == -1){
sb.append(a.substring(i, a.length()));
i = a.length();
} else {
sb.append(a.substring(i, j));
sb.append(c);
i = j + b.length();
}
}
return sb.toString();
}
EDIT - added solution.

What is an efficient way to replace many characters in a string?

String handling in Java is something I'm trying to learn to do well. Currently I want to take in a string and replace any characters I find.
Here is my current inefficient (and kinda silly IMO) function. It was written to just work.
public String convertWord(String word)
{
return word.toLowerCase().replace('á', 'a')
.replace('é', 'e')
.replace('í', 'i')
.replace('ú', 'u')
.replace('ý', 'y')
.replace('ð', 'd')
.replace('ó', 'o')
.replace('ö', 'o')
.replaceAll("[-]", "")
.replaceAll("[.]", "")
.replaceAll("[/]", "")
.replaceAll("[æ]", "ae")
.replaceAll("[þ]", "th");
}
I ran 1.000.000 runs of it and it took 8182ms. So how should I proceed in changing this function to make it more efficient?
Solution found:
Converting the function to this
public String convertWord(String word)
{
StringBuilder sb = new StringBuilder();
char[] charArr = word.toLowerCase().toCharArray();
for(int i = 0; i < charArr.length; i++)
{
// Single character case
if(charArr[i] == 'á')
{
sb.append('a');
}
// Char to two characters
else if(charArr[i] == 'þ')
{
sb.append("th");
}
// Remove
else if(charArr[i] == '-')
{
}
// Base case
else
{
sb.append(charArr[i]); // use the lower-cased character, not the original one
}
}
return sb.toString();
}
Running this function 1.000.000 times takes 518ms. So I think that is efficient enough. Thanks for the help guys :)

You could create a String[] lookup table with Character.MAX_VALUE + 1 entries (including the mapping to lower case).
As the replacements get more complex, the time to perform them remains the same.
private static final String[] REPLACEMENT = new String[Character.MAX_VALUE+1];
static {
for(int i=Character.MIN_VALUE;i<=Character.MAX_VALUE;i++)
REPLACEMENT[i] = Character.toString(Character.toLowerCase((char) i));
// substitute
REPLACEMENT['á'] = "a";
// remove
REPLACEMENT['-'] = "";
// expand
REPLACEMENT['æ'] = "ae";
}
public String convertWord(String word) {
StringBuilder sb = new StringBuilder(word.length());
for(int i=0;i<word.length();i++)
sb.append(REPLACEMENT[word.charAt(i)]);
return sb.toString();
}

My suggestion would be:
1. Convert the String to a char[] array
2. Run through the array, testing each character one by one (e.g. with a switch statement) and replacing it if needed
3. Convert the char[] array back to a String
I think this is probably the fastest performance you will get in pure Java.
EDIT: I notice you are doing some changes that change the length of the string. In this case, the same principle applies, however you need to keep two arrays and increment both a source index and a destination index separately. You might also need to resize the destination array if you run out of target space (i.e. reallocate a larger array and arraycopy the existing destination array into it)
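The two-index idea from the EDIT above might be sketched as follows. The class name, the handful of mappings shown, and the choice to allocate the destination at twice the source length are assumptions made for illustration; since no mapping here produces more than two characters, that size avoids the resizing step entirely.

```java
import java.util.Locale;

// Hypothetical sketch; only a few of the question's mappings are shown.
class CharArrayReplace {
    static String convertWord(String word) {
        char[] src = word.toLowerCase(Locale.ROOT).toCharArray();
        // Worst case in this sketch: every character expands to two characters.
        char[] dst = new char[src.length * 2];
        int d = 0; // destination index, advanced independently of the source index
        for (int s = 0; s < src.length; s++) {
            switch (src[s]) {
                case '\u00e1': dst[d++] = 'a'; break;                  // a-acute -> a
                case '\u00e9': dst[d++] = 'e'; break;                  // e-acute -> e
                case '\u00f6': dst[d++] = 'o'; break;                  // o-umlaut -> o
                case '\u00e6': dst[d++] = 'a'; dst[d++] = 'e'; break;  // ae ligature -> ae
                case '\u00fe': dst[d++] = 't'; dst[d++] = 'h'; break;  // thorn -> th
                case '-': case '.': case '/': break;                   // removed
                default: dst[d++] = src[s];                            // copied unchanged
            }
        }
        // Only the first d characters of dst were filled in.
        return new String(dst, 0, d);
    }

    public static void main(String[] args) {
        System.out.println(convertWord("\u00c1-b\u00fe")); // prints abth
    }
}
```

The characters are written as Unicode escapes only so the sketch compiles regardless of the source file encoding; literal characters work equally well when the encoding is known.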

My implementation is based on look up table.
public static String convertWord(String str) {
char[] words = str.toCharArray();
char[] find = {'á','é','ú','ý','ð','ó','ö','æ','þ','-','.',
'/'};
String[] replace = {"a","e","u","y","d","o","o","ae","th"};
StringBuilder out = new StringBuilder(str.length());
for (int i = 0; i < words.length; i++) {
boolean matchFailed = true;
for(int w = 0; w < find.length; w++) {
if(words[i] == find[w]) {
if(w < replace.length) {
out.append(replace[w]);
}
matchFailed = false;
break;
}
}
if(matchFailed) out.append(words[i]);
}
return out.toString();
}

My first choice would be to use a StringBuilder, because you need to remove some chars from the string.
My second choice would be to iterate through the array of chars and add each treated char to another array with the initial size of the string. Then you would need to copy the array to trim any unused positions.
After that, I would run some performance tests to see which one is better.

I doubt that you can speed up the single-character replacements much at all. As for the regular expression replacements, you can compile the regexes beforehand.
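As a rough illustration of compiling a regex beforehand: String.replaceAll compiles its pattern on every call, whereas a Pattern held in a static field is compiled once. The class and method names below are hypothetical, and the pattern merges the three single-character removals from the question into one character class.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; the names are illustrative only.
class PrecompiledRegex {
    // Compiled once, when the class is loaded, instead of on every call.
    private static final Pattern REMOVE = Pattern.compile("[-./]");

    static String stripPunctuation(String word) {
        return REMOVE.matcher(word).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripPunctuation("a-b.c/d")); // prints abcd
    }
}
```

Whether this is measurably faster than the original chain depends on the input and the JVM, so it is worth benchmarking rather than assuming.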

Use the function String.replaceAll.
Nice article similar with what you want: link

Any time we have problems like this we use regular expressions, as they are by far the fastest way to deal with what you are trying to do.
Have you already tried regular expressions?

What I see as inefficient is that you check characters again that have already been replaced, which is useless.
I would get the char array of the String instance, iterate over it, and for each character go through a series of if-else checks like this:
char[] array = word.toCharArray();
for(int i=0; i<array.length; ++i){
char currentChar = array[i];
if(currentChar == 'é')
array[i] = 'e';
else if(currentChar == 'ö')
array[i] = 'o';
// else if (...) and so on for the remaining characters
}

I just implemented this utility class that replaces a char or a group of chars of a String. It is equivalent to bash tr and perl tr///, aka, transliterate. I hope it helps someone!
package your.package.name;
/**
* Utility class that replaces chars of a String, aka, transliterate.
*
* It's equivalent to bash 'tr' and perl 'tr///'.
*
*/
public class ReplaceChars {
public static String replace(String string, String from, String to) {
return new String(replace(string.toCharArray(), from.toCharArray(), to.toCharArray()));
}
public static char[] replace(char[] chars, char[] from, char[] to) {
char[] output = chars.clone();
for (int i = 0; i < output.length; i++) {
for (int j = 0; j < from.length; j++) {
if (output[i] == from[j]) {
output[i] = to[j];
break;
}
}
}
return output;
}
/**
* For tests!
*/
public static void main(String[] args) {
// Example from: https://en.wikipedia.org/wiki/Caesar_cipher
String string = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG";
String from = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
String to = "XYZABCDEFGHIJKLMNOPQRSTUVW";
System.out.println();
System.out.println("Cesar cypher: " + string);
System.out.println("Result: " + ReplaceChars.replace(string, from, to));
}
}
This is the output:
Cesar cypher: THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG
Result: QEB NRFZH YOLTK CLU GRJMP LSBO QEB IXWV ALD
