Transliteration in Java. Redefine each char in a string


The aim of the method is to transliterate strings, for example: афиваў => afivaw.
The problem is that I cannot replace characters one-for-one with charAt, because some letters must be transliterated as two symbols, e.g. 'ш' => "sh".
I tried this:
public static String belrusToEngTranlit (String text){
char[] abcCyr = {'a','б','в','г','д','ё','ж','з','и','к','л','м','н','п','р','с','т','у','ў','ф','х','ц','ш','щ','ы','э','ю','я'};
String[] abcLat = {"a","b","v","g","d","jo","zh","z","i","k","l","m","n","p","r","s","t","u","w","f","h","ts","sh","sch","","e","ju","ja"};
for (int i = 0; i < text.length(); i++) {
for(int x = 0; x < abcCyr.length; x++ )
if (text.charAt(i) == abcCyr[x]) {
text.charAt(i) = abcLat[x];
}
}
return text;
}
Can you recommend something other than charAt?

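A String is immutable, so you cannot change any of its characters in place. Use a StringBuilder to accumulate the result instead, as in the code below (note that characters without an entry in abcCyr are dropped from the output).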
public static String belrusToEngTranlit (String text){
char[] abcCyr = {'a','б','в','г','д','ё','ж','з','и','к','л','м','н','п','р','с','т','у','ў','ф','х','ц','ш','щ','ы','э','ю','я'};
String[] abcLat = {"a","b","v","g","d","jo","zh","z","i","k","l","m","n","p","r","s","t","u","w","f","h","ts","sh","sch","","e","ju","ja"};
StringBuilder builder = new StringBuilder();
for (int i = 0; i < text.length(); i++) {
for(int x = 0; x < abcCyr.length; x++ )
if (text.charAt(i) == abcCyr[x]) {
builder.append(abcLat[x]);
}
}
return builder.toString();
}

String is immutable, so you cannot set chars like this:
text.charAt(i) = abcLat[x]
This line also does not compile, regardless of immutability, because the result of a method call cannot be assigned to.
Look at StringBuilder instead.
Cyrillic to Latin is the easier direction. The opposite (if you need it) will be a bit harder, because you cannot just check for 's': you need to inspect the next char too, to see whether it is 'h'.
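The look-ahead described above could be sketched as a greedy longest-match scan. This is only an illustration under assumptions: the class name, the method name and the small Latin-to-Cyrillic table are hypothetical, and the table covers just a few letters. At each position it tries the longest key first ("sch" before "sh" before "s"), and copies unknown characters unchanged.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: Latin -> Cyrillic with look-ahead (longest match first).
class ReverseTranslitSketch {
    // Assumed, deliberately incomplete table; extend it for real use.
    private static final Map<String, Character> LAT_TO_CYR = new HashMap<>();
    private static final int MAX_KEY_LENGTH = 3; // longest key is "sch"

    static {
        LAT_TO_CYR.put("sch", '\u0449'); // щ
        LAT_TO_CYR.put("sh", '\u0448');  // ш
        LAT_TO_CYR.put("zh", '\u0436');  // ж
        LAT_TO_CYR.put("ts", '\u0446');  // ц
        LAT_TO_CYR.put("s", '\u0441');   // с
        LAT_TO_CYR.put("z", '\u0437');   // з
        LAT_TO_CYR.put("t", '\u0442');   // т
        LAT_TO_CYR.put("h", '\u0445');   // х
        LAT_TO_CYR.put("a", '\u0430');   // а
    }

    static String toCyrillic(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            boolean matched = false;
            // Try the longest candidate first, then shorter ones.
            int longest = Math.min(MAX_KEY_LENGTH, text.length() - i);
            for (int len = longest; len >= 1; len--) {
                Character cyr = LAT_TO_CYR.get(text.substring(i, i + len));
                if (cyr != null) {
                    out.append(cyr);
                    i += len;
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                out.append(text.charAt(i)); // unknown character: keep as-is
                i++;
            }
        }
        return out.toString();
    }
}
```

The greedy choice is itself an assumption: a Latin "sh" that was meant as с followed by х cannot be told apart from ш without extra information.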

Strings are immutable (you can't change their contents), but with a small change to use a StringBuilder, which is a kind of mutable String, your code will work:
public static String belrusToEngTranlit (String text){
char[] abcCyr = {'a','б','в','г','д','ё','ж','з','и','к','л','м','н','п','р','с','т','у','ў','ф','х','ц','ш','щ','ы','э','ю','я'};
String[] abcLat = {"a","b","v","g","d","jo","zh","z","i","k","l","m","n","p","r","s","t","u","w","f","h","ts","sh","sch","","e","ju","ja"};
StringBuilder english = new StringBuilder();
outer:
for (int i = 0; i < text.length(); i++) {
for(int x = 0; x < abcCyr.length; x++ )
if (text.charAt(i) == abcCyr[x]) {
english.append(abcLat[x]);
continue outer; // jump to next letter
}
// if no replacement made, use character as-is
english.append(text.charAt(i));
}
return english.toString();
}
Note that there's the replaceEach() utility method in Apache's commons-lang library that does exactly this. Rather than reinvent the wheel, you could simply do this:
public static String belrusToEngTranlit (String text){
String[] abcCyr = {"a","б","в","г","д","ё","ж","з","и","к","л","м","н","п","р","с","т","у","ў","ф","х","ц","ш","щ","ы","э","ю","я"};
String[] abcLat = {"a","b","v","g","d","jo","zh","z","i","k","l","m","n","p","r","s","t","u","w","f","h","ts","sh","sch","","e","ju","ja"};
return StringUtils.replaceEach(text, abcCyr, abcLat);
}
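If adding a library is not an option, a similar effect can be had with a plain HashMap lookup instead of the nested loop. The following is a minimal sketch, not the code from the answers above: the class name, method name and the reduced table are assumptions made for illustration. The Cyrillic keys are written as Unicode escapes so that look-alike Latin letters cannot slip in.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one map lookup per character instead of scanning an array.
class TranslitMapSketch {
    // Assumed subset of the Cyrillic -> Latin table.
    private static final Map<Character, String> CYR_TO_LAT = new HashMap<>();

    static {
        CYR_TO_LAT.put('\u0430', "a");   // а
        CYR_TO_LAT.put('\u0432', "v");   // в
        CYR_TO_LAT.put('\u0438', "i");   // и
        CYR_TO_LAT.put('\u0444', "f");   // ф
        CYR_TO_LAT.put('\u045e', "w");   // ў
        CYR_TO_LAT.put('\u0448', "sh");  // ш
        CYR_TO_LAT.put('\u0449', "sch"); // щ
    }

    static String toLatin(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String replacement = CYR_TO_LAT.get(c);
            // Characters without a mapping are copied unchanged.
            sb.append(replacement != null ? replacement : String.valueOf(c));
        }
        return sb.toString();
    }
}
```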

Related

How can I compare two strings and print their common letters without repeating a letter?

I am comparing two strings and trying to print their common letters, but I cannot avoid repeating a letter more than once.
Here is my code:
public static String getCommonCharacters ( final String a, final String b){
String result="";
for(int i = 0; i < a.length(); i++){
for(int j = 0; j < b.length(); j++)
if(a.charAt(i)==b.charAt(j)){
result +=a.charAt(i);
}
}
return result;
}
The problem is that when a = "baac" and b = " fdeabac ", I get "baaaac" instead of "abc" or "bca" etc.

Change the if condition to:
if (a.charAt(i) == b.charAt(j) &&
!result.contains(String.valueOf(a.charAt(i)))) { ... }
Thus, you only perform the statement:
result +=a.charAt(i);
if the accumulating string doesn't already contain the character.

Working code with minor modification to yours:
public class StringCompare {
public static String getCommonCharacters(final String a, final String b) {
String result = "";
for (int i = 0; i < a.length(); i++) {
for (int j = 0; j < b.length(); j++)
if (a.charAt(i) == b.charAt(j)) {
result += a.charAt(i);
}
}
return result;
}
public static void main(String[] args) {
System.out.println(getCommonCharacters("baac", "fdeabac ").replaceAll(
"(.)\\1{1,}", "$1")); // You could use regular expressions for
// that. Removing repeated characters.
}
}
Output:
bac
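Pattern explanation:
"(.)\1{1,}" means any character (captured as group 1) followed by itself at least once
"$1" references the contents of group 1
Note that this only collapses adjacent repeats, so a repeated letter separated by other letters (e.g. "aba") is not removed.
More about regular expressions can be found in the Oracle docs.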

Here is another solution: create a HashSet for each string and fill it by looping over the string's toCharArray().
Then call retainAll(), which removes from the first set every element that is not contained in the specified collection (see the Oracle Javadoc).
The last for loop concatenates the remaining characters into a string.
String str ="";
Set<Character> s1 = new HashSet<Character>();
Set<Character> s2 = new HashSet<Character>();
for(char c:a.toCharArray()) s1.add(c);
for(char c:b.toCharArray()) s2.add(c);
s1.retainAll(s2);
for(char s:s1) str +=s;
return str;
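A HashSet does not guarantee any iteration order, so the letters may come out in an arbitrary order. If the order of first appearance in a matters, a LinkedHashSet can be used for the first set. The following is a self-contained sketch of that variation; the class and method names are assumptions.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: common characters, each once, in order of first appearance in a.
class CommonCharsSketch {
    static String commonCharacters(String a, String b) {
        Set<Character> inA = new LinkedHashSet<>(); // remembers insertion order
        for (char c : a.toCharArray()) {
            inA.add(c);
        }
        Set<Character> inB = new HashSet<>();
        for (char c : b.toCharArray()) {
            inB.add(c);
        }
        inA.retainAll(inB); // keep only characters that also occur in b

        StringBuilder result = new StringBuilder();
        for (char c : inA) {
            result.append(c);
        }
        return result.toString();
    }
}
```

With the inputs from the question, commonCharacters("baac", " fdeabac ") gives "bac".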

Is converting to String the most succinct way to remove the last comma in output in java?

So basically this is how my code looked:
public static void printPrime(int[] arr)
{
int len = arr.length;
for(int i = 0; i < len; i++)
{
int c = countFactor(arr[i]);
if(c == 2)
{
System.out.print(arr[i] + ",");
}
}
}
So the output had a trailing comma. I looked around for ways to remove it; some answers suggest printing the last element separately, but that only works when the output depends on the for loop alone and not on the if condition.
As you can see, I don't know how many elements will pass the if condition. I can only think of two things: adding another loop, or building a String and then using substring before printing.
So I converted it to a String:
public static void printPrime(int[] arr)
{
int len = arr.length;
String str = "";
for(int i = 0; i < len; i++)
{
int c = countFactor(arr[i]);
if(c == 2)
{
str = str + arr[i] + ",";
}
}
str = str.substring(0, str.length()-1);
System.out.println(str);
}
My question is whether this (building a string and then taking a substring) is the best way for problems like this, or whether there is a better way that I am missing.

You don't have to construct a string. Consider the following slight tweaks:
public static void printPrime(int[] arr)
{
int len = arr.length;
String sep = ""; // HERE
for(int i = 0; i < len; i++)
{
int c = countFactor(arr[i]);
if(c == 2)
{
System.out.print(sep); // HERE
sep = ",";
System.out.print(arr[i]);
}
}
}
Print the delimiter first, and store its value in a variable: the first time it's printed, it will print the empty string. Thereafter, it prints the comma.

Whatever means you use should operate correctly for an empty array (length 0), a singleton array (length 1) and a long array (a large length).
Adding the comma then removing it requires special case handling for the empty array case. So you must have conditional code (an if statement) whatever you do.
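One way to get the empty, singleton and long cases right without writing the special case by hand is java.util.StringJoiner (Java 8+), which only inserts the delimiter between elements. The sketch below is an illustration, not the asker's code: countFactor is not shown in the question, so a hypothetical isPrime helper stands in for the c == 2 test, and the method returns the string instead of printing it.

```java
import java.util.StringJoiner;

// Hypothetical sketch: join the primes of an array with commas, no trailing comma.
class PrimeJoinSketch {
    // Stand-in for the question's countFactor(...) == 2 check (assumption).
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    static String joinPrimes(int[] arr) {
        StringJoiner joiner = new StringJoiner(",");
        for (int value : arr) {
            if (isPrime(value)) {
                joiner.add(String.valueOf(value));
            }
        }
        return joiner.toString(); // empty string when nothing was added
    }
}
```

For example, joinPrimes(new int[] {2, 3, 4, 5, 6}) returns "2,3,5", and an empty array returns "".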

Substring alternative

So I'm creating a program that will output the first character of a string and then the first character of another string. Then the second character of the first string and the second character of the second string, and so on.
I created the code below. I was wondering whether there is an alternative that uses a loop or something else rather than substring.
public class Whatever
{
public static void main(String[] args)
{
System.out.println (interleave ("abcdefg", "1234"));
}
public static String interleave(String you, String me)
{
if (you.length() == 0) return me;
else if (me.length() == 0) return you;
return you.substring(0,1) + interleave(me, you.substring(1));
}
}
OUTPUT: a1b2c3d4efg

Well, if you really don't want to use substrings, you can use String's toCharArray() method, then you can use a StringBuilder to append the chars. With this you can loop through each of the array's indices.
Doing so, this would be the outcome:
public static String interleave(String you, String me) {
char[] a = you.toCharArray();
char[] b = me.toCharArray();
StringBuilder out = new StringBuilder();
int maxLength = Math.max(a.length, b.length);
for( int i = 0; i < maxLength; i++ ) {
if( i < a.length ) out.append(a[i]);
if( i < b.length ) out.append(b[i]);
}
return out.toString();
}
Your code is efficient enough as it is, though. This can be an alternative, if you really want to avoid substrings.

This is a loop implementation (not handling null value, just to show the logic):
public static String interleave(String you, String me) {
StringBuilder result = new StringBuilder();
for (int i = 0 ; i < Math.max(you.length(), me.length()) ; i++) {
if (i < you.length()) {
result.append(you.charAt(i));
}
if (i < me.length()) {
result.append(me.charAt(i));
}
}
return result.toString();
}

The solution I am proposing is based on the expected output. In your particular case, consider using String's split method, since you are interleaving one character at a time.
So do something like this:
String[] xs = "abcdefg".split("");
String[] ys = "1234".split("");
Now loop over the larger array and interleave, making sure to check the length of the smaller one before accessing it.
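The loop described above might look like the following sketch. The class name is an assumption, and the parameter names are borrowed from the question. On Java 8 and later, split("") yields one single-character string per element; the length checks keep the shorter array from being read past its end.

```java
// Hypothetical sketch: interleave two strings via split("") and a single loop.
class SplitInterleaveSketch {
    static String interleave(String you, String me) {
        String[] xs = you.split("");
        String[] ys = me.split("");
        StringBuilder out = new StringBuilder();
        int max = Math.max(xs.length, ys.length);
        for (int i = 0; i < max; i++) {
            if (i < xs.length) {
                out.append(xs[i]); // next character of the first string, if any
            }
            if (i < ys.length) {
                out.append(ys[i]); // next character of the second string, if any
            }
        }
        return out.toString();
    }
}
```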

To implement this as a loop you have to maintain the current position and keep adding characters until one string runs out, then tack the rest of the other one on. For larger strings you should use a StringBuilder. Something like this (untested):
int i = 0;
String result = "";
while(i < you.length() && i < me.length())
{
// start from the String so that + concatenates instead of adding the char codes
result = result + you.charAt(i) + me.charAt(i);
i++;
}
if(i == you.length())
result += me.substring(i);
else
result += you.substring(i);

An improved (in some sense) version of @BenjaminBoutier's answer.
StringBuilder is an efficient way to concatenate strings in a loop.
public static String interleave(String you, String me) {
StringBuilder result = new StringBuilder();
int min = Math.min(you.length(), me.length());
String longest = you.length() > me.length() ? you : me;
int i = 0;
while (i < min) { // mix characters
result.append(you.charAt(i));
result.append(me.charAt(i));
i++;
}
while (i < longest.length()) { // add the remaining characters of the longest string
result.append(longest.charAt(i));
i++;
}
return result.toString();
}

Filter bad words | java 'replace'

In an attempt to filter the bad words, I found the 'replace' function in java is not as handy as intended.
Please find the code below.
E.g.: consider the word 'abcde', which I want to filter to 'a***e'.
String test = "abcde";
for (int i = 1; i < test.length() - 1; i++) {
test= test.replace(test.charAt(i), '*');
}
System.out.print(test);
Output : a***e
But if the string is String test = "bbcde";, the output is ****e. It seems that if the word has repeated letters (as here), the replace function replaces the repeated letters too.
Why is that? I want to mask the word except for its first and last letters.

That is because String.replace(char, char) replaces all occurrences of the first character (according to its Javadoc).
What you want is probably more like this:
char[] word = test.toCharArray();
for (int i = 1; i < word.length - 1; i++) { // start at the second char and stop before the last char
word[i] = '*';
}
System.out.println(String.copyValueOf(word));

Since String.replace(char, char) replaces all occurrences of the specified char, this would be a better approach for your requirement:
String test = "abcde";
// sdf is assumed to hold the bad word to mask; it is not declared in this snippet
String replacement = "";
for (int i = 0; i < sdf.length(); i++) {
replacement += "*";
}
test= test.replace(sdf, replacement );
System.out.print(test);

It seems that if the word has repeated letters (as here), the replace function replaces the repeated letters too. Why is that?
Why? Because that's just how it works, exactly as the API documentation of String.replace(char oldChar, char newChar) says:
Returns a new string resulting from replacing all occurrences of oldChar in this string with newChar.
If you just want to replace the content of the string by the first letter, some asterisks and the last letter, then you don't need to use replace at all.
String test = "abcde";
if (test.length() >= 2) { // a one-letter word would otherwise be doubled
StringBuilder result = new StringBuilder();
result.append(test.charAt(0));
for (int i = 0; i < test.length() - 2; ++i) {
result.append('*');
}
result.append(test.charAt(test.length() - 1));
test = result.toString();
}
System.out.println(test);
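The same idea can be wrapped in a small reusable helper. This is a minimal sketch under assumptions: the class and method names are invented for illustration, and words of two characters or fewer are returned unchanged because they have no inner letters to hide. It works on a copy of the characters, so repeated letters are not affected the way they are with replace.

```java
import java.util.Arrays;

// Hypothetical sketch: mask everything except the first and last character.
class MaskSketch {
    static String mask(String word) {
        if (word.length() <= 2) {
            return word; // nothing between the first and last character
        }
        char[] chars = word.toCharArray();
        // Overwrite positions 1 .. length-2 (toIndex is exclusive).
        Arrays.fill(chars, 1, chars.length - 1, '*');
        return new String(chars);
    }
}
```

For the problem case from the question, mask("bbcde") returns "b***e".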

public static void main(String[] args) {
String test = "bbcde";
String output = String.valueOf(test.charAt(0));
for (int i = 1; i < test.length() - 1; i++) {
output = output + "*";
}
output = output + String.valueOf(test.charAt(test.length() - 1));
System.out.print(output);
}

You should use the replaceAll function.
With it you can replace every occurrence of a given substring (e.g. "abcde") with another string (e.g. "a***e"). Note that replaceAll treats its first argument as a regular expression.
String test = "abcde";
String replacement = "";
for (int i = 0; i < test.length(); i++) {
if (i==0 || i==test.length()-1){
replacement += test.charAt(i);
} else {
replacement += "*";
}
}
// sdf is assumed to hold the text being filtered; it is not declared in this snippet
sdf = sdf.replaceAll(test, replacement);
System.out.print(sdf);

What is an efficient way to replace many characters in a string?

String handling in Java is something I'm trying to learn to do well. Currently I want to take in a string and replace any characters I find.
Here is my current inefficient (and kinda silly IMO) function. It was written to just work.
public String convertWord(String word)
{
return word.toLowerCase().replace('á', 'a')
.replace('é', 'e')
.replace('í', 'i')
.replace('ú', 'u')
.replace('ý', 'y')
.replace('ð', 'd')
.replace('ó', 'o')
.replace('ö', 'o')
.replaceAll("[-]", "")
.replaceAll("[.]", "")
.replaceAll("[/]", "")
.replaceAll("[æ]", "ae")
.replaceAll("[þ]", "th");
}
I ran 1.000.000 runs of it and it took 8182ms. So how should I proceed in changing this function to make it more efficient?
Solution found:
Converting the function to this
public String convertWord(String word)
{
StringBuilder sb = new StringBuilder();
char[] charArr = word.toLowerCase().toCharArray();
for(int i = 0; i < charArr.length; i++)
{
// Single character case
if(charArr[i] == 'á')
{
sb.append('a');
}
// Char to two characters
else if(charArr[i] == 'þ')
{
sb.append("th");
}
// Remove
else if(charArr[i] == '-')
{
}
// Base case
else
{
sb.append(charArr[i]); // append the lowercased char, not the original one from word
}
}
return sb.toString();
}
Running this function 1.000.000 times takes 518ms. So I think that is efficient enough. Thanks for the help guys :)

You could create a String[] lookup table with Character.MAX_VALUE + 1 entries, one per char value (this also builds in the mapping to lower case).
Even as the replacements get more complex, the time to perform them stays the same.
private static final String[] REPLACEMENT = new String[Character.MAX_VALUE+1];
static {
for(int i=Character.MIN_VALUE;i<=Character.MAX_VALUE;i++)
REPLACEMENT[i] = Character.toString(Character.toLowerCase((char) i));
// substitute
REPLACEMENT['á'] = "a";
// remove
REPLACEMENT['-'] = "";
// expand
REPLACEMENT['æ'] = "ae";
}
public String convertWord(String word) {
StringBuilder sb = new StringBuilder(word.length());
for(int i=0;i<word.length();i++)
sb.append(REPLACEMENT[word.charAt(i)]);
return sb.toString();
}

My suggestion would be:
Convert the String to a char[] array
Run through the array, testing each character one by one (e.g. with a switch statement) and replacing it if needed
Convert the char[] array back to a String
I think this is probably the fastest performance you will get in pure Java.
EDIT: I notice you are doing some changes that change the length of the string. In this case, the same principle applies, however you need to keep two arrays and increment both a source index and a destination index separately. You might also need to resize the destination array if you run out of target space (i.e. reallocate a larger array and arraycopy the existing destination array into it)
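The two-index variant from the edit above could be sketched as follows. It is only an illustration: the class name and the reduced set of cases are assumptions. The destination array is grown with Arrays.copyOf whenever fewer than two free slots remain, since the longest replacement here is two characters.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical sketch: char[] in, char[] out, with separate source and destination indexes.
class CharArrayReplaceSketch {
    static String convertWord(String word) {
        char[] src = word.toLowerCase(Locale.ROOT).toCharArray();
        char[] dst = new char[src.length];
        int d = 0; // destination index; the loop variable s is the source index
        for (int s = 0; s < src.length; s++) {
            // Make sure there is room for the longest replacement (2 chars).
            if (d + 2 > dst.length) {
                dst = Arrays.copyOf(dst, dst.length * 2 + 2);
            }
            switch (src[s]) {
                case 'á': dst[d++] = 'a'; break;                  // one to one
                case 'ó': dst[d++] = 'o'; break;
                case 'æ': dst[d++] = 'a'; dst[d++] = 'e'; break;  // one to two
                case 'þ': dst[d++] = 't'; dst[d++] = 'h'; break;
                case '-':
                case '.':
                case '/': break;                                  // removed
                default:  dst[d++] = src[s];                      // unchanged
            }
        }
        return new String(dst, 0, d); // only the filled part of dst
    }
}
```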

My implementation is based on a lookup table.
public static String convertWord(String str) {
char[] words = str.toCharArray();
char[] find = {'á','é','ú','ý','ð','ó','ö','æ','þ','-','.',
'/'};
String[] replace = {"a","e","u","y","d","o","o","ae","th"};
StringBuilder out = new StringBuilder(str.length());
for (int i = 0; i < words.length; i++) {
boolean matchFailed = true;
for(int w = 0; w < find.length; w++) {
if(words[i] == find[w]) {
if(w < replace.length) {
out.append(replace[w]);
}
matchFailed = false;
break;
}
}
if(matchFailed) out.append(words[i]);
}
return out.toString();
}

My first choice would be to use a StringBuilder, because you need to remove some chars from the string.
My second choice would be to iterate through the array of chars and add each processed char to another array of the initial size of the string. Then you would need to copy the array to trim any unused positions.
After that, I would run some performance tests to see which one is better.

I doubt that you can really speed up the plain character replacements at all. As for the regular expression replacements, you can compile the regexes beforehand.
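Compiling a regex beforehand could look like the sketch below. The class and method names are assumptions; the character class merges the three removals from the question ("-", "." and "/") into one pattern that is compiled once and reused on every call, instead of being recompiled inside each replaceAll.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: compile the removal pattern once and reuse it.
class PrecompiledRegexSketch {
    // A leading '-' inside a character class is a literal hyphen.
    private static final Pattern TO_REMOVE = Pattern.compile("[-./]");

    static String strip(String word) {
        return TO_REMOVE.matcher(word).replaceAll("");
    }
}
```

Whether this is faster than the StringBuilder loop would need to be measured; it only avoids the repeated compilation.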

Use the function String.replaceAll.

Any time we have problems like this we use regular expressions, as they are by far the fastest way to deal with what you are trying to do.
Have you already tried regular expressions?

What I see as inefficient is that you check again characters that have already been replaced, which is useless.
I would get the char array of the String instance, iterate over it, and for each character run a series of if-else checks like this:
char[] array = word.toCharArray();
for(int i=0; i<array.length; ++i){
char currentChar = array[i];
if(currentChar == 'é') // char is a primitive, so compare with ==
array[i] = 'e';
else if(currentChar == 'ö')
array[i] = 'o';
// ... further else-if branches for the remaining characters
}

I just implemented this utility class that replaces a char or a group of chars in a String. It is equivalent to the Unix tr command and Perl's tr///, i.e. transliteration. I hope it helps someone!
package your.package.name;
/**
* Utility class that replaces chars of a String, aka, transliterate.
*
* It's equivalent to bash 'tr' and perl 'tr///'.
*
*/
public class ReplaceChars {
public static String replace(String string, String from, String to) {
return new String(replace(string.toCharArray(), from.toCharArray(), to.toCharArray()));
}
public static char[] replace(char[] chars, char[] from, char[] to) {
char[] output = chars.clone();
for (int i = 0; i < output.length; i++) {
for (int j = 0; j < from.length; j++) {
if (output[i] == from[j]) {
output[i] = to[j];
break;
}
}
}
return output;
}
/**
* For tests!
*/
public static void main(String[] args) {
// Example from: https://en.wikipedia.org/wiki/Caesar_cipher
String string = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG";
String from = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
String to = "XYZABCDEFGHIJKLMNOPQRSTUVW";
System.out.println();
System.out.println("Caesar cipher: " + string);
System.out.println("Result: " + ReplaceChars.replace(string, from, to));
}
}
This is the output:
Caesar cipher: THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG
Result: QEB NRFZH YOLTK CLU GRJMP LSBO QEB IXWV ALD
