Arabic to Latin conversion failure?

package com.webom.crypt;
import org.apache.commons.lang3.StringEscapeUtils;
import com.ibm.icu.text.Transliterator;
public class Test {
    // Note: "Any-Arabic" names the conversion *into* Arabic script;
    // ICU's ID for the opposite direction is "Arabic-Latin".
    public static final String ARABIC_TO_LATIN = "Any-Arabic";
    public static final String ARABIC_TO_LATIN_NO_ACCENTS = "Arabic-Latin/BGN; nfd; [:nonspacing mark:] remove; nfc";
    public static void main(String[] args) {
        String ARABICString = "صدام حسين التكريتي";
        String unicodeCodes = StringEscapeUtils.escapeJava(ARABICString);
        System.out.println("Unicode codes:" + unicodeCodes);
        // conversion with the plain ID
        Transliterator ARABICToLatinTrans = Transliterator.getInstance(ARABIC_TO_LATIN);
        String result1 = ARABICToLatinTrans.transliterate(ARABICString);
        System.out.println("ARABIC to Latin:" + result1);
        // conversion with the compound ID that also strips accents
        Transliterator ARABICToLatinNoAccentsTrans = Transliterator.getInstance(ARABIC_TO_LATIN_NO_ACCENTS);
        String result2 = ARABICToLatinNoAccentsTrans.transliterate(ARABICString);
        System.out.println("ARABIC to Latin (no accents):" + result2);
    }
}
The Arabic-to-Latin conversion fails, and I suspect the ID strings passed to Transliterator.getInstance are wrong. Could you please tell me the correct ID strings? Google Translate shows exactly the conversion I expect.
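The tail of the second ID, "nfd; [:nonspacing mark:] remove; nfc", is an accent-stripping step that does not depend on the transliterator itself. The following is a minimal sketch of that step using only java.text.Normalizer from the JDK. The class and method names are hypothetical, and the input is assumed to be text that is already in Latin script (for example, the output of an Arabic-to-Latin transliterator).

```java
import java.text.Normalizer;

// Hypothetical helper; not part of ICU4J or of the original post.
class AccentStripper {
    // Mirrors "nfd; [:nonspacing mark:] remove; nfc" with JDK classes only.
    static String stripAccents(String latin) {
        // 1. Decompose: each accented letter becomes base letter + combining mark(s).
        String decomposed = Normalizer.normalize(latin, Normalizer.Form.NFD);
        // 2. Drop every character in Unicode category Mn (nonspacing mark).
        String withoutMarks = decomposed.replaceAll("\\p{Mn}+", "");
        // 3. Recompose whatever is left.
        return Normalizer.normalize(withoutMarks, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // \u1E63 = s with dot below, \u0101 = a with macron, \u1E25 = h with dot below
        System.out.println(stripAccents("\u1E63add\u0101m \u1E25usayn")); // prints saddam husayn
    }
}
```

Stripping marks is lossy: distinct letters that differ only by a diacritic collapse into the same base letter, so this is only suitable when an ASCII-like approximation is acceptable.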

Related

Convert String with special characters

String descriptionEscaped = "Domnul Florin C&icirc;ţu afirmă, s&acirc;mbătă"
=>
String descriptionEscaped = "Domnul Florin Cîţu afirmă, sâmbătă"
Is there a way to do this?
(Sorry for the confusing title of question)

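The following converts the special characters, using StringEscapeUtils.unescapeHtml4 from Apache Commons Lang (org.apache.commons.lang3.StringEscapeUtils):
public static void main(String[] args) {
    String descriptionEscaped = "Domnul Florin C&icirc;ţu afirmă";
    // Decode HTML character entities into the characters they stand for.
    descriptionEscaped = StringEscapeUtils.unescapeHtml4(descriptionEscaped);
    System.out.println(descriptionEscaped);
}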

Extract a specific word from a text in java

I want to extract a particular word from a text using Java. For example:
String str = "this is 009876 birthday of mine";
I want to get '009876' from the text above. Is this possible?

You can do this easily with a regex. Here is an example that counts the occurrences of the word:
import java.util.regex.*;
class Test {
    public static void main(String[] args) {
        String hello = "this is 009876 birthday of mine";
        Pattern pattern = Pattern.compile("009876");
        Matcher matcher = pattern.matcher(hello);
        int count = 0;
        // Each successful find() advances to the next occurrence.
        while (matcher.find())
            count++;
        System.out.println(count); // prints 1
    }
}
If you only need to check whether the text contains a known string (e.g. "009876"), the contains method of String is enough, as in this example:
public static String search() {
    String text = "this is 009876 birthday of mine";
    String source = "009876";
    // Return the word if it occurs in the text, otherwise null.
    return text.contains(source) ? source : null;
}
Let me know if you run into any issues.

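You can do it like this:
import java.util.regex.Pattern;
import org.junit.Assert; // assumption: the original does not say which Assert class is meant
class ExtractDesiredString {
    static String extractedString;
    public static void main(String[] args) {
        String hello = "this is 009876 birthday of mine";
        Pattern pattern = Pattern.compile("009876");
        // pattern.toString() returns the regex source, which is a plain literal here.
        if (hello.contains(pattern.toString())) {
            extractedString = pattern.toString();
        } else {
            Assert.fail("Given string doesn't contain desired text");
        }
    }
}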
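The answers above search for a literal that is already known. If the goal is to pull out a value that is not known in advance, a pattern that describes its shape can be used instead. This is a minimal sketch under the assumption that the "word" is the first run of digits in the text; the class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes the wanted word is a run of digits.
class DigitRunExtractor {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the first run of digits in the text, or empty if there is none.
    static Optional<String> firstDigitRun(String text) {
        Matcher matcher = DIGITS.matcher(text);
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String str = "this is 009876 birthday of mine";
        System.out.println(firstDigitRun(str).orElse("<none>")); // prints 009876
    }
}
```

Returning the match as a String keeps the leading zeros, which would be lost if the digits were parsed into an int.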

Replacing german umlauts generated by latex or bibtex tool in java?

I want to replace the German umlauts generated by a Citavi BibTeX export tool. For example, one reference string input is J{\"o}rg and I want Jörg as the result. According to my JUnit test, my method returned J{"o}rg. What went wrong?
public String replaceBibtexMutatedVowels(String str){
    CharSequence target = "{\\\"o}";
    CharSequence replacement = "ö";
    str.replace(target, replacement);
    return str;
}
UPDATE: Thanks, everyone. I was able to handle the German umlauts. Unfortunately, BibTeX escapes quotation marks as {\dg}, and I was not able to write the corresponding Java code:
String afterDg = "";
CharSequence targetDg = "{\\dg}";
CharSequence replacementDg = "\"";
afterDg = afterAe.replace(targetDg, replacementDg);
newStringInstance = afterDg;
return newStringInstance;

Your approach is basically right, but
str.replace(target, replacement);
must be replaced with
str = str.replace(target, replacement);
because Java strings are immutable: replace does not change the string it is called on but returns a new string with the replacements applied.
P.S.: German has more special characters than "ö": you are also missing "ä" and "ü" (plus their capital forms) and "ß".
And here's my test code:
package test;
public class Test {
    public static void main(String[] args) {
        String latexText = "J{\\\"o}rg";
        String normalText = replaceBibtexMutatedVowels(latexText);
        System.out.println(latexText);
        System.out.println(normalText);
    }
    public static String replaceBibtexMutatedVowels(String str) {
        CharSequence target = "{\\\"o}";
        CharSequence replacement = "ö";
        // Assign the result: replace returns a new string.
        str = str.replace(target, replacement);
        return str;
    }
}
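To cover the remaining characters mentioned in the P.S. without one variable pair per letter, the replacements can be kept in a table and applied in a loop. This is a sketch, not the poster's code: the class name is hypothetical, the {\ss} entry assumes the exporter writes ß in the usual LaTeX form, and the {\dg} entry is taken from the question's update.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the escape table is an assumption about the exporter's output.
class BibtexUmlauts {
    private static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put("{\\\"a}", "\u00E4"); // a-umlaut
        REPLACEMENTS.put("{\\\"o}", "\u00F6"); // o-umlaut
        REPLACEMENTS.put("{\\\"u}", "\u00FC"); // u-umlaut
        REPLACEMENTS.put("{\\\"A}", "\u00C4"); // capital A-umlaut
        REPLACEMENTS.put("{\\\"O}", "\u00D6"); // capital O-umlaut
        REPLACEMENTS.put("{\\\"U}", "\u00DC"); // capital U-umlaut
        REPLACEMENTS.put("{\\ss}", "\u00DF");  // sharp s (assumed LaTeX form)
        REPLACEMENTS.put("{\\dg}", "\"");      // quotation mark, as described in the question
    }

    // Applies every literal replacement in turn; String.replace does not use regex.
    static String replaceBibtexEscapes(String str) {
        String result = str;
        for (Map.Entry<String, String> entry : REPLACEMENTS.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(replaceBibtexEscapes("J{\\\"o}rg"));
    }
}
```

Because String.replace treats both arguments as literal text, the braces and backslashes need no regex escaping, only the usual Java string escaping.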

How to remove everything from HTML except special tag in java?

I want to parse an HTML string and extract only <form> ... </form>. Everything else is not needed and can be removed.
At the moment I have some helpers that use replaceAll to remove the content of a specific tag, for example:
/** remove form */
String newString = string.replaceAll("(?s)<form.*?</form>", "");
The pattern (?s)<form.*?</form> removes the form tags. But I need the opposite: remove everything except the form.
How can I do that?
See my Gskinner example

Try the code below.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Client {
    private static final String PATTERN = "<form>(.+?)</form>";
    private static final Pattern REGEX = Pattern.compile(PATTERN);
    private static final boolean ONLY_TAG = true;
    public static void main(String[] args) {
        String text = "Hello <form><span><table>Hello Rais</table></span></form> end";
        System.out.println(getValues(text, ONLY_TAG));
        System.out.println(getValues(text, !ONLY_TAG));
    }
    // flag == true: return only the first form; flag == false: return the text without forms.
    private static String getValues(final String text, boolean flag) {
        final Matcher matcher = REGEX.matcher(text);
        String tagValues = null;
        if (flag) {
            if (matcher.find()) {
                tagValues = "<form>" + matcher.group(1) + "</form>";
            }
        } else {
            tagValues = text.replaceAll(PATTERN, "");
        }
        return tagValues;
    }
}
You will get the following output:
<form><span><table>Hello Rais</table></span></form>
Hello end

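The code below points in the direction you are looking for. It tries to match a whole form first and keeps it through $1; any other single character matches the second alternative and is replaced with nothing:
String str = "<html><form>test form</form></html>";
String newString = str.replaceAll("(?s)(<form.*?</form>)|.", "$1");
System.out.println(newString);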
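Another way to invert the removal is to collect the matches rather than delete everything around them. This is a minimal sketch that reuses the question's pattern with Matcher.find; the class and method names are hypothetical, and the added (?i) flag and \b boundary are assumptions meant to tolerate upper-case tags and forms with attributes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; a regex-based sketch, not a full HTML parser.
class FormExtractor {
    // (?s): '.' also matches line breaks; (?i): case-insensitive tag names;
    // \b: "<form" must end at a word boundary, so "<formula>" is not matched.
    private static final Pattern FORM = Pattern.compile("(?is)<form\\b.*?</form>");

    // Concatenates every <form>...</form> block and drops everything else.
    static String keepOnlyForms(String html) {
        Matcher matcher = FORM.matcher(html);
        StringBuilder forms = new StringBuilder();
        while (matcher.find()) {
            forms.append(matcher.group());
        }
        return forms.toString();
    }

    public static void main(String[] args) {
        String str = "<html><form>test form</form></html>";
        System.out.println(keepOnlyForms(str)); // prints <form>test form</form>
    }
}
```

Regexes stay fragile for real-world HTML (comments, scripts, or markup that merely contains the text "</form>"), so an HTML parser is the more robust choice when the input is not under your control.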

Android equivalent of vb.net StringValue.ToString("0000")

Is there an equivalent of VB.NET's StringValue.ToString("0000") that returns the value as a four-digit string? I'm trying to use it in:
public static String getNextID(int stationID, String tablename) {
    String rtnID;
    rtnID = Integer.toString(stationID) + "-" + getNextID(tablename);
    return rtnID;
}
The value should be 4 characters long, with zeros added in the right place if needed.
Tom
Edit: here's what I have now, which isn't working:
public static String getNextID(int stationID, String tablename) {
    String rtnID;
    NumberFormat formatter = new DecimalFormat("0000");
    String s = formatter.format(String.valueOf(stationID));
    rtnID = s + "-" + getNextID(tablename);
    return rtnID;
}
With this error: http://pastebin.com/XpvzkC5D

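Pass the int itself to format rather than a String. String.valueOf(stationID) produces a String, and the format(Object) overload throws an IllegalArgumentException for any argument that is not a Number. You can use this:
NumberFormat formatter = new DecimalFormat("0000");
String s = formatter.format(stationID);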
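The same padding can also be done without a NumberFormat instance, through String.format and the %04d conversion. This is a small sketch with hypothetical class and method names; Locale.ROOT is passed so that the digits do not depend on the device's default locale.

```java
import java.util.Locale;

// Hypothetical helper showing zero-padding with String.format.
class StationIds {
    // Pads the number with leading zeros to at least four digits.
    static String pad4(int stationID) {
        return String.format(Locale.ROOT, "%04d", stationID);
    }

    public static void main(String[] args) {
        System.out.println(pad4(7));   // prints 0007
        System.out.println(pad4(123)); // prints 0123
    }
}
```

The width in %04d is a minimum, so a value with more than four digits is returned in full rather than truncated.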
