Display Hindi language in console using Java

import java.io.BufferedReader;
import java.io.FileReader;

StringBuffer contents = new StringBuffer();
BufferedReader input = new BufferedReader(new FileReader("/home/xyz/abc.txt"));
String line = null; // not declared within the while loop
while ((line = input.readLine()) != null) {
    contents.append(line);
}
input.close();
System.out.println(contents.toString());
File abc.txt contains
\u0905\u092d\u0940 \u0938\u092e\u092f \u0939\u0948 \u091c\u0928\u0924\u093e \u091c\u094b \u091a\u093e\u0939\u0924\u0940 \u0939\u0948 \u092
I want to display Hindi text in the console using Java.
If I simply print it like this:
String str = "\u0905\u092d\u0940 \u0938\u092e\u092f \u0939\u0948 \u091c\u0928\u0924\u093e \u091c\u094b \u091a\u093e\u0939\u0924\u0940 \u0939\u0948 \u092";
System.out.println(str);
then it works fine, but when I try to read the same text from a file it doesn't work.
Please help me out.

Use Apache Commons Lang.
import org.apache.commons.lang3.StringEscapeUtils;
// open the file as ASCII, read it into a string, then
String escapedStr; // = "\u0905\u092d\u0940 \u0938\u092e\u092f \u0939\u0948 ..."
// (to include such a string in a Java program you would have to double each \)
String hindiStr = StringEscapeUtils.unescapeJava( escapedStr );
System.out.println(hindiStr);
(Make sure your console is set up to display Hindi (correct fonts, etc.) and that the console's encoding matches your Java encoding. The Java code above is just the bare bones; a fuller sketch follows.)
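A minimal end-to-end sketch, assuming abc.txt really holds literal \uXXXX escape sequences (the class name UnescapeDemo is made up for illustration, and commons-lang3 must be on the classpath):
import java.io.BufferedReader;
import java.io.FileReader;
import org.apache.commons.lang3.StringEscapeUtils;

public class UnescapeDemo {
    public static void main(String[] args) throws Exception {
        StringBuilder contents = new StringBuilder();
        // the file holds plain-ASCII \uXXXX sequences, so the default reader is fine
        BufferedReader input = new BufferedReader(new FileReader("/home/xyz/abc.txt"));
        String line;
        while ((line = input.readLine()) != null) {
            contents.append(line);
        }
        input.close();
        // turn the \uXXXX escapes into real Hindi characters
        System.out.println(StringEscapeUtils.unescapeJava(contents.toString()));
    }
}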

You should store the contents of the file as UTF-8 encoded Hindi characters. For instance, in your case it would be अभी समय है जनता जो चाहती है. That is, instead of saving Unicode escapes, save the raw Hindi characters directly. You can then simply read the file as normal.
You just have to make sure that the editor you use saves it with UTF-8 encoding. See Spanish language chars are not displayed properly?
Otherwise, you'll have to make the file a .properties file and read it using java.util.Properties, which supports Unicode unescaping out of the box (see the sketch below).
Also read Reading unicode character in java
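A minimal sketch of the .properties route (the file name words.properties and the key hindi are made up for illustration); Properties decodes \uXXXX escapes itself while loading:
import java.io.FileInputStream;
import java.util.Properties;

// words.properties contains a line such as:
// hindi=\u0905\u092d\u0940 \u0938\u092e\u092f \u0939\u0948
Properties props = new Properties();
props.load(new FileInputStream("words.properties"));
System.out.println(props.getProperty("hindi"));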

Related

Reading characters with UTF-8 standards from console in java

I want to read some Unicode characters (Farsi characters) from the console.
I have used System.in, but it didn't work. It looks like standard input does not understand the characters I'm typing, so it just returns some mumbo jumbo to my String variable. I am absolutely sure the String variable's encoding is set to "UTF-8". Believe me, I double-checked.
Some pieces of code that I tried:
String t = new String(new Scanner(System.in).nextLine().getBytes(), "UTF-8");
didn't work.
byte b[] = new byte[4];
System.in.read(b);
String st = new String(b, "UTF-16");
System.out.println(st);
I wrote the above code to read just one Farsi character. It didn't work either.
First of all, the console must be in UTF-8 mode.
If using NetBeans, edit the file <NetBeansRoot>/etc/netbeans.conf.
Under netbeans_default_options, add -J-Dfile.encoding=UTF-8.
Once you're sure the console and your project encoding are set to UTF-8, try this:
Scanner console = new Scanner(new InputStreamReader(System.in, "UTF-8"));
while (console.hasNextLine())
    System.out.println(console.nextLine());
Note: System.in is an InputStream, i.e. a stream of bytes; it produces the bytes from the console one-to-one.
To read characters you need a Reader. A Reader takes an InputStream plus an encoding and produces characters.
If that doesn't help, try another console (e.g. Windows cmd, but first run chcp 65001).

Writing strings with chars like "ñ" to a txt file

I'm having a strange issue trying to write strings containing characters like "ñ", "á", and so on to text files. Let me first show you my little piece of code:
import java.io.*;

public class test {
    public static void main(String[] args) throws Exception {
        String content = "whatever";
        int c;
        c = System.in.read();
        content = content + (char) c;
        FileWriter fw = new FileWriter("filename.txt");
        BufferedWriter bw = new BufferedWriter(fw);
        bw.write(content);
        bw.close();
    }
}
In this example, I'm just reading a char from the keyboard input and appending it to a given string, then writing the final string to a txt file. The problem is that if I type an "ñ", for example (I have a Spanish-layout keyboard), when I check the txt file it shows a strange char "¤" where the "ñ" should be; that is, the content of the file is "whatever¤". The same happens with "ç", "ú", etc. However, it writes fine ("whateverñ") if I just forget about the keyboard input and write:
...
String content = "whateverñ";
...
or
...
content = content + "ñ";
...
It makes me think that there might be something wrong with the read() method. Or maybe I'm using it wrongly? Or should I use a different method to get the keyboard input? I'm a bit lost here.
(I'm using JDK 7u45 on Windows 7 Pro x64.)
So ...
It works (i.e. you can read the accented characters in the output file) if you write them as literal strings.
It doesn't work when you read them from System.in and then write them.
This suggests that the problem is on the input side. Specifically, I think your console / keyboard must be using a character encoding for the input stream that does not match the encoding that Java thinks should be used.
You should be able to confirm this tentative diagnosis by printing the characters you are reading in hexadecimal and checking the codes against the Unicode tables (which you can find at unicode.org, for example); a small sketch follows.
It strikes me as "odd" that the "platform default encoding" appears to be working on the output side but not the input side. Maybe someone else can explain ... and offer a concrete suggestion for fixing it. My gut feeling is that the problem is in the way your keyboard is configured, not in Java or your application.
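A tiny diagnostic sketch along those lines (my illustration, not code from the original answer): it dumps each raw input byte in hex. On a Spanish Windows console using Cp850, typing ñ typically yields a4, and byte 0xA4 is the currency sign ¤ in Cp1252, which would explain exactly the symptom described:
public class InputHexDump {
    public static void main(String[] args) throws Exception {
        int c;
        // print each raw byte from standard input as two hex digits
        while ((c = System.in.read()) != -1 && c != '\n') {
            System.out.printf("%02x ", c);
        }
        System.out.println();
    }
}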
Files do not remember their encoding format; when you look at a .txt file, the text editor makes a "best guess" at the encoding used.
If you read the file back into your program, the text should be back to normal.
Also, try printing the "strange" character directly.

Error when reading non-English language character from file

I am building an app where users have to guess a secret word. I have *.txt files in the assets folder. The problem is that the words are in the Albanian language. Our language uses letters like "ë" and "ç", so whenever I try to read a word containing any of those characters from the file, I get some wicked symbol and I cannot use string.compare() for these characters. I have tried many options with UTF-8 and changed Eclipse settings, but still the same error.
I would really appreciate it if someone has any advice.
The code I use to read the files is:
AssetManager am = getAssets();
strOpenFile = "fjalet.txt";
InputStream fins = am.open(strOpenFile);
reader = new BufferedReader(new InputStreamReader(fins));
ArrayList<String> stringList = new ArrayList<String>();
while ((aDataRow = reader.readLine()) != null) {
    aBuffer += aDataRow + "\n";
    stringList.add(aDataRow);
}
Otherwise the code works fine, except for the mentioned characters.
It seems pretty clear that the default encoding that is in force when you create the InputStreamReader does not match the file.
If the file you are trying to read is UTF-8, then this should work:
reader = new BufferedReader(new InputStreamReader(fins, "UTF-8"));
If the file is not UTF-8, then that won't work. Instead you should use the name of the file's true encoding. (My guess is that it is ISO/IEC 8859-1 or ISO/IEC 8859-16.)
Once you have figured out what the file's encoding really is, you need to try to understand why it does not correspond to your Java platform's default encoding ... and then make a pragmatic decision about what to do. Should you hard-wire the encoding into your application, as above? Should you make it a configuration property or command parameter (see the sketch below)? Should you change the default encoding? Should you change the file?
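For the configuration-property option, a minimal sketch (the property name app.file.encoding is made up for illustration; fins is the stream from the question):
import java.io.BufferedReader;
import java.io.InputStreamReader;

// run with e.g. java -Dapp.file.encoding=ISO-8859-1 ... to override the default
String encoding = System.getProperty("app.file.encoding", "UTF-8");
reader = new BufferedReader(new InputStreamReader(fins, encoding));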
You need to determine the character encoding that was used when creating the file, and specify this encoding when reading it. If it's UTF-8, for example, use
reader = new BufferedReader(new InputStreamReader(fins, "UTF-8"));
or
reader = new BufferedReader(new InputStreamReader(fins, StandardCharsets.UTF_8));
if you're on Java 7 or later.
Text editors like Notepad++ have good heuristics to guess what the encoding of a file is. Try opening it with such an editor and see which encoding it has guessed (if the characters appear correctly).
You should know the encoding of the file.
The InputStream class reads the file as binary. Although you can interpret the input as characters, that is implicit guessing, which may be wrong.
The InputStreamReader class converts binary to chars, but it has to know the character set.
You should use the constructor variant that is fed the character set explicitly, as sketched below.
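For example, reusing the fins stream from the question (the "UTF-8" value is just a placeholder until you know the file's real encoding):
import java.nio.charset.Charset;

// replace "UTF-8" with the file's actual character set once you know it
reader = new BufferedReader(new InputStreamReader(fins, Charset.forName("UTF-8")));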
UPDATE
Don't assume you have a UTF-8 encoded file; that may be wrong. Here in Russia we have encodings such as CP866, WIN1251 and KOI8, which all differ from UTF-8. You probably have some popular Albanian text-file encoding. Check your OS settings to make a guess.

Reading Arabic chars from text file

I had finished a project in which I read from a text file written with Notepad.
The characters in my text file are in the Arabic language, and the file's encoding type is UTF-8.
When launching my project inside NetBeans (7.0.1) everything seemed to be OK, but when I built the project as a (.jar) file the characters were displayed this way: ÇáãæÇÞÚááÊØæíÑ.
How can I solve this problem, please?
Most likely you are using the JVM default character encoding somewhere. If you are 100% sure your file is encoded in UTF-8, make sure you explicitly specify UTF-8 when reading as well. For example, this piece of code is broken:
new FileReader("file.txt")
because it uses the JVM default character encoding - which you might not have control over - and apparently NetBeans uses UTF-8 while your operating system defines something different. Note that this makes the FileReader class completely useless if you want your code to be portable.
Instead use the following code snippet:
new InputStreamReader(new FileInputStream("file.txt"), "UTF-8");
You did not provide your code, but this should give you a general impression of how it should be implemented.
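For instance, a minimal sketch that reads the whole file line by line with an explicit UTF-8 charset (the file name file.txt is reused from above; note that the console must also be able to display Arabic, as the next answer demonstrates):
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;

BufferedReader in = new BufferedReader(
        new InputStreamReader(new FileInputStream("file.txt"), "UTF-8"));
String line;
while ((line = in.readLine()) != null) {
    System.out.println(line); // correct only if the console encoding matches
}
in.close();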
Maybe this example will help a little. I will try to print the content of a UTF-8 file both to the IDE console and to a system console that is encoded in "Cp852".
My d:\data.txt contains ąźżćąś adsfasdf
Let's check this code:
// I will read chars using utf-8 encoding
BufferedReader in = new BufferedReader(new InputStreamReader(
        new FileInputStream("d:\\data.txt"), "utf-8"));
// and write to the console using Cp852 encoding (works for my Windows 7 console)
PrintWriter out = new PrintWriter(new OutputStreamWriter(System.out,
        "Cp852"), true);
// ok, let's read data from the file
String line;
while ((line = in.readLine()) != null) {
    // here I use the IDE encoding
    System.out.println(line);
    // here I print the data using Cp852 encoding
    out.println(line);
}
When I run it in Eclipse the output will be
ąźżćąś adsfasdf
Ą«ľ†Ą? adsfasdf
but the output from the system console is shown as a screenshot in the original answer.

Java: how to write formatted output to plain text file

I am developing a small Java application. At some point I am writing some data to a plain text file, using the following code:
Writer Candidateoutput = null;
File Candidatefile = new File("Candidates.txt");
Candidateoutput = new BufferedWriter(new FileWriter(Candidatefile));
Candidateoutput.write("\n Write this text on next line");
Candidateoutput.write("\t This is indented text");
Candidateoutput.close();
Now everything goes fine; the file is created with the expected text. The only problem is that the text is not formatted: all of it is on a single line. But if I copy and paste the text into MS Word, the text is formatted automatically.
Is there any way to preserve the text formatting in the plain text file as well?
Note: By text formatting I am referring to \n and \t only.
Use System.getProperty("line.separator") for new lines - this is the platform-independent way of getting the newline separator (on Windows it is \r\n, on Linux it's \n).
Also, if this is going to be run on non-Windows machines, avoid using \t - use four spaces instead.
You can use line.separator system property to solve your issue.
E.g.
String separator = System.getProperty("line.separator");
Writer Candidateoutput = null;
File Candidatefile = new File("Candidates.txt");
Candidateoutput = new BufferedWriter(new FileWriter(Candidatefile));
Candidateoutput.write(separator + " Write this text on next line");
Candidateoutput.write("\t This is indented text");
Candidateoutput.close();
The line.separator system property is a platform-independent way of getting a newline from your environment.
A PrintWriter does this platform-independently - use the println() methods, for example:
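A minimal sketch of the PrintWriter approach (reusing the file name from the question):
import java.io.File;
import java.io.FileWriter;
import java.io.PrintWriter;

PrintWriter pw = new PrintWriter(new FileWriter(new File("Candidates.txt")));
pw.println(" Write this text on next line"); // println appends the platform's line separator
pw.print("\t This is indented text");
pw.close();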
You can also use the Java utility Formatter, which can be found here: java.util.Formatter
Then all you have to do is create an object of type Formatter, such as this:
private Formatter output;
In this case, output will be the output file you are writing to.
Then you have to pass the file name to the output object like this:
output = new Formatter("name.of.your.file.txt");
Once that's done, you can hard-code the file contents to your output file using the output.format method, which is similar to the System.out.println or printf commands.
Or use the Scanner utility to read the data into memory, then use output.format to write that data to the output object, i.e. the file.
This is an example of how to write a record to output:
output.format("%d %s %s %.2f%n", field1, field2, field3, field4); // int, String, String, double
There is a little bit more to it than this, but this sure beats parsing data or using a bunch of complicated third-party plugins.
To read this file back you would point the Scanner utility at a file instead of the console:
input = new Scanner(new File("name.of.your.file.txt"));
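Putting the pieces together, a minimal self-contained sketch (the file name records.txt and the field values are illustrative):
import java.io.File;
import java.util.Formatter;
import java.util.Scanner;

public class FormatterDemo {
    public static void main(String[] args) throws Exception {
        // write one formatted record: int, two strings, a two-decimal double
        Formatter output = new Formatter("records.txt");
        output.format("%d %s %s %.2f%n", 1, "John", "Doe", 1234.56);
        output.close();

        // read the file back line by line
        Scanner input = new Scanner(new File("records.txt"));
        while (input.hasNextLine()) {
            System.out.println(input.nextLine());
        }
        input.close();
    }
}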
Windows Notepad needs \r\n to display a new line correctly; a lone \n is ignored by Notepad.
Windows expects a newline and a carriage return character to indicate a new line, so you'd want to use \r\n to make it work.
