Japanese characters not displayed correctly when converting a CSV file


I am converting a CSV file from the Tatoeba project. It contains Japanese characters, and I am inserting the data into an SQLite database. The insertion works without errors, but the characters are not displayed correctly.
If I insert directly:
String str = content_parts[2];
sentence.setValue(str);
I get values like this:
ãã¿ã«ã¡ãã£ã¨ãããã®ããã£ã¦ãããã
I have tried converting from JIS to UTF-8:
String str = content_parts[2];
byte[] utf8EncodedBytes = str.getBytes("JIS");
String s = new String(utf8EncodedBytes, "UTF-8");
sentence.setValue(s);
JIS:
$B!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!r!)!)!/!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!r!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)(B
Shift-JIS:
????\??????�N?�}??????????????????��?????�N?�N???��??????
Shift_JIS:
????\????????????????????????��?�N??????????????????��??????
CSV file (when opened in Excel 2010):
n гЃЌгЃїгЃ«гЃЎг‚‡гЃЈгЃЁгЃ—гЃџг‚‚гЃ®г‚’г‚‚гЃЈгЃ¦гЃЌгЃџг‚€гЂ‚
What am I doing wrong? How can I solve this problem?

If you are still looking for a solution, see the following links:
setting-a-utf-8-in-java-and-csv-file and handle Japanese characters
csv-reports-not-displaying-japanese-characters
In brief, write the UTF-8 BOM (byte order mark) bytes to the file output stream before wrapping it in an OutputStreamWriter.
String content = "some string to write in file (in any language)";
// Backslashes inside Java string literals must be escaped.
FileOutputStream fos = new FileOutputStream("D:\\csvFile.csv");
// UTF-8 byte order mark: 0xEF 0xBB 0xBF
fos.write(239);
fos.write(187);
fos.write(191);
Writer w = new BufferedWriter(new OutputStreamWriter(fos, StandardCharsets.UTF_8));
w.write(content);
w.close();
Hope this helps.
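The garbled values in the question (runs beginning with "ã") are what UTF-8 bytes usually look like after being decoded with a single-byte charset, so the charset used when reading the file is also worth checking. The following is a minimal sketch, assuming the input file is UTF-8 and tab-separated; the class name Utf8CsvReader, the helper readThirdColumn, and the column layout are hypothetical and not taken from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class Utf8CsvReader {

    // Reads the third column of every line, decoding the file explicitly as UTF-8.
    // The tab separator and the column index are assumptions for illustration.
    public static List<String> readThirdColumn(Path csv) throws IOException {
        List<String> values = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                // Java does not strip a UTF-8 BOM on its own, so drop it from the first line.
                if (first && line.startsWith("\uFEFF")) {
                    line = line.substring(1);
                }
                first = false;
                String[] parts = line.split("\t");
                if (parts.length > 2) {
                    values.add(parts[2]);
                }
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // Write a tiny UTF-8 sample file, then read it back.
        Path tmp = Files.createTempFile("sentences", ".csv");
        Files.write(tmp, "1\tjpn\t\u304D\u307F\n".getBytes(StandardCharsets.UTF_8));
        List<String> values = readThirdColumn(tmp);
        // Compare against escapes so the check does not depend on the console encoding.
        System.out.println(values.get(0).equals("\u304D\u307F"));
        Files.delete(tmp);
    }
}
```

If the strings are correct in memory but still look wrong in a viewer, the viewer's charset setting is the next place to look.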

Related

How to rewrite a CSV file in Java without double quotes

[Image: original csv]
[Image: reformatted csv]
I have crawled the data I want from the web and saved it as CSV, as shown above.
However, when I tried to exclude the data I do not want in my data set and save the file again, I got unwanted quotation marks (like """) in all columns of my data.
How do I fix this problem?
(I want all the column names to keep the same format as above, with just "column name" in a single pair of quotation marks.)
BufferedReader br = Files.newBufferedReader(Paths.get("FilePath"));
CSVWriter cw = new CSVWriter(new OutputStreamWriter(new FileOutputStream("FileName.csv", true), StandardCharsets.UTF_8));
String val="";
while((val=br.readLine())!=null){
if(!val.contains("some keywords")){
continue;
}
cw.writeNext(new String[]{val});
}
cw.flush();
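A likely cause, assuming CSVWriter here is the opencsv class with its default settings: each line read from the file is already quoted, and passing the whole line to writeNext as a single field makes the writer quote it again and double the quotes it already contains. One way around this is to copy the kept lines verbatim with a plain writer. The sketch below uses only the standard library; the class name CsvLineFilter and the helper copyWithout are hypothetical, and it assumes that lines containing the keyword are the ones to drop and that no quoted field contains a line break.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class CsvLineFilter {

    // Copies every line that does NOT contain the keyword to the output file.
    // Lines are written exactly as read, so existing quoting is left untouched.
    // Returns the number of lines kept.
    public static int copyWithout(Path in, Path out, String keyword) throws IOException {
        int kept = 0;
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(keyword)) {
                    continue; // skip unwanted rows
                }
                writer.write(line);
                writer.newLine();
                kept++;
            }
        }
        return kept;
    }
}
```

If individual columns need to be changed rather than whole rows dropped, parsing the line into fields with a CSV reader first and passing those fields to the writer would be the safer route.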

Names containing characters like "Â" are written to CSV with FileWriter. How can I remove them in Java?

When I create a CSV file in Java, the name "Men's Commemorative ® ELITE Bib Short" is stored as "Men's Commemorative Â® ELITE Bib Short" in the CSV. So I need to remove the "Â" from the CSV file in Java.
public boolean writeProductsToFile()
{
final List<ProductModel> products = getListrakDao().getProducts();
final String filePath = getFilePath() + getProductFileName();
final File file = new File(filePath);
FileWriter writer = null;
writer = new FileWriter(file);
for (final ProductModel productModel : products)
{
productData.append(StringEscapeUtils.unescapeHtml("\"" + productModel.getName() + "\""));
productData.append(getFieldSeparator());
writer.write(productData.toString());
}
}
This is my code, where "baseProduct.getName()" fetches the name of the product.
In the database the product name is "Men's Commemorative ® ELITE Bib Short", but in the CSV it is written as "Men's Commemorative Â® ELITE Bib Short". How can I remove characters like "Â" so that the name in the CSV is exactly the same as in the database?

To a degree, this is a shot in the dark, but...
As a general practice, try to be explicit with the character sets you use.
Instead of
FileWriter writer = null;
writer = new FileWriter(file);
write
final Charset utf8 = java.nio.charset.StandardCharsets.UTF_8;
final Writer writer = new OutputStreamWriter(new FileOutputStream(file), utf8);
(imports left out for brevity, StandardCharsets requires Java 7 or later)
This lets you control the charset used when writing. If none is set, FileWriter falls back to the default charset, which may not be appropriate. If UTF-8 is not what you want, try something else, like ISO_8859_1.
When you read your CSV file, make sure the reader or editor you use supports that charset and is actually set to use it. Otherwise, you'll see strange characters, much like you did.
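To illustrate the mismatch described above: the sketch below writes "®" (U+00AE) as UTF-8 and then decodes the same bytes once as UTF-8 and once as ISO-8859-1. The second decoding yields "Â®", the symptom from the question. The class name CharsetMismatchDemo and the helper writeUtf8 are hypothetical, and the product name is shortened for brevity.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class CharsetMismatchDemo {

    // Writes the text with an explicit UTF-8 writer and returns the raw bytes on disk.
    public static byte[] writeUtf8(Path file, String text) throws IOException {
        try (Writer writer = new OutputStreamWriter(Files.newOutputStream(file), StandardCharsets.UTF_8)) {
            writer.write(text);
        }
        return Files.readAllBytes(file);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("product", ".csv");
        String name = "Commemorative \u00AE ELITE"; // \u00AE is the registered sign
        byte[] bytes = writeUtf8(tmp, name);

        // Decoding with the charset that was used for writing restores the text.
        String right = new String(bytes, StandardCharsets.UTF_8);
        // Decoding the same bytes as ISO-8859-1 turns 0xC2 0xAE into two characters.
        String wrong = new String(bytes, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(name));               // prints true
        System.out.println(wrong.contains("\u00C2\u00AE"));   // prints true
        Files.delete(tmp);
    }
}
```

In other words, the file itself can be correct while the program that displays it picks the wrong charset, so stripping "Â" from the data would only hide the mismatch.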

Load an HTML file with UTF-8 encoding from assets into a TextView

I have an HTML file in the assets folder which is encoded in UTF-8 (it contains Persian characters). I want to read this file and load it into a TextView. I read lots of posts, like load utf-8 text file, load HTML file into TextView, and read UTF8 text file from res/raw, and wrote this code:
try{
InputStream inputStream = getResources().getAssets().open("htmls/salamati.html");
// I also tried "UTF-8", but neither of them worked
BufferedReader r = new BufferedReader(new InputStreamReader(inputStream,"UTF8"));
StringBuilder total = new StringBuilder();
String html;
while ((html = r.readLine()) != null) {
total.append(html);
}
// total contains incorrect characters
textView.setText(Html.fromHtml(total.toString()));
}
catch (IOException exception)
{
textView.setText("Failed loading HTML.");
}
But it shows incorrect characters!
I also tried re-decoding total.toString() as UTF-8 before adding it to the TextView, but that didn't work either:
textView.setText(Html.fromHtml(new String(total.toString().getBytes("ISO-8859-1"), "UTF-8")));
There is no problem with the TextView or the emulator, because when I load HTML from the database, it shows UTF-8 characters correctly!
So what should I do?

After a lot of searching and testing other code, I finally replaced my HTML file with another one. Surprisingly, my code works fine! I investigated the former HTML file and noticed that it had been saved with Unicode encoding!
So if you have the same problem, first of all check your file's encoding and make sure that it is correct.
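One way to check a file's encoding from code is to look for a byte order mark at the start of the data. Some Windows editors use the label "Unicode" for UTF-16, which normally starts with such a mark, though that is an assumption about how the file above was saved. The following is a minimal sketch; the class name BomSniffer and its methods are hypothetical, it assumes the whole file has already been read into a byte array, and it does not distinguish UTF-32 (whose little-endian mark also begins with FF FE).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class BomSniffer {

    // Returns the charset indicated by a leading byte order mark,
    // or the given fallback when no known mark is present.
    public static Charset detect(byte[] data, Charset fallback) {
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xEF && (data[1] & 0xFF) == 0xBB && (data[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFE && (data[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        return fallback;
    }

    // Decodes the bytes with the detected charset and drops the mark itself,
    // which otherwise survives as U+FEFF at the start of the string.
    public static String decode(byte[] data, Charset fallback) {
        String text = new String(data, detect(data, fallback));
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }
}
```

A file without a mark still needs its charset to be known from elsewhere; the fallback parameter is only a guess supplied by the caller.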

Broken text: reading a large text file in Android

I have a question about broken text when an Android app reads a large text file.
I am trying to build an app that reads a large text file (about 10 MB).
I read the file and use System.out.println to check the contents of the text file.
However, when I print the contents,
it displays broken text such as:
��T��h��e�� ��P��r��o��j��e��c��t�� ��G��u
Reading a small RTF file was fine, but when I used a text file, the problems appeared.
I used code like this:
String UTF8 = "utf8";
int BUFFER_SIZE = 8192;
File gone = new File(path);
FileInputStream inputStream = new FileInputStream(gone);
// FileInputStream inputStream = openFileInput(gone);
if ( inputStream != null ) {
InputStreamReader inputStreamReader = new InputStreamReader(inputStream,UTF8);
BufferedReader bufferedReader = new BufferedReader(inputStreamReader, BUFFER_SIZE);
String receiveString = "";
StringBuilder stringBuilder = new StringBuilder();
while ( (receiveString = bufferedReader.readLine()) != null ) {
stringBuilder.append(receiveString);
}
inputStream.close();
ret = stringBuilder.toString();
System.out.println(ret);
}
I thought it might be an encoding problem, so I added the utf8 option.
However, it still doesn't work.
Does anyone know a solution for the broken text?
UPDATE:
I think I solved the problem.
I created a new text file in the Windows text editor and then copied and pasted the content into it.
Now the file is read correctly.

The encoding may be wrong for the given file, the file may not contain text, or the console may not support the characters.
Besides, the code is too long; here's a one-line solution:
String s = new String(Files.readAllBytes(Paths.get(file)), "UTF-8");

The file may contain images or be in an unsupported format; in that case it will be displayed like that.

Java CSV file: unable to write a string like 012365479

I wrote Java code to write output into a CSV file. This is the sample code:
File downloadPlace = new File(realContextPathFile, "general");
File gtwayDestRateFile = new File(downloadPlace, (new StringBuilder("ConnectionReport")).append(System.currentTimeMillis()).append(".csv").toString());
PrintWriter pw = new PrintWriter(new FileWriter(gtwayDestRateFile));
pw.print("Operator name,");
pw.print("Telephone Number,");
pw.print("Op1");
pw.print("012365479");
pw.print("Op2");
pw.print("09746");
pw.close();
p_response.setContentType("application/octet-stream");
p_response.setHeader("Content-Disposition", (new StringBuilder("attachment; filename=\"")).append(gtwayDestRateFile.getName()).append("\"").toString());
FileInputStream fis = new FileInputStream(gtwayDestRateFile);
byte buf[] = new byte[4096];
ServletOutputStream out = p_response.getOutputStream();
do
{
int n = fis.read(buf);
if(n == -1)
break;
out.write(buf, 0, n);
} while(true);
fis.close();
out.flush();
In both cases the output is 12365479 instead of 012365479,
and 9746 instead of 09746.
Can anyone tell me how I can solve this problem?

Are you sure that the file is written wrongly, and you're not just opening it in Excel which is interpreting these as numbers and thus losing the leading zeroes? Try opening it in a text editor.

If you write to System.out instead you get
Operator name,Telephone Number,Op1012365479Op209746
As you can see, the 0 is where you would expect. Perhaps the problem is that you don't have a comma between fields.
If you open such a file in Excel, it will remove the leading 0 because it assumes the value is a number. To avoid this, you need to put double quotes around the field so it is treated as text.

Read the file in a text editor. My guess is that it has the zero, and whatever is reading it thinks it's a number. Try putting quotes around it.
pw.print("\"012365479\"");
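The quoting suggested above, together with the missing commas between fields, can be wrapped in a small helper. This is a minimal sketch using plain string handling; the class name CsvFields and its methods are hypothetical. It quotes every field and doubles embedded quotes, which is the usual CSV convention. Whether a particular spreadsheet program then keeps the leading zeros depends on that program, so it is worth confirming the file contents in a text editor first.

```java
final class CsvFields {

    // Wraps a field in double quotes and doubles any quotes inside it.
    public static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins the quoted fields with commas to form one CSV row (without a line break).
    public static String row(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quote(fields[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Each println produces one complete row, so fields no longer run together.
        System.out.println(row("Operator name", "Telephone Number"));
        System.out.println(row("Op1", "012365479"));
        System.out.println(row("Op2", "09746"));
    }
}
```

With the PrintWriter from the question, each row would be written with pw.println(CsvFields.row(...)) instead of separate pw.print calls.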
