Load Html file with UTF8 encoding from assets into a TextView - java

I have an HTML file in the assets folder which is encoded in UTF-8 (it contains Persian characters). I want to read this file and load it into a TextView. I read lots of posts, such as load utf-8 text file, load HTML file into TextView, and read UTF8 text file from res/raw, and wrote this code:
try{
InputStream inputStream = getResources().getAssets().open("htmls/salamati.html");
// I also tried "UTF-8", but neither worked
BufferedReader r = new BufferedReader(new InputStreamReader(inputStream,"UTF8"));
StringBuilder total = new StringBuilder();
String html;
while ((html = r.readLine()) != null) {
total.append(html);
}
// total contains incorrect characters
textView.setText(Html.fromHtml(total.toString()));
}
catch (IOException exception)
{
textView.setText("Failed loading HTML.");
}
But it shows incorrect characters!
I also tried re-decoding total.toString() as UTF-8 before passing it to the TextView, but that didn't work either:
textView.setText(Html.fromHtml(new String(total.toString().getBytes("ISO-8859-1"), "UTF-8")));
There is no problem with the TextView or the emulator, because when I load HTML from a database it shows UTF-8 characters correctly.
So what should I do?

After a lot of searching and testing other code, I finally replaced my HTML file with another one. Surprisingly, my code worked fine! I investigated the former HTML file and noticed that it had been saved with "Unicode" encoding (which in editors such as Windows Notepad means UTF-16) rather than UTF-8.
So if you have the same problem, first check your file's encoding and make sure that it is the one your code expects.
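One way to check a file's encoding from code is to look for a byte order mark (BOM) at the start of the data. The following is only a minimal sketch: the class and method names are made up for illustration, it assumes the file content has already been read into a byte array, and it ignores UTF-32 and files that carry no BOM at all.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original question or answer.
final class BomSniffer {

    /** Returns the charset implied by a leading BOM, or null if no BOM is present. */
    static Charset charsetFromBom(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(data, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;
        if (startsWith(data, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        return null;
    }

    /** Decodes with the BOM's charset (skipping the BOM), or with the fallback if there is no BOM. */
    static String decode(byte[] data, Charset fallback) {
        Charset detected = charsetFromBom(data);
        if (detected == null) {
            return new String(data, fallback);
        }
        int bomLength = detected.equals(StandardCharsets.UTF_8) ? 3 : 2;
        return new String(data, bomLength, data.length - bomLength, detected);
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

With something like this, the decoded string could be passed to Html.fromHtml as in the question, using UTF-8 as the fallback when no BOM is found.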

Related

Created HTML file isn't recognized by my device

I create a test HTML file on an Android (7.0) device from string content. The file is created fine and shows up with the right extension and icon, but the format isn't recognized when it is tapped, giving the message "file format is not supported". Yet if I save this same file to a PC and transfer it back to the device, the issue disappears: it then shows the app choices for opening HTML, as it should.
I tried several write methods. In all cases the file was created and the content looked right, but the format wasn't recognized right away. I can't figure out why. Is there an extra step or some extra written data required for this? The latest attempt used BufferedWriter (to ensure UTF-8), as below:
final File file = new File(path, name + file_extension);
StringBuilder strBuilder = new StringBuilder();
strBuilder.append("test");
strBuilder.insert(0, "<html>"+"\r\n"+"<body><p>"+"\r\n");
strBuilder.insert(strBuilder.length(), "\r\n"+"</p></body>"+"\r\n"+ "</html>");
String html_content=strBuilder.toString();
try
{
BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8));
bw.write(html_content);
bw.close();
} catch (IOException e)
{
e.printStackTrace();
Log.e("Exception", "File write failed: " + e.toString());
}
Try adding this before your HTML tags:
<!DOCTYPE html>
and delete the strBuilder.append("test"); line.

Convert XML-File to string without manipulation or optimization in Java

I have some trouble with JDOM2, which I use to work with XML files.
I want to convert an XML file to a string without any manipulation or optimization.
This is my Java code to do that:
SAXBuilder builder = new SAXBuilder();
File xmlFile = f;
try
{
Document document = (Document) builder.build(xmlFile);
xml = new XMLOutputter().outputString(document);
} catch (Exception e) {
System.out.println(e.getMessage());
}
return xml;
But when I compare my string with the original XML file I notice some changes.
The original:
<?xml version="1.0" encoding="windows-1252"?>
<xmi:XMI xmi:version="2.1" xmlns:uml="http://schema.omg.org/spec/UML/2.0" xmlns:xmi="http://schema.omg.org/spec/XMI/2.1" xmlns:thecustomprofile="http://www.sparxsystems.com/profiles/thecustomprofile/1.0" xmlns:SoaML="http://www.sparxsystems.com/profiles/SoaML/1.0">
And the string:
<?xml version="1.0" encoding="UTF-8"?>
<xmi:XMI xmlns:xmi="http://schema.omg.org/spec/XMI/2.1" xmlns:SoaML="http://www.sparxsystems.com/profiles/SoaML/1.0" xmlns:thecustomprofile="http://www.sparxsystems.com/profiles/thecustomprofile/1.0" xmlns:uml="http://schema.omg.org/spec/UML/2.0" xmi:version="2.1">
And all umlauts (ä, ö, ü) are changed too. I get something like '�' instead of 'ä'.
Is there any way to stop that behavior?
Firstly, as others have stated, you shouldn't use any XML processing. Just read the file as a text file.
Secondly, your umlaut characters showing up as '�' is due to an incorrect charset (encoding) being used. The charset error may be in your code, or it may be the XML file.
The original XML file contains encoding="windows-1252", but it's unusual for XML to be encoded in anything other than UTF-8, so I suspect the file is really a UTF-8 file and the encoding it claims to use is not correct.
Try forcing UTF-8 when reading the file. It's good practice, regardless, to specify the charset when converting bytes to text:
String xml = new String(
Files.readAllBytes(xmlFile.toPath()), StandardCharsets.UTF_8);
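To see why the charset choice matters for the umlauts, here is a small sketch that decodes the same bytes two ways. The class name and sample text are made up for illustration, and it assumes the runtime provides the windows-1252 charset. If the file really is windows-1252, as its declaration says, decoding it as UTF-8 produces the replacement character shown in the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the question or the answers.
final class UmlautDecodingDemo {

    /** Decodes raw file bytes with an explicitly chosen charset. */
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // "äöü" written with escapes so the source file's own encoding does not matter;
        // in windows-1252 these are the single bytes E4, F6, FC.
        byte[] fileBytes = "\u00E4\u00F6\u00FC".getBytes(cp1252);

        // Decoded with the charset the bytes were written in: the umlauts survive.
        System.out.println(decode(fileBytes, cp1252));
        // Decoded as UTF-8: the bytes are malformed UTF-8 and become U+FFFD.
        System.out.println(decode(fileBytes, StandardCharsets.UTF_8));
    }
}
```

So it may be worth trying both the declared charset and UTF-8 and keeping whichever yields readable umlauts.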
Try this:
String xmlToString = FileUtils.readFileToString(new File("/file/path/file.xml"));
You need the Commons IO jar for this.
See if this works for you.
//filename is filepath string
BufferedReader br = new BufferedReader(new FileReader(new File(filename)));
String line;
StringBuilder sb = new StringBuilder();
while((line=br.readLine())!= null){
sb.append(line.trim());
}

Japanese character not showing properly converting CSV file

I am converting a CSV file from the Tatoeba project. It contains Japanese characters. I am inserting the data into an SQLite database. Insertion works without a problem, but the characters are not displayed properly.
If I insert directly:
String str = content_parts[2];
sentence.setValue(str);
I get values like this:
ãã¿ã«ã¡ãã£ã¨ãããã®ããã£ã¦ãããã
I have tried to decode to UTF8 from JIS:
String str = content_parts[2];
byte[] utf8EncodedBytes = str.getBytes("JIS");
String s = new String(utf8EncodedBytes, "UTF-8");
sentence.setValue(s);
JIS:
$B!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!r!)!)!/!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!r!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)(B
Shift-JIS:
????\??????�N?�}??????????????????��?????�N?�N???��??????
Shift_JIS:
????\????????????????????????��?�N??????????????????��??????
CSV file (when opened by Excel 2010)
n きみにちょっとしたものをもってきたよ。
What am I doing wrong? How can I solve this problem?
If you are still searching for a solution, refer to the questions below:
setting-a-utf-8-in-java-and-csv-file and handle Japanese characters
csv-reports-not-displaying-japanese-characters
In brief, write the BOM (byte order mark) bytes to your FileOutputStream before passing it to the OutputStreamWriter.
String content="some string to write in file(in any language)";
FileOutputStream fos = new FileOutputStream("D:\\csvFile.csv");
// UTF-8 byte order mark: EF BB BF
fos.write(239);
fos.write(187);
fos.write(191);
Writer w = new BufferedWriter(new OutputStreamWriter(fos, StandardCharsets.UTF_8));
w.write(content);
w.close();
Hope this helps.
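The garbled value in the question starts with 'ã', which is what the first byte of a UTF-8 encoded hiragana character (0xE3) looks like when it is decoded as ISO-8859-1. That suggests, although the question does not show how the file is read, that the CSV was decoded with a single-byte default charset. Below is a minimal sketch of reading the file with an explicit charset instead; the class name, method name, and the tab delimiter are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the column layout (id, language, sentence) is an assumption.
final class CsvCharsetDemo {

    /** Reads delimited lines with the given charset and returns the third field of each line. */
    static List<String> thirdColumn(InputStream in, Charset charset) {
        List<String> values = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\t");
                if (parts.length > 2) {
                    values.add(parts[2]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }
}
```

Called with a FileInputStream for the CSV and StandardCharsets.UTF_8, this should return the sentences as proper Java strings, so no getBytes/new String round trip is needed before inserting them into the database.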

java file reader weird output

I set up a FileReader, and opened a file to read, but it gives me a weird output, that I can't seem to fix:
import java.io.BufferedReader;
import java.io.FileReader;
public class FileReading {
public static void main(String [] args) throws Exception {
FileReader file = new FileReader("/Users/danielpersonius/Desktop/test.rtf");
BufferedReader reader = new BufferedReader(file);
String text = "";
String line = reader.readLine();
while (line != null){
// So here, we want to print until it reaches 'null'
text += line;
line = reader.readLine();
}
System.out.println(text);
}
}
This is my output:
{\rtf1\ansi\ansicpg1252\cocoartf1265\cocoasubrtf200{\fonttbl\f0\fswiss\fcharset0
Helvetica;}{\colortbl;\red255\green255\blue255;}\margl1440\margr1440\vieww10800\viewh8400\viewkind0\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\f0\fs24
\cf0 TEST}
TEST is what the RTF file says, but how do I get rid of all the other stuff I obviously don't want?
I'm on an iMac with OS X Mavericks.
The problem is that you are probably creating your file in TextEdit. TextEdit does not save files as plain text by default. Instead it saves them in RTF (Rich Text Format), which embeds formatting commands. You need to use a text editor that can create a plain ASCII text file.
For more information, read up on RTF.
Just use the same editor you use to write your code to create your "test.*" file :)
With your test file open, go to Format in the menu bar, then choose the option that converts it to a plain .txt file.
This change will get rid of the weird output.

Broken Text : reading larger size text in android

I have a question about broken text when an Android app reads a large text file.
I am trying to build an app that reads a large text file (about 10 MB).
I read the file and use System.out.println to check its contents.
However, when the print statement displays the contents,
it shows broken text such as:
��T��h��e�� ��P��r��o��j��e��c��t�� ��G��u
Reading a small RTF file was fine, but when I used a text file I ran into problems.
I used code like this:
String UTF8 = "utf8";
int BUFFER_SIZE = 8192;
File gone = new File(path);
FileInputStream inputStream = new FileInputStream(gone);
// FileInputStream inputStream = openFileInput(gone);
if ( inputStream != null ) {
InputStreamReader inputStreamReader = new InputStreamReader(inputStream,UTF8);
BufferedReader bufferedReader = new BufferedReader(inputStreamReader, BUFFER_SIZE);
String receiveString = "";
StringBuilder stringBuilder = new StringBuilder();
while ( (receiveString = bufferedReader.readLine()) != null ) {
stringBuilder.append(receiveString);
}
inputStream.close();
ret = stringBuilder.toString();
System.out.println(ret);
}
I thought it could be an encoding problem, so I added the utf8 option.
However, it still doesn't work.
Does anyone know a solution for the broken text?
UPDATE:
I think I solved the problem.
I created a new text file in the Windows text editor and then copied and pasted the content into it.
Now it reads the file correctly.
The file may have a different encoding than the one given, the file may not contain text, or the console may not support the characters.
Besides, the code is too long; here's a one-line solution:
String s = new String(Files.readAllBytes(Paths.get(file)), "UTF-8");
The file may contain images or an unsupported format; in that case it will display like that.
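To tell a wrong encoding apart from a console problem, one option is to decode strictly, so that malformed bytes raise an error instead of silently turning into '�'. This is a sketch only: the class and method names are made up, and a file in another encoding can still happen to be valid UTF-8, so a passing check is not proof.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for checking whether bytes are well-formed UTF-8.
final class Utf8Check {

    /** Returns true if the bytes decode as UTF-8 without any malformed sequences. */
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of inserting U+FFFD
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

If the check fails for the bytes returned by Files.readAllBytes, the file is probably stored in some other encoding (for example UTF-16), which would be consistent with the update above where re-saving the text fixed the output.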
