My desktop application (written in Java) encrypts a file for an Android application.
An excerpt from the file:
..."A kerékpár (vagy bicikli) egy emberi erővel hajtott kétkerekű jármű. 19. századi kifejlesztése"...
I read it from the file:
FileInputStream is = new FileInputStream("cycles.txt");
StringBuilder sb = new StringBuilder();
Charset inputCharset = Charset.forName("ISO-8859-1");
BufferedReader br = new BufferedReader(new InputStreamReader(is, inputCharset));
String read;
while((read=br.readLine()) != null) {
sb.append(read);
}
After reading the entire file, I encrypt it:
String text = sb.toString();
Encrypter er = new Encrypter();
byte[] bEncrypt = er.encrypt(text);
After the encryption I encode it to base64:
bEncrypt = Base64.getEncoder().encode(bEncrypt);
text = new String(bEncrypt, "ISO-8859-1");
After this, the file is saved on my PC's disk:
File file = new File(System.getProperty("user.dir") + "/files/encr_cycles.txt");
try {
PrintWriter pr = new PrintWriter(new FileWriter(file));
pr.print(text);
pr.close();
} catch (IOException e) {
e.printStackTrace();
}
From the Android application I read the file:
BufferedReader reader = new BufferedReader(new InputStreamReader(getAssets().open("files/encr_cycles.txt"), "ISO-8859-1"));
// Read line by line
String mLine;
StringBuilder sb = new StringBuilder();
while ((mLine = reader.readLine()) != null) {
sb.append(mLine);
}
Decode and decrypt:
byte[] dec = Base64.decode(encryptedText, Base64.DEFAULT);
byte[] data= new Decipher().doFinal(dec);
String text= new String(data, "ISO-8859-1");
And the given text is:
"A kerékpár (vagy bicikli) egy emberi er?vel hajtott kétkerek? járm?. 19. századi kifejlesztése"
Note the "?" in the string? Some of the characters aren't decoded correctly.
Q1: What did I do wrong?
Q2: Am I using a wrong charset?
I changed the charset to "UTF-8" all over the applications (desktop & mobile). The problem was with the root file. The file wasn't saved in "UTF-8".
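The "?" characters are consistent with how Java handles characters that a charset cannot represent: ő and ű are not part of ISO-8859-1, so they are replaced by "?" as soon as the text is turned into ISO-8859-1 bytes. The sketch below illustrates this with a hypothetical roundTrip helper. The encryption step is left out because Encrypter is the asker's own class, and java.util.Base64 stands in for Android's android.util.Base64.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class CharsetRoundTrip {
    // Hypothetical helper: text -> bytes -> Base64 -> bytes -> text, using one charset throughout.
    static String roundTrip(String text, Charset charset) {
        byte[] encoded = Base64.getEncoder().encode(text.getBytes(charset));
        byte[] decoded = Base64.getDecoder().decode(encoded);
        return new String(decoded, charset);
    }

    public static void main(String[] args) {
        // "erővel kétkerekű", written with escapes so the source file's own encoding does not matter.
        String original = "er\u0151vel k\u00e9tkerek\u0171";
        // ő and ű are unmappable in ISO-8859-1 and come back as '?'; é survives.
        System.out.println(roundTrip(original, StandardCharsets.ISO_8859_1));
        // UTF-8 can represent every character, so the text comes back unchanged.
        System.out.println(roundTrip(original, StandardCharsets.UTF_8).equals(original));
    }
}
```

Base64 output is plain ASCII, so the charset used for the Base64 text itself is not the critical part; what matters is the charset used to turn the original text into bytes and back.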
What I did in Eclipse:
Open the root file (.txt) in Eclipse (drag and drop the file into the editor).
Insert your string or make some change (blank characters) in the file (in my case, the string that could not be encoded).
Press save (Ctrl + S); a dialog will offer to save as UTF-8.
Remove your line.
Save again.
The other solution is to make Eclipse save files as UTF-8 by default while editing:
Window > Preferences > General > Content Types, set UTF-8 as the default encoding for all content types.
Window > Preferences > General > Workspace, set "Text file encoding" to "Other : UTF-8".
Source: How to support UTF-8 encoding in Eclipse
Related
In my Java web application, when I upload a ZIP file (a thread dump), I get an InputStream in the servlet. I use the Zip4j library to unzip the file and then write the contents to a file. The ZIP file has content in several encodings (UTF-8, windows-1252, ISO-8859-1, ISO-8859-2, IBM424_rtl). When I open the output file, I see garbled characters like this: Mac OS X 2 € ² ATTR ² ˜
Here is some sample code. Can you please let me know how I can fix this issue?
// Using Zip4j library to uncompress ZIP format
ZipInputStream zis = new ZipInputStream(iStream);
FileOutputStream zos = new FileOutputStream("output_file.txt");
ByteArrayOutputStream out = new ByteArrayOutputStream();
LocalFileHeader localFileHeader = zis.getNextEntry();
while (localFileHeader != null) {
if(localFileHeader.isDirectory()) {
localFileHeader = zis.getNextEntry();
continue;
}
IOUtils.copy(zis, out);
localFileHeader = zis.getNextEntry();
}
InputStreamReader isr = new InputStreamReader(new ByteArrayInputStream(out.toByteArray()));
BufferedReader reader = new BufferedReader(isr);
String str;
while ((str = reader.readLine()) != null) {
// This is a custom method that returns the charset of the input string using the Apache Tika library
String encoding = CharsetDetector.detectCharset(str);
zos.write(str.getBytes(encoding));
zos.write("\n".getBytes());
}
isr.close();
reader.close();
zos.close();
zis.close();
// Method is used to detect charset
public static String detectCharset(String text) throws IOException {
org.apache.tika.parser.txt.CharsetDetector detector = new org.apache.tika.parser.txt.CharsetDetector();
detector.setText(text.getBytes());
String charset = detector.detect().getName();
return charset;
}
Note: I am running the application on a Windows machine.
Thanks in advance!
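One possible direction, offered as a hedged sketch rather than a confirmed fix: detect the charset from each entry's raw bytes instead of from a String that has already been decoded with the platform default, then write everything out in a single encoding. The sketch uses java.util.zip from the JDK instead of Zip4j so that it is self-contained. It takes the detector as a hypothetical Function<byte[], Charset> parameter (a Tika-based detector could be plugged in there), and it skips __MACOSX/ entries on the assumption that the "Mac OS X ... ATTR" text comes from the metadata files that macOS adds to archives.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipToUtf8 {
    // Decodes every file entry with the charset the detector reports for its raw bytes
    // and re-encodes the text as UTF-8, one entry after another.
    static byte[] normalize(InputStream zipped, Function<byte[], Charset> detector) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try (ZipInputStream zis = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                // Assumption: directories and macOS metadata entries carry no log text.
                if (entry.isDirectory() || entry.getName().startsWith("__MACOSX/")) {
                    continue;
                }
                // Collect the raw bytes of this entry only.
                ByteArrayOutputStream raw = new ByteArrayOutputStream();
                int n;
                while ((n = zis.read(buf)) > 0) {
                    raw.write(buf, 0, n);
                }
                byte[] bytes = raw.toByteArray();
                // Detect on the bytes, decode once, then write in one known encoding.
                String text = new String(bytes, detector.apply(bytes));
                out.write(text.getBytes(StandardCharsets.UTF_8));
                out.write('\n');
            }
        }
        return out.toByteArray();
    }
}
```

Detecting per entry rather than per line gives the detector more data to work with, and it avoids mixing several encodings in one output file.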
I have an Android application that reads a file containing an SQL script to insert data into a SQLite DB.
However, I need to know the exact encoding of this file. I have an EditText that reads information from SQLite, and if the encoding is not right, invalid characters like "?" are shown instead of characters like "ç, í, ã".
I have the following code:
FileInputStream fIn = new FileInputStream(myFile);
BufferedReader myReader = new BufferedReader(new InputStreamReader(fIn, "ISO-8859-1"));
String aDataRow;
while ((aDataRow = myReader.readLine()) != null) {
if(!aDataRow.isEmpty()){
String[] querys = aDataRow.split(";");
Collections.addAll(querysParaExecutar, querys);
}
}
myReader.close();
This works for "ISO-8859-1" encoding, and works for UTF-8 if I set "UTF-8" as the charset. I need to programmatically detect the charset encoding (UTF-8 or ISO-8859-1) and apply the correct one in my code.
Is there a simple way to do that?
I resolved the problem with the juniversalchardet library.
It's working as expected.
// First pass: feed the raw bytes to the detector.
FileInputStream fIn = new FileInputStream(myFile);
byte[] buf = new byte[4096];
UniversalDetector detector = new UniversalDetector(null);
int nread;
while ((nread = fIn.read(buf)) > 0 && !detector.isDone()) {
detector.handleData(buf, 0, nread);
}
detector.dataEnd();
fIn.close();
// getDetectedCharset() returns null when nothing could be detected.
String encoding = detector.getDetectedCharset();
// Use ISO-8859-1 for WINDOWS-1252 and as the fallback; switch only when UTF-8 is detected.
String charsetName = "ISO-8859-1";
if ("UTF-8".equalsIgnoreCase(encoding)) {
charsetName = "UTF-8";
}
// Second pass: reopen the file, because the first stream has already been consumed.
BufferedReader myReader = new BufferedReader(new InputStreamReader(new FileInputStream(myFile), charsetName));
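When the only candidates are UTF-8 and ISO-8859-1, as in the question, a library-free alternative is to attempt a strict UTF-8 decode and fall back to ISO-8859-1 if it fails. This is a heuristic sketch, not what the answer above does. The class and method names are made up for illustration, and StandardCharsets requires API level 19 or later on Android.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8OrLatin1 {
    // Returns UTF-8 if the bytes form valid UTF-8, otherwise ISO-8859-1.
    static Charset guess(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // throw instead of substituting U+FFFD
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            // Any byte sequence is valid ISO-8859-1, so it is a safe fallback.
            return StandardCharsets.ISO_8859_1;
        }
    }

    static String decode(byte[] data) {
        return new String(data, guess(data));
    }
}
```

Pure ASCII input is reported as UTF-8, which is harmless because both charsets decode ASCII bytes identically. Accented ISO-8859-1 text is rarely valid UTF-8 by accident, but the check remains a guess rather than a guarantee.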
So, I am developing an Android application that reads a JSON text file containing some data. I have a 300 KB (307,312 bytes) JSON text file (here). I am also developing a desktop application (C++) to generate, load, and parse the JSON text file.
When I open and read it using ifstream in C++, I get the correct string length (307,312). I even parse it successfully.
Here is my code in C++:
std::string json = "";
std::string line;
std::ifstream myfile("textfile.txt");
if(myfile.is_open()){
while(std::getline(myfile, line)){
json += line;
json.push_back('\n');
}
json.pop_back(); // pop back the last '\n'
myfile.close();
}else{
std::cout << "Unable to open file";
}
In my Android application, I put the JSON text file in the res/raw folder. When I open and read it using InputStream, the length of the string is only 291,896, and I can't parse it (I parse it using JNI with the same C++ code, though that may not be important).
InputStream is = getResources().openRawResource(R.raw.textfile);
byte[] b = new byte[is.available()];
is.read(b);
in_str = new String(b);
UPDATE:
I have also tried it this way.
InputStream is = getResources().openRawResource(R.raw.textfile);
BufferedReader reader = new BufferedReader(new InputStreamReader(is));
String line = reader.readLine();
while(line != null){
in_str += line;
in_str += '\n';
line = reader.readLine();
}
if (in_str != null && in_str.length() > 0) {
in_str = in_str.substring(0, in_str.length()-1);
}
I even tried moving it from the res/raw folder to the assets folder of the Android project, changing the InputStream line to InputStream is = getAssets().open("textfile.txt"). It still doesn't work.
Okay, I found the solution. It is an ASCII versus UTF-8 problem.
From here:
UTF-8: variable-length encoding, 1-4 bytes per code point. ASCII values are encoded as ASCII using 1 byte.
ASCII: single-byte encoding.
My file size is 307,312 bytes, and I need one character per byte. So I need to decode the file as ASCII.
When I use C++ ifstream, the string size is 307,312 (the same as the number of characters under ASCII encoding).
Meanwhile, when I use Java InputStream, the string size is 291,896. I assume this happens because the reader decodes the bytes as UTF-8 instead, so each multi-byte sequence becomes a single character.
So, how do I use ASCII encoding in Java?
Through this thread and this article, I found that we can use InputStreamReader in Java and set it to ASCII. Here is my complete code:
String in_str = "";
try{
InputStream is = getResources().openRawResource(R.raw.textfile);
BufferedReader reader = new BufferedReader(new InputStreamReader(is, "ASCII"));
String line = reader.readLine();
while(line != null){
in_str += line;
in_str += '\n';
line = reader.readLine();
}
if (in_str != null && in_str.length() > 0) {
in_str = in_str.substring(0, in_str.length()-1);
}
}catch(Exception e){
e.printStackTrace();
}
If you have the same problem, hope this helps. Cheers.
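A caveat worth checking before relying on this, shown as a small sketch with made-up names: decoding as US-ASCII does yield one char per byte, but every byte above 0x7F is replaced by U+FFFD, so the non-ASCII content is lost. ISO-8859-1 also yields one char per byte and keeps each byte value, which may be the safer choice if the bytes are later handed to native code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DecodeLengths {
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "héllo" stored as UTF-8: 6 bytes, because é takes two bytes.
        byte[] raw = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);

        String utf8 = decode(raw, StandardCharsets.UTF_8);        // 5 chars
        String ascii = decode(raw, StandardCharsets.US_ASCII);    // 6 chars, two of them U+FFFD
        String latin1 = decode(raw, StandardCharsets.ISO_8859_1); // 6 chars, one per byte

        System.out.println(utf8.length() + " " + ascii.length() + " " + latin1.length());
        // Only the ISO-8859-1 string can be turned back into the original bytes.
        System.out.println(Arrays.equals(raw, latin1.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println(Arrays.equals(raw, ascii.getBytes(StandardCharsets.US_ASCII)));
    }
}
```

This also matches the length difference in the question: the UTF-8 string is shorter than the byte count by one char for every extra byte in a multi-byte sequence.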
I am writing a program that can load local or remote log files.
If I load a local file, there is no error.
But if I first copy the file to my local machine with SCP (using this code: http://www.jcraft.com/jsch/examples/ScpFrom.java.html) and then read it, I get an error and the letters "ü/ä/ö" are shown as �.
How can I fix this?
Remote: Linux server
Local: Windows PC
Code for SCP:
http://www.jcraft.com/jsch/examples/ScpFrom.java.html
Code for reading the file:
protected void openTempRemoteFile() throws IOException {
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream( lfile )));
String strLine;
DefaultTableModel dtm = new DefaultTableModel(0, 0);
String header[] = new String[]{ "Timestamp", "Session-ID", "Log" };
dtm.setColumnIdentifiers(header);
table.setModel(dtm);
while ((strLine = reader.readLine()) != null) {
String[] sparts = strLine.split(" ");
String[] bparts = strLine.split(" : ");
String Timestamp = sparts[0] + " " + sparts[1];
String SessionID = sparts[4];
String Log = bparts[1];
dtm.addRow(new Object[] {Timestamp, SessionID, Log});
}
reader.close();
}
EDIT:
Encoding of the local files: UTF-8
Encoding of the remote files copied via SCP from the Linux server: WINDOWS-1252
Supply the appropriate Charset to the InputStreamReader constructor, e.g.:
import java.nio.charset.StandardCharsets;
...
BufferedReader reader = new BufferedReader(
new InputStreamReader(
new FileInputStream( lfile ),
StandardCharsets.UTF_8)); // try also ISO_8859_1 if UTF_8 doesn't help.
To fix your problem you have at least two options:
You can specify the encoding for your files directly in your code, updating it as follows:
BufferedReader reader = new BufferedReader(
new InputStreamReader(
new FileInputStream( lfile ),
"UTF8"
)
);
or set the default file encoding when starting the JVM with:
java -Dfile.encoding=UTF-8 … com.example.Main
I definitely prefer the first way, and you can parametrize the "UTF8" value too if you need to.
With the latter way, you could still face the same issues if you forget to specify the flag.
You can replace the encoding with whatever you prefer (refer to https://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html for the supported encodings); on Windows, "Cp1252" is usually the default encoding.
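As a rough sketch of what parametrizing the encoding could look like, the hypothetical helper below takes the charset as an argument, so local UTF-8 files and remote windows-1252 files can share one code path. The class and method names are illustrative only and do not come from the question.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class LogReader {
    // Reads all lines of a file, decoding its bytes with the charset the caller passes in.
    static List<String> readLines(File file, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

A caller would then pass StandardCharsets.UTF_8 for local files and Charset.forName("windows-1252") for the files copied from the server.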
Remember, you can always query the file.encoding property or Charset.defaultCharset() to find the current default encoding for your application, e.g.:
byte[] byteArray = "blablabla".getBytes();
InputStream inputStream = new ByteArrayInputStream(byteArray);
// No charset is given, so the reader falls back to the platform default.
InputStreamReader reader = new InputStreamReader(inputStream);
String defaultEncoding = reader.getEncoding();
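The two lookups named in the text can also be called directly, without creating a reader. A minimal sketch, with a made-up class name:

```java
import java.nio.charset.Charset;

public class DefaultEncoding {
    // Reports the JVM's default encoding through both lookups mentioned above.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

The printed values depend on the platform, the Java version, and any -Dfile.encoding flag, which is why passing an explicit charset is the more predictable option.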
Working with encodings is tricky. If your system always handles this kind of file (from different environments), then you should first detect the charset and then read the file with the detected charset. I had a similar problem; I used juniversalchardet to detect the charset and then used InputStreamReader(stream, Charset).
In your case it would look like this:
protected void openTempRemoteFile() throws IOException {
String encoding = UniversalDetector.detectCharset(lfile);
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream( lfile ), Charset.forName(encoding)));
....
If it is only a one-time job, then open the file in a text editor (Notepad++, for example), save it in your encoding, and then use it in the program.
I am trying to save a .txt file from Java code. On a Windows 7 machine the file is encoded as ANSI, but when I do the same on a Windows Server 2000 machine the file is saved as UTF.
I have run several tests and found that the encoding changes from run to run on Windows Server 2000, without any changes to the code.
I'm saving the file inside a ZIP file, and the code is as follows (I have replaced "Cp1252" with "ISO-8859-1", but the result is the same):
public byte[] getBytesZipFile(String nombreFichero, String input) throws IOException {
String tempdir = System.getProperty("java.io.tmpdir");
if (!(tempdir.endsWith("/") || tempdir.endsWith("\\"))) {
tempdir = tempdir + System.getProperty("file.separator");
}
File tempFile = new File(tempdir + nombreFichero + ".txt");
try {
BufferedWriter bufferedWriter = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(tempFile), "Cp1252"));
bufferedWriter.write(input);
bufferedWriter.flush();
bufferedWriter.close();
ByteArrayOutputStream byteArrayOutputStreambos = new ByteArrayOutputStream();
ZipOutputStream zipOutputStream = new ZipOutputStream(byteArrayOutputStreambos);
FileInputStream fileInputStream = new FileInputStream(tempFile);
zipOutputStream.putNextEntry(new ZipEntry(tempFile.getName()));
byte[] buf = new byte[1024];
int len;
while ((len = fileInputStream.read(buf)) > 0) {
zipOutputStream.write(buf, 0, len);
}
zipOutputStream.closeEntry();
fileInputStream.close();
zipOutputStream.flush();
zipOutputStream.close();
return byteArrayOutputStreambos.toByteArray();
} finally {
tempFile.delete();
}
}
Thanks for the help and answers. Regards.
It is because of the default encoding of the JVM.
Check this question for how to change the default encoding: Setting the default Java character encoding?
And check this external article for setting the encoding of your specific file: http://www.mkyong.com/java/how-to-write-utf-8-encoded-data-into-a-file-java/
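To rule out the JVM default entirely, one option is to encode the text with an explicit charset and write the bytes straight into the ZIP entry, without a temporary file. The sketch below uses made-up names and assumes the entry name and charset are supplied by the caller.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipText {
    // Encodes the text with the given charset and stores it as a single ZIP entry.
    static byte[] zipText(String entryName, String text, Charset charset) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry(entryName));
            // The bytes depend only on the charset argument, not on the platform default.
            zos.write(text.getBytes(charset));
            zos.closeEntry();
        }
        return bos.toByteArray();
    }
}
```

If the stored bytes are identical on both machines and the file still appears as "ANSI" on one and "UTF" on the other, the difference may lie in how the input String was produced or in how the viewing tool guesses the encoding.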