Android OutputStreamWriter writes non-UTF-8 characters

There's something weird happening in my code. I'm trying to open a file and append a generated string (which is based on time) to it, so that it can be used later on. In my app, this happens more than once, but somehow, this time it doesn't give me the result I'd like to see. To clarify:
I'm using the following code inside my class:
try {
OutputStreamWriter oos = new OutputStreamWriter(context.openFileOutput(ARRANGED_TXT, Context.MODE_PRIVATE));
oos.write(ArrangedTxtAsString);
oos.close();
}
catch(FileNotFoundException e){
Log.e("File Error", "File not found: " + e.toString());
}
with ARRANGED_TXT defined as:
private static final String ARRANGED_TXT = "arranged_txt.txt";
ArrangedTxtAsString is a string that looks like this:
Banana
Pineapple
Orange
Coconut
Strawberry
It was made using a StringBuilder.
Now, the problem is that the generated file looks like this:
It's not a problem with Notepad++, as other programs give the same result. One important thing to mention is that although the text looks like it is made up of non-UTF-8 characters, it is processed perfectly by the rest of the program. This may sound a bit weird, but what I'm trying to say is that the code works fine; the only problem is that the text shown in a text editor doesn't correspond to what it should look like. I'm quite a perfectionist, and that's why I want it fixed. This might also become a problem in the future (for debugging purposes, etc.).
Edit:
It appears that it doesn't matter what the string contains: a single character, a number, or whatever; it gives the exact same result. Even writing "" results in those weird characters!
I really hope someone knows what's happening here!
Thank you in advance.

If you need to write UTF-8, you MUST use the constructor that takes a charset, like:
OutputStreamWriter oos = new OutputStreamWriter(context.openFileOutput(ARRANGED_TXT, Context.MODE_PRIVATE),"UTF-8");
OutputStreamWriter(OutputStream out, String charsetName)
Otherwise the platform default encoding is used, and that is not UTF-8 on all systems.
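A minimal plain-Java sketch of the same fix, so it can run off-device: it passes `StandardCharsets.UTF_8` (Java 7+) instead of the string "UTF-8" and closes the writer with try-with-resources. On Android you would wrap `context.openFileOutput(ARRANGED_TXT, Context.MODE_PRIVATE)` instead of the `FileOutputStream` used here; the class name `Utf8WriteDemo` and the helper `writeUtf8` are illustrative only.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class Utf8WriteDemo {

    // Writes text as UTF-8, independent of the platform default charset.
    static void writeUtf8(Path file, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(file.toFile()), StandardCharsets.UTF_8)) {
            out.write(text);
        } // close() flushes the encoder and closes the underlying stream
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get("arranged_txt.txt");
        writeUtf8(file, "Banana\nPineapple\nOrange\n");
        // Read the raw bytes back and decode them explicitly as UTF-8.
        byte[] bytes = Files.readAllBytes(file);
        System.out.print(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

Decoding with an explicit charset on the way back in matters just as much; a mismatch on either side produces the same kind of garbled text.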

Related

Creating a text file with java without using absolute path

Following up on the question I asked before, "How to have my java project to use some files without using their absolute path?", I found the solution, but another problem popped up when creating the text files I want to write into. Here's my code:
private String pathProvider() throws Exception {
//finding the location where the jar file has been located
String jarPath=URLDecoder.decode(getClass().getProtectionDomain().getCodeSource().getLocation().getPath(), "UTF-8");
//creating the full and final path
String completePath=jarPath.substring(0,jarPath.lastIndexOf("/"))+File.separator+"Records.txt";
return completePath;
}
public void writeRecord() {
try(Formatter writer=new Formatter(new FileWriter(new File(pathProvider()),true))) {
writer.format("%s %s %s %s %s %s %s %s %n", whichIsChecked(),nameInput.getText(),lastNameInput.getText()
,idInput.getText(),fieldOfStudyInput.getText(),date.getSelectedItem().toString()
,month.getSelectedItem().toString(),year.getSelectedItem().toString());
successful();
} catch (Exception e) {
failure();
}
}
This works and creates the text file wherever the jar file is running from, but my problem is that when the information is written to the file, the numbers, symbols, and English characters are preserved, while the Persian characters are turned into question marks, like: ????? 111 ????? ????. Running the app in Eclipse doesn't cause this problem; running the jar does.
Note: I found the code inside the pathProvider method in someone else's question.

Your pasted code and the linked question are complete red herrings - they have nothing whatsoever to do with the error you ran into. Also, that protection domain stuff is a hack, and you've been told before not to write data files next to your jar files; that's not how OSes (are supposed to) work. Use user.home for this.
There is nothing in this method that explains the question marks - the string, as returned, has plenty of issues (see above), but NOT that it will result in question marks in the output.
Files are fundamentally bytes. Strings are fundamentally characters. Therefore, when you write code that writes a string to a file, some code somewhere is converting chars to bytes.
Make sure the place where that happens includes a charset encoding.
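To see why a charset that cannot represent a character yields question marks rather than an exception, here is a small sketch. `String.getBytes(Charset)` replaces unmappable characters with the charset's replacement bytes, which for ISO-8859-1 is a single `?`. The sample text and the helper `roundTrip` are illustrative; a `FileWriter` on a JVM whose default charset lacks Persian letters should degrade the same way, since the writer's encoder also replaces unmappable characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {

    // Encodes text with the given charset and decodes it again, like writing
    // a file with that charset and reading it back.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // A Persian word plus ASCII digits; Unicode escapes keep this source file ASCII-only.
        String text = "\u0633\u0644\u0627\u0645 111";

        // ISO-8859-1 has no Persian letters: each one is replaced by the byte '?'.
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1)); // ???? 111

        // UTF-8 can encode every character, so nothing is lost.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```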
Use the new API (I think you've also been told to do this, by me, in an earlier question of yours), which defaults to UTF-8. Alternatively, specify UTF-8 when you write. Note that the "UTF-8" in your pathProvider (the URLDecoder.decode call) is about the file name, not its contents: it matters if you put Persian symbols in the file name, not for Persian symbols in the contents you want to write.
Because you didn't paste the code, I can't give you specific details as there are hundreds of ways to do this, and I do not know which one you used.
To write to a file given a String representing its path:
Path p = Paths.get(completePath);
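Files.writeString(p, "Hello, World!");
is all you need (Java 11+; on Java 8, Files.write(p, Collections.singletonList("Hello, World!")) does the same for a list of lines). This writes UTF-8, which can handle Persian symbols, because the Files API defaults to UTF-8 if you specify no encoding, unlike the older java.io classes such as FileWriter and OutputStreamWriter.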
If you're using outdated APIs: new BufferedWriter(new OutputStreamWriter(new FileOutputStream(thePath), StandardCharsets.UTF_8)) - but note that this is a resource leak bug unless you add the appropriate try-with-resources.
If you're using FileWriter: FileWriter is broken (before Java 11 it always uses the platform default encoding); never use this class. Use something else.
If you're converting the string on its own, it's str.getBytes(StandardCharsets.UTF_8), not str.getBytes().
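Putting those points together, here is a minimal sketch of a record writer that keeps `Records.txt` under `user.home` (as suggested above) and appends one line as UTF-8 via `java.nio.file.Files`. The `RecordWriter` class, `appendRecord` helper, and its `String...` parameter are hypothetical stand-ins for the Swing fields and the `Formatter` call in the question.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class RecordWriter {

    // Records.txt in the user's home directory, not next to the jar.
    static Path recordsFile() {
        return Paths.get(System.getProperty("user.home"), "Records.txt");
    }

    // Appends the fields as one space-separated line, encoded as UTF-8.
    static void appendRecord(Path file, String... fields) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            w.write(String.join(" ", fields));
            w.newLine(); // platform line separator, like %n in the original format string
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = recordsFile();
        appendRecord(file, "\u0639\u0644\u06CC", "111"); // a Persian name and a number
        System.out.println("Appended to " + file);
    }
}
```

Because the charset is passed explicitly, the result no longer depends on whether the program is started from Eclipse or from the jar.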

Writing strings with chars like "ñ" to a txt file

I'm having a strange issue writing strings that contain characters like "ñ", "á", and so on to text files. Let me first show you my little piece of code:
import java.io.*;
public class test {
public static void main(String[] args) throws Exception {
String content = "whatever";
int c;
c = System.in.read();
content = content + (char)c;
FileWriter fw = new FileWriter("filename.txt");
BufferedWriter bw = new BufferedWriter(fw);
bw.write(content);
bw.close();
}
}
In this example, I'm just reading a char from the keyboard input, appending it to a given string, and then writing the final string into a txt file. The problem is that if I type an "ñ", for example (I have a Spanish-layout keyboard), when I check the txt file, it shows a strange char "¤" where there should be an "ñ"; that is, the content of the file is "whatever¤". The same happens with "ç", "ú", etc. However, it writes it fine ("whateverñ") if I just forget about the keyboard input and write:
...
String content = "whateverñ";
...
or
...
content = content + "ñ";
...
It makes me think that there might be something wrong with the read() method. Or maybe I'm using it wrongly? Or should I use a different method to get the keyboard input? I'm a bit lost here.
(I'm using JDK 7u45 on Windows 7 Pro x64)

So ...
It works (i.e. you can read the accented characters on the output file) if you write them as literal strings.
It doesn't work when you read them from System.in and then write them.
This suggests that the problem is on the input side. Specifically, I think your console / keyboard must be using a character encoding for the input stream that does not match the encoding that Java thinks should be used.
You should be able to confirm this tentative diagnosis by outputting the characters you are reading in hexadecimal, and then checking the codes against the unicode tables (which you can find at unicode.org for example).
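The diagnostic suggested above can be sketched like this: print each value read from `System.in` as a hexadecimal code and look it up. As an illustration of what it might reveal (an assumption about this particular setup, not something confirmed in the question): in the Windows console code page 850, "ñ" is the single byte 0xA4, and casting that byte straight to `char` gives U+00A4, which is "¤", matching the symptom. The class and the `toCode` helper are illustrative names.

```java
import java.io.IOException;

class InputHexDump {

    // Formats a value as a Unicode-style code, e.g. 0xF1 -> "U+00F1".
    static String toCode(int value) {
        return String.format("U+%04X", value);
    }

    public static void main(String[] args) throws IOException {
        int c;
        // System.in.read() returns one byte (0-255), or -1 at end of stream;
        // it does no character decoding at all.
        while ((c = System.in.read()) != -1 && c != '\n') {
            System.out.println(toCode(c) + " -> (char) gives '" + (char) c + "'");
        }
    }
}
```

If typing "ñ" prints a code other than U+00F1, the input side is delivering bytes in a different encoding than the `(char)` cast assumes.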
It strikes me as "odd" that the "platform default encoding" appears to be working on the output side, but not the input side. Maybe someone else can explain ... and offer a concrete suggestion for fixing it. My gut feeling is that the problem is in the way your keyboard is configured, not in Java or your application.

Files do not remember their encoding; when you look at a .txt, the text editor makes a "best guess" at the encoding used.
If you read the file back into your program, the text should be back to normal.
Also, try printing the "strange" character directly.
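A small sketch of that point: the same bytes read under two different guesses. Here "whateverñ" is encoded as UTF-8 and then decoded once as UTF-8 and once as ISO-8859-1 (standing in for an editor that guesses wrong). Only the wrong guess shows odd characters; the bytes themselves never change. The `readAs` helper is an illustrative name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingGuess {

    // Decodes bytes under a given charset, as a text editor would after guessing it.
    static String readAs(byte[] bytes, Charset guess) {
        return new String(bytes, guess);
    }

    public static void main(String[] args) {
        // U+00F1 (n with tilde) becomes the two bytes 0xC3 0xB1 in UTF-8.
        byte[] bytes = "whatever\u00F1".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAs(bytes, StandardCharsets.UTF_8));      // correct guess
        System.out.println(readAs(bytes, StandardCharsets.ISO_8859_1)); // "whatever" + U+00C3 U+00B1
    }
}
```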

How to change the Properties.store() divider symbol from "=" to ":"?

I recently found out about java.util.Properties, which allows me to write and read from a config without writing my own function for it.
I was excited since it is so easy to use, but later noticed a flaw when I stored the modified config file.
Here is my code, quite simple for now:
FileWriter writer = null;
Properties configFile = new Properties();
configFile.load(ReadFileTest.class.getClassLoader().getResourceAsStream("config.txt"));
String screenwidth = configFile.getProperty("screenwidth");
String screenheight = configFile.getProperty("screenheight");
System.out.println(screenwidth);
System.out.println(screenheight);
configFile.setProperty("screenwidth", "1024");
configFile.setProperty("screenheight", "600");
try {
writer = new FileWriter("config.txt" );
configFile.store(writer, null);
} catch (IOException e) {
e.printStackTrace();
}
writer.flush();
writer.close();
The problem I noticed was that the config file I try to edit is stored like this:
foo: bar
bar: foo
foobar: barfoo
However, the output after properties.store(writer, null) is this:
foo=bar
bar=foo
foobar=barfoo
The config file I edit is not for my program; it is for another application that needs the config file to be in the format shown above, with : as the divider, or else it will reset the configuration to default.
Does anybody know how to easily change this?
I searched through the first 5 Google pages but found no one with a similar problem.
I also checked the Javadoc and found no method that lets me change it without writing my own class.
I would like to use Properties for now since it is there and quite easy to use.
I also had the idea of just replacing all = with : after saving the file, but maybe someone has a better suggestion?

Don't use a tool that isn't designed for the task - don't use Properties here. Instead, I'd just write your own - should be easy enough.
You can still use a Properties instance as your "store", but don't use it for serializing the properties to text. Instead, just use a FileWriter, iterate through the properties, and write the lines yourself - as key + ": " + value.
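A minimal sketch of that approach, assuming the other application wants one `key: value` per line and needs no escaping of special characters (`Properties.store()` would escape them; this does not). Sorting the keys is a choice made here for a stable file, since `Properties` keeps no order. The class and method names are illustrative.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Properties;
import java.util.TreeSet;

class ColonPropertiesWriter {

    // Writes each property as "key: value" on its own line, keys sorted.
    static void writeColonSeparated(Properties props, Writer out) throws IOException {
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            out.write(key + ": " + props.getProperty(key));
            out.write(System.lineSeparator());
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        Properties config = new Properties();
        config.setProperty("screenwidth", "1024");
        config.setProperty("screenheight", "600");
        StringWriter out = new StringWriter();
        writeColonSeparated(config, out);
        System.out.print(out); // screenheight: 600, then screenwidth: 1024
    }
}
```

For the real file, pass a writer such as `Files.newBufferedWriter(Paths.get("config.txt"), charset)` with whatever charset the other application expects, instead of the `StringWriter`.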

New idea here
Your comment about converting the = to : got me thinking: Properties.store() writes to a Stream. You could use an in-memory ByteArrayOutputStream, convert as appropriate in memory before you write to a file, then write the file. Likewise for Properties.load(). Or you could insert FilterXXXs instead. (I'd probably do it in memory).
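A sketch of the in-memory variant: `store()` into a `StringWriter`, then turn the first `=` on each non-comment line into `: `. This assumes no key contains `=` or `:` (`store()` writes those as `\=` and `\:` inside keys, which this simple replacement does not handle); values are safe because only the first `=` is touched. Reading the result back needs no conversion, because `Properties.load()` already accepts `:` as a key/value separator. The names are illustrative.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Properties;

class ColonStore {

    // Stores props with Properties.store(), then rewrites each "key=value" line
    // as "key: value". Assumes keys never contain '=' or ':' (store() would write
    // them as \= and \: and the first '=' found would then be the wrong one).
    static String storeWithColons(Properties props) throws IOException {
        StringWriter buffer = new StringWriter();
        props.store(buffer, null); // still writes a "#<date>" comment line first
        StringBuilder result = new StringBuilder();
        for (String line : buffer.toString().split("\\r?\\n")) {
            int sep = line.indexOf('=');
            if (!line.startsWith("#") && sep >= 0) {
                line = line.substring(0, sep) + ": " + line.substring(sep + 1);
            }
            result.append(line).append(System.lineSeparator());
        }
        return result.toString();
    }

    public static void main(String[] args) throws IOException {
        Properties config = new Properties();
        config.setProperty("screenwidth", "1024");
        config.setProperty("screenheight", "600");
        System.out.print(storeWithColons(config));
    }
}
```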
I was looking into how hard it would be to subclass. It's nearly impossible. :-(
If you look at the source code for Properties, (I'm looking at Java 6) store() calls store0(). Now, unfortunately, store0 is private, not protected, and the "=" is given as a magic constant, not something read from a property. And it calls another private method called saveConvert() that also has a lot of magic constants.
Overall, I rate this code as D- quality. It breaks almost all the rules of good code and good style.
But, it's open source, so, theoretically, you could copy and paste (and improve!) a bunch of code into your own BetterProperties class.

StringBuilders ending with mass nul characters

I'm having a very difficult time debugging a problem with an application I've been building. I cannot seem to reproduce the problem with a representative test program, which makes it difficult to demonstrate. Unfortunately, I cannot share my actual source for security reasons; however, the following test represents fairly well what I am doing: the files and data use Unix-style EOLs, I write to a zip file with a PrintWriter, and I use StringBuilders:
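import java.io.File;
import java.io.FileOutputStream;
import java.io.PrintWriter;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class Tester {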
public static void main(String[] args) {
// variables
File target = new File("TESTSAVE.zip");
PrintWriter printout1;
ZipOutputStream zipStream;
ZipEntry ent1;
StringBuilder testtext1 = new StringBuilder();
StringBuilder replacetext = new StringBuilder();
// ensure file replace
if (target.exists()) {
target.delete();
}
try {
// open the streams
zipStream = new ZipOutputStream(new FileOutputStream(target, true));
printout1 = new PrintWriter(zipStream);
ent1 = new ZipEntry("testfile.txt");
zipStream.putNextEntry(ent1);
// construct the data
for (int i = 0; i < 30; i++) {
testtext1.append("Testing 1 2 3 Many! \n");
}
replacetext.append("Testing 4 5 6 LOTS! \n");
replacetext.append("Testing 4 5 6 LOTS! \n");
// the replace operation
testtext1.replace(21, 42, replacetext.toString());
// write it
printout1 = new PrintWriter(zipStream);
printout1.println(testtext1);
// save it
printout1.flush();
zipStream.closeEntry();
printout1.close();
} catch (Exception e) {
e.printStackTrace();
}
}
}
The heart of the problem is that on my side the app produces a file of 16.3k characters. My friend, whether he uses the app on his PC or looks at exactly the same file as me, sees a file of 19.999k characters, the extra characters being a CRLF followed by a massive number of NUL characters. No matter what application, encoding, or view I use, I cannot see these NUL characters at all; I only see a single LF at the last line, but I do see a file of 20k. In all cases there is a difference between what is seen with the exact same files on the two machines, even though both are Windows machines and both use the same editing software to view them.
I've not yet been able to reproduce this behaviour with any dummy programs. I have, however, been able to trace the final line's stray CRLF to my use of println on the PrintWriter. When I replaced println(s) with print(s + '\n'), the problem appeared to go away (the file size was 16.3k). However, when I returned the program to println(s), the problem did not come back. I'm currently having the files verified by a friend in France to see if the problem really went away (since I cannot see the NULs but he can), but this behaviour has me thoroughly confused.
I've also noticed that StringBuilder's replace method states "This sequence will be lengthened to accommodate the specified String if necessary". Given that StringBuilder's setLength method pads with NUL characters and that ensureCapacity sets the capacity to the greater of the requested capacity and (currentCapacity*2)+2, I suspected a connection somewhere. However, only once while testing this idea have I been able to get a result resembling what I've seen, and I have not been able to reproduce it since.
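A quick way to test that suspicion in isolation: `replace()` grows the builder by exactly the inserted text, capacity growth alone never changes the content, and only `setLength()` with a larger length pads with `'\u0000'`. This sketch just counts NULs after each step; the class and `countNuls` helper are illustrative.

```java
class BuilderPaddingDemo {

    // Counts NUL characters in a sequence.
    static int countNuls(CharSequence s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '\0') n++;
        }
        return n;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("abc");
        sb.replace(1, 2, "XYZ"); // "aXYZc": grows by the inserted text, no padding
        System.out.println(sb.length() + " chars, " + countNuls(sb) + " NULs"); // 5 chars, 0 NULs
        sb.ensureCapacity(1000); // bigger backing array, same content
        System.out.println(sb.length() + " chars, " + countNuls(sb) + " NULs"); // 5 chars, 0 NULs
        sb.setLength(8); // extends the content with NUL characters
        System.out.println(sb.length() + " chars, " + countNuls(sb) + " NULs"); // 8 chars, 3 NULs
    }
}
```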
Does anyone have any idea what could be causing this error or at least have a suggestion on what direction to take the testing?
Edit (since the comments section is broken for me):
Just to clarify, the output is required to be in Unix format regardless of the OS, hence the use of '\n' directly rather than through a formatter. The original StringBuilder that is inserted into is not in fact generated by me but holds the contents of a file read in by the program. I'm satisfied the reading process works, as the information in it is used heavily throughout the application. I've also done a little probing and found that directly prior to saving, the buffer IS the correct capacity and the output of toString() is the correct length (i.e. it contains no NUL characters and is 16,363 long, not 19,999). This would put the cause of the error somewhere between generating the string and saving the zip file.

Finally found the cause. I managed to reproduce the problem a few times and traced it not to the output side of the code but to the input side. My file-reading function was essentially this:
char[] buf;
int charcount = 0;
StringBuilder line = new StringBuilder(2048);
InputStreamReader reader = new InputStreamReader(stream);// provides a line-wise read
BufferedReader file = new BufferedReader(reader);
do { // capture loop
try {
buf = new char[2048];
charcount = file.read(buf, 0, 2048);
} catch (IOException e) {
return null; // unknown IO error
}
line.append(buf);
} while (charcount != -1);
// close and output
The problem was appending a buffer that wasn't full, so the remaining slots still held their initial value, NUL ('\u0000'). The reason I couldn't reproduce it reliably was that some data filled the buffers exactly and some didn't.
Why I couldn't see the problem in my text editors I still have no idea, but I should be able to resolve this now. Any suggestions on the best way to do so are welcome; this is part of one of my long-term utility libraries, and I want to keep it as generic and optimised as possible.
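One way to fix the loop is to append only the `charcount` characters that `read()` actually returned, and to stop before appending once it returns -1 (in the loop above, an untouched 2048-char buffer is also appended once more after the final `read()` returns -1). The sketch below assumes the input is UTF-8; the original `new InputStreamReader(stream)` uses the platform default charset, so pick whichever charset the files really use. The `StreamSlurper` class and `readAll` method are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class StreamSlurper {

    // Reads the whole stream into a String without padding it with unused buffer slots.
    // Closing the reader also closes the stream.
    static String readAll(InputStream stream) throws IOException {
        StringBuilder text = new StringBuilder(2048);
        char[] buf = new char[2048]; // reused; only the filled prefix is ever appended
        // No BufferedReader needed: we already read in 2048-char blocks.
        try (Reader reader = new InputStreamReader(stream, StandardCharsets.UTF_8)) {
            int charcount;
            while ((charcount = reader.read(buf, 0, buf.length)) != -1) {
                text.append(buf, 0, charcount); // NOT text.append(buf)
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "Testing 1 2 3 Many! \n".getBytes(StandardCharsets.UTF_8);
        String s = readAll(new ByteArrayInputStream(data));
        System.out.println(s.length() + " chars, contains NUL: " + (s.indexOf('\0') >= 0));
    }
}
```

Reusing one buffer also avoids allocating a fresh `char[2048]` on every pass.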

struts2 data cut in string send to jsp

I've got this problem again...
So I've got String data in my Struts2 app. This data is quite big, 36 KB, read from an HTML file with this code:
BufferedReader reader = new BufferedReader(new FileReader("FILE.html"));
String readData;
while( (readData = reader.readLine()) != null) {
fileData.append(new String(readData.getBytes(),"UTF-8"));
}
reader.close();
fileData.trimToSize();
this.data2display = fileData.toString();
this.setData2display(this.data2display.replaceAll("\\s+", " "));
I display data2display in my jsp file, with just:
<s:property value="data2display" escape="false" escapeJavaScript="false" />
Aaaaaand... the data is complete while I'm debugging the controller, but when I try to display it in the JSP, I get only part of it. I don't get any error/debug logs.
Any idea how to check/fix this?
My app: (struts2, jsp) everything is from appfuse-basic-struts archetype.

My personal starting point would be the source of PropertyTag, and from there on follow the code.
In this case, start with PropertyTag. You see that it extends ComponentTagSupport, which in turn extends StrutsBodyTagSupport.
This is where it gets interesting: the toString method uses a FastByteArrayOutputStream, which uses a default block size (buffer) of 8192 bytes. Using the default constructor, as StrutsBodyTagSupport does, you can't output a String with more data than that.
Being not an expert on Struts I hesitate to say that's an implementation bug; it should IMHO compute the buffer size from the value to be printed. Unfortunately, it doesn't. So I don't think there's an easy way around it.
The non-easy way is obviously to define a List of String parts, each smaller than 8k bytes, and iterate over that list in the JSP, or just use c:out or something like that.
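If you go the list-of-parts route, here is a sketch of splitting a string into pieces of at most a given number of UTF-8 bytes without cutting a character in half. The 8192-byte figure comes from the answer above; the limit of 8000 in the example is an arbitrary safety margin, and the `Utf8Chunker` class and `splitByUtf8Bytes` helper are hypothetical. The action would expose the returned list and the JSP would iterate over it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class Utf8Chunker {

    // Number of bytes a code point needs in UTF-8.
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }

    // Splits text into parts whose UTF-8 encoding is at most maxBytes long.
    // Works on code points, so surrogate pairs are never split.
    static List<String> splitByUtf8Bytes(String text, int maxBytes) {
        if (maxBytes < 4) throw new IllegalArgumentException("maxBytes must be >= 4");
        List<String> parts = new ArrayList<>();
        int start = 0; // start index of the current part
        int bytes = 0; // UTF-8 size of the current part so far
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int len = utf8Length(cp);
            if (bytes + len > maxBytes) { // current part is full: close it before cp
                parts.add(text.substring(start, i));
                start = i;
                bytes = 0;
            }
            bytes += len;
            i += Character.charCount(cp);
        }
        if (start < text.length()) parts.add(text.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        StringBuilder html = new StringBuilder();
        for (int k = 0; k < 3000; k++) html.append("<p>\u00E9t\u00E9</p>"); // about 36 KB
        for (String part : splitByUtf8Bytes(html.toString(), 8000)) {
            System.out.println(part.getBytes(StandardCharsets.UTF_8).length + " bytes");
        }
    }
}
```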
This may not be the answer you're looking for, but I hope this will at least help you understand the trouble you're in.
