Charset conversion Java - java

Hi
I have a scenario where the default charset should be overridden by UTF-8. I am using the class below, but I am not getting the expected output. I compare the results against a Unix system whose default charset is UTF-8. Am I wrong somewhere in this program?
public class CharsetDisplay {
public static void main(String[] args) {
System.out.println(Charset.defaultCharset().name());
System.out.println(Charset.isSupported("UTF-8"));
final Charset UTF8_CHARSET = Charset.forName("UTF-8");
try {
byte[] byteArray = new byte[] {34,34,0};
String str = new String(byteArray,UTF8_CHARSET);
System.out.println("String*** "+str);
System.out.println("String to Hex *** "+stringToHex(str));
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
Prints output as
windows-1252
true
String*** ""
Note that after "" in the string output there is a special character, which I don't get in a Unix environment.

What do you expect the zero byte to render as in this environment? Your output looks exactly correct to me.
Don't forget that any differences that you encounter between environments might not be down to Java. If you're invoking your Java program from a console (which I expect you are), it's up to the console to actually convert the program's output to what you see on the screen. So depending on the charset the console is using, it's entirely possible for Java to output the characters that you expect, but for the console to fail to render them properly.
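One way to separate what Java produced from what the console rendered is to print code points instead of characters. The following is only a sketch: the class name NulByteDemo and the helper toCodePoints are made up for illustration, and ISO-8859-1 stands in for windows-1252 because every JVM is required to support it (both are ASCII-compatible, so they agree on these three bytes).

```java
import java.nio.charset.StandardCharsets;

public class NulByteDemo {
    // Hypothetical helper: renders each char of the string as U+XXXX.
    static String toCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two double quotes followed by a zero (NUL) byte, as in the question.
        byte[] bytes = {34, 34, 0};
        String utf8 = new String(bytes, StandardCharsets.UTF_8);
        String latin1 = new String(bytes, StandardCharsets.ISO_8859_1);

        // Both lines print the same code points; how U+0000 looks on
        // screen is decided by the console, not by Java.
        System.out.println(toCodePoints(utf8));
        System.out.println(toCodePoints(latin1));
    }
}
```

If both lines match, the decoded string is the same in either charset and the visible difference comes from the terminal.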

If Java doesn't pick up your locale's encoding properly you may have to tell it explicitly, at the command-line:
java -Dfile.encoding=utf-8 CharsetDisplay

Related

Reading and writing file in ISO-8859-1 encoding?

I have a file encoded in ISO-8859-1. I'm trying to read it in as a single String, do some regex substitutions on it, and write it back out in the same encoding.
However, the resulting file I get always seems to be UTF-8 (according to Notepad++ at least), mangling some characters.
Can anyone see what I'm doing wrong here?
private static void editFile(File source, File target) {
// Source and target encoding
Charset iso88591charset = Charset.forName("ISO-8859-1");
// Read the file as a single string
String fileContent = null;
try (Scanner scanner = new Scanner(source, iso88591charset)) {
fileContent = scanner.useDelimiter("\\Z").next();
} catch (IOException exception) {
LOGGER.error("Could not read input file as a single String.", exception);
return;
}
// Do some regex substitutions on the fileContent string
String newContent = regex(fileContent);
// Write the file back out in target encoding
try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(target), iso88591charset))) {
writer.write(newContent);
} catch (Exception exception) {
LOGGER.error("Could not write out edited file!", exception);
}
}
There is nothing actually wrong with your code. Notepad++ reports the file as UTF-8 because ISO-8859-1 and UTF-8 encode every ASCII character (0x00 to 0x7F) as the same single byte, so a file containing only those characters is byte-for-byte identical in both encodings. Only characters outside that range are encoded differently, and many characters that UTF-8 can represent are missing from ISO-8859-1 altogether. You can read more by searching for ISO-8859-1 vs UTF-8.
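A small sketch of that overlap, with a made-up class name (EncodingCompare) and helper (sameBytes): ASCII-only text yields identical bytes in both encodings, while a character such as é (written as \u00E9 to keep the source file ASCII) does not.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingCompare {
    // Hypothetical helper: true when the text encodes to the same bytes
    // in ISO-8859-1 and in UTF-8.
    static boolean sameBytes(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(latin1, utf8);
    }

    public static void main(String[] args) {
        // ASCII-only content cannot be told apart by an editor.
        System.out.println(sameBytes("plain ASCII text"));
        // A non-ASCII character takes one byte in ISO-8859-1
        // and two bytes in UTF-8.
        System.out.println(sameBytes("caf\u00E9"));
        System.out.println(Arrays.toString("\u00E9".getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println(Arrays.toString("\u00E9".getBytes(StandardCharsets.UTF_8)));
    }
}
```

An editor that guesses the encoding from the bytes has nothing to go on unless the file contains at least one such character.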
I've created a simple project with your code and tested it with characters that are different for the ISO encoding - the result is a file that IntelliJ (and probably Notepad++ as well - cannot easily check, I'm on Linux) recognizes as ISO-8859-1. Apart from that, I've added another class that makes use of new (JDK11) features from Files class. The new Scanner(source, charset) that you've used was added in JDK10, so I think that you may be using 11 already. Here's the simplified code:
private static void editFile(File source, File target) {
Charset charset = StandardCharsets.ISO_8859_1;
String fileContent;
try {
fileContent = Files.readString(source.toPath(), charset);
} catch (IOException exception) {
System.err.println("Could not read input file as a single String.");
exception.printStackTrace();
return;
}
String newContent = regex(fileContent);
try {
Files.writeString(target.toPath(), newContent, charset);
} catch (IOException exception) {
System.err.println("Could not write out edited file!");
exception.printStackTrace();
}
}
Feel free to clone the repository or check it on GitHub and use whichever code version you prefer.

How to use OutputStream to write to a String [duplicate]

What's the best way to pipe the output from a java.io.OutputStream to a String in Java?
Say I have the method:
writeToStream(Object o, OutputStream out)
Which writes certain data from the object to the given stream. However, I want to get this output into a String as easily as possible.
I'm considering writing a class like this (untested):
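class StringOutputStream extends OutputStream {
// The buffer must be initialized, otherwise write() throws a NullPointerException
private final StringBuilder mBuf = new StringBuilder();
@Override
public void write(int b) throws IOException {
// "byte" is a reserved word, so the parameter is named b.
// The cast is only correct for single-byte characters.
mBuf.append((char) b);
}
public String getString() {
return mBuf.toString();
}
}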
But is there a better way? I only want to run a test!
I would use a ByteArrayOutputStream. And on finish you can call:
new String( baos.toByteArray(), codepage );
or better:
baos.toString( codepage );
For the String constructor, the codepage can be a String or an instance of java.nio.charset.Charset. A possible value is java.nio.charset.StandardCharsets.UTF_8.
The method toString() accepts only a String as a codepage parameter (as of Java 8); an overload that takes a Charset was added in Java 10.
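Put together, the approach might look like the sketch below. The class StreamToString, the helper capture, and this writeToStream body are stand-ins invented for illustration; only ByteArrayOutputStream and its toString(String) method come from the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamToString {
    // Hypothetical stand-in for the question's writeToStream(Object, OutputStream).
    static void writeToStream(Object o, OutputStream out) throws IOException {
        out.write(String.valueOf(o).getBytes(StandardCharsets.UTF_8));
    }

    // Collects everything written to the stream and decodes it as UTF-8.
    static String capture(Object o) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try {
            writeToStream(o, baos);
            // toString(String) can throw UnsupportedEncodingException,
            // which is a subclass of IOException.
            return baos.toString("UTF-8");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Multi-byte characters survive because decoding happens once,
        // on the complete byte array.
        System.out.println(capture("na\u00EFve testString"));
    }
}
```

Decoding the whole buffer at the end avoids the per-byte cast to char, which breaks any character that needs more than one byte.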
I like the Apache Commons IO library. Take a look at its version of ByteArrayOutputStream, which has a toString(String enc) method as well as toByteArray(). Using existing and trusted components like the Commons project lets your code be smaller and easier to extend and repurpose.
This worked nicely
OutputStream output = new OutputStream() {
private StringBuilder string = new StringBuilder();
@Override
public void write(int b) throws IOException {
this.string.append((char) b );
}
//Netbeans IDE automatically overrides this toString()
public String toString() {
return this.string.toString();
}
};
method call =>> marshaller.marshal( (Object) toWrite , (OutputStream) output);
then to print the string or get it just reference the "output" stream itself
As an example, to print the string out to console =>> System.out.println(output);
FYI: my method call marshaller.marshal(Object,Outputstream) is for working with XML. It is irrelevant to this topic.
This is highly wasteful for production use: there are far too many conversions, and it is a bit loose. It was just coded to prove that it is entirely possible to create a custom OutputStream and output a string. But just go Horcrux7's way and all is good with merely two method calls.
And the world lives on another day....
Here's what I ended up doing:
Obj.writeToStream(toWrite, os);
try {
String out = new String(os.toByteArray(), "UTF-8");
assertTrue(out.contains("testString"));
} catch (UnsupportedEncodingException e) {
fail("Caught exception: " + e.getMessage());
}
Where os is a ByteArrayOutputStream.
baos.toString(StandardCharsets.UTF_8);
Converts the buffer's contents into a string by decoding the bytes using the named charset.
Java 17 - https://docs.oracle.com/
Here's what I did (don't use this in production, this is not great! But it makes fixing multiple errors easier.)
Create a list that holds Exceptions.
Create a logger to log exceptions.
Use the code below:
private static void exceptionChecker() throws Exception {
if(exceptionList.isEmpty()) return; //nothing to do :) great news
//create lock for multithreading
synchronized (System.err){
//create new error stream
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
PrintStream errorOut = new PrintStream(byteArrayOutputStream);
//save standard err out
PrintStream standardErrOut = System.err;
try{
//set new error stream
System.setErr(errorOut);
exceptionList.forEach(exception -> {
exception.printStackTrace();
System.err.println("<---------->");
});
} finally {
//reset everything back to normal
System.setErr(standardErrOut);
//Log all the exceptions
exceptionLogger.warning(byteArrayOutputStream.toString());
//throw final generic exception
throw new Exception();
}
}}
This isn't great as you are throwing an error in the finally block and it locks on the error stream, but it works for dev purposes.

Java Problems encoding UTF8

I think the easiest way to explain my problem is with a little example:
My string at the beginning is: PÃ¢tes, and the result should be: Pâtes. What I get as result is still PÃ¢tes. How can I fix this?
Here is the code:
private String encode(String string) {
try {
byte ptext[] =string.getBytes("UTF8");
string = new String(ptext, "UTF8");
}
catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
return string;
}
There are two things to note about your code. The first is minor: you're using UTF8, which Java accepts as an alias, but the canonical name of the character set is UTF-8.
The second is that you're essentially performing a no-op. Calling byte ptext[] = string.getBytes("UTF-8"); encodes the string as UTF-8 bytes, and new String(ptext, "UTF-8") then decodes those same bytes as UTF-8, which gives you back the string you started with.
What I think you mean is that the input is ISO-8859-1 and you want to convert it to UTF-8. (This fits with the example input and output you've given).
Try:
private String encode(String string) {
try {
byte ptext[] = string.getBytes("ISO-8859-1");
string = new String(ptext, "UTF-8");
}
catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
return string;
}
This assumes that your initial string was originally read from somewhere and only contains ISO-8859-1 characters. As mentioned in a comment you should try to ensure the data is loaded in correctly from the source (i.e. when it is still just an array of bytes).
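The repair can be checked in isolation with a short sketch. It assumes the garbled input is UTF-8 text that was mis-decoded as ISO-8859-1; the class name MojibakeRepair and the method repair are hypothetical, and the strings use Unicode escapes so the source file stays ASCII.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {
    // Re-encodes the garbled text as ISO-8859-1 to recover the original
    // bytes, then decodes those bytes as UTF-8.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "P\u00C3\u00A2tes" is the mis-decoded form: the two UTF-8 bytes
        // C3 A2 of the letter a-circumflex shown as two separate characters.
        String garbled = "P\u00C3\u00A2tes";
        String fixed = repair(garbled);
        // "P\u00E2tes" is the intended word with a single a-circumflex.
        System.out.println(fixed.equals("P\u00E2tes"));
    }
}
```

Using the StandardCharsets constants instead of charset names also removes the need to catch UnsupportedEncodingException.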

Android-Writing to a file results in random characters

Argargarg.
I am trying to get information from a user input, then to write it to a system file. I get the input, and I call getBytes on it. It logs to the file something along the lines of "null" and random numbers after that. I tried getting it to a string, no luck there, it was a random chain of symbols
Here is the specific code:
TextView note_input=(TextView) findViewById(R.id.note_input);
FileOutputStream fos=null;
String newNote=note_input.getText().toString();
Log.w("Debug",newNote);
try {
fos=openFileOutput("currentNote",Context.MODE_APPEND);
} catch (FileNotFoundException e) {
//IT_SHOUD_NOT_EXIST
}
try {
Log.w("Debug",newNote.getBytes().toString());
fos.write(newNote.getBytes());
fos.close();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
I appreciate any help!
String.getBytes returns an array of bytes, and calling toString() on an array does not give you its contents; it gives you the array's type and identity hash code (something like [B@1a2b3c). You already have the String, so change this line
Log.w("Debug",newNote.getBytes().toString());
into
Log.w("Debug",newNote);
and you will have proper log output; the file should already be written properly.
Hope this helps and enjoy your work
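The difference between the three ways of turning a byte array into text can be seen in a short sketch (plain Java rather than Android code; the class name ByteArrayToString and the helper decode are made up for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteArrayToString {
    // Hypothetical helper: decodes the bytes back into text using UTF-8.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String note = "hello";
        // Naming the charset explicitly avoids depending on the platform default.
        byte[] bytes = note.getBytes(StandardCharsets.UTF_8);

        // Type tag plus identity hash; starts with "[B@" and varies per run.
        System.out.println(bytes.toString());
        // The numeric byte values, useful for debugging.
        System.out.println(Arrays.toString(bytes));
        // The original text.
        System.out.println(decode(bytes));
    }
}
```

Only the last form recovers the text; the first one is what ended up in the log.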
Just a shot in the dark, but I notice you're calling getBytes() without specifying the character encoding. Unless your output file is the same character encoding as the system default encoding, you could easily get gibberish on the output.

java.util.properties getProperty method returns null for UTF-8 string

I have a problem getting localized value (a UTF-8 string) from the properties file.
Here is what I did:
//properties file containing
//name=हिन्दी
public static void main(String[] args)
{
File f = new File("path\\to\\properties\\file");
Properties p = new Properties();
try
{
p.load(new InputStreamReader(new FileInputStream(f),Charset.forName("UTF-8")));
}
catch(IOException iox) {}
System.out.println(p.getProperty("name")); //outputs null
System.out.println(p.toString()); //outputs { name= हिन्दी }
}
Thanks
If the last line actually outputs name= हिन्दी then it looks like there's something wrong with the key String in the properties file; perhaps it includes some invisible whitespace character, or one of the characters of "name" is not the regular Latin letter but something that looks like it. Take a look at the properties file in a hex editor.
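As an illustration of how an invisible character can hide a key, the sketch below loads properties from a string whose first character is U+FEFF (a byte order mark). That particular character is only an assumption about what might be in the file, and the class name InvisibleKeyDemo and the helper load are made up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class InvisibleKeyDemo {
    // Hypothetical helper: loads properties from in-memory text.
    static Properties load(String content) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        // Assumption: the content starts with U+FEFF, which Properties
        // does not treat as whitespace, so it becomes part of the key.
        Properties p = load("\uFEFFname=\u0939\u093F\u0928\u094D\u0926\u0940\n");

        // The lookup with the plain key finds nothing.
        System.out.println(p.getProperty("name"));
        // Printing the key length reveals the hidden extra character.
        for (String key : p.stringPropertyNames()) {
            System.out.println(key.length() + " chars, first is U+"
                    + String.format("%04X", (int) key.charAt(0)));
        }
    }
}
```

Dumping the key's length or code points like this is a quick alternative to opening the file in a hex editor.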
Your problem is here:
catch(IOException iox) {}
This is called swallowing the exception. Never do this. The code that loads the properties from the file may fail and throw an exception, and you would not even know about it. Add iox.printStackTrace() and move the code that prints the property values into the try block. See the exception and fix the bug. I believe that your file path is wrong, so you will get a FileNotFoundException, or perhaps you do not have access rights to this file.
