java encoding issue while reading stream - java

I am trying to download the contents of an FTP folder. One of the files is an XML file that starts with the standard XML declaration:
< ?xml version="1.0" encoding="utf-8"?>
When I read this file (using java.net.Socket), get the input stream, and then convert it to a String, some extra characters appear. The whole XML document starts with '?', e.g. "?< ?xml version="1.0" encoding="utf-8"?>....."
BufferedInputStream reader = new BufferedInputStream(sock.getInputStream());
Then I build a string from this reader using the following code:
StringBuilder sb = new StringBuilder();
String line;
BufferedReader br = new BufferedReader(new InputStreamReader(reader));
while ((line = br.readLine()) != null) {
sb.append(line);
}
System.out.println(sb.toString());
I am not sure what is happening here. Why are some special characters being introduced? Any suggestions would be appreciated.
I then used the following code to read the downloaded file, and in the console I again see some special characters:
BufferedReader reader = new BufferedReader(new FileReader("c:/Users/appd922/DocumentMeta06122014.xml"));
StringBuffer sb = new StringBuffer();
String line = null;
while ((line = reader.readLine()) != null) {
sb.append(line);
}
String output = sb.toString();
System.out.println("reading from file"+output);
The output starts with:
"reading from file< ?xml version.....
Where are these special characters coming from?
Note: ignore the space in the XML declaration above. I could not write it here as proper XML without that space.

Specify the encoding when creating InputStreamReader to read the file from the ftp, for example:
BufferedReader br = new BufferedReader(new InputStreamReader(reader, "utf-8"));
Otherwise, InputStreamReader uses default encoding. Also, specify the encoding when reading the downloaded file. FileReader uses default platform encoding. Use InputStreamReader and specify encoding, for example:
BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(filePath), "utf-8"));

Those characters are a BOM (byte order mark). If you set the encoding of the InputStreamReader to 'UTF-8', you will see that they are interpreted as a single character, the BOM character (U+FEFF).
Unfortunately, you have to handle this character yourself, because Java won't do it for you (see "java utf-8 and bom"). Usually you just strip it from the start of your stream. Good luck.
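A minimal sketch of stripping the BOM by hand, assuming the stream really is UTF-8. The class name BomSkipper and the helper names openUtf8 and firstLine are made up for illustration; they are not part of any library.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: not from the original question or answers.
class BomSkipper {
    // Opens a UTF-8 reader and discards a leading U+FEFF if one is present.
    static BufferedReader openUtf8(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8));
        reader.mark(1);                  // remember the start of the stream
        if (reader.read() != '\uFEFF') {
            reader.reset();              // no BOM: rewind so nothing is lost
        }
        return reader;
    }

    // Convenience method for the demo: returns the first line without the BOM.
    static String firstLine(byte[] data) throws IOException {
        try (BufferedReader reader = openUtf8(new ByteArrayInputStream(data))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF}; // UTF-8 encoded BOM
        byte[] xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] data = new byte[bom.length + xml.length];
        System.arraycopy(bom, 0, data, 0, bom.length);
        System.arraycopy(xml, 0, data, bom.length, xml.length);
        System.out.println(firstLine(data));
    }
}
```

The same openUtf8 call could wrap sock.getInputStream() or a FileInputStream in place of the in-memory array used here.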

Related

How to correctly change the encoding of a POST request?

When I send a POST to my page without setCharacterEncoding on the server side, I get фыв. With setCharacterEncoding(UTF-8), I get ыва. How do I correctly change the character encoding of a POST request?
P.S.: I read data from ServletInputStream.
Code below.
doPost
req.setCharacterEncoding("UTF-8");
BufferedReader r = new BufferedReader(new InputStreamReader(req.getInputStream()));
String line;
while ((line = r.readLine()) != null) {
System.out.println(line);
}
BufferedReader r = new BufferedReader(
new InputStreamReader(req.getInputStream(), StandardCharsets.UTF_8));
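With getInputStream you have binary data without an encoding. Hence the binary-to-text bridging class InputStreamReader needs the correct encoding. Otherwise it uses the system default, System.getProperty("file.encoding"). Note that req.setCharacterEncoding only affects request parameters and req.getReader(), not the raw stream returned by getInputStream.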
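To see why the charset matters, here is a small sketch that decodes the same bytes twice. It assumes the request body was sent as UTF-8; the class name CharsetMismatch and the decode helper are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: not from the original question or answer.
class CharsetMismatch {
    // Turns raw bytes into text using the given charset.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // Three Cyrillic letters, written as escapes so the source file
        // encoding cannot interfere. In UTF-8 each one takes two bytes.
        byte[] body = "\u0444\u044B\u0432".getBytes(StandardCharsets.UTF_8);

        String right = decode(body, StandardCharsets.UTF_8);
        String wrong = decode(body, StandardCharsets.ISO_8859_1);

        System.out.println(right.length()); // 3: the original characters
        System.out.println(wrong.length()); // 6: one garbage char per byte
    }
}
```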

Can someone explain me how this code works

I got it from this page:
Android AsyncTask method that I dont know how to solve
but I am not sure how it works completely. Can someone explain what the while loop is for, and what the "iso-8859-1" part does?
I understood that the 8 is the number of characters, but I could be wrong.
static InputStream is = null;
static String json = "";
is = httpEntity.getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(
is, "iso-8859-1"), 8);
StringBuilder sb = new StringBuilder();
String line = null;
while ((line = reader.readLine()) != null) {
sb.append(line + "\n");
}
is.close();
json = sb.toString();
Your code basically reads from an InputStream obtained from the HttpEntity, puts the content into a StringBuilder, and finally converts that into a JSON string.
For understanding the API, the Javadoc is your friend.
Here is what I found in the BufferedReader Javadoc:
public BufferedReader(Reader in, int sz)
Creates a buffering character-input stream that uses an input buffer of the specified size.
Parameters: in - A Reader; sz - Input-buffer size
Throws: IllegalArgumentException - If sz is <= 0
http://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html
As the Reader, an InputStreamReader is used in your code. Here is the relevant Javadoc for InputStreamReader:
public InputStreamReader(InputStream in, Charset cs)
Creates an InputStreamReader that uses the given charset.
Parameters: in - An InputStream; cs - A charset
http://docs.oracle.com/javase/7/docs/api/java/io/InputStreamReader.html#InputStreamReader(java.io.InputStream, java.nio.charset.Charset)
So "iso-8859-1" is the charset specified (your code passes it by name, which uses the InputStreamReader(InputStream, String) overload), and the 8 is the input-buffer size in characters.
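As for the while loop: readLine() returns one line at a time without its line terminator, and returns null at the end of the stream, which is why the loop appends "\n" itself and stops on null. A rough sketch of the same loop, with an in-memory stream standing in for httpEntity.getContent() (the class name ReadAllLines is made up):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: mirrors the loop from the question.
class ReadAllLines {
    static String readAll(InputStream is) throws IOException {
        StringBuilder sb = new StringBuilder();
        // 8 is the buffer size in characters; it only affects how much is
        // fetched per read, not how long a line may be.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.ISO_8859_1), 8)) {
            String line;
            // readLine() strips the line terminator and returns null at EOF.
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n'); // put a newline back
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "{\"a\":1,\r\n\"b\":2}".getBytes(StandardCharsets.ISO_8859_1);
        System.out.print(readAll(new ByteArrayInputStream(body)));
    }
}
```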

inputStreamReader won't recognise type JFileChooser

I have a variable, inFileName of type JFileChooser.
I pass this variable to the method HexFinder in class checksumFinder, where it is to be used in the InputStreamReader inside a BufferedReader. (I'm using this line to call it)
cf.HexFinder(inFileName,null,null,null);
This is causing an error as the inputStreamReader will only accept variables of type String. (Here's my code for the BufferedReader)
BufferedReader reader = new BufferedReader(new InputStreamReader(this.getClass().getResourceAsStream(inFileName)));
Is there a way that I can get the inputStreamReader to read in inFileName? If not then how can I resolve this?
Any help is much appreciated.
If you are trying to read a file chosen by a JFileChooser then you can do the following:
File file = inFileName.getSelectedFile();
BufferedReader reader = new BufferedReader(new FileReader(file));
Note that FileReader uses the default character encoding. You can manually specify the encoding like this:
String charset = "UTF-8";
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file), charset));

Getting only a part of text with InputStreamReader

I am reading from InputStreamReader but I only get the first 10,000 characters of the text that is supposed to come. Any idea what the problem may be? If there is no solution with this class, what are my alternatives?
I found this about InputStreamReader: "The buffer size is 8K." (http://developer.android.com/reference/java/io/InputStreamReader.html). Could this be the answer?
Any pointers really appreciated
StringBuilder sb = new StringBuilder();
BufferedReader br = new BufferedReader(new InputStreamReader(
httpcon.getInputStream(),"utf-8"));
String line = null;
while ((line = br.readLine()) != null) {
sb.append(line);
}
br.close();
result = sb.toString();
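An 8K buffer would mean 8192 bytes, which does not match the 10,000 characters you get. In any case, the buffer size only determines how much is read from the underlying stream at a time; it does not limit the total amount you can read. Also, with UTF-8 one character is not always one byte. So the buffer is unlikely to be the cause of the truncation.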
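One way to check that the buffer is not the limit is to push more than 8K of multi-byte text through the same kind of loop. This is only a sketch with an in-memory stream in place of httpcon.getInputStream(); the class name BufferSizeDemo is hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: same read loop as in the question.
class BufferSizeDemo {
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // 20,000 copies of U+00E9, which takes two bytes in UTF-8.
        char[] chars = new char[20000];
        Arrays.fill(chars, '\u00E9');
        byte[] bytes = new String(chars).getBytes(StandardCharsets.UTF_8);

        String result = readAll(new ByteArrayInputStream(bytes));
        System.out.println(bytes.length);    // 40000 bytes went in
        System.out.println(result.length()); // 20000 characters came out
    }
}
```

If a loop like this reads everything from an in-memory stream but not from the connection, the truncation is more likely to come from the server or the connection than from the reader classes.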

How to read and write UTF-8 to disk on the Android?

I cannot read and write extended characters (French accented characters, for example) to a text file using the standard InputStreamReader methods shown in the Android API examples. When I read back the file using:
InputStreamReader tmp = new InputStreamReader(in);
BufferedReader reader = new BufferedReader(tmp);
String str;
while ((str = reader.readLine()) != null) {
...
the string read is truncated at the extended characters instead of at the end-of-line. The second half of the string then comes on the next line. I'm assuming that I need to persist my data as UTF-8 but I cannot find any examples of that, and I'm new to Java.
Can anyone provide me with an example or a link to relevant documentation?
Very simple and straightforward. :)
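String filePath = "/sdcard/utf8_file.txt";
String UTF8 = "UTF-8"; // canonical charset name
int BUFFER_SIZE = 8192;
// Reading: decode the bytes of the file as UTF-8.
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(filePath), UTF8),BUFFER_SIZE);
// Writing: encode text as UTF-8. Opening the FileOutputStream truncates the file,
// so do not keep a reader and a writer open on the same path at the same time.
BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(filePath), UTF8),BUFFER_SIZE);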
When you instantiate the InputStreamReader, use the constructor that takes a character set.
InputStreamReader tmp = new InputStreamReader(in, "UTF-8");
Do the same with OutputStreamWriter when writing the file.
I like to have a
public static final Charset UTF8 = Charset.forName("UTF-8");
in some utility class in my code, so that I can call (see more in the Doc)
InputStreamReader tmp = new InputStreamReader(in, MyUtils.UTF8);
and not have to handle UnsupportedEncodingException every single time.
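Putting the reader and writer together, a round trip could look roughly like this. It uses StandardCharsets.UTF_8 from the JDK instead of a hand-rolled constant, and a temporary file instead of a real path; the class and method names are made up for the sketch.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: writes accented text as UTF-8 and reads it back.
class Utf8RoundTrip {
    static void write(File file, String text) throws IOException {
        try (BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(file), StandardCharsets.UTF_8))) {
            bw.write(text);
        }
    }

    static String readFirstLine(File file) throws IOException {
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                new FileInputStream(file), StandardCharsets.UTF_8))) {
            return br.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("utf8_demo", ".txt");
        try {
            String text = "caf\u00E9 cr\u00E8me"; // accented characters as escapes
            write(file, text);
            System.out.println(text.equals(readFirstLine(file)));
        } finally {
            file.delete();
        }
    }
}
```

The key point is that the same charset is named on both sides; the writer is closed before the reader opens the file.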
This should just work on Android, even without explicitly specifying UTF-8, because the default charset is UTF-8. If you can reproduce this problem, please raise a bug with a reproducible test case here:
http://code.google.com/p/android/issues/entry
If you face this kind of problem, try encoding and decoding your data as Base64. This worked for me. I can share the code if you need it.
Check the encoding of your file by right clicking it in the Project Explorer and selecting properties. If it's not the right encoding you'll need to re-enter your special characters after you change it, or at least that was my experience.
