Is there a way to send a text file from client to server using an XXXWriter and XXXReader instead of sending bytes?
Any suggestions?
You can wrap the InputStream in an InputStreamReader, and the OutputStream in an OutputStreamWriter. These classes bridge binary data (byte[], *Stream) from/to Java's Unicode text (String, char, *Reader, *Writer). Use the constructor that takes an explicit encoding.
Charset encoding = StandardCharsets.UTF_8;        // preferred: a Charset constant
// or, by name: String encoding = "Windows-1252";
... new InputStreamReader(inputStream, encoding);
This, however, assumes that the stream transfer itself is done correctly. Possible errors are:
forgetting to close the streams, so not all data is flushed and transferred;
using available(), which is not needed for a simple copy and does not report the total remaining data;
reading into a buffer but writing the whole buffer instead of the actual number of bytes read, which appends stale data at the end.
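A minimal copy loop that avoids those pitfalls might look like this (a sketch; the stream variables are placeholders):
byte[] buffer = new byte[8192];
int length;
// write only the 'length' bytes actually read, never the whole buffer
while ((length = inputStream.read(buffer)) != -1) {
    outputStream.write(buffer, 0, length);
}
outputStream.close(); // closing also flushes; in real code use finally or try-with-resources
inputStream.close();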
Related
I use forbiddenapis to check my code. It gives an error:
[forbiddenapis] Forbidden class/interface use: java.lang.String#<init>(byte[])
[forbiddenapis] in org.a.b.MyObject (MyObject.java:14)
Which points to:
String finalString = new String(((ByteArrayOutputStream) out).toByteArray());
How can I resolve it? I know that I can set a Charset, e.g.:
Charset.forName("UTF-8").encode(myString);
However, since raw bytes are involved here, which charset should I use to avoid problems with different characters?
You'll need insight into the charset with which the bytes were encoded in the first place. If you're confident it'd always be UTF-8, you could just use the String constructor:
new String(bytes, StandardCharsets.UTF_8)
Do not use FileReader. It is an old utility class that reads files in the default platform encoding, which is not suited for portable files: the resulting code is unportable.
String / Reader / Writer hold Unicode text. When converting from/to byte[] / InputStream / OutputStream, one needs to indicate the encoding of that binary data.
String s = new String(bytes, charset);
byte[] bytes = s.getBytes(charset);
The forbiddenapis message (the tool also forbids FileReader) complains about the constructor
new String(bytes);
because it uses the default platform encoding, as would:
string.getBytes();
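Applied to the flagged line, the fix is to pass the charset explicitly (assuming the bytes were written as UTF-8):
String finalString = new String(((ByteArrayOutputStream) out).toByteArray(), StandardCharsets.UTF_8);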
I have a sample method which copies one file to another using InputStream and OutputStream. In this case, the source file is encoded in UTF-8. Even if I don't specify the encoding while writing to disk, the destination file has the correct encoding. But if I have to write a java.lang.String to a file, I need to specify the encoding. Why is that?
public static void copyFile() {
    String sourceFilePath = "C://my_encoded.txt";
    InputStream inStream = null;
    OutputStream outStream = null;
    try {
        String targetFilePath = "C://my_target.txt";
        File sourceFile = new File(sourceFilePath);
        outStream = new FileOutputStream(targetFilePath);
        inStream = new FileInputStream(sourceFile);
        byte[] buffer = new byte[1024];
        int length;
        // copy the file content in bytes
        while ((length = inStream.read(buffer)) > 0) {
            outStream.write(buffer, 0, length);
        }
        inStream.close();
        outStream.close();
        System.out.println("File " + targetFilePath + " copied successfully!");
    } catch (IOException e) {
        e.printStackTrace();
    }
}
My guess is that since the source file has the correct encoding, and since we read and write one byte at a time, it works fine. And java.lang.String is UTF-16 by default, and if we write it to the file, it reads one byte at a time instead of two bytes, hence garbage values. Is that correct, or am I completely wrong in my understanding?
You are copying the file byte by byte, so you don't need to care about character encoding.
As a rule of thumb:
Use the various InputStream and OutputStream implementations for byte-wise processing (like file copy).
There are some convenience methods to handle text directly, like PrintStream.println(). Be careful, because most of them use the platform-specific default encoding.
Use the various Reader and Writer implementations for reading and writing text.
If you need to convert between byte-wise and text processing use InputStreamReader and OutputStreamWriter with explicit file encoding.
Do not rely on the default encoding. The default character encoding is platform specific (e.g. Windows-ANSI aka Cp1252 for Windows, usually UTF-8 on Linux).
Example: If you need to read a UTF-8 text file:
BufferedReader reader =
new BufferedReader(new InputStreamReader(new FileInputStream(inFile), "UTF-8"));
Avoid using a FileReader, because a FileReader always uses the default encoding.
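The writing side is symmetric; a minimal sketch (the file name and text variables are placeholders):
Writer writer =
    new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outFile), "UTF-8"));
writer.write(text); // encoded as UTF-8 regardless of the platform default
writer.close();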
A special case: If you need random access to a file you should use RandomAccessFile. With it you can read and write data blocks at arbitrary positions. You can read and write raw byte blocks or you can use convenience methods to read and write text. But you should read the documentation carefully. E.g. the methods readUTF() and writeUTF() use a modified UTF-8 encoding.
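For illustration, a minimal RandomAccessFile sketch (the file name and offset are made up):
try (RandomAccessFile raf = new RandomAccessFile("data.bin", "r")) {
    raf.seek(100);               // jump to an arbitrary position
    byte[] block = new byte[16]; // read a raw byte block from there
    raf.readFully(block);
}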
InputStream, OutputStream, Reader, Writer and RandomAccessFile form the basic IO functionality, enough for most use cases. For advanced IO (e.g. memory mapped files, ...) have a look at package java.nio.
Just read your code! (For the copy part at least ;-) )
When you copy the file, you copy it byte by byte; there is thus no conversion to String.
When you write a String into a file, you need to convert it (sometimes indirectly) into an array of bytes (byte[]). There you need to specify your encoding.
When you read a file to get a String, you need to know its encoding in order to do it properly. Java doesn't 'skip' any bytes, but you need to make the conversion once again: from a byte[] to a String.
Is a PrintStream appropriate for sending image files through a socket? I'm currently doing a homework assignment where I have to write a web proxy from scratch using basic sockets.
When I configure Firefox to use my proxy, everything works fine except that images don't download. If I go to an image file directly, Firefox comes back with the error: The image cannot be displayed because it contains errors
Here is my code for sending the response from the server back to the client (firefox):
BufferedReader serverResponse = new BufferedReader(new InputStreamReader(webServer.getInputStream()));
String responseLine;
while((responseLine = serverResponse.readLine()) != null)
{
serverOutput.println(responseLine);
}
In the code above serverOutput is a PrintStream object. I am wondering if somehow the PrintStream is corrupting the data?
No, it is never appropriate to treat bytes as text unless you know they are text.
Specifically, the InputStreamReader will try to decode your image (which can be treated as a byte array) to a String. Then your PrintStream will try to encode the String back to a byte array.
There is no guarantee that this will produce the original byte array. You might even get an exception, depending on what encoding Java decides to use, if some of the image bytes aren't valid encoded characters.
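The fix is to skip the Reader/PrintStream text conversion and forward the raw bytes (a sketch; serverOutput is the question's PrintStream, whose write(byte[], int, int) passes bytes through without charset conversion):
InputStream serverResponse = webServer.getInputStream();
byte[] buffer = new byte[4096];
int length;
// forward the response verbatim; no decoding/encoding of the image bytes takes place
while ((length = serverResponse.read(buffer)) != -1) {
    serverOutput.write(buffer, 0, length);
}
serverOutput.flush();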
As the title says ...
I read the content from an HTTP response:
InputStream is = response.getEntity().getContent();
String cw = IOUtils.toString(is);
byte[] b = cw.getBytes("Cp1250");
String x = StringUtils.newStringUtf8(b);
String content = new String(b, "UTF-8");
System.out.println(content);
I have tried plenty of variations. I am a little confused about which encoding names are correct when passed as strings: windows-1250 or Cp1250? UTF-8 or utf-8 or utf8?
You seem to think that a String object has an encoding. That's not correct. An encoding is used as part of the translation from binary data (a byte[] or InputStream) to text data (a String or char[] etc).
It's not clear what IOUtils.toString is doing, but it's almost certainly losing data or at least handling it inappropriately. If your data is originally in Windows-1250, then you should use an InputStreamReader wrapping the InputStream, specifying the charset in the InputStreamReader constructor call.
It's not clear where UTF-8 comes in - you might want to write out the data in UTF-8 afterwards, but the result of that would be byte[], not a string.
You're converting backwards. You need to get the input data as a byte array and then use new String(byteArray, "Cp1250") to create the String object. Then, if you want UTF-8, use String.getBytes("UTF-8").
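Put together, a minimal sketch (assuming Apache Commons IO is on the classpath):
byte[] raw = IOUtils.toByteArray(is);     // read the stream without decoding
String text = new String(raw, "Cp1250");  // decode with the charset the server used
byte[] utf8 = text.getBytes("UTF-8");     // re-encode as UTF-8 if needed for output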
Encodings have a canonical (unique) name and other varying aliases, all matched case-insensitively. For instance, "UTF-8" is the canonical name, but some Java versions back it was "UTF8"; it was later aligned with common usage. The same holds for "Windows-1250", which you might also see in HTML pages; "Cp1250" (Code Page) is a Java-internal name.
In java byte[] is binary data, String (internally Unicode) is text.
Conversion between the two needs an encoding; it is often optional, though, falling back to the operating system default.
byte, InputStream, OutputStream <-> String, char, Reader, Writer
String cw = IOUtils.toString(is, "UTF-8"); // an InputStream yields binary bytes, hence give the encoding
byte[] b = cw.getBytes("Cp1250");
String x = new String(b, "Cp1250");
String content = x;
System.out.println(content);
To allow this universal (encoding-independent) String, String internally uses char, i.e. UTF-16.
String constants are stored in the .class file as (modified) UTF-8, which is more compact.
Assuming Apache Commons IO, use one of the methods that specifies an encoding:
String cw = IOUtils.toString(is, "windows-1250");
All strings are implicitly UTF-16 in Java. Other encodings are generally represented using byte arrays.
I find it better to use a Scanner for reading in different charsets.
FileInputStream is = new FileInputStream(fileOrPath);
Scanner scanner = new Scanner(is, "cp1250");
String out = scanner.next();
The method next() returns the next token as a String; in Java, a String is plain Unicode text with no charset attached.
Tested on Czech-language text, read from "cp1250" and written out as "UTF-8".
How do I take a String and use something like GZIPOutputStream to gzip it, and then output the zipped content as a string?
My intention is to transfer the zipped content as a post variable through HTTP.
The steps are actually pretty simple:
Use a GZIPOutputStream wrapping a ByteArrayOutputStream to write the string's bytes, then close the GZIPOutputStream.
Call ByteArrayOutputStream.toByteArray() to get the byte array.
Use a Base64 encoder on the result.
The server will perform essentially the reverse of these operations.
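A minimal sketch of both directions using java.util.zip and java.util.Base64 (the choice of UTF-8 for the string's bytes is an assumption; both sides must agree on it):
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.*;

public class GzipStringExample {

    // gzip a String and return a Base64 text form safe for an HTTP POST variable
    static String gzipToBase64(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buf)) {
            gzip.write(s.getBytes(StandardCharsets.UTF_8)); // assumed encoding
        } // closing writes the gzip trailer
        return Base64.getEncoder().encodeToString(buf.toByteArray());
    }

    // the reverse, as the server would do it: Base64 -> gunzip -> String
    static String base64ToString(String b64) throws IOException {
        byte[] zipped = Base64.getDecoder().decode(b64);
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(zipped));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[4096];
            int length;
            while ((length = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, length);
            }
            return out.toString("UTF-8"); // must match the encoding used above
        }
    }
}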