Reading a UTF-8 string from ZipFileInputStream - java

I am trying to read a UTF-8 file from a zip file, and it's turning out to be a major challenge.
Here I zip the String into a byte array to persist to my db.
ByteArrayOutputStream bos = new ByteArrayOutputStream();
ZipOutputStream zo = new ZipOutputStream( bos );
zo.setLevel(9);
BufferedWriter writer = new BufferedWriter(
new OutputStreamWriter(bos, Charset.forName("utf-8"))
);
ZipEntry ze = new ZipEntry("data");
zo.putNextEntry(ze);
zo.write( s.getBytes() );
zo.close();
writer.close();
return bos.toByteArray();
And this is how I read the String back:
ZipInputStream zis = new ZipInputStream( new ByteArrayInputStream(bytes) );
ZipEntry entry = zis.getNextEntry();
byte[] buffer = new byte[2048];
ByteArrayOutputStream bos = new ByteArrayOutputStream();
int size;
while ((size = zis.read(buffer, 0, buffer.length)) != -1) {
bos.write(buffer, 0, size);
}
BufferedReader r = new BufferedReader( new InputStreamReader( new ByteArrayInputStream( bos.toByteArray() ), Charset.forName("utf-8") ) );
StringBuilder b = new StringBuilder();
while (r.ready()) {
b.append( r.readLine() ).append(" ");
}
The String that I get back here has lost the UTF-8 characters!
UPDATE 1:
I changed the code around so that I compared the byte array of the original String with the byte array I read back from the zip file, and they freaking match! So it's probably how I'm building the string after I have the bytes.
Arrays.equals(converted, orgi)

Your problem is in the writing. Presuming s is a String, you have:
zo.write( s.getBytes() );
But that will convert s to bytes using whatever the default encoding is. You'll want to use UTF-8 for that conversion:
zo.write( s.getBytes("utf-8") );
Your observation that the original bytes are the same as the uncompressed bytes makes sense, because the data was already wrong when it was written: both byte arrays come from the same default-encoding conversion.
Note that you have the writer stream declared but you never actually use it for anything. Nor should you in this context, since writing to it would just put uncompressed string data into the same stream bos that your ZipOutputStream writes to. It looks like you may have confused yourself trying a few different things at once here; you should just get rid of writer.
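To make the fix concrete, here is a minimal sketch of the full round trip with the charset spelled out on both sides and the unused writer removed. The class name Utf8ZipRoundTrip and the method names zip and unzip are made up for illustration; the entry name "data" and the 2048-byte buffer come from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of the original code.
class Utf8ZipRoundTrip {

    // Compress a String into a single-entry zip archive, encoding it as UTF-8.
    static byte[] zip(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zo = new ZipOutputStream(bos)) {
            zo.setLevel(9);
            zo.putNextEntry(new ZipEntry("data"));
            // Explicit charset instead of the platform default.
            zo.write(s.getBytes(StandardCharsets.UTF_8));
            zo.closeEntry();
        }
        return bos.toByteArray();
    }

    // Read the first entry back and decode it with the same charset.
    static String unzip(byte[] bytes) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(bytes))) {
            if (zis.getNextEntry() == null) {
                throw new IOException("archive has no entries");
            }
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buffer = new byte[2048];
            int size;
            while ((size = zis.read(buffer, 0, buffer.length)) != -1) {
                bos.write(buffer, 0, size);
            }
            // Decode all bytes at once, so line breaks are preserved as written.
            return new String(bos.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String original = "Gr\u00fc\u00dfe, \u4e16\u754c";
        System.out.println(original.equals(unzip(zip(original))));
    }
}
```

Decoding the whole byte array in one step also sidesteps the line-by-line rebuilding of the String, which is a separate source of differences.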

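For one, BufferedReader#ready() is not a good indicator of whether there is more input to read: it only reports whether a read can proceed without blocking, not whether the stream has ended. These questions explain why: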
Does BufferedReader.ready() method ensure that readLine() method does not return NULL?
BufferedReader not stating 'ready' when it should
Second, you are using
b.append( r.readLine() ).append(" ");
which is always adding a " " on every iteration. The resulting String value is bound to be different than the original just because of this.
Third, shout out to Jason C about your BufferedWriter not doing anything.
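As a rough sketch of the first two points, the usual idiom is to loop until readLine() returns null and to add a separator only between lines. The names ReadAllLines and readLines are invented for this example, and it assumes the input bytes are UTF-8. Note that readLine() strips line terminators, so this version normalizes them to "\n" and drops a trailing one.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration.
class ReadAllLines {

    // Read every line until readLine() returns null, which is the end-of-stream signal.
    static String readLines(byte[] utf8Bytes) throws IOException {
        StringBuilder b = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8))) {
            String line;
            boolean first = true;
            while ((line = r.readLine()) != null) {
                if (!first) {
                    b.append('\n'); // separator goes between lines, never after the last one
                }
                b.append(line);
                first = false;
            }
        }
        return b.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] input = "one\ntwo\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readLines(input));
    }
}
```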

Related

Deserialize ByteBuffer to InputStream java

I am trying to deserialize ByteBuffer to an InputStream and then to an Object in Java. This is my usecase:
Initially I have a list of objects, List<objA> objs, which I write to a file and then load into a ByteBuffer like this:
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new GZIPOutputStream(new FileOutputStream(filePath)), "UTF-8"));
for (objA obj: objs) {
StringBuilder row = new StringBuilder();
row.append(obj.getProp1()).append(SEPARATOR);
row.append(obj.getProp2()).append(SEPARATOR);
writer.write(row.toString());
}
File file = new File(filepath);
InputStream inputStream = new BufferedInputStream(new FileInputStream(file));
ByteBuffer buff = ByteBuffer.wrap(IOUtils.toByteArray(inputStream));
Now I want to reverse this operation exactly and get the List back. How can I do that? I couldn't find many references on this topic. Can someone help?
Please note that I cannot change the above code snippet. I want to know how I can get the original list back from the ByteBuffer created by it.
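One possible direction, sketched under several assumptions: undo the layers in reverse order (ByteBuffer to bytes, gunzip, decode UTF-8, split on the separator). The value of SEPARATOR is not shown above, so a tab stands in for it here; the sketch also assumes the separator never occurs inside a property value, that each object wrote exactly two properties, and that writer was closed before the file was read back (otherwise the gzip data is incomplete). GzipRowsDecoder and decode are hypothetical names, and rows are returned as String[] because the constructor of objA is unknown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.zip.GZIPInputStream;

// Hypothetical decoder; names and constants are assumptions, not the original code.
class GzipRowsDecoder {
    // Assumption: the real SEPARATOR is not shown in the question; a tab stands in.
    static final String SEPARATOR = "\t";
    // Assumption: each object contributed exactly two properties.
    static final int FIELDS_PER_OBJECT = 2;

    static List<String[]> decode(ByteBuffer buff) throws IOException {
        // Copy the remaining bytes without disturbing the caller's buffer position.
        byte[] compressed = new byte[buff.remaining()];
        buff.duplicate().get(compressed);

        // Gunzip first, then decode the raw bytes as UTF-8.
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                raw.write(chunk, 0, n);
            }
        }
        String text = new String(raw.toByteArray(), StandardCharsets.UTF_8);

        // Every value is followed by SEPARATOR, so the last split element is an empty tail.
        String[] fields = text.split(Pattern.quote(SEPARATOR), -1);
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i + FIELDS_PER_OBJECT < fields.length; i += FIELDS_PER_OBJECT) {
            rows.add(new String[] { fields[i], fields[i + 1] });
        }
        return rows;
    }
}
```

Each String[] would then be passed to whatever constructor or setters objA provides.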

How can I optimize streaming a .osgb file into a ByteString efficiently?

I am sending a .osgb file in a Google Protobuf message which requires a ByteString. It is encoded as "ISO-8859-1". In Python, I can simply open(file_name, "r").read(). In Java, I have very noob-ishly created:
String model;
ByteString modelBytes = null;
try {
FileInputStream fis = new FileInputStream( filename );
DataInputStream dis = new DataInputStream(fis);
byte[] bytes = new byte[dis.available()];
if ( dis.available() != 0 ) {
dis.readFully(bytes);
}
model = new String(bytes, "ISO-8859-1");
modelBytes = ByteString.copyFrom(model, "ISO-8859-1");
}
I'm not looking to code-golf this excerpt, but I feel as though I have possibly redundant or extra code that is genuinely not needed. It feels as though I should be able to just convert the data stream immediately into a ByteString and not worry about the encoding, but I'm not familiar enough with it.
I am very inexperienced with Java so I appreciate any assistance. Thanks.
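A possible simplification, offered as a sketch rather than a definitive answer: since a ByteString holds raw bytes, the intermediate String and the charset can probably be dropped, and the file can be read in one call. This also avoids sizing the array with available(), which is not guaranteed to return the full file length. ModelFileReader and readModelBytes are hypothetical names. The protobuf step is shown only as a comment so the example compiles without the library; ByteString.copyFrom(byte[]) is the protobuf-java call it refers to.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class for illustration.
class ModelFileReader {

    // Read the whole file as raw bytes; no charset is involved because nothing is decoded.
    static byte[] readModelBytes(String filename) throws IOException {
        Path path = Paths.get(filename);
        return Files.readAllBytes(path);
    }

    // With protobuf-java on the classpath, the bytes can then be wrapped directly:
    //   ByteString modelBytes = ByteString.copyFrom(readModelBytes(filename));
}
```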

Java FileInputStream

I am trying to use a FileInputStream to essentially read in a text file, and then output it in a different text file. However, I always get very strange characters when I do this. I'm sure it's some simple mistake I'm making, thanks for any help or pointing me in the right direction. Here's what I've got so far.
File sendFile = new File(fileName);
FileInputStream fileIn = new FileInputStream(sendFile);
byte buf[] = new byte[1024];
while(fileIn.read(buf) > 0) {
System.out.println(buf);
}
The file it is reading from is just a big text file of regular ASCII characters. Whenever I do the System.out.println, however, I get the output [B@a422ede. Any ideas on how to make this work? Thanks
This happens because you are printing a byte array object itself, rather than printing its content. You should construct a String from the buffer and a length, and print that String instead. The constructor to use for this is
String s = new String(buf, 0, len, charsetName);
Above, len should be the value returned by the call of the read() method. The charsetName should represent the encoding used by the underlying file.
If you're reading from a file to another file, you shouldn't convert the bytes to a string at all; just write the bytes read into the other file.
If your intention is to convert a text file from one encoding to another, read from a new InputStreamReader(in, sourceEncoding), and write to a new OutputStreamWriter(out, targetEncoding).
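The encoding-conversion case could look roughly like the sketch below. Transcoder and transcode are made-up names, and the caller is assumed to own and close both streams.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper class for illustration.
class Transcoder {

    // Decode bytes from 'in' using the source charset and re-encode them to 'out'
    // using the target charset. The caller remains responsible for closing both streams.
    static void transcode(InputStream in, Charset source,
                          OutputStream out, Charset target) throws IOException {
        Reader reader = new InputStreamReader(in, source);
        Writer writer = new OutputStreamWriter(out, target);
        char[] buf = new char[1024];
        int len;
        while ((len = reader.read(buf)) != -1) {
            writer.write(buf, 0, len); // write only the chars actually read
        }
        writer.flush(); // push any chars still buffered in the writer through to 'out'
    }
}
```

For a plain file-to-file copy none of this is needed; the bytes can be written as they are read.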
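That's because printing buf prints a reference to the byte array, not the bytes themselves as a String as you would expect. You need to build a String from the bytes, for example with new String(buf, 0, len), where len is the count returned by read(), so that only the bytes actually read are converted.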
Also consider using BufferedReader rather than creating your own buffer. With it you can just do
String line = new BufferedReader(new FileReader("filename.txt")).readLine();
Your loop should look like this:
int len;
while((len = fileIn.read(buf)) > 0) {
System.out.write(buf, 0, len);
}
You are (a) using the wrong method and (b) ignoring the length returned by read(), other than checking that it is > 0. So you are printing junk at the end of any buffer that was only partially filled.
An object's default toString() method returns the class name and hash code, not the object's contents.
byte buf[] is an object.
You can print its contents like this:
File sendFile = new File(fileName);
FileInputStream fileIn = new FileInputStream(sendFile);
byte buf[] = new byte[1024];
while(fileIn.read(buf) > 0) {
System.out.println(Arrays.toString(buf));
}
or
File sendFile = new File(fileName);
FileInputStream fileIn = new FileInputStream(sendFile);
byte buf[] = new byte[1024];
int len=0;
while((len=fileIn.read(buf)) > 0) {
for(int i=0;i<len;i++){
System.out.print(buf[i]);
}
System.out.println();
}

Unable to zip multiple files using Java

I am getting data into an inputStream object from a web URL:
inputStream = AWSFileUtil.getInputStream(
AWSConnectionUtil.getS3Object(null),
"cdn.generalsentiment.com", filePath);
If there are multiple files, I want to zip them and send the file type as "zip" to struts.xml, which does the download.
I am actually converting the inputStream into a ByteArrayInputStream:
ByteArrayInputStream byteArrayInputStream = new
ByteArrayInputStream(inputStream.toString().getBytes());
while (byteArrayInputStream.read(inputStream.toString().getBytes()) > 0) {
zipOutputStream.write(inputStream.toString().getBytes());
}
and then
zipOutputStream.close();
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
fileInputStream = new FileInputStream(file);
while (fileInputStream.read(buffer) > 0) {
byteArrayOutputStream.write(buffer);
}
byteArrayOutputStream.close();
inputStream = new ByteArrayInputStream(byteArrayOutputStream.toByteArray());
reportName = "GS_MediaValue_Reports.zip";
fileType = "zip";
}
return fileType;
But the downloaded zip gives corrupt files when extracted.
Please suggest a solution for this issue.
The short answer is that it's not how ZipOutputStream works. Since it was designed to store multiple files, along with their file names, directory structures and so on, you need to tell the stream about that explicitly.
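Furthermore, converting a stream to a string is a bad idea in general: calling toString() on a stream normally returns the object's class name and hash code rather than its contents, so those bytes are not the file's data. It's also slow, especially when you're doing it in a loop.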
So your solution will be something like:
ZipEntry entry = new ZipEntry( fileName ); // You have to give each entry a different filename
zipOutputStream.putNextEntry( entry );
byte buffer[] = new byte[ 1024 ]; // 1024 is the buffer size here, but it could be anything really
int count;
while( (count = inputStream.read( buffer, 0, 1024 ) ) != -1 ) {
zipOutputStream.write( buffer, 0, count );
}
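Extending that to several files, a minimal sketch might look like this. MultiFileZip and zipAll are hypothetical names, and the map of entry name to stream is just one way to pass the inputs; in the question each stream would come from AWSFileUtil.getInputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class for illustration.
class MultiFileZip {

    // Write each named stream as its own zip entry and return the finished archive.
    static byte[] zipAll(Map<String, InputStream> sources) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            byte[] buffer = new byte[1024];
            for (Map.Entry<String, InputStream> source : sources.entrySet()) {
                // One entry per file; names must be unique within the archive.
                zos.putNextEntry(new ZipEntry(source.getKey()));
                try (InputStream in = source.getValue()) {
                    int count;
                    while ((count = in.read(buffer)) != -1) {
                        zos.write(buffer, 0, count); // only the bytes actually read
                    }
                }
                zos.closeEntry();
            }
        } // closing the ZipOutputStream writes the central directory
        return bos.toByteArray();
    }
}
```

The returned array can be wrapped in a ByteArrayInputStream for the download, which avoids the temporary file and the second copy loop.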

Reading binary file from URLConnection

I'm trying to read a binary file from a URLConnection. When I test it with a text file it seems to work fine, but for binary files it doesn't. I'm using the following mime-type on the server when the file is sent out:
application/octet-stream
But so far nothing seems to work. This is the code that I use to receive the file:
file = File.createTempFile( "tempfile", ".bin");
file.deleteOnExit();
URL url = new URL( "http://somedomain.com/image.gif" );
URLConnection connection = url.openConnection();
BufferedReader input = new BufferedReader( new InputStreamReader( connection.getInputStream() ) );
Writer writer = new OutputStreamWriter( new FileOutputStream( file ) );
int c;
while( ( c = input.read() ) != -1 ) {
writer.write( (char)c );
}
writer.close();
input.close();
This is how I do it:
input = connection.getInputStream();
byte[] buffer = new byte[4096];
int n;
OutputStream output = new FileOutputStream( file );
while ((n = input.read(buffer)) != -1)
{
output.write(buffer, 0, n);
}
output.close();
If you are trying to read a binary stream, you should NOT wrap the InputStream in a Reader of any kind. Read the data into a byte array buffer using the InputStream.read(byte[], int, int) method. Then write from the buffer to a FileOutputStream.
The way you are currently reading/writing the file will convert it into "characters" and back to bytes using your platform's default character encoding. This is liable to mangle binary data.
(There is a charset (LATIN-1) that provides a 1-to-1 lossless mapping between bytes and a subset of the char value-space. However this is a bad idea even when the mapping works. You will be translating / copying the binary data from byte[] to char[] and back again ... which achieves nothing in this context.)
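To see the mangling concretely, here is a small sketch that compares a plain byte copy with a decode-and-re-encode round trip, which is roughly what a Reader/Writer pair does. BinaryCopyDemo, copyBytes and throughChars are invented names, and the sample bytes are arbitrary.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class for illustration.
class BinaryCopyDemo {

    // Copy a stream byte for byte; nothing is decoded, so binary data survives intact.
    static byte[] copyBytes(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Mimic a Reader/Writer pair: decode bytes to chars, then encode them again.
    static byte[] throughChars(byte[] data, Charset charset) {
        return new String(data, charset).getBytes(charset);
    }

    public static void main(String[] args) throws IOException {
        // Arbitrary sample containing bytes that are not valid UTF-8.
        byte[] data = { 0x47, 0x49, 0x46, (byte) 0xFF, (byte) 0xD8, 0x00 };
        System.out.println(Arrays.equals(data, copyBytes(new ByteArrayInputStream(data))));
        System.out.println(Arrays.equals(data, throughChars(data, StandardCharsets.UTF_8)));
        System.out.println(Arrays.equals(data, throughChars(data, StandardCharsets.ISO_8859_1)));
    }
}
```

Invalid byte sequences are replaced during decoding, which is why the UTF-8 round trip no longer matches the input, while the byte copy and the ISO-8859-1 round trip do.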
