How can I optimize streaming a .osgb file into a ByteString efficiently? - java

I am sending a .osgb file in a Google Protobuf message, which requires a ByteString. The data is encoded as "ISO-8859-1". In Python, I can simply call open(file_name, "r").read(). In Java, I have very noobishly created:
String model;
ByteString modelBytes = null;
try {
FileInputStream fis = new FileInputStream( filename );
DataInputStream dis = new DataInputStream(fis);
byte[] bytes = new byte[dis.available()];
if ( dis.available() != 0 ) {
dis.readFully(bytes);
}
model = new String(bytes, "ISO-8859-1");
modelBytes = ByteString.copyFrom(model, "ISO-8859-1");
}
I'm not looking to code-golf this excerpt, but I feel as though I have possibly redundant or extra code that is genuinely not needed. It feels as though I should be able to just convert the data stream immediately into a ByteString and not worry about the encoding, but I'm not familiar enough with it.
I am very inexperienced with Java so I appreciate any assistance. Thanks.
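A minimal sketch of reading the file as raw bytes with no charset step, assuming the file fits in memory. The class name, the temp file, and its contents are placeholders for illustration. With protobuf-java on the classpath the resulting array could be handed to ByteString.copyFrom(byte[]); that call appears only as a comment so the example runs with the standard library alone.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class ReadModelBytes {
    // Reads the whole file as raw bytes; no character encoding is involved.
    static byte[] readModel(Path path) throws IOException {
        return Files.readAllBytes(path);
    }

    // Reproduces the String detour from the question: bytes -> String -> bytes.
    static byte[] latin1RoundTrip(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1)
                .getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder file standing in for the .osgb model.
        Path tmp = Files.createTempFile("model", ".osgb");
        byte[] original = {0, 1, (byte) 0x80, (byte) 0xFF, 65};
        Files.write(tmp, original);

        byte[] bytes = readModel(tmp);
        // With protobuf-java available (assumption):
        // ByteString modelBytes = ByteString.copyFrom(bytes);

        // ISO-8859-1 maps every byte to one char and back, so the detour
        // returns the same bytes and can be dropped.
        System.out.println(Arrays.equals(bytes, latin1RoundTrip(bytes)));
        Files.delete(tmp);
    }
}
```

Because the String round trip returns identical bytes, skipping it changes nothing in the message and saves two full copies of the data.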

Related

GZIPInputStream unable to decode at receiver side (invalid code lengths set)

I'm attempting to encode a String in a client using GZIPOutputStream, then decode the String in a server using GZIPInputStream.
The client's side code (after the initial socket connection establishment) is:
// ... Establishing connection, getting a socket object.
// ... Now proceeding to send data using that socket:
DataOutputStream out = new DataOutputStream(socket.getOutputStream());
String message = "Hello World!";
ByteArrayOutputStream out = new ByteArrayOutputStream();
GZIPOutputStream gzip = new GZIPOutputStream(out);
gzip.write(message);
gzip.close();
String encMessage = out.toString();
out.writeInt(encMessage.getBytes().length);
out.write(encMessage.getBytes());
out.flush();
And the server's side code (again, after establishing a connection):
DataInputStream input = new DataInputStream(socket.getInputStream());
int length = input.readInt();
byte[] buffer = new byte[length];
input.readFully(buffer);
GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(buffer));
BufferedReader r = new BufferedReader(new InputStreamReader(gz));
String s = "";
String line;
while ((line = r.readLine()) != null)
{
s += line;
}
I checked and the buffer length (i.e., the coded message's size) is passed correctly, so the right number of bytes is transferred.
However, I'm getting this:
java.util.zip.ZipException: invalid code lengths set
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:164)
at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:122)
at parsing.ReceiveResponsesTest$TestReceiver.run(ReceiveResponsesTest.java:147)
at java.lang.Thread.run(Thread.java:745)
Any ideas?
Thanks in advance for any assistance!
You're calling toString() on the ByteArrayOutputStream - that is incorrect, and it opens up all kinds of character encoding problems that are probably biting you here. You need to call toByteArray instead:
byte[] encMessage = out.toByteArray();
out.writeInt(encMessage.length);
out.write(encMessage);
Detail:
If you use toString(), Java will decode your bytes into characters using your platform default character encoding. That could be some Windows code page, UTF-8, or whatnot.
However, not every byte sequence is valid in a given encoding, and invalid bytes will be replaced by a substitute character, a question mark perhaps. Without knowing the details, it's hard to tell.
But in any case, decoding the byte array to a String, and then encoding it to a byte array again when you write it out, is very likely to change the data in the byte array. And there is no need to do it; you can just get the byte array straight away as shown in the code above.
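To make the corruption concrete, here is a small sketch, not the asker's code. It uses UTF-8 explicitly instead of the platform default so the result is deterministic; the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipBytesDemo {
    // Compresses the UTF-8 bytes of the message and returns the raw gzip bytes.
    static byte[] gzip(String message) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(message.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Decompresses gzip bytes and decodes the result as UTF-8 text.
    static String gunzip(byte[] compressed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int n;
            while ((n = gz.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] compressed = gzip("Hello World!");

        // Treating compressed bytes as text: the second gzip magic byte (0x8b)
        // is not valid UTF-8 on its own, so it is replaced during decoding.
        byte[] mangled = new String(compressed, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(compressed, mangled)); // prints false

        // The untouched byte array decompresses fine.
        System.out.println(gunzip(compressed)); // prints Hello World!
    }
}
```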
Why on earth are you indulging in all this complication? You can reduce it all to this:
GZIPOutputStream gzip = new GZIPOutputStream(socket.getOutputStream());
DataOutputStream out = new DataOutputStream(gzip);
String message = "Hello World!";
out.writeUTF(message);
out.close();
// ...
GZIPInputStream gz = new GZIPInputStream(socket.getInputStream());
DataInputStream input = new DataInputStream(gz);
String line = input.readUTF();
I also note that your code doesn't actually compile, and that unless the messages are several orders of magnitude larger, there is no benefit to GZipping them.

Reading a UTF-8 string from ZipFileInputStream

I am trying to read a UTF-8 file from a zip file and it's turning out to be a major challenge.
Here I zip the String to a bytes array to persist to my db.
ByteArrayOutputStream bos = new ByteArrayOutputStream();
ZipOutputStream zo = new ZipOutputStream( bos );
zo.setLevel(9);
BufferedWriter writer = new BufferedWriter(
new OutputStreamWriter(bos, Charset.forName("utf-8"))
);
ZipEntry ze = new ZipEntry("data");
zo.putNextEntry(ze);
zo.write( s.getBytes() );
zo.close();
writer.close();
return bos.toByteArray();
And this is how I read the String back:
ZipInputStream zis = new ZipInputStream( new ByteArrayInputStream(bytes) );
ZipEntry entry = zis.getNextEntry();
byte[] buffer = new byte[2048];
ByteArrayOutputStream bos = new ByteArrayOutputStream();
int size;
while ((size = zis.read(buffer, 0, buffer.length)) != -1) {
bos.write(buffer, 0, size);
}
BufferedReader r = new BufferedReader( new InputStreamReader( new ByteArrayInputStream( bos.toByteArray() ), Charset.forName("utf-8") ) );
StringBuilder b = new StringBuilder();
while (r.ready()) {
b.append( r.readLine() ).append(" ");
}
The String that I get back here has lost the UTF-8 characters!
UPDATE 1:
I changed the code around so that I compared the byte array of the original String with the byte array I read back from the zip file and they freaking match! So it's probably how I'm building the string after I have the bytes.
Arrays.equals(converted, orgi)
Your problem is in the writing, presuming s is a String, you have:
zo.write( s.getBytes() );
But that will convert s to bytes using whatever the default encoding is. You'll want to use UTF-8 for that conversion:
zo.write( s.getBytes("utf-8") );
Your observation that the original bytes are the same as the uncompressed bytes makes sense, because the originally written data is the source of the problem.
Note that you have the writer stream declared but you never actually use it for anything (nor should you, in this context, since writing to it will just write uncompressed string data to the same stream bos that your ZipOutputStream writes to). It looks like you may have confused yourself trying a few different things at once here, you should just get rid of writer.
For one, BufferedReader#ready() is not a good indicator for reading input. Here's a number of reasons why
Does BufferedReader.ready() method ensure that readLine() method does not return NULL?
BufferedReader not stating 'ready' when it should
Second, you are using
b.append( r.readLine() ).append(" ");
which is always adding a " " on every iteration. The resulting String value is bound to be different than the original just because of this.
Third, shout out to Jason C about your BufferedWriter not doing anything.
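Putting the points above together, a round trip might look like the sketch below. It is an illustration under assumptions rather than the asker's final code: the class and method names are invented, the unused writer is gone, the charset is explicit on both sides, and the bytes are decoded in one step instead of line by line.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipUtf8RoundTrip {
    // Zips the string into a single entry named "data", encoding it as UTF-8.
    static byte[] zip(String s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zo = new ZipOutputStream(bos)) {
            zo.setLevel(9);
            zo.putNextEntry(new ZipEntry("data"));
            zo.write(s.getBytes(StandardCharsets.UTF_8)); // explicit charset
            zo.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Reads the first entry back and decodes all of its bytes as UTF-8.
    static String unzip(byte[] bytes) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(bytes))) {
            if (zis.getNextEntry() == null) {
                throw new IllegalArgumentException("archive has no entries");
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[2048];
            int size;
            while ((size = zis.read(buffer)) != -1) {
                out.write(buffer, 0, size);
            }
            // Decoding in one step keeps line breaks and adds no extra spaces.
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "h\u00e9llo\nw\u00f6rld \u2713";
        System.out.println(original.equals(unzip(zip(original))));
    }
}
```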

When and why decorate OutputStream with ArmoredOutputStream when using BouncyCastle

I'm pretty new to BouncyCastle and PGP. I've seen many articles and samples on the internet. Almost every encryption sample contains the code snippet below:
if (armor)
out = new ArmoredOutputStream(out);
My local test passes both with and without armor. I googled around but found little that was useful, and the Javadoc of ArmoredOutputStream only says "This is basic output stream".
So what's the difference and when to use it?
Complete code sample:
public static void encryptFile(String decryptedFilePath,
String encryptedFilePath,
String encKeyPath,
boolean armor,
boolean withIntegrityCheck)
throws Exception{
OutputStream out = new FileOutputStream(encryptedFilePath);
FileInputStream pubKey = new FileInputStream(encKeyPath);
PGPPublicKey encKey = readPublicKeyFromCollection2(pubKey);
Security.addProvider(new BouncyCastleProvider());
if (armor)
out = new ArmoredOutputStream(out);
// Init encrypted data generator
PGPEncryptedDataGenerator encryptedDataGenerator =
new PGPEncryptedDataGenerator(PGPEncryptedData.CAST5, withIntegrityCheck, new SecureRandom(),"BC");
encryptedDataGenerator.addMethod(encKey);
OutputStream encryptedOut = encryptedDataGenerator.open(out, new byte[BUFFER_SIZE]);
// Init compression
PGPCompressedDataGenerator compressedDataGenerator = new PGPCompressedDataGenerator(PGPCompressedData.ZIP);
OutputStream compressedOut = compressedDataGenerator.open(encryptedOut);
PGPLiteralDataGenerator literalDataGenerator = new PGPLiteralDataGenerator();
OutputStream literalOut = literalDataGenerator.open(compressedOut, PGPLiteralData.BINARY, decryptedFilePath, new Date(), new byte[BUFFER_SIZE]);
FileInputStream inputFileStream = new FileInputStream(decryptedFilePath);
byte[] buf = new byte[BUFFER_SIZE];
int len;
while((len = inputFileStream.read(buf))>0){
literalOut.write(buf,0,len);
}
literalOut.close();
literalDataGenerator.close();
compressedOut.close();
compressedDataGenerator.close();
encryptedOut.close();
encryptedDataGenerator.close();
inputFileStream.close();
out.close();
}
}
ArmoredOutputStream uses an encoding similar to Base64, so that binary non-printable bytes are converted to something text friendly. You'd do this if you wanted to send the data over email, or post on a site, or some other text medium.
It doesn't make a difference in terms of security. There is a slight expansion of the message size though. The choice really just depends on what you want to do with the output.
ASCII armor is a generic term for a representation of binary data as ASCII-only text. Technically, there are many ways to ASCII-armor binary data, but in the cryptography-related field the PEM format is prevalent (also check this and related questions at serverfault).
PEM is basically Base64-encoded binary data wrapped in -----BEGIN SOMETHING----- and -----END SOMETHING----- delimiters, plus a set of additional headers that can contain some meta information about the binary content.
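To illustrate the general idea only, here is a hand-rolled sketch that wraps bytes in Base64 between BEGIN/END delimiters using the standard library. It is not what ArmoredOutputStream emits: real OpenPGP armor also carries header lines and a checksum, and the class, method names, and label here are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class ArmorSketch {
    // Encodes binary data as 64-column Base64 text between delimiter lines.
    static String armor(String label, byte[] data) {
        byte[] newline = "\n".getBytes(StandardCharsets.US_ASCII);
        String body = Base64.getMimeEncoder(64, newline).encodeToString(data);
        return "-----BEGIN " + label + "-----\n"
                + body + "\n"
                + "-----END " + label + "-----\n";
    }

    // Strips the delimiter lines and decodes the Base64 body back to bytes.
    static byte[] dearmor(String armored) {
        StringBuilder body = new StringBuilder();
        for (String line : armored.split("\n")) {
            if (!line.isEmpty() && !line.startsWith("-----")) {
                body.append(line);
            }
        }
        return Base64.getDecoder().decode(body.toString());
    }

    public static void main(String[] args) {
        // Bytes that are not printable text on their own.
        byte[] binary = {0, (byte) 0xFF, 16, (byte) 0x80};
        String text = armor("MESSAGE", binary);
        System.out.print(text);
        System.out.println(Arrays.equals(binary, dearmor(text)));
    }
}
```

The armored form is roughly a third larger than the input, which is the slight expansion of the message size mentioned above.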

Java mutable byte array data structure

I'm trying to find an easy way to create a mutable byte array that can automatically append any primitive Java data type. I've been searching but could not find anything useful.
I'm looking for something like this
ByteAppender byteStructure = new ByteAppender();
byteStructure.appendInt(5);
byteStructure.appendDouble(10.0);
byte[] bytes = byteStructure.toByteArray();
There is ByteBuffer, which is great, but you have to know the size of the buffer before you start, which won't work in my case. There is a similar thing (StringBuilder) for creating Strings, but I cannot find one for bytes.
I thought this would be obvious in Java.
I guess you are looking for java.io.DataOutputStream
ByteArrayOutputStream out = new ByteArrayOutputStream();
DataOutputStream dout = new DataOutputStream(out);
dout.writeInt(1234);
dout.writeLong(123L);
dout.writeFloat(1.2f);
byte[] storingData = out.toByteArray();
How to use storingData:
ByteArrayInputStream in = new ByteArrayInputStream(storingData);
DataInputStream din = new DataInputStream(in);
int v1 = din.readInt();//1234
long v2 = din.readLong();//123L
float v3 = din.readFloat();//1.2f
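If the ByteAppender shape from the question is preferred, a thin wrapper over the same two streams is one way to get it. The class below is a hypothetical sketch, not an existing library type; it grows as needed because ByteArrayOutputStream does, and it writes big-endian like DataOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ByteAppender {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final DataOutputStream data = new DataOutputStream(buffer);

    public ByteAppender appendInt(int value) {
        try {
            data.writeInt(value);
        } catch (IOException e) {
            // An in-memory stream is not expected to fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return this;
    }

    public ByteAppender appendDouble(double value) {
        try {
            data.writeDouble(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return this;
    }

    // Returns a copy of everything appended so far.
    public byte[] toByteArray() {
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = new ByteAppender()
                .appendInt(5)
                .appendDouble(10.0)
                .toByteArray();
        System.out.println(bytes.length); // prints 12 (4-byte int + 8-byte double)
    }
}
```

Other primitives would follow the same pattern with writeLong, writeFloat, and so on.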

Sending big file using FileInputStream/ObjectOutputStream

I need help with my homework; any help will be much appreciated. I can send small files without a problem, but when I try to send, let's say, a 1 GB file, allocating the byte array throws an OutOfMemoryError, so I need a better way to send a file from server to client. How can I improve this code to send big files? Please help me.
Server Code:
FileInputStream fis = new FileInputStream(file);
byte[] fileByte = new byte[fis.available()]; //This causes the problem.
bytesRead = fis.read(fileByte);
oos = new ObjectOutputStream(sock.getOutputStream());
oos.writeObject(fileByte);
Client Code:
ois = new ObjectInputStream(sock.getInputStream());
byte[] file = (byte[]) ois.readObject();
fos = new FileOutputStream(file);
fos.write(file);
Don't read the whole file into memory, use a small buffer and write while you are reading the file:
BufferedOutputStream bos = new BufferedOutputStream(sock.getOutputStream());
File file = new File("asd");
FileInputStream fis = new FileInputStream(file);
BufferedInputStream bis = new BufferedInputStream(fis);
byte[] buffer = new byte[1024*1024*10];
int n = -1;
while((n = bis.read(buffer))!=-1) {
bos.write(buffer,0,n);
}
bos.flush(); // push out any bytes still held in the buffer
Use Buffered* to optimize the writing and reading from Streams
Just split the array to smaller chunks so that you don't need to allocate any big array.
For example, you could split the array into 16 KB chunks, e.g. new byte[16384], and send them one by one. On the receiving side you would have to wait until a chunk can be fully read, store it somewhere, and then start with the next chunk.
But if you are not able to allocate a whole array of the size you need on server side you won't be able to store all the data that you are going to receive anyway.
You could also compress the data before sending it to save bandwidth (and time), take a look at ZipOutputStream and ZipInputStream.
Here's how I solved it:
Client Code:
bis=new BufferedInputStream(sock.getInputStream());
fos = new FileOutputStream(file);
int n;
byte[] buffer = new byte[8192];
while ((n = bis.read(buffer)) > 0){
fos.write(buffer, 0, n);}
Server Code:
bos= new BufferedOutputStream(sock.getOutputStream());
FileInputStream fis = new FileInputStream(file);
BufferedInputStream bis = new BufferedInputStream(fis);
int n=-1;
byte[] buffer = new byte[8192];
while((n = bis.read(buffer))>-1)
bos.write(buffer,0,n);
Depending on whether or not you have to write the code yourself, there are existing libraries which solve this problem, e.g. rmiio. If you are not using RMI, just plain Java serialization, you can use the DirectRemoteInputStream, which is kind of like a Serializable InputStream. (This library also has support for things like auto-magically compressing the data.)
Actually, if you are only sending file data, you would be better off ditching the Object streams and using DataInput/DataOutput streams. First write an integer indicating the file length, then copy the bytes directly to the stream. On the receiving side, read the integer file length, then read exactly that many bytes.
When you copy the data between streams, use a small, fixed-size byte[] to move chunks of data between the input and output streams in a loop. There are numerous examples of how to do this correctly available online (e.g. @ErikFWinter's answer).
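The length-prefix scheme described above can be sketched as follows. This is an illustration under assumptions: in-memory streams stand in for the socket and the files, the names are invented, and the length is written as a long rather than an int so that files over 2 GB would also fit.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class LengthPrefixedCopy {
    // Copies exactly `count` bytes through a small fixed-size buffer.
    static void copyExactly(InputStream in, OutputStream out, long count) throws IOException {
        byte[] buffer = new byte[8192];
        long remaining = count;
        while (remaining > 0) {
            int n = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
            if (n == -1) {
                throw new EOFException("stream ended with " + remaining + " bytes missing");
            }
            out.write(buffer, 0, n);
            remaining -= n;
        }
    }

    // Sender: write the length first, then stream the content.
    static void send(InputStream file, long length, OutputStream socketOut) throws IOException {
        DataOutputStream out = new DataOutputStream(socketOut);
        out.writeLong(length);
        copyExactly(file, out, length);
        out.flush();
    }

    // Receiver: read the length, then read exactly that many bytes.
    static void receive(InputStream socketIn, OutputStream file) throws IOException {
        DataInputStream in = new DataInputStream(socketIn);
        long length = in.readLong();
        copyExactly(in, file, length);
        file.flush();
    }

    // Runs sender and receiver back to back over an in-memory "wire".
    static byte[] roundTrip(byte[] payload) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            send(new ByteArrayInputStream(payload), payload.length, wire);
            ByteArrayOutputStream received = new ByteArrayOutputStream();
            receive(new ByteArrayInputStream(wire.toByteArray()), received);
            return received.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = new byte[20000];
        Arrays.fill(payload, (byte) 7);
        System.out.println(Arrays.equals(payload, roundTrip(payload)));
    }
}
```

With a real socket, send would be given a FileInputStream and sock.getOutputStream(), and receive would be given sock.getInputStream() and a FileOutputStream; memory use stays at the 8 KB buffer regardless of file size.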
