I am using Netty and have to parse binary data in a ChannelBufferInputStream. Here's the code I am using:
ins.skipBytes(14); // skip the 14-byte header
byte[] b = new byte[195]; // note that 195 is the length of the data after inflation
(new InflaterInputStream(ins)).read(b, 0, 195); // read() may return fewer bytes than requested
This works as expected, but it sets the mark on the ChannelBufferInputStream after 195 bytes.
Needless to say, the mark should have been set after fewer than 195 bytes, because the compressed data is shorter than the inflated data.
Is it possible to get the number of 'actual' bytes read from the input stream so that I can set the mark myself? Or is there some other way to inflate a ChannelBuffer's data in Netty?
Without knowing what the larger code flow looks like, it's hard to recommend a best practice, but assuming you're reading an incoming network stream, a better pattern might be to use a sequence of pipeline handlers like:
HeaderHandler --> decoder, returns null until 14 bytes are read
InflaterDecoder --> Inflates the remainder (will ZlibDecoder work?)
AppHandler --> Receives the inflated buffer
But to answer your first question directly, ChannelBufferInputStream.readBytes() will, to quote the javadoc:
Returns the number of read bytes by this stream so far.
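One caveat when using that counter for the mark: InflaterInputStream fills its own internal buffer (512 bytes by default) from the underlying stream, so it can pull more bytes from the ChannelBufferInputStream than the compressed message actually occupies. A minimal sketch of an alternative, swapping in java.util.zip.Inflater directly rather than any Netty API: hand the inflater a byte array and, once it reports finished(), ask how much input it left unused. The class and method names are hypothetical, and the sketch assumes the payload is a complete zlib stream (2-byte header, deflate data, Adler-32 trailer).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class InflateConsumed {

    /** Inflated bytes plus the number of compressed input bytes the zlib stream occupied. */
    static final class Result {
        final byte[] data;
        final int consumed;

        Result(byte[] data, int consumed) {
            this.data = data;
            this.consumed = consumed;
        }
    }

    /** Inflates one zlib stream from input[offset, offset + length) and reports how much input it used. */
    static Result inflate(byte[] input, int offset, int length) throws DataFormatException {
        Inflater inflater = new Inflater(); // zlib format: header + deflate data + Adler-32
        try {
            inflater.setInput(input, offset, length);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[256];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated stream or preset dictionary required");
                }
                out.write(chunk, 0, n);
            }
            // Bytes after the end of the zlib stream are not consumed by the Inflater.
            int consumed = length - inflater.getRemaining();
            return new Result(out.toByteArray(), consumed);
        } finally {
            inflater.end();
        }
    }

    /** Demo helper: a complete zlib stream (finished, with the Adler-32 trailer). */
    static byte[] compress(byte[] plain) {
        Deflater deflater = new Deflater();
        deflater.setInput(plain);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] plain = "<msg>hello hello hello</msg>".getBytes(StandardCharsets.UTF_8);
        byte[] zlib = compress(plain);
        // Simulate a buffer holding the compressed message followed by 3 bytes of the next one.
        byte[] wire = Arrays.copyOf(zlib, zlib.length + 3);
        Result r = inflate(wire, 0, wire.length);
        System.out.println(new String(r.data, StandardCharsets.UTF_8)); // <msg>hello hello hello</msg>
        System.out.println(r.consumed == zlib.length);                  // true
    }
}
```

Assuming the Netty 3 ChannelBuffer API, you could copy the readable bytes after the header with buffer.getBytes(buffer.readerIndex(), array), inflate that array, and then advance past exactly the compressed part with buffer.skipBytes(r.consumed).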
### Update 2 (newest)
Here's the situation:
A foreign application is storing zlib deflated (compressed) data in this format:
78 9C BC (...data...) 00 00 FF FF - let's call it DATA1
If I take original XML file and deflate it in Java or Tcl, I get:
78 9C BD (...data...) D8 9F 29 BB - let's call it DATA2
The last 4 bytes in DATA2 are definitely the Adler-32 checksum, which in DATA1 is replaced with the zlib FULL-SYNC marker (why? I have no idea).
The 3rd byte differs by 1.
The (...data...) is equal between DATA1 and DATA2.
Now the most interesting part: if I update DATA1 by changing the 3rd byte from BC to BD, leave the last 4 bytes untouched (00 00 FF FF), and inflate this data with new Inflater(true) (https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/zip/Inflater.html#%3Cinit%3E(boolean)), I am able to decode it correctly! (Because the Inflater in this mode does not require the zlib Adler-32 checksum or the zlib header.)
Questions:
Why does changing BC to BD work? Is it safe to do in all cases? I checked a few cases and it worked each time.
Why would any application output an incorrect (?) deflate value of BC at all?
Why would the application start with a zlib header (78 9C), but not produce a compliant zlib structure (FLUSH-SYNC instead of Adler-32)? It's not a small hobby application, but a widely used business app (I would say tens of thousands of business users).
### Update 1 (old)
After further analysis it seems that I have a zlib-compressed byte array that misses the final checksum (adler32).
According to RFC 1950, a correct zlib stream must end with the adler32 checksum, but for some reason the dataset I work with contains zlib bytes that lack that checksum. It always ends with 00 00 FF FF, which in zlib format is the SYNC FLUSH marker. A complete zlib object would have the adler32 checksum afterwards, but there is none.
Still it should be possible to inflate such data, right?
As mentioned earlier (in the original question below), I've tried passing this byte array to the Java Inflater (and to one from Tcl too), with no luck. Somehow the application that produces these bytes is able to read them correctly (as also mentioned below).
How can I decompress it?
Original question, before update:
Context
There is an application (closed source) that connects to MS SQL Server and stores a compressed XML document there in a column of type image. This application can, on request, export the document into a regular XML file on the local disk, so I have access to both the plain text XML data and the compressed data directly in the database.
The problem
I'd like to be able to decompress any value from this column using my own code connecting to the SQL Server.
The problem is that it is some kind of weird zlib format. It does start with the header bytes typical for zlib (78 9C), but I'm unable to decompress it (I used the method described at Java Decompress a string compressed with zlib deflate).
The whole data looks like 789CBC58DB72E238...7E526E7EFEA5E3D5FF0CFE030000FFFF (of course dots mean more bytes inside - total of 1195).
What I've tried already
What caught my attention was the ending 0000FFFF, but even if I truncate it, the decompression still fails. I actually tried decompressing it after truncating every possible number of bytes from the end (in a loop, chopping off the last byte per iteration); none of the iterations worked either.
I also compressed the original XML file into zlib bytes to see what it looks like, and apart from the 2 zlib header bytes and maybe 5-6 more bytes after them, the rest of the data was different. The number of output bytes was also different (smaller), but not by much (~1180 vs 1195 bytes).
The difference on the deflate side is that the foreign application is using Z_SYNC_FLUSH or Z_FULL_FLUSH to flush the provided data so far to the compressed stream. You are (correctly) using Z_FINISH to end the stream. In the first case you end up with a partial deflate stream that is not terminated and has no check value. Instead it just ends with an empty stored block, which results in the 00 00 ff ff bytes at the end. In the second case you end up with a complete deflate stream and a zlib trailer with the check value. In that case, there happens to be a single deflate block (the data must be relatively small), so the first block is the last block, and is marked as such with a 1 as the low bit of the first byte.
What you are doing is setting the last block bit on the first block. This will in general not always work, since the stream may have more than one block. In that case, some other bit in the middle of the stream would need to be set.
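To make the bit arithmetic concrete: the first deflate block header starts at the first byte after the 2-byte zlib header, and RFC 1951 packs its fields from the least significant bit (BFINAL in bit 0, BTYPE in bits 1-2). A small sketch decoding the two byte values from the question; the class and method names are illustrative, and this only applies to the first block, since later blocks can begin at any bit offset.

```java
public class BlockHeader {

    /** BFINAL: bit 0 of the first byte of a byte-aligned deflate block. */
    static boolean isFinal(int b) {
        return (b & 1) != 0;
    }

    /** BTYPE: bits 1-2; 0 = stored, 1 = fixed Huffman, 2 = dynamic Huffman. */
    static int blockType(int b) {
        return (b >> 1) & 0b11;
    }

    public static void main(String[] args) {
        for (int b : new int[] {0xBC, 0xBD}) {
            System.out.printf("%02X: BFINAL=%b BTYPE=%d%n", b, isFinal(b), blockType(b));
        }
        // BC: BFINAL=false BTYPE=2
        // BD: BFINAL=true BTYPE=2
    }
}
```

Both values describe the same dynamic-Huffman block; the only difference is whether it claims to be the last one.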
I'm guessing that what you are getting is part, but not all, of the compressed data. There is a flush to permit transmission of the data so far, but that would normally be followed by continued compression and more such flushed packets.
(Same question as #2, with the same answer.)
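An alternative to patching the BFINAL bit, sketched under the assumption that the data is a zlib stream that simply stops at a sync-flush point: keep the 78 9C header, feed everything to a default java.util.zip.Inflater, and stop when it asks for more input instead of waiting for finished(). zlib's inflate emits all data up to the flush point, so nothing is lost; there is just no Adler-32 to verify. Code that insists on a proper stream end (for example InflaterInputStream, which throws EOFException on a premature end) will typically fail on such input. The class and method names are hypothetical, and deflateSyncFlushed exists only to produce test data shaped like DATA1.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class SyncFlushInflate {

    /** Test data: zlib header, deflate blocks, then 00 00 FF FF from SYNC_FLUSH; no finish(), no Adler-32. */
    static byte[] deflateSyncFlushed(byte[] plain) {
        Deflater deflater = new Deflater();
        deflater.setInput(plain);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int n;
        do {
            n = deflater.deflate(buf, 0, buf.length, Deflater.SYNC_FLUSH);
            out.write(buf, 0, n);
        } while (n == buf.length); // a full output buffer means "call again with the same flush mode"
        deflater.end();
        return out.toByteArray();
    }

    /** Inflates a zlib stream that may end at a flush point instead of a final block plus Adler-32. */
    static byte[] inflatePartial(byte[] zlibBytes) throws DataFormatException {
        Inflater inflater = new Inflater(); // default (zlib) mode: the 78 9C header is parsed normally
        inflater.setInput(zlibBytes);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0) {
                    if (inflater.needsInput()) {
                        break; // all input used up at the flush point: everything has been emitted
                    }
                    if (inflater.needsDictionary()) {
                        throw new DataFormatException("preset dictionary required");
                    }
                }
                out.write(buf, 0, n);
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] plain = "<doc><item>x</item><item>y</item></doc>".getBytes(StandardCharsets.UTF_8);
        byte[] z = deflateSyncFlushed(plain);
        System.out.printf("%02X %02X%n", z[0], z[1]); // 78 9C
        System.out.println(Arrays.equals(inflatePartial(z), plain)); // true
    }
}
```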
I am trying to encrypt a file (txt, pdf, doc) using Google Tink streaming AEAD encryption; below is the Java code I am trying to execute. But all I get is a 1 KB encrypted output file and no errors. Whether the input file is 2 MB or more than 10 MB, the output file is always 1 KB. I am unable to figure out what could be going wrong; can someone please help?
TinkConfig.register();
final int chunkSize = 256;
KeysetHandle keysetHandle = KeysetHandle.generateNew(
StreamingAeadKeyTemplates.AES128_CTR_HMAC_SHA256_4KB);
// 2. Get the primitive.
StreamingAead streamingAead = keysetHandle.getPrimitive(StreamingAead.class);
// 3. Use the primitive to encrypt some data and write the ciphertext to a file,
FileChannel ciphertextDestination =
new FileOutputStream("encyptedOutput.txt").getChannel();
String associatedData = "Tinks34";
WritableByteChannel encryptingChannel =
streamingAead.newEncryptingChannel(ciphertextDestination, associatedData.getBytes());
ByteBuffer buffer = ByteBuffer.allocate(chunkSize);
InputStream in = new FileInputStream("FileToEncrypt.txt");
while (in.available() > 0) {
in.read(buffer.array());
System.out.println(in);
encryptingChannel.write(buffer);
}
encryptingChannel.close();
in.close();
System.out.println("completed");
This is all about understanding ByteBuffer and how it operates. Let me explain.
in.read(buffer.array());
This writes data to the underlying array, but since the array is decoupled from the state of the buffer, the buffer's position is not advanced. This is not good, as the next call:
encryptingChannel.write(buffer);
will now think that the position is 0. The limit hasn't changed either and is therefore still set to the capacity: 256. That means the write operation writes 256 bytes and sets the position to the limit (256).
The read operation still operates on the underlying byte array, which is still 256 bytes in size, so all subsequent reads succeed. However, all subsequent writes will assume that there are no bytes to be written, as the position remains at 256.
To fill a ByteBuffer, use a channel read such as FileChannel.read(ByteBuffer), which does advance the position. Then you need to flip the buffer before writing the read data. Finally, after writing you need to clear the buffer's position (and limit, but that only changes on the last read) to prepare the buffer for the next read operation. So the usual order for a Buffer is read, flip, write, clear.
Don't mix channels and I/O streams; it will make your life unnecessarily complicated, and learning how to use ByteBuffer is hard enough all by itself.
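A minimal sketch of that read, flip, write, clear cycle with plain NIO channels. Since Tink's newEncryptingChannel returns a WritableByteChannel, the same loop should work with encryptingChannel as the destination and a FileChannel opened on the input file as the source; the class and method names here are hypothetical. It also drops the in.available() check: read() returns -1 at the end of input, whereas available() is only an estimate of what can be read without blocking.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.util.Arrays;

public class ChannelCopy {

    /** Copies src to dst using read, flip, write, clear; returns the number of bytes copied. */
    static long copy(ReadableByteChannel src, WritableByteChannel dst, int chunkSize)
            throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(chunkSize);
        long total = 0;
        while (src.read(buffer) != -1) {    // read advances the position; -1 means end of input
            buffer.flip();                  // limit = bytes read, position = 0
            while (buffer.hasRemaining()) {
                total += dst.write(buffer); // a channel may accept fewer bytes than offered
            }
            buffer.clear();                 // position = 0, limit = capacity for the next read
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied;
        try (ReadableByteChannel src = Channels.newChannel(new ByteArrayInputStream(data));
             WritableByteChannel dst = Channels.newChannel(sink)) {
            copied = copy(src, dst, 256);
        }
        System.out.println(copied);                                 // 10000
        System.out.println(Arrays.equals(sink.toByteArray(), data)); // true
    }
}
```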
I have a ByteBuffer object called msg with the intended message length in the first four bytes, which I read as follows:
int msgLen = msg.getInt();
LOG.debug("Message size: " + msgLen);
If the msgLen is less than some threshold value, I have a partial message and need to cache. In this case, I'd like to put those first four bytes back into the beginning of the message; that is, put the message back together to be identical to pre-reading. For example:
if (msgLen < threshold) {
msg.rewind();
msg.put(msgLen);
Unfortunately, this does not seem to be the correct way to do this. I've tried many combinations of flip, put, and rewind, but must be misunderstanding.
How would I put the bytes back into the write buffer in their original order?
The answer was posted by Andremoniy in the comments section. Reading does not remove bytes from the buffer (getInt() only advances the position), so msg.rewind() was adequate. It didn't work in my case because of some other logic in the program, which I incorrectly attributed to a problem at the buffer level.
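For reference, a sketch of two ways to read the length prefix without disturbing the buffer, since rewind() only restores the original state when the message started at position 0: an absolute getInt(index), which never moves the position, and mark()/reset() around the relative getInt(). The class and method names are illustrative.

```java
import java.nio.ByteBuffer;

public class PeekLength {

    /** Absolute get: reads the 4-byte prefix at the current position without moving it. */
    static int peekLength(ByteBuffer msg) {
        return msg.getInt(msg.position());
    }

    /** Relative get followed by reset: same value, and the position is restored afterwards. */
    static int readLengthAndRestore(ByteBuffer msg) {
        msg.mark();             // remember the current position
        int len = msg.getInt(); // advances the position by 4; the bytes stay in the buffer
        msg.reset();            // return to the marked position
        return len;
    }

    public static void main(String[] args) {
        ByteBuffer msg = ByteBuffer.allocate(8);
        msg.putInt(6).putInt(42);
        msg.flip();                                    // position 0, limit 8
        System.out.println(peekLength(msg));           // 6
        System.out.println(readLengthAndRestore(msg)); // 6
        System.out.println(msg.position());            // 0
    }
}
```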
I have a multi-threaded client-server application that uses Vector<String> as a queue of messages to send.
I need, however, to send a file using this application. In C++ I would not really worry, but in Java I'm a little confused when converting anything to string.
Java uses 2-byte characters. When you look at a Java string in hex, it usually looks like:
00XX 00XX 00XX 00XX
unless characters outside Latin-1 are present.
Java also uses big-endian byte order.
These facts make me unsure whether, and if so how, to add the file to the queue. The preferred format of the file would be:
-- Headers --
2 bytes Size of the block (excluding header, which means first four bytes)
2 bytes Data type (text message/file)
-- End of headers --
2 bytes Internal file ID (to avoid referring by filenames)
2 bytes Length of filename
X bytes Filename
X bytes Data
You can see I'm already using 2 bytes for all numbers to avoid some horrible operations required when getting 2 numbers out of one char.
But I really have no idea how to add the file data correctly. For numbers, I assume this would do:
StringBuilder packetData = new StringBuilder();
packetData.append((char) packetSize);
packetData.append((char) PacketType.BINARY.ordinal()); //Just convert enum constant to number
But the file is really the problem. If I have described anything wrongly regarding the Java data types, please correct me; I'm a beginner.
Does it have to send only Strings? I think if it does then you really need to encode it using base64 or similar. The best approach overall would probably be to send it as raw bytes. Depending on how difficult it would be to refactor your code to support byte arrays instead of just Strings, that may be worth doing.
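A sketch of the raw-bytes route for the packet layout described in the question, using DataOutputStream (which writes big-endian, matching that layout), with Base64 only as a fallback if the queue really must hold Strings. The type code TYPE_FILE = 1 and the class and method names are assumptions. Note that the 2-byte size field caps a block at 65,535 bytes, so larger files would need to be split across several blocks.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class FilePacket {

    static final int TYPE_FILE = 1; // hypothetical "data type" code for file blocks

    /** Header (size, type) followed by file ID, filename length, filename and data. */
    static byte[] build(int fileId, String fileName, byte[] data) throws IOException {
        byte[] name = fileName.getBytes(StandardCharsets.UTF_8);
        int bodySize = 2 + 2 + name.length + data.length; // excludes the 4-byte header
        if (bodySize > 0xFFFF || fileId < 0 || fileId > 0xFFFF) {
            throw new IllegalArgumentException("value does not fit a 2-byte field");
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(4 + bodySize);
        DataOutputStream out = new DataOutputStream(bytes); // big-endian
        out.writeShort(bodySize);    // 2 bytes: size of the block
        out.writeShort(TYPE_FILE);   // 2 bytes: data type
        out.writeShort(fileId);      // 2 bytes: internal file ID
        out.writeShort(name.length); // 2 bytes: length of filename
        out.write(name);             // filename bytes
        out.write(data);             // file contents
        out.flush();
        return bytes.toByteArray();
    }

    /** Only if the queue must stay a Vector<String>: encode the raw packet as text. */
    static String asText(byte[] packet) {
        return Base64.getEncoder().encodeToString(packet);
    }

    public static void main(String[] args) throws IOException {
        byte[] packet = build(7, "a.txt", new byte[] {1, 2, 3});
        System.out.println(packet.length); // 16
        System.out.println(asText(packet));
    }
}
```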
To answer your String question I just saw pop up in the comments, there's a getBytes method on a String.
For the socket question, see:
Java sending and receiving file (byte[]) over sockets
We have Java code talking to an external system over TCP connections with XML messages encoded in UTF-8.
The messages received begin with '?', so the XML received is
?<begin>message</begin>
There is real doubt whether the first character is indeed '?'. At the moment, we cannot ask the external system what it actually sends.
The code snippet for reading the stream is as below.
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, Charset.forName("UTF-8")));
int readByte = reader.read();
if (readByte <= 0) {
inputStream.close();
}
builder.append((char) readByte);
We are currently trying to log the raw bytes with int readByte = inputStream.read(). The logs will take a few days to be received.
In the meantime, I was wondering how we could ascertain at our end whether it was truly a '?' and not a decoding issue.
I strongly suspect you have a byte order mark (BOM) at the beginning of your document. It won't render as a valid character and consequently could appear as a question mark. Can you dump the raw bytes and check for that sequence?
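A sketch of what to look for: the UTF-8 BOM is the byte sequence EF BB BF. Java's UTF-8 decoder does not strip it, so the first char returned by the InputStreamReader is U+FEFF, which a log or console that cannot encode it will often print as '?'. The class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class BomCheck {

    /** True if the bytes start with the UTF-8 byte order mark EF BB BF. */
    static boolean startsWithUtf8Bom(byte[] b) {
        return b.length >= 3
                && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB
                && (b[2] & 0xFF) == 0xBF;
    }

    /** First char a UTF-8 InputStreamReader produces, as the reading code in the question sees it. */
    static int firstDecodedChar(byte[] b) throws IOException {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(b), StandardCharsets.UTF_8)) {
            return reader.read();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'b', '>'};
        System.out.println(startsWithUtf8Bom(wire));          // true
        System.out.printf("U+%04X%n", firstDecodedChar(wire)); // U+FEFF
    }
}
```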
Your question seems to boil down to this:
Can we ascertain the real value of the first byte of the message without actually looking at it.
The answer is "No, you can't". (Obviously!)
...
However, if you could intercept the TCP/IP traffic from the external system with a packet sniffer (aka traffic monitoring tool), then dumping the first byte or bytes of the message would be simple ... requiring no code changes.
Is logging the int returned by inputStream.read() the correct way to analyse the bytes received? Or do the word length of the OS or other environment variables come into the picture?
The InputStream.read() method returns either a single (unsigned) byte of data (in the range 0 to 255 inclusive) or -1 to indicate "end of stream". It is not sensitive to the "word length" or anything else.
In short, provided you treat the results appropriately, calling read() should give you the data you need to see what the bytes in the stream really are.
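A minimal sketch of logging the first bytes as hex with InputStream.read(). It has to run before the stream is wrapped in the BufferedReader, because the reader buffers ahead; alternatively, wrap the socket stream in a BufferedInputStream, call mark(16), dump, then reset() before creating the reader. RawBytes and hexDump are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class RawBytes {

    /** Reads up to max bytes with InputStream.read() and formats them as hex, e.g. "EF BB BF 3C". */
    static String hexDump(InputStream in, int max) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < max; i++) {
            int b = in.read(); // 0..255 for a byte, -1 at end of stream
            if (b == -1) {
                break;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<'});
        System.out.println(hexDump(in, 16)); // EF BB BF 3C
    }
}
```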