Up to now, I've been using Protocol Buffers to serialize and deserialize objects using the code-generated classes.
Now I am attempting to serialize and deserialize a single 64-bit integer. The problem is, I'm getting different results in Java and C#.
Here's how I'm doing it in Java:
private static byte[] convertLongToByteArray(long value) throws IOException {
int size = CodedOutputStream.computeInt64SizeNoTag(value);
byte[] buffer = new byte[size];
CodedOutputStream codedOutputStream = CodedOutputStream.newInstance(buffer);
codedOutputStream.writeInt64NoTag(value);
codedOutputStream.flush();
codedOutputStream.checkNoSpaceLeft();
return buffer;
}
And here's how I'm doing it in C#, using protobuf-net:
public void SerializeLongValue()
{
long n = 9876;
byte[] memoryBuffer = null;
using (MemoryStream destination = new MemoryStream())
{
ProtoBuf.Serializer.Serialize(destination, n);
destination.Flush();
memoryBuffer = destination.ToArray();
}
using (MemoryStream source = new MemoryStream(memoryBuffer))
{
long result = ProtoBuf.Serializer.Deserialize<long>(source);
Assert.AreEqual(n, result);
}
}
The Java code converted the number 9876 to [0x94, 0x4D].
The C# code converted the number 9876 to [0x08, 0x94, 0x4D].
How do I do this so that both com.google.protobuf and protobuf-net produce identical output?
The protobuf-net method ProtoBuf.Serializer.Serialize forces a field header (field number 1) into the stream, which is the extra leading 0x08 byte. That is the only way to perform a serialization with it: the method invokes a number of internal methods that are not publicly available.
The solution I'm using is to change the Java code so that it also writes a field header.
Here is my new Java code.
private static byte[] convertLongToByteArray(long value) throws IOException {
int size = CodedOutputStream.computeTagSize(1) + CodedOutputStream.computeInt64SizeNoTag(value);
byte[] buffer = new byte[size];
CodedOutputStream codedOutputStream = CodedOutputStream.newInstance(buffer);
codedOutputStream.writeInt64(1, value);
codedOutputStream.flush();
codedOutputStream.checkNoSpaceLeft();
return buffer;
}
public static long convertByteArrayToLong(byte[] byteArray) throws IOException {
CodedInputStream codedInputStream = CodedInputStream.newInstance(byteArray);
codedInputStream.readTag();
return codedInputStream.readInt64();
}
The changes I've made are:
When computing the required buffer size, include the tag size (CodedOutputStream.computeTagSize).
Instead of CodedOutputStream.writeInt64NoTag, call CodedOutputStream.writeInt64.
When reading back, call CodedInputStream.readTag before calling CodedInputStream.readInt64.
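To see where the extra byte comes from, the two encodings can be reproduced by hand. The following is a minimal sketch that uses no protobuf library at all; the class and method names (VarintSketch, encodeVarint, encodeInt64Field) are made up for illustration. It assumes the standard protobuf wire format, in which an integer is written as a base-128 varint and a field header is the varint of (fieldNumber << 3) | wireType, with wire type 0 for varints.

```java
import java.io.ByteArrayOutputStream;

// Hand-rolled illustration of the protobuf varint wire format (hypothetical names).
class VarintSketch {

    // Encodes a value as a varint: 7 bits per byte, least-significant group first,
    // with the high bit set on every byte except the last one.
    static byte[] encodeVarint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Writes a field header followed by the value.
    // Header = (fieldNumber << 3) | wireType, where wire type 0 means "varint".
    static byte[] encodeInt64Field(int fieldNumber, long value) {
        byte[] tag = encodeVarint(((long) fieldNumber << 3) | 0);
        byte[] payload = encodeVarint(value);
        byte[] result = new byte[tag.length + payload.length];
        System.arraycopy(tag, 0, result, 0, tag.length);
        System.arraycopy(payload, 0, result, tag.length, payload.length);
        return result;
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(toHex(encodeVarint(9876)));        // 94 4D
        System.out.println(toHex(encodeInt64Field(1, 9876))); // 08 94 4D
    }
}
```

The first line matches what writeInt64NoTag produced, and the second matches the protobuf-net output, so the only difference between the two is the one-byte header for field 1.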
I am using Zstd compression in Java for compressing a large JSON payload. I am using methods from the zstd-jni library for Java. I create a byte array out of the JSON string and use this method.
public static byte[] compress(byte[] var0, int var1)
I read that ZSTD will give more optimal results when a dictionary is passed during compression and decompression. How do I create a ZstdDictCompress object? What byte array and integer should I pass to the constructor?
public static long compress(byte[] var0, byte[] var1, ZstdDictCompress var2)
This example is for https://github.com/luben/zstd-jni.
First of all, you need to collect many samples of your JSON documents; one or two samples are not enough. After that you can train your dictionary:
List<String> jsons = ...; // List of your jsons samples
ZstdDictTrainer trainer = new ZstdDictTrainer(1024 * 1024, 16 * 1024); // 16 KB dictionary
for(String json : jsons) {
trainer.addSample(json.getBytes(StandardCharsets.UTF_8));
}
byte[] dictionary = trainer.trainSamples();
Now you have your dictionary as a byte array.
The next step is to use the SAME dictionary to compress and decompress.
// Compress
byte[] json = jsonString.getBytes(StandardCharsets.UTF_8);
ZstdDictCompress zstdDictCompress = new ZstdDictCompress(dictionary, Zstd.defaultCompressionLevel());
byte[] compressed = Zstd.compress(json, zstdDictCompress);
// Tricky part: you have to pass the full (uncompressed) JSON length to the decompress method
int jsonFullLength = json.length;
// Decompress
ZstdDictDecompress zstdDictDecompress = new ZstdDictDecompress(dictionary);
byte[] decompressed = Zstd.decompress(compressed, zstdDictDecompress, jsonFullLength);
String jsonString2 = new String(decompressed, StandardCharsets.UTF_8);
That's all!
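Because the decompress call above needs the original length, that number has to travel with the compressed bytes somehow. One possible way, shown here only as a sketch with hypothetical names (LengthPrefixedFrame, frame, originalLength, payload), is to prepend the length as a 4-byte integer. The sketch uses only java.nio and does not call zstd-jni itself.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper: carries the uncompressed length in front of the compressed payload.
class LengthPrefixedFrame {

    // Builds [4-byte big-endian original length][compressed bytes].
    static byte[] frame(int originalLength, byte[] compressed) {
        return ByteBuffer.allocate(4 + compressed.length)
                .putInt(originalLength)
                .put(compressed)
                .array();
    }

    // Reads the original (uncompressed) length back from the first 4 bytes.
    static int originalLength(byte[] framed) {
        return ByteBuffer.wrap(framed).getInt();
    }

    // Returns the compressed bytes that follow the 4-byte prefix.
    static byte[] payload(byte[] framed) {
        return Arrays.copyOfRange(framed, 4, framed.length);
    }
}
```

With the code above, the sender would transmit frame(json.length, compressed), and the receiver would pass payload(framed) and originalLength(framed) to Zstd.decompress together with the same dictionary.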
I receive the data in 32 KB byte buffers and want to calculate the checksum of the whole data. Using MessageDigest, I keep updating it with the incoming bytes, and at the end I call the digest method to compute the checksum over everything that was read. The checksum calculated this way is wrong. Below is the code. Any idea how to get it right?
private MessageDigest messageDigest;
//Keep getting bytebuffer of 32kb till eof is read
public int write(ByteBuffer src) throws IOException {
try {
ByteBuffer copiedByteBUffer = src.duplicate();
try{
messageDigest = MessageDigest.getInstance(MD5_CHECKSUM);
while(copiedByteBUffer.hasRemaining()){
messageDigest.update(copiedByteBUffer.get());
}
}catch(Exception e){
throw new IOException(e);
}
copiedByteBUffer = null;
}catch(Exception e){
}
}
//called after whole file is read in write function
public void calculateDigest(){
if(messageDigest != null){
byte[] digest = messageDigest.digest();
checkSumMultiPartFile = toHex(digest); // converting bytes into hexadecimal
}
}
Updated try #2
//Will Keep getting bytebuffer of 32kb till eof is read
public int write(ByteBuffer original) throws IOException {
try {
ByteBuffer copiedByteBuffer = cloneByteBuffer(original);
messageDigest = MessageDigest.getInstance(MD5_CHECKSUM);
messageDigest.update(copiedByteBuffer);
copiedByteBUffer = null;
}catch(Exception e){
}
}
public static ByteBuffer cloneByteBuffer(ByteBuffer original) {
final ByteBuffer clone = (original.isDirect()) ? ByteBuffer.allocateDirect(original.capacity()):ByteBuffer.allocate(original.capacity());
final ByteBuffer readOnlyCopy = original.asReadOnlyBuffer();
readOnlyCopy.flip();
clone.put(readOnlyCopy);
clone.position(original.position());
clone.limit(original.limit());
clone.order(original.order());
return clone;
}
After trying the above code, I could see that the message digest was being updated with all the bytes read. For example, if the file size is 52,42,892 bytes, it was updated with 52,42,892 bytes. But the checksum of the file calculated with certutil -hashfile MD5 from CMD does not match the one calculated by the above method.
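One likely cause, judging only from the code shown, is that both versions call MessageDigest.getInstance inside write, so every 32 KB chunk starts a fresh digest and the final value covers only the last chunk. A minimal sketch of the alternative, assuming the digest can be created once per file, is below. The class name ChunkedMd5 and the methods finish and md5Hex are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: one MessageDigest instance shared by all chunks of a file.
class ChunkedMd5 {
    private final MessageDigest messageDigest = newMd5(); // created once, not per chunk

    private static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Feeds one chunk into the running digest and returns the number of bytes consumed.
    int write(ByteBuffer src) {
        ByteBuffer view = src.duplicate(); // own position/limit, so the caller's buffer is untouched
        int count = view.remaining();
        messageDigest.update(view);
        return count;
    }

    // Call once after the last chunk; returns the checksum as lowercase hex.
    String finish() {
        return toHex(messageDigest.digest());
    }

    // One-shot digest of a whole array, used here only for comparison.
    static String md5Hex(byte[] all) {
        return toHex(newMd5().digest(all));
    }

    private static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Feeding the chunks "hello " and "world" through write and then calling finish gives the same hex string as md5Hex over the bytes of "hello world", which is the property the chunked code needs to have.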
I've been googling for almost two days and I still can't figure it out. I have an exercise in which an InputStream is passed as an argument, and I am expected to store whatever is passed and return the count, but I can't figure out how to handle the InputStream properly. I always get an argument error.
Code:
class Subtitles {
int redenBroj;
int vrPocetok;
int vrKraj;
String text;
public Subtitles() {
redenBroj = 0;
vrPocetok = 0;
vrKraj = 0;
text = null;
}
int loadSubtitles(InputStream is) {
}
}
InputStream is an abstract class. Therefore, the implementation of the method int loadSubtitles should not care about how the given InputStream is implemented: it can be anything, as long as it is a type of InputStream.
You can choose from different subclasses of InputStream so that you can test your method with your own data:
FileInputStream -- You can use this type of input stream if you want to stream a file:
File sourceFile = new File("source.txt");
InputStream inputStream = new FileInputStream(sourceFile);
ByteArrayInputStream -- This is used to stream an array of bytes:
byte[] input = "this is an example array".getBytes();
InputStream inputStream = new ByteArrayInputStream(input);
Now that you have built an input stream, you can use it regardless of how it was built:
// Java 9+
byte[] content = inputStream.readAllBytes();
// do something with `content`
// before Java 9
int data = inputStream.read();
while (data != -1) {
// doSomething with `data`
data = inputStream.read(); // read next data
}
inputStream.close(); // or use the try-with-resources syntax
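For text data such as subtitles, reading line by line is often more convenient than reading single bytes. The sketch below assumes the input is UTF-8 text and that "store whatever is passed and return the count" means storing the lines and returning how many were read; the exercise may define it differently. The class name SubtitleLineReader and the method loadLines are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: stores every line of the stream and returns the line count.
class SubtitleLineReader {
    final List<String> lines = new ArrayList<>();

    int loadLines(InputStream is) {
        // try-with-resources closes the reader (and the underlying stream) when done
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines.size();
    }
}
```

The method works the same way whether it is given a FileInputStream or a ByteArrayInputStream, which makes it easy to test with an in-memory sample.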
I'm using the DataOutputStream#writeLong method in Java to write a long to a stream, and I need to read it from C# using the BinaryReader class. The BinaryReader is connected to a NetworkStream that uses the TcpClient socket.
On the Java side, the DataInputStream#readLong method would be used to read the long value sent from the DataOutputStream; in C#, I'm trying to use the BinaryReader class to read this value instead.
Here's the method I have to read a long variable in C#:
public static long ReadLong()
{
return binaryReader.ReadInt64();
}
However, this gives inconsistent results. For example, I sent two longs from Java:
-8328681194717166436 || -5321661121193135183
and when I read them in C# I received the following results:
-7186504045004821876||-5642088012899080778
I can reproduce this every time I run the application.
As you can read in the Java documentation, writeLong writes its output "high byte first", which is also known as big-endian. Meanwhile, the .NET BinaryReader reads data as little-endian. We need something that reverses the bytes:
public class BigEndianBinaryReader : BinaryReader
{
private byte[] a16 = new byte[2];
private byte[] a32 = new byte[4];
private byte[] a64 = new byte[8];
public BigEndianBinaryReader(Stream stream) : base(stream) { }
public override int ReadInt32()
{
a32 = base.ReadBytes(4);
Array.Reverse(a32);
return BitConverter.ToInt32(a32, 0);
}
public Int16 ReadInt16BigEndian()
{
a16 = base.ReadBytes(2);
Array.Reverse(a16);
return BitConverter.ToInt16(a16, 0);
}
public Int64 ReadInt64BigEndian()
{
a64 = base.ReadBytes(8);
Array.Reverse(a64);
return BitConverter.ToInt64(a64, 0);
}
public UInt32 ReadUInt32BigEndian()
{
a32 = base.ReadBytes(4);
Array.Reverse(a32);
return BitConverter.ToUInt32(a32, 0);
}
}
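The mismatch can also be reproduced entirely on the Java side, which may help when checking which end of the connection is misreading the bytes. This is a small sketch with made-up names (EndianSketch, bigEndianBytes, readAsLittleEndian). It uses ByteBuffer to mimic the byte layout that writeLong produces and the way a little-endian reader would interpret it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of the big-endian / little-endian mismatch.
class EndianSketch {

    // The 8 bytes that DataOutputStream.writeLong puts on the wire: high byte first.
    static byte[] bigEndianBytes(long value) {
        return ByteBuffer.allocate(8).order(ByteOrder.BIG_ENDIAN).putLong(value).array();
    }

    // What a little-endian reader makes of those same 8 bytes.
    static long readAsLittleEndian(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    public static void main(String[] args) {
        long sent = 1L;
        long misread = readAsLittleEndian(bigEndianBytes(sent));
        System.out.println(misread);                    // 72057594037927936
        System.out.println(Long.reverseBytes(misread)); // 1
    }
}
```

Reversing the byte order of the misread value restores the original, which is the same thing the Array.Reverse calls in the C# reader do before the bytes are converted.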
I need some advice here.
I'm writing client/server software.
One requirement is to use XML to transfer information between the client and the server. This decision was made so that clients can be written in any language. I know I could use JSON for simpler communication, but I needed a better object-to-text mapping, so I ended up with XML. To reduce the XML size problem, I decided to compress it before sending it over the socket. On top of that, I want to encrypt the data so it will be harder for an interceptor to break. I have read a lot about encryption and encryption types, but for testing purposes I'll just use Base64 and generic compression to join all the technologies together, and then I'll work out the specifics. I found a lot of code and good advice here on Stack Exchange, and now I'm having some trouble.
What I have:
In-memory objects;
Static methods to convert objects to XML in memory;
No use of serialization, because the clients can be written in C, .NET, Perl, etc.;
A need to avoid disk writes to reduce disk I/O, keeping everything in memory;
A need to work with punctuation characters such as á é í ó ú (UTF-8);
A need to work with stronger encryption in the future.
The process I came up with is:
1. Convert object to XML - OK
2. Establish client/server communication - OK
3. Encode XML String to Base64 - OK (no compression)
3.1. Compress XML String and then encode to Base64 - OK
4. Transfer between client/server - OK
5. Decode Base64 String to XML String - OK
5.1. Decode Base64 and uncompress the transferred byte array - NOT OK
So, I can't find where I'm going wrong in the "compress String -> encode it to Base64 -> transmit -> decode Base64 to String -> uncompress it" process. Here is the snippet from my code, "inspired" by Stack Overflow.
I'm using this answer (with .NET compatibility) as a reference to compress/uncompress:
How can I Zip and Unzip a string using GZIPOutputStream that is compatible with .Net?
public class Compressor2 {
public static byte[] compress(String string) throws IOException {
byte[] blockcopy = ByteBuffer
.allocate(4)
.order(java.nio.ByteOrder.LITTLE_ENDIAN)
.putInt(string.length())
.array();
ByteArrayOutputStream os = new ByteArrayOutputStream(string.length());
GZIPOutputStream gos = new GZIPOutputStream(os);
gos.write(string.getBytes());
gos.close();
os.close();
byte[] compressed = new byte[4 + os.toByteArray().length];
System.arraycopy(blockcopy, 0, compressed, 0, 4);
System.arraycopy(os.toByteArray(), 0, compressed, 4, os.toByteArray().length);
compressed = MyBase64.encode(new String(compressed)).getBytes();
return compressed;
}
public static String decompress(byte[] compressed) throws IOException {
compressed = MyBase64.decode(new String(compressed)).getBytes();
System.out.println(compressed);
System.out.println(new String(compressed));
final int BUFFER_SIZE = 32;
//ByteArrayInputStream is = new ByteArrayInputStream(compressed, 4, compressed.length - 4);
ByteArrayInputStream is = new ByteArrayInputStream(compressed);
GZIPInputStream gis = new GZIPInputStream(is, BUFFER_SIZE);
StringBuilder string = new StringBuilder();
byte[] data = new byte[BUFFER_SIZE];
int bytesRead;
while ((bytesRead = gis.read(data)) != -1) {
string.append(new String(data, 0, bytesRead));
}
gis.close();
is.close();
return string.toString();
}
From MyBase64:
public static String encode(String text) throws UnsupportedEncodingException {
byte[] encodedBytes = Base64.encodeBase64(text.getBytes());
return new String(encodedBytes, "UTF-8");
}
public static String decode(String text) throws UnsupportedEncodingException {
byte[] decodedBytes = Base64.decodeBase64(text);
return new String(decodedBytes, "UTF-8");
}
and the test case:
/**
* Test of compress method, of class Compressor2.
*/
@Test
public void testCompress() throws Exception {
System.out.println("compress");
String string = "Hello all, how can I transfer this with punctuation? like á é í ó ú";
byte[] expResult = Compressor2.compress(string);
byte[] result = Compressor2.compress(string);
System.out.write(result);
System.out.print("\n");
System.out.println(new String(result));
assertArrayEquals(expResult, result);
for (byte character : result) {
System.out.print(String.valueOf(character));
}
System.out.print("\n");
for (byte character : result) {
System.out.print((character));
}
System.out.print("\n");
}
/**
* Test of decompress method, of class Compressor2.
*/
@Test
public void testDecompress() throws Exception {
System.out.println("decompress");
String expResult = "Hello all, how can I transfer this with punctuation? like á é í ó ú";
//String expResult = "The lazy dog ate an apple";
byte[] compressed = Compressor2.compress(expResult);
System.out.println(expResult);
System.out.write(compressed);
System.out.print("\n");
System.out.println("going to decompress");
String result = Compressor2.decompress(compressed);
System.out.println(result);
assertEquals(expResult, result);
// TODO punctuation errors?
}
System Out for checking the values:
compress
QwAAAB/vv70IAAAAAAAAAAXvv73vv70N77+9MAwE77+9VX4Adu+/ve+/vTHvv73vv70oFu+/vRMl77+9Mg8lBRUd77+9F++/vW5j77+9AlJd77+977+9RCLvv70Ob2Tvv73vv70G77+977+9Me+/vTPvv73vv71c77+977+9UDkZcSFuxIN4Ee+/vQ/nn5ENSAAAAA==
QwAAAB/vv70IAAAAAAAAAAXvv73vv70N77+9MAwE77+9VX4Adu+/ve+/vTHvv73vv70oFu+/vRMl77+9Mg8lBRUd77+9F++/vW5j77+9AlJd77+977+9RCLvv70Ob2Tvv73vv70G77+977+9Me+/vTPvv73vv71c77+977+9UDkZcSFuxIN4Ee+/vQ/nn5ENSAAAAA==
811196565656647118118554873656565656565656565658811811855511181185548785555435777651196955554357868852651001174347118101434711884721181185551118118554811170117434711882771085555435777103561086682851005555435770434347118875310655554357651087410055554357555543578267761181185548799850841181185551118118554871555543575555435777101434711884801181185551118118554999555543575555435785681079099837011712073785269101434711881471101105369788365656565656161
811196565656647118118554873656565656565656565658811811855511181185548785555435777651196955554357868852651001174347118101434711884721181185551118118554811170117434711882771085555435777103561086682851005555435770434347118875310655554357651087410055554357555543578267761181185548799850841181185551118118554871555543575555435777101434711884801181185551118118554999555543575555435785681079099837011712073785269101434711881471101105369788365656565656161
decompress
Hello all, how can I transfer this with punctuation? like á é í ó ú
QwAAAB/vv70IAAAAAAAAAAXvv73vv70N77+9MAwE77+9VX4Adu+/ve+/vTHvv73vv70oFu+/vRMl77+9Mg8lBRUd77+9F++/vW5j77+9AlJd77+977+9RCLvv70Ob2Tvv73vv70G77+977+9Me+/vTPvv73vv71c77+977+9UDkZcSFuxIN4Ee+/vQ/nn5ENSAAAAA==
going to decompress
[B@19f9bdc4
C
Sorry for the long post; I'm trying to give you something to work with to check where I'm getting this wrong. I appreciate any help.
Edit: I forgot to say that if you remove the MyBase64.decode and MyBase64.encode lines from the Compressor2 class, it works like a charm.
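A probable cause, going by the code and output above, is that the compressed bytes are turned into a String (new String(compressed)) before being Base64-encoded. GZIP output is arbitrary binary, and byte sequences that are not valid in the platform charset get replaced during that conversion. The repeated "77+9" in the printed Base64 is consistent with this, since it decodes to the bytes EF BF BD, the UTF-8 form of the replacement character. Below is a minimal sketch that keeps the data as bytes until it is Base64 text. It assumes java.util.Base64 (Java 8+) instead of the Base64 class used in MyBase64, omits the 4-byte .NET-style length prefix, and uses hypothetical names (GzipBase64Sketch, compressToBase64, decompressFromBase64).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: String -> UTF-8 bytes -> GZIP -> Base64 text, and back.
class GzipBase64Sketch {

    static String compressToBase64(String text) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
                gzip.write(text.getBytes(StandardCharsets.UTF_8));
            }
            // Encode the raw compressed bytes directly; never pass them through a String.
            return Base64.getEncoder().encodeToString(bytes.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String decompressFromBase64(String base64) {
        try {
            byte[] compressed = Base64.getDecoder().decode(base64);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
                byte[] buffer = new byte[4096];
                int n;
                while ((n = gzip.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
            }
            // Convert to String once, after all bytes are collected, so a multi-byte
            // character is never split across two buffer reads.
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The Base64 string is plain ASCII, so it can safely be sent as text over the socket, and the explicit UTF-8 charset on both sides keeps characters such as á é í ó ú intact regardless of the platform default.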