When Jackson maps JSON input to a DTO, it automatically base64-decodes values bound to byte arrays. I want to disable this decoding for one particular byte-array field, because I pass it through as-is to another service via a REST API, and the decoding increases memory usage due to the intermediate structures (encoded byte array, decoded byte array, and another encoded byte array to send to the other service).
Is there an elegant way to achieve this? I debugged a bit to see what the internal code looks like and found the class Base64Variant, which unfortunately is final, so I cannot override its behaviour. I suppose I could copy parts of the internal logic of JsonParser into a custom deserializer to read the input stream (minus the base64 decoding), but I first wanted to ask here whether anyone has a better solution.
This is the same question as this one but for deserialization instead.
I need to parse untrusted Java serialized objects. The data is given to me as a byte array (written at some point by ObjectOutputStream).
I do not want to simply call ObjectInputStream.readObject() and/or load the actual object. I am looking for a way to safely parse the bytes and grab field names & values.
--
Here's a little summary of my attempt so far, after taking a look at the ObjectInputStream procedure for deserializing objects.
I have tried to extract field types/names (as unicode strings) recursively based on expected stream constants. I end up with a list of field names whose values should appear in the byte array in order. I am uneasy about this approach because it is probably buggy, especially when accommodating what seem to be the individual serialization protocols followed by HashMap, ArrayList, etc. But it might work, if I can figure out a way to read the bytes that represent field values:
I can try to read and store primitives based on size/offset, but when I encounter my first object, it gets a bit more complicated -- there is no clear way to distinguish between which bytes are associated with which values anymore (without actually loading the object in the way that ObjectInputStream probably does?).
--
Can anyone suggest either a potential solution that I'm obviously looking past, or a trusted library that can help parse the serialized data without loading objects?
Thank you for reading, and for all comments/suggestions!!! I apologize if something is unclear and I would be happy to clarify if you bear with me.
You can't do this in principle. Any Java class can take over its own Serialization and write arbitrary data to the stream that only it knows how to parse and reconstruct, via code that is only invoked during deserialization.
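To illustrate the point, here is a minimal sketch of a class that takes over its own serialization. The class Opaque and its XOR encoding are made up for this example; the only real API used is the private writeObject/readObject hook pair of java.io serialization. The field is transient, so its name and type never appear in the stream's class descriptor, and the int that is written can only be interpreted by the matching readObject.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical class, for illustration only.
class Opaque implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: default serialization skips this field entirely
    transient int value;

    Opaque(int value) {
        this.value = value;
    }

    // Called by ObjectOutputStream instead of the default field-by-field write.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(value ^ 0x5A5A5A5A); // arbitrary private encoding
    }

    // Only this method knows how to turn the custom data back into state.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        value = in.readInt() ^ 0x5A5A5A5A;
    }
}

class OpaqueDemo {
    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Only safe here because the bytes were produced by this program itself.
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Opaque copy = (Opaque) deserialize(serialize(new Opaque(42)));
        System.out.println(copy.value);
    }
}
```

A parser that does not run Opaque's code can still see that a block of custom data exists, but not what it means.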
How would one go about making a byte[] that is a packet that needs to be run through an RC4 encryption class and then sent to a server?
So let's say I need the packet to start with a string, followed by an int, a byte, an int, and a string. How would I create that as a byte array (byte[])?
Thanks!
OK, so you have a data structure containing strings, integers and bytes, that you want to serialize to a byte array. There are several options:
Create a Serializable class containing all this information as fields, and use an ObjectOutputStream to write it. Beware: the result will only be easily readable by a Java program using the exact same class.
Create a class containing all this information as fields, and use a JSON object mapper (like Jackson) to write it.
Create a class containing all this information as fields, and use an XML object mapper (like JAXB) to write it.
Design a binary representation of this structure that can be transformed back into the individual parts, and use a DataOutputStream to write it.
Use protocol buffers.
...
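As a rough sketch of the DataOutputStream option: the layout below (length-prefixed string, int, byte, int, length-prefixed string) is an assumption based on the question, not a known protocol, and PacketBuilder is a made-up name. writeUTF writes a 2-byte length followed by modified UTF-8; if the server expects a different string encoding, that part has to change.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper: builds string, int, byte, int, string as one byte[].
class PacketBuilder {
    static byte[] build(String first, int a, byte b, int c, String last) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(first); // 2-byte length + modified UTF-8 bytes
            out.writeInt(a);     // 4 bytes, big-endian
            out.writeByte(b);    // 1 byte
            out.writeInt(c);     // 4 bytes, big-endian
            out.writeUTF(last);
        } catch (IOException e) {
            // Not expected when writing to an in-memory buffer.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] packet = build("hi", 1, (byte) 2, 3, "ok");
        System.out.println(packet.length + " bytes");
    }
}
```

The resulting byte[] can then be handed to the RC4 class and the encrypted output written to the socket.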
This is my first question on StackOverflow. Please let me know if the question is not clear and needs any more details.
I have a class which has three attributes like this:
class SampleClass {
long [] field1;
float[] field2;
float[] field3;
}
A huge SampleClass object is built (with about a billion entries in each array). This object is serialized on one host and the serialized file is uploaded to another machine. On that machine I want to deserialize only a portion of the file, so that I get a smaller SampleClass object with about 10 indices filled for each field rather than the complete object, because the machine does not have enough memory to load such a huge object. Is this possible?
The object is serialized using Java's writeObject method by a different utility, so I have no control over it. Thanks in advance.
Forget using the Java serialization API - it's only designed to deserialize everything. If you have no control over how the serialized file is generated, then you should consider parsing the serialized file yourself and extracting the necessary parts - it's not really that hard.
The Java serialization format is well-documented (see e.g. official docs, informative article), and tools exist to parse the format (e.g. Serialysis, jdeserialize) though it isn't particularly hard to write your own tool based on the format spec.
Once you can parse the serialized data, you can simply extract what you need and skip over what you don't need.
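As a very reduced sketch of what "parse it yourself" can look like, the code below reads only the first few elements of a long[] that was written on its own with writeObject. It is an illustration under assumptions, not a general parser: LongArrayPrefixReader is a made-up name, it accepts only a stream whose top-level object is a long[], and it rejects anything else. A real SampleClass stream additionally starts with the object's own class descriptor and may use back-references, so the format spec is still needed for that case.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.UncheckedIOException;

// Hypothetical helper: handles only a stream containing a single top-level long[].
class LongArrayPrefixReader {
    static long[] readPrefix(InputStream raw, int count) {
        try {
            DataInputStream in = new DataInputStream(raw);
            expect(in.readShort() == ObjectStreamConstants.STREAM_MAGIC, "stream magic");
            expect(in.readShort() == ObjectStreamConstants.STREAM_VERSION, "stream version");
            expect(in.readByte() == ObjectStreamConstants.TC_ARRAY, "TC_ARRAY");
            expect(in.readByte() == ObjectStreamConstants.TC_CLASSDESC, "TC_CLASSDESC");
            expect("[J".equals(in.readUTF()), "class name [J (long[])");
            in.readLong();                      // serialVersionUID, not needed here
            in.readByte();                      // class descriptor flags
            expect(in.readShort() == 0, "array classes declare no fields");
            expect(in.readByte() == ObjectStreamConstants.TC_ENDBLOCKDATA, "TC_ENDBLOCKDATA");
            expect(in.readByte() == ObjectStreamConstants.TC_NULL, "no superclass descriptor");
            int length = in.readInt();          // element count of the array
            long[] result = new long[Math.min(count, length)];
            for (int i = 0; i < result.length; i++) {
                result[i] = in.readLong();      // elements are stored big-endian, 8 bytes each
            }
            return result;                      // the rest of the array is never read
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void expect(boolean ok, String what) throws IOException {
        if (!ok) throw new IOException("unexpected stream content, expected " + what);
    }

    // Test helper: produces a stream the same way writeObject would.
    static byte[] serialize(long[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(values);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = serialize(new long[] {10L, 20L, 30L, 40L});
        long[] head = readPrefix(new ByteArrayInputStream(data), 2);
        System.out.println(head.length + " elements read");
    }
}
```

To reach a later array in the same file, the unread elements have to be skipped (element count times element size in bytes) rather than loaded.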
Your best bet is to serialize only the portion you need, given that you cannot control or override the serialization itself. On the machine that serialized the entire file and is able to deserialize it:
1) load entire file into object
2) create new object of SampleClass
3) copy elements from required region in each array to blank SampleClass object
4) serialize this smaller version
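Steps 2 and 3 could look roughly like this. It is only a sketch: the slice method and the from/to parameters are my own additions, and I am assuming SampleClass implements Serializable with the three fields from the question.

```java
import java.io.Serializable;
import java.util.Arrays;

// SampleClass as in the question, with an assumed Serializable marker
// and a hypothetical slice helper added for illustration.
class SampleClass implements Serializable {
    private static final long serialVersionUID = 1L;

    long[] field1;
    float[] field2;
    float[] field3;

    // Copies indices from (inclusive) to to (exclusive) of every array
    // into a new, much smaller object.
    SampleClass slice(int from, int to) {
        SampleClass part = new SampleClass();
        part.field1 = Arrays.copyOfRange(field1, from, to);
        part.field2 = Arrays.copyOfRange(field2, from, to);
        part.field3 = Arrays.copyOfRange(field3, from, to);
        return part;
    }

    public static void main(String[] args) {
        SampleClass big = new SampleClass();
        big.field1 = new long[] {1, 2, 3, 4};
        big.field2 = new float[] {1f, 2f, 3f, 4f};
        big.field3 = new float[] {5f, 6f, 7f, 8f};
        SampleClass small = big.slice(1, 3);
        System.out.println(small.field1.length + " entries per field");
    }
}
```

The small object can then be written with an ObjectOutputStream as usual and shipped to the other machine.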
If it helps any, fields can be made transient so they will not be serialized.
Still, it looks to me as if this object belongs in a database:
It does not fit in memory.
Only a portion of it is required at any given time.
So you could store it on disk and use queries to fetch the required portions.
What are methods to convert data (ints, strings) to bytes in Java? I am looking for methods other than using the Serializable class. I researched and found things like ByteOutputStream.
Can I just parse strings and ints to a byte data type?
Any suggestions?
Have a look at DataInputStream and DataOutputStream. They convert all Java primitive types and strings to bytes and read/write them from/to an underlying InputStream/OutputStream.
If you need to read or write ints, longs, etc. to a file, then these are the classes for you.
If instead you are just interested in how to convert them to bytes for other purposes, have a look at the source code of those classes; they convert to big-endian.
Classes supporting the DataOutput interface will do what you want. Use DataInput to read the stream back to data.
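A small sketch of both directions, using only java.io and String.getBytes. The class name Conversions is made up; the calls themselves (writeInt, readInt, getBytes) are standard.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration.
class Conversions {
    // int -> 4 bytes, most significant byte first (big-endian)
    static byte[] intToBytes(int value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // 4 bytes (big-endian) -> int
    static int bytesToInt(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // String -> bytes; the charset must be agreed on with whoever reads the bytes
    static byte[] stringToBytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = intToBytes(0x01020304);
        System.out.println(raw.length + " bytes, back to int: " + bytesToInt(raw));
    }
}
```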
The standard encoding method used by Java when serializing is, as you suspected, a simple translation of the fields into a byte stream.
Primitives as well as non-transient, non-static referenced objects are encoded into the stream. Every object referenced by the serialized object must also be serialized.
Other languages, such as PHP, serialize to a fairly human-readable format, and some implementations serialize to JSON or XML.
In my view, though, true serialization should be a binary, byte-for-byte representation of the data. That way, all the data can be read back into memory quickly and used as is.
In different languages I need to provide users with a stream of JSON objects with an interface similar to the following:
JSONObject json = stream.nextJSON();
Since it is a stream, each call will block until a full object has been retrieved. This means it makes no sense to try and encapsulate each JSON object inside a big array. An extra layer of structure and processing has to be added to the stream.
I have thought of two options:
Segmenting the stream with the null-termination character.
Writing a primitive parser that understands JSON scope, so it can detect the end of an object.
Each of the above has a number of potential issues to discuss: How will null-termination interact with the file system, sockets or underlying streams in C++, Java and other languages? What edge cases would we need to take into account when parsing? (Different types of quote symbol might confuse a parser, for example.) Furthermore, there might be alternatives to the two above.
So the question is: What is the best way to provide a JSON InputStream?
Well Google already thought about it apparently:
http://sites.google.com/site/gson/streaming
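For comparison, the second option from the question (a primitive scope-aware splitter) can be sketched in a few lines. This is not how Gson does it; JsonObjectSplitter is a made-up class that assumes the stream is a sequence of top-level JSON objects and returns each one as raw text for a real parser to handle. Valid JSON strings are delimited only by double quotes, so the splitter needs to track just double quotes and backslash escapes.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper: cuts a character stream into top-level JSON objects.
class JsonObjectSplitter {
    private final Reader reader;

    JsonObjectSplitter(Reader reader) {
        this.reader = reader;
    }

    // Blocks until one complete object has been read.
    // Returns its text, or null if the input ends first.
    String next() {
        StringBuilder text = new StringBuilder();
        int depth = 0;
        boolean inString = false;
        boolean escaped = false;
        try {
            int c;
            while ((c = reader.read()) != -1) {
                char ch = (char) c;
                if (depth == 0 && ch != '{') {
                    continue; // whitespace or separators between objects
                }
                text.append(ch);
                if (inString) {
                    if (escaped) {
                        escaped = false;          // character after a backslash
                    } else if (ch == '\\') {
                        escaped = true;
                    } else if (ch == '"') {
                        inString = false;         // closing quote
                    }
                } else if (ch == '"') {
                    inString = true;
                } else if (ch == '{') {
                    depth++;
                } else if (ch == '}' && --depth == 0) {
                    return text.toString();       // end of the top-level object
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return null;
    }

    public static void main(String[] args) {
        JsonObjectSplitter splitter =
                new JsonObjectSplitter(new StringReader("{\"a\":\"}\"} {\"b\":{\"c\":1}}"));
        for (String object = splitter.next(); object != null; object = splitter.next()) {
            System.out.println(object);
        }
    }
}
```

Braces inside strings are ignored and nested objects are counted, so no terminator character is needed between objects.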