I currently need to serialize arbitrary Java objects, since I would like to use their hash as a key for a hash table. After reading various warnings that the default hashCode creates collisions way too often, I wanted to switch to hashing via MessageDigest in order to use alternative algorithms (e.g. SHA-1, ...) that are said to allow more entries without collisions. [As a side note: I am aware that even here collisions can occur early on, yet I want to increase the likelihood of remaining collision free.]
To achieve this I tried a method proposed in this StackOverflow post. It uses the following code to obtain a byte[] necessary for MessageDigest:
public static byte[] convertToHashableByteArray(Object obj) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    ObjectOutput out = null;
    byte[] byteOutput = null;
    try {
        out = new ObjectOutputStream(bos);
        out.writeObject(obj);
        byteOutput = bos.toByteArray();
    } catch (IOException io) {
        io.printStackTrace();
    } finally {
        try {
            if (out != null) { out.close(); }
        } catch (IOException io) {
            io.printStackTrace();
        }
        try {
            bos.close();
        } catch (IOException io) {
            io.printStackTrace();
        }
    }
    return byteOutput;
}
This, however, causes the problem that only objects implementing the Serializable interface will be serialized/converted into a byte[]. To circumvent this issue, I applied toString() to the given obj in the catch clause to enforce getting a byte[] in all cases:
public static byte[] convertToHashableByteArray(Object obj) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    ObjectOutput out = null;
    byte[] byteOutput = null;
    try {
        out = new ObjectOutputStream(bos);
        out.writeObject(obj);
        byteOutput = bos.toByteArray();
    } catch (IOException io) {
        String stringed = obj.toString();
        byteOutput = stringed.getBytes();
    } finally {
        try {
            if (out != null) { out.close(); }
        } catch (IOException io) {
            io.printStackTrace();
        }
        try {
            bos.close();
        } catch (IOException io) {
            io.printStackTrace();
        }
    }
    return byteOutput;
}
However, this still feels utterly wrong to me. So my question is whether there is a better alternative for converting arbitrary objects to byte[] in order to compute hashes. Preferably a solution that works without additional libraries, or one using well-established libraries like Apache Commons.
(Besides that, I am also open to other approaches for obtaining SHA-1/SHA-512 hashes of arbitrary Java objects.)
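For reference, the digest step itself might look like this minimal sketch (digestOf is a hypothetical helper name; SHA-256 is just one example algorithm accepted by MessageDigest):
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Minimal sketch of the digest step: hash the serialized bytes.
public static byte[] digestOf(byte[] serialized) throws NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("SHA-256");
    return md.digest(serialized);
}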
Perhaps you can use UUIDs for your objects as immutable unique identifiers?
There are so many things wrong here...
You should have proper key classes with equals and hashCode implemented, instead of using random objects (see the sketch after this list).
Serialization performance overhead can easily mean that such a map will be slower than even a trivial iteration search.
The default hashCode should not be used in most cases, as it might differ for objects which are 'equal' from a business point of view. You should reimplement hashCode together with equals (which comes back to point 1). Whether it has collisions due to pointer aliasing is irrelevant if it won't work properly.
Way overcomplicated method of closing in-memory streams. Just close them one after another; they are not external resources - if closing fails, just let it fail, you don't need to close everything 100% in case of failures. You can also use one of the closeable utilities (or try-with-resources) to avoid some boilerplate.
You don't need a complicated digest of that byte array - use Arrays.hashCode; it WILL be good enough for your use case (remember - don't do it anyway, see point 1).
If you are still reading and still not willing to implement point 1, go back to point 1. And again. And again.
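A minimal sketch of point 1, using a hypothetical PersonKey class; the fields chosen for equals and hashCode are exactly those that define business-level equality:
import java.util.Objects;

// Hypothetical key class: equals/hashCode derive from the fields
// that define equality from the business point of view.
public final class PersonKey {
    private final String name;
    private final int id;

    public PersonKey(String name, int id) {
        this.name = name;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PersonKey)) return false;
        PersonKey other = (PersonKey) o;
        return id == other.id && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, id);
    }
}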
And to finally answer your question: use Hessian serialization.
http://hessian.caucho.com/doc/hessian-overview.xtp
It is very similar to the Java one, just faster and with shorter output, and it allows serializing objects which do not implement the Serializable interface (at the risk of messing things up; you need to set a special flag to allow that).
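A sketch of what that might look like, assuming Hessian 2's Hessian2Output and SerializerFactory.setAllowNonSerializable; these API names are an assumption to verify against the Hessian documentation:
import com.caucho.hessian.io.Hessian2Output;
import com.caucho.hessian.io.SerializerFactory;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Sketch (unverified API names): serialize with Hessian, allowing
// classes that do not implement Serializable via the special flag.
public static byte[] hessianBytes(Object obj) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    Hessian2Output out = new Hessian2Output(bos);
    SerializerFactory factory = new SerializerFactory();
    factory.setAllowNonSerializable(true); // the flag mentioned above
    out.setSerializerFactory(factory);
    out.writeObject(obj);
    out.close();
    return bos.toByteArray();
}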
If you want to serialize a given object, I suggest you change your method like this:
public static byte[] convertToHashableByteArray(Serializable obj) {
    ..........
    ..........
}
Related
Suppose we have some binary data byte[] data that only contains Integers. If I wanted to read this data utilizing a DataInputStream, the only approach I can come up with is the following:
DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
try {
    while (true) {
        int i = in.readInt();
    }
} catch (EOFException e) {
    // we're done!
} catch (IOException e) {
    throw new RuntimeException(e);
}
What bugs me about this is that reaching the end of the stream is expected, and it would be exceptional only if no exception were thrown, which IMO defeats the purpose of exceptions in the first place.
When using Java NIO's IntBuffer, there's no such problem.
IntBuffer in = ByteBuffer.wrap(data).asIntBuffer();
while (in.hasRemaining()) {
    int i = in.get();
}
Coming from C# and being in the process of learning Java, I refuse to believe that this is the intended way of doing this.
Moreover, I just came across Java NIO, which seems to be "quite new". Using IntBuffer here instead would just be my way of sidestepping the matter. Regardless, I want to know how this is properly done in Java.
You can't. readInt() can return any integer value, so an out-of-band mechanism is required to signal the end of the stream; hence an exception is thrown. That's how the API was designed. Nothing you can do about it.
Since you are coming from .NET, Java's DataInputStream is roughly equivalent to BinaryReader of .NET.
Just like its .NET equivalent, the DataInputStream class and its main interface, DataInput, have no provision for determining whether a primitive of any given type is available for retrieval at the current position of the stream.
You can gain valuable insight into how the designers of the API expect you to use it by looking at the designers' own usage of the API.
For example, look at the ObjectInputStream.java source, which is used for object deserialization. The code that reads arrays of various types calls the type-specific readXYZ methods of DataInput in a loop. In order to figure out where the primitives end, the code first retrieves the number of items (line 1642):
private Object readArray(boolean unshared) throws IOException {
    if (bin.readByte() != TC_ARRAY) {
        throw new InternalError();
    }
    ObjectStreamClass desc = readClassDesc(false);
    int len = bin.readInt();
    ...
    if (ccl == Integer.TYPE) {
        bin.readInts((int[]) array, 0, len);
        ...
    }
    ...
}
Above, bin is a BlockDataInputStream, which is another implementation of the DataInput interface. Note how len, the number of items in the array as recorded by the serializing counterpart, is passed to readInts, which calls readInt in a loop len times (line 2918).
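Applying the same idea to the original example: if the writer prefixes the data with the number of ints (an assumption about the format; the raw byte[] alone does not carry this), the reader never needs to rely on EOFException:
// Sketch, assuming a length-prefixed format: the writer stored the
// int count first, so the reader knows exactly how many ints follow.
DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
int count = in.readInt();
int[] values = new int[count];
for (int i = 0; i < count; i++) {
    values[i] = in.readInt();
}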
Context
I am making a simulation of behavioral evolution. For this I am using neural networks to simulate behavior. There are literally thousands of these neural networks interacting with each other. At the end of any given generation, the strongest behavior is copied over to its neighbors. When this happens, I need to create an identical yet independent copy of the stronger neural network to replace each neighbor's network.
The Problem
I have looked into deep cloning; it works, but it is copying node references rather than creating new identical instances. The difficulty comes from the system structure, and I can't see how to improve it.
Speed is a factor here as well. I need this to run millions of iterations a week.
Any help would be greatly appreciated.
System structure
The cell
public class Cell_NN extends Cell
{
    private Network network;
    // Methods
}
The network
public class Network implements Cloneable, Serializable
{
    private ArrayList<ArrayList<Node>> net;
    private ArrayList<Node> layer;
    // Methods
}
The nodes
public class Node implements Cloneable, Serializable
{
    private ArrayList<Node> nextNodes;
    private ArrayList<Float> weights;
    // Methods
}
The deep clone (which I took from someone on StackOverflow):
public Network deepClone()
{
    try {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(baos);
        oos.writeObject(this);
        ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
        ObjectInputStream ois = new ObjectInputStream(bais);
        Network network = (Network) ois.readObject();
        return (Network) ois.readObject();
    } catch (IOException e) {
        return null;
    } catch (ClassNotFoundException e) {
        return null;
    }
}
I am unsure whether this is allowed, but here is my GitHub in case you would like more information: Napier40124399. The project is called HonorsMain_v2 and is public.
I think your code contains just a small bug:
Network network = (Network) ois.readObject();
return (Network) ois.readObject();
Here you read the network twice from the ObjectInputStream and return the result of the second read. Since the stream contains only one serialized object, the second readObject() fails, so the method falls into the catch block and returns null. If you remove the first line, it should work.
The object is deep-cloned, and all references in the new Network instance are correctly wired to each other, not to the original objects.
However, performance-wise this serialization/deserialization is pretty expensive. I recommend finding a structure that is faster to process if performance is of the essence.
For instance, two flat arrays per network, one with the weights (as primitive floats or even ints) and one with int indices of the next nodes, would be at least an order of magnitude faster to copy using System.arraycopy().
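A minimal sketch of that idea, with hypothetical field names; copying the whole network state becomes two System.arraycopy calls:
// Hypothetical flat representation: weights[i] is the weight of edge i,
// nextNode[i] is the index of the node that edge leads to.
public final class FlatNetwork {
    float[] weights;
    int[] nextNode;

    public FlatNetwork copy() {
        FlatNetwork clone = new FlatNetwork();
        clone.weights = new float[weights.length];
        clone.nextNode = new int[nextNode.length];
        System.arraycopy(weights, 0, clone.weights, 0, weights.length);
        System.arraycopy(nextNode, 0, clone.nextNode, 0, nextNode.length);
        return clone;
    }
}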
This part of the deep clone code looks odd to me:
Network network = (Network) ois.readObject();
return (Network) ois.readObject();
Reading the object twice, and only returning the 2nd one... I would expect an exception here.
Motivation:
To aid in remote debugging (Java), it's useful to be able to ask remote servers to send over arbitrary objects to my local machine for inspection. However, this means that the remote server must be able to serialize an arbitrary Java object that is not known in advance at runtime.
In particular, I would like to be able to serialize even those objects which don't implement Serializable. I stumbled upon JBossSerialization, which claimed that with JBossSerialization...
...You can serialize classes that are not implementing Serializable
Great! And even better, I managed to find the code that supposedly demonstrates how to do this.
Problem
So pinching the code from schabell.org, I wrote a quick test to check that I could serialize and deserialize without problems:
import org.jboss.serial.io.JBossObjectInputStream;
import org.jboss.serial.io.JBossObjectOutputStream;
import java.io.*;
class MyObj { // Test class which doesn't implement Serializable
    public int x;
    MyObj(int x) { this.x = x; }
}
public class SerializationTest {
    public static void main(String[] args) {
        MyObj obj = new MyObj(1);
        byte[] byteArray = getByteArrayFromObject(obj);            // Try to serialize
        MyObj result = (MyObj) getObjectFromByteArray(byteArray);  // Try to deserialize
        System.out.println(result.x);
    }

    // Code that I pinched from the website below (http://www.schabell.org/2009/03/jboss-serialization-simple-example.html):
    public static Object getObjectFromByteArray(byte[] bytes) {
        Object result = null;
        try {
            ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
            ObjectInputStream ois = new JBossObjectInputStream(bais);
            result = ois.readObject(); // ERROR HERE!!!
            ois.close();
        } catch (IOException ioEx) {
            ioEx.printStackTrace();
        } catch (ClassNotFoundException cnfEx) {
            cnfEx.printStackTrace();
        }
        return result;
    }

    public static byte[] getByteArrayFromObject(Object obj) {
        byte[] result = null;
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new JBossObjectOutputStream(baos);
            oos.writeObject(obj);
            oos.flush();
            oos.close();
            baos.close();
            result = baos.toByteArray();
        } catch (IOException ioEx) {
            ioEx.printStackTrace();
        }
        return result;
    }
}
The problem is that the test failed. Debugging indicated that I could only serialize, but not deserialize. The call to ois.readObject() on line 26 is the culprit and gives a SerializationException:
org.jboss.serial.exception.SerializationException: Could not create instance of MyObj - MyObj
    at org.jboss.serial.classmetamodel.ClassMetaData.newInstance(ClassMetaData.java:342)
    at org.jboss.serial.persister.RegularObjectPersister.readData(RegularObjectPersister.java:239)
    at org.jboss.serial.objectmetamodel.ObjectDescriptorFactory.readObjectDescriptionFromStreaming(ObjectDescriptorFactory.java:412)
    at org.jboss.serial.objectmetamodel.ObjectDescriptorFactory.objectFromDescription(ObjectDescriptorFactory.java:82)
    at org.jboss.serial.objectmetamodel.DataContainer$DataContainerDirectInput.readObject(DataContainer.java:643)
    at org.jboss.serial.io.JBossObjectInputStream.readObjectOverride(JBossObjectInputStream.java:163)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:364)
    at SerializationTest.getObjectFromByteArray(SerializationTest.java:44)
    at SerializationTest.main(SerializationTest.java:15)
Caused by: java.lang.InstantiationException: MyObj
    at java.lang.Class.newInstance(Class.java:359)
    at org.jboss.serial.classmetamodel.ClassMetaData.newInstance(ClassMetaData.java:334)
    ... 8 more
Does anyone know what's going wrong here and how I can get round this?
Or indeed if JBossSerialization isn't the right tool for this, what is?
Edit:
As @Dima points out, the SerializationException is caused by the lack of a public default constructor in the MyObj class. However, adding a default constructor to MyObj isn't an option, as I'd like to be able to serialize arbitrary objects, including those without a default constructor.
Well, it is actually impossible to do what you want in a way that is both safe and universal.
You can take a look at Kryo, as someone suggested in the comments as well. It does have a way to instantiate objects without invoking a constructor, but it is off by default, and there is a good reason for that.
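For what it's worth, a sketch of turning that behaviour on; the API names below are from Kryo 4 and vary between versions, so treat them as an assumption to verify against your version's documentation. StdInstantiatorStrategy comes from the Objenesis library and creates instances without calling any constructor:
import com.esotericsoftware.kryo.Kryo;
import org.objenesis.strategy.StdInstantiatorStrategy;

// Sketch (Kryo 4-style API, unverified for other versions): try the
// no-arg constructor first, then fall back to constructor-less
// instantiation for classes that have none.
Kryo kryo = new Kryo();
kryo.setInstantiatorStrategy(
        new Kryo.DefaultInstantiatorStrategy(new StdInstantiatorStrategy()));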
Consider this for example:
public class CanonicalObject {
    public static HashMap<String, CanonicalObject> canons = new HashMap<>();
    public String name;

    private CanonicalObject(String name) {
        this.name = name;
        canons.put(name, this);
    }

    public static synchronized CanonicalObject getCanonicalInstance(String name) {
        CanonicalObject co = canons.get(name);
        return co == null ? new CanonicalObject(name) : co;
    }
}
(This is a "semi-real-life" example, in that there are real uses for this pattern. I am aware of the "memory leak"; there are ways to avoid it in real applications, but they are irrelevant to this example, so I am just ignoring that issue for the sake of simplicity.)
If you serialize an instance of this object, then when you deserialize it on the other end, the whole "canonicalization" part will be skipped, which can cause subtle problems in the application that are really hard to diagnose, such as comparisons like if (canon1 != canon2) fireMissile() resulting in "friendly fire" and, possibly, World War III.
Note that the problem here is broader than just a constructor not being invoked by deserialization: the canons.put call could very well be moved into getCanonicalInstance() instead of the constructor, and that would present the same problem even if the constructor were invoked.
This is an illustration of why, as a matter of policy, you should not serialize objects that are not designed to be serialized. It can sometimes work, but when it does not, it results in situations that are really hard to detect, and usually even harder to fix.
Consider a generic byte reader implementing the following simple API to read an unspecified number of bytes from a data structure that is otherwise inaccessible:
public interface ByteReader
{
    public byte[] read() throws IOException; // Returns null only at EOF
}
How could the above be efficiently converted to a standard Java InputStream, so that an application using all methods defined by the InputStream class, works as expected?
A simple solution would be subclassing InputStream to
Call the read() method of the ByteReader as much as needed by the read(...) methods of the InputStream
Buffer the bytes retrieved in a byte[] array
Return part of the byte array as expected, e.g., 1 byte at a time whenever the InputStream read() method is called.
However, this requires more work to be efficient (e.g., avoiding multiple byte array allocations). Also, for the application to scale to large input sizes, reading everything into memory and then processing it is not an option.
Any ideas or open source implementations that could be used?
Create multiple ByteArrayInputStream instances around the returned arrays and use them in a stream that provides for concatenation. You could, for instance, use SequenceInputStream for this.
The trick is to implement an Enumeration<ByteArrayInputStream> that can use the ByteReader class.
EDIT: I've implemented this answer, but it is probably better to create your own InputStream instance instead. Unfortunately, this solution does not let you handle IOException gracefully.
// byteReader refers to the ByteReader instance from the enclosing scope.
final Enumeration<ByteArrayInputStream> basEnum = new Enumeration<ByteArrayInputStream>() {
    ByteArrayInputStream bais;
    boolean ended;

    @Override
    public boolean hasMoreElements() {
        if (ended) {
            return false;
        }
        if (bais == null) {
            getNextBA();
            if (ended) {
                return false;
            }
        }
        return true;
    }

    @Override
    public ByteArrayInputStream nextElement() {
        if (ended) {
            throw new NoSuchElementException();
        }
        if (bais.available() != 0) {
            return bais;
        }
        getNextBA();
        return bais;
    }

    private void getNextBA() {
        byte[] next;
        try {
            next = byteReader.read();
        } catch (IOException e) {
            throw new IllegalStateException("Issues reading byte arrays");
        }
        if (next == null) {
            ended = true;
            return;
        }
        this.bais = new ByteArrayInputStream(next);
    }
};
SequenceInputStream sis = new SequenceInputStream(basEnum);
I assume, by your use of "convert", that a replacement is acceptable.
The easiest way to do this is to just use a ByteArrayInputStream, which already provides all the features you are looking for (but must wrap an existing array), or to use any of the other already provided InputStream for reading data from various sources.
It seems like you may be running the risk of reinventing wheels here. If possible, I would consider scrapping your ByteReader interface entirely, and instead going with one of these options:
Replace it with ByteArrayInputStream.
Use the various other InputStream classes (depending on the source of the data).
Extend InputStream with your custom implementation (see the sketch after this list).
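A minimal sketch of the third option, assuming the ByteReader interface from the question; it buffers only the most recently returned chunk, so nothing beyond a single array is held in memory:
import java.io.IOException;
import java.io.InputStream;

// Sketch: adapts a ByteReader to an InputStream, one chunk at a time.
public class ByteReaderInputStream extends InputStream {
    private final ByteReader reader;
    private byte[] chunk;   // current buffer, null until first read
    private int pos;        // next unread position within chunk
    private boolean eof;

    public ByteReaderInputStream(ByteReader reader) {
        this.reader = reader;
    }

    // Ensures chunk has at least one unread byte; returns false at EOF.
    private boolean ensureData() throws IOException {
        while (!eof && (chunk == null || pos >= chunk.length)) {
            chunk = reader.read();  // null only at EOF, per the interface
            pos = 0;
            if (chunk == null) {
                eof = true;
            }
        }
        return !eof;
    }

    @Override
    public int read() throws IOException {
        if (!ensureData()) {
            return -1;
        }
        return chunk[pos++] & 0xFF; // unsigned byte as int
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (!ensureData()) {
            return -1;
        }
        int n = Math.min(len, chunk.length - pos);
        System.arraycopy(chunk, pos, b, off, n);
        pos += n;
        return n;
    }
}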
I'd stick to the existing InputStream class everywhere. I have no idea how your code is structured but you could, for example, add a getInputStream() method to your current data sources, and have them return an appropriate already-existing InputStream (or a custom subclass if necessary).
By the way, I recommend avoiding the term Reader in your own I/O classes, as Reader is already heavily used in the Java SDK to indicate stream readers that operate on encoded character data (as opposed to InputStream, which generally operates on raw byte data).
I am currently working on a program that takes user data (name, address, phone). I want to make the data unreadable when I save it. Since it won't be a released program, I don't want to use a cipher that is too complicated; just a simple and quick scramble/unscramble algorithm.
How about defining a class (let's say Foo) that can hold the data, creating instances of it, and putting the objects into a list? After that, save the list object as a binary file. Something like this:
public static void saveObject(List<Foo> obj, String filePath) throws IOException
{
    // try-with-resources closes the stream even if writing fails;
    // ObjectOutputStream (not the plain OutputStream supertype)
    // is required for writeObject
    try (ObjectOutputStream os = new ObjectOutputStream(new FileOutputStream(filePath)))
    {
        os.writeObject(obj);
    }
}
Then you can load it like:
public static List<Foo> loadObject(String filePath) throws IOException, ClassNotFoundException
{
    // ObjectInputStream is required for readObject
    try (ObjectInputStream is = new ObjectInputStream(new FileInputStream(filePath)))
    {
        @SuppressWarnings("unchecked")
        List<Foo> obj = (List<Foo>) is.readObject();
        return obj;
    }
}
Note that Foo needs to implement the Serializable interface for this object serialization to work.
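A minimal sketch of such a Foo (the field names are just an illustration):
import java.io.Serializable;

// Hypothetical data holder; implements Serializable so that it can be
// written with ObjectOutputStream as shown above.
public class Foo implements Serializable
{
    private String name;
    private String address;
    private String phone;
    // constructor, getters and setters omitted for brevity
}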
You just need to understand that it's a matter of the strength of your algorithm, because according to your post you do need some way of encryption/decryption.
You can think of a lot of ways to "hide" your data, but this "defense" can be broken relatively easily... So it's up to you to decide.
The easiest implementations are naive ones like these (a sketch of the second follows):
- substitute each letter in your string with some number;
- play with the ASCII values of the letters in your string.
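As an illustration of the second idea, a minimal sketch using a fixed XOR key (trivially reversible, and emphatically not secure):
// Naive scramble: XOR each character with a fixed key.
// Applying the same method again restores the original string.
public static String scramble(String input)
{
    char key = 0x5A; // arbitrary fixed key - illustration only
    StringBuilder sb = new StringBuilder(input.length());
    for (char c : input.toCharArray())
    {
        sb.append((char) (c ^ key));
    }
    return sb.toString();
}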
Again, the real question here is whether that is good enough.
From the point of view of "security" - definitely not.