Hazelcast Java Serialization/Deserialization ArrayList Pitfall

I've switched from Memcached to Hazelcast. After a while I noticed that the size of the cache was bigger than usual (checked with Management Center).
So I did the following:
1. Before calling IMap.set(key, value), where the value is an ArrayList, I serialize the value to a file, which comes out at 128 KB.
2. After IMap.set() is called, I IMap.get() the same entry and serialize it again; suddenly the file is 6 MB.
The object in question contains many objects that are referenced multiple times within the same structure.
I opened the two binary files and saw that the 6 MB file contains a lot of duplicated data; the serialization used by Hazelcast somehow makes copies of the repeated references.
All the classes instantiated for the cache are Serializable, except enums.
Using Memcached, the value size is 128 KB in both cases.
I tried Kryo with Hazelcast and there was not really a difference; it was still over 6 MB.
Has anyone had a similar problem with Hazelcast? If so, how did you solve it without changing the cache provider?
I can provide the object structure and try to reproduce it with non-sensitive data if someone needs it.

I am not pretending this is the definitive answer, but after a lost day I finally came up with a workaround. I cannot say whether it is a feature or a problem to report.
Anyway, in Hazelcast, if you put an ArrayList value into an IMap, it is serialized entry by entry. This means that if we have 100 entries referencing the same 6 KB instance A, we end up with 600 KB in Hazelcast. Here is short raw code that demonstrates my answer.
To work around or avoid this with Java serialization, wrap the ArrayList in another object; that does the trick. A sketch of such a wrapper follows the test below.
(This applies only to Serializable, not the other serialization implementations.)
@Test
public void start() throws Exception {
    HazelcastInstance client = produceHazelcastClient();
    Data data = new Data();
    ArrayList<Data> datas = new ArrayList<>();
    IntStream.range(0, 1000).forEach(i -> {
        datas.add(data);
    });
    writeFile(datas, "DataLeoBefore", "1");
    client.getMap("data").put("LEO", datas);
    Object redeserialized = client.getMap("data").get("LEO");
    writeFile(redeserialized, "DataLeoAfter", "1");
}

public void writeFile(Object value, String key, String fileName) {
    try {
        Files.write(Paths.get("./" + fileName + "_" + key),
                SerializationUtils.serialize((ArrayList) value));
    } catch (IOException e) {
        e.printStackTrace();
    }
}
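For reference, a minimal sketch of the wrapper workaround described above; the class and field names are made up, only the idea (a single Serializable holder around the list) comes from the answer:

import java.io.Serializable;
import java.util.ArrayList;

// Hypothetical holder: store this as the IMap value instead of the raw ArrayList,
// so plain Java serialization preserves the shared references inside it.
public class DataListHolder implements Serializable {

    private static final long serialVersionUID = 1L;

    private final ArrayList<Data> datas;

    public DataListHolder(ArrayList<Data> datas) {
        this.datas = datas;
    }

    public ArrayList<Data> getDatas() {
        return datas;
    }
}

Usage would then be client.getMap("data").put("LEO", new DataListHolder(datas)); instead of putting the list directly.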

Hazelcast can be configured to use several different serialization schemes; Java serialization (the default) is the least efficient in terms of both time and space. Typically choosing the right serialization strategy gives a bigger payoff than almost any other optimization you could do.
The reference manual gives a good overview of the different serialization schemes and the tradeoffs involved.
IMDG Reference Manual v3.11 - Serialization
I typically would go with IdentifiedDataSerializable if my application is all Java, or Portable if I needed to support cross-language clients or object versioning.
If you need to use Java serialization for some reason, you might check and verify that the shared-object property is set to true to avoid creating multiple copies of the same object. (That property can be set via the serialization section of hazelcast.xml, or programmatically through the SerializationConfig object.)
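For illustration, a minimal programmatic sketch, assuming a Hazelcast 3.x member Config; setEnableSharedObject is the SerializationConfig counterpart of the shared-object setting mentioned above:

import com.hazelcast.config.Config;
import com.hazelcast.config.SerializationConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class SharedObjectConfigExample {
    public static void main(String[] args) {
        Config config = new Config();
        SerializationConfig serializationConfig = config.getSerializationConfig();
        // Keep repeated references shared during Java serialization instead of writing copies.
        serializationConfig.setEnableSharedObject(true);
        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
    }
}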

Related

How to serialise/deserialise long[] value with get/set on random indices using Chronicle Map?

I am new to chronicle-map. I am trying to model an off-heap map using chronicle-map where the key is a primitive short and the value is a primitive long array. The max size of the long array value is known for a given map. However, I will have multiple maps of this kind, each of which may have a different max size for the long array value. My question relates to the serialisation/deserialisation of the key and value.
From reading the documentation I understand that for the key I can use the value type ShortValue and reuse the instance of the implementation of that interface. Regarding the value, I found the page talking about DataAccess and SizedReader, which gives an example for byte[], but I'm unsure how to adapt this to a long[]. One additional requirement I have is that I need to get and set values at arbitrary indices in the long array without paying the cost of a full serialisation/deserialisation of the entire value each time.
So my question is: how can I model the value type when constructing the map and what serialisation/deserialisation code do I need for a long[] array if the max size is known per map and I need to be able to read and write random indices without serialising/deserialising the entire value payload each time? Ideally the long[] would be encoded/decoded directly to/from off heap without undergoing an on heap intermediate conversion to a byte[] and also the chronicle-map code would not allocate at runtime. Thank you.
First, I recommend using some kind of LongList interface abstraction instead of long[]; it will make it easier to deal with size variability, provide alternative flyweight implementations, etc.
If you want to read/write just single elements in large lists, you should use the advanced contexts API:
/** This method is entirely garbage-free, deserialization-free, and thread-safe. */
void putOneValue(ChronicleMap<ShortValue, LongList> map, ShortValue key, int index,
                 long element) {
    if (index < 0) throw new IndexOutOfBoundsException(...);
    try (ExternalMapQueryContext<ShortValue, LongList, ?> c = map.getContext(key)) {
        c.writeLock().lock(); // (1)
        MapEntry<ShortValue, LongList> entry = c.entry();
        if (entry != null) {
            Data<LongList> value = entry.value();
            BytesStore valueBytes = (BytesStore) value.bytes(); // (2)
            long valueBytesOffset = value.offset();
            long valueBytesSize = value.size();
            int valueListSize = (int) (valueBytesSize / Long.BYTES); // (3)
            if (index >= valueListSize) throw new IndexOutOfBoundsException(...);
            valueBytes.writeLong(valueBytesOffset + ((long) index) * Long.BYTES,
                    element);
            ((ChecksumEntry) entry).updateChecksum(); // (4)
        } else {
            // there is no entry for the given key
            throw ...
        }
    }
}
Notes:
1. You must acquire writeLock() from the beginning, because otherwise readLock() is acquired automatically when you call the context.entry() method, and you won't be able to upgrade the read lock to a write lock later. Please read the HashQueryContext javadoc carefully.
2. Data.bytes() formally returns RandomDataInput, but you can be sure (it's specified in the Data.bytes() javadoc) that it's actually an instance of BytesStore (which is a combination of RandomDataInput and RandomDataOutput).
3. This assumes proper SizedReader and SizedWriter (or DataAccess) are provided. Note that the "bytes/element joint size" technique is used, the same as in the example given in the SizedReader and SizedWriter doc section, PointListSizeMarshaller. You could base your LongListMarshaller on that example class.
4. This cast is specified; see the ChecksumEntry javadoc and the section about checksums in the doc. If you have a purely in-memory (not persisted) Chronicle Map, or have turned checksums off, this call can be omitted.
Implementation of single element read is similar.
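For illustration, a minimal sketch of that read counterpart, mirroring the write example above (same imports and assumptions; the exception messages are my own):

/** Sketch of a single-element read; like the write, it is deserialization-free. */
long getOneValue(ChronicleMap<ShortValue, LongList> map, ShortValue key, int index) {
    if (index < 0) throw new IndexOutOfBoundsException("index: " + index);
    try (ExternalMapQueryContext<ShortValue, LongList, ?> c = map.getContext(key)) {
        // No explicit lock here: the read lock is acquired automatically
        // when c.entry() is called (see note 1 above).
        MapEntry<ShortValue, LongList> entry = c.entry();
        if (entry == null) {
            throw new IllegalStateException("no entry for the given key");
        }
        Data<LongList> value = entry.value();
        long valueBytesOffset = value.offset();
        int valueListSize = (int) (value.size() / Long.BYTES);
        if (index >= valueListSize) throw new IndexOutOfBoundsException("index: " + index);
        return value.bytes().readLong(valueBytesOffset + ((long) index) * Long.BYTES);
    }
}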
Answering extra questions:
I've implemented a SizedReader+Writer. Do I need DataAccess or is SizedWriter fast enough for primitive arrays? I looked at the ByteArrayDataAccess but it's not clear how to port it for long arrays given that the internal HeapBytesStore is so specific to byte[]/ByteBuffers?
Using DataAccess instead of SizedWriter allows one fewer copy of the value data on Map.put(key, value). However, if in your use case putOneValue() (as in the example above) is the dominating type of query, it won't make much difference. If Map.put(key, value) (and replace(), etc., i.e. any "full value write" operations) are important, it is still possible to implement DataAccess for LongList. It would look like this:
class LongListDataAccess implements DataAccess<LongList>, Data<LongList>,
        StatefulCopyable<LongListDataAccess> {

    transient BytesStore cachedBytes;
    transient boolean cachedBytesInitialized;
    transient LongList list;

    @Override public Data<LongList> getData(LongList list) {
        this.list = list;
        this.cachedBytesInitialized = false;
        return this;
    }

    @Override public long size() {
        return ((long) list.size()) * Long.BYTES;
    }

    @Override public void writeTo(RandomDataOutput target, long targetOffset) {
        for (int i = 0; i < list.size(); i++) {
            target.writeLong(targetOffset + ((long) i) * Long.BYTES, list.get(i));
        }
    }

    ...
}
For efficiency, the methods size() and writeTo() are key. But it's important to implement all the other methods (which I didn't write here) correctly too. Read the DataAccess, Data and StatefulCopyable javadocs very carefully, and also pay close attention to "Understanding StatefulCopyable, DataAccess and SizedReader" and the "Custom serialization checklist" in the tutorial.
Does the read/write locking mediate across multiple process reading and writing on same machine or just within a single process?
It's safe across processes; note that the interface is called InterProcessReadWriteUpdateLock.
When storing objects, with a variable size not known in advance, as values will that cause fragmentation off heap and in the persisted file?
Storing a value for a key once, and not changing the size of the value (and not removing keys) after that, won't cause external fragmentation. Changing the size of the value or removing keys could cause external fragmentation. The ChronicleMapBuilder.actualChunkSize() configuration lets you trade off between external and internal fragmentation: the bigger the chunk, the less external fragmentation but the more internal fragmentation. If your values are significantly bigger than the page size (4 KB), you could set an absurdly big chunk size and still have internal fragmentation bound by the page size, because Chronicle Map is able to exploit the lazy page allocation feature in Linux.
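To make the chunk-size knob concrete, here is a hypothetical builder fragment; the entry count, sizes and file name are made up, and the value marshallers discussed earlier would still need to be registered on the builder:

ChronicleMap<ShortValue, LongList> map = ChronicleMap
        .of(ShortValue.class, LongList.class)
        .entries(10_000)
        .averageValueSize(64 * Long.BYTES)  // expected value size in bytes
        .actualChunkSize(4 * 1024)          // bigger chunks: less external, more internal fragmentation
        .createPersistedTo(new File("values.dat"));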

Changing Implementation/Class at runtime

I am looking for real-world examples of (open source) programs (or algorithms) that change the concrete class of an object (or variable) at runtime.
An example of such behaviour in Java could look like the code snippet below.
Here, a LinkedList, which performs well in the context of frequent inserts and/or removes, is changed into an ArrayList, which performs well in the context of random access and iteration.
List myList = new LinkedList();
/* Lots of inserts */
...
myList = new ArrayList( myList ); // 'change' into different class
/* Lots of iteration */
...
The Java example above changes between LinkedList and ArrayList for the sake of performance.
However, examples in any language, for any data structure, using any technique*, and for any reason are welcome.
*Technique: plain and simple like in the example above, or using become: in Smalltalk, or __class__ in Python, or ...
You might want to check the use cases for the become: method in Smalltalk. The method changes the class of an instance at runtime (or, more precisely, changes all references to the instance to reference a different instance).
become: is commonly used to grow/shrink collections, e.g. a Dictionary with more buckets, a ByteArray with a bigger buffer, etc. It is also possible to convert from SmallInteger to BigInteger (the former are limited in size, the latter are not, but are much slower), and the programmer wouldn't even notice. (This is reasonable only if you have mutable integers, so this is not how it is actually done in Smalltalk. But it could be.)
Another case might be loading an instance from its serialized form back into the running system and updating its class to the newest version.
Yes, look at #become: in Smalltalk (for instance the MIT-licensed Pharo.org).
Besides the examples already given, #become: is for instance useful when you work with proxies. Think of a proxy object within an ORM framework like Glorp, where you first have the proxy and, when the real full object is needed, it can be loaded from the database and all references are easily switched.
Another example is the Fuel framework in Pharo.
I don't know if this is relevant, but maybe the usage of a spy (partial mock) also fits your description (see http://docs.mockito.googlecode.com/hg/1.9.5/org/mockito/Spy.html):
An example:
Person person = new Person();
person = spy(person);
doReturn("dominiek").when(person).getName();
Behind the scenes a subclass is created and the behavior of the class is altered according to the user's behavior declarations.
I've just run across an instance of this in the (Python) NLTK source. The LazyCorpusLoader (an object used to load a dataset from disk) "morphs" into the dataset itself. Here's the relevant section of the linked source code (creating a dataset object and then becoming it):
corpus = self.__reader_cls(root, *self.__args, **self.__kwargs)
# This is where the magic happens! Transform ourselves into
# the corpus by modifying our own __dict__ and __class__ to
# match that of the corpus.
args, kwargs = self.__args, self.__kwargs
name, reader_cls = self.__name, self.__reader_cls
self.__dict__ = corpus.__dict__
self.__class__ = corpus.__class__
Here's the rationale given (in the header of the same file) for this technique:
LazyCorpusLoader is a proxy object which is used to stand in for a
corpus object before the corpus is loaded. This allows NLTK to
create an object for each corpus, but defer the costs associated
with loading those corpora until the first time that they're
actually accessed.
So the purpose of changing the class at runtime in this case is to emulate lazy evaluation.
(Edit: Since I'm quoting verbatim from the NLTK source (Apache 2.0 license), here's the mandatory link to the license itself: http://www.apache.org/licenses/LICENSE-2.0)

Any reason to initialize Entity properties with synchronized Collections?

For JPA entities in a project I work on, properties of type List or Map are always initialized to the synchronized implementations Vector and Hashtable.
(Unsynchronized ArrayList and HashMap are the standard implementations in Java, except when synchronization is really needed.)
Does anyone know a reason why synchronized Collections would be needed? We use EclipseLink.
When I asked about it, nobody knew why it was done like that. It seems it was always done like this. Maybe this was needed for an old version of EclipseLink?
I'm asking for two reasons:
I would prefer to use the standard implementations ArrayList and HashMap, like anywhere else, if that's safe.
There's no matching synchronized Set implementation in the JDK, at least not a serializable one as EclipseLink expects.
Example Entity:
@Entity
public class Person {

    ...

    @ManyToMany(cascade = CascadeType.ALL)
    @JoinTable( ... )
    private List<Role> accessRoles;

    @ElementCollection
    @CollectionTable( ... )
    @MapKeyColumn(name = "KEY")
    @Column(name = "VALUE")
    private Map<String, String> attrs;

    public Person() {
        // Why Vector/Hashtable instead of ArrayList/HashMap?
        accessRoles = new Vector<Role>();
        attrs = new Hashtable<String, String>();
    }

    public List<Role> getAccessRoles() {
        return accessRoles;
    }

    public void setAccessRoles(List<Role> accessRoles) {
        this.accessRoles = accessRoles;
    }

    public Map<String, String> getAttrs() {
        return attrs;
    }

    public void setAttrs(Map<String, String> attrs) {
        this.attrs = attrs;
    }
}
There's usually no need for a Vector and an ArrayList is more commonly used. So if your current codebase is full of Vectors, this is a bit of a code smell and it is wise to make sure your team members know what the difference is. See also What are the differences between ArrayList and Vector? and Why is Java Vector class considered obsolete or deprecated?
That does not mean you should do the Big Cleanup and replace all the Vectors in your existing code with ArrayLists.
Your code uses Lists and you won't notice a single difference when programming.
The only advantage to be expected is increased performance.
It is hard to tell whether any of your code depends on the synchronization provided by the Vectors.
So, unless you are currently suffering performance issues, or are explicitly (re)designing the synchronization of your entire codebase, you risk introducing hard to fix concurrency bugs without any benefits.
Also, be aware that performance suffers most significantly from the use of Vectors when multiple threads access your collections concurrently. So if you are suffering from performance loss and decide to replace the Vectors for that reason, you'll need to be very careful to keep access sufficiently synchronized.
EDIT: You ask about EclipseLink JPA specifically.
It'd be rather surprising if they demanded you use Vectors and Hashtables, since that would mean asking you to rely on obsolete data structures.
In their examples, they use ArrayLists and HashMaps so from that we may conclude that this is indeed not the case.
Diving a bit more specifically into the source code, we can see that their CollectionContainerPolicy uses the Collection interface and does not care about the implementation of your collections. It does, however, surprisingly have special cases for when your internal collection class is Vector. See for instance buildContainerFromVector. And its default container class is Vector, though you can alter that.
See also the documentation for the Container policy.
The most intrusive moment where EclipseLink and your Lists meet is when you're lazy loading collections. EclipseLink will replace the collection with its own IndirectList which internally uses a Vector. See What collections does jpa return? So in those cases, EclipseLink will give you a Vector anyways(!) and it does not even matter what collection you specify in the collection's initialization.
So EclipseLink indeed has a preference for using Vectors, and using Vectors with EclipseLink means less copying of object references from one collection to the other.
Much of the internals of EclipseLink date back to a time when Vector and Hashtable were the standard collection types in Java. EclipseLink was TopLink back then, which originated from a persistence framework for Smalltalk; much of EclipseLink's code is actually older than Java itself, so to speak.
I worked with TopLink for many years, and their standard mappings for collection properties always used Vector and Hashtable.
To me, the only reasonable explanation for Vector and Hashtable still appearing in EclipseLink is that it has been working like this for a long time and, because it is working, no one has gotten around to changing it.
Personally, I wouldn't ever use Vector or Hashtable again. If I needed a synchronized collection, I'd rather use the Collections.synchronizedList()/synchronizedMap() etc. wrappers.
Just my 2 ct.
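To make that concrete, here is a hypothetical variant of the Person constructor from the question, using the standard collections and, only if truly needed, the java.util.Collections wrappers:

public Person() {
    // Unsynchronized defaults, which is what JPA entities normally need:
    accessRoles = new ArrayList<Role>();
    attrs = new HashMap<String, String>();

    // If (and only if) the entity really were shared between threads,
    // the Collections wrappers would be the modern choice:
    // accessRoles = Collections.synchronizedList(new ArrayList<Role>());
    // attrs = Collections.synchronizedMap(new HashMap<String, String>());
}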
Going through the EclipseLink code base, it looks like the usage of Vector is inherited from an older code base and is much like the Vector class itself: legacy.
The intent was apparently to use Vector to allow multiple threads to act safely on relationships that are loaded lazily, "indirection" in EclipseLink parlance.
(More on the concepts here; the different types of indirection discussed are ValueHolder indirection, transparent indirection, proxy indirection, etc.)
However, entities and their relationships are typically not shared among multiple threads in the usual use cases. Each thread gets its own copy of an entity and its relationships if they are accessed in its own unit of work.
In the case of ValueHolder indirection, one of the implementations of ValueHolderInterface is ValueHolder, which is typically initialized with a Vector. The relevant part of the code is below, along with the code comment as is. The comments are interesting as well.
IndirectList.java

..........................
.........................

/**
 * INTERNAL:
 * Return the valueHolder.
 * This method used to be synchronized, which caused deadlock.
 */
public ValueHolderInterface getValueHolder() {
    // PERF: lazy initialize value holder and vector as are normally set after creation.
    if (valueHolder == null) {
        synchronized (this) {
            if (valueHolder == null) {
                valueHolder = new ValueHolder(new Vector(this.initialCapacity, this.capacityIncrement));
            }
        }
    }
    return valueHolder;
}

...................
..................
Also, a few issues have been reported due to the usage of Vector, as mentioned here and here.
You don't need synchronized collections for JPA itself; synchronization should only be driven by your business logic, and you would know if yours needed it (I suppose it doesn't).
So basically the suggestion is to use the unsynchronized collections, which will also improve performance.
As @flup answered with some interesting references, I can only add some additional presumptions:
The team that developed it and/or wrote the specifications was simply unaware of the Collections API.
The team wanted to use the code in a highly concurrent environment (either in your web application, e.g. passing some entities to other threads, or in another desktop application, as JPA is not limited to web applications only). Also note that IndirectSet is not thread-safe, meaning that if the team wanted to write thread-safe code, they would have to take additional measures anyway (if they use Sets)!

Why does mongodb seem to save some binary objects and not others?

I'm not sure where to start or what information is relevant please let me know what additional information may be useful in solving this problem.
I am developing a simple cometd application and I'm using mongodb as my storage backend. I obtain a single mongodb instance when the application starts and I use this instance for all queries. This is in fact recommended by the mongo java driver documentation as stated here: http://www.mongodb.org/display/DOCS/Java+Driver+Concurrency. I was grasping at straws thinking that the issue had something to do with thread safety but according to that link mongodb is completely thread safe.
Here's where it gets interesting. I have a class that extends BasicDBObject.
public class MyBasicDBObject extends BasicDBObject {

    private static final String MAP = "map";

    public boolean updateMapAnd(String submap, String key, byte[] value) {
        Map topMap = (Map) this.get(MAP);
        Map embeddedMap = (Map) topMap.get(submap);
        byte[] oldValue = (byte[]) embeddedMap.get(key);
        byte[] newValue = UtilityClass.binaryAnd(oldValue, value);
        embeddedMap.put(key, newValue);
        topMap.put(submap, embeddedMap);
        this.put(MAP, topMap);
        return true;
    }

    public boolean updateMapXor(String submap, String key, byte[] value) {
        Map topMap = (Map) this.get(MAP);
        Map embeddedMap = (Map) topMap.get(submap);
        byte[] oldValue = (byte[]) embeddedMap.get(key);
        byte[] newValue = UtilityClass.binaryXor(oldValue, value);
        embeddedMap.put(key, newValue);
        topMap.put(submap, embeddedMap);
        this.put(MAP, topMap);
        return true;
    }
}
Next two skeleton classes that extend MyBasicDBObject.
public class FirstDBObject extends MyBasicDBObject { //no code }
public class SecondDBObject extends MyBasicDBObject { //no code }
The only reason I've set up my classes this way is to improve code readability in dealing with these two objects within the same scope. This lets me do the following...
//a cometd service callback
public void updateMapObjectsFoo(ServerSession remote, Message message) {
    //locate the objects to update...
    FirstDBObject first = (FirstDBObject) firstCollection.findOne({ ... });
    SecondDBObject second = (SecondDBObject) secondCollection.findOne({ ... });

    //update them as follows
    first.updateMapAnd("default", "someKey1", newBinaryData1);
    second.updateMapAnd("default", "someKey2", newBinaryData2);

    //save (update) them to their respective collections
    firstCollection.save(first);
    secondCollection.save(second);
}

public void updateMapObjectsBar(ServerSession remote, Message message) {
    //locate the objects to update...
    FirstDBObject first = (FirstDBObject) firstCollection.findOne({ ... });
    SecondDBObject second = (SecondDBObject) secondCollection.findOne({ ... });

    /**
     * the only difference is these two calls
     */
    first.updateMapXor("default", "someKey1", newBinaryData1);
    second.updateMapXor("default", "someKey2", newBinaryData2);

    //save (update) them to their respective collections
    firstCollection.save(first);
    secondCollection.save(second);
}
The UtilityClass does exactly what the method names say: bitwise & and bitwise ^, iterating over the passed byte arrays.
This is where I'm totally lost. updateMapObjectsFoo() works exactly as expected, both first and second reflect the changes in the database. updateMapObjectsBar() on the other hand only manages to properly update first.
Inspection via debugging updateMapObjectsBar() shows that the binary objects are in fact updated properly on both objects, but when I head over to the mongo shell to investigate the problem I see that first is updated in the DB and second is not. Where did I get the idea that thread safety had anything to do with it? The only difference that bugs me is that secondCollection is used by other cometd services while firstCollection is not. That seems relevant on one hand, but not on the other, since Foo works and Bar does not.
I have torn the code apart and put it back together and I keep coming back to this same problem. What in the world is going on here?
It seems I left out the most relevant part of all which is the nightmare of java generics and the mongodb driver's reliance on this feature of the language. BasicDBObject is essentially a wrapper for a Map<String, Object>. The problem is that once you store an object in that map, you must cast it back to what it was when you put it in there. Yes that may seem completely obvious, and I knew that full well before posting this question.
I cannot pinpoint what happened exactly but I will offer this advice to java + mongodb users. You will be casting, A LOT, and the more complicated your data structures the more casts you will need. Long story short, don't do this:
DBObject obj = (DBObject) collection.findOne(new BasicDBObject("_id", new ObjectId((String)anotherObj.get("objId"))));
One liners are tempting when you are doing rapid prototypes but when you start doing that over and over you are bound to make mistakes. Write more code now, and suffer less frustration later:
DBObject query = new BasicDBObject();
String objId = (String) anotherObj.get("objId");
query.put("_id", new ObjectId(objId));
DBObject obj = (DBObject) collection.findOne(query);
I think this is annoyingly verbose but I should expect as much interacting directly with Mongo instead of using some kind of library to make my life easier. I have made a fool of myself on this one, but hopefully someone will learn from my mistake and save themselves a lot of frustration.
Thanks to all for your help.
It could very easily be a multi-threading issue. While you are correct that the Mongo, DB, and DBCollection objects are threadsafe if there is only one Mongo instance, DBObjects are not threadsafe. But even if they were threadsafe, your updateMapObjectsFoo/Bar methods do nothing to ensure that they are atomic operations on the database.
Unfortunately, the changes you would need to make to your code are more intense than just sprinkling a few "synchronized" keywords around. See if http://www.mongodb.org/display/DOCS/Atomic+Operations doesn't help you understand the scope of the problem and some potential solutions.
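For illustration only, this is roughly what pushing a change into a single atomic update looks like with the DBCollection API; the field path and variable names are made up, and it does not cover the byte-array merging done in UtilityClass, which would still need its own strategy:

// Hypothetical sketch: one atomic update instead of a findOne() / modify / save() round trip.
DBObject query = new BasicDBObject("_id", someId);
DBObject change = new BasicDBObject("$set",
        new BasicDBObject("map.default.someKey1", newBinaryData1));
secondCollection.update(query, change);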

Can objects be buffered during java serialization?

I have a very large object which I wish to serialize. During the process of serialization, it comes to occupy some 130 MB of heap as a weblogic.utils.io.UnsyncByteArrayOutputStream. I am using a BufferedOutputStream to speed up writing the data to disk, which reduces the amount of time for which this object is held in memory.
Is it possible to use a buffer to reduce the size of the object in memory though? It would be good if there was a way to serialize it x bytes at a time and write those bytes to disk.
Sample code follows if it is of any use. There's not much to go on though I don't think. If it's the case that there needs to be a complete in-memory copy of the object to be serialised (and therefore no concept of a serialization buffer) then I suppose I am stuck.
ObjectOutputStream tmpSerFileObjectStream = null;
OutputStream tmpSerFileStream = null;
BufferedOutputStream bufferedStream = null;
try {
    tmpSerFileStream = new FileOutputStream(tmpSerFile);
    bufferedStream = new BufferedOutputStream(tmpSerFileStream);
    tmpSerFileObjectStream = new ObjectOutputStream(bufferedStream);
    tmpSerFileObjectStream.writeObject(siteGroup);
    tmpSerFileObjectStream.flush();
} catch (InvalidClassException invalidClassEx) {
    throw new SiteGroupRepositoryException(
            "Problem encountered with class being serialised", invalidClassEx);
} catch (NotSerializableException notSerializableEx) {
    throw new SiteGroupRepositoryException(
            "Object to be serialized does not implement " + Serializable.class,
            notSerializableEx);
} catch (IOException ioEx) {
    throw new SiteGroupRepositoryException(
            "Problem encountered while writing ser file", ioEx);
} catch (Exception ex) {
    throw new SiteGroupRepositoryException(
            "Unexpected exception encountered while writing ser file", ex);
} finally {
    if (tmpSerFileObjectStream != null) {
        try {
            tmpSerFileObjectStream.close();
            if (null != tmpSerFileStream) tmpSerFileStream.close();
            if (null != bufferedStream) bufferedStream.close();
        } catch (IOException ioEx) {
            logger.warn("Exception caught on trying to close ser file stream", ioEx);
        }
    }
}
This is wrong on so many levels. This is a massive abuse of serialization. Serialization is mostly intended for temporarily storing an object, for example:
session objects between Tomcat server restarts,
transferring objects between JVMs (load balancing at a website).
Java's serialization makes no effort to handle long-term storage of objects (no versioning support) and may not handle large objects well.
For something so big, I would suggest some investigation first:
Ensure that you are not trying to persist the entire JVM heap.
Look for member variables that can be labeled as 'transient' to avoid including them in the serialization (perhaps you have references to service objects).
Consider the possibility that there is a memory leak and the object is excessively large.
If everything is indeed correct, you will have to research alternatives to standard Java serialization. Taking more control via java.io.Externalizable might work (a sketch follows below), but I would suggest something like a JSON or XML representation instead.
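For what it's worth, here is a minimal sketch of what taking control via java.io.Externalizable can look like; the SiteGroup fields below are purely illustrative, not taken from the question:

import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; the fields are made up for illustration.
public class SiteGroup implements Externalizable {

    private String name;
    private List<String> siteIds = new ArrayList<String>();

    public SiteGroup() {
        // Externalizable requires a public no-arg constructor.
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // Fields are written directly to the (buffered) stream, so nothing
        // has to be staged in one big in-memory byte array by this class itself.
        out.writeUTF(name);
        out.writeInt(siteIds.size());
        for (String id : siteIds) {
            out.writeUTF(id);
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        name = in.readUTF();
        int count = in.readInt();
        siteIds = new ArrayList<String>(count);
        for (int i = 0; i < count; i++) {
            siteIds.add(in.readUTF());
        }
    }
}

Whether this actually reduces the intermediate buffering depends on the ObjectOutputStream implementation in use, as other answers here note.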
Update: investigate:
Google's Protocol Buffers
Facebook's Thrift
Avro
Cisco's Etch
Take a look at these benchmarks as well.
What is the "siteGroup" object that you're trying to save? I ask, because it's unlikely that any one object is 130MB in size, unless it has a ginormous list/array/map/whatever in it -- and if that's the case, the answer would be to persist that data in a database.
But if there's no monster collection in the object, then the problem is likely that the object tree contains references to a bagillion objects, and the serialization of course does a deep copy (this fact has been used as a shortcut to implement clone() a lot of times), so everything gets cataloged all at once in a top-down fashion.
If that's the problem, then the solution would be to implement your own serialization scheme where each object gets serialized in a bottom-up fashion, possibly in multiple files, and only references are maintained to other objects, instead of the whole thing. This would allow you to write each object out individually, which would have the effect you're looking for: smaller memory footprint due to writing the data out in chunks.
However, implementing your own serialization, like implementing a clone() method, is not all that easy. So it's a cost/benefit thing.
It sounds like whatever runtime you are using has a less-than-ideal implementation of object serialization that you likely don't have any control over.
A similar complaint is mentioned here, although it is quite old.
http://objectmix.com/weblogic/523772-outofmemoryerror-adapter.html
Can you use a newer version of weblogic? Can you reproduce this in a unit test? If so, try running it under a different JVM and see what happens.
I don't know about WebLogic (that is, JRockit I suppose) serialization in particular; honestly I see no reason for using ByteArrayOutputStreams...
You may want to implement java.io.Externalizable if you need more control over how your object is serialized, or switch to an entirely different serialization system (e.g. Terracotta) if you don't want to write the read/write methods yourself (if you have many big classes).
Why does it occupy all those bytes as an unsync byte array output stream?
That's not how default serialization works. You must have some special code in there to make it do that. Solution: don't.
