I am coding a game using libGDX. I created a level editor where you can place objects for the level. All the objects are put into a list, and the whole list is then serialized by writing it to an ObjectOutputStream. In my game I read the list of objects back in and copy it over to the list of objects in the level currently being played.

On the user's device there will be 20+ serialized level files, and at that point they will only ever be deserialized. Is this an efficient approach in terms of memory and performance? Could those files take up a significant chunk of memory? I noticed people use XML or JSON for what I am doing. Should I be worried about any issues with the way I have done my level loading? Thanks. Let me know if my question isn't clear.
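For reference, this is roughly the pattern I'm using; the class and field names here are just illustrative stand-ins for my real level classes:

    import java.io.*;
    import java.util.*;

    // Illustrative only: an object placed in the level editor.
    class LevelObject implements Serializable {
        private static final long serialVersionUID = 1L;
        float x, y;
        String type;
    }

    public class LevelIO {
        // Editor side: write the whole object list in one go.
        static void saveLevel(List<LevelObject> objects, File file) throws IOException {
            try (ObjectOutputStream out = new ObjectOutputStream(
                    new BufferedOutputStream(new FileOutputStream(file)))) {
                out.writeObject(new ArrayList<>(objects));
            }
        }

        // Game side: read the list back and hand it to the current level.
        @SuppressWarnings("unchecked")
        static List<LevelObject> loadLevel(File file) throws IOException, ClassNotFoundException {
            try (ObjectInputStream in = new ObjectInputStream(
                    new BufferedInputStream(new FileInputStream(file)))) {
                return (List<LevelObject>) in.readObject();
            }
        }
    }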
When we were looking for slowdowns in our code, we profiled it running and found a huge amount of time being spent on one thing. It turned out to be a section where an object was being duplicated by serializing it to a string and back. For some reason, the default Java implementation of serialization is SLOW.
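The offending pattern looked roughly like this (reconstructed for illustration, not our actual code); every call pays the full serialization round trip just to clone one object:

    import java.io.*;

    class SlowCopy {
        // The deep-copy-via-serialization idiom we found while profiling.
        // Both directions go through the default reflective machinery.
        @SuppressWarnings("unchecked")
        static <T extends Serializable> T deepCopy(T original)
                throws IOException, ClassNotFoundException {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (T) in.readObject();
            }
        }
    }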
The other problem with serialization is that it's a black box: if your file gets corrupted, or you change your object enough during an upgrade, you can lose everything.
Did you consider an ORM and some kind of lightweight database? There are databases that are compiled into your code (invisible to your user), and usage is pretty much just db.save(anyObject)... very easy to use.
For a similar question I did some short research (because I couldn't believe object serialization was slow), and I would recommend you use JSON, as it's faster. Memory usage in terms of RAM will be the same (once your object is deserialized). On disk you might want to zip it.
According to those benchmarks, Jackson is faster than Java serialization (see the linked benchmark and its deeplink).
Another advantage is the human readability of your JSON files.
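As a sketch of what the JSON route looks like with Jackson (assuming the jackson-databind dependency; LevelObject stands in for whatever class you actually serialize):

    import com.fasterxml.jackson.core.type.TypeReference;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import java.io.File;
    import java.io.IOException;
    import java.util.List;

    public class JsonLevelIO {
        private static final ObjectMapper mapper = new ObjectMapper();

        // Write the level's object list as human-readable (pretty-printed) JSON.
        static void save(List<LevelObject> objects, File file) throws IOException {
            mapper.writerWithDefaultPrettyPrinter().writeValue(file, objects);
        }

        // Read it back; the TypeReference preserves the generic List element type.
        static List<LevelObject> load(File file) throws IOException {
            return mapper.readValue(file, new TypeReference<List<LevelObject>>() {});
        }
    }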
I opened this issue on the GitHub project prevayler-clj:
https://github.com/klauswuestefeld/prevayler-clj/issues/1
because 1M short vectors like [:a1 1], forming the state of the prevayler, result in a 1GB file when serialized one by one with Java's writeObject.
Is that plausible? About 1KB for each PersistentVector? Further investigation demonstrated that the same number of vectors can be serialized into an 80MB file. So what is going wrong in Prevayler's serialization? Or am I doing something wrong in these tests? Please refer to the GitHub issue for my test code excerpts.
Prevayler apparently starts a fresh ObjectOutputStream for each serialized element, preventing any reuse of class data between them. Your test code, on the other hand, is written the "natural" way, allowing reuse. What forces Prevayler to restart every time is not clear to me, but I would hesitate to call it a "feature", given the negative impact it has; "workaround" is the more likely designation.
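For illustration, a minimal sketch of that difference in plain Java. The absolute sizes depend on the class being written; the gap is far more dramatic for classes with big descriptor graphs, such as Clojure's PersistentVector, than for the simple int arrays used here:

    import java.io.*;

    public class StreamReuseDemo {
        public static void main(String[] args) throws IOException {
            int n = 100_000;

            // Fresh ObjectOutputStream per element: every iteration rewrites the
            // stream header and the full class descriptors (Prevayler's pattern).
            ByteArrayOutputStream fresh = new ByteArrayOutputStream();
            for (int i = 0; i < n; i++) {
                ObjectOutputStream out = new ObjectOutputStream(fresh);
                out.writeObject(new int[] { i, i + 1 });
                out.flush();
            }

            // One reused stream: descriptors are written once, then back-referenced.
            ByteArrayOutputStream shared = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(shared)) {
                for (int i = 0; i < n; i++) {
                    out.writeObject(new int[] { i, i + 1 });
                }
            }

            System.out.println("fresh streams: " + fresh.size() + " bytes");
            System.out.println("shared stream: " + shared.size() + " bytes");
        }
    }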
There's nothing wrong with Prevayler per se. It's just that Java's writeObject method is not exactly tuned to writing Clojure data; it's intended to store the internal structure of any serializable Java object. Since Clojure vectors are reasonably complex Java objects under the hood, I'm not very surprised that a small vector may write out as roughly a KB of data.
I'd guess that pretty much any clojure-specific serialization method would result in smaller files. From experience, standard clojure.core/pr + clojure.core/read gives a good balance between file size and speed and handles data structures of nearly any size.
See these pages for some insight in the internals of clojure vectors:
http://hypirion.com/musings/understanding-persistent-vector-pt-1
http://hypirion.com/musings/understanding-persistent-vector-pt-2
In Java, I know that if you are going to build a B-tree index on hard disk, you probably should use serialisation where the B-tree structure has to be written from RAM to HD. My question is: if I later want to query the value of a key from the index, is it possible to deserialise just part of the B-tree back into RAM, ideally retrieving only the value of a specific key? Fetching the whole index into RAM is bad design, at least where the B-tree is larger than RAM.
If this is possible, it would be great if someone could provide some code. How do DBMSs do this, in Java or C?
Thanks in advance.
you probably should use serialisation where the B-tree structure has to be written from RAM to HD
Absolutely not. Serialization is the last technique to use when implementing a disk-based B-tree. You have to be able to read individual nodes into memory, add/remove keys, change pointers, etc., and put them back. You also want the file to be readable by other languages, so you should define a language-independent representation of a B-tree node. It's not difficult, and you don't need anything beyond what RandomAccessFile provides.
You generally split the B-tree into several "pages", each holding some of the key-value pairs, etc. Then you only need to load one page into memory at a time.
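A minimal sketch of the page-at-a-time idea, assuming a fixed page size (4096 is just an illustrative choice); how keys and pointers are laid out inside a page is up to your node format:

    import java.io.IOException;
    import java.io.RandomAccessFile;

    public class PagedFile {
        static final int PAGE_SIZE = 4096; // assumed fixed page size

        // Read exactly one page into memory, never the whole file.
        static byte[] readPage(RandomAccessFile file, long pageNumber) throws IOException {
            byte[] page = new byte[PAGE_SIZE];
            file.seek(pageNumber * PAGE_SIZE);
            file.readFully(page);
            return page;
        }

        // Write a (possibly modified) page back in place.
        static void writePage(RandomAccessFile file, long pageNumber, byte[] page) throws IOException {
            file.seek(pageNumber * PAGE_SIZE);
            file.write(page);
        }
    }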
For inspiration on how RDBMSs do it, it's probably a good idea to check the source code of the embedded Java databases: Derby, HSQLDB, H2, ...
And if those databases solve your problem, I'd rather forget about implementing indices and use their product right away. Because they're embedded, there is no need to set up a server (the RDBMS code is part of the application's classpath), and the memory footprint is modest.
If that is a possibility for you, of course...
If the tree can easily fit into memory, I'd strongly advise keeping it there. The difference in performance will be huge, not to mention the difficulty of keeping changes in sync on disk, reorganizing, and so on.
When at some point you need to store it, look at Externalizable instead of regular serialization. Default serialization is notoriously slow and verbose, while Externalizable lets you control each byte written to disk; it also makes a big difference in performance when reading the index back into memory.
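A minimal Externalizable sketch with an illustrative node layout (a real B-tree node would carry one more child pointer than keys; this is simplified):

    import java.io.*;

    // Illustrative node class: with Externalizable you write exactly the fields
    // you choose, in the order you choose, instead of the generic reflective format.
    public class IndexNode implements Externalizable {
        private long[] keys = new long[0];
        private long[] childOffsets = new long[0];

        public IndexNode() {} // Externalizable requires a public no-arg constructor

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(keys.length);
            for (long k : keys) out.writeLong(k);
            for (long c : childOffsets) out.writeLong(c);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            int n = in.readInt();
            keys = new long[n];
            childOffsets = new long[n];
            for (int i = 0; i < n; i++) keys[i] = in.readLong();
            for (int i = 0; i < n; i++) childOffsets[i] = in.readLong();
        }
    }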
If the tree is too big to fit into memory, you'll have to use RandomAccessFile with some kind of in-memory caching, so that frequently accessed items are still served from memory. But then you'll need to take updates to the index into account: you'll have to flush them to disk at some point.
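For the caching part, a common lightweight trick is an access-ordered LinkedHashMap used as an LRU cache; a sketch (the fixed-capacity policy is an assumption for illustration):

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Simple LRU cache: an access-ordered LinkedHashMap that evicts the eldest
    // entry once capacity is exceeded. An evicted dirty page would need to be
    // flushed to disk before being dropped.
    public class LruPageCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        public LruPageCache(int capacity) {
            super(16, 0.75f, true); // true = access order, which gives LRU behavior
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;
        }
    }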
So personally, I'd rather not do this from scratch, but rather use the code that's out there. :-)
I've just reached the point where neither Google nor my own knowledge gets me any further.
Consider the following situation: I read in a lot (up to millions) of large objects (up to 500 MB each), and sometimes I read in millions of objects of only 500 KB; this depends entirely on the user of my software. Each object is processed in a pipeline, so they don't all need to be in memory all the time; only a reference is needed to find each object again on my hard disk after serializing it, so that I can deserialize it later. In other words, it is something like a persistent cache for large objects.
So here are my questions:
Is there a solution (any framework) that does exactly what I need? That includes arbitrary serialization of large objects after somehow determining that the cache is full.
If there isn't: is there a way to intelligently check whether an object should be serialized or not, e.g. by somehow checking the memory size? Or something like a listener on a SoftReference (fired when it gets released)? A sketch of that last idea follows below.
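To make it concrete, this is roughly the mechanism I have in mind: a SoftReference registered with a ReferenceQueue, which the GC fills when it clears references under memory pressure. (Note that by the time a reference is enqueued the object itself is already gone, so a disk copy would have to exist beforehand.)

    import java.lang.ref.Reference;
    import java.lang.ref.ReferenceQueue;
    import java.lang.ref.SoftReference;

    public class SoftCacheSketch {
        // The GC enqueues a SoftReference here after clearing it under memory pressure.
        private final ReferenceQueue<byte[]> queue = new ReferenceQueue<>();

        SoftReference<byte[]> wrap(byte[] largeObject) {
            return new SoftReference<>(largeObject, queue);
        }

        // Poll (or block with remove()) to learn which references were released;
        // this is the "listener" moment, but the referent is already cleared,
        // so only the associated disk location can be acted on here.
        void drainReleased() {
            Reference<? extends byte[]> released;
            while ((released = queue.poll()) != null) {
                // look up the disk location associated with this reference and keep it
            }
        }
    }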
Thanks a lot,
Christian
Storing millions of objects sounds like a database application.
Check out Oracle Coherence. There are a few open source alternatives as well, but Coherence is the most feature-rich.
EDIT:
Coherence Alternatives
I'm trying to design a lightweight way to store persistent data in Java. I already have a very efficient way to serialize POJOs to DataOutputStreams (and back), but I'm trying to think of a good way to ensure that changes to the data in the POJOs get serialized when necessary.
This is for a client-side app where I'm trying to keep the size of the eventual distributable as low as possible, so I'm reluctant to use anything that would pull in heavyweight dependencies. Right now my distributable is almost 10MB, and I don't want it to get much bigger.
I've considered DB4O, but it's too heavy; I need something light. It's probably more a design pattern I need than a library.
Any ideas?
The 'lightest weight' persistence option will almost surely be simply marking some classes Serializable and reading/writing them from some fixed location. Are you trying to accomplish something more complex than that? If so, it's time to bundle HSQLDB and use an ORM.
If your users are tech savvy, or you're just worried about initial payload, there are libraries which can pull dependencies at runtime, such as Grape.
If you already have a compact byte-level output format (which I assume you have if you can persist efficiently to a DataOutputStream), then an efficient and general technique is to use run-length encoding on the difference between the previous byte-array output and the new byte-array output; see the sketch after the notes below.
Points to note:
If the object has not changed, the difference between the byte arrays will be an array of zeros and hence will compress down to almost nothing.
The first time you serialize the object, treat the previous output as all zeros, so that you communicate a complete set of data.
You probably want to be a bit clever when the object has variable-sized substructures.
You can also try zipping the difference rather than RLE; that might be more efficient in cases where you have a large object graph with a lot of changes.
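A sketch of the diff-plus-RLE idea under simplifying assumptions (XOR delta, zero-runs capped at 255 per chunk, and no handling of snapshots that change length):

    import java.io.ByteArrayOutputStream;

    public class DeltaRle {
        // XOR the snapshots so unchanged bytes become zeros, then run-length
        // encode: marker 0 + run length for zero runs, marker 1 + delta otherwise.
        static byte[] encode(byte[] previous, byte[] current) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int i = 0;
            while (i < current.length) {
                int diff = (current[i] ^ (i < previous.length ? previous[i] : 0)) & 0xFF;
                if (diff == 0) {
                    int run = 0;
                    while (i < current.length && run < 255
                            && ((current[i] ^ (i < previous.length ? previous[i] : 0)) & 0xFF) == 0) {
                        run++;
                        i++;
                    }
                    out.write(0);   // marker: a run of unchanged bytes
                    out.write(run); // run length, one unsigned byte
                } else {
                    out.write(1);    // marker: one changed byte follows
                    out.write(diff); // XOR delta; decoder applies previous[i] ^ diff
                    i++;
                }
            }
            return out.toByteArray();
        }
    }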
I'm currently working on a part of an application where "a lot" of data must be selected for further work, and I have the impression that the I/O is the limiting factor rather than the work that follows.
My idea is to keep all these objects in memory, but serialized and compressed. The question is whether accessing the objects this way would be faster than direct database access, whether it is a good idea at all, and whether it is feasible in terms of memory consumption (i.e. whether the serialized form uses less memory than the normal object).
EDIT February 2011:
The creation of the objects is the slow part, not the database access itself. Having everything in memory is not possible, and using the Ehcache option to "overflow to disk" is actually slower than just getting the data from the database. Standard Java serialization is also unusable; it is a lot slower as well. So basically there is nothing I can do about it...
You're basically looking for an in-memory cache or an in-memory data grid. There are plenty of APIs/products for this sort of thing: Ehcache, the Hibernate cache, GridGain, etc.
The compressed, serialized form will use less memory if it is a large object. However, for smaller objects, e.g. ones that mostly hold primitives, the original object will be much smaller.
I would first check whether you really need to do this at all: can you simply use more memory, or restructure your objects so they use less of it?
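If it does turn out to be worth it, a minimal sketch of holding an object in memory in compressed serialized form (GZIP over standard serialization; where the break-even object size lies is something to measure, not assume):

    import java.io.*;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;

    public class CompressedHolder {
        // Keep the object as a compressed byte[] instead of a live object graph.
        static byte[] compress(Serializable obj) throws IOException {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(buffer))) {
                out.writeObject(obj);
            }
            return buffer.toByteArray();
        }

        // Every access pays the decompression and deserialization cost again.
        static Object decompress(byte[] bytes) throws IOException, ClassNotFoundException {
            try (ObjectInputStream in = new ObjectInputStream(
                    new GZIPInputStream(new ByteArrayInputStream(bytes)))) {
                return in.readObject();
            }
        }
    }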
"I have the impression that the I/O is limiting and not the following work. " -> I would be very sure of this before starting implementing such a thing.
The simplest approach I can suggest is to use Ehcache with the option to overflow to disk when the cache gets too big.
Another, completely different approach could be to use a document-based NoSQL database like CouchDB to store the objects selected "for further work".