The Externalizable interface seems hard to use. Reasons:
Strings in an object can be null, so I have to create and serialize flags that indicate whether or not to call inReader.readUTF().
For Java Lists it's even harder.
I am not sure what the best way is to externalize a Java HashMap, since I would not know at reading time how many keys there are or whether any value is null.
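One possible approach, as a minimal sketch: write a boolean presence flag before each nullable String and an entry count before the map, so the reader knows what to expect. The class name Profile, its fields, and the helper methods are hypothetical, and map keys are assumed to be non-null. ObjectOutput.writeObject() also accepts null and the standard collections directly, which avoids the flags at the cost of a larger stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical example class: one nullable String and a map with nullable values.
class Profile implements Externalizable {
    String name;                                  // may be null
    Map<String, String> attrs = new HashMap<>();  // values may be null, keys assumed non-null

    public Profile() { }  // Externalizable requires a public no-arg constructor

    // Writes a presence flag, then the string only if it is non-null.
    private static void writeNullableUTF(ObjectOutput out, String s) throws IOException {
        out.writeBoolean(s != null);
        if (s != null) {
            out.writeUTF(s);
        }
    }

    // Reads the presence flag, then the string only if the flag was true.
    private static String readNullableUTF(ObjectInput in) throws IOException {
        return in.readBoolean() ? in.readUTF() : null;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        writeNullableUTF(out, name);
        out.writeInt(attrs.size());  // entry count first, so the reader knows when to stop
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            out.writeUTF(e.getKey());
            writeNullableUTF(out, e.getValue());
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        name = readNullableUTF(in);
        int count = in.readInt();
        attrs = new HashMap<>();
        for (int i = 0; i < count; i++) {
            String key = in.readUTF();
            attrs.put(key, readNullableUTF(in));
        }
    }

    // Serializes and deserializes a copy in memory; checked exceptions are wrapped for brevity.
    static Profile roundTrip(Profile p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Profile) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A List can be handled the same way as the map: write its size, then each element with its own presence flag.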
Related
I need a solution to the following problem. Suppose I have different fields in a class, each of a different type: some may be basic types such as Integers, some may be complex object types. I need to find a way to compare those fields after exit and restart of the app, but I am limited to dumping the values to a file and comparing those. How can I write something to a file and compare it so that I can determine whether the values have changed or not? I do not need the values themselves. Will hashCode() help?
If I understand your question, you would like to compare content in a file after exit and before restart. One way would be to use a message digest. As in calculating the SHA1 of the contents and comparing that before restart.
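A minimal sketch of the digest idea, using the standard java.security.MessageDigest class. The class name DigestCompare and its helper methods are hypothetical; the caller is assumed to store the hex string before exit and compare it after restart.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: computes a SHA-1 digest so two dumps can be compared cheaply.
class DigestCompare {
    // Returns the SHA-1 digest of the given bytes as a lowercase hex string.
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Convenience wrapper that digests the whole content of a file.
    static String sha1HexOfFile(Path file) {
        try {
            return sha1Hex(Files.readAllBytes(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the two hex strings differ, the dumped content changed. This only works if the dump itself is written in a stable order and format.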
It sounds like Java object serialization might do the trick for you. With serialization, you can write any object to a file, and later read it in again and reconstruct the original object. If you then have an equals() method on the object, you can use that to simply check whether the object is the same.
EDIT: reread the question. If you want to compare the file contents, then serialization is not particularly useful, as there are bound to be small differences between the two files.
I guess hashCode() will help only if it's implemented in such a way that it returns the same result for two objects when the objects have the same values. Of course, for non-primitive fields you'll have to decide what "same value" means, and you would probably need to implement hashCode() for the types of those fields as well.
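As a rough illustration of a value-based hashCode(), here is a sketch with a hypothetical Settings class. It relies on Objects.hash(), whose result is stable across runs only when every field type has a specified hashCode() (as Integer and String do); a field that inherits Object.hashCode() would break that. Note also that equal hash codes do not prove equal values, since collisions are possible.

```java
import java.util.Objects;

// Hypothetical class whose equality and hash code depend only on field values.
final class Settings {
    private final int retries;
    private final String host;

    Settings(int retries, String host) {
        this.retries = retries;
        this.host = host;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Settings)) {
            return false;
        }
        Settings other = (Settings) o;
        return retries == other.retries && Objects.equals(host, other.host);
    }

    @Override
    public int hashCode() {
        // Combines the field hash codes; the same values always give the same result.
        return Objects.hash(retries, host);
    }
}
```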
If you can't/don't want to implement hashCode() maybe JSON could help. I suggest using a library like Google's Gson to render a string representation of your object which you can then dump to file. If the way in which the object (or any of its members) is converted to string does not suit your needs you can specify the conversion with a JsonSerializer.
String strRep = new Gson().toJson(myObject);
I want to have an object that allows other objects of a specific type to register themselves with it. Ideally it would store the references to them in some sort of set collection and have .equals() compare by reference rather than value. It shouldn't have to maintain a sort at all times, but it should be able to be sorted before the collection is iterated over.
Looking through the Java Collection Library, I've seen the various features I'm looking for on different collection types, but I am not sure about how I should go about using them to build the kind of collection I'm looking for.
This is Java in the context of Android if that is significant.
Java's built-in tree-based collections won't work.
To illustrate, consider a tree containing weak references to nodes 'B', 'C', and 'D':
C
B D
Now let the weak reference 'C' get collected, leaving null behind:
-
B D
Now insert an element into the tree. The TreeMap/TreeSet doesn't have sufficient information to select the left or right subtree. If your comparator says null is a small value, then it will be incorrect when inserting 'A'. If it says null is a large value, it will be incorrect when inserting 'E'.
Sort on demand is a good choice.
A more robust solution is to use an ArrayList<WeakReference<T>> and to implement a Comparator<WeakReference<T>> that delegates to a Comparator<T>. Then call Collections.sort() prior to iteration.
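A sketch of the sort-on-demand idea, with one variation that should be stated plainly: instead of sorting the WeakReference objects directly, it first copies the live referents into a list of strong references and sorts that, so a referent cannot be cleared in the middle of the sort. The class name WeakRegistry and its methods are hypothetical.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical registry: holds weak references and sorts only when asked.
class WeakRegistry<T> {
    private final List<WeakReference<T>> refs = new ArrayList<>();
    private final Comparator<T> order;

    WeakRegistry(Comparator<T> order) {
        this.order = order;
    }

    // Registers an item unless the very same object (by reference) is already present.
    void register(T item) {
        for (WeakReference<T> ref : refs) {
            if (ref.get() == item) {
                return;
            }
        }
        refs.add(new WeakReference<>(item));
    }

    // Drops cleared references, then returns the live items in sorted order.
    List<T> sortedSnapshot() {
        List<T> live = new ArrayList<>();
        for (Iterator<WeakReference<T>> it = refs.iterator(); it.hasNext(); ) {
            T item = it.next().get();
            if (item == null) {
                it.remove();   // referent was garbage collected
            } else {
                live.add(item);
            }
        }
        Collections.sort(live, order);
        return live;
    }
}
```

Iterating over the returned list is safe because it holds strong references for as long as the caller keeps it.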
Android's Collections.sort uses TimSort behind-the-scenes and so it runs quite efficiently if the input is already partially sorted.
Perhaps the collections classes are a level of abstraction below what you're looking for? It sounds like the end product you want is a cache with the ability to iterate in a user-defined sort order. If so, perhaps the cache interface in the Google Guava library is close enough to what you want:
http://code.google.com/p/guava-libraries/source/browse/trunk/guava/src/com/google/common/cache/Cache.java
At a glance, it looks like CacheBuilder in that package doesn't allow you to build an implementation with user-defined iteration order. However, it does provide a Map view that might be good enough for your needs:
List<Thing> cachedThings = Lists.newArrayList(cache.asMap().values());
Collections.sort(cachedThings, YOUR_THING_COMPARATOR);
for (Thing thing : cachedThings) { ... }
Even if this isn't exactly what you want, the classes in that package might give you some useful insights re: using References with Collections.
DISCLAIMER: This was a comment but it got kinda big, sorry if it doesn't solve your problem:
References in Java
Just to clarify what I mean when I say reference, since it isn't really a term commonly used in Java: Java does not really use references or pointers. It uses a kind of pseudo-reference that can be (and is by default) assigned to the special null instance. That's one way to explain it anyway. In Java, these pseudo-references are the only way that an Object can be handled. When I say reference, I mean these pseudo-references.
Sets
No Set implementation will allow the same element to be included twice, since that would violate the mathematical concept of a set; Java Sets simply ignore any attempt to add a duplicate. Note that most implementations decide what counts as a duplicate using equals() and hashCode() (or a Comparator for sorted sets), not reference identity.
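For a set that really does compare by reference, one option from the standard library is to wrap an IdentityHashMap with Collections.newSetFromMap(). The sketch below contrasts it with a HashSet; the class name IdentitySetDemo and the helper newIdentitySet() are hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Set;

class IdentitySetDemo {
    // Builds a Set that treats two elements as duplicates only if they are the same object.
    static <T> Set<T> newIdentitySet() {
        return Collections.newSetFromMap(new IdentityHashMap<T, Boolean>());
    }

    public static void main(String[] args) {
        String a = new String("x");
        String b = new String("x");  // equal to a, but a distinct object

        Set<String> byEquals = new HashSet<>();
        byEquals.add(a);
        byEquals.add(b);             // ignored: equals() says it is a duplicate

        Set<String> byIdentity = newIdentitySet();
        byIdentity.add(a);
        byIdentity.add(b);           // kept: different reference
        byIdentity.add(a);           // ignored: same reference

        System.out.println(byEquals.size() + " " + byIdentity.size()); // prints "1 2"
    }
}
```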
You mention a Map in your comment though... Could you clarify what kind of collection you are after? And why you need that kind of equality checking within it? Are you thinking in C++ terms? I'll try to edit my answer to be more helpful then :)
EDIT: I thought that might have been your goal ;) So a TreeSet should do the trick then! I would not get concerned about performance until there is a performance issue. Simplicity is fantastic for readability, maintenance and preventing bugs. If performance does become a problem, ideally you should profile your code and only optimize the areas that are proven to be the problem.
I am wondering about the performance of Java HashMap vs JSONObject.
It seems JSONObject stores data internally using HashMap. But JSONObject might have additional overhead compared to HashMap.
Does anyone know about the performance of Java JSONObject compared to HashMap?
Thanks!
As you said, JSONObject is backed by a HashMap.
Because of this, performance will be almost identical. JSONObject.get() adds a null check, and will throw an exception if a key isn't found. JSONObject.put() just calls map.put().
So, there is almost no overhead. If you are dealing with JSON objects, you should always use JSONObject over HashMap.
I would say the question doesn't make sense for a few reasons:
Comparing apples to oranges: HashMap and JSONObject are intended for 2 completely different purposes. It's like asking "is the Person class or Company class more efficient for storing a PhoneNumber object". Use what makes sense.
If you are converting to/from JSON, you are likely sending the data to a far away place (like a user's browser). The time taken to send this data over the network and evaluate it in the user's browser will (likely) far eclipse any performance differences of populating a HashMap or JSONObject.
There is more than 1 "JSONObject" implementation floating around out there.
Finally, you haven't asked about what sort of performance you would like to measure. What are you actually planning to do with these classes?
Existing answers are correct, performance differences between the two are negligible.
Both are basically rather inefficient methods of storing and manipulating data. A more efficient method is typically to bind into regular Java objects, which use less memory and are faster to access. Many developers use org.json's simple (primitive) library because it is well-known, but it is possibly the least convenient and efficient alternative available. Choices like Jackson and Gson are big improvements, so it is worth considering using them.
JSONObject does not have too much additional overhead on top of a HashMap. If you are okay with using a HashMap then you should be okay using a JSONObject. This is provided you want to generate JSON.
JSONObject checks the validity of the values that you store in it, to make sure they conform to the JSON spec. For example, NaN values are not valid JSON. Apart from this, JSONObject can generate JSON strings (regular or prettified). Those strings can get pretty big, depending on the amount of JSON. Also, JSONObject uses StringBuffer, so one of the many things that I would do would be to replace all occurrences of StringBuffer with StringBuilder.
JSONObject (from org.json) is one of the simple JSON libraries that you can use. If you want something very efficient, use something like Jackson.
The only performance overhead is in casting data: JSONObject stores its data in a HashMap of Objects and casts each value to the data type you ask for.
Some time back our architect told me this, and I couldn't talk to him further to get the details at the time. I couldn't understand how arrays are more serializable or better performing than ArrayLists.
Update: This is in web services code, if that is important, and he may have meant performance rather than serializability.
Update: There is no problem with XML serialization for ArrayLists.
<sample-array-list>reddy1</sample-array-list>
<sample-array-list>reddy2</sample-array-list>
<sample-array-list>reddy3</sample-array-list>
Could there be a problem in a distributed application?
There's no such thing as "more serializable". Either a class is serializable, or it is not. Both arrays and ArrayList are serializable.
As for performance, that's an entirely different topic. Arrays, especially of primitives, use quite a bit less memory than ArrayLists, but the serialization format is actually equally compact for both.
In the end, the only person who can really explain this vague and misleading statement is the person who made it. I suggest you ask your architect what exactly he meant.
I'm assuming that you are talking about Java object serialization.
It turns out that an array (of objects) and ArrayList have similar but not identical contents. In the array case, the serialization will consist of the object header, the array length and its elements. In the ArrayList case, the serialization consists of the list size, the array length and the first 'size' elements of the array. So one extra 32 bit int is serialized. There may also be differences in the respective object headers.
So, yes, there is a small (probably 4 byte) difference in the size of the serial representations. And it is possible that an array can be serialized / deserialized slightly more quickly. But the differences are likely to be down in the noise, and not worth worrying about ... unless profiling, etc tells you this is a bottleneck.
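To check the size difference on a given JVM rather than take it on faith, a small measurement sketch like the following can be used. The class name SerializedSizeDemo and its helpers are hypothetical; the actual byte counts depend on the element types and class descriptors, so none are claimed here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SerializedSizeDemo {
    // Serializes any Serializable value to a byte array using default Java serialization.
    static byte[] serialize(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads back a single object from bytes produced by serialize().
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Integer[] array = {42, 83};
        List<Integer> list = new ArrayList<>(Arrays.asList(array));
        // Compare the two serialized forms of the same two elements.
        System.out.println("array bytes:     " + serialize(array).length);
        System.out.println("ArrayList bytes: " + serialize(list).length);
    }
}
```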
EDIT
Based on @Tom Hawtin's comment, the object header difference is significant, especially if the serialization only contains a small number of ArrayList instances.
Maybe he was referring to the XML serialization used in web services?
Having used those a few years ago, I remember that a Webservice returning a List object was difficult to connect to (at least I could not figure it out, probably because of the inner structure of ArrayLists and LinkedLists), although this was trivially done when a native array was returned.
To address Reddy's comment:
"But in any case (array or ArrayList) will get converted to XML, right?"
Yes they will, but XML serialization basically translates into XML all the data contained in the serialized object.
For an array, that is a series of values.
For instance, if you declare and serialize
int[] array = {42, 83};
You will probably get an XML result looking like :
<array>42</array>
<array>83</array>
For an ArrayList, that is :
an array (obviously), which may have a size bigger than the actual number of elements
several other members such as integer indexes (firstIndex and lastIndex), counts, etc
(you can find all that stuff in the source for ArrayList.java)
So all of those will get translated to XML, which makes it more difficult for the Webservice client to read the actual values : it has to read the index values, find the actual array, and read the values contained between the two indexes.
The serialization of :
ArrayList<Integer> list = new ArrayList<Integer>();
list.add(42);
list.add(83);
might end up looking like :
<firstIndex>0</firstIndex>
<lastIndex>2</lastIndex>
<E>42</E>
<E>83</E>
<E>0</E>
<E>0</E>
<E>0</E>
<E>0</E>
<E>0</E>
<E>0</E>
<E>0</E>
<E>0</E>
So basically, when using XML serialization in web services, you are better off using arrays (such as int[]) rather than collections (such as ArrayList<Integer>). For that, you might find it useful to convert Collections to arrays using Collection#toArray().
They both serialize the same data. So I wouldn't say one is significantly better than the other.
As far as I know, both are Serializable, but using arrays is better because the main purpose of ArrayList is convenient internal manipulation, not exposure to the outside world. It is a little heavier to use, so when it is serialized in web services it might create problems with namespaces and headers. If the framework sets them automatically, you may not be able to send or receive data properly. So it is better to use primitive arrays.
Only in Java does this make a difference, and even then it's hard to notice it.
If he didn't mean Java then yes, your best bet would most likely be asking him exactly what he meant by that.
Just a related thought: The List interface is not Serializable so if you want to include a List in a Serializable API you are forced to either expose a Serializable implementation such as ArrayList or convert the List to an array. Good design practices discourage exposing your implementation, which might be why your architect was encouraging you to convert the List to an array. You do pay a little time penalty converting the List to an array, but on the other end you can wrap the array with a list interface with java.util.Arrays.asList(), which is fast.
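The conversion in both directions can be sketched as follows. The class name ListArrayBridge and its methods are hypothetical; List.toArray() and Arrays.asList() are the standard calls. Note that Arrays.asList() returns a fixed-size view backed by the array, so no elements are copied, but add() and remove() on it are not supported.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListArrayBridge {
    // Sender side: copy the List into an array before exposing it in the API.
    static String[] toArray(List<String> list) {
        return list.toArray(new String[0]);
    }

    // Receiver side: wrap the array in a List view without copying the elements.
    static List<String> toList(String[] array) {
        return Arrays.asList(array);
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("reddy1", "reddy2", "reddy3"));
        String[] wire = toArray(names);
        List<String> received = toList(wire);
        System.out.println(received.equals(names)); // prints "true"
    }
}
```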
I'm looking for the format that Java uses to serialize objects. The default serialization serializes the object in a binary format. In particular, I'm curious to know if two runs of a program can serialize the same object differently.
What condition should an object satisfy so that the object maintains its behavior under Java's default serialization/deserialization round-trip?
You need the Java Object Serialization Specification at http://java.sun.com/javase/6/docs/platform/serialization/spec/protocol.html.
If you have two objects with all properties set to identical values, then they will be serialized the same way.
If it weren't repeatable, then it wouldn't be useful!
They will always serialize it the same way. If this wasn't the case, there would be no guarantee that another program could de-serialize the data correctly, defeating the purpose of serialization.
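A quick way to test this claim for a particular class is to serialize two objects with identical field values and compare the bytes. The sketch below does that within a single run; the class names Sample and RepeatableSerialization are hypothetical, and it only covers a simple class with an int and a String field, not collections whose iteration order may vary.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

// Hypothetical value class with an explicit serialVersionUID.
class Sample implements Serializable {
    private static final long serialVersionUID = 1L;
    final int id;
    final String label;

    Sample(int id, String label) {
        this.id = id;
        this.label = label;
    }
}

class RepeatableSerialization {
    // Serializes a value into a fresh stream so each call starts from the same state.
    static byte[] serialize(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] first = serialize(new Sample(1, "a"));
        byte[] second = serialize(new Sample(1, "a"));
        System.out.println(Arrays.equals(first, second)); // prints "true"
    }
}
```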
Typically running the same single-threaded algorithm with the same data will result in the same result.
However, things such as the order with which a HashSet serialises entries is not guaranteed. Indeed, an object may be subtly altered when serialised.
I like @Stephen C's example of Object.hashCode(). If such nondeterministic hash codes are serialized, then when we deserialize, the hash codes will be of no use. For example, if we serialize a HashMap that works based on Object.hashCode(), its deserialized version would behave differently than the original map. That is, looking up the same object would give us different results in the two maps.
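The lookup difference can be reproduced with a key class that inherits equals() and hashCode() from Object, as in this sketch. The class names OpaqueKey and IdentityKeyDemo are hypothetical. The deserialized map contains a copy of the key, which is a different object, so looking up the original key finds nothing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

// Hypothetical key class: deliberately does NOT override equals() or hashCode().
class OpaqueKey implements Serializable {
    private static final long serialVersionUID = 1L;
    final String label;

    OpaqueKey(String label) {
        this.label = label;
    }
}

class IdentityKeyDemo {
    // Serializes a value to memory and reads a copy back.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        OpaqueKey key = new OpaqueKey("k");
        HashMap<OpaqueKey, String> original = new HashMap<>();
        original.put(key, "v");

        HashMap<OpaqueKey, String> copy = roundTrip(original);

        System.out.println(original.get(key)); // prints "v"
        System.out.println(copy.get(key));     // prints "null": the copy holds a different key object
    }
}
```

Overriding equals() and hashCode() in the key class so that they depend only on field values removes the difference.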
If you don't want binary, then you can use JSON (http://www.json.org/example.html) in Java: http://www.json.org/java/
Or XML for that matter http://www.developer.com/xml/article.php/1377961/Serializing-Java-Objects-as-XML.htm
"I'm looking for the format that Java uses to serialize objects."
Not to be inane, it writes them somehow. How exactly that is can and probably should be determined by you. A Character maps to .... uh, it gets involved but rather than re-inventing the wheel let us ask exactly what do you need to have available to reconstruct an object to what state?
"The default serialization serializes the object in a binary format."
So? ( again, not trying to be inane - sounds like we need to define a problem that may not have data concepted )
"I'm curious to know if two runs of a program can serialize the same object differently."
If you had a Stream of information, how would you determine what states the object needed to be restored to?