Sometime back our architect gave this funda to me and I couldn't talk to him more to get the details at the time, but I couldn't understand how arrays are more serializable/better performant over ArrayLists.
Update: This is in the web services code if it is important and it can be that he might mean performance instead of serializability.
Update: There is no problem with XML serialization for ArrayLists.
<sample-array-list>reddy1</sample-array-list>
<sample-array-list>reddy2</sample-array-list>
<sample-array-list>reddy3</sample-array-list>
Could there be a problem in a distributed application?
There's no such thing as "more serializable". Either a class is serializable, or it is not. Both arrays and ArrayList are serializable.
As for performance, that's an entirely different topic. Arrays, especially of primitives, use quite a bit less memory than ArrayLists, but the serialization format is actually equally compact for both.
In the end, the only person who can really explain this vague and misleading statement is the person who made it. I suggest you ask your architect what exactly he meant.
I'm assuming that you are talking about Java object serialization.
It turns out that an array (of objects) and ArrayList have similar but not identical contents. In the array case, the serialization will consist of the object header, the array length and its elements. In the ArrayList case, the serialization consists of the list size, the array length and the first 'size' elements of the array. So one extra 32 bit int is serialized. There may also be differences in the respective object headers.
So, yes, there is a small (probably 4 byte) difference in the size of the serial representations. And it is possible that an array can be serialized / deserialized
slightly more quickly. But the differences are likely to be down in the noise, and not worth worrying about ... unless profiling, etc tells you this is a bottleneck.
EDIT
Based on #Tom Hawtin's comment, the object header difference is significant, especially if the serialization only contains a small number of ArrayList instances.
Maybe he was refering to XML-serialization used in Webservices ?
Having used those a few years ago, I remember that a Webservice returning a List object was difficult to connect to (at least I could not figure it out, probably because of the inner structure of ArrayLists and LinkedLists), although this was trivially done when a native array was returned.
To adress Reddy's comment,
But in any case (array or ArrayList)
will get converted to XML, right?
Yes they will, but the XML-serialization basically translated in XML all the data contained in the serialized object.
For an array, that is a series of values.
For instance, if you declare and serialize
int[] array = {42, 83};
You will probably get an XML result looking like :
<array>42</array>
<array>83</array
For an ArrayList, that is :
an array (obviously), which may have a size bigger than the actual number of elements
several other members such as integer indexes (firstIndex and lastIndex), counts, etc
(you can find all that stuff in the source for ArrayList.java)
So all of those will get translated to XML, which makes it more difficult for the Webservice client to read the actual values : it has to read the index values, find the actual array, and read the values contained between the two indexes.
The serialization of :
ArrayList<Integer> list = new ArrayList<Integer>();
list.add(42);
list.add(83);
might end up looking like :
<firstIndex>0</firstIndex>
<lastIndex>2</lastIndex>
<E>42</E>
<E>83</E>
<E>0</E>
<E>0</E>
<E>0</E>
<E>0</E>
<E>0</E>
<E>0</E>
<E>0</E>
<E>0</E>
So basically, when using XML-serialization in Webservices, you'd better use arrays (such as int[]) than collections (such as ArrayList<Integer>). For that you might find useful to convert Collections to arrays using Collection#toArray().
They both serialize the same data. So I wouldn't say one is significantly better than the other.
As of i know,both are Serializable but using arrays is better coz the main purpose of implementing the ArrayList is for internal easy manipulation purpose,not to expose to outer world.It is little heavier to use ,so when using in webservices while serializing it ,it might create problems in the namespace and headers.If it automatically sets them ,you ll not be able to receive or send data properly.So it is better to use primitive arrays .
Only in Java does this make a difference, and even then it's hard to notice it.
If he didn't mean Java then yes, your best bet would most likely be asking him exactly what he meant by that.
Just a related thought: The List interface is not Serializable so if you want to include a List in a Serializable API you are forced to either expose a Serializable implementation such as ArrayList or convert the List to an array. Good design practices discourage exposing your implementation, which might be why your architect was encouraging you to convert the List to an array. You do pay a little time penalty converting the List to an array, but on the other end you can wrap the array with a list interface with java.util.Arrays.asList(), which is fast.
Related
I am writing a program that will be heavily reliant on ... something ... that stores data like an array where I am able to access any point of the data at any given time as I can in an array.
I know that the java library has an Array class that I could use or I could use a raw array[].
I expect that using the Array type is a bit easier to code, but I expect that it is slightly less efficient as well.
My question is, which is better to use between these two, and is there a better way to accomplish the same result?
Actually Array would be of no help -- it's not what you think it is. The class java.util.ArrayList, on the other hand, is. In general, if you can program with collection classes like ArrayList, do so -- you'll more easily arrive at correct, flexible software that's easier to read, too. And that "if" applies almost all the time; raw arrays are something you use as a last resort or, more often, when a method you want to call requires one as an argument.
The Array class is used for Java reflection and is very, very, rarely used.
If you want to store data in an array, use plain old arrays, indicated with [], or as Gabe's comment on the question suggests, java.util.ArrayList. ArrayList is, as your comment suggests easier to code (when it comes to adding and removing elements!!) but yes, is slightly less efficient. For variable-size collections, ArrayList is all but required.
My question is, which is better to use between these two, and is there a better way to accomplish the same result?
It depends on what you are trying to achieve:
If the number of elements in the array is known ahead of time, then an array type is a good fit. If not, a List type is (at least) more convenient to use.
The List interface offers a number of methods such as contains, insert, remove and so on that can save you coding ... if you need to do that sort of thing.
If properly used, an array type will use less space. The difference is particularly significant for arrays of primitive types where using a List means that the elements need to be represented using wrapper types (e.g. byte becomes Byte).
The Array class is not useful in this context, and neither is the Arrays class. The choice is between ArrayList (or some other List implementation class) and primitive arrays.
In terms of ease of use, the Array class is a lot easier to code.
The array[] is quite a problem in terms of the case that you need to know
the size of the list of objects beforehand.
Instead, you could use a HashMap. It is very efficient in search as well as sorting as
the entire process is carried out in terms of key values.
You could declare a HashMap as:
HashMap<String, Object> map = new HashMap<String, Object>();
For the Object you can use your class, and for key use the value which needs to be unique.
Lets say we have a bunch of data (temp,wind,pressure) that ultimately comes in as a number of float arrays.
For example:
float[] temp = //get after performing some processing (takes time)
float[] wind =
Say we want to store these values in memory for different hours of the day. Is it better to put these on a HashMap like:
HashMap maphr1 = new HashMap();
maphr1.put("temp",temp);
maphr1.put("wind",wind);
...
Or is it better to create a Java object like:
public class HourData(){
private float[] temp,wind,pressure;
//getters and setters for above!
}
...
// use it like this
HourData hr1 = new HourData();
hr1.setTemp(temp);
hr1.setWind(wind);
Out of these two approaches which is better in terms of performance, readability, good OOP practice etc
You're best off having an HourData class that stores a single set of temperature, wind, and pressure values, like this:
public class HourData {
private float temp, wind, pressure;
// Getters and setters for the above fields
}
If you need to store more than one set of values, you can use an array, or a collection of HourData objects. For example:
HourData[] hourDataArray = new HourData[10000];
This is ultimately much more flexible, performant, and intuitive to use than putting storing the arrays of data in your HourData class.
Flexibility
I say that this approach is more flexible because it leaves the choice of what kind of collection implementation to use (e.g. ArrayList, LinkedList, etc.) to users of the HourData class. Moreover, if he/she wishes to deal just with a single set of values, this approach doesn't force them to deal with an array or collection.
Performance
Suppose you have a list of HourData instances. If you used three float arrays in the way that you described, then accessing the i'th temp, wind, and pressure values may cause three separate pages to be accessed in memory. This happens because all of the temp values will be stored contiguously, followed by all of the wind values, followed by all of the pressure values. If you use a class to group these values together, then accessing the i'th temp, wind, and pressure values will be faster because they will all be stored adjacent to each other in memory.
Intuitive
If you use a HashMap, anyone who needs to access any of the fields will have to know the field names in advance. HashMap objects are better suited to key/value pairs where the keys are not known at compile time. Using an HourData class that contains clearly defined fields, one only needs to look at the class API to know that HourData contains values for temp, wind, and pressure.
Also, getter and setter methods for array fields can be confusing. What if I just want to add a single set of temp, wind, and pressure values to the list? Do I have to get each of the arrays, and add the new values to the end of them? This kind of confusion is easily avoided by using a "wrapper" collection around an HourData that deals only with single values.
For readability i would definately go for a object since it makes more sense. Especially since you store different datacollections like the wind longs have a different meaning as the temp longs.
Besides this you can also store other information like the location and time of your measurement.
Well if you dont have any key to differentiate different instances of the same object. I would create HourData objects and store them in a array list.
Putting data in a contained object always increases the readability.
You have mentioned bunch of data, So I would rather read it as collection of data.
So the answer is , if something already available in Java collection framework out of box , why do you want to write one for you.
You should look at Java collection classes and see which fits your requirement better, whether it is concurrent access, fast retrieve time or fast add time etc etc..
Hope this helps
EDIT----
Adding one more dimension to this.
The type of application you are building also affects your approach.
The above discussion rightly mentions readability, flexibility , performance as driving criteria for your design.
But the type of application you are building is also one of the influencing factors.
For example, Lets say you are building a web application.
A Object which is stored in memory for a long time would be either in Application or Session Scope. So you will have to make it immutable by design or use it for thread safe manner.
The business data which remains same across different implementations should be designed as per OOP or best practices but the infrastructure or Application logic should more be your framework driven.
I feel what you are talking, like keeping an object for a long time in memory is more a framework driven outlook, hence I suggested use Java Collection and put your business objects inside it. Important points are
Concurrent Access Control
Immutable by design
If you have a limited and already defined list of parameters then it's better to use the second approach.
In terms of performance: you don't need to search for key in hashmap
In terms of readability: data.setTemp(temp) is better than map.put("temp", temp). One of the benefits of the first approach is that typing errors will be catched during the compilation
In terms of good OOP practices: first approach has nothing to do with OOP practices. Using the second approach you can easily change the implementation, add new methods, provide several alternative data object implementations, etc.
But you might want to use collections if you don't know the parameters and if you want to work with uncategorized(extensible) set of parameters.
When is it better to use a vector than an array and vice versa in java? and why?
Vector: never, unless an API requires it, because it's a class, not an interface.
List: this should be your default array-like collection. It's an interface so anything can be a List if it needs to. (and there are lots of List implementations out there e.g. ArrayList, LinkedList, CopyOnWriteArrayList, ImmutableList, for various feature sets)
Vector is threadsafe, but so is the Collections.synchronizedList() wrapper.
array: rarely, if required by an API. The one other major advantage of arrays is when you need a fixed-length array of primitives, the memory space required is fairly compact, as compared to a List<Integer> where the integers need to be boxed into Integer objects.
A Vector (or List) when you don't know before hand how many elements are going to be inserted.
An array when you absolutely know what's the maximum number of elements on that vector's whole life.
Since there are no high performance penalties when using List or Vector (any collection for that matter), I would always chose to use them. Their flexibility is too important to not be considered.
Nowadays I only use arrays when I absolutely need to. Example: when using an API that requires them.
First off ArrayList is a faster implementation than Vector (but not thread safe though).
Arrays are handy when you know the length beforehand and will not change much (or at all).
When declaring a method, use List.
Do not use Vector, it's an early part of the JDK and was retrofitted to work with Collections.
If there's a very performance-sensitive algorithm, a private array member can be helpful. If there's a need to return its contents or pass them to a method, it's generally best to construct an object around it, perhaps as simple as Arrays.asList(thePrivateArray). For a thread-safe list: Collections.synchronizedList(Arrays.asList(thePrivateArray)). In order to prevent modification of the array contents, I typically use Collections.unmodifiableList(Arrays.asList(thePrivateArray)).
What is the need of Collection framework in Java since all the data operations(sorting/adding/deleting) are possible with Arrays and moreover array is suitable for memory consumption and performance is also better compared with Collections.
Can anyone point me a real time data oriented example which shows the difference in both(array/Collections) of these implementations.
Arrays are not resizable.
Java Collections Framework provides lots of different useful data types, such as linked lists (allows insertion anywhere in constant time), resizeable array lists (like Vector but cooler), red-black trees, hash-based maps (like Hashtable but cooler).
Java Collections Framework provides abstractions, so you can refer to a list as a List, whether backed by an array list or a linked list; and you can refer to a map/dictionary as a Map, whether backed by a red-black tree or a hashtable.
In other words, Java Collections Framework allows you to use the right data structure, because one size does not fit all.
Several reasons:
Java's collection classes provides a higher level interface than arrays.
Arrays have a fixed size. Collections (see ArrayList) have a flexible size.
Efficiently implementing a complicated data structures (e.g., hash tables) on top of raw arrays is a demanding task. The standard HashMap gives you that for free.
There are different implementation you can choose from for the same set of services: ArrayList vs. LinkedList, HashMap vs. TreeMap, synchronized, etc.
Finally, arrays allow covariance: setting an element of an array is not guaranteed to succeed due to typing errors that are detectable only at run time. Generics prevent this problem in arrays.
Take a look at this fragment that illustrates the covariance problem:
String[] strings = new String[10];
Object[] objects = strings;
objects[0] = new Date(); // <- ArrayStoreException: java.util.Date
Collection classes like Set, List, and Map implementations are closer to the "problem space." They allow developers to complete work more quickly and turn in more readable/maintainable code.
For each class in the Collections API there's a different answer to your question. Here are a few examples.
LinkedList: If you remove an element from the middle of an array, you pay the cost of moving all of the elements to the right of the removed element. Not so with a linked list.
Set: If you try to implement a set with an array, adding an element or testing for an element's presence is O(N). With a HashSet, it's O(1).
Map: To implement a map using an array would give the same performance characteristics as your putative array implementation of a set.
It depends upon your application's needs. There are so many types of collections, including:
HashSet
ArrayList
HashMap
TreeSet
TreeMap
LinkedList
So for example, if you need to store key/value pairs, you will have to write a lot of custom code if it will be based off an array - whereas the Hash* collections should just work out of the box. As always, pick the right tool for the job.
Well the basic premise is "wrong" since Java included the Dictionary class since before interfaces existed in the language...
collections offer Lists which are somewhat similar to arrays, but they offer many more things that are not. I'll assume you were just talking about List (and even Set) and leave Map out of it.
Yes, it is possible to get the same functionality as List and Set with an array, however there is a lot of work involved. The whole point of a library is that users do not have to "roll their own" implementations of common things.
Once you have a single implementation that everyone uses it is easier to justify spending resources optimizing it as well. That means when the standard collections are sped up or have their memory footprint reduced that all applications using them get the improvements for free.
A single interface for each thing also simplifies every developers learning curve - there are not umpteen different ways of doing the same thing.
If you wanted to have an array that grows over time you would probably not put the growth code all over your classes, but would instead write a single utility method to do that. Same for deletion and insertion etc...
Also, arrays are not well suited to insertion/deletion, especially when you expect that the .length member is supposed to reflect the actual number of contents, so you would spend a huge amount of time growing and shrinking the array. Arrays are also not well suited for Sets as you would have to iterate over the entire array each time you wanted to do an insertion to check for duplicates. That would kill any perceived efficiency.
Arrays are not efficient always. What if you need something like LinkedList? Looks like you need to learn some data structure : http://en.wikipedia.org/wiki/List_of_data_structures
Java Collections came up with different functionality,usability and convenience.
When in an application we want to work on group of Objects, Only ARRAY can not help us,Or rather they might leads to do things with some cumbersome operations.
One important difference, is one of usability and convenience, especially given that Collections automatically expand in size when needed:
Collections came up with methods to simplify our work.
Each one has a unique feature:
List- Essentially a variable-size array;
You can usually add/remove items at any arbitrary position;
The order of the items is well defined (i.e. you can say what position a given item goes in in the list).
Used- Most cases where you just need to store or iterate through a "bunch of things" and later iterate through them.
Set- Things can be "there or not"— when you add items to a set, there's no notion of how many times the item was added, and usually no notion of ordering.
Used- Remembering "which items you've already processed", e.g. when doing a web crawl;
Making other yes-no decisions about an item, e.g. "is the item a word of English", "is the item in the database?" , "is the item in this category?" etc.
Here you find use of each collection as per scenario:
Collection is the framework in Java and you know that framework is very easy to use rather than implementing and then use it and your concern is that why we don't use the array there are drawbacks of array like it is static you have to define the size of row at least in beginning, so if your array is large then it would result primarily in wastage of large memory.
So you can prefer ArrayList over it which is inside the collection hierarchy.
Complexity is other issue like you want to insert in array then you have to trace it upto define index so over it you can use LinkedList all functions are implemented only you need to use and became your code less complex and you can read there are various advantages of collection hierarchy.
Collection framework are much higher level compared to Arrays and provides important interfaces and classes that by using them we can manage groups of objects with a much sophisticated way with many methods already given by the specific collection.
For example:
ArrayList - It's like a dynamic array i.e. we don't need to declare its size, it grows as we add elements to it and it shrinks as we remove elements from it, during the runtime of the program.
LinkedList - It can be used to depict a Queue(FIFO) or even a Stack(LIFO).
HashSet - It stores its element by a process called hashing. The order of elements in HashSet is not guaranteed.
TreeSet - TreeSet is the best candidate when one needs to store a large number of sorted elements and their fast access.
ArrayDeque - It can also be used to implement a first-in, first-out(FIFO) queue or a last-in, first-out(LIFO) queue.
HashMap - HashMap stores the data in the form of key-value pairs, where key and value are objects.
Treemap - TreeMap stores key-value pairs in a sorted ascending order and retrieval speed of an element out of a TreeMap is quite fast.
To learn more about Java collections, check out this article.
In Java, when would it be preferential to use a List rather than an Array?
I see the question as being the opposite-
When should you use an Array over a List?
Only you have a specific reason to do so (eg: Project Constraints, Memory Concerns (not really a good reason), etc.)
Lists are much easier to use (imo), and have much more functionality.
Note: You should also consider whether or not something like a Set, or another datastructure is a better fit than a List for what you are trying to do.
Each datastructure, and implmentation, has different pros/cons. Pick the ones that excel at the things that you need to do.
If you need get() to be O(1) for any item? Likely use an ArrayList, Need O(1) insert()? Possibly a Linked List. Need O(1) contains()? Possibly a Hashset.
TLDR: Each data structure is good at some things, and bad at others. Look at your objectives and choose the data structure that best fits the given problem.
Edit:
One thing not noted is that you're
better off declaring the variable as
its interface (i.e. List or Queue)
rather than its implementing class.
This way, you can change the
implementation at some later date
without changing anything else in the
code.
As an example:
List<String> myList = new ArrayList<String>();
vs
List<String> myList = new LinkedList<String>();
Note that myList is a List in both examples.
--R. Bemrose
Rules of thumb:
Use a List for reference types.
Use arrays for primitives.
If you have to deal with an API that is using arrays, it might be useful to use arrays. OTOH, it may be useful to enforce defensive copying with the type system by using Lists.
If you are doing a lot of List type operations on the sequence and it is not in a performance/memory critical section, then use List.
Low-level optimisations may use arrays. Expect nastiness with low-level optimisations.
Most people have answered it already.
There are almost no good reason to use an array instead of List. The main exception being the primitive array (like int[]). You cannot create a primitive list (must have List<Integer>).
The most important difference is that when using List you can decide what implementation will be used. The most obvious is to chose LinkedList or ArrayList.
I would like to point out in this answer that choosing the implementation gives you very fine grained control over the data that is simply not available to array:
You can prevent client from modifying your list by wrapping your list in a Collection.unmodifiableList
You can synchronize a list for multithreading using Collection.synchronizedList
You can create a fixed length queue with implementation of LinkedBlockingQueue
... etc
In any case, even if you don't want (now) any extra feature of the list. Just use an ArrayList and size it with the size of the array you would have created. It will use an Array in the back-end and the performance difference with a real array will be negligible. (except for primitive arrays)
Pretty much always prefer a list. Lists have much more functionality, particularly iterator support. You can convert a list to an array at any time with the toArray() method.
Always prefer lists.
Arrays when
Varargs for a method ( I guess you are forced to use Arrays here ).
When you want your collections to be covariant ( arrays of reference types are covariant ).
Performance critical code.
If you know how many things you'll be holding, you'll want an array. My screen is 1024x768, and a buffer of pixels for that isn't going to change in size ever during runtime.
If you know you'll need to access specific indexes (go get item #763!), use an array or array-backed list.
If you need to add or remove items from the group regularly, use a linked list.
In general, dealing with hardware, arrays, dealing with users, lists.
It depends on what kind of List.
It's better to use a LinkedList if you know you'll be inserting many elements in positions other than the end. LinkedList is not suitable for random access (getting the i'th element).
It's better to use an ArrayList if you don't know, in advance, how many elements there are going to be. The ArrayList correctly amortizes the cost of growing the backing array as you add more elements to it, and is suitable for random access once the elements are in place. An ArrayList can be efficiently sorted.
If you want the array of items to expand (i.e. if you don't know what the size of the list will be beforehand), a List will be beneficial. However, if you want performance, you would generally use an array.
In many cases the type of collection used is an implementation detail which shouldn't be exposed to the outside world. The more generic your returntype is the more flexibility you have changing the implementation afterwards.
Arrays (primitive type, ie. new int[10]) are not generic, you won't be able to change you implementation without an internal conversion or altering the client code. You might want to consider Iterable as a returntype.