Name this collection - java

This question is language-agnostic (although it assumes one that is both procedural and OO).
I'm having trouble finding if there is a standard name for a collection with the following behavior:
-Fixed-capacity of N elements, maintaining insertion order.
-Elements are added to the 'Tail'
-Whenever an item is added, the head of the collection is returned (FIFO), although not necessarily removed.
-If the collection now contains more than N elements, the Head is removed - otherwise it remains in the collection (now having advanced one step further towards its ultimate removal).
I often use this structure to keep a running tally - e.g. the frame lengths of the past N frames - so as to provide a 'moving window' across which I can average, sum, etc.
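To make this concrete, here is a rough Java sketch of what I mean (the class and method names are just my own; internally it's a plain ArrayDeque):

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Rough sketch of the collection described above: add() returns the current
    // head, and the head is only evicted once the window holds more than N elements.
    public class SlidingWindow<E> {
        private final Deque<E> window = new ArrayDeque<>();
        private final int capacity;

        public SlidingWindow(int capacity) {
            this.capacity = capacity;
        }

        public E add(E element) {
            window.addLast(element);          // new elements go to the tail
            E head = window.peekFirst();      // head is returned either way
            if (window.size() > capacity) {
                window.removeFirst();         // ...but only removed when over capacity
            }
            return head;
        }

        public int size() {
            return window.size();
        }
    }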

Sounds very similar to a circular buffer to me, with the exception that you are probably under-defining or over-constraining the add/remove behavior.
Note that there are two "views" of a circular buffer. One is the layout view, which has a section of memory being written to, with "head" and "tail" indexes and a bit of logic to "wrap" around when the tail is "before" the head. The other is a "logical" view, where you have a queue that doesn't expose how it is laid out but definitely has a limited number of slots which it can "grow to".
Within the context of doing computation, there is a very long-standing project that I love (although the CLI interface is a bit foreign if you're not used to such things). It's called the RoundRobinDatabase, where each database stores exactly N samples of a single value (providing graphs, averages, etc.). It adjusts the next bin based on a number of parameters, but most often it advances bins based on time. It's often the tool behind a large number of network throughput graphs, and it has configurable bin collision resolution, etc.
In general, algorithms that are sensitive to the last "some number" of entries are often called "sliding box" algorithms, but that's focusing on the algorithm and not on the data structure :)

The programming riddle sounds like a circular linked list to me.
Well, all these descriptions fit, don't they?
• Fixed-capacity of N elements, maintaining insertion order.
• Elements are added to the 'Tail'
• Whenever an item is added, the head of the collection is returned (FIFO), although not necessarily removed.
This link with source code for counting frames probably helps too: frameCounter

Why Java ArrayLists do not shrink automatically

A long time ago I watched a video lecture from the Princeton Coursera MOOC Introduction to Algorithms, which can be found here. It explains the cost of resizing an ArrayList-like structure while adding or removing elements. It turns out that if we add resizing to our data structure, an individual add or remove can still cost O(n) in the worst case, but the cost amortizes out: n operations cost O(n) in total, i.e. amortized O(1) per operation.
I have been using Java ArrayLists for a couple of years. I had always been sure that they grow and shrink automatically. Only recently, to my great surprise, was I proven wrong in this post. Java ArrayLists do not shrink automatically (even though, of course, they do grow).
Here are my questions:
In my opinion, providing shrinking in ArrayLists would not do any harm, since the cost would still be amortized. Why did the Java creators not include this feature in the design?
I know that other data structures like HashMaps also do not shrink automatically. Is there any other data structure in Java which is built on top of arrays and supports automatic shrinking?
What are the trends in other languages? What does automatic shrinking look like for lists, dictionaries, maps and sets in Python, C#, etc.? If they go in the opposite direction to what Java does, then my question is: why?
The comments already cover most of what you are asking. Here are some thoughts on your questions:
When creating a structure like the ArrayList in Java, the developers make certain decisions regarding runtime and performance. They obviously decided to exclude shrinking from the "normal" operations to avoid the additional runtime cost it would incur.
The question is why you would want to shrink automatically. The ArrayList does not grow that much (the growth factor is about 1.5; newCapacity = oldCapacity + (oldCapacity >> 1), to be exact). Maybe you also insert in the middle and not just append at the end; then a LinkedList (which is not based on an array, so no shrinking is needed) might be better. It really depends on your use case. If you think you really need everything an ArrayList does, but it has to shrink when removing elements (I doubt you really need this), just extend ArrayList and override the methods. But be careful: if you shrink on every removal, you are back at O(n) per operation.
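If you do go down that route, here is a rough, untested sketch of the "extend ArrayList and override" idea; the class name and the shrink-when-half-empty threshold are my own choices, and only the most common mutators are overridden:

    import java.util.ArrayList;

    // Sketch only: trims the backing array when the size has fallen to less than
    // half of its high-water mark, so shrinking stays amortized instead of
    // happening on every single removal.
    public class ShrinkingArrayList<E> extends ArrayList<E> {
        private int highWaterMark = 0;

        @Override
        public boolean add(E e) {
            boolean changed = super.add(e);
            highWaterMark = Math.max(highWaterMark, size());
            return changed;
        }

        @Override
        public E remove(int index) {
            E removed = super.remove(index);
            maybeTrim();
            return removed;
        }

        @Override
        public boolean remove(Object o) {
            boolean changed = super.remove(o);
            if (changed) {
                maybeTrim();
            }
            return changed;
        }

        private void maybeTrim() {
            if (size() < highWaterMark / 2) {
                trimToSize();            // existing ArrayList method
                highWaterMark = size();
            }
        }
    }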
The C# List and the C++ vector behave the same way with respect to shrinking on removal of elements. But the factors used for automatic growing vary; even different Java implementations use different factors.
Another issue with automatic shrinking is that you could get into really horrible 'pathological' situations where every insert and delete on the list causes the backing array to grow or shrink.
For example, if the backing array's initial capacity is 10, such that the array would grow upon the insertion of the 11th element (capacity would grow to 15), a natural implementation would be to shrink the backing array once the list size dropped below 11. If you had a list whose length kept changing between 10 and 11, you'd be constantly reallocating the backing array. Not only would that add runtime overhead, but you could start putting a lot of pressure on the garbage collector if every operation left behind a discarded 10- or 15-slot array.
Although an ArrayList with shrinking would keep the same amortized time complexity, it involves more operations.
Shrinking just saves some memory at the cost of extra computation, which is arguably not a wise trade-off: Moore's law suggests that memory capacity roughly doubles every two years, so in algorithms time is usually more valuable than space.

Efficiency of ArrayList

I am making a program in Java in which a ball bounces around on the screen. The user can add other balls, and they all bounce off of each other. My question lies in the storage of the added balls. At the moment, I am using an ArrayList to store them, and every time the space bar is pressed, a new Ball object is created and added to the ArrayList. Is this the most efficient way of doing things? I don't specify the size of the ArrayList at the beginning, so is it inefficient to have to allocate new space in the array every time the user wants a new ball, even if the ball count gets up into the hundreds? Is there another class I could use to handle this in a more efficient manner?
Thanks!
EDIT:
Sorry, I should have been more clear. I iterate through the balls every 30 milliseconds, using nested for loops to see if they are intersecting with each other. I do access one ball the most often (the ball which the user can control with the arrow keys, another feature of the game), but the user can choose to switch control balls. Balls are never removed. So, I am performing some fairly complex calculations (I use my own vector class to move them off of each other every time there is a collision) on the balls very often.
Measure it and find out! In all seriousness, oftentimes the best way to get answers to these questions is to set up a benchmark and swap in different collection types.
I can tell you that it won't allocate new space every time you add a new item to the ArrayList. Extra space is allocated so that it has room to grow.
LinkedList is another List option. It is super cheap to add items, but random access (list.get(10)) is expensive. Sets could also be good if you don't need ordered access (though there are ordered sets, too), and you want a Map implementation if you're accessing them by some sort of key/id. It really all depends on how you're using the collection.
Update based on added details
It sounds like you are mostly doing sequential reads through the entire list. In that scenario, a LinkedList is probably your best choice. Though again, if you only expose the List interface to the rest of your code (or even a more general Collection), you can easily swap in different implementations and actually measure the difference.
ArrayList is a highly optimized and very efficient wrapper on top of a plain Java array. A small timing overhead comes from copying array elements, which happens whenever the allocated size is less than the required number of elements. When you grow the list to a few hundred items, the copying will happen fewer than ten times, so the price you pay for not knowing the size in advance is very small. You can further reduce that overhead by suggesting an initial size for your ArrayList.
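For example, something along these lines (the Ball class and the capacity of 500 are placeholders for your own values) avoids any backing-array copies while the list grows into the hundreds:

    import java.util.ArrayList;
    import java.util.List;

    public class PresizeDemo {
        static class Ball { }   // stands in for the question's ball class

        public static void main(String[] args) {
            // Suggesting an initial capacity means the backing array is never
            // copied while the list grows toward a few hundred elements.
            List<Ball> balls = new ArrayList<>(500);
            for (int i = 0; i < 300; i++) {
                balls.add(new Ball());
            }
            System.out.println("stored " + balls.size() + " balls");
        }
    }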
Removing from the middle of the ArrayList does take linear time. If you plan to remove items and/or insert them in the middle of the list frequently, this may become an issue. Note, however, that the overhead is not going to be worse than that for a plain array.
I iterate through the balls every 30 milliseconds, using nested for loops to see if they are intersecting with each other.
This does not have much to do with the collection in which the balls are stored. You could use a spatial index to improve the speed of finding intersections.
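If you want to try that, here is a very simplified sketch of a uniform-grid index (the names are mine; it ignores ball radii and neighbouring cells, and in a real game you would rebuild or update it every frame):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Each point is bucketed by the grid cell its centre falls in, so collision
    // checks only compare balls that share a cell instead of all n^2 pairs.
    public class UniformGrid {
        private final double cellSize;
        private final Map<Long, List<double[]>> cells = new HashMap<>();

        public UniformGrid(double cellSize) {
            this.cellSize = cellSize;
        }

        private long key(double x, double y) {
            long cx = (long) Math.floor(x / cellSize);
            long cy = (long) Math.floor(y / cellSize);
            return (cx << 32) ^ (cy & 0xffffffffL);   // pack both cell coordinates
        }

        public void insert(double x, double y) {
            cells.computeIfAbsent(key(x, y), k -> new ArrayList<>())
                 .add(new double[] { x, y });
        }

        public List<double[]> candidatesNear(double x, double y) {
            // A full implementation would also look at the 8 neighbouring cells.
            return cells.getOrDefault(key(x, y), new ArrayList<>());
        }
    }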
Regarding ArrayList in Java: the complexity of removing from the end and of adding one element is amortized O(1). In other words, it's efficient in most cases (in some rare cases an individual operation will be expensive).
But you should think more carefully about your design before choosing your data structure.
How many objects will typically be in your collection? If it's small, you are free to choose any data structure you find easy to work with; it will cost your code almost no performance.
If you often need to find one ball among all of your balls, another data structure such as HashMap or HashSet would be better.
Or, if you often delete from the middle of your list, maybe LinkedList will be the appropriate choice :)
I'd recommend working out the way in which you need to access the balls, and picking an appropriate interface (not implementation), e.g. if you're only accessing them sequentially, use a List; if you need to look up a ball by ID, think of a Map. The interface should match your requirements in terms of functionality, not in terms of speed/efficiency.
Then pick an implementation, e.g. HashMap or TreeMap, and write your code.
Afterwards, profile it: is your code inefficient in the ball-access code? If so, try to optimise by switching to an alternative implementation that's more appropriate to your needs.
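For example (a lookup of balls by a made-up integer id; the names are only illustrative), the declared type is the interface, so only the right-hand side changes if profiling later points you at a different implementation:

    import java.util.HashMap;
    import java.util.Map;

    public class BallLookupDemo {
        public static void main(String[] args) {
            // Code against Map; swap in new TreeMap<>() later without touching callers.
            Map<Integer, String> ballsById = new HashMap<>();
            ballsById.put(1, "red ball");
            ballsById.put(2, "blue ball");
            System.out.println(ballsById.get(2));
        }
    }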

Most efficient way to keep a collection in Java?

Currently, I have a LinkedList which stores a custom Node class. The Nodes are currently removed in order and evaluated, which generally adds more Nodes back into the LinkedList, treating it like a Queue.
But in reality I don't care about maintaining the order of the Nodes because the order they are being added or removed doesn't matter. You can remove the 1st, 54th, or 1032nd Node from the List, it doesn't matter. All that matters is the Nodes are being processed quickly, which means one is removed (at random), mutated, then added back along with several variations of it (once again the order doesn't matter).
Since I haven't been able to find a Java Bag implementation, what is the most efficient way to maintain this type of collection?
PS: Out of laziness I have avoided using arrays, because the collection of nodes could theoretically range from 1 node to 3^64 nodes in size, though it's more likely to stay under a million.
The Java HashSet or TreeSet types might be good here, since they represent collections that don't maintain insertion order and support quick insertion and deletion of elements. That said, you can't possibly hold 3^64 values in memory, since that's approximately 3.43 × 10^30, a number vastly bigger than any amount of RAM that I know of can hold.
EDIT: Based on the described use case (support efficient insertion and removal of random elements), you might want to adopt the approach described in this older question for building a data structure that does just that. Intuitively, you would use an ArrayList, then remove elements by swapping them to the end of the ArrayList and removing them. This gives O(1) insertion and O(1) removal with extremely low overhead.
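Here is a small sketch of that swap-and-remove idea (the class name is mine, and it assumes you only call removeRandom() when the bag is non-empty):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    // "Bag" backed by an ArrayList: removal of a random element is O(1) because
    // the removed slot is overwritten with the last element, so nothing shifts.
    public class ArrayBag<E> {
        private final List<E> items = new ArrayList<>();
        private final Random random = new Random();

        public void add(E item) {
            items.add(item);                      // amortized O(1) append
        }

        public E removeRandom() {
            int i = random.nextInt(items.size()); // caller must ensure non-empty
            E removed = items.get(i);
            int last = items.size() - 1;
            items.set(i, items.get(last));        // move the tail into the gap
            items.remove(last);                   // removing the tail never shifts
            return removed;
        }

        public boolean isEmpty() {
            return items.isEmpty();
        }

        public int size() {
            return items.size();
        }
    }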
Hope this helps!

Performance of Collection class in Java

All,
I have been going through a lot of sites that post about the performance of various Collection classes for various actions, e.g. adding an element, searching and deleting. But I also notice that all of them report different environments in which the tests were conducted, i.e. OS, memory, running threads, etc.
My question is: is there any site/material that provides the same performance information measured on a neutral test environment, i.e. where the configuration is not an issue or a catalyst for poor performance of any specific data structure?
[Updated]: For example, HashSet and LinkedHashSet both have a complexity of O(1) for inserting an element. However, Bruce Eckel's test claims that insertion is going to take more time for LinkedHashSet than for HashSet [http://www.artima.com/weblogs/viewpost.jsp?thread=122295]. So should I still go by the Big-O notation?
Here are my recommendations:
First of all, don't optimize :) Not that I am telling you to design crap software, but just to focus on design and code quality more than premature optimization. Assuming you've done that, and now you really need to worry about which collection is best beyond purely conceptual reasons, let's move on to point 2
Really, don't optimize yet (roughly stolen from M. A. Jackson)
Fine. So your problem is that even though you have theoretical time complexity formulas for best cases, worst cases and average cases, you've noticed that people say different things and that practical settings are a very different thing from theory. So run your own benchmarks! You can only read so much, and while you do that your code doesn't write itself. Once you're done with the theory, write your own benchmark - for your real-life application, not some irrelevant mini-application for testing purposes - and see what actually happens to your software and why. Then pick the best algorithm. It's empirical, it could be regarded as a waste of time, but it's the only way that actually works flawlessly (until you reach the next point).
Now that you've done that, you have the fastest app ever. Until the next update of the JVM. Or of some underlying component of the operating system your particular performance bottleneck depends on. Guess what? Maybe your clients have different ones. Here comes the fun part: you need to be sure that your benchmark is valid for others or in most cases (or have fun writing code for the different cases). You need to collect data from users. LOTS. And then you need to do that over and over again to see what happens and whether it still holds true. And then re-write your code accordingly, over and over again. (The - now terminated - Engineering Windows 7 blog is actually a nice example of how user data collection helps to make educated decisions to improve user experience.)
Or you can... you know... NOT optimize. Platforms and compilers will change, but a good design should - on average - perform well enough.
Other things you can also do:
Have a look at the JVM's source code. It's very educational and you'll discover a whole herd of hidden things (I'm not saying that you have to use them...)
See that other thing on your TODO list that you need to work on? Yes, the one near the top that you always skip because it's too hard or not fun enough. That one right there. Well, get to it and leave the optimization thingy alone: it's the evil child of a Pandora's box and a Moebius band. You'll never get out of it, and you'll deeply regret having tried to have your way with it.
That being said, I don't know why you need the performance boost so maybe you have a very valid reason.
And I am not saying that picking the right collection doesn't matter. Just that once you know which one to pick for a particular problem, and you've looked at the alternatives, you've done your job without having to feel guilty. Collections usually have a semantic meaning, and as long as you respect it you'll be fine.
In my opinion, all you need to know about a data structure is the Big-O of the operations on it, not subjective measures from different architectures. Different collections serve different purposes.
Maps are dictionaries
Sets assert uniqueness
Lists provide grouping and preserve iteration order
Trees provide cheap ordering and quick searches on dynamically changing contents that require constant ordering
Edited to include bwawok's statement on the use case of tree structures
Update
From the javadoc on LinkedHashSet
Hash table and linked list implementation of the Set interface, with predictable iteration order.
...
Performance is likely to be just slightly below that of HashSet, due to the added expense of maintaining the linked list, with one exception: Iteration over a LinkedHashSet requires time proportional to the size of the set, regardless of its capacity. Iteration over a HashSet is likely to be more expensive, requiring time proportional to its capacity.
Now we have moved from the very general case of choosing an appropriate data-structure interface to the more specific case of which implementation to use. However, we still ultimately arrive at the conclusion that specific implementations are well suited for specific applications based on the unique, subtle invariant offered by each implementation.
What do you need to know about them, and why? The reason that benchmarks show a given JDK and hardware setup is so that they could (in theory) be reproduced. What you should get from benchmarks is an idea of how things will work. For an ABSOLUTE number, you will need to run it vs your own code doing your own thing.
The most important thing to know is the Big O runtime of various collections. Knowing that getting an element out of an unsorted ArrayList is O(n), but getting it out of a HashMap is O(1) is HUGE.
If you are already using the correct collection for a given job, you are 90% of the way there. The times when you need to worry about how fast you can, say, get items out of a HashMap should be pretty darn rare.
Once you leave single-threaded land and move into multi-threaded land, you will need to start worrying about things like ConcurrentHashMap vs Collections.synchronizedMap. Until you are multi-threaded, you can just not worry about this kind of stuff and focus on which collection to use for which purpose.
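For illustration (the map contents are placeholders), the two options mentioned above look like this:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class ThreadSafeMaps {
        public static void main(String[] args) {
            // Allows concurrent readers and writers without one global lock.
            Map<String, Integer> concurrent = new ConcurrentHashMap<>();

            // Wraps every call to the underlying HashMap in a single lock.
            Map<String, Integer> locked = Collections.synchronizedMap(new HashMap<>());

            concurrent.put("hits", 1);
            locked.put("hits", 1);
        }
    }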
Update to HashSet vs LinkedHashSet
I haven't ever found a use case where I needed a LinkedHashSet (because if I care about order I tend to have a List; if I care about O(1) gets, I tend to use a HashSet). Realistically, most code will use ArrayList, HashMap, or HashSet. If you need anything else, you are in an "edge" case.
The different collection classes have different big-O performances, but all that tells you is how they scale as they get large. If your set is big enough the one with O(1) will outperform the one with O(N) or O(logN), but there's no way to tell what value of N is the break-even point, except by experiment.
Generally, I just use the simplest possible thing, and then, if it becomes a "bottleneck" - as indicated by operations on that data structure taking a significant percentage of the time - I switch to something with a better big-O rating. Quite often, either the number of items in the collection never comes near the break-even point, or there's another simple way to resolve the performance problem.
Both HashSet and LinkedHashSet have O(1) performance. The same goes for HashMap and LinkedHashMap (in fact, the Set classes are implemented on top of the corresponding Map classes). This only tells you how these algorithms scale, not how they actually perform. In this case, LinkedHashSet does all the same work as HashSet but also always has to update a previous and next pointer to maintain the order. This means that the constant (an important value when talking about actual algorithm performance) for HashSet is lower than for LinkedHashSet.
Thus, since these two have the same Big-O, they scale the same essentially - that is, as n changes, both have the same performance change and with O(1) the performance, on average, does not change.
So now your choice is based on functionality and your requirements (which really should be what you consider first anyway). If you only need fast add and get operations, you should always pick HashSet. If you also need consistent ordering - such as last accessed or insertion order - then you must also use the Linked... version of the class.
I have used the "linked" class in production applications - well, LinkedHashMap. I used it in one case for a symbol-table-like structure, so I wanted quick access to the symbols and related information. But I also wanted to output the information, in at least one context, in the order that the user defined those symbols (insertion order). This makes the output friendlier for the user, since they can find things in the same order that they were defined.
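As an illustration of that use case (the symbols and descriptions below are invented), a LinkedHashMap gives O(1) lookup while iteration reproduces the user's definition order:

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class SymbolTableDemo {
        public static void main(String[] args) {
            Map<String, String> symbols = new LinkedHashMap<>();
            symbols.put("velocity", "double, metres per second");
            symbols.put("mass", "double, kilograms");
            symbols.put("label", "String, display name");

            System.out.println(symbols.get("mass"));   // O(1) lookup by name

            // Iteration reproduces insertion order: velocity, mass, label.
            for (Map.Entry<String, String> e : symbols.entrySet()) {
                System.out.println(e.getKey() + " -> " + e.getValue());
            }
        }
    }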
If I had to sort millions of rows I'd try to find a different way. Maybe I could improve my SQL, improve my algorithm, or perhaps write the elements to disk and use the operating system's sort command.
I've never had a case where collections were the cause of my performance issues.
I ran my own experiment with HashSets and LinkedHashSets. For add() and contains() the running time is O(1), not taking heavy collisions into account. In the add() method for my linked hash set, I put the object into a user-created hash table, which is O(1), and then put the object into a separate linked list to account for order. So to remove an element from this linked hash set, you must find the element in the hash table and then search through the linked list that holds the order. The running time is therefore O(1) + O(n), which is O(n) for remove().

Data structures: Which should I use for these conditions?

This shouldn't be a difficult question, but I'd just like to bounce it off someone before I continue. I simply need to decide what data structure to use based on these expected activities:
Will need to frequently iterate through in sorted order (starting at the head).
Will need to remove/restore arbitrary elements from the/a sorted view.
Later I'll be frequently resorting the data and working with multiple sorted views.
Also later I'll be frequently changing the position of elements within their sorted views.
This is in Java, by the way.
My best guess is that I'll either be rolling some custom Linked Hash Set (to arrange the links in sorted order) or possibly just using a Tree Set. But I'm still not completely sure yet. Recommendations?
Edit: I guess because of the arbitrary remove/restore, I should probably stick with a Tree Set, right?
Actually, not necessarily. Hmmm...
In theory, I'd say the right data structure is a multiway tree - preferably something like a B+ tree. Traditionally this is a disk-based data structure, but modern main memory has a lot of similar characteristics due to layers of cache and virtual memory.
In-order iteration of a B+ tree is very efficient because (1) you only iterate through the linked-list of leaf nodes - branch nodes aren't needed, and (2) you get extremely good locality.
Finding, removing and inserting arbitrary elements is log(n) as with any balanced tree, though with different constant factors.
Resorting within the tree is mostly a matter of choosing an algorithm that gives good performance when operating on a linked list of blocks (the leaf nodes), minimising the need to touch the branch nodes - variants of quicksort or mergesort seem like likely candidates. Once the items are sorted in the leaf nodes, just propagate the summary information back up through the branch nodes.
BUT - pragmatically, this is only something you'd do if you're very sure that you need it. Odds are good that you're better off using some standard container. Algorithm/data structure optimisation is the best kind of optimisation, but it can still be premature.
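For what it's worth, here is a minimal sketch of the "standard container" route using TreeSet (the String elements and comparators are just placeholders): it covers in-order iteration from the head and log(n) remove/restore, with a separate set per sorted view:

    import java.util.Comparator;
    import java.util.TreeSet;

    public class SortedViewDemo {
        public static void main(String[] args) {
            TreeSet<String> view = new TreeSet<>(Comparator.naturalOrder());
            view.add("delta");
            view.add("alpha");
            view.add("charlie");

            // Frequent in-order iteration, starting at the head.
            for (String s : view) {
                System.out.println(s);        // alpha, charlie, delta
            }

            // Remove and later restore an arbitrary element: O(log n) each.
            view.remove("charlie");
            view.add("charlie");

            // A second "sorted view" is just another TreeSet with another Comparator.
            TreeSet<String> reversed = new TreeSet<>(Comparator.reverseOrder());
            reversed.addAll(view);
        }
    }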
A standard LinkedHashSet, or LinkedHashMultiset from Google Collections if you want your data structure to store non-unique values.
