Efficient update of sorted JavaFX ObservableList - java

I have a Java ObservableList with thousands of entries that receives hundreds of updates a second backing a JavaFX TableView.
The ObservableList is backed by an ArrayList. Arbitrary sort orders can be applied to the list, and an update may change the sort position of a single entity in the list. I have performance issues if I try to perform a sort after each update, so currently I have a background task that performs a sort every second. However, I'd like to try to sort in real time if possible.
Assuming that the list is already sorted and I know the index of the element to change, is there a more efficient way to update the index of the element than calling sort on the list again?
I've already determined I can use Collections.binarySearch() to efficiently find the index of the element to update. Is there also a way I can efficiently find the index the updated element needs to move to and shift the ArrayList so it remains in order?
I also need to handle add and remove operations, but those are far less common.

Regarding your answer, FXCollections.sort() should be even faster, because it handles FX properties better and is written specifically for ObservableLists.

I would use a TreeSet. It can update the order in O(log n) time, whereas inserting into a sorted ArrayList costs O(n) per entry because of element shifting.

A few suggestions when dealing with sorting on a JavaFX ObservableList/TableView combo:
Ensure your model class includes Property accessors.
Due to a quirk in the JavaFX 2.2 implementation (not present in JavaFX 8+), TableView is far less efficient when dealing with large data models that do not have property accessors than when dealing with models that do include them. See JavaFx tableview sort is really slow how to improve sort speed as in java swing for more details. A minimal model class with property accessors is sketched below.
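This is what such a model class might look like (the Trade class and its field are made up for illustration):

    import javafx.beans.property.SimpleStringProperty;
    import javafx.beans.property.StringProperty;

    public class Trade {
        private final StringProperty symbol = new SimpleStringProperty(this, "symbol");

        public final String getSymbol()              { return symbol.get(); }
        public final void setSymbol(String value)    { symbol.set(value); }
        // The property accessor that TableView's PropertyValueFactory looks for:
        public final StringProperty symbolProperty() { return symbol; }
    }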
Perform bulk changes on the ObservableList.
Each time you modify an ObservableList that is being observed, the list change listeners on the list are fired to communicate the permutations of the change to the observers. By reducing the number of modifications you make on the list, you can cut down on the number of change events which occur and hence on the overhead of observer notification and processing.
An example technique for this might be to keep a mirror copy of the list data in a standard non-observable list, sort that data, then set that sorted data into the observable list in a single operation.
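A sketch of that mirror-copy technique (assuming a comparator for the current sort order; the variable names are illustrative):

    List<Trade> mirror = new ArrayList<>(observableList); // plain, non-observable copy
    Collections.sort(mirror, comparator);                 // sort off to the side
    observableList.setAll(mirror);                        // one change event instead of many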
To avoid premature optimization issues, only do this sort of optimization if the operation is initially slow and the optimization itself provides a significant measurable improvement.
Don't update the ObservableList more often than necessary.
JavaFX display framerate is capped, by default, at 60fps. Updating visible components more often than once a pulse (a frame render trigger) is unnecessary, so batch up all of your changes for each pulse.
For example, if you have a new record coming in every millisecond, collate all records that come in every 20 milliseconds and apply those changes all at once.
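A sketch of that batching, assuming producers push records from background threads (the 20 ms interval and the Trade type are illustrative):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import javafx.animation.KeyFrame;
    import javafx.animation.Timeline;
    import javafx.util.Duration;

    // Producers enqueue records from any thread:
    final ConcurrentLinkedQueue<Trade> pending = new ConcurrentLinkedQueue<>();

    // Drain the queue on the FX thread every 20 ms and apply one bulk change:
    Timeline drainer = new Timeline(new KeyFrame(Duration.millis(20), event -> {
        List<Trade> batch = new ArrayList<>();
        Trade t;
        while ((t = pending.poll()) != null) {
            batch.add(t);
        }
        if (!batch.isEmpty()) {
            observableList.addAll(batch); // one change event for the whole batch
        }
    }));
    drainer.setCycleCount(Timeline.INDEFINITE);
    drainer.play();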
To avoid premature optimization issues, only do this sort of optimization if the operation is initially slow and the optimization itself provides a significant measurable improvement.
Java 8 contains some new classes to assist in using sorted content in tables.
I don't really know the details of how the TableView sorting function and SortedList work in Java 8 (a basic usage sketch follows the links below). You can request that Oracle write a tutorial with samples and best practices for the Java 8 TableView sort feature by emailing jfx-docs-feedback_ww@oracle.com
For further reference, see the javadoc:
the sorting section of the JavaFX 8 TableView javadoc.
new SortEvent class.
SortedList class.
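The usual wiring of a SortedList between the data and a TableView looks roughly like this (a sketch; binding the comparator lets the user's column-header clicks drive the list's sort order):

    ObservableList<Trade> data = FXCollections.observableArrayList();
    SortedList<Trade> sorted = new SortedList<>(data);
    // Keep the list's comparator in sync with whichever columns the user sorts by:
    sorted.comparatorProperty().bind(table.comparatorProperty());
    table.setItems(sorted);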

What is not quite clear is whether you need the list to be sorted the whole time. If you sort it only to retrieve and update your entries quickly, you can do that faster using a HashMap: create a HashMap<YourClass, YourClass> and implement proper hashCode() and equals() methods over the key fields of the class. If you only need to output a sorted list occasionally, also implement the Comparable<YourClass> interface and just create a TreeSet<YourClass>( map.keySet() ); that creates a sorted representation while the data in your HashMap stays in place. If you need it sorted at all times, consider using a TreeMap<YourClass, YourClass> instead of a HashMap. Maps are easier to work with than Sets because they provide a way to retrieve the objects.
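A short sketch of the "sorted snapshot on demand" variant (assuming YourClass implements hashCode()/equals() and Comparable):

    Map<YourClass, YourClass> index = new HashMap<>();
    index.put(entry, entry);                 // O(1) lookup and update by key fields

    // Occasionally, when a sorted view is needed:
    Set<YourClass> sortedView = new TreeSet<>(index.keySet()); // O(n log n) snapshot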

After some research, I concluded that Collections.sort() is pretty fast, even when only one item is out of order. I haven't found a more efficient way than to update the item in the list and just call sort. I can't use a TreeSet, since the TableView relies on the List interface and I'd have to rebuild the TreeSet every time the sort order is changed.
I found that I could update at 60 FPS by using a Timer or KeyFrame and still have reasonable performance. I haven't found a better solution without upgrading to JavaFX 8.

You could pull the element out of the ArrayList and insert the updated element back in sorted order.
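A sketch of that remove-and-reinsert approach with Collections.binarySearch (the setter is a hypothetical stand-in for whatever update changes the sort key):

    // Locate and remove the element while the list is still sorted by its old key:
    int oldIndex = Collections.binarySearch(list, element, comparator);
    list.remove(oldIndex);

    element.setPrice(newPrice); // hypothetical update that changes the sort key

    // On a miss, binarySearch returns (-(insertionPoint) - 1):
    int pos = Collections.binarySearch(list, element, comparator);
    int insertionPoint = pos >= 0 ? pos : -pos - 1;
    list.add(insertionPoint, element);

The remove and add still shift elements, so each update remains O(n) in the worst case, but with a far smaller constant than re-sorting the whole list.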

Related

App Engine + Cloud Datastore performance: order in query or in memory?

A question about Google App Engine + Datastore. We have some queries with several equality filters. For these we don't need to maintain any composite index; Datastore maintains the built-in indexes automatically, as described here.
The built-in indexes can handle simple queries, including all entities of a given kind, filters and sort orders on a single property, and equality filters on any number of properties.
However, we need the result to be sorted on one of these properties. I can do that (using Objectify) with .sort("prop") on the datastore query, which requires me to add a composite index and will make for a huge index once deployed. The alternative I see is retrieving the unordered list (max 100 entities in the resultset) and then sorting them in-memory.
Since our entity implements Comparable, I can simply use Collections.sort(entities).
My question is simple: which one is desired? And even if the datastore composite index would be more performant, is it worth creating all those indexes?
Thanks!
There is no right or wrong approach; the solution depends on your requirements. There are several factors to consider:
Extra indexes take space and cost more both in storage costs and in write costs - you have to update every index on every update of an entity.
Sort on property is faster, but with a small result set the difference is negligible.
You can store sorted results in Memcache and avoid sorting them in every request.
You will not be able to use pagination without a composite index, i.e. you will have to retrieve all results every time for in-memory sort.
It depends on your definition of "desired". IMO, if you know the result set is a "manageable" size, I would just do an in-memory sort. Adding lots of indexes will have a cost impact; you can do a cost analysis first to check it.
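For what it's worth, a minimal sketch of the in-memory variant (assuming Objectify and an entity class that implements Comparable; the entity and property names are illustrative):

    List<MyEntity> results = ObjectifyService.ofy().load().type(MyEntity.class)
            .filter("propA", a)
            .filter("propB", b)
            .limit(100)
            .list();
    Collections.sort(results); // in-memory sort; no composite index needed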

Efficiently update an element in a DelayQueue

I am facing a similar problem as the author in:
DelayQueue with higher speed remove()?
The problem:
I need to process continuously incoming data and check whether the data has been seen within a certain timeframe before. Therefore I calculate a unique ID for the incoming data and add the data, indexed by that ID, to a map. At the same time I store the ID and the timeout timestamp in a PriorityQueue, giving me the ability to efficiently check for the next ID to time out. Unfortunately, if the data comes in again before the specified timeout, I need to update the timeout stored in the PriorityQueue. So far I have just removed the old ID and re-added it along with the new timeout. This works well, except for the time-consuming remove() call once my PriorityQueue grows past 300k elements.
Possible Solution:
I just thought about using a DelayQueue instead, which would make it easier to wait for the first element to time out. Unfortunately, I have not found an efficient way to update the timeout of an element stored in such a DelayQueue without facing the same problem as with the PriorityQueue: the remove() method!
Any ideas on how to solve this problem in an efficient way even for a huge Queue?
This actually sounds a lot like a Guava Cache, which is a concurrent on-heap cache supporting "expire this long after the most recent lookup for this entry." It might be simplest just to reuse that, if you can use third-party libraries.
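A sketch of that with Guava (expireAfterAccess resets the clock on every read or write of an entry; the key and value types are illustrative):

    import java.util.concurrent.TimeUnit;
    import com.google.common.cache.Cache;
    import com.google.common.cache.CacheBuilder;

    Cache<String, Data> seen = CacheBuilder.newBuilder()
            .expireAfterAccess(30, TimeUnit.SECONDS) // your timeout window
            .build();

    // One lookup both answers "seen recently?" and refreshes the entry's timeout:
    boolean seenBefore = seen.getIfPresent(id) != null;
    seen.put(id, data);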
Failing that, the approach that implementation uses looks something like this: it has a hash table, so entries can be efficiently looked up by their key, but the entries are also in a concurrent, custom linked list -- you can't do this with the built-in libraries. The linked list is in the order of "least recently accessed first." When an entry is accessed, it gets moved to the end of the linked list. Every so often, you look at the beginning of the list -- where all the least recently accessed entries live -- and delete the ones that are older than your threshold.
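If a single-threaded version is enough, the JDK's LinkedHashMap can give you the same least-recently-accessed-first ordering via its access-order mode; a sketch:

    import java.util.Iterator;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // accessOrder = true keeps iteration order as least-recently-accessed first.
    Map<String, Long> lastSeen = new LinkedHashMap<>(16, 0.75f, true);

    void touch(String id, long now, long timeoutMillis) {
        lastSeen.put(id, now); // moves (or keeps) the entry at the most-recent end
        // Expire from the least-recently-accessed end:
        Iterator<Map.Entry<String, Long>> it = lastSeen.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (now - e.getValue() > timeoutMillis) {
                it.remove();
            } else {
                break; // everything after this entry is more recent
            }
        }
    }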

Is there a sorted data structure in the JDK that supports removal in constant time?

I have a task that constantly adds items to a TreeMap, and it also removes just-inserted items from the TreeMap when some condition is met on an item.
I know the TreeMap's complexity is O(log n) for both insert and delete. Since the task already holds all of a deleted item's information (key, value, and possibly position), a data structure that supports deleting an element by its position could make deletion constant-time and roughly halve my task's computing time.
Is there any data structure in the JDK that supports this feature (delete an entry by position) with the other operations still O(log n)?
You aren't entirely clear on your requirements, but if the use case is to revert the last n inserts then a persistent tree might help. At each stage the tree is immutable and inserting creates a new tree with as much in common with the previous state as possible, so by keeping a reference to the previous state you can remove the inserted element in constant time.
You'll need to google for Java implementations; they aren't in the JDK.
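To illustrate the idea, a toy persistent BST: insert copies only the path from the root down (O(log n) for a balanced tree) and shares every other node, so keeping the old root around lets you revert the insert in constant time:

    final class PTree {
        final int key;
        final PTree left, right;

        PTree(int key, PTree left, PTree right) {
            this.key = key; this.left = left; this.right = right;
        }

        static PTree insert(PTree root, int key) {
            if (root == null) return new PTree(key, null, null);
            if (key < root.key) return new PTree(root.key, insert(root.left, key), root.right);
            if (key > root.key) return new PTree(root.key, root.left, insert(root.right, key));
            return root; // key already present
        }
    }

    // Usage: each insert returns a new version; old versions stay valid.
    PTree v1 = PTree.insert(null, 5);
    PTree v2 = PTree.insert(v1, 3); // v1 is untouched
    PTree reverted = v1;            // "delete the just-inserted 3" in O(1)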

Search algorithm (with a sort algorithm already implemented)

I'm working on a Java application and I have some doubts concerning performance.
I have a PriorityQueue which guarantees that the element removed is the one with the greatest priority. The PriorityQueue holds instances of class Event (which implements the Comparable interface). Each Event is associated with an Entity.
The size of that PriorityQueue can be huge, and very frequently I will have to remove events associated with an entity.
Right now I'm using an iterator to walk the whole PriorityQueue. However, I'm finding it heavy, and I wonder if there are better alternatives to search for and remove the events associated with an entity "xpto".
Any suggestions?
Thanks!
There are a few options:
Could you use separate queues for each entity? So when you get an event for "xpto" you put it into the XptoPriorityQueue? This will reduce the size of each queue, but could also lead to some other management issues.
Are you removing events for a specific entity to process them sooner? If so, then you should simply give those entities a higher priority and bump them to the top of the queue.
Are you removing events for a specific entity to remove them from the queue and ignore them? If so, then they should either never get into the queue or should get a lower priority.
A couple of ideas:
Implement your priority queue using a treap ordered by the entity's key. If the keys are randomly distributed then you should be able to remove elements in O(log n) time.
Maintain a separate mapping of Entity to Events currently in the queue. Instead of actually removing events from the queue immediately, just flip a bit on the Event object indicating it should be ignored when it reaches the front of the queue.
Since remove() on a PriorityQueue is a linear-time operation, you are getting O(N²) performance.
If you subclass PriorityQueue and create a new method (named removeIf, perhaps) that combines the iterator and remove steps, then you could reduce this to O(N log N).
I'd suggest you make a class composed of a PriorityQueue and a Set, with the following operations (a sketch follows the list):
To add an element, remove it from the Set and, if it wasn't there, add it to the PriorityQueue.
To remove an element, add it to the Set.
To unqueue an element, poll elements from the PriorityQueue until the polled element is not present in the Set, removing any polled elements from the Set as you go.
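A sketch of that composite (this is lazy deletion: the Set holds tombstones that get discarded as they surface at the front of the queue):

    import java.util.HashSet;
    import java.util.PriorityQueue;
    import java.util.Set;

    class LazyDeletePriorityQueue<E> {
        private final PriorityQueue<E> queue = new PriorityQueue<>();
        private final Set<E> removed = new HashSet<>();

        void add(E e) {
            // If e was marked removed, just un-mark it; otherwise enqueue it.
            if (!removed.remove(e)) {
                queue.add(e);
            }
        }

        void remove(E e) {
            removed.add(e); // O(1) tombstone instead of O(n) queue.remove(e)
        }

        E poll() {
            E e;
            // Skip over tombstoned elements as they reach the front.
            while ((e = queue.poll()) != null && removed.remove(e)) { }
            return e;
        }
    }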
It sounds like you need to implement a souped-up priority queue. In particular, you need O(logN) removal time. You can do this by keeping two data structures, an array of Events which is basically the binary heap storing your priority queue, and a HashMap from Event to the offset of that Event in the array (or maybe from Entity to a list of offsets of Events in that array).
You can then do normal heap operations as you would for a priority queue, you just need to update the hash map mappings every time an Event is moved. To remove an Event, look up the index of that event in the hash table and do a remove operation on the heap starting at that index.
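A sketch of such an indexed binary min-heap; the HashMap finds an element's slot in O(1), after which the usual sift operations restore the heap in O(log n):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    class IndexedHeap<E extends Comparable<E>> {
        private final List<E> heap = new ArrayList<>();
        private final Map<E, Integer> index = new HashMap<>(); // element -> slot

        void add(E e) {
            heap.add(e);
            index.put(e, heap.size() - 1);
            siftUp(heap.size() - 1);
        }

        boolean remove(E e) {
            Integer i = index.remove(e);
            if (i == null) return false;
            E last = heap.remove(heap.size() - 1);
            if (i < heap.size()) {   // the removed element wasn't in the last slot
                heap.set(i, last);
                index.put(last, i);
                siftDown(i);
                siftUp(i);           // the moved element may need to go either way
            }
            return true;
        }

        private void siftUp(int i) {
            while (i > 0) {
                int parent = (i - 1) / 2;
                if (heap.get(i).compareTo(heap.get(parent)) >= 0) break;
                swap(i, parent);
                i = parent;
            }
        }

        private void siftDown(int i) {
            for (;;) {
                int child = 2 * i + 1;
                if (child >= heap.size()) break;
                if (child + 1 < heap.size()
                        && heap.get(child + 1).compareTo(heap.get(child)) < 0) child++;
                if (heap.get(i).compareTo(heap.get(child)) <= 0) break;
                swap(i, child);
                i = child;
            }
        }

        private void swap(int a, int b) {
            E ea = heap.get(a), eb = heap.get(b);
            heap.set(a, eb);  heap.set(b, ea);
            index.put(eb, a); index.put(ea, b);
        }
    }

A mapping from Entity to its Events would sit alongside this, so removing all of an entity's events becomes a series of O(log n) removals rather than a linear scan.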

Java: versioned data structures?

I have a data structure that is pretty simple (basically a structure containing some arrays and single values), but I need to record the history of the data structure so that I can efficiently get the contents of the data structure at any point in time.
Is there a relatively straightforward way to do this?
The best way I can think of is to encapsulate the whole data structure with something that handles all the mutating operations by storing data in functional data structures, and then, for each mutating operation, to cache a copy of the data structure in a map indexed by time ordering (e.g. a TreeMap with real time as keys, or a HashMap keyed by a counter of mutation operations, combined with one or more indexes stored in TreeMaps mapping real time / tick count / etc. to mutation operations).
Any suggestions?
edit: In one case I already have the history as a series of transactions (this is reading items from a data file) so I can replay them, but this takes O(n) steps (n = # of transactions) every time I need to access the data. I'm looking for alternatives.
You are correct. Storing the data in a purely functional data structure is the way to go. Supporting anything moderately complicated using do/undo actions relies on the programmer being aware of all the side effects of every operation, which does not scale and breaks encapsulation.
You should use some form of persistent data structure that is immutable and is based on structural sharing (i.e. so that the parts of the data structure which do not change between versions only get stored once).
I created an open source Java library of such data structures here:
http://code.google.com/p/mikeralib/source/browse/#svn/trunk/Mikera/src/mikera/persistent
These were somewhat inspired by Clojure's persistent data structures, which might also be suitable for your purposes (they are also written in Java).
If you are only storing a little bit of data and don't have a lot of changes then storing each version is fine.
If you don't need to access the old version of the data too often, I wouldn't cache each one, I'd just make it so you could rebuild to it.
You could do this by saving mutations as transactions and replaying the transactions (with the ability to stop at any point).
So you start with an empty data structure and you might get an "Add" instruction followed by a "Change" and another "add" and then maybe a "Delete". Each of these objects would contain a COPY (not a pointer to the same object) of the thing being added or changed.
You would append each operation to a list while at the same time mutating your collection.
If you find that you need a version at an older timestamp, start with a new empty collection, replay until you hit that timestamp then stop and you have the collection as it would be at that time.
If this was a very long-running application and you often needed to access items near the end, you could write an "Undo" for each add/change/delete operation object and actually mutate the data back and forth.
So imagine you have your data object and this array of mutations, you could easily run up and down the mutation list changing the data object back and forth to any version you want.
You could even contain multiple data objects, just create a new empty one and run it up the mutation array (think of it as a timeline--where each stored mutation would contain a timestamp or some version number) until you get it to the timestamp you want--this way you could have "Milestones" that you could reach instantly--for instance, if you allocated one milestone for each thread you could make the addMutation method synchronized and this data collection would become 100% threadsafe.
Note that if you actually return the data object you should only return a copy of the data--otherwise the next time you mutated that milestone it would mutate the data object you returned.
Hmm, you could also include "rollup" functionality: if you ever decide that you will not need access to the earliest transactions, you could apply them to a "start" structure and then delete them. From then on you copy the start structure to begin from, rather than always starting with an empty data structure.
Man, this is an awesome pattern--now I want to implement it.
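A sketch of that do/undo mutation log (the Mutation interface and AddMutation are illustrative; Change and Delete mutations would follow the same shape):

    import java.util.ArrayList;
    import java.util.List;

    interface Mutation<T> {
        void apply(List<T> data);
        void undo(List<T> data);
    }

    class AddMutation<T> implements Mutation<T> {
        private final T item; // a copy, not a pointer to shared state
        AddMutation(T item) { this.item = item; }
        public void apply(List<T> data) { data.add(item); }
        public void undo(List<T> data)  { data.remove(data.size() - 1); }
    }

    class VersionedList<T> {
        private final List<T> data = new ArrayList<>();
        private final List<Mutation<T>> log = new ArrayList<>();
        private int current = 0; // number of mutations applied to 'data'

        void record(Mutation<T> m) {
            seek(log.size()); // make sure we're at the newest version first
            m.apply(data);
            log.add(m);
            current++;
        }

        // Run up and down the mutation list to reach any version.
        void seek(int version) {
            while (current < version) log.get(current++).apply(data);
            while (current > version) log.get(--current).undo(data);
        }
    }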
Either do as you already suggested, or have a base class of some sort with subclasses that represent the different changes. Then get the proper class at run-time by passing the version/timestamp/whatever to a factory that hands you back the right one.
Multi-level undo can be based on a model (ie data structure) and a sequence of actions. Each action supports two operations: "do" and "undo". To perform a change on the model you register a new action and "do" it. This allows you to "walk" back and forth in the history, but the state of the model at a specific index cannot be accessed in constant time.
Maybe something like this would be applicable to your situation?
How long will the application be running for?
It seems like you could do what you suggested -- playing the transactions back -- but cache the data structure and list of transactions at particular points in time (every hour or every day?) to ease the pain of having to go through O(n) operations every time you need to rebuild the collection from scratch.
Granted, there is definitely a trade-off between space (that the cache takes up) and the number of operations needed to re-build it, but hopefully you will be able to find a happy medium for it.
