I have a 3-tier EJB application, and I need to create a view in a thick client (a desktop Java application) that shows a very large collection of objects (over 5000 orders). Each object has child properties that are also complex objects, for example:
class Address {
    String value;
    // other properties
}

class Order {
    public String number;
    // This is a collection of complex objects; I need the first and last
    // elements to show their properties in the view.
    public List<Address> getAddresses() { ... }
    // other properties
}
The view is a table of Orders:
Number | FirstAddress | LastAddress | ...
My first attempt was to load the full list of orders (without child properties) and then download child objects on demand when they were needed for display. But when I have 10000 orders and start scrolling quickly, the UI becomes unresponsive.
Then I tried to load all the orders together with all the children that need to be shown in the table, but the UI gets very heavy and slow, presumably because of the memory cost. And it's not a thick client at all, because I download almost all the data from the database.
What is the best practice for solving this task?
Assuming you are using a JTable as the view of a suitable TableModel, query the database using a SwingWorker and publish() the results as they arrive. For simplicity, the sketch below simply fetches random data in blocks of 10. Note that the UI remains responsive as the data accumulates.
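A minimal version of that sketch (the block count, the Thread.sleep, and the random strings are placeholders standing in for a real paged database query):

import java.util.List;
import java.util.Random;
import javax.swing.SwingWorker;
import javax.swing.table.DefaultTableModel;

public class LoadWorker extends SwingWorker<Void, String[]> {

    private static final int BLOCK_SIZE = 10;
    private final DefaultTableModel model;
    private final Random random = new Random();

    public LoadWorker(DefaultTableModel model) {
        this.model = model;
    }

    @Override
    protected Void doInBackground() throws Exception {
        // In a real application this loop would page through the database.
        for (int block = 0; block < 100; block++) {
            for (int i = 0; i < BLOCK_SIZE; i++) {
                publish(new String[] {
                    "Order " + (block * BLOCK_SIZE + i),
                    "Address " + random.nextInt(1000)
                });
            }
            Thread.sleep(100); // simulate query latency
        }
        return null;
    }

    @Override
    protected void process(List<String[]> chunks) {
        // Runs on the event dispatch thread; safe to update the model here.
        for (String[] row : chunks) {
            model.addRow(row);
        }
    }
}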
Follow the Value Object or Data Transfer Object pattern: send only what you really need. Instead of sending a graph of domain objects, create one or more 'stupid' flat objects (containing simple attributes) per view, as sketched below.
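For the order table above, such a flat object might look like this (a sketch; the field names are guesses based on the columns described in the question):

// One row of the Orders view, flattened: no object graph, just the
// values the table actually displays.
public class OrderRow implements java.io.Serializable {
    private String number;        // Order.number
    private String firstAddress;  // first element of Order.getAddresses()
    private String lastAddress;   // last element of Order.getAddresses()

    public String getNumber() { return number; }
    public void setNumber(String number) { this.number = number; }
    public String getFirstAddress() { return firstAddress; }
    public void setFirstAddress(String firstAddress) { this.firstAddress = firstAddress; }
    public String getLastAddress() { return lastAddress; }
    public void setLastAddress(String lastAddress) { this.lastAddress = lastAddress; }
}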
I suggest implementing some sort of pagination; in other words, implement a mechanism for retrieving only a small subset of all your data and showing it chunk by chunk on different pages.
Exactly "how" depends on your approach so far:
you can either use a programming pattern like those already mentioned,
or you can implement it at the DB level, where you query your DB server; i.e. depending on the chosen DBMS, you'll have to write the fetch queries in such a manner that they retrieve only a portion of all the data, for example as sketched below.
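A sketch of the DB-level approach, assuming a table named orders and the LIMIT/OFFSET syntax used by MySQL and PostgreSQL (other databases spell this differently, e.g. ROWNUM or FETCH FIRST):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

// Fetch one page of orders; pageIndex is zero-based.
List<Order> fetchPage(Connection con, int pageIndex, int pageSize) throws Exception {
    String sql = "SELECT number FROM orders ORDER BY number LIMIT ? OFFSET ?";
    try (PreparedStatement ps = con.prepareStatement(sql)) {
        ps.setInt(1, pageSize);
        ps.setInt(2, pageIndex * pageSize);
        try (ResultSet rs = ps.executeQuery()) {
            List<Order> page = new ArrayList<>();
            while (rs.next()) {
                Order order = new Order();           // Order from the question
                order.number = rs.getString("number");
                page.add(order);
            }
            return page;
        }
    }
}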
Hope this helps!
It's advisable to make a proxy object for your list that fetches only a small part of its elements, plus the total count, and can load the other parts of the original list on demand.
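A bare-bones sketch of what that proxy could look like (OrderDao is a made-up data-access interface; a real implementation would also need eviction and thread-safety):

import java.util.AbstractList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

interface OrderDao {
    int countOrders();
    List<Order> fetchOrders(int offset, int limit);
}

// A read-only List view over the database: the total size is known up
// front, and chunks are fetched lazily the first time they are touched.
class LazyOrderList extends AbstractList<Order> {

    private static final int CHUNK_SIZE = 100;
    private final OrderDao dao;
    private final int totalCount;
    private final Map<Integer, List<Order>> chunks = new HashMap<>();

    LazyOrderList(OrderDao dao) {
        this.dao = dao;
        this.totalCount = dao.countOrders(); // one cheap COUNT(*) query
    }

    @Override
    public Order get(int index) {
        int chunkIndex = index / CHUNK_SIZE;
        List<Order> chunk = chunks.computeIfAbsent(chunkIndex,
                c -> dao.fetchOrders(c * CHUNK_SIZE, CHUNK_SIZE));
        return chunk.get(index % CHUNK_SIZE);
    }

    @Override
    public int size() {
        return totalCount;
    }
}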
So I'm porting my Swing Java database application to JavaFX (still a beginner here; I recently just learned the basics of FXML and the MVC pattern, so please bear with me).
I intend to load the data from my existing database to the "students" ObservableList so I can show it on a TableView, but on my original Swing code, I have a search TextField, and when the user clicks on a button or presses Enter, the program:
Executes an SQLite command that searches for specific records, and retrieves the RecordSet.
Creates a DefaultTableModel based on the RecordSet contents
And throws that TableModel to the JTable.
However, JavaFX is a completely different beast (or at least it seems so to me--don't get me wrong, I love JavaFX :D ), so I'm not sure what to do.
So, my question is: do I have to load ALL the students from the database and then use Java code to filter out students that don't fit the search criteria (displaying all students when the search text is blank)? Or do I keep using SQLite for filtering and retrieving records? (That means I need to clear the list and then re-add students every time a search is performed; maybe it will also mess up the bindings? Maybe there will be a speed penalty with this method too? Besides that, it will also reset the currently selected record because I clear the list--basically, bad UI design that will negatively impact usability.)
Depending on the right approach, there are also follow-up questions (sorry, I really can't find the answers to these even after Googling):
If I get ALL students from database and implement a search feature in Java, won't it use up more RAM than it should, because I am storing ALL the database data in RAM, instead of just the ones searched for? I mean, sure, even my lowly laptop has 4GB RAM, but the feeling of using more memory than I should makes me feel somewhat guilty LOL
If I choose to just update the contents of the ObservableList every time a new search is performed, will it mess up the bindings? Do I have to set up the bindings again? How do I clear the contents of the ObservableList before adding the new contents?
I also have the idea of just setting the selected table item to the first record that matches the search string, but I think it would be difficult to use, since only one record can be highlighted per search. Even if we highlighted multiple rows, it'd be difficult to browse all the selected items.
Please give me the proper way, not the "easy" way. This is my first time implementing a pattern (MVC or am I actually doing MVP, I don't know) and I realized how unmaintainable and ugly my previous programs are because I used my own style. This is a relatively big project that I need to support and improve for several years so having clean code and doing stuff the right way should help in maintaining the functionality of this program.
Thank you very much in advance for your help, and I hope I don't come off as a "dumb person who can't even Google" in asking these questions. Please bear with me here.
Basic design tradeoffs
You can, of course, do this either of the ways you describe. The basic tradeoffs are:
If you load everything from the database and filter the table in Java, you use more memory (though not as much as you might think, as explained below).
If you filter in the database and reload every time the user changes the filter, there will be greater latency (delay) in displaying the data, as a new query will be executed against the database, with (usually) the network communication between the database and the application being the biggest bottleneck (though there are others).
Database access and concurrency
In general, you should perform database queries on a background thread (see Using threads to make database requests); if you are frequently making database queries (i.e. filtering via the database), this gets complex and involves frequently disabling controls in the UI while a background task is running.
TableView design and memory management
The JavaFX TableView is a virtualized control. This means that the visual components (cells) are created only for visible elements (plus, perhaps, a small amount of caching). These cells are then reused as the user scrolls around, displaying different "items" as required. The visual components are typically quite memory-consumptive (they have hundreds of properties - colors, font properties, dimensions, layout properties, etc etc - most of which have CSS representations), so limiting the number created saves a lot of memory, and the memory consumption of the visible part of the table view is essentially constant, no matter how many items are in the table's backing list.
General memory consumption computations
The items observable list that forms the table's backing list contains only the data: it is not hard to ballpark-estimate the amount of memory consumed by a list of a given size. Strings use 2 bytes per character, plus a small fixed overhead, doubles use 8 bytes, ints use 4 bytes, etc. If you wrap the fields in JavaFX properties (which is recommended), there will be a few bytes overhead for each; each object has an overhead of ~16 bytes, and references themselves typically use up to 8 bytes. So a typical Student object that stores a few string fields will usually consume of the order of a few hundred bytes in memory. (Of course, if each has an image associated with it, for example, it could be a lot more.) Thus if you load, say 100,000 students from a database, you would use up of the order of 10-100MB of RAM, which is pretty manageable on most personal computer systems.
Rough general guidelines
So normally, for the kind of application you describe, I would recommend loading what's in your database and filtering it in memory. In my usual field of work (genomics), where we sometimes need 10s or 100s of millions of entities, this can't be done. (If your database contains, say, all registered students in public schools in the USA, you may run into similar issues.)
As a general rule of thumb, though, for "normal" objects (i.e. ones that don't have large data such as images associated with them), the table will become prohibitively large for the user to manage comfortably (even with filtering) long before you seriously stretch the memory capacity of the user's machine.
Filtering a table in Java (all objects in memory)
Filtering in code is pretty straightforward. In brief, you load everything into an ObservableList and wrap the ObservableList in a FilteredList. A FilteredList wraps a source list and a Predicate, which returns true if an item should pass the filter (be included) or false if it should be excluded.
So the code snippets you would use might look like:
ObservableList<Student> allStudents = loadStudentsFromDatabase();
FilteredList<Student> filteredStudents = new FilteredList<>(allStudents);
studentTable.setItems(filteredStudents);
And then you can modify the predicate based on a text field with code like:
filterTextField.textProperty().addListener((obs, oldText, newText) -> {
    if (newText.isEmpty()) {
        // no filtering:
        filteredStudents.setPredicate(student -> true);
    } else {
        filteredStudents.setPredicate(student ->
            // whatever logic you need:
            student.getFirstName().contains(newText)
                || student.getLastName().contains(newText));
    }
});
This tutorial has a more thorough treatment of filtering (and sorting) tables.
Comments on implementing "filtering via queries"
If you don't want to load everything from the database, then you skip the filtered list entirely. Querying the database will almost certainly not be fast enough to filter (with a new database query) as the user types, so you would need an "Update" button (or an action listener on the text field) that recomputes the new filtered data. You would probably need to do this in a background thread too, as sketched below. You would not need to set new cellValueFactorys (or cellFactorys) on the table's columns, or reload the columns; you would just call studentTable.setItems(newListOfStudents); when the database query finishes.
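For illustration, a sketch of such a background query using javafx.concurrent.Task and FXCollections (studentDao.searchStudents is a made-up DAO method; Student and studentTable are from the question):

private void runSearch(String searchText) {
    Task<List<Student>> task = new Task<List<Student>>() {
        @Override
        protected List<Student> call() throws Exception {
            // Runs on a background thread, not the FX Application Thread:
            return studentDao.searchStudents(searchText);
        }
    };
    // The succeeded handler runs back on the FX Application Thread:
    task.setOnSucceeded(e ->
            studentTable.setItems(FXCollections.observableArrayList(task.getValue())));
    new Thread(task, "student-search").start();
}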
I have a list of report types (objects) in the database that need to be generated for the user and sent out by email, printed, saved to disk, etc.
One report ('skeleton') is one row in the database.
My question is: should I create a separate object for the query result of one row (a 'skeleton' report object) and then use this object to create the final 'report' object? Is this the correct way of handling such a task?
I have been told that it is easier to create a method that just gets a rowset from the database, parses the rowset for the parameters necessary to create the report, creates the final report object, and so on.
I am not totally sure I understand your question correctly, but I assume you want to know whether you should fill an object with the data from the database and read that object when creating the report, or just pass the result set to the creation method.
I would recommend using a 'Skeleton' object and filling it, since you can reuse it later on, and in my opinion it makes the code much more readable.
More information on this topic:
In many applications the MVC pattern is used to structure the program. In this pattern you structure your program in three layers: the first for your UI (View), the second for your business logic (Controller), and the third for your persistent data (Model). These layers communicate only through domain model objects which represent your data (in your case this would be the 'Skeleton' object; such objects are also called POJOs). This is especially helpful if you suddenly want to change from a database to a text file or any other persistence strategy, since you should only have to change the model layer while keeping the other layers mostly the same (especially if you're using interfaces). You can find a lot on this pattern on the internet, and for most standard applications I would definitely recommend it.
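For illustration only, a minimal sketch of such a 'Skeleton' POJO and a model-layer mapping method (the column names are assumptions):

import java.sql.ResultSet;
import java.sql.SQLException;

public class Skeleton {
    private long id;
    private String reportType;
    // other report parameters...

    public long getId() { return id; }
    public void setId(long id) { this.id = id; }
    public String getReportType() { return reportType; }
    public void setReportType(String reportType) { this.reportType = reportType; }
}

// In the model layer: map one row of the result set to one Skeleton.
public static Skeleton fromResultSet(ResultSet rs) throws SQLException {
    Skeleton s = new Skeleton();
    s.setId(rs.getLong("id"));                    // assumed column name
    s.setReportType(rs.getString("report_type")); // assumed column name
    return s;
}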
I have an external service which I'm grabbing a list of items from, and persisting locally a relationship between those items and a user. I feed that external service a name, and get back the associated items with that name. I am choosing to persist them locally because I'd like to keep my own attributes about those external items once they've been discovered by my application. The items themselves are pretty static objects, but the total number of them are unknown to me, and the only time I learn about new ones is if a new user has an association with them on the external service.
When I get the list back from the external service, I want to check whether each item already exists in my database and use that object instead; if it doesn't, I need to add it so I can set my own attributes and keep the association to my user.
Right now I have the following (pseudocode, since it's broken into service layers etc):
Set<ExternalItem> items = externalService.getItemsForUser(user.name);
for (ExternalItem externalItem : items) {
    Item dbItem = sessionFactory.getCurrentSession().get(Item.class, externalItem.getId());
    if (dbItem == null) {
        // Not in database, create it.
        dbItem = mapToItem(externalItem);
    }
    user.addItem(dbItem);
}
sessionFactory.getCurrentSession().save(user); // saves the associated Items also
This operation is taking around 16 seconds for approximately 500 external items. The remote call accounts for about 1 second of that, and the save is negligible too. The drain I'm noticing comes from the numerous session.get(Item.class, externalItem.getId()) calls I'm doing.
Is there a better way to check for an existing Item in my database than this, given that I get a Set back from my external service?
Note: the external item's id is reliably the same as mine, and a single id will always represent the same external item.
I would definitely recommend a native query, as recommended in the comments.
I would not bother to chunk them, though, given the numbers you are talking about. Postgres should be able to handle an IN clause with 500 elements with no problems. I have had programmatically generated queries with many more items than that which performed fine.
This way you also have only one round trip, which, assuming the proper indexes are in place, really should complete in sub-second time.
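A sketch of that single-round-trip lookup, written here with Hibernate's HQL and setParameterList rather than raw SQL (the Long id type is an assumption; a native SQL IN query works the same way):

// Collect the external ids first.
Set<Long> ids = new HashSet<>();
for (ExternalItem externalItem : items) {
    ids.add(externalItem.getId());
}

// One query instead of ~500 individual get() calls.
List<Item> existing = sessionFactory.getCurrentSession()
        .createQuery("from Item i where i.id in (:ids)", Item.class)
        .setParameterList("ids", ids)
        .list();

Map<Long, Item> byId = new HashMap<>();
for (Item item : existing) {
    byId.put(item.getId(), item);
}

for (ExternalItem externalItem : items) {
    Item dbItem = byId.get(externalItem.getId());
    if (dbItem == null) {
        dbItem = mapToItem(externalItem); // not in the database, create it
    }
    user.addItem(dbItem);
}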
Let's say that I have a table with columns TABLE_ID, CUSTOMER_ID, ACCOUNT_NUMBER, PURCHASE_DATE, PRODUCT_CATEGORY, PRODUCT_PRICE.
This table contains all purchases made in some store.
Please don't concentrate on changing the database model (there are obvious improvement possibilities) because this is a made-up example and I can't change the actual database model, which is far from perfect.
The only thing I can change is the code which uses the already existing database model.
Now, I don't want to access the database all the time, so I have to store the data into cache and then read it from there. The problem is, my program has to support all sorts of things:
What is the total value of purchases made by customer X on date Y?
What is the total value of purchases made for products from category X?
Give me a list of total amounts spent grouped by customer_id.
etc.
I have to be able to preserve this hierarchy in my cache.
One possible solution is to have a map inside a map inside a map... etc.
However, that gets messy very quickly, because I need an extra nesting level for every attribute in the table.
Is there a smarter way to do this?
Have you already established that you need a cache? Are you sure the performance of your application requires it? The database itself can optimize queries, have things in memory, etc.
If you're sure you need a cache, you also need to think about cache invalidation: is the data changing from beneath your feet, i.e. is another process changing the data in the database? Or is the database data immutable, or is your application the only process modifying the data?
What do you want your cache to do? Just keep track of the queries and results that have been requested, so the second time a query is run you can return the result from the cache? Or do you want to aggressively pre-calculate some aggregates? Can the cached data fit into your app's memory, or do you want to use ReferenceMaps, for example, which shrink when memory gets tight?
As for your actual question: why do you need maps inside maps? You should probably design something closer to your business model and store objects that represent the data in a meaningful way. You could represent each query (PurchasesByCustomer, PurchasesByCategory) as an object and store them in different maps, so you get some type safety. Similarly, don't use maps for the results, but the actual objects you want.
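For example, a sketch of one such query object used as a type-safe cache key (names and types are illustrative):

import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// One small key class per query shape, instead of map-in-map-in-map.
public final class PurchasesByCustomerAndDate {
    private final long customerId;
    private final LocalDate date;

    public PurchasesByCustomerAndDate(long customerId, LocalDate date) {
        this.customerId = customerId;
        this.date = date;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PurchasesByCustomerAndDate)) {
            return false;
        }
        PurchasesByCustomerAndDate k = (PurchasesByCustomerAndDate) o;
        return customerId == k.customerId && date.equals(k.date);
    }

    @Override
    public int hashCode() {
        return Objects.hash(customerId, date);
    }
}

// The cache is then a single flat map per supported query:
Map<PurchasesByCustomerAndDate, BigDecimal> totalByCustomerAndDate = new HashMap<>();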
Sorry, your question is quite vague, but hopefully I've given you some food for thought.
I have a data structure that is pretty simple (basically a structure containing some arrays and single values), but I need to record the history of the data structure so that I can efficiently get the contents of the data structure at any point in time.
Is there a relatively straightforward way to do this?
The best way I can think of would be to encapsulate the whole data structure in something that handles all the mutating operations by storing data in functional data structures, and then, for each mutating operation, caching a copy of the data structure in a map indexed by time ordering (e.g. a TreeMap with real time as keys, or a HashMap with a counter of mutating operations, combined with one or more indexes stored in TreeMaps mapping real time / tick count / etc. to mutating operations).
Any suggestions?
Edit: in one case I already have the history as a series of transactions (this is reading items from a data file), so I can replay them, but this takes O(n) steps (n = number of transactions) every time I need to access the data. I'm looking for alternatives.
You are correct. Storing the data in a purely functional data structure is the way to go. Supporting anything moderately complicated using do/undo actions relies on the programmer being aware of all the side effects of every operation, which does not scale and breaks encapsulation.
You should use some form of persistent data structure that is immutable and based on structural sharing (i.e. the parts of the data structure which do not change between versions are stored only once).
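To make "structural sharing" concrete, here is a toy immutable list where each new version shares its entire tail with the previous one (a sketch, not a substitute for a real library):

// An immutable singly linked list: cons() creates a new head node and
// shares the whole existing list as its tail, so old versions stay valid.
public final class PersistentList<T> {
    private final T head;
    private final PersistentList<T> tail;
    private final int size;

    private static final PersistentList<?> EMPTY = new PersistentList<>(null, null, 0);

    @SuppressWarnings("unchecked")
    public static <T> PersistentList<T> empty() {
        return (PersistentList<T>) EMPTY;
    }

    private PersistentList(T head, PersistentList<T> tail, int size) {
        this.head = head;
        this.tail = tail;
        this.size = size;
    }

    public PersistentList<T> cons(T value) {
        return new PersistentList<>(value, this, size + 1); // old version untouched
    }

    public T head() { return head; }
    public PersistentList<T> tail() { return tail; }
    public int size() { return size; }
}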
I created an open source Java library of such data structures here:
http://code.google.com/p/mikeralib/source/browse/#svn/trunk/Mikera/src/mikera/persistent
These were somewhat inspired by Clojure's persistent data structures, which might also be suitable for your purposes (they are also written in Java).
If you are only storing a little bit of data and don't have a lot of changes then storing each version is fine.
If you don't need to access old versions of the data too often, I wouldn't cache each one; I'd just make it possible to rebuild them.
You could do this by saving mutations as transactions and replaying the transactions (with the ability to stop at any point).
So you start with an empty data structure, and you might get an "Add" instruction followed by a "Change", another "Add", and then maybe a "Delete". Each of these objects would contain a COPY (not a pointer to the same object) of the thing being added or changed.
You would append each operation to a list while at the same time mutating your collection.
If you find that you need a version at an older timestamp, start with a new empty collection and replay the transactions until you hit that timestamp, then stop; you now have the collection as it was at that time.
If this was a very long-running application and you often needed to access items near the end, you could write an "Undo" for each add/change/delete operation object and actually mutate the data back and forth.
So imagine you have your data object and this array of mutations, you could easily run up and down the mutation list changing the data object back and forth to any version you want.
You could even maintain multiple data objects: just create a new empty one and run it up the mutation array (think of it as a timeline--each stored mutation would contain a timestamp or some version number) until you get it to the timestamp you want. This way you could have "milestones" that you can reach instantly--for instance, if you allocated one milestone per thread, you could make the addMutation method synchronized and this data collection would become 100% thread-safe.
Note that if you actually return the data object, you should only return a copy of the data--otherwise the next time you mutated that milestone, it would mutate the data object you returned.
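A rough sketch of the transaction-log idea described above (DataStructure is a stand-in for the real type from the question):

import java.util.ArrayList;
import java.util.List;

class DataStructure { /* the arrays and single values from the question */ }

interface Mutation {
    long timestamp();
    void apply(DataStructure target);
    void undo(DataStructure target); // for walking backwards near the end
}

class History {
    private final List<Mutation> mutations = new ArrayList<>();

    // Mutate the live structure and record the operation.
    synchronized void record(Mutation m, DataStructure live) {
        m.apply(live);
        mutations.add(m);
    }

    // Rebuild the state as of a given timestamp by replaying from empty.
    DataStructure rebuildAt(long timestamp) {
        DataStructure result = new DataStructure();
        for (Mutation m : mutations) {
            if (m.timestamp() > timestamp) {
                break;
            }
            m.apply(result);
        }
        return result;
    }
}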
Hmm, you could also include "rollup" functionality: if you decide that you will never need access to the tail (the first few transactions), you could apply them to a "start" structure and then delete them--from then on you copy the start structure to begin from, rather than always starting with an empty data structure.
Man, this is an awesome pattern--now I want to implement it.
Either do as you already suggested, or have a base class of some sort with subclasses that represent the different changes. Then get the proper class at run-time by passing the version/timestamp/whatever to a factory that hands you back the right one.
Multi-level undo can be based on a model (i.e. a data structure) and a sequence of actions. Each action supports two operations: "do" and "undo". To perform a change on the model, you register a new action and "do" it. This allows you to "walk" back and forth in the history, but the state of the model at a specific index cannot be accessed in constant time.
Maybe something like this would be applicable to your situation?
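A minimal sketch of that do/undo model (Model stands in for the question's data structure; all names here are illustrative):

import java.util.ArrayList;
import java.util.List;

class Model { /* the data structure being edited */ }

interface UndoableAction {
    void doAction(Model model);
    void undoAction(Model model);
}

class ActionHistory {
    private final List<UndoableAction> actions = new ArrayList<>();
    private int cursor = 0; // number of actions currently applied

    void perform(UndoableAction a, Model model) {
        actions.subList(cursor, actions.size()).clear(); // drop any redo tail
        a.doAction(model);
        actions.add(a);
        cursor++;
    }

    void undo(Model model) {
        if (cursor > 0) {
            actions.get(--cursor).undoAction(model);
        }
    }

    void redo(Model model) {
        if (cursor < actions.size()) {
            actions.get(cursor++).doAction(model);
        }
    }
}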
How long will the application be running for?
It seems like you could do what you suggested -- playing the transactions back -- but cache the data structure and list of transactions at particular points in time (every hour or every day?) to ease the pain of having to go through O(n) operations every time you need to rebuild the collection from scratch.
Granted, there is definitely a trade-off between space (that the cache takes up) and the number of operations needed to re-build it, but hopefully you will be able to find a happy medium for it.