I am building a wicket web application. It shows a list of 'entities' with a DataView and also displaying a details page which works fine. Only the needed rows from the database are loaded into my DataView, so performance is good.
I can also show a calculated value on the details page, which is generated on the fly from the 'entity'. But when i want to sum up this calculated value on my list page, performance gets poor when there are several hundreds of rows in the table. This is because the value is generated for the whole table for every session.
Should i cache this calculated value (if yes, how can i do this) or would it be better to add a column in my database table that holds this value, which would violate the DRY principle?
As #biziclop commented, a solution for a simple cache is adding a field to the Application Class, which caches the calculated value. This way, the calculations must be made on application startup and on every change of relevant data.
Related
I am developing a java application that loads certain things from a database, such as client records and product info. When the user navigates to say the 'products' tab, I query for products in the database and update a table with that information.
I am wondering if there is a way to see if the query results have changed since the last check, in order to avoid querying and loading all info from the database, and instead just load updates. Is there a way to do this, or perhaps just load changes only from a query into my table list? My goal is to make the program run faster when switching between tabs.
I am wondering if there is a way to see if the query results have changed since the last check
Stated differently, you want a way to automatically answer the question “is this the same result?” without retrieving the entire result.
The general approach to this problem would be to come up with some fast-to-query proxy for the entire state of the result set, and query that instead.
Once you have determined a stable fast computation for the entire result set, you can compute that any time the relevant data changes; and only poll that stored proxy to see whether the data has changed.
For example, you could say that “the SHA-256 hash of fields lorem, ipsum, and dolor” is your proxy. You can now:
Implement that computation inside the database as a function, maybe products_hash.
Create a latest_products_hash table, that stores created timestamp and products_hash that was computed at that time.
In your application, retrieve the most recent record from latest_products_hash and keep it for reference.
In the database, have a scheduled job, or a trigger on some event you decide makes sense, that will compute and store the products_hash in latest_products_hash automatically without any action from the application.
To determine whether there have been updates yet, the application will query the latest_products_hash table again and compare its most recent record with the one the application stored for reference.
Only if the latest_products_hash most-recent value is different, then query the products table and get the full result set.
That way, the application is polling a much faster query (the most-recent record in latest_products_hash) frequently, and avoiding the full products query until it knows the result set will be new.
So I'm porting my Swing Java database application to Java FX (still a beginner here, I recently just learned the basics of FXML and the MVC pattern, so please bear with me).
I intend to load the data from my existing database to the "students" ObservableList so I can show it on a TableView, but on my original Swing code, I have a search TextField, and when the user clicks on a button or presses Enter, the program:
Executes an SQLite command that searches for specific records, and retrieves the RecordSet.
Creates a DefaultTableModel based on the RecordSet contents
And throws that TableModel to the JTable.
However, Java FX is a completely different beast (or at least it seems so to me--don't get me wrong, I love Java FX :D ) so I'm not sure what to do.
So, my question is, do I have to load ALL the students in the database, then use some Java code to filter out students that don't fit the search criteria (and display all students when the search text is blank), or do I still use SQLite in filtering and retrieving records (which means I need to clear the list then add students every time a search is performed, and maybe it will also mess up with the bindings? Maybe there will be a speed penalty on this method also? Besides that, it will also reset the currently selected record because I clear the list--basically, bad UI design and will negatively impact the usability)
Depending on the right approach, there is also a follow-up question (sorry, I really can't find the answer to these even after Googling):
If I get ALL students from database and implement a search feature in Java, won't it use up more RAM than it should, because I am storing ALL the database data in RAM, instead of just the ones searched for? I mean, sure, even my lowly laptop has 4GB RAM, but the feeling of using more memory than I should makes me feel somewhat guilty LOL
If I choose to just update the contents of the ObservableList every time a new search has been performed, will it mess up with the bindings? Do I have to set up bindings again? How do I clear the contents of the ObservableList before adding the new contents?
I also have the idea of just setting the selected table item to the first record that matches the search string but I think it will be difficult to use, since only one record can be highlighted per search. Even if we highlight multiple rows, it'd be difficult to browse all selected items.
Please give me the proper way, not the "easy" way. This is my first time implementing a pattern (MVC or am I actually doing MVP, I don't know) and I realized how unmaintainable and ugly my previous programs are because I used my own style. This is a relatively big project that I need to support and improve for several years so having clean code and doing stuff the right way should help in maintaining the functionality of this program.
Thank you very much in advance for your help, and I hope I don't come off as a "dumb person who can't even Google" in asking these questions. Please bear with me here.
Basic design tradeoffs
You can, of course, do this either of the ways you describe. The basic tradeoffs are:
If you load everything from the database, and filter the table in Java, you use more memory (though not as much as you might think, as explained below)
If you filter from the database and reload every time the user changes the filter, there will be a bigger latency (delay) in displaying the data, as a new query will be executed on the database, with (usually) network communication between the database and the application being the biggest bottleneck (though there are others).
Database access and concurrency
In general, you should perform database queries on a background thread (see Using threads to make database requests); if you are frequently making database queries (i.e. filtering via the database), this gets complex and involves frequently disabling controls in the UI while a background task is running.
TableView design and memory management
The JavaFX TableView is a virtualized control. This means that the visual components (cells) are created only for visible elements (plus, perhaps, a small amount of caching). These cells are then reused as the user scrolls around, displaying different "items" as required. The visual components are typically quite memory-consumptive (they have hundreds of properties - colors, font properties, dimensions, layout properties, etc etc - most of which have CSS representations), so limiting the number created saves a lot of memory, and the memory consumption of the visible part of the table view is essentially constant, no matter how many items are in the table's backing list.
General memory consumption computations
The items observable list that forms the table's backing list contains only the data: it is not hard to ballpark-estimate the amount of memory consumed by a list of a given size. Strings use 2 bytes per character, plus a small fixed overhead, doubles use 8 bytes, ints use 4 bytes, etc. If you wrap the fields in JavaFX properties (which is recommended), there will be a few bytes overhead for each; each object has an overhead of ~16 bytes, and references themselves typically use up to 8 bytes. So a typical Student object that stores a few string fields will usually consume of the order of a few hundred bytes in memory. (Of course, if each has an image associated with it, for example, it could be a lot more.) Thus if you load, say 100,000 students from a database, you would use up of the order of 10-100MB of RAM, which is pretty manageable on most personal computer systems.
Rough general guidelines
So normally, for the kind of application you describe, I would recommend loading what's in your database and filtering it in memory. In my usual field of work (genomics), where we sometimes need 10s or 100s of millions of entities, this can't be done. (If your database contains, say, all registered students in public schools in the USA, you may run into similar issues.)
As a general rule of thumb, though, for a "normal" object (i.e. one that doesn't have large data objects such as images associated with it), your table size will be prohibitively large for the user to comfortably manage (even with filtering) before you seriously stretch the memory capacity of the user's machine.
Filtering a table in Java (all objects in memory)
Filtering in code is pretty straightforward. In brief, you load everything into an ObservableList, and wrap the ObservableList in a FilteredList. A FilteredList wraps a source list and a Predicate, which returns true is an item should pass the filter (be included) or false if it is excluded.
So the code snippets you would use might look like:
ObservableList<Student> allStudents = loadStudentsFromDatabase();
FilteredList<Student> filteredStudents = new FilteredList<>(allStudents);
studentTable.setItems(filteredStudents);
And then you can modify the predicate based on a text field with code like:
filterTextField.textProperty().addListener((obs, oldText, newText) -> {
if (newText.isEmpty()) {
// no filtering:
filteredStudents.setPredicate(student -> true);
} else {
filteredStudents.setPredicate(student ->
// whatever logic you need:
student.getFirstName().contains(newText) || student.getLastName().contains(newText));
}
});
This tutorial has a more thorough treatment of filtering (and sorting) tables.
Comments on implementing "filtering via queries"
If you don't want to load everything from the database, then you skip the filtered list entirely. Querying the database will almost certainly not work fast enough to filter (using a new database query) as the user types, so you would need an "Update" button (or action listener on the text field) which recomputed the new filtered data. You would probably need to do this in a background thread too. You would not need to set new cellValueFactorys (or cellFactorys) on the table's columns, or reload the columns; you would just call studentTable.setItems(newListOfStudents); when the database query finished.
I want to perform a search of a inputs in a list. That list resides in a database. I see two options for doing that-
Hit the db for each search and return the result.
keep a copy in memory synced with table and search in memory and return the result.
I like the second option as it will be faster. However I am confused on how to keep the list in sync with table.
example : I have a list L = [12,11,14,42,56]
and I receive an input : 14
I need to return the result if the input does exists in the list or not. The list can be updated by other applications. I need to keep the list in sync with table.
What would be the most optimized approach here and how to keep the list in sync with database?
Is there any way my application can be informed of the changes in the table so that I can reload the list on demand.
Instead of recreating your own implementation of something that already exists, I would leverage Hibernate's Second Level Cache (2LC) with an implementation such as EhCache.
By using a 2LC, you can specify the time-to-live expiration time for your entities and once they expire, any query would reload them from the database. If the entity cache has not yet expired, Hibernate will hydrate them from the 2LC application cache rather than the database.
If you are using Spring, you might also want to take a look at #Cachable. This operates at the component / bean tier allowing Spring to cache a result-set into a named region. See their documentation for more details.
To satisfied your requirement, you should control the read and write in one place, otherwise, there will always be some unsync case for the data.
I'm creating a hypermedia driven RESTful API that will be used to query transactional data. The intention is that the results will be paginated.
Each API call will query an indexed database table. Since I don't want to keep the results server side due to memory considerations, I was thinking to retrieve the data based on rownum, dependent upon which page is requested. E.G. on page one, WHERE rownum <= 10, on page two, WHERE rownum BETWEEN 11 AND 20 etc.
However, the database in question is replicated from a production system and could potentially add records into an area of the result set already requested. E.G. page one is requested -> 10 rows are returned -> a transaction is inserted at row 5. Now page two will include a record already displayed on page one, as the results are essentially pushed up by a rownum.
What would be a good way of achieving my objective of creating a hypermedia driven RESTful API that provides paginated transactional data from a database, without holding on to the result sets for the duration of the session?
This is a pretty common problem and there are actually not many approaches.
I can think of only three, actually:
You don't care and the result will change. This is the behaviour of stackoverflow: if you're on page 2 of the questions page and someone posts a new question, when clicking on page 3 you may get one or more of the questions that were already listed on page 2, because the index has shifted.
If you don't want to keep in memory the actual data, you're in for a lot of trouble. You could store the handler for the result set, instead of the results themselves, and loop over it fetching the number of rows that you actually need. E.g. you run the select, fetch 10 rows and store the handler of the resultset. Together with the rows, you return to the client a unique ID of the query. The problem will be when you have a range specified, because you can't really "rewind" a database cursor, and that would mean caching the results, which you may want to do anyway. But if you do it like that, sooner or later you're going to have all of the results in memory anyway.
You could still use some memory, but keep only some unique identifier of the rows, associated with a unique identifier of the query, as above. This could work, but only if the rows may be added, and not deleted or updated (if they're updated, they may not match the query any more).
Personally, I'd go with option 1.
Let's say that I have a table with columns TABLE_ID, CUSTOMER_ID, ACCOUNT_NUMBER, PURCHASE_DATE, PRODUCT_CATEGORY, PRODUCT_PRICE.
This table contains all purchases made in some store.
Please don't concentrate on changing the database model (there are obvious improvement possibilities) because this is a made-up example and I can't change the actual database model, which is far from perfect.
The only thing I can change is the code which uses the already existing database model.
Now, I don't want to access the database all the time, so I have to store the data into cache and then read it from there. The problem is, my program has to support all sorts of things:
What is the total value of purchases made by customer X on date Y?
What is the total value of purchases made for products from category X?
Give me a list of total amounts spent grouped by customer_id.
etc.
I have to be able to preserve this hierarchy in my cache.
One possible solution is to have a map inside a map inside a map... etc.
However, that gets messy very quickly, because I need an extra nesting level for every attribute in the table.
Is there a smarter way to do this?
Have you already established that you need a cache? Are you sure the performance of your application requires it? The database itself can optimize queries, have things in memory, etc.
If you're sure you need a cache, you also need to think about cache invalidation: is the data changing from beneath your feet, i.e. is another process changing the data in the database, or is the database data immutable, or is your application the only process modifying your data.
What do you want your cache to do? Just keep track of queries and results that have been requested so the second time a query is run, you can return the result from the cache? Or do you want to aggressively pre calculate some aggregates? Can the cache data fit into your app memory or do you want to use ReferenceMaps for example that shrink when memory gets tight?
For your actual question, why do you need maps inside maps? You probably should design something that's closer to your business model, and store objects that represent the data in a meaningful way. You could have each query (PurchasesByCustomer, PurchasesByCategory) represented as an object and store them in different maps so you get some type safety. Similarly don't use maps for the result but the actual objects you want.
Sorry, your question is quite vague, but hopefully I've given you some food for thoughts.