I am working on a small toy application to expand my knowledge of Java, JavaFX, and SQL. I have a MySQL server on my local network, which I am able to communicate with, and a simple TableView can be populated, sorted, etc. The data only has to be shown to the user, no editing. Everything nice and clean.
The Problems:
There are around 170,000 rows with 10 columns, all strings, to display, which seems to be rather hard to do in a reasonable time. The query is done during startup, and it takes around a minute and a half before I can see the table.
The memory footprint is also enormous: the application without the populated TableView uses around 70 MB, and with all the data it uses 600-700 MB (the XML file used to populate MySQL is only 70 MB in size!).
Sorting is REALLY slow. I am using StringProperty, which should give a boost according to: JavaFx tableview sort is really slow how to improve sort speed as in java swing (if I understood that correctly). However, I have not tried the custom sort yet.
My thoughts:
Similar to application design for mobile, I think an adapter pattern can fix these problems. Hence, I create an ObservableList with the correct number of elements but only populate a limited number of rows at the beginning. When I am scrolling down (scroll wheel), the list has to be updated with new elements in advance via SQL queries. This should give me a performance boost for the first problem. Nice idea, but what am I going to do if the user scrolls down via the scrollbar (click and drag down)? Then I would skip certain entries, but I need the information to give the user feedback about where to scroll to.
How could I fix this ?
For the sorting, I would use SQL's sorting, so each sort would be performed on the SQL server and a new ObservableList would be created. As before, only a certain amount of data would be loaded in the first query.
Whether this approach would also reduce the memory footprint, I am not sure.
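For what it's worth, here is a minimal sketch of how the paging idea could look, assuming a plain JDBC connection, a hypothetical table `my_table`, and a hypothetical `Row` model class with `StringProperty` fields (the real one would have ten columns). The `LIMIT`/`OFFSET` clause hands both paging and sorting to the MySQL server:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import javafx.beans.property.SimpleStringProperty;
import javafx.beans.property.StringProperty;
import javafx.collections.FXCollections;
import javafx.collections.ObservableList;

public class PageLoader {

    /** Hypothetical row model; the real one would hold ten StringProperty fields. */
    public static class Row {
        private final StringProperty col1 = new SimpleStringProperty();
        public Row(String v) { col1.set(v); }
        public StringProperty col1Property() { return col1; }
    }

    private static final int PAGE_SIZE = 200;
    private final Connection conn;

    public PageLoader(Connection conn) { this.conn = conn; }

    // Fetches one sorted page; sorting happens on the MySQL server, not in the TableView.
    // sortColumn must come from a fixed whitelist, never from raw user input.
    public ObservableList<Row> loadPage(int pageIndex, String sortColumn) throws SQLException {
        String sql = "SELECT col1 FROM my_table ORDER BY " + sortColumn + " LIMIT ? OFFSET ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, PAGE_SIZE);
            ps.setInt(2, pageIndex * PAGE_SIZE);
            ObservableList<Row> page = FXCollections.observableArrayList();
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    page.add(new Row(rs.getString("col1")));
                }
            }
            return page;
        }
    }
}
```

Each scroll position would then translate into a page index, so memory only ever holds `PAGE_SIZE` rows plus whatever pages you choose to keep around.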
Your opinion:
Are my ideas reasonable and doable in Java/JavaFX?
I would love to hear your ideas about these problems.
Thank you.
I found out that JVx is capable of providing the lazy-loading functionality. This should do the trick.
Related
Background: I have done a bit of looking into caching in Spring, and it seems like a great way to save time on common read operations. My code currently loops over a large number of items, performing logic to see whether certain other objects are connected in some way through common items. A way to think about this is a shopping website's related items showing up when you view a certain item. The values I use to determine this are complex, but that is the basic idea.
On loading an item page, there is a very long load time while computing which other items are related, so that links to them can be displayed. Instead of computing this list every time an item page loads, I have started "caching" items along with a list of their recommended items. Many things in the system can trigger a need to recalculate these relations: adding/removing properties of items, adding/removing items, etc.
Problem: My "cache" is simply a singleton object containing a Map of items to their related objects. The process of iterating through every item in the system whenever any change to the cache is needed is very time-consuming and processing-intensive. Java caches don't seem to be the right answer due to the constant changes to items. Are there any other design patterns that I am overlooking for this design? Caches seem close, but I am not sure this problem fits the mold of caching, since it is a little more complex than a bulk amount of reads of a single item.
Are caches the way to go with this? If caching isn't the right solution, what is?
It seems that caches are not a solution to your problem, but they might help you in reaching one.
For example, instead of caching the created items, another approach is to cache information that rarely changes but is crucial to creating the lists.
Spring's function-based caching (i.e. @Cacheable) might come in handy, either for caching or for invalidation.
The next level is to examine different types of caches (e.g. Redis) and what they offer in terms of algorithms, sorting, and Pub/Sub.
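A minimal sketch of the function-based approach, assuming Spring 3.1+ and a hypothetical `ItemService`; `@Cacheable` caches the expensive relation lookup per item, and `@CacheEvict` invalidates it when the item changes:

```java
import java.util.Collections;
import java.util.List;

import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class ItemService {

    // Computed once per itemId, then served from the "relatedItems" cache.
    @Cacheable(value = "relatedItems", key = "#itemId")
    public List<Long> findRelatedItems(long itemId) {
        return computeRelations(itemId); // the expensive part
    }

    // A change to one item evicts only that item's cached list.
    @CacheEvict(value = "relatedItems", key = "#itemId")
    public void itemChanged(long itemId) {
        // persist the change itself here
    }

    private List<Long> computeRelations(long itemId) {
        return Collections.emptyList(); // placeholder for the real computation
    }
}
```

If a single change can affect many items' relations, `@CacheEvict(value = "relatedItems", allEntries = true)` wipes the whole cache instead, which is crude but still cheaper than eagerly recomputing everything on every change.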
I have been working on a project using the latest AppFuse as a base. I have extended a lot of the models, particularly the User, to accommodate some of the things I am doing. One of those things is adding a few lists to the user that have larger lists attached to them; here is an example:
User -> LeadLists (maybe 100 or so of these) -> Leads (Upwards of 50k)
And that is where my problems start. I have a working process in which the user uploads a CSV, I parse it into Lead objects, add them to the list, then to the user, then save the user and let cascading save do its work. However, once save fires, it takes 20 minutes or more to finish, and usually ends with a PermGen memory error...
Problem #2: once they are actually in the DB, I have not been able to display them at all without getting another PermGen out-of-memory error.
Can anyone please offer some insight into what I may be doing wrong? I have enabled Hibernate batching and set the batch size to 50; what else can I do to get this ridiculous insert time down?
Did you write your code following Hibernate batch processing best practices?
If not, check this link (the flush/clear pattern at its core is sketched after the options below).
If yes, and you are trying to write a single User + LeadLists (100) + Leads (50k) in a single shot (that is 50k items, no peanuts!), you have these choices:
Move everything to flat JDBC (ORMs are sometimes not the best choice for batch programming). This can be a good solution, but it will probably require rewriting some parts of your code.
Move to a StatelessSession (just to give it a chance), but I think the PermGen error is still around the corner.
Increase the PermGen space (sized according to some statistics about object sizes, for example). This can resolve the PermGen problem, but not the slowness.
Drastic: move to Spring Batch, a framework built to perform batch conversion of large amounts of data. You will probably reduce the saving time a lot, and you will certainly resolve the PermGen problem (the real problem, IMO: a slow program is better than one that crashes and loses data).
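Whichever of these you pick, the core of the Hibernate batch-processing pattern referenced above looks roughly like this (a sketch, assuming the `Lead` entity from the question); the point is flushing and clearing the session every batch, so the session cache never holds all 50k objects at once:

```java
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

public class LeadBatchSaver {

    private static final int BATCH_SIZE = 50; // keep in sync with hibernate.jdbc.batch_size

    public void saveLeads(SessionFactory factory, Iterable<Lead> leads) {
        Session session = factory.openSession();
        Transaction tx = session.beginTransaction();
        int count = 0;
        for (Lead lead : leads) {
            session.save(lead);
            if (++count % BATCH_SIZE == 0) {
                session.flush(); // push this batch of inserts to the database
                session.clear(); // evict the saved leads so the session cache stays small
            }
        }
        tx.commit();
        session.close();
    }
}
```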
My 2 cents
This is a design question involving both Java and MySQL.
The client requires the addition of 14 boolean flags (T/F) to keep track of some new information in an existing class/table.
I can add these flags to the existing table, or I could create a new class and table just for this data. Adding the 14 boolean flags to the existing table will give it quite a few attributes, which I'm inclined to avoid (especially if the number of flags increases over time). Creating a new class/table is cleaner, but is it really necessary in this case?
Alternatively, I could use a 16-bit integer with masks to multiplex the data, and then I'm only adding one variable to the existing class/table.
My primary question is this: is it more efficient to store 14 individual boolean variables in a MySQL database and load them into the class, or would it be better to store a single integer and then (in Java) multiplex the flags using bit manipulation (i.e. masks)?
Secondary question: if individual flags are more efficient, is it better to have lots of attributes in one table or to split them? What is the penalty for storing lots of boolean flags in a table that already has quite a few columns?
If the primary question's answer is "integer + multiplex" then the second question becomes moot.
Thanks.
-R
I personally like to have separate columns. The only place I might consider masking is when the database and the application are running under extreme conditions, or on low-memory and low-storage devices where any saving of memory or space is crucial.
1- Space should not be a consideration unless the class/table can grow to huge volumes. To simulate boolean flags, a TINYINT(1) is enough, and all you need is 0/1 values.
2- It becomes much harder for anyone wanting to query the table or write reports against it. And if your client does access the database, I am quite sure masking won't be acceptable in most cases.
3- It will be much harder to build indexes on this column when they are needed, if that is possible at all (depending on the database).
4- Working more and writing more code should not be an issue. You work more now, but you will work less in the future. Thinking it is less work for the programmer/DBA is just an illusion, IMHO. Here are some considerations:
a- It will be harder to maintain the code and write database queries. Maybe you do everything in your Java code now, but you never know what the future holds.
b- Structural changes become harder. What if the customer requires the removal of two flags and the addition of four? Do you keep the original two bits that held the removed flags in the database and add four new bits? Or do you reuse them for two of the new flags and then add two more bits? How would this affect code that is already written, and how easy would it be to track down all the affected places and actually make the changes in the code?
In a small application this is not a big problem, but applications grow with time. If the table becomes widely used, this is very dangerous. If you had code working with the 7th and 8th flags, and they were removed and the decision was made (by some other programmer, let's say) to reuse the same positions, any code that used to access the 7th and 8th bits would keep functioning (incorrectly) until that was noticed. It could already have done harmful things before the issue was spotted and fixed. If you had separate columns and dropped them, the error would surface on the very first use of that code, as the columns would no longer be there.
c- It will, without a doubt, be harder to write scripts that upgrade the data and/or change the structure. An experienced DBA will not sit and write the column names one after the other; he will use his tools to generate scripts. With bit manipulation, he will have to work by hand and make no mistakes in the expressions he produces in various selects/updates.
5- All of the above is database-related. Once the data reaches your application, you are free.
You can read the 16 flags from the database, produce your integer, and from then on your code can use bit manipulation on it and save time (by writing the functions that deal with it once and reusing them; a small helper is sketched at the end of this answer). I personally think that here, too, it's better not to do so, but it's your choice.
I know I am not focused and that I might have repeated myself here and there, but I also hope I was able to help you see the longer-term considerations that will help you make the right choice for your case.
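As promised in point 5, a minimal sketch of such a helper, so the bit twiddling lives in exactly one place:

```java
public final class Flags {

    private Flags() {}

    // True if the given bit position (0-based) is set in the stored flags.
    public static boolean isSet(int flags, int bit) {
        return (flags & (1 << bit)) != 0;
    }

    // Returns the flags with the given bit turned on.
    public static int set(int flags, int bit) {
        return flags | (1 << bit);
    }

    // Returns the flags with the given bit turned off.
    public static int clear(int flags, int bit) {
        return flags & ~(1 << bit);
    }
}
```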
Take a look at the MySQL SET column type.
You can use EnumSet. It's the best way to emulate flags: much clearer in design, with almost the same performance as an int. It can easily be translated to an int (to read from and put into the database). For more information, see the "Effective Java" book, in the chapter on EnumSet.
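A minimal sketch of that translation, with a hypothetical `Flag` enum standing in for the client's 14 booleans:

```java
import java.util.EnumSet;

public class FlagCodec {

    // Hypothetical flags; the real enum would name the client's 14 booleans.
    public enum Flag { ACTIVE, VERIFIED, ARCHIVED /* , ... */ }

    // Pack the set into an int for the database column.
    public static int toInt(EnumSet<Flag> flags) {
        int bits = 0;
        for (Flag f : flags) {
            bits |= 1 << f.ordinal();
        }
        return bits;
    }

    // Rebuild the set from the stored int.
    public static EnumSet<Flag> fromInt(int bits) {
        EnumSet<Flag> flags = EnumSet.noneOf(Flag.class);
        for (Flag f : Flag.values()) {
            if ((bits & (1 << f.ordinal())) != 0) {
                flags.add(f);
            }
        }
        return flags;
    }
}
```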
In the primary question you ask what is more efficient rather than what is better, which complicates the answer.
From the point of view of a developer or DBA, a single column is the more efficient solution, because you save space and, using masks, you increase the performance of inserts and updates.
From the point of view of a data analyst, separate columns are the more efficient solution, since each column has a specified role.
As for me, I prefer the masks:
- Fewer changes in code
- Better management (though the limited capacity of an integer is a risk here)
I have a table that displays output from a database; this can often be up to or over 10,000 rows. I'm looking to implement a 'quick find' function that will allow the user to easily scroll to a row containing the value they search for.
The idea is for this to be very similar to Chrome or Firefox's quick find. So:
User hits 'CTRL+F'
A box pops up:
The user can then quickly skim through the table, pressing \/ or /\ to jump to each matching value
My question is: does such a table already exist? I have looked at NatTable and XViewer, but they both seem like rather large additions for a relatively minor change.
Another library I've looked at that comes close to fitting the bill is Lucene's TableSearcher; unfortunately, it is designed for filtering rather than highlighting. However, it may be an avenue worth investigating.
It's not quite the same, but JFace does have support for filtering views: you type in a text box and the view's contents shrink to what matches the filter. It's basically another way to present search/find functionality to the user.
You can see this in action in the Eclipse Preferences dialog (the filter text field above the navigation tree). See the org.eclipse.ui.dialogs.FilteredTree class. I don't believe there is a FilteredTable or equivalent, but the underlying facilities are pretty generic, so it shouldn't take much work to put one together. Take a look at the FilteredTree source and go from there.
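A rough sketch of what wiring one up could look like; the content/label providers and input are left out, as they depend on your model:

```java
import org.eclipse.jface.viewers.TreeViewer;
import org.eclipse.swt.SWT;
import org.eclipse.swt.widgets.Composite;
import org.eclipse.ui.dialogs.FilteredTree;
import org.eclipse.ui.dialogs.PatternFilter;

public class FilteredTreeExample {

    public TreeViewer createFilteredTree(Composite parent) {
        // PatternFilter does the matching; the last argument selects the
        // newer-style filter text field.
        FilteredTree tree = new FilteredTree(
                parent, SWT.SINGLE | SWT.BORDER, new PatternFilter(), true);
        TreeViewer viewer = tree.getViewer();
        // Set content provider, label provider, and input on the viewer as usual.
        return viewer;
    }
}
```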
The solution I chose wasn't as simple as I'd hoped, but it performs satisfactorily.
The tables involved are virtual tables that can be very large. So, on top of this, upon loading the dataset we start a thread to build an index in the form of a list. Because these tables can be very large (though they usually aren't), we set a 50 MB limit on this list and stop indexing when we hit it. For most cases this is perfectly acceptable, and for the larger tables it means that while it may be slower, it is considerably more stable than the other options.
The 'search' is a simple SWT text box with a ModifyListener that iterates over the list, performing a 'contains' check on each entry. Upon a match, the row and column are calculated from the index number (0 = row 0, col 0; 1 = row 0, col 1; ...).
It would have been nice to have a simple 'drop-in' solution for this, but all the tables on offer had too many features.
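For illustration, the core of that approach could look like this, with a hypothetical pre-built `index` list where entry `i` maps to row `i / columnCount`, column `i % columnCount`:

```java
import java.util.List;

import org.eclipse.swt.widgets.Table;
import org.eclipse.swt.widgets.Text;

public class QuickFind {

    private final Table table;
    private final List<String> index; // flattened cell contents, built in a background thread
    private final int columnCount;

    public QuickFind(Table table, Text searchBox, List<String> index, int columnCount) {
        this.table = table;
        this.index = index;
        this.columnCount = columnCount;
        // Re-run the search on every keystroke.
        searchBox.addModifyListener(e -> findFirst(searchBox.getText()));
    }

    private void findFirst(String needle) {
        if (needle.isEmpty()) {
            return;
        }
        for (int i = 0; i < index.size(); i++) {
            if (index.get(i).contains(needle)) {
                int row = i / columnCount; // recover the row from the flat index
                table.setSelection(row);
                table.showSelection();
                return;
            }
        }
    }
}
```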
Use the FilteredItemsSelectionDialog dialog box.
I have read (http://old.nabble.com/using-q%3D--,-adding-fq%3D-to26753938.html#a26805204):
FWIW: limiting the number of rows per request to 50, but not limiting the start doesn't make much sense -- the same amount of work is needed to handle start=0&rows=5050 and start=5000&rows=50.
Then he continues:
There are very few use cases for allowing people to iterate through all the rows that also require sorting.
Is that right? Is that true only for sorted results?
How many pages of 10 rows each do you recommend allowing the user to iterate through?
Does Solr 1.4 suffer the same limitation?
Yes, that's true, also for Solr 1.4. That does not mean that start=0&rows=5050 has the same performance as start=5000&rows=50, since the former has to return 5050 documents while the latter returns only 50. Less data to transfer -> faster.
Solr doesn't have any way to get ALL results in a single page, since that doesn't make much sense. As a comparison, you can't fetch the whole Google index in a single query; nobody really needs to do that.
The page size of your application should be user-definable (e.g. the user might choose to see 10, 25, 50, or 100 results at once).
The default page size depends on what kind of data you're paging and how relevant the results really are. For example, when searching on Google you usually don't look beyond the first few results, so 10 elements are enough. eBay, on the other hand, is more about browsing the results, so it shows 50 results per page by default, and it doesn't even offer 10 results per page.
You also have to take scrolling into account. Users would probably get lost when trying to browse through a 200-result page, not to mention that it takes considerably longer to load.
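For illustration, user-definable paging with SolrJ (the 1.4-era Java client) comes down to setting `start` and `rows` per request; the server URL and query here are assumptions:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SolrPager {

    // Fetches one page of results: page is 0-based, pageSize is the user's choice.
    public QueryResponse fetchPage(String queryString, int page, int pageSize)
            throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery(queryString);
        query.setStart(page * pageSize); // offset into the full result set
        query.setRows(pageSize);         // only this many docs come back
        return server.query(query);
    }
}
```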
start=0&rows=5050 and start=5000&rows=50
It depends on how you jump to start=5000. If you scroll through all results from 0 to 4999, ignoring them all, and then continue scrolling from 5000 to 5050, then yes, the same amount of work is done. The best thing to do is to limit the rows fetched from the database itself, using something like ROWNUM in Oracle.
iterate through all the rows that also require sorting
Few, but yes, there are use cases with this requirement. Examples would be CSV/Excel/PDF exports.