I have been working on a project using the latest AppFuse as a base. I have extended a lot of the models, particularly the User, to accommodate some of the things I am doing. One of them is adding a few lists to the user that have larger lists attached to them; here is an example:
User -> LeadLists (maybe 100 or so of these) -> Leads (Upwards of 50k)
And that is where my problems start. I have a process working where the user uploads a CSV, I parse it into Lead objects, add them to the list, add the list to the user, then save the user and let the cascade save do its work. However, once the save fires it takes 20 minutes or more to finish, and usually ends with a PermGen out-of-memory error...
Problem #2: once the leads are actually in the DB, I have not been able to display them at all without getting another PermGen out-of-memory error.
Can anyone please offer some insight into what I may be doing wrong? I have enabled the Hibernate batch size and set it to 50; what else can I do to get this ridiculous insert time down?
Did you write your code using Hibernate batch processing best practices?
If not, check this link (the flush/clear pattern is also sketched after the list below).
If yes, and you are trying to write a single User + LeadLists (100) + Leads (50k) in a single shot (that's 50k items, no peanuts!), you have these choices:
Move everything to plain JDBC (an ORM is sometimes not the best choice for batch programming). This can be a good solution, but it will probably require you to rewrite parts of your code.
Move to a StatelessSession (just to give it a chance), but I think the PermGen error is still around the corner.
Increase the PermGen space (sized from some statistics about your objects' dimensions, for example). This can resolve the PermGen problem, but not the slowness.
Drastic: move to Spring Batch, a framework built to perform batch processing of large amounts of data. You will probably reduce the save time by a lot, and you will certainly resolve the PermGen problem (the real issue, IMO; a slow program is better than one that crashes and loses data).
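For reference, a minimal sketch of the flush/clear batching pattern from the Hibernate documentation, assuming hibernate.jdbc.batch_size is set to 50 and the Lead entity from the question:

    import org.hibernate.Session;
    import org.hibernate.SessionFactory;
    import org.hibernate.Transaction;
    import java.util.List;

    public class LeadBatchSaver {
        // Saving in chunks and clearing the session keeps the first-level
        // cache bounded instead of holding all 50k entities in memory.
        public void saveLeads(SessionFactory sessionFactory, List<Lead> leads) {
            Session session = sessionFactory.openSession();
            Transaction tx = session.beginTransaction();
            for (int i = 0; i < leads.size(); i++) {
                session.save(leads.get(i));
                if (i % 50 == 0) { // match hibernate.jdbc.batch_size
                    session.flush(); // push this batch of INSERTs to the DB
                    session.clear(); // detach saved Leads from the session
                }
            }
            tx.commit();
            session.close();
        }
    }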
My 2 cents
I'm developing a software package which makes heavy use of arrays (ArrayLists). Instructions to be processed are put into an array queue, then deleted from the array once used. The same goes for drawing on a plot: data is placed into an array queue, which is read to plot the data, and the oldest data is eventually deleted as new data comes in. We are talking about thousands of instructions over an hour, and maybe 200,000 points plotted at any time, with the array continually growing and shrinking.
After some time, the software begins to slow: instructions are processed more slowly. Nothing about the processing really changes; the system is stable in how much data is plotted and which instructions are being processed, just working off similar incoming data time after time.
Is there some memory issue going on with the "abuse" of the variable-sized (not a defined size, add/delete as needed) arrays/queues that could be causing eventual slowing?
Is there a better way than the String ArrayList to act as a queue?
Thanks!
Yes, you are most likely using the wrong data structure for the job. An ArrayList is a list with a backing array, so get() is fast, but removing elements from the front is not.
The Java runtime library has a very rich set of data structures, so you can get a well-written and debugged implementation with the characteristics you need out of the box. You most likely should be using one or more Queues instead.
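For example, ArrayDeque gives constant-time operations at both ends, while removing the head of an ArrayList shifts every remaining element:

    import java.util.ArrayDeque;
    import java.util.Queue;

    public class InstructionQueue {
        public static void main(String[] args) {
            Queue<String> instructions = new ArrayDeque<>();
            instructions.add("plot point");    // enqueue at the tail: O(1)
            instructions.add("clear oldest");
            String next = instructions.poll(); // dequeue from the head: O(1)
            System.out.println(next);          // ArrayList.remove(0) would be O(n)
        }
    }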
My guess is that you forgot to null out values in your ArrayList, so the JVM has to keep all of them around. This is a memory leak.
To confirm, use a profiler to see where your memory and CPU go. VisualVM is a nice standalone tool; NetBeans includes one as well.
Using VisualVM helped. It showed heavy use of a "message" form that I was dumping incoming data into and had forgotten existed; since I never limited its size, it was dealing with a million characters by the time the sluggishness became apparent.
I am currently working on an app whose database holds a 15,000+ row, 20+ column array. If a user searches in the app, there are two things I can do:
1. Load all the data from the database when the app starts, then search the user's preferred data in memory.
2. Convert the user's search into a query, fire it at the database, and retrieve the matching data at that moment.
Additionally, I want to ask: if I store a 10^6 x 20 array in the app, how much space will it take, and how will the app behave when it reloads?
And if I fetch on demand, what will the fetching time complexity be in the worst case?
Thanks in advance.
With only about 200 MB or less of data, either way will be fast, assuming you allocate enough heap to the process (10^6 rows x 20 columns is 2 x 10^7 cells; at roughly 10 bytes each that is on the order of 200 MB). In the modern era that is not much data. Which will be faster? That all depends.
How fast is your I/O system? What database would you use? How would you code your algorithm? Are you doing any processing in parallel? What bugs have you written into your code? What amount of overall program time does access to these data use? What else is running on the system?
Only actual measurement of both approaches under very well controlled conditions that simulate realistic loads will reveal which is faster.
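If you want a starting point for such a measurement, a bare-bones harness looks something like this (searchInMemory is a hypothetical stand-in for whichever approach you are timing):

    public class SearchBenchmark {
        public static void main(String[] args) {
            // Warm up so the JIT compiles the hot path before we time it.
            for (int i = 0; i < 1_000; i++) {
                searchInMemory("foo");
            }
            long start = System.nanoTime();
            for (int i = 0; i < 10_000; i++) {
                searchInMemory("foo");
            }
            long elapsed = System.nanoTime() - start;
            System.out.printf("in-memory: %.1f us/search%n",
                    elapsed / 10_000 / 1_000.0);
            // Repeat the same warm-up and timing for the query-based approach.
        }

        // Hypothetical stand-in: replace with your real search code.
        static void searchInMemory(String term) {
        }
    }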
But at what cost? More complex systems have more risk and higher costs. Complicated premature "optimization" makes code brittle and hard to understand. Workarounds take time and effort, wasted time and effort if there's nothing to work around.
So instead of asking which is fast(er), ask what makes sense. What gets the job done at all? What costs the least to do? What's more correct? What minimizes risk?
We have market data handlers which publish quotes to a KDB+ tickerplant. We use the exxeleron qJava library for this purpose. Unfortunately the latency is quite high: hundreds of milliseconds when we try to insert a batch of records. Can you suggest some latency tips for the KDB + Java binding, as we need to publish quite fast?
There's not enough information in this message to give a fully qualified response, but having done the same with Java+KDB it really comes down to eliminating the possibilities. This is common sense, really, nothing super technical.
Make sure you're inserting asynchronously.
Verify that it's exxeleron qJava that is causing the latency; I don't think there's hundreds of millis of overhead there.
Verify that the CPU your tickerplant is on isn't overloaded. Consider re-nicing, core binding, etc.
Analyse your network latencies. Also, if you're using Linux, there are a few TCP tweaks you can try, e.g. TCP_QUICKACK.
As you're using Java, be smarter about garbage collection. It's highly configurable, although not directly controllable.
If you find that the tickerplant is the source of the latency, you could either recode it not to write to disk, or get a faster local disk.
There's so many more suggestions, but the question is a bit too ambiguous.
EDIT
Back in 2007, with old(ish) servers and a very old version of KDB+, we were managing an insertion rate of 90k rows per second using the vanilla c.java. That was after many rounds of the above points. I'm sure you can achieve far more now; it's a matter of finding the bottlenecks and fixing them one by one.
Make sure the data published to the tickerplant is batched: wait a little and insert, say, a few rows of data in one batch, rather than inserting row by row as each new record arrives.
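A rough sketch of that micro-batching idea; the class and column layout are illustrative, and the async call assumes the exxeleron qJava QBasicConnection API (verify the signature against your version):

    import com.exxeleron.qjava.QBasicConnection;
    import com.exxeleron.qjava.QException;

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    // Buffers quotes and sends them in one async .u.upd call
    // instead of one call per row.
    public class QuoteBatcher {
        private final QBasicConnection conn;
        private final int maxRows;
        private final List<String> syms = new ArrayList<>();
        private final List<Double> prices = new ArrayList<>();

        public QuoteBatcher(QBasicConnection conn, int maxRows) {
            this.conn = conn;
            this.maxRows = maxRows;
        }

        public synchronized void add(String sym, double price)
                throws IOException, QException {
            syms.add(sym);
            prices.add(price);
            if (syms.size() >= maxRows) {
                flush();
            }
        }

        public synchronized void flush() throws IOException, QException {
            if (syms.isEmpty()) {
                return;
            }
            // kdb+ expects column-oriented data: one array per column.
            Object[] cols = new Object[] {
                    syms.toArray(new String[0]),
                    prices.stream().mapToDouble(Double::doubleValue).toArray()
            };
            conn.async(".u.upd", "quote", cols); // async: no sync round trip
            syms.clear();
            prices.clear();
        }
    }

Pairing this with a scheduled flush() every few milliseconds keeps latency bounded when traffic is slow.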
I am working on a small toy application to expand my knowledge of Java, JavaFX, and SQL. I have a MySQL server on my local network which I am able to communicate with, and a simple TableView can be populated, sorted, etc. The data only has to be shown to the user, no editing. Everything nice and clean.
The Problems:
There are around 170,000 rows with 10 columns, all chars, to display, which seems to be rather hard to do in reasonable time. The query is done during startup, and it takes around 1.5 minutes before I can see the table.
The memory footprint is also enormous: the application without the populated TableView is around 70 MB, and with all the data it reaches 600-700 MB (the XML file used to populate the MySQL database is 70 MB in size...)!
Sorting is REALLY slow. I am using StringProperty, which should give a boost according to: JavaFx tableview sort is really slow how to improve sort speed as in java swing (if I understood that correctly). However, I have not tried the custom sort so far.
My thoughts:
Similar to application design for mobile, I think an adapter pattern can fix these problems. Hence, I create an ObservableList with the correct number of elements but only populate a limited number of rows at the beginning. When the user scrolls down (scroll wheel), the list is updated with new elements in advance via SQL queries. This should give me a performance boost for the first problem. Nice idea, but what am I going to do if the user scrolls down via the scrollbar (click and drag)? Then I would skip certain entries, but I need the information to give the user feedback about where to scroll to.
How could I fix this?
For sorting, I would use the SQL sorting methods, so each sort is performed on the SQL server and a new ObservableList is created. As before, only a certain amount of data would be loaded by the first query.
Whether this approach would also improve the memory footprint, I am not sure.
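For reference, the kind of paged, server-sorted query I have in mind would look roughly like this (table and column names are placeholders):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.List;

    public class PageLoader {
        // Loads one page of rows, sorted on the MySQL server
        // instead of inside the JavaFX TableView.
        public List<String[]> loadPage(Connection conn, int pageSize, int offset)
                throws SQLException {
            String sql = "SELECT col1, col2 FROM mytable ORDER BY col1 LIMIT ? OFFSET ?";
            List<String[]> rows = new ArrayList<>();
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setInt(1, pageSize);
                ps.setInt(2, offset);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        rows.add(new String[] { rs.getString(1), rs.getString(2) });
                    }
                }
            }
            return rows;
        }
    }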
Your opinion:
Are my ideas reasonable and doable in Java/JavaFX?
I would love to hear your ideas about these problems.
Thank you.
I found out that JVx is capable of providing the lazy-loading functionality. This should do the trick.
I'm writing a small agent in Java that will play a game against other agents. I want to keep a small amount of state (probably 1 KB at most) around between runs of the program so that I can tweak the agent's performance based on past successes. Essentially, I will be reading a small amount of data at the beginning of each game and writing a small amount at the end. It seems like I have two options, file I/O or Derby. Is there a speed advantage to either? Or does it not really matter for such a small amount of data?
With 1 KB of data, you are better off using standard file I/O. Most likely, you could serialize the entire object tree to disk and simply deserialize it when you start up again. If you wanted to get fancy, you could use JAXB to serialize to XML instead of a binary file.
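A minimal sketch of that round trip; the state object just needs to implement Serializable:

    import java.io.*;

    public class StatePersistence {
        public static void save(Serializable state, File file) throws IOException {
            try (ObjectOutputStream out =
                    new ObjectOutputStream(new FileOutputStream(file))) {
                out.writeObject(state); // writes the whole object tree
            }
        }

        public static Object load(File file)
                throws IOException, ClassNotFoundException {
            try (ObjectInputStream in =
                    new ObjectInputStream(new FileInputStream(file))) {
                return in.readObject();
            }
        }
    }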
As much as I love to fit every problem to the database solution, I don't think that's very practical here. Unless you have some special need for database-specific capabilities, you are introducing a lot of overhead, complexity, and maintenance problems by using a database.
The only area where you might really want to use a database is if you have a lot of small objects/rows and you frequently perform sorts and filters on the data. But even then, you could probably keep a dozen in-memory ordered lists and get better performance with fewer resources and without the headache of a database.
If you really think you need a database in this scenario, consider HSQL. I don't consider it a real database, but it's an in-memory database that can persist to a file. Low overhead, low complexity, and relatively few points of failure. Plus, if you need to edit the persisted data, you can do so with a text editor; you can't say that about Derby.
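If you do go that route, opening a file-backed HSQLDB is just a JDBC URL; the path and schema here are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class HsqlStateStore {
        public static void main(String[] args) throws SQLException {
            // "file:" mode persists to agentstate.script, a plain-text file
            // you can open in an editor; SA with an empty password is the default.
            try (Connection conn = DriverManager.getConnection(
                        "jdbc:hsqldb:file:agentstate", "SA", "");
                 Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE state " // first run only
                        + "(k VARCHAR(64) PRIMARY KEY, v VARCHAR(1024))");
            }
        }
    }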
Considering that these objects can vary in size per file, and that your computer's specs (bus speed, HD speed) affect this, the only way to be sure is to write your own benchmark. Just create a simple for loop, count from 1 to 1000, and read the file inside the loop over and over (but do not create and destroy the objects inside the loop; just focus on the reading part).
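Something along these lines (state.bin is a placeholder for your file):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class ReadBenchmark {
        public static void main(String[] args) throws IOException {
            Path file = Paths.get("state.bin"); // your ~1 KB state file
            long bytes = 0;
            long start = System.nanoTime();
            for (int i = 1; i <= 1000; i++) {
                // Summing the lengths keeps the JIT from skipping the reads.
                bytes += Files.readAllBytes(file).length;
            }
            long elapsed = System.nanoTime() - start;
            System.out.printf("read %d bytes, avg %.1f us/read%n",
                    bytes, elapsed / 1000 / 1_000.0);
        }
    }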
Of course this whole exercise reeks of premature optimization, which can lead to bad coding habits. Just write your code in the most readable, simple fashion, and if there is a speed problem, refactor as needed.
But since it's a small amount of data, I would say it won't matter.