How to speed up frequent writing - Java

We created a Java agent which runs a check on our application suite to see whether, for instance, the parent/child structure is still correct. To do this it needs to check 8000+ documents across several applications.
The check itself is very fast. We use a navigator to retrieve data from views and only read data from those entries. The problem is in our logging mechanism. Whenever we report a log entry with level SEVERE (i.e. a really big issue), the backend document is updated immediately, because we don't want to lose any information about these issues.
In our test runs everything runs smoothly, but as soon as we 'create' a lot of severe issues, performance drops enormously because of all the writes. I would like to hear from any Notes developers facing the same challenge: how could we speed up the writing without losing any data?
-- added more info after comment from Simon --
It's a scheduled agent which runs every night to check for inconsistencies. The goal is of course to find inconsistencies, fix their cause, and eventually have no inconsistencies reported at all.

It's a scheduled agent which runs every night to check for inconsistencies.
OK. So there are a number of factors to take into account.
Are there any embedded JARs? When an agent has embedded JARs, the server has to detach them from the agent to disk before it can run the code. This is done every time the agent executes, and it can be a performance hit. If your agent runs a number of times, remove the embedded JARs and put them into the lib\ext folder on the server instead (requires a server restart).
You mention it runs at night. By default, general housekeeping processes run at night. Check the notes.ini for the scheduled ServerTasks and appraise what impact they have on the server/agent when running. For example:
ServerTasksAt1=Catalog,Design
ServerTasksAt2=Updall
ServerTasksAt5=Statlog
In this case, if the agent ran between 2 and 5, then UPDALL could have an impact on it. Also check Program documents for scheduled executions.
In what way are you writing? If you are creating a document for each incident and the document content is small, then the write time should be reasonable. What is liable to hurt performance is one of the following:
If you are multi-threading those writes.
Pulling a log document, appending a line, saving and then repeating.
One last thing to think about: if you are getting 3000 errors, there must be a point where X number of errors means there is no point continuing, and the agent should instead alert the admin via SNMP/email/etc. It might be worth coding that in as well.
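As a rough sketch of that bail-out idea (the threshold and the alertAdmin hook are placeholders, not anything from the original agent):

public class IssueReporter {
    private static final int MAX_SEVERE = 500; // illustrative cut-off
    private int severeCount = 0;

    void reportSevere(String message) {
        if (++severeCount >= MAX_SEVERE) {
            // alertAdmin stands in for whatever SNMP/email hook you use
            alertAdmin("Aborting run after " + severeCount + " severe issues");
            throw new IllegalStateException("Too many severe issues, aborting");
        }
        // ... normal logging path ...
    }

    private void alertAdmin(String msg) { /* hypothetical notification hook */ }
}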
Other than that, you should probably post some sample code in relation to the write.

Hmm, a difficult, rather general question.
As far as I understand, you update the documents in the view you are walking through. I would set view.AutoUpdate to false. This ensures that the view is not reloaded while your code is running, which should speed it up.
This is an extract from the Designer help:
Avoid automatically updating the parent view by explicitly setting AutoUpdate to False. Automatic updates degrade performance and may invalidate entries in the navigator ("Entry not found in index"). You can update the view as needed with Refresh.
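A minimal sketch of a read-only walk with AutoUpdate disabled, assuming the standard lotus.domino agent classes; the view name and the check itself are placeholders:

import lotus.domino.*;

public class CheckAgent extends AgentBase {
    public void NotesMain() {
        try {
            Session session = getSession();
            AgentContext ctx = session.getAgentContext();
            Database db = ctx.getCurrentDatabase();
            View view = db.getView("ParentChildCheck"); // hypothetical view name

            view.setAutoUpdate(false); // don't reload the index while walking it

            ViewNavigator nav = view.createViewNav();
            ViewEntry entry = nav.getFirst();
            while (entry != null) {
                java.util.Vector cols = entry.getColumnValues(); // read-only, no doc open
                // ... run the parent/child consistency check on cols ...
                ViewEntry next = nav.getNext(entry);
                entry.recycle(); // free the backend handle as we go
                entry = next;
            }
        } catch (NotesException e) {
            e.printStackTrace();
        }
    }
}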
Hope that helps.
If that does not help you might want to post a code fragment or more details.

Create separate documents for each error rather than one huge document.
or
Write directly to a text file rather than to the database, and then pull the contents into a document afterwards if necessary. This should speed things up considerably.
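One possible shape for that, as a minimal sketch (file name and format are up to you): it buffers routine entries but flushes SEVERE ones immediately, so they survive a crash:

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FileIssueLog implements AutoCloseable {
    private final BufferedWriter out;

    public FileIssueLog(Path logFile) throws IOException {
        out = Files.newBufferedWriter(logFile,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public void log(String level, String message) throws IOException {
        out.write(level + "\t" + message);
        out.newLine();
        if ("SEVERE".equals(level)) {
            out.flush(); // don't lose critical entries if the agent dies
        }
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}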

Related

Java server application slow after period of idleness (Windows)

I'm having trouble with a Jetty 9 server application that seems to go into some kind of resting state after a longer period of idleness. Normally the memory usage of the Java process is ~500 MB, but after being idle for some time it drops to less than 50 MB. The first request that comes in takes up to several seconds to respond, whereas requests normally take tens of milliseconds. After one or two requests the application seems to be back to its normal responsive state.
I'm running on the 32-bit Oracle Java 8 JVM. My JVM configuration is very basic:
java -server -jar start.jar
I was hoping that this issue might be solvable through JVM configuration. Does anyone know if there's any particular parameter to disable this type of behavior?
edit: Based on the comment from Ivan, I was able to identify the source of the issue. Turns out Windows was swapping parts of the Java process out to disk. See my own answer below for a description of my solution.
Based on the comment from Ivan, I was able to identify the source of the issue. Turns out Windows was swapping parts of the Java process out to disk. This was clearly visible when comparing the private working set to the commit size in the task manager.
My solution to this was two-fold. First, I made a simple scheduled job inside my server app that runs every minute and does a quick test run to make sure the important services never go inactive for long periods. I'm hoping this ensures that Windows doesn't regard the related pages as inactive.
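Something along these lines (a sketch; selfTest stands in for whatever cheap call exercises your services):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class KeepAlive {
    public static void install(Runnable selfTest) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleAtFixedRate(() -> {
            try {
                selfTest.run(); // cheap exercise of the important services
            } catch (Exception e) {
                e.printStackTrace(); // swallow, so the schedule stays alive
            }
        }, 1, 1, TimeUnit.MINUTES);
    }
}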
Afterwards, I also noticed that the process was executing with "Below normal" priority, so I changed the script that starts the server to ensure it runs with "High" priority going forward. This seems likely to affect swapping behavior, and may well have been enough to resolve the issue on its own, but I only found it after deploying my first solution, so that remains unclear. In any case, everything seems to be working as it should now.
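For reference, one way to do that from a cmd startup script on Windows (adjust to your own script):

start "" /high java -server -jar start.jar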

Optimizing performance by saving data in local database

We are creating a financial transaction system. There is a blacklist service (SOAP) exposed by an external system. We have to call this service on each transaction to check whether the sender or receiver exists on the blacklist. If they do, we should not let the transaction through.
The blacklist contains a few thousand entries.
To optimize the system, we are thinking of keeping a copy of this list in our database and checking against it there; whenever there is an update to the blacklist, the external system will inform us.
From an architecture point of view, is this a good approach? Should we use caching libraries instead of doing this manually?
The application is being developed in Java with an Oracle database.
From my point of view, you shouldn't reinvent the wheel: always use known libraries rather than writing your own code, since they are probably more optimized and better maintained.
If you are able to save the data in your local DB and be informed somehow about changes in the external system, that can be a better approach.
Just be sure to synchronize your actions: you would probably want the action that checks whether someone is blacklisted to wait for the action that synchronizes the data from the blacklist system, so that it always sees the most up-to-date data.
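A minimal sketch of that idea: lookups and refreshes can interleave safely because the underlying set is swapped atomically (class and method names are illustrative):

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class BlacklistCache {
    private volatile Set<String> blacklisted = ConcurrentHashMap.newKeySet();

    public boolean isBlacklisted(String party) {
        return blacklisted.contains(party);
    }

    // Called when the external system notifies us of a change.
    public void refresh(Set<String> latest) {
        Set<String> fresh = ConcurrentHashMap.newKeySet();
        fresh.addAll(latest);
        blacklisted = fresh; // atomic swap: readers see old or new, never partial
    }
}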
I think it can be a good approach - or not. Consider:
1) SOAP in general means a web service, and that in general means latency. Do you want every step in your process to wait for that call to return?
2) Maybe that list doesn't change fast? Then it will not hurt to keep it in a "cache". You can tune how often your system checks for updates to the cache.
3) SOAP also (in general) allows an asynchronous call, so you can keep your system running while the "cache" is updated.
If your system is hot, processing a lot and needing to check that list many times per second, while the list changes by less than 0.1% per day, you will be fine.
On the other hand, if your system just runs a batch a few times a day and the list changes a lot every second, it will not be the best approach.
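To illustrate the tunable periodicity, a sketch that reuses the BlacklistCache above; the fetcher is a placeholder for whatever wraps the external SOAP call:

import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class BlacklistRefresher {
    public static void schedule(Supplier<Set<String>> fetcher, BlacklistCache cache) {
        Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
            try {
                cache.refresh(fetcher.get());
            } catch (Exception e) {
                e.printStackTrace(); // keep serving the last good copy
            }
        }, 0, 5, TimeUnit.MINUTES); // tune the period to how fast the list changes
    }
}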

What to consider when writing a java program that is supposed to run 'forever'

I have to write a program that is expected to run 'forever', meaning that it won't terminate regularly. Up until now I have always written programs that would run and be terminated at the end of the day. The program has to do some synchronizations, pause for n minutes, and then sync again.
AFAIK there should be no problem with my current implementation and it should theoretically run just fine, but I'm lacking any real-world experience.
So are there any 'patterns' or best practices for writing very robust and resource efficient java programs that have a very long runtime? What could be possible problems after for example a month/year of runtime?
Some background:
Java: 1.7, but compiled down to 1.5
OS: Windows (exact version not certain yet)
Thanks in advance
Just a brain dump of all the things I've had to keep in mind when writing this kind of app.
Avoid Memory Leaks
I had an app that ran once at midday, every day, and in it I had a FileWriter that I wasn't closing properly. We then started wondering why our virtual machine was going into meltdown after a few weeks. Memory leaks can come in any form really, with one of the most common examples being that you don't de-reference an object appropriately, for example using a class's field as a place for temporary storage. Often the class persists, and so does the reference, leaving you with objects sitting in memory and doing nothing.
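A minimal sketch of the fix for the unclosed-writer case: try-with-resources closes the writer even when the write throws:

import java.io.FileWriter;
import java.io.IOException;

public class DailyReport {
    public void append(String line) throws IOException {
        try (FileWriter out = new FileWriter("report.log", true)) { // append mode
            out.write(line + System.lineSeparator());
        } // out.close() is guaranteed here, success or failure
    }
}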
Use the right kind of Scheduler
I used a java.util.Timer in that app, and later learned that it's better to use a ScheduledThreadPoolExecutor, after another app changed the system clock. So if you plan on keeping it completely Java-based, I would strongly recommend using that over a Timer, for all of the reasons detailed in this question.
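A sketch of the replacement; unlike java.util.Timer, the executor schedules by relative delay, so a system clock change doesn't derail it, and one failed task doesn't kill the whole scheduler:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SyncDaemon {
    private final ScheduledExecutorService scheduler =
            Executors.newScheduledThreadPool(1);

    public void start() {
        // Relative delays: immune to wall-clock adjustments.
        scheduler.scheduleAtFixedRate(this::sync, 0, 15, TimeUnit.MINUTES);
    }

    private void sync() {
        // ... the synchronization work ...
    }
}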
Be mindful of memory usage and your environment
If your app is loading large amounts of data each and every day, and you have other apps running on the same server, you may want to be careful about the timing. For example, if three of the apps run their scheduled operation at midday, running yours at any other time would probably be a smart move. Be mindful of the environment in which your code executes.
Error handling
You probably want to configure your app to let you know if something has gone wrong, without the app breaking down. If it's running at a certain time every few hours, people are probably depending on it, so I would have a function in your Java code that sends you an email detailing the nature of the exception.
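A sketch of the email idea using the JavaMail (javax.mail) library; the relay host and addresses are placeholders:

import java.util.Properties;
import javax.mail.Message;
import javax.mail.MessagingException;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeMessage;

public class Alerter {
    public static void alert(Throwable t) {
        try {
            Properties props = new Properties();
            props.put("mail.smtp.host", "smtp.example.com"); // assumed relay
            Session session = Session.getInstance(props);
            Message msg = new MimeMessage(session);
            msg.setFrom(new InternetAddress("sync-daemon@example.com"));
            msg.setRecipient(Message.RecipientType.TO,
                    new InternetAddress("admin@example.com"));
            msg.setSubject("Job failed: " + t.getClass().getSimpleName());
            msg.setText(stackTraceOf(t));
            Transport.send(msg);
        } catch (MessagingException e) {
            e.printStackTrace(); // last resort: at least log locally
        }
    }

    private static String stackTraceOf(Throwable t) {
        java.io.StringWriter sw = new java.io.StringWriter();
        t.printStackTrace(new java.io.PrintWriter(sw));
        return sw.toString();
    }
}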
Make it configurable
Again, if it needs to run at various points in the day, you don't want to have to pull the thing down for a few hours to work in some minor changes to your code. Instead, move those values into a Java properties file or an XML config (or really, whatever). The advantage of this is that you can update your program and have it up and running again before anyone really notices the difference.
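For example, a sketch that pulls the schedule out of a properties file (file name and key are illustrative):

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class SyncConfig {
    public static int intervalMinutes() throws IOException {
        Properties config = new Properties();
        try (FileInputStream in = new FileInputStream("sync.properties")) {
            config.load(in);
        }
        // falls back to 15 minutes if the key is absent
        return Integer.parseInt(config.getProperty("sync.interval.minutes", "15"));
    }
}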
Be afraid of the static keyword
That bad boy will make objects persist even when you destroy their parent reference. It is the mother of all memory leaks if you are not careful with it. It's fine for constants, and for things you know don't need to change and need to exist for the project to run well, but if you're using it for arbitrary values throughout a project, you're going to quickly wonder why your app is crashing every few hours rather than syncing.
Props to @X86 for reminding me of that one.
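A contrived example of the kind of static leak meant here:

import java.util.HashMap;
import java.util.Map;

public class LeakyRegistry {
    // Lives as long as the class (i.e. the JVM): nothing ever evicts entries.
    private static final Map<String, byte[]> CACHE = new HashMap<>();

    public static void remember(String key, byte[] data) {
        CACHE.put(key, data); // grows forever in a long-running process
    }
}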
Memory leaks are likely to be the biggest problem. Ensure that no long-term references are held after an iteration of your logic. Even a relatively small object, referenced forever, will exhaust the memory eventually (and worse, it will be harder to detect during testing if the growth rate is 1 GB/month). One approach that may help is using the snapshot functionality of profilers: take a snapshot during the pause, let the sync run a few times, and take another snapshot. Comparing these should show the delta between the synchronizations, which should hopefully be zero.
Cache maintenance is another issue. The overall size of a cache needs to be strictly limited, whereas in short-running programs you can often get away without a limit, because everything seen will be small enough not to cause problems. Equally, it's more important to do cache invalidation properly: broadly speaking, everything that gets cached will become stale at some point while your program is still running, and you need to be able to detect this and take appropriate action. This can be tricky depending on where the golden source of the cached data is.
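A strictly bounded LRU cache can be had from the JDK alone, for instance (the size limit is illustrative):

import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private static final int MAX_ENTRIES = 10_000; // illustrative limit

    public BoundedCache() {
        super(16, 0.75f, true); // access-order, i.e. LRU eviction order
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > MAX_ENTRIES; // evict the least recently used entry
    }
}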
The last thing I'll mention is exception handling. For short-running processes it's often enough to simply let the process die when an exception is encountered, so the issue can be dealt with and the app rerun. With a long-running process you'll likely need to be more defensive than this. Consider running parts of your program in threads which can be restarted* if/when they fail. You may need a supervisor-type module which checks that everything else is still heartbeating and reboots it if not (a bare-bones supervisor is sketched after the footnote). If appropriate to your structure, this is anecdotally a lot easier to achieve with actor-style libraries than with Java's standard executors. And if at all possible, you may want hooks (perhaps exposed over JMX/MBeans) that let you modify the behaviour somewhat, to allow a short-term hack or workaround to be effected without having to bring the process down. Though this requires quite some foresight to predict exactly what's going to go wrong in several months...
*or rather, the job can be restarted in another thread
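A bare-bones version of the supervisor idea (the back-off time is arbitrary):

public class Supervisor {
    public static void superviseForever(Runnable job, String name) {
        new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    job.run();
                } catch (Throwable t) {
                    t.printStackTrace(); // log, then restart on the next loop
                }
                try {
                    Thread.sleep(5_000); // back off before rebooting the job
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, name + "-supervisor").start();
    }
}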

Memory build up in a java program GapContent$MarkData

I am currently developing an application that needs to execute very fast, about every 20 ms (yeah, I know, I should not have taken Java in the first place). I have worked a lot on optimizing the code so it would not be too computation-greedy. However, it seems I have not put enough effort into GUI and memory optimization. My application can run at the speed I want, but after 1-2 minutes it slows down drastically, suggesting a memory problem.
I ran the profiler under NetBeans and found that most of the memory was taken by javax.swing.text.GapContent$MarkData.
Searching on Google, I found mostly nothing understandable to help me with this problem. So, is there anyone who could help? My first guess would be that the garbage collector doesn't run long enough to erase unused objects... but I don't have more of a clue than that.
You are right to employ profiling; now use Profile > Profile Project > CPU to find and target the hot spot(s).
The slowdown turned out to be due to a function that closed and reopened a connection to the database on each iteration.
Consider using SwingWorker to query the database in the background and process() results on the event dispatch thread, as shown in this related example.
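A sketch of that split; the query Supplier stands in for the (re-used!) database connection, and the output component is whatever displays your results:

import java.util.List;
import java.util.function.Supplier;
import javax.swing.JTextArea;
import javax.swing.SwingWorker;

public class BackgroundQuery {
    static void run(Supplier<List<String>> query, JTextArea output) {
        new SwingWorker<Void, String>() {
            @Override
            protected Void doInBackground() {
                for (String row : query.get()) { // runs off the EDT
                    publish(row);                // hand each row to the EDT
                }
                return null;
            }

            @Override
            protected void process(List<String> chunks) {
                for (String row : chunks) {
                    output.append(row + "\n");   // safe: runs on the EDT
                }
            }
        }.execute();
    }
}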
What you are calling a "memory build-up" is only 600 Kb. If 600 Kb is problematic, I question your choice of Java and Swing.
I have an application that sometimes generates hundreds of megabytes of log messages.
I'm guessing your GUI application is somewhat similar. The app probably has a JTextPane that displays a log. As the app runs it adds messages to the JTextPane.
The Document implementation used by JTextPane is a PlainDocument.
Even though you probably always insert new log messages only at the top or only at the bottom, the PlainDocument implementation is general-purpose: it supports modification anywhere in the document by putting a gap in the underlying stream of text and then putting the changes into the gap. As the app inserts new messages into the Document, it creates lots and lots of gaps.
The actual text to display has to exist somewhere. There is probably a better way to implement a huge text pane, but the default JTextPane will look, to the profiler, like a memory leak. If you have 600 Kb of log messages, it's going to take at least 600 Kb of memory somewhere.
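One mitigation, sketched here, is to cap the document at a fixed size by trimming from the top, so the Document (and its MarkData) can't grow without bound; the limit is up to you:

import javax.swing.JTextPane;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;

public class CappedLog {
    public static void append(JTextPane pane, String msg, int maxChars) {
        try {
            Document doc = pane.getDocument();
            doc.insertString(doc.getLength(), msg, null);
            int overflow = doc.getLength() - maxChars;
            if (overflow > 0) {
                doc.remove(0, overflow); // drop the oldest text
            }
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }
}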
You should know that the Java Console uses a PlainDocument with GapContent$MarkData, and just having the console open with lots of data in it will cause this "memory leak" to appear. Clear the console and the number of MarkData objects will drop back to acceptable levels.

How to improve my software project's speed?

I'm doing a school software project with my classmates in Java.
We store the info in a remote DB.
When we start the application we pull all the information from the database and transform it into objects to use in our application (using Java SQL statements). In the application we edit some of these objects, and then when we exit the application we save or update the information in the database using Hibernate.
As you see, we don't use Hibernate for pulling in information; we use it just for saving and updating.
We have two very similar problems:
The loading of objects (when we start the app) and the saving of objects with Hibernate (when closing the app) both take too much time.
And our project is not a huge enterprise application; it's quite a small app. We just manage some students, teachers, homework and tests, so our DB is also very, very small.
How could we increase performance?
Later edit: with a local database it runs very quickly; it is only slow with remote databases.
Are you saying you are loading the entire database into memory and then manipulating it? If that is the case, why not simply use the database as a storage device and do lookups and manipulation as necessary (using Hibernate if you like, or something else if you don't)? The key there is to make sure you are using connection pooling, as that will reduce the connection time.
Not caching the entire database in memory also avoids potential memory issues: it reduces memory usage, and it spreads the network load from the beginning/end of the session out to the times when each query actually needs to happen.
These two sentences are red flags for me:
"When we start the application we pull all the information from the database and transform it into objects to use in our application (using Java SQL statements). In the application we edit some of these objects, and then when we exit the application we save or update the information in the database using Hibernate."
Is there a requirements reason that you are loading all the information from the database into memory at startup, or that you're waiting until shutdown to save changes back to the database?
If not, I'd suggest a design change. If you've already got Hibernate mappings for the tables in the DB, I'd use Hibernate for all of your CRUD (create, read, update, delete) operations. And I'd only load the data that each page in your app needs, as it needs it.
If you can't make that kind of design change at this point, I think you've got to look closely at how you're managing the database connections. Are you using connection pools? Are you opening up multiple connections? Forgetting to release them?
Something else to look at: how are you using Hibernate to save the entities to the DB? Are you doing a getHibernateTemplate().get() on each one and then a save or update on each one? If so, you are also causing Hibernate to run a select query for each database object before it does the save or update; essentially, you'd be loading each database object twice (once at the beginning of the program, once before saving). To see if that's what's happening, you can turn on the show_sql property or use P6Spy to see exactly what queries Hibernate is running.
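For example, a sketch of switching show_sql on programmatically (equivalent to setting it in the Hibernate config file):

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class HibernateDebug {
    static SessionFactory buildWithSqlLogging() {
        Configuration cfg = new Configuration().configure(); // loads hibernate.cfg.xml
        cfg.setProperty("hibernate.show_sql", "true");   // print every SQL statement
        cfg.setProperty("hibernate.format_sql", "true"); // pretty-print it
        return cfg.buildSessionFactory();
    }
}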
For what you are doing, you may very well be better off serializing your objects and writing them out to a flat file.
But, much more likely, you should just read / update objects directly from your database as needed instead of all at once, for all the reasons aperkins gives.
Also, consider what happens if your application crashes? If all of your updates are saved only in memory until the application is closed, everything would be lost if the app closes unexpectedly.
The difference between loading everything from a remote DB server and from a local DB server is network latency / pipe size. The network is a much smaller pipe than anything else. Two questions: first, how much data are we really talking about? Second, what is your network speed: 10/100/1000 Mbit? Figure that between 10 and 20% of your pipe will be lost to overhead, from networking protocols down to the actual queries themselves.
As others have stated, the way you've architected this is usually high on the list of "don'ts". When starting, pull only enough data to initialize the app. As the user works through it, pull what you need for that task.
The ONLY time you pull everything is when the user works in a disconnected state. Even then, you don't load everything as objects in the application; you work from a local data store which gets synced with the remote server every so often.
The project is pretty much complete; we can't do large refactoring on it now.
I tried to use a second-level cache for Hibernate when saving: EhCacheProvider.
In hibernate.xml:
net.sf.ehcache.hibernate.EhCacheProvider
I have created a config for the cache (ehcache.xml), put ehcache.jar in the project build path, and set the Hibernate cache property for every class in the mapping.
But this cache doesn't seem to have any effect; I don't know whether it is even being used.
Try minimising the number of SQL queries, since every query has its own overhead.
You can enable database compression, which should speed things up when there is a lot of data.
Maybe you are connecting to the database many times?
Check the ping time to the remote database server - it might be the problem.
As your application is only slow when running against a remote database server, I'd assume the performance loss is due to:
Connecting to the server: try to reuse connections (pass the instance around) or use connection pooling
Query round-trip time: use as few queries as possible; see this question for the case of a hand-written DAL:
Preferred way of retrieving row with multiple relating rows
For Hibernate you may use its batch functionality and adjust hibernate.jdbc.batch_size.
In all cases, especially when you can't refactor larger parts of the codebase, use a profiler (method times or SQL queries) to find the bottleneck. I bet you'll find thousands of queries, each taking ~10 ms RTT, which could be merged into one.
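A sketch of the batch pattern, assuming hibernate.jdbc.batch_size is set to 50 in the configuration; Student stands in for any mapped entity:

import java.util.List;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

public class BatchSaver {
    static void saveAll(SessionFactory factory, List<Student> students) {
        Session session = factory.openSession();
        Transaction tx = session.beginTransaction();
        int i = 0;
        for (Student s : students) {
            session.save(s);
            if (++i % 50 == 0) {   // match the configured batch size
                session.flush();   // push the current batch over the wire
                session.clear();   // detach entities to keep the session small
            }
        }
        tx.commit();
        session.close();
    }
}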
Some other things you can look into:
You can allocate more memory to the JVM
Use the jconsole tool to investigate what the bottlenecks are.
Why don't you use two separate threads?
Thread 1 will load your objects one by one.
Thread 2 will process objects as they are loaded.
Your app will seem more interactive at startup. A sketch of this producer/consumer split follows.
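Here, loadAllRows() and process() are placeholders for the app's own JDBC loading and object building:

import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PipelinedStartup {
    private static final String POISON = "__END__"; // end-of-stream marker

    public static void start() {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);

        Thread loader = new Thread(() -> {
            try {
                for (String row : loadAllRows()) {
                    queue.put(row);      // blocks if the consumer lags behind
                }
                queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "db-loader");

        Thread builder = new Thread(() -> {
            try {
                String row;
                while (!POISON.equals(row = queue.take())) {
                    process(row);        // build the domain object
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "object-builder");

        loader.start();
        builder.start();
    }

    private static List<String> loadAllRows() { return Collections.emptyList(); } // placeholder
    private static void process(String row) { }                                   // placeholder
}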
It never hurts to review the basics:
Improving speed means reducing time (obviously), and to do that you find activities that take significant time but can be eliminated or replaced with something that takes less time. What I mean by an activity is almost always a function, method, or property call, performed on a specific line of code for a specific purpose. It may invoke I/O, or computation, or both. If its purpose is not essential, then it can be optimized.
Many people use profilers to try to find these time-wasting lines of code, but most profilers miss the target because they look at functions, not lines; they go to sleep during I/O; and they worry about "self time".
Many more people try to guess what could be the problem, or they ask others to guess, such as by asking on SO. Such guesses, in the nature of guesses, are sometimes right - more often not, but people still invest time and resources in them.
There's a very simple way to find out for sure, without guessing, what could fruitfully be optimized, and here is one way to do it in Java.
Thanks for your answers. They were more than helpful.
We completely solved this problem like so:
Refactored the LOAD code. Now it uses Hibernate with lazy fetching.
Refactored the SAVE code. Now it saves just the data that was modified, right after it is modified. This way we don't have a HUGE save at the end.
I'm amazed at how well it all went. The amount of new code we had to write was very, very small.
