Java process on Linux needs initial warmup

We have a legacy multithreaded Java process on RHEL 6.5 which is very time critical (low latency) and processes hundreds of thousands of messages a day. It runs on a powerful Linux machine with 40 CPUs. What we found is that the process has high latency while it processes the first 50k messages, averaging 10 ms per message; after this 'warmup' period the latency starts to drop to about 7 ms, then 5 ms, and eventually settles at about 3-4 ms per message by day's end.
This puzzles me, and one possibility I can think of is that maps are being resized at the beginning until they reach a very large capacity, after which they simply never exceed the load factor again. From what I see, the maps are not initialized with an initial capacity, which is why I suspect this may be the case. I ran the process through a profiler and pumped millions of messages through it, hoping to see some 'resize' method from the Java collections, but I was unable to find any. It could be that I am searching for the wrong things or looking in the wrong direction. As a new joiner whose predecessor has left the team, I am trying to see if there are other reasons I haven't thought of.
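If unsized maps do turn out to be the culprit, the standard fix is to pre-size them. A minimal sketch of the idea (the expected entry count here is an illustrative assumption, not taken from the original process):

import java.util.HashMap;
import java.util.Map;

public class PresizedMapDemo {
    public static void main(String[] args) {
        // HashMap rehashes when size > capacity * loadFactor, so if we expect
        // ~1,000,000 entries we can size it up front and never resize mid-day.
        int expectedEntries = 1_000_000;   // illustrative guess at the working set
        float loadFactor = 0.75f;          // HashMap's default
        int initialCapacity = (int) (expectedEntries / loadFactor) + 1;

        Map<String, Object> perMessageState = new HashMap<>(initialCapacity, loadFactor);
        System.out.println("map created; no rehashing up to " + expectedEntries + " entries");
    }
}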
Another possibility I can think of is something related to kernel settings, but I am unsure what it could be.
I don't think it is a programming logic issue, because the process runs at an acceptable speed after the first 30k-50k messages.
Any suggestions?

It sounds like it takes some time for the operating system to realize that your application is a heavy resource consumer. After a while it sees a lot of activity on your application's files, and only then does it respond, by populating the page cache and taking similar actions.


Java Multi-threading: How to Calculate and Limit the Maximum Number of Threads?

Context to the question:
This is a single bit of code that is triggered by pressing 'go'. Within the code, it branches out into multiple threads, because the activity per action (i.e. doSomething() within a for-each loop) is really slow, yet makes very low demands on processor/memory/hard drive resources.
As such, I tried branching every single iteration of the loop out into a new thread and, well... things just absolutely melted, ha ha. Out-of-memory exceptions were being thrown everywhere.
I want to run this multi-threaded, but how do I go about limiting the available threads within the code base to an acceptable level so that the application continues to work without melting?
I've not had to deal much with single-source multi-threading before (web applications are naturally multi-threaded, but also multi-source, which makes it easier to manage supply and demand on server resources).
The application in question doesn't actually need to run this piece of functionality multi-threaded; it would just be better if it did. In reality it doesn't make much difference whether the code takes 1 hour or 100 hours to complete, but for my own sanity, I'd like it to be as efficient as possible.
Pointers?
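One common approach is to bound the concurrency with a fixed-size thread pool, so the loop submits tasks instead of spawning raw threads. A minimal sketch, where doSomething() stands in for the real work and the pool size of 8 is an illustrative value to tune:

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BoundedWorkDemo {
    public static void main(String[] args) throws InterruptedException {
        // Cap the number of live threads instead of creating one per item.
        ExecutorService pool = Executors.newFixedThreadPool(8);

        List<String> items = List.of("a", "b", "c"); // stand-in for the real work list
        for (String item : items) {
            pool.submit(() -> doSomething(item));    // queued, not a new thread each time
        }

        pool.shutdown();                             // accept no new tasks; let queued ones finish
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    static void doSomething(String item) {
        // placeholder for the slow, low-CPU activity described in the question
    }
}

With a fixed pool, an OutOfMemoryError from unbounded thread creation can't happen: excess work simply waits in the pool's queue until a worker thread is free.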

Adobe CQ Evaluation: Are there problems with Multi Site Manager / TarOptimizer?

I work at a retailer and we are considering introducing CQ5 as a CMS.
However, after doing some research and talking to consultants, it turns out there may be things that are "complicated". Perhaps one of you can shed a little light on this.
The first thing is, we were told that when you use the Multi Site Manager to create multi-language pages (about 80 languages), the update process can take as long as half an hour until a change is ultimately published. Has anyone of you experienced something similar?
The other thing is that the TarOptimizer has pretty long running times. I was told that runs taking up to 24 hours are not uncommon. Again my question: have any of you had such a problem, or do you have an explanation for this?
I am really looking forward to your response.
These are really two separate questions, but I'll address them based on my experience.
The update process for creating new multi-language pages will vary based on the number of languages, and also on the number of publish instances and web servers (assuming you're using dispatcher to cache) you are running. This is because the replication process is where the bottleneck is (at least in my experience): if you're trying to push out a large amount of content across a large number of publishers, with a large number of front-end web servers whose cache needs to be cleared, there will be some delay in getting this to happen, since replication is an asynchronous process. The longest delay I've seen for this has been in the 10-15 minute range, with 12 publishers and 12 front-end web servers, but this comes with the obvious caveat that your mileage may vary.
For the Tar Optimization job, I'd encourage you to take a look at this page, as it has a lot of good info about the Tar Optimizer job and how to tune it. The job can take a long time to run when you have a large repository, especially on an instance with a large number of write operations, but the run times can be configured so that it only runs during a given time period, and it will pick up where it left off the night before if the total run time is longer than the allowed run time. By default it runs from 2-5 am each night, so if it needs more than that 3-hour window, it will continue where it left off the next night, allowing it to optimize the entire repository over a period of a few days if needed.

Ways to work around a memory leak in Java

I'm building a large import script that uses functionality from a separate code base that I suspect has a memory leak. It calls the code base as many as 10000 times for the same operations, and while the first calls are relatively quick (2 sec), the script is taking a very long time to run (over 100 hours and counting), and by the end the same task takes 60 sec or more (and still climbing). What is the best way to work around this while the leaks are found and fixed?
Some solutions that have been brainstormed would be:
Create a process that runs part of the script and then ends it, reclaiming the resources it used.
Use a shell script to launch the program multiple times, completing a subset of the tasks each time, and have the updated data written to a file to be used by the next iteration (a sketch of this idea follows below).
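As a sketch of the second idea in Java itself rather than a shell script (ImportWorker, the chunk arguments, and state.dat are all hypothetical names, since the real programs aren't shown):

import java.io.IOException;

public class ChunkedImportLauncher {
    public static void main(String[] args) throws IOException, InterruptedException {
        int totalTasks = 10_000;   // from the question: ~10000 operations
        int chunkSize = 500;       // illustrative; pick whatever keeps memory flat

        for (int start = 0; start < totalTasks; start += chunkSize) {
            // Each chunk runs in a fresh JVM, so whatever the leaky code base
            // holds onto is reclaimed when the child process exits.
            Process p = new ProcessBuilder(
                    "java", "ImportWorker",                      // hypothetical worker class
                    String.valueOf(start), String.valueOf(chunkSize),
                    "state.dat")                                 // hypothetical hand-off file
                .inheritIO()
                .start();
            if (p.waitFor() != 0) {
                throw new IllegalStateException("chunk failed at offset " + start);
            }
        }
    }
}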
edit: Changed the way the question was phrased to make it clear that the import and the code base are separate programs
You know, none of the evidence you have presented clearly points to a storage leak. The real problem could be something completely different, like a poorly designed algorithm, or a poorly tuned database table or query.
Assuming that this is a storage leak and applying "band-aid" solutions could be a waste of time, or actually make the problem worse.
You will be better off spending the time up front to determine what the real problem is and fix it, rather than trying a series of workarounds ... which may turn out to be futile.
I solved this issue by minimizing the scope that contains references to the other code base. Basically, every time I initialized an object or called a function from the other code base, I jumped through hoops to make sure it existed for the minimal time possible, often setting references back to null in order to make sure all references were removed.
This ended up working excellently, reducing the run time from over 150 hours and counting to under 30.
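A minimal sketch of that pattern, with LeakyLib as a hypothetical stand-in for the other code base:

public class ScopedUsageDemo {
    public static void main(String[] args) {
        for (int i = 0; i < 10_000; i++) {
            processOne(i);
            // Nothing from LeakyLib outlives this call, so the GC can
            // reclaim whatever each iteration allocated.
        }
    }

    private static void processOne(int taskId) {
        LeakyLib.Handle handle = LeakyLib.open(taskId);
        try {
            handle.doWork();
        } finally {
            handle = null; // mirrors the answer: drop the reference explicitly
        }
    }
}

// Hypothetical stand-in for the leaky code base.
class LeakyLib {
    static Handle open(int taskId) { return new Handle(); }
    static class Handle {
        void doWork() { /* the real, possibly leaky, work */ }
    }
}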

Methods of limiting emulated CPU speed

I'm writing a MOS 6502 processor emulator as part of a larger project I've undertaken in my spare time. The emulator is written in Java, and before you say it, I know it's not going to be as efficient and optimized as if it were written in C or assembly, but the goal is to make it run on various platforms, and it's pulling 2.5 MHz on a 1 GHz processor, which is pretty good for an interpreted emulator. My problem is quite the contrary: I need to limit the number of cycles to 1 MHz. I've looked around but not seen many strategies for doing this. I've tried a few things, including checking the time after a number of cycles and sleeping for the difference between the expected time and the actual time elapsed, but checking the time slows down the emulation by a factor of 8. Does anyone have better suggestions, or perhaps ways to optimize time polling in Java to reduce the slowdown?
The problem with using sleep() is that you generally only get a granularity of 1 ms, and the actual sleep you get isn't necessarily accurate even to the nearest 1 ms, as it depends on what the rest of the system is doing. A couple of suggestions to try (off the top of my head; I've not actually written a CPU emulator in Java):
stick to your idea, but check the time between a large-ish number of emulated instructions (execution is going to be a bit "lumpy" anyway, especially on a uniprocessor machine, because the OS can potentially take the CPU away from your thread for several milliseconds at a time);
since you want to execute on the order of 1000 emulated instructions per millisecond, you could also try just hanging on to the CPU between "instructions": have your program periodically work out, by trial and error, how many runs through a loop it needs between instructions to "waste" enough CPU to make the timing work out at 1 million emulated instructions/sec on average (you may want to see whether setting your thread to low priority helps system performance in this case).
I would use System.nanoTime() in a busy wait as #pst suggested earlier.
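A minimal sketch combining the batching suggestion above with a nanoTime() busy wait (the batch size and the use of Thread.onSpinWait(), a Java 9+ spin hint, are my own illustrative choices):

public class CycleThrottle {
    private static final long CYCLES_PER_SECOND = 1_000_000L; // target: 1 MHz
    private static final long BATCH = 10_000;                 // poll the clock every N cycles

    public static void main(String[] args) {
        final long nanosPerBatch = BATCH * 1_000_000_000L / CYCLES_PER_SECOND;
        long dueAt = System.nanoTime();

        while (true) {
            for (long i = 0; i < BATCH; i++) {
                step(); // emulate one cycle
            }
            dueAt += nanosPerBatch;
            // Busy-wait until wall-clock time catches up with emulated time,
            // polling nanoTime() once per spin rather than once per cycle.
            while (System.nanoTime() < dueAt) {
                Thread.onSpinWait(); // CPU spin hint; Java 9+
            }
        }
    }

    private static void step() {
        // placeholder for fetching, decoding and executing one 6502 cycle
    }
}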
You can speed up the emulation by generating byte code. Most instructions should translate quite well, and you can add a busy-wait call so that each instruction takes the amount of time the original instruction would have taken. You also have the option to increase the delay so you can watch each instruction being executed.
To make it really cool, you could generate 6502 assembly code as text with matching line numbers in the byte code. This would allow you to use the debugger to step through the code, set breakpoints, and see what the application is doing. ;)
A simple way to emulate the memory is to use a direct ByteBuffer, or native memory accessed via the Unsafe class. This gives you a block of memory you can access as any data type in any order.
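A minimal sketch of the direct-ByteBuffer idea (the 64 KiB size matches the 6502's full address space; wrap-around at $FFFF is omitted for brevity):

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EmulatedMemory {
    // One flat 64 KiB block, the full range a 6502 can address.
    private final ByteBuffer ram = ByteBuffer.allocateDirect(0x10000)
                                             .order(ByteOrder.LITTLE_ENDIAN); // 6502 is little-endian

    int readByte(int addr)            { return ram.get(addr & 0xFFFF) & 0xFF; }
    void writeByte(int addr, int val) { ram.put(addr & 0xFFFF, (byte) val); }

    // The same buffer can be read as other types, e.g. a 16-bit pointer
    // (note: no wrap at $FFFF here).
    int readWord(int addr)            { return ram.getShort(addr & 0xFFFF) & 0xFFFF; }
}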
You might be interested in examining the Java Apple Computer Emulator (JACE), which incorporates 6502 emulation. It uses Thread.sleep() in its TimedDevice class.
Have you looked into creating a Timer object that fires at the cycle length you need? You could have the timer itself initiate the next loop.
Here is the documentation for the Java 6 version:
http://download.oracle.com/javase/6/docs/api/java/util/Timer.html
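A minimal sketch of that idea; java.util.Timer can't fire once per microsecond, so this fires every millisecond and runs a batch of cycles per tick (the batch of 1000 giving 1 MHz overall is an illustrative choice):

import java.util.Timer;
import java.util.TimerTask;

public class TimerPacedEmulator {
    public static void main(String[] args) {
        Timer timer = new Timer("cpu-clock", false);
        timer.scheduleAtFixedRate(new TimerTask() {
            @Override public void run() {
                // 1000 cycles per 1 ms tick = 1 MHz overall.
                for (int i = 0; i < 1_000; i++) {
                    step(); // emulate one cycle
                }
            }
        }, 0, 1);
    }

    static void step() {
        // placeholder for one emulated 6502 cycle
    }
}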

Possible to slow down time in the Java virtual machine?

Is it possible to slow down time in the Java virtual machine, according to CPU usage, by modifying the source code of OpenJDK? I have a network simulation (Java to ns-3) which consumes real time, synchronised loosely to the wall clock. However, because I run so many clients in the simulation, the CPU usage hits 100% and the hard guarantees about how long events in the simulator should take to process are no longer maintained (i.e., there is a large number of very late events). Therefore, the simulation tops out at around 40 nodes when there's a lot of network traffic, and even then it's a bit iffy. The ideal solution would be to slow down time according to CPU usage, but I'm not sure how to do this successfully. A lesser solution is to just slow down time by some multiple (time lensing?).
If someone could give some guidance, the source code for the relevant file in question (for Windows) is at http://pastebin.com/RSQpCdbD. I've tried modifying some parts of the file, but my results haven't really been very successful.
Thanks in advance,
Chris
You might look at VirtualBox, which allows you to accelerate or slow down the guest clock from the command line.
I'm not entirely sure if this is what you want, but with the Joda-Time library you can stop time completely, so that calls to new DateTime() and other constructors within Joda-Time will continuously return the same time. (Note that this only affects Joda-Time's clock; a plain java.util new Date() still reads the system clock.)
So, you could, in one Thread "stop time" with this call:
DateTimeUtils.setCurrentMillisFixed(System.currentTimeMillis());
Then your Thread could sleep for, say, 5000ms, and then call:
// advance the frozen clock by one second
DateTimeUtils.setCurrentMillisFixed(DateTimeUtils.currentTimeMillis() + 1000);
So, provided your application bases what it does on Joda-Time's clock, this will "slow" time by stepping it forward one second every 5 real seconds.
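Putting the pieces together, a minimal runnable sketch (requires Joda-Time on the classpath; the 5:1 slowdown ratio just follows the example above):

import org.joda.time.DateTime;
import org.joda.time.DateTimeUtils;

public class SlowClockDemo {
    public static void main(String[] args) throws InterruptedException {
        // Freeze Joda-Time's clock at the current wall-clock instant.
        DateTimeUtils.setCurrentMillisFixed(System.currentTimeMillis());

        for (int i = 0; i < 10; i++) {
            Thread.sleep(5000); // 5 real seconds pass...
            // ...but the simulated clock only advances 1 second, so code
            // reading Joda-Time sees time running at one fifth real speed.
            DateTimeUtils.setCurrentMillisFixed(DateTimeUtils.currentTimeMillis() + 1000);
            System.out.println("simulated now: " + new DateTime());
        }

        DateTimeUtils.setCurrentMillisSystem(); // restore the normal clock
    }
}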
But, as I said, I'm not sure this will work in your environment.
