JDK 1.6 GC AdaptiveSizeThroughPutPolicy - java

Please explain the detailed meaning of the VALUE used in the GC option:
-XX:AdaptiveSizeThroughPutPolicy
The default value is 0.
Does this VALUE mean the number of steps for which heuristics are used before real data takes over? What are the implications of using a high value (e.g. 50 or 100) versus a low value (e.g. 0)?

The best way I know to understand those arcane options is going directly to the source:
psAdaptiveSizePolicy.cpp
It seems that 1 and !=1 are the only valid choices.
-XX:AdaptiveSizeThroughPutPolicy=1 is used in coordination with -XX:AdaptiveSizePolicyInitializingSteps=VALUE.
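For what it's worth, a minimal launch line combining the two flags might look like the following. This is only a sketch: it assumes the parallel collector (the collector psAdaptiveSizePolicy.cpp belongs to), the value 20 for the initializing steps is arbitrary, MyApp is a placeholder, and -verbose:gc plus -XX:+PrintAdaptiveSizePolicy are added purely to observe the policy's decisions in the GC log:
java -XX:+UseParallelGC \
     -XX:AdaptiveSizeThroughPutPolicy=1 \
     -XX:AdaptiveSizePolicyInitializingSteps=20 \
     -XX:+PrintAdaptiveSizePolicy -verbose:gc MyApp
With the default value of 0, the initializing-steps flag should have no effect, which matches the 1 / not-1 distinction visible in the source.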

Integrality Gap in MAXIMIZATION in CPLEX+Java | Bug?

I try to solve a large MIP. If it does not solve to optimality, it should return the integrality gap (that is, the difference between the best integer solution and the best solution of the linear relaxation).
Using getMIPRelativeGap of the Java+CPLEX interface, I sometimes get values in the range of 1.0E11-1.0E13, which does not make sense, as an integrality gap should be a percentage between 0 and 1. I tracked those cases down and found that I get those results when the best integer solution has a value of 0 (my inner problem is a profitable tour problem, so this happens when the best route does not visit any vertex). The integrality gap should be (bestobjective-bestinteger)/bestobjective (https://www.ibm.com/support/knowledgecenter/SSSA5P_12.6.0/ilog.odms.cplex.help/refdotnetcplex/html/M_ILOG_CPLEX_Cplex_MIPInfoCallback_GetMIPRelativeGap.htm), yet it seems to be (bestobjective-bestinteger)/bestinteger.
I also tested a couple of other values (where the integer objective is positive) and was able to confirm this in examples.
Can someone else reproduce this behavior? Does this behavior make sense to you?
Thanks :)
Indeed, the documentation for CPXgetmiprelgap in the Callable Library (C API) says the following:
For a minimization problem, this value is computed by
(bestinteger - bestobjective) / (1e-10 + |bestinteger|)
where bestinteger is the value returned by CPXXgetobjval/CPXgetobjval
and bestobjective is the value returned by
CPXXgetbestobjval/CPXgetbestobjval. For a maximization problem,
the value is computed by:
(bestobjective - bestinteger) / (1e-10 + |bestinteger|)
So, it looks like the documentation for the Java API is buggy. The Java API just calls CPXgetmiprelgap under the hood, so it should be the same. Thanks for reporting this. I'll make sure that this gets passed on to the folks who can fix it.
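If you want a gap that matches the documented (bestobjective - bestinteger) / bestobjective definition rather than what getMIPRelativeGap actually returns, a minimal sketch is to compute it yourself from getObjValue() and getBestObjValue(). The model-building part is omitted, GapCheck is a made-up class name, and the 1e-10 zero-guard mirrors the C documentation rather than anything mandated by the Java API:
import ilog.concert.IloException;
import ilog.cplex.IloCplex;

public class GapCheck {
    public static void main(String[] args) throws IloException {
        IloCplex cplex = new IloCplex();
        // ... build the (maximization) model here ...
        if (cplex.solve()) {
            double bestInteger   = cplex.getObjValue();      // incumbent
            double bestObjective = cplex.getBestObjValue();  // best bound
            // gap as documented for the Java/.NET API: relative to the best bound
            double docGap = (bestObjective - bestInteger)
                    / (1e-10 + Math.abs(bestObjective));
            // gap as actually computed by CPXgetmiprelgap: relative to the incumbent
            double cplexGap = cplex.getMIPRelativeGap();
            System.out.println("doc-style gap = " + docGap
                    + ", getMIPRelativeGap() = " + cplexGap);
        }
        cplex.end();
    }
}
Comparing the two printed values on a model with a zero-valued incumbent should reproduce the huge numbers described above.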

Java class file format limit(s) exceeded:

I am using the Derby in-memory DB in Java, and I want to run a SQL statement. My SQL string's length is 61671 bytes, but I get the error Java class file format limit(s) exceeded: method:e1 code_length (72447 > 65535). I know about Derby's limitation that generated code must be less than 65535 bytes (class file format limit(s) exceeded).
But my string's length is only 61671. Why am I getting the code_length (72447 > 65535) error?
How can I fix it?
Thanks
my string's length is only 61671. Why am I getting the code_length (72447 > 65535) error?
Because code length and string length are two different things. Your code is too long. Split it.
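As a hedged illustration of what "split it" can look like in practice: if the giant statement is, say, a multi-row INSERT, the usual fix is to turn it into one small parameterized statement executed in batches, so Derby never has to compile a single enormous generated method. The table and column names here are invented for the example, and the batch size of 500 is arbitrary:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchedInsert {
    // Inserts all rows in batches instead of compiling one giant SQL string.
    static void insertAll(List<String> values) throws SQLException {
        Connection conn = DriverManager.getConnection(
                "jdbc:derby:memory:demo;create=true");
        try {
            // hypothetical table; the point is the single small parameterized statement
            PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO demo_table (val) VALUES (?)");
            int count = 0;
            for (String v : values) {
                ps.setString(1, v);
                ps.addBatch();
                if (++count % 500 == 0) {
                    ps.executeBatch(); // flush periodically, well below any limit
                }
            }
            ps.executeBatch();         // flush the remainder
            ps.close();
        } finally {
            conn.close();
        }
    }
}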
As others have said, your problem is that your code is too long, not any specific string. I am afraid splitting your code into smaller methods/classes is your only option. Not only that, but it's just good practice. It's good practice to break your code into as many different segments as reasonably possible. Often, if you end up with giant classes/methods, it's because your classes or methods have low cohesion.
High cohesion (and therefore ultimately small classes etc) is strongly encouraged (essential reading for object oriented programmers). It will make your life, and the life of any programmers working with you, much more comfortable.
If you're trying to store your class files in a Derby database, perhaps you should use a BLOB datatype, which does not have such a small length limit.

Why does FileChannel.map take up to Integer.MAX_VALUE of data?

I am getting the following exception when using FileChannel.map:
Exception in thread "main" java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
at sun.nio.ch.FileChannelImpl.map(Unknown Source)
at niotest.NioTest.readUsingNio(NioTest.java:38)
at niotest.NioTest.main(NioTest.java:64)
A quick look into the OpenJDK implementation shows that the map(..) method in FileChannelImpl takes a size of type long as input, but inside the body it compares it with Integer.MAX_VALUE and throws an error if it is greater than that. Why take a long size as input but limit it to Integer.MAX_VALUE?
Does anyone know the specific reason behind this implementation, or is it some kind of bug?
Source URL - http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/sun/nio/ch/FileChannelImpl.java
I am running this program using a 64-bit JRE on 64-bit Windows Server 2008.
It's not an implementation-specific bug. The size is defined in FileChannel.map as long, but...
size - The size of the region to be mapped; must be non-negative and no greater than Integer.MAX_VALUE
All compliant JVM implementations will behave this way. I suspect the reason is a combination of history (who would need to access a file larger than 2GB? ;) and trying to push things forward in later versions of Java (it will be easier to allow values larger than Integer.MAX_VALUE than it would be to change the data type from int to long).
A lot of people find this int-based thinking in the Java API regarding anything file-related very confounding and short-sighted. But remember, Java started development in 1995! I'm sure 2GB seemed like a relatively safe value at the time.
ByteBuffer's capacity is limited to Integer.MAX_VALUE, so there is no way to map anything larger than that.
Look at: MappedByteBuffer map(MapMode mode, long position, long size)
position has to be a long for obvious reasons.
size does not need to be a long, but in any calculation it has to be promoted; for example, position + size has to be a positive long. The OS mapping may indeed use a long to carry the mapping, and mmap may need to map more than Integer.MAX_VALUE in order to preserve page size, but ByteBuffer just can't use that.
Overall, int lies very deep in Java's design, and there is no C-like size_t type; using long everywhere instead of int would hamper performance. So in the end: if you need maps larger than 2GB, just use more than a single ByteBuffer.
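To make the last point concrete, here is a minimal sketch (the file name is a placeholder) of mapping a file larger than 2 GB as a series of MappedByteBuffers, each no larger than Integer.MAX_VALUE:
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class ChunkedMap {
    public static void main(String[] args) throws IOException {
        RandomAccessFile raf = new RandomAccessFile("huge.dat", "r"); // placeholder file
        FileChannel ch = raf.getChannel();
        long size = ch.size();
        long chunk = Integer.MAX_VALUE;            // per-map limit imposed by ByteBuffer
        for (long pos = 0; pos < size; pos += chunk) {
            long len = Math.min(chunk, size - pos);
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
            // ... read from buf; indices within buf are ints relative to pos ...
            System.out.println("mapped " + len + " bytes at offset " + pos);
        }
        ch.close();
        raf.close();
    }
}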

What is the standard number of parameters that a Java method should have?

I am writing a program that checks the number of parameters of a method and prints a warning message if there are more than the standard allows (it's a code-smell checker). The problem is that I don't know what the agreed number is. I have looked around and not had any luck. Can anyone tell me, or at least point me in the right direction?
There is no standard limit on the number of parameters you can specify in Java, but according to "Code Complete" (see this post) you should limit the number of parameters to about 7; any more and it will have a negative effect on the readability of your code.
This really has nothing to do with Java specifically. And you should definitely make it configurable, because there are quite different views on this.
In "Clean Code", Robert Martin argues that the ideal number of method parameters is 0, 1 is OK, 2 needs strong justification, and 3 or more requires special dispensation from the pope.
Most people will consider this way too strict and wouldn't blink twice at a method with 3 parameters. You can probably get broad agreement that 6 parameters is too many.
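As a hedged illustration of how a warning like this usually gets fixed (the names are invented for the example), a long parameter list is typically collapsed by grouping related parameters into a small value object:
// Before: six parameters -- hard to read, easy to call in the wrong order.
class UserServiceBefore {
    void createUser(String first, String last, String email,
                    String street, String city, String zip) { /* ... */ }
}

// After: the address-related parameters are grouped into one value object.
class Address {
    final String street, city, zip;
    Address(String street, String city, String zip) {
        this.street = street; this.city = city; this.zip = zip;
    }
}

class UserServiceAfter {
    void createUser(String first, String last, String email, Address address) { /* ... */ }
}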
In Java you can't define more than 255 parameters for a method; that is the hard restriction.
As advice, Uncle Bob (Clean Code) says the maximum parameter count should be three.
Exceeding the hard limit produces an error like: Too many parameters, parameter xxxxxxx is exceeding the limit of 255 words eligible for method parameters
Checkstyle is a popular tool to check java coding standard.
Here is the link to the ParameterNumber rule: ParameterNumber
My honest opinion is that there is no defined limit to the number of parameters. My personal preference is no more than 3, or at most 4, since anything beyond that affects readability and mental mapping (it is difficult to remember more than 4 parameters). You can also have a quick peek at Uncle Bob's Clean Code and Steve McConnell's Code Complete regarding this.
There is a similar thread on Stack Overflow: see When a method has too many parameters?
There really is not a standard number of parameters.
You can use any number of arguments in a method in Java; as far as I know there is no standard limit. IMO, as a practice you should not have more than 4 arguments per method, but that is a convention, not a standard: you can have any number of arguments.
There's no hard limit, but I'd say more than five is a code smell in a language that has no keyword arguments (such as Java).

How can I code Java to allow SSE use and bounds-check elimination (or other advanced optimizations)?

The Situation:
I'm optimizing a pure-java implementation of the LZF compression algorithm, which involves a lot of byte[] access and basic int mathematics for hashing and comparison. Performance really matters, because the goal of the compression is to reduce I/O requirements. I am not posting code because it isn't cleaned up yet, and may be restructured heavily.
The Questions:
How can I write my code to allow it to JIT-compile to a form using faster SSE operations?
How can I structure it so the compiler can easily eliminate array bounds checks?
Are there any broad references about the relative speed of specific math operations (how many increments/decrements does it take to equal a normal add/subtract, how fast is shift-or vs. an array access)?
How can I work on optimizing branching -- is it better to have numerous conditional statements with short bodies, or a few long ones, or short ones with nested conditions?
With current 1.6 JVM, how many elements must be copied before System.arraycopy beats a copying loop?
What I've already done:
Before I get attacked for premature optimization: the basic algorithm is already excellent, but the Java implementation is less than 2/3 the speed of equivalent C. I've already replaced copying loops with System.arraycopy, worked on optimizing loops, and eliminated unneeded operations.
I make heavy use of bit twiddling and packing bytes into ints for performance, as well as shifting & masking.
For legal reasons, I can't look at implementations in similar libraries, and existing libraries have too restrictive license terms to use.
Requirements for a GOOD (accepted) answer:
Unacceptable answers: "this is faster" without an explanation of how much AND why, OR hasn't been tested with a JIT compiler.
Borderline answers: have not been tested with anything before Hotspot 1.4
Basic answers: will provide a general rule and explanation of why it is faster at the compiler level, and roughly how much faster
Good answers: include a couple of samples of code to demonstrate
Excellent answers: have benchmarks with both JRE 1.5 and 1.6
PERFECT answer: Is by someone who worked on the HotSpot compiler, and can fully explain or reference the conditions for an optimization to be used, and how much faster it typically is. Might include java code and sample assembly code generated by HotSpot.
Also: if anyone has links detailing the guts of HotSpot optimization and branching performance, those are welcome. I know enough about bytecode that a site analyzing performance at the bytecode rather than source-code level would be helpful.
(Edit) Partial Answer: Bounds-Check Elimination:
This is taken from the supplied link to the HotSpot internals wiki at: https://wikis.oracle.com/display/HotSpotInternals/RangeCheckElimination
HotSpot will eliminate bounds checks in for loops that meet all of the following conditions (a short sketch follows the examples below):
Array is loop invariant (not reallocated within the loop)
Index variable has a constant stride (increases/decreases by constant amount, in only one spot if possible)
Array is indexed by a linear function of the variable.
Example: int val = array[index*2 + 5]
OR: int val = array[index+9]
NOT: int val = array[Math.min(var,index)+7]
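A minimal sketch of a loop written to satisfy those conditions (the method and variable names are invented); the array reference and the stride are loop-invariant and the access is linear in the index, so the optimization described above can typically apply:
// Sums data[offset], data[offset+1], ..., data[offset+len-1].
// The array reference never changes inside the loop, the index has a constant
// stride of 1, and the access is a linear function of i, so HotSpot can
// usually prove the range once and drop the per-iteration bounds checks.
static int sum(final int[] data, final int offset, final int len) {
    int total = 0;
    for (int i = 0; i < len; i++) {
        total += data[offset + i];   // linear in i: eligible for check elimination
    }
    return total;
}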
Early version of code:
This is a sample version. Do not steal it, because it is an unreleased version of code for the H2 database project. The final version will be open source. This is an optimization upon the code here: H2 CompressLZF code
Logically, this is identical to the development version, but that one uses a for(...) loop to step through the input and an if/else block for the different logic between literal and back-reference modes. This version reduces array accesses and checks between modes.
public int compressNewer(final byte[] in, final int inLen, final byte[] out, int outPos) {
    int inPos = 0;
    // initialize the hash table
    if (cachedHashTable == null) {
        cachedHashTable = new int[HASH_SIZE];
    } else {
        System.arraycopy(EMPTY, 0, cachedHashTable, 0, HASH_SIZE);
    }
    int[] hashTab = cachedHashTable;
    // number of literals in current run
    int literals = 0;
    int future = first(in, inPos);
    final int endPos = inLen - 4;
    // Loop through data until all of it has been compressed
    while (inPos < endPos) {
        future = (future << 8) | in[inPos + 2] & 255;
        // hash = next(hash,in,inPos);
        int off = hash(future);
        // ref = possible index of matching group in data
        int ref = hashTab[off];
        hashTab[off] = inPos;
        off = inPos - ref - 1; // dropped for speed
        // has match if bytes at ref match bytes in future, etc
        // note: using ref++ rather than ref+1, ref+2, etc is about 15% faster
        boolean hasMatch = (ref > 0 && off <= MAX_OFF && (in[ref++] == (byte) (future >> 16) && in[ref++] == (byte) (future >> 8) && in[ref] == (byte) future));
        ref -= 2; // ...EVEN when I have to recover it
        // write out literals, if max literals reached, OR has a match
        if ((hasMatch && literals != 0) || (literals == MAX_LITERAL)) {
            out[outPos++] = (byte) (literals - 1);
            System.arraycopy(in, inPos - literals, out, outPos, literals);
            outPos += literals;
            literals = 0;
        }
        // literal copying split because this improved performance by 5%
        if (hasMatch) { // grow match as much as possible
            int maxLen = inLen - inPos - 2;
            maxLen = maxLen > MAX_REF ? MAX_REF : maxLen;
            int len = 3;
            // grow match length as possible...
            while (len < maxLen && in[ref + len] == in[inPos + len]) {
                len++;
            }
            len -= 2;
            // short matches write length to first byte, longer write to 2nd too
            if (len < 7) {
                out[outPos++] = (byte) ((off >> 8) + (len << 5));
            } else {
                out[outPos++] = (byte) ((off >> 8) + (7 << 5));
                out[outPos++] = (byte) (len - 7);
            }
            out[outPos++] = (byte) off;
            inPos += len;
            // OPTIMIZATION: don't store hashtable entry for last byte of match and next byte
            // rebuild neighborhood for hashing, but don't store location for this 3-byte group
            // improves compress performance by ~10% or more, sacrificing ~2% compression...
            future = ((in[inPos + 1] & 255) << 16) | ((in[inPos + 2] & 255) << 8) | (in[inPos + 3] & 255);
            inPos += 2;
        } else { // grow literals
            literals++;
            inPos++;
        }
    }
    // write out remaining literals
    literals += inLen - inPos;
    inPos = inLen - literals;
    if (literals >= MAX_LITERAL) {
        out[outPos++] = (byte) (MAX_LITERAL - 1);
        System.arraycopy(in, inPos, out, outPos, MAX_LITERAL);
        outPos += MAX_LITERAL;
        inPos += MAX_LITERAL;
        literals -= MAX_LITERAL;
    }
    if (literals != 0) {
        out[outPos++] = (byte) (literals - 1);
        System.arraycopy(in, inPos, out, outPos, literals);
        outPos += literals;
    }
    return outPos;
}
Final edit:
I've marked the best answer so far as accepted, since the deadline is nearly up. Since I took so long before deciding to post code, I will continue to upvote and respond to comments where possible. Apologies if the code is messy: this represents code in development, not code polished up for committing.
Not a full answer; I simply don't have time to do the detailed benchmarks your question needs, but hopefully this is useful.
Know your enemy
You are targeting a combination of the JVM (in essence the JIT) and the underlying CPU/Memory subsystem. Thus "This is faster on JVM X" is not likely to be valid in all cases as you move into more aggressive optimisations.
If your target market/application will largely run on a particular architecture you should consider investing in tools specific to it.
* If your performance on x86 is the critical factor, then Intel's VTune is excellent for drilling down into the sort of JIT output analysis you describe.
* The differences between 64-bit and 32-bit JITs can be considerable, especially on x86 platforms where calling conventions can change and enregistering opportunities are very different.
Get the right tools
You would likely want to get a sampling profiler. The overhead of instrumentation (and the associated knock-on effects on things like inlining, cache pollution and code size inflation) would be far too great for your specific needs. The Intel VTune analyser can actually be used for Java, though the integration is not as tight as others.
If you are using the Sun JVM and are happy only knowing what the latest/greatest version is doing, then the options available to investigate the output of the JIT are considerable, if you know a bit of assembly.
This article details some interesting analysis using this functionality
Learn from other implementations
The change history indicates that previous inline assembly was in fact counterproductive and that allowing the compiler to take total control of the output (with tweaks in code rather than directives in assembly) yielded better results.
Some specifics
Since LZF, in an efficient unmanaged implementation on modern desktop CPUs, is largely memory-bandwidth limited (hence it being compared to the speed of an unoptimised memcpy), you will need your code to remain entirely within the level 1 cache.
As such, any static fields you cannot make into constants should be placed within the same class, as these values will often be placed within the same area of memory devoted to the vtables and metadata associated with classes.
Object allocations which cannot be trapped by Escape Analysis (only in 1.6 onwards) will need to be avoided.
The C code makes aggressive use of loop unrolling. However, the performance of this on older (1.4-era) VMs is heavily dependent on the mode the JVM is in. Apparently later Sun JVM versions are more aggressive at inlining and unrolling, especially in server mode.
The prefetch instructions generated by the JIT can make all the difference on code like this, which is close to memory bound.
"It's coming straight for us"
Your target is moving, and will continue to move. Again, from Marc Lehmann's previous experience:
default HLOG size is now 15 (cpu caches have increased)
Even minor updates to the JVM can involve significant compiler changes:
6544668 Don't vecorized array operations that can't be aligned at runtime.
6536652 Implement some superword (SIMD) optimizations
6531696 don't use immediate 16-bits value store to memory on Intel cpus
6468290 Divide and allocate out of eden on a per cpu basis
Captain Obvious
Measure, measure, measure. If you can get your library to include (in a separate dll) a simple and easy-to-execute benchmark that logs the relevant information (VM version, CPU, OS, command-line switches, etc.) and makes it simple to send back to you, you will increase your coverage; best of all, you'll cover the people using it who care.
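A hedged sketch of the kind of environment logging that paragraph suggests bundling with such a benchmark (the property names are standard ones; the class name and formatting are invented):
import java.lang.management.ManagementFactory;

public class EnvReport {
    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.vm.name = " + System.getProperty("java.vm.name"));
        System.out.println("os.name/arch = " + System.getProperty("os.name")
                + " / " + System.getProperty("os.arch"));
        System.out.println("cpus         = " + Runtime.getRuntime().availableProcessors());
        // The JVM flags the benchmark was started with:
        System.out.println("jvm args     = "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());
        // ... run the compression benchmark here and print its numbers ...
    }
}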
As far as bounds-check elimination is concerned, I believe the new JDK will already include an improved algorithm that eliminates it whenever possible. These are the two main papers on this subject:
V. Mikheev, S. Fedoseev, V. Sukharev, N. Lipsky. 2002
Effective Enhancement of Loop Versioning in Java. Link. This paper is from the guys at Excelsior, who implemented the technique in their Jet JVM.
Würthinger, Thomas, Christian Wimmer, and Hanspeter Mössenböck. 2007. Array Bounds Check Elimination for the Java HotSpot Client Compiler. PPPJ. Link. Slightly based on the above paper, this is the implementation that I believe will be included in the next JDK. The achieved speedups are also presented.
There is also this blog entry, which discusses one of the papers superficially, and also presents some benchmarking results, not only for arrays but also for arithmetic in the new JDK. The comments of the blog entry are also very interesting, since the authors of the above papers present some very interesting comments and discuss arguments. Also, there are some pointers to other similar blog posts on this subject.
Hope it helps.
It's rather unlikely that you need to help the JIT compiler much with optimizing a straightforward number-crunching algorithm like LZF. ShuggyCoUk mentioned this, but I think it deserves extra attention:
The cache-friendliness of your code will be a big factor.
You have to reduce the size of your working set and improve data access locality as much as possible. You mention "packing bytes into ints for performance". This sounds like using ints to hold byte values in order to have them word-aligned. Don't do that! The increased data set size will outweigh any gains (I once converted some ECC number-crunching code from int[] to byte[] and got a 2x speed-up).
On the off chance that you don't know this: if you need to treat some data as both bytes and ints, you don't have to shift and mask it by hand - use ByteBuffer.asIntBuffer() and related methods.
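A minimal sketch of that suggestion (the data and class name are invented); the byte-order call is shown explicitly because the view's endianness follows the ByteBuffer's current order:
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

public class ViewDemo {
    public static void main(String[] args) {
        byte[] data = {0, 0, 0, 1, 0, 0, 0, 2};          // two big-endian ints
        ByteBuffer bytes = ByteBuffer.wrap(data).order(ByteOrder.BIG_ENDIAN);
        IntBuffer ints = bytes.asIntBuffer();            // int view over the same bytes
        System.out.println(ints.get(0) + ", " + ints.get(1));  // prints 1, 2
        // Or read a single int at an arbitrary byte offset without shifting/masking:
        System.out.println(bytes.getInt(4));             // prints 2
    }
}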
With current 1.6 JVM, how many elements must be copied before System.arraycopy beats a copying loop?
Better to do the benchmark yourself. When I did it, way back in Java 1.3 times, it was somewhere around 2000 elements.
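If you do benchmark it yourself, a rough sketch might look like the following (no JMH-style rigor, so treat the numbers with care; the array size and iteration count are arbitrary and meant to be varied to find the crossover):
public class CopyBench {
    public static void main(String[] args) {
        int size = 2000;                      // vary this to find the crossover point
        int[] src = new int[size];
        int[] dst = new int[size];
        int iters = 200000;
        // warm-up so the JIT compiles both paths before timing
        for (int i = 0; i < iters; i++) { loopCopy(src, dst); System.arraycopy(src, 0, dst, 0, size); }

        long t0 = System.nanoTime();
        for (int i = 0; i < iters; i++) loopCopy(src, dst);
        long t1 = System.nanoTime();
        for (int i = 0; i < iters; i++) System.arraycopy(src, 0, dst, 0, size);
        long t2 = System.nanoTime();
        System.out.println("loop:      " + (t1 - t0) / 1e6 + " ms");
        System.out.println("arraycopy: " + (t2 - t1) / 1e6 + " ms");
    }

    static void loopCopy(int[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) dst[i] = src[i];
    }
}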
Lots of answers so far, but a couple of additional things:
Measure, measure, measure. As much as most Java developers warn against micro-benchmarking, make sure you compare performance between changes. Optimizations that do not result in measurable improvements are generally not worth keeping (of course, sometimes it's a combination of things, and that gets trickier).
Tight loops matter as much with Java as with C (and ditto with variable allocations -- although you don't directly control it, HotSpot will eventually have to do it). I managed to practically double the speed of UTF-8 decoding by rearranging the tight loop so that the common single-byte case (7-bit ASCII) is handled in a tight(er) inner loop, leaving the other cases out (see the sketch below).
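A hedged sketch of that kind of restructuring (not the actual decoder; the names are invented and the multi-byte path is trivialised so the example stays self-contained): the inner loop handles only the 7-bit case and falls out as soon as it sees a non-ASCII byte.
// A sketch of the loop restructuring only, not a real UTF-8 decoder: the inner
// loop copies runs of 7-bit ASCII; anything else drops to a placeholder fallback.
static int decode(byte[] in, char[] out) {
    int i = 0, o = 0;
    while (i < in.length) {
        byte b = in[i];
        // tight(er) inner loop for the common single-byte case
        while (b >= 0) {
            out[o++] = (char) b;
            if (++i == in.length) return o;
            b = in[i];
        }
        // the general multi-byte path would go here; trivialised as a placeholder
        out[o++] = '?';
        i++;
    }
    return o;
}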
Do not underestimate the cost of allocating and/or clearing large arrays -- if you want lzf encoding/decoding to be faster for small/medium chunks too (not just megabyte-sized ones), keep in mind that ALL allocations of byte[]/int[] are somewhat costly; not because of GC, but because the JVM MUST clear the space.
The H2 implementation has also been optimized quite a bit (for example, it no longer clears the hash array; this often makes sense), and I actually helped modify it for use in another Java project. My contribution was mostly just changing it to be more optimal for the non-streaming case, but that did not touch the tight encode/decode loops.
