How to find memory used by a function in java


I am calling a function from my Java code and I want to know how much memory that function uses. Keep in mind that I cannot add any code to the function I am calling.
For example:
//my code starts
.
.
.
.
myfunc();
//print memory used by myfunc() here
.
.
// my code ends
How to do this?

What you're trying to do is basically pointless. There is no such thing as memory used by a function. Your idea of comparing the total memory usage "before" and "after" the function call does not make sense: the function may change global state (which may decrease or increase total memory usage, and in some cases, e.g. filling a cache, you probably won't want to count the increase as "memory used by a function"), or the garbage collector may run while you're inside myfunc and decrease the total used memory.
A good question often contains a large part of its answer. What you should do is ask the question correctly.

I was successful with this code (it's not guaranteed to work, but try it and there's a good chance it'll give you what you need):
final Runtime rt = Runtime.getRuntime();
for (int i = 0; i < 3; i++) rt.gc();
final long startSize = rt.totalMemory()-rt.freeMemory();
myFunc();
for (int i = 0; i < 3; i++) rt.gc();
System.out.println("Used memory increased by " +
(rt.totalMemory()-rt.freeMemory()-startSize)); // parentheses are required: a String cannot be the left operand of '-'

But it's wrong, because ideally the result should always be zero after the subtraction.
Actually it's right, because normally each thread has a TLAB (Thread Local Allocation Buffer), which means it allocates memory in blocks (so that each thread can allocate concurrently).
Turn this off with -XX:-UseTLAB and you will see every byte allocated.
You should run this multiple times, because other things could be running when you use this function.
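As an alternative to comparing heap sizes, HotSpot-based JVMs expose a per-thread allocation counter through com.sun.management.ThreadMXBean. The sketch below is only a minimal illustration under that assumption: the class name AllocationProbe and the method allocatedBy are made up for this example, and the counter reports bytes allocated by the calling thread while the task ran, not the memory that is still retained afterwards.

```java
import com.sun.management.ThreadMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not part of the JDK): reports how many bytes the
// current thread allocated while running a task, on JVMs that support it.
final class AllocationProbe {

    // Returns the allocated byte count, or -1 if the JVM does not expose it.
    static long allocatedBy(Runnable task) {
        java.lang.management.ThreadMXBean standard = ManagementFactory.getThreadMXBean();
        if (!(standard instanceof ThreadMXBean)) {
            return -1; // not a HotSpot-style management bean
        }
        ThreadMXBean bean = (ThreadMXBean) standard;
        if (!bean.isThreadAllocatedMemorySupported()) {
            return -1;
        }
        if (!bean.isThreadAllocatedMemoryEnabled()) {
            bean.setThreadAllocatedMemoryEnabled(true);
        }
        long id = Thread.currentThread().getId();
        long before = bean.getThreadAllocatedBytes(id);
        task.run(); // stands in for myfunc(), which is left unmodified
        return bean.getThreadAllocatedBytes(id) - before;
    }

    public static void main(String[] args) {
        final byte[][] sink = new byte[1][];
        long bytes = allocatedBy(() -> sink[0] = new byte[1_000_000]);
        System.out.println("Allocated while running the task: " + bytes + " bytes");
    }
}
```

Allocations made by other threads that the function starts are not counted, so this is an approximation rather than a complete answer to "memory used by a function".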

Related

Confused on result of the implementation for testing object memory consumption in java

I use the implementation below to test the memory consumption of Java objects.
But it prints the usage as 104936 B when the i limit in the calling method's for loop is between 384 and 1694.
It prints the usage as 0 B when that limit is less than 384.
Why is this?
IDE : eclipse kepler
java : 1.5 (this acts differently when the java version is also changed)
OS : Ubuntu 12.10
public static void main(String[] args) {
Runtime runtime = Runtime.getRuntime();
long totalStart = runtime.totalMemory();
long start = runtime.freeMemory();
SampleTester sampleTester = new SampleTester();
sampleTester.callingMethod();
long totalEnd = runtime.totalMemory();
long end = runtime.freeMemory();
System.out.println("Usage [(("+totalEnd+"-"+end+") - ("+totalStart+"-"+start+"))] \t: " + ((totalEnd-end) - (totalStart-start)));
}
private void callingMethod(){
for(int i = 0; i < 1694; i++){
ArrayList<String> arrayList = new ArrayList<String>();
}
}

Each ArrayList you create is immediately eligible for GC when the current loop iteration ends.
So theoretically the JVM could allocate each ArrayList in the space formerly occupied by the one from the previous iteration and only need one "slot", which gets discarded when the last iteration ends and the method exits.
That explains the "0" value (this or a similar optimization).
In practice, the JVM doesn't always do that, but will leave eligible objects uncollected for some time in order to be more performant (because doing GC too often can be just as bad for performance).
When and how the JVM decides which optimization to apply is entirely implementation-defined, and knowing the details is rarely useful (because you can't depend on them and they can change with the next version).
So if you want to find out how much memory n ArrayList objects need, make sure that each one of them is actually reachable, and as such not eligible for garbage collection, when you check the currently used memory.
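A minimal sketch of that advice, assuming a helper class RetainedSize and method names that are invented for this example: every ArrayList is stored in a collection that stays reachable until after the second measurement. System.gc() is only a hint, so the result is an estimate that can vary from run to run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: estimates the heap cost of n ArrayList objects by
// keeping every one of them reachable while memory is measured.
final class RetainedSize {

    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Returns the approximate number of bytes per retained ArrayList.
    static long estimateBytesPerList(int n) {
        // The holder is allocated before the first measurement, so its
        // backing array is not counted in the difference.
        List<ArrayList<String>> keep = new ArrayList<>(n);
        System.gc(); // only a hint; the JVM may ignore it
        long before = usedHeap();
        for (int i = 0; i < n; i++) {
            keep.add(new ArrayList<String>());
        }
        System.gc();
        long after = usedHeap();
        // Using 'keep' after the measurement guarantees it is still live here.
        if (keep.size() != n) {
            throw new IllegalStateException("lists were lost");
        }
        return (after - before) / n;
    }

    public static void main(String[] args) {
        System.out.println("Approx. bytes per ArrayList: " + estimateBytesPerList(100_000));
    }
}
```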

Declaring multiple arrays with 64 elements 1000 times faster than declaring array of 65 elements

Recently I noticed declaring an array containing 64 elements is a lot faster (>1000 fold) than declaring the same type of array with 65 elements.
Here is the code I used to test this:
public class Tests{
public static void main(String args[]){
double start = System.nanoTime();
int job = 100000000;//100 million
for(int i = 0; i < job; i++){
double[] test = new double[64];
}
double end = System.nanoTime();
System.out.println("Total runtime = " + (end-start)/1000000 + " ms");
}
}
This runs in approximately 6 ms, if I replace new double[64] with new double[65] it takes approximately 7 seconds. This problem becomes exponentially more severe if the job is spread across more and more threads, which is where my problem originates from.
This problem also occurs with different types of arrays such as int[65] or String[65].
This problem does not occur with large strings: String test = "many characters";, but does start occurring when this is changed into String test = i + "";
I was wondering why this is the case and if it is possible to circumvent this problem.

You are observing behavior caused by the optimizations done by the JIT compiler of your Java VM. This behavior is reproducibly triggered with scalar arrays of up to 64 elements, and is not triggered with arrays larger than 64.
Before going into details, let's take a closer look at the body of the loop:
double[] test = new double[64];
The body has no effect (observable behavior). That means it makes no difference outside of the program execution whether this statement is executed or not. The same is true for the whole loop. So it might happen, that the code optimizer translates the loop to something (or nothing) with the same functional and different timing behavior.
For benchmarks you should at least adhere to the following two guidelines. If you had done so, the difference would have been significantly smaller.
Warm-up the JIT compiler (and optimizer) by executing the benchmark several times.
Use the result of every expression and print it at the end of the benchmark.
Now let's go into details. Not surprisingly, there is an optimization that is triggered for scalar arrays not larger than 64 elements. The optimization is part of escape analysis. It puts small objects and small arrays onto the stack instead of allocating them on the heap - or, even better, optimizes them away entirely. You can find some information about it in the following article by Brian Goetz, written in 2005:
Urban performance legends, revisited: Allocation is faster than you think, and getting faster
The optimization can be disabled with the command line option -XX:-DoEscapeAnalysis. The magic value 64 for scalar arrays can also be changed on the command line. If you execute your program as follows, there will be no difference between arrays with 64 and 65 elements:
java -XX:EliminateAllocationArraySizeLimit=65 Tests
Having said that, I strongly discourage using such command line options. I doubt that it makes a huge difference in a realistic application. I would only use it, if I would be absolutely convinced of the necessity - and not based on the results of some pseudo benchmarks.
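The two benchmark guidelines above (warm up the JIT, and use the result of every expression) can be sketched as follows. The class name AllocBench and the checksum scheme are assumptions made for this illustration; the JIT may still remove the allocation once the array no longer escapes, so the timings show only what one particular JVM does.

```java
// Hypothetical benchmark skeleton: warms up first, and folds every array
// into a checksum that is printed, so the loop has an observable result.
final class AllocBench {

    static long run(int size, int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            double[] test = new double[size];
            test[i % size] = i;                        // write something
            checksum += (long) test[i % size] + test.length; // and read it back
        }
        return checksum;
    }

    public static void main(String[] args) {
        int job = 10_000_000;
        long sink = 0;
        // Warm-up rounds: give the JIT a chance to compile run() first.
        for (int round = 0; round < 5; round++) {
            sink += run(64, job);
            sink += run(65, job);
        }
        for (int size : new int[] {64, 65}) {
            long start = System.nanoTime();
            sink += run(size, job);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("size " + size + ": " + elapsedMs + " ms");
        }
        System.out.println("checksum = " + sink); // use the result
    }
}
```

For anything beyond a quick experiment, a dedicated harness such as JMH handles warm-up and dead-code elimination more reliably than a hand-written loop.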

There are any number of ways that there can be a difference, based on the size of an object.
As nosid stated, the JITC may be (most likely is) allocating small "local" objects on the stack, and the size cutoff for "small" arrays may be at 64 elements.
Allocating on the stack is significantly faster than allocating in heap, and, more to the point, stack does not need to be garbage collected, so GC overhead is greatly reduced. (And for this test case GC overhead is likely 80-90% of the total execution time.)
Further, once the value is stack-allocated the JITC can perform "dead code elimination", determine that the result of the new is never used anywhere, and, after assuring there are no side-effects that would be lost, eliminate the entire new operation, and then the (now empty) loop itself.
Even if the JITC does not do stack allocation, it's entirely possible for objects smaller than a certain size to be allocated in a heap differently (eg, from a different "space") than larger objects. (Normally this would not produce quite so dramatic timing differences, though.)

Does OutputStream.write(buf, offset, size) have memory leak on Linux?

I write a piece of java code to create 500K small files (average 40K each) on CentOS. The original code is like this:
package MyTest;
import java.io.*;
public class SimpleWriter {
public static void main(String[] args) {
String dir = args[0];
int fileCount = Integer.parseInt(args[1]);
String content="##$% SDBSDGSDF ASGSDFFSAGDHFSDSAWE^#$^HNFSGQW%##&$%^J#%##^$#UHRGSDSDNDFE$T##$UERDFASGWQR!#%!#^$##YEGEQW%!#%!!GSDHWET!^";
StringBuilder sb = new StringBuilder();
int count = 40 * 1024 / content.length();
int remainder = (40 * 1024) % content.length();
for (int i=0; i < count; i++)
{
sb.append(content);
}
if (remainder > 0)
{
sb.append(content.substring(0, remainder));
}
byte[] buf = sb.toString().getBytes();
for (int j=0; j < fileCount; j++)
{
String path = String.format("%s%sTestFile_%d.txt", dir, File.separator, j);
try{
BufferedOutputStream fs = new BufferedOutputStream(new FileOutputStream(path));
fs.write(buf);
fs.close();
}
catch(FileNotFoundException fe)
{
System.out.printf("Hit filenot found exception %s", fe.getMessage());
}
catch(IOException ie)
{
System.out.printf("Hit IO exception %s", ie.getMessage());
}
}
}
}
You can run this by issuing the following command:
java -jar SimpleWriter.jar my_test_dir 500000
I thought this was simple code, but then I realized that it uses up to 14G of memory. I know that because when I used free -m to check the memory, the free memory kept dropping until my 15G memory VM had only 70 MB of free memory left. I compiled this using Eclipse, against JDK 1.6 and then JDK 1.7. The result is the same. The funny thing is that if I comment out fs.write() and just open and close the stream, the memory stabilizes at a certain point. Once I put fs.write() back, the memory allocation just goes wild. 500K 40KB files is about 20G. It seems Java's stream writer never deallocates its buffer during the operation.
I once thought the Java GC did not have time to clean up. But this makes no sense, since I close the file stream for every file. I even ported my code to C# and ran it under Windows: the same code producing 500K 40KB files kept memory stable at a certain point, not taking 14G as under CentOS. At least C#'s behavior is what I expected, but I could not believe Java performs this way. I asked my colleagues who are experienced in Java. They could not see anything wrong in the code, but could not explain why this happened. And they admitted nobody had tried to create 500K files in a loop without stopping.
I also searched online, and everybody says that the only thing you need to pay attention to is closing the stream, which I did.
Can anyone help me to figure out what's wrong?
Can anybody also try this and tell me what you see?
BTW, some people in this community tried the code on Windows and it seemed to work fine. I didn't try it on Windows; I only tried it on Linux, as I thought that is where people use Java. So it seems this issue happens on Linux.
I also did the following to limit the JVM heap, but it had no effect:
java -Xmx2048m -jar SimpleWriter.jar my_test_dir 500000

I tried to test your program on Win XP, JDK 1.7.25, and immediately got an OutOfMemoryError.
While debugging, with a file count (args[1]) of only 3000, look at the count variable in this code:
int count = 40 * 1024 * 1024 / content.length();
int remainder = (40 * 1024 * 1024) % content.length();
for (int i = 0; i < count; i++) {
sb.append(content);
}
count is 355449. So the String you are trying to create will be 355449 * content.length() characters long, or, as you calculated, 40 MB. I ran out of memory when i was 266587 and sb was 31457266 chars long, at which point each file I got was 30 MB.
The problem does not seem to be with memory or GC, but with the way you create the string.
Did you see files created, or was the memory eaten up before any file was created?
I think your main problem is the line:
int count = 40 * 1024 * 1024 / content.length();
should be:
int count = 40 * 1024 / content.length();
to create 40K, not 40Mb files.

[Edit2: The original answer is left in italics at the end of this post]
After your clarifications in the comments, I have run your code on a Windows machine (Java 1.6) and here are my findings (numbers are from VisualVM, OS memory as seen from Task Manager):
Example with 40K size, writing to 500K files (no parameters to JVM):
Used Heap: ~4M, Total Heap: 16M, OS memory: ~16M
Example with 40M size, writing to 500 files (parameters to JVM -Xms128m -Xmx512m. Without parameters I get an OutOfMemory error when creating StringBuilder):
Used Heap: ~265M, Heap size: ~365M, OS memory: ~365M
Especially from the second example you can see that my original explanation still stands. Yes, one would expect most of the memory to be freed, since the byte[] of the BufferedOutputStream resides in the first generation space (short-lived objects), but this a) does not happen immediately and b) when GC decides to kick in (it actually does in my case), it will try to clear memory, but it can clear as much memory as it sees fit, not necessarily all of it. GC does not provide any guarantees that you can count on.
So generally speaking, you should give the JVM as much memory as you feel comfortable with. If you need to keep memory usage low for special functionality, you should try a strategy like the code example in my original answer below, i.e. just don't create all those byte[] objects.
Now in your case with CentOS, it does seem that the JVM behaves strangely. Perhaps we could talk about a buggy or bad implementation. To classify it as a leak/bug, though, you should try to use -Xmx to restrict the heap. Also, please try what Peter Lawrey suggested: do not create the BufferedOutputStream at all (in the small file case), since you just write all the bytes at once.
If it still exceeds the memory limit, then you have encountered a leak and should probably file a bug. (You could still complain, though, and they may optimize it in the future.)
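The suggestion to skip BufferedOutputStream in the small-file case could look roughly like this. It is a sketch only: the class name DirectWriter and its methods are invented for this example, it assumes Java 7 or later for try-with-resources, and it writes into a temporary directory instead of the directory passed on the command line in the question.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical variant of the question's writer: each file receives one
// write() call on a plain FileOutputStream, so no 8K buffer is allocated per file.
final class DirectWriter {

    static void writeFiles(Path dir, byte[] buf, int fileCount) {
        for (int j = 0; j < fileCount; j++) {
            Path path = dir.resolve("TestFile_" + j + ".txt");
            // try-with-resources closes the stream even if write() fails
            try (FileOutputStream out = new FileOutputStream(path.toFile())) {
                out.write(buf);
            } catch (IOException e) {
                throw new UncheckedIOException("Failed to write " + path, e);
            }
        }
    }

    // Convenience for trying it out: writes into a fresh temporary directory.
    static Path writeToTempDir(byte[] buf, int fileCount) {
        try {
            Path dir = Files.createTempDirectory("simple-writer");
            writeFiles(dir, buf, fileCount);
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = writeToTempDir(new byte[40 * 1024], 10);
        System.out.println("Wrote 10 files of 40K each to " + dir);
    }
}
```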
[Edit1: The answer below assumed that the OP's code performed as many read operations as write operations, so the memory usage was justifiable. The OP clarified this is not the case, so his question is not answered.
"...my 15G memory VM..."
If you give the JVM that much memory, why should it try to run GC? As far as the JVM is concerned, it is allowed to get as much memory from the system as it likes and to run GC only when it thinks it is appropriate to do so.
Each instantiation of BufferedOutputStream allocates a buffer of 8K by default. The JVM will try to reclaim that memory only when it needs to. This is the expected behaviour.
Do not confuse the memory that you see as free from the system's point of view with the memory that is free from the JVM's point of view. As far as the system is concerned, the memory is allocated and will be released when the JVM shuts down. As far as the JVM is concerned, all the byte[] arrays allocated by BufferedOutputStream are not in use any more; they are "free" memory and will be reclaimed if needed.
If for some reason you don't desire this behaviour you could try the following: Extend the BufferedOutputStream class (e.g. create a ReusableBufferedOutputStream class) and add a new method e.g. reUseWithStream(OutputStream os). This method would then clear the internal byte[], flush and close the previous stream, reset any variables used and set the new stream. Your code then would become as below:
// initialize once
ReusableBufferedOutputStream fs = new ReusableBufferedOutputStream();
for (int i=0; i < fileCount; i ++)
{
String path = String.format("%s%sTestFile_%d.txt", dir, File.separator, i);
//set the new stream to be buffered and read
fs.reUseWithStream(new FileOutputStream(path));
fs.write(this._buf, 0, this._buf.length); // this._buf was allocated once, 40K long contain text
}
fs.close(); // Close the stream after we are done
Using the above approach you will avoid creating many byte[] objects. However, I don't see any problem with the expected behaviour, nor do you mention any problem other than "I see it takes too much memory". You have configured it to use that much, after all.]

OutOfMemoryErrors even after using WeakReference's for keys and values

Below is a small test I've coded to educate myself on the references API. I thought this would never throw an OOME, but it does. I am unable to figure out why. I'd appreciate any help on this.
public static void main(String[] args)
{
Map<WeakReference<Long>, WeakReference<Double>> weak = new HashMap<WeakReference<Long>, WeakReference<Double>>(500000, 1.0f);
ReferenceQueue<Long> keyRefQ = new ReferenceQueue<Long>();
ReferenceQueue<Double> valueRefQ = new ReferenceQueue<Double>();
int totalClearedKeys = 0;
int totalClearedValues = 0;
for (long putCount = 0; putCount <= Long.MAX_VALUE; putCount += 100000)
{
weak(weak, keyRefQ, valueRefQ, 100000);
totalClearedKeys += poll(keyRefQ);
totalClearedValues += poll(valueRefQ);
System.out.println("Total PUTs so far = " + putCount);
System.out.println("Total KEYs CLEARED so far = " + totalClearedKeys);
System.out.println("Total VALUESs CLEARED so far = " + totalClearedValues);
}
}
public static void weak(Map<WeakReference<Long>, WeakReference<Double>> m, ReferenceQueue<Long> keyRefQ,
ReferenceQueue<Double> valueRefQ, long limit)
{
for (long i = 1; i <= limit; i++)
{
m.put(new WeakReference<Long>(new Long(i), keyRefQ), new WeakReference<Double>(new Double(i), valueRefQ));
long heapFreeSize = Runtime.getRuntime().freeMemory();
if (i % 100000 == 0)
{
System.out.println(i);
System.out.println(heapFreeSize / 131072 + "MB");
System.out.println();
}
}
}
private static int poll(ReferenceQueue<?> keyRefQ)
{
Reference<?> poll = keyRefQ.poll();
int i = 0;
while (poll != null)
{
//
poll.clear();
poll = keyRefQ.poll();
i++;
}
return i;
}
}
And below is the log when run with 64MB of heap
Total PUTs so far = 0
Total KEYs CLEARED so far = 77982
Total VALUESs CLEARED so far = 77980
100000
24MB
Total PUTs so far = 100000
Total KEYs CLEARED so far = 134616
Total VALUESs CLEARED so far = 134614
100000
53MB
Total PUTs so far = 200000
Total KEYs CLEARED so far = 221489
Total VALUESs CLEARED so far = 221488
100000
157MB
Total PUTs so far = 300000
Total KEYs CLEARED so far = 366966
Total VALUESs CLEARED so far = 366966
100000
77MB
Total PUTs so far = 400000
Total KEYs CLEARED so far = 366968
Total VALUESs CLEARED so far = 366967
100000
129MB
Total PUTs so far = 500000
Total KEYs CLEARED so far = 533883
Total VALUESs CLEARED so far = 533881
100000
50MB
Total PUTs so far = 600000
Total KEYs CLEARED so far = 533886
Total VALUESs CLEARED so far = 533883
100000
6MB
Total PUTs so far = 700000
Total KEYs CLEARED so far = 775763
Total VALUESs CLEARED so far = 775762
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at Referencestest.weak(Referencestest.java:38)
at Referencestest.main(Referencestest.java:21)

from http://weblogs.java.net/blog/2006/05/04/understanding-weak-references
I think your use of HashMap is likely to be the issue. You might want to use WeakHashMap
To solve the "widget serial number" problem above, the easiest thing to do is use the built-in WeakHashMap class. WeakHashMap works exactly like HashMap, except that the keys (not the values!) are referred to using weak references. If a WeakHashMap key becomes garbage, its entry is removed automatically. This avoids the pitfalls I described and requires no changes other than the switch from HashMap to a WeakHashMap. If you're following the standard convention of referring to your maps via the Map interface, no other code needs to even be aware of the change.

The heart of the problem is probably that you're filling your heap with WeakReference objects. The weak references are cleared when you're getting low on memory, but the reference objects themselves are not, so your hashmap is filling up with a boatload of WeakReference objects (not to mention the object array the hashmap uses, which will grow indefinitely), all pointing to null.
The solution, as already pointed out, is a weak hashmap, which will clear out those objects if they're no longer in use (this is done during put).
EDIT:
As Kevin pointed out, you already have your reference-queue logic worked out (I didn't pay close enough attention). A solution using your code is to just remove the reference from the map at the point where the key has been collected. This is exactly how the weak hash map works (where the poll is simply triggered on insert).

Even when your weak references let go of the things they are referencing, they still do not get recycled themselves.
So eventually your hash will fill up with references to nothing and crash.
What you would need (if you wanted to do it this way) would be to have an event triggered by object deletion that went in and removed the reference from the hash. (which would cause threading issues you need to be aware of as well)
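A minimal sketch of that idea, polling the reference queue on the calling thread instead of using a separate event (which sidesteps the threading concern): the class PurgingWeakMap and its methods are hypothetical, values are held strongly here unlike in the question, and because WeakReference uses identity for equals/hashCode the class only illustrates purging, not key lookup.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical container: removes the WeakReference wrapper itself from the
// map once the garbage collector has enqueued it, so the map cannot fill up
// with references to nothing.
final class PurgingWeakMap<K, V> {
    private final Map<Reference<? extends K>, V> entries = new HashMap<>();
    private final ReferenceQueue<K> queue = new ReferenceQueue<>();

    // Stores the value under a weak reference to the key; purges stale entries first.
    WeakReference<K> put(K key, V value) {
        expungeStale();
        WeakReference<K> ref = new WeakReference<>(key, queue);
        entries.put(ref, value);
        return ref;
    }

    // Drains the queue and drops the matching map entries; returns how many were drained.
    int expungeStale() {
        int removed = 0;
        Reference<? extends K> ref;
        while ((ref = queue.poll()) != null) {
            entries.remove(ref); // identity-based: the same Reference object was the key
            removed++;
        }
        return removed;
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        PurgingWeakMap<Long, Double> map = new PurgingWeakMap<>();
        for (long i = 1; i <= 1_000_000; i++) {
            map.put(Long.valueOf(i + 1_000), Double.valueOf(i)); // keys are not retained
            if (i % 100_000 == 0) {
                System.out.println(i + " puts, entries currently in map: " + map.size());
            }
        }
    }
}
```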

I'm not a java expert at all, but I know in .NET when doing a lot of large object memory allocation you can get heap fragmentation to the point where only small pieces of contiguous memory are available for allocation even though much more memory appears as "free".
A quick google search on "java heap fragmentation" brings up some seemingly relevant result although I haven't taken a good look at them.

Others have correctly pointed out what the problem is; e.g. @roe, @Bill K.
But another way to solve this kind of problem (apart from scratching your head, asking on SO, etc.) is to look and see how the Sun recommended approach works. In this case, you can find it in the source code for the WeakHashMap class.
There are a few ways to find Java source code:
If you have a decent Java IDE running, it should be able to show you the source code of any class in the class library.
Most J2SE JDK downloads include source JAR files for (at least) the public API classes.
You can specifically download full source distributions for the OpenJDK-based releases of Java.
But the ZERO EFFORT approach is to do a Google search, using the fully qualified name of the class with ".java.html" tacked on the end. For example, searching for "java.util.WeakHashMap.java.html" gives this link in the first page of search results.
And the source will tell you that the standard WeakHashMap implementation explicitly polls its reference queue to expunge stale (i.e. broken) weak references from the map's key set. In fact, it does this every time you access or update the map, or even just ask for its size.

Another problem might be that Java, for some reason, doesn't always activate its garbage collector when running out of memory, so you might need to insert explicit calls to activate the collector. Try something like
if ((putCount % 1000) == 0)
Runtime.getRuntime().gc();
in your loop.
Edit: It seems that newer Java implementations from Sun do call the garbage collector before throwing OutOfMemoryError, but I am pretty sure that the following program would throw OutOfMemoryError with JRE 1.3 or 1.4
public class Test {
public static void main(String args[]) {
while(true) {
byte []data=new byte[1000000];
}
}
}

C++ and Java performance

this question is just speculative.
I have the following implementation in C++:
using namespace std;
void testvector(int x)
{
vector<string> v;
char aux[20];
int a = x * 2000;
int z = a + 2000;
string s("X-");
for (int i = a; i < z; i++)
{
sprintf(aux, "%d", i);
v.push_back(s + aux);
}
}
int main()
{
for (int i = 0; i < 10000; i++)
{
if (i % 1000 == 0) cout << i << endl;
testvector(i);
}
}
In my box, this program gets executed in approx. 12 seconds; amazingly, I have a similar implementation in Java [using String and ArrayList] and it runs lot faster than my C++ application (approx. 2 seconds).
I know the Java HotSpot performs a lot of optimizations when translating to native, but I think if such performance can be done in Java, it could be implemented in C++ too...
So, what do you think that should be modified in the program above or, I dunno, in the libraries used or in the memory allocator to reach similar performances in this stuff? (writing actual code of these things can be very long, so, discussing about it would be great)...
Thank you.

You have to be careful with performance tests because it's very easy to deceive yourself or not compare like with like.
However, I've seen similar results comparing C# with C++, and there are a number of well-known blog posts about the astonishment of native coders when confronted with this kind of evidence. Basically a good modern generational compacting GC is very much more optimised for lots of small allocations.
In C++'s default allocator, every block is treated the same, so all blocks are moderately expensive to allocate and free. In a generational GC, all blocks are very, very cheap to allocate (nearly as cheap as stack allocation), and if they turn out to be short-lived then they are also very cheap to clean up.
This is why the "fast performance" of C++ compared with more modern languages is - for the most part - mythical. You have to hand tune your C++ program out of all recognition before it can compete with the performance of an equivalent naively written C# or Java program.

All your program does is print the numbers 0..9000 in steps of 1000. The calls to testvector() do nothing and can be eliminated. I suspect that your JVM notices this, and is essentially optimising the whole function away.
You can achieve a similar effect in your C++ version by just commenting out the call to testvector()!

Well, this is a pretty useless test that only measures allocation of small objects.
That said, simple changes made me get the running time down from about 15 secs to about 4 secs. New version:
typedef vector<string, boost::pool_allocator<string> > str_vector;
void testvector(int x, str_vector::iterator it, str_vector::iterator end)
{
char aux[25] = "X-";
int a = x * 2000;
for (; it != end; ++a)
{
sprintf(aux+2, "%d", a);
*it++ = aux;
}
}
int main(int argc, char** argv)
{
str_vector v(2000);
for (int i = 0; i < 10000; i++)
{
if (i % 1000 == 0) cout << i << endl;
testvector(i, v.begin(), v.begin()+2000);
}
return 0;
}
real 0m4.089s
user 0m3.686s
sys 0m0.000s
Java version has the times:
real 0m2.923s
user 0m2.490s
sys 0m0.063s
(This is my direct java port of your original program, except it passes the ArrayList as a parameter to cut down on useless allocations).
So, to sum up, small allocations are faster on java, and memory management is a bit more hassle in C++. But we knew that already :)

Hotspot optimises hot spots in code. Typically, it tries to optimise anything that gets executed 10000 times.
For this code, after 5 iterations of the outer loop (5 x 2000 executions of the inner loop) it will try to optimise the inner loop that adds the strings to the vector. The optimisation will more than likely include escape analysis of the variables in the method. As the vector is a local variable and never escapes the local context, it is very likely that it will remove all of the code in the method and turn it into a no-op. To test this, try returning the results from the method. Even then, be careful to do something meaningful with the result: just getting its length, for example, can be optimised away, as Hotspot can see that the result is always the same as the number of iterations in the loop.
All of this points to the key benefit of a dynamic compiler like hotspot - using runtime analysis you can optimise what is actually being done at runtime and get rid of redundant code. After all, it doesn't matter how efficient your custom C++ memory allocator is - not executing any code is always going to be faster.
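The asker's Java port is not shown, so the following is only a guess at what it might look like, changed so that testVector returns its list and main does something with each result, as suggested above. The class name VectorTest and the choice to sum the length of the last element are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java port of the C++ test. The list is returned and its last
// element is folded into a printed total, so the work has an observable result.
final class VectorTest {

    static List<String> testVector(int x) {
        List<String> v = new ArrayList<>();
        int a = x * 2000;
        int z = a + 2000;
        for (int i = a; i < z; i++) {
            v.add("X-" + i);
        }
        return v;
    }

    public static void main(String[] args) {
        long totalChars = 0;
        long start = System.nanoTime();
        for (int i = 0; i < 10000; i++) {
            if (i % 1000 == 0) {
                System.out.println(i);
            }
            List<String> v = testVector(i);
            // Depends on the actual contents, not just on the element count.
            totalChars += v.get(v.size() - 1).length();
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("total chars = " + totalChars + ", elapsed = " + elapsedMs + " ms");
    }
}
```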

In my box, this program gets executed in approx. 12 seconds; amazingly, I have a similar implementation in Java [using String and ArrayList] and it runs lot faster than my C++ application (approx. 2 seconds).
I cannot reproduce that result.
To account for the optimization mentioned by Alex, I’ve modified the codes so that both the Java and the C++ code printed the last result of the v vector at the end of the testvector method.
Now, the C++ code (compiled with -O3) runs about as fast as yours (12 sec). The Java code (straightforward, uses ArrayList instead of Vector although I doubt that this would impact the performance, thanks to escape analysis) takes about twice that time.
I did not do a lot of testing so this result is by no means significant. It just shows how easy it is to get these tests completely wrong, and how little single tests can say about real performance.
Just for the record, the tests were run on the following configuration:
$ uname -ms
Darwin i386
$ java -version
java version "1.6.0_15"
Java(TM) SE Runtime Environment (build 1.6.0_15-b03-226)
Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-92, mixed mode)
$ g++ --version
i686-apple-darwin9-g++-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5490)

It should help if you use vector::reserve to reserve space for the 2000 elements added to v (z - a) before the loop (however, the same thing should also speed up the Java equivalent of this code).

To suggest why the performance of the C++ and Java versions differs, it would be essential to see the source for both. I can see a number of performance issues in the C++; for some of them it would be useful to know whether you were doing the same in the Java (e.g. flushing the output stream via std::endl: do you call System.out.flush() or just append a '\n'? If the latter, then you've just given the Java a distinct advantage).

What are you actually trying to measure here? Putting ints into a vector?
You can start by pre-allocating space in the vector, since its final size is known.
Instead of:
void testvector(int x)
{
vector<string> v;
char aux[20];
int a = x * 2000;
int z = a + 2000;
string s("X-");
for (int i = a; i < z; i++)
{
sprintf(aux, "%d", i);
v.push_back(s + aux);
}
}
try:
void testvector(int x)
{
char aux[20];
int a = x * 2000;
int z = a + 2000;
string s("X-");
vector<string> v;
v.reserve(z - a); // reserve capacity only; vector<string> v(z) would create z empty strings first
for (int i = a; i < z; i++)
{
sprintf(aux, "%d", i);
v.push_back(s + aux);
}
}

In your inner loop, you are pushing ints into a string vector. If you just single-step that at the machine-code level, I'll bet you find that a lot of that time goes into allocating and formatting the strings, and then some time goes into the pushback (not to mention deallocation when you release the vector).
This could easily vary between run-time-library implementations, based on the developer's sense of what people would reasonably want to do.
