I am developing a relatively fast-paced game (Flash client, Apache MINA server back end) and I am having some difficulty getting an accurate benchmark of the bandwidth my current setup would use.
My question is: how do I get an accurate benchmark of the bandwidth required for my tests? What I am doing now doesn't take any overhead into account.
In the message sent/received methods I am doing:
[out/in]Bandwidth+= message.toString().getBytes().length;
I then print out the current values every 250 milliseconds (since that is how frequently "world" updates are currently done).
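Roughly, the counting and printing look like this (a simplified sketch; the class and field names are not my real ones, and the real increments happen inside the MINA message handlers):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class BandwidthMonitor {
    private final AtomicLong inBandwidth = new AtomicLong();
    private final AtomicLong outBandwidth = new AtomicLong();

    // called from messageReceived / messageSent
    public void countIn(Object message)  { inBandwidth.addAndGet(message.toString().getBytes().length); }
    public void countOut(Object message) { outBandwidth.addAndGet(message.toString().getBytes().length); }

    public void startPrinting() {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(
                () -> System.out.println("In bandwidth: " + inBandwidth.get()
                        + ", Outgoing: " + outBandwidth.get()),
                250, 250, TimeUnit.MILLISECONDS); // matches the 250 ms world-update tick
    }
}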
With 10 "monsters" all randomly moving around and 1 player randomly moving around I am getting this output.. (1 second window here)
In bandwidth: 1647, Outgoing: 35378
In bandwidth: 1658, Outgoing: 35585
In bandwidth: 1669, Outgoing: 35792
In bandwidth: 1680, Outgoing: 35999
So going strictly by the size of the (outgoing) messages being passed, that works out to about 621 bytes/second, or (621/10) 62.1 bytes per second per constantly moving on-screen item per player. This seems a little low; a good high-speed connection could handle 1000+ object updates per second at this "rate" with no problem.
Something definitely smells fishy here. According to the performance testing they provide (here), MINA is capable of 20K+ 405-byte requests per second on ~10 connections, which is way more than what you're seeing.
My guess is that there is some kind of threading/timing issue going on here that is causing the delay. I would enlist the help of a packet-tracing application such as Wireshark and see if your observations in code mesh with the raw network data. I would also try "flooding" the server side with more data if possible; this might provide some insight into where the issue lies.
I hope this helps, good luck.
Related
We have a legacy multithreaded Java process on RHEL 6.5 which is very time critical (low latency), and it processes hundreds of thousands of messages a day. It runs on a powerful Linux machine with 40 CPUs. What we found is that the process has high latency while it processes the first 50k messages, averaging 10 ms/msg; after this 'warmup' period the latency starts to drop, becoming about 7 ms, then 5 ms, and eventually settling at about 3-4 ms per message by the end of the day.
This puzzles me, and one possibility that I can think of is that maps are being resized at the beginning until they reach a very big capacity, after which they simply never exceed the load factor again. From what I see, the maps are not initialized with an initial capacity, which is why I suspect that may be the case. I tried putting it through a profiler and pumping millions of messages in, hoping to see some 'resize' method from the Java collections, but I was unable to find any. It could be that I am searching for the wrong things or looking in the wrong direction. As a new joiner whose predecessor has left the team, I am trying to see if there are other reasons that I haven't thought of.
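To illustrate what I mean by pre-sizing, here is a minimal sketch (the sizes and names are made up, not from our code):

import java.util.HashMap;
import java.util.Map;

public class PresizeExample {
    public static void main(String[] args) {
        int expectedEntries = 500_000; // made-up figure for a day's worth of messages

        // A default HashMap starts with capacity 16 and rehashes every time
        // size exceeds capacity * loadFactor (0.75 by default).
        Map<String, Object> defaultSized = new HashMap<>();

        // Pre-sizing so that expectedEntries never exceeds capacity * 0.75
        // avoids all of those intermediate rehashes.
        Map<String, Object> presized = new HashMap<>((int) (expectedEntries / 0.75f) + 1);

        // Populate both and compare timings (System.nanoTime() or a profiler)
        // to see whether rehashing really contributes to the warmup cost.
    }
}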
Another possibility that I can think of is something related to kernel settings, but I am unsure what it could be.
I don't think it is a programming-logic issue, because it runs at an acceptable speed after the first 30k-50k messages.
Any suggestion?
It sounds like it takes some time for the operating system to realize that your application is a big resource consumer. Only after a while does it see that there is a lot of activity around your application's files, and only then does it deal with that activity by populating the cache and taking similar actions.
I have a Java SOAP web service initially designed in Axis 1 which isn't meeting my performance requirements.
The request I'm most concerned about is one used to add lots (millions of rows) of data to the database. On the client side, I'll just be looping through files, pushing this data up to my web service. Each row has three elements, so the request looks something like:
<SOAP Envelope/Header/Body>
<AddData>
<Data>
<FirstName>John</FirstName>
<LastName>Smith</LastName>
<Age>42</Age>
</Data>
</AddData>
</SOAP Envelope/Body>
I'm finding the following performance trends:
When I do one row per request, I can get around 720 rows per minute.
When I encapsulate multiple rows into a single request, I can get up to 2,400 rows per minute (100 rows per request).
Unfortunately, that performance isn't going to meet our requirements, as we have hundreds of millions of rows to insert (at 2,500 rows per minute, it would take about 2 months to load all the data in).
So I've been looking into the application to see where our bottleneck is. Each request of 100 rows is taking about 2.5 seconds (I've tried a few different servers and get similar results). I've found the following:
Client-side overhead is negligible (from monitoring the performance of my own client and using SOAP UI)
The database activity only accounts for about 10% (0.2 s) of the total time, so Hibernate caching, etc. won't help out much.
The network overhead is negligible (<1ms ping time from client to server, getting >10MB/s throughput with each request sending <20KB).
So this leaves some 2 seconds unaccounted for. The only other piece of this puzzle that I can point a finger at is the overhead of deserializing the incoming requests on the server side. I noticed that Axis 2 claims speed improvements in this area, so I ported this function over to an Axis 2 web service but didn't get the speedup I was looking for (the overall time per request improved by about 10%).
Am I underestimating the amount of time needed to deserialize 100 of the elements described above? I can't imagine that deserialization could possibly take ~2 seconds.
What can I do to optimize the performance of this web application and cut down on that 2 second overhead?
Thanks in advance!
========= The next day.... ===========
The plot thickens...
At the recommendation of @millhouse, I investigated single-row requests on a production server a bit more. I found that they could be suitably quick on good hardware. So I tried adding 1,000 rows using batch sizes ranging from 1 row per request (1,000 separate requests) to 1,000 rows per request (a single request).
1 row / Request - 14.5 seconds
3/req - 5.8s
5/req - 4.5s
6/req - 4.2s
7/req - 287s
25/req - 83s
100/req - 22.4s
1000/req - 4.4s
As you can see, the extra 2-second lag kicks in at 7 rows per request (approximately 2 extra seconds per request compared to 6 rows per request; 1,000 rows at 7 per request is about 143 requests, which accounts for the jump to ~287 s). I can reproduce this consistently. Larger batch sizes all had similar per-request overhead, but it became less noticeable when inserting 1,000 rows per request. Database time grew linearly and was still fairly negligible compared to the overall request time.
So I've found that I get best performance using either 6 rows per request, or thousands of rows per request.
Is there any reason why 7 rows per request would perform so much worse than 6? The machine has 8 cores, and we have 10 connections in the session pool (i.e. I have no idea where the threshold of 6 is coming from).
I used Axis2 for a similar job about 5 years ago, but I'm afraid I can't offer any real "magic bullet" that will make it better. I do recall our service performing at hundreds-per-second, not seconds-per-hundred, though.
I'd recommend either profiling your request-handling, or simply adding copious amounts of logging (possibly using one of the many stopwatch implementations around to give detailed timings) and seeing what's using the time. Does a request really take 2 seconds to get through the Axis layer to your code, or is it just accumulating through lots of smaller things?
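Something as small as the stopwatch below is usually enough (the names here are hypothetical, and plain System.out stands in for whatever logging framework you already use):

public final class Stopwatch {
    private final String label;
    private long last = System.nanoTime();

    public Stopwatch(String label) {
        this.label = label;
    }

    // Print the time since the previous lap and reset the lap timer.
    public void lap(String stage) {
        long now = System.nanoTime();
        System.out.printf("%s: %s took %d ms%n", label, stage, (now - last) / 1_000_000);
        last = now;
    }
}

// Usage inside the request handler, roughly:
//   Stopwatch sw = new Stopwatch("AddData");
//   List<Row> rows = deserialize(request);   sw.lap("deserialize");
//   persist(rows);                           sw.lap("persist");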
If the processing for a single request in isolation is fast, but things get bogged down once you start loading the service up, investigate your app server's thread settings. I seem to recall having to break my processing into synchronous and asynchronous parts (i.e. the synchronous part doing the bare minimum to give a suitable response back to the client, and heavy-lifting being done in a thread from a pool), but that might not be appropriate for your situation.
Also make sure that construction of a new User object (or whatever it is) doesn't do anything too expensive (like grabbing a new ID, from a service, which wraps a DAO, which hits a slow database server, which runs a badly-written stored-procedure, which locks an entire table ;-) )
I just ran the ZeroMQ hello-world example and timed the request-response latency. It averaged about 0.1 ms using the IPC transport. This sounds quite slow to me... does this sound about right?
long start = System.nanoTime();
socket.send(request, 0);                                      // send the request
byte[] reply = socket.recv(0);                                // get the reply
System.out.println((System.nanoTime() - start) / 1000000.0);  // round trip in milliseconds
I assume your average was taken over more than one sample? I would run the test for at least 2-10 seconds before taking an average. The average latency measured in the same process/thread may be misleading.
I would create a second process which echoes everything it gets, if you are not doing this already. (And divide the latency by two, unless you want the RTT latency.)
Plain sockets can get an RTT latency of 20 microseconds on a typical multi-core box, and I would expect IPC to be faster. On a fast PC you can get a typical RTT latency of 9 microseconds using sockets.
If you want latency much lower than this, I would consider doing everything in one process or one thread if you can, in which case the cost of a method call is around 10 ns (if it's not inlined ;)
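To make that concrete, here is roughly how I would measure it (this reuses the socket.send/socket.recv calls from your snippet and assumes the other end simply echoes the request back; the payload size and iteration counts are arbitrary):

byte[] request = new byte[64];              // arbitrary payload size
int warmup = 20_000;
int measured = 100_000;

for (int i = 0; i < warmup; i++) {          // let the JVM and ZeroMQ warm up first
    socket.send(request, 0);
    socket.recv(0);
}

long start = System.nanoTime();
for (int i = 0; i < measured; i++) {
    socket.send(request, 0);
    socket.recv(0);
}
long elapsed = System.nanoTime() - start;
double oneWayMicros = elapsed / 2.0 / measured / 1000.0;  // halve the RTT for one-way latency
System.out.printf("average one-way latency: %.1f us%n", oneWayMicros);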
I'm writing a Java server (java.net.Socket, java.net.ServerSocket, java.io.ObjectOutputStream, java.io.ObjectInputStream) and I know I'm going to have limited bandwidth allocated for it.
I've written a decorator object for my output and input streams so I can count how many bytes go through them for profiling purposes. But this won't give me any indication of the amount of overhead the connection itself adds.
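A stripped-down sketch of that kind of decorator, for context (not my exact code, just the idea; note that it can only ever see payload bytes, never headers or ACKs):

import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicLong;

public class CountingOutputStream extends FilterOutputStream {
    private final AtomicLong byteCount = new AtomicLong();

    public CountingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        byteCount.incrementAndGet();
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len); // bypass FilterOutputStream's byte-at-a-time default
        byteCount.addAndGet(len);
    }

    public long getByteCount() {
        return byteCount.get();
    }
}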
I don't anticipate it will be much, but I'd like to prepare for it. I'm not going to try to optimize it; I just want to know how much it will be, for logistical reasons (how much bandwidth must I request, etc.).
I can't be the first person to try to get this information, but I can't seem to find good resources on the overhead of Java sockets and TCP/IP in general. (Perhaps that's because there's nothing noteworthy to find... If we're on the order of KB per minute, it's really not much of a concern, but I'd still like to know!)
Thanks!
This question is challenging to answer with the information we have right now... for instance, what are you calling 'overhead'? Is it only TCP ACK packets, or all packet overhead (for instance Ethernet, IP, and TCP headers) for anything other than your data payload?
How many connections per minute? What is the average data transfer, per connection? If there are many very short-lived connections, your overhead requirements go up (due to 3-way handshake, and connection close requirements)... you could also have high overhead if the clients don't read much data, but many clients keep the connections open for days at a time.
Honestly, you're 50x better off modeling this in a lab and making some assumptions about hit rate per minute and concurrent clients... that will give you some ballpark numbers. Play around with limiting the bandwidth afforded to the application to the maximum your budget would allow... then start backing off... you can throttle bandwidth by using WANem on a dual-port Linux machine.
Getting lab results like this is far better than theoretical calculations.
HTH,
\mike (who spends all day testing network gear)
TCP overhead varies based on a number of factors, but is typically around 5% at full capacity.
Basically each "packet" has 20 bytes of IP header (and 20 more if IPv6) plus 20-32 bytes of TCP header. Packet sizes vary based on the network devices and conditions, but are often in the neighborhood of 1500 bytes.
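As a rough back-of-the-envelope check (assuming full 1500-byte packets with a 20-byte IPv4 header and a 20-byte TCP header, no options): 40 / 1500 means roughly 2.7% of each data packet is spent on headers; add the ACK traffic flowing the other way (roughly one ~40-54 byte ACK per one or two data segments) plus the occasional retransmission, and landing somewhere around 5% at full capacity is a reasonable rule of thumb.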
This page has some detail: http://sd.wareonearth.com/~phil/net/overhead/
In my opinion you can completely ignore keep-alives, as they are only used when the connection is idle anyway.
I need to be able to monitor the speed of my internal network using Java. I was thinking I could use a two-part system with a server and a client. I do not need the response time such as what is generated by ping, but an actual speed in Mbps for upload and download.
My idea would be to have the server send a packet or series of packets to the client, which then replies, and the server would then calculate the speed of the network between those two points. Does anyone have any idea how I could implement this?
Thank You ahead of time.
Hmm, an interesting problem. I hope you like reading... :-)
I'd be interested to know how the monitoring tool would be used. At work, the sysadmins just have a couple of large screens in the room, showing a webpage containing loads of network stats, with it constantly updating.
The rest of my description assumes the network monitoring tool would be used as described above. If you just want to be able to do an ad-hoc test between two random hosts on your network, I'd just use rsync to transfer a reasonably large file (about 1-2 MB). I'm sure there are other file transfer tools that calculate the transfer speed too.
When implementing this (especially within a large network), you must minimise the risk that the test floods the network, hampering the people (or programs) actually using it. You don't want to be blamed for a massive slowdown (or worse, an outage) just because you were conducting a test. Your sysadmins won't thank you...
I'd architect the tool in the following way:
Bob is a server which participates in an individual 'test' by doing the following (a rough sketch of Bob appears after these steps):
Bob receives a request from a client. The request states how much data the client is about to send.
If the amount of data proposed to be sent is not too large, wait for the data. Otherwise Bob rejects the request immediately and ends the communication.
Once the required number of bytes has been received, reply with the amount of time it took to receive it all. Bob terminates the communication.
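A minimal sketch of Bob might look something like this (assuming TCP, a 4-byte length prefix as the 'request', and a single-threaded accept loop; the names, port and limit are made up):

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class Bob {
    private static final int MAX_BYTES = 1_000_000; // reject anything bigger than this

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(9999)) {
            while (true) {
                try (Socket client = server.accept()) {
                    DataInputStream in = new DataInputStream(client.getInputStream());
                    DataOutputStream out = new DataOutputStream(client.getOutputStream());

                    int announced = in.readInt();          // how much Alice says she will send
                    if (announced <= 0 || announced > MAX_BYTES) {
                        out.writeLong(-1);                 // reject and end the communication
                        continue;
                    }

                    byte[] buffer = new byte[8192];
                    long received = 0;
                    long startNanos = System.nanoTime();
                    while (received < announced) {
                        int n = in.read(buffer, 0, (int) Math.min(buffer.length, announced - received));
                        if (n < 0) break;                  // Alice gave up early
                        received += n;
                    }
                    out.writeLong(System.nanoTime() - startNanos); // reply with the elapsed time
                }
            }
        }
    }
}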
Alice is the component that displays the result of the measurements taken (via a webpage or otherwise). Alice is a long-lived process (maybe a web server), configured to periodically connect to a list of Bob servers. For each configured Bob:
Send Bob a request with the amount of data Alice is about to send.
Send Bob the specified amount of data, as fast as possible.
Await the reply from Bob, and compute the network speed.
'Display' the result for this instance of Bob. You may choose to display an aggregate result; for example, the average result of the last 20 tests, to iron out any anomalies...
When conducting a given test, Alice should report any failures, e.g. 'a TCP connection could not be established with Bob', or 'Bob prematurely terminated the transfer', or whatever else...
Scatter Bob servers to strategic locations in your (possibly large) network, and configure Alice to go to them. For each instance of Bob, you should configure:
The time interval in between tests.
The 'leeway' (I'll explain this in a bit).
The amount of data to send to Bob for each test.
Bob's address (duh).
You want to 'stagger' the tests that a given Alice will attempt. You don't want Alice to trigger the test to all Bob servers at once, thereby flooding your network, possibly giving skewed results and so forth. Allow the test to occur at a randomised time in the future. For example, if the test interval is every 10 minutes, configure a 'leeway' of 1 minute, meaning the next test might occur anywhere between 9 and 11 minutes' time.
If there is to be more than one Alice running at a time, the total number of instances should be small. The more Alices you have, the more you interfere with the network. Again, you don't want to be responsible for an outage.
The amount of data Alice should send in an individual test should be small. 500 KB? You probably want a given test to run for no more than 10 seconds. Maybe get Bob to time out if the test takes too long.
I've deliberately omitted the transport to use (TCP, UDP, whatever) because you'll get issues depending on the transport, and I don't know how you want to handle those issues. For example, you'd have to consider how to handle dropped datagrams with UDP. What result would you compute? You don't get this issue with TCP, because it automatically retransmits dropped packets. With TCP, your throughput will be artificially low if the two endpoints are far away from each other. Here's some info on it.
If you had the patience to read this far, I hope it helped!
Rather than writing a server, you might want to just use Tomcat or Apache as the server; then you just have the client upload a file of a specific size and measure the time, then turn around and download the file to measure the download speed.
You could write your own server to do this, but you would basically be redoing what has been done many times before, and you would also need to ensure your server isn't skewing the numbers.
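A rough sketch of that client-side measurement (the URLs are placeholders; it assumes the server accepts a POST at /upload and serves a file of known size at /download):

import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class SpeedTestClient {
    public static void main(String[] args) throws Exception {
        byte[] payload = new byte[5 * 1024 * 1024];  // 5 MB of dummy data

        // Upload: time how long it takes to push the payload to the server.
        HttpURLConnection up = (HttpURLConnection) new URL("http://server/upload").openConnection();
        up.setDoOutput(true);
        up.setFixedLengthStreamingMode(payload.length);
        long start = System.nanoTime();
        try (OutputStream out = up.getOutputStream()) {
            out.write(payload);
        }
        up.getResponseCode();                        // wait for the server to finish reading
        double upSecs = (System.nanoTime() - start) / 1e9;
        System.out.printf("upload:   %.2f Mbps%n", payload.length * 8 / upSecs / 1e6);

        // Download: time how long it takes to read a similarly sized file back.
        HttpURLConnection down = (HttpURLConnection) new URL("http://server/download").openConnection();
        start = System.nanoTime();
        long received = 0;
        try (InputStream in = down.getInputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) > 0) received += n;
        }
        double downSecs = (System.nanoTime() - start) / 1e9;
        System.out.printf("download: %.2f Mbps%n", received * 8 / downSecs / 1e6);
    }
}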