Performance measurement in distributed system (Java + C#) - java

I have a distributed system with a server that is written in Java and a Client that is written in C#. I want to do some performance measurements on that system, e.g. "How long does one roundtrip take?" or "What is the time between sending the request and calculating the response value?".
My first attempt was to print timestamps at specific situations on server and client and later calculate the differences between the two timestamps to get the duration.
BUT... when I ran the test, I noticed that sometimes the difference between two timestamps (A-B) was negative, even though timestamp B was created before timestamp A (one of the timestamps was created on the server, the other on the client). So I guess that the way the timestamps are created is not exactly the same.
Does anybody have an idea how I can do that correctly?
I hope that I could explain my problem. If there are any questions, please let me know in the comments.
EDIT: further information about the setup
The C#-client is running on my local PC. The Java-server is running on a virtual machine, which itself is located on my PC.

Related

Get time diff between windows and linux machine using java or any other API

I have a code running on Windows and Linux machine.
Message will be sent from windows to Linux machine and we include sent_time on message.
Linux machine will receive the message and we stamp receive_time on message.
Both the time are calculated using standard java API System.currentTimeMillis().
Problem here is, receive_time is ahead of sent_time which is not right as message was sent first and then received. This mostly looks to be clock sync issue between Windows and Linux machine.
How can we get the clock difference between the Windows and Linux machines? (I'm not sure how to run a program on both machines and calculate the time difference in milliseconds.)
If there is a sync issue, how can we get the delta and add it to receive_time?
Please note that all the libraries are same on both machines.
You are trying to find time deviations, maybe even synchronize clocks across the network. This adds complexity because neither of the two machines knows how much lag the network adds while passing the data.
The NTP protocol mitigates all this. I'd use an existing implementation. But if you do not want to synchronize the system clock but still understand how it is done, check out the protocol documentation: http://www.ntp.org/rfc.html
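For readers who only want to understand the idea, here is a minimal sketch of the offset calculation that NTP-style protocols use. It is not an NTP implementation: the class and method names are made up for illustration, and it assumes the network delay is roughly the same in both directions. The client notes t0 when it sends a probe, the server notes t1 on receipt and t2 when it replies, and the client notes t3 when the reply arrives.

```java
/** Illustrative sketch of an NTP-style offset estimate; names are hypothetical. */
final class ClockOffset {

    /**
     * t0 = client send time (client clock), t1 = server receive time (server clock),
     * t2 = server reply time (server clock), t3 = client receive time (client clock).
     * Returns how far the server clock is ahead of the client clock,
     * assuming the network delay is symmetric.
     */
    static long offsetMillis(long t0, long t1, long t2, long t3) {
        return ((t1 - t0) + (t2 - t3)) / 2;
    }

    /** Time spent on the network for the round trip, excluding server processing. */
    static long roundTripDelayMillis(long t0, long t1, long t2, long t3) {
        return (t3 - t0) - (t2 - t1);
    }

    public static void main(String[] args) {
        // Made-up sample values: the server clock runs about 490 ms ahead.
        long t0 = 1000, t1 = 1500, t2 = 1510, t3 = 1030;
        long offset = offsetMillis(t0, t1, t2, t3);
        long delay = roundTripDelayMillis(t0, t1, t2, t3);
        System.out.println("offset=" + offset + " ms, delay=" + delay + " ms");
        // Translate a server-side timestamp into client-clock time.
        System.out.println("t1 on client clock=" + (t1 - offset));
    }
}
```

Subtracting the estimated offset from a server timestamp puts it on the client's time scale, so the differences should no longer come out negative. Repeating the probe several times and keeping the sample with the smallest delay usually gives a steadier estimate.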

Significant runtime differences performing an PHP-Script from Android (WiFi/GPRS)

I'm quite sure that I'm overlooking something very fundamental, but I can't understand the following case:
On the one hand I have my Android app, which I can run over WiFi or GPRS.
On the other hand I have a PHP script on an independent server which executes a relatively complex algorithm but returns just a few bytes (about 80).
The Android app connects to the server and triggers the PHP script with a normal URLConnection, fetches the InputStream and processes it.
Now, when I do that on WiFi, it's quite fast. But if I use GPRS/EDGE, it's about 10-20 times slower.
That's what I do not understand... I would understand such a difference if the script returned a lot of bytes that had to be transferred, but these are just a few bytes.
I would have thought that the runtime of the PHP script on the server is fully independent of who is calling it, and that it delivers the result equally fast every time.
May somebody tell me where these performance differences might arise from?
greetz
rob
Most likely your script isn't running slower; check your server logs to confirm. What you're seeing is the high latency of cellular data connections. Basically, it takes a long time for requests to get from your phone to the tower. It isn't causing your PHP to run slower, it's just taking a while for the data to travel to your server and back.

How to improve the performance of a stock data transfer application?

This is a problem I have worked on for several years, but I still don't have a good solution.
My application has two parts:
The first one runs on a server called the "ROOT server". It receives the realtime stock data from HKEx (the securities and futures exchange in Hong Kong) and broadcasts it to 5 child servers. It appends a timestamp to each data item when broadcasting.
The second part runs on the "children" servers. They receive the stock data from the ROOT server, parse each item, and extract the important information. Finally, they send it in a new text format to the clients. There may be hundreds to thousands of clients; they can register for particular stocks and get realtime information about them.
Performance is the most important thing. Over the past several years I have tried every solution I know to make it faster. "Faster" here means that the first application receives the data and sends it to the children servers as fast as it can, and the children servers receive, parse and send the data to the clients as fast as they can.
For now, when the data rate from HKEx is 200K and there are 5 children servers, the first application has an average latency of 10ms per data item. The second one is not easy to test; it depends on the number of clients.
What I'm using:
OpenSUSE 10
Sun Java 5.0
Mina 2.0
The server hardware:
4-core CPU (I don't know the type)
4G ram
I'm considering how to improve the performance:
Do I need to use a concurrency framework such as Akka?
Try another language, e.g. Scala? C++?
Use a real-time Java system?
Your advice...
Need your help!
Update:
The applications log some important information for analysis, but I haven't found any bottlenecks. HKEx will provide more data next year, and I don't think my application will be fast enough.
One of my customers has tested our application against another company's, and ours had no advantage in speed. I just want to find a way to make it faster.
How the first application runs
The first application receives the stock data from HKEx and broadcasts it to several other servers. The steps are:
It connects to HKEx
logs in
reads the data. The data is in binary format: each item has a header, a 2-byte integer giving the length of the body, followed by the body, then the next item.
puts the items into a hashmap in memory. The key is the sequence number of the item, the value is the byte array.
logs the sequence number of each received item to disk, using log4j's buffered appender.
a daemon thread reads the data from the hashmap and inserts it into PostgreSQL every minute. (This is just used to back up the data.)
when clients connect to this server, it accepts them and tries to send all the data from the in-memory hashmap. I use a thread pool in Mina; the acceptor and the senders are in different threads.
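The length-prefixed format in the read step can be sketched as below. This is only an illustration, not the application's actual code: the class name is made up, and it assumes the 2-byte length is unsigned and big-endian, which the description above does not state.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Illustrative parser for "2-byte length, then body" items; names are hypothetical. */
final class FrameParser {

    /**
     * Extracts every complete item from buf (which must be ready for reading)
     * and leaves any trailing partial item unread for the next call.
     */
    static List<byte[]> readItems(ByteBuffer buf) {
        List<byte[]> items = new ArrayList<byte[]>();
        while (buf.remaining() >= 2) {
            buf.mark();
            // Assumption: unsigned length in ByteBuffer's default big-endian order.
            int length = buf.getShort() & 0xFFFF;
            if (buf.remaining() < length) {
                buf.reset(); // body not fully received yet; rewind to the header
                break;
            }
            byte[] body = new byte[length];
            buf.get(body);
            items.add(body);
        }
        return items;
    }

    public static void main(String[] args) {
        // "abc" is complete; the second item announces 2 bytes but only 1 has arrived.
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {0, 3, 'a', 'b', 'c', 0, 2, 'x'});
        List<byte[]> items = readItems(buf);
        System.out.println(items.size() + " complete item(s), "
                + buf.remaining() + " byte(s) left over");
    }
}
```

Leaving the partial item in the buffer means the caller can compact it and append the next network read without copying the complete items twice.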
I think the logic is very simple. With 5 clients, I measured a transfer speed of only 1.5M/s at most. I wrote the simplest possible socket program in Java and found it can reach 10M/s.
Actually, I've spent more than 1 year trying all kinds of solutions on this application, just to make it faster. That's why I feel desperate. Do I need to try a language other than Java?
About the 10ms latency
When the application receives a data item from HKEx, it records a timestamp for it. When the root server broadcasts the data to the children servers, it appends the timestamp to the data.
When a child server gets the data, it sends a message to the root server to get the current timestamp, then compares the two.
So, the 10ms latency contains:
root server got the data ---> the child server got the data
child server sends a request for the root server's timestamp ---> root server got it
But the 2nd one is so small that we can ignore it.
The first thing to do to find performance bottlenecks is to find out where most of the time is spent. A way to determine this is to use a profiler.
There are open source profilers available such as http://www.eclipse.org/tptp/, or commercial profilers such as Yourkit Java Profiler.
One easy thing to do could be to upgrade the JVM to Java SE6 or Java 7. General JVM performance improved a lot at version 6. See the Java SE 6 Performance White Paper for more details.
If you have checked everything, and found no obvious performance optimizations, you may need to change the architecture to get better performance. This would obviously be most fruitful if you could at least identify where your application is spending time - sounds like there are several major components:
The HK Ex server (out of your control)
The network between the Exchange and your system
The "root" server
The network between the "root" and the "child" servers
The "child" servers
The network between "child" servers and the client
The clients
To know where to spend your time, money and energy, I'd at least want to see an analysis of those components, how long each component takes (min, max, avg), and what the specification is of each resource.
Easiest thing to change is hardware - bigger servers, more memory etc., or better bandwidth. Can you see if any of those resources are constrained?
Next thing to look at is to change the communication protocol to be more efficient - how do clients receive the stocks? Can you reduce data size? 1.5M for only 5 clients sounds a lot...
Next, you might look at some kind of quality of service solution - provide dedicated hardware for "premium" customers, with reduced resource contention, more servers, more bandwidth - this will probably require changes to the architecture.
Next, you could consider changing the architecture - right now, your clients "pull" data from the child servers. You could, instead, "push" data out - that way, you shave off the polling interval on the client end.
At the very end of the list, I'd consider a different technology stack; Java is a fine programming language, but if absolute performance is a key priority, C/C++ is still faster. Clearly, that's a huge change, and a well-written Java app will be faster than a poorly written C/C++ app (and far more stable).
To trace the source of the delay I would add timing data to your end to end process. You can do this using an external log, or by adding meta data to your messages.
What you want is a timestamp at key stages in your application; 3-5 is enough to start with. Normally I would use System.nanoTime() because I am looking for microsecond delays, but in your case System.currentTimeMillis() is likely to be enough, especially if you average over many samples (you will still get 0.1 ms accuracy on an average, with Ubuntu).
Compare the timestamps for the same message as it passes through your system and look for the highest average delay. Once you have found it, try breaking that interval into more stages to zoom in on the problem.
I would analyse any stage which has an average delay over 1 ms for your situation.
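A rough sketch of that bookkeeping is below. The class, method and stage names are invented for illustration. Note that System.nanoTime() values are only comparable inside a single JVM, so this form suits the stages within one process; across machines you would record System.currentTimeMillis() in the message instead and rely on synchronized clocks.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative per-stage delay accumulator; names are hypothetical. */
final class StageTimer {
    // stage name -> {total elapsed nanoseconds, number of samples}
    private final Map<String, long[]> totals = new LinkedHashMap<String, long[]>();

    /** Adds one sample for a stage, given its start and end timestamps in nanoseconds. */
    void record(String stage, long startNanos, long endNanos) {
        long[] t = totals.get(stage);
        if (t == null) {
            t = new long[2];
            totals.put(stage, t);
        }
        t[0] += endNanos - startNanos;
        t[1]++;
    }

    /** Average delay of a stage in milliseconds, or 0.0 if it has no samples. */
    double averageMillis(String stage) {
        long[] t = totals.get(stage);
        return (t == null || t[1] == 0) ? 0.0 : (t[0] / (double) t[1]) / 1000000.0;
    }

    /** Stage with the highest average delay, or null if nothing was recorded. */
    String slowestStage() {
        String slowest = null;
        double worst = -1.0;
        for (String stage : totals.keySet()) {
            double avg = averageMillis(stage);
            if (avg > worst) {
                worst = avg;
                slowest = stage;
            }
        }
        return slowest;
    }

    public static void main(String[] args) {
        StageTimer timer = new StageTimer();
        long start = System.nanoTime();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10000; i++) sb.append(i); // stand-in for real work
        timer.record("parse", start, System.nanoTime());
        System.out.println("slowest stage: " + timer.slowestStage());
    }
}
```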
If clients are updating every minute, there might not be a good technical reason to do this, but you don't want to be seen as being slow and your traders at a disadvantage, even if in reality it won't make a difference.

Long-running stats process - thoughts on language choice?

I am on a LAMP stack for a website I am managing. There is a need to roll up usage statistics (a variety of things related to our desktop product).
I initially tackled the problem with PHP (being that I had a bunch of classes to work with the data already). All worked well on my dev box which was using 5.3.
Long story short, 5.1 memory management seems to suck a lot worse, and I've had to do a lot of fooling to get the long-term roll up scripts to run in a fixed memory space. Our server guys are unwilling to upgrade PHP at this time. I've since moved my dev server back to 5.1 so I don't run into this problem again.
For mining of MySQL databases to roll up statistics for different periods and resolutions, potentially running a process that does this all the time in the future (as opposed to on a cron schedule), what language choice do you recommend? I was looking at Python (I know it more or less), Java (don't know it that well), or sticking it out with PHP (know it quite well).
Edit: design clarification for commenter
Resolutions: The way the rollup script works currently, is I have some classes for defining resolutions and buckets. I have year, month, week, day -- given a "bucket number" each class gives a start and end timestamp that defines the time range for that bucket -- this is based on arbitrary epoch date. The system maintains "complete" records, ie it will complete its rolled up dataset for each resolution since the last time it was run, currently.
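To make the bucket idea concrete, here is a small sketch in Java (one of the languages under consideration). It is not the existing PHP code: the class name and fields are invented, and it only covers fixed-width resolutions such as day or week. Month and year buckets vary in length and would need calendar arithmetic instead.

```java
/** Illustrative fixed-width bucket resolution; names and epoch are hypothetical. */
final class FixedBucket {
    private final long epochSeconds; // arbitrary epoch the buckets are counted from
    private final long widthSeconds; // bucket width, e.g. 86400 for a day

    FixedBucket(long epochSeconds, long widthSeconds) {
        this.epochSeconds = epochSeconds;
        this.widthSeconds = widthSeconds;
    }

    /** Inclusive start timestamp of the given bucket number. */
    long startOf(long bucketNumber) {
        return epochSeconds + bucketNumber * widthSeconds;
    }

    /** Exclusive end timestamp of the given bucket number. */
    long endOf(long bucketNumber) {
        return startOf(bucketNumber) + widthSeconds;
    }

    /** Bucket number containing the timestamp; floorDiv keeps pre-epoch times consistent. */
    long bucketOf(long timestampSeconds) {
        return Math.floorDiv(timestampSeconds - epochSeconds, widthSeconds);
    }

    public static void main(String[] args) {
        FixedBucket day = new FixedBucket(0L, 86400L);
        long bucket = day.bucketOf(200000L);
        System.out.println("bucket " + bucket + " covers ["
                + day.startOf(bucket) + ", " + day.endOf(bucket) + ")");
    }
}
```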
SQL Strat: The base stats are located in many dissimilar schemas and tables. I do individual queries for each rolled up stat for the most part, then fill one record for insert. You are suggesting nested subqueries such as:
INSERT into rolled_up_stats (someval, someval, someval, ... ) VALUES (SELECT SUM(somestat) from someschema, SELECT AVG(somestat2) from someschema2)
Those subqueries will generate temporary tables, right? My experience is that this has been slow as molasses in the past. Is it a better approach?
Edit 2: Adding some inline responses to the question
Language was a bottleneck in the case of 5.1 php -- I was essentially told I made the wrong language choice (though the scripts worked fine on 5.3). You mention python, which I am checking out for this task. To be clear, what I am doing is providing a management tool for usage statistics of a desktop product (the logs are actually written by an EJB server to mysql tables). I do apache log file analysis, as well as more custom web reporting on the web side, but this project is separate. The approach I've taken so far is aggregate tables. I'm not sure what these message queue products could do for me, I'll take a look.
To go a bit further -- the data is being used to chart activity over time at the service and the customer level, to allow management to understand how the product is being used. You might select a time period (April 1 to April 10) and retrieve a graph of total minutes of usage of a certain feature at different granularities (hours, days, months etc) depending on the time period selected. It's essentially an after-the-fact analysis of usage. The need seems to be tending towards real-time, however (look at the last hour of usage).
There are a lot of different approaches to this problem, some of which are mentioned here, but what you're doing with the data post-rollup is unclear...?
If you want to utilize this data to provide digg-like 'X diggs' buttons on your site, or summary graphs or something like that which needs to be available on some kind of ongoing basis, you can actually utilize memcache for this, and have your code keep the cache key for the particular statistic up to date by incrementing it at the appropriate times.
You could also keep aggregation tables in the database, which can work well for more complex reporting. In this case, depending on how much data you have and what your needs are, you might be able to get away with having an hourly table, and then just creating views based on that base table to represent days, weeks, etc.
If you have tons and tons of data, and you need aggregate tables, you should look into offloading statistics collection (and perhaps the database queries themselves) to a queue like RabbitMQ or ActiveMQ. On the other side of the queue put a consumer daemon that just sits and runs all the time, updating things in the database (and perhaps the cache) as needed.
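The shape of such a consumer daemon can be sketched in plain Java. This is only a stand-in: an in-process BlockingQueue replaces the real broker, an in-memory map replaces the database, and the class name, event names and stop marker are all invented for illustration. A real RabbitMQ or ActiveMQ consumer would use that broker's own client library.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/** Illustrative always-running consumer that rolls events up into counters. */
final class StatsConsumer implements Runnable {
    static final String STOP = "__stop__"; // hypothetical shutdown marker

    private final BlockingQueue<String> queue;
    // Stand-in for the aggregate table: event name -> count.
    private final Map<String, Long> counts = new ConcurrentHashMap<String, Long>();

    StatsConsumer(BlockingQueue<String> queue) {
        this.queue = queue;
    }

    /** Blocks on the queue and updates the counters until the stop marker arrives. */
    public void run() {
        try {
            while (true) {
                String event = queue.take();
                if (STOP.equals(event)) {
                    return;
                }
                Long old = counts.get(event);
                counts.put(event, old == null ? 1L : old + 1L);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and exit
        }
    }

    long countOf(String event) {
        Long c = counts.get(event);
        return c == null ? 0L : c;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<String>();
        StatsConsumer consumer = new StatsConsumer(queue);
        Thread worker = new Thread(consumer, "stats-consumer");
        worker.start();
        queue.put("login");
        queue.put("login");
        queue.put("export");
        queue.put(STOP);
        worker.join();
        System.out.println("login count: " + consumer.countOf("login"));
    }
}
```

Only the consumer thread writes to the map here, so the read-then-put update is safe; with several consumers it would need an atomic update instead.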
One thing you might also consider is your web server's logs. I've seen instances where I was able to get a somewhat large portion of the required statistics from the web server logs themselves after just minor tweaks to the log format rules in the config. You can roll the logs every , and then start processing them offline, recording the results in a reporting database.
I've done all of these things with Python (I released loghetti for dealing with Apache combined format logs, specifically), though I don't think language is a limiting factor or bottleneck here. Ruby, Perl, Java, Scala, or even awk (in some instances) would work.
I have worked on a project to do a similar thing in the past, so I have actual experience with performance. You would be hard pressed to beat the performance of "INSERT ... SELECT" (not "INSERT...VALUES (SELECT ...)"). Please see http://dev.mysql.com/doc/refman/5.1/en/insert-select.html
The advantage of doing that, especially if you keep the roll-up code in MySQL procedures, is that all you need from the outside is a cron job to poke the DB into performing the right roll-ups at the right times -- as simple as a shell script with 'mysql <correct DB arguments etc.> "CALL RollupProcedure"'
This way, you are guaranteeing yourself zero memory allocation bugs, as well as having decent performance when the MySQL DB is on a separate machine (no moving of data across machine boundary...)
EDIT: Hourly resolution is fine -- just run an hourly cron-job...
If you are running mostly SQL commands, why not just use MySQL etc on the command line? You could create a simple table that lists aggregate data then run a command like mysql -u[user] -p[pass] < commands.sql to pass SQL in from a file.
Or, split the work into smaller chunks and run them sequentially (as PHP files if that's easiest).
If you really need it to be a continual long-running process then a programming language like python or java would be better, since you can create a loop and keep it running indefinitely. PHP is not suited for that kind of thing. It would be pretty easy to convert any PHP classes to Java.

Stored proc running 30% slower through Java versus running directly on database

I'm using Java 1.6, JTDS 1.2.2 (also just tried 1.2.4 to no avail) and SQL Server 2005 to create a CallableStatement to run a stored procedure (with no parameters). I am seeing the Java wrapper running the same stored procedure 30% slower than using SQL Server Management Studio. I've run the MS SQL profiler and there is little difference in I/O between the two processes, so I don't think it's related to query plan caching.
The stored proc takes no arguments and returns no data. It uses a server-side cursor to calculate the values that are needed to populate a table.
I can't see how calling a stored proc from Java should add a 30% overhead; surely it's just a pipe to the database that SQL is sent down and then the database executes it... Could the database be giving the Java app a different query plan?
I've posted to both the MSDN forums and the sourceforge JTDS forums (topic: "stored proc slower in JTDS than direct in DB"). I was wondering if anyone has any suggestions as to why this might be happening?
Thanks in advance,
-James
(N.B. Fear not, I will collate any answers I get in other forums together here once I find the solution)
Java code snippet:
sLogger.info("Preparing call...");
stmt = mCon.prepareCall("SP_WB200_POPULATE_TABLE_limited_rows");
sLogger.info("Call prepared. Executing procedure...");
stmt.executeQuery();
sLogger.info("Procedure complete.");
I have run sql profiler, and found the following:
Java app :
CPU: 466,514 Reads: 142,478,387 Writes: 284,078 Duration: 983,796
SSMS :
CPU: 466,973 Reads: 142,440,401 Writes: 280,244 Duration: 769,851
(Both with DBCC DROPCLEANBUFFERS run prior to profiling, and both produce the correct number of rows)
So my conclusion is that they both execute the same reads and writes, it's just that the way they are doing it is different, what do you guys think?
It turns out that the query plans are significantly different for the different clients (the Java client is updating an index during an insert that isn't in the faster SQL client, also, the way it is executing joins is different (nested loops Vs. gather streams, nested loops Vs index scans, argh!)). Quite why this is, I don't know yet (I'll re-post when I do get to the bottom of it)
Epilogue
I couldn't get this to work properly. I tried homogenising the connection properties (arithabort, ansi_nulls etc) between the Java and Mgmt studio clients. In the end the two clients had very similar query/execution plans (but still with different actual plan_ids). I posted a summary of what I found to the MSDN SQL Server forums, as I found differing performance not just between a JDBC client and management studio, but also with Microsoft's own command line client, SQLCMD. I also checked some more radical things like network traffic, or wrapping the stored proc inside another stored proc, just for grins.
I have a feeling the problem lies somewhere in the way the cursor was being executed, and it was somehow giving rise to the Java process being suspended, but why a different client should give rise to this different locking/waiting behaviour when nothing else is running and the same execution plan is in operation is a little beyond my skills (I'm no DBA!).
As a result, I have decided that 4 days is enough of anyone's time to waste on something like this, so I will grudgingly code around it (if I'm honest, the stored procedure needed re-coding to be more incremental instead of re-calculating all data each week anyway), and chalk this one down to experience. I'll leave the question open. Big thanks to everyone who put their hat in the ring, it was all useful, and if anyone comes up with anything further, I'd love to hear some more options... and if anyone finds this post as a result of seeing this behaviour in their own environments, then hopefully there are some pointers here that you can try yourself, and hopefully see further than we did.
I'm ready for my weekend now!
-James
You can attach the Profiler and monitor for the events SQL:BatchCompleted and SP:Completed, with a filter on duration > 1000. Run the procedure from your Java client and from SSMS. Compare the Reads and the Writes of the two events (Java vs. SSMS). Are they significantly different? This would indicate considerably different execution paths or plans, with significant difference in I/O.
Also try to capture the Showplan XML event of the two and compare the plans (save the event as a .sqlplan file and open it in SSMS for easy analysis). Do they have similar plans? Are there wild differences in Estimate vs. Actual (rows, rewinds, rebinds)? Do they have the same degree of parallelism? The plans can also be retrieved from the sys.dm_exec_requests view.
Are there any warning events raised, like Missing Column Statistics, Sort Warnings, Hash Warning, Execution Warnings, Blocked Process?
The point is that you have at your disposal a whole arsenal of investigation tools. Once you find the root cause of the difference, you can trace it down to what is different between your Java environment settings and the SSMS environment (ADO.Net SqlClient). Things like default transaction isolation level, ANSI settings etc etc.
Checking: Is your problem that two applications (SSMS, Java) are making the exact same identical call to SQL Server, and SQL Server is acting differently for each? If so, I hit things like this every year or two, and they hurt my brain for days.
Once, I ultimately isolated each process call and logged everything for the entire process in Profiler. I eventually noticed that the Login event (under TextData) showed a host of information, like so:
-- network protocol: TCP/IP
set quoted_identifier on
set arithabort off
set numeric_roundabort off
set ansi_warnings on
set ansi_padding on
set ansi_nulls on
set concat_null_yields_null on
set cursor_close_on_commit off
set implicit_transactions off
set language us_english
set dateformat mdy
set datefirst 7
set transaction isolation level read committed
The "Existing Connection" event will show this information as well--but, sometimes immediately subsequent calls (batches, RPCs, I disremember just now) are sent [ISQL or OSQL did this, I think] to immediately reset some of these -- Arithabort and Quoted_Identifier seem to be favorites, and other SET options also get modified depending on the settings or requirements of whatever connectivity protocols your application's database interface is using.
Another one: some settings are kept as attributes of a procedure at "create" time, and others are factored in at compile time. On the one hand, your connection's SET values may be being overwritten by the configuration saved at the time the procedure was created; on the other hand, your two connections may differ so much that two execution plans are generated for one procedure. (All of this information is, after sufficient research, available in the sys. tables and DMVs.)
In short, it seems to me that SQL obscurities are messing you up. To this day, I loathe all these goombah settings. Things below my notice keep messing around with them [I mean, really, what fool would set implicit_transaction for a connection pool on? But once they did...] and it's hard to build structures when the ground (rules) keep changing out from underneath you. After all, remember what the guy said about building castles in a swamp...
I recall having a similar issue a while ago, because JTDS was silently converting a string parameter to Unicode or something similar. As a result of that conversion, SQL Server was unable to use the index which it was using when we ran the stored proc from SSMS.
HIH
Does the Java case include transmission of the results to the Java server (network overhead) plus some Java processing? A 12 minute query might produce quite a large amount of data.
If you are looking at the profiler and there is no difference between the executions then the difference must be with the client systems.
4 mins does seem too long just to prepare a statement to send, so the 12 min wait must be causing some other effect -- no idea what it is.
I am not sure if this post is still relevant. We faced a similar problem in our application.
One key difference between running a stored procedure in SQL Management studio and one running from JDBC is that of transaction context. If you are using an ORM in Java, by default the stored procedure runs in a transaction context. When you run a stored procedure directly in SQL management studio the transaction is off. There is a substantial performance difference.
Sorry, I've not found a correct answer to this, so I don't want to allocate any of these as correct, so I am going to mark this answer as correct, and wish anyone luck who comes across anything similar!
Did you know that Microsoft ship JDBC drivers for their databases?
These may be more performant.
Obviously.. you may have resolved the problem by now.
