Long-running stats process - thoughts on language choice?

Long-running stats process - thoughts on language choice? - java

I am on a LAMP stack for a website I am managing. There is a need to roll up usage statistics (a variety of things related to our desktop product).
I initially tackled the problem with PHP (being that I had a bunch of classes to work with the data already). All worked well on my dev box which was using 5.3.
Long story short, 5.1 memory management seems to suck a lot worse, and I've had to do a lot of fooling to get the long-term roll up scripts to run in a fixed memory space. Our server guys are unwilling to upgrade PHP at this time. I've since moved my dev server back to 5.1 so I don't run into this problem again.
For mining of MySQL databases to roll up statistics for different periods and resolutions, potentially running a process that does this all the time in the future (as opposed to on a cron schedule), what language choice do you recommend? I was looking at Python (I know it more or less), Java (don't know it that well), or sticking it out with PHP (know it quite well).
Edit: design clarification for commenter
Resolutions: The way the rollup script works currently, is I have some classes for defining resolutions and buckets. I have year, month, week, day -- given a "bucket number" each class gives a start and end timestamp that defines the time range for that bucket -- this is based on arbitrary epoch date. The system maintains "complete" records, ie it will complete its rolled up dataset for each resolution since the last time it was run, currently.
SQL Strat: The base stats are located in many dissimilar schemas and tables. I do individual queries for each rolled up stat for the most part, then fill one record for insert. Your are suggesting nested subqueries such as:
INSERT into rolled_up_stats (someval, someval, someval, ... ) VALUES (SELECT SUM(somestat) from someschema, SELECT AVG(somestat2) from someschema2)
Those subqueries will generate temporary tables, right? My experience is that had been slow as molasses in the past. Is it a better approach?
Edit 2: Adding some inline responses to the question
Language was a bottleneck in the case of 5.1 php -- I was essentially told I made the wrong language choice (though the scripts worked fine on 5.3). You mention python, which I am checking out for this task. To be clear, what I am doing is providing a management tool for usage statistics of a desktop product (the logs are actually written by an EJB server to mysql tables). I do apache log file analysis, as well as more custom web reporting on the web side, but this project is separate. The approach I've taken so far is aggregate tables. I'm not sure what these message queue products could do for me, I'll take a look.
To go a bit further -- the data is being used to chart activity over time at the service and the customer level, to allow management to understand how the product is being used. You might select a time period (April 1 to April 10) and retrieve a graph of total minutes of usage of a certain feature at different granularities (hours, days, months etc) depending on the time period selected. Its essentially an after-the-fact analysis of usage. The need seems to be tending towards real-time, however (look at the last hour of usage)

There are a lot of different approaches to this problem, some of which are mentioned here, but what you're doing with the data post-rollup is unclear...?
If you want to utilize this data to provide digg-like 'X diggs' buttons on your site, or summary graphs or something like that which needs to be available on some kind of ongoing basis, you can actually utilize memcache for this, and have your code keep the cache key for the particular statistic up to date by incrementing it at the appropriate times.
You could also keep aggregation tables in the database, which can work well for more complex reporting. In this case, depending on how much data you have and what your needs are, you might be able to get away with having an hourly table, and then just creating views based on that base table to represent days, weeks, etc.
If you have tons and tons of data, and you need aggregate tables, you should look into offloading statistics collection (and perhaps the database queries themselves) to a queue like RabbitMQ or ActiveMQ. On the other side of the queue put a consumer daemon that just sits and runs all the time, updating things in the database (and perhaps the cache) as needed.
One thing you might also consider is your web server's logs. I've seen instances where I was able to get a somewhat large portion of the required statistics from the web server logs themselves after just minor tweaks to the log format rules in the config. You can roll the logs every , and then start processing them offline, recording the results in a reporting database.
I've done all of these things with Python (I released loghetti for dealing with Apache combined format logs, specifically), though I don't think language is a limiting factor or bottleneck here. Ruby, Perl, Java, Scala, or even awk (in some instances) would work.

I have worked on a project to do a similar thing in the past, so I have actual experience with performance. You would be hard pressed to beat the performance of "INSERT ... SELECT" (not "INSERT...VALUES (SELECT ...)". Please see http://dev.mysql.com/doc/refman/5.1/en/insert-select.html
The advantage is that if you do that, especially if you keep the roll-up code in MySQL procedures, is that all you need from the outside is just a cron-job to poke the DB into performing the right roll-ups at the right times -- as simple as a shell-script with 'mysql <correct DB arguments etc.> "CALL RollupProcedure"'
This way, you are guaranteeing yourself zero memory allocation bugs, as well as having decent performance when the MySQL DB is on a separate machine (no moving of data across machine boundary...)
EDIT: Hourly resolution is fine -- just run an hourly cron-job...

If you are running mostly SQL commands, why not just use MySQL etc on the command line? You could create a simple table that lists aggregate data then run a command like mysql -u[user] -p[pass] < commands.sql to pass SQL in from a file.
Or, split the work into smaller chunks and run them sequentially (as PHP files if that's easiest).
If you really need it to be a continual long-running process then a programming language like python or java would be better, since you can create a loop and keep it running indefinitely. PHP is not suited for that kind of thing. It would be pretty easy to convert any PHP classes to Java.

Related

Is there a best practice to query on huge CSV in spring boot context?

I'm working for well known company in a project that should bring integration with other system that are producing one csv per hour of 27Gb. The target is query these files without import em (the main problem is bureaucracy, nobody want resposibility if some data change).
Main filters on this files can be done by dates, the end-user must insert a range start-end dates. After that can be filter by few strings.
Context: spring boot microservices
Server: xeon processor 24 core 256gb Ram
Filesystem: NFS mounted from external server
Test data: 1000 files, each one 1Gb
For performance improvement i'm indexing files by date writing on each file name the range that contains and making a folder structure like yyyy/mm/dd. For each of following test the first step was make a raw file paths list that will be read.
research will read all files
Spring batch - buffered reader and parse into object: 12,097 sec
Plain java - threadpool, buffered reader and parse into object: 10,882 sec
Linux egrep with regex and parallel ran from java and parse into object: 7,701 sec
The dirtiest is also fastes. I want avoid it because security department warned me about all checks to make on input data to prevent shell injection.
Googling i found mariadb CONNECT engine that can point also huge csvs, so now i'm going on this way creating temporary table with files that research have interest, the bad part is i have to do one table for each query since dates can be different.
For first year We're expecting not more than 5 parallel researches in same time, with an average of 3 weeks of range. This queries will be done asyncronousely.
Do you know something that can help me on it? Not only for the speed but a good practice to apply.
Thanks a lot folks.

To answer your question:
No. There are no best practices. And, AFAIK there are no generally applicable "good" practices.
But I do have some general advice. If you allow considerations such as bureaucracy and (to a lesser extent) security edicts to dictate your technical solutions, then you are liable to end up with substandard solutions; i.e. solutions that are slow or costly to run and keep running. (If "they" want it to be fast, then "they" shouldn't put impediments in your way.)
I don't think we can give you an easy solution to your problem, but I can say some things about your analysis.
You said about the grep solution.
"I want avoid it because security department warned me about all checks to make on input data to prevent shell injection."
The solution to that concern simple: don't use an intermediate shell. The dangerous injection attacks will be via shell trickery rather than grep. Java's ProcessBuilder doesn't use a shell unless you explicitly use one. The grep program itself can only read the files that are specified in its arguments, and write to standard output and standard error.
You said about the general architecture:
"The target is query these files without import them (the main problem is bureaucracy, nobody want responsibility if some data change)."
I don't understand the objection here. We know that the CSV files are going to change. You are getting a new 27GB CSV file every hour!
If the objection is that the format of the CSV files is going to change, well that affects your ability to effectively query them. But with a little ingenuity, you could detect the the change in format and adjust the ingestion process on the fly.
"We're expecting not more than 5 parallel researches in same time, with an average of 3 weeks of range."
If you haven't done this already, you need to do some analysis to see whether your proposed solution is going to be viable. Estimate how much CSV data needs to be scanned to satisfy a typical query. Multiply that by the number of queries to be performed in (say) 24 hours. Then compare that against your NFS server's ability to satisfy bulk reads. Then redo the calculation assuming a given number of queries running in parallel.
Consider what happens if your (above) expectations are wrong. You only need a couple of "idiot" users doing unreasonable things ...
Having a 24 core server for doing the queries is one thing, but the NFS server also needs to be able to supply the data fast enough. You can improve things with NFS tuning (e.g. by tuning block sizes, the number of NFS daemons, using FS-Cache) but the the ultimate bottlenecks will be getting the data off the NFS server's disks and across the network to your server. Bear in mind that there could be other servers "hammering" the NFS server while your application is doing its thing.

Getting statistics of the activity of my engine

One of my application is an engine that executes some complex calculations. These calculations may take several hours. I want to know the activity of this engine among time.
If you are using Hudson CI server, there is such a feature in Administration > Usages statistics option. Here is an example:
In my application, I already have a function that returns the number of calculations. So my idea is to periodically call this method, for example every 10 minutes (using Quartz) in order to retrieve the number of running calculations and store it in a int[]. Every day, this int[] is stored in an external file, and cleaned. So after some days, the content of this file will look like:
data.20101008=1;2;1;0;1;1;0;0;0;0;....;0
data.20101009=1;0;1;1;0;0;0;0;3;2;....;0
data.20101010=1;2;1;0;1;1;2;3;4;4;....;2
...
(I simplified a little this processing, as I want to update this file more frequently, in case of engine shutdown, or something like that)
To summarize, my idea is to store in a properties file the number of running calculations for a specific day. Eventually, I can provide a function that returns this data as a Map<Date, int[]>.
Finally, I will use JFreeChart to display these data in a graphical
What do you think of this idea? Any advice to enhance this process?
Note that this functionality is one a nice to have feature, so I don't want to build too complex systems.
I am using Java 6.

For simple ad-hoc monitoring by a human operator, you can simply use JMX. With JConsole (ships with the JDK) you can connect to a running java process and it will do nice graphs of CPU and memory usage, number of live threads, any many more. Using MBeans, you can expose current values over JMX, and use JConsole to chart them (while it is connected, of course).

Instead of a properties file and JFreeChart, maybe consider using a RRDTool Java implementation like RRD4J. From the RRDTool website:
RRDtool is the OpenSource industry standard, high performance data logging and graphing system for time series data.

Stored proc running 30% slower through Java versus running directly on database

I'm using Java 1.6, JTDS 1.2.2 (also just tried 1.2.4 to no avail) and SQL Server 2005 to create a CallableStatement to run a stored procedure (with no parameters). I am seeing the Java wrapper running the same stored procedure 30% slower than using SQL Server Management Studio. I've run the MS SQL profiler and there is little difference in I/O between the two processes, so I don't think it's related to query plan caching.
The stored proc takes no arguments and returns no data. It uses a server-side cursor to calculate the values that are needed to populate a table.
I can't see how the calling a stored proc from Java should add a 30% overhead, surely it's just a pipe to the database that SQL is sent down and then the database executes it....Could the database be giving the Java app a different query plan??
I've posted to both the MSDN forums, and the sourceforge JTDS forums (topic: "stored proc slower in JTDS than direct in DB") I was wondering if anyone has any suggestions as to why this might be happening?
Thanks in advance,
-James
(N.B. Fear not, I will collate any answers I get in other forums together here once I find the solution)
Java code snippet:
sLogger.info("Preparing call...");
stmt = mCon.prepareCall("SP_WB200_POPULATE_TABLE_limited_rows");
sLogger.info("Call prepared. Executing procedure...");
stmt.executeQuery();
sLogger.info("Procedure complete.");
I have run sql profiler, and found the following:
Java app :
CPU: 466,514 Reads: 142,478,387 Writes: 284,078 Duration: 983,796
SSMS :
CPU: 466,973 Reads: 142,440,401 Writes: 280,244 Duration: 769,851
(Both with DBCC DROPCLEANBUFFERS run prior to profiling, and both produce the correct number of rows)
So my conclusion is that they both execute the same reads and writes, it's just that the way they are doing it is different, what do you guys think?
It turns out that the query plans are significantly different for the different clients (the Java client is updating an index during an insert that isn't in the faster SQL client, also, the way it is executing joins is different (nested loops Vs. gather streams, nested loops Vs index scans, argh!)). Quite why this is, I don't know yet (I'll re-post when I do get to the bottom of it)
Epilogue
I couldn't get this to work properly. I tried homogenising the connection properties (arithabort, ansi_nulls etc) between the Java and Mgmt studio clients. It ended up the two different clients had very similar query/execution plans (but still with different actual plan_ids). I posted a summary of what I found to the MSDN SQL Server forums as I found differing performance not just between a JDBC client and management studio, but also between Microsoft's own command line client, SQLCMD, I also checked some more radical things like network traffic too, or wrapping the stored proc inside another stored proc, just for grins.
I have a feeling the problem lies somewhere in the way the cursor was being executed, and it was somehow giving rise to the Java process being suspended, but why a different client should give rise to this different locking/waiting behaviour when nothing else is running and the same execution plan is in operation is a little beyond my skills (I'm no DBA!).
As a result, I have decided that 4 days is enough of anyone's time to waste on something like this, so I will grudgingly code around it (if I'm honest, the stored procedure needed re-coding to be more incremental instead of re-calculating all data each week anyway), and chalk this one down to experience. I'll leave the question open, big thanks to everyone who put their hat in the ring, it was all useful, and if anyone comes up with anything further, I'd love to hear some more options...and if anyone finds this post as a result of seeing this behaviour in their own environments, then hopefully there's some pointers here that you can try yourself, and hope fully see further than we did.
I'm ready for my weekend now!
-James

You can attach the Profiler and monitor for the events SQL:BatchCompleted and SP:Completed, with a filter on duration > 1000. Run the procedure from your Java client and from SSMS. Compare the Reads and the Writes of the two events (Java vs. SSMS). Are they significantly different? This would indicate considerably different execution paths or plans, with significant difference in I/O.
Also try to capture the Showplan XML event of the two and compare the plans (save the event as a .sqlplan file, open it in SSMS to easy analysis). Do they have similar plans? Are there wild differences in Estimate vs. Actual (rows, rewinds, rebinds)? Do they have same degree of parallelism? The plans can aso be retrieved from sys.dm_exec_requests view.
Are there any warning events raised, like Missing Column Statistics, Sort Warnings, Hash Warning, Execution Warnings, Blocked Process?
the point is that you have at your disposal a whole arsenal of investigation tools. Once you find the root cause of the difference, you can trace it down to what is different between your Java environment settings and the SSMS environment (ADO.Net SqlClient). Things like default transaction isolation level, ANSI settings etc etc.

Checking: Is your problem that two applications (SSMS, Java) are making the exact same identical call to SQL Server, and SQL Server is acting differently for each? If so, I hit things like this every year or two, and they hurt my brain for days.
Once, I ultimately isolated each process call and logging everything for the entire process in Profiler. I eventually noticed that the Login event (under TextData) showed a host of information, like so:
-- network protocol: TCP/IP
set quoted_identifier on
set arithabort off
set numeric_roundabort off
set ansi_warnings on
set ansi_padding on
set ansi_nulls on
set concat_null_yields_null on
set cursor_close_on_commit off
set implicit_transactions off
set language us_english
set dateformat mdy
set datefirst 7
set transaction isolation level read committed
The "Existing Connection" event will show this information as well--but, sometimes immediately subsequent calls (batches, RPCs, I disremember just now) are sent [ISQL or OSQL did this, I think] to immediately reset some of these -- Arithabort and Quoted_Identifier seem to be favorites, and other SET options also get modified depending on the settings or requirements of whatever connectivity protocols your application's database interface is using.
Another one: some settings are kept as attributes of a procedure at "create" time, and others are factored in at compile time. On the one hand, your connection's SET values may be being overwritten by the configuration saved at the time the procedure was created; on the other hand, your two connections may differ so much that two execution plans are generated for one procedure. (All of this information is, after sufficient research, available in the sys. tables and DMVs.)
In short, it seems to me that SQL obscurities are messing you up. To this day, I loathe all these goombah settings. Things below my notice keep messing around with them [I mean, really, what fool would set implicit_transaction for a connection pool on? But once they did...] and it's hard to build structures when the ground (rules) keep changing out from underneath you. After all, remember what the guy said about building castles in a swamp...

I recall having a similar issue a while ago, because JTDS was silently converting a string parameter to Unicode or something similar. As a result of that conversion, SQL Server was unable to use the index which is was using when we ran the stored proc from SSMS.
HIH

Does the Java case include transmission of the results to the Java server (network overhead) plus some Java processing? A 12 minute query might produce quite a large amount of data.

If you are looking at the profiler and there is no difference between the executions then the difference must be with the client systems.
4 mins does seem like to long just to prepare a statement to send so the 12 min wait must cause some other effect -- no idea what it is.

I am not sure if this post is still relevant. We faced a similar problem in our application.
One key difference between running a stored procedure in SQL Management studio and one running from JDBC is that of transaction context. If you are using an ORM in Java, by default the stored procedure runs in a transaction context. When you run a stored procedure directly in SQL management studio the transaction is off. There is a substantial performance difference.

Sorry, I've not found a correct answer to this, so I don't want to allocate any of these as correct, so I am going to mark this answer as correct, and wish anyone luck who comes across anything similar!

Did you know that Microsoft ship JDBC drivers for their databases?
These may be more performant.
Obviously.. you may have resolved the problem by now.

Terracotta + Compass = Hibernate + HSQLDB + JMS?

I am currently in need of a high performance java storage mechanism.
This means:
1) I have 10,000+ objects with 1 - Many Relationship.
2) The objects are updated every 5 seconds, with the most recent updates persistent in the case of system failure.
3) The objects need to be queryable in a reasonable time (1-5 seconds). (IE: Give me all of the objects with this timestamp or give me all of the objects within these location boundaries).
4) The objects need to be available across various Glassfish installs.
Currently:
I have been using JMS to distribute the objects, Hibernate as an ORM, and HSQLDB to provide the needed recoverablity.
I am not exactly happy with the performance. Especially the JMS part of this.
After doing some Stack Overflow research, I am wondering if this would be a better solution. Keep in mind that I have no experience with what Terracotta gives me.
I would use Terracotta to distribute objects around the system, and something else need to give the ability to "query" for attributes of those objects.
Does this sound reasonable? Would it meet these performance constraints? What other solutions should I consider?

I know it's not what you asked, but, you may want to start by switching from HSQLDB to H2. H2 is a relatively new, pure Java DB. It is written by the same guy who wrote HSQLDB and he claims the performance is much better. I'm using it for some time now and I'm very happy with it. It should be a very quick transition (add a Jar, change the connection string, create the database) so it's worth a shot.
In general, I believe in trying to get the most of what I have before rewriting the application in a different architecture. Try profiling it to identify the bottleneck first.

At first, Lucene isn't your friend here. (read only)
Terracotta is to scale around at the Logical layer! Your problem seems not to be related to the processing logic. It's more around the Storage/Communication point.
Identify your bottleneck! Benchmark the Storage/Logic/JMS processing time and overhead!
Kill JMS issues with a good JMS framework (eg. ActiveMQ) and a good/tuned configuration.
Maybe a distributed key=>value store is your friend. Try Project Voldemort!
If you like to stay at Hibernate and HSQL, check out the Hibernate 2nd level cache and connection pooling (c3po, container driven...)!

Several Terracotta users have built systems like this in the past, so I can you tell you by proof of existence that it can be done. :)
Compass does have support for clustering with Terracotta so that might help you. I suspect you might get further faster by just being careful with how you create your clustered data structures.
Regarding your requirements and Terracotta:
1) 10k objects is quite small from a Terracotta perspective
2) 5 sec update rate doesn't seem like an issue. Might depend how many nodes there are and whether there is any natural partitioning you can take advantage of. All updates will be persistent.
3) 1-5 second query time seems quite easy. Building your own well-organized data structures for lookup is the tricky part. Obviously you want to avoid scanning all the data.
4) Terracotta currently supports Glassfish v1 and v2.
If you post on the Terracotta forums, you could probably get more Terracotta eyeballs on the problem.

I am currently working on writing the client for a very (very) fast Key/Value distributed hash DB that provides set + list semantics. The DB is C99 and requires GCC and right now I'm battling with good old Java network IO to break my current 30,000 get/sets per/sec barrier. Hope to be done within the week. Drop me a line through my account and I'll get back when its show time.

With such a high update rate, Lucene is almost definitely not what you're looking for, since there is no way to update a document once it's indexed. You'd have to keep all the object versions in the index and select the one with the latest time stamp, which will kill your performance.
I'm no DB expert, but I think you should look into any one of the distributed DB solutions that's been on the news lately. (CouchDB, Cassandra)

Maybe you should take a look to: Prevayler.
Your objects are always in mem.
The "changes" to your objects are persisted.
From time to time you are able to take a snapshot: every object is persisted.

You don't say what vendor you are using for JMS, but I wouldn't surprise me if you have some bottle neck there. I couldn't get more than 100 messages a second from ActiveMq, and whatever I tried in terms of configuration of acknowledgment, queue size, etc we were unable to soak the CPU beyond a few percent.
The solution was to batch many queries into one JMS message. We had a simple class that either sent a batch of messages when it got to 200 queries or reached a timeout (we used 20ms), which gave us a dramatic increase in message throughput.

Guaranteed messaging is going to be much slower than volatile messaging. Given every object is updated every few second, you might consider batching your updates (into say 500 changes or by time say 1-10 ms' worth), sending over volatile messaging, and batching your transactions. In this case you are more likely to be limited by bandwidth. Tuning your use case you may find smaller batch sizes also work efficiently. If bandwidth is critical (say you have a 10 MB connection or slower, then you could use compression over JMS)
You can achieve much higher performance with a custom solution (which also might be simpler) e.g. Hazelcast & JGroups are free (you can add a node(s) which does the database synchronization so your main app doesn't slow down). There are commercial products which handle in the order of half a million durable messages/sec.

Terracotta + jofti = queryable persistent clustered data structures
Search google for terracotta querymap or visit tusharkhairnar.blogspot.com for querymap blog
You may want to integrate timasync as well to update your database. Database is is your system of record use terracotta as caching and database offloading mechanism you can even batch async updates to make it faster so that I'd db contains fairly recent data
Tushar
tusharkhairnar.blogspot.com

Is there a SQL Server profiler similar to Java/.Net profilers?

I love the way I can profile a Java/.Net app to find performance bottlenecks or memory problems. For example, it's very easy to find a performance bottleneck looking at the call tree with execution times and invocation counts per method. In SQL Server, I have stored procedures that call other stored procedures that depend on views, which is similar to Java/.Net methods calling other methods. So it seems the same kind of profiler would be very helpful here. However, I looked far and wide and could not find one. Is anyone aware of such tools, either for SQL Server or any other DBMS?
Update: Thanks fro your replies around SQL Server Profiler, but this tool is very limited. Take a look at the screenshot.

Check out SQL Nexus Tool. This has some good reports on identifying bottlenecks.
SQL Nexus is a tool that helps you identify the root cause of SQL Server performance issues. It loads and analyzes performance data collected by SQLDiag and PSSDiag. It can dramatically reduce the amount of time you spend manually analyzing data.
In one of the Inside SQL 2005 books (maybe T-SQL Querying), there was a cool technique in which the author dumps the SQL profiler output to a table or excel file and applies a pivot to get the output in a similar format as your screenshot.
I have not seen any built-in SQL tools which gives you that kind of analysis.
Another useful post.

In addition to SQL Server Profiler, as mentioned in a comment from #Galwegian, also check out your execution plan when you run a query.
http://www.sql-server-performance.com/tips/query_execution_plan_analysis_p1.aspx
http://en.wikipedia.org/wiki/Query_plan

Another whole thread about the SQL Server profiler:
Identifying SQL Server Performance Problems
I understand what you are talking about, but typically, database optimization takes place at a finer grained level. If the database activity is driven from a client, you should be able to use the existing client profiler to get the total time on each step and then address the low hanging fruit (whether in the database or not).
When you need to profile a particular database step in detail, you can use profiler and a trace.
Typically, the database access has a certain granularity which is addressed on an individual basis and database activity is not linear with all kinds of user access going on, whereas a program profiler is typically profiling a linear path of code.

As mentioned, SQL Server Profiler, which is great for checking what parameters you're program is passing to SQL etc. It won't show you an execution tree though if that's what you need. For that, all I can think of is to use Show Plan to see what exactly is executed at run-time. E.g. if you're calling an sp that calls a view, Profiler will only show you that the sp was executed and what params were passed in.
Also, the Windows Performance Monitor has extensive run-time performance metrics specific to SQL Server. You can run it on the server, or connect remotely.

To find performance bottlenecks, you can use the Database Engine Tuning Advisor (found in Tools menu of SQL Server Management Studio. It provides suggestions for optimizing your queries and offers to optimize them for you automatically (e.x. create the appropriate indexes, etc.).

You could use Sql Profiler - which covers the profiling aspect, but I tend to think of it more as a logging tool.
For diagnosing performance, you should probably just be looking at the query plan.

There's the sql server profiler, but despite it's name, it doesn't do what you want, by the sound of your question. It'll show you a detailed view of all the calls going on in the database. It's Better for troubleshooting the app as a whole, not just one sproc at a time
Sounds like you need to view the execution plan of your queries/spocs in query analyzer and that will give you something akin to the data you are looking for.

As mentioned by several replies the SQL Profiler will show what you're asking for. What you'll have to be sure to do is to turn on the events SP:StmtCompleted, which is in the Stored Procedures group, and if you want the query plans as well turn on Showplan XML Statistics Profile, which is in the Performance group. The XML plan last one gives you a graphical description and shows the actual rows processed by each step in the plan.
If the profiler is slowing your app down, filter it as much as possible and consider going to a server side trace.
HTH
Andy

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.