I have two test environments. My application is performing much worst on the seconds one. I suspect that this because the first one system is using database which runs on better hardware (more CPU, faster connection). I would like to verify my claims somehow. Are there any tools, which would help me with that? Should it helpful, I am using Oracle 11g and my app is using Hibernate to connect to the database.
Mind you, I am not interested in profiling my schema. I would like to compare how fast is the same database (meaning schema + data) on two different machines.
If you are interested, why I suspect that database is the problem: I profiled my application during tests on those two environments. During the second test environment methods responsible for connecting to database (namely org.apache.tomcat.dbcp.dbcp.DelegatingPreparedStatement.executeQuery()) are using much more of the CPU time.
To answer the question: I believe you'd use JMeter to profile the two environments, and get comprehensive data out of the tests you run. VisualVM will also be helpful, but that depends on the kind of data you need, and how you need to present (or analyze) it.
But as for the general problem, is the data on the two databases exactly the same? Because if this is not the case, some possibilities are open - your transactions might be depending on data that is locked by another process (therefore, you'd need to look at your transactions and the transaction isolation they use).
Related
For reasons that are beside the point, a company has bought an Exadata Eighth Rack. Some of the managers thought that this would improve performance of current applications. The problem is that hardly any application makes intensive database work (yes, this is a good moment for looking at facepalm animated gifs). So, at the moment, migrations have proven just little benefit.
The question is obvious. Most of the applications are written in Java, and some of them make intensive use of Solr and Cassandra. For what I know, Exadata is intended for storing data, while Exalogic can hold applications too. Anyway, I'm wondering if there is some way of taking advantage of mentioned infrastructure.
Replace Solr with Oracle Text.
Before I get down-voted: normally I would not recommend replacing existing code built with a popular, open-source program with a seldom-used, proprietary product. But if you want to use a lot of space and CPU on your database servers then Oracle Text can definitely help.
As more generic advice, the primary role of a database is not to store data. A file system can do that. Databases are built to join data. If an application is reading a large amount of data and doing ad hoc joins, those are the jobs you want to move to the database.
Exadata -> Oracle Database extreme performance.
Exalogic -> Fusion Middleware extreme performance. (Java goes here)
Your best move will be refactoring the application to put as much workload as possible on the DB (PL/SQL).
Another thing I could think of, but this would be a radical approach I have never really tried it myself (Yes I work with Exadatas too) maybe you can give it a shot and let us know here...
What about using all those GBs on the Exadata's RAM and start tuning your Java application's latency? I mean with that gruesome amount of Memory you can try and set a real nice amount of heap and avoid Garbage Collection induced latency. Please do let me know here what comes out if you actually try this.
Which protocol do the Java applications use to connect to Oracle?
If it's not IPC (inter process communication, aka BEQUEATH, aka shared memory), but maybe TCP and you have many fast & tiny roundtrips, than this would be your low-hanging fruit - eliminate the network stack.
edit: just realized that exadata cannot run java applications by default (only ODA does) - so it wouldn't be possible to make use of IPC. However, perhaps you're able to test the impact of IPC in one of your applications using the former infrastructure?
Exadata cannot host any customer application. You cannot install anything there. You only can host Oracle database on Exadata.
It means you can use database features like DBFS (file system over Oracle database), Java option (storing and executing java code in database). But you need to check what options you have license for. And internal JVM is used, which cannot be customized or upgraded.
Exadata is database appliance designed to work with large amount of differently accessed data in very effective and manageable way.
I am considering moving from H2 to MemSQL - and I would greatly appreciate any comments:
My application has to query very quickly concurrently from large tables of up to 300Million rows. To achieve this I have been using the H2 in-memory database.
I'm currently using the H2 database which allows me to create linked tables in the H2 in-memory database that point to a MySQL database. This is very useful in loading data from MySQL to H2.
Can I create Linked tables in MemSQL - I see no references to this in the online MemSQL documentation?
Another challenge is that I will need to run multiple instances of the application across many servers, so having MemSQL running distributed across servers is very attractive rather than having to duplicate the H2 database in every JVM instance of the application across the servers. Running one instance of H2 via TCP to the other servers will be too slow.
The other advantage I see with MemSQL is that there is apparently no locking and the queries are compiled into native C++ which could speed them up.
Has anyone compared MemSQL performance with H2? - I've found nothing on line from real world tests.
Mark L here from MemSQL. I wanted to address a few of your questions and offer additional help in getting the info/benchmarks you're asking about.
MemSQL does support linked tables via the JDBC connector - which in practice works just as it would with MySQL - so you'll have no issues getting that to work. Running MemSQL in distributed mode is indeed going to provide a big performance advantage and you'll see some significant improvements across the board both on throughput and latency. There's no direct comparison that I've found directly between H2 and MemSQL - however, you can draw some indirect conclusions by looking at comparisons of MemSQL vs MySQL since we have the comparison data for H2 vs. MySQL from the website. From our field experience I would expect you to observe significant performance gains when using MemSQL.
In general a few observations: in the MemSQL distributed version you would have several advantages that you can't get from H2: reads never blocking writes thanks to lock-free indexes, full MVCC (H2 can only do this in single-box), and auto sharding of data being among the highlights. Out of all the features, auto-sharding is likely to be the most substantial for your use case - H2 can't auto-shard the data, and having that ability when distributed is obviously a big advantage even if speed were equal between the two. As I mentioned though it will be much faster with MemSQL distributed, as well as easier to manage vs. multiple instances of H2.
In any case we're more than happy to help you prove this out! Please feel free to reach out to me via email- larosa at memsql dot com.
There are 2 different applications running on the same jboss server. I want to connect these 2 applications with the same mysql database through the same data source.
what kind of impact can be occur in running these 2 application -
I am supposing that these issues could be happen .
- Table Locking issues, Slow performance, connectivity issue, ACID properties lost issue.
Are there any drawbacks with this approach?
Nothing of this can happen (Table Locking issues, Slow performance, connectivity issue, ACID properties lost issue).
The database cannot really distinguish if two connections come from the same application.
Of course, two applications still means twice as many requests, so performance may be affected. ACIDity is not affected and you are unlikely to run out of TCP ports either.
The performance of two apps accessing the same database is the same as the performance of one app twice as popular.
There are no drawbacks, as long as there are enough connections for both apps and your transactions are well written its normal to share the same datasource
Perhaps you should consider if there are any advantages?
In my opinion there is have to be compelling reasons to not create separate datasources for separate applications - I sense that this is not the case for you.
In any case, some of the drawbacks you mention may happen when you share a connection pool between applications or when you don't since they are properties of the database not the connections.
Edit: Summarizing the good discussion with Jan Dvorak below, here are some arguments for using one (or more) datasources per application:
It enables one to discard eg. stale or appserver-culled connections (-threads) from one application without affecting others
Having multiple DS'es per application enables one to reset admin connections without affecting production connections.
Cheers,
The main issue would be with pooling. As more applications use the datasource, more connections are open so the default pool may not be enough.
Apart from that, all the other issues are related to the DB engine and your DB design than to the datasource. A DB engine won't stop being ACID compliant no matter how it is accessed, won't be slower than if the accesses were from different datasources, and so on...
I'm doing a school software project with my class mates in Java.
We store the info on a remote db.
When we start the application we pull all the information from the database and transform it into objects to use in our application (using java sql statemens).
In the application we edit some of these objects and then when we exit the application
we save or update information in the database using Hibernate.
As you see we dont use Hibernate for pulling in information, we use it just for saving and updating.
We have 2, but very similar problems.
The loading of object(when we start the app) and the saving of objects(with Hibernate) in the db(when closing the app) is taking too much time.
And our project its not a huge enterprise application, its a quite small app, we just manage some students, teachers, homeworks and tests. So our db is also very very small.
How could we increase performance ?
later edit: if we use a local database it runs very quick, it just runs slow on remote databases
Are you saying you are loading the entire database into memory and then manipulating it? If that is the case, why don't you instead simply use the database as a storage device, and do lookups and manipulation as necessary (using Hibernate if you like, or something else if you don't)? The key there is to make sure that you are using connection pooling, as that will reduce the connection time.
If this is what you are doing, then you could be running into memory issues as well - first, by not caching the entire database in memory, you will reduce memory and will spread out the network load from the beginning/end to the times when it needs to happen.
These 2 sentences are red flags for me :
When we start the application we pull
all the information from the database
and transform it into objects to use
in our application (using java sql
statemens). In the application we edit
some of these objects and then when we
exit the application we save or update
information in the database using
Hibernate.
Is there a requirements reason that you are loading all the information from the database into memory at startup, or why you're waiting until shutdown to save changes back in the database?
If not, I'd suggest a design change. If you've already got Hibernate mappings for the tables in the DB, I'd use Hibernate for both all of your CRUD (create, read, update, delete) operations. And, I'd only load the data that each page in your app needs, as it needs it.
If you can't make that kind of design change at this point, I think you've got to look closely at how you're managing the database connections. Are you using connection pools? Are you opening up multiple connections? Forgetting to release them?
Something else to look at. How are you using Hibernate to save the entities to the db? Are you doing a getHibernateTemplate().get on each one and then doing an entity.save or entity.update on each one? If so, that means you are also causing Hibernate to run a select query for each database object before it does a save or update. So, essentially, you'd be loading each database object twice (once at the beginning of the program, once before saving). To see if that's what's happening, you can turn on the show_sql property or use P6Spy to see exactly what queries Hibernate is running.
For what you are doing, you may very well be better off serializing your objects and writing them out to a flat file.
But, much more likely, you should just read / update objects directly from your database as needed instead of all at once, for all the reasons aperkins gives.
Also, consider what happens if your application crashes? If all of your updates are saved only in memory until the application is closed, everything would be lost if the app closes unexpectedly.
The difference in loading everything from a remote DB server versus loading everything from a local DB server is the network latency / pipe size. The network is a much smaller pipe than anything else. Two questions: first, how much data are we really talking about? Second, what is your network speed? 10/100/1000? Figure between 10 and 20% of your pipe size is going to be overhead due to everything from networking protocols to the actual queries themselves.
As others have stated, the way you've architected is usually high on the list of "don't do". When starting, pull only enough data to initialize the app. As the user works through it, pull what you need for that task.
The ONLY time you pull everything is when they are working in a disconnected state. In that case, you still don't load everything as objects in the application, you just work from a local data store which gets sync'ed with the remote server every so often.
The project its pretty much complete. we cant do large refactoring on it now.
I tried to use a second level cache for Hibernate when saving. EhCacheProvider.
in hibernate.xml:
net.sf.ehcache.hibernate.EhCacheProvider
i have done a config for the cache, ehcache.xml:
i have put the cache.jar in the project build path
and i have set the hibernate property for every class and set in the mapping.
But this cache doesn't seem to have an effect. I dont know if it works(if it is used).
Try minimising number of SQL queries, since every query has its own overhead.
You can enable database compression, which should speed things up when there is a lot of data.
Maybe you are connecting to the database many times?
Check the ping time of remote database server - it might be the problem.
As your application is just slow when running on a remote database server, I'd assume that the performance loss is due to:
Connecting to the server: try to reuse connections (pass the instance around) or use connection pooling
Query round-trip time: use as few queries as possible, see here in case of a hand-written DAL:
Preferred way of retrieving row with multiple relating rows
For hibernate you may use its batch functionality and adjust hibernate.batch_size.
In all cases, especially when you can't refactor larger parts of the codebase, use a profiler (method time or sql queries) to find the bottleneck. I bet you'll find thousands of queries, each taking 10ms RTT) which could be merged into one.
Some other things you can look into:
You can allocate more memory to the JVM
Use the jconsole tool to investigate what the bottlenecks are.
Why dont you have two separate threads?
Thread 1 will load your objects one by one.
Thread 2 will process objects as they are loaded.
Your app will seem more interactive at startup.
It never hurts to review the basics:
Improving speed means reducing time (obviously), and to do that, you find activities that take significant time but can be eliminated or replaced with something that uses less time. What I mean by activity is almost always a function call, method call, or property call, performed on a specific line of code for a specific purpose. If may invoke I/O or it may invoke computation, or both. If its purpose is not essential, then it can be optimized.
Many people use profilers to try to find these time-wasting lines of code, but most profilers miss the target because they look at functions, not lines, they go to sleep during I/O, and they worry about "self time".
Many more people try to guess what could be the problem, or they ask others to guess, such as by asking on SO. Such guesses, in the nature of guesses, are sometimes right - more often not, but people still invest time and resources in them.
There's a very simple way to find out for sure, without guessing, what could fruitfully be optimized, and here is one way to do it in Java.
Thanks for your answers. Their were more than helpful.
We completely solved this problem like so:
Refactored the LOAD code. Now it uses Hibernate with Lazy Fetching.
Refactored the SAVE code. Now it saves, just the data that was modified and right after the time it was modified. This way we dont have a HUGE save an the end.
Im amazed of how good it all went. The amount of new code we had to write was very very small.
I am currently in need of a high performance java storage mechanism.
This means:
1) I have 10,000+ objects with 1 - Many Relationship.
2) The objects are updated every 5 seconds, with the most recent updates persistent in the case of system failure.
3) The objects need to be queryable in a reasonable time (1-5 seconds). (IE: Give me all of the objects with this timestamp or give me all of the objects within these location boundaries).
4) The objects need to be available across various Glassfish installs.
Currently:
I have been using JMS to distribute the objects, Hibernate as an ORM, and HSQLDB to provide the needed recoverablity.
I am not exactly happy with the performance. Especially the JMS part of this.
After doing some Stack Overflow research, I am wondering if this would be a better solution. Keep in mind that I have no experience with what Terracotta gives me.
I would use Terracotta to distribute objects around the system, and something else need to give the ability to "query" for attributes of those objects.
Does this sound reasonable? Would it meet these performance constraints? What other solutions should I consider?
I know it's not what you asked, but, you may want to start by switching from HSQLDB to H2. H2 is a relatively new, pure Java DB. It is written by the same guy who wrote HSQLDB and he claims the performance is much better. I'm using it for some time now and I'm very happy with it. It should be a very quick transition (add a Jar, change the connection string, create the database) so it's worth a shot.
In general, I believe in trying to get the most of what I have before rewriting the application in a different architecture. Try profiling it to identify the bottleneck first.
At first, Lucene isn't your friend here. (read only)
Terracotta is to scale around at the Logical layer! Your problem seems not to be related to the processing logic. It's more around the Storage/Communication point.
Identify your bottleneck! Benchmark the Storage/Logic/JMS processing time and overhead!
Kill JMS issues with a good JMS framework (eg. ActiveMQ) and a good/tuned configuration.
Maybe a distributed key=>value store is your friend. Try Project Voldemort!
If you like to stay at Hibernate and HSQL, check out the Hibernate 2nd level cache and connection pooling (c3po, container driven...)!
Several Terracotta users have built systems like this in the past, so I can you tell you by proof of existence that it can be done. :)
Compass does have support for clustering with Terracotta so that might help you. I suspect you might get further faster by just being careful with how you create your clustered data structures.
Regarding your requirements and Terracotta:
1) 10k objects is quite small from a Terracotta perspective
2) 5 sec update rate doesn't seem like an issue. Might depend how many nodes there are and whether there is any natural partitioning you can take advantage of. All updates will be persistent.
3) 1-5 second query time seems quite easy. Building your own well-organized data structures for lookup is the tricky part. Obviously you want to avoid scanning all the data.
4) Terracotta currently supports Glassfish v1 and v2.
If you post on the Terracotta forums, you could probably get more Terracotta eyeballs on the problem.
I am currently working on writing the client for a very (very) fast Key/Value distributed hash DB that provides set + list semantics. The DB is C99 and requires GCC and right now I'm battling with good old Java network IO to break my current 30,000 get/sets per/sec barrier. Hope to be done within the week. Drop me a line through my account and I'll get back when its show time.
With such a high update rate, Lucene is almost definitely not what you're looking for, since there is no way to update a document once it's indexed. You'd have to keep all the object versions in the index and select the one with the latest time stamp, which will kill your performance.
I'm no DB expert, but I think you should look into any one of the distributed DB solutions that's been on the news lately. (CouchDB, Cassandra)
Maybe you should take a look to: Prevayler.
Your objects are always in mem.
The "changes" to your objects are persisted.
From time to time you are able to take a snapshot: every object is persisted.
You don't say what vendor you are using for JMS, but I wouldn't surprise me if you have some bottle neck there. I couldn't get more than 100 messages a second from ActiveMq, and whatever I tried in terms of configuration of acknowledgment, queue size, etc we were unable to soak the CPU beyond a few percent.
The solution was to batch many queries into one JMS message. We had a simple class that either sent a batch of messages when it got to 200 queries or reached a timeout (we used 20ms), which gave us a dramatic increase in message throughput.
Guaranteed messaging is going to be much slower than volatile messaging. Given every object is updated every few second, you might consider batching your updates (into say 500 changes or by time say 1-10 ms' worth), sending over volatile messaging, and batching your transactions. In this case you are more likely to be limited by bandwidth. Tuning your use case you may find smaller batch sizes also work efficiently. If bandwidth is critical (say you have a 10 MB connection or slower, then you could use compression over JMS)
You can achieve much higher performance with a custom solution (which also might be simpler) e.g. Hazelcast & JGroups are free (you can add a node(s) which does the database synchronization so your main app doesn't slow down). There are commercial products which handle in the order of half a million durable messages/sec.
Terracotta + jofti = queryable persistent clustered data structures
Search google for terracotta querymap or visit tusharkhairnar.blogspot.com for querymap blog
You may want to integrate timasync as well to update your database. Database is is your system of record use terracotta as caching and database offloading mechanism you can even batch async updates to make it faster so that I'd db contains fairly recent data
Tushar
tusharkhairnar.blogspot.com