I would like to ask the experts of what is the recommendation for fetching 3000-5000 records from oracle 11g database from Java application (using JDBC). Our standard is to always invoke a stored procedure.
I did some research and found that ref cursor makes multiple round trips to the database based on the JDBC fetch count property. (can somebody throw more light on this of the end to end flow of how data is stored in memory in oracle and JVM when processing ref cursors)
I was thinking collections are more efficient because the data is sent in one shot to the caller (Java) from oracle db (use bulk collect). With this approach we can avoid multiple network calls from Java to Oracle servers. is this a true assumption?
Appreciate your help!
This is a much bigger topic than anyone is willing to commit to in a posting. Here's a link that discusses how Oracle manages read consistency. That entire page is probably a good read to get some of idea of what's going on in the server. There's also an article here that discusses what happens using collections. How would you return the collection to a JDBC Client (not something I've ever tried)?
Essentially, there's a lot involved in performance, from how your database is configured to how your network is tuned to disk performance, to client performance, etc.
The short answer is you need to try things. Retrieving 3-5k records isn't a lot, and it depends on how big the record is, that your bringing back across the network. If they are 20 byte records, and your network (MTU?) size is 4k blocks, you can fit about 200 records in a block. At some point, you run into the law of diminishing returns.
I use stored procedures as a matter of habit, but you don't need to. It would depend on the complexity of the query (number of tables and the type of joins) and the ability for someone like a DBA to be able to go in and see what the query is doing.
Worrying about network trips is a little less critical, because there's only so much data you can stuff in a packet. There's going to be a number of network trips no matter what you use, it really depends on your use case to determine how critical it is to get that to a bare minimum.
Related
I am currently using raw JDBC to query records in a MySql database; each record in the subsequent Resultset is ultimately extracted, placed in a domain specific model, and stored to a List Instance.
My query is: in circumstances where there is a requirement to further filter that data (incidentally based on columns that exist in the SAME Table) which of the following approaches would generally be considered best practice:
1.The issuance of further WHERE clause calls into the database. This will effectively offload the filtering process to the database but obviously results in an additional query or queries where multiple filters are applied consecutively.
2.Explicitly filtering the aforementioned preprocessed List at the Application level, thus negating the need to have to make additional calls into the database each time the records are filtered.
3.Some hybrid combination of the above two approaches, perhaps where all filtering operations are initially undertaken by the database server but THEN preprocessed to a application specific model and implicitly cached to a collection for some finite amount of time. Further filter queries, received within this interval, would then be serviced from the data stored in the cache.
It is important to note that the Database Server in this scenario is actually located on
an external machine, therefore the overhead and latency of sending query traffic over the local network also has to be factored into the approach we ultimately elect to take.
I am patently aware of the age-old mantra that stipulates that: "The database server should be used to do what its good at." however in this scenario it just seems like a less than adequate solution to be making numerous calls into the database to filter data that I ALREADY HAVE at the application level.
Your thoughts and insights would be greatly appreciated.
I have used the hybrid approach on many applications with good results.
Database filtering works good especially for columns that are indexed. This reduces network overhead since fewer rows are sent to application.
Database filtering can be really slow for some columns depending upon the quantity of rows in the results and the lack of indexes. The network overhead can be negligible compared to database query time so application filtering may be faster for this situation.
I also find that application filtering in Java easier to write and understand instead of complex SQL.
I usually experiment manually to get the fewest rows in a reasonable time with plain SQL. Then write Java to refine to the desired rows.
i appreciate this question first...as i too faced similar situation few days back...as you already discussed all available options i prefer to go with the second option....i mean handling at application level rather than filtering at DB level.
I have a RMI application,
Basically every request from the client, created a new connection (on the server side) to the database, r n an SQL query and turned the data to a serializable class that was sent back to the client.
The user base of the app grew,and the request took a very long time to complete. the solution previous programmers came up with is to create a fixed size connection pool from the server to the DB, and every client's request used the oldest(the one used least recently) to run the SQL query.
My question is: what is the correct way to solve such a problem?
I would say, pooling the DB connections is already an important step, since to establish a connection is expensive. Instead of implementing my own pool I would however use an existing and proven pooled data-source implementation such as DBCP or C3P0. They have many useful features such as varying size, automatic connection check, etc...
If it is the query itself that takes up too long, the optimization would be more complex than just that. Various approaches are possible and depend on the details of the situation, for example :
Is there only one SQL query, always the same, as your question seems to imply ?
Is the database read-only ?
If not, are the modifications made within the same application or externally ?
etc...
Possible approaches (I can think of right now) to reduce the request time :
Caching of the result in the java app (but this is a vast subject...)
Optimization of the SQL request
Optimization of the DB schema, with indexes or deeper refactoring of the tables structure
Reduce the amount of data sent back to the client to just the bare minimum (In case the network is the bottleneck)
I hope this helps. We would really need more details on the use case to give you a better answer.
For a thick-client project I'm working on, I have to remotely connect to a database (IBM i-series) and perfom a number of SQL related tasks:
Download/Update a set of local/offline 'control' data - this data may have changed between runs unnoticed.
On command, download data from multiple (15-20) tables and store separately into a single Java object. The names of the tables are known, but the schema name changes between runs and can change inter-run (as far as I know, PreparedStatements do not allow one to dynamically insert the schema).
I had considered using joins/unions/etc to perform all of these queries as one, but the project requires me to have in-memory separations between table data (instead of one big joined lump).
Perform between 2 and 100+ repetitions of (2)
The last factor is that this needs to be run on high-latency (potentially dial-up) network connections using Java 1.5 on the oldest computers possible.
Currently I run 15-20 dynamically constructed PreparedStatements but I know this to be rather inefficient (I measured, so as to avoid premature optimization ala Knuth).
What would be the most efficient and error-tolerant method of performing these tasks?
My thoughts:
Regarding (1), I really have no idea other than checking the entire table against the new table, at which point I feel I might as well just download the new (potentially and likely unchanged) table and replace the old one, but this takes more time.
For (2): Ideally I'd be able to construct something similar to an array of SELECT statements, send them all at once, and have the database return one ResultSet per internal query. From what I understand, however, neither Statement nor PreparedStatement support returning multiple ResultSet objects.
Lastly, the best way I can think of doing (3) is to batch a number of (2) operations.
There is nothing special about having moving requirements, but the single most important thing to use when talking to most databases is having a connection pool in your Java application and use it properly.
This also applies here. The IBM i DB2/400 database is quite fast, and the database driver available in the jt400 project (type 4, no native code) is quite good, so you can pull over quite a bit of data in a short while simply by generating SQL on the fly.
Note that if you only have a single schema you can tell in the conneciton which one you need, and can then use non-qualified table names in your SQL statements. Read the JDBC properties in the InfoCenter very carefully - it is a bit tricky to get right. If you need multiple schemaes, the "naming=system" allows for library lists - i.e. a list of schemaes to look for the tables, which can be very useful when done correctly. The IBM i folks can help you here.
That said, if the connection is the limiting factor, you might have a very strong case for running the "create object from tables" Java code directly on the IBM i. You should already now prepare for being able to measure the traffic to the database - either with network monitoring tooling, using p6spy or simply going through a proxy (perhaps even a throtteling one)
Ideally, you would have the database group provide you with a set of stored procedures to optimize the access to the database.
Since you don't have access, you may want to ask them if they have timestamp data in the database at the row level to see when records were modified, this way you can select only the data that's changed since some point in time.
What #ThorbjørnRavnAndersen is suggesting is moving the database code on to the IBM host and connecting to it via RMI or JMS from the client. So the server code would be a RMI or JMS Server that accesses the database on your behalf and returns you java objects instead of bringing SQL resultsets across the wire.
I would pass along your requirements to the database team and see if they can't do something for you. I'm sure they don't want all these remote clients bringing all the data down each time, so it would benefit them as much as it would benefit you.
I need one help from you guys regarding JDBC performance optimization. One of our pojo is using jdbc to connect to a oracle database and retrieve the records. Basically the records are email addresses basing upon which emails will be sent to the users. The problem here is the performance. This process happens every weekend and the records are very huge in number, around 100k.
The performance is very slow and it worries us a lot. Only 1000 records seem to be fetched from the database every 1 hour, which means that it will take 100 hours for this process to complete (which is very bad). Please help me on this.
The database server and the java process are in two different remote servers. We have used rs_email.setFetchSize(1000); hoping that it would make any difference but no change at all.
The same query executed on server takes 0.35 seconds to complete. Any quick suggestion would of great help to us.
Thanks,
Aamer.
First look at your queries. Analyze them. See if the SQL could be made more efficient (ie, ask the database for what you want, not for what you don't want -- makes a big difference). Also check to see if there are indexes on any fields in your where and join clauses. Indexes make a big difference. But it can't be just any indexes. They have to be good indexes (ie, that the fields that make up the index provide enough uniqueness for the database to retrieve things appropriately). Work with your DBA on this. Look for either high run time against the db or check for queries with high CPU usage (even if the queries run sub-second). These are the thing that can kill your database.
Also from a code perspective, check to see if you are opening and closing your connections or if you are re-using them. Can make a big difference too.
It would help to post your code, queries, table layouts, and any indexes you have.
Use log4jdbc to get the real sql for fetching single record. Then check speed and plan for that sql. You may need a proper index or even db defragmentation.
Not sure about the Oracle driver, but I do know that the MySQL driver supports two different results retrieval methods: "stream" and "wait until you've got it all".
The streaming method lets you start process the results the moment you've got the first row returned from the query, whereas the other method retrieves the entire resultset before you can start work on it. In cases where you deal with huge recordsets, this often leads to memory exceptions, or slow performance because java hit the "memory roof" and the garbage collector can't throw away "used" records like it can in the streaming mode.
The streaming mode doesn't let you navigate/scroll the resultset the way the "normal"/"wait until you've got it all" mode...
Anyway, not sure if this is of any help but it might be worth checking out.
My answer to your question, in summary is:
1. Check network
2. Check SQL
3. Check Java code.
It sounds very slow. First thing to check would be to see if you have a slow network. You can do this pretty quickly by just pinging the database server. Or run the database server on the same machine as your JVMM. If it is not the network, get an explain plan for your SQL and ensure you are not doing table scans when you don't need to be. If it is not the network or the SQL, then it's time to check your Java code. Are you doing anything like blocking when you shouldn't be?
I need to store about 100 thousands of objects representing users. Those users have a username, age, gender, city and country.
The users should be searchable by a range of age and any of the other attributes, but also a combination of attributes (e.g. women between 30 and 35 from Brussels). The results should be found quickly as it is one of the Server's services for many connected Clients). Users may only be deleted or added, not updated.
I've thought of a fast database with indexed attributes (like h2 db which seems to be pretty fast, and I've seen they have a in-memory mode)
I was wondering if any other option was possible before going for the DB.
Thank you for any ideas !
How much memory does your server have? How much memory would these objects take up? Is it feasible to keep them all in memory, or not? Do you really need the speedup of keeping in memory, vs shoving in a database? It does make it more complex to keep in memory, and it does increase hardware requirements... are you sure you need it?
Because all of what you describe could be ran on a very simple server and put in a very simple database and give you the results you want in the order of 100ms per request. Do you need faster than 100ms response time? Why?
I would use a RDBMS - there are plenty of good ORMs available, such as Hibernate, which allow you to transparently stuff the POJOs into a db. Once you've got the data access abstracted, you then have the freedom to decide how best to persist the data.
For this size of project, I would use the H2 database. It has both embedded and client/server modes, and can operate from disk or entirely in memory.
Most definitely a relational database. With that size you'll want a client-server system, not something embedded like Sqlite. Pick one system depending on further requirements. Indexing is a basic feature, most systems support it. Personally I'd try something that's popular and free such as MySQL or PostgreSQL so you can more easily google your way out of problems. If you make your SQL queries generic enough (no vendor-specific constructs), you can switch systems without much pain. I agree with bwawok, try whether a standard setup is good enough and think of optimizations later.
Did you think to use cache system like EHCache or Memcached?
Also If you have enough memory you can use some sorted collection like TreeMap as index map, or HashMap to search user by name (separate Map per field). It will take more memory but can be effective. Also you can find based on the user query experience the most frequently used query with the best selectivity and create comparator based on this query onli. In this case subset of the element will not be a big and can can be filter fast without any additional optimization.