Oracle: Result set in insertion order - java

In the Oracle database, the select statement select * from tablename does not return rows in the order of insertion. In a few articles, we have read that the Oracle database stores row information based on ROWID.
We are using Oracle in a Java-based web application, and there is a requirement to display the data in insertion order in each module. Applying an order by clause to every table is not feasible and could degrade the application's performance.
Is there any other way to make the select statement return data in insertion order?
The Oracle version used is "Oracle Database 19c Standard Edition 2 Release 19.0.0.0.0 - Production Version 19.3.0.0.0".

Oracle is a relational database. In it, rows don't have any particular order, which means that a select statement might return results in a different order when you run it several times. Usually it doesn't, but if there are a lot of inserts/deletes, sooner or later you'll notice such behavior. Therefore, the only certain way to return rows in the desired order is to use - ta-daaa! - an order by clause.
Also, you'll have to maintain your own record of the insertion order. A simple way to do that is to use a column whose source is a sequence.
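A minimal sketch of that idea (the table, column, and connection details are invented for illustration; on 12c and later an identity column can stand in for an explicit sequence):
import java.sql.*;

public class InsertionOrderDemo {
    public static void main(String[] args) throws SQLException {
        // Hypothetical connection details -- adjust for your environment.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//localhost:1521/ORCLPDB1", "scott", "tiger");
             Statement st = conn.createStatement()) {
            // A sequence-backed (identity) column records the insertion order.
            st.execute("CREATE TABLE demo_rows ("
                    + " ins_seq NUMBER GENERATED ALWAYS AS IDENTITY,"
                    + " payload VARCHAR2(100))");
            // The only reliable way to read the rows back in insertion order:
            try (ResultSet rs = st.executeQuery(
                    "SELECT payload FROM demo_rows ORDER BY ins_seq")) {
                while (rs.next()) {
                    System.out.println(rs.getString("payload"));
                }
            }
        }
    }
}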

I would check here first: previous-post
I would recommend not relying on any ordering unless you specify order by.
But what is the drawback of adding something like ORDER BY ROWNUM ASC to your queries? You can trim your result sets (paginate), or apply it only to the entities for which you want to 'maintain insertion order'.
Are you using anything for entity management? Hibernate has some defaults you could use as well. Post some code examples and we can provide additional help.

Related

Sorting functionality optimization using MySQL and Java

I am going to generate a simple CSV file report in Java using Hibernate and MySQL.
I am using Hibernate's native SQL facility to fetch the data (the query is too complex for HQL or a Criteria query, but that doesn't matter here) and simply writing it out with a CSVWriter API (which also doesn't matter here).
So far all is well, but here is where the problem starts.
Requirements:
The report can contain 5,000K to 15,000K (i.e. 5 to 15 million) records with 25 fields each.
It can be run in real time.
There is one report column (let's say finalValue) on which I want to sort, and it is computed like this: (sum(b.quantity*c.unit_gross_price) - COALESCE(sum(pai.value),0)).
Problem:
MySQL indexing cannot be used for the finalValue column (mentioned above), as it is a complex combination of aggregate functions. So if I execute the query with sorting (with or without a limit), it takes 40 sec; otherwise, 0.075 sec.
The Solutions:
These are some solutions I can think of, but each has limitations.
Sorting using java.util.TreeSet: it will throw an OutOfMemoryError, which is obvious, as heap space will be exceeded if I put 15,000K heavy objects into it.
Using LIMIT in the MySQL query and writing the file on each iteration: it will take much time, as every query will take about the same 50 sec, because LIMIT can't be used without the sort.
So the main problem here is to overcome two constraints, memory and time. I need to balance both of them.
Any ideas, suggestions?
NOTE: I have not included any code snippets here; that doesn't mean the question lacks detail. Code is not required here.
I think you can use a streaming ResultSet here, as documented on this page under the ResultSet section.
Here are the main points from the documentation.
By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate and, due to the design of the MySQL network protocol, is easier to implement. If you are working with ResultSets that have a large number of rows or large values and cannot allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time.
To enable this functionality, create a Statement instance in the following manner:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
                            java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
The combination of a forward-only, read-only result set, with a fetch size of Integer.MIN_VALUE serves as a signal to the driver to stream result sets row-by-row. After this, any result sets created with the statement will be retrieved row-by-row.
There are some caveats with this approach. You must read all of the rows in the result set (or close it) before you can issue any other queries on the connection, or an exception will be thrown.
The earliest the locks these statements hold can be released (whether they be MyISAM table-level locks or row-level locks in some other storage engine such as InnoDB) is when the statement completes.
If using streaming results, process them as quickly as possible if you want to maintain concurrent access to the tables referenced by the statement producing the result set.
So, with a streaming result set, write your ORDER BY query and then start writing the results into your CSV file.
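Putting the pieces together, a minimal sketch might look like this (the connection URL, query, and file name are invented; your real report query goes in the executeQuery call):
import java.io.FileWriter;
import java.io.PrintWriter;
import java.sql.*;

public class StreamingCsvExport {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:mysql://localhost:3306/reports", "user", "pass");
             Statement stmt = conn.createStatement(
                 ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            stmt.setFetchSize(Integer.MIN_VALUE); // signal row-by-row streaming
            try (ResultSet rs = stmt.executeQuery(
                     "SELECT id, final_value FROM report_view ORDER BY final_value");
                 PrintWriter out = new PrintWriter(new FileWriter("report.csv"))) {
                while (rs.next()) {
                    // Each row is written as it arrives; nothing accumulates on the heap.
                    out.println(rs.getLong("id") + "," + rs.getBigDecimal("final_value"));
                }
            }
        }
    }
}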
This still probably doesn't solve the sorting issue, but I think if you can't pre-generate that value and put an index on it, the sorting is going to take some time.
However, there might be some server config variables that you can use to optimize the sorting performance.
From the MySQL Order-By optimization page
I think you can set the read_rnd_buffer_size value; according to the docs:
Setting the variable to a large value can improve ORDER BY performance by a lot
Another one is sort_buffer_size, about which the docs say the following:
If you see many Sort_merge_passes per second in SHOW GLOBAL STATUS output, you can consider increasing the sort_buffer_size value to speed up ORDER BY or GROUP BY operations that cannot be improved with query optimization or improved indexing.
Another variable that can probably help is innodb_buffer_pool_size, which allows InnoDB to keep as much table data in memory as possible and avoid some disk seeks.
However, all of these variables require some tuning: some trial and error, and probably some benchmarking, to get right.
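For instance, a session-level experiment could look like this (the sizes are made-up starting points, not recommendations):
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

class SortTuning {
    // Raise the sort-related buffers for the current session only.
    static void raiseSortBuffers(Connection conn) throws SQLException {
        try (Statement s = conn.createStatement()) {
            s.execute("SET SESSION sort_buffer_size = 8 * 1024 * 1024");     // 8 MB
            s.execute("SET SESSION read_rnd_buffer_size = 4 * 1024 * 1024"); // 4 MB
            // innodb_buffer_pool_size is server-wide; change it in my.cnf,
            // not per session.
        }
    }
}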
There are some other suggestions on that MySQL Order-By optimization page as well.
Use a temporary table to store your select result with an index on finalValue. This will store and index your intermediate result.
CREATE TEMPORARY TABLE my_temp_table (INDEX my_index_name (finalValue))
SELECT ... -- your select
Note that complex expressions require an alias in your SELECT to be used as part of a CREATE TABLE ... SELECT. I assume that your SELECT has the alias finalValue (the column you mentioned).
Then select from the temporary table, ordered by finalValue (the index will be used).
SELECT * FROM my_temp_table ORDER BY finalValue;
And finally drop the temporary table (or reuse it if you want, but remember that temporary data is automatically deleted when the client session terminates).
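A hedged JDBC sketch of that three-step flow (yourComplexSelect stands in for the report query, which must alias its sort expression as finalValue):
import java.sql.*;

class TempTableSort {
    static void printSorted(Connection conn, String yourComplexSelect) throws SQLException {
        try (Statement st = conn.createStatement()) {
            // 1. Materialize and index the intermediate result.
            st.execute("CREATE TEMPORARY TABLE my_temp_table "
                     + "(INDEX my_index_name (finalValue)) " + yourComplexSelect);
            // 2. Read it back in order; the index satisfies the ORDER BY.
            try (ResultSet rs = st.executeQuery(
                     "SELECT * FROM my_temp_table ORDER BY finalValue")) {
                while (rs.next()) {
                    System.out.println(rs.getBigDecimal("finalValue"));
                }
            }
            // 3. Clean up (temporary data would also be dropped at session end).
            st.execute("DROP TEMPORARY TABLE my_temp_table");
        }
    }
}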
Summary tables. (Let's see more details to be sure this is data-warehouse-type data.) Summary tables are augmented periodically with subtotals and counts. Then, when the report is needed, the data is readily available almost directly from the summary table, rather than scanning lots of raw data and doing aggregates.
My blog on Summary Tables. Let's see your schema and report query; we can discuss this in more detail.

Documentum: ORDER BY r_modify_date takes too long

I'm new to Documentum and have a simple problem: I am trying to retrieve all records ordered by last modified date.
Basically, I have a datatable with 1000 records.
Currently we use
Select * from docfolder ENABLE (FETCH_ALL_RESULTS 1000)
The problem with the above statement is that a newly created or modified report sometimes falls outside the 1000-row range, and our users complain that the report is not found (a valid complaint).
Actually, the last modified record does not even need to be first on the list; it just needs to appear.
I tried using
Select * from docfolder order by r_modify_date ENABLE (FETCH_ALL_RESULTS 1000)
but this takes too long (it never completes). I tried replacing * with a, b, c, d (fields), but that does not work either.
May I know if there are other solutions to my issue?
I am considering the Documentum ENABLE (RETURN_TOP 10) hint, but I doubt it works with Oracle 11g, and how does Documentum define the top 1000?
UPDATE: It seems that using a database link via Toad is faster than using DQL, but I need a DQL solution due to legacy issues.
Documentum 6.0 and Oracle 11g.
What version of Documentum are you using?
Ensure that there is an index on r_object_id. You may also want to add an index on r_modify_date.
Further, when adding fields a, b, c, d, ensure that these fields are non-repeating. That way, Documentum will not need to join the _r table, making the overall query faster.
Further, in DA, if you run the query, you can see the actual SQL query passed to Oracle. Take this query, run it in Toad, and look for optimizations. You may also register the _s table so that you can query it directly with DQL.
I managed to solve this problem by querying the underlying table in the Oracle database.
The reason for the slow performance was the join being performed behind the scenes to obtain the result.
In the future, if you have exhausted all ways to optimize your DQL, just fall back to querying the Oracle database directly.
I have recommended that all table views and searches query via Oracle.
Only individual reports are retrieved via Documentum; sometimes I question the purpose of having Documentum.

Hibernate: Enforce use of Index

I have a table that has a well-defined index. What I understand from
@org.hibernate.annotations.Table(appliesTo = "tableName", indexes = { @Index(name = "...", columnNames = "...") })
is that it creates an index. Now, will doing this and mentioning the column names used in the actual Oracle DB index give me optimal results, or is the index never used? How do I use the index explicitly in HQL? Also, how do I ascertain that the index is being used?
It depends on what DBMS you are using. For example, in Oracle, you cannot control which index is used through the SQL if you are using the cost-based optimizer. So it is even less likely you can do anything related to this through HQL.
Checking whether indexes are used also depends on the DBMS. Normally I get the actual SQL issued to the DBMS by dumping it to a log through Hibernate or another JDBC logging tool (e.g. JdbcDsLog), and then view the execution plan of that SQL.
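On the Hibernate side, a minimal sketch of switching on SQL logging (the two property names are standard Hibernate settings; the helper itself is illustrative):
import java.util.Properties;
import org.hibernate.cfg.Configuration;

class ShowSqlConfig {
    // Print every generated SQL statement so it can be captured and fed
    // to the DBMS's execution-plan facility by hand.
    static Configuration withSqlLogging(Configuration cfg) {
        Properties props = new Properties();
        props.setProperty("hibernate.show_sql", "true");   // echo SQL to stdout
        props.setProperty("hibernate.format_sql", "true"); // pretty-print it
        return cfg.addProperties(props);
    }
}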

JDBC setMaxRows database usage

I am trying to write a database-independent application with JDBC. I now need a way to fetch the top N entries from some table. I saw there is a setMaxRows method in JDBC, but I don't feel comfortable using it, because I am scared that the database will produce all the results and only the JDBC driver will reduce the result set. If I need the top 5 results from a table with a billion rows, this will break my neck (the table has a usable index).
Writing special SQL statements for every kind of database isn't very nice, but it lets the database do clever query planning and stop fetching more results than necessary.
Can I rely on setMaxRows to tell the database not to do too much work?
I guess in the worst case I can't rely on this working the way I hope. I'm mostly interested in Postgres 9.1 and Oracle 11.2, so if someone has experience with these databases, please step forward.
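For reference, the pattern in question is simply this (connection details and table name are invented):
import java.sql.*;

public class TopNDemo {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost:5432/mydb", "user", "pass");
             Statement stmt = conn.createStatement()) {
            // Cap the result set at 5 rows -- but is the cap enforced by the
            // server or only by the driver? That is the question.
            stmt.setMaxRows(5);
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM tbl ORDER BY col1")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}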
will let the database do clever query planning and stop fetching more results than necessary.
If you use
PostgreSQL:
SELECT * FROM tbl ORDER BY col1 LIMIT 10; -- slow without index
Or:
SELECT * FROM tbl LIMIT 10; -- fast even without index
Oracle:
SELECT *
FROM (SELECT * FROM tbl ORDER BY col1 DESC)
WHERE ROWNUM <= 10;
.. then only 10 rows will be returned. But if you sort your rows before picking the top 10, basically all qualifying rows have to be read before they can be sorted.
Matching indexes can prevent this overhead!
If you are unsure what JDBC actually sends to the database server, run a test and have the database engine log the statements it receives. In PostgreSQL you can set, in postgresql.conf:
log_statement = all
(and reload) to log all statements sent to the server. Be sure to reset that setting after the test, or your log files may grow huge.
The thing which could kill you with billion(s) of rows is the (highly likely) ORDER BY clause in your query. If this order cannot be established using an index, then . . . it'll break your neck :)
I would not depend on the JDBC driver here. As a previous comment suggests, it's unclear what it really does (and this may differ between RDBMSs).
If you are concerned about the speed of your query, you can use a LIMIT clause as well. If you use LIMIT, you can at least be sure that it's passed on to the DB server.
Edit: Sorry, I was not aware that Oracle doesn't support LIMIT.
In direct answer to your question regarding PostgreSQL 9.1: Yes, the JDBC driver will tell the server to stop generating rows beyond what you set.
As others have pointed out, depending on indexes and the plan chosen, the server might scan a very large number of rows to find the five you want. Proper server configuration can help it model the costs accurately and prevent this, but if the value distribution is unusual, you may need to introduce an optimization barrier (such as a CTE) to coerce the planner into producing a good plan.
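A sketch of that last idea (table and column names invented): on PostgreSQL 9.1 a CTE is always materialized, so it acts as a fence that makes the planner optimize the inner query independently of the outer LIMIT.
import java.sql.*;

class CteFenceDemo {
    static void printTopFive(Connection conn) throws SQLException {
        // The CTE is planned and materialized on its own; the outer LIMIT
        // cannot be pushed down into it.
        String sql = "WITH ordered AS (SELECT * FROM tbl ORDER BY col1) "
                   + "SELECT * FROM ordered LIMIT 5";
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}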

Migrating to Oracle

We have a Java EE application, and we are currently using Informix DB. Our code hits the DB with queries like
"select first 10 * from test"
Now, as far as I know, Oracle does not support 'first 10 *'-style statements. We have more than 1000 queries like this. Should we change them all manually, or is there some way to customize this automatically?
This is a good reason for either using only standard SQL as much as possible, or for isolating those dependencies into stored procedures (yes, I know that doesn't help you in this specific case; I just thought I'd mention it for future reference).
I suspect you'll have to change each one individually, although a simple search over your source code for "select " or "first " will be a good start.
Then you can decide how you want to change them, since you may also still want them to work on Informix.
For what it's worth, I think you get the same effect with Oracle's
select * from ( select * from mytable ) where rownum <= 10
I would farm the job of dynamically constructing a query (based on a template) out to another layer, which can return a different query depending on which database you have configured. Then, when you also want to support DB2 (for example), it's a simple matter of changing just that layer.
For example, have a call like:
gimmeRowLimitedSqlQuery ("* from test",10);
which would give you either of:
select first 10 * from test
select * from test where rownum <= 10
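A hedged sketch of such a layer (the method name comes from the example above; the class and enum are invented for illustration):
class SqlDialectLayer {
    enum Dialect { INFORMIX, ORACLE }

    private final Dialect dialect;

    SqlDialectLayer(Dialect dialect) {
        this.dialect = dialect;
    }

    // Returns a row-limited query in the syntax of the configured database.
    String gimmeRowLimitedSqlQuery(String selectBody, int limit) {
        switch (dialect) {
            case INFORMIX:
                return "select first " + limit + " " + selectBody;
            case ORACLE:
                return "select " + selectBody + " where rownum <= " + limit;
            default:
                throw new IllegalStateException("Unsupported dialect: " + dialect);
        }
    }
}
So new SqlDialectLayer(Dialect.ORACLE).gimmeRowLimitedSqlQuery("* from test", 10) yields the second form above; a real implementation would need to be smarter about queries that already contain a WHERE clause.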
I should also mention, although I realise your query is just an example, that SQL can return rows in any order if you don't specify order by, so
select first 10 * from test
makes little sense, especially if you may be running it on different DBMSs.
You could write an extension to the JDBC driver to modify the queries on the fly, but that is probably overkill, so a careful search-and-replace on the source code to modify all the queries would be more appropriate.
Oracle has the concept of ROWNUM for limiting results. You will have to update your queries for this.
Top-N and pagination queries are a little more complex than just using ROWNUM. For example, you might be surprised that you don't get the expected results when using ROWNUM with ORDER BY in the same query.
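The classic illustration of that gotcha (table and column invented): ROWNUM is assigned before ORDER BY is applied, so the filter has to go in an outer query.
class RownumPagination {
    // WRONG: ROWNUM is assigned as rows are produced, before the ORDER BY,
    // so this picks 10 arbitrary rows and only then sorts those 10.
    static final String WRONG =
        "SELECT * FROM test WHERE ROWNUM <= 10 ORDER BY created_at";

    // RIGHT: sort in a subquery first, then filter -- the true top 10.
    static final String RIGHT =
        "SELECT * FROM (SELECT * FROM test ORDER BY created_at) "
      + "WHERE ROWNUM <= 10";
}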
Check http://www.oracle.com/technology/oramag/oracle/07-jan/o17asktom.html for more info on those types of queries in Oracle.
