I have a existing query in the system which is a simple select query as follows:
SELECT <COLUMN_X>, <COLUMN_Y>, <COLUMN_Z> FROM TABLE <WHATEVER>
Over time, <WHATEVER> is growing in terms of records. Is there any way possible to improve the performance here? The developer is using Statement interface. I believe PreparedStatement won't help here since the query is executed only once.
Is there any thing else that can be done? One of the columns is a primary key and others are VARCHAR (if the information helps)
Does you query have any predicates? Or are you always returning all of the rows from the table?
If you are always returning all the rows, a covering index on column_x, column_y, column_z would allow Oracle to merely scan the index rather than doing a table scan. The query will still slow down over time but the index should grow more slowly than the table.
If you are returning a subset of rows, there are potentially other indexes that would be more advantageous from a performance perspective.
Are there any optimization you can do outside of the SQL query tunning? If yes here are some suggestion:
Try putting the table in memory (like the MEMORY storage engine in MySQL) or any other optimization in the DB
Cache the ResultSet in java. query again only when the table content changes. If the table only has inserts and no updates or delete (wishful thinking), then you can use SELECT COUNT(*) FROM table. If the rows returned are different than the previous time then fire your original query and update cache only if needed.
Related
I have a very large table in the database, the table has a column called
"unique_code_string", this table has almost 100,000,000 records.
Every 2 minutes, I will receive 100,000 code string, they are in an array and they are unique to each other. I need to insert them to the large table if they are all "good".
The meaning of "good" is this:
All 100,000 codes in the array never occur in the database large table.
If one or more codes occur in the database large table, the whole array will not use at all,
it means no codes in the array will insert into the large table.
Currently, I use this way:
First I do a loop and check each code in the array to see if there is already same code in the database large table.
Second, if all code is "new", then, I do the real insert.
But this way is very slow, I must finish all thing within 2 minutes.
I am thinking of other ways:
Join the 100,000 code in a SQL "in clause", each code has 32 length, I think no database will accept this 32*100,000 length "in clause".
Use database transaction, I force insert the codes anyway, if error happens, the transaction rollback. This cause some performance issue.
Use database temporary table, I am not good at writing SQL querys, please give me some example if this idea may work.
Now, can any experts give me some advice or some solutions?
I am a non-English speaker, I hope you see the issue I am meeting.
Thank you very much.
Load the 100,000 rows into a table!
Create a unique index on the original table:
create unique index unq_bigtable_uniquecodestring on bigtable (unique_code_string);
Now, you have the tools you need. I think I would go for a transaction, something like this:
insert into bigtable ( . . . )
select . . .
from smalltable;
If any row fails (due to the unique index), then the transaction will fail and nothing is inserted. You can also be explicit:
insert into bigtable ( . . . )
select . . .
from smalltable
where not exists (select 1
from smalltable st join
bigtable bt
on st.unique_code_string = bt.unique_code_string
);
For this version, you should also have an index/unique constraint on smalltable(unique_code_string).
It's hard to find an optimal solution with so little information. Often this depends on the network latency between application and database server and hardware resources.
You can load the 100,000,000 unique_code_string from the database and use HashSet or TreeSet to de-duplicate in-memory before inserting into the database. If your database server is resource constrained or there is considerable network latency this might be faster.
Depending how your receive the 100,000 records delta you could load it into the database e.g. a CSV file can be read using external table. If you can get the data efficiently into a temporary table and database server is not overloaded you can do it very efficiently with SQL or stored procedure.
You should spend some time to understand how real-time the update has to be e.g. how many SQL queries are reading the 100,000,000 row table and can you allow some of these SQL queries to be cancelled or blocked while you update the rows. Often it's a good idea to create a shadow table:
Create new table as copy of the existing 100,000,000 rows table.
Disable the indexes on the new table
Load the delta rows to the new table
Rebuild the indexes on new table
Delete the existing table
Rename the new table to the existing 100,000,000 rows table
The approach here is database specific. It will depend on how your database is defining the indexes e.g. if you have a partitioned table it might be not necessary.
Trying to write a job that executes SQL query in Java using JDBC drivers (the DB vendors can be either Oracle, DB2 or Postgres).
The query does not really matter. Let’s say it filters on certain values in few columns in 1 DB table and the result is few thousand rows.
For each row in the ResultSet I need to do some logic and sometimes that can fail.
I have a cursor position so, I “remember” last successfully processed row position.
Now I want to implement a “Resume” functionality in case of failure in order not to process again the entire ResultSet.
I went to JDBC spec of Java 8 and found nothing about the order of the rows (is it the same for the same query on the same data or not)?
Also failed to find anything in DB vendors specs.
Anyone who could hint where to look for the answer about row order predictability?
You can guarantee the order of rows by including an ORDER BY clause that includes all of the columns required to uniquely identify a row. In fact, that's the only way to guarantee the order from repeated invocations of a SELECT statement, even if nothing has changed in the database. Without an unambiguous ORDER BY clause the database engine is free to return the rows in whatever order is most convenient for it at that particular moment.
Consider a simple example:
You are the only user of the database. The database engine has a row cache in memory that can hold the last 1000 rows retrieved. The database server has just been restarted, so the cache is empty. You SELECT * FROM tablename and the database engine retrieves 2000 rows, the last 1000 of which remain in the cache. Then you do SELECT * FROM tablename again. The database engine checks the cache and finds the 1000 rows from the previous query, so it immediately returns them because in doing so it won't have to hit the disk again. Then it proceeds to go find other 1000 rows. The net result is that the 1000 rows that were returned last for the initial SELECT are actually returned first for the subsequent SELECT.
I want to fetch a particular value from database in java. I used the following command in prepared statement:
Select Pname from table where pid=458;
The table contains around 50,000 rows and taking more time to fetch, please help me to get the data faster.
i used index and then i bind the variable also but it reduce the execution time only few seconds, i need more efficient. Is there any way to retrieve data faster???
Index your database table for pid, it will make the search faster.
Indexes are used to quickly locate data without having to search every row in a database table every time a database table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.
SQL Server
CREATE TABLE MyCustomers (CustID int, CompanyName nvarchar(50));
CREATE UNIQUE INDEX idxCustId ON MyCustomers (CustId);
References
https://msdn.microsoft.com/en-us/library/ms188783.aspx
https://technet.microsoft.com/en-us/library/ms345331(v=sql.110).aspx
Create index on field pid in your table.
Use bind variables in queries.
Use prepared statement instead of statement in Java, that will use bind variables.
pstatement = conn.prepareStatement("Select Pname from table where pid = ?");
This ensures that the SQl is pre compiled and hence runs faster.
However, you are likely to gain more performance improvement by index than bind variables .
I'm currently using the following query to insert into a table only if the record does not already exist, presumably this leads to a table scan. It inserts 28000 records in 10 minutes:
INSERT INTO tblExample(column)
(SELECT ? FROM tblExample WHERE column=? HAVING COUNT(*)=0)
If I change the query to the following, I can insert 98000 records in 10 minutes:
INSERT INTO tblExample(column) VALUES (?)
But it will not be checking whether the record already exists.
Could anyone suggest another way of querying such that my insert speed is faster?
One simple solution (but not recommended) could be to simply have insert statement, catch duplicate key exception and log them. Assuming that the table has unique key constraint.
Make sure that you have an index on the column[s] you're checking. In general, have a look at the query execution plan that the database is using - this should tell you where the time is going, and so what to do about it.
For Derby db this is how you get a plan and how to read it.
Derby also has a merge command, which can act as insert-if-not-there. I've not used it myself, so you'd need to test it to see if it's faster for your circumstances.
I am new to MySql database. I've large table(ID,...). I select ID frequently with java code and.And that make a heavy load on transaction
select from tableName where ID=someID
notes:
1.Database could be 100,000 records
2.I can't cache result
3.ID is a primary key
4.I try to optimize time needed to return result from query.
Any ideas for optimization ?
thanks in advance
I fail to see the need to optimize. This is a simple query against a very tiny table in database terms and the item inthe where clause is a PK and thus indexed. This should run very fast.
Have you considere partitioning? Improving Database Performance with Partitioning.
If you change the query to use a parameter, it might be a bit more efficient. The server would not have to parse and semantic check the statement each time.
select * from tableName where ID = #someID
Then assign the parameter value for each execution. Here is an explanation of using prepared statements.