I am new to the MySQL database. I have a large table (ID, ...). I select by ID frequently from Java code, and that puts a heavy load on transactions:
select * from tableName where ID = someID
Notes:
1. The database could hold 100,000 records.
2. I can't cache the results.
3. ID is the primary key.
4. I am trying to optimize the time needed to return a result from the query.
Any ideas for optimization?
Thanks in advance.
I fail to see the need to optimize. This is a simple query against a very tiny table in database terms, and the item in the WHERE clause is a PK and thus indexed. This should run very fast.
Have you considered partitioning? See Improving Database Performance with Partitioning.
If you change the query to use a parameter, it might be a bit more efficient. The server would not have to parse and semantically check the statement each time.
select * from tableName where ID = ?
Then assign the parameter value for each execution. Here is an explanation of using prepared statements.
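For instance, a minimal JDBC sketch (the connection and the list of IDs to look up are assumed to exist; names are placeholders):

import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Prepared once: the server parses and plans the statement a single time.
String sql = "SELECT * FROM tableName WHERE ID = ?";
try (PreparedStatement ps = connection.prepareStatement(sql)) {
    for (long someId : idsToLookUp) {
        ps.setLong(1, someId);              // bind a new value for each execution
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // process the row
            }
        }
    }
}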
Related
I'm currently using the following query to insert into a table only if the record does not already exist, presumably this leads to a table scan. It inserts 28000 records in 10 minutes:
INSERT INTO tblExample(column)
(SELECT ? FROM tblExample WHERE column=? HAVING COUNT(*)=0)
If I change the query to the following, I can insert 98000 records in 10 minutes:
INSERT INTO tblExample(column) VALUES (?)
But it will not be checking whether the record already exists.
Could anyone suggest another way of querying such that my insert speed is faster?
One simple solution (but not recommended) could be to simply run the insert statement, catch the duplicate-key exception, and log it, assuming that the table has a unique key constraint.
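A rough sketch of that approach in JDBC (the connection and the values to insert are assumed; MySQL reports duplicates as SQLIntegrityConstraintViolationException):

import java.sql.PreparedStatement;
import java.sql.SQLIntegrityConstraintViolationException;

String sql = "INSERT INTO tblExample(column) VALUES (?)";
try (PreparedStatement ps = connection.prepareStatement(sql)) {
    for (String value : values) {
        ps.setString(1, value);
        try {
            ps.executeUpdate();             // plain insert, no existence check
        } catch (SQLIntegrityConstraintViolationException e) {
            System.out.println("Duplicate skipped: " + value);  // log and move on
        }
    }
}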
Make sure that you have an index on the column[s] you're checking. In general, have a look at the query execution plan that the database is using - this should tell you where the time is going, and so what to do about it.
For Derby, this is how you get a plan and how to read it.
Derby also has a merge command, which can act as insert-if-not-there. I've not used it myself, so you'd need to test it to see if it's faster for your circumstances.
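Untested, as noted, but a MERGE-based insert-if-not-there might look roughly like this (a sketch using Derby's one-row SYSIBM.SYSDUMMY1 table as the merge source; the table and column names are from the question):

import java.sql.PreparedStatement;

// Insert the value only when no matching row exists; the parameter is
// bound twice, once for the match and once for the insert.
String merge = "MERGE INTO tblExample t "
             + "USING SYSIBM.SYSDUMMY1 s ON t.column = ? "
             + "WHEN NOT MATCHED THEN INSERT (column) VALUES (?)";
try (PreparedStatement ps = connection.prepareStatement(merge)) {
    ps.setString(1, value);
    ps.setString(2, value);
    ps.executeUpdate();
}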
I am trying to delete records in bulk from the database table Student.
Everything is running fine, but my question is:
Is there any limitation when my list (studentIdList in the query below) holds more than 1,000,000 IDs in the given piece of code? Do I need to do anything extra in such a situation?
String hql = "delete from Student where id in (:studentIdList)";
session.createQuery(hql).setParameterList("studentIdList",studentIdList).executeUpdate();
session.flush();
There are a few things to consider.
1.) How the cache will behave, if one is configured.
2.) For 1,000,000 records a load test definitely needs to be done. Are there any chances of an OOM error?
You can try the above HQL with Hibernate batching, then measure and come up with the statistics. Blindly stating the stats is impossible.
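For very large lists, a common precaution (a sketch assuming a plain Hibernate Session, as in the question; the chunk size is arbitrary and should be tuned by measuring) is to delete in chunks so no single IN (...) list grows unbounded:

import java.util.List;

// Delete in chunks so no single IN (...) list exceeds what the database
// or its parser handles comfortably.
int chunkSize = 1000;   // arbitrary; tune by measuring
String hql = "delete from Student where id in (:studentIdList)";
for (int i = 0; i < studentIdList.size(); i += chunkSize) {
    List<Long> chunk =
        studentIdList.subList(i, Math.min(i + chunkSize, studentIdList.size()));
    session.createQuery(hql)
           .setParameterList("studentIdList", chunk)
           .executeUpdate();
}
session.flush();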
Also, instead of using the IN operator, how about the performance of using = in a loop?
WHERE id = 1;
Is transformed to a simple equality filter.
WHERE id IN (1);
Is transformed into an array match of:
WHERE id = ANY(ARRAY[1]);
I suggest you try it; it will be an interesting exercise for you.
After reading more and experimenting: batching will internally issue a final query with the IN operator anyway, so using batch adds an extra step in between. My conclusion is that plain HQL with IN is good to go for the delete operation.
I want the DBMS to help me gain speed when doing a lot of inserts.
Today I do an INSERT query in Java and catch the exception if the data is already in the database.
The exception I get is :
SQLite Exception : [19] DB[1] exec() columns recorddate, recordtime are not unique.
If I get an exception I do a SELECT Query with the primary keys (recorddate, recordtime) and compare the result with the data I am trying to insert in Java. If it is the same I continue with next insert, otherwise I evaluate the data and decide what to save and maybe do an UPDATE.
This process takes time and I would like to speed it up.
I have thought of INSERT IF NOT EXISTS, but this just ignores the insert if there is any data with the same primary keys, am I right? And I want to make sure it is exactly the same data before I ignore the insert.
I would appreciate any suggestions for how to make this faster.
I'm using Java to handle a large amount of data to insert into a SQLite database (SQLite v. 3.7.10). As the connection between Java and SQLite I am using sqlite4java (http://code.google.com/p/sqlite4java/).
I do not think letting the DBMS handle more of that logic would be faster, at least not with plain SQL; as far as I can think of, there is no "create or update" there.
When handling lots of entries, latency is often an important issue, especially with databases accessed over the network, so at least in that case you want to use bulk operations wherever possible. Even if provided, a "create or update" instead of a select followed by an update or insert would only halve the latency.
I realize that is not what you asked for, but I would try to optimize in a different way: process chunks of data, select all of them into a map, then partition the input into creates, updates, and ignores. That way ignores are almost free, and further lookups are guaranteed to happen in memory. It is unlikely that the DBMS can be significantly faster.
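A rough in-memory partitioning sketch (everything here is hypothetical: Row stands in for your record type, key() and equals() for your primary-key and value comparison, and loadChunkFromDb() for the bulk select):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Partition incoming rows against what is already stored.
Map<String, Row> existing = loadChunkFromDb();   // key -> stored row (hypothetical helper)
List<Row> creates = new ArrayList<>();
List<Row> updates = new ArrayList<>();

for (Row incoming : incomingChunk) {
    Row stored = existing.get(incoming.key());
    if (stored == null) {
        creates.add(incoming);           // not in the DB yet
    } else if (!stored.equals(incoming)) {
        updates.add(incoming);           // present but different
    }                                    // else identical: ignore, no DB work at all
}
// Then flush `creates` and `updates` with batched INSERTs and UPDATEs.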
If you are unsure whether that is the right approach for you, profiling the overhead times should help.
Wrap all of your inserts and updates into a transaction. In SQL this will be written as follows.
BEGIN;
INSERT OR REPLACE INTO Table(Col1,Col2) VALUES(Val1,Val2);
COMMIT;
There are two things to note here. First, database pages and commits will not be written to disk until COMMIT is called, which speeds up your queries significantly. Second, the INSERT OR REPLACE syntax does precisely what you want for UNIQUE or PRIMARY KEY fields.
Most database wrappers have a special syntax for managing transactions. You can certainly execute a query, BEGIN, followed by your inserts and updates, and finish by executing COMMIT. Read the database wrapper documentation.
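For example, with plain JDBC and the sqlite-jdbc driver (a sketch; the sqlite4java API the question uses differs, but the BEGIN/COMMIT idea is the same, and the table, columns, and rows collection are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

try (Connection conn = DriverManager.getConnection("jdbc:sqlite:data.db")) {
    conn.setAutoCommit(false);           // acts as BEGIN
    String sql = "INSERT OR REPLACE INTO Table(Col1, Col2) VALUES (?, ?)";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        for (String[] row : rows) {      // rows stands in for your data
            ps.setString(1, row[0]);
            ps.setString(2, row[1]);
            ps.executeUpdate();
        }
    }
    conn.commit();                       // one disk sync instead of one per row
}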
One more thing you can do is switch to Write-Ahead Logging. Run the following command, only once, on the database.
PRAGMA journal_mode = wal;
Without further information, I would:
BEGIN;
UPDATE table SET othervalues=... WHERE recorddate=... AND recordtime=...;
INSERT OR IGNORE INTO table(recorddate, recordtime, ...) VALUES(...);
COMMIT;
The UPDATE will update all existing rows, ignoring nonexistent ones because of the WHERE clause.
The INSERT will then add new rows, ignoring existing ones because of IGNORE.
I have an existing query in the system, which is a simple select query, as follows:
SELECT <COLUMN_X>, <COLUMN_Y>, <COLUMN_Z> FROM TABLE <WHATEVER>
Over time, <WHATEVER> is growing in terms of records. Is there any possible way to improve the performance here? The developer is using the Statement interface. I believe PreparedStatement won't help here, since the query is executed only once.
Is there anything else that can be done? One of the columns is a primary key and the others are VARCHAR (if that information helps).
Does your query have any predicates? Or are you always returning all of the rows from the table?
If you are always returning all the rows, a covering index on column_x, column_y, column_z would allow Oracle to merely scan the index rather than doing a table scan. The query will still slow down over time but the index should grow more slowly than the table.
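For that exact column list, the covering index might be created like this (the index and table names are made up; issued through JDBC to match the rest of the stack):

import java.sql.Statement;

// One-time DDL: an index covering all three selected columns, so the
// query can be answered from the index alone without touching the table.
try (Statement stmt = connection.createStatement()) {
    stmt.execute("CREATE INDEX idx_whatever_cover "
               + "ON whatever (column_x, column_y, column_z)");
}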
If you are returning a subset of rows, there are potentially other indexes that would be more advantageous from a performance perspective.
Are there any optimizations you can do outside of SQL query tuning? If so, here are some suggestions:
Try putting the table in memory (like the MEMORY storage engine in MySQL) or any other optimization in the DB
Cache the ResultSet in Java, and query again only when the table content changes. If the table only has inserts and no updates or deletes (wishful thinking), then you can use SELECT COUNT(*) FROM table: if the returned count differs from the previous one, fire your original query and update the cache; otherwise serve from the cache. A sketch follows.
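A minimal sketch of that count-based cache (all names are made up; runOriginalQuery() stands in for the real SELECT):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.List;

// Re-run the expensive query only when the row count changes
// (valid only for insert-only tables, as noted above).
class CountingCache {
    private long lastCount = -1;
    private List<String[]> cachedRows;

    List<String[]> fetch(Connection conn) throws Exception {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM tableName")) {
            rs.next();
            long count = rs.getLong(1);
            if (count != lastCount) {        // table changed: refresh the cache
                cachedRows = runOriginalQuery(conn);
                lastCount = count;
            }
        }
        return cachedRows;
    }

    private List<String[]> runOriginalQuery(Connection conn) throws Exception {
        // ... execute the original SELECT and materialize the rows ...
        throw new UnsupportedOperationException("fill in the real query");
    }
}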
I have the following configuration:
SQL Server 2008
Java as backend technology - Spring + Hibernate
Basically what I want to do is a select with a where clause on a table. The problem is the table has about 700M entries and the query takes a really long time.
Can you please give some pointers on where to optimize the query, or what sort of techniques I can use to get an improvement in performance?
Thanks.
Using indexes is the standard technique used to deal with this problem. As requested, here are some pointers that should get you started:
http://odetocode.com/articles/70.aspx
http://www.simple-talk.com/sql/learn-sql-server/sql-server-index-basics/
http://www.petri.co.il/introduction-to-sql-server-indexes.htm
The first thing I do in this case is isolate whether the problem is the amount of data I am returning or not (an I/O issue). A simple, non-scientific way to do this is to change your query to just return the count:
select count(*) --just return a count, no data!
from MyTable
inner join MyOtherTable on ...
where ...
If this runs very quickly, it tells you your indexes are in order (assuming no sub-selects in your WHERE clause). If not, then you need to work on indexes, the WHERE clause, or your query construction itself (JOINs being done, etc).
Once that is satisfactory, add back in your SELECT clause. If it is slow, you are going to have to look at your data access pattern:
Can you return fewer columns?
Can you return fewer rows at once?
Is there caching you can do in the application layer?
Is this query a candidate for partitioned/materialized views (if your database supports those)?
I would run Profiler to find the exact query that is being generated. ORMs can create less than optimal queries. Once you know the query, you can run it in SSMS and see the execution plan. This will give you clues as to where you have performance problems.
Several things can cause performance problems:
Lack of correct indexing (foreign keys should be indexed if you have joins, as well as the criteria in the WHERE clause)
Lack of sargability in the WHERE clause, forcing the query to not use existing indexes (see the sketch after this list)
Returning more columns than are needed
Correlated subqueries and scalar functions that cause row-by-agonizing-row operations
Returning too much data (will anybody really be looking at 1 million returned records? You only want to return the amount you show on a page, not the whole possible recordset)
Locking and blocking
There's more (after all, whole very long books have been written on this subject), but that should be enough to get you started on where to look.
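To illustrate the sargability point (the table and column names are invented; the strings are SQL Server flavored):

// Non-sargable: wrapping the indexed column in a function prevents index use.
String slow = "SELECT id FROM orders WHERE YEAR(order_date) = 2023";

// Sargable: a bare column compared against a range can use an index on order_date.
String fast = "SELECT id FROM orders "
            + "WHERE order_date >= '2023-01-01' AND order_date < '2024-01-01'";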
You should provide indexes for those columns you often use to restrict the results. Another thing is pagination of the result set.
Regardless of the specific DB, I would do the following:
run an explain analyze
make sure you have an index for the columns that are part of your where clause
If indexes are OK, it's very likely that you are fetching a lot of records from disk, which is very slow: if you really cannot refine your query so that you fetch fewer records, consider clustering your table to improve the disk locality of your records.