I came across this scenario where I have to insert some 100 rows into my table using the Java application that I support.
I do not want my application to hit the DB every time with the insert query to do this.
Can you suggest a way to insert all 100 rows into that table with a single DB hit?
You can use JDBC batch processing for this purpose.
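A minimal sketch of what that could look like, assuming a hypothetical table my_table(col1, col2) and a Row type holding your data; with addBatch/executeBatch the rows are queued locally and sent to the server as one batch instead of 100 separate statements:

// Hypothetical sketch: url, user, pass, rows and the Row type are placeholders.
String sql = "INSERT INTO my_table (col1, col2) VALUES (?, ?)";
try (Connection con = DriverManager.getConnection(url, user, pass);
     PreparedStatement ps = con.prepareStatement(sql)) {
    con.setAutoCommit(false);
    for (Row row : rows) {              // the 100 records to insert
        ps.setString(1, row.getCol1());
        ps.setString(2, row.getCol2());
        ps.addBatch();                  // queue the insert locally
    }
    ps.executeBatch();                  // send the whole batch to the DB
    con.commit();
}

Whether this is literally one network round trip depends on the driver; for example, MySQL's Connector/J only rewrites the batch into a multi-row insert when rewriteBatchedStatements=true is set on the JDBC URL.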
I have a very large table in the database. The table has a column called
"unique_code_string", and it has almost 100,000,000 records.
Every 2 minutes, I receive 100,000 code strings; they come in an array and are unique among themselves. I need to insert them into the large table if they are all "good".
The meaning of "good" is this:
None of the 100,000 codes in the array already occurs in the large database table.
If one or more codes already occur in the large table, the whole array will not be used at all;
that means no codes from the array will be inserted into the large table.
Currently, I do it this way:
First, I loop over the array and check each code to see whether the same code already exists in the large database table.
Second, if all the codes are "new", I do the real insert.
But this approach is very slow, and I must finish everything within 2 minutes.
I am thinking of other ways:
Concatenate the 100,000 codes into a SQL IN clause. Each code is 32 characters long, and I think no database will accept an IN clause that is 32 * 100,000 characters long.
Use a database transaction: force-insert the codes anyway and roll the transaction back if an error happens. This causes some performance issues.
Use a database temporary table. I am not good at writing SQL queries, so please give me an example if this idea can work.
Now, can any expert give me some advice or a solution?
I am not a native English speaker; I hope you can see the issue I am facing.
Thank you very much.
Load the 100,000 rows into a table!
Create a unique index on the original table:
create unique index unq_bigtable_uniquecodestring on bigtable (unique_code_string);
Now, you have the tools you need. I think I would go for a transaction, something like this:
insert into bigtable ( . . . )
select . . .
from smalltable;
If any row fails (due to the unique index), then the transaction will fail and nothing is inserted. You can also be explicit:
insert into bigtable ( . . . )
select . . .
from smalltable
where not exists (select 1
                  from smalltable st join
                       bigtable bt
                       on st.unique_code_string = bt.unique_code_string
                 );
For this version, you should also have an index/unique constraint on smalltable(unique_code_string).
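If you drive this from Java, a rough JDBC sketch of the transactional approach might look like the following; it assumes the 100,000 codes have already been bulk-loaded into smalltable and that unique_code_string is the only column involved (url, user and pass are placeholders):

// Hypothetical sketch: either all 100,000 codes are inserted, or none are.
try (Connection con = DriverManager.getConnection(url, user, pass)) {
    con.setAutoCommit(false);
    try (Statement stmt = con.createStatement()) {
        stmt.executeUpdate(
            "insert into bigtable (unique_code_string) " +
            "select unique_code_string from smalltable");
        con.commit();     // all 100,000 codes inserted
    } catch (SQLException e) {
        con.rollback();   // a duplicate hit the unique index: nothing is inserted
        throw e;
    }
}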
It's hard to find an optimal solution with so little information. Often it depends on the network latency between the application and the database server, and on the available hardware resources.
You can load the 100,000,000 unique_code_string values from the database and use a HashSet or TreeSet to de-duplicate in memory before inserting into the database. If your database server is resource-constrained or there is considerable network latency, this might be faster.
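A rough sketch of that in-memory check, assuming con is an open JDBC connection and incomingCodes holds the 100,000 new codes (note that keeping 100,000,000 32-character strings in a HashSet needs several gigabytes of heap):

// Hypothetical sketch: check the incoming codes against an in-memory set of existing codes.
Set<String> existing = new HashSet<>(200_000_000);
try (Statement stmt = con.createStatement();
     ResultSet rs = stmt.executeQuery("select unique_code_string from bigtable")) {
    while (rs.next()) {
        existing.add(rs.getString(1));
    }
}
boolean allNew = true;
for (String code : incomingCodes) {
    if (existing.contains(code)) {
        allNew = false;
        break;
    }
}
if (allNew) {
    // batch-insert incomingCodes and add them to 'existing' for the next cycle
}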
Depending on how you receive the 100,000-record delta, you could load it straight into the database, e.g. a CSV file can be read using an external table. If you can get the data efficiently into a temporary table and the database server is not overloaded, you can do the check and insert very efficiently with SQL or a stored procedure.
You should spend some time understanding how real-time the update has to be, e.g. how many SQL queries are reading the 100,000,000-row table and whether you can allow some of these queries to be cancelled or blocked while you update the rows. Often it's a good idea to create a shadow table:
Create new table as copy of the existing 100,000,000 rows table.
Disable the indexes on the new table
Load the delta rows to the new table
Rebuild the indexes on new table
Delete the existing table
Rename the new table to the existing 100,000,000 rows table
The approach here is database-specific. It will depend on how your database defines the indexes, e.g. if you have a partitioned table it might not be necessary.
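As a hedged sketch only, assuming MySQL and a table called bigtable (the statements and names are illustrative, and on InnoDB you would drop and re-create the indexes instead of using DISABLE/ENABLE KEYS):

// Hypothetical sketch of the shadow-table swap; stmt is a java.sql.Statement.
stmt.execute("CREATE TABLE bigtable_new LIKE bigtable");          // 1. copy of the existing table
stmt.execute("ALTER TABLE bigtable_new DISABLE KEYS");            // 2. skip non-unique index maintenance during the load
stmt.execute("INSERT INTO bigtable_new SELECT * FROM bigtable");  //    copy the existing rows
// 3. load the delta rows into bigtable_new here (batch insert or LOAD DATA)
stmt.execute("ALTER TABLE bigtable_new ENABLE KEYS");             // 4. rebuild the indexes
stmt.execute("RENAME TABLE bigtable TO bigtable_old, bigtable_new TO bigtable"); // 5+6. atomic swap
stmt.execute("DROP TABLE bigtable_old");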
I have one table. From Java I am inserting records into the table in batches (batch size: 5,000).
There is no idle time on the Java side.
Can you please let me know how much time it will take to insert 500,000 records (with indexes)?
How much time will it take to insert 500,000 records (without indexes)?
It depends on your hardware and your tables. For a single table with 20 columns and no BLOB columns, it should take around 3-4 minutes.
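For reference, the batching itself usually has roughly this shape (table/column names, con, rows and the Row type are placeholders); with MySQL's Connector/J, setting rewriteBatchedStatements=true on the JDBC URL can speed batched inserts up considerably:

// Hypothetical sketch: insert 500,000 rows, flushing every 5,000 to bound memory.
String sql = "INSERT INTO my_table (c1, c2) VALUES (?, ?)";
try (PreparedStatement ps = con.prepareStatement(sql)) {
    con.setAutoCommit(false);
    int count = 0;
    for (Row r : rows) {
        ps.setString(1, r.getC1());
        ps.setString(2, r.getC2());
        ps.addBatch();
        if (++count % 5000 == 0) {
            ps.executeBatch();          // flush the current batch of 5,000
        }
    }
    ps.executeBatch();                  // flush the remainder
    con.commit();
}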
I am using MySQL tables and would like to fetch more than 10 million rows in a single query for reporting purposes.
My table might contain foreign keys as well. I am using Hibernate queries for fetching the data. The querying part alone is taking around 20-30 seconds.
Is there a way I can optimise this? Will indexing the tables in MySQL be of any help?
Have you tried setFetchSize? Hibernate is not optimized for such large queries by default, so try to play around with setFetchSize (http://www.blogbyben.com/2007/07/hibernate-performance-tip-setfetchsize.html)
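A rough sketch using the classic Hibernate API (ReportRow is a made-up entity and sessionFactory an existing SessionFactory; on MySQL's Connector/J you may additionally need a fetch size of Integer.MIN_VALUE, or useCursorFetch=true, before the driver actually streams rows):

// Hypothetical sketch: stream a large result set instead of loading it all at once.
Session session = sessionFactory.openSession();
try {
    ScrollableResults results = session.createQuery("from ReportRow")
            .setReadOnly(true)
            .setFetchSize(1000)                    // hint passed down to the JDBC driver
            .scroll(ScrollMode.FORWARD_ONLY);
    while (results.next()) {
        ReportRow row = (ReportRow) results.get(0);
        // ... write the row to the report ...
        session.evict(row);                        // keep the first-level cache from growing
    }
} finally {
    session.close();
}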
I have an application that handles more than 10,000,000 rows.
MainTable has more than 10,000,000 rows.
I am inserting the data into SubTable from MainTable as:
INSERT INTO SubTable(Value1,Value2)
SELECT Value1,Value2 FROM MainTable
GROUP BY Value1_ID;
After performing certain processing in SubTable, I update the new values back into MainTable as:
UPDATE MainTable inf,SubTable in
SET inf.Value1=in.Value1, inf.Value2=in.Value2
WHERE inf.Value1_ID= in.Value1_ID;
While running this query the entire server gets very slow and it blocks all the other transactions. I am using a JDBC DriverManager connection here. How can I avoid this? How can I solve this problem?
If it's something that you have to do only once in a while, instead of updating the whole table in a single update, you can set up a small script that updates a batch of rows every few seconds/minutes or so. The other processes will have their queries executed freely between two updates.
For example, by updating a batch of 100,000 rows every minute, if your tables have the right indexes, the whole job would take 1-2 hours, but with a far smaller impact on performance.
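A hedged sketch of such a chunked update (MySQL multi-table UPDATE syntax assumed; minId/maxId are the smallest and largest Value1_ID values in SubTable, and con is an open connection with autoCommit off):

// Hypothetical sketch: update MainTable from SubTable in chunks of 100,000 ids.
String sql = "UPDATE MainTable inf JOIN SubTable sub ON inf.Value1_ID = sub.Value1_ID "
           + "SET inf.Value1 = sub.Value1, inf.Value2 = sub.Value2 "
           + "WHERE inf.Value1_ID BETWEEN ? AND ?";
try (PreparedStatement ps = con.prepareStatement(sql)) {
    long chunkSize = 100_000;
    for (long start = minId; start <= maxId; start += chunkSize) {
        ps.setLong(1, start);
        ps.setLong(2, start + chunkSize - 1);
        ps.executeUpdate();
        con.commit();                                  // release locks after each chunk
        try { Thread.sleep(60_000); }                  // pause so other queries can run
        catch (InterruptedException ie) { Thread.currentThread().interrupt(); break; }
    }
}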
The other solution would be to do the update when the activity on the server is at its lowest (maybe during the weekends?); that way you won't impact the other processes as much.
I have an existing query in the system, which is a simple select query as follows:
SELECT <COLUMN_X>, <COLUMN_Y>, <COLUMN_Z> FROM TABLE <WHATEVER>
Over time, <WHATEVER> is growing in terms of records. Is there any way to improve the performance here? The developer is using the Statement interface. I believe PreparedStatement won't help here, since the query is executed only once.
Is there anything else that can be done? One of the columns is a primary key and the others are VARCHAR (if that information helps).
Does your query have any predicates? Or are you always returning all of the rows from the table?
If you are always returning all the rows, a covering index on column_x, column_y, column_z would allow Oracle to merely scan the index rather than doing a table scan. The query will still slow down over time but the index should grow more slowly than the table.
If you are returning a subset of rows, there are potentially other indexes that would be more advantageous from a performance perspective.
Are there any optimizations you can do outside of SQL query tuning? If yes, here are some suggestions:
Try putting the table in memory (like the MEMORY storage engine in MySQL) or apply any other optimization in the DB.
Cache the ResultSet in Java and query again only when the table content changes. If the table only has inserts and no updates or deletes (wishful thinking), then you can use SELECT COUNT(*) FROM table; if the count returned is different from the previous time, fire your original query and update the cache.
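A rough sketch of that caching idea (the table name, the MyRow type and the loadAllRows helper are placeholders; remember a stable COUNT(*) only proves nothing changed if the table is insert-only):

// Hypothetical sketch: cache the rows and refresh only when the row count changes.
private List<MyRow> cachedRows;
private long cachedCount = -1;

List<MyRow> getRows(Connection con) throws SQLException {
    try (Statement stmt = con.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM whatever")) {
        rs.next();
        long count = rs.getLong(1);
        if (cachedRows == null || count != cachedCount) {
            cachedRows = loadAllRows(con);  // runs the original SELECT column_x, column_y, column_z
            cachedCount = count;
        }
    }
    return cachedRows;
}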