Improving speed of query processing - Java

I'm having major issues with my query processing time. I think it's because the query is getting recompiled every time, but I don't see any way around it.
The following is the query/snippet of code:
private void readPerformance(String startTime, String endTime,
        String performanceTable, String interfaceInput) throws SQLException, IOException {
    String interfaceId, iDescp, iStatus = null;
    String dtime, ingress, egress, newLine, append, routerId = null;
    StringTokenizer st = null;
    stmtD = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
            java.sql.ResultSet.CONCUR_READ_ONLY);
    stmtD.setFetchSize(Integer.MIN_VALUE);
    BufferedReader interfaceRead = new BufferedReader(new FileReader(interfaceInput));
    BufferedWriter pWrite = new BufferedWriter(new FileWriter("performanceInput.txt"));
    while ((newLine = interfaceRead.readLine()) != null) {
        st = new StringTokenizer(newLine, ",");
        while (st.hasMoreTokens()) {
            append = st.nextToken() + CSV + st.nextToken() + st.nextToken() + CSV + st.nextToken();
            System.out.println(append + " ");
            iStatus = st.nextToken().trim();
            interfaceId = st.nextToken().trim();
            append = append + CSV + iStatus + CSV + interfaceId;
            System.out.println(append + " ");
            pquery = "Select d.dtime, d.ifInOctets, d.ifOutOctets from " + performanceTable + "_1_60" + " AS d Where d.id = " + interfaceId
                    + " AND dtime BETWEEN " + startTime + " AND " + endTime;
            rsD = stmtD.executeQuery(pquery);
            /* interface query */
            while (rsD.next()) {
                dtime = rsD.getString(1);
                ingress = rsD.getString(2);
                egress = rsD.getString(3);
                pWrite.write(append + CSV + dtime + CSV + ingress + CSV + egress + NL);
            } // end while
        } // end while
    } // end while
    pWrite.close();
    interfaceRead.close();
    rsD.close();
    stmtD.close();
}
My interfaceId value keeps changing, so I have put the query inside the loop, which results in the query being recompiled many times.
Is there a better way? Can I use a stored procedure in Java? If so, how? I don't have much knowledge of stored procedures.
The current processing time is almost 60 minutes, and the text file being generated is over 300 MB.
Please help! Thank you.

You can use a PreparedStatement and parameters, which may avoid recompiling the query. Since performanceTable is constant, it can be concatenated into the prepared query. The remaining variables, used in the WHERE condition, are set as parameters.
Outside the loop, create a prepared statement, rather than a regular statement:
PreparedStatement stmtD = conn.prepareStatement(
"Select d.dtime,d.ifInOctets, d.ifOutOctets from "+performanceTable+"_1_60 AS d"+
" Where d.id = ? AND dtime BETWEEN ? AND ?");
Then later, in your loop, set the parameters:
stmtD.setString(1, interfaceId);
stmtD.setString(2, startTime);
stmtD.setString(3, endTime);
(All three variables are Strings in the question's code; note that PreparedStatement has setString/setInt etc., but no setInteger.)
ResultSet rsD = stmtD.executeQuery(); // note no SQL passed in here
It may also be a good idea to check the query plan with MySQL's EXPLAIN to see whether the query itself is part of the bottleneck. There is also quite a bit of diagnostic string concatenation going on in the function; once the query is working, removing that may improve performance further.
Finally, note that even if the query is fast, network latency may slow things down. JDBC provides batch execution of multiple statements to help reduce overall latency per statement; see addBatch/executeBatch on Statement and PreparedStatement. (Batching applies to inserts/updates; for repeated SELECTs, the IN-list approach described below achieves a similar effect.)

More information is required, but I can offer some general questions/suggestions. It may have nothing to do with compilation of the query plan (that would be unusual):
Are the id and dtime columns indexed?
How many times does a query get executed in the 60mins?
How much time does each query take?
If the time per query is large then the problem is the query execution itself, not the compilation. Check the indexes as described above.
If there are many, many queries, then the sheer volume of queries may be causing the problem. Using PreparedStatement (see mdma's answer) may help. Or you can batch the interfaceIDs you want by using an IN clause and running one query for every 100 interfaceIDs rather than one for each, as sketched below.
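A rough sketch of that chunked-IN idea (my illustration, not the answerer's code; ids is assumed to hold all interfaceId values read from the input file, and stmtD, performanceTable, startTime and endTime come from the question):
final int CHUNK = 100;
for (int i = 0; i < ids.size(); i += CHUNK) {
    String in = String.join(",", ids.subList(i, Math.min(i + CHUNK, ids.size())));
    String sql = "SELECT d.id, d.dtime, d.ifInOctets, d.ifOutOctets FROM "
            + performanceTable + "_1_60 AS d WHERE d.id IN (" + in + ")"
            + " AND dtime BETWEEN " + startTime + " AND " + endTime;
    try (ResultSet rs = stmtD.executeQuery(sql)) {
        while (rs.next()) {
            // column 1 (d.id) identifies which interface each row belongs to
        }
    }
}
Selecting d.id as well is what lets you map each returned row back to its interface when writing the output file.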
EDIT: As a matter of good practice you should ALWAYS use PreparedStatement, as it correctly handles datatypes such as dates, so you don't have to worry about formatting them into correct SQL syntax. It also prevents SQL injection.

From the looks of things you are kicking off multiple SELECT queries (possibly hundreds, based on your file size).
Instead of doing that, create a comma-delimited list of all the interfaceId values from your input file, and then make one SQL call using the IN keyword. You know performanceTable, startTime and endTime aren't changing, so the query would look something like this:
SELECT d.dtime,d.ifInOctets, d.ifOutOctets
FROM MyTable_1_60 as d
WHERE dtime BETWEEN '08/14/2010' AND '08/15/2010'
AND d.id IN ( 10, 18, 25, 13, 75 )
Then you are free to open your file and dump the result set in one swoop.
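A minimal sketch of building that comma-delimited list from the input file (my illustration; it assumes the interfaceId is the sixth comma-separated field, as in the question's tokenizer code):
StringJoiner in = new StringJoiner(",");
try (BufferedReader r = new BufferedReader(new FileReader(interfaceInput))) {
    String line;
    while ((line = r.readLine()) != null) {
        in.add(line.split(",")[5].trim()); // interfaceId position is an assumption
    }
}
String sql = "SELECT d.id, d.dtime, d.ifInOctets, d.ifOutOctets"
        + " FROM " + performanceTable + "_1_60 AS d"
        + " WHERE dtime BETWEEN " + startTime + " AND " + endTime
        + " AND d.id IN (" + in + ")";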

Related

Java out of heap space error if I use 'or' in sql instead of 'in'

I am using Spring and Hibernate in my project, and a few days ago I found that the dev environment had crashed due to a Java out-of-heap-space exception. After some preliminary analysis using heap-analysis tools and VisualVM, I found that the problem was with one select SQL query. I rewrote the SQL in a different way, which solved the memory issue. But now I am not sure why the previous SQL caused the memory issue.
Note: The method is inside a DAO and is called in a while loop with a batch size of 800 until all the data is pulled. The table size is around 20 million rows.
For each call, a new Hibernate session is created and destroyed.
Previous SQL:
@Override
public List<Book> getbookByJournalId(UnitOfWork uow, List<Journal> batch) {
    StringBuilder sb = new StringBuilder();
    sb.append("select i from Book i where ( ");
    if (batch == null || batch.size() <= 0)
        sb.append("1=0 )");
    else {
        for (int i = 0; i < batch.size(); i++) {
            if (i > 0)
                sb.append(" OR ");
            sb.append("( i.journalId='" + batch.get(i).journalId() + "')");
        }
        sb.append(")");
        sb.append(" and i.isDummy=:isNotDummy and i.statusId !=:BookStatus and i.BookNumber like :Book ");
    }
    Query query = uow.getSession().createQuery(sb.toString());
    query.setParameter("isNotDummy", Definitions.BooleanIdentifiers_Char.No);
    query.setParameter("Book", "%" + Definitions.NOBook);
    query.setParameter("BookStatus", Definitions.BookStatusID.CLOSED.getValue());
    List<Book> bookList = (List<Book>) query.getResultList();
    return bookList;
}
Rewritten SQL:
@Override
public List<Book> getbookByJournalId(UnitOfWork uow, List<Journal> batch) {
    List<String> bookIds = new ArrayList<>();
    for (Journal j : batch) {
        bookIds.add(j.getJournalId());
    }
    StringBuilder sb = new StringBuilder();
    sb.append("select i from Book i where i.journalId in (:bookIds) and i.isDummy=:isNotDummy and i.statusId !=:BookStatus and i.BookNumber like :Book");
    Query query = uow.getSession().createQuery(sb.toString());
    query.setParameter("isNotDummy", Definitions.BooleanIdentifiers_Char.No);
    query.setParameter("Book", "%" + Definitions.NOBook);
    query.setParameter("BookStatus", Definitions.BookStatusID.CLOSED.getValue());
    query.setParameter("bookIds", bookIds);
    List<Book> bookList = (List<Book>) query.getResultList();
    return bookList;
}
When you create dynamic SQL statements, you miss out on the database's ability to cache the statement, indexes and even entire tables to optimise your data retrieval. That said, dynamic SQL can still be a practical solution.
But you need to be a good citizen on both the application and database servers by being very efficient with your memory usage. For a solution that needs to scale to 20 million rows, I recommend a more disk-based approach, using as little RAM as possible (i.e. avoiding large in-memory collections).
Problems I can see from the first statement are the following:
Up to 800 OR conditions may be added to the first statement for each batch. That makes for a very long SQL statement (not good). This, I believe [please correct me if I'm wrong], needs to be held in the JVM heap and then passed to the database.
Java may not release this statement from the heap straight away, and garbage collection might be too slow to keep up with your code, increasing RAM usage. You shouldn't rely on it to clean up after you while your code is running.
If you run this code in parallel, many Hibernate sessions may mean many sessions on the database too. I believe you should use only one session for this, unless there is a specific reason not to. Creating and destroying sessions you don't need just creates unnecessary traffic on the servers and the network.
If you are running this code serially, then why drop the session when you can reuse it for the next batch, as sketched below? You may have a valid reason, but the question must be asked.
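A hedged sketch of the serial case (names are illustrative, and the original DAO takes a UnitOfWork rather than a Session):
Session session = sessionFactory.openSession();
try {
    for (List<Journal> batch : batches) {
        List<Book> books = dao.getbookByJournalId(session, batch);
        // ... process books ...
        session.clear(); // evict the first-level cache so batches don't accumulate
    }
} finally {
    session.close();
}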
In the second statement, creating the bookIds list again uses RAM in the JVM heap, and the where i.journalId in (:bookIds) part of the SQL will still be lengthy. Not as bad as before, but I think still too long.
You would be much better off doing the following:
Create a table on the database, with batchNumber, bookId and perhaps some meta-data, such as flags or timestamps. Join the Book table to your new table using a static statement, and pass in the batchNumber as a new parameter.
create table Batch
(
id integer primary key,
batchNumber integer not null,
bookId integer not null,
processed_datetime timestamp
);
create unique index Batch_Idx on Batch (batchNumber, bookId);
-- Put this statement into a loop, or use INSERT/SELECT if the data is already available in the database
insert into Batch (batchNumber, bookId) values (:batchNumber, :bookId);
-- Updated SQL statement. This is now static. Note that batchNumber needs to be provided as a parameter.
select i
from Book i
inner join Batch b on b.bookId = i.journalId
where b.batchNumber = :batchNumber
and i.isDummy=:isNotDummy and i.statusId !=:BookStatus and i.BookNumber like :Book;
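For the loop that fills the Batch table, JDBC batch inserts keep the round trips down. A hedged sketch, assuming a plain JDBC connection conn and that the journal IDs parse as integers:
String insert = "insert into Batch (batchNumber, bookId) values (?, ?)";
try (PreparedStatement ps = conn.prepareStatement(insert)) {
    for (Journal j : batch) {
        ps.setInt(1, batchNumber);
        ps.setInt(2, Integer.parseInt(j.getJournalId()));
        ps.addBatch();
    }
    ps.executeBatch(); // one round trip for the whole batch
}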

Make performance better when checking if a String exists in the database

ps = (PreparedStatement) connection.prepareStatement(
        "SELECT nm.id, nid.key, nm.name, nm.languageCode FROM odds.name as nm JOIN (odds.name_id as nid)\r\n"
        + "ON (nm.id = nid.id) where nm.name like '%' and nid.key not like \"vhc%\" and nid.key not like \"vdr%\" and nid.key not like \"vto%\" and nid.key not like \"vbl%\"\r\n"
        + "and nid.key not like \"vf%\" and nid.key not like \"vfl%\" and nid.key not like \"vsm%\" and nid.key not like \"rgs%\"\r\n"
        + "and nid.key not like \"srrgs%\" and nm.typeId=8 and nm.sourceId=-1 and nm.languageCode = 'en'");
for (Entry<String, Tag> e : allTags.entrySet()) {
    ResultSet rs = ps.executeQuery();
    while (rs.next()) {
        if (rs.getString("name").equals(e.getValue().getTranslation(Language.EN))) {
            e.getValue().setAlternativeKey(rs.getString("name"));
            break;
        }
    }
}
Do you have any ideas how I can do this faster? I'm trying to find a string in the database and add extra information to my object, but I have to do this for 1265 objects, so the program runs for about 80 seconds.
Thanks in advance!
First of all, when tackling performance problems, get yourself a profiling tool that tells you where you're spending the time, how often a given method is called and so on.
But I think the case is clear enough to give some more specific hints.
You're executing your PreparedStatement over and over again, once for every entry in allTags.entrySet(), always getting the same results, and then filtering out the lines you're interested in in software. So you're doing the same query 1265 times, correct?
And it's puzzling me what you're doing inside the while (rs.next()) loop. Effectively, your code does the following (after introducing some local variables and moving constant values out of loops):
for (Entry<String, Tag> e : allTags.entrySet()) {
    Tag tag = e.getValue();
    String translation = tag.getTranslation(Language.EN);
    ResultSet rs = ps.executeQuery();
    while (rs.next()) {
        if (rs.getString("name").equals(translation)) {
            tag.setAlternativeKey(translation);
            break;
        }
    }
}
So the only role of the query result seems to be to decide whether the alternative key should be set (if the translation of your Tag shows up as a name in the ResultSet); the value itself is already fixed by the result of the method call getTranslation(Language.EN), independent of any database result.
I'd suggest executing the query once, collecting the name values in a HashSet names, and after that doing the allTags loop, setting the translation if it is contained in your names set. That should give the same result as your code, and probably much faster.
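A minimal sketch of that suggestion, assuming the ps and allTags from the question:
Set<String> names = new HashSet<>();
try (ResultSet rs = ps.executeQuery()) {
    while (rs.next()) {
        names.add(rs.getString("name"));
    }
}
for (Entry<String, Tag> e : allTags.entrySet()) {
    Tag tag = e.getValue();
    String translation = tag.getTranslation(Language.EN);
    if (names.contains(translation)) {
        tag.setAlternativeKey(translation);
    }
}
This runs the query once instead of 1265 times.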
Open the DB client of your choice (e.g. HeidiSQL) and run
explain [the select statement that is originally executed]
That way MySQL explains what it is doing when creating the result, and where time gets lost.
From there you can go on, e.g. creating indexes or changing your query to make use of existing ones.
BTW:
nm.name like '%'
looks strange. Is that a variant of
is not null
? The latter might be faster. If the texts in the other LIKE conditions are always the same, better performance might be achieved by checking those conditions when inserting the data: add columns of type int or boolean and save the result of the check as an integer/boolean alongside the text itself. Checking against a fixed numeric value is far faster than text searches.
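As a hedged illustration of that idea (the helper and the excluded_key column are made up, not part of the original schema):
// Evaluate the prefix checks once per row at insert time and store the result
// in a hypothetical indexed boolean column (e.g. nid.excluded_key), so the
// hot query can filter on the flag instead of repeating the NOT LIKEs.
static boolean isExcludedKey(String key) {
    String[] prefixes = {"vhc", "vdr", "vto", "vbl", "vf", "vfl", "vsm", "rgs", "srrgs"};
    for (String p : prefixes) {
        if (key.startsWith(p)) return true;
    }
    return false;
}
// at insert time: ps.setBoolean(2, isExcludedKey(key)); // parameter index is illustrative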

Improve JDBC Performance

I am executing the following set of statements in my Java application. It connects to an Oracle database.
stat = connection.createStatement();
stat1 = connection.createStatement();
ResultSet rs = stat.executeQuery(BIGQUERY);
while (rs.next()) {
    obj1.setAttr1(rs.getString(1));
    obj1.setAttr2(rs.getString(2));
    obj1.setAttr3(rs.getString(3));
    obj1.setAttr4(rs.getString(4));

    ResultSet rs1 = stat1.executeQuery(SMALLQ1);
    while (rs1.next()) {
        obj1.setAttr5(rs1.getString(1));
    }
    ResultSet rs2 = stat1.executeQuery(SMALLQ2);
    while (rs2.next()) {
        obj1.setAttr6(rs2.getString(1));
    }
    // ... the remaining small queries ...
    linkedBlockingQueue.add(obj1);
}
// all statements and connections are closed
The BIGQUERY returns around 4.5 million records, and for each record I have to execute the 14 smaller queries. Each small query has three inner joins.
My multithreaded application can currently process 90,000 records in one hour. But I may have to run the code daily, so I want to process all the records in 20 hours. I am using about 200 threads, which run the above code and store the records in a LinkedBlockingQueue.
Does blindly increasing the thread count help performance, or is there some other way in which I can increase the performance of the result sets?
PS: I am unable to post the queries here, but I am assured that all queries are optimized.
To improve JDBC performance for your scenario you can apply some modifications. As you will see, all these modifications can significantly speed up your task.
1. Using batch operations.
You can read your big query and store the results in some kind of buffer, and only when the buffer is full run the subquery for all the data collected in it. This significantly reduces the number of SQL statements to execute.
static final int BATCH_SIZE = 1000;

List<MyData> buffer = new ArrayList<>(BATCH_SIZE);
while (rs.next()) {
    MyData record = new MyData(rs.getString(1), ..., rs.getString(4));
    buffer.add(record);
    if (buffer.size() == BATCH_SIZE) {
        processBatch(buffer);
        buffer.clear();
    }
}
if (!buffer.isEmpty()) {
    processBatch(buffer); // don't forget the last, partially filled batch
}

void processBatch(List<MyData> buffer) {
    String sql = "select ... where X and id in (" + getIDs(buffer) + ")";
    ResultSet rs1 = stat1.executeQuery(sql); // one query for all IDs in the buffer
    while (rs1.next()) { ... }
}
2. Using efficient maps to store content from many selects.
If your records are not too big, you can store them all at once, even for a 4-million-row table.
I have used this approach many times for overnight processes (with no normal users).
Such an approach may need a lot of heap memory (i.e. 100 MB - 1 GB), but it is much faster than approach 1).
To do that you need an efficient map implementation, e.g. gnu.trove.map.TIntObjectMap,
which is much better than the Java standard library maps.
final TIntObjectMap<MyData> map = new TIntObjectHashMap<MyData>(10000, 0.8f);

// query 1
while (rs.next()) {
    MyData record = new MyData(rs.getInt(1), rs.getString(2), ..., rs.getString(4));
    map.put(record.getId(), record);
}

// query 2
while (rs.next()) {
    int id = rs.getInt(1); // my data id
    String x = rs.getString(...);
    int y = rs.getInt(...);
    MyData record = map.get(id);
    record.add(new MyDetail(x, y));
}

// query 3
// same pattern as query 2
After this you have the map filled with all the data, probably with a lot of memory allocated. This is why you can use this method only if you have such resources.
Another topic is how to write the MyData and MyDetail classes to be as small as possible. You can use some tricks (see the sketch after this list):
storing 3 integers (with limited range) in 1 long variable (using util for bit shifting)
storing Date objects as integer (yymmdd)
calling str.intern() for each string fetched from DB
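For instance, a hedged sketch of the first trick, assuming each value is non-negative and fits in 21 bits:
// Pack three small ints into one long: 3 x 21 bits = 63 bits, which fits.
static long pack(int a, int b, int c) {
    return ((long) a << 42) | ((long) b << 21) | (long) c;
}
static int unpackA(long p) { return (int) (p >>> 42); }
static int unpackB(long p) { return (int) ((p >>> 21) & 0x1FFFFF); }
static int unpackC(long p) { return (int) (p & 0x1FFFFF); }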
3. Transactions
If you have to do updates or inserts, then 4 million records is too much to handle in one transaction; that is too much for most database configurations.
Use approach 1) and commit the transaction for each batch.
On each newly inserted record you can have something like a RUN_ID, and if everything went well you can mark this RUN_ID as successful.
If your queries only read, there is no problem. However, you can mark the transaction as read-only to help your database.
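A minimal sketch of per-batch commits, assuming approach 1's processBatch and a JDBC connection conn:
void runInBatches(Connection conn, List<List<MyData>> batches) throws SQLException {
    conn.setAutoCommit(false);
    for (List<MyData> batch : batches) {
        processBatch(batch); // the inserts/updates for one batch
        conn.commit();       // keep each transaction small
    }
}
// For a read-only job, hint the database instead: conn.setReadOnly(true);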
4. JDBC fetch size.
When you load a lot of records from the database, it is very, very important to set a proper fetch size on your JDBC statement.
This reduces the number of physical round trips to the database socket and speeds up your process.
Example:
// jdbc
statement.setFetchSize(500);
// spring
JdbcTemplate jdbc = new JdbcTemplate(datasource);
jdbc.setFetchSize(500);
Here you can find some benchmarks and patterns for using fetch size:
http://makejavafaster.blogspot.com/2015/06/jdbc-fetch-size-performance.html
5. PreparedStatement
Use PreparedStatement rather than Statement.
6. Number of SQL statements.
Always try to minimize the number of SQL statements you send to the database.
Try this:
resultSet.setFetchSize(100);
while (resultSet.next()) {
...
}
The parameter is the number of rows that should be retrieved from the
database in each round trip.

More Efficient Way of Doing This SQL Query? A time comparison query?

I have this SQL query which queries the database every 5 seconds to determine who is currently actively using the software. Active users have pinged the server in the last 10 seconds. (The table gets updated correctly on user activity, and I have a thread evicting entries on session timeouts; that all works correctly.)
What I'm looking for is a more efficient/quicker way to do this, since it gets called frequently, about every 5 seconds. In addition, there may be up to 500 users in the database. The language is Java, but the question really pertains to any language.
List<String> r = new ArrayList<String>();
Calendar c = Calendar.getInstance();
long threshold = c.get(Calendar.SECOND) + c.get(Calendar.MINUTE)*60 + c.get(Calendar.HOUR_OF_DAY)*60*60 - 10;
String tmpSql = "SELECT user_name, EXTRACT(HOUR FROM last_access_ts) as hour, EXTRACT(MINUTE FROM last_access_ts) as minute, EXTRACT(SECOND FROM last_access_ts) as second FROM user_sessions";
DBResult rs = DB.select(tmpSql);
for (int i = 0; i < rs.size(); i++) {
    Map<String, Object> result = rs.get(i);
    long hour = (Long) result.get("hour");
    long minute = (Long) result.get("minute");
    long second = (Long) result.get("second");
    if (hour*60*60 + minute*60 + second > threshold)
        r.add(result.get("user_name").toString());
}
return r;
If you want this to run faster, then create an index on user_sessions(last_access_ts, user_name), and do the date logic in the query:
select user_name
from user_sessions
where last_access_ts >= now() - interval 10 second;
This does have a downside. You are, presumably, updating the last_access_ts field quite often. An index on the field will also have to be updated. On the positive side, this is a covering index, so the index itself can satisfy the query without resorting to the original data pages.
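A hedged JDBC rendering of that query (MySQL interval syntax; conn is assumed):
String sql = "SELECT user_name FROM user_sessions "
        + "WHERE last_access_ts >= NOW() - INTERVAL 10 SECOND";
List<String> active = new ArrayList<>();
try (Statement st = conn.createStatement();
     ResultSet rs = st.executeQuery(sql)) {
    while (rs.next()) {
        active.add(rs.getString("user_name"));
    }
}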
I would move the logic from Java to the DB. This means translating the if into a where, and just selecting the names of the valid results:
SELECT user_name FROM user_sessions WHERE last_access_ts > ?
In your example the c represents the current time. It is quite possible that the result will be empty. So your question is really about date-time operations on your database.
Just let the database do the comparison for you by using this query:
SELECT
user_name
FROM user_sessions
WHERE TIMESTAMPDIFF(SECOND, last_access_ts, current_timestamp) <= 10
(Note the <=: it selects the users whose last access is at most 10 seconds old, i.e. the active ones.)
Complete example:
List<String> r = new ArrayList<String>();
// this will return all users that have been active within the last 10 seconds
String tmpSql = "SELECT user_name FROM user_sessions " +
        "WHERE TIMESTAMPDIFF(SECOND, last_access_ts, current_timestamp) <= 10";
DBResult rs = DB.select(tmpSql);
for (int i = 0; i < rs.size(); i++) {
    Map<String, Object> result = rs.get(i);
    r.add(result.get("user_name").toString());
}
return r;
SQLFiddle
The solution is to move the logic from your code into the SQL query, so the select returns only the active users, using a where clause. It is faster to use the SQL built-in functions to fetch fewer records and iterate less in your code.
Add this to your SQL query to get the active users only:
WHERE TIMESTAMPDIFF(SECOND, last_access_ts, current_timestamp) <= 10
This will get you all the records whose last access is 10 seconds ago or sooner.
Try the MySQL TIMEDIFF function in your select. This way you can select only the active results without having to do any other calculations.
Link: MySQL: how to get the difference between two timestamps in seconds
If I understand you right, you only have about 500 entries in your user_sessions table. In this case I wouldn't even care about indexes; throw them away. The DB engine probably won't use them anyway for such a low record count, and the performance gain from not updating the indexes on every record update could well be higher than the query overhead.
If you care about DB stress, then lengthen the query/update intervals to a minute or more, if your application allows it. Gordon Linoff's answer should give you the best query performance, though.
As a side note (because it has bitten me before): If you don't use the same synchronized time for all user callbacks, then your "active users logic" is flawed by design.

JDBC/ResultSet error

My MySQL query in Java always stops (i.e. freezes and does not continue) at a certain position, namely row 543,858, even though the table contains approx. 2,000,000 entries. I've checked this by logging the current result fetching.
It is reproducible and happens every time at the very same position.
"SELECT abc from bcd WHERE DATEDIFF(CURDATE(), timestamp) <= '"+days+"'");
Addition: It is definitely a Java-side issue; I've just tried the same statement in Navicat (50 s running time).
The query seems to freeze after the log tells me that it is now adding the result at position 543,858.
try {
    ...
    PreparedStatement stmt = conn.prepareStatement(query); // prepare statement etc.
    stmt.setFetchSize(Integer.MIN_VALUE);
    ResultSet res = stmt.executeQuery();
    ...
    System.out.println(res.getStatement());
    ...
    while (res.next())
        treeSet.add(res.getString("userid"));
} catch (Exception e) {
    e.printStackTrace();
}
Edit: We were able to figure out the problem. This method is fine, and the returned result (500,000 instead of 2,000,000) is right as well (I had looked in the wrong DB to verify the amount); the problem was that the next method call, which used the result of the one posted above, takes literally forever, but had no logging implemented. So I was fooled by missing console logs.
Thanks anyway!
I think you might be running out of memory after processing half a million records. Try assigning more memory using the command-line option -Xmx (e.g. -Xmx2g). See here for more info about command-line options.
In MySQL, to use streaming ResultSets you have to specify more parameters, not only fetchSize.
Try:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
rs = stmt.executeQuery("select ...");
and see if that works. It's documented in the ResultSet section of the MySQL Connector/J documentation.
Strange that it doesn't throw an exception, but this is the only suspect I have. Maybe it starts garbage collection / flushes memory to disk, and that takes so much time it never gets to throw one.
I would try adding "LIMIT 543857" to your query, then "LIMIT 543858", and see what happens.
If that does not help, use the LIMIT directive combined with ORDER BY.
I suspect there is an invalid entry in your table, and the way to find it is a binary search.
