I am currently working on Java code that queries a database and extracts the results to a file.
So far there is no problem with small queries.
But I will soon have to extract large volumes of data, and for a few days I have been trying to implement the most efficient solution to keep memory consumption as low as possible.
As soon as I run a large query, the memory of both the source machine and the target machine is saturated.
The Java version I use on the Red Hat Linux environment is java-1.8.0.
So far I have been able to redirect the result of my query to a file, but after reading a lot of documentation I can see that there are many different ways to limit memory consumption.
DriverManager.registerDriver(new com.wily.introscope.jdbc.IntroscopeDriver());
Connection conn = DriverManager.getConnection("jdbc:introscope:net//" +
        user + ":" + password + "#" + hostname + ":" + port);

String query = "select * from metric_data"
        + " where agent='" + agents_filter
        + "' and metric='" + metrics_filter
        + "' and timestamp between " + queryInterval;

Statement ps = conn.createStatement();
ResultSet rs = ps.executeQuery(query);
rs.setFetchSize(Size);
ResultSetMetaData rsm = rs.getMetaData();

File output = new File("result");
PrintWriter out = new PrintWriter(new BufferedWriter(
        new OutputStreamWriter(new FileOutputStream(output), "UTF-8")), false);

for (int i = 1; i <= rs.getMetaData().getColumnCount(); i++) {
    String colName = rs.getMetaData().getColumnName(i);
    out.print(" " + colName + "\t\t" + "|");
}

while (rs.next()) {
    for (int i = 1; i <= rs.getMetaData().getColumnCount(); i++) {
        String colValue = rs.getString(i);
        out.print(" " + colValue + "\t" + "|");
    }
    out.println();
}

out.close();
out.flush();
rs.close();
ps.close();
conn.close();
Currently the result is fully loaded into memory and then written to my file. But as soon as the query is too large I get the following messages:
Exception in thread "PO:client_main Mailman 2" java.lang.OutOfMemoryError: Java heap space
Exception in thread "UnknownHub Hub Receive 1" java.lang.OutOfMemoryError: Java heap space
I would like to be able to write to the file in chunks, for example 1000 rows at a time, so as not to saturate memory.
The files can sometimes reach 40 GB.
Execution time is not really a problem, but memory consumption is a critical criterion.
I am far from being a Java professional, which is why I need a little help from you.
Thank you in advance for your time.
Constructing your SQL string by concatenating strings is a security hole (SQL injection). Imagine those variables hold something like: "1'; DROP ALL TABLES; --". Even if you know the strings are 'safe' here, code changes, and you should not adopt bad habits. Fix this; you can use a PreparedStatement to fix it.
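For example, a minimal sketch of the parameterized version (this assumes the Introscope JDBC driver supports parameter binding, and that queryInterval is really two bound values, split here into hypothetical intervalStart and intervalEnd variables):

String sql = "select * from metric_data"
        + " where agent = ? and metric = ? and timestamp between ? and ?";
PreparedStatement pstmt = conn.prepareStatement(sql);
pstmt.setString(1, agents_filter);
pstmt.setString(2, metrics_filter);
pstmt.setLong(3, intervalStart);   // hypothetical: lower bound of queryInterval
pstmt.setLong(4, intervalEnd);     // hypothetical: upper bound of queryInterval
ResultSet rs = pstmt.executeQuery();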
Metadata isn't free. Cache that stuff. Specifically, cache the value rs.getMetaData().getColumnCount().
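For example, the same header loop from the question, with the metadata and column count fetched once and reused:

ResultSetMetaData rsm = rs.getMetaData();   // fetch the metadata once
int columnCount = rsm.getColumnCount();     // ...and cache the count
for (int i = 1; i <= columnCount; i++) {
    out.print(" " + rsm.getColumnName(i) + "\t\t" + "|");
}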
For real speed here, run an SQL command that tells the DB engine to dump the data directly to a file, and then transfer that file if it isn't on the local host. You can't really go any faster than that.
You can't flush after close, and close implies flush. You can just remove the flush() line.
Assuming your fetch size isn't ludicrously large, there's nothing in this code that would indicate an out of memory error would occur. So, it's either the repeated invocations of getMetaData (which means caching the column count would fix your problem here), or the DB engine and/or its JDBC driver is badly written. I haven't heard of Introscope, which is why I mention it. If that is the case, at best you can use SQL OFFSET and LIMIT to separate your query into 'pages' and thus not grab too many results at once, but without an ORDER BY in your SQL, technically the DB engine is allowed to change the order on you, and with it, the process might become quite slow.
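If paging does turn out to be necessary, here is a rough sketch of the idea (it assumes the Introscope SQL dialect accepts LIMIT/OFFSET and has an orderable timestamp column, both of which you would need to verify against the driver's documentation):

// Rough sketch only: LIMIT/OFFSET support and an orderable timestamp column are assumptions.
static void exportPaged(Connection conn, String baseQuery, PrintWriter out) throws SQLException {
    final int pageSize = 1000;
    long offset = 0;
    boolean more = true;
    try (Statement stmt = conn.createStatement()) {
        while (more) {
            String page = baseQuery + " order by timestamp limit " + pageSize + " offset " + offset;
            int rows = 0;
            try (ResultSet rs = stmt.executeQuery(page)) {
                int columnCount = rs.getMetaData().getColumnCount();  // cached per page
                while (rs.next()) {
                    for (int i = 1; i <= columnCount; i++) {
                        out.print(" " + rs.getString(i) + "\t" + "|");
                    }
                    out.println();
                    rows++;
                }
            }
            more = (rows == pageSize);
            offset += pageSize;
            out.flush();   // push each page to disk before fetching the next one
        }
    }
}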
Related
I am using Spring and Hibernate in my project, and a few days ago I found that the Dev environment had crashed due to a Java out of heap space exception. After some preliminary analysis using heap analysis tools and VisualVM, I found that the problem was with one select SQL query. I rewrote the SQL in a different way, which solved the memory issue. But now I am not sure why the previous SQL caused the memory issue.
Note: The method is inside a DAO and is called in a while loop with a batch size of 800 until all the data is pulled. The table size is around 20 million rows.
For each call, a new Hibernate session is created and destroyed.
Previous SQL:
@Override
public List<Book> getbookByJournalId(UnitOfWork uow, List<Journal> batch) {
    StringBuilder sb = new StringBuilder();
    sb.append("select i from Book i where ( ");
    if (batch == null || batch.size() <= 0)
        sb.append("1=0 )");
    else {
        for (int i = 0; i < batch.size(); i++) {
            if (i > 0)
                sb.append(" OR ");
            sb.append("( i.journalId='" + batch.get(i).journalId() + "')");
        }
        sb.append(")");
        sb.append(" and i.isDummy=:isNotDummy and i.statusId !=:BookStatus and i.BookNumber like :book ");
    }
    Query query = uow.getSession().createQuery(sb.toString());
    query.setParameter("isNotDummy", Definitions.BooleanIdentifiers_Char.No);
    query.setParameter("Book", "%" + Definitions.NOBook);
    query.setParameter("BookStatus", Definitions.BookStatusID.CLOSED.getValue());
    List<Book> bookList = (List<Book>) query.getResultList();
    return bookList;
}
Rewritten SQL:
@Override
public List<Book> getbookByJournalId(UnitOfWork uow, List<Journal> batch) {
    List<String> bookIds = new ArrayList<>();
    for (Journal J : batch) {
        bookIds.add(J.getJournalId());
    }
    StringBuilder sb = new StringBuilder();
    sb.append("select i from Book i where i.journalId in (:bookIds) and i.isDummy=:isNotDummy and i.statusId !=:BookStatus and i.BookNumber like :Book");
    Query query = uow.getSession().createQuery(sb.toString());
    query.setParameter("isNotDummy", Definitions.BooleanIdentifiers_Char.No);
    query.setParameter("Book", "%" + Definitions.NOBook);
    query.setParameter("BookStatus", Definitions.BookStatusID.CLOSED.getValue());
    query.setParameter("bookIds", bookIds);
    List<Book> bookList = (List<Book>) query.getResultList();
    return bookList;
}
When you create dynamic SQL statements, you miss out on the ability of the database to cache the statement, indexes and even entire tables to optimise your data retrieval. That said, dynamic SQL can still be a practical solution.
But you need to be a good citizen on both the application and database servers, by being very efficient with your memory usage. For a solution that needs to scale to 20 million rows, I recommend using more of a disk-based approach, using as little RAM as possible (i.e. avoiding arrays).
Problems I can see from the first statement are the following:
Up to 800 OR conditions may be added to the first statement for each batch. That makes for a very long SQL statement (not good). This I believe [please correct me if I'm wrong] would need to be cached in JVM heap and then passed to the database.
Java may not release this statement from the heap straight away, and garbage collection might be too slow to keep up with your code, increasing the RAM usage. You shouldn't rely on it to clean up after you while your code is running.
If you ran this code in parallel, many sessions on hibernate may risk having many sessions on the database too. I believe you should only use one session for this, unless there is a specific reason. Creating and destroying sessions that you don't need just creates unnecessary traffic on servers and the network.
If you are running this code serially, then why drop the session, when you can reuse it for the next batch? You may have a valid reason, but the question must be asked.
In the second statement, creating the bookIds array again uses up RAM in the JVM heap, and the where i.journalId in (:bookIds) part of the SQL will still be lengthy. Not as bad as before, but I think still too long.
You would be much better off doing the following:
Create a table on the database, with batchNumber, bookId and perhaps some meta-data, such as flags or timestamps. Join the Book table to your new table using a static statement, and pass in the batchNumber as a new parameter.
create table Batch
(
    id integer primary key,
    batchNumber integer not null,
    bookId integer not null,
    processed_datetime timestamp
);

create unique index Batch_Idx on Batch (batchNumber, bookId);

-- Put this statement into a loop, or use INSERT/SELECT if the data is available in the database
insert into Batch (batchNumber, bookId) values (:batchNumber, :bookId);
-- Updated SQL statement. This is now static. Note that batchNumber needs to be provided as a parameter.
select i
from Book i
inner join Batch b on b.bookId = i.journalId
where b.batchNumber = :batchNumber
and i.isDummy=:isNotDummy and i.statusId !=:BookStatus and i.BookNumber like :Book;
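Calling the static statement from the DAO could then look roughly like this (a sketch only; it assumes a Hibernate version where Session.createNativeQuery and getResultList are available, which the original getResultList call suggests, and that the Batch rows for the batch were inserted beforehand):

public List<Book> getBooksByBatchNumber(UnitOfWork uow, int batchNumber) {
    return uow.getSession().createNativeQuery(
            "select i.* from Book i"
          + " inner join Batch b on b.bookId = i.journalId"
          + " where b.batchNumber = :batchNumber"
          + " and i.isDummy = :isNotDummy and i.statusId != :BookStatus and i.BookNumber like :Book",
            Book.class)
        .setParameter("batchNumber", batchNumber)
        .setParameter("isNotDummy", Definitions.BooleanIdentifiers_Char.No)
        .setParameter("BookStatus", Definitions.BookStatusID.CLOSED.getValue())
        .setParameter("Book", "%" + Definitions.NOBook)
        .getResultList();
}

The statement text never changes, so the database can reuse its cached plan; only the batchNumber parameter varies between calls.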
I'm making an online game. I'm testing the game with 300 players now and I have a problem. I have to update about 300 rows in the database every second, but the update takes too long: about 11143 ms (11 s), which is far too much for a task that must finish in less than 1 s. I'm making those updates to the database from Java. I tried with PHP already, but it's the same. The update SQL query is very simple:
String query5 = "UPDATE naselje SET zelezo = " + zelezo + ", zlato = " + zlato + ", les = " + les + ", hrana = " + hrana + " WHERE ID =" + ID;
So does anyone know how to make updates to the database every second with faster performance, or is there another way to update the game resources (gold, wood, food, ...)?
My configuration:
Intel Core i5 M520 2.40GHz
6 GB RAM
You are probably updating each row separately; you need to use a batch update.
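A minimal JDBC sketch of the idea (table and column names taken from the question; the Player holder and the players collection are hypothetical stand-ins for however the 300 rows are kept in memory, and the connection setup is assumed):

// One PreparedStatement, executed as a single batch inside one transaction.
String sql = "UPDATE naselje SET zelezo = ?, zlato = ?, les = ?, hrana = ? WHERE ID = ?";
conn.setAutoCommit(false);
try (PreparedStatement ps = conn.prepareStatement(sql)) {
    for (Player p : players) {      // hypothetical holder for the 300 rows
        ps.setInt(1, p.zelezo);
        ps.setInt(2, p.zlato);
        ps.setInt(3, p.les);
        ps.setInt(4, p.hrana);
        ps.setInt(5, p.id);
        ps.addBatch();
    }
    ps.executeBatch();
    conn.commit();
} catch (SQLException e) {
    conn.rollback();
    throw e;
}

One round trip per batch (plus the commit) instead of 300 separate auto-committed statements is usually the difference between seconds and milliseconds here.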
Switch to PDO if you are not already on it, and use transactions. Also, restructure your tables to use InnoDB instead of MyISAM.
InnoDB works better with larger tables which are frequently read/written.
This is one of the things that it was designed to handle. Multiple SELECT/UPDATE/INSERT statements which are very similar in style.
It is also good coding practice to use transactions when handling multiple consecutive calls of the above types.
Search for PHP PDO and MySQL transactions to learn more.
Example:
With Transactions
$pdo = new PDO(...);
$pdo->beginTransaction();
for ($i = 0; $i < 1001; $i++) {
    $pdo->query("UPDATE table SET column='$var' WHERE ID = $i");
}
$pdo->commit();
I have an onCreate that currently opens an SQLite database and reads off 4 values. I then have a conditional, and depending on which activity has sent it there, it either displays those values or updates two values and then displays the other.
Now if I run this activity without updating the database it is lightning fast, whereas if I run two queries to write to the database it can be sluggish. Is there anything I can do to optimise this?
The problem is that the display stays on the previous activity until the SQLite updating has completed. This seems to be the problem.
Sorry for what is most likely a rubbish explanation. Please feel free to ask me to better describe anything.
Any help appreciated.
public void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.reason);
    // Opens DB connection
    type = b.getString("TYPE");
    get();
    if (type.equals("next")) { update(); }
    db.close();
}

public void get() {
    Cursor b = db.rawQuery("SELECT * FROM " + DB_TABLE2 + " WHERE _id='1'", null);
    b.moveToFirst();
    id = b.getInt(b.getColumnIndex("nextq"));
    nextvalue = b.getInt(b.getColumnIndex(type));
    if (nextvalue == 0) { nextvalue = 1; }
    b.close();
    nextvalue++;
}

public void update() {
    db.execSQL("UPDATE " + DB_TABLE2
            + " SET nextq='" + nextvalue + "'"
            + " WHERE _id='1'");
    db.execSQL("UPDATE " + DB_TABLE
            + " SET answered_correctly='" + anscorrect + "' , open ='1' WHERE _id='" + id + "'");
}
Enclose all of your updates inside a single transaction. Not only is it better from a data integrity point of view, but it's also much faster.
So, put a db.beginTransaction() at the start of your update(), and a db.setTransactionSuccessful() followed by db.endTransaction() at the end.
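Applied to the update() method from the question, that might look like this (endTransaction goes in a finally block so the lock is always released, even if one of the statements throws):

public void update() {
    db.beginTransaction();
    try {
        db.execSQL("UPDATE " + DB_TABLE2
                + " SET nextq='" + nextvalue + "'"
                + " WHERE _id='1'");
        db.execSQL("UPDATE " + DB_TABLE
                + " SET answered_correctly='" + anscorrect + "', open='1'"
                + " WHERE _id='" + id + "'");
        db.setTransactionSuccessful();   // mark the transaction as OK to commit
    } finally {
        db.endTransaction();             // commits if successful, rolls back otherwise
    }
}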
You can do something like this, but be warned: the PRAGMA synchronous setting can be dangerous, as it turns off safety features in SQLite. Having said that, it increased my recording speed to roughly 0.5 ms per row, going from 350 ms down to 15-20, and for another table, from 5000-9000 ms down to roughly 300.
// This cut down insert time from 250-600 ms to 14-30 ms.
// With PRAGMA synchronous set to OFF, it drops to about 0.5 ms/row for member names.
final InsertHelper ih = new InsertHelper(database, SQLiteHelper.TABLE_MEMBERS);
final int nameColumn = ih.getColumnIndex(SQLiteHelper.MEMBER_TABLE_MEMBERNAME);
final long startTime = System.currentTimeMillis();
try {
    database.execSQL("PRAGMA synchronous=OFF");
    database.setLockingEnabled(false);
    database.beginTransaction();
    for (int i = 0; i < Members.size(); i++) {
        ih.prepareForInsert();
        ih.bind(nameColumn, Members.get(i));
        ih.execute();
    }
    database.setTransactionSuccessful();
} finally {
    database.endTransaction();
    database.setLockingEnabled(true);
    database.execSQL("PRAGMA synchronous=NORMAL");
    ih.close();
    if (Globals.ENABLE_LOGGING) {
        final long endtime = System.currentTimeMillis();
        Log.i("Time to insert Members: ", String.valueOf(endtime - startTime));
    }
}
The main things you want are the InsertHelper, the setLockingEnabled calls, and the execSQL PRAGMA statements. Keep in mind, as I said, that using both of those can potentially cause DB corruption if you experience a power outage on your phone, but they can speed up DB inserts greatly. I learned about this here: http://www.outofwhatbox.com/blog/2010/12/android-using-databaseutils-inserthelper-for-faster-insertions-into-sqlite-database/#comment-2685
You can also ignore my logging stuff; I had it in there to do some benchmarking and see how long things took.
Edit: To explain briefly what those options do, I'm basically disabling safety and integrity features in SQLite in order to pipe data straight into the database. Since this happens so fast (around 14-20 ms on average now), the risk is acceptable. If it took seconds, I wouldn't risk it, because in the event something happens you could end up with a corrupted DB. The synchronous option is the greatest risk of all, so judge whether you want to take that risk with your data. I would recommend using timing code like I've included to see how long your inserts take each time you try something, and then decide what level of risk you want. Even if you don't use those two, the other features (InsertHelper and the beginTransaction stuff) will still improve your database work greatly.
Either create a new thread for the database work and have a callback to update the UI, or, if the UI does not depend on the database change, just create the new thread. Executing database work on the UI thread will always slow down UI responsiveness a bit. Check out AsyncTask, or just create a new thread if the UI doesn't need a callback on completion.
Just be careful not to get too careless with thread creation :)
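A rough sketch of the AsyncTask variant, reusing the update() method from the question (treat it as an outline rather than drop-in code, and make sure the database isn't closed in onCreate() before the task finishes):

// Sketch: move the two UPDATE statements off the UI thread.
private class UpdateTask extends AsyncTask<Void, Void, Void> {
    @Override
    protected Void doInBackground(Void... params) {
        update();              // the existing update() from the question
        return null;
    }

    @Override
    protected void onPostExecute(Void result) {
        // refresh the UI here if the display depends on the updated values,
        // and close the database once all work is done
    }
}

// in onCreate(), instead of calling update() directly:
// new UpdateTask().execute();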
java.sql.SQLException: database is locked
at org.sqlite.DB.throwex(DB.java:288)
at org.sqlite.NestedDB.prepare(NestedDB.java:115)
at org.sqlite.DB.prepare(DB.java:114)
at org.sqlite.Stmt.executeQuery(Stmt.java:89)
When I make a query I get this exception. I read up on it on SA and Google, and the most common conclusion is that someone started making another query which never finished. The problem I'm having is that I've never made a query on this DB on this machine before. I downloaded the db file from where I hosted it (I created it earlier) and haven't done anything with it, so I don't know why it would be locked. When I do a query using a program called SQLite Database Browser, it works just fine. Thanks for the help, I'll provide more info if need be, just let me know.
adapter = new DbAdapter();
ResultSet info;
ResultSet attributes;

for (int i = 1; i < 668; i++) {
    if (i % 50 == 0) {
        System.out.print('.');
    }
    info = adapter.makeQuery("SELECT * FROM vehicles WHERE id = '" + i + "'");
    attributes = adapter.makeQuery("SELECT * FROM vehicle_moves WHERE vehicle_id = '" + i + "'");
    if (info.next()) {
        base = new (info, attributes);
    }
    vehicleArray[i] = base;
}
System.out.println("Done.");

info.close();
attributes.close();
adapter.close();
Above is the code where this is occurring. I did some homework throughout my code and sure enough the problem is in this code, other DB queries work just fine. Anything jump out at you guys?
SQLite itself can most certainly handle doing a query while the results of another query are being processed. It'd be terribly useless if that couldn't be done! What's more likely to cause problems is if you've got two connections to the database open at once. I don't know that DbAdapter class at all – not what package it is in, or what module provides it – but if it is assuming that it can open many connections (or if it isn't maintaining proper connection hygiene) then that most certainly would be a cause of the sort of problems you're seeing. Look there first.
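Without seeing DbAdapter it's hard to be specific, but the shape to aim for is a single connection, with each ResultSet closed before (or as) the next iteration runs. A rough sketch in plain JDBC that sidesteps DbAdapter entirely (the JDBC URL is a hypothetical placeholder for your local db file):

// Sketch only: one connection, prepared statements reused across the loop,
// and ResultSets closed every iteration via try-with-resources.
try (Connection conn = DriverManager.getConnection("jdbc:sqlite:vehicles.db");   // hypothetical path
     PreparedStatement vehicleStmt = conn.prepareStatement("SELECT * FROM vehicles WHERE id = ?");
     PreparedStatement movesStmt = conn.prepareStatement("SELECT * FROM vehicle_moves WHERE vehicle_id = ?")) {
    for (int i = 1; i < 668; i++) {
        vehicleStmt.setInt(1, i);
        movesStmt.setInt(1, i);
        try (ResultSet info = vehicleStmt.executeQuery();
             ResultSet attributes = movesStmt.executeQuery()) {
            if (info.next()) {
                // build the vehicle object here, as in the original loop
            }
        }
    }
}

If DbAdapter opens a fresh connection for every makeQuery call, that alone can produce "database is locked" as soon as two of them overlap.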
I'm having major issues with my query processing time :(
I think it is because the query is getting recompiled every time, but I don't see any way around it.
The following is the query / snippet of code:
private void readPerformance(String startTime, String endTime,
        String performanceTable, String interfaceInput) throws SQLException, IOException {

    String interfaceId, iDescp, iStatus = null;
    String dtime, ingress, egress, newLine, append, routerId = null;
    StringTokenizer st = null;

    stmtD = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
            java.sql.ResultSet.CONCUR_READ_ONLY);
    stmtD.setFetchSize(Integer.MIN_VALUE);

    BufferedReader interfaceRead = new BufferedReader(new FileReader(interfaceInput));
    BufferedWriter pWrite = new BufferedWriter(new FileWriter("performanceInput.txt"));

    while ((newLine = interfaceRead.readLine()) != null) {
        st = new StringTokenizer(newLine, ",");
        while (st.hasMoreTokens()) {
            append = st.nextToken() + CSV + st.nextToken() + st.nextToken() + CSV + st.nextToken();
            System.out.println(append + " ");
            iStatus = st.nextToken().trim();
            interfaceId = st.nextToken().trim();
            append = append + CSV + iStatus + CSV + interfaceId;
            System.out.println(append + " ");

            pquery = " Select d.dtime,d.ifInOctets, d.ifOutOctets from " + performanceTable + "_1_60"
                    + " AS d Where d.id = " + interfaceId
                    + " AND dtime BETWEEN " + startTime + " AND " + endTime;
            rsD = stmtD.executeQuery(pquery);

            /* interface query */
            while (rsD.next()) {
                dtime = rsD.getString(1);
                ingress = rsD.getString(2);
                egress = rsD.getString(3);
                pWrite.write(append + CSV + dtime + CSV + ingress + CSV + egress + NL);
            } // end while
        } // end while
    } // end while

    pWrite.close();
    interfaceRead.close();
    rsD.close();
    stmtD.close();
}
My interfaceId value keeps changing, so I have put the query inside the loop, which results in the query being recompiled multiple times.
Is there any better way? Can I use a stored procedure in Java? If so, how? I do not have much knowledge of them.
The current processing time is almost 60 minutes, and the text file being generated is over 300 MB.
Please help!
Thank you.
You can use a PreparedStatement and parameters, which may avoid recompiling the query. Since performanceTable is constant, it can be put into the prepared query. The remaining variables, used in the WHERE condition, are set as parameters.
Outside the loop, create a prepared statement, rather than a regular statement:
PreparedStatement stmtD = conn.prepareStatement(
"Select d.dtime,d.ifInOctets, d.ifOutOctets from "+performanceTable+"_1_60 AS d"+
" Where d.id = ? AND dtime BETWEEN ? AND ?");
Then later, in your loop, set the parameters:
stmtD.setString(1, interfaceId);   // or setInt/setLong if d.id and dtime are numeric columns
stmtD.setString(2, startTime);
stmtD.setString(3, endTime);
ResultSet rsD = stmtD.executeQuery(); // note no SQL passed in here
It may be a good idea to also check the query plan in MySQL with EXPLAIN to see whether that is part of the bottleneck. There is also quite a bit of diagnostic string concatenation going on in the function; once the query is working, removing that may improve performance too.
Finally, note that even if the query is fast, network latency may slow things down. JDBC provides batch execution of multiple statements to help reduce the overall latency per statement (it applies to inserts/updates rather than SELECTs); see addBatch/executeBatch on Statement and PreparedStatement.
More information is required, but I can offer some general questions/suggestions. It may have nothing to do with the compilation of the query plan (that would be unusual).
Are the id and dtime columns indexed?
How many times does a query get executed in the 60mins?
How much time does each query take?
If the time per query is large then the problem is the query execution itself, not the compilation. Check the indexes as described above.
If there are many many many queries then it might be the sheer volume of queries that is causing the problem. Using PreparedStatement (see mdma's answer) may help. Or you can try and batch the interfaceIDs you want by using an "in" statement and running a query for every 100 interfaceIDs rather than one for each.
EDIT: As a matter of good practice you should ALWAYS use PreparedStatement as it will correctly handle datatypes such as dates so you don't have to worry about formatting them into correct SQL syntax. Also prevents SQL injection.
From the looks of things you are kicking off multiple select queries (even hundreds, based on your file size).
Instead of doing that, from your input file create a comma-delimited list of all the interfaceId values and then make one SQL call using the IN keyword. You know that performanceTable, startTime and endTime aren't changing, so the query would look something like this:
SELECT d.dtime,d.ifInOctets, d.ifOutOctets
FROM MyTable_1_60 as d
WHERE dtime BETWEEN '08/14/2010' AND '08/15/2010'
AND d.id IN ( 10, 18, 25, 13, 75 )
Then you are free to open your output file and dump the result set in one swoop.
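A rough Java sketch of building that single query from the input file (it follows the question's own parsing, where the interface id is the sixth comma-separated field; chunk the id list if it grows too large for one statement):

// Sketch: collect the ids, build one IN (...) query, run it once.
List<String> ids = new ArrayList<>();
try (BufferedReader interfaceRead = new BufferedReader(new FileReader(interfaceInput))) {
    String line;
    while ((line = interfaceRead.readLine()) != null) {
        String[] fields = line.split(",");
        if (fields.length >= 6) {
            ids.add(fields[5].trim());   // sixth field = interfaceId, as in the original loop
        }
    }
}
String query = "SELECT d.id, d.dtime, d.ifInOctets, d.ifOutOctets FROM " + performanceTable
        + "_1_60 AS d WHERE d.dtime BETWEEN " + startTime + " AND " + endTime
        + " AND d.id IN (" + String.join(",", ids) + ")";

Selecting d.id as well lets you map each returned row back to its interface when writing the output lines.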