Java Machine crashes for too big MySQL result. - java

I am running out of memory when I am trying to read a huge list. This is the code that crashes the server
query = "SELECT * FROM huge_list ORDER BY id ASC";
statement = this.database.createStatement();
results = statement.executeQuery(query);
This is the error
Exception in thread "Thread-2" java.lang.OutOfMemoryError: Java heap space
The error points to the results line. Any advice on how to avoid that? Perhaps load the list using LIMIT? Could the size of the list cause the error?

You could change the memory settings of your JVM, which might fix your problem, but in general you should not load all the data at once. Better to always load, say, 100 rows and then query the next 100 rows.
Just to be safe, I would store the ID where the previous batch ended and query again with LIMIT for the next batch of rows, so your logic will not break if new rows are added.
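A minimal sketch of that approach with plain JDBC, assuming the huge_list table and id column from the question; the batch size and the per-row handling are placeholders:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class BatchedReader {
    private static final int BATCH_SIZE = 100;

    public void readAll(Connection connection) throws SQLException {
        long lastId = 0;          // assumes positive, ascending ids
        boolean more = true;
        while (more) {
            more = false;
            try (PreparedStatement ps = connection.prepareStatement(
                    "SELECT * FROM huge_list WHERE id > ? ORDER BY id ASC LIMIT ?")) {
                ps.setLong(1, lastId);
                ps.setInt(2, BATCH_SIZE);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        more = true;
                        lastId = rs.getLong("id");  // remember where this batch ended
                        processRow(rs);             // handle one row at a time
                    }
                }
            }
        }
    }

    private void processRow(ResultSet row) throws SQLException {
        // placeholder: do the real per-row work here
    }
}
Because the next batch starts after the last ID seen rather than at a fixed offset, inserts between batches don't shift the window.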

I believe you should be streaming the MySQL results. See Reading large amount of records MySQL into Java
Your other option would be to increase the maximum RAM allowed by the JVM, but that's probably not what you want to do.

Try to modify the query to return only those records which you are displaying or processing for the user; you can use the LIMIT clause. One more thing: state your column names explicitly. SELECT * isn't really needed; are you really processing all the columns that your query returns?
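For example, a sketch reusing the question's connection field; the column names here are made up, just to show dropping SELECT * and adding a LIMIT:
// Hypothetical columns: keep only the ones you actually process.
query = "SELECT id, name, status FROM huge_list ORDER BY id ASC LIMIT 1000";
try (Statement statement = this.database.createStatement();
     ResultSet results = statement.executeQuery(query)) {
    while (results.next()) {
        long id = results.getLong("id");        // read just the named columns
        String name = results.getString("name");
        // ... process the row
    }
}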

Related

java.lang.OutOfMemoryError: Java heap space in hibernate while picking 200 thousand rows

I am using Hibernate 4.3.10, Java 1.8 and Eclipse Neon.
I have been trying to execute the following statement in my code. But
java.lang.OutOfMemoryError: Java heap space
is thrown at the second line after a few minutes.
Query query = mySqlSession.createSQLQuery("select * from hops where processed is null");
List<Object[]> lis=query.list();
The problem is that this query returns about 200k results. I have referred to
How to deal with "java.lang.OutOfMemoryError: Java heap space" error (64MB heap size)
After reading the solutions from the above link, I changed the memory to 2 GB. There still seems to be a problem.
Any suggestions?
Depending what you're doing with the rows, the proper way to handle the situation is to use LIMIT to retrieve only a part of the results at a time.
You would then loop the query and process the resultset while rows are available.
SELECT * FROM hops WHERE processed IS NULL LIMIT 10000
When using a non-native query, see @Stefano's answer.
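A rough sketch of that loop with the native query from the question; it assumes your per-row processing also marks the row as processed, so each pass picks up the next unprocessed batch:
import java.util.List;
import org.hibernate.SQLQuery;
import org.hibernate.Session;

public class HopsProcessor {
    private static final int BATCH_SIZE = 10000;

    @SuppressWarnings("unchecked")
    public void processAll(Session mySqlSession) {
        while (true) {
            SQLQuery query = mySqlSession.createSQLQuery(
                    "select * from hops where processed is null limit " + BATCH_SIZE);
            List<Object[]> batch = query.list();
            if (batch.isEmpty()) {
                break;                 // nothing left to process
            }
            for (Object[] row : batch) {
                process(row);          // must also set the processed flag for the row
            }
            mySqlSession.clear();      // drop anything the processing loaded into the session
        }
    }

    private void process(Object[] row) {
        // placeholder for the real work
    }
}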
The actual problem is returning such a big dataset. Even if you increase the process memory, you will run into this again when more data gets into the store. You may want to implement paging here: fetch a finite, small set of data, then get the next page on request.
I think there are 2 ways:
1. Pagination (select only N rows per query)
Criteria c = session.createCriteria(Envelope.class);
List<Envelope> list = c.setMaxResults(10).list();
2. Scrollable Result (similar to a JDBC ResultSet)
Criteria c = session.createCriteria(Envelope.class);
ScrollableResults scroll = c.scroll(ScrollMode.FORWARD_ONLY);
while (scroll.next()) {
    Envelope e = (Envelope) scroll.get(0);
    // do whatever
}
You can also use a StatelessSession.
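A small sketch of the StatelessSession variant, using the Envelope entity from the examples above; a StatelessSession has no first-level cache and no dirty checking, so scrolled entities are not retained by Hibernate:
import org.hibernate.ScrollMode;
import org.hibernate.ScrollableResults;
import org.hibernate.SessionFactory;
import org.hibernate.StatelessSession;

public class EnvelopeScan {

    public void scanAll(SessionFactory sessionFactory) {
        StatelessSession session = sessionFactory.openStatelessSession();
        try {
            ScrollableResults scroll = session.createQuery("from Envelope")
                    .scroll(ScrollMode.FORWARD_ONLY);
            while (scroll.next()) {
                Envelope e = (Envelope) scroll.get(0);
                // do whatever
            }
            scroll.close();
        } finally {
            session.close();
        }
    }
}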

How to free memory after fetching data via jdbc

I am using Spring JDBC on WebLogic, and I set the fetch size to 500 to fetch data from the DB faster. But this causes memory problems. Here is an example:
http://webmoli.com/2009/02/01/jdbc-performance-tuning-with-optimal-fetch-size/
My question is: how do I free this memory? Running GC is not working; I guess that is because the connection is kept alive in the connection pool.
Code:
public List<Msisdn> getNewMsisdnsForBulkSmsId(String bulkSmsId,String scheduleId,final int msisdnCount) throws SQLException {
JdbcTemplate jdbcTemplate = getJdbcTemplate();
jdbcTemplate.setFetchSize(500);
jdbcTemplate.setMaxRows(msisdnCount);
jdbcTemplate.query("select BULKSMS_ID, ? as , STATUSSELECTDATE, DELIVERYTIME, ID, MESSAGE from ada_msisdn partition (ID_"+bulkSmsId+") where bulksms_id = ? and status = 0 and ERRORCODE = 0 and SCHEDULEID is null for update skip locked", new Object[]{scheduleId,bulkSmsId}, MsisdnRowMapper.INSTANCE);
//Also i tried to close connection and run gc, this does not free the memory too.
//jdbcTemplate.getDataSource().getConnection().close();
//System.gc();
return null;
}
When I set the fetch size to 10, the heap size is 12 MB; if I set the fetch size to 500, the heap size is 206 MB.
Thanks
Update for the added sample code, etc.:
It sounds like you just need to use a value less than 500, but that makes me think you are returning a lot more data than your result set mapper is actually using.
Now that I see that you're storing all of the mapped results in a List, I would say that the problem seen with the fetch size is likely to be a secondary issue. The combined memory space needed for the List<Msisdn> and a single group of fetched ResultSet rows is pushing you past available memory.
What is the value of msisdnCount? If it's larger than 500, then you are probably using more memory in list than in the ResultSet's 500 records. If it's less than 500, then I would expect that the memory problem also occurs when you set the fetch size to msisdnCount, and the error would go away at some value between min(msisdnCount, 500) and 10.
Loading all of the results into a list and then processing them is a pattern that will very often lead to memory exhaustion. The common solution is to use streaming: if you can process each row as it comes in, rather than storing all of the mapped results in your list, then you can avoid the memory problems.
I don't see any streaming support in the Spring JDBC core package, but I'll update if I find it.
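That said, Spring's callback-based RowCallbackHandler at least avoids accumulating the mapped rows in a List; whether the driver truly streams still depends on the fetch-size behaviour discussed elsewhere in this thread. A sketch, with the query reduced to a placeholder:
import java.sql.ResultSet;
import java.sql.SQLException;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowCallbackHandler;

public class MsisdnStreamer {

    public void streamMsisdns(JdbcTemplate jdbcTemplate, String bulkSmsId) {
        jdbcTemplate.setFetchSize(100);   // smaller than the original 500
        jdbcTemplate.query(
                "select ID, MESSAGE from ada_msisdn where bulksms_id = ?",  // simplified placeholder
                new Object[]{bulkSmsId},
                new RowCallbackHandler() {
                    @Override
                    public void processRow(ResultSet rs) throws SQLException {
                        // handle one row here; nothing is collected into a List
                        long id = rs.getLong("ID");
                        String message = rs.getString("MESSAGE");
                    }
                });
    }
}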
--
If the data in the rows you are retrieving is so large that fetching 500 rows uses up your heap, then you must either return less data per row or fetch fewer rows at a time.
You may find that you are storing the fetched rows somewhere in your code, which means that it's not the ResultSet using up your memory. For example, you might be copying all of the rows to some collection instance.
I would look at the size of the data in each row and try to drop unneeded columns that might contain large data types. Then try simply loading the data and iterating through the results, without doing your normal processing (which may be storing the data somewhere), to see how many rows you can load at a time with the memory you have. If you're running out of memory fetching 500 rows, you must be pulling a lot of data over; and if you're not actually using that data, you're wasting CPU and network resources as well as memory.
Edit: You may also want to set the cursor behavior to give your JDBC driver more help in knowing what it can throw away. For example, you can prepare your statements with ResultSet.TYPE_FORWARD_ONLY and ResultSet.CONCUR_READ_ONLY. http://docs.oracle.com/javase/6/docs/api/index.html?java/sql/ResultSet.html
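A short sketch of those statement flags; the query here is only a placeholder:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ForwardOnlyRead {

    public void readAll(Connection connection) throws SQLException {
        // Forward-only, read-only cursors give the driver the most freedom
        // to discard rows it has already handed back to you.
        try (PreparedStatement ps = connection.prepareStatement(
                "select ID, MESSAGE from ada_msisdn",   // placeholder query
                ResultSet.TYPE_FORWARD_ONLY,
                ResultSet.CONCUR_READ_ONLY)) {
            ps.setFetchSize(100);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // process one row at a time without keeping references
                }
            }
        }
    }
}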

Android Java Sqlite Exception index 0 requested with size of 0

I think I understand what this error refers to. Basically, as I understand it, the cursor is empty, which means the query is not finding any rows that match its WHERE clause.
The query is basically
SELECT * FROM questions WHERE _id=2
Now the thing I don't understand is that if I use a database with 7 records it is fine, but when I change it over to one with 100 it throws this exception. The odd thing is that from other parts of the app I can output the entire database's contents, referring to specific columns.
All the columns in both DBs have the same names, and when outputting everything from a table you can refer to everything. The problem seems to occur only when you query the large DB looking specifically for one row: it comes back empty.
Is there anything that would cause this, like special characters or anything else I have overlooked?
More code would be helpful. The obvious thing to do is verify that your large database has the correct filename, tablename and _id column and that there is a row with _id=2. Another thing I would try (probably not the problem) is to put quotes around the 2 -- WHERE _id='2'.
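A hedged sketch of that check with the Android SQLite API, using the table and column names from the question; the SQLiteDatabase reference is assumed to be the one your app already opens:
import android.database.Cursor;
import android.database.sqlite.SQLiteDatabase;

public class QuestionLookup {

    /** Returns true if the questions table contains a row with the given _id. */
    public boolean hasQuestion(SQLiteDatabase db, long id) {
        Cursor cursor = db.rawQuery(
                "SELECT _id FROM questions WHERE _id = ?",
                new String[]{ String.valueOf(id) });
        try {
            // moveToFirst() returns false when no rows matched, which is exactly
            // the "index 0 requested, with a size of 0" situation.
            return cursor.moveToFirst();
        } finally {
            cursor.close();
        }
    }
}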
I've run into this issue in my app. For me it was because my cursors were exceeding the VM object memory limit of 1 MB. This could be your issue.
Are you retrieving any blobs?

Using Hibernate's ScrollableResults to slowly read 90 million records

I simply need to read each row in a table in my MySQL database using Hibernate and write a file based on it. But there are 90 million rows and they are pretty big. So it seemed like the following would be appropriate:
ScrollableResults results = session.createQuery("SELECT person FROM Person person")
.setReadOnly(true).setCacheable(false).scroll(ScrollMode.FORWARD_ONLY);
while (results.next())
storeInFile(results.get()[0]);
The problem is the above will try and load all 90 million rows into RAM before moving on to the while loop... and that will kill my memory with OutOfMemoryError: Java heap space exceptions :(.
So I guess ScrollableResults isn't what I was looking for? What is the proper way to handle this? I don't mind if this while loop takes days (well I'd love it to not).
I guess the only other way to handle this is to use setFirstResult and setMaxResults to iterate through the results and just use regular Hibernate results instead of ScrollableResults. That feels like it will be inefficient though and will start taking a ridiculously long time when I'm calling setFirstResult on the 89 millionth row...
UPDATE: setFirstResult/setMaxResults doesn't work, it turns out to take an unusably long time to get to the offsets like I feared. There must be a solution here! Isn't this a pretty standard procedure?? I'm willing to forgo Hibernate and use JDBC or whatever it takes.
UPDATE 2: the solution I've come up with which works ok, not great, is basically of the form:
select * from person where id > <offset> and <other_conditions> limit 1
Since I have other conditions, even all in an index, it's still not as fast as I'd like it to be... so still open for other suggestions..
Using setFirstResult and setMaxResults is your only option that I'm aware of.
Traditionally a scrollable result set would only transfer rows to the client on an as-required basis. Unfortunately, MySQL Connector/J actually fakes it: it executes the entire query and transports it to the client, so the driver has the entire result set loaded in RAM and will drip-feed it to you (evidenced by your out-of-memory problems). You had the right idea; it's just a shortcoming in the MySQL Java driver.
I found no way to get around this, so went with loading large chunks using the regular setFirst/max methods. Sorry to be the bringer of bad news.
Just make sure to use a stateless session so there's no session level cache or dirty tracking etc.
EDIT:
Your UPDATE 2 is the best you're going to get unless you break out of MySQL Connector/J. Though there's no reason you can't raise the limit on the query; provided you have enough RAM to hold the index, this should be a somewhat cheap operation. I'd modify it slightly to grab a batch at a time, and use the highest id of that batch to grab the next batch.
Note: this will only work if other_conditions use equality (no range conditions allowed) and have the last column of the index as id.
select *
from person
where id > <max_id_of_last_batch> and <other_conditions>
order by id asc
limit <batch_size>
You should be able to use ScrollableResults, though it requires a few magic incantations to get it working with MySQL. I wrote up my findings in a blog post (http://www.numerati.com/2012/06/26/reading-large-result-sets-with-hibernate-and-mysql/) but I'll summarize here:
"The [JDBC] documentation says:
To enable this functionality, create a Statement instance in the following manner:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
This can be done using the Query interface (this should work for Criteria as well) in version 3.2+ of the Hibernate API:
Query query = session.createQuery(hql);  // hql: your HQL query string
query.setReadOnly(true);
// MIN_VALUE gives hint to JDBC driver to stream results
query.setFetchSize(Integer.MIN_VALUE);
ScrollableResults results = query.scroll(ScrollMode.FORWARD_ONLY);
// iterate over results
while (results.next()) {
    Object row = results.get();
    // process row then release reference
    // you may need to evict() as well
}
results.close();
This allows you to stream over the result set, however Hibernate will still cache results in the Session, so you’ll need to call session.evict() or session.clear() every so often. If you are only reading data, you might consider using a StatelessSession, though you should read its documentation beforehand."
Set the fetch size in the query to an optimal value, as shown below.
Also, when caching is not required, it may be better to use StatelessSession.
ScrollableResults results = session.createQuery("SELECT person FROM Person person")
.setReadOnly(true)
.setFetchSize( 1000 ) // <<--- !!!!
.setCacheable(false).scroll(ScrollMode.FORWARD_ONLY);
The fetch size must be Integer.MIN_VALUE, otherwise it won't work.
This is taken literally from the official reference: https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html
Actually you could have gotten what you wanted -- low-memory scrollable results with MySQL -- if you had used the answer mentioned here:
Streaming large result sets with MySQL
Note that you will have problems with Hibernate lazy-loading because it will throw an exception on any queries performed before the scroll is finished.
With 90 million records, it sounds like you should be batching your SELECTs. I've done this with Oracle when doing the initial load into a distributed cache. Looking at the MySQL documentation, the equivalent seems to be the LIMIT clause: http://dev.mysql.com/doc/refman/5.0/en/select.html
Here's an example:
SELECT * from Person
LIMIT 200, 100
This would return rows 201 through 300 of the Person table.
You'd need to get the record count from your table first and then divide it by your batch size and work out your looping and LIMIT parameters from there.
The other benefit of this would be parallelism - you can execute multiple threads in parallel on this for faster processing.
Processing 90 million records also doesn't sound like the sweet spot for using Hibernate.
The problem could be that Hibernate keeps references to all objects in the session until you close the session. That has nothing to do with query caching. Maybe it would help to evict() the objects from the session after you are done writing each object to the file. If they are no longer referenced by the session, the garbage collector can free the memory and you won't run out of memory anymore.
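A sketch of that idea applied to the scroll loop from the question; the clear interval is arbitrary, and you would still combine this with the driver-level streaming settings discussed in the other answers:
import org.hibernate.ScrollMode;
import org.hibernate.ScrollableResults;
import org.hibernate.Session;

public class PersonExporter {

    public void export(Session session) {
        ScrollableResults results = session.createQuery("SELECT person FROM Person person")
                .setReadOnly(true).setCacheable(false).scroll(ScrollMode.FORWARD_ONLY);
        int count = 0;
        while (results.next()) {
            Person person = (Person) results.get(0);
            storeInFile(person);
            session.evict(person);        // detach it so the session drops its reference
            if (++count % 1000 == 0) {
                session.clear();          // and clear anything else the session still holds
            }
        }
        results.close();
    }

    private void storeInFile(Person person) {
        // placeholder for the file-writing logic from the question
    }
}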
I propose not just sample code, but a query template based on Hibernate that does this workaround for you (pagination, scrolling and clearing of the Hibernate session).
It can also easily be adapted to use an EntityManager.
I've used the Hibernate scroll functionality successfully before without it reading the entire result set in. Someone said that MySQL does not do true scroll cursors, but it claims to based on the JDBC dmd.supportsResultSetType(ResultSet.TYPE_SCROLL_INSENSITIVE) and searching around it seems like other people have used it. Make sure it's not caching the Person objects in the session - I've used it on SQL queries where there was no entity to cache. You can call evict at the end of the loop to be sure or test with a sql query. Also play around with setFetchSize to optimize the number of trips to the server.
I recently worked on a problem like this and wrote a blog post about how to tackle it; I hope it is helpful to someone.
I use a lazy-list approach with partial acquisition: I replaced the LIMIT/OFFSET pagination of the query with manual pagination.
In my example, the select returns 10 million records; I fetch them and insert them into a "temporary table":
create or replace function load_records ()
returns VOID as $$
BEGIN
    drop sequence if exists temp_seq;
    create temp sequence temp_seq;
    insert into tmp_table
    SELECT linea.*
    FROM (
        select nextval('temp_seq') as ROWNUM, * from table1 t1
        join table2 t2 on (t2.fieldpk = t1.fieldpk)
        join table3 t3 on (t3.fieldpk = t2.fieldpk)
    ) linea;
END;
$$ language plpgsql;
After that, I can paginate without counting rows, using the assigned sequence instead:
select * from tmp_table where counterrow >= 9000000 and counterrow <= 9025000
From the Java perspective, I implemented this pagination through partial acquisition with a lazy list, that is, a list that extends AbstractList and implements the get() method. The get method can use a data access interface to fetch the next set of data and release the memory heap:
@Override
public E get(int index) {
    if (bufferParcial.size() <= (index - lastIndexRoulette)) {
        lastIndexRoulette = index;
        bufferParcial.removeAll(bufferParcial);
        bufferParcial = new ArrayList<E>();
        bufferParcial.addAll(daoInterface.getBufferParcial());
        if (bufferParcial.isEmpty()) {
            return null;
        }
    }
    return bufferParcial.get(index - lastIndexRoulette);
}
On the other hand, the data access interface uses a query to paginate and implements a method to iterate progressively, 25,000 records at a time, until everything is processed.
Results for this approach can be seen here:
http://www.arquitecturaysoftware.co/2013/10/laboratorio-1-iterar-millones-de.html
Another option if you're "running out of RAM" is to request, say, one column instead of the entire object (see How to use hibernate criteria to return only one element of an object instead the entire object?). This saves a lot of CPU processing time to boot.
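A brief sketch of that with the Criteria API, reusing the Person entity from the question; the "id" property is just an example:
import java.util.List;
import org.hibernate.Criteria;
import org.hibernate.Session;
import org.hibernate.criterion.Projections;

public class PersonIdQuery {

    @SuppressWarnings("unchecked")
    public List<Long> loadIds(Session session) {
        // Fetch a single scalar column instead of hydrating whole Person objects.
        Criteria criteria = session.createCriteria(Person.class)
                .setProjection(Projections.property("id"));
        return criteria.list();
    }
}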
For me it worked properly when setting useCursors=true; otherwise the scrollable result set ignores the fetch size. In my case the fetch size was 5000, but the scrollable result set fetched millions of records at once, causing excessive memory usage. The underlying DB is MS SQL Server.
jdbc:jtds:sqlserver://localhost:1433/ACS;TDS=8.0;useCursors=true

Fastest way to iterate through large table using JDBC

I'm trying to create a java program to cleanup and merge rows in my table. The table is large, about 500k rows and my current solution is running very slowly. The first thing I want to do is simply get an in-memory array of objects representing all the rows of my table. Here is what I'm doing:
pick an increment of say 1000 rows at a time
use JDBC to fetch a resultset on the following SQL query
SELECT * FROM TABLE WHERE ID > 0 AND ID < 1000
add the resulting data to an in-memory array
continue querying all the way up to 500,000 in increments of 1000, each time adding results.
This is taking way too long. In fact it's not even getting past the second increment, from 1000 to 2000. The query takes forever to finish (although when I run the same thing directly through a MySQL browser it's decently fast). It's been a while since I've used JDBC directly. Is there a faster alternative?
First of all, are you sure you need the whole table in memory? Maybe you should consider (if possible) selecting only the rows that you want to update/merge/etc. If you really have to have the whole table, you could consider using a scrollable ResultSet. You can create it like this:
// make sure autocommit is off (postgres)
con.setAutoCommit(false);
Statement stmt = con.createStatement(
ResultSet.TYPE_SCROLL_INSENSITIVE, //or ResultSet.TYPE_FORWARD_ONLY
ResultSet.CONCUR_READ_ONLY);
ResultSet srs = stmt.executeQuery("select * from ...");
It enables you to move to any row you want by using 'absolute' and 'relative' methods.
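For instance, continuing with the srs result set created above (the row numbers are arbitrary):
srs.absolute(1000);   // position the cursor directly on row 1000
srs.relative(-10);    // step back 10 rows (now on row 990)
while (srs.next()) {  // then iterate forward from there as usual
    // read columns from the current row, e.g. srs.getString(1)
}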
One thing that helped me was Statement.setFetchSize(Integer.MIN_VALUE). I got this idea from Jason's blog. It cut the execution time down by more than half, and memory consumption went down dramatically (as only one row is read at a time).
This trick doesn't work for PreparedStatement, though.
Although it's probably not optimal, your solution seems like it ought to be fine for a one-off database cleanup routine. It shouldn't take that long to run a query like that and get the results (I'm assuming that, since it's a one-off, a couple of seconds would be fine). Possible problems:
Is your network (or at least your connection to MySQL) very slow? If so, you could try running the process locally on the MySQL box, or somewhere better connected.
Is there something in the table structure that's causing it? Pulling down 10 KB of data for every row? 200 fields? Calculating the ID values to fetch based on a non-indexed column? You could try finding a more DB-friendly way of pulling the data (e.g. just the columns you need, having the DB aggregate values, etc.).
If you're not getting through the second increment, something is really wrong. Efficient or not, you shouldn't have any problem dumping 2,000 or 20,000 rows into memory on a running JVM. Maybe you're storing the data redundantly or extremely inefficiently?
