Java Heap Space Exception with a big amount of data, any solution?

I have a bit of a big problem with Java heap memory.
I'm trying to migrate data from an Oracle Database 11g to an Access 2007 file.
This is not a problem below 65,000 records; from there on, though...
The application throws a Java heap exception: memory consumption rises above 600 MB and CPU usage above 50% until the exception occurs.
As far as I know, rset.next() doesn't fetch all the data (over 50 columns x 65,000+ rows) at once, but only a batch of records at a time.
I've tried setting the fetch size too; nothing changed:
rset.setFetchSize(1000);
I've stripped my code down to just printing some output, and I get the same error:
int cont = 0;
while (rset.next()) {
    cont++;
    if (cont % 5000 == 0) {
        System.out.println(cont + " processed and counting ...");
    }
}
Please don't give me the answer of using -Xms/-Xmx 512, 1024, etc.
That might solve it in general, but not in my particular case (I've tried setting it even higher xD; nothing changed, I got the same exception at 65,000 records).
Are there any other options I could try?
Maybe changing some driver configuration or the connection string?
Please help.
Sorry about my English.
This is my connection:
Class.forName("oracle.jdbc.driver.OracleDriver");
this.conn = DriverManager.getConnection("jdbc:oracle:thin:@" + getServer() + ":1521:orcl", getUser(), getPassword());
this.stmt = this.conn.createStatement(java.sql.ResultSet.TYPE_SCROLL_INSENSITIVE, java.sql.ResultSet.CONCUR_UPDATABLE);

It looks like the problem is that you are using a scrollable ResultSet (TYPE_SCROLL_INSENSITIVE with CONCUR_UPDATABLE), and that is going to use more memory: to support scrolling, the Oracle driver caches the rows it has read on the client side, so memory grows with every row you step over.
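If the export does not actually need scrolling or updatability, a forward-only, read-only statement with a modest fetch size should keep memory flat. A minimal sketch, assuming placeholder connection details and table name (not the asker's real ones):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ForwardOnlyExport {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; substitute your own server, user and password.
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@localhost:1521:orcl", "user", "password");

        // Forward-only + read-only: the driver has no reason to cache rows
        // for scrolling or updating, so memory use stays flat.
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(1000); // rows are pulled from the server in batches

        ResultSet rset = stmt.executeQuery("SELECT * FROM some_table"); // placeholder query
        int cont = 0;
        while (rset.next()) {
            // write the current row to the Access file here, then let it go
            cont++;
            if (cont % 5000 == 0) {
                System.out.println(cont + " processed and counting ...");
            }
        }
        rset.close();
        stmt.close();
        conn.close();
    }
}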

Related

Apache Spark : TaskResultLost (result lost from block manager) Error On cluster

I have a Spark standalone cluster with 3 slaves on VirtualBox. My code is in Java, and it works fine with my small input datasets, whose inputs total around 100 MB.
I set my virtual machines' RAM to 16 GB, but when I run my code on big input files (about 2 GB) I get this error after hours of processing, in my reduce part:
Job aborted due to stage failure: Total size of serialized results of 4 tasks (4.3GB) is bigger than spark.driver.maxResultSize
I edited spark-defaults.conf and assigned a higher amount (2 GB and 4 GB) to spark.driver.maxResultSize. It didn't help, and the same error showed up.
Now I am trying 8 GB of spark.driver.maxResultSize, and my spark.driver.memory is also the same as the RAM size (16 GB). But I get this error:
TaskResultLost (result lost from block manager)
Any comments about this? I also include an image.
I don't know if the problem is caused by the large maxResultSize or by something with the collections of RDDs in the code. I also provide the mapper part of the code for better understanding:
JavaRDD<Boolean[][][]> fragPQ = uData.map(new Function<String, Boolean[][][]>() {
    public Boolean[][][] call(String s) {
        Boolean[][][] PQArr = new Boolean[2][][];
        PQArr[0] = new Boolean[11000][];
        PQArr[1] = new Boolean[11000][];
        for (int i = 0; i < 11000; i++) {
            PQArr[0][i] = new Boolean[11000];
            PQArr[1][i] = new Boolean[11000];
            for (int j = 0; j < 11000; j++) {
                PQArr[0][i][j] = true;
                PQArr[1][i][j] = true;
            }
        }
        return PQArr;
    }
});
In general, this error shows that you are collecting/bringing a large amount of data onto the driver. This should never be done. You need to rethink your application logic.
Also, you don't need to modify spark-defaults.conf to set the property. Instead, you can specify such application-specific properties via the --conf option of spark-shell or spark-submit, depending on how you run the job.
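For example, the same property can also be set programmatically on the SparkConf before the context is created, which is equivalent to passing --conf spark.driver.maxResultSize=4g to spark-submit. A rough sketch (the application name is a placeholder and the master URL is expected to come from spark-submit):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class MaxResultSizeExample {
    public static void main(String[] args) {
        // Application-specific setting; no need to touch spark-defaults.conf.
        SparkConf conf = new SparkConf()
                .setAppName("max-result-size-example") // placeholder name
                .set("spark.driver.maxResultSize", "4g");
        JavaSparkContext sc = new JavaSparkContext(conf); // master supplied by spark-submit
        // ... job logic ...
        sc.stop();
    }
}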
SOLVED:
The problem was solved by increasing the master's RAM size. I studied my case and found that, based on my design, assigning 32 GB of RAM would be sufficient. Now, by doing that, my program works fine and calculates everything correctly.
In my case, I got this error because a firewall was blocking the block manager ports between the driver and the executors.
The port can be specified with:
spark.blockManager.port and
spark.driver.blockManager.port
See https://spark.apache.org/docs/latest/configuration.html#networking
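A small sketch of pinning those ports to fixed values that the firewall allows (the port numbers and application name below are placeholders):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class FixedBlockManagerPorts {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("fixed-block-manager-ports") // placeholder name
                .set("spark.blockManager.port", "38000")         // executor side
                .set("spark.driver.blockManager.port", "38100"); // driver side
        JavaSparkContext sc = new JavaSparkContext(conf); // master supplied by spark-submit
        // ... job logic ...
        sc.stop();
    }
}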

Getting 100 percent CPU when trying to download a CSV in Spring

I am getting a CPU performance issue on the server when I try to download a CSV in my project: CPU goes to 100%, even though SQL returns the response within 1 minute. We write around 600K records to the CSV. For one user it works fine, but for concurrent users we get this issue.
Environment
Spring 4.2.5
Tomcat 7/8 (RAM 2GB Allocated)
MySQL 5.0.5
Java 1.7
Here is the Spring controller code:
@RequestMapping(value="csvData")
public void getCSVData(HttpServletRequest request,
        HttpServletResponse response,
        @RequestParam(value="param1", required=false) String param1,
        @RequestParam(value="param2", required=false) String param2,
        @RequestParam(value="param3", required=false) String param3) throws IOException{
List<Log> logs = service.getCSVData(param1,param2,param3);
response.setHeader("Content-type","application/csv");
response.setHeader("Content-disposition","inline; filename=logData.csv");
PrintWriter out = response.getWriter();
out.println("Field1,Field2,Field3,.......,Field16");
for(Log row: logs){
out.println(row.getField1()+","+row.getField2()+","+row.getField3()+"......"+row.getField16());
}
out.flush();
out.close();
}}
Persistence code (I am using Spring JdbcTemplate):
@Override
public List<Log> getCSVLog(String param1, String param2, String param3) {
    String sql = SqlConstants.CSV_ACTIVITY.toString();
    List<Log> csvLog = jdbcTemplate.query(sql, new Object[]{param1, param2, param3},
        new RowMapper<Log>() {
            @Override
            public Log mapRow(ResultSet rs, int rowNum) throws SQLException {
                Log log = new Log();
                log.setField1(rs.getInt("field1"));
                log.setField2(rs.getString("field2"));
                log.setField3(rs.getString("field3"));
                // ...
                log.setField16(rs.getString("field16"));
                return log;
            }
        });
    return csvLog;
}
I think you need to be specific about what you mean by "100% CPU usage": whether it's the Java process or the MySQL server. As you have 600K records, trying to load everything into memory can easily end up in an OutOfMemoryError. Given that this works for one user, you've got enough heap space to process this number of records for just one user, and the symptoms surface when multiple users try to use the same service.
The first issue I can see in your posted code is that you load everything into one big list, and the size of that list varies with the content of the Log class. Using a list like this also means you have to have enough memory to process the JDBC result set and generate a new list of Log instances. This becomes a major problem with a growing number of users. These kinds of short-lived objects cause frequent GC, and once GC cannot keep up with the amount of garbage being created, it obviously fails. To solve this major issue my suggestion is to use a scrollable ResultSet. Additionally, you can make this result set read-only; for example, below is a code fragment for creating a scrollable result set. Take a look at the documentation for how to use it.
Statement st = conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_READ_ONLY);
The above option is suitable if you're using pure JDBC or the Spring JDBC template. If Hibernate is already used in your project, you can still achieve the same with the code fragment below. Again, please check the documentation for more information, especially if you use a different JPA provider.
StatelessSession session = sessionFactory.openStatelessSession();
Query query = session.createSQLQuery(queryStr).setCacheable(false).setFetchSize(Integer.MIN_VALUE).setReadOnly(true);
query.setParameter(query_param_key, query_parameter_value);
ScrollableResults resultSet = query.scroll(ScrollMode.FORWARD_ONLY);
This way you're not loading all the records into the Java process in one go; instead they're loaded on demand, and the memory footprint stays small at any given time. Note that the JDBC connection will remain open until you're done processing the entire result set. This also means your DB connection pool can be exhausted if many users download CSV files from this endpoint. You need to take measures to overcome this problem (i.e. use an API manager to rate-limit calls to this endpoint, read from a read replica, or whatever other viable option).
My other suggestion is to stream the data, which you have already partly done, so that records fetched from the DB are processed and sent to the client before the next set of records is fetched. I would also suggest using a CSV library such as Super CSV to handle this, as these libraries are designed to handle a good load of data.
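For illustration, a minimal sketch of the streaming idea with plain JDBC and MySQL Connector/J (the query, column names, and DataSource wiring are assumptions, not your actual code; with Connector/J, a fetch size of Integer.MIN_VALUE makes the driver stream rows one at a time):

import java.io.IOException;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.servlet.http.HttpServletResponse;
import javax.sql.DataSource;

public class CsvStreamer {

    // Streams the result set straight into the HTTP response; at no point
    // is the whole data set held in memory.
    public void streamCsv(DataSource dataSource, HttpServletResponse response)
            throws SQLException, IOException {
        response.setHeader("Content-type", "application/csv");
        response.setHeader("Content-disposition", "inline; filename=logData.csv");
        PrintWriter out = response.getWriter();
        out.println("Field1,Field2,Field3");
        try (Connection conn = dataSource.getConnection();
             Statement st = conn.createStatement(
                     ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            st.setFetchSize(Integer.MIN_VALUE); // Connector/J: row-by-row streaming
            try (ResultSet rs = st.executeQuery(
                    "SELECT field1, field2, field3 FROM log")) { // placeholder query
                while (rs.next()) {
                    out.println(rs.getInt("field1") + ","
                            + rs.getString("field2") + ","
                            + rs.getString("field3"));
                }
            }
        }
        out.flush();
    }
}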
Please note that this answer may not exactly answer your question, as you haven't provided the necessary parts of your source (such as how you retrieve the data from the DB), but it should point you in the right direction.
Your problem is loading all the data from the database onto the application server at once. Try running the query with limit and offset parameters (with a mandatory ORDER BY), push the loaded records to the client, and then load the next chunk of data with a different offset. This helps you decrease the memory footprint and does not require keeping the connection to the database open the whole time. Of course, the database will be loaded a bit more, but the overall situation may be better. Try different limit values, for example 5K-50K, and monitor CPU usage on both the app server and the database.
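A rough sketch of that chunked approach, assuming a placeholder table and columns (not your actual schema), writing each chunk to the response before fetching the next:

import java.io.IOException;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.servlet.http.HttpServletResponse;
import javax.sql.DataSource;

public class PagedCsvWriter {

    // Fetches the data in LIMIT/OFFSET pages so only one page is in memory
    // at a time; the ORDER BY is mandatory for stable paging.
    public void writeCsv(DataSource dataSource, HttpServletResponse response)
            throws SQLException, IOException {
        final int pageSize = 10000; // tune between ~5K and 50K and measure
        PrintWriter out = response.getWriter();
        String sql = "SELECT field1, field2 FROM log ORDER BY id LIMIT ? OFFSET ?"; // placeholder
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            int offset = 0;
            boolean more = true;
            while (more) {
                ps.setInt(1, pageSize);
                ps.setInt(2, offset);
                int rows = 0;
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        out.println(rs.getInt("field1") + "," + rs.getString("field2"));
                        rows++;
                    }
                }
                out.flush();               // push this chunk to the client
                more = (rows == pageSize); // the last page is shorter or empty
                offset += pageSize;
            }
        }
    }
}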
If you can afford to keep many open connections to the database, @Bunti's answer is very good.
http://dev.mysql.com/doc/refman/5.7/en/select.html

JDBC - Get all Table names from OpenOffice Database

long time lurker, first question time.
I tried searching for how to get all of the tables from a database created with OpenOffice using JDBC, and while I found answers that work for others, they do not work for me. The code itself actually returns something, but it returns something completely unexpected.
My code:
try {
DatabaseMetaData md = conn.getMetaData();
rs = md.getTables(null, null, "%", null);
while (rs.next()) {
tableNames.add(rs.getString(3));
System.out.println(rs.getString(3));
}
}
catch (Exception e) {
System.out.println("error in sendConnection()");
}
And the output:
SYSTEM_ALIASES
SYSTEM_ALLTYPEINFO
SYSTEM_AUTHORIZATIONS
SYSTEM_BESTROWIDENTIFIER
SYSTEM_CACHEINFO
SYSTEM_CATALOGS
SYSTEM_CHECK_COLUMN_USAGE
SYSTEM_CHECK_CONSTRAINTS
SYSTEM_CHECK_ROUTINE_USAGE
SYSTEM_CHECK_TABLE_USAGE
SYSTEM_CLASSPRIVILEGES
SYSTEM_COLLATIONS
SYSTEM_COLUMNPRIVILEGES
SYSTEM_COLUMNS
SYSTEM_CROSSREFERENCE
SYSTEM_INDEXINFO
SYSTEM_PRIMARYKEYS
SYSTEM_PROCEDURECOLUMNS
SYSTEM_PROCEDURES
SYSTEM_PROPERTIES
SYSTEM_ROLE_AUTHORIZATION_DESCRIPTORS
SYSTEM_SCHEMAS
SYSTEM_SCHEMATA
SYSTEM_SEQUENCES
SYSTEM_SESSIONINFO
SYSTEM_SESSIONS
SYSTEM_SUPERTABLES
SYSTEM_SUPERTYPES
SYSTEM_TABLEPRIVILEGES
SYSTEM_TABLES
SYSTEM_TABLETYPES
SYSTEM_TABLE_CONSTRAINTS
SYSTEM_TEXTTABLES
SYSTEM_TRIGGERCOLUMNS
SYSTEM_TRIGGERS
SYSTEM_TYPEINFO
SYSTEM_UDTATTRIBUTES
SYSTEM_UDTS
SYSTEM_USAGE_PRIVILEGES
SYSTEM_USERS
SYSTEM_VERSIONCOLUMNS
SYSTEM_VIEWS
SYSTEM_VIEW_COLUMN_USAGE
SYSTEM_VIEW_ROUTINE_USAGE
SYSTEM_VIEW_TABLE_USAGE
What is being returned, and how can I work around or resolve this? Thank you in advance!
Edit: The databases created by OpenOffice appear to be embedded databases by default. This may be causing the problem. Going to try to convert it to something else and see what happens.
I found a way to fix this, in case others come across this problem as well. The problem was that OpenOffice was saving the database as a Base file, with HSQLDB underneath. You need to turn it into just a regular HSQLDB database.
I used both of these links as resources:
http://programmaremobile.blogspot.com/2009/01/java-and-openoffice-base-db-through.html
https://forum.openoffice.org/en/forum/viewtopic.php?f=83&t=65980
In short, you need to extract the .odb file, go into the directories, and find the database directory holding four other files. Add a prefix to them and then access the database like normal.
I am still getting the monstrosity of the SYSTEM_* tables, but now I am actually getting the tables I want as well. From there I think I can figure out how to just get those random tables.
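In case it helps, a hedged sketch of filtering out the SYSTEM_* entries by passing a table-type filter to getTables(): "TABLE" is the standard JDBC type for user tables, and HSQLDB reports its catalog tables with a different type, so they are excluded. The connection URL and credentials are placeholders for an extracted HSQLDB database:

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

public class ListUserTables {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string; point it at the extracted HSQLDB files.
        Connection conn = DriverManager.getConnection(
                "jdbc:hsqldb:file:/path/to/extracted/database;shutdown=true", "sa", "");
        List<String> tableNames = new ArrayList<>();
        DatabaseMetaData md = conn.getMetaData();
        // Restrict the result to plain user tables; system tables are skipped.
        try (ResultSet rs = md.getTables(null, null, "%", new String[] {"TABLE"})) {
            while (rs.next()) {
                tableNames.add(rs.getString("TABLE_NAME"));
            }
        }
        System.out.println(tableNames);
        conn.close();
    }
}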

SpreadsheetAddRows failing on moderate size query

Edit: I changed the name, as there is a similar SO question out there, "How do I fix SpreadSheetAddRows function crashing when adding a large query?", that describes my issue, so I phrased mine more succinctly. The issue is that spreadsheetAddRows for my query result bombs the entire server at what I consider a moderate size (1,600 rows, 27 columns), which sounds considerably less than his 18,000 rows.
I am using an Oracle stored procedure, accessed via ColdFusion 9.0.1's cfstoredproc, that on completion creates a spreadsheet for the user to download.
The issue is that result sets greater than, say, 1,200 rows return a 500 Internal Server Error; 700 rows return fine, so I am guessing it is a memory problem?
The only message I received, other than the 500 Internal Server Error in the standard ColdFusion look, was "gc overhead limit exceeded" in small print, and that was only once on a page refresh; it refers to the underlying Java JVM.
I am not even sure how to go about diagnosing this.
Here is the end of the cfstoredproc and the spreadsheet object:
<!--- variables assigned correctly above --->
<cfprocresult name="RC1">
</cfstoredproc>
<cfset sObj = spreadsheetNew("reconcile","yes")>
<cfset SpreadsheetAddRow(sObj, "Column_1, ... , Column27")>
<cfset SpreadsheetFormatRow(sObj, {bold=TRUE, alignment="center"}, 1)>
<cfset spreadsheetAddRows(sObj, RC1)>
<cfheader name="content-disposition" value="attachment; filename=report_#Dateformat(NOW(),"MMDDYYYY")#.xlsx">
<cfcontent type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" variable="#spreadsheetReadBinary(sObj)#">
My answer lies with ColdFusion and one simple fact: DO NOT USE SpreadsheetAddRows or any of the related functions like SpreadsheetFormatRow.
My solution was to execute the query, create an xls file, use the cfspreadsheet tag to write to the newly created xls file, then serve it to the browser, deleting it after serving.
Using SpreadsheetAddRows: the server crashed on 1,000+ rows, and 700 rows took 5+ minutes.
Using the method outlined above: 1-1.5 seconds.
If you are interested in more code I can provide it, just comment; I am using the ColdBox framework, so I didn't think the specifics would help beyond the new workflow.

Java storedProcedure stops with OutOfMemoryError

I'm working on a Java project, running on Tomcat 6, which connects to a MySQL database. All procedures run as they should, both when testing locally and when testing on our customer's server. There is one exception, however, and that's a procedure which retrieves a whole lot of data to generate a report. The stored procedure takes about 13 minutes when executed from MySQL. When I run the application locally and connect to the online database, the procedure does work; the only time it doesn't is when it is run on the client's server.
The client is pretty protective of his server, so we have limited control over it, but they do want us to solve the problem. When I check the log files, no errors are thrown from the function that executes the stored procedure. Putting some debug logs in the code shows that it does get to the execute call, but it doesn't log the debug line right after the call, nor the error in the catch block, yet it does reach the finally section.
They claim there are no time-out errors in the MySQL logs.
If anyone has any idea on what might cause this problem, any help will be appreciated.
Update:
After some nagging of the server administrator, I've finally got access to the Catalina logs, and in those logs I've finally found an error that has some meaning:
Exception in thread "Thread-16" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2894)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:117)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:407)
at java.lang.StringBuffer.append(StringBuffer.java:241)
at be.playlane.mink.database.SelectExportDataProcedure.bufferField(SelectExportDataProcedure.java:68)
at be.playlane.mink.database.SelectExportDataProcedure.extractData(SelectExportDataProcedure.java:54)
at org.springframework.jdbc.core.JdbcTemplate.processResultSet(JdbcTemplate.java:1033)
at org.springframework.jdbc.core.JdbcTemplate.extractReturnedResultSets(JdbcTemplate.java:947)
at org.springframework.jdbc.core.JdbcTemplate$5.doInCallableStatement(JdbcTemplate.java:918)
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:876)
at org.springframework.jdbc.core.JdbcTemplate.call(JdbcTemplate.java:908)
at org.springframework.jdbc.object.StoredProcedure.execute(StoredProcedure.java:113)
at be.playlane.mink.database.SelectExportDataProcedure.execute(SelectExportDataProcedure.java:29)
at be.playlane.mink.service.impl.DefaultExportService$ExportDataRunnable.run(DefaultExportService.java:82)
at java.lang.Thread.run(Thread.java:636)
Weird, though, that this doesn't show up in the application logs, even though it is wrapped in a try/catch. Based upon the error, the problem lies within these methods:
public Object extractData(ResultSet rs) throws SQLException, DataAccessException
{
StringBuffer buffer = new StringBuffer();
try
{
// get result set meta data
ResultSetMetaData meta = rs.getMetaData();
int count = meta.getColumnCount();
// get the column names; column indices start from 1
for (int i = 1; i < count + 1; ++i)
{
String name = meta.getColumnName(i);
bufferField(name, i == count, buffer);
}
while (rs.next())
{
// get the column values; column indices start from 1
for (int i = 1; i < count + 1; ++i)
{
String value = rs.getString(i);
bufferField(value, i == count, buffer);
}
}
}
catch (Exception e)
{
logger.error("Failed to extractData SelectExportDataProcedue: ", e);
}
return buffer.toString();
}
private void bufferField(String field, boolean last, StringBuffer buffer)
{
try
{
if (field != null)
{
field = field.replace('\r', ' ');
field = field.replace('\n', ' ');
buffer.append(field);
}
if (last)
{
buffer.append('\n');
}
else
{
buffer.append('\t');
}
}
catch (Exception e)
{
logger.error("Failed to bufferField SelectExportDataProcedue: ", e);
}
}
The goal of these functions is to export a certain result set to an Excel file (which happens at a higher level).
So if anyone has tips on optimising this, they are very welcome.
Ok, your stack trace gives you the answer:
Exception in thread "Thread-16" java.lang.OutOfMemoryError: Java heap space
That's why you're not logging anything: the application (the thread, to be specific) is crashing. Judging from your description, it sounds like you have a massive dataset that needs to be paged.
while (rs.next())
{
// get the column values; column indices start from 1
for (int i = 1; i < count + 1; ++i)
{
String value = rs.getString(i);
bufferField(value, i == count, buffer);
}
}
This is (probably) where your thread dies: basically, your StringBuffer runs out of memory. As for correcting it, there are a huge number of options. Throw more memory at the problem on the client side, either by configuring the JVM (here's a link):
How to set the maximum memory usage for JVM?
Or, if you're already doing that, throw more RAM into the device.
From a programming perspective it sounds like this is a hell of a report. You could offload some of the number crunching to MySQL rather than buffering on your end (if possible), or, if this is a giant report, I would consider streaming it to a file and then reading it back via a buffered stream to fill the report.
It totally depends on what the report is. If it is tiny, I would aim at doing more work in SQL to minimize the result set. If it is a giant report then buffering is the other option.
Another possibility you might be missing is that the ResultSet (depending on the implementation) is probably buffered. That means that instead of reading it all into strings, maybe your report can take the ResultSet object directly and print from it. The downside, of course, is that a stray SQLException will kill your report.
Best of luck; I'd try the memory options first. You might be running with something hilariously small like 128 MB, and then the fix will be simple (I've seen this happen a lot on remotely administered machines).
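For illustration, a hedged sketch of the file-streaming variant of extractData: each row is written to a temporary file as it is read instead of being accumulated in a StringBuffer, and the file is then handed to the Excel-generation step. The class name and the use of a temp file are assumptions for the sketch, not the project's actual API:

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

public class StreamingExportExtractor {

    // Writes the result set to a temp file row by row and returns the file.
    public File extractData(ResultSet rs) throws SQLException, IOException {
        File out = File.createTempFile("export", ".tsv"); // assumption: a temp file is acceptable
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(out))) {
            ResultSetMetaData meta = rs.getMetaData();
            int count = meta.getColumnCount();
            // header row
            for (int i = 1; i <= count; i++) {
                bufferField(meta.getColumnName(i), i == count, writer);
            }
            // data rows: nothing is kept in memory beyond the current field
            while (rs.next()) {
                for (int i = 1; i <= count; i++) {
                    bufferField(rs.getString(i), i == count, writer);
                }
            }
        }
        return out;
    }

    private void bufferField(String field, boolean last, BufferedWriter writer) throws IOException {
        if (field != null) {
            writer.write(field.replace('\r', ' ').replace('\n', ' '));
        }
        writer.write(last ? '\n' : '\t');
    }
}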
