Exporting large data into CSV from SQL Server using Java

I have 9 million records in SQL Server. I am trying to export them into CSV files so that I can load the data into MongoDB. I have written Java code for the SQL-to-CSV export, but I have two issues:
If I read all the data into a list and then try to write it to the CSV, I get an OutOfMemoryError.
If I read line by line and write every line to the CSV as I go, it takes a very long time to export the data.
My code is something like:
List list = new ArrayList();
try {
    Class.forName(driver).newInstance();
    conn = DriverManager.getConnection(url, databaseUserName, databasePassword);
    stmt = conn.prepareStatement("select OptimisationId from SubReports");
    result = null;
    result = stmt.executeQuery();
    // stmt.executeQuery("select * from Subscription_OptimisationReports");
    result.setFetchSize(1000);
    while (result.next()) {
        //System.out.println("Inside while");
        SubReportsBean bean = new SubReportsBean();
        bean.setOptimisationId(result.getLong(("OptimisationId")));
        list.add(bean);
        generateExcel(list);
    }
    //generateExcel(list);
    conn.close();
}
Is there a faster approach to export all of the data quickly? Or, even better, can it be exported directly to MongoDB instead of CSV?

Maybe you should paginate your data by only reading a little at a time, using SQL Server's OFFSET/FETCH clause (note that it requires an ORDER BY):
select OptimisationId from SubReports order by OptimisationId OFFSET 0 ROWS FETCH NEXT 1000 ROWS ONLY;
select OptimisationId from SubReports order by OptimisationId OFFSET 1000 ROWS FETCH NEXT 1000 ROWS ONLY;
select OptimisationId from SubReports order by OptimisationId OFFSET 2000 ROWS FETCH NEXT 1000 ROWS ONLY;
...
Just keep a counter of the offset.
If you use this solution then you'd need to modify your code to append to the end of the CSV file -- don't keep all your results in memory, otherwise you'll still run into the OutOfMemoryError.
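A minimal sketch of that loop in Java, assuming the SubReports table and connection details from the question, plus a hypothetical appendToCsv helper that appends one value to the output file:
// Sketch: export in pages of 1000 rows using SQL Server's OFFSET/FETCH.
final int pageSize = 1000;
String sql = "select OptimisationId from SubReports "
           + "order by OptimisationId offset ? rows fetch next ? rows only";
try (Connection conn = DriverManager.getConnection(url, databaseUserName, databasePassword);
     PreparedStatement stmt = conn.prepareStatement(sql)) {
    int offset = 0;
    while (true) {
        stmt.setInt(1, offset);
        stmt.setInt(2, pageSize);
        int rowsInPage = 0;
        try (ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                appendToCsv(rs.getLong("OptimisationId")); // append to the file, never rebuild it
                rowsInPage++;
            }
        }
        if (rowsInPage < pageSize) {
            break; // last page reached
        }
        offset += pageSize;
    }
}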

Definitely, when dealing with so many records, collecting all the data in a list before dumping it to CSV is bound to fail.
So your solution 2 is the way to go.
Your code seems to correspond to this solution, but I think you've just forgotten to move your list declaration, or to empty your list inside the loop. You could do:
try {
    Class.forName(driver).newInstance();
    conn = DriverManager.getConnection(url, databaseUserName, databasePassword);
    stmt = conn.prepareStatement("select OptimisationId from SubReports");
    result = null;
    result = stmt.executeQuery();
    // stmt.executeQuery("select * from Subscription_OptimisationReports");
    result.setFetchSize(1000);
    while (result.next()) {
        //System.out.println("Inside while");
        SubReportsBean bean = new SubReportsBean();
        bean.setOptimisationId(result.getLong(("OptimisationId")));
        List list = new ArrayList();
        list.add(bean);
        generateExcel(list);
    }
    //generateExcel(list);
    conn.close();
}
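If calling generateExcel once per row turns out to be too slow (the second issue in the question), a variation is to open the CSV writer once and stream each row straight into it. A minimal sketch, assuming the same connection details and SubReports table as in the question; the subreports.csv file name is just a placeholder:
try (Connection conn = DriverManager.getConnection(url, databaseUserName, databasePassword);
     PreparedStatement stmt = conn.prepareStatement("select OptimisationId from SubReports");
     BufferedWriter csv = new BufferedWriter(new FileWriter("subreports.csv"))) {
    stmt.setFetchSize(1000); // hint to fetch rows in batches rather than all at once
    try (ResultSet rs = stmt.executeQuery()) {
        csv.write("OptimisationId");
        csv.newLine();
        while (rs.next()) {
            // one CSV line per row; nothing is accumulated in memory
            csv.write(Long.toString(rs.getLong("OptimisationId")));
            csv.newLine();
        }
    }
}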

Related

How to avoid out of memory heap space error in java when trying to retrieve a million rows from a mysql table? [duplicate]

I get an out of memory heap space error at oRsSelect = oPrStmt.executeQuery();
This is the function that retrieves a million records from a MySQL table called 'snomedinfo_data':
public String getSnomedCodes()
{
    Root oRoot = null;
    JSONObject oJsonSno = null;
    JSONObject oJson = null;
    JSONArray oJsonArr = null;
    ResultSet oRsSelect = null;
    PreparedStatement oPrStmt = null;
    String strSql = null;
    String snomedcode = null;
    String snomeddesc = null;
    String str = null;
    int cStart = 0;
    try {
        oJsonSno = new JSONObject();
        oJson = new JSONObject();
        oJsonArr = new JSONArray();
        oRoot = Root.createDbConnection(null);
        //retrieving data from table
        strSql = "SELECT * FROM snomedinfo_data ;";
        oPrStmt = oRoot.con.prepareStatement(strSql);
        oRsSelect = oPrStmt.executeQuery();
        while (oRsSelect.next()) {
            snomedcode = Root.TrimString(oRsSelect.getString("conceptid"));
            snomeddesc = Root.TrimString(oRsSelect.getString("term"));
            oJsonSno = new JSONObject();
            oJsonSno.put("snomedcode", snomedcode);
            oJsonSno.put("snomeddesc", snomeddesc);
            oJsonArr.put(oJsonSno);
        }
        oJson.put("status", "success");
        oJson.put("snomeddata", oJsonArr);
        str = oJson.toString();
    }
    catch (Exception e) {
        e.printStackTrace();
    }
    finally {
        oRsSelect = Root.EcwCloseResultSet(oRsSelect);
        oPrStmt = Root.EcwClosePreparedStatement(oPrStmt);
        oRoot = Root.closeDbConnection(null, oRoot);
    }
    return str;
}
I have tried using -
strSql = "SELECT * FROM snomedinfo_data ;";
oRoot.con.setAutoCommit(false);
oPrStmt = oRoot.con.prepareStatement(strSql);
oPrStmt = oRoot.con.prepareStatement(strSql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
oPrStmt.setFetchSize(Integer.MIN_VALUE);
oRsSelect = oPrStmt.executeQuery();
But it still does not work and continues to give the out of memory heap space error. Please help!
As far as I remember, MySQL does not support result set buffering.
This means MySQL sends you ALL rows back at once and that's it. Theoretically fetchSize is defined to deal with these cases, but it's only a suggestion to the JDBC driver and the RDBMS.
According to the related answer by Gord Thompson, apparently MySQL can adhere to the fetchSize if configured appropriately. This requires setting the parameter useCursorFetch=true in the JDBC connection URL. You'll need to test if your version of MySQL and your version of the JDBC driver abide by it.
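A minimal sketch of that configuration, assuming MySQL Connector/J; the URL and credentials are placeholders:
// useCursorFetch=true lets Connector/J honour setFetchSize instead of loading the whole result
String url = "jdbc:mysql://localhost:3306/mydb?useCursorFetch=true";
try (Connection con = DriverManager.getConnection(url, user, password);
     PreparedStatement ps = con.prepareStatement(
             "SELECT conceptid, term FROM snomedinfo_data",
             ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
    ps.setFetchSize(1000); // rows per round trip
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            // build and stream each JSON object here instead of collecting them all in memory
        }
    }
}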
Other solutions? Well... I haven't tested these ones but at least I can describe them, so you can see if they are worth the effort:
Use a cursor. Retrieve a cursor instead of a simple SQL select. Then you can retrieve and process rows in smaller chunks.
Use pagination. Some queries naturally support pagination; they need to be ordered by a unique set of columns (sketched below). This way you execute the query multiple times and only get, say, 10000 rows at a time. Big downside: if the data is being constantly modified, this can give you an inconsistent result set. The original single SELECT query is fully transactional, while this solution IS NOT. If this is for a nightly process when the table is not modified, then this downside may not be relevant.
The big downside of both solutions is that they require more work: more application effort, more SQL effort, more debugging effort. If it's too much, maybe you should consider upgrading to PostgreSQL, DB2, or Oracle, which do implement buffering properly.
Alternatively MariaDB may support it (and is very similar to MySQL) but I wouldn't bet on it.
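As an illustration of the pagination option (a sketch, not something I have tested), here is a keyset-style loop against the snomedinfo_data table, assuming conceptid is unique and indexed; the batch size of 10000 is arbitrary:
// Each query resumes after the last key seen, so only one batch is ever in memory.
String sql = "SELECT conceptid, term FROM snomedinfo_data "
           + "WHERE conceptid > ? ORDER BY conceptid LIMIT 10000";
String lastKey = ""; // assumes conceptid values compare consistently as strings
try (PreparedStatement ps = oRoot.con.prepareStatement(sql)) {
    while (true) {
        ps.setString(1, lastKey);
        int rows = 0;
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                lastKey = rs.getString("conceptid"); // remember where to resume
                // stream this row out (e.g. append its JSON to a file) instead of keeping it
                rows++;
            }
        }
        if (rows == 0) {
            break; // no more rows
        }
    }
}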
You could simply increase the memory heap size by passing in parameters to the JVM:
-Xmx2G
The "2G" stands for 2GB of memory. You can use ‘G’ for GB, ‘M’ for MB and ‘K’ for KB. Try it out and use the amount you need and have available. You can do some reading on it here
If it is still using a huge amount of memory, one thing you could think about is enabling streaming results for MySQL queries, so the ResultSet is not fully loaded into memory at once - at a cost in performance, of course:
oPrStmt = oRoot.con.prepareStatement(strSql);
oPrStmt.setFetchSize(Integer.MIN_VALUE); // tells the MySQL driver to stream rows instead of buffering the whole result
oRsSelect = oPrStmt.executeQuery();

How to use multi threading to fetch data in mysql

Hi, I am trying to fetch 50K+ rows from one of the tables in a MySQL DB. It is taking more than 20 minutes to retrieve all the data and write it to a text file. Can I use multithreading to reduce this fetching time and make the code more efficient? Any help will be appreciated.
I have used a normal JDBC connection and ResultSetMetaData to fetch rows from the table.
String row = "";
stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("select * from employee_details");
ResultSetMetaData rsmd = rs.getMetaData();
int columnCount = rsmd.getColumnCount();
while (rs.next()) {
for (int i = 1; i < columnCount; i++) {
row = row + rs.getObject(i) + "|";// check
}
row = row + "\r\n";
}
And I am writing the fetched values to a text file as below.
BufferedWriter writer = new BufferedWriter(new FileWriter(
"C:/Users/430398/Desktop/file/abcd.txt"));
writer.write(row);
writer.close();
Remember that rs.next() will fetch results from the DB in batches of n rows, where n is a number defined by the JDBC implementation; I assume it is 10 right now. So after every n rows it will query the DB again, hence there will be network overhead - even if it's on the very same machine.
Just increasing that number will result in a faster loading time.
edit:
adding this
stmt.setFetchSize(50000);
might be it.
Be aware that this results in heavy memory consumption.
First you need to identify where the bottleneck is. Is it the SQL query? Or the fetching of the rows via the ResultSet? Or the building of the huge string? Or perhaps writing the file?
You need to measure the duration of each of the parts mentioned above and tell us the results. Without that knowledge it is not possible to say how to speed up the algorithm.
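One rough way to get those numbers is to time each phase separately. The sketch below also swaps the repeated string concatenation for a StringBuilder and iterates up to i <= columnCount so the last column isn't dropped; stmt and the file path are taken from the question:
long t0 = System.nanoTime();
ResultSet rs = stmt.executeQuery("select * from employee_details");
long t1 = System.nanoTime(); // query execution time

StringBuilder sb = new StringBuilder(); // avoids the quadratic cost of row = row + ...
int columnCount = rs.getMetaData().getColumnCount();
while (rs.next()) {
    for (int i = 1; i <= columnCount; i++) {
        sb.append(rs.getObject(i)).append('|');
    }
    sb.append("\r\n");
}
long t2 = System.nanoTime(); // fetch + string-building time

BufferedWriter writer = new BufferedWriter(new FileWriter("C:/Users/430398/Desktop/file/abcd.txt"));
writer.write(sb.toString());
writer.close();
long t3 = System.nanoTime(); // file-writing time

System.out.printf("query=%dms fetch=%dms write=%dms%n",
        (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, (t3 - t2) / 1_000_000);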

How to copy table from one database to another?

I need to take a table from one database and upload it to a different database.
So, I create two separate connections. Here is my code:
Connection connection1 = // set up connection to dbms1
Statement statement = connection1.createStatement();
ResultSet result = statement.executeQuery("select * from ............. ");
Connection connection2 = // set up connection to dbms2
// Now I want to upload the ResultSet result into the second database
Statement statement2 = connection2.createStatement("insert into table2 " + result);
statement2.executeUpdate();
The last lines above do not work.
How can I do this? The bottom line is how to reuse a ready ResultSet.
ResultSet is a ready Java object. I hope there is a way to add it to a batch or something like that and executeUpdate, rather than writing the result set to some temporary space (List, CSV, etc.) and then inserting.
The simplest way to do this is with a prepared statement for the insert. It lets you create a single statement object that can be used to run the query multiple times with different parameter values.
try (final Statement statement1 = connection1.createStatement();
     final PreparedStatement insertStatement =
         connection2.prepareStatement("insert into table2 values(?, ?)"))
{
    try (final ResultSet resultSet =
             statement1.executeQuery("select foo, bar from table1"))
    {
        while (resultSet.next())
        {
            // Get the values from the table1 record
            final String foo = resultSet.getString("foo");
            final int bar = resultSet.getInt("bar");

            // Insert a row with these values into table2
            insertStatement.clearParameters();
            insertStatement.setString(1, foo);
            insertStatement.setInt(2, bar);
            insertStatement.executeUpdate();
        }
    }
}
The rows are inserted into table2 as you iterate through the results from table1, so there's no need to store the whole result set.
You can also use the prepared statement's addBatch() and executeBatch() methods to queue up all the inserts and send them to the database all at once, instead of sending a separate message to the database for each individual inserted row. But that forces JDBC to hold all the pending inserts in memory locally, which it seems you're trying to avoid. So the one-row-at-a-time inserts are your best bet in this case.
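A middle ground, if the row-at-a-time round trips prove too slow, is to flush the batch every few thousand rows so only a bounded number of pending inserts is held at once. A sketch reusing the names from the code above; the batch size of 1000 is arbitrary:
final int batchSize = 1000;
int pending = 0;
while (resultSet.next()) {
    insertStatement.setString(1, resultSet.getString("foo"));
    insertStatement.setInt(2, resultSet.getInt("bar"));
    insertStatement.addBatch();
    if (++pending == batchSize) {
        insertStatement.executeBatch(); // send the accumulated inserts in one round trip
        pending = 0;
    }
}
if (pending > 0) {
    insertStatement.executeBatch(); // flush the final partial batch
}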
If you don't want to manually list out all the field names for every table in the database, you should be able to do this two-step process instead:
Copy the table schema using the answer in this question.
Use resultSet.getMetaData() to get the list of fields, and use that to drive a modified version of the SELECT/INSERT code in @Wyzard's answer.
I will post code here if I get it working.
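A rough sketch of that metadata-driven second step, assuming table2 already exists with the same columns, in the same order, as table1:
try (Statement select = connection1.createStatement();
     ResultSet rs = select.executeQuery("select * from table1")) {
    ResultSetMetaData meta = rs.getMetaData();
    int cols = meta.getColumnCount();

    // Build "insert into table2 values (?, ?, ..., ?)" with one placeholder per column
    StringBuilder sql = new StringBuilder("insert into table2 values (");
    for (int i = 1; i <= cols; i++) {
        sql.append(i == 1 ? "?" : ", ?");
    }
    sql.append(")");

    try (PreparedStatement insert = connection2.prepareStatement(sql.toString())) {
        while (rs.next()) {
            for (int i = 1; i <= cols; i++) {
                insert.setObject(i, rs.getObject(i)); // copy each column value as-is
            }
            insert.executeUpdate();
        }
    }
}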

SQL - how to return multiple rows with one SQL query?

I have a managed bean which makes SQL queries to an Oracle database. This is just a very simple example of how I make SQL queries. This is the table structure:
GLOBALSETTINGS
---------------------------------
SessionTTL VARCHAR2(40 BYTE)
MAXACTIVEUSERS NUMBER
ACTIVEUSERS VARCHAR2(20 BYTE)
I use this table just to store application settings. In the example listed below I can fetch just one string with one SQL statement. I want a single SQL query that fetches the content of the three rows - SessionTTL, MAXACTIVEUSERS, ACTIVEUSERS. Is that possible?
public String CheckUserDB(String userToCheck) throws SQLException {
    String storedPassword = null;
    String SQL_Statement = null;

    if (ds == null) throw new SQLException();
    Connection conn = ds.getConnection();
    if (conn == null) throw new SQLException();

    try {
        conn.setAutoCommit(false);
        boolean committed = false;
        try {
            SQL_Statement = "SELECT Passwd from USERS WHERE Username = ?";
            PreparedStatement passwordQuery = conn.prepareStatement(SQL_Statement);
            passwordQuery.setString(1, userToCheck);

            ResultSet result = passwordQuery.executeQuery();
            if (result.next()) {
                storedPassword = result.getString("Passwd");
            }

            conn.commit();
            committed = true;
        } finally {
            if (!committed) conn.rollback();
        }
    }
    finally {
        conn.close();
    }
    return storedPassword;
}
P.S I want the content of the rows.
I'm hoping I understand what you are asking for, but I fear I don't as it seems too simple, but anyway...
I think you want the contents of 3 columns, not rows. And yes you can, you just specify the columns you want returned in your SQL statement:
SELECT SessionTTL, MAXACTIVEUSERS, ACTIVEUSERS FROM GLOBALSETTINGS WHERE (condition)...
you can also use * as a shortcut for all columns if you don't want to explicitly specify them:
SELECT * FROM GLOBALSETTINGS WHERE (condition)...
Some background reading on SQL syntax might be useful
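In the Java code from the question, that translates into a single query and three getters on the same ResultSet row. A minimal sketch, assuming the table holds one settings row and reusing the ds DataSource from CheckUserDB:
String sql = "SELECT SessionTTL, MAXACTIVEUSERS, ACTIVEUSERS FROM GLOBALSETTINGS";
try (Connection conn = ds.getConnection();
     PreparedStatement ps = conn.prepareStatement(sql);
     ResultSet rs = ps.executeQuery()) {
    if (rs.next()) {
        String sessionTtl = rs.getString("SessionTTL");
        int maxActiveUsers = rs.getInt("MAXACTIVEUSERS");
        String activeUsers = rs.getString("ACTIVEUSERS");
        // use the three settings values here
    }
}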
If I read this correctly (sorry if mistaken), all you want to do is change your SQL command to select ALL COLUMNS in your database table.
To do so:
String sqlAll = "SELECT GLOBALSETTINGS.SessionTTL, GLOBALSETTINGS.MAXACTIVEUSERS, GLOBALSETTINGS.ACTIVEUSERS FROM GLOBALSETTINGS";
This will retrieve ALL columns in the table. You can also add conditional statements to your queries when you want to filter the results for logical reasons, such as TOP 20 to get only the first 20 rows of the result set.
If you want to return multiple rows with one SQL query, you may want to collect them in an ArrayList: you need a loop that goes through the records and gathers all the results until the end of the record list.

(outofmemoryerror: java heap space) when iterating through oracle records

Hello fellow Java developers.
I'm having a bit of an issue here. I have code that gets a ResultSet from an Oracle database, prints each row to a file, then gets the next row - and continues till the end of the ResultSet.
Only this isn't what happens. What happens is that it gets the ResultSet, starts iterating through the rows, printing to file as it goes, until it runs out of memory - claiming it needs more space on the Java heap.
The app is currently running with 2 GB of memory on the heap and the code breaks at about the 150000th row.
I'm using ojdbc6.jar and Java 6.
Here is an idea of what my code is doing:
Connection conn = DriverManager.getConnection(url, "name", "pwd");
conn.setAutoCommit(false);
Statement stmt = conn.createStatement();
ResultSet rset = stmt.executeQuery(strSql);

String strVar_1 = null;
long lCount = 0;

while (rset.next()) {
    lCount++;
    if (lCount % 100000 == 0) {
        System.out.println(lCount + " rows completed");
    }

    strVar_1 = rset.getString("StringID"); /// breaks here!!!!!!!!!

    if (strVar_1 == null) {
        strVar_1 = "";
    }
    if (!strQuery_1.equals("")) {
        out.write(strVar_1 + "\n");
    }
}
out.close();
Try below:
Statement stmt = conn.createStatement();
stmt.setFetchSize(someInt);
ResultSet rset = stmt.executeQuery(strSql);
This will control how many records are fetched at a time.
Well, maybe another way of dealing with such large data is to keep fetching one row at a time and writing it to the file as you go. This way the string buffer does not keep growing, and you should be able to write all the records to the file system and then read them later.
