Improving performance of writing query results to CSV in Java

I have the following code that executes a query and writes the results directly to a string buffer, which is then dumped to a CSV file. I need to write a large number of records (up to a million). For a million records it takes about half an hour to produce a file of around 200 MB, which seems like a lot of time to me; I'm not sure this is the best approach. Please recommend better ways, even if they involve other jars/DB connection utilities.
....
eventNamePrepared = con.prepareStatement(gettingStats +
        filterOptionsRowNum + filterOptions);
ResultSet rs = eventNamePrepared.executeQuery();
int i = 0;
try {
    ......
    FileWriter fstream = new FileWriter(realPath +
            "performanceCollectorDumpAll.csv");
    BufferedWriter out = new BufferedWriter(fstream);
    StringBuffer partialCSV = new StringBuffer();
    while (rs.next()) {
        i++;
        if (current_appl_id_col_display)
            partialCSV.append(rs.getString("current_appl_id") + ",");
        if (event_name_col_display)
            partialCSV.append(rs.getString("event_name") + ",");
        if (generic_method_name_col_display)
            partialCSV.append(rs.getString("generic_method_name") + ",");
        ..... // 23 more columns to be copied the same way to the buffer
        partialCSV.append(" \r\n");
        // Writing to file after 10000 records to prevent partialCSV
        // from getting too big and consuming lots of memory
        if (i % 10000 == 0) {
            out.append(partialCSV);
            partialCSV = new StringBuffer();
        }
    }
    con.close();
    out.append(partialCSV);
    out.close();
Thanks,
Tam

Just write to the BufferedWriter directly instead of constructing the StringBuffer.
Also note that you should likely use StringBuilder instead of StringBuffer... StringBuffer has an internal lock, which is usually not necessary.
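As a minimal sketch of that idea, reusing the column names, flags and writer from the question, the loop could build at most one small row at a time and hand it straight to the BufferedWriter (the writer already buffers the actual I/O):
while (rs.next()) {
    // one short-lived builder per row instead of one giant buffer
    StringBuilder row = new StringBuilder();
    if (current_appl_id_col_display)
        row.append(rs.getString("current_appl_id")).append(',');
    if (event_name_col_display)
        row.append(rs.getString("event_name")).append(',');
    // ... remaining columns ...
    row.append("\r\n");
    out.write(row.toString());
}
out.flush();
out.close();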

Profiling is generally the only sure-fire way to know why something's slow. However, in this example I would suggest two things that are low-hanging fruit:
Write directly to the buffered writer instead of creating your own buffering with the StringBuilder.
Refer to the columns in the result-set by integer ordinal. Some drivers can be slow when resolving column names.
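For instance, you could resolve the ordinals once before the loop (a sketch using the column names, flags and writer from the question; findColumn is standard JDBC):
// resolve column positions once, outside the loop
int applIdCol = rs.findColumn("current_appl_id");
int eventNameCol = rs.findColumn("event_name");
// ... resolve the remaining columns the same way ...
while (rs.next()) {
    if (current_appl_id_col_display)
        out.write(rs.getString(applIdCol) + ",");
    if (event_name_col_display)
        out.write(rs.getString(eventNameCol) + ",");
    // ...
    out.write("\r\n");
}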

You could tweak various things, but for a real improvement I would try using the native tool of whatever database you are using to generate the file. If it is SQL Server, this would be bcp which can take a query string and generate the file directly. If you need to call it from Java you can spawn it as a process.
By way of example, I have just run this...
bcp "select * from trading..bar_db" queryout bar_db.txt -c -t, -Uuser -Ppassword -Sserver
...this generated a 170MB file containing 2 million rows in 10 seconds.
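If you do need to launch it from Java, a sketch using ProcessBuilder might look like the following (the bcp arguments here are just the example command above, not anything specific to your schema, and the surrounding code would need to handle IOException/InterruptedException):
ProcessBuilder pb = new ProcessBuilder(
        "bcp", "select * from trading..bar_db", "queryout", "bar_db.txt",
        "-c", "-t,", "-Uuser", "-Ppassword", "-Sserver");
pb.redirectErrorStream(true);          // merge stderr into stdout
Process p = pb.start();
BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
String line;
while ((line = r.readLine()) != null) {
    System.out.println(line);          // log bcp progress output
}
r.close();
int exitCode = p.waitFor();            // 0 means bcp completed successfully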

I just wanted to add sample code for Jared Oberhaus's suggestion:
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
public class CSVExport {
    public static void main(String[] args) throws Exception {
        String table = "CUSTOMER";
        int batch = 100;

        Class.forName("oracle.jdbc.driver.OracleDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@server:orcl", "user", "pass");

        PreparedStatement pstmt = conn.prepareStatement(
                "SELECT /*+FIRST_ROWS(" + batch + ") */ * FROM " + table);
        ResultSet rs = pstmt.executeQuery();
        rs.setFetchSize(batch);

        ResultSetMetaData rsm = rs.getMetaData();
        File output = new File("result.csv");
        PrintWriter out = new PrintWriter(new BufferedWriter(
                new OutputStreamWriter(
                        new FileOutputStream(output), "UTF-8")), false);

        Set<String> columns = new HashSet<String>(
                Arrays.asList("COL1", "COL3", "COL5")
        );

        while (rs.next()) {
            int k = 0;
            for (int i = 1; i <= rsm.getColumnCount(); i++) {
                if (columns.contains(rsm.getColumnName(i).toUpperCase())) {
                    if (k > 0) {
                        out.print(",");
                    }
                    String s = rs.getString(i);
                    out.print("\"");
                    // escape embedded quotes by doubling them (standard CSV quoting)
                    out.print(s != null ? s.replace("\"", "\"\"") : "");
                    out.print("\"");
                    k++;
                }
            }
            out.println();
        }
        out.flush();
        out.close();
        rs.close();
        pstmt.close();
        conn.close();
    }
}

I have two quick thoughts. The first is, are you sure writing to disk is the problem? Could you actually be spending most of your time waiting on data from the DB?
The second is to try removing all of the + "," string concatenations and using additional .append() calls instead. It may help given how often you are doing those.
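In other words, something like this (chaining appends rather than concatenating first):
// Instead of: partialCSV.append(rs.getString("event_name") + ",");
partialCSV.append(rs.getString("event_name")).append(',');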

You mentioned that you are using Oracle. You may want to investigate using the Oracle External Table feature or Oracle Data Pump depending on exactly what you are trying to do.
See http://www.orafaq.com/node/848 (Unloading data into an external file...)
Another option could be connecting with sqlplus and running spool <filename> prior to the query.

Writing to a buffered writer is normally fast "enough". If it isn't for you, then something else is slowing it down.
The easiest way to profile it is to use jvisualvm available in the latest JDK.

Related

csv created by java code is writing all the data in the same row

This is BeanShell code, so a few things might look odd to a Java developer. The emailFileAttachment function is a SailPoint API (SailPoint is the tool I am using). My problem is that the data I put into my map all ends up on a single line in the CSV/Excel file, and the header ("Application, Num_entitlement") I put in the map is not printed as the first line of the CSV file. Could anyone please help me? This is my code below:
import sailpoint.object.Application;
import sailpoint.object.Identity;
import java.sql.Connection;
import java.sql.PreparedStatement;
import javax.sql.DataSource;
import java.sql.SQLException;
import sailpoint.server.Environment;
import javax.sql.DataSource;
import java.sql.ResultSet;
import sailpoint.api.SailPointContext;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.FileWriter;
import sailpoint.object.EmailTemplate;
import sailpoint.object.EmailOptions;
import java.io.File;
import java.io.FileInputStream;
import sailpoint.object.EmailFileAttachment;
import java.util.HashMap;
import sailpoint.tools.Util;
String query = "SELECT app.name as application, count(*) as num_entitlements FROM spt_application as app, spt_identity_entitlement as ent WHERE app.id = ent.application GROUP BY app.name";
HashMap info = new HashMap();
info.put("Application ", "Num_Entitlement");
PreparedStatement getEntitlement_Num = null;
Connection conn = null;
/*
public static byte[] readFiletoByteArray(File file)
{
    FileInputStream fileInputStream = null;
    byte[] byteFile = new byte[(int) file.length()];
    try
    {
        fileInputStream = new FileInputStream(file);
        fileInputStream.read(byteFile);
        fileInputStream.close();
    }
    catch (Exception e)
    {
        e.printStackTrace();
    }
    return byteFile;
}
*/
try {
    // Prepared Statements
    Environment e = Environment.getEnvironment();
    DataSource src = e.getSpringDataSource();
    //System.out.println("DataSource: " + src.toString());
    conn = src.getConnection();
    //System.out.println("Connection: " + conn);
    getEntitlement_Num = conn.prepareStatement(query);
    ResultSet rs = getEntitlement_Num.executeQuery();
    //System.out.println("starting RS");
    while (rs.next()) {
        String appName = rs.getString("application");
        int no_ent = rs.getInt("num_entitlements");
        info.put(appName, no_ent);
    }
    System.out.println("finished RS");
} catch (SQLException e) {
    log.error(e.toString());
} finally {
    if (getEntitlement_Num != null) {
        getEntitlement_Num.close();
    }
    if (conn != null) {
        conn.close();
    }
}
//I am using sailpoint APIs for the code below.
String emailDest = "//email address here";
EmailTemplate et = new EmailTemplate();
et.setFrom("//email address here");
et.setBody("Please find an attached CSV file that has the list of all applications in IIQ and their number of Entitlements");
et.setTo(emailDest);
et.setSubject("Entitlement count for each application in IIQ");
EmailOptions ops = new EmailOptions(emailDest,null);
String strInfo = Util.mapToString(info);
byte[] fileData = strInfo.getBytes();
EmailFileAttachment attachment = new EmailFileAttachment( "EntitlementCount.csv", EmailFileAttachment.MimeType.MIME_CSV, fileData );
ops.addAttachment(attachment);
context.sendEmailNotification(et, ops);
//System.out.println("email sent");
return "Success";
info is a HashMap, which means there is no guarantee that you can extract data in the same order you put it in. Therefore your header "Application" might not come first in the CSV file. Instead, use something that maintains the order, e.g. an ArrayList of Tuple objects (a class you write yourself that contains two String fields).
How does Util.mapToString(info) work? We need to see it so we can investigate the newline problem.
Util.mapToString() will just convert the map to a string.
Try changing your collection to a list of {app, count} pairs and
iterate over the list to generate the string.
The methods Util.listToCsv() or Util.listToQuotedCsv() will be helpful for preparing the CSV string.
Hope this helps.
You should use a StringBuilder in the same loop as the record iteration and then build the attachment from the StringBuilder.
I think Util.mapToString with the HashMap is the root cause.
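A minimal sketch of that idea, built inside the existing result-set loop (the EmailFileAttachment constructor is the one already used in the question; the header text and "\r\n" line endings are assumptions about the desired output):
StringBuilder csv = new StringBuilder("Application,Num_Entitlement\r\n");   // header first
while (rs.next()) {
    String appName = rs.getString("application");
    int no_ent = rs.getInt("num_entitlements");
    csv.append(appName).append(',').append(no_ent).append("\r\n");          // one line per row
}
byte[] fileData = csv.toString().getBytes();
EmailFileAttachment attachment = new EmailFileAttachment(
        "EntitlementCount.csv", EmailFileAttachment.MimeType.MIME_CSV, fileData);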

Importing a .csv or .xlsx file into mysql DB using Java?

YES, it sounds like a duplicate.
I'm practicing a bit of Java on IntelliJ and tried writing a program to import an .xls Excel file into a MySQL database. A duplicate question, yes, but trawling the internet didn't yield much.
My code below currently does the job of importing any .xls file perfectly. Unfortunately, it doesn't do anything for a .csv file nor an .xlsx file.
When i try with a .csv file, the following error is thrown:
Invalid header signature; read 0x6972702C74786574, expected 0xE11AB1A1E011CFD0 - Your file appears not to be a valid OLE2 document
When a xlsx file is used, the following is instead thrown as an error:
Exception in thread "main" org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:152)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:140)
at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:302)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:85)
at FileExport.main(FileExport.java:21)
My code:
import java.io.FileInputStream;
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.ss.usermodel.*;
public class FileExport {
    public static void main(String[] args) throws Exception {
        try {
            Class forName = Class.forName("com.mysql.jdbc.Driver");
            Connection con = null;
            con = DriverManager.getConnection("jdbc:mysql://localhost:3306/test?useSSL=false", "root", "root");
            con.setAutoCommit(false);
            PreparedStatement pstm = null;
            FileInputStream input = new FileInputStream("/Users/User/Desktop/Email/Test.xls");
            POIFSFileSystem fs = new POIFSFileSystem(input);
            Workbook workbook;
            workbook = WorkbookFactory.create(fs);
            Sheet sheet = workbook.getSheetAt(0);
            Row row;
            for (int i = 1; i <= sheet.getLastRowNum(); i++) {
                row = (Row) sheet.getRow(i);
                String text = row.getCell(0).getStringCellValue();
                int price = (int) row.getCell(1).getNumericCellValue();
                String sql = "INSERT INTO testtable (text, price) VALUES('" + text + "','" + price + "')";
                pstm = (PreparedStatement) con.prepareStatement(sql);
                pstm.setString(1, text);
                pstm.setInt(2, price);
                pstm.execute();
                System.out.println("Import rows " + i);
            }
            con.commit();
            pstm.close();
            con.close();
            input.close();
            System.out.println("Success import excel to mysql table");
        } catch (IOException e) {
        }
    }
}
Any suggestions on how to tweak this code to import .csv or xlsx files are greatly appreciated.
You should use PreparedStatement as it was intended:
String sql = "INSERT INTO testtable (text, price) VALUES(?, ?)";
pstm = con.prepareStatement(sql);
pstm.setString(1, text);
pstm.setInt(2, price);
pstm.execute();
I guess it doesn't work because there is some punctuation in your text.
It is also prone to SQL injection.
Also, you should close your closeable objects with try-with-resources or finally (if you are stuck on a Java version prior to 7), as done here:
https://stackoverflow.com/a/26516076/3323777
EDIT: ah yes, the empty catch block as Alex said. Don't do that, either.
EDIT2:
Apache POI was never designed to work on CSV files.
https://stackoverflow.com/a/1087984/3323777
There is another project for CSV at Apache:
http://commons.apache.org/proper/commons-csv/
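A minimal commons-csv reading sketch (assuming a semicolon-delimited file like the one used in the tweaked answer below, and the question's text/price column layout):
// requires org.apache.commons.csv (CSVFormat, CSVParser, CSVRecord) on the classpath
try (Reader in = new FileReader("/Users/User/Desktop/Email/Test.csv");
     CSVParser parser = CSVFormat.DEFAULT.withDelimiter(';').parse(in)) {
    for (CSVRecord record : parser) {
        String text = record.get(0);
        int price = Integer.parseInt(record.get(1));
        // bind text and price to the PreparedStatement and execute/addBatch here
    }
}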
Do you see the "Import rows" messages which prove that you actually have some rows to import?
Also, the reason for not seeing any errors might be the "catch" block which does nothing. Add some output inside and observe the new results, e.g.
System.out.println(e.getMessage());
Tweaked to read from a .csv file:
public class FileExport {
    public static void main(String[] args) throws Exception {
        try {
            Class.forName("com.mysql.jdbc.Driver");
            Connection con = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/test?useSSL=false", "root", "root");
            con.setAutoCommit(false);
            PreparedStatement pstm = con.prepareStatement(
                    "INSERT INTO testtable (text, price) VALUES(?, ?)");
            FileInputStream input = new FileInputStream("/Users/User/Desktop/Email/Test.csv");
            BufferedReader reader = new BufferedReader(new InputStreamReader(input));
            String row;
            int i = 0;
            while ((row = reader.readLine()) != null) {
                String text = row.split(";")[0];
                int price = Integer.parseInt(row.split(";")[1].trim());
                pstm.setString(1, text);
                pstm.setInt(2, price);
                pstm.execute();
                System.out.println("Import rows " + (++i));
            }
            reader.close();
            input.close();
            con.commit();
            pstm.close();
            con.close();
            System.out.println("Success import csv to mysql table");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
Try using 'poi-ooxml' for creating a Workbook object; its Gradle dependency signature is provided below:
compile group: 'org.apache.poi', name: 'poi-ooxml', version: '3.14'
The code below may help you:
InputStream inputStream = new FileInputStream("/Users/User/Desktop/Email/Test.xlsx");
XSSFWorkbook wb = new XSSFWorkbook(inputStream);
XSSFSheet sheet = wb.getSheetAt(0);
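Alternatively, WorkbookFactory (already used in the question's code) can detect the format for you, so the same code path can handle both .xls and .xlsx; a sketch, assuming poi-ooxml is on the classpath:
// WorkbookFactory inspects the stream header and returns an HSSF or XSSF workbook as needed
InputStream inputStream = new FileInputStream("/Users/User/Desktop/Email/Test.xlsx");
Workbook workbook = WorkbookFactory.create(inputStream);
Sheet sheet = workbook.getSheetAt(0);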

Is Teradata CLOB batch processing useless with JDBC?

I think I know the answer to this question but I also want to confirm it with the experts here. I think the answer is: "Yes, because the batch size limit is 16, which is too little. So practically speaking batch processing is useless with Teradata CLOB."
Here is my reasoning, together with working Java code that copies a table from one database connection to another using streaming:
public class TestClob {
    public void test() throws ClassNotFoundException, SQLException, IOException {
        Connection conn1, conn2;
        conn1 = DriverManager.getConnection(..., user, pass);
        conn2 = DriverManager.getConnection(..., user, pass);

        Statement select = conn1.createStatement();
        ResultSet rs = select.executeQuery("SELECT TOP 100 myClob FROM myTab ");

        int totalRowNumber = 0;
        PreparedStatement ps = null;
        Clob clob = null;
        Reader clobReader = null;
        while (rs.next()) {
            totalRowNumber++;
            System.out.println(totalRowNumber);
            clob = rs.getClob(1);
            clobReader = clob.getCharacterStream();
            ps = conn2.prepareStatement("INSERT INTO myTab2 (myClob2) values (?) ");
            ps.setCharacterStream(1, clobReader, clob.length());
            ps.execute();       // HERE I just execute the current row
            clob.free();        // FREE the CLOB and READER objects
            clobReader.close();
        }
        conn2.commit();
        ps.close();
        select.close();
        rs.close();
    }
}
Based on Teradata rules, I cannot have more than 16 LOB-related objects open simultaneously.
Therefore I have to make sure that Clob clob and Reader clobReader are freed and closed respectively.
So I have two options
1) do the executeBatch() method and have up to 16 Clob clob and Reader clobReader objects at a time.
2) do the execute() method and close Clob clob and Reader clobReader objects right after that.
The conclusion: Teradata CLOB batch insert is useless with JDBC. One cannot set a batch size of more than 16 when trying to INSERT a Clob
Please help me and let me know if I understand this correctly
I don't see any other ways
Below is an example of a batch insert of more than 16 Clobs.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.security.GeneralSecurityException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
public class ClobBatch {

    public static void main(String[] args) throws GeneralSecurityException, IOException, SQLException {

        String databaseCredentials = ExternalData.getCredentials();
        Connection c1 = DriverManager.getConnection(databaseCredentials);
        Connection c2 = DriverManager.getConnection(databaseCredentials);

        String sql = "create volatile table clob_test_input ( id bigint, longobj clob) no primary index on commit preserve rows;";
        Statement s = c1.createStatement();
        s.execute(sql);

        String sql2 = "create volatile table clob_test_target ( id bigint, longobj clob) no primary index on commit preserve rows;";
        Statement s2 = c2.createStatement();
        s2.execute(sql2);

        System.out.println("Inserting test data");
        PreparedStatement ps = c1.prepareStatement("insert into clob_test_input (id, longobj) values (?,?);");
        for (int i = 0; i < 1000; i++) {
            String st = randomLargeString();
            ps.setInt(1, i);
            ps.setCharacterStream(2, new BufferedReader(new StringReader(st)), st.length());
            ps.addBatch();
        }
        ps.executeBatch();

        System.out.println("reading test data from input table");
        Statement select = c1.createStatement();
        ResultSet rs = select.executeQuery("select * from clob_test_input");

        PreparedStatement ps2 = c2.prepareStatement("insert into clob_test_target (id, longobj) values (?,?);");
        List<Reader> readerToClose = new ArrayList<Reader>();
        System.out.println("start batch creation");
        while (rs.next()) {
            int pos = rs.getInt("id");
            Reader rdr = new BufferedReader(rs.getCharacterStream("longobj"));
            StringBuffer buffer = new StringBuffer();
            int c = 0;
            while ((c = rdr.read()) != -1) {
                buffer.append((char) c);
            }
            rdr.close();
            ps2.setInt(1, pos);
            Reader strReader = new StringReader(buffer.toString());
            ps2.setCharacterStream(2, strReader, buffer.length());
            readerToClose.add(strReader);
            ps2.addBatch();
        }
        System.out.println("start batch execution");
        ps2.executeBatch();
        rs.close();
        c1.commit();
        c2.commit();

        for (Reader r : readerToClose) r.close();

        Statement selectTest = c2.createStatement();
        ResultSet rsTest = selectTest.executeQuery("select * from clob_test_target");
        System.out.println("show results");
        int i = 0;
        while (rsTest.next()) {
            BufferedReader is = new BufferedReader(rsTest.getCharacterStream("longobj"));
            StringBuilder sb = new StringBuilder();
            int c = 0;
            while ((c = is.read()) != -1) {
                sb.append((char) c);
            }
            is.close();
            System.out.println("" + rsTest.getInt("id") + ' ' + sb.toString().substring(0, 80));
        }
        rsTest.close();
    }

    private static String randomLargeString() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            sb.append((char) (64 + Math.random() * 20));
        }
        return sb.toString();
    }
}
I've worked on some optimistic hypotheses (e.g. 10000-character Clobs), but the approach could be made less memory intensive by using temporary files instead of StringBuffers.
The approach is basically to find some "buffer" (in memory or in temp files) where you can keep the data from the source database, so that you can close the input Clob reader. Then you can batch insert the data from that buffer, where you don't have the limit of 16 (you still have memory limitations).
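A sketch of the temporary-file variant of that buffering step, replacing the StringBuffer inside the read loop (ps2, rs and pos are the variables from the example above; whether your driver accepts the long-length setCharacterStream overload is worth verifying):
// copy the source Clob to a temp file so the input Reader can be closed immediately
File tmp = File.createTempFile("clob", ".txt");
try (Reader rdr = rs.getCharacterStream("longobj");
     Writer w = new BufferedWriter(new FileWriter(tmp))) {
    char[] buf = new char[8192];
    int n;
    while ((n = rdr.read(buf)) != -1) {
        w.write(buf, 0, n);
    }
}
// later, stream the temp file into the batch insert (JDBC 4.0 long-length overload)
ps2.setInt(1, pos);
ps2.setCharacterStream(2, new BufferedReader(new FileReader(tmp)), tmp.length());
ps2.addBatch();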

Java looping through array - Optimization

I've got some Java code that runs the way I expect, but it takes a noticeable amount of time (a few seconds) even though the job is just looping through an array.
The input file is a FASTA file. The file I'm using is 2.9 MB, and there are other FASTA files that can be up to 20 MB.
In the code I'm trying to loop through it in groups of three, e.g. AGC TTT TCA ... etc. The code has no functional purpose for now, but what I want is to map each amino acid to its equivalent group of bases. Example:
AGC - Ser / CUG Leu / ... etc
So what's wrong with the code? Is there any way to do it better? Any optimization? Looping through the whole String takes some time, maybe just seconds, but I need to find a better way to do it.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
public class fasta {

    public static void main(String[] args) throws IOException {

        File fastaFile;
        FileReader fastaReader;
        BufferedReader fastaBuffer = null;
        StringBuilder fastaString = new StringBuilder();

        try {
            fastaFile = new File("res/NC_017108.fna");
            fastaReader = new FileReader(fastaFile);
            fastaBuffer = new BufferedReader(fastaReader);

            String fastaDescription = fastaBuffer.readLine();

            String line = fastaBuffer.readLine();
            while (line != null) {
                fastaString.append(line);
                line = fastaBuffer.readLine();
            }

            System.out.println(fastaDescription);
            System.out.println();

            String currentFastaAcid;
            for (int i = 0; i < fastaString.length(); i += 3) {
                currentFastaAcid = fastaString.toString().substring(i, i + 3);
                System.out.println(currentFastaAcid);
            }
        } catch (NullPointerException e) {
            System.out.println(e.getMessage());
        } catch (FileNotFoundException e) {
            System.out.println(e.getMessage());
        } catch (IOException e) {
            System.out.println(e.getMessage());
        } finally {
            fastaBuffer.close();
        }
    }
}
currentFastaAcid = fastaString.toString().substring(i, i + 3);
Please replace with
currentFastaAcid = fastaString.substring(i, i + 3);
The toString method of StringBuilder creates a new String instance every time you call it, and that String contains a copy of your entire large sequence. If you call substring directly on the StringBuilder, it returns only the small substring you asked for.
Also remove the System.out.println if you don't really need it.
The big factor here is that you are calling substring on a new String each time.
Instead, use substring directly on the StringBuilder:
for (int i = 0; i < fastaString.length(); i += 3) {
    currentFastaAcid = fastaString.substring(i, i + 3);
    System.out.println(currentFastaAcid);
}
Also, instead of printing currentFastaAcid each time, save it into a list and print the list at the end:
List<String> acids = new LinkedList<String>();
for (int i = 0; i < fastaString.length(); i += 3) {
    currentFastaAcid = fastaString.substring(i, i + 3);
    acids.add(currentFastaAcid);
}
System.out.println(acids.toString());
Your main problem, besides the debug output, is surely that you are creating a new String containing the complete file contents in each iteration of your loop:
currentFastaAcid = fastaString.toString().substring(i, i + 3);
fastaString.toString() gives the same result in each iteration and is therefore redundant. Move it outside the loop and you will surely save some seconds of runtime.
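That is, something like this (a small sketch reusing the question's variables):
String fasta = fastaString.toString();      // copy once, outside the loop
for (int i = 0; i + 3 <= fasta.length(); i += 3) {   // i + 3 <= length also avoids running past the end
    currentFastaAcid = fasta.substring(i, i + 3);
}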
Apart from the suggested optimizations to the serial code, I would go for parallel processing to reduce the time further. If you have a really big file, you can divide the work of reading the file and processing the read lines between separate threads. That way, while one thread is busy reading the next line from the large file, the other thread can process the lines already read and print them to the console.
If you remove the
System.out.println(currentFastaAcid);
line in the for loop, you will gain quite a decent amount of time.

Improve speed of SQL Inserts into XML column from JDBC (SQL Server)

I am currently writing a Java program which loops through a folder of around 4000 XML files.
Using a for loop, it extracts the XML from each file, assigns it to a String 'xmlContent', and uses the PreparedStatement method setString(2,xmlContent) to insert the String into a table stored in my SQL Server.
The column '2' is a column called 'Data' of type XML.
The process works, but it is slow. It inserts about 50 rows into the table every 7 seconds.
Does anyone have any ideas as to how I could speed up this process?
Code:
{ ...declaration, connection etc etc
    PreparedStatement ps = con.prepareStatement("INSERT INTO Table(ID,Data) VALUES(?,?)");
    for (File current : folder.listFiles()) {
        if (current.isFile()) {
            xmlContent = fileRead(current.getAbsoluteFile());
            ps.setString(1, current.getAbsolutePath());
            ps.setString(2, xmlContent);
            ps.addBatch();
            if (++count % batchSize == 0) {
                ps.executeBatch();
            }
        }
    }
    ps.executeBatch(); // performs insertion of leftover rows
    ps.close();
}

private static String fileRead(File file) throws IOException {
    StringBuilder xmlContent = new StringBuilder();
    FileReader fr = new FileReader(file);
    BufferedReader br = new BufferedReader(fr);
    String strLine = "";
    br.readLine(); // skips the encoding line; don't need it and it causes problems
    while ((strLine = br.readLine()) != null) {
        xmlContent.append(strLine);
    }
    fr.close();
    return xmlContent.toString();
}
Just from a little reading and a quick test - it looks like you can get a decent speedup by turning off autoCommit on your connection. All of the batch query tutorials I see recommend it as well. Such as http://www.tutorialspoint.com/jdbc/jdbc-batch-processing.htm
Turn it off - and then drop an explicit commit where you want (at the end of each batch, at the end of the whole function, etc).
conn.setAutoCommit(false);
PreparedStatement ps = // ... rest of your code

// inside your for loop
if (++count % batchSize == 0) {
    try {
        ps.executeBatch();
        conn.commit();
    } catch (SQLException e) {
        // .. whatever you want to do
        conn.rollback();
    }
}
Best make the read and the write parallel.
Use one thread to read the files and store them in a buffer.
Use another thread to read from the buffer and execute queries against the database.
You can use more than one thread to write to the database in parallel; that should give you even better performance.
I would suggest you follow this MemoryStreamMultiplexer approach, where you read the XML files in one thread, store them in a buffer, and then use one or more threads to read from the buffer and execute against the database.
http://www.codeproject.com/Articles/345105/Memory-Stream-Multiplexer-write-and-read-from-many
It is a C# implementation, but you get the idea.
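In Java, the same producer/consumer idea can be sketched with a BlockingQueue: one thread reads the XML files while another drains the queue and runs the batches. The XmlRow holder class is made up for the sketch; folder, fileRead, ps, con and batchSize are the names from the question, and the timed poll() is a simplified shutdown condition.
// needs java.util.concurrent.{BlockingQueue, ArrayBlockingQueue, TimeUnit}
class XmlRow {
    final String id;
    final String xml;
    XmlRow(String id, String xml) { this.id = id; this.xml = xml; }
}

BlockingQueue<XmlRow> queue = new ArrayBlockingQueue<>(100);

// producer: read files and enqueue their contents
Thread reader = new Thread(() -> {
    try {
        for (File current : folder.listFiles()) {
            if (current.isFile()) {
                queue.put(new XmlRow(current.getAbsolutePath(), fileRead(current)));
            }
        }
    } catch (IOException | InterruptedException e) {
        e.printStackTrace();
    }
});
reader.start();

// consumer (current thread): drain the queue and batch the inserts
int count = 0;
XmlRow row;
while ((row = queue.poll(5, TimeUnit.SECONDS)) != null) {   // stops after 5 idle seconds
    ps.setString(1, row.id);
    ps.setString(2, row.xml);
    ps.addBatch();
    if (++count % batchSize == 0) {
        ps.executeBatch();
        con.commit();
    }
}
ps.executeBatch();
con.commit();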

Categories

Resources