I think I know the answer to this question, but I also want to confirm it with the experts here. I think the answer is: "Yes, because the batch size limit is 16, which is too small. So practically speaking, batch processing is useless with Teradata CLOBs."
Here is my reasoning, along with working Java code that copies a table from one database connection to another using streaming:
public class TestClob {
public void test() throws ClassNotFoundException, SQLException, IOException {
Connection conn1, conn2;
conn1 = DriverManager.getConnection(..., user, pass);
conn2 = DriverManager.getConnection(..., user, pass);
Statement select = conn1.createStatement();
ResultSet rs = select.executeQuery("SELECT TOP 100 myClob FROM myTab " );
int totalRowNumber = 0;
PreparedStatement ps = null;
Clob clob = null;
Reader clobReader = null;
while (rs.next()) {
totalRowNumber++;
System.out.println(totalRowNumber);
clob = rs.getClob(1);
clobReader = clob.getCharacterStream();
ps = conn2.prepareStatement("INSERT INTO myTab2 (myClob2) values (?) ");
ps.setCharacterStream(1, clobReader , clob.length() );
ps.execute(); // HERE I just execute the current row
clob.free(); // FREE the CLOB and READER objects
clobReader.close();
}
conn2.commit();
ps.close();
rs.close();
select.close();
}
}
Based on Teradata's rules, I cannot have more than 16 LOB-related objects open simultaneously.
Therefore I have to make sure that the Clob clob and Reader clobReader objects are freed and closed, respectively.
So I have two options:
1) use the executeBatch() method and keep up to 16 Clob and Reader objects open at a time (see the sketch after this list), or
2) use the execute() method and free/close the Clob and Reader objects right after each call.
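For clarity, here is a rough sketch of what I mean by option 1 (untested; it assumes the same myTab/myTab2 tables and the rs/conn2 objects from the code above, and that 16 really is the limit):

// Option 1 sketch: flush the batch every 16 rows, then free the LOBs that the batch used.
PreparedStatement psBatch = conn2.prepareStatement("INSERT INTO myTab2 (myClob2) values (?)");
List<Clob> openClobs = new ArrayList<Clob>();
List<Reader> openReaders = new ArrayList<Reader>();
while (rs.next()) {
    Clob c = rs.getClob(1);
    Reader r = c.getCharacterStream();
    psBatch.setCharacterStream(1, r, c.length());
    psBatch.addBatch();
    openClobs.add(c);
    openReaders.add(r);
    if (openClobs.size() == 16) {              // the supposed limit of simultaneously open LOB objects
        psBatch.executeBatch();
        for (Clob c2 : openClobs) c2.free();   // free only after the batch has been sent
        for (Reader r2 : openReaders) r2.close();
        openClobs.clear();
        openReaders.clear();
    }
}
psBatch.executeBatch();                        // flush the remaining (< 16) rows
for (Clob c2 : openClobs) c2.free();
for (Reader r2 : openReaders) r2.close();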
My conclusion: Teradata CLOB batch insert is useless with JDBC, because one cannot use a batch size of more than 16 when inserting Clobs.
Please let me know whether I understand this correctly.
I don't see any other way.
Attached below is an example of a batch insert of more than 16 Clobs.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.security.GeneralSecurityException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
public class ClobBatch {
public static void main(String[] args) throws GeneralSecurityException, IOException, SQLException {
String databaseCredentials = ExternalData.getCredentials();
Connection c1=DriverManager.getConnection(databaseCredentials);
Connection c2=DriverManager.getConnection(databaseCredentials);
String sql="create volatile table clob_test_input ( id bigint, longobj clob) no primary index on commit preserve rows;";
Statement s=c1.createStatement();
s.execute(sql);
String sql2="create volatile table clob_test_target ( id bigint, longobj clob) no primary index on commit preserve rows;";
Statement s2=c2.createStatement();
s2.execute(sql2);
System.out.println("Inserting test data");
PreparedStatement ps=c1.prepareStatement("insert into clob_test_input (id, longobj) values (?,?);");
for(int i=0; i<1000; i++) {
String st=randomLargeString();
ps.setInt(1, i);
ps.setCharacterStream(2, new BufferedReader(new StringReader(st)), st.length());
ps.addBatch();
}
ps.executeBatch();
System.out.println("reading test data from input table");
Statement select=c1.createStatement();
ResultSet rs=select.executeQuery("select * from clob_test_input");
PreparedStatement ps2=c2.prepareStatement("insert into clob_test_target (id, longobj) values (?,?);");
List<Reader> readerToClose=new ArrayList<Reader>();
System.out.println("start batch creation");
while(rs.next()) {
int pos=rs.getInt("id");
Reader rdr=new BufferedReader(rs.getCharacterStream("longobj"));
StringBuffer buffer=new StringBuffer();
int c=0;
while((c=rdr.read())!=-1) {
buffer.append((char)c);
}
rdr.close();
ps2.setInt(1, pos);
Reader strReader= new StringReader(buffer.toString());
ps2.setCharacterStream(2, strReader,buffer.length());
readerToClose.add(strReader);
ps2.addBatch();
}
System.out.println("start batch execution");
ps2.executeBatch();
rs.close();
c1.commit();
c2.commit();
for(Reader r:readerToClose) r.close();
Statement selectTest=c2.createStatement();
ResultSet rsTest=selectTest.executeQuery("select * from clob_test_target");
System.out.println("show results");
int i=0;
while(rsTest.next()) {
BufferedReader is=new BufferedReader(rsTest.getCharacterStream("longobj"));
StringBuilder sb=new StringBuilder();
int c=0;
while((c=is.read())!=-1) {
sb.append((char)c);
}
is.close();
System.out.println(""+rsTest.getInt("id")+' '+sb.toString().substring(0,80));
}
rsTest.close();
}
private static String randomLargeString() {
StringBuilder sb=new StringBuilder();
for(int i=0;i<10000; i++) {
sb.append((char) (64+Math.random()*20));
}
return sb.toString();
}
}
I've worked under some optimistic assumptions (e.g. 10,000-character Clobs), but the approach could be made less memory-intensive by using temporary files instead of StringBuffers.
The approach is basically: find some "buffer" (be it in memory or in temp files) in which to keep the data from the source database, so that you can close the input Clob Reader. Then you can batch insert the data from the buffer, where you don't have the limitation of 16 (you still have memory limitations).
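For example, the temp-file variant might look roughly like the sketch below (untested; it reuses rs and ps2 from the example above, and the extra java.io imports are omitted):

// Sketch: spool each source CLOB to a temp file and close the source reader immediately,
// so no source LOB stays open across rows; the batch then streams from the temp files.
List<File> tempFiles = new ArrayList<File>();
List<Reader> readersToClose = new ArrayList<Reader>();
while (rs.next()) {
    int pos = rs.getInt("id");
    File tmp = File.createTempFile("clob_", ".tmp");
    tmp.deleteOnExit();
    Reader in = rs.getCharacterStream("longobj");
    Writer out = new OutputStreamWriter(new FileOutputStream(tmp), "UTF-8");
    char[] buf = new char[8192];
    long charCount = 0;
    int n;
    while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
        charCount += n;
    }
    out.close();
    in.close();                                    // the source LOB reader is closed right away
    Reader fileReader = new InputStreamReader(new FileInputStream(tmp), "UTF-8");
    ps2.setInt(1, pos);
    ps2.setCharacterStream(2, fileReader, (int) charCount);
    ps2.addBatch();
    readersToClose.add(fileReader);
    tempFiles.add(tmp);
}
ps2.executeBatch();
for (Reader r : readersToClose) r.close();
for (File f : tempFiles) f.delete();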
Related
I have a text file that contains 1,300,000 lines. I have written Java code for importing it into a MySQL database. In the Java class I have a method called textloadutility() which is called from a JSP page. Can someone give an asynchronous, threaded implementation of this program?
package Snomed;
import catalog.Root;
import java.io.*;
import java.sql.PreparedStatement;
import org.json.JSONObject;
public class Textfileimport {
public String textloadutility() throws Exception {
Root oRoot = null;
PreparedStatement oPrStmt = null;
FileReader in = null;
BufferedReader br=null;
final int batchSize = 1000;
int count = 0;
JSONObject oJson = null;
String str=null;
oJson = new JSONObject();
oJson.put("status","failure");
str=oJson.toString();
try {
oRoot = Root.createDbConnection(null);
String sql = "INSERT INTO textfiledata (col1,col2,col3,col4,col5,col6,col7,col8,col9) VALUES( ?, ?, ?,?,?,?,?,?,?)";
oPrStmt = oRoot.con.prepareStatement(sql);
in = new FileReader("C:/Users/i2cdev001/Desktop/snomedinfo_data.txt");
br = new BufferedReader(in);
String strLine;
while ((strLine = br.readLine()) != null){
String [] splitSt =strLine.split("\\t");
String dat1="",dat2="",dat3="",dat4="",dat5="",dat6="",dat7="",dat8="",dat9="";
dat1=splitSt[0];
dat2=splitSt[1];
dat3=splitSt[2];
dat4=splitSt[3];
dat5=splitSt[4];
dat6=splitSt[5];
dat7=splitSt[6];
dat8=splitSt[7];
dat9=splitSt[8];
oPrStmt.setString(1, dat1);
oPrStmt.setString(2, dat2);
oPrStmt.setString(3, dat3);
oPrStmt.setString(4, dat4);
oPrStmt.setString(5, dat5);
oPrStmt.setString(6, dat6);
oPrStmt.setString(7, dat7);
oPrStmt.setString(8, dat8);
oPrStmt.setString(9, dat9);
oPrStmt.addBatch();
if (++count % batchSize == 0) {
oPrStmt.executeBatch();
oPrStmt.clearBatch();
}
}
oPrStmt.executeBatch();
oJson.put("status","sucess");
str=oJson.toString();
in.close();
br.close();
System.out.println("sucessfully imported");
}
catch (Exception e) {
oJson.put("status","failure");
str=oJson.toString();
e.printStackTrace();
System.err.println("Error: " + e.getMessage());
} finally {
oPrStmt = Root.EcwClosePreparedStatement(oPrStmt);
oRoot = Root.closeDbConnection(null, oRoot);
}
return str;
}
}
Here is a solution outline for your problem.
File I/O should not be done asynchronously, so a single thread (Thread-1) should read the file batch by batch and put the batches into a shared queue.
A pool of worker threads should then read the contents of the queue and push them into the database. You can implement this with the ExecutorService class from the java.util.concurrent package and coordinate the threads with a CountDownLatch.
Once all the lines have been read from the file, the single reader thread returns to the caller.
After all the queue entries have been processed, each database thread finishes and decrements the countdown latch, which completes once it reaches 0.
You should return a Future to the actual caller, so that the caller gets the response only after all the threads have finished.
This is the high-level view.
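A sketch of that structure is shown below. It is untested and simplified: insertBatch() stands in for your existing JDBC addBatch/executeBatch logic, the batch size, worker count and file name are arbitrary, and exception handling is kept to a minimum (assume it runs in a method declared throws Exception).

// One reader thread fills a queue with line batches; worker threads drain it into the DB.
final BlockingQueue<List<String>> queue = new LinkedBlockingQueue<List<String>>(10);
final List<String> POISON = new ArrayList<String>();           // marker telling a worker to stop
final int workers = 4;
final CountDownLatch done = new CountDownLatch(workers);
ExecutorService pool = Executors.newFixedThreadPool(workers);

for (int w = 0; w < workers; w++) {
    pool.submit(new Runnable() {
        public void run() {
            try {
                List<String> chunk;
                while ((chunk = queue.take()) != POISON) {
                    insertBatch(chunk);                        // your JDBC batch insert logic
                }
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                done.countDown();
            }
        }
    });
}

// The calling thread acts as the single reader and fills the queue batch by batch.
BufferedReader br = new BufferedReader(new FileReader("snomedinfo_data.txt"));
List<String> batch = new ArrayList<String>();
String line;
while ((line = br.readLine()) != null) {
    batch.add(line);
    if (batch.size() == 1000) {
        queue.put(batch);
        batch = new ArrayList<String>();
    }
}
if (!batch.isEmpty()) queue.put(batch);
for (int w = 0; w < workers; w++) queue.put(POISON);           // one stop marker per worker
br.close();

done.await();                                                  // wait until every worker has finished
pool.shutdown();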
This is BeanShell code, so a few things might look odd to a Java developer. The emailFileAttachment function is a SailPoint API (SailPoint is a tool I am using). My problem is that the data I put in my map ends up on a single line in the Excel/CSV file, and the header ("Application, Num_entitlement") that I put in the map is not printed as the first line of the CSV file. Could anyone please help me? My code is below:
import sailpoint.object.Application;
import sailpoint.object.Identity;
import java.sql.Connection;
import java.sql.PreparedStatement;
import javax.sql.DataSource;
import java.sql.SQLException;
import sailpoint.server.Environment;
import javax.sql.DataSource;
import java.sql.ResultSet;
import sailpoint.api.SailPointContext;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.FileWriter;
import sailpoint.object.EmailTemplate;
import sailpoint.object.EmailOptions;
import java.io.File;
import java.io.FileInputStream;
import sailpoint.object.EmailFileAttachment;
import java.util.HashMap;
import sailpoint.tools.Util;
String query = "SELECT app.name as application, count(*) as num_entitlements FROM spt_application as app, spt_identity_entitlement as ent WHERE app.id = ent.application GROUP BY app.name";
HashMap info = new HashMap();
info.put("Application ", "Num_Entitlement");
PreparedStatement getEntitlement_Num = null;
Connection conn = null;
/*
public static byte[] readFiletoByteArray(File file)
{
FileInputStream fileInputStream = null;
byte[] byteFile = new byte[(int) file.length()];
try
{
fileInputStream = new FileInputStream(file);
fileInputStream.read(byteFile);
fileInputStream.close();
}
catch (Exception e)
{
e.printStackTrace();
}
return byteFile;
}
*/
try{
// Prepared Statements
Environment e = Environment.getEnvironment();
DataSource src = e.getSpringDataSource();
//System.out.println("DataSource: " + src.toString());
conn = src.getConnection();
//System.out.println("Connection: " + conn);
getEntitlement_Num = conn.prepareStatement(query);
ResultSet rs = getEntitlement_Num.executeQuery();
//System.out.println("starting RS");
while(rs.next()) {
String appName = rs.getString("application");
int no_ent = rs.getInt("num_entitlements");
info.put(appName , no_ent);
}
System.out.println("finished RS");
}catch(SQLException e){
log.error( e.toString());
} finally {
if (getEntitlement_Num!= null) {
getEntitlement_Num.close();
}
if(conn != null) {
conn.close();
}
}
//I am using sailpoint APIs for the code below.
String emailDest = "//email address here";
EmailTemplate et = new EmailTemplate();
et.setFrom("//email address here");
et.setBody("Please find an attached CSV file that has the list of all applications in IIQ and their number of Entitlements");
et.setTo(emailDest);
et.setSubject("Entitlement count for each application in IIQ");
EmailOptions ops = new EmailOptions(emailDest,null);
String strInfo = Util.mapToString(info);
byte[] fileData = strInfo.getBytes();
EmailFileAttachment attachment = new EmailFileAttachment( "EntitlementCount.csv", EmailFileAttachment.MimeType.MIME_CSV, fileData );
ops.addAttachment(attachment);
context.sendEmailNotification(et, ops);
//System.out.println("email sent");
return "Success";
info is a HashMap, which means there is no guarantee that you can extract the data in the same order you put it in. Therefore your header "Application" might not come first in the CSV file. Instead, use something that maintains insertion order, e.g. an ArrayList of Tuple objects (a small class you write yourself that contains two String fields).
How does Util.mapToString(info) work? We need to see it so we can investigate the newline problem.
Util.mapToString() will just convert the map to a string.
Try changing your collection to a list of {app, count} pairs and iterate over the list to generate the string.
The methods Util.listToCsv() or Util.listToQuotedCsv() may be helpful for preparing the CSV string.
Hope this helps.
You should use a StringBuilder in the same loop as the record iteration and then build the attachment from the StringBuilder.
I think Util.mapToString with the HashMap is the root cause.
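A minimal sketch of the StringBuilder approach, assuming the same result set and header names as in the question (the CRLF line endings are just a common choice for CSV):

// Write the header exactly once, first, then one CSV line per row in result-set order.
StringBuilder csv = new StringBuilder();
csv.append("Application,Num_Entitlement\r\n");
while (rs.next()) {
    String appName = rs.getString("application");
    int no_ent = rs.getInt("num_entitlements");
    csv.append(appName).append(',').append(no_ent).append("\r\n");
}
byte[] fileData = csv.toString().getBytes();
EmailFileAttachment attachment = new EmailFileAttachment(
    "EntitlementCount.csv", EmailFileAttachment.MimeType.MIME_CSV, fileData);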
YES, it sounds like a duplicate.
I'm practicing a bit of Java in IntelliJ and tried writing a program to import an .xls Excel file into a MySQL database. A duplicate question, yes, but trawling the internet didn't yield much.
My code below currently does the job of importing any .xls file perfectly. Unfortunately, it doesn't do anything for a .csv file or an .xlsx file.
When I try with a .csv file, the following error is thrown:
Invalid header signature; read 0x6972702C74786574, expected 0xE11AB1A1E011CFD0 - Your file appears not to be a valid OLE2 document
When a xlsx file is used, the following is instead thrown as an error:
Exception in thread "main" org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:152)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:140)
at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:302)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:85)
at FileExport.main(FileExport.java:21)
My code:
import java.io.FileInputStream;
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.ss.usermodel.*;
public class FileExport {
public static void main(String[] args) throws Exception {
try {
Class forName = Class.forName("com.mysql.jdbc.Driver");
Connection con = null;
con = DriverManager.getConnection("jdbc:mysql://localhost:3306/test?useSSL=false", "root", "root");
con.setAutoCommit(false);
PreparedStatement pstm = null;
FileInputStream input = new FileInputStream("/Users/User/Desktop/Email/Test.xls");
POIFSFileSystem fs = new POIFSFileSystem(input);
Workbook workbook;
workbook = WorkbookFactory.create(fs);
Sheet sheet = workbook.getSheetAt(0);
Row row;
for (int i = 1; i <= sheet.getLastRowNum(); i++) {
row = (Row) sheet.getRow(i);
String text = row.getCell(0).getStringCellValue();
int price = (int) row.getCell(1).getNumericCellValue();
String sql = "INSERT INTO testtable (text, price) VALUES('" + text + "','" + price + "')";
pstm = (PreparedStatement) con.prepareStatement(sql);
pstm.setString(1, text);
pstm.setInt(2, price);
pstm.execute();
System.out.println("Import rows " + i);
}
con.commit();
pstm.close();
con.close();
input.close();
System.out.println("Success import excel to mysql table");
} catch (IOException e) {
}
}
}
Any suggestions on how to tweak this code to import .csv or .xlsx files are greatly appreciated.
You should use PreparedStatement as it was intended to be used:
String sql = "INSERT INTO testtable (text, price) VALUES(?, ?)";
pstm = (PreparedStatement) con.prepareStatement(sql);
pstm.setString(1, text);
pstm.setInteger(2, text)
pstm.execute();
I guess it doesn't work because there is some punctuation in your text.
It is also prone to SQL injection.
Also, you should close your closeable objects with try-with-resources, or in a finally block if you are stuck on a Java version prior to 7, as is done here:
https://stackoverflow.com/a/26516076/3323777
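For example, the insert part could be wrapped roughly like this (only a sketch: it reuses the sheet variable, table and URL from the question, and assumes the MySQL driver is already loaded):

// try-with-resources closes the connection and statement even if an exception is thrown.
String sql = "INSERT INTO testtable (text, price) VALUES(?, ?)";
try (Connection con = DriverManager.getConnection(
         "jdbc:mysql://localhost:3306/test?useSSL=false", "root", "root");
     PreparedStatement pstm = con.prepareStatement(sql)) {
    con.setAutoCommit(false);
    for (Row row : sheet) {                        // Sheet is Iterable<Row>; skip the header row if needed
        pstm.setString(1, row.getCell(0).getStringCellValue());
        pstm.setInt(2, (int) row.getCell(1).getNumericCellValue());
        pstm.execute();
    }
    con.commit();
} catch (SQLException e) {
    e.printStackTrace();                           // at least log it; never leave a catch block empty
}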
EDIT: ah yes, and the empty catch block, as Alex said. Don't do that either.
EDIT2:
Apache POI was never designed to read CSV files.
https://stackoverflow.com/a/1087984/3323777
There is another project for CSV at Apache:
http://commons.apache.org/proper/commons-csv/
Do you see the "Import rows" messages which prove that you actually have some rows to import?
Also, the reason for not seeing any errors might be the "catch" block, which does nothing. Add some output inside it and observe the new results, e.g.:
System.out.println(e.getMessage());
Tweaked to read from .csv:
// Imports as in the original FileExport, plus java.io.BufferedReader and java.io.InputStreamReader.
public class FileExport {
    public static void main(String[] args) throws Exception {
        try {
            Class.forName("com.mysql.jdbc.Driver");
            Connection con = DriverManager.getConnection("jdbc:mysql://localhost:3306/test?useSSL=false", "root", "root");
            con.setAutoCommit(false);
            PreparedStatement pstm = con.prepareStatement("INSERT INTO testtable (text, price) VALUES (?, ?)");
            FileInputStream input = new FileInputStream("/Users/User/Desktop/Email/Test.csv");
            BufferedReader reader = new BufferedReader(new InputStreamReader(input));
            String row;
            int i = 0;
            while ((row = reader.readLine()) != null) {
                String[] cols = row.split(";");
                pstm.setString(1, cols[0]);
                pstm.setInt(2, Integer.parseInt(cols[1]));
                pstm.execute();
                i++;
                System.out.println("Import rows " + i);
            }
            reader.close();
            con.commit();
            pstm.close();
            con.close();
            System.out.println("Success import csv to mysql table");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Try using 'poi-ooxml' to create the Workbook object; its Gradle dependency is:
compile group: 'org.apache.poi', name: 'poi-ooxml', version: '3.14'
The code below may help you:
InputStream inputStream = new FileInputStream("/Users/User/Desktop/Email/Test.xlsx");  // XSSF reads the .xlsx format
XSSFWorkbook wb = new XSSFWorkbook(inputStream);
XSSFSheet sheet = wb.getSheetAt(0);
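Alternatively, if you want a single code path that handles both .xls and .xlsx, WorkbookFactory can detect the format for you (a short sketch; the file path is only an example):

// WorkbookFactory inspects the file and returns an HSSFWorkbook (.xls) or an XSSFWorkbook (.xlsx).
Workbook workbook = WorkbookFactory.create(new File("/Users/User/Desktop/Email/Test.xlsx"));
Sheet sheet = workbook.getSheetAt(0);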
The following code attempts to retrieve a CLOB from an Oracle database using JDBC.
Code:
import java.io.File;
import java.io.FileWriter;
import java.io.Reader;
import java.sql.Clob;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Properties;
public class Retrieving_Clob {
public static void main(String[] args) throws Exception{
DriverManager.registerDriver(new oracle.jdbc.driver.OracleDriver());
Properties p = new Properties();
p.put("user", "system");
p.put("password", "password");
Connection con = DriverManager.getConnection("jdbc:oracle:thin:#localhost:1521:xe",p);
PreparedStatement pstmt = con.prepareStatement("select * from myclob");
ResultSet rs = pstmt.executeQuery();
rs.next();
Reader r = rs.getCharacterStream(1);
int ch;
File file = new File("H:/newFile.txt");
FileWriter fw = new FileWriter(file,true);
while((ch= r.read())!= -1)
fw.write((char)ch);
fw.close();
con.close();
}
}
I am trying to retrieve the CLOB from the result set at index 1. The code gives me the following error:
Exception in thread "main" java.lang.NullPointerException
at jdbc.Retrieving_Clob.main(Retrieving_Clob.java:41)
The erroneous line is:
while((ch= r.read())!= -1)
What is the reason for the error, and how do I solve the problem?
Note: a blank file with the provided name does get created at the given location.
The error indicates that r is NULL. You're getting r from:
Reader r = rs.getCharacterStream(1);
From http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getCharacterStream%28int%29: (getCharacterStream) returns:
a java.io.Reader object that contains the column value; if the value is SQL NULL, the value returned is null in the Java programming language.
So it appears that the column value is NULL.
The reason for the problem is obviously the fact that r is null. The solution is to add a null check on this variable:
if (r != null) ... while ...
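In context, the guard might look roughly like this (a sketch based on the question's code; how you handle the null case is up to you):

Reader r = rs.getCharacterStream(1);
if (r != null) {                                   // the column value was not SQL NULL
    FileWriter fw = new FileWriter(new File("H:/newFile.txt"), true);
    int ch;
    while ((ch = r.read()) != -1) {
        fw.write((char) ch);
    }
    fw.close();
} else {
    System.out.println("CLOB column is NULL for this row");
}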
I have the following code that executes a query and writes the results to a string buffer, which is then dumped to a CSV file. I need to write a large number of records (up to a million). For a million records it currently takes about half an hour to produce a file of around 200 MB, which seems like a lot of time to me; I am not sure this is the best approach. Please recommend better ways, even if they involve other jars/DB connection utilities.
....
eventNamePrepared = con.prepareStatement(gettingStats +
filterOptionsRowNum + filterOptions);
ResultSet rs = eventNamePrepared.executeQuery();
int i=0;
try{
......
FileWriter fstream = new FileWriter(realPath +
"performanceCollectorDumpAll.csv");
BufferedWriter out = new BufferedWriter(fstream);
StringBuffer partialCSV = new StringBuffer();
while (rs.next()) {
i++;
if (current_appl_id_col_display)
partialCSV.append(rs.getString("current_appl_id") + ",");
if (event_name_col_display)
partialCSV.append(rs.getString("event_name") + ",");
if (generic_method_name_col_display)
partialCSV.append(rs.getString("generic_method_name") + ",");
..... // 23 more columns to be copied same way to buffer
partialCSV.append(" \r\n");
// Writing to file after 10000 records to prevent partialCSV
// from going too big and consuming lots of memory
if (i % 10000 == 0){
out.append(partialCSV);
partialCSV = new StringBuffer();
}
}
con.close();
out.append(partialCSV);
out.close();
Thanks,
Tam
Just write to the BufferedWriter directly instead of constructing the StringBuffer.
Also note that you should likely use StringBuilder instead of StringBuffer... StringBuffer has an internal lock, which is usually not necessary.
Profiling is generally the only sure-fire way to know why something's slow. However, in this example I would suggest two things that are low-hanging fruit:
Write directly to the buffered writer instead of creating your own buffering with the StringBuilder.
Refer to the columns in the result-set by integer ordinal. Some drivers can be slow when resolving column names.
You could tweak various things, but for a real improvement I would try using the native tool of whatever database you are using to generate the file. If it is SQL Server, this would be bcp, which can take a query string and generate the file directly. If you need to call it from Java, you can spawn it as a process.
As way of an example, I have just run this...
bcp "select * from trading..bar_db" queryout bar_db.txt -c -t, -Uuser -Ppassword -Sserver
...this generated a 170MB file containing 2 million rows in 10 seconds.
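Calling it from Java could look roughly like the sketch below (it assumes bcp is on the PATH and reuses the command-line arguments from the example above; IOException/InterruptedException handling is elided):

// Run bcp as an external process, echo its output, and wait for it to finish.
ProcessBuilder pb = new ProcessBuilder(
    "bcp", "select * from trading..bar_db", "queryout", "bar_db.txt",
    "-c", "-t,", "-Uuser", "-Ppassword", "-Sserver");
pb.redirectErrorStream(true);                      // merge stderr into stdout
Process p = pb.start();
BufferedReader out = new BufferedReader(new InputStreamReader(p.getInputStream()));
String line;
while ((line = out.readLine()) != null) {
    System.out.println(line);                      // bcp progress output
}
int exitCode = p.waitFor();
System.out.println("bcp finished with exit code " + exitCode);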
I just wanted to add sample code for Jared Oberhaus's suggestion:
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
public class CSVExport {
public static void main(String[] args) throws Exception {
String table = "CUSTOMER";
int batch = 100;
Class.forName("oracle.jdbc.driver.OracleDriver");
Connection conn = DriverManager.getConnection(
"jdbc:oracle:thin:#server:orcl", "user", "pass");
PreparedStatement pstmt = conn.prepareStatement(
"SELECT /*+FIRST_ROWS(" + batch + ") */ * FROM " + table);
ResultSet rs = pstmt.executeQuery();
rs.setFetchSize(batch);
ResultSetMetaData rsm = rs.getMetaData();
File output = new File("result.csv");
PrintWriter out = new PrintWriter(new BufferedWriter(
new OutputStreamWriter(
new FileOutputStream(output), "UTF-8")), false);
Set<String> columns = new HashSet<String>(
Arrays.asList("COL1", "COL3", "COL5")
);
while (rs.next()) {
int k = 0;
for (int i = 1; i <= rsm.getColumnCount(); i++) {
if (columns.contains(rsm.getColumnName(i).toUpperCase())) {
if (k > 0) {
out.print(",");
}
String s = rs.getString(i);
out.print("\"");
out.print(s != null ? s.replace("\"", "\"\"") : "");  // escape embedded quotes by doubling them (CSV convention)
out.print("\"");
k++;
}
}
out.println();
}
out.flush();
out.close();
rs.close();
pstmt.close();
conn.close();
}
}
I have two quick thoughts. The first is: are you sure writing to disk is the problem? Could you actually be spending most of your time waiting on data from the DB?
The second is to try removing all the + "," concatenations and chaining more .append() calls instead; it may help, considering how often you do those.
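For example, instead of concatenating the comma onto the value first, each piece can go straight into the buffer (using one of the columns from the question):

// No intermediate String is created for value + ","; both go directly into the buffer.
partialCSV.append(rs.getString("current_appl_id")).append(',');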
You mentioned that you are using Oracle. You may want to investigate using the Oracle external table feature or Oracle Data Pump, depending on exactly what you are trying to do.
See http://www.orafaq.com/node/848 (Unloading data into an external file...)
Another option could be connecting with SQL*Plus and running "spool <filename>" before the query.
Writing to a buffered writer is normally fast "enough". If it isn't for you, then something else is slowing it down.
The easiest way to profile it is to use jvisualvm available in the latest JDK.