Excuse any wrong practices, as I am very new to threading. I have a program that calls my API and gets data back in JSON format; each request returns one row of data. Altogether I need to retrieve about 2,000,000 rows a day, which means 2,000,000 requests (I understand this is bad design, but the system was not built for this purpose; it is just what I need to do for the next couple of weeks). On a single thread I was processing about 200 requests a minute, which is much too slow, so I created 12 threads and got up to about 5,500 rows a minute, a great improvement.

The problem: on average only about 90% of the rows end up in the database (I ran it a few times to make sure). Before each insert I printed each URL that was sent to a file, and I checked that each insert statement was successful (returned 1 when executed), and it all seems fine. Every run inserts about 90%, but the exact number varies and has never been consistent. Am I doing something wrong in my Java code?

Essentially the code starts in main by creating 12 threads. Each thread's run method creates a new instance of MySQLPopulateHistData, passing a start and end integer which are used in the insert statement for ranges. I have done a lot of System.out.println-style testing and can see that all 12 threads start and all 12 instances (one per thread) execute. Does anyone have any idea what it could be?
MAIN:
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MainClass {

    public static void main(String[] args) {
        try {
            // create a pool of threads
            Thread[] threads = new Thread[12];
            // submit jobs to be executed by the pool
            for (int i = 0; i < 12; i++) {
                threads[i] = new Thread(new Runnable() {
                    public void run() {
                        try {
                            new MySQLPopulateHistData(RangeClass.IdStart, RangeClass.IdEnd);
                        } catch (Throwable e) {
                            e.printStackTrace();
                        }
                    }
                });
                threads[i].start();
                Thread.sleep(1000);
                RangeClass.IdStart = RangeClass.IdEnd + 1;
                RangeClass.IdEnd = RangeClass.IdEnd + 170000;
            }
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }
}
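One thing worth checking in the code above: each Runnable reads RangeClass.IdStart and RangeClass.IdEnd at the moment it runs, while main keeps advancing them, and the Thread.sleep(1000) only narrows that race window. A minimal sketch of the same main with the range captured per task instead (a sketch only, not the asker's code; the starting id of 1 is an assumption, and MySQLPopulateHistData is the question's class):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MainClass {

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(12);
        int idStart = 1; // assumed initial value; the original is not shown
        for (int i = 0; i < 12; i++) {
            // Capture this task's range in final locals BEFORE submitting,
            // so the worker no longer reads shared statics that main is
            // still mutating.
            final int start = idStart;
            final int end = idStart + 170000 - 1;
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        new MySQLPopulateHistData(start, end);
                    } catch (Throwable t) {
                        t.printStackTrace();
                    }
                }
            });
            idStart = end + 1;
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.DAYS); // wait for all workers to finish
    }
}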
MyDataSourceFactory.class
import javax.sql.DataSource;
import com.mysql.jdbc.jdbc2.optional.MysqlDataSource;

public class MyDataSourceFactory {

    static String url = "jdbc:mysql://localhost:3306/my_schema";
    static String userName = "root";
    static String password = "password";

    public synchronized static DataSource getMySQLDataSource() {
        MysqlDataSource mysqlDS = new MysqlDataSource();
        mysqlDS.setURL(url);
        mysqlDS.setUser(userName);
        mysqlDS.setPassword(password);
        return mysqlDS;
    }
}
MySQLPopulateHistData.class
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import javax.sql.DataSource;
import org.json.JSONArray;
import org.json.JSONObject;

public class MySQLPopulateHistData {

    public MySQLPopulateHistData(int s, int e) throws IOException, Throwable {
        getHistory(s, e);
    }

    public synchronized void getHistory(int start, int end) {
        DataSource ds = MyDataSourceFactory.getMySQLDataSource();
        Connection con = null;
        Connection con2 = null;
        Statement stmt = null;
        Statement stmt2 = null;
        ResultSet rs = null;
        try {
            con = ds.getConnection();
            con2 = ds.getConnection();
            stmt = con.createStatement();
            stmt2 = con.createStatement();
            rs = stmt.executeQuery("SELECT s FROM sp_t where s_id BETWEEN " + start + " AND " + end + " ORDER BY s;");
            String s = "";
            while (rs.next()) {
                s = rs.getString("s");
                if (s == "") {
                } else {
                    try {
                        URL fullUrl = new URL(/* my URL to my API, with password and the start/end range */);
                        InputStream is = fullUrl.openStream();
                        String jsonStr = getStringFromInputStream(is);
                        JSONObject j = new JSONObject(jsonStr);
                        JSONArray arr = j.getJSONObject("query").getJSONObject("results").getJSONArray("quote");
                        for (int i = 0; i < arr.length(); i++) {
                            JSONObject obj = arr.getJSONObject(i);
                            String symbol = obj.getString("s");
                            stmt2.executeUpdate("INSERT into sp2_t(s) VALUES ('" + s + "') BETWEEN " + start + " AND " + end + ";");
                        }
                    } catch (Exception e) {
                    }
                }
                s = "";
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                if (rs != null) rs.close();
                if (stmt != null) stmt.close();
                if (con != null) con.close();
                if (stmt2 != null) stmt.close();
                if (con2 != null) con.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
UPDATE:
So I put:

if (s.equals("")) {
    System.out.println("EMPTY");
}

and it never printed EMPTY. After the JSON response gets converted to the JSONArray, I added:

if (arr.length() > 0) {
    StaticClassHolder.cntResponses++;
}

This is just a static variable in another class that gets incremented every time there is a valid JSON response. It equalled exactly the amount it was supposed to be. So it seems as if the URL gets all the responses properly and parses them properly, but they are not being INSERTED properly into the database. I can't figure out why.
I also faced a similar issue while inserting records into Oracle. Since I didn't find any concrete solution, I tried with a single thread and everything went fine.
There are several reasons why this does not work:

A normal computer can only run about 4-8 threads truly in parallel per CPU, and since the system itself uses some of those threads, only some of your threads can run at the same time. The computer handles this by pausing some threads and then running others.

If you try to send several queries through the same socket to the MySQL server at the same time, chances are that some of the requests will not work and you lose some of your data.

As for now I do not have any solution for faster updates of the table.
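For what it's worth, the usual way to avoid several threads sharing one socket is to give each worker its own Connection, and batching the inserts also cuts round trips. A hedged sketch, not the asker's code: the sp2_t table and MyDataSourceFactory come from the question above, while the per-worker symbols list is hypothetical:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import javax.sql.DataSource;

public class InsertWorker implements Runnable {

    private final DataSource ds;
    private final List<String> symbols; // this worker's share of the rows (hypothetical)

    public InsertWorker(DataSource ds, List<String> symbols) {
        this.ds = ds;
        this.symbols = symbols;
    }

    public void run() {
        // Each worker opens its own Connection, so no two threads ever
        // share one socket to the MySQL server.
        try (Connection con = ds.getConnection();
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO sp2_t (s) VALUES (?)")) {
            for (String s : symbols) {
                ps.setString(1, s);
                ps.addBatch();
            }
            ps.executeBatch(); // one round trip per batch instead of per row
        } catch (SQLException e) {
            e.printStackTrace(); // at minimum, log failures instead of swallowing them
        }
    }
}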
Related
I have a text file that contains 1,300,000 lines, and I have written Java code for importing it into a MySQL database. In the Java class I have a method called textloadutility() which is called from a JSP page. Can someone give an asynchronous, threaded implementation of this Java program?
package Snomed;

import catalog.Root;
import java.io.*;
import java.sql.PreparedStatement;
import org.json.JSONObject;

public class Textfileimport {

    public String textloadutility() throws Exception {
        Root oRoot = null;
        PreparedStatement oPrStmt = null;
        FileReader in = null;
        BufferedReader br = null;
        final int batchSize = 1000;
        int count = 0;
        JSONObject oJson = new JSONObject();
        oJson.put("status", "failure");
        String str = oJson.toString();
        try {
            oRoot = Root.createDbConnection(null);
            String sql = "INSERT INTO textfiledata (col1,col2,col3,col4,col5,col6,col7,col8,col9) VALUES (?,?,?,?,?,?,?,?,?)";
            oPrStmt = oRoot.con.prepareStatement(sql);
            in = new FileReader("C:/Users/i2cdev001/Desktop/snomedinfo_data.txt");
            br = new BufferedReader(in);
            String strLine;
            while ((strLine = br.readLine()) != null) {
                String[] splitSt = strLine.split("\\t");
                for (int i = 0; i < 9; i++) {
                    oPrStmt.setString(i + 1, splitSt[i]);
                }
                oPrStmt.addBatch();
                if (++count % batchSize == 0) {
                    oPrStmt.executeBatch();
                    oPrStmt.clearBatch();
                }
            }
            oPrStmt.executeBatch();
            oJson.put("status", "success");
            str = oJson.toString();
            br.close();
            in.close();
            System.out.println("successfully imported");
        } catch (Exception e) {
            oJson.put("status", "failure");
            str = oJson.toString();
            e.printStackTrace();
            System.err.println("Error: " + e.getMessage());
        } finally {
            oPrStmt = Root.EcwClosePreparedStatement(oPrStmt);
            oRoot = Root.closeDbConnection(null, oRoot);
        }
        return str;
    }
}
Here is a solution for your problem.

File I/O should not be asynchronous, so a single reader thread should read the file batch by batch and put the lines into a shared queue.

A pool of worker threads should then read the contents of the queue and push them to the database. You can implement this with the ExecutorService class from Java's concurrency package, and coordinate the threads with a CountDownLatch.

Once all the lines have been read from the file, the single reader thread returns to the caller. After all the queue entries have been processed, the database threads shut down, each decrementing the latch, and the job finishes once the latch reaches 0.

You can return a Future to the actual caller, so that the caller gets the response after all the threads have finished.

This is the high-level view; a sketch follows.
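A minimal sketch of that design, under stated assumptions: insertLine is a hypothetical stand-in for the batched JDBC insert, the file path arrives as args[0], main itself plays the reader role, and four workers are an arbitrary choice:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FileToDbPipeline {

    private static final String POISON = "\u0000EOF"; // end-of-input sentinel
    private static final int WORKERS = 4;             // arbitrary worker count

    public static void main(String[] args) throws Exception {
        final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(10000);
        final CountDownLatch done = new CountDownLatch(WORKERS);
        ExecutorService pool = Executors.newFixedThreadPool(WORKERS);

        for (int i = 0; i < WORKERS; i++) {
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        String line;
                        // consume until this worker sees the sentinel
                        while (!(line = queue.take()).equals(POISON)) {
                            insertLine(line); // hypothetical: parse and batch-insert one row
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();
                    }
                }
            });
        }

        // Single reader (here: main) keeps the file I/O sequential.
        BufferedReader br = new BufferedReader(new FileReader(args[0]));
        try {
            String line;
            while ((line = br.readLine()) != null) {
                queue.put(line);
            }
        } finally {
            br.close();
        }
        for (int i = 0; i < WORKERS; i++) {
            queue.put(POISON); // one sentinel per worker
        }
        done.await(); // block until every worker has drained its share
        pool.shutdown();
    }

    private static void insertLine(String line) {
        // placeholder: split on \t, set parameters, addBatch()/executeBatch()
    }
}

A real implementation would also have to decide what happens when a batch insert fails partway through, which this sketch deliberately leaves out.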
I am trying to execute a query using the PostgreSQL JDBC driver for Java. I have an issue with memory build-up: my statement runs in a loop and then sleeps. When I watch the job in Task Manager I can see the memory climbing, about 4 K at a time. I have read the documentation and I have closed all connections, statements and result sets, but this still happens. Please could you tell me what in my code is causing this?
String sSqlString = new String("SELECT * FROM stk.comms_data_sent_recv " +
        "WHERE msgtype ='RECEIVE' AND msgstat ='UNPRC' " +
        "ORDER BY p6_id,msgoccid " +
        "ASC; ");

ResultSet rs = null;
Class.forName("org.postgresql.Driver");
Connection connection = DriverManager.getConnection(
        "jdbc:postgresql://p6tstc01:5432/DEVC_StockList?autoCloseUnclosedStatements=true&logUnclosedConnections=true&preparedStatementCacheQueries=0&ApplicationName=P6Shunter",
        "P6dev", "admin123");

// Main loop
while (true) {
    try {
        Statement statement = connection.createStatement();
        statement.executeQuery(sSqlString);
        //rs.close();
        statement.close();
        //connection.close();
        rs = null;
        //connection = null;
        statement = null;
    } finally {
        //connection.close();
    }
    try {
        Thread.sleep(loopTime);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
Notice the commented-out code: I did close everything, but that did not seem to make a difference. What I did see is that statement.executeQuery(sSqlString) appears to be the cause; I think so because if I remove that statement there is no memory leak. I could be wrong, but please assist me.
UPDATE:
I have changed my code per your recommendations. I hope it's a bit better; please let me know if I need to change anything.

My main loop:
public static void main(String[] args) throws Exception {
    // Main loop
    while (true) {
        getAndProcessAllUnprcMessagesFromStockList();
        try {
            Thread.sleep(loopTime);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
The function it calls to fetch the data:
public static void getAndProcessAllUnprcMessagesFromStockList() throws Exception {
    ResultSet rs = null;
    Connection connection = null;
    String sSqlString = new String("SELECT * FROM stk.comms_data_sent_recv " +
            "WHERE msgtype ='RECEIVE' AND msgstat ='UNPRC' " +
            "ORDER BY p6_id,msgoccid " +
            "ASC; ");
    try {
        Class.forName("org.postgresql.Driver");
        connection = DriverManager.getConnection(
                "jdbc:postgresql://p6tstc01:5432/DEVC_StockList?autoCloseUnclosedStatements=true&logUnclosedConnections=true&preparedStatementCacheQueries=0&ApplicationName=P6Shunter",
                "P6dev", "admin123");
        PreparedStatement s = connection.prepareStatement(sSqlString,
                ResultSet.TYPE_SCROLL_INSENSITIVE,
                ResultSet.CONCUR_READ_ONLY);
        rs = s.executeQuery();
        while (rs.next()) {
            // process records
            UnprcMsg msg = new UnprcMsg();
            msg.setP6Id(rs.getString(1));
            msg.setMsgOccId(rs.getString(2));
            msg.setWsc(rs.getString(3));
            msg.setMsgId(rs.getString(4));
            msg.setMsgType(rs.getString(5));
            msg.setMsgStatus(rs.getString(6));
        }
        rs.close();
        s.close();
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        connection.close();
    }
}
I have closed my connections, statements and result sets. I also downloaded Eclipse Memory Analyzer and ran the jar, which executes my main loop, for about an hour. Here is some of the data I got from Memory Analyzer:

Leak suspects: (screenshot not included)

Now I know I can't go by the memory usage in Task Manager, but what's the difference? Why does Task Manager show the following (screenshot not included)? I was concerned about the memory usage I see in Task Manager; should I be?
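As an aside, the usual shape for this kind of polling loop is try-with-resources, which makes it impossible to leave a Statement or ResultSet open no matter how an iteration exits. A hedged sketch, not a verdict on the Task Manager readings: the query is the question's own, while the 5-second loopTime is made up:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Poller {

    private static final String SQL =
            "SELECT * FROM stk.comms_data_sent_recv " +
            "WHERE msgtype = 'RECEIVE' AND msgstat = 'UNPRC' " +
            "ORDER BY p6_id, msgoccid ASC";

    public static void main(String[] args) throws Exception {
        long loopTime = 5000L; // assumed polling interval
        // One Connection for the lifetime of the poller...
        try (Connection connection = DriverManager.getConnection(
                "jdbc:postgresql://p6tstc01:5432/DEVC_StockList", "P6dev", "admin123")) {
            while (true) {
                // ...but the Statement and ResultSet are scoped to a single
                // iteration and closed even if processing throws.
                try (Statement statement = connection.createStatement();
                     ResultSet rs = statement.executeQuery(SQL)) {
                    while (rs.next()) {
                        // process one row
                    }
                } catch (SQLException e) {
                    e.printStackTrace();
                }
                Thread.sleep(loopTime);
            }
        }
    }
}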
I am doing an evaluation of Hive and its features. There is a use case where I need to iterate through a ResultSet in a separate thread: I can have many ResultSets and I spawn a thread to process each one of them. Below is the code I have written for this use case.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ConcurrentRSIteration2 {

    private static String[] tableNames = {
        "random_data1",
        "random_data2",
        "random_data3",
        "random_data4"
    };

    public static void main(String args[]) throws Exception {
        String driverName = "org.apache.hive.jdbc.HiveDriver";
        Class.forName(driverName);
        Connection con = DriverManager.getConnection(
                "jdbc:hive2://127.0.0.1:10000/default", "hive", "");
        int length = tableNames.length;
        StringBuilder[] sql = new StringBuilder[length];
        PreparedStatement[] stmt = new PreparedStatement[length];
        Thread[] rsIterators = new Thread[length];
        for (int i = 0; i < length; i++) {
            sql[i] = new StringBuilder().
                    append("select * from ").
                    append(tableNames[i]);
            stmt[i] = con.prepareStatement(sql[i].toString());
            RSIterator2 rsIterator = new RSIterator2(stmt[i].executeQuery());
            rsIterators[i] = new Thread(rsIterator);
        }
        for (int i = 0; i < length; i++) {
            rsIterators[i].start();
        }
    }
}
class RSIterator2 implements Runnable {

    private ResultSet rs;

    public RSIterator2(ResultSet rs) {
        this.rs = rs;
    }

    @Override
    public void run() {
        try {
            System.out.println(this.hashCode() + " : " + rs);
            System.out.println(this.hashCode() + " : RS iteration started.");
            int i = 0;
            while (rs.next()) {
                i++;
            }
            System.out.println(this.hashCode() + " : RS iteration done.");
        } catch (SQLException e) {
            e.printStackTrace();
        } finally {
            try {
                rs.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
    }
}
Below is the stacktrace of the exception.
org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:376)
at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:453)
at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:435)
at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_FetchResults(TCLIService.java:501)
at org.apache.hive.service.cli.thrift.TCLIService$Client.FetchResults(TCLIService.java:488)
at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:360)
at hivetrial.RSIterator2.run(ConcurrentRSIteration2.java:60)
at java.lang.Thread.run(Unknown Source)
I am new to Hive and might have overlooked a few things. I am trying to understand this exception.
Your entire approach is founded on a fallacy: you are using a single connection to execute multiple queries. The database server will therefore serialize all the data returned, in the order the queries were executed. Using multiple threads to process a single stream doesn't begin to make sense.

You're also never closing the statements or the connection.

Classes in the java.sql package are not thread-safe.

Separating a ResultSet from its companion Statement is a bad idea. You should query, load the rows into an object or data structure, then close both in a finally block, in separate try/catch blocks.

I would be remiss if I failed to point out connection pools. Why limit yourself to just one connection?
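To make the first two points concrete, here is a hedged sketch of the one-connection-per-thread shape, reusing the question's connection string and table names; the row counting stands in for real processing:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PerThreadIteration {

    public static void main(String[] args) {
        String[] tables = {"random_data1", "random_data2", "random_data3", "random_data4"};
        for (final String table : tables) {
            new Thread(new Runnable() {
                public void run() {
                    // Each thread owns its Connection, Statement and ResultSet
                    // for their whole lifetime, and closes all three on exit.
                    try (Connection con = DriverManager.getConnection(
                                 "jdbc:hive2://127.0.0.1:10000/default", "hive", "");
                         PreparedStatement stmt = con.prepareStatement("select * from " + table);
                         ResultSet rs = stmt.executeQuery()) {
                        int rows = 0;
                        while (rs.next()) {
                            rows++;
                        }
                        System.out.println(table + ": " + rows + " rows");
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }).start();
        }
    }
}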
As I've stated in the title, while querying for user data in my Java application I get the following message: "Operation not allowed after ResultSet closed". I know that this happens if you try to have more than one ResultSet open at the same time.

Here is my current code. The app calls getProject("..."); the other two methods are just there as helpers. I'm using two classes because there is much more code; this is just one example of the exception I get. Please note that I've translated variable names etc. for better understanding; I hope I didn't miss anything.
/* Class which reads project data */
public Project getProject(String name) {
    ResultSet result = null;
    try {
        // executing query for project data
        // SELECT * FROM Project WHERE name=name
        result = statement.executeQuery(generateSelect(tProject.tableName,
                "*", tProject.name, name));
        // if the cursor can't move to the first row,
        // the project was not found
        if (!result.first())
            return null;
        return user.usersInProject(new Project(result.getInt(1), result.getString(2)));
    } catch (SQLException e) {
        e.printStackTrace();
        return null;
    } catch (BadAttributeValueExpException e) {
        e.printStackTrace();
        return null;
    } finally {
        // closing the ResultSet
        try {
            if (result != null)
                result.close();
        } catch (SQLException e) {
        }
    }
}
/* End of class */
/* Class which reads user data */
public Project usersInProject(Project p) {
    ResultSet result = null;
    try {
        // executing query for users in project
        // SELECT ID_User FROM Project_User WHERE ID_Project=p.getID()
        result = statement.executeQuery(generateSelect(
                tProject_User.tableName, tProject_User.id_user,
                tProject_User.id_project, String.valueOf(p.getID())));
        ArrayList<User> alUsers = new ArrayList<User>();
        // looping through all results and adding them to the array
        while (result.next()) { // here Java throws the "ResultSet closed" exception
            int id = result.getInt(1);
            if (id > 0)
                alUsers.add(getUser(id));
        }
        // if no user data was read, the project from the parameter
        // is returned without any new user data
        if (alUsers.size() == 0)
            return p;
        // the array of users is added to the object,
        // then the whole object is returned
        p.addUsers(alUsers.toArray(new User[alUsers.size()]));
        return p;
    } catch (SQLException e) {
        e.printStackTrace();
        return p;
    } finally {
        // closing the ResultSet
        try {
            if (result != null)
                result.close();
        } catch (SQLException e) {
        }
    }
}

public User getUser(int id) {
    ResultSet result = null;
    try {
        // executing query for the user:
        // SELECT * FROM User WHERE ID=id
        result = statement.executeQuery(generateSelect(tUser.tableName,
                "*", tUser.id, String.valueOf(id)));
        if (!result.first())
            return null;
        // new user is constructed (ID, username, email, password)
        User usr = new User(result.getInt(1), result.getString(2),
                result.getString(3), result.getString(4));
        return usr;
    } catch (SQLException e) {
        e.printStackTrace();
        return null;
    } catch (BadAttributeValueExpException e) {
        e.printStackTrace();
        return null;
    } finally {
        // closing the ResultSet
        try {
            if (result != null)
                result.close();
        } catch (SQLException e) {
        }
    }
}
/* End of class */
Statements for both classes are created in the constructors, by calling connection.createStatement() when each class is constructed.

tProject and tProject_User are my enums, which I use for easier name handling. generateSelect is my own method and should work as expected; I use it because I only found out about prepared statements after I had written most of my code, so I left it as it is.

I am using the latest MySQL Connector/J (5.1.21). I don't know what else to try; any advice will be appreciated.
Quoting from @aroth's answer:

There are many situations in which a ResultSet will be automatically closed for you. To quote the official documentation (http://docs.oracle.com/javase/6/docs/api/java/sql/ResultSet.html):

"A ResultSet object is automatically closed when the Statement object that generated it is closed, re-executed, or used to retrieve the next result from a sequence of multiple results."

Here in your code, you are creating a new ResultSet in the method getUser using the same Statement object that created the result set in the usersInProject method, which closes the ResultSet from usersInProject.

Solution: create another Statement object and use it in getUser to create its ResultSet.
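A sketch of that fix against the question's getUser (assuming the class keeps a Connection field named connection, and that generateSelect and the tUser enum behave as described; next() is used instead of first() so a forward-only result set also works):

// getUser gets its own Statement, so iterating usersInProject's
// ResultSet is no longer disturbed: one Statement, one open ResultSet.
public User getUser(int id) {
    try (Statement userStatement = connection.createStatement();
         ResultSet result = userStatement.executeQuery(generateSelect(
                 tUser.tableName, "*", tUser.id, String.valueOf(id)))) {
        if (!result.next()) {
            return null;
        }
        // new user is constructed (ID, username, email, password)
        return new User(result.getInt(1), result.getString(2),
                result.getString(3), result.getString(4));
    } catch (SQLException e) {
        e.printStackTrace();
        return null;
    }
}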
It's not really possible to say definitively what is going wrong without seeing your code. However, note that there are many situations in which a ResultSet will be automatically closed for you. To quote the official documentation:

"A ResultSet object is automatically closed when the Statement object that generated it is closed, re-executed, or used to retrieve the next result from a sequence of multiple results."

Probably you've got one of those things happening, or you're explicitly closing the ResultSet somewhere before you're actually done with it.

Also, have you considered using an ORM framework like Hibernate? In general, something like that is much more pleasant to work with than the low-level JDBC API.
In my application I have implemented a method to get the favourites of a particular user. If the user is new, there will not be an entry in the table, in which case I add default favourites to the table. The code is shown below.
public String getUserFavourits(String username) {
    String s = "SELECT FAVOURITS FROM USERFAVOURITS WHERE USERID='" +
            username.trim() + "'";
    String a = "";
    Statement stm = null;
    ResultSet reset = null;
    DatabaseConnectionHandler handler = null;
    Connection conn = null;
    try {
        handler = DatabaseConnectionHandler.getInstance();
        conn = handler.getConnection();
        stm = conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_UPDATABLE);
        reset = stm.executeQuery(s);
        if (reset.next()) {
            a = reset.getString("FAVOURITS").toString();
        }
        reset.close();
        stm.close();
    } catch (SQLException ex) {
        ex.printStackTrace();
    } catch (Exception ex) {
        ex.printStackTrace();
    } finally {
        try {
            handler.returnConnectionToPool(conn);
            if (stm != null) {
                stm.close();
            }
            if (reset != null) {
                reset.close();
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
    if (a.equalsIgnoreCase("")) {
        a = updateNewUserFav(username);
    }
    return a;
}
You can see that after the finally block the updateNewUserFav(username) method is used to insert default favourites into the table; normally users are forced to change these on their first login.

My problem is that many users have complained about losing their customized favourites and getting the defaults loaded at login. Going through the code, I noticed this can only happen if an exception occurs in the try block; when I debug, the code works fine. Can this be caused when the DB is busy? Normally there are more than 1000 concurrent users in the system, and since it is a real-time application there is a huge number of requests coming to the database (the DB is Oracle).

Can someone please explain?
Firstly, use jonearles' suggestion about bind variables; if a lot of your code is like this, with 1000 concurrent users, I'd hate to think what performance is like. A sketch of what that means here follows.
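(A hedged sketch only; conn, a and username are the question's own variables, and error handling is elided to keep the point visible:)

// Bind variable instead of concatenating username into the SQL:
// Oracle can cache and reuse the parsed statement, and SQL injection
// through the username becomes impossible.
String sql = "SELECT FAVOURITS FROM USERFAVOURITS WHERE USERID = ?";
try (PreparedStatement ps = conn.prepareStatement(sql)) {
    ps.setString(1, username.trim());
    try (ResultSet rs = ps.executeQuery()) {
        if (rs.next()) {
            a = rs.getString("FAVOURITS");
        }
    }
}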
Secondly, if the database is busy then there is a chance of time-outs. As you say, if an exception is encountered, the code falls back to updateNewUserFav. Really, it should only call that if NO exception was raised.
If an exception is raised, the function should fail. The current code is similar to:

"TURN THE IGNITION KEY TO START THE CAR"
"IF THERE IS A PROBLEM, RING GARAGE AND BOOK APPOINTMENT"
"PUT CAR INTO GEAR AND RELEASE HANDBRAKE"

You really only want to release the handbrake once the car has successfully started; otherwise you'll end up rolling down the hill until the sudden stop at the end (often involving an expensive CRUNCH sound).
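In code terms, the difference looks roughly like this. A sketch only, reusing the question's helpers (DatabaseConnectionHandler, updateNewUserFav) and letting the SQLException propagate so a timeout can no longer masquerade as "new user":

public String getUserFavourits(String username) throws SQLException {
    String favourites = null;
    DatabaseConnectionHandler handler = DatabaseConnectionHandler.getInstance();
    Connection conn = handler.getConnection();
    try (PreparedStatement ps = conn.prepareStatement(
            "SELECT FAVOURITS FROM USERFAVOURITS WHERE USERID = ?")) {
        ps.setString(1, username.trim());
        try (ResultSet rs = ps.executeQuery()) {
            if (rs.next()) {
                favourites = rs.getString("FAVOURITS");
            }
        }
    } finally {
        handler.returnConnectionToPool(conn);
    }
    // Reached only when the query completed without error, so an empty
    // result really does mean "new user" and seeding defaults is safe.
    if (favourites == null) {
        favourites = updateNewUserFav(username);
    }
    return favourites;
}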