JDBC: optimize MySQL requests across multiple threads - Java

I'm building a web crawler and I'm looking for the best way to handle the requests and connections between my threads and the database (MySQL).
I have 2 types of threads:
Fetchers: They crawl websites. They produce URLs and insert them into 2 tables: table_url and table_file. They select from table_url to continue the crawl, and they update table_url to set visited=1 once they have read a URL, or visited=-1 while they are reading it. They can also delete rows.
Downloaders: They download files. They select from table_file and update table_file to change the Downloaded column. They never insert anything.
Right now I'm working with this:
I have a connection pool based on c3p0.
Every target (website) has these variables:
private Connection connection_downloader;
private Connection connection_fetcher;
I create both connections only once, when I instantiate a website. Then every thread uses those connections depending on its target.
Every thread has these variables:
private Statement statement;
private ResultSet resultSet;
Before every query I open a SQL Statement:
public static Statement openSqlStatement(Connection connection){
    try {
        return connection.createStatement();
    } catch (SQLException e) {
        e.printStackTrace();
    }
    return null;
}
And after every Query I close sql statement and resultSet with :
public static void closeSqlStatement(ResultSet resultSet, Statement statement){
if (resultSet != null) try { resultSet.close(); } catch (SQLException e) {e.printStackTrace();}
if (statement != null) try { statement.close(); } catch (SQLException e) {e.printStackTrace();}
}
Right now my select queries only fetch a single value (I never have to select more than one for now, but this will change soon) and are defined like this:
public static String sqlSelect(String Query, Connection connection, Statement statement, ResultSet resultSet){
    String result = null;
    try {
        resultSet = statement.executeQuery(Query);
        if (resultSet.next()) {
            // read the value of the first column; ResultSet.toString() would only print the object reference
            result = resultSet.getString(1);
        }
    } catch (SQLException e) {
        e.printStackTrace();
    }
    closeSqlStatement(resultSet, statement);
    return result;
}
Insert, delete and update queries use this function:
public static int sqlExec(String Query, Connection connection, Statement statement){
    int rowCount = -1;
    try {
        rowCount = statement.executeUpdate(Query);
    } catch (SQLException e) {
        e.printStackTrace();
    }
    closeSqlStatement(null, statement); // no ResultSet to close for an update
    return rowCount;
}
My question is simple: can this be made faster? I'm also concerned about mutual exclusion: how do I prevent one thread from updating a link while another is already working on it?

I believe your design is flawed. Having one connection assigned full-time to one website will severely limit your overall throughput.
As you have already set up a connection pool, it's perfectly okay to fetch a connection right before you use it (and return it afterwards).
Likewise, try-with-resources for closing all your ResultSets and Statements will make the code more readable, and using PreparedStatement instead of Statement would not hurt either.
One example (using a static dataSource() call to access your pool):
public static String sqlSelect(String id) throws SQLException {
    try (Connection con = dataSource().getConnection();
         PreparedStatement ps = con.prepareStatement("SELECT row FROM table WHERE key = ?")) {
        ps.setString(1, id);
        try (ResultSet rs = ps.executeQuery()) {
            if (rs.next()) {
                return rs.getString(1);
            } else {
                throw new SQLException("Nothing found");
            }
        }
    } catch (SQLException e) {
        e.printStackTrace();
        throw e;
    }
}
Following the same pattern, I suggest you create methods for all the different inserts/updates/selects your application uses, each holding the connection only for the short time inside the DB logic. One such method is sketched below.
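For instance, an update following the same pattern might look like this (a sketch: the column names on table_url and the static dataSource() helper are assumptions based on the question):
public static int markVisited(String url) throws SQLException {
    // borrow a pooled connection only for the duration of this statement
    try (Connection con = dataSource().getConnection();
         PreparedStatement ps = con.prepareStatement(
                 "UPDATE table_url SET visited = 1 WHERE url = ?")) {
        ps.setString(1, url);
        return ps.executeUpdate(); // number of rows updated
    }
}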

I can't see a real advantage in having all the database code inside your web crawler threads.
Why don't you use a static class with the sqlSelect and sqlExec methods, but without the Connection and ResultSet parameters? Both connection objects would be static as well. Make sure the connection objects are valid before using them. A rough sketch is shown below.
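A minimal sketch of that idea (the class name, JDBC URL and credentials are placeholders; the method names come from the question):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public final class CrawlerDb {
    // one static connection; a second one for the downloaders would follow the same pattern
    private static Connection fetcherConnection;

    private CrawlerDb() { }

    // returns a valid connection, reopening it if it was closed or dropped
    private static synchronized Connection fetcherConnection() throws SQLException {
        if (fetcherConnection == null || !fetcherConnection.isValid(2)) {
            fetcherConnection = DriverManager.getConnection("jdbc:mysql://localhost/crawler", "user", "pass"); // placeholder
        }
        return fetcherConnection;
    }

    // synchronized because a single Connection is shared between threads
    public static synchronized String sqlSelect(String query) throws SQLException {
        try (Statement st = fetcherConnection().createStatement();
             ResultSet rs = st.executeQuery(query)) {
            return rs.next() ? rs.getString(1) : null;
        }
    }

    public static synchronized int sqlExec(String query) throws SQLException {
        try (Statement st = fetcherConnection().createStatement()) {
            return st.executeUpdate(query);
        }
    }
}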

Related

How to save the results of a query in variables for each field in Java?

I need to accomplish the following:
1.- Save each field of a query result (Oracle DB) in a different variable.
The query result could be 1 or more rows (5 on average).
2.- Invoke a web service for each row.
3.- Wait for the web service answer and then repeat the process.
I think that saving the result of 1 row and then invoking the web service is easy, but the problem is when the query result returns more than 1 row.
How can I do this? Is an ArrayList the answer?
EDIT: I am using the following code. How can I print the ArrayList to see if the connection is working?
If I run this I get:
com.packagename.SomeBean@1d251891
com.packagename.SomeBean@48140564
com.packagename.SomeBean@58ceff1
Connection con = null;
Statement stmt = null;
ResultSet rs = null;
List<SomeBean> v = new ArrayList<SomeBean>();
String query = "select * from table where ROWNUM BETWEEN 1 and 3";
try
{
Class.forName("oracle.jdbc.driver.OracleDriver");
con = DriverManager.getConnection("jdbc:oracle:thin:user/pass@localhost:port:SID");
stmt = con.createStatement();
rs = stmt.executeQuery(query);
while( rs.next() ){
SomeBean n = new SomeBean();
n.setColumn1(rs.getInt("column1"));
n.setColumn2(rs.getString("column2"));
n.setColumn3(rs.getString("column3"));
n.setColumn4(rs.getInt("column4"));
n.setColumn5(rs.getString("column5"));
n.setColumn6(rs.getString("column6"));
n.setColumn7(rs.getString("column7"));
...
v.add(n);
}
for(SomeBean s : v){
System.out.println(s);
}
} catch (ClassNotFoundException e) {
e.printStackTrace();
} catch (SQLException e) {
e.printStackTrace();
} finally {
try {
stmt.close();
con.close();
} catch (SQLException e) {
e.printStackTrace();
}
}
Answering your question is quite difficult.
But I can give you some hints.
Your starting point is JDBC.
The Java Database Connectivity (JDBC)
The Java Database Connectivity (JDBC) API is the industry standard for database-independent connectivity between the Java programming language and a wide range of databases: SQL databases and other tabular data sources, such as spreadsheets or flat files. The JDBC API provides a call-level API for SQL-based database access.
Once you are able to establish a connection to the DB, this snippet can help you answer your question.
// start connection
List<SomeBean> v = new ArrayList<SomeBean>();
Statement st;
try
{
st = conn.createStatement();
ResultSet rs = st.executeQuery(sql);
while( rs.next() ){
SomeBean n = new SomeBean();
n.setFirstField(rs.getInt("firstfield"));
n.setSecondField(rs.getString("secondfield"));
...
...
v.add(n);
}
}
catch (SQLException e)
{
e.printStackTrace();
}
// close connection
Once you have your collection of beans, just write a for loop that calls the web service once for each bean.
for(SomeBean s : v){
callToYouWS(s);
}
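Regarding the edit in the question above: output like com.packagename.SomeBean@1d251891 is just Object's default toString(), so the connection is working and the list is populated. If SomeBean overrides toString(), the print loop shows the actual column values; a minimal sketch (assuming getters matching the setters used in the question):
// inside SomeBean
@Override
public String toString() {
    // show a few representative columns; extend with the remaining fields as needed
    return "SomeBean[column1=" + getColumn1()
            + ", column2=" + getColumn2()
            + ", column3=" + getColumn3() + "]";
}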

Java memory leak caused by MySQL libraries

I have a thread that executes queries and updates against a database. I commented out all the operations done on the data and just left the lines you can see below. Even running the program with only these lines I get a memory leak.
This is what it looks like in VisualVM (screenshot omitted).
MySQL class:
public ResultSet executeQuery(String Query) throws SQLException {
statement = this.connection.createStatement();
resultSet = statement.executeQuery(Query);
return resultSet;
}
public void executeUpdate(String Query) throws SQLException {
Statement tmpStatement = this.connection.createStatement();
tmpStatement.executeUpdate(Query);
tmpStatement.close();
tmpStatement=null;
}
Thread file
public void run() {
ResultSet results;
String query;
int id;
String IP;
int port;
String SearchTerm;
int sleepTime;
while (true) {
try {
query = "SELECT * FROM example WHERE a='0'";
results = this.database.executeQuery(query);
while (results.next()) {
id = results.getInt("id");
query = "UPDATE example SET a='1' WHERE id='"
+ id + "'";
SearchTerm=null;
this.database.executeUpdate(query);
}
results.close();
results = null;
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
After researching the web, I found this problem has happened to many other people:
https://forum.hibernate.org/viewtopic.php?f=1&t=987128
Is bone cp or mysql.jdbc.JDBC4Connection known for leaking?
and a few more turn up if you google "jdbc4resultset memory leak".
The leak is yours, not MySQL's. You aren't closing the statement.
The design of your method is poor. It should close the statement, and it should return something that will survive closing of the statement, such as a CachedRowSet.
Or else it should not exist at all. It's only three lines, and it doesn't support query parameters, so it isn't really much use. I would just delete it.
You also appear to have statement as an instance member, which is rarely if ever correct. It should be local to the method. At present your code isn't even thread-safe.
You should also be closing the ResultSet in a finally block to ensure it gets closed. Ditto the Statement.
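A sketch of what the CachedRowSet variant could look like (using the connection field from the question; this is an illustration, not the only possible fix):
import javax.sql.rowset.CachedRowSet;
import javax.sql.rowset.RowSetProvider;

public CachedRowSet executeQuery(String query) throws SQLException {
    try (Statement stmt = this.connection.createStatement();
         ResultSet rs = stmt.executeQuery(query)) {
        // copy the rows into a disconnected row set so both Statement and ResultSet can be closed here
        CachedRowSet cached = RowSetProvider.newFactory().createCachedRowSet();
        cached.populate(rs);
        return cached;
    }
}
The caller can then iterate the returned CachedRowSet with next()/getInt() just like a ResultSet, without keeping any Statement open.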
Make sure that you are explicitly closing the database connections.

one base executeQuery method or one for every query

I've started creating a toDoList and I would like to create a "DataMapper" to fire queries at my database.
I created this DataMapper to handle things for me, but I don't know if my way of thinking is correct in this case. In my DataMapper I have created only 1 method that executes the queries and several methods that know which query to fire (to minimize the open and close methods).
For example I have this:
public Object insertItem(String value) {
this.value = value;
String insertQuery = "INSERT INTO toDoList(item,datum) " + "VALUES ('" + value + "', CURDATE())";
return this.executeQuery(insertQuery);
}
public Object removeItem(int id) {
this.itemId = id;
String deleteQuery = "DELETE FROM test WHERE id ='" + itemId + "'";
return this.executeQuery(deleteQuery);
}
private ResultSet executeQuery(String query) {
this.query = query;
Connection con = null;
Statement st = null;
ResultSet rs = null;
try {
con = db.connectToAndQueryDatabase(database, user, password);
st = con.createStatement();
st.executeUpdate(query);
}
catch (SQLException e1) {
e1.printStackTrace();
}
finally {
if (rs != null) {
try {
rs.close();
} catch (SQLException e2) { /* ignored */}
}
if (st != null) {
try {
st.close();
} catch (SQLException e2) { /* ignored */}
}
if (con != null) {
try {
con.close();
} catch (SQLException e2) { /* ignored */}
}
System.out.println("connection closed");
}
return rs;
}
So now I don't know if it's correct to return a ResultSet like this. I thought of doing something like
public ArrayList<ToDoListModel> getModel() {
return null;
}
to insert every record returned into an ArrayList. But I feel a little stuck. Can someone point me in the right direction with an example or something?
It depends on the way the application works. If you have a lot of database hits in a short time, it would be better to bundle them and use the same database connection for all queries, to reduce the overhead of connection establishment and cleanup.
If you only have single queries at larger intervals you could do it this way.
You should also consider whether you want to separate the database layer from the user interface (if there is one).
In that case you should not pass the ResultSet up to the user interface, but wrap the data in an independent container and pass that through your application.
If I understand your problem correctly, you need to pass a list of ToDoListModel objects to insert into the DB using the insertItem method.
How you pass your objects to insert items does not actually matter, but what you do need to consider is how this DataMapper behaves under concurrent access: if it can be accessed by multiple threads at a time, you will end up creating multiple DB connections, which is a little expensive. Your code actually works without any issue under sequential access.
So you can add a synchronized block around connection creation and make the DataMapper class a singleton, as sketched below.
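A rough sketch of that idea (the JDBC URL and credentials are placeholders; the table and columns come from the insertItem method in the question):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public final class DataMapper {
    private static final DataMapper INSTANCE = new DataMapper();
    private Connection con;

    private DataMapper() { }

    public static DataMapper getInstance() {
        return INSTANCE;
    }

    // connection creation is synchronized so concurrent threads don't each open their own connection
    private synchronized Connection connection() throws SQLException {
        if (con == null || con.isClosed()) {
            con = DriverManager.getConnection("jdbc:mysql://localhost/todo", "user", "pass"); // placeholder
        }
        return con;
    }

    public synchronized int insertItem(String value) throws SQLException {
        try (PreparedStatement ps = connection().prepareStatement(
                "INSERT INTO toDoList(item, datum) VALUES (?, CURDATE())")) {
            ps.setString(1, value);
            return ps.executeUpdate();
        }
    }
}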
OK, in that case what you can do is first create an ArrayList of HashMaps, where each map holds column name / column value pairs. After that you can create your model.
public List<Map<String, Object>> convertResultSetToArrayList(ResultSet rs) throws SQLException {
    ResultSetMetaData mdata = rs.getMetaData();
    int columns = mdata.getColumnCount();
    List<Map<String, Object>> list = new ArrayList<>();
    while (rs.next()) {
        Map<String, Object> row = new HashMap<>(columns);
        for (int i = 1; i <= columns; ++i) {
            row.put(mdata.getColumnName(i), rs.getObject(i)); // column name -> column value
        }
        list.add(row);
    }
    return list;
}
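From there, building the model is just a loop over the row maps (a sketch; the column and setter names "item" and "datum" are assumptions based on the insert statement above):
// rs is the ResultSet of, for example, "select * from toDoList"
List<ToDoListModel> models = new ArrayList<>();
for (Map<String, Object> row : convertResultSetToArrayList(rs)) {
    ToDoListModel m = new ToDoListModel();
    m.setItem((String) row.get("item"));          // assumed column/setter names
    m.setDatum((java.sql.Date) row.get("datum"));
    models.add(m);
}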

jdbc performance

There are three tables inside my database. One is employee, the second is employee_Project, and the third is employee_Reporting. Each table has a common employee_Number as its primary key, and there is a one-to-many relationship among them such that an employee has many projects and reporting dates.
I have run select * from employee, select * from employee_project and select * from employee_reporting in three data holder classes which have the methods fillResultSet(ResultSet) and List<T> getData(). This is based on a SqlDbEngine class with a runQuery(PreparedStatement, DataHolder) method, and that implementation has been completed.
Now I have to design a getAllEmployee() method that also loads project and reporting details, with optimal code in Java using JDBC. I used an Iterator, but that solution was not acceptable; now I have to use a for-each loop.
This is what I have done:
public List<Employee> getAllEmployees() {
EmployeeDataHolderImpl empdataholder = new EmployeeDataHolderImpl();
List<Employee> list_Employee_Add = null;
try {
Connection connection = mySqlDbConnection.getConnection();
PreparedStatement preparedStatement = connection
.prepareStatement(GET_ALL_EMPLOYEE_DETAILS);
mySqlDBEngineImpl.runQuery(preparedStatement, empdataholder);
} catch (SQLException e) {
e.printStackTrace();
}
for (Employee employee : empdataholder.getData()) {
new EmployeeDAOImpl().getProject(employee);
new EmployeeDAOImpl().getReport(employee);
}
list_Employee_Add = empdataholder.getData();
return list_Employee_Add;
}
and make another method
public void getProject(Employee emp) {
EmployeeProjectDataHolderImpl employeeProjectHolder = new EmployeeProjectDataHolderImpl();
try {
Connection connection = mySqlDbConnection.getConnection();
PreparedStatement preparedStatement = connection
.prepareStatement(GET_ALL_PROJECT_DETAILS);
mySqlDBEngineImpl
.runQuery(preparedStatement, employeeProjectHolder);
} catch (SQLException e) {
e.printStackTrace();
}
for (EmployeeProject employee_Project : employeeProjectHolder.getData()) {
if (employee_Project.getEmployeeNumber() == emp.getEmpNumber()) {
emp.getProjects().add(employee_Project);
}
}
}
public void getReport(Employee emp) {
EmployeeReportDataHolderImpl employeeReportHolder = new EmployeeReportDataHolderImpl();
try {
Connection connection = mySqlDbConnection.getConnection();
PreparedStatement preparedStatement = connection
.prepareStatement(GET_ALL_REPORT_DETAILS);
mySqlDBEngineImpl
.runQuery(preparedStatement, employeeReportHolder);
} catch (SQLException e) {
e.printStackTrace();
}
for (EmployeeReport employee_Report : employeeReportHolder.getData()) {
if (employee_Report.getEmployeeNumber() == emp.getEmpNumber()) {
emp.getReports().add(employee_Report);
}
}
}
}
The same applies to employee reporting, but done this way the performance is going to decrease. Nobody needs to worry about closing the connection, I will do it.
Please tell me how I could improve my solution.
There are some issues with your code.
1. You are instantiating EmployeeDAOImpl every time; instead you can keep one instance and call the operations on it.
new EmployeeDAOImpl().getProject(employee); new
EmployeeDAOImpl.getReport(employee);
2. I don't see where you close your connection after performing an SQL operation.
You should be having
try {
    // code statements
} catch (SQLException e) {
    e.printStackTrace();
} finally {
    // close your connection and PreparedStatement
}
Closing database connections is very vital.
If you keep your current code, it will hit the database in three ways:
You're opening a connection to get the employee's data.
For every employee, you open (and close) a new connection to get his projects.
For every employee, you open (and close) a new connection to get his reports.
Note that opening a new connection is a performance hit on your application. It doesn't matter if you use an enhanced for-loop or an Iterator, there would be many hits that can slow down your application.
Two ways to solve this problem:
Open a single connection where you run all your select statements. This will be better than opening/closing lot of connections.
Create a single SQL statement to retrieve the employees and the data you need for every employee. It will have better performance for different reasons:
A single connection to the database.
A single query instead of lots of queries to the database (a single I/O operation).
If your rdbms allows it, the query will be optimized for future requests (a single query instead of multiple queries).
I would prefer to go with the second option. For this, I tend to use a method that executes any SQL select statement and returns a ResultSet. I'll post a basic example (note that the provided code can be improved depending on your needs); this method could live in your SqlDbEngine class:
public ResultSet executeSQL(Connection con, String sql, List<Object> arguments) {
PreparedStatement pstmt = null;
ResultSet rs = null;
try {
pstmt = con.prepareStatement(sql);
if (arguments != null) {
int i = 1;
for(Object o : arguments) {
pstmt.setObject(i++, o);
}
}
//execute(...) also handles insert, update and delete statements...
pstmt.execute();
rs = pstmt.getResultSet(); // null for non-query statements
} catch(SQLException e) {
//handle the error...
}
return rs;
}
And this other method to handle all the query operation
public List<Employee> getAllEmployee() {
Connection con = null;
ResultSet rs = null;
List<Employee> lstEmployee = new ArrayList<Employee>();
try {
con = mySqlDbConnection.getConnection();
//write the sql to retrieve all the data
//I'm assuming these can be your columns, it's up to you
//this can be written using JOINs...
String sql = "SELECT E.EMPLOYEE_ID, E.EMPLOYEE_NAME, P.PROJECT_NAME, R.REPORT_NAME FROM EMPLOYEE E, PROJECT P, REPORT R WHERE E.EMPLOYEE_ID = P.EMPLOYEE_ID AND E.EMPLOYEE_ID = R.EMPLOYEE_ID";
//I guess you don't need parameters for this...
rs = SqlDbEngine.executeSQL(con, sql, null);
if (rs != null) {
Employee e = null; // initialized so the compiler accepts the assignments inside the loop
int employeeId = -1, lastEmployeeId = -1;
while (rs.next()) {
//you need to make sure to create a new employee only when
//reading a new employee id
employeeId = rs.getInt("EMPLOYEE_ID");
if (lastEmployeeId != employeeId) {
e = new Employee();
lastEmployeeId = employeeId;
lstEmployee.add(e);
}
Project p = new Project();
Report r = new Report();
//fill values of p...
//fill values of r...
//you can fill the values taking advantage of the column name in the resultset
//at last, link the project and report to the employee
e.getProjects().add(p);
e.getReports().add(r);
}
}
} catch (Exception e) {
//handle the error...
} finally {
try {
if (rs != null) {
Statement stmt = rs.getStatement();
rs.close();
stmt.close();
}
if (con != null) {
con.close();
}
} catch (SQLException e) {
//handle the error...
}
}
return lstEmployee;
}
Note that the second way can be harder to code, but it will give you the best performance. It's up to you to improve the provided methods; some advice:
Create a class that receives a ResultSet and builds a Project instance using the column names of the ResultSet (similar for Report and Employee); see the sketch after this list.
Create a method that handles closing the ResultSet and its Statement.
As a best practice, never use select * from mytable; it's preferable to list only the needed columns.
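For the first point, a mapper could look roughly like this (a sketch: the PROJECT_NAME column comes from the example query above, while the Project setter name is an assumption):
import java.sql.ResultSet;
import java.sql.SQLException;

public final class ProjectMapper {
    private ProjectMapper() { }

    // build a Project from the current row, addressing columns by name
    public static Project fromResultSet(ResultSet rs) throws SQLException {
        Project p = new Project();
        p.setProjectName(rs.getString("PROJECT_NAME")); // setter name assumed
        return p;
    }
}
Inside the while (rs.next()) loop you would then call ProjectMapper.fromResultSet(rs) instead of filling p by hand.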
If I understand correctly, your code first loads all EmployeeReport rows and then filters them according to getEmployeeNumber(). You can let your database do this by modifying your SQL query.
Since you didn't show your SQL queries (I assume they're in GET_ALL_REPORT_DETAILS), I'll just make a guess... Try executing SQL like:
select *
from employee_reporting
where employeeNumber = ?
If you put this in a PreparedStatement, and then set the parameter value, your database will only return the data you need. For example:
PreparedStatement pstmt = con.prepareStatement(GET_ALL_REPORT_DETAILS);
pstmt.setInt(1, employee.getEmployeeNumber());
That should return only the EmployeeReport records having the desired employeeNumber. In case performance is still an issue, you could consider adding an index to your EmployeeReport table, but that's a different story...
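If you do decide to add that index, it only needs to be created once; a sketch (index name made up, column name taken from the query above):
try (Statement st = con.createStatement()) {
    // index the lookup column so "WHERE employeeNumber = ?" avoids a full table scan
    st.executeUpdate("CREATE INDEX idx_employee_reporting_empno ON employee_reporting (employeeNumber)");
}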

problems with update statement in SQLite

I have created a database using SQLite. I want to update the value of a "features" column (type Blob), but I do not know how to write the "update" statement.
This is what I tried:
try {
stat = conn.createStatement();
} catch (SQLException e) {
}
try {
byte[] b = getFunction();
stat.executeUpdate("update table set features="+b);
} catch (SQLException e) {
}
I get the following error:
java.sql.SQLException: unrecognized token: "[B@13a317a"
So I guess that "b" is the problem?
[B@13a317a looks like an array-to-string result (b.toString() in this case). You should use a prepared statement for the blob, like:
update table set features=?
Generally, you should never create SQL by concatenating strings. That is a recipe for SQL injection problems.
Try this one with PreparedStatement:
Connection con = null;
PreparedStatement stmt = null;
try {
byte[] b = getFunction();
con = ...;
stmt = con.prepareStatement("update table set features=?");
stmt.setBytes(1, b);
stmt.executeUpdate();
con.commit();
}
catch (SQLException e) {
//handle exception (consider con.rollback(); con may be null here)
}
finally {
//close stmt and at least con here (both may be null here)
}
Personally I always use PreparedStatements. When you have to write a lot of this code, consider writing some utility classes to reduce boilerplate code.
In particular, you should consider writing utility classes for null-safe calls to the close methods of Connection, Statement and ResultSet when you are dealing with plain JDBC.
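Such a utility might look like this (a minimal sketch, not part of the original answer):
public final class JdbcUtil {
    private JdbcUtil() { }

    // null-safe close for Connection, Statement, ResultSet (all implement AutoCloseable)
    public static void closeQuietly(AutoCloseable resource) {
        if (resource != null) {
            try {
                resource.close();
            } catch (Exception e) {
                // nothing useful can be done during cleanup
            }
        }
    }
}
The finally block above then shrinks to JdbcUtil.closeQuietly(stmt); JdbcUtil.closeQuietly(con);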
EDIT
What Thomas Jung wrote about preventing SQL Injections is another big pro for always using PreparedStatements. +1 for him :-)
stat.executeUpdate("update table set features="+b[0].toString());
you have to use +
