OrmLite: Difference between Dao.callBatchTasks() and TransactionManager.callInTransaction() - java

What is the difference between these methods? I have read the docs but I don't understand what the callBatchTasks method does. The documentation says:
This will turn off what databases call "auto-commit" mode, run the
call-able and then re-enable "auto-commit".
Isn't that a transaction?
Thanks.

Difference between Dao.callBatchTasks() and TransactionManager.callInTransaction()
The difference depends on the database you are using. Under Android, there is no difference. The javadocs for callBatchTasks(...) say:
Call the call-able that will perform a number of batch tasks. This is for performance when you want to run a number of database operations at once -- maybe loading data from a file. This will turn off what databases call "auto-commit" mode, run the call-able, and then re-enable "auto-commit". If auto-commit is not supported then a transaction will be used instead.
Android's SQLite is one of those databases. Inside the internal ORMLite code you see:
private <CT> CT doCallBatchTasks(DatabaseConnection connection, boolean saved,
        Callable<CT> callable) throws SQLException {
    if (databaseType.isBatchUseTransaction()) {
        return TransactionManager.callInTransaction(connection, saved, databaseType,
                callable);
    }
    ...
So internally, when used under Android, dao.callBatchTasks(...) is a call through to TransactionManager.callInTransaction(...).
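For illustration, here is a minimal sketch of the two equivalent calls; Account, accountDao, accountsToInsert and connectionSource are assumed names, not from the original question:

accountDao.callBatchTasks(new Callable<Void>() {
    @Override
    public Void call() throws Exception {
        for (Account account : accountsToInsert) {
            accountDao.create(account);   // runs inside one SQLite transaction on Android
        }
        return null;
    }
});

// Under Android this is effectively the same as calling the transaction manager directly:
TransactionManager.callInTransaction(connectionSource, new Callable<Void>() {
    @Override
    public Void call() throws Exception {
        for (Account account : accountsToInsert) {
            accountDao.create(account);
        }
        return null;
    }
});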

Related

Hbase CopyTable inside Java

I want to copy one HBase table to another location with good performance.
I would like to reuse the code from CopyTable.java from the hbase-server GitHub page.
I've been looking at the documentation from HBase but it didn't help me much: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/CopyTable.html
After looking at this Stack Overflow post: Can a main() method of class be invoked in another class in java
I think I can directly call it using its main class.
Question: Do you think there is any better way to get this copy done than using CopyTable from hbase-server? Do you see any inconvenience in using this CopyTable?
Question: Do you think there is any better way to get this copy done than using CopyTable from hbase-server? Do you see any inconvenience in using this CopyTable?
First of all, a snapshot is a better approach than CopyTable.
HBase Snapshots allow you to take a snapshot of a table without too much impact on Region Servers. Snapshot, clone, and restore operations don't involve data copying. Also, exporting the snapshot to another cluster doesn't impact the Region Servers.
Prior to version 0.94.6, the only way to back up or clone a table was to use CopyTable/ExportTable, or to copy all the HFiles in HDFS after disabling the table. The disadvantages of these methods are that you can degrade region server performance (Copy/Export Table), or you need to disable the table, which means no reads or writes; and this is usually unacceptable.
A snapshot is not just a rename; if, between multiple operations, you want to restore to one particular point in time, this is the right case to use it:
A snapshot is a set of metadata information that allows an admin to get back to a previous state of the table. A snapshot is not a copy of the table; it’s just a list of file names and doesn’t copy the data. A full snapshot restore means that you get back to the previous “table schema” and you get back your previous data losing any changes made since the snapshot was taken.
Also, see Snapshots+and+Repeatable+reads+for+HBase+Tables
Snapshot Internals
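As a rough sketch of the snapshot approach from Java (assuming a pre-1.0 HBase client; the table and snapshot names are illustrative), it can look like this:

Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
admin.snapshot("mytable_snapshot", "mytable");            // metadata only, no data copy
admin.cloneSnapshot("mytable_snapshot", "mytable_copy");  // new table backed by the same files
admin.close();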
Another MapReduce approach besides CopyTable:
You can implement something like the code below. This is for a standalone program; you would write a MapReduce job to insert multiple Put records as a batch (maybe 100,000 at a time).
This increased performance for standalone inserts into the HBase client; you can try the same approach in a MapReduce job.
public void addMultipleRecordsAtaShot(final ArrayList<Put> puts, final String tableName) throws Exception {
    HTable table = null;
    try {
        table = new HTable(HBaseConnection.getHBaseConfiguration(), getTable(tableName));
        table.put(puts);
        LOG.info("INSERT record[s] " + puts.size() + " to table " + tableName + " OK.");
    } catch (final Throwable e) {
        e.printStackTrace();
    } finally {
        LOG.info("Processed ---> " + puts.size());
        if (puts != null) {
            puts.clear();
        }
        if (table != null) {
            table.close();   // release the table handle
        }
    }
}
Along with that, you can also consider the following.
Set the write buffer to a larger value than the default:
1) table.setAutoFlush(false)
2) Set the buffer size:
<property>
    <name>hbase.client.write.buffer</name>
    <value>20971520</value> <!-- you can double this for better performance: 2 x 20971520 = 41943040 -->
</property>
OR
void setWriteBufferSize(long writeBufferSize) throws IOException
The buffer is only ever flushed on two occasions:
Explicit flush: use the flushCommits() call to send the data to the servers for permanent storage.
Implicit flush: this is triggered when you call put() or setWriteBufferSize(). Both calls compare the currently used buffer size with the configured limit and optionally invoke the flushCommits() method.
If the entire buffer is disabled, setting setAutoFlush(true) will force the client to call the flush method for every invocation of put().
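As a rough illustration, here is a hedged sketch of the client-side buffering calls on the old HTable API (the table name and buffer size are illustrative; HBaseConnection.getHBaseConfiguration() is the helper from the snippet above):

HTable table = new HTable(HBaseConnection.getHBaseConfiguration(), "mytable");
table.setAutoFlush(false);                 // stop flushing on every put()
table.setWriteBufferSize(2 * 20971520L);   // ~40 MB client-side write buffer
for (Put put : puts) {
    table.put(put);                        // buffered on the client
}
table.flushCommits();                      // explicit flush to the region servers
table.close();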

Getting CPU 100 percent when I am trying to downloading CSV in Spring

I am getting a CPU performance issue on the server when I try to download a CSV in my project: the CPU goes to 100%, but SQL returns the response within 1 minute. In the CSV we are writing around 600K records. For one user it works fine, but for concurrent users we get this issue.
Environment
Spring 4.2.5
Tomcat 7/8 (RAM 2GB Allocated)
MySQL 5.0.5
Java 1.7
Here is the Spring controller code:
@RequestMapping(value="csvData")
public void getCSVData(HttpServletRequest request,
        HttpServletResponse response,
        @RequestParam(value="param1", required=false) String param1,
        @RequestParam(value="param2", required=false) String param2,
        @RequestParam(value="param3", required=false) String param3) throws IOException {
    List<Log> logs = service.getCSVData(param1, param2, param3);
    response.setHeader("Content-type", "application/csv");
    response.setHeader("Content-disposition", "inline; filename=logData.csv");
    PrintWriter out = response.getWriter();
    out.println("Field1,Field2,Field3,.......,Field16");
    for (Log row : logs) {
        out.println(row.getField1() + "," + row.getField2() + "," + row.getField3() + "......" + row.getField16());
    }
    out.flush();
    out.close();
}
Persistence code (I am using Spring JdbcTemplate):
@Override
public List<Log> getCSVLog(String param1, String param2, String param3) {
    String sql = SqlConstants.CSV_ACTIVITY.toString();
    List<Log> csvLog = jdbcTemplate.query(sql, new Object[]{param1, param2, param3},
        new RowMapper<Log>() {
            @Override
            public Log mapRow(ResultSet rs, int rowNum) throws SQLException {
                Log log = new Log();
                log.setField1(rs.getInt("field1"));
                log.setField2(rs.getString("field2"));
                log.setField3(rs.getString("field3"));
                .
                .
                .
                log.setField16(rs.getString("field16"));
                return log;
            }
        });
    return csvLog;
}
I think you need to be specific on what you meant by "100% CPU usage": whether it's the Java process or the MySQL server. As you have got 600K records, trying to load everything into memory would easily end up in an OutOfMemoryError. The fact that this works for one user means that you've got enough heap space to process this number of records for just one user, and the symptoms surface when there are multiple users trying to use the same service.
The first issue I can see in your posted code is that you try to load everything into one big list, and the size of that list varies based on the content of the Log class. Using a list like this also means that you have to have enough memory to process the JDBC result set and generate a new list of Log instances. This can be a major problem with a growing number of users. These kinds of short-lived objects cause frequent GC, and once GC cannot keep up with the amount of garbage being created, it obviously fails. To solve this major issue, my suggestion is to use a scrollable ResultSet. Additionally, you can make this result set read-only; for example, below is a code fragment for creating a scrollable result set. Take a look at the documentation for how to use it.
Statement st = conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_READ_ONLY);
The above option is suitable if you're using pure JDBC or the Spring JDBC template. If Hibernate is already used in your project, you can still achieve the same thing with the code fragment below. Again, please check the documentation for more information if you have a different JPA provider.
StatelessSession session = sessionFactory.openStatelessSession();
Query query = session.createSQLQuery(queryStr).setCacheable(false).setFetchSize(Integer.MIN_VALUE).setReadOnly(true);
query.setParameter(query_param_key, query_paramter_value);
ScrollableResults resultSet = query.scroll(ScrollMode.FORWARD_ONLY);
This way you're not loading all the records into the Java process in one go; instead they're loaded on demand, and you will have a small memory footprint at any given time. Note that the JDBC connection will stay open until you're done with processing the entire record set. This also means that your DB connection pool can be exhausted if many users are going to download CSV files from this endpoint. You need to take measures to overcome this problem (i.e. use an API manager to rate-limit calls to this endpoint, read from a read replica, or whatever viable option).
My other suggestion is to stream the data, which you have already done, so that any records fetched from the DB are processed and sent to the client before the next set of records is processed. Again, I would suggest you use a CSV library such as SuperCSV to handle this, as these libraries are designed to handle a good load of data; a sketch follows below.
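For example, a hedged sketch using SuperCSV's CsvBeanWriter, writing each Log out as soon as it is mapped instead of collecting the whole list first (the header array and the mapRow helper are illustrative, not taken from the original code):

String[] header = {"field1", "field2", /* ... */ "field16"};
ICsvBeanWriter csvWriter = new CsvBeanWriter(response.getWriter(), CsvPreference.STANDARD_PREFERENCE);
csvWriter.writeHeader(header);
while (rs.next()) {                 // rs: the scrollable, read-only result set from above
    Log log = mapRow(rs);           // hypothetical helper mapping the current row to a Log
    csvWriter.write(log, header);   // streamed straight to the servlet response
}
csvWriter.close();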
Please note that this answer may not exactly answer your question, as you haven't provided the necessary parts of your source (such as how you retrieve data from the DB), but it should give you the right direction to solve this issue.
Your problem is loading all the data from the database onto the application server at once. Try to run the query with limit and offset parameters (with a mandatory ORDER BY), push the loaded records to the client, and then load the next chunk of data with a different offset. This helps you decrease the memory footprint and does not require keeping the database connection open the whole time. Of course, the database will be loaded a bit more, but maybe the overall situation will be better. Try different limit values, for example 5K-50K, and monitor CPU usage on both the app server and the database.
If you can afford to keep many open connections to the database, @Bunti's answer is very good.
http://dev.mysql.com/doc/refman/5.7/en/select.html
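A minimal sketch of this limit/offset approach (assuming the query behind SqlConstants.CSV_ACTIVITY can have an ORDER BY and LIMIT/OFFSET appended; the id column, page size and rowMapper are illustrative):

int pageSize = 10000;
int offset = 0;
List<Log> page;
do {
    page = jdbcTemplate.query(sql + " ORDER BY id LIMIT ? OFFSET ?",
            new Object[]{param1, param2, param3, pageSize, offset}, rowMapper);
    for (Log row : page) {
        out.println(row.getField1() + "," + row.getField2() /* ... */ + "," + row.getField16());
    }
    out.flush();            // push this page to the client before fetching the next one
    offset += pageSize;
} while (page.size() == pageSize);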

Testing MongoDB in Java: concurrency problems

I'm testing the part of my Java application where I store data in a MongoDB database. My test setup looks like this:
public class MongoDataStoreTest {

    private MongoClient client;

    @Before
    public void before() throws UnknownHostException {
        this.client = new MongoClient();
    }

    @After
    public void after() throws InterruptedException {
        this.client.dropDatabase("testdb");
        this.client.close();
    }
}
In my tests I execute some code which does the following:
I create a DB instance with: DB database = client.getDB("testdb")
I add a collection in the database: database.getCollection("testcoll")
And then I insert a BasicDBObject: collection.insert(object, WriteConcern.SAFE)
Directly after this I query the database using the standard cursor method.
As can be seen in my test setup code, after each test I drop the database and close all client connections. I execute ten such tests. When running them locally everything happens as I expect. The objects are inserted and afterwards the database is dropped for each test (I can see this in the mongo log). However when executing this on a Jenkins server it sometimes happens that when querying the database, an object of the previous test is still in that database, although that database should have been dropped. This looks like a concurrency problem to me, but I can't see where the race condition is situated. I have no access to the database log on the Jenkins server. Does anyone know what I should change to make sure my tests always succeed?
Don't drop the database. There might be some internal references to it in Mongo. I don't believe your test case needs the DB to be dropped. Usually it's enough simply to remove all the documents in the collections under test.
To clear MongoDB databases our code looks like this:
public void clearData() {
    try {
        for (String collection : datastore.getDB().getCollectionNames()) {
            // We must not mess with system indexes and users as this will cause errors
            if (!collection.startsWith("system.")) {
                // Do not drop the entire database or full collections as this
                // will lead to missing index errors (for no obvious reason).
                datastore.getDB().getCollection(collection).drop();
            }
        }
    } catch (MongoException e) {
        LOG.log(Level.INFO,
                "Could not fetch all collection names - this is a permission thing, but can be ignored");
    }
    // The indexes are not automatically recreated (for no obvious
    // reason) - ensure they are still there after the drop().
    datastore.ensureIndexes();
    datastore.ensureCaps();
}
The problem was caused by the dropDatabase operation.
This operation seemed to take longer on the Jenkins server than on my local machine. Since MongoDB doesn't seem to wait until the database is completely dropped, the new document was added to the old (dropping) database.
To keep my tests as independent as possible I solved the problem by generating a different unique database name for each test.
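For reference, a minimal sketch of that approach (the naming pattern is illustrative):

private MongoClient client;
private String dbName;

@Before
public void before() throws UnknownHostException {
    this.client = new MongoClient();
    // every test gets its own database, so a slow drop cannot leak documents into the next test
    this.dbName = "testdb_" + UUID.randomUUID().toString().replace("-", "");
}

@After
public void after() {
    this.client.dropDatabase(dbName);
    this.client.close();
}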

Should JDBC Blob (not) be free()'d after use?

Just whacking together an export from an old DB that contains binary data, I stumbled over an exception in one of our utility methods:
java.lang.AbstractMethodError: net.sourceforge.jtds.jdbc.BlobImpl.free()
After checking our codebase, I found that this utility method was never used until now; basically it looks like this:
public BinaryHolder getBinary(final int columnIndex) throws SQLException {
    Blob blob = null;
    try {
        blob = resultSet.getBlob(columnIndex);
        final BinaryHolder binary = BinaryHolderUtil.create(blob);
        return binary;
    } finally {
        if (blob != null)
            blob.free();
    }
}
BinaryHolder is just a wrapper that holds the binary data (and before you ask, the code executes fine until it reaches the finally clause - BinaryHolderUtil.create(blob) does not attempt to free the blob).
Investigating further I found that everywhere else we access Blob's, the blob is just obtained using getBlob() and not free'd at all (The Javadoc says it will be automatically disposed of when the result set is closed).
Question now: Should the blob be free()'d manually (after all the ResultSet may be held for more than just accessing the blob), and if yes how can it be free()'d in a way that works even with a driver that does not implement it?
(We are using SQL-Server with JTDS1.25, if that wasn't already obvious from the exception)
Blob.free() was introduced in JDBC 4.0 / Java 6, so you are most likely using a JDBC 3.0 or earlier driver.
As with most (JDBC) resources, closing them as soon as possible has its advantages (eg GC can collect it earlier, database resources are freed etc). That is also why you can close a ResultSet even though it is closed when you close the statement (or execute the statement again), just like you can close a Statement even though it is closed when the Connection is closed.
So a Blob does not need to be freed, but it is - in general - a good idea to free it when you are done with it.
BTW: jTDS is only JDBC 3.0; you would be better off using Microsoft's own SQL Server JDBC driver.
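If you do want to free blobs eagerly but have to survive drivers that predate JDBC 4.0, one hedged option is a small helper that tolerates the missing implementation (a sketch, not from the original code):

private static void freeQuietly(final Blob blob) {
    if (blob == null) {
        return;
    }
    try {
        blob.free();
    } catch (AbstractMethodError e) {
        // driver was compiled against JDBC 3.0 and does not implement free();
        // the blob will be released when the result set / connection is closed
    } catch (SQLException e) {
        // free() is supported but failed; safe to ignore for cleanup purposes
    }
}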

Opening database managing handles

I am using Berkeley DB.
Generally with this type of database you open an environment, which is a bunch of files to control locking, transactions, etc., and then you open your database in this environment.
The problem is that there are LOTS of databases to open.
The method to open a database is openDatabase().
However, opening and closing the database every time is slow. The documentation says:
Opening a database is a relatively expensive operation, and maintaining a set of open databases will normally be preferable to repeatedly opening and closing the database for each new query.
The problem is how to maintain that set.
A simple solution I thought of was lazy loading:
private static Database db;

public CustomerDAO() {
    if (db == null) {
        try {
            DatabaseConfig dbConfig = new DatabaseConfig();
            dbConfig.setAllowCreate(true);
            dbConfig.setType(DatabaseType.BTREE);
            db = BDBEnvironment.DEFAULT.getEnvironment().openDatabase(null, "C:\\xxxx\\CUSTOMERS",
                    null, dbConfig);
But this has a problem with double-checked locking, right?
Another issue is that I want to have a default file name or a user-specified one. Of course it is easy to create a DatabaseManager, but the same double-checked locking issue would occur.
Any ideas on how to maintain a set of database handles?
Use basic Java synchronization techniques and a thread-safe data structure such as a ConcurrentHashMap to store your DB handles. You should probably read this book if you haven't already, as it covers a lot of what you'll need for this kind of issue.
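For illustration, a hedged sketch of such a handle cache, reusing the names from the question's snippet (BDBEnvironment, DatabaseConfig, DatabaseType); the second check under the synchronized block avoids the double-checked locking problem:

private static final ConcurrentHashMap<String, Database> OPEN_DATABASES =
        new ConcurrentHashMap<String, Database>();

public static Database getDatabase(final String fileName) throws DatabaseException, FileNotFoundException {
    Database db = OPEN_DATABASES.get(fileName);
    if (db == null) {
        synchronized (OPEN_DATABASES) {
            db = OPEN_DATABASES.get(fileName);   // re-check while holding the lock
            if (db == null) {
                DatabaseConfig dbConfig = new DatabaseConfig();
                dbConfig.setAllowCreate(true);
                dbConfig.setType(DatabaseType.BTREE);
                db = BDBEnvironment.DEFAULT.getEnvironment()
                        .openDatabase(null, fileName, null, dbConfig);
                OPEN_DATABASES.put(fileName, db);
            }
        }
    }
    return db;
}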
