I am getting this weird case when querying Postgres 8.4 for some records with Blobs (of type OIDs) with Hibernate. The query does return all right but when my code wants to read the content of the BLOB with the simple code below, it gets 0 bytes back
public static byte[] readBlob(Blob blob) throws Exception {
InputStream is = null;
try {
is = blob.getBinaryStream();
return org.apache.commons.io.IOUtils.toByteArray(is);
}
finally {
if (is != null)
try {
is.close();
}
catch(Exception e) {}
}
}
Funny think is that I am getting this behavior only since I've started adding more then one such records to the table.
The underlying JDBC library is type 3 (postgresq 8.4-701).
Can someone give me a hint as to how to solve this issue?
Thanks
Peter
Looks like you may have found this bug:
http://opensource.atlassian.com/projects/hibernate/browse/HHH-4876
It was a while since I have run into similar issue, and since I've refreshed my memories about this topic I am thinking of sharing the results. The problem is that Postgres (and a few versions back Oracle too) will not handle the Blob content at the record creation time in the same transaction. Funny think is that one needs to pass the content after the external file (where the content gets stored eventually) has been well created and reserved for the operation. Yes, the record gets created but the Blob is blank. To have the Blob filled out with whatever you need to put in, you need that operation in a second transaction (sort of an update record). That's a funny business (maybe a major bug), ehe
Related
I want to copy one Hbase table to another location with good performance.
I would like to reuse the code from CopyTable.java from Hbase-server github page
I've been looking the doccumentation from hbase but it didn't help me much http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/CopyTable.html
After looking in this post of stackoverflow: Can a main() method of class be invoked in another class in java
I think I can directly call it using its main class.
Question: Do you think anyway better to get this copy done rather than using CopyTable from hbase-server ? Do you see any inconvenience using this CopyTable ?
Question: Do you think anyway better to get this copy done rather than
using CopyTable from hbase-server ? Do you see any inconvenience using
this CopyTable ?
First thing is snapshot is better way than CopyTable.
HBase Snapshots allow you to take a snapshot of a table without too much impact on Region Servers. Snapshot, Clone and restore operations don't involve data copying. Also, Exporting the snapshot to another cluster doesn't have impact on the Region Servers.
Prior to version 0.94.6, the only way to backup or to clone a table is to use CopyTable/ExportTable, or to copy all the hfiles in HDFS after disabling the table. The disadvantages of these methods are that you can degrade region server performance (Copy/Export Table) or you need to disable the table, that means no reads or writes; and this is usually unacceptable.
Snapshot is not just rename, between multiple operations if you want to restore at one particular point then this is the right case to use :
A snapshot is a set of metadata information that allows an admin to get back to a previous state of the table. A snapshot is not a copy of the table; it’s just a list of file names and doesn’t copy the data. A full snapshot restore means that you get back to the previous “table schema” and you get back your previous data losing any changes made since the snapshot was taken.
Also, see Snapshots+and+Repeatable+reads+for+HBase+Tables
Snapshot Internals
Another Map reduce way than CopyTable :
You can implement something like below in your code this is for standalone program where as you have write mapreduce job to insert multiple put records as a batch (may be 100000).
This increased performance for standalone inserts in to hbase client you can try this in mapreduce way
public void addMultipleRecordsAtaShot(final ArrayList<Put> puts, final String tableName) throws Exception {
try {
final HTable table = new HTable(HBaseConnection.getHBaseConfiguration(), getTable(tableName));
table.put(puts);
LOG.info("INSERT record[s] " + puts.size() + " to table " + tableName + " OK.");
} catch (final Throwable e) {
e.printStackTrace();
} finally {
LOG.info("Processed ---> " + puts.size());
if (puts != null) {
puts.clear();
}
}
}
along with that you can also consider below...
Enable write buffer to large value than default
1) table.setAutoFlush(false)
2) Set buffer size
<property>
<name>hbase.client.write.buffer</name>
<value>20971520</value> // you can double this for better performance 2 x 20971520 = 41943040
</property>
OR
void setWriteBufferSize(long writeBufferSize) throws IOException
The buffer is only ever flushed on two occasions:
Explicit flush
Use the flushCommits() call to send the data to the servers for permanent storage.
Implicit flush
This is triggered when you call put() or setWriteBufferSize().
Both calls compare the currently used buffer size with the configured limit and optionally invoke the flushCommits() method.
In case the entire buffer is disabled, setting setAutoFlush(true) will force the client to call the flush method for every invocation of put().
I am getting CPU performance issue on server when I am trying to download CSV in my project, CPU goes 100% but SQL returns the response within 1 minute. In the CSV we are writing around 600K records for one user it is working fine but for concurrent users we are getting this issue.
Environment
Spring 4.2.5
Tomcat 7/8 (RAM 2GB Allocated)
MySQL 5.0.5
Java 1.7
Here is the Spring Controller code:-
#RequestMapping(value="csvData")
public void getCSVData(HttpServletRequest request,
HttpServletResponse response,
#RequestParam(value="param1", required=false) String param1,
#RequestParam(value="param2", required=false) String param2,
#RequestParam(value="param3", required=false) String param3) throws IOException{
List<Log> logs = service.getCSVData(param1,param2,param3);
response.setHeader("Content-type","application/csv");
response.setHeader("Content-disposition","inline; filename=logData.csv");
PrintWriter out = response.getWriter();
out.println("Field1,Field2,Field3,.......,Field16");
for(Log row: logs){
out.println(row.getField1()+","+row.getField2()+","+row.getField3()+"......"+row.getField16());
}
out.flush();
out.close();
}}
Persistance Code:- I am using spring JDBCTemplate
#Override
public List<Log> getCSVLog(String param1,String param2,String param3) {
String sql =SqlConstants.CSV_ACTIVITY.toString();
List<Log> csvLog = JdbcTemplate.query(sql, new Object[]{param1, param2, param3},
new RowMapper<Log>() {
#Override
public Log mapRow(ResultSet rs, int rowNum)
throws SQLException {
Log log = new Log();
log.getField1(rs.getInt("field1"));
log.getField2(rs.getString("field2"));
log.getField3(rs.getString("field3"));
.
.
.
log.getField16(rs.getString("field16"));
}
return log;
}
});
return csvLog;
}
I think you need to be specific on what you meant by "100% CPU usage" whether it's the Java process or MySQL server. As you have got 600K records, trying to load everything in to memory would easily end up in OutOfMemoryError. Given that this works for one user means that you've got enough heap space to process this number of records for just one user and symptoms surface when there are multiple users trying to use the same service.
First issue I can see in your posted code is that you try to load everything into one big list and the size of the list varies based on the content of the Log class. Using a list like this also means that you have to have enough memory to process JDBC result set and generate new list of Log instances. This can be a major problem with a growing number of users. This type of short-lived objects will cause frequent GC and once GC cannot keep up with the amount of garbage being collected it fails obviously. To solve this major issue my suggestion is to use ScrollableResultSet. Additionally you can make this result set read-only, for example below is code fragment for creating a scrollable result set. Take a look at the documentation for how to use it.
Statement st = conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_READ_ONLY);
Above option is suitable if you're using pure JDBC or SpringJDBC template. If Hibernate is already used in your project you can still achieve the same this with the below code fragment. Again please check the documentation for more information and you have a different JPA provider.
StatelessSession session = sessionFactory.openStatelessSession();
Query query = session.createSQLQuery(queryStr).setCacheable(false).setFetchSize(Integer.MIN_VALUE).setReadOnly(true);
query.setParameter(query_param_key, query_paramter_value);
ScrollableResults resultSet = query.scroll(ScrollMode.FORWARD_ONLY);
This way you're not loading all the records to Java process in one go, instead you they're loaded on demand and will have small memory footprint at any given time. Note that JDBC connection will be open until you're done with processing the entire record set. This also means that your DB connection pool can be exhausted if many users are going to download CSV files from this endpoint. You need to take measures to overcome this problem (i.e use of an API manager to rate limit the calls to this endpoint, reading from a read-replica or whatever viable option).
My other suggestion is to stream data which you have already done, so that any records fetched from the DB are processed and sent to client before the next set of records are processed. Again I would suggest you to use a CSV library such as SuperCSV to handle this as these libraries are designed to handle a good load of data.
Please note that this answer may not exactly answer your question as you haven't provided necessary parts of your source such as how to retrieve data from DB but will give the right direction to solve this issue
Your problem in loading all data on application server from database at once, try to run query with limit and offset parameters (with mandatory order by), push loaded records to client and load next part of data with different offset. It help you decrease memory footprint and will not required keep connection to database open all the time. Of course, database will loaded a bit more, but maybe whole situation will better. Try different limit values, for example 5K-50K and monitor cpu usage - on both app server and database.
If you can allow keep many open connection to database #Bunti answer is very good.
http://dev.mysql.com/doc/refman/5.7/en/select.html
long time lurker, first question time.
I tried searching for how to get all of the tables from a database created with OpenOffice using JDBC, and while I found answers that work for others, they do not work for me. The code itself actually returns something, but it returns something completely unexpected.
My code:
try {
DatabaseMetaData md = conn.getMetaData();
rs = md.getTables(null, null, "%", null);
while (rs.next()) {
tableNames.add(rs.getString(3));
System.out.println(rs.getString(3));
}
}
catch (Exception e) {
System.out.println("error in sendConnection()");
}
And the output:
SYSTEM_ALIASES
SYSTEM_ALLTYPEINFO
SYSTEM_AUTHORIZATIONS
SYSTEM_BESTROWIDENTIFIER
SYSTEM_CACHEINFO
SYSTEM_CATALOGS
SYSTEM_CHECK_COLUMN_USAGE
SYSTEM_CHECK_CONSTRAINTS
SYSTEM_CHECK_ROUTINE_USAGE
SYSTEM_CHECK_TABLE_USAGE
SYSTEM_CLASSPRIVILEGES
SYSTEM_COLLATIONS
SYSTEM_COLUMNPRIVILEGES
SYSTEM_COLUMNS
SYSTEM_CROSSREFERENCE
SYSTEM_INDEXINFO
SYSTEM_PRIMARYKEYS
SYSTEM_PROCEDURECOLUMNS
SYSTEM_PROCEDURES
SYSTEM_PROPERTIES
SYSTEM_ROLE_AUTHORIZATION_DESCRIPTORS
SYSTEM_SCHEMAS
SYSTEM_SCHEMATA
SYSTEM_SEQUENCES
SYSTEM_SESSIONINFO
SYSTEM_SESSIONS
SYSTEM_SUPERTABLES
SYSTEM_SUPERTYPES
SYSTEM_TABLEPRIVILEGES
SYSTEM_TABLES
SYSTEM_TABLETYPES
SYSTEM_TABLE_CONSTRAINTS
SYSTEM_TEXTTABLES
SYSTEM_TRIGGERCOLUMNS
SYSTEM_TRIGGERS
SYSTEM_TYPEINFO
SYSTEM_UDTATTRIBUTES
SYSTEM_UDTS
SYSTEM_USAGE_PRIVILEGES
SYSTEM_USERS
SYSTEM_VERSIONCOLUMNS
SYSTEM_VIEWS
SYSTEM_VIEW_COLUMN_USAGE
SYSTEM_VIEW_ROUTINE_USAGE
SYSTEM_VIEW_TABLE_USAGE
What is being returned, and how can I work around or resolve this? Thank you in advance!
Edit: The Databases created buh OpenOffice appear to be Embedded Databases by default. This may be causing the problem. Going to try and convert it to something else and see what happens.
I found a way to fix this, in case others come across this problem as well. The problem was OpenOffice was saving the database as a base file, with hsqldb under it. You need to make it just a regular hsqldb.
I used both of these links as resources:
http://programmaremobile.blogspot.com/2009/01/java-and-openoffice-base-db-through.html
https://forum.openoffice.org/en/forum/viewtopic.php?f=83&t=65980
In short, you need to extract the .odb file, go into the directories and find the database directory holding 4 other files. Add a prefix to them and then access it like normal.
I am still getting the monstrosity of the SYSTEM_* tables, but now I am actually getting the tables I want as well. From there I think I can figure out how to just get those random tables.
Just whacking together an export from an old DB that contains binary data, I stumbled over an exception in one of our utility methods:
java.lang.AbstractMethodError: net.sourceforge.jtds.jdbc.BlobImpl.free()
After checking our codebase, I found that utility method was never used until now, bascially it looks like this:
public BinaryHolder getBinary(final int columnIndex) throws SQLException {
Blob blob = null;
try {
blob = resultSet.getBlob(columnIndex);
final BinaryHolder binary = BinaryHolderUtil.create(blob);
return binary;
} finally {
if (blob != null)
blob.free();
}
}
BinaryHolder is just a wrapper that holdes the binary data (and before you ask, the code executes fine until it reaches the finally clause - BinaryHolderUtil.create(blob) does not attempt to free the blob).
Investigating further I found that everywhere else we access Blob's, the blob is just obtained using getBlob() and not free'd at all (The Javadoc says it will be automatically disposed of when the result set is closed).
Question now: Should the blob be free()'d manually (after all the ResultSet may be held for more than just accessing the blob), and if yes how can it be free()'d in a way that works even with a driver that does not implement it?
(We are using SQL-Server with JTDS1.25, if that wasn't already obvious from the exception)
The Blob.free() was introduced in JDBC 4.0 / Java 6. So you are most likely using a JDBC 3.0 or earlier JDBC driver.
As with most (JDBC) resources, closing them as soon as possible has its advantages (eg GC can collect it earlier, database resources are freed etc). That is also why you can close a ResultSet even though it is closed when you close the statement (or execute the statement again), just like you can close a Statement even though it is closed when the Connection is closed.
So a Blob does not need to be freed, but it is - in general - a good idea to free it when you are done with it.
BTW: JTDS is only JDBC 3.0, you would be better off using the Microsoft SQL Server JDBC driver of Microsoft itself.
I am updating multiple records in database. Now whenever UI sends the list of records to be updated, I have to just update those records in database. I am using JDBC template for that.
Earlier Case
Earlier what I was whenever I got records from UI, I just do
jdbcTemplate.batchUpdate(Query, List<object[]> params)
Whenever there was an exception, I used to rollback whole transaction.
(Updated : Is batchUpdate multi-threaded or faster than batch update in some way?)
Later Case
But later as requirement changed whenever there was exception. So, whenever there is some exception, I should know which records failed to update. So I had to sent the records back to UI in case of exception with a reason, why did they failed.
so I had to do something similar to this:
for(Record record : RecordList)
{
try{
jdbcTemplate.update(sql, Object[] param)
}catch(Exception ex){
record.setReason("Exception : "+ex.getMessage());
continue;
}
}
So am I doing this in right fashion, by using the loop?
If yes, can someone suggest me how to make it multi-threaded.
Or is there anything wrong in this case.
To be true, I was hesitating to use try catch block inside the loop :( .
Please correct me, really need to learn a better way because I myself feel, there must be a better way , thanks.
make all update-operation to a Collection Callable<>,
send it to java.util.concurrent.ThreadPoolExecutor. the pool is multithreaded.
make Callable:
class UpdateTask implements Callable<Exception> {
//constructor with jdbctemplate,sql,param goes here.
#Override
public Exception call() throws Exception {
try{
jdbcTemplate.update(sql, Object[] param)
}catch(Exception ex){
return ex;
}
return null;
}
invoke call:
<T> List<Future<T>> java.util.concurrent.ExecutorService.invokeAll(Collection<? extends Callable<T>> tasks) throws InterruptedException
Your case looks like you need to use validation in java and filter out the valid data alone and send to the data base for updating.
BO layer
-> filter out the Valid Record.
-> Invalid Record should be send back with some validation text.
In DAO layer
-> batch update your RecordList
This will give you the best performance.
Never use database insert exception as a validation mechanism.
Exceptions are costly as the stack trace has to be created
Connection to database is another costly process and will take time to get a connection
Java If-Else will run much faster for same data-base validation