I'm loading a local Couchbase instance with application-specific JSON objects.
The relevant code is:
CouchbaseClient getCouchbaseClient()
{
    List<URI> uris = new LinkedList<URI>();
    uris.add(URI.create("http://localhost:8091/pools"));
    CouchbaseConnectionFactoryBuilder cfb = new CouchbaseConnectionFactoryBuilder();
    cfb.setFailureMode(FailureMode.Retry);
    cfb.setMaxReconnectDelay(1500);
    cfb.setOpTimeout(10000);           // wait up to 10 seconds for an operation to succeed
    cfb.setOpQueueMaxBlockTime(5000);  // wait up to 5 seconds when trying to enqueue an operation
    return new CouchbaseClient(cfb.buildCouchbaseConnection(uris, "my-app-bucket", ""));
}
The method that stores an entry (using the suggestions from Bulk Load and Exponential Backoff):
void continuosSet(CouchbaseClient cache, String key, int exp, Object value, int tries)
{
    OperationFuture<Boolean> result = null;
    OperationStatus status = null;
    int backoffexp = 0;
    do
    {
        if (backoffexp > tries)
        {
            throw new RuntimeException(MessageFormat.format("Could not perform a set after {0} tries.", tries));
        }
        result = cache.set(key, exp, value);
        try
        {
            if (result.get())
            {
                break;
            }
            else
            {
                status = result.getStatus();
                LOG.warn(MessageFormat.format("Set failed with status \"{0}\" ... retrying.", status.getMessage()));
                if (backoffexp > 0)
                {
                    double backoffMillis = Math.pow(2, backoffexp);
                    backoffMillis = Math.min(1000, backoffMillis); // 1 sec max
                    Thread.sleep((int) backoffMillis);
                    LOG.warn("Backing off, tries so far: " + backoffexp);
                }
                backoffexp++;
            }
        }
        catch (ExecutionException e)
        {
            LOG.error("ExecutionException while doing set: " + e.getMessage());
        }
        catch (InterruptedException e)
        {
            LOG.error("InterruptedException while doing set: " + e.getMessage());
        }
    }
    while (status != null && status.getMessage() != null && status.getMessage().indexOf("Temporary failure") > -1);
}
When the continuosSet method is called for a large number of objects to store (single thread), e.g.
CouchbaseClient cache = getCouchbaseClient();
do
{
    SerializableData data = queue.poll();
    if (data != null)
    {
        final String key = data.getClass().getSimpleName() + data.getId();
        continuosSet(cache, key, 0, gson.toJson(data, data.getClass()), 100);
...
it generates a CheckedOperationTimeoutException inside the continuosSet method at the result.get() call:
Caused by: net.spy.memcached.internal.CheckedOperationTimeoutException: Timed out waiting for operation - failing node: 127.0.0.1/127.0.0.1:11210
at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:160) ~[spymemcached-2.8.12.jar:2.8.12]
at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:133) ~[spymemcached-2.8.12.jar:2.8.12]
Can someone shed light on how to overcome and recover from this situation? Is there a good technique or workaround for bulk loading with the Java client for Couchbase? I have already explored the Performing a Bulk Set documentation, which is unfortunately for the PHP Couchbase client.
My suspicion is that you may be running this in a JVM spawned from the command line that doesn't have that much memory. If that's the case, you could hit longer GC pauses which could cause the timeout you're mentioning.
I think the best thing to do is to try a couple of things. First, raise the -Xmx argument to the JVM to use more memory. See if the timeout happens later or goes away. If so, then my suspicion about memory is correct.
If that doesn't work, raise the setOpTimeout() and see if that reduces the error or makes it go away.
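For example, a minimal sketch of the builder from the question with a larger timeout; the exact values are assumptions to experiment with, not recommendations:
CouchbaseConnectionFactoryBuilder cfb = new CouchbaseConnectionFactoryBuilder();
cfb.setOpTimeout(30000);            // e.g. 30 seconds instead of 10
cfb.setOpQueueMaxBlockTime(10000);  // allow more time to enqueue operations under heavy load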
Also, make sure you're using the latest client.
By the way, I don't think this is directly related to bulk loading. It may happen because of heavy resource consumption during bulk loading, but it looks like the regular backoff is either working or you're never hitting it.
Related
This is a process I've been working on for a little while now. It was running fine until performance started taking a hit. I figured out a way to get it to perform very fast, but I'm really unsure what is happening behind the scenes, and it's now throwing warnings and errors and I'm not sure what to do. The file is getting processed, but I'm not sure if all threads are complete, and I don't believe I am shutting down the app correctly. Here is everything you need to know...
The file is read using a buffered reader and we run some data quality checks on each record. For every record that is read and passes the checks, we create a Java object and insert it into a List. Once the List is 1000 objects big, we call an OracleService class, which has a Repo autowired, and execute a saveAll method with the List. We then continue reading the file and doing this until the file is done being read. I am passing an ExecutorService object in to the service, so every time we call that service it gets a new List object containing my objects (this object is basically the table we are loading) and an ExecutorService object. The process is running fine, but a ton of exceptions get thrown once I try to shut down. Here is all my code...
My controller class's run method. This gets called from another class, which implements CommandLineRunner:
public void run() throws ParseException, IOException, InterruptedException {
    logger.info("******************** Aegis Check Inclearing DDA Trial Balance Table Load starting ********************");
    try (BufferedReader reader = new BufferedReader(new FileReader(inputFile))) {
        String line = reader.readLine();
        int count = 0;
        TrialBalanceBuilder builder = new TrialBalanceBuilder();
        while (line != null) {
            if (line.startsWith("D")) {
                if (dataQuality(line)) {
                    TrialBalance trialBalance = builder.buildTrialBalanceObject(line, procDt, time);
                    insertList.add(trialBalance);
                    count++;
                    if (count == 1000) {
                        oracleService.loadToTableTrialBalance(insertList, executorService);
                        count = 0;
                        insertList.clear();
                    }
                } else {
                    logger.info("Data quality check FAILED for record: " + line);
                    oracleService.revertInserts("DDA_TRIAL_BAL_STG", procDt.toString());
                    System.exit(111);
                }
            }
            line = reader.readLine();
        }
        logger.info("Leftover record count is " + insertList.size());
        oracleService.loadToTableTrialBalance(insertList, executorService);
    } catch (IOException e) {
        e.printStackTrace();
    }
    logger.info("Updating Metadata table with new batch proc date");
    InclearingBatchMetadataBuilder inclearingBatchMetadataBuilder = new InclearingBatchMetadataBuilder();
    InclearingBatchMetadata inclearingBatchMetadata = inclearingBatchMetadataBuilder.buildInclearingBatchMetadataObject("DDA_TRIAL_BAL_STG", procDt, time, Constants.bankID);
    oracleService.insertBatchProcDtIntoMetaTable(inclearingBatchMetadata);
    logger.info("Successfully updated Metadata table with new batch proc date: " + procDt);
    Thread.sleep(10000);
    oracleService.cleanUpGOS("DDA_TRIAL_BAL_STG", 1);
    executorService.shutdownNow();
    logger.info("******************** Aegis Check Inclearing DDA Trial Balance Table Load ended successfully ********************");
}
I'm passing in an ExecutorService object to the service class. This is defined as...
private final ThreadFactory threadFactory = new ThreadFactoryBuilder().setNameFormat("Orders-%d").setDaemon(true).build();
private ExecutorService executorService = Executors.newFixedThreadPool(10, threadFactory);
My service class looks like this:
@Service("oracleService")
public class OracleService {
    private static final Logger logger = LoggerFactory.getLogger(OracleService.class);

    @Autowired
    TrialBalanceRepo trialBalanceRepo;

    @Transactional
    public void loadToTableTrialBalance(List<TrialBalance> trialBalanceList, ExecutorService executorService) {
        logger.debug("Let's load to the database");
        logger.debug(trialBalanceList.toString());
        List<TrialBalance> multiThreadList = new ArrayList<>(trialBalanceList);
        try {
            executorService.execute(() -> trialBalanceRepo.saveAll(multiThreadList));
        } catch (ConcurrentModificationException | DataIntegrityViolationException ignored) {}
        logger.debug("Successfully loaded to database");
    }
In my run method I then call a few more methods in that service class which create native queries and execute them against the database (for purging, etc.).
Anyway, I never know when the threads are complete, and I am finding in pre-production, when running with a lot of data, that we shut the app down before all the data has been loaded. Also, I don't know if this is even the best design. Do I keep passing in these ExecutorService objects? The whole point of this was to get optimal parallelism so that our performance would be better. Perhaps there is a better way (preferably without redesigning the entire app and using something other than JPA).
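One standard JDK idiom that addresses "never knowing when the threads are complete" is to call shutdown() followed by awaitTermination() instead of shutdownNow(), which cancels queued tasks. A minimal sketch, assuming the same executorService field as above (the 30-minute limit is an arbitrary placeholder, and java.util.concurrent.TimeUnit must be imported):
// Stop accepting new tasks, but let already-submitted saveAll() batches finish.
executorService.shutdown();
try {
    // Block until all queued batches complete, or the time limit expires.
    if (!executorService.awaitTermination(30, TimeUnit.MINUTES)) {
        logger.warn("Executor did not finish in time; some batches may not be persisted");
    }
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}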
I need to add a timeout for the reply/request transaction using 0MQ. How is this typically accomplished? I tried using the methods:
socket.setReceiveTimeOut();
and
socket.setSendTimeout();
but they seem to cause a null pointer exception.
In essence, I want the application to timeout after 10 seconds if the application receiving the request is not available.
Any help is appreciated.
Thanks!
I think jzmq should throw a ZMQException when recv times out,
but there is no ZMQException when err == EAGAIN.
https://github.com/zeromq/jzmq/blob/master/jzmq-jni/src/main/c%2B%2B/Socket.cpp
static
zmq_msg_t *do_read(JNIEnv *env, jobject obj, zmq_msg_t *message, int flags)
{
void *socket = get_socket (env, obj);
int rc = zmq_msg_init (message);
if (rc != 0) {
raise_exception (env, zmq_errno());
return NULL;
}
#if ZMQ_VERSION >= ZMQ_MAKE_VERSION(3,0,0)
rc = zmq_recvmsg (socket, message, flags);
#else
rc = zmq_recv (socket, message, flags);
#endif
int err = zmq_errno();
if (rc < 0 && err == EAGAIN) {
rc = zmq_msg_close (message);
err = zmq_errno();
if (rc != 0) {
raise_exception (env, err);
return NULL;
}
return NULL;
}
if (rc < 0) {
raise_exception (env, err);
rc = zmq_msg_close (message);
err = zmq_errno();
if (rc != 0) {
raise_exception (env, err);
return NULL;
}
return NULL;
}
return message;
}
I wonder if your null pointer is related to how your socket was created. I have set a socket timeout successfully in the past.
The following has worked for me when I used the JeroMQ library (native Java implementation of ZMQ). I used this to help do REQ-REP commands via ZMQ.
ZMQ.Context context = ZMQ.context(1);
ZMQ.Socket sock = context.socket(ZMQ.REQ);
sock.setSendTimeOut(10000);    // 10 second send timeout
sock.setReceiveTimeOut(10000); // 10 second receive timeout
if (sock.connect("tcp://127.0.0.1:1234")) {
    if (sock.send(/* insert data here */)) {
        /* Send was successful and did not time out. */
        byte[] replyBytes = null;
        replyBytes = sock.recv();
        if (null == replyBytes) {
            /* Receive timed out. */
        } else {
            /* Receive was successful. Do something with replyBytes. */
        }
    }
}
How is this [a timeout for the reply/request transaction] typically accomplished?
I am sad to confirm, there is nothing like this in the ZeroMQ native API. The principle of doing async delivery means, there is no limit for delivery to take place ( in a best-effort model of scheduling, or not at all ).
If you are new to ZeroMQ, you may enjoy a fast read about the main conceptual elements in the [ ZeroMQ hierarchy in less than a five seconds ] Section.
I want ... to timeout after 10 seconds if ... receiving the request is not ...
You may design your .recv()-method call usage in either of two ways. One is to protect it with a .poll( 10000 )-method screener, which first explicitly detects the presence of a message actually delivered to your application code, so the actual .recv()-method is only called once a message is confirmed to be ready to be read locally. The other, a bit more "raw" approach, is to use a non-blocking form of the method, .recv( ZMQ_NOBLOCK ), which will not spend a millisecond waiting when there are no messages to read right now from the local-side Context()-engine instance, and then to handle each of the cases accordingly in your code.
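A minimal sketch of the poll-before-recv variant, using the same JeroMQ-style API as the REQ example above (the 10000 ms value matches the 10-second requirement; everything else is an assumption, not the only way to wire it):
// Register the socket with a poller and only call recv() once a message is known to be waiting.
ZMQ.Poller poller = context.poller(1);
poller.register(sock, ZMQ.Poller.POLLIN);
if (poller.poll(10000) > 0 && poller.pollin(0)) {
    byte[] reply = sock.recv(0);   // will not block: a message is already queued locally
    /* handle reply */
} else {
    /* nothing arrived within 10 seconds -- handle the timeout */
}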
A Bonus Point
Also be warned that using the REQ/REP Scalable Formal Communication Archetype pattern will not make this any easier, as there is a mandatory two-side step-dance (sure, unless intentionally relaxed with ZMQ_RELAXED), so both of the FSA-back-to-back-connected-FSAs will still have to wait for the next "expected" remote event before becoming able to handle the next local event. If interested in details, you will find many posts on the unavoidable, unsalvageable mutual deadlock that is sure to happen with REQ/REP, where we only do not know when it happens, but are sure that it will.
Currently I have a requirement where I need to read a huge number of insert queries, approximately 100,000, written in a .sql file. I need to read this file from the SD card and update my application's local SQLite database.
I am using the code below, and the update works:
At the same time I am worried about performance and the risk of the application hanging when the task involves an even heavier .sql file. Do I need to change the approach I am using right now? Is there a better implementation with the SQLite database that performs this task without that risk?
/* Iterate through lines (assuming each insert has its own line
   and there's no other stuff) */
while (insertReader.ready()) {
    String insertStmt = insertReader.readLine();
    db.execSQL(insertStmt);
    bytesRead += insertStmt.length();
    int percent = (int) (bytesRead * 100 / totalBytes);
    // To show the data-loading progress
    if (previousPercent < percent) {
        System.out.println(percent);
        previousPercent = percent;
        Bundle b = new Bundle();
        b.putInt(AppConstants.KEY_PROGRESS_PERCENT, percent);
        Message msg = new Message();
        msg.setData(b);
        errorViewhandler.sendMessage(msg);
    }
    result++;
}
insertReader.close();
I have found that the fastest way to import a lot of data into your database is to use applyBatch() on a ContentProvider.
Here is an example I reduced from one of my projects:
ArrayList<ContentProviderOperation> cpoList = new ArrayList<ContentProviderOperation>();
while (condition) {
    ContentProviderOperation.Builder b = ContentProviderOperation.newInsert(YourOwnContentProvider.CONTENT_URI);
    b.withYieldAllowed(true);
    b.withValue(COLUMN_ONE, 1);
    b.withValue(COLUMN_TWO, 2);
    cpoList.add(b.build());
    // You could check here whether you already have a lot of items in your list. If you do, you should send them to applyBatch already.
}
if (cpoList.size() > 0) {
    context.get().getContentResolver().applyBatch(YourOwnContentProvider.AUTHORITY, cpoList);
}
As an example implementation of applyBatch:
@Override
public ContentProviderResult[] applyBatch(ArrayList<ContentProviderOperation> operations) {
    ContentProviderResult[] result = new ContentProviderResult[operations.size()];
    int i = 0;
    SQLiteDatabase sqlDB = db.getWritableDatabase();
    sqlDB.beginTransaction();
    try {
        for (ContentProviderOperation operation : operations) {
            result[i++] = operation.apply(this, result, i);
        }
        sqlDB.setTransactionSuccessful();
    } catch (OperationApplicationException e) {
        // Deal with exception
    } finally {
        sqlDB.endTransaction();
    }
    return result;
}
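A related note: if you stay with plain db.execSQL() as in the question, wrapping the whole loop in a single SQLite transaction is another widely used way to speed up bulk inserts, because SQLite then commits once at the end instead of syncing to disk after every statement. A minimal sketch, assuming the same db and insertReader objects from the question:
// Group all inserts into one transaction and commit once at the end.
db.beginTransaction();
try {
    while (insertReader.ready()) {
        db.execSQL(insertReader.readLine());
    }
    db.setTransactionSuccessful();
} finally {
    db.endTransaction();
}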
I originally tried this:
private static boolean checkUrlsAreReachable(String... urls) {
    checkArgument(urls.length > 0);
    List<F.Promise<WS.HttpResponse>> promises = newArrayList();
    for (String url : urls) {
        promises.add(WS.url(url).followRedirects(true).timeout("30s").getAsync());
    }
    List<WS.HttpResponse> results = await(F.Promise.waitAll(promises));
    for (WS.HttpResponse response : results) {
        if (!response.success()) {
            logger.debug("Failed accessing one of " + Joiner.on(", ").join(urls));
            return false;
        }
    }
    return true;
}
But I found several caveats:
I'm getting an exception on WS.url(url) if the URL in question does not resolve well (e.g. http://a.com/).
At least when debugging, it seems the call to getAsync() blocks ... is it really async in production? I know Play has fewer threads in Dev mode, but I thought the call wouldn't even start executing at this point.
If one of the URLs is not reachable, I'm not sure how to log which failed (how to access the URL from the WS.HttpResponse object)
So, I turned to use sync HTTP instead of async. The following implementation seems to work:
private static boolean checkUrlsAreReachable(String... urls) {
    checkArgument(urls.length > 0);
    List<F.Promise<Boolean>> promises = newArrayList();
    for (final String url : urls) {
        promises.add(new Job<Boolean>() {
            @Override
            public Boolean doJobWithResult() throws Exception {
                try {
                    WS.HttpResponse result = WS.url(url).followRedirects(true)
                            .timeout("30s").get();
                    return result.success();
                } catch (Exception e) {
                    return false;
                }
            }
        }.now());
    }
    F.Promise<List<Boolean>> allResults = F.Promise.waitAll(promises);
    List<Boolean> booleans = await(allResults);
    return Booleans3.and(booleans);
}
Is there a way to make the async implementation work?
Set the job pool settings in application.conf:
# Jobs executor
# ~~~~~~
# Size of the Jobs pool
play.jobs.pool=20
# Execution pool
# ~~~~~
# Default to 1 thread in DEV mode or (nb processors + 1) threads in PROD mode.
# Try to keep a low as possible. 1 thread will serialize all requests (very useful for debugging purpose)
play.pool=5
And just put the checking part in a job, such as CheckingJob, and start it using
new CheckingJob().now()
and it will be async.
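A minimal sketch of what such a CheckingJob could look like with the same Play 1.x Job API used in the question; the class name, the constructor, and the reuse of the synchronous checkUrlsAreReachable() are assumptions for illustration:
// Hypothetical job that wraps the synchronous check so it runs on the jobs pool.
public class CheckingJob extends Job<Boolean> {
    private final String[] urls;

    public CheckingJob(String... urls) {
        this.urls = urls;
    }

    @Override
    public Boolean doJobWithResult() throws Exception {
        // Delegates to the synchronous implementation from the question.
        return checkUrlsAreReachable(urls);
    }
}
The caller can then do F.Promise<Boolean> reachable = new CheckingJob(urls).now(); and await(reachable) when the result is needed.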
I am getting a Java heap space error while writing a large amount of data from the database to an Excel sheet.
I don't want to use the JVM -Xmx option to increase memory.
Following are the details:
1) I am using the org.apache.poi.hssf API for Excel sheet writing.
2) JDK version 1.5
3) Tomcat 6.0
The code I have written works well for around 23 thousand records, but it fails for more than 23K records.
Following is the code:
ArrayList l_objAllTBMList= new ArrayList();
l_objAllTBMList = (ArrayList) m_objFreqCvrgDAO.fetchAllTBMUsers(p_strUserTerritoryId);
ArrayList l_objDocList = new ArrayList();
m_objTotalDocDtlsInDVL= new HashMap();
Object l_objTBMRecord[] = null;
Object l_objVstdDocRecord[] = null;
int l_intDocLstSize=0;
VisitedDoctorsVO l_objVisitedDoctorsVO=null;
int l_tbmListSize=l_objAllTBMList.size();
System.out.println(" getMissedDocDtlsList_NSM ");
for(int i=0; i<l_tbmListSize;i++)
{
l_objTBMRecord = (Object[]) l_objAllTBMList.get(i);
l_objDocList = (ArrayList) m_objGenerateVisitdDocsReportDAO.fetchAllDocDtlsInDVL_NSM((String) l_objTBMRecord[1], p_divCode, (String) l_objTBMRecord[2], p_startDt, p_endDt, p_planType, p_LMSValue, p_CycleId, p_finYrId);
l_intDocLstSize=l_objDocList.size();
try {
l_objVOFactoryForDoctors = new VOFactory(l_intDocLstSize, VisitedDoctorsVO.class);
/* Factory class written to create and maintain limited no of Value Objects (VOs)*/
} catch (ClassNotFoundException ex) {
m_objLogger.debug("DEBUG:getMissedDocDtlsList_NSM :Exception:"+ex);
} catch (InstantiationException ex) {
m_objLogger.debug("DEBUG:getMissedDocDtlsList_NSM :Exception:"+ex);
} catch (IllegalAccessException ex) {
m_objLogger.debug("DEBUG:getMissedDocDtlsList_NSM :Exception:"+ex);
}
for(int j=0; j<l_intDocLstSize;j++)
{
l_objVstdDocRecord = (Object[]) l_objDocList.get(j);
l_objVisitedDoctorsVO = (VisitedDoctorsVO) l_objVOFactoryForDoctors.getVo();
if (((String) l_objVstdDocRecord[6]).equalsIgnoreCase("-"))
{
if (String.valueOf(l_objVstdDocRecord[2]) != "null")
{
l_objVisitedDoctorsVO.setPotential_score(String.valueOf(l_objVstdDocRecord[2]));
l_objVisitedDoctorsVO.setEmpcode((String) l_objTBMRecord[1]);
l_objVisitedDoctorsVO.setEmpname((String) l_objTBMRecord[0]);
l_objVisitedDoctorsVO.setDoctorid((String) l_objVstdDocRecord[1]);
l_objVisitedDoctorsVO.setDr_name((String) l_objVstdDocRecord[4] + " " + (String) l_objVstdDocRecord[5]);
l_objVisitedDoctorsVO.setDoctor_potential((String) l_objVstdDocRecord[3]);
l_objVisitedDoctorsVO.setSpeciality((String) l_objVstdDocRecord[7]);
l_objVisitedDoctorsVO.setActualpractice((String) l_objVstdDocRecord[8]);
l_objVisitedDoctorsVO.setLastmet("-");
l_objVisitedDoctorsVO.setPreviousmet("-");
m_objTotalDocDtlsInDVL.put((String) l_objVstdDocRecord[1], l_objVisitedDoctorsVO);
}
}
}// End of While
writeExcelSheet(); // Pasting this method at the end
// Clean up code
l_objVOFactoryForDoctors.resetFactory();
m_objTotalDocDtlsInDVL.clear();// Clear the used map
l_objDocList=null;
l_objTBMRecord=null;
l_objVstdDocRecord=null;
}// End of While
l_objAllTBMList=null;
m_objTotalDocDtlsInDVL=null;
-------------------------------------------------------------------
private void writeExcelSheet() throws IOException
{
HSSFRow l_objRow = null;
HSSFCell l_objCell = null;
VisitedDoctorsVO l_objVisitedDoctorsVO = null;
Iterator l_itrDocMap = m_objTotalDocDtlsInDVL.keySet().iterator();
while (l_itrDocMap.hasNext())
{
Object key = l_itrDocMap.next();
l_objVisitedDoctorsVO = (VisitedDoctorsVO) m_objTotalDocDtlsInDVL.get(key);
l_objRow = m_objSheet.createRow(m_iRowCount++);
l_objCell = l_objRow.createCell(0);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(String.valueOf(l_intSrNo++));
l_objCell = l_objRow.createCell(1);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(l_objVisitedDoctorsVO.getEmpname() + " (" + l_objVisitedDoctorsVO.getEmpcode() + ")"); // TBM Name
l_objCell = l_objRow.createCell(2);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(l_objVisitedDoctorsVO.getDr_name());// Doc Name
l_objCell = l_objRow.createCell(3);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(l_objVisitedDoctorsVO.getPotential_score());// Freq potential score
l_objCell = l_objRow.createCell(4);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(l_objVisitedDoctorsVO.getDoctor_potential());// Freq potential score
l_objCell = l_objRow.createCell(5);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(l_objVisitedDoctorsVO.getSpeciality());//CP_GP_SPL
l_objCell = l_objRow.createCell(6);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(l_objVisitedDoctorsVO.getActualpractice());// Actual practise
l_objCell = l_objRow.createCell(7);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(l_objVisitedDoctorsVO.getPreviousmet());// Lastmet
l_objCell = l_objRow.createCell(8);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(l_objVisitedDoctorsVO.getLastmet());// Previousmet
}
// Write OutPut Stream
try {
out = new FileOutputStream(m_objFile);
outBf = new BufferedOutputStream(out);
m_objWorkBook.write(outBf);
} catch (Exception ioe) {
ioe.printStackTrace();
System.out.println(" Exception in chunk write");
} finally {
if (outBf != null) {
outBf.flush();
outBf.close();
out.close();
l_objRow=null;
l_objCell=null;
}
}
}
Instead of populating the complete list in memory before starting to write to Excel, you need to modify the code so that each object is written to the file as it is read from the database. Take a look at this question to get an idea of the other approach.
Well, I'm not sure if POI can handle incremental updates, but if so you might want to write chunks of, say, 10000 rows to the file. If not, you might have to use CSV instead (so no formatting) or increase memory.
The problem is that you need to make objects that have been written to the file eligible for garbage collection (no references from a live thread any more) before writing the file is finished (before all rows have been generated and written to the file).
Edit:
If you can write smaller chunks of data to the file, you'd also have to load only the necessary chunks from the db. It doesn't make sense to load 50000 records at once and then try to write 5 chunks of 10000, since those 50000 records are likely to consume a lot of memory already.
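A minimal sketch of that chunked shape; fetchChunk() and writeRowsToSheet() are hypothetical stand-ins for the existing DAO call and the POI row-writing code from the question:
// Only one chunk's worth of value objects is reachable at any time,
// so each chunk becomes eligible for garbage collection after it is written.
final int CHUNK_SIZE = 10000;
int offset = 0;
List chunk = fetchChunk(offset, CHUNK_SIZE);        // hypothetical paged DAO call
while (!chunk.isEmpty()) {
    writeRowsToSheet(chunk);                        // create the HSSF rows for this chunk only
    chunk.clear();                                  // drop references before fetching the next chunk
    offset += CHUNK_SIZE;
    chunk = fetchChunk(offset, CHUNK_SIZE);
}
Note that with HSSF the workbook itself still keeps every created row in memory until it is written out, so the saving here comes from not holding all the intermediate lists and value objects for the whole report at once.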
As Thomas points out, you have too many objects taking up too much space and need a way to reduce that. There are a couple of strategies for this I can think of:
Do you need to create a new factory each time in the loop, or can you reuse it?
Can you start with a loop getting the information you need into a new structure, and then discarding the old one?
Can you split the processing into a thread chain, sending information forwards to the next step, avoiding building a large memory consuming structure at all?