I know Apache Curator provides a distributed lock feature built on top of ZooKeeper. It looks very easy to use, based on the documentation on the official Apache Curator website. For example:
RetryPolicy retryPolicy = new ExponentialBackoffRetry(1000, 3);
CuratorFramework client = CuratorFrameworkFactory.newClient("host:port", retryPolicy);
client.start(); // the client must be started before any recipe can use it
InterProcessSemaphoreMutex lock = new InterProcessSemaphoreMutex(client, path);
if (lock.acquire(10, TimeUnit.SECONDS)) {
    try { /* do something */ }
    finally { lock.release(); }
}
But what does the second parameter, "path", of InterProcessSemaphoreMutex mean? The API says it is "the path for the lock", but what exactly is it? Can anybody give me an example?
If I have millions of locks, do I have to create millions of lock paths? Is there a limit on the number of locks (znodes) a ZooKeeper cluster can hold? Or can the lock be removed when a process releases it?
ZooKeeper presents what looks like a distributed file system. For any ZooKeeper operation, recipe, etc., you write "znodes" to a particular path and watch for changes. See here: http://zookeeper.apache.org/doc/trunk/zookeeperOver.html#Simple+API (regarding znodes).
Each Curator recipe needs to know the base path to use. For InterProcessSemaphoreMutex, the path is what every participant should use. For example, if Process 1 and Process 2 both want to contend for the lock, they both allocate InterProcessSemaphoreMutex instances with the same path, say "/my/lock". Think of the path as the lock identifier. In the same ZooKeeper cluster, you can have multiple locks by using different paths.
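To make the "path as lock identifier" idea concrete, here is a minimal in-JVM analogy. It is not Curator and does not talk to ZooKeeper; the class name PathLocks and its methods are made up for illustration. It only shows that callers using the same path string contend for the same lock, while different paths give independent locks.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// In-JVM analogy only: this is NOT Curator. It mimics the idea that a lock
// is identified purely by its path string.
class PathLocks {
    private final ConcurrentMap<String, Semaphore> locks = new ConcurrentHashMap<>();

    // Every caller that passes the same path gets the same underlying lock.
    Semaphore lockFor(String path) {
        return locks.computeIfAbsent(path, p -> new Semaphore(1));
    }

    // Try to take the lock for a path, waiting at most timeoutMillis.
    boolean tryAcquire(String path, long timeoutMillis) {
        try {
            return lockFor(path).tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Give the lock for a path back so another caller can take it.
    void release(String path) {
        lockFor(path).release();
    }
}
```

With this sketch, two callers asking for "/my/lock" block each other, while a caller asking for "/my/other-lock" is unaffected.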
Hope this helps (disclaimer: I'm the main author of Curator).
Some examples about Reaper.
public void testSomeNodes() throws Exception {
    Timing timing = new Timing();
    ChildReaper reaper = null;
    CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), timing.connection(), new RetryOneTime(1));
    try {
        client.start();
        Random r = new Random();
        int nonEmptyNodes = 0;
        for ( int i = 0; i < 10; ++i ) {
            client.create().creatingParentsIfNeeded().forPath("/test/" + Integer.toString(i));
            if ( r.nextBoolean() ) {
                client.create().forPath("/test/" + Integer.toString(i) + "/foo");
                ++nonEmptyNodes; // only nodes with a child should survive reaping
            }
        }
        reaper = new ChildReaper(client, "/test", Reaper.Mode.REAP_UNTIL_DELETE, 1);
        reaper.start();
        timing.forWaiting().sleepABit(); // give the reaper time to delete the empty nodes
        Stat stat = client.checkExists().forPath("/test");
        Assert.assertEquals(stat.getNumChildren(), nonEmptyNodes);
    } finally {
        CloseableUtils.closeQuietly(reaper);
        CloseableUtils.closeQuietly(client);
    }
}
Java Code Examples for org.apache.curator.framework.recipes.locks.Reaper
This is a process I've been working on for a little while now. It was running fine until performance took a hit. I figured out a way to make it very fast, but I'm really unsure what is happening behind the scenes, and it's now throwing warnings and errors that I'm not sure how to handle. The file is getting processed, but I'm not sure that all threads complete, and I don't believe I am shutting down the app correctly. Here is everything you need to know...
The file is read using a BufferedReader, and we run some data quality checks on each record. For every record that passes the checks, we create a Java object and insert it into a List. Once the List holds 1000 objects, we call an OracleService class, which has a Repo autowired, and execute a saveAll method with the List. We then continue reading the file and repeat this until the whole file has been read. I am also passing an ExecutorService object to the service, so every time we call the service it gets a List containing my objects (this object is basically the table we are loading) and the ExecutorService object. The process runs fine, but a ton of exceptions are thrown once I try to shut down. Here is all my code...
My Controller class run method. This will get called from another class which implements CommandLineRunner
public void run() throws ParseException, IOException, InterruptedException {
    logger.info("******************** Aegis Check Inclearing DDA Trial Balance Table Load starting ********************");
    try (BufferedReader reader = new BufferedReader(new FileReader(inputFile))) {
        String line = reader.readLine();
        int count = 0;
        TrialBalanceBuilder builder = new TrialBalanceBuilder();
        while (line != null) {
            if (line.startsWith("D")) {
                if (dataQuality(line)) {
                    TrialBalance trialBalance = builder.buildTrialBalanceObject(line, procDt, time);
                    // Assumed from the description: collect the record and count it.
                    insertList.add(trialBalance);
                    count++;
                    if (count == 1000) {
                        oracleService.loadToTableTrialBalance(insertList, executorService);
                        count = 0;
                        insertList.clear(); // assumed: the service works on its own copy
                    }
                } else {
                    logger.info("Data quality check FAILED for record: " + line);
                }
            }
            line = reader.readLine();
        }
        logger.info("Leftover record count is " + insertList.size());
        oracleService.loadToTableTrialBalance(insertList, executorService);
    } catch (IOException e) {
        // ... (handler body missing from the post)
    }
    logger.info("Updating Metadata table with new batch proc date");
    InclearingBatchMetadataBuilder inclearingBatchMetadataBuilder = new InclearingBatchMetadataBuilder();
    InclearingBatchMetadata inclearingBatchMetadata = inclearingBatchMetadataBuilder.buildInclearingBatchMetadataObject("DDA_TRIAL_BAL_STG", procDt, time, Constants.bankID);
    // ... (the call that persists inclearingBatchMetadata is missing from the post)
    logger.info("Successfully updated Metadata table with new batch proc date: " + procDt);
    logger.info("******************** Aegis Check Inclearing DDA Trial Balance Table Load ended successfully ********************");
}
I'm passing in an ExecutorService object to the service class. This is defined as...
private final ThreadFactory threadFactory = new ThreadFactoryBuilder().setNameFormat("Orders-%d").setDaemon(true).build();
private ExecutorService executorService = Executors.newFixedThreadPool(10, threadFactory);
My service class looks as such....
public class OracleService {
    private static final Logger logger = LoggerFactory.getLogger(OracleService.class);

    @Autowired // per the description above; the annotation is missing from the post
    TrialBalanceRepo trialBalanceRepo;

    public void loadToTableTrialBalance(List<TrialBalance> trialBalanceList, ExecutorService executorService) {
        logger.debug("Let's load to the database");
        // Copy the batch so the caller can keep filling its own list.
        List<TrialBalance> multiThreadList = new ArrayList<>(trialBalanceList);
        try {
            // execute() only queues the task; exceptions thrown by saveAll surface
            // on the worker thread, not in this catch block.
            executorService.execute(() -> trialBalanceRepo.saveAll(multiThreadList));
        } catch (ConcurrentModificationException | DataIntegrityViolationException ignored) {}
        logger.debug("Successfully loaded to database");
    }
}
In my run method I then call a few more methods in that service class, which create native queries and execute them on the database (for purging etc.).
Anyway, I never know when the threads are complete. In pre-production, when running with a lot of data, we find that the app shuts down before all the data is loaded. I also don't know if this is even the best design. Do I keep passing in these ExecutorService objects? The whole point was to get optimal parallelism so that our performance was better. Perhaps there is a better way (preferably without redesigning the entire app or replacing JPA).
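One possible way to know when all batches have finished is to keep the Future of every submitted batch, call shutdown(), and block on awaitTermination() before the application exits. The sketch below is only an illustration under assumptions: BatchLoader is a made-up name, and the Consumer stands in for the repository's saveAll call. Two points it relies on are standard Java behaviour: daemon threads (as created by the thread factory above) do not keep the JVM alive, and exceptions thrown inside a task submitted with execute() never reach the submitting thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

// Hypothetical helper: submits each batch to a pool and waits for all of them.
class BatchLoader<T> {
    private final ExecutorService pool;
    private final Consumer<List<T>> saver;            // stands in for repo.saveAll(...)
    private final List<Future<?>> pending = new ArrayList<>();

    BatchLoader(int threads, Consumer<List<T>> saver) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.saver = saver;
    }

    // Copy the batch so the caller can reuse or clear its list afterwards.
    void submit(List<T> batch) {
        List<T> copy = new ArrayList<>(batch);
        pending.add(pool.submit(() -> saver.accept(copy)));
    }

    // Blocks until the submitted batches finish (or the timeout passes) and
    // returns how many batches failed or did not complete.
    int awaitAll(long timeout, TimeUnit unit) {
        pool.shutdown();                               // no new work; queued work still runs
        int failures = 0;
        try {
            if (!pool.awaitTermination(timeout, unit)) {
                pool.shutdownNow();                    // give up on whatever is still running
            }
            for (Future<?> f : pending) {
                try {
                    f.get(0, TimeUnit.SECONDS);        // rethrows the task's exception, if any
                } catch (ExecutionException | TimeoutException e) {
                    failures++;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow();
        }
        return failures;
    }
}
```

In the controller, this would mean calling submit(...) for each batch of 1000 and awaitAll(...) once after the read loop, before updating the metadata table.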
I am receiving 20k small XML files, 1 KB to 3 KB each, per minute.
I have to upload each file as it arrives in the directory.
Sometimes the rate of incoming files increases to 100k per minute.
Is there anything in Java or the AWS API that can help me match the incoming rate?
I am using the uploadFileList() API to upload all the files.
I have also tried a watch event, so that whenever a file arrives in the folder it is uploaded to S3, but that is slow compared to the incoming rate and creates a huge backlog.
I have tried multithreading as well, but if I spin up more threads I get a "reduce your request rate" error from S3.
Sometimes I also get the error below:
com.amazonaws.services.s3.model.AmazonS3Exception: Your socket
connection to the server was not read from or written to within the
timeout period. Idle connections will be closed.
But when I don't use threading, I do not get this error.
Another approach I tried is to create one big file, upload it to S3, and then split it back into small files in S3. That works, but it delays the upload and impacts the users who access these files from S3.
I know S3 is not well suited to uploading many small files, but that is my use case.
The rate I observe is about 5k files uploaded per minute.
Can someone please suggest an alternative so that my upload rate increases to at least 15k files per minute?
I am sharing my full code, where I try to upload using a multithreaded application.
Class 1, where I collect the files to hand to the threads:
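public class FileProcessThreads {
    public ArrayList<File> process(String fileLocation) {
        File dir = new File(fileLocation);
        File[] directoryListing = dir.listFiles();
        ArrayList<File> files = new ArrayList<File>();
        // listFiles() returns null if the directory does not exist or cannot be read
        if (directoryListing != null && directoryListing.length > 0) {
            for (File path : directoryListing) {
                files.add(path); // assumed; the loop body is missing from the post
            }
        }
        return files;
    }
}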
Class 2, where I create the thread pool and executor:
public class UploadExecutor {
    private static String fileLocation = "C:\\Users\\u6034690\\Desktop\\ONEFILE";
    // private static String fileLocation="D:\\TRFAudits_Moved\\";
    private static final String _logFileName = "s3FileUploader.log";
    private static Logger _logger = Logger.getLogger(UploadExecutor.class);

    public static void main(String[] args) {
        _logger.info("----------Stating application's main method----------------- ");
        AWSCredentials credential = new ProfileCredentialsProvider("TRFAuditability-Prod-ServiceUser").getCredentials();
        final ClientConfiguration config = new ClientConfiguration();
        AmazonS3Client s3Client = (AmazonS3Client) AmazonS3ClientBuilder.standard().withRegion("us-east-1")
                .withCredentials(new AWSStaticCredentialsProvider(credential)).withForceGlobalBucketAccessEnabled(true)
                .build(); // the rest of the builder chain is missing from the post
        TransferManager tm = new TransferManager(s3Client);
        while (true) {
            FileProcessThreads fp = new FileProcessThreads();
            List<File> records = fp.process(fileLocation);
            while (records.size() <= 0) {
                try {
                    _logger.info("No records found willl wait for 10 Seconds");
                    Thread.sleep(10000); // assumed from the log message and the catch below
                    records = fp.process(fileLocation);
                } catch (InterruptedException e) {
                    _logger.error("InterruptedException: " + e.toString());
                }
            }
            _logger.info("Total no of Audit files = " + records.size());
            ExecutorService es = Executors.newFixedThreadPool(2);
            int recordsInEachThread = (int) (records.size() / 2);
            _logger.info("No of records in each thread = " + recordsInEachThread);
            UploadObject my1 = new UploadObject(records.subList(0, recordsInEachThread), tm);
            UploadObject my2 = new UploadObject(records.subList(recordsInEachThread, records.size()), tm);
            // Assumed: both halves are submitted and the pool is then closed to new work;
            // these calls are missing from the post.
            es.execute(my1);
            es.execute(my2);
            es.shutdown();
            try {
                boolean finshed = es.awaitTermination(1, TimeUnit.MINUTES);
                if (!finshed) {
                    // ... (body missing from the post)
                }
            } catch (InterruptedException e) {
                _logger.error("InterruptedException: " + e.toString());
            }
        }
    }
}
Last class, where I upload the files into S3:
public class UploadObject implements Runnable {
    static String bucketName = "a205381-auditxml/S3UPLOADER";
    private String fileLocation = "C:\\Users\\u6034690\\Desktop\\ONEFILE";
    //private String fileLocation="D:\\TRFAudits\\";
    //static String bucketName = "a205381-auditxml/S3UPLOADER";
    private static Logger _logger;
    List<File> records;
    TransferManager tm;

    UploadObject(List<File> list, TransferManager tm) {
        this.records = list;
        this.tm = tm; // assumed; the assignment is missing from the post
        _logger = Logger.getLogger(UploadObject.class);
    }

    public void run() {
        System.out.println(Thread.currentThread().getName() + " : ");
        uploadToToS3(); // assumed; the call is missing from the post
    }

    public void uploadToToS3() {
        _logger.info("Number of record to be processed in current thread: : " + records.size());
        MultipleFileUpload xfer = tm.uploadFileList(bucketName, "TEST", new File(fileLocation), records);
        try {
            xfer.waitForCompletion(); // assumed from the catch clauses below
            TransferState xfer_state = xfer.getState();
            _logger.info("Upload status -----------------" + xfer_state);
            for (File file : records) {
                try {
                    Files.delete(file.toPath()); // assumed; the cleanup call is missing from the post
                } catch (IOException e) {
                    _logger.error("IOException: " + e.toString());
                }
            }
            _logger.info("Successfully completed file cleanse");
        } catch (AmazonServiceException e) {
            _logger.error("AmazonServiceException: " + e.toString());
        } catch (AmazonClientException e) {
            _logger.error("AmazonClientException: " + e.toString());
        } catch (InterruptedException e) {
            _logger.error("InterruptedException: " + e.toString());
        }
        _logger.info("Upload completed");
        _logger.info("Calling Transfer manager shutdown");
        // ... (the shutdown call is missing from the post)
    }
}
It sounds like you're tripping the built-in protections for S3 (quoted docs below). I've also listed some similar questions below; some of these advise rearchitecting using SQS to even out and distribute the load on S3.
Aside from introducing more moving pieces, you can reuse your S3Client and TransferManager. Move them up out of your runnable object and pass them into its constructor. TransferManager itself uses multithreading according to the javadoc.
When possible, TransferManager attempts to use multiple threads to upload multiple parts of a single upload at once. When dealing with large content sizes and high bandwidth, this can have a significant increase on throughput.
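You can also increase the maximum number of simultaneous connections that the S3 client uses.
s3Client.getClientConfiguration().setMaxConnections(75) or even higher. Setting the value on the ClientConfiguration that is passed to the client builder is the safer route, since changes made after the client is built may not take effect.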
Lastly, you could try to upload to different prefixes/folders under the bucket, as noted below to allow scaling for high request rates.
The current AWS Request Rate and Performance Guidelines
Amazon S3 automatically scales to high request rates. For example, your application can achieve at least 3,500 PUT/POST/DELETE and 5,500 GET requests per second per prefix in a bucket. There are no limits to the number of prefixes in a bucket. It is simple to increase your read or write performance exponentially. For example, if you create 10 prefixes in an Amazon S3 bucket to parallelize reads, you could scale your read performance to 55,000 read requests per second.
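As a rough illustration of spreading uploads across prefixes, the sketch below derives a stable two-digit prefix from each file name. The class name PrefixSharder and the NN/name key layout are assumptions for illustration, not something S3 or the SDK prescribes; the resulting string would simply be used as the object key.

```java
// Hypothetical helper: assigns each file name to one of N key prefixes.
class PrefixSharder {
    private final int shards;

    PrefixSharder(int shards) {
        if (shards <= 0) {
            throw new IllegalArgumentException("shards must be positive");
        }
        this.shards = shards;
    }

    // Builds a key of the form NN/fileName. The same name always maps to the
    // same prefix, so readers can recompute the key from the name alone.
    String keyFor(String fileName) {
        int shard = Math.floorMod(fileName.hashCode(), shards);
        return String.format("%02d/%s", shard, fileName);
    }
}
```

Because the prefix is computed from the name, consumers that know the file name can locate the object without a listing call.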
The current AWS S3 Error Best Practices
Tune Application for Repeated SlowDown errors
As with any distributed system, S3 has protection mechanisms which detect intentional or unintentional resource over-consumption and react accordingly. SlowDown errors can occur when a high request rate triggers one of these mechanisms. Reducing your request rate will decrease or eliminate errors of this type. Generally speaking, most users will not experience these errors regularly; however, if you would like more information or are experiencing high or unexpected SlowDown errors, please post to our Amazon S3 developer forum https://forums.aws.amazon.com/ or sign up for AWS Premium Support https://aws.amazon.com/premiumsupport/.
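Reducing the request rate usually means backing off and retrying when a SlowDown error comes back. The helper below is a generic sketch, not part of the AWS SDK: the name Backoff.retry and its parameters are made up, and the caller decides which exceptions count as retryable (for S3 that would presumably be an AmazonServiceException whose error code is "SlowDown"). The SDK also has its own retry settings on ClientConfiguration, which may be the better place to tune this.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Predicate;

// Hypothetical helper: retry with exponential backoff and full jitter.
class Backoff {
    // Runs op and retries it while isRetryable accepts the thrown exception,
    // up to maxAttempts attempts in total. The last exception is rethrown.
    static <T> T retry(Callable<T> op, Predicate<Exception> isRetryable,
                       int maxAttempts, long baseDelayMillis) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !isRetryable.test(e)) {
                    throw e;
                }
                // The delay ceiling doubles per attempt (capped at 2^10 times the base).
                long ceiling = baseDelayMillis << Math.min(attempt - 1, 10);
                // Sleep a random time in [0, ceiling] so threads do not retry in lockstep.
                Thread.sleep(ThreadLocalRandom.current().nextLong(ceiling + 1));
            }
        }
    }
}
```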
Similar questions:
S3 SlowDown: Please reduce your request rate exception
Amazon Web Services S3 Request Limit
AWS Forums - Maximizing Connection Reuse for S3 getObjectMetadata() Calls
S3 Transfer Acceleration does not necessarily give faster upload speeds. It is sometimes slower than a normal upload when used from within the same region. Amazon S3 Transfer Acceleration uses the AWS edge infrastructure around the world to get data onto the AWS backbone more quickly. When you use it, your request is routed to the best AWS edge location based on latency. Transfer Acceleration then sends your uploads to S3 over the AWS-managed backbone network using optimized network protocols, persistent connections from edge to origin, fully open send and receive windows, and so forth. If you are already within the region, you won't see any benefit from this. Still, it's better to test the speed yourself at https://s3-accelerate-speedtest.s3-accelerate.amazonaws.com/en/accelerate-speed-comparsion.html
This question already has answers here:
Best Practice to Use HttpClient in Multithreaded Environment
The main goal is to send xml files from one folder through REST service into Cassandra DB. What I'm trying to do is just read all the files inside certain folder, and create a Worker object with file path set into it.
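while (RUNS > 0) {
    ExecutorService executor = Executors.newFixedThreadPool(N_THREADS);
    File dir = new File(PATH_TO_SAMPLES);
    File[] listFiles = dir.listFiles();
    if (listFiles != null) {
        for (File file : listFiles) {
            Worker worker = new Worker();
            // ... (the call that sets the file path on the worker is missing from the post)
            executor.execute(worker); // assumed from the description below
        }
    }
    executor.shutdown(); // assumed; isTerminated() is never true without it
    // Wait until all threads are finish
    while (!executor.isTerminated()) {
    }
    RUNS--; // assumed; the decrement is missing from the post
}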
After that executor gets a worker instance and goes to the next file in the directory.
RUNS initialized with value of total number of iteration, default is 100_000.
N_THREADS - total number of threads, set to 100 by default.
Worker class implements Runnable. Run method:
public void run() {
    String url = getUrl();
    String payload = "xml_file_representation";
    MultiThreadedHttpConnectionManager connectionManager = new MultiThreadedHttpConnectionManager();
    HttpClient client = new HttpClient();
    HttpConnectionManagerParams httpConnectionManagerParams = new HttpConnectionManagerParams();
    PostMethod postMethod = new PostMethod(url);
    try {
        postMethod.setRequestHeader("User-Agent", USER_AGENT);
        postMethod.setRequestHeader("Content-Type", "application/xml");
        postMethod.setRequestEntity(new StringRequestEntity(payload, "application/xml", StandardCharsets.UTF_8.toString()));
        int statusCode = client.executeMethod(postMethod);
        InputStream body = postMethod.getResponseBodyAsStream();
        if (statusCode == HttpStatus.SC_OK) {
            // ... (body missing from the post)
        }
    } catch (Exception e) {
        LOG.error("POST: ERROR!");
    } finally {
        // ... (cleanup missing from the post)
    }
}
If I remove waiting, i.e.
at the end of run, when ~16_000 requests were sent I'll get an exception:
java.net.BindException: Address already in use
It's very similar to BindException: address already in use on a client socket?
Anyway, the accepted answer there didn't help me. I have no more ideas about what I need to do to close those "connections" in order to prevent that error.
A workaround such as Thread.sleep() doesn't look like a good solution either. Thanks for any help or advice.
Creating a conn mgr for every worker defeats the purpose of the conn mgr. It is supposed to be shared among threads, judging by the linked other post from Berger.
It's like having none.
You have the OS keeping sockets lingering after close. Not much you can do other than reusing the sockets (connections) with the design proposed in that other question.
Unless you want to risk playing with Socket.setSoLinger(); see "TCP option SO_LINGER (zero) - when it's required".
The Thread.sleep(1000) is certainly not a viable option, because you could require more than 1 second another day... or the OS could change that amount of lingering sockets under other circumstances.
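The sharing idea can be sketched without any HTTP library. In the sketch below, SharedClientRunner and sendAll are made-up names, and the type parameter C stands for whatever client object is shared, for example one HttpClient built on a single MultiThreadedHttpConnectionManager. The point is only the shape: the client is created once outside the workers, and the pool is awaited with awaitTermination() instead of spinning or sleeping.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;

// Hypothetical sketch: all workers reuse one client instead of creating their own.
class SharedClientRunner {
    // Sends every payload through the single sharedClient instance and returns
    // the number of sends that completed without throwing.
    static <C> int sendAll(C sharedClient, List<String> payloads, int threads,
                           BiConsumer<C, String> send) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger succeeded = new AtomicInteger();
        for (String payload : payloads) {
            pool.execute(() -> {
                try {
                    send.accept(sharedClient, payload);
                    succeeded.incrementAndGet();
                } catch (RuntimeException e) {
                    // a real worker would log the failure here
                }
            });
        }
        pool.shutdown();
        try {
            // Blocks until the queue drains; no busy loop on isTerminated().
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow();
        }
        return succeeded.get();
    }
}
```

With a shared, pooled client the same connections are reused across requests, so far fewer sockets end up lingering after close.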
I'm building a java application that connects to a MQQueueManager and extracts information about queues. I'm able to get data like QueueType, MaximumMessageLength and more. However, I also want the name of the cluster the queue might be in. There is no function that comes with the MQQueue that gives me this information. After searching the internet I found several things pointing in this direction, but no examples.
A part of my function that gives me the MaximumDepth is:
queueManager = makeConnection(host, portNo, qMgr, channelName);
queue = queueManager.accessQueue(queueName, CMQC.MQOO_INQUIRE);
maxQueueDepth = queue.getMaximumDepth();
(makeConnection is not shown here, it is the function that makes the actual connection to the QueueManager; I also left out the try/catch/finally for less clutter)
How do I get ClusterName and perhaps other data, that doesn't have a function like queue.getMaximumDepth()?
There are two ways to get information about a queue.
The API Inquire call gets operational status of a queue. This includes things like the name the MQOpen call resolved to or the depth if the queue is local. Much of the q.inquire functionality has been superseded with getter and setter functions on the queue. If you are not using the v8.0 client with the latest functionality, you are highly advised to upgrade. It can access all versions of QMgr.
The following code is from Getting and setting attribute values in WebSphere MQ classes for Java
// inquire on a queue
final static int MQIA_DEF_PRIORITY = 6;
final static int MQCA_Q_DESC = 2013;
final static int MQ_Q_DESC_LENGTH = 64;
int[] selectors = new int[2];
int[] intAttrs = new int[1];
byte[] charAttrs = new byte[MQ_Q_DESC_LENGTH];
selectors[0] = MQIA_DEF_PRIORITY;
selectors[1] = MQCA_Q_DESC;
queue.inquire(selectors, intAttrs, charAttrs);
System.out.println("Default Priority = " + intAttrs[0]);
System.out.println("Description : " + new String(charAttrs,0));
For things that are not part of the API Inquire call, a PCF command is needed. Programmable Command Format, commonly abbreviated as PCF, is a message format used to pass messages to the command queue and for reading messages from the command queue, event queues and others.
To use a PCF command, the calling application must be authorized with +put on SYSTEM.ADMIN.COMMAND.QUEUE and with +dsp on the object being inquired upon.
IBM provides sample code.
On Windows, please see: %MQ_FILE_PATH%\Tools\pcf\samples
In UNIX flavors, please see: /opt/mqm/samp/pcf/samples
The locations may vary depending on where MQ was installed.
Please see: Handling PCF messages with IBM MQ classes for Java. The following snippet is from the PCF_DisplayActiveLocalQueues.java sample program.
public static void DisplayActiveLocalQueues(PCF_CommonMethods pcfCM) throws PCFException,
        MQDataException, IOException {
    // Create the PCF message type for the inquire.
    PCFMessage pcfCmd = new PCFMessage(MQConstants.MQCMD_INQUIRE_Q);
    // Add the inquire rules.
    // Queue name = wildcard.
    pcfCmd.addParameter(MQConstants.MQCA_Q_NAME, "*");
    // Queue type = LOCAL.
    pcfCmd.addParameter(MQConstants.MQIA_Q_TYPE, MQConstants.MQQT_LOCAL);
    // Queue depth filter = "WHERE depth > 0".
    pcfCmd.addFilterParameter(MQConstants.MQIA_CURRENT_Q_DEPTH, MQConstants.MQCFOP_GREATER, 0);
    // Execute the command. The returned object is an array of PCF messages.
    PCFMessage[] pcfResponse = pcfCM.agent.send(pcfCmd);
    // For each returned message, extract the message from the array and display the
    // required information.
    System.out.println("|Index| Queue Name |Depth|");
    for (int index = 0; index < pcfResponse.length; index++) {
        PCFMessage response = pcfResponse[index];
        System.out.println("|"
            + (index + pcfCM.padding).substring(0, 5)
            + "|"
            + (response.getParameterValue(MQConstants.MQCA_Q_NAME) + pcfCM.padding).substring(0, 48)
            + "|"
            + (response.getParameterValue(MQConstants.MQIA_CURRENT_Q_DEPTH) + pcfCM.padding)
                .substring(0, 5) + "|");
    }
}
After more research I finally found what I was looking for.
This example of IBM: Getting and setting attribute values in WebSphere MQ classes helped me to set up the inquiry.
The necessary values I found in this list: Constant Field Values.
I also needed to expand the openOptionsArg of accessQueue(), else cluster queues cannot be inquired.
Final result:
(without makeConnection())
public class QueueManagerServices {
    MQQueueManager queueManager = null;
    String cluster = null;
    MQQueue queue = null;

    // Note: queueName is not declared anywhere in the post; it is presumably a field or parameter.
    public String getcluster(String host, int portNo, String qMgr, String channelName) {
        try {
            queueManager = makeConnection(host, portNo, qMgr, channelName);
            queue = queueManager.accessQueue(queueName, MQOO_INQUIRE_TOTAL);
            int[] selectors = new int[1];
            int[] intAttrs = new int[1];
            byte[] charAttrs = new byte[MQ_CLUSTER_NAME_LENGTH];
            selectors[0] = MQCA_CLUSTER_NAME;
            queue.inquire(selectors, intAttrs, charAttrs);
            cluster = new String(charAttrs);
        } catch (MQException e) {
            // ... (handler body missing from the post)
        } finally {
            if (queue != null) {
                // ... (close call missing from the post)
            }
            if (queueManager != null) {
                // ... (disconnect call missing from the post)
            }
        }
        return cluster;
    }
}
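One detail worth noting: character attributes come back in a fixed-length buffer, so new String(charAttrs) includes the padding after the cluster name. A small helper along these lines can strip it. The name MqCharAttrs.decode is made up, and ISO-8859-1 is only an assumption here; the correct charset depends on the queue manager's CCSID.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for fixed-length, padded character attributes.
class MqCharAttrs {
    // Decodes length bytes starting at offset and drops trailing blanks and NULs.
    // The charset is an assumption; use the one matching your queue manager.
    static String decode(byte[] charAttrs, int offset, int length) {
        String raw = new String(charAttrs, offset, length, StandardCharsets.ISO_8859_1);
        int end = raw.length();
        while (end > 0 && (raw.charAt(end - 1) == ' ' || raw.charAt(end - 1) == '\0')) {
            end--;
        }
        return raw.substring(0, end);
    }
}
```

A queue that is not in any cluster would then yield an empty string rather than a run of blanks. When several character selectors are requested at once, each attribute occupies its own slice of the buffer, which is what the offset parameter is for.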
I've been testing out DynamoDB as a potential option for a scalable and steady throughput database for a site that will be hit pretty frequently and requires a very fast response time (< 50ms). I'm seeing pretty slow responses (both locally and on an EC2 instance) for the following code:
public static void main(String[] args) {
    try {
        AWSCredentials credentials = new PropertiesCredentials(new File("aws_credentials.properties"));
        long start = System.currentTimeMillis();
        AmazonDynamoDBClient client = new AmazonDynamoDBClient(credentials);
        System.out.println((System.currentTimeMillis() - start) + " (ms) to connect");
        DynamoDBMapper mapper = new DynamoDBMapper(client);
        start = System.currentTimeMillis();
        Model model = mapper.load(Model.class, "hashkey1", "rangekey1");
        System.out.println((System.currentTimeMillis() - start) + " (ms) to load Model");
    } catch (Exception e) {
        // ... (handler body missing from the post)
    }
}
The connection to the DB alone takes about 800 (ms) on average and the loading using the mapper takes an additional 200 (ms). According to Amazon's page about DynamoDB we should expect "Average service-side latencies...typically single-digit milliseconds." I wouldn't expect the full round-trip HTTP request to add that much overhead. Are these expected numbers even on an EC2 instance?
I think a better test would be to avoid the initial costs/latency incurred in starting up the JVM and loading the classes. Something like:
public class TestDynamoDBMain {
    public static void main(String[] args) {
        try {
            AWSCredentials credentials = new PropertiesCredentials(new File("aws_credentials.properties"));
            AmazonDynamoDBClient client = new AmazonDynamoDBClient(credentials);
            DynamoDBMapper mapper = new DynamoDBMapper(client);
            // Warm up
            for (int i=0; i < 10; i++) {
                testrun(mapper, false);
            }
            // Time it
            for (int i=0; i < 10; i++) {
                testrun(mapper, true);
            }
        } catch (Exception e) {
            // ... (handler body missing from the post)
        }
    }

    private static void testrun(DynamoDBMapper mapper, boolean timed) {
        long start = System.nanoTime();
        Model model = mapper.load(Model.class, "hashkey1", "rangekey1");
        if (timed)
            System.out.println(
                TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start)
                + " (ms) to load Model");
    }
}
Furthermore, you may consider enabling the default metrics of the AWS SDK for Java to see the fine grain time allocation in Amazon CloudWatch. For more details, see:
Hope this helps.
DynamoDB is located in a specific region (they don't yet support cross-region replication). You choose the region when you create a table. Unless you are calling the APIs from the same region, it is bound to be slow.
It looks like you are trying to call DynamoDB from your development desktop. You can redo the same test from an EC2 instance started in the same region, which will speed up the responses considerably. This is a more realistic test, since your production system will be deployed in the same region as DynamoDB anyway.
Again, if you really need a very quick response, consider using ElastiCache between your code and DynamoDB. On every read, store the result in the cache before returning it. The next read should then come from the cache (say, with an expiry time of 10 minutes). For read-heavy apps this is the suggested route. I have seen many-fold better response times using this approach.
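The read-through pattern itself is small. The sketch below is an in-process stand-in, not ElastiCache: ExpiringCache is a made-up class, the loader function stands in for the DynamoDB read (for example mapper.load), and the clock is injected only so the expiry can be exercised without waiting.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical in-process read-through cache with a fixed time-to-live.
class ExpiringCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAt;

        Entry(V value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final Function<K, V> loader;   // stands in for the database read
    private final LongSupplier clock;      // e.g. System::currentTimeMillis

    ExpiringCache(long ttlMillis, Function<K, V> loader, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.loader = loader;
        this.clock = clock;
    }

    // Returns the cached value while it is fresh; otherwise loads it again,
    // stores it with a new expiry time, and returns the fresh value.
    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.compute(key, (k, old) ->
                (old != null && old.expiresAt > now)
                        ? old
                        : new Entry<V>(loader.apply(k), now + ttlMillis));
        return entry.value;
    }
}
```

For the 10 minute expiry mentioned above, the cache would be created with a ttlMillis of 600000 and System::currentTimeMillis as the clock.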