This question already has answers here:
Best Practice to Use HttpClient in Multithreaded Environment
The main goal is to send XML files from a folder through a REST service into a Cassandra DB. I read all the files in a certain folder and create a Worker object for each file, with the file path set on it.
while (RUNS > 0) {
    ExecutorService executor = Executors.newFixedThreadPool(N_THREADS);
    File dir = new File(PATH_TO_SAMPLES);
    File[] listFiles = dir.listFiles();
    if (listFiles != null) {
        for (File file : listFiles) {
            Worker worker = new Worker();
            worker.setPath(file.toPath());
            executor.submit(worker);
        }
    }
    executor.shutdown();
    // Wait until all threads have finished
    while (!executor.isTerminated()) {
    }
    Thread.sleep(1000);
    RUNS--;
}
After submitting a worker, the loop moves on to the next file in the directory.
RUNS is initialized to the total number of iterations, 100_000 by default.
N_THREADS is the total number of threads, 100 by default.
The Worker class implements Runnable. Its run method:
@Override
public void run() {
    String url = getUrl();
    String payload = "xml_file_representation";
    MultiThreadedHttpConnectionManager connectionManager = new MultiThreadedHttpConnectionManager();
    HttpClient client = new HttpClient();
    HttpConnectionManagerParams httpConnectionManagerParams = new HttpConnectionManagerParams();
    connectionManager.setParams(httpConnectionManagerParams);
    client.setHttpConnectionManager(connectionManager);
    PostMethod postMethod = new PostMethod(url);
    try {
        postMethod.setRequestHeader("User-Agent", USER_AGENT);
        postMethod.setRequestHeader("Content-Type", "application/xml");
        postMethod.setRequestEntity(new StringRequestEntity(payload, "application/xml", StandardCharsets.UTF_8.toString()));
        int statusCode = client.executeMethod(postMethod);
        InputStream body = postMethod.getResponseBodyAsStream();
        if (statusCode == HttpStatus.SC_OK) {
            // OK
        }
    } catch (Exception e) {
        LOG.error("POST: ERROR!");
    } finally {
        postMethod.releaseConnection();
        connectionManager.shutdown();
    }
}
If I remove the wait, i.e. the
Thread.sleep(1000);
at the end of the loop, then after roughly 16_000 requests have been sent I get an exception:
java.net.BindException: Address already in use
It's very similar to BindException: address already in use on a client socket?, but the accepted answer there didn't help me. I have run out of ideas on how to close those "connections" in order to prevent the error, and a workaround like Thread.sleep() doesn't look like a good solution either. Thanks for any help or advice.
Creating a connection manager for every worker defeats the purpose of the connection manager: judging by the question linked by Berger, it is supposed to be shared among threads. Creating one per worker is like having none at all.
What you are seeing is the OS keeping sockets in a lingering state after they are closed. There is not much you can do about that other than reusing the sockets (connections) with the design proposed in that other question, unless you want to risk playing with the TCP option SO_LINGER via Socket.setSoLinger() (see SO_LINGER (zero) - when it's required).
Thread.sleep(1000) is certainly not a viable option either: you could need more than one second another day, or the OS could keep a different number of lingering sockets under other circumstances.
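For illustration only, here is a minimal sketch of what a shared client could look like with Commons HttpClient 3.x; the holder class name and the pool sizes are placeholders, not code from the linked answer.
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.params.HttpConnectionManagerParams;

public class SharedClientHolder {
    // One connection manager and one HttpClient for the whole run,
    // shared by every Worker instead of being created per run() call.
    private static final MultiThreadedHttpConnectionManager CONNECTION_MANAGER =
            new MultiThreadedHttpConnectionManager();
    private static final HttpClient CLIENT;

    static {
        HttpConnectionManagerParams params = new HttpConnectionManagerParams();
        params.setMaxTotalConnections(100);          // e.g. match N_THREADS
        params.setDefaultMaxConnectionsPerHost(100); // all requests go to the same host
        CONNECTION_MANAGER.setParams(params);
        CLIENT = new HttpClient(CONNECTION_MANAGER);
    }

    public static HttpClient get() {
        return CLIENT;
    }
}
Each Worker would then call SharedClientHolder.get().executeMethod(postMethod), release only the method in its finally block, and never shut the connection manager down per request; shut it down once, after the last run has finished.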
This is about a process I've been working on for a little while now. It was running fine until performance took a hit. I found a way to make it perform very fast, but I'm really unsure what is happening behind the scenes, and it is now throwing warnings and errors that I don't know how to deal with. The file is getting processed, but I'm not sure all threads complete, and I don't believe I am shutting the app down correctly. Here is everything you need to know.
The file is read using a BufferedReader, and we run data quality checks on each record. Every record that is read and passes the checks is turned into a Java object and added to a List. Once the List is 1000 objects big, we call an OracleService class, which has a repo autowired, and execute a saveAll with that List. We keep reading the file and doing this until the file has been fully read. I pass an ExecutorService object into the service, so every call to the service gets a new List containing my objects (this object is basically the table we are loading) along with that ExecutorService. The process runs fine, but a ton of exceptions are thrown once I try to shut down. Here is all my code.
My controller class's run method, which gets called from another class that implements CommandLineRunner:
public void run() throws ParseException, IOException, InterruptedException {
    logger.info("******************** Aegis Check Inclearing DDA Trial Balance Table Load starting ********************");
    try (BufferedReader reader = new BufferedReader(new FileReader(inputFile))) {
        String line = reader.readLine();
        int count = 0;
        TrialBalanceBuilder builder = new TrialBalanceBuilder();
        while (line != null) {
            if (line.startsWith("D")) {
                if (dataQuality(line)) {
                    TrialBalance trialBalance = builder.buildTrialBalanceObject(line, procDt, time);
                    insertList.add(trialBalance);
                    count++;
                    if (count == 1000) {
                        oracleService.loadToTableTrialBalance(insertList, executorService);
                        count = 0;
                        insertList.clear();
                    }
                } else {
                    logger.info("Data quality check FAILED for record: " + line);
                    oracleService.revertInserts("DDA_TRIAL_BAL_STG", procDt.toString());
                    System.exit(111);
                }
            }
            line = reader.readLine();
        }
        logger.info("Leftover record count is " + insertList.size());
        oracleService.loadToTableTrialBalance(insertList, executorService);
    } catch (IOException e) {
        e.printStackTrace();
    }
    logger.info("Updating Metadata table with new batch proc date");
    InclearingBatchMetadataBuilder inclearingBatchMetadataBuilder = new InclearingBatchMetadataBuilder();
    InclearingBatchMetadata inclearingBatchMetadata = inclearingBatchMetadataBuilder.buildInclearingBatchMetadataObject("DDA_TRIAL_BAL_STG", procDt, time, Constants.bankID);
    oracleService.insertBatchProcDtIntoMetaTable(inclearingBatchMetadata);
    logger.info("Successfully updated Metadata table with new batch proc date: " + procDt);
    Thread.sleep(10000);
    oracleService.cleanUpGOS("DDA_TRIAL_BAL_STG", 1);
    executorService.shutdownNow();
    logger.info("******************** Aegis Check Inclearing DDA Trial Balance Table Load ended successfully ********************");
}
I'm passing in an ExecutorService object to the service class. This is defined as...
private final ThreadFactory threadFactory = new ThreadFactoryBuilder().setNameFormat("Orders-%d").setDaemon(true).build();
private ExecutorService executorService = Executors.newFixedThreadPool(10, threadFactory);
My service class looks like this:
@Service("oracleService")
public class OracleService {
    private static final Logger logger = LoggerFactory.getLogger(OracleService.class);

    @Autowired
    TrialBalanceRepo trialBalanceRepo;

    @Transactional
    public void loadToTableTrialBalance(List<TrialBalance> trialBalanceList, ExecutorService executorService) {
        logger.debug("Let's load to the database");
        logger.debug(trialBalanceList.toString());
        List<TrialBalance> multiThreadList = new ArrayList<>(trialBalanceList);
        try {
            executorService.execute(() -> trialBalanceRepo.saveAll(multiThreadList));
        } catch (ConcurrentModificationException | DataIntegrityViolationException ignored) {}
        logger.debug("Successfully loaded to database");
    }

    // ... plus a few more methods that build and execute native queries (purging etc.)
}
In my run method I then call a few more methods on that service class, which create native queries and execute them against the database (for purging etc.).
Anyway, I never know when the threads are complete. In pre-production, when running with a lot of data, we shut down the app and not all the data has been loaded. I also don't know whether this is even a good design. Do I keep passing in these ExecutorService objects? The whole point was to get optimal parallelism so that performance improves. Perhaps there is a better way (preferably without redesigning the entire app and using something other than JPA).
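For what it's worth, one common pattern for knowing when submitted tasks have finished is to stop the executor and wait for it. This is only a sketch using the executorService and logger fields from the code above; the 30-minute timeout is a placeholder, not a recommendation.
// After the last batch has been submitted, stop accepting new tasks
// and block until everything already submitted has completed (or the timeout expires).
executorService.shutdown();
try {
    if (!executorService.awaitTermination(30, TimeUnit.MINUTES)) { // placeholder timeout
        logger.warn("Timed out waiting for pending inserts; forcing shutdown");
        executorService.shutdownNow();
    }
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    executorService.shutdownNow();
}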
I receive 20k small XML files (1 KB to 3 KB each) per minute, and I have to upload every file as it arrives in the directory. Sometimes the incoming rate increases to 100k files per minute.
Is there anything in Java or the AWS API that can help me keep up with the incoming rate?
I am using the uploadFileList() API to upload the files. I have also tried a watch event, so that whenever a file arrives in the folder it gets uploaded to S3, but that is far too slow compared to the incoming rate and creates a huge backlog.
I have tried multithreading as well, but if I spin up more threads I get the "reduce your request rate" error from S3, and sometimes I also get the error below.
AmazonServiceException:
com.amazonaws.services.s3.model.AmazonS3Exception: Your socket
connection to the server was not read from or written to within the
timeout period. Idle connections will be closed.
When I don't use threading, I do not get this error.
Another approach I tried is to build one big file, upload it to S3, and then split it back into small files inside S3. That works, but it delays the upload and impacts the users who access the files from S3.
I know uploading small files to S3 is not ideal, but my use case requires it. The speed I see is about 5k files uploaded per minute.
Can someone please suggest an alternative so that my upload speed reaches at least 15k files per minute?
I am sharing the full code of my multithreaded upload application.
Class 1, where I build the list of files to hand to the threads:
public class FileProcessThreads {

    public ArrayList process(String fileLocation) {
        File dir = new File(fileLocation);
        File[] directoryListing = dir.listFiles();
        ArrayList<File> files = new ArrayList<File>();
        if (directoryListing.length > 0) {
            for (File path : directoryListing) {
                files.add(path);
            }
        }
        return files;
    }
}
Class 2, where I create the thread pool and executor:
public class UploadExecutor {
    private static String fileLocation = "C:\\Users\\u6034690\\Desktop\\ONEFILE";
    // private static String fileLocation="D:\\TRFAudits_Moved\\";
    private static final String _logFileName = "s3FileUploader.log";
    private static Logger _logger = Logger.getLogger(UploadExecutor.class);

    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        _logger.info("----------Stating application's main method----------------- ");
        AWSCredentials credential = new ProfileCredentialsProvider("TRFAuditability-Prod-ServiceUser").getCredentials();
        final ClientConfiguration config = new ClientConfiguration();
        AmazonS3Client s3Client = (AmazonS3Client) AmazonS3ClientBuilder.standard().withRegion("us-east-1")
                .withCredentials(new AWSStaticCredentialsProvider(credential)).withForceGlobalBucketAccessEnabled(true)
                .build();
        s3Client.getClientConfiguration().setMaxConnections(100);
        TransferManager tm = new TransferManager(s3Client);
        while (true) {
            FileProcessThreads fp = new FileProcessThreads();
            List<File> records = fp.process(fileLocation);
            while (records.size() <= 0) {
                try {
                    _logger.info("No records found willl wait for 10 Seconds");
                    TimeUnit.SECONDS.sleep(10);
                    records = fp.process(fileLocation);
                } catch (InterruptedException e) {
                    _logger.error("InterruptedException: " + e.toString());
                }
            }
            _logger.info("Total no of Audit files = " + records.size());
            ExecutorService es = Executors.newFixedThreadPool(2);
            int recordsInEachThread = (int) (records.size() / 2);
            _logger.info("No of records in each thread = " + recordsInEachThread);
            UploadObject my1 = new UploadObject(records.subList(0, recordsInEachThread), tm);
            UploadObject my2 = new UploadObject(records.subList(recordsInEachThread, records.size()), tm);
            es.execute(my1);
            es.execute(my2);
            es.shutdown();
            try {
                boolean finshed = es.awaitTermination(1, TimeUnit.MINUTES);
                if (!finshed) {
                    Thread.sleep(1000);
                }
            } catch (InterruptedException e) {
                _logger.error("InterruptedException: " + e.toString());
            }
        }
    }
}
Last class, where I upload the files to S3:
public class UploadObject implements Runnable {
    static String bucketName = "a205381-auditxml/S3UPLOADER";
    private String fileLocation = "C:\\Users\\u6034690\\Desktop\\ONEFILE";
    // private String fileLocation="D:\\TRFAudits\\";
    // static String bucketName = "a205381-auditxml/S3UPLOADER";
    private static Logger _logger;
    List<File> records;
    TransferManager tm;

    UploadObject(List<File> list, TransferManager tm) {
        this.records = list;
        this.tm = tm;
        _logger = Logger.getLogger(UploadObject.class);
    }

    public void run() {
        System.out.println(Thread.currentThread().getName() + " : ");
        uploadToToS3();
    }

    public void uploadToToS3() {
        _logger.info("Number of record to be processed in current thread: : " + records.size());
        MultipleFileUpload xfer = tm.uploadFileList(bucketName, "TEST", new File(fileLocation), records);
        try {
            xfer.waitForCompletion();
            TransferState xfer_state = xfer.getState();
            _logger.info("Upload status -----------------" + xfer_state);
            for (File file : records) {
                try {
                    Files.delete(FileSystems.getDefault().getPath(file.getAbsolutePath()));
                } catch (IOException e) {
                    System.exit(1);
                    _logger.error("IOException: " + e.toString());
                }
            }
            _logger.info("Successfully completed file cleanse");
        } catch (AmazonServiceException e) {
            _logger.error("AmazonServiceException: " + e.toString());
            System.exit(1);
        } catch (AmazonClientException e) {
            _logger.error("AmazonClientException: " + e.toString());
            System.exit(1);
        } catch (InterruptedException e) {
            _logger.error("InterruptedException: " + e.toString());
            System.exit(1);
        }
        System.out.println("Completed");
        _logger.info("Upload completed");
        _logger.info("Calling Transfer manager shutdown");
        // tm.shutdownNow();
    }
}
It sounds like you're tripping the built-in protections for S3 (quoted docs below). I've also listed some similar questions below; some of these advise rearchitecting using SQS to even out and distribute the load on S3.
Short of introducing more moving pieces, you can reuse your S3Client and TransferManager: move them up out of your runnable object and pass them into its constructor. TransferManager itself uses multithreading, according to its javadoc:
When possible, TransferManager attempts to use multiple threads to upload multiple parts of a single upload at once. When dealing with large content sizes and high bandwidth, this can have a significant increase on throughput.
You can also increase the max number of simultaneous connections that the S3Client uses.
Maybe:
s3Client.getClientConfiguration().setMaxConnections(75) or even higher.
DEFAULT_MAX_CONNECTIONS is set to 50.
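Putting those two points together, a rough sketch of the setup in main (the connection limit of 100 and the region are illustrative; the credentials profile is the one from your code):
// Build one S3 client and one TransferManager up front and share them across all threads.
ClientConfiguration config = new ClientConfiguration().withMaxConnections(100); // tune for your workload
AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
        .withRegion("us-east-1")
        .withCredentials(new ProfileCredentialsProvider("TRFAuditability-Prod-ServiceUser"))
        .withClientConfiguration(config)
        .withForceGlobalBucketAccessEnabled(true)
        .build();
TransferManager tm = new TransferManager(s3Client);
// Hand the same tm to every UploadObject, and call tm.shutdownNow() only once,
// when the whole application is exiting.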
Lastly, you could try to upload to different prefixes/folders under the bucket, as noted below to allow scaling for high request rates.
The current AWS Request Rate and Performance Guidelines
Amazon S3 automatically scales to high request rates. For example, your application can achieve at least 3,500 PUT/POST/DELETE and 5,500 GET requests per second per prefix in a bucket. There are no limits to the number of prefixes in a bucket. It is simple to increase your read or write performance exponentially. For example, if you create 10 prefixes in an Amazon S3 bucket to parallelize reads, you could scale your read performance to 55,000 read requests per second.
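Purely as an illustration of the prefix idea (the key layout and shard count below are made up, and the snippet reuses the tm, bucketName, and file variables from your code), you could spread the uploads across several key prefixes instead of putting everything under one prefix:
// Derive a prefix per file so uploads are spread across multiple S3 prefixes.
int shard = Math.floorMod(file.getName().hashCode(), 10); // illustrative shard count
String key = "audit-" + shard + "/" + file.getName();     // illustrative key layout
Upload upload = tm.upload(bucketName, key, file);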
The current AWS S3 Error Best Practices
Tune Application for Repeated SlowDown errors
As with any distributed system, S3 has protection mechanisms which detect intentional or unintentional resource over-consumption and react accordingly. SlowDown errors can occur when a high request rate triggers one of these mechanisms. Reducing your request rate will decrease or eliminate errors of this type. Generally speaking, most users will not experience these errors regularly; however, if you would like more information or are experiencing high or unexpected SlowDown errors, please post to our Amazon S3 developer forum https://forums.aws.amazon.com/ or sign up for AWS Premium Support https://aws.amazon.com/premiumsupport/.
Similar questions:
S3 SlowDown: Please reduce your request rate exception
Amazon Web Services S3 Request Limit
AWS Forums - Maximizing Connection Reuse for S3 getObjectMetadata() Calls
S3 Transfer Acceleration does not necessarily give faster upload speeds; it is sometimes slower than a normal upload when used from the same region. Amazon S3 Transfer Acceleration uses the AWS edge infrastructure around the world to get data onto the AWS backbone more quickly. When you use Transfer Acceleration, your request is routed to the best AWS edge location based on latency, and your uploads are then sent back to S3 over the AWS-managed backbone network using optimized network protocols, persistent edge-to-origin connections, fully open send and receive windows, and so forth. As you are already within the region, you probably wouldn't see any benefit from using it, but it's better to test the speed first with https://s3-accelerate-speedtest.s3-accelerate.amazonaws.com/en/accelerate-speed-comparsion.html
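If you do decide to experiment with Transfer Acceleration after running that speed test, enabling it on the client side looks roughly like the sketch below; note that acceleration also has to be enabled on the bucket itself first, and the region here is illustrative.
// Client configured to use the accelerate endpoint; the target bucket must
// already have Transfer Acceleration enabled.
AmazonS3 acceleratedClient = AmazonS3ClientBuilder.standard()
        .withRegion("us-east-1")
        .withAccelerateModeEnabled(true)
        .build();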
Priority & Dependency:
Here I made a simple test, but the result does not look right.
I make 100 requests in a for loop on the same connection (the request URL is the same every time; I wonder whether that influences the results).
If the loop index is i, the request's stream_id is i and the stream it depends on has stream_id 100+i. If our assumption is right, the requests should never get a response, because no stream with an id from 101 to 200 ever exists.
But the results show no difference between setting the dependency and not setting it: I get the response data frames one by one, without any timeout or waiting.
I also ran some related tests whose idea was to send the stream that depends on another stream first and the stream it depends on later, but the result is the same.
I am still trying to work out the reason for these results. Can anyone help me? Many thanks.
Code here:
public void run() throws Exception
{
    host = "google.com";
    port = 443;
    // client init
    HTTP2Client client = new HTTP2Client();
    SslContextFactory sslContextFactory = new SslContextFactory(true);
    client.addBean(sslContextFactory);
    client.start();
    // connect init
    FuturePromise<Session> sessionPromise = new FuturePromise<>();
    client.connect(sslContextFactory, new InetSocketAddress(host, port), new ServerSessionListener.Adapter(), sessionPromise);
    Session session = sessionPromise.get(10, TimeUnit.SECONDS);
    // headers init
    HttpFields requestFields = new HttpFields();
    requestFields.put("User-Agent", client.getClass().getName() + "/" + Jetty.VERSION);
    final Phaser phaser = new Phaser(2);
    // multiple requests on one connection
    for (int i = 0; i < 100; i++)
    {
        MetaData.Request metaData = new MetaData.Request("GET", new HttpURI("https://" + host + ":" + port + "/"), HttpVersion.HTTP_2, requestFields);
        PriorityFrame testPriorityFrame = new PriorityFrame(i, 100 + i, 4, true);
        HeadersFrame headersFrame = new HeadersFrame(0, metaData, testPriorityFrame, true);
        // listen for header/data/push frames
        session.newStream(headersFrame, new Promise.Adapter<Stream>(), new Stream.Listener.Adapter()
        {
            @Override
            public void onHeaders(Stream stream, HeadersFrame frame)
            {
                System.err.println(frame + "headId:" + frame.getStreamId());
                if (frame.isEndStream())
                    phaser.arrive();
            }

            @Override
            public void onData(Stream stream, DataFrame frame, Callback callback)
            {
                System.err.println(frame + "streamid:" + frame.getStreamId());
                callback.succeeded();
                if (frame.isEndStream())
                    phaser.arrive();
            }

            @Override
            public Stream.Listener onPush(Stream stream, PushPromiseFrame frame)
            {
                System.err.println(frame + "pushid:" + frame.getStreamId());
                phaser.register();
                return this;
            }
        });
    }
    phaser.awaitAdvanceInterruptibly(phaser.arrive(), 5, TimeUnit.SECONDS);
    client.stop();
}
The Jetty project has not implemented HTTP/2 request prioritization (yet).
We are discussing whether it is of any use for a server, whose concern is to write back responses as quickly as it can.
Having one client change its mind about the priority of its requests, or make a request while knowing it really wanted another request served first, is a lot of work for a server that in the meantime has to serve the other 10,000 clients connected to it.
By the time the server has recomputed the priority tree for the dependent requests, it could probably have served the requests already.
By the time the client realizes it has to change the priority of a request, the whole response for it could already be in flight.
Having said that, we are certainly interested in real-world use cases where request prioritization performed by the server yields a real performance improvement. We just have not seen one yet.
I would love to hear why you are interested in request prioritization and how you are leveraging it. Your answer could be a driver for the Jetty project to implement HTTP/2 priorities.
I know Apache Curator provides a distributed lock, built on top of ZooKeeper, and it looks very easy to use based on the documentation on the official Apache Curator website. For example:
RetryPolicy retryPolicy = new ExponentialBackoffRetry(1000, 3);
CuratorFramework client = CuratorFrameworkFactory.newClient("host:ip", retryPolicy);
client.start();
InterProcessSemaphoreMutex lock = new InterProcessSemaphoreMutex(client, path);
if (lock.acquire(10, TimeUnit.SECONDS))
{
    try { /* do something */ }
    finally { lock.release(); }
}
But what does the second parameter, "path", of InterProcessSemaphoreMutex mean? According to the API it is "the path for the lock", but what exactly is it? Can anybody give me an example?
If I have millions of locks, do I have to create millions of lock paths? Is there a limit on the maximum number of locks (znodes) a ZooKeeper cluster can hold? And can the lock be removed once a process releases it?
ZooKeeper presents what looks like a distributed file system. For any ZooKeeper operation, recipe, etc., you write "znodes" to a particular path and watch for changes. See here: http://zookeeper.apache.org/doc/trunk/zookeeperOver.html#Simple+API (regarding znodes).
For Curator recipes, it needs to know the base path you want to use to perform the recipe. For InterProcessSemaphoreMutex, the path is what every participant should use. i.e. Process 1 and Process 2 want to both contend for the lock. So, they both allocate InterProcessSemaphoreMutex instances with the same path, say "/my/lock". Think of the path as the lock identifier. In the same ZooKeeper cluster, you could have multiple locks by using different paths.
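To illustrate (the path "/locks/my-resource" here is just an example, not anything Curator requires):
// Process 1 and Process 2 each run this with the SAME path, so they contend for the same lock.
InterProcessSemaphoreMutex lock = new InterProcessSemaphoreMutex(client, "/locks/my-resource");
if (lock.acquire(10, TimeUnit.SECONDS))
{
    try
    {
        // critical section: only one process holds "/locks/my-resource" at a time
    }
    finally
    {
        lock.release();
    }
}
// A different resource would simply use a different path, e.g. "/locks/another-resource".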
Hope this helps (disclaimer: I'm the main author of Curator).
Some examples of using Reaper:
@Test
public void testSomeNodes() throws Exception
{
    Timing timing = new Timing();
    ChildReaper reaper = null;
    CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), timing.connection(), new RetryOneTime(1));
    try
    {
        client.start();
        Random r = new Random();
        int nonEmptyNodes = 0;
        for ( int i = 0; i < 10; ++i )
        {
            client.create().creatingParentsIfNeeded().forPath("/test/" + Integer.toString(i));
            if ( r.nextBoolean() )
            {
                client.create().forPath("/test/" + Integer.toString(i) + "/foo");
                ++nonEmptyNodes;
            }
        }
        reaper = new ChildReaper(client, "/test", Reaper.Mode.REAP_UNTIL_DELETE, 1);
        reaper.start();
        timing.forWaiting().sleepABit();
        Stat stat = client.checkExists().forPath("/test");
        Assert.assertEquals(stat.getNumChildren(), nonEmptyNodes);
    }
    finally
    {
        CloseableUtils.closeQuietly(reaper);
        CloseableUtils.closeQuietly(client);
    }
}
Java Code Examples for org.apache.curator.framework.recipes.locks.Reaper
So I have a Netty-based WebSocket client that I am using for performance tests. The idea is that I can use it to simulate 100, 1000, etc. simultaneous connections.
I've determined that my current approach is not working: the test harness simply does not create enough WebSocket connections, although it bumps along happily and thinks it is still connected. My server simply does not show the correct number of connections when I use this harness. Most likely this is happening because I am using various Netty library objects across multiple threads at once and they don't handle that very well; ClientBootstrap, for example.
This is what I am doing per thread. Can you tell me where I am going wrong, so that I can fix my test harness?
public void run() {
    try {
        // client bootstrap. There is one of these per thread. Is that part of the problem?
        ClientBootstrap bootstrap = new ClientBootstrap(
                new NioClientSocketChannelFactory(Executors.newCachedThreadPool(), Executors.newCachedThreadPool()));
        Channel ch = null;
        try {
            // set up ssl engine
            final SSLEngine engine = createServerContext().createSSLEngine();
            engine.setUseClientMode(true);
            // there is a new handshaker per thread, too. They all go to the same uri
            final WebSocketClientHandshaker handshaker =
                    new WebSocketClientHandshakerFactory().newHandshaker(uri, WebSocketVersion.V08, null, false, null);
            // set up the pipeline factory and pipeline
            bootstrap.setPipelineFactory(new ChannelPipelineFactory() {
                @Override
                public ChannelPipeline getPipeline() throws Exception {
                    ChannelPipeline pipeline = Channels.pipeline();
                    pipeline.addLast("ssl", new SslHandler(engine));
                    pipeline.addLast("encoder", new HttpRequestEncoder());
                    pipeline.addLast("decoder", new HttpResponseDecoder());
                    // WebSocketClientHandler code not included, it's just a custom handler that sends requests via websockets
                    pipeline.addLast("ws-handler", new WebSocketClientHandler(handshaker));
                    return pipeline;
                }
            });
            // connect websockets preflight over http
            ChannelFuture future = bootstrap.connect(new InetSocketAddress(uri.getHost(), uri.getPort()));
            future.sync();
            // do websockets handshake
            ch = future.getChannel();
            ChannelFuture handshakeFuture = handshaker.handshake(ch);
            handshakeFuture.syncUninterruptibly();
            Thread.sleep(1000); // I had to add this. Sync should have meant that the above method didn't return until it was complete... but that was a lie. So I sleep for 1 second to solve that problem.
            if (!handshakeFuture.isSuccess())
                System.out.println("WHOAH errror");
            // send message to server
            ch.write(new TextWebSocketFrame("Foo"));
            // wait for notifications to close
            while (!getShutdownNow().get()) // shutdownNow is an AtomicBoolean which is set to true when all my threads have been started up and a certain amount of time has passed
                Thread.sleep(2000);
            // send close; wait for response
            ch.write(new CloseWebSocketFrame());
            ch.getCloseFuture().awaitUninterruptibly();
        } catch (Exception e) {
            e.printStackTrace();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
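For comparison, and not as a verified fix, here is a rough sketch of the shared-bootstrap idea the question itself raises: one ClientBootstrap and one channel factory for the whole harness, configured once, with each simulated client only calling connect(). The handler set shown is trimmed down and the names are illustrative; the SSL and WebSocket handlers from the code above would still be added per connection.
// Created once for the entire test harness and shared by all simulated clients.
ChannelFactory sharedFactory = new NioClientSocketChannelFactory(
        Executors.newCachedThreadPool(), Executors.newCachedThreadPool());
final ClientBootstrap sharedBootstrap = new ClientBootstrap(sharedFactory);
sharedBootstrap.setPipelineFactory(new ChannelPipelineFactory() {
    @Override
    public ChannelPipeline getPipeline() throws Exception {
        // getPipeline() is called once per channel, so per-connection handlers
        // (an SslHandler with its own SSLEngine, the WebSocket handshaker/handler)
        // can still be created here, as in the code above.
        ChannelPipeline pipeline = Channels.pipeline();
        pipeline.addLast("encoder", new HttpRequestEncoder());
        pipeline.addLast("decoder", new HttpResponseDecoder());
        return pipeline;
    }
});

// Each simulated connection then only connects and handshakes; it never builds
// its own bootstrap or channel factory.
ChannelFuture connectFuture = sharedBootstrap.connect(new InetSocketAddress(uri.getHost(), uri.getPort()));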
}