Apache HttpClient 4.2.1
I have a Java process that downloads multiple files from one CouchDB (A) and uploads the same files to another CouchDB (B). The downloads and uploads are performed by 40 worker threads via an ExecutorService.
When the file count is high (~25k), the process throws a SocketException with the message "too many open files", caused by the Linux open file descriptor limit. I suspected a resource leak and analyzed the code, which is as follows:
Initiate worker threads
private boolean copyDocumentsFromAtoB(List<String> ids) {
    ExecutorService downloader = Executors.newFixedThreadPool(50);
    ExecutorService uploader = Executors.newFixedThreadPool(50);

    // downloadedDocs: shared JsonArray field, populated by the downloaders
    for (String id : ids) {
        downloader.execute(new DocumentDownloader(id, downloadedDocs));
    }
    for (JsonElement doc : downloadedDocs) {
        uploader.execute(new DocumentUploader(doc));
    }
    return true;
}
Runnable downloader class (the DocumentUploader class has a similar implementation):
private static final class DocumentDownloader implements Runnable {

    private final String documentId;
    private final JsonArray downloadedDocs;

    DocumentDownloader(String documentId, JsonArray downloadedDocs) {
        this.documentId = documentId;
        this.downloadedDocs = downloadedDocs;
    }

    @Override
    public void run() {
        InputStream docStream = null;
        String url = buildUrl(documentId);
        HttpGet doc = new HttpGet(url);
        try {
            //doc.setHeaders()
            HttpClient httpClient = new DefaultHttpClient();
            HttpResponse docResponse = httpClient.execute(doc);
            docStream = docResponse.getEntity().getContent();
            //document parsing and adding to downloadedDocs. Not important
        } catch (Exception e) {
            //handle exceptions
        } finally {
            if (docStream != null) {
                try {
                    docStream.close();
                } catch (IOException e) {
                    LOGGER.debug("Cannot close input stream", e);
                }
            }
        }
    }
}
Findings:
HttpClient is local to each thread. According to many resources and posts I found online, this is sub-optimal.
If I called httpClient.getConnectionManager().shutdown(); at the end of the Runnable class, the open file count decreased.
Using a global HttpClient (instantiated only once in the copyDocumentsFromAtoB method and passed down to the Runnable classes through their constructors) with a PoolingClientConnectionManager instance and a local context object decreased the open file count further (see the sketch below).
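A minimal sketch of that shared-client setup, assuming HttpClient 4.2.x; the pool sizes below are illustrative, not tuned values:
import java.io.InputStream;

import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.conn.PoolingClientConnectionManager;
import org.apache.http.util.EntityUtils;

// Created once in copyDocumentsFromAtoB and shared by all workers.
PoolingClientConnectionManager cm = new PoolingClientConnectionManager();
cm.setMaxTotal(100);           // illustrative: max connections across all routes
cm.setDefaultMaxPerRoute(50);  // illustrative: max connections per host
HttpClient httpClient = new DefaultHttpClient(cm);

// Inside each Runnable: consume the entity so the connection returns to the pool.
HttpGet doc = new HttpGet(buildUrl(documentId));
HttpResponse docResponse = httpClient.execute(doc);
try {
    InputStream docStream = docResponse.getEntity().getContent();
    // document parsing as before
} finally {
    EntityUtils.consume(docResponse.getEntity()); // releases the connection, not just the stream
}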
Questions:
Why did the original implementation have an increased number of open file descriptors during the copying process? How do multiple HttpClient instances contribute to this?
Doesn't the docStream.close(); in the finally block close whatever HTTP connection is created between the process and the database?
Does the HttpClient instance get destroyed (releasing any open resources in the process) when the thread that it belongs to terminates? (This would explain why the open file count decreases after the copying process terminates prematurely)
Are there any other optimizations I can do (in terms of resource leaks) other than using a single global HttpClient?
Are there tools I could use to obtain quantifiable statistics for this scenario across different implementations? (one cheap measurement is sketched after this list)
What steps can I follow to find the optimum number of worker threads, setMaxTotal, and setDefaultMaxPerRoute (for the PoolingClientConnectionManager)?
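As a rough aid for the statistics question above, a quick way to watch the descriptor count from inside the process on Linux (a sketch, not a proper profiling tool):
import java.io.File;

// Linux-only: each entry under /proc/self/fd is one descriptor this JVM holds open.
String[] fds = new File("/proc/self/fd").list();
System.out.println("Open file descriptors: " + (fds == null ? "n/a" : fds.length));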
Related
I'm having trouble rewriting Java test cases in Robot Framework. In order to do this, I need to create new Java keywords, but the way the tests are implemented doesn't make it easy!
This is an example of a script that I need to rewrite in RF:
try
{
    ServerSocket server = Utils.startSocketServer();
    while (true)
    {
        Socket socket = server.accept();
        ObjectInputStream ois = new ObjectInputStream(socket.getInputStream());
        RequestX request = (RequestX) ois.readObject();
        if (request.getSource().equals(Strings.INFO))
        {
            /** do something **/
        }
        else if (request.getSource().equals(Strings.X))
        {
            /** do something **/
        }
        else
        {
            /** do something **/
        }
        /** break on condition **/
    }
    Utils.closeSocketServer(server);
} catch (Exception e)
{
    /** do something **/
}
Any suggestion on how I can turn this into an RF test case? Making the whole script into a single keyword is not an option, because somewhere in that loop, in the "do something" comments, I also need to call keywords.
The main idea is to fragment this script into functions so that I can use them as Java keywords in RF, but I still can't figure this out!
So, I did further research, and this is what I came up with: split the code into functions so that I can call and use them as keywords in Robot Framework.
The code became like this:
public static String SendTask(String taskFile)
{
    ServerSocket server = null;
    try
    {
        server = startSocketServer();
        if (taskFile != null)
        {
            Utils.sendJMSWakeUp();
            while (true)
            {
                Socket socket = server.accept();
                ObjectInputStream ois = getInputStream(socket);
                RequestX request = (RequestX) ois.readObject();
                if (getSource(request, Strings.INFO))
                {
                    /** log info **/
                }
                /** if the current JMS queue is Scheduler, then send the task */
                else if (getSource(request, Strings.SCHEDULER))
                {
                    /** send task **/
                    break;
                }
            }
        }
        else
        {
            assertion(false, "Illegal Argument Value null");
        }
    } catch (Exception e)
    {
        /** log errors **/
    } finally
    {
        /** close socket server & return a task id **/
    }
}
The same goes for every JMS queue that I am listening to:
public static String getTaskAck(String taskId);
public static String getTaskresult(String taskId);
This did work in my case for synchronous task execution. But it is very inconvenient for asynchronous task execution, because each time I would have to wait for a response in a keyword, so the next keyword may fail because the response it is supposed to read was already sent!
I could look into the Process BuiltIn library or the RobotFramework-Async library for parallel keyword execution, but that would make it harder to process many asynchronous JMS messages.
After further investigation, I think I will look into robotframework-jmsLibrary. Some development enhancements have to be done, like adding ActiveMQ support.
This way, I can send and consume many asynchronous messages via ActiveMQ, then process every message via robotframework-jmsLibrary.
Example:
RF-jmsLibrary <==> synchronous <==> activeMq <==> asynchronous <==> system
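A rough sketch of what the plain-JMS side of this could look like, assuming a local ActiveMQ broker; the broker URL and queue names are hypothetical, and this is not the robotframework-jmsLibrary API:
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Session;

import org.apache.activemq.ActiveMQConnectionFactory;

public class JmsSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical broker URL and queue names.
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

        // Send a task without waiting for the reply.
        MessageProducer producer = session.createProducer(session.createQueue("tasks"));
        producer.send(session.createTextMessage("task payload"));

        // Consume replies asynchronously, so no keyword has to block on a response.
        MessageConsumer consumer = session.createConsumer(session.createQueue("task.replies"));
        consumer.setMessageListener(message -> {
            // process each asynchronous message here
        });
    }
}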
I have written a large-scale HTTP server using Vert.x, but I'm getting this error when the number of concurrent requests increases:
WARNING: Thread Thread[vert.x-eventloop-thread-1,5,main] has been blocked for 8458 ms, time limit is 1000
io.vertx.core.VertxException: Thread blocked
Here is my full code:
public class MyVertxServer {

    public Vertx vertx = Vertx.vertx(new VertxOptions().setWorkerPoolSize(100));
    private HttpServer server = vertx.createHttpServer();
    private Router router = Router.router(vertx);

    public void bind(int port) {
        server.requestHandler(router::accept).listen(port);
    }

    public void createContext(String path, MyHttpHandler handler) {
        if (!path.endsWith("/")) {
            path += "/";
        }
        path += "*";
        router.route(path).handler(new Handler<RoutingContext>() {
            @Override
            public void handle(RoutingContext ctx) {
                String[] handlerID = ctx.request().uri().split(ctx.currentRoute().getPath());
                String suffix = handlerID.length > 1 ? handlerID[1] : null;
                handler.Handle(ctx, new VertxUtils(), suffix);
            }
        });
    }
}
And how I call it:
ver.createContext("/getRegisterManager",new ProfilesManager.RegisterHandler());
ver.createContext("/getLoginManager", new ProfilesManager.LoginHandler());
ver.createContext("/getMapcomCreator",new ItemsManager.MapcomCreator());
ver.createContext("/getImagesManager", new ItemsManager.ImagesHandler());
ver.bind(PORT);
However, I don't find eventBus() useful for HTTP servers that send/receive files, because you would need to send the RoutingContext in the message, which is not possible.
Could you please point me in the right direction? Thanks.
Added a little bit of the handler's code:
class ProfileGetter implements MyHttpHandler {

    @Override
    public void Handle(RoutingContext ctx, VertxUtils utils, String suffix) {
        String username = utils.Decode(ctx.request().headers().get("username"));
        String lang = utils.Decode(ctx.request().headers().get("lang"));
        display("profile requested : " + username);
        Profile profile = ProfileManager.FindProfile(username, lang);
        if (profile == null) {
            ctx.request().response().putHeader("available", "false");
            utils.sendResponseAndEnd(ctx.response(), 400);
            return;
        } else {
            ctx.request().response().putHeader("available", "true");
            utils.writeStringAndEnd(ctx, new Gson().toJson(profile));
        }
    }
}
Here, ProfileManager.FindProfile(username, lang) does a long-running database job on the same thread.
...
Basically, all of my processing happens on the main thread, because if I use an executor I get strange exceptions and NullPointerExceptions in Vert.x, which makes me feel like the request processors in Vert.x are already parallel.
Given the small amount of code in the question, let's agree that the problem is on this line:
Profile profile = ProfileManager.FindProfile(username,lang);
Assuming that this is internally doing a blocking JDBC call, which is an anti-pattern in Vert.x, you can solve this in several ways.
Say you can totally refactor the ProfileManager class (which IMO is the best option); then you can update it to be reactive, so your code would look like:
ProfileManager.FindProfile(username, lang, res -> {
    if (res.failed()) {
        // handle error, send 500 back, etc...
    } else {
        Profile profile = res.result();
        if (profile == null) {
            ctx.request().response().putHeader("available", "false");
            utils.sendResponseAndEnd(ctx.response(), 400);
            return;
        } else {
            ctx.request().response().putHeader("available", "true");
            utils.writeStringAndEnd(ctx, new Gson().toJson(profile));
        }
    }
});
Now, what happens behind the scenes is that your JDBC call does not block (which is tricky, because JDBC is blocking by nature). To fix this, if you're lucky enough to use MySQL or Postgres, you can code your JDBC against the async-client; if you're stuck with other RDBMS servers, then you need to use the jdbc-client, which in turn uses a thread pool to offload the work from the event loop thread.
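For illustration, a rough sketch of the jdbc-client variant, assuming a recent Vert.x 3.x; the connection settings and table/column names below are made up:
import io.vertx.core.json.JsonArray;
import io.vertx.core.json.JsonObject;
import io.vertx.ext.jdbc.JDBCClient;

// Shared client backed by a worker pool, so the event loop never blocks.
JDBCClient client = JDBCClient.createShared(vertx, new JsonObject()
    .put("url", "jdbc:mysql://localhost:3306/mydb")      // hypothetical settings
    .put("driver_class", "com.mysql.jdbc.Driver")
    .put("user", "dbuser")
    .put("password", "secret"));

client.queryWithParams(
    "SELECT * FROM profile WHERE username = ?",          // hypothetical schema
    new JsonArray().add(username),
    res -> {
        if (res.failed()) {
            // handle error, send 500 back, etc...
        } else {
            // map res.result().getRows() to a Profile and respond as above
        }
    });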
Now, say that you cannot change the ProfileManager code; you can still offload it to the thread pool by wrapping the code in an executeBlocking block:
vertx.executeBlocking(future -> {
    // runs on a worker thread, so blocking here is safe
    Profile profile = ProfileManager.FindProfile(username, lang);
    future.complete(profile);
}, false, res -> {   // false = unordered: blocking jobs may run in parallel
    if (res.failed()) {
        // handle error, send 500 back, etc...
    } else {
        Profile profile = res.result();
        if (profile == null) {
            ctx.request().response().putHeader("available", "false");
            utils.sendResponseAndEnd(ctx.response(), 400);
            return;
        } else {
            ctx.request().response().putHeader("available", "true");
            utils.writeStringAndEnd(ctx, new Gson().toJson(profile));
        }
    }
});
This example is based on an example from the book Restlet in Action.
If I try
public class StreamResource extends ServerResource
{
    @Get
    public Representation getStream() throws ResourceException, IOException
    {
        Representation representation = new WriterRepresentation(MediaType.TEXT_PLAIN)
        {
            @Override
            public void write(Writer writer) throws IOException
            {
                String json = "{\"foo\" : \"bar\"}";
                while (true)
                {
                    writer.write(json);
                }
            }
        };
        return representation;
    }
}
it works and continuously sends the JSON string to the client.
If I introduce a delay in the while loop like this:
String json = "{\"foo\" : \"bar\"}\r\n";
while (true)
{
    writer.write(json);
    try
    {
        Thread.sleep(250);
    }
    catch (InterruptedException e)
    {}
}
I was hoping that the client would get data four times a second, but nothing seems to reach the client.
Can anyone explain why the introduction of Thread.sleep() does that? What is a good way to introduce a delay when streaming data to the client?
You should try the Jetty connector instead of the internal Restlet connector. The internal connector isn't ready for production, even though we are working on fixing it.
You can also try the Simple extension, which has fewer dependent JARs than the Jetty extension.
You can try to flush the buffer, like this:
String json = "{\"foo\" : \"bar\"}\r\n";
while (true)
{
    writer.write(json);
    writer.flush(); // flush the buffer
    try
    {
        Thread.sleep(250);
    }
    catch (InterruptedException e)
    {}
}
Without writer.flush(), the writer waits until the internal buffer is full before writing to the socket. Thread.sleep(250) reduces the output produced each second, so far more time is required to fill the buffer.
I am taking some data from a database via a servlet and a DB handler Java class and hosting it at a URL. Since the database changes, I take care to host only the changes rather than the entire DB contents.
I get the required functionality in a browser, i.e. after every (manual) reload:
1. at the first page load, the entire data is displayed.
2. on subsequent reloads, I get either null data if there is no change in the database, or the appended rows if the database has grown (the database can only grow).
But in a Java program I'm not getting the same behavior. The Java program uses HttpURLConnection.
This is the code of the Java client for the servlet:
public class HTTPClient implements Runnable {

    private CallbackInterface callbackinterface;
    private URL url;
    private HttpURLConnection http;
    private InputStream response;
    private String previousMessage = "";

    public HTTPClient() {
        try {
            url = new URL("http://localhost:8080/RESTful-Server/index.jsp");
            http = (HttpURLConnection) url.openConnection();
            http.connect();
        } catch (IOException e) {
        }
    }

    @Override
    public void run() {
        while (true) {
            try {
                String currentmessage = "";
                response = http.getInputStream();
                if (http.getResponseCode() == HttpURLConnection.HTTP_OK) {
                    BufferedReader buffread = new BufferedReader(new InputStreamReader(response));
                    String line;
                    while ((line = buffread.readLine()) != null) {
                        currentmessage += line;
                    }
                    if ((!currentmessage.equals(previousMessage)
                            || !previousMessage.equals(""))
                            && !currentmessage.equals("")) {
                        //this.callbackinterface.event(currentmessage);
                        System.out.println(currentmessage + "\t" + previousMessage);
                    }
                    previousMessage = currentmessage;
                    Thread.sleep(2500);
                } else {
                    throw new IOException();
                }
            } catch (IOException | InterruptedException e) {
                System.err.println("Exception" + e);
            }
        }
    }
}
The class shown is a thread that reads the connection every 2.5 s. If it gets something significant from readLine(), it issues a callback to a worker method, which takes care of the rest.
I suspect the issue is that the connection is a class variable, so the reload that happens in the browser is not being replicated.
Any idea how to do this?
You're basically connecting (requesting) only once and trying to read the response multiple times, while it can be read only once. You need to create a new connection (request) every time, so move the creation of the connection by url.openConnection() into the loop. The http.connect() line is superfluous, by the way; you can safely omit it, since http.getInputStream() already does it implicitly.
See also:
Using java.net.URLConnection to fire and handle HTTP requests
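A minimal sketch of the corrected loop, reusing the url field and the 2.5 s polling interval from the question (error handling kept simple):
@Override
public void run() {
    while (true) {
        try {
            // Open a fresh connection (request) on every iteration.
            HttpURLConnection http = (HttpURLConnection) url.openConnection();
            if (http.getResponseCode() == HttpURLConnection.HTTP_OK) {
                StringBuilder currentmessage = new StringBuilder();
                try (BufferedReader buffread =
                        new BufferedReader(new InputStreamReader(http.getInputStream()))) {
                    String line;
                    while ((line = buffread.readLine()) != null) {
                        currentmessage.append(line);
                    }
                }
                // compare with previousMessage and fire the callback as before
            }
            http.disconnect();
            Thread.sleep(2500);
        } catch (IOException | InterruptedException e) {
            System.err.println("Exception: " + e);
        }
    }
}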
I'm writing a Play 2.0 Java application that allows users to upload files. Those files are stored on a third-party service that I access using a Java library; the method I use in this API has the following signature:
void store(InputStream stream, String path, String contentType)
I've managed to make uploads work using the following simple controller:
public static Result uploadFile(String path) {
    MultipartFormData body = request().body().asMultipartFormData();
    FilePart filePart = body.getFile("files[]");
    InputStream is = new FileInputStream(filePart.getFile());
    myApi.store(is, path, filePart.getContentType());
    return ok();
}
My concern is that this solution is not efficient, because by default the Play framework stores all the data uploaded by the client in a temporary file on the server and then calls my uploadFile() method in the controller.
In a traditional servlet application I would have written a servlet behaving this way:
myApi.store(request.getInputStream(), ...)
I have been searching everywhere and haven't found a solution. The closest example I found is Why makes calling error or done in a BodyParser's Iteratee the request hang in Play Framework 2.0?, but I didn't find out how to modify it to fit my needs.
Is there a way in Play 2 to achieve this behavior, i.e. to have the data uploaded by the client go "through" the web application directly to another system?
Thanks.
I've been able to stream data to my third-party API using the following Scala controller code:
def uploadFile() =
  Action( parse.multipartFormData(myPartHandler) )
  {
    request => Ok("Done")
  }

def myPartHandler: BodyParsers.parse.Multipart.PartHandler[MultipartFormData.FilePart[Result]] = {
  parse.Multipart.handleFilePart {
    case parse.Multipart.FileInfo(partName, filename, contentType) =>
      //Still dirty: the path of the file is in the partName...
      val path: String = partName;

      //Set up the PipedOutputStream here, give the input stream to a worker thread
      val pos: PipedOutputStream = new PipedOutputStream();
      val pis: PipedInputStream = new PipedInputStream(pos);
      val worker: UploadFileWorker = new UploadFileWorker(path, pis);
      worker.contentType = contentType.get;
      worker.start();

      //Read content into the POS
      Iteratee.fold[Array[Byte], PipedOutputStream](pos) { (os, data) =>
        os.write(data)
        os
      }.mapDone { os =>
        os.close()
        Ok("upload done")
      }
  }
}
The UploadFileWorker is a really simple Java class that contains the call to the third-party API.
public class UploadFileWorker extends Thread {

    String path;
    PipedInputStream pis;
    public String contentType = "";

    public UploadFileWorker(String path, PipedInputStream pis) {
        super();
        this.path = path;
        this.pis = pis;
    }

    public void run() {
        try {
            myApi.store(pis, path, contentType);
            pis.close();
        } catch (Exception ex) {
            ex.printStackTrace();
            try { pis.close(); } catch (Exception ex2) {}
        }
    }
}
It's not completely perfect, because I would have preferred to recover the path as a parameter to the Action, but I haven't been able to do so. I have therefore added a piece of JavaScript that updates the name of the input field (and thus the partName), and it does the trick.