I have to check thousands of proxy servers continuously.
To speed this up, I am thinking of creating batches of size N (say 50) and sending the requests in each batch concurrently. Each proxy server has a unique IP/port and username/password authentication.
Since I am checking proxies, I will configure the request to use the given Proxy and send a request to the target site and measure the response.
Here is an example to use proxy with auth from the Apache client docs:
public static void main(String[] args) throws Exception {
    CredentialsProvider credsProvider = new BasicCredentialsProvider();
    credsProvider.setCredentials(
            new AuthScope("localhost", 8889),
            new UsernamePasswordCredentials("squid", "nopassword"));
    CloseableHttpAsyncClient httpclient = HttpAsyncClients.custom()
            .setDefaultCredentialsProvider(credsProvider)
            .build();
    try {
        httpclient.start();
        HttpHost proxy = new HttpHost("localhost", 8889);
        RequestConfig config = RequestConfig.custom()
                .setProxy(proxy)
                .build();
        HttpGet httpget = new HttpGet("https://httpbin.org/");
        httpget.setConfig(config);
        Future<HttpResponse> future = httpclient.execute(httpget, null);
        HttpResponse response = future.get();
        System.out.println("Response: " + response.getStatusLine());
        System.out.println("Shutting down");
    } finally {
        httpclient.close();
    }
}
As you can see, if you are using an authenticated proxy, you need to provide the credentials in the Client itself.
This means that if I am checking 50 proxy servers concurrently, I have to create a new client for each of them. In that case the requests will not really be concurrent, and I might as well use a multi-threaded solution.
The issue is that with multithreading I will put excessive load on the server, as most of the threads will block on I/O. Concurrent non-blocking I/O is much better suited to this type of challenge.
How can I check multiple authenticated proxy servers concurrently if I have to create a client for each of them?
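The batching part of the question can be sketched independently of any HTTP library. This is a minimal sketch, assuming each proxy check is exposed as a function that returns a CompletableFuture without blocking. BatchCheckSketch, partition, checkAll and the proxy strings are hypothetical names, and the actual request through HttpAsyncClient is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

final class BatchCheckSketch {

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    /**
     * Starts every check of a batch before waiting for any of them,
     * then waits for the whole batch and moves on to the next one.
     * 'check' is a placeholder for a non-blocking call.
     */
    static <T, R> List<R> checkAll(List<T> items, int batchSize,
                                   Function<T, CompletableFuture<R>> check) {
        List<R> results = new ArrayList<>();
        for (List<T> batch : partition(items, batchSize)) {
            List<CompletableFuture<R>> inFlight = new ArrayList<>();
            for (T item : batch) {
                inFlight.add(check.apply(item)); // all requests of the batch are started here
            }
            for (CompletableFuture<R> future : inFlight) {
                results.add(future.join()); // results keep the input order
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical proxy addresses and a stand-in check that completes immediately
        List<String> proxies = List.of("10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080");
        List<String> out = checkAll(proxies, 2,
                p -> CompletableFuture.completedFuture(p + " ok"));
        System.out.println(out);
    }
}
```

Note that join() throws a CompletionException if a check fails, so a real check function would probably map failures to a result value (for example with exceptionally) so that one dead proxy does not abort the batch.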
I have a stand-alone Java client trying to do RMI through a NTLM proxy.
It's multithreaded.
I'm using Apache httpclient 4.5.6.
I've got the proxy on a 5 minute timeout cycle.
The basic case works, reauthenticating every 5 minutes when challenged by the proxy, as long as 2 threads don't make a request at the same time at exactly the time the proxy times out. Then it fails. Once it fails, all subsequent attempts fail.
I've attached a wireshark screenshot to clarify (screenshot is from 4.5.2 but I upgraded to 4.5.6 and saw the same behavior).
A good cycle looks like
Client tries CONNECT (no NTLM flags)
Proxy replies with 407 (no NTLM flags)
Client tries CONNECT again with NTLM message type NTLMSSP_NEGOTIATE
Proxy replies with 407 NTLMSSP_CHALLENGE
Client does CONNECT with NTLMSSP_AUTH and my credentials.
Proxy replies with 200, and we are good to go for another 5 minutes.
A bad cycle looks like
Client tries CONNECT (no NTLM flags)
Proxy replies with 407 (no NTLM flags)
Client tries CONNECT again with NTLM message type NTLMSSP_NEGOTIATE
Client tries CONNECT (no NTLM flags)
Proxy replies with 407 (no NTLM flags)
Proxy replies with 407 NTLMSSP_CHALLENGE
A whole bunch more CONNECTs and 407s without NTLM flags within a few seconds.
To me this looks like a multithreaded race condition in non-thread-safe code.
With Apache httpclient 4.5.2 it just propagated the 407 and I detected it in CloseableHttpResponse.getStatusLine().getStatusCode().
With Apache httpclient 4.5.6 I see this:
java.lang.IllegalStateException: Auth scheme is null
at org.apache.http.util.Asserts.notNull(Asserts.java:52)
at org.apache.http.impl.auth.HttpAuthenticator.ensureAuthScheme(HttpAuthenticator.java:229)
at org.apache.http.impl.auth.HttpAuthenticator.generateAuthResponse(HttpAuthenticator.java:184)
at org.apache.http.impl.execchain.MainClientExec.createTunnelToTarget(MainClientExec.java:484)
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:411)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
Any ideas how to protect against this or work around it or recover from it?
(besides synchronizing the calls, which would slow down an already slow app a lot)
Some code snippets from the app:
// this is done only once
HttpClientBuilder builder = HttpClients.custom();
SocketConfig.Builder socketConfig = SocketConfig.custom();
RequestConfig.Builder requestConfig = RequestConfig.custom();
HttpHost proxy = new HttpHost(proxyHost, proxyPort);
builder.setProxy(proxy);
requestConfig.setProxy(proxy);
builder.setProxyAuthenticationStrategy(new ProxyAuthenticationStrategy());
CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
String localHost = getLocalHostname();
credentialsProvider.setCredentials(
new AuthScope(proxyHost, proxyPort, AuthScope.ANY_REALM, "ntlm"),
new NTCredentials(user, password, localHost, domain));
builder.setDefaultCredentialsProvider(credentialsProvider);
builder.setDefaultSocketConfig(socketConfig.build());
builder.setDefaultRequestConfig(requestConfig.build());
CloseableHttpClient client = builder.build();
...
// cached, we use the same one every time in accordance with section 4.7 of
// https://hc.apache.org/httpcomponents-client-4.5.x/tutorial/html/authentication.html
HttpClientContext context = HttpClientContext.create();
context.setCredentialsProvider(credentialsProvider);
...
// new HttpPost every time
HttpPost postMethod = new HttpPost(uri);
postMethod.setEntity(new ByteArrayEntity(bytesOut.toByteArray()));
response = client.execute(postMethod, context);
HttpContext instances are perfectly thread-safe. However, some attributes stored in the context, such as authentication handshake state, are obviously not. Make sure that HttpContext instances do not get updated concurrently and the problem should go away.
Thank you Oleg, this is what I did and it seems to be working so far (too long to post as a comment on your answer, but I wanted to share my code)
// I use the base version when not going through a proxy
public class HttpClientContextFactory {
    public HttpClientContext create() {
        return HttpClientContext.create();
    }
}
// I use this when I go through an NTLM proxy
private HttpClientContextFactory getNtlmContextFactory(
        final CredentialsProvider credentialsProvider) {
    return new HttpClientContextFactory() {
        // one context per thread, so handshake state is never shared
        ThreadLocal<HttpClientContext> tlContext = ThreadLocal
                .<HttpClientContext> withInitial(() -> {
                    HttpClientContext context = HttpClientContext.create();
                    context.setCredentialsProvider(credentialsProvider);
                    return context;
                });
        @Override
        public HttpClientContext create() {
            return tlContext.get();
        }
    };
}
// then do this when I connect to the server
response = client.execute(postMethod, contextFactory.create());
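The property this workaround relies on can be checked without HttpClient on the classpath. This is a minimal sketch, assuming a hypothetical Ctx class as a stand-in for HttpClientContext. It only shows that ThreadLocal.withInitial hands the same instance back to the same thread and a different instance to another thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class PerThreadContextSketch {

    /** Hypothetical stand-in for HttpClientContext: mutable per-conversation state. */
    static final class Ctx {
        private static final AtomicInteger IDS = new AtomicInteger();
        final int id = IDS.incrementAndGet();
    }

    // Each thread lazily gets its own Ctx on first access
    private static final ThreadLocal<Ctx> CONTEXT = ThreadLocal.withInitial(Ctx::new);

    static Ctx current() {
        return CONTEXT.get();
    }

    /** Fetches the context seen by a freshly started thread. */
    static Ctx fromOtherThread() {
        Ctx[] seen = new Ctx[1];
        Thread t = new Thread(() -> seen[0] = current());
        t.start();
        try {
            t.join(); // join makes the write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        Ctx mine = current();
        System.out.println("same thread reuses its context: " + (mine == current()));
        System.out.println("other thread gets its own: " + (mine != fromOtherThread()));
    }
}
```

One consequence of this design is that a context lives as long as its thread, so with a thread pool the cached state is reused across unrelated tasks that happen to run on the same worker.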
I was happy to access SharePoint using PowerShell. It just picked -DefaultCredential and I didn't have to worry about that. That was for prototyping.
But my actual code is Java. Now I am not sure about this at all.
Even though I make REST calls, even SOAP would fail if I don't authenticate properly.
Method 1 : NTLM
Here the only thing I am not sure about is the workstation ID. I login using Citrix to a VM and there is an explicit Workstation ID. I use that.
Returns 401.
DefaultHttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet("http://teams.host.com/_vti_bin/listdata.svc/");
NTCredentials credentials = new NTCredentials("user", "pass", "workstation", "Domain");
client.getCredentialsProvider().setCredentials(new AuthScope("teams.host.com",80), credentials);
HttpResponse response = client.execute(request);
Method 2 : Basic authentication.
HttpGet request = new HttpGet("http://teams.host.com/_vti_bin/listdata.svc/");
CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
credentialsProvider.setCredentials(AuthScope.ANY,
new UsernamePasswordCredentials("user", "password"));
CloseableHttpClient httpClient =
HttpClientBuilder.create().setDefaultCredentialsProvider(credentialsProvider).build();
HttpResponse response = httpClient.execute(request);
Returns 401.
What other method should I use? Digest? Since I don't know how -DefaultCredential in PowerShell worked, I am back to the drawing board.
How should I investigate this? I suspect I am making some basic mistake in this Java code and that the flow is not right.
So from Apache HttpClient this is the code that connects to SharePoint 2010. The workstation ID is the one used when I use Citrix XenDesktop to login to a Windows machine. I am able to get the result of my REST Get request.
This uses NTLM authentication.
DefaultHttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet("http://teams.host.com/_vti_bin/listdata.svc/");
NTCredentials credentials = new NTCredentials("user", "pass", "workstation", "Domain");
client.getCredentialsProvider().setCredentials(new AuthScope("teams.host.com",80), credentials);
HttpResponse response = client.execute(request);
In my test application I execute consecutive HttpGet requests to the same host with Apache HttpClient but upon each next request it turns out that the previous HttpConnection is closed and the new HttpConnection is created.
I use the same instance of HttpClient and don't close responses. From each entity I get InputStream, read from it with Scanner and then close the Scanner. I have tested KeepAliveStrategy, it returns true. The time between requests doesn't exceed keepAlive or connectionTimeToLive durations.
Can anyone tell me what could be the reason for such behavior?
Updated
I have found the solution. In order to keep the HttpConnection alive, it is necessary to set an HttpClientConnectionManager when building the HttpClient. I have used BasicHttpClientConnectionManager.
ConnectionKeepAliveStrategy keepAliveStrat = new DefaultConnectionKeepAliveStrategy() {
    @Override
    public long getKeepAliveDuration(HttpResponse response, HttpContext context)
    {
        long keepAlive = super.getKeepAliveDuration(response, context);
        if (keepAlive == -1)
            keepAlive = 120000; // fall back to 120 s when the server gives no timeout
        return keepAlive;
    }
};
HttpClientConnectionManager connectionManager = new BasicHttpClientConnectionManager();
try (CloseableHttpClient httpClient = HttpClients.custom()
        .setConnectionManager(connectionManager) // without this setting connection is not kept alive
        .setDefaultCookieStore(store)
        .setKeepAliveStrategy(keepAliveStrat)
        .setConnectionTimeToLive(120, TimeUnit.SECONDS)
        .setUserAgent(USER_AGENT)
        .build())
{
    HttpClientContext context = new HttpClientContext();
    RequestConfig config = RequestConfig.custom()
            .setCookieSpec(CookieSpecs.DEFAULT)
            .setSocketTimeout(10000)
            .setConnectTimeout(10000)
            .build();
    context.setRequestConfig(config);
    HttpGet httpGet = new HttpGet(uri);
    CloseableHttpResponse response = httpClient.execute(httpGet, context);
    HttpConnection conn = context.getConnection();
    HttpEntity entity = response.getEntity();
    try (Scanner in = new Scanner(entity.getContent(), ENC))
    {
        // do something
    }
    System.out.println("open=" + conn.isOpen()); // now open=true
    HttpGet httpGet2 = new HttpGet(uri2); // on the same host with other path
    // and so on
}
Updated 2
In general, checking connections with conn.isOpen() is not a proper way to check the connection state because: "Internally HTTP connection managers work with instances of ManagedHttpClientConnection acting as a proxy for a real connection that manages connection state and controls execution of I/O operations. If a managed connection is released or get explicitly closed by its consumer the underlying connection gets detached from its proxy and is returned back to the manager. Even though the service consumer still holds a reference to the proxy instance, it is no longer able to execute any I/O operations or change the state of the real connection either intentionally or unintentionally." (HttpClient Tutorial)
As @oleg pointed out, the proper way to trace connections is to use the logger.
First of all, you need to make sure the remote server you're working with supports keep-alive connections. Simply check whether the remote server returns the header Connection: Keep-Alive or Connection: close in each and every response. In the close case there is nothing you can do about it. You can use this online tool to perform such a check.
Next, you need to implement the ConnectionKeepAliveStrategy as defined in paragraph 2.6 of this manual. Note that you can use the existing DefaultConnectionKeepAliveStrategy, available since HttpClient version 4.0, so your HttpClient can be constructed as follows:
HttpClient client = HttpClients.custom()
.setKeepAliveStrategy(DefaultConnectionKeepAliveStrategy.INSTANCE)
.build();
That will ensure your HttpClient instance reuses the same connection via the keep-alive mechanism if the server supports it.
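To make the keep-alive duration more concrete: the default strategy takes its value from the timeout parameter of the server's Keep-Alive response header and reports -1 (keep alive indefinitely) when there is none. Below is a minimal stand-alone sketch of that idea in plain Java. It is not HttpClient's actual code. KeepAliveSketch, keepAliveMillis and withFallback are hypothetical names, and the 120000 ms fallback is simply the value used in the question.

```java
final class KeepAliveSketch {

    /**
     * Returns the "timeout" parameter of a Keep-Alive header value
     * (for example "timeout=5, max=100") in milliseconds,
     * or -1 if it is absent or not a number.
     */
    static long keepAliveMillis(String keepAliveHeaderValue) {
        if (keepAliveHeaderValue == null) {
            return -1;
        }
        for (String part : keepAliveHeaderValue.split(",")) {
            String[] kv = part.trim().split("=", 2);
            if (kv.length == 2 && kv[0].trim().equalsIgnoreCase("timeout")) {
                try {
                    return Long.parseLong(kv[1].trim()) * 1000L; // header value is in seconds
                } catch (NumberFormatException ignore) {
                    return -1;
                }
            }
        }
        return -1;
    }

    /** Applies a finite fallback when the server does not state a timeout. */
    static long withFallback(String keepAliveHeaderValue, long fallbackMillis) {
        long millis = keepAliveMillis(keepAliveHeaderValue);
        return millis == -1 ? fallbackMillis : millis;
    }

    public static void main(String[] args) {
        System.out.println(keepAliveMillis("timeout=5, max=100")); // 5000
        System.out.println(withFallback(null, 120000));            // 120000
    }
}
```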
Your application must be closing response objects in order to ensure proper resource de-allocation of the underlying connections. Upon response closure HttpClient keeps valid connections alive and returns them back to the connection manager (connection pool).
I suspect your code simply leaks connections, so every request ends up with a newly created connection while all previous connections keep piling up in memory.
From the example at HttpClient website:
// In order to ensure correct deallocation of system resources
// the user MUST call CloseableHttpResponse#close() from a finally clause.
// Please note that if response content is not fully consumed the underlying
// connection cannot be safely re-used and will be shut down and discarded
// by the connection manager.
So, as @oleg said, you need to close the HttpResponse before checking the connection status.
I recently switched from java.net to org.apache.http.client and have set up a CloseableHttpClient with the HttpClientBuilder. As the connection manager I am using the BasicHttpClientConnectionManager.
Now I have the problem that very often, when I make an HTTP request, I get a timeout exception. It seems that the connection manager keeps connections open in order to reuse them, but if the system is idle for a few minutes the connection times out, and when I make the next request the first thing I get is a timeout. Repeating the same request one more time then usually works without any problem.
Is there a way to configure the BasicHttpClientConnectionManager in order to not reuse its connections and create a new connection each time?
There are several ways of dealing with the problem:
Evict idle connections once no longer needed. The code below effectively disables connection persistence by closing out persistent connections after each HTTP exchange.
BasicHttpClientConnectionManager cm = new BasicHttpClientConnectionManager();
CloseableHttpClient httpclient = HttpClients.custom().setConnectionManager(cm).build();
...
try (CloseableHttpResponse response = httpclient.execute(new HttpGet("/"))) {
System.out.println(response.getStatusLine());
EntityUtils.consume(response.getEntity());
}
cm.closeIdleConnections(0, TimeUnit.MILLISECONDS);
Limit connection keep-alive time to something relatively small-ish
BasicHttpClientConnectionManager cm = new BasicHttpClientConnectionManager();
CloseableHttpClient httpclient = HttpClients.custom()
.setConnectionManager(cm)
.setKeepAliveStrategy((response, context) -> 1000)
.build();
try (CloseableHttpResponse response = httpclient.execute(new HttpGet("/"))) {
System.out.println(response.getStatusLine());
EntityUtils.consume(response.getEntity());
}
(Recommended) Use pooling connection manager and set connection total time to live to a finite value. There are no benefits to using the basic connection manager compared to the pooling one unless your code is expected to run in an EJB container.
CloseableHttpClient httpclient = HttpClients.custom()
.setConnectionTimeToLive(5, TimeUnit.SECONDS)
.build();
try (CloseableHttpResponse response = httpclient.execute(new HttpGet("/"))) {
System.out.println(response.getStatusLine());
EntityUtils.consume(response.getEntity());
}
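How a keep-alive limit and a total time to live interact can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not HttpClient's actual bookkeeping. ConnectionExpirySketch, Cached and reusable are hypothetical names. A cached connection counts as reusable only while both deadlines are still in the future.

```java
import java.time.Duration;
import java.time.Instant;

final class ConnectionExpirySketch {

    /** Hypothetical record of a cached connection; not an HttpClient type. */
    static final class Cached {
        final Instant created;
        final Instant lastUsed;

        Cached(Instant created, Instant lastUsed) {
            this.created = created;
            this.lastUsed = lastUsed;
        }
    }

    /**
     * A connection is reusable only if it is younger than its total time to live
     * and has been idle for less than the keep-alive duration.
     */
    static boolean reusable(Cached c, Instant now, Duration timeToLive, Duration keepAlive) {
        boolean withinTtl = now.isBefore(c.created.plus(timeToLive));
        boolean withinKeepAlive = now.isBefore(c.lastUsed.plus(keepAlive));
        return withinTtl && withinKeepAlive;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2020-01-01T00:00:00Z");
        Cached c = new Cached(t0, t0.plusSeconds(2));   // created at t0, last used at t0+2s
        Duration ttl = Duration.ofSeconds(5);
        Duration keepAlive = Duration.ofSeconds(1);
        System.out.println(reusable(c, t0.plusMillis(2500), ttl, keepAlive)); // true
        System.out.println(reusable(c, t0.plusSeconds(4), ttl, keepAlive));   // false, idle too long
    }
}
```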
I'm building a Java application to extract files from SharePoint using SharePoint's REST API. First I need to authenticate; our organisation uses Okta to obtain a token.
The example code I'm using is:
CredentialsProvider credsProvider = new BasicCredentialsProvider();
credsProvider.setCredentials(AuthScope.ANY,
new NTCredentials(user, pwd, "", ""));
HttpHost target = new HttpHost("organisation.sharepoint.com", 80, "http");
HttpClientContext context = HttpClientContext.create();
context.setCredentialsProvider(credsProvider);
// The authentication is NTLM.
// To trigger it, we send a minimal http request
HttpHead request1 = new HttpHead("/");
CloseableHttpResponse response1 = null;
try {
response1 = httpclient.execute(target, request1, context);
EntityUtils.consume(response1.getEntity());
System.out.println("1 : " + response1.getStatusLine().getStatusCode());
I need to modify the NTLM code to use Okta instead to make the call to Sharepoint with context set.
Any help appreciated!
Unfortunately, this is not achievable at the moment. This feature has been requested and will be reviewed by engineering. However, it is not actively being worked on as of right now.