Retrieving data from multiple tasks from different threads - java

Good evening,
I have a List of about 500 different URLs, whose content I fetch with this method:
public static String getWebContent(URL url){
// create URL, build HTTPConnection, getContent of page
}
After this, I have another method that extracts values from the content.
At the moment I do it like this:
List<URL> urls = new ArrayList<>();
List<String> webcontents = new ArrayList<>();
for(URL url : urls){
webcontents.add(getWebContent(url));
}
// Further methods to extract values from the webcontents
But it takes a lot of time, because only one thread is doing the work. I want to make it multithreaded, but I am not sure what the best way to do it is.
First, I need the return value of every thread. Should I implement Callable instead of Runnable for that?
And how do I run the method with different threads? Should there be one starting at index 0, one at index 50, etc.? And when they are done with one URL, do they set a flag to true? That would be my approach, but I don't think it is very efficient. If the first website has a lot of content, the first thread might take much longer than the others.
And when every thread is done, how can I get my data back into one list? Like this?
List<String> webcontent = new ArrayList<>();
if(!t1.isAlive() && !t2.isAlive()){
webcontent.add(t1.getData());
webcontent.add(t2.getData());
}
I hope you can understand my problem and can give me a tip :) Many thanks

You could use an ExecutorCompletionService to retrieve your tasks as they complete.
List<URL> urls = ...; // Create this list somehow
ExecutorCompletionService<String> service =
new ExecutorCompletionService<String>(Executors.newFixedThreadPool(10));
for (URL url: urls) {
service.submit(new GetWebContentCallable(url)); // you need to define the GetWebContentCallable
}
int remainingTasks = urls.size();
while (remainingTasks > 0) {
String nextResult = service.take().get(); // take() returns a Future<String>; get() unwraps it and may throw ExecutionException
processResult(nextResult); // you define processResult
remainingTasks -= 1;
}

Perhaps you could try something like:
public static List<String> getWebContents(final int threads, final URL... urls){
final List<Future<String>> futures = new LinkedList<>();
final ExecutorService service = Executors.newFixedThreadPool(threads);
Arrays.asList(urls).forEach(
url -> {
final Callable<String> callable = () -> {
try{
return getWebContent(url);
}catch(IOException ex){
ex.printStackTrace();
return null;
}
};
futures.add(service.submit(callable));
}
);
final List<String> contents = new LinkedList<>();
futures.forEach(
future -> {
try{
contents.add(future.get());
}catch(Exception ex){
ex.printStackTrace();
}
}
);
service.shutdown();
return contents;
}
Or, if you're not using Java 8:
public static List<String> getWebContents(final int threads, final URL... urls){
final List<Future<String>> futures = new LinkedList<Future<String>>();
final ExecutorService service = Executors.newFixedThreadPool(threads);
for(final URL url : urls){
final Callable<String> callable = new Callable<String>(){
public String call(){
try{
return getWebContent(url);
}catch(IOException ex){
ex.printStackTrace();
return null;
}
}
};
futures.add(service.submit(callable));
}
final List<String> contents = new LinkedList<String>();
for(final Future<String> future : futures){
try{
contents.add(future.get());
}catch(Exception ex){
ex.printStackTrace();
}
}
service.shutdown();
return contents;
}

Instead of retrieving values from working threads, let working threads put results in a resulting collection (be it List<String> webcontent or anything else). Note this may require synchronization.
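A minimal sketch of that idea, assuming a hypothetical fetch method in place of the real getWebContent and a synchronized list as the shared collection (the class and method names here are illustrative, not from the question):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedResultList {

    // Hypothetical stand-in for the real download method.
    static String fetch(String url) {
        return "content of " + url;
    }

    static List<String> fetchAll(List<String> urls, int threads) throws InterruptedException {
        // synchronizedList makes each individual add() call thread-safe.
        List<String> results = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String url : urls) {
            // Each worker writes its own result into the shared list.
            pool.execute(() -> results.add(fetch(url)));
        }
        pool.shutdown();
        // Wait until all workers have finished before reading the list.
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            pool.shutdownNow();
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> contents = fetchAll(Arrays.asList("http://a.example", "http://b.example"), 2);
        System.out.println(contents.size() + " pages fetched");
    }
}
```

Note that the results arrive in completion order, not in the order of the input URLs. If the order matters, collecting Futures as in the answers above is probably the simpler route.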

Related

How to get the execution results of ExecutorService without blocking the current code path?

I have a service which wraps a bunch of requests in Callables and then prints the results of the executions. Currently the service request is blocked until I have printed all the Future results from the execution. However, I want to return 200 to the requestor and run these requests in parallel without blocking the request. How can I achieve this?
Below is my code to run the tasks in parallel.
public void runParallelFunctions(Callable<Map<String, String>> invokerTask) {
List<Callable<Map<String, String>>> myTasks = new ArrayList<>();
for (int i = 0; i < invocationCount; i++) {
myTasks.add(invokerTask);
}
List<Future<Map<String, String>>> results = null;
try {
results = executorService.invokeAll(myTasks);
} catch (InterruptedException e) {
}
this.printResultsFromParallelInvocations(results);
}
Below is how I print the results from the Futures.
private void printResultsFromParallelInvocations(List<Future<Map<String, String>>> results) {
results.forEach(executionResults -> {
try {
executionResults.get().entrySet().forEach(entry -> {
LOGGER.info(entry.getKey() + ": " + entry.getValue());
});
} catch (InterruptedException e) {
} catch (ExecutionException e) {
}
});
}
Below is how I'm invoking the above methods when someone places a request to the service.
String documentToBeIndexed = GSON.toJson(indexDocument);
int documentId = indexMyDocument(documentToBeIndexed);
createAdditionalCandidatesForFuture(someInput);
return true;
In the above code, I call the createAdditionalCandidatesForFuture and then return true. But the code still waits for the printResultsFromParallelInvocations method to complete. How can I make the code return after invoking createAdditionalCandidatesForFuture without waiting for the results to print? Do I have to print the results using another executor thread or is there another way? Any help would be much appreciated
The answer is CompletableFuture.
Updated runParallelFunctions:
public void runParallelFunctions(Callable<Map<String, String>> invokerTask) {
// write a wrapper to handle exception outside CompletableFuture
Supplier<Map<String, String>> taskSupplier = () -> {
try {
// some task that takes a long time
Thread.sleep(4000);
return invokerTask.call();
} catch (Exception e) {
System.out.println(e);
}
// return default value on error
return new HashMap<>();
};
for (int i = 0; i < 5; i++) {
CompletableFuture.supplyAsync(taskSupplier, executorService)
.thenAccept(this::printResultsFromParallelInvocations);
}
// main thread immediately comes here after running through the loop
System.out.println("Doing other work....");
}
And, printResultsFromParallelInvocations may look like:
private void printResultsFromParallelInvocations(Map<String, String> result) {
result.forEach((key, value) -> System.out.println(key + ": " + value));
}
Output:
Doing other work....
// 4 secs wait
key:value
Calling get on a Future will block the thread until the task is completed, so yes, you will have to move the printing of the results to another thread/Executor service.
Another option is that each task prints its results upon completion, provided they are supplied with the necessary tools to do so (Access to the logger, etc). Or putting it in another way, each task is divided into two consecutive steps: execution and printing.
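The "execute, then print" split could look roughly like this. It is only a sketch: the reporting helper and the class name are made up for illustration, and System.out stands in for the logger.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

public class SelfReportingTask {

    // Wraps a task so the worker thread runs it and then reports the result itself.
    static <T> Runnable reporting(Callable<T> task, Consumer<T> reporter) {
        return () -> {
            try {
                T result = task.call();   // step 1: execution
                reporter.accept(result);  // step 2: printing, still on the worker thread
            } catch (Exception e) {
                System.err.println("Task failed: " + e);
            }
        };
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Callable<Map<String, String>> work = () -> Collections.singletonMap("key", "value");
        Consumer<Map<String, String>> printer =
                result -> result.forEach((k, v) -> System.out.println(k + ": " + v));
        pool.execute(reporting(work, printer));
        // The submitting thread gets here without waiting for the task.
        System.out.println("Request thread returns without waiting");
        pool.shutdown();
    }
}
```

Because nobody calls get(), the thread that handles the request never blocks on the results.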

How to use the API java.util.concurrent.Future instead of creating threads explicitly in Java?

I have two threads running in parallel in a Java program as below:
// Threading
new Thread(new Runnable() {
@Override
public void run() {
try {
gpTableCount = getGpTableCount();
} catch (SQLException e) {
e.printStackTrace();
} catch(Exception e) {
e.printStackTrace();
}
}
}).start();
new Thread(new Runnable() {
@Override
public void run() {
try {
hiveTableCount = getHiveTableCount();
} catch (SQLException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
}).start();
while(!(gpTableCount != null && gpTableCount.size() > 0 && hiveTableCount != null && hiveTableCount.size() > 0)) {
Thread.sleep(5000);
}
// Threading
Both of them have same functionality. Below is the code from getHiveTableCount(). The other method is slightly different (a line or two) from the below one but the functionality remains the same.
public Map<String, String> getHiveTableCount() throws IOException, SQLException {
hiveDataMap = new HashMap<String, String>();
hiveTableErrs = new HashMap<String, String>();
Iterator<String> hiveIterator = filteredList.iterator();
Connection hiveConnection = DbManager.getHiveConnection();
PreparedStatement hive_pstmnt = null;
String hiveExcpnMsg;
String ssn;
String hiveMaxUpdTms;
Long hiveCount;
String gpHiveRec;
String[] hiveArray;
String[] hiveDetails;
String hiveQuery;
while(hiveIterator.hasNext()) {
gpHiveRec = hiveIterator.next();
hiveArray = gpHiveRec.split(",");
hiveDetails = hiveArray[1].split("\\.");
hiveQuery = "select '" + hiveDetails[1] + "' as TableName, count(*) as Count, source_system_name, max(xx_last_update_tms) from " + hiveArray[1] + " where source_system_name='" + hiveArray[2] + "' group by source_system_name";
try {
hive_pstmnt = hiveConnection.prepareStatement(hiveQuery);
ResultSet hiveCountRs = hive_pstmnt.executeQuery();
while(hiveCountRs.next()) {
hiveCount = hiveCountRs.getLong(2);
ssn = hiveCountRs.getString(3);
hiveMaxUpdTms = hiveCountRs.getTimestamp(4).toString();
hiveDataMap.put(hiveDetails[1] + "," + ssn, hiveCount + "," + hiveMaxUpdTms);
}
} catch(org.postgresql.util.PSQLException e) {
hiveExcpnMsg = e.getMessage();
hiveTableErrs.put(hiveDetails[1] + ": for the SSN: " + hiveArray[2], hiveExcpnMsg + "\n");
} catch(SQLException e) {
hiveExcpnMsg = e.getMessage();
hiveTableErrs.put(hiveDetails[1] + ": for the SSN: " + hiveArray[2], hiveExcpnMsg + "\n");
} catch(Exception e) {
hiveExcpnMsg = e.getMessage();
hiveTableErrs.put(hiveDetails[1] + ": for the SSN: " + hiveArray[2], hiveExcpnMsg + "\n");
}
}
return hiveDataMap;
}
These two threads run concurrently. I recently read online that:
Future class represents a future result of an asynchronous computation
– a result that will eventually appear in the Future after the
processing is complete.
I understood the concept theoretically, but I don't know how to apply the java.util.concurrent.Future API to the code above instead of creating threads explicitly.
Could anyone let me know how I can implement multithreading for the methods getGpTableCount() and getHiveTableCount() using the java.util.concurrent.Future API instead of creating new threads like new Thread(new Runnable()?
You are submitting your tasks using the Runnable interface, which doesn't allow your threads to return a value at the end of the computation (and forces you to use shared variables, gpTableCount and hiveTableCount).
The Callable interface is a later addition which allows your tasks to return a value (in your case, Map<String, String>).
As an alternative to working with threads directly, the Concurrency API introduces the ExecutorService, a higher-level object which manages thread pools and is able to execute tasks asynchronously.
When submitting a task of type Callable to an ExecutorService, you expect the task to produce a value. Since the submission point and the end of the computation aren't coupled, the ExecutorService returns a Future, which lets you get this value and blocks if the value isn't available yet. Hence, Future can be used to synchronize between your different threads.
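As a rough illustration of the paragraph above, here is a sketch that submits two Callables and waits on their Futures instead of polling shared variables. The countGp and countHive methods are hypothetical stand-ins for getGpTableCount() and getHiveTableCount().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TableCounts {

    // Hypothetical stand-in for getGpTableCount().
    static Map<String, String> countGp() {
        Map<String, String> counts = new HashMap<>();
        counts.put("gp_table", "10");
        return counts;
    }

    // Hypothetical stand-in for getHiveTableCount().
    static Map<String, String> countHive() {
        Map<String, String> counts = new HashMap<>();
        counts.put("hive_table", "10");
        return counts;
    }

    static Map<String, String> countBoth() throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Callable<Map<String, String>> gpTask = TableCounts::countGp;
            Callable<Map<String, String>> hiveTask = TableCounts::countHive;
            // Both tasks start running as soon as they are submitted.
            Future<Map<String, String>> gp = pool.submit(gpTask);
            Future<Map<String, String>> hive = pool.submit(hiveTask);
            // get() blocks until the corresponding task has finished.
            Map<String, String> all = new HashMap<>(gp.get());
            all.putAll(hive.get());
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countBoth());
    }
}
```

The two get() calls take the place of the Thread.sleep(5000) polling loop: the caller resumes as soon as both results exist, and an exception thrown inside a task surfaces as an ExecutionException instead of being lost.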
As an alternative to ExecutorService, you can also take a look at FutureTask<V>, which is an implementation of RunnableFuture<V>:
This class provides a base implementation of Future, with methods to start and cancel a computation, query to see if the computation is complete, and retrieve the result of the computation
A FutureTask can be used to wrap a Callable or Runnable object.
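A small sketch of that wrapping, with a made-up class name and a trivial computation standing in for the real work:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

public class FutureTaskDemo {

    static int runInThread(Callable<Integer> work) throws InterruptedException, ExecutionException {
        // FutureTask is both a Runnable (so a Thread can run it) and a Future (so we can get the result).
        FutureTask<Integer> task = new FutureTask<>(work);
        Thread worker = new Thread(task);
        worker.start();
        // Blocks until the worker thread has completed the computation.
        return task.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runInThread(() -> 6 * 7));
    }
}
```

This keeps the explicit Thread from the question but still returns the value through a Future.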
If you are using Java 8+, you can use CompletableFuture.supplyAsync for that. In short:
import static java.util.concurrent.CompletableFuture.supplyAsync;
.....
Future<Map<String, String>> f = supplyAsync(() -> {
try {
return getHiveTableCount();
} catch (Exception e) {
throw new RuntimeException(e);
}
});
By default, CompletableFuture.supplyAsync runs the supplier on ForkJoinPool.commonPool(). It also has another overload that takes an Executor parameter if you want to use your own:
public class CompletableFuture<T>
extends Object
implements Future<T>, CompletionStage<T>
and it has:
public static <U> CompletableFuture<U> supplyAsync(Supplier<U> supplier)
public static <U> CompletableFuture<U> supplyAsync(Supplier<U> supplier,
Executor executor)
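A short sketch of the two-argument overload, assuming a hypothetical slowLookup method as the work to run and a small fixed pool as the custom executor:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SupplyAsyncDemo {

    // Hypothetical stand-in for a slow method such as getHiveTableCount().
    static String slowLookup() {
        return "hive-count";
    }

    static String lookupOnOwnPool() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // The second argument selects the executor instead of ForkJoinPool.commonPool().
            CompletableFuture<String> future =
                    CompletableFuture.supplyAsync(SupplyAsyncDemo::slowLookup, pool);
            // join() waits like get(), but reports failures as unchecked exceptions.
            return future.join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(lookupOnOwnPool());
    }
}
```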
First, create an executor service that best suits your needs, for example:
ExecutorService ex = Executors.newFixedThreadPool(2);
(more on executors: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Executors.html)
Instead of a Runnable object, use a Callable, which is similar to Runnable but returns a value (more on Callable: https://docs.oracle.com/javase/8/docs/api/index.html?java/util/concurrent/Callable.html):
Callable<Map<String, String>> callable1 = // your Callable class
The type parameter should be the same as the type you would like to return as a result.
Next create a list of your tasks:
List<Callable<Map<String, String>>> tasks = new LinkedList<>();
tasks.add(callable1);
tasks.add(callable2);
and execute them:
List<Future<Map<String, String>>> results = ex.invokeAll(tasks);
The above method returns when all tasks are completed (if I understand your case correctly, this is what you would like to achieve). Note that a completed task may have terminated either normally or by throwing an exception.
At the end, shut down the executor service:
ex.shutdown();
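Putting these steps together, and showing how a task that ended with an exception can be told apart from one that ended normally, a sketch might look like this (String results and the class name are assumptions made to keep the example short):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InvokeAllDemo {

    static List<String> runAll(List<Callable<String>> tasks) throws InterruptedException {
        ExecutorService ex = Executors.newFixedThreadPool(2);
        List<String> outcomes = new ArrayList<>();
        try {
            // invokeAll blocks until every task is done; futures come back in task order.
            for (Future<String> future : ex.invokeAll(tasks)) {
                try {
                    outcomes.add(future.get());
                } catch (ExecutionException e) {
                    // The task threw; the original exception is available as the cause.
                    outcomes.add("failed: " + e.getCause().getMessage());
                }
            }
        } finally {
            ex.shutdown();
        }
        return outcomes;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Callable<String>> tasks = new ArrayList<>();
        tasks.add(() -> "ok");
        tasks.add(() -> { throw new IllegalStateException("boom"); });
        System.out.println(runAll(tasks));
    }
}
```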

Multithreading java optimization

In my program I am trying to grasp how to use ExecutorService to optimize my program.
For some reason, it gets stuck for a while on two URLs: http://sjsu.edu/ and https://paypal.com. While it sits on these two, it does not continue executing other URLs.
Shouldn't the other 3 available threads continue even though the two domains aren't responding fast enough?
What is the best way to fix this?
public class SequentialPinger {
public static void main(String args[]) throws Exception {
String[] hostList = {"http://crunchify.com", "http://yahoo.com",
"http://www.ebay.com", "http://google.com",
"http://www.example.co", "https://paypal.com",
"http://bing.com/", "http://techcrunch.com/",
"http://mashable.com/", "http://thenextweb.com/",
"http://wordpress.com/", "http://cphbusiness.dk/",
"http://example.com/", "http://sjsu.edu/",
"http://ebay.co.uk/", "http://google.co.uk/",
"http://www.wikipedia.org/",
"http://dr.dk", "http://pol.dk", "https://www.google.dk",
"http://phoronix.com", "http://www.webupd8.org/",
"https://studypoint-plaul.rhcloud.com/", "http://stackoverflow.com",
"http://docs.oracle.com", "https://fronter.com",
"http://imgur.com/", "http://www.imagemagick.org"
};
List<CallableImpl> callList = new ArrayList<>();
ExecutorService es = Executors.newFixedThreadPool(4);
for (String url : hostList) {
CallableImpl callable = new CallableImpl(url);
callList.add(callable);
}
for (CallableImpl callableImpl : callList) {
System.out.println("Trying to connect to: " + callableImpl.getUrl());
Future<String> lol = es.submit(callableImpl);
System.out.println("status: " + lol.get());
}
es.shutdown();
}
}
My Callable implementation
public class CallableImpl implements Callable<String> {
private final String url;
public CallableImpl(String url) {
this.url = url;
}
public String getUrl() {
return url;
}
@Override
public String call() {
String result = "Error";
try {
URL siteURL = new URL(url);
HttpURLConnection connection = (HttpURLConnection) siteURL
.openConnection();
connection.setRequestMethod("GET");
connection.connect();
int code = connection.getResponseCode();
if (code == 200) {
result = "Green";
}
if (code == 301) {
result = "Redirect";
}
} catch (IOException e) {
result = "->Red<-";
}
return result;
}
}
In your code you submit the Callables to the ExecutorService one by one and immediately call Future.get(), which blocks until the result is ready (or an exception is thrown at runtime).
It would be better to wrap the ExecutorService in a CompletionService, which provides results as soon as they are ready, and to split the for loop into two loops: one to submit all the Callables and a second to check the results.
ExecutorService es = Executors.newFixedThreadPool(4);
ExecutorCompletionService<String> completionService = new ExecutorCompletionService<>(es);
for (CallableImpl callableImpl : callList) {
System.out.println("Trying to connect to: " + callableImpl.getUrl());
completionService.submit(callableImpl);
}
for (int i = 0; i < callList.size(); ++i) {
completionService.take().get(); //fetch next finished Future and check its result
}
Problem
You call get() on the Future directly after creating it, which blocks the main thread until that task finishes. As a result there are no parallel calls at all, and the ExecutorService is essentially useless. Your code is equivalent to simply calling callableImpl.call() yourself.
Solution
Don't call get() right after each submit if you want each CallableImpl to run in parallel. Instead, submit all tasks first, then call es.shutdown() followed by es.awaitTermination(timeout, unit) to wait for them to finish.
I suggest using a CompletableFuture added in Java 8 and add a callback to it.
CompletableFuture.supplyAsync(myCallable::call, es)
.thenAccept(result -> {
something(result);
});
I would suggest making your Callable be a Supplier to make this simpler.
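A sketch combining both suggestions: callbacks instead of get(), then shutdown() and awaitTermination() to wait for everything. The check method is a hypothetical stand-in for CallableImpl.call(), and the 30-second limit is an arbitrary choice.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class NonBlockingPinger {

    // Hypothetical stand-in for CallableImpl.call(); no network access here.
    static String check(String url) {
        return url.startsWith("https") ? "Green" : "Redirect";
    }

    static Queue<String> pingAll(List<String> urls) throws InterruptedException {
        Queue<String> report = new ConcurrentLinkedQueue<>();
        ExecutorService es = Executors.newFixedThreadPool(4);
        for (String url : urls) {
            // Nothing blocks here: the callback runs once the status is available.
            CompletableFuture.supplyAsync(() -> check(url), es)
                    .thenAccept(status -> report.add(url + " -> " + status));
        }
        es.shutdown();                              // accept no new tasks
        es.awaitTermination(30, TimeUnit.SECONDS);  // wait for the submitted ones
        return report;
    }

    public static void main(String[] args) throws InterruptedException {
        pingAll(Arrays.asList("https://a.example", "http://b.example"))
                .forEach(System.out::println);
    }
}
```

With this shape a slow host occupies only one of the four pool threads while the other three keep working through the remaining URLs.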
You wrote: "it does not continue executing other URLS". I believe it does, but your log messages are misleading because they are not tightly connected to the actual execution. To fix this, do the following:
Move System.out.println("Trying to connect to: ") and System.out.println("status: ") into the CallableImpl.call() method.
Do not call lol.get() at all.
This way you will see the actual sequence of the start and the end of handling each URL.

An Efficient concurrent data structure to wait for a computed value (or timeout)

I'm hoping some concurrency experts can advise as I'm not looking to rewrite something that likely exists.
Picture the problem; I have a web connection that comes calling looking for their unique computed result (with a key that they provide in order to retrieve their result) - however the result may not have been computed YET so I would like for the connection to wait (block) for UP TO n seconds before giving up and telling them I don't (yet) have their result (computation time to calculate value is non deterministic). something like;
String getValue (String key)
{
String value = [MISSING_PIECE_OF_PUZZLE].getValueOrTimeout(key, 10, TimeUnit.SECONDS)
if (value == null)
return "Not computed within 10 Seconds";
else
return "Value was computed and was " + value;
}
and then have other threads (the computation threads) doing the calculations, something like:
public void writeValues()
{
....
[MISSING_PIECE_OF_PUZZLE].put(key, computedValue)
}
In this scenario, there are a number of threads working in the background to compute the values that will ultimately be picked up by the web connections. The web connections have NO control or authority over what is computed and when the computations execute. As I've said, this is done in a pool in the background, but these threads can publish when the computation has completed (how they do so is the gist of this question). The published message may or may not be consumed, depending on whether any subscribers are interested in this computed value.
As these are web connections that will be blocking, I could potentially have 1000s of concurrent connections waiting (subscribing) for their specific computed value, so the solution needs to be very light on blocking resources. The closest I've come to is this SO question, which I will explore further, but I wanted to check I'm not missing something blindingly obvious before writing this myself.
I think you should use a Future. It lets you compute data in a separate thread and block for the requested time period while waiting for an answer. Notice how get throws a TimeoutException if more than 3 seconds pass:
public class MyClass {
// Simulates heavy work that takes 10 seconds
private static int getValueOrTimeout() throws InterruptedException {
TimeUnit.SECONDS.sleep(10);
return 123;
}
public static void main(String... args) throws InterruptedException, ExecutionException {
Callable<Integer> task = () -> {
Integer val = null;
try {
val = getValueOrTimeout();
} catch (InterruptedException e) {
throw new IllegalStateException("task interrupted", e);
}
return val;
};
ExecutorService executor = Executors.newFixedThreadPool(1);
Future<Integer> future = executor.submit(task);
System.out.println("future done? " + future.isDone());
try {
Integer result = future.get(3, TimeUnit.SECONDS);
System.out.print("Value was computed and was : " + result);
} catch (TimeoutException ex) {
System.out.println("Not computed within 3 seconds");
} finally {
executor.shutdownNow(); // interrupt the worker so the JVM can exit
}
}
}
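The Future above belongs to a single submitted task. For the keyed lookup described in the question, one possible sketch keeps a CompletableFuture per key in a ConcurrentHashMap, so that writers complete it and readers wait on it with a timeout. The ResultBoard class is hypothetical, and the sketch never removes entries, which a real service would need to handle.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ResultBoard {

    private final ConcurrentMap<String, CompletableFuture<String>> slots = new ConcurrentHashMap<>();

    // Readers and writers share one future per key, whoever arrives first creates it.
    private CompletableFuture<String> slot(String key) {
        return slots.computeIfAbsent(key, k -> new CompletableFuture<>());
    }

    // Called by a computation thread once the value is known.
    public void put(String key, String value) {
        slot(key).complete(value);
    }

    // Returns the value, or null if it was not published within the timeout.
    public String getValueOrTimeout(String key, long timeout, TimeUnit unit) throws InterruptedException {
        try {
            return slot(key).get(timeout, unit);
        } catch (TimeoutException e) {
            return null;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ResultBoard board = new ResultBoard();
        new Thread(() -> board.put("TheKey", "computed")).start();
        System.out.println(board.getValueOrTimeout("TheKey", 10, TimeUnit.SECONDS));
    }
}
```

Each waiting reader still occupies a thread while it blocks, so this addresses the keyed hand-off but not the cost of thousands of blocked connections.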
After looking at the changes to your question, I wanted to suggest a different approach using a BlockingQueue. In that case the producer logic is completely separated from the consumer, so you could do something like this:
public class MyClass {
private static BlockingQueue<String> queue = new ArrayBlockingQueue<>(10);
private static Map<String, String> dataComputed = new ConcurrentHashMap<>();
public static void writeValues(String key) {
Random r = new Random();
try {
// Simulate working for long time
TimeUnit.SECONDS.sleep(r.nextInt(11));
String value = "Hello there fdfsd" + Math.random();
queue.offer(value);
dataComputed.putIfAbsent(key, value);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
private static String getValueOrTimeout(String key) throws InterruptedException {
String result = dataComputed.get(key);
if (result == null) {
result = queue.poll(10, TimeUnit.SECONDS);
}
return result;
}
public static void main(String... args) throws InterruptedException, ExecutionException {
String key = "TheKey";
Thread producer = new Thread(() -> {
writeValues(key);
});
Thread consumer = new Thread(() -> {
try {
String message = getValueOrTimeout(key);
if (message == null) {
System.out.println("No message in 10 seconds");
} else {
System.out.println("The message:" + message);
}
} catch (InterruptedException e) {
e.printStackTrace();
}
});
consumer.start();
producer.start();
}
}
With that said, I have to agree with @earned that making the client thread wait is not a good approach. Instead, I would suggest using a WebSocket, which lets you push data to the client when it is ready. You can find lots of tutorials on WebSocket; here is one, for example: ws tutorial

ThreadPoolExecutor Implementation for method taking ArrayList Parameter

I have the following code:
private List<String[]> userList2 = new ArrayList<String[]>(10000);
ThreadPoolExecutor executor = new ThreadPoolExecutor(10, 10, 10, TimeUnit.SECONDS,
new ArrayBlockingQueue<Runnable>(5), new ThreadPoolExecutor.CallerRunsPolicy());
Database Query
while (rs.next())
{
data = new String[2];
data[0] = rs.getString("userid");
data[1] = rs.getString("email");
userList2.add(data);
if(userList2.size()==10000) //Confusion in this part..
{
final List<String[]> elist = new ArrayList<String[]>(userList2);
executor.execute(new Runnable() {
public void run() {
doBilling(con,elist); //Parallel is not happening here...
}
});
I have a method
void doBilling(Connection con,List<String[]> userList)
{
String list[]=null;
String userid=" ";
for(int i=0;i<userList.size();i++)
{
list=userList.get(i);
userid=list[0]; // first element of each row is the userid
list = BillingDao.billById(userid, con);
}
}
When the size of userList2 reaches 10000, I want to run doBilling in 10 threads in parallel so that the 10000 records are processed quickly. But it's not happening. Please suggest what I am doing wrong and how this should be solved.
Thanks in advance
You submit only one task to the executor, and only when if(userList2.size()==10000) is true. Consequently, you have one thread processing all 10000 elements of the ArrayList.
If you want 10 threads to execute 10000 elements of ArrayList your code should be something like this:
while (rs.next())
{
data = new String[2];
data[0] = rs.getString("userid");
data[1] = rs.getString("email");
userList2.add(data);
if(userList2.size()% 1000 == 0) //Check if size is multiple of 1000(obtained by 10000/10)
{
final List<String[]> elist = new ArrayList<String[]>(userList2);
executor.execute(new Runnable()
{
public void run()
{
doBilling(con,elist); // each task works on its own copy of the batch, on a separate thread
}
});
userList2.clear();//clearing the arrayList so that next time when submitted to executor , new elements are processed.
}
}
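One thing the loop above leaves open is the final partial batch: rows that arrive after the last multiple of 1000 are never submitted. A self-contained sketch of the batching idea that also flushes the remainder follows. It assumes an in-memory list in place of the ResultSet, and a counter in place of the real doBilling call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BatchSubmitter {

    // Copies the current batch, hands the copy to the pool, and empties the buffer.
    private static void submitBatch(ExecutorService executor, List<String[]> batch, AtomicInteger billed) {
        final List<String[]> copy = new ArrayList<>(batch);
        // Stand-in for doBilling(con, copy): each task only counts its rows here.
        executor.execute(() -> billed.addAndGet(copy.size()));
        batch.clear();
    }

    static int billInBatches(List<String[]> rows, int batchSize) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(10);
        AtomicInteger billed = new AtomicInteger();
        List<String[]> batch = new ArrayList<>(batchSize);
        for (String[] row : rows) {
            batch.add(row);
            if (batch.size() == batchSize) {
                submitBatch(executor, batch, billed);
            }
        }
        // Flush the last, possibly smaller, batch.
        if (!batch.isEmpty()) {
            submitBatch(executor, batch, billed);
        }
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
        return billed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            rows.add(new String[] {"user" + i, "user" + i + "@example.com"});
        }
        System.out.println(billInBatches(rows, 1000) + " rows billed");
    }
}
```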
