Java ExecutorService get feedback for all tasks

I want to send email to many (500, 1000, 2000) users.
I have done that using ExecutorService.
But now I want to collect the number of successful emails and the number of failed emails out of the total records.
I have implemented this like:
int startValue = 0;
int endValue = 0;
List<String> userEmailList = getListFromDB();
ExecutorService e = Executors.newFixedThreadPool(10);
Collection<MyTask> c = new ArrayList<>();
while (someflag) {
    // in MyTask class I am sending email to users
    c.add(new MyTask(startValue, endValue, userEmailList));
}
e.invokeAll(c); // here I am calling invokeAll
e.shutdown();
public class MyTask implements Callable<String> {
    private final int startValue;
    private final int endValue;
    private final List<String> userEmailList;

    MyTask(int startValue, int endValue, List<String> userEmailList) {
        this.startValue = startValue;
        this.endValue = endValue;
        this.userEmailList = userEmailList;
    }

    public String call() {
        // e.g. batch 1 covers startValue => endValue = 0 - 99,
        // batch 2 covers 100 - 199, batch 3 covers 200 - 299, and so on up to 400 - 499
        for (int i = startValue; i < endValue; i++) {
            sendEmailToUser(userEmailList.get(i));
        }
        return "done";
    }
}
But future.get() is returning one result per task, so with the above code I get 5 results for 5 tasks.
What I want as output is the number of failed emails and the number of successfully sent emails.
For example, if there are 500 email users and 20 failed, then the output should be 480 success and 20 failed.
But with the above code I am only getting the number of tasks, i.e. 5 tasks.
Can anybody tell me how I can get feedback from all concurrent tasks (not just the number of tasks completed)?

Your MyTask returns a String (implements Callable<String>), which doesn't make much sense in your case. You are free to return any other type you want. Unfortunately you'll need some simple POJO to contain the results, e.g.:
public class Result {
    private final int successCount;
    private final int failureCount;

    public Result(int successCount, int failureCount) {
        this.successCount = successCount;
        this.failureCount = failureCount;
    }

    public int getSuccessCount() { return successCount; }
    public int getFailureCount() { return failureCount; }
}
And return it after a given batch is done (implement Callable<Result>). Of course your MyTask will then have to keep track of how many e-mails failed and return the correct values wrapped in a Result.
However, I see several ways your code can be improved. First of all, instead of passing the startValue, endValue range to MyTask, just use userEmailList.subList(startValue, endValue), which will simplify your code a lot:
new MyTask(userEmailList.subList(startValue, endValue));
//...

public class MyTask implements Callable<Result> {
    private final List<String> userEmailList;

    MyTask(List<String> userEmailList) {
        this.userEmailList = userEmailList;
    }

    public Result call() {
        int success = 0, failure = 0;
        for (String email : userEmailList) {
            sendEmailToUser(email);
            // collect results here, e.g. increment success/failure per e-mail
        }
        return new Result(success, failure);
    }
}
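To tie this together, here is a minimal sketch of how the submitting side could aggregate the per-batch results (assuming the Result class above and that sendEmailToUser reports failures so the counters can be maintained; exception handling elided):

List<Callable<Result>> tasks = new ArrayList<>();
for (int start = 0; start < userEmailList.size(); start += 100) {
    int end = Math.min(start + 100, userEmailList.size());
    tasks.add(new MyTask(userEmailList.subList(start, end)));
}

ExecutorService executor = Executors.newFixedThreadPool(10);
int totalSuccess = 0, totalFailure = 0;
for (Future<Result> future : executor.invokeAll(tasks)) { // blocks until all batches finish
    Result result = future.get(); // rethrows anything thrown inside call()
    totalSuccess += result.getSuccessCount();
    totalFailure += result.getFailureCount();
}
executor.shutdown();
System.out.println(totalSuccess + " success, " + totalFailure + " failed");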
On the other hand, there is nothing wrong with creating a MyTask that sends just one e-mail. That way, instead of aggregating counts per batch, you simply check the result of one task (one e-mail) - either nothing or an exception (or a single Boolean). It's much easier and shouldn't be slower.

I can see that your call method is declared to return a String, but your code doesn't return anything (probably an incomplete snippet). And from your statement, I understand that you are returning whether the task is completed, not whether the mail has been sent. You could make sendEmailToUser return success or failure depending on whether the mail has been sent successfully, and get the result using Future.get.
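A minimal sketch of that per-e-mail variant, assuming sendEmailToUser is changed to return a boolean (true on success):

List<Callable<Boolean>> tasks = new ArrayList<>();
for (String email : userEmailList) {
    tasks.add(() -> sendEmailToUser(email)); // one task per e-mail
}
int success = 0, failed = 0;
for (Future<Boolean> f : executor.invokeAll(tasks)) {
    if (f.get()) success++; else failed++; // get() also surfaces exceptions from the task
}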

Related

Stock statistics calculation with O(1) time and space complexity

I have to design a REST API in Java which:
Accepts a POST request with the below JSON:
{
    "instrument": "ABC",
    "price": "200.90",
    "timestamp": "2018-09-25T12:00:00"
}
These records would be saved in an in-memory collection, not in any kind of database.
There would be a GET API which returns the statistics of a specific instrument's records received in the last 60 seconds. The GET request would be /statistics/{instrumentName}, e.g. /statistics/ABC. The response looks as mentioned below:
{
    "count": "3",
    "min": "100.00",
    "max": "200.00",
    "sum": "450.00",
    "avg": "150.00"
}
There would be another GET request, /statistics, which returns the statistics of all the instruments received in the last 60 seconds (not specific to a particular instrument like #2).
What makes this algorithm complex to implement is that the GET call should execute in O(1) time and space complexity.
The approach I have thought of for #3 is to have a collection with 60 buckets (since we have to calculate for the past 60 secs, sampling per 1 sec). Every time a transaction comes in, it goes to a specific bucket depending on the key, i.e. hour-min-sec (it would be a map with this key and the statistics for that sec).
But what I am not able to understand is how to address problem #2, where we have to get the statistics of a specific instrument (/statistics/ABC) for the last 60 secs in O(1) time and space complexity.
What could be the best strategy to clean up records which are older than 60 secs?
Any help with the algorithm will be appreciated.
Store the data in a Map<String, Instrument>, and have the class look like this:
class Instrument {
    private String name;
    private SortedMap<LocalDateTime, BigDecimal> prices;
    private BigDecimal minPrice;
    private BigDecimal maxPrice;
    private BigDecimal sumPrice;

    // Internal helper method
    private void cleanup() {
        LocalDateTime expireTime = LocalDateTime.now().minusSeconds(60);
        Map<LocalDateTime, BigDecimal> expiredPrices = this.prices.headMap(expireTime);
        for (BigDecimal expiredPrice : expiredPrices.values()) {
            if (this.minPrice.compareTo(expiredPrice) == 0)
                this.minPrice = null;
            if (this.maxPrice.compareTo(expiredPrice) == 0)
                this.maxPrice = null;
            this.sumPrice = this.sumPrice.subtract(expiredPrice);
        }
        expiredPrices.clear(); // removes expired prices from this.prices, since headMap is a view
        if (this.minPrice == null && !this.prices.isEmpty())
            this.minPrice = this.prices.values().stream().min(Comparator.naturalOrder()).get();
        if (this.maxPrice == null && !this.prices.isEmpty())
            this.maxPrice = this.prices.values().stream().max(Comparator.naturalOrder()).get();
    }

    // other code
}
All the public methods of Instrument must be synchronized and must start with a call to cleanup(), since time has elapsed since any previous call. The addPrice(LocalDateTime, BigDecimal) method must of course update the 3 statistics fields.
To ensure statistics are in sync, it would be appropriate to have a Statistics class that can be used as return value, so all 4 main statistics values (incl. count obtained from this.prices.size()) represent the same set of prices.
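A hedged sketch of those remaining pieces; the Statistics class and the addPrice/getStatistics methods below are illustrative, not part of the answer above:

class Statistics {
    final int count;
    final BigDecimal min, max, sum, avg;

    Statistics(int count, BigDecimal min, BigDecimal max, BigDecimal sum) {
        this.count = count;
        this.min = min;
        this.max = max;
        this.sum = sum;
        this.avg = (count == 0) ? BigDecimal.ZERO
                : sum.divide(BigDecimal.valueOf(count), 2, RoundingMode.HALF_UP);
    }
}

// Inside Instrument:
public synchronized void addPrice(LocalDateTime time, BigDecimal price) {
    cleanup(); // time has elapsed since any previous call
    prices.put(time, price);
    sumPrice = sumPrice.add(price);
    if (minPrice == null || price.compareTo(minPrice) < 0) minPrice = price;
    if (maxPrice == null || price.compareTo(maxPrice) > 0) maxPrice = price;
}

public synchronized Statistics getStatistics() {
    cleanup();
    return new Statistics(prices.size(), minPrice, maxPrice, sumPrice);
}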

Multiple threads writing to same list in java and return that list to a function

So I have a really large list of zip codes (about 80,000) that I want to pass to a URL and get the JSON data from that URL for each zip code.
I am running a query on that JSON to see if it has the end_lat, and if it does then I want to save that zip code to a list.
As I am fetching and matching JSON for a lot of zip codes, it's taking forever.
So I tried a few different methods to make it a multi-threaded application. I tried the good old Thread method with the Runnable interface.
I tried executor services. But everything stops abruptly, which makes me believe that I should be making synchronized writes to that list.
public void breakingZipCodesForThreads() {
    List<String> zip_Codes = Serenity.sessionVariableCalled("zipCodes");
    int size = (int) Math.ceil(zip_Codes.size() / 5.0);
    ExecutorService executor = Executors.newFixedThreadPool(4);
    for (int start = 0; start < zip_Codes.size(); start += size) {
        int end = Math.min(start + size, zip_Codes.size());
        Runnable worker = new MyRunnable(zip_Codes.subList(start, end));
        executor.execute(worker);
    }
}

// run() method basically has this code, iterating over the sublist given to MyRunnable
for (String zipCode : zip_Codes) {
    currentPage = pageUrl + zipCode;
    Response response = given().urlEncodingEnabled(false)
            .when()
            .get(currentPage);
    try {
        Object end_lat = response.getBody().path("end_lat");
        if (end_lat != null && !end_lat.toString().isEmpty()) {
            resultantZipCode.add(zipCode);
        }
    } catch (Exception e) {
        // Something else
    }
}
So essentially I want all my threads to concurrently write to the list resultantZipCode and, in the end, give me a single list of all the zip codes that satisfy my condition.
So how do I break my zip codes into pieces, run the run function on them in parallel, and collect all the resultant zip codes into one list that is returned to me? What am I missing?
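One way to get that shape without sharing a mutable list at all - a hedged sketch, where checkZipCode is a hypothetical method that fetches the JSON and tests end_lat - is to make each chunk a Callable that returns its own matches, then merge at the end:

ExecutorService executor = Executors.newFixedThreadPool(4);
List<Callable<List<String>>> workers = new ArrayList<>();
for (int start = 0; start < zipCodes.size(); start += size) {
    List<String> chunk = zipCodes.subList(start, Math.min(start + size, zipCodes.size()));
    workers.add(() -> {
        List<String> matches = new ArrayList<>(); // thread-local list, no contention
        for (String zip : chunk) {
            if (checkZipCode(zip)) { // hypothetical: fetch JSON, test end_lat
                matches.add(zip);
            }
        }
        return matches;
    });
}
List<String> resultantZipCodes = new ArrayList<>();
for (Future<List<String>> f : executor.invokeAll(workers)) { // waits for all workers
    resultantZipCodes.addAll(f.get()); // merging happens on one thread, afterwards
}
executor.shutdown();

Alternatively, if you do want one shared list, wrap it (Collections.synchronizedList(new ArrayList<>()) or a ConcurrentLinkedQueue) and call executor.shutdown() followed by awaitTermination before reading it.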

How to process data in chunks in java using Multi Threading?

I am working on a task in which I need to process data in chunks. I have a properties file in which I define the chunk size, suppose 500, and the data that I am getting from the database is, suppose, 1000 records. I want to process those 1000 records in chunks of 500 each using multi-threading.
This is the first time I am implementing this, so please let me know if I can achieve the same using another technique. The main purpose behind this is that I am generating an Excel file in which I populate the data keeping the chunk size in mind. So probably the first thread processes 500 records and the second thread the next 500.
Partial Code (Rest parses the xml and writes in Excel using POI)
public List<NYProgramTO> getNYPPAData() throws Exception {
    this.getConfiguration();
    List<NYProgramTO> to = dao.getLatestNYData();
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    Document document = null;
    // Returns chunkSize
    List<NYProgramTO> myList = getNextChunk(to);
    ExecutorService executor = Executors.newFixedThreadPool(myList.size());
    myList.stream()
          .forEach((NYProgramTO nyTo) ->
              executor.execute(new NYExecutorThread(nyTo, migrationConfig, appContext, dao)));
    executor.shutdown();
    executor.awaitTermination(300, TimeUnit.SECONDS);
    System.gc();
    // ... rest of the method parses the XML and writes the Excel file
}
The dao.getLatestNYData() method returns the total set of records from the database, and this is how I populate the list to.
I have the following method which gives me the next chunk, so if 500 records have been processed, this method should give me the next 500 records to process (hope this makes sense).
private static List<NYProgramTO> getNextChunk(List<NYProgramTO> list) {
    List<NYProgramTO> nyList = new ArrayList<>();
    if (list.isEmpty()) {
        return nyList;
    }
    int totalCount = list.size();
    for (int i = currentIndex; i < (currentIndex + chunkSize); i++) {
        if (i == totalCount) break;
        nyList.add(list.get(i));
    }
    currentIndex += nyList.size(); // advance the static class variable so the next call returns the next chunk
    return nyList;
}
In my first method I create the threads, but I am not sure how many threads I need to create; currently I am passing the size of the list that I receive from the getNextChunk() method.
The NYExecutorThread class simply implements Runnable and I don't have any logic in it yet. Currently I simply pass parameters to the constructor to be able to get the configurations and create threads.
It is a little confusing, and if anyone has implemented such logic, please let me know how I can go ahead with this.
Thanks
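A hedged sketch of one common shape for this: carve the list into chunks up front with subList, size the pool to the number of chunks, submit one task per chunk, and wait for all of them. ChunkWorker is an illustrative Runnable, not a class from the question:

int chunkSize = 500; // read from the properties file
List<NYProgramTO> all = dao.getLatestNYData();

List<List<NYProgramTO>> chunks = new ArrayList<>();
for (int start = 0; start < all.size(); start += chunkSize) {
    chunks.add(all.subList(start, Math.min(start + chunkSize, all.size())));
}

ExecutorService executor = Executors.newFixedThreadPool(chunks.size()); // one thread per chunk
List<Future<?>> futures = new ArrayList<>();
for (List<NYProgramTO> chunk : chunks) {
    futures.add(executor.submit(new ChunkWorker(chunk))); // worker writes its own region of the sheet
}
for (Future<?> f : futures) {
    f.get(); // propagates any failure from a worker
}
executor.shutdown();

This also avoids the stateful getNextChunk bookkeeping entirely, since the subList views partition the data once.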

Spark streaming mapWithState timeout delayed?

I expected the new mapWithState API for Spark 1.6+ to near-immediately remove objects that are timed-out, but there is a delay.
I'm testing the API with the adapted version of the JavaStatefulNetworkWordCount below:
SparkConf sparkConf = new SparkConf()
        .setAppName("JavaStatefulNetworkWordCount")
        .setMaster("local[*]");
JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, Durations.seconds(1));
ssc.checkpoint("./tmp");

StateSpec<String, Integer, Integer, Tuple2<String, Integer>> mappingFunc =
        StateSpec.function((word, one, state) -> {
            if (state.isTimingOut()) {
                System.out.println("Timing out the word: " + word);
                return new Tuple2<String, Integer>(word, state.get());
            } else {
                int sum = one.or(0) + (state.exists() ? state.get() : 0);
                Tuple2<String, Integer> output = new Tuple2<String, Integer>(word, sum);
                state.update(sum);
                return output;
            }
        });

JavaMapWithStateDStream<String, Integer, Integer, Tuple2<String, Integer>> stateDstream =
        ssc.socketTextStream(args[0], Integer.parseInt(args[1]),
                StorageLevels.MEMORY_AND_DISK_SER_2)
           .flatMap(x -> Arrays.asList(SPACE.split(x)))
           .mapToPair(w -> new Tuple2<String, Integer>(w, 1))
           .mapWithState(mappingFunc.timeout(Durations.seconds(5)));

stateDstream.stateSnapshots().print();
Together with nc (nc -l -p <port>):
When I type a word into the nc window I see the tuple being printed in the console every second. But the timing-out message doesn't seem to get printed 5 s later, as expected based on the timeout set. The time it takes for the tuple to expire seems to vary between 5 and 20 seconds.
Am I missing some configuration option, or is the timeout perhaps only performed at the same time as checkpoints?
Once an event times out it's NOT deleted right away, but is only marked for deletion by saving it to a 'deltaMap':
override def remove(key: K): Unit = {
  val stateInfo = deltaMap(key)
  if (stateInfo != null) {
    stateInfo.markDeleted()
  } else {
    val newInfo = new StateInfo[S](deleted = true)
    deltaMap.update(key, newInfo)
  }
}
Then, timed-out events are collected and sent to the output stream only at checkpoint. That is: events which time out at batch t will appear in the output stream only at the next checkpoint - by default, after 5 batch intervals on average, i.e. at batch t+5:
override def checkpoint(): Unit = {
  super.checkpoint()
  doFullScan = true
}
...
removeTimedoutData = doFullScan // remove timedout data only when full scan is enabled
...
// Get the timed out state records, call the mapping function on each and collect the
// data returned
if (removeTimedoutData && timeoutThresholdTime.isDefined) {
  ...
Elements are actually removed only when there are enough of them, and when the state map is being serialized - which currently also happens only at checkpoint:
/** Whether the delta chain length is long enough that it should be compacted */
def shouldCompact: Boolean = {
  deltaChainLength >= deltaChainThreshold
}

// Write the data in the parent state map while copying the data into a new parent map for
// compaction (if needed)
val doCompaction = shouldCompact
...
By default, checkpointing occurs every 10 batch intervals, thus in the example above every 10 seconds; since your timeout is 5 seconds, expired events are expected to show up within 5-15 seconds.
EDIT: Corrected and elaborated answer following comments by @YuvalItzchakov
Am I missing some configuration option, or is the timeout perhaps only
performed at the same time as snapshots?
Every time a mapWithState is invoked (with your configuration, around every 1 second), the MapWithStateRDD will internally check for expired records and time them out. You can see it in the code:
// Get the timed out state records, call the mapping function on each and collect the
// data returned
if (removeTimedoutData && timeoutThresholdTime.isDefined) {
  newStateMap.getByTime(timeoutThresholdTime.get).foreach { case (key, state, _) =>
    wrappedState.wrapTimingOutState(state)
    val returned = mappingFunction(batchTime, key, None, wrappedState)
    mappedData ++= returned
    newStateMap.remove(key)
  }
}
(Other than the time taken to execute each job, it turns out that newStateMap.remove(key) actually only marks entries for deletion. See "Edit" for more.)
You have to take into account the time it takes for each stage to be scheduled, and the amount of time it takes for each execution of such a stage to actually take its turn and run. It isn't accurate because this runs as a distributed system, where other factors can come into play, making your timeout more or less accurate than you expect it to be.
Edit
As @etov rightly points out, newStateMap.remove(key) doesn't actually remove the element from the OpenHashMapBasedStateMap[K, S], but simply marks it for deletion. This is also a reason why you're seeing the expiration time adding up.
The actual relevant piece of code is here:
// Write the data in the parent state map while
// copying the data into a new parent map for compaction (if needed)
val doCompaction = shouldCompact
val newParentSessionStore = if (doCompaction) {
  val initCapacity = if (approxSize > 0) approxSize else 64
  new OpenHashMapBasedStateMap[K, S](initialCapacity = initCapacity, deltaChainThreshold)
} else { null }

val iterOfActiveSessions = parentStateMap.getAll()
var parentSessionCount = 0

// First write the approximate size of the data to be written, so that readObject can
// allocate appropriately sized OpenHashMap.
outputStream.writeInt(approxSize)

while (iterOfActiveSessions.hasNext) {
  parentSessionCount += 1
  val (key, state, updateTime) = iterOfActiveSessions.next()
  outputStream.writeObject(key)
  outputStream.writeObject(state)
  outputStream.writeLong(updateTime)
  if (doCompaction) {
    newParentSessionStore.deltaMap.update(
      key, StateInfo(state, updateTime, deleted = false))
  }
}

// Write the final limit marking object with the correct count of records written.
val limiterObj = new LimitMarker(parentSessionCount)
outputStream.writeObject(limiterObj)
if (doCompaction) {
  parentStateMap = newParentSessionStore
}
If the deltaMap should be compacted (marked with the doCompaction variable), then (and only then) is the map cleared of all the deleted instances. How often does that happen? Once the delta chain exceeds the threshold:
val DELTA_CHAIN_LENGTH_THRESHOLD = 20
Which means the delta chain is longer than 20 items, and there are items that have been marked for deletion.
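If the reporting delay matters in practice, one knob worth trying - an assumption on my part, not something verified in either answer - is shortening the checkpoint interval of the state stream, so that full scans (and therefore timeout reporting and compaction) happen more often than the default of roughly 10 batch intervals, at the cost of more checkpoint I/O:

// Hedged sketch: checkpoint the state stream every batch interval instead of ~10
stateDstream.checkpoint(Durations.seconds(1));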

Find messages from certain key till certain key while being able to remove stale keys

My problem
Let's say I want to hold my messages in some sort of data structure for a long-polling application:
1. "dude"
2. "where"
3. "is"
4. "my"
5. "car"
Asking for messages from index [4,5] should return "my", "car".
Next, let's assume that after a while I would like to purge old messages because they aren't useful anymore and I want to save memory. Let's say that after time x, messages [1-3] become stale. I assume it would be most efficient to just do the deletion once every x seconds. After that, my data structure should contain:
4. "my"
5. "car"
My solution?
I was thinking of using a ConcurrentSkipListSet or ConcurrentSkipListMap. I was also thinking of deleting the old messages from inside a newSingleThreadScheduledExecutor. I would like to know how you would implement this (efficiently/thread-safely), or maybe there is a library?
The big concern, as I gather it, is how to let certain elements expire after a period. I had a similar requirement, and I created a message class that implemented the Delayed interface. This class held everything I needed for a message and (through the Delayed interface) told me when it had expired.
I used instances of this object within a concurrent collection; you could use a ConcurrentMap because it will allow you to key those objects with an integer key.
I reaped the collection once every so often, removing items whose delay has passed. We test for expiration by using the getDelay method of the Delayed interface:
message.getDelay(TimeUnit.MILLISECONDS);
I used a normal thread that would sleep for a period then reap the expired items. In my requirements it wasn't important that the items be removed as soon as their delay had expired. It seems that you have a similar flexibility.
If you needed to remove items as soon as their delay expired, then instead of sleeping a set period in your reaping thread, you would sleep for the delay of the message that will expire first.
Here's my delayed message class:
class DelayedMessage implements Delayed {
    long endOfDelay;
    Date requestTime;
    String message;

    public DelayedMessage(String m, int delay) {
        requestTime = new Date();
        endOfDelay = System.currentTimeMillis() + delay;
        this.message = m;
    }

    public long getDelay(TimeUnit unit) {
        long delay = unit.convert(
                endOfDelay - System.currentTimeMillis(),
                TimeUnit.MILLISECONDS);
        return delay;
    }

    public int compareTo(Delayed o) {
        DelayedMessage that = (DelayedMessage) o;
        if (this.endOfDelay < that.endOfDelay) {
            return -1;
        }
        if (this.endOfDelay > that.endOfDelay) {
            return 1;
        }
        return this.requestTime.compareTo(that.requestTime);
    }

    @Override
    public String toString() {
        return message;
    }
}
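For completeness, a hedged sketch of the reaping pattern described above, with an illustrative ConcurrentMap keyed by message index (the names are mine, not from the answer):

ConcurrentMap<Integer, DelayedMessage> messages = new ConcurrentHashMap<>();
messages.put(1, new DelayedMessage("dude", 5000)); // expires 5 s from now

// Reaper: sleeps for a period, then removes items whose delay has passed
Thread reaper = new Thread(() -> {
    while (!Thread.currentThread().isInterrupted()) {
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            break;
        }
        messages.values().removeIf(m -> m.getDelay(TimeUnit.MILLISECONDS) <= 0);
    }
});
reaper.setDaemon(true);
reaper.start();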
I'm not sure if this is what you want, but it looks like you need a NavigableMap<K,V> to me.
import java.util.*;

public class NaviMap {
    public static void main(String[] args) {
        NavigableMap<Integer,String> nmap = new TreeMap<Integer,String>();
        nmap.put(1, "dude");
        nmap.put(2, "where");
        nmap.put(3, "is");
        nmap.put(4, "my");
        nmap.put(5, "car");

        System.out.println(nmap);
        // prints "{1=dude, 2=where, 3=is, 4=my, 5=car}"

        System.out.println(nmap.subMap(4, true, 5, true).values());
        // prints "[my, car]"           ^inclusive^

        nmap.subMap(1, true, 3, true).clear();
        System.out.println(nmap);
        // prints "{4=my, 5=car}"

        // wrap into synchronized SortedMap
        SortedMap<Integer,String> ssmap = Collections.synchronizedSortedMap(nmap);

        System.out.println(ssmap.subMap(4, 5));
        // prints "{4=my}"              ^exclusive upper bound!

        System.out.println(ssmap.subMap(4, 5+1));
        // prints "{4=my, 5=car}"       ^ugly but "works"
    }
}
Now, unfortunately there's no easy way to get a synchronized version of a NavigableMap<K,V>. A SortedMap does have subMap, but only one overload, in which the upper bound is strictly exclusive.
API links
SortedMap.subMap
NavigableMap.subMap
Collections.synchronizedSortedMap
