Apache Flink Streaming window WordCount - java

I have the following code to count words from a socketTextStream. Both cumulative word counts and time-windowed word counts are needed. The problem is that cumulateCounts is always the same as the windowed counts. Why does this happen? What is the correct way to calculate cumulative counts based on windowed counts?
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
final HashMap<String, Integer> cumulateCounts = new HashMap<String, Integer>();
final DataStream<Tuple2<String, Integer>> counts = env
.socketTextStream("localhost", 9999)
.flatMap(new Splitter())
.window(Time.of(5, TimeUnit.SECONDS))
counts.addSink(new SinkFunction<Tuple2<String, Integer>>() {
public void invoke(Tuple2<String, Integer> value) throws Exception {
String word = value.f0;
Integer delta_count = value.f1;
Integer count = cumulateCounts.get(word);
if (count == null)
count = 0;
count = count + delta_count;
cumulateCounts.put(word, count);
System.out.println("(" + word + "," + count.toString() + ")");

You should first group-by, and apply the window on the keyed data stream (your code works on Flink 0.9.1 but the new API in Flink 0.10.0 is strict about this):
final DataStream<Tuple2<String, Integer>> counts = env
.socketTextStream("localhost", 9999)
.flatMap(new Splitter())
.groupBy(0) // key the stream by the word field before windowing
.window(Time.of(5, TimeUnit.SECONDS)).sum(1)
If you apply a window on a non-keyed data stream, there will be only a single-threaded window operator on a single machine (i.e., no parallelism) that builds the window over the whole stream (in Flink 0.9.1, this global window can be split into sub-windows by groupBy(); in Flink 0.10.0 this no longer works). To count words, you want to build a window for each distinct key value, i.e., you first get a sub-stream per key value (via groupBy()) and then apply a window operator on each sub-stream (thus, each sub-stream can have its own window operator instance, allowing for parallel execution).
For a global (cumulative) count, you can simply apply a groupBy().sum() construct. First, the stream is split into sub-streams (one for each key value). Second, you compute the sum over the stream. Because the stream is not windowed, the sum is computed cumulatively and updated for each incoming tuple (in more detail, the sum has an initial value of zero and is updated for each tuple as result += tuple.value). After each invocation of sum, the new current result is emitted.
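The running-sum behaviour described above can be modelled in plain Java without any Flink dependency. This is only a sketch of the semantics, not Flink code: the class name RunningSumSketch and the method runningSum are made up for illustration, and a HashMap stands in for the per-key state that Flink would manage.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of a keyed, non-windowed running sum. This is not Flink code. */
class RunningSumSketch {

    /** Feeds (word, count) tuples through a per-key running sum and returns every emitted result. */
    static List<String> runningSum(List<Map.Entry<String, Integer>> tuples) {
        Map<String, Integer> state = new HashMap<>(); // one running result per key
        List<String> emitted = new ArrayList<>();
        for (Map.Entry<String, Integer> t : tuples) {
            // A missing key behaves like an initial value of zero; then result += tuple.value.
            int result = state.merge(t.getKey(), t.getValue(), Integer::sum);
            // The updated result is emitted after every single input tuple.
            emitted.add("(" + t.getKey() + "," + result + ")");
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = new ArrayList<>();
        input.add(new SimpleEntry<>("a", 1));
        input.add(new SimpleEntry<>("b", 1));
        input.add(new SimpleEntry<>("a", 1));
        System.out.println(runningSum(input)); // prints [(a,1), (b,1), (a,2)]
    }
}
```

Note that one result is emitted per input tuple, so the second occurrence of "a" produces (a,2) rather than replacing the earlier (a,1).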
In your code, you should not use your special sink function but do as follows:


Apache Flink: The execution environment and multiple sink

My question might cause some confusion, so please read the Description first; it may help to identify my problem. I will add my code at the end of the question (any suggestions regarding my code structure or implementation are also welcome).
Thank you for any help in advance!
My question:
How do I define multiple sinks in Flink batch processing without having Flink read the data from the source repeatedly?
What is the difference between createCollectionsEnvironment() and getExecutionEnvironment()? Which one should I use in a local environment?
What is the use of env.execute()? My code outputs the result without this statement. If I add this statement, it throws an exception:
Exception in thread "main" java.lang.RuntimeException: No new data sinks have been defined since the last execution. The last execution refers to the latest call to 'execute()', 'count()', 'collect()', or 'print()'.
at org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:940)
at org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:922)
at org.apache.flink.api.java.CollectionEnvironment.execute(CollectionEnvironment.java:34)
at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:816)
at MainClass.main(MainClass.java:114)
I am new to programming. Recently I needed to process some data (grouping data, calculating standard deviation, etc.) using Flink batch processing.
However, I reached a point where I need to output two DataSets.
The structure is something like this:
From Source(Database) -> DataSet 1 (add index using zipWithIndex())-> DataSet 2 (do some calculation while keeping index) -> DataSet 3
First I output DataSet 2; the index runs, e.g., from 1 to 10000.
Then I output DataSet 3, and the index now runs from 10001 to 20000, although I did not change the value in any function.
My guess is that when outputting DataSet 3, instead of reusing the previously calculated DataSet 2, Flink fetches the data from the database again and then repeats the calculation.
With zipWithIndex() this not only gives the wrong index numbers but also increases the number of connections to the database.
I guess that this is related to the execution environment, because using
ExecutionEnvironment env = ExecutionEnvironment.createCollectionsEnvironment();
gives the "wrong" index numbers (10001-20000), while
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
gives the correct index numbers (1-10000).
The time taken and the number of database connections also differ, and the order of the printed output is reversed.
OS, DB, other environment details and versions:
IntelliJ IDEA 2017.3.5 (Community Edition)
Build #IC-173.4674.33, built on March 6, 2018
JRE: 1.8.0_152-release-1024-b15 amd64
JVM: OpenJDK 64-Bit Server VM by JetBrains s.r.o
Windows 10 10.0
My Test code(Java):
public static void main(String[] args) throws Exception {
ExecutionEnvironment env = ExecutionEnvironment.createCollectionsEnvironment();
//Table is used to calculate the standard deviation as I figured that there is no such calculation in DataSet.
BatchTableEnvironment tableEnvironment = TableEnvironment.getTableEnvironment(env);
//Get Data from a mySql database
DataSet<Row> dbData =
.setQuery("select value from $table_name where id =33")
.setRowTypeInfo(new RowTypeInfo(BasicTypeInfo.DOUBLE_TYPE_INFO))
// Add index for assigning group (group capacity is 5)
DataSet<Tuple2<Long, Row>> indexedData = DataSetUtils.zipWithIndex(dbData);
// Replace index(long) with group number(int), and convert Row to double at the same time
DataSet<Tuple2<Integer, Double>> rawData = indexedData.flatMap(new GroupAssigner());
//Using groupBy() to combine individual data of each group into a list, while calculating the mean and range in each group
//put them into a POJO named GroupDataClass
DataSet<GroupDataClass> groupDS = rawData.groupBy("f0").combineGroup(new GroupCombineFunction<Tuple2<Integer, Double>, GroupDataClass>() {
public void combine(Iterable<Tuple2<Integer, Double>> iterable, Collector<GroupDataClass> collector) {
Iterator<Tuple2<Integer, Double>> it = iterable.iterator();
Tuple2<Integer, Double> var1 = it.next();
int groupNum = var1.f0;
// Using max and min to calculate range, using i and sum to calculate mean
double max = var1.f1;
double min = max;
double sum = 0;
int i = 1;
// The list is to store individual value
List<Double> list = new ArrayList<>();
while (it.hasNext())
double next = it.next().f1;
sum += next;
max = next > max ? next : max;
min = next < min ? next : min;
//Store group number, mean, range, and 5 individual values within the group
collector.collect(new GroupDataClass(groupNum, sum / i, max - min, list));
//print because if no sink is created, Flink will not even perform the calculation.
// Get the max group number and range in each group to calculate average range
// if group number start with 1 then the maximum of group number equals to the number of group
// However, because this is the second sink, data will flow from source again, which will double the group number
DataSet<Tuple2<Integer, Double>> rangeDS = groupDS.map(new MapFunction<GroupDataClass, Tuple2<Integer, Double>>() {
public Tuple2<Integer, Double> map(GroupDataClass in) {
return new Tuple2<>(in.groupNum, in.range);
// collect and print as if no sink is created, Flink will not even perform the calculation.
Tuple2<Integer, Double> rangeTuple = rangeDS.collect().get(0);
double range = rangeTuple.f1/ rangeTuple.f0;
System.out.println("range = " + range);
public static class GroupAssigner implements FlatMapFunction<Tuple2<Long, Row>, Tuple2<Integer, Double>> {
public void flatMap(Tuple2<Long, Row> input, Collector<Tuple2<Integer, Double>> out) {
// index 1-5 will be assigned to group 1, index 6-10 will be assigned to group 2, etc.
int n = new Long(input.f0 / 5).intValue() + 1;
out.collect(new Tuple2<>(n, (Double) input.f1.getField(0)));
It's fine to connect a source to multiple sinks; the source gets executed only once and the records are broadcast to all the sinks. See the question "Can Flink write results into multiple files (like Hadoop's MultipleOutputFormat)?"
getExecutionEnvironment() is the right way to get the environment when you want to run your job. createCollectionsEnvironment() is a good way to play around and test. See the documentation.
The exception message is very clear: calling print() or collect() already executes your data flow. So you have two choices:
Either you call print()/collect() at the end of your data flow, and it gets executed and printed. That's good for testing. Bear in mind that you should call collect/print only once per data flow; otherwise it gets executed several times while it is not yet completely defined.
Or you add a sink at the end of your data flow and call env.execute(). That's what you want to do once your flow is in a more mature shape.

Compute metrics on different windows in Apache Flink

I am using Apache Flink 1.2 and here's my question:
I have a stream of data and I would like to compute a metric over a window of 1 day. Therefore I will write something like:
DataStream<Tuple6<Timestamp, String, Double, Double, Double, Integer>> myStream0 =
.map(new MyMapper()) // Parse the input
.assignTimestampsAndWatermarks(new MyExtractor()) //Assign the timestamp of the event
.apply(new average()); // compute average, max, sum
Now I would like to compute the same metrics over a window of 1 hour.
I can write the same as before and specify Time.hours(1), but my concern is that this way Apache Flink reads the input file twice and does twice the work. I wonder if there is a way of doing it all together (i.e. using the same stream).
You can compute hourly aggregates and, from those, the daily aggregates. For a simple DataStream<Double>, this would look as follows:
DataStream<Double> vals = ... // source + timestamp extractor
DataStream<Tuple2<Double, Long>> valCnt = vals // (sum, cnt)
.map(new CntAppender()) // Double -> Tuple2<Double, Long(1)>
DataStream<Tuple3<Double, Long, Long>> hourlySumCnt = valCnt // (sum, cnt, endTime)
// SumCounter ReduceFunction sums the Double and Long field (Long is Count)
// WindowEndAppender WindowFunction adds the window end timestamp (3rd field)
.reduce(new SumCounter(), new WindowEndAppender())
DataStream<Tuple2<Double, Long>> hourlyAvg = hourlySumCnt // (avg, endTime)
.map(new SumDivCnt()) // MapFunction divides Sum by Cnt for average
DataStream<Tuple3<Double, Long, Long>> dailySumCnt = hourlySumCnt // (sum, cnt, endTime)
.map(new StripeOffTime()) // removes unnecessary time field -> Tuple2<Double, Long>
.reduce(new SumCounter(), new WindowEndAppender()) // same as above
DataStream<Tuple2<Double, Long>> dailyAvg = dailySumCnt // (avg, endTime)
.map(new SumDivCnt()) // same as above
So, you basically compute sum and count for each hour, and based on that result you
compute the hourly average
compute daily sum and count and the daily average
Note that I am using a ReduceFunction instead of a WindowFunction for the sum and count computation because a ReduceFunction is applied eagerly, i.e., the records of a window are not collected but aggregated immediately. Hence, the state that needs to be maintained is a single record.
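The idea of deriving daily aggregates from hourly (sum, count) pairs, while keeping only one record of state per window, can be sketched in plain Java. This is a model of the arithmetic only, not Flink code: the class HourlyToDailySketch, its SumCnt helper, and the assumption of non-negative millisecond timestamps with tumbling windows aligned to the epoch are all illustrative choices.

```java
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java model of rolling hourly (sum, count) aggregates up into daily ones. */
class HourlyToDailySketch {
    static final long HOUR_MS = 60L * 60 * 1000;
    static final long DAY_MS = 24 * HOUR_MS;

    /** A (sum, count) pair that is updated eagerly, so a window keeps a single record. */
    static final class SumCnt {
        double sum;
        long cnt;

        void add(double s, long c) {
            sum += s;
            cnt += c;
        }

        double avg() {
            return sum / cnt;
        }
    }

    /** Aggregates (timestamp, value) pairs into hourly windows keyed by window start. */
    static Map<Long, SumCnt> hourly(long[] timestamps, double[] values) {
        Map<Long, SumCnt> out = new TreeMap<>();
        for (int i = 0; i < timestamps.length; i++) {
            long windowStart = timestamps[i] - (timestamps[i] % HOUR_MS);
            out.computeIfAbsent(windowStart, k -> new SumCnt()).add(values[i], 1);
        }
        return out;
    }

    /** Builds daily windows from the hourly results only; the raw values are not read again. */
    static Map<Long, SumCnt> daily(Map<Long, SumCnt> hourly) {
        Map<Long, SumCnt> out = new TreeMap<>();
        for (Map.Entry<Long, SumCnt> e : hourly.entrySet()) {
            long dayStart = e.getKey() - (e.getKey() % DAY_MS);
            out.computeIfAbsent(dayStart, k -> new SumCnt()).add(e.getValue().sum, e.getValue().cnt);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] ts = {0L, 10L, HOUR_MS, DAY_MS + 5};
        double[] vs = {1.0, 3.0, 8.0, 7.0};
        Map<Long, SumCnt> h = hourly(ts, vs);
        h.forEach((start, sc) -> System.out.println("hour@" + start + " avg=" + sc.avg()));
        daily(h).forEach((start, sc) -> System.out.println("day@" + start + " avg=" + sc.avg()));
    }
}
```

Carrying the sum and the count separately is what makes the roll-up exact: averaging the hourly averages would weight a sparse hour the same as a busy one.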

Java wordcount: a mediocre implementation

I implemented a wordcount program with Java. Basically, the program takes a large file (in my tests, I used a 10 gb data file that contained numbers only), and counts the number of times each 'word' appears - in this case, a number (23723 for example might appear 243 times in the file).
Below is my implementation. I seek to improve it, with mainly performance in mind, but a few other things as well, and I am looking for some guidance. Here are a few of the issues I wish to correct:
Currently, the program is threaded and works properly. However, what I do is pass a chunk of memory (500MB/NUM_THREADS) to each thread, and each thread proceeds to wordcount. The problem here is that I have the main thread wait for ALL the threads to complete before passing more data to each thread. It isn't too much of a problem, but there is a period of time where a few threads will wait and do nothing for a while. I believe some sort of worker pool or executor service could solve this problem (I have not learned the syntax for this yet).
The program will only work for a file that contains integers. That's a problem. I struggled with this a lot, as I didn't know how to iterate through the data without creating loads of unused variables (using a String or even StringBuilder had awful performance). Currently, I use the fact that I know the input is an integer, and just store the temporary variables as an int, so no memory problems there. I want to be able to use some sort of delimiter, whether that delimiter be a space, or several characters.
I am using a global ConcurrentHashMap to store key value pairs. For example, if a thread finds a number "24624", it searches for that number in the map. If it exists, it will increase the value of that key by one. The value of the keys at the end represent the number of occurrences of that key. So is this the proper design? Would I gain in performance by giving each thread its own hashmap, and then merging them all at the end?
Is there any other way of seeking through a file with an offset without using the class RandomAccessFile? This class will only read into a byte array, which I then have to convert. I haven't timed this conversion, but maybe it could be faster to use something else.
I am open to other possibilities as well, this is just what comes to mind.
Note: Splitting the file is not an option I want to explore, as I might be deploying this on a server in which I should not be creating my own files, but if it would really be a performance boost, I might listen.
Other Note: I am new to java threading, as well as new to StackOverflow. Be gentle.
public class BigCount2 {
public static void main(String[] args) throws IOException, InterruptedException {
int num, counter;
long i, j;
String delimiterString = " ";
ArrayList<Character> delim = new ArrayList<Character>();
for (char c : delimiterString.toCharArray()) {
int counter2 = 0;
num = Integer.parseInt(args[0]);
int bytesToRead = 1024 * 1024 * 1024 / 2; //500 MB, size of loop
int remainder = bytesToRead % num;
int k = 0;
bytesToRead = bytesToRead - remainder;
int byr = bytesToRead / num;
String filepath = "C:/Users/Daniel/Desktop/int-dataset-10g.dat";
RandomAccessFile file = new RandomAccessFile(filepath, "r");
Thread[] t = new Thread [num];//array of threads
ConcurrentMap<Integer, Integer> wordCountMap = new ConcurrentHashMap<Integer, Integer>(25000);
byte [] byteArray = new byte [byr]; // buffer for one thread's chunk (bytesToRead / num bytes)
char[] newbyte;
for (i = 0; i < file.length(); i += bytesToRead) {
counter = 0;
for (j = 0; j < bytesToRead; j += byr) {
file.seek(i + j);
file.read(byteArray, 0, byr);
newbyte = new String(byteArray).toCharArray();
t[counter] = new Thread(
new BigCountThread2(counter,
wordCountMap));//giving each thread t[i] different file fileReader[i]
newbyte = null;
for (k = 0; k < num; k++){
t[k].join(); //main thread continues after ALL threads have finished.
class BigCountThread2 implements Runnable {
private final ConcurrentMap<Integer, Integer> wordCountMap;
char [] newbyte;
private ArrayList<Character> delim;
private int threadId; //use for later
BigCountThread2(int tid,
char[] newbyte,
ArrayList<Character> delim,
ConcurrentMap<Integer, Integer> wordCountMap) {
this.delim = delim;
threadId = tid;
this.wordCountMap = wordCountMap;
this.newbyte = newbyte;
public void run() {
int intCheck = 0;
int counter = 0; int i = 0; Integer check; int j =0; int temp = 0; int intbuilder = 0;
for (i = 0; i < newbyte.length; i++) {
intCheck = Character.getNumericValue(newbyte[i]);
if (newbyte[i] == ' ' || intCheck == -1) { //once a delimiter is found, the current tempArray needs to be added to the MAP
check = wordCountMap.putIfAbsent(intbuilder, 1);
if (check != null) { //if returns null, then it is the first instance
wordCountMap.put(intbuilder, wordCountMap.get(intbuilder) + 1);
intbuilder = 0;
else {
intbuilder = (intbuilder * 10) + intCheck;
Some thoughts on most of these points:
.. I believe some sort of worker pool or executor service could solve this problem (I have not learned the syntax for this yet).
If all the threads take about the same time to process the same amount of data, then there really isn't that much of a "problem" here.
However, one nice thing about a Thread Pool is it allows one to rather trivially adjust some basic parameters such as number of concurrent workers. Furthermore, using an executor service and Futures can provide an additional level of abstraction; in this case it could be especially handy if each thread returned a map as the result.
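As a rough illustration of the executor-plus-Futures idea, the sketch below submits one task per chunk to a fixed thread pool; each task returns its own local map, and the caller merges the maps. The class PooledWordCountSketch, its method names, and the whitespace-delimited String chunks are assumptions made for this example; they are not taken from the original program, which reads raw bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Word count with a worker pool where every task returns its own map. */
class PooledWordCountSketch {

    /** Counts whitespace-delimited words in one chunk using a thread-local map. */
    static Map<String, Integer> countChunk(String chunk) {
        Map<String, Integer> local = new HashMap<>();
        for (String word : chunk.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                local.merge(word, 1, Integer::sum);
            }
        }
        return local;
    }

    /** Submits one task per chunk and merges the per-task maps on the calling thread. */
    static Map<String, Integer> countAll(List<String> chunks, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Map<String, Integer>>> futures = new ArrayList<>();
            for (String chunk : chunks) {
                Callable<Map<String, Integer>> task = () -> countChunk(chunk);
                futures.add(pool.submit(task)); // idle workers pick up the next queued task
            }
            Map<String, Integer> total = new HashMap<>();
            for (Future<Map<String, Integer>> f : futures) {
                f.get().forEach((word, n) -> total.merge(word, n, Integer::sum));
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(countAll(Arrays.asList("1 2 2", "2 3", "3 3 1"), 2));
    }
}
```

Because tasks are queued, a worker that finishes early simply takes the next chunk instead of waiting for the slowest thread of a round.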
The program will only work for a file that contains integers. That's a problem. I struggled with this a lot, as I didn't know how to iterate through the data without creating loads of unused variables (using a String or even StringBuilder had awful performance) ..
This sounds like an implementation issue. I would first try a StreamTokenizer (because it's already written); if doing it manually, I would check out its source, a good bit of which can be omitted when simplifying the notion of a "token". (It uses a temporary array to build the token.)
I am using a global ConcurrentHashMap to store key value pairs. .. So is this the proper design? Would I gain in performance by giving each thread its own hashmap, and then merging them all at the end?
Using a separate map per thread plus a merge step would reduce locking and may increase performance. Furthermore, the current implementation is broken, as wordCountMap.put(intbuilder, wordCountMap.get(intbuilder) + 1) is not atomic and thus the operation might undercount. I would use a separate map simply because reducing mutable shared state makes a threaded program much easier to reason about.
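If a shared map is kept anyway, the read-then-write race can be avoided with an atomic per-key update. The sketch below assumes Java 8 or later, where ConcurrentHashMap.merge performs the whole read-modify-write atomically; the class name AtomicIncrementSketch and the thread and iteration counts are illustrative only.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Several threads increment the same key; merge() keeps every increment. */
class AtomicIncrementSketch {

    /** Starts the given number of threads, each incrementing one shared key perThread times. */
    static int countConcurrently(int threads, int perThread) {
        ConcurrentMap<Integer, Integer> wordCountMap = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    // Atomic: inserts 1 if the key is absent, otherwise adds 1 to the current value.
                    wordCountMap.merge(24624, 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return wordCountMap.get(24624);
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 100_000)); // prints 400000
    }
}
```

With the put(get() + 1) pattern in place of merge, two threads can read the same old value and one increment is lost, which is the undercount described above.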
Is there any other way of seeking through a file with an offset without using the class RandomAccessFile? This class will only read into a byte array, which I then have to convert. I haven't timed this conversion, but maybe it could be faster to use something else.
Consider using a FileReader (and BufferedReader) per thread on the same file. This will avoid having to first copy the file into the array and slice it out for individual threads which, while the same amount of total reading, avoids having to soak up so much memory. The reading done is actually not random access, but merely sequential (with a "skip") starting from different offsets - each thread still works on a mutually exclusive range.
Also, the original code with the slicing is broken if an integer value is "cut" in half, as two threads would each read half of the word. One workaround is to have each thread skip the first word if it is a continuation from the previous block (i.e. scan one byte sooner) and then read past the end of its range as required to complete the last word.
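The boundary rule just described can be sketched on an in-memory byte array. The assumptions here are mine: a single space is the only delimiter, the input is ASCII, and the names ChunkBoundarySketch, countRange and countAll are hypothetical. Each range skips a leading word that started in the previous range and finishes a word that runs past its own end, so every word is counted exactly once.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Counts words per byte range without splitting or double-counting words at range borders. */
class ChunkBoundarySketch {
    static final byte DELIM = ' ';

    /** Counts the words that START inside [start, end); such a word may extend past end. */
    static Map<String, Integer> countRange(byte[] data, int start, int end) {
        Map<String, Integer> counts = new HashMap<>();
        int pos = start;
        // Look one byte back: if it is not a delimiter, the first word belongs to the previous range.
        if (pos > 0 && data[pos - 1] != DELIM) {
            while (pos < data.length && data[pos] != DELIM) {
                pos++;
            }
        }
        while (pos < end) {
            while (pos < data.length && data[pos] == DELIM) {
                pos++; // skip delimiters
            }
            if (pos >= end) {
                break; // the next word starts in the following range
            }
            int wordStart = pos;
            while (pos < data.length && data[pos] != DELIM) {
                pos++; // may read past end to complete the last word
            }
            String word = new String(data, wordStart, pos - wordStart, StandardCharsets.US_ASCII);
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    /** Splits data into fixed-size ranges and merges the per-range counts. */
    static Map<String, Integer> countAll(byte[] data, int chunkSize) {
        Map<String, Integer> total = new HashMap<>();
        for (int start = 0; start < data.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, data.length);
            countRange(data, start, end).forEach((w, c) -> total.merge(w, c, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] data = "12 345 12 7".getBytes(StandardCharsets.US_ASCII);
        System.out.println(countAll(data, 5)); // the range border at offset 5 falls inside "345"
    }
}
```

In a threaded version each call to countRange would run in its own task, which works because the ranges only read the shared array.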

why it is so slow with 100,000 records when using pipeline in redis?

It is said that pipelining is the better approach when many set/get operations are required in Redis, so this is my test code:
public class TestPipeline {
public static void main(String[] args) {
// TODO Auto-generated method stub
JedisShardInfo si = new JedisShardInfo("", 6379);
List<JedisShardInfo> list = new ArrayList<JedisShardInfo>();
ShardedJedis jedis = new ShardedJedis(list);
long startTime = System.currentTimeMillis();
ShardedJedisPipeline pipeline = jedis.pipelined();
for (int i = 0; i < 100000; i++) {
Map<String, String> map = new HashMap<String, String>();
map.put("id", "" + i);
map.put("name", "lyj" + i);
pipeline.hmset("m" + i, map);
long endTime = System.currentTimeMillis();
System.out.println(endTime - startTime);
When I ran it, the program did not respond for a long time, but without the pipeline it takes only 20073 ms. So I am confused why it performs better without the pipeline, and why the gap is so wide.
Thanks for the answer. A few questions: how did you calculate the 6 MB of data?
When I send 10K records, the pipeline is always faster than normal mode, but with 100K the pipeline does not respond. I think 100-1000 operations is an advisable choice, as said below. Does this have anything to do with the JIT? I don't understand that part.
There are a few points you need to consider before writing such a benchmark (and especially a benchmark using the JVM):
on most (physical) machines, Redis is able to process more than 100K ops/s when pipelining is used. Your benchmark only deals with 100K items, so it does not last long enough to produce meaningful results. Furthermore, there is no time for the successive stages of the JIT to kick in.
the absolute time is not a very relevant metric. Displaying the throughput (i.e. the number of operations per second) while keeping the benchmark running for at least 10 seconds would be a better and more stable metric.
your inner loop generates a lot of garbage. If you plan to benchmark Jedis+Redis, then you need to keep the overhead of your own program low.
because you have defined everything into the main function, your loop will not be compiled by the JIT (depending on the JVM you use). Only the inner method calls may be. If you want the JIT to be efficient, make sure to encapsulate your code into methods that can be compiled by the JIT.
optionally, you may want to add a warm-up phase before performing the actual measurement to avoid accounting the overhead of running the first iterations with the bare-bone interpreter, and the cost of the JIT itself.
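The points above (measure throughput rather than absolute time, keep the measured code in its own methods, and warm up first) can be combined into a small harness. This is a sketch with made-up names (ThroughputSketch, runFor, measure) and a trivial stand-in operation; a real benchmark would call the Redis client inside the Runnable and use much longer durations, such as the 10 seconds suggested above.

```java
/** Minimal benchmark harness sketch: warm-up phase, timed phase, throughput in ops/s. */
class ThroughputSketch {

    /** Converts an operation count and an elapsed time in nanoseconds to operations per second. */
    static double opsPerSecond(long operations, long elapsedNanos) {
        return operations * 1_000_000_000.0 / elapsedNanos;
    }

    /** Runs op repeatedly until durationNanos has elapsed and returns how many times it ran. */
    static long runFor(Runnable op, long durationNanos) {
        long deadline = System.nanoTime() + durationNanos;
        long count = 0;
        while (deadline - System.nanoTime() > 0) {
            op.run();
            count++;
        }
        return count;
    }

    /** Runs an unmeasured warm-up so the JIT can compile the hot path, then measures throughput. */
    static double measure(Runnable op, long warmupNanos, long measureNanos) {
        runFor(op, warmupNanos);
        long start = System.nanoTime();
        long operations = runFor(op, measureNanos);
        return opsPerSecond(operations, System.nanoTime() - start);
    }

    public static void main(String[] args) {
        StringBuilder buffer = new StringBuilder();
        // Stand-in workload: format a key the way the test program does.
        Runnable op = () -> {
            buffer.setLength(0);
            buffer.append("m").append(42);
        };
        // Short durations keep the demo quick; they are too short for a meaningful result.
        double throughput = measure(op, 200_000_000L, 1_000_000_000L);
        System.out.println("Throughput: " + throughput + " ops/s");
    }
}
```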
Now, regarding Redis pipelining, your pipeline is way too long. 100K commands in the pipeline means Jedis has to build a 6MB buffer before sending anything to Redis. It means the socket buffers (on client side, and perhaps server-side) will be saturated, and that Redis will have to deal with 6 MB communication buffers as well.
Furthermore, your benchmark is still synchronous (using a pipeline does not magically make it asynchronous). In other words, Jedis will not start reading replies until the last query of your pipeline has been sent to Redis. When the pipeline is too long, it has the potential to block things.
Consider limiting the size of the pipeline to 100-1000 operations. Of course, it will generate more roundtrips, but the pressure on the communication stack will be reduced to an acceptable level. For instance, consider the following program:
import redis.clients.jedis.*;
import java.util.*;
public class TestPipeline {
int i = 0;
Map<String, String> map = new HashMap<String, String>();
ShardedJedis jedis;
// Number of iterations
// Use 1000 to test with the pipeline, 100 otherwise
static final int N = 1000;
public TestPipeline() {
JedisShardInfo si = new JedisShardInfo("", 6379);
List<JedisShardInfo> list = new ArrayList<JedisShardInfo>();
jedis = new ShardedJedis(list);
public void push( int n ) {
ShardedJedisPipeline pipeline = jedis.pipelined();
for ( int k = 0; k < n; k++) {
map.put("id", "" + i);
map.put("name", "lyj" + i);
pipeline.hmset("m" + i, map);
public void push2( int n ) {
for ( int k = 0; k < n; k++) {
map.put("id", "" + i);
map.put("name", "lyj" + i);
jedis.hmset("m" + i, map);
public static void main(String[] args) {
TestPipeline obj = new TestPipeline();
long startTime = System.currentTimeMillis();
for ( int j=0; j<N; j++ ) {
// Use push2 instead to test without pipeline
// Uncomment to see the acceleration
long endTime = System.currentTimeMillis();
double d = 1000.0 * obj.i;
d /= (double)(endTime - startTime);
System.out.println("Throughput: "+d);
With this program, you can test with or without pipelining. Be sure to increase the number of iterations (N parameter) when pipelining is used, so that it runs for at least 10 seconds. If you uncomment the println in the loop, you will see that the program is slow at the beginning and gets quicker as the JIT starts to optimize things (that's why the program should run for at least several seconds to give a meaningful result).
On my hardware (an old Athlon box), I can get 8-9 times more throughput when the pipeline is used. The program could be further improved by optimizing key/value formatting in the inner loop and adding a warm-up phase.
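The advice to cap a pipeline at 100-1000 operations amounts to flushing in bounded batches. The sketch below shows only that batching logic and deliberately does not call Jedis: the flush callback is a hypothetical stand-in for the place where a real client would send the buffered commands and read the replies, and the names BatchedPipelineSketch and sendInBatches are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sends commands in bounded batches instead of one very long pipeline. */
class BatchedPipelineSketch {

    /**
     * Passes the commands to flush in batches of at most batchSize elements.
     * Returns the number of flushes, i.e. the number of round trips a client would make.
     */
    static <T> int sendInBatches(List<T> commands, int batchSize, Consumer<List<T>> flush) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int flushes = 0;
        List<T> batch = new ArrayList<>(batchSize);
        for (T command : commands) {
            batch.add(command);
            if (batch.size() == batchSize) {
                flush.accept(new ArrayList<>(batch)); // hand over a copy, then reuse the buffer
                batch.clear();
                flushes++;
            }
        }
        if (!batch.isEmpty()) {
            flush.accept(new ArrayList<>(batch)); // final partial batch
            flushes++;
        }
        return flushes;
    }

    public static void main(String[] args) {
        List<String> commands = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            commands.add("m" + i);
        }
        int roundTrips = sendInBatches(commands, 1000,
                batch -> { /* a real client would send the batch and read its replies here */ });
        System.out.println("round trips: " + roundTrips); // prints round trips: 100
    }
}
```

With 100,000 commands and batches of 1,000 this gives 100 round trips, compared with 100,000 without pipelining and a single oversized buffer with one long pipeline.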

Reading value from field variable

I am developing a desktop app in Java 7 and have run into the following situation. In the method below
private synchronized void decryptMessage
(CopyOnWriteArrayList<Integer> possibleKeys, ArrayList<Integer> cipherDigits)
// apply opposite shift algorithm:
ArrayList<Integer> textDigits = shiftCipher(possibleKeys, cipherDigits);
// count CHI squared statistics:
double chi = countCHIstatistics(textDigits);
if(chi < edgeCHI) // chi is lower than the best value seen so far
System.err.println(chi + " " + possibleKeys + " +");
key = possibleKeys; // store most suitable key
edgeCHI = chi;
I compute the value called 'chi', and if 'chi' is less than the 'edgeCHI' value, I save the key in an instance variable. That method is invoked by several threads, so I enforce synchronization.
When all the threads complete, the program continues by passing control to a method which controls the sequence of operations. Then this line is executed in that method:
System.err.println(edgeCHI+" "+key+" -");
It prints the correct value of 'chi', matching the last value printed in the decryptMessage method, but the value of key is different. The 'decryptMessage' method is invoked by threads which generate key values.
I store the key value in a field:
private volatile CopyOnWriteArrayList<Integer> key = null; // stores the most suitable key for decryption.
Why do I have two different key values? The values themselves are not important. The point is that the value of key printed at the last call of the 'decryptMessage' method (when chi < edgeCHI) must match the one printed in the method which controls the flow of operations.
This is how I create the threads:
for(int y = 0; y < mostOccuringL.length; y++){// iterate through the five most frequent letters
for(int i = (y + 1); i < mostOccuringL.length; i++ ){//perform letter combinations
int [] combinations = new int[2];
combinations[0] = y;
combinations [1] = i;
new KeyMembers(""+y+":"+i ,combinations, keywords, intKeyIndex, cipherDigits).t.join();
Within the run method I invoke the decryptMessage method in order to identify the most feasible decryption key.
I have been trying to figure out what the problem is for two days, but I don't get it.
Relying on syserr (or sysout) printing to determine an order of execution is dangerous, especially in multi-threaded environments. There is absolutely no guarantee about when the printing actually occurs or whether the printed messages are in order. Maybe what you see as the "last" printed message of one of the threads wasn't from the "last" thread modifying the key field. You cannot tell that by looking only at the stderr output.
What you could do is use a synchronized setter for the key field, that increases an associated access counter whenever the field is modified and print the new value along with the modification count. This way you can avoid the problems of syserr printing and reliably determine what the last set value was. e.g. :
private long keyModCount = 0;
private synchronized long update(CopyOnWriteArrayList<Integer> possibilities, double edgeChi) {
    this.key = possibilities;
    this.edgeCHI = edgeChi; // how is edgeCHI declared? Also volatile?
    this.keyModCount++; // count every modification of the key field
    return this.keyModCount;
}
And inside decryptMessage:
if (chi < edgeCHI) { // chi is lower than the best value seen so far
    long sequence = update(possibleKeys, chi);
    System.err.println("["+ sequence +"]"+ chi + " " + possibleKeys + " +");
}
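For reference, here is a self-contained sketch of the same idea that can be run on its own. It is not the original class: the name BestKeyTrackerSketch and its methods are hypothetical, the comparison and the update are folded into one synchronized method, and the defensive copy of the list is an addition of this sketch, guarding against callers that keep modifying the list after passing it in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Tracks the key with the lowest chi value and numbers every successful update. */
class BestKeyTrackerSketch {
    private List<Integer> key = null;
    private double edgeChi = Double.MAX_VALUE;
    private long keyModCount = 0;

    /**
     * Stores possibleKey if chi is lower than the best value so far.
     * Returns the modification sequence number, or -1 if nothing was stored.
     */
    synchronized long offer(List<Integer> possibleKey, double chi) {
        if (chi >= edgeChi) {
            return -1;
        }
        key = new ArrayList<>(possibleKey); // defensive copy: later changes by the caller are not seen
        edgeChi = chi;
        return ++keyModCount;
    }

    synchronized List<Integer> bestKey() {
        return key == null ? null : new ArrayList<>(key);
    }

    synchronized double bestChi() {
        return edgeChi;
    }

    public static void main(String[] args) {
        BestKeyTrackerSketch tracker = new BestKeyTrackerSketch();
        System.out.println(tracker.offer(Arrays.asList(1, 2), 5.0)); // prints 1
        System.out.println(tracker.offer(Arrays.asList(3, 4), 7.0)); // prints -1
        System.out.println(tracker.offer(Arrays.asList(5, 6), 2.0)); // prints 2
        System.out.println(tracker.bestChi() + " " + tracker.bestKey()); // prints 2.0 [5, 6]
    }
}
```

The sequence number returned by offer identifies which update really happened last, independent of the order in which the log lines appear.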
To provide an answer we would need to see more of the (simplified if necessary) code that controls the thread execution.
A solution has been found: I changed the CopyOnWriteArrayList data type to ArrayList at the point where the field variable is assigned the correct key. It works as expected now.

