High-velocity stream - Reactive Java or Kafka Streams - java

I have a high-velocity endpoint with a certain payload.
I have 800 listener threads active on a queue.
I then need to generate events from that payload; each payload can produce X events.
I then need to aggregate these events into buckets of Y that will be sent to another service via HTTP.
The HTTP calls take time, so I need them to be async and in parallel (an additional Z threads to make these calls).
The flow needs to be highly efficient.
What I have right now is this:
The HTTP listeners call a service which generates the events and puts them into a ConcurrentLinkedQueue,
then call ResettableCountDownLatch.countDown() (the latch is set to wait for Y).
I have a thread pool where each worker awaits the ResettableCountDownLatch, then checks whether the queue holds more than Y events; if so, it polls Y events from the queue and makes the call.
I would love to hear if there is a more efficient way to do this, or about any open source projects that fit here (I have heard of Reactive (on Spring Boot 2) and Kafka Streams):
doneSignal = new ResettableCountDownLatch(y);
executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(senderThreads);
executor.submit(() -> {
    while (true) {
        try {
            doneSignal.await();
            doneSignal.reset();
            if (concurrentQueue.size() > bulkSize) {
                log.debug("concurrentQueue size is: {}, going to start new thread", concurrentQueue.size());
                executor.submit(() -> {
                    long start = System.currentTimeMillis();
                    while (concurrentQueue.size() > bulkSize) {
                        List<ObjectNode> events = populateList(concurrentQueue, bulkSize);
                        log.debug("got: {} from concurrentQueue, it took: {}", events.size(), (System.currentTimeMillis() - start));
                        String eventWrapper = wrapEventsInBulk(events);
                        sendEventToThirdPary(eventWrapper);
                    }
                });
            }
        } catch (Exception e) {
            log.error("sender loop failed", e); // was silently swallowed in the original
        }
    }
});
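For comparison, here is a minimal Reactor sketch of the same pipeline; it is an assumption-laden sketch, not a drop-in replacement, reusing the names bulkSize, senderThreads, wrapEventsInBulk, and sendEventToThirdPary from the code above. buffer(bulkSize) replaces the latch-and-queue bookkeeping, and flatMap with a concurrency limit replaces the sender pool:

// A sketch assuming reactor-core 3.4+; names are taken from the code above.
Sinks.Many<ObjectNode> sink = Sinks.many().unicast().onBackpressureBuffer();

sink.asFlux()
    .buffer(bulkSize) // bucket events into groups of Y
    .flatMap(events -> Mono.fromRunnable(() ->
                sendEventToThirdPary(wrapEventsInBulk(events)))
            .subscribeOn(Schedulers.boundedElastic()),
        senderThreads) // at most Z HTTP calls in flight
    .subscribe();

// Listener threads emit instead of queueing. Note that concurrent emitNext
// calls must be serialized, e.g. via Sinks.EmitFailureHandler.busyLooping(...):
// sink.emitNext(event, Sinks.EmitFailureHandler.busyLooping(Duration.ofMillis(10)));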

Related

Prevent Flux Publisher from going out of memory

I'm trying to use a Flux to stream events to subscribers using RSocket. There can be a huge backlog of events (in the database), and they must be sent out in order, without any gaps, and without flooding either the publisher (out of memory) or the consumer. None of the OverflowStrategy values seems suitable:
IGNORE: I'd like to block (or get a callback when there's more demand), not get an error
ERROR: I'd like to block (or get a callback when there's more demand), not get an error
DROP: bad, because events cannot be skipped (no gaps)
LATEST: bad, because events cannot be skipped (no gaps)
BUFFER: leads to out of memory on the publisher
I have everything working, but if I don't limit my rate in the subscribers, the publisher side goes out of memory. That's bad, as a badly behaved subscriber could kill my service. For some reason, I'm misunderstanding how back pressure works. Everywhere I look there is talk of limitRate. This works, but only on the subscriber side; using limitRate on the publisher side has no effect at all.
I've used Flux.generate and Flux.create to create the events I want on the publisher side, but they don't seem to respond to back pressure at all. So I must be missing something, as the whole back-pressure mechanism in Reactor is described as very transparent and easy to use...
Here's my publisher:
@MessageMapping("events")
public Flux<String> events(String data) {
    Flux<String> flux = Flux.generate(new Consumer<SynchronousSink<String>>() {
        long offset = 0;
        @Override
        public void accept(SynchronousSink<String> emitter) {
            emitter.next("" + offset++);
        }
    });
    return flux.limitRate(100); // limitRate doesn't do anything
}
And my consumer:
@Autowired RSocketRequester requester;

@EventListener(ApplicationReadyEvent.class)
public void run() throws InterruptedException {
    long time = System.currentTimeMillis(); // start time used for the rate computation below
    requester.route("events")
            .data("Just Go")
            .retrieveFlux(String.class)
            //.limitRate(1000) // commenting this line makes publisher go OOM
            .bufferTimeout(20000, Duration.ofMillis(10))
            .subscribe(new Consumer<List<String>>() {
                long totalReceived = 0;
                long totalBytes = 0;
                @Override
                public void accept(List<String> s) {
                    totalReceived += s.size();
                    totalBytes += s.stream().mapToInt(String::length).sum();
                    System.out.printf("So we received: %4d messages @ %8.1f msg/sec (%d kB/sec)%n",
                            s.size(),
                            ((double) totalReceived / (System.currentTimeMillis() - time)) * 1000,
                            totalBytes / (System.currentTimeMillis() - time));
                    try {
                        Thread.sleep(200); // delay the consumer so the publisher has to slow down
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                }
            });
    Thread.sleep(100000); // leave Spring running for a bit (dirty)
}
What I don't understand is why this doesn't work. The generate callback keeps getting called as fast as possible, leading to huge memory allocations in the JVM until it goes OOM. Why does it keep calling generate?
What am I missing?
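For what it's worth, Flux.generate by itself does honor demand: the callback only runs when the subscriber requests. A minimal standalone sketch (an assumption: plain reactor-core, no RSocket in the picture) where the subscriber pulls one element at a time:

// generate with explicit state; the lambda runs exactly once per requested element
Flux<Long> flux = Flux.generate(
        () -> 0L, // initial state
        (offset, sink) -> { sink.next(offset); return offset + 1; });

flux.subscribe(new BaseSubscriber<Long>() {
    @Override
    protected void hookOnSubscribe(Subscription subscription) {
        request(1); // pull the first element only
    }
    @Override
    protected void hookOnNext(Long value) {
        System.out.println("got " + value);
        request(1); // generate runs once more per request
    }
});

If the callback nevertheless runs without bound, some stage between generate and the slow consumer is requesting an effectively unbounded amount (for instance an operator that prefetches aggressively), so the demand signal never reaches generate.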

How to run a task for a specific amount of time

I'm implementing some sort of chat application and I need some help. This is the simplified code:
//...
boolean stop = false;
while (!stop) {
    ServerRequest message = (ServerRequest) ois.readObject();
    broadcastMessage((String) message.getData()); // sends the client's message to all the other clients on the server
    stop = (System.nanoTime() - start >= handUpTime); // let the client send messages for no more than handUpTime
} //...
I want to let a client send messages to the server for a certain amount of time (handUpTime) and then "block" him, but I don't know how to do this in an elegant manner. Of course, my code stumbles on the ois.readObject() part: the thread waits to receive a message, so the loop can run for more than handUpTime seconds. How can I solve this problem? I'm open to other approaches too.
You can try:
ExecutorService executorService = Executors.newSingleThreadExecutor();
Callable<Object> callable = () -> {
    // perform some blocking computation
    return someObject;
};
Future<Object> future = executorService.submit(callable);
Object result = future.get(YOUR_TIMEOUT, TimeUnit.SECONDS);
If future.get() doesn't return within the given time, it throws a TimeoutException, which you should handle. See this post.
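A sketch of that handling (YOUR_TIMEOUT stays a placeholder; the cancellation caveat is an assumption about the blocking-socket case from the question):

try {
    Object result = future.get(YOUR_TIMEOUT, TimeUnit.SECONDS);
    // use the result
} catch (TimeoutException e) {
    future.cancel(true); // interrupts the task; note that a readObject() blocked
                         // on a plain socket is not interruptible, so you may
                         // also need to close the underlying socket to unblock it
} catch (InterruptedException | ExecutionException e) {
    e.printStackTrace();
} finally {
    executorService.shutdown();
}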

More connections than thread handlers

Say I have two endpoints in my API (in Spring as previously):
#RequestMapping("/async")
public CompletableFuture<String> g(){
CompletableFuture<String> f = new CompletableFuture<>();
f.runAsync(() -> {
try {
Thread.sleep(5000);
f.complete("Finished");
} catch (InterruptedException e) {
e.printStackTrace();
}
});
Thread.sleep(1000);
return f;
}
#RequestMapping("/sync")
public String h() throws InterruptedException {
Thread.sleep(5000);
Thread.sleep(1000);
return "Finished";
}
When I send 2 GET requests (just single GET requests) to:
localhost:8080/async --> response in 5024ms
localhost:8080/sync --> response in 6055ms
This makes sense because we are sending just a single request. Now things get interesting when I do a load test with Siege involving 255 concurrent users.
In this case, my async API endpoint isn't able to handle many connections, so async is not as scalable.
Does this depend on my hardware? If I had hardware able to handle more thread handlers, would the async endpoint then be able to handle more transactions, since there are more threads?
You're still using the ForkJoinPool.commonPool() for your async invocations. I told you it's small and it will get filled up. Try this (I fixed your CompletableFuture code too, as it's totally wrong; it just doesn't show in your example):
CompletableFuture<Void> f = CompletableFuture.runAsync(() -> {
    try {
        Thread.sleep(5000);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}, Executors.newSingleThreadExecutor());
return f;
Now each async call gets its own executor, so it doesn't choke on the common pool. Of course, since every async call gets its own executor, this is a bad example; you'd want a shared pool, but one larger than the common pool (a sketch follows below).
It has nothing (well, very little) to do with your hardware. It has everything to do with long running operations intermingled with short running operations.
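A sketch of that shared-pool approach (the pool size of 200 and the endpoint body are illustrative assumptions, not the answer's exact code):

// One shared executor for all async endpoints, sized well above the
// common pool so long-running calls don't starve each other.
private static final ExecutorService ASYNC_POOL = Executors.newFixedThreadPool(200);

@RequestMapping("/async")
public CompletableFuture<String> g() {
    return CompletableFuture.supplyAsync(() -> {
        try {
            Thread.sleep(5000); // the long-running operation
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "Finished";
    }, ASYNC_POOL); // shared pool instead of ForkJoinPool.commonPool()
}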

Limiting rate of requests with Reactor

I'm using Project Reactor to load data from a web service over REST, in parallel with multiple threads. I'm starting to hit rate limits on the web service, so I would like to send at most 10 requests per second to avoid these errors. How would I do that using Reactor?
Using zipWith(Mono.delayMillis(100))? Or is there some better way?
Thank you
You can use delayElements instead of the whole zipWith.
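For example (a sketch: requests and webClientCall are placeholders, not from the question):

// delayElements spaces emissions at least 100 ms apart,
// i.e. at most ~10 requests per second.
Flux.fromIterable(requests)
    .delayElements(Duration.ofMillis(100))
    .flatMap(req -> webClientCall(req)) // the actual REST call goes here
    .subscribe();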
One could use Flux.delayElements to process a batch of 10 requests every second; be aware, though, that if the processing takes longer than 1s the next batch will still be started in parallel, and hence processed together with the previous one (and potentially many earlier ones)!
That's why I propose another solution, where a batch of 10 requests is still processed every second but, if its processing takes longer than 1s, the next batch fails (see the overflow IllegalStateException); one could handle that failure so that the overall processing continues, but I won't show that here because I want to keep the example simple; see onErrorResume, which is useful for handling the overflow IllegalStateException.
The code below does a GET on https://www.google.com/ at a rate of 10 requests per second. You'll have to make additional changes to support the situation where your server cannot process all 10 requests within 1s; for example, you could skip sending new requests while those from the previous second are still being processed.
@Test
void parallelHttpRequests() {
    // this is just for limiting the test running period, otherwise you don't need it
    int COUNT = 2;

    // use whatever (blocking) http client you desire;
    // when using e.g. WebClient (Spring's non-blocking client)
    // the example changes slightly so as to no longer use
    // subscribeOn(Schedulers.elastic())
    RestTemplate client = new RestTemplate();

    // exit, lock, condition are provided to allow one to run
    // all this code in a @Test, otherwise they won't be needed
    var exit = new AtomicBoolean(false);
    var lock = new ReentrantLock();
    var condition = lock.newCondition();

    MessageFormat message = new MessageFormat("#batch: {0}, #req: {1}, resultLength: {2}");
    Flux.interval(Duration.ofSeconds(1L))
            .take(COUNT) // this is just for limiting the test running period, otherwise you don't need it
            .doOnNext(batch -> debug("#batch", batch)) // just for debugging
            .flatMap(batch -> Flux.range(1, 10) // 10 requests per second
                    .flatMap(i -> Mono.fromSupplier(() ->
                                    client.getForEntity("https://www.google.com/", String.class).getBody()) // your request goes here (1 of 10)
                            .map(s -> message.format(new Object[]{batch, i, s.length()})) // the request's result becomes the output of message.format(...)
                            .doOnSubscribe(s -> debug("doOnSubscribe: #batch = " + batch + ", i = " + i)) // just for debugging
                            .subscribeOn(Schedulers.elastic()) // one I/O thread per request
                    )
            )
            // consider using onErrorResume to handle the overflow IllegalStateException
            .subscribe(
                    s -> debug("received", s), // do something with the request's result
                    e -> {
                        // pay special attention to the overflow IllegalStateException
                        debug("error", e.getMessage());
                        signalAll(exit, condition, lock);
                    },
                    () -> {
                        debug("done");
                        signalAll(exit, condition, lock);
                    }
            );
    await(exit, condition, lock);
}
// you won't need the "await" and "signalAll" methods below, which
// I created only to make it easier to run this in a test class
private void await(AtomicBoolean exit, Condition condition, Lock lock) {
    lock.lock();
    while (!exit.get()) {
        try {
            condition.await();
        } catch (InterruptedException e) {
            // maybe a spurious wakeup
            e.printStackTrace();
        }
    }
    lock.unlock();
    debug("exit");
}

private void signalAll(AtomicBoolean exit, Condition condition, Lock lock) {
    exit.set(true);
    try {
        lock.lock();
        condition.signalAll();
    } finally {
        lock.unlock();
    }
}

Wait and Notify in Java threads for a given interval

I am working on a use case as below. I am new to multithreading and facing this issue while using it.
I broadcast an event on the network.
It is received by all the listeners, who unicast me their information.
This is received inside the callback method below; I will get an unknown number N of callback threads, depending on the listeners at that particular time.
I have to collect a list of all subscribers.
I have to wait at least 10 sec for all the subscribers to reply to me.
// Sender
public void sendMulticastEvent() {
    api.sendEvent();
    /* after sending the event, wait for 15 sec so the callback can collect all the subscribers */
    // start waiting now
}

// Callback method
public void receiveEventsCallback(final Event event) {
    // I will receive multiple response threads here...
    // the event object has the topic and subscriber details, which I collect here
    list.add(event);
    notify(); // notify the waiting thread so I have a cumulative list of all received events
}
I am only concerned with how to:
Start a wait at the sendMulticastEvent for X seconds
Notify at receiveEventsCallback() after all the received events have been added to the list.
I have read the theory on wait and notify, CountDownLatch and barriers, but I am not sure which would be a good fit, because of my limited experience with multithreading.
Start a wait at the sendMulticast event for X seconds
Just use the version of wait() which takes a timeout argument.
Note that you should manually update the timeout value after every successful wait() call (that is, every call that returns because an event arrived), as in the sketch below.
Notify at receiveEventsCallback() after all the received events have been added to the list.
Your question states that you don't know how many listeners are in your network. How can you know that all of them have received the event (and replied)?
The only way for the sender is to wait X seconds and process all the replies available by that moment.
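A minimal sketch of that timed wait (the 10-second window, the shared list, and locking on this are assumptions based on the question):

public void sendMulticastEvent() throws InterruptedException {
    api.sendEvent();
    long deadline = System.currentTimeMillis() + 10_000;
    synchronized (this) {
        long remaining;
        // re-arm the timeout after every wake-up caused by a callback
        while ((remaining = deadline - System.currentTimeMillis()) > 0) {
            wait(remaining);
        }
    }
    // 'list' now holds every event received within the window
}

public synchronized void receiveEventsCallback(final Event event) {
    list.add(event);
    notify(); // wakes the sender, which re-checks the remaining time
}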
If you know how many replies you will get - assuming each response will trigger the creation of a new thread - use a CyclicBarrier.
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/CyclicBarrier.html
example:
CyclicBarrier barrier = new CyclicBarrier(3);
Runnable thread = new Runnable()
{
    @Override
    public void run()
    {
        try
        {
            barrier.await();
            for (int i = 0; i < 10; i++)
            {
                System.out.printf("%d%n", i);
            }
        }
        catch (InterruptedException | BrokenBarrierException ex)
        {
            ex.printStackTrace();
            // handle the exception properly in real code.
        }
    }
};
Until the third barrier.await(), each thread will wait.
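A usage sketch (an assumption: three worker threads, matching the barrier's party count of 3):

// The first two threads block in barrier.await() until the third
// arrives; then all three proceed to print their numbers.
for (int t = 0; t < 3; t++) {
    new Thread(thread).start();
}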
