Controlling emit values from Flux generate for sequential task execution - java

I'm writing a simple orchestration framework using the Reactor framework. It executes tasks sequentially, and the next task to execute depends on the result of previous tasks. I might have multiple paths to choose from based on the outcome of previous tasks. Earlier, I wrote a similar framework based on a static DAG where I passed a list of tasks as an iterable and used Flux.fromIterable(taskList). However, this does not give me the flexibility to go dynamic, because of the static array publisher.
I'm looking for alternate approaches, like a do {} while (condition) loop, to solve DAG traversal and task decisions, and I came up with Flux.generate(). I evaluate the next step in the generate method and pass the next task downstream. The problem I'm facing now is that Flux.generate does not wait for the downstream to complete; it keeps pushing until the condition becomes invalid. By the time task 1 has executed, task 2 would have been pushed n times, which is not the expected behavior.
Can someone please point me in the right direction?
Thanks.
First iteration using List of tasks (static DAG)
Flux.fromIterable(taskList)
    .publishOn(this.factory.getSharedSchedulerPool())
    .concatMap(reactiveTask -> {
        log.info("Running task =>{}", reactiveTask.getTaskName());
        return reactiveTask.run(ctx);
    })
    // Evaluates status from previous task and terminates stream or continues.
    .takeWhile(context -> evaluateStatus(context))
    .onErrorResume(throwable -> buildResponse(ctx, throwable))
    .doOnCancel(() -> log.info("Task cancelled"))
    .doOnComplete(() -> log.info("Completed flow"))
    .subscribe();
Attempt at a dynamic DAG
Flux.generate(
        (SynchronousSink<ReactiveTask<OrchestrationContext>> synchronousSink) -> {
            ReactiveTask<OrchestrationContext> task;
            if (ctx.getLastExecutedStep() == null) {
                // First task.
                task = getFirstTaskFromDAG();
            } else {
                task = deriveNextStep(ctx.getLastExecutedStep(), ctx.getDecisionData());
            }
            if (task.getName().equals("END")) {
                synchronousSink.complete();
                return; // calling next() after complete() is not allowed
            }
            synchronousSink.next(task);
        })
    .publishOn(this.factory.getSharedSchedulerPool())
    .doOnNext(orchestrationContextReactiveTask -> log.info("On next => {}",
        orchestrationContextReactiveTask.getTaskName()))
    .concatMap(reactiveTask -> {
        log.info("Running task =>{}", reactiveTask.getTaskName());
        return reactiveTask.run(ctx);
    })
    .onErrorResume(throwable -> buildResponse(ctx, throwable))
    .takeUntil(context -> evaluateStatus(context, tasks))
    .doOnCancel(() -> log.info("Task cancelled"))
    .doOnComplete(() -> log.info("Completed flow"))
    .subscribe();
The problem in the above approach is that, while task 1 is executing, the onNext() subscriber prints many times because generate keeps publishing. I want the generate method to wait for the result of the previous task before submitting a new task. In the non-reactive world, this can be achieved through a simple while() loop.
Each task performs the following actions:
public class ResponseTask extends AbstractBaseTask {
    private final StateManager stateManager;
    private final ThreadFactory factory;
    private final TaskDefinition taskDefinition;
    private final String taskName;

    public ResponseTask(StateManager stateManager, ThreadFactory factory,
                        TaskDefinition taskDefinition, String taskName) {
        this.stateManager = stateManager;
        this.factory = factory;
        this.taskDefinition = taskDefinition;
        this.taskName = taskName;
    }

    public Mono<String> transform(OrchestrationContext context) {
        Any masterPayload = Any.wrap(context.getIngestionPayload());
        return Mono.fromCallable(() -> stateManager.doTransformation(context, masterPayload));
    }

    public Mono<OrchestrationContext> execute(OrchestrationContext context, String payload) {
        log.info("Executing sleep for task=>{}", context.getLastExecutedStep());
        return Mono.delay(Duration.ofSeconds(1), factory.getSharedSchedulerPool())
            .then(Mono.just(context));
    }

    public Mono<OrchestrationContext> run(OrchestrationContext context) {
        log.info("Executing task:{}. Last executed:{}", taskName, context.getLastExecutedStep());
        return transform(context)
            .doOnNext(result -> log.info("Transformation complete for task=?{}", taskName))
            .flatMap(payload -> execute(context, payload))
            .onErrorResume(throwable -> {
                context.setStatus(FAILED);
                return Mono.just(context);
            });
    }
}
EDIT - Following @Ikatiforis's recommendation, here's the output from my side:
2021-12-02 09:58:14,643 INFO (reactive_shared_pool) [ReactiveEngine lambda$doOrchestration$5:98] On next => Task1
2021-12-02 09:58:14,644 INFO (reactive_shared_pool) [ReactiveEngine lambda$doOrchestration$6:101] Running task =>Task1
2021-12-02 09:58:14,644 INFO (reactive_shared_pool) [AbstractBaseTask run:75] Executing task:Task1. Last executed:Task1
2021-12-02 09:58:14,658 INFO (reactive_shared_pool) [ReactiveEngine lambda$doOrchestration$5:98] On next => Task2
2021-12-02 09:58:14,659 INFO (reactive_shared_pool) [AbstractBaseTask lambda$run$0:83] Transformation complete for task=?Task1
2021-12-02 09:58:14,659 INFO (reactive_shared_pool) [ResponseTask execute:41] Executing sleep for task=>Task1
2021-12-02 09:58:15,661 INFO (reactive_shared_pool) [AbstractBaseTask lambda$run$4:106] Success for task=>Task1
2021-12-02 09:58:15,663 INFO (reactive_shared_pool) [ReactiveEngine lambda$doOrchestration$6:101] Running task =>Task2
2021-12-02 09:58:15,811 INFO (cassandra-nio-worker-8) [AbstractBaseTask run:75] Executing task:Task2. Last executed:Task2
2021-12-02 09:58:15,811 INFO (reactive_shared_pool) [ReactiveEngine lambda$doOrchestration$5:98] On next => Task2
2021-12-02 09:58:15,812 INFO (reactive_shared_pool) [AbstractBaseTask lambda$run$0:83] Transformation complete for task=?Task2
2021-12-02 09:58:15,812 INFO (reactive_shared_pool) [ResponseTask execute:41] Executing sleep for task=>Task2
2021-12-02 09:58:15,837 INFO (centaurus_reactive_shared_pool) [ReactiveEngine lambda$doOrchestration$9:113] Completed flow
I see a couple of problems here.
The expected sequence of execution is:
1. The task does transformations (runs on Mono.fromCallable).
2. The task induces a delay - Mono.delay().
3. The task completes execution. After this, the generate method should evaluate the context and pass on the next task to be executed.
What I see from the output is:
1. Task 1 starts the transformations - runs on Mono.fromCallable.
2. Task 2's doOnNext is reported - which means the stream already got this task.
3. Task 1 completes.
4. Task 2 starts and executes its delay - the stream does not wait for the response from task 2 but completes the flow.

The problem in the above approach is that, while task 1 is executing, the onNext() subscriber prints many times because generate keeps publishing.
This is happening because concatMap requests a number of items upfront (32 by default) instead of requesting elements one by one. If you really need to request one element at a time, you can use the concatMap(Function<? super T, ? extends Publisher<? extends V>> mapper, int prefetch) variant and provide the prefetch value like this:
.concatMap(reactiveTask -> {
    log.info("Running task =>{}", reactiveTask.getTaskName());
    return reactiveTask.run(ctx);
}, 1)
Edit
There is also a publishOn method which takes a prefetch value. Take a look at the following Fibonacci generator sample and let me know if it works as you expect:
generateFibonacci(100)
    .publishOn(boundedElasticScheduler, 1)
    .doOnNext(number -> log.info("On next => {}", number))
    .concatMap(number -> {
        log.info("Running task => {}", number);
        return task(number).doOnNext(num -> log.info("Task completed => {}", num));
    }, 1)
    .takeWhile(context -> context < 3)
    .subscribe();
public Flux<Integer> generateFibonacci(int limit) {
    return Flux.generate(
        () -> new FibonacciState(0, 1),
        (state, sink) -> {
            log.info("Generating number: " + state);
            sink.next(state.getFormer());
            if (state.getLatter() > limit) {
                sink.complete();
            }
            int temp = state.getFormer();
            state.setFormer(state.getLatter());
            state.setLatter(temp + state.getLatter());
            return state;
        });
}
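For reference, FibonacciState is not shown in the answer; the sample assumes a small mutable holder along these lines (a minimal sketch):
// Minimal mutable state holder assumed by the generator above.
public static class FibonacciState {
    private int former;
    private int latter;

    public FibonacciState(int former, int latter) {
        this.former = former;
        this.latter = latter;
    }

    public int getFormer() { return former; }
    public void setFormer(int former) { this.former = former; }
    public int getLatter() { return latter; }
    public void setLatter(int latter) { this.latter = latter; }

    @Override
    public String toString() {
        return "FibonacciState(former=" + former + ", latter=" + latter + ")";
    }
}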
Here is the output:
2021-12-02 10:47:51,990 INFO main c.u.p.p.s.c.Test - Generating number: FibonacciState(former=0, latter=1)
2021-12-02 10:47:51,993 INFO pool-1-thread-1 c.u.p.p.s.c.Test - On next => 0
2021-12-02 10:47:51,996 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Running task => 0
2021-12-02 10:47:54,035 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Task completed => 0
2021-12-02 10:47:54,035 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Generating number: FibonacciState(former=1, latter=1)
2021-12-02 10:47:54,036 INFO pool-1-thread-1 c.u.p.p.s.c.Test - On next => 1
2021-12-02 10:47:54,036 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Running task => 1
2021-12-02 10:47:56,036 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Task completed => 1
2021-12-02 10:47:56,036 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Generating number: FibonacciState(former=1, latter=2)
2021-12-02 10:47:56,036 INFO pool-1-thread-1 c.u.p.p.s.c.Test - On next => 1
2021-12-02 10:47:56,036 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Running task => 1
2021-12-02 10:47:58,036 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Task completed => 1
2021-12-02 10:47:58,036 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Generating number: FibonacciState(former=2, latter=3)
2021-12-02 10:47:58,036 INFO pool-1-thread-1 c.u.p.p.s.c.Test - On next => 2
2021-12-02 10:47:58,036 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Running task => 2
2021-12-02 10:48:00,036 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Task completed => 2
2021-12-02 10:48:00,037 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Generating number: FibonacciState(former=3, latter=5)
2021-12-02 10:48:00,037 INFO pool-1-thread-1 c.u.p.p.s.c.Test - On next => 3
2021-12-02 10:48:00,037 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Running task => 3
2021-12-02 10:48:02,037 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Task completed => 3
2021-12-02 10:52:07,877 INFO pool-1-thread-2 c.u.p.p.s.c.Test - Completed flow
Edit 04/12/2021
You stated:
I'm trying to simulate HTTP / blocking calls. Hence the Mono.delay.
Mono#delay is not the appropriate method to simulate a blocking call. The delay is introduced through the parallel scheduler and, as a result, it does not block waiting for the task to complete. You can simulate a blocking call like this:
public String get() throws IOException {
    HttpsURLConnection connection = (HttpsURLConnection) new URL("https://jsonplaceholder.typicode.com/comments").openConnection();
    connection.setRequestMethod("GET");
    try (InputStream inputStream = connection.getInputStream()) {
        return new String(inputStream.readAllBytes(), StandardCharsets.UTF_8);
    }
}
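When such a call is used inside a reactive pipeline, it should be deferred and shifted onto a scheduler that tolerates blocking. A minimal sketch, assuming the get() method above:
// Defer the blocking call and run it on a scheduler meant for blocking work,
// so the parallel/event-loop threads are never stalled.
Mono<String> blockingCall = Mono.fromCallable(this::get)
    .subscribeOn(Schedulers.boundedElastic());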
Note that, as an alternative, you could use the .limitRate(1) operator instead of the prefetch parameter. For example:
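Dropped into the Fibonacci sample above (a sketch; how much demand actually reaches generate still depends on the downstream operators' own prefetch, so treat this as illustrative):
generateFibonacci(100)
    .limitRate(1)                      // caps upstream demand to one element at a time
    .publishOn(boundedElasticScheduler)
    .concatMap(number -> task(number))
    .takeWhile(context -> context < 3)
    .subscribe();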

Related

Spring WebFlux | How to wait till a list of Monos finish execution in parallel

I want to send n requests to a REST endpoint in parallel. I want to make sure these get executed in different threads for performance, and I need to wait till all n requests finish.
The only way I could come up with is using a CountDownLatch as follows (please check the main() method; this is testable code):
public static void main(String[] args) throws Exception {
    int n = 10; // n is dynamic during runtime
    final CountDownLatch waitForNRequests = new CountDownLatch(n);
    // Send n requests.
    for (int i = 0; i < n; i++) {
        var r = testRestCall("" + i);
        r.publishOn(Schedulers.parallel()).subscribe(res -> {
            System.out.println(">>>>>>> Thread: " + Thread.currentThread().getName() + " response:" + res.getBody());
            waitForNRequests.countDown();
        });
    }
    waitForNRequests.await(); // wait till all n requests finish before going to the next line
    System.out.println("All n requests finished");
    Thread.sleep(10000);
}
public static Mono<ResponseEntity<Map>> testRestCall(String id) {
    WebClient client = WebClient.create("https://reqres.in/api");
    JSONObject request = new JSONObject();
    request.put("name", "user" + id);
    request.put("job", "leader");
    var res = client.post().uri("/users")
        .contentType(MediaType.APPLICATION_JSON)
        .body(BodyInserters.fromValue(request.toString()))
        .accept(MediaType.APPLICATION_JSON)
        .retrieve()
        .toEntity(Map.class)
        .onErrorReturn(ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE).build());
    return res;
}
This doesn't look good, and I am sure there is an elegant solution without using latches, etc.
I tried the following method, but I don't know how to resolve these issues:
Flux.merge() and concat() result in executing all n requests in a single thread.
How to wait till the n requests finish execution (fork-join)?
List<Mono<ResponseEntity<Map>>> lst = new ArrayList<>();
int n = 10; // n is dynamic during runtime
for (int i = 0; i < n; i++) {
    var r = testRestCall("" + i);
    lst.add(r);
}
var t = Flux.fromIterable(lst).flatMap(Function.identity()); // tried merge() and concat() as well
t.publishOn(Schedulers.parallel()).subscribe(res -> {
    System.out.println(">>>>>>> Thread: " + Thread.currentThread().getName() + " response:" + res.getBody());
    // ??? All requests execute in a single thread. How to parallelize?
});
// ??? How to wait till all n requests finish before going to the next line?
System.out.println("All n requests finished");
Thread.sleep(10000);
Update:
I found the reason why the Flux subscriber runs in the same thread: I need to create a ParallelFlux. So the correct order should be:
var t = Flux.fromIterable(lst).flatMap(Function.identity());
t.parallel()
    .runOn(Schedulers.parallel())
    .subscribe(res -> {
        System.out.println(">>>>>>> Thread: " + Thread.currentThread().getName() + " response:" + res.getBody());
    });
Ref: https://projectreactor.io/docs/core/release/reference/#advanced-parallelizing-parralelflux
In reactive programming you think not about threads but about concurrency.
Reactor executes non-blocking/async tasks on a small number of threads, using the Schedulers abstraction to execute tasks. Schedulers have responsibilities very similar to ExecutorService. By default, the number of threads for the parallel scheduler is equal to the number of CPU cores, but it can be controlled via the `reactor.schedulers.defaultPoolSize` system property.
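For instance (a sketch; the property must be set before Reactor's schedulers are first used, and 16 is an arbitrary value):
// Must run before the first call to Schedulers.parallel(),
// otherwise the default (number of CPU cores) is already locked in.
System.setProperty("reactor.schedulers.defaultPoolSize", "16");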
In your example, instead of creating multiple Monos and then merging them, it is better to use a Flux and then process the elements in parallel, controlling the concurrency:
Flux.range(1, 10)
    .flatMap(i -> testRestCall("" + i))
By default, flatMap processes Queues.SMALL_BUFFER_SIZE = 256 in-flight inner sequences.
You can control the concurrency with flatMap(item -> process(item), concurrency), or use the concatMap operator if you want to process elements sequentially. Check flatMap(..., int concurrency, int prefetch) for details:
Flux.range(1, 10)
    .flatMap(i -> testRestCall("" + i), 5)
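And to wait until all the requests finish before moving to the next line (the fork-join part of the question), a minimal sketch, assuming the testRestCall method from the question:
Flux.range(1, 10)
    .flatMap(i -> testRestCall("" + i), 5)
    .then()   // completes only when every inner Mono has completed
    .block(); // blocks the calling, non-reactive thread until then
System.out.println("All n requests finished");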
The following test shows that the calls are executed on different threads:
@Test
void testParallel() {
    var flow = Flux.range(1, 10)
        .flatMap(i -> testRestCall("" + i))
        .log()
        .then(Mono.just("complete"));
    StepVerifier.create(flow)
        .expectNext("complete")
        .verifyComplete();
}
The resulting log:
2022-12-30 21:31:25.169 INFO 43383 --- [ctor-http-nio-4] reactor.Mono.FlatMap.3 : | onComplete()
2022-12-30 21:31:25.170 INFO 43383 --- [ctor-http-nio-3] reactor.Mono.FlatMap.2 : | onComplete()
2022-12-30 21:31:25.169 INFO 43383 --- [ctor-http-nio-2] reactor.Mono.FlatMap.1 : | onComplete()
2022-12-30 21:31:25.169 INFO 43383 --- [ctor-http-nio-8] reactor.Mono.FlatMap.7 : | onComplete()
2022-12-30 21:31:25.169 INFO 43383 --- [tor-http-nio-11] reactor.Mono.FlatMap.10 : | onComplete()
2022-12-30 21:31:25.169 INFO 43383 --- [ctor-http-nio-7] reactor.Mono.FlatMap.6 : | onComplete()
2022-12-30 21:31:25.169 INFO 43383 --- [ctor-http-nio-9] reactor.Mono.FlatMap.8 : | onComplete()
2022-12-30 21:31:25.170 INFO 43383 --- [ctor-http-nio-6] reactor.Mono.FlatMap.5 : | onComplete()
2022-12-30 21:31:25.378 INFO 43383 --- [ctor-http-nio-5] reactor.Mono.FlatMap.4 : | onComplete()

CompletableFuture thenCompose V.S. thenComposeAsync

Based on different posts and articles, all I know is this: thenCompose will run in the same thread as the previous stage, while thenComposeAsync will try to start a new thread compared to the previous stage.
Even Java 8 in Action provides the following code to demonstrate that thenCompose works in the same thread.
Quote from Java 8 in Action, Chapter 11.4.3:
public List<String> findPrices(String product) {
    List<CompletableFuture<String>> priceFutures =
        shops.stream()
            .map(shop -> CompletableFuture.supplyAsync(
                () -> shop.getPrice(product), executor))
            .map(future -> future.thenApply(Quote::parse))
            .map(future -> future.thenCompose(quote ->
                CompletableFuture.supplyAsync(
                    () -> Discount.applyDiscount(quote), executor)))
            .collect(toList());
    return priceFutures.stream()
        .map(CompletableFuture::join)
        .collect(toList());
}
But I tried the following code, and the result quite confused me: it seems to be the opposite, with thenCompose starting a new thread while thenComposeAsync reuses the old thread.
public static void main(String... args) {
    testBasic();
    testAsync();
}

private static void testBasic() {
    out.println("*****************************************");
    out.println("********** TESTING thenCompose **********");
    ExecutorService executorService = Executors.newCachedThreadPool();
    CompletableFuture[] futures = Stream.iterate(0, i -> i + 1)
        .limit(3)
        .map(i -> CompletableFuture.supplyAsync(() -> runStage1(i), executorService))
        .map(future -> future.thenCompose(i -> CompletableFuture.supplyAsync(() -> runStage2(i), executorService)))
        .map(f -> f.thenAccept(out::println))
        .toArray(size -> new CompletableFuture[size]);
    CompletableFuture.allOf(futures).join();
}

private static void testAsync() {
    out.println("*****************************************");
    out.println("******* TESTING thenComposeAsync ********");
    ExecutorService executorService = Executors.newCachedThreadPool();
    CompletableFuture[] futures = Stream.iterate(0, i -> i + 1)
        .limit(3)
        .map(i -> CompletableFuture.supplyAsync(() -> runStage1(i), executorService))
        .map(future -> future.thenComposeAsync(i ->
            CompletableFuture.supplyAsync(() -> runStage2(i), executorService)))
        .map(f -> f.thenAccept(out::println))
        .toArray(size -> new CompletableFuture[size]);
    CompletableFuture.allOf(futures).join();
}
private static Integer runStage1(int a) {
    String s = String.format("Start: stage - 1 - value: %d - thread name: %s",
        a, Thread.currentThread().getName());
    out.println(s);
    long start = System.currentTimeMillis();
    try {
        Thread.sleep(1500 + Math.abs(new Random().nextInt()) % 1000);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    s = String.format("Finish: stage - 1 - value: %d - thread name: %s - time cost: %d",
        a, Thread.currentThread().getName(), (System.currentTimeMillis() - start));
    out.println(s);
    return Integer.valueOf(a % 3);
}

private static Integer runStage2(int b) {
    String s = String.format("Start: stage - 2 - value: %d - thread name: %s",
        b, Thread.currentThread().getName());
    out.println(s);
    long start = System.currentTimeMillis();
    try {
        Thread.sleep(200 + Math.abs(new Random().nextInt()) % 1000);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    s = String.format("Finish: stage - 2 - value: %d - thread name: %s - time cost: %d",
        b, Thread.currentThread().getName(), (System.currentTimeMillis() - start));
    out.println(s);
    return Integer.valueOf(b);
}
And the result is as follows:
*****************************************
********** TESTING thenCompose **********
Start: stage - 1 - value: 1 - thread name: pool-1-thread-2
Start: stage - 1 - value: 0 - thread name: pool-1-thread-1
Start: stage - 1 - value: 2 - thread name: pool-1-thread-3
Finish: stage - 1 - value: 1 - thread name: pool-1-thread-2 - time cost: 1571
Start: stage - 2 - value: 1 - thread name: pool-1-thread-4 // using a new thread?????
Finish: stage - 1 - value: 2 - thread name: pool-1-thread-3 - time cost: 1875
Start: stage - 2 - value: 2 - thread name: pool-1-thread-2
Finish: stage - 1 - value: 0 - thread name: pool-1-thread-1 - time cost: 2198
Start: stage - 2 - value: 0 - thread name: pool-1-thread-3
Finish: stage - 2 - value: 2 - thread name: pool-1-thread-2 - time cost: 442
2
Finish: stage - 2 - value: 1 - thread name: pool-1-thread-4 - time cost: 779
1
Finish: stage - 2 - value: 0 - thread name: pool-1-thread-3 - time cost: 1157
0
*****************************************
******* TESTING thenComposeAsync ********
Start: stage - 1 - value: 0 - thread name: pool-2-thread-1 // all in same thread
Start: stage - 1 - value: 1 - thread name: pool-2-thread-2
Start: stage - 1 - value: 2 - thread name: pool-2-thread-3
Finish: stage - 1 - value: 0 - thread name: pool-2-thread-1 - time cost: 1623
Start: stage - 2 - value: 0 - thread name: pool-2-thread-1
Finish: stage - 1 - value: 1 - thread name: pool-2-thread-2 - time cost: 1921
Start: stage - 2 - value: 1 - thread name: pool-2-thread-2
Finish: stage - 1 - value: 2 - thread name: pool-2-thread-3 - time cost: 1932
Start: stage - 2 - value: 2 - thread name: pool-2-thread-3
Finish: stage - 2 - value: 0 - thread name: pool-2-thread-1 - time cost: 950
0
Finish: stage - 2 - value: 2 - thread name: pool-2-thread-3 - time cost: 678
2
Finish: stage - 2 - value: 1 - thread name: pool-2-thread-2 - time cost: 956
1
Is there something wrong with the demo? Please correct me, or if there are some useful resources, please do share.

Why I am getting Value is Null while executing two scenarios in Gatling?

I have two scenarios in my script. The 1st, "getAssets", will fetch all asset IDs and save them in a list; the 2nd, "fetchMetadata", will iterate over those IDs.
I have to execute the "getAssets" scenario only once to fetch all the IDs, and then the "fetchMetadata" scenario will execute for the given time duration.
Here is the JSON response of the "/api/assets;limit=$limit" request (we fetch the ids from here using $.assets[*].id):
{
    "assets": [
        {
            "id": 3010411,
            "name": "Asset 2016-11-22 20:06:07",
            ....
            ....
        },
        {
            "id": 3010231,
            "name": "Asset 2016-11-22 20:07:07",
            ....
            ....
        }, and so on..
Here is the code:
import java.util.concurrent.ThreadLocalRandom
import scala.concurrent.duration._
import io.gatling.core.Predef._
import io.gatling.http.Predef._

class getAssetsMetadata extends Simulation {
  val getAssetURL = System.getProperty("getAssetURL", "https://performancetesting.net")
  val username = System.getProperty("username", "performanceuser")
  val password = System.getProperty("password", "performanceuser")
  val limit = Integer.getInteger("limit", 1000).toInt
  val userCount = Integer.getInteger("userCount", 100).toInt
  val duration = Integer.getInteger("duration", 1).toInt // in minutes
  var IdList: Seq[String] = _

  val httpProtocol = http
    .basicAuth(username, password)
    .baseURL(getAssetURL)
    .contentTypeHeader("""application/vnd.v1+json""")

  // Scenario 1: get assets
  val getAssets = scenario("Get Assets")
    .exec(http("List of Assets")
      .get(s"""/api/assets;limit=$limit""")
      .check(jsonPath("$.assets[*].id").findAll.transform { v => IdList = v; v }.saveAs("IdList"))
    )

  // Scenario 2: fetch metadata
  val fetchMetadata = scenario("Fetch Metadata")
    .exec(_.set("IdList", IdList))
    .exec(http("Metadata Request")
      .get("""/api/assets/${IdList.random()}/metadata""")
    )

  val scn = List(getAssets.inject(atOnceUsers(1)), fetchMetadata.inject(constantUsersPerSec(userCount) during (duration minutes)))

  setUp(scn).protocols(httpProtocol)
}
:::ERROR:::
It throws "Value is null" (while we have 10 million asset IDs here). Here is the Gatling log:
14883 [GatlingSystem-akka.actor.default-dispatcher-4] INFO io.gatling.http.config.HttpProtocol - Warm up done
14907 [GatlingSystem-akka.actor.default-dispatcher-4] INFO io.gatling.http.config.HttpProtocol - Start warm up
14909 [GatlingSystem-akka.actor.default-dispatcher-4] INFO io.gatling.http.config.HttpProtocol - Warm up done
14911 [GatlingSystem-akka.actor.default-dispatcher-4] INFO i.g.core.controller.Controller - Total number of users : 6001
14918 [GatlingSystem-akka.actor.default-dispatcher-6] INFO i.g.c.r.writer.ConsoleDataWriter - Initializing
14918 [GatlingSystem-akka.actor.default-dispatcher-3] INFO i.g.c.result.writer.FileDataWriter - Initializing
14923 [GatlingSystem-akka.actor.default-dispatcher-6] INFO i.g.c.r.writer.ConsoleDataWriter - Initialized
14928 [GatlingSystem-akka.actor.default-dispatcher-3] INFO i.g.c.result.writer.FileDataWriter - Initialized
14931 [GatlingSystem-akka.actor.default-dispatcher-4] DEBUG i.g.core.controller.Controller - Launching All Scenarios
14947 [GatlingSystem-akka.actor.default-dispatcher-12] ERROR i.g.http.action.HttpRequestAction - 'httpRequest-2' failed to execute: Value is null
14954 [GatlingSystem-akka.actor.default-dispatcher-4] DEBUG i.g.core.controller.Controller - Finished Launching scenarios executions
14961 [GatlingSystem-akka.actor.default-dispatcher-4] DEBUG i.g.core.controller.Controller - Setting up max duration
14962 [GatlingSystem-akka.actor.default-dispatcher-4] INFO i.g.core.controller.Controller - Start user #7187317726850756780-0
14963 [GatlingSystem-akka.actor.default-dispatcher-4] INFO i.g.core.controller.Controller - Start user #7187317726850756780-1
14967 [GatlingSystem-akka.actor.default-dispatcher-4] INFO i.g.core.controller.Controller - End user #7187317726850756780-1
14970 [GatlingSystem-akka.actor.default-dispatcher-4] INFO i.g.core.controller.Controller - Start user #7187317726850756780-2
14970 [GatlingSystem-akka.actor.default-dispatcher-5] INFO io.gatling.http.ahc.HttpEngine - Sending request=List of Assets uri=https://performancetesting.net/api/assets;limit=1000: scenario=Get Assets, userId=7187317726850756780-0
14970 [GatlingSystem-akka.actor.default-dispatcher-7] ERROR i.g.http.action.HttpRequestAction - 'httpRequest-2' failed to execute: Value is null
14972 [GatlingSystem-akka.actor.default-dispatcher-7] INFO i.g.core.controller.Controller - End user #7187317726850756780-2
14980 [GatlingSystem-akka.actor.default-dispatcher-7] ERROR i.g.http.action.HttpRequestAction - 'httpRequest-2' failed to execute: Value is null
14980 [GatlingSystem-akka.actor.default-dispatcher-4] INFO i.g.core.controller.Controller - Start user #7187317726850756780-3
14984 [GatlingSystem-akka.actor.default-dispatcher-4] INFO i.g.core.controller.Controller - End user #7187317726850756780-3
.....
.....
.....
61211 [GatlingSystem-akka.actor.default-dispatcher-12] INFO i.g.core.controller.Controller - Start user #7187317726850756780-4626
61211 [GatlingSystem-akka.actor.default-dispatcher-7] ERROR i.g.http.action.HttpRequestAction - 'httpRequest-2' failed to execute: Value is null
61211 [GatlingSystem-akka.actor.default-dispatcher-7] INFO i.g.core.controller.Controller - End user #7187317726850756780-4626
61224 [GatlingSystem-akka.actor.default-dispatcher-2] INFO i.g.core.controller.Controller - Start user #7187317726850756780-4627
61225 [GatlingSystem-akka.actor.default-dispatcher-5] INFO io.gatling.http.ahc.HttpEngine - Sending request=Metadata Request uri=https://performancetesting.net/api/assets/3010320/metadata: scenario=Fetch Metadata, userId=7187317726850756780-4627
61230 [GatlingSystem-akka.actor.default-dispatcher-12] INFO i.g.core.controller.Controller - Start user #7187317726850756780-4628
61230 [GatlingSystem-akka.actor.default-dispatcher-7] INFO io.gatling.http.ahc.HttpEngine - Sending request=Metadata Request uri=https://performancetesting.net/api/assets/3009939/metadata: scenario=Fetch Metadata, userId=7187317726850756780-4628
61233 [GatlingSystem-akka.actor.default-dispatcher-2] INFO i.g.core.controller.Controller - End user #7187317726850756780-0
61233 [New I/O worker #12] DEBUG c.n.h.c.p.netty.handler.Processor - Channel Closed: [id: 0x8c94a1ae, /192.168.100.108:56739 :> performancetesting.net/10.20.14.176:443] with attribute INSTANCE
---- Requests ------------------------------------------------------------------
> Global (OK=261 KO=40 )
> Metadata Request (OK=260 KO=40 )
> List of Assets (OK=1 KO=0 )
---- Errors --------------------------------------------------------------------
> Value is null 40 (100.0%)
================================================================================
Thank you.
The list of IDs is not passed to the other scenario, because it is obtained by another "user" (session).
Your simulation reads like the following:
1 user obtains the list of IDs (and keeps the list to itself);
AT THE SAME TIME, n users try to fetch assets with an (undefined) list of IDs.
The first user doesn't talk to the others; both scenario injections are independent of each other.
A way to solve this is to store the list in a shared container (e.g. an AtomicReference) and access this container from the second scenario.
To ensure the container is populated, inject a nothingFor step at the beginning of the second scenario, in order to wait for the first scenario to finish.
Another way is to fetch the list of IDs at the beginning of the second scenario, if it has not been fetched before (see the container above).
I'm sure there are other ways to accomplish this as well (e.g. using some feeder and fetching the list of IDs before the actual test). A sketch of the shared-container idea follows.
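A minimal sketch of such a shared container, shown in Java for brevity (in the Scala simulation, the first scenario's check transform would call IdHolder.IDS.set(...) and the second scenario would read IdHolder.IDS.get(); the class name is illustrative):
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Gatling runs all scenarios of a simulation in one JVM,
// so a static holder like this is visible to both scenarios.
public final class IdHolder {
    public static final AtomicReference<List<String>> IDS = new AtomicReference<>();
    private IdHolder() {}
}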

Oracle Advanced Queue - Performance Issue in Consumption Rate

We've performed a performance test with Oracle Advanced Queue in our Oracle DB environment. We created the queue and the queue table with the following script:
BEGIN
  DBMS_AQADM.create_queue_table(
    queue_table        => 'verisoft.qt_test',
    queue_payload_type => 'SYS.AQ$_JMS_MESSAGE',
    sort_list          => 'ENQ_TIME',
    multiple_consumers => false,
    message_grouping   => 0,
    comment            => 'POC Authorizations Queue Table - KK',
    compatible         => '10.0',
    secure             => true);

  DBMS_AQADM.create_queue(
    queue_name     => 'verisoft.q_test',
    queue_table    => 'verisoft.qt_test',
    queue_type     => dbms_aqadm.NORMAL_QUEUE,
    max_retries    => 10,
    retry_delay    => 0,
    retention_time => 0,
    comment        => 'POC Authorizations Queue - KK');

  DBMS_AQADM.start_queue('q_test');
END;
/
We published 1,000,000 messages at 2380 TPS using a PL/SQL client, and we consumed 1,000,000 messages at 292 TPS using the Oracle JMS API client.
The consumer rate is almost 10 times slower than the publisher, and that speed does not meet our requirements.
Below is the piece of Java code that we use to consume messages:
if (q == null) initializeQueue();
System.out.println(listenerID + ": Listening on queue " + q.getQueueName() + "...");
MessageConsumer consumer = sess.createConsumer(q);
for (Message m; (m = consumer.receive()) != null;) {
    new Timer().schedule(new QueueExample(m), 0);
}
sess.close();
con.close();
Do you have any suggestions on how we can improve the performance on the consumer side?
Your use of Timer may be your primary issue. The Timer documentation reads:
Corresponding to each Timer object is a single background thread that is used to execute all of the timer's tasks, sequentially. Timer tasks should complete quickly. If a timer task takes excessive time to complete, it "hogs" the timer's task execution thread. This can, in turn, delay the execution of subsequent tasks, which may "bunch up" and execute in rapid succession when (and if) the offending task finally completes.
I would suggest you use a thread pool instead:
// My executor.
ExecutorService executor = Executors.newCachedThreadPool();

public void test() throws InterruptedException {
    for (int i = 0; i < 1000; i++) {
        final int n = i;
        // Instead of using Timer, create a Runnable and pass it to the executor.
        executor.submit(new Runnable() {
            @Override
            public void run() {
                System.out.println("Run " + n);
            }
        });
    }
    executor.shutdown();
    executor.awaitTermination(1, TimeUnit.DAYS);
}
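Applied to the consumer loop from the question, this could look like the following sketch. Note that TimerTask implements Runnable, so the existing QueueExample task can be submitted to the pool unchanged:
ExecutorService executor = Executors.newCachedThreadPool();
MessageConsumer consumer = sess.createConsumer(q);
for (Message m; (m = consumer.receive()) != null;) {
    // Hand each message to the pool instead of spawning a new Timer per message.
    executor.submit(new QueueExample(m));
}
executor.shutdown();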

Pig UDF, file in Distributed Cache deleted during batch work

public class GetCountryFromIP extends EvalFunc<String> {
    @Override
    public List<String> getCacheFiles() {
        List<String> list = new ArrayList<String>(1);
        list.add("/input/pig/resources/GeoLite2-Country.mmdb#GeoLite2-Country");
        return list;
    }

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        try {
            String inputIP = (String) input.get(0);
            String output;
            File database = new File("./GeoLite2-Country");
            // CODE FOR EXPLAIN
            if (database.exists()) {
                System.out.print("EXIST!!!");
            } else {
                System.out.print("NOTEXISTS!!!");
            }
            // CODE FOR EXPLAIN
            DatabaseReader reader = new DatabaseReader.Builder(database).build();
            InetAddress ipAddress = InetAddress.getByName(inputIP);
            CountryResponse response = reader.country(ipAddress);
            Country country = response.getCountry();
            output = country.getIsoCode();
            return output;
        } catch (AddressNotFoundException e) {
            return null;
        } catch (Exception ee) {
            throw new IOException("Uncaught exec" + ee);
        }
    }
}
Here is my UDF code. I need the GeoLite2-Country.mmdb file, so I use getCacheFiles().
I also put all the Pig Latin into one pig file, 'batch.pig'.
When I run this file with 'pig batch.pig', the output looks like this:
2015-10-06 01:16:56,737 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - soft limit at 83886080
2015-10-06 01:16:56,737 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - bufstart = 0; bufvoid = 104857600
2015-10-06 01:16:56,737 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - kvstart = 26214396; length = 6553600
2015-10-06 01:16:56,738 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2015-10-06 01:16:56,744 [LocalJobRunner Map Task Executor #0] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2015-10-06 01:16:56,754 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: weblog[-1,-1],weblog_web[30,13],weblog_web[-1,-1],weblog_web[-1,-1],desktop_active_log_account_filter[7,36],desktop_parsed[3,18],desktop_parsed_abstract[5,26],weblog_web[-1,-1],web_active_log_account_filter[20,32],weblog_web_parsed[16,20],weblog_web_parsed_abstract[18,29] C: R:
EXIST!!!
2015-10-06 01:16:56,997 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner -
2015-10-06 01:16:56,997 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
...
...
...
2015-10-06 01:16:57,938 [Thread-1885] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce task executor complete.
2015-10-06 01:16:57,939 [pool-59-thread-1] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
NOTEXIST!!!
2015-10-06 01:16:57,974 [pool-59-thread-1] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce - Aliases being processed per job phase (AliasName[line,offset]): M: account_hour_activity[42,24],account_hour_activity_group[41,30],team_hour_activity[76,21],team_hour_activity_group[75,27] C:
...
...
..
2015-10-06 01:16:57,976 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Finished spill 0
2015-10-06 01:16:57,977 [Thread-2139] INFO org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
2015-10-06 01:16:57,981 [Thread-2139] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local1209692101_0021
java.lang.Exception: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: Local Rearrange[tuple]{tuple}(true) - scope-2240 Operator Key: scope-2240): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: weblog_web_parsed_abstract: New For Each(false,false,false)[bag] - scope-1379 Operator Key: scope-1379): org.apache.pig.backend.executionengine.ExecException: ERROR 2078: Caught error from UDF: com.tosslab.sprinklr.country.GetCountryFromIP [Uncaught execjava.io.FileNotFoundException: ./GeoLite2-Country (No such file or directory)]
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: Local Rearrange[tuple]{tuple}(true) - scope-2240 Operator Key: scope-2240): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: weblog_web_parsed_abstract: New For Each(false,false,false)[bag] - scope-1379 Operator Key: scope-1379): org.apache.pig.backend.executionengine.ExecException: ERROR 2078: Caught error from UDF: com.tosslab.sprinklr.country.GetCountryFromIP [Uncaught execjava.io.FileNotFoundException: ./GeoLite2-Country (No such file or directory)]
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:316)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:291)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.runPipeline(POSplit.java:259)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:241)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:246)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNextTuple(POSplit.java:233)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: weblog_web_parsed_abstract: New For Each(false,false,false)[bag] - scope-1379 Operator Key: scope-1379): org.apache.pig.backend.executionengine.ExecException: ERROR 2078: Caught error from UDF: com.tosslab.sprinklr.country.GetCountryFromIP [Uncaught execjava.io.FileNotFoundException: ./GeoLite2-Country (No such file or directory)]
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:316)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:246)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
... 17 more
This means the mmdb file was deleted during the batch work...
What's going on here? How can I solve this issue?
It seems like the job is run in local mode:
2015-10-06 01:16:57,976 [**LocalJobRunner** Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Finished spill 0
When you run the job in local mode, the distributed cache is not supported:
2015-10-05 23:22:56,675 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Distributed cache not supported or needed in local mode.
Put everything in HDFS and run in mapreduce mode (e.g. pig -x mapreduce batch.pig).
