I am keen to know whether it's possible to replace old multi-threading code written with Java's ExecutorService with Akka. I have a few doubts regarding this.
Does each Akka actor run in its own thread?
How are threads assigned to actors?
What are the pros and cons of migrating, if it is possible?
Currently I use a fixed thread pool for multi-threading and submit a Callable.
Sample code:
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

public class KafkaConsumerFactory {

    // ConcurrentHashMap: the registry is written to from multiple worker threads
    private static Map<String, KafkaConsumer> registry = new ConcurrentHashMap<>();

    private static ThreadLocal<KafkaConsumer> consumers = new ThreadLocal<KafkaConsumer>() {
        @Override
        protected KafkaConsumer initialValue() {
            return new KafkaConsumer(createConsumerConfig());
        }
    };

    static {
        // Close every registered consumer on JVM shutdown
        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override
            public void run() {
                registry.forEach((tid, con) -> {
                    try {
                        con.close();
                    } finally {
                        System.out.println("Yes!! Consumer for " + tid + " is closed.");
                    }
                });
            }
        });
    }

    private static Properties createConsumerConfig() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "newcon-grp5");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", KafkaKryoSerde.class.getName());
        return props;
    }

    public static <K, V> KafkaConsumer<K, V> createConsumer() {
        registry.put(Thread.currentThread().getName(), consumers.get());
        return consumers.get();
    }
}
/////////////////////////////////////////////////////////
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.util.Collections;
import java.util.StringJoiner;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class KafkaNewConsumer {

    public static int MAX_THREADS = 10;
    private ExecutorService es = null;
    private volatile boolean stopRequest = false; // volatile: read by the worker threads

    public static void main(String[] args) {
        KafkaNewConsumer knc = new KafkaNewConsumer();
        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override
            public void run() {
                knc.es.shutdown();
                try {
                    knc.es.awaitTermination(500, TimeUnit.MILLISECONDS);
                } catch (InterruptedException ignored) {
                } finally {
                    System.out.println("Finished");
                }
            }
        });
        knc.consumeTopic("rtest3", knc::recordConsumer);
    }

    public void recordConsumer(ConsumerRecord<?, ?> record) {
        String result = new StringJoiner(": ")
                .add(Thread.currentThread().getName())
                .add("ts").add(String.valueOf(record.timestamp()))
                .add("offset").add(String.valueOf(record.offset()))
                .add("data").add(String.valueOf(record.value()))
                .add("value-len").add(String.valueOf(record.serializedValueSize()))
                .toString();
        System.out.println(result);
    }

    public void consumeTopic(String topicName, Consumer<ConsumerRecord<?, ?>> fun) {
        // A throwaway consumer, used only to discover the partition count
        KafkaConsumer con = KafkaConsumerFactory.createConsumer();
        int partitions = con.partitionsFor(topicName).size();
        int noOfThreads = Math.min(MAX_THREADS, partitions);
        es = Executors.newFixedThreadPool(noOfThreads);
        con.close();

        for (int i = 0; i < noOfThreads; i++) {
            es.submit(() -> {
                KafkaConsumer consumer = KafkaConsumerFactory.createConsumer();
                try {
                    // subscribe once, outside the poll loop
                    consumer.subscribe(Collections.singletonList(topicName));
                    while (!stopRequest) {
                        ConsumerRecords<?, ?> records = consumer.poll(5000);
                        records.forEach(fun);
                        consumer.commitSync();
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                } finally {
                    consumer.close();
                }
            });
        }
    }
}
I went through some internet tutorials; some of them simply conclude that
actors are very good and faster than traditional threads.
But there is no explanation of how they can be faster than threads.
I tried some sample Akka code (the Akka sample from Activator), printed Thread.currentThread().getName() inside all actors, and found that different dispatcher threads named helloakka-akka.actor.default-dispatcher-X are created.
But how? Who is creating those threads? Where is the configuration for them? What is the mapping relation between a thread and an actor?
Every time I send a message, will Akka create a new thread, or is a thread pool used internally?
If I need 100 threads to execute parts of the same task in parallel, do I need to create 100 actors and send one message to each of them? Or do I create one actor and put 100 messages in its queue, and it will get forked into 100 threads?
Really confused.
Migration to an actor system is not a small task for an executor-based system, but it can be done. It requires you to rethink the way you design the system and to consider the impact of actors. For example, in a threaded architecture you create a handler for a business process, toss it in a Runnable, and let it go off doing things on a thread. This is wholly inappropriate for the actor paradigm. You have to re-architect your system around message passing and use messages to invoke tasks. You also have to change the way you think about business processes, from an imperative approach to a message-based approach. Consider, for example, the simple task of purchasing a product. I will assume you know how to do it with an executor. In an actor system you do this:
(Purchase Product) -> UserActor -> (Bill Credit Card) -> CCProcessingActor -> (Purchase Approved and Billed Item) -> InventoryManager -> ... and so on
At each phase, what is in the parentheses is an asynchronous message sent to the actor in question, which performs some business logic and then forwards a message to the next actor in the process.
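To make that shape concrete, here is a minimal sketch using the Akka classic Java API. The actor and message names (UserActor, CcProcessingActor, PurchaseProduct, BillCreditCard) are illustrative stand-ins, not from any real system:
import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;

public class PurchaseFlow {

    // Immutable messages drive the flow instead of direct method calls
    static final class PurchaseProduct {
        final String productId;
        PurchaseProduct(String productId) { this.productId = productId; }
    }

    static final class BillCreditCard {
        final String productId;
        BillCreditCard(String productId) { this.productId = productId; }
    }

    static class CcProcessingActor extends AbstractActor {
        @Override
        public Receive createReceive() {
            return receiveBuilder()
                    .match(BillCreditCard.class, msg -> {
                        // charge the card here, then forward to the inventory manager, etc.
                        System.out.println("billed card for " + msg.productId);
                    })
                    .build();
        }
    }

    static class UserActor extends AbstractActor {
        private final ActorRef billing;
        UserActor(ActorRef billing) { this.billing = billing; }

        @Override
        public Receive createReceive() {
            return receiveBuilder()
                    .match(PurchaseProduct.class, msg ->
                            // do the user-level business logic, then hand off asynchronously
                            billing.tell(new BillCreditCard(msg.productId), getSelf()))
                    .build();
        }
    }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("shop");
        ActorRef billing = system.actorOf(Props.create(CcProcessingActor.class), "billing");
        ActorRef user = system.actorOf(Props.create(UserActor.class, billing), "user");
        user.tell(new PurchaseProduct("item-42"), ActorRef.noSender());
    }
}
Note how UserActor never calls the billing code directly; it only emits a message and is immediately free to process the next one.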
Now, this is only one means of creating an actor-based system; there are many other techniques. But the core fundamental is that you can't think imperatively, but rather in terms of a collection of steps that each run independently. The messages blast through the system in regular order, but you can't be sure of the order, or even whether a message will arrive at all, so you have to design in semantics to handle that. In the system above, I might have another actor checking every two minutes for orphaned orders that have not been presented to billing. Of course, that means my messages need to be idempotent, so that if I send one a second time it's OK and the user won't be billed twice.
I know I didn't deal with your specific example; I just wanted to provide some context: actors are not just another way to create an executor (well, I suppose you could abuse them that way, but it's not advisable) but rather a completely different design paradigm. It is a very worthwhile paradigm to learn, and if you make the leap you will never want to use executors again.
Related
I have a server with multiple clients. It uses one server socket and two thread pools for receiving and handling requests from remote clients: one pool for handling client connections, and another for processing the clients' remote tasks. Each client sends asynchronous tasks with a unique task ID (within each connection) and a bunch of parameters. Upon task deserialization, the server looks up the corresponding service, invokes the given method on it, wraps the result along with the task ID into an answer object, and sends it back to the client over an ObjectOutputStream.
Since tasks are handled concurrently, two or more threads might finish processing tasks for one client at the same time and compete for the ObjectOutputStream.
What happens next? I mean, do they write their objects to the output stream atomically, or should I synchronize their access to the ObjectOutputStream, to avoid the situation where one thread writes half of its object, then another thread intervenes, and... as a result, a sort of Frankenstein object is sent to the client?
import java.io.*;
import java.lang.reflect.Method;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.*;
import java.util.concurrent.*;

public class Server {

    private final ExecutorService connExecutor = Executors.newCachedThreadPool();
    private final ExecutorService tasksExecutor = Executors.newCachedThreadPool();

    public void start() {
        try (ServerSocket socket = new ServerSocket(2323)) {
            while (true) {
                // Note: the connection must not be opened in a try-with-resources
                // here, or the accept loop would close it immediately; the handler
                // below owns it for the lifetime of the client.
                Socket conn = socket.accept();
                connExecutor.execute(() -> {
                    try (ObjectInputStream in = new ObjectInputStream(conn.getInputStream());
                         ObjectOutputStream out = new ObjectOutputStream(conn.getOutputStream())) {
                        while (true) {
                            RemoteTask task = (RemoteTask) in.readObject();
                            tasksExecutor.execute(() -> handleTask(task, out));
                        }
                    } catch (IOException | ClassNotFoundException e) {
                        e.printStackTrace();
                    }
                });
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private void handleTask(RemoteTask task, ObjectOutputStream out) {
        RemoteAnswer answer = new RemoteAnswer();
        // unwrap remote task
        // lookup local service
        // invoke task's method
        // wrap result into remote answer
        // send answer to the client
        try {
            out.writeObject(answer);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
This answer says it nicely:
Is writing an object to an ObjectOutputStream a thread-safe operation?
Absolutely not.
So, yes, your code needs to take precautions itself.
As a rule of thumb: if the documentation doesn't specify that a certain class is thread-safe, it probably isn't. Thread safety is clearly an "intentional quality" (an allusion to a blog post by Roman Elizarov, one of Kotlin's language designers) and should therefore always be mentioned.
However, if you're still unsure whether a class of the Java SE library provides thread safety or not (as it might be mentioned somewhere else, e.g. in the superclass's documentation), you can also take a quick glance at the type's source code. As you can see, ObjectOutputStream doesn't implement any synchronization mechanisms.
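A minimal sketch of such a precaution, assuming one shared lock object per stream (the outLock field and sendAnswer helper are illustrative, not an existing API):
// One lock per connection; every thread writing to this stream must hold it.
private final Object outLock = new Object();

private void sendAnswer(ObjectOutputStream out, RemoteAnswer answer) throws IOException {
    synchronized (outLock) {   // one writer at a time, so serialized bytes can't interleave
        out.writeObject(answer);
        out.flush();
    }
}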
Using Java 8.
I have a Logger class that calls an API whenever it needs to log something. I realized that if the API is somehow badly configured, or the API just does not answer, my log action takes an awfully long time.
Example of synchronous logging:
public void debug(String message) {
    MDC.clear();
    MDC.put(SOME_KEY, "SOME_VALUE");
    super.debug(message);
    MDC.clear();
}
I was able to pinpoint that the problem is here, because if I just comment everything out and stop logging or doing anything, everything runs as fast as it should:
public void debug(String message) {
    // MDC.clear();
    // MDC.put(SOME_KEY, "SOME_VALUE");
    // super.debug(message);
    // MDC.clear();
}
So I thought to make this an asynchronous call, since I don't need the logging to happen synchronously:
public void debug(String message) {
    CompletableFuture.runAsync(() -> {
        MDC.clear();
        MDC.put(SOME_KEY, "SOME_VALUE");
        super.debug(message);
        MDC.clear();
    });
}
But this asynchronous call is just as bad for my main application's performance as the synchronous call. What am I missing?
Your problem is that you don't provide an executor. This can cause Java to give you fewer threads than you have waiting debug calls, which means that you still get some blocking. On my Intel Core i7-4790 with 4 cores and hyper-threading, on Java 8, I seem to get 7 threads running at the same time (the number of logical CPUs minus 1 for the main thread). You can fix this by supplying an effectively unbounded number of threads via a cached thread pool:
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Test {
    public static void main(String[] args) throws InterruptedException {
        Executor ex = Executors.newCachedThreadPool();
        for (int i = 0; i < 100; i++) {
            CompletableFuture.runAsync(() -> {
                try {
                    TimeUnit.SECONDS.sleep(1);
                } catch (InterruptedException e) {
                    throw new IllegalStateException(e);
                }
                System.out.println("completed");
            }, ex);
        }
        TimeUnit.SECONDS.sleep(2);
    }
}
See the example above, which prints "completed" 100 times. If you remove the ex parameter, it will print far fewer.
However, the underlying cause of the slow debug calls may still need to be fixed: if logging stays slow for a long time, the queued tasks may fill your memory.
See also the CompletableFuture documentation (https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CompletableFuture.html):
All async methods without an explicit Executor argument are performed using the ForkJoinPool.commonPool() (unless it does not support a parallelism level of at least two, in which case, a new Thread is created to run each task). [...]
I am new to RxJava and was trying an example of parallel execution of multiple Observables from this link:
RxJava Fetching Observables In Parallel
The example provided in the link does execute the Observables in parallel, but when I added a Thread.sleep(TIME_IN_MILLISECONDS) in the forEach method, the system started executing one Observable at a time. Please help me understand why Thread.sleep stops the parallel execution of the Observables.
Below is the modified example, which causes serial execution of the Observables:
import rx.Observable;
import rx.Subscriber;
import rx.schedulers.Schedulers;

public class ParallelExecution {

    public static void main(String[] args) {
        System.out.println("------------ mergingAsync");
        mergingAsync();
    }

    private static void mergingAsync() {
        Observable.merge(getDataAsync(1), getDataAsync(2)).toBlocking()
                .forEach(x -> {
                    try {
                        Thread.sleep(4000);
                    } catch (Exception ex) {
                    }
                    System.out.println(x + " " + Thread.currentThread().getId());
                });
    }

    // artificial representations of IO work
    static Observable<Integer> getDataAsync(int i) {
        return getDataSync(i).subscribeOn(Schedulers.io());
    }

    static Observable<Integer> getDataSync(int i) {
        return Observable.create((Subscriber<? super Integer> s) -> {
            // simulate latency
            try {
                Thread.sleep(1000);
            } catch (Exception e) {
                e.printStackTrace();
            }
            s.onNext(i);
            s.onCompleted();
        });
    }
}
In the above example we are using the subscribeOn method of Observable and providing a thread pool (Schedulers.io) for execution, so the subscription for each Observable should happen on a separate thread.
Is it possible that Thread.sleep is locking some object shared between the threads? I am still not clear on it. Please help.
Actually, with your example parallel execution is indeed happening; you are just looking at it in the wrong place. There is a difference between where the work is executed and where the notifications are emitted.
If you put a log statement with the thread id inside Observable.create, you will notice that each Observable is executed on a different thread simultaneously, but the notifications happen serially. This behavior is expected: part of the Observable contract is that observables must issue notifications to observers serially (not in parallel).
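For example, a quick way to see this (a sketch against the RxJava 1.x API used in the question) is to log the thread name inside Observable.create:
static Observable<Integer> getDataSync(int i) {
    return Observable.create((Subscriber<? super Integer> s) -> {
        // prints a different io-scheduler thread for each merged Observable
        System.out.println("work " + i + " on " + Thread.currentThread().getName());
        try {
            Thread.sleep(1000); // simulate latency
        } catch (Exception e) {
            e.printStackTrace();
        }
        s.onNext(i);
        s.onCompleted();
    });
}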
I'm toying with Java 8's streams and CompletableFutures. My pre-existing code has a class that takes a single URL and downloads it:
public class FileDownloader implements Runnable {
    private URL target;

    public FileDownloader(String target) throws MalformedURLException {
        this.target = new URL(target);
    }

    public void run() { /* do it */ }
}
Now, this class gets its information from another part that emits List<String> (a number of targets on a single host).
I've switched the surrounding code to CompletableFuture:
public class Downloader {

    public static void main(String[] args) {
        List<String> hosts = fetchTargetHosts();
        for (String host : hosts) {
            HostDownloader worker = new HostDownloader(host);
            CompletableFuture<List<String>> future =
                    CompletableFuture.supplyAsync(worker);
            future.thenAcceptAsync((files) -> {
                for (String target : files) {
                    new FileDownloader(target).run();
                }
            });
        }
    }

    public static class HostDownloader implements Supplier<List<String>> {
        /* not shown */
    }

    /* My implementation should either be Runnable or Consumer.
       Please suggest based on an idiomatic approach to the main loop. */
    public static class FileDownloader implements Runnable, Consumer<String> {
        private String target;

        public FileDownloader(String target) {
            this.target = target;
        }

        @Override
        public void run() { accept(this.target); }

        @Override
        public void accept(String target) {
            try (Writer output = new FileWriter("/tmp/blubb")) {
                output.write(new URL(target).getContent().toString());
            } catch (IOException e) { /* just for demo */ }
        }
    }
}
Now, this doesn't feel natural. I'm producing a stream of Strings, and my FileDownloader consumes one of them at a time. Is there a ready-made way to let my single-value Consumer work with Lists, or am I stuck with the for loop here?
I know it's trivial to move the loop into accept and just make a Consumer<List<String>>; that's not the point.
There is no point in dissolving two directly dependent steps into two asynchronous steps. They are still dependent and if the separation has any effect, it won’t be a positive one.
You can simply use
List<String> hosts = fetchTargetHosts();
FileDownloader fileDownloader = new FileDownloader();
for (String host : hosts)
    CompletableFuture.runAsync(() ->
            new HostDownloader(host).get().forEach(fileDownloader));
or, assuming that FileDownloader does not have mutable state regarding a download:
for (String host : hosts)
    CompletableFuture.runAsync(() ->
            new HostDownloader(host).get().parallelStream().forEach(fileDownloader));
This still has the same level of concurrency as your original approach using supplyAsync plus thenAcceptAsync, simply because these two dependent steps can’t run concurrently anyway, so the simple solution is to put both steps into one concise operation that will be executed asynchronously.
However, at this point it's worth noting that using CompletableFuture at all is not recommended for this operation. As its documentation states:
All async methods without an explicit Executor argument are performed using the ForkJoinPool.commonPool()
The problem with the common pool is that its pre-configured concurrency level depends on the number of CPU cores and won’t be adjusted if threads are blocked during an I/O operation. In other words, it is unsuitable for I/O operations.
Unlike Stream, CompletableFuture allows you to specify an Executor for the async operations, so you can configure your own Executor to be suitable for I/O operations. On the other hand, when you deal with an Executor anyway, there is no need for CompletableFuture at all, at least not for such a simple task:
List<String> hosts = fetchTargetHosts();
int concurrentHosts = 10;
int concurrentConnections = 100;
ExecutorService hostEs = Executors.newWorkStealingPool(concurrentHosts);
ExecutorService connEs = Executors.newWorkStealingPool(concurrentConnections);
FileDownloader fileDownloader = new FileDownloader();
for (String host : hosts) hostEs.execute(() -> {
    for (String target : new HostDownloader(host).get())
        connEs.execute(() -> fileDownloader.accept(target));
});
At this point you may consider either inlining the code of FileDownloader.accept into the lambda expression, or reverting it to a Runnable so that you can change the inner loop's statement to connEs.execute(new FileDownloader(target)).
An alternative would be:
CompletableFuture.supplyAsync(worker)
.thenApply(list -> list.stream().map(FileDownloader::new))
.thenAccept(s -> s.forEach(FileDownloader::run));
I think you need to do the forEach like this:
for (String host : hosts) {
    HostDownloader worker = new HostDownloader(host);
    CompletableFuture<List<String>> future =
            CompletableFuture.supplyAsync(worker);
    future.thenAcceptAsync(files ->
            files.stream()
                 .forEach(target -> new FileDownloader(target).run())
    );
}
By the way, you could do the same with the main loop, as sketched below...
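Something like this (just a sketch; hosts and the downloader classes are as in the question):
hosts.forEach(host ->
        CompletableFuture.supplyAsync(new HostDownloader(host))
                .thenAcceptAsync(files ->
                        files.forEach(target -> new FileDownloader(target).run()))
);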
Edit:
Since the OP edited the original post, adding implementation details of FileDownloader, I am editing my answer accordingly. Java 8 functional interfaces are meant to allow the use of a lambda expression in place of a concrete class. They are not meant to be implemented like a regular interface (although they can be). Therefore, "taking advantage of" the Java 8 Consumer means replacing FileDownloader with the code of accept, like this:
for (String host : hosts) {
    HostDownloader worker = new HostDownloader(host);
    CompletableFuture<List<String>> future = CompletableFuture.supplyAsync(worker);
    future.thenAcceptAsync(files ->
            files.forEach(target -> {
                try (Writer output = new FileWriter("/tmp/blubb")) {
                    output.write(new URL(target).getContent().toString());
                } catch (IOException e) { /* just for demo */ }
            })
    );
}
I have a service which processes a request from a user, and this service calls another, external back-end system (web services). I need to execute those back-end web services in parallel. How would you do that? What is the best approach?
Thanks in advance.
-----edit
The back-end system can run requests in parallel; we use containers (Tomcat for development, WebSphere for production).
So I'm already in one thread (the servlet's) and need to spawn two tasks and run them in parallel, as close together as possible.
I can imagine using Quartz, or threads with executors, or leaving it to the servlet engine. What is the proper path to take in such a scenario?
You can use threads to run the requests in parallel.
Depending on what you want to do, it may make sense to build on some existing technology like servlets, which do the threading for you.
The answer is to run the tasks in separate threads.
For something like this, I think you should be using a ThreadPoolExecutor with a bounded pool size rather than creating threads yourself.
The code would look something like this. (Please note that this is only a sketch. Check the javadocs for details, info on what the numbers mean, etc.)
// Create the executor ... this needs to be shared by the servlet threads.
Executor exec = new ThreadPoolExecutor(1, 10, 120, TimeUnit.SECONDS,
        new ArrayBlockingQueue<Runnable>(100),
        new ThreadPoolExecutor.CallerRunsPolicy());

// Prepare first task
final ArgType someArg = ...
FutureTask<ResultType> task = new FutureTask<ResultType>(
        new Callable<ResultType>() {
            public ResultType call() {
                // Call remote service using information in 'someArg'
                return someResult;
            }
        });
exec.execute(task);

// Repeat above for second task
...
exec.execute(task2);

// Wait for results
ResultType res = task.get(30, TimeUnit.SECONDS);
ResultType res2 = task2.get(30, TimeUnit.SECONDS);
The above does not attempt to handle exceptions, and you need to do something more sophisticated with the timeouts; e.g. keeping track of the overall request time and cancelling tasks if we run over time.
This is not a problem that Quartz is designed to solve. Quartz is a job scheduling system, whereas you just have some tasks that need to be executed immediately, possibly with the facility to cancel them.
Heiko is right that you can use threads. Threads are complex beasts and need to be treated with care. The best solution is to use a standard library, such as java.util.concurrent, which will be a more robust way of managing parallel operations. There are performance benefits that come with this approach, such as thread pooling. If you can use such a solution, this is the recommended way; a minimal sketch follows below.
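Here is a minimal sketch of that recommendation using an ExecutorService; callServiceA and callServiceB are hypothetical stand-ins for the two back-end calls:
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ParallelCalls {

    // Hypothetical stand-ins for the two back-end web service calls
    static String callServiceA() { return "result A"; }
    static String callServiceB() { return "result B"; }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Callable<String>> calls = Arrays.asList(
                ParallelCalls::callServiceA,
                ParallelCalls::callServiceB);
        // invokeAll blocks until both tasks finish or the timeout expires;
        // tasks still running at the timeout are cancelled.
        List<Future<String>> results = pool.invokeAll(calls, 30, TimeUnit.SECONDS);
        for (Future<String> f : results) {
            System.out.println(f.get());
        }
        pool.shutdown();
    }
}
In a servlet container the pool would of course be shared rather than created per request.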
If you want to do it yourself, here is a very simple way of executing a number of threads in parallel, but it is probably not very robust. You'll need to cope better with timeouts, destruction of threads, etc.
public class Threads {

    public class Task implements Runnable {
        private Object result;
        private String id;

        public Task(String id) {
            this.id = id;
        }

        public Object getResult() {
            return result;
        }

        public void run() {
            System.out.println("run id=" + id);
            try {
                // call web service
                Thread.sleep(10000);
                result = id + " more";
            } catch (InterruptedException e) {
                // TODO do something with the error
                throw new RuntimeException("caught InterruptedException", e);
            }
        }
    }

    public void runInParallel(Runnable runnable1, Runnable runnable2) {
        try {
            Thread t1 = new Thread(runnable1);
            Thread t2 = new Thread(runnable2);
            t1.start();
            t2.start();
            t1.join(30000);
            t2.join(30000);
        } catch (InterruptedException e) {
            // TODO do something nice with exception
            throw new RuntimeException("caught InterruptedException", e);
        }
    }

    public void foo() {
        Task task1 = new Task("1");
        Task task2 = new Task("2");
        runInParallel(task1, task2);
        System.out.println("task1 = " + task1.getResult());
        System.out.println("task2 = " + task2.getResult());
    }
}