Does Vert.x have real concurrency for single verticles? - java

The question might look like a troll, but it is actually about how Vert.x manages concurrency, since a verticle itself runs in a dedicated thread.
Let's look at this simple Vert.x HTTP server written in Java:
import org.vertx.java.core.Handler;
import org.vertx.java.core.http.HttpServerRequest;
import org.vertx.java.platform.Verticle;

public class Server extends Verticle {
    public void start() {
        vertx.createHttpServer().requestHandler(new Handler<HttpServerRequest>() {
            public void handle(HttpServerRequest req) {
                req.response().end("Hello");
            }
        }).listen(8080);
    }
}
As far as I understand the docs, this whole file represents a verticle, so the start method is called on the dedicated verticle thread; so far so good. But where is the requestHandler invoked? If it is invoked on exactly this thread, I can't see how it is better than Node.js.
I'm pretty familiar with Netty, the network/concurrency library Vert.x is based on. Every incoming connection is mapped to a dedicated thread, which scales quite nicely. So... does this mean that incoming connections are represented as verticles as well? But how can the verticle instance "Server" then communicate with those clients? In fact, I would say that this concept is as limited as Node.js.
Please help me to understand the concepts right!
Regards,
Chris

I've talked to someone who is quite involved in Vert.x and he told me that I'm basically right about the "concurrency" issue.
BUT: He showed me a section in the docs, which I had totally missed, where "Scaling servers" is explained in detail.
The basic concept is that when you write a verticle you only get single-core performance. But it is possible to start the Vert.x platform with the -instances parameter, which defines how many instances of a given verticle are run. Vert.x does a bit of magic under the hood so that 10 instances of my server do not try to open 10 server sockets but actually share a single one instead. This way Vert.x is horizontally scalable even for single verticles.
This is really a great concept and especially a great framework!!
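For reference, the same scaling idea is exposed programmatically in later Vert.x versions. A minimal sketch, assuming the Vert.x 4 API (DeploymentOptions.setInstances); this is not the original poster's code:

import io.vertx.core.AbstractVerticle;
import io.vertx.core.DeploymentOptions;
import io.vertx.core.Vertx;

// Deploy several instances of the same verticle; Vert.x distributes incoming
// connections across them even though they all call listen(8080).
public class ScalingExample {
    public static class Server extends AbstractVerticle {
        @Override
        public void start() {
            vertx.createHttpServer()
                 .requestHandler(req -> req.response().end("Hello"))
                 .listen(8080);
        }
    }

    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();
        int cores = Runtime.getRuntime().availableProcessors();
        vertx.deployVerticle(Server.class.getName(),
                new DeploymentOptions().setInstances(cores));
    }
}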

Every verticle is single-threaded; upon startup the Vert.x subsystem assigns an event loop to that verticle, and all code in that verticle is executed on that event loop. Next time you should ask questions in http://groups.google.com/forum/#!forum/vertx; the group is very lively and your question will most likely be answered immediately.

As you correctly answered yourself, Vert.x indeed uses asynchronous, non-blocking programming (like Node.js), so you can't do blocking operations, because you would otherwise stop the whole (application) world from turning.
You can scale servers, as you correctly stated, by spawning more verticle instances (n = CPU cores), each listening on the same TCP/HTTP port.
Where it shines compared to Node.js is that the JVM itself is multi-threaded, which gives you more advantages (from the runtime point of view, not even counting the type safety of Java, etc.):
Multithreaded (cross-verticle) communication, while still constrained to a thread-safe, Actor-like model, does not require IPC (Inter-Process Communication) to pass messages between verticles: everything happens inside the same process and the same memory region. This is faster than Node.js spawning every forked task in a new system process and using IPC to communicate.
The ability to run compute-heavy and/or blocking tasks within the same JVM process (a sketch follows below): http://vertx.io/docs/vertx-core/java/#blocking_code or http://vertx.io/docs/vertx-core/java/#worker_verticles
The speed of the HotSpot JVM compared to V8 :)
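A minimal sketch of the blocking-code escape hatch linked above, assuming the Vert.x 4 executeBlocking API; loadFromLegacyDatabase() is an invented placeholder, not a Vert.x method:

import io.vertx.core.AbstractVerticle;

// Runs a blocking call on a worker thread, then handles the result back on
// the verticle's event loop.
public class BlockingWorkVerticle extends AbstractVerticle {
    @Override
    public void start() {
        vertx.<String>executeBlocking(promise -> {
            // executed on a worker thread, so blocking here is allowed
            promise.complete(loadFromLegacyDatabase());
        }, res -> {
            // back on the event loop
            if (res.succeeded()) {
                System.out.println("got: " + res.result());
            } else {
                res.cause().printStackTrace();
            }
        });
    }

    private String loadFromLegacyDatabase() {
        try {
            Thread.sleep(1000); // stand-in for a blocking JDBC call or similar
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "some data";
    }
}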

Related

Reactive Programming vs Thread Based Programming

I am new to this concept and want to get a solid understanding of this topic.
To make my point clear, I want to use an analogy.
Let's take the scenario of Node.js, which is single-threaded and provides fast IO operations using an event loop. That makes sense, since it is single-threaded and is not blocked by any task.
While studying reactive programming in Java using Reactor, I came across a situation where the main thread is blocked when an object subscribes and some delay event takes place.
Then I came to know the concept of subscribeOn with boundedElastic and many more operators like this.
I understand that they are trying to make it asynchronous by moving those subscribers to other threads.
But if that is how it works, then why is it called asynchronous? Is it not thread-based programming?
If we are trying to achieve the async behaviour of Node.js, then in my view it should run on a single thread.
Summary of my question:
I don't get why reactive programming is called asynchronous or functional programming, for two reasons:
The main thread is blocked.
We can manage the threads and run the work in another pool; we can also define Runnable/Callable services.
First of all, you can't compare asynchronous with functional programming. It's like comparing a rock with a banana; they are two separate things.
Functional programming is compared to other types of programming, like object-oriented programming or procedural programming.
Reactor is a Java library, and Java is an object-oriented programming language with functional features.
Asynchrony I will explain with what Wikipedia says:
Asynchrony, in computer programming, refers to the occurrence of events independent of the main program flow and ways to deal with such events.
So basically, it is about how to handle stuff "around" your application that is not part of the main flow of your program.
In comparison to blocking, Wikipedia again:
A process that is blocked is one that is waiting for some event, such as a resource becoming available or the completion of an I/O operation.
A traditional servlet application works by assigning one thread per request.
So every time a request comes in, a thread is spawned, and this thread follows the request until the request returns. If there is something blocking during this request, for instance reading a file from the operating system or making a request to another service, the assigned thread will block and wait until the reading of the file is completed, or the request has returned, etc.
Reactive works with subscribers and producers and makes heavy use of the observer pattern. That means that as soon as something blocks, Reactor can take that thread and use it for something else, and when the work is un-blocked, any thread can pick up where it left off. This makes sure that every thread is always in use and utilized at 100%.
Everything processed in Reactor is done by the event loop. The event loop is a single-threaded loop that just processes events as quickly as possible. Schedulers schedule things to be processed on the event loop, and after they are processed a scheduler picks up the result and carries on.
If you just run Reactor you get a default scheduler that will schedule things for you completely automatically.
But let's say you have something blocking. Then you will stop the event loop, and everything needs to wait for that thing to finish.
When you run a fully reactive application you usually get one event loop per core during startup. So let's say you have 4 cores: you get 4 event loops, and if you block one, then during that period of blockage your application runs 25% slower.
25% slower is a lot!
Well, sometimes you have something blocking that you can't avoid, for instance an old database that doesn't have a non-blocking driver, or you need to read files from the operating system in a blocking manner. What do you do then?
Well, the Reactor team built in a fallback, so that if you use subscribeOn in combination with Reactor's bounded elastic thread pool, you get the old servlet behaviour back for that single subscriber, say for one specific endpoint.
This makes sure that you can run fully reactive code side by side with old legacy blocking things, so that maybe some requests use the old servlet behaviour, while other requests are fully non-blocking. A minimal sketch is shown below.
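A minimal sketch of that fallback, assuming Reactor 3's Mono and Schedulers.boundedElastic(); the blocking lookup is an invented stand-in:

import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

public class BlockingFallbackExample {

    // Stand-in for a legacy blocking call, e.g. an old JDBC driver or a file read.
    static String blockingLookup() {
        try {
            Thread.sleep(500); // pretend this is blocking I/O
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "result";
    }

    public static void main(String[] args) {
        Mono<String> mono = Mono.fromCallable(BlockingFallbackExample::blockingLookup)
                // move the blocking work off the event loop onto the bounded elastic pool
                .subscribeOn(Schedulers.boundedElastic());

        // block() is used here only to keep the demo alive; a real reactive
        // application would return the Mono to the framework instead.
        System.out.println(mono.block());
    }
}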
Your question is not very clear, so I am giving you a rather unclear answer. I suggest you read the Reactor documentation and try out all their examples, as most of this information comes from there.

Is Session.sendToTarget() thread-safe?

I am trying to integrate QFJ into a single-threaded application. At first I was trying to utilize QFJ with my own TCP layer, but I haven't been able to work that out. Now I am just trying to integrate an initiator. Based on my research into QFJ, I would think the overall design should be as follows:
The application will no longer be single-threaded, since the QFJ initiator will create threads, so some synchronization is needed.
Here I am using a SocketInitiator (I only handle a single FIX session), but I would expect a similar setup should I go for the threaded version later on.
There are 2 aspects to the integration of the initiator into my application:
Receiving side (fromApp callback): I believe this is straightforward; I simply push messages to a thread-safe queue consumed by my MainProcessThread.
Sending side: I'm struggling to find documentation on this front. How should I handle synchronization? Is it safe to call Session.sendToTarget() from the MainProcessThread? Or is there some synchronization I need to put in place?
As Michael already said, it is perfectly safe to call Session.sendToTarget() from multiple threads, even concurrently. But as far as I can see, you only utilize one thread anyway (the MainProcessThread).
The relevant part of the Session class is in method sendRaw():
private boolean sendRaw(Message message, int num) {
    // sequence number must be locked until application
    // callback returns since it may be effectively rolled
    // back if the callback fails.
    state.lockSenderMsgSeqNum();
    try {
        // ... some logic here
    } finally {
        state.unlockSenderMsgSeqNum();
    }
}
Other points:
Here I am using a SocketInitiator (I only handle a single FIX session), but I would expect a similar setup should I go for the threaded version later on.
Will you always use only one Session? If yes, then there is no use in utilizing the ThreadedSocketInitiator, since all it does is create a thread per Session.
The application will no longer be single threaded, since the QFJ initiator will create threads
As already stated in Use own TCP layer implementation with QuickFIX/J, you could try passing an ExecutorFactory. But this might not be applicable to your specific use case.
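For illustration, a minimal sketch of the handoff described in the question: QFJ's callback thread enqueues, and the MainProcessThread both consumes the queue and calls Session.sendToTarget() directly. The class and helper names are made up; only fromApp-style delivery and Session.sendToTarget() come from QFJ:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import quickfix.Message;
import quickfix.Session;
import quickfix.SessionID;
import quickfix.SessionNotFound;

public class FixBridge {

    private final BlockingQueue<Message> inbound = new LinkedBlockingQueue<>();

    // Called from the Application#fromApp callback on QFJ's thread.
    public void onFromApp(Message message) {
        inbound.offer(message);
    }

    // Runs on the MainProcessThread.
    public void processLoop(SessionID sessionID) throws InterruptedException, SessionNotFound {
        while (!Thread.currentThread().isInterrupted()) {
            Message incoming = inbound.take();      // blocks until a message arrives
            Message reply = buildReply(incoming);   // application-specific (invented helper)
            Session.sendToTarget(reply, sessionID); // thread-safe, fine from this thread
        }
    }

    private Message buildReply(Message incoming) {
        return new Message(); // placeholder for real application logic
    }
}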

Project loom: what makes the performance better when using virtual threads?

To give some context here, I have been following Project Loom for some time now. I have read The state of Loom. I have done asynchronous programming.
Asynchronous programming (provided by Java NIO) returns the thread to the thread pool when the task waits, and it goes to great lengths to not block threads. This gives a large performance gain: we can now handle many more requests, as they are not directly bound by the number of OS threads. But what we lose here is the context. The same task is now NOT associated with just one thread; all the context is lost once we dissociate tasks from threads. Exception traces do not provide very useful information and debugging is difficult.
In comes Project Loom with virtual threads that become the single unit of concurrency. And now you can perform a single task on a single virtual thread.
It's all fine until now, but the article goes on to state, with Project Loom:
A simple, synchronous web server will be able to handle many more requests without requiring more hardware.
I don't understand how we get performance benefits with Project Loom over asynchronous APIs. The asynchronous APIs make sure not to keep any thread idle. So, what does Project Loom do to make it more efficient and performant than asynchronous APIs?
EDIT
Let me re-phrase the question. Let's say we have an HTTP server that takes in requests and does some CRUD operations against a backing persistent database. Say this HTTP server handles a lot of requests - 100K RPM. Two ways of implementing this:
The HTTP server has a dedicated pool of threads. When a request comes in, a thread carries the task up until it reaches the DB, where the task has to wait for the response from the DB. At this point, the thread is returned to the thread pool and goes on to do other tasks. When the DB responds, the task is again handled by some thread from the thread pool and it returns an HTTP response.
The HTTP server just spawns virtual threads for every request. If there is IO, the virtual thread just waits for it to complete, and then returns the HTTP response. Basically, there is no pooling business going on for the virtual threads.
Given that the hardware and the throughput remain the same, would either solution fare better than the other in terms of response times or handling more throughput?
My guess is that there would not be any difference w.r.t performance.
We don't get a benefit over asynchronous APIs. What we potentially get is performance similar to asynchronous code, but with synchronous code.
The answer by @talex puts it crisply. Adding further to it:
Loom is more about a native concurrency abstraction, which additionally helps one write asynchronous code. Given it's a VM-level abstraction, rather than just a code-level one (like what we have been doing till now with CompletableFuture etc.), it lets one implement asynchronous behavior but with reduced boilerplate.
With Loom, a more powerful abstraction is the savior. We have seen repeatedly how abstraction with syntactic sugar lets one write programs effectively, whether it was FunctionalInterfaces in JDK 8 or for-comprehensions in Scala.
With Loom, there isn't a need to chain multiple CompletableFutures (to save on resources); one can write the code synchronously. With each blocking operation encountered (ReentrantLock, I/O, JDBC calls), the virtual thread gets parked. And because these are lightweight threads, the context switch is way cheaper, distinguishing them from kernel threads.
When blocked, the actual carrier thread (that was running the run-body of the virtual thread) gets engaged to execute some other virtual thread's run. So effectively, the carrier thread is not sitting idle but executing some other work, and it comes back to continue the execution of the original virtual thread whenever it is unparked. Just like how a thread pool would work. But here, you have a single carrier thread in a way executing the body of multiple virtual threads, switching from one to another when blocked.
We get the same behavior (and hence performance) as manually written asynchronous code, but without the boilerplate needed to do the same thing.
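A minimal sketch of the virtual-thread style, assuming the Java 21 API (Executors.newVirtualThreadPerTaskExecutor); the sleep stands in for a blocking DB or IO call:

import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Each submitted task gets its own virtual thread; a blocking call parks the
// virtual thread and frees the carrier thread for other work.
public class VirtualThreadSketch {
    public static void main(String[] args) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                int requestId = i;
                executor.submit(() -> {
                    // plain synchronous-looking code; the "blocking" sleep only
                    // parks this virtual thread, not an OS thread
                    try {
                        Thread.sleep(Duration.ofMillis(100)); // stand-in for a DB/IO call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    return "response for request " + requestId;
                });
            }
        } // close() waits for all submitted tasks to finish
    }
}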
Consider the case of a web framework, where there is a separate thread pool to handle I/O and another for the execution of HTTP requests. For simple HTTP requests, one might serve the request from the HTTP-pool thread itself. But if there are any blocking (or high-CPU) operations, we let this activity happen on a separate thread asynchronously.
This thread would collect the information from an incoming request, spawn a CompletableFuture, and chain it into a pipeline (read from the database as one stage, followed by computation on it, followed by another stage to write back to the database, make web service calls, etc.). Each one is a stage, and the resultant CompletableFuture is returned back to the web framework.
When the resultant future is complete, the web framework relays the result back to the client. This is how the Play Framework and others have been dealing with it, providing isolation between the HTTP thread-handling pool and the execution of each request. But if we dive deeper into this, why is it that we do this?
One core reason is to use resources effectively, particularly around blocking calls. Hence we chain with thenApply etc. so that no thread is blocked on any activity, and we do more with a smaller number of threads.
This works great, but it is quite verbose. And debugging is indeed painful; if one of the intermediary stages results in an exception, the control flow goes haywire, resulting in further code to handle it.
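For concreteness, a sketch of that chained style; the stage methods are invented placeholders, not a real framework API:

import java.util.concurrent.CompletableFuture;

// The verbose asynchronous style: each stage is chained so no thread blocks,
// at the cost of boilerplate and awkward error handling.
public class ChainedPipeline {

    static CompletableFuture<String> handleRequest(String request) {
        return CompletableFuture
                .supplyAsync(() -> readFromDatabase(request))       // stage 1: DB read
                .thenApply(ChainedPipeline::compute)                // stage 2: computation
                .thenApply(ChainedPipeline::writeBack)              // stage 3: DB write / web service call
                .exceptionally(ex -> "error: " + ex.getMessage());  // extra code just for errors
    }

    static String readFromDatabase(String request) { return "row-for-" + request; }
    static String compute(String row) { return row.toUpperCase(); }
    static String writeBack(String result) { return "stored " + result; }

    public static void main(String[] args) {
        System.out.println(handleRequest("req-1").join());
    }
}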
With Loom, we write synchronous code, and let someone else decide what to do when blocked, rather than sleep and do nothing.
The http server has a dedicated pool of threads ....
How big a pool? (Number of CPUs) * N + C? With N > 1 one can fall back to anti-scaling, as lock contention extends latency; whereas N = 1 can under-utilize the available bandwidth. There is a good analysis here.
The http server just spawns...
That would be a very naive implementation of this concept. A more realistic one would strive to draw from a dynamic pool that keeps one real thread for every blocked system call plus one for every real CPU. At least that is what the folks behind Go came up with.
The crux is to keep the {handlers, callbacks, completions, virtual threads, goroutines : all PEAs in a pod} from fighting over internal resources; thus they do not lean on system-based blocking mechanisms until absolutely necessary. This falls under the banner of lock avoidance, and might be accomplished with various queuing strategies (see libdispatch), etc. Note that this leaves the PEA divorced from the underlying system thread, because they are internally multiplexed between them. This is your concern about divorcing the concepts. In practice, you pass around your favourite language's abstraction of a context pointer.
As 1 indicates, there are tangible results that can be directly linked to this approach, and a few intangibles. Locking is easy -- you just make one big lock around your transactions and you are good to go. That doesn't scale; but fine-grained locking is hard: hard to get working, hard to choose the fineness of the grain. When to use {locks, CVs, semaphores, barriers, ...} is obvious in textbook examples, a little less so in deeply nested logic. Lock avoidance makes that, for the most part, go away, and become limited to contended leaf components like malloc().
I maintain some skepticism, as the research typically shows a poorly scaled system which is transformed into a lock-avoidance model, then shown to be better. I have yet to see one which unleashes some experienced developers to analyze the synchronization behavior of the system, transform it for scalability, then measure the result. But even if that were a win, experienced developers are a rare(ish) and expensive commodity; the heart of scalability is really financial.

Best practice to create new Verticles in Vert.x

Could anyone give me the best practice for creating new verticles in Vert.x? I know that each verticle can be deployed remotely and put into a cluster. However, I still have a question about how to design my application. Well, my questions are:
Is it okay to have a lot of verticles?
E.g. I create an HttpServer with a lot of endpoints for services. I would like to make different subroutes and set them up depending on enabled features (services). Some of them will initiate long-term processes and will use the event bus to generate new events in the system. What is the best approach here?
For example, I can pass vertx into each endpoint as an argument and use it to create a Router:
getVertx().createHttpServer()
    .requestHandler(router::accept)
    .listen(Config.GetEVotePort(), startedEvent -> {..});
...
router.mountSubRouter("/api", HttpEndpoint.createHttpRoutes(
    getVertx(), in.getType()));
Or I can create each new endpoint for a service as a verticle instead of passing Vertx around. My question is mostly about whether it is okay to pass vertx as an argument, or whether I should implement a new verticle instead.
My 10 cents:
Yes, the point is that there can be thousands of verticles, because as I understand it the name comes from the word "particle" and the whole idea is a kind of UNIX-philosophy bet on the JVM. So write each particle/verticle to do one thing and do it well, and use text streams to communicate between particles, because that's a universal interface.
Then the answer to your question is about how many servers you have. How many JVMs are you going to fire up per server? How much memory do you expect each JVM to use? How many verticles can you run per JVM within memory limits? How big are your message sizes? What's the network bandwidth limit? How many messages are going through your system? And can the event bus handle this traffic?
Then it's all about how verticles work together, which is basically the event bus. What I think you want is your HttpServer routing messages to the event bus, where different verticles are configured to listen to different "topics" (different text streams). If one verticle initiates a long-term process, it's triggered by an event on the bus, and then it puts the output back onto a topic for the next verticle / response verticle.
Again, that depends on how many servers / JVMs you have and whether you have a clustered event bus or not.
So one verticle ought to serve multiple endpoints, for example using the Router to match a given request from the HttpServer to a Route, which then selects a Handler, and that Handler lives in a given verticle.
It's best to have a lot of verticles. That way your application is loosely coupled and can be easily load balanced. For example, you may want 1-3 routing verticles, but a lot more worker verticles if your load is high. That way you can increase only the number of workers, without altering the number of routing verticles.
I wouldn't suggest passing vertx as an argument. Use the EventBus instead, as @rupweb already suggested. Pass messages from your routing verticles to workers and back. That's the best practice you're looking for (a sketch follows the link below):
http://vertx.io/docs/vertx-core/java/#event_bus
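A minimal sketch of that routing-verticle / worker-verticle handoff, assuming the Vert.x 4 event-bus API; the address "work.process" and the payload are made up for illustration:

import io.vertx.core.AbstractVerticle;

// Routing verticle: forwards work over the event bus instead of sharing the
// Vertx instance with every endpoint.
public class RoutingVerticle extends AbstractVerticle {
    @Override
    public void start() {
        vertx.eventBus().<String>request("work.process", "some payload")
             .onSuccess(reply -> System.out.println("worker answered: " + reply.body()))
             .onFailure(Throwable::printStackTrace);
    }
}

// Worker verticle: listens on the same address and replies when done.
class WorkerVerticle extends AbstractVerticle {
    @Override
    public void start() {
        vertx.eventBus().<String>consumer("work.process",
                msg -> msg.reply("processed: " + msg.body()));
    }
}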

Understanding NodeJS & Non-Blocking IO

So, I've recently been injected with the Node virus, which is spreading through the programming world very fast.
I am fascinated by its "Non-Blocking IO" approach and have indeed tried out a couple of programs myself.
However, I fail to understand certain concepts at the moment.
I need answers in layman's terms (I'm someone coming from a Java background).
1. Multithreading & Non-Blocking IO.
Let's consider a practical scenario. Say, we have a website where users can register. Below would be the code.
..
..
// Read HTTP Parameters
// Do some Database work
// Do some file work
// Return a confirmation message
..
..
In a traditional programming language, the above happens sequentially. And if there are multiple requests for registration, the web server creates a new thread for each and the rest is history. Of course, programmers can create threads of their own to work on lines 2 and 3 simultaneously.
In Node, as I understand it, lines 2 and 3 will be run in parallel while the rest of the program gets executed, and the interpreter polls lines 2 and 3 every 'x' ms.
Now, my question is: if Node is a single-threaded language, what does the work of lines 2 and 3 while the rest of the program is being executed?
2. Scalability
I recently read that LinkedIn has adopted Node as a back-end for their mobile apps and has seen massive improvements.
Can anyone explain how it has made such a difference?
3. Adapting in other programming languages
If people claim that Node makes a lot of difference when it comes to performance, why haven't other programming languages adopted this non-blocking IO paradigm?
I'm sure I'm missing something. If you can explain and guide me with some links, it would be helpful.
Thanks.
A similar question was asked and probably contains all the info you're looking for: How the single threaded non blocking IO model works in Node.js
But I'll briefly cover your 3 parts:
1.
Lines 2 and 3 in a very simple form could look like:
db.query(..., function(query_data) { ... });
fs.readFile('/path/to/file', function(file_data) { ... });
Now function(query_data) and function(file_data) are callbacks. The functions db.query and fs.readFile will send the actual I/O requests, but the callbacks allow the processing of the data from the database or the file to be delayed until the responses are received. It doesn't really "poll lines 2 and 3". The callbacks are added to an event loop and associated with file descriptors for their respective I/O events. The event loop then polls the file descriptors to see if they are ready to perform I/O; if they are, it executes the callback functions with the I/O data.
I think the phrase "Everything runs in parallel except your code" sums it up well. For example, something like "Read HTTP parameters" would execute sequentially, but I/O functions like those in lines 2 and 3 are associated with callbacks that are added to the event loop and execute later. So basically the whole point is that it doesn't have to wait for I/O.
2.
Because of the things explained in 1., Node scales well for I/O intensive requests and allows many users to be connected simultaneously. It is single threaded, so it doesn't necessarily scale well for CPU intensive tasks.
3.
This paradigm has been used with JavaScript because JavaScript has support for callbacks, event loops and closures that make this easy. This isn't necessarily true in other languages.
I might be a little off, but this is the gist of what's happening.
Q1. " what does the job of lines 2 & 3 while the rest of the program is being executed?"
Answer: "Nothing". Lines 2 and 3 each themselves start their respective jobs, but those jobs cannot be done immediately because (for example) the disk sectors required are not loaded in yet - so the operating system issues a call to the disk to go get those sectors, then "Nothing happens" (node goes on with it's next task) until the disk subsystem (later) issues an interrupt to report they're ready, at which point node returns control to lines #2 and #3.
Q2. single-thread non-blocking dedicates almost no resources to each incoming connection (just some housekeeping data about the connected socket). It's very memory efficient. Traditional web servers "fork" a whole new process to handle each new connection - that means making a humongous copy of every bit of code and data variables needed, and time-slicing the CPU to deal with it all. That's massively wasteful of resources. Thus - if your load is a lot of idle connections waiting for stuff, as was theirs, node makes loads more sense.
Q3. almost every programming language does already have non-blocking I/O if you want to use it. Node is not a programming language, it's a web server that runs javascript and uses non-blocking I/O (eg: I personally wrote my own identical thing 10 years ago in perl, as did google (in C) when they started, and I'm sure loads of other people have similar web servers too). The non-blocking I/O is not the hard part - getting the programmer to understand how to use it is the tricky bit. Javascript happens to work well for that, because those programmers are already familiar with event programming.
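To make "almost every language already has it" concrete, here is a minimal sketch of readiness-based non-blocking I/O in plain Java NIO (an illustrative echo server, not production code):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Single-threaded, non-blocking echo server using java.nio - the same
// readiness-based model Node's event loop is built on.
public class NioEchoServer {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select();                       // the "event loop": wait for ready channels
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(256);
                    if (client.read(buf) == -1) {
                        client.close();
                    } else {
                        buf.flip();
                        client.write(buf);           // echo back without blocking the loop
                    }
                }
            }
            selector.selectedKeys().clear();
        }
    }
}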
Even though node.js has been around for a few years, its performance model is still a bit mysterious.
I recently started a blog and decided that the node.js model would be a good first topic since I wanted to understand it better myself and it would be helpful to others to share what I learned. Here are a couple of articles I wrote that explain the high level concepts and some tradeoffs:
Blocking vs. Non-Blocking I/O – What’s going on?
Understanding node.js Performance
