I am using Spring Boot mail and ActiveMQ to build an email system. I followed this example project. Because our application's QPS is small, one server is enough to handle the requests. In the example project, ActiveMQ, the sender, and the receiver are all on the same server. Is this good practice for a small application, or should I put ActiveMQ, the sender, and the receiver on three separate machines?
It depends...
The size of the application is irrelevant. It depends more on your requirements for availability, scalability and data safety.
If you have everything on the same machine, you have a single point of failure: if the machine crashes, you lose everything on it. But this setup is the simplest one (also for maintenance), and the chance that the server will crash is low. Modern machines are able to handle a big load.
If you have a really high load and/or a requirement for guaranteed delivery, you should use multiple systems: producers that send messages to an ActiveMQ cluster (also distributed over multiple machines), and consumers that also run on more than one machine. Also use load balancers to connect to the machines.
You can also have a setup somewhere between these two examples (simple and complex).
If you are able to reproduce all the messages (email messages in your example) and the load is not that high, I would advise keeping it simple and putting everything on the same machine.
The short answer is: it depends. The long answer is: measure it. Using "small application" as the criterion is flawed. You can have both on the same server if the server has all the resources required by your application and the message broker, without impacting performance for end users.
I would suggest running performance tests against your criteria, then deciding on your target environment setup.
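As a starting point for "measure it", a crude timing loop run on the candidate server can show how many sends per second the box sustains while the rest of the application is busy. This is a minimal sketch under assumptions: `ThroughputCheck` is a hypothetical helper, and the `sendOnce` callback stands in for whatever actually enqueues an email (for example a Spring `JmsTemplate` send). Wall-clock loops like this are easily skewed by JIT warm-up and GC, so for numbers you intend to rely on, a harness such as JMH or a load test against the real endpoint is more trustworthy.

```java
import java.util.concurrent.TimeUnit;

// Crude throughput probe: times `count` calls of a send operation after a
// short warm-up and reports calls per second.
final class ThroughputCheck {

    private ThroughputCheck() {
        // static helper, no instances
    }

    static double measure(Runnable sendOnce, int warmup, int count) {
        if (count <= 0) {
            throw new IllegalArgumentException("count must be positive");
        }
        // Warm-up calls give the JIT a chance to compile the hot path.
        for (int i = 0; i < warmup; i++) {
            sendOnce.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            sendOnce.run();
        }
        // Guard against a zero duration on very fast no-op callbacks.
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        return count * (double) TimeUnit.SECONDS.toNanos(1) / elapsedNanos;
    }
}
```

For example, `ThroughputCheck.measure(() -> sender.send(testMail), 1_000, 10_000)`, with `sender` and `testMail` being your own (hypothetical) objects, returns the observed sends per second.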
The simplest setup is everything on the same box. If this one box has enough CPU and disk space, why not? One (performance) advantage is that nothing needs to go over the network.
If you are concerned about fault-tolerance, replicate that whole setup on a second machine.
I am working as a developer on a batch processing solution. It works like this: we split a big file and process it across JVMs. We have 4 processor JVMs, each of which takes a chunk of the file and processes it, and 1 gateway JVM. The job of the gateway JVM is to split the file into as many chunks as there are processor JVMs (i.e. 4) and send a REST request that is consumed by the processor JVMs. The REST request has all the details: the file location it has to pick the file from and some other details.
Now, if I want to add another processor JVM without any downtime, is there any way to do it? Currently we maintain the URLs of the 4 JVMs in a property file. Is there a better way that would let me add more JVMs without restarting any component?
You can consider setting up a load balancer and putting your JVM(s) behind it. The load balancer would be responsible for distributing the incoming requests to the JVMs.
This way you can scale your JVMs up or down depending on the workload. Also, if one of the JVMs stops working, the rest of your system does not need to care about it.
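The load balancer can be a dedicated component (nginx, HAProxy, a cloud load balancer), but the same idea also fits inside the gateway: keep the processor URLs in a list that can change at runtime and rotate through it. The sketch below is hypothetical (`RoundRobinDispatcher` and its methods are illustrative names, not an existing API). Something outside it, such as an admin endpoint or a periodic reload of the property file, would call `addEndpoint`/`removeEndpoint`, so no component needs a restart.

```java
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Round-robin over processor URLs that can be added or removed while the
// gateway keeps running (what a load balancer would otherwise do for you).
final class RoundRobinDispatcher {

    // Copy-on-write: reads are lock-free, and membership changes are rare.
    private final CopyOnWriteArrayList<String> endpoints = new CopyOnWriteArrayList<>();
    private final AtomicInteger counter = new AtomicInteger();

    void addEndpoint(String url) {
        endpoints.addIfAbsent(url);
    }

    void removeEndpoint(String url) {
        endpoints.remove(url);
    }

    /** Returns the next processor URL in rotation. */
    String next() {
        // Work on one snapshot so the size and the lookup agree even if the
        // list changes concurrently.
        String[] snapshot = endpoints.toArray(new String[0]);
        if (snapshot.length == 0) {
            throw new IllegalStateException("no processor JVMs registered");
        }
        // floorMod keeps the index non-negative once the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), snapshot.length);
        return snapshot[index];
    }
}
```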
I'm not sure what your use case is or which tech stack you are using, but it seems that you need a distributed system with auto-scaling and dynamic provisioning capabilities. Have you considered Hadoop or Spark clusters, or Akka?
If you cannot use any of these, then the solution is to maintain the list of JVMs in some datastore (let's say in a table). It's dynamic data, meaning one can add, remove, or update JVMs. Then you need a resource manager that can decide whether to spin up a new JVM based on load or any other conditional logic; this resource manager needs to monitor the entire system. Also, whenever you create a task, chunk, or data slice, distribute it using a message queue such as Apache ActiveMQ. You can also consider Kafka for complex use cases. Nowadays, application servers such as WebSphere (Liberty profile) and WebLogic also provide auto-scaling capabilities, so if you are already using one of them, you can think of making use of that. I hope this helps.
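The message-queue suggestion changes the model from push to pull: the gateway publishes chunk descriptions and each processor takes the next one when it is free, so a new processor JVM only has to connect to the queue and nobody needs its URL. In the minimal sketch below, an in-memory `BlockingQueue` stands in for a broker queue (ActiveMQ, a Kafka topic, ...), and `ChunkQueueSketch`/`Chunk` are hypothetical names. Splitting on raw byte offsets is also an assumption; a real splitter would usually align chunk boundaries to record boundaries such as line endings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Pull-based distribution: the gateway enqueues chunk descriptions, and each
// processor takes the next chunk when it is free.
final class ChunkQueueSketch {

    /** Byte range of the input file that one processor should handle. */
    record Chunk(String fileLocation, long offset, long length) {}

    private ChunkQueueSketch() {
    }

    /** Gateway side: cut a file of fileSize bytes into ranges of at most chunkSize bytes. */
    static List<Chunk> split(String fileLocation, long fileSize, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Chunk> chunks = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += chunkSize) {
            chunks.add(new Chunk(fileLocation, offset, Math.min(chunkSize, fileSize - offset)));
        }
        return chunks;
    }

    /** Gateway side: publish the chunks; put() blocks if a bounded queue is full. */
    static void publish(BlockingQueue<Chunk> queue, List<Chunk> chunks) throws InterruptedException {
        for (Chunk chunk : chunks) {
            queue.put(chunk);
        }
    }

    /** Processor side: take chunks until the queue is empty; returns how many were handled. */
    static int drain(BlockingQueue<Chunk> queue, Consumer<Chunk> handler) {
        int handled = 0;
        Chunk chunk;
        while ((chunk = queue.poll()) != null) {
            handler.accept(chunk);
            handled++;
        }
        return handled;
    }
}
```

Because work is pulled, producing more chunks than there are processors also smooths out uneven processing times: a fast processor simply takes more chunks.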
I have a scenario with these particular demands:
Production ready & stable.
Point-to-point connection, with the producer behind a firewall and the consumer in the cloud. It might be possible to split the traffic between a couple of producers/consumers, but all the traffic still has to traverse a single WAN connection, which will probably be the bottleneck.
High throughput: something on the order of 300 Mb/s (maybe up to 1 Gb/s!). Message sizes vary from ~1 KB to possibly several MB.
Guaranteed delivery is a must: every message has to arrive at the consumer eventually, so we need to start saving messages to disk in the event of a momentary network outage, or risk running out of memory.
Message order is not important, messages are timestamped and can be re-arranged at the consumer.
Highly preferable but not as important: should run on both Linux and Windows (the JVM seems the obvious choice).
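Since ordering is explicitly not required in transit, the consumer can restore it with a small reorder buffer: keep received messages in a min-heap keyed on their timestamp and release everything older than a watermark, for example "now minus the longest delay you expect". This is only a sketch with hypothetical names (`Timestamped`, `ReorderBuffer`); the watermark is the knob that trades added latency against messages arriving too late to be placed in order, which would need separate handling.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** A received message together with its producer-side timestamp. */
record Timestamped<P>(long timestampMillis, P payload) {}

// Consumer-side reorder buffer: a min-heap on the timestamp, drained up to a
// watermark. Not thread-safe; use it from a single consumer thread.
final class ReorderBuffer<P> {

    private final PriorityQueue<Timestamped<P>> heap = new PriorityQueue<>(
            Comparator.comparingLong((Timestamped<P> m) -> m.timestampMillis()));

    void add(Timestamped<P> message) {
        heap.add(message);
    }

    /** Removes and returns, oldest first, every message stamped at or before the watermark. */
    List<Timestamped<P>> releaseUpTo(long watermarkMillis) {
        List<Timestamped<P>> ready = new ArrayList<>();
        while (!heap.isEmpty() && heap.peek().timestampMillis() <= watermarkMillis) {
            ready.add(heap.poll());
        }
        return ready;
    }

    int buffered() {
        return heap.size();
    }
}
```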
I've been looking at so many MQs lately, and I don't have any hands-on experience with any.
I thought it would be a better idea to ask someone with experience.
We're mostly considering Kafka, but I'm not sure it's the best fit for our use case; it seems to be tailored to distributed deployments with multiple topics/consumers/producers. Also, it's definitely not production-ready on Windows.
What about Apache ActiveMQ or Apollo/Artemis? RabbitMQ seems not to be a good fit for our performance requirements. Or maybe there's some Java library that has the features we need without a middleman broker?
Any help making sense of this kludge would be greatly appreciated.
If anyone comes across this: we went with Kafka in the end. Its performance is impressive, and so far it's very stable on Linux. We haven't yet attempted to run it on Windows in production deployments.
UPDATE 12/3/2017:
It works fine and is very stable on Linux, but on Windows it is not usable in production. Old data never gets deleted due to leaky file handles, and the relevant Jira issue has been ignored since 2013: https://issues.apache.org/jira/browse/KAFKA-1194
I have a Java application, and I need it to be highly available.
I was thinking of FastMPJ, i.e. running multiple instances on different PCs. Every minute, the app would check whether the master instance is running, and if not, another instance would take over.
I'd like to ask if it is a good solution, or if there is any better.
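The check-every-minute idea from the question can be sketched with plain JDK scheduling (no FastMPJ involved). `StandbyMonitor` is a hypothetical class; `masterAlive` stands for whatever health check you choose (an HTTP ping, a heartbeat row in a database), and `becomeMaster` for the code that starts the real work. Requiring a few consecutive failed checks avoids taking over on a single lost ping. One caveat: if the standby merely cannot reach the master (a network partition), both instances can end up active, which is why production setups usually rely on a lease or a coordination service for the takeover decision.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Standby side of a primary/standby pair: check the master periodically and
// take over after several consecutive failed checks.
final class StandbyMonitor {

    private final BooleanSupplier masterAlive;   // hypothetical health check
    private final Runnable becomeMaster;         // hypothetical hook that starts the real work
    private final int failuresBeforeTakeover;
    private final AtomicBoolean active = new AtomicBoolean(false);
    private int consecutiveFailures;             // updated only by the checking thread

    StandbyMonitor(BooleanSupplier masterAlive, Runnable becomeMaster, int failuresBeforeTakeover) {
        if (failuresBeforeTakeover < 1) {
            throw new IllegalArgumentException("failuresBeforeTakeover must be >= 1");
        }
        this.masterAlive = masterAlive;
        this.becomeMaster = becomeMaster;
        this.failuresBeforeTakeover = failuresBeforeTakeover;
    }

    /** Runs one check; separate from start() so it can be tested directly. */
    void checkOnce() {
        if (active.get()) {
            return; // already promoted, nothing left to watch
        }
        boolean alive;
        try {
            alive = masterAlive.getAsBoolean();
        } catch (RuntimeException e) {
            alive = false; // an unreachable master counts as a failed check
        }
        if (alive) {
            consecutiveFailures = 0;
        } else if (++consecutiveFailures >= failuresBeforeTakeover && active.compareAndSet(false, true)) {
            becomeMaster.run();
        }
    }

    boolean isActive() {
        return active.get();
    }

    /** Schedules checkOnce at a fixed rate; the caller shuts the scheduler down. */
    ScheduledExecutorService start(long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::checkOnce, period, period, unit);
        return scheduler;
    }
}
```

For the one-minute check from the question, `monitor.start(1, TimeUnit.MINUTES)` returns the scheduler so it can be shut down later.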
A more general solution is to use a load-balancing system: you have N instances of the application running with the same privileges (on different hardware, if possible), and a redundant load balancer in front selects one of them for each request/task based on the actual load.
The obvious benefit of this solution is that the hardware is actually used instead of sitting somewhere idle, waiting for the 0.01% case to jump in. Each instance is also tested all the time, so errors (like faulty hardware) are reported when they happen, and you avoid an "Oh... the backup isn't even working" moment. On top of that, you balance the load between machines adaptively.
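One common way to pick an instance "based on the actual load" is the least-outstanding-requests policy: send each task to the instance with the fewest tasks currently in flight. The sketch below assumes the balancer sees every task start and finish; `LeastLoadedBalancer` and `Instance` are hypothetical names, and making the balancer itself redundant (as recommended above) is outside its scope.

```java
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Least-outstanding-requests selection: each task goes to the instance that
// currently has the fewest tasks in flight.
final class LeastLoadedBalancer {

    static final class Instance {
        final String name;
        final AtomicInteger inFlight = new AtomicInteger();

        Instance(String name) {
            this.name = name;
        }
    }

    private final List<Instance> instances;

    LeastLoadedBalancer(List<Instance> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("need at least one instance");
        }
        this.instances = List.copyOf(instances);
    }

    /**
     * Picks the least-busy instance and counts the task against it. Two threads
     * may pick the same instance at the same moment; for load balancing that
     * approximation is usually acceptable.
     */
    Instance acquire() {
        Instance best = instances.stream()
                .min(Comparator.comparingInt(i -> i.inFlight.get()))
                .orElseThrow();
        best.inFlight.incrementAndGet();
        return best;
    }

    /** Must be called when the task finishes, whether it succeeded or failed. */
    void release(Instance instance) {
        instance.inFlight.decrementAndGet();
    }
}
```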
In one of my projects, while implementing an exchange, we used Apache Qpid for high availability, and my experience was quite satisfactory. It scales very well too: I have been running the application on clusters of up to 32 nodes. Please find further details here, and let me know in case you need any further information:
http://qpid.apache.org/releases/qpid-0.18/books/AMQP-Messaging-Broker-Java-Book/html/High-Availability.html
Hope it helps:)
One often forgets that high availability is also needed between the application and the database. In my experience, the data access layer is where most application bottlenecks occur, so make sure you have a good application-aware DB load balancer. Oracle has a solid solution, but it is for Oracle databases only. PostgreSQL has an open-source option. Heimdall Data is a commercial solution.
My Java web application pulls some data from external systems (JSON over HTTP) both live whenever the users of my application request it and batch (nightly updates for cases where no user has requested it). The data changes so caching options are likely exhausted.
The external systems have some throttling in place, the exact parameters of which I don't know and which likely change depending on system load (e.g., at peak times 10 requests per second from one IP address, at off-peak times 100 requests per second from one IP address). If the requests are too frequent, they time out or return HTTP 503.
Right now I attempt each request 5 times with a 2000 ms delay between attempts, giving up if every attempt returns an error. This is not optimal: at peak times nearly all requests sometimes fail, whereas if I avoided making some of these requests, perhaps at least some would succeed.
My goals are a somewhat simple, reliable design, and enough flexibility so that I can both pull some metrics from the throttler to understand how well the external systems are responding (and thus adjust how often they are invoked), and auto-adjust the interval at which I call them (individually per system) so that it is optimal during both off-peak and peak hours.
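For the auto-adjusting interval, one common heuristic (borrowed from TCP congestion control's additive-increase/multiplicative-decrease) is to double the delay after a throttling response (HTTP 503 or a timeout) and shave a fixed step off it after each success. The sketch keeps one instance per external system; `AdaptiveInterval` and all the numbers passed to it are hypothetical and would need tuning. The current delay also serves as the metric asked for above: exporting it shows how hard each system is pushing back.

```java
import java.time.Duration;

// Per-system adaptive delay in the spirit of AIMD: double the delay on a
// throttling response, subtract a fixed step after each success, and keep it
// between a floor and a ceiling.
final class AdaptiveInterval {

    private final long minMillis;
    private final long maxMillis;
    private final long stepMillis;
    private long currentMillis;

    AdaptiveInterval(long minMillis, long maxMillis, long stepMillis, long initialMillis) {
        if (minMillis <= 0 || maxMillis < minMillis || stepMillis <= 0) {
            throw new IllegalArgumentException("need 0 < min <= max and step > 0");
        }
        this.minMillis = minMillis;
        this.maxMillis = maxMillis;
        this.stepMillis = stepMillis;
        this.currentMillis = clamp(initialMillis);
    }

    /** Call after a successful response: speed up a little. */
    synchronized void onSuccess() {
        currentMillis = clamp(currentMillis - stepMillis);
    }

    /** Call after HTTP 503 or a timeout: back off hard. */
    synchronized void onThrottled() {
        // Compare before multiplying so doubling can never overflow.
        currentMillis = currentMillis > maxMillis / 2 ? maxMillis : currentMillis * 2;
    }

    /** Delay to wait before the next call; also worth exporting as a metric. */
    synchronized Duration currentDelay() {
        return Duration.ofMillis(currentMillis);
    }

    private long clamp(long value) {
        return Math.max(minMillis, Math.min(maxMillis, value));
    }
}
```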
My infrastructure is Java with RabbitMQ over MongoDB over Linux.
I'm thinking of three main options:
Since I already use RabbitMQ for batch processing, I could just introduce a queue to which the web processes would send their requests for external systems; worker processes would then read from that queue, throttle themselves as needed, and return the results. This would allow running multiple parallel worker processes on more servers if needed. My main concern is that it isn't a very simple solution, and I'm not sure how to handle low peak-hour throughput leaving the web processes waiting for a long while. It also turns RabbitMQ into a critical single point of failure: if it dies, the whole system stops (as opposed to just the nightly batch processes not running any more, which is less critical). I suppose RPC is the correct RabbitMQ usage pattern, but I'm not sure. Edit: I've posted a related question, How to properly implement RabbitMQ RPC from Java servlet web container?, on how to implement this.
Introduce nginx (e.g. ngx_http_limit_req_module), HAProxy, or other proxy software into the mix (as reverse proxies?) and have them take care of the throttling through some configuration magic. The pro is that I don't have to make code changes. The con is that it is more technology, and one I've not used before, so the chances of misconfiguring something are quite high. It would also likely not be easy to do dynamic throttling depending on external server load, to prioritize live requests over batch requests, or to get statistics on how the throttling is doing. Also, most documentation and examples will likely be about throttling incoming requests, not outgoing ones.
Do a pure-Java solution (e.g., a leaky bucket implementation). It would be simple in the sense that it is "just code", but the devil is in the details; debugging all the deadlocks, starvation and race conditions isn't always fun.
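Option 3 need not involve much code. The sketch below is a leaky-bucket-style limiter that lets calls out at a fixed rate: each caller reserves the next free slot and is told how long to wait for it, and a caller whose wait would exceed `maxWait` is rejected so it can fail fast instead of queuing behind a backlog. `LeakyBucketLimiter` is a hypothetical class with an injectable clock for testing; `setRate` is where an adaptive controller could plug in. If a library is acceptable, Guava's `RateLimiter` (`create`, `acquire`, `setRate`) implements a tested version of the same idea.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Leaky-bucket-style limiter: calls leave at a fixed rate, one per interval.
// Each caller reserves the next free slot and learns how long to wait; if
// that wait exceeds maxWait, the call is rejected instead of queued.
final class LeakyBucketLimiter {

    private final LongSupplier clock;     // System::nanoTime in production, a fake in tests
    private final long maxWaitNanos;
    private long intervalNanos;
    private long nextFreeSlot;            // clock value at which the next call may go out

    LeakyBucketLimiter(double callsPerSecond, long maxWait, TimeUnit unit, LongSupplier clock) {
        this.clock = clock;
        this.maxWaitNanos = unit.toNanos(maxWait);
        setRate(callsPerSecond);
        this.nextFreeSlot = clock.getAsLong();
    }

    /** Changes the rate; affects reservations made from now on. */
    synchronized void setRate(double callsPerSecond) {
        if (!(callsPerSecond > 0)) {
            throw new IllegalArgumentException("rate must be positive");
        }
        intervalNanos = Math.max(1L, (long) (TimeUnit.SECONDS.toNanos(1) / callsPerSecond));
    }

    /** Returns nanoseconds to wait before calling, or -1 if the call should be rejected. */
    synchronized long reserve() {
        long now = clock.getAsLong();
        // Compare by subtraction, the overflow-safe idiom for nanoTime values.
        long slot = nextFreeSlot - now > 0 ? nextFreeSlot : now;
        long wait = slot - now;
        if (wait > maxWaitNanos) {
            return -1;
        }
        nextFreeSlot = slot + intervalNanos;
        return wait;
    }

    /** Blocking wrapper: sleeps until the reserved slot; false means rejected. */
    boolean acquire() throws InterruptedException {
        long wait = reserve();
        if (wait < 0) {
            return false;
        }
        TimeUnit.NANOSECONDS.sleep(wait);
        return true;
    }
}
```

With one limiter per external system, a live request that gets `false` from `acquire()` can report failure immediately rather than adding load to a system that is already throttling.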
What am I missing here?
Which is the best solution in this case?
P.S. Somewhat related question - what's the proper approach to log all the external system invocations, so that statistics are collected as to how often I invoke them, and what the success rate is?
E.g., after every invocation I'd invoke something like .logExternalSystemInvocation(externalSystemName, wasSuccessful, elapsedTimeMills), and then get some aggregate data out of it whenever needed.
Is there a standard library/tool to use, or do I have to roll my own?
If I use option 1 with RabbitMQ, is there a way to organize the flow so that I get this out of the box from the RabbitMQ console? I wouldn't want to send all failed messages to a poison queue, though: it would fill up too quickly, and in most cases there is no need to reprocess these failed requests, as the user has already sadly moved on.
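Libraries such as Dropwizard Metrics or Micrometer cover this kind of bookkeeping (counters, timers, reporters), but the core of the proposed `logExternalSystemInvocation` call is small enough to sketch. `InvocationStats` and `Snapshot` are hypothetical names; counters are kept per system in `LongAdder`s, which are cheap to update from many threads. Snapshot figures are approximate while calls are still being recorded, because the counters are read one after another.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

// In-memory aggregation behind logExternalSystemInvocation: per-system call,
// failure and elapsed-time counters, plus a snapshot for reporting.
final class InvocationStats {

    private static final class Counters {
        final LongAdder calls = new LongAdder();
        final LongAdder failures = new LongAdder();
        final LongAdder elapsedMillis = new LongAdder();
    }

    /** Point-in-time view; rates are NaN when there were no calls yet. */
    record Snapshot(long calls, long failures, double successRate, double avgMillis) {}

    private final ConcurrentMap<String, Counters> bySystem = new ConcurrentHashMap<>();

    void logExternalSystemInvocation(String externalSystemName, boolean wasSuccessful, long elapsedTimeMillis) {
        Counters c = bySystem.computeIfAbsent(externalSystemName, name -> new Counters());
        c.calls.increment();
        if (!wasSuccessful) {
            c.failures.increment();
        }
        c.elapsedMillis.add(elapsedTimeMillis);
    }

    Snapshot snapshot(String externalSystemName) {
        Counters c = bySystem.get(externalSystemName);
        if (c == null) {
            return new Snapshot(0, 0, Double.NaN, Double.NaN);
        }
        // Sums are read one after another, so figures are approximate while
        // invocations are still being logged.
        long failures = c.failures.sum();
        long calls = c.calls.sum();
        long elapsed = c.elapsedMillis.sum();
        if (calls == 0) {
            return new Snapshot(0, 0, Double.NaN, Double.NaN);
        }
        return new Snapshot(calls, failures, (calls - failures) / (double) calls, elapsed / (double) calls);
    }
}
```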
Perhaps this open source system can help you a little: http://code.google.com/p/valogato/
I need some advice regarding potential hosting solutions - there is an incredible amount of choice and confusing options out there.
Basically I have a Java application that contains an embedded ActiveMQ message broker. The job of this application is to:
1) Process messages (JMS) received on the broker from 10-15 sources
2) Publish Messages (JMS) to a different JMS broker on another server (in our office).
So, I am looking for something that will not cost the earth (this is only for testing purposes) but offers decent RAM and processing speed options, so that we can really test the limits of the application (we need to see where the bottleneck is: ActiveMQ or the processing app).
Also, outgoing bandwidth costs will need to be considered. Again, the volumes will be sporadic and sometimes significant (depending on the intensity of testing periods).
Any recommendations would be appreciated.
Thanks
Maybe Heroku is what you are looking for. There is a free monthly tier, and you can run any Java application. ActiveMQ is not available, but via add-ons you can use RabbitMQ, which also supports JMS. You can increase the performance of your application at any time; have a look at Heroku's pricing.