Looking for simple persistent message buffer in Java

I am looking for a simple persistent buffer as temporary storage for JSON messages in a Java application. Memory usage should be relatively constant and not depend on the number of messages in the buffer. It would be nice to be able to replay messages from a point in the past. Deletion of old messages should be efficient, and it needs to handle about 1M messages/hour.
Currently my application uses a local RabbitMQ broker which shovels messages to a remote RabbitMQ broker. When the remote broker is down or not accepting messages, the local RabbitMQ broker's memory usage rises with the queue length and eventually it stops accepting messages. I want to swap this out for a local disk-based buffer and a thread copying messages to the remote RabbitMQ broker.
Anyone have any ideas? I have looked at Kafka but it seems like overkill for my use-case. MongoDB is a possibility but I am worried about its memory usage.
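To make the desired shape concrete, here is a rough sketch of the copier thread I have in mind, written against a hypothetical DiskBackedQueue interface (the interface and the RabbitMQ publish call are placeholders, not any real library):

    import java.util.concurrent.TimeUnit;

    public class BufferCopier implements Runnable {

        /** Hypothetical disk-backed queue; any library with peek/remove semantics would do. */
        interface DiskBackedQueue {
            String peek();          // oldest message, or null if empty
            void removeOldest();    // delete it once it has been delivered
            void append(String m);  // called by the producer side
        }

        private final DiskBackedQueue buffer;

        BufferCopier(DiskBackedQueue buffer) {
            this.buffer = buffer;
        }

        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                String message = buffer.peek();
                if (message == null) {
                    sleepQuietly(100);
                    continue;
                }
                try {
                    publishToRemoteRabbit(message); // placeholder for the real publish + confirm
                    buffer.removeOldest();          // only delete after the remote broker accepted it
                } catch (Exception brokerDown) {
                    sleepQuietly(5_000);            // back off and retry; messages stay on disk
                }
            }
        }

        private void publishToRemoteRabbit(String message) throws Exception {
            // Real implementation would use the RabbitMQ Java client with publisher confirms.
        }

        private static void sleepQuietly(long millis) {
            try {
                TimeUnit.MILLISECONDS.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }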

Memory usage is always an issue in any system. I am using MongoDB in production, and when I compare it with similar solutions (CouchDB, Couchbase, redis.io), MongoDB is really good at memory management and ease of implementation. I should admit, though, that I never had the chance to test Riak in much detail.
I am storing 5,000,000 user records with 4 indexed fields, plus all user sessions, behind a REST/web service API which uses a messaging service internally.
My messaging service uses another DB instance on the same server.
My user records have at least 20 fields and session records have just 5 fields.
My Ubuntu servers have never used more than 10 GB of RAM, even under heavy processing load.
Hope this helps.
P.S. It all depends on your data model and how you implement your infrastructure.
Regards,
EDIT:
I think this is a good slideshow about using MongoDB for messaging, and there is also a nice article about MongoDB and messaging.
You can use the test code from it and see whether the results are OK for your solution.
Please don't forget to share your results if you test it.
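For your actual use case (a bounded, disk-backed message buffer), a capped collection is probably the MongoDB feature that fits best: it preallocates a fixed amount of disk, preserves insertion order, and silently drops the oldest documents when the size limit is reached. A rough sketch, assuming the modern MongoDB Java sync driver (database/collection names and the 1 GB size are placeholders):

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.MongoDatabase;
    import com.mongodb.client.model.CreateCollectionOptions;
    import org.bson.Document;

    public class CappedBufferExample {
        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoDatabase db = client.getDatabase("buffer");

                // Create once: fixed size on disk, oldest documents are overwritten
                // automatically, insertion order is preserved.
                db.createCollection("messages",
                        new CreateCollectionOptions().capped(true).sizeInBytes(1024L * 1024 * 1024));
                MongoCollection<Document> messages = db.getCollection("messages");

                // Producer side: append a JSON message.
                messages.insertOne(new Document("payload", "{\"hello\":\"world\"}")
                        .append("ts", System.currentTimeMillis()));

                // Consumer side: read in insertion order; a tailable cursor can be used
                // to keep following new inserts, queue-style.
                for (Document d : messages.find()) {
                    System.out.println(d.toJson());
                }
            }
        }
    }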

Related

Keeping all instances of an in-memory graph DB in sync

We are building a Java application which will use embedded Neo4j for graph traversal. Below are the reasons why we want to use the embedded version instead of a centralized server:
This app is not the data owner. Data will be ingested into it through another app. Keeping the data locally helps us do quick calculations and hence improves our API SLA.
Since the data footprint is small, we don't want to maintain a centralized server, which would incur additional cost and maintenance.
No need for an additional cache.
Now this architecture brings two challenges. First, how to update data in all instances of the embedded Neo4j application at the same time? Second, how to make sure that all instances are in sync, i.e. using the same version of the data?
We thought of using Kafka to solve the first problem. The idea is to have a Kafka listener with a different group id in each instance (to ensure all of them get the updates); a rough sketch follows below. Whenever there is an update, an event is posted to Kafka. All instances listen for the event and perform the update operation.
However, we still don't have a solid design for the second problem. For various reasons one of the instances can miss an event (e.g. its consumer is down). One option is to keep checking the latest version by calling an API of the data-owner app, and if the version is behind, replay the events. But this brings the additional complexity of maintaining an event log of all updates. Do you think this can be done in a better and simpler way?
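To make the first idea concrete, this is roughly the per-instance listener we have in mind, assuming the plain Kafka Java client (the topic name and the way the unique group id is derived are placeholders):

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import java.util.UUID;

    public class GraphUpdateListener {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            // A unique group id per instance means every instance receives every update.
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "neo4j-replica-" + UUID.randomUUID());
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("graph-updates"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        // Apply the update to the local embedded Neo4j instance here.
                        System.out.println("applying update: " + record.value());
                    }
                }
            }
        }
    }

One open design point: a random group id means a restarted instance has no committed offsets, so with auto.offset.reset=earliest it would replay the whole topic; a stable per-instance id (e.g. derived from the hostname) plus committed offsets would let it resume where it left off, which also touches our second problem.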
Kafka consumers are extremely consistent and reliable once you have them configured properly, so there shouldn't be any reason for them to miss messages unless there's an infrastructure problem, in which case any solution you architect will have problems. If the Kafka cluster is healthy (e.g. at least one copy of the data is available, and a quorum of ZooKeeper nodes is up and running), then your consumers should receive every single message from the topics they're subscribed to. The consumer will handle the retries/reconnecting itself, as long as your timeout/retry configurations are sane. The default configs in the latest Kafka versions are adequate 99% of the time.
Separately, you can add a separate thread, for example, that constantly checks what the latest offset is per topic/partition, compares it to what the consumer last received, and maybe issues an alert/warning if there is a discrepancy. In my experience, given Kafka's reliability, it should be unnecessary, but it can give you peace of mind and shouldn't be too difficult to add.
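Something like this rough sketch, assuming the plain Kafka Java client (how you schedule it and what threshold triggers the alert are up to you; note that KafkaConsumer is not thread-safe, so the monitor should use its own consumer instance configured with the same group.id):

    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    import java.util.Collections;
    import java.util.Map;

    public class LagMonitor {

        /**
         * Lag of a consumer group on one partition: the distance between the end of
         * the log and the last committed offset. 0 means fully caught up.
         */
        static long lagFor(KafkaConsumer<?, ?> monitoringConsumer, TopicPartition tp) {
            Map<TopicPartition, Long> end = monitoringConsumer.endOffsets(Collections.singleton(tp));
            OffsetAndMetadata committed = monitoringConsumer.committed(tp);
            long committedOffset = (committed == null) ? 0L : committed.offset();
            return end.get(tp) - committedOffset;
        }
    }

If lagFor(...) stays above some threshold for longer than you expect, that is when you would issue the alert/warning mentioned above.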

Sending messages to the least utilised machine in an Akka cluster

I am new to Akka and want to achieve this:
I want to deploy a few stateful actors on fixed machines (which will always be on) and stateless actors (processing actors/workers) on Amazon EC2 Spot instances.
To handle failover of the stateful actors I am planning to use Akka Persistence.
To distribute jobs to the stateless workers I am planning to use a RoundRobinPool with remotely deployed routees, and I want messages to be passed to the least utilized machine (by CPU and memory). I am using a Pool so that I can use withSupervisorStrategy() to handle actor failure.
I am going through the example for remotely deployed routees and referring to this code: http://www.typesafe.com/activator/template/akka-sample-cluster-java. And https://github.com/akka/akka/blob/cb05725c1ec8a09e9bfd57dd093911dd41c7b288/akka-samples/akka-sample-cluster-java/src/main/java/sample/cluster/stats/StatsSampleOneMasterMain.java.
In StatsSampleClient a node is picked at random and the message is passed to it. I want to pass it to the least utilized machine, as mentioned above. I want to know whether Akka supports this, or whether I will have to write code to find out the utilization and send the message to that machine accordingly.
Kindly suggest if a better approach can be used for what I have mentioned above.
Thanks!
-Devendra
Did you have a look at the Adaptive Load Balancing Router?
It load-balances messages to cluster nodes based on the cluster metrics data and is set up via configuration.
Hope it helps.
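As a rough sketch of what that can look like in Java, assuming Akka 2.4/2.5 with the akka-cluster-metrics module on the classpath (the role name, pool sizes, and Worker class are placeholders, and the same router can instead be defined purely in configuration and looked up with FromConfig; check the constructor signatures against your Akka version):

    import akka.actor.AbstractActor;
    import akka.actor.ActorRef;
    import akka.actor.ActorSystem;
    import akka.actor.Props;
    import akka.cluster.metrics.AdaptiveLoadBalancingPool;
    import akka.cluster.metrics.HeapMetricsSelector;
    import akka.cluster.routing.ClusterRouterPool;
    import akka.cluster.routing.ClusterRouterPoolSettings;

    public class AdaptiveRouterExample {

        // Placeholder worker actor; your stateless processing actor goes here.
        public static class Worker extends AbstractActor {
            @Override
            public Receive createReceive() {
                return receiveBuilder().matchAny(job -> { /* process the job */ }).build();
            }
        }

        public static void main(String[] args) {
            ActorSystem system = ActorSystem.create("ClusterSystem");

            // Routees are deployed across the cluster; each message goes
            // preferentially to the node that currently looks least loaded
            // according to the selected metric (heap here; CPU, system load
            // average and mixed selectors also exist).
            ActorRef workerRouter = system.actorOf(
                    new ClusterRouterPool(
                            new AdaptiveLoadBalancingPool(HeapMetricsSelector.getInstance(), 0),
                            new ClusterRouterPoolSettings(
                                    100,      // total routees across the cluster
                                    3,        // max routees per node
                                    false,    // allowLocalRoutees
                                    "worker") // only deploy on nodes with this role
                    ).props(Props.create(Worker.class)),
                    "workerRouter");
        }
    }

Since AdaptiveLoadBalancingPool is a Pool, withSupervisorStrategy() should still be available for the failure handling you mentioned.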

How is a Java HTTP server scalable, or how can I make it scalable?

Hello, I am a student just learning to use Netty and MySQL.
I am building a server for my Android and iOS applications. I built my server based on the Netty 4.0.6 example HttpUploadServer.
The server's primary task is to send/receive and save images and audio files (about 1 MB in total). About 10,000 requests will be sent daily.
One of my advisors said that two things should be given the most thought when developing a server:
Scaling up and out
High availability
However (as I am just learning server programming), I have no idea how to do them. The only thing I can think of to increase scalability and availability is something like Amazon's Elastic Load Balancer.
I know this is a very broad question, but please give me some direction.
How can I increase scalability and availability using Java (especially Netty)?
Scaling up can be achieved through many techniques:
Having multiple instances: aka Elastic Load Balancers
Sharding: server 1 handles requests for users A-M, server 2 handles requests for users N-Z (a sketch follows at the end of this answer)
Add caching: are you servicing the same request multiple times? Throw some memory at the problem and keep serving the same answer
Simplify your workload!
The really important question you need to answer is what is limiting your ability to serve N+1 clients. Are you running out of sockets, memory, CPU time, DB transactions?
Like any profiling problem, work out what your dominant bottleneck is and solve it.
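To make the sharding idea concrete, a minimal sketch of routing by a hash of the user id (the host names are placeholders):

    import java.util.Arrays;
    import java.util.List;

    public class ShardRouter {

        private final List<String> servers;

        public ShardRouter(List<String> servers) {
            this.servers = servers;
        }

        /** The same user id always maps to the same backend server. */
        public String serverFor(String userId) {
            int bucket = Math.floorMod(userId.hashCode(), servers.size());
            return servers.get(bucket);
        }

        public static void main(String[] args) {
            ShardRouter router = new ShardRouter(
                    Arrays.asList("upload-1.example.com", "upload-2.example.com"));
            System.out.println(router.serverFor("alice")); // always the same server for "alice"
        }
    }

Note that adding or removing a server reshuffles which users land where, which is why consistent hashing is often used instead of a plain modulo when the server set changes frequently.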

Is there a Java local queue library I can use that keeps memory usage low by dumping to the hard drive?

This may not be possible, but I thought I might just give it a try. I have a worker that processes some data; it makes one of 3 decisions for each item it processes: keep, discard, or modify/reprocess (because it's unsure whether to keep or discard). This generates a very large amount of data, because the reprocessing may break the data into many different parts.
My initial approach was to send it to my ExecutorService that was processing the data, but because the number of items to process was large I would run out of memory very quickly. Then I decided to offload the queue to a messaging server (RabbitMQ), which works fine, but now I'm bound by network IO. What I like about RabbitMQ is that it keeps messages in memory up to a certain level and then dumps old messages to the local drive, so if I have 8 GB of memory on my server I can still have a 100 GB message queue.
So my question is: is there any library that has a similar feature in Java? Something that I can use as a non-blocking queue that keeps only X items in memory (either by number of items or by size) and writes the rest to the local drive.
Note: right now I'm only asking for this to be used on one server. In the future I might add more servers, but because each server generates its own data I would try to take messages from one queue and push them to another if one server's queue is empty. The library would not need network access, but I would need to access the queue from another Java process. I know this is a long shot, but I thought if anyone knew, it would be SO.
Not sure if it is the approach you are looking for, but why not use a lightweight database like HSQLDB and a persistence layer like Hibernate? You can keep your messages in memory, commit them to the DB to save them on disk, and later retrieve them with a convenient SQL query.
Actually, as Cuevas wrote, HSQLDB could be a solution. If you use the "cached table" it provides, you can specify the maximum amount of memory used; data exceeding that is kept on the hard drive.
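A rough sketch of that with plain JDBC and HSQLDB 2.x (file path, table layout and sizes are placeholders; the in-memory cache can be tuned with properties such as hsqldb.cache_rows):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HsqldbBufferExample {
        public static void main(String[] args) throws Exception {
            // File-based HSQLDB database: data survives JVM restarts.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:hsqldb:file:./buffer/messages", "SA", "")) {

                try (Statement st = conn.createStatement()) {
                    // Run once. CACHED tables keep rows on disk and only a bounded cache in memory.
                    st.execute("CREATE CACHED TABLE messages ("
                            + "id BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, "
                            + "payload VARCHAR(10000))");
                }

                // Enqueue.
                try (PreparedStatement ins =
                             conn.prepareStatement("INSERT INTO messages (payload) VALUES (?)")) {
                    ins.setString(1, "some work item");
                    ins.executeUpdate();
                }

                // Dequeue oldest first, delete only after successful processing.
                try (Statement st = conn.createStatement();
                     ResultSet rs = st.executeQuery(
                             "SELECT id, payload FROM messages ORDER BY id LIMIT 1")) {
                    if (rs.next()) {
                        long id = rs.getLong("id");
                        System.out.println(rs.getString("payload"));
                        try (PreparedStatement del =
                                     conn.prepareStatement("DELETE FROM messages WHERE id = ?")) {
                            del.setLong(1, id);
                            del.executeUpdate();
                        }
                    }
                }
            }
        }
    }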
Use the filesystem. It's old-school, yet so many engineers get bitten by libraries because they are lazy. True, HSQLDB provides lots of value-add features, but in the context of being lightweight....

Potential pitfalls in using a JMS queue?

I've been asked to design and implement a system for receiving a high volume of automated sensor data from a large number of devices. This data will be produced at regular intervals and sent to the server as XML in an HTTP POST. The devices will keep resending the same data if they don't receive a specific acknowledgment from the server. Some potentially heavy-duty processing of this data will need to occur before it's inserted into a number of tables in the main database via a transaction, and additionally some data points will need to be enqueued to be redirected to other external URLs.
I'm planning on using a Java application server (leaning towards GlassFish) with a servlet to receive the incoming data. I'd like to implement some kind of queuing mechanism to store the data temporarily so that the response back to the sensor isn't dependent on all the intermediate processing. Separate independent queues are also a requirement for the data re-direction piece. After doing some research the two main options seem to be:
1) Install a database on the app server and use tables for the various queues. The queues would be processed by a Java application, either running in the app server or standalone as its own service.
2) Use a database backed JMS solution to implement the queuing.
I'm not that familiar with JMS but from what I've read it seems to be the better solution in this case. The primary requirement is that no sensor data ever be lost or dropped from the queue before being processed and that it be processed more or less sequentially. We'd also like to make it easy to halt the processing of some of the queues at certain times but still have them accumulate data and for these messages to never automatically expire.
With strategy 1 it's obvious to me how to meet these requirements but it may be less robust and scalable, and more complex to develop than strategy 2, since I'll need to write my own multi-threaded code to handle the various independent queues. I'm wondering what the potential pitfalls could be in using JMS queues for this purpose since I've never worked with them before.
Data integrity is a big issue so I need to make sure JMS can guarantee no data loss in the event of a server reboot, power outage, or if the queue gets very large for some reason. For instance could a problem completing transactions to the main database for a period of time potentially cause the JVM to run out of memory, crash, and lose all accumulated data? (This would be the nightmare scenario).
Also, I was wondering if there would be any way to pause the JMS queue processing via an app server admin tool or to easily see what's in the queue (I would be enqueuing an object which would be the message xml plus some other data, including timestamp received, etc.) I've read a few posts on here that deal with related issues but wanted to get some direct feedback. Basically I'd like to know of instances (if any) where JMS is not an appropriate queuing solution and if this is one of those cases. Any advice is greatly appreciated.
Kaleb's answer talks about the benefits of JMS quite eloquently, but since you're asking about pitfalls, here's what I can think of.
Not all JMS implementations are equal. In theory you can use whatever implementation suits your needs, but unless you're prepared to do some serious load testing and failure condition testing, you can't know that a particular implementation isn't going to fail under your particular use case.
Most JMS implementations use a transactional datastore like a relational database as their back end. That means that rather than writing directly to whatever datastore you're familiar with, you have to rely on the JMS implementation's extra layer between you and the stored messages.
While swapping JMS implementations to find the one that perfectly fits your needs may seem like a simple endeavor because of the homogeneous JMS API, the critical features for failure handling, JMS server monitoring, and all the other cool stuff that exists above and beyond messaging is going to be a hassle to deal with if you do change your implementation.
That said, I think you'd be crazy to write to the DB yourself instead of going with JMS. On the first point, ActiveMQ is a venerable JMS server used in many enterprise environments. On the second point, the fact is you'd just end up writing that extra layer yourself in order to implement messaging, and your code won't have the benefit of thousands of eyes (or a set of paid developers whose sole job it is to respond to customers and make sure the JMS implementation is solid). On the third point, well, the same ends up being true of your backend datastore. Use JMS, you'll save yourself trouble in the long run.
If you want to go the JMS route, a standalone JMS-compatible message broker (separate from your app server) would be a good choice. Message brokers range from free open-source (like ActiveMQ at http://activemq.apache.org/ or OpenMQ at https://mq.dev.java.net/), to large-scale commercial solutions (IBM's WebSphere MQ at http://www-01.ibm.com/software/integration/wmq/ is one of the largest).
Message brokers offer guaranteed delivery (provided the server's up and listening), and you can do quite a bit to ensure that the system is fail-safe, including integrated backup broker servers and instant power backup. Broker queues can eventually run out of room if your app server isn't picking up the messages, but you can assign huge queue depths (hundreds of GB) and have the server send alerts if the messages aren't getting processed and the queue reaches a certain percentage.
Your Java app would then run on a different server entirely, and would connect to the broker and pull messages off of the queue as fast as possible. If the app server crashes or stops picking up messages for any other reason, the broker would just keep all messages in that queue until the app server begins picking them up again.
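As an illustration, a rough sketch of the servlet-side producer sending persistent messages, assuming the ActiveMQ client library (broker URL, queue name, and payload are placeholders):

    import javax.jms.Connection;
    import javax.jms.DeliveryMode;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.jms.TextMessage;

    import org.apache.activemq.ActiveMQConnectionFactory;

    public class SensorDataProducer {
        public static void main(String[] args) throws Exception {
            ActiveMQConnectionFactory factory =
                    new ActiveMQConnectionFactory("tcp://broker.example.com:61616");
            Connection connection = factory.createConnection();
            connection.start();
            try {
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                Queue queue = session.createQueue("sensor.data");
                MessageProducer producer = session.createProducer(queue);

                // PERSISTENT delivery means the broker writes the message to its store
                // before acknowledging the send, so it survives broker restarts.
                producer.setDeliveryMode(DeliveryMode.PERSISTENT);

                TextMessage message = session.createTextMessage("<reading sensor=\"42\" value=\"3.14\"/>");
                producer.send(message);
            } finally {
                connection.close();
            }
        }
    }

In your flow, the servlet would do the send after receiving the HTTP POST and only then return the acknowledgment to the sensor, so the response no longer depends on the heavy processing.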
You will want to implement a poison message queue (a dead-letter queue) in your design - this is where messages that cannot be processed after some number of retries end up.
You will probably need to write some code that can examine the messages in that queue and re-send them to the appropriate destination after fixing whatever is causing them to fail.
If sequence of message processing is important, a message ending up in the poison queue could mean all processing is halted until that message is corrected.
As far as fault tolerance goes, you can have multiple instances of the consuming services subscribe to the same queue or topic, providing the ability to continue processing even if one or more instances go down.
Finally, have a watchdog process that pings the various consumers on your message queue, and if one doesn't respond, have it send a message that results in a new instance being started. In this way, your message processing environment can be somewhat self regulating.
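A sketch of the consuming side with a transacted session, assuming ActiveMQ: a rolled-back transaction causes redelivery, and after the configured maximum the broker moves the message to its dead-letter queue (ActiveMQ.DLQ by default), which plays the role of the poison queue described above. Names and limits are placeholders:

    import javax.jms.Connection;
    import javax.jms.Message;
    import javax.jms.MessageConsumer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.jms.TextMessage;

    import org.apache.activemq.ActiveMQConnectionFactory;
    import org.apache.activemq.RedeliveryPolicy;

    public class SensorDataConsumer {
        public static void main(String[] args) throws Exception {
            ActiveMQConnectionFactory factory =
                    new ActiveMQConnectionFactory("tcp://broker.example.com:61616");

            // After 6 failed deliveries the broker sends the message to the dead-letter queue.
            RedeliveryPolicy policy = new RedeliveryPolicy();
            policy.setMaximumRedeliveries(6);
            factory.setRedeliveryPolicy(policy);

            Connection connection = factory.createConnection();
            connection.start();
            try {
                // Transacted session: commit only after the database work succeeds.
                Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
                Queue queue = session.createQueue("sensor.data");
                MessageConsumer consumer = session.createConsumer(queue);

                while (true) {
                    Message message = consumer.receive();
                    try {
                        process(((TextMessage) message).getText()); // heavy processing + DB insert
                        session.commit();                           // message is removed from the queue
                    } catch (Exception e) {
                        session.rollback();                         // message will be redelivered
                    }
                }
            } finally {
                connection.close();
            }
        }

        private static void process(String xml) {
            // Placeholder for parsing, validation, and the database transaction.
        }
    }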
