I'm looking for a way to share large in-memory objects between Java applications and have been looking at JMS (ActiveMQ) and JavaSpaces. Will any of these allow me to reliably send/share objects between two or more Java applications? Is ActiveMQ suitable for large messages?
You can use in-memory data grids like Oracle Coherence or JBoss Data Grid. This may be faster than using JMS.
It really depends what you mean by share. If you mean that different processes (potentially on different machines) need to be able to access a "shared" object, then yes, as the other answer suggests, something like Oracle Coherence would be great.
On the other hand, if you mean share as in to pass from one process to another, then you probably are looking for a messaging solution, e.g. JMS or even simpler e.g. REST.
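If you go the messaging route, the JMS wiring is fairly small. Below is a minimal sketch of sending a Serializable object through an ActiveMQ queue with the plain JMS 1.1 API; the broker URL and queue name are placeholders. One caveat for large messages: recent ActiveMQ versions restrict which classes an ObjectMessage may deserialize (trusted packages), and very large payloads are often better shipped as a BytesMessage.

    import java.io.Serializable;
    import javax.jms.Connection;
    import javax.jms.Destination;
    import javax.jms.MessageProducer;
    import javax.jms.ObjectMessage;
    import javax.jms.Session;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class ObjectSender {
        public static void send(Serializable payload) throws Exception {
            // Placeholder broker URL; point this at your ActiveMQ instance.
            ActiveMQConnectionFactory factory =
                    new ActiveMQConnectionFactory("tcp://localhost:61616");
            Connection connection = factory.createConnection();
            try {
                connection.start();
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                Destination queue = session.createQueue("shared.objects"); // hypothetical queue
                MessageProducer producer = session.createProducer(queue);

                // Any Serializable object can be wrapped in an ObjectMessage.
                ObjectMessage message = session.createObjectMessage(payload);
                producer.send(message);
            } finally {
                connection.close();
            }
        }
    }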
Related
I have some IMDG experience but am rather new to Kafka. I am trying to understand the use case for Kafka. I understand it is a streaming/messaging platform. A lot of its features have counterparts in modern In-Memory Data Grids. Can you shed a bit of light on the use cases where someone would prefer Kafka and where an IMDG would be the better choice? I need to draw a parallel.
I will give you one example. I have noticed Kafka being used for data replication. Although that is possible, I feel that IMDGs are more capable and automated for this purpose.
I am also interested in how these two technologies complement each other, as I don't think they are in direct competition.
The two types of systems do have some feature overlap, but they are still two different kinds of systems with dissimilar primary objectives, so we can't compare them on the primary feature of either.
Kafka is primarily a pub/sub durable message broker. Data grids are primarily in-memory cache systems. This is the first distinction or key attribute on which one would choose to use either.
On a secondary level, which I believe is where the lines become blurred, both types of system provide some kind of distributed computing capabilities (Kafka Streams, Ignite or Hazelcast compute grid/service) with data ingestion functionality. This, however, cannot be taken as the primary selection criterion.
The two types don't really directly compete with one another on their respective primary purposes. A stream-based compute engine may use a data grid for computation or for transient state caching, but I don't see it relying on a compute/data grid as a reliable, standalone message broker; it would depend on something like Kafka for that.
A small application may dispense with one type to use the secondary features of the other, but an application with high demand for both may in fact need to use both types of systems.
As an example, if you're building a high-volume data pipeline with multiple data sources and you have to use a durable message broker, you will probably have to use Kafka, but if you equally have strong requirements for low-latency querying downstream, you will also need a compute grid, be it for caching or for distributed computing.
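To make the broker side of that pipeline concrete, here is a minimal sketch of a Kafka producer in Java; the broker address, topic, and payload are made up, and acks=all is chosen to favor durability, which is usually the point of picking Kafka here.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class PipelineProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("acks", "all"); // wait for replication: durability over latency

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Events are appended to a durable, replicated log that any number
                // of independent consumer groups can read at their own pace.
                producer.send(new ProducerRecord<>("sensor-events", "sensor-42",
                        "{\"temp\":21.5}"));
            }
        }
    }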
I've been pondering the same question recently. I've come to these conclusions:
Use an IMDG like Ignite/Hazelcast if:
Your processing use cases make sense in a compute grid AND your grid, which could have a number of applications/processes in it, is the only consumer of the durable, distributed data streams
Use Kafka if:
You have a heterogeneous environment of processing layers and you need an independent data integration layer to provide durable, distributed data streams
Also, they are not necessarily mutually exclusive. You may find that the latter makes sense across your organization, while some consumers with specific use cases for an IMDG/IMCG prefer to tap into the enterprise-wide Kafka plane only for seed data, reusing the IMDG's/IMCG's internal data structures for intermediate data streams that are used exclusively within the grid, with no real reason to divert those out to Kafka. The grid may still divert results back to Kafka for further dissemination to the rest of the enterprise.
Btw, IMDGs/IMCGs like Ignite and Hazelcast can provide pub/sub, be as durable as Kafka in terms of data resilience, and provide stream processing on top of it.
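To illustrate the pub/sub point, a minimal sketch with Hazelcast's ITopic (Hazelcast 3.x API; topic name and payload are made up). Unlike a Kafka topic, a plain ITopic delivers to currently subscribed listeners and keeps no replayable log; Hazelcast's Reliable Topic, backed by a ringbuffer, narrows that gap.

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.ITopic;

    public class GridPubSub {
        public static void main(String[] args) {
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();
            ITopic<String> topic = hz.getTopic("price-updates"); // hypothetical topic

            // Every member/client that registers a listener gets each message.
            topic.addMessageListener(msg ->
                    System.out.println("Received: " + msg.getMessageObject()));

            topic.publish("AAPL=182.50");
        }
    }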
What products/projects could help me with the following scenario?
More than one server (same location)
Some state should be shared between server (for instance information if a scheduled task is running and on what server).
The obvious answer would of course be a database, but we are using Seam and there doesn't seem to be a good way to nest transactions inside a Seam bean, so I need to find a way where I don't have to go crazy over configuration (I tried to use EJBs, but persistence.xml wasn't pretty afterwards). So I need another way around this problem until Seam supports nested transactions.
This is basically the same scenario as I have if you need more details: https://community.jboss.org/thread/182126.
Any ideas?
Sounds like you need to do distributed job management.
The reality is that in the Java EE world, you are going to end up having to do Queues, as in MoM [Message-oriented Middleware]. Seam will work with JMS, and you can have publish and subscribe queues.
Where you might want to take a look for an alternative is at Akka. It gives you the ability to distribute jobs across machines using an Actor/Agent model that is transparent. That's to say your agents can cooperate with each other whether they are on the same instance or across the network from each other, and you are not writing a ton of code to make that happen, or having to special handle things up and down the message chain.
The other thing Akka has going for it is the notion of Supervision, aka Go Ahead and Fail, or Let it Crash. This is the idea (followed by the Telcos for years), that systems will fail and you should design for it and have a means of making things resilient.
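For flavor, here is a minimal sketch in Akka's classic Java API (all names hypothetical). The sender neither knows nor cares whether the actor is local or remote, and a parent's supervision strategy decides whether a crashed child is restarted, resumed, or stopped.

    import akka.actor.AbstractActor;
    import akka.actor.ActorRef;
    import akka.actor.ActorSystem;
    import akka.actor.Props;

    public class JobWorker extends AbstractActor {
        @Override
        public Receive createReceive() {
            return receiveBuilder()
                    // Treat any String message as a job name and process it.
                    .match(String.class, job ->
                            System.out.println("Running job: " + job))
                    .build();
        }

        public static void main(String[] args) {
            ActorSystem system = ActorSystem.create("jobs");
            ActorRef worker = system.actorOf(Props.create(JobWorker.class), "worker");
            // Fire-and-forget; with akka-remote/cluster the same call could
            // transparently reach a worker on another machine.
            worker.tell("nightly-report", ActorRef.noSender());
        }
    }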
Finally, the state of other options job-wise in the Java world is dismal. I have used Seam for years. It's great, but they decided to just support Quartz for jobs, which is useless.
Akka is built on Netty, too, which does some pretty crazy stuff in terms of concurrency and performance.
[Not a TypeSafe employee, btw…]
I'm writing a game app on GAE with GWT/Java and am having issues with server-side persistent data.
Players are polling using RPC for active games and game states, all being stored on the server. Sometimes client polling fails to find game instances that I know should exist. This only happens when I deploy to Google appspot; locally everything is fine.
I understand this could be due to how appspot is a cloud service that can spawn and use a new instance of my servlet at any point, with existing data not persisting between instances.
Single games only last a minute or two and the data changes rapidly (multiple times a second), so what is the best way to ensure that RPC calls to different instances will use the same server-side data?
I have had a look at the DataStore API and it seems to be database-like storage, which I'm guessing will be way too slow for what I need. Also, Memcache can be flushed at any point, so that's not useful.
What am I missing here?
You have two issues here: persisting data between requests and polling data from clients.
When you have a distributed servlet environment (such as GAE), you cannot make a request to one instance, save data to memory, and expect that data to be available on other instances. This is true for GAE and any other servlet environment where you have multiple servers.
So you need to save data to some shared storage: the Datastore is costly, persistent, reliable, and slow. Memcache is fast, free, but unreliable. Usually we use a combination of both. Some libraries even transparently combine both: NDB, Objectify.
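By hand, the usual combination is a read-through/write-through helper like the sketch below, using the App Engine low-level APIs (the "Game" kind and the method names are made up):

    import com.google.appengine.api.datastore.DatastoreService;
    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.Entity;
    import com.google.appengine.api.datastore.EntityNotFoundException;
    import com.google.appengine.api.datastore.KeyFactory;
    import com.google.appengine.api.memcache.MemcacheService;
    import com.google.appengine.api.memcache.MemcacheServiceFactory;

    public class GameStateRepository {
        private final DatastoreService datastore =
                DatastoreServiceFactory.getDatastoreService();
        private final MemcacheService memcache =
                MemcacheServiceFactory.getMemcacheService();

        // Read-through: try the fast, unreliable cache first,
        // fall back to the slow, reliable datastore on a miss.
        public Entity loadGame(String gameId) throws EntityNotFoundException {
            Entity cached = (Entity) memcache.get(gameId);
            if (cached != null) {
                return cached;
            }
            Entity game = datastore.get(KeyFactory.createKey("Game", gameId));
            memcache.put(gameId, game); // repopulate the cache
            return game;
        }

        // Write-through: persist first, then refresh the cache.
        public void saveGame(String gameId, Entity game) {
            datastore.put(game);
            memcache.put(gameId, game);
        }
    }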
On GAE there is also a third option to have semi-persisted shared data: backends. Those are always-on instances, where you control startup/shutdown.
Data polling: if you have multiple clients waiting for updates, it's best not to use polling. Polling makes a lot of unnecessary requests (when data has not changed on the server) and there is still a minimum delay (since you poll at some interval). Instead of polling, use push via the Channel API. There are even GWT libs for it: gwt-gae-channel, gwt-channel-api.
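On the server side, pushing an update through the Channel API is only a couple of calls. A minimal sketch (the client-id scheme and payload are assumptions):

    import com.google.appengine.api.channel.ChannelMessage;
    import com.google.appengine.api.channel.ChannelService;
    import com.google.appengine.api.channel.ChannelServiceFactory;

    public class GamePushService {
        private final ChannelService channelService =
                ChannelServiceFactory.getChannelService();

        // At game join: create a channel and hand the token to the GWT client,
        // which uses it to open the channel and listen for pushes.
        public String openChannel(String gameId, String playerId) {
            return channelService.createChannel(gameId + "/" + playerId);
        }

        // Whenever the game state changes: push it to the player immediately,
        // instead of waiting for the next poll to arrive.
        public void pushState(String gameId, String playerId, String stateJson) {
            channelService.sendMessage(
                    new ChannelMessage(gameId + "/" + playerId, stateJson));
        }
    }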
Short answer: You did not design your game to run on App Engine.
You sound like you've already answered your own question. You understand that data is not persisted across instances. The two mechanisms for persisting data on the server side are memcache and the datastore, but you also understand the limitations of these. You need to architect your game around this.
If you're not using memcache or the datastore, how are you persisting your data? (My best guess is that you aren't actually persisting it.) From the vague details, you have not architected your game to be able to run across multiple instances, which is essential for any app running on App Engine. It's a basic design principle that you don't know which instance any HTTP request will hit. You have to rearchitect to use the datastore + memcache.
If you want to use a single server, you can use backends, which behave like single servers that stick around (if you limit it to one instance). Frankly, though, because of the cost, you're better off with Amazon or Rackspace if you go this route. You will also have to deal with scaling on your own, i.e. if a game is running on a particular server instance, you need a way to make plays of that game consistently hit that instance.
Remember you can deploy GWT applications without GAE, see this explanation:
https://developers.google.com/web-toolkit/doc/latest/DevGuideServerCommunication#DevGuideRPCDeployment
You may want to ask yourself: Will your application ever NEED multiple server instances or GAE-specific features?
If so, then I agree with Peter Knego's reply regarding memcache etc.
If not, then you might be able to work around your problem by choosing a different hosting option (other than GAE), particularly one that lets you work with just a single instance. You could then indeed simply manage all your game data in server memory, as I understand you have been doing so far.
If this solution suits your purpose, then all you need to do is find a suitable hosting provider. This may well be a cloud-based PaaS offering, provided that they let you put a hard limit (unlike with GAE) on the number of server instances, and that it goes as low as one. For example, Heroku (currently) lets you do that, as far as I understand, and apparently it's suitable for GWT applications, according to this thread:
https://stackoverflow.com/a/8583493/2237986
Note that the above solution involves a bit of fiddling and I don't know your needs well enough to make a strong recommendation. There may be easier and better solutions for what you're trying to do. In particular, have a look at non-cloud-based hosting options and server architectures that are optimized for highly time-critical, real-time multiplayer gaming.
Hope this helps! Keep us posted on your progress.
ServletContext attributes set on one JVM are not visible on another JVM. Why?
Why would they be? Separate JVMs have separate address spaces. To share information between them, it has to be explicitly sent via some shared channel like a socket, a file or a database.
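For example, the most bare-bones explicit channel is a socket plus Java serialization; a minimal sketch (port and payload are arbitrary):

    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;
    import java.net.ServerSocket;
    import java.net.Socket;

    public class SharedStateDemo {
        // JVM A: wait for one connection and read a serialized object from it.
        static Object receive(int port) throws Exception {
            try (ServerSocket server = new ServerSocket(port);
                 Socket client = server.accept();
                 ObjectInputStream in = new ObjectInputStream(client.getInputStream())) {
                return in.readObject();
            }
        }

        // JVM B: connect and explicitly send a Serializable object over the wire.
        static void send(String host, int port, Serializable state) throws Exception {
            try (Socket socket = new Socket(host, port);
                 ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream())) {
                out.writeObject(state);
            }
        }
    }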
I haven't heard of any JVM shared memory that you can use programmatically. Since Java 1.5 there has been CDS (Class Data Sharing), which sadly won't help you in this situation as far as I know...
As Michael said, you should use another shared construct depending on what information you want to share. Since this is a servlet problem, you probably want to share some data between various web applications. If you can live with slightly slower performance, using a database or a simple file will work for you. If you have a more robust enterprise setup, let's say with EJBs or something like that, you can look at other technologies like JMS topics or distributed caches in a cluster environment.
Between the transitions of the web app I use a Session object to save my objects in.
I've heard there's a program called memcached, but there's no compiled version of it on the site, and besides, some people think there are real disadvantages to using it.
Now I want to ask you:
What are alternatives, pros and cons of different approaches?
Is memcached painful for sysadmins to install? Is it difficult to embed into an existing infrastructure, from a sysadmin's perspective?
What about using a database to hold temporary data between web app transitions?
Is it a normal practice?
"What about using a database to hold temporary data between web app transitions? Is it a normal practice?"
Databases do indeed already have a cache. A well-designed application should try to leverage it to reduce disk IO.
The database cache works at the data level, which is why other caching mechanisms can be used to address different levels. At the Java level, you can use Hibernate's 2nd-level cache, which can cache entities and query results. This can notably reduce the network IO between the app server and the database.
Then you may want to address horizontal scalability, that is, adding servers to manage the load. In this case, the 2nd-level cache needs to be distributed across the nodes. This exists (see JBoss Cache), but can get slightly complicated to manage.
Distributed caches tend to work better when they have a simpler scheme based on key/value pairs. That's what memcached is, but there are also other similar solutions. The biggest problem with distributed caches is the invalidation of outdated entries -- which can itself turn into a performance bottleneck.
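As an illustration of that key/value scheme, a minimal sketch with the spymemcached client (key names and TTL are made up); note that here a TTL does the invalidation work, which sidesteps explicit invalidation but means readers may see slightly stale data:

    import java.net.InetSocketAddress;
    import net.spy.memcached.MemcachedClient;

    public class CacheAside {
        public static void main(String[] args) throws Exception {
            MemcachedClient cache =
                    new MemcachedClient(new InetSocketAddress("localhost", 11211));

            // Store with a TTL in seconds: expiry is the simplest invalidation.
            cache.set("user:42:profile", 300, "{\"name\":\"Alice\"}");

            // On a miss (null), reload from the source of truth and repopulate.
            Object profile = cache.get("user:42:profile");
            System.out.println("cached profile: " + profile);

            cache.shutdown();
        }
    }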
Don't think that you can use a distributed cache as-is to make your performance problems vanish. Designing a scalable distributed architecture requires experience and is always a matter of trade-off between what to optimize and not.
To come back to your question: for a regular application, there is IMHO no need for a distributed cache. Decent disk IO and network IO usually lead to decent performance.
EDIT
For non-persistent objects, you have several options:
The HttpSession. Objects need to implement Serializable. The exact way the session is managed depends on the container. In a cluster, the session is usually replicated twice, so that if one node crashes you still have one copy. There is then session affinity to route the request to the server that has the session in memory.
Distributed cache. A system like memcached may indeed make sense, but I don't know the details.
Database. You could of course dump any Serializable object in the database in a BLOB. Can be an option if the web servers are not as reliable as the database server.
Again, for regular application, I would try to go as far as possible with the HttpSession.
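To illustrate the HttpSession option, a minimal sketch (class and attribute names are made up); the one hard constraint is that anything you store must be Serializable so the container can replicate it across the cluster:

    import java.io.Serializable;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import javax.servlet.http.HttpSession;

    // Must be Serializable so the container can replicate it to another node.
    class Cart implements Serializable {
        private static final long serialVersionUID = 1L;
        // items, totals, etc.
    }

    public class CartServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
            HttpSession session = req.getSession();          // created on demand
            Cart cart = (Cart) session.getAttribute("cart"); // survives across requests
            if (cart == null) {
                cart = new Cart();
                session.setAttribute("cart", cart);
            }
        }
    }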
How about Ehcache? It's an easy-to-use, pure Java solution that's ready to plug in to Hibernate. As far as I remember, it's supported by containers as well.
It's quite painless in my experience.
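For what it's worth, wiring it up is mostly configuration plus one annotation per cached entity. A minimal sketch, assuming Hibernate 3.x with the Ehcache region factory (entity and property names are made up):

    import javax.persistence.Entity;
    import javax.persistence.Id;
    import org.hibernate.annotations.Cache;
    import org.hibernate.annotations.CacheConcurrencyStrategy;

    // Requires in hibernate.cfg.xml (Hibernate 3.x with Ehcache):
    //   hibernate.cache.use_second_level_cache = true
    //   hibernate.cache.region.factory_class = net.sf.ehcache.hibernate.EhCacheRegionFactory
    @Entity
    @Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
    public class Product {
        @Id
        private Long id;
        private String name;
        // getters and setters omitted
    }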
http://docs.jboss.org/hibernate/core/3.3/reference/en/html/performance.html#performance-cache
This page should have everything that you need (hopefully!)