I have one WAR ( app.war ) and one container ( Tomcat, Jetty, Glassfish, whatever ).
My goal is to deploy, on demand, hundreds of instances of this same web application on the container.
http://foo/app1 --> app.war
http://foo/app2 --> app.war
http://foo/app3 --> app.war
...
http://foo/appN --> app.war
Some obvious ways of achieving this:
In Tomcat, create one context.xml file for each app ( named appN.xml ), all pointing to the same WAR. Other containers have similar methods
Problem with this approach: It will explode the WAR N times, taking up a lot of disk space
Use symbolic links to create webapp/{app1,app2,appN} folders pointing to an exploded version of app.war. This prevents the disk space explosion, but the JVM is still loading many duplicate JARs to memory
Use some shared lib folder to contain most jars ( and a combination of the previous two options ).
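For reference, the first option (one context descriptor per app, all pointing at the same WAR) can be scripted. This is only a minimal sketch: it assumes Tomcat's `docBase` attribute on `<Context>` and that the descriptors end up in the host's context directory (typically `conf/Catalina/localhost`). The class name, the directory and the WAR path are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class ContextFileGenerator {

    /** Builds the body of one context descriptor that points at the shared WAR. */
    static String descriptorFor(Path sharedWar) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<Context docBase=\"" + sharedWar.toAbsolutePath() + "\"/>\n";
    }

    /** Writes app1.xml .. appN.xml into contextDir and returns the created files. */
    static List<Path> generate(Path contextDir, Path sharedWar, int count) {
        try {
            Files.createDirectories(contextDir);
            List<Path> created = new ArrayList<>();
            for (int i = 1; i <= count; i++) {
                Path file = contextDir.resolve("app" + i + ".xml");
                Files.write(file, descriptorFor(sharedWar).getBytes(StandardCharsets.UTF_8));
                created.add(file);
            }
            return created;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical locations; a real setup would target Tomcat's context directory.
        Path dir = Files.createTempDirectory("contexts");
        generate(dir, Paths.get("/opt/wars/app.war"), 3).forEach(System.out::println);
    }
}
```

Each generated file is a few hundred bytes, so the descriptors themselves are cheap; the disk and memory concerns above come from what the container does with the WAR afterwards.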
I wonder if there is a better method to do this. Ideally, creating a new instance should not take up ANY more disk space ( other than marginal configuration files ) and only take up memory related to thread execution stacks and other runtime allocations.
Any ideas?
Jetty added support for what you are looking for a while back, with what are called overlays.
http://wiki.eclipse.org/Jetty/Tutorial/Configuring_the_Jetty_Overlay_Deployer
Copying a bit from the wiki page:
You can keep the WAR file immutable, even signed, so that it is clear which version you have deployed.
All modifications you make to customise/configure the web application are separate WARs, and thus are easily identifiable for review and migration to new versions.
You can create a parameterised template overlay that contains common customisations and configuration that apply to many instances of the web application (for example, for multi-tenant deployment).
Because the layered deployment clearly identifies the common and instance specific components, Jetty is able to share classloaders and static resource caches for the template, greatly reducing the memory footprint of multiple instances.
Apologies for being a little bit off-topic, but in my view, your scenario shouts "multi-tenancy" application, so that you've got a single application which will service multiple "tenants" (customers).
With regard to multi-tenancy setups, the following points would have to be considered:
Customers cannot access each other's data (if the data is stored in the same database and the same schema, using "discriminator" fields to separate it). This could be achieved by using Spring Security with Access Control Lists.
Hibernate has built-in support for multitenancy apps from version 4.0.
These two SO questions may be useful as well
Multiple Entity Manager issue in Spring when using more than one datasource (for using different data sources per customer: different databases, or just different schemas on the same database).
Multi tenancy support in Java EE 6
Benefits of multitenancy:
Shared code means that a bug fixed for one customer is fixed for all (this can be a disadvantage as well if different customers have different views on what constitutes a bug and what constitutes a feature).
Clustered deployment can share the load between customers (however, you need to ensure that peak capacity is available for all customers).
Downsides:
The code is going to be a bit more complex, as queries need to ensure that the "discrimination" between customers works without accidentally exposing customers to each other's data.
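To make the "discriminator" idea concrete, here is a minimal in-memory sketch. It is not how Hibernate or Spring Security do it; it only shows the principle that every read goes through one place that applies the tenant filter. The class and field names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TenantScopedOrders {

    /** One row in a shared table; tenantId plays the role of the discriminator column. */
    static final class Order {
        final String tenantId;
        final String item;

        Order(String tenantId, String item) {
            this.tenantId = tenantId;
            this.item = item;
        }
    }

    private final List<Order> table = new ArrayList<>();

    void insert(String tenantId, String item) {
        table.add(new Order(tenantId, item));
    }

    /** The only read path: callers cannot query without naming a tenant. */
    List<String> findItems(String tenantId) {
        List<String> result = new ArrayList<>();
        for (Order order : table) {
            if (order.tenantId.equals(tenantId)) {
                result.add(order.item);
            }
        }
        return Collections.unmodifiableList(result);
    }

    public static void main(String[] args) {
        TenantScopedOrders orders = new TenantScopedOrders();
        orders.insert("acme", "anvil");
        orders.insert("globex", "laser");
        System.out.println(orders.findItems("acme"));
    }
}
```

The extra complexity mentioned above comes from making sure no query bypasses that single filtered path.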
You could configure Apache on the front end (mod_proxy/mod_proxy_ajp) to point named virtual hosts to a single WAR deployed on Tomcat. Your application should be designed and written to service all requests. Configuration specific to each website name could be stored in a database or in a configuration file within your application; your app would just need to check the domain name of the incoming request to ensure the correct settings are applied (once per session). Generally speaking, you should be able to solve this with one application. Great developers are LAZY.
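A rough sketch of the "check the requesting domain name" step, assuming the application receives the raw value of the HTTP Host header. The settings are reduced to a single string and all names are hypothetical; IPv6 literals are not handled.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

class HostSettingsResolver {
    private final Map<String, String> themeByHost = new HashMap<>();

    /** Registers the (simplified) settings for one virtual host. */
    void register(String host, String theme) {
        themeByHost.put(normalize(host), theme);
    }

    /** Looks up settings for a Host header value such as "Shop.Example.com:8080". */
    Optional<String> resolve(String hostHeader) {
        return Optional.ofNullable(themeByHost.get(normalize(hostHeader)));
    }

    /** Lower-cases the host and drops an optional ":port" suffix (no IPv6 support). */
    static String normalize(String hostHeader) {
        String host = hostHeader.trim().toLowerCase(Locale.ROOT);
        int colon = host.lastIndexOf(':');
        return colon >= 0 ? host.substring(0, colon) : host;
    }

    public static void main(String[] args) {
        HostSettingsResolver resolver = new HostSettingsResolver();
        resolver.register("shop.example.com", "blue");
        System.out.println(resolver.resolve("Shop.Example.com:8080").orElse("default"));
    }
}
```

The result would typically be stored in the session, so the lookup happens once per session as described above.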
If you're using Jetty, you can add contexts programmatically.
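// myBaseDirectory: a Resource for the shared exploded WAR; myContextPath: e.g. "/app1"
WebAppContext webapp = new WebAppContext();
webapp.setBaseResource(myBaseDirectory);
webapp.setContextPath(myContextPath);
contexts.addHandler(webapp); // contexts: assumed to be the server's ContextHandlerCollection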
Just do this in a loop for all your contexts. It should have close to zero disk space overhead.
There's probably a similar way to do it in Tomcat.
Well if this is for an experiment then any of the methods you listed can work.
If this is for production then I would recommend against this. While I have not tested ALL containers, the containers I have used lead me to believe it is much more resilient to simply provision headless VMs with containers. Linux VMs can be very small, and with VM technology you can add or remove as many instances as needed.
If you truly want to have a dynamically growing solution then you should look to eliminate single points of failure rather than try to lump your entire world into one.
If you truly need "up to the second" load expansion/contraction then you should look at AWS or Cloud Foundry.
We maintain our server once a week.
Sometimes, the customer wants us to change some settings that are already cached in the server.
My colleague always writes some JSP code to change these settings, which are stored in memory.
Is it a good method to use this kind of methodology?
If our project is not a Web container, which tools can help me?
Usually, in my experience, the server configuration is not stored only in the server's memory:
What happens if, after a configuration change, the server is restarted or just goes down for some system reason?
What happens if you have more than one instance of the same server to work on (a cluster of servers in other words)?
So, usually, people opt for various "externalized configuration" options. These range from file-based configuration plus a redeploy of the whole cluster on each configuration change, to configuration management servers (Consul, etcd, and so on). There are also solutions that come from, and are used in, the Java world: Apache ZooKeeper and Spring Cloud Config Server, to name a few. In addition, it is sometimes convenient to store the configuration in a database.
Now to your question: If your project is not a web container and you don't care that configuration will "disappear" after a server restart and you're not running a distributed cluster of servers, then, using JSP indeed doesn't seem appropriate in this case.
Maybe you should take a look at JMX (Java Management Extensions), which has a built-in solution for this, so you will probably be able to get rid of the web container (which your team does not seem to use for anything other than the JSP modifications you've described).
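As an illustration of the JMX route, here is a minimal sketch of a Standard MBean that exposes one in-memory setting. It assumes a single made-up setting (`pageSize`) and a made-up object name; a real application would expose its own settings. Once registered, the attribute can be edited from a JMX client such as JConsole without any JSP.

```java
import java.lang.management.ManagementFactory;
import javax.management.Attribute;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class JmxSettingsDemo {

    /** Management interface; JMX expects it to be public and named <Class>MBean. */
    public interface CachedSettingsMBean {
        int getPageSize();
        void setPageSize(int pageSize);
    }

    /** Holds a setting that lives only in memory, as in the question. */
    public static class CachedSettings implements CachedSettingsMBean {
        private volatile int pageSize = 20;

        @Override public int getPageSize() { return pageSize; }
        @Override public void setPageSize(int pageSize) { this.pageSize = pageSize; }
    }

    /** Registers the bean on the platform MBean server under a hypothetical name. */
    static ObjectName register(CachedSettings settings) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("com.example.demo:type=CachedSettings");
            if (server.isRegistered(name)) {
                server.unregisterMBean(name);
            }
            server.registerMBean(settings, name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Changes the attribute through the MBean server, as a JMX client would. */
    static void setPageSizeViaJmx(ObjectName name, int newSize) {
        try {
            ManagementFactory.getPlatformMBeanServer()
                    .setAttribute(name, new Attribute("PageSize", newSize));
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CachedSettings settings = new CachedSettings();
        ObjectName name = register(settings);
        setPageSizeViaJmx(name, 50);
        System.out.println(settings.getPageSize());
    }
}
```

Note that this only changes the value in memory; the restart and cluster questions raised above still apply.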
You basically need an in-memory cache. There are multiple solutions, which include writing your own implementation or using an existing Java library. You can also read the data from the database and add a cache over the database layer.
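A "create your own implementation" version can be very small. The sketch below assumes the settings can be loaded by key through some function (standing in for a database read) and that changing a setting means invalidating its entry so the next read reloads it. Names are invented for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

class ReadThroughCache<K, V> {
    private final ConcurrentMap<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    /** loader stands in for the database (or file) lookup behind the cache. */
    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, loading it on first access. */
    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }

    /** Drops one entry so that the next get() reloads it from the source. */
    void invalidate(K key) {
        entries.remove(key);
    }

    public static void main(String[] args) {
        ReadThroughCache<String, String> cache =
                new ReadThroughCache<>(key -> "value-of-" + key);
        System.out.println(cache.get("timeout"));
        cache.invalidate("timeout");
    }
}
```

With this shape, a settings change is written to the database first and then followed by `invalidate`, so the change survives a restart.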
I have a requirement where I have to host two (different) webapps on the same machine accessible via HTTP but on different ports. I am wondering what is the better solution, or basically what is the difference between
Starting two separate Tomcat instances with specific CATALINA_HOME / CATALINA_BASE and of course two conf dirs with corresponding server.xml files
Having one Tomcat instance and configuring multiple Services in a single server.xml. There is a default Catalina Service in the server.xml, and I would add another one with its own ports and appBase
Could someone describe which way to choose and why? I am interested in the main differences between the two instances vs two services?
The main differences between the two approaches are:
Many Tomcat instances
You can start or stop one Tomcat without affecting the other instances. This is very helpful when you need to do maintenance on one application but don't want to affect the availability of the others. Each instance uses its own resources, which could be a problem when the machine cannot guarantee the amount of memory or processor time needed by every Tomcat instance; this approach demands more resources than the other one. In terms of Tomcat knowledge, this approach is very easy to implement.
One Tomcat instance and many services (one per webapp)
This approach is helpful when you need to share resources between the web applications, and it gives you a single point of configuration for all of them. With this approach it is more difficult to isolate problems between web apps because they coexist in the same Tomcat instance. For example, if you need to troubleshoot one application, how do you do that when both are running in the same Tomcat? How do you read the log files? Are the logs of both applications in one file, or are they properly separated? Be careful here: without proper configuration this could be a nightmare in production. This approach needs more effort and more knowledge of Tomcat configuration in order to define a proper separation of services. It is more difficult to configure, but it is better in terms of efficiency.
How to decide
Well, it depends on:
a. the amount of resources available on the server.
b. the knowledge level of the IT team in terms of Tomcat configuration.
c. how critical the web applications are. For example, if one application is very critical, it is better to keep it in a separate Tomcat instance, because that helps you isolate any problem that occurs with that specific application in a simple way.
And finally it will depend on the context where you need to implement your solution and on your business needs.
Within our company it's kind of standard to create repositories for data that is originally stored in the database, as described for example in https://thinkinginobjects.com/2012/08/26/dont-use-dao-use-repository/.
Our web infrastructure consists of a few independent web applications within Tomcat 7 for printing, product description, product order (this is not persisted in the database!), category description etc.
They are all built on the Servlet 2 API.
So each instance/implementation of a repository holds a specialised kind of data represented by serializable classes, and the instances of these serializable classes are set up/filled by a periodically executed database query (for every result row the setters of the fields are called; it reminds me of domain-oriented entity beans with CMP).
The repositories are initialized in the servlets' init sequences (so every servlet keeps its own set of instances).
Each context has its own connection to the Oracle database (set up by a resource description file on deployment).
All the data is read only, we never need to write back to the database.
Because we need some of these data types for more than one web application (context), and some even for more than one servlet within the same web context, repositories with an identical data type are instantiated more than once, e.g. four times, twice within the same application.
In the end some of the data is duplicated and I'm not sure if this is as clever and efficient as it should be. It should be possible to share the same repository object between more than one application (JNDI?), but at the very least it must be possible to share it between several servlets within the same application context.
Besides, I'm irritated by the idea of using a "self-built" repository instead of something like a well-tested, openly developed cache (ehcache, jcs, ...), because some of these caches also provide options for distributed caches (so it should also work within the same container).
If certain entries are searched for, the search algorithm iterates over all entries in the repository (see the link above). For every search pattern there are specialised functions which are called directly from within the business logic classes using the "entity beans"; there's no specification object or interface.
In the end the application server as a whole does not perform that well and it uses a hell of a lot of RAM (at least for approximately 10000 DB entries); in my opinion this is most probably correlated with the use of serializable XSD-to-JAXB-generated classes.
Additionally, every time an application is deployed for tests you have to wait at least two minutes until all entries of the database have been loaded into the repositories. When deploying to live there's a clearly noticeable out-of-service phase on context/servlet start-up.
I tend to think all of this is closely related to the solutions I described above.
Because I haven't got any experience in this field and I'm new in the company, I don't want to be too obtrusive.
Maybe you can help me to evaluate ideas for a better setup:
Is it better for performance and memory to unify all the repositories into one "repository servlet" and request objects from there via HTTP (I don't think so, though it seems quite modular and friendly to distributed systems), or should I try to go with JNDI (never did that before) and connect to the repository similarly to a JDBC database?
Wouldn't it be even more sensible, faster and more efficient to at least use only one single connection pool for the whole Tomcat (and reference this connection pool from within the web apps' deployment descriptors)? Or might that slow down connections or limit them in any other respect?
I was told that the cache system (ehcache) didn't work well (at least not with the performance of the self-written solution, though I can't believe that). I imagine the usage of repositories backed by a distributed (as in across all contexts) cache used in all web applications should not only reduce the memory footprint significantly but also not be significantly slower. I believe it will be faster and have shorter start-up times, and it shouldn't need to be redeployed that often.
I'm very grateful for every tip or hint and your thoughts. Would be marvellous to get a peer review of my ideas based on practical experiences.
So thank you very much in advance!
Is it better to hold a repository for every web application (context) or is it better to share a common instance by JNDI or a similar technique
Unless someone proves me wrong, I would say there is no way to do it in a standard way, meaning as defined in the Servlet spec or in the rest of the Java EE spec canon.
There are technical ways to do it which probably depend on a specific application server implementation, but this cannot be "better" in its universal sense.
If you have two applications that operate on the same data, I wonder whether the partitioning of the applications is useful. Maybe all functionality operating on some kind of data needs to be in the same application?
Within our company it's kind of standard to create repositories for data that is originally stored in the database, as described for example in https://thinkinginobjects.com/2012/08/26/dont-use-dao-use-repository/.
I looked up Evans on our bookshelf. The blog post is quite weird. A repository and a DAO are basically the same thing: each provides CRUD operations for an object or for a tree of objects (Evans says only for the aggregate roots).
The repositories are initialized in the servlets' init sequences (so every servlet keeps its own set of instances). Each context has its own connection to the Oracle database (set up by a resource description file on deployment). [ ... ]
In the end the application server as a whole does not perform that well and it uses a hell of a lot of RAM
When something performs badly it's best to do profiling, e.g. with YourKit, or with perf and FlameGraphs if you are on Linux. If your applications need a lot of RAM, analyze the heap, e.g. with Eclipse MAT. There is no way somebody can give you a recommendation or a hint on a best practice without seeing a single line of code.
A general answer would have to cover everything about performance tuning for Oracle DBs, JDBC, Java Collections and Concurrent Programming, Networking and Operating Systems.
I was told that the cache system (ehcache) didn't work well (at least not with the performance of the self written solution - though: I can't believe that)
I can. EHCache is between 10 and 20 times slower than a simple HashMap. See: cache benchmarks. You only need a map when you do a complete preload and don't have any mutations.
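To illustrate the "you only need a map" point: with a complete preload and no mutations, a repository can be a read-only map that is rebuilt and swapped in one step on each periodic reload. This is a sketch under those assumptions, not the asker's actual code; the loader stands in for the periodic database query, and lookups by key avoid iterating over all entries.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class PreloadedRepository<K, V> {
    private final Supplier<Map<K, V>> loader;
    // Readers always see either the old or the new complete snapshot.
    private volatile Map<K, V> snapshot = Collections.emptyMap();

    /** loader stands in for the periodic "load everything" database query. */
    PreloadedRepository(Supplier<Map<K, V>> loader) {
        this.loader = loader;
        reload();
    }

    /** Loads all entries, then publishes the new map with a single volatile write. */
    final void reload() {
        snapshot = Collections.unmodifiableMap(new HashMap<>(loader.get()));
    }

    /** Hash lookup by key instead of a scan over all entries. */
    V findById(K id) {
        return snapshot.get(id);
    }

    int size() {
        return snapshot.size();
    }

    public static void main(String[] args) {
        PreloadedRepository<Integer, String> products =
                new PreloadedRepository<>(() -> Collections.singletonMap(1, "widget"));
        System.out.println(products.findById(1));
    }
}
```

Searches on other fields would need their own index maps built during `reload()`, which is where profiling should tell you whether it is worth it.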
I imagine the usage of repositories backed by a distributed (as across all contexts) cache used in all web applications should not only reduce memory footprint significantly but should not be significantly slower
Distributed caches need to go over the network and add serialization/deserialization overhead. That's probably another factor of 30 slower. When is the distributed cache updated?
I'm very grateful for every tip or hint and your thoughts.
Wrap up:
Do the normal software engineering homework: do profiling and analysis, and spend the tuning effort in the right places.
Ask specific questions on one topic on Stack Overflow and share your code and performance data. Ask a question about one thing at a time and read https://stackoverflow.com/help/on-topic
You may also come to the conclusion that there is nothing to tune. There are applications out there that need a day to build up an in-memory data structure from persistent data. Maybe it's just a lot of data? If you do not like the downtime, use blue-green deployment. Also use smaller data sets for development and testing.
I have multiple clients:
client 1 - 40 users
client 2 - 50 users
client 3 - 60 users
And I have a web application that is supposed to serve all the clients.
The application is deployed into Tomcat. Each client has its own database.
What I want to implement is a single web application instance which serves all the clients. The client (and the database to connect to) is identified by the context path from the URL.
I.e. I imply the following scenario:
Some user requests the http://mydomain.com/client1/
Tomcat invokes a single instance of my application (no matter which context is requested)
My application processes the rest of the request thinking that it's deployed to /client1 context path, i.e. all redirect or relative URLs should be resolved against http://mydomain.com/client1/
When the client 2 requests the http://mydomain.com/client2/, I want my application (the same instance) now process it just like if it was deployed to /client2 context path.
Is this possible with Tomcat?
Your application has to do this, not Tomcat. Now, you could deploy your application in three new contexts (client1, client2, client3) with slightly different database configuration for each, and if you are careful to use relative URLs (i.e. don't do things like /images) then you can do this without changes. This is the transparent way of making your application reusable, in that your application is unaware of the global picture that there are 3 different instances of itself running. That means you can easily deploy more and more instances without having to change your application. You just configure a new instance and go. This only requires that you don't use absolute URLs to resources. Using ServletContext.getContextPath() and using .. in your CSS, scripts, etc. is helpful here as well.
Probably one of the biggest advantages of working this way is that your app doesn't care about global concerns. Because it's not involved in those decisions you can run 3 instances on one Tomcat server, or if one client needs more scaling they can be moved to their own Tomcat server easily. Making your app portable forces you to deal with how to install your app in any environment. This is a pillar of horizontal scaling, which your situation could very much take advantage of, since you can split your DB data without having to rejoin it (a huge advantage). The option you asked about doesn't force you to deal with this, so when the time comes to deal with it, it will be painful.
The other option is more involved and requires significant changes to your application. It works by parsing the incoming URL, pulling out the name of the client, and then using that name to look up in a configuration file the database that should be used for that client. Spring MVC can handle things like extracting variables from URL paths. You then have to make sure everything you render back to the client points to their portion of the URL. This would probably involve a lot of the same requirements as the first option. You can use absolute URLs for things like JavaScript, CSS, and images, but URLs to your app would have to be rewritten at runtime so that they are relative to the requesting client. The benefit is that you only load your application once.
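For the second option, the URL handling could look roughly like the sketch below. It assumes the client name is the first path segment and that the client-to-database mapping has already been read from some configuration; the JDBC URLs and all names are placeholders, and a framework such as Spring MVC would normally do the path extraction for you.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class ClientPathRouter {
    private final Map<String, String> jdbcUrlByClient = new HashMap<>();

    /** Stands in for reading the client-to-database mapping from a config file. */
    void register(String client, String jdbcUrl) {
        jdbcUrlByClient.put(client, jdbcUrl);
    }

    /** Extracts the first non-empty path segment, e.g. "/client1/orders/5" gives "client1". */
    static Optional<String> clientOf(String requestPath) {
        for (String segment : requestPath.split("/")) {
            if (!segment.isEmpty()) {
                return Optional.of(segment);
            }
        }
        return Optional.empty();
    }

    /** Picks the database for the client named in the path, if that client is known. */
    Optional<String> jdbcUrlFor(String requestPath) {
        return clientOf(requestPath).map(jdbcUrlByClient::get);
    }

    /** Builds an application link that stays inside the client's portion of the URL. */
    static String link(String client, String target) {
        String trimmed = target.startsWith("/") ? target.substring(1) : target;
        return "/" + client + "/" + trimmed;
    }

    public static void main(String[] args) {
        ClientPathRouter router = new ClientPathRouter();
        router.register("client1", "jdbc:postgresql://db-host/client1"); // placeholder URL
        System.out.println(router.jdbcUrlFor("/client1/orders/5").orElse("unknown client"));
        System.out.println(link("client1", "/orders/5"));
    }
}
```

Every link the application renders would have to go through something like `link`, which is the runtime rewriting cost mentioned above.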
Just as an aside, if you host your CSS, Javascript, images on a CDN in production then both of these options must be relative URL aware. Upsides and downsides to using CDNs as well.
While that sounds good, it might not be a good thing, because all clients use the same version of the app. Also, if you bring down the app to do maintenance for client1, it affects all clients. If you think you'll have to do customization per client then this option will get messy quickly. Upgrading a single client means all clients must upgrade, and depending on your business model this might not be acceptable. Furthermore, I'm not entirely sure you'll save a lot of memory by running only a single instance of the application, because most apps only take up about 10MB of loaded code. The vast majority of the memory is in the VM and in processing requests, and using a single Tomcat instance means you share the VM. And with 1 or 3 instances running you still have the same number of requests. You might see a difference of 30-100MB, which in today's world is chump change, and all of those other concerns aren't addressed if you choose to save only a couple of MB.
Essentially there are facilities in Tomcat to aid you in doing this (multiple contexts), but it's mostly up to your application to handle this, especially if it's a single instance.
We currently have a web application loading a Spring application context which instantiates a stack of business objects, DAO objects and Hibernate. We would like to share this stack with another web application, to avoid having multiple instances of the same objects.
We have looked into several approaches; exposing the objects using JMX or JNDI, or using EJB3.
The different approaches all have their issues, and we are looking for a lightweight method.
Any suggestions on how to solve this?
Edit: I have received comments requesting me to elaborate a bit, so here goes:
The main problem we want to solve is that we want to have only one instance of Hibernate. This is due to problems with invalidation of Hibernate's 2nd level cache when running several client applications working with the same datasource. Also, the business/DAO/Hibernate stack is growing rather large, so not duplicating it just makes more sense.
First, we tried to look at how the business layer alone could be exposed to other web apps, and Spring offers JMX wrapping at the price of a tiny amount of XML. However, we were unable to bind the JMX entities to the JNDI tree, so we couldn't lookup the objects from the web apps.
Then we tried binding the business layer directly to JNDI. Although Spring didn't offer any method for this, using JNDITemplate to bind them was also trivial. But this led to several new problems: 1) Security manager denies access to RMI classloader, so the client failed once we tried to invoke methods on the JNDI resource. 2) Once the security issues were resolved, JBoss threw IllegalArgumentException: object is not an instance of declaring class. A bit of reading reveals that we need stub implementations for the JNDI resources, but this seems like a lot of hassle (perhaps Spring can help us?)
We haven't looked too much into EJB yet, but after the first two tries I'm wondering if what we're trying to achieve is at all possible.
To sum up what we're trying to achieve: One JBoss instance, several web apps utilizing one stack of business objects on top of DAO layer and Hibernate.
Best regards,
Nils
Are the web applications deployed on the same server?
I can't speak for Spring, but it is straightforward to move your business logic in to the EJB tier using Session Beans.
The application organization is straightforward. The logic goes into Session Beans, and these Session Beans are bundled within a single jar as a Java EE artifact with an ejb-jar.xml file (in EJB3, this will likely be practically empty).
Then bundle your Entity classes into a separate jar file.
Next, you will build each web app in to their own WAR file.
Finally, all of the jars and the wars are bundled in to a Java EE EAR, with the associated application.xml file (again, this will likely be quite minimal, simply enumerating the jars in the EAR).
This EAR is deployed wholesale to the app server.
Each WAR is effectively independent -- their own sessions, their own context paths, etc. But they share the common EJB back end, so you have only a single 2nd level cache.
You also use local references and calling semantic to talk to the EJBs since they're in the same server. No need for remote calls here.
I think this solves the issue you're having quite well, and it is quite straightforward in Java EE 5 with EJB 3.
Also, you can still use Spring for much of your work, as I understand, but I'm not a Spring person so I can not speak to the details.
What about spring parentContext?
Check out this article:
http://springtips.blogspot.com/2007/06/using-shared-parent-application-context.html
Terracotta might be a good fit here (disclosure: I am a developer for Terracotta). Terracotta transparently clusters Java objects at the JVM level, and integrates with both Spring and Hibernate. It is free and open source.
As you said, the problem of more than one client web app using an L2 cache is keeping those caches in synch. With Terracotta you can cluster a single Hibernate L2 cache. Each client node works with its copy of that clustered cache, and Terracotta keeps it in synch. This link explains more.
As for your business objects, you can use Terracotta's Spring integration to cluster your beans - each web app can share clustered bean instances, and Terracotta keeps the clustered state in synch transparently.
Actually, if you want a lightweight solution and don't need transactions or clustering, just use Spring's support for RMI. In the latest versions it lets you expose Spring beans remotely using simple annotations. See http://static.springframework.org/spring/docs/2.0.x/reference/remoting.html.
You should take a look at the Terracotta Reference Web Application - Examinator. It has most of the components you are looking for - it's got Hibernate, JPA, and Spring with a MySQL backend.
It's been pre-tuned to scale up to 16 nodes, 20k concurrent users.
Check it out here: http://reference.terracotta.org/examinator
Thank you for your answers so far. We're still not quite there, but we have tried a few things now and see things more clearly. Here's a short update:
The solution which appears to be the most viable is EJB. However, this will require some amount of changes in our code, so we're not going to fully implement that solution right now. I'm almost surprised that we haven't been able to find some Spring feature to help us out here.
We have also tried the JNDI route, which ends with the need for stubs for all shared interfaces. This feels like a lot of hassle, considering that everything is on the same server anyway.
Yesterday, we had a small breakthrough with JMX. Although JMX is definitely not meant for this kind of use, we have proven that it can be done, with no code changes and a minimal amount of XML (a big Thank You to Spring for MBeanExporter and MBeanProxyFactoryBean). The major drawbacks to this method are performance and the fact that our domain classes must be shared through JBoss' server/lib folder. I.e., we have to remove some dependencies from our WARs and move them to server/lib, else we get ClassCastException when the business layer returns objects from our own domain model. I fully understand why this happens, but it is not ideal for what we're trying to achieve.
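For readers who want to see the mechanism without Spring, here is a rough JDK-only sketch of the same idea: one side registers a business object as an MBean, the other side obtains a typed proxy and calls it through the MBean server. It is not our actual setup; the service interface and object name are invented, and the method only returns a String, which sidesteps the shared-domain-class problem described above.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.JMX;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class JmxProxyDemo {

    /** Hypothetical business interface exposed over JMX; must be public. */
    public interface PriceServiceMBean {
        String describe(String productId);
    }

    /** Lives in the "owning" web app. */
    public static class PriceService implements PriceServiceMBean {
        @Override
        public String describe(String productId) {
            return "product:" + productId;
        }
    }

    /** Owning side: roughly the job an exporter does. */
    static ObjectName export(PriceService service) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("com.example.demo:type=PriceService");
            if (server.isRegistered(name)) {
                server.unregisterMBean(name);
            }
            server.registerMBean(service, name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Consuming side: a typed proxy that forwards each call to the MBean server. */
    static PriceServiceMBean proxyFor(ObjectName name) {
        return JMX.newMBeanProxy(
                ManagementFactory.getPlatformMBeanServer(), name, PriceServiceMBean.class);
    }

    public static void main(String[] args) {
        ObjectName name = export(new PriceService());
        System.out.println(proxyFor(name).describe("42"));
    }
}
```

Every call goes through the MBean server's reflective dispatch, which is consistent with the performance drawback noted above.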
I thought it was time for a little update, because what appears to be the best solution will take some time to implement. I'll post our findings here once we've done that job.
Spring does have an integration point that might be of interest to you: the EJB 3 injection interceptor. This enables you to access Spring beans from EJBs.
I'm not really sure what you are trying to solve; at the end of the day each jvm will either have replicated instances of the objects, or stubs representing objects existing on another (logical) server.
You could set up a third 'business logic' server that has a remote API which your two web apps could call. The typical solution is to use EJB, but I think Spring has remoting options built into its stack.
The other option is to use some form of shared cache architecture... which will synchronize object changes between the servers, but you still have two sets of instances.
Take a look at JBossCache. It allows you to easily share/replicate maps of data between multiple JVM instances (same box or different). It is easy to use and has lots of wire-level protocol options (TCP, UDP Multicast, etc.).